Authors: Jiarui Ding; Sohrab Shah
Addresses: Department of Computer Science, University of British Columbia; Department of Molecular Biology, BC Cancer Agency, Vancouver, BC, V5T 4E6, Canada | Department of Computer Science, University of British Columbia; Department of Molecular Biology, BC Cancer Agency; Department of Pathology, University of British Columbia, Vancouver, BC, V5T 4E6, Canada
Abstract: Hidden semi-Markov models are effective at modelling sequences made up of a succession of homogeneous zones, provided that appropriate state duration distributions are chosen. To compensate for model mis-specification and provide protection against outliers, we design a robust hidden semi-Markov model with Student's t mixture models as the emission distributions. The proposed approach is used to model array-based comparative genomic hybridization data. Experiments conducted on benchmark data from the Coriell cell lines and on glioblastoma multiforme data illustrate the reliability of the technique.
Keywords: array CGH data; copy number variation; hidden semi-Markov models; discriminative training; Student's t distribution; sequence modelling; bioinformatics; comparative genomic hybridisation; Coriell cell lines; glioblastoma multiforme.
International Journal of Data Mining and Bioinformatics, 2013 Vol.8 No.4, pp.427 - 442
Received: 04 May 2011
Accepted: 04 May 2011
Published online: 20 Oct 2014