Title: A robust hidden semi-Markov model with application to aCGH data processing

Authors: Jiarui Ding; Sohrab Shah

Addresses: Department of Computer Science, University of British Columbia; Department of Molecular Biology, BC Cancer Agency, Vancouver, BC, V5T 4E6, Canada
Department of Computer Science, University of British Columbia; Department of Molecular Biology, BC Cancer Agency; Department of Pathology, University of British Columbia, Vancouver, BC, V5T 4E6, Canada

Abstract: Hidden semi-Markov models are effective at modelling sequences with a succession of homogeneous zones by choosing appropriate state duration distributions. To compensate for model mis-specification and provide protection against outliers, we design a robust hidden semi-Markov model with Student's t mixture models as the emission distributions. The proposed approach is used to model array-based comparative genomic hybridization data. Experiments conducted on the benchmark data from the Coriell cell lines and on glioblastoma multiforme data illustrate the reliability of the technique.

Keywords: array CGH data; copy number variation; hidden semi-Markov models; discriminative training; Student's t distribution; sequence modelling; bioinformatics; comparative genomic hybridisation; Coriell cell lines; glioblastoma multiforme.

DOI: 10.1504/IJDMB.2013.056616

International Journal of Data Mining and Bioinformatics, 2013 Vol.8 No.4, pp.427 - 442

Received: 04 May 2011
Accepted: 04 May 2011

Published online: 20 Oct 2014
