Title: Boosting gene expression clustering with system-wide biological information: a robust autoencoder approach
Authors: Hongzhu Cui; Chong Zhou; Xinyu Dai; Yuting Liang; Randy Paffenroth; Dmitry Korkin
Addresses: Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA ' Data Science Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA ' Data Science Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA ' Data Science Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA ' Data Science Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA; Mathematics Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA; Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA ' Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA; Data Science Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA; Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA
Abstract: Clustering is one of the first computational steps in the exploration and analysis of gene expression data. However, most standard clustering methods do not take prior biological information into account. Here, we propose a new approach for gene expression clustering analysis. The approach benefits from a new deep learning architecture, Robust Autoencoder, and from incorporating prior system-wide biological information into the clustering process. We tested our approach on two gene expression datasets. Our approach outperformed all other clustering methods on the labelled yeast gene expression dataset. Furthermore, we showed that it is better at identifying functionally common clusters on the unlabelled human gene expression dataset. The results demonstrate that our new deep learning architecture generalises well to the specific properties of gene expression profiles. Furthermore, the results confirm our hypothesis that prior biological network knowledge is helpful in gene expression clustering.
Keywords: gene expression; PPIs; protein-protein interactions; clustering; deep learning.
DOI: 10.1504/IJCBDD.2020.105113
International Journal of Computational Biology and Drug Design, 2020 Vol.13 No.1, pp.98 - 123
Received: 26 Jan 2019
Accepted: 15 Feb 2019
Published online: 13 Feb 2020