Authors: Priyojit Das; Sujay Saha
Addresses: UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, Tennessee, USA; Department of CSE, National Institute of Technology Calicut, Calicut, Kerala, India ' Department of CSE, Heritage Institute of Technology, Kolkata, West Bengal, India
Abstract: With the advent of single-cell RNA-seq (scRNA-seq) technology, the study of transcriptomic activity at single-cell resolution has become extremely popular among researchers. Analysis of the expression data generated by scRNA-seq has the potential to reveal unprecedented information about the heterogeneity in gene expression among tissue cells, both at a fixed time point and over the course of development. Although numerous statistical methods are used to analyse single-cell transcriptomes, there is still considerable room to design analysis methods for cell type identification. In this paper, a Hybrid Neighbourhood-Consensus (HNC) clustering algorithm is proposed to identify cellular states from single-cell gene expression data. The hybrid algorithm transforms the original data set by combining a k-nearest neighbour adjacency matrix and a consensus matrix obtained from the single-cell expression matrix, and then uses a modified k-means algorithm to cluster the transformed data set. To compare the performance of the proposed HNC algorithm with other unsupervised clustering methods, we used 12 real scRNA-seq data sets (cell types include cancer, embryonic, pancreatic, lung and renal cells). The comparison shows that the HNC algorithm outperforms other standard single-cell analysis methods in terms of three external cluster validation indexes: Adjusted Rand Index, Purity and Normalised Mutual Information.
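The three external validation indexes named in the abstract have standard textbook definitions, which can be sketched in plain Python as follows. This is an illustrative sketch only, not the authors' evaluation code: the function names are hypothetical, the NMI normalisation by the arithmetic mean of the two entropies is an assumption (the abstract does not state which variant was used), and the inputs are assumed to be equal-length label lists with at least two cells.

```python
from collections import Counter
from math import comb, log


def purity(true_labels, pred_labels):
    """Fraction of cells that belong to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(true_labels, pred_labels):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(true_labels)


def adjusted_rand_index(true_labels, pred_labels):
    """Rand index corrected for chance agreement (assumes at least two cells)."""
    n = len(true_labels)
    cells = Counter(zip(true_labels, pred_labels))
    sum_cells = sum(comb(v, 2) for v in cells.values())
    sum_true = sum(comb(v, 2) for v in Counter(true_labels).values())
    sum_pred = sum(comb(v, 2) for v in Counter(pred_labels).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions put every cell in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def _entropy(labels):
    """Shannon entropy (natural log) of a labelling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalised_mutual_information(true_labels, pred_labels):
    """Mutual information divided by the mean of the two entropies.

    Assumption: arithmetic-mean normalisation; other variants exist.
    """
    n = len(true_labels)
    count_true = Counter(true_labels)
    count_pred = Counter(pred_labels)
    mi = sum(
        (v / n) * log(v * n / (count_true[t] * count_pred[p]))
        for (t, p), v in Counter(zip(true_labels, pred_labels)).items()
    )
    denom = (_entropy(true_labels) + _entropy(pred_labels)) / 2
    return mi / denom if denom > 0 else 1.0
```

All three indexes compare a predicted clustering against known cell-type labels and are unaffected by how the clusters are numbered. Purity and NMI lie between 0 and 1, whereas the Adjusted Rand Index can fall below 0 when agreement is worse than expected by chance.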
Keywords: single-cell RNA sequencing; clustering; consensus clustering; neighbourhood graph; embryonic development; clonal heterogeneity.
International Journal of Data Mining and Bioinformatics, 2021 Vol.25 No.3/4, pp.161 - 180
Accepted: 24 Nov 2021
Published online: 13 May 2022