Title: Evaluation of biological and technical variations in low-input RNA-Seq and single-cell RNA-Seq

Authors: Fan Gao; Jae Mun Kim; JiHong Kim; Ming-Yi Lin; Charles Y. Liu; Jonathan J. Russin; Christopher P. Walker; Reymundo Dominguez; Adrian Camarena; Joseph D. Nguyen; Jennifer Herstein; William Mack; Oleg V. Evgrafov; Robert H. Chow; James A. Knowles; Kai Wang

Addresses: Zilkha Neurogenetic Institute, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA (all authors except Charles Y. Liu and Jonathan J. Russin); Department of Neurosurgery, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA (Charles Y. Liu; Jonathan J. Russin)

Abstract: Background: Low-input and single-cell RNA-Seq are widely used today, but two technical questions remain. 1) In technical replicates, what proportion of the noise comes from the quantity of input RNA rather than from differences among bioinformatics tools? 2) In single neurons, is variation in gene expression attributable to biological heterogeneity or merely to random noise? To examine these sources of variability, we generated RNA-Seq data from low-input (10, 100 and 1000 pg) reference RNA samples and from 38 single neurons from human brains. Results: For technical replicates, the quantity of input RNA is negatively correlated with expression variation. For genes in the medium- and high-expression groups, input RNA amount explains most of the variation, whereas bioinformatics pipelines explain some of the variation in the low-expression group. The t-distributed stochastic neighbour embedding (t-SNE) method reveals data-inherent aggregation of the low-input replicate data and suggests heterogeneity among the transcriptomes of single pyramidal neurons. Interestingly, expression variation in single neurons is biologically relevant. Conclusions: We found that differences in bioinformatics pipelines do not represent a major source of variation.
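The abstract's central comparison, how much of a gene's variation across technical replicates is attributable to input RNA amount versus the bioinformatics pipeline, can be illustrated with a textbook two-way sum-of-squares decomposition. The sketch below is not the authors' method. The coefficient of variation as the variation metric, the function names `coefficient_of_variation` and `variance_shares`, and the balanced layout with one observation per (input amount, pipeline) combination are all hypothetical assumptions made for illustration.

```python
import statistics
from collections import defaultdict


def coefficient_of_variation(replicates):
    """Hypothetical helper: sample standard deviation divided by the mean
    of one gene's expression across technical replicates."""
    mean = statistics.mean(replicates)
    if mean == 0:
        return float("nan")  # undefined for genes with no detected expression
    return statistics.stdev(replicates) / mean


def variance_shares(values):
    """Hypothetical helper: split one gene's total sum of squares into the
    fractions explained by input RNA amount and by pipeline.

    `values` maps (input_pg, pipeline) -> expression. Assumes a balanced
    design with exactly one observation per combination, so any
    interaction is folded into the residual.
    """
    observations = list(values.values())
    by_input = defaultdict(list)
    by_pipeline = defaultdict(list)
    for (input_pg, pipeline), x in values.items():
        by_input[input_pg].append(x)
        by_pipeline[pipeline].append(x)

    grand = statistics.mean(observations)
    ss_total = sum((x - grand) ** 2 for x in observations)
    # Main-effect sums of squares: group size times squared deviation of
    # each group mean from the grand mean.
    ss_input = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in by_input.values())
    ss_pipeline = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in by_pipeline.values())
    if ss_total == 0:
        return {"input": 0.0, "pipeline": 0.0, "residual": 0.0}
    return {
        "input": ss_input / ss_total,
        "pipeline": ss_pipeline / ss_total,
        "residual": (ss_total - ss_input - ss_pipeline) / ss_total,
    }


# Toy values (not data from the study): expression changes with input
# amount but is identical across two hypothetical pipelines "A" and "B".
toy = {(10, "A"): 1, (10, "B"): 1, (100, "A"): 5, (100, "B"): 5,
       (1000, "A"): 9, (1000, "B"): 9}
print(variance_shares(toy))
```

Applied gene by gene, such shares could then be summarised within expression-level groups. The grouping thresholds, normalisation and tools used in the study are not specified here.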

Keywords: RNA-Seq; single-cell sequencing; bioinformatics; TopHat; RNA-Seq by expectation maximisation; RSEM; t-distributed stochastic neighbour embedding; t-SNE; principal component analysis; PCA; annotate variation; ANNOVAR; variance.

DOI: 10.1504/IJCBDD.2018.090839

International Journal of Computational Biology and Drug Design, 2018 Vol.11 No.1/2, pp. 5-22

Available online: 23 Mar 2018
