Authors: Gulshan Sharif; Tahir Mehmood
Addresses: Department of Mathematics and Statistics, Riphah International University, Islamabad, Pakistan; School of Natural Sciences, National University of Sciences and Technology, Islamabad, Pakistan
Abstract: In yeast genotype-phenotype mapping, where a small set of influential genes is assumed to explain the variation in phenotypes, Partial Least Squares (PLS) has been used to select the influential variables, i.e. genes. Modelling the PLS loading weights, which are an essential indicator for variable selection in PLS, through a probability distribution has proved successful for variable selection. We have revisited yeast genotype-phenotype mapping, where the PLS loading weights appear to be leptokurtic. Hence, modelling the PLS loading weights with a leptokurtic distribution, namely the Laplace distribution, can improve the yeast mapping. We introduce Laplace-PLS, in which the leptokurtic PLS loading weights are modelled for influential gene selection. Genotype-phenotype mapping through Laplace-PLS is compared with PLS, soft-threshold PLS (Soft-PLS), uninformative variable elimination in PLS (UVE-PLS) and distribution-based truncation in PLS (Trunc-PLS). Monte Carlo simulation is used for parameter estimation and performance assessment. The PLS methods are evaluated through the predicted root mean square error (RMSE), the number of influential genes and the selectivity index. Results indicate that Laplace-PLS yields the lowest RMSE with a smaller number of influential genes and a higher level of consistency. The genotype-phenotype mapping is explained through background information such as the existence of premature stop codons, copy number variations, frame shift mutations, etc.
Keywords: partial least squares; variable selection; genomics; genotype phenotype mapping; leptokurtic distribution.
International Journal of Data Mining and Bioinformatics, 2018 Vol.21 No.1, pp.18 - 31
Received: 12 Feb 2018
Accepted: 22 Jul 2018
Published online: 09 Oct 2018
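
The abstract's central idea, modelling leptokurtic PLS loading weights with a Laplace distribution and selecting genes from its tails, can be illustrated with a minimal sketch. This is not the authors' Laplace-PLS procedure: the function name `laplace_tail_select`, the `tail_prob` parameter and the simulated weights are all assumptions made for illustration, and the actual method may estimate parameters and set the cutoff differently.

```python
import numpy as np

def laplace_tail_select(weights, tail_prob=0.05):
    """Fit a Laplace distribution to loading weights and flag tail variables.

    Hypothetical helper: tail_prob is an assumed tuning parameter giving the
    total probability mass (both tails combined) treated as influential.
    """
    w = np.asarray(weights, dtype=float)
    mu = np.median(w)            # maximum-likelihood estimate of the Laplace location
    b = np.mean(np.abs(w - mu))  # maximum-likelihood estimate of the Laplace scale
    # For Laplace(mu, b): P(|W - mu| > t) = exp(-t / b), hence t = -b * ln(tail_prob)
    cutoff = -b * np.log(tail_prob)
    selected = np.flatnonzero(np.abs(w - mu) > cutoff)
    return mu, b, cutoff, selected

# Simulated loading weights (an assumption, not data from the paper):
# mostly small, heavy-tailed noise plus five clearly influential variables.
rng = np.random.default_rng(0)
w = rng.laplace(loc=0.0, scale=0.01, size=1000)
w[:5] = [0.2, -0.25, 0.3, -0.22, 0.27]

mu, b, cutoff, selected = laplace_tail_select(w, tail_prob=0.01)
print(f"location={mu:.4f}, scale={b:.4f}, cutoff={cutoff:.4f}")
print("selected variable indices:", selected)
```

In this sketch a smaller `tail_prob` gives a stricter cutoff and fewer selected variables; in practice such a parameter would be tuned, for example by the kind of Monte Carlo resampling the abstract mentions.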