Title: Sample-to-sample p-value variability and its implications for multivariate analysis

Authors: Wei Wang; Wilson Wen Bin Goh

Addresses: School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, P.R. China ' School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, P.R. China; Department of Bioengineering, Tianjin University, Tianjin, P.R. China

Abstract: Statistical feature selection is used to identify relevant genes from biological data, with implications for biomarker and drug development. Recent work demonstrates that, in the univariate scenario, the t-test p-value exhibits high sample-to-sample variability accompanied by an exaggeration of effect size. To deepen understanding, we further examined p-value and effect size variability across a variety of alternative scenarios. We find that with increased sample sizes, the estimated effect size converges towards the true effect size. Moreover, with greater power (larger effect size or sample size), p-value variability does not quite converge, suggesting that p-values are a terrible indicator of estimated effect sizes. The t-test is resilient, and surprisingly effective even in test scenarios where its non-parametric counterpart, the Wilcoxon rank-sum test, is expected to perform better. Since p-values are variable and poorly predict effect size, ranking individual gene or protein features based on p-values is a terrible idea, and we demonstrate that restriction to the top 500 features (ranked by p-value) in real protein expression data comprising 12 normal and 12 renal cancer patients worsens instability. The use of stability indicators such as the bootstrap, estimated effect size and confidence intervals alongside the p-value is required to make meaningful and statistically valid interpretations.
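The behaviour described in the abstract can be illustrated with a small simulation. What follows is a minimal sketch, not the authors' simulation design: it assumes two normally distributed groups of equal size with unit variance, a hypothetical true standardised effect size of 0.5, and a pooled-variance two-sample t-test. The function names (`t_pdf`, `two_sided_p`, `simulate_once`, `summarise`), the sample sizes, the seed and the number of repetitions are all illustrative choices. The p-value is obtained by numerically integrating the Student's t density so that only the Python standard library is needed.

```python
import math
import random
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=200):
    """Two-sided p-value: 1 - 2 * integral of the t density over [0, |t|].

    The integral uses Simpson's rule; steps must be even.
    """
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def simulate_once(rng, n, true_d):
    """Draw two groups of size n; return (p-value, estimated effect size)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(true_d, 1.0) for _ in range(n)]
    # With equal group sizes the pooled variance is the mean of the two variances.
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    d_hat = (statistics.fmean(b) - statistics.fmean(a)) / pooled_sd
    t = d_hat * math.sqrt(n / 2)  # pooled two-sample t statistic for equal n
    return two_sided_p(t, 2 * n - 2), d_hat


def summarise(n, true_d=0.5, reps=1000, seed=1):
    """Repeat the experiment and summarise the spread of p and effect size."""
    rng = random.Random(seed)
    results = [simulate_once(rng, n, true_d) for _ in range(reps)]
    p_values = sorted(p for p, _ in results)
    significant = [abs(d) for p, d in results if p < 0.05]
    return {
        "n": n,
        "p_5th": p_values[int(0.05 * reps)],
        "p_95th": p_values[int(0.95 * reps)],
        "mean_d_all": statistics.fmean([d for _, d in results]),
        "mean_abs_d_significant": (statistics.fmean(significant)
                                   if significant else float("nan")),
    }


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(summarise(n))
```

Comparing `p_5th` with `p_95th` at each sample size gives a rough view of how widely the p-value varies between repeated samples, while comparing `mean_abs_d_significant` with the assumed true effect size shows how conditioning on p < 0.05 can inflate the estimated effect when the sample is small.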

Keywords: p-value; statistical feature selection; t-test; variability; Wilcoxon rank-sum test.

DOI: 10.1504/IJBRA.2018.092691

International Journal of Bioinformatics Research and Applications, 2018 Vol.14 No.3, pp.235 - 254

Received: 19 May 2016
Accepted: 19 Oct 2016

Published online: 07 Dec 2017
