Authors: Sangseob Leem; Dae Ho Lee; Taesung Park
Addresses: Department of Statistics, Seoul National University, Seoul, Korea; Department of Internal Medicine, Gachon University Gil Medical Centre & College of Medicine, Incheon, 21565, Korea; Department of Statistics, Seoul National University, Seoul, Korea
Abstract: The permutation test is a non-parametric method for assessing statistical significance that is now widely used in many disciplines, including bioinformatics. It is especially useful when the null distribution of a test statistic is unknown or hard to determine. In a permutation test, the precision of the estimated significance depends on the number of permutations, so computation time precludes resolving extremely low p-values. In this paper, we propose a novel strategy for approximating extremely low p-values. The proposed method consists of three steps: (1) divide the data into subsets and perform a permutation test on each subset; (2) integrate the resulting p-values by Stouffer's z-score method; and (3) repeat the first and second steps and average the results. We demonstrate and validate the method using simulation studies and two real biological examples. These assessments showed that two p-values of about 1.0e-20 and 1.0e-50 could be well estimated by the proposed method in a single day for samples larger than 5000.
Keywords: permutation test; low p-value; rapid approximation.
International Journal of Data Mining and Bioinformatics, 2018 Vol.21 No.4, pp.352 - 364
Received: 12 Jan 2019
Accepted: 12 Jan 2019
Published online: 30 Mar 2019
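The three steps listed in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the test statistic (a one-sided difference in means between two groups), the way subsets are formed, the clamping of subset p-values, and the choice to average the combined z-scores across repeats are all assumptions, since the abstract does not specify them. The function names and parameters (`permutation_p_value`, `stouffer_z`, `approximate_p_value`, `n_subsets`, `n_perm`, `n_repeats`) are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

_STD_NORMAL = NormalDist()


def permutation_p_value(x, y, n_perm, rng):
    """One-sided permutation p-value for mean(x) > mean(y) (assumed statistic)."""
    observed = fmean(x) - fmean(y)
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fmean(pooled[:n_x]) - fmean(pooled[n_x:]) >= observed:
            hits += 1
    # Add-one estimate, then clamp below 1 so the z-score stays finite
    # (the clamp is an assumption of this sketch).
    p = (hits + 1) / (n_perm + 1)
    return min(p, n_perm / (n_perm + 1))


def stouffer_z(p_values):
    """Combine one-sided p-values: Z = sum(z_i) / sqrt(k), unweighted."""
    z_scores = [-_STD_NORMAL.inv_cdf(p) for p in p_values]
    return sum(z_scores) / math.sqrt(len(z_scores))


def approximate_p_value(x, y, n_subsets=10, n_perm=1000, n_repeats=5, seed=0):
    """Return (mean combined z-score, approximate one-sided p-value)."""
    rng = random.Random(seed)
    combined = []
    for _ in range(n_repeats):
        # Step 1: random split into disjoint subsets, one test per subset.
        xs, ys = list(x), list(y)
        rng.shuffle(xs)
        rng.shuffle(ys)
        p_values = [
            permutation_p_value(xs[k::n_subsets], ys[k::n_subsets], n_perm, rng)
            for k in range(n_subsets)
        ]
        # Step 2: integrate the subset p-values with Stouffer's method.
        combined.append(stouffer_z(p_values))
    # Step 3: average over repeats (averaging z-scores is an assumption).
    z_mean = fmean(combined)
    # Upper-tail normal probability via erfc, which stays accurate for large z.
    return z_mean, 0.5 * math.erfc(z_mean / math.sqrt(2.0))
```

In this sketch each subset test can report nothing smaller than 1/(n_perm + 1), but the combined z-score grows with the square root of the number of subsets, which is what lets the combined p-value fall far below what a single permutation test of the same cost could resolve.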