
Title: An aggregation method for sparse logistic regression

Authors: Zhe Liu

Addresses: Department of Statistics, University of Chicago, 5734 S. University Avenue, Chicago, IL 60637, USA

Abstract: L₁ regularised logistic regression has become a workhorse of data mining and bioinformatics: it is widely used for classification problems, particularly those with many features. However, L₁ regularisation typically selects too many features, so that so-called false positives are unavoidable. In this paper, we demonstrate and analyse an aggregation method for sparse logistic regression in high dimensions. This approach linearly combines the estimators from a suitable set of logistic models with different underlying sparsity patterns and can balance predictive ability and model interpretability. The numerical performance of the proposed aggregation method is then investigated using simulation studies. We also analyse a published genome-wide case-control dataset to further evaluate the usefulness of the aggregation method in multi-locus association mapping.

Keywords: logistic regression; aggregation; sparse model; sample-splitting; Markov chain Monte Carlo method; genome-wide association study.

DOI: 10.1504/IJDMB.2017.084028

International Journal of Data Mining and Bioinformatics, 2017 Vol.17 No.1, pp.85 - 96

Accepted: 08 Mar 2017
Published online: 03 May 2017
