Title: Patient-centered yes/no prognosis using learning machines

Authors: I.R. Konig, J.D. Malley, S. Pajevic, C. Weimar, H-C. Diener, A. Ziegler

Addresses: Institut fur Medizinische Biometrie und Statistik, Universitat zu Lubeck, Ratzeburger Allee 160, 23538 Lubeck, Germany; Center for Information Technology, National Institutes of Health, Bethesda, MD, USA; Center for Information Technology, National Institutes of Health, Bethesda, MD, USA; Klinik und Poliklinik fur Neurologie, Universitat Duisburg-Essen, Germany; Klinik und Poliklinik fur Neurologie, Universitat Duisburg-Essen, Germany; Institut fur Medizinische Biometrie und Statistik, Universitat zu Lubeck, Ratzeburger Allee 160, 23538 Lubeck, Germany

Abstract: In the last 15 years several machine learning approaches have been developed for classification and regression. In an intuitive manner we introduce the main ideas of classification and regression trees, support vector machines, bagging, boosting and random forests. We discuss differences in the use of machine learning in the biomedical community and the computer sciences. We propose methods for comparing machines on a sound statistical basis. Data from the German Stroke Study Collaboration is used for illustration. We compare the results from learning machines to those obtained by a published logistic regression and discuss similarities and differences.

Keywords: bagging; boosting; random forests; acute ischemic strokes; support vector machines; SVM; machine learning; data mining; bioinformatics; classification; regression trees; patient-centred prognosis; prognostic studies; biomedical prognosis; clinical epidemiology; tutorial; medical prognosis.

DOI: 10.1504/IJDMB.2008.022149

International Journal of Data Mining and Bioinformatics, 2008 Vol. 2 No. 4, pp. 289-341

Published online: 21 Dec 2008
