Int. J. of Data Mining and Bioinformatics   »   2012 Vol.6, No.4

 

 

Title: Extracting epidemiologic exposure and outcome terms from literature using machine learning approaches

 

Authors: Yanxin Lu; Hua Xu; Neeraja B. Peterson; Qi Dai; Min Jiang; Joshua C. Denny; Mei Liu

 

Addresses:
Department of Human Anatomy, Histology and Embryology, Fudan University, 138 Yi Xue Yuan Road, Shanghai 200032, China; Shanghai Key Laboratory of Medical Imaging Computing and Computer Assisted Intervention (MICCAI), Fudan University, 138 Yi Xue Yuan Road, Shanghai 200032, China
Department of Biomedical Informatics, Vanderbilt University, 2209 Garland Avenue, Nashville, TN 37232, USA
Division of General Internal Medicine and Public Health, Department of Medicine, Vanderbilt University, Suite 6000 Medical Centre East, North Tower, Nashville, TN 37232, USA
Division of Epidemiology, Department of Medicine, Vanderbilt University, 2525 West End Avenue, Nashville, TN 37203-1738, USA
Department of Biomedical Informatics, Vanderbilt University, 2209 Garland Avenue, Nashville, TN 37232, USA
Department of Biomedical Informatics, Vanderbilt University, 2209 Garland Avenue, Nashville, TN 37232, USA
Department of Biomedical Informatics, Vanderbilt University, 2209 Garland Avenue, Nashville, TN 37232, USA

 

Abstract: Much epidemiologic information resides in the literature, which is not in a computable format. To extract this information and build knowledge bases of epidemiologic studies, we developed a system that extracts noun phrases describing epidemiologic exposures and outcomes. The system consists of two components: a natural language processing (NLP) engine and a machine learning (ML) based classifier. Four ML algorithms were applied and compared over different feature sets. To evaluate the performance of the system, we manually constructed an annotated dataset. The system achieved its highest F-measure of 82.0% for extracting exposure terms and 70% for extracting outcome terms.
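
The abstract reports results as F-measures, so a brief illustration of how such a score is typically computed may help. The sketch below assumes the balanced F-measure (the harmonic mean of precision and recall, beta = 1); the abstract does not state which variant was used. The function name `f_measure` and the counts are hypothetical and are not taken from the paper.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """Return (precision, recall, F-beta) from term-level counts.

    Assumption: beta=1.0 gives the balanced F-measure; the abstract
    does not say which variant the authors reported.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Guard against division by zero when nothing was predicted or annotated.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score


# Hypothetical counts for illustration only (not results from the paper).
precision, recall, f1 = f_measure(true_positives=80,
                                  false_positives=20,
                                  false_negatives=20)
print(f"precision={precision:.3f} recall={recall:.3f} F={f1:.3f}")
```

In this setting a true positive would be an extracted noun phrase that matches an annotated exposure or outcome term; how partial matches are scored is an evaluation choice the abstract does not specify.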

 

Keywords: biomedical literature mining; machine learning; NLP; natural language processing; epidemiology; term extraction; evidence-based medicine; epidemiologic exposure; epidemiologic outcomes; noun phrases; bioinformatics; data mining.

 

DOI: 10.1504/IJDMB.2012.049284

 

Int. J. of Data Mining and Bioinformatics, 2012 Vol.6, No.4, pp.447 - 459

 

Submission date: 08 Mar 2011
Date of acceptance: 30 Oct 2011
Available online: 28 Sep 2012

 

 
