Title: A text-mining technique for extracting gene-disease associations from the biomedical literature

Authors: Hisham Al-Mubaid, Rajit K. Singh

Addresses: School of Science and Computer Engineering, University of Houston-Clear Lake, 2700 Bay Area Blvd, Box 40, Houston, Texas 77058, USA (both authors)

Abstract: We propose a new text-mining technique to identify associations between biological entities, specifically gene-disease associations, from the biomedical literature. The proposed method is simple: it uses two sets of documents (a positive set and a negative set) and the concepts of expectation (ex), evidence (ev), and Z-scores to combine positive and negative evidence when determining the significant gene-disease associations in Medline documents. The method also offers an efficient way to handle gene names, aliases, symbols, and abbreviations. We evaluated the method on discovering gene-to-disease associations from the literature, and the experimental results are impressive. We verified the results and confirmed the effectiveness of the technique in several ways; for example, we ran it on a set of known, previously discovered gene-disease relationships. The method was able to discover associations between genes and various diseases, including amyotrophic lateral sclerosis, tuberous sclerosis, autism, homocystinuria, bipolar disorder, and atherosclerosis, among others.
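
Illustration (not from the paper): the abstract names the ingredients of the method (a positive and a negative document set, evidence, expectation, Z-scores, alias handling) but does not give its formulas. The sketch below shows one plausible way such pieces could fit together. It assumes that evidence (ev) is the number of positive-set documents mentioning a gene, that expectation (ex) is the count predicted from the gene's smoothed mention rate in the negative set, and that the Z-score is a binomial-style standardised difference. The alias table, the function names `genes_in` and `z_scores`, and the toy documents are all hypothetical.

```python
import math
import re

# Hypothetical alias table: each alias maps to one canonical gene symbol.
ALIASES = {
    "sod1": "SOD1",
    "superoxide dismutase 1": "SOD1",
    "tsc1": "TSC1",
    "hamartin": "TSC1",
}


def genes_in(doc):
    """Return the set of canonical gene symbols mentioned in one document."""
    text = doc.lower()
    found = set()
    for alias, symbol in ALIASES.items():
        # Whole-word match so a short symbol does not fire inside a longer token.
        if re.search(r"\b" + re.escape(alias) + r"\b", text):
            found.add(symbol)
    return found


def z_scores(positive_docs, negative_docs):
    """Score each gene seen in the positive set against the negative-set rate.

    Assumed definitions (not taken from the paper):
      ev = number of positive documents mentioning the gene
      ex = n_pos * p, where p is the add-one smoothed negative-set rate
      z  = (ev - ex) / sqrt(n_pos * p * (1 - p))
    """
    n_pos, n_neg = len(positive_docs), len(negative_docs)
    pos_counts, neg_counts = {}, {}
    for docs, counts in ((positive_docs, pos_counts), (negative_docs, neg_counts)):
        for doc in docs:
            for gene in genes_in(doc):
                counts[gene] = counts.get(gene, 0) + 1

    scores = {}
    for gene, ev in pos_counts.items():
        # Smoothing keeps p strictly between 0 and 1, so the variance is never zero.
        p = (neg_counts.get(gene, 0) + 1) / (n_neg + 2)
        ex = n_pos * p
        scores[gene] = (ev - ex) / math.sqrt(n_pos * p * (1 - p))
    return scores


# Toy documents, invented for illustration only.
positive = [
    "SOD1 mutations cause familial ALS.",
    "Superoxide dismutase 1 aggregates in ALS motor neurons.",
    "Hamartin loss was also noted.",
]
negative = [
    "TSC1 encodes hamartin.",
    "Hamartin regulates mTOR.",
    "Unrelated text about the weather.",
]
scores = z_scores(positive, negative)
for gene, z in sorted(scores.items(), key=lambda item: -item[1]):
    print(gene, round(z, 2))
```

In this toy run the two SOD1 aliases collapse to a single symbol, so SOD1 is counted in two of the three positive documents and in none of the negative ones and receives a positive score, while TSC1, which is more frequent in the negative set, scores below zero. A real system would need a curated alias dictionary and a principled significance threshold, neither of which is specified in the abstract.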

Keywords: gene-disease associations; biomedical knowledge discovery; biomedical information extraction; text mining; mining biomedical literature; bioinformatics; genes; diseases.

DOI: 10.1504/IJBRA.2010.034075

International Journal of Bioinformatics Research and Applications, 2010 Vol.6 No.3, pp.270 - 286

Received: 31 Dec 2008
Accepted: 28 Sep 2009

Published online: 07 Jul 2010
