Authors: Rong Xu; QuanQiu Wang
Addresses: Medical Informatics Division, Case Western Reserve University, Cleveland, OH 44106, USA; ThinTek LLC, Palo Alto, CA 94306, USA
Abstract: Pharmacogenomics (PGx) studies aim to identify genetic variants that may affect drug efficacy and toxicity. Machine-understandable knowledge of drug-gene relationships is important for many computational PGx studies and for personalised medicine. A comprehensive and accurate PGx-specific gene lexicon is important for automatically extracting drug-gene relationships from the scientific literature, a rich knowledge source for PGx studies. In this study, we present a bootstrapping learning technique to rank 33,310 human genes by their relevance to drug response. Starting from a single seed PGx gene, the algorithm iteratively extracts and ranks co-occurring genes from 20 million MEDLINE abstracts. Our method accurately ranks PGx-specific genes near the top among all human genes. Compared to randomly ranked genes (precision: 0.032, recall: 0.013, F1: 0.018), the algorithm achieved significantly better performance (precision: 0.861, recall: 0.548, F1: 0.662) in ranking the top 2.5% of genes.
Keywords: pharmacogenomics; text mining; NLP; natural language processing; personalised medicine; iterative searching; ranking; pharmacogenomic genes; genetic variants; drug efficacy; drug toxicity; drug-gene relationship extraction; bootstrapping learning; biomedical literature; information retrieval.
International Journal of Computational Biology and Drug Design, 2013 Vol.6 No.1/2, pp.18 - 31
Published online: 20 Feb 2013
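
Illustrative sketch: the abstract describes starting from one seed gene, iteratively collecting genes that co-occur with it in abstracts, and evaluating the top of the ranking with precision, recall and F1. The Python below is a minimal toy version of that idea, not the authors' implementation. The scoring rule (count of abstracts shared with any already-ranked gene), the `rounds` and `top_k` parameters, the function names, and the tiny data set are all assumptions made for illustration; the abstract does not specify them, and the toy output is not a result from the paper.

```python
from collections import Counter


def bootstrap_rank(abstracts, seed, rounds=3, top_k=1):
    """Rank genes by co-occurrence with an iteratively growing seed set.

    abstracts: list of sets, each holding the gene names found in one abstract.
    The scoring rule here is an assumption, not the paper's method.
    """
    ranked = [seed]  # the seed gene is ranked first
    for _ in range(rounds):
        known = set(ranked)
        counts = Counter()
        for genes in abstracts:
            if genes & known:              # abstract mentions a known gene
                for gene in genes - known:  # credit the not-yet-ranked genes
                    counts[gene] += 1
        if not counts:
            break  # nothing new co-occurs with the known genes
        # Highest count first; ties broken alphabetically for determinism.
        best = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        ranked.extend(gene for gene, _ in best)
    return ranked


def precision_recall_f1(ranked, gold, cutoff):
    """Score the top `cutoff` ranked genes against a gold-standard set."""
    top = set(ranked[:cutoff])
    true_positives = len(top & gold)
    precision = true_positives / len(top) if top else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (hypothetical): each set stands for the genes tagged in one abstract.
toy_abstracts = [
    {"CYP2D6", "CYP2C19"},
    {"CYP2D6", "CYP2C19"},
    {"CYP2D6", "ABCB1"},
    {"CYP2C19", "VKORC1"},
    {"ACTB", "GAPDH"},  # never linked to the seed, so never ranked
]
toy_gold = {"CYP2D6", "CYP2C19", "VKORC1", "TPMT"}  # hypothetical gold set

ranking = bootstrap_rank(toy_abstracts, seed="CYP2D6")
print(ranking)
print(precision_recall_f1(ranking, toy_gold, cutoff=2))
```

Genes that never appear alongside a ranked gene stay unranked in this sketch, which is one simple way a co-occurrence approach can separate topic-specific genes from the rest; a production system would also need gene-name normalisation and a more careful scoring function.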