Title: An iterative searching and ranking algorithm for prioritising pharmacogenomics genes

Authors: Rong Xu; QuanQiu Wang

Addresses: Medical Informatics Division, Case Western Reserve University, Cleveland, OH 44106, USA; ThinTek LLC, Palo Alto, CA 94306, USA

Abstract: Pharmacogenomics (PGx) studies aim to identify genetic variants that may affect drug efficacy and toxicity. Machine-understandable knowledge of drug-gene relationships is important for many computational PGx studies and for personalised medicine. A comprehensive and accurate PGx-specific gene lexicon is important for automatically extracting drug-gene relationships from the scientific literature, a rich knowledge source for PGx studies. In this study, we present a bootstrapping learning technique to rank 33,310 human genes by their relevance to drug response. The algorithm starts from a single seed PGx gene and iteratively extracts and ranks co-occurring genes from 20 million MEDLINE abstracts. Our ranking method accurately ranks PGx-specific genes highly among all human genes. Compared to randomly ranked genes (precision: 0.032, recall: 0.013, F1: 0.018), the algorithm achieves significantly better performance (precision: 0.861, recall: 0.548, F1: 0.662) in ranking the top 2.5% of genes.
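
The abstract does not spell out the scoring function, so the following is only a minimal sketch of the general idea of seed-based iterative co-occurrence ranking, not the authors' implementation. It assumes each abstract has already been reduced to the set of gene symbols it mentions; the function name `bootstrap_rank`, the parameters `top_k` and `max_iterations`, the scoring rule (one point per abstract shared with an already-ranked gene), and the five toy abstracts are all hypothetical and chosen purely for illustration.

```python
from collections import Counter


def bootstrap_rank(abstracts, seed, top_k=1, max_iterations=10):
    """Rank genes by iterative co-occurrence, starting from one seed gene.

    abstracts: list of sets, each holding the gene symbols found in one abstract.
    Returns genes in the order they were reached; unreachable genes are omitted.
    """
    ranked = [seed]
    for _ in range(max_iterations):
        known = set(ranked)
        scores = Counter()
        for genes in abstracts:
            # Only abstracts mentioning an already-ranked gene contribute.
            if known & genes:
                for candidate in genes - known:
                    scores[candidate] += 1
        if not scores:
            break  # no new co-occurring genes can be reached
        # Highest score first; ties broken alphabetically for determinism.
        best = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        ranked.extend(gene for gene, _ in best)
    return ranked


# Toy data (made up for illustration, not taken from the paper).
toy_abstracts = [
    {"CYP2D6", "CYP2C19"},
    {"CYP2D6", "CYP2C19", "ABCB1"},
    {"CYP2C19", "VKORC1"},
    {"ABCB1", "SLCO1B1"},
    {"TP53", "BRCA1"},
]

ranked = bootstrap_rank(toy_abstracts, seed="CYP2D6")
print(ranked)
```

In this toy run, genes that never co-occur with the seed or with anything reached from it (here `TP53` and `BRCA1`) are never ranked, which mirrors the intuition that genes unrelated to the seed's literature neighbourhood should fall to the bottom of the list.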

Keywords: pharmacogenomics; text mining; NLP; natural language processing; personalised medicine; iterative searching; ranking; pharmacogenomic genes; genetic variants; drug efficacy; drug toxicity; drug-gene relationship extraction; bootstrapping learning; biomedical literature; information retrieval.

DOI: 10.1504/IJCBDD.2013.052199

International Journal of Computational Biology and Drug Design, 2013 Vol.6 No.1/2, pp.18 - 31

Published online: 18 Sep 2014
