Title: Mining gene-centric relationships from literature: the roles of gene mutation and gene expression in supporting drug discovery

Authors: Luis Tari; Jagruti Patel; Jan Küntzer; Ying Li; Zhengwei Peng; Yuan Wang; Laura Aguiar; James Cai

Addresses: Pharmaceutical Research and Early Development (pRED) Informatics, Hoffmann-La Roche Inc., Nutley, NJ 07110, USA; Pharmaceutical Research and Early Development (pRED) Informatics, Hoffmann-La Roche Inc., Nutley, NJ 07110, USA; Pharmaceutical Research and Early Development (pRED) Informatics, Roche Diagnostics GmbH, 82377 Penzberg, Bavaria, Germany; Pharmaceutical Research and Early Development (pRED) Informatics, Hoffmann-La Roche Inc., Nutley, NJ 07110, USA; Pharmaceutical Research and Early Development (pRED) Informatics, Hoffmann-La Roche Inc., Nutley, NJ 07110, USA; Pharmaceutical Research and Early Development (pRED) Informatics, F. Hoffmann-La Roche AG, 4070 Basel, Switzerland; Pharmaceutical Research and Early Development (pRED) Informatics, Hoffmann-La Roche Inc., Nutley, NJ 07110, USA; Pharmaceutical Research and Early Development (pRED) Informatics, Hoffmann-La Roche Inc., Nutley, NJ 07110, USA

Abstract: Identifying drug target candidates is an important task in the early stages of the drug discovery process. This task is supported by new high-throughput technologies that enable a better understanding of disease mechanisms. As a result, effective analysis of the large amount of biological data becomes critical. However, because much biological knowledge is recorded in the literature as natural language text, the analysis and interpretation of high-throughput data have not reached their full potential. In this paper, we describe our approach to using text mining to find scientific information for target and biomarker discovery in the biomedical literature. Our approach utilises natural language processing techniques to capture linguistic patterns for the extraction of biological knowledge from text. Additionally, we discuss how the extracted knowledge is used for the analysis of biological data such as next-generation sequencing and gene expression data.

Keywords: literature text mining; drug discovery; gene mutations; phenotypes; gene expression; drug targets; biomarkers; information extraction; natural language processing; NLP; knowledge extraction; biological knowledge; bioinformatics.

DOI: 10.1504/IJDMB.2014.064888

International Journal of Data Mining and Bioinformatics, 2014 Vol.10 No.4, pp.357 - 373

Received: 03 May 2012
Accepted: 04 May 2012

Published online: 21 Oct 2014
