Title: Improved biomedical document retrieval system with PubMed term statistics and expansions

Authors: Huian Li, Jake Yue Chen

Addresses: University Information Technology Services, Indiana University, Indianapolis, IN 46202, USA; Department of Computer and Information Science, Purdue University School of Science, Indiana University School of Informatics, Indianapolis, IN 46202, USA

Abstract: Large biomedical abstract databases such as MEDLINE enable users to search large bodies of biomedical knowledge quickly. In this study, we describe a new framework to improve the performance of MEDLINE document retrieval. We first analysed and built normalized term frequency distributions for 1.8 million terms by sampling 1,500,000 MEDLINE abstracts. Then, we developed a statistical model to identify significantly observed terms ('gists') in a document and used them as additional document keywords to improve retrieval precision. To improve recall, we integrated several biological ontologies that expand user queries with semantically compatible terms. The framework was implemented in Oracle 10g.
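
The two ideas in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the statistical model, so the one-sided z-score under a binomial assumption used here is a stand-in, not the authors' method. The function names `find_gists` and `expand_query`, the thresholds, and the toy ontology mapping are all hypothetical.

```python
import math
from collections import Counter


def find_gists(doc_tokens, background_freq, z_threshold=3.0, min_count=2):
    """Return terms that occur far more often in a document than expected.

    background_freq maps term -> relative frequency per token in a sampled
    corpus. The z-score under a binomial model is an assumed stand-in for
    the statistical model mentioned in the abstract.
    """
    n = len(doc_tokens)
    counts = Counter(doc_tokens)
    gists = {}
    for term, k in counts.items():
        p = background_freq.get(term)
        if p is None or k < min_count:
            continue  # no background estimate, or too few occurrences
        expected = n * p
        std = math.sqrt(n * p * (1.0 - p))
        if std == 0:
            continue  # degenerate background frequency (p is 0 or 1)
        z = (k - expected) / std
        if z >= z_threshold:
            gists[term] = z
    return gists


def expand_query(query_terms, ontology):
    """Add related terms from a (hypothetical) ontology mapping to a query."""
    expanded = []
    for term in query_terms:
        for candidate in [term] + ontology.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Toy data, invented for illustration only.
doc = ["kinase"] * 5 + ["the"] * 10 + ["cell"] * 5
background = {"kinase": 0.001, "the": 0.5, "cell": 0.2}
gists = find_gists(doc, background)
expanded = expand_query(["tumor"], {"tumor": ["neoplasm", "tumour"]})
```

In this toy case only "kinase" is flagged, because its in-document rate is far above its background rate, while "the" and "cell" occur about as often as the background predicts. Flagged terms would serve as extra keywords for precision, and the expanded query would serve recall.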

Keywords: term statistics; ontology expansion; information retrieval; gists; biomedical documents; MEDLINE document retrieval; PubMed; retrieval precision; biological ontologies; significant terms; biomedical literature; biomedical research papers.

DOI: 10.1504/IJCIBSB.2009.024052

International Journal of Computational Intelligence in Bioinformatics and Systems Biology, 2009 Vol.1 No.1, pp.74-85

Available online: 24 Mar 2009
