Title: Automated information retrieval for quantitative risk assessment data

Authors: Jerry R. McGovern, Kathryn C. Dowling

Addresses: San Francisco Municipal Transportation Agency, 1 South Van Ness Avenue, San Francisco, CA 94103, USA; EQUIPS Initiative, Apartado Postal 18212, 28080 Madrid, Spain

Abstract: The amount of toxicologic literature available can be so copious as to present significant challenges to risk assessors tasked with identifying key studies. As a new approach to managing such information, an information specialist and a toxicologist developed an open source text mining computer program consisting of knowledge bases and search algorithms. Quantitative toxicologic data, such as dose levels or risk numbers, are often presented in the abstracts of scientific literature records, which, in turn, include full or partial abstracts. We chose to examine records containing human blood lead concentration (HBLC) data. The resulting program (HBLCFinder) searches for lead concentration data in a record's abstract, then determines the record's relevancy to human blood. After several iterative modifications, we achieved recall (sensitivity), specificity and precision of 86%, 99% and 96%, respectively. The approach may be of use to risk assessors needing to identify quantitative data in online database records.
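
Illustrative sketch (not the authors' HBLCFinder code): the two-step search described in the abstract, followed by the three reported evaluation measures, might look roughly like the Python below. The regular expression, the term lists, and the function names `is_relevant` and `evaluate` are all assumptions made for illustration; the actual knowledge bases and search algorithms are described only in the full article.

```python
import re

# Assumed pattern: a number followed by a mass-per-volume unit, e.g. "12.5 ug/dL".
# \u00b5 (micro sign) and \u03bc (Greek mu) are both accepted.
CONCENTRATION = re.compile(
    r"\b\d+(?:\.\d+)?\s*(?:\u00b5g|\u03bcg|ug|mcg)\s*/\s*(?:dl|l|100\s*ml)",
    re.IGNORECASE,
)

# Hypothetical stand-ins for the program's knowledge bases.
LEAD_TERMS = ("lead", "pb")
BLOOD_TERMS = ("blood",)
HUMAN_TERMS = ("human", "humans", "children", "adults", "workers", "patients")


def has_term(text, terms):
    """Return True if any term occurs in text as a whole word (case-insensitive)."""
    return any(
        re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) for term in terms
    )


def is_relevant(abstract):
    """Step 1: find lead concentration data. Step 2: check relevancy to human blood."""
    if not (CONCENTRATION.search(abstract) and has_term(abstract, LEAD_TERMS)):
        return False
    return has_term(abstract, BLOOD_TERMS) and has_term(abstract, HUMAN_TERMS)


def evaluate(predictions, labels):
    """Return (recall, specificity, precision) from parallel lists of booleans."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, l in pairs if p and l)
    tn = sum(1 for p, l in pairs if not p and not l)
    fp = sum(1 for p, l in pairs if p and not l)
    fn = sum(1 for p, l in pairs if not p and l)

    def ratio(numerator, denominator):
        # Avoid division by zero when a class is absent from the sample.
        return numerator / denominator if denominator else 0.0

    return ratio(tp, tp + fn), ratio(tn, tn + fp), ratio(tp, tp + fp)
```

In this sketch, recall is TP/(TP+FN), specificity is TN/(TN+FP), and precision is TP/(TP+FP), computed by comparing the program's decisions against manually assigned relevance labels for the same records.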

Keywords: human blood; lead concentration; quantitative concentration data; literature review; PubMed; risk assessment; toxicoinformatics; automated information retrieval; text mining; search engines; answer set; knowledge base; toxicology literature; open source; online databases.

DOI: 10.1504/IJRAM.2009.030703

International Journal of Risk Assessment and Management, 2009 Vol.13 No.3/4, pp.328-344

Accepted: 31 Mar 2009
Published online: 30 Dec 2009
