Title: Extracting Protein-Protein Interactions from MEDLINE using the Hidden Vector State model

Authors: Deyu Zhou, Yulan He, Chee Keong Kwoh

Addresses: Informatics Research Centre, University of Reading, 3rd Floor, Philip Lyle Building, Whiteknights, Reading RG6 6BX, UK; Informatics Research Centre, University of Reading, 3rd Floor, Philip Lyle Building, Whiteknights, Reading RG6 6BX, UK; School of Computer Engineering, Nanyang Technological University, Nanyang Avenue, 639798 Singapore

Abstract: A major challenge in biomedical text mining is the automatic extraction of protein-protein interactions from the vast body of biomedical literature. We have constructed an information extraction system for protein-protein interactions based on the Hidden Vector State (HVS) model. The HVS model can be trained on only lightly annotated data whilst still retaining sufficient ability to capture hierarchical structure. When applied to extracting protein-protein interactions, it outperformed other established statistical methods, achieving an F-score of 61.5% with balanced recall and precision. Moreover, because the HVS model is purely data-driven and statistical, it is intrinsically robust and can easily be adapted to other domains.
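The F-score quoted above is, under the usual convention, the harmonic mean of precision P and recall R, F = 2PR/(P + R); the abstract does not spell out the definition, so the balanced F1 form is an assumption here. The minimal sketch below computes all three measures from counts of true positives, false positives and false negatives. The function name `precision_recall_f` and all counts are hypothetical and illustrative only; they are not the paper's evaluation code or data.

```python
def precision_recall_f(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts of extracted interactions.

    tp: extracted interactions that appear in the gold standard
    fp: extracted interactions that do not appear in the gold standard
    fn: gold-standard interactions the system failed to extract
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-score (F1): harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts chosen so that precision equals recall.
p, r, f = precision_recall_f(tp=615, fp=385, fn=385)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.615 R=0.615 F=0.615

# Hypothetical imbalanced system: high precision, low recall.
p, r, f = precision_recall_f(tp=90, fp=10, fn=135)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.900 R=0.400 F=0.554
```

When precision and recall are equal, the F-score equals both, so a balanced 61.5% F-score implies precision and recall each close to 61.5%. The harmonic mean is pulled towards the smaller value, which is why the imbalanced example scores only about 0.55 despite its 0.9 precision.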

Keywords: information extraction; hidden vector state model; protein-protein interactions; PPIs; bioinformatics; MEDLINE; text mining; biomedical literature.

DOI: 10.1504/IJBRA.2008.017164

International Journal of Bioinformatics Research and Applications, 2008 Vol. 4 No. 1, pp. 64-80

Published online: 17 Feb 2008
