
 

 

Title: BioTopic: a topic-driven biological literature mining system

 

Authors: Xi Wang; Peiyan Zhu; Tao Liu; Ke Xu

 

Addresses: State Key Lab of Software Development Environment, Beihang University, Beijing 100191, China (all four authors)

 

Abstract: Biology and biomedicine are flourishing disciplines, with massive amounts of biological data produced in experiments and a huge number of research papers published in journals. In such a big data context, unsupervised data mining methods such as topic models are used to extract topics from large-scale document collections. In this paper, we present BioTopic, a biological literature mining system based on topic modelling. Our method employs linguistic information from shallow parsing to better pre-process biological literature for topic models. Experiments show that the perplexity reduction percentage of our pre-processing method is 5% larger than that of a traditional pre-processing method. The precision of our search reaches 86%, which is better than that of a unigram language model. With fine-grained pre-processing and topic modelling, BioTopic works better than traditional literature mining systems.

 

Keywords: biological literature; biological topics; topic modelling; topic mining; big data; data mining; shallow parsing; fine-grained pre-processing; bioinformatics.

 

DOI: 10.1504/IJDMB.2016.075822

 

Int. J. of Data Mining and Bioinformatics, 2016 Vol.14, No.4, pp.373 - 386

 

Submission date: 06 Sep 2015
Date of acceptance: 18 Nov 2015
Available online: 06 Apr 2016

 

 
