Unstructured data mining: use case for CouchDB
by Richard K. Lomotey; Ralph Deters
International Journal of Big Data Intelligence (IJBDI), Vol. 2, No. 3, 2015

Abstract: 'Big data' has changed the status quo of digital content creation, storage and management. While data has long been stored using structured approaches, today's digital content is largely unstructured, which calls for different storage techniques. NoSQL database systems have therefore been proposed to accommodate much of the content generated today. One NoSQL storage style that has seen significant enterprise adoption is document-append storage. The problem, however, is that research and tools supporting data mining on such NoSQL databases are generally lacking. Although document-append style stores expose data as web services over URLs/URIs, a data mining tool for them cannot simply reuse the techniques that underlie web crawlers. Moreover, existing data mining tools designed for schema-based storage (e.g., RDBMS) are a poor fit. Hence, our goal in this work is to design a data analytics tool that enables knowledge discovery through information retrieval (i.e., terms) from document-append style storage. Three term-extraction algorithms are tested: inference-based apriori with a Bayesian component, the hidden Markov model, and the Bernoulli process. The paper evaluates the accuracy and speed of each algorithm.
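The title identifies CouchDB as the document-append store under study. CouchDB exposes each database over HTTP. For example, `GET /{db}/_all_docs?include_docs=true` returns every document embedded in a JSON `rows` array. The sketch below is a minimal illustration of that access path plus a simple term model. It is not the authors' tool, and the helper names (`all_docs_url`, `fetch_all_docs`, `document_frequencies`, `bernoulli_presence`) are hypothetical. It assumes the text of each document lives in top-level string fields. The abstract does not describe the paper's Bernoulli-process method, so as a stand-in the sketch treats "term appears in document" as a Bernoulli trial and estimates its parameter as the fraction of documents containing the term.

```python
import json
import re
import urllib.request
from collections import Counter
from urllib.parse import urlencode


def all_docs_url(base_url: str, db: str) -> str:
    """Build the CouchDB _all_docs URL that embeds full document bodies."""
    # include_docs=true asks CouchDB to include each document in its row
    return f"{base_url.rstrip('/')}/{db}/_all_docs?" + urlencode({"include_docs": "true"})


def fetch_all_docs(base_url: str, db: str) -> dict:
    """Fetch and decode the _all_docs response (requires a running CouchDB)."""
    with urllib.request.urlopen(all_docs_url(base_url, db)) as resp:
        return json.load(resp)


def document_frequencies(response: dict) -> tuple[Counter, int]:
    """Count, per term, how many documents contain it at least once."""
    freq: Counter = Counter()
    n_docs = 0
    for row in response.get("rows", []):
        # Skip design documents, which hold views rather than content
        if row.get("id", "").startswith("_design/"):
            continue
        doc = row.get("doc") or {}
        # Assumption: content is in top-level string fields; skip _id, _rev, etc.
        text = " ".join(
            v for k, v in doc.items() if isinstance(v, str) and not k.startswith("_")
        )
        freq.update(set(re.findall(r"[a-z]+", text.lower())))
        n_docs += 1
    return freq, n_docs


def bernoulli_presence(response: dict) -> dict[str, float]:
    """Maximum-likelihood Bernoulli parameter p(term present in a document)."""
    freq, n_docs = document_frequencies(response)
    if n_docs == 0:
        return {}
    return {term: count / n_docs for term, count in freq.items()}
```

Against a live server, `bernoulli_presence(fetch_all_docs("http://localhost:5984", "articles"))` would score every term in a hypothetical `articles` database. Terms with a high presence probability are candidates for further mining.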

Online publication date: Mon, 13-Jul-2015
