Title: Terms analytics service for CouchDB: a document-based NoSQL

Authors: Richard K. Lomotey; Ralph Deters

Addresses: Department of Computer Science, University of Saskatchewan, Saskatoon, Canada; Department of Computer Science, University of Saskatchewan, Saskatoon, Canada

Abstract: The scientific, industrial and research communities now have to deal with the potential of 'Big Data'. The high-dimensional data (in digitised format) at our disposal can create opportunities such as the discovery of new knowledge, the creation of new online communities, and improvements in product and service delivery. The challenge, however, is that there are only a few research efforts, architectural designs and tools that can aid data mining from NoSQL databases. By focusing on terms and topic mining, this work proposes a data analytics framework that enables knowledge discovery through information retrieval and filtering from a document-based NoSQL database (specifically, CouchDB). The tool is algorithmically built and tested using two methodologies, namely the inference-based apriori and the Baum-Welch algorithm. Preliminary test results of the proposed tool are also discussed in terms of the accuracy of each algorithm; the inference-based apriori model performs better.

Keywords: data mining; NoSQL databases; Bayesian rule; unstructured data; inference-based apriori; hidden Markov model; HMM; Baum-Welch algorithm; analytics-as-a-service; AaaS; big data; data analytics; knowledge discovery; information retrieval; filtering.

DOI: 10.1504/IJBDI.2015.067567

International Journal of Big Data Intelligence, 2015 Vol.2 No.1, pp.23 - 36

Received: 20 May 2014
Accepted: 22 Aug 2014

Published online: 21 Mar 2015
