Title: RSenter: terms mining tool from unstructured data sources

Authors: Richard K. Lomotey; Ralph Deters

Addresses: Department of Computer Science, University of Saskatchewan, Saskatoon, S7N 5C9, Canada; Department of Computer Science, University of Saskatchewan, Saskatoon, S7N 5C9, Canada

Abstract: The emergence of 'Big Data' is changing the data storage status quo at the business and corporate level. Previously, relational databases were employed to accommodate business-related digital records, but in today's data economy the data is unstructured, which puts limitations on relational databases. Thus, NoSQL databases have been proposed to contain the unstructured data, which is chiefly schema-less, textual, file-based, and so on. However, the rise of unstructured data and the adoption of NoSQL storages lead to emerging challenges that call for active research. Firstly, existing data mining techniques are designed for schema-based data storages and are inapplicable to NoSQL storages. Secondly, NoSQL storages come from different vendors (or providers), so generating queries requires an understanding of multiple APIs. These two challenges hinder data extraction for most businesses, since stored information can be lost due to inaccessibility. Our ongoing research has therefore proposed a tool called RSenter that aids terms mining from unstructured data storages. Specific to NoSQL storages that are document-oriented, we detail the architectural design, the algorithms, and the benefits that distinguish the tool from other existing frameworks. Significantly, RSenter performs the required mining tasks in real-time, which is crucial for business continuity.

Keywords: data mining; unstructured data; information extraction; terms mining; Big Data; NoSQL; text analytics; lemmatisation; data storage; business continuity.

DOI: 10.1504/IJBPIM.2013.059136

International Journal of Business Process Integration and Management, 2013 Vol.6 No.4, pp.298 - 311

Published online: 31 Jul 2014
