Authors: Richard K. Lomotey; Ralph Deters
Addresses: Department of Computer Science, University of Saskatchewan, Saskatoon, S7N 5C9, Canada; Department of Computer Science, University of Saskatchewan, Saskatoon, S7N 5C9, Canada
Abstract: Today's high-dimensional data, which is mostly unstructured, makes data pattern discovery (a.k.a. data mining) challenging for services engineers. Unstructured data mining deviates from previously proposed information extraction methodologies because recently formed and stored data has no standard schema and is heterogeneous. While the topic has recently received significant attention from both industry and academia, this work aims at performing term association mining from distributed unstructured data storages. To achieve this goal, an analytics-as-a-service (AaaS) framework is proposed that theoretically relies on the Bernoulli algorithm to ensure the accurate determination of associations between terms. Specifically, the tool is applied to document-oriented data storages, with CouchDB employed for testing. The pilot evaluation of the proposed AaaS framework for mining medical terms shows high accuracy and reliability regarding association maps.
Keywords: Bernoulli algorithm; association rules; Big Data; analytics-as-a-service; AaaS; unstructured data; data mining; terms association mining; medical terms.
International Journal of Business Process Integration and Management, 2014, Vol. 7, No. 1, pp. 49-61
Published online: 31 Jul 2014