Title: Estimating data stream tendencies to adapt clustering parameters

Authors: Marcelo Keese Albertini; Rodrigo Fernandes De Mello

Addresses: School of Computer Science, Federal University of Uberlandia, Uberlandia, MG, Brazil; Institute of Mathematics and Computer Science, University of Sao Paulo, Sao Carlos, SP, Brazil

Abstract: A wide range of applications based on the processing of data streams have emerged in the last decade. They require specialised techniques to obtain representative models and extract information. Traditional data clustering algorithms have been adapted to handle continuously arriving data by updating the current model. Most data stream clustering algorithms aggregate new data into models according to parameters usually set by users. Problems arise when choosing the values of these parameters. When the phenomenon under study is stable, an analysis of a sample of the data stream or a priori knowledge can be used. However, when the behaviour changes during collection, the parameters become obsolete and, consequently, performance degrades. In this paper, we study the problem of how to automatically adapt the control parameters of data stream clustering algorithms. To this end, we introduce a novel approach that estimates and uses data tendencies to automatically modify control parameters. We present a proof of the convergence of our approach towards an ideal and unknown value of the control parameter. Experimental results confirm that the estimation of data tendency improves learning control parameterisation.

Keywords: big data; data clustering; data stream; data sequence; adaptive clustering; data analysis.

DOI: 10.1504/IJHPCN.2018.088877

International Journal of High Performance Computing and Networking, 2018 Vol.11 No.1, pp.34 - 44

Received: 12 Jan 2016
Accepted: 16 Jan 2016

Published online: 11 Dec 2017
