Title: Estimating data stream tendencies to adapt clustering parameters
Authors: Marcelo Keese Albertini; Rodrigo Fernandes De Mello
School of Computer Science, Federal University of Uberlandia, Uberlandia, MG, Brazil
Institute of Mathematics and Computer Science, University of Sao Paulo, Sao Carlos, SP, Brazil
Abstract: A wide range of applications based on processing data streams has emerged in the last decade. They require specialised techniques to obtain representative models and extract information. Traditional data clustering algorithms have been adapted to handle continuously arriving data by updating the current model. Most data stream clustering algorithms aggregate new data into models according to parameters that are usually set by users. Problems arise when choosing values for these parameters. When the phenomenon under study is stable, an analysis of a sample of the data stream or a priori knowledge can be used. However, when the behaviour changes during collection, the parameters become obsolete and, consequently, performance degrades. In this paper, we study the problem of how to automatically adapt the control parameters of data stream clustering algorithms. To this end, we introduce a novel approach that estimates data tendencies and uses them to automatically modify control parameters. We present a proof that our approach converges towards an ideal and unknown value of the control parameter. Experimental results confirm that estimating data tendencies improves the learning of control parameters.
Keywords: big data; data clustering; data stream; data sequence; adaptive clustering; data analysis.
Int. J. of High Performance Computing and Networking, 2018, Vol. 11, No. 1, pp. 34-44
Available online: 11 Dec 2017