Title: An efficient implementation of k-means clustering for time series data with DTW distance

Authors: Duong Tuan Anh; Le Huu Thanh

Addresses: Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, 268 Ly Thuong Kiet, Dist. 10, Ho Chi Minh City, Vietnam ' Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, 268 Ly Thuong Kiet, Dist. 10, Ho Chi Minh City, Vietnam

Abstract: Time series clustering is one of the crucial tasks in time series data mining. The most popular method in time series clustering is k-means algorithm due to its simplicity and flexibility. So far, k-means for time series clustering has been most used with Euclidean distance. Dynamic time warping (DTW) distance measure has increasingly been used as a similarity measurement for various data mining tasks in place of traditional Euclidean distance due to its superiority in sequence-alignment flexibility. However, there exist some difficulties in clustering with DTW distance, for example, the problem of shape averaging in DTW or the problem of speeding up DTW distance calculation. In this paper, we compare the performance of the three shape averaging methods in DTW: nonlinear alignment and averaging filter (NLAAF), prioritised shape averaging (PSA) and DTW barycenter averaging (DBA) and propose an efficient method to implement k-means clustering for time series data with DTW distance. In our method, we choose to use DBA method for shape-based time series averaging, apply early abandoning method for speeding up DTW distance calculation and median-based method for determining initial centroids for k-means clustering. The experimental results on benchmark datasets validate our proposed implementation method for time series k-means clustering with DTW.

Keywords: time series data mining; dynamic time warping; DTW; k-means clustering; time series shape averaging; early abandoning; centroid initialisation.

DOI: 10.1504/IJBIDM.2015.071311

International Journal of Business Intelligence and Data Mining, 2015 Vol.10 No.3, pp.213 - 232

Received: 27 Oct 2014
Accepted: 30 Nov 2014

Published online: 20 Aug 2015 *

Full-text access for editors Full-text access for subscribers Purchase this article Comment on this article