Authors: Kelly K. Yip; David A. Nembhard
Addresses: Department of Industrial and Manufacturing Engineering, Penn State University, University Park, PA 16802, USA; Department of Industrial and Manufacturing Engineering, Penn State University, University Park, PA 16802, USA
Abstract: Time series data are found in diverse fields, including science, business, medicine and engineering. In this paper, we consider sequential pattern mining for categorical time series data that contain multiple independent time series. Frequent patterns are considered important in a variety of applications. However, it is common for data to contain noise and/or for the source process to have considerable variability. Conventional sequential pattern mining methods that use exact matching address some, but not all, of these difficulties. Two general approaches used in previous studies to mine sequential patterns in noisy data are distance-based clustering and hidden Markov models. While these approaches are useful for mining frequent sequential patterns in noisy data, we further propose a framework (MWASP: multiple-width approximate sequential pattern mining) that uncovers frequent approximate sequential patterns of various widths. A mined pattern in this framework is representative of a group of sequences that follow the pattern's event flow order. This gives insight into the occurrence of the pattern both longitudinally and across the population. The pattern can be recognised as common across the multiple time series, across time, or both.
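To make the idea of sequences that "follow the pattern's event flow order" concrete, the following Python sketch counts the fraction of several categorical time series that contain a candidate pattern in order, with gaps allowed between events. This is not the MWASP algorithm. The approximate-match rule (a longest-common-subsequence threshold controlled by a hypothetical max_missing parameter) and the function names lcs_length and approx_support are illustrative assumptions only.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of a and b (classic dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1  # extend a common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def approx_support(pattern: Sequence[str],
                   sequences: List[Sequence[str]],
                   max_missing: int = 0) -> float:
    """Fraction of sequences that contain the pattern's events in order.

    Gaps (extra events) between pattern events are always allowed.
    Hypothetical approximation rule: up to max_missing pattern events may be
    absent, i.e. a sequence matches if LCS(pattern, seq) >= len(pattern) - max_missing.
    """
    if not sequences:
        return 0.0
    needed = len(pattern) - max_missing
    hits = sum(1 for seq in sequences if lcs_length(pattern, seq) >= needed)
    return hits / len(sequences)


if __name__ == "__main__":
    # Four made-up categorical time series (one per subject).
    series = [
        ["A", "B", "x", "C"],            # A, B, C in order with a gap
        ["A", "C"],                      # B is missing
        ["C", "B", "A"],                 # events in reverse order
        ["A", "y", "B", "z", "C", "D"],  # A, B, C in order with gaps
    ]
    print(approx_support(["A", "B", "C"], series))                 # exact order only
    print(approx_support(["A", "B", "C"], series, max_missing=1))  # tolerate one missing event
```

With exact ordering the pattern A, B, C is supported by two of the four series (0.5); tolerating one missing event also admits the second series (0.75), while the reversed series never matches because order is preserved. A threshold of this kind is one simple way to trade noise tolerance against specificity.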
Keywords: data mining; hidden Markov model; HMM; sequential pattern search; sequential pattern mining; approximate sequential patterns; gaps; time series data; multiple time series.
International Journal of Data Mining, Modelling and Management, 2015, Vol. 7, No. 2, pp. 108-129
Published online: 06 May 2015