Article: Mining approximate sequential patterns with gaps. Journal: International Journal of Data Mining, Modelling and Management (IJDMMM), 2015, Vol. 7, No. 2, pp. 108-129. Publisher: Inderscience Publishers.

Title: Mining approximate sequential patterns with gaps

Authors: Kelly K. Yip; David A. Nembhard

Addresses: Department of Industrial and Manufacturing Engineering, Penn State University, University Park, PA 16802, USA; Department of Industrial and Manufacturing Engineering, Penn State University, University Park, PA 16802, USA

Abstract: Time series data are found in diverse fields, including science, business, medicine and engineering. In this paper, we consider sequential pattern mining for categorical time series data that contain multiple independent time series. Frequent patterns are considered important in a variety of applications. However, data commonly contain noise, and/or the source process may have considerable variability. Conventional sequential pattern mining methods that use exact matching address some, but not all, of these difficulties. Two general approaches used in previous studies to mine sequential patterns in noisy data are distance-based clustering and hidden Markov models. Although these approaches are useful for mining frequent sequential patterns in noisy data, we further propose a framework (MWASP: multiple-width approximate sequential pattern mining) that uncovers frequent approximate sequential patterns of various widths. A mined pattern in this framework is representative of a group of sequences that follow the pattern's event flow order. This gives insight into the occurrence of the pattern both longitudinally and across the population. The pattern can be recognised as a common pattern across the multiple time series, across time, or both.
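The abstract does not describe how MWASP matches patterns, so the following is only a minimal sketch of the general idea of an approximate sequential pattern with gaps, not the authors' method. It assumes that a pattern approximately occurs in a sequence when all but at most `max_missing` of its events appear in the same order, with arbitrary gaps between them, and measures this with a longest-common-subsequence (LCS) computation. Support is then taken as the fraction of the independent time series that contain the pattern. The function names `lcs_length`, `approx_contains` and `approx_support`, the `max_missing` parameter, and the toy data are all hypothetical.

```python
from typing import Sequence


def lcs_length(pattern: Sequence[str], seq: Sequence[str]) -> int:
    """Length of the longest common subsequence of pattern and seq.

    Matched events must keep their order but may be separated by gaps.
    """
    # prev[j] = LCS length of the seq events processed so far vs pattern[:j]
    prev = [0] * (len(pattern) + 1)
    for event in seq:
        cur = [0] * (len(pattern) + 1)
        for j, p in enumerate(pattern, start=1):
            if event == p:
                cur[j] = prev[j - 1] + 1  # extend an in-order match
            else:
                cur[j] = max(prev[j], cur[j - 1])  # skip an event in seq or in pattern
        prev = cur
    return prev[-1]


def approx_contains(pattern: Sequence[str], seq: Sequence[str], max_missing: int = 0) -> bool:
    """ASSUMPTION: 'approximate' means at most max_missing pattern events are absent.

    Extra events in seq between matched pattern events (gaps) are always allowed.
    """
    return lcs_length(pattern, seq) >= len(pattern) - max_missing


def approx_support(pattern: Sequence[str], sequences: Sequence[Sequence[str]], max_missing: int = 0) -> float:
    """Fraction of the time series that approximately contain the pattern."""
    if not sequences:
        return 0.0
    hits = sum(approx_contains(pattern, s, max_missing) for s in sequences)
    return hits / len(sequences)


# Toy categorical time series; each character is one event.
series = ["ABXCD", "AXBYD", "ACD", "DCBA"]
print(approx_support("ABCD", series, max_missing=0))  # 0.25
print(approx_support("ABCD", series, max_missing=1))  # 0.75
```

With exact (gapped) matching only the first series supports the pattern `ABCD`; tolerating one missing event also admits `AXBYD` and `ACD`, which illustrates how approximate matching can keep a pattern frequent in noisy data.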

Keywords: data mining; hidden Markov model; HMM; sequential pattern search; sequential pattern mining; approximate sequential patterns; gaps; time series data; multiple time series.

DOI: 10.1504/IJDMMM.2015.069249

International Journal of Data Mining, Modelling and Management, 2015 Vol.7 No.2, pp.108 - 129

Published online: 06 May 2015
