Title: Enhanced sequence identification technique for protein sequence database mining with hybrid frequent pattern mining algorithm

Authors: J. Jeyabharathi; D. Shanthi

Addresses: Department of Computer Science and Engineering, C.R. Engineering College, Madurai, Tamil Nadu, India; Department of Computer Science and Engineering, PSNA College of Engineering and Technology, Dindigul, Tamil Nadu, India

Abstract: Sequential pattern mining is the task of identifying patterns that occur in at least a given number of data instances. This paper proposes a novel Enhanced Sequence Identification (ESI) approach for finding frequent patterns effectively in huge datasets. The Hybrid Frequent Pattern Mining (HFPM) algorithm employs a tree-based structure that significantly reduces space complexity. Frequent items with dependencies are added down to the leaves of the tree. A pruning strategy removes items that are infrequent with respect to the minimum support threshold. Association rules are used to mine the frequent patterns by identifying the relationships between items and finding approximate frequent patterns in the databases. The proposed ESI-HFPM algorithm consumes less memory and has a lower run time than the existing algorithms. The proposed algorithm ensures the effective extraction of frequent patterns while optimising the use of constrained resources.
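
The pruning and tree-building steps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' ESI-HFPM algorithm, whose details are not given here; it is a generic FP-tree-style prefix tree, and the function names, the toy sequences and the choice to treat each residue as an item counted once per sequence are all assumptions made for illustration.

```python
from collections import Counter


def frequent_items(sequences, min_support):
    """Count each item once per sequence; keep items meeting min_support."""
    counts = Counter()
    for seq in sequences:
        counts.update(set(seq))
    return {item: n for item, n in counts.items() if n >= min_support}


def build_prefix_tree(sequences, min_support):
    """Insert pruned, support-ordered item lists into a nested-dict prefix tree."""
    keep = frequent_items(sequences, min_support)
    root = {"count": 0, "children": {}}
    for seq in sequences:
        # Prune infrequent items, then order by descending support (ties by item)
        # so that sequences sharing frequent items also share tree prefixes.
        items = sorted((i for i in set(seq) if i in keep),
                       key=lambda i: (-keep[i], i))
        node = root
        for item in items:
            node = node["children"].setdefault(
                item, {"count": 0, "children": {}})
            node["count"] += 1
    return keep, root


# Hypothetical toy data: short strings standing in for protein sequences.
toy_sequences = ["MKVL", "MKAL", "MKV", "GAL"]
keep, tree = build_prefix_tree(toy_sequences, min_support=2)
```

In this toy run the residue `G` appears in only one sequence and is pruned before the tree is built, which is the kind of saving a minimum support threshold provides; shared prefixes are stored once with a count rather than once per sequence.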

Keywords: ESI; enhanced sequence identification; GAPA; generalised approximate pattern algorithm; frequent pattern mining; data mining; protein sequence mining; sequential pattern mining; bioinformatics; protein sequences; association rules; resource constraints.

DOI: 10.1504/IJDMB.2016.080673

International Journal of Data Mining and Bioinformatics, 2016 Vol.16 No.3, pp.205 - 229

Received: 20 Feb 2016
Accepted: 11 Sep 2016

Published online: 01 Dec 2016
