Title: A fast and novel approach based on grouping and weighted mRMR for feature selection and classification of protein sequence data

Authors: Kiranpreet Kaur; Nagamma Patil

Addresses: Department of Information Technology, National Institute of Technology Karnataka, Surathkal, Mangalore, Karnataka, India; Department of Information Technology, National Institute of Technology Karnataka, Surathkal, Mangalore, Karnataka, India

Abstract: The analysis of protein sequences has gained wide importance as a research area in bioinformatics. Newly added protein sequences can be analysed using existing proteins by converting them into feature vector form. However, dealing with the huge number of features produced by sequence encoding techniques is a challenging task. Since not all of the features obtained are actually required, a three-stage feature selection approach is proposed. In the first stage, features are ranked and the most irrelevant features are removed; in the second stage, conflicting features are grouped together; and in the third stage, a fast approach based on weighted Minimum Redundancy Maximum Relevance (wMRMR) is proposed and applied to the grouped features. Different classification methods are used to analyse the performance of the proposed approach. The proposed approach is observed to increase classification accuracy and reduce time consumption in comparison with state-of-the-art methods.
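
As an illustration of the ranking and redundancy-aware selection steps described in the abstract, the following is a minimal sketch of a greedy, weighted mRMR-style selector over discrete features. It is not the authors' wMRMR: the abstract does not give the weighting scheme or the grouping rule, so the trade-off weight `w`, the scoring formula `w * relevance - (1 - w) * redundancy`, and the function names `mutual_information` and `weighted_mrmr` are all assumptions made for this example, and the grouping stage is omitted.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def weighted_mrmr(features, labels, k, w=0.5):
    """Greedily pick up to k feature names.

    features: dict mapping feature name -> list of discrete values
    labels:   list of class labels, same length as each feature list
    w:        hypothetical relevance/redundancy trade-off in [0, 1]
              (an assumption of this sketch, not taken from the paper)
    """
    # Relevance of each feature to the class labels (also usable for ranking).
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected = []
    candidates = set(features)

    def score(f):
        if not selected:
            return relevance[f]
        # Mean mutual information with the features already chosen.
        redundancy = sum(
            mutual_information(features[f], features[s]) for s in selected
        ) / len(selected)
        return w * relevance[f] - (1 - w) * redundancy

    while candidates and len(selected) < k:
        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `w = 1.0` the selector reduces to a pure relevance ranking, while smaller values of `w` penalise features that duplicate information already selected. Real protein encodings usually yield continuous features, which would need discretisation or a different mutual information estimator before this sketch applies.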

Keywords: bioinformatics; feature selection; protein sequence data; filter method; mRMR; classification.

DOI: 10.1504/IJDMB.2020.105435

International Journal of Data Mining and Bioinformatics, 2020 Vol.23 No.1, pp.47 - 61

Accepted: 05 Nov 2019
Published online: 28 Feb 2020
