Title: Remote homology detection using GA and NSGA-II on physicochemical properties

Authors: Mukti Routray; Niranjan Kumar Ray

Addresses: Department of Computer Science and Engineering, Silicon Institute of Technology, Bhubaneswar, Odisha, India; School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha, India

Abstract: Remote homology detection at the amino acid level is a complex problem in computational biology. We use machine learning algorithms to predict the homology of unannotated protein sequences, which can save time and cost. This work is divided into three phases. In the first stage, features are extracted from protein sequences using Principal Component Analysis (PCA) to build a chromosome set containing representative features of each protein based on physicochemical properties. The second stage uses a genetic algorithm (GA) to construct a set of chromosomes for classification from the PCA features and initialises the classifier to build an error matrix. The third stage uses the non-dominated sorting genetic algorithm II (NSGA-II), with crossover, mutation, and tournament selection, to produce the next set of chromosomes. The output of the experiment is a set of solutions that minimise both the classification error and the number of features used to classify protein families. This approach achieves higher accuracy than profile-based methods.
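The two objectives named in the abstract (classification error and number of features) can be illustrated with a small Pareto-dominance sketch, since NSGA-II ranks candidates by non-dominated sorting. This is only a minimal illustration under assumptions: the bit-mask chromosome encoding, the error values, and the function names `dominates` and `first_front` are hypothetical and are not taken from the paper.

```python
# Minimal sketch of the first non-dominated front for two minimised objectives.
# Assumption: a chromosome is a bit mask over features (1 = feature used).
# Assumption: the error values below are made-up placeholders, not results.

def dominates(a, b):
    """Return True if objective vector a Pareto-dominates b (minimisation)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def first_front(objectives):
    """Return indices of objective vectors that no other vector dominates."""
    return [
        i for i, a in enumerate(objectives)
        if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)
    ]

# Hypothetical population: (feature mask, assumed classification error).
population = [
    ((1, 1, 1, 1), 0.10),
    ((1, 1, 0, 0), 0.12),
    ((1, 0, 0, 0), 0.30),
    ((0, 1, 1, 1), 0.20),
    ((1, 1, 1, 0), 0.12),
]

# Objective vector per chromosome: (classification error, number of features).
objectives = [(err, sum(mask)) for mask, err in population]

front = first_front(objectives)
for i in front:
    print(population[i][0], objectives[i])
```

Each chromosome on the front is a trade-off: no other candidate has both a lower error and fewer selected features. A full NSGA-II run would additionally rank the remaining fronts, compute crowding distances, and apply tournament selection, crossover, and mutation, none of which is shown here.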

Keywords: PCA; principal component analysis; feature selection and classification; genetic algorithm; profile-based methods.

DOI: 10.1504/IJCAT.2020.112688

International Journal of Computer Applications in Technology, 2020 Vol.64 No.4, pp.393 - 402

Received: 11 Feb 2020
Accepted: 12 Jun 2020

Published online: 28 Jan 2021
