Title: Investigating an Artificial Immune System to strengthen protein structure prediction and protein coding region identification using the Cellular Automata classifier

Authors: P. Kiran Sree, I. Ramesh Babu, N.S.S.S.N. Usha Devi

Addresses: Department of C.S.E, JNT University Hyderabad, India; Department of C.S.E, Acharya Nagarjuna University, Guntur, India; JNT University Kakinada, India

Abstract: Genes carry the instructions for making proteins; they are found in a cell as specific sequences of nucleotides in DNA molecules. However, the regions of these genes that code for proteins may occupy only a small part of the sequence. Identifying the coding regions plays a vital role in understanding these genes. In this paper we explore an Artificial Immune System (AIS) that can be used to strengthen the identification of protein coding regions in a genomic DNA system in changing environments, and the cellular automata (CA) technique for protein structure prediction of small alpha/beta proteins using Rosetta. From an initial round of Rosetta sampling, we learn properties of the energy landscape that guide a subsequent round of sampling toward lower-energy structures. Three approaches to improving tertiary fold prediction with the genetic algorithm are discussed: refinement of the search strategy; combination of prediction and experiment; and inclusion of experimental data as selection criteria in the genetic algorithm. The AIS has been developed using a slight variant of the genetic algorithm. Good classifiers can be produced, especially when the number of antigens is increased. However, an increase in the range of the antigens affects the fitness of the immune system. Experimental results confirm the scalability of the proposed AIS-FMACA-based classifier to large volumes of data, irrespective of the number of classes, tuples and attributes. We note an increase in accuracy of more than 5.2% over any existing standard algorithm that addresses this problem. This was the first algorithm to identify protein coding regions in mixed and also non-overlapping exon-intron boundary DNA sequences. The accuracy of protein structure prediction was also found to be comparable.

Keywords: cellular automata; unsupervised learning classifiers; MGA; genetic algorithms; AIS; artificial immune system; coding regions; FMACA; fuzzy multiple attractor cellular automata; pattern classifiers; protein structure prediction; nonlinear cellular automata; text clustering; K-means algorithm; bioinformatics; genomic DNA systems; DNA sequences.

DOI: 10.1504/IJBRA.2009.029044

International Journal of Bioinformatics Research and Applications, 2009 Vol.5 No.6, pp.647 - 662

Published online: 29 Oct 2009
