Title: DNA data clustering by combination of 3D cellular automata and n-grams for structure molecule prediction

Authors: Fatima Kabli; Reda Mohamed Hamou; Abdelmalek Amine

Addresses: GeCode Laboratory, Department of Computer Science, Tahar Moulay University of Saïda, Saïda, Algeria (all authors)

Abstract: Knowledge extraction from genomic data is an important activity for biologists. To mine the underlying biological knowledge, we follow the Knowledge Discovery in Databases (KDD) process. In this paper, we transform DNA sequences into texts, which are then indexed using the n-grams approach with TF-IDF weighting. To group similar DNA sequences, we apply a bio-inspired clustering method based on 3D cellular automata. To analyse the clustering results, we translate each DNA sequence into an amino acid sequence according to the standard genetic code. We conclude that the clusters help biologists select DNA sequences that can produce a type of medicament (molecule) and its various derivatives (low concentration in their composition).
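As an illustration only, the two preprocessing steps named in the abstract (n-gram indexing with TF-IDF weighting, and translation of DNA into amino acids with the standard genetic code) can be sketched as follows. The function names, the character-level n-grams, and the plain tf × log(N/df) weighting are assumptions made for this sketch; the abstract does not specify which variants the authors used, and the 3D cellular automata clustering step is not shown.

```python
import math
from collections import Counter

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def ngrams(seq, n):
    """Return the overlapping character n-grams of a DNA sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def tfidf(sequences, n):
    """Weight each sequence's n-grams by tf * log(N / df) (assumed variant)."""
    docs = [Counter(ngrams(s, n)) for s in sequences]
    # Document frequency: number of sequences containing each n-gram.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            term: (count / total) * math.log(len(docs) / df[term])
            for term, count in doc.items()
        })
    return vectors


def translate(seq):
    """Translate DNA into amino acids; '*' marks a stop codon.

    Trailing bases that do not fill a complete codon are ignored.
    """
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))
```

For example, `translate("ATGGCCTAA")` gives `"MA*"` (methionine, alanine, stop), and an n-gram that occurs in every sequence receives a TF-IDF weight of zero under this weighting.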

Keywords: DNA sequences; amino acids; 3D cellular automata; molecule; n-grams; bioinformatics; DNA data clustering; structure molecule prediction; knowledge extraction; data mining; knowledge discovery.

DOI: 10.1504/IJBRA.2016.080718

International Journal of Bioinformatics Research and Applications, 2016 Vol.12 No.4, pp.299 - 311

Available online: 03 Dec 2016
