Title: DNA data clustering by combination of 3D cellular automata and n-grams for structure molecule prediction
Authors: Fatima Kabli; Reda Mohamed Hamou; Abdelmalek Amine
Addresses: GeCode Laboratory, Department of Computer Science, Tahar Moulay University of Saïda, Saïda, Algeria (all three authors)
Abstract: Knowledge extraction from genomic data is an important activity for biologists. To mine the underlying biological knowledge, we follow the Knowledge Discovery in Databases (KDD) process. In this paper, we transform DNA sequences into texts, which are then indexed using the n-grams approach with TF-IDF weighting. To group similar DNA sequences, we apply a bio-inspired clustering method based on 3D cellular automata. To analyse the clustering results, we translate each DNA sequence into an amino acid sequence according to the standard genetic code. We conclude that the clusters help biologists select DNA sequences that can produce a type of medicament (molecule) and its various derivatives (low concentration in their composition).
Keywords: DNA sequences; amino acids; 3D cellular automata; molecule; n-grams; bioinformatics; DNA data clustering; structure molecule prediction; knowledge extraction; data mining; knowledge discovery.
International Journal of Bioinformatics Research and Applications, 2016 Vol.12 No.4, pp.299 - 311
Received: 22 Jun 2015
Accepted: 22 Mar 2016
Published online: 03 Dec 2016
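
The abstract names two preprocessing steps that are easy to illustrate: indexing DNA sequences as n-grams weighted by TF-IDF, and translating DNA into amino acids with the standard genetic code. The following is a minimal sketch of both, not the authors' implementation. The n-gram length (n = 3), the particular TF-IDF variant (relative term frequency times log(N / df)), the single forward reading frame, and all function names are assumptions chosen for illustration; the 3D cellular automata clustering step is not shown because the abstract gives no details of its rules.

```python
import math
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def ngrams(seq, n=3):
    """Return the overlapping character n-grams of a DNA sequence (n is an assumption)."""
    seq = seq.upper()
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def tfidf(sequences, n=3):
    """Return one {n-gram: weight} dict per sequence.

    Assumed weighting: (count / total n-grams in the sequence) * log(N / df).
    """
    docs = [Counter(ngrams(s, n)) for s in sequences]
    df = Counter()
    for doc in docs:
        df.update(doc.keys())  # document frequency: sequences containing the n-gram
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({g: (c / total) * math.log(n_docs / df[g])
                        for g, c in doc.items()})
    return vectors


def translate(seq):
    """Translate DNA to amino acids in the first reading frame (assumed frame).

    Unknown codons (e.g. containing N) map to 'X'; a trailing partial codon is ignored.
    """
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    seqs = ["ATGGCCTAA", "ATGGCGTAA"]  # toy sequences, not data from the paper
    for s, vec in zip(seqs, tfidf(seqs)):
        print(s, translate(s), vec)
```

The TF-IDF dictionaries produced here could serve as the feature vectors passed to a clustering method, and `translate` gives the amino acid view used to interpret the resulting clusters.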