
 

 

Title: DNA data clustering by combination of 3D cellular automata and n-grams for structure molecule prediction

 

Authors: Fatima Kabli; Reda Mohamed Hamou; Abdelmalek Amine

 

Addresses:
GeCode Laboratory, Department of Computer Science, Tahar Moulay University of Saïda, Saïda, Algeria

 

Abstract: Knowledge extraction from genomic data is an important activity for biologists. To mine the underlying biological knowledge, we rely on the Knowledge Discovery in Databases (KDD) process. In this paper, we transform DNA sequences into texts, which are then indexed using TF-IDF and an n-grams approach. To group similar DNA sequences, we apply bio-inspired 3D cellular automata as the clustering method. To analyse the clustering results, we translate each DNA sequence into an amino acid sequence according to the standard genetic code. We conclude that the clusters help biologists select DNA sequences that can produce a type of medicament (molecule) and its various derivatives (low concentration in their composition).
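
The indexing and translation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the n-gram length, the TF-IDF variant, or the reading frame, so n = 3, a plain tf * log(N/df) weighting, and reading frame 0 are assumptions made here. The function names (ngrams, tfidf, translate) are hypothetical, and the 3D cellular automata clustering step is not shown.

```python
import math
from collections import Counter

# Standard genetic code, laid out in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def ngrams(seq, n=3):
    """Overlapping character n-grams of a DNA string (n = 3 is an assumption)."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def tfidf(sequences, n=3):
    """Return one {n-gram: weight} dict per sequence, using tf * log(N / df)."""
    counts = [Counter(ngrams(s, n)) for s in sequences]
    # Document frequency: number of sequences in which each n-gram occurs.
    df = Counter(g for c in counts for g in c)
    total = len(sequences)
    vectors = []
    for c in counts:
        length = sum(c.values()) or 1  # avoid division by zero for short inputs
        vectors.append(
            {g: (k / length) * math.log(total / df[g]) for g, k in c.items()}
        )
    return vectors


def translate(seq):
    """Translate reading frame 0 into amino acids; '*' marks a stop codon,
    'X' marks a codon containing a symbol other than A, C, G, T."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )
```

The resulting weight dictionaries could serve as the feature vectors passed to a clustering method, and translate could be applied to the members of each cluster when interpreting the results.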

 

Keywords: DNA sequences; amino acids; 3D cellular automata; molecule; n-grams; bioinformatics; DNA data clustering; structure molecule prediction; knowledge extraction; data mining; knowledge discovery.

 

DOI: 10.1504/IJBRA.2016.10001719

 

Int. J. of Bioinformatics Research and Applications, 2016 Vol.12, No.4, pp.299 - 311

 

Date of acceptance: 22 Mar 2016
Available online: 03 Dec 2016

 

 
