Title: An advanced approach for DNA sequencing and similarities analysis on the basis of groupings of nucleotide bases

Authors: Kshatrapal Singh; Laxman Singh; Vijay Shukla; Yogesh Kumar Sharma; Arun Kumar Rai

Addresses: Department of CSIT, KIET Group of Institutions, Delhi – NCR, Ghaziabad, India; Department of CSE (AI & ML), KIET Group of Institutions, Delhi – NCR, Ghaziabad, India; Department of CSE (AI & DS), GNIOT Group of Institutions, Greater Noida, India; Department of CSE, I.T.S. Engineering College, Greater Noida, India; Department of CSE, Graphic Era Hill University, Bhimtal, India

Abstract: DNA sequencing is a crucial tool for identifying relationships between DNA sequences on a broad scale, but there is still room for improvement in sequencing quality. Alignment-free techniques are a widely used way of determining sequence similarity. In this research, the four bases of a DNA sequence (A, C, G, and T) are divided into three types of groupings according to their chemical characteristics, so that a primary DNA sequence is transformed into three symbolic sequences. To represent the sequence, the frequencies of group variations in the three symbolic sequences are aggregated into a 12-component vector. The nucleotide sequences of the beta globin gene from a dataset of several species are characterised and compared using the Euclidean distances between these vectors. The evolutionary relationships between the organisms are represented visually using phylogenetic trees, whose branch structure shows how species or other groups diverged from common ancestors. Our findings agree with recent biological assessments. We also compared our approach with several existing sequence comparison techniques and found it to be more efficient and user-friendly. Finally, we analysed the time and space complexities of the proposed approach.
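The pipeline summarised in the abstract (group the bases, build three symbolic sequences, form a 12-component vector, compare by Euclidean distance) can be illustrated with a minimal Python sketch. Several details here are assumptions, since the abstract does not spell them out: the three groupings are taken to be the usual chemical classifications (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding), and the "group variations" are taken to be the four adjacent-symbol pairs in each symbolic sequence, which gives 3 x 4 = 12 components. All function names are hypothetical and are not taken from the paper.

```python
from itertools import product
from math import sqrt

# Assumed groupings: three standard two-class partitions of the bases.
GROUPINGS = {
    "purine/pyrimidine": {"A": "R", "G": "R", "C": "Y", "T": "Y"},
    "amino/keto": {"A": "M", "C": "M", "G": "K", "T": "K"},
    "weak/strong": {"A": "W", "T": "W", "C": "S", "G": "S"},
}


def to_symbolic(seq, mapping):
    """Map a DNA string to a two-letter symbolic sequence, skipping non-ACGT characters."""
    return "".join(mapping[base] for base in seq.upper() if base in mapping)


def pair_frequencies(symbolic, alphabet):
    """Relative frequencies of the four adjacent-symbol pairs (assumed feature)."""
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(symbolic) - 1):
        counts[symbolic[i:i + 2]] += 1
    total = max(len(symbolic) - 1, 1)  # avoid division by zero on very short input
    return [counts[p] / total for p in pairs]


def feature_vector(seq):
    """Concatenate the pair frequencies of the three symbolic sequences (12 values)."""
    vec = []
    for mapping in GROUPINGS.values():
        alphabet = sorted(set(mapping.values()))
        vec.extend(pair_frequencies(to_symbolic(seq, mapping), alphabet))
    return vec


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not beta globin data.
    toy = {"seq_a": "ATGGTGCACCTGACT", "seq_b": "ATGGTGCATCTGACT", "seq_c": "CCGTTAGGCATTACG"}
    vectors = {name: feature_vector(s) for name, s in toy.items()}
    for x, y in [("seq_a", "seq_b"), ("seq_a", "seq_c")]:
        print(x, y, round(euclidean(vectors[x], vectors[y]), 4))
```

Under these assumptions, building a vector takes one pass over the sequence per grouping, so the cost is linear in sequence length, and each sequence is stored as 12 numbers. The resulting pairwise distance matrix could then be passed to a standard tree-building method to draw a phylogenetic tree; which method the authors use is not stated in the abstract.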

Keywords: alignment-free technique; similarity analysis; bases groupings; mutation; phylogenetic tree.

DOI: 10.1504/IJDMB.2025.143005

International Journal of Data Mining and Bioinformatics, 2025 Vol.29 No.1/2, pp.133 - 149

Received: 21 Aug 2023
Accepted: 07 Feb 2024

Published online: 02 Dec 2024
