Title: Search for regions with periodicity using the random position weight matrices in the C. elegans genome
Authors: Eugene V. Korotkov; Maria A. Korotkova
Addresses: Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, 33, bld. 2, Leninsky Ave., 119071 Moscow, Russia; MEPhI National Research Nuclear University, 31 Kashirskoe shosse, Moscow 115409, Russia
Abstract: The present study developed a mathematical method for identifying tandem repeats in a DNA sequence. A multiple alignment of periods was calculated by direct optimisation of the position-weight matrix (PWM), without using pairwise alignments or searching for similarity between periods. A new mathematical algorithm for periodicity search was developed using random PWMs. The algorithm was applied to the DNA sequences of the C. elegans genome. A total of 25,360 regions were found to possess periodicity with a period length of 2 to 50 bases. On average, the periodicity associated with each region extends over ~4000 nucleotides. A significant portion of the identified regions have periods of 10 or 11 nucleotides, periods that are multiples of 10 nucleotides, or periods in the vicinity of 35 nucleotides. Only ~30% of the periods found had been reported previously. The study also discusses the origin of periodicity with insertions and deletions.
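The abstract does not spell out the algorithm. As a rough, hypothetical illustration of the underlying idea (a position-weight matrix of period n that ties each nucleotide to its phase within the period), the Python sketch below builds a gap-free 4 × n count matrix for a candidate period, scores the dependence between base and phase with a G-statistic, and estimates significance against shuffled copies of the sequence. The helper names (`phase_counts`, `g_statistic`, `periodicity_z`) and the shuffle-based null model are assumptions made for illustration. This is not the authors' random-PWM optimisation, and it ignores insertions and deletions, which the abstract indicates matter for the periodicities studied.

```python
import math
import random
import statistics

BASES = "ACGT"


def phase_counts(seq: str, n: int) -> list[list[int]]:
    """Hypothetical helper: 4 x n table counting each base at phase i % n (no gaps)."""
    counts = [[0] * n for _ in BASES]
    for i, base in enumerate(seq.upper()):
        row = BASES.find(base)
        if row >= 0:  # skip N and other non-ACGT symbols
            counts[row][i % n] += 1
    return counts


def g_statistic(counts: list[list[int]]) -> float:
    """G = 2 * sum c_ij * ln(c_ij * N / (r_i * k_j)); large when base depends on phase."""
    rows = [sum(r) for r in counts]
    cols = [sum(c) for c in zip(*counts)]
    total = sum(rows)
    g = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c > 0:  # c > 0 guarantees rows[i] > 0 and cols[j] > 0
                g += c * math.log(c * total / (rows[i] * cols[j]))
    return 2.0 * g


def periodicity_z(seq: str, n: int, trials: int = 200, seed: int = 0) -> float:
    """Z-score of the observed G against shuffled copies (an assumed null, not random PWMs)."""
    observed = g_statistic(phase_counts(seq, n))
    rng = random.Random(seed)
    letters = list(seq)
    null = []
    for _ in range(trials):
        rng.shuffle(letters)  # keeps base composition, destroys phase structure
        null.append(g_statistic(phase_counts("".join(letters), n)))
    mean, sd = statistics.mean(null), statistics.stdev(null)
    if sd == 0:
        return math.inf if observed > mean else 0.0
    return (observed - mean) / sd


if __name__ == "__main__":
    periodic = "ACGTTGCA" * 50  # toy sequence with an exact period of 8
    print(round(g_statistic(phase_counts(periodic, 8)), 1))  # 1109.0
    print(periodicity_z(periodic, 8) > 10)  # True
```

A gap-free phase table is the simplest way to show why a periodic PWM carries signal; a realistic search over period lengths 2 to 50 would also need to tolerate indels (for example by dynamic programming against the matrix) and correct for testing many periods.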
Keywords: period; sequence; random matrix; alignment; multiple alignment; tandem repeats; weight matrices; similarity; dynamic programming.
International Journal of Data Mining and Bioinformatics, 2017 Vol.18 No.4, pp. 331-354
Received: 29 Dec 2016
Accepted: 22 Oct 2017
Published online: 21 Nov 2017