International Journal of Computational Intelligence in Bioinformatics and Systems Biology (4 papers in press)
Discovering of Gapped Motifs using Particle Swarm Optimization
by Srinivasulu Reddy Uyyala, Michael Arock, A. V. Reddy
Abstract: Motif discovery is one of the fundamental computational problems in bioinformatics. A motif (pattern, signal or domain) is a feature that occurs repeatedly in biological sequences, typically more often than expected at random, and it often corresponds to a functionally and structurally important element in DNA sequences and proteins. Identifying these recurring patterns helps us better understand the mechanisms that regulate gene expression. In the last decade, many computational methods have proven useful in predicting real binding sites, but no single method stands out as the best. Many methods discover only gapless motifs and ignore gaps. The algorithms RSAT, BioProspector, BIPAD, SPACER, SCOPE, MERMAID and GALAM2 were designed specifically for discovering gapped motifs. Recently, several evolutionary algorithms have been developed to solve the motif discovery problem because of their efficiency in searching multidimensional solution spaces. HPSO, IPSO-GA, PMbPSO and PSO+ are based on Particle Swarm Optimization (PSO). Among these methods, PSO+ was the first to be proposed for finding gapped motifs, but it is less efficient at finding gapped motifs in which the gap is located at the center of the motif. Our contribution is to efficiently find gapped motifs in which the gap lies between two conserved regions, by adapting features of PSO to the problem. First, we performed experiments on simulated planted (l, d)-motifs and then extended the model to identify gapped motifs. Second, we tested our algorithm on synthetic gapped motifs, (l, d) - X (m, n) - (l, d) signals, varying the l, d and X (m, n) values. Finally, we applied the same algorithm to real biological data sets and observed that our approach also detects known gapped TFBSs more accurately and more efficiently.
Keywords: Motif Finding, Particle Swarm Optimization (PSO), Swarm Intelligence (SI), Transcription Factor Binding Sites (TFBS), Gapped Motifs.
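The (l, d) - X (m, n) - (l, d) signal mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not define the notation, so this sketch assumes that each (l, d) block is a conserved region of length l allowing at most d mismatches, and that X (m, n) is a spacer whose length lies between m and n. The function names are hypothetical, and the exhaustive scan below only illustrates what counts as an occurrence; it is not the PSO-based search the authors describe.

```python
from typing import Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_gapped_occurrence(
    seq: str, left: str, right: str, d: int, m: int, n: int
) -> Optional[Tuple[int, int]]:
    """Return (start, gap_length) of the first occurrence of
    left - gap - right in seq, or None if there is none.

    Assumptions (not taken from the paper): each conserved block may
    differ from its consensus in at most d positions, and the gap
    length is any integer in [m, n].
    """
    l_len, r_len = len(left), len(right)
    # Last start that still leaves room for the shortest gap and the right block.
    for start in range(len(seq) - l_len - m - r_len + 1):
        if hamming(seq[start:start + l_len], left) > d:
            continue
        for gap in range(m, n + 1):
            r_start = start + l_len + gap
            if r_start + r_len > len(seq):
                break  # longer gaps would run past the end of the sequence
            if hamming(seq[r_start:r_start + r_len], right) <= d:
                return start, gap
    return None
```

A search heuristic such as PSO would replace the exhaustive scan when the consensus blocks are unknown and must be inferred from many sequences at once.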
SMISS: A protein function prediction server by integrating multiple sources
by Renzhi Cao, Zhaolong Zhong, Jianlin Cheng
Abstract: SMISS is a novel web server for protein function prediction. Three different predictors can be selected for different uses. To improve prediction accuracy, it integrates several sources: the query protein sequence, the protein-protein interaction network, the gene-gene interaction network, and rules mined from protein function associations. SMISS automatically switches to ab initio protein function prediction based on the query sequence when there are no homologs in the database. It takes FASTA-format sequences as input, and several sequences can be submitted together without greatly affecting computation speed. PHP and Perl are the two primary programming languages used in the server, which is built with the CodeIgniter MVC PHP web framework and the Bootstrap front-end framework. It runs in a standard web browser on different platforms, such as Windows, Mac OS X, Linux, and iOS. No plugins or Java are needed. Availability: http://tulip.rnet.missouri.edu/profunc/.
Keywords: protein function prediction; data integration; spatial gene-gene interaction network; protein-protein interaction network; chromosome conformation capturing.
Ortholog Detection: Pathway to Comparative Genomics
by Manpreet Singh, Shaifu Gupta
Abstract: Accurate detection of orthologs is a key aspect of comparative genomics. Because orthologs retain the same biological function over the course of evolution, they can be used to predict the function of newly sequenced genes from model organisms. In this paper we describe the different methods available for detecting orthologs. Computational methods, comprising phylogenetic as well as pair-wise comparison methods, are discussed and compared. Other methods based on synteny and protein network comparisons are also discussed. The study shows that phylogenetic methods of detecting orthologs are more accurate and reliable than pair-wise graph-based methods but are computationally more intensive and slower. They should be used when sufficient computational power is available. Pair-wise approaches, on the other hand, are fast and can handle large amounts of data. Synteny-based methods are also good candidates for the detection of orthologs.
Keywords: Orthologs; evolution; phylogenetic methods; comparative genomics; pair-wise methods; synteny.
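As an illustration of the pair-wise family of methods discussed in the abstract above, the following is a minimal sketch of reciprocal best hits, one widely used pair-wise criterion. The abstract does not say which pair-wise methods the paper covers, so the choice of technique is an assumption. The input format is also hypothetical: a single symmetric similarity score (for example, an alignment score) for each cross-genome gene pair.

```python
from typing import Dict, List, Tuple


def reciprocal_best_hits(
    scores: Dict[Tuple[str, str], float]
) -> List[Tuple[str, str]]:
    """Return gene pairs (a, b) where b is the best-scoring hit of a in
    genome B and a is the best-scoring hit of b in genome A.

    scores maps (gene_in_A, gene_in_B) to a similarity score; higher is
    better. Ties are resolved in favor of the first pair encountered.
    """
    best_for_a: Dict[str, Tuple[str, float]] = {}
    best_for_b: Dict[str, Tuple[str, float]] = {}
    for (a, b), s in scores.items():
        if a not in best_for_a or s > best_for_a[a][1]:
            best_for_a[a] = (b, s)
        if b not in best_for_b or s > best_for_b[b][1]:
            best_for_b[b] = (a, s)
    # Keep only the pairs whose best hits point at each other.
    return sorted(
        (a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a
    )
```

The single pass over all scored pairs reflects why such approaches scale to large data sets, in contrast to building and reconciling a phylogenetic tree per gene family.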
Combining associative classification with multifactor dimensionality reduction for predicting higher-order SNP interactions in case-control studies
by Suneetha Uppu, Aneesh Krishna, Raj P. Gopalan
Abstract: The identification and characterization of genotype-phenotype mappings is a central focus of current genome-wide association interaction studies (GWAIS). Revealing these relationships to expose the hidden structures of diseases has received considerable attention from researchers. However, current statistical and computational approaches ignore many complex genetic contexts. A multifactor dimensionality reduction method based on associative classification was previously proposed for detecting multi-locus single nucleotide polymorphism (SNP) interactions in GWAIS. Datasets were simulated by varying minor allele frequency and heritability for five different penetrance functions, along with various case-control ratios and sample sizes. In total, about 54,900 simulated datasets were generated to evaluate the approach for one-locus to six-locus models on both balanced and imbalanced datasets. The approach is further studied in detail by adjusting threshold levels and adding noise to the datasets. The simulation studies demonstrated significant improvements in accuracy over the previous approaches when threshold values were adjusted. The results also indicate that the approach is robust in the presence of noise. Further, the findings from the simulation studies are confirmed by evaluation on sporadic breast cancer and hypertension data. Applied to real-world data, the approach revealed higher-order interactions among five SNPs for the manifestation of breast cancer and among three SNPs for the manifestation of hypertension.
Keywords: Epistasis; multifactor dimensionality reduction; associative classification; SNP interactions; data mining and machine learning approaches.
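The role of the threshold mentioned in the abstract above can be illustrated with a minimal sketch of the basic multifactor dimensionality reduction step: each multi-locus genotype combination is pooled into a high-risk or low-risk group according to its case-to-control ratio. This shows only the generic pooling idea; it is not the associative-classification variant the authors evaluate. The function names, the 0/1/2 genotype coding and the default threshold of 1.0 are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def mdr_label_cells(
    genotypes: Sequence[Sequence[int]],
    status: Sequence[int],
    snps: Sequence[int],
    threshold: float = 1.0,
) -> Dict[Tuple[int, ...], bool]:
    """Label each observed genotype combination at the chosen SNP
    columns as high-risk (True) or low-risk (False).

    status holds 1 for a case and 0 for a control. A cell is high-risk
    when its case:control ratio meets or exceeds the threshold.
    """
    cases: Counter = Counter()
    controls: Counter = Counter()
    for row, y in zip(genotypes, status):
        cell = tuple(row[i] for i in snps)
        (cases if y == 1 else controls)[cell] += 1
    labels = {}
    for cell in set(cases) | set(controls):
        # A cell with no controls is treated as having an infinite ratio.
        ratio = cases[cell] / controls[cell] if controls[cell] else float("inf")
        labels[cell] = ratio >= threshold
    return labels


def mdr_accuracy(
    genotypes: Sequence[Sequence[int]],
    status: Sequence[int],
    snps: Sequence[int],
    labels: Dict[Tuple[int, ...], bool],
) -> float:
    """Fraction of samples whose high/low-risk label matches their status.
    Genotype combinations unseen during labeling count as low-risk."""
    correct = sum(
        labels.get(tuple(row[i] for i in snps), False) == (y == 1)
        for row, y in zip(genotypes, status)
    )
    return correct / len(status)
```

On imbalanced data a threshold of 1.0 tends to label most cells with the majority class, which is one reason adjusting the threshold can change accuracy.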