International Journal of Bioinformatics Research and Applications (6 papers in press)
Clustering Analysis of Soil Microbial Community on a Global Scale
by Tetsushi Tanaka, Andre Freire Cruz, Naoaki Ono, Shigehiko Kanaya
Abstract: There are huge numbers of bacteria in the soil, and their existence and function affect soil properties. Bacteria form a complex community called the microbiome. Compared to the ecosystems of plants and animals, we still know little about the soil's microbial ecosystem. How soil microbiomes differ throughout the world and how they relate to region and environment is of major interest. Recently, the development of next-generation sequencing has enabled researchers to accumulate metagenome profile data, and large-scale, comprehensive microbiome analyses are now required. In this study, we compared and analyzed data from various environments on a global scale using the microbiome database of the Earth Microbiome Project (EMP). We calculated a distance based on genetic distance, named the UniFrac distance, and performed clustering analysis. Clustering results were interpreted ecologically in terms of the function and characteristics of the bacteria. We revealed the characteristics of groups of bacteria related to paddies, vineyards, grasslands in Mongolia, forests, and biofilters. Furthermore, we investigated the relationship between clusters and climate zones. This research is expected to lead to knowledge of soil management based on the soil microbiome.
Keywords: Clustering Analysis; Metagenome; Microbiome; Soil Microbiology; UniFrac Distance.
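The abstract above does not say which UniFrac variant was used or how it was computed. As a rough illustration only, the unweighted variant can be sketched as the fraction of branch length in a phylogenetic tree that leads to taxa found in just one of two samples. The toy tree, the sample contents, and the function names below are hypothetical and are not taken from the paper.

```python
# Toy rooted tree: each node maps to (parent, branch length to parent).
# The tree and the taxa are made-up illustration data, not EMP data.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 1.0),
    "D": ("n2", 1.0),
    "n1": ("root", 2.0),
    "n2": ("root", 2.0),
}


def covered_branches(taxa, tree):
    """Return the branches (named by their child node) on the paths from taxa to the root."""
    branches = set()
    for node in taxa:
        while node in tree:  # the root has no entry, which ends the walk
            branches.add(node)
            node = tree[node][0]
    return branches


def unweighted_unifrac(taxa_a, taxa_b, tree):
    """Unique branch length divided by total branch length covered by either sample."""
    a = covered_branches(taxa_a, tree)
    b = covered_branches(taxa_b, tree)
    union = a | b
    if not union:
        return 0.0
    unique = a ^ b  # branches leading to taxa of exactly one sample
    total_length = sum(tree[n][1] for n in union)
    return sum(tree[n][1] for n in unique) / total_length


# Two samples sharing taxon A but differing in B versus C.
distance = unweighted_unifrac({"A", "B"}, {"A", "C"}, TREE)
print(distance)
```

A matrix of such pairwise distances is the kind of input a clustering step would take; the clustering method itself is not specified in the abstract and is left out of this sketch.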
Homology modeling and docking studies of strictosidine beta-D-glucosidase from Madagascar periwinkle (Catharanthus roseus Bunge)
by Piotr Szymczyk, Grazyna Szymanska, Malgorzata Majewska, Izabela Weremczuk-Jezyna, Michal Kolodziejczyk, Kamila Czarnecka, Pawel Szymanski, Ewa Kochan
Abstract: Here, the structure of the enzyme and substrate binding site of C. roseus strictosidine beta-D-glucosidase is reported. A homology model of C. roseus strictosidine beta-D-glucosidase was built using the Discovery Studio 4.1 software package and rice Os3bglu6 beta-glucosidase (PDB code: 3GNPA) as a template. The CDOCKER algorithm was used to dock the natural substrate, strictosidine, as well as an inhibitor, D-glucono 1,5-lactone. The structures obtained were refined by molecular dynamics simulation. Analysis of the acquired data provided information on the interactions of C. roseus strictosidine beta-D-glucosidase amino acid residues with the natural substrate and the inhibitor. Our findings expand the basic knowledge of the structure of the C. roseus strictosidine beta-D-glucosidase active site, and may also be used in the design of enzyme point mutations with improved stability and catalytic properties, or to change substrate specificity.
Keywords: homology modeling; ligand docking; enzyme-ligand interactions; alanine scanning.
A Framework for Neighborhood Configuration to Improve the KNN based Imputation Algorithms on Microarray Gene Expression Data
by Shilpi Bose, Chandra Das, Kuntal Ghosh, Matangini Chattopadhyay, Samiran Chattopadhyay
Abstract: Owing to several technical problems associated with microarray experiments, a considerable number of entries are missing in a typical microarray gene expression dataset. Without complete data, the effectiveness of the analysis algorithms deteriorates. Different imputation techniques are employed to address this problem, among which the weighted average based methods are widely used in several applications. These methods generate consistent results and are algorithmically simple, but they also suffer from some drawbacks that are seldom elaborated upon. This work points out these deficiencies and suggests a new approach to overcome them. The proposed framework is embedded in the K-nearest neighbor imputation method (KNNimpute), as well as its different versions. The idea is to achieve better neighborhood formation in order to improve the prediction accuracies. It is based on a hybrid distance and gene transformation procedure that simultaneously utilizes the advantages of Euclidean distance, mean squared residue score, and Pearson correlation coefficient to select the best possible neighbors using pattern-based similarity. The framework is tested on ten well-known microarray datasets. The experimental results show that in every case the proposed modified methods significantly outperform their corresponding traditional versions and are also comparable with the existing robust numerical methods.
Keywords: Missing value prediction; Microarray technology; Gene expression data; K-nearest neighbors; Pearson correlation coefficient; Mean Square Residue; Euclidean distance.
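For orientation, the weighted average idea behind KNN-style imputation can be sketched as follows. This is a simplified baseline written for illustration, not the hybrid-distance framework proposed in the paper: it assumes a plain Euclidean-type distance over the columns two genes share, inverse-distance weights, and `None` as the missing-value marker. The function name and the toy matrix are hypothetical.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries with an inverse-distance weighted average over the k nearest rows."""

    def distance(row_a, row_b):
        # Compare only the columns observed in both rows.
        pairs = [(x, y) for x, y in zip(row_a, row_b) if x is not None and y is not None]
        if not pairs:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))

    filled = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other rows that have a value in column j.
            candidates = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            )
            candidates = [c for c in candidates if c[0] != math.inf][:k]
            if not candidates:
                continue  # nothing to borrow from; leave the entry missing
            weights = [1.0 / (d + 1e-9) for d, _ in candidates]
            weighted_sum = sum(w * v for w, (_, v) in zip(weights, candidates))
            filled[i][j] = weighted_sum / sum(weights)
    return filled


# Rows are genes, columns are samples (made-up values).
expression = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [5.0, 6.0, 7.0],
]
print(knn_impute(expression, k=1))
```

The abstract's point is that the choice of neighbors drives accuracy, so in this sketch the `distance` helper is the piece a hybrid distance would replace.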
Multi-Descriptor Approaches to Oxygen Binding Proteins Prediction and Classification using Deep Learning
by Soumiya Hamena
Abstract: Oxygen binding proteins play a key role in the transport and storage of oxygen through the body's cells, and identifying oxygen binding proteins is highly desirable for predicting the functional annotation of these proteins. However, costly and time-consuming biological tests can only characterize a very small portion of all protein sequences currently available in the databases. This has made computational approaches increasingly essential to help biologists predict and classify protein function effectively. The key idea behind this work is to investigate the effect of using several descriptors and deep learning to achieve better prediction. To the best of our knowledge, no study on the subject has been undertaken. Three kinds of descriptors to represent protein sequences are considered in this study, namely amino acid composition (AAC), dipeptide composition (DC) and conjoint triad feature (CTF). To carry out the classification process, a Deep Neural Network (DNN) was designed. Firstly, we applied the DNN to single-descriptor prediction of oxygen binding proteins, that is, using the AAC, DC and CTF descriptors separately, and then we applied the DNN in a multi-descriptor prediction context by investigating the integration of the AAC-DC, AAC-CTF, DC-CTF and AAC-DC-CTF descriptor combinations. Secondly, a DNN was developed to classify the oxygen binding proteins into six different classes: Erythrocruorin, Myoglobin, Hemerythrin, Hemocyanin, Hemoglobin and Leghemoglobin. The proposed approach can be viewed as a two-stage process that applies first a binary classification and then a multiclass classification. The experimental results show that the proposed models outperform the existing methods: we obtained a Matthews Correlation Coefficient (MCC) of 0.9677 for identifying oxygen binding proteins with the combination of AAC and DC. For classifying oxygen binding proteins, we achieved MCC of 0.4539, 0.9054, 0.9830, 0.9434, 0.8813 and 0.9441 for the above six classes, respectively.
Keywords: Oxygen binding proteins; Proteins function; AAC; DC; CTF; Deep Neural Networks; Deep Learning; Data Mining.
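As a small illustration of two of the descriptors and the evaluation metric named in the abstract above, the sketch below computes AAC (20 values), DC (400 values) and the MCC from confusion-matrix counts, using their usual textbook definitions. The function names and the example sequence are hypothetical; the paper's own feature extraction and network are not reproduced here, and CTF is omitted.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Amino acid composition: frequency of each of the 20 standard residues."""
    return [seq.count(residue) / len(seq) for residue in AMINO_ACIDS]


def dc(seq):
    """Dipeptide composition: frequency of each of the 400 adjacent residue pairs."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = len(seq) - 1
    return [counts[pair] / total for pair in DIPEPTIDES]


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# A multi-descriptor vector is assumed here to be a plain concatenation (AAC-DC).
sequence = "ACAC"  # made-up sequence of at least two residues
features = aac(sequence) + dc(sequence)
print(len(features))
```

Concatenation is only one way to combine descriptors; the abstract does not state how the combinations were fed to the network.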
Bioinformatic analysis of post-transcriptional regulation of Endoplasmic reticulum stress genes by alternative polyadenylation and microRNA
by Neha Singh, Srishti Shriya, Farhat Afza, Padmini Bisoyi, Arun K. Kashyap, Kundan K. Chaubey, Buddhi Prakash Jain
Abstract: Various post-transcriptional regulation processes modulate the endoplasmic reticulum (ER) stress response. Alternative polyadenylation is one post-transcriptional regulatory mechanism that generates transcripts with alternate 3' ends. Transcripts with different 3' UTRs are differentially modulated by various cis-regulatory elements and miRNAs. The present work aims to characterize the core polyadenylation element (CPE) in 16 ER stress associated genes. The canonical AAUAAA polyadenylation signal (PAS) is present in 13 out of a total of 24 sites (53%). Alternative polyadenylation was found in five genes (HSPA5, HSPA9, ATF6, SHQ1, and VLDLR) in the NCBI and EST databases. Further in silico analysis was done using computational tools as well as experimentally validated databases to find miRNAs that target the ER stress associated genes. The information about alternative polyadenylation and the microRNA sites in the ER stress associated genes might be useful for further study of the regulation of the ER stress response and its associated molecular and pathological processes.
Keywords: Alternative polyadenylation; core polyadenylation element; miRNA; Endoplasmic reticulum stress; polyadenylation signal.
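Checking a 3' end for the canonical AAUAAA signal, as described in the abstract above, amounts to a motif search near the end of the transcript. The sketch below is a hedged illustration only: the function name, the 40-nucleotide search window, and the example sequence are assumptions, and the tools actually used in the study are not named in the abstract.

```python
CANONICAL_PAS = "AAUAAA"


def find_pas(three_prime_end, window=40):
    """Return 0-based positions of AAUAAA within the last `window` nucleotides.

    Accepts RNA or DNA letters; T is converted to U before searching.
    """
    rna = three_prime_end.upper().replace("T", "U")
    region_start = max(0, len(rna) - window)
    region = rna[region_start:]
    hits = []
    position = region.find(CANONICAL_PAS)
    while position != -1:
        hits.append(region_start + position)
        position = region.find(CANONICAL_PAS, position + 1)
    return hits


# Made-up sequence ending shortly after a canonical signal.
example = "GGGAATAAAGGGGGGGGGGGG"
print(find_pas(example))
```

Counting the sites for which this function returns a non-empty list, over all annotated sites, gives a proportion of the kind the abstract reports.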
A Comprehensive In-Silico characterization of Gasdermin superfamily genes regulating pyroptosis linked diseases
by Suman Rani, Praveen P. Balgir, Nandini, Bhagwant Singh, Husandeep Kaur
Abstract: Pyroptosis is a lytic form of inflammatory cell death. Non-synonymous single nucleotide polymorphisms (nsSNPs) have been accepted as biomarkers of disease susceptibility. In the present study, nsSNPs of Gasdermin superfamily genes, which have been shown to be involved in pyroptosis and consequently in the etiology of many diseases, were screened for their functional and structural effects. Three nsSNPs of GSDMA, two each of GSDMB, GSDMC and GSDMD, six nsSNPs of GSDME and two nsSNPs of the DFNB59 gene were predicted by in silico tools to be deleterious, showing a decrease in structural stability. Further, the post-translationally modified phosphorylation sites of members of the gene family were checked in silico for the presence of SNPs that may influence the post-translational modification (PTM) and consequently the protein function. The current study analyzes Gasdermin family gene variants comprehensively with different in silico algorithms, combined with the available literature on wet lab experiments, to propose a list of nsSNPs that are potential targets for future functional genetic analysis.
Keywords: nsSNPs; SIFT; PolyPhen; PROVEAN; MuPro; I-Mutant; SNP&Go; I-TASSER; STRUM; Accelrys Discovery Studio 4.0; Gasdermin family.