International Journal of Computational Biology and Drug Design (20 papers in press)
Data Acquisition and Electrical Instrumentation Engineering Modelling for Intelligent Learning and Recognition
by Jun Qin, Yuhao Jiang
Development of interactive computer learning program for genetics and molecular biology applications
by Xiaoli Yang, Bin Chen, Yifan Cai, Charles Tseng
Identification of dual target anti-inflammatory inhibitors using merged structure based pharmacophore modelling and docking approach
by Manikandan Selvaraj, Muhd Hanis MD Idris, Siti Norhidayu Mohd Amin, Mohd Zaki Salleh, Teh Lay Kek
Abstract: Merged structure-based pharmacophore modelling followed by 3-D database search and molecular docking was the sequential protocol applied to identify selective novel dual target anti-inflammatory inhibitors of COX-2 and PDE4D. The key interaction features of the crystal structures of COX-2 (pdb: 1CX2) and PDE4D (pdb: N0YN) were exploited as a critical component in the selection of dual target inhibitors. Through this approach, nine chalcone- and flavone-scaffold-like compounds were selected as putative dual target anti-inflammatory inhibitors from the Asinex database. In general, such an approach could provide valuable insights into the discovery of novel anti-inflammatory inhibitors as therapeutic agents.
Keywords: Cyclooxygenase; phosphodiesterase; dual target; merged pharmacophore; docking.
Adaptive-Fuzzy clustering based texture analysis for classifying liver cancer in abdominal CT images
by Amita Das, Priti Das, S.S. Panda, Sukanta Sabut
Abstract: Segmentation of diseased liver in abdominal CT images is a challenging task due to variations in shape, tissue similarity between adjoining organs such as the kidney and heart, and pathologies caused by disease. Computer-aided diagnosis (CAD) systems are very useful for automatic analysis of tumor position and for finding a region of interest (ROI) in images. We propose a technique that integrates fuzzy clustering with adaptive thresholding for segmenting the liver and finding the tumor region in abdominal CT images. Various features, including texture, morphological and statistical features, have been extracted from the output images and used as input to the classifier. The proposed method was evaluated on a series of 45 images taken from MICCAI datasets and open sources. A neural network classifier has been used to classify malignant and benign tumors of the liver. The efficiency of the proposed algorithm is tested in terms of sensitivity, specificity, and accuracy. Accuracies of 97.82% and 95.74% are achieved with BPN and LVQ respectively, and a higher accuracy of 98.82% is achieved with PNN in detecting tumors, which is comparable to published results. This method could be an effective solution for identifying the tumor region of the liver in abdominal CT images.
Keywords: CT image; liver; tumor; segmentation; region of interest; neural network classifier.
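The three evaluation measures named in the abstract above (sensitivity, specificity and accuracy) follow directly from the counts in a binary confusion matrix. The sketch below is only an illustration of those standard definitions; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn count malignant cases classified correctly/incorrectly,
    tn/fp count benign cases classified correctly/incorrectly.
    Assumes each denominator is non-zero.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only; not results from the paper.
sens, spec, acc = classification_metrics(tp=40, fp=2, tn=38, fn=5)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Reporting all three together matters because accuracy alone can look high on an unbalanced image set even when one class is poorly detected.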
Comparative analysis of machine learning based QSAR Models and molecular docking studies to screen potential anti-tubercular inhibitors against InhA of Mycobacterium tuberculosis
by Madhulata Kumari
Abstract: Machine learning techniques are advanced computational techniques that can be used to build quantitative structure-activity relationship (QSAR) models of compound data sets, identifying important descriptors that can predict a specific biological activity of unknown compounds in order to discover better drugs. In the present study, by optimizing descriptors using correlation-based feature selection, principal component analysis, and genetic programming, several machine learning techniques were used to build QSAR models on three different experimental datasets of InhA inhibitors. The best QSAR models were deployed on a data set of 1450 approved drugs from DrugBank to screen for new InhA inhibitors. Amoxicillin showed the highest predicted activity (pIC50 = 6.54) and Itraconazole the second highest (pIC50 = 6.4), calculated with the best Random Forest (RF) model using the CFS-GS-FW descriptor set on the ChEMBL997779 dataset of InhA of Mtb. Additionally, screening by molecular docking identified the ten top-ranked approved drugs as anti-tubercular hits, with G-scores of -8.23 to -6.95 kcal/mol compared with G-scores of -7.86 to -6.68 kcal/mol for control compounds (known InhA Mtb inhibitors). These results indicate that these compounds may have better binding affinity for InhA of Mtb. From our studies, we conclude that machine learning based QSAR models can be useful for the development of novel target-specific anti-tubercular compounds.
Keywords: Machine learning algorithms; Quantitative structure-activity relationships; Support vector machine; Random forest; Multilayer Perceptron; Genetic Algorithm; Genetic Programming; Regression; Mycobacterium tuberculosis; Gaussian Process; Correlation-based Feature Selection; InhA.
A genetic programming-based approach and machine learning approaches to the classification of multiclass anti-malarial datasets
by Madhulata Kumari, Neeraj Tiwari
Abstract: Feature selection approaches have been widely applied to deal with the sample size problem in the classification of activity datasets. The present work focuses on understanding the system of descriptors of anti-malarial inhibitors using genetic programming (GP) in order to understand the impact of descriptors on inhibitory effects. An experimental dataset of anti-malarial inhibitors was used to derive the optimized system by GP. Additionally, we developed machine learning models using Random Forest, Decision Tree, Support Vector Machine and Naive Bayes on an anti-malarial dataset obtained from the ChEMBL database and evaluated their predictive capability. Based on the statistical evaluation, the Random Forest model showed a higher area under the curve (AUC) and better accuracy, sensitivity, and specificity in the cross-validation tests than the others. The statistical results indicated that the RF model was the best predictive model, with 82.51% accuracy and 89.7% ROC. We deployed the RF classifier model on three datasets (a phytochemical compound dataset, NCI natural product dataset IV and an approved drugs dataset, containing 918, 423 and 1554 compounds), which yielded 153, 81 and 250 compounds respectively as anti-malarial compounds. Further, to prioritize drug-like compounds, Lipinski's rule was applied to the active phytochemicals, which resulted in 13 hit anti-malarial molecules. Thus, such predictive models are useful for finding novel hit anti-malarial compounds and could also be used to discover novel drugs for other diseases.
Keywords: Machine learning approaches; Data mining; Random Forest; SVM; Naïve Bayes; Decision Tree; Malaria; Phytochemical; Natural product.
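The drug-likeness filter mentioned in the abstract above, Lipinski's rule, can be sketched as a simple threshold check on four descriptors (molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors and at most 10 acceptors). The sketch assumes the descriptors have already been computed elsewhere; the class, the compound names and values, and the choice to tolerate one violation are illustrative assumptions, not details reported by the authors.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (values are hypothetical)."""
    name: str
    mol_weight: float   # g/mol
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d):
    """Count how many of the four rule-of-five thresholds are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_lipinski(d, max_violations=1):
    """Assumption: a compound is kept if it has at most one violation."""
    return lipinski_violations(d) <= max_violations


# Hypothetical compounds for illustration only.
compounds = [
    Descriptors("cmpd_A", 320.4, 2.1, 2, 5),
    Descriptors("cmpd_B", 712.9, 6.3, 7, 12),
]
hits = [c.name for c in compounds if passes_lipinski(c)]
print(hits)
```

Setting `max_violations=0` gives the stricter variant in which every threshold must be met.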
Putative Inhibitors of Homology-modelled Chorismate Synthase of Shigella flexneri
by CATHERINE L
Abstract: Shigellosis is an infection of the intestinal epithelium. We focus on the multidrug resistant S. flexneri (MDRSf) pathogen. We chose as our target chorismate synthase (SfCS), a key enzyme in the biosynthesis of aromatic amino acids in the shikimate-chorismate pathway. The SfCS crystal structure is unknown, so we built a homology model using the SfCS Serotype 2a sequence. Using the model, clusters of protein-protein interaction anchor residue hotspots were obtained, upon which a pharmacophore model was built. Virtual screening on 22,723,923 compounds resulted in seven hits. Of these, one was admissible against checks for ADME-Tox pharmacokinetics and cytochrome P450 toxicities. A scaffold-hopping procedure resulted in two other candidates. All three were docked to pockets determined using a new measure of residue depth associated with the Voronoi procedure. Remarkably, the three putative inhibitors have high pIC50 values that exceed those of many common antibiotics now in use.
Keywords: Shigellosis; diseases of the poor; pharmacophore; virtual screening; scaffold-hopping.
Designing of suitable linkers for the chimeric proteins to achieve the desired flexibility and extended conformation
by Manoj Patidar, Naveen Yadav, Sarat Dalai
Abstract: The design and production of therapeutic chimeric proteins is a central focus of many industrial R&D groups and research institutes. Efficient production of chimeric proteins requires not only suitable domains or partner proteins and intact receptor binding sites; the selection of linkers is equally important. Linkers are essential to provide space between two domains, to prevent steric hindrance and, most importantly, to make the chimeric proteins flexible. In this in silico study, we systematically designed various linkers, tested their feasibility and evaluated their essentiality. Here, we fused a cytokine of interest (i.e., IL-2) with IgG1 Fc via various linkers. We designed linkers of various lengths and amino acid compositions and tested their ability to provide the extended conformation and desired flexibility and also to minimize conformational changes. Additionally, we evaluated the role of the linker in dimerization of chimeric proteins. We next tested the influence of linkers on the stability and functionality of chimeric proteins.
Keywords: Chimeric proteins; linker; Immunoglobulin; protein engineering; dimerization.
Exploring Polypharmacology of Some Natural Products Using Similarity Search Target Fishing Approach
by Ihab Almasri
Abstract: Natural products have long been considered important sources for drug discovery due to the diversity of their chemical structures and the broad range of biological activities attained by modulation of different biological targets. Therefore, the identification of the molecular targets of natural products is a milestone step in the rational design of more potent and safer compounds. In this work, we explored the polypharmacology of three natural products with pleiotropic health-beneficial effects (resveratrol, curcumin and berberine) using a ligand-based target fishing approach. The fishing protocol started with the generation of a chemogenomic database that links individual targets with specific target ligands or groups of drugs. A target profile was then generated for each of the natural products via chemical/shape similarity search using ROCS software. The applied method not only retrieved known targets within the top-ranked list for the natural compounds but also identified off-targets, which were found by docking simulation to be potential targets and were consistent with recently identified bioactivities of these compounds. The ROCS-based target fishing approach (RTFA) proved successful in pharmacological profiling of the selected natural products and in the identification of new off-targets worth further evaluation.
Keywords: natural products; polypharmacology; similarity search; target fishing; docking.
An in silico approach for construction of a chimeric protein, targeting virulence factors of Shigella spp.
by Emad Kordbacheh
Abstract: Shigellosis is still a high-burden gastrointestinal disease with an increasing frequency of antibiotic resistance. Although there are about 50 different serotypes across the four species of Shigella, the type III secretion apparatus (T3SA) is conserved among them, and the IpaD, IpaB, and IcsA proteins participate in its function. Recent studies indicate that the stx gene has been found in all Shigella spp. and has a fundamental role in hemorrhagic colitis. Prior to designing the chimeric construct, bioinformatics tools were used for this purpose. At the nucleosome level, sequences were chosen and optimized; at the transcriptome level, predictions were made about the mRNA form; and at the proteome level, physicochemical parameters, stability, primary to tertiary structures and model validation were predicted with the help of in silico servers. Moreover, antigenic and allergenic propensity, subcellular localization and protein function were estimated with bioinformatics software. Finally, these results should be useful in an animal model for the development of a pervasive candidate immunogen against Shigella spp.
Keywords: Shigella species; Bioinformatics; Subunit vaccine; Virulence factors.
Special Issue on: ICIBM 2016 Recent Advances in Computational Systems Biology and Bioinformatics
A Flexible Approach to Reconstruct the Genomic Spatial Structure by the Genetic Algorithm
by Yan Zhang, William Hoskins, Ruofan Xia, Xiya Xia, Jim W. Zheng, Jijun Tang
Abstract: The 3D structures of chromosomes play fundamental roles in essential cellular functions, e.g. gene regulation, gene expression and evolution. The HiC technique provides the interaction density between loci on chromosomes. Several approaches have been developed to reconstruct the 3D model of the chromosomes from HiC data. However, all of these approaches are based on a particular mathematical model and lack flexibility for new development. We introduce a novel approach using the genetic algorithm. Our approach is flexible enough to accept any mathematical model to build a 3D chromosomal structure. Also, our approach outperforms current techniques in accuracy.
Keywords: Genome; Spatial Structure; Genetic Algorithm; HiC.
Signal Translational Efficiency between mRNA Expression and Antibody-based Protein Expression for Breast Cancer and its Subtypes from Cell lines to Tissue
by Aida Yazdanparast, Lang Li, Milan Radovich, Lijun Cheng
Abstract: Background: Although gene transcripts and protein expression have been utilized to classify breast cancer subtypes, it is not clear whether the observed measurement of gene transcript abundance can predict its protein expression. Herein, we attempt to address gene transcript/protein associations using publicly available data on breast cancer tumor tissues and cell lines. Method: Correlation analysis between mRNAs and reverse-phase protein arrays (RPPA) among 421 primary breast tumors and 33 breast cancer cell lines was conducted. Highly concordant proteins/genes were further analyzed in different breast cancer subtypes. Results: The overall accordance of mRNA/RPPA correlation between cell lines and primary tissue is R2=0.71. Since most of these genes are well-known drug targets, highly concordant gene/RPPA associations not only confirm that these gene transcripts can serve as biomarkers for their protein products in drug target selection, but also imply that breast cancer cell lines can serve as good models for primary breast cancer tumors.
Keywords: Breast cancer; Reverse-phase protein array; mRNA; Cell lines; Protein abundance.
Native State of Complement Protein C3d Analysed via Hydrogen Exchange and Conformational Sampling
by Didier Devaurs, Malvina Papanastasiou, Dinler Antunes, Jayvee Abella, Mark Moll, Daniel Ricklin, John Lambris, Lydia Kavraki
Abstract: Hydrogen/deuterium exchange detected by mass spectrometry (HDX-MS) provides valuable information on protein structure and dynamics. Although HDX-MS data is often interpreted using crystal structures, it was suggested that conformational ensembles produced by molecular dynamics simulations yield more accurate interpretations. In this paper, we analyse the complement protein C3d by performing an HDX-MS experiment, and evaluate several interpretation methodologies using an existing prediction model to derive HDX-MS data from protein structure. To interpret and refine C3d's HDX-MS data, we look for a conformation (or conformational ensemble) of C3d that allows computationally replicating this data. We confirm that crystal structures are not a good choice and suggest that conformational ensembles produced by molecular dynamics simulations might not always be satisfactory either. Finally, we show that coarse-grained conformational sampling of C3d produces a conformation from which its HDX-MS data can be replicated and refined.
Keywords: complement protein C3d; hydrogen exchange; mass spectrometry; protein conformational sampling; coarse-grained conformational sampling; native state; X-ray crystallography; molecular dynamics; protein structures; conformational ensembles.
Inhibition of Polyamine biosynthesis for toxicity control in Serratia marcescens strain WW4 by targeting ornithine decarboxylase: A structure-based virtual screening study
by Kalyani Dhusia, Pramod Yadav, Rohit Farmer, Pramod Ramteke
Abstract: The enzyme ornithine decarboxylase (ODC) catalyzes the decarboxylation of ornithine to form spermidine, which is a committed step in the biosynthesis of polyamines. Polyamines are crucial for growth, cell proliferation and differentiation, but are toxic when produced in excess. Ornithine is the immediate precursor for the production of polyamines via the polyamine biosynthesis mechanism. ODC plays the central role in this biosynthesis pathway and hence is considered the key target for inhibitory study. In the present work, the structure of ODC was modelled and studied for its active site. 142 natural products of Indofine Herbal Ingredient from the ZINC database were screened using AutoDock Vina for the identification of leading herbal inhibitors. The docking results showed that Conessine, Sumaresinolic acid, DNC, Exolone, Naringenin, Hesperidin and Baicailin were the top inhibiting candidates, with docking affinities of -9.7, -9.2, -9.0, -8.9, -8.8, -8.8 and -8.2 kcal/mol respectively. According to our findings, Conessine (IUPAC name: N,N-dimethylcon-5-enin-3β-amine) was the best inhibitor; it is an alkaloid, which underlines the importance of such metabolites. These herbal inhibitors could prove significant in controlling the toxicity caused by excess production of polyamines by these PGPBs. Thus, polyamines, being harmful in excess, need to be controlled at their genesis and, according to the literature, no similar approach has yet been reported in the area of herbal inhibition of polyamine biosynthesis or for toxicity control in PGPBs.
Keywords: Ornithine decarboxylase; Herbal Inhibitor; Molecular dynamics simulation; Docking; Virtual screening.
Risk-associated and pathway-based method to detect association with Alzheimer's disease
by Jeffrey Mitchel, Laszlo Prokai, Youping Deng, Fan Zhang, Robert Barber
Abstract: It is becoming increasingly apparent that genes do not function alone but through complex biological pathways in complex diseases such as Alzheimer's disease (AD). Unraveling these intricate pathways is essential to understanding the biological mechanisms of AD. Pathway-based association analysis allows for the discovery of highly significant pathways from AD versus normal control samples. Knowledge of the activation of these processes will lead to novel markers identifying their signatures in patients at high risk for AD. Based on the Integrated Pathway Analysis Database (IPAD), we developed a pathway-based method to detect association with AD. First, we performed risk-associated allele analysis to determine whether a major or minor allele is associated with risk. Then we performed pathway-disease association analysis to identify 133 AD-associated pathways. Lastly, we performed pathway-patient association analysis to investigate the patients' association and distribution among the 133 pathways. We found 5 AD-associated pathways that have the highest association with patients. We present a pathway-based method to detect AD-associated pathways from GWAS data. Our pathway-based analysis not only provides a technique to identify disease-associated pathways, but also helps determine the pathway-patient association. We believe that the method can help us gain a comprehensive understanding of the molecular mechanisms underlying complex diseases such as AD.
Keywords: Alzheimer’s disease; pathway analysis; biomarker discovery.
Evaluation of biological and technical variations in low-input RNA-Seq and single-cell RNA-Seq
by Fan Gao, Jae Mun Kim, JiHong Kim, Ming-Yi Lin, Charles Y. Liu, Jonathan J. Russin, Christopher P. Walker, William Mack, Oleg V. Evgrafov, Robert H. Chow, James A. Knowles, Kai Wang
Abstract: Background: Although low-input RNA-Seq and single-cell RNA-Seq are widely used today, two technical questions remain: (1) in the absence of biological variation, what proportion of technical noise comes from input RNA quantity as compared to bioinformatics tools? (2) in biological samples from single neurons, is variation in gene expression attributable to biological heterogeneity or just random noise? To examine the sources of variability, we have generated RNA-Seq data from both low-input RNA (two reference RNA samples at 10pg, 100pg, 1000pg quantity, each with 3-6 replicates) and single neurons (16 and 22 cells from two human brains). Results: We performed comparative analysis of the low-input data using different quantification pipelines and dimensionality reduction algorithms. We also compared functional enrichment of the most variably expressed genes from low-input and single neuron data. In general, the quantity of input RNA is negatively correlated with variation of gene expression from technical replicates. For genes in the medium- and high-expression groups, input RNA amount explains most of the variation, whereas differences in the bioinformatics pipeline explain some variation for the low-expression group. The dimensionality reduction method t-SNE reveals data-inherent aggregation of technical replicates of low-input data, and suggests heterogeneity of the single pyramidal neuron transcriptome. Interestingly, the variation in gene expression in single neurons is biologically relevant. Conclusions: We found that variation contributed by the bioinformatics pipeline is generally minor compared to that from the quantity of input RNA. We also demonstrated that t-SNE is more effective than PCA at handling the noise from very low-input RNA-Seq (single-neuron level). All the data sets were made available in public repositories for future benchmarking studies.
Keywords: RNA-Seq; single cell sequencing; bioinformatics; TOPHAT; RSEM; t-SNE; PCA; ANNOVAR; variations.
A GPU-CPU Heterogeneous Algorithm for NGS Read Alignment
by Ahmad Al Kawam, Sunil Khatri, Aniruddha Datta
Abstract: In the Next Generation Sequencing (NGS) read alignment problem, millions of DNA fragments, called reads, are mapped to a reference genome. NGS has unleashed a wealth of genomic information by producing immense amounts of data. It is enabling humanity to learn more about the origins of life and the genetic basis of diseases like cancer. Genomic analysis is typically carried out using traditional computing platforms, which have become a limiting factor in the speed of the process. The massive scale of the problem makes it an attractive target for acceleration. In this paper, we present a read alignment algorithm designed to run on a heterogeneous system composed of a GPU and a multicore CPU. We introduce novel techniques for the alignment process and construct a computational pipeline of overlapped CPU and GPU stages. Our design exploits the GPU's massive parallelism and ability to hide memory access latency to align hundreds of reads concurrently. We use OpenMP to hide I/O latency on a parallel network file system by overlapping the loading of read batches (via CPU), their processing (mostly via GPU), and the writing of separate batches (via CPU) to maximize throughput. We compare our tool with the BWA-mem alignment tool, and the results show substantial speedups.
Keywords: Next Generation Sequencing; Read Alignment; GPU Acceleration.
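The overlapped load / process / write stages described in the abstract above can be illustrated with a small producer-consumer sketch. This is not the authors' OpenMP and GPU implementation: it uses Python threads and bounded queues purely to show how separate batches can occupy different stages at once, and every name in it (`run_pipeline`, `toy_align`, the queue depth) is a hypothetical choice made for the illustration.

```python
import queue
import threading

SENTINEL = None  # marks the end of the batch stream


def loader(batches, out_q):
    """Stage 1 (CPU in the paper): load read batches and pass them on."""
    for batch in batches:
        out_q.put(batch)
    out_q.put(SENTINEL)


def processor(in_q, out_q, align):
    """Stage 2 (mostly GPU in the paper): align every read in a batch."""
    while True:
        batch = in_q.get()
        if batch is SENTINEL:
            out_q.put(SENTINEL)
            break
        out_q.put([align(read) for read in batch])


def writer(in_q, results):
    """Stage 3 (CPU in the paper): write out finished batches."""
    while True:
        batch = in_q.get()
        if batch is SENTINEL:
            break
        results.extend(batch)


def run_pipeline(batches, align, depth=2):
    """Run the three stages concurrently; bounded queues limit batches in flight."""
    load_q = queue.Queue(maxsize=depth)
    write_q = queue.Queue(maxsize=depth)
    results = []
    threads = [
        threading.Thread(target=loader, args=(batches, load_q)),
        threading.Thread(target=processor, args=(load_q, write_q, align)),
        threading.Thread(target=writer, args=(write_q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def toy_align(read):
    """Placeholder for a real aligner: returns the read and its length."""
    return (read, len(read))


aligned = run_pipeline([["ACGT", "GG"], ["TTTAA"]], toy_align)
print(aligned)
```

Because each stage runs in a single thread and the queues are first-in, first-out, batch order is preserved from input to output; the bounded queue depth caps how many batches are held in memory while a slower stage catches up.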
Neural signature of event-related N200 and P300 modulation in parietal lobe during human response inhibition
by Rupesh Kumar Chikara, Oleksii Komarov, Li-Wei Ko
Abstract: Response inhibition control is important for daily life activities such as car driving, walking, and playing games. Although the role of inhibition remains an issue of debate in many studies, most researchers agree that some sort of inhibition mechanism is involved in the deliberate cessation of a motor response. Therefore, the stop-signal paradigm has been designed to investigate the response inhibition process. Other aspects also underline the importance of this study, because stop-signal task performance may contribute to neurological disorders such as schizophrenia, attention deficit hyperactivity disorder (ADHD) or obsessive-compulsive disorder. The aim of this study was to explore EEG modulation during left- and right-hand response inhibition using ERP analyses. In this study, we observed significant inhibition-related ERP modulation in N200 and P300 waves at frontal, central and parietal regions. These outcomes reveal response-inhibition-related neural markers in the frontal, central and parietal lobes. These findings clearly indicate the statistically independent nature of the inhibition mechanisms of the left-hand and right-hand responses in the frontal, central, and parietal brain areas.
Keywords: Electroencephalography (EEG); Stop-signal task; Response inhibition; Event-related potentials (ERP); N200; P300.
Identifying the dynamic gene regulatory network during latent HIV-1 reactivation using high-dimensional ordinary differential equations
by Jaejoon Song, Michelle Carey, Hongjian Zhu, Hongyu Miao, Juan Ramirez, Hulin Wu
Abstract: Reactivation of latently infected cells has emerged as an important strategy for eradication of HIV. However, the genetic mechanisms of regulation after reactivation remain unclear. We describe a five-step pipeline to study the dynamics of the gene regulatory network following a viral reactivation using high-dimensional ordinary differential equations. Our pipeline implements a combination of five different methods: detecting temporally differentially expressed genes (step 1), clustering genes with similar temporal expression patterns into a small number of gene response modules (step 2), performing a functional enrichment analysis within each gene response module (step 3), identifying a network structure based on the gene response modules using ordinary differential equations (ODE) and a high-dimensional variable selection technique (step 4), and obtaining a gene regulatory model based on refined parameter estimates using nonlinear least squares (step 5). We applied our pipeline to time course gene expression data of latently infected T-cells following a latency reversion. We identified 3,926 temporally differentially expressed genes in the latently infected cell relative to the uninfected cell after viral reactivation. The temporal profiles of these genes were clustered into 95 co-expression patterns. These clusters provide the dynamic gene response modules (sets of genes that share similar response patterns over time). Within-module functional annotation was performed using pathway analysis. In addition, we constructed a regulation network of the dynamic gene response modules using ordinary differential equations and a high-dimensional variable selection technique. Our results indicate that the genetic mechanism after viral reactivation of latently infected cells can be described by a regulatory network of gene modules, each of which consists of genes that share similar temporal expression patterns. Our findings offer new insights for understanding the biological processes underlying the viral cycle after latency.
Keywords: HIV; Gene Regulatory Network; Ordinary Differential Equations.
Heterogeneity in untreated, stressed and drug tolerant cells: insights to evolution of cancer resistance
by Bhagya Wijayawardena, Gopinath Sivasankaran, Lang Li
Abstract: The evolution of cancer is a Darwinian process, with the composition of the cancer cell population varying over time, especially after chemotherapy. Such changes in the genetic landscape of tumors are possibly responsible for chemotherapeutic resistance. Owing to recent advances in single-cell sequencing, the evolution of cancers can be monitored at single-cell resolution. We reanalyzed a previously published data set composed of single cell RNAseq data, collected before, during and after the establishment of Paclitaxel tolerance, to identify single cell heterogeneity. We used gene expression analysis, gene expression rank correlation analysis, pathway analysis and mutation analysis to identify within-group variations of cancer cells after chemotherapy. We identified a decrease in within-group diversity of cancer cells during the transition to drug tolerance. Additionally, we observed a high mutation rate in stressed single cells, which suggests genetic instability of cancer cells that could ultimately result in the development of drug resistance. Our analysis carries significant implications for developing personalized and efficient therapeutics against cancer.
Keywords: Single cell RNAseq; Drug resistance; Paclitaxel; chemotherapy; tumor; drug tolerance; Illumina HiSeq; MDA-MB-231.