International Journal of Computational Biology and Drug Design (15 papers in press)
Data Acquisition and Electrical Instrumentation Engineering Modelling for Intelligent Learning and Recognition
by Jun Qin, Yuhao Jiang
Development of interactive computer learning program for genetics and molecular biology applications
by Xiaoli Yang, Bin Chen, Yifan Cai, Charles Tseng
MOLECULAR DOCKING STUDIES, IN-SILICO ADMET SCREENING, MM-GBSA BINDING FREE ENERGY OF SOME NOVEL CHALCONE SUBSTITUTED 9-ANILINOACRIDINES AS TOPOISOMERASE II INHIBITORS
by Kalirajan Rajagopal, Iniyavan K, Rathika G, Pandiselvi A
Abstract: Novel chalcone-substituted 9-anilinoacridines (1a-z) were designed in silico for their topoisomerase II (Topo-II) inhibitory activity, which is attributed to their DNA-intercalating properties. Docking studies of compounds 1a-z as selective Topo-II (ID: 1ZXM) inhibitors were carried out using Schrodinger Suite 2016-2. Docking was performed with the Glide module, in silico ADMET screening with the QikProp module, and binding free energy calculations with the Prime MM-GBSA module. The binding affinity of the molecules towards Topo-II was ranked on the basis of Glide score. Many compounds showed strong hydrophobic and hydrogen-bonding interactions with Topo-II. The compounds 1a-z, except 1k, have good binding affinity, with Glide scores in the range of -5.52 to -7.27, compared with the standard ethacridine (-4.23). The ADMET properties are within the recommended values. MM-GBSA binding results for the most potent inhibitor are favourable. The compounds 1x, 1z, 1m, 1f, 1r and 1i, which have significant Glide scores, may show significant anti-microbial and anti-cancer activities, and further investigation may prove their therapeutic potential.
Keywords: Acridine; Chalcone; docking studies; In-silico ADMET screening; MM-GBSA.
Identification of novel neuraminidase inhibitors through e-pharmacophore based virtual screening
by Rohini Kanagavelu, Shanthi Veerappapillai
Abstract: The surface protein of influenza virus, neuraminidase (NA), is believed to play a critical role in the release of new viral particles and thus in the spread of infection. Hence it has been considered a possible drug target for influenza A virus infection. Despite the number of drugs available for the treatment of influenza infection, the emergence of mutants with novel mutations has increased resistance to potent NA inhibitors. With this in mind, the present study attempts to discover potent inhibitors from the ASINEX library of 467802 molecules through an e-pharmacophore based virtual screening strategy. The results of our analysis, together with the available experimental evidence, indicate that the lead molecule BAS 04358434 could be used as a promising candidate for NA inhibition. Moreover, the hit compound showed potent inhibitory activity against all the mutant structures considered in our analysis. In summary, we speculate that the outcomes of this research are of substantial importance in the rational design of novel and efficacious NA inhibitors.
Keywords: Neuraminidase; e-Pharmacophore Model; Enrichment Analysis; Virtual Screening; ASINEX database; Qikprop.
In-silico drug target identification and pharmacophore mapping for Leishmania donovani based on metabolic pathways
by Nikita Chordia, Deepak Bhayal, Priyesh Hardia
Abstract: A large human population is infected with Leishmania donovani, a protozoan parasite that causes a highly lethal disease called visceral leishmaniasis. It is the second most deadly parasitic disease after malaria. It is transmitted by the female sandfly and infects both children and adults. The disease is highly prevalent, has been reported in 88 countries and causes 20000-30000 deaths each year. To date, there is no specific vaccine or drug for visceral leishmaniasis. It is the most neglected tropical disease in terms of drug discovery and development. Here, we analyzed the metabolic pathways of this parasite to identify potential drug targets. Essential nodes (genes) in the metabolic pathways that are non-homologous to human were considered for network reconstruction. Analysis of the reconstructed network identified five drug targets, namely threonine aldolase, acetyl-CoA acyltransferase, pyruvate orthophosphate dikinase, ATP-binding cassette and P-glycoprotein. These targets are efficient and specific for treating the Leishmania donovani parasite. For these identified drug targets, a pharmacophore is designed that could be used as a drug to treat visceral leishmaniasis. Further, docking studies reveal the action of the pharmacophore on these drug targets.
Keywords: Leishmania donovani; leishmaniasis; sand fly; parasite; pathways; human; drug target; pharmacophore; docking; node; protein; gene.
INHIBITORY ROLE OF SELECTIVE PHYTOCHEMICALS AGAINST HIV-2 PROTEASE: A STUDY OF MOLECULAR DOCKING, ADMET AND DFT COMPUTATIONS
by Sobia Nazir Chaudry, Waqar Hussain, Nouman Rasool
Abstract: HIV/AIDS, caused by the human immunodeficiency virus (HIV), has become a significant threat to human life. Plant-derived compounds are valuable because of their anti-viral, anti-fungal and anti-bacterial potential. This research aimed at in silico drug discovery against HIV-2 protease. A total of 2750 phytochemicals from various medicinally important plants were selected for the current study. These plants originate from Pakistan and India and have long been reported to be used against different pathogens. ADMET, molecular docking and DFT approaches were used to determine the potential inhibitory characteristics of these phytochemicals. The ADMET analysis and molecular docking resulted in the selection of twenty phytochemicals (Oxyresveratrol, Paprarine, Osajin, Eryvarin R, 12S-hydroxyandrographolide, 5,7,3',4'-tetrahydroxyflavone, Hydroxymunduserone, Diprenyleriodictyol, Caffeic Acid, ApigeninB, 3-methoxy-4-hydroxybenzoic acid, Estafin, Feruloyltyramine, SigmoidinB, (+)-medioresinol, Tanaparthe, Xylan, Epoxy, Paprafumine and EryvarinQ), which proved to be potential inhibitors of HIV-2 protease and can be taken forward to additional in vitro and in vivo studies to assess their inhibitory effects against HIV-2 protease. These 20 phytochemicals showed binding affinity ≥ 8.5 kcal/mol, indicating effective inhibition of HIV-2. Furthermore, the DFT approach revealed high reactivity for these twenty phytochemicals in the binding cavity of HIV-2 protease, based on ELUMO, EHOMO and band energy gap. These 20 phytochemicals are novel potential inhibitors of HIV-2 protease with promising clinical applications. For commercial-scale applications, their efficacy, safety and reactivity as potential inhibitors of HIV-2 protease in humans can be checked by in vitro and in vivo analysis. The development of the reported phytochemicals as potential drugs against HIV-2 protease would be therapeutically and economically feasible.
Keywords: ADMET; band energy gap; HIV-2 protease; DFT; molecular docking; phytochemicals.
In silico studies of bioactive phytocompounds with anticancer activity from in vivo and in vitro extracts of Justicia wynaadensis (Nees) T.Anderson
by Vandana C D, Shanti K N, Prashantha Karunakar, Vivek Chandramohan
Abstract: The current study aims to substantiate the anticancer activity of phytocompounds identified from extracts of in vivo and in vitro propagated Justicia wynaadensis (Nees) T. Anderson using in silico molecular docking and dynamics studies. Initially, GC-MS analysis of a cold aqueous extract of dried leaf of Justicia wynaadensis was performed. A few phytocompounds were selected from the GC-MS results of the aqueous extract, the methanolic extract of callus and the in vitro propagated leaf of Justicia wynaadensis. The phytocompounds were selected through a literature survey based on their anticancer activity. In total, twelve ligands were docked with the thymidylate synthase protein, and their binding energies and efficiencies were analyzed and compared with those of the reference drug Capecitabine. The docking results suggested the presence of compounds with anticancer activity. All twelve ligands showed binding affinities ranging from -5.0 kcal/mol to -8.4 kcal/mol. Binding energies were -8.4 kcal/mol for Campesterol, -8.3 kcal/mol for Stigmasterol, -6.3 kcal/mol for Squalene, and -6.1 kcal/mol for Vitamin E acetate, Phytol and Coumarin. To investigate the mechanism of action of Campesterol, Stigmasterol and Capecitabine with the target protein, these compounds were subjected to dynamic simulation. The results revealed that Campesterol was more stable than Stigmasterol and could be used as a potential lead-like molecule.
Keywords: Campesterol; Stigmasterol; GC-MS; Molecular dynamics; Thymidylate synthase.
In-Silico Analysis of PON1, PON2 and PON3 Genes Role in Coronary Artery Disease
by Sana Ashiq, Kanwal Ashiq
Abstract: Purpose: Globally, coronary artery disease (CAD) is a leading cause of mortality. It is a multifactorial disorder that involves both environmental and genetic factors. The paraoxonase (PON) gene cluster has three members: PON1, PON2 and PON3. Their gene products are antioxidant enzymes that bind to lipoproteins in the circulation. The imbalance between oxidant and antioxidant mechanisms is one of the major etiologic factors in the pathophysiology of CAD. Several genetic and biochemical investigations report single nucleotide polymorphisms in the coding regions that affect protein structure and function. Thus, the aim of the study is the in-silico analysis of the PON gene family and its role in CAD.
Methods: Several computational tools, including the National Center for Biotechnology Information (NCBI), UniProt, Expasy, ProtParam, GeneCards, Protein Data Bank (PDB) and PDBsum, were used for the in-silico analysis.
Results: Comparative amino acid analysis of the three members revealed different compositions. The GeneCards analysis illustrates the association of the gene family with different lipoproteins. The PDB results indicate the three-dimensional structure and its ligand interactions. Further, ligand interactions were studied using PDBsum, which indicated two major ligands: phosphate and dodecyl-beta-D-maltoside.
Conclusion: The findings of the present study report various ligands that interact with the gene at different positions and also support its association with lipoproteins. All these findings can help in better understanding the pathophysiology of CAD and in drug design.
Keywords: Coronary artery disease; Genes; In-silico; Paraoxonase; Single nucleotide polymorphisms.
Importance of safety maintenance of the survived with recent former infection experience during a pandemic syndrome episode: A Study by Difference Equation Approach
by Subhasis Bhattacharya, Suman Paul, Sudip Mukherjee
Abstract: During the outbreak of a highly infectious disease caused by a virus, handling the healthcare crisis is the most important task. Any relaxation, deliberate or inadvertent, may cause enormous loss of life. The present study considers the concern that survivors with a recent history of infection should be handled with appropriate care throughout the epidemic period; otherwise, the state may suffer great harm. The study uses difference equation modelling under two scenarios: in one, survivors with a history of infection are handled with care and are not counted as part of the sustained population; in the other, they are included in the general population category. The study considers an example of a hypothetical state with a given infection rate, death rate and quarantine rate. Using the R programming language, the study observes that proper care for this group is very important for reducing loss of human life.
Keywords: Infectious disease; SARS-CoV-2; 2019-nCov; Difference Equation; Survived from the infected; Quarantine rate; Death Rate.
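The two scenarios described in the abstract above can be illustrated with a small difference-equation sketch. This is not the authors' model, and it is written in Python rather than the R used in the paper. The compartments, the update rules, the way the quarantine rate scales down transmission, and every numeric value (including the hypothetical `recovery_rate`) are assumptions chosen purely for illustration.

```python
def simulate(steps, survivors_protected, infection_rate=0.3, death_rate=0.02,
             quarantine_rate=0.1, recovery_rate=0.1,
             population=1_000_000.0, initial_infected=10.0):
    """Iterate a toy difference-equation epidemic model (all rates are assumed).

    survivors_protected=True  : survivors are cared for and kept out of the
                                pool that can be infected again.
    survivors_protected=False : survivors rejoin the general (susceptible)
                                population.
    """
    s = population - initial_infected  # susceptible
    i = initial_infected               # currently infected
    r = 0.0                            # protected survivors
    d = 0.0                            # cumulative deaths
    for _ in range(steps):
        living = s + i + r
        # Assumption: quarantine removes a fixed share of infectious contacts.
        new_infections = infection_rate * (1.0 - quarantine_rate) * s * i / living
        new_deaths = death_rate * i
        new_survivors = recovery_rate * i
        s -= new_infections
        i += new_infections - new_deaths - new_survivors
        d += new_deaths
        if survivors_protected:
            r += new_survivors
        else:
            s += new_survivors
    return {"susceptible": s, "infected": i, "protected": r, "dead": d}


with_care = simulate(200, survivors_protected=True)
without_care = simulate(200, survivors_protected=False)
print(f"deaths with care:    {with_care['dead']:.0f}")
print(f"deaths without care: {without_care['dead']:.0f}")
```

Under these assumed rates, returning survivors to the susceptible pool keeps the infection circulating, so cumulative deaths are higher than when survivors are protected. The size of the gap depends entirely on the illustrative parameters.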
Special Issue on: ICIBM 2019 State-of-the-art Computational Methods and Tools for Analysis of High-dimensional Biological and Biomedical Datasets
Skyhawk: An Artificial Neural Network-based discriminator for reviewing clinically significant genomic variants
by Ruibang Luo, Tak-Wah Lam, Michael Schatz
Abstract: Motivation: Many rare diseases and cancers are fundamentally diseases of the genome. In the past several years, genome sequencing has become one of the most important tools in clinical practice for rare disease diagnosis and targeted cancer therapy. However, variant interpretation remains the bottleneck, as it is not yet automated and may take a specialist several hours of work per patient. On average, one-fifth of this time is spent on visually confirming the authenticity of the candidate variants.
Results: We developed Skyhawk, an artificial neural network-based discriminator that mimics the process of expert review on clinically significant genomic variants. Skyhawk runs in less than one minute to review ten thousand variants, and about 30 minutes to review all variants in a typical whole-genome sequencing sample. Among the false positive singletons identified by GATK HaplotypeCaller, UnifiedGenotyper and 16GT in the HG005 GIAB sample, 79.7% were rejected by Skyhawk. Applied to the Variants with Unknown Significance (VUS), Skyhawk marked most of the false positive variants for manual review and most of the true positive variants as not needing review.
Keywords: Clinical decision support; Variant validation; Artificial neural network; Third-generation sequencing; Variant calling.
PgenePapers: a novel database and search tools of reported regulatory pseudogenes
by Achal Awasthi, Yan Zhang
Abstract: Pseudogenes arose from duplication or retroduplication of genes; however, accumulation of mutations has disabled their protein-coding ability. Although they have been thought of as genomic fossils, recent studies have shown that a considerable number of pseudogenes are actually transcribed in normal and/or cancerous human tissues, and some of them can even regulate gene expression. Studies have detected pseudogene differential expression in specific cancer subtypes, indicating potential functions of pseudogenes in cancer development and clinical relevance to disease outcomes. All these show that pseudogenes form a new class of modulators of gene expression; however, their roles are still largely unknown. Unlike coding genes, which have rich functional annotations, pseudogenes still lack functional annotations. There is not yet a database focusing on the regulatory roles of pseudogenes, even though functional studies have been published in the literature. We extracted information about regulatory pseudogenes by analyzing PubMed literature using natural language processing techniques followed by manual curation. The expression values of genes and pseudogenes for all 31 cancer types studied in TCGA were used to compute the correlation between genes and pseudogenes. Based on this information, we reconstructed the regulatory networks involving pseudogenes and regulated genes (pseudogene-gene pairs) with disease and tissue specific annotations. We further extended the pseudogene-gene networks to include information on potential miRNAs and drugs targeting components of the networks, based on expression profiles, miRNA binding predictions and known FDA approved drugs. We developed the first comprehensive database of reported regulatory pseudogenes.
In order to facilitate the usage of the database, we also developed a user-friendly app called PgenePapers (https://integrativeomics.shinyapps.io/PgenePapers/) which allows flexible database search and provides network visualization. PgenePapers app can display the pseudogene-gene pairs with their functional categories, all the supporting text from literature, interactive visualization of the pseudogene-gene association networks, and customized gene-pseudogene-miRNA-drug networks.
Keywords: regulatory pseudogene; database; search tools; graph presentation; correlation network; Shiny app.
Generating Simulated SNP array and Sequencing Data to Assess Genomic Segmentation Algorithms
by Mark Zucker, Kevin Coombes
Abstract: In order to validate methods for the analysis of high throughput data, it is necessary to obtain data for which the underlying truth is known, so one can verify the accuracy of inferences made by the method and thus quantify the confidence with which it can make inferences. Knowing the ground truth can be extraordinarily difficult in biology, since one can essentially never know, even in highly controlled conditions, what proportion of cells have what aberrations in a bulk cell sample, particularly in populations of aberration-prone cancer cells. For this reason, the ability to simulate SNP array and DNA sequencing data that recapitulate the variance structure and population complexity of real biological samples would be very useful in assessing the accuracy of and comparing bioinformatics algorithms. In particular, we discuss here the use of segmentation algorithms to identify breakpoints and copy number variation in SNP array or sequencing data. We developed a tool, implemented in an R package called TACG (True and Accurate Clone Generator), to simulate both ground truth and realistic SNP array and/or SNV data. We present this tool and apply it to the assessment of several different approaches to segmentation of copy number data from SNP arrays, with a particular interest in detecting CNVs in cancer samples. We demonstrate that DNAcopy, an algorithm using circular binary segmentation, generally performs best, which is in agreement with previous research. We further determine the conditions under which it and other methods break down. In particular, we assess how characteristics such as clonal heterogeneity, the presence of nested CNVs, and the type of aberration affect algorithm accuracy. The simulations we generated proved to be useful not just in determining the comparative overall accuracy of different algorithms, but also in determining how their efficacy is affected by the biological characteristics of the samples from which the data were generated.
Keywords: SNP Array; copy number alteration; cancer; simulation.
Predicting Re-admission to Hospital for Diabetes Treatment: A Machine Learning Solution
by Satish M. Srinivasan, Yok-Fong Paat, Philmore Halls, Ruth Kalule, Thomas E. Harvey
Abstract: Background: Predictive analytics embraces an extensive range of techniques, including but not limited to statistical modelling, machine learning, artificial intelligence and data mining. It is highly useful in applications such as business intelligence, public health, and disaster management and response, as well as many other fields. It is well known as a practice for identifying patterns within data to predict future outcomes and trends. The objective of this study is to design and implement a predictive analytics system that can be used to forecast the likelihood that a diabetic patient will be readmitted to the hospital.
Results: After extensively cleaning the Diabetes 130-US hospitals dataset, which contains patient records spanning 10 years from 1999 to 2008, we modelled the relationship between the predictors and the response variable using the Random Forest classifier. After performing hyperparameter optimization for the Random Forest, we obtained a maximum AUC of 0.684, with a precision and recall of 46% and 60% respectively and an F1 score of 52.07%. Our study reveals that attributes such as number of inpatient visits, discharge disposition, admission type, and number of laboratory tests are strong predictors of the response variable (i.e. re-admission of patients).
Conclusion: Findings from this study can help hospitals design suitable protocols to ensure that patients with a higher probability of re-admission are recovering well, and possibly reduce the risk of future re-admission. In the long run, not only will our study improve the quality of life of diabetic patients, it will also help reduce the medical expenses associated with re-admission.
Keywords: Random Forest; Data Cleaning; Predictive Analytics; Hyperparameter tuning; optimization.
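As a quick consistency check on the figures reported in the abstract above, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the rounded precision (46%) and recall (60%); the helper name `f1_score` is ours and is not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values as quoted in the abstract.
f1 = f1_score(0.46, 0.60)
print(f"F1 = {f1:.4f}")
```

The result is about 0.52, in line with the reported 52.07%. Any small difference in the last digit is presumably because the quoted precision and recall are themselves rounded.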
The Minimum Weight Clique Partition Problem and its Application to Structural Variant Calling
by Matthew Hayes, Derrick Mullins
Abstract: The calling of genomic structural variants (SV) in high-throughput sequencing data necessitates prior discovery of abnormally aligned discordant read pair clusters that indicate candidate SVs. Some methods for SV discovery collect these candidate variants by heuristically searching for maximal cliques in an undirected graph, with nodes representing discordant read pairs and edges between vertices indicating that the read pairs overlap. This approach works well for identifying clusters that overlap with noisy mapping artifacts, but could miss distinct variant clusters that are created due to complex structural variants or overlapping breakpoints of distinct SVs. In this paper, we consider the Minimum Weight Clique Partition Problem and its application to the problem of discordant read pair clustering. Our results demonstrate that methods which approximate or heuristically solve this problem can enhance the predictive abilities of structural variant calling algorithms.
Keywords: clique; structural variant; minimum weight clique; minimum weight clique partition.
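To make the graph formulation in the abstract above concrete, the sketch below builds an overlap graph implicitly and greedily partitions its nodes into cliques. It is only an illustration, not the authors' method. Read pairs are reduced to hypothetical `(start, end)` intervals, and the weights of the Minimum Weight Clique Partition Problem are ignored because the abstract does not define them.

```python
def overlaps(a, b):
    """Two closed intervals (start, end) overlap if neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


def greedy_clique_partition(intervals):
    """Partition interval indices into groups whose members all pairwise overlap.

    Each interval, taken in sorted order, joins the first existing clique in
    which it overlaps every member; otherwise it starts a new clique. This is a
    heuristic and does not guarantee an optimal partition.
    """
    cliques = []
    for idx in sorted(range(len(intervals)), key=lambda k: intervals[k]):
        for clique in cliques:
            if all(overlaps(intervals[idx], intervals[j]) for j in clique):
                clique.append(idx)
                break
        else:
            cliques.append([idx])
    return cliques


# Hypothetical discordant read-pair spans (illustrative coordinates only).
read_pairs = [(100, 200), (150, 250), (180, 220), (1000, 1100), (1050, 1150)]
clusters = greedy_clique_partition(read_pairs)
print(clusters)
```

Every node ends up in exactly one clique, which is what distinguishes a clique partition from a search for maximal cliques, where a read pair may belong to several cliques.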
Rapid Evolution of Expression Levels in Hepatocellular Carcinoma
by Fan Zhang, Michael Kuo
Abstract: Human evolution and cancer evolution have been researched for several years, but little is known about the molecular similarities between them. One interesting and important question when comparing human evolution and cancer evolution is whether cancer susceptibility is related to human evolution. There are a few microarray studies on human evolution or cancer development. Yet, to date, no microarray studies have been performed on both. Since cancer is evolution on a small scale of time and space, we compared and analyzed liver gene expression data among orangutan, chimpanzee, human, nontumor tissue, and primary cancer using a linear mixed model, Analysis of Variance (ANOVA), Gene Ontology (GO), and Human Evolution Based Cancer Gene Expression Analysis. Our results revealed not only rapid evolution of expression levels in hepatocellular carcinoma relative to the rate of gene expression evolution in human, but also a correlation between human-specific gene expression and cancer-specific gene expression. Further gene ontology analysis also suggested that the statistical relationship between gene function and expression pattern might help in understanding the relationship between human evolution and cancer development.
Keywords: cancer evolution; gene expression analysis; pathway analysis; Hepatocellular Carcinoma.