Forthcoming articles

International Journal of Computational Biology and Drug Design (IJCBDD)

These articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

International Journal of Computational Biology and Drug Design (23 papers in press)

Regular Issues

  • Data Acquisition and Electrical Instrumentation Engineering Modelling for Intelligent Learning and Recognition
    by Jun Qin, Yuhao Jiang 

  • Development of interactive computer learning program for genetics and molecular biology applications
    by Xiaoli Yang, Bin Chen, Yifan Cai, Charles Tseng 

  • In silico studies of bioactive phytocompounds with anticancer activity from in vivo and in vitro extracts of Justicia wynaadensis (Nees) T.Anderson
    by Vandana C D, Shanti K N, Prashantha Karunakar, Vivek Chandramohan
    Abstract: The current study aims to substantiate the anticancer activity of phytocompounds identified from extracts of in vivo and in vitro propagated Justicia wynaadensis (Nees) T. Anderson using in silico molecular docking and dynamics studies. Initially, GC-MS analysis of a cold aqueous extract of dried Justicia wynaadensis leaf was performed. A few phytocompounds were selected from the GC-MS results of the aqueous extract and of the methanolic extracts of callus and in vitro propagated leaf of Justicia wynaadensis. The phytocompounds were selected through a literature survey based on their anticancer activity. In total, twelve ligands were docked with the Thymidylate synthase protein, and their binding energies and efficiencies were analyzed and compared with the reference drug Capecitabine. The docking results suggested the presence of compounds with anticancer activity. All twelve ligands showed binding affinities ranging from -5.0 kcal/mol to -8.4 kcal/mol: Campesterol showed -8.4 kcal/mol, Stigmasterol -8.3 kcal/mol, Squalene -6.3 kcal/mol, and Vitamin E acetate, Phytol and Coumarin -6.1 kcal/mol. To investigate the mechanism of action of Campesterol, Stigmasterol and Capecitabine with the target protein, these compounds were subjected to dynamic simulation; the results revealed that Campesterol was more stable than Stigmasterol and could be used as a potential lead-like molecule.
    Keywords: Campesterol; Stigmasterol; GC-MS; Molecular dynamics; Thymidylate synthase.

  • Importance of safety maintenance of the survived with recent former infection experience during a pandemic syndrome episode: A Study by Difference Equation Approach
    by Subhasis Bhattacharya, Suman Paul, Sudip Mukherjee
    Abstract: During the outbreak of a highly infectious viral disease, managing the healthcare catastrophe is the most critical task. Any relaxation, whether deliberate or inadvertent, may cause enormous loss of life. The present study considers the concern that survivors with a recent infection history must be handled with appropriate care throughout the outbreak; otherwise the state may suffer great harm. The study uses difference equation modelling to compare two scenarios: one in which survivors with a recent infection history are handled with care and not counted as part of the sustained population, and another in which they are included in the general population. The study considers an example of a hypothetical state with given infection, death and quarantine rates. Using the R programming language, the study finds that proper care for this group is very significant in reducing the loss of human life.
    Keywords: Infectious disease; SARS-CoV-2; 2019-nCov; Difference Equation; Survived from the infected; Quarantine rate; Death Rate.

  • Ultrasonic-assisted rapid extraction of Cassia Sieberiana D.C.: A Box-Behnken design process optimization
    by Saidu Jibril, Hasnah Mohd Sirat, Norazah Basar, Roswanira Abdul Wahab, Naji A. Mahat, Wan Mohd Nuzul Hakimi Wan Salleh 
    Abstract: Ultrasonic-assisted extraction (UAE) on the Cassia sieberiana root was optimized by response surface methodology (RSM) using a three-factor-three-level Box-Behnken design (BBD). A single factor experiment revealed the logic ranges for the UAE parameters for extraction time, temperature and solvent to sample ratio were 10−60 min, 30−60
    Keywords: Box-Behnken; Cassia sieberiana; ultrasonic-assisted extraction; soxhlet extraction; optimization.

  • Pharmacokinetic and molecular docking studies of natural plant compounds of Hibiscus sabdariffa to design antihypertensive compounds targeting AT2R
    Abstract: The renin-angiotensin system (RAS) plays a major role in maintaining homeostasis of the cardiovascular system by maintaining fluid and electrolyte balance. The biological functioning of the RAS is mediated by the octapeptide angiotensin II (Ang II). Ang II binds to two kinds of receptors, angiotensin type 1 receptor (AT1R) and angiotensin type 2 receptor (AT2R), to mediate its action. In humans, AT1R mediates effects such as vasoconstriction, the release of aldosterone, and sodium absorption. AT2R is very abundant in fetuses and neonates and is believed to promote vascular growth. AT2R is regarded as the protective arm of the RAS and mainly counteracts the effects produced by AT1R. AT2R also plays a role in mediating anti-proliferation, vasodilatation, cellular differentiation, and apoptosis. In the present study, the natural plant compounds delphinidin 3-O-β-sambubioside, delphinidin-3-O-glucoside, and cyanidin-3-sambubioside, which are anthocyanins (members of the flavonoid group), were exploited for controlling the action of AT2R, as these are natural colorants found in Hibiscus and exhibit antihypertensive properties. Delphinidin 3-O-β-sambubioside, delphinidin-3-O-glucoside, and cyanidin-3-sambubioside showed good binding affinities with AT2R of -8.2 kcal/mol, -8.4 kcal/mol, and -9.0 kcal/mol, respectively. The physicochemical properties of these compounds were also calculated computationally. The standard drug for AT2R, compound 21 (C21), showed a binding affinity of -9.1 kcal/mol, which is close to that of these compounds. The findings of the present study provide a new starting point for drug design and discovery for hypertension targeting AT2R from these naturally occurring organic compounds. Overall, this study provides a set of lead molecules that can be further explored through in vitro and in vivo experiments for the development of potential drugs against hypertension targeting AT2R.
    Keywords: Angiotensin II type 2 receptor; Delphinidin 3-O-β-sambubioside; cyanidin-3-sambubioside; Hibiscus sabdariffa.

  • A GPU BASED VIRTUAL SCREENING TOOL USING SOM
    by Jayaraj P B, Mithun K M, Gopakumar G, Jaleel U C A
    Abstract: This paper introduces the applicability of low-cost GPU alternatives to virtual screening using a novel SOM-based technique. The method combines the unsupervised learning capability of the SOM with a subsequent supervised labeling of the trained SOM neurons to build the prediction model. This iteration-based SOM technique can label molecules as belonging to undefined classes, other than actives and inactives, which can reduce the false positives in the screening. The iteration-based refinement technique applied in the algorithm gives accuracy comparable with previous methods. For large datasets, the serial implementation of the proposed algorithm is very time-consuming and cannot be completed in a stipulated time frame. This has been overcome by exploiting the parallelism present in the winner-neuron search and neuron weight update steps. A tool named SOMSCREEN was developed based on the proposed parallelized method to make the drug discovery process faster. The parallelized algorithm speeds up the virtual screening process considerably when implemented on a Graphics Processing Unit (GPU). The proposed method is observed to offer a lower false positive rate than the Random Forest based work. The source code and related files of the implementation are available at
    Keywords: Ligand based drug design; Artificial Neural Network; Virtual Screening; Self Organizing Map; Neuron.

  • Molecular Docking, ADME and Toxicity Study of Some Chemical and Natural Plant Based Drugs against COVID-19 Main Protease
    by Rajesh Das
    Abstract: The novel human coronavirus disease COVID-19, caused by SARS-CoV-2, which first emerged in Wuhan, is responsible for respiratory illness. The virus has spread rapidly across 212 countries and territories. In view of the non-availability of any secure vaccine, scientists around the world have been racing to develop potential inhibitors against SARS-CoV-2. The present study helps to identify and screen the best compounds (chemical drugs or plant-based compounds) as potent inhibitors against COVID-19. In this study, we measured the virtual interaction of the COVID-19 main protease (PDB: 6LU7) with lung cancer, bronchitis and blood thinner drugs as well as some natural plant-based compounds. The best docking results were considered on the basis of disulfiram, tideglusib and shikonin as proposed by Jin et al. Molecular docking was performed with AutoDock 4.2 (ADT4.2). All chemical structures were optimized with the MMFF94 force field in the Avogadro suite. The final visualization of the docked structures was performed using Discovery Studio Visualizer. The binding energies obtained from docking of the COVID-19 main protease with the ligands capmatinib, dabrafenib, alectinib, afatinib, trametinib, ceritinib, entrectinib, brigatinib, crizotinib, lorlatinib, osimertinib, tetracycline, amiodarone and zafirlukast were -10.59, -9.82, -9.79, -9.75, -9.74, -9.34, -9.13, -8.66, -8.60, -8.56, -8.42, -9.04, -8.47 and -9.81 kcal/mol, respectively. Similarly, the binding energies obtained from the docking of the plant-based compounds (ligands) orientin, vicenin, cirsimartin, cirsilineol, apigenin, isothymusin, shogaol, paradol, gingerol and vasicine were -7.95, -8.19, -7.65, -7.45, -7.56, -7.14, -6.61, -6.63, -6.12 and -6.08 kcal/mol, respectively. ADME, drug-likeness and toxicity properties were evaluated with preADMET, Molinspiration and Osiris. From this study, we expect these drugs to undergo validation in human clinical trials for use as promising candidates for antiviral treatment with high potential to fight against COVID-19.
    Keywords: COVID-19 main protease; lung cancer drug; bronchitis drug; blood thinner drug; plant based compounds; molecular docking; ADME; toxicity.

  • Molecular docking studies of Staphylococcal clumping factor A inhibitors from Elettaria cardamomum and Acacia nilotica
    by Rosy Kumari, Ratish Chandra Mishra, Shivani Yadav, Jaya Parkash Yadav
    Abstract: Clumping factor A (ClfA) is a cell wall adhesin protein of methicillin-resistant Staphylococcus aureus (MRSA) that plays an important role in interaction with the host. In the present study, virtual screening of potential inhibitors from Elettaria cardamomum and Acacia nilotica was carried out against ClfA using AutoDock 4.0. Top-scoring phytoligands were further subjected to Absorption, Distribution, Metabolism, Excretion (ADME) analysis. Among the ninety-nine phytochemicals screened, santalol, stigmasterol, undecylenic acid, β-sitosterol, bergamotol and geraniol showed high dock scores against ClfA. In addition, undecylenic acid, β-sitosterol and geraniol follow Lipinski's rule and do not inhibit the metabolic enzyme cytochrome P450. Therefore, these compounds can be a potential source for drug development against MRSA.
    Keywords: MRSA; ClfA; Elettaria cardamomum; Acacia nilotica; Phytoligands; Molecular docking; Drug-likeness; ADME; MBE; PDB.

  • In Silico Neuroprotective Properties of Volatile Constituents of Grape (Vitis vinifera L.) Seed Extract Against Parkinson’s Disease
    by Venkatramanan Varadharajan
    Abstract: Aggregation of α-synuclein is one of the significant factors in the pathogenesis of Parkinson’s disease (PD). Many natural extracts demonstrate neuroprotective activity against PD by inhibiting α-synuclein aggregation. In India, grape extract is traditionally used as a brain tonic to boost memory power. Hence, in this study, the neuroprotective activity of grape seed extract against Parkinson’s disease was investigated under in silico conditions. Molecular docking studies indicated that volatile constituents of the ethanolic extract of grape seed could bind to both the C-terminal and N-terminal regions of α-synuclein, with more preference towards the formation of hydrogen bonds than hydrophobic interactions. Also, the compounds most commonly interacted with the residues Val40, Lys43 and Lys45 of the N-terminal to form both hydrogen bonds and hydrophobic interactions. Among the compounds studied, the Dasycarpidan-1-methanol, acetate (ester) molecule showed superior binding affinity, Blood-Brain Barrier penetration, drug-likeness and lead-likeness properties.
    Keywords: Parkinson’s disease; α-synuclein; Grape seed extract; Volatiles; Molecular docking; Neuroprotective properties; in silico; Interaction; Blood-Brain Barrier; Drug-likeness.

  • Identification of potential anti-obesity drug scaffolds using molecular modeling
    by Amie Jobe, Bincy Baby, Amanat Ali, Ranjit Vijayan 
    Abstract: The prevalence of obesity has remarkably increased in recent decades. An important strategy to combat obesity is to reduce the imbalance between energy intake and expenditure. Pancreatic lipase (PL) and acetyl coenzyme A carboxylase 2 (ACC2) are two promising targets for therapeutic treatment of obesity. In silico techniques including high throughput virtual screening, binding free energy calculations, and molecular dynamics (MD) simulation were used to identify molecules with good potential to inhibit these targets. Derivatives of coumaran-3-one and dioxabicyclo[3.3.0]octane-2,6-diamine are likely to possess inhibitory potential against PL while acetamide and hexanamide derivatives showed inhibitory potential against ACC2. MD simulations of the top scoring molecules confirmed that the identified molecules bind strongly and consistently in the binding site of PL and ACC2. The shortlisted molecules exhibited better interactions and affinity when compared to control molecules and thus could be explored as scaffolds for the development of anti-obesity drugs.
    Keywords: obesity; pancreatic lipase; acetyl coenzyme A carboxylase 2; molecular dynamics; molecular docking.

  • Development of a RNA-Seq based Prognostic Signature for Colon Cancer
    by Bjarne Bartlett, Yong Zhu, Mark Menor, Vedbar Khadka, Jicai Zhang, Jie Zheng, Bin Jiang, Youping Deng 
    Abstract: RNA-Seq data has recently been used to successfully develop prognostic signatures to predict cancer patients who will have a worse prognosis. We designed a study to ascertain whether a prognostic model, based on RNA-Seq data, would have clinical utility for predicting survival in patients with colon adenocarcinomas (COAD). Of particular interest are early stage COAD patients, for whom the benefits of adjuvant therapy are unclear. Data from 468 COAD patients from The Cancer Genome Atlas (TCGA) were obtained and divided into two datasets: training (n=312) and validation (n=156). The training cohort was used to develop a prognostic signature by using univariate cox analysis to assess the prognostic potential of each gene and subsequently building a prognostic model using multivariate cox analysis. Patient survival was compared between different risk groups based on predictions from our model and cancer stage. In the training cohort, univariate cox analysis identified 15 genes (p
    Keywords: Colon cancer; RNA-Seq; Biomarkers; The Cancer Genome Atlas.

  • The Spreading of Covid-19 in India and its impact: A Mathematical Analysis
    by Bibhatsu Kuiri, Bubai Dutta, Saikat Santra, Paulomi Mandal, Khaleda Mallick, Ardhendu S. Patra
    Abstract: The rapid spread of the coronavirus in India and its behavior in the near future have been studied and analyzed as accurately as possible using the SEIR model as a fundamental tool. The official Covid-19 data on infected and death cases in India up to 10th October, 2020 have been used as raw data. The values of the model parameters are optimized by feeding the raw data into the simulation model. The parameters include infection rate, basic reproduction number, death rate, recovery time, exposure time, and others needed to obtain the best-fit model. The total population of India is taken as 1.36 billion people. The simulation indicates that the number of recovered people will be 2.8 × 10^8 and the number of deaths will be 4.2 × 10^6 after 800 days for the total population of India. In an ideal scenario, the total death count at the end of the pandemic is expected to be of the order of 10^6, which is a big challenge.
    Keywords: Coronavirus (Covid-19); Simulation; SEIR model; India.

  • Interaction Network of Insulin Resistance Proteins with Organo-Phosphorus and Chlorine Pesticides
    by Amitha Joy, S. Balaji, Md.Afroz Alam
    Abstract: The work focuses on deciphering the underlying mechanism behind insulin resistance through exposure to pesticides. The selected organochlorine and organophosphorus pesticides were analyzed for their interactions with protein targets having a regulatory role in glucose metabolism and the insulin signaling pathway. Their binding affinities were assessed through docking studies using AutoDock. Nine pesticides gave minimum binding energy values ranging from -5.17 to -9.79 kcal/mol. A protein interaction network and a pesticide network were generated. A merged network, showing the binding affinities of pesticides with protein targets, was also generated using Cytoscape. An understanding of the molecular interactions between pesticides and various protein targets can help in designing new lead molecules to treat pesticide-driven insulin resistance.
    Keywords: pesticides; insulin resistance; targets; docking; organochlorine; organophosphate; STITCH.
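
For readers unfamiliar with the SEIR model named in the Covid-19 entry above, the following is a minimal sketch of how such a compartmental simulation can be stepped forward in time. It is not the authors' implementation: the function name `simulate_seir`, the forward-Euler scheme, the extra death compartment and every parameter value except the 1.36 billion population and the 800-day horizon quoted in the abstract are assumptions chosen for illustration.

```python
def simulate_seir(population, beta, sigma, gamma, mu, days, exposed0=1.0, dt=0.1):
    """Forward-Euler integration of a basic SEIR model with a death compartment.

    beta  : transmission rate per day (hypothetical value, not fitted)
    sigma : 1 / exposure time in days
    gamma : 1 / recovery time in days
    mu    : death rate of infectious people per day
    Returns one (S, E, I, R, D) tuple per day, starting with day 0.
    """
    s, e, i, r, d = population - exposed0, exposed0, 0.0, 0.0, 0.0
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r, d)]
    for _ in range(days):
        for _ in range(steps_per_day):
            n = s + e + i + r                    # living population
            new_exposed = beta * s * i / n * dt  # S -> E
            new_infectious = sigma * e * dt      # E -> I
            new_recovered = gamma * i * dt       # I -> R
            new_deaths = mu * i * dt             # I -> D
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered - new_deaths
            r += new_recovered
            d += new_deaths
        history.append((s, e, i, r, d))
    return history


if __name__ == "__main__":
    # Placeholder rates for illustration only; they are NOT the paper's fitted values.
    hist = simulate_seir(population=1.36e9, beta=0.25, sigma=1 / 5,
                         gamma=1 / 10, mu=0.001, days=800)
    s, e, i, r, d = hist[-1]
    print(f"recovered after 800 days: {r:.3g}, deaths: {d:.3g}")
```

In this formulation the basic reproduction number is beta / (gamma + mu). Fitting the rates to reported case and death counts, as the abstract describes, would require an optimization step that this sketch leaves out.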

Special Issue on: ICIBM 2019 State-of-the-art Computational Methods and Tools for Analysis of High-dimensional Biological and Biomedical Datasets

  • Skyhawk: An Artificial Neural Network-based discriminator for reviewing clinically significant genomic variants
    by Ruibang Luo, Tak-Wah Lam, Michael Schatz
    Abstract: Motivation: Many rare diseases and cancers are fundamentally diseases of the genome. In the past several years, genome sequencing has become one of the most important tools in clinical practice for rare disease diagnosis and targeted cancer therapy. However, variant interpretation remains the bottleneck, as it is not yet automated and may take a specialist several hours of work per patient. On average, one-fifth of this time is spent on visually confirming the authenticity of the candidate variants. Results: We developed Skyhawk, an artificial neural network-based discriminator that mimics the process of expert review on clinically significant genomic variants. Skyhawk takes less than one minute to review ten thousand variants, and about 30 minutes to review all variants in a typical whole-genome sequencing sample. Among the false positive singletons identified by GATK HaplotypeCaller, UnifiedGenotyper and 16GT in the HG005 GIAB sample, 79.7% were rejected by Skyhawk. When applied to Variants with Unknown Significance (VUS), Skyhawk marked most of the false positive variants for manual review and most of the true positive variants as not needing review.
    Keywords: Clinical decision support; Variant validation; Artificial neural network; Third-generation sequencing; Variant calling.

  • PgenePapers: a novel database and search tools of reported regulatory pseudogenes
    by Achal Awasthi, Yan Zhang
    Abstract: Pseudogenes arose from duplication or retroduplication of genes; however, the accumulation of mutations has disabled their protein-coding ability. Although they have been thought of as genomic fossils, recent studies have shown that a considerable number of pseudogenes are actually transcribed in normal and/or cancerous human tissues, and some of them can even regulate gene expression. Studies have detected pseudogene differential expression in specific cancer subtypes, indicating potential functions of pseudogenes in cancer development and clinical relevance to disease outcomes. Together, these findings show that pseudogenes constitute a new class of modulators of gene expression; however, their roles are still largely unknown. Unlike coding genes, which have rich functional annotations, pseudogenes still lack functional annotations. There is not yet a database focusing on the regulatory roles of pseudogenes, even though functional studies have been published in the literature. We extracted information about regulatory pseudogenes by analyzing PubMed literature using natural language processing techniques followed by manual curation. The expression values of genes and pseudogenes for all 31 cancer types studied in TCGA were used to compute the correlation between genes and pseudogenes. Based on this information, we reconstructed the regulatory networks involving pseudogenes and regulated genes (pseudogene-gene pairs) with disease and tissue specific annotations. We further extended the pseudogene-gene networks to include information on potential miRNAs and drugs targeting components of the networks, based on expression profiles, miRNA binding predictions and known FDA approved drugs. We developed the first comprehensive database of reported regulatory pseudogenes. To facilitate the usage of the database, we also developed a user-friendly app called PgenePapers, which allows flexible database search and provides network visualization. The PgenePapers app can display the pseudogene-gene pairs with their functional categories, all the supporting text from the literature, interactive visualization of the pseudogene-gene association networks, and customized gene-pseudogene-miRNA-drug networks.
    Keywords: regulatory pseudogene; database; search tools; graph presentation; correlation network; Shiny app.

  • Generating Simulated SNP array and Sequencing Data to Assess Genomic Segmentation Algorithms
    by Mark Zucker, Kevin Coombes
    Abstract: In order to validate methods for the analysis of high throughput data, it is necessary to obtain data for which the underlying truth is known, so one can verify the accuracy of inferences made by the method and thus quantify the confidence with which it can make inferences. Knowing the ground truth can be extraordinarily difficult in biology, since one can essentially never know, even in highly controlled conditions, what proportion of cells have what aberrations in a bulk cell sample, particularly in populations of aberration-prone cancer cells. For this reason, the ability to simulate SNP array and DNA sequencing data that recapitulate the variance structure and population complexity of real biological samples would be very useful in assessing the accuracy of and comparing bioinformatics algorithms. In particular, we discuss here the use of segmentation algorithms to identify breakpoints and copy number variation in SNP array or sequencing data. We developed a tool, implemented in an R package called TACG (True and Accurate Clone Generator), to simulate both ground truth and realistic SNP array and/or SNV data. We present this tool and apply it to the assessment of several different approaches to segmentation of copy number data from SNP arrays, with a particular interest in detecting CNVs in cancer samples. We demonstrate that DNAcopy, an algorithm using circular binary segmentation, generally performs best, which is in agreement with previous research. We further determine the conditions under which it and other methods break down. In particular, we assess how characteristics such as clonal heterogeneity, the presence of nested CNVs, and the type of aberration affect algorithm accuracy. The simulations we generated proved to be useful in determining not just the comparative overall accuracy of different algorithms, but also how their efficacy is affected by the biological characteristics of the samples from which the data were generated.
    Keywords: SNP Array; copy number alteration; cancer; simulation.

  • Predicting Re-admission to Hospital for Diabetes Treatment: A Machine Learning Solution
    by Satish M. Srinivasan, Yok-Fong Paat, Philmore Halls, Ruth Kalule, Thomas E. Harvey
    Abstract: Background: Predictive analytics embraces an extensive range of techniques including, but not limited to, statistical modelling, Machine Learning, Artificial Intelligence and Data Mining. It has profound usefulness in different applications such as business intelligence, public health, disaster management and response, as well as many other fields. This technique is well known as a practice for identifying patterns within data to predict future outcomes and trends. The objective of this study is to design and implement a predictive analytics system that can be used to forecast the likelihood that a diabetic patient will be readmitted to the hospital. Results: After extensively cleaning the Diabetes 130-US hospitals dataset, which contains patient records spanning 10 years from 1999 to 2008, we modelled the relationship between the predictors and the response variable using the Random Forest classifier. After performing hyperparameter optimization for the Random Forest, we obtained a maximum AUC of 0.684 with a precision and recall of 46% and 60% respectively and an F1 Score of 52.07%. Our study reveals that attributes such as number of inpatient visits, discharge disposition, admission type, and number of laboratory tests are strong predictors for the response variable (i.e. re-admission of patients). Conclusion: Findings from this study can help hospitals design suitable protocols to ensure that patients with a higher probability of re-admission are recovering well and possibly reduce the risk of future re-admission. In the long run, not only will our study improve the quality of life of diabetic patients, it will also help reduce the medical expenses associated with re-admission.
    Keywords: Random Forest; Data Cleaning; Predictive Analytics; Hyperparameter tuning; optimization.

  • The Minimum Weight Clique Partition Problem and its Application to Structural Variant Calling
    by Matthew Hayes, Derrick Mullins 
    Abstract: The calling of genomic structural variants (SV) in high-throughput sequencing data necessitates prior discovery of abnormally aligned discordant read pair clusters that indicate candidate SVs. Some methods for SV discovery collect these candidate variants by heuristically searching for maximal cliques in an undirected graph, with nodes representing discordant read pairs and edges between vertices indicating that the read pairs overlap. This approach works well for identifying clusters that overlap with noisy mapping artifacts, but could miss distinct variant clusters that are created due to complex structural variants or overlapping breakpoints of distinct SVs. In this paper, we consider the Minimum Weight Clique Partition Problem and its application to the problem of discordant read pair clustering. Our results demonstrate that methods which approximate or heuristically solve this problem can enhance the predictive abilities of structural variant calling algorithms.
    Keywords: clique; structural variant; minimum weight clique; minimum weight clique partition.

  • Rapid Evolution of Expression Levels in Hepatocellular Carcinoma
    by Fan Zhang, Michael Kuo
    Abstract: Human evolution and cancer evolution have been researched for several years, but little is known about the molecular similarities between them. One interesting and important question when comparing human evolution and cancer evolution is whether cancer susceptibility is related to human evolution. There are a few microarray studies on human evolution or cancer development, yet to date no microarray studies have examined both. Since cancer is an evolution on a small time and space scale, we compared and analyzed liver gene expression data among orangutan, chimpanzee, human, nontumor tissue, and primary cancer using a linear mixed model, Analysis of Variance (ANOVA), Gene Ontology (GO), and Human Evolution Based Cancer Gene Expression Analysis. Our results revealed not only rapid evolution of expression levels in hepatocellular carcinoma relative to the gene expression evolution rate of humans, but also a correlation between human-specific gene expression and cancer-specific gene expression. Further gene ontology analysis also suggested that the statistical relationship between gene function and expression pattern might help in understanding the relationship between human evolution and cancer development.
    Keywords: cancer evolution; gene expression analysis; pathway analysis; Hepatocellular Carcinoma.

  • LCLE: a web portal for lncRNA network analysis in liver cancer
    by Xiuquan Wang, Keli Xu, Junqing Wang, Yunyun Zhou
    Abstract: Most currently available co-expression network analysis methods capture only linear correlations among genes and ignore non-linear dependencies. Obtaining distance values among genes accurately and easily is of significant importance in clustering genes that share the same biological functions. We developed an online tool, LCLE, which can systematically analyze gene expression data to identify more comprehensive relationships among lncRNAs and protein-coding genes (PCGs) using five different distance metrics. Our simulation results demonstrated that selecting an appropriate distance method could help to identify novel important genes from networks. Users can download and visualize figures and distance tables analyzed from publicly available RNAseq data such as TCGA and GTEx, or upload their own data for analysis. Overall, our web portal will benefit biologists and clinicians without a programming background in identifying novel co-regulation relations for lncRNAs and PCGs.
    Keywords: adjacency matrix; network analysis; correlation; non-coding RNA; cancer.

  • PATH: An interactive web platform for analysis of time-course high-dimensional genomic data
    by Yuping Zhang, Yang Chen, Zhengqing Ouyang 
    Abstract: Discovering patterns in time-course genomic data can provide insights on the dynamics of biological systems in health and disease. Here, we present a Platform for Analysis of Time-course High-dimensional data (PATH) with applications in genomics research. This web application provides a user-friendly interface with interactive data visualization, dimension reduction, pattern discovery, feature selection based on the Principal Trend Analysis. Furthermore, the web application enables interactive and integrative analysis of time-course high-dimensional data based on the Joint Principal Trend Analysis. The utilities of PATH are demonstrated through simulated and real examples. PATH is freely accessible at
    Keywords: Dimension reduction; Longitudinal data; Visualization; Interactive analysis; Feature selection; Joint analysis.

  • On the analysis of the human immunome via an information theoretical approach
    by Maciej Pietrzak, Gerard Lozanski, Michael Grever, Leslie Andritsos, James Blachly, Kerry Rogers, Michal Seweryn
    Abstract: Deep phenotyping of the cellular components of the immune system (the immunome) makes it possible to gain insight into and decompose the multilayer immune network in both health and disease. This area requires computational approaches that can detect not only large-scale changes in the dominant components of the immunome, but also consistent differences in the non-abundant components. In this note, we build upon an approach that the authors developed in the context of T-cell antigen receptor repertoire analysis and develop an algorithm that scores cell populations. We show that our feature selection algorithm is at least as sensitive to signal as other selected machine learning tools. At the same time, our method retains a low level of false positives. We also demonstrate that we are able to identify a set of positive controls in real-life immunome data from Hairy Cell Leukemia patients and detect other biologically relevant cell populations in this context.
    Keywords: Immunome; Renyi’s entropy; Shannon entropy; Renyi’s divergence; Contingency tables; I-Index; hairy cell leukemia.
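
As a reading aid for the diabetes re-admission entry above, the reported F1 score can be checked against the reported precision and recall, since F1 is their harmonic mean. The worked step below uses only the rounded figures quoted in that abstract (precision 46%, recall 60%); the small gap to the reported 52.07% is presumably due to that rounding, which is an assumption on our part.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.46 \times 0.60}{0.46 + 0.60}
    = \frac{0.552}{1.06}
    \approx 0.521
```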