International Journal of Bioinformatics Research and Applications (34 papers in press)
Bioinformatics Resources and Approaches for The Interaction of Oryza Sativa and Magnaporthe Oryzae Pathosystem
by Vinay Sharma, Varshika Singh, Pramod Katara
Abstract: Rice is a major cereal crop and serves as a staple food for a large part of the world's population. Rice blast, caused by Magnaporthe oryzae, is a very important disease of rice that reduces production and occurs wherever rice is grown. It is also considered a model disease for the study of the genetics and molecular pathology of host-pathogen interactions. Numerous comprehensive studies on both the host and the pathogen have been carried out using genomics, proteomics and bioinformatics approaches. Consequently, an enormous amount of information has been made available for researchers to carry out further work on this pathosystem. Bioinformatics has played a significant role in storing the data produced by various wet-laboratory experiments and interpreting them as useful biological information. This review presents an overview of the bioinformatics resources and approaches for the study of the rice-Magnaporthe interaction.
Keywords: bioinformatics; disease; nucleotide sequence; pathogen; database; host-pathogen interaction; rice blast; genomics.
Efficient Formulation of the Rejection-based Algorithm for Biochemical Reactions with Delays
by Vo Thanh, Roberto Zunino, Corrado Priami
Abstract: The rejection-based stochastic simulation algorithm (RSSA) is an exact simulation algorithm for realizing the temporal behavior of biochemical reactions. It reduces the number of propensity updates during the simulation by using propensity bounds of reactions to select the next reaction firing. We present in this paper a new efficient formulation of RSSA and extend it to incorporate biochemical reactions with time delays. Our new algorithm explicitly keeps track of the putative firing times of reactions and uses these to select the next reaction firing. By using such a representation, it can efficiently handle biochemical reactions with delays and achieve computational efficiency over existing approaches for exact simulation.
Keywords: Computational biology; Stochastic simulation; Rejection-based stochastic simulation algorithm.
Exploring New Features of α-amylases from Different Source Organisms by an In Silico Approach
by Javad Harati
Abstract: A total of 78 full-length protein sequences of α-amylase from different source organisms were subjected to phylogenetic analysis, multiple sequence alignment (MSA), motif search, and physicochemical property analysis. The phylogenetic tree was built using the Maximum Likelihood (ML) method in Molecular Evolutionary Genetics Analysis (MEGA) software and resolved into two major clusters. One of the clusters included plants and animals, whereas the other contained fungi, archaea, and bacteria. Furthermore, Firmicutes and Proteobacteria are bacterial phyla that were placed in the same evolutionary cluster as plants and animals. The deviations from the expected clusters were explained by both motif analysis data and the construction of a new tree. MSA revealed three conserved sequence blocks, 505-527, 725-745, and 1010-1030, that were present in all studied species. Moreover, it identified highly conserved residues: three glycine residues and one aspartic acid residue. Motif analysis with the Multiple EM for Motif Elicitation (MEME) server revealed that motif 4 HDTGSTQRHWPFPSDHVMQGYAYILTHPGIPCIFYDHFFDW, motif 6 EGAGGPSTAFDFTTKGILQEAVKGELWRLRDPQGKPPGMIGWWPERAVTF, and motif 11 EQIVKLIAIRKRNGIHSRSSIRILEAEGDLYVAMIDEKVCMKIG were present only in plants. Pearson correlation analysis of the relationships among different physicochemical properties showed a direct correlation between GRAVY and the aliphatic index and an inverse correlation between GRAVY and both pI and the instability index.
Keywords: α-Amylase; Sequence analysis; Phylogenetic analysis; Conserved regions and residues; Physicochemical characteristics.
Computational Protein Design of Bacteriocins based on structural scaffold of aureocin A53
by Sekhar Talluri
Abstract: Bacteriocins are highly potent polypeptide and protein antibiotics produced by bacteria. Because of their proteinaceous nature, they are rapidly degraded in the environment after use. Some bacteriocins are used as preservatives in foods. Native and engineered bacteriocins are of potential interest as replacements for conventional antibiotics that are losing their efficacy due to the development of antibiotic-resistant strains. Aureocin A53 is a class II bacteriocin. It is a broad-spectrum antibiotic with demonstrated ability to inhibit the growth of methicillin-resistant Staphylococcus aureus (MRSA). Validated computational protein design tools have been used to re-engineer the Aureocin A53 sequence to produce novel sequence variants of the bacteriocins Aureocin A53 and Lacticin Q. The novel proteins are expected to possess an altered spectrum of bactericidal specificity and potency. The quality of the designed proteins was assessed using structure validation tools and is predicted to be better than that of an average experimentally determined protein structure. The protein designed using FoldX is predicted to be more stable than native Aureocin A53.
Keywords: Bacteriocin; computational protein design; antibiotic; protein engineering; molecular modeling; MRSA (methicillin-resistant Staphylococcus aureus).
Molecular docking and in vitro study of S. cumini-derived natural compounds on Receptor tyrosine kinases pathway components
by Pushpendra Singh, Felix Bast, Satej Bhushan, Richa Mehra, Pooja Kamboj
Abstract: Syzygium cumini (S. cumini) is used for a variety of biological activities, such as anti-inflammatory, antidiabetic and antioxidant activity, and it has recently been reported to protect DNA against radiation. Receptor tyrosine kinases (RTKs) are recognized to control various biological processes, including cell proliferation, metabolism, and apoptosis. These receptors have recently attracted attention as a target for cancer treatment because of evidence of their over-expression in cancer cells. The present research screened S. cumini-derived natural compounds against RTK pathway components by molecular docking. Furthermore, the in vitro anticancer activity of a leaf extract of S. cumini, assessed by cell proliferation (MTT) and oxidative stress (NBT and H2CDFD) assays, is reported. All selected natural compounds were docked with the X-ray crystal structures of RTK signaling proteins using GLIDE (Grid-based ligand docking with energetics) in Maestro 9.6. Our results highlight that myricetin, kaempferol, delphinidin chloride, ellagic acid, rutin, petunidin, gossypol, and mirtillin yielded good dock scores with all selected proteins. Protein-ligand interactions showed that lipophilic, hydrogen bonding, π-π stacking, and cation-π interactions make the dominant contribution at the active site. Moreover, reduction in cell viability with leaf extract of S. cumini treatment at concentrations of 5
Keywords: Cancer; Receptor tyrosine kinases; Phosphoinositide-3 Kinase; Natural product compounds; Maestro 9.6.
Cell-Level 3D Reconstruction and Quantification of the Drosophila Wing Imaginal Disc
by David Breen, Liyuan Sui, Linge Bai, Frank Jülicher, Christian Dahmann
Abstract: We describe a set of techniques that, when applied to a 3D stack of confocal microscopy images, produces a volumetric model of an epithelial tissue, as well as a mesh model of its apicolateral cell boundaries. Via a projection step, detailed 3D models that approximate the individual cells in the epithelium are then defined. Once the individual cells are generated, their apical face area, height and volume may be computed and visualised, providing quantitative and visual data about the patterns of cells within the tissue. We have applied the techniques to the analysis of the developing wing imaginal disc of a late-larval Drosophila melanogaster. Our techniques are being applied to a series of specimens in an investigation that intends to quantitatively substantiate observed cell shape changes that occur during wing imaginal disc development.
Keywords: Reconstruction; implicit models; epithelial tissues; wing imaginal disc; visualisation.
Construction of Discrete Descriptions of Biological Shapes through Curvilinear Image Meshing
by Jing Xu, Andrey Chernikov
Abstract: Mesh generation is a useful tool for obtaining discrete descriptors of biological objects represented by images. The generation of meshes with straight sided elements has been fairly well understood. However, in order to match curved shapes that are ubiquitous in nature, meshes with curved (high-order) elements are required. Moreover, for the processing of large data sets, automatic meshing procedures are needed. In this work, we present a new technique that allows for the automatic construction of high-order curvilinear meshes. This technique allows for a transformation of straight-sided meshes to curvilinear meshes with C1 or C2 smooth boundaries while keeping all elements valid and with good quality as measured by their Jacobians. The technique is illustrated with examples. Experimental results show that the mesh boundaries naturally represent the objects' shapes, and the accuracy of the representation is improved compared to the corresponding linear mesh.
Keywords: biomedical image processing; high-order mesh generation; B.
RECENT ADVANCEMENT IN NEXT-GENERATION SEQUENCING TECHNIQUES AND ITS COMPUTATIONAL ANALYSIS
by Khalid Raza, Sabahuddin Ahmad
Abstract: Next Generation Sequencing (NGS), a recently evolved technology, has contributed greatly to research and development. This novel approach has critical advantages over traditional Capillary Electrophoresis (CE) based Sanger sequencing. The advancement of NGS has led to numerous important discoveries that would have been costlier and more time-consuming with traditional CE-based Sanger sequencing. NGS methods are highly parallelized, enabling thousands to millions of molecules to be sequenced simultaneously. This technology produces a huge amount of data, which needs to be analysed to extract valuable information. Specific data analysis algorithms are written for each specific task, and groups of such algorithms act together as tools for analysing NGS data. Analysis of NGS data uncovers important clues in the quest for treatments for various life-threatening diseases, improved crop varieties and other scientific problems related to human welfare. In this review, we address the basic background of NGS technologies, possible applications, computational approaches and tools involved in NGS data analysis, and future opportunities and challenges in the area.
Keywords: Massive Parallel Sequencing; Variant Discovery; DNA-Seq; RNA-Seq; Computational Analysis.
Application of machine learning techniques towards classification of drug molecules specific to peptide deformylase against Helicobacter pylori
by Surekha Patil
Abstract: It is crucial to adapt to the current computational drug discovery pipeline to develop novel drug molecules to combat the gastric disorders caused by Helicobacter pylori. Virtual screening techniques can be used as a preliminary screening tool to identify the relevant compounds which may have drug-like properties. These drug-like molecules can be further screened to test their bioactivity against a particular protein target. In this context, we apply different machine learning techniques to generate models to predict the pIC50 value of drug molecules. Molecular descriptors were produced for the drug dataset. Initial models were developed for the dataset with a large number of descriptors. Later, feature reduction techniques were applied to yield feature descriptors with best six variables using three algorithms: principal component analysis (PCA), random forest, and genetic algorithm. Consequently, machine learning techniques were applied to the reduced dataset to develop predictive models. Na
Keywords: Helicobacter pylori; gastric disorders; drug molecule; target protein; virtual screening.
Computational study to understand mechanism of isoniazid drug resistance caused by mutation (R268H) in NADH dehydrogenase of Mycobacterium tuberculosis
by Lingaraja Jena, Shraddha Deshmukh, Tapaswini Nayak, Gauri Wankhade, Bhaskar Harinath
Abstract: NADH dehydrogenase (Ndh) of Mycobacterium tuberculosis is essential for the conversion of NADH to NAD+ in the presence of FMN. An increased NADH/NAD+ ratio has been reported as a result of the mutation R268H in Ndh, causing isoniazid (INH) resistance. To study the effect of this mutation on Ndh, molecular dynamics (MD) simulation analysis was performed for the wild-type and mutant models independently as well as for the docked complexes (Ndh-NADH and Ndh-FMN). The simulation study showed that the R268H mutation affected the secondary structure of the enzyme, giving extra stability to the mutant model, as observed in the RMSD plot. Further, both the wild-type and mutant models of Ndh were quite stable in complex with NADH, but in complex with FMN the Ndh mutant appeared to be less stable. This might be the reason for the decrease in NAD+ concentration, which hinders INH-NAD adduct formation and results in isoniazid resistance.
Keywords: NADH dehydrogenase; tuberculosis; isoniazid; drug resistance; mutation; NAD.
Statistical Analysis of the in silico binding affinity of P-glycoprotein and its substrates with their experimentally known parameters to demonstrate a cost-effective approach for screening, ranking and possible prediction of potential substrates
by Suneetha Susan Cleave A, P.K. Suresh
Abstract: Over-expression of P-glycoprotein (P-gp) has been reported as a cause of multi-drug resistance in cancers and other diseases. Transport assays, which are generally used to determine whether a compound is specifically effluxed, are time-consuming, resource-intensive and expensive, and thus have inherent limitations for easily predicting a compound's specificity. Hence, there is a clear, unmet need to develop cost-effective methods for screening, identifying and ranking P-gp substrates. All compounds (23 substrates and 3 non-substrates) were docked to two homology-modeled human P-gp conformations. The in silico binding affinities obtained for all substrates were checked for correlation with their experimentally determined efflux ratios, LogP values and number of hydrogen bond acceptors. Docking results showed that the compounds differed in relative binding affinity. Experimentally derived efflux ratios obtained from the literature for 19 substrates showed, for the first time, a significant Spearman correlation with binding energies to the outward-facing conformation. Thus, binding energies obtained from docking studies may have significant potential for identifying the specificity of P-gp substrates and ranking them. This approach provides a sound foundation for strengthening the relationship of in silico binding energies with other experimentally defined physico-chemical parameters and can also be part of an iterative process to identify and develop a solution that can potentially be validated.
Keywords: Autodock; in silico binding energy; P-glycoprotein (P-gP); efflux ratio; LogP; hydrogen bond acceptors; Spearman Rank Correlation.
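As a rough illustration of the rank correlation named in the abstract above, the following minimal Python sketch computes Spearman's rho as the Pearson correlation of average ranks. The function names and the five value pairs are hypothetical and are not data from the paper; the authors' actual software and settings are not stated in this listing.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for illustration only (not taken from the paper).
binding_energy = [-9.1, -8.4, -7.9, -7.2, -6.5]  # kcal/mol, assumed units
efflux_ratio = [12.0, 9.5, 10.1, 4.3, 2.0]
rho = spearman(binding_energy, efflux_ratio)
print(round(rho, 3))
```

A negative rho here would mean that more negative (stronger) binding energies go with higher efflux ratios. A significance test, which the abstract also reports, is not included in this sketch.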
Genetic algorithm based clustering for gene-gene interaction in episodic memory
by Sudhakar Tripathi, Ravi Bhushan Mishra, Anand Sharma
Abstract: After the identification of several disease-associated polymorphisms by genome-wide association (GWA) analysis, it is now clear that gene-gene interactions are fundamental mechanisms in the development of complex diseases. In this paper, we propose a genetic algorithm based clustering algorithm to identify groups of related genes in episodic memory. This clustering method requires the number of clusters, the number of genes in each cluster, and a fitness function. We take the result of the STRING 9.1 clustering method on episodic memory as input. We use the interaction between clusters as the fitness function for the genetic algorithm and compare the result of the genetic algorithm based clustering with standard K-means, STRING 9.1 K-means, hierarchical clustering and SOM. We evaluate the performance of all these methods using the Rand index, Jaccard index and Minkowski index. Our comparative study demonstrates that the proposed genetic algorithm performs close to the hierarchical clustering method.
Keywords: gene-gene interaction; clustering; genetic algorithm; k-means; hierarchical; SOM; STRING 9.1.
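Two of the evaluation measures named in the abstract above, the Rand index and the Jaccard index, can be sketched by counting how pairs of genes are grouped in two clusterings. This is a minimal Python sketch with hypothetical labels; it is not the authors' implementation, and the Minkowski index is left out because its exact definition in the paper is not given here.

```python
from itertools import combinations


def pair_counts(labels_a, labels_b):
    """Count item pairs by whether each clustering puts them together.

    Returns (same_both, same_a_only, same_b_only, diff_both).
    """
    same_both = same_a_only = same_b_only = diff_both = 0
    for i, j in combinations(range(len(labels_a)), 2):
        in_a = labels_a[i] == labels_a[j]
        in_b = labels_b[i] == labels_b[j]
        if in_a and in_b:
            same_both += 1
        elif in_a:
            same_a_only += 1
        elif in_b:
            same_b_only += 1
        else:
            diff_both += 1
    return same_both, same_a_only, same_b_only, diff_both


def rand_index(labels_a, labels_b):
    """Fraction of pairs on which the two clusterings agree."""
    ss, sd, ds, dd = pair_counts(labels_a, labels_b)
    return (ss + dd) / (ss + sd + ds + dd)


def jaccard_index(labels_a, labels_b):
    """Agreement on co-clustered pairs, ignoring pairs split by both.

    Undefined (division by zero) if neither clustering groups any pair.
    """
    ss, sd, ds, _ = pair_counts(labels_a, labels_b)
    return ss / (ss + sd + ds)


# Hypothetical cluster labels for four genes (illustration only).
reference = [0, 0, 1, 1]
candidate = [0, 0, 1, 2]
print(rand_index(reference, candidate), jaccard_index(reference, candidate))
```

Both indices equal 1 when the two clusterings are identical up to relabelling, so they can be used to compare a candidate clustering against a reference one.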
Effect of single amino acid mutations on C-terminal domain of breast cancer susceptible protein 1
by Satish Kumar, Lingaraja Jena, Maheswata Sahoo, Kanchan Mohod, Sangeeta Daf, Ashok Varma
Abstract: Breast cancer is the most commonly diagnosed cancer in women. Around 5-10% of breast cancer cases are hereditary, mainly due to mutations in the breast cancer susceptibility tumor-suppressor genes BRCA1 and BRCA2. Hundreds of mutations are documented in the BRCA1 C-terminal region (BRCT), which is mainly associated with DNA damage repair and cell cycle control. In this study, we employed different mutation analysis systems, such as SIFT, MutPred, PON-P2 and META-SNP, to predict the pathological effects of 95 distinct missense mutations on the BRCT domain. Of these, 37 mutations were predicted to be deleterious by all mutation analysis systems, affecting protein stability and normal function and thereby leading to cancer. The computational approach for determining the impact of mutations on the BRCA protein may provide a route to early detection and therapy in breast cancer patients.
Keywords: breast cancer; mutation; BRCA1; BRCT; bioinformatics; mutation analysis.
On using the wisdom of the crowd principles in classification, Application on breast cancer diagnosis and prognosis.
by Merouane Amraoui, Tarik Boudghene Stambouli, Belal Alshaqaqi
Abstract: Breast cancer diagnosis and prognosis are oblique processes in which errors can be fatal, and they are carried out by experts only. Therefore, researchers are using the promising potential of classification algorithms to distinguish malignant from benign tumours. Classification techniques vary widely, from individual classifiers such as rules, trees and functions to ensemble classifiers that combine several classification algorithms. In this paper, we examine the use of the wisdom of crowds in the classification of breast cancer. We use four well-known data sets and run a collection of 53 algorithms combined with majority voting to simulate the wisdom of crowds. Furthermore, we report the results obtained from all 53 algorithms executed individually on the four data sets, so this article can also be read as a review of classification methods. Finally, we compare the results obtained by applying majority voting to the best five classifiers with those obtained by applying the wisdom of the crowds.
Keywords: breast cancer; wisdom of the crowd; WDBC; WPBC; BCD; Wisconsin; Weka; classification; majority voting; diagnosis; prognosis.
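The majority-voting step described in the abstract above can be sketched in a few lines of Python. The paper's keywords indicate that Weka was used, so this is only an illustrative stand-in: the function name, the three classifiers and their predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` holds one list of labels per classifier, all of the
    same length. Ties go to the label that appears first among the votes
    for that sample, which is how Counter.most_common orders equal counts.
    """
    voted = []
    for votes in zip(*predictions):  # votes for one sample across classifiers
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted


# Hypothetical outputs of three classifiers on four tumours
# ("M" = malignant, "B" = benign); illustration only.
clf_a = ["M", "B", "B", "M"]
clf_b = ["M", "M", "B", "B"]
clf_c = ["B", "B", "B", "M"]
print(majority_vote([clf_a, clf_b, clf_c]))
```

Using an odd number of voters on a two-class problem avoids ties, which is one reason the tie-breaking rule above rarely matters in that setting.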
Long Non-coding RNAs in Animal Genomes: Challenges and Promises
by Prashanth Suravajhala, Lingzhao Fang
Abstract: The majority of eukaryotic genes do not code for proteins, i.e. there are regions without coding potential. Because they do not code, they were earlier supposed to be of no interest, as they would not be associated with any disease. However, the last decade has seen advances in the field, with certain non-coding RNA molecules found to be transcribed, to regulate the expression of genes, and to affect the transcription and cell cycle of the organism. One class of non-coding RNAs identified during the last decade is the long non-coding RNAs (lncRNAs), which are known to play a role in a wide variety of diseases. We outline a few challenges and promises of lncRNAs specific to animal/livestock genomes that could be exploited in identifying their role in various diseases. For brevity, we consider bovine/clinical mastitis as an example.
Keywords: Long non-coding RNAs; transcription; diseased genes.
Detection of Postural Balance Degradation using Fuzzy Neural Network
by Neeraj Singh
Abstract: Postural balance is often studied in order to understand the effect of sensory degradation with age. The aim of this study is to develop a set of methods for analysing static and dynamic stabilogram signals to determine a set of parameters that can be used to detect a degradation in equilibrium using self-adaptive neuro-fuzzy inference systems (SANFIS). For analysing the static stabilogram signal, the first method is developed to detect the critical point interval (CPI) at which sensory feedback comes into play as part of a closed-loop postural control strategy. For analysing the dynamic stabilogram signal, the second method is developed using an autoregressive moving average (ARMA) measure (rate of change or fluctuation) and the area of the curve under the slope from the Z-force signal (Z-Area) during stepping up. Static and dynamic balance is evaluated using a force plate for a group of young subjects and a group of elderly subjects. The experiments using static signals show that lower values of CPI are associated with increased closed-loop postural control, indicating a quicker response to sensory input. The CPI for elderly subjects occurs significantly sooner than for young subjects, indicating that posture is more closely controlled. Similarly, the experiments using dynamic signals show that lower values of ARMA and higher values of Z-Area are indicative of a more hesitant step up. Young subjects have significantly higher values of ARMA than elderly subjects, and elderly subjects have significantly greater Z-Area values than young subjects. Further, the features determined from the static and dynamic stabilogram signals are used to detect and predict the degradation in postural balance using a fuzzy neural network. The selected features are randomly split for training and testing, and we achieved an average accuracy of 95.3% over 10 trials in classifying and predicting the degradation in equilibrium.
Keywords: Centre of pressure; postural control; stepping-up; ground reaction forces; clustering; neuro-fuzzy systems.
In vitro and in silico antimicrobial activity of compounds isolated from Trianthema decandra L.
by Geethalakshmi Rajarathinam, Sarada D.V.L.
Abstract: Phytochemical investigation of the leaves of Trianthema decandra resulted in the isolation and identification of two compounds from the chloroform extract. The compounds were characterized using HPLC, UV, FT-IR, NMR, LCMS and a CHNS analyzer. The structures of the compounds were elucidated from spectral data and named according to IUPAC nomenclature rules. A novel sterol was named 17-(5-ethyl-6-methylheptan-2-yl) -4, 4, 10, 13-tetramethyl-hexadecahydro-1H-cyclopenta (α) phenanthren-3-ol and the flavonoid was named 2-(3, 4 dihydroxy - phenyl)-3, 5, 7 trihydroxy- chromen-4 one. The isolated sterol and flavonoid were screened for antimicrobial activity against Staphylococcus aureus (MTCC 29213), Streptococcus faecalis (MTCC 0459), Enterococcus faecalis (MTCC 2729), Escherichia coli (MTCC 443), Pseudomonas aeruginosa (MTCC 1035), Salmonella typhi (MTCC 98), Vibrio cholerae (MTCC 3906), Proteus vulgaris (MTCC 1771), Bacillus subtilis (MTCC 121) and Yersinia enterocolitica (MTCC 840) in vitro using disc diffusion and broth dilution assays. The compounds exhibited very good activity against all the tested microorganisms. Diameter of zone of inhibition (DIZ) 23
Keywords: Trianthema decandra; Sterol; Flavonoid; antimicrobial; PBP.
Graph pruning based approach for inferring disease causing genes and associated pathways
by Jeethu Devasia, Priya Chandran
Abstract: Analysis of interactions among genes in molecular interaction networks leads to an understanding of cellular processes at the system level. Differentially expressed genes and their interactions form the basis of the disease state. The problem of inferring disease-causing genes and dysregulated pathways has gained a vital position in computational biology research, but the huge size of biological networks makes this process computationally challenging. Here, we tackle the problem of inferring disease-causing genes and associated pathways using graph pruning techniques, which focus on improving the accuracy of results within a reasonable execution time and on retrieving more causal genes and their pathways. The proposed approach and the approaches reported in the literature were tested on real biological data. The proposed approach obtained better results in terms of accuracy and execution time on benchmark datasets. In addition, this paper focuses on retrieving a larger number of newly identified genes and their pathways so that these genes and pathways can be analyzed for any unknown influences on disease development. The biological relevance of the results was also analyzed. If the function of the newly identified genes and pathways in the disease states could be validated biologically, it would significantly influence efforts to design new drug targets and defeat the diseases.
Keywords: Biological Network; Gene expression; Disease causing genes; Dysregulated pathways; Graph pruning.
In silico deleterious prediction of Nonsynonymous Single Nucleotide Polymorphisms in Neurexin1 Gene for Mental Disorders
by Ashraf Hendam, Ahmed Farouk Al-Sadek, Hesham A. Hefny
Abstract: The Neurexin1 (NRXN1) gene plays an important role in synaptic formation, plasticity and maturity. Studies have reported non-synonymous SNPs in NRXN1 in patients with mental disorders. The current work applies computational tools to recorded NRXN1 SNPs in patients with mental disorders. The aim of the work is to identify deleterious SNPs, determine the damaged protein features (function, stability) and recognize potential protein regions for future research. The effect on protein function is predicted by PROVEAN, SIFT and PolyPhen-2, while protein stability is predicted by MUpro and I-Mutant2.0. The prediction results identified 2 SNPs as deleterious by all tools. The stability tools predicted a higher proportion of deleterious SNPs (72% and 78%) than the function tools (25%, 41% and 47%). Agreement on deleterious predictions was 56% between the stability tools and 12.5% among the function tools. The regions of NRXN1 identified for future research are SP and LNS4.
Keywords: Nonsynonymous SNP; In silico; Neurexin1; Mental; Disorders; Autism; PROVEAN; SIFT; PolyPhen-2; MUpro; I-Mutant2.0.
Identification of novel flowering genes using RNA-Seq pipeline employing combinatorial approach in Arabidopsis thaliana time-series apical shoot meristem data
by Sumukh Deshpande, Anne James, Chris Franklin, Lindsey Leach, Jianhua Yang
Abstract: Floral transition is a crucial event in the reproductive cycle of a flowering plant, during which many genes are expressed that govern the transition phase and regulate the expression and functions of several other genes involved in the process. Identification of additional genes connected to flowering genes is vital, since they may regulate flowering genes and vice versa. In our study, the expression values of these additional genes were found to be similar to those of the flowering genes FLC and LFY in the transition phase. The presented approach plays a crucial role in this discovery. An RNA-Seq computational pipeline was developed for the identification of novel genes involved in floral transition from A. thaliana apical shoot meristem time-series data. By intersecting differentially expressed genes from the Cuffdiff, DESeq and edgeR methods, 690 genes were identified. Using an FDR cutoff of 0.05, we identified 30 genes involved in glucosinolate and glycosinolate biosynthetic processes as principal regulators in the transition phase, which provide protection to plants from herbivores and pathogens during flowering. Additionally, expression profiles of highly connected genes in protein-protein interaction network analysis revealed 76 genes with non-functional association and high correlation to the flowering genes FLC and LFY, which suggests a potential and principal role in floral regulation that has not been identified in previous studies.
Keywords: Apical shoot; Flowering; Pipeline; Cuffdiff; Step Analysis; Differential expression; Enrichment; Arabidopsis Thaliana.
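The intersection step mentioned in the abstract above can be illustrated with plain set operations. This Python sketch assumes each tool's output has already been reduced to a mapping from gene identifier to adjusted p-value; the gene identifiers, the values, and the choice to filter each tool at the cutoff before intersecting are assumptions for illustration, since the listing does not describe the pipeline at that level of detail.

```python
def significant_genes(adjusted_p, cutoff=0.05):
    """Genes whose adjusted p-value (FDR) is below the cutoff."""
    return {gene for gene, q in adjusted_p.items() if q < cutoff}


def consensus_genes(results_by_tool, cutoff=0.05):
    """Genes called significant by every tool in `results_by_tool`."""
    gene_sets = [significant_genes(r, cutoff) for r in results_by_tool.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


# Hypothetical adjusted p-values per tool (illustration only).
results = {
    "cuffdiff": {"geneA": 0.001, "geneB": 0.040, "geneC": 0.200},
    "deseq": {"geneA": 0.010, "geneB": 0.030, "geneC": 0.020},
    "edger": {"geneA": 0.004, "geneB": 0.300, "geneC": 0.010},
}
print(sorted(consensus_genes(results)))
```

Requiring agreement across all tools trades sensitivity for specificity: a gene missed by any one method is dropped from the consensus list.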
A Comparison of Genetic Imputation Methods using Long Life Family Study Genotypes and Sequence Data with the 1000 Genome Reference Panel
by Aldi Kraja
Abstract: This study compares methods of imputing genetic markers, given a typed GWAS scaffold from the Long Life Family Study (LLFS) and the latest reference panel of 1000-Genomes. We examined two programs for haplotype pre-phasing (MACH and SHAPEIT) and two for imputation (MINIMAC and IMPUTE). SHAPEIT is advantageous for haplotype pre-phasing. MINIMAC and IMPUTE produced similar imputation quality. We used a 4MB region on chromosome 2 of LLFS, and in the Supplement we compared methods using chromosome 19 data from the Genetic Analysis Workshop-19. IMPUTE had the advantage of using two references: 1000G and a sequence for a subset of subjects. SHAPEIT and IMPUTE were used to finalize the full LLFS autosome imputation. In LLFS, 44% of ~80M autosomal imputed variants showed good imputation quality (info ≥ 0.30). Low imputation quality was associated with a predominantly low allele frequency in 1000-Genomes. Newly emerging large-scale sequences and enhanced imputation methodologies will further improve imputation quality.
Keywords: genetic imputation; 1000 genomes reference; sequence reference; MACH software; MINIMAC software; SHAPEIT software; IMPUTE software; FCGENE software; Long Life Family Study.
Extrapolating the effect of nonsynonymous SNP in bread wheat HSP16.9B gene: a molecular modeling and dynamics study
by Bharati Pandey, Saurabh Gupta, Atmakuri Ramakrishna Rao, Dev Mani Pandey, Ravish Chatrath
Abstract: Small heat shock proteins (sHSP) are ubiquitous and play a key role in protein homeostasis under stress conditions. A single nucleotide polymorphism was predicted in the HSP16.9B gene, but so far its impact on protein structure has not been extensively studied. With this in mind, we applied computational methods and performed molecular dynamics simulation to examine the effect of the substitution of aspartic acid (D) by asparagine (N) at residue 11 (D11N) in HSP16.9B. The secondary structure analysis revealed the addition of a beta sheet before the mutation position in the mutant protein. Three-dimensional protein structure modeling, validation of structures and molecular dynamics were performed to gain insight into the influence of the non-synonymous single nucleotide polymorphism on structural changes. The root mean square deviation result showed the stability of the mutated structure throughout the simulations. The root mean square fluctuation and H-bond scores further supported our results. Altogether, our investigation will be a landmark toward understanding the molecular basis of HSP16.9 functionality.
Keywords: Molecular dynamics simulation; Heat shock protein; Molecular modeling; Secondary structure.
Subspace module extraction from MI-based co-expression network
by Sarmistha Deb, Priyakshi Mahanta, Dhruba K. Bhattacharyya, Malay Ananda Dutta
Abstract: Most of the existing methods in literature have used proximity measures in the construction of co-expression networks (CEN) consisting of functional gene modules. This work describes the construction of co-expression network using mutual information (MI) as a proximity measure with non-linear correlation. The network modules are extracted that are defined over a subset of samples. This method has been tested on several publicly available datasets and the subspace network modules obtained have been validated in terms of both internal and external measures.
Keywords: co-expression network; mutual information; network modules; topological property.
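As an illustration of why mutual information can serve as a proximity measure for non-linear relationships, the following is a minimal sketch and not the authors' implementation. The equal-width binning, the number of bins and the function names (`discretize`, `mutual_information`, `pearson`) are assumptions made for this example only.

```python
import math
from collections import Counter


def discretize(values, bins=3):
    """Map floats to integer bin labels using equal-width bins (an assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=3):
    """Mutual information (in bits) between two expression profiles after binning."""
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(bx)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


def pearson(x, y):
    """Pearson correlation coefficient, for comparison with MI."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical profiles of two genes across seven samples: gene_b = gene_a squared.
gene_a = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
gene_b = [v * v for v in gene_a]

linear_score = pearson(gene_a, gene_b)        # zero: no linear association
mi_score = mutual_information(gene_a, gene_b)  # positive: dependence is detected
```

In this toy case the linear correlation vanishes while the MI score stays clearly above zero, which is the kind of dependence a correlation-based network would miss. Real data would need a more careful estimator and a principled choice of bins.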
Sample-to-sample p-value variability and its implications for multivariate analysis
by Wei Wang, Wilson Wen Bin Goh
Abstract: Statistical feature selection is used for identification of relevant genes from biological data, with implications for biomarker and drug development. Recent work demonstrates that the t-test p-value exhibits high sample-to-sample variability accompanied by an exaggeration of effect size in the univariate scenario. To deepen understanding, we further examined p-value and effect size variability issues across a variety of alternative scenarios. We find that with increased sample sizes, there is convergence towards the true effect size. Moreover, with greater power (stronger effect size or larger sample size), p-value variability does not quite converge, suggesting that p-values are a terrible indicator of estimated effect sizes. The t-test is resilient, and surprisingly effective even in test scenarios where its non-parametric counterpart, the Wilcoxon rank-sum test, is expected to perform better. Since p-values are variable and poorly predict effect size, ranking individual gene or protein features based on p-values is a terrible idea, and we demonstrate that restriction to the top 500 features (ranked based on p-values) in real protein expression data comprising 12 normal and 12 renal cancer patients worsens instability. The use of stability indicators such as the bootstrap, estimated effect size and confidence intervals alongside the p-value is required to make meaningful and statistically valid interpretations.
Keywords: p-value; statistical feature selection; t-test; variability; Wilcoxon rank-sum test.
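The convergence of the estimated effect size with larger samples can be illustrated with a small simulation. This is a hedged sketch, not the authors' protocol: the normal distributions, the true shift of 0.5, the use of Cohen's d, the number of repeats and the function names are all assumptions chosen for this example.

```python
import math
import random
import statistics


def cohens_d(a, b):
    """Standardised mean difference for two equal-sized groups."""
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return (statistics.mean(b) - statistics.mean(a)) / pooled_sd


def effect_size_spread(n, true_shift=0.5, repeats=500, seed=0):
    """Draw two groups of size n repeatedly; return mean and SD of the estimates."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(repeats):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(true_shift, 1.0) for _ in range(n)]
        estimates.append(cohens_d(control, treated))
    return statistics.mean(estimates), statistics.stdev(estimates)


# Compare a small and a large per-group sample size (both values are assumptions).
mean_small, sd_small = effect_size_spread(n=5)
mean_large, sd_large = effect_size_spread(n=100)
```

Under these assumptions the spread of the estimates at 5 samples per group is several times larger than at 100, so a single small study can easily overstate or understate the effect.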
Single-trial evoked potentials denoising using adaptive modelling
by Mahmoud Boudiaf, Moncef Benkherrat, Khaled Mansouri
Abstract: This study presents a method for improving the signal-to-noise ratio of single-trial event-related potentials. The method is based on an adaptive linear combiner Hermite model. A variable step size least mean square algorithm is used to estimate and adjust the parameters of the filter. The performance of the method is evaluated on simulated data and real event-related potential recordings. The method significantly enhances the observation of single trials and the estimation of the amplitude and latency of the event-related potentials.
Keywords: adaptive linear combiner; EEG; event-related potentials; Hermite basis functions; VSS-LMS algorithm.
A survey of predictive analytics using big data with data mining
by S. Poornima, M. Pushpalatha
Abstract: Today, the world is filled with data like oxygen. The amount of data being generated and consumed is growing vigorously in the digital world. The growing use of novel inventions and social media generates huge quantities of data, called big data, which can yield remarkable information if analysed properly. Organisations may analyse big data to make better decisions, so big data analytics has been receiving attention in recent times. Finding the concealed value in big data requires new schemes or strategies. Predictive analytics comprises several statistical and analytical techniques for developing novel strategies to predict future possibilities. Predictive analytics therefore becomes vital when a substantial quantity of highly sensitive data has to be handled. Future probabilities and measures are predicted based on observed events. With the aid of available data mining techniques, predictive analytics predicts future events and can make recommendations, which is called prescriptive analytics. This review paper gives a clear idea of how to apply data mining techniques and predictive analytics to different medical datasets to predict various diseases, reporting accuracy levels, pros and cons, and concludes with the issues of those algorithms and futuristic approaches to big data.
Keywords: big data; classification; data mining; predictive analytics.
Potential of photoplethysmogram for the detection of calcification and stenosis in lower limb
by Neelamshobha Nirala, R. Periyasamy, Awanish Kumar
Abstract: Early detection of arterial stiffness (AS) and atherosclerosis in the lower limb is useful for the detection of cardiovascular and diabetic foot diseases. We used the photoplethysmogram for the screening of peripheral arterial disease (PAD) and the detection of AS caused by calcification. The study included three groups: 15 normal subjects (group-1), 6 subjects with known calcification (group-2) and 13 PAD patients. Compared to group-1, we obtained a significant increase in rise-time (282.00 vs. 305.50, p value = 0.009) and area under rise-time (AUR) (71.075 vs. 76.085, p value = 0.041) in the PAD group. Similarly, in group-2, significant decreases in AUR (71.075 vs. 60.825, p value = 0.000), area under diastole (136.347 vs. 110.538, p value = 0.001), Area (209.729 vs. 170.202, p value = 0.000) and 'b/a' ratio (0.697 vs. 0.933, p value = 0.020) were obtained compared to group-1, and significant increases in these features were noted in comparison with the PAD group. The present findings may aid in the detection of PAD and AS due to calcification and in arranging a proper treatment plan.
Keywords: Ankle brachial index (ABI); area under diastole (AUD); area under rise-time (AUR); medial artery calcification (MAC); peripheral arterial disease (PAD); photoplethysmography (PPG); rise-time (RT); second derivative of photoplethysmogram (SDPPG).
Discovery of novel inhibitors targeting movement protein for controlling the transmission of banana bunchy top virus infection in plantain by structure-based virtual screening
by Archana Prabahar, Subashini Swaminathan, Kalpana Raja, Srividhya Vellingiri, Ramalingam Jegadeesan, Bharathi Nathan
Abstract: Banana bunchy top virus (BBTV), the pathogen causing banana bunchy top disease (BBTD), belongs to the genus Babuvirus of the family Nanoviridae and produces significant yield loss. BBTD is the most destructive viral disease affecting bananas worldwide, causing infections that result in bunched leaves and stunted, fruitless plants. So far, there are no effective measures for controlling and preventing this viral disease. The amino terminal region of the movement protein is responsible for cell-to-cell movement. The present study aims at inhibiting this target region by discovering novel inhibitors through virtual screening of small molecule libraries coupled with post-docking analysis of the most potent inhibitors. Our study, based on virtual screening of small molecule datasets, identified the 10 most promising inhibitors to be considered as lead compounds for controlling the spread of BBTV infection in plantain.
Keywords: amikacin; BBTD; BBTV; virtual screening.
Computational structural biology and modes of interaction between human annexin A6 with influenza A virus protein M2: a possible mechanism for reducing viral infection
by Sujay Ray, Arundhati Banerjee
Abstract: Influenza A virus is a leading lethal causative agent of influenza. The M2 protein of influenza A virus plays an important role in the viral replication cycle. The human Annexin A6 protein targets and stops viral budding of influenza A virus. Here, molecular level interactions between Annexin A6 and influenza A virus M2 protein were examined. Using molecular modelling techniques, the 3D structures of the two proteins were built via energy optimisations. Interactions between the two proteins were analysed by molecular docking studies. Annexin A6 and M2 protein interacted strongly, with Asp and Lys residues, respectively, playing a pivotal role. A conformational shift from helices and sheets to coils was observed in the M2 protein after its interaction with Annexin A6. This study therefore helped to understand the molecular mechanism of the two proteins and the negative modulation by Annexin A6 of the M2 protein from influenza A virus.
Keywords: human annexin A6 protein; influenza A virus protein M2; molecular docking simulations; molecular level interactions; molecular mechanisms; molecular modelling; negative modulation; protein interaction calculator; viral replication cycle.
Tertiary and quaternary structure prediction of full-length human p53 by comparative modelling with structural environment-based alignment method
by Vaijayanthi Raghavan, Maulishree Agrahari, Dhananjaya Kale Gowda
Abstract: One of the fundamental components of a wide range of proteomics research is determining the 3D structure and properties of proteins. Access to precise and accurate protein models is essential for predicting the drug-binding region or optimising the stability and selectivity of biologics. Due to the biological and technical challenges of p53, the full-length 3D structure is unavailable to the scientific community; thus, there is a need to develop the 3D structure of p53, which is a key player in preventing cancer. Here, we model all 393 amino acids to generate full-length 3D models of human p53 in both monomeric and tetrameric forms using computational approaches. The 3D model building involved homology-based modelling techniques combined with a refinement approach and the use of a structural environment-based alignment method for developing the quaternary structure of human p53. Our results showed that 3D models are more reliable when iterative modelling is used and that the structural environment-based alignment method is well suited to modelling the tetramer. These structures can be utilised to develop p53 mutants and for virtual screening, design and development of small molecules, or target-drug interaction studies.
Keywords: homology modelling; human p53; structure prediction; transformation matrices; tumour suppressor protein.
Bioinformatic analysis of envelope gene of the dengue type 3 prevalent in India from 2005 onwards and comparison with dengue type 1
by Sumanta Dey, Ashesh Nandy, Papiya Nandy, Sukhen Das
Abstract: A high incidence of dengue infection, particularly dengue serotypes 1 and 3, has been observed across India in the last few years, with a large number of fatalities. Since the surface-situated envelope protein of the dengue virion is responsible for virus entry into the host cell, we have focused on the characterisation and analysis of the envelope gene with an aim to eventually develop inhibitors of the dengue virus. Two-dimensional graphical representations and phylogenetic relationships of the envelope gene show an inherent cross-national spread of the dengue virus. Moreover, hydropathy analysis shows amino acid compositional changes leading to morphological changes in the envelope protein and perhaps higher pathogenicity. We also found evidence of recombination-like events taking place in some of the genes of the full dengue type 3 genome. These observations serve to show the urgency of comprehensive genetic surveillance of the dengue virus to anticipate further damaging changes in the viral sequence.
Keywords: dengue envelope gene; dengue virus; envelope protein morphology; graphical representation; hydropathy analysis; phylogeny; recombination; transition-transversion ratios.
An improved method to enhance protein structural class prediction using their secondary structure sequences and genetic algorithm
by Mohammed Hasan Aldulaimi, Suhaila Zainudin, Azuraliza Abu Bakar
Abstract: Many approaches have been proposed to enhance the accuracy of protein structural class prediction. However, such approaches did not cover low-similarity sequences, which have proved to be quite challenging. In this study, a 71-dimensional integrated feature vector is extracted from the predicted secondary structure and hydropathy sequence using newly devised strategies for the purpose of categorising proteins into their major structural classes: all-α, all-β, α/β and α+β. A new combined method containing two machine learning algorithms is proposed for feature selection. Support vector machine (SVM) and genetic algorithm (GA) are combined using the wrapper method to select the top N features based on their level of importance. The proposed method is evaluated using the jackknife test on two low-similarity sequence datasets, i.e. ASTRAL and D640. Overall accuracies of 83.93 and 92.2% are reported for the predictions pertaining to the ASTRALtesting and D640 benchmarks, exceeding most of the current approaches.
Keywords: feature selection; genetic algorithm; hydropathical information; low-similarity; secondary structure sequence; support vector machine.
Special Issue on: Trends in Medical Imaging and Health Informatics
A Comprehension of Contemporary Effort for Tracking of Lip
by Nandini M S, Nagappa U. Bhajantri
Abstract: Lip tracking is the most important prerequisite for a lip reading system. Most lip reading procedures are based on lip contour analysis, so lip contour extraction is a fundamental step. Lip contour extraction begins with detection of the lip contour in the first frame of an audio-visual image sequence; capturing the contour in successive frames is normally called lip tracking. This paper presents an overview of contemporary work on extracting the face from digital video and classifying it into lip and non-lip areas, categorising the approaches into low- and high-level processing techniques. These systems can be applied to asymmetric lips, mouths with visible teeth or tongue, and mouths with a moustache. Furthermore, a comparative study of the approaches and their effectiveness based on various factors is offered.
Keywords: Lip Reading; Lip Tracking; Lip Segmentation; Lip Localization; Adaboost; Statistical Estimator.
Special Issue on: Bio-Inspired Computing Systems and Their Applications in Medical Image Processing
Vision based malaria parasite image analysis: A systematic review
by Priyadarshini Adyasha Pattanaik, Tripti Swarnkar
Abstract: Background: Malaria is one of the classic neglected serious diseases in many developing countries. Early-stage disease detection, accurate parasite counts, detection of the aggressiveness of the disease, technical limitations, lack of expertise in malaria diagnosis and smart tools, and lack of good quality healthcare services and funds are among the challenges encountered during malaria diagnosis that require deeper analysis.
Objectives: This paper aims to give a review of the automated diagnosis or visual inspection of malaria parasites using histology images of thin or thick blood film smears. The goal here is to survey the existing works by addressing the issues differently or assigning partial solutions to the diagnosis errors.
Methods and Results: Various computer-aided diagnosis techniques are in use to solve tasks meticulously in a stratified description paradigm using non-linear transformation architectures.
Conclusion: This work presents a comprehensive study of the computer vision diagnostic approaches already proposed in this field, with a future direction for better and quicker malaria identification. This timely review aims to emphasise the increasing interest in deep learning in developing countries, which would enhance malaria diagnosis to a greater extent with improved visualisation.
Keywords: Malaria parasites; Microscopy analysis; Computer vision diagnosis; Deep learning.