International Journal of Bioinformatics Research and Applications (37 papers in press)
Subspace Module Extraction from MI-Based Co-expression Network
by Sarmistha Deb, Priyakshi Mahanta, Dhruba Bhattacharyya, Malay Dutta
Abstract: Most existing methods in the literature use proximity measures to construct co-expression networks (CEN) consisting of functional gene modules. This work describes the construction of a co-expression network using mutual information (MI), a proximity measure that captures non-linear correlation. Network modules defined over a subset of samples are then extracted. The method has been tested on several publicly available datasets, and the subspace network modules obtained have been validated using both internal and external measures.
Keywords: Mutual information; co-expression network; network modules; topological property.
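As an illustration of the proximity measure named in the abstract above, the following is a minimal sketch of a plug-in mutual information estimate between two expression profiles. The equal-width binning, the number of bins and the function names are assumptions made for this example; the abstract does not say how MI is estimated in the paper.

```python
import math
from collections import Counter


def discretise(values, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=3):
    """Plug-in MI estimate (in bits) between two equal-length profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of samples")
    bx, by = discretise(x, bins), discretise(y, bins)
    n = len(x)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    mi = 0.0
    for (i, j), count in pxy.items():
        # p(i, j) / (p(i) * p(j)) simplifies to count * n / (px[i] * py[j]).
        mi += (count / n) * math.log2(count * n / (px[i] * py[j]))
    return mi
```

In a co-expression network, an edge between two genes would typically be kept when this score exceeds a chosen threshold; the threshold itself is a modelling decision not covered here.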
Sample-to-sample p-value variability and its implications for multivariate analysis
by Wei Wang, Wilson Wen Bin Goh
Abstract: Statistical feature selection is used to identify relevant genes from biological data, with implications for biomarker and drug development. Recent work demonstrates that the t-test p-value exhibits high sample-to-sample variability, accompanied by an exaggeration of effect size in the univariate scenario. To deepen understanding, we further examined p-value and effect size variability across a variety of alternative scenarios. We find that with increased sample sizes, the estimate converges towards the true effect size. On the other hand, with greater power (larger effect size or sample size), p-value variability does not quite converge, suggesting that p-values are a terrible indicator of estimated effect sizes. The t-test is also quite resilient, and surprisingly effective even in test scenarios where its non-parametric counterpart, the Wilcoxon rank-sum test, is expected to be better. Since p-values are variable and poorly predict effect size, ranking individual gene or protein features by p-value is a terrible idea, and we demonstrate that restriction to the top 500 features (ranked by p-value) in real data comprising 12 normal and 12 renal cancer patients actually worsens instability. The use of stability indicators such as the bootstrap, estimated effect size and confidence intervals alongside the p-value is required to make meaningful and statistically valid interpretations.
Keywords: p-value; variability; t-test; Wilcoxon rank-sum test; statistical feature selection.
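The stability indicators recommended in the abstract above (bootstrap, estimated effect size and confidence intervals) can be sketched as follows. This is an illustrative example only: it assumes Cohen's d as the effect size and a percentile bootstrap, neither of which is specified in the abstract, and the function names are hypothetical.

```python
import random
import statistics


def cohens_d(a, b):
    """Standardised mean difference (b minus a) using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(b) - statistics.mean(a)) / pooled_var ** 0.5


def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for Cohen's d."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        try:
            estimates.append(cohens_d(resampled_a, resampled_b))
        except ZeroDivisionError:
            # Both resamples had zero variance; skip this replicate.
            continue
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper
```

A wide interval for a feature with a small p-value is one sign that its ranking may not hold up in a different sample.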
Emerging trend of Big Data Analytics in Bioinformatics: A Literature Review
by Kalyan Nagaraj, Sharvani GS, Amulyashree Sridhar
Abstract: The unparalleled growth of data in bioinformatics over the years is a major concern for storage and management. Such massive data must be handled efficiently to disseminate knowledge. Computational advancements in information technology offer feasible analytical solutions for processing such data. In this context, the paper attempts to highlight the influence of big data in bioinformatics. The concepts emphasized are the definition of big data, the architectural platforms supporting data analytics, and the application of these analytical techniques to complex problems in bioinformatics. The challenges and future prospects of big data analytics in bioinformatics are briefly discussed. This paper provides a comprehensive summary of several data analytical techniques available to bioinformatics researchers and computer scientists.
Keywords: big data; bioinformatics; data analytics.
Single-trial evoked potentials denoising using adaptive modelling
by Mahmoud Boudiaf, Moncef Benkherrat, Khaled Mansouri
Abstract: This study presents a method for improving the signal-to-noise ratio of single-trial event-related potentials. The method is based on an adaptive linear combiner Hermite model. A variable step-size least-mean-square algorithm is used to estimate and adjust the parameters of the filter. The method is evaluated on simulated data and real event-related potential recordings. It significantly enhances the observation of single trials and the estimation of the amplitude and latency of the event-related potentials.
Keywords: Event-related potentials; adaptive linear combiner; Hermite basis functions; VSS-LMS algorithm.
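To illustrate the adaptive filtering idea in the abstract above, here is a minimal sketch of an adaptive linear combiner trained with a variable step-size LMS rule. The step-size update used here (mu is scaled by alpha and increased by gamma times the squared error, then clipped) is one common choice and is an assumption; the abstract does not state which rule the authors use. The inputs are generic basis values; in the paper's setting they would be Hermite basis functions evaluated at each sample, which this sketch does not construct.

```python
def vss_lms(inputs, desired, mu_init=0.5, mu_min=0.01, mu_max=0.5,
            alpha=0.97, gamma=0.1):
    """Adaptive linear combiner with a variable step-size LMS update.

    inputs  : list of basis-value vectors, one per time step
    desired : list of desired outputs, one per time step
    Returns the final weights and the per-step a priori errors.
    """
    weights = [0.0] * len(inputs[0])
    mu = mu_init
    errors = []
    for x, d in zip(inputs, desired):
        y = sum(w * xi for w, xi in zip(weights, x))   # combiner output
        e = d - y                                      # a priori error
        weights = [w + mu * e * xi for w, xi in zip(weights, x)]
        # Larger errors raise the step size; small errors let it decay.
        mu = min(mu_max, max(mu_min, alpha * mu + gamma * e * e))
        errors.append(e)
    return weights, errors
```

The default mu_max assumes inputs with squared norm of at most 2, so that each update cannot increase the weight error in the noise-free case; other input scalings would need a different bound.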
Discovery of novel inhibitors targeting movement protein for controlling the transmission of Banana Bunchy Top Virus (BBTV) infection in plantain by structure-based virtual screening
by Archana Prabahar, Subashini Swaminathan, Kalpana Raja, Srividhya Vellingiri, Ramalingam Jegadeesan, Bharathi Nathan
Abstract: Banana bunchy top virus (BBTV), the pathogen causing banana bunchy top disease (BBTD), belongs to the genus Babuvirus of the family Nanoviridae and produces significant yield loss. BBTD is one of the most destructive viral diseases affecting bananas worldwide, causing infections that result in bunched leaves and stunted, fruitless plants. So far there are no effective measures for controlling the spread of this viral disease or preventing it. The amino-terminal region of the movement protein is responsible for cell-to-cell movement. The present study aims to inhibit this target region by discovering novel inhibitors through virtual screening of small-molecule libraries coupled with post-docking analysis of the most potent inhibitors. Our study, based on virtual screening of small-molecule datasets, identified the ten most promising inhibitors to be considered as lead compounds for controlling the spread of BBTV infection in plantain.
Keywords: BBTV; BBTD; Virtual screening; Amikacin.
Computational Structural Biology and Modes of Interaction between Human Annexin A6 with Influenza A Virus Protein M2: A Possible Mechanism for Reducing Viral Infection
by Sujay Ray, Arundhati Banerjee
Abstract: Influenza A virus is a prime lethal causative agent of influenza. The M2 protein of Influenza A virus plays an important role in the viral replication cycle. The human Annexin A6 protein targets and stops viral budding of Influenza A virus. Here, molecular-level interactions between Annexin A6 and the Influenza A virus M2 protein were examined. Using molecular modeling techniques, the 3D structures of the two proteins were built via energy optimization. Interactions between the two proteins were analyzed by molecular docking studies. Annexin A6 and the M2 protein interacted strongly, with a pivotal role played by Asp and Lys residues, respectively. A conformational shift from helices and sheets to coils was observed in the M2 protein after its interaction with Annexin A6. This study therefore helps to explain the molecular mechanism of interaction between the two proteins and the negative modulation by Annexin A6 of the M2 protein from Influenza A virus.
Keywords: Molecular level interactions; Viral replication Cycle; Human Annexin A6 protein; Influenza A Virus Protein M2; Molecular modeling; Molecular Docking Simulations; Protein Interaction Calculator; Molecular mechanisms; Negative Modulation.
Tertiary and Quaternary Structure Prediction of Full Length Human p53 by Comparative Modeling with Structural Environment-Based Alignment Method
by Vaijayanthi Raghavan, Maulishree Agrahari, Dhananjaya K. Gowda
Abstract: One of the fundamental components of a wide range of proteomics research is determining the 3D structure and properties of proteins. Access to precise and accurate protein models is essential for predicting the drug-binding region or optimizing the stability and selectivity of biologics. Owing to the biological and technical challenges posed by p53, its full-length 3D structure is unavailable to the scientific community; hence there is a need to develop the 3D structure of p53, which is a key player in preventing cancer. Here we model all 393 amino acids to generate full-length 3D models of human p53 in both monomeric and tetrameric form using computational approaches. The 3D model building involved homology-based modeling techniques combined with a refinement approach and the use of a structural environment-based alignment method for developing the quaternary structure of human p53. These models are built with good accuracy, supported by validation reports. Our results showed that 3D models are more reliable when iterative modeling is used and that the structural environment-based alignment method is well suited to modeling the tetramer. These structures can be utilised to develop p53 mutants, for virtual screening, to design or develop small molecules, or for target-drug interaction studies of wild-type and mutant p53.
Keywords: Human p53; Tumor suppressor protein; Homology modeling; Transformation Matrices.
Bioinformatic analysis of envelope gene of the Dengue type 3 prevalent in India from 2005 onwards and comparison with Dengue type 1
by Sumanta Dey, Ashesh Nandy, Papiya Nandy, Sukhen Das
Abstract: A fresh wave of dengue infection, particularly dengue serotypes 1 and 3, has been observed all across India in the last few years and has led to a large number of fatalities. Since the surface-situated envelope protein of the dengue virion is responsible for virus entry into the host cell, we have laid special emphasis on the characterization and analysis of the envelope gene, with the aim of eventually developing inhibitors of the dengue virus. There are four serotypes of the dengue virus, of which types 1 and 3 account for the majority of cases in India. 2D graphical representations of the envelope gene from various countries indicate that the gene from an Indian dengue type 1 virus bears a strong resemblance to the genes from Asia, as shown in our previous paper, whereas in the case of dengue type 3, the more prevalent form in India in recent years, the Indian strain representation shows a strong likeness to strains from North America. Phylogenetic trees built using alignment procedures also bear this out, implying an inherent cross-national spread of the dengue virus. Moreover, hydropathy analysis shows that amino acid compositional changes are tending to increase hydrophobic residues in the dengue type 3 viruses, leading to morphological changes that may explain, in part, the higher pathogenicity of the dengue virus in India in recent times. In the case of Indian dengue type 3, we also found evidence of recombination-like events taking place in some of the genes of the full genome. These observations show the urgency of comprehensive genetic surveillance of the dengue virus to anticipate further damaging changes in the viral sequence arising from factors such as cross-national spread related to human travel, recombination in the genetic make-up, non-synonymous mutations leading to possibly higher pathogenicity, and changes in the frequency of different dengue serotypes.
Keywords: Dengue virus; phylogeny; graphical representation; dengue envelope gene; hydropathy analysis; envelope protein morphology; recombination; transition-transversion ratios.
Bioinformatics Resources and Approaches for The Interaction of Oryza Sativa and Magnaporthe Oryzae Pathosystem
by Vinay Sharma, Varshika Singh, Pramod Katara
Abstract: Rice is a major cereal crop and serves as a staple food for a large part of the world's human population. Rice blast, caused by Magnaporthe oryzae, is a very important disease of rice; it affects production and commonly occurs wherever rice is grown. It is also considered a model disease for the study of the genetics and molecular pathology of host-pathogen interactions. Numerous comprehensive studies on both the host and the pathogen have been carried out using genomics, proteomics and bioinformatics approaches. Consequently, an enormous amount of information has been made available for researchers to carry out further work on this pathosystem. Bioinformatics has played a significant role in storing the data made available by various wet-laboratory experiments and interpreting them as useful biological information. This review presents an overview of the bioinformatics resources and approaches for the study of the rice-Magnaporthe interaction.
Keywords: bioinformatics; disease; nucleotide sequence; pathogen; database; host-pathogen interaction; rice blast; genomics.
Efficient Formulation of the Rejection-based Algorithm for Biochemical Reactions with Delays
by Vo Thanh, Roberto Zunino, Corrado Priami
Abstract: The rejection-based stochastic simulation algorithm (RSSA) is an exact simulation algorithm for realizing the temporal behavior of biochemical reactions. It reduces the number of propensity updates during the simulation by using propensity bounds of reactions to select the next reaction firing. In this paper we present a new efficient formulation of RSSA and extend it to incorporate biochemical reactions with time delays. Our new algorithm explicitly keeps track of the putative firing times of reactions and uses these to select the next reaction firing. By using such a representation, it can efficiently handle biochemical reactions with delays and achieve computational efficiency over existing approaches for exact simulation.
Keywords: Computational biology; Stochastic simulation; Rejection-based stochastic simulation algorithm.
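For orientation, the rejection step that RSSA relies on can be sketched as below. This illustrates only the baseline idea of selecting with propensity bounds and then accepting or rejecting; it is not the new formulation with putative firing times described in the abstract, and it does not handle delays. The function name and arguments are hypothetical, and the bounds are assumed to be valid for the current state (lower bound ≤ exact propensity ≤ upper bound for every reaction).

```python
import random


def rssa_step(upper_bounds, lower_bounds, exact_propensity, rng):
    """Pick the next reaction by rejection sampling.

    upper_bounds, lower_bounds : per-reaction propensity bounds
    exact_propensity           : callable j -> exact propensity of reaction j
    Returns (reaction index, elapsed time until the accepted firing).
    Assumes at least one reaction has a positive exact propensity.
    """
    total_upper = sum(upper_bounds)
    if total_upper <= 0:
        raise ValueError("no reaction can fire")
    indices = list(range(len(upper_bounds)))
    elapsed = 0.0
    while True:
        # Every trial, accepted or not, advances time by Exp(total_upper).
        elapsed += rng.expovariate(total_upper)
        # Candidate chosen with probability proportional to its upper bound.
        j = rng.choices(indices, weights=upper_bounds)[0]
        u = rng.random() * upper_bounds[j]
        if u < lower_bounds[j]:
            return j, elapsed          # accepted without evaluating a_j
        if u < exact_propensity(j):
            return j, elapsed          # accepted after evaluating a_j
        # Otherwise the candidate is rejected and another trial is made.
```

The cheap test against the lower bound is what saves propensity evaluations: the exact propensity is computed only when the quick acceptance fails.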
Exploring New Features of α-amylases from Different Source Organisms by an In Silico Approach
by Javad Harati
Abstract: A total of 78 full-length protein sequences of α-amylase from different source organisms were subjected to phylogenetic analysis, multiple sequence alignment (MSA), motif search, and analysis of physicochemical properties. The phylogenetic tree was built using the Maximum Likelihood (ML) method in Molecular Evolutionary Genetics Analysis (MEGA) software and resolved into two major clusters. One of the clusters included plants and animals, whereas the other contained fungi, archaea, and bacteria. Furthermore, Firmicutes and Proteobacteria are bacterial phyla that were placed in the same evolutionary cluster as plants and animals. The deviations from the normal clusters were explained by both motif analysis data and the construction of a new tree. MSA revealed three conserved sequence blocks, 505-527, 725-745, and 1010-1030, that were present in all studied species. Moreover, it provided information about highly conserved residues: three glycine residues and one aspartic acid residue were conserved. Motif analysis with the Multiple EM for Motif Elicitation (MEME) server revealed that motif 4 HDTGSTQRHWPFPSDHVMQGYAYILTHPGIPCIFYDHFFDW, motif 6 EGAGGPSTAFDFTTKGILQEAVKGELWRLRDPQGKPPGMIGWWPERAVTF, and motif 11 EQIVKLIAIRKRNGIHSRSSIRILEAEGDLYVAMIDEKVCMKIG were present only in plants. Pearson correlation analysis of the relationships among different physicochemical properties showed a direct correlation between GRAVY and the aliphatic index and an inverse correlation between GRAVY and both the pI and the instability index.
Keywords: α-Amylase; Sequence analysis; Phylogenetic analysis; Conserved regions and residues; Physicochemical characteristics.
Computational Protein Design of Bacteriocins based on structural scaffold of aureocin A53
by Sekhar Talluri
Abstract: Bacteriocins are highly potent polypeptide and protein antibiotics produced by bacteria. They are rapidly degraded in the environment after use, owing to their proteinaceous nature. Some bacteriocins are used as preservatives in foods. Native and engineered bacteriocins are of potential interest as replacements for conventional antibiotics that are losing their efficacy due to the development of antibiotic-resistant strains. Aureocin A53 is a class II bacteriocin. It is a broad-spectrum antibiotic with demonstrated ability to inhibit the growth of methicillin-resistant Staphylococcus aureus (MRSA). Validated computational protein design tools have been used to re-engineer the Aureocin A53 sequence to produce novel sequence variants of the bacteriocins Aureocin A53 and Lacticin Q. The novel proteins are expected to possess an altered spectrum of bactericidal specificity and potency. The quality of the designed proteins was assessed using structure validation tools and predicted to be better than that of an average experimentally determined protein structure. The protein designed using FoldX is predicted to be more stable than native Aureocin A53.
Keywords: Bacteriocin; computational protein design; antibiotic; protein engineering; molecular modeling; MRSA (methicillin-resistant Staphylococcus aureus).
An Improved Method to enhance protein structural class prediction Using their Secondary Structure Sequences and Genetic Algorithm
by Mohammed Aldulaimi
Abstract: Many approaches have been proposed to enhance the accuracy of protein structural class prediction. However, such approaches did not cover low-similarity sequences, which have proved to be quite challenging. In this study, a 71-dimensional integrated feature vector is extracted from the predicted secondary structure and hydropathy sequence using newly devised strategies, for the purpose of categorizing proteins into their major structural classes: all-α, all-β, α/β, and α+β. A new combined method containing two machine-learning algorithms is proposed for feature selection. Support Vector Machine (SVM) and Genetic Algorithm (GA) are combined using the wrapper method to select the top N features based on their level of importance. The proposed method is evaluated using the jackknife test on two low-similarity sequence datasets, i.e. ASTRAL and D640. Overall accuracies of 83.93% and 92.2% are reported for the ASTRAL_testing and D640 benchmarks, exceeding most current approaches.
Keywords: Feature Selection; Genetic Algorithm; Hydropathical information; Secondary Structure Sequence; Low-similarity; Support Vector Machine.
Molecular docking and in vitro study of S. cumini-derived natural compounds on Receptor tyrosine kinases pathway components
by Pushpendra Singh, Felix Bast, Satej Bhushan, Richa Mehra, Pooja Kamboj
Abstract: Syzygium cumini (S. cumini) is used for a variety of biological activities, such as anti-inflammatory, antidiabetic and antioxidant effects, and it has recently been reported to protect DNA against radiation. Receptor tyrosine kinases (RTKs) are recognized to control various biological processes, including cell proliferation, metabolism, and apoptosis. These receptors have recently attracted attention as a target for cancer treatment owing to evidence of their over-expression in cancer cells. The present research screened S. cumini-derived natural compounds against RTK pathway components using molecular docking. Furthermore, the in vitro anticancer activity of S. cumini leaf extract, assessed by cell proliferation (MTT) and oxidative stress (NBT and H2CDFD) assays, is reported. All selected natural compounds were docked with the X-ray crystal structures of RTK signaling proteins using GLIDE (Grid-based ligand docking with energetics) in Maestro 9.6. Our results highlight that myricetin, kaempferol, delphinidin chloride, ellagic acid, rutin, petunidin, gossypol, and mirtillin yielded a good dock score with all selected proteins. Protein-ligand interactions showed that lipophilic, hydrogen bonding, π-π stacking, and cation-π interactions make a dominant contribution at the active site. Moreover, reduction in cell viability with leaf extract of S. cumini treatment at concentrations of 5
Keywords: Cancer; Receptor tyrosine kinases; Phosphoinositide-3 Kinase; Natural product compounds; Maestro 9.6.
Cell-Level 3D Reconstruction and Quantification of the Drosophila Wing Imaginal Disc
by David Breen, Liyuan Sui, Linge Bai, Frank Jülicher, Christian Dahmann
Abstract: We describe a set of techniques that, when applied to a 3D stack of confocal microscopy images, produces a volumetric model of an epithelial tissue, as well as a mesh model of its apicolateral cell boundaries. Via a projection step, detailed 3D models that approximate the individual cells in the epithelium are then defined. Once the individual cells are generated, their apical face area, height and volume may be computed and visualised, providing quantitative and visual data about the patterns of cells within the tissue. We have applied the techniques to the analysis of the developing wing imaginal disc of a late-larval Drosophila melanogaster. Our techniques are being applied to a series of specimens in an investigation that intends to quantitatively substantiate observed cell shape changes that occur during wing imaginal disc development.
Keywords: Reconstruction; implicit models; epithelial tissues; wing imaginal disc; visualisation.
Construction of Discrete Descriptions of Biological Shapes through Curvilinear Image Meshing
by Jing Xu, Andrey Chernikov
Abstract: Mesh generation is a useful tool for obtaining discrete descriptors of biological objects represented by images. The generation of meshes with straight sided elements has been fairly well understood. However, in order to match curved shapes that are ubiquitous in nature, meshes with curved (high-order) elements are required. Moreover, for the processing of large data sets, automatic meshing procedures are needed. In this work, we present a new technique that allows for the automatic construction of high-order curvilinear meshes. This technique allows for a transformation of straight-sided meshes to curvilinear meshes with C1 or C2 smooth boundaries while keeping all elements valid and with good quality as measured by their Jacobians. The technique is illustrated with examples. Experimental results show that the mesh boundaries naturally represent the objects' shapes, and the accuracy of the representation is improved compared to the corresponding linear mesh.
Keywords: biomedical image processing; high-order mesh generation; B.
RECENT ADVANCEMENT IN NEXT-GENERATION SEQUENCING TECHNIQUES AND ITS COMPUTATIONAL ANALYSIS
by Khalid Raza, Sabahuddin Ahmad
Abstract: Next Generation Sequencing (NGS), a recently evolved technology, has contributed greatly to research and development. This approach is new and has critical advantages over traditional Capillary Electrophoresis (CE) based Sanger sequencing. The advancement of NGS has led to numerous important discoveries that would have been costlier and more time-consuming with traditional CE-based Sanger sequencing. NGS methods are highly parallelized, enabling thousands to millions of molecules to be sequenced simultaneously. This technology produces huge amounts of data, which need to be analysed to extract valuable information. Specific data analysis algorithms are written for specific tasks, and groups of such algorithms act as tools for analysing NGS data. Analysis of NGS data unravels important clues in the quest for treatments for various life-threatening diseases, improved crop varieties and other scientific problems related to human welfare. This review addresses the basic background of NGS technologies, possible applications, computational approaches and tools involved in NGS data analysis, and future opportunities and challenges in the area.
Keywords: Massive Parallel Sequencing; Variant Discovery; DNA-Seq; RNA-Seq; Computational Analysis.
Application of machine learning techniques towards classification of drug molecules specific to peptide deformylase against Helicobacter pylori
by Surekha Patil
Abstract: It is crucial to adapt to the current computational drug discovery pipeline to develop novel drug molecules to combat the gastric disorders caused by Helicobacter pylori. Virtual screening techniques can be used as a preliminary screening tool to identify the relevant compounds which may have drug-like properties. These drug-like molecules can be further screened to test their bioactivity against a particular protein target. In this context, we apply different machine learning techniques to generate models to predict the pIC50 value of drug molecules. Molecular descriptors were produced for the drug dataset. Initial models were developed for the dataset with a large number of descriptors. Later, feature reduction techniques were applied to yield feature descriptors with best six variables using three algorithms: principal component analysis (PCA), random forest, and genetic algorithm. Consequently, machine learning techniques were applied to the reduced dataset to develop predictive models. Na
Keywords: Helicobacter pylori; gastric disorders; drug molecule; target protein; virtual screening.
Computational study to understand mechanism of isoniazid drug resistance caused by mutation (R268H) in NADH dehydrogenase of Mycobacterium tuberculosis
by Lingaraja Jena, Shraddha Deshmukh, Tapaswini Nayak, Gauri Wankhade, Bhaskar Harinath
Abstract: NADH dehydrogenase (Ndh) of Mycobacterium tuberculosis is essential for conversion of NADH to NAD+ in presence of FMN. An increased NADH/NAD+ ratio was reported due to mutation (R268H) in Ndh, causing INH resistance. To study the effect of this mutation on Ndh, molecular dynamics (MD) simulation analysis was performed for both wild and mutant models independently as well as for docked complexes (Ndh-NADH and Ndh-FMN). Simulation study showed that mutation (R268H) affected the secondary structure of the enzyme giving extra stability to the mutant model R268H as observed in the RMSD plot. Further, it was observed that both wild type and mutant models of Ndh were quite stable in complex with NADH but in case of FMN, the Ndh mutant appears to be more unstable and might be the reason for decreasing NAD+ concentrations thus hindering INH-NAD adduct formation resulting in isoniazid resistance.
Keywords: NADH dehydrogenase; tuberculosis; isoniazid; drug resistance; mutation; NAD.
A Survey of Predictive Analytics Using Big Data with Data Mining
by Poornima S., Pushpalatha M
Abstract: Today, the world is filled with data, just as it is with oxygen. The amount of data being harvested and consumed in the digital world is growing vigorously. The growing use of novel inventions and social media leads to the generation of huge quantities of data, which can yield remarkable information if analysed properly. Such huge datasets are widely referred to as big data, and conventional databases are not sufficient to hold them because of their size. Organizations need to manage and analyse big data to reach better decisions and results. Hence, big data analytics has recently been receiving attention. Information technology has now reached the generation of big data, which yields large volumes of structured and unstructured data for researchers and analysts. People have sufficiently large data in their hands, and this copious data contains valuable insights that help policymakers, administrators and business analysts make the right decisions at the right time. To find the concealed value in the available data, society requires a few schemes or strategies. Predictive analytics therefore becomes vital when a substantial quantity of highly sensitive data has to be handled. Future probabilities and measures are predicted on the basis of observed events. With the aid of available data mining techniques, predictive analytics is used to predict future events and make recommendations. Predictive analytics comprises several statistical and analytical techniques for developing novel strategies for predicting future possibilities. In this review paper, different platforms and algorithms for big data and predictive analytics are discussed, along with the nature of the datasets used for prediction and their pros and cons. The survey concludes with the issues and future approaches in big data.
Keywords: Big Data; Predictive Analytics; Data Mining; Classification.
Statistical Analysis of the in silico binding affinity of P-glycoprotein and its substrates with their experimentally known parameters to demonstrate a cost-effective approach for screening, ranking and possible prediction of potential substrates
by Suneetha Susan Cleave A, P.K. Suresh
Abstract: Over-expression of P-glycoprotein (P-gp) has been reported as a cause of multi-drug resistance in cancers and other diseases. Transport assays, which are generally used to determine whether a compound is specifically effluxed, have always been time-consuming, resource-intensive and expensive, and thus have inherent limitations for easily predicting a compound's specificity. Hence, there is a clear, unmet need to develop cost-effective methods for screening, identifying and ranking P-gp substrates. All compounds (23 substrates and 3 non-substrates) were docked to two homology-modeled human P-gp conformations. The in silico binding affinities obtained for all substrates were checked for correlation with their experimentally determined efflux ratios, LogP values and number of hydrogen bond acceptors. Docking results showed that the compounds differed in relative binding affinity. The experimentally derived efflux ratios obtained from the literature for 19 substrates showed, for the first time, a significant Spearman correlation with binding energies to the outward-facing conformation. Thus, binding energies obtained from docking studies may have significant potential for identifying the specificity of P-gp substrates and ranking them. This approach provides a sound foundation for strengthening the relationship of in silico binding energies with other experimentally defined physico-chemical parameters, and can also be part of an iterative process to identify and develop a potentially validatable solution.
Keywords: Autodock; in silico binding energy; P-glycoprotein (P-gp); efflux ratio; LogP; hydrogen bond acceptors; Spearman Rank Correlation.
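The rank correlation used in the abstract above can be sketched as follows: Spearman's coefficient is the Pearson correlation computed on ranks, with tied values given their average rank. The function names are hypothetical; no data from the paper are reproduced here.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, the coefficient is unaffected by any monotonic rescaling of binding energies or efflux ratios.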
Genetic algorithm based clustering for gene-gene interaction in episodic memory
by Sudhakar Tripathi, Ravi Bhushan Mishra, Anand Sharma
Abstract: After the identification of several disease-associated polymorphisms by genome-wide association (GWA) analysis, it is now clear that gene-gene interactions are fundamental mechanisms in the development of complex diseases. In this paper, we propose a genetic algorithm based clustering algorithm to identify groups of related genes in episodic memory. This clustering method requires the number of clusters, the number of genes in each cluster and a fitness function. We have taken the result of the STRING 9.1 clustering method on episodic memory as input. We have used the interaction between clusters as the fitness function for the genetic algorithm and have compared the result of the genetic algorithm based clustering algorithm with standard K-means, STRING 9.1 K-means, hierarchical clustering and SOM. We have evaluated the performance of all the above methods using the Rand index, Jaccard index and Minkowski index. Our comparative study demonstrates that the proposed genetic algorithm is close to the hierarchical clustering method as far as performance is concerned.
Keywords: gene-gene interaction; clustering; genetic algorithm; k-means; hierarchical; SOM; STRING 9.1.
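Two of the evaluation measures named in the abstract above, the Rand index and the Jaccard index, compare two clusterings by counting pairs of items. A minimal sketch follows; the function names are hypothetical, the Minkowski index is not covered, and returning 1.0 from the Jaccard index when no pair is co-clustered in either labelling is a convention assumed for this example.

```python
from itertools import combinations


def pair_counts(labels_a, labels_b):
    """Count item pairs by whether each clustering puts them together.

    Returns (same_same, same_diff, diff_same, diff_diff).
    """
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            ss += 1
        elif same_a:
            sd += 1
        elif same_b:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd


def rand_index(labels_a, labels_b):
    """Fraction of pairs on which the two clusterings agree."""
    ss, sd, ds, dd = pair_counts(labels_a, labels_b)
    return (ss + dd) / (ss + sd + ds + dd)


def jaccard_index(labels_a, labels_b):
    """Agreement on co-clustered pairs, ignoring pairs separated in both."""
    ss, sd, ds, _ = pair_counts(labels_a, labels_b)
    denominator = ss + sd + ds
    return ss / denominator if denominator else 1.0  # assumed convention
```

Both measures depend only on which items share a cluster, so relabelling the clusters of either partition leaves the score unchanged.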
Effect of single amino acid mutations on C-terminal domain of breast cancer susceptible protein 1
by Satish Kumar, Lingaraja Jena, Maheswata Sahoo, Kanchan Mohod, Sangeeta Daf, Ashok Varma
Abstract: Breast cancer is the most commonly diagnosed cancer in women. Around 5-10% of breast cancer cases are hereditary, mainly due to mutations in the breast cancer susceptibility genes BRCA1 and BRCA2, which are tumor suppressors. Hundreds of mutations are documented in the BRCA1 C-terminal region (BRCT), which is mainly associated with repairing DNA damage and cell cycle control. In this study, we employed different mutation analysis systems, such as SIFT, MutPred, PON-P2 and META-SNP, to predict the pathological effects of 95 distinct missense mutations on the BRCT domain. Of these, 37 mutations were predicted to be deleterious by all mutation analysis systems, affecting protein stability and normal function and thereby leading to cancer. The computational approach for determining the impact of mutations on the BRCA protein may provide a route to early detection and therapy in breast cancer patients.
Keywords: breast cancer; mutation; BRCA1; BRCT; bioinformatics; mutation analysis.
On using the wisdom of the crowd principles in classification, Application on breast cancer diagnosis and prognosis.
by Merouane Amraoui, Tarik Boudghene Stambouli, Belal Alshaqaqi
Abstract: Breast cancer diagnosis and prognosis are oblique processes in which errors can be fatal, and they are carried out by experts only. Therefore, researchers are using the promising potential of classification algorithms to detect malignant and benign tumours. Classification techniques vary widely, from individual classifiers such as rules, trees and functions to ensemble classifiers that combine several classification algorithms. In this paper, we examine the use of the wisdom of crowds in the classification of breast cancer. We use four well-known data sets and run a collection of 53 algorithms combined with majority voting to simulate the wisdom of crowds. Furthermore, we report the results obtained from all 53 algorithms executed individually on the four datasets. Therefore, this article can be perceived as a review of the classification methods as well. Finally, we compare the results obtained from applying majority voting using the best five classifiers to those obtained by applying the wisdom of the crowds.
Keywords: breast cancer; wisdom of the crowd; WDBC; WPBC; BCD; Wisconsin; Weka; classification; majority voting; diagnosis; prognosis.
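Majority voting, as mentioned in the abstract above, can be illustrated with a short sketch. This is a generic version under stated assumptions rather than the authors' Weka setup: the function name, the tie-breaking rule (the tied label voted by the earliest classifier wins) and the toy labels "M" and "B" are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: one list of labels per classifier, all of the same length.
    Ties are broken in favour of the tied label that appears first among
    the votes for that sample (an assumption made for this sketch).
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must label every sample")
    combined = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        combined.append(next(v for v in sample_votes if counts[v] == top))
    return combined


# Three hypothetical classifiers labelling three samples as malignant or benign.
votes = [
    ["M", "B", "B"],
    ["M", "M", "B"],
    ["B", "M", "B"],
]
print(majority_vote(votes))
```

With an odd number of classifiers and two classes no tie can occur, so the tie-breaking rule only matters for even-sized or multi-class ensembles.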
Potential of photoplethysmogram for the detection of calcification and stenosis in lower limb
by Neelamshobha Nirala, Periyasamy R, Awanish Kumar
Abstract: Background: Early detection of arterial stiffness and atherosclerosis in the lower limb is useful for the prediction of cardiovascular and diabetic foot diseases. Photoplethysmogram (PPG) is a simple, low-cost, non-invasive technique that has been used widely for the detection of arterial stiffness and screening of peripheral arterial disease. Therefore, the aim of the present study is to use toe PPG signal based features for the screening of peripheral arterial disease (PAD) and the detection of arterial stiffness due to calcification, and to differentiate it from normal and occlusive arteries. Methods: In this context, a total of 34 subjects were recruited and divided into three groups: group-I included 15 normal subjects, group-II comprised 6 subjects with known calcification and group-III consisted of 13 PAD patients. Six features, namely rise time (RT), area under rise-time (AUR), area under diastole (AUD), Area, aging index and b/a ratio, were derived from the PPG, and significant differences between the three groups were analyzed statistically. Result: Significantly low ABI value was obtained for PAD and it was detected only by RT (0.3037
Keywords: Area under rise time; Photoplethysmogram; Vascular Calcification; Peripheral Arterial Disease; Rise time.
Identification of protein complexes in protein-protein interaction networks by core-attachment approach incorporating gene expression profile
by Seketoulie Keretsu, Rosy Sarmah
Abstract: Due to the advancement in proteomic technologies, bulk data of protein-protein interactions (PPI) are available, which give researchers in bioinformatics the opportunity to explore and understand biological properties and structure from a networking perspective. Identification of protein complexes is a challenge that has attracted researchers, particularly in computational biology. Various computational approaches have been developed to identify protein complexes in PPI networks. In this paper, we give a new method based on the core-attachment approach with incorporation of gene expression data, known as core-attachment with gene (CAG) expression, to identify protein complexes in PPI networks. Experimental results support that our method CAG can detect protein complexes effectively. Validation by biological information, namely co-localisation and gene ontology semantic similarity score, reveals that the complexes predicted by our method have high biological relevance. We also give a comparison of our method with four other popular methods in the field.
Keywords: gene expression analysis; protein clustering; protein complexes; protein-protein interaction networks.
In-silico analysis of marker genes from gene expression data of solanaceous plants responsible for various abiotic stresses
by Sanchita Gupta, Garima Singh, Swati Srivastava, Ashok Sharma
Abstract: Understanding the responses of plants to any environmental condition requires the expression analysis of transcriptome data. The present work focused on identifying the group of genes of Solanum tuberosum that are differentially expressed under different abiotic stresses. The public database was accessed for gene expression data in response to cold, heat and salt stresses, respectively. Furthermore, the common genes responding to all three abiotic conditions, considered as marker genes, were analysed. The gene ontology classification of the marker genes and their visualisation in metabolic pathways were also analysed. The genes responsible for kunitz-type protease inhibitor precursor were found to be up-regulated, whereas the genes encoding lipid transfer protein showed down-regulation. These marker genes may be studied for further validation to see their role in the stress responses of the medicinally important plants of the Solanaceae family.
Keywords: abiotic stress; co-expression; functional annotation; gene expression; gene ontology; metabolic pathway; microarray analysis; network analysis; solanaceae; Solanum tuberosum.
A novel approach to knowledge discovery and representation in biological databases
by Jing Lu, Cuiqing Wang, Malcolm Keech
Abstract: Extraction of motifs from biological sequences is among the frontier research issues in bioinformatics, with sequential patterns mining becoming one of the most important computational techniques in this area. A number of applications motivate the search for more structured patterns and concurrent protein motif mining is considered here. This paper builds on the concept of structural relation patterns and applies the concurrent sequential patterns (ConSP) mining approach to biological databases. Specifically, an original method is presented using support vectors as the data structure for the extraction of novel patterns in protein sequences. Data modelling is pursued to represent the more interesting concurrent patterns visually. Experiments with real-world protein datasets from the UniProt and NCBI databases highlight the applicability of the ConSP methodology in protein data mining and modelling. The results show the potential for knowledge discovery in the field of protein structure identification. A pilot experiment extends the methodology to DNA sequences to indicate a future direction.
Keywords: bioinformatics; biological databases; concurrent vector method; data analytics; DNA sequences; graphical modelling; knowledge discovery; protein motif mining; sequential patterns post-processing; structural relations.
Identification of potential biomarkers in nasopharyngeal carcinoma based on protein interaction analysis
by Yulanda Antonius, Didik Huswo Utomo, Widodo
Abstract: Nasopharyngeal carcinoma (NPC) is a malignant tumour that is strongly related to Epstein-Barr virus infection. Several methods are available for diagnosis, but they only indicate the viral titre. This research aims to identify new potential biomarkers and their contributions to the NPC signalling pathway. Biomarkers were identified by topological analysis, modularity analysis and functional analysis using Cytoscape 3.2.1. Furthermore, the expression of the candidate biomarkers was confirmed with microarray data from NCBI and analyzed by non-paired t-test. The results showed four potential biomarkers with the highest value in each parameter of the topological analysis, namely RPA1, USP7, UBC and TERF2, but only RPA1 was included in the protein module with the highest score of 4.526, while UBC and TERF2 were involved in protein modules with lower scores. Moreover, RPA1 has high expression in NPC samples (p < 0.05; FC = 1.07) and is mainly related to the cell cycle pathway. This study might help to understand the NPC mechanism and develop an appropriate treatment.
Keywords: biomarker; latent protein; nasopharyngeal carcinoma; protein analysis; protein network; tumourigenesis.
Computational approach to reveal the modulation of Wnt and TGF-β signalling induced by Solenopsin B, an ant venom alkaloid
by Priya Das, Umesh P, Pawan K. Dhar, Achuthsankar S. Nair, Oommen V. Oommen
Abstract: Recently, the therapeutic prospect of solenopsin A was reported, with emphasis on the inhibition of angiogenesis by antagonising Akt and inhibiting insulin-mediated PI3K activation. In this study, we attempt to computationally predict the genes and pathways altered by solenopsin B using microarray data. Functional analysis of differentially expressed genes, i.e., gene ontology and pathway enrichment analysis using bioinformatics tools, specifically indicated the gene-level variations leading to down-regulation in the Wnt, ErbB and TGF-β signalling pathways.
Keywords: differential gene expression; fire ant; functional annotation; solenopsin B.
Special Issue on: Trends in Medical Imaging and Health Informatics
MATHEMATICAL INVESTIGATION OF AETIOLOGY AND PATHOGENESIS OF ATHEROSCLEROSIS IN HUMAN ARTERIES
by Gayathri Kaliappan, Shailendhra Karthikeyan
Abstract: To understand the role of medically significant hemodynamic wall parameters (HWPs) in the pathogenesis of vascular diseases, pulsatile blood flow in large human arteries of the systemic, pulmonary and coronary circulations is investigated by mathematical modeling. To be medically realistic, the pressure gradient waveforms reported in the cardiology literature for the arteries considered are digitized and developed in Fourier series (McDonald's model). The three objectives of the article are to (i) compare qualitatively and quantitatively the pulsatile blood flow between the parallel plate and circular geometry, (ii) compare the HWPs in the three major circulations mentioned above to gain new medical or physiological insights and (iii) understand whether slip at the wall has a significant influence on the HWPs. Our model is reliable since the results obtained here through exact solutions are in close agreement with those reported in the medical literature. New insights gained from our study, documented here for the first time in the hemodynamic literature, are: the parallel plate geometry approximation is not reliable quantitatively; the larger the radius (Womersley number), the larger the value of RRT and hence the higher the probability of vascular diseases; none of the commonly employed interface conditions are suitable for hemodynamic studies. Comparing our results with earlier studies, we recommend that future research should focus on developing an interface condition exclusively for hemodynamics. We support the recent understanding that low wall shear stress and high oscillatory shear index need not co-locate. We have rendered new physiological insight for this result.
Keywords: Hemodynamics; Saffman slip condition; Wall Shear Stress; Oscillatory Shear Index; Relative Residence Time.
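For readers unfamiliar with the wall parameters named in the abstract above, the sketch below computes the time-averaged wall shear stress (TAWSS), oscillatory shear index (OSI) and relative residence time (RRT) using their commonly cited definitions. It is not the authors' model: it assumes a signed, scalar wall shear stress sampled uniformly over one cardiac cycle, and the function name and sample values are hypothetical.

```python
def wall_shear_metrics(wss, dt):
    """Return (TAWSS, OSI, RRT) for signed WSS samples over one cycle.

    wss: signed wall shear stress samples, uniformly spaced in time
    dt:  time step between samples (same time unit as the cycle period)

    Integrals over the cycle are approximated with the rectangle rule.
    """
    period = dt * len(wss)
    mean_signed = sum(wss) * dt / period             # (1/T) * integral of tau
    tawss = sum(abs(v) for v in wss) * dt / period   # (1/T) * integral of |tau|
    osi = 0.5 * (1.0 - abs(mean_signed) / tawss)     # 0 = unidirectional, 0.5 = fully oscillatory
    # RRT is undefined (division by zero) when OSI is exactly 0.5.
    rrt = 1.0 / ((1.0 - 2.0 * osi) * tawss)
    return tawss, osi, rrt


# Illustrative samples only (arbitrary units), not values from the paper.
tawss, osi, rrt = wall_shear_metrics([2.0, 2.0, -1.0, 1.0], dt=0.25)
print(tawss, osi, rrt)
```

Under these definitions RRT grows as OSI approaches 0.5 or as TAWSS falls, which is why it is often discussed as a single combined indicator.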
Machine Intelligence in Stroke Prediction
by Jeena R S, Sukeshkumar A
Abstract: The innovative developments in the field of machine intelligence have paved the way for the growth of tools for assisting physicians in disease diagnosis. Early diagnosis and prognosis of stroke is crucial for timely prevention and cure. This research work focuses on the design of a stroke prediction system by investigating the various physiological parameters that are used as risk factors. Features extracted from various risk parameters carry vital information for the prediction of stroke. The classification algorithms used for prediction, with a number of attributes, are SVM and ANN. Both classifiers were successfully trained and tested on data collected from the International Stroke Trial database. The predictive models discussed here are based on different supervised machine learning techniques as well as on different input features and data samples. SVM gave an accuracy of 91%, while the neural network outperformed SVM with an accuracy of 98.1%.
Keywords: Stroke; Artificial Neural Network (ANN); Support Vector Machine (SVM).
Enhanced decision tree algorithm using genetic algorithm for heart disease prediction
by Santosh Kumar, G. Sahoo
Abstract: At present, heart disease has a great impact on our lives and is identified as fatal owing to its high mortality rate. The diagnosis of heart disease is more challenging due to its vulnerability. Owing to the limitations of previous work identified in the literature survey, an enhanced decision tree algorithm is introduced and applied to UCI datasets. In order to predict heart disease, the enhanced decision tree algorithm generates decision rules which are later optimized by a genetic algorithm. We then examine the methods and operators of the algorithm. Finally, our proposed algorithm is compared with the decision tree (C4.5) and support vector machine (SVM) algorithms. The proposed algorithm shows high accuracy, and its simplicity makes it ideal for pattern recognition applications.
Keywords: C4.5; SVM; Genetic Algorithm (GA); Cardiovascular Disease (CVD); Heart Disease.
A Mathematical modeling on the effect of high intensity magnetic fields on pulsatile blood flow in human arteries
by Gayathri Kaliappan, Shailendhra Karthikeyan
Abstract: An attempt is made to investigate whether the static magnetic field (SMF) employed in MRI has any adverse effect on the hemodynamic wall parameters in large arteries. With the intention of addressing the controversy over safety issues during MRI exposure, the hemodynamics and pathology of large arteries such as the brachial, femoral and pulmonary arteries are compared by varying the intensity of the SMF from high to ultra high. To be more medically accurate, physiological pressure gradient waveforms taken from the cardiology literature were digitized and an adequate number of harmonics were extracted in order to represent them as Fourier series. All the medically relevant parameters related to endothelial functioning are significantly affected during exposure to ultra high intensity SMF, irrespective of whether the artery is close to or far from the heart. In such fields, the fluctuation of the WSS vector in the pulmonary artery is too severe, as inferred from OSI values. The common hypothesis that low WSS and high OSI co-locate is not acceptable either in the absence or in the presence of a magnetic field. It is also inferred that RRT can be considered as a single robust metric to predict the pathogenesis of vascular diseases when OSI is moderate. It is felt that more research is necessary, especially to clarify many existing contradictory results in this regard. The controversial reports on SMF in the literature motivated us to mathematically investigate the possible adverse effects of ultra high SMFs on pulsatile blood flow in large human arteries and to find the maximum intensity of SMF up to which the blood flow and other medically relevant parameters are not significantly affected.
Keywords: Ultra high intensity magnetic field; Hemodynamic wall parameters; McDonald model.
Texture Analysis of Breast Thermograms using Neighborhood Gray Tone Difference Matrix
by Dayakshini Sathish, Surekha Kamath, Keerthana Prasad, Rajagopal Kadavigere
Abstract: Breast cancer is the leading cancer in women worldwide. Early detection can reduce the mortality rate of breast cancer. Breast thermography is a noninvasive and simple imaging technique used for early detection of breast cancer. Feature extraction and selection of appropriate features play a major role in computer aided detection of breast cancer using breast thermograms. In this article, texture features are extracted from automatically segmented breast thermograms by computing the neighborhood gray tone difference matrix (NGTDM) and the run length matrix (RLM). The significance of these features in differentiating the abnormal breast from the normal breast is assessed by a statistical test. The NGTDM features coarseness, busyness, complexity and strength and the RLM features long run emphasis and run percentage are found to be significant by the statistical test. The extracted features are computationally less expensive and attained an average accuracy of 80%, sensitivity of 94% and specificity of 71.4% using a back propagation neural network classifier.
Keywords: breast cancer; breast thermography; asymmetry analysis; statistical test; neighborhood gray tone difference matrix.
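The accuracy, sensitivity and specificity figures quoted in the abstract above follow from a binary confusion matrix. The sketch below shows the standard calculation only; the function name, the label strings and the toy inputs are hypothetical and are not data from the paper.

```python
def classification_metrics(actual, predicted, positive="abnormal"):
    """Return (accuracy, sensitivity, specificity) for binary labels.

    Sensitivity is the fraction of positive cases detected; specificity is
    the fraction of negative cases correctly rejected. Assumes at least one
    positive and one negative case in `actual`.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


# Toy labels for five thermograms (illustrative only).
actual = ["abnormal", "abnormal", "normal", "normal", "normal"]
predicted = ["abnormal", "normal", "normal", "abnormal", "normal"]
print(classification_metrics(actual, predicted))
```

Reporting sensitivity and specificity alongside accuracy matters when the two classes are unbalanced, since accuracy alone can hide poor detection of the rarer class.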
Blood Glucose Regulation in Diabetes Mellitus Patients: A Review on Mathematical Plant Model and Control Algorithms
by Cifha Dias, Surekha Kamath, Sudha Vidyasagar
Abstract: In the current industrialized era, Diabetes Mellitus, a metabolic disease commonly termed diabetes, is spread worldwide. In this condition the human body is not able to maintain blood glucose within an acceptable range of 70-180 mg/dl, which could lead to prolonged sickness. Diabetes at least doubles a person's risk of death. By 2035, the expected deaths are projected to rise to 592 million. Several studies, especially on the regulation of blood glucose, have been carried out, yet it remains an open challenge. With the advance of modern technology and treatment for diabetes mellitus, there are several open opportunities to develop different methods of treatment. The main emphasis is on developing an efficient controller for the device which mimics the human pancreas. The elimination of the risk of occurrence of hypoglycemia is the main concern. Though many Artificial Pancreas systems are available, they are still subject to many limitations. The challenges in the development of an Artificial Pancreas are choosing an appropriate mathematical model while developing an efficient control algorithm. Various mathematical models and methodological approaches are reviewed and elaborated in this paper.
Keywords: Diabetes Mellitus; Hypoglycemia; Mathematical Plant Models; Control Strategy; Insulin Pump.
PFA based Feature Selection for Image Steganalysis
by Madhavi Desai
Abstract: This paper presents a universal image steganalysis method based on feature selection by Principal Feature Analysis. The goal of this paper is to increase the performance of existing image steganalysis approaches using a principal feature analysis (PFA) based feature selection method and to reduce the high dimensionality of the features used in state-of-the-art steganalysis methods. Principal component analysis (PCA) is widely used in pattern recognition applications. However, PCA has the disadvantage that all the generated features are transformed features, whereas PFA selects the subset of preliminary features which contains the necessary information. Principal Feature Analysis is applied to spatial domain SPAM (subtractive pixel adjacency matrix) features and, in the transform domain, to CHEN features (intrablock and interblock Markov based features) and CC-PEV features (PEV features enhanced by Cartesian calibration). The experimental results show that Principal Feature Analysis is effective and efficient in eliminating redundant features. The experimental results also indicate that the use of the PFA method in steganalysis is superior in terms of dimensionality reduction of features and increases the classification performance.
Keywords: Image steganography; Image Steganalysis; High Dimensional Feature; Feature Selection; Principal Feature Analysis.