International Journal of Bioinformatics Research and Applications (45 papers in press)
Bioinformatics Resources and Approaches for The Interaction of Oryza Sativa and Magnaporthe Oryzae Pathosystem
by Vinay Sharma, Varshika Singh, Pramod Katara
Abstract: Rice is a major cereal crop and a staple food for a large part of the world's population. Rice blast, caused by Magnaporthe oryzae, is a very important disease that reduces rice production and occurs wherever rice is grown. It is also considered a model disease for studying the genetics and molecular pathology of host-pathogen interactions. Numerous comprehensive studies of both host and pathogen have been carried out using genomics, proteomics and bioinformatics approaches. Consequently, an enormous amount of information is available to researchers for further work on this pathosystem. Bioinformatics has played a significant role in storing the data produced by wet-laboratory experiments and interpreting it as useful biological information. This review presents an overview of the bioinformatics resources and approaches for the study of the rice-Magnaporthe interaction.
Keywords: bioinformatics; disease; nucleotide sequence; pathogen; database; host-pathogen interaction; rice blast; genomics.
Efficient Formulation of the Rejection-based Algorithm for Biochemical Reactions with Delays
by Vo Thanh, Roberto Zunino, Corrado Priami
Abstract: The rejection-based stochastic simulation algorithm (RSSA) is an exact simulation algorithm for realizing the temporal behavior of biochemical reactions. It reduces the number of propensity updates during the simulation by using propensity bounds of reactions to select the next reaction firing. We present in this paper a new efficient formulation of RSSA and extend it to incorporate biochemical reactions with time delays. Our new algorithm explicitly keeps track of the putative firing times of reactions and uses these to select the next reaction firing. By using such a representation, it can efficiently handle biochemical reactions with delays and achieve computational efficiency over existing approaches for exact simulation.
Keywords: Computational biology; Stochastic simulation; Rejection-based stochastic simulation algorithm.
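For orientation, the rejection step that RSSA relies on can be sketched in Python for a toy birth-death system. This is a minimal sketch of the basic rejection-based scheme without delays, not the new formulation announced in the abstract; the function name `rssa_birth_death`, the rate constants and the fluctuation parameter `delta` are illustrative assumptions.

```python
import math
import random


def rssa_birth_death(x0, k_birth, k_death, t_end, delta=0.1, seed=0):
    """Toy RSSA run for 0 -> A (rate k_birth) and A -> 0 (rate k_death * A)."""
    rng = random.Random(seed)
    changes = [+1, -1]  # state change caused by each reaction

    def propensities(x):
        return [k_birth, k_death * x]

    t, x = 0.0, x0
    trajectory = [(t, x)]
    while t < t_end:
        # Fluctuation interval around the current state (delta is an assumed width).
        x_low = max(0, math.floor(x * (1 - delta)))
        x_high = math.ceil(x * (1 + delta)) + 1
        # Both propensities are non-decreasing in x, so the interval ends give the bounds.
        a_low, a_high = propensities(x_low), propensities(x_high)
        a0_high = sum(a_high)

        # The bounds stay valid while the state remains inside the interval.
        while x_low <= x <= x_high and t < t_end:
            u = 1.0
            accepted = False
            while not accepted:
                # Pick a candidate reaction with probability a_high[j] / a0_high.
                j = 0 if rng.random() * a0_high < a_high[0] else 1
                r = rng.random()
                if r < a_low[j] / a_high[j]:
                    accepted = True  # accepted without evaluating the exact propensity
                else:
                    accepted = r < propensities(x)[j] / a_high[j]
                # Every trial, accepted or not, contributes an exponential waiting time.
                u *= 1.0 - rng.random()
            t += -math.log(u) / a0_high
            x += changes[j]
            if t <= t_end:
                trajectory.append((t, x))
    return trajectory


path = rssa_birth_death(x0=20, k_birth=10.0, k_death=0.5, t_end=5.0)
```

The exact propensity is evaluated only when the cheap test against the lower bound fails, which is where the saving in propensity updates comes from.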
Exploring New Features of α-amylases from Different Source Organisms by an In Silico Approach
by Javad Harati
Abstract: A total of 78 full-length protein sequences of α-amylase from different source organisms were subjected to phylogenetic analysis, multiple sequence alignment (MSA), motif search, and analysis of physicochemical properties. The phylogenetic tree was built using the Maximum Likelihood (ML) method in Molecular Evolutionary Genetics Analysis (MEGA) software and resolved into two major clusters. One of the clusters included plants and animals, whereas the other contained fungi, archaea, and bacteria. Furthermore, Firmicutes and Proteobacteria are bacterial phyla that were placed in the same evolutionary cluster as plants and animals. The deviations from the expected clusters were explained by both motif analysis data and the construction of a new tree. MSA revealed three conserved sequence blocks, 505-527, 725-745, and 1010-1030, that were present in all studied species. Moreover, it provided information about highly conserved residues: three glycine residues and one aspartic acid residue were conserved. Motif analysis with the Multiple EM for Motif Elicitation (MEME) server revealed that motif 4 HDTGSTQRHWPFPSDHVMQGYAYILTHPGIPCIFYDHFFDW, motif 6 EGAGGPSTAFDFTTKGILQEAVKGELWRLRDPQGKPPGMIGWWPERAVTF, and motif 11 EQIVKLIAIRKRNGIHSRSSIRILEAEGDLYVAMIDEKVCMKIG were present only in plants. Pearson correlation analysis of the relationships among different physicochemical properties showed a direct correlation between GRAVY and the aliphatic index and an inverse correlation between GRAVY and both pI and the instability index.
Keywords: α-Amylase; Sequence analysis; Phylogenetic analysis; Conserved regions and residues; Physicochemical characteristics.
Computational Protein Design of Bacteriocins based on structural scaffold of aureocin A53
by Sekhar Talluri
Abstract: Bacteriocins are highly potent polypeptide and protein antibiotics produced by bacteria. They are rapidly degraded in the environment after their use, due to their proteinaceous nature. Some bacteriocins are used as preservatives in foods. Native and engineered bacteriocins are of potential interest as replacements for conventional antibiotics that are losing their efficacy due to the development of antibiotic-resistant strains. Aureocin A53 is a class II bacteriocin. It is a broad-spectrum antibiotic with demonstrated ability to inhibit the growth of methicillin-resistant Staphylococcus aureus (MRSA). Validated computational protein design tools have been used to reengineer the Aureocin A53 sequence to produce novel sequence variants of the bacteriocins Aureocin A53 and Lacticin Q. The novel proteins are expected to possess an altered spectrum of bactericidal specificity and potency. The quality of the designed proteins was assessed using structure validation tools and predicted to be better than that of an average experimentally determined protein structure. The protein designed using FoldX is predicted to be more stable than native Aureocin A53.
Keywords: Bacteriocin; computational protein design; antibiotic; protein engineering; molecular modeling; MRSA (methicillin-resistant Staphylococcus aureus).
Molecular docking and in vitro study of S. cumini-derived natural compounds on Receptor tyrosine kinases pathway components
by Pushpendra Singh, Felix Bast, Satej Bhushan, Richa Mehra, Pooja Kamboj
Abstract: Syzygium cumini (S. cumini) is used for a variety of biological activities, such as anti-inflammatory, antidiabetic and antioxidant effects, and has recently been reported to protect DNA against radiation. Receptor tyrosine kinases (RTKs) are known to control various biological processes, including cell proliferation, metabolism, and apoptosis. These receptors have recently attracted attention as targets for cancer treatment because of evidence of their over-expression in cancer cells. The present study screened S. cumini-derived natural compounds against RTK pathway components using molecular docking. Furthermore, the in vitro anticancer activity of S. cumini leaf extract is reported in terms of cell proliferation (MTT) and oxidative stress (NBT and H2CDFD). All selected natural compounds were docked with the X-ray crystal structures of RTK signaling proteins using GLIDE (Grid-based ligand docking with energetics) in Maestro 9.6. Our results highlight that myricetin, kaempferol, delphinidin chloride, ellagic acid, rutin, petunidin, gossypol, and mirtillin yielded good docking scores with all selected proteins. Protein-ligand interactions showed that lipophilic, hydrogen-bonding, π-π stacking, and cation-π interactions make the dominant contribution at the active site. Moreover, reduction in cell viability with leaf extract of S. cumini treatment at concentrations of 5
Keywords: Cancer; Receptor tyrosine kinases; Phosphoinositide-3 Kinase; Natural product compounds; Maestro 9.6.
Cell-Level 3D Reconstruction and Quantification of the Drosophila Wing Imaginal Disc
by David Breen, Liyuan Sui, Linge Bai, Frank Jülicher, Christian Dahmann
Abstract: We describe a set of techniques that, when applied to a 3D stack of confocal microscopy images, produces a volumetric model of an epithelial tissue, as well as a mesh model of its apicolateral cell boundaries. Via a projection step, detailed 3D models that approximate the individual cells in the epithelium are then defined. Once the individual cells are generated, their apical face area, height and volume may be computed and visualised, providing quantitative and visual data about the patterns of cells within the tissue. We have applied the techniques to the analysis of the developing wing imaginal disc of a late-larval Drosophila melanogaster. Our techniques are being applied to a series of specimens in an investigation that intends to quantitatively substantiate observed cell shape changes that occur during wing imaginal disc development.
Keywords: Reconstruction; implicit models; epithelial tissues; wing imaginal disc; visualisation.
Construction of Discrete Descriptions of Biological Shapes through Curvilinear Image Meshing
by Jing Xu, Andrey Chernikov
Abstract: Mesh generation is a useful tool for obtaining discrete descriptors of biological objects represented by images. The generation of meshes with straight sided elements has been fairly well understood. However, in order to match curved shapes that are ubiquitous in nature, meshes with curved (high-order) elements are required. Moreover, for the processing of large data sets, automatic meshing procedures are needed. In this work, we present a new technique that allows for the automatic construction of high-order curvilinear meshes. This technique allows for a transformation of straight-sided meshes to curvilinear meshes with C1 or C2 smooth boundaries while keeping all elements valid and with good quality as measured by their Jacobians. The technique is illustrated with examples. Experimental results show that the mesh boundaries naturally represent the objects' shapes, and the accuracy of the representation is improved compared to the corresponding linear mesh.
Keywords: biomedical image processing; high-order mesh generation; B.
RECENT ADVANCEMENT IN NEXT-GENERATION SEQUENCING TECHNIQUES AND ITS COMPUTATIONAL ANALYSIS
by Khalid Raza, Sabahuddin Ahmad
Abstract: Next Generation Sequencing (NGS), a recently evolved technology, has contributed substantially to research and development. This relatively new approach has critical advantages over traditional Capillary Electrophoresis (CE) based Sanger sequencing. The advancement of NGS has led to numerous important discoveries that would have been costlier and more time-consuming with traditional CE-based Sanger sequencing. NGS methods are highly parallelized, enabling thousands to millions of molecules to be sequenced simultaneously. This technology produces huge amounts of data, which need to be analysed to extract valuable information. Specific data analysis algorithms are written for specific tasks, and together these algorithms act as tools for analysing NGS data. Analysis of NGS data reveals important clues in the quest for treatments of various life-threatening diseases, improved crop varieties and other scientific problems related to human welfare. This review addresses the basic background of NGS technologies, possible applications, computational approaches and tools involved in NGS data analysis, and future opportunities and challenges in the area.
Keywords: Massive Parallel Sequencing; Variant Discovery; DNA-Seq; RNA-Seq; Computational Analysis.
Application of machine learning techniques towards classification of drug molecules specific to peptide deformylase against Helicobacter pylori
by Surekha Patil
Abstract: It is crucial to adapt to the current computational drug discovery pipeline to develop novel drug molecules to combat the gastric disorders caused by Helicobacter pylori. Virtual screening techniques can be used as a preliminary screening tool to identify the relevant compounds which may have drug-like properties. These drug-like molecules can be further screened to test their bioactivity against a particular protein target. In this context, we apply different machine learning techniques to generate models to predict the pIC50 value of drug molecules. Molecular descriptors were produced for the drug dataset. Initial models were developed for the dataset with a large number of descriptors. Later, feature reduction techniques were applied to yield feature descriptors with best six variables using three algorithms: principal component analysis (PCA), random forest, and genetic algorithm. Consequently, machine learning techniques were applied to the reduced dataset to develop predictive models. Na
Keywords: Helicobacter pylori; gastric disorders; drug molecule; target protein; virtual screening.
Computational study to understand mechanism of isoniazid drug resistance caused by mutation (R268H) in NADH dehydrogenase of Mycobacterium tuberculosis
by Lingaraja Jena, Shraddha Deshmukh, Tapaswini Nayak, Gauri Wankhade, Bhaskar Harinath
Abstract: NADH dehydrogenase (Ndh) of Mycobacterium tuberculosis is essential for conversion of NADH to NAD+ in presence of FMN. An increased NADH/NAD+ ratio was reported due to mutation (R268H) in Ndh, causing INH resistance. To study the effect of this mutation on Ndh, molecular dynamics (MD) simulation analysis was performed for both wild and mutant models independently as well as for docked complexes (Ndh-NADH and Ndh-FMN). Simulation study showed that mutation (R268H) affected the secondary structure of the enzyme giving extra stability to the mutant model R268H as observed in the RMSD plot. Further, it was observed that both wild type and mutant models of Ndh were quite stable in complex with NADH but in case of FMN, the Ndh mutant appears to be more unstable and might be the reason for decreasing NAD+ concentrations thus hindering INH-NAD adduct formation resulting in isoniazid resistance.
Keywords: NADH dehydrogenase; tuberculosis; isoniazid; drug resistance; mutation; NAD.
Statistical Analysis of the in silico binding affinity of P-glycoprotein and its substrates with their experimentally known parameters to demonstrate a cost-effective approach for screening, ranking and possible prediction of potential substrates
by Suneetha Susan Cleave A, P.K. Suresh
Abstract: Over-expression of P-glycoprotein (P-gp) has been reported as a cause of multi-drug resistance in cancers and other diseases. Transport assays, which are generally used to determine whether a compound is specifically effluxed, are time-consuming, resource-intensive and expensive, and thus have inherent limitations for easily predicting a compound's specificity. Hence, there is a clear, unmet need to develop cost-effective methods for screening, identifying and ranking P-gp substrates. All compounds (23 substrates and 3 non-substrates) were docked to two homology-modeled human P-gp conformations. The in silico binding affinities obtained for all substrates were checked for correlation with their experimentally determined efflux ratios, LogP values and numbers of hydrogen bond acceptors. Docking results showed that the compounds differed in relative binding affinity. Experimentally derived efflux ratios obtained from the literature for 19 substrates showed, for the first time, a significant Spearman correlation with binding energies to the outward-facing conformation. Thus, binding energies obtained from docking studies may have significant potential for identifying the specificity of P-gp substrates and ranking them. This approach provides a sound foundation for strengthening the relationship of in silico binding energies with other experimentally defined physico-chemical parameters and can also be part of an iterative process to identify and develop a potentially validatable solution.
Keywords: Autodock; in silico binding energy; P-glycoprotein (P-gp); efflux ratio; LogP; hydrogen bond acceptors; Spearman Rank Correlation.
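The rank correlation named in the abstract and keywords above can be illustrated with a small, dependency-free Python sketch. The function names and all numbers below are made-up placeholders for illustration, not values from the paper, and the sketch computes only the coefficient, not its significance.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical placeholder data: docking energies (kcal/mol) and efflux ratios.
binding_energy = [-9.1, -8.4, -7.9, -7.2, -6.5]
efflux_ratio = [14.0, 11.5, 12.0, 6.3, 2.1]
rho = spearman_rho(binding_energy, efflux_ratio)
```

Because only ranks enter the calculation, the coefficient is insensitive to the units and scale of either quantity.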
Genetic algorithm based clustering for gene-gene interaction in episodic memory
by Sudhakar Tripathi, Ravi Bhushan Mishra, Anand Sharma
Abstract: After the identification of several disease-associated polymorphisms by genome-wide association (GWA) analysis, it is now clear that gene-gene interactions are fundamental mechanisms in the development of complex diseases. In this paper, we propose a genetic algorithm based clustering algorithm to identify groups of related genes in episodic memory. The clustering method requires the number of clusters, the number of genes in each cluster, and a fitness function. We take the STRING 9.1 clustering result on episodic memory as the starting point. We use the interaction between clusters as the fitness function for the genetic algorithm and compare the result of the genetic algorithm based clustering with standard K-means, STRING 9.1 K-means, hierarchical clustering and SOM. We evaluate the performance of all the above methods using the Rand index, Jaccard index and Minkowski index. Our comparative study demonstrates that the proposed genetic algorithm performs close to the hierarchical clustering method.
Keywords: gene-gene interaction; clustering; genetic algorithm; k-means; hierarchical; SOM; STRING 9.1.
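Two of the evaluation measures named in the abstract above, the Rand and Jaccard indices, both count how pairs of items are grouped by two clusterings. The following Python sketch shows the standard pair-counting definitions; the function names and the four-gene labelings are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pair_counts(labels_a, labels_b):
    """Count item pairs by whether each clustering puts them in the same cluster."""
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def rand_index(labels_a, labels_b):
    both, only_a, only_b, neither = pair_counts(labels_a, labels_b)
    return (both + neither) / (both + only_a + only_b + neither)


def jaccard_index(labels_a, labels_b):
    both, only_a, only_b, _ = pair_counts(labels_a, labels_b)
    denominator = both + only_a + only_b
    # Convention assumed here: return 0.0 when no pair is co-clustered anywhere.
    return both / denominator if denominator else 0.0


# Hypothetical cluster labels for four genes under two methods.
method_1 = [0, 0, 1, 1]
method_2 = [0, 0, 1, 2]
ri = rand_index(method_1, method_2)
ji = jaccard_index(method_1, method_2)
```

The Jaccard index ignores pairs that both clusterings separate, so it is usually the stricter of the two when there are many small clusters.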
Effect of single amino acid mutations on C-terminal domain of breast cancer susceptible protein 1
by Satish Kumar, Lingaraja Jena, Maheswata Sahoo, Kanchan Mohod, Sangeeta Daf, Ashok Varma
Abstract: Breast cancer is the most commonly diagnosed cancer in women. Around 5-10% of breast cancer cases are hereditary, mainly due to mutations in the breast cancer susceptibility genes BRCA1 and BRCA2, which are tumor suppressors. Hundreds of mutations are documented in the BRCA1 C-terminal region (BRCT), which is mainly associated with DNA damage repair and cell cycle control. In this study, we employed several mutation analysis systems, such as SIFT, MutPred, PON-P2 and META-SNP, to predict the pathological effects of 95 distinct missense mutations in the BRCT domain. Of these, 37 mutations were predicted to be deleterious by all mutation analysis systems, affecting protein stability and normal function and thereby contributing to cancer. This computational approach for assessing the impact of mutations on the BRCA protein may support early detection and therapy in breast cancer patients.
Keywords: breast cancer; mutation; BRCA1; BRCT; bioinformatics; mutation analysis.
On using the wisdom of the crowd principles in classification, Application on breast cancer diagnosis and prognosis.
by Merouane Amraoui, Tarik Boudghene Stambouli, Belal Alshaqaqi
Abstract: Breast cancer diagnosis and prognosis are oblique processes in which errors can be fatal, so they are performed only by experts. Therefore, researchers are using the promising potential of classification algorithms to distinguish malignant from benign tumours. Classification techniques vary widely, from individual classifiers such as rules, trees and functions to ensemble classifiers that combine several classification algorithms. In this paper, we examine the use of the wisdom of crowds in the classification of breast cancer. We use four well-known data sets and run a collection of 53 algorithms combined with majority voting to simulate the wisdom of crowds. Furthermore, we report the results obtained from all 53 algorithms executed individually on the four data sets, so this article can also be read as a review of the classification methods. Finally, we compare the results obtained by applying majority voting to the best five classifiers with those obtained by applying the wisdom of the crowds.
Keywords: breast cancer; wisdom of the crowd; WDBC; WPBC; BCD; Wisconsin; Weka; classification; majority voting; diagnosis; prognosis.
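The majority-voting step mentioned in the abstract above can be sketched in a few lines of Python. This is a generic illustration under assumed inputs; the function name `majority_vote`, the three classifiers and their predicted labels are hypothetical, and the paper's own Weka-based setup is not reproduced here.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of equal-length lists, one list per classifier.
    Ties go to the label seen first among the tied labels.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical predictions from three classifiers on three samples
# ("M" = malignant, "B" = benign).
classifier_outputs = [
    ["M", "B", "B"],
    ["M", "M", "B"],
    ["B", "M", "B"],
]
combined = majority_vote(classifier_outputs)
```

Using an odd number of voters on a two-class problem avoids ties altogether, which is one reason small ensembles are often built from three or five classifiers.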
Long Non-coding RNAs in Animal Genomes: Challenges and Promises
by Prashanth Suravajhala, Lingzhao Fang
Abstract: The majority of eukaryotic genes do not code for proteins, i.e. there are regions without coding potential. Because they do not code, such regions were earlier considered to be of little interest, as they were not expected to be associated with any disease. However, the last decade has seen advances in the field, with certain transcribed (non-coding) RNA molecules shown to regulate gene expression and to affect transcription and the cell cycle of the organism. One class of such non-coding RNAs identified during the last decade is long non-coding RNAs (lncRNAs), which are known to play a role in a wide variety of diseases. We outline a few challenges and promises of lncRNAs specific to animal/livestock genomes that could be exploited to identify their role in various diseases. For brevity, we consider bovine/clinical mastitis as an example.
Keywords: Long non-coding RNAs; transcription; diseased genes.
Detection of Postural Balance Degradation using Fuzzy Neural Network
by Neeraj Singh
Abstract: Postural balance is often studied in order to understand the effect of sensory degradation with age. The aim of this study is to develop a set of methods for analysing static and dynamic stabilogram signals to determine a set of parameters that can be used to detect a degradation in equilibrium using self-adaptive neuro-fuzzy inference systems (SANFIS). For the static stabilogram signal, the first method detects the critical point interval (CPI) at which sensory feedback is engaged as part of a closed-loop postural control strategy. For the dynamic stabilogram signal, the second method uses an autoregressive moving average (ARMA) measure (rate of change or fluctuation) and the area under the slope of the Z-force signal (Z-Area) during stepping up. Static and dynamic balance is evaluated using a force plate for a group of young subjects and a group of elderly subjects. The experiments using static signals show that lower values of CPI are associated with increased closed-loop postural control, indicating a quicker response to sensory input. The CPI for elderly subjects occurs significantly earlier than for young subjects, indicating that posture is more closely controlled. Similarly, the experiments using dynamic signals show that lower values of ARMA and higher values of Z-Area are indicative of a more hesitant step up. Young subjects have significantly higher values of ARMA than elderly subjects, and elderly subjects have significantly greater Z-Area values than young subjects. Further, the features determined from the static and dynamic stabilogram signals are used to detect and predict the degradation in postural balance using a fuzzy neural network. The features are randomly split for training and testing, and we achieved an average accuracy of 95.3% over 10 trials in classifying and predicting the degradation in equilibrium.
Keywords: Centre of pressure; postural control; stepping-up; ground reaction forces; clustering; neuro-fuzzy systems.
Graph pruning based approach for inferring disease causing genes and associated pathways
by Jeethu Devasia, Priya Chandran
Abstract: Analysis of interactions among genes in molecular interaction networks leads to an understanding of cellular processes at a system level. Differentially expressed genes and their interactions form the basis of the disease state. The problem of inferring disease-causing genes and dysregulated pathways has taken a vital position in computational biology research, but the huge size of biological networks makes this process computationally challenging. Here, we tackle the problem of inferring disease-causing genes and associated pathways using graph pruning techniques, which focus on improving the accuracy of results within reasonable execution time and on retrieving more causal genes and their pathways. The proposed approach and approaches reported in the literature were tested on real biological data. The proposed approach produced better results in terms of accuracy and execution time on benchmark datasets. In addition, this paper focuses on retrieving a larger number of newly identified genes and their pathways so that these genes and pathways can be analyzed for any unknown influence on disease development. The biological relevance of the results was also analyzed. If the function of the newly identified genes and pathways in the disease states could be validated biologically, it would significantly influence efforts to design new drug targets and defeat the diseases.
Keywords: Biological Network; Gene expression; Disease causing genes; Dysregulated pathways; Graph pruning.
In silico deleterious prediction of Nonsynonymous Single Nucleotide Polymorphisms in Neurexin1 Gene for Mental Disorders
by Ashraf Hendam, Ahmed Farouk Al-Sadek, Hesham A. Hefny
Abstract: The Neurexin1 (NRXN1) gene plays an important role in synaptic formation, plasticity and maturity. Studies have reported non-synonymous SNPs in NRXN1 in patients with mental disorders. The current work applies computational tools to NRXN1 SNPs recorded in mental disorder patients. The aim of the work is to identify deleterious SNPs, determine the damaged protein features (function, stability) and recognize potential protein regions for future research. The effect on protein function is predicted by PROVEAN, SIFT and PolyPhen-2, while protein stability is predicted by MUpro and I-Mutant2.0. The prediction results identified 2 SNPs as deleterious by all tools. The stability tools predicted higher percentages of deleterious SNPs (72% and 78%) than the function tools (25%, 41% and 47%). Agreement on deleterious predictions was 56% between the stability tools and 12.5% among the function tools. The regions of NRXN1 identified for future research are SP and LNS4.
Keywords: Nonsynonymous SNP; In silico; Neurexin1; Mental; Disorders; Autism; PROVEAN; SIFT; PolyPhen-2; MUpro; I-Mutant2.0.
Identification of novel flowering genes using RNA-Seq pipeline employing combinatorial approach in Arabidopsis thaliana time-series apical shoot meristem data
by Sumukh Deshpande, Anne James, Chris Franklin, Lindsey Leach, Jianhua Yang
Abstract: Floral transition is a crucial event in the reproductive cycle of a flowering plant, during which many genes are expressed that govern the transition phase and regulate the expression and functions of several other genes involved in the process. Identification of additional genes connected to flowering genes is vital since they may regulate flowering genes and vice versa. In our study, the expression values of these additional genes were found to be similar to those of the flowering genes FLC and LFY in the transition phase. The presented approach plays a crucial role in this discovery. An RNA-Seq computational pipeline was developed for the identification of novel genes involved in floral transition from A. thaliana apical shoot meristem time-series data. By intersecting the differentially expressed genes from the Cuffdiff, DESeq and edgeR methods, 690 genes were identified. Using an FDR cutoff of 0.05, we identified 30 genes involved in glucosinolate and glycosinolate biosynthetic processes as principal regulators in the transition phase, which provide protection to plants from herbivores and pathogens during flowering. Additionally, expression profiles of highly connected genes in protein-protein interaction network analysis revealed 76 genes with non-functional association and high correlation to the flowering genes FLC and LFY, which suggests a potential and principal role in floral regulation not identified in any previous study.
Keywords: Apical shoot; Flowering; Pipeline; Cuffdiff; Step Analysis; Differential expression; Enrichment; Arabidopsis Thaliana.
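The consensus step described in the abstract above, keeping only genes reported as differentially expressed by all three tools, amounts to a set intersection. The sketch below assumes each tool's output has already been reduced to a mapping from gene identifier to adjusted p-value and that a per-tool cutoff is applied before intersecting; the function names, gene identifiers and values are hypothetical, and no tool-specific file format is parsed.

```python
def significant(results, fdr_cutoff=0.05):
    """Return the genes whose adjusted p-value is below the cutoff."""
    return {gene for gene, q in results.items() if q < fdr_cutoff}


def consensus_genes(*tool_results, fdr_cutoff=0.05):
    """Genes called significant by every tool (intersection of the per-tool sets)."""
    gene_sets = [significant(r, fdr_cutoff) for r in tool_results]
    return set.intersection(*gene_sets)


# Hypothetical adjusted p-values per tool; identifiers are placeholders.
cuffdiff = {"geneA": 0.001, "geneB": 0.030, "geneC": 0.200}
deseq = {"geneA": 0.004, "geneB": 0.080, "geneC": 0.010}
edger = {"geneA": 0.020, "geneB": 0.010, "geneD": 0.001}

shared = consensus_genes(cuffdiff, deseq, edger)
```

Intersecting trades sensitivity for specificity: a gene missed by any one tool is dropped, so the consensus list is conservative.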
A Comparison of Genetic Imputation Methods using Long Life Family Study Genotypes and Sequence Data with the 1000 Genome Reference Panel
by Aldi Kraja, E. Warwick Daw, Petra Lenzini, Lihua Wang, Shiow Lin, Christine Williams, Alan Wells, Kathryn Lunetta, Joanne Murabito, Paola Sebastini, Guiseppe Tosto, Sandra Barral, Ryan Minster, Anatoly Yashin, Thomas Perls, Michael Province
Abstract: This study compares methods of imputing genetic markers, given a typed GWAS scaffold from the Long Life Family Study (LLFS) and the latest 1000 Genomes reference panel. We examined two programs for haplotype pre-phasing (MACH and SHAPEIT) and two for imputation (MINIMAC and IMPUTE). SHAPEIT is advantageous for haplotype pre-phasing. MINIMAC and IMPUTE produced similar imputation quality. We used a 4 Mb region on chromosome 2 of LLFS and, in the Supplement, compared methods using chromosome 19 data from the Genetic Analysis Workshop 19. IMPUTE had the advantage of using two references: 1000G and sequence data for a subset of subjects. SHAPEIT and IMPUTE were used to finalize the full LLFS autosome imputation. In LLFS, 44% of ~80M imputed autosomal variants showed good imputation quality (info ≥ 0.30). Low imputation quality was associated predominantly with low allele frequency in 1000 Genomes. Emerging large-scale sequence data and enhanced imputation methodologies will further improve imputation quality.
Keywords: genetic imputation; 1000 genomes reference; sequence reference; MACH software; MINIMAC software; SHAPEIT software; IMPUTE software; FCGENE software; Long Life Family Study.
A Comprehension of Contemporary Effort for Tracking of Lip
by Nandini M S, Nagappa U. Bhajantri
Abstract: Lip tracking is the most important prerequisite for a lip reading system. Most lip reading procedures are based on lip contour analysis, so lip contour extraction is a fundamental step. Lip contour extraction begins with detecting the lip contour in the first frame of an audio-visual image sequence; capturing the contour in successive frames is normally called lip tracking. This paper presents an overview of contemporary work on extracting the face from digital video and classifying it into lip and non-lip areas, categorizing the approaches into low-level and high-level processing techniques. These systems can be applied to asymmetric lips, mouths with visible teeth or tongue, and mouths with a mustache. Furthermore, a comparative study of the approaches and their effectiveness with respect to various factors is offered.
Keywords: Lip Reading; Lip Tracking; Lip Segmentation; Lip Localization; Adaboost; Statistical Estimator.
Extrapolating the effect of nonsynonymous SNP in bread wheat HSP16.9B gene: a molecular modeling and dynamics study
by Bharati Pandey, Saurabh Gupta, Atmakuri Ramakrishna Rao, Dev Mani Pandey, Ravish Chatrath
Abstract: Small heat shock proteins (sHSP) are ubiquitous and play a key role in protein homeostasis under stress conditions. A single nucleotide polymorphism was predicted in the HSP16.9B gene, but so far its impact on protein structure has not been extensively studied. With this in mind, we applied computational methods and performed molecular dynamics simulation to examine the effect of substituting aspartic acid (D) with asparagine (N) at residue 11 (D11N) in HSP16.9B. Secondary structure analysis revealed an additional beta sheet before the mutation position in the mutant protein. Three-dimensional protein structure modeling, structure validation and molecular dynamics were performed to gain insight into the influence of the non-synonymous single nucleotide polymorphism on structural changes. The root mean square deviation results showed that the mutated structure was stable throughout the simulations. The root mean square fluctuation and H-bond scores further supported our results. Altogether, our investigation will be a landmark in understanding the molecular basis of HSP16.9 functionality.
Keywords: Molecular dynamics simulation; Heat shock protein; Molecular modeling; Secondary structure.
SCAN DB: An integrated catalogue of computationally characterized NER specific skin cancers
by Varsha Mehta, Tanya Singh, Ankush Bansal, Tiratha Raj Singh
Abstract: SCAN DB, an acronym for Skin CAncer Ner DataBase, provides a unique, first-of-its-kind repository for understanding the biochemistry of the NER pathway and the disease dynamics, genetics, clinical information, expression and evolutionary trajectories of skin cancers. It is an exclusive, curated database focusing mainly on the NER pathway. It assists in the development and discovery of new diagnostic and prognostic therapies and in the characterization of these cancers by making full use of scattered data available through publications, technical and clinical reports, databases, etc. DNA damage has emerged as a major culprit in cancer and many age-related diseases. At the same time, DNA repair and the maintenance of genomic integrity have become of prime importance. One of the significant pathways for removing bulky DNA lesions is the Nucleotide Excision Repair (NER) pathway; deficiencies in NER proteins are associated with the skin-cancer-prone inherited disorder Xeroderma pigmentosum and with other neurodegenerative abnormalities such as Cockayne Syndrome and Trichothiodystrophy. However, a well-structured, integrated and comprehensive resource on the NER pathway and related skin cancers is presently not available. SCAN DB bridges this gap in knowledge. The database can be accessed using the URL http://bioinfoindia.org/SCANDB//index.php
Keywords: Nucleotide excision repair; Xeroderma pigmentosum; Cockayne Syndrome; Trichothiodystrophy; DNA damage; DNA repair.
Usage of Ensemble Model and Genetic Algorithm in Pipeline for Feature Selection from Cancer Microarray Data
by Barnali Sahu, Satchidananda Dehuri, Alok Jagadev
Abstract: This paper proposes an ensemble of feature selection techniques with a genetic algorithm in the pipeline for selecting features from microarray data. The ensemble is a well-balanced combination of filter and wrapper-based feature selection methods. In addition, to further refine the output of the ensemble, the genetic algorithm in the pipeline is used to produce a non-local, robust feature subset. An extensive computational experiment has been carried out on a prostate cancer data set to validate the method. Moreover, we have compared the performance of our method with the group genetic algorithm (GGA). Finally, the resultant feature subsets of GA, GGA, and the other constituents of the ensemble in standalone mode have been used for uncovering frequent patterns with two popular association rule mining algorithms, Apriori and FP-growth. The experimental study confirms that the proposed method gives classification accuracies of 100%, 98.34%, 98.02%, and 97.00% based on an ensemble of classifiers with respect to 5, 10, 15, and 20 features, respectively. By contrast, the classification accuracies for the same sequence of feature subsets selected by GGA are 92.34%, 90.34%, 86.54%, and 87.21%. The proposed approach is therefore a promising alternative tool for feature selection and classification of microarray data.
Keywords: Microarray data; Differentially expressed genes; Ensemble feature selection; Apriori; FP-growth.
A Concept of Sub-bands Event Related Potentials to Increase classes of Brain Computer Interface system
by Mitul Kumar Ahirwal, Anil Kumar, Girish Kumar Singh
Abstract: Event Related Potential (ERP) detection and translation into commands for Brain Computer Interfacing (BCI) achieves significant stability on the basis of concrete theories of general physiological changes in Electroencephalogram (EEG) signals related to various tasks. However, each ERP related to a particular task can only be exploited in a one-to-one relation with a specific command or operation. This limits the variability of a BCI system and increases the amount of work needed to identify accurate task-related pattern changes in EEG. In this paper, sub-band analysis of the detected ERP is proposed in order to factorize the one-to-one relation into a one-to-many relation, increasing the variability of the BCI system. First, a hypothesis based on analysis of Event-Related Spectral Perturbation (ERSP) is stated, and then the hypothetical concept is generalized with sub-band decomposition of the ERP, followed by cumulative power estimation. Results show that the proposed technique can be easily implemented as a method of Combined Factorized Feature Extraction (CFFE) to execute multiple commands corresponding to a single ERP. Classification is also performed with a feed-forward neural network.
Keywords: ERP; EEG; Classification; Sub-band decomposition.
New gene selection algorithm using hyperboxes to improve performance of classifiers
by Adil Bagirov, Karim Mardaneh
Abstract: With the development of DNA microarray technology, the expression levels of thousands of genes can be measured simultaneously in a single experiment. However, the large number of genes and the relatively small number of samples in microarray data sets are among the main difficulties in classifying new tumors. Therefore, efficient gene selection algorithms are required to identify differentially expressed genes or groups of genes and to improve the performance of classifiers. A new gene selection algorithm is developed to improve the performance of classifiers on gene expression data sets. The algorithm is based on calculating the marginal hyperboxes of genes or groups of genes for each tumor type and the overlaps between hyperboxes of different tumor types. The results on six gene expression data sets demonstrate that the algorithm is able to considerably reduce the number of genes and to significantly improve the performance of classifiers.
Keywords: gene selection; gene expression; DNA microarray technology; hyperboxes.
A Study of Data Pre-processing Techniques for Imbalanced Biomedical Data Classification
by Shigang Liu, Jun Zhang, Yang Xiang, Dongxi Xiang
Abstract: Biomedical data are widely used in developing prediction models for identifying a specific tumor, drug discovery and classification of human cancers. However, previous studies usually focused on different classifiers and overlooked the class imbalance problem in real-world biomedical datasets. There is a lack of studies evaluating data pre-processing techniques, such as resampling and feature selection, for imbalanced biomedical data learning. The relationship between data pre-processing techniques and data distributions has not been analysed in previous studies. This article focuses on reviewing and evaluating some popular and recently developed resampling and feature selection methods for class imbalance learning. We analyse the effectiveness of each technique from a data distribution perspective. Extensive experiments have been carried out with five classifiers, four performance measures and eight learning techniques across twenty real-world datasets. Experimental results show that: (1) resampling and feature selection techniques perform better with the support vector machine (SVM) classifier, but perform poorly with the C4.5 decision tree and linear discriminant analysis classifiers; (2) for datasets with different distributions, random undersampling and feature selection perform better than other data pre-processing methods on the T Location-Scale distribution when using SVM and KNN (K-nearest neighbours) classifiers, while random oversampling outperforms other methods on the Negative Binomial distribution using the Random Forest classifier at lower imbalance ratios; (3) feature selection outperforms other data pre-processing methods in most cases, so feature selection with the SVM classifier is the best choice for imbalanced biomedical data learning.
Keywords: class-imbalance; data distribution; classification; biomedical data; resampling; feature selection.
A Software Tool for Protein Sequence Alignment
by Justin Lee, Shawn Wang
Abstract: Protein sequence comparison is one of the most popular techniques for protein data analysis. Because a specific function of a protein is often determined by a small segment of the sequence, algorithms for optimal local alignment are among the most studied. Since Smith and Waterman proposed the dynamic programming algorithm for optimal local alignment in 1981, many local alignment tools have been developed. Each of these tools was developed based on a specific cost model and adapted to the effectiveness of that cost model, often in comparison with algorithms that had been developed based on other cost models. As a consequence, these tools lack the flexibility to accept different cost models and to incorporate biological properties to guide the alignment algorithms. They often perform well in some cases while producing inaccurate alignments in others. In this paper, we introduce an effective tool called INSPAL (INformation SPecific ALgorithm) that is not based on any specific cost model, instead allowing the user to adjust the alignment parameters according to the sequences under consideration and the biological properties that are specific to these sequences. Experimental comparison with two of the most popular alignment tools, ALIGN and SIM, indicated that INSPAL generated better alignment results with appropriate parameter settings. INSPAL was developed as a Windows Installer Package using Microsoft Visual Studio C++. It provides a user-friendly graphical user interface and is very easy to install and use.
Keywords: protein sequence alignment; hydrophobicity; Pascarella Value; dynamic algorithm; bioinformatics.
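The local alignment recurrence referred to in this abstract can be sketched briefly in Python. This is a minimal illustration of Smith-Waterman scoring with a caller-supplied substitution function, not the INSPAL implementation; the names `sw_score`, `simple` and `weighted`, the linear gap penalty, and the example sequences are all assumptions made for illustration.

```python
def sw_score(a, b, score, gap=-2):
    """Best local alignment score of a and b (Smith-Waterman, linear gap penalty).

    `score(x, y)` is any caller-supplied substitution function, so the cost
    model can be changed without touching the recurrence.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cur[j] = max(
                0,                                        # start a new local alignment
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                prev[j] + gap,                            # gap in b
                cur[j - 1] + gap,                         # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best


# Two hypothetical cost models: identity scoring, and one that rewards
# a matched histidine (H) more strongly.
simple = lambda x, y: 2 if x == y else -1
weighted = lambda x, y: 5 if x == y == "H" else (2 if x == y else -1)

print(sw_score("XXHEAGYY", "ZZHEAGWW", simple))
print(sw_score("XXHEAGYY", "ZZHEAGWW", weighted))
```

Passing the substitution function as a parameter is one simple way to let a user swap cost models; a real tool would typically also expose affine gap penalties and return the alignment itself, not only its score.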
Helix-helix interaction viewed in an angle frame indicates a role of the size of sidechains in packing
by Xiubei Liao
Abstract: Three-dimensional packing is an essential quality of proteins that determines their interaction with other proteins and their biological function. The packing of helical elements is especially important for the folding, stability, and interactions of proteins. Previously, different hypotheses have been used to develop algorithms that would predict helical packing in proteins. So far there has been a dearth of reliable approaches to predict the types of residues used in hydrophobic cores. Furthermore, the stereological arrangement of individual amino acids in three-dimensional hydrophobic cores is rather difficult to determine. In order to simplify the description of packing inside a protein and between two proteins, we have determined the relationship among angles, distances, and residue usage between two helices. This approach provides a means to predict the three-dimensional packing of helices and allows for an understanding of the interaction within proteins and among proteins based on surface contact residue parameters.
Keywords: Protein Structure; Helix-helix interaction.
Recognizing of repetitive and stereotyped movements for children with Autism spectrum disorder
by Maha Jazouli, Soufiane Ezghari, Aicha Majda, Azeddine Zahi, Rachid Aalouane, Arsalane Zarghili
Abstract: Autism spectrum disorder (ASD) is a group of conditions characterized by social impairment, communication difficulties, and repetitive and stereotyped behaviours. Autistic people often engage in stereotyped and repetitive motor movements. Our aim is therefore to propose a smart video surveillance system that helps doctors diagnose ASD. To this end, we propose a system for automatic detection of stereotypical motor movements in real time. Firstly, we use the Kinect sensor to monitor the autistic child's movements. Secondly, we propose a data integration process to make the data provided by the Kinect sensor more comprehensive and specific. Thirdly, we perform gesture detection using well-known machine learning algorithms such as decision tree, artificial neural network and nearest neighbour. We evaluate our proposal on five stereotyped behaviours. The results obtained are very promising and show that the data integration step enhances gesture recognition.
Keywords: Autism spectrum disorder; Stereotypical motor movements; stereotyped behaviours; Kinect Sensor; gesture recognition; machine learning.
Similar Gene Expression Profiles Define Leptospirosis Clinical Outcomes
by Nivison Nery, Daniela Barreiro Claro, Janet Lindow
Abstract: Leptospirosis, an acute, febrile disease with high case fatality, is prevalent in many tropical, urban regions. The mechanisms leading to death from leptospirosis are not fully understood. However, recent studies indicate that differences in the immune response during acute infection are associated with fatality. To identify transcriptional signatures that could differentiate survivors and case fatalities, we analyzed data obtained from full human genome transcriptome profiling of whole blood from patients with different disease outcomes. Using clustering algorithms, we identified unique groups, demonstrating that surviving patients and fatal cases have significant differences in their transcriptional profiles. We also confirmed our prior findings, which showed expression differences in genes involved in the immune response.
Keywords: Clustering analysis; leptospirosis; gene expression.
Comparative Regression Performances of Machine Learning Methods Optimizing Hyperparameters: Application to Health Expenditures
by Songul Cinaroglu, Onur Baser
Abstract: Least Absolute Shrinkage and Selection Operator (Lasso), K-Nearest Neighbor (KNN), Random Forest (RF) and Support Vector Machine (SVM) regression are successful machine learning algorithms used in various areas. However, there has been no study analyzing health expenditures using machine learning methods. This work is a step forward in comparing the regression performances of Lasso, KNN, RF and SVM regression while changing hyperparameter values. In this study, lambda (λ), number of neighbors (NN), number of trees (NT) and epsilon (ε) were taken as the hyperparameters for Lasso, KNN, RF and SVM regression, respectively. K-fold cross-validation was performed to examine regression performance. The results show that KNN (R² > 0.75; RMSE < 0.70; MAE < 0.55) and Lasso (R² > 0.79; RMSE < 0.20; MAE < 0.15) regression yield better results in predicting health expenditure per capita and out-of-pocket health expenditure (%), respectively. Moreover, the performance differences among the Lasso, KNN, RF and SVM regression methods are statistically significant (p < 0.001). It is hoped that these results will stimulate further interest in using machine learning methods to predict health expenditures.
Keywords: Machine Learning; Random Forest Regression; Support Vector Regression; Hyperparameter Optimization; Black-Box Optimization; Health Expenditures.
Biological characteristics evaluation to predict enzyme classes with support vector machine
by Gabriela Santos, Cristiane Nobre, Luis Zárate
Abstract: Predicting protein function is a latent problem and a challenge in the field of bioinformatics. Over the years, several computational approaches have been proposed for this purpose. One of these approaches is based on characteristics and makes use of biologically relevant information. Previous contributions have considered one characteristic, or a combination of characteristics, belonging to the four levels of protein structure in order to assign enzymes to one of their classes. In this study we evaluate a set of characteristics that represent the four structural levels (primary, secondary, tertiary and quaternary), such as electrostatic potential, hydrophobicity, amino acid frequency, distance between α-carbons and molecular weight, to classify enzymes into one of their classes. The characteristics were combined with each other, forming 15 datasets. To evaluate the relevance of the characteristics, we use the SVM classifier because it has produced satisfactory results in biological data classification. The objective of this study is to contribute to the choice of the most appropriate characteristics for protein function prediction.
Keywords: Prediction of protein function; Enzyme; Support vector machine.
A Hybrid Method for Classification of Physical Action Using Discrete Wavelet Transform and Artificial Neural Network
by Gopal Chandra Jana, Aleena Swetapadma, Prasant Kumar Pattnaik
Abstract: This paper proposes a method for physical action classification from electromyography (EMG) signals based on wavelet analysis and an artificial neural network (ANN). The physical actions include a person's normal actions as well as aggressive actions. EMG signals are recorded during various types of physical actions. The discrete wavelet transform (DWT) with the DB-4 wavelet is used for feature extraction from the recorded EMG signals. The extracted features are given as input to the ANN-based classifier to distinguish between normal actions and aggressive actions. The hybrid approach combining ANN and wavelet analysis shows a significant increase in accuracy in classifying the physical actions. Hence the proposed method can be used to discriminate between physical actions, which ultimately helps in identifying a person's mental state.
Keywords: Electromyography (EMG); Wavelet analysis; Discrete wavelet transform (DWT); Artificial Neural Network (ANN); Classification.
Computational Studies to Explore the Role of MSI Associated DNA Mismatch Repair Mechanisms in HNPCC Through Expression and Interaction Data
by Sadhika Behl, Arushi Sharma, Prashant Survajhala, Tiratha Raj Singh
Abstract: Microsatellite instability (MSI) is an error mechanism associated with the DNA mismatch repair (MMR) system, which comprises a set of genes. If MMR fails, MSI may lead to various forms of cancer such as hereditary non polyposis colorectal cancer (HNPCC). In this study, we explored gene expression and network data to reveal the significance of MSI in HNPCC. Genes and proteins were examined for their specific role in HNPCC with respect to MSI and MMR. Besides standard markers, a few genes such as PMS1, TP53, MLH1, CHEK2, RFC3, LIG1, AURKA, CCND1, POLD1, HMGB1, ERCC1, ERCC2, PTGS2, and SLC19A1 were identified as putative markers contributing significantly to the regulation of the mechanisms associated with MSI and MMR in HNPCC. Experimental validation of these genes would be a promising direction for further research and may aid in the management of the disease.
Keywords: DNA mismatch repair; Microsatellite Instability; Hereditary non polyposis colorectal cancer; Significant Microarray Analysis; Differentially Expressed Genes.
Neural network based prediction of less side effect causing cancer drug targets in the network of MAPK pathways
by M.D. Aksam V.K, Chandrasekaran V.M., Sundaramurthy Pandurangan
Abstract: Computational side-effect prediction tools have been used in rational drug design to decrease the late-stage failure of drugs under trial. Irrational selection of cancer drug targets in the deregulated MAPK pathways causes more side effects. We exploited quantitative data on network centralities and biological features: degree, radiality, eccentricity, closeness, bridging, stress and pagerank centralities, essentiality, pathway-specific proteins, disease-causing proteins, protein domains and other functional features. We trained an artificial neural network with 15 selected features for the binary classification of side-effect causing and less side-effect causing drug targets among the non-targeted proteins. The inter-relationship among the node centralities revealed three clusters with positive correlations. Within these clusters, the top centrality nodes overlap, playing multiple roles in the complex networks. Proteins that rank highly on degree, eccentricity and betweenness centralities, possess GO-based molecular function, are involved in more than one BioCarta pathway, and have domain content are more prone to cause side effects than those ranked by other centralities and functional features. We predicted the following 15 less side-effect causing cancer drug targets: Shc, Rap 1a, Mos, Tpl-2, PAC1, 4EBP1, GAB1, LAD, MEF2, ZAK, GADD45, TAB2, TAB1, ELK1 and SRF.
Keywords: Cancer drug targets identification; Network of MAPK pathways; Side effects; Essential proteins; Graph theory.
A hybrid method for differentially expressed genes identification and ranking from RNA-Seq data.
by Mohammad Samir Farooqi, Devendra Kumar, Dwijesh Chandra Mishra, Anil Rai, Niraj Kumar Singh
Abstract: RNA-Seq has gained immense popularity and emerged as a potential high-throughput platform for the identification of differentially expressed (DE) genes. In order to characterize differential genes, it is important to determine the statistical distributional properties of the data. In the present study we propose a new hybrid model (NBPFCROS), based on parametric and non-parametric statistics, for the identification of DE genes. The NBP model, based on a compound Poisson-gamma mixture distribution, is used as the parametric statistic, and the fold change value derived using the fold change rank ordering statistics (FCROS) algorithm is used as the non-parametric statistic. We used a gene significance score, the pi value, obtained by combining expression fold change (f value) and statistical significance (P-value). The performance of the NBPFCROS model was compared with the NBP, FCROS, edgeR and DESeq2 models using synthetic and real RNA-Seq datasets, and the NBPFCROS model was found to be more robust than the other models.
Keywords: RNA-seq; differentially expressed genes; parametric and non-parametric statistic; Fold change; gene significance score; classification accuracy; gene ranking.
Structure Based Inference of Functional Single Nucleotide Polymorphism and its Role in TGFβ1 Allied Colorectal Cancer (CRC)
by Ankita Shukla, Tiratha Raj Singh
Abstract: Motivation: Single-nucleotide polymorphisms (SNPs) play a crucial role in understanding the genetic basis of complex human diseases. To date, a wide variety of studies have focused on the TGFβR1 and TGFβR2 receptors in colorectal cancer (CRC); TGFβ1, however, remains poorly understood. Identifying the functional SNPs in the CRC-related TGFβ1 gene is still a major challenge.
Background: CRC is the third most common cause of cancer-related deaths worldwide. The relation between SNPs and CRC is a major concern, as SNPs offer valuable markers for identifying genes responsible for disease susceptibility. SNPs account for the most common form of genetic variation, and most of them fall in the coding regions of the human genome.
Method: In this study, a total of 136 mutations were retrieved for TGFβ1, of which 37 non-synonymous mutations were considered. Initially, sequence and structure based tools were used for damage prediction. The mutations that were predicted to be damaging by the majority of the tools were then considered for the structure dynamics study.
Result: In this paper we targeted only one mutation, L28F, to evaluate its effect on disease. Structure conservation studies were performed to infer the effect of the mutation at the region with respect to its conservation profile. The study shows that the changes to the overall structure due to a single amino acid variation (i.e. L28F) can probably cause damage to the structure by alterations at 2
Keywords: Colorectal cancer; Carcinogenesis; Molecular Dynamics; Polymorphism.
In silico Design and Analysis of Recombinant-Fibroin Fusion Protein as a Biomaterial for Enhanced Human Tissue Regeneration and Drug Delivery
by Mamatha Dadala Mary, Jyothi Singaraju, Swetha Kumari Koduru, Satyavathi Valluri V, Jayakumar Rajadas
Abstract: Chimeric proteins are fabricated by combining two or more independent genes coding for separate proteins, and these proteins are mostly used as biomaterials in the medical field. Silks are protein polymers spun into fibers by some lepidopteran larvae, mainly silkworms. For decades, silk fibers have been used in many clinical applications because of their enhanced environmental stability, high density and insolubility in most solvents. Our present work focuses on the in silico design and construction of a recombinant fusion protein of silkmoth Fibroin heavy chain (FibH) and Human Elafin (Elfn), a skin-derived anti-leukoprotease protein encoded by the PI3 gene. A compatible biomaterial of recombinant-fibroin fusion protein has been designed with and without a hydrophobic linker. The physicochemical properties, structural properties and stability of the two kinds of fusion protein were analyzed in silico, which paves the way for their application as biomaterials in enhanced human tissue regeneration and in drug delivery systems.
Keywords: Chimeric proteins; Silk biomaterial; Fibroin heavy chain; Elafin; Fusion protein; Human tissue regeneration.
An Efficient Framework for Accelerating Needleman-Wunsch Algorithm Using GPU
by Hamza Nadim, Mohamed Assal, Abdelfatah A. Hegazy
Abstract: The Needleman-Wunsch (NW) algorithm is considered the benchmark for global alignment. This work proposes a new implementation of the parallel NW algorithm on the GPU, focusing on enhancing the second phase of the algorithm (the fill), which is the most time-demanding phase. The idea of filling only a percentage of the matrix is presented, which guarantees a decrease in execution time. The key was to find the minimum percentage that needs to be filled while ensuring the same result as filling the whole matrix. Experiments show the effectiveness of the proposed model in execution time when compared with the sequential algorithm.
Keywords: Needleman-Wunsch; GPU; Cuda; Sequence Alignment; Partial Matrix Filling.
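For readers unfamiliar with the fill phase mentioned in this abstract, the following is a minimal sequential sketch in Python. The optional `band` parameter restricts filling to cells near the main diagonal; this is the standard banded heuristic, swapped in here only to illustrate the general idea of filling part of the matrix, and it is not the authors' GPU method or their rule for choosing the filled percentage. The function name `nw_score` and the scoring values are assumptions.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1, band=None):
    """Global alignment score from the Needleman-Wunsch fill phase.

    If `band` is given, only cells with |i - j| <= band are filled
    (banded heuristic, illustrative only). Cells outside the band stay
    at -inf, so the result is -inf when the end cell is not reachable.
    """
    n, m = len(a), len(b)
    neg_inf = float("-inf")
    H = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    H[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if band is not None and abs(i - j) > band:
                continue  # skip cells outside the band
            best = neg_inf
            if i > 0 and j > 0:
                s = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, H[i - 1][j - 1] + s)  # match or mismatch
            if i > 0:
                best = max(best, H[i - 1][j] + gap)    # gap in b
            if j > 0:
                best = max(best, H[i][j - 1] + gap)    # gap in a
            H[i][j] = best
    return H[n][m]


print(nw_score("AC", "AGC"))          # full matrix
print(nw_score("AC", "AGC", band=1))  # partial fill, same score here
```

In this small example a band of width 1 already contains an optimal path, so the partial fill returns the same score as the full fill; in general a band that is too narrow can change or invalidate the result, which is why finding the minimum safe fraction is the crux.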
Comparative study of synonymous codon usage in bacteria growing at extreme temperatures
by Monisha Singhal, Pragya Chaturvedi, R.K. Gothwal, M.K. Mohan, Pooransingh Solanki
Abstract: With the availability of completely sequenced archaeal genomes it has become possible to compare the codon and amino acid usage strategies among different extremophiles. The adapted sequence of codons and amino acids determines the conformational pattern in the structure of proteins and thereby confers the specificity and structural integrity that are maintained irrespective of the growth conditions. Correspondence analysis, a multivariate analysis method, was used to characterize various patterns present in a dataset of 200 genes encoding the ten key enzymes of the citric acid cycle from 20 organisms surviving at varying temperatures. The study has shown that the different extremophiles follow a specific trend of codon usage and amino acid composition, which is affected by temperature variation and base composition and which is vital for the functional and structural stability of enzymes and hence for their adaptive survival in such harsh environmental conditions. It was found that higher temperature favours a high aromaticity score, which can be linked to thermal behaviour. The results and statistical analysis of various codon usage parameters show a level of preference among synonymous codons and point towards a kind of anonymous selection pressure which helps stabilize the genetic material at varying temperatures.
Keywords: Bioinformatics; Codon bias; Codon; Extremophiles; Codon Adaptation Index; Correspondence Analysis; Amino acid; Evolution; citric acid cycle; codon usage.
A Multilevel analysis of hiv1-miR-H1 miRNA using KPCA, K-means, Random Forest and Online Target Tools
by Vinai George Biju, Blessy Baby Mathew, Prashanth C M
Abstract: The goal of this study was to propose a workflow using machine learning to identify and predict the miRNA targets of Human Immunodeficiency virus 1. miRNAs, which are 21 nt long, are derived from larger hairpin RNA precursors and are conserved in the secondary structure of their precursors rather than in the primary sequence. The proposed approach for identification and prediction of miRNA targets of hiv1-miR-H1 is based on secondary structure and E-value through machine learning. The linearity of the length and E-value data for sequence matches with hiv1-miR-H1 is verified using Kernel PCA. miRNA targets were grouped into clusters, indicating similar targets, using the K-means algorithm. A classification model using Random Forest was implemented for each secondary structure feature variable, taking feature relevance into account. A learning methodology is put forward that integrates the scores returned by the various machine learning algorithms to predict cellular hiv1-miR-H1 targets. Gene target results from TargetScan, miRanda, PITA, DIANA microT and RNAhybrid are also explored for multiple parameters.
Keywords: miRNA; HIV 1; KPCA; K-Means; Random Forest.
Multiple Alignment of Structures using Center Of Proteins
by Asish Mukhopadhyay, Kaushik Roy, Gilbert Cole
Abstract: Multiple Structure Alignment (MStA) is a fundamental tool for correlating the structural similarity of proteins with their functional similarity and has therefore received much attention from the proteomics community. A number of algorithms have been proposed: MUSTANG, POSA, MultiProt and CE-MC, to name a few. In this paper we propose a new algorithm, MASCOT. This uses the DSSP program to map a protein structure into a DSSP-sequence, reducing the structural alignment problem to a sequence alignment problem. Similar to an approximation algorithm for multiple sequence alignment, we have used a center-star approach to select a center-protein with respect to which to create an alignment. The root mean square deviation (RMSD) has been used as a measure of alignment quality, and we report this measure for a large and varied number of alignments. We compared the execution times of our algorithm with the well-known algorithm MUSTANG for all the tested alignments. MASCOT outperformed MUSTANG on all the samples except one. Another measure, ACC (Alignment Accuracy), was used to compare the performance of MASCOT and MUSTANG with protein structures drawn from the manually curated database HOMSTRAD.
Keywords: structural bioinformatics; protein structure alignment; computational biology; algorithms.
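The center selection step of a center-star approach, as mentioned in this abstract, can be sketched in Python as follows. This is only an illustration under stated assumptions: plain edit distance stands in for whatever pairwise alignment score MASCOT actually uses, the function names `edit_distance` and `pick_center` are hypothetical, and the short secondary-structure strings are made up rather than real DSSP output.

```python
def edit_distance(s, t):
    """Levenshtein distance between two strings (unit costs)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def pick_center(seqs):
    """Index of the sequence with the smallest summed distance to all others."""
    totals = [sum(edit_distance(s, t) for t in seqs) for s in seqs]
    return min(range(len(seqs)), key=totals.__getitem__)


# Made-up secondary-structure strings (H = helix, E = strand).
structures = ["HHHH", "HHEE", "EEEE"]
print(pick_center(structures))
```

Once a center is chosen, a center-star method aligns every other sequence to it pairwise and merges those alignments; that merging step is omitted here.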
Special Issue on: Parallel Computing Methodologies for E-Medicine
Intelligent Model for Diabetic Retinopathy Diagnosis: A Hybridized Approach
by Santosh Nagnath Randive, Amol D. Rahulkar, Ranjan K. Senapati
Abstract: As Diabetic Retinopathy (DR) is considered one of the most common infectious diseases in humans, a growing body of research is addressing this sensitive problem in the health sector. Many contributions have already been proposed from various perspectives, yet accurate DR detection remains an issue. This paper therefore introduces a novel DR detection model, which also reports the severity of retinopathy from the given input fundus image. The proposed model comprises three stages: Segmentation, Feature Extraction and Classification. An active contour model is used for segmentation, and GLCM and GLRM features are extracted during feature extraction. Since the feature vector is too long, it is necessary to choose a subset of significant features, and selecting the significant features is a challenging task. A Neural Network (NN) classifier is used for classification. As the main contribution, the selected features and the weights of the NN model are optimally chosen by a new hybridized algorithm. The proposed Whale with Particle Swarm Optimization algorithm, termed WP, is compared with conventional methods such as Levenberg-Marquardt-Neural Network (LNN), Gradient Descent-Neural Network (GNN), Firefly-Neural Network (FNN), Particle Swarm Optimization-Neural Network (PNN), Grey Wolf Optimization-Neural Network (GWNN), Self Adaptive Grey Wolf Optimization-Neural Network (SGWNN) and Whale Optimization-Neural Network (WNN) in terms of positive and negative measures. The DR detection model is implemented in MATLAB 2017a. The DIARETDB1 database with 88 iris images is used for experimentation. The positive measures are Accuracy, Specificity, Sensitivity, Precision, Negative Predictive Value (NPV), F1-Score and Matthews Correlation Coefficient (MCC). Similarly, the negative measures are False Positive Rate (FPR), False Negative Rate (FNR) and False Discovery Rate (FDR), and the superiority of the proposed model is demonstrated.
Keywords: DR diagnosis; Feature Extraction; Classification; Weight Optimization; WP-Hybrid model.
Special Issue on: Data Mining and Its Applications in Bioinformatics and Biomedical Engineering
ADAPTIVE BIO-INSPIRED GENE OPTIMIZATION BASED DEEP NEURAL ASSOCIATIVE CLASSIFICATION FOR DIABETIC DISEASE DIAGNOSIS
by D. SASIREKHA, Punitha A
Abstract: Associative classification plays a significant role in data mining. Several classification techniques using association rules have been proposed in existing works. However, the accuracy of existing classification techniques is not adequate. In order to overcome this limitation, an Adaptive Bio-Inspired Gene Optimization Based Deep Neural Associative Classification (ABGO-DNAC) technique is proposed. The ABGO-DNAC technique is developed to improve classification performance for diabetic disease diagnosis at an early stage by generating association rules with a minimal number of medical attributes. The ABGO-DNAC technique uses the Adaptive Bio-Inspired Gene Optimization (ABGO) algorithm to generate association rules by choosing a minimal number of optimal attributes from a medical dataset. With the support of the formulated association rules, the ABGO-DNAC technique designs a Gaussian Deep FeedForward Neural Learning (GDFNL) model for diabetic disease classification. The GDFNL analyses the patient's medical data with the aid of the created association rules and classifies patients as normal or abnormal. Thus, the ABGO-DNAC technique efficiently identifies diabetic disease at an earlier stage with higher classification accuracy and in minimum time. The simulation evaluation of the ABGO-DNAC technique is performed on factors such as disease prediction accuracy, disease prediction time and false positive rate with respect to various numbers of patients. The simulation results show that the ABGO-DNAC technique increases disease prediction accuracy and reduces diabetic disease diagnosis time compared to state-of-the-art works.
Keywords: Association Rules; Diabetic Disease; Logistic Loss Function; Adaptive Bio-Inspired Gene Optimization; Gaussian Deep Feedforward Neural Learning; Adaptive Levy Mutation.