International Journal of Bioinformatics Research and Applications (59 papers in press)
SCAN DB: An integrated catalogue of computationally characterized NER specific skin cancers
by Varsha Mehta, Tanya Singh, Ankush Bansal, Tiratha Raj Singh
Abstract: SCAN DB, an acronym for Skin CAncer Ner DataBase, provides a unique, first-of-its-kind repository for understanding the biochemistry of the NER pathway and the disease dynamics, genetics, clinical information, expression and evolutionary trajectories of the skin cancers. It is an exclusive, curated database focusing mainly on the NER pathway. It assists in the development and discovery of new diagnostic and prognostic therapies and in the characterization of these cancers by making complete use of scattered data available through publications, technical and clinical reports, databases, etc. DNA damage has emerged as a major culprit in cancer and many age-related diseases. Simultaneously, DNA repair and genomic integrity management have become of prime importance. One of the significant pathways for removing bulky DNA lesions is the Nucleotide Excision Repair (NER) pathway; deficiencies in NER repair proteins are also associated with the skin-cancer-prone inherited disorder Xeroderma pigmentosum and with other neurodegenerative abnormalities such as Cockayne Syndrome and Trichothiodystrophy. However, a well-structured, integrated and comprehensive resource on the NER pathway and related skin cancers is presently not available. SCAN DB bridges this gap in knowledge. The database can be accessed at http://bioinfoindia.org/SCANDB//index.php
Keywords: Nucleotide excision repair; Xeroderma pigmentosum; Cockayne Syndrome; Trichothiodystrophy; DNA damage; DNA repair.
Usage of Ensemble Model and Genetic Algorithm in Pipeline for Feature Selection from Cancer Microarray Data
by Barnali Sahu, Satchidananda Dehuri, Alok Jagadev
Abstract: This paper proposes an ensemble of feature selection techniques with a genetic algorithm in the pipeline for selecting features from microarray data. The ensemble is a well-balanced combination of filter and wrapper-based feature selection methods. To further refine the output of the ensemble, the genetic algorithm (GA) in the pipeline is used to produce a robust, non-local feature subset. An extensive computational experiment has been carried out on a prostate cancer data set to validate the method. Moreover, we have compared the performance of our method with the group genetic algorithm (GGA). Finally, the resultant feature subsets of GA, GGA, and the other constituents of the ensemble in standalone mode have been used for uncovering frequent patterns with two popular association rule mining algorithms, Apriori and FP-growth. The experimental study confirms that the proposed method gives classification accuracies of 100%, 98.34%, 98.02%, and 97.00% based on an ensemble of classifiers with respect to 5, 10, 15, and 20 features, respectively. On the other hand, the classification accuracies for the same sequence of feature subsets selected by GGA are 92.34%, 90.34%, 86.54%, and 87.21%. Therefore, the proposed approach is a promising alternative tool for feature selection and classification of microarray data.
Keywords: Microarray data; Differentially expressed genes; Ensemble feature selection; Apriori; FP-growth.
A Concept of Sub-bands Event Related Potentials to Increase classes of Brain Computer Interface system
by Mitul Kumar Ahirwal, Anil Kumar, Girish Kumar Singh
Abstract: Event Related Potential (ERP) detection and translation into commands for Brain Computer Interfacing (BCI) achieves significant stability on the basis of concrete theories of general physiological changes in Electroencephalogram (EEG) signals related to various tasks. However, each ERP related to a particular task can only be exploited in a one-to-one relation with a specific command or operation. This limits the variability of a BCI system and increases the amount of work needed to identify accurate task-related pattern changes in EEG. In this paper, sub-band analysis of the detected ERP is proposed in order to factorize the one-to-one relation into a one-to-many relation, increasing the variability of the BCI system. First, the hypothesis based on analysis of Event-Related Spectral Perturbation (ERSP) is stated, and then the hypothetical concept is generalized with sub-band decomposition of the ERP, followed by culminative power estimation. Results show that the proposed technique can be easily implemented as a method of Combined Factorized Feature Extraction (CFFE) to execute multiple commands corresponding to a single ERP. Classification is also performed with a feed-forward neural network.
Keywords: ERP; EEG; Classification; Sub-band decomposition.
New gene selection algorithm using hyperboxes to improve performance of classifiers
by Adil Bagirov, Karim Mardaneh
Abstract: With the development of DNA microarray technology, the expression levels of thousands of genes can be measured simultaneously in a single experiment. However, the large number of genes and the relatively small number of samples in microarray data sets are among the main difficulties for classification of new tumors. Therefore, efficient gene selection algorithms are required to identify differentially expressed genes or groups of genes and to improve the performance of classifiers. A new gene selection algorithm is developed to improve the performance of classifiers on gene expression data sets. The algorithm is based on calculating the marginal hyperboxes of genes or groups of genes for each tumor type and the overlaps of hyperboxes of different tumor types. The results on six gene expression data sets demonstrate that the algorithm is able to considerably reduce the number of genes and to significantly improve the performance of classifiers.
Keywords: gene selection; gene expression; DNA microarray technology; hyperboxes.
A Study of Data Pre-processing Techniques for Imbalanced Biomedical Data Classification
by Shigang Liu, Jun Zhang, Yang Xiang, Dongxi Xiang
Abstract: Biomedical data are widely used in developing prediction models for identifying a specific tumor, drug discovery and classification of human cancers. However, previous studies usually focused on different classifiers and overlooked the class imbalance problem in real-world biomedical datasets. There is a lack of studies evaluating data pre-processing techniques, such as resampling and feature selection, for imbalanced biomedical data learning. The relationship between data pre-processing techniques and data distributions has not been analysed in previous studies. This article mainly focuses on reviewing and evaluating some popular and recently developed resampling and feature selection methods for class imbalance learning. We analyse the effectiveness of each technique from a data distribution perspective. Extensive experiments have been carried out with five classifiers, four performance measures and eight learning techniques across twenty real-world datasets. Experimental results show that: (1) resampling and feature selection techniques exhibit better performance with the support vector machine (SVM) classifier, whereas they perform poorly with the C4.5 decision tree and linear discriminant analysis classifiers; (2) for datasets with different distributions, techniques such as random undersampling and feature selection perform better than other data pre-processing methods on the T Location-Scale distribution when using SVM and KNN (K-nearest neighbours) classifiers, and random oversampling outperforms other methods on the Negative Binomial distribution when using the Random Forest classifier at lower imbalance ratios; (3) feature selection outperforms other data pre-processing methods in most cases; thus, feature selection with the SVM classifier is the best choice for imbalanced biomedical data learning.
Keywords: class-imbalance; data distribution; classification; biomedical data; resampling; feature selection.
A Software Tool for Protein Sequence Alignment
by Justin Lee, Shawn Wang
Abstract: Protein sequence comparison is one of the most popular techniques for protein data analysis. Because a specific function of a protein is often determined by a small segment in the sequence, algorithms for optimal local alignment are among the most studied. Since Smith and Waterman proposed the dynamic algorithm for optimal local alignment in 1981, many local alignment tools have been developed. Each of these tools was developed based on a specific cost model and adapted to the effectiveness of that cost model, often in comparison with algorithms that had been developed based on other cost models. As a consequence, these tools lack the flexibility to accept different cost models and to incorporate biological properties to guide the alignment algorithms. They often perform well in some cases while producing inaccurate alignment results in others. In this paper, we introduce an effective tool called INSPAL (INformation SPecific ALgorithm) that is not based on any specific cost model, instead allowing the user to adjust the parameters for alignment according to the sequences under consideration and the biological properties that are specific to these sequences. Experimental comparison with the two most popular alignment tools, ALIGN and SIM, indicated that INSPAL generated better alignment results with appropriate settings of the parameters. INSPAL was developed as a Windows Installer Package using Microsoft Visual Studio C++. It provides a user-friendly graphical user interface and is very easy to install and use.
Keywords: protein sequence alignment; hydrophobicity; Pascarella Value; dynamic algorithm; bioinformatics.
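As an illustration of the optimal local alignment idea referred to in this abstract, the following is a minimal sketch of a Smith-Waterman style dynamic program with user-adjustable scoring. It is not INSPAL and does not reproduce its cost handling; the function name `smith_waterman` and the `match`, `mismatch` and `gap` parameters (with a simple linear gap penalty) are assumptions chosen for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one optimal local alignment.

    Scoring values are illustrative defaults, not taken from any tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]      # H[i][j]: best score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Calling `smith_waterman("AAAGGGTTT", "CCGGGCC")` aligns the shared `GGG` segment; changing `match`, `mismatch` or `gap` changes which local segment is preferred, which is the kind of adjustment the abstract describes leaving to the user.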
Helix-helix interaction viewed in an angle frame indicates a role of the size of sidechains in packing
by Xiubei Liao
Abstract: Three-dimensional packing is an essential quality of proteins that determines their interaction with other proteins and their biological function. The packing of helical elements is especially important for the folding, stability, and interactions of proteins. Previously, different hypotheses have been used to develop algorithms that would predict helical packing in proteins. So far there has been a dearth of reliable approaches to predict the types of residues used in hydrophobic cores. Furthermore, the stereological arrangement of individual amino acids in three-dimensional hydrophobic cores is rather difficult to determine. In order to simplify the description of packing inside a protein and between two proteins, we have determined the relationship among angles, distances, and residue usage between two helices. This approach provides a means to predict the three-dimensional packing of helices and allows for an understanding of the interaction within proteins and among proteins based on surface contact residue parameters.
Keywords: Protein Structure; Helix-helix interaction.
Recognizing of repetitive and stereotyped movements for children with Autism spectrum disorder
by Maha Jazouli, Soufiane Ezghari, Aicha Majda, Azeddine Zahi, Rachid Aalouane, Arsalane Zarghili
Abstract: Autism spectrum disorder (ASD) is a group of conditions that cause individuals to have social impairment, communication difficulties, and repetitive and stereotyped behaviours. Autistic people often engage in stereotyped and repetitive motor movements. Hence, our aim is to put forward a smart video surveillance system that facilitates the diagnosis of ASD for doctors. In this respect, we propose a real-time automatic stereotypical motor movement detection system. Firstly, we use the Kinect sensor to monitor the autistic child's movements. Secondly, we propose a data integration process to make the data provided by the Kinect sensor more comprehensive and specific. Thirdly, we perform the gesture detection using well-known machine learning algorithms such as decision tree, artificial neural network and nearest neighbour. We evaluate our proposal on five stereotyped behaviours. The obtained result is very promising and shows that the data integration step enhances the gesture recognition.
Keywords: Autism spectrum disorder; Stereotypical motor movements; stereotyped behaviours; Kinect Sensor; gesture recognition; machine learning.
Similar Gene Expression Profiles Define Leptospirosis Clinical Outcomes
by Nivison Nery, Daniela Barreiro Claro, Janet Lindow
Abstract: Leptospirosis, an acute, febrile disease with high case fatality, is prevalent in many tropical, urban regions. The mechanisms leading to death from leptospirosis are not fully understood. However, recent studies indicate that differences in the immune response during acute infection are associated with fatality. To identify transcriptional signatures that could differentiate survivors and case fatalities, we analyzed data obtained from full human genome transcriptome profiling of whole blood from patients with different disease outcomes. Using clustering algorithms, we identified unique groups, demonstrating that surviving patients and fatal cases have significant differences in their transcriptional profiles. We also confirmed our prior findings, which showed expression differences in genes involved in the immune response.
Keywords: Clustering analysis; leptospirosis; gene expression.
Comparative Regression Performances of Machine Learning Methods Optimizing Hyperparameters: Application to Health Expenditures
by Songul Cinaroglu, Onur Baser
Abstract: Least Absolute Shrinkage and Selection Operator (Lasso), K-Nearest Neighbor (KNN), Random Forest (RF) and Support Vector Machine (SVM) regression are successful machine learning algorithms used in various areas. However, there has been no study analyzing health expenditures using machine learning methods. This work is a step forward in comparing the regression performances of Lasso, KNN, RF and SVM regression while changing hyperparameter values. In this study, lambda (λ), number of neighbors (NN), number of trees (NT) and epsilon (ε) were taken as the hyperparameters for Lasso, KNN, RF and SVM regression, respectively. K-fold cross-validation was performed to examine regression performance results. These results show that KNN (R² > 0.75; RMSE < 0.70; MAE < 0.55) and Lasso (R² > 0.79; RMSE < 0.20; MAE < 0.15) regression yield better results in predicting health expenditure per capita and out-of-pocket health expenditure (%), respectively. Moreover, the performance differences among the Lasso, KNN, RF and SVM regression methods are statistically significant (p < 0.001). It is hoped that these results will stimulate further interest in using machine learning methods to predict health expenditures.
Keywords: Machine Learning; Random Forest Regression; Support Vector Regression; Hyperparameter Optimization; Black-Box Optimization; Health Expenditures.
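The hyperparameter comparison described in this abstract can be pictured with a small sketch of K-fold cross-validation used to choose the number of neighbors for a KNN regressor. This is only an illustration on synthetic one-dimensional data, not the authors' pipeline or data; the function names `knn_predict`, `kfold_rmse` and `best_k`, and the interleaved fold assignment, are assumptions made for the example.

```python
import math

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x (1-D inputs)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def kfold_rmse(data, k_neighbors, folds=5):
    """Cross-validated RMSE; point i is held out in fold i % folds."""
    squared_errors = []
    for f in range(folds):
        test = data[f::folds]
        train = [p for i, p in enumerate(data) if i % folds != f]
        squared_errors += [(knn_predict(train, x, k_neighbors) - y) ** 2
                           for x, y in test]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def best_k(data, candidates, folds=5):
    """Return the candidate neighbor count with the lowest cross-validated RMSE."""
    return min(candidates, key=lambda k: kfold_rmse(data, k, folds))

# Synthetic example: y = 2x on twenty evenly spaced points.
data = [(x, 2.0 * x) for x in range(20)]
chosen = best_k(data, [1, 2, 3])
```

The same loop structure applies to the other hyperparameters named in the abstract (λ, number of trees, ε); only the model fitted inside each fold changes.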
Biological characteristics evaluation to predict enzyme classes with support vector machine
by Gabriela Santos, Cristiane Nobre, Luis Zárate
Abstract: Predicting protein function is a latent problem and a challenge in the field of bioinformatics. Over the years several computational approaches have been proposed for this purpose. One of these approaches is based on characteristics and makes use of biologically relevant information. Several contributions have considered one or a combination of characteristics belonging to the four protein structures in order to classify enzymes into one of their classes. In this study we evaluate a set of characteristics that represent the four structural levels (primary, secondary, tertiary and quaternary), such as electrostatic potential, hydrophobicity, amino acid frequency, distance between α-carbons and molecular weight, to classify enzymes into one of their classes. The characteristics were combined with each other, forming 15 datasets. In order to evaluate the relevance of the characteristics, we use the SVM classifier because it presents satisfactory results in biological data classification. The objective of this study is to contribute to the most appropriate choice of characteristics for protein function prediction.
Keywords: Prediction of protein function; Enzyme; Support vector machine.
A Hybrid Method for Classification of Physical Action Using Discrete Wavelet Transform and Artificial Neural Network
by Gopal Chandra Jana, Aleena Swetapadma, Prasant Kumar Pattnaik
Abstract: This paper proposes a method for physical action classification from electromyography (EMG) signals based on wavelet analysis and an artificial neural network (ANN). The physical actions include a person's normal actions as well as aggressive actions. EMG signals are recorded during various types of physical actions. The discrete wavelet transform (DWT) with the DB-4 wavelet is used for feature extraction from the recorded EMG signals. The extracted features are given as input to the ANN-based classifier to distinguish between normal actions and aggressive actions. The hybrid approach combining ANN and wavelet analysis shows a significant increase in accuracy in classifying the physical actions. Hence, the proposed method can be used to discriminate physical actions, which ultimately helps in identifying a person's mental state.
Keywords: Electromyography (EMG); Wavelet analysis; Discrete wavelet transform (DWT); Artificial Neural Network (ANN); Classification.
Computational Studies to Explore the Role of MSI Associated DNA Mismatch Repair Mechanisms in HNPCC Through Expression and Interaction Data
by Sadhika Behl, Arushi Sharma, Prashant Survajhala, Tiratha Raj Singh
Abstract: Microsatellite instability (MSI) is an error mechanism associated with the DNA mismatch repair (MMR) system, which comprises a set of genes. If MMR fails, MSI may lead to various forms of cancer such as hereditary non polyposis colorectal cancer (HNPCC). In this study, we explored gene expression and network data to reveal the significance of MSI in HNPCC. Genes and proteins were observed for their specific role in HNPCC with respect to MSI and MMR. Besides standard markers, a few genes such as PMS1, TP53, MLH1, CHEK2, RFC3, LIG1, AURKA, CCND1, POLD1, HMGB1, ERCC1, ERCC2, PTGS2, and SLC19A1 were identified as putative markers contributing significantly to the regulation of the mechanisms associated with MSI and MMR in HNPCC. Experimental validation of these genes will prove to be a promising outcome for further research and will aid in the maintenance of the disease.
Keywords: DNA mismatch repair; Microsatellite Instability; Hereditary non polyposis colorectal cancer; Significant Microarray Analysis; Differentially Expressed Genes.
Neural network based prediction of less side effect causing cancer drug targets in the network of MAPK pathways
by M.D. Aksam V.K, Chandrasekaran V.M., Sundaramurthy Pandurangan
Abstract: Computational side-effect prediction tools have been used in rational drug design to decrease the late-stage failure of drugs under trial. Irrational selection of cancer drug targets in the deregulated MAPK pathways causes more side effects. Quantitative data on the network centralities and biological features (degree, radiality, eccentricity, closeness, bridging, stress and pagerank centralities, essentiality, pathway-specific proteins, disease-causing proteins, protein domains and other functional features) were exploited. We trained an artificial neural network with 15 selected features for the binary classification of side-effect-causing and less-side-effect-causing drug targets among the non-targeted proteins. The inter-relationship among the node centralities revealed three clusters with positive correlations. Among the three clusters of centralities, the top centrality nodes overlap within the clusters, playing multiple roles in the complex networks. Top-ranked proteins by degree, eccentricity and betweenness centralities that possess GO-based molecular function, are involved in more than one BioCarta pathway and have domain content are prone to cause more side effects than those ranked by other centralities and functional features. We predicted the following 15 less-side-effect-causing cancer drug targets: Shc, Rap 1a, Mos, Tpl-2, PAC1, 4EBP1, GAB1, LAD, MEF2, ZAK, GADD45, TAB2, TAB1, ELK1 and SRF.
Keywords: Cancer drug targets identification; Network of MAPK pathways; Side effects; Essential proteins; Graph theory.
A hybrid method for differentially expressed genes identification and ranking from RNA-Seq data.
by Mohammad Samir Farooqi, Devendra Kumar, Dwijesh Chandra Mishra, Anil Rai, Niraj Kumar Singh
Abstract: RNA-Seq has gained immense popularity and emerged as a potential high-throughput platform for identification of differentially expressed (DE) genes. In order to estimate the nature of differential genes, it is important to find the statistical distributional property of the data. In the present study we propose a new hybrid model (NBPFCROS) based on parametric and non-parametric statistics for the identification of DE genes. The NBP model, based on a compound mixture of the Poisson-gamma distribution, is used as the parametric statistic, and the fold change value derived using the fold change rank ordering statistics (FCROS) algorithm is used as the non-parametric statistic. We used a gene significance score (pi value) that combines expression fold change (f value) and statistical significance (P-value). The performance of the NBPFCROS model was compared with the NBP, FCROS, edgeR and DESeq2 models using synthetic and real RNA-Seq datasets, and the developed NBPFCROS model was found to be more robust than the other models.
Keywords: RNA-seq; differentially expressed genes; parametric and non-parametric statistic; Fold change; gene significance score; classification accuracy; gene ranking.
Structure Based Inference of Functional Single Nucleotide Polymorphism and its Role in TGFβ1 Allied Colorectal Cancer (CRC)
by Ankita Shukla, Tiratha Raj Singh
Abstract: Motivation: Single-nucleotide polymorphisms (SNPs) play a crucial role in understanding the genetic basis of complex forms of human disease. To date, a wide variety of studies have given major attention to the TGFβR1 and TGFβR2 receptors in colorectal cancer (CRC); however, TGFβ1 remains poorly understood. It is still a major challenge to identify the functional SNPs in the CRC-related TGFβ1 gene.
Background: CRC is the third most common cause of cancer-related deaths worldwide. The relation between SNPs and CRC is a major concern, as SNPs offer valuable markers for identifying genes responsible for disease susceptibility. SNPs account for the more common form of genetic variation, and they mostly fall in the coding regions of the human genome.
Method: In this study, a total of 136 mutations were retrieved for TGFβ1, of which 37 non-synonymous mutations were considered. Initially, sequence- and structure-based tools were used for damage prediction. The mutations that were predicted to be damaging by the majority of the tools were then considered for the structure dynamics study.
Result: In this paper we targeted only one mutation type i.e. L28F to evaluate its effect on disease. Structure conservation studies have been performed to infer the effect of the mutation at the region with respect to its conservation profile. The study depicts the changes occurring to the overall structure due to a single amino acid variation (i.e. L28F) can probably cause damage to the structure by alterations at 2
Keywords: Colorectal cancer; Carcinogenesis; Molecular Dynamics; Polymorphism.
In silico Design and Analysis of Recombinant-Fibroin Fusion Protein as a Biomaterial for Enhanced Human Tissue Regeneration and Drug Delivery
by Mamatha Dadala Mary, Jyothi Singaraju, Swetha Kumari Koduru, Satyavathi Valluri V, Jayakumar Rajadas
Abstract: Chimeric proteins are fabricated by combining two or more independent genes coding for separate proteins, and these proteins are mostly used as biomaterials in the medical field. Silks are protein polymers spun into fibers by some lepidopteran larvae, mainly silkworms. For decades, silk fibers have been used in many clinical applications because of their enhanced environmental stability, high density and insolubility in most solvents. Our present work focuses on the in silico design and construction of a recombinant fusion protein of silkmoth Fibroin heavy chain (FibH) and Human Elafin (Elfn), a skin-derived antileukoprotease protein encoded by the PI3 gene. A compatible biomaterial of recombinant-fibroin fusion protein has been designed with and without a hydrophobic linker. The physicochemical properties, structural properties and stability of the two kinds of fusion proteins were analyzed in silico, which paves the way for their application as biomaterials in enhanced human tissue regeneration and in drug delivery systems.
Keywords: Chimeric proteins; Silk biomaterial; Fibroin heavy chain; Elafin; Fusion protein; Human tissue regeneration.
An Efficient Framework for Accelerating Needleman-Wunsch Algorithm Using GPU
by Hamza Nadim, Mohamed Assal, Abdelfatah A. Hegazy
Abstract: The Needleman-Wunsch (NW) algorithm is considered the benchmark for global alignment. This work proposes a new parallel implementation of the NW algorithm on the GPU, focusing on enhancing the second phase of the algorithm (the fill), which is the most time-demanding phase. The idea of filling only a percentage of the matrix is presented, which guarantees a decrease in execution time; the key was to find the minimum percentage that must be filled while ensuring the same result as filling the whole matrix. Experiments show the effectiveness of the proposed model in execution time when compared with the sequential algorithm.
Keywords: Needleman-Wunsch; GPU; Cuda; Sequence Alignment; Partial Matrix Filling.
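To make the idea of filling only part of the Needleman-Wunsch matrix concrete, here is a minimal CPU sketch that restricts the fill to a diagonal band. The abstract does not say how the filled region is chosen, so the banded strategy, the function name `nw_score` and the `band` parameter are assumptions for illustration only, and nothing here reflects the authors' GPU implementation.

```python
NEG_INF = float("-inf")

def nw_score(a, b, match=1, mismatch=-1, gap=-1, band=None):
    """Global alignment score and number of matrix cells filled.

    With band=None the whole matrix is filled; otherwise only cells with
    |i - j| <= band are computed (an assumed way to fill part of the matrix).
    """
    n, m = len(a), len(b)
    if band is not None and abs(n - m) > band:
        raise ValueError("band too narrow to reach the final cell")
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    filled = 0
    for i in range(n + 1):
        lo = 0 if band is None else max(0, i - band)
        hi = m if band is None else min(m, i + band)
        for j in range(lo, hi + 1):
            filled += 1
            if i == 0 and j == 0:
                F[i][j] = 0
                continue
            best = NEG_INF
            if i > 0 and j > 0:   # match or mismatch on the diagonal
                s = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, F[i - 1][j - 1] + s)
            if i > 0:             # gap in b
                best = max(best, F[i - 1][j] + gap)
            if j > 0:             # gap in a
                best = max(best, F[i][j - 1] + gap)
            F[i][j] = best
    return F[n][m], filled

full = nw_score("GATTACAGATTACA", "GATTACAGATACA")
banded = nw_score("GATTACAGATTACA", "GATTACAGATACA", band=2)
```

For these two similar sequences the banded fill reaches the same score as the full fill while touching far fewer cells. A band that is too narrow for divergent sequences can miss the optimal path, which is why finding the minimum safe fraction is the crux of the problem the abstract describes.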
Comparative study of synonymous codon usage in bacteria growing at extreme temperatures
by Monisha Singhal, Pragya Chaturvedi, R.K. Gothwal, M.K. Mohan, Pooransingh Solanki
Abstract: With the availability of completely sequenced archaeal genomes it has become possible to compare the codon and amino acid usage strategies among different extremophiles. The adapted sequence of codons and amino acids decides the conformational pattern in the structure of proteins and thereby confers the specificity and structural integrity that remain maintained irrespective of the growth conditions. Correspondence analysis, a multivariate analysis method, was used to characterize various patterns present in a dataset of 200 genes encoding the ten key enzymes of the citric acid cycle from 20 organisms surviving at varying temperatures. The study has shown that the different extremophiles follow a specific trend of codon usage and amino acid composition, which is affected by temperature variation and base composition and is vital for the functional and structural stability of enzymes and hence for their adaptive survival in such harsh environmental conditions. It was found that higher temperature favours a high aromaticity score, which can be linked to thermal behaviour. The results and statistical analysis of various parameters of codon usage show a level of preference among synonymous codons and indicate a kind of anonymous selection pressure which helps stabilize the genetic material at varying temperatures.
Keywords: Bioinformatics; Codon bias; Codon; Extremophiles; Codon Adaptation Index; Correspondence Analysis; Amino acid; Evolution; citric acid cycle; codon usage.
A Multilevel analysis of hiv1-miR-H1 miRNA using KPCA, K-means, Random Forest and Online Target Tools
by Vinai George Biju, Blessy Baby Mathew, Prashanth C M
Abstract: The goal of this study was to propose a workflow using machine learning to identify and predict the miRNA targets of Human Immunodeficiency Virus 1. miRNAs, which are 21 nt long, are derived from larger hairpin RNA precursors and are conserved in the secondary structure of their precursors rather than in the primary sequence. The proposed approach for identification and prediction of miRNA targets of hiv1-miR-H1 is based on secondary structure and E-value through machine learning. The linearity of the length and E-value data for sequence matches with hiv1-miR-H1 is verified using Kernel PCA. miRNA targets were grouped into clusters using the K-means algorithm, thereby indicating similar targets. A Random Forest classification model was implemented with regard to each secondary-structure feature variable, taking feature relevance into account. A learning methodology is put forward that integrates the scores returned by the various machine learning algorithms to predict cellular hiv1-miR-H1 targets. Gene target results from TargetScan, miRanda, PITA, DIANA microT and RNAhybrid are also explored for multiple parameters.
Keywords: miRNA; HIV 1; KPCA; K-Means; Random Forest.
Multiple Alignment of Structures using Center Of Proteins
by Asish Mukhopadhyay, Kaushik Roy, Gilbert Cole
Abstract: Multiple Structure Alignment (MStA) is a fundamental tool for correlating the structural similarity of proteins with their functional similarity and has therefore received much attention from the proteomics community. A number of algorithms have been proposed: MUSTANG, POSA, MultiProt and CE-MC, to name a few. In this paper we propose a new algorithm, MASCOT. It uses the DSSP program to map a protein structure into a DSSP-sequence, reducing the structural alignment problem to a sequence alignment problem. Similar to an approximation algorithm for multiple sequence alignment, we have used a center-star approach to select a center-protein with respect to which the alignment is created. The root mean square deviation (RMSD) has been used as a measure of alignment quality, and we report this measure for a large and varied number of alignments. We compared the execution times of our algorithm with those of the well-known algorithm MUSTANG for all the tested alignments. MASCOT outperformed MUSTANG on all the samples except one. Another measure, ACC (Alignment Accuracy), was used to compare the performance of MASCOT and MUSTANG with protein structures drawn from the manually curated database HOMSTRAD.
Keywords: structural bioinformatics; protein structure alignment; computational biology; algorithms.
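The center-star selection step mentioned in this abstract can be sketched as follows: among a set of sequences, pick the one whose total pairwise distance to all the others is smallest. This is a generic illustration rather than MASCOT itself; plain edit distance stands in for whatever pairwise score the authors use, and the secondary-structure strings and the function names `edit_distance` and `pick_center` are hypothetical.

```python
def edit_distance(s, t):
    """Levenshtein distance between two strings (row-by-row dynamic program)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution or match
        prev = cur
    return prev[-1]

def pick_center(seqs):
    """Index of the sequence minimising the sum of distances to all others."""
    totals = [sum(edit_distance(s, t) for t in seqs) for s in seqs]
    return min(range(len(seqs)), key=totals.__getitem__)

# Hypothetical DSSP-like strings (H = helix, E = strand, C = coil).
dssp_like = ["HHHHEEEE", "HHHHEEEC", "HHHCEECC"]
center = pick_center(dssp_like)
```

In a center-star scheme the remaining sequences would then each be aligned pairwise to the chosen center and merged into one multiple alignment; that merging step is omitted here.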
Feature Prioritization on Big Genomic Data for Analyzing Gene-Gene Interactions
by Ahmad Aloqaily, Siamak Tafavogh, Bronwyn Harvey, Daniel Catchpoole, Paul Kennedy
Abstract: Exploring the large genomic data of humans has revolutionized our ability to explore the genetic architecture of disease. Examining the relationship between genetic markers across the whole genome enables us to obtain a better perception on genomic related diseases such as cancer. Complex diseases are not caused by single genes acting alone but are the result of intricate non-linear interactions among genetic factors, with each gene having a small effect on disease risk. Thus, there is a critical need to implement new approaches that can take into account non-linear gene-gene interactions in searching for markers that jointly cause complex diseases. However, determining the interaction between single nucleotide polymorphisms (SNP) specifically for more than two SNPs within large amounts of genomic data is a computationally expensive and sometimes an infeasible task. In this paper, an approach is developed to estimate the chance of survival for patients with Acute Lymphoblastic Leukaemia (ALL) by analyzing SNP datasets and the effect of their interactions on diseases. To this end, a novel feature prioritization algorithm is proposed called Interaction Effect Quantity (IEQ). The IEQ identifies and selects SNPs with a high potential of interaction by analyzing their distribution throughout the genomic data and eliminates the rest. This enables the conducting of a deeper non-linear interaction analysis between attributes in large genomic datasets. The results indicate that the IEQ measure enables the system to analyze interactions between up to four SNPs, while this analysis is much more computationally challenging if IEQ is not implemented. The results also show that despite a huge attribute elimination using the IEQ algorithm, the resulting F-measure for classification is greater than 89%.
Keywords: large genomic data; Dimensionality reduction; Feature prioritization; Gene-Gene interaction.
In-silico approach for the detection of key genes and their interaction involved in breast cancer cell lines
by Desam Neeharika, Swetha Sunkar
Abstract: Genes are known to play a pivotal role in breast cancer. Any factor that leads to a change in the expression level of these genes influences the mechanism and disturbs their functionality. Hence, our study aims to identify the key genes in the MCF-7 breast cancer cell line and their interactions with other genes using an in silico approach. The microarray dataset GSE1400 was selected from the GEO database, and the GEO2R tool identified a total of 1932 Differentially Expressed Genes (DEGs), of which 1809 were up-regulated and 123 were down-regulated. The DAVID tool was used for functional annotation. Cytoscape was used for the screening of clusters and the identification of hub genes. Among them, EGFR and HNRNPR were identified as the highest-degree nodes in the up-regulated and down-regulated gene networks, respectively, and hence can be used as possible molecular targets.
Keywords: Breast cancer; Differentially Expressed Genes; GEO; in-silico analysis; Gene Ontology; Hub genes.
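The hub-gene step in the abstract above ranks nodes of a gene interaction network by degree. The following is a minimal sketch of that idea in plain Python, not the authors' Cytoscape workflow; the function names and the toy edge list (GENE_A to GENE_D) are hypothetical and carry no biological meaning.

```python
from collections import defaultdict


def node_degrees(edges):
    """Map each node of an undirected network to its number of distinct neighbours."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(adj) for node, adj in neighbours.items()}


def top_hubs(edges, k=1):
    """Return the k highest-degree nodes, breaking ties alphabetically."""
    degrees = node_degrees(edges)
    return sorted(degrees, key=lambda n: (-degrees[n], n))[:k]


# Hypothetical toy interaction list (placeholder gene names).
toy_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
]

print(node_degrees(toy_edges))
print(top_hubs(toy_edges, k=1))
```

Using sets of neighbours means duplicate edges are counted once, which is one reasonable convention for degree; a tool such as Cytoscape may count differently depending on its settings.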
LIFT: LncRNA Identification and Function-prediction Tool
by Sumukh Deshpande, James Shuttleworth, Jianhua Yang, Sandy Taramonli, Matthew England
Abstract: Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs which play a significant role in several biological processes. Accurate identification and sub-classification of lncRNAs is crucial for exploring their characteristic functions in the genome as most coding potential computation (CPC) tools fail to accurately identify, classify and predict their biological functions in plant species. In this study, a novel computational framework called LIFT has been developed, which implements LASSO optimization and iterative Random Forests classification for selection of optimal features, a novel Position-Based Classification (PBC) method for sub-classifying lncRNAs into different classes and Bayesian-based function prediction approach for annotating lncRNA transcripts. Using LASSO, LIFT selected 31 optimal features and achieved 15-30% improvement in the prediction accuracy on plant species when evaluated against state-of-the-art CPC tools. Using PBC, LIFT successfully identified the intergenic and antisense transcripts with greater accuracy in A. thaliana and Z. mays datasets. The predicted functions were verified with published experimental results.
Keywords: lncRNA; LASSO; iterative Random Forests; Position-Based Classification; BMRF; function prediction.
Predicting Novel Interactions from HIV-1-Human PPI Data Integrated with Protein Signatures and GO Annotations
by Debasmita Pal, Kartick Chandra Mondal
Abstract: Research on host-pathogen protein-protein interactions (PPIs) has become one of the most challenging areas of medical science for antiviral drug invention. Specifically, virologists are now more focused on the pathogenesis of Human Immunodeficiency Virus - Type 1 (HIV-1) due to its virulent nature and epidemic spread throughout the world. The virus exploits a complex interaction network of HIV-1 and human cellular proteins to replicate, and gradually destroys human immunity, causing Acquired Immunodeficiency Syndrome (AIDS). In this paper, we propose a pattern mining based approach to predict novel interactions between HIV-1 and human proteins, with an estimated confidence based on the experimentally validated known interactions curated in a public PPI database. While predicting interactions, we also utilize information on protein signatures and Gene Ontology (GO) annotations (Biological Process, Cellular Component and Molecular Function annotations) of human proteins. Integrating this additional information with the PPI dataset results in the prediction of more potential interactions and also provides the signatures and GO annotations that might hold for the predicted interactions. We validate our predicted interactions by finding evidence in the literature and by comparing them with the predictions made by different computational approaches. We believe that our predicted information on PPIs, along with the corresponding signatures and GO terms, provides the PPI research field with greater knowledge and a better understanding of the viral replication process, subsequently enhancing the discovery of new drug targets.
Keywords: HIV-1 Proteins; Antiretroviral Drugs; Protein Signatures; GO Annotations; Protein-Protein Interactions; Association Rule Mining.
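The abstract above attaches an estimated confidence to predicted interactions, and its keywords name association rule mining. As a hedged illustration only, the sketch below computes the textbook support and confidence of a rule over a set of transactions; it is not the authors' mining procedure, and the item labels (sigA, goX, hivP1) are hypothetical placeholders for a protein signature, a GO term and an HIV-1 protein.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    wanted = frozenset(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent (0.0 if antecedent never occurs)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Hypothetical toy data: each transaction lists the annotations of one human
# protein plus, where known, the HIV-1 protein it interacts with.
toy_transactions = [
    {"sigA", "goX", "hivP1"},
    {"sigA", "hivP1"},
    {"sigA", "goX"},
    {"goX"},
]

print(support(toy_transactions, {"sigA"}))
print(confidence(toy_transactions, {"sigA"}, {"hivP1"}))
```

Here the rule "sigA implies an interaction with hivP1" holds in two of the three transactions that contain sigA, which is the sense in which a mined rule can carry a confidence estimate.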
Identification of novel and available Bacillus anthracis inhibitors using drug repurposing approach and In-Silico methods
by Masood Aleeyar, Hamid Moghimi, Abbas Rouhollahi, Ramezan Ali Taheri
Abstract: Anthrax is a dangerous disease all around the world and it can be lethal if it is not properly treated. In order to address this problem, we decided to use the combination of drug repurposing and structure-based drug discovery to identify useful and effective alternative treatment for anthrax. In this study, we used molecular docking, molecular dynamics simulations and related analysis and found that Kanamycin, Tobramycin and Framycetin can be potential drugs for the treatment of Anthrax. Moreover, because our identified compounds are FDA approved drugs, there is no need to pass any safety test and all of the hit compounds can be tested in human subjects.
Keywords: Bacillus anthracis; anthrax; nucleoside hydrolase; molecular docking; molecular dynamic; In-Silico drug repurposing.
Clustering Analysis of Soil Microbial Community on a Global Scale
by Tetsushi Tanaka, Andre Freire Cruz, Naoaki Ono, Shigehiko Kanaya
Abstract: There are huge numbers of bacteria in the soil, and their existence and function affect soil properties. Bacteria form a complex community called the microbiome. Compared to the ecosystems of plants and animals, we still know little about the soil's microbial ecosystem. How soil microbiomes differ throughout the world and how they relate to the region and the environment is of major interest. Recently, the development of next-generation sequencing has enabled researchers to accumulate metagenome profile data, and large-scale, comprehensive microbiome analyses are now required. In this study, we compared and analyzed data from various environments on a global scale using the microbiome database of the Earth Microbiome Project (EMP). We calculated a distance based on genetic distance, named the UniFrac distance, and performed clustering analysis. The clustering results were interpreted ecologically in terms of the function and characteristics of the bacteria. We revealed the characteristics of groups of bacteria related to paddy fields, vineyards, grasslands in Mongolia, forests, and biofilters. Furthermore, we investigated the relationship between clusters and climate zones. This research is expected to lead to knowledge of soil management based on the soil microbiome.
Keywords: Clustering Analysis; Metagenome; Microbiome; Soil Microbiology; UniFrac Distance.
Homology modeling and docking studies of strictosidine beta-D-glucosidase from Madagascar periwinkle (Catharanthus roseus Bunge)
by Piotr Szymczyk, Grazyna Szymanska, Malgorzata Majewska, Izabela Weremczuk-Jezyna, Michal Kolodziejczyk, Kamila Czarnecka, Pawel Szymanski, Ewa Kochan
Abstract: Here, the structure of the enzyme and substrate binding site of C. roseus strictosidine beta-D-glucosidase is reported. A homology model of C. roseus strictosidine beta-D-glucosidase was built using the Discovery Studio 4.1 software package and rice Os3bglu6 beta-glucosidase (PDB code: 3GNPA) as a template. The CDOCKER algorithm was used to dock the natural substrate, strictosidine, as well as an inhibitor, D-glucono 1,5-lactone. The structures obtained were refined by molecular dynamics simulation. Analysis of the acquired data provided information on the interactions of C. roseus strictosidine beta-D-glucosidase amino acid residues with the natural substrate and the inhibitor. Our findings expand the basic knowledge of the structure of the C. roseus strictosidine beta-D-glucosidase active site, and may also be used in the design of enzyme point mutations with improved stability and catalytic properties, or to change substrate specificity.
Keywords: homology modeling; ligand docking; enzyme-ligand interactions; alanine scanning.
Special Issue on: CBM 2018 Biomedicine, Machine Learning and Big Data
ExBWS: Extended Bioinformatics Web Services for Sequence Analyses
by Robert Penchovsky
Abstract: The Extended Bioinformatics Web Services (ExBWS) represent a significant extension of the published EBWS PHP-based server, providing useful tools for the analysis of DNA, RNA, and protein sequences. Six new Web-based applets are freely available to the user via the ExBWS. They include a DNA/RNA translator, an AminoCODE transformer, a virtual PCR analyzer, a protein hydropathy plotter, a protein reverse translator, and a eukaryotic ORF finder. Each applet includes some novel feature. The AminoCODE transformer converts a protein sequence from one-letter code to three-letter code and vice versa; the virtual PCR analyzer generates fragments with or without overhangs; the protein hydropathy plotter makes hydropathy plots of 10 frames of the input sequence; and the reverse translator converts proteins to DNA according to the highest codon bias present in the selected organism. The eukaryotic ORF finder searches for introns in the query sequence and translates ORFs of the processed sequence into proteins. All programs are freely available at http://penchovsky.atwebpages.com/applications.php.
Keywords: Bioinformatics web server; nucleic acid and protein sequence analyses; virtual PCR; hydropathy plot; Reverse translator; Protein sequence conversion; eukaryotic ORF finder.
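The one-letter to three-letter conversion described for the AminoCODE transformer in the abstract above can be sketched as follows. This is an independent illustration in Python, not the ExBWS PHP code; the function names and the hyphen separator are assumptions, and only the 20 standard amino acids are covered.

```python
# Standard one-letter to three-letter amino acid codes (20 standard residues).
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def one_to_three(sequence, sep="-"):
    """Convert e.g. 'MKV' to 'Met-Lys-Val'; raises KeyError on unknown residues."""
    return sep.join(ONE_TO_THREE[residue] for residue in sequence.upper())


def three_to_one(sequence, sep="-"):
    """Convert e.g. 'Met-Lys-Val' to 'MKV'; raises KeyError on unknown codes."""
    if not sequence:
        return ""
    return "".join(THREE_TO_ONE[code.capitalize()] for code in sequence.split(sep))


print(one_to_three("MKV"))
print(three_to_one("Met-Lys-Val"))
```

Raising an error on unknown symbols, rather than silently skipping them, is a design choice of this sketch; a web tool might prefer to report or mask ambiguous residues instead.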
Prediction of Alzheimer Associated Proteins (PAAP): A perspective to understand Alzheimer Disease for Therapeutic Design
by Gaurav Gupta, Neha Gupta, Ankit Gupta, Pankaj Vaidya, Girish Kumar Singh, Varun Jaiswal
Abstract: Alzheimer disease (AD) is a complex, progressive neurodegenerative disease with no cure, and its occurrence rate has increased worldwide with the increase in human life span. It has become the sixth leading cause of death in the US, and no vaccine is available for the disease. Its unclear aetiology is the major hurdle in the discovery of therapeutics against it. Discovery of proteins/genes associated with AD can help decipher the disease aetiology and support the further discovery of vaccine and drug targets. Analysing the association of genes/proteins with AD requires resources and expertise, which imposes practical limitations. Computational methods can be used to predict the association of all possible genes/proteins with AD. In the current research, data on all known proteins/genes associated with AD were used to develop a machine learning based method for predicting the association of proteins with AD. The high accuracy of the developed model supports the reliability of the method. The developed method is expected to help in the understanding of AD and in the discovery of new vaccine and drug target candidates for AD. The developed method is implemented on a webserver and is available at (http://188.8.131.52/cgi-enabled/index.html).
Keywords: Alzheimer Disease; Machine Learning; Protr; PAAP; SVM.
A Review of Dimensionality Reduction Methods Applied on Clinical Data of Diabetic Neuropathy Complaints
by Usharani R
Abstract: This paper reviews dimensionality reduction techniques applied to a large clinical dataset of Type II diabetes patients to identify the causes and symptoms associated with their tendency to develop neuropathic complaints. Because machine learning and data mining techniques can be ineffective for big data, data preprocessing becomes a necessary and important phase. An effective approach to downsizing the data to be considered for analysis is dimensionality reduction. Technically, reducing dimensions means reducing the number of independent or dependent variables needed for the analysis. These processes help identify the features to be considered for selection and extraction while avoiding redundant and irrelevant features. Selection chooses an optimal subset of the originally defined features for a preconceived objective, whereas extraction builds a new set of features that are linear combinations of the original features. Dimensionality reduction is carried out by applying supervised and unsupervised learning, from which one can draw conclusions for prediction and further analysis. The primary focus of this paper is on supervised learning, where the variables are known beforehand, and on a combination of feature selection techniques and machine learning algorithms, with the goal of giving practitioners a range of possible methods for using dimensionality reduction for feature extraction.
Keywords: Type II Diabetes Mellitus; Diabetes & Neuropathic complaints; Dimensionality Reduction; Machine Learning Algorithms; Feature Selection; Feature Extraction.
A System for Continuous Monitoring of Food Intake in Patients with Dysphagia
by Ingridy Barbalho, Patrício Silva, Cynthia Maia, Cicilia Leite
Abstract: The use of mobile devices for continuous monitoring of patients with a specific pathology can significantly help in their recovery. From this perspective, this work presents the development of an mHealth monitoring system for the follow-up of patients with difficulty in swallowing and/or oropharyngeal dysphagia. The developed system aims to capture the movements and acoustic signals generated during chewing and swallowing and to classify them as solid, liquid or pasty material. Once classified, this information is stored and used to generate a food record with detailed information about the meals taken, which is related to the patient's level of dysphagia. To this end, a domain ontology was implemented for data classification. Finally, the system was validated through experiments in a real environment, showing relevant results and contributing to the quality of life of patients who need remote monitoring.
Keywords: Artificial Intelligence; Difficulty in Swallowing; Food Intake; Ontologies; Food History; Remote Monitoring; Patient monitoring; Health Technology.
Biclustering of Diabetic Nephropathy and Diabetic Retinopathy Microarray Data Using a Similarity-Based Biclustering Algorithm
by Titin Siswantining, Alhadi Bustamam, Fahrezal Zubedi, Sofia Debi Puspa, Zuherman Rustam
Abstract: In this study, we implemented a modified similarity-based biclustering (SBB) algorithm to identify a significant bicluster in diabetic nephropathy and retinopathy microarray data. Theoretically, SBB consists of four main phases: transforming the data, constructing the row (gene) and column (condition) similarity matrices, clustering each similarity matrix, and extracting the bicluster. Before implementing the SBB method, genes are selected using relative deviations and absolute deviations. We modified the SBB algorithm at the data transformation stage using min-max normalization and compared three partitioning methods: partitioning around medoids (PAM), k-means clustering and agglomerative hierarchical clustering (AHC, Ward's linkage). Based on silhouette index validation experiments, SBB using PAM provided better clustering of genes and samples than k-means and AHC (Ward's linkage). Furthermore, the proposed technique identified a meaningful non-overlapping bicluster on a real dataset. Using gene ontology (GO) enrichment analysis and the Bonferroni correction provided by the Database for Annotation, Visualization and Integrated Discovery, we have identified biological evidence in each bicluster that is significant in terms of gene functions and biological processes.
Keywords: biclustering; gene expression; microarray data; similarity-based biclustering.
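The data transformation stage mentioned in the abstract above uses min-max normalization, which rescales values to the interval [0, 1] via (v - min) / (max - min). A minimal sketch follows; normalizing each gene (row) separately and mapping constant rows to zeros are assumptions of this example, since the abstract does not state those details.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def normalize_rows(matrix):
    """Apply min-max normalization to each row (gene) of an expression matrix."""
    return [min_max_normalize(row) for row in matrix]


# Hypothetical toy expression matrix: rows are genes, columns are conditions.
toy_matrix = [
    [2.0, 4.0, 6.0],
    [5.0, 5.0, 5.0],
]

print(normalize_rows(toy_matrix))
```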
Special Issue on: Data Mining and Its Applications in Bioinformatics and Biomedical Engineering
Cuckoo Search based Deterministic Scale (CSDS) for Computer Aided Heart Disease Detection
by Vankara Jayavani, Lavanya Devi D
Abstract: Predictive analysis for computer aided disease prediction has become a crucial part of regular clinical practice. This is because false alarms and delays in disease detection are inversely proportional to the clinical experience of the medical practitioner. Unlike in other domains, sensitivity, that is, the accuracy in detecting disease-prone cases, is crucial in clinical practice. Accuracy and sensitivity are particularly crucial in computer aided heart disease prediction methods. Hence, recent research contributions are quantifying the possibilities of optimizing machine learning approaches to achieve significance in computer aided predictive analysis for heart disease detection. In this context, this manuscript defines a supervised learning approach, Cuckoo Search based Deterministic Scale (CSDS), to perform heart disease prediction. The experimental study indicates the significance of the proposed model in terms of detection accuracy and sensitivity, along with other performance metrics.
Keywords: Soft-Computing; NOPAS; CSFT; CSDS; Cuckoo Search; Dice Similarity Coefficient.
Infected cells of mammogram image and performance analysis using Imaging techniques
by Diderot. P. Kumara Guru, N. Vasudevan
Abstract: This work focuses on radiological image acquisition of mammograms and the interpretation of cell count using image processing techniques. The image processing technique Mammogram Image based Nuclei Count (MINC) calculates cell count and pixel intensity in the malignant area and in the entire breast area. MINC thus provides pixel-based supervision of the image in the entire area and in its malignant area, along with the corresponding CC and MLO views. Specific attention is given to calcification types that are malignant and pleomorphic in nature. In addition, the region morphometry features of area, perimeter, circularity and elongation are discussed for the infected area.
Keywords: Cell count; Micro-calcifications; Regional Morphometry.
CS-ABC: Cuckoo Search based Adaptive Boosting Classifier for Malaria Infected Erythrocyte Detection
by Chaya Jagtap D, Usharani Nsai
Abstract: Microscopic image processing and feature extraction for malaria allow computer aided disease prediction methods to be defined using machine learning. This practice makes it possible to predict the disease scope regardless of the technical expertise of the individual domain expert. In this context, the design of computer aided methods that learn from labeled erythrocytes, given as a training corpus, to detect malaria scope at a premature level has been a crucial research objective in the recent past. The contribution of this manuscript is a supervised learning technique that detects the malaria scope in a given erythrocyte. In contrast to former efforts in this area, this study utilizes multiple observable phase images of the unspotted cells. Erythrocytes are segmented automatically using optical phase thresholds and redeployed to allow quantitative comparison of the phase images. The redeployed images are examined to extract multiple morphological descriptors on the basis of phase information. Although all individual descriptors differ statistically between uninfected and infected cells, no single descriptor separates the populations at a level satisfactory for clinical use. The experimental study carried out on the proposal and other contemporary models shows that the proposed CS-ABC performs significantly better, with maximal prediction accuracy and minimal misclassification rate compared to other contemporary models.
Keywords: WHO; QPI; HSV method; AGNES; morphological gradient approaches.
Adaptive Bio-Inspired Gene Optimization Based Deep Neural Associative Classification for Diabetic Disease Diagnosis
by D. SASIREKHA, Punitha A
Abstract: Associative classification plays a significant role in data mining. Several classification techniques using association rules have been proposed in existing works. However, the accuracy of the existing classification techniques was not adequate. In order to overcome this limitation, an Adaptive Bio-Inspired Gene Optimization Based Deep Neural Associative Classification (ABGO-DNAC) technique is proposed. The ABGO-DNAC technique is developed to improve classification performance for diabetic disease diagnosis at an early stage by generating association rules with a minimal number of medical attributes. The ABGO-DNAC technique uses the Adaptive Bio-Inspired Gene Optimization (ABGO) algorithm to generate the association rules by choosing a minimal number of optimal attributes from a medical dataset. With the support of the formulated association rules, the ABGO-DNAC technique designs a Gaussian Deep FeedForward Neural Learning (GDFNL) model for diabetic disease classification. The GDFNL deeply analyses the patient's medical data with the aid of the created association rules and classifies patients as normal or abnormal. Thus, the ABGO-DNAC technique efficiently identifies diabetic disease at an earlier stage with higher classification accuracy and minimum time. The simulation evaluation of the ABGO-DNAC technique is performed on factors such as disease prediction accuracy, disease prediction time and false positive rate with respect to various numbers of patients. The simulation results show that the ABGO-DNAC technique is able to increase the disease prediction accuracy and also reduce the diabetic disease diagnosing time as compared to state-of-the-art works.
Keywords: Association Rules; Diabetic Disease; Logistic Loss Function; Adaptive Bio-Inspired Gene Optimization; Gaussian Deep Feedforward Neural Learning; Adaptive Levy Mutation.
Multi spectral image classification based on deep feature extraction using deep learning technique
by Muralimohanbabu Y., Radhika K.
Abstract: Remote sensing image classification accuracy depends on deep feature extraction. Unsupervised deep feature extraction employs single-layer and deep convolutional networks. Applying supervised convolutional networks is highly challenging for multi- and hyperspectral imagery when the input data dimensionality is high and the labelled set is limited. To address this, greedy layer-wise unsupervised pre-training combined with an appropriate algorithm for unsupervised learning of sparse features is proposed. This algorithm concentrates on sparse representations and on the sparsity of the extracted features at the same time. The proposed method is applied to land use/cover classification of remote imagery with different spatial/spectral characteristics. Compared with current classification algorithms, the proposed method performs well. Extraction of powerful discriminative features is possible with single-layer convolutional networks to obtain detailed classification results. Different spatial/spectral parameters are calculated to quantify the results.
Keywords: unsupervised learning; deep learning; multispectral images; segmentation; accuracy; classification.
Fusion of registered Medical Images Using Deep Learning Convolutional Neural Network with Statistics-based Steered Image Filter
by Suneetha Rikhari
Abstract: Medical image fusion plays an increasingly critical role in many clinical applications by deriving complementary information from medical images of different modalities. In this paper, a novel MR and CT image fusion approach is proposed which utilizes deep learning convolutional neural networks (CNNs) with a statistics-based steered image filter (SSIF). In our method, a deep learning convolutional network is adopted to generate a weight map which integrates the pixel activity information from MR and CT images. The fusion process is conducted via the SSIF fusion rule, which computes the weights of the obtained detail layers using image statistics. In addition, a weighted average method is utilized to obtain the fused image. Further, the proposed fusion algorithm is extended to be applicable to RGB image fusion. Experimental results demonstrate that the proposed method can achieve promising results in terms of both visual quality and objective assessment.
Keywords: image registration; medical image fusion; convolutional neural networks; steered image filter; image statistics; image quality metrics.
Visualization of Meniscus from Knee Joint MRI and Assessment of its Size Differences due to Age, Gender and BMI
by Mallikarjunaswamy M S, Mallikarjun S Holi, Rajesh Raman, Sujana Theja J S
Abstract: Menisci play a major role in cushioning and distributing the stress due to body weight over the cartilage surfaces of the femur and tibia. They also contribute to joint lubrication and provide stability for the joint with the support of ligaments. It is essential to quantify the size and volume of the meniscus in the treatment of injured and discoid menisci. Visualization of menisci using MRI is helpful in understanding the location of meniscal tears and degradation. In this work, menisci were segmented from knee joint MRI using a seeded region growing algorithm and volume rendered for 3D visualization. The menisci are quantified using an image processing method. This noninvasive method of visualizing and quantifying menisci is useful for the diagnosis and surgical planning of diseased knee joints, especially in elderly people and those with sports injuries. The factors influencing size differences in menisci due to age, gender and BMI are analyzed using statistical methods.
Keywords: Knee joint; meniscus; magnetic resonance imaging; body mass index; segmentation.
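The seeded region growing step named in the abstract above starts from a seed pixel and repeatedly absorbs neighbouring pixels whose intensity is close enough to the seed. The sketch below shows one common 2D variant; the 4-connectivity, the fixed tolerance relative to the seed intensity, and the toy image are all assumptions of this example, not details taken from the paper.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Return the set of (row, col) pixels connected to seed whose intensity
    differs from the seed intensity by at most tolerance (4-connectivity)."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# Hypothetical toy intensity image: a dark 2x2 patch in the top-left corner
# and an isolated dark pixel in the bottom-right corner.
toy_image = [
    [10, 11, 50],
    [12, 10, 52],
    [51, 49, 11],
]

print(sorted(region_grow(toy_image, seed=(0, 0), tolerance=3)))
```

The bottom-right pixel has a similar intensity to the seed but is not reached, because region growing only follows connected paths of similar pixels; this connectivity constraint is what distinguishes it from plain thresholding.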
Special Issue on: CMBH 18 Advances in Bioinformatics and Biotechnology towards Medicine and Health
Numerical Investigation of Human Knee Joint for understanding the influence of Anterior Cruciate Ligament on the displacement and stress
by Bharath K. Bhat, Raviraja Adhikari, Kiran Kumar V Acharya
Abstract: Purpose: This study on the healthy knee aims to understand the response of the human knee joint during normal loading. Method: Magnetic Resonance Images (MRI) of five healthy human knee joints were considered for this study. 3-Dimensional (3D) models of these healthy human knee joints were generated. Finite element analysis was undertaken on these models after their discretization. A load of 134 N, along with corresponding boundary conditions, was applied to the human knee joint. Linear elastic material properties were considered for all parts of the human knee joint in this study. All five subjects were also analyzed with a ruptured ACL under the same load and boundary conditions as before. Results: Among the five subjects, the maximum von Mises stress was observed in the expected regions of the Anterior Cruciate Ligament (ACL), with a mean stress of 13.934 MPa and a standard deviation of 5.1 MPa for the ACL. Conclusion: The higher displacement and reduced stress observed in the knee joint with a ruptured ACL, when compared with the healthy knee joint, indicate that the joint becomes more flexible in the absence of the ACL.
Keywords: ACL injury; Lachman Test Simulation; Finite Element Analysis; Human Knee Joint.
A computational analysis of cancer inhibitor ortho benzoyl vanillin
by Jaynthy CHELLAM, Usha S
Abstract: The oxidative stress caused by improper diet increases the formation of free radicals during metabolism, which results in cancer, a disease threatening human life globally. The scavenging of free radicals by small organic molecules appears to be a promising area for new approaches in the search for cancer drugs. In this respect, the title compound ortho benzoyl vanillin (OBV) was synthesized, and characterisation studies, a thermal study and powder XRD were carried out. The charge transfer mechanism and structure-activity relationship of OBV make it a potential candidate in the fight against cancer. OBV has been analyzed for its pharmacophores and its anticancer activity using computational methods. OBV showed a good dock score and can therefore be further analysed by other techniques to prove its potential as an inhibitor of cancer.
Keywords: Ortho Benzoyl Vanillin (OBV); Anticancer activity; Powder XRD; Charge transfer mechanism; Pharmacophore.
Identification of Malignancy in Lung Using Artificial Neural Network
by Lalithakumari S, Pandian R
Abstract: Earlier diagnosis of cancer cell growth can save many precious human lives. It is necessary to develop an automated tool to detect malignancy at the earliest stage. Many algorithms have been proposed by researchers in the past, but prediction accuracy remains a challenge. In this work, an artificial neural network based methodology is proposed to find abnormal growth of lung tissues. A higher probability of detection is taken as the objective in order to obtain an automated tool with high accuracy, since manual interpretation can lead to misdiagnosis. A full Computed Tomography image set of the lungs of sixty-five different humans, with normal and malignant health states, has been considered in this work. The three different views of the CT scanning system, namely axial, coronal and sagittal, have also been considered in creating the database. The distinct textural features of the images provide intra-class variation, which makes a neural network feasible for classifying the normal images and separating them from the malignant ones. Optimal feature sets are derived from the Haralick gray level co-occurrence matrix and used as a means of dimension reduction for feeding the neural network. In this work, a binary classifier neural network has been proposed to identify the normal images out of all the images. The capability of the proposed neural network has been quantified using the confusion matrix and expressed in terms of classification accuracy.
Keywords: GLCM; Haralick; Classification Accuracy; MSE; SSE and MSEREG.
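The textural features in the abstract above are derived from a gray level co-occurrence matrix (GLCM). As a rough illustration, the sketch below builds a normalized, non-symmetric GLCM for one pixel offset and computes the Haralick contrast feature, sum over i, j of (i - j)^2 * p(i, j). The offset of one pixel to the right, the number of gray levels and the toy image are assumptions of this example; the paper's actual feature set and preprocessing are not specified in the abstract.

```python
def glcm(image, levels, dr=0, dc=1):
    """Normalized co-occurrence matrix for pixel pairs separated by (dr, dc).

    image holds integer gray levels in range(levels); the matrix is not
    symmetrized, so p[i][j] counts level i followed by level j along the offset.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                counts[image[r][c]][image[nr][nc]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset yields no pixel pairs for this image size")
    return [[n / total for n in row] for row in counts]


def contrast(p):
    """Haralick contrast: large when neighbouring pixels differ strongly."""
    size = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(size) for j in range(size))


# Hypothetical toy image with two gray levels (0 and 1).
toy_image = [
    [0, 0, 1],
    [1, 1, 0],
]

p = glcm(toy_image, levels=2)
print(p)
print(contrast(p))
```

For the toy image, the four horizontal pairs are (0,0), (0,1), (1,1) and (1,0), so every GLCM entry is 0.25 and the contrast is 0.5. A vector of such features per image would then be the reduced-dimension input to a classifier.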
Bio-Inspired Approaches for classification of Benign and Malignant tumor of skin
by Aman Gautam, Usha Chouhan
Abstract: Skin cancers are malignancies that arise from the skin. They develop when cancerous cells spread to different parts of the skin. There are three primary types of skin cancer: BCC, SCC and melanoma. The first two are known as non-melanoma skin cancer (NMSC) or benign. More than 90% of skin cancers are caused by ultraviolet rays from the Sun. This exposure increases the risk of each of these types of skin cancer. Because of a thinner ozone layer, the exposure has increased. Another common source of ultraviolet radiation is tanning beds. The purpose of this paper is to classify benign and malignant tumors based on the dermoscopic image ISIC datasets using five machine learning methods (MLP, SVM, RF, KNN, and LR). Correctly classified cases were found to be 94%, 89.50%, 82.00% and 90.00% for MLP, SVM, RF, DT and LR respectively. The accuracy achieved is higher than that of current approaches.
Keywords: tumor images; Medical Decision Support System; Machine learning techniques.
Micro Electro Mechanical Systems in Nephrology
by Subhashini Radhakrishnan, Niranjani Ravi Chandran
Abstract: MEMS plays a major role in the clinical field, where it is used to create new devices related to both biology and medicine, and these devices save lives. MEMS can integrate several components into a single device. In nephrology, MEMS are used for hemodialysis. The kidney is one of the important organs of the body and is responsible for removing wastes. If the kidneys do not function properly, the result is CKD (Chronic Kidney Disease). CKD treatment can protect the patient from end-stage kidney failure, which is fatal without artificial filtering (hemodialysis) or a kidney transplant. In artificial filtering (hemodialysis), MEMS technology is used in vascular access. MEMS-based pressure sensors are implanted into a plastic tube or coil and attached to the arteries and veins of the arm. The access is then used to filter the blood of the kidney failure patient. The technology helps to protect kidney failure patients from death. MEMS are easy to integrate into systems or to modify. MEMS devices can use both mechanical and electrical components, and they reduce energy usage. Because of their size, they are not suited to any large power transformation. This review gives an overview of hemodialysis and of MEMS usage in nephrology.
Keywords: Micro Electro Mechanical Systems; nephrology; hemodialysis; CKD; pressure sensor; IRAD; ESRD; vascular access.
Special Issue on: Capsules in Medical Image Processing
Hybrid Kernel Fuzzy C-Means clustering segmentation algorithm for Content Based Medical Image Retrieval application
by Lakshmana B., Sunil Kumar S. Manvi, Karibasappa K.G.
Abstract: Nowadays, a huge number of therapeutic images, running into the millions, are generated by patients' daily medical activities. Retrieving these images from such a huge dataset is a challenging task; hence a Content Based Medical Image Retrieval (CBMIR) system is used. Retrieval systems have proposed clustering-based segmentation for diagnosing the query image, and a wide range of research has been carried out to build enhanced clustering algorithms. In this paper, Hybrid Bee Colony and Cuckoo Search (HBCCS) based centroid initialization for segmenting images with Kernel Fuzzy C-Means clustering (KFCM) is proposed, referred to as CBMIR-KFCM-HBCCS. For CBMIR segmentation, KFCM is considered a preferable method because of its performance in segmenting query images. However, the major drawback of conventional KFCM is the random initialization of centroids, which increases the execution time needed to reach the optimal solution. To speed up the segmentation procedure, HBCCS is used to initialize the centroids of the required clusters. The results were analyzed quantitatively using measures such as the number of iterations and the processing time. The CBMIR-KFCM-HBCCS technique requires fewer iterations and less processing time than conventional KFCM. The CBMIR-KFCM-HBCCS technique is more efficient and faster than existing KFCM for segmenting images, as assessed by specificity, sensitivity and accuracy. The proposed technique achieved nearly 96% accuracy compared to other existing techniques.
Keywords: Therapeutic Images; Clustering; Segmentation; Fuzzy C-Means; Bee Colony; Cuckoo Search.
Special Issue on: Parallel Computing Methodologies for E-Medicine
Intelligent Model for Diabetic Retinopathy Diagnosis: A Hybridized Approach
by Santosh Nagnath Randive, Amol D. Rahulkar, Ranjan K. Senapati
Abstract: As Diabetic Retinopathy (DR) is considered one of the most common infectious diseases in humans, more researchers are taking up this sensitive work in the health sector. Many contributions have already been proposed from various aspects, yet accurate DR detection remains an issue. This paper therefore introduces a novel DR detection model, which also reports the severity of retinopathy from the given input fundus image. The proposed model comprises stages such as Segmentation, Feature Extraction and Classification. An active contour model is used for segmentation, and GLCM and GLRM features are extracted during the feature extraction process. Since the feature vector is too long, it is necessary to choose the significant features, and selecting them is a challenging task. A Neural Network (NN) classifier is used for classification. As the main contribution, the extracted features (feature selection) and the weights of the NN model are optimally chosen by a new hybridized algorithm. The proposed Whale with Particle Swarm Optimization, termed WP, is compared with conventional methods such as Levenberg-Marquardt-Neural Network (LNN), Gradient Descent-Neural Network (GNN), Firefly-Neural Network (FNN), Particle Swarm Optimization-Neural Network (PNN), Grey Wolf Optimization-Neural Network (GWNN), Self Adaptive Grey Wolf Optimization-Neural Network (SGWNN) and Whale Optimization-Neural Network (WNN) in terms of positive and negative measures. The DR detection model is implemented in MATLAB 2017a. The DIARETDBI database with 88 iris images is used for experimentation. The positive measures are Accuracy, Specificity, Sensitivity, Precision, Negative Predictive Value (NPV), F1-Score and Matthews Correlation Coefficient (MCC). Similarly, the negative measures are False Positive Rate (FPR), False Negative Rate (FNR) and False Discovery Rate (FDR), and the superiority of the proposed model is proven.
Keywords: DR diagnosis; Feature Extraction; Classification; Weight Optimization; WP-Hybrid model.
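The positive and negative measures listed in this abstract all derive from the four counts of a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' MATLAB code; the function name `confusion_measures` and the example counts are hypothetical.

```python
import math


def confusion_measures(tp, tn, fp, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    Assumes every denominator is non-zero; the counts passed in below are
    illustrative and do not come from the paper.
    """
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        # Positive measures (higher is better)
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "npv": tn / (tn + fn),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "mcc": (tp * tn - fp * fn) / mcc_den,
        # Negative measures (lower is better)
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
        "fdr": fp / (fp + tp),
    }


measures = confusion_measures(tp=40, tn=45, fp=5, fn=10)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Each negative measure is the complement of a positive one (FPR = 1 - specificity, FNR = 1 - sensitivity, FDR = 1 - precision), so the two groups carry the same information viewed from opposite sides.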
An Empirical study of the Big Data classification Methodologies
by MUJEEB S.MD., PRAVEEN S.A.M. R., MADHAVI K.
Abstract: Big data and cloud computing are two rapidly emerging technologies. Cloud computing is a novel paradigm for providing the computing environment, whereas big data processing technology is convenient for most resource types. A productive cloud-based methodology must now be devised for the effective management of big data. This survey presents the distinct cloud-based classification and clustering approaches adopted for effective big data classification. The article reviews 40 research papers in the field of big data classification methodologies, such as Fuzzy classifier, Bayesian model, Support Vector Machine (SVM) classifier, K-means clustering, Collaborative filtering based clustering and so on. Moreover, a detailed analysis and discussion are presented concerning the employed methodology, evaluation metrics, accuracy range, adopted framework, datasets utilized and the implementation tool. Finally, the research gaps and issues of various conventional cloud-based big data classification schemes are presented to guide researchers towards better contributions to big data management.
Keywords: Big data; cloud computing; classification; clustering; Fuzzy; accuracy.
An exhaustive study on the Lung Cancer risk models
by MALAYIL SHANID, ANITHA A.
Abstract: Lung cancer is one of the critical cancers driving a rising mortality rate. Computed Tomography (CT) is the most widely adopted technique for effective cancer detection and risk assessment. CT images suffer from varying pulmonary nodule volume and location and from low contrast. Distinct schemes have been modeled to overcome these issues, but the developed schemes could not provide automated detection and description of lung cancer. The mortality rate and the need for invasive surgery can be reduced only through risk assessment of cancer at the earlier stages. Hence, an effective lung cancer detection technique must be modeled for the risk assessment of cancer at the earlier stages. This review article presents a detailed survey of 40 research works on existing lung cancer detection methodologies, such as Neural Network (NN) based detection, Support Vector Machine (SVM) based detection, K-Nearest Neighbor (K-NN) based detection, Boosting algorithm, Bayesian model and so on. In addition, extensive analysis and discussion are presented with respect to the publication year, adopted detection schemes, evaluation metrics, utilized datasets, simulation tool, accuracy range, and extracted features. Subsequently, the research gaps and issues of the distinct lung cancer detection schemes are elucidated to direct research towards better contributions to effective cancer risk assessment.
Keywords: Lung cancer; CT; SVM; accuracy; detection.
Diagnosis of Abdominal Mass in Ultrasound Images using Linear Collaborative Discriminant Regression Classification
by Shivshankar Sambhajirao Kore, Ankush B. Kadam
Abstract: An abdominal ultrasound image is a practical way of checking internal organs, including the kidneys, gallbladder, liver and spleen. In general, unprocessed ultrasound images contain a lot of embedded noise. Hence, it is difficult to obtain a clear view of an affected region. This paper intends to develop an advanced model for diagnosing abdominal masses using ultrasound (US) images. The detection technique is accomplished in two stages: Feature Extraction and Classification. During the feature extraction process, texture features are extracted from the US image by the Adaptive Gradient Location and Orientation Histogram (AGLOH). Later, in the classification stage, the Linear Collaborative Discriminant Regression Classification (LCDRC) model is used to classify whether the image is normal or abnormal. In general, LDRC obtains low-dimensional properties assisted by the classification norms of Linear Regression Classification (LRC). The main feature of the collaborative version is its capacity to discover the discriminant subspace. In addition, the classification error produced by the collaborative representation is lower than the error produced by the single-class representation. Therefore, improved diagnostic precision is achieved when identifying masses in the regions of the abdomen. The features of the proposed AGLOH method are compared with conventional techniques such as the Grey-level co-occurrence matrix (GLCM) and GLOH. Further, the classifier of the proposed LCDRC method is compared with conventional techniques such as SVM and NN, which validates the effectiveness of the proposed method.
Keywords: Abdominal Mass; Ultrasonic image; Adaptive Gradient Location and Orientation Histogram; Linear collaborative discriminant regression classification; Performance measures.
Special Issue on: Soft Computing and Optimisation Techniques for Biomedical Data Mining and Analysis
Recursive Subspace Based Feature Selection Approach for early risk prediction of Chronic Disease in Patients
by Sandeepkumar Hegde, Monica R.Mundada
Abstract: These days chronic disease is considered a threat to human health. These diseases persist for a long period of time and are responsible for over 70% of death and disability all over the world. Diabetes mellitus is one such chronic disease, and it is challenging for both developed and developing nations. It is anticipated that 10% of the world's population will suffer from diabetes by 2045. It has become a leading cause of death. Healthcare information is multidimensional in nature. Feature selection is considered part of the preprocessing stage and is applied to high-dimensional data in order to reduce it to the features that influence the prediction of the disease. In this paper, a novel recursive subspace based feature selection (RSFS) algorithm is proposed. The feature subspace is obtained recursively by computing the covariance matrix and eigenvalue pairs. The process of selecting features from the given data set is repeated until the machine learning model achieves early prediction of diabetes mellitus with optimal accuracy. The experiment is conducted using a diabetes data set obtained from the National Institute of Diabetes. Experimental results are compared with existing techniques. The outcome demonstrates an accuracy of 88.5%, which is higher than that of the existing approaches.
Keywords: Feature subspace; Covariance; Eigen value; machine learning; chronic disease.
Nearest Neighbor Based Feature Selection and Classification approach for analyzing sentiments
by Rajalaxmi Hegde, S. Seema
Abstract: Sentiment analysis is considered one of the most important areas of research. The aim of this paper is to select features and classify data as positive or negative. The objective of the work is to analyze sentiment and perform classification. The main aim is to perform feature selection, use a neighbor-based classifier, and tune the hyperparameters to their optimal values in order to measure the model accuracy and improve the performance. The proposed method performs feature selection using a nearest neighbor based approach, in which the distance metrics and the cosine similarity are first calculated on the preprocessed data. A term weighting mechanism is used to identify the weight of each term in the document. Experiments are conducted using several feature vectorization methods, and the method produces better results.
Keywords: Accuracy; classification; Feature Selection; Preprocessing; Vectors.
ANN Model for Detection and Classification of Sleep and Non-Sleep stages
by Bollampally Anupama, Somayajulu Laxmi Narayana, K.S. Rao
Abstract: Electroencephalogram (EEG) is the most prominent tool used in sleep-related research; disorders related to sleep have become one of the prime issues in human life. This work proposes an efficient approach to discriminate the sleep stage from the non-sleep stage, or wakefulness, by analysing EEG signals from the frontal lobes. A second order FIR filter is designed to segregate Delta and Theta waves from the EEG. The Empirical Mode Decomposition technique is adopted to extract distinct features such as Kurtosis, MAD (Median Absolute Deviation) and IQR (Interquartile Range). An artificial neural network model trained by the feed forward back propagation algorithm is adopted to classify the sleep and non-sleep classes. The amplitude and frequency values of the sub-bands of electroencephalogram signals vary in the sleep stage compared to the non-sleep stage. The experimental results show that the extracted features are good discriminators of the sleep and non-sleep stages, with an accuracy of 92%, sensitivity of 100% and specificity of 98%.
Keywords: Empirical mode decomposition; Sleep disorders; Electroencephalogram; Artificial neural network.
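The three statistical features named in this abstract (kurtosis, median absolute deviation and interquartile range) can be computed on any signal segment as sketched below. This is only an illustration under stated assumptions: the FIR filtering and Empirical Mode Decomposition steps are omitted, the `segment` values are made up, kurtosis is taken as the non-excess population form, and quartiles use the inclusive interpolation method, none of which the abstract specifies.

```python
import statistics


def kurtosis(values):
    """Non-excess population kurtosis: fourth central moment / variance squared."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2)


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def interquartile_range(values):
    """Difference between the third and first quartiles (inclusive method)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical segment standing in for one decomposed EEG component.
segment = [1, 2, 3, 4, 5, 6, 7, 8, 9]
features = [
    kurtosis(segment),
    median_absolute_deviation(segment),
    interquartile_range(segment),
]
print(features)
```

A feature vector of this kind, computed per segment, is what would be passed to a classifier such as the neural network described in the abstract.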
Privacy preserving Reversible watermarking in the encrypted domain through self-blinding
by Jeeva K.A, Sheeba V.S
Abstract: This paper presents robust image-based reversible watermarking in the encrypted domain. Two algorithms with a high embedding rate are proposed for embedding data in a homomorphic encrypted domain using the Paillier encryption scheme. Both algorithms exploit the self-blinding property of the Paillier scheme to achieve flexibility in extraction. Using these algorithms, blind and error-free watermark extraction is possible in both the plaintext domain and the encrypted domain. The robustness of the algorithms has been validated by considering various noise attacks in the encrypted domain. The proposed methods outperform their predecessors on the same embedding platform, either in the flexibility of the extraction process or in the embedding capacity, and find applications in privacy-preserving distributed signal processing, which is a major requirement in cloud environments.
Keywords: Privacy; Encryption; Encrypted DCT; Homomorphism; Paillier encryption; Reversible watermarking; Signal processing in the encrypted domain; Self-blinding.
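The self-blinding property of the Paillier scheme mentioned in this abstract means that multiplying a ciphertext by r^n mod n^2 changes the ciphertext without changing the plaintext it decrypts to. The sketch below is a toy textbook Paillier implementation that illustrates only this property; it is not the watermarking algorithms of the paper, the primes 61 and 53 are deliberately tiny and insecure, and all function names are hypothetical.

```python
import math
import secrets


def keygen(p, q):
    """Textbook Paillier key generation with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return (n, n + 1), (lam, mu)


def random_unit(n):
    """Random r in [1, n) with gcd(r, n) = 1."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            return r


def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    return (pow(g, m, n2) * pow(random_unit(n), n, n2)) % n2


def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    l_value = (pow(c, lam, n * n) - 1) // n  # L(x) = (x - 1) / n
    return (l_value * mu) % n


def self_blind(pub, c):
    """Re-randomize a ciphertext; the underlying plaintext is unchanged."""
    n, _ = pub
    n2 = n * n
    return (c * pow(random_unit(n), n, n2)) % n2


pub, priv = keygen(61, 53)  # toy primes, for illustration only
message = 1234
cipher = encrypt(pub, message)
blinded = self_blind(pub, cipher)
print(decrypt(pub, priv, cipher), decrypt(pub, priv, blinded))
```

Both decryptions return the original message, while `blinded` will almost always differ from `cipher` as an integer.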
Optimization of sub-space clustering in a high dimension data using Laplacian graph and machine learning
by Ambika P.R, Bharathi Malakreddy A
Abstract: There are many applications, such as business analytics, computer vision and medical data analytics, where an unsupervised learning approach is used for clustering high-dimensional data. The subspace clustering problem is modelled as a graph problem which has to retain the critical features from the N dimensions while applying a dimension reduction technique, so as to maintain a trade-off between higher accuracy and lower computational overhead. Most traditional approaches suffer from efficiency degradation when applied to high-dimensional data. This paper proposes an optimization of sub-space clustering for learning models using a Laplacian graph on high-dimensional data of inheritable factor and metamorphosis. The proposed model addresses the curse of dimensionality through a Laplacian matrix function which eliminates the diagonal components or elements from the input data matrix to minimize data redundancy within the sub-space. The traditional KNN algorithm is improvised for the non-linear classification of subspace clustering on high-dimensional data of clinical importance. The study outcome shows that the proposed system offers a significant increment of 99% accuracy in the clustering operation.
Keywords: Subspace clustering; dimension reduction; graph partition problem; high dimension data; Laplacian graph.
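For readers unfamiliar with the graph Laplacian on which this approach builds, the following sketch constructs the standard unnormalized Laplacian L = D - A from an adjacency (similarity) matrix. The abstract does not state which Laplacian variant or similarity graph the authors use, so this is the common textbook form, and the three-node path graph is a made-up example.

```python
def graph_laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A for a symmetric adjacency matrix.

    D is the diagonal degree matrix; self-loops on the diagonal of A are ignored.
    """
    n = len(adjacency)
    laplacian = [[0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(adjacency[i][j] for j in range(n) if j != i)
        for j in range(n):
            laplacian[i][j] = degree if i == j else -adjacency[i][j]
    return laplacian


# Hypothetical graph: a path 0 - 1 - 2.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
laplacian = graph_laplacian(adjacency)
for row in laplacian:
    print(row)
```

Every row of L sums to zero, and in spectral clustering the eigenvectors belonging to its smallest eigenvalues provide the low-dimensional embedding in which points are grouped.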
Artificial Neural Network Model for Detection and Classification of Alcoholic patterns in EEG
by Bollampally Anupama, Somayajulu Laxmi Narayana, K.S. Rao
Abstract: Alcoholism influences brain function and is one of the major causes of cognitive, emotional and behavioural impairments. This work investigates alcoholic and normal states by analysing EEG signals recorded from the frontal lobes of the brain. Delta and Theta waves are the primary signals that vary under the effect of alcohol. A second order FIR filter is designed to segregate Delta and Theta waves from the EEG data. The Empirical Mode Decomposition technique is adopted to extract distinct features such as kurtosis, median absolute deviation and interquartile range. These features are given as input to an Artificial Neural Network for classification. After consumption of alcohol, the amplitude and frequency corresponding to delta and theta waves, which are responsible for inactive and sleep states of the brain, are found to be very low in comparison to a normal state. The results indicate that the extracted features are good discriminators of alcoholic and normal subjects, with an accuracy of 92%, sensitivity of 100% and specificity of 83%.
Keywords: EEG; Empirical Mode Decomposition; Intrinsic Mode Functions (IMF); Artificial Neural Network (ANN); Back Propagation Algorithm; Hilbert Transform.
Configuring Artificial Neural Network Using Optimization Techniques For Speaker Voice Recognition
by Namburi Dhana Lakshmi, M. Satya Sai Ram
Abstract: This work proposes speaker recognition, which has a wide variety of applications, using an ANN and an optimization technique. As a preliminary process, MFCC and LPC coefficients are used to extract features from the voice signal. These features are applied to an Artificial Neural Network (ANN) to recognize the speaker. This research focuses on configuring the conventional ANN structure, from zero hidden layers to multiple hidden layers. Neural network based speaker recognition generally achieves a better recognition rate, and the accuracy can be improved by using a large training data set or by increasing the number of hidden layers. Over-fitting and under-fitting problems can be addressed by optimizing the number of hidden layers. A genetic algorithm (GA) is applied to optimize the hidden layers and the number of neurons for performance enhancement. The results reveal that the GA-configured ANN achieves 98% accuracy, which is superior to the conventional ANN. This research uses two databases: a standard TIMIT database of 980 voice signals and a dataset of 200 real-time voice signals.
Keywords: Speaker Recognition; Mel-Frequency Cepstral Coefficient (MFCC); Linear Prediction-filter Coefficients (LPC); Artificial Neural Network (ANN); Genetic Algorithm.
Gestational Age Determination of Ultrasound Foetal Images Using Artificial Neural Network
by Jujjavarapu Sunitha Kumari, Usha Rani Nelakuditi
Abstract: Gestational age estimation is an important task in assessing the high risks that exist during pregnancy. Ultrasound fetal images provide valuable information for a better understanding of the developmental stages. To accomplish this, suitable biometric parameters are monitored manually, and the accuracy of this manual determination relies on the skill of the sonographer and on image quality. Such manual parametric determination is subject to multiple decisions, resulting in observational errors. Thus, the main aim of the proposed approach is to evaluate the biometric parameters automatically for determining gestational age based fetal growth using ultrasound images. Fetal images undergo adaptive histogram equalization for enhancement, followed by Normal shrink homomorphic filtering and a Canny edge detection based segmentation process to extract the significant features with enhanced quality. The desired parameters are recognized by applying PBT classification and are then provided to the adaptive ANN for evaluating the status of the fetus in terms of gestational age. This integrated neural network accurately measures the growth of the fetus with minimized error and shorter time, and no significant difference exists between the estimated and the actual values.
Keywords: Ultrasound fetal images; Image enhancement; Edge detection; Normal shrink homomorphic technique; PBT classification; Artificial neural network.
Classification of Breast Cancer Images Using Completed Local Ternary Pattern and Support Vector Machine
by M. Kusuma Sri, E. Gomathi
Abstract: Breast cancer is the major cause of death in women compared to other cancers. Though early detection of breast cancer reduces cancer deaths, it is a challenging task for physicians. Local Binary Pattern (LBP) and Local Ternary Pattern (LTP) techniques are widely applied in texture classification applications. Since LBP is more sensitive to noise in texture classification, it needs to be improved to achieve better results. Though LTP is more robust to noise, it has a few drawbacks. Completed LBP and Completed Local Binary Count techniques achieve good accuracy for texture classification, but they inherit a few drawbacks of LBP. In this paper, a completed LTP operator is applied to breast cancer images, extracting sign and magnitude components, to obtain better classification accuracy than the LBP and Completed LBP operators. Experimental results on a breast cancer database show that the proposed technique achieves better classification accuracy than existing similar approaches.
Keywords: Breast cancer image; Image Segmentation; Texture classification; Machine learning; Decision Tree; Logistic Regression; Support Vector Machine.
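As background to the operators compared in this abstract, the sketch below computes a basic Local Ternary Pattern for one 3x3 patch: each neighbor is coded +1, 0 or -1 against the center pixel with a tolerance t, and the ternary code is split into an "upper" and a "lower" binary pattern. It covers only plain LTP, not the completed LTP operator with sign and magnitude components that the paper proposes; the bit ordering (clockwise from the top-left neighbor), the threshold and the pixel values are assumptions made for illustration.

```python
# Neighbor coordinates, clockwise from the top-left pixel (assumed ordering).
NEIGHBORS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def ltp_codes(patch, t):
    """Return (upper, lower) binary pattern codes for a 3x3 patch.

    A neighbor p maps to +1 if p >= center + t, to -1 if p <= center - t,
    and to 0 otherwise. Bits of `upper` mark the +1 positions and bits of
    `lower` mark the -1 positions.
    """
    center = patch[1][1]
    upper = 0
    lower = 0
    for bit, (row, col) in enumerate(NEIGHBORS):
        diff = patch[row][col] - center
        if diff >= t:
            upper |= 1 << bit
        elif diff <= -t:
            lower |= 1 << bit
    return upper, lower


# Hypothetical grey-level patch with center value 50.
patch = [
    [54, 57, 45],
    [50, 50, 60],
    [42, 48, 56],
]
codes = ltp_codes(patch, t=5)
print(codes)
```

Setting t = 0 makes the upper pattern coincide with the ordinary LBP code under the same bit ordering; the tolerance band is what makes LTP less sensitive to small intensity noise than LBP.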