International Journal of Intelligent Information and Database Systems (22 papers in press)
Projecting Dependency Syntax Labels From English into Vietnamese in English-Vietnamese Bilingual Corpus
by Phuoc Tran, Van-Deo Duong, Dinh Dien, Bay Vo, Long Nguyen, Huu Nguyen
Abstract: In natural language processing, corpora play an important role, especially labeled corpora, including part-of-speech labeled corpora, component syntax labeled corpora, dependency syntax labeled corpora, and so on. These labeled corpora are used for corpus-based research and give higher-quality results in supervised learning methods than non-labeled corpora. In this article, we tag Vietnamese dependency labels based on an English-Vietnamese bilingual corpus in which the English side has been tagged with dependency labels. The experimental results show that our method achieves high tagging quality, with an LAS of 73.5% and a UAS of 81.7%.
Keywords: Natural language processing; Projecting dependency syntax; English-Vietnamese bilingual corpus.
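For readers unfamiliar with the two scores quoted in the abstract above, the following is a minimal sketch of how UAS and LAS are conventionally computed. It assumes each token is represented as a (head index, dependency label) pair; the function name and the toy data are illustrative only and are not taken from the paper.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as fractions over a list of tokens.

    Each token is a (head_index, dependency_label) pair.
    UAS counts tokens whose head is correct; LAS additionally
    requires the dependency label to be correct.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    return head_hits / len(gold), full_hits / len(gold)


# Hypothetical four-token sentence: the third token has the wrong label,
# the fourth token has the wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "amod")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

Because LAS requires both the head and the label to match, it can never exceed UAS, which is consistent with the 73.5% and 81.7% figures reported.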
Hierarchical Clustering on Metric Lattice
by Xiangyan Meng, Muyan Liu, Jingyi Wu, Huiqiu Zhou, Fei Xu, Qiufeng Wu
Abstract: This work proposes a new clustering algorithm named Fuzzy Interval Number Hierarchical Clustering (FINHC). It first converts the original data into fuzzy interval numbers (FINs), then proves that the collection of FINs is a lattice, introduces a novel metric distance based on results from lattice theory, and combines them with hierarchical clustering. The relevant mathematical background on lattice theory and the specific algorithm used to construct FINs are presented in this paper. Three evaluation indexes, namely compactness, recall and F1-measure, are applied to evaluate the performance of FINHC, Hierarchical Clustering (HC), k-means, k-medoids, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) in six experiments using UCI public datasets and one experiment using a KEEL public dataset. The FINHC algorithm shows better clustering performance than the other traditional clustering algorithms, and the results are discussed in detail.
Keywords: FIN; fuzzy interval number; hierarchical clustering; metric lattice; public datasets; compactness; recall; F1-measure.
Quality Materialized View Selection using Quantum Inspired Artificial Bee Colony Optimization
by Biri Arun
Abstract: The availability of huge volumes of digital data and powerful computers has facilitated the extraction of information, knowledge and wisdom for Decision Support Systems. The value of information depends solely on data quality. A data warehouse provides quality data and is required to respond to queries within seconds. But because data warehouses grow steadily, the query response time is generally in hours and weeks. A materialized view is an efficient approach to facilitate timely extraction of information and knowledge for strategic business decision making. Selecting an optimal set of views for materialization, referred to as view selection, is an NP-complete problem. In this paper, a Quantum Inspired Artificial Bee Colony Algorithm is proposed to address the view selection problem. Experimental results show that the proposed algorithm significantly outperforms the fundamental algorithm for view selection, HRUA, and other view selection algorithms such as ABC, MBO, HBMO, BCOc, BCOi and BBMO.
Keywords: Artificial Bee Colony Optimization; Quantum Computing; Decision Support System; Data Warehouse; Materialized Views.
SmartGuard: An IoT based Intrusion Detection System for Smart Homes
by Nishtha Kesswani, Basant Agarwal
Abstract: With the help of sensor-enabled devices, we are able to achieve many tasks without human intervention. The sensor-enabled Internet of Things has made our lives simpler. But at the same time, security is an important issue that needs to be addressed. In this paper, we present an intrusion detection system, SmartGuard, that can be deployed in the modern world's Smart Home. The proposed system would be able to detect malicious behavior within the network as well as any malicious communications from outside.
Keywords: Intrusion detection; Smart Home; Internet of Things.
Random forest based active learning for content based image retrieval
by Nilesh Bhosle, Manesh Kokare
Abstract: The classification-based relevance feedback approach is considered an emerging way to overcome the semantic gap problem in content-based image retrieval systems. However, such an approach suffers from the problem of an imbalanced training dataset, which causes instability and degradation in the retrieval results. To tackle this problem, a novel active learning approach based on a random forest classifier and a feature reweighting technique is proposed in this paper. Initially, a random forest classifier is used to learn the user's retrieval intention. Each binary decision tree in the forest votes for the input images, and the final class is decided by the majority of votes. Then, in active learning, the most informative classified samples are selected for manual labelling and added to the training dataset for retraining the classifier. Also, a feature reweighting technique based on Hebbian learning is embedded in the retrieval loop to find the weights of the most perceptive features used for image representation. These techniques are combined to form a hypothesized solution for the image retrieval problem. The experimental evaluation of the proposed system is carried out on two different databases and shows a noteworthy enhancement in retrieval results. An average precision of 87% has been achieved in 4 iterations of relevance feedback.
Keywords: Content based image retrieval; Relevance feedback; Random forest learning; Active learning; Semantic gap; Feature reweighting; Information retrieval.
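The voting and sample-selection steps described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how "most informative" is measured, so the sketch assumes it means the images whose tree votes are closest to an even split. The function names, the "relevant"/"irrelevant" labels and the vote table are hypothetical.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label that received the most tree votes."""
    return Counter(votes).most_common(1)[0][0]


def most_uncertain(vote_table, k):
    """Pick the k images whose votes are closest to a 50/50 split.

    Assumption: vote disagreement is used as the informativeness
    criterion; the paper may use a different one.
    """
    def margin(votes):
        relevant_share = sum(v == "relevant" for v in votes) / len(votes)
        return abs(relevant_share - 0.5)

    return sorted(vote_table, key=lambda image_id: margin(vote_table[image_id]))[:k]


# Hypothetical votes from a five-tree forest for three retrieved images.
vote_table = {
    "img_a": ["relevant"] * 5,
    "img_b": ["relevant", "relevant", "relevant", "irrelevant", "irrelevant"],
    "img_c": ["irrelevant"] * 4 + ["relevant"],
}
labels = {image_id: majority_vote(votes) for image_id, votes in vote_table.items()}
to_label = most_uncertain(vote_table, 1)
print(labels, to_label)
```

In a relevance feedback loop, the images returned by `most_uncertain` would be shown to the user for manual labelling and then added to the training set before the forest is retrained.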
Meta-analysis of computational methods for breast cancer classification
by Tri-Cong Pham, Chi-Mai Luong, Antoine Doucet, Van-Dung Hoang, Diem-Phuc Tran, Duc-Hau Le
Abstract: Millions of women suffer from breast cancer, which places a heavy burden on them and on the global economy. Meanwhile, general treatment methods are applied without considering personalized health and genetic features. Artificial intelligence appears to be a robust method for breast cancer subtyping. Most research has been conducted on binary classification with a limited number of data samples. Multi-class classification is much more difficult, especially with a large number of samples. The study aims to use machine learning to find better ways to subtype breast cancer and to find new disease causative genes, which may facilitate more personalized treatment with limited side effects in the future. This study compares the accuracy of three classification methods in combination with eight feature selection methods on a dataset of 2,682 samples. The study shows that the highest accuracy was 83.96%, obtained with the SVM-C005 classifier and percentile feature selection (800 genes). Additionally, our method can predict causative disease genes of breast cancer: four of them are known to be associated with breast cancer and 29 are promising candidates with supporting evidence from the literature. This shows the effectiveness of our research.
Keywords: Breast cancer; Gene expression; Multiclass classification; Feature selection; Gene selection; Microarray data.
Building Natural Language Responses from Natural Language Questions in the Spatio-Temporal Context
by Ghada Landoulsi, Khaoula Mahmoudi, Sami Faïz
Abstract: With the evolving research in geographic information systems (GIS), owing to their ability to support decision makers in different fields, there is a strong need to enable all users, specialists and non-specialists alike, to profit from this technology. However, the key impediment for non-specialists is the language needed to interact with the GIS and especially with its embedded Geographic Database (GDB), which requires SQL skills. In this paper we explore a new approach which relieves nomad GIS users of any formatting effort by using only natural language as a means of communicating with the GDB. The process is generally twofold: (1) formatting the natural language user query to be processed by the GDB engine, and (2) translating the answer retrieved from the GDB into a text easily interpreted by all GIS users. The resulting system was integrated into the OpenJump GIS and has been evaluated to give satisfactory results.
Keywords: Spatio-temporal data; Geographic Databases; Question Answering Systems; Structured Query Language; Natural Language Generation.
Special Issue on: SUSCOM-2019 Advancements in Computational Intelligence and Intelligent Database Design
Optimal Bag-of-Features using Random Salp Swarm Algorithm for Histopathological Image Analysis
by VENUBABU RACHAPUDI, Golagani Lavanya Devi
Abstract: Histopathological image classification is a prominent part of medical image classification. However, the classification of such images is a challenging task due to the presence of several morphological structures in the tissue images. Tissue images are categorized into four classes, namely epithelium, connective, muscular, and nervous. These images have a wide variety of morphological structures. Therefore, the classification of these images is a grueling task and has been an active research area. Recently, the bag-of-features method has been used for image classification tasks. However, the bag-of-features method uses the K-means algorithm to cluster the features, which is sensitive to the initial cluster centers and often gets trapped in local optima. Therefore, in this work, an efficient bag-of-features histopathological image classification method is presented using a novel variant of the salp swarm algorithm termed the random salp swarm algorithm. The efficiency of the proposed variant has been validated against 20 benchmark functions. Further, the performance of the proposed method has been studied on the blue histology image dataset, and the results are compared with 5 other state-of-the-art meta-heuristic based bag-of-features methods in terms of four parameters, namely precision, recall, accuracy and F-measure. The experimental results demonstrate that the proposed method surpasses the other considered methods with an increase of 11% in accuracy.
Keywords: Histopathological image classification; Salp Swarm Algorithm; Bag-of-features.
Machine Learning Based Book Recommender System: A Survey and New Perspectives
by Khalid Anwar, Jamshed Siddiqui, Shahab Saquib Sohail
Abstract: The exponential growth of recommender systems research has drawn the attention of the scientific community recently. These systems are very useful in reducing information overload and providing users with the items they need. The major areas where recommender systems have contributed significantly include e-commerce, online auctions, and book and conference recommendation for academia and industry. Book recommender systems suggest books of interest to users according to their preferences and requirements. In this article, we survey machine learning techniques which have been used in book recommender systems. Moreover, the evaluation metrics applied to evaluate recommendation techniques are also studied. Six categories of book recommendation techniques have been identified and discussed, which would enable the scientific community to lay a foundation for research in the concerned field. We also propose future perspectives to improve recommender systems. We hope that researchers exploring recommendation technology in general and book recommendation in particular will find this work highly beneficial.
Keywords: Book Recommender System; Machine Learning; Classification; Association Rule Mining; Evaluation Metrics.
A Conceptual Comparison of Metaheuristic Algorithms and Applications to Engineering Design Problems
by Kamalinder Kaur Kaleka, Avneet Kaur, Vijay Kumar
Abstract: This paper presents a conceptual comparison among the Spotted hyena optimizer, Grey wolf optimizer, Particle swarm optimization, Ant colony optimization, Gravitational search algorithm, Bat algorithm, Moth flame optimization, and Whale optimization algorithm. The behavior of these algorithms is mathematically modeled to show the optimization process. Twenty-three benchmark test functions are used to validate the performance of these algorithms. The exploration and exploitation of these algorithms are analyzed using convergence curves. The experimental results show that the Spotted hyena optimizer and Grey wolf optimizer give optimal solutions compared to the other algorithms. Furthermore, these algorithms are tested on five constrained engineering design problems. Experimental results reveal the applicability of these algorithms to real-life engineering design problems.
Keywords: Metaheuristic; Spotted hyena optimizer; Gravitational search algorithm; Whale optimization algorithm; Moth flame optimization.
Fuzzy Based Approach to Incorporate Spatial Constraints in Possibilistic c-Means Algorithm for Remotely Sensed Imagery
by ABHISHEK SINGH, Anil Kumar
Abstract: This paper presents a robust Possibilistic c-Means with constraints (PCM-S) algorithm, applied in a supervised way to remotely sensed imagery. PCM-S overcomes the disadvantages of PCM by incorporating local information through spatial constraints to control the effect of neighboring terms. The spatial constraints are added in order to provide robustness to noise and outliers. Neighborhood labelling is done in PCM-S by introducing a local window (N_R) and a regularizer parameter (?_i). Experiments have been conducted on Formosat-2 satellite imagery of the Haridwar area, in which the classified results of PCM and PCM-S are optimized using the Mean Membership Difference (MMD) method and the performance of the classifiers is analysed using the Root Mean Square Error (RMSE) method. Experiments performed on an image with 1% salt-and-pepper noise and on the original image show that the PCM-S classifier is effective in minimizing noisy pixels and produces a lower RMSE than PCM.
Keywords: Possibilistic c-Means (PCM); Possibilistic c-Means with constraints (PCM-S); Regularization Parameter; Mean Membership Difference (MMD); Root Mean Square Error (RMSE).
False-positive free transparent and optimal watermarking for colour images
by Neha Singh, Sandeep Joshi, Shilpi Birla
Abstract: The use of image watermarking as a tool for protecting ownership of media has been successful. Embedding capacity, imperceptibility and robustness are three requirements of any watermarking technique. Singular Value Decomposition (SVD) and Discrete Wavelet Transforms (DWT) have been widely used in this field. This paper presents a robust, blind watermarking technique for colour images based on SVD of DWT coefficients. The colour (RGB) image is first converted into the Hue, Saturation and Value (HSV) model to segregate chromaticity information. The value plane undergoes 2-level DWT to represent the data in four parts. The horizontal (HL) and vertical (LH) sub-bands are used for embedding. These sub-bands are divided into non-overlapping blocks of size 4 x 4. For each block, SVD is performed and the singular values (SV) are updated based on the watermark bit and using Lagrange's optimization principle. Two keys are used during the embedding process. One of the keys is used to distribute the watermark into two parts to be embedded in the two sub-bands. The other key is used as the quantization step size during optimization of the SV for watermark embedding. The inverse of the embedding technique is used to extract the watermark. Experiments show that the proposed technique is imperceptible, as it offers PSNR > 40 dB. Also, the technique is able to resist general image processing operations (attacks) on the images, with a Structural Similarity Index Measure (SSIM) of nearly 1 and a sufficiently high Bit Correct Ratio (BCR).
Keywords: Image Watermarking; Image processing; Singular Value Decomposition; Discrete Wavelet Transform.
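The imperceptibility criterion quoted in the abstract above (PSNR > 40 dB) refers to the standard peak signal-to-noise ratio. A minimal sketch of that calculation follows, assuming 8-bit pixel values given as flat sequences; the function name and the sample pixel values are illustrative and do not come from the paper.

```python
import math


def psnr(original, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(distorted) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the host and the watermarked pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical host pixels and a watermarked copy that differs by at most 1.
host = [52, 55, 61, 66, 70, 61, 64, 73]
marked = [53, 55, 60, 66, 71, 61, 64, 72]
value = psnr(host, marked)
print(f"PSNR = {value:.2f} dB")
```

Small per-pixel changes keep the mean squared error low, which is why embedding schemes that only nudge singular values can stay above the 40 dB mark.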
A New Software Development Paradigm for Intelligent Information Systems
by Pooja Dehraj, Arun Sharma
Abstract: The continuous growth in software management cost requires the development of self-managed software systems. With the self-management property, a system takes intelligent decisions to keep itself working properly. Autonomic computing is the technique used to develop such systems. Autonomic computing systems are highly reliable software systems. To enhance the quality of software systems, implementing an autonomic computing-based software development life cycle process may be a novel idea. It involves autonomous decision making by the autonomic component during the development of software. This approach reduces the complexity of the software development process. In addition, it serves the purpose of autonomic computing, which is to reduce software complexity and perform real-time exception handling. In this paper, the implementation of the autonomic advisor based software development process is proposed using the cloud computing technique. Cloud computing helps developers to develop software and applications using deliverable services such as platform, infrastructure, and software. During the implementation and usage of the autonomic advisor, the database becomes heavier. Therefore, to resolve such issues, cloud computing will be a beneficial step. Other benefits of such an autonomous software development life cycle process are discussed further in this paper.
Keywords: Autonomic SDLC; Map-Reduce Requirement; Autonomic Advisor; Knowledge Database.
Earth Movers Distance Based Undersampling Approach for Handling Class-Imbalanced Data
by Rekha Gillala
Abstract: An imbalanced data-set typically makes accurate prediction difficult. Most real-world data are imbalanced in nature. Traditional classifiers assume a well-balanced class distribution for training data, but practical data-sets often show an imbalance, which confuses a classifier and degrades its capability to learn from such imbalanced data-sets. Data pre-processing approaches address this concern by using either random under-sampling or oversampling techniques. In this paper, we introduce Earth Mover's Distance (EMD) as a similarity measure to find samples that are similar in nature and eliminate them from the dataset as redundant. Earth Mover's Distance has received a lot of attention in wide areas such as computer vision, image retrieval, machine learning, etc. The proposed Earth Mover's Distance based under-sampling approach provides a solution at the data level to eliminate the redundant instances in majority samples without any loss of valuable information. This method is implemented with five conventional classifiers and one ensemble technique, namely C4.5 Decision tree (DT), k-Nearest Neighbor (k-NN), Multilayer Perceptron (MLP), Support Vector Machine (SVM), Naive Bayes (NB) and the AdaBoost technique. The proposed method yields superior performance on 21 data-sets from the KEEL repository.
Keywords: Class Imbalance; Classification; Data Pre-processing; Sampling Technique; Earth Mover’s Distance.
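The idea of removing near-duplicate majority instances with an EMD-based similarity can be sketched as below. This is a simplified illustration, not the paper's procedure: it assumes each sample's feature vector is treated as an equal-weight one-dimensional empirical distribution (for which EMD reduces to the mean absolute difference of the sorted values), and it uses a greedy pass with a hypothetical `threshold` parameter.

```python
def emd_1d(a, b):
    """EMD between two equal-size, equal-weight 1-D samples.

    For this special case the distance is the mean absolute
    difference between the sorted values. Sorting means the order
    of the features is ignored, a limitation of this simplification.
    """
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def undersample(majority, threshold):
    """Greedily keep a majority sample only if it is farther than
    `threshold` from every sample already kept (assumed strategy)."""
    kept = []
    for sample in majority:
        if all(emd_1d(sample, other) > threshold for other in kept):
            kept.append(sample)
    return kept


# Hypothetical majority-class samples; the second is nearly a copy of the first.
majority = [[1.0, 2.0, 3.0], [1.1, 2.0, 3.0], [5.0, 6.0, 7.0]]
kept = undersample(majority, threshold=0.1)
print(kept)
```

A larger threshold removes more majority instances, so in practice it would be tuned against the desired class ratio.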
Classification and Analysis of Users Review using Different Classification Techniques in Intelligent E-learning System
by Aditya Khamparia, Sanjay Kumar Singh, Ashish Kumar, Xia-Zhi Gao
Abstract: Background: The Internet comprises a large amount of data in the form of text, images, stickers, etc., including reviews or feedback created by users to share their expressions or knowledge. Users like to express their feelings in free format, so the information is provided in an unstructured form (reviews/feedback). These data may be of different kinds, such as positive, negative or neutral, and may take the form of a single word, a single sentence or a document. This work aims to gain a better scope in E-learning and to extract knowledge from E-learning sites such as blogs, YouTube tutorials, etc. Methods: Several techniques have been evaluated to provide a better classifier, such as Support Vector Machine (SVM), Na
Keywords: Support vector machine; Sentiment Analysis; Opinion Mining; Supervised Learning; KNN; Naive.
Histopathological cells segmentation using exponential grasshopper optimization algorithm based fuzzy clustering method
by Varun Tiwari, S. C. Jain
Abstract: Automated cell segmentation in histopathological images is a challenging problem due to the complexities of these images. In this paper, a new exponential grasshopper optimization algorithm is presented, which is further used to find the optimal fuzzy clusters for segmenting the cells in histopathological images. For better cluster quality, compactness is considered as the objective function. The performance of the proposed method is validated in terms of F1 score and aggregated Jaccard index value on two standard histopathological image datasets, namely the TNBC patients cancer dataset and the UCSB bio segmentation images dataset. The simulation results show the effectiveness of the proposed method over other state-of-the-art clustering segmentation methods such as K-means and fuzzy c-means.
Keywords: Histopathological images; Cell segmentation; Nature-inspired algorithm; Grasshopper Optimization Algorithm.
A New Weighted Two-Dimensional Vector Quantization Encoding Method in Bag-of-Features for Histopathological Image Classification
by RAJU PAL, Mukesh Saraswat
Abstract: Automated histopathological image analysis is a challenging problem due to the complex morphological structure of histopathology images. Bag-of-features is one of the prominent image representation methods and has been successfully applied in histopathological image analysis. There are four phases in the bag-of-features method, namely feature extraction, codebook construction, feature encoding, and classification, of which feature encoding is one of the prime phases. In the feature encoding phase, images are represented in terms of visual words before being fed into a support vector machine classifier. However, the feature encoding phase of the bag-of-features framework considers only one feature to encode each image in terms of visual words, due to which the system cannot use the merits of other features. Therefore, to improve the efficacy of the bag-of-features framework, a new weighted two-dimensional vector quantization encoding method is proposed in this work. The proposed method is tested on two histopathological image datasets for classification. The experimental results show that the combination of SIFT and ORB features with a two-dimensional vector quantization encoding method returns 80.13% and 77.13% accuracy on the ADL and Blue histology datasets respectively, which is better than the other considered encoding methods.
Keywords: Histopathological image classification; Bag-of-features; Feature encoding.
Multi-Pose Facial Expression Recognition using Appearance based Facial Features
by Yogesh Kumar, Shashi Kant Verma, Sandeep Sharma
Abstract: Facial Expression Recognition is one of the most effective and accepted research areas for the development of human-centered and interactive user interfaces with the propensity to respond to multimodal and naturally occurring human communication. Such an interface helps to understand the human emotions and intentions channelized through the expressions of the human face. The problem of automatic facial expression recognition is both interesting and quite challenging, with a strong impact on many application areas such as animation and human-computer interaction. The field has shown tremendous growth over the past years with its benchmarking efforts and progress. In this paper, an automated system to recognize facial expressions using a deep convolutional neural network is presented. The developed system is also tested in a real-time scenario by using a camera to extract the human face and detect the facial expression. The proposed system uses appearance-based features to recognize seven facial expressions (happy, sad, disgust, fear, anger, surprise and neutral) from image data with pose variations. The appearance-based features are extracted by implementing an integrated approach of Gabor filters with the Local Binary Pattern method, and the selection of extracted features is executed using Principal Component Analysis (PCA). The JAFFE and KDEF research databases are considered for experimentation. The proposed system's performance is assessed using the evaluation metrics of precision, recall, F-measure, and recognition rate for the frontal and half-side pose images.
Keywords: Facial Expressions; Appearance based facial expressions; Deep Convolutional Neural Network; Multi-pose face appearance.
Nature inspired computational intelligence implementation for privacy preservation in Map reduce framework
by Suman Madan, Puneet Goswami
Abstract: Next-generation technologies have had a huge impact on the extent of data usage, and data are now highly valued. These technologies have motivated a great deal of research in the data management field, along with advances in the automation of machine-human interactions across the globe. To handle this augmented big data, cloud data storage plays a significant role. However, data security and data privacy preservation remain very challenging issues. The loss of privacy of user's data undermines reliable service delivery. Several techniques have been developed for privacy preservation with data utility and data obfuscation in mind; however, the trade-off between the privacy of data and its utility is not properly tackled. To solve many optimization problems in science and technology, the current trend is to use nature-inspired optimization algorithms, which are examined on main features such as exploration and exploitation, diversity and adaptation, and attraction and diffusion mechanisms. This paper proposes an implementation of two nature-inspired optimization algorithms, namely Cat Swarm Optimization and Grey wolf optimizer, along with an adaptation of k-anonymization criteria in the map-reduce framework to achieve the privacy preservation goal. The new model will release only the essential required information to users and will hide the confidential parts of the data. A fitness function is defined keeping in mind the trade-off between privacy and utility of the information given to the end-user. Lastly, a comparative analysis of the proposed technique is done against many established techniques on two performance metrics, namely classification accuracy and information loss. Further, the proposed algorithm is parallelized on the Map Reduce framework for handling large-scale datasets.
Keywords: Privacy preservation; Grey wolf optimizer; cat swarm optimization.
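The k-anonymization criterion mentioned in the abstract above can be illustrated with a minimal check: a released table is k-anonymous if every combination of quasi-identifier values is shared by at least k records. The sketch below is generic and does not reproduce the paper's optimization or map-reduce steps; the function name, column names and records are hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(
        tuple(record[name] for name in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical generalized records: age is binned and the zip code is masked.
records = [
    {"age": "30-39", "zip": "110**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "110**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "122**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "122**", "diagnosis": "diabetes"},
]
two_anonymous = is_k_anonymous(records, ["age", "zip"], 2)
three_anonymous = is_k_anonymous(records, ["age", "zip"], 3)
print(two_anonymous, three_anonymous)
```

Coarser generalization makes the check easier to pass but loses more information, which is the privacy and utility trade-off a fitness function would have to balance.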
Enhancing Sentiment Analysis using Enhanced Whale Optimization Algorithm
by Abdul Salam Mohammed, Vishal Shukla, Avinash Pandey
Abstract: Sentiment analysis is a contextual analysis of text that discovers the opinions of users with respect to sentimental topics commonly available on online social platforms. Twitter is one of the popular social networking sites where people express their views about any topic in the form of posts (tweets). These Twitter posts are analyzed to obtain the viewpoints of users by using clustering-based sentiment evaluation techniques. However, due to the subjective nature of sentimental datasets, metaheuristic clustering methods outperform the conventional methods for sentiment analysis. Therefore, in this paper, a new metaheuristic method based on the whale optimization algorithm (WOA) is introduced for sentence-level sentiment classification. The proposed sentiment analysis method finds the optimal cluster centers from sentimental data. The performance of the proposed method has been tested on Twitter datasets and compared in terms of mean accuracy, mean recall, mean precision and mean fitness with other recent state-of-the-art approaches, including cuckoo search, whale optimization algorithm, grey wolf optimizer, bat algorithm, grasshopper optimization algorithm, and hybrid cuckoo search. The proposed sentiment evaluation method attains the highest accuracy for most of the datasets compared to the state of the art. Further, statistical analysis has also been performed to confirm the performance of the proposed model.
Keywords: Sentiment Analysis; Metaheuristic Methods; Natural Language Processing; Clustering.
A novel DeepCNN model for Denoising analysis of MRI Brain Tumor images
by Srinivas B, Sasibhushana Rao G
Abstract: Medical images must be introduced to the specialists or doctors with high accuracy for the diagnosis of critical diseases like a brain tumor. In this paper, a novel DeepCNN model is proposed to perform MRI Brain Tumor image Denoising task and the results are compared with pre-trained DnCNN, Gaussian, adaptive, bilateral and guided filters. It is found that DeepCNN performs better than other filtering methods used. Different noise levels ranging from 5 to 50 and noises like Salt and pepper, Poisson, Gaussian, and Speckle noises are used to form the noisy images. Performance metrics like Peak Signal to Noise Ratio and Structural Similarity Index are calculated and compared across all filters and noises. The proposed DeepCNN model performs well for denoising with the unknown and known noise levels. It speeds up the training process and also improves the denoising performance because of using 17 convolutional layers and batch normalization.
Keywords: DeepCNN; Convolutional Neural Network; image denoising; deep denoiser; denoising CNN; DnCNN; general denoising filters; Machine Learning.
Security, Privacy and Trust (SPT): Privacy Preserving Model for Internet of Things
by Shelendra Kumar Jain, Nishtha Kesswani, Basant Agarwal
Abstract: With the advancements in Information Technology, the Internet of Things (IoT) has emerged as one of the dominant technologies. IoT systems are capable of connecting everyone, everything and any service, and the analysis of the information gathered from such IoT devices provides a significant number of opportunities to solve many real-time problems in areas such as healthcare, agriculture, transport, smart cities, etc. However, privacy protection is a very important and challenging issue in the information sharing environment due to the sensitive and personal information communicated through IoT devices. Dealing effectively with privacy breaches in the IoT ecosystem is a high priority for user satisfaction and the success of the IoT market. In this paper, we present an overview of the issues and challenges faced by privacy protection methods in the Internet of Things. We propose a privacy preserving model that ensures data privacy in IoT devices through a lightweight data collection and data access protocol in a resource-constrained IoT ecosystem. The experimental results and analysis show that the proposed model is effective and requires relatively less time for data collection and data access compared to the existing models. We also provide a case study of the proposed approach on a healthcare-based IoT system.
Keywords: Internet of Things; Data Privacy Protection; Obfuscation; Information Privacy; User Privacy; Healthcare; Agriculture.