International Journal of Intelligent Information and Database Systems (22 papers in press)
Improving Named Entity Recognition and Disambiguation in News Headlines
by Jayendra Barua, Rajdeep Niyogi
Abstract: We present a framework for the extraction and disambiguation of hyphenated and partially named entities in news headlines. Applying state-of-the-art named entity detection and disambiguation approaches directly to news headlines results in significantly degraded performance because of three factors: headline formatting that differs from regular text, hyphenated mentions, and partial entity mentions. We introduce a novel framework that assists existing named entity recognition and disambiguation systems in dealing with these challenges. In particular, we deal with the hyphenated and partial entity mentions present in news headlines. We modify each hyphenated or partial mention in a way that increases the probability of disambiguating it to the correct entity in the knowledge base. Our technique leverages headlines from the recent past to improve the entity mentions in headlines. The experimental results show that our technique improves the F1-score of mention detection by 12% and 9% in the state-of-the-art Stanford and Illinois NER systems, whereas the F1-score of disambiguation is improved by 9%, 12%, 7% and 5% in the state-of-the-art AIDA, Wikifier, TagMe, and YODIE NED systems, respectively.
Keywords: Information Retrieval; Named Entity Disambiguation; Mention Detection; Mention Modification; News headlines; Natural Language Processing.
A New Method of Event Relation Identification
by Yang Junhui, Liu Zongtian, Liu Wei
Abstract: Traditional event relation identification cannot take into account the semantic relations carried by the structural characteristics of events. To address this problem, this paper proposes a semantic relation identification method based on dependency and co-occurrence. The text is divided into event representations, and the distribution characteristics of the event elements, the co-occurrence of elements, and the dependency relations between the events in the text are used to uncover clues to semantically related events. The event set is then clustered along these relation clues by an improved AP algorithm. Experiments show that the semantic roles of an event (its six elements) allow the degree of dependence and the co-occurrence overlap ratio of event elements between candidate related events to be calculated more accurately, which yields a richer set of candidate related events and thereby improves the ability to recognize event relations.
Keywords: event; event relation; event element; dependency; co-occurrence; AP algorithm.
Query optimization in real-time data warehouses
by Issam Hamdi
Abstract: Nowadays the update frequency of traditional data warehouses cannot meet the objectives of real-time data analysis, which relies on data freshness. To alleviate this problem, Real-Time Data Warehouse (RTDW) technology has emerged. An RTDW allows decision makers to access and analyze fresh data as fast as possible in order to support real-time decision processes. In this paper, we focus on optimization techniques to speed up query processing, in particular query response time optimization and storage space optimization. We then propose an architecture called DETL-(m, k)-firm-RTDW (Decentralized Extract-Transform-Load approach based on the (m, k)-firm constraint for Real-Time Data Warehouse). This architecture deals with diversity and disparities in data source systems to reduce the time for ETL, and its objectives are to: i) guarantee data freshness, and ii) enhance the deadline miss ratio even in the presence of conflicts and unpredictable workloads. Finally, we evaluate our feedback control scheduling architecture, which considers both materialized views and data fragmentation, using the TPC-DS (TPC, 2014) benchmark; the preliminary results are quite promising.
Keywords: Real-Time Data Warehouse; Real-Time Transactions; Materialized views; Data partitioning; ETL.
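As an illustration of the (m, k)-firm constraint named in the abstract above: under its usual definition, at least m out of any k consecutive jobs must meet their deadlines. The following minimal Python sketch checks that condition over a recorded sequence of deadline outcomes. The function name and the sample sequences are hypothetical and are not taken from the paper, which may enforce the constraint differently inside its scheduler.

```python
def satisfies_mk_firm(met_deadline, m, k):
    """Return True if every window of k consecutive jobs has at least m
    jobs that met their deadline (hypothetical helper, for illustration).

    met_deadline: sequence of booleans, one per job, in release order.
    """
    if not 0 < m <= k:
        raise ValueError("require 0 < m <= k")
    # Slide a window of length k over the outcome sequence.
    for start in range(len(met_deadline) - k + 1):
        window = met_deadline[start:start + k]
        if sum(window) < m:
            return False
    # Sequences shorter than k have no full window and pass vacuously.
    return True


# Assumed sample outcomes: True means the job met its deadline.
steady = [True, True, False, True, True, False, True]
bursty = [True, False, False, True, True, True]

steady_ok = satisfies_mk_firm(steady, m=2, k=3)
bursty_ok = satisfies_mk_firm(bursty, m=2, k=3)
```

Here `steady` never misses more than one deadline in any three consecutive jobs, while `bursty` misses two in a row within its first window.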
A new approach for workflow evolution using MDA technology
by Berraouna Abdelkader, Amirat Abdelkrim, Meslati Djamel
Abstract: Most companies, independently of their size and type of activity, do not provide the workflow adaptability and evolution needed to deal with changes in business agreements and organizational methods. In this paper, we address the adaptability and evolution of workflows based on MDA technology, and we present a new approach for workflow evolution at design time. In this approach, the workflow model is described by a specification at a very high level of abstraction using meta-model concepts. The workflow model can be evolved by the workflow designer using model evolution operations. The evolution of models can be managed in different ways. The designers can refine the workflow model according to the evolution scenario proposed by the company manager. In this work, we have developed endogenous rules in ATL that allow the transformation and evolution of the workflow model; these rules use endogenous model transformation, which operates on models expressed in the same language.
Keywords: Meta-Model; Workflow; e-commerce business; Dynamic Evolution; MDA; flexibility; ATL.
Projecting Dependency Syntax Labels From English into Vietnamese in English-Vietnamese Bilingual Corpus
by Phuoc Tran, Van-Deo Duong, Dinh Dien, Bay Vo, Long Nguyen, Huu Nguyen
Abstract: In natural language processing, corpora play an important role, especially labeled corpora, including part-of-speech labeled corpora, component syntax labeled corpora, dependency syntax labeled corpora, and so on. These labeled corpora are used for corpus-based research and give higher-quality results in supervised learning methods than unlabeled corpora. In this article, we tag Vietnamese dependency labels by projecting them from an English-Vietnamese bilingual corpus in which the English side has been tagged with dependency labels. The experimental results show that our method achieves a high tagging result, with an LAS of 73.5% and a UAS of 81.7%.
Keywords: Natural language processing; Projecting dependency syntax; English-Vietnamese bilingual corpus.
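For readers unfamiliar with the two scores reported above, the sketch below shows how UAS (unlabeled attachment score) and LAS (labeled attachment score) are conventionally computed: UAS counts tokens whose predicted head is correct, and LAS additionally requires the dependency label to match. The function name, the data layout, and the toy sentence are assumptions made for illustration; they are not the authors' evaluation code.

```python
def attachment_scores(gold, predicted):
    """Compute (UAS, LAS) for one or more sentences flattened into tokens.

    gold, predicted: lists of (head_index, label) tuples, one per token,
    in the same token order. Hypothetical layout chosen for this sketch.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and aligned")
    total = len(gold)
    # UAS: the predicted head matches the gold head.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS: both the head and the dependency label match.
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    return head_hits / total, full_hits / total


# Toy four-token example (assumed data, head index 0 marks the root).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "amod")]

uas, las = attachment_scores(gold, pred)
```

In this toy case three of four heads are correct (UAS 0.75), but only two tokens have both head and label correct (LAS 0.5), which is why LAS is never higher than UAS.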
Hierarchical Clustering on Metric Lattice
by Xiangyan Meng, Muyan Liu, Jingyi Wu, Huiqiu Zhou, Fei Xu, Qiufeng Wu
Abstract: This work proposes a new clustering algorithm named Fuzzy Interval Number Hierarchical Clustering (FINHC). It first converts the original data into fuzzy interval numbers (FINs), then proves that the collection of FINs is a lattice, introduces a novel metric distance based on results from lattice theory, and combines them with hierarchical clustering. The relevant mathematical background on lattice theory and the specific algorithm used to construct FINs are presented in this paper. Three evaluation indexes, namely compactness, recall and F1-measure, are applied to evaluate the performance of FINHC, Hierarchical Clustering (HC), k-means, k-medoids, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) in six experiments using UCI public datasets and one experiment using a KEEL public dataset. The FINHC algorithm shows better clustering performance than the other traditional clustering algorithms, and the results are discussed in detail.
Keywords: FIN; fuzzy interval number; hierarchical clustering; metric lattice; public datasets; compactness; recall; F1-measure.
Quality Materialized View Selection using Quantum Inspired Artificial Bee Colony Optimization
by Biri Arun
Abstract: The availability of huge volumes of digital data and powerful computers has facilitated the extraction of information, knowledge and wisdom for decision support systems. The value of information depends solely on data quality. A data warehouse provides quality data, and it is required to respond to queries within seconds. However, because the data warehouse grows steadily, the query response time is generally hours or weeks. Materialized views are an efficient approach to facilitate timely extraction of information and knowledge for strategic business decision making. Selecting an optimal set of views for materialization, referred to as view selection, is an NP-complete problem. In this paper, a Quantum Inspired Artificial Bee Colony Algorithm is proposed to address the view selection problem. Experimental results show that the proposed algorithm significantly outperforms the fundamental algorithm for view selection, HRUA, and other view selection algorithms such as ABC, MBO, HBMO, BCOc, BCOi and BBMO.
Keywords: Artificial Bee Colony Optimization; Quantum Computing; Decision Support System; Data Warehouse; Materialized Views.
SmartGuard: An IoT based Intrusion Detection System for Smart Homes
by Nishtha Kesswani, Basant Agarwal
Abstract: With the help of sensor-enabled devices, we are able to achieve many tasks without human intervention. The sensor-enabled Internet of Things has made our lives simpler. At the same time, however, security is an important issue that needs to be addressed. In this paper, we present an intrusion detection system, SmartGuard, that can be deployed in the modern world's Smart Home. The proposed system would be able to detect malicious behavior within the network as well as any malicious communications from outside.
Keywords: Intrusion detection; Smart Home; Internet of Things.
Random forest based active learning for content based image retrieval
by Nilesh Bhosle, Manesh Kokare
Abstract: The classification based relevance feedback approach is considered an emerging way to conquer the semantic gap problem in content based image retrieval systems. However, such an approach suffers from the problem of an imbalanced training dataset, which causes instability and degradation in the retrieval results. To tackle this problem, a novel active learning approach based on a random forest classifier and a feature reweighting technique is proposed in this paper. Initially, a random forest classifier is used to learn the user's retrieval intention. Each binary decision tree in the forest votes on the input images, and the final class is decided by the majority of votes. Then, in active learning, the most informative classified samples are selected for manual labelling and added to the training dataset for retraining the classifier. Also, a feature reweighting technique based on Hebbian learning is embedded in the retrieval loop to find the weights of the most perceptive features used for image representation. These techniques are combined to form a hypothesized solution for the image retrieval problem. The experimental evaluation of the proposed system is carried out on two different databases and shows a noteworthy enhancement in retrieval results. An average precision of 87% has been achieved in 4 iterations of relevance feedback.
Keywords: Content based image retrieval; Relevance feedback; Random forest learning; Active learning; Semantic gap; Feature reweighting; Information retrieval.
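The majority-vote step described in the abstract above can be sketched in a few lines of Python. This is only an illustration of how per-tree votes are combined into one class label; the function name, the class labels, and the tie-breaking rule are assumptions, and training the trees themselves is outside the scope of the sketch.

```python
from collections import Counter


def majority_vote(tree_votes):
    """Combine the class labels voted by individual trees into one label.

    tree_votes: non-empty list of labels, one per decision tree.
    Ties are broken in favour of the label that was voted first, which is
    an arbitrary choice made for this sketch.
    """
    if not tree_votes:
        raise ValueError("at least one vote is required")
    counts = Counter(tree_votes)
    # most_common(1) returns the (label, count) pair with the highest count.
    label, _ = counts.most_common(1)[0]
    return label


# Assumed votes from a five-tree forest for one candidate image.
votes = ["relevant", "irrelevant", "relevant", "relevant", "irrelevant"]
decision = majority_vote(votes)
```

With three of five trees voting "relevant", the image is assigned to the relevant class.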
Special Issue on: SUSCOM-2019 Advancements in Computational Intelligence and Intelligent Database Design
Optimal Bag-of-Features using Random Salp Swarm Algorithm for Histopathological Image Analysis
by Venubabu Rachapudi, Golagani Lavanya Devi
Abstract: Histopathological image classification is a prominent part of medical image classification. However, the classification of such images is a challenging task due to the presence of several morphological structures in the tissue images. Tissue images are categorized into four classes, namely epithelium, connective, muscular, and nervous. These images contain a wide variety of morphological structures. Therefore, the classification of these images is a demanding task and has been an active research area. Recently, the bag-of-features method has been used for image classification tasks. However, the bag-of-features method uses the K-means algorithm to cluster the features, which is sensitive to the initial cluster centers and often gets trapped in local optima. Therefore, in this work, an efficient bag-of-features histopathological image classification method is presented using a novel variant of the salp swarm algorithm, termed the random salp swarm algorithm. The efficiency of the proposed variant has been validated against 20 benchmark functions. Further, the performance of the proposed method has been studied on the blue histology image dataset, and the results are compared with 5 other state-of-the-art meta-heuristic based bag-of-features methods in terms of four parameters, namely precision, recall, accuracy and F-measure. The experimental results demonstrate that the proposed method surpasses the other considered methods with an increase of 11% in accuracy.
Keywords: Histopathological image classification; Salp Swarm Algorithm; Bag-of-features.
Machine Learning Based Book Recommender System: A Survey and New Perspectives
by Khalid Anwar, Jamshed Siddiqui, Shahab Saquib Sohail
Abstract: The exponential growth of recommender systems research has recently drawn the attention of the scientific community. These systems are very useful in reducing information overload and providing users with the items they need. The major areas where recommender systems have contributed significantly include e-commerce, online auctions, and book and conference recommendation for academia and industry. Book recommender systems suggest books of interest to users according to their preferences and requirements. In this article, we survey the machine learning techniques that have been used in book recommender systems. Moreover, the evaluation metrics applied to evaluate recommendation techniques are also studied. Six categories of book recommendation techniques have been identified and discussed, which would enable the scientific community to lay a foundation of research in the field. We also propose future perspectives to improve recommender systems. We hope that researchers exploring recommendation technology in general and book recommendation in particular will find this work highly beneficial.
Keywords: Book Recommender System; Machine Learning; Classification; Association Rule Mining; Evaluation Metrics.
A Conceptual Comparison of Metaheuristic Algorithms and Applications to Engineering Design Problems
by Kamalinder Kaur Kaleka, Avneet Kaur, Vijay Kumar
Abstract: This paper presents a conceptual comparison among the Spotted hyena optimizer, Grey wolf optimizer, Particle swarm optimization, Ant colony optimization, Gravitational search algorithm, Bat algorithm, Moth flame optimization, and Whale optimization algorithm. The behavior of these algorithms is mathematically modeled to show the optimization process. Twenty-three benchmark test functions are used to validate the performance of these algorithms. The exploration and exploitation of these algorithms are analyzed using convergence curves. The experimental results show that the Spotted hyena optimizer and Grey wolf optimizer give optimal solutions as compared to the other algorithms. Furthermore, these algorithms are tested on five constrained engineering design problems. Experimental results reveal the applicability of these algorithms to real-life engineering design problems.
Keywords: Metaheuristic; Spotted hyena optimizer; Gravitational search algorithm; Whale optimization algorithm; Moth flame optimization.
Fuzzy Based Approach to Incorporate Spatial Constraints in Possibilistic c-Means Algorithm for Remotely Sensed Imagery
by Abhishek Singh, Anil Kumar
Abstract: This paper presents a robust Possibilistic c-Means with constraints (PCM-S) algorithm, applied in a supervised way to remotely sensed imagery. PCM-S overcomes the disadvantages of PCM by incorporating local information through spatial constraints to control the effect of neighboring terms. PCM-S has been deployed by adding spatial constraints in order to provide robustness to noise and outliers. Neighborhood labelling has been done in PCM-S by introducing a local window (N_R) and a regularizer parameter (?_i). Experiments have been conducted on Formosat-2 satellite imagery of the Haridwar area, in which the classified results of PCM and PCM-S are optimized using the Mean Membership Difference (MMD) method and the performance of the classifiers is analysed using the Root Mean Square Error (RMSE) method. Experiments performed on an image with 1% salt-and-pepper noise and on the original image show that the PCM-S classifier is effective in minimizing noisy pixels and produces a lower RMSE than PCM.
Keywords: Possibilistic c-Means (PCM); Possibilistic c-Means with constraints (PCM-S); Regularization Parameter; Mean Membership Difference (MMD); Root Mean Square Error (RMSE).
False-positive free transparent and optimal watermarking for colour images
by Neha Singh, Sandeep Joshi, Shilpi Birla
Abstract: The use of image watermarking as a tool for protecting ownership of media has been successful. Embedding capacity, imperceptibility and robustness are three requirements of any watermarking technique. Singular Value Decomposition (SVD) and Discrete Wavelet Transforms (DWT) have been widely used in this field. This paper presents a robust, blind watermarking technique for colour images based on the SVD of DWT coefficients. The colour (RGB) image is first converted into the Hue, Saturation and Value (HSV) model to segregate chromaticity information. The value plane undergoes a 2-level DWT to represent the data in four parts. The horizontal (HL) and vertical (LH) sub-bands are used for embedding. These sub-bands are divided into non-overlapping blocks of size 4 x 4. For each block, SVD is performed and the singular values (SV) are updated based on the watermark bit and using Lagrange's optimization principle. Two keys are used during the embedding process. One of the keys is used to distribute the watermark into two parts to be embedded in the two sub-bands. The other key is used as the quantization step size during optimization of the SV for watermark embedding. The inverse of the embedding technique is used to extract the watermark. Experiments show that the proposed technique is imperceptible, as it offers PSNR > 40 dB. Also, the technique is able to resist general image processing operations (attacks) on the images, with a Structural Similarity Index Measure (SSIM) of nearly 1 and a sufficiently high Bit Correct Ratio (BCR).
Keywords: Image Watermarking; Image processing; Singular Value Decomposition; Discrete Wavelet Transform.
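To make the PSNR > 40 dB imperceptibility criterion mentioned above concrete, the sketch below computes PSNR with its standard definition, 10 * log10(MAX^2 / MSE), over two flat lists of pixel intensities. The function name and the tiny sample "images" are assumptions for illustration only; a real evaluation would run over full image planes, and this is not the authors' code.

```python
import math


def psnr(original, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized
    sequences of pixel intensities (hypothetical helper for illustration).
    """
    if len(original) != len(distorted) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between corresponding pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / mse)


# Assumed 8-bit sample values: the watermark changes two pixels by one level.
host = [100, 100, 100, 100]
watermarked = [101, 100, 99, 100]

quality_db = psnr(host, watermarked)
imperceptible = quality_db > 40
```

Small per-pixel changes like these give a PSNR of roughly 51 dB, comfortably above the 40 dB threshold quoted in the abstract.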
A New Software Development Paradigm for Intelligent Information Systems
by Pooja Dehraj, Arun Sharma
Abstract: The continuous growth in software management cost requires the development of self-managed software systems. With the self-management property, a system takes intelligent decisions to keep itself working properly. Autonomic computing is the technique used to develop such systems. Autonomic computing systems are highly reliable software systems. To enhance the quality of software systems, implementing an autonomic computing-based software development life cycle process may be a novel idea. It involves autonomous decision making by the autonomic component during the development of software. This approach reduces the complexity of the software development process. In addition, it serves the purpose of autonomic computing, which is to reduce software complexity and perform real-time exception handling. In this paper, the implementation of an autonomic advisor based software development process is proposed using cloud computing. Cloud computing helps developers to develop software and applications using deliverable services such as platform, infrastructure, and software. During the implementation and usage of the autonomic advisor, the database becomes heavier. Therefore, cloud computing will be a beneficial step towards resolving such issues. Other benefits of such an autonomous software development life cycle process are discussed further in this paper.
Keywords: Autonomic SDLC; Map-Reduce Requirement; Autonomic Advisor; Knowledge Database.
Classification and Analysis of Users Review using Different Classification Techniques in Intelligent E-learning System
by Aditya Khamparia, Sanjay Kumar Singh, Ashish Kumar, Xia-Zhi Gao
Abstract: Background: The Internet comprises a large amount of data in the form of text, images, stickers, etc., also called reviews or feedback, which users create to share their opinions or knowledge. Users like to express their feelings in a free format, so the information is provided in an unstructured form (reviews/feedback). These data may be positive, negative or neutral, and may take the form of a single word, a single sentence or a whole document. Here, the aim is to gain better scope in e-learning by extracting knowledge from e-learning sources such as blogs, YouTube tutorials, etc. Methods: A few techniques have been measured to provide a better classifier, such as Classification- Support Vector Machine (SVM), Na
Keywords: Support vector machine; Sentiment Analysis; Opinion Mining; Supervised Learning; KNN; Naive.
Histopathological cells segmentation using exponential grasshopper optimization algorithm based fuzzy clustering method
by Varun Tiwari, S. C. Jain
Abstract: Automated cell segmentation in histopathological images is a challenging problem due to the complexities of these images. In this paper, a new exponential grasshopper optimization algorithm is presented, which is then used to find the optimal fuzzy clusters for segmenting the cells in histopathological images. For better cluster quality, compactness is used as the objective function. The performance of the proposed method is validated in terms of F1 score and aggregated Jaccard index on two standard histopathological image datasets, namely the TNBC patients cancer dataset and the UCSB bio segmentation images dataset. The simulation results show the effectiveness of the proposed method over other state-of-the-art clustering segmentation methods such as K-means and fuzzy c-means.
Keywords: Histopathological images; Cell segmentation; Nature-inspired algorithm; Grasshopper Optimization Algorithm.
A New Weighted Two-Dimensional Vector Quantization Encoding Method in Bag-of-Features for Histopathological Image Classification
by Raju Pal, Mukesh Saraswat
Abstract: Automated histopathological image analysis is a challenging problem due to the complex morphological structure of histopathology images. Bag-of-features is one of the prominent image representation methods and has been successfully applied in histopathological image analysis. There are four phases in the bag-of-features method, namely feature extraction, codebook construction, feature encoding, and classification, of which feature encoding is one of the most important. In the feature encoding phase, images are represented in terms of visual words before being fed into a support vector machine classifier. However, the feature encoding phase of the bag-of-features framework considers only one feature to encode each image in terms of visual words, so the system cannot exploit the merits of other features. Therefore, to improve the efficacy of the bag-of-features framework, a new weighted two-dimensional vector quantization encoding method is proposed in this work. The proposed method is tested on two histopathological image datasets for classification. The experimental results show that the combination of SIFT and ORB features with the two-dimensional vector quantization encoding method returns 80.13% and 77.13% accuracy on the ADL and Blue histology datasets respectively, which is better than the other considered encoding methods.
Keywords: Histopathological image classification; Bag-of-features; Feature encoding.
Multi-Pose Facial Expression Recognition using Appearance based Facial Features
by Yogesh Kumar, Shashi Kant Verma, Sandeep Sharma
Abstract: Facial expression recognition is one of the most effective and accepted research directions for the development of human-centered, interactive user interfaces that can respond to multimodal, naturally occurring human communication. Such an interface helps to understand the human emotions and intentions conveyed through the expressions of the human face. The problem of automatic facial expression recognition is both interesting and quite challenging, with a strong impact on many application areas such as animation and human-computer interaction. The field has shown tremendous growth over the past years through its benchmarking efforts and progress. In this paper, an automated system to recognize facial expressions using a deep convolutional neural network is presented. The developed system is also tested in a real-time scenario, using a camera to extract the human face and detect the facial expression. The proposed system uses appearance based features to recognize seven facial expressions (happy, sad, disgust, fear, anger, surprise and neutral) from image data with pose variations. The appearance based features are extracted by an integrated approach combining a Gabor filter with the Local Binary Pattern method, and the extracted features are selected using Principal Component Analysis (PCA). The JAFFE and KDEF research databases are used for experimentation. The performance of the proposed system is assessed using the evaluation metrics of precision, recall, F-measure, and recognition rate for frontal and half-side pose images.
Keywords: Facial Expressions; Appearance based facial expressions; Deep Convolutional Neural Network; Multi-pose face appearance.
Nature inspired computational intelligence implementation for privacy preservation in Map reduce framework
by Suman Madan, Puneet Goswami
Abstract: Next-generation technologies have had a huge impact on the extent of data usage, and data is highly valued. These technologies, along with advances in the automation of machine-human interactions across the globe, have motivated a great deal of research in the data management field. Cloud data storage plays a significant role in handling this growing volume of big data. However, data security and data privacy preservation remain very challenging issues. The loss of privacy of users' data undermines reliable service delivery. Several techniques have been developed for privacy preservation with data utility and data obfuscation in mind; however, the trade-off between data privacy and data utility is not properly tackled. To solve many optimization problems in science and technology, the current trend is to use nature-inspired optimization algorithms, which are examined on main features such as exploration and exploitation, diversity and adaptation, and attraction and diffusion mechanisms. This paper proposes an implementation of two nature-inspired optimization algorithms, namely Cat Swarm Optimization and Grey Wolf Optimizer, along with an adaptation of the k-anonymization criterion in the MapReduce framework to achieve the privacy preservation goal. The new model releases only the essential required information to users and hides the confidential parts of the data. A fitness function is defined with the trade-off between the privacy and the utility of the information given to the end user in mind. Lastly, a comparative analysis of the proposed technique against many established techniques is carried out on two performance metrics, namely classification accuracy and information loss. Further, the proposed algorithm is parallelized on the MapReduce framework to handle large-scale datasets.
Keywords: Privacy preservation; Grey wolf optimizer; cat swarm optimization.
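The k-anonymization criterion that the abstract above adapts can be illustrated with a minimal check: a released table is k-anonymous if every combination of quasi-identifier values is shared by at least k records. The sketch below assumes the records are plain dictionaries; the function name, column names, and sample rows are hypothetical, and the optimization-driven generalization proposed in the paper is not reproduced here.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination appears in at
    least k records (hypothetical helper for illustration).

    records: list of dicts, one per released row.
    quasi_identifiers: column names that could be linked to an individual.
    """
    # Count how many records fall into each equivalence class.
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers)
        for record in records
    )
    return all(size >= k for size in groups.values())


# Assumed, already generalized sample data (age ranges, masked zip codes).
released = [
    {"age": "20-29", "zip": "130**", "diagnosis": "flu"},
    {"age": "20-29", "zip": "130**", "diagnosis": "cold"},
    {"age": "30-39", "zip": "148**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "148**", "diagnosis": "asthma"},
]

two_anonymous = is_k_anonymous(released, ["age", "zip"], k=2)
three_anonymous = is_k_anonymous(released, ["age", "zip"], k=3)
```

Each (age, zip) group here contains exactly two rows, so the table satisfies k = 2 but not k = 3; coarser generalization raises k at the cost of information loss, which is the trade-off the abstract refers to.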
Enhancing Sentiment Analysis using Enhanced Whale Optimization Algorithm
by Abdul Salam Mohammed, Vishal Shukla, Avinash Pandey
Abstract: Sentiment analysis is a contextual analysis of text that discovers the opinions of users on sentiment-bearing topics commonly discussed on online social platforms. Twitter is one of the popular social networking sites where people express their views about any topic in the form of posts (tweets). These Twitter posts are analyzed to obtain the viewpoints of users by using clustering based sentiment evaluation techniques. However, due to the subjective nature of sentiment datasets, metaheuristic clustering methods outperform the conventional methods for sentiment analysis. Therefore, in this paper, a new metaheuristic method based on the whale optimization algorithm (WOA) is introduced for sentence-level sentiment classification. The proposed sentiment analysis method finds the optimal cluster centers from sentiment data. The performance of the proposed method has been tested on Twitter datasets and compared, in terms of mean accuracy, mean recall, mean precision, and mean fitness, with other recent state-of-the-art approaches including cuckoo search, whale optimization algorithm, grey wolf optimizer, bat algorithm, grasshopper optimization algorithm, and hybrid cuckoo search. The proposed method attains the highest accuracy on most of the datasets compared to the state of the art. Further, statistical analysis has also been performed to confirm the performance of the proposed model.
Keywords: Sentiment Analysis; Metaheuristic Methods; Natural Language Processing; Clustering.
Security, Privacy and Trust (SPT): Privacy Preserving Model for Internet of Things
by Shelendra Kumar Jain, Nishtha Kesswani, Basant Agarwal
Abstract: With the advancements in information technology, the Internet of Things (IoT) has emerged as one of the dominant technologies. IoT systems are capable of connecting everyone, everything and any service, and the analysis of the information gathered from such IoT devices provides a significant number of opportunities to solve many real-time problems in areas such as healthcare, agriculture, transport, and smart cities. However, privacy protection is a very important and challenging issue in the information sharing environment because of the sensitive and personal information communicated through IoT devices. Dealing effectively with privacy breaches in the IoT ecosystem is a high priority for user satisfaction and for the success of the IoT market. In this paper, we present an overview of the issues and challenges faced by privacy protection methods in the Internet of Things. We propose a privacy preserving model that ensures data privacy in IoT devices through a lightweight data collection and data access protocol in a resource constrained IoT ecosystem. The experimental results and analysis show that the proposed model is effective and requires relatively less time for data collection and data access as compared to the existing models. We also provide a case study of the proposed approach on a healthcare based IoT system.
Keywords: Internet of Things; Data Privacy Protection; Obfuscation; Information Privacy; User Privacy; Healthcare; Agriculture.