International Journal of Intelligent Engineering Informatics (18 papers in press)
An Intelligent Undersampling Technique based upon Intuitionistic Fuzzy sets to alleviate Class Imbalance Problem of Classification with Noisy Environment
by Prabhjot Kaur, Anjana Gosain
Abstract: Traditional classification algorithms (TCA) do not work with unequal class sizes. There are applications in which the requirement is to discover exceptional or rare cases, such as fraud in a credit card database or fraudulent mobile calls. When applied in such cases, TCA fail to detect the rare cases. This is known as the class imbalance problem. The problem is more serious when TCA are applied to data distributions that have other impurities such as noise, overlapping classes and imbalance within classes. This paper presents an intelligent undersampling and ensemble-based classification method to resolve the problem of imbalanced classes in noisy situations. Synthetic datasets with different levels of noise are used to assess the classification performance of the proposed techniques. The results indicate that the presented undersampling and ensemble-based classifier techniques have better classification performance in noisy situations than RUS and SMOTE combined with classifiers such as C4.5, RIPPLE, KNN, SVM, MLP and Naive Bayes, and than ensemble techniques such as Boosting, Bagging and Random Forest.
Keywords: Class Imbalance; Intuitionistic Fuzzy Set; Undersampling; Class imbalance Learning; skewed distribution; Noisy environment; data level methods; ensemble approaches; Bagging; Boosting; Randomforest; Noise detection.
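For orientation, the RUS baseline named in the abstract above (random undersampling) can be sketched as follows. This is a minimal illustration of plain RUS only, not the authors' intuitionistic fuzzy method; the function name `random_undersample` and the toy data are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Plain random undersampling (RUS): randomly drop examples from the
    larger classes until every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    # Every class is cut down to the size of the smallest one.
    target = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, target):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical toy data: 95 majority examples (label 0), 5 minority (label 1).
X = list(range(100))
y = [0] * 95 + [1] * 5
X_bal, y_bal = random_undersample(X, y)
counts = Counter(y_bal)
```

Because RUS discards majority examples blindly, it can also discard informative ones, which is the kind of weakness an informed undersampling scheme aims to avoid.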
Template matching approach for automatic human body tracking in video
by Mehrez Abdellaoui, Ali Douik
Abstract: In this paper, a novel template matching approach is presented to achieve automatic human body tracking in video sequences. The developed method is based on a special template matching algorithm applied to a set of interest points detected on the human body contour. The matching approach is based on different types of similarity measures applied to consecutive video frames. Each frame was subjected to different types of noise: luminosity variation and motion blur. This new approach considers different matching constraints such as cross-matching, the uniqueness constraint, and the appearance and disappearance of interest points between consecutive frames. The algorithm was validated on two different datasets, and the obtained results are encouraging, with high matching rates and a good tracking rate.
Keywords: Interest points; template matching; similarity measures; tracking.
Stable Robust Predictive Controller for Nonlinear systems
by Ahmed Mnasser, Faouzi Bouani
Abstract: The stability of a robust model predictive controller for SISO nonlinear dynamical systems is established in this paper. A neural network model with parameter uncertainties is used to approximate the process behavior having different point functions. The control action is obtained by solving online a minimax optimization problem subject to the model uncertainties and the input constraints. We also study the stability of the closed-loop system in the presence of model uncertainties using Lyapunov theory. A comparison between a PID controller and the proposed robust predictive controller was performed to validate the feasibility of using uncertain neural networks in control theory. A simulation example is presented to illustrate the efficiency of the proposed controller.
Keywords: Minimax optimization; Neural networks; Robust predictive control; Stability Analysis.
A Clustering Based Hybrid Approach for Dual Data Reduction
by Seema Rathee, Saroj Ratnoo, Jyoti Ahuja
Abstract: Research on data reduction techniques has become important for enhancing the efficacy and efficiency of data mining algorithms, which may otherwise be compromised in the presence of a large number of irrelevant attributes and redundant instances. Data can be reduced by selecting a subset of either attributes or instances. Dual selection treats feature and instance selection together as a single optimization problem. The problem of dual selection is relatively difficult because it involves an enormously large search space. In this paper, we propose a Hybrid Instance Feature Selection (HIFS-CHC) method that uses the Heterogeneous Recombination and Cataclysmic Mutation (CHC) adaptive search genetic algorithm to solve the problem of dual selection. The proposed approach works in two stages. In the first stage, the K-Means clustering algorithm is used to reduce the search space. The second stage incorporates stratified prototype selection and the CHC algorithm for data reduction. The clustering-based hybrid scheme is experimentally tested on sixteen benchmark datasets and compared with other similar data reduction algorithms with respect to predictive accuracy, reduction rate and execution time. Experimental results show that the proposed method outperforms the other methods in terms of reduction rate and execution time while keeping predictive accuracy at almost the same level.
Keywords: Feature selection; instance selection; dual selection; data reduction; hybrid evolutionary approach.
A Purely Bayesian Approach for Proportional Visual Data Modeling
by Sami Bourouis
Abstract: In this paper, we focus on constructing a new flexible and powerful parametric framework for proportional visual data modeling. In particular, we propose a Bayesian density estimation method based upon mixtures of scaled Dirichlet distributions. Bayesian learning is interesting in several respects: it allows simultaneous parameter estimation and model selection, it takes uncertainty into account by introducing prior information about the parameters, and it helps overcome learning problems related to over- or under-fitting. In this work, three key issues related to Bayesian mixture learning are addressed: the choice of prior distributions, the estimation of the parameters, and the selection of the number of components. Moreover, a principled Metropolis-within-Gibbs sampler algorithm for scaled Dirichlet mixtures is developed. Finally, the proposed Bayesian framework is tested on two challenging real-life applications, namely scene reconstruction and face age estimation from images. The obtained results show the merits of our approach.
Keywords: Mixture models; Scaled Dirichlet; Bayesian inference; Gibbs sampling; Metropolis-Hastings; Scene reconstruction; Face age estimation.
Data Mining and Ontology Based Techniques in HealthCare Management
by Hassan Mahmoud, Enas Abbas, Ibrahim Fathy
Abstract: Recently, large amounts of data have been produced as a result of advances in the biotechnology and health sciences fields. These data include clinical information and genetic data contained in Electronic Health Records (EHRs). Therefore, there is a need for innovative and effective methods for representing this amount of data. In addition, it is very important to detect syndromes at an early stage to avoid many complications, since they can badly affect human health and place financial burdens on patients. Recently, data mining techniques and ontology-based techniques have played a great role in building automated systems that can detect syndromes efficiently and accurately. In this paper, we cover some of the research efforts that have employed data mining techniques, ontology-based techniques, or both in detecting syndromes. Additionally, a set of well-known data mining techniques, including Decision Trees (J48), Naïve Bayes, Multilayer Perceptron (MLP), and Random Forest (RF), has been assessed on the classification task using a publicly available heart disease dataset.
Keywords: Data Mining, Ontology, Healthcare, Syndrome detection.
Special Issue on: Advances in Intelligent Big Data Analytics
Empirical Investigation of Dimension Hierarchy Sharing Based Metrics for Multidimensional Schema Understandability
by Anjana Gosain, Jaspreeti Singh
Abstract: In recent years, quality has gained a lot of importance in the development of data warehouse systems. Predicting the understandability of multidimensional schemas could play a key role in controlling data warehouse quality at early stages of development. In this area, some effort has been spent to define structural metrics and identify models for assessing the quality of these systems. Of the structural properties used to define metrics, aspects of dimension hierarchies and their sharing play a primary role in enhancing the analytical capabilities of multidimensional schemas, thereby affecting their quality. The authors have previously proposed structural metrics based on the aforementioned aspects. The objective of this study is to apply Principal Component Analysis (PCA) to find whether our metrics are improvements over the other existing metrics, and to apply Logistic Regression to study whether the metrics (selected as relevant in the extracted principal components), combined together, are indicators of multidimensional schema understandability. The results of PCA confirm that our structural metrics based on the concept of sharing are different from other such metrics existing in the literature. Further, the metrics selected as principal components can be used in combination to predict the understandability of data warehouse multidimensional schemas.
Keywords: Data Warehouse; Quality Metrics; Principal Component Analysis; Logistic Regression; Understandability; Multidimensional Schemas.
Measuring harmfulness of class imbalance by data complexity measures in oversampling methods
by Deepika Singh, Anjana Gosain, Anju Saha
Abstract: Many real-world applications involve skewed datasets, which results in the class imbalance problem. During classification, class imbalance causes underestimation of minority classes. Researchers have proposed a number of algorithms to deal with this problem. However, recent studies have shown that some skewed datasets are not harmful, and that applying class imbalance algorithms to these datasets leads to degraded performance and increased execution time. In this paper, we pre-estimate the degree of harmfulness of class imbalance for skewed classification problems using two data complexity measures: a scatter-matrix-based class separability measure and the ratio of intra-class to inter-class nearest neighbors. The performance of oversampling-based class imbalance classification algorithms has also been analyzed with respect to these data complexity measures. The experiments are conducted using k-nearest neighbor (k-NN) and naive Bayes as the base classifiers. The obtained results illustrate the usefulness of these measures: they provide prior information about the nature of imbalanced datasets, which helps in selecting a more efficient classification algorithm.
Keywords: class imbalance; data complexity measure; class separability measure; class overlapping; inter-class nearest neighbor; intra-class nearest neighbor; imbalance ratio; oversampling method.
Threshold based Empirical Validation of Object-Oriented Metrics on Different Severity Levels
by Aarti Aarti, Geeta Sikka, Renu Dhir
Abstract: Software metrics have become a desideratum for fault-proneness, reusability and effort prediction. To strengthen the adequacy of object-oriented (OO) metrics, it is crucial to understand the relationship between OO metrics and fault-proneness at distinct severity levels. This paper focuses on investigating the software parts with a higher probability of faults. We examined the effect of thresholds on the OO metrics and built a predictive model based on those threshold values. This paper also addresses the empirical validation of the threshold values calculated for the OO metrics for predicting faults at different severity levels, and builds a statistical model using logistic regression. It describes the detection of fault-proneness by extracting the relevant OO metrics and focuses on those projects that fall outside the specified risk level, so that more resources can be allocated to them. We present the effects of threshold values at different risk levels and validate the results on the KC1 dataset using machine learning and different classifiers. The results, evaluated on receiver operating characteristic (ROC) parameters, indicate that the threshold methodology has great potential for fault prediction and show that machine learning techniques outperform logistic regression.
Keywords: Fault; Object-oriented (OO) metrics; Classification; ROC; Level of severity; Empirical Validation.
An Ensemble Clustering Method for Intrusion Detection
by Kapil K. Wankhade, Kalpana C. Jondhale
Abstract: The amount of data in the field of computer networking is growing rapidly, which poses new challenges for Intrusion Detection Systems (IDS). To handle this increasing volume of data, a new hybrid approach has to be developed that achieves a high detection rate and a low false alarm rate. An Intrusion Detection System plays a vital role in the detection of malicious attacks. Data mining and machine learning techniques are important and play a vital role in the detection of attacks. This paper mainly focuses on detection rate and false alarm rate, and to address them a hybrid method, ensemble clustering, is proposed. This method tries to increase the detection rate while lowering the false alarm rate. The method has been tested on the KDDCup99 network intrusion dataset and performs well compared with other algorithms in terms of detection rate and false alarm rate.
Keywords: boosting; classification; clustering; data mining; divide and merge; detection rate; false alarm rate; intrusion detection system; ensemble method; k-means.
Detecting Concept Drift using HEDDM in Data Stream
by Snehlata S. Dongre, Latesh G. Malik, Achamma Thomas
Abstract: In an evolving data stream, a change in the underlying concept is known as concept drift. Detecting and handling concept drift is a challenging task in data stream mining. If an algorithm is not adapted to concept drift, its performance is directly affected. A number of algorithms have been developed to handle concept drift, but they are not suited to both sudden and gradual concept drift. Thus, there is a demand for an algorithm that can react to both types of concept drift while incurring less computational cost. A new approach, the Hybrid Early Drift Detection Method (HEDDM), has been proposed for drift detection; it works with an ensemble method to improve performance.
Keywords: Concept drift; data stream; classification; ensemble classifier; concept drift detection; DDM; EDDM; HEDDM; data stream mining; evolving data stream.
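As background for the DDM keyword above, the classic Drift Detection Method can be sketched as below. This is the standard DDM rule (monitor the running error rate p and its standard deviation s, and compare p + s against the best values seen so far), not HEDDM itself, whose details are not given in the abstract. The class name, the 30-sample warm-up, the strict comparison and the synthetic error stream are assumptions made for the example.

```python
import math


class DDM:
    """Sketch of the Drift Detection Method on a stream of 0/1 errors."""

    def __init__(self, min_samples=30, warn_level=2.0, drift_level=3.0):
        self.min_samples = min_samples
        self.warn_level = warn_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """error is 1 for a misclassified example, 0 otherwise."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n                 # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)    # its standard deviation
        if self.n < self.min_samples:
            return "stable"
        if p + s < self.p_min + self.s_min:      # remember the best point so far
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()                         # start over after a drift
            return "drift"
        if p + s > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"


# Synthetic stream: 10% errors for 200 examples, then every example is an error.
stream = [1 if i % 10 == 9 else 0 for i in range(200)] + [1] * 100
detector = DDM()
first_drift = None
for i, err in enumerate(stream):
    if detector.update(err) == "drift":
        first_drift = i
        break
```

On this stream the detector stays quiet during the stable phase and signals a drift shortly after the error rate jumps at example 200.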
Dynamic Social Network Analysis and Performance Evaluation
by Sanur Sharma, Anurag Jain
Abstract: Social media usage is increasing tremendously, and with it the enormous amount of data generated, which includes personal details of users, their images and the content shared on such open platforms. This has led to a great deal of research on and analysis of such networks and the data that exists in social media. This paper focuses on the dynamic analysis of social networks, where snapshots of the network are taken at regular intervals and analysed on various performance measures. The real-time email dataset of a company (ENRON) has been evaluated and visualized dynamically. The network measures are evaluated at each timestamp, clustering is performed on that data, and its performance is calculated on various measures. The Tabu search optimization algorithm is used for clustering the timestamped data, and fixed-size clusters are compared with variable-size clusters. The results suggest that for certain timestamps the precision, recall and F-measure values for fixed-size clusters are better than those for variable-size clusters. These measures can further be used to select dynamic clustering techniques for social network analysis.
Keywords: Social Network; Dynamic Social Network; Clustering; Dynamic Network Analysis; Data Mining.
Special Issue on: Advances and Applications of Computational Intelligence
Speed Control of a Doubly-Fed Induction Machine (DFIM) Based on Fuzzy adaptive
by Abderazak SAIDI, Farid NACERI
Abstract: In this paper, we study and apply an adaptive fuzzy control technique based on Lyapunov theory. The system, based on stability theory, is used to approximate the gains Ke and kdce to ensure the stability of the control in real time. Simulation results obtained in the Matlab environment show that the fuzzy adaptive control is more robust and has superior dynamic performance. The results and robustness tests are presented.
Keywords: adaptive fuzzy control; Doubly fed Induction Machine (DFIM); Fuzzy Control; Robust control; regulator; stability.
Whale Optimization Algorithm Based Controller Design for Reverse Osmosis Desalination Plants
by Natwar Singh Rathore, Vinay Pratap Singh
Abstract: In this contribution, whale optimization algorithm (WOA) based controllers are presented for reverse osmosis (RO) desalination plants. Two proportional-integral-derivative (PID) controllers are designed for the flux and conductivity of the RO plant model. These controllers are tuned with a recently proposed algorithm, WOA. Minimization of the integral-of-squared-error (ISE) is taken as the performance index for designing the objective function. The performance of the proposed controllers is compared with that of controllers based on other optimization algorithms. Simulation results show the superiority of the WOA-based controllers over the other controllers. The proposed controllers are found to be the best for RO desalination plants in terms of control of the RO unit model.
Keywords: Conductivity; desalination; flux; integral-of-squared-error (ISE); proportional-integral-derivative (PID) controller; reverse osmosis (RO); whale optimization algorithm (WOA).
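The ISE performance index mentioned in the abstract above, the integral of the squared error over time, is the quantity a tuner such as WOA would minimize over the PID gains. The sketch below evaluates it for given gains. It is only an illustration: the first-order plant, the Euler integration, the gain values and the function name `simulate_ise` are assumptions, not the RO plant model or the tuning procedure used in the paper.

```python
def simulate_ise(kp, ki, kd, setpoint=1.0, tau=1.0, dt=0.01, t_end=10.0):
    """Simulate a PID loop around a hypothetical first-order plant
    tau * dy/dt = -y + u and return the ISE, approximated as sum(e^2 * dt)."""
    y = 0.0
    integral = 0.0
    prev_error = setpoint - y  # avoids a derivative kick on the first step
    ise = 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        error = setpoint - y
        ise += error * error * dt
        integral += error * dt
        derivative = (error - prev_error) / dt
        u = kp * error + ki * integral + kd * derivative
        y += dt * (-y + u) / tau   # forward Euler step of the plant
        prev_error = error
    return ise


# Candidate gain sets that an optimizer could compare by their ISE.
ise_p_only = simulate_ise(kp=1.0, ki=0.0, kd=0.0)
ise_pi = simulate_ise(kp=1.0, ki=1.0, kd=0.0)
```

With these assumed settings the proportional-only loop keeps a steady-state error, so its ISE grows with the simulation horizon, while adding integral action removes that offset and gives a lower ISE.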
Performance evaluation of conventional and Fuzzy control systems for speed control of a DC motor using Positive Output Luo Converter
by Mohamed BOUTOUBA, Abdelghani El Ougli, Belkassem Tidhaf
Abstract: Precise speed control of DC motors is an important requirement for efficient industrial automation and diverse application fields. In this paper, speed control of a DC motor for a photovoltaic system is proposed using a fuzzy logic controller with a DC-DC converter of the positive output Luo type. The positive output Luo converter, one of a new generation of DC-DC converters that offer multiple advantages, is used as an intermediary between the photovoltaic source and the DC motor in order to control the transmitted power with low power losses. Multiple classical control techniques could be used to control DC motor speed. However, in this work a PI fuzzy logic controller is proposed to obtain better tracking, response and speed accuracy, which are important parameters to control in some industrial applications. The different system blocks are developed in the Matlab/Simulink environment. Simulation results comparing a conventional PID controller with the PI fuzzy logic controller demonstrate the good behavior of the proposed system.
Keywords: DC motor; Speed control; Positive output Luo converter; PID controller; Fuzzy logic controller.
Evolutionary-based Method for Risk Stratification of Diabetic Patients
by Viorica Rozina Chifu, Emil Stefan Chifu, Ioan Salomie, Cristina Bianca Pop, Madalina Lupu
Abstract: Biologically-inspired computing is an interdisciplinary research domain that brings together principles from mathematics, computer science and biology in order to develop intelligent algorithms or high performance computing models that are able to capture the social behaviour of animals, insects, birds or other living organisms. Recently, bio-inspired computing has been successfully applied to solving problems in the e-health domain. This chapter addresses the problem of optimality in the e-health domain by proposing an evolutionary-inspired method for clustering patients according to their risk of having diabetes. This method clusters patients based on their similarity with respect to the following features: age, sex, race category, body mass index, whether or not the patient has hypertension, and the presence or absence of first-degree relatives with diabetes. Our method has been tested on the NHANES III data set.
Keywords: Patient Risk Stratification; Evolutionary Algorithms; Clustering indexes.
Design of an Adaptive Sliding Mode Controller for Efficiency Improvement of the MPPT for PV Water Pumping
by Sabah MIQOI, Abdelghani El Ougli, Belkassem Tidhaf
Abstract: This paper presents the design and simulation of a photovoltaic (PV) water pump together with a new maximum power point tracker (MPPT) control that ensures the PV system operates at maximum power under various climatic conditions. In particular, we propose a robust tracking controller, an adaptive sliding mode control (ASMC). Our system includes a PV panel, a DC/DC boost converter, a DC motor, a centrifugal water pump and an MPPT controller that generates the duty cycle for the boost converter. The proposed controller is compared with a sliding mode control (SMC) and a classic perturb and observe (P&O) algorithm. The system is simulated in MATLAB/SIMULINK, and the results show that the PV system functions well and that its performance improves with the proposed controller.
Keywords: MPPT controller; DC/DC boost converter; PV panel; SMC (sliding mode control); adaptive sliding mode control; P&O algorithm; MPP; water pump; DC motor.
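The classic perturb and observe (P&O) algorithm used as a comparison baseline in the abstract above can be sketched as follows: the duty cycle is nudged by a fixed step, and the direction is reversed whenever the measured power drops. This is a minimal sketch of plain P&O only, not the proposed ASMC controller. The quadratic power curve `pv_power`, its peak at a duty cycle of 0.6, and the step size are hypothetical stand-ins for a real PV panel and converter.

```python
def pv_power(duty):
    """Hypothetical PV power curve in watts with a single peak at duty = 0.6."""
    return 100.0 - 400.0 * (duty - 0.6) ** 2


def perturb_and_observe(power_fn, duty=0.3, step=0.01, iterations=100):
    """Plain P&O: keep perturbing in the same direction while power rises,
    reverse the direction when power falls."""
    direction = 1.0
    prev_power = power_fn(duty)
    for _ in range(iterations):
        # Perturb the duty cycle, keeping it inside the valid range [0, 1].
        duty = min(1.0, max(0.0, duty + direction * step))
        power = power_fn(duty)
        if power < prev_power:
            direction = -direction
        prev_power = power
    return duty


final_duty = perturb_and_observe(pv_power)
```

The fixed step means the operating point keeps oscillating around the maximum rather than settling on it, which is the usual motivation for replacing P&O with a more refined controller.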
POFGURST: An expert intelligent system for mechanized oil palm fruit evaluating framework
by Gaurang Patkar
Abstract: The POFGURST framework is a software package for palm oil fruit grading using rough set theory. It is a tool for grading that uses tabular data within the framework of rough set theory. POFGURST is intended to support the palm oil grading and knowledge discovery process: from initial browsing and preprocessing of the data, via computation of minimal attribute sets and generation of if-then rules or descriptive patterns, to validation and analysis of the induced rules or patterns. POFGURST offers a highly intuitive GUI environment where data navigation capabilities are emphasized. The software is specifically intended for oil palm fruit grading and also disease prediction.
Keywords: Rough Set Theory; agriculturist; fuzzy logic; robotization; Unified modeling language; Chlorosis; Ganoderma.