International Journal of Business Intelligence and Data Mining (64 papers in press)
STUDENTS SEARCH INTEREST MODEL OVER AN ORGANISATION BASED ON WEB LOG DATA.
by Sivakumaran A R, P. Marikkannu
Abstract: Browsing information on the web has become part and parcel of life. There is hardly any user who does not browse the web. Generally, a user searching for content looks only at the top ten pages displayed in the web search results. It is therefore proposed that information such as the links between visited and unvisited web pages, along with the path chosen in the search query, needs a novel technique to give the best performance. The Operation Feature Matrix (OFM) is one of the novel functionalities used for extracting the data from the web. For automatically identified user profiles, a graph-based representation called the Modified Page Outlook (MPO) graph is proposed, which involves links between the visited and the unvisited web pages.
Keywords: Web mining; Web log data; Inter-relation; Intra-relation; Total number of paths; Total number of relations; Modified Page Outlook.
Risk Assessment and Management (RAM) in Enterprise Resource Planning (ERP) by advanced system Engineering theory
by Valanarasu R, A. Christy
Abstract: Conducting a project related to enterprise resource planning (ERP) is an ambitious task. Any ERP business enterprise includes technological, psychosomatic and sociological characteristics. Managing such characteristics in any enterprise is highly complex. ERP as such is categorised on the basis of implementation, software, supply chain management, resources, management and optimization. These characteristics are high-risk to handle. The system still seems to be a growing system that has to be moulded into many forms. The aim of this manuscript is to present risk assessment and management (RAM) in enterprise resource planning (ERP) using advanced system engineering theory.
Keywords: Enterprise Resource Planning; Risk management; Risk Assessment; Risk Value.
SEMANTIC WEB SERVICE DISCOVERY FOR MOBILE WEB SERVICES
by Bhuvaneswari A, G.R. Karpagam
Abstract: Service discovery is the most important task in web services. However, the service discovery process may degrade network performance in a mobile environment. To overcome these issues, a new approach called semantic mobile web services (SMWS) is proposed. The proposed SMWS approach provides a better discovery process even in a highly mobile environment. By using query request (QRY REQ) and response (RSP) packets, the user can identify the location of the mobile node as well as discover the web services with minimum utilisation of bandwidth resources. During the service discovery process, a service registrar is included between the service requester and the service provider to reduce overhead. The matchmaking process produces exact-match responses for the respective requester queries. This helps to increase the quality of performance and network efficiency. The performance of the proposed semantic mobile web services has been analysed through simulation.
Keywords: Semantic Mobile Web Services (SMWS); Service Discovery; Matchmaking; Web Service Layered Architecture; Web Service Description Language (WSDL).
SURVEY OF VARIOUS METHODS FOR DIAGNOSTIC SIGNATURES FOR CUTANEOUS MELANOMA FROM GENETIC AND IMAGING DATA
by K. Thenmozhi, M. Rajesh Babu
Abstract: Early diagnosis of cutaneous melanoma is very hard even for experienced dermatologists. Although many advanced imaging techniques and clinical diagnostic algorithms, such as dermoscopy and the ABCD rule of dermoscopy respectively, are available, accuracy remains an issue of concern (estimated to be about 75-85%), especially with oblique pigmented lesions. An effective diagnosis can be achieved by reducing the observer variability found in dermatologists' examinations. Some existing methods need to be improved, and new techniques developed, to ease accurate, fast and reliable diagnosis of cutaneous melanoma. In this paper, the different parts of a melanoma diagnostic system, namely pre-processing, feature extraction, feature selection and classification, are explained. The results of feature selection were optimised using advanced classes of classification techniques; namely, two weighted k-nearest neighbour (k-NN) classifiers (k = 1, 30), a decision tree (DT), and the random forest (RF) algorithm were employed.
Keywords: Classification; composite biomarkers; Cutaneous Melanoma; dermoscopy; feature selection.
Detecting and Ranking events in Twitter using Diversity Analysis
by Daoud Daoud
Abstract: In Twitter and in other social media channels, detecting events is
very important and has many applications. However, this task is very
challenging because of the huge number of tweets that are posted every minute
and the massive scale of the spamming activities. In this paper, we present an
innovative approach for detecting events using data posted to Twitter. The
proposed approach is based on the concept of users' attention by quantitatively
modelling the diversity of hashtags using Shannon's index. Our method records
the diversity values as an hourly time series. Using statistical techniques,
the method locates the intervals having diversity values that fall outside the
range of forecasted ones (normal state). We also present the labelling and
ranking techniques that are implemented in this research. Experimental results
on a dataset consisting of 15 million Arabic tweets show that our proposed
approach can effectively detect real-world events in Twitter.
Keywords: social media; event detection; diversity index; Twitter; Arabic;
hashtags; time-series analysis; z-score; events labelling; events ranking.
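The diversity measure described in the abstract above can be illustrated with a minimal sketch. It assumes the standard Shannon index H = -Σ p_i ln p_i over hashtag frequencies within one hour, and it flags hours whose index deviates strongly from the series mean using a plain z-score. The function names, the threshold of 2.0 and the use of the global mean instead of a forecast are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def shannon_index(hashtags):
    """Shannon diversity H = -sum(p * ln p) over hashtag frequencies."""
    counts = Counter(hashtags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def flag_anomalous_hours(hourly_diversity, threshold=2.0):
    """Return indices of hours whose diversity has |z-score| above threshold.

    Assumption: the 'normal state' is approximated by the mean and standard
    deviation of the whole series, not by a forecasting model.
    """
    mu = mean(hourly_diversity)
    sigma = pstdev(hourly_diversity)
    if sigma == 0:
        return []
    return [i for i, h in enumerate(hourly_diversity)
            if abs(h - mu) / sigma > threshold]


if __name__ == "__main__":
    print(shannon_index(["#a", "#a", "#b", "#b"]))
    print(flag_anomalous_hours([1.0] * 9 + [0.0]))
```

A sharp drop in diversity means attention is concentrating on a few hashtags, which is the kind of interval the abstract treats as a candidate event.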
Developed global biotic cross pollination algorithm
by Sasikala Rani K, D. Rasi, S.N. Deepa
Abstract: This paper focuses on the visual-based colour image segmentation
with a global biotic cross pollination algorithm (GBCPA). The global biotic
cross pollination algorithm segments the structurally challenging objects based
on the colour, entropy and edge information in the CIE L*a*b* colour
space. The L*a*b* colour space is a colour-opponent space considered to
approximate human vision. L* denotes the luminosity or brightness layer,
chromaticity layer a* indicates where the colour falls along the red-green axis and
chromaticity layer b* indicates where it falls along the blue-yellow axis. The FPO algorithm
considering the global biotic cross pollination is proposed to improve the
quality of the solution and computational speed. GBCPA is first introduced to
find the locality of the solution. The performance of GBCPA is tested on a
standard Berkeley segmentation dataset with 300 images. The dataset is
illustrated under different evaluating strategies.
Keywords: Global Biotic Cross Pollination Algorithm (GBCPA); artificial intelligence; dataset; Berkeley segmentation; entropy; Evolutionary Algorithms.
UPFC damping controller design using multi-objective evolutionary algorithms
by Kannayeram G, P.S. Manoharan, M. Willjuice Iruthayarajan, T. Sivakumar
Abstract: In this paper, modified non-dominated sorting genetic algorithm-II
(MNSGA-II)-based optimal damping control of unified power flow controller
(UPFC) has been designed to enhance the damping of low frequency
oscillations in power systems. The robust damping of UPFC controller design
is formulated as a multi-objective optimisation problem, thereby minimising the
integral squared error (ISE) of speed deviation and input control signal (u)
under a wide range of operating conditions. The effectiveness of the proposed
controller is confirmed through nonlinear time domain simulation and
eigenvalue analysis. The results are compared with NSGA-II and the conventional
method. Simulation results reveal that the Pareto fronts obtained using
MNSGA-II-based UPFC controllers are better and uniformly distributed due to
the controlled elitism and dynamic crowding distance concepts. The proposed
modulation index of shunt inverter (mE)-based damping controller is superior to
the other damping controllers under different loading conditions and improves
the stability of the system.
Keywords: Flexible AC Transmission Systems (FACTS); Unified Power Flow Controller (UPFC); Non-dominated sorting genetic Algorithm (NSGA-II); Modified NSGA-II (MNSGA-II); Integral Squared Error (ISE); Genetic Al.
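The two objectives named in the abstract above, the integral squared error of the speed deviation and of the control signal, can be sketched numerically. The example assumes uniformly sampled simulation traces and approximates ISE = ∫ e(t)² dt with the trapezoidal rule. The function names and sample values are hypothetical and do not come from the paper.

```python
def integral_squared_error(samples, dt):
    """Approximate the integral of e(t)^2 over time with the trapezoidal rule.

    samples: uniformly spaced values of the signal e(t)
    dt: sampling interval in seconds
    """
    if len(samples) < 2:
        return 0.0
    squared = [s * s for s in samples]
    return dt * (sum(squared) - 0.5 * (squared[0] + squared[-1]))


def damping_objectives(speed_deviation, control_signal, dt):
    """Return the objective pair (ISE of speed deviation, ISE of u).

    Assumption: both objectives are minimised, as stated in the abstract.
    """
    return (integral_squared_error(speed_deviation, dt),
            integral_squared_error(control_signal, dt))


if __name__ == "__main__":
    print(damping_objectives([0.0, 0.02, 0.01, 0.0], [1.0, 0.5, 0.2, 0.0], 0.1))
```

A multi-objective optimiser such as NSGA-II would evaluate this pair for each candidate controller and keep the non-dominated trade-offs.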
Performance Amelioration of Manets Using cooperative Routing with Cross-Layer Design
by S. Mylsamy, J. Premalatha
Abstract: In traditional routing, various routing protocols are used for MANETs depending on the environment. The opportunistic routing (OR) protocol is used as the basic protocol. The basic function of OR is its ability to overhear the transmitted packet and to coordinate among relaying nodes. A new routing protocol named cooperative opportunistic routing based on geographic location (CORBG) has been implemented to get better network performance; in CORBG, a location-based adaptive mechanism is created that improves the network performance. In the proposed work, network optimisation can be attained by the CORBG protocol with a cross-layer design between the network and transport layers, and performance is evaluated on network QoS parameters such as throughput, delay and energy consumption. In MANETs, dynamic communications and decentralised administration make the network more vulnerable to attacks. To prevent such network problems, security mechanisms are introduced.
Keywords: cross layer design; cooperative opportunistic routing based on
geographic location; energy consumption.
FIRA: FOCUSED INFORMATION RETRIEVAL ALGORITHM
by N. Kalpana, S. Appavu Alias Balamurugan
Abstract: At present, multimedia data becomes available quickly because of advanced multimedia capturing devices such as digital video recorders, mobile cameras, etc. As conventional query-by-text retrieval does not satisfy users' needs in finding the desired videos effectively, content-based video retrieval stands out as the most sensible way to improve retrieval quality. Moreover, video retrieval using query-by-image is not effective in associating the videos with the users' interest. In this manuscript, we propose an innovative method to achieve high-quality content-based video retrieval by discovering temporal patterns. On the basis of the discovered activist patterns, an efficient indexing technique and an effective sequence matching technique are integrated to reduce the computation cost and to raise the retrieval accuracy. Experimental results reveal that our approach is very promising in enhancing content-based video retrieval with regard to efficiency and effectiveness on NPTEL.
Keywords: Text-based video retrieval; Activist pattern; string matching; Pattern-based search; Fast-pattern-index tree; NPTEL.
Recognition of Sign Language using Image Processing
by Sandhya Arora, Ananya Roy
Abstract: According to the World Health Organization, over 5% of the world's
population have hearing and speaking disabilities. The primary language of
communication for people who are deaf and mute is sign language. The
proposed system aims to recognise American Sign Language and convert
it to text. Input given to the system is an image of the hand depicting the
necessary alphabet. The histogram of the input image is then computed and
checked for similarity with the histograms of pre-saved images by using the
Bhattacharyya Distance Metric. Implementation of the system will be a small
step in overcoming the social barrier of communication between the deaf-mute
people and the people who do not understand sign language. OpenCV is used
as a tool for implementing the proposed system.
Keywords: American Sign Language; Bhattacharyya Distance Metric; OpenCV; histogram.
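The histogram comparison step in the abstract above can be sketched without any image library. The example assumes histograms are plain lists of bin counts and uses the textbook Bhattacharyya distance D = -ln Σ sqrt(p_i q_i) on normalised histograms. The template dictionary and its labels are hypothetical, and a library routine such as OpenCV's histogram comparison may use a differently normalised variant of the measure.

```python
import math


def normalise(hist):
    """Scale bin counts so that they sum to 1."""
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram has no counts")
    return [h / total for h in hist]


def bhattacharyya_distance(hist_a, hist_b):
    """Textbook Bhattacharyya distance between two histograms of equal length."""
    p, q = normalise(hist_a), normalise(hist_b)
    # Clamp to 1.0 to guard against floating-point overshoot.
    coefficient = min(1.0, sum(math.sqrt(a * b) for a, b in zip(p, q)))
    return math.inf if coefficient == 0 else -math.log(coefficient)


def best_match(query_hist, templates):
    """Return the label of the stored histogram closest to the query."""
    return min(templates,
               key=lambda label: bhattacharyya_distance(query_hist, templates[label]))


if __name__ == "__main__":
    # Hypothetical pre-saved histograms for two letters.
    templates = {"A": [7, 2, 1], "B": [1, 1, 8]}
    print(best_match([8, 1, 1], templates))
```

Identical distributions give a distance of zero and histograms with no overlapping bins give an infinite distance, so the smallest value picks the most similar stored sign.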
AN EMPIRICAL APPROACH FOR COMPLEXITY REDUCTION AND FAULT PREDICTION FOR SOFTWARE QUALITY ATTRIBUTE
by Rajkumar, Viji, S. Duraisamy
Abstract: Designing high-quality software is difficult due to high complexity and fault-prone classes. To reduce the complexity and predict the fault-prone classes in object-oriented software design, a new empirical approach is proposed. The proposed approach concentrates on increasing software quality in object-oriented programming structures. The technique collects the dataset and metric values from CK-based metrics, and the complexity is then calculated using a weighted approach. Fault prediction is based on the low usage of the dataset and the high-complexity dataset. This helps to increase the software quality. In the simulation section, the proposed approach is run and parameters such as accuracy, fairness, recall, prediction rate and efficiency are analysed. Experimental results show that the proposed approach increases the prediction rate, accuracy and efficiency.
Keywords: Complexity reduction; Fault prediction; Software design; Software Quality; CK based metrics.
Data mining of unstructured big data in cloud computing
by A.K. Reshmy, D. Paulraj
Abstract: Hadoop Distributed File System, Talend, MapReduce (MR), YARN and the Cloudera model have become popular technologies for large-scale data organisation and analysis. In our work, we identify the requirements of the covered data organisation and propose an extension to the present programming model, called Comprehensive Hadoop Distributed File System along with MapReduce (C-HDFS-MR), to address them. The extended interfaces are presented as an application programming interface and implemented in the context of the image processing application domain. We show the effectiveness of C-HDFS-MR through case studies of image processing functions along with the results. Although C-HDFS-MR has minimal overhead in data storage and I/O operations, it greatly improves system performance and simplifies the application development process. Our proposed framework, C-HDFS-MR, works without changes to the existing models, and can be used by numerous applications that require covered data.
Keywords: Big data; MapReduce (MR); Hadoop; Comprehensive Hadoop Distributed File System along with MapReduce (C-HDFS-MR); Medical Image Processing, Analysis, and Visualization (MIPAV).
A tabu search strategy to solve cell formation problem with ratio level data
by R. Kamalakannan, R. Sudhakara Pandian
Abstract: This paper concentrates on the cell formation problem for ratio level data in the design of cellular manufacturing systems. The aim of this paper is to identify the machine cells and part families and, as a result, to create production cells in order to reduce the cell load variation. A competent Tabu Search (TS) algorithm is proposed to investigate the search space of all possible solutions with a chain of moves. This method is an iterative process for seeking a global optimum for discrete combinatorial optimization problems. The ratio level data is calculated in terms of time in seconds based on the data collected from the processing time of the part, production volume of the part and availability of the machine. The results clearly indicate that the proposed tabu search yields good results compared to the chosen benchmark problems.
Keywords: Tabu Search Algorithm; Cell Formation Problem; Ratio Level Data; Modified Grouping Efficiency.
Efficient Hardware Architecture for Integer Implementation of Multi-Alphabet Arithmetic Coding for Data Mining
by S.D. Jayavathi, A. Shenbagavalli, B. Ganapathy Ram
Abstract: The aim of this paper is to create an efficient hardware architecture for multi-alphabet arithmetic coding (MA-AC) in semi-custom and full-custom Application Specific Integrated Circuit (ASIC) design. The proposed hardware architectures are synthesized on Xilinx and Altera Field Programmable Gate Array (FPGA) devices to evaluate resource utilization and speed. The physical design is also carried out as an ASIC device using the Cadence design environment with TSMC 0.18 µm technology, which shows area reduction of 12.75% and 23.61% and power consumption of 29.86% and 38.89% for encoder and decoder respectively.
Keywords: Multi-Alphabet Arithmetic coder; Encoder; Decoder; State diagram; Field Programmable Gate Array (FPGA); Application Specific Integrated Circuit (ASIC).
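For readers unfamiliar with multi-alphabet arithmetic coding, the interval-narrowing idea behind the abstract above can be sketched in software. This is not the integer hardware architecture of the paper: it uses exact rational arithmetic from Python's `fractions` module instead of fixed-width integers with renormalisation, and the symbol frequencies and function names are hypothetical.

```python
from fractions import Fraction


def build_model(freqs):
    """Map each symbol to (cumulative start, frequency) and return the total count."""
    model, start = {}, 0
    for symbol in sorted(freqs):
        model[symbol] = (start, freqs[symbol])
        start += freqs[symbol]
    return model, start


def encode(message, freqs):
    """Narrow [low, low + width) once per symbol and return a value inside it."""
    model, total = build_model(freqs)
    low, width = Fraction(0), Fraction(1)
    for symbol in message:
        start, freq = model[symbol]
        low += width * Fraction(start, total)
        width *= Fraction(freq, total)
    return low + width / 2


def decode(code, length, freqs):
    """Recover `length` symbols by repeating the same interval narrowing."""
    model, total = build_model(freqs)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        target = (code - low) / width * total
        for symbol, (start, freq) in model.items():
            if start <= target < start + freq:
                out.append(symbol)
                low += width * Fraction(start, total)
                width *= Fraction(freq, total)
                break
    return "".join(out)


if __name__ == "__main__":
    freqs = {"a": 5, "b": 2, "c": 1}  # hypothetical symbol counts
    code = encode("abacab", freqs)
    print(code, decode(code, 6, freqs))
```

An integer implementation replaces the unbounded fractions with fixed-width registers and shifts bits out as the interval shrinks, which is the part a hardware design has to make efficient.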
FREQUENT PATTERN SUB-SPACE CLUSTERING OPTIMIZATION (FPSSCO) ALGORITHM FOR DATAMINING FROM LARGE DATA BASE
by T. Sheik Yousuf, M. Indra Devi
Abstract: In a data mining environment, giving a quick response to the user by quickly and correctly picking out items from a large database is a very challenging task. Previously, multiple algorithms were proposed to identify frequent items, but they scan the database multiple times. To overcome those problems we propose a Rehashing-based Apriori Technique in which hashing is used to store the data in horizontal and vertical formats. Rehash-based Apriori uses a hashing function to reduce the size of the candidate item set and the number of database scans, eliminate non-frequent items and avoid hash collisions. After the frequent item sets are found, level-wise subspace clustering is performed. We introduce a Generalized Self-Organized Tree-Based (GSTB) mechanism that adaptively selects the root to construct the path from the cluster head to its neighbors when constructing the tree. Our experimental results show that the proposed mechanisms reduce the computational time of the overall process.
Keywords: Sub-space clustering; GSTB (Generalized Self organized Tree Based Cluster Head selection).
Prediction parameters in Nano Fiber Composite Membrane for Effective Air Filtration Using Optimal Neural network
by V.S. Kandavel, Gabriel Mohan Kumar
Abstract: The capacity to build up steady and extensive trench structures by means of extremely thin fibers would have wide technological implications. Here we report a procedure to design and fabricate a sandwich-structured polyamide-6/polyacrylonitrile/polyamide-6 (PA-6/PAN/PA-6) composite membrane. The membrane is suitable for effective air filtration and is produced via sequential electrospinning, combining the properties of its components; its mechanical properties are predicted with the help of the optimal weights of an ANN structure. Different inspired optimization strategies are used to arrive at the optimal weights of the ANN. The results show that the error values between the experimental values and the predicted values are close to zero in the designed network. In addition, the highest filtration efficiency and lower pressure drops are obtained, and the result shows a minimum error of 96.72% determined by the ANN. This is achieved by the AFSO strategy.
Keywords: Nanonets; Composite membrane; High efficiency; neural network; optimization techniques.
Behaviour-based Analysis of Tourism Demand in Egypt
by Taheya H. Ahmed, Mervat Abu-Elkheir, Ahmed Abou Elfetouh Saleh
Abstract: Tourism demand is the total number of persons who travel, or wish to travel, to use tourists' facilities and services at places away from their places of work or residence. Analysis of tourism demand helps companies understand tourists' needs and improve their marketing strategies. Current research for predicting tourism demand is targeted at foreign countries, and the little research targeted at predicting tourism demand in Egypt is based on macro forecasting and not on understanding the collective behavior of tourists. In this paper, we devise different granularities from tourist data that we collect and use these different granularities to provide different levels of demand prediction. We develop a hybrid prediction framework to analyze tourists' behavior and infer behavior rules. These rules will act as recommendations that help to understand tourists' behavior and their needs, and define future policies regarding tourism in Egypt.
Keywords: Tourist Demand; Clustering; Data Mining; Cobweb; Classification; Egyptian Tourism.
A Multi-Objective Analysis Model in Mass Real Estate Appraisal
by Benedetto Manganelli, Pierfrancesco De Paola, Vincenzo Del Giudice
Abstract: The purpose of this research is to analyse the performance of a real estate valuation model based on Multi Objective Decision Making methods. The optimal price function is obtained with a goal programming model. The price function is described as the sum of the individual objectives (criteria), and the goals are the prices of comparable properties. The model, which integrates the inductive and deductive approaches, overcomes many of the assumptions of the best-known statistical approaches. The proposed model is evaluated by comparing its results with those obtained by applying, to the same case study, a multiple linear regression model and a non-linear regression method based on a Penalized Spline Smoothing model. The comparison shows, first of all, the best interpretation capabilities of the proposed model.
Keywords: Goal programming; multi-criteria; real estate market; multi objective decision making.
Information graph-based creation of parallel queries for databases
by Yulia Shichkina, Dmitry Gushchanskiy, Alexander Degtyarev
Abstract: The article describes a query parallelisation method that takes into account the dependencies between operations in the data query. The method is based on the representation of the query as a directed graph with operations as vertices and data connections as edges. The graph is processed as an adjacency list, which saves memory compared with processing a sparse adjacency matrix. The graph is modified only by operations that do not change the elements of the adjacency list. It is therefore possible to achieve intra-query
parallelism by consideration of a request structure and implementation of
mathematical methods of parallel calculations for its equivalent transformation.
This article also presents an example of complex query parallelisation and
describes applicability of the graph theory and methods of parallel computing
both for query parallelisation and optimisation.
Keywords: parallel computing; optimization methods; relational database; query; information graph; query parallelisation.
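The graph representation described in the abstract above can be illustrated with a minimal sketch. It assumes the query is a directed acyclic graph stored as an adjacency list and groups operations into levels whose members have no unmet dependencies and could therefore run in parallel. The level-by-level traversal (Kahn's algorithm) and the operation names are illustrative assumptions, not the authors' transformation method.

```python
def parallel_levels(adjacency):
    """Group the vertices of a DAG into levels of mutually independent operations.

    adjacency: dict mapping each operation to the operations consuming its output.
    """
    indegree = {node: 0 for node in adjacency}
    for targets in adjacency.values():
        for target in targets:
            indegree[target] = indegree.get(target, 0) + 1

    ready = sorted(node for node, degree in indegree.items() if degree == 0)
    levels, processed = [], 0
    while ready:
        levels.append(ready)
        processed += len(ready)
        next_ready = []
        for node in ready:
            for target in adjacency.get(node, []):
                indegree[target] -= 1
                if indegree[target] == 0:
                    next_ready.append(target)
        ready = sorted(next_ready)

    if processed != len(indegree):
        raise ValueError("the query graph contains a cycle")
    return levels


if __name__ == "__main__":
    # Hypothetical query: two table scans feed a join, which feeds an aggregate.
    query = {"scan_a": ["join"], "scan_b": ["join"],
             "join": ["aggregate"], "aggregate": []}
    print(parallel_levels(query))
```

The adjacency list stores only existing edges, so memory grows with the number of data connections and not with the square of the number of operations, as it would for an adjacency matrix.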
OLAP technology and Machine learning as the tools for validation of the Numerical Models of Convective Clouds
by Elena N. Stankova, Andrey V. Balakshiy, Dmitry A. Petrov, Vladimir V. Korkhov
Abstract: In the present work we use machine learning and OLAP technologies for more accurate forecasting of phenomena such as thunderstorms, hail and heavy rain, using a numerical model of a convective cloud. Three machine learning methods, Support Vector Machine, Logistic Regression and Ridge Regression, are used to decide whether or not a dangerous convective phenomenon occurs under the present atmospheric conditions. OLAP technology is used to develop the concept of a multidimensional data base intended for distinguishing the types of phenomena (thunderstorm, heavy rainfall and light rain). A previously developed complex information system is used for collecting data about the state of the atmosphere and about the place and time at which dangerous convective phenomena are recorded.
Keywords: OLAP; online analytical processing; machine learning; validation of numerical models; numerical model of convective cloud; weather forecasting; thunderstorm; multidimensional data base; data mining.
Understanding urban development types and drivers in Wallonia. A multi-density approach
by Ahmed Mustafa, Ismaïl Saadi, Mario Cools, Jacques Teller
Abstract: In this study, the urban development process in the Walloon region (Belgium) has been analysed. Two main aspects of development are quantitatively measured: the development type and the definition of the main drivers of the urbanisation process. Unlike most existing studies that consider urban development as a binary process, this research considers urban development as a continuous process, characterized by different levels of urban density. Eight urban classes are defined based on the Belgian cadastral data for the years 2000 and 2010. A multinomial logistic regression model is employed to examine the main driving forces of the different densities. Sixteen drivers were selected, including accessibility, geo-physical features, policies and socio-economic factors. Finally, the changes from non-urban to one of the urban density classes are detected and classified into different development types. The results indicate that zoning status (political factor), slope, distance to roads, population densities and mean land price, respectively, have an impact on the urbanization process whatever the density may be. The results also show that the impact of these factors varies highly from one density to another.
Keywords: urban development; urban density; development type; driving forces; multinomial logistic regression model; cadastral data.
Epsilon-Fuzzy Dominance Sort Based Composite Discrete Artificial Bee Colony optimisation for Multi-Objective Cloud Task Scheduling Problem
by Gomathi B, Karthikeyan Krishnasamy, B. Saravana Balaji
Abstract: A cloud computing environment provides on-demand virtualized resources for cloud applications. The scheduling of tasks in a cloud application is a well-known NP-hard problem. The task scheduling problem is more complicated when it must satisfy multiple objectives that are conflicting in nature. In this paper, the Epsilon-fuzzy Dominance based Composite Discrete Artificial Bee Colony (EDCABC) approach is used to generate Pareto optimal solutions for the multi-objective task scheduling problem in the cloud. Three conflicting objectives, namely makespan, execution cost and resource utilization, are considered for the task scheduling problem. The Epsilon-fuzzy dominance sort approach is used to choose the best solutions from the Pareto optimal solution set in the multi-objective domain. EDCABC uses composite mutation strategies and a fast local search method to enrich the local searching behaviors, which helps to avoid premature convergence. The performance and efficiency of the proposed algorithm are compared with the NSGA-II and MOPSO algorithms. The simulation results show that the proposed EDCABC algorithm substantially minimizes the makespan and execution cost and ensures proper resource utilization when compared to the specified existing algorithms.
Keywords: Task scheduling; Discrete Artificial Bee Colony; Cloud computing; Makespan; Execution cost; Fuzzy Dominance.
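The notion of Pareto optimality that the abstract above relies on can be sketched briefly. The example uses plain Pareto dominance for objectives that are all minimised, not the Epsilon-fuzzy dominance sort proposed in the paper. The schedules and their (makespan, execution cost) values are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere.

    Assumption: every objective is to be minimised.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


if __name__ == "__main__":
    # Hypothetical schedules as (makespan, execution cost) pairs.
    schedules = [(10, 5.0), (8, 7.0), (12, 6.0), (9, 6.0)]
    print(pareto_front(schedules))
```

An objective that should be maximised, such as resource utilization, can be negated before the comparison so that the same minimisation test applies.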
Haphazard, Enhanced Haphazard and Personalised Anonymisation for Privacy Preserving Data Mining on Sensitive Data Sources
by Prakash M, G. Singaravel
Abstract: Privacy preserving data mining is a fast growing new area of research
due to recent advancements in information, data mining, communications
and security technologies. Government agencies and many other
non-governmental organisations often need to publish sensitive data that
contain information about individuals. The important problem is publishing
data about individuals without revealing sensitive information about them. A
breach in the security of sensitive data may expose the private information of
an individual, or the interception of a private communication may compromise
the security of sensitive data. The objective of the research is to publish data
without revealing the sensitive information of individuals, while at the same time the
miner needs to discover non-sensitive knowledge. To achieve the above
objective, haphazard anonymisation, enhanced haphazard anonymisation and
personalised anonymisation are proposed for privacy and utility preservation.
The performances are evaluated based on vulnerability to attacks, efficiency
and data utility.
Keywords: Anonymisation; Big Data; Data Analytics; Data Mining; Data Publishing; Microdata; Privacy Preserving Techniques; Privacy Preserving; Privacy; Sensitive Data Publishing.
Proposal and Examination of the FLAP Algorithm
by Daniel Giterman, Eyal Brill
Abstract: In real classification problems, common learning algorithms
generally fail to describe instances that require complicated classification logic.
Additionally, it is often difficult to ensure a satisfying amount of classified data
for their training. In this work, we propose and examine a new learning
algorithm that also integrates expert logic. Essentially, this algorithm takes
advantage of unclassified data to produce a self-generated fuzzy inference
system that is eventually used as a classifier. It also utilises a mere sample of
classified data in order to compare various classifiers constructed from different
algorithm options, thus finally achieving a presumably more accurate result.
As part of our study, this algorithm was compared with six well-known
supervised learning algorithms such as artificial neural networks, support
vector machine and random forest. We used the ten-fold cross-validation
technique with Kappa statistic to assess algorithm performance. Subsequently,
in order to find statistically significant dissimilarities among the algorithms, we
used a two-tailed Friedman test. After the null hypothesis was rejected, we used
a Nemenyi post-hoc test to prove differences between pairs of algorithms.
Consequently, despite lacking in efficiency and scalability, our algorithm
proved to be highly competitive and demonstrated excellent classification
Keywords: fuzzy logic; fuzzy inference systems; learning algorithms; semi-supervised learning; hybrid algorithms; data classification; algorithms comparison; statistical tests.
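The Kappa statistic used in the abstract above to assess classifier performance can be illustrated with a minimal sketch. It computes Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), from true and predicted labels, where p_o is the observed agreement and p_e the agreement expected by chance. The function name and sample labels are hypothetical, and returning 1.0 in the degenerate case p_e = 1 is a convention assumed here.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa between true and predicted labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: sum over labels of the product of marginal frequencies.
    expected = sum((true_counts[label] / n) * (pred_counts[label] / n)
                   for label in set(y_true) | set(y_pred))
    if expected == 1.0:
        return 1.0  # assumed convention: both lists contain a single identical label
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a ten-fold cross-validation this value would be computed once per fold, and the per-fold scores of the competing algorithms could then be ranked for a Friedman test.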
Modelling Economic Choice under Radical Uncertainty: Machine Learning Approaches
by Antov Gerunov
Abstract: This paper utilises a novel experimental dataset on consumer choice
to investigate and benchmark the performance of alternative statistical models
under conditions of extreme uncertainty. We compare the results of logistic
regression, discriminant analysis, na
Keywords: choice; decision-making; social network; machine learning; uncertainty; social network; logistic regression; neural network; random forest; consumer choice; modeling.
Estimation of Coffee Rust infection and growth through two-level classifier ensembles based on expert knowledge
by David Camilo Corrales Munoz, Emmanuel Lasso, Apolinar Figueroa Casas, Agapito Ledezma, Juan Carlos Corrales
Abstract: Rust is a disease that leads to considerable losses in the worldwide coffee industry. There are many contributing factors to the onset of coffee rust, e.g. crop management decisions and the prevailing weather. In Colombia, coffee production was reduced by 31% on average during the epidemic years compared with 2007. Recent research efforts focus on detection of disease incidence using computer science techniques such as supervised learning algorithms. However, a number of different authors demonstrate that results are not sufficiently accurate using a single classifier. Authors in the computer field propose alternatives for this problem, making use of techniques that combine classifier results. Nevertheless, the traditional approaches have limited performance due to the absence of datasets. Therefore, we propose two-level classifier ensembles for coffee rust infection and growth estimation in Colombian crops, based on expert knowledge.
Keywords: coffee; rust; classifier; ensemble; dataset; expert; knowledge.
Rough Set Theory-Based Feature Selection and FGA-NN Classifier for Medical Data Classification
by B. Vijayalakshmi, Sugumar Rajendran
Abstract: The prediction of heart disease is a difficult task, which needs much
experience and knowledge. In order to reduce the risk of heart disease
prediction, in this paper we propose a rough set theory-based feature selection
and FGA-NN classifier. The overall process of the proposed system consists of
two main steps: 1) feature reduction; 2) heart disease prediction. At
first, the kernel fuzzy c-means clustering with rough set theory (KFCMRS)
algorithm is applied to the high dimensional data to reduce the dimension of the
attribute. After that, the medical data classification is done through FGA-NN
classifier. To improve the prediction performance, hybridisation of firefly and
genetic algorithm (FGA) is utilised with NN for weight optimisation. At last,
the experimentation is performed by means of Cleveland, Hungarian, and
Switzerland datasets. The experimentation result proves that the FGA-NN
classifier outperformed the existing approach by attaining the accuracy of 83%.
Keywords: Heart disease; FGA-NN; KFCMRS; scaled conjugate gradient; prediction; feature reduction; optimisation.
An Improved Incremental Algorithm for mining Weighted Class Association Rules
by B. Subbulakshmi, C. Deisy
Abstract: Constructing fast and accurate classifiers for large data sets is an important task in data mining. Associative Classification can produce more efficient and accurate classifiers than traditional classification techniques. Weighted Class Association Rule (WCAR) mining reflects significance of items by considering their weight. Moreover, real time databases are dynamic. This influences the need for incremental approach for classification. Existing incremental classification algorithms suffer from issues like longer execution time and higher memory usage. This paper proposes an algorithm which uses hash structure to store weighted frequent items and the concept of difference of object identifiers to compute the support faster. For mining incremental databases, pre-large concept is used to reduce the number of re-scans over the original database. The proposed algorithm was implemented and tested on experimental data sets taken from UCI repository. The results show that proposed algorithm for mining WCARs gives better results compared to existing algorithm.
Keywords: Classification; Class Association Rules; Weighted Frequent Itemsets; Incremental Mining; Hash Structure.
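The "difference of object identifiers" idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard diffset formulation (support of an extended itemset equals the support of its parent minus the size of the diffset) and ignores item weights and the hash structure; the function name `pair_support` and the toy data are hypothetical and are not taken from the paper.

```python
def pair_support(all_tids, tids_x, tids_y):
    """Support of itemset {x, y} computed from diffsets instead of tidset intersection."""
    # Diffset of a single item: transactions that do NOT contain it.
    d_x = all_tids - tids_x
    d_y = all_tids - tids_y
    support_x = len(all_tids) - len(d_x)
    # Diffset of {x, y} relative to x: transactions with x but without y.
    d_xy = d_y - d_x
    return support_x - len(d_xy)


# Toy database (assumed for illustration): transaction id -> items.
transactions = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"a", "b", "c"}, 4: {"b"}}
all_tids = set(transactions)
tids = {item: {t for t, items in transactions.items() if item in items}
        for item in ("a", "b", "c")}

print(pair_support(all_tids, tids["a"], tids["b"]))
```

Diffsets tend to be much smaller than tidsets on dense data, which is why support counting on them can be faster.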
Students Performance Prediction using Hybrid Classifier Technique in Incremental Learning
by Roshani Ade
Abstract: Performance in higher education is a turning point in academics for all students. This academic performance is influenced by many factors; therefore, it is essential to develop a predictive data mining model for students' performance so as to distinguish between high learners and slow learners. Knowledge is hidden in the educational data set and is extractable through data mining techniques. In our paper, we use a hybrid classifier approach for the prediction of students' performance using Fuzzy ARTMAP and Bayesian ARTMAP classifiers. Sensitivity analysis was performed and irrelevant inputs were eliminated. The performance measures used to evaluate the techniques include Matthews Correlation Coefficient (MCC), Accuracy Rate, True Positive, False Positive and percentage of correctly classified instances. The combined result gives good accuracy for predicting students
Keywords: Hybrid Classifier; Incremental Learning; Fuzzy ARTMAP; MCC.
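For readers unfamiliar with the Matthews Correlation Coefficient named in the abstract above, the following is a minimal sketch of the standard binary-class formula. The function name `mcc` and the example counts are illustrative assumptions, not values from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts: 6 true positives, 3 true negatives, 1 false positive, 2 false negatives.
print(round(mcc(6, 3, 1, 2), 3))
```

MCC ranges from -1 (total disagreement) to +1 (perfect prediction) and stays informative when the classes are imbalanced.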
Multiple Object Tracking by Employing Shaped Based Features and Kalman Filter
by Felix M. Philip, Rajeswari Mukesh
Abstract: Multimedia and related technologies are developing fast,
particularly those associated with visual tracking and search
operations. Moving target detection has been widely applied in
various areas but has the disadvantages that the scheme is frequently complex
and that tracking is affected by numerous external factors. In this article,
multiple object recognition and tracking is proposed so as to improve the
method and make it more robust and general with the assistance of shape-based
features and a Kalman filter. First, the input video is converted to frames
and then manually segmented for object segmentation. Subsequently, the
objects are tracked with the help of Kalman filtering. The method is assessed
under the standard evaluation metrics of error value and score value. The
technique achieved a maximum score value of 95% and a minimum error value of
25%. The results validate the effectiveness of the technique.
Keywords: Moving Object Detection; Tracking; Shape based Features; Kalman Filtering and Segmentation.
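The Kalman filtering step mentioned in the abstract above can be sketched in its simplest scalar form. This is only an illustration under stated assumptions: a one-dimensional random-walk state model, hand-picked noise variances `q` and `r`, and a hypothetical function name `kalman_1d`. The tracker in the paper presumably uses a multi-dimensional state per object, which is not reproduced here.

```python
def kalman_1d(measurements, x0=0.0, p0=1.0, q=1e-3, r=0.1):
    """Scalar Kalman filter for a random-walk state observed with noise.

    x0, p0: initial state estimate and its variance (assumed values)
    q, r:   process and measurement noise variances (assumed values)
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed to stay put, uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# Hypothetical noisy positions of one object coordinate across frames.
positions = [10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
print(kalman_1d(positions)[-1])
```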
Multi Performance Parameters Analysis in a Manufacturing System Using Fuzzy Logic and Optimal Neural Network Model
by R. Prasanna Lakshmi, P. Nelson Raja
Abstract: Maintenance operations improve machine conditions but also
take up potential production time, possibly delaying customer orders. The
aim of this paper is to determine performance parameters at every workstation
in order to predict the cost, reliability and availability of the business. This
prediction analysis considers two different approaches:
FLP and an optimal neural network model. FLP is first used to predict the
performance parameters, and the accuracy of the analysis is then increased by means
of an ANN with an inspired optimisation technique to optimise the weights in the
structure. The optimal results show that the error
between the experimental outputs and the predicted values is
close to zero in the designed network. The results show that the
proposed KHO-based optimal neural network achieves an accuracy of 98.23%
when compared with the Pareto optimisation model.
Keywords: Preventive maintenance; optimisation; neural network; fuzzy logic and manufacturing industry.
Privacy Preserving Data Mining using Hiding Maximum Utility Item First Algorithm By means of Grey wolf optimisation Algorithm
by M.T. Ketthari, Rajendran Sugumar
Abstract: In privacy preserving data mining, utility mining plays a
vital part. The objective of the proposed technique is achieved by concealing
the highly sensitive item sets with the help of the hiding maximum utility item
first (HMUIF) algorithm, which evaluates the sensitive item sets by
exploiting the user-defined utility threshold value. It
estimates the sensitive item sets by utilising an optimal threshold value,
obtained by means of the grey wolf optimisation (GWO) algorithm. The performance
of the optimised threshold value is then analysed using
several criteria such as HF, MC and DIS. The technique is implemented,
and the item sets resulting from the optimal threshold are assessed and compared with
those of other optimisation approaches. The HMUIF algorithm considerably reduces
the computational complexity, thereby
improving the hiding performance for the item sets.
Keywords: Data Mining; Privacy Preserving Utility Mining; Sensitive Item sets; optimal threshold; Grey wolf optimisation.
Fuzzy- MCS algorithm based Ontology generation for E Assessment
by A. Santhanavijayan, S.R. Balasundaram
Abstract: Ontologies can lead to important improvements in the definition of a
course's knowledge domain, in the generation of an adapted learning path, and
in the assessment phase. This paper provides an initial discussion of the role of
ontologies in the context of e-learning. Generally, automatic assessment is
preferred over manual assessment to avoid bias errors and human errors and to
conserve teachers' time. Evaluation through objective tests like multiple
choice questions has gained a lot of importance in the e-assessment system.
Here we propose an efficient ontology generation method based on soft
computing techniques in e-assessment for multiple choice questions. We
employ fuzzy logic combined with an optimisation algorithm, the modified
cuckoo search algorithm. A set of rules is first designed for creating the
ontology. The rules are generated using fuzzy logic and these rules are
optimised in order to generate a better ontology structure.
Keywords: Ontologies; MCS algorithm; Fuzzy; e-learning.
Minimal constraint based cuckoo search algorithm for Removing Transmission Congestion and Rescheduling the Generator units
by N. Chidambararaj, K. Chitra
Abstract: In this paper, a minimal constraint based cuckoo search (CS)
algorithm is proposed for solving the transmission congestion problem by
considering both increases and decreases in generation power. The proposed
algorithm is used to optimise the real power changes of the generators when
transmission congestion occurs. Then, the power loss, generator sensitivity
factor and congestion management cost of the system are evaluated by the
proposed algorithm according to the transmission congestion. The proposed
method is implemented on the MATLAB platform and its congestion
management performance is analysed. The performance of the proposed
method is compared with that of existing methods such as fuzzy adaptive
bacterial foraging (FABF), simple bacterial foraging (SBF), particle swarm
optimisation (PSO), and artificial neural network (ANN)-CS. The
congestion management cost is reduced by up to 26.169%. The
comparison shows that the proposed technique outperforms
the existing techniques in terms of congestion management measures.
Keywords: minimal constraint based CS algorithm; PSO; ANN; real power; congestion management; power loss and congestion management cost.
Effective Discovery of Missing Links in Citation Networks Using Citation Relevancy Check Process
by Nivash J P, L.D. Dhinesh Babu
Abstract: Effective dissemination of knowledge published by eminent authors
in reputed journals and ensuring that the referred work is cited properly is the
need of the hour. Citation analysis is about the similarity measures of articles or
journals which are put forward to scaling as well as clustering procedures. A
proper citation relevancy check (CRC) is required to avoid the missing links in
the citation networks. Both similar and dissimilar references in the articles have
important article citations. The purpose of this work is to devise a method to find
the most significant articles which can provide useful information to the journal
editors and writers. The strategy presented in this paper can assist an author to
incorporate most important articles and can help the editor in evaluating the
quality of the references. The main benefit in detecting the missing articles is
improvement in quality of research along with increased citation count.
Keywords: Citation network analysis; Missing citations; Citation relevancy check; Increasing citation count.
A Distributed Cross-layer Recommender System Incorporating Product Diffusion
by Ephina Thendral, C. Valliyammai
Abstract: In this era of online retailing, personalisation of web content has
become very essential. Recommender system is a tool for extraction of relevant
information to render personalisation in web information retrieval systems.
With an inclination towards customer oriented service, there is a need to
understand the adaptability of customers, to provide products/services of
interest at the right time. In this paper, a model for distributed context aware
cross layer recommender system incorporating the principle of product
diffusion is proposed. The offline-online modelled recommender system learns
offline about the adaptation time of users using the principle of product
diffusion and then, uses online explore-then-exploit strategy to make effective
recommendations to the user at the most probable time of consumption. Also,
an algorithm based on product adaptability is proposed for recommending new
items to the most probable users. The extensive experiments and results
demonstrate the efficiency, scalability, reliability and enhanced retrieval
effectiveness of the proposed recommender system model.
Keywords: Recommender Systems; Personalization; Product Diffusion; Distributed Graph Model; Hadoop; Hbase; Titan graph database; Spark; Cross layer; Distributed processing.
A Critique of Imbalanced Data Learning Approaches for Big Data Analytics
by Amril Nazir
Abstract: Biomedical research becomes reliant on multi-disciplinary,
multi-institutional collaboration, and data sharing is becoming increasingly
important for researchers to reuse experiments, pool expertise and validate
approaches. However, there are many hurdles for data sharing, including the
unwillingness to share, lack of flexible data model for providing context
information for shared data, difficulty to share syntactically and semantically
consistent data across distributed institutions, and expensive cost to provide
tools to share the data. In our work, we develop a web-based collaborative
biomedical data sharing platform SciPort to support biomedical data sharing
across distributed organisations. SciPort provides a generic metadata model for
researchers to flexibly customise and organise the data. To enable convenient
data sharing, SciPort provides a central server-based data sharing architecture,
where data can be shared by one click through publishing metadata to the
central server. To enable consistent data sharing, SciPort provides collaborative
distributed schema management across distributed sites. To enable semantic
consistency for data sharing, SciPort provides semantic tagging through
controlled vocabularies. SciPort is lightweight and can be easily deployed for
building data sharing communities for biomedical research.
Keywords: imbalanced big data learning; large-scale imbalanced data analysis; high-dimensional imbalanced data learning.
A Novel Multi-class Ensemble model based on feature selection using Hadoop framework for classifying imbalanced Biomedical Data
by THULASI BIKKU, N. Sambasiva Rao, Ananda Rao Akepogu
Abstract: Due to the exponential growth of biomedical repositories such as
PubMed and Medline, an accurate predictive model is essential for knowledge
discovery in Hadoop environment. Traditional decision tree models such as
multi-variate Bernoulli model, random forest and multinomial na
Keywords: Ensemble model; Hadoop; Imbalanced data; Medical databases; Textual Decision Patterns.
An optimised approach to detect the identity of hidden information in gray scale and colour images
by Murugeswari Ganesan, Deisy Chelliah, Ganesan Govindan
Abstract: Feature-based steganalysis, an emerging trend in the domain of
information forensics, aims to discover the identity of secret information
present in covert communication by analysing the statistical features of the
cover/stego image. Due to massive volumes of auditing data as well as the complex
and dynamic behaviour of steganogram features, optimising those features is
an important open problem. This paper focuses on optimising the number of
features using the proposed quick artificial bee colony (qABC) algorithm. We
tested three steganalysers, namely subtractive pixel adjacency matrix
(SPAM), phase aware projection model (PHARM) and colour filter array
(CFA), on the break our steganographic system (BOSS) 1.01 datasets. The
significant improvement in the convergence of qABC quickly improves
the solution and fine-tunes the search better than its real counterparts. The results
reveal that the qABC method with a support vector machine (SVM) classifier
outperforms the non-optimised version in terms of classification accuracy and
reduced number of features.
Keywords: Steganalysis; Feature Selection; Optimisation; Classification.
An Effective Preprocessing Algorithm for Model Building in Collaborative Filtering based Recommender System
by Srikanth T, M. Shashi
Abstract: Recommender systems suggest interesting items to online users based on the ratings they have expressed for other items, which are maintained globally as the rating matrix. The rating matrix is often sparse and very large because a large number of users express ratings for only a few items among the many alternatives. Sparsity and scalability are the main challenges in achieving accurate predictions in recommender systems. This paper focuses on a model building approach to collaborative filtering-based recommender systems using low rank matrix approximation algorithms to achieve scalability and accuracy while dealing with sparse rating matrices. A novel preprocessing methodology is proposed to counter the data sparsity problem by making the sparse rating matrix denser before extracting latent factors that appropriately characterise the users and items in a low dimensional space. The quality of predictions made either directly or indirectly through user clustering was investigated and found to be competitive with existing collaborative filtering methods in terms of reduced MAE and increased NDCG values on benchmark datasets.
Keywords: Recommender System; Collaborative Filtering; Dimensionality Reduction; Pre-Processing; Sparsity; Scalability; Matrix Factorization.
Error Tolerant Global Search Incorporated With Deep Learning Algorithm to Automatic Hindi Text Summarization
by J. Anitha, P.V.G.D. Prasad Reddy, M.S. Prasad Babu
Abstract: There is an exponential growth in the available electronic
information in the last two decades. It causes a huge necessity to quickly
understand high volume text data. This paper describes an efficient algorithm
and it works by assigning scores to sentences in the document which is to be
summarised. It also focuses on document extracts; a particular kind of
computed document summary. The proposed approach uses fuzzy classifier and
deep learning algorithm. Fuzzy classifier produces score for each sentence and
the deep learning (DL) also produces score for each sentence. The combination
of score from both fuzzy classifier and DL produces the hybrid score. Finally,
the summarised text can be generated based on this hybrid score. In our
proposed approach, we have achieved an average precision rate of 0.92 and
average recall rate of 0.88 and the compression rate is 10% according to the
Keywords: GSA; Fuzzy; summarisation; hybrid; deep learning.
Network Affinity Aware Energy Efficient Virtual Machine Placement Algorithm
by Ranjana Ramamurthy, S. Radha, J. Raja
Abstract: Efficient mapping of virtual machine request to the available
physical machine is an optimisation problem in data centres. It is solved by
aiming to minimise the number of physical machines and utilising them to their
maximum capacity. Another avenue of optimisation in data centre is the energy
consumption. Energy consumption can be reduced by using fewer physical
machines for a given set of VM requests. An attempt is made in this work to
propose an energy efficient VM placement algorithm that is also network
affinity aware. Considering the network affinity between VMs during the
placement will reduce the communication cost and the network overhead. The
proposed algorithm is evaluated using the Cloudsim toolkit and the
performance in terms of energy consumed, communication cost and number of
active PMs, is compared with the standard first fit greedy algorithm.
Keywords: Virtualisation; affinity aware; cloud computing; virtual machine placement; network affinity.
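The first fit greedy baseline that the abstract above uses for comparison can be sketched as follows. This is a simplified illustration, assuming a single resource dimension and identical physical machines; the function name `first_fit` and the demand values are hypothetical, and the network-affinity-aware algorithm proposed in the paper is not reproduced.

```python
def first_fit(vm_demands, pm_capacity):
    """Place each VM on the first physical machine (PM) with enough spare capacity.

    Returns (placement, active_pm_count), where placement[i] is the PM index of VM i.
    """
    loads = []       # current load of each active PM
    placement = []
    for demand in vm_demands:
        if demand > pm_capacity:
            raise ValueError("VM demand exceeds the capacity of a single PM")
        for pm, load in enumerate(loads):
            if load + demand <= pm_capacity:
                loads[pm] += demand
                placement.append(pm)
                break
        else:
            # No active PM can host the VM, so a new PM is switched on.
            loads.append(demand)
            placement.append(len(loads) - 1)
    return placement, len(loads)


# Hypothetical CPU demands for six VM requests on PMs of capacity 10.
placement, active = first_fit([4, 8, 1, 4, 2, 1], pm_capacity=10)
print(placement, active)
```

Fewer active PMs generally means lower energy use, which is the quantity the abstract reports alongside communication cost.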
A Secured Best Data Center Selection in Cloud Computing Using Encryption Technique
by Prabhu A., M. Usha
Abstract: In this work, we propose an approach for providing very high security to the cloud system. Our proposed method comprises three phases, namely an authentication phase, a cloud data centre selection phase and a user related service agreement phase. To access data from the cloud server, a secure authentication key is needed. In the authentication phase, the user's authentication is verified and the user obtains the key and then encrypts the file using the blowfish algorithm. Before encryption, the input data is divided column-wise with the help of a pattern matching approach. The encryption and decryption processes are carried out by employing the blowfish algorithm. The cloud data centre in which to store the data is selected optimally with the help of the bat algorithm. In the final phase, the user service agreement is verified. The implementation will be done using the CloudSim simulator.
Keywords: Authentication key; blowfish; Bat algorithm; pattern match; Cloud Data Center Selection.
Combined Local color curvelet and mesh pattern for image retrieval system
by YESUBAI RUBAVATHI
Abstract: This manuscript presents a content based image retrieval
system using new textural features, namely a colour local curvelet (CLC) based
textural descriptor and colour local mesh pattern (CLMP), with the intention
of increasing the performance of the image retrieval system. The proposed
methods can utilise the distinctive details obtained from spatial
coloured textural patterns of various spectral components within a particular
local image region. Furthermore, to gain the benefit of the harmonising
effect of joint colour texture information, the opponent colour textural
features that capture the texture patterns of spatial interactions among spectral
planes are also integrated into the creation of CLC and CLMP. Extensive and
comparative experiments have been conducted on two benchmark databases,
i.e., Corel-1k and MIT VisTex. Retrieval results show that image retrieval using
colour local texture features yields better precision and recall than retrieval
approaches using either colour or texture features alone.
Keywords: Content based image retrieval system; Curvelet transform; Local mesh pattern; Color local curvelets; Color local mesh pattern.
FUZZY BASED AUTOMATED INTERRUPTION TESTING MODEL FOR MOBILE APPLICATIONS
by Malini A, K. Sundarakantham, C. Mano Prathibhan, A. Bhavithrachelvi
Abstract: Testing of mobile applications during the occurrence of interrupts is
termed as interrupt testing. Interrupts can occur either internally within the
mobile or from other external factors or systems. Interruption in any smart
phones may decrease the performance of mobile applications. In this paper, an
automated interruption testing model is proposed to analyse the responsiveness
of mobile applications during interrupts. This model monitors the applications
installed in the mobile devices and evaluates the overall performance of mobile
applications during interrupt using fuzzy logic. An enhanced MobiFuzzy
evaluation system (MFES) is proposed that is used to dynamically analyse the
test results and identify necessary information required for tuning the
application. Fuzzy logic will help the developers or testers in tuning the
application performance; by automatically categorising the impact
Keywords: Mobile application testing; Interrupt testing; Application tracker; Performance testing.
Evolution of Singular Value Decomposition in Recommendation Systems : A Review
by Rachana Mehta, Keyur Rana
Abstract: Proliferation of internet and web applications has led to
exponential growth of users and information over web. In such information
overload scenarios, recommender systems have shown their prominence by
providing user with information of their interest. Recommender systems
provide item recommendation or generate predictions. Amongst the various
recommendation approaches, collaborative filtering techniques have emerged
well because of their wide item applicability. Model-based collaborative
filtering techniques, which use a parameterised model for prediction, are
preferred over their memory-based counterparts. However, the
existing techniques deal with static data and are less accurate over sparse,
high dimensional data. In order to alleviate such issues, matrix factorisation
techniques like singular value decomposition are preferred. These techniques
have evolved from using simple user-item rating information to auxiliary
social and temporal information. In this paper, we provide a comprehensive
review of such matrix factorisation techniques and their applicability to
different input data.
Keywords: Recommendation System; Collaborative filtering; Matrix factorization; Singular Value Decomposition; Information retrieval; Data mining; Auxiliary information; Latent features; Model learning; Data sparsity.
Investigating Different Fitness Criteria for Swarm-based Clustering
by Maria P.S. Souza, Telmo M. Silva Filho, Getulio J.A. Amaral, Renata M.C.R. Souza
Abstract: Swarm-based optimisation methods have been previously used for
tackling clustering tasks, with good results. However, the results obtained by
this kind of algorithm are highly dependent on the chosen fitness criterion.
In this work, we investigate the influence of four different fitness criteria
on swarm-based clustering performance. The first function is the typical
sum of distances between instances and their cluster centroids, which is the
most used clustering criterion. The remaining functions are based on three
different types of data dispersion: total dispersion, within-group dispersion
and between-groups dispersion. We use a swarm-based algorithm to optimise these criteria and perform clustering tasks with nine real and artificial
datasets. For each dataset, we select the best criterion in terms of adjusted
Rand index and compare it with three state-of-the-art swarm-based clustering
algorithms, trained with their proposed criteria. Numerical results confirm the
importance of selecting an appropriate fitness criterion for each clustering
Keywords: Swarm Optimisation; Fitness criterion; Clustering; Artificial Bee Colony; Particle Swarm Optimisation.
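The first fitness criterion described in the abstract above, the sum of distances between instances and their cluster centroids, can be sketched as below. The sketch assumes Euclidean distance and nearest-centroid assignment; the function name `sum_of_distances` and the toy data are hypothetical. A swarm-based optimiser would evaluate such a function for each candidate set of centroids and try to minimise it.

```python
import math


def sum_of_distances(points, centroids):
    """Fitness of a candidate solution: total distance of each point to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points)


# Toy data (assumed): two well separated groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = [(0.0, 1.0), (10.0, 1.0)]   # one centroid per group
poor = [(5.0, 1.0), (5.0, 1.0)]    # both centroids in the middle

# A lower value means a better candidate under this criterion.
print(sum_of_distances(points, good), sum_of_distances(points, poor))
```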
A Combined PFCM and Recurrent Neural Network based Intrusion Detection System for Cloud Environment
by Manickam M., N. Ramaraj, C. Chellappan
Abstract: The main objective of this paper is to present an intrusion detection system for a
cloud environment using combined PFCM-RNN. Traditional IDSs are not
suitable for cloud environment as network-based IDSs (NIDS) cannot detect
encrypted node communication, also host-based IDSs (HIDS) are not able to
find the hidden attack trail. The traditional intrusion detection is largely
inefficient to be deployed in cloud computing environments due to their
openness and specific essence. Accordingly, this proposed work consists of two
modules namely clustering module and classification module. In clustering
module, the input dataset is grouped into clusters with the use of possibilistic
fuzzy C-means clustering (PFCM). In classification module, the centroid from
the clusters is given to the recurrent neural network which is used to classify
whether the data is intruded or not. For experimental evaluation, we use the
benchmark database and the results clearly demonstrate the proposed technique
outperformed conventional methods.
Keywords: cloud computing; intrusion detection system; Possibilistic Fuzzy C-means clustering; recurrent neural network.
Master node fault tolerance in distributed big data processing clusters
by Ivan Gankevich, Yuri Tipikin, Vladimir Korkhov, Vladimir Gaiduchok, Alexander Degtyarev, A. Bogdanov
Abstract: Distributed computing clusters are often built with commodity
hardware which leads to periodic failures of processing nodes due to
relatively low reliability of such hardware. While worker node fault-tolerance
is straightforward, fault tolerance of master node poses a bigger challenge.
In this paper master node failure handling is based on the concept of master
and worker roles that can be dynamically re-assigned to cluster nodes along
with maintaining a backup of the master node state on one of worker
nodes. In such case no special component is needed to monitor the health
of the cluster while master node failures can be resolved except for the
cases of simultaneous failure of master and backup. We present experimental
evaluation of the technique implementation, show benchmarks demonstrating
that a failure of a master does not affect running job, and a failure of backup
results in re-computation of only the last job step.
Keywords: parallel computing; Big Data processing; distributed computing; backup node; state transfer; delegation; cluster computing; fault-tolerance.
The integration of a newly defined N-gram concept and vector space model for documents ranking
by Mostafa Salama, Wafaa Salah
Abstract: Vector space model (VSM) is used in measuring the similarity
between documents according to the frequency of common words among
them. Furthermore, the N-gram concept is integrated in VSM to put
into consideration the relation between common consecutive words in the
documents. This approach does not consider the context and semantic
dependency between nonconsecutive words existing in the same sentence.
Accordingly, the approach proposed here presents a new definition of the
N-gram concept as N non-consecutive words located in the same sentence,
and utilises this definition in the VSM to enhance the measurement of
the semantic similarity between documents. This approach measures and
visualises the correlation between the words that are commonly existing
together within the same sentence to enrich the analysis of domain experts.
The results of the experimental work show the robustness of the proposed
approach against the current ranking models.
Keywords: N-gram; vector space model; Text Mining.
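The redefined N-gram in the abstract above (N words that share a sentence without having to be adjacent) can be sketched as a feature extractor for a vector space model. This is one possible reading, under stated assumptions: features are unordered combinations of distinct words within a sentence (adjacent words included), sentences are split on `.`, `!` and `?`, and similarity is plain cosine over raw counts. The names `sentence_ngrams` and `cosine` are hypothetical, and the paper's exact weighting is not reproduced.

```python
import math
import re
from collections import Counter
from itertools import combinations


def sentence_ngrams(text, n=2):
    """Count unordered n-word combinations that co-occur within the same sentence."""
    features = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = sorted(set(re.findall(r"[a-z0-9]+", sentence.lower())))
        features.update(combinations(words, n))
    return features


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


doc1 = "Text mining ranks documents. Ranking uses word frequency."
doc2 = "Documents are ranked by mining text. Frequency matters."
print(cosine(sentence_ngrams(doc1), sentence_ngrams(doc2)))
```

Because the pair ("mining", "text") is counted even when other words sit between the two terms, documents that phrase the same idea differently can still share features.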
MODELLING AND SIMULATION OF ANFIS-BASED MPPT FOR PV SYSTEM WITH MODIFIED SEPIC CONVERTER
by M. Senthil Kumar, P.S. Manoharan, R. Ramachandran
Abstract: This paper presents modelling and simulation of artificial
neuro-fuzzy inference system (ANFIS) based maximum power point tracking
(MPPT) algorithm for PV system with modified SEPIC converter. The
conventional existing MPPT methods are having major drawbacks of high
oscillations at maximum power point and low efficiency due to uncertain
nature of solar radiation and temperature. These mentioned problems can be
solved by the proposed adaptive (ANFIS) based MPPT. The proposed work
involves ANFIS and modified single ended primary inductor converter
(SEPIC) to extract maximum power from PV panel. The results obtained from
proposed methodology are compared with other MPPT algorithms such as
perturb and observe (P&O), incremental conductance (INC) and radial basis
function network (RBFN). The improvement in voltage rating of modified
SEPIC is compared with conventional SEPIC converter. The result confirms
the superiority of the proposed system.
Keywords: ANFIS; INC; Modified SEPIC; P&O; RBFN.
Distributed Algorithms for Improved Associative Multilabel Document Classification considering Reoccurrence of Features and handling Minority Classes
by Preeti Bailke, S.T. Patil
Abstract: Existing work in the domain of distributed data mining mainly
focuses on achieving the speedup and scaleup properties rather than improving
performance measures of the classifier. Improvement in speedup and scaleup is
obvious when distributed computing platform is used. But its computing power
should also be used for improving performance measures of the classifier. This
paper focuses on the same by considering reoccurrence of features and
handling minority classes. Since it is very time consuming to run such complex
algorithms on large datasets sequentially, distributed versions of the algorithms
are designed and tested on the Hadoop cluster. Base associative classifier is
designed based on multi-class, multi-label associative classification (MMAC)
algorithm. Since no similar distributed algorithms exist, proposed algorithms
are compared with the base classifier and have shown improvement in classifier
Keywords: Multilabel associative classifier; Hadoop; Pig; Feature reoccurrence; Minority Class; Distributed Algorithm.
A survey on time series motif discovery
by Cao Duy Truong, Duong Tuan Anh
Abstract: Time series motifs are repeated subsequences in a long time series.
Discovering time series motifs is an important task in time series data mining
and this problem has received significant attention from researchers in data
mining communities. In this paper, we intend to provide a comprehensive
survey of the techniques applied for time series motif discovery. The survey
also briefly describes a set of applications of time series motif in various
domains as well as in high-level time series data mining tasks. We hope that
this article can provide a broad and deep understanding of the time series motif
Keywords: time series; motif discovery; window-based; segmentation-based; motif applications.
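Illustrative sketch (not taken from the survey): the definition of a motif as a repeated subsequence can be made concrete with an exhaustive search for the closest pair of non-overlapping subsequences. The function names, the z-normalisation step and the toy series are assumptions chosen for illustration; the techniques covered by the survey are designed to avoid this quadratic cost.

```python
import math


def znorm(seq):
    """Z-normalise a subsequence so that matching ignores offset and scale."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / len(seq))
    if std == 0:
        return [0.0] * len(seq)
    return [(v - mean) / std for v in seq]


def find_motif_pair(series, m):
    """Return (distance, i, j) for the closest pair of non-overlapping
    length-m subsequences, found by exhaustive comparison."""
    subs = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (math.inf, -1, -1)
    for i in range(len(subs)):
        # Starting j at i + m skips overlapping (trivial) matches.
        for j in range(i + m, len(subs)):
            d = math.dist(subs[i], subs[j])
            if d < best[0]:
                best = (d, i, j)
    return best


# Toy series (hypothetical): the pattern 0, 1, 3, 1, 0 occurs at offsets 0 and 9.
series = [0, 1, 3, 1, 0, 5, 2, 7, 4, 0, 1, 3, 1, 0, 6]
distance, i, j = find_motif_pair(series, m=5)
```

Here the search reports the two occurrences of the repeated pattern as the motif pair.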
Trust Management Scheme for Authentication in Secure Cloud Computing Using Double Encryption Method
by P. Sathishkumar, V. Venkatachalam
Abstract: In cloud computing and banking, both consumers and suppliers
require security and trust in the services they use. This paper proposes
a trust-value-based authentication procedure supported by encryption.
In the authentication phase, a bank marketing database is clustered with
the kernel fuzzy c-means (KFCM) method, and the clustered data are
stored in the cloud for trust-based verification. The consumer is then
authenticated and obtains an authentication key, after which the file is
encrypted by the double encryption algorithm. First, homomorphic encryption
is applied to the most trusted data; the data are then encrypted with the
Blowfish algorithm and the encrypted data are stored in the cloud data
centre. In this way, banking data can be reliably authenticated in cloud
computing. The results show improved encryption time and highly reliable
authentication of the data in
Keywords: Authentication; Cloud Security; Cloud Services; Trust Management; clustering; cloud computing; encryption and decryption.
Technology in its Context
by Tobias Christian Fischer
Abstract: The purpose of the literature review is to identify characteristics,
concepts, and theories of business intelligence (BI). The status quo of BI is
based on the literature review, which covers 86 journal articles from the three
areas of accounting, strategy, and information systems between 2006 and 2014.
The review combines two established frameworks to illustrate new insights
regarding the macro and micro levels of BI. The complementary combination
of both levels produces a new lens that shows the conceptualisation and
characteristics of BI in a holistic view. The result of the study shows that BI is
used as a monolithic concept and static tool with technical control mechanisms.
Another result implies that BI is in a phase of maturity, in which it fulfils an
organisational purpose without considering its social context or ecosystem in
which it occurs. The literature review contributes to the characterisation and
theorisation of BI and shows that a company depends on both characteristics and
the purpose for which BI is used.
Keywords: accounting; business intelligence; conceptualization; information systems; literature review; macro level; measurement system; micro level; strategy; technology.
Trajectory tracking of the robot end-effector for the minimally invasive surgeries
by Jose De Jesus Rubio, Panuncio Cruz, Enrique Garcia, Cesar Felipe Juarez, David Ricardo Cruz, Jesus Lopez
Abstract: Surgical technology has been widely investigated with the
aim of achieving an efficient way of working in medicine. Consequently,
robots with small tools have been incorporated in many kinds of surgery
to obtain the following improvements: the patient recovers faster, the
surgery is not invasive, and the robot can access hidden parts of the body. In
this article, an adaptive strategy for the trajectory tracking of the robot end
effector is addressed; it consists of a proportional derivative technique plus
an adaptive compensation. The proportional derivative technique is employed
to achieve the trajectory tracking. The adaptive compensation is employed to
approximate some unknown dynamics. The robot described in this
study is employed in minimally invasive surgeries.
Keywords: Trajectory tracking; robot; minimally invasive surgery.
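Illustrative sketch (not the authors' controller): the idea of a proportional derivative term plus an adaptive compensation can be shown on a one-degree-of-freedom toy plant, a unit mass with an unknown constant force d. The plant, the gains, the reference trajectory sin(t) and the adaptation law are all assumptions made for this example; the robot in the paper has richer dynamics.

```python
import math


def simulate(kp=25.0, kd=10.0, gamma=20.0, lam=1.0, d=2.0, dt=1e-3, t_end=20.0):
    """Track x_d(t) = sin(t) with the plant x'' = u + d, where d is unknown
    to the controller. Returns (final tracking error, final estimate of d)."""
    x, v = 0.0, 0.0   # plant position and velocity
    d_hat = 0.0       # adaptive estimate of the unknown term d
    steps = int(round(t_end / dt))
    for k in range(steps):
        t = k * dt
        xd, vd, ad = math.sin(t), math.cos(t), -math.sin(t)
        e, de = xd - x, vd - v
        # PD feedback plus reference acceleration, minus the adaptive compensation.
        u = ad + kp * e + kd * de - d_hat
        # Assumed adaptation law: integrate a filtered tracking error.
        d_hat -= gamma * (de + lam * e) * dt
        # Explicit Euler step of the plant.
        a = u + d
        x += v * dt
        v += a * dt
    return abs(math.sin(steps * dt) - x), d_hat


final_error, d_estimate = simulate()
```

With these assumed gains the closed-loop error dynamics are stable, so the tracking error shrinks and the estimate approaches the true value of d.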
Multi Label Learning Approaches for Multi Species Avifaunal Occurrence Modelling: A Case Study of South Eastern Tamil Nadu
by Appavu Alias Balamurugan, P.K.A. Chitra, S. Geetha
Abstract: Many multi label problem transformation (PT) and algorithm
adaptation (AA) methods need to be explored to find a good candidate for
avifaunal occupancy modelling. This research compared eight commonly used
state-of-the-art PT and AA multi label methods. The data were created by
collecting January 2014 to December 2014 records from the e-bird repository for the
study area, Madurai district, south eastern Tamil Nadu. The analysis shows that
classifier chain (CC) and multi label naive Bayes (MLNB) are good
candidates for avifaunal data. MLNB performed best, with a Hamming loss of 0.019 and
90% average precision. To the best of our knowledge, this is the first use of
MLNB for avifaunal data, and the MLNB results indicate
that, out of 143 species observed, six species had a high occurrence rate and 68
species had a low occurrence rate.
Keywords: Species distribution models; multi species; multi label Learning; Multi Label Naive Bayes; Central part of southern Tamil Nadu.
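Illustrative sketch (not from the paper): Hamming loss, the measure quoted in the abstract, is the fraction of instance and label pairs that are predicted incorrectly, so lower is better. The function name and the small presence/absence matrices below are hypothetical.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (instance, label) cells whose predicted value is wrong."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("rows must have the same number of labels")
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


# Hypothetical data: rows are sites, columns are species (1 = present).
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 1]]
loss = hamming_loss(y_true, y_pred)  # 2 wrong cells out of 12
```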
Analytics on Talent Search Examination Data
by Anagha Vaidya, Vyankat Munde, Shailaja Shirwaikar
Abstract: Learning analytics and educational data mining have greatly
supported the process of assessing and improving the quality of education.
While learning analytics has a longer development cycle, educational data
mining suffers from the inadequacy of data captured through learning
processes. The data captured from the examination process can be suitably
extended to perform some descriptive and predictive analytics. This paper
demonstrates the possibility of actionable analytics on the data collected from
a talent search examination process by adding to it some data pre-processing
steps. The analytics provide some insight into the learners' characteristics
and demonstrate how analytics on examination data can be a major support
for improving quality in the education field.
Keywords: Learning Analytics; Educational Data Mining; clustering; linear modelling.
A fast clustering approach for large multidimensional data
by Hajar Rehioui, Abdellah Idrissi
Abstract: Density-based clustering is a strong family of clustering methods. The
strength of this family is its ability to classify data of arbitrary shapes and to
omit the noise. Among them is density-based clustering (DENCLUE), which is
one of the well-known powerful density-based clustering methods. DENCLUE is
based on the concept of the hill climbing algorithm. In order to find the clusters,
DENCLUE has to reach a set of points called density attractors. Despite the
advantages of DENCLUE, it remains sensitive to growth in the size of the data
and in the dimensionality, because a density attractor is calculated for
each point in the input data. In this paper, to overcome this shortcoming of
DENCLUE, we propose an efficient approach. This approach replaces the
concept of the density attractor by a new concept, the hyper-cube
representative. The experimental results, obtained on several datasets, show
that our approach finds a trade-off between clustering performance and
fast response time. In this way, the proposed clustering methods work efficiently
for large multidimensional data.
Keywords: Large Data; Dimensional Data; Clustering; Density based clustering; DENCLUE.
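Illustrative sketch (not the authors' hyper-cube method): the cost described in the abstract comes from running one hill climb per input point to reach its density attractor. The sketch below uses a Gaussian kernel and a fixed-point (mean-shift style) update as the hill-climbing step; the bandwidth h, the tolerance and the toy data are assumptions.

```python
import math


def gaussian_weight(x, xi, h):
    """Unnormalised Gaussian kernel weight between points x and xi."""
    sq = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-sq / (2.0 * h * h))


def density_attractor(x, data, h, tol=1e-6, max_iter=200):
    """Climb from x towards a local maximum of the kernel density estimate.
    Assumes x starts close enough to the data that the weights do not underflow."""
    for _ in range(max_iter):
        weights = [gaussian_weight(x, xi, h) for xi in data]
        total = sum(weights)
        # Weighted mean of the data: one fixed-point hill-climbing step.
        new_x = tuple(
            sum(w * xi[k] for w, xi in zip(weights, data)) / total
            for k in range(len(x))
        )
        if math.dist(x, new_x) < tol:
            return new_x
        x = new_x
    return x


# Two well-separated toy groups (hypothetical data).
data = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 5.0), (5.1, 5.2)]
# One hill climb per input point: this loop is the cost that grows with data size.
attractors = [density_attractor(p, data, h=0.5) for p in data]
```

Points whose climbs end at (nearly) the same attractor belong to the same cluster, which is why the three points of each group share one attractor here.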
CBRec: a book recommendation system for children using the matrix factorisation and content-based filtering approaches
by Yiu-Kai Ng
Abstract: Promoting good reading habits among children is essential, given
the enormous influence of reading on students' development as learners and
members of society. Unfortunately, very few children's websites or online
applications recommend books to children, even though they can play a
significant role in encouraging children to read. Although a few popular
book websites suggest books to children based on the popularity or
rankings of books, these suggestions are not customised/personalised for each individual
user and likely recommend books that users do not want or like. We have
integrated the matrix factorisation approach and the content-based approach,
in addition to predicting the grade levels of books, to recommend books for
children. Recent research works have demonstrated that a hybrid approach,
which combines different filtering approaches, is more effective in making
recommendations. An empirical study has verified the effectiveness of
our proposed children's book recommendation system.
Keywords: Book recommendation; matrix factorisation; content analysis; children.
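Illustrative sketch (not the CBRec implementation): the matrix factorisation component of such a recommender can be shown by learning low-rank reader and book factors with stochastic gradient descent on observed ratings. The ratings, hyperparameters and function names are hypothetical, and the content-based and grade-level parts of the paper are not modelled here.

```python
import math
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of reader factors and book factors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    """Root mean squared error over the observed ratings."""
    sq = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(sq / len(ratings))


def factorise(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """Learn factor matrices P (readers) and Q (books) by regularised SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Hypothetical (reader, book, rating) triples on a 1 to 5 scale.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1),
           (2, 1, 5), (2, 2, 2), (3, 0, 1), (3, 2, 5)]
P0, Q0 = factorise(ratings, 4, 3, epochs=0)   # untrained factors, same seed
P, Q = factorise(ratings, 4, 3)
rmse_before = rmse(ratings, P0, Q0)
rmse_after = rmse(ratings, P, Q)
```

Unobserved reader and book pairs can then be scored with predict(P, Q, u, i) to rank candidate books.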
Enhancing Purchase Decision using Multi-word Target Bootstrapping with Part-of-Speech Pattern Recognition Algorithm
by M. Pradeepa Sivaramakrishnan, C. Deisy
Abstract: In this research work, multi-word target related terms are extracted
automatically from customer reviews for sentiment analysis. We use the LIDF
measure and propose a novel measure, called TCumass, in the iterative
multi-word target (IMWT) bootstrapping algorithm. In addition, a part-of-speech
pattern recognition (PPR) algorithm is proposed to identify the
appropriate target and emotional words from multi-word target related terms.
This article aims to bring out both implicit and explicit targets with their
corresponding polarities in an unsupervised manner. We propose two models,
namely MWTB without PPR and MWTB with PPR, and compare them with the existing
multi-aspect bootstrapping (MAB) algorithm. The experiments were carried out
on different data sets, and the performance was evaluated using
different measures. The results show that the MWTB with PPR
model performs well, identifying precise targets and emotional words.
Keywords: Bootstrapping; emotional polarity; multi-word target; Part-of-Speech (POS); sentiment analysis.
Probabilistic Variable Precision Fuzzy Rough Set Technique for Discovering Optimal Learning Patterns in E-learning
by Bhuvaneshwari K.S, D. Bhanu, S. Sophia, S. Kannimuthu
Abstract: In an e-learning environment, optimal learning patterns are discovered
in order to recognise and understand effective learning styles. The
uncertain and imprecise knowledge collected has to be categorised into classes
known as membership grades. Rough set theory has potential for categorising data
into equivalence classes, and fuzzy logic may be applied through soft thresholds
to refine the equivalence relation that quantifies correlation between each class
of elucidated data. In this paper, a probabilistic variable precision fuzzy rough set
technique (PVPFRST) is proposed for deriving robust approximations and
generalisations that handle the types of uncertainty, namely stochasticity,
imprecision and noise, in membership functions. The results indicate that the
degree of accuracy of PVPFRST is 21% superior to benchmark techniques.
The results also show that PVPFRST improves effectiveness and efficiency in
identifying e-learners' styles and increases the performance by 27%, 22% and
25% in terms of discrimination rate, precision and recall value than the
Keywords: Inclusion degree; Probabilistic fuzzy information system; fuzzy membership grade; Crispness coefficient; Probabilistic variable precision fuzzy rough set; Inclusion function.
Inferring the Level of Visibility from Hazy Images
by Alexander A. S. Gunawan, Heri Prasetyo, Indah Werdiningsih, Janson Hendryli
Abstract: In our research, we would like to exploit crowdsourced photos from
social media to create low-cost fire disaster sensors. The main problem is to
analyse how hazy the environment looks. Therefore, we provide a brief
survey of methods dealing with the visibility level of hazy images. The methods
are divided into two categories: the single-image approach and the learning-based
approach. The survey begins by discussing the single-image approach. This
approach is represented by a visibility metric based on the contrast-to-noise ratio
(CNR) and a similarity index between a hazy image and its dehazed image. This
is followed by a survey of the learning-based approach, covering two contrasting
approaches: 1) one based on the theoretical foundation of light transmission,
combined with the depth image using a new deep learning method; 2) one based on a
black-box method employing convolutional neural networks (CNN) on hazy
Keywords: Hazy image; visibility level; single image approach; learning based approach; social media.
The Complexity of Cluster-Connectivity of Wireless Sensor Networks
by H.K. Dai, H.C. Su
Abstract: Wireless sensor networks consist of sensor devices with limited
computational capabilities and memory operating in bounded energy
resources; hence, network optimisation and algorithmic development in
minimising the total energy or power while maintaining the connectivity of
the underlying network are crucial for their design and maintenance. We
consider a generalised system model of wireless sensor networks whose node
set is decomposed into multiple clusters, and show that the decision and the
associated minimisation problems of the connectivity of clustered wireless
sensor networks appear to be computationally intractable: completeness and
hardness, respectively, for the non-deterministic polynomial-time complexity
class. An approximation algorithm is devised to minimise the number of end
nodes of inter-cluster edges within a factor of 2 of the optimum for the
Keywords: wireless sensor network; connectivity; spanning tree; nondeterministic polynomial-time complexity class; approximation algorithm.