Forthcoming articles

 


International Journal of Business Intelligence and Data Mining

 

These articles have been peer-reviewed and accepted for publication in IJBIDM, but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

 

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

 

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

 

Articles marked with this Open Access icon are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

 


 

International Journal of Business Intelligence and Data Mining (64 papers in press)

 

Regular Issues

 

  • STUDENTS SEARCH INTEREST MODEL OVER AN ORGANISATION BASED ON WEB LOG DATA.
    by Sivakumaran A R, P. Marikkannu
    Abstract: Browsing information on the web has become part and parcel of life, and there is hardly any user who does not browse the web. Generally, a user searching for content looks only at the top ten pages displayed in the web search results. It is therefore proposed that information such as the links between visited and unvisited web pages, along with the path chosen in the search query, calls for a novel technique to give the best performance. The Operation Feature Matrix (OFM) is one of the novel functionalities used for extracting data from the web. The automatically identified user profile is graph based: a Modified Page Outlook (MPO) graph is proposed that involves links between the visited and the unvisited web pages.
    Keywords: Web mining; Web log data; Inter-relation; Intra-relation; Total number of paths; Total number of relations; Modified Page Outlook.
    DOI: 10.1504/IJBIDM.2016.10002431
     
  • Risk Assessment and Management (RAM) in Enterprise Resource Planning (ERP) by advanced system Engineering theory
    by Valanarasu R, A. Christy
    Abstract: Conducting a project related to enterprise resource planning (ERP) is an ambitious task. Any ERP business enterprise involves technological, psychosomatic and sociological characteristics, and managing such characteristics in an enterprise is highly complex. ERP is categorised on the basis of implementation, software, supply chain management, resources, management and optimization. These characteristics are high-risk to handle. The system still seems to be a growing one that has to be moulded into many forms. The aim of this manuscript is to present risk assessment and management (RAM) in ERP by advanced system engineering theory.
    Keywords: Enterprise Resource Planning; Risk management; Risk Assessment; Risk Value.
    DOI: 10.1504/IJBIDM.2016.10002433
     
  • SEMANTIC WEB SERVICE DISCOVERY FOR MOBILE WEB SERVICES
    by Bhuvaneswari A, G.R. Karpagam
    Abstract: Service discovery is the most important task in web services, but in a mobile environment the discovery process may degrade network performance. To overcome this issue, a new approach called semantic mobile web services (SMWS) is proposed. SMWS supports a better discovery process even in a highly mobile environment. By using query request (QRY REQ) and response (RSP) packets, a user can identify the location of the mobile node as well as discover web services with minimum utilisation of bandwidth resources. During the service discovery process, a service registrar is included between the service requester and the service provider to reduce overhead. The matchmaking process produces exact-match responses for the respective requester queries. This helps to increase the quality of performance and network efficiency. The performance of the proposed SMWS is analysed through simulation.
    Keywords: Semantic Mobile Web Services (SMWS); Service Discovery; Matchmaking; Web Service Layered Architecture; Web Service Description Language (WSDL).
    DOI: 10.1504/IJBIDM.2017.10003120
     
  • SURVEY OF VARIOUS METHODS FOR DIAGNOSTIC SIGNATURES FOR CUTANEOUS MELANOMA FROM GENETIC AND IMAGING DATA
    by K. Thenmozhi, M. Rajesh Babu
    Abstract: Early diagnosis of cutaneous melanoma is very hard even for experienced dermatologists, although many advanced imaging techniques and clinical diagnostic algorithms, such as dermoscopy and the ABCD rule of dermoscopy respectively, are available. Accuracy is a concern (estimated to be about 75-85%), especially with oblique pigmented lesions. An effective diagnosis can be achieved by reducing the viewer variability found in dermatologists' examinations. It is therefore necessary to improve existing methods and develop new techniques to ease accurate, fast and reliable diagnosis of cutaneous melanoma. In this paper, the stages of a melanoma diagnostic system, namely pre-processing, feature extraction, feature selection and classification, are explained. The results of feature selection were optimised using advanced classification techniques; namely, two weighted k-nearest neighbour (k-NN) classifiers (k = 1, 30), a decision tree (DT) and the random forest (RF) algorithm.
    Keywords: Classification; composite biomarkers; Cutaneous Melanoma; dermoscopy; feature selection.
    DOI: 10.1504/IJBIDM.2017.10003508
     
  • Detecting and Ranking events in Twitter using Diversity Analysis
    by Daoud Daoud
    Abstract: In Twitter and in other social media channels, detecting events is very important and has many applications. However, this task is very challenging because of the huge number of tweets that are posted every minute and the massive scale of spamming activities. In this paper, we present an innovative approach for detecting events using data posted to Twitter. The proposed approach is based on the concept of users' attention, quantitatively modelling the diversity of hashtags using Shannon's index. Our method records the diversity values as an hourly time series. Using statistical techniques, the method locates the intervals whose diversity values fall outside the range of the forecasted ones (the normal state). We also present the labelling and ranking techniques that are implemented in this research. Experimental results on a dataset consisting of 15 million Arabic tweets show that our proposed approach can effectively detect real-world events in Twitter.
    Keywords: social media; event detection; diversity index; Twitter; Arabic; hashtags; time-series analysis; z-score; events labelling; events ranking.
    DOI: 10.1504/IJBIDM.2017.10003611
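
The diversity-based detection described in this abstract can be illustrated with a minimal sketch. It computes Shannon's index per hour and flags hours whose z-score is unusually large. The function names, the threshold of 2.0 and the use of a global mean and standard deviation (rather than a forecast of the normal state, as the abstract describes) are assumptions made for illustration, not the author's method.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def shannon_index(hashtags):
    """Shannon diversity H = -sum(p * ln p) over hashtag frequencies."""
    counts = Counter(hashtags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def flag_hours(hourly_hashtags, threshold=2.0):
    """Return indices of hours whose diversity z-score exceeds the threshold.

    Assumption: the "normal state" is approximated by the mean and
    population standard deviation of the whole series.
    """
    series = [shannon_index(tags) for tags in hourly_hashtags]
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return []  # a flat series has no outliers
    return [i for i, h in enumerate(series) if abs((h - mu) / sigma) > threshold]


# Hypothetical data: nine ordinary hours and one hour dominated by one hashtag.
hours = [["a", "b", "c", "d"] for _ in range(10)]
hours[5] = ["breaking", "breaking", "breaking", "breaking"]
print(flag_hours(hours))
```

A drop in diversity is the signal here: when attention concentrates on a single hashtag, the index falls well below its usual level.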
     
  • Developed global biotic cross pollination algorithm for CIS
    by Sasikala Rani K, D. Rasi, S.N. Deepa
    Abstract: This paper focuses on visual-based colour image segmentation with a global biotic cross pollination algorithm (GBCPA). The GBCPA segments structurally challenging objects based on colour, entropy and edge information in the CIE L*a*b* colour space. The L*a*b* colour space is a colour-opponent space considered to approximate human vision: L* denotes the luminosity or brightness layer, the chromaticity layer a* indicates where the colour falls along the red-green axis, and the chromaticity layer b* indicates where it falls along the blue-yellow axis. The FPO algorithm considering global biotic cross pollination is proposed to improve the quality of the solution and the computational speed. GBCPA is first introduced to find the locality of the solution. The performance of GBCPA is tested on the standard Berkeley segmentation dataset with 300 images. The results on the dataset are illustrated under different evaluation strategies.
    Keywords: Global Biotic Cross Pollination Algorithm (GBCPA); artificial intelligence; dataset; Berkeley segmentation; entropy; Evolutionary Algorithms.
    DOI: 10.1504/IJBIDM.2017.10003631
     
  • UPFC damping controller design using multi-objective evolutionary algorithms
    by Kannayeram G, P.S. Manoharan, M. Willjuice Iruthayarajan, T. Sivakumar
    Abstract: In this paper, a modified non-dominated sorting genetic algorithm-II (MNSGA-II)-based optimal damping controller for the unified power flow controller (UPFC) is designed to enhance the damping of low-frequency oscillations in power systems. The robust UPFC damping controller design is formulated as a multi-objective optimisation problem that minimises the integral squared error (ISE) of the speed deviation and the input control signal (u) under a wide range of operating conditions. The effectiveness of the proposed controller is confirmed through nonlinear time-domain simulation and eigenvalue analysis. The results are compared with NSGA-II and a conventional method. Simulation results reveal that the Pareto fronts obtained using MNSGA-II-based UPFC controllers are better and more uniformly distributed owing to the controlled elitism and dynamic crowding distance concepts. The proposed damping controller based on the modulation index of the shunt inverter (mE) is superior to the other damping controllers under different loading conditions and improves the stability of the system.
    Keywords: Flexible AC Transmission Systems (FACTS); Unified Power Flow Controller (UPFC); Non-dominated sorting genetic Algorithm (NSGA-II); Modified NSGA-II (MNSGA-II); Integral Squared Error (ISE); Genetic Al.
    DOI: 10.1504/IJBIDM.2017.10003632
     
  • Performance Amelioration of MANETs Using cooperative Routing with Cross-Layer Design
    by S. Mylsamy, J. Premalatha
    Abstract: In traditional routing, various routing protocols are used for MANETs depending on the environment. Opportunistic routing (OR) is used as the basic protocol. The basic function of OR is its ability to overhear the transmitted packet and to coordinate among relaying nodes. A new routing protocol named cooperative opportunistic routing based on geographic location (CORBG) has been implemented to achieve better network performance. In CORBG, a location-based adaptive mechanism was created that improves network performance. In the proposed work, network optimisation is attained by the CORBG protocol with a cross-layer design between the network and transport layers, and performance is evaluated on network QoS parameters such as throughput, delay and energy consumption. In MANETs, dynamic communications and decentralised administration make the network more vulnerable to attacks. To prevent such problems, security mechanisms were introduced.
    Keywords: cross layer design; cooperative opportunistic routing based on geographic location; energy consumption.
    DOI: 10.1504/IJBIDM.2017.10003958
     
  • FIRA: FOCUSED INFORMATION RETRIEVAL ALGORITHM
    by N. Kalpana, S. Appavu Alias Balamurugan
    Abstract: Presently, multimedia data becomes available quickly because of advanced multimedia capturing devices such as digital video recorders and mobile cameras. As ordinary query-by-text retrieval does not fulfil users' needs in finding the desired videos effectively, content-based video retrieval stands out as the most sensible way to improve retrieval quality. Moreover, video retrieval using query-by-image is not effective in associating the videos with the users' interest. In this manuscript, we propose an innovative method to achieve high-quality content-based video retrieval by discovering temporal patterns. On the basis of the discovered activist patterns, an efficient indexing technique and an effective sequence matching technique are integrated to reduce the computation cost and to raise the retrieval accuracy. Experimental results reveal that our approach is very promising in enhancing content-based video retrieval in terms of efficiency and effectiveness on NPTEL.
    Keywords: Text-based video retrieval; Activist pattern; string matching; Pattern-based search; Fast-pattern-index tree; NPTEL.
    DOI: 10.1504/IJBIDM.2017.10004072
     
  • Recognition of Sign Language using Image Processing
    by Sandhya Arora, Ananya Roy
    Abstract: According to the World Health Organization, over 5% of the world's population have hearing and speaking disabilities. The primary language of communication for people who are deaf and mute is sign language. The proposed system aims to recognise American Sign Language and convert it to text. The input given to the system is an image of the hand depicting the necessary alphabet. The histogram of the input image is then computed and checked for similarity with the histograms of pre-saved images by using the Bhattacharyya Distance Metric. Implementation of the system will be a small step in overcoming the social barrier of communication between deaf-mute people and people who do not understand sign language. OpenCV is used as a tool for implementing the proposed system.
    Keywords: American Sign Language; Bhattacharyya Distance Metric; OpenCV; histogram.
    DOI: 10.1504/IJBIDM.2017.10004374
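
The histogram comparison described in this abstract can be sketched in plain Python, without OpenCV. The helper names and the toy histograms are hypothetical. The distance is computed in the Hellinger form sqrt(1 - BC), where BC is the Bhattacharyya coefficient, which is the form OpenCV uses for its Bhattacharyya comparison; whether the authors used exactly this variant is an assumption.

```python
import math


def histogram(pixels, bins=8, max_value=256):
    """Count grey-level pixel values (0 .. max_value - 1) into equal-width bins."""
    hist = [0] * bins
    for p in pixels:
        hist[p * bins // max_value] += 1
    return hist


def bhattacharyya_distance(h1, h2):
    """Distance in [0, 1] between two histograms with the same number of bins."""
    s1, s2 = sum(h1), sum(h2)
    if s1 == 0 or s2 == 0:
        raise ValueError("histograms must not be empty")
    # Bhattacharyya coefficient of the two normalised histograms.
    bc = sum(math.sqrt((a / s1) * (b / s2)) for a, b in zip(h1, h2))
    return math.sqrt(max(0.0, 1.0 - bc))


def best_match(query_hist, templates):
    """Return the label of the stored histogram closest to the query."""
    return min(templates, key=lambda label: bhattacharyya_distance(query_hist, templates[label]))


# Hypothetical pre-saved histograms for two hand signs.
templates = {"A": [10, 0, 0, 0], "B": [0, 0, 5, 5]}
print(best_match([9, 1, 0, 0], templates))
```

A smaller distance means more similar histograms, so the recognised letter is the template with the minimum distance to the input image.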
     
  • AN EMPIRICAL APPROACH FOR COMPLEXITY REDUCTION AND FAULT PREDICTION FOR SOFTWARE QUALITY ATTRIBUTE
    by Rajkumar, Viji, S. Duraisamy
    Abstract: Designing high-quality software is difficult because of high complexity and fault-prone classes. To reduce the complexity and predict the fault-prone classes in object-oriented software design, a new empirical approach is proposed. The approach concentrates on increasing software quality in object-oriented programming structures. The technique collects the dataset and metric values from CK-based metrics, and complexity is then calculated using a weighted approach. Fault prediction is based on the low-usage and high-complexity datasets. This helps to increase software quality. In the simulation section, the proposed approach is evaluated on parameters such as accuracy, fairness, recall, prediction rate and efficiency. Experimental results show that the proposed approach increases the prediction rate, accuracy and efficiency.
    Keywords: Complexity reduction; Fault prediction; Software design; Software Quality; CK based metrics.
    DOI: 10.1504/IJBIDM.2017.10004682
     
  • Data mining of unstructured big data in cloud computing
    by A.K. Reshmy, D. Paulraj
    Abstract: Hadoop Distributed File System, Talend, MapReduce (MR), YARN and the Cloudera model have become popular technologies for large-scale data organisation and analysis. In our work, we identify the requirements of the covered data organisation and propose an extension to the present programming model, called Comprehensive Hadoop Distributed File System along with MapReduce (C-HDFS-MR), to address them. The extended interfaces are presented as an application programming interface and implemented in the context of the image processing application domain. We show the effectiveness of C-HDFS-MR through case studies of image processing functions along with their results. Although C-HDFS-MR has minimal overhead in data storage and I/O operations, it greatly enhances system performance and simplifies the application development process. Our proposed framework, C-HDFS-MR, works without changes to the existing models and can be used by the many applications that require covered data.
    Keywords: Big data; MapReduce (MR); Hadoop; Comprehensive Hadoop Distributed File System along with MapReduce (C-HDFS-MR); Medical Image Processing; Analysis; and Visualization (MIPAV).
    DOI: 10.1504/IJBIDM.2017.10004683
     
  • A tabu search strategy to solve cell formation problem with ratio level data
    by R. Kamalakannan, R. Sudhakara Pandian
    Abstract: This paper concentrates on the cell formation problem for ratio level data in the design of cellular manufacturing systems. The aim of this paper is to identify the machine cells and part families and, as a result, to create production cells in order to reduce the cell load variation. A competent Tabu Search (TS) algorithm is proposed to investigate the search space of all possible solutions with a chain of moves. This method is an iterative process for seeking a global optimum for discrete combinatorial optimization problems. The ratio level data is calculated in terms of time in seconds, based on data collected on the processing time of the part, the production volume of the part and the availability of the machine. The results clearly indicate that the proposed tabu search yields good results on the chosen benchmark problems.
    Keywords: Tabu Search Algorithm; Cell Formation Problem; Ratio Level Data; Modified Grouping Efficiency.
    DOI: 10.1504/IJBIDM.2017.10004684
     
  • Efficient Hardware Architecture for Integer Implementation of Multi-Alphabet Arithmetic Coding for Data Mining
    by S.D. Jayavathi, A. Shenbagavalli, B. Ganapathy Ram
    Abstract: The aim of this paper is to create an efficient hardware architecture for multi-alphabet arithmetic coding (MA-AC) in semicustom and full-custom Application Specific Integrated Circuit (ASIC). The proposed hardware architectures are synthesized on Xilinx and Altera Field Programmable Gate Array (FPGA) devices to evaluate resource utilization and speed. In addition, the physical design is implemented as an ASIC device using the Cadence design environment with TSMC 0.18 µm technology, which shows area reduction of 12.75% and 23.61% and power consumption of 29.86% and 38.89% for the encoder and decoder respectively.
    Keywords: Multi-Alphabet Arithmetic coder; Encoder; Decoder; State diagram; Field Programmable Gate Array (FPGA); Application Specific Integrated Circuit (ASIC).
    DOI: 10.1504/IJBIDM.2017.10004685
     
  • FREQUENT PATTERN SUB-SPACE CLUSTERING OPTIMIZATION (FPSSCO) ALGORITHM FOR DATAMINING FROM LARGE DATA BASE
    by T. Sheik Yousuf, M. Indra Devi
    Abstract: In a data mining environment, giving a quick response to the user by quickly and correctly picking out items from a large database is a very challenging task. Previously, multiple algorithms were proposed to identify frequent items, but they scan the database multiple times. To overcome this problem, we propose a Rehashing-based Apriori technique in which hashing is used to store the data in horizontal and vertical formats. Rehash-based Apriori uses a hashing function to reduce the size of the candidate item set and the number of database scans, eliminate non-frequent items and avoid hash collisions. After the frequent item sets are found, level-wise subspace clustering is performed. We introduce a Generalized Self-Organized Tree-Based (GSTB) mechanism that adaptively selects the root to construct the path from the cluster head to its neighbors when constructing the tree. Our experimental results show that the proposed mechanisms reduce the computational time of the overall process.
    Keywords: Sub-space clustering; GSTB (Generalized Self organized Tree Based Cluster Head selection).
    DOI: 10.1504/IJBIDM.2017.10004686
     
  • Prediction parameters in Nano Fiber Composite Membrane for Effective Air Filtration Using Optimal Neural network
    by V.S. Kandavel, Gabriel Mohan Kumar
    Abstract: The capacity to build up steady and extensive trench structures by means of headed for great degree thin fibers would have wide innovative ramifications. Here we report a procedure to plan and make a sandwich-organized polyamide-6/polyacrylonitrile/polyamide-6 (PA-6/PAN/PA-6) composite membrane. This is sensible for powerful air filtration via consecutive electrospinning by coordinating the elements of parts to foresee the distinctive mechanical properties with the help of the optimal weights of an ANN structure. Distinctive inspired optimization strategies are used to arrive at the optimal weights of the ANN procedure. All the ideal results exhibit that the achieved error values between the experimental output values and the anticipated values are close to zero in the outlined network. In addition, the result demonstrates the most intense filtration accuracy and lower pressure drops, with the base error of 96.72% dictated by the ANN. This is accomplished by the AFSO strategies.
    Keywords: Nanonets; Composite membrane; High efficiency; neural network and optimization techniques.
    DOI: 10.1504/IJBIDM.2017.10004689
     
  • Behaviour-based Analysis of Tourism Demand in Egypt
    by Taheya H. Ahmed, Mervat Abu-Elkheir, Ahmed Abou Elfetouh Saleh
    Abstract: Tourism demand is the total number of persons who travel, or wish to travel, to use tourist facilities and services at places away from their places of work or residence. Analysis of tourism demand helps companies understand tourists' needs and improve their marketing strategies. Current research on predicting tourism demand is targeted at foreign countries, and the little research targeted at predicting tourism demand in Egypt is based on macro forecasting and not on understanding the collective behavior of tourists. In this paper, we devise different granularities from the tourist data that we collect and use these granularities to provide different levels of demand prediction. We develop a hybrid prediction framework to analyze tourists' behavior and infer behavior rules. These rules will act as recommendations that help to understand tourists' behavior and needs, and to define future policies regarding tourism in Egypt.
    Keywords: Tourist Demand; Clustering; Data Mining; Cobweb; Classification; Egyptian Tourism.
    DOI: 10.1504/IJBIDM.2017.10004783
     
  • A Multi-Objective Analysis Model in Mass Real Estate Appraisal
    by Benedetto Manganelli, Pierfrancesco De Paola, Vincenzo Del Giudice
    Abstract: The purpose of this research is to analyse the performance of a real estate valuation model based on Multi Objective Decision Making methods. The optimal price function is obtained with a goal programming model. The price function is described as the sum of the individual objectives (criteria), and the goals are the prices of comparable properties. The model, which integrates the inductive and deductive approaches, overcomes many of the assumptions of the best-known statistical approaches. The proposed model is evaluated by comparing its results with those obtained by applying, to the same case study, a multiple linear regression model and a non-linear regression method based on a Penalized Spline Smoothing model. The comparison shows, first of all, the better interpretation capabilities of the proposed model.
    Keywords: Goal programming; multi-criteria; real estate market; multi objective decision making.
    DOI: 10.1504/IJBIDM.2017.10004784
     
  • Information graph-based creation of parallel queries for databases
    by Yulia Shichkina, Dmitry Gushchanskiy, Alexander Degtyarev
    Abstract: The article describes a query parallelisation method that takes into account the dependencies between operations in a data query. The method is based on representing the query as a directed graph with operations as vertices and data connections as edges. The graph is processed as an adjacency list, which saves memory compared with processing a sparse adjacency matrix. The graph is modified only by operations that do not change the elements of the adjacency list. It is therefore possible to achieve intra-query parallelism by considering the request structure and applying mathematical methods of parallel computation for its equivalent transformation. The article also presents an example of complex query parallelisation and describes the applicability of graph theory and parallel computing methods to both query parallelisation and optimisation.
    Keywords: parallel computing; optimization methods; relational database; query; information graph; query parallelisation.
    DOI: 10.1504/IJBIDM.2017.10004785
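
The graph representation described in this abstract can be illustrated with a small sketch. The query is held as an adjacency list (operation to consumers), and operations are grouped into stages whose members do not depend on one another and could therefore run in parallel. The level-by-level grouping and the operation names are assumptions chosen for illustration; the article's own transformation may differ.

```python
def parallel_stages(graph):
    """Group the operations of a query graph into stages of independent operations.

    graph maps each operation to the list of operations that consume its output.
    """
    # Count, for every operation, how many inputs it is still waiting for.
    indegree = {op: 0 for op in graph}
    for consumers in graph.values():
        for consumer in consumers:
            indegree[consumer] = indegree.get(consumer, 0) + 1

    ready = sorted(op for op, d in indegree.items() if d == 0)
    stages, done = [], 0
    while ready:
        stages.append(ready)
        done += len(ready)
        next_ready = []
        for op in ready:
            for consumer in graph.get(op, []):
                indegree[consumer] -= 1
                if indegree[consumer] == 0:
                    next_ready.append(consumer)
        ready = sorted(next_ready)

    if done != len(indegree):
        raise ValueError("query graph contains a cycle")
    return stages


# Hypothetical query: two table scans, a filter, a join and an aggregation.
query = {
    "scan_orders": ["filter_orders"],
    "scan_customers": ["join"],
    "filter_orders": ["join"],
    "join": ["aggregate"],
    "aggregate": [],
}
print(parallel_stages(query))
```

In this example the two scans form the first stage, so they can be executed concurrently before the filter, join and aggregation run in sequence.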
     
  • OLAP technology and Machine learning as the tools for validation of the Numerical Models of Convective Clouds
    by Elena N. Stankova, Andrey V. Balakshiy, Dmitry A. Petrov, Vladimir V. Korkhov
    Abstract: In the present work we use machine learning and OLAP technologies for more accurate forecasting of phenomena such as thunderstorms, hail and heavy rain, using a numerical model of a convective cloud. Three machine learning methods (Support Vector Machine, Logistic Regression and Ridge Regression) are used to decide whether or not a dangerous convective phenomenon occurs under the present atmospheric conditions. The OLAP technology is used to develop the concept of a multidimensional data base intended for distinguishing the types of phenomena (thunderstorm, heavy rainfall and light rain). A previously developed complex information system is used for collecting data about the state of the atmosphere and about the place and time at which dangerous convective phenomena are recorded.
    Keywords: OLAP; online analytical processing; machine learning; validation of numerical models; numerical model of convective cloud; weather forecasting; thunderstorm; multidimensional data base; data mining.
    DOI: 10.1504/IJBIDM.2017.10004787
     
  • Understanding urban development types and drivers in Wallonia. A multi-density approach
    by Ahmed Mustafa, Ismaïl Saadi, Mario Cools, Jacques Teller
    Abstract: In this study, the urban development process in the Walloon region (Belgium) is analysed. Two main aspects of development are quantitatively measured: the development type and the main drivers of the urbanisation process. Unlike most existing studies, which consider urban development as a binary process, this research considers it as a continuous process characterized by different levels of urban density. Eight urban classes are defined based on Belgian cadastral data for the years 2000 and 2010. A multinomial logistic regression model is employed to examine the main driving forces of the different densities. Sixteen drivers were selected, including accessibility, geo-physical features, policies and socio-economic factors. Finally, the changes from non-urban to one of the urban density classes are detected and classified into different development types. The results indicate that zoning status (political factor), slope, distance to roads, population densities and mean land price, respectively, have an impact on the urbanization process whatever the density. The results also show that the impact of these factors varies considerably from one density to another.
    Keywords: urban development; urban density; development type; driving forces; multinomial logistic regression model; cadastral data.
    DOI: 10.1504/IJBIDM.2017.10004788
     
  • Epsilon-Fuzzy Dominance Sort Based Composite Discrete Artificial Bee Colony optimisation for Multi-Objective Cloud Task Scheduling Problem
    by Gomathi B, Karthikeyan Krishnasamy, B. Saravana Balaji
    Abstract: A cloud computing environment provides on-demand virtualized resources for cloud applications. The scheduling of tasks in a cloud application is a well-known NP-hard problem. The task scheduling problem becomes more complicated when it must satisfy multiple objectives that are conflicting in nature. In this paper, an Epsilon-fuzzy Dominance based Composite Discrete Artificial Bee Colony (EDCABC) approach is used to generate Pareto optimal solutions for the multi-objective task scheduling problem in the cloud. Three conflicting objectives, namely makespan, execution cost and resource utilization, are considered. The Epsilon-fuzzy dominance sort approach is used to choose the best solutions from the Pareto optimal solution set in the multi-objective domain. EDCABC uses composite mutation strategies and a fast local search method to enrich the local searching behavior, which helps to avoid premature convergence. The performance and efficiency of the proposed algorithm are compared with the NSGA-II and MOPSO algorithms. The simulation results show that the proposed EDCABC algorithm substantially minimizes the makespan and execution cost and ensures proper resource utilization when compared with these existing algorithms.
    Keywords: Task scheduling; Discrete Artificial Bee Colony; Cloud computing; Makespan; Execution cost; Fuzzy Dominance.
    DOI: 10.1504/IJBIDM.2017.10004803
     
  • Haphazard, Enhanced Haphazard and Personalised Anonymisation for Privacy Preserving Data Mining on Sensitive Data Sources
    by Prakash M, G. Singaravel
    Abstract: Privacy preserving data mining is a fast-growing new area of research owing to recent advancements in information, data mining, communications and security technologies. Government agencies and many non-governmental organisations often need to publish sensitive data that contain information about individuals. The important problem is publishing data about individuals without revealing sensitive information about them. A breach in the security of sensitive data may expose the private information of an individual, and the interception of a private communication may compromise the security of sensitive data. The objective of the research is to publish data without revealing the sensitive information of individuals, while still allowing the miner to discover non-sensitive knowledge. To achieve this objective, haphazard anonymisation, enhanced haphazard anonymisation and personalised anonymisation are proposed for privacy and utility preservation. Their performance is evaluated based on vulnerability to attacks, efficiency and data utility.
    Keywords: Anonymisation; Big Data; Data Analytics; Data Mining; Data Publishing; Microdata; Privacy Preserving Techniques; Privacy Preserving; Privacy; Sensitive Data Publishing.
    DOI: 10.1504/IJBIDM.2017.10004861
     
  • Proposal and Examination of the FLAP Algorithm
    by Daniel Giterman, Eyal Brill 
    Abstract: In real classification problems, common learning algorithms generally fail to describe instances that require complicated classification logic. Additionally, it is often difficult to ensure a satisfying amount of classified data for their training. In this work, we propose and examine a new learning algorithm that also integrates expert logic. Essentially, this algorithm takes advantage of unclassified data to produce a self-generated fuzzy inference system that is eventually used as a classifier. It also utilises a mere sample of classified data in order to compare various classifiers constructed from different algorithm options, thus finally achieving an assumingly more accurate result. As part of our study, this algorithm was compared with six well-known supervised learning algorithms such as artificial neural networks, support vector machine and random forest. We used the ten-fold cross-validation technique with Kappa statistic to assess algorithm performance. Subsequently, in order to find statistically significant dissimilarities among the algorithms, we used a two-tailed Friedman test. After the null hypothesis was rejected, we used a Nemenyi post-hoc test to prove differences between pairs of algorithms. Consequently, despite lacking in efficiency and scalability, our algorithm proved to be highly competitive and demonstrated excellent classification potential.
    Keywords: fuzzy logic; fuzzy inference systems; learning algorithms; semi-supervised learning; hybrid algorithms; data classification; algorithms comparison; statistical tests.
    DOI: 10.1504/IJBIDM.2017.10004943
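    The Kappa statistic mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance; the function name `cohen_kappa` and the sample labels are hypothetical and are not taken from the article.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa between true and predicted class labels (illustrative sketch)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")

    # Observed agreement: fraction of instances where both labels match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: product of the marginal class frequencies, summed over classes.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if expected == 1.0:
        # Degenerate case (a single class everywhere); returning 1.0 is a
        # convention assumed for this sketch only.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for one cross-validation fold.
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

    In a ten-fold cross-validation setting, one such value would typically be computed per fold and then averaged, although the abstract does not say how the authors aggregate the folds.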
     
  • Modelling Economic Choice under Radical Uncertainty: Machine Learning Approaches   Order a copy of this article
    by Antov Gerunov 
    Abstract: This paper utilises a novel experimental dataset on consumer choice to investigate and benchmark the performance of alternative statistical models under conditions of extreme uncertainty. We compare the results of logistic regression, discriminant analysis, na
    Keywords: choice; decision-making; social network; machine learning; uncertainty; logistic regression; neural network; random forest; consumer choice; modeling.
    DOI: 10.1504/IJBIDM.2017.10004944
     
  • Estimation of Coffee Rust infection and growth through two-level classifier ensembles based on expert knowledge   Order a copy of this article
    by David Camilo Corrales Munoz, Emmanuel Lasso, Apolinar Figueroa Casas, Agapito Ledezma, Juan Carlos Corrales 
    Abstract: Rust is a disease that leads to considerable losses in the worldwide coffee industry. Many factors contribute to the onset of coffee rust, e.g., crop management decisions and the prevailing weather. In Colombia, coffee production fell by 31% on average during the epidemic years compared with 2007. Recent research efforts focus on detecting disease incidence using computer science techniques such as supervised learning algorithms. However, a number of authors have demonstrated that results obtained with a single classifier are not sufficiently accurate. Authors in the computing field propose alternatives for this problem that combine classifier results. Nevertheless, the traditional approaches have limited performance due to the absence of datasets. Therefore, we propose two-level classifier ensembles for coffee rust infection and growth estimation in Colombian crops, based on expert knowledge.
    Keywords: coffee; rust; classifier; ensemble; dataset; expert; knowledge.
    DOI: 10.1504/IJBIDM.2017.10004945
     
  • Rough Set Theory-Based Feature Selection and FGA-NN Classifier for Medical Data Classification   Order a copy of this article
    by B. Vijayalakshmi, Sugumar Rajendran 
    Abstract: The prediction of heart disease is a difficult task that requires considerable experience and knowledge. To reduce the risk involved in heart disease prediction, this paper proposes a rough set theory-based feature selection and FGA-NN classifier. The overall process of the proposed system consists of two main steps: 1) feature reduction; 2) heart disease prediction. First, the kernel fuzzy c-means clustering with rough set theory (KFCMRS) algorithm is applied to the high dimensional data to reduce the number of attributes. After that, the medical data classification is done through the FGA-NN classifier. To improve the prediction performance, a hybridisation of the firefly and genetic algorithms (FGA) is used with the NN for weight optimisation. Finally, the experimentation is performed on the Cleveland, Hungarian and Switzerland datasets. The experimental results show that the FGA-NN classifier outperformed the existing approach, attaining an accuracy of 83%.
    Keywords: Heart disease; FGA-NN; KFCMRS; scaled conjugate gradient; prediction; feature reduction; optimisation.
    DOI: 10.1504/IJBIDM.2017.10005016
     
  • An Improved Incremental Algorithm for mining Weighted Class Association Rules   Order a copy of this article
    by B. Subbulakshmi, C. Deisy 
    Abstract: Constructing fast and accurate classifiers for large data sets is an important task in data mining. Associative classification can produce more efficient and accurate classifiers than traditional classification techniques. Weighted class association rule (WCAR) mining reflects the significance of items by considering their weight. Moreover, real-time databases are dynamic, which creates the need for an incremental approach to classification. Existing incremental classification algorithms suffer from issues such as longer execution time and higher memory usage. This paper proposes an algorithm which uses a hash structure to store weighted frequent items and the concept of difference of object identifiers to compute the support faster. For mining incremental databases, the pre-large concept is used to reduce the number of re-scans over the original database. The proposed algorithm was implemented and tested on experimental data sets taken from the UCI repository. The results show that the proposed algorithm for mining WCARs gives better results than the existing algorithm.
    Keywords: Classification; Class Association Rules; Weighted Frequent Itemsets; Incremental Mining; Hash Structure.
    DOI: 10.1504/IJBIDM.2017.10005098
     
  • Students Performance Prediction using Hybrid Classifier Technique in Incremental Learning   Order a copy of this article
    by Roshani Ade 
    Abstract: Performance in higher education is a turning point in the academic life of every student. Academic performance is influenced by many factors, so it is essential to develop a predictive data mining model of student performance in order to distinguish between high learners and slow learners. The knowledge is hidden in the educational data set and can be extracted through data mining techniques. In this paper, we use a hybrid classifier approach to predict student performance, combining the Fuzzy ARTMAP and Bayesian ARTMAP classifiers. Sensitivity analysis was performed and irrelevant inputs were eliminated. The performance measures used to evaluate the techniques include the Matthews correlation coefficient (MCC), accuracy rate, true positives, false positives and the percentage of correctly classified instances. The combined result gives the good accuracy for predicting students
    Keywords: Hybrid Classifier; Incremental Learning; Fuzzy ARTMAP; MCC.
    DOI: 10.1504/IJBIDM.2017.10005099
     
  • Multiple Object Tracking by Employing Shaped Based Features and Kalman Filter   Order a copy of this article
    by Felix M. Philip, Rajeswari Mukesh 
    Abstract: Multimedia and related technologies have developed rapidly, particularly those associated with visual tracking and search operations. Moving target detection has been applied extensively in various fields, but has the disadvantages that the scheme is frequently complex and that tracking is affected by numerous external factors. In this article, multiple object recognition and tracking is proposed in order to improve the method and make it more robust and general with the assistance of shape-based features and a Kalman filter. First, the input video is converted to frames and then manually segmented for object segmentation. The objects are then tracked with the help of Kalman filtering. The method is assessed using the standard evaluation metrics of error value and score value. The technique achieved a maximum score value of 95% and a minimum error value of 25%. The results validate the effectiveness of the technique.
    Keywords: Moving Object Detection; Tracking; Shape based Features; Kalman Filtering and Segmentation.
    DOI: 10.1504/IJBIDM.2017.10005166
     
  • Multi Performance Parameters Analysis in a Manufacturing System Using Fuzzy Logic and Optimal Neural Network Model   Order a copy of this article
    by R. Prasanna Lakshmi, P. Nelson Raja 
    Abstract: Maintenance operations improve machine condition but also consume potential production time, possibly delaying customer orders. The objective of this paper is to determine the performance parameters of every work station in order to predict the cost, reliability and availability of the business. This prediction analysis considers two different approaches, namely FLP and an optimal neural network model. FLP is first used to predict the performance parameters, and the accuracy of the analysis is then increased by means of an ANN whose weights are tuned by an inspired optimisation technique. The results show that the error between the experimental values and the predicted values is close to zero in the designed network. The proposed KHO-based optimal neural network achieves an accuracy of 98.23% when compared with the Pareto optimisation model.
    Keywords: Preventive maintenance; optimisation; neural network; fuzzy logic and manufacturing industry.
    DOI: 10.1504/IJBIDM.2017.10005297
     
  • Privacy Preserving Data Mining using Hiding Maximum Utility Item First Algorithm By means of Grey wolf optimisation Algorithm   Order a copy of this article
    by M.T. Ketthari, Rajendran Sugumar 
    Abstract: In privacy preserving data mining, utility mining plays a vital role. The proposed technique conceals highly sensitive item sets with the help of the hiding maximum utility item first (HMUIF) algorithm, which evaluates the sensitive item sets using the user-defined utility threshold value. The sensitive item sets are estimated using an optimal threshold value obtained by means of the grey wolf optimisation (GWO) algorithm. The performance of the optimised threshold value is then analysed against several constraints such as HF, MC and DIS. The technique is implemented, and the item sets resulting from the optimal threshold are assessed and compared with those of other optimisation approaches. The HMUIF considerably reduces the computational complexity, thereby improving the hiding performance for the item sets.
    Keywords: Data Mining; Privacy Preserving Utility Mining; Sensitive Item sets; optimal threshold; Grey wolf optimisation.
    DOI: 10.1504/IJBIDM.2017.10006335
     
  • Fuzzy- MCS algorithm based Ontology generation for E Assessment   Order a copy of this article
    by A. Santhanavijayan, S.R. Balasundaram 
    Abstract: Ontologies can lead to important improvements in the definition of a course's knowledge domain, in the generation of an adapted learning path, and in the assessment phase. This paper provides an initial discussion of the role of ontologies in the context of e-learning. Generally, automatic assessment is preferred over manual assessment because it avoids bias and human error and saves teachers' time. Evaluation through objective tests such as multiple choice questions has gained considerable importance in e-assessment systems. Here we propose an efficient ontology generation method based on soft computing techniques for e-assessment with multiple choice questions. We employ fuzzy logic combined with an optimisation algorithm, the modified cuckoo search algorithm. A set of rules is first designed for creating the ontology. The rules are generated using fuzzy logic and then optimised in order to generate a better ontology structure.
    Keywords: Ontologies; MCS algorithm; Fuzzy; e-learning.
    DOI: 10.1504/IJBIDM.2017.10006336
     
  • Minimal constraint based cuckoo search algorithm for Removing Transmission Congestion and Rescheduling the Generator units   Order a copy of this article
    by N. Chidambararaj, K. Chitra 
    Abstract: In this paper, a minimal constraint based cuckoo search (CS) algorithm is proposed for solving the transmission congestion problem by considering both increases and decreases in generation power. The proposed algorithm is used to optimise the real power changes of the generators when transmission congestion occurs. The power loss, generator sensitivity factor and congestion management cost of the system are then evaluated by the proposed algorithm according to the transmission congestion. The proposed method is implemented on the MATLAB platform and its congestion management performance is analysed. The performance of the proposed method is compared with existing methods such as fuzzy adaptive bacterial foraging (FABF), simple bacterial foraging (SBF), particle swarm optimisation (PSO) and artificial neural network (ANN)-CS. The congestion management cost is reduced by up to 26.169%. The comparison shows that the proposed technique outperforms the other existing techniques in terms of congestion management measures.
    Keywords: minimal constraint based CS algorithm; PSO; ANN; real power; congestion management; power loss and congestion management cost.
    DOI: 10.1504/IJBIDM.2017.10006337
     
  • Effective Discovery of Missing Links in Citation Networks Using Citation Relevancy Check Process   Order a copy of this article
    by Nivash J P, L.D. Dhinesh Babu 
    Abstract: Effective dissemination of knowledge published by eminent authors in reputed journals, and ensuring that the referred work is cited properly, is the need of the hour. Citation analysis concerns the similarity measures of articles or journals, which are subjected to scaling as well as clustering procedures. A proper citation relevancy check (CRC) is required to avoid missing links in citation networks. Both similar and dissimilar references in the articles have important article citations. The purpose of this work is to devise a method to find the most significant articles, which can provide useful information to journal editors and writers. The strategy presented in this paper can assist an author in incorporating the most important articles and can help the editor in evaluating the quality of the references. The main benefit of detecting the missing articles is an improvement in the quality of research along with an increased citation count.
    Keywords: Citation network analysis; Missing citations; Citation relevancy check; Increasing citation count.
    DOI: 10.1504/IJBIDM.2017.10006338
     
  • A Distributed Cross-layer Recommender System Incorporating Product Diffusion   Order a copy of this article
    by Ephina Thendral, C. Valliyammai 
    Abstract: In this era of online retailing, personalisation of web content has become essential. A recommender system is a tool for extracting relevant information to provide personalisation in web information retrieval systems. With an inclination towards customer-oriented service, there is a need to understand the adaptability of customers in order to provide products/services of interest at the right time. In this paper, a model for a distributed context-aware cross-layer recommender system incorporating the principle of product diffusion is proposed. The offline-online modelled recommender system learns offline about the adaptation time of users using the principle of product diffusion and then uses an online explore-then-exploit strategy to make effective recommendations to the user at the most probable time of consumption. In addition, an algorithm based on product adaptability is proposed for recommending new items to the most probable users. Extensive experiments demonstrate the efficiency, scalability, reliability and enhanced retrieval effectiveness of the proposed recommender system model.
    Keywords: Recommender Systems; Personalization; Product Diffusion; Distributed Graph Model; Hadoop; Hbase; Titan graph database; Spark; Cross layer; Distributed processing.
    DOI: 10.1504/IJBIDM.2017.10006339
     
  • A Critique of Imbalanced Data Learning Approaches for Big Data Analytics   Order a copy of this article
    by Amril Nazir 
    Abstract: Biomedical research becomes reliant on multi-disciplinary, multi-institutional collaboration, and data sharing is becoming increasingly important for researchers to reuse experiments, pool expertise and validate approaches. However, there are many hurdles for data sharing, including the unwillingness to share, lack of flexible data model for providing context information for shared data, difficulty to share syntactically and semantically consistent data across distributed institutions, and expensive cost to provide tools to share the data. In our work, we develop a web-based collaborative biomedical data sharing platform SciPort to support biomedical data sharing across distributed organisations. SciPort provides a generic metadata model for researchers to flexibly customise and organise the data. To enable convenient data sharing, SciPort provides a central server-based data sharing architecture, where data can be shared by one click through publishing metadata to the central server. To enable consistent data sharing, SciPort provides collaborative distributed schema management across distributed sites. To enable semantic consistency for data sharing, SciPort provides semantic tagging through controlled vocabularies. SciPort is lightweight and can be easily deployed for building data sharing communities for biomedical research.
    Keywords: imbalanced big data learning; large-scale imbalanced data analysis; high-dimensional imbalanced data learning.
    DOI: 10.1504/IJBIDM.2017.10006340
     
  • A Novel Multi-class Ensemble model based on feature selection using Hadoop framework for classifying imbalanced Biomedical Data   Order a copy of this article
    by THULASI BIKKU, N. Sambasiva Rao, Ananda Rao Akepogu 
    Abstract: Due to the exponential growth of biomedical repositories such as PubMed and Medline, an accurate predictive model is essential for knowledge discovery in Hadoop environment. Traditional decision tree models such as multi-variate Bernoulli model, random forest and multinominal na
    Keywords: Ensemble model; Hadoop; Imbalanced data; Medical databases; Textual Decision Patterns.
    DOI: 10.1504/IJBIDM.2018.10006485
     
  • An optimised approach to detect the identity of hidden information in gray scale and colour images   Order a copy of this article
    by Murugeswari Ganesan, Deisy Chelliah, Ganesan Govindan 
    Abstract: Feature-based steganalysis, an emerging trend in the domain of information forensics, aims to discover the identity of secret information present in covert communication by analysing the statistical features of cover/stego images. Due to massive volumes of auditing data as well as the complex and dynamic behaviour of steganogram features, optimising those features is an important open problem. This paper focuses on optimising the number of features using the proposed quick artificial bee colony (qABC) algorithm. We tested three steganalysers, namely subtractive pixel adjacency matrix (SPAM), phase aware projection model (PHARM) and colour filter array (CFA), on the break our steganographic system (BOSS) 1.01 datasets. The improved convergence of qABC quickly improves the solution and fine-tunes the search better than its real counterparts. The results reveal that the qABC method with a support vector machine (SVM) classifier outperforms the non-optimised version in terms of classification accuracy and a reduced number of features.
    Keywords: Steganalysis; Feature Selection; Optimisation; Classification.
    DOI: 10.1504/IJBIDM.2017.10006486
     
  • An Effective Preprocessing Algorithm for Model Building in Collaborative Filtering based Recommender System   Order a copy of this article
    by Srikanth T, M. Shashi 
    Abstract: Recommender systems suggest interesting items to online users based on the ratings they have expressed for other items, which are maintained globally as the rating matrix. The rating matrix is often sparse and very large because a large number of users rate only a few items among the many alternatives. Sparsity and scalability are the main challenges to achieving accurate predictions in recommender systems. This paper focuses on a model building approach to collaborative filtering-based recommender systems that uses low rank matrix approximation algorithms to achieve scalability and accuracy while dealing with sparse rating matrices. A novel preprocessing methodology is proposed to counter the data sparsity problem by making the sparse rating matrix denser before extracting latent factors that characterise the users and items in a low dimensional space. The quality of predictions made either directly or indirectly through user clustering was investigated and found to be competitive with existing collaborative filtering methods in terms of reduced MAE and increased NDCG values on benchmark datasets.
    Keywords: Recommender System; Collaborative Filtering; Dimensionality Reduction; Pre-Processing; Sparsity; Scalability; Matrix Factorization.
    DOI: 10.1504/IJBIDM.2017.10006817
     
  • Error Tolerant Global Search Incorporated With Deep Learning Algorithm to Automatic Hindi Text Summarization   Order a copy of this article
    by J. Anitha, P.V.G.D. Prasad Reddy, M.S. Prasad Babu 
    Abstract: The amount of available electronic information has grown exponentially over the last two decades, creating a pressing need to understand high volumes of text data quickly. This paper describes an efficient algorithm that works by assigning scores to the sentences of the document to be summarised. It focuses on document extracts, a particular kind of computed document summary. The proposed approach uses a fuzzy classifier and a deep learning (DL) algorithm, each of which produces a score for every sentence. Combining the scores from the fuzzy classifier and DL produces a hybrid score, and the summarised text is generated based on this hybrid score. In the experimental analysis, the proposed approach achieved an average precision of 0.92 and an average recall of 0.88 at a compression rate of 10%.
    Keywords: GSA; Fuzzy; summarisation; hybrid; deep learning.
    DOI: 10.1504/IJBIDM.2017.10006978
     
  • Network Affinity Aware Energy Efficient Virtual Machine Placement Algorithm   Order a copy of this article
    by Ranjana Ramamurthy, S. Radha, J. Raja 
    Abstract: Efficient mapping of virtual machine (VM) requests to the available physical machines (PMs) is an optimisation problem in data centres. It is solved by aiming to minimise the number of physical machines and utilising them to their maximum capacity. Another avenue of optimisation in data centres is energy consumption, which can be reduced by using fewer physical machines for a given set of VM requests. This work proposes an energy efficient VM placement algorithm that is also network affinity aware. Considering the network affinity between VMs during placement reduces the communication cost and the network overhead. The proposed algorithm is evaluated using the CloudSim toolkit, and its performance in terms of energy consumed, communication cost and number of active PMs is compared with the standard first fit greedy algorithm.
    Keywords: Virtualisation; affinity aware; cloud computing; virtual machine placement; network affinity.
    DOI: 10.1504/IJBIDM.2018.10007005
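    The first fit greedy baseline named in the abstract above can be sketched as follows. This is only the baseline, not the authors' affinity-aware algorithm, and it assumes a single resource dimension and identical physical machines; `first_fit`, `vm_demands` and `pm_capacity` are hypothetical names.

```python
def first_fit(vm_demands, pm_capacity):
    """Place each VM on the first active PM with enough free capacity.

    Returns a list in which entry i is the index of the PM hosting VM i.
    Assumes one resource dimension and identical PMs (illustrative only).
    """
    free = []       # remaining capacity of each active PM
    placement = []  # PM index chosen for each VM, in request order

    for demand in vm_demands:
        if demand > pm_capacity:
            raise ValueError("a VM request exceeds the capacity of a single PM")
        for pm_index, remaining in enumerate(free):
            if demand <= remaining:
                free[pm_index] -= demand
                placement.append(pm_index)
                break
        else:
            # No active PM can host the VM, so a new PM is switched on.
            free.append(pm_capacity - demand)
            placement.append(len(free) - 1)
    return placement


# Hypothetical requests: six VMs on PMs of capacity 10.
placement = first_fit([4, 8, 1, 4, 2, 1], pm_capacity=10)
active_pms = len(set(placement))
```

    The number of active PMs is one of the three comparison metrics listed in the abstract. An affinity-aware variant would additionally weigh the traffic between VMs when choosing a host, which this sketch does not attempt.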
     
  • A Secured Best Data Center Selection in Cloud Computing Using Encryption Technique   Order a copy of this article
    by Prabhu A., M. Usha 
    Abstract: In this work, we propose an approach for providing very high security to the cloud system. The proposed method comprises three phases, namely an authentication phase, a cloud data centre selection phase and a user related service agreement phase. Accessing data from the cloud server requires a secure authentication key. In the authentication phase, the user is authenticated and receives the key, and the file is then encrypted using the blowfish algorithm. Before encryption, the input data is divided column-wise with the help of a pattern matching approach. Both encryption and decryption are carried out using the blowfish algorithm. The cloud data centre in which to store the data is selected optimally with the help of the bat algorithm. In the final phase, the user service agreement is verified. The approach will be implemented using the CloudSim simulator.
    Keywords: Authentication key; blowfish; Bat algorithm; pattern match; Cloud Data Center Selection.
    DOI: 10.1504/IJBIDM.2018.10007299
     
  • Combined Local color curvelet and mesh pattern for image retrieval system   Order a copy of this article
    by YESUBAI RUBAVATHI 
    Abstract: This manuscript presents a content based image retrieval system that uses new textural features, namely a colour local curvelet (CLC) based textural descriptor and colour local mesh pattern (CLMP), with the aim of increasing the performance of the image retrieval system. The proposed methods can exploit the distinctive details obtained from the spatial coloured textural patterns of various spectral components within a particular local image region. Furthermore, to benefit from the harmonising effect of joint colour texture information, opponent colour textural features that capture the texture patterns of spatial interactions among spectral planes are also integrated into the construction of CLC and CLMP. Extensive comparative experiments have been conducted on two benchmark databases, i.e., Corel-1k and MIT VisTex. Retrieval results show that image retrieval using colour local texture features yields better precision and recall than retrieval approaches using either colour or texture features alone.
    Keywords: Content based image retrieval system; Curvelet transform; Local mesh pattern; Color local curvelets; Color local mesh pattern.
    DOI: 10.1504/IJBIDM.2017.10007514
     
  • FUZZY BASED AUTOMATED INTERRUPTION TESTING MODEL FOR MOBILE APPLICATIONS   Order a copy of this article
    by Malini A, K. Sundarakantham, C. Mano Prathibhan, A. Bhavithrachelvi 
    Abstract: Testing of mobile applications during the occurrence of interrupts is termed interrupt testing. Interrupts can occur either internally within the mobile device or from external factors or systems. Interruptions in a smartphone may degrade the performance of mobile applications. In this paper, an automated interruption testing model is proposed to analyse the responsiveness of mobile applications during interrupts. The model monitors the applications installed on the mobile device and evaluates the overall performance of mobile applications during interrupts using fuzzy logic. An enhanced MobiFuzzy evaluation system (MFES) is proposed to dynamically analyse the test results and identify the information required for tuning the application. Fuzzy logic will help the developers or testers in tuning the application performance; by automatically categorising the impact
    Keywords: Mobile application testing; Interrupt testing; Application tracker; Performance testing.
    DOI: 10.1504/IJBIDM.2017.10007515
     
  • Evolution of Singular Value Decomposition in Recommendation Systems: A Review   Order a copy of this article
    by Rachana Mehta, Keyur Rana 
    Abstract: The proliferation of internet and web applications has led to exponential growth of users and information on the web. In such information overload scenarios, recommender systems have shown their prominence by providing users with information of interest to them. Recommender systems provide item recommendations or generate predictions. Amongst the various recommendation approaches, collaborative filtering techniques have gained prominence because of their wide item applicability. Model-based collaborative filtering techniques, which use a parameterised model for prediction, are preferred over their memory-based counterparts. However, the existing techniques deal with static data and are less accurate over sparse, high dimensional data. To alleviate such issues, matrix factorisation techniques like singular value decomposition are preferred. These techniques have evolved from using simple user-item rating information to using auxiliary social and temporal information. In this paper, we provide a comprehensive review of such matrix factorisation techniques and their applicability to different input data.
    Keywords: Recommendation System; Collaborative filtering; Matrix factorization; Singular Value Decomposition; Information retrieval; Data mining; Auxiliary information; Latent features; Model learning; Data sparsity.
    DOI: 10.1504/IJBIDM.2017.10007516
     
  • Investigating Different Fitness Criteria for Swarm-based Clustering   Order a copy of this article
    by Maria P.S. Souza, Telmo M. Silva Filho, Getulio J.A. Amaral, Renata M.C.R. Souza 
    Abstract: Swarm-based optimisation methods have been previously used for tackling clustering tasks, with good results. However, the results obtained by this kind of algorithm are highly dependent on the chosen fitness criterion. In this work, we investigate the influence of four different fitness criteria on swarm-based clustering performance. The first function is the typical sum of distances between instances and their cluster centroids, which is the most used clustering criterion. The remaining functions are based on three different types of data dispersion: total dispersion, within-group dispersion and between-groups dispersion. We use a swarm-based algorithm to optimise these criteria and perform clustering tasks with nine real and artificial datasets. For each dataset, we select the best criterion in terms of adjusted Rand index and compare it with three state-of-the-art swarm-based clustering algorithms, trained with their proposed criteria. Numerical results confirm the importance of selecting an appropriate fitness criterion for each clustering task.
    Keywords: Swarm Optimisation; Fitness criterion; Clustering; Artificial Bee Colony; Particle Swarm Optimisation.
    DOI: 10.1504/IJBIDM.2017.10007517
     
  • A Combined PFCM and Recurrent Neural Network based Intrusion Detection System for Cloud Environment   Order a copy of this article
    by Manickam M., N. Ramaraj, C. Chellappan 
    Abstract: The main objective of this paper is to develop an intrusion detection system for a cloud environment using combined PFCM-RNN. Traditional IDSs are not suitable for cloud environments, as network-based IDSs (NIDS) cannot detect encrypted node communication and host-based IDSs (HIDS) are not able to find the hidden attack trail. Traditional intrusion detection is largely inefficient when deployed in cloud computing environments due to their openness and specific essence. Accordingly, the proposed work consists of two modules, namely a clustering module and a classification module. In the clustering module, the input dataset is grouped into clusters using possibilistic fuzzy C-means clustering (PFCM). In the classification module, the centroids from the clusters are given to the recurrent neural network, which classifies whether the data is intruded or not. For experimental evaluation, we use a benchmark database, and the results clearly demonstrate that the proposed technique outperforms conventional methods.
    Keywords: cloud computing; intrusion detection system; Possibilistic Fuzzy C-means clustering; recurrent neural network.
    DOI: 10.1504/IJBIDM.2017.10007763
     
  • Master node fault tolerance in distributed big data processing clusters   Order a copy of this article
    by Ivan Gankevich, Yuri Tipikin, Vladimir Korkhov, Vladimir Gaiduchok, Alexander Degtyarev, A. Bogdanov 
    Abstract: Distributed computing clusters are often built with commodity hardware, which leads to periodic failures of processing nodes due to the relatively low reliability of such hardware. While worker node fault tolerance is straightforward, fault tolerance of the master node poses a bigger challenge. In this paper, master node failure handling is based on the concept of master and worker roles that can be dynamically re-assigned to cluster nodes, along with maintaining a backup of the master node state on one of the worker nodes. In this case, no special component is needed to monitor the health of the cluster, and master node failures can be resolved except in the case of simultaneous failure of master and backup. We present an experimental evaluation of the technique's implementation and show benchmarks demonstrating that a failure of the master does not affect a running job, and that a failure of the backup results in re-computation of only the last job step.
    Keywords: parallel computing; Big Data processing; distributed computing; backup node; state transfer; delegation; cluster computing; fault-tolerance.
    DOI: 10.1504/IJBIDM.2017.10007764
     
  • The integration of a newly defined N-gram concept and vector space model for documents ranking   Order a copy of this article
    by Mostafa Salama, Wafaa Salah 
    Abstract: The vector space model (VSM) is used to measure the similarity between documents according to the frequency of common words among them. Furthermore, the N-gram concept is integrated in the VSM to take into account the relation between common consecutive words in the documents. This approach does not consider the context and semantic dependency between non-consecutive words in the same sentence. Accordingly, the approach proposed here presents a new definition of the N-gram concept as N non-consecutive words located in the same sentence, and utilises this definition in the VSM to enhance the measurement of the semantic similarity between documents. This approach measures and visualises the correlation between words that commonly occur together within the same sentence to enrich the analysis of domain experts. The results of the experimental work show the robustness of the proposed approach against the current ranking models.
    Keywords: N-gram; vector space model; Text Mining.
    DOI: 10.1504/IJBIDM.2017.10007893
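
    To make the redefined N-gram concrete, the following minimal sketch treats every unordered combination of N distinct words from the same sentence (adjacent or not) as one feature, then compares documents by cosine similarity over the feature counts. The naive tokenisation, the choice N = 2, and the names `sentence_ngrams` and `cosine_similarity` are assumptions made for illustration; the abstract does not describe the authors' implementation.

```python
import math
import re
from collections import Counter
from itertools import combinations


def sentence_ngrams(text, n=2):
    """Count unordered n-word combinations that co-occur within a sentence."""
    counts = Counter()
    # Naive sentence split on ., ! and ? (an assumption for this sketch).
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = sorted(set(re.findall(r"[a-z]+", sentence)))
        # Every combination of n distinct words in the sentence is one feature.
        counts.update(combinations(words, n))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = "The model ranks documents. Words share context."
doc2 = "Documents are ranked by the model."
doc3 = "Solar panels track power."

sim_12 = cosine_similarity(sentence_ngrams(doc1), sentence_ngrams(doc2))
sim_13 = cosine_similarity(sentence_ngrams(doc1), sentence_ngrams(doc3))
```

    Here `doc1` and `doc2` share word pairs such as ("documents", "model") even though those words are not adjacent, so `sim_12` is positive, while `sim_13` is zero because no sentence-level pair is shared.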
     
  • MODELLING AND SIMULATION OF ANFIS-BASED MPPT FOR PV SYSTEM WITH MODIFIED SEPIC CONVERTER   Order a copy of this article
    by M. Senthil Kumar, P.S. Manoharan, R. Ramachandran 
    Abstract: This paper presents the modelling and simulation of an adaptive neuro-fuzzy inference system (ANFIS) based maximum power point tracking (MPPT) algorithm for a PV system with a modified SEPIC converter. Conventional MPPT methods have two major drawbacks: high oscillations at the maximum power point and low efficiency due to the uncertain nature of solar radiation and temperature. These problems can be solved by the proposed ANFIS-based MPPT. The proposed work uses ANFIS and a modified single ended primary inductor converter (SEPIC) to extract maximum power from the PV panel. The results obtained from the proposed methodology are compared with other MPPT algorithms such as perturb and observe (P&O), incremental conductance (INC) and radial basis function network (RBFN). The improvement in voltage rating of the modified SEPIC is compared with the conventional SEPIC converter. The results confirm the superiority of the proposed system.
    Keywords: ANFIS; INC; Modified SEPIC; P&O; RBFN.
    DOI: 10.1504/IJBIDM.2017.10007894
     
  • Distributed Algorithms for Improved Associative Multilabel Document Classification considering Reoccurrence of Features and handling Minority Classes   Order a copy of this article
    by Preeti Bailke, S.T. Patil 
    Abstract: Existing work in the domain of distributed data mining mainly focuses on achieving the speedup and scaleup properties rather than improving the performance measures of the classifier. Improvement in speedup and scaleup is expected when a distributed computing platform is used, but its computing power should also be used to improve the performance measures of the classifier. This paper focuses on that goal by considering reoccurrence of features and handling minority classes. Since it is very time consuming to run such complex algorithms on large datasets sequentially, distributed versions of the algorithms are designed and tested on a Hadoop cluster. The base associative classifier is designed based on the multi-class, multi-label associative classification (MMAC) algorithm. Since no similar distributed algorithms exist, the proposed algorithms are compared with the base classifier and show improvement in classifier performance measures.
    Keywords: Multilabel associative classifier; Hadoop; Pig; Feature reoccurrence; Minority Class; Distributed Algorithm.
    DOI: 10.1504/IJBIDM.2017.10007924
     
  • A survey on time series motif discovery   Order a copy of this article
    by Cao Duy Truong, Duong Tuan Anh 
    Abstract: Time series motifs are repeated subsequences in a long time series. Discovering time series motifs is an important task in time series data mining and this problem has received significant attention from researchers in data mining communities. In this paper, we intend to provide a comprehensive survey of the techniques applied for time series motif discovery. The survey also briefly describes a set of applications of time series motif in various domains as well as in high-level time series data mining tasks. We hope that this article can provide a broad and deep understanding of the time series motif discovery field.
    Keywords: time series; motif discovery; window-based; segmentation-based; motif applications.
    DOI: 10.1504/IJBIDM.2017.10008074
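
    As a concrete reading of "repeated subsequences", the sketch below finds the closest pair of non-overlapping fixed-length subsequences by exhaustive comparison. It is a brute-force baseline only, not one of the surveyed techniques; the plain Euclidean distance (no normalisation), the fixed length `m`, and the name `find_motif_pair` are assumptions made for illustration.

```python
import math


def find_motif_pair(series, m):
    """Return (distance, i, j) for the closest pair of non-overlapping
    length-m subsequences, found by exhaustive comparison."""
    best = (math.inf, -1, -1)
    n = len(series) - m + 1
    for i in range(n):
        # Start j at i + m so the two windows never overlap (trivial matches).
        for j in range(i + m, n):
            dist = math.sqrt(
                sum((series[i + k] - series[j + k]) ** 2 for k in range(m))
            )
            if dist < best[0]:
                best = (dist, i, j)
    return best


# Toy series in which the same bump occurs twice.
series = [0, 1, 5, 9, 5, 1, 0, 2, 3, 2, 0, 1, 5, 9, 5, 1, 0]
distance, first, second = find_motif_pair(series, m=5)
```

    The exhaustive search costs on the order of n² window comparisons, which is why faster discovery methods are needed for long series.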
     
  • Trust Management Scheme for Authentication in Secure Cloud Computing Using Double Encryption Method   Order a copy of this article
    by P. Sathishkumar, V. Venkatachalam 
    Abstract: In cloud computing and banking, both consumers and suppliers require protection and confidence in their services. This paper proposes a trust-value-oriented authentication procedure aided by encryption. In the authentication phase, a bank marketing database is clustered using the kernel fuzzy c-means clustering (KFCM) method. The clustered data are stored in the cloud for the trusted-data authentication procedure. In the authentication phase, the consumer's identity is confirmed, the consumer acquires the authentication key, and the file is then encrypted by the double encryption algorithm. First, homomorphic encryption is applied to the most trusted data, the data are encrypted with the Blowfish algorithm, and the encrypted data are then stored in the cloud data centre. With this procedure, banking data can be securely validated in cloud computing. The results show improved encryption time and highly reliable validation of the data in the cloud.
    Keywords: Authentication; Cloud Security; Cloud Services; Trust Management; clustering; cloud computing; encryption and decryption.
    DOI: 10.1504/IJBIDM.2017.10008075
     
  • Technology in its Context   Order a copy of this article
    by Tobias Christian Fischer 
    Abstract: The purpose of the literature review is to identify characteristics, concepts, and theories of business intelligence (BI). The status quo of BI is based on the literature review, which covers 86 journal articles from the three areas of accounting, strategy, and information systems between 2006 and 2014. The review combines two established frameworks to illustrate new insights regarding the macro and micro levels of BI. The complementary combination of both levels produces a new lens that shows the conceptualisation and characteristics of BI in a holistic view. The result of the study shows that BI is used as a monolithic concept and static tool with technical control mechanisms. Another result implies that BI is in a phase of maturity, in which it fulfils an organisational purpose without considering the social context or ecosystem in which it occurs. The literature review contributes to the characterisation and theorisation of BI and shows that a company depends on both the characteristics of BI and the purpose for which BI is used.
    Keywords: accounting; business intelligence; conceptualization; information systems; literature review; macro level; measurement system; micro level; strategy; technology.
    DOI: 10.1504/IJBIDM.2017.10008076
     
  • Trajectory tracking of the robot end-effector for the minimally invasive surgeries   Order a copy of this article
    by Jose De Jesus Rubio, Panuncio Cruz, Enrique Garcia, Cesar Felipe Juarez, David Ricardo Cruz, Jesus Lopez 
    Abstract: Surgical technology has been widely investigated with the aim of achieving a more efficient way of working in medicine. Consequently, robots with small tools have been incorporated in many kinds of surgery to obtain the following improvements: the patient recovers faster, the surgery is not invasive, and the robot can access hidden parts of the body. In this article, an adaptive strategy for the trajectory tracking of the robot end effector is addressed; it consists of a proportional derivative technique plus an adaptive compensation. The proportional derivative technique is employed to achieve the trajectory tracking. The adaptive compensation is employed to approximate some unknown dynamics. The robot described in this study is employed in minimally invasive surgeries.
    Keywords: Trajectory tracking; robot; minimal invasive surgery.
    DOI: 10.1504/IJBIDM.2018.10008077
     
  • Multi Label Learning Approaches for Multi Species Avifaunal Occurrence Modelling: A Case Study of South Eastern Tamil Nadu   Order a copy of this article
    by Appavu Alias Balamurugan, P.K.A. Chitra, S. Geetha 
    Abstract: Many multi label problem transformation (PT) and algorithm adaptation (AA) methods need to be explored to find a good candidate for avifaunal occupancy modelling. This research contrasted eight commonly used state-of-the-art PT and AA multi label methods. The data was created by collecting January 2014 to December 2014 records from the e-bird repository for the study area, Madurai district, south eastern Tamil Nadu. The analysis shows that classifier chain (CC) and multi label naive Bayes (MLNB) are good candidates for avifauna data. MLNB performed best, with 0.019 hamming loss and 90% average precision. To the best of our knowledge this is the first use of MLNB for avifaunal data, and the MLNB results indicate that, out of 143 species observed, six species had a high occurrence rate and 68 species had a low occurrence rate.
    Keywords: Species distribution models; multi species; multi label Learning; Multi Label Naive Bayes; Central part of southern Tamil Nadu.
    DOI: 10.1504/IJBIDM.2018.10008307
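
    For readers unfamiliar with the Hamming loss reported in this abstract, it is the fraction of individual label entries (here, species presence/absence values) that are predicted incorrectly, so lower is better. A minimal sketch follows; the data are invented for illustration and the helper name `hamming_loss` is hypothetical, not taken from the study.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label entries that differ between truth and prediction."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


# Rows are observation records; columns are species (1 = present).
# These values are invented for illustration, not taken from the study.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 1]]

loss = hamming_loss(y_true, y_pred)  # 2 wrong entries out of 12
```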
     
  • Analytics on Talent Search Examination Data   Order a copy of this article
    by Anagha Vaidya, Vyankat Munde, Shailaja Shirwaikar 
    Abstract: Learning analytics and educational data mining have greatly supported the process of assessing and improving the quality of education. While learning analytics has a longer development cycle, educational data mining suffers from the inadequacy of data captured through learning processes. The data captured from the examination process can be suitably extended to perform some descriptive and predictive analytics. This paper demonstrates the possibility of actionable analytics on the data collected from a talent search examination process by adding some data pre-processing steps. The analytics provide some insight into the learners' characteristics and demonstrate how analytics on examination data can be a major support for bringing quality to the education field.
    Keywords: Learning Analytics; Educational Data Mining; clustering; linear modelling.
    DOI: 10.1504/IJBIDM.2018.10008308
     
  • A fast clustering approach for large multidimensional data   Order a copy of this article
    by Hajar Rehioui, Abdellah Idrissi 
    Abstract: Density-based clustering is a strong family of clustering methods. The strength of this family is its ability to classify data of arbitrary shapes and to omit the noise. Among them is density-based clustering (DENCLUE), one of the well-known powerful density-based clustering methods. DENCLUE is based on the concept of the hill climbing algorithm. In order to find the clusters, DENCLUE has to reach a set of points called density attractors. Despite its advantages, DENCLUE remains sensitive to growth in the size and dimensionality of the data, because a density attractor is calculated for each point in the input data. In this paper, to overcome this shortcoming of DENCLUE, we propose an efficient approach. This approach replaces the concept of the density attractor with a new concept, the hyper-cube representative. The experimental results, obtained from several datasets, show that our approach achieves a trade-off between clustering performance and fast response time. In this way, the proposed clustering method works efficiently for large multidimensional data.
    Keywords: Large Data; Dimensional Data; Clustering; Density based clustering; DENCLUE.
    DOI: 10.1504/IJBIDM.2017.10008309
     
  • CBRec: a book recommendation system for children using the matrix factorisation and content-based filtering approaches   Order a copy of this article
    by Yiu-Kai Ng 
    Abstract: Promoting good reading habits among children is essential, given the enormous influence of reading on students' development as learners and members of society. Unfortunately, very few children's websites or online applications recommend books to children, even though they can play a significant role in encouraging children to read. Given that a few popular book websites suggest books to children based on the popularity of books or rankings of books, they are not customised/personalised for each individual user and are likely to recommend books that users do not want or like. We have integrated the matrix factorisation approach and the content-based approach, in addition to predicting the grade levels of books, to recommend books for children. Recent research has demonstrated that a hybrid approach, which combines different filtering approaches, is more effective in making recommendations. An empirical study has verified the effectiveness of our proposed children's book recommendation system.
    Keywords: Book recommendation; matrix factorisation; content analysis; children.
    DOI: 10.1504/IJBIDM.2018.10008310
     
  • Enhancing Purchase Decision using Multi-word Target Bootstrapping with Part-of-Speech Pattern Recognition Algorithm   Order a copy of this article
    by M. Pradeepa Sivaramakrishnan, C. Deisy 
    Abstract: In this research work, multi-word target related terms are extracted automatically from customer reviews for sentiment analysis. We use the LIDF measure and propose a novel measure called TCumass in an iterative multi-word target (IMWT) bootstrapping algorithm. In addition, a part-of-speech pattern recognition (PPR) algorithm is proposed to identify the appropriate target and emotional words from multi-word target related terms. This article aims to bring out both implicit and explicit targets with their corresponding polarities in an unsupervised manner. We propose two models, namely MWTB without PPR and MWTB with PPR, and compare them with the existing multi-aspect bootstrapping (MAB) algorithm. The experiments were carried out on different data sets and the performance was evaluated using different measures. The results show that the MWTB with PPR model performs well, identifying precise targets and emotional words.
    Keywords: Bootstrapping; emotional polarity; multi-word target; Part-of-Speech (POS); sentiment analysis.
    DOI: 10.1504/IJBIDM.2018.10008334
     
  • Probabilistic Variable Precision Fuzzy Rough Set Technique for Discovering Optimal Learning Patterns in E-learning   Order a copy of this article
    by Bhuvaneshwari K.S, D. Bhanu, S. Sophia, S. Kannimuthu 
    Abstract: In an e-learning environment, optimal learning patterns are discovered for realising and understanding effective learning styles. The value of uncertain and imprecise knowledge collected has to be categorised into classes known as membership grades. Rough set theory has potential for categorising data into equivalence classes, and fuzzy logic may be applied through soft thresholds to refine the equivalence relation that quantifies correlation between each class of elucidated data. In this paper, a probabilistic variable precision fuzzy rough set technique (PVPFRST) is proposed for deriving robust approximations and generalisations that handle three types of uncertainty in membership functions, namely stochastic uncertainty, imprecision and noise. The results indicate that the accuracy of PVPFRST is 21% higher than that of benchmark techniques. The results also show that PVPFRST improves effectiveness and efficiency in identifying e-learners' styles and increases performance by 27%, 22% and 25% in terms of discrimination rate, precision and recall, respectively, compared with the benchmark approaches.
    Keywords: Inclusion degree; Probabilistic fuzzy information system; fuzzy membership grade; Crispness coefficient; Probabilistic variable precision fuzzy rough set; Inclusion function.
    DOI: 10.1504/IJBIDM.2018.10008496
     
  • Inferring the Level of Visibility from Hazy Images   Order a copy of this article
    by Alexander A. S. Gunawan, Heri Prasetyo, Indah Werdiningsih, Janson Hendryli 
    Abstract: In our research, we would like to exploit crowdsourced photos from social media to create low-cost fire disaster sensors. The main problem is to analyse how hazy the environment looks. Therefore, we provide a brief survey of methods dealing with the visibility level of hazy images. The methods are divided into two categories: the single-image approach and the learning-based approach. The survey begins by discussing the single-image approach. This approach is represented by a visibility metric based on contrast-to-noise ratio (CNR) and a similarity index between a hazy image and its dehazed image. This is followed by a survey of the learning-based approach using two contrasting approaches: 1) based on the theoretical foundation of light transmission, combined with the depth image using a new deep learning method; 2) based on a black-box method employing convolutional neural networks (CNN) on hazy images.
    Keywords: Hazy image; visibility level; single image approach; learning based approach; social media.
    DOI: 10.1504/IJBIDM.2018.10008497
     
  • The Complexity of Cluster-Connectivity of Wireless Sensor Networks   Order a copy of this article
    by H.K. Dai, H.C. Su 
    Abstract: Wireless sensor networks consist of sensor devices with limited computational capabilities and memory operating with bounded energy resources; hence, network optimisation and algorithmic development in minimising the total energy or power while maintaining the connectivity of the underlying network are crucial for their design and maintenance. We consider a generalised system model of wireless sensor networks whose node set is decomposed into multiple clusters, and show that the decision and the associated minimisation problems of the connectivity of clustered wireless sensor networks appear to be computationally intractable: completeness and hardness, respectively, for the non-deterministic polynomial-time complexity class. An approximation algorithm is devised to minimise the number of end nodes of inter-cluster edges within a factor of 2 of the optimum for the cluster-connectivity.
    Keywords: wireless sensor network; connectivity; spanning tree; nondeterministic polynomial-time complexity class; approximation algorithm.
    DOI: 10.1504/IJBIDM.2018.10008498