Forthcoming articles

International Journal of Business Intelligence and Data Mining (IJBIDM)

These articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

International Journal of Business Intelligence and Data Mining (89 papers in press)

Regular Issues

  • An Effective Preprocessing Algorithm for Model Building in Collaborative Filtering based Recommender System
    by Srikanth T, M. Shashi
    Abstract: Recommender systems suggest interesting items to online users based on the ratings they have expressed for other items, which are maintained globally as a rating matrix. The rating matrix is often sparse and very large, because a large number of users express ratings for only a few items among many alternatives. Sparsity and scalability are the main challenges in achieving accurate predictions in recommender systems. This paper focuses on a model-building approach to collaborative filtering-based recommender systems, using low-rank matrix approximation algorithms to achieve scalability and accuracy while dealing with sparse rating matrices. A novel preprocessing methodology is proposed to counter the data sparsity problem by making the sparse rating matrix denser before extracting latent factors that appropriately characterise users and items in a low-dimensional space. The quality of predictions made either directly or indirectly through user clustering was investigated and found to be competitive with existing collaborative filtering methods, with lower MAE and higher NDCG values on benchmark datasets.
    Keywords: Recommender System; Collaborative Filtering; Dimensionality Reduction; Preprocessing; Sparsity; Scalability; Matrix Factorization.
    DOI: 10.1504/IJBIDM.2017.10006817
  • Trajectory tracking of the robot end-effector for the minimally invasive surgeries
    by Jose De Jesus Rubio, Panuncio Cruz, Enrique Garcia, Cesar Felipe Juarez, David Ricardo Cruz, Jesus Lopez
    Abstract: Surgical technology has been extensively investigated with the aim of making work in medicine more efficient. Consequently, robots with small tools have been incorporated into many kinds of surgery to achieve the following improvements: the patient recovers faster, the surgery is less invasive, and the robot can access hidden parts of the body. In this article, an adaptive strategy for trajectory tracking of the robot end-effector is addressed; it consists of a proportional-derivative technique plus adaptive compensation. The proportional-derivative technique is employed to achieve trajectory tracking, and the adaptive compensation is employed to approximate some unknown dynamics. The robot described in this study is employed in minimally invasive surgeries.
    Keywords: Trajectory tracking; robot; minimally invasive surgery.
    DOI: 10.1504/IJBIDM.2018.10008077
  • Analytics on Talent Search Examination Data
    by Anagha Vaidya, Vyankat Munde, Shailaja Shirwaikar
    Abstract: Learning analytics and educational data mining have greatly supported the process of assessing and improving the quality of education. While learning analytics has a longer development cycle, educational data mining suffers from the inadequacy of data captured through learning processes. Data captured from the examination process can be suitably extended to perform descriptive and predictive analytics. This paper demonstrates the possibility of actionable analytics on data collected from a talent search examination process by adding some data preprocessing steps. The analytics provide insight into learners' characteristics and demonstrate how analytics on examination data can be a major support for improving quality in the field of education.
    Keywords: Learning Analytics; Educational Data Mining; clustering; linear modelling.
    DOI: 10.1504/IJBIDM.2018.10008308
  • CBRec: a book recommendation system for children using the matrix factorisation and content-based filtering approaches
    by Yiu-Kai Ng
    Abstract: Promoting good reading habits among children is essential, given the enormous influence of reading on students' development as learners and members of society. Unfortunately, very few children's websites or online applications recommend books to children, even though they can play a significant role in encouraging children to read. Although a few popular book websites suggest books to children based on their popularity or rankings, these suggestions are not customised/personalised for each individual user and are likely to include books that users do not want or like. We have integrated the matrix factorisation approach and the content-based approach, in addition to predicting the grade levels of books, to recommend books for children. Recent research has demonstrated that a hybrid approach, which combines different filtering approaches, is more effective in making recommendations. An empirical study has verified the effectiveness of our proposed children's book recommendation system.
    Keywords: Book recommendation; matrix factorisation; content analysis; children.
    DOI: 10.1504/IJBIDM.2018.10008310
  • Inferring the Level of Visibility from Hazy Images
    by Alexander A. S. Gunawan, Heri Prasetyo, Indah Werdiningsih, Janson Hendryli
    Abstract: In our research, we would like to exploit crowdsourced photos from social media to create low-cost fire disaster sensors. The main problem is to analyse how hazy the environment looks. Therefore, we provide a brief survey of methods for estimating the visibility level of hazy images. The methods are divided into two categories: the single-image approach and the learning-based approach. The survey begins with the single-image approach, represented by a visibility metric based on the contrast-to-noise ratio (CNR) and a similarity index between a hazy image and its dehazed image. This is followed by a survey of the learning-based approach, covering two contrasting approaches: 1) one based on the theoretical foundation of light transmission, combined with the depth image obtained using a new deep learning method; 2) a black-box method employing convolutional neural networks (CNN) on hazy images.
    Keywords: Hazy image; visibility level; single image approach; learning based approach; social media.
    DOI: 10.1504/IJBIDM.2018.10008497
  • Application of a hybrid data mining model to identify the main predictive factors influencing hospital length of stay
    by Ahmed Belderrar, Abdeldjebar Hazzab
    Abstract: Length of hospital stay is one of the most appropriate measures for managing hospital resources and assisting hospital admissions. The main predictive factors associated with length of stay are critical requirements and should be identified to build a reliable prediction model for hospital stays. A hybrid integration approach consisting of a fuzzy radial basis function neural network and hierarchical genetic algorithms was proposed and applied to a data set collected from a variety of intensive care units. We achieved an acceptable forecast accuracy of more than 80.50% and found 14 common predictive factors. Most notably, we consistently found that demographic characteristics, hospital features, medical events and comorbidities strongly correlate with length of stay. The proposed approach can be used as an effective tool by healthcare providers and can be extended to other hospital predictions.
    Keywords: data mining; hospital management; length of hospital stay; hybrid prediction model; predictive factors.
    DOI: 10.1504/IJBIDM.2018.10008777
  • Genetic Algorithm based Intelligent Multiagent Architecture for Extracting Information from Hidden Web Databases
    by Weslin D, T. Joshva Devadas
    Abstract: Although an enormous amount of information is available on the web, only a very small portion of it is visible to users. Because so much of this information is hidden, traditional search engines cannot index or access all of it. The main challenge in mining relevant information from huge hidden web databases is identifying the entry points through which they can be accessed. Existing web crawlers cannot retrieve all the information in hidden web databases. To retrieve all the relevant information from the hidden web, this paper proposes an architecture that uses a genetic algorithm and intelligent agents to access hidden web databases. The proposed architecture is termed the genetic algorithm based intelligent multi-agent system (GABIAS). The experimental results show that the proposed architecture provides better precision and recall than existing web crawlers.
    Keywords: Genetic Algorithm (GA); Hidden Web; Intelligent Agent; Web Crawler.
    DOI: 10.1504/IJBIDM.2018.10008837
  • A Novel Attribute Based Dynamic Clustering with Schedule Based Rotation Method (ADC-SBR) for Outlier Detection
    by Karthikeyan .G, P. Balasubramanie
    Abstract: Detection of outliers in bank transactions has gained popularity in recent years, but existing outlier detection techniques are unable to process high volumes of data. To address this issue, an efficient attribute based dynamic clustering-schedule based rotation (ADC-SBR) method is proposed. The similarity between transactions within a cluster is estimated using a Jaccard coefficient based labelling approach, and the optimal cluster head is chosen by the similarity-based cluster head selection (SbCHS) method. Outlier detection is performed at two levels: node-level outlier detection uses a linear regression model, and cluster-level outlier detection uses deviation-based ranking. An in-house dataset of bank transactions is used for the experimental analysis. The suggested method is implemented in Apache Spark and compared with existing algorithms on the evaluation metrics. The comparison results show that the proposed method outperforms existing algorithms on all metrics.
    Keywords: Attribute based Dynamic Clustering (ADC) - Schedule based Rotation (SBR); Jaccard coefficient; Linear Regression method; Deviation based ranking; Similarity based Cluster Head Selection (SbCHS).
    DOI: 10.1504/IJBIDM.2018.10009135
  • Mining Multilingual and Multiscript Twitter Data: Unleashing the Language and Script Barrier
    by Bidhan Sarkar, Nilanjan Sinhababu, Manob Roy, Pijush Kanti Dutta Pramanik, Prasenjit Choudhury
    Abstract: Micro-blogging sites like Twitter have become an opinion hub where views on diverse topics are expressed. Interpreting, comprehending and analysing this emotion-rich information can unearth many valuable insights. The job is relatively simple if the tweets are in English, but lately the increasing use of native languages for communication has posed a great challenge for social media mining. Things become more complicated when people use Roman script to write non-English languages. India, a country with a diverse collection of scripts and languages, encounters this problem severely. We have developed a system that automatically identifies and classifies native tweets, irrespective of the script used. By converting all tweets to English, we get rid of the script-versus-language problem. The new approach we formulated consists of script identification, language analysis and clustered mining. Considering English and the top two Indian languages, we found that the proposed framework gives better precision than prevailing approaches.
    Keywords: Twitter Mining; Language Classification; Script Identification; Indic language; Preprocessing; Naive Bayes; Support Vector Machine; LDA.
    DOI: 10.1504/IJBIDM.2018.10009136
  • An Automated Ontology Learning for benchmarking classifier models through Gain-Based Relative-Non-Redundant (GBRNR) Feature Selection : A case-study with Erythemato
    by S. Sivasankari, Shomona Gracia Jacob
    Abstract: Erythemato-squamous disease (ESD) is one of the complex diseases in dermatology; its diagnosis is challenging because of shared morphological features and often leads to inconsistent results. Moreover, diagnosis has been based on visible symptoms as interpreted through the expertise of the physician. Hence, constructing an ontology for predicting erythemato-squamous disease through data mining techniques was believed to yield a clear representation of the relationships between the disease, its symptoms and the course of treatment. However, the classification accuracy needed to be high in order to obtain a precise ontology, which required identifying the correct set of optimal features for predicting ESD. This paper proposes the gain-based relative-non-redundant attribute selection approach for diagnosis of ESD. This methodology yielded 98.1% classification accuracy with the AdaBoost algorithm using J48 as the base classifier. The feature selection approach revealed an optimal feature set comprising 19 selected features.
    Keywords: Ontology; Feature Selection; Classifier; Web Ontology Language; Gain Base; Erythemato-Squamous.
    DOI: 10.1504/IJBIDM.2018.10009138
  • Optimal Page Ranking System For Web Page Personalization Using MKFCM And GSA
    by Pranitha P., M.A.H. Farquad, G. Narshimha
    Abstract: In this personalised web search (PWS) approach, we use a kernel-based fuzzy c-means (FCM) method to cluster web pages. For effective personalised web search, queries are optimised using GSA with respect to clustered query sessions. In offline processing, the input information taken from web pages visited by the user is first preprocessed and transformed into a numerical matrix. These matrices are clustered using the kernel-based FCM method; a vector is then produced for the user query, and the minimum distances to the cluster centroids are given as input to the GSA algorithm, which yields links to the top N web pages from the cluster. In online processing, the user query is taken as input, web pages are retrieved from Google, Bing and Yahoo, and their content and snippets are extracted. Finally, the sum of the content and snippet scores is computed and the web pages are ranked in descending order.
    Keywords: Kernel-based Fuzzy c-means; Clustering; offline; online; preprocessing; Google; Bing; Yahoo.
    DOI: 10.1504/IJBIDM.2018.10009140
  • Privacy Preserving-Aware Over Big Data in Clouds Using GSA and Map Reduce Framework
    by Sekar K., Mokkala Padmavathamma
    Abstract: This paper proposes a privacy preserving-aware approach over big data in clouds using GSA and the MapReduce framework. It consists of two modules: a MapReduce (MR) module and an evaluation module. In the MR module, a convolution process is applied to the dataset to create a new kernel matrix. When the convolution process is done correctly, the utility and privacy information of the data is well secured. Once the convolution process is over, the privacy-preserving framework over big data in cloud systems is carried out by the evaluation module. In the evaluation module, a neural network is trained using the gravitational search algorithm with scaled conjugate gradient (GSA-SCG), which improves the utility of the privacy-protected data. Finally, the reduced privacy data are stored at the cloud service provider (CSP). The MapReduce framework protects the private data and is responsible for anonymising the original data sets according to privacy requirements.
    Keywords: Map reduce; privacy preserving; big data; Cloud service provider; cloud system; GSA; convolution; entropy.
    DOI: 10.1504/IJBIDM.2018.10009361
  • Secure Hash Algorithm based Multiple Tenant User Security over Green Cloud Environment
    by Ram Mohan, S. Padmalal, B. Chitra
    Abstract: This paper proposes a green cloud multi-tenant trust authentication with secure hash algorithm-3 (GreenCloud-MTASHA3) scheme to eliminate unauthorised tenant access. The GreenCloud-MTASHA3 scheme provides security for multiple tenant requests in terms of confidentiality, integrity and availability rate. Confidentiality refers to restricting unauthorised tenants' access to green cloud data using the additive homomorphic privacy property in the proposed scheme. An encryption function based on the additive homomorphic privacy property is developed to improve the level of privacy preservation. To attain integrity between tenant requests and the green cloud server machine, the GreenCloud-MTASHA3 scheme carries out an encrypted trust data management process. The trustworthiness of each tenant request is measured to maintain consistent security with minimal computational time. The proposed scheme achieves confidentiality, integrity and availability in communication tasks. Experiments are conducted on factors such as secure computation confidence, authorised tenant computational time and the space taken to store encrypted data.
    Keywords: Green Cloud; Security; Confidentiality; Secure Hash Algorithm; Computational Time; Multi-Tenant; Integrity; Privacy Level; Cryptographic System.
    DOI: 10.1504/IJBIDM.2018.10009362
  • Frequent Pattern Mining for Parameterised Automatic Variable Key based cryptosystems
    by Shaligram Prajapat
    Abstract: A huge amount of information is exchanged electronically in most enterprises and organisations. In particular, in all financial and e-business set-ups, the amount of data stored or exchanged over public networks among a variety of computing devices is growing enormously. Securing this gargantuan volume of data is challenging. This paper provides a framework for securing information exchange using parametric approaches based on the automatic variable key (AVK) approach, and investigates the strength of this symmetric key-based cryptosystem using mining algorithms. This work demonstrates the application of association rules as one component of a cryptic mining system that processes encrypted data to extract useful patterns and associations. The degree of the identified patterns may be useful for ranking the degree of safety and the class of a cryptographic algorithm during the auditing of security algorithms.
    Keywords: Mining algorithms; symmetric key cryptography; AVK.
    DOI: 10.1504/IJBIDM.2018.10009363
    by Vellingiriraj EK, P. Balasubrmanie 
    Abstract: Recognition of ancient Tamil characters is a complex task because sufficient training information is not available. Various researchers have attempted accurate recognition of ancient Tamil characters. In our previous work, a hybrid multi-neural learning based prediction and recognition system (HMNL-PRS) was introduced for the prediction process, but it suffered from inaccurate recognition. In this work, this limitation is overcome by the proposed Brahmi character prediction and conversion system (BC-PCS) methodology. Here, a modified graph based segmentation algorithm (MGSA) is used to segment the characters. Statistical and structural features are then extracted, and classification is performed on them using a hybridised support vector machine based fuzzy neural network. The proposed work is implemented in the MATLAB simulation environment, and the results confirm that it achieves a better recognition rate than the previous methodology.
    Keywords: Brahmi characters; accurate recognition; segmentation; graph based approach; Classification.
    DOI: 10.1504/IJBIDM.2018.10009882
  • Benchmarking Tree based Least Squares Twin Support Vector Machine Classifiers
    by Mayank C, S.S. Bedi
    Abstract: The least squares twin support vector machine is an emerging learning method applied to classification problems. This paper presents a tree-based least squares twin support vector machine (T-LSTWSVM) for classification. The classification procedure depends on the correlation between input features and output features. UCI benchmark data sets are used to evaluate the test set performance of T-LSTWSVM classifiers with multiple kernel functions, namely linear, polynomial and radial basis function (RBF) kernels. The method is applied to the two main types of classification problem: binary class and multi-class. Evaluation and accuracy are calculated in terms of a distance metric. The tree-based method was observed to perform excellently on multi-class classification problems.
    Keywords: Binary Tree; Classification; Hyperplane; Kernel Function; Machine Learning; Support Vector Machine (SVM); Least Square Twin SVM.
    DOI: 10.1504/IJBIDM.2018.10009883
  • An Utility Based Approach for Business Intelligence to Discover Beneficial Itemsets With or Without Negative Profit in Retail Business Industry
    by C. SIVAMATHI, S. Vijayarani
    Abstract: Utility mining is defined as the discovery of high utility itemsets from large databases. It can be applied in business intelligence to support business decisions such as arranging products on shelves, catalogue design, customer segmentation and cross-selling. In this work, a novel algorithm, MAHUIM (matrix approach for high utility itemset mining), is proposed to reveal high utility itemsets from a transaction database. The proposed algorithm uses a dynamic matrix structure. It scans the database only once and does not generate candidate itemsets. It also calculates the minimum threshold value automatically, without requiring the user to supply it. The proposed algorithm is compared with existing algorithms such as HUI-Miner, D2HUP and EFIM. For handling negative utility values, the MANHUIM algorithm is proposed and compared with HUINIV. For performance analysis, four benchmark datasets (Connect, Foodmart, Chess and Mushroom) are used. The results show that the proposed algorithms are more efficient than the existing ones.
    Keywords: Utility mining; High utility itemset mining; individual item utility; transaction utility; Minimum utility threshold; Negative utility; Pruning strategy; Profitable transactions.
    DOI: 10.1504/IJBIDM.2018.10009884
  • Automated Optimal Test Data Generation for OCL Specification Using Harmony Search Algorithm
    by A. Jali
    Abstract: Exploring software testing possibilities early in the software life cycle is increasingly necessary to avoid the propagation of defects to subsequent phases. This requirement demands techniques that can generate test cases automatically in the initial phases of software development. Thus, we propose a novel framework for automated test data generation using formal specifications written in the object constraint language (OCL). We also define a novel fitness function named exit-predicate-wise branch coverage (EPWBC) to evaluate the generated test data. Another focus of the proposed approach is to optimise the test case generation process by applying the harmony search (HS) algorithm. The experimental results indicate that the proposed framework outperforms other OCL-based test case generation techniques. Furthermore, it has been inferred that OCL-based testing with the HS algorithm forms an excellent combination for producing higher test coverage and an optimal test suite, thereby improving the quality of a system.
    Keywords: specification-based testing; OCL; object constraint language; HS; harmony search; EPWBC; exit-predicate-wise branch coverage; Optimal Test Case Generation.
    DOI: 10.1504/IJBIDM.2018.10009885
  • Characteristic of Enterprise Collaboration System and Its Implementation Issues in Business Management
    by Tanvi Bhatia, Sudhanshu Joshi, Sadhna Sharma, Durgesh Samadhiya, Rajiv Ratn Shah
    Abstract: Collaboration is an extremely useful area for most enterprise systems, particularly within Web 2.0 and Enterprise 2.0. In an enterprise collaboration system (ECS), collaboration helps achieve the desired goal by unifying the completed tasks of employees or other people working on similar or identical tasks. Consequently, collaboration systems have received significant attention. The ECS provides consistent, off-the-shelf support to processes and management within organisations. ECS management techniques may be useful to a community that manages ECS systems for collaboration. In this context, this paper focuses on enterprise collaboration systems and answers critical questions related to ECS, including: 1) what does collaboration really mean for an enterprise system? 2) how can collaboration help improve the internal processes and management of the system? 3) how can it help improve interactions with customers and partners?
    Keywords: Enterprise Collaboration System; Web 2.0; Enterprise 2.0; Management Techniques; Enterprise System.
    DOI: 10.1504/IJBIDM.2019.10010132
  • Unsupervised Key Frame Selection using Information Theory and Color Histogram Difference
    by Janya Sainui, Masashi Sugiyama
    Abstract: Key frame selection is one of the important research issues in video content analysis, as it supports effective video browsing and retrieval as well as efficient storage. Key frames should typically be as different from each other as possible while, at the same time, covering the entire content of the video. However, existing methods still miss some meaningful frames because of inaccurate evaluation of the differences between frames. To address this issue, we propose a novel key frame selection method that combines an information-theoretic measure, called quadratic mutual information (QMI), with the colour histogram difference. Together, these two criteria produce an appropriate frame difference measure. Through experiments, we demonstrate that the proposed key frame selection method achieves greater coverage of the entire video content with minimum redundancy among key frames, compared with competing approaches.
    Keywords: Key frame selection; Similarity measure; Information theory; Quadratic mutual information; Color histogram difference.
    DOI: 10.1504/IJBIDM.2018.10010173
  • Building Acoustic Model for Phoneme Recognition using PSO-DBN
    by B.R. Laxmi Sree, M.S. Vijaya
    Abstract: Deep neural networks have shown their power in many classification problems, including speech recognition. This paper proposes to further enhance the power of the deep belief network (DBN) by pre-training the neural network using particle swarm optimisation (PSO). The objective of this work is to build an efficient acoustic model with deep belief networks for phoneme recognition with considerably lower computational complexity. Using PSO to pre-train the network drastically reduces the training time of the DBN and also decreases the phoneme error rate (PER) of the acoustic model built to classify the phonemes. Three variants of PSO, namely basic PSO, second generation PSO (SGPSO) and new model PSO (NMPSO), are applied to pre-train the DBN in order to analyse their performance on phoneme classification. The basic PSO is observed to perform comparatively better than the other PSO variants considered in this work most of the time.
    Keywords: Phoneme Recognition; Deep Neural Networks; Particle Swarm Optimisation; Acoustic Model; Tamil Speech Recognition; Deep Learning; Deep Belief Networks.
    DOI: 10.1504/IJBIDM.2018.10010711
  • Efficient search for top-k discords in streaming time series
    by Giao Bui Cong, Duong Tuan Anh
    Abstract: The problem of anomaly detection in streaming time series has received much attention recently. It involves finding the most anomalous subsequence (discord) in a time-series stream, which may arrive at high speed. Finding the top-k discords is more useful than finding only the most unusual subsequence, since users can choose among the top-k discords instead of being given only one. Hence, this paper proposes an efficient method for searching for top-k discords in streaming time series. The method uses a lower bound threshold, a lower bounding technique on a common dimensionality reduction transform, and a state-of-the-art technique for computing the distance between two time-series subsequences to prune unnecessary distance calculations. The three techniques are arranged in a cascade to speed up the method. Furthermore, the proposed method can return a set of top-k discords on the fly. The experimental results show that the proposed method finds discords of nearly identical quality to those obtained by HOT SAX, a well-known anomaly detection method. Remarkably, the proposed method responds quickly when handling high-speed time-series streams.
    Keywords: anomaly detection; discord; streaming time series.
    DOI: 10.1504/IJBIDM.2018.10010853
  • Mining Big data streams using Business analytics tools: A bird
    by Arunkumar PM, S. Kannimuthu
    Abstract: Big data has evolved into a prominent field in the modern computing era. Big data analytics and its impact on extracting business intelligence are becoming indispensable for a plethora of applications. The non-proprietary software revolution paved the way for the evolution of tools such as Weka, RapidMiner, Orange and R. Traditional data mining techniques hardly adapt to the requirements of rapid data analysis, and data stream processing algorithms that handle a multitude of data face greater challenges in real time. Big data mining requires further improvement of traditional tools to address the challenges of massive data processing. This paper highlights the importance of data stream mining and explores two important open source frameworks, namely massive online analysis (MOA) and scalable advanced massive online analysis (SAMOA). The implications of both tools augur well for further deliberation in the big data research community. Business information system (BIS) models can reach unprecedented heights with the proliferation of these business analytics tools.
    Keywords: Big Data; Data mining; Data streams; Massive online analysis; Business Intelligence.
    DOI: 10.1504/IJBIDM.2019.10010854
  • A novel dynamic approach to identifying suspicious customers in money transactions
    by Abdul Khalique Shaikh, Amril Nazir 
    Abstract: Money laundering has a negative impact on the development of the national economy. Anti-money laundering (AML) solutions within financial institutions help to control it. However, one of the fundamental challenges for AML solutions is identifying genuinely suspicious transactions. To identify such transactions, existing research uses pre-defined rules and statistical approaches. However, because these rules are fixed and predetermined, normal customers are likely to be flagged as suspicious. To overcome this limitation, a novel dynamic approach to identifying suspicious customers in money transactions is proposed, based on dynamic analysis of customer profile features. Experiments on real bank customer and transaction data show promising results in terms of accuracy.
    Keywords: AML; anti-money laundering; suspicious transactions; money transaction; dynamic AML analysis; data analysis.
    DOI: 10.1504/IJBIDM.2019.10010869
  • Anomaly detection for elderly home care
    by Kurnianingsih Kurnianingsih, Lukito Edi Nugroho, Widyawan Widyawan, Lutfan Lazuardi, Anton Satria Prabuwono, Mahardhika Pratama 
    Abstract: In this paper, we propose a model for detecting anomalies in elderly home care. Two scenarios are investigated: 1) the elderly person's vital signs and surrounding environment; 2) the elderly person's mobility patterns. We evaluate the proposed model using isolation forest, which detects anomalies by isolating them with an ensemble of random trees. We compare isolation forest on unlabelled data with statistical methods on labelled data. Then, to show the reliability of the isolation concept, we compare it with a distance-measure concept. The experiment shows that isolation forest has higher detection accuracy and lower prediction error for two attributes in the first scenario, skin temperature and heart rate, whereas in the second scenario multi-covariance determinant has slightly better accuracy than isolation forest (a 3.9% difference) and fewer prediction errors.
    Keywords: anomaly detection; isolation forest; elderly home care.
    DOI: 10.1504/IJBIDM.2018.10011101
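The isolation idea can be illustrated with a compact pure-Python isolation forest: each tree splits on a random feature at a random value, and points that are isolated after few splits receive scores close to 1. This is a minimal sketch of the standard algorithm, not the authors' configuration; the tree count, subsample size, the two-feature (skin temperature, heart rate) readings and all function names are illustrative assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Grow an isolation tree using a random feature and a random split value."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for unresolved leaves."""
    if tree[0] == "leaf":
        return depth + avg_path(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if x[dim] < split else right, x, depth + 1)


def isolation_scores(data, n_trees=100, sample_size=64, seed=0):
    """Score s = 2^(-E[h(x)] / c(psi)); values near 1 suggest anomalies (needs >= 2 points)."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    scores = []
    for x in data:
        mean_h = sum(path_length(t, x) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / avg_path(psi)))
    return scores
```

No labels are needed: the score depends only on how quickly a reading can be separated from the rest, which is why the method applies to the unlabelled data mentioned above.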
  • Multi-Document Based Text Summarization Through Deep Learning Algorithm
    by G. PadmaPriya, K. Duraiswamy 
    Abstract: The proposed approach uses a deep learning algorithm to produce an effective summary of a set of documents. The system consists of two phases: a training phase and a testing phase. The training phase uses three different algorithms to make the text summarisation process effective and, like any training phase, works with known data and attributes. The testing phase then evaluates the efficiency of the proposed approach. For experimentation, we used four document sets selected from DUC 2002. The experimental evaluation gave an average precision of 78%, an average recall of 1 and an average F-measure of 84%.
    Keywords: Particle Swarm Optimisation; Text Summarization; Deep Learning Algorithm.
    DOI: 10.1504/IJBIDM.2018.10011144
  • Grey-Wolf Optimizer Based Feature Selection for Feature-Level Multi-Focus Image Fusion
    by Sujatha K, D. Shalini Punithavathani, J. Janet, S. Venkatalakshmi 
    Abstract: This paper proposes optimal ensemble-individual-features (OEIF) for multi-focus image fusion by combining the decision information of individual features. The proposed system consists of three stages. In the first stage, spatial, texture and frequency features are extracted from every block of the blurred input images. In the second stage, a grey wolf optimiser (GWO)-based feature validation method is proposed to select suitable features from the source images. This method is iterative: each individual represents a candidate solution that validates or invalidates the features. In the final stage, an ensemble decision based on the optimal individual features is used to fuse the blurred images. We show that the OEIF method outperforms noisy-feature-based individual pixel-level and feature-level fusion methods on different multi-focus images, and that the OGWO-based method yields better visual quality than other methods.
    Keywords: Multi-focus image fusion; grey wolf optimiser; feature validation; spatial; texture; frequency.
    DOI: 10.1504/IJBIDM.2018.10011145
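The iterative search described above follows the standard grey wolf optimiser: the three best wolves (alpha, beta, delta) pull every other wolf towards them, and a coefficient `a` shrinks from 2 to 0 to shift from exploration to exploitation. The sketch below is a generic continuous GWO on a box, assumed for illustration; the paper's feature-validation fitness is not given, so the `binary_mask` thresholding helper that turns a position into a keep/drop feature mask is a hypothetical choice, not the authors' encoding.

```python
import random


def gwo_minimize(f, dim, lb, ub, n_wolves=20, n_iter=100, seed=0):
    """Minimise f over the box [lb, ub]^dim with the grey wolf optimiser."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_wolves)]
    best_x, best_f = None, float("inf")
    for t in range(n_iter):
        ranked = sorted(wolves, key=f)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        if f(alpha) < best_f:  # keep the best solution seen so far
            best_x, best_f = list(alpha), f(alpha)
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        new_wolves = []
        for x in wolves:
            pos = []
            for d in range(dim):
                total = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - x[d])
                    total += leader[d] - A * D
                # Average the three leader-guided moves and clip to the bounds.
                pos.append(min(ub, max(lb, total / 3.0)))
            new_wolves.append(pos)
        wolves = new_wolves
    for x in wolves:  # the final population may contain a better wolf
        if f(x) < best_f:
            best_x, best_f = list(x), f(x)
    return best_x, best_f


def binary_mask(position, threshold=0.5):
    """Hypothetical encoding: keep feature d when position[d] > threshold."""
    return [v > threshold for v in position]
```

For feature validation one would search over `[0, 1]^n_features` and evaluate each wolf by the fusion quality obtained with `binary_mask(wolf)`; that fitness function is left open here.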
  • Online Products Recommendation System using Genetic Kernel Fuzzy C-Means and Probabilistic Neural Network
    by Manohar E, D. Shalini Punithavathani 
    Abstract: Purchasers' reviews play a significant role in online shopping decisions, as customers want to learn what other purchasers think of a product. However, selecting the most appropriate product from the best website remains a challenging problem for online users. Accordingly, this paper proposes a hybrid recommendation system for identifying customer preferences and recommending the most appropriate product. First, the dataset is collected and prepared in a pre-processing step. Genetic kernel fuzzy C-means (GAKFCM) is then used to form usage clusters. Features are extracted from each cluster based on user interest level, and these interest levels are used as classifier features to discover user knowledge. Based on the user interest level, products are recommended using a probabilistic neural network (PNN). The simulation results show a high precision rate, which indicates that the proposed method is useful.
    Keywords: website; web-log; ranking; rating; review; products; Genetic Kernel Fuzzy C-Means; probabilistic neural network.
    DOI: 10.1504/IJBIDM.2018.10011146
  • Hybridising Neural Network and Pattern Matching under Dynamic Time Warping for Time Series Prediction
    by Thanh Son Nguyen 
    Abstract: Pattern matching-based forecasting models are attractive because of their simplicity and their ability to predict complex nonlinear behaviours. Euclidean distance is the most commonly used metric for pattern matching in time series. However, it is sensitive to distortions along the time axis, which can affect forecasting results. The dynamic time warping (DTW) measure was introduced to address this weakness of the Euclidean distance. In addition, artificial neural networks (ANNs) have been widely used in time series forecasting to capture complex relationships in a variety of patterns. In this work, we propose an improved hybrid method, an affine combination of a neural network model and a DTW-based pattern matching model, for time series prediction. This method exploits the individual strengths of the two models to create a more effective approach to time series prediction. Experimental results show that the proposed method outperforms both the neural network model and the DTW-based pattern matching method used separately.
    Keywords: time series; pattern matching; artificial neural network; time series prediction; dynamic time warping; k-nearest neighbour.
    DOI: 10.1504/IJBIS.2018.10011147
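The sensitivity of Euclidean distance to time-axis shifts, and the affine combination of two forecasts, can be shown in a few lines. The sketch below uses the textbook DTW recurrence with squared-error cost, a k-nearest-neighbour pattern-matching forecast under DTW, and a fixed combination weight `lam`; the weight, the window length and all function names are assumptions for illustration, since the abstract does not say how the paper sets them.

```python
import math


def dtw(a, b):
    """Dynamic time warping distance (squared-error cost, sqrt of the total)."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pattern_matching_forecast(series, w, k):
    """Average the values that followed the k past windows closest (by DTW) to the last window."""
    query = series[-w:]
    candidates = []
    for s in range(len(series) - w):  # every earlier window with a known successor
        candidates.append((dtw(series[s:s + w], query), series[s + w]))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(nxt for _, nxt in nearest) / len(nearest)


def affine_combination(ann_pred, pm_pred, lam):
    """lam * ANN forecast + (1 - lam) * pattern-matching forecast (lam assumed fixed)."""
    return lam * ann_pred + (1.0 - lam) * pm_pred
```

For example, `[0, 0, 1, 2, 1, 0, 0]` and `[0, 1, 2, 1, 0, 0, 0]` are the same bump shifted by one step: their Euclidean distance is 2.0, while DTW aligns the bumps and returns 0.0.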
  • REFERS: Refined & Effective Fuzzy E-commerce Recommendation System
    by Sankar Pariserum Perumal, Ganapathy Sannasi, Kannan Arputharaj 
    Abstract: Online shopping is gaining traction globally, and some of the biggest beneficiaries of this e-commerce shift are Amazon, eBay, etc. Recommendation systems guide online users, in a personalised manner, towards the products in a catalogue that match their interests. However, existing systems need complete information to make recommendations, which is not always available in real applications. Therefore, this paper proposes a novel refined and effective fuzzy e-commerce recommendation system that combines the differing importance a single user assigns to rating factors with a new similarity measure, with the aim of improving the recommendation list presented to the e-commerce user. The methodology has been implemented with the new similarity measure on experimental datasets, and the refined scores for unlocked mobile phones from an e-commerce website are compared against classic similarity measures.
    Keywords: Fuzzy recommendation system; degree of similarity measure; rating factor importance; collective expert rating.
    DOI: 10.1504/IJBIDM.2019.10011148
  • Decision tree classifier for university single rate tuition fee system
    by Taufik F. Abidin, Samsul Rizal 
    Abstract: The regulation on a single rate tuition fee for undergraduate study at state universities in Indonesia was enacted in 2013. The tuition fee is calculated from the needs of each academic program and the regional cost index. The fee is grouped into several categories, set differently for each university. At Syiah Kuala University, located in Banda Aceh, Indonesia, the tuition fee is grouped into five categories. This paper describes the construction of a J48 decision tree classifier that determines the category, and compares its performance in the training and testing phases with that of ID3 and Naive Bayes classifiers. The results show that the J48 decision tree classifier outperforms the other two classifiers in both phases. In the training phase, the F-measure and ROC area of the J48 decision tree classifier are 0.889 and 0.973, respectively; in the testing phase, they are 0.911 and 0.987, respectively.
    Keywords: Decision tree classifier; multi-class classification; university single rate tuition fee system.
    DOI: 10.1504/IJBIDM.2019.10011149
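One concrete difference between the compared trees is the split criterion: ID3 chooses the attribute with the highest information gain, whereas J48 (Weka's C4.5 implementation) uses the gain ratio, which divides the gain by the entropy of the attribute's own values and so penalises attributes with many distinct values. The sketch below computes both for a single categorical attribute; the toy attribute values and labels are hypothetical, not the university's data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels or attribute values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """ID3 criterion: entropy reduction from splitting on one attribute."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """C4.5 / J48 criterion: information gain divided by split information."""
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    if split_info == 0.0:
        return 0.0  # single-valued attribute: no useful split
    return information_gain(values, labels) / split_info
```

An identifier-like attribute with four distinct values separates the labels `[1, 1, 0, 0]` just as well as a two-valued one (both have a gain of 1.0), but its gain ratio drops to 0.5 because its split information is 2 bits.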
  • Using Diverse Set of Features to Design a Content-Based Video Retrieval System Optimized by Gravitational Search Algorithm
    by S. Padmakala, Ganapathy Sankar Anandha Mala, K.M. Anandkumar 
    Abstract: This paper presents a content-based video retrieval (CBVR) approach that uses four varieties of features and 12 distance measures, optimised by the gravitational search algorithm (GSA). First, the CBVR technique extracts five kinds of features (color, texture, shape, image and audio) from each frame. It then applies a particular distance measure to each type of feature to compute the similarity between the query frame and the frames in the database. GSA is used to find a near-optimal combination of the features and their respective similarity measures. Finally, the videos matching the query are retrieved from the video database. For experimentation, we used two databases: a sports video dataset and the UCF sports action dataset. The experimental results show that the proposed CBVR method performs better than other existing methods.
    Keywords: video retrieval; distance measurements; color; texture; shape; audio; CBVR; similarity; combinations.
    DOI: 10.1504/IJBIDM.2018.10012001
  • Weighted Neuro-Fuzzy Hybrid Algorithm for Channel Equalization in Time Varying Channel
    by Zeeshan A Abbasi, Zainul Abdin Jaffery 
    Abstract: In MIMO-OFDM communication systems, accurate channel estimation and equalisation play a major role. In this paper, we use a weighted neuro-fuzzy hybrid (WNFH) channel estimation algorithm for channel equalisation. The pilot is designed based on a combination of a neural network and a fuzzy logic system. The neural network is trained with a hybrid training algorithm that combines scaled conjugate gradient (SCG) with the group search optimiser (GSO) algorithm. At the transmitter, the proposed system uses quadrature amplitude modulation (QAM). To improve symbol detection in the presence of channel prediction error, a minimum mean-square error (MMSE) estimation design is used. The designed pilot sequences reduce the MMSE of channel estimation and show clear advantages in the MIMO-OFDM system. Experimental results show that the channel estimation approach is effective.
    Keywords: MIMO-OFDM; Group Search Optimizer; Scaled Conjugate Gradient; Channel Estimation.
    DOI: 10.1504/IJBIDM.2019.10012002
  • Discrete Weibull regression for modeling football outcomes
    by Alessandro Barbiero 
    Abstract: We propose the use of the discrete Weibull distribution for modeling football match results, as an alternative to existing Poisson and generalized Poisson models. The numbers of goals scored by the two teams in a football match are regarded as a paired observation and are modelled first through two independent discrete Weibull variables, and then through two dependent discrete Weibull variables, using a copula approach that accommodates non-null correlation. The parameters of the bivariate discrete Weibull distributions are assumed to depend on covariates such as the attack and defense abilities of the two teams and the 'home effect'. Several discrete Weibull regression models are proposed and then applied to the 2015-2016 Italian Serie A season. Although their parameters are less immediately interpretable than those of bivariate Poisson models, these models are a suitable alternative and can also be applied in fields other than sports data analysis.
    Keywords: count data; count regression model; Frank copula; Poisson distribution; sport analytics.
    DOI: 10.1504/IJBIDM.2018.10012003
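The building blocks of such a model can be sketched directly. Assuming the type I discrete Weibull, with P(Y = y) = q^(y^beta) - q^((y+1)^beta) for y = 0, 1, 2, ..., and a Frank copula for the dependence between home and away goals (the copula named in the keywords), the joint probability of a scoreline follows from the copula evaluated at the marginal CDFs. The regression links from team covariates to (q, beta) are not reproduced here; the parameter values and function names are illustrative assumptions.

```python
import math


def dw_pmf(y, q, beta):
    """Type I discrete Weibull: P(Y = y) = q^(y^beta) - q^((y+1)^beta)."""
    return q ** (y ** beta) - q ** ((y + 1) ** beta)


def dw_cdf(y, q, beta):
    """P(Y <= y) = 1 - q^((y+1)^beta), with P(Y <= -1) = 0."""
    if y < 0:
        return 0.0
    return 1.0 - q ** ((y + 1) ** beta)


def frank(u, v, theta):
    """Frank copula C(u, v); theta -> 0 gives independence (u * v)."""
    if abs(theta) < 1e-12:
        return u * v
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def joint_pmf(x, y, home, away, theta):
    """P(X = x, Y = y) for discrete margins via rectangle differences of the copula."""
    (q1, b1), (q2, b2) = home, away
    F = lambda t: dw_cdf(t, q1, b1)
    G = lambda t: dw_cdf(t, q2, b2)
    return (frank(F(x), G(y), theta) - frank(F(x - 1), G(y), theta)
            - frank(F(x), G(y - 1), theta) + frank(F(x - 1), G(y - 1), theta))


def outcome_probs(home, away, theta, max_goals=30):
    """Home win / draw / away win probabilities, truncating at max_goals."""
    win = draw = loss = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = joint_pmf(x, y, home, away, theta)
            if x > y:
                win += p
            elif x == y:
                draw += p
            else:
                loss += p
    return win, draw, loss
```

With beta = 1 the distribution reduces to a geometric one (for example `dw_pmf(2, 0.5, 1.0)` is 0.25 - 0.125 = 0.125), and with theta = 0 the joint probabilities factor into the product of the two marginals, which corresponds to the independent model described first.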
  • Prediction of Process Parameters in Electrical Discharge Machining Using Response Surface Methodology and ANN: An Experimental Study
    by T.M. Chenthil Jegan, R. Chitra, V.S. Thangarasu 
    Abstract: In the present work, the process parameters of electrical discharge machining (EDM) of AA6061 are predicted using response surface methodology (RSM) and an artificial neural network (ANN). AA6061 is extensively used in aircraft and aerospace applications. To reduce material wastage during machining, predicting appropriate machining parameters is essential. Current, pulse-on time, pulse-off time and flushing pressure are considered as input parameters for prediction. Experiments were conducted with these parameters at five levels, and the process responses were recorded for optimization. The material removal rate and surface roughness measured in each experimental run were compared and used to fit a quadratic model with RSM. An ANN trained with the back-propagation algorithm was used to model the relationship between the input parameters and the main output responses. The performance of the developed model is analyzed using ANOVA and regression plots. The results show that the ANN model is better suited to empirical modelling.
    Keywords: EDM; Design of Experiments; Response Surface Methodology; Artificial Neural Network; Material Removal Rate; Surface Roughness.
    DOI: 10.1504/IJBIDM.2018.10012006
  • Implementation of Multi Node Hadoop Virtual Cluster on Open Stack Cloud Environments
    by Karthikeyan Saminathan, R. Manimegalai 
    Abstract: Computing now plays a vital role in information technology and many other fields. Cloud computing is one of the biggest milestones among next-generation technologies and is growing rapidly in the IT and business sectors. Enormous amounts of data are generated every day, on the scale of terabytes (TB), petabytes (PB) and zettabytes (ZB). Hadoop MapReduce is a popular distributed computing paradigm for processing data-intensive jobs in the cloud. Completion-time goals, or deadlines, that users set for MapReduce jobs are becoming crucial in existing cloud-based data processing environments such as Hadoop. This paper presents a real-time implementation of a single node Hadoop cluster on an OpenStack private cloud that processes large datasets in parallel virtual machines, and compares its average execution time for inputs of different sizes.
    Keywords: Cloud; Data intensive; Hadoop; Map Reduce; Open Stack; Cluster.
    DOI: 10.1504/IJBIDM.2019.10012007
  • Research on Aircraft Landing Schedule using Opposition Based Genetic Algorithm with Cauchy Mutation
    by C. Nithyanandam, Gabriel Mohankumar 
    Abstract: Optimal scheduling of airport runway operations plays a significant role in air transportation. Arrival runways are a crucial resource in the air traffic system, and arrival delays have an immense impact on airline operations and costs. An important task is planning airport operations such as the arrival and departure of aircraft. This paper describes a technique that accounts for the execution time and the penalty cost of each aircraft. The experiments show that aircraft landing on the runway require the landing path to be free of congestion; congestion on that path causes problems. To eliminate this problem, a neural network and a genetic algorithm with Cauchy mutation are used to remove congestion on the runway, and the proposed technique also reduces the penalty cost charged.
    Keywords: Artificial Neural Network (ANN); Aircraft Selection; Aircraft Landing Problem; Opposition Genetic Algorithms with Cauchy Mutation; Runway Selection; Scheduling.
    DOI: 10.1504/IJBIDM.2018.10012008
  • ScrAnViz: A Tool for Analytics and Visualization of Unstructured Data
    by Sriraghav Kameswaran, V.S. Felix Enigo 
    Abstract: Existing big data visualization tools are designed for structured data. However, surveys show that about 80-90% of potentially usable business information is unstructured. Analyzing unstructured data is challenging because it lacks structure and relational form. In this paper, we propose a tool called ScrAnViz that structures data, performs analysis and provides visualization, thereby helping business people and end users make decisions. An attribute-based opinion mining algorithm has been developed and implemented. Performance analysis shows that the algorithm reduces search time by a factor of three compared with traditional document-level sentiment analysis systems.
    Keywords: Unstructured data; Data Analytics; Sentiment Analysis; Opinion Mining; Data visualization.
    DOI: 10.1504/IJBIDM.2019.10012009
  • Link prediction in multilayer networks
    by Deepak Malik, Anurag Singh 
    Abstract: Link prediction in large networks has gained popularity in recent years. Researchers have proposed various methods for finding missing links, such as common neighbours and the Jaccard coefficient, based on the proximity of nodes. These methods have a limitation: they treat all common neighbours of a pair of nodes as equal. A new method, common neighbours common neighbour (CNCN), is proposed; it performs better than existing methods in a single-layer network. These methods are based on the topological features of the network. The proposed method captures the different behaviour of the common neighbours of a pair of nodes. Link prediction is also useful in multiplex networks, where it can be more informative than in a single-layer network because several layers may give more information about a node. Two methods are proposed for multiplex networks, using dynamic and static weights.
    Keywords: common neighbours; complex network; link prediction.
    DOI: 10.1504/IJBIDM.2018.10012010
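The proximity baselines named above are short to state in code. The sketch below implements the common neighbours and Jaccard scores on an adjacency-set representation, ranks unconnected pairs, and aggregates a score across layers with fixed per-layer weights. The CNCN method itself is not reproduced, and the static-weight aggregation is an assumed, hypothetical scheme for illustration rather than the paper's definition.

```python
def common_neighbours(adj, x, y):
    """Number of neighbours shared by x and y."""
    return len(adj[x] & adj[y])


def jaccard(adj, x, y):
    """Shared neighbours divided by the size of the combined neighbourhood."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


def rank_missing_links(adj, score=common_neighbours):
    """Rank all unconnected node pairs by a proximity score, best first."""
    nodes = sorted(adj)
    pairs = [(x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]
             if y not in adj[x]]
    return sorted(pairs, key=lambda p: score(adj, *p), reverse=True)


def multiplex_score(layers, weights, x, y, score=common_neighbours):
    """Hypothetical static-weight aggregation of a per-layer score."""
    return sum(w * score(adj, x, y) for adj, w in zip(layers, weights))
```

In the four-node graph with edges a-b, a-c, b-c, b-d and c-d, the only missing pair (a, d) has two common neighbours and a Jaccard score of 1.0, while the linked pair (a, b) scores only 0.25 under Jaccard.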
    by P. Velvizhy, A. Pravi, M. Selvi, S. Ganapathy, A. Kannan 
    Abstract: Opinion mining is an active research area in e-commerce that analyses people's opinions, sentiments and emotions. Existing e-commerce systems allow users to share feedback as textual reviews of products and services, and to give product ratings that help in future product recommendation. In this research work, a computational framework for efficiently predicting consumer review ratings of products is proposed. The framework integrates dimensionality reduction, a genetic algorithm, fuzzy C-means and adaptive neuro-fuzzy inference techniques to overcome the limitations of existing systems. Experiments were conducted on an Amazon dataset of reviews for different products. The system achieves better performance and prediction accuracy for review ratings than related work.
    Keywords: sentiment analysis; review ratings prediction; dimensionality reduction; genetic algorithm; data mining; fuzzy c means.
    DOI: 10.1504/IJBIDM.2019.10012011
  • A Technique for Semantic Annotation and Retrieval of E-Learning Objects
    by Balavivekanandhan A 
    Abstract: The primary objective of this research is to design and develop a semantic annotation and retrieval model for e-learning documents. In the training phase, documents from different domains are taken, and the informative words in each document are identified based on balanced mutual information and the frequency of content in each document. The informative words are then used to identify the superordinates and the objects. The superordinates, informative words and objects of each document give its relations and properties, which are then used to cluster the documents. In the testing phase, a query or a document is given as input to the system to retrieve the relevant documents. If a document is given as input, its relations and properties are first identified and then used to retrieve the relevant documents.
    Keywords: e-learning; document clustering; balanced mutual information; one way matching; cluster based matching.
    DOI: 10.1504/IJBIDM.2018.10012012
    by Bolanle Ojokoh, Oluwatosin Olatunbosun Aboluje, Tobore Igbe 
    Abstract: In this paper, Pearson's correlation coefficient is employed for collaborative filtering because it handles numerical data and measures linear relationships between existing users. The process involves a user-user representation, similarity generation and prediction generation, with the goal of producing a predicted opinion of the active user about a specific item. A parental-control feature is also incorporated as an enhancement. The system was evaluated using precision, recall, F-measure, discounted cumulative gain (DCG), idealised discounted cumulative gain (IDCG), normalised discounted cumulative gain (nDCG) and mean absolute error (MAE). Three hundred and forty-six datasets were used, of which 126 were gathered from local video shops and 220 were extracted from the Internet Movie Database (IMDb). The results, generated by mining data obtained from the profiles and ratings of system users, show that the average ranking quality of the collaborative filtering algorithm is 95.9%.
    Keywords: Movies; Recommendation; Collaborative Filtering; Information Filtering; Correlation Coefficient; Evaluation.
    DOI: 10.1504/IJBIDM.2018.10012014
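The three steps listed above (user-user representation, similarity generation, prediction generation) and the nDCG measure can be sketched as follows. This assumes the common mean-centred weighted-average prediction, with Pearson similarity computed over co-rated items and each user's mean taken over all of that user's ratings, and the DCG form that divides each relevance by log2(rank + 1). The paper's exact variants and its parental-control filter are not reproduced; the ratings and function names are illustrative.

```python
import math


def pearson(ru, rv):
    """Pearson correlation over the items both users rated."""
    common = set(ru) & set(rv)
    if len(common) < 2:
        return 0.0
    mu = sum(ru[i] for i in common) / len(common)
    mv = sum(rv[i] for i in common) / len(common)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    su = sum((ru[i] - mu) ** 2 for i in common)
    sv = sum((rv[i] - mv) ** 2 for i in common)
    return num / math.sqrt(su * sv) if su and sv else 0.0


def predict(ratings, user, item):
    """Active user's mean plus similarity-weighted, mean-centred neighbour ratings."""
    ru = ratings[user]
    mean_u = sum(ru.values()) / len(ru)
    num = den = 0.0
    for other, rv in ratings.items():
        if other == user or item not in rv:
            continue
        sim = pearson(ru, rv)
        mean_v = sum(rv.values()) / len(rv)
        num += sim * (rv[item] - mean_v)
        den += abs(sim)
    return mean_u + num / den if den else mean_u


def ndcg(relevances):
    """nDCG = DCG / IDCG, with DCG = sum of rel_i / log2(i + 1) over ranks i = 1..n."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0
```

For instance, if user u rated three films 5, 3, 4 and user v rated the same films 4, 2, 3 and a fourth film 4, the two users correlate perfectly, and u's predicted rating for the fourth film is 4 + (4 - 3.25) = 4.75.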
  • Location based Personalized Recommendation systems for the Tourists in India
    by Madhusree Kuanr, Sachi Nandan Mohanty 
    Abstract: This study examines collaborative filtering in a recommender system that categorises users according to their choices of places, food, local items to purchase, etc. The proposed system stores the opinions of local users about sites, and about the food and products available at those sites. It uses a collaborative filtering technique, with cosine similarity, to find users similar to a given querying user. The system then recommends the best sites, along with good food and products available there, based on recent data. Two hundred married individuals (110 male, 90 female) from Bhubaneswar, Odisha (India) participated in this survey. The results reveal that collaborative filtering is the more reliable technique for personalised recommender systems. Experimental results report the performance of the proposed system in terms of precision, recall and F-measure.
    Keywords: collaborative filtering; recommender systems; user profile generation; India.
    DOI: 10.1504/IJBIDM.2019.10012396
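The neighbour search described above can be sketched with sparse preference dictionaries: cosine similarity finds the users closest to the querying user, and unseen sites or products are scored by the neighbours' preferences weighted by similarity. The profile format, neighbourhood size and the weighted-sum scoring rule are assumptions for illustration; the paper's survey-based user categories are not modelled.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse preference vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similar_users(query, profiles, n):
    """The n users most similar to the querying user."""
    others = [u for u in profiles if u != query]
    others.sort(key=lambda u: cosine(profiles[query], profiles[u]), reverse=True)
    return others[:n]


def recommend(query, profiles, n_neighbours=2, top=3):
    """Score items the query user has not rated by similarity-weighted neighbour preferences."""
    scores = {}
    for u in similar_users(query, profiles, n_neighbours):
        sim = cosine(profiles[query], profiles[u])
        for item, pref in profiles[u].items():
            if item not in profiles[query]:
                scores[item] = scores.get(item, 0.0) + sim * pref
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

A querying user who likes a temple and a beach is matched with a local user who rated the same two places similarly, so that user's highly rated museum is suggested first.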
  • Stability analysis of feature ranking techniques in the presence of noise: a comparative study
    by Iman Ramezani, Mojtaba Khorram Niaki, Milad Dehghani, Mostafa Rezapour 
    Abstract: Noisy data is a common problem with real-world data and may affect the performance of data models, the resulting decisions and the performance of feature ranking techniques. In this paper, we show how the stability of different feature ranking methods changes under attribute noise and class noise. We use Kendall's tau rank correlation and Spearman's rank correlation to evaluate the stability of various feature ranking methods, quantifying the degree of agreement between the ordered lists of features produced by a filter on a clean dataset and on the same dataset corrupted with different combinations of noise levels. According to the Kendall and Spearman measures, Gini index (GI) and information gain (IG), respectively, perform best. Both the Kendall and Spearman measures show that ReliefF (RF) is the most sensitive (worst-performing) method.
    Keywords: attribute noise; class noise; filter-based feature ranking; threshold-based feature ranking; stability; Kendall's Tau rank correlation; Spearman rank correlation.
    DOI: 10.1504/IJBIDM.2019.10012557
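The agreement measures used above compare two orderings of the same features, such as the ranking a filter produces on clean data and on noisy data. The sketch below assumes complete rankings without ties, in which case Kendall's tau is (concordant - discordant) / (n(n - 1)/2) and Spearman's rho is 1 - 6 * sum(d^2) / (n(n^2 - 1)); the feature names are hypothetical.

```python
def _positions(ranking):
    """Map each feature to its position in an ordered list."""
    return {feature: pos for pos, feature in enumerate(ranking)}


def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two orderings of the same features (no ties)."""
    pa, pb = _positions(rank_a), _positions(rank_b)
    feats = list(rank_a)
    n = len(feats)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (pa[feats[i]] - pa[feats[j]]) * (pb[feats[i]] - pb[feats[j]])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def spearman_rho(rank_a, rank_b):
    """Spearman's rho from squared rank differences (no ties)."""
    pa, pb = _positions(rank_a), _positions(rank_b)
    n = len(rank_a)
    d2 = sum((pa[f] - pb[f]) ** 2 for f in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Swapping the top two of four features leaves one discordant pair out of six, so tau = 4/6, while the squared rank differences sum to 2, so rho = 0.8; a fully reversed ranking gives -1 for both.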
  • Topic-driven top-k similarity search by applying constrained meta-path based in content-based schema-enriched heterogeneous information network
    by Phu Pham, Phuc Do 
    Abstract: In this paper, we propose TopCPathSim, a model that addresses topic-driven similarity search based on constrained (also called restricted) meta-paths between objects of the same type in content-based heterogeneous information networks (HINs). The topic distributions over content-based objects, such as papers/articles in a bibliographic network or user comments/reviews in social networks, are obtained with the LDA topic model. Experiments on the real DBLP, Aminer and ACM datasets demonstrate the effectiveness of the proposed model, which achieves about 73.56% accuracy. The results also show that combining a probabilistic topic model with constrained meta-paths is a promising way to improve the quality of topic-oriented similarity search in content-based HINs.
    Keywords: constrained meta-path; content-based heterogeneous information network; topic-driven similarity search; LDA; topic modelling.
    DOI: 10.1504/IJBIDM.2019.10012558
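Meta-path similarity search of this kind typically builds on the PathSim measure of Sun et al.: for a symmetric meta-path such as Author-Paper-Author, the commuting matrix M = A·A^T counts path instances, and s(x, y) = 2 M[x][y] / (M[x][x] + M[y][y]). The sketch below shows only this unconstrained base measure on a toy author-paper incidence matrix; the topic constraints that TopCPathSim adds via LDA are not modelled, and the function names are hypothetical.

```python
def commuting_matrix(author_paper):
    """M = A * A^T for the Author-Paper-Author meta-path (A: authors x papers, 0/1)."""
    n = len(author_paper)
    return [[sum(x * y for x, y in zip(author_paper[i], author_paper[j]))
             for j in range(n)] for i in range(n)]


def pathsim(M, x, y):
    """PathSim: 2 * M[x][y] / (M[x][x] + M[y][y])."""
    denom = M[x][x] + M[y][y]
    return 2 * M[x][y] / denom if denom else 0.0


def top_k_similar(M, x, k):
    """The k objects of the same type most similar to x under PathSim."""
    others = [y for y in range(len(M)) if y != x]
    others.sort(key=lambda y: pathsim(M, x, y), reverse=True)
    return others[:k]
```

Two authors who co-wrote the same two papers get PathSim 1.0, while an author with no shared paper gets 0.0. The normalisation by M[x][x] + M[y][y] keeps very prolific authors from dominating the ranking.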
  • Deep learning framework for early detection of intrusion in Virtual Environment
    by Madhu Priya G, S. Mercy Shalinie, P. Mohana Priya 
    Abstract: Today's business enterprises adopt cloud-based services in their architectural design. Incorporating intelligent techniques into the architecture gives large tangible and intangible benefits in terms of performance and reliability. Such cloud-based business architectures face many threats to their availability. The DDoS attack is the most prominent of these, as its impact is greater in virtual-resource-based cloud infrastructure. There is therefore a need for a business intelligence-based framework that detects the attack early by monitoring virtual network traffic. The proposed framework uses a deep learning model, the continuous discriminative deep belief network (CD-DBN), which dynamically captures attack patterns from network data, analyzes the data and detects intrusions into the cloud. The observed results show that early detection ensures the availability of cloud services to legitimate users and improves cloud resource usage.
    Keywords: Deep Learning; Restricted Boltzmann Machine; Deep Belief Network; Cloud Environment; Virtualization; Hypervisor; Intrusion Detection; Availability threat; DDoS attack; SysBench benchmark suite.
    DOI: 10.1504/IJBIDM.2018.10012559
  • Analysing Thyroid Disease using Density Based Clustering Technique
    by Tanupriya Choudhury, Veenita Kunwar, A. Sai Sabitha, Abhay Bansal
    Abstract: Data mining in medicine has been used to discover unknown patterns in health data and to obtain diagnostic results. The healthcare industry generates large amounts of complex data about patients, diseases and treatments. Data mining in healthcare offers benefits such as detecting fraud, providing medical facilities to patients at low cost, ensuring high-quality patient care and informing healthcare policy. Disease detection has become essential owing to the growing number of health issues. Thyroid disease is one such concern, with numerous cases detected every year; it involves improper functioning of the thyroid gland. In this paper, a clustering technique is used to detect and understand the factors influencing thyroid disease. The DBSCAN algorithm is used because it can handle clusters of varying shapes and sizes and is robust to noise. Principal component analysis (PCA) is also applied to find patterns in high-dimensional data and to reduce its dimensionality. The experiments were implemented in RapidMiner.
    Keywords: Data mining; Clustering; Thyroid disease; DBSCAN; Principal component analysis.
    DOI: 10.1504/IJBIDM.2019.10013037
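The properties cited for DBSCAN (arbitrary cluster shapes, explicit noise) follow from its core-point expansion, sketched below in plain Python. A point with at least `min_pts` neighbours within `eps` (itself included) is a core point; clusters grow from core points, border points join but do not expand, and everything else is labelled noise (-1). This is a generic sketch of the algorithm, not the RapidMiner operator used in the paper, and the two-dimensional points stand in for patient records after PCA purely as an assumption.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, p in enumerate(points) if math.dist(points[i], p) <= eps]


def dbscan(points, eps, min_pts):
    """Return a cluster id per point (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, no expansion
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbours)
    return labels
```

Unlike k-means, the number of clusters is not fixed in advance, and an isolated record is reported as noise rather than forced into the nearest cluster.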
  • A Simple Transform Domain Based Low Level Primitives Preserving Texture Synthesis
    by S. Anuvelavan, M. GANESH, P. Ganesan 
    Abstract: In this work, a new patch-based texture synthesis scheme using orthogonal polynomials model coefficients is presented. The proposed scheme has four phases. In the first phase, a block-matching technique, designed in terms of ordered orthogonal polynomials model coefficients, identifies the best match for synthesising a larger output image. In the next phase, when a block is successfully matched (a patch-hit), the scheme finds candidate blocks with a triangular search. In the patch-selection phase, the scheme considers only a subset of the orthogonal polynomials model coefficients of the blocks for synthesis, which reduces memory and time consumption. In the final phase, the synthesised output is smoothed while preserving the low-level content between the synthesised patches. The performance of the proposed scheme is measured with energy, contrast, correlation, homogeneity and entropy between the original and synthesised images, and is also compared with existing texture synthesis schemes. The results are encouraging.
    Keywords: Texture Synthesis; Orthogonal Polynomials; Patch-Hit; Candidate Block; Patch Selection.
    DOI: 10.1504/IJBIDM.2018.10013005
  • Optimal Region growing and Multi-kernel SVM for fault detection in Electrical Equipments using Infrared Thermography Images   Order a copy of this article
    by C. Shanmugam, E. Chandira Sekaran 
    Abstract: Infrared thermography (IRT) has played an essential part in observing and examining thermal defects of electrical equipment, which is vital for the reliability of electrical systems. This paper determines whether electrical components are faulty or non-faulty using a segmentation and classification model. Features are computed from the input thermal images, regions of interest (ROI) are segmented using an optimal region growing (ORG) technique, and faults are classified using a multi-kernel support vector machine (MKSVM). In the experiments, the classification performance obtained with different input features is assessed. To improve segmentation performance, the whale optimisation (WO) procedure is used. Before classification, the features extracted from the electrical components are combined into a fused vector for each image using a feature-level fusion (FLF) procedure. Classification performance indices, including sensitivity, specificity and accuracy, are used to identify the most appropriate input features and the best combination of classifiers. The performance of the SVM is compared with that of a neural network. The comparison results demonstrate that our technique achieves superior performance, with an accuracy of 98.21%.
    Keywords: Feature extraction; Whale optimisation; Support vector machine; optimisation; Classification and fault detection; Infrared thermography.
    DOI: 10.1504/IJBIDM.2019.10013039
  • ComRank: community-based ranking approach for heterogeneous information network analysis and mining   Order a copy of this article
    by Phu Pham, Phuc Do 
    Abstract: In this paper, we propose the ComRank model to address the problem of ranking objects of a specific type over the topic-driven communities generated in information networks. The topic-driven communities are generated by applying latent Dirichlet allocation (LDA) topic modelling. ComRank directly generates ranking results for objects of a specific type in the different network communities. We apply our approach to build a scholarly recommendation system that helps researchers find appropriate citations or potential co-authors for their scientific research. ComRank is tested on the real-world DBLP bibliographic network dataset. The experimental results demonstrate that the proposed model generates meaningful ranking results within the detected topic-driven communities.
    Keywords: information network; heterogeneous network; bibliographic network; community detection; community-based ranking; path-based ranking.
    DOI: 10.1504/IJBIDM.2019.10013040
  • AGS: A Precise and Efficient AI Based Hybrid Software Effort Estimation Model   Order a copy of this article
    by Vignaraj Vikraman, S. Srinivasan 
    Abstract: Predicting the effort required to develop software is a tedious process for software companies, and it remains a complex issue attracting extensive research attention. The success of the software development process depends considerably on properly estimating the effort required to develop the software. Effective software effort estimation techniques enable project managers to schedule software life cycle activities properly. The main objective of this paper is to propose a novel approach in which an artificial intelligence (AI)-based technique, called the AGS algorithm, is used to estimate software effort. AGS is a hybrid method combining three techniques: an adaptive neuro-fuzzy inference system (ANFIS), a genetic algorithm and the satin bower bird optimisation (SBO) algorithm. The performance of the proposed method is assessed on a well-established benchmark dataset with many attributes. The major metrics used in the performance evaluation are the correlation coefficient (CC), kilo lines of code (KLoC) and software complexity. The experimental results show that the prediction accuracy of the proposed model is better than that of existing algorithmic models.
    Keywords: Software Effort Estimation; AI; ANFIS; Lines of code (LoC); Genetic Algorithm (GA); Satin Bower Bird Optimiser (SBO); Correlation Co-efficient (CC); Kilo Lines of Code (KLoC); Software Complexity.
    DOI: 10.1504/IJBIDM.2019.10013150
  • High dimensional sentiment classification of product reviews using evolutionary computation   Order a copy of this article
    by Sonu Lal Gupta, Anurag Singh Baghel 
    Abstract: Feature selection is an important process in text classification. Traditional feature selection approaches are generally based on exhaustive search and hence become inefficient in a large search space. This task becomes more challenging as the number of features increases. Recently, evolutionary computation (EC)-based search techniques have received a lot of attention for solving the feature selection problem in high-dimensional feature spaces. This paper proposes a particle swarm optimisation (PSO)-based feature selection approach that is capable of generating the desired number of high-quality features from a large feature space. The proposed algorithm is tested on a large dataset and compared with several existing state-of-the-art feature selection algorithms, using the accuracy of the underlying classifier as the performance measure. The results demonstrate that the proposed PSO-based approach outperforms the traditional feature selection algorithms with all the classifiers considered.
    Keywords: sentiment classification; feature selection; particle swarm optimisation; PSO; evolutionary computation; support vector machine; SVM; naïve Bayes; NB; mutual information; MI; chi-square; CHI.
    DOI: 10.1504/IJBIDM.2019.10013337
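The following generic sketch shows fixed-size feature selection with a continuous PSO, to make the PSO-based selection above concrete. It is not the authors' algorithm. The encoding (each particle's k largest position components form its feature subset), the inertia and acceleration constants, and the toy fitness function are all assumptions. In the paper's setting, the fitness would instead be the accuracy of the underlying classifier on the selected features.

```python
import random


def pso_select(n_features, k, fitness, n_particles=20, iterations=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Select k of n_features indices by maximising fitness(subset) with PSO.

    Each particle holds one real-valued score per feature; its subset is the
    k features with the highest scores (a hypothetical encoding).
    """
    rng = random.Random(seed)

    def subset(position):
        ranked = sorted(range(n_features), key=lambda j: -position[j])
        return frozenset(ranked[:k])

    pos = [[rng.random() for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(subset(p)) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for j in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull towards personal best + pull towards global best
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                pos[i][j] += vel[i][j]
            f = fitness(subset(pos[i]))
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return sorted(subset(gbest)), gbest_fit


# Toy fitness: reward overlap with a hidden set of "useful" features.
useful = {1, 3, 5}
selected, score = pso_select(10, 3, lambda s: len(s & useful))
print(selected, score)
```

Ranking the position components means every particle always encodes exactly k features. This is one simple way to honour a "desired number of features" constraint.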
  • Using bagging to enhance clustering procedures for planar shapes   Order a copy of this article
    by Elaine Cristina De Assis, Renata Souza, Getulio José Amorim Do Amaral 
    Abstract: Partitional clustering algorithms find a partition that maximises or minimises some numerical criterion. Statistical shape analysis is used to make decisions based on the shape of objects. The shape of an object is the information that remains once the effects of location, scale and rotation are removed. This paper introduces clustering algorithms suitable for planar shapes, and four numerical criteria are adapted to each algorithm. To escape from local optima and reach a better clustering, these algorithms are run within a bagging framework. Simulation studies are carried out to validate the proposed methods, and two real-life data sets are also considered. Clustering quality is assessed by the corrected Rand index. The results show that the proposed algorithms are effective under the different clustering criteria and that combining bagging with the clustering algorithms provides substantial gains in cluster quality.
    Keywords: Statistical Shape Analysis; Partitional Clustering Methods; Bagging Procedure.
    DOI: 10.1504/IJBIDM.2019.10013537
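The corrected Rand index used above is also known as the adjusted Rand index. It compares two partitions while correcting for chance agreement: 1 means the partitions are identical up to relabelling, and values near 0 mean chance-level agreement. Below is a minimal sketch from the pair-counting formula. The function name is illustrative, and returning 1.0 when the denominator vanishes (for example, both partitions trivial) is a convention assumed here.

```python
from collections import Counter
from math import comb


def corrected_rand_index(labels_a, labels_b):
    """Hubert-Arabie corrected (adjusted) Rand index of two partitions."""
    n = len(labels_a)
    # number of item pairs placed together in both partitions
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (together_both - expected) / (maximum - expected)


print(corrected_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))            # 1.0
print(round(corrected_rand_index([0, 0, 1, 1], [0, 1, 0, 1]), 6))  # -0.5
```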
  • Impact of Clustering on quality of Recommendation in Cluster based Collaborative Filtering: an Empirical Study   Order a copy of this article
    by MONIKA SINGH, Monica Mehrotra 
    Abstract: In-memory nearest-neighbour computation is a typical approach for collaborative filtering (CF) because of its high recommendation accuracy. However, this approach does not scale: its performance declines as the numbers of users and items grow rapidly in typical merchandising applications. One popular technique for mitigating the scalability issue is cluster-based collaborative filtering (CBCF), which uses clustering to group the most similar users/items from the complete dataset. In this work, we present a detailed analysis of the impact of clustering in CF. Specifically, we study how the extent of clustering affects collaborative filtering systems in terms of quality of predictions, quality of recommendations, throughput and coverage. Based on the empirical results obtained from two datasets, Movielens100K and Jester, we conclude that as the number of clusters increases, the quality of predictions, the quality of recommendations and the throughput improve, but the coverage provided by the clustered subsystems declines.
    Keywords: Recommender Systems; Collaborative Filtering; Clustering; Prediction; Nearest neighbors; Clustering based collaborative filtering; Average recommendation time; Coverage; Quality of predictions and Qua.
    DOI: 10.1504/IJBIDM.2019.10013538
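The toy sketch below illustrates the coverage effect described above. If predictions may only draw on users in the target user's cluster, finer clustering shrinks the pool of peers, and some user-item pairs can no longer be predicted at all. For brevity, the prediction is a plain cluster mean rather than the similarity-weighted neighbourhood formula a real CBCF system would use. The data and names are hypothetical.

```python
def predict_in_cluster(ratings, clusters, user, item):
    """Mean rating of `item` among the other users in `user`'s cluster.

    ratings: {user: {item: rating}}; clusters: {user: cluster id}.
    Returns None when no peer in the cluster has rated the item (no coverage).
    """
    peers = [u for u, c in clusters.items()
             if c == clusters[user] and u != user and item in ratings.get(u, {})]
    if not peers:
        return None
    return sum(ratings[u][item] for u in peers) / len(peers)


ratings = {"u1": {"m1": 4, "m3": 2}, "u2": {"m1": 2, "m2": 5},
           "u3": {"m1": 1}, "u4": {"m2": 3}}
two_clusters = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}
one_cluster = {u: 0 for u in ratings}

print(predict_in_cluster(ratings, one_cluster, "u3", "m2"))   # 4.0
print(predict_in_cluster(ratings, two_clusters, "u3", "m2"))  # 3.0
print(predict_in_cluster(ratings, one_cluster, "u3", "m3"))   # 2.0
print(predict_in_cluster(ratings, two_clusters, "u3", "m3"))  # None
```

Splitting the users into two clusters makes each lookup cheaper, but the pair ("u3", "m3") loses its only rater. This is the coverage decline the study reports.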
    by Lakshmi R, S. Baskar 
    Abstract: In this paper, two new similarity measures, namely distance of term frequency-based similarity measure (DTFSM) and presence of common terms-based similarity measure (PCTSM), are proposed to compute the similarity between two documents and so improve the effectiveness of text document clustering. The effectiveness of the proposed measures is evaluated on the Reuters-21578 and WebKB datasets by clustering the documents with the K-means and K-means++ algorithms. The results obtained with DTFSM and PCTSM are significantly better than those of other measures for document clustering in terms of accuracy, entropy, recall and F-measure. The proposed similarity measures not only improve the effectiveness of text document clustering but also reduce the complexity of similarity computation, in terms of the number of operations required during clustering.
    Keywords: Document Clustering; Similarity Measures; Accuracy; Entropy; Recall; F-Measure; K-means clustering Algorithm.
    DOI: 10.1504/IJBIDM.2018.10013539
  • XML web quality analysis by employing MFCM clustering Technique and KNN classification   Order a copy of this article
    by M. Gopianand, P. Jaganathan 
    Abstract: The great success of web search engines rests on keyword search, the most popular form of search for ordinary users. It lets users write queries without knowing the query language or the database schema, so it is considered a user-friendly method. The quality of XML web data has to be accurate if queries are to be answered exactly. We propose a method to assess the quality of the XML web by analysing the keywords present in it with respect to the corresponding keyword search. In our method, a number of XML documents are collected and clustered by keyword according to the type of XML file, using modified fuzzy C-means (MFCM). Once the keyword-based clustering is done, the XML web is classified by data quality using a KNN classifier.
    Keywords: XML web; K nearest neighbor; Error value; Classification accuracy; feature vectors.
    DOI: 10.1504/IJBIDM.2018.10014525
  • Analysis and Prediction of Heart Disease with the Aid of Various Data Mining Techniques: A Survey   Order a copy of this article
    by V. Poornima, D. Gladis 
    Abstract: In recent times, diseases have been increasing steadily because of hereditary factors. Heart disease in particular has become more common nowadays, putting people's lives at risk. The data mining strategies specifically decision tree, Na
    Keywords: Data mining; Heart Disease Prediction; performance measure; Fuzzy; clustering.
    DOI: 10.1504/IJBIDM.2018.10014620
  • Signal-Flow Graph Analysis and Implementation of Novel Power Tracking Algorithm Using Fuzzy Logic Controller   Order a copy of this article
    by S. VENKATESAN, Manimaran Saravanan, Subramanian Venkatnarayanan (Senior Member, IEEE) 
    Abstract: This paper discusses the merits of a novel modified perturb and observe (P&O) maximum power point tracking (MPPT) algorithm for a stand-alone solar PV system using an interleaved LUO converter with a fuzzy logic controller (FLC). The FLC-based system is compared with an existing system. Analytical expressions for the proposed converter are derived through signal-flow graph analysis. The proposed interleaved LUO converter-based PV system with fuzzy controller considerably reduces ripple content, and the proposed MPPT algorithm produces less hunting around the maximum power point. Simulations at different illumination levels are carried out using MATLAB/Simulink, and the system is also verified experimentally with a typical 40 W solar PV panel. The results confirm the superiority of the proposed system with the fuzzy controller.
    Keywords: Fuzzy Logic Controller; Interleaved LUO Converter; Maximum Power Point Tracking (MPPT); Modified P&O algorithm; Photovoltaic (PV) system.
    DOI: 10.1504/IJBIDM.2018.10014621
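The hunting mentioned above is easiest to see in the classic, unmodified perturb and observe rule, sketched below. This is the textbook rule, not the authors' modified algorithm or their fuzzy controller. The operating voltage is nudged by a fixed step, and the direction is kept while power rises and reversed when it falls. The quadratic power curve, starting voltage and step size are hypothetical stand-ins for a real PV characteristic.

```python
def perturb_and_observe(power_at, v_start, step=0.5, iterations=100):
    """Classic P&O MPPT: nudge the operating voltage and keep moving in
    whichever direction increased power. Returns the final voltage."""
    v_prev, p_prev = v_start, power_at(v_start)
    v = v_start + step  # first perturbation
    for _ in range(iterations):
        p = power_at(v)
        # step up if power and voltage moved in the same direction, else step down
        increase = (p > p_prev) == (v > v_prev)
        v_prev, p_prev = v, p
        v = v + step if increase else v - step
    return v


def pv_power(v):
    """Hypothetical P-V curve with its maximum (40 W) at 17 V."""
    return 40.0 - (v - 17.0) ** 2


v = perturb_and_observe(pv_power, v_start=10.0)
print(abs(v - 17.0) <= 0.5)  # True
```

The voltage climbs to 17 V and then keeps oscillating among 16.5, 17 and 17.5 V. This steady-state oscillation of one step around the maximum power point is the hunting that a modified P&O scheme aims to reduce.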
  • SoLoMo Cities: Socio-Spatial City Formation Detection and Evolution Tracking Approach   Order a copy of this article
    by Sara Elhishi, Mervat Abu-Elkheir, Ahmed Aboul-Fotouh 
    Abstract: The tremendous growth of telecommunication devices, coupled with the huge number of social media users, has brought about a new kind of development that is turning our cities into information-rich smart platforms. We analyse the role of location-based social network (LBSN) check-ins, using a modified version of the Louvain community detection algorithm to extract city-structured communities, which we call SoLoMo cities. We then track the evolution patterns of these communities through a pairwise consecutive matching process to detect behavioural events that change a city's communities. The findings of the experiments on the Brightkite dataset can be summarised as follows: online users' check-in activities reveal a set of well-formed physical land spaces of a city's communities, and the concentration of online social interactions and the formation of those cities are positively correlated, at a percentage of 89%. Finally, we were able to track the evolution of the discovered communities by detecting three community behaviour events: survive, grow and shrink.
    Keywords: location-based social networks; LBSN; social; spatial analysis; community detection; evolution; tracking; Brightkite.
    DOI: 10.1504/IJBIDM.2019.10014746
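The Louvain algorithm, which the study above modifies, greedily moves nodes between communities to increase Newman's modularity Q. Q is the fraction of edges inside communities minus the fraction expected if edges were placed at random with the same node degrees. The small sketch below computes Q for a given partition of a simple undirected graph. It is not the authors' modified Louvain, and the graph is a toy example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2] for a simple
    undirected graph, where L_c is the number of edges inside community c
    and d_c is the total degree of its nodes."""
    m = len(edges)
    inside = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two disconnected triangles: keeping them apart scores well.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
print(modularity(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}))  # 0.5
print(modularity(edges, {n: 0 for n in range(6)}))             # 0.0
```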
    by Betty P, Mohanageetha D, Jeena Jacob 
    Abstract: Biometric authentication has gained significance because of its high uniqueness and performance. Quick and convenient authentication is required because of its widespread demand. Feature extraction is the primary and most important task for effective authentication. This paper uses the dissimilar chrominance texture pattern (DiCTP) technique for effective feature extraction. Two pattern sequences are generated from the inter-channel information of the image, extracting the colour texture information of the input. Unique information is generated from the RGB and BRG planes of the image, producing part of a diversified chromatic feature vector. A local binary pattern (LBP) code is also generated and added to the feature vector, which helps incorporate the greyscale information of the image. Experiments on the CASIA Face Image Database Version 5 (DB1) and the Indian Face Database (DB2) show considerable improvements over the existing methodology.
    Keywords: Biometric Authentication; Dissimilar Chrominance Texture Pattern; Content Based Image Retrieval.
    DOI: 10.1504/IJBIDM.2018.10014922
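The LBP component mentioned above can be illustrated on a greyscale image. Each interior pixel gets an 8-bit code with one bit per neighbour, set when the neighbour is at least as bright as the centre, and the histogram of codes serves as a texture descriptor. This sketch covers only plain LBP, not the DiCTP colour patterns. The clockwise neighbour ordering starting at the top-left is one common convention and an assumption here.

```python
def lbp_code(img, r, c):
    """8-neighbour LBP code of pixel (r, c); neighbours are read clockwise
    from the top-left, and a neighbour >= centre contributes a 1 bit."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
print(lbp_code(img, 1, 1))  # 120 (bits 3-6 set: neighbours 6, 9, 8, 7)
```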
  • Discovery of Rare Association Rules in the Distribution of Lawsuits in the Federal Justice System of Southern Brazil   Order a copy of this article
    by Lucia Gruginskie, Guilherme Vaccaro, Leonardo Chiwiakwosky, Attilla Blesz Jr 
    Abstract: In the context of data mining, infrequent association rules, with very low support and high confidence, may be beneficial for analysing rare or extreme cases. In researching risky situations or allocating specific resources, such rules may have a much greater impact than rules with high support. The objective of this study is to obtain association rules, both frequent and rare, from the database of lawsuits filed in the Federal Court of Southern Brazil in 2016. The rules found, especially the rare ones, can assist the decision-making process, in this case, for example, in training clerks or establishing specialised courts.
    Keywords: Association Rules; Rare Rules; Distribution of lawsuits; Brazilian Federal Justice; Data mining.
    DOI: 10.1504/IJBIDM.2019.10015160
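Rare rules, as characterised above, combine low support with high confidence. The minimal sketch below enumerates single-item rules A -> B and keeps those whose support is at most `max_support` and whose confidence is at least `min_confidence`. The thresholds, the restriction to one item on each side and the toy transactions (stand-ins for lawsuit attributes) are assumptions, not the study's mining setup.

```python
from itertools import permutations


def rare_rules(transactions, max_support, min_confidence):
    """Single-item rules A -> B with support <= max_support and
    confidence >= min_confidence. Returns (A, B, support, confidence)."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    items = sorted(set().union(*baskets))

    def support(itemset):
        return sum(1 for basket in baskets if itemset <= basket) / n

    rules = []
    for a, b in permutations(items, 2):
        sup_ab = support({a, b})
        if sup_ab == 0 or sup_ab > max_support:
            continue  # never co-occur, or too frequent to count as rare
        confidence = sup_ab / support({a})
        if confidence >= min_confidence:
            rules.append((a, b, sup_ab, confidence))
    return rules


# Hypothetical case attributes: one frequent pattern and one rare, reliable one.
cases = [["a", "b"]] * 8 + [["r", "s"], ["a"]]
for rule in rare_rules(cases, max_support=0.2, min_confidence=0.9):
    print(rule)
# ('r', 's', 0.1, 1.0)
# ('s', 'r', 0.1, 1.0)
```

The frequent pair a/b is filtered out by the support ceiling. A conventional minimum-support threshold would do the opposite and discard r/s.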
  • Integral Verification and Validation for Knowledge Discovery Procedure Models   Order a copy of this article
    by Anne Antonia Scheidler, Markus Rabe 
    Abstract: This paper explains why knowledge discovery in databases (KDD) procedure models lack verification and validation (V&V) mechanisms and introduces an approach for integral V&V. Based on a generic model for knowledge discovery, a structure named the 'KDD triangle model' is presented. This model has a modular design and can be adapted to other KDD procedure models, which allows existing projects to improve their quality assurance in knowledge discovery. In this paper, the different phases of the developed triangle model for KDD are discussed, with a special focus on the phase results and the related testing mechanisms. This paper also describes possible V&V techniques for the developed integral V&V mechanism to ensure the direct applicability of the model.
    Keywords: knowledge discovery in databases; data mining; procedure model; verification and validation; quality assurance.
    DOI: 10.1504/IJBIDM.2019.10015983
  • A Multiclass Classification Approach for Incremental Entity Resolution on Short Textual Data   Order a copy of this article
    by Denilson Pereira, João A. Silva 
    Abstract: Several web applications maintain data repositories containing references to thousands of real-world entities originating from multiple sources, and they continually receive new data. Identifying the distinct entities and associating the correct references with each one is a problem known as entity resolution. The challenge is to solve the problem incrementally, as the data arrive, especially when the data are described by a single textual attribute. In this paper, we propose a new approach for incremental entity resolution. The method we have implemented, called AssocIER, uses an ensemble of multiclass classifiers with self-training and detection of novel classes. We evaluated our method on various real-world datasets and scenarios, comparing it with a traditional entity resolution approach. The results show that AssocIER is effective and efficient at resolving unstructured data in collections with a large number of entities and features, and is able to detect hundreds of novel classes.
    Keywords: Entity Resolution; Associative Classification; Incremental Learning; Novel Class Detection; Self-training.
    DOI: 10.1504/IJBIDM.2019.10015984
  • Method for Improvement of Transparency: Use of Text Mining Techniques for Reclassification of Governmental Expenditures Records in Brazil   Order a copy of this article
    by Gustavo De Oliveira Almeida, Kate Revoredo, Claudia Cappelli, Cristiano Maciel 
    Abstract: Many countries have transparency laws requiring the availability of data. However, data are often available but not transparent. We present the case of the Transparency Portal of the Brazilian Federal Government and discuss the limitations of public acquisitions data stored in free-text format. We employed text-mining techniques to reclassify the descriptive texts of measurement units related to products and services. The solution, implemented in KNIME and Java, aggregated measurements in the original sample (n = 69,372; 78% reduction in the number of descriptions; 94% of items classified) and in a cross-validation sample (n = 105,266; 88% reduction; 78% of items classified). In addition, we measured the computational time for processing texts over a wide range of input sizes, and the results suggest that the solution is stable and scalable to larger datasets. Finally, we produced analyses identifying probable input errors, suppliers and purchasing units with abnormal transactions, and factors affecting procurement prices. We present suggestions for future research and improvements.
    Keywords: e-government; data mining; open government; text mining; transparency; KNIME; knowledge discovery; techniques; Brazil.
    DOI: 10.1504/IJBIDM.2019.10015985
  • Data Mining in Credit Insurance Information System for Bank Loans Risk Management in Developing Countries   Order a copy of this article
    by Fouad J. Al Azzawi 
    Abstract: Credit risk insurance is critical today, since loans are taken by everyone, everywhere, and it is quite difficult to estimate accurately the losses incurred when those loans are not repaid. This work proposes an information system module for the banking system that improves risk management by distributing losses on a fair basis while accepting the maximum number of loan requests. By insuring the risk associated with non-performing loans, the bank partially or completely shifts losses to the insurance company under the contract, thus minimising its own losses. The proposed module can determine the price at which the bank can buy such an insurance policy. It could also be a valuable motivation for developing countries to update their strategy for the current insurance market and outsource part of the state's insurance functions to an independent insurance industry. Data mining techniques and mathematical induction were used to implement this model successfully. An optimal classification module for predicting risky loan requests has been successfully employed. A new mathematical model has been developed for calculating the cost of an insurance policy in a crisis economy.
    Keywords: Data mining; Credit insurance; information systems; Bank loans; risk management; developing countries.
    DOI: 10.1504/IJBIDM.2019.10016599
  • Fibonacci Retracement Pattern Recognition for Forecasting Foreign Exchange Market   Order a copy of this article
    by Mohd Fauzi Ramli, AHMAD KADRI JUNOH, Mahyun Ab Wahab, Wan Zuki Azman Wan Muhamad 
    Abstract: Fibonacci retracement forecasts future movements in foreign exchange (forex) rates through inductive analysis of previous movements. Fibonacci ratios, which provide the mathematical foundation of Elliott wave theory, are used to forecast retracement levels of 0.382, 0.500 and 0.618 and to determine the current trend. K-nearest neighbour (KNN) and linear discriminant analysis (LDA) are used as pattern recognition methods for nonlinear feature mining of Elliott wave patterns. Results show that LDA outperforms KNN in classification accuracy, reaching 99.43%. Of the three Fibonacci retracement levels, 38.2% gives the best forecasts for the major pair British pound/US dollar, measured by mean absolute error (MAE), root mean square error (RMSE) and Pearson correlation coefficient (r): 0.001884, 0.000019 and 0.992253 for the uptrend and 0.001685, 0.000019 and 0.998806 for the downtrend.
    Keywords: forex; forecast; fibonacci retracement; elliott wave; golden ratio.
    DOI: 10.1504/IJBIDM.2019.10016710
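By convention, retracement levels are placed at the given fractions of the last swing range. In an uptrend the pullback is measured down from the swing high, and in a downtrend the bounce is measured up from the swing low. The minimal sketch below follows that convention. The swing prices are hypothetical, and the function is not taken from the paper.

```python
FIB_RATIOS = (0.382, 0.500, 0.618)


def retracement_levels(swing_low, swing_high, uptrend=True, ratios=FIB_RATIOS):
    """Price level for each retracement ratio of the swing range."""
    span = swing_high - swing_low
    if uptrend:
        # after a rise, pullbacks are measured down from the swing high
        return {r: swing_high - r * span for r in ratios}
    # after a fall, bounces are measured up from the swing low
    return {r: swing_low + r * span for r in ratios}


levels = retracement_levels(1.2000, 1.3000)
print({r: round(p, 4) for r, p in levels.items()})
# {0.382: 1.2618, 0.5: 1.25, 0.618: 1.2382}
```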
  • CARs-RP: Lasso Based Class Association Rules Pruning   Order a copy of this article
    by AZMI Mohamed, Abdelaziz Berrado 
    Abstract: Classification based on association rules is attracting growing interest in research and practice. In many contexts, rules are mined from sparse data in high-dimensional spaces, which leads to a large number of rules with considerable containment and overlap. Pruning is often used to search for an optimal subset of rules. This paper introduces a method for pruning class association rules (CARs). It learns weights for a set of CARs by maximising the likelihood function subject to a constraint on the sum of the absolute values of the weights. The pruning strength is controlled by a shrinkage parameter λ. The method lets the user choose an appropriate subset of CARs through a trade-off between the accuracy and the complexity of the resulting classifier, which is controlled by changing λ. Experimental analysis shows that the method builds more concise classifiers with accuracy comparable to that of other methods.
    Keywords: class association rules; pruning; regularization; weighting; associative classification.
    DOI: 10.1504/IJBIDM.2019.10018121
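The idea behind lasso-based CAR pruning can be sketched as follows. Encode each rule as a binary feature (1 if the rule fires on an example), fit an L1-penalised logistic regression, and prune every rule whose weight is driven exactly to zero. This sketch uses the penalised (Lagrangian) form of the constrained problem with a proximal-gradient (soft-thresholding) solver and handles a binary class only. The solver, learning rate, iteration count and toy data are assumptions, not the authors' implementation. A larger `lam` (the shrinkage parameter) prunes more rules.

```python
import math


def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x towards zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def lasso_rule_weights(X, y, lam, lr=0.1, iterations=500):
    """L1-penalised logistic regression over rule-firing features.

    X[i][j] is 1 if rule j fires on example i, else 0; y[i] is 0 or 1.
    Minimises mean negative log-likelihood + lam * sum(|w_j|) by proximal
    gradient descent; the intercept b is not penalised.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob. minus label
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * g, lr * lam) for wj, g in zip(w, grad_w)]
    return w, b


def kept_rules(w):
    """Indices of rules that survive pruning (non-zero weight)."""
    return [j for j, wj in enumerate(w) if wj != 0.0]


X = [[1, 0], [1, 1], [0, 0], [0, 1]]  # rule 0 tracks the class; rule 1 is uninformative
y = [1, 1, 0, 0]
print(kept_rules(lasso_rule_weights(X, y, lam=10.0)[0]))  # []
print(lasso_rule_weights(X, y, lam=0.01)[0][0] > 0)      # True
```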
  • A statistical approach to investigate the alternatives of love in Moulana's Divan   Order a copy of this article
    by Mohammad Reza Mahmoudi, Ali Abbasalizadeh, Marzieh Rahmati 
    Abstract: Conceptual metaphor is the systematic mapping of conceptual domains onto one another. Love is the most important axis of the mystical path. In this paper, all the lines in Moulana's Divan are studied, and the different words used as alternatives for love are identified and classified into 11 areas. A chi-square goodness-of-fit test is then used to investigate and compare, separately, the frequencies of the different areas and of the words used as alternatives for love. Finally, clustering methods are used to group these alternatives into three clusters (high, medium and low frequency). The results indicate that the word 'fire' and the area 'human' are the most frequently used alternatives for love.
    Keywords: Conceptual Metaphor Love; Moulana; Statistics; Data Mining; Text Mining.
    DOI: 10.1504/IJBIDM.2019.10018197
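The goodness-of-fit test above compares observed frequencies with those expected under a hypothesis, here equal use of every category. The minimal sketch below computes Pearson's statistic, chi² = sum over categories of (O - E)² / E, with k - 1 degrees of freedom. The counts are hypothetical. For a p-value, `scipy.stats.chisquare` returns the same statistic together with its p-value.

```python
def chi_square_gof(observed, expected=None):
    """Pearson chi-square goodness-of-fit statistic and degrees of freedom.

    If `expected` is omitted, all categories are assumed equally likely.
    """
    k = len(observed)
    if expected is None:
        expected = [sum(observed) / k] * k
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, k - 1


print(chi_square_gof([20, 10, 0]))  # (20.0, 2)
```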
  • PPM-HC: a Method for Helping Project Portfolio Management Based on Topic Hierarchy Learning   Order a copy of this article
    by Ricardo M. Marcacini, Ricardo A. M. Pinto, Flavia Bernardini 
    Abstract: Project categorisation is a crucial step in project portfolio management (PPM). Categorising projects allows the organisation to identify categories with a lack or excess of projects, according to its strategic objectives. In this work, we present a new method for project portfolio management based on hierarchical clustering (PPM-HC) that organises projects at several levels of abstraction. In PPM-HC, similar projects are allocated to the same clusters and subclusters. PPM-HC automatically learns an understandable topic hierarchy from the project portfolio dataset, thereby facilitating the (human) task of exploring, analysing and prioritising the organisation's projects. We also propose a card sorting-based technique that allows the projects categorisation to be evaluated using an intuitive visual map. We carry out an experimental evaluation on a benchmark dataset and also present a real-world case study. The results show that the proposed PPM-HC method is promising.
    Keywords: Project Portfolio Management; Projects Categorization; Topic Hierarchy Learning; Hierarchical Clustering.
    DOI: 10.1504/IJBIDM.2019.10018936
  • An efficient approach for Defect Detection in Texture analysis using Improved Support Vector Machine   Order a copy of this article
    by Manimozhi I., Janakiraman S. 
    Abstract: Texture defect detection can be defined as the process of determining the location and size of the collections of pixels in a textured image that deviate in their intensity values or spatial arrangement in comparison with a background texture. The detection of such abnormalities is a very challenging problem in computer vision. We design a method for detecting defects through pattern texture analysis. First, features are extracted from the input image using the gray level co-occurrence matrix (GLCM) and gray level run-length matrix (GLRLM). The extracted features are then fed to the classification stage, where classification is performed by an improved support vector machine (ISVM), obtained by improving the traditional support vector machine by means of kernel methods. In the final stage, the classified features are segmented using the modified fuzzy C-means (MFCM) algorithm.
    Keywords: Texture defect detection; preprocessing; Gray Level Co-occurrence matrix; Gray Level Run-Length Matrix; Improved Support Vector Machine; modified fuzzy c means algorithm.
    DOI: 10.1504/IJBIDM.2019.10018937
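A GLCM counts how often grey level i occurs next to grey level j at a fixed offset. Texture features such as contrast, energy and homogeneity are then weighted sums over the normalised matrix. The minimal sketch below handles one offset (by default, the right-hand neighbour) with no symmetrisation and does not cover the GLRLM. The function names and toy image are illustrative. Energy is computed here as the angular second moment (sum of squared entries); some definitions take its square root.

```python
def glcm(img, levels, dr=0, dc=1):
    """Normalised grey-level co-occurrence matrix for the offset (dr, dc).

    img holds integer grey levels in range(levels); the matrix is not
    symmetrised, so (i, j) and (j, i) pairs are counted separately.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[img[r][c]][img[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),  # angular second moment
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }


print(glcm_features(glcm([[0, 1], [0, 1]], levels=2)))
# {'contrast': 1.0, 'energy': 1.0, 'homogeneity': 0.5}
```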
    by A. M. Viswa Bharathy  
    Abstract: The classification techniques proposed so far are not sufficiently intelligent to classify datasets beyond two-level classification. Multiclass classification of network data therefore calls for more hybrid algorithms. In this paper, we propose a hybrid technique that combines a modified K-means algorithm called dynamic replicative K-means (DRKM) with self-compiling particle swarm intelligence (SCPSI). The dataset chosen for the experiments is KDD Cup 99. The results show that DRKM-SCPSI performs better in terms of detection rate (DR), false positive rate (FPR) and accuracy.
    Keywords: anomaly; detection; intrusion; K-Means; PSI.
    DOI: 10.1504/IJBIDM.2019.10019194
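The evaluation measures named above follow from confusion-matrix counts when traffic is treated as attack versus normal. The minimal sketch below uses hypothetical counts. The binary framing is a simplification; a multiclass setting would compute these measures per class.

```python
def detection_metrics(tp, fp, tn, fn):
    """Binary intrusion-detection measures from confusion-matrix counts."""
    return {
        "detection_rate": tp / (tp + fn),       # attacks correctly flagged
        "false_positive_rate": fp / (fp + tn),  # normal traffic wrongly flagged
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


print(detection_metrics(tp=90, fp=5, tn=95, fn=10))
# {'detection_rate': 0.9, 'false_positive_rate': 0.05, 'accuracy': 0.925}
```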
    by Pedro Alexandre Henrique, Pedro Albuquerque, Peng Yao Hao, Sarah Sabino 
    Abstract: This study aimed to verify whether using support vector regression (SVR) to form portfolios yields returns that exceed the market. For this purpose, SVR with 15 different kernel functions was applied to select the best stocks for each quarter, and the quarterly portfolio return and the cumulative return over the period were calculated. The returns of these portfolios were then compared with those of a market benchmark. White's (2000) test was applied to avoid the data-snooping effect when assessing the statistical significance of the portfolios developed by the training strategies. The portfolio selected by SVR with the inverse multiquadric kernel presented the highest cumulative return, 374.40%, with a value at risk (VaR) of 6.87%. The results of this study corroborate the hypothesis that support vector regression is superior for forming portfolios, constituting a robust predictive method capable of coping with high-dimensional interactions.
    Keywords: Statistical Learning Theory; Optimization Theory; Financial Econometrics; Support Vector Machine; Kernel methods.
    DOI: 10.1504/IJBIDM.2019.10019195
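The best-performing kernel above, the inverse multiquadric, is commonly written k(x, y) = 1 / sqrt(||x - y||² + c²), with c > 0 a tuning constant; parameterisations vary between sources. The sketch below computes this kernel together with a compounded cumulative return from per-period returns. Both functions and the example numbers are illustrative. A kernel function like this can be used to build the Gram matrix for SVR implementations that accept custom kernels, such as scikit-learn's `SVR`, which accepts a callable `kernel`.

```python
import math


def inverse_multiquadric(x, y, c=1.0):
    """Inverse multiquadric kernel 1 / sqrt(||x - y||^2 + c^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1.0 / math.sqrt(squared_distance + c * c)


def cumulative_return(period_returns):
    """Compound simple per-period (e.g. quarterly) returns into one return."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth - 1.0


print(inverse_multiquadric([3.0], [0.0], c=4.0))  # 0.2
print(round(cumulative_return([0.1, 0.1]), 4))    # 0.21
```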
  • Worldwide Gross Revenue Prediction for Bollywood Movies using Hybrid Ensemble Model   Order a copy of this article
    by Alina Zaidi, Siddhaling Urolagin 
    Abstract: Predicting a movie's revenue before release can be very beneficial for stakeholders and investors in the movie industry. Even though Indian cinema is a booming industry, the literature on movie revenue prediction focuses mainly on non-Indian movies. In this study, we build a novel hybrid model to predict worldwide gross for Bollywood movies. A dataset of 674 Bollywood movies is prepared by downloading movie-related features from IMDb and from YouTube movie trailers. K-means clustering is performed on the movie dataset, two major clusters are identified, and important cluster-specific features are selected. The proposed hybrid model segregates movies into the two clusters and employs a prediction model for each cluster. The prediction models tested include various basic machine learning models and ensemble models. The ensemble combining predictions from support vector regression, a neural network and ridge regression gave the best result for both clusters and was chosen as the final model. We obtain an overall MAE of 0.0272 and an R² of 0.80 with 10-fold cross-validation.
    Keywords: Bollywood; Movie Revenue Prediction; Box office; Regression; Ensemble; Feature Selection; Machine Learning; Scikit-Learn.
    DOI: 10.1504/IJBIDM.2019.10019858
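The cluster-then-predict design described above can be sketched as follows: a new movie is routed to the nearest k-means centroid, and that cluster's ensemble returns the average of its base models' predictions. This is a minimal sketch under stated assumptions: the abstract does not say how the SVR, neural network and ridge predictions are combined, so equal-weight averaging is assumed; the base models are stand-in callables rather than fitted regressors; and cluster-specific feature selection is omitted.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]  # stand-in for a fitted regressor (SVR, NN, ridge, ...)


def nearest_cluster(x: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the closest centroid by squared Euclidean distance, as k-means assigns points."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])),
    )


def hybrid_predict(x: Vector, centroids: Sequence[Vector],
                   ensembles: Dict[int, List[Model]]) -> float:
    """Route x to its cluster, then average that cluster's base-model predictions."""
    models = ensembles[nearest_cluster(x, centroids)]
    return sum(model(x) for model in models) / len(models)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Keeping one ensemble per cluster lets each cluster use its own selected features and model weights without changing the routing step.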
  • Health Data Warehouses: Reviewing Advanced Solutions for Medical Knowledge Discovery   Order a copy of this article
    by Norah Alghamdi 
    Abstract: Implementing data warehouses and decision support systems that use the capabilities of information retrieval and knowledge discovery tools in healthcare has improved the quality of the healthcare offered. In this work, we review recent data warehouses and decision support systems in the healthcare domain, covering their significance and their applications in evidence-based medicine, electronic health records and nursing. Given the growing adoption of these tools in healthcare services, research and education, we present the most recent publications that employ them to support suitable decisions for patients or health providers. For each reviewed publication, we examine in depth the problem addressed, the proposed solution, the methods used and the findings. We also highlight the strengths of the existing approaches and identify potential drawbacks, including data correctness, completeness, consistency and integration, that must be addressed to support proper medical decision-making.
    Keywords: Data warehouses; Data Mining; Health Data; Medical Records; Quality; Knowledge Discovery; OLAP.
    DOI: 10.1504/IJBIDM.2019.10019971
  • Survey on-demand: A versatile scientific article automated inquiry method using text mining applied to Asset Liability Management   Order a copy of this article
    by Pedro Henrique Albuquerque, Igor Nascimento, Peng Yao Hao 
    Abstract: We propose a methodology that automatically relates the content of text documents to lexical items. The model estimates whether an article addresses a specific research object based on the relevant words in its title and abstract, using text mining and partial least squares discriminant analysis. The model achieves good accuracy, and its fit and validation indicators are equal or superior to those of other text classification models in the literature. Compared with existing methods, our method offers highly interpretable outcomes and allows flexible measurement of word frequency. The proposed solution may help scholars search for theoretical references by suggesting scientific articles based on similarities in the vocabulary used. Applied to finance, our framework indicated that approximately 10% of the publications in the selected journals address the subject of asset liability management. Moreover, using only freely accessible information, we highlight the journals with the largest number of publications on the subject over time and the subject's key words.
    Keywords: dimensionality reduction; discriminant analysis; text classification; partial least square; bibliometrics.
    DOI: 10.1504/IJBIDM.2019.10020278
  • Clustering Student Instagram accounts using Author-Topic Model Based   Order a copy of this article
    by Nur Rakhmawati, Faiz NF, Irmasari Hafidz, Indra Raditya, Pande Dinatha, Andrianto Suwignyo 
    Abstract: This study proposes a topic model to cluster the Instagram accounts of a group of high school teenagers in Surabaya, Indonesia, using the author-topic model method. We collected 235 valid Instagram accounts (133 female and 102 male students) and gathered a total of 3,346 Instagram post captions from 18 senior high schools. Our main finding concerns the topics that define their Instagram posts and captions; seven topics emerged, namely feeling, Surabaya events, photography, artists, vacation, religion and music. In this process, the lowest perplexity was obtained at 90 iterations, which suggests six topic groups. The six topics were determined from the lowest perplexity value and labelled according to the words they contain. Photography is discussed by six schools, photography-artists and vacation by three schools, while feeling, religion and music are discussed by two schools and one school, respectively.
    Keywords: Topic Modelling; Senior High School Students; Author-Topic Models.
    DOI: 10.1504/IJBIDM.2020.10020280
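Choosing a configuration by lowest perplexity, as described above, rests on a simple formula: perplexity is the exponential of the negative mean per-token log-likelihood, so lower values mean the model finds the captions less surprising. The sketch assumes natural-log token likelihoods are already available from each fitted author-topic model; whether the paper evaluates on held-out or training captions is not stated, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def perplexity(token_log_likelihoods: Sequence[float]) -> float:
    """exp(-(1/N) * sum of per-token natural-log likelihoods)."""
    if not token_log_likelihoods:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_likelihoods) / len(token_log_likelihoods))


def lowest_perplexity(candidates: Dict[int, float]) -> int:
    """Return the key (e.g. number of topics or iterations) with the smallest perplexity."""
    return min(candidates, key=candidates.__getitem__)
```

For intuition, a model that assigned every token probability 0.25 would have a perplexity of exactly 4.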
  • The approach of using ontology as pre-knowledge source for semi-supervised labelled topic model by applying text dependency graph   Order a copy of this article
    by Phu Pham, Phuc Do 
    Abstract: Discovering multiple topics in text is an important task in text mining. Traditionally, supervised approaches have failed to explore multiple topics in text. Topic modelling approaches such as LSI, pLSI and LDA are considered unsupervised methods that can discover distributions of multiple topics in text documents. The labelled LDA (LLDA) model is a supervised method that integrates human-labelled topics with the given text corpus during topic modelling. However, in real applications we may not have enough high-quality knowledge to properly assign topics to all documents before applying LLDA. In this paper, we present two approaches that take advantage of the dependency graph-of-words (GOW) in text analysis. The GOW approach uses a frequent sub-graph mining (FSM) technique to extract graph-based concepts from text. Our first approach, called the GC2Onto model, uses graph-based concepts to construct a domain-specific ontology. In our second approach, called the LLDA-GOW model, the graph-based concepts are also applied to improve the quality of traditional LLDA. We combine the GC2Onto and LLDA-GOW models to improve multiple-topic identification as well as other text mining tasks.
    Keywords: topic identification; labelled topic modelling; LDA; labelled LDA; ontology-driven topic labelling; dependency graph.
    DOI: 10.1504/IJBIDM.2019.10020863
  • RFID BI Mobility and Producer to Consumer Traceability Architecture   Order a copy of this article
    by Andre Claude Bayomock Linwa  
    Abstract: Radio frequency identification (RFID) emerged in 2000 as an intelligent remote object identification technology. RFID helps track object positions and relevant information using radio frequency technology (Bouet and dos Santos, 2008; Pais, 2010). Its application in industry greatly increases the consistency and accuracy of inventory management by capturing observed object attributes in real time for traceability and quality control purposes. To provide traceability and quality control services, RFID applications should offer two main services: business intelligence (BI) and mobility management. RFID BI provides production traceability services (QoS metrics related to manufacturing processes), while the RFID mobility service maintains accurate RFID tag locations. In this paper, a generic RFID BI mobility data model is defined. In the proposed data model, RFID product information generated by a supply chain organisation is translated or migrated from a producer to a consumer. This migration generates two distinct types of RFID mobility: internal (inside buildings) and external.
    Keywords: Mobility Management; RFID; Business Intelligence (BI); Data Models; Business Processes; QoS; Mobile Networks; GPS; Events; Mobility Subscription.
    DOI: 10.1504/IJBIDM.2019.10021261
  • Sentimental Event Detection from Arabic Tweets   Order a copy of this article
    by Mohammad Daoud, Daoud Daoud 
    Abstract: This article presents and evaluates an approach to detecting sentimental events in Arabic Twitter data streams. Sentimental events attract strongly opinionated responses from the online community; therefore, we aim to detect the association of a topic with a positive or a negative sentiment at a particular time. To achieve this, we build sentimental time series that record the frequencies of these topic-sentiment associations. We then use several algorithms to locate possible events. Events in positive timelines are considered positive, and likewise for negative events. Our approaches use the Shannon diversity index and hill-climbing peak finding. We evaluated the proposed algorithms in the domain of football (soccer) news. The results showed good precision and recall, using mainstream media as a reference. The success of this experiment could open the door to many useful applications, including reputation and brand monitoring systems for various domains and languages.
    Keywords: event detection; sentiment analysis; social media analysis; diversity analysis; data mining.
    DOI: 10.1504/IJBIDM.2018.10021262
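The two building blocks named in the abstract can be sketched in a few lines: the Shannon diversity index H = -Σ p_i ln p_i over a count distribution, and hill-climbing peak finding over a frequency time series. This is a minimal sketch, not the authors' detector: the abstract does not say which distribution the index is computed over or how peaks are thresholded, so the `min_height` cut-off and the function names are assumptions.

```python
import math
from typing import List, Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over the non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


def hill_climb_peaks(series: Sequence[float], min_height: float = 0.0) -> List[int]:
    """Indices where hill climbing from any start point comes to rest.

    From each index we move to the highest neighbour, provided it is higher
    than the current point, until no neighbour is higher. Resting points at
    or above min_height are reported as peaks (points on a flat-topped
    plateau may each be reported).
    """
    peaks = set()
    n = len(series)
    for start in range(n):
        i = start
        while True:
            best = i
            for j in (i - 1, i + 1):
                if 0 <= j < n and series[j] > series[best]:
                    best = j
            if best == i:
                break
            i = best
        if series[i] >= min_height:
            peaks.add(i)
    return sorted(peaks)
```

Applied to a positive (or negative) timeline of topic-sentiment counts, each returned index would be a candidate positive (or negative) event.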
  • A comparison of cluster algorithms as applied to unsupervised surveys   Order a copy of this article
    by Kathleen C. Garwood, Arpit Dhobale 
    Abstract: When data are used to answer important questions, unsupervised data offer extensive opportunities for insight as well as unique challenges. This study considers student survey data with the specific goal of clustering students into similar groups, with the underlying aim of identifying different poverty levels. Fuzzy logic is used during the data cleaning and organising phase to help create a logical dependent variable against which the analyses can be compared. The survey was reduced and cleaned using multiple data reduction techniques. Finally, multiple clustering techniques (k-means, k-modes and hierarchical clustering) are applied and compared. Although each method has strengths, the goal was to identify which is most viable when applied to survey data, and specifically when trying to identify the most impoverished students.
    Keywords: Fuzzy logic; cluster analysis; unsupervised learning; survey analysis; decision support system; k-means; k-modes; hierarchical clustering.
    DOI: 10.1504/IJBIDM.2019.10021263
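Of the three methods compared, k-modes is the one designed for categorical survey answers: it replaces the Euclidean distance of k-means with a simple matching dissimilarity (the number of attributes on which two records differ) and replaces means with per-attribute modes. A minimal sketch, assuming records are tuples of category labels and that the caller supplies the initial modes; the paper's initialisation and attribute set are not given.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def matching_dissimilarity(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of attributes on which two categorical records disagree."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records: Sequence[Record]) -> Record:
    """Most frequent category of each attribute (ties go to the first value seen)."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def k_modes(records: Sequence[Record], initial_modes: Sequence[Record],
            max_iter: int = 100) -> Tuple[List[int], List[Record]]:
    """Huang-style k-modes: assign to the nearest mode, then recompute modes, until stable."""
    modes = [tuple(m) for m in initial_modes]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(len(modes)),
                          key=lambda k: matching_dissimilarity(r, modes[k]))
                      for r in records]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == k]
            if members:  # keep the previous mode if a cluster becomes empty
                modes[k] = column_modes(members)
    return labels, modes
```

Unlike k-means, the resulting cluster centres are actual answer combinations, which keeps them interpretable for survey data.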
  • Discovery of inconsistent generalized coherent rules   Order a copy of this article
    by Anuradha Radhakrishnan, Rajkumar N, Rathi Gopalakrishnan, Soosaimichael PrinceSahayaBrighty 
    Abstract: Mining multiple-level association rules over the hierarchies of a predefined taxonomy paves the way for generalised rule mining using interestingness measures such as support and confidence. Coherent rule mining identifies significant rules in a database without using such interestingness measures. In this paper, we propose a new mining algorithm called generalised inconsistent coherent rule mining (GICRM) for mining a new form of generalised coherent rules called inconsistent coherent rules. The discovered rules are called inconsistent because the correlation of the rules changes from one level of the taxonomy to another. The rules are mined from a structured dataset with a predefined taxonomy. The mined inconsistent rules would be noteworthy from a business point of view for making strategic decisions in market basket analysis.
    Keywords: GICRM; multiple-level; generalized inconsistent coherent rule; taxonomy.
    DOI: 10.1504/IJBIDM.2019.10021264
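Coherent rules are commonly defined through the 2×2 contingency table of a rule X ⇒ Y: with Q1 = sup(X, Y), Q2 = sup(X, ¬Y), Q3 = sup(¬X, Y) and Q4 = sup(¬X, ¬Y), the rule is coherent when Q1 > Q2, Q1 > Q3, Q4 > Q2 and Q4 > Q3. The sketch below checks that condition at the item level and again after mapping items to their parent categories, so a rule whose status differs between levels can be flagged. It is a simplified illustration assuming that definition and a one-level `parent` map; it is not the GICRM algorithm, whose exact inconsistency criterion the abstract does not spell out.

```python
from typing import Dict, Sequence, Set, Tuple

Transaction = Set[str]


def contingency(transactions: Sequence[Transaction], x: Set[str],
                y: Set[str]) -> Tuple[int, int, int, int]:
    """Counts (Q1, Q2, Q3, Q4) for (X, Y), (X, not Y), (not X, Y), (not X, not Y)."""
    q1 = q2 = q3 = q4 = 0
    for t in transactions:
        has_x, has_y = x <= t, y <= t
        if has_x and has_y:
            q1 += 1
        elif has_x:
            q2 += 1
        elif has_y:
            q3 += 1
        else:
            q4 += 1
    return q1, q2, q3, q4


def is_coherent(transactions: Sequence[Transaction], x: Set[str], y: Set[str]) -> bool:
    """True when both X => Y and not X => not Y dominate their alternatives."""
    q1, q2, q3, q4 = contingency(transactions, x, y)
    return q1 > q2 and q1 > q3 and q4 > q2 and q4 > q3


def generalise(items: Set[str], parent: Dict[str, str]) -> Set[str]:
    """Replace each item by its parent category (items without a parent are kept)."""
    return {parent.get(item, item) for item in items}


def coherence_by_level(transactions: Sequence[Transaction], parent: Dict[str, str],
                       x: Set[str], y: Set[str]) -> Tuple[bool, bool]:
    """(coherent at item level, coherent at parent level); differing values flag inconsistency."""
    general = [generalise(t, parent) for t in transactions]
    return (is_coherent(transactions, x, y),
            is_coherent(general, generalise(x, parent), generalise(y, parent)))
```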
  • Time and Structural Anomalies Detection in Business Processes Using Process Mining   Order a copy of this article
    by Elham Saeedi, Faramarz Safi-Esfahani 
    Abstract: Information systems are increasingly integrated into operational processes and, as a result, record many events. A lack of compatibility between the process model and the observed behaviour is one of the challenges of constructing process models in process mining. This incompatibility can appear both in the structure (the sequence of tasks) and in the time spent on each task. In this paper, a hybrid approach for detecting structural and time anomalies via process mining is proposed. A dataset from Iran Insurance Company is used for a case study. The proposed method detected 98.5% of the structural anomalies and 96.3% of the time anomalies, which is one of the main achievements of this paper. A second standard dataset, referred to as dataset 2, is used to further examine the proposed method. The proposed method demonstrated better performance than the baseline approach.
    Keywords: Process mining; conformance checking; workflow mining; structural anomaly; time anomaly; flexible model; Insurance anomaly; anomaly detection; process model; control-flow perspective.
    DOI: 10.1504/IJBIDM.2019.10021265
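The two anomaly types distinguished above can be illustrated with a deliberately simple conformance check: a structural anomaly is a pair of consecutive activities that the reference model does not allow, and a time anomaly is an activity whose duration exceeds a per-activity limit. This is a minimal sketch, not the paper's hybrid approach; the allowed transitions, duration limits and activity names (such as a hypothetical "register", "assess", "pay", "close" claim flow) stand in for what a discovered process model and historical statistics would supply.

```python
from typing import Dict, List, Sequence, Set, Tuple

Event = Tuple[str, float]  # (activity name, duration in hours)


def find_anomalies(trace: Sequence[Event], allowed: Set[Tuple[str, str]],
                   max_duration: Dict[str, float]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (structural anomalies, time anomalies) for one case of the event log."""
    structural: List[Tuple[str, str]] = []
    timing: List[str] = []
    for idx, (activity, duration) in enumerate(trace):
        # Structural check: is this activity allowed to follow the previous one?
        if idx > 0:
            step = (trace[idx - 1][0], activity)
            if step not in allowed:
                structural.append(step)
        # Time check: did this activity take longer than its limit?
        limit = max_duration.get(activity)
        if limit is not None and duration > limit:
            timing.append(activity)
    return structural, timing
```

Keeping the two checks separate mirrors the abstract's distinction, so each kind of anomaly can be counted and reported on its own.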
    by Gandhi Mathi 
    Abstract: This paper is devoted to the study of intuitionistic fuzzy topological spaces. We introduce the concept of intuitionistic fuzzy g*-closed sets in intuitionistic fuzzy topological spaces and study some of their basic properties. We also introduce the concept of intuitionistic fuzzy g*-open sets in intuitionistic fuzzy topological spaces and derive several of their basic properties. We show that intuitionistic fuzzy g*-closed sets lie between intuitionistic fuzzy ?-closed sets and intuitionistic fuzzy g-closed sets. We also introduce applications of intuitionistic fuzzy g*-closed sets, namely intuitionistic fuzzy $T_{1/2}^*$ spaces and ${}^*T_{1/2}$ spaces. We obtain some characterisations and several preservation theorems for intuitionistic fuzzy topological spaces.
    Keywords: Intuitionistic fuzzy topology; Intuitionistic fuzzy g*-closed sets; Intuitionistic fuzzy g*-open sets.
    DOI: 10.1504/IJBIDM.2018.10021462
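For readers outside general topology, the standard notions behind this abstract can be stated compactly. The block below assumes the usual definition of an intuitionistic fuzzy (IF) set and that the paper transfers the crisp definitions of g-closed and g*-closed sets to the IF setting, with cl denoting IF closure; the abstract itself does not restate them.

```latex
\begin{aligned}
&A = \{\langle x, \mu_A(x), \nu_A(x)\rangle : x \in X\}, \quad 0 \le \mu_A(x) + \nu_A(x) \le 1,\\
&A \subseteq B \iff \mu_A(x) \le \mu_B(x) \text{ and } \nu_A(x) \ge \nu_B(x) \text{ for all } x \in X,\\
&A \text{ is IF } g\text{-closed} \iff \operatorname{cl}(A) \subseteq U \text{ whenever } A \subseteq U \text{ and } U \text{ is IF open},\\
&A \text{ is IF } g^{*}\text{-closed} \iff \operatorname{cl}(A) \subseteq U \text{ whenever } A \subseteq U \text{ and } U \text{ is IF } g\text{-open}.
\end{aligned}
```

Since every IF closed set is IF g-closed, every IF open set is IF g-open; the g*-closed condition is therefore tested against at least as many sets U, so every IF g*-closed set is IF g-closed, while every IF closed set is trivially IF g*-closed because cl(A) = A.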
  • The mediation roles of purchase intention and brand trust in relationship between social marketing activities and brand loyalty   Order a copy of this article
    by Nasrin Yazdanian, Saman Ronagh, Parya Laghaei, Fatemeh Mostafshar 
    Abstract: The rise of social media significantly challenges the way firms manage the introduction of their brands. The literature on social media marketing activities (SMMA) has grown, especially in the field of luxury marketing. Built on the foundations of Web 2.0, social media applications have facilitated extraordinary growth in customer interaction in modern times. The objective of this study is to examine the factors that influence Iranian luxury brand customers' attitudes towards purchase intention and brand loyalty. A questionnaire was used to collect data from a sample of 114 luxury brand customers on social media in Tehran, the capital city of Iran. Structural equation modelling was applied to examine the impact of social media marketing activities on brand loyalty, and the mediating roles of purchase intention and brand trust were also considered. The results indicated that entertainment does not have a positive impact on purchase intention, brand trust or brand loyalty. The results of this research enable luxury brand managers to forecast the future purchasing behaviour of their customers and provide a guide to managing their strategies and marketing activities in a competitive environment.
    Keywords: luxury brands; social media marketing activities; SMMA; brand trust; loyalty; purchase intention.
    DOI: 10.1504/IJBIDM.2018.10008661
  • A hybrid framework for job scheduling on cloud using firefly and BAT algorithm   Order a copy of this article
    by Bhagavathi Hariharan, Dassan Paul Raj 
    Abstract: Cloud computing is an emerging field that requires new algorithms and techniques for its various processes. Here, we consider the job scheduling process on a cloud computing platform, which needs a good algorithm to schedule the jobs requested by the various users of the cloud computing environment. Because requests can come from any platform, scheduling is indispensable when many users need particular jobs. In this research, we develop a hybrid algorithm for job scheduling in a cloud computing environment. Multiple criteria are taken into account for scheduling the various jobs located on various servers, and the jobs are then scheduled using a hybrid optimisation algorithm. Different jobs with different constraints are considered, and the cloud computing environment is simulated with the CloudSim tool.
    Keywords: cloud computing; firefly algorithm; BAT algorithm; job scheduling; FF-BAT algorithm.
    DOI: 10.1504/IJBIDM.2018.10009440
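Metaheuristics such as the firefly and BAT algorithms search over candidate schedules and need a fitness function to rank them. The abstract says only that multiple criteria are used, so the sketch below assumes one plausible choice, a weighted sum of makespan and monetary cost for a job-to-VM assignment; the weights, per-VM prices and the function name are hypothetical, and the hybrid FF-BAT search itself is not reproduced.

```python
from typing import Sequence


def schedule_fitness(assignment: Sequence[int], job_lengths: Sequence[float],
                     vm_speeds: Sequence[float], vm_prices: Sequence[float],
                     w_time: float = 0.5, w_cost: float = 0.5) -> float:
    """Weighted sum of makespan and cost when job j runs on VM assignment[j] (lower is better).

    job_lengths are in instructions (e.g. MI), vm_speeds in instructions per
    second (e.g. MIPS) and vm_prices in cost per second of busy time.
    """
    busy = [0.0] * len(vm_speeds)  # busy time of each VM in seconds
    for job, vm in enumerate(assignment):
        busy[vm] += job_lengths[job] / vm_speeds[vm]
    makespan = max(busy)  # jobs on the same VM run back to back
    cost = sum(t * p for t, p in zip(busy, vm_prices))
    return w_time * makespan + w_cost * cost
```

A search algorithm would evaluate this for every candidate assignment it generates and keep the lowest value.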
  • Optimal decision tree fuzzy rule-based classifier for heart disease prediction using improved cuckoo search algorithm   Order a copy of this article
    by Subhashini Narayan, Jagadeesh Gobal 
    Abstract: Heart disease is a major cause of illness in developed countries and one of the most common diseases in developing countries. There is therefore a need for a clinical decision support system to predict heart disease in a patient. The proposed clinical decision support system consists of three phases: preprocessing, decision rule generation with rule weighting, and classification. In the preprocessing phase, the relevant records of the Cleveland, Hungarian and Switzerland datasets are first loaded from the database, and a feature dimensionality reduction method based on orthogonal locality preserving projection (OLPP) is applied to reduce the feature space. A hybrid classifier is then created by combining the cuckoo search (CS) algorithm with fuzzy and decision tree classifiers; this combination of the fuzzy and decision tree algorithms with CS is intended to achieve accurate classification.
    Keywords: preprocessing; cuckoo search; fuzzy; decision tree; classification.
    DOI: 10.1504/IJBIDM.2018.10008934
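Cuckoo search, the optimiser in this hybrid, generates new candidate solutions by Lévy flights and abandons a fraction pa of the worst nests in each generation. The sketch below minimises a generic objective over a box, drawing Lévy steps with Mantegna's algorithm. It is a minimal illustration under assumptions: the abstract does not say what the cuckoo search optimises inside the fuzzy decision-tree classifier (classification error over rule weights would be one possibility), and the defaults are illustrative, with pa = 0.25 and β = 1.5 being common choices in the literature rather than values from the paper.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def levy_step(beta: float, rng: random.Random) -> float:
    """One Levy-distributed step drawn with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def cuckoo_search(objective: Callable[[Sequence[float]], float], dim: int,
                  lower: float, upper: float, n_nests: int = 15, pa: float = 0.25,
                  iterations: int = 200, alpha: float = 0.01, beta: float = 1.5,
                  seed: int = 0) -> Tuple[List[float], float]:
    """Minimise `objective` over [lower, upper]^dim; returns (best solution, best value)."""
    rng = random.Random(seed)

    def clip(x: Sequence[float]) -> List[float]:
        return [min(max(v, lower), upper) for v in x]

    def random_nest() -> List[float]:
        return [rng.uniform(lower, upper) for _ in range(dim)]

    nests = [random_nest() for _ in range(n_nests)]
    fitness = [objective(n) for n in nests]
    best_i = min(range(n_nests), key=fitness.__getitem__)
    best, best_f = list(nests[best_i]), fitness[best_i]

    for _ in range(iterations):
        # Levy flights: perturb each nest, scaled by its distance from the current best.
        for i in range(n_nests):
            candidate = clip([x + alpha * levy_step(beta, rng) * (x - b)
                              for x, b in zip(nests[i], best)])
            f = objective(candidate)
            j = rng.randrange(n_nests)  # compare with a randomly chosen nest
            if f < fitness[j]:
                nests[j], fitness[j] = candidate, f
        # Abandon the worst fraction pa of nests and rebuild them at random.
        order = sorted(range(n_nests), key=fitness.__getitem__)
        for i in order[n_nests - int(pa * n_nests):]:
            nests[i] = random_nest()
            fitness[i] = objective(nests[i])
        # Keep track of the best solution found so far.
        i = min(range(n_nests), key=fitness.__getitem__)
        if fitness[i] < best_f:
            best, best_f = list(nests[i]), fitness[i]
    return best, best_f
```

The random rebuilding of abandoned nests keeps exploring the whole box, while the Lévy flights refine solutions around the current best.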
  • Efficient clustering technique for k-anonymisation with aid of optimal KFCM   Order a copy of this article
    by G. Chitra Ganabathi, P. Uma Maheswari 
    Abstract: The k-anonymity model is a simple and practical approach to data privacy preservation. To minimise the information loss due to anonymisation, it is crucial to group similar data together and then anonymise each group individually. This paper therefore proposes a novel clustering method for applying the k-anonymity model effectively. Clustering is performed by an optimal kernel-based fuzzy c-means (KFCM) clustering algorithm. In KFCM, the Euclidean distance of the original FCM is replaced by a kernel-induced distance. The objective function of the KFCM algorithm is optimised with a modified grey wolf optimisation (MGWO) algorithm, so that the collected data are grouped effectively. The performance of the proposed technique is evaluated in terms of information loss and the time taken to group the available data. The proposed technique is implemented in MATLAB.
    Keywords: privacy preservation; k-anonymity; kernel fuzzy c-means; KFCM; grey wolf optimisation; information loss.
    DOI: 10.1504/IJBIDM.2018.10008933
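The kernel-induced distance mentioned above has a closed form for a Gaussian kernel K(x, v) = exp(-||x - v||² / σ²): ||φ(x) - φ(v)||² = K(x, x) - 2K(x, v) + K(v, v) = 2(1 - K(x, v)). The sketch below uses it in the usual alternating KFCM updates (memberships from inverse kernel distances, centres as kernel-weighted means). It is an illustration under assumptions: the paper optimises the KFCM objective with MGWO rather than with these closed-form updates, the kernel width σ, fuzzifier m and fixed iteration count are illustrative choices, and the subsequent anonymisation of each cluster is not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def gaussian_kernel(x: Point, v: Point, sigma: float) -> float:
    """K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, v)) / sigma ** 2)


def kernel_distance_sq(x: Point, v: Point, sigma: float) -> float:
    """Squared distance in the kernel feature space; equals 2 * (1 - K) for a Gaussian kernel."""
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))


def kfcm(points: Sequence[Point], centres: Sequence[Point], m: float = 2.0,
         sigma: float = 1.0, iterations: int = 50,
         eps: float = 1e-12) -> Tuple[List[List[float]], List[List[float]]]:
    """Kernel fuzzy c-means with alternating updates; returns (memberships, centres)."""
    cs = [list(c) for c in centres]
    u: List[List[float]] = []
    for _ in range(iterations):
        # Membership of point k in cluster i is proportional to (1 / d_ik)^(1 / (m - 1)).
        u = []
        for x in points:
            inv = [(1.0 / max(kernel_distance_sq(x, c, sigma), eps)) ** (1.0 / (m - 1.0))
                   for c in cs]
            total = sum(inv)
            u.append([w / total for w in inv])
        # Each centre becomes the mean of the points weighted by u_ik^m * K(x_k, v_i).
        new_cs = []
        for i, c in enumerate(cs):
            weights = [u[k][i] ** m * gaussian_kernel(x, c, sigma) for k, x in enumerate(points)]
            total = sum(weights)
            if total > 0:
                new_cs.append([sum(w * x[d] for w, x in zip(weights, points)) / total
                               for d in range(len(c))])
            else:
                new_cs.append(c)  # keep the old centre if no point supports it
        cs = new_cs
    return u, cs
```

Because the Gaussian kernel distance saturates at 2 for far-away points, outliers pull the centres much less than they would under plain Euclidean FCM.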
  • Multi label learning approaches for multi species avifaunal occurrence modelling: a case study of south eastern Tamil Nadu   Order a copy of this article
    by S. Appavu Alias Balamurugan, P.K.A. Chitra, S. Geetha 
    Abstract: Many multi-label problem transformation (PT) and algorithm adaptation (AA) methods need to be explored to find good candidates for avifaunal occupancy modelling. This research compares eight commonly used state-of-the-art PT and AA multi-label methods. The data were created by collecting records from January 2014 to December 2014 from the eBird repository for the study area, Madurai district in south-eastern Tamil Nadu. The analysis shows that classifier chains (CC) and multi-label naive Bayes (MLNB) are good candidates for avifauna data. MLNB performed best, with a hamming loss of 0.019 and an average precision of 90%. To the best of our knowledge, this is the first use of MLNB for avifaunal data, and the MLNB results indicate that, of the 143 species observed, six had a high occurrence rate and 68 had a low occurrence rate.
    Keywords: multi species occupancy; multi label learning; multi label naive Bayes; MLNB; central part of southern Tamil Nadu.
    DOI: 10.1504/IJBIDM.2018.10008307
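Hamming loss, the headline metric reported above, is the fraction of (record, label) slots that a multi-label classifier gets wrong, with each species treated as a 0/1 presence label. A minimal sketch assuming dense 0/1 indicator rows; average precision, the other reported metric, is a ranking measure and is not reproduced here.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (instance, label) pairs whose 0/1 prediction differs from the truth."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n_labels = len(y_true[0])
    wrong = sum(t != p
                for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p))
    return wrong / (len(y_true) * n_labels)
```

If the reported 0.019 was averaged over all 143 species labels, it would correspond to roughly 2.7 incorrect presence/absence labels per record (0.019 × 143 ≈ 2.72).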
  • Enhancing purchase decision using multi-word target bootstrapping with part-of-speech pattern recognition algorithm   Order a copy of this article
    by M. Pradeepa, C. Deisy 
    Abstract: In this research, multi-word target-related terms are automatically extracted from customer reviews for sentiment analysis. We use the LIDF measure and propose a novel measure, TCumass, within an iterative multi-word target (IMWT) bootstrapping algorithm. In addition, a part-of-speech pattern recognition (PPR) algorithm is proposed to identify the appropriate target and emotional words among the multi-word target-related terms. This article aims to extract both implicit and explicit targets, with their corresponding polarities, in an unsupervised manner. We propose two models, namely MWTB without PPR and MWTB with PPR, and compare them with the existing multi-aspect bootstrapping (MAB) algorithm. Experiments were conducted on different datasets, and performance was evaluated using different measures. The results show that the MWTB with PPR model performs well, extracting precise targets and emotional words.
    Keywords: bootstrapping; emotional polarity; multi-word target; part-of-speech; POS; sentiment analysis.
    DOI: 10.1504/IJBIDM.2018.10008334
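Part-of-speech patterns of the kind the PPR step relies on can be illustrated with two simple templates over Penn Treebank tags: an adjective directly before a run of nouns ("great camera quality") and a noun run followed by a verb, an optional adverb and an adjective ("battery life is very good"). The noun run becomes the target and the adjective its emotional word. This is a hypothetical sketch; the abstract does not list the paper's actual patterns, and the input is assumed to be tagged beforehand by any POS tagger.

```python
from typing import List, Sequence, Tuple

Tagged = Sequence[Tuple[str, str]]  # (word, Penn Treebank tag) pairs


def target_opinion_pairs(tagged: Tagged) -> List[Tuple[str, str]]:
    """Return (target, emotional word) pairs matched by two POS templates."""
    pairs: List[Tuple[str, str]] = []
    n = len(tagged)
    i = 0
    while i < n:
        if not tagged[i][1].startswith("NN"):
            i += 1
            continue
        # Collect the maximal run of nouns (NN, NNS, NNP, ...) starting at i.
        j = i
        while j < n and tagged[j][1].startswith("NN"):
            j += 1
        target = " ".join(word for word, _ in tagged[i:j])
        # Template A: adjective immediately before the noun run ("great camera quality").
        if i > 0 and tagged[i - 1][1].startswith("JJ"):
            pairs.append((target, tagged[i - 1][0]))
        # Template B: noun run + verb + optional adverb + adjective ("battery life is very good").
        k = j
        if k < n and tagged[k][1].startswith("VB"):
            k += 1
            if k < n and tagged[k][1].startswith("RB"):
                k += 1
            if k < n and tagged[k][1].startswith("JJ"):
                pairs.append((target, tagged[k][0]))
        i = j
    return pairs
```

Template B is what lets a multi-word target be linked to an opinion word that appears later in the sentence rather than directly before it.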