Template-Type: ReDIF-Article 1.0
Author-Name: Imad Bouteraa
Author-X-Name-First: Imad
Author-X-Name-Last: Bouteraa
Author-Name: Makhlouf Derdour
Author-X-Name-First: Makhlouf
Author-X-Name-Last: Derdour
Author-Name: Ahmed Ahmim
Author-X-Name-First: Ahmed
Author-X-Name-Last: Ahmim
Title: Intrusion detection using classification techniques: a comparative study
Abstract:
Today's highly connected world suffers from the increase and variety of cyber-attacks. To mitigate those threats, researchers have been continuously exploring different methods for intrusion detection through the last years. In this paper, we study the use of data mining techniques for intrusion detection. The research intends to compare the performances of classification techniques for intrusion detection. To reach the goal, we involve 74 classification techniques in this comparative study. The study shows that no technique outperforms the others in all situations. However, some classification methods lead to promising results and give clues for further combinations.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 65-86
Issue: 1
Volume: 12
Year: 2020
Keywords: data mining; classification; network security; intrusion detection; KDD99.
File-URL: http://www.inderscience.com/link.php?id=105596
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:1:p:65-86

Template-Type: ReDIF-Article 1.0
Author-Name: Sravani Nalluri
Author-X-Name-First: Sravani
Author-X-Name-Last: Nalluri
Author-Name: R. Sasikala
Author-X-Name-First: R.
Author-X-Name-Last: Sasikala
Title: An insight into application of big data analytics in healthcare
Abstract:
The main aim of this paper is to comprehend and gain insight into the current trends in the application of big data in healthcare, and to identify the various potential healthcare horizons. A brief analysis was done on 'big data analytics in healthcare' focusing on the collection of data, the tools employed, the aspects of health that were addressed, the types of machine learning algorithms and the statistics used to compare the performance of these algorithms. The focus was mainly on prediction of diseases, emergency department visits or a disease outbreak, using the 'HADOOP' and 'WEKA' tools, by obtaining data from the University of California machine learning repository, hospitals and government agencies. Support vector machine, artificial neural networks, naive Bayes and decision tree were commonly used algorithms whose efficacy was compared statistically using 'accuracy'. In my perspective, apart from prediction of disease, other domains of health are to be addressed.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 87-117
Issue: 1
Volume: 12
Year: 2020
Keywords: big data; Hadoop; machine learning algorithms; healthcare; map-reduce; chronic diseases; accuracy rate; prevention; analytics.
File-URL: http://www.inderscience.com/link.php?id=105598
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:1:p:87-117

Template-Type: ReDIF-Article 1.0
Author-Name: Aarti
Author-X-Name-First: 
Author-X-Name-Last: Aarti
Author-Name: Geeta Sikka
Author-X-Name-First: Geeta
Author-X-Name-Last: Sikka
Author-Name: Renu Dhir
Author-X-Name-First: Renu
Author-X-Name-Last: Dhir
Title: Grey relational classification algorithm for software fault proneness with SOM clustering
Abstract:
The estimation by human judgment to deal with the inherent uncertainty of software gives a vague and imprecise solution. To cope with this challenge, we propose a new hybrid analogy model based on the integration of grey relational analysis (GRA) classification with self-organising map (SOM) clustering. In this paper, a new classification approach is proposed to distribute the data to similar groups. The attributes are selected based on GRC values. In the proposed approach, the similarity measure between the reference project and the cluster head is computed to determine the cluster to which the target project belongs. The fault-proneness of the reference project is estimated based on the regression equation of the selected cluster. The proposed algorithm gives resilience to users to select features for both continuous and categorical attributes. In this study, two scenarios based on the integration of the proposed classification with regression have been proposed. Experimental results show significant results indicating that the proposed methodology can be used for the prediction of faults and produce conceivable results when compared with the results of multilayer-perceptron, logistic regression, bagging, naïve Bayes and sequential minimal optimisation (SMO).
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 28-64
Issue: 1
Volume: 12
Year: 2020
Keywords: self-organising map; SOM; grey relational analysis; GRA; unsupervised classification; fault-proneness; object-oriented; OO.
File-URL: http://www.inderscience.com/link.php?id=105599
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:1:p:28-64

Template-Type: ReDIF-Article 1.0
Author-Name: Imane Messaoudi
Author-X-Name-First: Imane
Author-X-Name-Last: Messaoudi
Author-Name: Nadjet Kamel
Author-X-Name-First: Nadjet
Author-X-Name-Last: Kamel
Title: Overlapping community detection with a novel hybrid metaheuristic optimisation algorithm
Abstract:
Social networks are ubiquitous in our daily life. Due to the rapid development of information and electronic technology, social networks are becoming more and more complex in terms of sizes and contents. It is of paramount significance to analyse the structures of social networks in order to unveil the myth beneath complex social networks. Network community detection is recognised as a fundamental tool towards social networks analytics. As a consequence, numerous community detection methods are proposed in the literature. For a real-world social network, an individual may possess multiple memberships, while the existing community detection methods are mainly designed for non-overlapping situations. With regard to this, this paper proposes a hybrid metaheuristic method to detect overlapping communities in social networks. In the proposed method, the overlapping community detection problem is formulated as an optimisation problem and a novel bat optimisation algorithm is designed to solve the established optimisation model. To enhance the searchability of the proposed algorithm, a local search operator based on tabu search is introduced. To validate the effectiveness of the proposed algorithm, experiments on benchmark and real-world social networks are carried out. The experiments indicate that the proposed algorithm is promising for overlapping community detection.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 118-139
Issue: 1
Volume: 12
Year: 2020
Keywords: overlapping community; modified density; Tabu search; TS; Bat algorithm; BA; link clustering; social network.
File-URL: http://www.inderscience.com/link.php?id=105601
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:1:p:118-139

Template-Type: ReDIF-Article 1.0
Author-Name: Nawel Sekkal
Author-X-Name-First: Nawel
Author-X-Name-Last: Sekkal
Author-Name: Sidi Mohamed Benslimane
Author-X-Name-First: Sidi Mohamed
Author-X-Name-Last: Benslimane
Author-Name: Michael Mrissa
Author-X-Name-First: Michael
Author-X-Name-Last: Mrissa
Author-Name: Cheol Young Park
Author-X-Name-First: Cheol Young
Author-X-Name-Last: Park
Author-Name: Boudjemaa Boudaa
Author-X-Name-First: Boudjemaa
Author-X-Name-Last: Boudaa
Title: Proactive and reactive context reasoning architecture for smart web services
Abstract:
The web of things (WoT) uses web technologies to connect embedded objects to each other and to deliver services to stakeholders. The context of these interactions (situation) is a key source of information which can be sometimes uncertain. In this paper, we focus on the development of intelligent web services. The main requirements for intelligent service are to deal with context diversity, semantic context representation and the capacity to reason with uncertain information. From this perspective, we propose a framework for intelligent services to deal with various contexts, to reactively respond to real-time situations and proactively predict future situations. For the semantic representation of context, we use PR-OWL, a probabilistic ontology based on multi-entity Bayesian networks. PR-OWL is flexible enough to represent complex and uncertain contexts. We validate our framework with an intelligent plant watering use case to show its reasoning capabilities.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 1-27
Issue: 1
Volume: 12
Year: 2020
Keywords: smart web service; the web of things; context reasoning; proactive; reactive; multi-entity Bayesian networks; MEBNs; PR-OWL.
File-URL: http://www.inderscience.com/link.php?id=105609
File-Format: text/html
File-Restriction: Open Access
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:1:p:1-27

Template-Type: ReDIF-Article 1.0
Author-Name: Yasmine Chaabani
Author-X-Name-First: Yasmine
Author-X-Name-Last: Chaabani
Author-Name: Jalel Akaichi
Author-X-Name-First: Jalel
Author-X-Name-Last: Akaichi
Title: Bees colonies for detecting communities evolution using data warehouse
Abstract:
The analysis of social networks and their evolution has gained much interest in recent years. In fact, few methods revealed and tracked meaningful communities over time. These methods also dealt efficiently with structure and topic evolution of networks. In this paper, we propose a novel technique to track dynamic communities and their evolution behaviour. The main objective of our approach, which uses the artificial bee colony (ABC), is to trace the evolution of communities and to optimise our objective function to keep proper partitioning. Moreover, we use a data warehouse as a mind of bees to store the information of different community structures in every timestamp. The experimental results showed that the proposed method is efficient in discovering dynamic communities and tracking their evolution.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 192-206
Issue: 2
Volume: 12
Year: 2020
Keywords: social network; community detection; bees colony.
File-URL: http://www.inderscience.com/link.php?id=106720
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:2:p:192-206

Template-Type: ReDIF-Article 1.0
Author-Name: Fatima Meskine
Author-X-Name-First: Fatima
Author-X-Name-Last: Meskine
Author-Name: Safia Nait-Bahloul
Author-X-Name-First: Safia
Author-X-Name-Last: Nait-Bahloul
Title: A support architecture to MDA contribution for data mining
Abstract:
The data mining process is the sequence of tasks applied to data in order to discover relations between them and so gain knowledge. However, the data mining process lacks a formal specification that allows it to be modelled independently of platforms. Model driven architecture (MDA) is an approach for the development of software systems, based on the use of models to improve their productivity. Several research works have been elaborated to align the MDA approach with data mining on data warehouses, to specify the data mining process at a very high level of abstraction. In our work, we propose a support architecture that allows positioning these research works at different abstraction levels, on the basis of several criteria, with the aim of identifying strengths for each level in terms of modelling and of having a clear visibility of the MDA contribution for data mining.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 207-236
Issue: 2
Volume: 12
Year: 2020
Keywords: data mining; model driven architecture; MDA; data warehouses; UML profiles; data multidimensional model; transformation.
File-URL: http://www.inderscience.com/link.php?id=106723
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:2:p:207-236

Template-Type: ReDIF-Article 1.0
Author-Name: Jaishree Ranganathan
Author-X-Name-First: Jaishree
Author-X-Name-Last: Ranganathan
Author-Name: Angelina A. Tzacheva
Author-X-Name-First: Angelina A.
Author-X-Name-Last: Tzacheva
Title: Emotion mining from text for actionable recommendations detailed survey
Abstract:
In the era of Web 2.0, people express their opinions, feelings and thoughts about topics including political and cultural events, natural disasters, products and services, through mediums such as blogs, forums, and micro-blogs, like Twitter. Also, a large amount of text is generated through e-mail which contains the writer's feeling or opinion; for instance, customer care service e-mail. The texts generated through such platforms are a rich source of data which can be mined in order to gain useful information about user opinion or feeling, which in turn can be utilised in specific applications such as marketing, sale predictions, political surveys, health care, student-faculty culture, e-learning platforms, and social networks. This process of identifying and extracting information about the attitude of a speaker or writer about a topic, polarity, or emotion in a document is called sentiment analysis. There are a variety of sources for extracting sentiment, such as speech, music and facial expression. Due to the rich source of information available in the form of text data, this paper focuses on sentiment analysis and emotion mining from text, as well as discovering actionable patterns. The actionable patterns may suggest ways to alter the user's sentiment or emotion to a more positive or desirable state.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 143-191
Issue: 2
Volume: 12
Year: 2020
Keywords: actionable pattern mining; data mining; text mining; sentiment analysis.
File-URL: http://www.inderscience.com/link.php?id=106729
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:2:p:143-191

Template-Type: ReDIF-Article 1.0
Author-Name: Abdullah Alsaeedi
Author-X-Name-First: Abdullah
Author-X-Name-Last: Alsaeedi
Title: A survey of term weighting schemes for text classification
Abstract:
Text document classification approaches are designed to categorise documents into predefined classes. These approaches have two main components: document representation models and term-weighting methods. The high dimensionality of feature space has always been a major problem in text classification methods. To resolve high dimensionality issues and to improve the accuracy of text classification, various feature selection approaches were presented in the literature. Besides which, several term-weighting schemes were introduced that can be utilised for feature selection methods. This work surveys and investigates various term (feature) weighting approaches that have been presented in the text classification context.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 237-254
Issue: 2
Volume: 12
Year: 2020
Keywords: document frequency; supervised term weighting; text classification; unsupervised term weighting.
File-URL: http://www.inderscience.com/link.php?id=106741
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:2:p:237-254

Template-Type: ReDIF-Article 1.0
Author-Name: Mohammed Al-Sarem
Author-X-Name-First: Mohammed
Author-X-Name-Last: Al-Sarem
Author-Name: Abdel-Hamid Emara
Author-X-Name-First: Abdel-Hamid
Author-X-Name-Last: Emara
Author-Name: Ahmed Abdel Wahab
Author-X-Name-First: Ahmed Abdel
Author-X-Name-Last: Wahab
Title: Performance of authorship attribution classifiers with short texts: application of religious Arabic fatwas
Abstract:
Although authorship attribution is a well-known problem in the authorship analysis domain, research on Arabic contexts is still limited. In addition, examining the performance of the attribution methods on training sets with short textual documents is also not considered well in other languages, such as English, Chinese, Spanish and Dutch. Therefore, this current work aims at examining the performance of attribution classifiers in the context of short Arabic textual documents. The experimental part of this work is conducted with well-known classifiers, namely: decision tree C4.5 method, naive Bayes model, K-NN method, Markov model, SMO and Burrows Delta method. We experiment with various feature combinations. The results show that combining the word-based lexical features with the structural features yields the best accuracy. To this end, we use this combination as a baseline for further investigation. We also examine the effect of combining the n-gram features. The results indicate that some classifiers show an improvement while the others do not. In addition, the results show that the naive Bayes method gives the highest accuracy among all the attribution classifiers.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 350-364
Issue: 3
Volume: 12
Year: 2020
Keywords: authorship attribution; AA; stylometric features; SF; attribution classifiers; JGAAP tool; Arabic language.
File-URL: http://www.inderscience.com/link.php?id=108719
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:3:p:350-364

Template-Type: ReDIF-Article 1.0
Author-Name: Akram Osman
Author-X-Name-First: Akram
Author-X-Name-Last: Osman
Author-Name: Naomie Salim
Author-X-Name-First: Naomie
Author-X-Name-Last: Salim
Title: Extracting useful reply-posts for text forum threads summarisation using quality features and classification methods
Abstract:
Text forum threads contain a large amount of information furnished by users who discuss a specific topic. At times, certain thread reply-posts are entirely off-topic, thereby deviating from the main discussion. This negatively affects the user's preference to continue replying to the discussion. Thus, there is a possibility that the user prefers to read certain selected reply-posts that provide a short summary of the topic of the discussion. The objective of the paper is to choose quality reply-posts regarding a topic considered in the initial-post, which also serve as a brief summary. We offer an exhaustive examination of the conversational patterns of the threads on the basis of 12 quality features for analysis. These features can ensure selection of relevant reply-posts for the thread summary. Experimental outcomes obtained using two datasets show that the presented techniques considerably enhanced the performance in selecting initial-post replies pairs for text forum threads summarisation.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 330-349
Issue: 3
Volume: 12
Year: 2020
Keywords: information retrieval; initial-post replies pairs; text data; text forum threads; TFThs; text forum threads summarisation; text summarisation; thread retrieval.
File-URL: http://www.inderscience.com/link.php?id=108725
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:3:p:330-349

Template-Type: ReDIF-Article 1.0
Author-Name: Hiba Zuhair
Author-X-Name-First: Hiba
Author-X-Name-Last: Zuhair
Author-Name: Ali Selamat
Author-X-Name-First: Ali
Author-X-Name-Last: Selamat
Title: Phish webpage classification using hybrid algorithm of machine learning and statistical induction ratios
Abstract:
Although the conventional machine learning-based anti-phishing techniques outperform their competitors in phishing detection, they are still targeted by zero-hour phish webpages due to their constraints of phishing induction. Therefore, phishing induction must be boosted with the extraction of new features, the selection of robust subsets of decisive features, and the active learning of classifiers on a big webpage stream. In this paper, we propose a hybrid feature-based classification algorithm (HFBC) for decisive phish webpage classification. HFBC hybridises two statistical criteria, optimised feature occurrence (OFC) and phishing induction ratio (PIR), with the induction settings of the most salient machine learning algorithms, naïve Bayes and decision tree. Additionally, we propose two constituent algorithms of features extraction and features selection for holistic phish webpage characterisation. The superiority of our proposed approach is justified and proven throughout chronological, real-time, and comparative analyses against existing machine learning-based anti-phishing techniques.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 255-276
Issue: 3
Volume: 12
Year: 2020
Keywords: phish webpage; machine learning; optimised feature occurrence; OFC; phishing induction ratio; PIR; hybrid feature-based classifier; HFBC.
File-URL: http://www.inderscience.com/link.php?id=108727
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:3:p:255-276

Template-Type: ReDIF-Article 1.0
Author-Name: Meryem Amar
Author-X-Name-First: Meryem
Author-X-Name-Last: Amar
Author-Name: Bouabid El Ouahidi
Author-X-Name-First: Bouabid El
Author-X-Name-Last: Ouahidi
Title: Weighted LSTM for intrusion detection and data mining to prevent attacks
Abstract:
The usage of cloud opportunities brings not only resources and storage availability, but also puts customers' privacy at stake. These services are carried out through the web and generate log files. These files contain valuable information for tracking malicious behaviours. However, they are variant, voluminous and have high velocity. This paper structures input log files using data preparation treatment (DPT), anticipates missing features, and performs a weighted conversion to ease the discrimination of malicious activities. Regarding the robustness of deep learning in analysing high dimension databases, selecting dynamically features and detecting intrusions, our architecture avails its strength and proposes a weighted long short-term memory (WLSTM) deep learning algorithm. WLSTM mines network traffic predictors considering past events, and minimizes the vanishing gradient. Results prove its effectiveness; it achieves 98% accuracy and reduces false alarm rates to 1.47%. For contextual malicious behaviours, the accuracy reached 97% and the loss was 22%.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 308-329
Issue: 3
Volume: 12
Year: 2020
Keywords: cloud security breaches; intrusion-detection; weight of evidence; WoE; deep learning; long short-term memory; LSTM.
File-URL: http://www.inderscience.com/link.php?id=108728
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:3:p:308-329

Template-Type: ReDIF-Article 1.0
Author-Name: E.O. Rodrigues
Author-X-Name-First: E.O.
Author-X-Name-Last: Rodrigues
Author-Name: D. Casanova
Author-X-Name-First: D.
Author-X-Name-Last: Casanova
Author-Name: M. Teixeira
Author-X-Name-First: M.
Author-X-Name-Last: Teixeira
Author-Name: V. Pegorini
Author-X-Name-First: V.
Author-X-Name-Last: Pegorini
Author-Name: F. Favarim
Author-X-Name-First: F.
Author-X-Name-Last: Favarim
Author-Name: E. Clua
Author-X-Name-First: E.
Author-X-Name-Last: Clua
Author-Name: A. Conci
Author-X-Name-First: A.
Author-X-Name-Last: Conci
Author-Name: Panos Liatsis
Author-X-Name-First: Panos
Author-X-Name-Last: Liatsis
Title: Proposal and study of statistical features for string similarity computation and classification
Abstract:
Adaptations of features commonly applied in the field of visual computing, co-occurrence matrix (COM) and run-length matrix (RLM), are proposed for the similarity computation of strings in general (words, phrases, codes and texts). The proposed features are not sensitive to language related information. These are purely statistical and can be used in any context with any language or grammatical structure. Other statistical measures that are commonly employed in the field such as longest common subsequence, maximal consecutive longest common subsequence, mutual information and edit distances are evaluated and compared. In the first synthetic set of experiments, the COM and RLM features outperform the remaining state-of-the-art statistical features. In 3 out of 4 cases, the RLM and COM features were statistically more significant than the second best group based on distances (P-value < 0.001). When it comes to a real text plagiarism dataset, the RLM features obtained the best results.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 277-307
Issue: 3
Volume: 12
Year: 2020
Keywords: word comparison; string similarity; classification; statistical features; text mining; optical character recognition; OCR; text plagiarism; text entailment; supervised learning.
File-URL: http://www.inderscience.com/link.php?id=108731
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:3:p:277-307

Template-Type: ReDIF-Article 1.0
Author-Name: Alexey G. Finogeev
Author-X-Name-First: Alexey G.
Author-X-Name-Last: Finogeev
Author-Name: Leyla A. Gamidullaeva
Author-X-Name-First: Leyla A.
Author-X-Name-Last: Gamidullaeva
Author-Name: Sergey M. Vasin
Author-X-Name-First: Sergey M.
Author-X-Name-Last: Vasin
Title: Application of hyper-convergent platform for big data in exploring regional innovation systems
Abstract:
The authors developed a decentralised hyper-convergent analytical platform for the collection and processing of big data in order to explore the monitoring processes of distributed objects in the regions on the basis of multi-agent approach. The platform is intended for modular integration of tools for searching, collecting, processing and big data mining from cyber-physical and cyber-social objects. The results of the intellectual analysis are used to assess the integrated criteria for the effectiveness of innovation systems of distributed monitoring and forecasting the dynamics of the influence of various factors on technological and socio-economic processes. The work analyses convergent and hyper-convergent systems, substantiates the necessity of creating a multi-agent decentralised platform for big data collection and analytical processing. The article proposes the principles of streaming architecture for the data integration analytical processing to resolve the problems of searching, parallel processing, data mining and uploading of information into a cloud storage. The paper also considers the main components of the hyper-convergent analytical platform. A new concept of distributed extraction, transformation, loading, mining (ETLM) system is considered.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 365-385
Issue: 4
Volume: 12
Year: 2020
Keywords: innovation system; convergence; convergent platform; hyper-convergent system; intellectual analysis; big data; multi-agent approach; ETLM.
File-URL: http://www.inderscience.com/link.php?id=111395
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:4:p:365-385

Template-Type: ReDIF-Article 1.0
Author-Name: Mehdi Soleymani
Author-X-Name-First: Mehdi
Author-X-Name-Last: Soleymani
Title: A quest for better anomaly detectors
Abstract:
Anomaly detection is a very popular method for detecting exceptional observations which are very rare. It has been frequently used in medical diagnosis, fraud detection, etc. In this article, we revisit some popular algorithms for anomaly detection and investigate why we are on a quest for a better algorithm for identifying anomalies. We propose a new algorithm, which unlike other popular algorithms, is not looking for outliers directly, but it searches for them by removing the inliers (opposite to outliers) in an iterative way. We present an extensive simulation study to show the performance of the proposed algorithm compared to its competitors.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 447-458
Issue: 4
Volume: 12
Year: 2020
Keywords: anomaly detection; algorithm; k-nearest neighbour.
File-URL: http://www.inderscience.com/link.php?id=111399
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:4:p:447-458

Template-Type: ReDIF-Article 1.0
Author-Name: Borislava Petrova Vrigazova
Author-X-Name-First: Borislava Petrova
Author-X-Name-Last: Vrigazova
Author-Name: Ivan Ganchev Ivanov
Author-X-Name-First: Ivan Ganchev
Author-X-Name-Last: Ivanov
Title: The bootstrap procedure in classification problems
Abstract:
In classification problems, cross-validation chooses random samples from the dataset in order to improve the ability of the model to classify properly new observations in the respective class. Research articles from various fields show that when applied to regression problems, the bootstrap can improve either the prediction ability of the model or the ability for feature selection. The purpose of our research is to show that the bootstrap as a model selection procedure in classification problems can outperform cross-validation. We compare the performance measures of cross-validation and the bootstrap on a set of classification problems and analyse their practical advantages and disadvantages. We show that the bootstrap procedure can accelerate execution time compared to the cross-validation procedure while preserving the accuracy of the classification model. This advantage of the bootstrap is particularly important in big datasets as the time needed for fitting the model can be reduced without decreasing the model's performance.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 428-446
Issue: 4
Volume: 12
Year: 2020
Keywords: logistic regression; decision tree; k-nearest neighbour; KNN; the bootstrap; cross-validation.
File-URL: http://www.inderscience.com/link.php?id=111400
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:4:p:428-446

Template-Type: ReDIF-Article 1.0
Author-Name: Mamoon Obiedat
Author-X-Name-First: Mamoon
Author-X-Name-Last: Obiedat
Author-Name: Ali Al-yousef
Author-X-Name-First: Ali
Author-X-Name-Last: Al-yousef
Author-Name: Mustafa Banikhalaf
Author-X-Name-First: Mustafa
Author-X-Name-Last: Banikhalaf
Author-Name: Khairallah Al Talafha
Author-X-Name-First: Khairallah Al
Author-X-Name-Last: Talafha
Title: A new quantitative method for simplifying complex fuzzy cognitive maps
Abstract:
Fuzzy cognitive map (FCM) is a qualitative soft computing approach that addresses uncertain human perceptions of diverse real-world problems. The map depicts the problem in the form of problem nodes and cause-effect relationships among them. Complex problems often produce complex maps that may be difficult to understand or predict, and therefore, maps need to be simplified. Previous studies used subjective simplification/condensation processes by grouping similar variables into one variable in a qualitative manner. This paper proposes a quantitative method for simplifying FCM. It uses the spectral clustering quantitative technique to classify/group related variables into new clusters without human intervention. Initially, improvements were added to this clustering technique to properly handle FCM matrix data. Then, the proposed method was examined by an application dataset to validate its appropriateness in FCM simplification. The results showed that the method successfully classified the dataset into meaningful clusters.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 415-427
Issue: 4
Volume: 12
Year: 2020
Keywords: soft computing; fuzzy cognitive map model; complex problems; FCM simplification; spectral clustering; topological overlap matrix; decision support systems.
File-URL: http://www.inderscience.com/link.php?id=111402
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:4:p:415-427

Template-Type: ReDIF-Article 1.0
Author-Name: Ahmet Arif Aydin
Author-X-Name-First: Ahmet Arif
Author-X-Name-Last: Aydin
Author-Name: Kenneth M. Anderson
Author-X-Name-First: Kenneth M.
Author-X-Name-Last: Anderson
Title: Data modelling for large-scale social media analytics: design challenges and lessons learned
Abstract:
We live in a world of big data; organisations collect, store, and analyse large volumes of data for various purposes. The five V's of big data introduce new challenges for developers to handle when performing data processing and analysis. Indeed, data modelling is one of the most challenging and critical aspects of big data because it determines how data will be structured and stored; these decisions then impact how that data can be processed and analysed. In this paper, we report on designing a data model for storing and analysing Twitter data in support of crisis informatics. In this work, we leverage the data model provided by columnar NoSQL data stores to design column families that can efficiently index, sort, store and analyse large Twitter datasets. In particular, our column families are designed to achieve efficient batch data processing. We evaluate these claims and discuss our future work.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 386-414
Issue: 4
Volume: 12
Year: 2020
Keywords: data modelling; social media analytics; big data analytics; NoSQL.
File-URL: http://www.inderscience.com/link.php?id=111409
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:4:p:386-414