Forthcoming and Online First Articles

International Journal of Data Mining, Modelling and Management (IJDMMM)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

Register for our alerting service, which notifies you by email when new issues are published online.

International Journal of Data Mining, Modelling and Management (9 papers in press)

Regular Issues

  • Detection of Terrorism’s Apologies on Twitter using a New Bi-lingual Dataset
    by Khaled BEDJOU, Faical Azouaou 
    Abstract: A lot of terrorist apology content is being shared on social media without being detected. Therefore, the automatic and immediate detection of this content is essential for people’s safety. In this paper, we propose a language-independent process to detect and classify terrorism’s apologies on Twitter into three classes (apology, no apology, and neutral). We tested the process on a bi-lingual (Arabic and English) dataset of 12,155 manually annotated tweets. We conducted two sets of experiments, one with imbalanced data and the other with oversampled data. We compared the classification performances of four machine learning algorithms (RF, DT, KNN, and NB) and five deep learning algorithms (GRU, SimpleRNN, LSTM, BiLSTM, and BERT). Our comparative study concluded that BERT achieves better classification performance than the others, with an accuracy of 0.84 for Arabic and 0.81 for English on imbalanced data, and 0.88 for Arabic and 0.91 for English on oversampled data.
    Keywords: terrorism’s apology; social network analysis; Twitter; NLP; sentiment analysis; machine learning; deep learning; transfer learning.
    DOI: 10.1504/IJDMMM.2023.10051983
     
  • A Comparative Study of Supervised/Unsupervised Machine Learning Algorithms with Feature Selection Approaches to Predict Student Performance
    by Alaa Khalaf Hamoud, Ali Salah Alasady, Wid Akeel Awadh, Jassim Mohammed Dahr, Mohammed B. M. Kamel, Aqeel Majeed Humadi, Ihab Ahmed Najm 
    Abstract: Educational data mining (EDM) is one of the fastest-growing fields that aim to improve the performance of students, academic staff, and institutions overall. Implementing data mining algorithms almost always requires a feature selection step to find the most correlated features and improve accuracy. In this paper, a comparative study is performed on the use of supervised/unsupervised algorithms to predict students’ performance. The student’s grade is classified using different families of supervised and unsupervised algorithms such as decision trees, clustering, and neural networks. These algorithms were examined on the questionnaire dataset before and after feature selection to measure the effect of feature selection on the accuracy of the results. The results showed that the random forest decision tree outperformed the other supervised/unsupervised algorithms. The results also showed that, for most of the algorithms, performance improved after the less correlated attributes were removed from the dataset.
    Keywords: educational data mining; EDM; students’ performance; supervised algorithms; unsupervised algorithms; feature selection.
    DOI: 10.1504/IJDMMM.2023.10055032
     
  • A Novel Taxonomy of Natural Disasters based on Casualty and Consequence using Hierarchical Clustering
    by Donald D. Atsa'am, Frank Adusei-Mensah, Oluwafemi S. Balogun, Temidayo O. Omotehinwa, Oluwaseun S. Dada, Richard Osei Agjei, Samuel Nii Odoi Devine 
    Abstract: Post-disaster management requires a proportional deployment of human and material resources. The amount of resources required to manage a disaster cannot be known without first evaluating the extent of casualty and consequence. This study proposed a taxonomy for classifying natural disasters based on casualty and consequence. Using secondary data on global disasters from 1900 to 2021, the hierarchical cluster analysis technique was deployed for taxonomy formation. The learning algorithm evaluated the similarities in numbers of deaths, injuries, and the cost of damaged property caused by disasters. Three clusters were extracted which sub-grouped historical disasters based on similarities in casualty and consequence. Further, a taxonomy that defines the ranges of what constitutes low, average, and high deaths/injuries/damage was established. Classifying a future disaster with this taxonomy prior to the deployment of resources for rescue, resettlement, compensation, and other disaster management operations will guide efficient resource allocation on a case-by-case basis.
    Keywords: Disaster taxonomy; natural disasters; casualty and consequence; post-disaster management; hierarchical cluster analysis.
    DOI: 10.1504/IJDMMM.2023.10055078
     
  • K-Means and DBSCAN for Look Alike-Sound Alike Medicines Issue
    by Souad Moufok, Anas Mouattah, Khalid Hachemi 
    Abstract: The goal of this study is to analyse the application of data mining techniques to clustering drug names based on their spelling similarity, in order to reduce the occurrence of dispensing errors caused by look-alike sound-alike medicine confusion, which is considered one of the most common causes of dispensing errors. Two unsupervised data mining methods, k-means and DBSCAN, were used in conjunction with two similarity measures, Bisim and Levenshtein. The results of the study showed that the approach is effective in identifying potentially confusable medicines, with Bisim-based k-means clustering being favoured with a silhouette score of 0.5.
    Keywords: Look Alike Sound Alike; Data Mining; Medication Errors; Dispensing Errors; LASA; K-means; DBSCAN.
    DOI: 10.1504/IJDMMM.2024.10057242
     
  • HARUIM: High Average Recent Utility Itemset Mining
    by Mathe John Kenny Kumar, Dipti Rana 
    Abstract: High utility itemset mining (HUIM) discovers itemsets that are profitable in nature. Previously, the recency of an itemset was determined by adding the recency of each transaction of an itemset. A major disadvantage of this method is that a few very recent transactions of an itemset can cause the whole itemset to be considered recent. To overcome this limitation, we present a novel measure called average recency to mine recent and high utility itemsets. Average recency upper-bound (arub) and estimated recency co-occurrence structure (ERCS) are proposed to prune unpromising itemsets. A variation of the list structure, known as the average recent utility list (ARUL), has been created to hold data regarding the utility and recency of itemsets. A series of comprehensive experiments carried out on both real and synthetic datasets demonstrates that the proposed system surpasses the baseline algorithm in runtime, memory utilisation, and candidate generation.
    Keywords: data mining; high utility itemset mining; HUIM; recency; average recency; list structure; pattern mining; EUCS; knowledge engineering; candidate generation.
    DOI: 10.1504/IJDMMM.2024.10055782
     
  • Using data mining to integrate Recency-Frequency-Monetary value (RFM) analysis and credit scoring methods for bank customer behavior analysis
    by Mohammad Khanbabaei, Pantea Parsi, Najmeh Farhadi 
    Abstract: Banks apply credit scoring to identify customers with low credit risk. Additionally, the recency-frequency-monetary value (RFM) analysis method is suitable for identifying valuable bank customers. Data mining techniques can be used to discover useful patterns hidden in customer data. However, in previous research, data mining has been applied to credit scoring and to RFM analysis separately. To evaluate customer behavior, banks must employ credit scoring and RFM analysis simultaneously. This study proposes a framework for using data mining techniques to integrate credit scoring and RFM methods in the field of banking. In this framework, k-means performed better than the Kohonen network and DBSCAN in identifying and clustering valuable customers based on the RFM and credit scoring indices. Moreover, the C5 decision tree, BN, and SVM, with 94.10%, 92.71%, and 92.36% accuracy respectively, had better performance in classifying valuable bank customers based on the RFM and credit scoring indices.
    Keywords: data mining; RFM method; credit scoring; banking; marketing.
    DOI: 10.1504/IJDMMM.2023.10055838
     
  • Hybrid Classifier Model for Big Data by leveraging Map reduce framework
    by Sitha Ramulu V, K. Rajendra Prasad, Sudheer Reddy K., A.V. Krishna Prasad, Venkat Dass M 
    Abstract: Big data technology is becoming popular and desirable among many users for handling, analysing, and storing large data. However, clustering large data has become more complex due to its size. In recent years, several techniques have been presented to retrieve information from big data. The proposed hybrid classifier model, CSDHAP, is a hybridised form of the sun flower optimisation (SFO) and deer hunting optimisation (DHO) algorithms with an adaptive pollination rate, using the MapReduce framework. CSDHAP is a data classification technique that is performed using classifiers. The results of the presented approach are compared with those of existing approaches using various metrics, namely F1-score, specificity, NPV, accuracy, FNR, FDR, sensitivity, precision, FPR, and MCC. It is pertinent to mention that the proposed model is better than any of the traditional models. The proposed HC+CSDHAP model attained a better precision value than other traditional models such as RNN, SVM, CNN, Bi-LSTM, NB, LSTM, and DBN.
    Keywords: big data classification; MapReduce framework; long short-term memory; LSTM; deep belief network; DBN; optimisation.
    DOI: 10.1504/IJDMMM.2024.10057054
     
  • Developing a Data Pipeline Solution for Big Data Processing
    by Ivona Lipovac, Marina Bagić Babac 
    Abstract: This paper presents a comprehensive exploration of the concept of big data and its management while highlighting the challenges that arise in the process. The study showcases the development of a data pipeline, designed to facilitate big data collection, integration, and analysis while addressing state-of-the-art challenges, methods, tools, and technologies. Emphasis is placed on pipeline flexibility, with a view towards enabling ease of implementation of architecture changes, seamless integration of new sources, and straightforward implementation of additional transformations in existing pipelines as needed. The pipeline architecture is discussed in detail, with a focus on its design principles, components, and implementation details, as well as the mechanisms used to ensure its reliability, scalability, and performance. Results from a range of experiments demonstrate the pipeline's effectiveness in addressing the challenges of big data management and analysis, as well as its robustness and versatility in accommodating diverse data sources and processing requirements. This study provides insights into the critical role of data pipelines in enabling effective big data management and showcases the importance of flexibility in pipeline design to ensure adaptability to evolving data processing needs.
    Keywords: big data; data pipeline; data processing; data analysis; cloud computing.
    DOI: 10.1504/IJDMMM.2024.10058088
     
  • Effect of Various Factors on Classification Performance of Ordinal Logistic Regression
    by Ali Vasfi Ağlarcı, Cengiz Bal 
    Abstract: The classification problem is that of determining which of a set of categories a new observation belongs to, using known features. Examples include categorising e-mails as necessary or unnecessary, or diagnosing a disease using a patient’s various values (such as gender, blood pressure, presence of various symptoms). Various methods are used in classification processes. In this study, the classification performance of ordinal logistic regression, which is a statistical method, was investigated. The study reveals how the classification success of the method changes when the properties of the data set change. For this, a simulation study was carried out by generating data sets with different properties using the R program. The simulation study showed that the correlation structure in the data set, the sample size, and the number and distribution of the response variable categories affected the classification performance of the method. Suggestions are made to improve the classification performance of the ordinal logistic regression method.
    Keywords: statistical learning; classification; ordinal data; simulation.
    DOI: 10.1504/IJDMMM.2024.10058094
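
Illustrative note on the Levenshtein measure named in the look-alike sound-alike medicines abstract above: the short Python sketch below shows how an edit distance between two names can be computed and turned into a similarity score. It is not the authors' implementation. The function names, the lower-casing step, the normalisation by the longer name's length, and the sample names are all assumptions made for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def normalised_similarity(a: str, b: str) -> float:
    """Hypothetical helper: map the distance to a score in [0, 1],
    where 1.0 means the names are identical (ignoring case)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest


if __name__ == "__main__":
    # Illustrative name pair only; not taken from the study's dataset.
    print(normalised_similarity("hydroxyzine", "hydralazine"))
```

A pairwise matrix of such scores is one possible input to a clustering step; how the study itself fed similarities into k-means or DBSCAN is not stated in the abstract.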