Forthcoming and Online First Articles

International Journal of Data Mining, Modelling and Management (IJDMMM)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

Register for our alerting service, which notifies you by email when new issues are published online.

We also offer which provide timely updates of tables of contents, newly published articles and calls for papers.

International Journal of Data Mining, Modelling and Management (13 papers in press)

Regular Issues

  • Effect of Various Factors on Classification Performance of Ordinal Logistic Regression   Order a copy of this article
    by Ali Vasfi Ağlarcı, Cengiz Bal 
    Abstract: The classification problem is that of determining which of a set of categories a new observation belongs to, using known features. Examples include categorising e-mails as necessary or unnecessary, or diagnosing a disease from a patient's various values (such as gender, blood pressure, presence of various symptoms). Various methods are used in classification processes. In this study, the classification performance of ordinal logistic regression, which is a statistical method, was investigated. The study shows how the classification success of the method changes when the properties of the data set change. For this, a simulation study was carried out by generating data sets with different properties with the help of the R program. The simulation study showed that the correlation structure in the data set, the sample size, and the number and distribution of the response variable categories affected the classification performance of the method. Suggestions are made to improve the classification performance of the ordinal logistic regression method.
    Keywords: statistical learning; classification; ordinal data; simulation.
    DOI: 10.1504/IJDMMM.2024.10058094
  • Intrusion Detection System using Statistical Query Tree with Hierarchical Clustering Approach   Order a copy of this article
    by P.V.N. Rajeswari, M. Shashi 
    Abstract: The existence of potholes threatens road safety and contributes to a significant portion of accidents worldwide. It takes a lot of work to constantly patch potholes and keep track of when new ones appear. Our goal in this work is to create a pothole detection system that makes it simpler to accurately detect potholes from images. The system can potentially save human lives and assist government authorities in fixing potholes. To achieve this objective, we first make use of a pre-trained deep learning model (VGG-16) and thereafter propose a novel convolutional neural network (CNN) model. This work employs a publicly available dataset, Nienaber Potholes 2 (Complex), for experiments. The proposed model provides 98.87% accuracy on the pothole classification task in images and outperforms recent state-of-the-art approaches in the literature. Further, since no past work has been done on this dataset to detect bounding boxes for potholes, we use YOLO-v3 and YOLO-v5 to generate bounding box predictions on this dataset and evaluate the results. The bounding box task achieves 83.23% mAP and 87.45% precision. Due to the absence of significant existing results, these results for bounding box prediction may be considered as a benchmark.
    Keywords: statistical query tree; intrusion detection system; IDS; outlier; statistical hierarchical clustering; SHiC; cyber-attack; CICIDS-2017.
    DOI: 10.1504/IJDMMM.2024.10059511
  • Improving Intrusion Detection in the IoT with African Vultures Optimization Algorithm-Based Feature Selection   Order a copy of this article
    by Mohammed Alweshah, Ghadeer Ahmad Alhebaishan, Sofian Kassaymeh, Saleh Alkhalaileh, Mohammed Ababneh 
    Abstract: The security of the system may be jeopardised by unsecured data transmitted through IoT devices, and ensuring the reliability of data is critical to maintaining the integrity of information over the internet. To enhance the intrusion detection rate, several investigations have been conducted to develop methodologies capable of identifying the minimum required secure features. One such method is the use of the feature selection (FS) procedure with metaheuristic algorithms. In this study, the African vultures optimisation algorithm (AVO) was used in two wrapper FS approaches to select the most secure features in IoT. The first approach used AVO, while the second employed OBL-AVO, a hybrid model combining AVO with opposition-based learning (OBL) to enhance exploration. Based on the outcomes, it was found that OBL-AVO is superior to AVO in enhancing FS. Furthermore, the proposed methods were evaluated and compared to four recent approaches.
    Keywords: intrusion detection; internet of things; IoT; feature selection; hybrid metaheuristics; African vultures optimisation algorithm; AVO; opposition-based learning; OBL.
    DOI: 10.1504/IJDMMM.2024.10060965
  • Chatbots and ChatGPT: A Bibliometric Analysis and Systematic Review of Publications in Web of Science and Scopus Databases   Order a copy of this article
    by Hamed Khosravi, Mohammad Reza Shafie, Morteza Hajiabadi, Ahmed Shoyeb Raihan, Imtiaz Ahmed 
    Abstract: This paper presents a bibliometric analysis of the scientific literature related to chatbots, focusing specifically on ChatGPT. Chatbots have gained increasing attention recently, with annual growth rates of 19.16% and 27.19% on the Web of Science (WoS) and Scopus, respectively. The research consists of two study phases: 1) an analysis of chatbot literature; 2) a comprehensive review of scientific documents on ChatGPT. In the first phase, a bibliometric analysis is conducted on all the published literature from both Scopus (5,839) and WoS (2,531) databases covering the period from 1998 to 2023. Subsequently, a bibliometric analysis is carried out on ChatGPT publications, and 45 published studies are analysed thoroughly based on their methods, novelty, and conclusions. Overall, the study aims to provide guidelines for researchers to conduct their research more effectively in the field of chatbots and specifically to highlight significant areas for future investigation into ChatGPT.
    Keywords: chatbot; ChatGPT; bibliometrics; artificial intelligence; natural language processing; NLP; generative artificial intelligence.
    DOI: 10.1504/IJDMMM.2024.10061138
  • An Irregular CLA-based Novel Frequent Pattern Mining Approach   Order a copy of this article
    by Moumita Ghosh, Sourav Mondal, Harshita Moondra, Dina Tri Utari, Anirban Roy, Kartick Chandra Mondal 
    Abstract: Frequent itemset mining has received a lot of attention in the field of data mining. Its main objective is to find groups of items that consistently appear together in datasets. Although frequent itemset mining is useful, the algorithms for mining frequent itemsets have quite high resource requirements. A few improvements have been made in recent years to optimise their time and memory needs. This study proposes CellFPM, a straightforward yet effective cellular learning automata-based method for finding frequent itemset occurrences. It works efficiently with large datasets. The efficiency of the proposed approach in time and memory requirements has been evaluated using benchmark datasets explicitly designed for performance measurement. The varying size and density of the test datasets have confirmed the scalability of the suggested method. The findings show that CellFPM consistently surpasses the leading algorithms in terms of runtime and memory usage, particularly memory usage.
    Keywords: cellular learning automata; CLA; frequent itemsets; data mining; knowledge discovery.
    DOI: 10.1504/IJDMMM.2024.10061507
  • Clustering-based Multidimensional Sequential Pattern Mining of Semantic Trajectories   Order a copy of this article
    by Thouraya Sakouhi, Jalel Akaichi 
    Abstract: Knowledge discovery from mobility data is about identifying behaviours from trajectories. Mining masses of trajectories is required to gain an overview of this data and, notably, to investigate the relationship between the movements of different entities. Most state-of-the-art work on this issue operates on raw trajectories. Nevertheless, behaviours discovered from raw trajectories are not as rich and meaningful as those discovered from semantic trajectories. In this paper, we establish a mining approach to extract patterns from semantic trajectories. We propose to apply sequential pattern mining preceded by a clustering pre-processing step that alleviates the temporal complexity of the mining. The mining considers the spatial and temporal dimensions at different levels of granularity, thereby providing richer and more insightful patterns about human behaviour. We evaluate our work on tourists' semantic trajectories in Kyoto. Results showed the effectiveness and efficiency of our model compared to state-of-the-art work.
    Keywords: mobility data; trajectories; semantic modelling; sequential pattern mining; clustering; mobility pattern.
    DOI: 10.1504/IJDMMM.2024.10061616
  • A Comparative Analysis of User Attitudes towards ICO and IEO in Blockchain Projects: Insights from Social Media Big Data   Order a copy of this article
    by ShengJuan Zhao, GyooGun Lim 
    Abstract: This study conducts a comparative analysis of two popular crowdfunding methods in the blockchain market, the initial coin offering (ICO) and the initial exchange offering (IEO) models. Using project names as keywords, we collected and analysed big data, applying techniques such as TF-IDF, LDA, social network analysis, and sentiment analysis. Our findings show that the attitude of target groups towards ICO and IEO projects is not significantly different, although IEO targets exhibit more interest in entertainment-related topics. Social network analysis reveals that the ICO target group is more sensitive to popular elements, such as pop singers, while the IEO target group is more interested in soccer competitions. Both projects show a strong interest in the US election. Our study suggests that IEO, as an upgraded financing model of ICO, does not yet enjoy high levels of trust from the market crowd. By identifying the preferences of the target groups for both models through multiple analyses, we recommend that these preferences be taken into consideration to improve the efficiency of targeted marketing.
    Keywords: blockchain; big data; token issuance; initial coin offering; ICO; initial exchange offering; IEO.
    DOI: 10.1504/IJDMMM.2024.10062229
  • A Node sets based Fast and Scalable Frequent Itemset (FSFIM) Algorithm for Mining Big Data using MapReduce Paradigm   Order a copy of this article
    by Borra Sivaiah, R. Rajeswara Rao
    Abstract: Big data is rapidly growing, making traditional tools inefficient for handling large amounts of data. Existing algorithms for frequent itemset mining struggle with scalability due to limitations in parallel processing power. In this paper, we propose a fast and scalable frequent itemset mining (FSFIM) algorithm to generate frequent itemsets from huge data. Preorder coding (POC) trees and Nodeset data structures save half the memory of node-lists and N-lists. FSFIM uses Cloudera's CDH MapReduce framework. The experimental results reveal that FSFIM outperforms state-of-the-art methods such as HBPFP, Mlib PFP, and Big FIM, with a maximum speedup value of 1.85 when minimal support is set to 1. The FSFIM algorithm is more scalable and faster for mining frequent itemsets from big data.
    Keywords: big data; frequent itemset mining; FIM; MapReduce paradigm; fast and scalable frequent itemset mining; FSFIM.
    DOI: 10.1504/IJDMMM.2024.10062349
  • Data mining techniques along with fuzzy logic control to find solutions to road traffic accidents: Case study in Morocco   Order a copy of this article
    by Halima Drissi Touzani, Sanaa Faquir, Ali Yahyaouy 
    Abstract: Collecting data on road accidents is important. However, it is equally important to analyse and process this data to prevent future accidents. Data analysis can provide valuable insights and help identify patterns, contributing to the development of effective strategies and interventions to improve road safety. Over the years, many research efforts have tackled several causes related to traffic accidents in an attempt to identify risk factors. Different statistics have identified that most accidents are due to human error. In Morocco, many studies have been applied to car systems to make them automatic or semi-automatic and so avoid serious injuries due to poor driving practices. This paper presents data mining techniques applied to real traffic accident data using statistical analysis, the K-means clustering algorithm and fuzzy logic. The data represents accidents that happened in Morocco during 2014. Results showed important features that caused previous accidents, which were used to implement an algorithm based on fuzzy logic to train a semi-autonomous car to make the right decisions whenever needed and therefore prevent accidents from happening.
    Keywords: data analysis; data mining techniques; road traffic accidents; semi-autonomous cars; fuzzy logic control; decision algorithm; statistical methods; Morocco.
    DOI: 10.1504/IJDMMM.2024.10063889
  • Discrete Cuckoo Search for 0-1 knapsack problem   Order a copy of this article
    by Aziz Ouaarab 
    Abstract: This paper presents a resolution of a space management optimisation problem such as 0-1 knapsack problems (KP) by discrete cuckoo search algorithm (DCS). The proposed approach includes an adaptation process of three main components: the objective function, the solution representation, and the step move operator. A simplified conception of these three components is designed without introducing an additional technique, especially in the search process for the optimal solution. Three sets of benchmark instances have been taken from the literature to test the performance of DCS. Experimental results prove that DCS is effective in solving different types of 0-1 KP instances. The result comparisons with other state-of-the-art algorithms show that DCS is a competitive approach that outperforms most of them.
    Keywords: 0-1 knapsack problem; discrete cuckoo search; DCS; combinatorial optimisation; Lévy flights; approximate algorithm.
    DOI: 10.1504/IJDMMM.2024.10064048
  • Early Stage Analysis of Breast Cancer Using Intelligent System   Order a copy of this article
    by Arpita Nath Boruah, Mrinal Goswami 
    Abstract: Breast cancer (BC) poses a considerable global health concern and is a significant issue for women's well-being worldwide. It is crucial to develop a system that can proactively identify the critical risk factors associated with BC. The present study introduces an intelligent system for BC by analysing risk factors (IS-BC-analysing-RF), which utilises decision tree rules to accurately identify the primary risk factors underlying BC. The rules are processed based on the proposed score function to get the most relevant ones. Finally, using the sequential search approach, the critical risk factors are identified along with their respective ranges. Based on the simulation results using the University of California at Irvine (UCI) repository BC dataset, the findings indicate that the proposed IS-BC-analysing-RF system is highly significant and has the potential to effectively mitigate the risk of BC by targeting and managing one or two crucial risk factors.
    Keywords: decision system; breast cancer; decision tree; machine learning; risk factor.
    DOI: 10.1504/IJDMMM.2024.10064214
  • A Novel LWT-based Robust Watermark Strategy for Colour Images   Order a copy of this article
    by Prachee Dewangan, Debabala Swain, Monalisa Swain 
    Abstract: With the progress of information technology, digital data larceny and duplicity have become much easier. Image watermarking in cryptography is a major domain that provides manifold security features like confidentiality, authenticity, integrity, etc. This research introduces a robust watermarking scheme for colour images. The proposed technique segments the colour image into three layers: red, green and blue. The lifting wavelet transform (LWT) and differential histogram shifting are used to embed text watermark information into the R layer. The performance of the proposed technique was assessed using the SIPI image dataset. Test outputs show that the proposed scheme maintains the balance between imperceptibility and robustness. This scheme has a better resistance against all types of attacks like different noises, filter effects, image compressions, etc. Besides, the text watermark can be successfully extracted under different types of tampering, such as content removal attacks and content addition attacks.
    Keywords: robust watermarking; geometric attack; fragile attack; dual watermark; lifting wavelet transform.
    DOI: 10.1504/IJDMMM.2024.10064256
  • Data Driven journey: A data management paradigm-centric review and data mesh capabilities   Order a copy of this article
    by Kamel Abdellaoui, Mohamed Ali HADJ TAIEB, Rafik MAHJOUBI, Mohamed B.E.N. AOUICHA 
    Abstract: Becoming data driven is one of the top strategic objectives of data-rich organisations. Africa must join the wave to capture and unlock the highest value from data. Therefore, this survey analyses the drivers, challenges, and evolution of existing data management paradigms, including data warehouse, data lake and data lakehouse. It reveals the limitations of monolithic approaches in addressing data at scale and how they led to a paradigm shift toward a more distributed and decentralised data mesh. The paper discusses data mesh capabilities to address the challenges of data availability and accessibility at scale in Africa to enable leapfrog development in its journey to being data driven.
    Keywords: data-driven; data management paradigms; data mesh; analytics; developing countries.
    DOI: 10.1504/IJDMMM.2024.10064271