International Journal of Data Mining, Modelling and Management (IJDMMM) Inderscience Publishers - linking academia, business and industry through research

Forthcoming and Online First Articles

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

International Journal of Data Mining, Modelling and Management (8 papers in press)

Regular Issues

Effect of Various Factors on Classification Performance of Ordinal Logistic Regression
by Ali Vasfi Ağlarcı, Cengiz Bal
Abstract: The classification problem is the way in which a new observation belongs to a set of categories, using known features. For example, categorising e-mails as necessary or unnecessary, or finding a diagnosis of a disease using a patient’s various values (such as gender, blood pressure, presence of various symptoms). Various methods are used in classification processes. In this study, the classification performance of ordinal logistic regression, which is a statistical method, was investigated. It has been revealed how the classification success of the method changes when the data set properties change. For this, a simulation study was carried out by deriving data sets with different properties with the help of the R program. As a result of the simulation study, it was observed that the correlation structure in the data set, the sample size, the number and distribution of the response variable categories affected the classification performance of the method. Suggestions have been made to improve the classification performance of the ordinal logistic regression method.
Keywords: statistical learning; classification; ordinal data; simulation.
DOI: 10.1504/IJDMMM.2024.10058094

Intrusion Detection System using Statistical Query Tree with Hierarchical Clustering Approach
by P.V.N. Rajeswari, M. Shashi
Abstract: The existence of potholes threatens road safety and contributes to a significant portion of accidents worldwide. It takes a lot of work to constantly patch potholes and keep track of when new ones appear. Our goal in this work is to create a pothole detection system that would make it simpler to accurately detect potholes from images. The system can potentially save human lives and assist the government authorities to fix the potholes. In order to achieve this objective, we first make use of a pre-trained deep learning model (VGG-16) and thereafter, propose a novel convolutional neural network (CNN) model. This work employs a publicly available dataset, Nienaber Potholes 2 (Complex), for experiments. The proposed model provides 98.87% accuracy on pothole classification task in images and outperforms recent state-of-the-art approaches in the literature. Further, since no past work has been done on this dataset to detect bounding boxes for potholes, we use YOLO-v3 and YOLO-v5 to generate bounding box predictions on this dataset and evaluate the results. The bounding box task achieves 83.23% mAP and 87.45% precision. Due to the absence of significant existing results, these results for bounding box prediction may be considered as a benchmark
Keywords: statistical query tree; intrusion detection system; IDS; outlier; statistical hierarchical clustering; SHiC; cyber-attack; CICIDS-2017.
DOI: 10.1504/IJDMMM.2024.10059511

Improving Intrusion Detection in the IoT with African Vultures Optimization Algorithm-Based Feature Selection
by Mohammed Alweshah, Ghadeer Ahmad Alhebaishan, Sofian Kassaymeh, Saleh Alkhalaileh, Mohammed Ababneh
Abstract: The security of the system may be jeopardised by unsecured data transmitted through IoT devices, and ensuring the reliability of data is critical to maintaining the integrity of information over the internet. To enhance the intrusion detection rate, several investigations have been conducted to develop methodologies capable of identifying the minimum required secure features. One such method is the use of the feature selection procedure with metaheuristic algorithms. In this study, the African vulture optimisation algorithm was used in two wrapper FS approaches to select the most secure features in IoT. The first approach used AVO, while the second employed OBL-AVO, a hybrid model combining AVO with opposition-based learning (OBL) to enhance exploration. Based on the outcomes, it was found that the OBL-AVO is superior to the AVO in enhancing FS. Furthermore, the proposed methods were evaluated and compared to four recent approaches.
Keywords: intrusion detection; internet of things; IoT; feature selection; hybrid metaheuristics; African vultures optimisation algorithm; AVO; opposition-based learning; OBL.
DOI: 10.1504/IJDMMM.2024.10060965

Chatbots and ChatGPT: A Bibliometric Analysis and Systematic Review of Publications in Web of Science and Scopus Databases
by Hamed Khosravi, Mohammad Reza Shafie, Morteza Hajiabadi, Ahmed Shoyeb Raihan, Imtiaz Ahmed
Abstract: This paper presents a bibliometric analysis of the scientific literature related to chatbots, focusing specifically on ChatGPT. Chatbots have gained increasing attention recently, with an annual growth rate of 19.16% and 27.19% on the Web of Science (WoS) and Scopus, respectively. The research consists of two study phases: 1) an analysis of chatbot literature; 2) a comprehensive review of scientific documents on ChatGPT. In the first phase, a bibliometric analysis is conducted on all the published literature from both Scopus (5,839) and WoS (2,531) databases covering the period from 1998 to 2023. Consequently, bibliometric analysis has been carried out on ChatGPT publications, and 45 published studies have been analysed thoroughly based on their methods, novelty, and conclusions. Overall, the study aims to provide guidelines for researchers to conduct their research more effectively in the field of chatbots and specifically highlight significant areas for future investigation into ChatGPT.
Keywords: chatbot; ChatGPT; bibliometrics; artificial intelligence; natural language processing; NLP; generative artificial intelligence.
DOI: 10.1504/IJDMMM.2024.10061138

An Irregular CLA-based Novel Frequent Pattern Mining Approach
by Moumita Ghosh, Sourav Mondal, Harshita Moondra, Dina Tri Utari, Anirban Roy, Kartick Chandra Mondal
Abstract: Frequent itemset mining has received a lot of attention in the field of data mining. Its main objective is to find groups of items that consistently appear together in datasets. Although frequent itemset mining is useful, the algorithms for mining frequent itemsets have quite high resource requirements. In order to optimise the time and memory needs, a few improvements have been made in recent years. This study proposes CellFPM, a straightforward yet effective cellular learning automata-based method for finding frequent itemset occurrences. It works efficiently with large datasets. The efficiency of the proposed approach in time and memory requirements has been evaluated using benchmark datasets explicitly designed for performance measurement. The varying size and density of the test datasets have confirmed the scalability of the suggested method. The findings show that CellFPM consistently surpasses the leading algorithms in terms of runtime and memory usage, particularly memory usage.
Keywords: cellular learning automata; CLA; frequent itemsets; data mining; knowledge discovery.
DOI: 10.1504/IJDMMM.2024.10061507

Clustering-based Multidimensional Sequential Pattern Mining of Semantic Trajectories
by Thouraya Sakouhi, Jalel Akaichi
Abstract: Knowledge discovery from mobility data is about identifying behaviours from trajectories. In fact, mining masses of trajectories is required to have an overview of this data, notably, to investigate the relationship between different entities' movement. Most state-of-the-art work on this issue operates on raw trajectories. Nevertheless, behaviours discovered from raw trajectories are not as rich and meaningful as those discovered from semantic trajectories. In this paper, we establish a mining approach to extract patterns from semantic trajectories. We propose to apply sequential pattern mining based on a pre-processing step of clustering to alleviate the former's temporal complexity. Mining considers the spatial and temporal dimensions at different levels of granularity, thereby providing richer and more insightful patterns about human behaviour. We evaluate our work on tourists' semantic trajectories in Kyoto. Results showed the effectiveness and efficiency of our model compared to state-of-the-art work.
Keywords: mobility data; trajectories; semantic modelling; sequential pattern mining; clustering; mobility pattern.
DOI: 10.1504/IJDMMM.2024.10061616

A Comparative Analysis of User Attitudes towards ICO and IEO in Blockchain Projects: Insights from Social Media Big Data
by ShengJuan Zhao, GyooGun Lim
Abstract: This study conducts a comparative analysis of two popular crowdfunding methods in the blockchain market, the initial coin offering (ICO) and the initial exchange offering (IEO) models. Using project names as keywords, we collected and analysed big data, applying techniques such as TF-IDF, LDA, social network analysis, and sentiment analysis. Our findings show that the attitude of target groups towards ICO and IEO projects is not significantly different, although IEO targets exhibit more interest in entertainment-related topics. Social network analysis reveals that the ICO target group is more sensitive to popular elements, such as pop singers, while the IEO target group is more interested in soccer competitions. Both projects show a strong interest in the US election. Our study suggests that IEO, as an upgraded financing model of ICO, does not yet enjoy high levels of trust from the market crowd. By identifying the preferences of the target groups for both models through multiple analyses, we recommend that these preferences be taken into consideration to improve the efficiency of targeted marketing.
Keywords: blockchain; big data; token issuance; initial coin offering; ICO; initial exchange offering; IEO.
DOI: 10.1504/IJDMMM.2024.10062229

A Node sets based Fast and Scalable Frequent Itemset (FSFIM) Algorithm for Mining Big Data using MapReduce Paradigm
by Borra Sivaiah, R. Rajeswara Rao
Abstract: Big Data is rapidly growing, making traditional tools inefficient for handling large amounts of data. Existing algorithms for frequent itemset mining struggle with scalability due to limitations in parallel processing power. In this paper, we propose a fast and scalable frequent itemset mining (FSFIM) algorithm used to generate frequent item sets from huge data. Preorder coding (POC) trees and Nodeset data structures save half the memory of node-lists and N-lists. The FSFIM uses Cloudera’s CDH Map Reduce framework. With a maximum speedup value of 1.85 when minimal support is set to 1, the experimental results reveal that FSFIM outperforms the state-of-the-art methods such as HBPFP, Mlib PFP, and Big FIM. The fast and scalable frequent itemset mining algorithm is more scalable and faster for mining frequent item sets from big data.
Keywords: big data; frequent itemset mining; FIM; MapReduce paradigm; fast and scalable frequent itemset mining; FSFIM.
DOI: 10.1504/IJDMMM.2024.10062349
