Forthcoming Articles

International Journal of Data Analysis Techniques and Strategies (IJDATS)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Online First articles are also listed here. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with the Open Access icon are Online First articles. They are freely available and openly accessible to all, without any restriction except those stated in their respective CC licenses.

International Journal of Data Analysis Techniques and Strategies (17 papers in press)

Regular Issues

  • Using the BIRCH Algorithm and Affinity Propagation, an Advanced Descriptor for Video Processing
    by Jayanta Mondal, Jitendra Pramanik, Satyajit Pattnaik, Bijay Paikaray 
    Abstract: Video summarisation is a widely preferred approach to managing the growth of video content. It is especially popular in video surveillance and in object and intrusion detection, because it provides concise, less redundant information. As video content continues to expand quickly, automatic video summaries help viewers grasp the material faster and with less effort. Most existing methods depend on various network architectures to train a single score predictor for shot rating and selection. This study addresses video summarisation as the task of selecting significant frames that succinctly and comprehensively express the content of the original video. The paper presents a comparative study of three advanced texture descriptors, Local Phase Quantization (LPQ), Local Ternary Pattern (LTP), and Local Binary Pattern (LBP), applied to video summarisation. Clusters of key frames are extracted with two unsupervised learning algorithms, Affinity Propagation and BIRCH. The proposed video summarisation method showed good results in trials.
    Keywords: Local Ternary Pattern; Local Binary Pattern; Affinity Propagation; Local Phase Quantization; BIRCH; Key Feature.
    DOI: 10.1504/IJDATS.2025.10065080
     
  • Prediction Model for AQI through Indian Vedic Science: Knowledge Management Technique to Control Pollution and for Sustainable Society
    by Rohit Rastogi, Saransh Chauhan, Yash Rastogi, Vaibhav Aggarwal, Utkarsh Agrawal, Richa Singh 
    Abstract: The paper outlines how Indian Vedic Sciences can be used to predict and prevent the ill effects of pollution on the human body and nature by adopting the simple methods of Yajna and Hawan in daily routine. Compared with other resources such as land and water, air is considered the most important. Evidence shows that Indian Vedic Sciences focus primarily on prana vayu, the air that we breathe. The authors' team and the Central Pollution Control Board (CPCB) gathered data and readings over the last four months through sensors installed in isolated and non-isolated environments that were continuously under the effects of Yajna and Hawan.
    Keywords: AQI; PM 2.5; PM 10; Climate Change; Yajna; Mantra; Human Health; Economic Growth; Knowledge Management; Knowledge Pyramid; Sustainable Society; Knowledge Levels and Extractions.
    DOI: 10.1504/IJDATS.2025.10065356
     
  • Enhanced Pearl Millet Mildew Disease Detection using Ensemble Deep Learning Methods
    by Aditya Kumar, Jainath Yadav 
    Abstract: Millet crops play a crucial role in global food security, providing sustenance to millions of people worldwide. Mildew disease poses a significant threat to pearl millet, a staple crop in many regions, impacting both its quality and yield. Detecting diseases in millet crops is crucial for maintaining both the quality and quantity of agricultural yields. However, limited labelled data and the expense of manual data labelling pose significant challenges in this domain. To address these issues, we suggest a deep learning ensemble framework that utilises the potential of multiple models for enhanced disease detection accuracy. Ensembles integrate the strengths of individual deep-learning models to improve overall performance and robustness. DenseNet121 and ResNet50, two deep learning models, were selected as the base models in our ensemble. Preliminary experimental results demonstrate the effectiveness of our ensemble approach, with an impressive accuracy of 96.6%.
    Keywords: Millet crops; Leaf disease; Precision agriculture; Deep learning; Machine learning.
    DOI: 10.1504/IJDATS.2025.10066101
     
  • Climatic Data Analysis Using Machine Learning and Correlation with Human Health
    by Rohit Rastogi, Prabhinav Mishra, Rayush Jain, Prateek Singh 
    Abstract: Climatic data analysis and effects on human health is a data science project that analyses and interprets climatic data to gain insights into past and present climate patterns. The project uses data analytics techniques such as regression models to process and analyse large-scale climatic datasets, enabling the identification of trends and patterns that contribute to a deeper understanding of climate dynamics. Its primary objectives are to investigate climate change phenomena, assess the impact of climatic change on human health, and predict how the spread of diseases varies under different climatic conditions. By employing statistical models, machine learning algorithms, and visualisation tools, the project aims to uncover hidden relationships within the data and provide evidence-based findings for policymakers, researchers, and stakeholders. To achieve these goals, the project draws on diverse sources of climatic data, including maximum and minimum temperature records, rainfall and humidity measurements, and atmospheric pressure data.
    Keywords: Jupyter Notebook; Pandas; Linear Regression.
    DOI: 10.1504/IJDATS.2025.10067196
     
  • Design a Modern Scheme for Machine Learning-Based Detection of Image Forgery
    by Emir Kalik, Ayad Adhab 
    Abstract: The rapid growth and development of information technology have led to the emergence of numerous methods for digital image forgery. Thus, manipulating digital images for negative or positive purposes has become easy. Advanced forgery methods have made it harder to determine whether an image is original or forged, especially with classical methods. The field has therefore become a popular research direction. In this paper, we introduce an intelligent approach to digital image forgery detection using machine learning. The proposal trains a model to discern between altered and original images by examining their essential features. The results demonstrated superior performance and high accuracy in detecting forgeries in digital images.
    Keywords: Convolutional Neural Network; CNN; Deep Reinforcement Learning; DRL; Forgery; Image Detection; Manipulation.
    DOI: 10.1504/IJDATS.2026.10068720
     
  • Development of G-Causality by Utilising Hybridisation of Bootstrap Method for Assessing Tourism Impacts in Malaysia
    by Anton Abdulbasah Kamil, Muhamad Safiih Lola 
    Abstract: This study aims to develop and examine the causality direction of non-economic short- and long-term factors in the Malaysian tourism industry using a new hybrid Bootstrap-Granger model. The proposed method was validated with a non-economic factor dataset from the World Bank (tourist arrival, population, air transport, and carbon dioxide emission) for the tourism industry. The model's effectiveness was tested by comparing it against the conventional Granger model using statistical tests such as unit root, Johansen cointegration, and Granger causality tests. The empirical results revealed that, compared to the Granger model, the proposed model generated smaller mean square error and root mean square error values for the non-economic factor datasets. Furthermore, the results revealed that tourist arrival and other determinants were co-integrated. In other words, the proposed model enhanced Granger causality accuracy and produced more robust, precise, and accurate results in support of promoting overall economic activities.
    Keywords: Bootstrap method; Granger Causality; Hybridization; Tourism Impact and non-economy factors; Malaysia.
    DOI: 10.1504/IJDATS.2026.10069162
     
  • MUSEM: Combining Multi-UpSampling and Ensemble learning Methods for Effective Financial Fraud Detection
    by Asieh Bagheri, Hossein Rahmani, Mohamad Mahdi Yadegar 
    Abstract: The rise of electronic payments, both online and in-person, has coincided with an increase in fraudulent and defaulted transactions, leading to significant financial losses. Researchers have explored various machine learning models for anomaly detection in credit card transactions, but challenges such as overlapping data classes and imbalanced distributions persist. To address these issues, we propose a dual-strategy approach called MUSEM, which integrates multi-up sampling with ensemble learning for enhanced fraud detection. MUSEM combines seven individual models into a unified framework, offering a more efficient method for identifying fraud. This study presents a comprehensive review and comparative analysis of various machine learning algorithms employed in financial fraud detection. Experimental results demonstrate a 3% improvement in recall over individual classifiers, affirming the effectiveness of the ensemble learning paradigm adopted in MUSEM. The findings highlight MUSEM's potential for real-world fraud detection applications, improving electronic payment security and reducing financial risks.
    Keywords: UpSampling techniques; Ensemble learning; Financial Fraud; Machine learning; Majority voting; MUSEM.
    DOI: 10.1504/IJDATS.2026.10070295
     
  • A Large Language Model-Based Named Entity Recognition Framework for Med-Sig Parsing
    by Madeline Chudy, Kewal Mishra, Chun-Kit Ngan 
    Abstract: Medication Signatures (med-sigs) provide essential instructions for medication use, often documented with shorthand and abbreviations. While there is a widely accepted list of common abbreviations, these shortcuts can lead to medication errors, resulting in an estimated 44,000 to 98,000 hospital deaths annually in the U.S. and costing between $37.6 billion and $50 billion in healthcare expenses, disability, and lost productivity. Standardizing and translating med-sigs across medical facilities is crucial. Natural Language Processing (NLP) and Named Entity Recognition (NER) technologies are key in automating the interpretation of medical prescriptions, breaking down complex instructions into identifiable elements. This paper analyzes state-of-the-art NER med-sig parsing models, evaluates their efficacy, and identifies gaps in their application. We propose adaptations and develop a pipeline using GPT-4 for NER on med-sigs. On a dataset of 177 med-sigs, our pipeline outperformed nine existing parsing models, demonstrating its effectiveness.
    Keywords: natural language processing; named entity recognition; large language models; medical signatura analysis; parsing; medication errors.
    DOI: 10.1504/IJDATS.2026.10071471
     
  • Application of Generalised Regression Neural Network for Financial Time Series Forecasting: a Comprehensive Comparison with Autoregressive Integrated Moving Average
    by Hoang Duc Le, Ke Nghia Nguyen 
    Abstract: Time series forecasting is highly significant in various fields, including economics, business, and finance. Autoregressive Integrated Moving Average (ARIMA) and its variations are well known for their superior ability to forecast with precision and accuracy. Nevertheless, introducing advanced computer processing capabilities and developing sophisticated Machine Learning (ML) approaches and Deep Learning (DL) methodologies have led to the creation of new algorithms for time series analysis and prediction. This study investigates whether DL-based forecasting algorithms provide a superior performance compared to traditional forecasting approaches. We found that the Generalized Regression Neural Network (GRNN) outperformed ARIMA regarding forecasting accuracy. GRNN has superior accuracy in predictions, with an error margin of less than 5%. GRNN also outperforms ARIMA in statistical measures like MAE, RMSE, and MAPE. Furthermore, the GRNN algorithm enjoys the advantage of shorter training times, which is particularly beneficial in situations when frequent transaction predictions are needed.
    Keywords: Time Series Forecasting; Machine Learning; Deep Learning; Generalised Regression Neural Network (GRNN); Autoregressive Integrated Moving Average (ARIMA).
    DOI: 10.1504/IJDATS.2026.10072162
     
  • Better Credit Decisioning through Scorecard Surrogate Models for Machine Learning Algorithms
    by Billie Anderson, Naeem Siddiqi, Mark Newman, J. Michael Hardin 
    Abstract: Over the last several years, the application of machine-learning models, called black-box models, has become a popular research topic in credit scoring. This study illustrates how surrogate models can be used to interpret credit decisions made using black-box models. A framework for using surrogate models in a credit scoring context is used to explain and interpret well-known machine learning models (e.g., neural networks, forests, gradient boosting, and support vector machines). This study uses real-world anonymized consumer bureau data obtained from Equifax to illustrate the degree of interpretability that can be achieved using machine learning models to assess the creditworthiness of loan applicants. The main objective of this study is to show practitioners how surrogate scorecard models can be used to interpret some of the most popular machine learning models in a credit scoring decision making process.
    Keywords: credit scoring; explainable machine learning models; surrogate models.
    DOI: 10.1504/IJDATS.2026.10072545
     
  • Enhanced Sales Forecasting Through Auto Regression and Cycle-GAN Models
    by Arif Hossen, Md Refat Hossain, Mithun Kumar PK. 
    Abstract: Precise sales forecasting is essential for businesses to manage inventory and allocate resources. However, traditional methods often struggle to capture the complex patterns, seasonality, and dynamic nature of sales data. The problem lies in the limitations of existing forecasting techniques, which fail to model the convoluted relationships and dependencies within time series data. To address this challenge, we propose a novel method that combines the strengths of autoregression (AR) and Cycle-GAN (Generative Adversarial Network) models. The method applies autoregression to capture linear temporal dependencies and uses Cycle-GAN's capability to learn non-linear mappings between different domains. Experimental results on real-world sales datasets demonstrate the excellent performance of our approach, outperforming cutting-edge forecasting methods in terms of accuracy, adaptability, and generalisation. The proposed AR-CycleGAN model delivers superior results and surpasses all other cutting-edge models with an accuracy of 98.96%, a precision of 98.16%, a recall of 98.97%, and an F1-score of 98.56%.
    Keywords: Auto Regression (AR); GAN; Cycle-GAN; Sales Forecasting; Time Series data; Machine Learning (ML); Deep Learning (DL); Business Analytics.
    DOI: 10.1504/IJDATS.2026.10073097
     
  • Depression Detection: Analysing Social and Private Contexts for Detection with Deep Learning
    by Gaurav Kumar Gupta, Dilip Kumar Sharma 
    Abstract: Social networks offer potentially valuable information, such as emotions, psychological behaviours, and opinions, that enables psychological analysis of mental state for depression detection. However, recognising depression from linguistic content on social networks alone is insufficient. Even though social networks provide varied data for analysing mindset, people with depression are often reluctant to express their feelings publicly on social media. Investigating an individual's private context is therefore crucial for accurate decision-making, and considering both social and private contexts offers a prominent solution to depression detection. This work proposes the social and private context-based depression (SPriD) detection model using deep learning. The proposed approach integrates the depression tendency from social and private contexts to distinguish depressive from non-depressive individuals. The results of SPriD show the superiority of the proposed depression detection approach.
    Keywords: Depression Detection; Social Context; Private Context; NRC lexicon; Word-level Weighted Vectorization; Multi-Task Semi-Supervised Learning; Weighted attention; Hybrid Deep Learning.
    DOI: 10.1504/IJDATS.2026.10073294
     
  • Machine Learning Algorithms to Predict Groundwater Productivity in Iraq
    by Qahtan Yas, Younis K. Hamead
    Abstract: Groundwater (GW) is a vital water source in most countries, but the decline in surface and groundwater supplies is a significant challenge. This paper proposes a new approach to predicting future groundwater levels using machine learning algorithms. A dataset of groundwater productivity in Iraq for the period 2017-2020 was used. The methodology compares and evaluates the performance of four machine learning models for predicting groundwater productivity in Iraq: the Elman Neural Network (ENN), Cascade Neural Network (CNN), Layer Recurrent Neural Network (LRNN), and Nonlinear Autoregressive with Exogenous Inputs (NARX) network. Various metrics were adopted to evaluate the proposed ML algorithms. The results showed that the NARX algorithm performed best in predicting groundwater productivity, with a value of 4804.68, outperforming the other models. This research has the potential to significantly impact future water resource management, offering hope for a more sustainable future.
    Keywords: Groundwater Productivity; ENN algorithm; LRNN algorithm; CNN algorithm; NARX algorithm; Machine Learning algorithms; Sustainability.
    DOI: 10.1504/IJDATS.2026.10074703
     
  • Drawing the Profile of the Parent's Intention to Control their Child's Internet Use with the Naive Bayes Method
    by Esin Avci 
    Abstract: The rapid growth of internet access among children brings both opportunities for learning and significant exposure to online risks. This study applies the Naive Bayes classification algorithm to identify factors influencing parents' decisions to install special software designed to block access to harmful websites. Data were collected via surveys from parents, teachers, and students in 26 schools in Giresun, Turkey, encompassing demographic, occupational, and household characteristics. Results indicate that only 15.5% of parents use such software, with adoption more prevalent among parents aged under 35, those with two children, and those employed in the state sector or not currently working. The Naive Bayes model achieved an accuracy rate of 89% (CI: 81-95%), demonstrating strong predictive capability for software installation behaviour. These findings underscore the importance of targeted awareness and educational programs for parents, particularly in promoting digital safety measures for children.
    Keywords: Internet security; children; software; machine learning; Naive Bayes classifier.
    DOI: 10.1504/IJDATS.2026.10074711
     
  • Identification of the Importance Level of Characteristic Variables for Food-Insecure Households Using Random Forest
    by Muhammad Subianto, Riska Adelia, Nany Salwa, Evi Ramadhani, Bagus Sartono 
    Abstract: The rise of big data presents both challenges and opportunities for machine learning applications. Random Forest, an ensemble method using Decision Trees and bagging, offers strong predictive performance but often lacks interpretability. This study aims to develop an optimal classification model and identify key factors associated with food-insecure households. Data were sourced from the 2022 National Socioeconomic Survey (SUSENAS) in Aceh Province. The best model used a 55:45 data split with optimised hyperparameters: 88 estimators, 47 max features, 37 max depth, min samples split of 5, min samples leaf of 1, and the entropy criterion. Model performance reached 65.88% accuracy, 70.50% precision, 65.88% recall, and 67.83% F1-score. SHAP was applied to interpret model outputs, revealing the five most influential variables: residence floor area, adequacy of sanitation facilities, education of household head, land asset ownership, and internet access.
    Keywords: Classification; Random Forest; SHAP; Food Insecurity; Aceh.
    DOI: 10.1504/IJDATS.2026.10074894
     
  • Making Data Visualisation more Efficient and Effective
    by Rania Mkhinini Gahar, Olfa Arfaoui, Minyar Sassi Hidri 
    Abstract: Data science is an interdisciplinary field that uses systems, processes, algorithms, and other frameworks to make use of massive amounts of data. Data scientists thus combine a variety of skills, such as IT, statistics, and business knowledge, to evaluate data gathered from clients or other sources using sensors, smartphones, web surfing patterns, etc. However, it is their still little-known competence that makes their profile so appealing to recruiters, which is why these uncommon profiles are in high demand. This paper looks back at the attractive and constantly evolving data scientist profession through dashboards that highlight its salaries. The main contributions of the proposed work are defining dashboard performance indicators for data scientists and designing a management and reporting tool, built with cutting-edge technologies, that helps data enthusiasts enter the data science world.
    Keywords: Data Science; DataViz; Business Intelligence; Power BI; Dashboard; Big Data.
    DOI: 10.1504/IJDATS.2026.10075121
     
  • A Novel Algorithm for Tree-Based Sequential Pattern Mining using HU-Chain structure
    by Ritika, Sunil Kumar Gupta
    Abstract: One of the challenges faced by researchers in mining high-utility sequential patterns is maintaining the downward closure property. To address this issue, researchers have proposed various properties, representations, and pruning strategies for high-utility mining. This paper proposes a novel compact header utility chain data structure for representing utility information, providing quick access to utility values while traversing the lexicographic quantitative sequence tree. The candidates are effectively reduced using a combination of pruning strategies. The sequence identifier information is stored only once, in the header node, during the construction of the utility chain, thereby saving time. Experimental results show that the proposed approach outperforms current approaches in both mining speed and the number of candidates generated.
    Keywords: Header utility chain; pruning strategies; tree; high utility mining.