International Journal of Data Analysis Techniques and Strategies (IJDATS) Inderscience Publishers - linking academia, business and industry through research

Forthcoming Articles

International Journal of Data Analysis Techniques and Strategies

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Online First articles are also listed here. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

International Journal of Data Analysis Techniques and Strategies (16 papers in press)

Regular Issues

Using the BIRCH Algorithm and Affinity Propagation, an Advanced Descriptor for Video Processing
by Jayanta Mondal, Jitendra Pramanik, Satyajit Pattnaik, Bijay Paikaray
Abstract: Video summarisation is the most preferred approach to administer the augmentation of video content. In the area of video surveillance and object and intrusion detection, video summarisation has been the most popular as it provides concise and less redundant information. As video content continues to expand quickly, an automatic video summary would be helpful for anyone who wants to learn more quickly and with less effort. Most existing methods depend on various network architectures to train a single score predictor for shot rating and selection. This study addresses the issue of video summarisation, which involves selecting significant frames to succinctly and comprehensively express the material of the original video. The current paper presents a comparative study of the application of the advanced texture descriptors Local Phase Quantization (LPQ), Local Ternary Pattern (LTP), and Local Binary Pattern (LBP) in the process of video summarisation. Clusters of key frames have been extracted by two unsupervised learning algorithms, Affinity Propagation and BIRCH. The proposed video summarisation method has shown good results in trials.
Keywords: Local Ternary Pattern; Local Binary Pattern; Affinity Propagation; Local Phase Quantization; BIRCH; Key Feature.
DOI: 10.1504/IJDATS.2025.10065080

Prediction Model for AQI through Indian Vedic Science: Knowledge Management Technique to Control Pollution and for Sustainable Society
by Rohit Rastogi, Saransh Chauhan, Yash Rastogi, Vaibhav Aggarwal, Utkarsh Agrawal, Richa Singh
Abstract: The paper outlines how Indian Vedic Sciences can be used for preventing and predicting the ill effects of pollution on the human body and nature by adopting the simple methods of Yajna and Hawan in daily routine. Compared with other resources such as land and water, air is considered the most important resource. Evidence shows that Indian Vedic Sciences primarily focus on prana vayu, which means the air that we breathe. The authors' team and the Central Pollution Control Board (CPCB) have gathered data and readings for the last four months through sensors installed in isolated as well as non-isolated environments that were continuously under the effects of Yajna and Hawan.
Keywords: AQI; PM 2.5; PM 10; Climate Change; Yajna; Mantra; Human Health; Economic Growth; Knowledge Management; Knowledge Pyramid; Sustainable Society; Knowledge Levels and Extractions.
DOI: 10.1504/IJDATS.2025.10065356

Enhanced Pearl Millet Mildew Disease Detection using Ensemble Deep Learning Methods
by Aditya Kumar, Jainath Yadav
Abstract: Millet crops play a crucial role in global food security, providing sustenance to millions of people worldwide. Mildew disease poses a significant threat to pearl millet, a staple crop in many regions, impacting both its quality and yield. Detecting diseases in millet crops is crucial for maintaining both the quality and quantity of agricultural yields. However, limited labelled data and the expense of manual data labelling pose significant challenges in this domain. To address these issues, we suggest a deep learning ensemble framework that utilises the potential of multiple models for enhanced disease detection accuracy. Ensembles integrate the strengths of individual deep-learning models to improve overall performance and robustness. DenseNet121 and ResNet50, two deep learning models, were selected as the base models in our ensemble. Preliminary experimental results demonstrate the effectiveness of our ensemble approach, with an impressive accuracy of 96.6%.
Keywords: Millet crops; Leaf disease; Precision agriculture; Deep learning; Machine learning.
DOI: 10.1504/IJDATS.2025.10066101

A Comprehensive and Comparative Analysis of Deep Learning Models for Textual Sentiment Analysis
by Leyla Mammadova
Abstract: Analyzing public opinion may provide important insights. Sentiment analysis is a textual data analysis technique that identifies subjective information expressed by people or groups, including views and emotions. By advancing natural language processing and deep learning approaches, sentiment analysis advances our comprehension of human language. In this study, we provide a thorough evaluation and comparative analysis of various deep learning models, such as RNNs, LSTMs, and GRUs, and their bidirectional variants. We conduct the analysis on four publicly accessible datasets: imdb_reviews, the Twitter Sentiment Dataset, the Emotions dataset and ag_news_subset. We assess the accuracy of six well-known deep learning models. Our experimental results demonstrate that bidirectional architectures generally perform better than their unidirectional equivalents. The bidirectional models consistently achieved the highest accuracy across different datasets.
Keywords: Sentiment analysis; RNN; LSTM; GRU; Bidirectional RNN; Bidirectional LSTM; Bidirectional GRU.
DOI: 10.1504/IJDATS.2025.10066752

Volatility Modelling and Forecasting in Stock Markets: a Machine Learning Approach
by Soumen Ghosh, Kuntal Mukherjee, Biswajit Jana, Syed Saif Ahmed, Mohammad Aasif, Sayel Munsi
Abstract: This research explores the application of various models for stock price prediction, including ARIMA, LSTM, SARIMAX, and a hybrid SARIMAX-LSTM, highlighting their importance in the post-pandemic financial landscape. The study emphasises the limitations of traditional methods and the necessity of time-series analysis for understanding stock price patterns. It focuses on the impact of COVID-19 on financial markets and assesses the reliability of these models in unpredictable conditions. The methodology involves data selection, pre-processing, model parameter tuning, and performance evaluation. The research establishes a framework for the implementation of these models, underscoring the need for parameter optimisation to enhance accuracy. Ultimately, the study shows that LSTM performs better than the other models and offers valuable insights into using advanced forecasting techniques for improved investment strategies in the evolving stock market.
Keywords: LSTM; ARIMA; Moving average (MA); Autoregressive (AR); Mean Absolute Error (MAE); Mean Squared Error (MSE); Root Mean Squared Error (RMSE); R-squared (R²).
DOI: 10.1504/IJDATS.2025.10066979

Analysing Social Media Sentiment: Unravelling the Trichotomy of Positive, Negative, and Neutral Sentiments in User Comments
by Reddy Sowmya Vangumalla, Yoonsuk Choi
Abstract: This study explores sentiment analysis of Twitter comments, focusing on neutral, negative, and positive attitudes. By applying advanced techniques such as feature engineering, data pre-processing, and machine learning, we aim to derive actionable insights. Our approach involves setting project goals, selecting data sources, and establishing infrastructure for analysis. After pre-processing, we utilise support vector machines (SVMs) for classification and evaluate the model with metrics like accuracy, precision, recall, and F1-score. Visualisation tools, including ROC curves and confusion matrices, help interpret the results. We discuss the limitations and suggest future research to enhance performance and address data quality issues.
Keywords: Data Analysis; Decision-Making; Feature Engineering; Machine Learning; Sentiment Analysis; Social Media; Support Vector Machines; Twitter; Text Preprocessing.
DOI: 10.1504/IJDATS.2025.10067092

Climatic Data Analysis Using Machine Learning and Correlation with Human Health
by Rohit Rastogi, Prabhinav Mishra, Rayush Jain, Prateek Singh
Abstract: Climatic data analysis and effects on human health is a data science project that focuses on the analysis and interpretation of climatic data to gain valuable insights into past and present climate patterns. The project utilises advanced data analytics techniques like regression models to process and analyse large-scale climatic datasets, enabling the identification of trends and patterns that contribute to a deeper understanding of climate dynamics. The primary objectives of this project are to investigate climate change phenomena, assess the impact of climatic change on human health, and predict the variation of spread of diseases as per the different climatic conditions. By employing various statistical models, machine learning algorithms, and visualisation tools, the project aims to uncover hidden relationships within the data and provide evidence-based findings for policymakers, researchers, and stakeholders. To achieve these goals, the project leverages diverse sources of climatic data, including maximum and minimum temperature records, rainfall and humidity measurements, atmospheric pressure data etc.
Keywords: Jupyter Notebook; Pandas; Linear Regression.
DOI: 10.1504/IJDATS.2025.10067196

Comparing Discrimination and Calibration Performance of Two Flexible Link Functions in Discrete Survival Models
by Susan Maposa, Alphonce Bere, Caston Sigauke, Charles Chimedza
Abstract: This study provides the first direct comparison between the Pareto and Logit-power link functions within discrete survival models, evaluated alongside three commonly used links. We assess their discrimination and calibration using simulated and real-life datasets with varying skewness. Simulations included 100 data sets with symmetric, right-skewed, and left-skewed distributions, and bootstrapping was applied for robust evaluation. The results show that cloglog excels in discrimination, while logit offers superior calibration. The Pareto family demonstrates robust performance, making it a reliable secondary option. However, Logit-power performs poorly in calibration and is unsuitable for discrete survival models. The study offers practical recommendations for implementing the Logit-power link, addressing its complex estimation process, and suggests a grid search approach using information criteria for parameter optimization. These findings highlight the importance of carefully selecting link functions in discrete survival modeling.
Keywords: Calibration; Discrimination; Discrete survival models; Families of link functions.
DOI: 10.1504/IJDATS.2025.10067711

Design a Modern Scheme for Machine Learning-Based Detection of Image Forgery
by Emir Kalik, Ayad Adhab
Abstract: The rapid growth and development of information technology have led to the emergence of numerous methods that are used for digital image forgery. Thus, manipulating digital images to achieve a negative or positive purpose has become easy. The use of advanced methods in forgery has increased the difficulty of detecting whether images are original or forged, especially when using classical methods. Therefore, many researchers are interested in this field, making it a popular research direction. In this paper, we introduce an intelligent approach to designing a method for digital image forgery detection by using machine learning. This proposal seeks to train an intelligent model to discern between altered and original images by examining the essential features of the images. The results demonstrated that it achieved superior performance and high accuracy in detecting forgeries in digital images.
Keywords: Convolutional Neural Network; CNN; Deep Reinforcement Learning; DRL; Forgery; Image Detection; Manipulation.
DOI: 10.1504/IJDATS.2026.10068720

Development of G-Causality by Utilising Hybridisation of Bootstrap Method for Assessing Tourism Impacts in Malaysia
by Anton Abdulbasah Kamil, Muhamad Safiih Lola
Abstract: This study aims to develop and examine the causality direction of non-economic short and long-term factors in the Malaysian tourism industry using a new hybrid Bootstrap-Granger Model. The proposed method was validated with a non-economic factor dataset from the World Bank (tourist arrival, population, air transport, and carbon dioxide emission) in the tourism industry. The model's effectiveness was tested and analysed by comparing it against the actual Granger model using statistical tests such as unit root, Johansen cointegration, and Granger causality tests. The empirical results revealed that, compared to the Granger model, the proposed counterpart generated smaller mean square error and root mean square error values for non-economic factor datasets. Furthermore, the results also revealed that tourist arrival and other determinants were co-integrated. In other words, the proposed model enhanced Granger causality accuracy and proved to be more robust, precise, and accurate, supporting the promotion of overall economic activities.
Keywords: Bootstrap method; Granger Causality; Hybridization; Tourism Impact and non-economy factors; Malaysia.
DOI: 10.1504/IJDATS.2026.10069162

MUSEM: Combining Multi-UpSampling and Ensemble learning Methods for Effective Financial Fraud Detection
by Asieh Bagheri, Hossein Rahmani, Mohamad Mahdi Yadegar
Abstract: The rise of electronic payments, both online and in-person, has coincided with an increase in fraudulent and defaulted transactions, leading to significant financial losses. Researchers have explored various machine learning models for anomaly detection in credit card transactions, but challenges such as overlapping data classes and imbalanced distributions persist. To address these issues, we propose a dual-strategy approach called MUSEM, which integrates multi-up sampling with ensemble learning for enhanced fraud detection. MUSEM combines seven individual models into a unified framework, offering a more efficient method for identifying fraud. This study presents a comprehensive review and comparative analysis of various machine learning algorithms employed in financial fraud detection. Experimental results demonstrate a 3% improvement in recall over individual classifiers, affirming the effectiveness of the ensemble learning paradigm adopted in MUSEM. The findings highlight MUSEM's potential for real-world fraud detection applications, improving electronic payment security and reducing financial risks.
Keywords: UpSampling techniques; Ensemble learning; Financial Fraud; Machine learning; Majority voting; MUSEM.
DOI: 10.1504/IJDATS.2026.10070295

A Large Language Model-Based Named Entity Recognition Framework for Med-Sig Parsing
by Madeline Chudy, Kewal Mishra, Chun-Kit Ngan
Abstract: Medication Signatures (med-sigs) provide essential instructions for medication use, often documented with shorthand and abbreviations. While there is a widely accepted list of common abbreviations, these shortcuts can lead to medication errors, resulting in an estimated 44,000 to 98,000 hospital deaths annually in the U.S. and costing between $37.6 billion and $50 billion in healthcare expenses, disability, and lost productivity. Standardizing and translating med-sigs across medical facilities is crucial. Natural Language Processing (NLP) and Named Entity Recognition (NER) technologies are key in automating the interpretation of medical prescriptions, breaking down complex instructions into identifiable elements. This paper analyzes state-of-the-art NER med-sig parsing models, evaluates their efficacy, and identifies gaps in their application. We propose adaptations and develop a pipeline using GPT-4 for NER on med-sigs. Analysing a dataset of 177 med-sigs, our pipeline outperformed nine existing parsing models, demonstrating its effectiveness.
Keywords: natural language processing; named entity recognition; large language models; medical signatura analysis; parsing; medication errors.
DOI: 10.1504/IJDATS.2026.10071471

Application of Generalised Regression Neural Network for Financial Time Series Forecasting: a Comprehensive Comparison with Autoregressive Integrated Moving Average
by Hoang Duc Le, Ke Nghia Nguyen
Abstract: Time series forecasting is highly significant in various fields, including economics, business, and finance. Autoregressive Integrated Moving Average (ARIMA) and its variations are well known for their superior ability to forecast with precision and accuracy. Nevertheless, introducing advanced computer processing capabilities and developing sophisticated Machine Learning (ML) approaches and Deep Learning (DL) methodologies have led to the creation of new algorithms for time series analysis and prediction. This study investigates whether DL-based forecasting algorithms provide a superior performance compared to traditional forecasting approaches. We found that the Generalized Regression Neural Network (GRNN) outperformed ARIMA regarding forecasting accuracy. GRNN has superior accuracy in predictions, with an error margin of less than 5%. GRNN also outperforms ARIMA in statistical measures like MAE, RMSE, and MAPE. Furthermore, the GRNN algorithm enjoys the advantage of shorter training times, which is particularly beneficial in situations when frequent transaction predictions are needed.
Keywords: Time Series Forecasting; Machine Learning; Deep Learning; Generalised Regression Neural Network (GRNN); Autoregressive Integrated Moving Average (ARIMA).
DOI: 10.1504/IJDATS.2026.10072162

Better Credit Decisioning through Scorecard Surrogate Models for Machine Learning Algorithms
by Billie Anderson, Naeem Siddiqi, Mark Newman, J. Michael Hardin
Abstract: Over the last several years, the application of machine-learning models, called black-box models, has become a popular research topic in credit scoring. This study illustrates how surrogate models can be used to interpret credit decisions made using black-box models. A framework for using surrogate models in a credit scoring context is used to explain and interpret well-known machine learning models (e.g., neural networks, forests, gradient boosting, and support vector machines). This study uses real-world anonymized consumer bureau data obtained from Equifax to illustrate the degree of interpretability that can be achieved using machine learning models to assess the creditworthiness of loan applicants. The main objective of this study is to show practitioners how surrogate scorecard models can be used to interpret some of the most popular machine learning models in a credit scoring decision making process.
Keywords: credit scoring; explainable machine learning models; surrogate models.
DOI: 10.1504/IJDATS.2026.10072545

Enhanced Sales Forecasting Through Auto Regression and Cycle-GAN Models
by Arif Hossen, Md Refat Hossain, Mithun Kumar PK.
Abstract: Precise sales forecasting is essential for businesses to manage inventory and allocate resources. However, traditional methods often struggle to capture sales data's complex patterns, seasonality, and dynamic nature. The problem lies in the limitations of existing forecasting techniques, which fail to model the convoluted relationships and dependencies within time series data. To address this challenge, we propose a novel method that combines the strengths of autoregression (AR) and Cycle-GAN (Generative Adversarial Network) models, applying autoregression to capture linear temporal dependencies and utilising Cycle-GAN's capability to learn non-linear mappings between different domains. Experimental results on real-world sales datasets demonstrate the excellent performance of our approach, outperforming cutting-edge forecasting methods in terms of accuracy, adaptability, and generalisation. The proposed AR-CycleGAN model delivers superior results and surpasses all other cutting-edge models with an accuracy of 98.96%, a precision of 98.16%, a recall of 98.97%, and an F1-score of 98.56%.
Keywords: Auto Regression (AR); GAN; Cycle-GAN; Sales Forecasting; Time Series data; Machine Learning (ML); Deep Learning (DL); Business Analytics.
DOI: 10.1504/IJDATS.2026.10073097

Depression Detection: Analysing Social and Private Contexts for Detection with Deep Learning
by Gaurav Kumar Gupta, Dilip Kumar Sharma
Abstract: Social networks offer information, such as emotions, psychological behaviours, and opinions, enabling psychological analysis to assess mental state for depression detection. However, recognising the depression state from the linguistic content in the social network alone is insufficient. Even though social networks provide multifarious data for analysing the mindset, depression sufferers are reluctant to express their feelings publicly on social media. Thus, investigating the private context of an individual becomes crucial for accurate decision-making. Hence, considering both the social and private contexts offers the most prominent solution to depression detection. This work proposes the social and private context-based depression (SPriD) detection model using deep learning. Moreover, the proposed approach integrates the depression tendency from social and private contexts to distinguish depressive from non-depressive individuals. The results of SPriD show the superiority of the proposed depression detection approach.
Keywords: Depression Detection; Social Context; Private Context; NRC lexicon; Word-level Weighted Vectorization; Multi-Task Semi-Supervised Learning; Weighted attention; Hybrid Deep Learning.
DOI: 10.1504/IJDATS.2026.10073294
