International Journal of Data Analysis Techniques and Strategies (IJDATS) Inderscience Publishers - linking academia, business and industry through research

Forthcoming Articles

International Journal of Data Analysis Techniques and Strategies

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Online First articles are also listed here. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

International Journal of Data Analysis Techniques and Strategies (20 papers in press)

Regular Issues

Using the BIRCH Algorithm and Affinity Propagation, an Advanced Descriptor for Video Processing
by Jayanta Mondal, Jitendra Pramanik, Satyajit Pattnaik, Bijay Paikaray
Abstract: Video summarisation is the most widely preferred approach to managing the rapid growth of video content. In video surveillance and in object and intrusion detection, video summarisation has been the most popular approach because it provides concise, less redundant information. As video content continues to expand quickly, automatic video summaries help anyone who wants to grasp content faster and with less effort. Most existing methods rely on various network architectures to train a single score predictor for shot rating and selection. This study addresses video summarisation as the task of selecting significant frames that succinctly and comprehensively express the content of the original video. The paper presents a comparative study of three advanced texture descriptors, Local Phase Quantization (LPQ), Local Ternary Pattern (LTP) and Local Binary Pattern (LBP), applied to video summarisation. Key frames are extracted by clustering with two unsupervised learning algorithms, Affinity Propagation and BIRCH. The proposed video summarisation method showed good results in trials.
Keywords: Local Ternary Pattern; Local Binary Pattern; Affinity Propagation; Local Phase Quantization; BIRCH; Key Feature.
DOI: 10.1504/IJDATS.2025.10065080
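
The LBP descriptor compared in this study can be illustrated with a minimal sketch of the basic 3×3 operator. The sketch assumes NumPy and a grayscale frame stored as a 2-D array. It uses the textbook formulation, in which each neighbour at least as bright as the centre pixel sets one bit. It is not the authors' implementation. The function names `lbp_8` and `lbp_histogram` are hypothetical, and treating a normalised 256-bin histogram as the per-frame feature vector is an assumption.

```python
import numpy as np

def lbp_8(gray: np.ndarray) -> np.ndarray:
    """Basic 3x3 Local Binary Pattern for every interior pixel.

    Each output value is an 8-bit code; bit k is set when neighbour k
    is greater than or equal to the centre pixel.
    """
    g = gray.astype(np.int32)
    h, w = g.shape
    centre = g[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(centre)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view aligned with the interior (centre) pixels.
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= centre).astype(np.int32) << bit
    return codes

def lbp_histogram(gray: np.ndarray) -> np.ndarray:
    """Normalised 256-bin histogram of LBP codes (hypothetical per-frame feature)."""
    codes = lbp_8(gray)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

Per-frame histograms like these could then be clustered, for example with scikit-learn's `Birch` or `AffinityPropagation` classes, keeping one representative frame per cluster as a key frame. The abstract does not state how the authors choose that representative.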

Prediction Model for AQI through Indian Vedic Science: Knowledge Management Technique to Control Pollution and for Sustainable Society
by Rohit Rastogi, Saransh Chauhan, Yash Rastogi, Vaibhav Aggarwal, Utkarsh Agrawal, Richa Singh
Abstract: The paper outlines how Indian Vedic Sciences can be used to prevent and predict the ill effects of pollution on the human body and on nature through the simple daily practices of Yajna and Hawan. Compared with other resources such as land and water, air is considered the most important resource. Evidence shows that Indian Vedic Sciences primarily focus on prana vayu, meaning the air that we breathe. The authors and the Central Pollution Control Board (CPCB) gathered four months of readings from sensors installed in both isolated and non-isolated environments that were continuously under the effects of Yajna and Hawan.
Keywords: AQI; PM 2.5; PM 10; Climate Change; Yajna; Mantra; Human Health; Economic Growth; Knowledge Management; Knowledge Pyramid; Sustainable Society; Knowledge Levels and Extractions.
DOI: 10.1504/IJDATS.2025.10065356

Emoji Translation for Sentiment Analysis in Algerian Arabic Dialect
by Samira Hazmoune, Fateh Bougamouza
Abstract: Sentiment analysis (SA) is an important natural language processing (NLP) field that involves extracting sentiments and opinions from text data. Although SA has advanced significantly, applying it to dialectal Arabic text is challenging because of linguistic nuances and limited resources. This research investigates incorporating emojis into SA for Algerian Arabic dialect (AAD), the first exploration of its kind in this area. Specifically, we focus on emoji translation, building on prior studies that highlighted the potential of emojis in SA and of translating them into meaningful words or sentences as a preprocessing step. We evaluate how much this approach improves sentiment classification of AAD text, focusing on customer reviews of Algerian telephone operators. After preprocessing, which includes various emoji translation techniques, we apply transfer learning by fine-tuning the DziriBERT model on a compiled Algerian dialect dataset. Our results are promising and offer novel conclusions and perspectives for AAD sentiment analysis.
Keywords: Sentiment Analysis; Emoji Translation; DziriBERT; Algerian Arabic Dialect; Transfer Learning; Emoji Categorisation; Emoji Handling; Customer Reviews.
DOI: 10.1504/IJDATS.2025.10065720

Analysis of Online Transaction using Data Analytics Framework
by Md Nurul Islam, Iqbal Hasan, Shahla Tarannum, S.M.K. Quadri
Abstract: Online transactions have become a necessity for everyone and generate vast amounts of data, which require a robust framework to ensure their security, efficiency and reliability. This research paper explores the application of advanced data analytics techniques to ensure and enhance the confidentiality of the online transaction process. With this analytics framework, patterns can be analysed, anomalies detected and trends predicted in online transaction data. An online survey was conducted to collect data from one lakh (100,000) consumers across different geographical regions and diverse working groups. Descriptive analysis is used to ascertain the current state of online transactions. The study investigates the significance of feature selection, anomaly detection and clustering methods in identifying patterns, trends and potential fraud indicators within online transactions. The findings contribute to the growing body of knowledge on using data analytics frameworks to extract valuable insights from online transaction data.
Keywords: Online transactions; Data analytics; Online payment; Security; E-commerce; Analysis.
DOI: 10.1504/IJDATS.2025.10065866

Enhanced Pearl Millet Mildew Disease Detection using Ensemble Deep Learning Methods
by Aditya Kumar, Jainath Yadav
Abstract: Millet crops play a crucial role in global food security, providing sustenance to millions of people worldwide. Mildew disease poses a significant threat to pearl millet, a staple crop in many regions, reducing both its quality and its yield. Detecting diseases in millet crops is therefore crucial for maintaining both the quality and the quantity of agricultural yields. However, limited labelled data and the expense of manual labelling pose significant challenges in this domain. To address these issues, we propose a deep learning ensemble framework that combines multiple models to improve disease detection accuracy. Ensembles integrate the strengths of individual deep learning models to improve overall performance and robustness. Two deep learning models, DenseNet121 and ResNet50, were selected as the base models of the ensemble. Preliminary experimental results demonstrate the effectiveness of the ensemble approach, which achieves an accuracy of 96.6%.
Keywords: Millet crops; Leaf disease; Precision agriculture; Deep learning; Machine learning.
DOI: 10.1504/IJDATS.2025.10066101

A Comprehensive and Comparative Analysis of Deep Learning Models for Textual Sentiment Analysis
by Leyla Mammadova
Abstract: Analyzing public opinion can provide important insights. Sentiment analysis is a textual data analysis technique that identifies subjective information expressed by people or groups, including views and emotions. Building on advances in natural language processing and deep learning, sentiment analysis deepens our understanding of human language. In this study, we provide a thorough evaluation and comparative analysis of several deep learning models, namely RNNs, LSTMs and GRUs, and their bidirectional variants. The analysis uses four publicly available datasets: imdb_reviews, the Twitter Sentiment Dataset, the Emotions dataset and ag_news_subset. We assess the accuracy of these six well-known deep learning models. Our experimental results show that bidirectional architectures generally perform better than their unidirectional equivalents. The bidirectional models consistently achieved the highest accuracy across the datasets.
Keywords: Sentiment analysis; RNN; LSTM; GRU; Bidirectional RNN; Bidirectional LSTM; Bidirectional GRU.
DOI: 10.1504/IJDATS.2025.10066752

Volatility Modelling and Forecasting in Stock Markets: a Machine Learning Approach
by Soumen Ghosh, Kuntal Mukherjee, Biswajit Jana, Syed Saif Ahmed, Mohammad Aasif, Sayel Munsi
Abstract: This research explores the application of various models for stock price prediction, including ARIMA, LSTM, SARIMAX, and a hybrid SARIMAX-LSTM, highlighting their importance in the post-pandemic financial landscape. The study emphasises the limitations of traditional methods and the necessity of time-series analysis for understanding stock price patterns. It focuses on the impact of COVID-19 on financial markets and assesses the reliability of these models in unpredictable conditions. The methodology involves data selection, pre-processing, model parameter tuning, and performance evaluation. The research establishes a framework for the implementation of these models, underscoring the need for parameter optimisation to enhance accuracy. Ultimately, the study shows that LSTM performs better than the other models and offers valuable insights into using advanced forecasting techniques for improved investment strategies in the evolving stock market.
Keywords: LSTM; ARIMA; Moving average (MA); Autoregressive (AR); Mean Absolute Error (MAE); Mean Squared Error (MSE); Root Mean Squared Error (RMSE); R-squared (R²).
DOI: 10.1504/IJDATS.2025.10066979

Analysing Social Media Sentiment: Unravelling the Trichotomy of Positive, Negative, and Neutral Sentiments in User Comments
by Reddy Sowmya Vangumalla, Yoonsuk Choi
Abstract: This study explores sentiment analysis of Twitter comments, focusing on neutral, negative, and positive attitudes. By applying advanced techniques such as feature engineering, data pre-processing, and machine learning, we aim to derive actionable insights. Our approach involves setting project goals, selecting data sources, and establishing infrastructure for analysis. After pre-processing, we utilise support vector machines (SVMs) for classification and evaluate the model with metrics like accuracy, precision, recall, and F1-score. Visualisation tools, including ROC curves and confusion matrices, help interpret the results. We discuss the limitations and suggest future research to enhance performance and address data quality issues.
Keywords: Data Analysis; Decision-Making; Feature Engineering; Machine Learning; Sentiment Analysis; Social Media; Support Vector Machines; Twitter; Text Preprocessing.
DOI: 10.1504/IJDATS.2025.10067092

Climatic Data Analysis Using Machine Learning and Correlation with Human Health
by Rohit Rastogi, Prabhinav Mishra, Rayush Jain, Prateek Singh
Abstract: This data science project analyses and interprets climatic data to gain insights into past and present climate patterns and their effects on human health. The project uses advanced data analytics techniques such as regression models to process and analyse large-scale climatic datasets, identifying trends and patterns that contribute to a deeper understanding of climate dynamics. Its primary objectives are to investigate climate change phenomena, assess the impact of climate change on human health, and predict how the spread of diseases varies under different climatic conditions. By employing various statistical models, machine learning algorithms and visualisation tools, the project aims to uncover hidden relationships within the data and provide evidence-based findings for policymakers, researchers and stakeholders. To achieve these goals, the project draws on diverse sources of climatic data, including maximum and minimum temperature records, rainfall and humidity measurements, and atmospheric pressure data.
Keywords: Jupyter Notebook; Pandas; Linear Regression.
DOI: 10.1504/IJDATS.2025.10067196

Comparing Discrimination and Calibration Performance of Two Flexible Link Functions in Discrete Survival Models
by Susan Maposa, Alphonce Bere, Caston Sigauke, Charles Chimedza
Abstract: This study provides the first direct comparison between the Pareto and Logit-power link functions within discrete survival models, evaluated alongside three commonly used links. We assess their discrimination and calibration using simulated and real-life datasets with varying skewness. Simulations included 100 data sets with symmetric, right-skewed, and left-skewed distributions, and bootstrapping was applied for robust evaluation. The results show that cloglog excels in discrimination, while logit offers superior calibration. The Pareto family demonstrates robust performance, making it a reliable secondary option. However, Logit-power performs poorly in calibration and is unsuitable for discrete survival models. The study offers practical recommendations for implementing the Logit-power link, addressing its complex estimation process, and suggests a grid search approach using information criteria for parameter optimization. These findings highlight the importance of carefully selecting link functions in discrete survival modeling.
Keywords: Calibration; Discrimination; Discrete survival models; Families of link functions.
DOI: 10.1504/IJDATS.2025.10067711
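
For reference, here are the two standard links that performed best in this study. They are written for the discrete-time hazard λ(t | x), the probability that the event occurs in interval t given survival to t. The interval-specific intercepts γ_t and coefficient vector β are notation assumed here, not taken from the paper.

```latex
\begin{aligned}
\lambda(t \mid \mathbf{x}) &= \Pr(T = t \mid T \ge t, \mathbf{x}), \qquad t = 1, 2, \dots \\
\text{logit:}\quad \log\frac{\lambda(t \mid \mathbf{x})}{1 - \lambda(t \mid \mathbf{x})} &= \gamma_t + \mathbf{x}^{\top}\boldsymbol{\beta} \\
\text{cloglog:}\quad \log\bigl\{-\log\bigl[1 - \lambda(t \mid \mathbf{x})\bigr]\bigr\} &= \gamma_t + \mathbf{x}^{\top}\boldsymbol{\beta}
\end{aligned}
```

The cloglog form is the link implied when an underlying continuous-time proportional hazards model is grouped into intervals. The logit form models the conditional odds of the event in each interval.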

An Empirical Examination of Classification Algorithms and Resampling Strategies for Dealing with Imbalanced Datasets: a Comparative Analysis
by Himani Deshpande, Leena Ragha
Abstract: Imbalanced datasets can lead to biased models and inaccurate predictions, making imbalance a crucial issue to address. This research comprehensively analyses the issues, approaches and evaluation parameters involved in building machine learning models on imbalanced datasets. The literature groups data imbalance handling methods into three broad categories: pre-processing methods, cost-sensitive learning and ensemble methods. Experiments test popular classifiers combined with three pre-processing methods, namely clustered SMOTE, random oversampling and scaled values, on seven standard imbalanced datasets. The results show that the Random Forest classifier with random oversampling performed best on most datasets, with precision between 0.68 and 1, AUC between 0.83 and 1, and prediction accuracy between 76.1% and 99.8%. This study highlights that the choice of evaluation metric and pre-processing method can significantly affect classifier performance.
Keywords: Imbalanced data; Oversampling; Undersampling; Classification; Cost sensitive; Ensemble Learning; Feature weighting; Instance weighting.
DOI: 10.1504/IJDATS.2025.10068244
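
Random oversampling, the pre-processing method that performed best with Random Forest in this study, takes only a few lines to sketch. The sketch assumes NumPy arrays `X` (features) and `y` (class labels). The helper `random_oversample` is hypothetical. It balances every class to the size of the largest one, which is a common choice but not necessarily the authors' setting.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows (sampling with replacement)
    until every class has as many rows as the largest class."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, n in zip(classes, counts):
        if n < target:
            # Indices of this class's rows; draw the shortfall with replacement.
            idx = np.flatnonzero(y == cls)
            extra = rng.choice(idx, size=target - n, replace=True)
            X_parts.append(X[extra])
            y_parts.append(y[extra])
    return np.concatenate(X_parts), np.concatenate(y_parts)
```

Oversampling should be applied only to the training split. Duplicating rows before the train/test split would leak copies of test samples into training. The imbalanced-learn library's `RandomOverSampler` provides a maintained implementation of the same idea.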

Design a Modern Scheme for Machine Learning-Based Detection of Image Forgery
by Emir Kalik, Ayad Adhab
Abstract: The rapid growth and development of information technology have led to numerous methods for digital image forgery. As a result, manipulating digital images for negative or positive purposes has become easy. Advanced forgery methods make it harder to tell whether an image is original or forged, especially with classical detection methods. The field has therefore become a popular research direction. This paper introduces a machine learning approach to digital image forgery detection. The proposed model is trained to distinguish altered images from original ones by examining their essential features. The results show that it achieves superior performance and high accuracy in detecting forgeries in digital images.
Keywords: Convolutional Neural Network; CNN; Deep Reinforcement Learning; DRL; Forgery; Image Detection; Manipulation.
DOI: 10.1504/IJDATS.2026.10068720

Development of G-Causality by Utilising Hybridisation of Bootstrap Method for Assessing Tourism Impacts in Malaysia
by Anton Abdulbasah Kamil, Muhamad Safiih Lola
Abstract: This study develops a new hybrid Bootstrap-Granger model and uses it to examine the direction of causality among short- and long-term non-economic factors in the Malaysian tourism industry. The proposed method was validated on a World Bank dataset of non-economic factors in the tourism industry (tourist arrivals, population, air transport and carbon dioxide emissions). Its effectiveness was assessed by comparing it with the conventional Granger model using statistical tests such as unit root, Johansen cointegration and Granger causality tests. The empirical results show that, compared with the Granger model, the proposed model produced smaller mean square error and root mean square error values on the non-economic factor datasets. The results also show that tourist arrivals and the other determinants are cointegrated. Overall, the proposed model improved the accuracy of Granger causality analysis and produced more robust, precise and accurate results in support of promoting overall economic activity.
Keywords: Bootstrap method; Granger Causality; Hybridization; Tourism Impact and non-economy factors; Malaysia.
DOI: 10.1504/IJDATS.2026.10069162

A Cross-Sectional Analysis of Severe SARS Cases Evolution in a Brazilian Municipality using Data Mining Techniques
by Silvano Júnior, William Oliveira, Luis Neto, Hugo Souza, Yúri Sant’Anna
Abstract: The first severe acute respiratory syndrome (SARS) outbreak occurred in China in 2002, followed by outbreaks of other coronaviruses and variants such as MERS (2012), 2019-nCoV (2019) and Omicron (2021). While data mining (DM) has been widely used for SARS classification and decision-making, most studies overlook socioeconomic factors such as income and education. This study applies the cross-industry standard process for data mining (CRISP-DM) framework and DM techniques to predict the progression of severe SARS cases in Recife, Brazil. Using open datasets, it incorporates attributes related to symptoms, pre-existing conditions and socioeconomic indicators. Three healthcare experts participated in the analysis. The results showed that the apriori algorithm performed best in rule induction, while the decision tree slightly outperformed logistic regression. Notably, correlations emerged between severe case progression and socioeconomic data, underscoring the importance of integrating social determinants into disease classification models. These findings provide insights for improving predictive models and public health strategies.
Keywords: SARS; data mining; machine learning; CRISP-DM.
DOI: 10.1504/IJDATS.2026.10069755

Improving Public Health Outcomes through Accurate UV Index Forecasting: ARIMA and ANN Approach in Songkhla Province
by Korakot Wichitsa-nguan Jetwanna, Orathai Yongseng, Supanan Kongmee, Tanongsak Sukyareak, Wasun Bunyod, Chidchanok Choksuchat, Nuntouchaporn Prateepausanont, Thanathip Limna
Abstract: This research forecasts the UV Index from five weather parameters (temperature, dew point, humidity, wind speed and atmospheric pressure) in Muang District, Songkhla Province, over a period of 1,000 days (from March 6, 2021, to November 30, 2023). It uses a combined ARIMA and ANN model for prediction: the ARIMA model outputs were fed to an ANN to forecast the UV index, yielding high accuracy. Missing data were imputed with median values. The ARIMA model had a MAPE of 0.04% to 26.49%, an MAE of 0.3% to 4.3% and an RMSE of 0.4% to 5.4%. The ANN model achieved an accuracy of 94.2%.
Keywords: UV Index Prediction; ARIMA; Artificial Neural Networks; Weather Parameters; Public Health Outcomes.
DOI: 10.1504/IJDATS.2026.10070269

MUSEM: Combining Multi-UpSampling and Ensemble learning Methods for Effective Financial Fraud Detection
by Asieh Bagheri, Hossein Rahmani, Mohamad Mahdi Yadegar
Abstract: The rise of electronic payments, both online and in person, has coincided with an increase in fraudulent and defaulted transactions, leading to significant financial losses. Researchers have explored various machine learning models for anomaly detection in credit card transactions, but challenges such as overlapping data classes and imbalanced distributions persist. To address these issues, we propose a dual-strategy approach called MUSEM, which integrates multi-upsampling with ensemble learning for improved fraud detection. MUSEM combines seven individual models into a unified framework, offering a more efficient method for identifying fraud. This study also presents a comprehensive review and comparative analysis of machine learning algorithms used in financial fraud detection. Experimental results show a 3% improvement in recall over the individual classifiers, confirming the effectiveness of the ensemble learning paradigm adopted in MUSEM. The findings highlight MUSEM's potential for real-world fraud detection, improving electronic payment security and reducing financial risk.
Keywords: UpSampling techniques; Ensemble learning; Financial Fraud; Machine learning; Majority voting; MUSEM.
DOI: 10.1504/IJDATS.2026.10070295
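
MUSEM's majority-voting step can be illustrated with a minimal hard-voting sketch in plain Python. The function `majority_vote` is hypothetical. Each inner list holds one model's predicted labels, using an assumed encoding of 1 = fraud and 0 = legitimate. The abstract does not say how MUSEM breaks ties, so in this sketch a tie goes to the label encountered first.

```python
from collections import Counter
from typing import Sequence

def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Hard majority voting.

    predictions[m][i] is model m's label for sample i. Counter.most_common
    orders equal counts by first occurrence, so a tie goes to the label
    voted by the earliest model in the list.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined
```

With seven binary classifiers, as in MUSEM, a tie cannot occur, because seven votes cannot split evenly between two labels.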

A Large Language Model-Based Named Entity Recognition Framework for Med-Sig Parsing
by Madeline Chudy, Kewal Mishra, Chun-Kit Ngan
Abstract: Medication signatures (med-sigs) provide essential instructions for medication use and are often written with shorthand and abbreviations. Although a widely accepted list of common abbreviations exists, these shortcuts can lead to medication errors, which are estimated to cause 44,000 to 98,000 hospital deaths annually in the U.S. and to cost between $37.6 billion and $50 billion in healthcare expenses, disability and lost productivity. Standardizing and translating med-sigs across medical facilities is therefore crucial. Natural Language Processing (NLP) and Named Entity Recognition (NER) technologies are key to automating the interpretation of medical prescriptions, breaking complex instructions down into identifiable elements. This paper analyzes state-of-the-art NER med-sig parsing models, evaluates their efficacy and identifies gaps in their application. We propose adaptations and develop a pipeline that uses GPT-4 for NER on med-sigs. On a dataset of 177 med-sigs, our pipeline outperformed nine existing parsing models.
Keywords: natural language processing; named entity recognition; large language models; medical signatura analysis; parsing; medication errors.
DOI: 10.1504/IJDATS.2026.10071471

Application of Generalised Regression Neural Network for Financial Time Series Forecasting: a Comprehensive Comparison with Autoregressive Integrated Moving Average
by Hoang Duc Le, Ke Nghia Nguyen
Abstract: Time series forecasting is highly significant in many fields, including economics, business and finance. Autoregressive Integrated Moving Average (ARIMA) and its variants are well known for their precise and accurate forecasts. Nevertheless, advances in computing power and the development of sophisticated Machine Learning (ML) and Deep Learning (DL) methods have produced new algorithms for time series analysis and prediction. This study investigates whether DL-based forecasting algorithms outperform traditional forecasting approaches. We found that the Generalized Regression Neural Network (GRNN) outperformed ARIMA in forecasting accuracy, with an error margin of less than 5%, and on statistical measures such as MAE, RMSE and MAPE. GRNN also has shorter training times, which is particularly beneficial when frequent transaction predictions are needed.
Keywords: Time Series Forecasting; Machine Learning; Deep Learning; Generalised Regression Neural Network (GRNN); Autoregressive Integrated Moving Average (ARIMA).
DOI: 10.1504/IJDATS.2026.10072162
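
The error measures used to compare GRNN and ARIMA have standard definitions. Below is a minimal sketch in plain Python. `forecast_errors` is a hypothetical helper, and it reports MAPE in percent.

```python
import math
from typing import Sequence

def forecast_errors(actual: Sequence[float], predicted: Sequence[float]) -> dict:
    """Return MAE, RMSE and MAPE (in percent) for paired actual/predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n                 # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)      # root mean squared error
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}
```

RMSE is always at least as large as MAE and penalises large errors more heavily. MAPE is scale-free but undefined when an actual value is zero, which is why the sketch raises an error in that case.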

Better Credit Decisioning through Scorecard Surrogate Models for Machine Learning Algorithms
by Billie Anderson, Naeem Siddiqi, Mark Newman, J. Michael Hardin
Abstract: Over the last several years, the application of machine learning models, often called black-box models, has become a popular research topic in credit scoring. This study illustrates how surrogate models can be used to interpret credit decisions made with black-box models. A framework for using surrogate models in a credit scoring context is applied to explain and interpret well-known machine learning models (e.g., neural networks, forests, gradient boosting and support vector machines). The study uses real-world anonymized consumer bureau data from Equifax to illustrate the degree of interpretability that can be achieved when machine learning models are used to assess the creditworthiness of loan applicants. Its main objective is to show practitioners how surrogate scorecard models can be used to interpret some of the most popular machine learning models in a credit scoring decision-making process.
Keywords: credit scoring; explainable machine learning models; surrogate models.
DOI: 10.1504/IJDATS.2026.10072545

Enhanced Sales Forecasting Through Auto Regression and Cycle-GAN Models
by Arif Hossen, Md Refat Hossain, Mithun Kumar PK.
Abstract: Precise sales forecasting is essential for businesses to manage inventory and allocate resources. However, traditional methods often struggle to capture the complex patterns, seasonality and dynamic nature of sales data, because existing forecasting techniques fail to model the intricate relationships and dependencies within time series data. To address this challenge, we propose a novel method that combines autoregression (AR) and Cycle-GAN (Generative Adversarial Network) models. Autoregression captures linear temporal dependencies, while Cycle-GAN learns non-linear mappings between different domains. Experimental results on real-world sales datasets show that our approach outperforms cutting-edge forecasting methods in accuracy, adaptability and generalisation. The proposed AR-CycleGAN model achieves an accuracy of 98.96%, a precision of 98.16%, a recall of 98.97% and an F1-score of 98.56%.
Keywords: Auto Regression (AR); GAN; Cycle-GAN; Sales Forecasting; Time Series data; Machine Learning (ML); Deep Learning (DL); Business Analytics.
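
The autoregressive half of the AR-CycleGAN model can be sketched as an AR(p) model fitted by ordinary least squares. The sketch assumes NumPy, and `fit_ar` and `forecast_ar` are hypothetical helpers. It does not reproduce the Cycle-GAN component or the way the authors combine the two.

```python
import numpy as np

def fit_ar(series, p):
    """Fit y_t = c + a_1*y_{t-1} + ... + a_p*y_{t-p} by ordinary least squares.

    Returns the coefficient vector [c, a_1, ..., a_p].
    """
    y = np.asarray(series, dtype=float)
    n = len(y)
    if n <= p:
        raise ValueError("series must be longer than the AR order p")
    # Design-matrix row for time t is [1, y_{t-1}, ..., y_{t-p}], t = p..n-1.
    lag_columns = [y[p - k:n - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lag_columns)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

def forecast_ar(series, coef, steps):
    """Recursive multi-step forecast: each prediction is fed back as a lag."""
    history = [float(v) for v in series]
    p = len(coef) - 1
    out = []
    for _ in range(steps):
        lags = history[-1:-p - 1:-1]          # y_{t-1}, ..., y_{t-p}
        nxt = float(coef[0] + np.dot(coef[1:], lags))
        history.append(nxt)
        out.append(nxt)
    return out
```

For example, `forecast_ar(sales, fit_ar(sales, p=7), steps=14)` would fit one week of daily lags and forecast two weeks ahead. Choosing p is itself a modelling decision, commonly made with an information criterion.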