Forthcoming and Online First Articles

International Journal of Data Science (IJDS)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

International Journal of Data Science (6 papers in press)

Regular Issues

  • Prediction of Customer Churn Risk with Advanced Machine Learning Methods
    by Oguzhan Akan, Abhishek Verma, Sonika Sharma 
    Abstract: Customer churn risk prediction is an important area of research as it directly impacts the revenue stream of businesses. The ability to predict customer churn allows businesses to come up with better strategies to retain existing customers. In this research we perform a comprehensive comparison of feature selection methods, upsampling methods, and machine learning methods on the customer churn risk dataset. (i) Our research compares likelihood-based, tree-based, and layer-based machine learning methods on the churn dataset. (ii) Models built on the churn dataset without upsampling performed better than those built with oversampling methods. However, SMOTE and ADASYN helped stabilize model performance. (iii) The models built on the ADASYN dataset were slightly better than their SMOTE counterparts. (iv) XGBoost and Deep Cascading Forest combined with XGBoost were consistently better across all metrics compared to other methods. (v) Information Value analysis performed better than PCA. In particular, the IVR DCFX model has the best AUROC score, at 89.1%.
    Keywords: Customer Churn; Deep Neural Networks; Deep Cascading Forest; SMOTE; ADASYN.
    DOI: 10.1504/IJDS.2024.10064744
     
  • Self-Evolving Data Collection Through Analytics and Business Intelligence to Predict the Price of Cryptocurrency
    by Adam Moyer, William A. Young II, Timothy J. Haase 
    Abstract: This article presents the Self-Evolving Data Collection Engine through Analytics and Business Intelligence (SEDCABI) for predicting Bitcoin prices. Traditionally, models use either structured or unstructured data alone, limiting effectiveness. This research pioneers using both data types. SEDCABI harnesses analytics and BI to extract insights from structured historical price and market data. It also incorporates unstructured social media sentiment and news to capture Bitcoin perceptions. Experiments show integrating both data types significantly improves prediction accuracy. SEDCABI continuously adapts to the dynamic crypto market. The plug-in prediction module enables customization. Overall, SEDCABI offers robust Bitcoin price predictions by combining structured and unstructured data. This contributes to cryptocurrency prediction research with an innovative approach to informed decision-making.
    Keywords: SEDCABI; Prediction; Bitcoin; Cryptocurrency; Text Mining; Analytics; Business Intelligence; Unstructured Data; Sentiment; Price.
    DOI: 10.1504/IJDS.2024.10064877
     
  • A Study of MySQL Protocol-based Database Proxy Approval System for Fortress Machine
    by Xian Zhang, Xinhui Luo, Dong Yin, Taiguo Qu, Hao Li 
    Abstract: With the increase of enterprise informatization, database security and compliance operation management have become increasingly important. Therefore, it is essential to design an efficient database proxy approval system. In this paper, we develop a database proxy approval system based on the MySQL protocol for fortress machines, which provides a real-time customized configuration scheme for high-risk commands, designs a real-time approval process for six types of high-risk commands, and creates a simple and efficient matching algorithm for high-risk commands. We designed a large number of experiments to test the system's connection success rate, operation stability, response time, CPU resource consumption, matching algorithm performance, and other aspects. The experimental results show that this database proxy approval system has good configuration flexibility, high accuracy, and good time performance. This system has a wide range of applications in electric power, finance, petroleum, and other fields.
    Keywords: Fortress Machine; MySQL Protocol; Database Proxy; Approval System; Database Security.
    DOI: 10.1504/IJDS.2024.10066165
     
  • Mobile Target Defence Against IoT-DDoS Attacks
    by Liping Wu, Xuehua Zhu 
    Abstract: This study analyses the mobile target defence method and feature extraction process based on multi-source information fusion technology (MSIFT), and introduces a feature level fusion (FLF) method for DDoS attack detection that optimises a backpropagation neural network (BPNN) with a genetic algorithm. The models with 9 nodes and 11 nodes had the best learning performance, with learning rates of 0.37 and 0.15. When the intensity of DDoS attacks was low, the prediction accuracy of the proposed method was about 94%. The actual value was usually small, with the 10th group having the highest actual value, close to 800, and the 19th group having the lowest actual value, about 130. Introducing decision level fusion of DDoS attacks based on D-S evidence fusion can further improve the accuracy of attack detection. This study has made significant progress in improving the efficiency and accuracy of mobile target defence against DDoS attacks in the Internet of Things.
    Keywords: Internet of Things; DDoS attacks; Target defense; Multi source information; Genetic algorithm.
    DOI: 10.1504/IJDS.2025.10066963
     
  • A Data Value-Driven Collaborative Data Collection Method in Complex Multi-Constraint Environments
    by LinLiang Zhang, LianShan Yan, ZhiSheng Liu, Shuo Li, RuiFang Du, ZhiGuo Hu
    Abstract: Data collection is a foundational task in mobile crowd sensing. However, existing data collection methods prioritise quantity, neglecting heterogeneity, cooperation, energy efficiency, and collision avoidance, causing low multi-agent efficiency in complex scenarios. To address this issue, this paper integrates multi-agent reinforcement learning and deep learning to propose the CS_MCE method. The CS_MCE method, applied to unmanned aerial vehicle (UAV) collaborative data collection scenarios, utilises deep neural networks to solve representation problems in vast state-action spaces and provides intelligent decision-making capabilities. In various experimental environments with different data values, experiments comparing CS_MCE with the MADDPG and IL-DDPG algorithms in terms of reward values, data quality, energy efficiency, and the number of collisions showed that the quality of the data collected by CS_MCE increased by 56 times, and energy efficiency improved by more than 60%, demonstrating the efficiency and stability of the CS_MCE method.
    Keywords: Mobile Crowd-sensing; Data Collection; Heterogeneous Data; Unmanned Vehicles; Deep Reinforcement Learning.
    DOI: 10.1504/IJDS.2025.10067169
     
  • A Commensurate Univariate Variable Ranking Method for Classification
    by Nuo Xu, Xuan Huang, Thanh Nguyen, Jake Yue Chen 
    Abstract: To apply a variable ranking method for feature selection in classification, the notion of commensurateness is necessitated by the presence of different types of independent variables in a dataset. A commensurate ranking method is one that produces consistent and comparable ranking results among independent variables of different types, such as numeric vs categorical and discrete vs continuous. We invent a ranking method named Condition Empirical Expectation (CEE) and demonstrate that it is the most commensurate among several representative ranking methods. Further, it has the highest statistical power as a test of independence when the categorical dependent variable is imbalanced. These properties make CEE uniquely suitable for fast feature selection for any dataset, especially those with high dimensionality and mixed types of variables. Its usage is demonstrated with a case study in facilitating preprocessing for classification.
    Keywords: variable types; variable ranking; variable relevance; commensurate; statistical dependence.
    DOI: 10.1504/IJDS.2025.10067405