Forthcoming and Online First Articles

International Journal of Data Science (IJDS)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Open Access: Articles marked with the Open Access icon are Online First articles. They are freely available and openly accessible to all, without any restriction other than those stated in their respective CC licenses.

International Journal of Data Science (6 papers in press)

Regular Issues

  • Prediction of Customer Churn Risk with Advanced Machine Learning Methods
    by Oguzhan Akan, Abhishek Verma, Sonika Sharma 
    Abstract: Customer churn risk prediction is an important area of research as it directly impacts the revenue stream of businesses. The ability to predict customer churn allows businesses to devise better strategies to retain existing customers. In this research we perform a comprehensive comparison of feature selection methods, upsampling methods, and machine learning methods on the customer churn risk dataset. (i) Our research compares likelihood-based, tree-based, and layer-based machine learning methods on the churn dataset. (ii) Models built on the churn dataset without upsampling performed better than those built with oversampling methods. However, SMOTE and ADASYN helped stabilize model performance. (iii) The models built on the ADASYN dataset were slightly better than their SMOTE counterparts. (iv) XGBoost and Deep Cascading Forest combined with XGBoost were consistently better across all metrics compared to other methods. (v) Information Value analysis performed better than PCA. In particular, the IVR DCFX model has the best AUROC score, at 89.1%.
    Keywords: Customer Churn; Deep Neural Networks; Deep Cascading Forest; SMOTE; ADASYN.
    DOI: 10.1504/IJDS.2024.10064744
     
  • Self-Evolving Data Collection Through Analytics and Business Intelligence to Predict the Price of Cryptocurrency
    by Adam Moyer, William A. Young II, Timothy J. Haase 
    Abstract: This article presents the Self-Evolving Data Collection Engine through Analytics and Business Intelligence (SEDCABI) for predicting Bitcoin prices. Traditionally, models use either structured or unstructured data alone, limiting effectiveness. This research pioneers using both data types. SEDCABI harnesses analytics and BI to extract insights from structured historical price and market data. It also incorporates unstructured social media sentiment and news to capture Bitcoin perceptions. Experiments show integrating both data types significantly improves prediction accuracy. SEDCABI continuously adapts to the dynamic crypto market. The plug-in prediction module enables customization. Overall, SEDCABI offers robust Bitcoin price predictions by combining structured and unstructured data. This contributes to cryptocurrency prediction research with an innovative approach to informed decision-making.
    Keywords: SEDCABI; Prediction; Bitcoin; Cryptocurrency; Text Mining; Analytics; Business Intelligence; Unstructured Data; Sentiment; Price.
    DOI: 10.1504/IJDS.2024.10064877
     
  • Comparison and Database Performance Optimisation Strategies Based on NSGA-II Genetic Algorithm: MySQL and OpenGauss
    by Ming Tang, Lincheng Qi, Sibo Bi, Xinyun Cheng, Shijie Zhang 
    Abstract: In response to the lack of dynamic adjustment and optimization capabilities for real-time environmental changes in database performance optimization strategies, as well as poor query throughput and response time performance, this paper adopted NSGA-II (Non-dominated Sorting Genetic Algorithm II) to study performance optimization of MySQL (My Structured Query Language) and OpenGauss databases. Firstly, it defined three objective functions and corresponding constraints for database query response time, query throughput, and query resource utilization, and calculated the fitness of each individual and the crowding distance of each layer. Then, the tournament rotation method can be used to output parents with high fitness, and the crossover and mutation probabilities can be set. Finally, the optimal parameter configuration of the database can be output. The experiment was based on the TPC-DS (Transaction Processing Performance Council Decision Support Benchmark) dataset and compared the performance of MySQL and OpenGauss databases under different parameter configurations. The experimental results show that after optimisation by the NSGA-II genetic algorithm, MySQL and OpenGauss databases have certain improvements in query throughput, query response time, and query resource utilisation. Moreover, the optimisation effect on the MySQL database was as high as 90.30%, which is more significant than that on the OpenGauss database.
    Keywords: Database Performance Optimization; MySQL and OpenGauss; Non-dominated Sorting Genetic Algorithm II; Query Response Time; Dynamic Adjustment Capability.
    DOI: 10.1504/IJDS.2024.10065423
     
  • Application of Weaving Based on Log Files in Database Systems
    by Feng Chen, Bin Chen, Huan Xu, Qiuyong Yang, Xiaowen Zeng 
    Abstract: Aspect oriented database (AODB) systems represent a new framework for database systems, and theoretical research and experiments are currently underway. In order to improve the weaving efficiency of aspect oriented programming (AOP), this article focused on the weaving of log files in AODB. This article introduced AOP technology in AODB and compared AOP technology with object oriented programming (OOP) technology. The incremental weaving method was selected for log weaving, and the changes in weaving state were calculated. The weaving time of the incremental weaving and complete re-weaving mechanisms was compared. For the normal operation and abnormal restart of AODB systems, quick repair methods for aspect weaving state were provided, and simulation experiments were conducted to verify the effectiveness of this fast repair mechanism. The research results indicated that compared to OOP technology, AOP technology can be better applied in log weaving research. When notification modifications and connection point changes occur, incremental weaving has shorter weaving time and higher weaving efficiency. The weaving method based on log files can effectively improve the weaving efficiency of AODB and has certain application value.
    Keywords: Log Weaving; Aspect Oriented Database; Aspect Oriented Programming; Incremental Weaving; Weaving State Recovery; Intelligent Decision-making Technology.
    DOI: 10.1504/IJDS.2024.10065629
     
  • A Study of MySQL Protocol-based Database Proxy Approval System for Fortress Machine
    by Xian Zhang, Xinhui Luo, Dong Yin, Taiguo Qu, Hao Li 
    Abstract: With the increase of enterprise informatization, database security and compliance operation management have become increasingly important. Therefore, it is essential to design an efficient database proxy approval system. In this paper, we develop a database proxy approval system based on the MySQL protocol for fortress machines, which provides a real-time customized configuration scheme for high-risk commands, designs a real-time approval process for six types of high-risk commands, and creates a simple and efficient matching algorithm for high-risk commands. We designed a large number of experiments to test the system's connection success rate, operation stability, response time, CPU resource consumption, matching algorithm performance, and other aspects. The experimental results show that this database proxy approval system has good configuration flexibility, high accuracy, and good time performance. This system has a wide range of applications in electric power, finance, petroleum, and other fields.
    Keywords: Fortress Machine; MySQL Protocol; Database Proxy; Approval System; Database Security.
    DOI: 10.1504/IJDS.2024.10066165
     
  • Intelligent Factory Perception Ability Using Distributed Knowledge Graph
    by Wenjuan Wang, Donghui Shen, Anyin Bao, Jianming Shao, Shunkai Sun 
    Abstract: Traditional research often faces the problem of information isolation from different departments, systems, and data sources, which leads to the inability to obtain comprehensive and cross-domain data in the decision-making process, limiting the comprehensive understanding of the entire intelligent factory ecosystem. The Proximal Policy Optimization (PPO) algorithm was introduced, combined with the reasoning ability of knowledge graphs, to provide support for complex decision-making problems in intelligent factories, making the decision-making process more intelligent and accurate. Intelligent factory data from different departments were collected, and distributed knowledge graphs were constructed. Semantic labels for entities and relationships were defined, and data from different data sources were mapped into the semantic model of the knowledge graph. Multi-layer perceptrons were used to establish decision-making networks and update policy network parameters through PPO. The experimental results showed that the average fault prediction accuracy of PPO combined with distributed knowledge graph reached 96.1%, and the fluctuation of fault prediction accuracy within 12 months was only 0.1%.
    Keywords: Intelligent Factory; Perception Ability; Distributed Knowledge Graph; Fault Prediction; Proximal Policy Optimization.
    DOI: 10.1504/IJDS.2024.10066267