Forthcoming and Online First Articles

International Journal of Data Mining and Bioinformatics (IJDMB)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

Register for our alerting service, which notifies you by email when new issues are published online.

International Journal of Data Mining and Bioinformatics (31 papers in press)

Regular Issues

  • Plasma proteins related to the state of depression: a case-control study based on proteomics data of pregnant women.
    by Yuhao Feng, Jinman Zhang, Zengyue Zheng, Chenyu Xing, Min Li, Guanghong Yan, Ping Chen, Dingyun You, Ying Wu 
    Abstract: Prenatal and postpartum emotional changes in women in early pregnancy are of great significance to the physical and mental health of mothers and infants. We conducted this study to identify feature proteins related to maternal depression. Boruta algorithm (BA), recursive partition algorithm (RPA), regularised random forest (RRF) algorithm, least absolute shrinkage and selection operator (LASSO) algorithm, and genetic algorithm (GA) were used to select features. Extreme gradient boosting (XGBoost), back propagation neural network (BPNN), support vector machine (SVM), random forest (RF), and logistic regression (LR) were selected to construct the predictive models. All models showed good predictive performance, with the mean AUC (the area under the receiver operating characteristic curve) exceeding 80%. The selected features will provide clues to prevent depression in pregnant women and improve the physical and mental health of mothers and babies.
    Keywords: pregnant women; depression; proteomics; biomarkers; feature selection.
    DOI: 10.1504/IJDMB.2025.10064226
  • Cross-modal imputation and gated GCN for predicting miRNA-disease associations (CIGGNET)
    by Yan Chen, Zhenjie Hou, Wenguang Zhang, Han Li, Haibin Yao 
    Abstract: MicroRNA (miRNA) is a short-chain non-coding RNA molecule encoded by endogenous genes. Many miRNAs related to complex diseases have now been found, which helps in further exploring the molecular mechanisms of disease pathogenesis. We propose an algorithm named CIGGNET for predicting miRNA-disease associations based on cross-modal data imputation and a gated graph convolution network. First, CIGGNET applies a cross-modal data imputation operation to the miRNA-disease association matrix to obtain the filled association matrix. Second, CIGGNET integrates miRNA-disease heterogeneous networks, extracts features of miRNAs and diseases using a random walk algorithm, and learns miRNA and disease embeddings using a graph convolutional network. Third, CIGGNET uses a gating operation to select the appropriate convolution layer. The control gate adaptively outputs suitable convolution layers based on the similarity of different convolution layers and scores unobserved associations. The mean AUC of CIGGNET is 0.9423 in 100 five-fold cross-validations.
    Keywords: miRNA; disease; MiRNA-disease association prediction; cross-modal data imputation; gated graph convolution network.
    DOI: 10.1504/IJDMB.2025.10064546

Special Issue on: AI aided Smart Big Data Applications

  • Digital architectural decoration design and production based on computer image
    by Chan Zhou 
    Abstract: The application of computer image digitisation has realised the transformation of people's production and lifestyle, and also promoted the development of the construction industry. This article aims to realise the research on architectural decoration design and production under computer network environment and promote the ecological development of indoor and outdoor design in the construction industry. This article proposes to use virtual reality technology in image digitisation to guide architectural decoration design research. In the comparative analysis of the weight of architectural decoration elements, among the calculated weights of secondary elements, the spatial function has the largest weight, which is 0.2155, and the landscape has the smallest weight, which is 0.0113. Among the three-level unit weights, the service area has the largest weight, which is 0.0976, and the fence frame has the smallest weight, which is 0.0119.
    Keywords: architectural decoration design; image digitisation; computer technology; virtual reality technology.
    DOI: 10.1504/IJDMB.2024.10060066
  • Evaluation on stock market forecasting framework for AI and embedded real-time system
    by Yu Lin 
    Abstract: Since its birth, the stock market has received widespread attention from many scholars and investors. However, there are many factors that affect stock prices, including the company's own internal factors and the impact of external policies. The extent and manner of fundamental impacts also vary, making stock price predictions very difficult. Based on this, this article first introduces the research significance of the stock market prediction framework, and then conducts academic research and analysis on two key sentences of stock market prediction and artificial intelligence in stock market prediction. Then this article proposes a constructive algorithm theory, and finally conducts a simulation comparison experiment and summarises and discusses the experiment. Research results show that the neural network prediction method is more effective in stock market prediction; the minimum training rate is generally 0.9; the agency's expected dilution rate and the published stock market dilution rate are both around 6%.
    Keywords: stock market forecast; embedded real-time system; artificial intelligence; back propagation neural network; dilution rate.
    DOI: 10.1504/IJDMB.2024.10060068
  • Design of data mining system for sports training biochemical indicators based on artificial intelligence and association rules
    by Dongbiao Liu 
    Abstract: Physiological indicators are an important basis for reflecting the physiological health status of the human body and play an important role in medical practice. Association rules have also been one of the important research hotspots in recent years. This study aims to create a data mining system of association rules and artificial intelligence in biochemical indicators of sports training. This article uses Markov logic for network creation and system training, and tests whether the Markov logic network can be associated with the training system. The results show that the accuracy and recall rate obtained are about 90%, which shows that it is feasible to establish biochemical indicators of sports training based on Markov logic network, and the system has universal, guiding and constructive significance, ensuring that the construction of training system indicators will not go in the wrong direction.
    Keywords: artificial intelligence; association rules; data mining; biochemical indicators.
    DOI: 10.1504/IJDMB.2024.10060195
  • Application of digital twin virtual design and BIM technology in intelligent building image processing
    by Fengyi Han 
    Abstract: Intelligent digital virtual technology has become an indispensable part of modern construction, but there are also some problems in its practical application. Therefore, it is necessary to strengthen the design of intelligent building image processing systems from many aspects. Starting from image digital processing methods, this paper studies the digital twin virtual design scene construction method and related algorithms, converts the original image into a colour digital image through a greyscale algorithm, and then combines morphological knowledge and feature point extraction methods to complete the construction of a three-dimensional virtual environment. Finally, through the comparison of traditional image processing effects with smart building images based on digital twins and BIM technology, the results show that the optimised image processing results have higher clarity, sharper contrast, and a sensitivity increased by 5.84%, presenting better visual effects and solving the risk of misjudgement caused by inaccurate image recognition.
    Keywords: digital twins; building information modelling; BIM; intelligent buildings; electronic imaging.
    DOI: 10.1504/IJDMB.2024.10063209
  • Urban public space environment design based on intelligent algorithm and fuzzy control
    by Ting Song, Yansong Li 
    Abstract: With the development of urban construction, its spatial evolution is also influenced by behavioural actors such as enterprises, residents, and environmental factors, leading to some decision-making behaviours that are not conducive to urban public space and environmental design. At the same time, some cities are vulnerable to various factors such as distance factors, transportation factors, and human psychological factors during the construction of public areas, resulting in a decline in the quality of urban human settlements. Urban public space is the guarantee of urban life. For this, in order to standardise urban public space and improve the quality of urban living environment, the standardisation of the environment of urban public space is required. The rapid development of intelligent algorithms and fuzzy control provides technical support for the environmental design of urban public spaces. Through the modelling of intelligent algorithms and the construction of fuzzy space, it can meet the diverse.
    Keywords: urban public space; environmental design; intelligent algorithm; fuzzy control.
    DOI: 10.1504/IJDMB.2024.10060067
  • Research on low voltage current transformer power measurement technology in the context of cloud computing
    by Chao Yan, Peng Tao, Hongxi Wang, Chunrui Li, Yushuai Zhang 
    Abstract: As IoT has developed rapidly in recent years, the application of cloud computing in many fields has become possible. In this paper, we take low-voltage current transformers in power systems as the research object and propose a TCN-BI-GRU power measurement method that incorporates the signal characteristics based on the transformer input and output. Firstly, the basic signal enhancement extraction of input and output is completed by using EMD and correlation coefficients. Secondly, multi-dimensional feature extraction is completed to improve the data performance according to the established TCN network. Finally, the power prediction is completed by using BI-GRU, and the results show that the RMSE of this framework is 5.69, significantly lower than that of other methods. In the laboratory test, after the device is subjected to strong disturbance, its correlation coefficient feature is strongly affected, leading to a large deviation in the prediction, which provides a new idea for future intelligent prediction.
    Keywords: cloud computing; low voltage current transformer; power prediction; empirical mode decomposition; EMD; gated recurrent unit; GRU.
    DOI: 10.1504/IJDMB.2024.10061059
  • Application of AI intelligent technology in natural resource planning and management
    by Hui Cheng 
    Abstract: This article studies the application of artificial intelligence (AI) technology in natural resource (NR) planning and management. This article first introduces the background of NR and AI intelligent technology, then reviews and summarises the academic research on NR planning management and AI intelligent technology. Then, an algorithm model based on a multi-objective intelligent planning algorithm is established. Finally, simulation experiments are conducted, and a summary and discussion of the experiments are provided. The experimental results show that the average efficiency value of the four stages of NR planning and management before use is 5.25, and the average efficiency value of the four stages of NR planning and management after use is 7. The difference in the average efficiency value before and after use is 1.75. It can be seen that the use of AI intelligent technology can effectively improve the efficiency of natural resource planning and management.
    Keywords: natural resources; planning management; AI intelligence technology; resource management; multiple target.
    DOI: 10.1504/IJDMB.2024.10060785
  • Computer aided translation technology based on edge computing intelligent algorithm
    by Guolan Yang, Weina Xu 
    Abstract: This paper presents research on computer-aided translation technology based on an edge computing intelligent algorithm. In the K-means computer edge algorithm, it analyses the traditional way of average resource allocation and the way of virtual machine allocation. In the process of online solution, we have a more detailed understanding of the data information at the edge, and also avoid the connection relationship between network users and the platform, which has a certain impact on the internal operation efficiency of the system. The network user group is divided into several different types through the K-means computer algorithm, and various information resources are counted according to their own characteristics. Computer-aided translation technology can significantly improve the quality of translation, improve the translation efficiency, and reduce the translation cost.
    Keywords: K-means system; translation technology; edge algorithm; base station; KDSAA algorithm; computer-assisted.
    DOI: 10.1504/IJDMB.2024.10062900
  • Design of an intelligent financial sharing platform driven by digital economy and its role in optimising accounting transformation production
    by Yun Ye 
    Abstract: With the expansion of business scope, the environment faced by enterprises has also changed, and competition is becoming increasingly fierce. Traditional financial systems are increasingly difficult to handle complex tasks and predict potential financial risks. In the context of the digital economy era, the booming financial sharing services have reduced labour costs and improved operational efficiency. This paper designs and implements an intelligent financial sharing platform, establishes a fund payment risk early warning model based on an improved support vector machine algorithm, and tests it on the Financial Distress Prediction dataset. The experimental results show that the effectiveness of using F2 score and AUC evaluation methods can reach 0.9484 and 0.9023, respectively. After using this system, the average financial processing time per order decreases by 43%, and the overall financial processing time decreases by 27%. Finally, this paper discusses the role of intelligent financial sharing platform in accounting transformation and optimisation of production.
    Keywords: digital economy; financial sharing; accounting transformation; production optimisation; SMOTE-SVM.
    DOI: 10.1504/IJDMB.2024.10061580
  • Educational countermeasures of different learners in virtual learning community based on artificial intelligence
    by Xiangning Deng 
    Abstract: In order to reduce the challenges encountered by learners and educators in engaging in educational activities, this paper classifies learners' roles in virtual learning communities, and explores the role of behaviour characteristics and their positions in collaborative knowledge construction networks in promoting the process of knowledge construction. This study begins with an analysis of the relationship structure among learners in the virtual learning community and then applies the FCM algorithm to arrange learners into various dimensional combinations and create distinct learning communities. The test results demonstrate that the FCM method performs consistently during the clustering process, with fewer performance oscillations and good node aggregation; the ARI value of the model is up to 0.90. It is found that they play an important role in the social interaction of learners' virtual learning community, which plays a certain role in promoting the development of artificial intelligence.
    Keywords: big data; FCM algorithm; social relations; virtual learning community; VLC.
    DOI: 10.1504/IJDMB.2024.10061560
  • Dual network control system for bottom hole throttling pressure control based on RBF with big data computing
    by Yanghou Chen 
    Abstract: In the context of smart city development, the managed pressure drilling (MPD) process faces many uncertainties, and the characteristics of the process are complex and require accurate wellbore pressure control. However, this process runs the risk of introducing un-modelled dynamics into the system. To address this problem, this paper employs neural network control techniques to construct a dual-network system for throttle pressure control; the design encompasses both the controller and identifier components. The radial basis function (RBF) network and proportional features are connected in parallel in the controller structure, and the RBF network learning algorithm is used to train the identifier structure. The simulation results show that the actual wellbore pressure can quickly track the reference pressure value when the pressure setpoint changes. In addition, the controller based on neural network realises effective control, which enables the system to track the input target quickly and achieve stable convergence.
    Keywords: controller; identifier; MPD; neural network; radial basis function; RBF.
    DOI: 10.1504/IJDMB.2024.10061267
  • Natural language processing-based machine learning psychological emotion analysis method
    by Yang Zhao 
    Abstract: To achieve psychological and emotional analysis of massive internet chats, researchers have used statistical methods, machine learning, and neural networks to analyse the dynamic tendencies of texts dynamically. For long readers, the author first compares and explores the differences between the two psychoanalysis algorithms based on the emotion dictionary and machine learning for simple sentences, then studies the expansion algorithm of the emotion dictionary, and finally proposes an extended text psychoanalysis algorithm based on conditional random field. According to the experimental results, the mental dictionary's accuracy, recall, and F-score based on the cognitive understanding of each additional ten words were calculated. The optimisation decreased, and the memory and F-score improved. An F-value greater than 1, which is the most effective indicator for evaluating the effectiveness of a mental analysis problem, can better demonstrate that the algorithm is adaptive in the literature dictionary. It has been proven that this scheme can achieve good results in analysing emotional tendencies and has higher efficiency than ordinary weight-based psychological sentiment analysis algorithms.
    Keywords: emotion dictionary; psychological emotion analysis; conditional random field.
    DOI: 10.1504/IJDMB.2024.10061757
  • An empirical study on construction emergency disaster management and risk assessment in shield tunnel construction project with big data analysis
    by Liyu Lu, Meiling Ji, Xi Wen, Yong Xiang 
    Abstract: Emergency disaster management presents substantial risks and obstacles to shield tunnel building projects, particularly in the event of water leakage accidents. Contemporary water leak detection is critical for guaranteeing safety by reducing the likelihood of disasters and the severity of any resulting damages. However, it can be difficult. Deep learning models can analyse images taken inside the tunnel to look for signs of water damage. This study introduces a unique strategy that employs deep learning techniques, generative adversarial networks (GAN) with long short-term memory (LSTM) for water leakage detection in shield tunnel construction (WLD-STC), to conduct classification and prediction tasks on the massive image dataset. The results demonstrate that for identifying and analysing water leakage episodes during shield tunnel construction, the WLD-STC strategy using LSTM-based GAN networks outperformed other methods, particularly on huge data.
    Keywords: disaster management; shield tunnel construction; STC; water leakage detection; big data; deep learning; generative adversarial networks; GAN; long short-term memory; LSTM.
    DOI: 10.1504/IJDMB.2024.10061756
  • Design of intelligent financial sharing platform driven by consensus mechanism under mobile edge computing and accounting transformation
    by Qiang Li 
    Abstract: The intelligent financial sharing platform in the online realm is capable of collecting, storing, processing, analysing and sharing financial data through the integration of AI and big data processing technologies. However, as data volume grows exponentially, the cost of financial data storage and processing increases, and the asset accounting and financial profit data sharing analysis structure in financial sharing platforms is inadequate. To address the issue of data security sharing in the intelligent financial digital sharing platform, this paper proposes a data-sharing framework based on blockchain and edge computing. Building upon this framework, a non-separable task distribution algorithm based on data sharing is developed, which employs multiple nodes for cooperative data storage, reducing the pressure on the central server for data storage and solving the problem of non-separable task distribution. Multiple sets of comparative experiments confirm the proposed scheme has good feasibility in improving algorithm performance and reducing energy consumption and latency.
    Keywords: mobile edge computing; intelligent finance; data sharing; blockchain; non-separable task.
    DOI: 10.1504/IJDMB.2024.10061501
  • Human resource management and organisation decision optimisation based on data mining
    by Mianmin Zeng 
    Abstract: The utilisation of big data presents significant opportunities for businesses to create value and gain a competitive edge. This capability enables firms to anticipate and uncover information quickly and intelligently. The author introduces a human resource scheduling optimisation strategy using a parallel network fusion structure model. The author's approach involves designing a set of network structures based on parallel networks and streaming media, enabling the macro implementation of the enterprise parallel network fusion structure. Furthermore, the author proposes a human resource scheduling optimisation method based on a parallel deep learning network fusion structure. It combines convolutional neural networks and transformer networks to fuse streaming media features, thereby achieving comprehensive identification of the effectiveness of the current human resource scheduling in enterprises. The result shows that the macro and deep learning methods achieve a recognition rate of 87.53%, making it feasible to assess the current state of human resource scheduling in enterprises.
    Keywords: big data analysis; human resource; enterprise management; parallel network; scheduling optimisation.
    DOI: 10.1504/IJDMB.2024.10062629
  • Access controllable multi-blockchain platform for enterprise R&D data management
    by Yongxuan Zhao, Yingfeng Zhang 
    Abstract: In the era of big data, enterprises have accumulated a large amount of research and development data. Effective management of their precipitated data and safe sharing of data can improve the collaboration efficiency of research and development personnel, which has become the top priority of enterprise development. This paper proposes to use blockchain technology to assist the collaboration efficiency of enterprise R&D personnel. Firstly, the multi-chain blockchain platform is used to realise the data sharing of internal data of enterprise R&D data department, project internal data and enterprise data centre, and then the process of construction of multi-chain structure and data sharing is analysed. Finally, searchable encryption was introduced to achieve data retrieval and secure sharing, improving the collaboration efficiency of enterprise research and development personnel and maximising the value of data assets. Through the experimental verification, the multi-chain structure improves the collaboration efficiency of researchers and data security sharing.
    Keywords: enterprise R&D data; multi-chain blockchain; searchable encryption.
    DOI: 10.1504/IJDMB.2024.10062712
  • Integrating big data collaboration models: advancements in health security and infectious disease early warning systems
    by Jiexuan Cui, Ye Deng, Qian Hao 
    Abstract: In order to further improve the public health assurance system and the infectious diseases early warning system to give play to their positive roles and enhance their collaborative capacity, this paper, based on the big and thick data analytics technology, designs a 'rolling-type' data synergy model. This model covers districts and counties, municipalities, provinces, and the country. It forms a data blockchain for the public health assurance system and enables high sharing of data from existing system platforms such as the infectious diseases early warning system, the hospital medical record management system, the public health data management system, and the health big and thick data management system. Additionally, it realises prevention, control and early warning by utilising data mining and synergy technologies, and ideally solves problems of traditional public health assurance system platforms such as excessive pressure on the 'central node', poor data tamper-proofing capacity, low transmission efficiency of big and thick data, bad timeliness of emergency response, and so on. The realisation of this technology can greatly improve the application and analytics of big and thick data and further enhance the public health assurance capacity.
    Keywords: big and thick data analytics; blockchain; public health; early warning model; collaborative model.
    DOI: 10.1504/IJDMB.2024.10064177

Special Issue on: Empowering the Future Generation of Data Mining and Knowledge Discovery in Bioinformatics

  • A novel intelligent-based intrusion detection and prevention system in the cloud using deep learning with meta-heuristic strategy
    by Srilatha Doddi, Thillaiarasu N 
    Abstract: Cloud computing serves diverse options for end-users to minimise costs, and services are easily accessible through online platforms. While the users access the services remotely, the attackers launch cyber-attacks to disrupt the services. Cloud security analysts treat the security of the cloud as a potential area of research to minimise the impacts of abnormal behaviour. One of the potential solutions to detect attacks is the development of the next-generation intrusion detection and prevention system (IDPS). Hence, this paper proposes an efficient IDPS using a hybridised model known as hybrid firebug-squirrel swarm algorithm-based ensemble classifiers (HF-SSA-EC). Initially, the NSL-KDD cup 1999 dataset is considered for experimental analysis. The efficient features are extracted via restricted Boltzmann machines (RBM) layers of the deep belief network (DBN) model. The extracted features are submitted to the ensemble classifiers (ECs), which use naive Bayes (NB), support vector machines (SVM), deep neural networks (DNN), and recurrent neural networks (RNN) for identifying the intrusions. EC parameter optimisation using a hybridised HF-SSA meta-heuristic improves performance. Finally, the prevention model eliminates malicious nodes from detected intrusions. Meta-heuristic clustering is used in the preventative model. The experimental results reveal that the recommended IDPS outperforms existing models.
    Keywords: intrusion detection and prevention system; IDPS; cloud computing; restricted Boltzmann machines; RBM; deep feature extraction; firebug swarm optimisation; FSO; squirrel search algorithm.
    DOI: 10.1504/IJDMB.2025.10062482
  • Metaheuristic gene regulatory networks inference using discrete crow search algorithm and quantitative association rules
    by Makhlouf Ledmi, Mohammed El Habib Souidi, Aboubekeur Hamdi-Cherif, Abdeldjalil Ledmi, Hichem Haouassi, Chafia Kara-Mohamed 
    Abstract: Gene regulatory network (GRN) inference has emerged as a valuable tool for detecting irregularities in cell regulation. Association rule mining (ARM) encompasses specific data mining methods capable of inferring unknown associations between genes. In response to the scarcity of ARM-based GRN inference, a novel metaheuristic algorithm, DCSA-QAR, is presented. This algorithm infers quantitative association rules by discretising the crow search algorithm. A first series of experiments involved comparison with five metaheuristic algorithms on six datasets. The results showed that, for Co-citation and YeastNet datasets, our algorithm was first in precision (100%), specificity (100%) and score (3.75). A second series of experiments involved nine information-theoretic algorithms through the DREAM3 and SOS networks. The average results on the DREAM3 datasets are offset by the results on the real SOS datasets, where the algorithm was best in accuracy and true positives. As an overall appraisal, DCSA-QAR can be considered a good candidate for ARM-based metaheuristic GRN inference.
    Keywords: artificial intelligence; bioinformatics; gene regulatory networks; GRNs; data mining; soft computing; mining association rules.
    DOI: 10.1504/IJDMB.2025.10062651

Special Issue on: New Applications of Computational Biology and Bioinformatics

  • Spearman dependence function-based goodness-of-fit test for the gene's relation
    by Selim Orhun Susam, Burcu Hudaverdi 
    Abstract: A gene network represents the relationship between different groups of genes with various functions, aiming to depict how genes collaborate and influence each other’s activities within a biological system. This relationship can be effectively explained using copulas. Therefore, it is crucial to determine which copula best fits the gene data and provides the most accurate explanation of the relationships between gene groups. In this study, our objective is to introduce a Spearman dependence function-based goodness-of-fit test using Bernstein polynomial approximation. We apply this test to identify a copula model that can effectively explain the relationships between gene groups. A Monte Carlo simulation study is conducted to assess the performance of the proposed test. Next, we analyse histone gene groups using data from yeast cell regulation, as provided by Eisen et al. (1998). Specifically, we investigate the dependence model structures of gene interactions for eight histone genes.
    Keywords: Spearman dependence; copula goodness-of-fit test; Bernstein copula; histone genes.
    DOI: 10.1504/IJDMB.2025.10061726
  • Research on facial dataset cleaning in mixed scenes based on spatiotemporal correlation   Order a copy of this article
    by Siguang Dai 
    Abstract: Researching methods for cleaning mixed scene facial datasets can improve the performance and reliability of mixed scene facial recognition algorithms. Therefore, the paper proposes a facial dataset cleaning method in mixed scenes based on spatiotemporal correlation. The 2DPCA algorithm is used to reduce the dimensionality of the dataset, and composite multi-scale entropy is used to decompose, reconstruct and arrange the image sequence after dimensionality reduction. The autocorrelation coefficient and the number of interrelations between image sequences are determined, and anomaly detection on the dataset is realised by combining spatiotemporal correlation. Sparse representation is used to repair the abnormal images, and images with high similarity are deleted to clean the mixed scene facial dataset. The experimental results show that the minimum anomaly rate of our method is 0.5%, the success rate is between 94% and 96%, and the minimum time cost is 0.2 s.
    Keywords: spatiotemporal correlation; mixed scenes; facial dataset; dataset cleaning; 2DPCA algorithm; composite multi-scale entropy; sparse representation.
    DOI: 10.1504/IJDMB.2025.10061768
  • Identification of potential biomarkers of esophageal squamous cell carcinoma using community detection algorithms   Order a copy of this article
    by Bikash Baruah, Domum Karlo, Manash P. Dutta, Subhasish Banerjee, Dhruba K. Bhattacharyya 
    Abstract: Potential biomarker genes are uncovered in this research by developing a unique methodology that employs six eminent community detection algorithms (CDAs) on four RNAseq esophageal squamous cell carcinoma (ESCC) datasets. The RNAseq datasets are preprocessed using the Galaxy server, followed by the identification of a subset of differentially expressed genes (DEGs). CDAs are applied separately to control and disease samples of DEGs to extract the hidden communities of the datasets. To identify the significant communities, ESCC elite genes are extracted from GeneCards for subsequent downstream analysis towards the identification of potential biomarkers. Topological analysis is performed to support critical gene identification based on elite genes, followed by a biological investigation. For the biological investigation, gene enrichment and pathway analysis are implemented. Finally, the genes EPHB2, ABLIM3, ACER1, ABCD4, ARF6, ADRA1D, ATP6V1D, CLTB, ATP6V0A4, and AP1M1 are identified as possible ESCC biomarkers that carry both topological and biological significance.
    Keywords: community detection algorithm; CDA; potential biomarker; esophageal squamous cell carcinoma; ESCC; Elite gene; topological analysis; biological significance.
    DOI: 10.1504/IJDMB.2025.10061876
  • Research on bioinformatics data classification method based on support vector machine   Order a copy of this article
    by Hui Yan, Yunxin Long, Chao Lv, Ping Yu, Duo Long 
    Abstract: Due to the problems of low classification accuracy and long classification time in traditional biological information data classification methods, a biological information data classification method based on support vector machine is proposed. Bio-information data are acquired through gene expression and their characteristics are analysed. According to the data analysis results, outlier detection and data scaling are carried out on the acquired bio-information data. Based on the processing results, mutual information is used to measure correlation and redundancy, the bio-information data features are selected through the minimum redundancy and maximum correlation feature selection algorithm, and the selected features are taken as data samples. Through support vector machine, the classification decision function is established under the conditions of linear and non-separable data samples to obtain the classification results of biological information data. The experimental results show that the proposed method has higher classification accuracy and shorter classification time.
    Keywords: support vector machine; bioinformation; data classification; minimum redundancy and maximum correlation; feature selection.
    DOI: 10.1504/IJDMB.2025.10061944
  • Log anomaly detection and diagnosis method based on deep learning   Order a copy of this article
    by Zhiwei Liu, Xiaoyu Li, Dejun Mu 
    Abstract: In order to improve the accuracy of log anomaly detection and diagnostic effectiveness, this paper proposes a deep learning based log anomaly detection and diagnosis method. Firstly, the log data are analysed and the corresponding relationship between the log keys and log parameters is obtained. Secondly, using deep learning to capture association features, a convolutional neural network bidirectional long short-term memory (CNN-BiLSTM) deep learning model is constructed. Finally, context sequence feature information is learned from both positive and negative directions through bidirectional input, and log anomaly detection and diagnosis are implemented based on the resulting context sequence feature information. The experimental results show that the accuracy of log anomaly detection in this method can reach 98.6%, the time required for log anomaly detection can reach 1.1 s, and the recall rate for log anomaly detection is 96.8%. The log anomaly detection effect is good.
    Keywords: deep learning; one hot encoding; context sequence features; log exception.
    DOI: 10.1504/IJDMB.2025.10062017
  • Classification and retrieval method of personal health data based on differential privacy   Order a copy of this article
    by Guanpeng Xu, Liang Zhao 
    Abstract: Research on personal health data classification and retrieval methods can improve the accuracy and efficiency of medical decision-making, promoting the development of personalised medicine. To overcome the issues of low accuracy, long retrieval time, and low satisfaction in traditional methods, a classification and retrieval method of personal health data based on differential privacy is proposed. The method involves encrypting personal health data using a linear regression model and differential privacy, and constructing a classification objective function through integrated manifold learning to classify the encrypted results of personal health data. Binary hash codes are used to retrieve the classification results, and the decrypted retrieval results are provided to users for personal health data classification and retrieval. The experimental results demonstrate that this method achieves a maximum accuracy of 96.8% in personal health data classification and retrieval, with a minimum retrieval time of 20 ms and an average satisfaction of 97.1% for the retrieval results.
    Keywords: differential privacy; personal health data; classification and retrieval; linear regression model; encrypted results; binary hash code.
    DOI: 10.1504/IJDMB.2025.10062018
  • Prediction method of commercial customers' mental health based on data mining   Order a copy of this article
    by Yanhua Shen, Bing Gao 
    Abstract: Mental health prediction is crucial for commercial customer management. Therefore, a data mining based method for predicting the mental health of commercial customers is proposed. Firstly, the K-means algorithm is used to mine and process the psychological health test data of commercial customers. Secondly, a program for evaluating the psychological health of commercial customers is developed, a judgment matrix is constructed, and weight coefficients are calculated to obtain the evaluation results of the psychological health level of commercial customers. Finally, with the evaluation results of mental health level as input and the predicted results of mental health as output, a BP neural network is used to build a commercial customer mental health prediction model. The experimental data show that after the proposed method is applied, the mining results of commercial customers’ mental health data are consistent with the actual results, and the minimum error of commercial customers’ mental health prediction is 0.4%.
    Keywords: commercial customers; mental health; enterprise development; data mining technology; prediction model construction.
    DOI: 10.1504/IJDMB.2025.10062484
  • Longitudinal analysis for predicting amino acid changes in HIV-1 using association rule mining   Order a copy of this article
    by Mounira Lakab, Abdelouaheb Moussaoui 
    Abstract: The human immunodeficiency virus (HIV) remains a great challenge for humanity. HIV is characterised by a high mutation rate, resulting in pathogenic variants that promote escape from the immune response. In order to understand the correlations between amino acid mutations of the virus and quantify evolution in HIV, we present a novel approach based on association rule mining (ARM) from protein sequence data taken at different time points. In this study, a longitudinal association rule mining (LARM) algorithm is proposed. We collected the entire genomes of 100 untreated HIV-1 infected patients over 3-5 years of infection, with 6-10 longitudinal samples per patient, using the Los Alamos intra-patient search interface. Our experiments show the effectiveness of the proposed method in discovering major amino acid changes in comparison with the temporal analysis.
    Keywords: association rule mining; longitudinal data; HIV-1; mutation; amino acid; data mining.
    DOI: 10.1504/IJDMB.2025.10062519
  • An advanced approach for DNA sequencing and similarities analysis on the basis of groupings of nucleotide bases   Order a copy of this article
    by Kshatrapal Singh, Laxman Singh, Vijay Shukla, Yogesh Kumar Sharma, Arun Kumar Rai 
    Abstract: DNA sequencing is a crucial tool for seamlessly identifying the links between various DNA sequences on a broad scale, but there is still potential for improvement in sequencing quality. A popular method for determining sequence similarities is the alignment-free technique. In this research, the four bases of DNA sequences, A, C, G, and T, are separated into three types of groupings according to their chemical characteristics. A primary DNA sequence is transformed into three symbolic sequences. In order to depict the sequence, the frequencies of group variations of the three notational sequences are aggregated in a 12-component vector. The nucleotide sequences of the beta globin gene on a dataset of several species are characterised and compared using the Euclidean distances across inserted vectors. Using phylogenetic trees, the evolutionary relationships between various organisms are visually represented. A phylogenetic tree’s branch structure shows how several species or other groups diverged from several common ancestors. Our findings are in agreement with recent biological assessments. Additionally, we compared our approach with a few currently used sequence comparison techniques and found that it is more efficient and user-friendly. We also analysed the time and space complexities of our proposed approach.
    Keywords: alignment-free technique; similarity analysis; bases groupings; mutation; phylogenetic tree.
    DOI: 10.1504/IJDMB.2025.10063428
  • In silico evaluation via the docking of selected antidiabetic phytochemicals on proteins in the insulin signalling pathway: PTP1B, IRS1 and PP2A   Order a copy of this article
    by Hazim Alsharabaty, Niveen Alayasi, Safa Radi Jabarin, Siba Shanak, Hilal Zaid 
    Abstract: Type II Diabetes Mellitus (T2DM) is a worldwide disease, caused by the resistance of tissues to insulin. In this study, eight potential antidiabetic phytochemicals from Gundelia tournefortii and Ocimum basilicum were tested in silico. To this end, we docked the phytochemicals on pivotal proteins in the insulin signalling pathway using the docking protocol of AutoDock. This work aimed at understanding the mechanism of action of these phytochemicals by finding the optimal binding site, calculating the best orientation, and studying the amino acids involved at the interaction interface between the phytochemicals and each protein target. Our results indicated that stigmasterol, beta-amyrin, beta-sitosterol, lupeol-trifluoroacetate and lupeol show good binding to PTP1B, IRS1, and PP2A and are candidate drugs for the treatment of T2DM. The results of the study may serve as a focal point for drug discovery that may be further extended to in vitro, in vivo and clinical studies.
    Keywords: diabetes; phytochemicals; in silico; Gundelia tournefortii; Ocimum basilicum; docking; AutoDock.
    DOI: 10.1504/IJDMB.2025.10064690
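
The abstract on groupings of nucleotide bases (listed above) describes mapping a DNA sequence to three symbolic sequences, collecting frequencies into a 12-component vector, and comparing sequences by Euclidean distance. The sketch below is one possible reading of that description, not the authors' implementation. It assumes the three groupings are the standard chemical classifications (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding) and that the 12 components are the four adjacent-symbol pair frequencies of each symbolic sequence; the abstract specifies neither. The names `GROUPINGS`, `feature_vector` and `euclidean` are hypothetical.

```python
from collections import Counter
from itertools import product
import math

# Assumed groupings; the abstract does not name them.
GROUPINGS = [
    {"A": "R", "G": "R", "C": "Y", "T": "Y"},  # purine / pyrimidine
    {"A": "M", "C": "M", "G": "K", "T": "K"},  # amino / keto
    {"A": "W", "T": "W", "C": "S", "G": "S"},  # weak / strong hydrogen bonding
]


def feature_vector(seq):
    """Return a 12-component vector: 3 groupings x 4 adjacent-pair frequencies."""
    seq = seq.upper()
    vec = []
    for mapping in GROUPINGS:
        # Translate the primary sequence into a two-letter symbolic sequence,
        # skipping characters other than A, C, G, T.
        symbols = [mapping[base] for base in seq if base in mapping]
        pairs = Counter(zip(symbols, symbols[1:]))
        total = max(sum(pairs.values()), 1)  # avoid division by zero
        alphabet = sorted(set(mapping.values()))
        for pair in product(alphabet, repeat=2):
            vec.append(pairs[pair] / total)
    return vec


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.dist(u, v)


# Toy sequences (not real beta globin data).
v1 = feature_vector("ACGTACGTTGCA")
v2 = feature_vector("ACGTTCGATGCA")
distance = euclidean(v1, v2)
```

Under these assumptions, a pairwise distance matrix built with `euclidean` could then be passed to any standard tree-building method to obtain a phylogenetic tree of the kind the abstract mentions.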