Forthcoming and Online First Articles

International Journal of Biometrics (IJBM)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.


Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with the Open Access icon are Online First articles. They are freely available and openly accessible to all, without any restrictions other than those stated in their respective CC licences.

Register for our alerting service, which notifies you by email when new issues are published online.

International Journal of Biometrics (26 papers in press)

Regular Issues

  • Statistical analysis of lateral palm prints in conjunction with signatures for personal identification
    by Sonali Ashok Dagha, Kanica Chugh, Pooja Ahuja 
    Abstract: Personal identification on the basis of individual characteristics has always been an area of interest for forensic scientists. Certain behavioural biometrics, such as signatures, can help individualise a person, and when modelled together with lateral palm prints they can be a reliable source of individualisation. The hypothenar area (the lateral or ulnar side of the palm), encountered at crime scenes in the form of patent or latent prints in cases such as sexual assault, or below the signature on a document, provides a way to identify an individual. The paper discusses one such technique: just as fingerprints help identify an individual, the lateral prints left on paper in conjunction with signatures can help identify the author without analysing the signatures themselves. The present study was conducted on 200 samples obtained from individuals writing in the normal position, in order to identify lateral prints and determine their patterns for personal identification. The results supported positive identification and indicated whether a person is right-handed or left-handed. Statistical analysis helps establish a correlation between the genuine manner of signing and the lateral side of the palm.
    Keywords: signature authorship; biometrics; hypothenar area; personal identification; palm biometrics.
    DOI: 10.1504/IJBM.2026.10071812
     
  • A novel Boolean approach for cancellable biometric template generation
    by Onkar Singh, Ajay Jaiswal, Nitin Kumar, Naveen Kumar 
    Abstract: Cancellable biometrics addresses privacy and security concerns by transforming biometric data into a non-invertible template. This paper presents two novel and secure binary-domain transformations for generating cancellable templates from biometric images. Our approach involves converting pixel decimal values to their unsigned binary equivalents, significantly enhancing non-invertibility. Experimental results on eight datasets demonstrate superior performance, achieving an equal error rate (EER) below 0.001% and outperforming five leading cancellable biometric methods based on salting, XOR, and random permutations. Inversion attacks, attack via record multiplicity (ARM), and similarity metrics confirm the non-invertibility and robustness of the generated templates, while false-accept and brute-force attack analyses show that the methods are secure. Both methods adhere to the essential requirements of cancellable biometrics, allowing templates to be cancelled while demonstrating improved recognition accuracy, which makes them feasible for secure biometric authentication.
    Keywords: Boolean-XOR; biometric salting; biometric recognition; template protection; privacy; security.
    DOI: 10.1504/IJBM.2026.10071901
     
  • Facial micro expression classification and recognition method based on hidden Markov and support vector machine
    by Ruifang Xing, Jingjing Feng 
    Abstract: To reduce the misidentification rate and time consumption of facial micro expression classification and recognition, a method based on a hidden Markov model and support vector machine is proposed. Firstly, the micro expression sample images are preprocessed using bilinear interpolation and mean-variance normalisation. Secondly, the Gabor wavelet transform is used to extract local spatial features of micro expressions. Finally, a hidden Markov model is constructed to extract temporal features, and the optimal feature sequence is found via the Viterbi algorithm; this sequence is then input into a support vector machine to complete classification and recognition. The experimental results show that the false recognition rate of this method is always below 0.2%, the AUC is close to 1, and the recognition time is always below 15 ms, indicating that the proposed method significantly improves the accuracy and timeliness of micro expression recognition.
    Keywords: Gabor wavelet transform; feature extraction; hidden Markov; support vector machine; SVM; micro expression recognition.
    DOI: 10.1504/IJBM.2025.10071942
     
  • An optimised learning strategy for analysing mentally disordered children's activities based on facial expressions
    by Mayuri N. Panpaliya, Pritish A. Tijare
    Abstract: Facial expressions play a crucial role in non-verbal communication and are used to recognise a person's present state of mind. A great deal of research has been carried out on determining human emotions from facial expressions; however, far less work addresses children with mental illness. This article presents a novel intelligent convolution neural-based buffalo optimisation (CNbBO) model to detect and predict the activities of mentally ill children based on their facial expressions. Initially, machine learning (ML) algorithms are used for pre-processing and feature extraction. Facial emotions are classified from these features with the help of an optimisation fitness function. A second fitness function at layer two is then updated to track activities, enhance the detection rate, and improve classification accuracy. Results show that the performance of the developed algorithm is promising, with an accuracy of 94%, which is better than presently available techniques.
    Keywords: mental disorder; mental illness; facial expression; feature extraction; classification accuracy.
    DOI: 10.1504/IJBM.2026.10072292
     
  • Chinese speech emotion recognition based on improved convolutional neural network
    by Xiaoyan Wei, Xinhua Wang 
    Abstract: A Chinese speech emotion recognition method based on an improved convolutional neural network is proposed, with the goal of solving the problems of high false acceptance and false rejection rates and high recognition time in traditional Chinese speech emotion recognition methods. Chinese speech signals are collected, and pre-emphasis, framing, windowing, and the fast Fourier transform are applied to pre-process them, after which features of the pre-processed signals are extracted. Multi-level residuals are introduced to improve the convolutional neural network; the speech signal features are input into the improved network, which iteratively outputs the emotion recognition results. Experimental testing shows that the proposed method achieves an average false acceptance rate of 2.84% and an average false rejection rate of 4.63%, and the maximum time consumption for Chinese speech emotion recognition is 49.2 ms.
    Keywords: improved convolutional neural network; Chinese speech; emotion recognition; fast Fourier transform; FFT; multi-level residuals.
    DOI: 10.1504/IJBM.2025.10072293
     
  • Survival analysis: a comparative study of frequentist and Bayesian approaches
    by Zaheer Aslam, Abid Hussain, Nasir Ali, Muhammad Hanif, Roquia Aslam 
    Abstract: This study compares two categories of survival analysis, traditional (Kaplan-Meier, Cox PH, parametric models, Bayesian Cox) and deep learning (DeepSurv, DeepHit), on 299 heart failure patients. DeepHit performed best (C-index = 0.75, IBS = 0.16), surpassing DeepSurv (0.73) and Cox PH (0.71). Cox PH proved quicker and more easily interpretable, with age (HR = 1.05) and serum creatinine (HR = 1.37) emerging as key predictors. The Bayesian models performed well in the small-sample setting (DIC = 1272.30) while providing uncertainty quantification. Parametric models (e.g., Weibull, AIC = 1282.24) were effective where distributional assumptions were met. Important variables, such as age, ejection fraction, and renal biomarkers, were consistently significant. Model selection is need-based: Cox PH when speed and interpretability matter, Bayesian procedures for small sample sizes and informative priors, DeepHit when the patterns are complex, and parametric models when the data follow a particular distribution.
    Keywords: survival analysis; parametric models; nonparametric models; semiparametric methods; Bayesian parametric models; Bayesian semiparametric models.
    DOI: 10.1504/IJBM.2026.10072343
     
  • Optimised VGH algorithm-based deep CNN classifier for diabetic retinopathy
    by Bhagyashree Somnath Madan, Avinash Sharma 
    Abstract: Diabetic retinopathy (DR) is a highly consequential condition affecting diabetic patients, and inaccurate detection results in permanent vision loss. Fundus images are used to predict the severity of DR from lesion segmentation, but manual segmentation is a time-consuming and complex process. The proposed research therefore develops a vision guided horse optimiser (VGH algorithm) based deep convolution neural network (DCNN) classifier to identify DR. The significance of the present research lies in the vision guided horse-optimised deep convolution neural network (VGH-optimised DCNN) model for classifying DR. Image contrast enhancement and illumination correction are applied in the pre-processing stage, and the automatic Otsu approach and a contour-based threshold approach are used to segment the optic disc and blood vessels. Various methods are compared with the VGH-optimised DCNN to assess model performance. The accuracy, F1-score, sensitivity, and specificity of the developed model at epoch 25 are 97.59%, 96.76%, 96.64%, and 96.42% at a training percentage of 90, whereas at k-fold value 6 the VGH-optimised DCNN model attains 97.10%, 96.77%, 96.22%, and 97.07% on the IDRID dataset.
    Keywords: diabetic retinopathy; optic disc segmentation; blood vessel segmentation; deep CNN; vision guided horse optimiser.
    DOI: 10.1504/IJBM.2026.10072442
     
  • Design of an ensembled face recognition system using optimal local features for unconstrained and age-difference environments
    by Dipak Kumar, Ravi Kant Kumar, Jogendra Garain, Dakshina Ranjan Kisku, Jamuna Kanta Sing, Phalguni Gupta 
    Abstract: Biometric authentication, especially via face recognition, remains a challenging problem. The face conveys rich semantic information through its numerous expressions, and these dynamic expressions are among the main challenges for a biometric system. Researchers are therefore continuously trying to enhance the robustness of facial recognition. This work proposes a multi-classifier ensemble system for effective face recognition. Our proposed system has two modules: 1) an optimisation module that decreases computational cost by reducing the feature sets; 2) a fusion module with multiple classifiers to improve accuracy. The feature sets are extracted using the local descriptors LBP, DS-LBP, and LGS, and optimised using a genetic algorithm (GA). The optimised feature sets are classified separately, and the results are combined using decision-level fusion methods such as the AND-rule, OR-rule, and majority voting. Experiments on the LFW, BioID, and LAG datasets show that the proposed ensemble system is more efficient and robust.
    Keywords: face recognition system; local binary pattern; LBP; local graph structure; SLGS; densely sampled local binary pattern; DS-LBP; ensemble system; genetic algorithms; majority voting.
    DOI: 10.1504/IJBM.2026.10072443
     
  • Deep learning-driven multi-sample periocular recognition for biometric authentication
    by Nabil Hezil, Amir Benzaoui, Ghania Droua-Hamdani, Khadidja Belattar, Ahmed Bouridane 
    Abstract: The COVID-19 pandemic has accelerated the adoption of contactless biometric modalities such as face, iris, voice, and periocular recognition, which offer a safer alternative to traditional methods by reducing the risk of disease transmission in public and private spaces. While face recognition technologies have shown robust performance even with partial facial occlusions, their accuracy diminishes significantly when individuals wear medical masks, highlighting the importance of periocular biometrics for reliable personal identification. To enhance security and accuracy, multi-biometric systems, which combine multiple biometric traits, outperform single-modality approaches. We propose a multi-input convolutional neural network (MICNN) framework that fuses the left and right periocular traits from the same face image for enhanced biometric recognition. We evaluate our method on two challenging periocular datasets, achieving highly competitive correct recognition rates of 99.62% and 98.33%, respectively, and outperforming recent benchmarks. These results underscore the efficacy of multi-sample periocular recognition using deep learning for contactless biometric identification.
    Keywords: multi-sample biometrics; periocular recognition; fusion; deep learning; cascade object detector; multi-input convolutional neural network; MICNN; feed-forward neural networks; FFNNs.
    DOI: 10.1504/IJBM.2026.10072550
     
  • Gender-based analysis of ECG biometric identification under different physiological conditions
    by Siti Nurfarah Ain Mohd Azam, Khairul Azami Sidek 
    Abstract: This study investigates the reliability of ECG signals as a biometric feature for individuals in different physiological conditions. Previous research has shown that ECG biometric identification can be performed under normal conditions; the challenge, however, lies in performing biometric identification on moving subjects. This work therefore proposes robust biometric identification using ECG signals acquired under different physiological conditions, analysed by gender. A total of 15 male and 7 female subjects performing sitting, walking and running activities were involved in this work. The study found that ECG signals can be reliably used as a biometric across different physiological conditions, with the medium Gaussian SVM being the most effective at 98.7% accuracy. The accuracy of ECG signal classification can be affected by gender differences, with female subjects exhibiting accuracy as low as 82.9%, likely due to the size of their hearts.
    Keywords: ECG; biometric; person identification; pattern recognition; verification; signal processing; human authentication; support vector machine; biological signals; signal classification.
    DOI: 10.1504/IJBM.2025.10070475
     
  • Enhancing security and accuracy in biometric systems through the fusion of fingerprint and gait recognition technologies
    by Mayank Shekhar, Amit Kumar Trivedi, Ripon Patgiri 
    Abstract: In the evolving landscape of security technology, biometric systems are pivotal for unique identification through physiological or behavioural traits. This research focuses on enhancing biometric system security and accuracy by integrating fingerprint and gait recognition technologies. Fingerprint recognition is valued for its precision and ease of data acquisition, while gait recognition offers non-invasiveness and resistance to obfuscation. The study explores feature and score level fusions of these modalities, utilising advanced algorithms to optimise the integration and elevate recognition performance. Experimental evaluations demonstrate that the proposed multimodal system not only outperforms unimodal systems but also strengthens robustness against spoofing attacks. Key contributions include a novel gait feature extraction technique compatible with fingerprint features and an optimised score-level fusion algorithm, significantly enhancing accuracy and security. Biometric security systems have become integral to modern security architectures, leveraging unique physiological and behavioural characteristics to authenticate individuals.
    Keywords: multimodal biometrics; fingerprint recognition; gait recognition; biometric security; feature fusion; biometric authentication.
    DOI: 10.1504/IJBM.2025.10068285
     
  • Classification of human emotion using an EEG-based brain-machine interface: a machine learning approach
    by Abdul Cader Mohamed Nafrees, Sidath Ravindra Liyanage, Naomal G.J. Dias 
    Abstract: The main purpose of this work is to investigate the possibility of using electroencephalography (EEG) data to improve machine learning models' ability to accurately identify emotions. The work focuses on emotion classification using EMG data to improve data mining models. It investigates the use of individual and ensemble classification methods in processing windowed data obtained from four scalp sites; this information is then used to infer the emotions that participants felt at particular times. The results indicate that a low-resolution, readily available EEG device can be a useful tool for determining a human's emotional status. Applying an ensembling technique increases the accuracy of the model, highlighting the possibility of creating classification algorithms that may be used in practical decision support systems. Future studies in this field ought to concentrate on determining whether attribute creation, attribute selection, or their combination was responsible for this notable improvement.
    Keywords: electroencephalography; EEG; electromyography; EMG; facial expressions; human emotion; machine learning; ML.
    DOI: 10.1504/IJBM.2025.10070061
     
  • A survey of multimodal emotion recognition: fusion techniques, datasets, challenges and future directions
    by Kuei-Chung Chang, Sheng-Quan Chen 
    Abstract: Emotion recognition is crucial in enhancing the quality of human-computer interaction, education, healthcare, and transportation safety. By integrating data from different sources, multimodal emotion recognition can capture complex emotional signals more comprehensively and accurately than single-modal data sources. This paper comprehensively reviews AI-based multimodal emotion recognition systems, covering deep learning techniques, datasets, challenges, and future research directions. We also present current research techniques, including exploring fusion strategies for different modal data and diverse data fusion methods. Several challenges in multimodal emotion recognition are discussed in this paper, such as the incompleteness of modal data, inconsistency in signal quality, and insufficient model interpretability. The paper points out that future research needs to further explore how to effectively integrate data from different modalities and enhance the adaptability and interpretability of models in practical applications.
    Keywords: emotion recognition; multimodal learning; artificial intelligence; feature fusion.
    DOI: 10.1504/IJBM.2025.10070454
     
  • Facial thermograms - application of facial recognition in the medical sector
    by Swagata Sarkar, R. Muthuselvan, N. Ashokkumar, Rajesh Kumar Vishwakarma 
    Abstract: Millions of people around the world suffer recurrent migraines, a neurological disease that can be severely debilitating. This study finds a significant temperature difference in the frontal and temporal areas of the right side of the brain in women who had headaches on one side only. Notably, the temperature trends of people who had pain on both sides did not change, suggesting that the diagnostic process may be more complex. Further study with larger groups is still required, and facial thermography should be interpreted with care until more research and validation are available. Nevertheless, facial thermography has considerable potential to help doctors diagnose headaches better and to illuminate how they work at a neurophysiological level. The system was trained with 1980 images and tested with 576 images, achieving an accuracy of 96.66%.
    Keywords: female; headache; humans; migraine disorders; quality of life; pain; temperature; thermography.
    DOI: 10.1504/IJBM.2025.10069534
     

Special Issue on: Applications of Image Processing and Pattern Recognition in Biometrics

  • A method for classifying and recognising the emotional states of dancers based on the spatiotemporal features of facial expressions
    by Yaotian Li, Zhaoping Wang 
    Abstract: To address the issues of low recall and poor accuracy in classifying and recognising dance performers' emotional states from spatiotemporal features of facial expressions, a classification and recognition method based on these features is proposed. Firstly, face detection is performed using an integral image, and preprocessing is carried out using affine transformation and histogram equalisation. Secondly, the LBP and LPQ algorithms are combined to extract spatiotemporal features of facial expressions. Next, principal component analysis is applied for feature selection and dimensionality reduction to reduce noise and redundant information. Finally, a support vector machine (SVM) is used for emotional state classification, achieving automatic recognition and multi-class classification. Experiments demonstrate that the proposed method attains high accuracy and recall, with a recall rate consistently above 95%.
    Keywords: spatiotemporal features; principal component analysis; PCA; support vector machine; SVM; emotional state; classification recognition.
    DOI: 10.1504/IJBM.2025.10069414
     
  • A grading evaluation method for English oral pronunciation errors based on deep neural networks
    by Jian Sun, Li Zhang, Guanghui Shu 
    Abstract: In this paper, a deep neural network-based grading method for English oral pronunciation errors is proposed. English oral pronunciation signals are preprocessed and MFCC feature vectors are extracted. A hidden Markov model is used to construct an acoustic model, with a deep neural network predicting the state probability distribution of the acoustic feature vectors and replacing the observation probabilities of the acoustic model. A language model is constructed to obtain word-order probabilities and combined with the acoustic model to build a search network, which the Viterbi algorithm decodes to find the phoneme state sequence. Based on the reference phoneme sequence, the degree of pronunciation error is calculated and compared with a threshold to achieve graded evaluation. The results indicate that the AUC value of the proposed method is close to 1 and the F1-score is above 0.95, indicating high evaluation accuracy.
    Keywords: spoken English; pronunciation error; graded evaluation; hidden Markov model; deep neural network; DNN.
    DOI: 10.1504/IJBM.2025.10069415
     
  • Research on basketball emergency stop jump shot action recognition based on semantic guided neural network
    by Yong Wang 
    Abstract: To recognise basketball emergency-stop jump-shot movements accurately and quickly, a new recognition method based on a semantic guided neural network is proposed. Firstly, the quality of basketball action images is improved through colour vectorisation and filtering preprocessing. Secondly, image retrieval technology is used for edge contour feature extraction and fusion retrieval, and a sample set of pixel features of likely emergency-stop jump-shot actions is selected. Finally, semantic information is integrated into the neural network to improve recognition accuracy. The network architecture innovatively incorporates non-local feature extraction modules, ECA attention mechanism modules, and deformable convolution modules to extract feature information, and accurate recognition of basketball emergency-stop jump shots is achieved through fully connected layers. Test results show that the recognition accuracy of the proposed method is stable at around 95%, and the longest recognition time is only 0.93 s.
    Keywords: semantic guided neural network; basketball emergency stop jump shot; action recognition; edge contour features.
    DOI: 10.1504/IJBM.2025.10069416
     
  • Seeing the unseen: a novel approach to biometric recognition system
    by Kumari Deepika, Deepika Punj, Jyoti Verma
    Abstract: This paper introduces an innovative three-phase cascade framework designed for biometric recognition systems, particularly suited for small-scale applications. By integrating multiple biometric modalities - dorsal vein, wrist vein, and palm print - the framework aims to improve recognition accuracy and robustness. The first phase focuses on extracting unique features from each modality using a moment-based approach that is transformation-invariant and computationally efficient. In the second phase, an asymmetric aggregator operator is employed to merge these features into a unified representation. The final phase utilises spectral clustering to classify and match the fused feature vectors, effectively addressing unseen samples. Evaluated on 350 samples from the COEP and FYO benchmark databases, the framework achieved an impressive accuracy of around 98% for unseen samples, outperforming existing methods like Zernike moment and hierarchical clustering. This work not only enhances biometric authentication but also broadens its applicability across various domains, marking a significant advancement in the field.
    Keywords: moment; unseen samples; spectral clustering; hierarchical; Zernike; Hu; COEP palmprint; FYO DB.
    DOI: 10.1504/IJBM.2025.10070135
     
  • Interactive gesture recognition method based on spatiotemporal mask and variational mode decomposition
    by Yuee Yi 
    Abstract: To reduce the error rate and response time of interactive gesture recognition, this study proposes a new method based on spatiotemporal masking and variational mode decomposition. Firstly, the spatiotemporal features of interactive gestures are extracted using spatiotemporal masks, and the modelling of hand joint data is optimised by combining graph convolutional networks with self-attention mechanisms. Secondly, variational mode decomposition is used to extract time-frequency features from interactive gestures. Finally, principal component analysis reduces the dimensionality of the high-dimensional gesture features, and a support vector machine classifier recognises the different types of interactive gestures in the reduced feature space. The experimental results show that the proposed method performs well in interactive gesture recognition tasks, with a maximum error rate of no more than 1% and a response time of only 0.31 s.
    Keywords: spatiotemporal mask; variational mode decomposition; interactive gesture recognition; support vector machine; SVM.
    DOI: 10.1504/IJBM.2025.10070791
     
  • English oral pronunciation recognition based on improved deep neural networks
    by Lifang Cheng 
    Abstract: To solve the problems of poor performance, low F1-score, and high error rate in traditional English oral pronunciation recognition methods, a recognition method based on improved deep neural networks is proposed. Firstly, the spoken English pronunciation signals and videos are preprocessed to extract audio and lip features. Then, fusion processing is performed on the extracted multimodal features. Finally, the multimodal feature fusion result is used as the input vector and the English spoken pronunciation recognition result as the output vector; by adding an attention module before the first fully connected layer, a deep neural network model is built to obtain the recognition results. The experimental results show that the proposed method has good recognition performance, with an F1-score consistently above 0.95 and an error rate of no more than 1%, and it can be further promoted in related fields.
    Keywords: spoken English; pronunciation recognition; deep neural network; attention module; multimodal feature fusion.
    DOI: 10.1504/IJBM.2025.10070792
     
  • Multimodal emotion recognition based on combined deep learning network
    by Zhenzhen Wang, Yu Ji, Rui Sun, Qi Liu 
    Abstract: To address the issues of low accuracy, low F1-score, and long task completion time in traditional multimodal emotion recognition methods, a method based on a combined deep learning network is proposed. Firstly, EEG signals, eye movement data, and facial expression images are collected, and features are extracted from the collected data. Then, a multimodal feature fusion model is built using a modal attention module, a weighting operation, and a decision module, with the extracted features as model inputs, to obtain the multimodal feature fusion results. Finally, the fusion results are combined with a combined deep learning network to achieve multimodal emotion recognition. The experimental results show that the proposed method achieves a maximum accuracy of 99.1%, a minimum F1-score of 0.947, and a minimum task completion time of 56.8 ms, demonstrating high precision and efficiency.
    Keywords: combined deep learning network; multimodal; emotion recognition; EEG signals; eye movement data; facial expression images.
    DOI: 10.1504/IJBM.2025.10070793
     
  • Automatic speech recognition method for English translator based on improved transfer learning
    by Wei Zeng, Gaochao Huang 
    Abstract: To solve the high word error rate and long response time of current English translator speech recognition methods, an automatic speech recognition method for English translators based on improved transfer learning is proposed. The speech signal of the English translator is sampled, quantised and pre-processed, and features are extracted from the pre-processed signal. With the speech signal features as input vectors and the speech recognition results as output vectors, transfer learning is improved by removing the output layer of the base model, re-initialising a new output layer connected to the hidden layer of the base model, and fine-tuning the network; in this way an automatic speech recognition model for the English translator is built and the speech recognition results are obtained. Experimental results show that the word error rate of the proposed method is less than 3.5% and the maximum response time is only 7.3 ms.
    Keywords: improved transfer learning; English translator; speech recognition; pre-processed; signal features.
    DOI: 10.1504/IJBM.2025.10071397
     
  • Continuous pronunciation error recognition of English vocabulary based on dual modal fusion features   Order a copy of this article
    by Lin Wu 
    Abstract: To reduce the word error rate and character error rate in English pronunciation recognition, a continuous pronunciation error recognition method for English vocabulary based on dual-modal fusion features is proposed. First, the continuous speech data are pre-processed: the mouth region of interest (ROI) is screened from the visual information, followed by colour normalisation and horizontal flipping, while the short-time Fourier transform is used to extract audio features, which are normalised to ensure temporal consistency of the data. Second, for the pre-processed data, the dual-modal features of visual and auditory speech information are fused based on kernel principal component analysis. Finally, with the fused features as input, a continuous-speech pronunciation error recognition model for English vocabulary is constructed. Experimental results show that the proposed method achieves a word error rate of less than 7.9% and a character error rate of less than 6.4% for continuous pronunciation errors in English vocabulary.
    Keywords: dual-modal fusion features; English vocabulary; continuous speech; pronunciation error; intelligent recognition.
    DOI: 10.1504/IJBM.2025.10071398
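The kernel-PCA fusion step can be sketched in numpy: concatenate per-frame audio and visual features, build an RBF Gram matrix, centre it, and project onto the leading components. Frame counts, feature sizes and `gamma` are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-frame features: audio (STFT-derived) and visual (mouth-ROI)
# vectors, concatenated before kernel PCA fusion.
audio = rng.standard_normal((100, 12))
visual = rng.standard_normal((100, 8))
X = np.hstack([audio, visual])              # (n_frames, 20) joint representation

# Kernel PCA with an RBF kernel: build, centre, and eigendecompose the Gram matrix.
gamma = 0.05
sq = np.sum(X**2, axis=1)
K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
n = K.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
Kc = J @ K @ J                              # doubly centred kernel matrix

vals, vecs = np.linalg.eigh(Kc)
idx = np.argsort(vals)[::-1][:5]            # keep the 5 leading components
fused = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))  # fused dual-modal features
```

Scaling each eigenvector by the square root of its eigenvalue gives the samples' coordinates along the principal directions in feature space, which serve as the fused representation fed to the recognition model.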
     
  • Application of improved stacking ensemble learning in intelligent terminal fingerprint recognition   Order a copy of this article
    by Zhe Li 
    Abstract: To solve the high RMSE, high false acceptance rate and poor recognition performance of existing fingerprint recognition methods, the application of improved stacking ensemble learning to intelligent terminal fingerprint recognition is studied. First, image quality is improved through grayscale processing, Gaussian filter denoising and Gabor filtering. Second, the Sobel operator is used to calculate the gradient direction, and the image is divided into blocks to extract minutiae features, which are described by curvature to support fingerprint matching. Finally, the improved algorithm is used for fingerprint recognition: base learners are trained and make predictions, and their outputs undergo feature enhancement, weighted fusion and second-layer learner training to obtain the final fingerprint recognition result. The experimental results show that the proposed method has a low RMSE and a low false acceptance rate, and its recognition performance is good.
    Keywords: improved stacking ensemble learning; intelligent terminal; fingerprint recognition; decision tree algorithm.
    DOI: 10.1504/IJBM.2025.10071399
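The two-layer stacking idea — base learners whose predictions become the features of a second-layer learner — can be sketched with least-squares "learners" on toy data. The data, the split into feature views and the learners themselves are stand-ins, not the paper's pipeline:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy match/non-match data standing in for fingerprint matching scores.
X = rng.standard_normal((200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

def fit_ls(A, t):
    # Minimal least-squares "learner": returns a weight vector.
    return np.linalg.lstsq(A, t, rcond=None)[0]

# Layer 1: two simple base learners trained on different feature views.
w1 = fit_ls(X[:, :2], y)                   # base learner on features 0-1
w2 = fit_ls(X[:, 2:], y)                   # base learner on features 2-3

# Base-learner predictions become meta-features for the second layer.
meta = np.column_stack([X[:, :2] @ w1, X[:, 2:] @ w2, np.ones(len(y))])

# Layer 2: the meta-learner performs the weighted fusion of base outputs.
w_meta = fit_ls(meta, y)
pred = (meta @ w_meta > 0.5).astype(float)
accuracy = float((pred == y).mean())       # training accuracy of the stacked model
```

A practical stacking setup would generate the meta-features with out-of-fold predictions to avoid leaking the base learners' training labels into the second layer.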
     
  • Health status recognition method of human in sports videos based on deep learning   Order a copy of this article
    by Xunxing Liu 
    Abstract: In this paper, a deep-learning-based method for recognising human health status in sports videos is proposed. First, the global spatial information of the frame-difference features is determined, and the frame-difference features of human health status are enhanced for feature extraction. Second, frame differences of facial expression information features are extracted, completing the frame-difference segmentation of human health status in the sports video. Then, to reduce overfitting of the recognition results, the fully connected layer of the deep learning network outputs the recognition results, and a loss function is introduced to optimise them. Finally, experimental verification is conducted. The experimental results show that the proposed method has high confidence, low complexity and small error, indicating good recognition performance.
    Keywords: deep learning; sports videos; human health status; identification; frame difference segmentation.
    DOI: 10.1504/IJBM.2025.10071400
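Frame-difference feature extraction and segmentation, the first two steps of this abstract, can be sketched in numpy. The video tensor, frame size and thresholding rule are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical grayscale video: 10 frames of 32x32 pixels, values in [0, 1].
video = rng.random((10, 32, 32))

# Frame-difference features: absolute difference between consecutive frames.
diff = np.abs(np.diff(video, axis=0))      # (9, 32, 32)

# Enhance and segment: threshold the differences to obtain motion masks.
thresh = diff.mean() + diff.std()          # simple global threshold (assumed rule)
masks = (diff > thresh).astype(np.uint8)   # 1 where notable inter-frame change

# A simple global descriptor per frame pair: fraction of changed pixels.
motion_ratio = masks.reshape(9, -1).mean(axis=1)
```

Descriptors like `motion_ratio` (or the masks themselves) would then be fed to the fully connected layers of the recognition network.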
     
  • Static facial expression emotion recognition method using spatiotemporal graph convolution   Order a copy of this article
    by Yanmei Sun, Bo Cheng 
    Abstract: To improve the Matthews correlation coefficient and consistency index of facial expression features in emotion recognition, a static facial expression emotion recognition method using spatiotemporal graph convolution is proposed. First, facial images are standardised through eye localisation, tilted expressions are corrected by rotation, pixel values are estimated by bilinear interpolation, and histogram equalisation is combined to achieve grayscale processing of static facial expression images. Second, two-dimensional Gabor wavelet filtering is applied to the static facial images, exploiting the time-frequency localisation properties of Gabor wavelets to accurately extract texture details at different frequencies and orientations. Finally, the spatiotemporal graph convolution method is used to extract spatial features and achieve effective recognition of static facial expression emotions. In the static facial expression emotion recognition experiments, the Matthews correlation coefficient remained above 0.9 and the consistency index of expression features remained above 0.91.
    Keywords: spatiotemporal graph convolution; static facial expressions; emotion recognition; two-dimensional Gabor wavelet filtering.
    DOI: 10.1504/IJBM.2025.10071401
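The multi-orientation 2-D Gabor filtering described in this abstract can be sketched in numpy: build a Gaussian-windowed sinusoid kernel per orientation and correlate it with the image. Kernel size, `sigma`, wavelength and the four orientations are illustrative choices:

```python
import numpy as np

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lam=6.0, psi=0.0):
    """Real part of a 2-D Gabor filter: a Gaussian-windowed sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates by theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * xr / lam + psi))

# A small bank over four orientations, for multi-direction texture extraction.
bank = [gabor_kernel(theta=t) for t in (0, np.pi/4, np.pi/2, 3*np.pi/4)]

# Filter a hypothetical face-image patch by direct valid-mode 2-D correlation
# (explicit loops kept for clarity; real code would use an FFT or a library).
rng = np.random.default_rng(5)
img = rng.random((32, 32))
k = bank[0]
out = np.array([[np.sum(img[i:i + 15, j:j + 15] * k)
                 for j in range(32 - 14)] for i in range(32 - 14)])
```

The magnitudes of the responses across the bank form the orientation- and frequency-selective texture features that the graph convolution stage would then consume.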