Open Access Article

Title: Multimodal English corpus text recognition based on unsupervised domain adaptation

Authors: Xiaole Duan; Yan Hu

Addresses: Teaching Department of Public Courses, Hunan Communication Polytechnic, Changsha 410132, China; College of Intelligent Transportation, Hunan Communication Polytechnic, Changsha 410132, China

Abstract: With the explosion of multimodal data, effectively utilising unlabelled data for cross-modal text recognition has become an important challenge. This paper first preprocesses the text and speech data in an English corpus, then uses a BiLSTM with a self-attention mechanism (SA) to extract important text features, and a convolutional neural network, BiLSTM and SA to extract high-contribution speech features. The multimodal features are then modelled with graph neural networks: a bipartite graph is constructed, knowledge transfer is performed, and domain-invariant features that capture inter-domain interactions are extracted. Unsupervised domain adaptation reduces the difficulty of adapting across domains with large inter-domain differences, which makes the adversarial training process smoother. Finally, a classifier performs inference on the domain-invariant features to produce the recognition results. Experimental results show that the proposed model reaches a weighted accuracy of 93.67%, a significant improvement in recognition performance.
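
As a rough illustration of the self-attention step named in the abstract, the following minimal sketch applies scaled dot-product self-attention to a short sequence of feature vectors. It assumes identity query, key and value projections and uses a toy list in place of real BiLSTM outputs; the function names and the toy data are hypothetical, and the abstract does not specify the attention variant the paper actually uses.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(states):
    """Scaled dot-product self-attention with identity projections.

    states: list of T feature vectors of length D (assumed stand-ins
    for per-step BiLSTM outputs). Each step attends to every step.
    Returns (contexts, all_weights), both lists of length T.
    """
    dim = len(states[0])
    scale = math.sqrt(dim)
    contexts, all_weights = [], []
    for query in states:
        # Score the current step against every step in the sequence.
        scores = [dot(query, key) / scale for key in states]
        weights = softmax(scores)
        # Context vector: attention-weighted sum of all steps.
        context = [
            sum(w * value[d] for w, value in zip(weights, states))
            for d in range(dim)
        ]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights


# Toy sequence (hypothetical): 3 time steps, 2 features each.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contexts, all_weights = self_attention(states)
```

Each row of `all_weights` sums to 1 and shows how strongly one step attends to the others; in a trained model the projections would be learned, so steps that contribute more to recognition would receive larger weights.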

Keywords: multimodal text recognition; self-attention mechanism; SA; unsupervised domain adaptation; UDA; graph neural network; adversarial training; AT.

DOI: 10.1504/IJICT.2025.146103

International Journal of Information and Communication Technology, 2025 Vol.26 No.11, pp.53 - 68

Received: 12 Mar 2025
Accepted: 22 Mar 2025

Published online: 06 May 2025