Journal: International Journal of Information and Communication Technology (IJICT), Inderscience Publishers

Title: Multimodal English corpus text recognition based on unsupervised domain adaptation

Authors: Xiaole Duan; Yan Hu

Addresses: Teaching Department of Public Courses, Hunan Communication Polytechnic, Changsha 410132, China; College of Intelligent Transportation, Hunan Communication Polytechnic, Changsha 410132, China

Abstract: With the explosion of multimodal data, effectively utilising unlabelled data for cross-modal text recognition has become an important challenge. This paper first preprocesses the text and speech data in an English corpus, then uses a BiLSTM and a self-attention (SA) mechanism to extract important text features, and a convolutional neural network, a BiLSTM and SA to extract the speech features with high contribution. Subsequently, the multimodal features are modelled with graph neural networks: a bipartite graph is constructed, knowledge transfer is performed, and domain-invariant features containing information about inter-domain interactions are extracted. Unsupervised domain adaptation reduces the difficulty of adapting across domains with large inter-domain differences, which makes the adversarial training process smoother. Finally, a classifier performs inference on the domain-invariant features to obtain the recognition results. Experimental results show that the proposed model reaches a weighted accuracy of 93.67%, which the authors report as a significant improvement in recognition performance.
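
The self-attention step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so this assumes plain scaled dot-product self-attention with no learned projections (queries, keys and values are all the input sequence), followed by mean pooling. The `hidden_states` values are hypothetical stand-ins for BiLSTM outputs, and the function names are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention with Q = K = V = seq.

    Assumption: no learned projection matrices, unlike a trained model.
    Each output vector is a weighted average of all input vectors.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        # Similarity of this time step to every time step, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


def mean_pool(seq):
    """Average the per-step vectors into one fixed-size feature vector."""
    n = len(seq)
    return [sum(v[j] for v in seq) / n for j in range(len(seq[0]))]


# Hypothetical stand-in for BiLSTM hidden states: 3 time steps, 2 dimensions.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(hidden_states)
utterance_vector = mean_pool(attended)
```

In a trained model the queries, keys and values would come from learned linear projections of the hidden states; the sketch omits them to keep the weighting mechanism visible.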

Keywords: multimodal text recognition; self-attention mechanism; SA; unsupervised domain adaptation; UDA; graph neural network; adversarial training; AT.

DOI: 10.1504/IJICT.2025.146103

International Journal of Information and Communication Technology, 2025 Vol.26 No.11, pp.53 - 68

Received: 12 Mar 2025
Accepted: 22 Mar 2025
Published online: 06 May 2025
