Title: Optimising multi-modal fusion with a tri-encoder tensor network for process applications
Authors: Jiayao Li; Li Li; Xiaochen Shi; Zeqiu Chen; Kaiyi Zhao; Ruizhi Sun
Addresses: School of Software, Shanxi Agricultural University, Taigu 030801, China; College of Information and Electrical Engineering, China Agricultural University, Beijing 110000, China | College of Information and Electrical Engineering, China Agricultural University, Beijing 110000, China | College of Information and Electrical Engineering, China Agricultural University, Beijing 110000, China | College of Information and Electrical Engineering, China Agricultural University, Beijing 110000, China | School of Computer Science and Technology, North University of China, Taiyuan, 030051, China | College of Information and Electrical Engineering, China Agricultural University, Beijing 110000, China; Scientific Research Base for Integrated Technologies of Precision Agriculture (Animal Husbandry), The Ministry of Agriculture, Beijing 110000, China
Abstract: Multi-modal fusion combines information from diverse modalities, enabling scalable predictions in complex data environments. To address existing limitations in capturing inter-modal interactions and in reducing time consumption, we propose optimising multi-modal fusion with a tri-encoder tensor network for process applications (TS-OMMF). Specifically, each input modality is encoded to abstract its intra-modal features and obtain its representation. The representations are fused into a high-dimensional space, with a low-rank factor limiting the dimension, and a linear extension pattern is employed to help the model extend to additional modalities. The fused features are input into a novel tri-encoder that captures inter-modal features at a finer granularity and obtains complementary features, thereby reducing time consumption. Extensive experiments on benchmark multi-modal datasets demonstrate that TS-OMMF improves performance metrics by 0.8% to 6.1%. These results highlight TS-OMMF's practical applicability, scalability, and potential for advancing process modelling and other complex multi-modal data-driven tasks.
Keywords: tensor fusion; encoder-decoder; tensor representation; multi-modal fusion; MMF.
DOI: 10.1504/IJSPM.2025.148293
International Journal of Simulation and Process Modelling, 2025 Vol.22 No.1/2, pp.75 - 92
Received: 07 Jan 2025
Accepted: 23 May 2025
Published online: 01 Sep 2025
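
The abstract describes fusing per-modality representations into a high-dimensional space while a low-rank factor limits the dimension. The following is a minimal NumPy sketch of that general idea (low-rank tensor fusion), not the authors' TS-OMMF implementation; the tri-encoder stage is not shown. The function names, shapes, and the choice to append a constant 1 to each representation are assumptions made for illustration only.

```python
import numpy as np


def low_rank_fusion(features, factors):
    """Fuse modality vectors without building the full outer-product tensor.

    features: list of 1-D arrays, one per modality, with shape (d_m,).
    factors:  list of arrays, one per modality, with shape (rank, d_m + 1, out_dim).
    Both names and shapes are hypothetical, chosen for this sketch.
    """
    fused = None
    for z, w in zip(features, factors):
        z1 = np.concatenate([z, [1.0]])        # append a constant 1 (assumption)
        proj = np.einsum("i,rio->ro", z1, w)   # project to (rank, out_dim)
        # Multiply projections element-wise across modalities.
        fused = proj if fused is None else fused * proj
    return fused.sum(axis=0)                   # sum over the rank dimension


def full_tensor_fusion_3(features, factors):
    """Reference for exactly three modalities: explicit tensor, same result."""
    z1, z2, z3 = (np.concatenate([z, [1.0]]) for z in features)
    w1, w2, w3 = factors
    # Full weight tensor reconstructed from the low-rank factors.
    weight = np.einsum("rao,rbo,rco->abco", w1, w2, w3)
    # Explicit three-way outer product of the modality vectors.
    tensor = np.einsum("a,b,c->abc", z1, z2, z3)
    return np.einsum("abc,abco->o", tensor, weight)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dims, rank, out_dim = [4, 6, 5], 3, 8
    feats = [rng.normal(size=d) for d in dims]
    facts = [rng.normal(size=(rank, d + 1, out_dim)) for d in dims]
    print(np.allclose(low_rank_fusion(feats, facts),
                      full_tensor_fusion_3(feats, facts)))
```

In this sketch, adding a modality only adds one more factor array and one more element-wise product, so the cost grows linearly with the number of modalities, whereas the explicit tensor grows multiplicatively. This is one plausible reading of the "linear extension pattern" mentioned in the abstract, not a confirmed detail of the method.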