Title: Multi-modal similarity feature exchange and structural perception for person re-identification
Authors: Xuefeng Lei
Addresses: College of Artificial Intelligence, Jiangxi Industry Polytechnic College, Nanchang, Jiangxi, 330095, China
Abstract: Visible-infrared person re-identification is crucial for surveillance, aiming to match person images across visible and infrared modalities. However, spectral and style gaps hinder local structure modelling and cross-modal feature alignment. We propose the cross-modality similarity exchange transformer (CSET) to improve both aspects. CSET uses two modality-specific transformer encoders to extract features independently. A similarity exchange mechanism computes intra-modality similarity and cross-modality Jaccard distance, selectively exchanging highly correlated token features to achieve local alignment and feature complementation. To enhance structural perception, we introduce a multi-relational heterogeneous graph attention mechanism, building a graph from the transformer outputs in which differences in positional embeddings define relation levels. Feature aggregation is guided by relational strength to capture fine-grained structural cues. Experiments on RegDB and SYSU-MM01 show that CSET outperforms state-of-the-art methods in Rank-1 accuracy and mAP, validating the effectiveness of its cross-modal feature learning.
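To make the similarity exchange step more concrete, below is a minimal PyTorch sketch of one plausible reading of the mechanism: cross-modality Jaccard distances are computed between token features from the two encoders, and each token whose closest cross-modal match falls under a distance threshold is swapped for that match. The soft Jaccard formulation, the ReLU on features, the threshold dist_thresh, the tensor shapes, and the names soft_jaccard_distance/similarity_exchange are all illustrative assumptions, not the paper's implementation; the intra-modality similarity computation mentioned in the abstract is omitted for brevity.

import torch

def soft_jaccard_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Generalised Jaccard distance between two sets of non-negative
    token features: 1 - sum(min) / sum(max), computed pairwise.
    a: (N, D), b: (M, D) -> (N, M). ReLU makes features non-negative,
    an assumption needed for the Jaccard formulation to be meaningful."""
    a = a.relu().unsqueeze(1)                        # (N, 1, D)
    b = b.relu().unsqueeze(0)                        # (1, M, D)
    inter = torch.minimum(a, b).sum(-1)              # (N, M)
    union = torch.maximum(a, b).sum(-1).clamp_min(1e-8)
    return 1.0 - inter / union

def similarity_exchange(vis_tokens, ir_tokens, dist_thresh=0.5):
    """Swap each token for its nearest cross-modal neighbour when the
    Jaccard distance is below dist_thresh. vis_tokens: (N, D) visible
    tokens, ir_tokens: (M, D) infrared tokens (hypothetical shapes)."""
    d = soft_jaccard_distance(vis_tokens, ir_tokens)  # (N, M)
    ir_idx = d.argmin(dim=1)                          # best IR match per visible token
    vis_idx = d.argmin(dim=0)                         # best visible match per IR token
    vis_mask = (d.min(dim=1).values < dist_thresh).unsqueeze(-1)
    ir_mask = (d.min(dim=0).values < dist_thresh).unsqueeze(-1)
    # Exchange only well-correlated tokens; keep the rest unchanged.
    vis_out = torch.where(vis_mask, ir_tokens[ir_idx], vis_tokens)
    ir_out = torch.where(ir_mask, vis_tokens[vis_idx], ir_tokens)
    return vis_out, ir_out

Gating the exchange on a distance threshold, rather than swapping every token, reflects the abstract's emphasis on exchanging only correlated features, so poorly matched tokens do not contaminate the other modality's representation.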
Keywords: cross-modality person re-identification; similarity exchange; SE; transformer; heterogeneous graph attention; feature alignment.
DOI: 10.1504/IJICT.2025.149988
International Journal of Information and Communication Technology, 2025 Vol.26 No.41, pp.24-42
Received: 08 Jul 2025
Accepted: 24 Aug 2025
Published online: 20 Nov 2025


