Title: Two-stream attention network with local and non-local dependence for referring relationships

Authors: Jincheng Hu; Tao Wang

Addresses: School of Information Management, Shanghai Lixin University of Accounting and Finance, Shanghai, 201209, China; School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai, 200433, China

Abstract: Referring relationships is a task that aims to retrieve entities in an image conditioned on a given structured relationship text. Previous works use convolutional-network-based models to learn image features conditioned on the text and then classify the pixels of the image for retrieval. However, a convolutional network concentrates on local dependence in the image and does not model correlations between pixels that are far apart. In this paper, a two-stream attention network (TSAN) is proposed to address this issue. First, cross-modal feature maps are initialised by adding text information to each pixel of the visual feature maps. Then, TSAN applies attention mechanisms to the local and the non-local dependence in the cross-modal features to obtain two probability maps. Finally, the network integrates the two probability maps to predict the localisation probability. Experiments on three datasets show that the proposed TSAN achieves better performance than several strong competing models.
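The three steps described in the abstract (cross-modal initialisation, two attention streams, fusion of two probability maps) can be illustrated with a minimal NumPy sketch. It is not the authors' TSAN implementation. The function names and parameters (cross_modal_features, local_stream, non_local_stream, tsan_sketch and the weight dictionary) are hypothetical. The local stream is assumed to be a 3x3 neighbourhood average followed by a linear scoring vector, standing in for learned convolutions. The non-local stream is assumed to be scaled dot-product self-attention over all pixel pairs. The two maps are assumed to be fused by a simple average.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_features(visual, text):
    """Add the text embedding to every pixel: visual (H, W, C), text (C,) -> (H, W, C)."""
    return visual + text[None, None, :]  # broadcasting repeats the text vector per pixel

def local_stream(feat, w_local):
    """Local dependence (assumed): mean over each 3x3 neighbourhood, then a per-pixel score."""
    H, W, C = feat.shape
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)))  # zero padding keeps the H x W size
    agg = np.zeros_like(feat)
    for dy in range(3):
        for dx in range(3):
            agg += padded[dy:dy + H, dx:dx + W, :]
    agg /= 9.0
    return sigmoid(agg @ w_local)  # (H, W) probability map

def non_local_stream(feat, w_q, w_k, w_score):
    """Non-local dependence (assumed): every pixel attends to every other pixel."""
    H, W, C = feat.shape
    x = feat.reshape(H * W, C)
    q, k = x @ w_q, x @ w_k                              # (HW, d) queries and keys
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # (HW, HW) pairwise weights
    ctx = attn @ x + x                                   # attended context plus residual
    return sigmoid(ctx @ w_score).reshape(H, W)          # (H, W) probability map

def tsan_sketch(visual, text, params):
    feat = cross_modal_features(visual, text)
    p_local = local_stream(feat, params["w_local"])
    p_nonlocal = non_local_stream(feat, params["w_q"], params["w_k"], params["w_score"])
    return 0.5 * (p_local + p_nonlocal)  # hypothetical fusion: simple average

rng = np.random.default_rng(0)
H, W, C, d = 8, 8, 16, 8
params = {
    "w_local": rng.standard_normal(C),
    "w_q": rng.standard_normal((C, d)),
    "w_k": rng.standard_normal((C, d)),
    "w_score": rng.standard_normal(C),
}
prob = tsan_sketch(rng.standard_normal((H, W, C)), rng.standard_normal(C), params)
print(prob.shape)  # (8, 8)
```

The design point the sketch makes concrete is the contrast in receptive field: each output of local_stream depends only on a pixel's 3x3 neighbourhood, whereas the (HW, HW) attention matrix in non_local_stream lets every pixel draw on every other pixel, at quadratic cost in the number of pixels.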

Keywords: two-stream attention network; TSAN; local and non-local dependence; referring relationships; cross-modal.

DOI: 10.1504/IJES.2022.122059

International Journal of Embedded Systems, 2022 Vol.15 No.1, pp.53 - 60

Received: 04 Feb 2021
Accepted: 14 Mar 2021

Published online: 08 Apr 2022
