Title: Incorporation of question segregation procedures in visual question-answering models
Authors: Souvik Chowdhury; Badal Soni; Doli Phukan
Addresses: Computer Science and Engineering Department, National Institute of Technology Silchar, Silchar, 788010, Assam, India ' Computer Science and Engineering Department, National Institute of Technology Silchar, Silchar, 788010, Assam, India ' Computer Science and Engineering Department, National Institute of Technology Silchar, Silchar, 788010, Assam, India
Abstract: Visual question answering (VQA) has several open issues. One is that a model sometimes predicts 'Yes' or 'No' for a question that requires a descriptive answer, or gives a descriptive answer to a question that calls for 'Yes' or 'No'. To address this issue, this paper incorporates a question segregation (QS) technique that classifies questions into three types ('Yes/No', 'Other' and 'Number'). We integrate this technique with two VQA models, stacked attention networks (SAN) and modular co-attention network (MCAN). We evaluate the performance of the QS and SAN models on two datasets, VQA v.2 and CLEVR. We also analyse the impact of question segregation on the performance of these two models on different datasets.
Keywords: VQA; visual question answering; machine learning; deep learning; CNN; convolutional neural network; LSTM; long-short-term memory.
DOI: 10.1504/IJCSM.2024.140859
International Journal of Computing Science and Mathematics, 2024 Vol.20 No.2, pp.99 - 108
Received: 29 Jun 2022
Accepted: 11 Apr 2023
Published online: 03 Sep 2024