Title: Incorporation of question segregation procedures in visual question-answering models

Authors: Souvik Chowdhury; Badal Soni; Doli Phukan

Addresses: Computer Science and Engineering Department, National Institute of Technology Silchar, Silchar, 788010, Assam, India; Computer Science and Engineering Department, National Institute of Technology Silchar, Silchar, 788010, Assam, India; Computer Science and Engineering Department, National Institute of Technology Silchar, Silchar, 788010, Assam, India

Abstract: Visual question answering (VQA) has several open issues. One is that a model sometimes predicts 'Yes' or 'No' for a question that requires a descriptive answer, or gives a descriptive answer to a yes/no question, so the answer does not relate to the question. To address this issue, this paper incorporates a question segregation (QS) technique that classifies questions into three types ('Yes/No', 'Other' and 'Number'). We integrate this technique with two VQA models, stacked attention networks (SAN) and modular co-attention network (MCAN). We evaluate the performance of the QS and SAN models on two datasets, VQA v.2 and CLEVR. We also study and analyse the impact of question segregation on the performance of these two models on different datasets.

Keywords: VQA; visual question answering; machine learning; deep learning; CNN; convolutional neural network; LSTM; long short-term memory.

DOI: 10.1504/IJCSM.2024.140859

International Journal of Computing Science and Mathematics, 2024 Vol.20 No.2, pp.99 - 108

Received: 29 Jun 2022
Accepted: 11 Apr 2023

Published online: 03 Sep 2024
