Title: SwinRelTR: an efficient single-stage scene graph generation model for low-resolution images

Authors: Mohammad Essam; Howida A. Shedeed; Mohamed F. Tolba; Dina Khattab

Addresses: Department of Scientific Computing, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt (all authors)

Abstract: Targeting low-resolution imagery is crucial for democratising computer vision technologies, enabling their use in resource-limited environments where high-resolution data is often unavailable. Scene graphs have proven to be a powerful representation for capturing the hierarchical relationships between objects in an input image, providing a structured understanding of the visual scene. Nevertheless, existing scene graph generation models focus on high-resolution images, neglecting the challenges posed by low-resolution ones. This paper presents SwinRelTR, a novel approach for generating scene graphs designed specifically for low-resolution images. The proposed model addresses the limitations associated with low-resolution input by utilising the Swin transformer as a backbone in place of the convolutional neural network of the original RelTR model. The Visual Genome dataset is used to compare SwinRelTR against state-of-the-art approaches. Experiments show that the proposed approach outperforms several state-of-the-art methods, as well as the original RelTR model, on low-resolution images.

Keywords: low-resolution; scene graph; scene graph generation; SGG; visual scene understanding; visual relationship detection.

DOI: 10.1504/IJIEI.2024.138854

International Journal of Intelligent Engineering Informatics, 2024 Vol.12 No.2, pp.169 - 187

Received: 04 Jan 2024
Accepted: 07 Feb 2024

Published online: 31 May 2024
