Title: FCOS3Dformer: enhancing monocular 3D object detection through transformer-assisted fusion of depth information

Authors: Bingsen Hao; Zhaoxue Deng; Mingze Liu; Can Liu

Addresses: School of Mechatronics and Vehicle Engineering, Chongqing Jiaotong University, Chongqing, 400074, China

Abstract: Existing monocular 3D object detection schemes for autonomous driving rely predominantly on local features and therefore cannot comprehensively capture global depth context, which hinders precise 3D recognition. To address this challenge, this paper proposes FCOS3Dformer, a Transformer-assisted depth information fusion scheme. First, a Transformer-based depth encoder establishes a global depth-guided region on features processed by an adaptive channel-space coordinate attention module, encapsulating both distant and near spatial depth information in the image. A depth decoder then facilitates interactions among object queries and between queries and the global depth features, enabling each object query to estimate global depth distance from the depth-guided region. Additionally, we introduce a multi-object bounding box module that uses pseudo-labels to relax the strict constraints of the original hard labels, improving monocular depth estimation. Experimental evaluations on the KITTI dataset demonstrate the effectiveness of the proposed FCOS3Dformer method.
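The abstract's depth decoder lets each object query interact with the global depth-guided region. The paper's code is not reproduced here; as a minimal illustrative sketch only (all names, shapes, and the single-head formulation are assumptions, not the authors' implementation), that query-to-depth interaction can be read as scaled dot-product cross-attention:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, depth_feats):
    """Hypothetical single-head cross-attention: each object query
    attends to every token of the global depth-guided region and
    returns a depth-aware feature per query."""
    d_k = queries.shape[-1]
    scores = queries @ depth_feats.T / np.sqrt(d_k)  # (num_queries, num_tokens)
    weights = softmax(scores, axis=-1)               # rows sum to 1
    return weights @ depth_feats                     # (num_queries, feat_dim)

# Toy shapes chosen for illustration only.
rng = np.random.default_rng(0)
depth_feats = rng.standard_normal((64, 32))  # global depth tokens from the encoder
queries = rng.standard_normal((10, 32))      # object queries
fused = cross_attention(queries, depth_feats)
print(fused.shape)  # (10, 32)
```

In this reading, the attention weights express how strongly each query draws on near versus distant depth tokens, which is one plausible mechanism for the per-query global depth estimate the abstract describes.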

Keywords: autonomous driving; 3D object detection; local features; Transformer; depth information fusion; adaptive; pseudo-label; depth estimation.

DOI: 10.1504/IJVSMT.2024.142156

International Journal of Vehicle Systems Modelling and Testing, 2024 Vol.18 No.3, pp.228–244

Received: 17 Dec 2023
Accepted: 28 May 2024

Published online: 10 Oct 2024
