Title: Decoupled 3-D object detector

Authors: Moemen Arafa; Ahmed Osama; M. Abdelaziz; Maged Ghoneima; Fernando García; Shady A. Maged

Addresses:
Moemen Arafa: Autotronics Research Lab (ARL), Faculty of Engineering, Ain Shams University, Cairo, Egypt
Ahmed Osama: Autotronics Research Lab (ARL), Faculty of Engineering, Ain Shams University, Cairo, Egypt; Centre of Mobility Research, Faculty of Engineering, Ain Shams University, Cairo, Egypt
M. Abdelaziz: Autotronics Research Lab (ARL), Faculty of Engineering, Ain Shams University, Cairo, Egypt; Automotive Engineering Department, Faculty of Engineering, Ain Shams University, Cairo, Egypt
Maged Ghoneima: Autotronics Research Lab (ARL), Faculty of Engineering, Ain Shams University, Cairo, Egypt; Mechatronics Engineering Department, Faculty of Engineering, Ain Shams University, Cairo, Egypt
Fernando García: Intelligent Systems Laboratory, Universidad Carlos III de Madrid, Leganés, Madrid, Spain
Shady A. Maged: Autotronics Research Lab (ARL), Faculty of Engineering, Ain Shams University, Cairo, Egypt; Mechatronics Engineering Department, Faculty of Engineering, Ain Shams University, Cairo, Egypt

Abstract: This paper proposes an efficient cascaded 3-D object detection architecture. The architecture decouples the 3-D object detection pipeline to maximise the utilisation of the inherent advantages of RGB images and LiDAR point clouds while maintaining low computational complexity. The proposed architecture relies on a cascade of two networks: the first leverages the texture density of images and the maturity of state-of-the-art 2-D object detectors to classify objects in the scene and obtain initial region proposals for them. These proposals are fed to a light-weight secondary network that leverages the compactness of bird-eye-view point cloud representations to perform orientation and size estimation. The 3-D bounding box proposal is constructed by fusing the predictions inferred from both networks, as the predictions lie on orthogonal planes. Evaluated on the KITTI benchmark data set, the proposed method obtains results on par with more complex end-to-end 3-D detection methods while greatly reducing computational and memory requirements. This work also presents results from deployment within a frontal perception pipeline and analyses the challenges faced in that deployment.
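The fusion step described in the abstract can be illustrated with a short sketch. The key idea is that the two networks predict on orthogonal planes: the 2-D detector supplies the object class and (via camera geometry) the vertical extent in the image plane, while the bird-eye-view network supplies the ground-plane position, footprint, and heading. The sketch below is illustrative only; all class and field names (`Box2D`, `BEVEstimate`, `fuse_to_3d`, etc.) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Box2D:
    """Image-plane proposal from the 2-D detector (hypothetical structure)."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: str

@dataclass
class BEVEstimate:
    """Bird-eye-view output of the secondary network (hypothetical structure)."""
    cx: float      # ground-plane centre, lateral (m)
    cz: float      # ground-plane centre, longitudinal (m)
    length: float  # footprint length (m)
    width: float   # footprint width (m)
    yaw: float     # heading angle (rad)

@dataclass
class Box3D:
    """Fused 3-D bounding box proposal."""
    cx: float
    cy: float
    cz: float
    length: float
    width: float
    height: float
    yaw: float
    label: str

def fuse_to_3d(box2d: Box2D, bev: BEVEstimate,
               height: float, cy: float) -> Box3D:
    """Combine the image-plane and ground-plane predictions into one 3-D box.

    Because the two predictions lie on orthogonal planes, fusion is a simple
    composition: footprint, position, and yaw come from the BEV network,
    while the class comes from the 2-D detector and the vertical extent
    (height, vertical centre cy) is recovered from image-plane geometry.
    """
    return Box3D(bev.cx, cy, bev.cz,
                 bev.length, bev.width, height,
                 bev.yaw, box2d.label)

# Example: a car detected in the image, localised on the ground plane.
box = fuse_to_3d(Box2D(100, 50, 200, 150, "Car"),
                 BEVEstimate(3.2, 15.0, 4.1, 1.8, 0.1),
                 height=1.5, cy=0.75)
```

Here the cheap composition of the two predictions is what keeps the pipeline light-weight: no joint 3-D regression head is needed, only the two specialised networks and this fusion step.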

Keywords: autonomous vehicles; CNN; estimation; 3D object detection; perception; point cloud; bird-eye view.

DOI: 10.1504/IJVAS.2022.133008

International Journal of Vehicle Autonomous Systems, 2022 Vol.16 No.2/3/4, pp.143 - 160

Received: 04 Dec 2020
Received in revised form: 09 Jan 2022
Accepted: 26 Jan 2022

Published online: 24 Aug 2023
