Title: Lightweight remote sensing road detection with an attention-augmented transformer
Authors: Feng Deng; Hongyan Tian; Xu Zhao; Duo Han
Addresses: (1, 2, 4) Beijing Key Laboratory of High Dynamic Navigation Technology, Beijing Information Science and Technology University, Beijing, 100192, China; School of Automation, Beijing Information Science and Technology University, Beijing, 100192, China. (3) Beijing Key Laboratory of High Dynamic Navigation Technology, Beijing Information Science and Technology University, Beijing, 100192, China; Key Laboratory of Modern Measurement and Control Technology, Ministry of Education, Beijing Information Science and Technology University, Beijing, 100192, China.
Abstract: Road extraction is a critical task in computer vision. However, accurate road delineation is hampered by multiple factors, e.g., object occlusions and visually similar entities. This study proposes a lightweight road detection model with an attention-augmented transformer, combining an effective encoder-decoder with a semantic extractor to enhance road extraction precision. The encoder optimises MobileNetv3 by improving its squeeze-and-excitation module and bottleneck structure. This modification improves the efficiency of global road feature extraction while simultaneously decreasing parameters and computational demands. Moreover, we present an attention-augmented semantic extractor comprising enhanced transformer blocks that combine depth-wise separable convolutions with an improved multi-head attention and an efficient channel attention mechanism, thus boosting the model's proficiency in capturing long-range dependencies within road semantics. Empirical assessments on the Massachusetts and DeepGlobe road datasets demonstrate that our method outperforms alternative state-of-the-art solutions, attaining mean intersection over union scores of 80.41% and 79.14%, respectively.
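The efficient channel attention (ECA) mechanism cited in the abstract gates each feature channel using a local 1-D interaction over globally pooled channel descriptors. The following NumPy sketch illustrates that general idea only; the kernel-size rule and the fixed averaging weights are illustrative assumptions (in practice the 1-D convolution weights are learned), and the paper's exact variant may differ.

```python
import numpy as np

def eca(x, gamma=2, b=1):
    """Illustrative efficient-channel-attention gate on a feature map x
    of shape (C, H, W). Sketch only; not the paper's exact module."""
    c = x.shape[0]
    # Adaptive 1-D kernel size: nearest odd integer to (log2(C) + b) / gamma
    t = int(abs((np.log2(c) + b) / gamma))
    k = t if t % 2 else t + 1
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    y = x.mean(axis=(1, 2))
    # Local cross-channel interaction: 1-D convolution of width k
    pad = k // 2
    yp = np.pad(y, pad, mode="edge")
    w = np.ones(k) / k  # illustrative fixed weights; learned in practice
    attn = np.array([np.dot(yp[i:i + k], w) for i in range(c)])
    # Excitation: sigmoid gate, broadcast back over the spatial dimensions
    attn = 1.0 / (1.0 + np.exp(-attn))
    return x * attn[:, None, None]

feat = np.random.rand(64, 8, 8)
out = eca(feat)
print(out.shape)  # (64, 8, 8)
```

Because the gate is a sigmoid in (0, 1), the output preserves the input's shape while re-weighting channels, which is what lets such a module be inserted into a transformer block without altering tensor dimensions.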
Keywords: remote sensing imagery; road extraction; LiAT-Net; attention-augmented semantic extractor; improved multi-head attention.
DOI: 10.1504/IJSNET.2024.142717
International Journal of Sensor Networks, 2024 Vol.46 No.4, pp.245 - 259
Received: 14 Jun 2024
Accepted: 01 Jul 2024
Published online: 18 Nov 2024