Title: A vector processing method applicable to convolutional algorithms executed in digital signal processors

Authors: Shuying Wang; Yonghua Hu; Anxing Xie; Huixiang Li; Xin Zhang; Shangfeng Mo

Addresses (all authors): School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan, China; Hunan Key Laboratory for Service Computing and Novel Software Technology, Xiangtan, Hunan, China

Abstract: A vector processing method for one-dimensional discrete convolution is presented for high-performance vector digital signal processors (DSPs) that have an on-chip vector cache and a vector shuffling unit. The method combines the characteristics of the hardware with the basic principle of the algorithm. Instead of calculating the result elements one after another, it accumulates values for multiple result elements simultaneously. In this process, each element of the convolution kernel is expanded into a vector by shuffling. The method has concise and clear computing logic. It avoids redundant memory accesses and repeated additions and multiplications, and it can handle a convolution kernel of arbitrary length. Experimental results on an FT-M7002-based platform show that the average speed-up ratios of the algorithm for single- and double-precision floating-point data reach 3.4 and 7.8, respectively, compared to the corresponding TMS320C66x library function in CCS.
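
The accumulation scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and uses no FT-M7002 instructions: Python lists stand in for vector registers, and the lane count `VLEN` and the helper names `broadcast` and `conv1d_vectorized` are hypothetical, chosen only for illustration. Each block of `VLEN` outputs is accumulated at once, with every kernel element expanded across all lanes before the multiply-accumulate, and the kernel is assumed to be non-empty.

```python
VLEN = 4  # assumed number of vector lanes (hypothetical, not from the paper)


def broadcast(value, lanes=VLEN):
    """Stand-in for a shuffle that copies one kernel element to every lane."""
    return [value] * lanes


def conv1d_vectorized(x, h, lanes=VLEN):
    """Full 1-D convolution, accumulating `lanes` outputs per block.

    Assumes h is non-empty. Lists emulate vector registers.
    """
    n_out = len(x) + len(h) - 1
    pad = len(h) - 1                      # leading zeros so x[n - k] is always valid
    n_blocks = -(-n_out // lanes)         # ceiling division
    # Zero-pad front and back so every lane-wide load stays in bounds.
    padded = [0.0] * pad + list(x) + [0.0] * (n_blocks * lanes - len(x))

    y = []
    for b in range(n_blocks):
        base = b * lanes
        acc = [0.0] * lanes               # one accumulator lane per output element
        for k, hk in enumerate(h):        # loop length follows the kernel length
            hv = broadcast(hk, lanes)     # kernel element expanded to a vector
            start = base + pad - k        # lane j reads x[base + j - k]
            xv = padded[start:start + lanes]
            acc = [a + c * d for a, c, d in zip(acc, hv, xv)]
        y.extend(acc)
    return y[:n_out]                      # drop lanes past the last real output
```

In this sketch the inner loop runs once per kernel element, so the kernel length is not tied to the lane count, which mirrors the abstract's claim that kernels of arbitrary length are supported.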

Keywords: vector processor; parallel algorithms; single instruction multiple data; SIMD; shuffle; one-dimensional convolution.

DOI: 10.1504/IJES.2022.125441

International Journal of Embedded Systems, 2022 Vol.15 No.4, pp.344 - 353

Received: 20 Dec 2021
Received in revised form: 11 Mar 2022
Accepted: 10 Apr 2022

Published online: 09 Sep 2022
