
Title: Intelligent generation of robotic dance motions via convolution-enhanced transformer networks

Authors: Fei Yue; Jing Tong; Zhen Ren

Addresses: Hubei Engineering University, Xiaogan 432000, China; Hubei University of Automotive Technology, Shiyan 442000, China; Wuhan Conservatory of Music, Wuhan 430000, China

Abstract: Generating robotic dance movements is a complex and challenging task. To address limitations of current research, such as poor coherence in movement sequences and low generation efficiency, this paper proposes a depthwise separable convolution-enhanced Transformer network (DSFormer). DSFormer significantly reduces the number of parameters while improving the model's computational efficiency. Building on DSFormer, the paper further develops a music encoder, a robot dance movement encoder, and a cross-modal generator. These components capture both the local spatial features and the global temporal characteristics of music and robotic dance motion sequences, thereby mitigating the adverse effects of noisy data. Experimental comparisons on real-world datasets show that the proposed method reduces the Fréchet Inception Distance (FID) by at least 21.64% relative to the baseline approaches. The method generates high-quality dance motions while maintaining precise synchronisation with the music.
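The parameter savings attributed to DSFormer come from depthwise separable convolution, which factorises a standard convolution into a per-channel (depthwise) temporal filter followed by a 1×1 (pointwise) channel-mixing step. The sketch below illustrates that general technique only, assuming 1-D convolutions along the time axis of a feature sequence; the channel sizes and all function names are hypothetical and are not taken from the paper, whose actual layer configuration is not given in this abstract.

```python
"""Illustrative sketch of depthwise separable 1-D convolution.

All names and sizes here are hypothetical; they are not DSFormer's
actual configuration. As in deep-learning frameworks, "convolution"
is implemented as cross-correlation.
"""


def standard_conv1d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    # Each of the c_out filters spans all c_in channels over k time steps.
    return c_out * c_in * k + (c_out if bias else 0)


def depthwise_separable_conv1d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    # Depthwise stage: one k-tap temporal filter per input channel.
    depthwise = c_in * k + (c_in if bias else 0)
    # Pointwise stage: a 1x1 convolution that mixes channels.
    pointwise = c_out * c_in + (c_out if bias else 0)
    return depthwise + pointwise


def depthwise_separable_conv1d(x, dw_weights, pw_weights):
    """Forward pass with 'same' zero padding (odd k) and no bias.

    x:          [time][channel] input sequence
    dw_weights: [channel][k]    one temporal filter per channel
    pw_weights: [c_out][channel] 1x1 channel-mixing weights
    """
    t_len, c_in = len(x), len(x[0])
    k = len(dw_weights[0])
    pad = k // 2
    # Depthwise: filter each channel independently along time.
    dw_out = [[0.0] * c_in for _ in range(t_len)]
    for t in range(t_len):
        for c in range(c_in):
            acc = 0.0
            for j in range(k):
                src = t + j - pad
                if 0 <= src < t_len:  # zero padding outside the sequence
                    acc += dw_weights[c][j] * x[src][c]
            dw_out[t][c] = acc
    # Pointwise: combine channels independently at each time step.
    return [
        [sum(w[c] * dw_out[t][c] for c in range(c_in)) for w in pw_weights]
        for t in range(t_len)
    ]


if __name__ == "__main__":
    c, k = 256, 3  # hypothetical layer width and kernel size
    std = standard_conv1d_params(c, c, k)
    dws = depthwise_separable_conv1d_params(c, c, k)
    print(std, dws)  # 196864 66816
```

With 256 channels and a 3-tap kernel, the factorised layer needs roughly a third of the parameters of the standard one. The saving grows with kernel size because the k-dependent term no longer multiplies both channel counts.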

Keywords: dance action generation; depthwise separable convolution; Transformer model; music encoder; cross-modal generator.

DOI: 10.1504/IJICT.2025.147764

International Journal of Information and Communication Technology, 2025, Vol. 26, No. 30, pp. 97–112

Received: 02 Jun 2025
Accepted: 16 Jun 2025
Published online: 30 Jul 2025
