Open Access Article

Title: Deep learning models combining stereo vision for dance movement evaluation

Authors: Xiaofei Ma

Addresses: School of Music, Nanjing Xiaozhuang University, Nanjing 211171, China

Abstract: Computerised dance movement evaluation has become a prominent research focus as computer vision and deep learning algorithms advance. Traditional dance movement evaluation techniques rely on manual annotation and 2D image analysis, and therefore struggle to capture a dancer's 3D spatial information, producing inaccurate and inconsistent assessments. To address this difficulty, this work presents StereoDance-CNN-Transformer, a dance movement evaluation model that combines stereo vision with deep learning. A convolutional neural network (CNN) extracts spatial features from each image to capture dance posture, while a transformer applies the self-attention mechanism for temporal modelling to capture the temporal dynamics of dance movements. Combining spatial and temporal information enables the model to understand and analyse complex dance movements. Evaluated across several dance styles, StereoDance-CNN-Transformer outperforms conventional techniques in evaluation accuracy, fluency, cross-stylistic generalisation, adaptability, and robustness.
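The abstract's core idea, per-frame spatial features fed through self-attention for temporal modelling, can be illustrated with a minimal sketch. All details here are assumptions for illustration (the paper's actual dimensions, weights, and pooling are not given in the abstract); random vectors stand in for CNN-extracted per-frame features, and a single unmasked self-attention layer stands in for the transformer:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a frame sequence.

    X: (T, d) array of per-frame spatial features (stand-in for CNN output).
    Returns (T, d) temporally contextualised features.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)  # (T, T) attention weights
    return A @ V

rng = np.random.default_rng(0)
T, d = 8, 16  # hypothetical: 8 video frames, 16-dim spatial feature per frame
frames = rng.normal(size=(T, d))            # stand-in for CNN features
Wq, Wk, Wv = (0.1 * rng.normal(size=(d, d)) for _ in range(3))

out = self_attention(frames, Wq, Wk, Wv)    # temporal modelling step
score = out.mean()                          # toy pooled "evaluation" scalar
print(out.shape)
```

In the full model described by the abstract, the attention output would feed an evaluation head rather than a simple mean, and the per-frame features would come from a CNN applied to rectified stereo imagery.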

Keywords: dance movement evaluation; stereo vision; deep learning; convolutional neural network; CNN; transformer.

DOI: 10.1504/IJICT.2025.146102

International Journal of Information and Communication Technology, 2025 Vol.26 No.11, pp.69 - 85

Received: 12 Mar 2025
Accepted: 22 Mar 2025

Published online: 06 May 2025