Title: Piano performance beat assessment: integrating transformer with multimodal feature learning
Authors: Jun Deng
Addresses: School of Arts, Shandong Management University, Jinan, 250000, China
Abstract: This paper proposes PianoTrans-Fusion, a piano performance beat assessment system that integrates the transformer architecture with multimodal feature learning. The system extracts and preprocesses features separately from three modalities (audio, video, and MIDI), then captures fine-grained temporal dependencies in the performance rhythm through multimodal fusion strategies and transformer-based processing modules. In comparative experiments on the MAESTRO dataset, PianoTrans-Fusion outperforms five baseline methods, improving rhythm consistency to 0.032 and reducing beat error to 0.071. Ablation experiments further verify the key roles of the transformer, multimodal fusion, and self-attention mechanisms. The results indicate that the system offers advantages in accuracy and robustness for beat evaluation, and has application value in intelligent piano accompaniment, music education, and automated performance feedback.
Keywords: transformer; multimodal feature learning; piano performance; beat assessment.
DOI: 10.1504/IJICT.2025.149992
International Journal of Information and Communication Technology, 2025, Vol. 26, No. 41, pp. 74-90
Received: 12 Aug 2025
Accepted: 26 Sep 2025
Published online: 20 Nov 2025


