Open Access Article

Title: An automatic fluency evaluation method for broadcast hosting speech: autoregressive speech LLM

Authors: Bingyuan Li

Addresses: Xiangshan Film and Television College, Ningbo University of Finance and Economics, Ningbo, 315175, China

Abstract: Oral fluency is a key indicator for evaluating the professional skills of broadcast hosts. To address the current research gap in modelling deep semantic associations for spoken fluency, this paper first utilises Res2Net for multiscale feature extraction from broadcast hosts' speech. Subsequently, a pause prediction module is proposed. This module predicts multiple types of pause labels based on the original text. It then predicts a Gaussian mixture distribution for each phoneme and achieves diverse phoneme durations through random sampling. Finally, an autoregressive large language model and a transformer-based discriminative module are proposed. The discriminative module is applied at each time step of the autoregressive process and prevents misalignment through the transformer and a judging mechanism. Experimental results show that the proposed model achieves an evaluation accuracy of 93.35% and a word error rate of 0.7%, enabling high-accuracy fluency evaluation for oral speech.
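
The duration step described in the abstract (a Gaussian mixture per phoneme, sampled at random to obtain varied durations) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the choice to model log-duration in frames, and all mixture parameters below are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_duration(weights, means, stds, rng, min_frames=1):
    """Draw one duration (in frames) from a 1-D Gaussian mixture.

    Assumption: the mixture models log-duration, so the sample is
    exponentiated and rounded to a whole number of frames.
    """
    # Pick one mixture component with probability proportional to its weight.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Sample from the chosen Gaussian component.
    log_d = rng.gauss(means[k], stds[k])
    # Convert back to frames and enforce a minimum duration.
    return max(min_frames, round(math.exp(log_d)))


# Hypothetical per-phoneme mixtures: (weights, means, standard deviations).
phoneme_gmms = {
    "n": ([0.7, 0.3], [1.6, 2.3], [0.2, 0.3]),
    "i": ([0.5, 0.5], [1.8, 2.6], [0.2, 0.2]),
    "h": ([0.6, 0.4], [1.4, 2.0], [0.3, 0.3]),
}


def sample_durations(phonemes, gmms, seed=None):
    """Sample one duration per phoneme; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return [sample_duration(*gmms[p], rng) for p in phonemes]


if __name__ == "__main__":
    # Different seeds are used so the same phoneme sequence can receive
    # different duration patterns on each run.
    for seed in range(3):
        print(sample_durations(["n", "i", "h"], phoneme_gmms, seed=seed))
```

Because a component is chosen first and a value is then drawn from it, repeated sampling for the same phoneme sequence can produce different duration patterns, which is the diversity the abstract refers to.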

Keywords: spoken fluency assessment; feature extraction; Res2Net model; autoregressive large language model; transformer model.

DOI: 10.1504/IJICT.2025.150402

International Journal of Information and Communication Technology, 2025 Vol.26 No.43, pp.78 - 94

Received: 25 Sep 2025
Accepted: 25 Oct 2025

Published online: 12 Dec 2025