Title: A multi features fusion support vector machine for classification of emotion issue in the design of an audio recognition system

Authors: Imen Trabelsi; Med Salim Bouhlel

Addresses: Sciences and Technologies of Image and Telecommunications (SETIT), University of Sfax, Tunisia; Sciences and Technologies of Image and Telecommunications (SETIT), University of Sfax, Tunisia

Abstract: Most state-of-the-art automatic speech emotion recognition systems rely on utterance-level statistics of features. In this study, spoken utterances are represented by a set of statistics from different features computed over all frames. To exploit the complementary emotion-specific information provided by individual feature types (spectral, prosodic and voice quality), an intelligent combination of features is needed. In this work, we use contour-based low-level descriptors to extract features from the emotional data and then fuse the evidence provided by these features. Finally, multi-class SVM modelling is performed directly on the extracted features. The experiments were carried out on the Berlin corpus, which covers six basic emotions (sadness, boredom, disgust, fear, happiness and anger) plus the neutral state (no emotion). The results demonstrate that, on average, features obtained from different information streams and combined at the decision level outperform both single features and features combined at the feature level in terms of classification accuracy.
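The two fusion schemes compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the choice of statistics (mean, standard deviation, minimum, maximum), the score-averaging rule for decision-level fusion, and all function names and numbers are assumptions made for illustration. SVM training is left out; the per-stream class scores are assumed to come from separately trained classifiers.

```python
from statistics import mean, pstdev


def utterance_stats(contour):
    """Collapse a frame-level contour into utterance-level statistics.

    The four statistics used here are an assumed, illustrative choice.
    """
    return [mean(contour), pstdev(contour), min(contour), max(contour)]


def feature_level_fusion(streams):
    """Concatenate the statistics of every stream into one feature vector.

    A single classifier would then be trained on this fused vector.
    """
    fused = []
    for contour in streams.values():
        fused.extend(utterance_stats(contour))
    return fused


def decision_level_fusion(stream_scores):
    """Average per-class scores across streams and pick the best class.

    Each stream is assumed to have its own classifier that outputs one
    score per emotion; averaging is only one possible combination rule.
    """
    classes = list(next(iter(stream_scores.values())).keys())
    averaged = {
        c: mean(scores[c] for scores in stream_scores.values())
        for c in classes
    }
    return max(averaged, key=averaged.get), averaged


# Hypothetical frame-level contours for one utterance.
streams = {
    "pitch": [180.0, 210.0, 195.0, 220.0],
    "loudness": [0.4, 0.6, 0.5, 0.7],
}
fused_vector = feature_level_fusion(streams)

# Hypothetical per-stream classifier scores for the same utterance.
stream_scores = {
    "prosodic": {"anger": 0.6, "sadness": 0.4},
    "spectral": {"anger": 0.7, "sadness": 0.3},
    "voice_quality": {"anger": 0.3, "sadness": 0.7},
}
label, averaged = decision_level_fusion(stream_scores)
```

In the feature-level case one classifier sees a single long vector, whereas in the decision-level case each stream keeps its own classifier and only the outputs are combined.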

Keywords: emotions; SVM; pitch; formants; loudness; MFCC; jitter; shimmer; HNR; features combination.

DOI: 10.1504/IJAPR.2016.079053

International Journal of Applied Pattern Recognition, 2016 Vol.3 No.2, pp.181 - 196

Received: 14 Jan 2016
Accepted: 11 Apr 2016

Published online: 10 Sep 2016
