Title: A two-stage multimodal speaker location-aware approach in pervasive computing

Authors: Ruo-gui Xiao, Tong-qiang Guo

Addresses: Institute of Artificial Intelligence, School of Computer Science, Zhejiang University, Zheda Road 38, Yuquan Campus, Hangzhou, Zhejiang Province 310027, China; Institute of Artificial Intelligence, School of Computer Science, Zhejiang University, Zheda Road 38, Yuquan Campus, Hangzhou, Zhejiang Province 310027, China

Abstract: Location-aware computing is important in pervasive computing and intelligent video surveillance. We propose a two-stage multimodal approach to locating the active speaker in intelligent environments. First, the human voice is captured as an audio cue to find the approximate orientation of the current speaker. Second, the colour feature of the mouth region is extracted as a visual cue to detect the continuous mouth motion that identifies the active speaker. Speaking recognition is performed by a well-trained hidden Markov model (HMM) based on the colour feature of the mouth region during continuous motion. Experiments show that the proposed multimodal approach is effective for speaker localisation in intelligent indoor environments.
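
Illustrative note: the audio stage described above (the keywords name time delays of arrival, TDOA) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes a single pair of microphones, a far-field source, and plain cross-correlation as the delay estimator; the function names `estimate_delay` and `bearing_from_delay`, the speed-of-sound constant, and all parameters are hypothetical choices made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the paper


def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag (in samples) of sig_b relative to sig_a that maximises
    their cross-correlation, searched over [-max_lag, max_lag].

    A positive result means the sound reached microphone B later than A.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_from_delay(lag_samples, sample_rate, mic_spacing):
    """Far-field bearing in degrees from broadside for a two-microphone pair."""
    tau = lag_samples / sample_rate  # delay in seconds
    # Clamp to the physically valid range before taking the arcsine.
    x = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing))
    return math.degrees(math.asin(x))
```

In a two-stage design of this kind, a coarse bearing such as the one returned by `bearing_from_delay` would only narrow the region in which the visual stage then looks for mouth motion; the arcsine relation holds only under the far-field assumption stated above.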

Keywords: location awareness; multimodal speaker; pervasive computing; TDOA; time delays of arrival; HMM; hidden Markov models; location-aware computing; intelligent video surveillance; active speakers; human voice; speaker orientation; colour features; continuous mouth motion; speaking recognition; security.

DOI: 10.1504/IJCAT.2010.034147

International Journal of Computer Applications in Technology, 2010 Vol.38 No.1/2/3, pp.118 - 123

Received: 31 Jul 2008
Accepted: 15 Dec 2008

Published online: 16 Jul 2010
