Authors: Ruo-gui Xiao, Tong-qiang Guo
Addresses: Institute of Artificial Intelligence, School of Computer Science, Zhejiang University, Zheda Road 38, Yuquan Campus, Hangzhou, Zhejiang Province 310027, China (both authors)
Abstract: Location-aware computing is important in pervasive computing and intelligent video surveillance. We propose a two-stage multimodal approach for locating the active speaker in intelligent environments. First, the human voice is captured as an audio cue to find the approximate orientation of the current speaker. Second, the colour feature of the mouth region is extracted as a visual cue to detect the continuous mouth motion that identifies the active speaker. Speaking is recognised by a well-trained hidden Markov model (HMM) applied to the colour feature of the mouth region during continuous motion. Experiments show that the proposed multimodal approach is effective for speaker localisation in intelligent indoor environments.
Keywords: location awareness; multimodal speaker; pervasive computing; TDOA; time delays of arrival; HMM; hidden Markov models; location-aware computing; intelligent video surveillance; active speakers; human voice; speaker orientation; colour features; continuous mouth motion; speaking recognition; security.
International Journal of Computer Applications in Technology, 2010 Vol.38 No.1/2/3, pp.118 - 123
Received: 31 Jul 2008
Accepted: 15 Dec 2008
Published online: 16 Jul 2010
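
Illustrative note: the abstract and keywords name time delay of arrival (TDOA) as the audio cue for the first stage but do not say how the delay is computed or how the microphones are arranged. The sketch below is therefore a generic illustration and not the authors' method. It assumes a hypothetical two-microphone array, a far-field source, a speed of sound of 343 m/s, and a brute-force cross-correlation search; the function names, sample rate, and microphone spacing are all invented for the example.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def estimate_delay_samples(sig_a, sig_b, max_lag):
    """Return the lag (in samples) at which sig_b best matches sig_a.

    A positive result means sig_b is a delayed copy of sig_a.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag: sum of sig_a[n] * sig_b[n + lag]
        score = 0.0
        for n, a in enumerate(sig_a):
            m = n + lag
            if 0 <= m < len(sig_b):
                score += a * sig_b[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle_deg(lag, sample_rate, mic_spacing):
    """Convert a sample lag to a far-field arrival angle in degrees.

    Assumes two microphones mic_spacing metres apart, so that
    delay = mic_spacing * sin(angle) / SPEED_OF_SOUND.
    """
    delay = lag / sample_rate
    ratio = SPEED_OF_SOUND * delay / mic_spacing
    ratio = max(-1.0, min(1.0, ratio))  # guard against |ratio| > 1 from noise
    return math.degrees(math.asin(ratio))


# Synthetic check: the second channel is the first delayed by 5 samples.
random.seed(0)
source = [random.gauss(0.0, 1.0) for _ in range(512)]
delayed = [0.0] * 5 + source[:-5]
lag = estimate_delay_samples(source, delayed, max_lag=20)
angle = delay_to_angle_deg(lag, sample_rate=16000, mic_spacing=0.3)
```

An angle estimated this way gives only an approximate orientation, which is consistent with the abstract's use of a second, visual stage to confirm which person is actually speaking.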