The performance of trainable speech-processing systems deteriorates significantly when the training and testing data are mismatched. This mismatch becomes a dominant factor when collecting speech data for resource-scarce languages, where one wishes to use any available training data for a variety of purposes. We present research into a new channel normalization (CN) technique for speech recognition under channel mismatch. Inverse linear filtering is used to match the training and testing short-term spectra as closely as possible. Our technique reduces the difference in phoneme recognition error rate between the baseline and mismatched systems to an extent comparable to that obtained with the widely used cepstral mean subtraction. Combining the two techniques gives some additional improvement.
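To make the two ideas in the abstract concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes the channel is a fixed linear filter, so that it appears as a constant additive offset in the log-spectral (and cepstral) domain. The function names `cepstral_mean_subtraction` and `match_log_spectra`, the array shapes, and the synthetic data are all hypothetical and chosen only for illustration. The second function is a simple mean-matching stand-in for inverse filtering, not the specific filter estimation used in the paper.

```python
import numpy as np


def cepstral_mean_subtraction(cepstra):
    """Subtract the per-utterance mean from every cepstral coefficient.

    cepstra: array of shape (num_frames, num_coeffs).
    A fixed linear channel adds a constant vector to every frame,
    so removing the mean over frames removes that constant.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)


def match_log_spectra(test_log_spec, train_mean_log_spec):
    """Shift test log spectra so their long-term mean equals the training mean.

    Assumption (not from the paper): the difference between the long-term
    mean log spectra approximates the log channel response, and subtracting
    it acts as an inverse filter in the log-spectral domain.
    """
    test_log_spec = np.asarray(test_log_spec, dtype=float)
    channel_estimate = test_log_spec.mean(axis=0) - train_mean_log_spec
    return test_log_spec - channel_estimate


# Synthetic illustration: random "features" plus a constant channel offset.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 13))   # stand-in for training-condition features
channel = rng.normal(size=13)        # hypothetical log channel response
distorted = clean + channel          # same speech seen through the channel

cms_clean = cepstral_mean_subtraction(clean)
cms_distorted = cepstral_mean_subtraction(distorted)
restored = match_log_spectra(distorted, clean.mean(axis=0))
```

In this idealized setting both operations remove the channel offset exactly. Real channels and real speech will not satisfy the constant-offset assumption perfectly, which is one plausible reason why combining normalization techniques can still help.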
Reference:
Kleynhans, N. and Barnard, E. 2008. Channel normalization technique for speech recognition in mismatched conditions. Nineteenth Annual Symposium of the Pattern Recognition Association of South Africa (PRASA 2008), Cape Town, South Africa, 27-28 November 2008. http://hdl.handle.net/10204/5541