This paper demonstrates the application of Intrinsic Spectral Analysis (ISA) to low-resource Automatic Speech Recognition (ASR). State-of-the-art speech recognition systems, which require large amounts of task-specific training data, fail to reliably model feature distributions in resource-impoverished settings. We address this issue in the front-end, where we learn an intrinsic subspace that can replace a traditional feature space such as mel-frequency cepstral coefficients (MFCCs). We use ISA features in under-resourced settings to model the acoustic feature distribution with less complexity. We also propose combining intrinsic features with extrinsic ones to take advantage of both subspaces. Experimental results for a phone recognition task on Afrikaans show that combining the intrinsic and extrinsic subspaces improves performance compared to conventional features.
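The abstract does not spell out how the intrinsic subspace is learned or how the two feature spaces are combined. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the intrinsic subspace is a Laplacian-eigenmap-style embedding of MFCC frames computed from a k-nearest-neighbour heat-kernel graph. It uses kernel ridge regression as a simplified stand-in for an out-of-sample extension to unseen frames, and it assumes that "combining" means frame-wise concatenation of extrinsic (MFCC) and intrinsic features. All function names, the parameters `k`, `sigma`, `ridge` and `n_dims`, and the synthetic data are hypothetical.

```python
import numpy as np


def heat_kernel(A, B, sigma):
    """Gaussian (heat) kernel between the rows of A and the rows of B."""
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))


def knn_graph(X, k, sigma):
    """Symmetric k-nearest-neighbour graph with heat-kernel edge weights."""
    W = heat_kernel(X, X, sigma)
    np.fill_diagonal(W, 0.0)
    # The k largest weights in each row correspond to the k nearest neighbours.
    idx = np.argsort(-W, axis=1)[:, :k]
    mask = np.zeros(W.shape, dtype=bool)
    mask[np.arange(W.shape[0])[:, None], idx] = True
    mask = mask | mask.T  # symmetrise the neighbourhood relation
    return np.where(mask, W, 0.0)


def fit_intrinsic(X, n_dims=3, k=10, sigma=1.0, ridge=1e-3):
    """Hypothetical ISA-like fit: Laplacian eigenmap plus a kernel extension.

    Solves L y = lambda D y through the normalised Laplacian
    I - D^{-1/2} W D^{-1/2}, drops the trivial eigenvector, and then fits
    kernel ridge weights so that unseen frames can be mapped (a simplification).
    """
    W = knn_graph(X, k, sigma)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, evecs = np.linalg.eigh(L_sym)               # eigenvalues in ascending order
    Y = d_inv_sqrt[:, None] * evecs[:, 1:n_dims + 1]  # generalised eigenvectors
    K = heat_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X)), Y)
    return X, alpha, sigma


def transform(model, X_new):
    """Map new frames into the learned intrinsic subspace."""
    X_train, alpha, sigma = model
    return heat_kernel(X_new, X_train, sigma) @ alpha


def combine(extrinsic, intrinsic):
    """Assumed combination: concatenate extrinsic and intrinsic features per frame."""
    return np.hstack([extrinsic, intrinsic])


# Usage on synthetic stand-ins for 13-dimensional MFCC frames (not real speech).
rng = np.random.default_rng(0)
mfcc_train = rng.normal(size=(200, 13))
model = fit_intrinsic(mfcc_train, n_dims=3, k=10, sigma=3.0)
mfcc_test = rng.normal(size=(5, 13))
features = combine(mfcc_test, transform(model, mfcc_test))
print(features.shape)  # (5, 16)
```

Concatenation keeps the original MFCC dimensions intact and appends a few intrinsic coordinates. This matches the abstract's idea of exploiting both subspaces, although the paper's actual combination scheme may differ.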
Reference:
Sahraeian, R., Van Compernolle, D. and De Wet, F. 2014. On using intrinsic spectral analysis for low-resource languages. In: 4th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU 2014), St. Petersburg Institute for Informatics and Automation, St. Petersburg, Russia, 14-16 May 2014. http://hdl.handle.net/10204/7527