This paper reports on the recognition of second language (L2) isiXhosa speech produced by beginner-level adult language learners. The speech samples were produced and recorded during the development of a Mobile Assisted Language Learning (MALL) application. The application aimed to provide a means for students to practise their oral skills and improve their pronunciation of isiXhosa. Automatically derived proficiency indicators can enhance MALL applications by enabling Computer Assisted Pronunciation Training (CAPT) and monitoring students' progress. However, the automatic recognition of low-proficiency, non-native speech is a particularly challenging task, especially for under-resourced languages. Data augmentation strategies aim to increase the quantity of training data, improve model robustness and avoid overfitting. In this study, we investigated whether directly adjusting the speed of raw audio signals (simulating additional training speakers) improved phone recognition accuracy for learner data. We present results for subspace Gaussian mixture models (SGMMs) and deep neural networks (DNNs) implemented using Kaldi. The under-resourced system's tendency to overfit on within-corpus test data is clearly illustrated and contrasted with cross-corpus results for non-native data. The speech rate of most language learners is considerably slower than that observed in first language data. Our results indicate that adjusting the speed of the learner data improves phone recognition accuracy.
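The speed adjustment of raw audio described in the abstract can be illustrated with a minimal sketch. The abstract does not state which tooling or speed factors were used, so everything below is an assumption made for illustration: the function name `speed_perturb`, the factors 0.9 and 1.1, the 16 kHz sample rate, the synthetic test tone, and the use of plain linear interpolation for resampling.

```python
import numpy as np


def speed_perturb(signal: np.ndarray, factor: float) -> np.ndarray:
    """Return a copy of `signal` that plays `factor` times faster.

    Hypothetical helper, not the authors' implementation. The signal is
    resampled by linear interpolation and is meant to be played back at
    the original sample rate, so both duration and pitch change.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(signal) / factor))
    # Position in the original signal that each output sample is read from.
    src_positions = np.arange(n_out) * factor
    # np.interp clamps positions beyond the last sample to the final value.
    return np.interp(src_positions, np.arange(len(signal)), signal)


# Assumed example input: one second of a 220 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 220.0 * t)

slow = speed_perturb(tone, 0.9)  # longer signal, slower speech rate
fast = speed_perturb(tone, 1.1)  # shorter signal, faster speech rate
```

Each perturbed copy would be treated as if it came from an additional training speaker, which is the sense in which the abstract speaks of simulating speakers. A factor below 1 lengthens the signal and a factor above 1 shortens it.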
Reference:
Badenhorst, J.A.C. and De Wet, F. 2017. The limitations of data perturbation for ASR of learner data in under-resourced languages. Pattern Recognition Association of South Africa and Robotics and Mechatronics (PRASA-RobMech) 2017, 29 November - 1 December 2017, Bloemfontein, Free State, South Africa. IEEE. http://hdl.handle.net/10204/9981
Copyright: 2017 IEEE. Due to copyright restrictions, the attached PDF file contains the accepted version of the published item. For access to the published version, please consult the publisher's website.