ResearchSpace

The limitations of data perturbation for ASR of learner data in under-resourced languages

dc.contributor.author Badenhorst, Jacob AC
dc.contributor.author De Wet, Febe
dc.date.accessioned 2018-01-23T10:05:07Z
dc.date.available 2018-01-23T10:05:07Z
dc.date.issued 2017-11
dc.identifier.citation Badenhorst, J.A.C. and De Wet, F. 2017. The limitations of data perturbation for ASR of learner data in under-resourced languages. Pattern Recognition Association of South Africa and Robotics and Mechatronics (PRASA-RobMech) 2017, 29 November - 1 December 2017, Bloemfontein, Free State, South Africa en_US
dc.identifier.isbn 978-1-5386-2315-2
dc.identifier.uri http://ieeexplore.ieee.org/document/8261121/
dc.identifier.uri DOI: 10.1109/RoboMech.2017.8261121
dc.identifier.uri http://hdl.handle.net/10204/9981
dc.description Copyright: 2017 IEEE. Due to copyright restrictions, the attached PDF file contains the accepted version of the published item. For access to the published version, please consult the publisher's website. en_US
dc.description.abstract This paper reports on the recognition of second language (L2) isiXhosa speech produced by beginner level adult language learners. The speech samples were produced and recorded during the development of a Mobile Assisted Language Learning (MALL) application. The application aimed to provide a means for students to practise their oral skills and improve their pronunciation of isiXhosa. Automatically derived proficiency indicators can enhance MALL applications by enabling Computer Assisted Pronunciation Training (CAPT) and monitoring students' progress. However, the automatic recognition of low-proficient, non-native speech is a particularly challenging task, especially for under-resourced languages. Data augmentation strategies aim to increase the quantity of training data, improve model robustness and avoid overfitting. In this study we investigated whether directly adjusting the speed of raw audio signals (simulating additional training speakers) improved phone recognition accuracy for learner data. We present results for subspace Gaussian mixture models (SGMMs) and deep neural networks (DNNs) implemented using Kaldi. The under-resourced system's tendency to overfit on within-corpus test data is clearly illustrated and contrasted with cross-corpus results for non-native data. Compared to first language data, the speech rate of most language learners is considerably slower. Our results indicate that adjusting the speed of the learner data improves phone recognition accuracy. en_US
dc.language.iso en en_US
dc.publisher IEEE en_US
dc.relation.ispartofseries Worklist;19988
dc.subject Mobile Assisted Language Learning en_US
dc.subject MALL en_US
dc.subject Pattern recognition en_US
dc.title The limitations of data perturbation for ASR of learner data in under-resourced languages en_US
dc.type Conference Presentation en_US
dc.identifier.apacitation Badenhorst, J. A., & De Wet, F. (2017). The limitations of data perturbation for ASR of learner data in under-resourced languages. IEEE. http://hdl.handle.net/10204/9981 en_ZA
dc.identifier.chicagocitation Badenhorst, Jacob AC, and Febe De Wet. "The limitations of data perturbation for ASR of learner data in under-resourced languages." (2017): http://hdl.handle.net/10204/9981 en_ZA
dc.identifier.vancouvercitation Badenhorst JA, De Wet F, The limitations of data perturbation for ASR of learner data in under-resourced languages; IEEE; 2017. http://hdl.handle.net/10204/9981 . en_ZA
dc.identifier.ris TY - Conference Presentation AU - Badenhorst, Jacob AC AU - De Wet, Febe AB - This paper reports on the recognition of second language (L2) isiXhosa speech produced by beginner level adult language learners. The speech samples were produced and recorded during the development of a Mobile Assisted Language Learning (MALL) application. The application aimed to provide a means for students to practise their oral skills and improve their pronunciation of isiXhosa. Automatically derived proficiency indicators can enhance MALL applications by enabling Computer Assisted Pronunciation Training (CAPT) and monitoring students' progress. However, the automatic recognition of low-proficient, non-native speech is a particularly challenging task, especially for under-resourced languages. Data augmentation strategies aim to increase the quantity of training data, improve model robustness and avoid overfitting. In this study we investigated whether directly adjusting the speed of raw audio signals (simulating additional training speakers) improved phone recognition accuracy for learner data. We present results for subspace Gaussian mixture models (SGMMs) and deep neural networks (DNNs) implemented using Kaldi. The under-resourced system's tendency to overfit on within-corpus test data is clearly illustrated and contrasted with cross-corpus results for non-native data. Compared to first language data, the speech rate of most language learners is considerably slower. Our results indicate that adjusting the speed of the learner data improves phone recognition accuracy. DA - 2017-11 DB - ResearchSpace DP - CSIR KW - Mobile Assisted Language Learning KW - MALL KW - Pattern recognition LK - https://researchspace.csir.co.za PY - 2017 SM - 978-1-5386-2315-2 T1 - The limitations of data perturbation for ASR of learner data in under-resourced languages TI - The limitations of data perturbation for ASR of learner data in under-resourced languages UR - http://hdl.handle.net/10204/9981 ER - en_ZA
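
The abstract describes adjusting the speed of raw audio signals to simulate additional training speakers. The idea can be illustrated with a minimal sketch that resamples a list of samples by linear interpolation. This is not the authors' implementation (the record states only that the models were built with Kaldi). The function name `perturb_speed` and the factors 0.9 and 1.1 are assumptions chosen for illustration, and the record does not say which factors were used in the study.

```python
import math


def perturb_speed(samples, factor):
    """Resample `samples` so that playback at the original sample rate
    is `factor` times faster (factor < 1 slows the audio down).

    Hypothetical helper for illustration only. Plain resampling changes
    pitch together with duration.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    n_out = max(1, round(len(samples) / factor))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * factor          # fractional read position in the input
        lo = int(pos)
        if lo >= last:            # past the final sample: hold its value
            out.append(float(samples[last]))
            continue
        frac = pos - lo
        # Linear interpolation between the two neighbouring samples.
        out.append((1.0 - frac) * samples[lo] + frac * samples[lo + 1])
    return out


# A 100-sample toy signal stands in for a real recording.
signal = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]

# Illustrative factors only: one slower copy and one faster copy.
slower = perturb_speed(signal, 0.9)
faster = perturb_speed(signal, 1.1)
print(len(signal), len(slower), len(faster))
```

Each perturbed copy would be added to the training set alongside the original, which is how this kind of augmentation increases the quantity of training data. A production pipeline would normally use a proper band-limited resampler rather than linear interpolation.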

