dc.contributor.author |
Badenhorst, Jacob AC
|
|
dc.contributor.author |
De Wet, Febe
|
|
dc.date.accessioned |
2018-01-23T10:05:07Z |
|
dc.date.available |
2018-01-23T10:05:07Z |
|
dc.date.issued |
2017-11 |
|
dc.identifier.citation |
Badenhorst, J.A.C. and De Wet, F. 2017. The limitations of data perturbation for ASR of learner data in under-resourced languages. Pattern Recognition Association of South Africa and Robotics and Mechatronics (PRASA-RobMech) 2017, 29 November - 1 December 2017, Bloemfontein, Free State, South Africa. |
en_US |
dc.identifier.isbn |
978-1-5386-2315-2 |
|
dc.identifier.uri |
http://ieeexplore.ieee.org/document/8261121/
|
|
dc.identifier.uri |
DOI: 10.1109/RoboMech.2017.8261121
|
|
dc.identifier.uri |
http://hdl.handle.net/10204/9981
|
|
dc.description |
Copyright: 2017 IEEE. Due to copyright restrictions, the attached PDF file contains the accepted version of the published item. For access to the published version, please consult the publisher's website. |
en_US |
dc.description.abstract |
This paper reports on the recognition of second language (L2) isiXhosa speech produced by beginner level adult language learners. The speech samples were produced and recorded during the development of a Mobile Assisted Language Learning (MALL) application. The application aimed to provide a means for students to practise their oral skills and improve their pronunciation of isiXhosa. Automatically derived proficiency indicators can enhance MALL applications by enabling Computer Assisted Pronunciation Training (CAPT) and monitoring students' progress. However, the automatic recognition of low-proficient, non-native speech is a particularly challenging task, especially for under-resourced languages. Data augmentation strategies aim to increase the quantity of training data, improve model robustness and avoid overfitting. In this study we investigated whether directly adjusting the speed of raw audio signals (simulating additional training speakers) improved phone recognition accuracy for learner data. We present results for subspace Gaussian mixture models (SGMMs) and deep neural networks (DNNs) implemented using Kaldi. The under-resourced system's tendency to overfit on within-corpus test data is clearly illustrated and contrasted with cross-corpus results for non-native data. Compared to first language data, the speech rate of most language learners is considerably slower. Our results indicate that adjusting the speed of the learner data improves phone recognition accuracy. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
IEEE |
en_US |
dc.relation.ispartofseries |
Worklist;19988 |
|
dc.subject |
Mobile Assisted Language Learning |
en_US |
dc.subject |
MALL |
en_US |
dc.subject |
Pattern recognition |
en_US |
dc.title |
The limitations of data perturbation for ASR of learner data in under-resourced languages |
en_US |
dc.type |
Conference Presentation |
en_US |
dc.identifier.apacitation |
Badenhorst, J. A., & De Wet, F. (2017). The limitations of data perturbation for ASR of learner data in under-resourced languages. IEEE. http://hdl.handle.net/10204/9981 |
en_ZA |
dc.identifier.chicagocitation |
Badenhorst, Jacob AC, and Febe De Wet. "The limitations of data perturbation for ASR of learner data in under-resourced languages." (2017): http://hdl.handle.net/10204/9981 |
en_ZA |
dc.identifier.vancouvercitation |
Badenhorst JA, De Wet F. The limitations of data perturbation for ASR of learner data in under-resourced languages. IEEE; 2017. http://hdl.handle.net/10204/9981 |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - Badenhorst, Jacob AC
AU - De Wet, Febe
AB - This paper reports on the recognition of second language (L2) isiXhosa speech produced by beginner level adult language learners. The speech samples were produced and recorded during the development of a Mobile Assisted Language Learning (MALL) application. The application aimed to provide a means for students to practise their oral skills and improve their pronunciation of isiXhosa. Automatically derived proficiency indicators can enhance MALL applications by enabling Computer Assisted Pronunciation Training (CAPT) and monitoring students' progress. However, the automatic recognition of low-proficient, non-native speech is a particularly challenging task, especially for under-resourced languages. Data augmentation strategies aim to increase the quantity of training data, improve model robustness and avoid overfitting. In this study we investigated whether directly adjusting the speed of raw audio signals (simulating additional training speakers) improved phone recognition accuracy for learner data. We present results for subspace Gaussian mixture models (SGMMs) and deep neural networks (DNNs) implemented using Kaldi. The under-resourced system's tendency to overfit on within-corpus test data is clearly illustrated and contrasted with cross-corpus results for non-native data. Compared to first language data, the speech rate of most language learners is considerably slower. Our results indicate that adjusting the speed of the learner data improves phone recognition accuracy.
DA - 2017-11
DB - ResearchSpace
DP - CSIR
KW - Mobile Assisted Language Learning
KW - MALL
KW - Pattern recognition
LK - https://researchspace.csir.co.za
PY - 2017
SN - 978-1-5386-2315-2
T1 - The limitations of data perturbation for ASR of learner data in under-resourced languages
TI - The limitations of data perturbation for ASR of learner data in under-resourced languages
UR - http://hdl.handle.net/10204/9981
ER -
|
en_ZA |