dc.contributor.author |
Badenhorst, Jacob AC
|
|
dc.contributor.author |
De Wet, Febe
|
|
dc.date.accessioned |
2019-10-28T10:19:19Z |
|
dc.date.available |
2019-10-28T10:19:19Z |
|
dc.date.issued |
2019-08 |
|
dc.identifier.citation |
Badenhorst, J.A.C. & De Wet, F. 2019. The usefulness of imperfect speech data for ASR development in low-resource languages. Information, Vol. 10, no. 9, pp. 1-6 |
en_US |
dc.identifier.issn |
2078-2489 |
|
dc.identifier.uri |
https://www.mdpi.com/2078-2489/10/9/268
|
|
dc.identifier.uri |
doi:10.3390/info10090268
|
|
dc.identifier.uri |
http://hdl.handle.net/10204/11196
|
|
dc.description |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
en_US |
dc.description.abstract |
When the National Centre for Human Language Technology (NCHLT) Speech corpus was released, it created various opportunities for speech technology development in the 11 official, but critically under-resourced, languages of South Africa. Since then, the substantial improvements in acoustic modeling that deep architectures achieved for well-resourced languages ushered in a new data requirement: their development requires hundreds of hours of speech. A suitable strategy for the enlargement of speech resources for the South African languages is therefore required. The first possibility was to look for data that has already been collected but has not been included in an existing corpus. Additional data was collected during the NCHLT project that was not included in the official corpus: it only contains a curated, but limited subset of the data. In this paper, we first analyze the additional resources that could be harvested from the auxiliary NCHLT data. We also measure the effect of this data on acoustic modeling. The analysis incorporates recent factorized time-delay neural networks (TDNN-F). These models significantly reduce phone error rates for all languages. In addition, data augmentation and cross-corpus validation experiments for a number of the datasets illustrate the utility of the auxiliary NCHLT data. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
MDPI |
en_US |
dc.relation.ispartofseries |
Workflow;22685 |
|
dc.subject |
Automatic speech recognition |
en_US |
dc.subject |
Kaldi |
en_US |
dc.subject |
Low-resource languages |
en_US |
dc.subject |
Speech data |
en_US |
dc.subject |
Speech technology |
en_US |
dc.subject |
Time-delay neural networks |
en_US |
dc.title |
The usefulness of imperfect speech data for ASR development in low-resource languages |
en_US |
dc.type |
Article |
|
dc.identifier.apacitation |
Badenhorst, J. A., & De Wet, F. (2019). The usefulness of imperfect speech data for ASR development in low-resource languages. http://hdl.handle.net/10204/11196 |
en_ZA |
dc.identifier.chicagocitation |
Badenhorst, Jacob AC, and Febe De Wet. "The usefulness of imperfect speech data for ASR development in low-resource languages." (2019). http://hdl.handle.net/10204/11196 |
en_ZA |
dc.identifier.vancouvercitation |
Badenhorst JA, De Wet F. The usefulness of imperfect speech data for ASR development in low-resource languages. 2019; http://hdl.handle.net/10204/11196. |
en_ZA |
dc.identifier.ris |
TY - Article
AU - Badenhorst, Jacob AC
AU - De Wet, Febe
AB - When the National Centre for Human Language Technology (NCHLT) Speech corpus was released, it created various opportunities for speech technology development in the 11 official, but critically under-resourced, languages of South Africa. Since then, the substantial improvements in acoustic modeling that deep architectures achieved for well-resourced languages ushered in a new data requirement: their development requires hundreds of hours of speech. A suitable strategy for the enlargement of speech resources for the South African languages is therefore required. The first possibility was to look for data that has already been collected but has not been included in an existing corpus. Additional data was collected during the NCHLT project that was not included in the official corpus: it only contains a curated, but limited subset of the data. In this paper, we first analyze the additional resources that could be harvested from the auxiliary NCHLT data. We also measure the effect of this data on acoustic modeling. The analysis incorporates recent factorized time-delay neural networks (TDNN-F). These models significantly reduce phone error rates for all languages. In addition, data augmentation and cross-corpus validation experiments for a number of the datasets illustrate the utility of the auxiliary NCHLT data.
DA - 2019-08
DB - ResearchSpace
DP - CSIR
KW - Automatic speech recognition
KW - Kaldi
KW - Low-resource languages
KW - Speech data
KW - Speech technology
KW - Time-delay neural networks
LK - https://researchspace.csir.co.za
PY - 2019
SN - 2078-2489
T1 - The usefulness of imperfect speech data for ASR development in low-resource languages
TI - The usefulness of imperfect speech data for ASR development in low-resource languages
UR - http://hdl.handle.net/10204/11196
ER -
|
en_ZA |