The usefulness of imperfect speech data for ASR development in low-resource languages

Badenhorst, Jacob AC; De Wet, Febe

dc.contributor.author	Badenhorst, Jacob AC
dc.contributor.author	De Wet, Febe
dc.date.accessioned	2019-10-28T10:19:19Z
dc.date.available	2019-10-28T10:19:19Z
dc.date.issued	2019-08
dc.identifier.citation	Badenhorst, J.A.C. & De Wet, F. 2019. The usefulness of imperfect speech data for ASR development in low-resource languages. Information, Vol. 10, no. 9, pp. 1-6	en_US
dc.identifier.issn	2078-2489
dc.identifier.uri	https://www.mdpi.com/2078-2489/10/9/268
dc.identifier.uri	Doi:10.3390/info10090268
dc.identifier.uri	http://hdl.handle.net/10204/11196
dc.description	© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).	en_US
dc.description.abstract	When the National Centre for Human Language Technology (NCHLT) Speech corpus was released, it created various opportunities for speech technology development in the 11 official, but critically under-resourced, languages of South Africa. Since then, the substantial improvements in acoustic modeling that deep architectures achieved for well-resourced languages ushered in a new data requirement: their development requires hundreds of hours of speech. A suitable strategy for the enlargement of speech resources for the South African languages is therefore required. The first possibility was to look for data that has already been collected but has not been included in an existing corpus. Additional data was collected during the NCHLT project that was not included in the official corpus: it only contains a curated, but limited subset of the data. In this paper, we first analyze the additional resources that could be harvested from the auxiliary NCHLT data. We also measure the effect of this data on acoustic modeling. The analysis incorporates recent factorized time-delay neural networks (TDNN-F). These models significantly reduce phone error rates for all languages. In addition, data augmentation and cross-corpus validation experiments for a number of the datasets illustrate the utility of the auxiliary NCHLT data.	en_US
dc.language.iso	en	en_US
dc.publisher	MDPI	en_US
dc.relation.ispartofseries	Workflow;22685
dc.subject	Automatic speech recognition	en_US
dc.subject	Kaldi	en_US
dc.subject	Low-resource languages	en_US
dc.subject	Speech data	en_US
dc.subject	Speech technology	en_US
dc.subject	Time-delay neural networks	en_US
dc.title	The usefulness of imperfect speech data for ASR development in low-resource languages	en_US
dc.type	Article
dc.identifier.apacitation	Badenhorst, J. A., & De Wet, F. (2019). The usefulness of imperfect speech data for ASR development in low-resource languages. http://hdl.handle.net/10204/11196	en_ZA
dc.identifier.chicagocitation	Badenhorst, Jacob AC, and Febe De Wet. "The usefulness of imperfect speech data for ASR development in low-resource languages." (2019). http://hdl.handle.net/10204/11196	en_ZA
dc.identifier.vancouvercitation	Badenhorst JA, De Wet F. The usefulness of imperfect speech data for ASR development in low-resource languages. 2019; http://hdl.handle.net/10204/11196.	en_ZA
dc.identifier.ris	TY - Article AU - Badenhorst, Jacob AC AU - De Wet, Febe AB - When the National Centre for Human Language Technology (NCHLT) Speech corpus was released, it created various opportunities for speech technology development in the 11 official, but critically under-resourced, languages of South Africa. Since then, the substantial improvements in acoustic modeling that deep architectures achieved for well-resourced languages ushered in a new data requirement: their development requires hundreds of hours of speech. A suitable strategy for the enlargement of speech resources for the South African languages is therefore required. The first possibility was to look for data that has already been collected but has not been included in an existing corpus. Additional data was collected during the NCHLT project that was not included in the official corpus: it only contains a curated, but limited subset of the data. In this paper, we first analyze the additional resources that could be harvested from the auxiliary NCHLT data. We also measure the effect of this data on acoustic modeling. The analysis incorporates recent factorized time-delay neural networks (TDNN-F). These models significantly reduce phone error rates for all languages. In addition, data augmentation and cross-corpus validation experiments for a number of the datasets illustrate the utility of the auxiliary NCHLT data. DA - 2019-08 DB - ResearchSpace DP - CSIR KW - Automatic speech recognition KW - Kaldi KW - Low-resource languages KW - Speech data KW - Speech technology KW - Time-delay neural networks LK - https://researchspace.csir.co.za PY - 2019 SM - 2078-2489 T1 - The usefulness of imperfect speech data for ASR development in low-resource languages TI - The usefulness of imperfect speech data for ASR development in low-resource languages UR - http://hdl.handle.net/10204/11196 ER -	en_ZA

Files in this item

Name: Badenhorst_2019.pdf

Size: 440.9Kb

Format: PDF

This item appears in the following Collection(s)

Journal Articles
