The NCHLT speech corpus of the South African languages

Barnard, E; Davel, MH; Van Heerden, C; De Wet, Febe; Badenhorst, J

http://mica.edu.vn/sltu2014/proceedings/28.pdf
http://hdl.handle.net/10204/7549

Abstract:

The NCHLT speech corpus contains wide-band speech from approximately 200 speakers per language, in each of the eleven official languages of South Africa. We describe the design and development processes that were undertaken in order to develop the corpus, and report on associated materials such as orthographic transcriptions and pronunciation dictionaries that were released as part of the corpus. In order to benchmark speech recognition performance on the corpus, we have also developed both phone-recognition and word-recognition systems for all eleven languages; we find that high accuracies can be achieved for these speaker-independent but vocabulary-dependent recognition tasks in all languages.

Reference:

Barnard, E, Davel, M.H, Van Heerden, C, De Wet, F and Badenhorst, J. 2014. The NCHLT speech corpus of the South African languages. In: 4th International Workshop on Spoken Language Technologies for Under-Resourced Languages, St Petersburg, Russia, 14-16 May 2014

4th International Workshop on Spoken Language Technologies for Under-Resourced Languages, St Petersburg, Russia, 14-16 May 2014

Barnard, E
Davel, MH
Van Heerden, C
De Wet, Febe
Badenhorst, J

May 2014

Automatic Speech Recognition
ASR
Text-to-speech
TTS
South African languages
Spoken language technologies
Under-resourced languages
