Pronunciation modelling of foreign words for Sepedi ASR

Modipa, T; Davel, MH

dc.contributor.author	Modipa, T
dc.contributor.author	Davel, MH
dc.date.accessioned	2010-12-23T10:00:21Z
dc.date.available	2010-12-23T10:00:21Z
dc.date.issued	2010-11
dc.identifier.citation	Modipa, T and Davel, MH. 2010. Pronunciation modelling of foreign words for Sepedi ASR. 21st Annual Symposium of the Pattern Recognition Association of South Africa (PRASA), Stellenbosch, South Africa, 22-23 November 2010, pp 185-189	en
dc.identifier.isbn	978-0-7992-2470-2
dc.identifier.uri	http://hdl.handle.net/10204/4715
dc.description	21st Annual Symposium of the Pattern Recognition Association of South Africa (PRASA), Stellenbosch, South Africa, 22-23 November 2010	en
dc.description.abstract	This study focuses on the effective pronunciation modelling of words from different languages encountered during the development of a Sepedi automatic speech recognition (ASR) system. While the speech corpus used for training the ASR system consists mostly of Sepedi utterances, many words from English (and other South African languages) are embedded within the Sepedi sentences. In order to model these words effectively, different approaches to pronunciation dictionary development are investigated, specifically: (1) using language-specific letter-to-sound rules to predict the pronunciation of each word (based on the language of the word) and mapping foreign phonemes to Sepedi phonemes using linguistically motivated mappings, (2) experimenting with data-driven foreign-to-Sepedi phoneme mappings, and (3) using Sepedi letter-to-sound rules to predict the pronunciation of all words irrespective of language. We find that the data-driven phoneme mappings are more accurate than the initial linguistically motivated mappings evaluated, and (by a slight margin) obtain our best result using Sepedi letter-to-sound rules across all words in the speech corpus.	en
dc.language.iso	en	en
dc.publisher	PRASA 2010	en
dc.relation.ispartofseries	Conference Paper	en
dc.subject	Sepedi	en
dc.subject	Automatic speech recognition	en
dc.subject	Pronunciation modelling	en
dc.subject	Pattern recognition	en
dc.subject	PRASA 2010	en
dc.title	Pronunciation modelling of foreign words for Sepedi ASR	en
dc.type	Conference Presentation	en
dc.identifier.apacitation	Modipa, T., & Davel, M. (2010). Pronunciation modelling of foreign words for Sepedi ASR. PRASA 2010. http://hdl.handle.net/10204/4715	en_ZA
dc.identifier.chicagocitation	Modipa, T, and MH Davel. "Pronunciation modelling of foreign words for Sepedi ASR." (2010): http://hdl.handle.net/10204/4715	en_ZA
dc.identifier.vancouvercitation	Modipa T, Davel M, Pronunciation modelling of foreign words for Sepedi ASR; PRASA 2010; 2010. http://hdl.handle.net/10204/4715 .	en_ZA
dc.identifier.ris	TY - Conference Presentation AU - Modipa, T AU - Davel, MH AB - This study focuses on the effective pronunciation modelling of words from different languages encountered during the development of a Sepedi automatic speech recognition (ASR) system. While the speech corpus used for training the ASR system consists mostly of Sepedi utterances, many words from English (and other South African languages) are embedded within the Sepedi sentences. In order to model these words effectively, different approaches to pronunciation dictionary development are investigated, specifically: (1) using language-specific letter-to-sound rules to predict the pronunciation of each word (based on the language of the word) and mapping foreign phonemes to Sepedi phonemes using linguistically motivated mappings, (2) experimenting with data-driven foreign-to-Sepedi phoneme mappings, and (3) using Sepedi letter-to-sound rules to predict the pronunciation of all words irrespective of language. We find that the data-driven phoneme mappings are more accurate than the initial linguistically motivated mappings evaluated, and (by a slight margin) obtain our best result using Sepedi letter-to-sound rules across all words in the speech corpus. DA - 2010-11 DB - ResearchSpace DP - CSIR KW - Sepedi KW - Automatic speech recognition KW - Pronunciation modelling KW - Pattern recognition KW - PRASA 2010 LK - https://researchspace.csir.co.za PY - 2010 SM - 978-0-7992-2470-2 T1 - Pronunciation modelling of foreign words for Sepedi ASR TI - Pronunciation modelling of foreign words for Sepedi ASR UR - http://hdl.handle.net/10204/4715 ER -	en_ZA

Files in this item

Name: Modipa_2010.pdf

Size: 2.750Mb

Format: PDF

This item appears in the following Collection(s)

Conference Publications
