dc.contributor.author | Kleynhans, N | -
dc.contributor.author | De Wet, Febe | -
dc.contributor.author | Barnard, E | -
dc.date.accessioned | 2016-07-11T10:58:17Z | -
dc.date.available | 2016-07-11T10:58:17Z | -
dc.date.issued | 2015-11 | -
dc.identifier.citation | Kleynhans, N, De Wet, F and Barnard, E. 2015. Unsupervised acoustic model training: comparing South African English and isiZulu. In: Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), Port Elizabeth, South Africa, 25-26 November 2015 | en_US
dc.identifier.uri | http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7359512&tag=1 | -
dc.identifier.uri | http://hdl.handle.net/10204/8629 | -
dc.description | Copyright: 2015 by IEEE. Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), Port Elizabeth, South Africa, 25-26 November 2015 | en_US
dc.description.abstract | Large amounts of untranscribed audio data are generated every day. These audio resources can be used to develop robust acoustic models for a variety of speech-based systems. Manually transcribing this data is resource-intensive and requires funding, time and expertise. Lightly-supervised training techniques, however, provide a means to rapidly transcribe audio, thus reducing the initial resource investment needed to begin the modelling process. Our findings suggest that the lightly-supervised training technique works well for English, but for an agglutinative language such as isiZulu the process fails to achieve the performance seen for English. Additionally, phone-based performance is significantly worse than that of an approach using word-based language models. These results indicate a strong dependence on large or well-matched text resources for lightly-supervised training techniques. | en_US
dc.language.iso | en | en_US
dc.publisher | IEEE | en_US
dc.relation.ispartofseries | Workflow;16006 | -
dc.subject | Untranscribed audio data | en_US
dc.subject | isiZulu | en_US
dc.subject | Word-based language models | en_US
dc.subject | Lightly-supervised training | en_US
dc.subject | Unsupervised training | en_US
dc.subject | Automatic transcription generation | en_US
dc.subject | Audio harvesting | en_US
dc.subject | English language | en_US
dc.title | Unsupervised acoustic model training: comparing South African English and isiZulu | en_US
dc.type | Conference Presentation | en_US
dc.identifier.apacitation | Kleynhans, N., De Wet, F., & Barnard, E. (2015). Unsupervised acoustic model training: comparing South African English and isiZulu. IEEE. http://hdl.handle.net/10204/8629 | en_ZA
dc.identifier.chicagocitation | Kleynhans, N, Febe De Wet, and E Barnard. "Unsupervised acoustic model training: comparing South African English and isiZulu." (2015): http://hdl.handle.net/10204/8629 | en_ZA
dc.identifier.vancouvercitation | Kleynhans N, De Wet F, Barnard E. Unsupervised acoustic model training: comparing South African English and isiZulu; IEEE; 2015. http://hdl.handle.net/10204/8629. | en_ZA
dc.identifier.ris |
TY - Conference Presentation
AU - Kleynhans, N
AU - De Wet, Febe
AU - Barnard, E
AB - Large amounts of untranscribed audio data are generated every day. These audio resources can be used to develop robust acoustic models for a variety of speech-based systems. Manually transcribing this data is resource-intensive and requires funding, time and expertise. Lightly-supervised training techniques, however, provide a means to rapidly transcribe audio, thus reducing the initial resource investment needed to begin the modelling process. Our findings suggest that the lightly-supervised training technique works well for English, but for an agglutinative language such as isiZulu the process fails to achieve the performance seen for English. Additionally, phone-based performance is significantly worse than that of an approach using word-based language models. These results indicate a strong dependence on large or well-matched text resources for lightly-supervised training techniques.
DA - 2015-11
DB - ResearchSpace
DP - CSIR
KW - Untranscribed audio data
KW - isiZulu
KW - Word-based language models
KW - Lightly-supervised training
KW - Unsupervised training
KW - Automatic transcription generation
KW - Audio harvesting
KW - English language
LK - https://researchspace.csir.co.za
PY - 2015
T1 - Unsupervised acoustic model training: comparing South African English and isiZulu
TI - Unsupervised acoustic model training: comparing South African English and isiZulu
UR - http://hdl.handle.net/10204/8629
ER -
| en_ZA
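
The lightly-supervised workflow summarised in the abstract is, at its core, a decode-filter-retrain loop: a seed recogniser and a language model produce automatic transcriptions for untranscribed audio, and only segments decoded with high confidence are kept as training material for the next acoustic model. Below is a minimal, hypothetical sketch of the confidence-filtering step; the Segment structure, the select_training_data function and the 0.9 threshold are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    """One decoded utterance (hypothetical structure, for illustration)."""
    audio_path: str    # location of the audio segment
    hypothesis: str    # automatic transcription from the seed recogniser
    confidence: float  # per-segment decoder confidence in [0, 1]

def select_training_data(segments: List[Segment],
                         threshold: float = 0.9) -> List[Segment]:
    """Keep only confidently decoded segments for acoustic model retraining.

    This is the filtering step of lightly-supervised training: the seed
    model's own hypotheses stand in for manual transcriptions, so
    low-confidence output is discarded rather than trusted. A higher
    threshold yields cleaner but less training data.
    """
    return [s for s in segments if s.confidence >= threshold]

if __name__ == "__main__":
    # Illustrative input: in practice these segments would come from
    # decoding harvested audio with a seed acoustic model and a
    # word-based language model.
    decoded = [
        Segment("utt_001.wav", "good morning south africa", 0.95),
        Segment("utt_002.wav", "sawubona nonke", 0.42),
    ]
    for seg in select_training_data(decoded):
        print(seg.audio_path, "->", seg.hypothesis)
```

The paper's finding that the technique underperforms for isiZulu can be read through this sketch: when the language model is poorly matched to the audio (as for an agglutinative language with sparse text resources), few segments clear the confidence threshold, or those that do carry erroneous hypotheses into retraining.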