dc.contributor.author |
Louw, Johannes A |
|
dc.contributor.author |
Moodley, Avashlin |
|
dc.date.accessioned |
2020-03-21T12:05:40Z |
|
dc.date.available |
2020-03-21T12:05:40Z |
|
dc.date.issued |
2017-12 |
|
dc.identifier.citation |
Louw, J.A. and Moodley, A. 2017. Automatic stylization, coding and modelling of intonation in text-to-speech for under-resourced languages. PRASA-RobMech International Conference, Bloemfontein, Free State, South Africa, 29 November - 1 December 2017, 6pp. |
en_US |
dc.identifier.isbn |
978-1-5386-2314-5 |
|
dc.identifier.isbn |
978-1-5386-2315-2 |
|
dc.identifier.uri |
http://www.rgems.co.za/Downloads/Events/2017_PRASA-RobMech_Program.pdf |
|
dc.identifier.uri |
DOI: 10.1109/RoboMech.2017.8261147 |
|
dc.identifier.uri |
https://ieeexplore.ieee.org/document/8261147 |
|
dc.identifier.uri |
http://hdl.handle.net/10204/11372 |
|
dc.description |
Copyright: 2017 IEEE. This is the pre-print version of the work. For access to the full text item, kindly consult the publisher's website. |
en_US |
dc.description.abstract |
In this paper an automatic method to implicitly model intonation for statistical parametric speech synthesis (SPSS) is presented. The approach is ideally suited to single speaker speech databases as used in text-to-speech (TTS), due to the models being speaker-specific. Fundamental frequency curves are automatically stylized based on the speaker-specific acoustics in the recorded database, requiring no models rooted in linguistic theory, and therefore being well suited to intonation modelling in under-resourced languages. The stylized curves are then coded into abstract pitch labels, which are used as features in the training of the statistical parametric acoustic models. A conditional random field (CRF) model is trained in order to predict the abstract pitch labels from the text for synthesis. The CRF model can be used to predict the abstract pitch labels on the syllable, word and phrase tiers. Objective and subjective results on synthetic voices built from English and isiXhosa speech databases are shown. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
IEEE |
en_US |
dc.relation.ispartofseries |
Worklist;19990 |
|
dc.subject |
Automatic stylization |
en_US |
dc.subject |
Prosody |
en_US |
dc.subject |
Speech synthesis |
en_US |
dc.subject |
Text-to-speech |
en_US |
dc.title |
Automatic stylization, coding and modelling of intonation in text-to-speech for under-resourced languages |
en_US |
dc.type |
Conference Presentation |
en_US |
dc.identifier.apacitation |
Louw, J. A., & Moodley, A. (2017). Automatic stylization, coding and modelling of intonation in text-to-speech for under-resourced languages. IEEE. http://hdl.handle.net/10204/11372 |
en_ZA |
dc.identifier.chicagocitation |
Louw, Johannes A, and Avashlin Moodley. "Automatic stylization, coding and modelling of intonation in text-to-speech for under-resourced languages." (2017). http://hdl.handle.net/10204/11372 |
en_ZA |
dc.identifier.vancouvercitation |
Louw JA, Moodley A. Automatic stylization, coding and modelling of intonation in text-to-speech for under-resourced languages. IEEE; 2017. http://hdl.handle.net/10204/11372 |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - Louw, Johannes A
AU - Moodley, Avashlin
AB - In this paper an automatic method to implicitly model intonation for statistical parametric speech synthesis (SPSS) is presented. The approach is ideally suited to single speaker speech databases as used in text-to-speech (TTS), due to the models being speaker-specific. Fundamental frequency curves are automatically stylized based on the speaker-specific acoustics in the recorded database, requiring no models rooted in linguistic theory, and therefore being well suited to intonation modelling in under-resourced languages. The stylized curves are then coded into abstract pitch labels, which are used as features in the training of the statistical parametric acoustic models. A conditional random field (CRF) model is trained in order to predict the abstract pitch labels from the text for synthesis. The CRF model can be used to predict the abstract pitch labels on the syllable, word and phrase tiers. Objective and subjective results on synthetic voices built from English and isiXhosa speech databases are shown.
DA - 2017-12
DB - ResearchSpace
DP - CSIR
KW - Automatic stylization
KW - Prosody
KW - Speech synthesis
KW - Text-to-speech
LK - https://researchspace.csir.co.za
PY - 2017
SN - 978-1-5386-2314-5
SN - 978-1-5386-2315-2
T1 - Automatic stylization, coding and modelling of intonation in text-to-speech for under-resourced languages
TI - Automatic stylization, coding and modelling of intonation in text-to-speech for under-resourced languages
UR - http://hdl.handle.net/10204/11372
ER -
|
en_ZA |