Automatic stylization, coding and modelling of intonation in text-to-speech for under-resourced languages

Louw, Johannes A; Moodley, Avashlin

dc.contributor.author	Louw, Johannes A
dc.contributor.author	Moodley, Avashlin
dc.date.accessioned	2020-03-21T12:05:40Z
dc.date.available	2020-03-21T12:05:40Z
dc.date.issued	2017-12
dc.identifier.citation	Louw, J.A. and Moodley, A. 2017. Automatic stylization, coding and modelling of intonation in text-to-speech for under-resourced languages. PRASA-RobMech International Conference, Bloemfontein, Free State, South Africa, 29 November - 1 December 2017, 6pp.	en_US
dc.identifier.isbn	978-1-5386-2314-5
dc.identifier.isbn	978-1-5386-2315-2
dc.identifier.uri	http://www.rgems.co.za/Downloads/Events/2017_PRASA-RobMech_Program.pdf
dc.identifier.uri	DOI: 10.1109/RoboMech.2017.8261147
dc.identifier.uri	https://ieeexplore.ieee.org/document/8261147
dc.identifier.uri	http://hdl.handle.net/10204/11372
dc.description	Copyright: 2017 IEEE. This is the pre-print version of the work. For access to the full text item, kindly consult the publisher's website.	en_US
dc.description.abstract	In this paper an automatic method to implicitly model intonation for statistical parametric speech synthesis (SPSS) is presented. The approach is ideally suited to single speaker speech databases as used in text-to-speech (TTS), due to the models being speaker-specific. Fundamental frequency curves are automatically stylized based on the speaker-specific acoustics in the recorded database, requiring no models rooted in linguistic theory, and therefore being well suited to intonation modelling in under-resourced languages. The stylized curves are then coded into abstract pitch labels, which are used as features in the training of the statistical parametric acoustic models. A conditional random field (CRF) model is trained in order to predict the abstract pitch labels from the text for synthesis. The CRF model can be used to predict the abstract pitch labels on the syllable, word and phrase tiers. Objective and subjective results on synthetic voices built from English and isiXhosa speech databases are shown.	en_US
dc.language.iso	en	en_US
dc.publisher	IEEE	en_US
dc.relation.ispartofseries	Worklist;19990
dc.subject	Automatic stylization	en_US
dc.subject	Prosody	en_US
dc.subject	Speech synthesis	en_US
dc.subject	Text-to-speech	en_US
dc.title	Automatic stylization, coding and modelling of intonation in text-to-speech for under-resourced languages	en_US
dc.type	Conference Presentation	en_US
dc.identifier.apacitation	Louw, J. A., & Moodley, A. (2017). Automatic stylization, coding and modelling of intonation in text-to-speech for under-resourced languages. IEEE. http://hdl.handle.net/10204/11372	en_ZA
dc.identifier.chicagocitation	Louw, Johannes A, and Avashlin Moodley. "Automatic stylization, coding and modelling of intonation in text-to-speech for under-resourced languages." (2017): http://hdl.handle.net/10204/11372	en_ZA
dc.identifier.vancouvercitation	Louw JA, Moodley A, Automatic stylization, coding and modelling of intonation in text-to-speech for under-resourced languages; IEEE; 2017. http://hdl.handle.net/10204/11372 .	en_ZA
dc.identifier.ris	TY - Conference Presentation
AU - Louw, Johannes A
AU - Moodley, Avashlin
AB - In this paper an automatic method to implicitly model intonation for statistical parametric speech synthesis (SPSS) is presented. The approach is ideally suited to single speaker speech databases as used in text-to-speech (TTS), due to the models being speaker-specific. Fundamental frequency curves are automatically stylized based on the speaker-specific acoustics in the recorded database, requiring no models rooted in linguistic theory, and therefore being well suited to intonation modelling in under-resourced languages. The stylized curves are then coded into abstract pitch labels, which are used as features in the training of the statistical parametric acoustic models. A conditional random field (CRF) model is trained in order to predict the abstract pitch labels from the text for synthesis. The CRF model can be used to predict the abstract pitch labels on the syllable, word and phrase tiers. Objective and subjective results on synthetic voices built from English and isiXhosa speech databases are shown.
DA - 2017-12
DB - ResearchSpace
DP - CSIR
KW - Automatic stylization
KW - Prosody
KW - Speech synthesis
KW - Text-to-speech
LK - https://researchspace.csir.co.za
PY - 2017
SM - 978-1-5386-2314-5
SM - 978-1-5386-2315-2
T1 - Automatic stylization, coding and modelling of intonation in text-to-speech for under-resourced languages
TI - Automatic stylization, coding and modelling of intonation in text-to-speech for under-resourced languages
UR - http://hdl.handle.net/10204/11372
ER -	en_ZA
