dc.contributor.author | Louw, Johannes A |
dc.date.accessioned | 2020-08-18T08:05:45Z |
dc.date.available | 2020-08-18T08:05:45Z |
dc.date.issued | 2019-12 |
dc.identifier.citation | Louw, J.A. 2019. Neural speech synthesis for resource-scarce languages. In: Proceedings of the South African Forum for Artificial Intelligence, Cape Town, 4-6 December 2019 | en_US
dc.identifier.issn | 1613-0073 |
dc.identifier.uri | http://ceur-ws.org/Vol-2540/ |
dc.identifier.uri | http://ceur-ws.org/Vol-2540/FAIR2019_paper_66.pdf |
dc.identifier.uri | http://hdl.handle.net/10204/11541 |
dc.description | Presented in: Proceedings of the South African Forum for Artificial Intelligence, Cape Town, 4-6 December 2019 | en_US
dc.description.abstract | Recent work on sequence-to-sequence neural networks with attention mechanisms, such as the Tacotron 2 and DCTTS architectures, has brought about substantial naturalness improvements in synthesised speech. These architectures require at least an order of magnitude more data than is generally available in resource-scarce language environments. In this paper we propose an efficient feed-forward deep neural network (DNN)-based acoustic model, using stacked bottleneck features, that together with the recently introduced LPCNet vocoder can be used in resource-scarce language environments, with corpora of less than 1 hour in size, to build text-to-speech systems of high perceived naturalness. We compare traditional hidden Markov model (HMM)-based acoustic modelling for speech synthesis with the proposed architecture using the World and LPCNet vocoders, giving both objective and MUSHRA-based subjective results, which show that the DNN-LPCNet combination leads to more natural synthesised speech that can be confused with natural speech. The proposed acoustic model allows for an efficient implementation, with faster-than-real-time synthesis. | en_US
dc.language.iso | en | en_US
dc.publisher | Ruzica Piskac | en_US
dc.relation.ispartofseries | Workflow;23305 |
dc.subject | Deep Neural Network | en_US
dc.subject | DNN | en_US
dc.subject | Hidden Markov Model | en_US
dc.subject | HMM | en_US
dc.subject | LPCNet | en_US
dc.subject | Resource-scarce languages | en_US
dc.title | Neural speech synthesis for resource-scarce languages | en_US
dc.type | Conference Presentation | en_US
dc.identifier.apacitation | Louw, J. A. (2019). Neural speech synthesis for resource-scarce languages. Ruzica Piskac. http://hdl.handle.net/10204/11541 | en_ZA
dc.identifier.chicagocitation | Louw, Johannes A. "Neural speech synthesis for resource-scarce languages." (2019): http://hdl.handle.net/10204/11541 | en_ZA
dc.identifier.vancouvercitation | Louw JA, Neural speech synthesis for resource-scarce languages; Ruzica Piskac; 2019. http://hdl.handle.net/10204/11541. | en_ZA
dc.identifier.ris |
TY - Conference Presentation
AU - Louw, Johannes A
AB - Recent work on sequence-to-sequence neural networks with attention mechanisms, such as the Tacotron 2 and DCTTS architectures, has brought about substantial naturalness improvements in synthesised speech. These architectures require at least an order of magnitude more data than is generally available in resource-scarce language environments. In this paper we propose an efficient feed-forward deep neural network (DNN)-based acoustic model, using stacked bottleneck features, that together with the recently introduced LPCNet vocoder can be used in resource-scarce language environments, with corpora of less than 1 hour in size, to build text-to-speech systems of high perceived naturalness. We compare traditional hidden Markov model (HMM)-based acoustic modelling for speech synthesis with the proposed architecture using the World and LPCNet vocoders, giving both objective and MUSHRA-based subjective results, which show that the DNN-LPCNet combination leads to more natural synthesised speech that can be confused with natural speech. The proposed acoustic model allows for an efficient implementation, with faster-than-real-time synthesis.
DA - 2019-12
DB - ResearchSpace
DP - CSIR
KW - Deep Neural Network
KW - DNN
KW - Hidden Markov Model
KW - HMM
KW - LPCNet
KW - Resource-scarce languages
LK - https://researchspace.csir.co.za
PY - 2019
SM - 1613-0073
T1 - Neural speech synthesis for resource-scarce languages
TI - Neural speech synthesis for resource-scarce languages
UR - http://hdl.handle.net/10204/11541
ER -
| en_ZA
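
Note: the abstract in this record describes a feed-forward DNN acoustic model driven by stacked bottleneck features and paired with the LPCNet vocoder. The snippet below is a minimal, hypothetical sketch of that general kind of model, not the authors' implementation; the layer sizes, input dimensionality and the interpretation of the output features (assumed here to be LPCNet-style per-frame acoustic parameters) are illustrative assumptions only.

# Hypothetical sketch: a feed-forward DNN acoustic model of the general kind
# described in the abstract (not taken from the cited paper).
import torch
import torch.nn as nn

class FeedForwardAcousticModel(nn.Module):
    def __init__(self, in_dim=512, hidden_dim=1024, out_dim=20, num_layers=4):
        super().__init__()
        layers, prev = [], in_dim
        for _ in range(num_layers):
            layers += [nn.Linear(prev, hidden_dim), nn.ReLU()]
            prev = hidden_dim
        # Per-frame output: assumed vocoder acoustic parameters
        # (e.g. cepstral coefficients plus pitch-related features).
        layers.append(nn.Linear(prev, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, frames):
        # frames: (batch, n_frames, in_dim) stacked linguistic/bottleneck features
        return self.net(frames)

# Example: one utterance of 300 frames with 512-dim stacked input features.
model = FeedForwardAcousticModel()
print(model(torch.randn(1, 300, 512)).shape)  # torch.Size([1, 300, 20])

In a full text-to-speech pipeline the predicted frame-level features would then be passed to a neural vocoder such as LPCNet to generate the waveform; that step is outside the scope of this sketch.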