
Neural speech synthesis for resource-scarce languages


dc.contributor.author Louw, Johannes A
dc.date.accessioned 2020-08-18T08:05:45Z
dc.date.available 2020-08-18T08:05:45Z
dc.date.issued 2019-12
dc.identifier.citation Louw, J.A. 2019. Neural speech synthesis for resource-scarce languages. In: Proceedings of the South African Forum for Artificial Intelligence, Cape Town, 4-6 December 2019 en_US
dc.identifier.issn 1613-0073
dc.identifier.uri http://ceur-ws.org/Vol-2540/
dc.identifier.uri http://ceur-ws.org/Vol-2540/FAIR2019_paper_66.pdf
dc.identifier.uri http://hdl.handle.net/10204/11541
dc.description Presented in: Proceedings of the South African Forum for Artificial Intelligence, Cape Town, 4-6 December 2019 en_US
dc.description.abstract Recent work on sequence-to-sequence neural networks with attention mechanisms, such as the Tacotron 2 and DCTTS architectures, has brought substantial naturalness improvements in synthesised speech. These architectures require at least an order of magnitude more data than is generally available in resource-scarce language environments. In this paper we propose an efficient feed-forward deep neural network (DNN)-based acoustic model, using stacked bottleneck features, that, together with the recently introduced LPCNet vocoder, can be used in resource-scarce language environments, with corpora of less than 1 hour in size, to build text-to-speech systems of high perceived naturalness. We compare traditional hidden Markov model (HMM)-based acoustic modelling for speech synthesis with the proposed architecture using the WORLD and LPCNet vocoders, giving both objective and MUSHRA-based subjective results, showing that the DNN-LPCNet combination leads to more natural synthesised speech that can be confused with natural speech. The proposed acoustic model provides for an efficient implementation, with faster-than-real-time synthesis. en_US
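
The abstract describes a frame-level feed-forward DNN acoustic model driven by stacked bottleneck features, whose predicted parameters are rendered to a waveform by the LPCNet (or WORLD) vocoder. The following is a minimal illustrative sketch of that kind of model, written in PyTorch; the class name, layer sizes and feature dimensions are assumptions for illustration and are not the paper's exact configuration.

    import torch
    import torch.nn as nn

    class FeedForwardAcousticModel(nn.Module):
        """Frame-level feed-forward acoustic model (illustrative sketch).

        Linguistic features for the current frame are concatenated with
        bottleneck features stacked from neighbouring frames; the network
        predicts vocoder parameters (e.g. the cepstral and pitch features
        consumed by LPCNet). All dimensions below are assumed values.
        """

        def __init__(self, linguistic_dim=300, bottleneck_dim=32,
                     context_frames=4, hidden_dim=512, vocoder_dim=20):
            super().__init__()
            # Bottleneck features from the current frame plus
            # +/- context_frames neighbours are stacked onto the input.
            input_dim = linguistic_dim + bottleneck_dim * (2 * context_frames + 1)
            self.net = nn.Sequential(
                nn.Linear(input_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, vocoder_dim),
            )

        def forward(self, x):
            # x: (batch, frames, input_dim) -> (batch, frames, vocoder_dim)
            return self.net(x)

The predicted frame-level vocoder parameters would then be passed to the LPCNet (or WORLD) vocoder to generate the waveform.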
dc.language.iso en en_US
dc.publisher Ruzica Piskac en_US
dc.relation.ispartofseries Workflow;23305
dc.subject Deep Neural Network en_US
dc.subject DNN en_US
dc.subject Hidden Markov Model en_US
dc.subject HMM en_US
dc.subject LPCNet en_US
dc.subject Resource-scarce languages en_US
dc.title Neural speech synthesis for resource-scarce languages en_US
dc.type Conference Presentation en_US
dc.identifier.apacitation Louw, J. A. (2019). Neural speech synthesis for resource-scarce languages. Ruzica Piskac. http://hdl.handle.net/10204/11541 en_ZA
dc.identifier.chicagocitation Louw, Johannes A. "Neural speech synthesis for resource-scarce languages." (2019): http://hdl.handle.net/10204/11541 en_ZA
dc.identifier.vancouvercitation Louw JA, Neural speech synthesis for resource-scarce languages; Ruzica Piskac; 2019. http://hdl.handle.net/10204/11541. en_ZA
dc.identifier.ris TY - Conference Presentation
  AU - Louw, Johannes A
  AB - Recent work on sequence-to-sequence neural networks with attention mechanisms, such as the Tacotron 2 and DCTTS architectures, has brought substantial naturalness improvements in synthesised speech. These architectures require at least an order of magnitude more data than is generally available in resource-scarce language environments. In this paper we propose an efficient feed-forward deep neural network (DNN)-based acoustic model, using stacked bottleneck features, that, together with the recently introduced LPCNet vocoder, can be used in resource-scarce language environments, with corpora of less than 1 hour in size, to build text-to-speech systems of high perceived naturalness. We compare traditional hidden Markov model (HMM)-based acoustic modelling for speech synthesis with the proposed architecture using the WORLD and LPCNet vocoders, giving both objective and MUSHRA-based subjective results, showing that the DNN-LPCNet combination leads to more natural synthesised speech that can be confused with natural speech. The proposed acoustic model provides for an efficient implementation, with faster-than-real-time synthesis.
  DA - 2019-12
  DB - ResearchSpace
  DP - CSIR
  KW - Deep Neural Network
  KW - DNN
  KW - Hidden Markov Model
  KW - HMM
  KW - LPCNet
  KW - Resource-scarce languages
  LK - https://researchspace.csir.co.za
  PY - 2019
  SM - 1613-0073
  T1 - Neural speech synthesis for resource-scarce languages
  TI - Neural speech synthesis for resource-scarce languages
  UR - http://hdl.handle.net/10204/11541
  ER - en_ZA

