ResearchSpace

Cross-lingual transfer using phonological features for resource-scarce text-to-speech


dc.contributor.author Louw, Johannes A
dc.date.accessioned 2023-12-12T11:18:39Z
dc.date.available 2023-12-12T11:18:39Z
dc.date.issued 2023-08
dc.identifier.citation Louw, J.A. 2023. Cross-lingual transfer using phonological features for resource-scarce text-to-speech. http://hdl.handle.net/10204/13404. en_ZA
dc.identifier.uri http://hdl.handle.net/10204/13404
dc.description.abstract In this work, we explore the use of phonological features in cross-lingual transfer within resource-scarce settings. We modify the architecture of VITS to accept a phonological feature vector as input, instead of phonemes or characters. Subsequently, we train multispeaker base models using data from LibriTTS and then fine-tune them on single-speaker Afrikaans and isiXhosa datasets of varying sizes, representing the resource-scarce setting. We evaluate the synthetic speech both objectively and subjectively and compare it to models trained with the same data using the standard VITS architecture. In our experiments, the proposed system utilizing phonological features as input converges significantly faster and requires less data than the base system. We demonstrate that the model employing phonological features is capable of producing sounds in the target language that were unseen in the source language, even in languages with significant linguistic differences, and with only 5 minutes of data in the target language. en_US
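The abstract's central idea, feeding a phonological feature vector to the text encoder instead of a phoneme identity, can be illustrated with a small sketch. The standard VITS text encoder looks up a learned embedding per symbol ID, so a phone absent from the English source data (such as an isiXhosa click) would get an untrained embedding. Describing each phone by shared features and projecting that vector into the hidden space lets an unseen phone reuse what was learned for phones with overlapping features. Everything below is an assumption for illustration: the feature names, the feature values, the names `FEATURES`, `PHONE_FEATURES`, `to_feature_vector`, `shared_features` and `encode`, and the use of a plain linear projection are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical toy feature inventory; the paper's actual feature set is not
# described in this record.
FEATURES = ["consonantal", "sonorant", "voiced", "nasal", "continuant",
            "labial", "coronal", "dorsal", "click"]

# Hypothetical, simplified feature values (1 = present, absent = 0).
PHONE_FEATURES: Dict[str, Dict[str, int]] = {
    "t": {"consonantal": 1, "coronal": 1},
    "d": {"consonantal": 1, "voiced": 1, "coronal": 1},
    "n": {"consonantal": 1, "sonorant": 1, "voiced": 1, "nasal": 1, "coronal": 1},
    "s": {"consonantal": 1, "continuant": 1, "coronal": 1},
    "a": {"sonorant": 1, "voiced": 1, "continuant": 1, "dorsal": 1},
    # Dental click: not in English source data, but still describable by features.
    "ǀ": {"consonantal": 1, "coronal": 1, "dorsal": 1, "click": 1},
}


def to_feature_vector(phone: str) -> List[float]:
    """Return the binary feature vector of a phone, ordered as in FEATURES."""
    feats = PHONE_FEATURES[phone]
    return [float(feats.get(name, 0)) for name in FEATURES]


def shared_features(a: str, b: str) -> List[str]:
    """List the features two phones have in common (in FEATURES order)."""
    va, vb = to_feature_vector(a), to_feature_vector(b)
    return [name for name, x, y in zip(FEATURES, va, vb) if x and y]


def encode(phones: Sequence[str], weights: List[List[float]]) -> List[List[float]]:
    """Project each phone's feature vector to the hidden size with a linear map.

    `weights` has shape (len(FEATURES), hidden_dim); this stands in for the
    per-symbol embedding lookup of a standard text encoder.
    """
    hidden_dim = len(weights[0])
    out = []
    for phone in phones:
        x = to_feature_vector(phone)
        out.append([sum(x[i] * weights[i][j] for i in range(len(FEATURES)))
                    for j in range(hidden_dim)])
    return out


if __name__ == "__main__":
    # The click shares 'consonantal' and 'coronal' with English /t/.
    print(shared_features("ǀ", "t"))
    # With all-ones weights each hidden unit just counts active features.
    print(encode(["t", "ǀ"], [[1.0, 1.0] for _ in FEATURES]))
```

In this sketch the projection weights would be learned on the source language; because the click's vector overlaps with trained phones such as /t/, its hidden representation is not random at fine-tuning time, which is one plausible reading of why the feature-based model can produce sounds unseen in the source language.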
dc.format Fulltext en_US
dc.language.iso en en_US
dc.relation.uri https://www.isca-speech.org/archive/ssw_2023/index.html en_US
dc.source 12th ISCA Speech Synthesis Workshop, Grenoble, France, 26-28 August 2023 en_US
dc.subject Text-to-speech en_US
dc.subject Resource-scarce en_US
dc.subject Phonological features en_US
dc.subject Cross-lingual en_US
dc.title Cross-lingual transfer using phonological features for resource-scarce text-to-speech en_US
dc.type Conference Presentation en_US
dc.description.pages 7 en_US
dc.description.note Paper presented at the 12th ISCA Speech Synthesis Workshop, Grenoble, France, 26-28 August 2023 en_US
dc.description.cluster Next Generation Enterprises & Institutions en_US
dc.description.impactarea Voice Computing en_US
dc.identifier.apacitation Louw, J. A. (2023). Cross-lingual transfer using phonological features for resource-scarce text-to-speech. http://hdl.handle.net/10204/13404 en_ZA
dc.identifier.chicagocitation Louw, Johannes A. "Cross-lingual transfer using phonological features for resource-scarce text-to-speech." 12th ISCA Speech Synthesis Workshop, Grenoble, France, 26-28 August 2023 (2023): http://hdl.handle.net/10204/13404 en_ZA
dc.identifier.vancouvercitation Louw JA. Cross-lingual transfer using phonological features for resource-scarce text-to-speech; 2023. http://hdl.handle.net/10204/13404. en_ZA
dc.identifier.ris TY - Conference Presentation AU - Louw, Johannes A AB - In this work, we explore the use of phonological features in cross-lingual transfer within resource-scarce settings. We modify the architecture of VITS to accept a phonological feature vector as input, instead of phonemes or characters. Subsequently, we train multispeaker base models using data from LibriTTS and then fine-tune them on single-speaker Afrikaans and isiXhosa datasets of varying sizes, representing the resource-scarce setting. We evaluate the synthetic speech both objectively and subjectively and compare it to models trained with the same data using the standard VITS architecture. In our experiments, the proposed system utilizing phonological features as input converges significantly faster and requires less data than the base system. We demonstrate that the model employing phonological features is capable of producing sounds in the target language that were unseen in the source language, even in languages with significant linguistic differences, and with only 5 minutes of data in the target language. DA - 2023-08 DB - ResearchSpace DP - CSIR J1 - 12th ISCA Speech Synthesis Workshop, Grenoble, France, 26-28 August 2023 KW - Text-to-speech KW - Resource-scarce KW - Phonological features KW - Cross-lingual LK - https://researchspace.csir.co.za PY - 2023 T1 - Cross-lingual transfer using phonological features for resource-scarce text-to-speech TI - Cross-lingual transfer using phonological features for resource-scarce text-to-speech UR - http://hdl.handle.net/10204/13404 ER - en_ZA
dc.identifier.worklist 27267 en_US

