dc.contributor.author |
Louw, Johannes A
|
|
dc.date.accessioned |
2023-12-12T11:18:39Z |
|
dc.date.available |
2023-12-12T11:18:39Z |
|
dc.date.issued |
2023-08 |
|
dc.identifier.citation |
Louw, J.A. 2023. Cross-lingual transfer using phonological features for resource-scarce text-to-speech. http://hdl.handle.net/10204/13404 . |
en_ZA |
dc.identifier.uri |
http://hdl.handle.net/10204/13404
|
|
dc.description.abstract |
In this work, we explore the use of phonological features in cross-lingual transfer within resource-scarce settings. We modify the architecture of VITS to accept a phonological feature vector as input, instead of phonemes or characters. Subsequently, we train multispeaker base models using data from LibriTTS and then fine-tune them on single-speaker Afrikaans and isiXhosa datasets of varying sizes, representing the resource-scarce setting. We evaluate the synthetic speech both objectively and subjectively and compare it to models trained with the same data using the standard VITS architecture. In our experiments, the proposed system utilizing phonological features as input converges significantly faster and requires less data than the base system. We demonstrate that the model employing phonological features is capable of producing sounds in the target language that were unseen in the source language, even in languages with significant linguistic differences, and with only 5 minutes of data in the target language. |
en_US |
dc.format |
Fulltext |
en_US |
dc.language.iso |
en |
en_US |
dc.relation.uri |
https://www.isca-speech.org/archive/ssw_2023/index.html |
en_US |
dc.source |
12th ISCA Speech Synthesis Workshop, Grenoble, France, 26-28 August 2023 |
en_US |
dc.subject |
Text-to-speech |
en_US |
dc.subject |
Resource-scarce |
en_US |
dc.subject |
Phonological features |
en_US |
dc.subject |
Cross-lingual |
en_US |
dc.title |
Cross-lingual transfer using phonological features for resource-scarce text-to-speech |
en_US |
dc.type |
Conference Presentation |
en_US |
dc.description.pages |
7 |
en_US |
dc.description.note |
Paper presented at the 12th ISCA Speech Synthesis Workshop, Grenoble, France, 26-28 August 2023 |
en_US |
dc.description.cluster |
Next Generation Enterprises & Institutions |
en_US |
dc.description.impactarea |
Voice Computing |
en_US |
dc.identifier.apacitation |
Louw, J. A. (2023). Cross-lingual transfer using phonological features for resource-scarce text-to-speech. http://hdl.handle.net/10204/13404 |
en_ZA |
dc.identifier.chicagocitation |
Louw, Johannes A. "Cross-lingual transfer using phonological features for resource-scarce text-to-speech." <i>12th ISCA Speech Synthesis Workshop, Grenoble, France, 26-28 August 2023</i> (2023): http://hdl.handle.net/10204/13404 |
en_ZA |
dc.identifier.vancouvercitation |
Louw JA, Cross-lingual transfer using phonological features for resource-scarce text-to-speech; 2023. http://hdl.handle.net/10204/13404 . |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - Louw, Johannes A
AB - In this work, we explore the use of phonological features in cross-lingual transfer within resource-scarce settings. We modify the architecture of VITS to accept a phonological feature vector as input, instead of phonemes or characters. Subsequently, we train multispeaker base models using data from LibriTTS and then fine-tune them on single-speaker Afrikaans and isiXhosa datasets of varying sizes, representing the resource-scarce setting. We evaluate the synthetic speech both objectively and subjectively and compare it to models trained with the same data using the standard VITS architecture. In our experiments, the proposed system utilizing phonological features as input converges significantly faster and requires less data than the base system. We demonstrate that the model employing phonological features is capable of producing sounds in the target language that were unseen in the source language, even in languages with significant linguistic differences, and with only 5 minutes of data in the target language.
DA - 2023-08
DB - ResearchSpace
DP - CSIR
J1 - 12th ISCA Speech Synthesis Workshop, Grenoble, France, 26-28 August 2023
KW - Text-to-speech
KW - Resource-scarce
KW - Phonological features
KW - Cross-lingual
LK - https://researchspace.csir.co.za
PY - 2023
T1 - Cross-lingual transfer using phonological features for resource-scarce text-to-speech
TI - Cross-lingual transfer using phonological features for resource-scarce text-to-speech
UR - http://hdl.handle.net/10204/13404
ER -
|
en_ZA |
dc.identifier.worklist |
27267 |
en_US |