Cross-lingual transfer using phonological features for resource-scarce text-to-speech

Louw, Johannes A

dc.contributor.author	Louw, Johannes A
dc.date.accessioned	2023-12-12T11:18:39Z
dc.date.available	2023-12-12T11:18:39Z
dc.date.issued	2023-08
dc.identifier.citation	Louw, J.A. 2023. Cross-lingual transfer using phonological features for resource-scarce text-to-speech. http://hdl.handle.net/10204/13404 .	en_ZA
dc.identifier.uri	http://hdl.handle.net/10204/13404
dc.description.abstract	In this work, we explore the use of phonological features in cross-lingual transfer within resource-scarce settings. We modify the architecture of VITS to accept a phonological feature vector as input, instead of phonemes or characters. Subsequently, we train multispeaker base models using data from LibriTTS and then fine-tune them on single-speaker Afrikaans and isiXhosa datasets of varying sizes, representing the resource-scarce setting. We evaluate the synthetic speech both objectively and subjectively and compare it to models trained with the same data using the standard VITS architecture. In our experiments, the proposed system utilizing phonological features as input converges significantly faster and requires less data than the base system. We demonstrate that the model employing phonological features is capable of producing sounds in the target language that were unseen in the source language, even in languages with significant linguistic differences, and with only 5 minutes of data in the target language.	en_US
dc.format	Fulltext	en_US
dc.language.iso	en	en_US
dc.relation.uri	https://www.isca-speech.org/archive/ssw_2023/index.html	en_US
dc.source	12th ISCA Speech Synthesis Workshop, Grenoble, France, 26-28 August 2023	en_US
dc.subject	Text-to-speech	en_US
dc.subject	Resource-scarce	en_US
dc.subject	Phonological features	en_US
dc.subject	Cross-lingual	en_US
dc.title	Cross-lingual transfer using phonological features for resource-scarce text-to-speech	en_US
dc.type	Conference Presentation	en_US
dc.description.pages	7	en_US
dc.description.note	Paper presented at the 12th ISCA Speech Synthesis Workshop, Grenoble, France, 26-28 August 2023	en_US
dc.description.cluster	Next Generation Enterprises & Institutions	en_US
dc.description.impactarea	Voice Computing	en_US
dc.identifier.apacitation	Louw, J. A. (2023). Cross-lingual transfer using phonological features for resource-scarce text-to-speech. http://hdl.handle.net/10204/13404	en_ZA
dc.identifier.chicagocitation	Louw, Johannes A. "Cross-lingual transfer using phonological features for resource-scarce text-to-speech." <i>12th ISCA Speech Synthesis Workshop, Grenoble, France, 26-28 August 2023</i> (2023): http://hdl.handle.net/10204/13404	en_ZA
dc.identifier.vancouvercitation	Louw JA, Cross-lingual transfer using phonological features for resource-scarce text-to-speech; 2023. http://hdl.handle.net/10204/13404 .	en_ZA
dc.identifier.ris	TY - Conference Presentation AU - Louw, Johannes A AB - In this work, we explore the use of phonological features in cross-lingual transfer within resource-scarce settings. We modify the architecture of VITS to accept a phonological feature vector as input, instead of phonemes or characters. Subsequently, we train multispeaker base models using data from LibriTTS and then fine-tune them on single-speaker Afrikaans and isiXhosa datasets of varying sizes, representing the resource-scarce setting. We evaluate the synthetic speech both objectively and subjectively and compare it to models trained with the same data using the standard VITS architecture. In our experiments, the proposed system utilizing phonological features as input converges significantly faster and requires less data than the base system. We demonstrate that the model employing phonological features is capable of producing sounds in the target language that were unseen in the source language, even in languages with significant linguistic differences, and with only 5 minutes of data in the target language. DA - 2023-08 DB - ResearchSpace DP - CSIR J1 - 12th ISCA Speech Synthesis Workshop, Grenoble, France, 26-28 August 2023 KW - Text-to-speech KW - Resource-scarce KW - Phonological features KW - Cross-lingual LK - https://researchspace.csir.co.za PY - 2023 T1 - Cross-lingual transfer using phonological features for resource-scarce text-to-speech TI - Cross-lingual transfer using phonological features for resource-scarce text-to-speech UR - http://hdl.handle.net/10204/13404 ER -	en_ZA
dc.identifier.worklist	27267	en_US

Files in this item

Name: RS_27267_NGEI_Cross ...

Size: 554.1Kb

Format: PDF

Description: Conference paper

This item appears in the following Collection(s)

Conference Publications
