Cross-lingual transfer using phonological features for resource-scarce text-to-speech

Louw, Johannes A

http://hdl.handle.net/10204/13404

Abstract:

In this work, we explore the use of phonological features in cross-lingual transfer within resource-scarce settings. We modify the architecture of VITS to accept a phonological feature vector as input, instead of phonemes or characters. Subsequently, we train multispeaker base models using data from LibriTTS and then fine-tune them on single-speaker Afrikaans and isiXhosa datasets of varying sizes, representing the resource-scarce setting. We evaluate the synthetic speech both objectively and subjectively and compare it to models trained with the same data using the standard VITS architecture. In our experiments, the proposed system utilizing phonological features as input converges significantly faster and requires less data than the base system. We demonstrate that the model employing phonological features is capable of producing sounds in the target language that were unseen in the source language, even in languages with significant linguistic differences, and with only 5 minutes of data in the target language.
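The idea of replacing phoneme identities with phonological feature vectors can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the feature names, the toy phoneme table, and the helper functions `to_vector`, `encode`, and `is_covered` are hypothetical and are not taken from the paper, whose actual feature set and VITS modifications are not described in this record. The sketch only shows why a sound absent from the source language (here the voiceless velar fricative /x/, as found in Afrikaans) can still be expressed with features that the source-language phonemes already exercise.

```python
# Hypothetical binary feature inventory, for illustration only.
# The paper's real feature set is not given in this record.
FEATURES = ["consonant", "voiced", "nasal", "plosive",
            "fricative", "labial", "alveolar", "velar"]

# Toy phoneme table (assumed values): each phoneme maps to its active features.
PHONEME_FEATURES = {
    "p": {"consonant", "plosive", "labial"},
    "b": {"consonant", "voiced", "plosive", "labial"},
    "m": {"consonant", "voiced", "nasal", "labial"},
    "f": {"consonant", "fricative", "labial"},
    "s": {"consonant", "fricative", "alveolar"},
    "k": {"consonant", "plosive", "velar"},
    "g": {"consonant", "voiced", "plosive", "velar"},
    "x": {"consonant", "fricative", "velar"},  # voiceless velar fricative
}

# Toy source-language inventory: note that "x" is deliberately missing.
SOURCE_INVENTORY = ["p", "b", "m", "f", "s", "k", "g"]


def to_vector(phoneme):
    """Return the 0/1 feature vector for one phoneme, ordered as FEATURES."""
    active = PHONEME_FEATURES[phoneme]
    return [1 if name in active else 0 for name in FEATURES]


def encode(phonemes):
    """Encode a phoneme sequence as a list of feature vectors.

    A feature-based model would consume vectors like these instead of
    phoneme or character indices.
    """
    return [to_vector(p) for p in phonemes]


def is_covered(phoneme, inventory):
    """True if every active feature of `phoneme` occurs in some inventory phoneme."""
    seen = set()
    for p in inventory:
        seen |= PHONEME_FEATURES[p]
    return PHONEME_FEATURES[phoneme] <= seen


if __name__ == "__main__":
    print(to_vector("x"))                     # [1, 0, 0, 0, 1, 0, 0, 1]
    print("x" in SOURCE_INVENTORY)            # False
    print(is_covered("x", SOURCE_INVENTORY))  # True
```

In this toy setting, /x/ never appears in the source inventory, yet each of its active features is seen in training through /f/, /s/, /k/ and /g/. A model with a phoneme lookup table would have no trained entry for /x/, whereas a feature-based input gives it a vector built from familiar dimensions.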

Reference:

Louw, Johannes A. "Cross-lingual transfer using phonological features for resource-scarce text-to-speech." 12th ISCA Speech Synthesis Workshop, Grenoble, France, 26-28 August 2023 (2023): http://hdl.handle.net/10204/13404

Louw, Johannes A

Aug 2023

Text-to-speech
Resource-scarce
Phonological features
Cross-lingual

Source

12th ISCA Speech Synthesis Workshop, Grenoble, France, 26-28 August 2023

This item appears in the following Collection(s)

Conference Publications
