ResearchSpace

Transformer-based machine translation for low-resourced languages embedded with language identification


dc.contributor.author Sefara, Tshephisho J
dc.contributor.author Zwane, SG
dc.contributor.author Gama, N
dc.contributor.author Sibisi, H
dc.contributor.author Senoamadi, PN
dc.contributor.author Marivate, V
dc.date.accessioned 2021-08-16T12:47:45Z
dc.date.available 2021-08-16T12:47:45Z
dc.date.issued 2021-04
dc.identifier.citation Sefara, T.J., Zwane, S., Gama, N., Sibisi, H., Senoamadi, P. & Marivate, V. 2021. Transformer-based machine translation for low-resourced languages embedded with language identification. http://hdl.handle.net/10204/12082 . en_ZA
dc.identifier.isbn 978-1-7281-8081-6
dc.identifier.isbn 978-1-7281-8082-3
dc.identifier.uri DOI: 10.1109/ICTAS50802.2021.9394996
dc.identifier.uri http://hdl.handle.net/10204/12082
dc.description.abstract Recent research on the development of machine translation (MT) models has resulted in state-of-the-art performance for many well-resourced European languages. However, there has been little focus on applying these MT services to low-resourced languages. This paper presents the development of neural machine translation (NMT) for low-resourced languages of South Africa. Two MT models, JoeyNMT and a transformer NMT with self-attention, are trained and evaluated using the BLEU score. The transformer NMT with self-attention obtained state-of-the-art performance on isiNdebele, SiSwati, Setswana, Tshivenda, isiXhosa, and Sepedi, while JoeyNMT performed well on isiZulu. The MT models are embedded with a language identification (LID) model that presets the language for the translation models. The LID models are trained using logistic regression and multinomial naive Bayes (MNB). The MNB classifier obtained an accuracy of 99%, outperforming logistic regression, which obtained a lower accuracy of 97%. en_US
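The abstract's LID component (multinomial naive Bayes vs. logistic regression) can be sketched as follows. This is a minimal illustration, not the paper's actual setup: the toy snippets, character n-gram features, and scikit-learn pipeline are all assumptions standing in for the real corpus and training configuration.

```python
# Sketch of a LID classifier comparison, as described in the abstract.
# Assumptions: character n-gram TF-IDF features and scikit-learn models;
# the toy snippets below are illustrative, not the paper's data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny stand-in corpus (hypothetical examples, repeated for training).
texts = [
    "ngiyabonga kakhulu", "sawubona unjani",   # isiZulu
    "ke a leboga thata", "dumela o kae",       # Setswana
    "enkosi kakhulu", "molo unjani",           # isiXhosa
] * 10
labels = ["zul", "zul", "tsn", "tsn", "xho", "xho"] * 10

def build_lid(clf):
    # Character n-grams are a common feature choice for LID of short text.
    return make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)), clf
    )

mnb = build_lid(MultinomialNB()).fit(texts, labels)
lr = build_lid(LogisticRegression(max_iter=1000)).fit(texts, labels)

# The predicted language code could then preset the MT model to use.
print(mnb.predict(["sawubona unjani"]))
print(lr.predict(["ke a leboga thata"]))
```

In the system the abstract describes, the LID prediction selects which translation model handles an input sentence, so a user need not specify the source language.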
dc.format Fulltext en_US
dc.language.iso en en_US
dc.relation.uri https://ieeexplore.ieee.org/document/9394996 en_US
dc.source 5th Conference on Information Communications Technology and Society, Durban, South Africa, 10-11 March 2021 en_US
dc.subject Language identification en_US
dc.subject Low-resourced languages en_US
dc.subject Machine translation en_US
dc.subject Neural network en_US
dc.title Transformer-based machine translation for low-resourced languages embedded with language identification en_US
dc.type Conference Presentation en_US
dc.description.pages 6pp en_US
dc.description.note ©2021 IEEE. Due to copyright restrictions, the attached PDF file only contains the abstract of the full-text item. For access to the full-text item, please consult the publisher's website: https://ieeexplore.ieee.org/document/9394996 en_US
dc.description.cluster Next Generation Enterprises & Institutions
dc.description.impactarea Data Science en_US
dc.identifier.apacitation Sefara, T. J., Zwane, S., Gama, N., Sibisi, H., Senoamadi, P., & Marivate, V. (2021). Transformer-based machine translation for low-resourced languages embedded with language identification. http://hdl.handle.net/10204/12082 en_ZA
dc.identifier.chicagocitation Sefara, Tshephisho J, SG Zwane, N Gama, H Sibisi, PN Senoamadi, and V Marivate. "Transformer-based machine translation for low-resourced languages embedded with language identification." <i>5th Conference on Information Communications Technology and Society, Durban, South Africa, 10-11 March 2021</i> (2021): http://hdl.handle.net/10204/12082 en_ZA
dc.identifier.vancouvercitation Sefara TJ, Zwane S, Gama N, Sibisi H, Senoamadi P, Marivate V, Transformer-based machine translation for low-resourced languages embedded with language identification; 2021. http://hdl.handle.net/10204/12082 . en_ZA
dc.identifier.ris TY - Conference Presentation AU - Sefara, Tshephisho J AU - Zwane, SG AU - Gama, N AU - Sibisi, H AU - Senoamadi, PN AU - Marivate, V AB - Recent research on the development of machine translation (MT) models has resulted in state-of-the-art performance for many well-resourced European languages. However, there has been little focus on applying these MT services to low-resourced languages. This paper presents the development of neural machine translation (NMT) for low-resourced languages of South Africa. Two MT models, JoeyNMT and a transformer NMT with self-attention, are trained and evaluated using the BLEU score. The transformer NMT with self-attention obtained state-of-the-art performance on isiNdebele, SiSwati, Setswana, Tshivenda, isiXhosa, and Sepedi, while JoeyNMT performed well on isiZulu. The MT models are embedded with a language identification (LID) model that presets the language for the translation models. The LID models are trained using logistic regression and multinomial naive Bayes (MNB). The MNB classifier obtained an accuracy of 99%, outperforming logistic regression, which obtained a lower accuracy of 97%. DA - 2021-04 DB - ResearchSpace DP - CSIR J1 - 5th Conference on Information Communications Technology and Society, Durban, South Africa, 10-11 March 2021 KW - Language identification KW - Low-resourced languages KW - Machine translation KW - Neural network LK - https://researchspace.csir.co.za PY - 2021 SM - 978-1-7281-8081-6 SM - 978-1-7281-8082-3 T1 - Transformer-based machine translation for low-resourced languages embedded with language identification TI - Transformer-based machine translation for low-resourced languages embedded with language identification UR - http://hdl.handle.net/10204/12082 ER - en_ZA
dc.identifier.worklist 24841 en_US


