dc.contributor.author |
Sefara, Tshephisho J
|
|
dc.contributor.author |
Zwane, SG
|
|
dc.contributor.author |
Gama, N
|
|
dc.contributor.author |
Sibisi, H
|
|
dc.contributor.author |
Senoamadi, PN
|
|
dc.contributor.author |
Marivate, V
|
|
dc.date.accessioned |
2021-08-16T12:47:45Z |
|
dc.date.available |
2021-08-16T12:47:45Z |
|
dc.date.issued |
2021-04 |
|
dc.identifier.citation |
Sefara, T.J., Zwane, S., Gama, N., Sibisi, H., Senoamadi, P. & Marivate, V. 2021. Transformer-based machine translation for low-resourced languages embedded with language identification. http://hdl.handle.net/10204/12082. |
en_ZA |
dc.identifier.isbn |
978-1-7281-8081-6 |
|
dc.identifier.isbn |
978-1-7281-8082-3 |
|
dc.identifier.uri |
https://doi.org/10.1109/ICTAS50802.2021.9394996
|
|
dc.identifier.uri |
http://hdl.handle.net/10204/12082
|
|
dc.description.abstract |
Recent research on the development of machine translation (MT) models has resulted in state-of-the-art performance for many well-resourced European languages. However, there has been little focus on applying these MT services to low-resourced languages. This paper presents the development of neural machine translation (NMT) for low-resourced languages of South Africa. Two MT models, JoeyNMT and transformer NMT with self-attention, are trained and evaluated using the BLEU score. The transformer NMT with self-attention obtained state-of-the-art performance on isiNdebele, SiSwati, Setswana, Tshivenda, isiXhosa, and Sepedi, while JoeyNMT performed well on isiZulu. The MT models are embedded with a language identification (LID) model that presets the language for the translation models. The LID models are trained using logistic regression and multinomial naive Bayes (MNB). The MNB classifier obtained an accuracy of 99%, outperforming logistic regression, which obtained the lowest accuracy of 97%. |
en_US |
dc.format |
Fulltext |
en_US |
dc.language.iso |
en |
en_US |
dc.relation.uri |
https://ieeexplore.ieee.org/document/9394996 |
en_US |
dc.source |
5th Conference on Information Communications Technology and Society, Durban, South Africa, 10-11 March 2021 |
en_US |
dc.subject |
Language identification |
en_US |
dc.subject |
Low-resourced languages |
en_US |
dc.subject |
Machine translation |
en_US |
dc.subject |
Neural network |
en_US |
dc.title |
Transformer-based machine translation for low-resourced languages embedded with language identification |
en_US |
dc.type |
Conference Presentation |
en_US |
dc.description.pages |
6pp |
en_US |
dc.description.note |
©2021 IEEE. Due to copyright restrictions, the attached PDF file only contains the abstract of the full-text item. For access to the full-text item, please consult the publisher's website: https://ieeexplore.ieee.org/document/9394996 |
en_US |
dc.description.cluster |
Next Generation Enterprises & Institutions |
|
dc.description.impactarea |
Data Science |
en_US |
dc.identifier.apacitation |
Sefara, T. J., Zwane, S., Gama, N., Sibisi, H., Senoamadi, P., & Marivate, V. (2021). Transformer-based machine translation for low-resourced languages embedded with language identification. http://hdl.handle.net/10204/12082 |
en_ZA |
dc.identifier.chicagocitation |
Sefara, Tshephisho J, SG Zwane, N Gama, H Sibisi, PN Senoamadi, and V Marivate. "Transformer-based machine translation for low-resourced languages embedded with language identification." <i>5th Conference on Information Communications Technology and Society, Durban, South Africa, 10-11 March 2021</i> (2021): http://hdl.handle.net/10204/12082 |
en_ZA |
dc.identifier.vancouvercitation |
Sefara TJ, Zwane S, Gama N, Sibisi H, Senoamadi P, Marivate V, Transformer-based machine translation for low-resourced languages embedded with language identification; 2021. http://hdl.handle.net/10204/12082. |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - Sefara, Tshephisho J
AU - Zwane, SG
AU - Gama, N
AU - Sibisi, H
AU - Senoamadi, PN
AU - Marivate, V
AB - Recent research on the development of machine translation (MT) models has resulted in state-of-the-art performance for many well-resourced European languages. However, there has been little focus on applying these MT services to low-resourced languages. This paper presents the development of neural machine translation (NMT) for low-resourced languages of South Africa. Two MT models, JoeyNMT and transformer NMT with self-attention, are trained and evaluated using the BLEU score. The transformer NMT with self-attention obtained state-of-the-art performance on isiNdebele, SiSwati, Setswana, Tshivenda, isiXhosa, and Sepedi, while JoeyNMT performed well on isiZulu. The MT models are embedded with a language identification (LID) model that presets the language for the translation models. The LID models are trained using logistic regression and multinomial naive Bayes (MNB). The MNB classifier obtained an accuracy of 99%, outperforming logistic regression, which obtained the lowest accuracy of 97%.
DA - 2021-04
DB - ResearchSpace
DP - CSIR
J1 - 5th Conference on Information Communications Technology and Society, Durban, South Africa, 10-11 March 2021
KW - Language identification
KW - Low-resourced languages
KW - Machine translation
KW - Neural network
LK - https://researchspace.csir.co.za
PY - 2021
SM - 978-1-7281-8081-6
SM - 978-1-7281-8082-3
T1 - Transformer-based machine translation for low-resourced languages embedded with language identification
TI - Transformer-based machine translation for low-resourced languages embedded with language identification
UR - http://hdl.handle.net/10204/12082
ER - |
en_ZA |
dc.identifier.worklist |
24841 |
en_US |