Recent research on the development of machine translation (MT) models has resulted in state-of-the-art performance for many well-resourced European languages. However, there has been little focus on applying these MT services to low-resourced languages. This paper presents the development of neural machine translation (NMT) for low-resourced languages of South Africa. Two MT models, JoeyNMT and a transformer NMT with self-attention, are trained and evaluated using the BLEU score. The transformer NMT with self-attention obtained state-of-the-art performance on isiNdebele, SiSwati, Setswana, Tshivenda, isiXhosa, and Sepedi, while JoeyNMT performed well on isiZulu. The MT models are embedded with a language identification (LID) model that presets the language for the translation models. The LID models are trained using logistic regression and multinomial naive Bayes (MNB). The MNB classifier obtained an accuracy of 99%, outperforming logistic regression, which obtained an accuracy of 97%.
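As an illustration of how an LID component could preset the language for the translation models, the following is a minimal sketch of a multinomial naive Bayes language identifier. It assumes scikit-learn and character n-gram count features; the example sentences, labels, and settings are hypothetical and not taken from the paper.

# Minimal sketch (assumptions: scikit-learn, character n-gram features,
# hypothetical training sentences and language labels).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

sentences = ["Ngiyabonga kakhulu", "Ke a leboga", "Ndi a livhuwa"]
labels = ["zul", "tsn", "ven"]

# Character n-grams are a common choice for short-text language identification.
lid_model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 3)),
    MultinomialNB(),
)
lid_model.fit(sentences, labels)

# The predicted language code could then select the corresponding MT model.
print(lid_model.predict(["Ke a leboga haholo"]))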
Reference:
Sefara, T.J., Zwane, S., Gama, N., Sibisi, H., Senoamadi, P. & Marivate, V. (2021). Transformer-based machine translation for low-resourced languages embedded with language identification. 5th Conference on Information Communications Technology and Society, Durban, South Africa, 10-11 March 2021. http://hdl.handle.net/10204/12082