ResearchSpace

Generation of segmented isiZulu text

dc.contributor.author Mkhwanazi, Sthembiso
dc.contributor.author Marais, Laurette
dc.date.accessioned 2024-09-13T08:27:41Z
dc.date.available 2024-09-13T08:27:41Z
dc.date.issued 2024-02
dc.identifier.citation Mkhwanazi, S. & Marais, L. 2024. Generation of segmented isiZulu text. Journal of the Digital Humanities Association of Southern Africa, 5(1). http://hdl.handle.net/10204/13749 en_ZA
dc.identifier.issn 3006-6492
dc.identifier.uri https://doi.org/10.55492/dhasa.v5i1.5034
dc.identifier.uri http://hdl.handle.net/10204/13749
dc.description.abstract The complex morphology, conjunctive orthography and widespread occurrence of morphophonological alternation in the Nguni languages have given rise to several efforts towards morphological segmentation of tokens of Nguni languages. For supervised methods, annotated data is required, which currently exists as canonically segmented data in the NCHLT corpus and surface segmented data in the Ukwabelana corpus. In this paper, we present a method and segmentation strategy based on a computational grammar for isiZulu. The grammar, which itself has some limitations in processing speed and robustness to unexpected input, is used to create a new set of segmentations for the tokens of the Ukwabelana corpus. en_US
dc.format Fulltext en_US
dc.language.iso en en_US
dc.relation.uri https://upjournals.up.ac.za/index.php/dhasa/article/view/5034 en_US
dc.source Journal of the Digital Humanities Association of Southern Africa, 5(1) en_US
dc.subject Nguni languages en_US
dc.subject Agglutinative languages en_US
dc.subject Morphological segmentation en_US
dc.subject Language models en_US
dc.subject Segmented isiZulu text en_US
dc.title Generation of segmented isiZulu text en_US
dc.type Article en_US
dc.description.pages 8 en_US
dc.description.cluster Next Generation Enterprises & Institutions en_US
dc.description.impactarea Voice Computing en_US
dc.identifier.apacitation Mkhwanazi, S., & Marais, L. (2024). Generation of segmented isiZulu text. Journal of the Digital Humanities Association of Southern Africa, 5(1). http://hdl.handle.net/10204/13749 en_ZA
dc.identifier.chicagocitation Mkhwanazi, Sthembiso, and Laurette Marais. "Generation of segmented isiZulu text." Journal of the Digital Humanities Association of Southern Africa, 5(1) (2024). http://hdl.handle.net/10204/13749 en_ZA
dc.identifier.vancouvercitation Mkhwanazi S, Marais L. Generation of segmented isiZulu text. Journal of the Digital Humanities Association of Southern Africa, 5(1). 2024; http://hdl.handle.net/10204/13749. en_ZA
dc.identifier.ris TY - Article
AU - Mkhwanazi, Sthembiso
AU - Marais, Laurette
AB - The complex morphology, conjunctive orthography and widespread occurrence of morphophonological alternation in the Nguni languages have given rise to several efforts towards morphological segmentation of tokens of Nguni languages. For supervised methods, annotated data is required, which currently exists as canonically segmented data in the NCHLT corpus and surface segmented data in the Ukwabelana corpus. In this paper, we present a method and segmentation strategy based on a computational grammar for isiZulu. The grammar, which itself has some limitations in processing speed and robustness to unexpected input, is used to create a new set of segmentations for the tokens of the Ukwabelana corpus.
DA - 2024-02
DB - ResearchSpace
DP - CSIR
J1 - Journal of the Digital Humanities Association of Southern Africa, 5(1)
KW - Nguni languages
KW - Agglutinative languages
KW - Morphological segmentation
KW - Language models
KW - Segmented isiZulu text
LK - https://researchspace.csir.co.za
PY - 2024
SM - 3006-6492
T1 - Generation of segmented isiZulu text
TI - Generation of segmented isiZulu text
UR - http://hdl.handle.net/10204/13749
ER - en_ZA
dc.identifier.worklist 28061 en_US

