dc.contributor.author |
Mkhwanazi, Sthembiso
|
|
dc.contributor.author |
Marais, Laurette
|
|
dc.date.accessioned |
2024-09-13T08:27:41Z |
|
dc.date.available |
2024-09-13T08:27:41Z |
|
dc.date.issued |
2024-02 |
|
dc.identifier.citation |
Mkhwanazi, S. & Marais, L. 2024. Generation of segmented isiZulu text. Journal of the Digital Humanities Association of Southern Africa, 5(1). http://hdl.handle.net/10204/13749 |
en_ZA |
dc.identifier.issn |
3006-6492 |
|
dc.identifier.uri |
https://doi.org/10.55492/dhasa.v5i1.5034
|
|
dc.identifier.uri |
http://hdl.handle.net/10204/13749
|
|
dc.description.abstract |
The complex morphology, conjunctive orthography and widespread occurrence of morphophonological alternation in the Nguni languages have given rise to several efforts towards morphological segmentation of tokens of Nguni languages. For supervised methods, annotated data is required, which currently exists as canonically segmented data in the NCHLT corpus and surface segmented data in the Ukwabelana corpus. In this paper, we present a method and segmentation strategy based on a computational grammar for isiZulu. The grammar, which itself has some limitations in processing speed and robustness to unexpected input, is used to create a new set of segmentations for the tokens of the Ukwabelana corpus. |
en_US |
dc.format |
Fulltext |
en_US |
dc.language.iso |
en |
en_US |
dc.relation.uri |
https://upjournals.up.ac.za/index.php/dhasa/article/view/5034 |
en_US |
dc.source |
Journal of the Digital Humanities Association of Southern Africa, 5(1) |
en_US |
dc.subject |
Nguni languages |
en_US |
dc.subject |
Agglutinative languages |
en_US |
dc.subject |
Morphological segmentation |
en_US |
dc.subject |
Language models |
en_US |
dc.subject |
Segmented isiZulu text |
en_US |
dc.title |
Generation of segmented isiZulu text |
en_US |
dc.type |
Article |
en_US |
dc.description.pages |
8 |
en_US |
dc.description.cluster |
Next Generation Enterprises & Institutions |
en_US |
dc.description.impactarea |
Voice Computing |
en_US |
dc.identifier.apacitation |
Mkhwanazi, S., & Marais, L. (2024). Generation of segmented isiZulu text. Journal of the Digital Humanities Association of Southern Africa, 5(1). http://hdl.handle.net/10204/13749 |
en_ZA |
dc.identifier.chicagocitation |
Mkhwanazi, Sthembiso, and Laurette Marais. "Generation of segmented isiZulu text." Journal of the Digital Humanities Association of Southern Africa, 5(1) (2024). http://hdl.handle.net/10204/13749 |
en_ZA |
dc.identifier.vancouvercitation |
Mkhwanazi S, Marais L. Generation of segmented isiZulu text. Journal of the Digital Humanities Association of Southern Africa, 5(1). 2024; http://hdl.handle.net/10204/13749. |
en_ZA |
dc.identifier.ris |
TY - Article
AU - Mkhwanazi, Sthembiso
AU - Marais, Laurette
AB - The complex morphology, conjunctive orthography and widespread occurrence of morphophonological alternation in the Nguni languages have given rise to several efforts towards morphological segmentation of tokens of Nguni languages. For supervised methods, annotated data is required, which currently exists as canonically segmented data in the NCHLT corpus and surface segmented data in the Ukwabelana corpus. In this paper, we present a method and segmentation strategy based on a computational grammar for isiZulu. The grammar, which itself has some limitations in processing speed and robustness to unexpected input, is used to create a new set of segmentations for the tokens of the Ukwabelana corpus.
DA - 2024-02
DB - ResearchSpace
DP - CSIR
J1 - Journal of the Digital Humanities Association of Southern Africa, 5(1)
KW - Nguni languages
KW - Agglutinative languages
KW - Morphological segmentation
KW - Language models
KW - Segmented isiZulu text
LK - https://researchspace.csir.co.za
PY - 2024
SM - 3006-6492
T1 - Generation of segmented isiZulu text
TI - Generation of segmented isiZulu text
UR - http://hdl.handle.net/10204/13749
ER -
|
en_ZA |
dc.identifier.worklist |
28061 |
en_US |