dc.contributor.author |
Mukwevho, M
|
|
dc.contributor.author |
Rananga, S
|
|
dc.contributor.author |
Mbooi, Mahlatse S
|
|
dc.contributor.author |
Isong, B
|
|
dc.contributor.author |
Marivate, V
|
|
dc.date.accessioned |
2024-09-16T08:16:31Z |
|
dc.date.available |
2024-09-16T08:16:31Z |
|
dc.date.issued |
2024-05 |
|
dc.identifier.citation |
Mukwevho, M., Rananga, S., Mbooi, M.S., Isong, B. & Marivate, V. 2024. Building a dataset for misinformation detection in the low-resource language. http://hdl.handle.net/10204/13760 . |
en_ZA |
dc.identifier.uri |
http://hdl.handle.net/10204/13760
|
|
dc.description.abstract |
In the modern digital age, the widespread dissemination of misinformation has become a serious issue. Most work on identifying misinformation online has targeted the English language rather than low-resource languages like Tshivenda. In this paper, we create a new dataset of news in the Tshivenda language to assist in developing resources for misinformation detection in the language. In our proposed methodology, we leveraged conditional random fields (CRF), gated recurrent unit (GRU), and long short-term memory (LSTM) to collect and annotate social media content. By applying these deep learning approaches to existing Tshivenda posts, we can assess their effectiveness for identifying false news in a low-resource language setting. This paper emphasises the vital need to combat misinformation in languages with limited resources, such as Tshivenda. Through the creation of a specialised dataset and the use of advanced techniques, it aims to address the spread of misinformation in under-represented language communities. |
en_US |
dc.format |
Fulltext |
en_US |
dc.language.iso |
en |
en_US |
dc.source |
IST-Africa Conference (IST-Africa), 20-24 May 2024 |
en_US |
dc.subject |
Misinformation |
en_US |
dc.subject |
Natural Language Processing |
en_US |
dc.subject |
NLP |
en_US |
dc.subject |
Social media |
en_US |
dc.subject |
Low-resource language |
en_US |
dc.subject |
Conditional Random Fields |
en_US |
dc.subject |
CRF |
en_US |
dc.subject |
Gated Recurrent Unit |
en_US |
dc.subject |
GRU |
en_US |
dc.subject |
Long short-term memory |
en_US |
dc.subject |
LSTM |
en_US |
dc.title |
Building a dataset for misinformation detection in the low-resource language |
en_US |
dc.type |
Conference Presentation |
en_US |
dc.description.pages |
9 |
en_US |
dc.description.note |
Copyright © 2024 The authors |
en_US |
dc.description.cluster |
Next Generation Enterprises & Institutions |
en_US |
dc.description.impactarea |
Data Science |
en_US |
dc.identifier.apacitation |
Mukwevho, M., Rananga, S., Mbooi, M. S., Isong, B., & Marivate, V. (2024). Building a dataset for misinformation detection in the low-resource language. http://hdl.handle.net/10204/13760 |
en_ZA |
dc.identifier.chicagocitation |
Mukwevho, M, S Rananga, Mahlatse S Mbooi, B Isong, and V Marivate. "Building a dataset for misinformation detection in the low-resource language." <i>IST-Africa Conference (IST-Africa), 20-24 May 2024</i> (2024): http://hdl.handle.net/10204/13760 |
en_ZA |
dc.identifier.vancouvercitation |
Mukwevho M, Rananga S, Mbooi MS, Isong B, Marivate V. Building a dataset for misinformation detection in the low-resource language; 2024. http://hdl.handle.net/10204/13760 . |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - Mukwevho, M
AU - Rananga, S
AU - Mbooi, Mahlatse S
AU - Isong, B
AU - Marivate, V
AB - In the modern digital age, the widespread dissemination of misinformation has become a serious issue. Most work on identifying misinformation online has targeted the English language rather than low-resource languages like Tshivenda. In this paper, we create a new dataset of news in the Tshivenda language to assist in developing resources for misinformation detection in the language. In our proposed methodology, we leveraged conditional random fields (CRF), gated recurrent unit (GRU), and long short-term memory (LSTM) to collect and annotate social media content. By applying these deep learning approaches to existing Tshivenda posts, we can assess their effectiveness for identifying false news in a low-resource language setting. This paper emphasises the vital need to combat misinformation in languages with limited resources, such as Tshivenda. Through the creation of a specialised dataset and the use of advanced techniques, it aims to address the spread of misinformation in under-represented language communities.
DA - 2024-05
DB - ResearchSpace
DP - CSIR
J1 - IST-Africa Conference (IST-Africa), 20-24 May 2024
KW - Misinformation
KW - Natural Language Processing
KW - NLP
KW - Social media
KW - Low-resource language
KW - Conditional Random Fields
KW - CRF
KW - Gated Recurrent Unit
KW - GRU
KW - Long short-term memory
KW - LSTM
LK - https://researchspace.csir.co.za
PY - 2024
T1 - Building a dataset for misinformation detection in the low-resource language
TI - Building a dataset for misinformation detection in the low-resource language
UR - http://hdl.handle.net/10204/13760
ER -
|
en_ZA |
dc.identifier.worklist |
28133 |
en_US |