ResearchSpace

Building a dataset for misinformation detection in the low-resource language

dc.contributor.author Mukwevho, M
dc.contributor.author Rananga, S
dc.contributor.author Mbooi, Mahlatse S
dc.contributor.author Isong, B
dc.contributor.author Marivate, V
dc.date.accessioned 2024-09-16T08:16:31Z
dc.date.available 2024-09-16T08:16:31Z
dc.date.issued 2024-05
dc.identifier.citation Mukwevho, M., Rananga, S., Mbooi, M.S., Isong, B. & Marivate, V. 2024. Building a dataset for misinformation detection in the low-resource language. http://hdl.handle.net/10204/13760 en_ZA
dc.identifier.uri http://hdl.handle.net/10204/13760
dc.description.abstract In the modern digital age, the widespread dissemination of misinformation has become a serious issue. Most work on identifying misinformation online has targeted the English language rather than low-resource languages like Tshivenda. In this paper, we create a new dataset for news in the Tshivenda language to assist in developing resources for misinformation in the language. In our proposed methodology, we leveraged conditional random fields (CRF), gated recurrent unit (GRU), and long short-term memory (LSTM) to collect and annotate social media content. By applying these deep learning approaches to existing Tshivenda posts, we can assess their effectiveness for identifying false news in a low-resource language setting. This paper emphasises the vital need to combat misinformation in languages with limited resources, such as Tshivenda. Through the creation of a specialised dataset and the use of advanced techniques, it aims to address the problem of the spread of misinformation in under-represented language communities. en_US
dc.format Fulltext en_US
dc.language.iso en en_US
dc.source IST-Africa Conference (IST-Africa), 20-24 May 2024 en_US
dc.subject Misinformation en_US
dc.subject Natural Language Processing en_US
dc.subject NLP en_US
dc.subject Social media en_US
dc.subject Low-resource language en_US
dc.subject Conditional Random Fields en_US
dc.subject CRF en_US
dc.subject Gated Recurrent Unit en_US
dc.subject GRU en_US
dc.subject Long short-term memory en_US
dc.subject LSTM en_US
dc.title Building a dataset for misinformation detection in the low-resource language en_US
dc.type Conference Presentation en_US
dc.description.pages 9 en_US
dc.description.note Copyright © 2024 The authors en_US
dc.description.cluster Next Generation Enterprises & Institutions en_US
dc.description.impactarea Data Science en_US
dc.identifier.apacitation Mukwevho, M., Rananga, S., Mbooi, M. S., Isong, B., & Marivate, V. (2024). Building a dataset for misinformation detection in the low-resource language. http://hdl.handle.net/10204/13760 en_ZA
dc.identifier.chicagocitation Mukwevho, M, S Rananga, Mahlatse S Mbooi, B Isong, and V Marivate. "Building a dataset for misinformation detection in the low-resource language." IST-Africa Conference (IST-Africa), 20-24 May 2024 (2024): http://hdl.handle.net/10204/13760 en_ZA
dc.identifier.vancouvercitation Mukwevho M, Rananga S, Mbooi MS, Isong B, Marivate V. Building a dataset for misinformation detection in the low-resource language; 2024. http://hdl.handle.net/10204/13760 en_ZA
dc.identifier.ris
TY - Conference Presentation
AU - Mukwevho, M
AU - Rananga, S
AU - Mbooi, Mahlatse S
AU - Isong, B
AU - Marivate, V
AB - In the modern digital age, the widespread dissemination of misinformation has become a serious issue. Most work on identifying misinformation online has targeted the English language rather than low-resource languages like Tshivenda. In this paper, we create a new dataset for news in the Tshivenda language to assist in developing resources for misinformation in the language. In our proposed methodology, we leveraged conditional random fields (CRF), gated recurrent unit (GRU), and long short-term memory (LSTM) to collect and annotate social media content. By applying these deep learning approaches to existing Tshivenda posts, we can assess their effectiveness for identifying false news in a low-resource language setting. This paper emphasises the vital need to combat misinformation in languages with limited resources, such as Tshivenda. Through the creation of a specialised dataset and the use of advanced techniques, it aims to address the problem of the spread of misinformation in under-represented language communities.
DA - 2024-05
DB - ResearchSpace
DP - CSIR
J1 - IST-Africa Conference (IST-Africa), 20-24 May 2024
KW - Misinformation
KW - Natural Language Processing
KW - NLP
KW - Social media
KW - Low-resource language
KW - Conditional Random Fields
KW - CRF
KW - Gated Recurrent Unit
KW - GRU
KW - Long short-term memory
KW - LSTM
LK - https://researchspace.csir.co.za
PY - 2024
T1 - Building a dataset for misinformation detection in the low-resource language
TI - Building a dataset for misinformation detection in the low-resource language
UR - http://hdl.handle.net/10204/13760
ER -
en_ZA
dc.identifier.worklist 28133 en_US

