dc.contributor.author |
Mbooi, Mahlatse S
|
|
dc.contributor.author |
Rangata, Mapitsi R
|
|
dc.contributor.author |
Sefara, Tshephisho J
|
|
dc.date.accessioned |
2024-07-30T09:15:15Z |
|
dc.date.available |
2024-07-30T09:15:15Z |
|
dc.date.issued |
2024-03 |
|
dc.identifier.citation |
Mbooi, M.S., Rangata, M.R. & Sefara, T.J. 2024. Topic modelling of short texts in the health domain using LDA and bard. http://hdl.handle.net/10204/13735 . |
en_ZA |
dc.identifier.isbn |
979-8-3503-1491-5 |
|
dc.identifier.isbn |
979-8-3503-1492-2 |
|
dc.identifier.uri |
DOI: 10.1109/ICTAS59620.2024.10507116
|
|
dc.identifier.uri |
10.1109/ICTAS59620.2024
|
|
dc.identifier.uri |
http://hdl.handle.net/10204/13735
|
|
dc.description.abstract |
This paper proposes a model for topic modelling of tweets in the health and mental health domain using the Latent Dirichlet Allocation (LDA) method. The data were obtained from the Sentiment140 project and prepared for topic modelling with Natural Language Processing (NLP) tasks such as data cleaning and stemming. An LDA model was then trained on the data to group the tweets into topics. We explored models with 1 to 6 topics and, after analysis, chose three topics for the final LDA model. Each topic was assigned a label generated using Bard and coding analysis. This method can be used to label unlabelled data without sophisticated supervised machine learning methods. The resulting labelled data can support data management, information retrieval, supervised machine learning, and other tasks. |
en_US |
dc.format |
Abstract |
en_US |
dc.language.iso |
en |
en_US |
dc.relation.uri |
https://ieeexplore.ieee.org/document/10507116 |
en_US |
dc.relation.uri |
https://ieeexplore.ieee.org/xpl/conhome/10507104/proceeding |
en_US |
dc.source |
2024 Conference on Information Communications Technology and Society (ICTAS), Durban, South Africa, 7-8 March 2024 |
en_US |
dc.subject |
Topic modelling |
en_US |
dc.subject |
Latent Dirichlet allocation
en_US |
dc.subject |
Natural Language Processing |
en_US |
dc.subject |
NLP |
en_US |
dc.title |
Topic modelling of short texts in the health domain using LDA and bard |
en_US |
dc.type |
Conference Presentation |
en_US |
dc.description.pages |
6 |
en_US |
dc.description.note |
©2024 IEEE. Due to copyright restrictions, the attached PDF file only contains the abstract of the full text item. For access to the full text item, please consult the publisher's website: https://ieeexplore.ieee.org/document/10507116 |
en_US |
dc.description.cluster |
Next Generation Enterprises & Institutions |
en_US |
dc.description.impactarea |
Data Science |
en_US |
dc.identifier.apacitation |
Mbooi, M. S., Rangata, M. R., & Sefara, T. J. (2024). Topic modelling of short texts in the health domain using LDA and bard. http://hdl.handle.net/10204/13735 |
en_ZA |
dc.identifier.chicagocitation |
Mbooi, Mahlatse S, Mapitsi R Rangata, and Tshephisho J Sefara. "Topic modelling of short texts in the health domain using LDA and bard." 2024 Conference on Information Communications Technology and Society (ICTAS), Durban, South Africa, 7-8 March 2024 (2024): http://hdl.handle.net/10204/13735 |
en_ZA |
dc.identifier.vancouvercitation |
Mbooi MS, Rangata MR, Sefara TJ. Topic modelling of short texts in the health domain using LDA and bard; 2024. http://hdl.handle.net/10204/13735 . |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - Mbooi, Mahlatse S
AU - Rangata, Mapitsi R
AU - Sefara, Tshephisho J
AB - This paper proposes a model for topic modelling of tweets in the health and mental health domain using the Latent Dirichlet Allocation (LDA) method. The data were obtained from the Sentiment140 project and prepared for topic modelling with Natural Language Processing (NLP) tasks such as data cleaning and stemming. An LDA model was then trained on the data to group the tweets into topics. We explored models with 1 to 6 topics and, after analysis, chose three topics for the final LDA model. Each topic was assigned a label generated using Bard and coding analysis. This method can be used to label unlabelled data without sophisticated supervised machine learning methods. The resulting labelled data can support data management, information retrieval, supervised machine learning, and other tasks.
DA - 2024-03
DB - ResearchSpace
DP - CSIR
J1 - 2024 Conference on Information Communications Technology and Society (ICTAS), Durban, South Africa, 7-8 March 2024
KW - Topic modelling
KW - Latent Dirichlet allocation
KW - Natural Language Processing
KW - NLP
LK - https://researchspace.csir.co.za
PY - 2024
SM - 979-8-3503-1491-5
SM - 979-8-3503-1492-2
T1 - Topic modelling of short texts in the health domain using LDA and bard
TI - Topic modelling of short texts in the health domain using LDA and bard
UR - http://hdl.handle.net/10204/13735
ER -
|
en_ZA |
dc.identifier.worklist |
27926 |
en_US |
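The abstract describes a pipeline of tweet cleaning and stemming, followed by LDA training with a chosen number of topics. The following is a minimal sketch of that kind of pipeline, not the authors' code. The abstract does not name their tooling, stop-word list, stemmer, hyperparameters or topic-selection criterion, so all of these are assumptions here. The sketch uses only the Python standard library. The hypothetical `crude_stem` function is a simple suffix stripper standing in for a real stemmer such as Porter. LDA is fitted by collapsed Gibbs sampling, one standard inference method for LDA. The helper names (`preprocess`, `lda_gibbs`, `top_words`), the hyperparameters `alpha`, `beta` and `iterations`, and the toy tweets are all invented for illustration and are not Sentiment140 data. The Bard labelling step is not reproduced.

```python
import random
import re

# Hypothetical stop-word list (assumption: the paper's list is not given).
STOP_WORDS = {"a", "an", "the", "i", "im", "my", "me", "is", "am", "are", "to",
              "and", "of", "so", "it", "in", "on", "for", "at", "with", "up"}


def crude_stem(word):
    """Strip one common English suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if len(word) > len(suffix) + 2 and word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def preprocess(tweet):
    """Lower-case, drop URLs and @mentions, keep letters only, remove stop words, stem."""
    text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower().replace("'", ""))
    tokens = re.findall(r"[a-z]+", text)
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return the vocabulary and topic-word counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]        # tokens per (document, topic)
    nkw = [[0] * V for _ in range(K)]    # tokens per (topic, word)
    nk = [0] * K                         # tokens per topic
    z = []                               # topic assignment of every token
    for d, doc in enumerate(docs):       # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return vocab, nkw


def top_words(vocab, nkw, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in nkw]


if __name__ == "__main__":
    # Invented toy tweets, NOT taken from Sentiment140.
    tweets = [
        "Feeling sick with a terrible headache today",
        "Went to the gym, great workout and a long run",
        "So anxious and stressed about exams",
        "Doctor says I have the flu, staying in bed",
        "Morning run done, feeling fit and healthy",
        "Can't sleep, my anxiety is keeping me up",
    ]
    docs = [preprocess(t) for t in tweets]
    vocab, nkw = lda_gibbs(docs, num_topics=3)
    for k, words in enumerate(top_words(vocab, nkw)):
        print(f"topic {k}: {words}")
```

The abstract reports that the authors explored 1 to 6 topics. To mimic that step, one could call `lda_gibbs` for `num_topics` in `range(1, 7)` and compare the resulting topic lists, for example with a coherence score or manual inspection. The abstract does not state which criterion the authors used. The printed top words per topic are the kind of input that would then be passed to an LLM such as Bard, together with manual coding, to produce a topic label.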