ResearchSpace

Topic modelling of short texts in the health domain using LDA and bard

dc.contributor.author Mbooi, Mahlatse S
dc.contributor.author Rangata, Mapitsi R
dc.contributor.author Sefara, Tshephisho J
dc.date.accessioned 2024-07-30T09:15:15Z
dc.date.available 2024-07-30T09:15:15Z
dc.date.issued 2024-03
dc.identifier.citation Mbooi, M.S., Rangata, M.R. & Sefara, T.J. 2024. Topic modelling of short texts in the health domain using LDA and bard. http://hdl.handle.net/10204/13735 . en_ZA
dc.identifier.isbn 979-8-3503-1491-5
dc.identifier.isbn 979-8-3503-1492-2
dc.identifier.uri DOI: 10.1109/ICTAS59620.2024.10507116
dc.identifier.uri 10.1109/ICTAS59620.2024
dc.identifier.uri http://hdl.handle.net/10204/13735
dc.description.abstract This paper proposes a model for the topic modelling of tweets in the health and mental health domain using the Latent Dirichlet Allocation (LDA) method. The data were obtained from the sentiment140 project. The data were prepared for topic modelling by performing Natural Language Processing (NLP) tasks such as stemming and data cleaning. The LDA method was trained on the data to create a cluster of topics. We explored 1 to 6 clusters and, after thorough analysis, three topics were chosen to create the LDA model. Each topic was labelled with a label name that is generated using Bard and coding analysis. This method can be used to label unlabelled data without using sophisticated supervised machine learning methods. Labelled data can be used to improve data management, information retrieval, supervised machine learning, and other techniques. en_US
dc.format Abstract en_US
dc.language.iso en en_US
dc.relation.uri https://ieeexplore.ieee.org/document/10507116 en_US
dc.relation.uri https://ieeexplore.ieee.org/xpl/conhome/10507104/proceeding en_US
dc.source 2024 Conference on Information Communications Technology and Society (ICTAS), Durban, South Africa, 7-8 March 2024 en_US
dc.subject Topic modelling en_US
dc.subject Latent Dirichlet allocation en_US
dc.subject Natural Language Processing en_US
dc.subject NLP en_US
dc.title Topic modelling of short texts in the health domain using LDA and bard en_US
dc.type Conference Presentation en_US
dc.description.pages 6 en_US
dc.description.note ©2024 IEEE. Due to copyright restrictions, the attached PDF file only contains the abstract of the full text item. For access to the full text item, please consult the publisher's website: https://ieeexplore.ieee.org/document/10507116 en_US
dc.description.cluster Next Generation Enterprises & Institutions en_US
dc.description.impactarea Data Science en_US
dc.identifier.apacitation Mbooi, M. S., Rangata, M. R., & Sefara, T. J. (2024). Topic modelling of short texts in the health domain using LDA and bard. http://hdl.handle.net/10204/13735 en_ZA
dc.identifier.chicagocitation Mbooi, Mahlatse S, Mapitsi R Rangata, and Tshephisho J Sefara. "Topic modelling of short texts in the health domain using LDA and bard." 2024 Conference on Information Communications Technology and Society (ICTAS), Durban, South Africa, 7-8 March 2024 (2024): http://hdl.handle.net/10204/13735 en_ZA
dc.identifier.vancouvercitation Mbooi MS, Rangata MR, Sefara TJ. Topic modelling of short texts in the health domain using LDA and bard; 2024. http://hdl.handle.net/10204/13735 . en_ZA
dc.identifier.ris TY - Conference Presentation AU - Mbooi, Mahlatse S AU - Rangata, Mapitsi R AU - Sefara, Tshephisho J AB - This paper proposes a model for the topic modelling of tweets in the health and mental health domain using the Latent Dirichlet Allocation (LDA) method. The data were obtained from the sentiment140 project. The data were prepared for topic modelling by performing Natural Language Processing (NLP) tasks such as stemming and data cleaning. The LDA method was trained on the data to create a cluster of topics. We explored 1 to 6 clusters and, after thorough analysis, three topics were chosen to create the LDA model. Each topic was labelled with a label name that is generated using Bard and coding analysis. This method can be used to label unlabelled data without using sophisticated supervised machine learning methods. Labelled data can be used to improve data management, information retrieval, supervised machine learning, and other techniques. DA - 2024-03 DB - ResearchSpace DP - CSIR J1 - 2024 Conference on Information Communications Technology and Society (ICTAS), Durban, South Africa, 7-8 March 2024 KW - Topic modelling KW - Latent Dirichlet allocation KW - Natural Language Processing KW - NLP LK - https://researchspace.csir.co.za PY - 2024 SM - 979-8-3503-1491-5 SM - 979-8-3503-1492-2 T1 - Topic modelling of short texts in the health domain using LDA and bard TI - Topic modelling of short texts in the health domain using LDA and bard UR - http://hdl.handle.net/10204/13735 ER - en_ZA
dc.identifier.worklist 27926 en_US
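
The pipeline summarised in dc.description.abstract (data cleaning and stemming, then LDA fitted for 1 to 6 topics) can be illustrated with a minimal sketch. It is not the authors' code. The sample tweets, the stopword list, the crude suffix stemmer, the hyperparameters and the collapsed Gibbs sampler are all assumptions chosen to keep the example dependency-free. The record does not say which LDA implementation was used, and the topic labelling step with Bard is not reproduced here.

```python
import random
import re
from collections import Counter

# Hypothetical stand-in tweets; the paper used sentiment140 data, not reproduced here.
TWEETS = [
    "Feeling so anxious and stressed today, can't sleep http://t.co/x",
    "@sam my anxiety and stress keep me from sleeping",
    "Flu season again: fever, cough and a sore throat",
    "Coughing all night, fever is back, throat hurts",
    "Morning run then gym, eating healthy this week!",
    "Healthy eating and running every morning feels great",
]
# Assumed, deliberately tiny stopword list.
STOPWORDS = {"the", "and", "can", "from", "all", "this", "then", "again", "keep"}


def stem(token):
    """Crude suffix stripping, a placeholder for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if suffix == "s" and token.endswith("ss"):
            continue  # leave words like "stress" intact
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean(text):
    """Lowercase, drop URLs and @mentions, keep alphabetic tokens, then stem."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [stem(t) for t in tokens if len(t) > 2 and t not in STOPWORDS]


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns doc-topic and topic-word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        assign.append([rng.randrange(n_topics) for _ in doc])
        for w, z in zip(doc, assign[d]):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=5):
    """Most frequent stems per topic, skipping words whose count fell to zero."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]


docs = [clean(t) for t in TWEETS]
# Explore 1 to 6 topics, as in the abstract; inspection of the output is manual.
models = {k: fit_lda(docs, k) for k in range(1, 7)}
for k, (_, topic_word) in models.items():
    print(k, top_words(topic_word))
```

The printed stems per topic are what a person (or, in the paper, Bard together with coding analysis) would read in order to assign a label. How the authors compared the candidate topic counts is not stated in this record, so the sketch leaves that comparison to manual inspection.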

