Topic modelling of short texts in the health domain using LDA and Bard

Mbooi, Mahlatse S; Rangata, Mapitsi R; Sefara, Tshephisho J

DOI: 10.1109/ICTAS59620.2024.10507116
10.1109/ICTAS59620.2024
http://hdl.handle.net/10204/13735

Abstract:

This paper proposes a model for topic modelling of tweets in the health and mental health domain using Latent Dirichlet Allocation (LDA). The data were obtained from the Sentiment140 project and prepared for topic modelling with Natural Language Processing (NLP) tasks such as stemming and data cleaning. An LDA model was trained on the data to produce clusters of topics. We explored one to six clusters and, after thorough analysis, chose three topics for the final LDA model. Each topic was assigned a label generated using Bard and coding analysis. This method can be used to label unlabelled data without sophisticated supervised machine learning methods. The labelled data can then be used to improve data management, information retrieval, supervised machine learning, and other techniques.
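The pipeline summarised in the abstract (clean and stem short texts, then fit an LDA model with a small number of topics) can be illustrated with a minimal sketch. The abstract does not say which library or inference method the authors used, so this example swaps in a plain-Python collapsed Gibbs sampler as a stand-in. The example tweets, the stopword list, the crude suffix stemmer, and the hyperparameters (alpha, beta, iterations) are all assumptions made for illustration and are not taken from the paper. Topic labelling with Bard is not shown.

```python
import random
import re

# Hypothetical stopword list, far smaller than one a real pipeline would use.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "is", "in", "my", "i", "it", "for", "on", "so", "at"}


def clean_tweet(text):
    """Lowercase, strip URLs and @mentions, drop stopwords, apply crude suffix stemming."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    stemmed = []
    for tok in re.findall(r"[a-z]+", text):
        if tok in STOPWORDS:
            continue
        for suffix in ("ing", "ed", "s"):
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        stemmed.append(tok)
    return stemmed


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns (doc_topic, topic_word, vocab) count tables."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1
    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=5):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Invented example tweets, not drawn from Sentiment140.
tweets = [
    "Feeling sick today, headache and fever http://example.com/x",
    "@friend my anxiety is so bad, stressed and cannot sleep",
    "Went to the gym, running and eating healthy food",
    "Doctor said the fever is flu, resting at home",
    "Stressed about exams, anxiety keeps me awake",
    "Healthy breakfast and a long run this morning",
]
docs = [clean_tweet(t) for t in tweets]
doc_topic, topic_word, vocab = fit_lda(docs, num_topics=3)
for k, words in enumerate(top_words(topic_word, vocab)):
    print(f"topic {k}: {words}")
```

Here `num_topics=3` mirrors the three topics the abstract reports. Comparing one to six topics, as the authors describe, would mean refitting with each value and inspecting the resulting top words. With a corpus this small the topics are not meaningful, so the sketch only shows the mechanics.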

Reference:

Mbooi, M.S., Rangata, M.R. & Sefara, T.J. 2024. Topic modelling of short texts in the health domain using LDA and Bard. http://hdl.handle.net/10204/13735

Mbooi, M. S., Rangata, M. R., & Sefara, T. J. (2024). Topic modelling of short texts in the health domain using LDA and Bard. http://hdl.handle.net/10204/13735

Mbooi, Mahlatse S, Mapitsi R Rangata, and Tshephisho J Sefara. "Topic modelling of short texts in the health domain using LDA and Bard." 2024 Conference on Information Communications Technology and Society (ICTAS), Durban, South Africa, 7-8 March 2024 (2024): http://hdl.handle.net/10204/13735

Mbooi MS, Rangata MR, Sefara TJ. Topic modelling of short texts in the health domain using LDA and Bard; 2024. http://hdl.handle.net/10204/13735

Mbooi, Mahlatse S
Rangata, Mapitsi R
Sefara, Tshephisho J

Mar 2024

Topic modelling
Latent Dirichlet allocation
Natural Language Processing
NLP

Files in this item

Mbooi_2024.pdf

Source

2024 Conference on Information Communications Technology and Society (ICTAS), Durban, South Africa, 7-8 March 2024

This item appears in the following Collection(s)

Conference Publications
