Semi-supervised probabilistics approach for normalising informal short text messages

Modupe, A; Celik, T; Marivate, Vukosi N; Diale, Melvin

dc.contributor.author	Modupe, A
dc.contributor.author	Celik, T
dc.contributor.author	Marivate, Vukosi N
dc.contributor.author	Diale, Melvin
dc.date.accessioned	2018-01-09T07:15:46Z
dc.date.available	2018-01-09T07:15:46Z
dc.date.issued	2017-03
dc.identifier.citation	Modupe, A. et al. 2017. Semi-supervised probabilistics approach for normalising informal short text messages. 2017 Conference on Information Communications Technology and Society (ICTAS), Umhlanga, South Africa, 8-10 March 2017	en_US
dc.identifier.isbn	2017 Conference on Information Communications Technology and Society (ICTAS), Umhlanga, South Africa, 8-10 March 2017
dc.identifier.uri	http://ieeexplore.ieee.org/document/7920659/?reload=true
dc.identifier.uri	http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7920659
dc.identifier.uri	DOI: 10.1109/ICTAS.2017.7920659
dc.identifier.uri	http://hdl.handle.net/10204/9934
dc.description	Copyright: 2017 IEEE. Due to copyright restrictions, the attached PDF file only contains the abstract of the full text item. For access to the full text item, please consult the publisher's website.	en_US
dc.description.abstract	The growing volume of informal text messages on Twitter is a well-known source of big data. These types of messages are noisy and frequently rife with acronyms, slang, grammatical errors and non-standard words, which cause difficulties for natural language processing (NLP) techniques. In this study, our contribution is to target non-standard words in short text and propose a method that predicts the standard form to which a given word is likely to be transformed. Our method uses language model probability to characterise the relationship between formal and informal words, then employs string similarity within a log-linear model that includes features for both word-level transformation and local context similarity. The weights of these features are trained within a maximum likelihood framework using stochastic gradient descent (SGD) to hypothesise the most likely clean form for a given informal short text. Experiments were conducted on a publicly available English-language tweet dataset, and the approach is able to normalise inflected words in an online social network.	en_US
dc.language.iso	en	en_US
dc.publisher	IEEE	en_US
dc.relation.ispartofseries	Worklist;19724
dc.subject	Informal Short Text Messages	en_US
dc.subject	User-generated content	en_US
dc.subject	UGC	en_US
dc.subject	Social Media Text	en_US
dc.subject	Natural Language Processing	en_US
dc.subject	Stochastic gradient descent	en_US
dc.subject	SGD	en_US
dc.subject	Online Social Network	en_US
dc.title	Semi-supervised probabilistics approach for normalising informal short text messages	en_US
dc.type	Conference Presentation	en_US
dc.identifier.apacitation	Modupe, A., Celik, T., Marivate, V. N., & Diale, M. (2017). Semi-supervised probabilistics approach for normalising informal short text messages. IEEE. http://hdl.handle.net/10204/9934	en_ZA
dc.identifier.chicagocitation	Modupe, A, T Celik, Vukosi N Marivate, and Melvin Diale. "Semi-supervised probabilistics approach for normalising informal short text messages." (2017): http://hdl.handle.net/10204/9934	en_ZA
dc.identifier.vancouvercitation	Modupe A, Celik T, Marivate VN, Diale M, Semi-supervised probabilistics approach for normalising informal short text messages; IEEE; 2017. http://hdl.handle.net/10204/9934 .	en_ZA
dc.identifier.ris	TY - Conference Presentation AU - Modupe, A AU - Celik, T AU - Marivate, Vukosi N AU - Diale, Melvin AB - The growing volume of informal text messages on Twitter is a well-known source of big data. These types of messages are noisy and frequently rife with acronyms, slang, grammatical errors and non-standard words, which cause difficulties for natural language processing (NLP) techniques. In this study, our contribution is to target non-standard words in short text and propose a method that predicts the standard form to which a given word is likely to be transformed. Our method uses language model probability to characterise the relationship between formal and informal words, then employs string similarity within a log-linear model that includes features for both word-level transformation and local context similarity. The weights of these features are trained within a maximum likelihood framework using stochastic gradient descent (SGD) to hypothesise the most likely clean form for a given informal short text. Experiments were conducted on a publicly available English-language tweet dataset, and the approach is able to normalise inflected words in an online social network. DA - 2017-03 DB - ResearchSpace DP - CSIR KW - Informal Short Text Messages KW - User-generated content KW - UGC KW - Social Media Text KW - Natural Language Processing KW - Stochastic gradient descent KW - SGD KW - Online Social Network LK - https://researchspace.csir.co.za PY - 2017 SM - 2017 Conference on Information Communications Technology and Society (ICTAS), Umhlanga, South Africa, 8-10 March 2017 T1 - Semi-supervised probabilistics approach for normalising informal short text messages TI - Semi-supervised probabilistics approach for normalising informal short text messages UR - http://hdl.handle.net/10204/9934 ER -	en_ZA
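
Illustrative sketch (not part of the record): the abstract describes a log-linear model over string-similarity, language-model and local-context features, with weights fitted by maximum likelihood using SGD. The Python below is a minimal, fully supervised toy version of that general idea, not the authors' implementation. The vocabulary, counts, training triples, the three features and the hyperparameters are all hypothetical and chosen only to keep the example self-contained. The semi-supervised aspect of the paper is not modelled here.

```python
import math
from difflib import SequenceMatcher

# Hypothetical toy data (assumption, not from the paper): unigram counts for a
# small formal vocabulary, a few bigram counts, and training triples of
# (previous word, informal word, gold formal word).
VOCAB_COUNTS = {"you": 50, "your": 30, "tomorrow": 10, "to": 60,
                "too": 15, "great": 20, "good": 25, "see": 20}
BIGRAM_COUNTS = {("see", "you"): 12, ("see", "your"): 1,
                 ("you", "tomorrow"): 5, ("you", "to"): 2,
                 ("very", "good"): 6, ("very", "great"): 1}
TRAIN = [("see", "u", "you"),
         ("you", "2moro", "tomorrow"),
         ("very", "gud", "good")]
TOTAL = sum(VOCAB_COUNTS.values())


def features(prev, informal, cand):
    """Return [string similarity, unigram log-probability, context score]."""
    sim = SequenceMatcher(None, informal, cand).ratio()
    lm = math.log(VOCAB_COUNTS[cand] / TOTAL)
    # log(1 + count) of the candidate following the previous word.
    ctx = math.log(1 + BIGRAM_COUNTS.get((prev, cand), 0))
    return [sim, lm, ctx]


def distribution(theta, prev, informal):
    """Log-linear (softmax) distribution over every formal candidate."""
    cands = list(VOCAB_COUNTS)
    feats = [features(prev, informal, c) for c in cands]
    scores = [sum(t * f for t, f in zip(theta, fv)) for fv in feats]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return cands, feats, [e / z for e in exps]


def train(epochs=200, lr=0.1):
    """Fit the weights by SGD on the conditional log-likelihood."""
    theta = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for prev, informal, gold in TRAIN:
            cands, feats, probs = distribution(theta, prev, informal)
            gold_f = feats[cands.index(gold)]
            for k in range(len(theta)):
                # Gradient: observed feature minus its expectation under the model.
                expected = sum(p * fv[k] for p, fv in zip(probs, feats))
                theta[k] += lr * (gold_f[k] - expected)
    return theta


def normalise(theta, prev, informal):
    """Return the highest-probability formal candidate."""
    cands, _, probs = distribution(theta, prev, informal)
    return cands[probs.index(max(probs))]


if __name__ == "__main__":
    weights = train()
    print(normalise(weights, "see", "u"))
```

Each SGD step moves the weights toward the gold candidate's features and away from the model's expected features, which is the gradient of the log-likelihood for a log-linear model. A realistic system would restrict the candidate set, use a properly smoothed language model and evaluate on held-out data.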

Files in this item

Name: Modupe_19724_2017.pdf

Size: 94.11Kb

Format: PDF

Description: Abstract

This item appears in the following Collection(s)

Conference Publications
