Semi-supervised probabilistics approach for normalising informal short text messages

Modupe, A; Celik, T; Marivate, Vukosi N; Diale, Melvin

http://ieeexplore.ieee.org/document/7920659/?reload=true
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7920659
DOI: 10.1109/ICTAS.2017.7920659
http://hdl.handle.net/10204/9934

Abstract:

The growing use of informal text messages on Twitter is a well-known source of big data. These types of messages are noisy and frequently rife with acronyms, slang, grammatical errors and non-standard words, which cause difficulties for natural language processing (NLP) techniques. In this study, we target non-standard words in short text and propose a method that identifies the standard form to which a given word is likely to be transformed. The method uses language model probability to characterise the relationship between formal and informal words, then combines string similarity with a log-linear model that includes features for both word-level transformation and local context similarity. The weights of these features are trained in a maximum likelihood framework using stochastic gradient descent (SGD) to hypothesise a better clean form for a given informal short text. Experiments were conducted on a publicly available English-language tweet dataset, and the approach is able to normalise inflected words in an online social network.
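The abstract names the ingredients of the method (a language model score, string similarity, a log-linear model, and maximum likelihood training with SGD) but this record does not include the full text. The following is therefore only a minimal sketch of how such pieces could fit together, not the authors' implementation. The toy vocabulary, the bigram counts, the two features, the training pairs, and all function names are assumptions invented for illustration, and the semi-supervised aspect of the paper is not modelled here.

```python
import math
import random
from difflib import SequenceMatcher

# Toy data, invented for illustration only (not taken from the paper).
VOCAB = ["tomorrow", "tonight", "great", "got", "see", "you"]
BIGRAM_COUNTS = {("see", "you"): 5, ("you", "tomorrow"): 4,
                 ("you", "tonight"): 2, ("was", "great"): 3, ("i", "got"): 3}

def bigram_logprob(prev, word):
    """Add-one smoothed log P(word | prev) over the toy vocabulary."""
    num = BIGRAM_COUNTS.get((prev, word), 0) + 1
    den = sum(BIGRAM_COUNTS.get((prev, v), 0) for v in VOCAB) + len(VOCAB)
    return math.log(num / den)

def features(informal, candidate, prev):
    """Assumed feature vector: [string similarity, local-context LM score]."""
    sim = SequenceMatcher(None, informal, candidate).ratio()
    return [sim, bigram_logprob(prev, candidate)]

def distribution(theta, informal, prev):
    """Log-linear (softmax) distribution over every candidate in VOCAB."""
    feats = [features(informal, c, prev) for c in VOCAB]
    scores = [sum(t * f for t, f in zip(theta, fv)) for fv in feats]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps], feats

def train(examples, epochs=300, lr=0.1, seed=0):
    """Maximise conditional log-likelihood with plain SGD.

    For one example the gradient is f(gold) - E_model[f].
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    examples = list(examples)
    for _ in range(epochs):
        rng.shuffle(examples)
        for informal, prev, gold in examples:
            probs, feats = distribution(theta, informal, prev)
            gold_f = feats[VOCAB.index(gold)]
            for k in range(len(theta)):
                expected = sum(p * fv[k] for p, fv in zip(probs, feats))
                theta[k] += lr * (gold_f[k] - expected)
    return theta

def normalise(theta, informal, prev):
    """Return the highest-probability candidate for an informal word."""
    probs, _ = distribution(theta, informal, prev)
    best = max(range(len(VOCAB)), key=probs.__getitem__)
    return VOCAB[best]

# Hypothetical labelled pairs: (informal word, previous word, clean form).
TRAIN = [("2moro", "you", "tomorrow"),
         ("gr8", "was", "great"),
         ("tonite", "you", "tonight")]

if __name__ == "__main__":
    theta = train(TRAIN)
    print(theta, normalise(theta, "tmrw", "you"))
```

In this sketch the candidate set is the whole toy vocabulary; a practical system would presumably restrict candidates per word and use a language model estimated from a large formal corpus.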

Reference:

Modupe, A. et al. 2017. Semi-supervised probabilistics approach for normalising informal short text messages. 2017 Conference on Information Communications Technology and Society (ICTAS), Umhlanga, South Africa, 8-10 March 2017

Modupe, A., Celik, T., Marivate, V. N., & Diale, M. (2017). Semi-supervised probabilistics approach for normalising informal short text messages. IEEE. http://hdl.handle.net/10204/9934

Modupe, A, T Celik, Vukosi N Marivate, and Melvin Diale. "Semi-supervised probabilistics approach for normalising informal short text messages." (2017): http://hdl.handle.net/10204/9934

Modupe A, Celik T, Marivate VN, Diale M, Semi-supervised probabilistics approach for normalising informal short text messages; IEEE; 2017. http://hdl.handle.net/10204/9934 .

Copyright: 2017 IEEE. Due to copyright restrictions, the attached PDF file only contains the abstract of the full text item. For access to the full text item, please consult the publisher's website.

Mar 2017

Informal Short Text Messages
User-generated content
UGC
Social Media Text
Natural Language Processing
Stochastic gradient descent
SGD
Online Social Network

Files in this item

Modupe_19724_2017.pdf

This item appears in the following Collection(s)

Conference Publications
