dc.contributor.author |
Modupe, A
|
|
dc.contributor.author |
Celik, T
|
|
dc.contributor.author |
Marivate, Vukosi N
|
|
dc.contributor.author |
Diale, Melvin
|
|
dc.date.accessioned |
2018-01-09T07:15:46Z |
|
dc.date.available |
2018-01-09T07:15:46Z |
|
dc.date.issued |
2017-03 |
|
dc.identifier.citation |
Modupe, A. et al. 2017. Semi-supervised probabilistics approach for normalising informal short text messages. 2017 Conference on Information Communications Technology and Society (ICTAS), Umhlanga, South Africa, 8-10 March 2017 |
en_US |
dc.identifier.isbn |
2017 Conference on Information Communications Technology and Society (ICTAS), Umhlanga, South Africa, 8-10 March 2017 |
|
dc.identifier.uri |
http://ieeexplore.ieee.org/document/7920659/?reload=true
|
|
dc.identifier.uri |
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7920659
|
|
dc.identifier.uri |
DOI: 10.1109/ICTAS.2017.7920659
|
|
dc.identifier.uri |
http://hdl.handle.net/10204/9934
|
|
dc.description |
Copyright: 2017 IEEE. Due to copyright restrictions, the attached PDF file only contains the abstract of the full text item. For access to the full text item, please consult the publisher's website. |
en_US |
dc.description.abstract |
The growing use of informal social text messages on Twitter is one of the known sources of big data. These types of messages are noisy and frequently rife with acronyms, slang, grammatical errors and non-standard words, causing grief for natural language processing (NLP) techniques. In this study, our contribution is to target non-standard words in short text and propose a method that predicts the standard form into which a given word is likely to be transformed. Our method uses language model probability to characterise the relationship between formal and informal words, then employs string similarity with a log-linear model that includes features for both word-level transformation and local context similarity. The weights of these features are trained in a maximum likelihood framework using stochastic gradient descent (SGD) to hypothesise the best clean form for a given informal short text. Experiments were conducted on a publicly available English-language tweet dataset, and the approach is able to normalise inflected words in an online social network. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
IEEE |
en_US |
dc.relation.ispartofseries |
Worklist;19724 |
|
dc.subject |
Informal Short Text Messages |
en_US |
dc.subject |
User-generated content |
en_US |
dc.subject |
UGC |
en_US |
dc.subject |
Social Media Text |
en_US |
dc.subject |
Natural Language Processing |
en_US |
dc.subject |
Stochastic gradient descent |
en_US |
dc.subject |
SGD |
en_US |
dc.subject |
Online Social Network |
en_US |
dc.title |
Semi-supervised probabilistics approach for normalising informal short text messages |
en_US |
dc.type |
Conference Presentation |
en_US |
dc.identifier.apacitation |
Modupe, A., Celik, T., Marivate, V. N., & Diale, M. (2017). Semi-supervised probabilistics approach for normalising informal short text messages. IEEE. http://hdl.handle.net/10204/9934 |
en_ZA |
dc.identifier.chicagocitation |
Modupe, A, T Celik, Vukosi N Marivate, and Melvin Diale. "Semi-supervised probabilistics approach for normalising informal short text messages." (2017): http://hdl.handle.net/10204/9934 |
en_ZA |
dc.identifier.vancouvercitation |
Modupe A, Celik T, Marivate VN, Diale M, Semi-supervised probabilistics approach for normalising informal short text messages; IEEE; 2017. http://hdl.handle.net/10204/9934 . |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - Modupe, A
AU - Celik, T
AU - Marivate, Vukosi N
AU - Diale, Melvin
AB - The growing use of informal social text messages on Twitter is one of the known sources of big data. These types of messages are noisy and frequently rife with acronyms, slang, grammatical errors and non-standard words, causing grief for natural language processing (NLP) techniques. In this study, our contribution is to target non-standard words in short text and propose a method that predicts the standard form into which a given word is likely to be transformed. Our method uses language model probability to characterise the relationship between formal and informal words, then employs string similarity with a log-linear model that includes features for both word-level transformation and local context similarity. The weights of these features are trained in a maximum likelihood framework using stochastic gradient descent (SGD) to hypothesise the best clean form for a given informal short text. Experiments were conducted on a publicly available English-language tweet dataset, and the approach is able to normalise inflected words in an online social network.
DA - 2017-03
DB - ResearchSpace
DP - CSIR
KW - Informal Short Text Messages
KW - User-generated content
KW - UGC
KW - Social Media Text
KW - Natural Language Processing
KW - Stochastic gradient descent
KW - SGD
KW - Online Social Network
LK - https://researchspace.csir.co.za
PY - 2017
SM - 2017 Conference on Information Communications Technology and Society (ICTAS), Umhlanga, South Africa, 8-10 March 2017
T1 - Semi-supervised probabilistics approach for normalising informal short text messages
TI - Semi-supervised probabilistics approach for normalising informal short text messages
UR - http://hdl.handle.net/10204/9934
ER -
|
en_ZA |