ResearchSpace

Semi-supervised probabilistics approach for normalising informal short text messages


dc.contributor.author Modupe, A
dc.contributor.author Celik, T
dc.contributor.author Marivate, Vukosi N
dc.contributor.author Diale, Melvin
dc.date.accessioned 2018-01-09T07:15:46Z
dc.date.available 2018-01-09T07:15:46Z
dc.date.issued 2017-03
dc.identifier.citation Modupe, A. et al. 2017. Semi-supervised probabilistics approach for normalising informal short text messages. 2017 Conference on Information Communications Technology and Society (ICTAS), Umhlanga, South Africa, 8-10 March 2017 en_US
dc.identifier.isbn 2017 Conference on Information Communications Technology and Society (ICTAS), Umhlanga, South Africa, 8-10 March 2017
dc.identifier.uri http://ieeexplore.ieee.org/document/7920659/?reload=true
dc.identifier.uri http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7920659
dc.identifier.uri DOI: 10.1109/ICTAS.2017.7920659
dc.identifier.uri http://hdl.handle.net/10204/9934
dc.description Copyright: 2017 IEEE. Due to copyright restrictions, the attached PDF file only contains the abstract of the full text item. For access to the full text item, please consult the publisher's website. en_US
dc.description.abstract The growing use of informal social text messages on Twitter is one of the known sources of big data. These types of messages are noisy and frequently rife with acronyms, slang, grammatical errors and non-standard words, causing grief for natural language processing (NLP) techniques. In this study, our contribution is to target non-standard words in the short text and propose a method for predicting the form to which a given word is likely to be transformed. Our method uses language model probability to characterise the relationship between formal and informal words, then employs string similarity with a log-linear model to include features for both word-level transformation and local context similarity. The weights of these features are trained in a maximum likelihood framework using stochastic gradient descent (SGD) to hypothesise the better clean feature for a given informal short text. Experiments were conducted on publicly available English-language tweets, and the approach is able to normalise inflected words in an online social network. en_US
dc.language.iso en en_US
dc.publisher IEEE en_US
dc.relation.ispartofseries Worklist;19724
dc.subject Informal Short Text Messages en_US
dc.subject User-generated content en_US
dc.subject UGC en_US
dc.subject Social Media Text en_US
dc.subject Natural Language Processing en_US
dc.subject Stochastic gradient descent en_US
dc.subject SGD en_US
dc.subject Online Social Network en_US
dc.title Semi-supervised probabilistics approach for normalising informal short text messages en_US
dc.type Conference Presentation en_US
dc.identifier.apacitation Modupe, A., Celik, T., Marivate, V. N., & Diale, M. (2017). Semi-supervised probabilistics approach for normalising informal short text messages. IEEE. http://hdl.handle.net/10204/9934 en_ZA
dc.identifier.chicagocitation Modupe, A, T Celik, Vukosi N Marivate, and Melvin Diale. "Semi-supervised probabilistics approach for normalising informal short text messages." (2017): http://hdl.handle.net/10204/9934 en_ZA
dc.identifier.vancouvercitation Modupe A, Celik T, Marivate VN, Diale M, Semi-supervised probabilistics approach for normalising informal short text messages; IEEE; 2017. http://hdl.handle.net/10204/9934 . en_ZA
dc.identifier.ris TY - Conference Presentation AU - Modupe, A AU - Celik, T AU - Marivate, Vukosi N AU - Diale, Melvin AB - The growing use of informal social text messages on Twitter is one of the known sources of big data. These types of messages are noisy and frequently rife with acronyms, slang, grammatical errors and non-standard words, causing grief for natural language processing (NLP) techniques. In this study, our contribution is to target non-standard words in the short text and propose a method for predicting the form to which a given word is likely to be transformed. Our method uses language model probability to characterise the relationship between formal and informal words, then employs string similarity with a log-linear model to include features for both word-level transformation and local context similarity. The weights of these features are trained in a maximum likelihood framework using stochastic gradient descent (SGD) to hypothesise the better clean feature for a given informal short text. Experiments were conducted on publicly available English-language tweets, and the approach is able to normalise inflected words in an online social network. DA - 2017-03 DB - ResearchSpace DP - CSIR KW - Informal Short Text Messages KW - User-generated content KW - UGC KW - Social Media Text KW - Natural Language Processing KW - Stochastic gradient descent KW - SGD KW - Online Social Network LK - https://researchspace.csir.co.za PY - 2017 SM - 2017 Conference on Information Communications Technology and Society (ICTAS), Umhlanga, South Africa, 8-10 March 2017 T1 - Semi-supervised probabilistics approach for normalising informal short text messages TI - Semi-supervised probabilistics approach for normalising informal short text messages UR - http://hdl.handle.net/10204/9934 ER - en_ZA

