dc.contributor.author |
Sefara, Tshephisho J
|
|
dc.contributor.author |
Mbooi, Mahlatse S
|
|
dc.contributor.author |
Mashile, Katlego J
|
|
dc.contributor.author |
Rambuda, Thompho
|
|
dc.contributor.author |
Rangata, Mapitsi R
|
|
dc.date.accessioned |
2022-12-11T14:33:45Z |
|
dc.date.available |
2022-12-11T14:33:45Z |
|
dc.date.issued |
2022-08 |
|
dc.identifier.citation |
Sefara, T.J., Mbooi, M.S., Mashile, K.J., Rambuda, T. & Rangata, M.R. 2022. A toolkit for text extraction and analysis for natural language processing tasks. http://hdl.handle.net/10204/12565 . |
en_ZA |
dc.identifier.isbn |
978-1-6654-8422-0 |
|
dc.identifier.isbn |
978-1-6654-8421-3 |
|
dc.identifier.isbn |
978-1-6654-8423-7 |
|
dc.identifier.uri |
DOI: 10.1109/icABCD54961.2022.9856269
|
|
dc.identifier.uri |
http://hdl.handle.net/10204/12565
|
|
dc.description.abstract |
Text extraction is an important part of natural language processing (NLP) tasks. Most NLP tasks, such as text classification, machine translation, text-to-speech, text-based language identification, text summarization, and named-entity recognition, involve the use of textual data. Such data is limited for low-resourced languages, making it difficult to experiment with advanced NLP techniques on these languages. This paper presents a Python-based toolkit for text analysis and text extraction from different types of images, documents, and audio files. The toolkit is built as a library whose functions can be imported and used for text extraction. |
en_US |
dc.format |
Fulltext |
en_US |
dc.language.iso |
en |
en_US |
dc.relation.uri |
https://ieeexplore.ieee.org/document/9856269 |
en_US |
dc.source |
2022 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), Durban, South Africa, 4-5 August 2022 |
en_US |
dc.subject |
Text recognition |
en_US |
dc.subject |
Text categorization |
en_US |
dc.subject |
Big data |
en_US |
dc.subject |
Natural Language Processing |
en_US |
dc.subject |
Machine translation |
en_US |
dc.subject |
Data communication |
en_US |
dc.title |
A toolkit for text extraction and analysis for natural language processing tasks |
en_US |
dc.type |
Conference Presentation |
en_US |
dc.description.pages |
6 |
en_US |
dc.description.note |
Due to copyright restrictions, the attached PDF file contains the preprint version of the published item. For access to the published version, please consult the publisher's website: https://ieeexplore.ieee.org/document/9856269 |
en_US |
dc.description.cluster |
Next Generation Enterprises & Institutions |
en_US |
dc.description.impactarea |
Data Science |
en_US |
dc.identifier.apacitation |
Sefara, T. J., Mbooi, M. S., Mashile, K. J., Rambuda, T., & Rangata, M. R. (2022). A toolkit for text extraction and analysis for natural language processing tasks. http://hdl.handle.net/10204/12565 |
en_ZA |
dc.identifier.chicagocitation |
Sefara, Tshephisho J, Mahlatse S Mbooi, Katlego J Mashile, Thompho Rambuda, and Mapitsi R Rangata. "A toolkit for text extraction and analysis for natural language processing tasks." 2022 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), Durban, South Africa, 4-5 August 2022 (2022): http://hdl.handle.net/10204/12565 |
en_ZA |
dc.identifier.vancouvercitation |
Sefara TJ, Mbooi MS, Mashile KJ, Rambuda T, Rangata MR. A toolkit for text extraction and analysis for natural language processing tasks; 2022. http://hdl.handle.net/10204/12565 . |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - Sefara, Tshephisho J
AU - Mbooi, Mahlatse S
AU - Mashile, Katlego J
AU - Rambuda, Thompho
AU - Rangata, Mapitsi R
AB - Text extraction is an important part of natural language processing (NLP) tasks. Most NLP tasks, such as text classification, machine translation, text-to-speech, text-based language identification, text summarization, and named-entity recognition, involve the use of textual data. Such data is limited for low-resourced languages, making it difficult to experiment with advanced NLP techniques on these languages. This paper presents a Python-based toolkit for text analysis and text extraction from different types of images, documents, and audio files. The toolkit is built as a library whose functions can be imported and used for text extraction.
DA - 2022-08
DB - ResearchSpace
DP - CSIR
J1 - 2022 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), Durban, South Africa, 4-5 August 2022
KW - Text recognition
KW - Text categorization
KW - Big data
KW - Natural Language Processing
KW - Machine translation
KW - Data communication
LK - https://researchspace.csir.co.za
PY - 2022
SM - 978-1-6654-8422-0
SM - 978-1-6654-8421-3
SM - 978-1-6654-8423-7
T1 - A toolkit for text extraction and analysis for natural language processing tasks
TI - A toolkit for text extraction and analysis for natural language processing tasks
UR - http://hdl.handle.net/10204/12565
ER -
|
en_ZA |
dc.identifier.worklist |
26284 |
en_US |