ResearchSpace

Emotional speaker recognition based on machine and deep learning

dc.contributor.author Sefara, Tshephisho J
dc.contributor.author Mokgonyane, TB
dc.date.accessioned 2021-02-12T09:18:14Z
dc.date.available 2021-02-12T09:18:14Z
dc.date.issued 2020-11
dc.identifier.citation Sefara, T.J. & Mokgonyane, T. 2020. Emotional speaker recognition based on machine and deep learning. http://hdl.handle.net/10204/11753 . en_ZA
dc.identifier.isbn 978-1-7281-9521-6
dc.identifier.isbn 978-1-7281-9520-9
dc.identifier.uri http://hdl.handle.net/10204/11753
dc.description.abstract Speaker recognition is a method which recognises a speaker from the characteristics of a voice. Speaker recognition technologies have been widely used in many domains. Most speaker recognition systems have been trained on normal, clean recordings; however, the performance of these systems tends to degrade when recognising emotional speech. This paper presents an emotional speaker recognition system trained using machine and deep learning algorithms with time, frequency and spectral features on an emotional speech database acquired from the Ryerson AudioVisual Database of Emotional Speech and Song (RAVDESS). We trained and compared the performance of five machine learning models (Logistic Regression, Support Vector Machine, Random Forest, XGBoost, and k-Nearest Neighbor) and three deep learning models (Long Short-Term Memory network, Multilayer Perceptron, and Convolutional Neural Network). After the evaluation of the models, the deep neural networks performed well compared to the machine learning models, attaining the highest accuracy of 92% and outperforming the state-of-the-art models in emotional speaker detection from speech signals. en_US
dc.format Fulltext en_US
dc.language.iso en en_US
dc.relation.uri DOI: 10.1109/IMITEC50163.2020.9334138 en_US
dc.relation.uri https://www.spu.ac.za/index.php/ieee-imitec-2020-programme/ en_US
dc.relation.uri https://ieeexplore.ieee.org/xpl/conhome/9334048/proceeding en_US
dc.source 2nd International Multidisciplinary Information Technology and Engineering Conference (IMITEC), Sol Plaatje University, Kimberley, South Africa, 25 - 27 November 2020 en_US
dc.subject Ryerson AudioVisual Database of Emotional Speech and Song en_US
dc.subject RAVDESS en_US
dc.subject Neural networks en_US
dc.subject Machine learning en_US
dc.subject Emotional recognition en_US
dc.subject Speaker recognition en_US
dc.title Emotional speaker recognition based on machine and deep learning en_US
dc.type Conference Presentation en_US
dc.description.pages 8pp en_US
dc.description.note Copyright: 2020 IEEE. This is the preprint version of the work. en_US
dc.description.cluster Next Generation Enterprises & Institutions
dc.description.impactarea Data Science en_US
dc.identifier.apacitation Sefara, T. J., & Mokgonyane, T. (2020). Emotional speaker recognition based on machine and deep learning. http://hdl.handle.net/10204/11753 en_ZA
dc.identifier.chicagocitation Sefara, Tshephisho J, and TB Mokgonyane. "Emotional speaker recognition based on machine and deep learning." 2nd International Multidisciplinary Information Technology and Engineering Conference (IMITEC), Sol Plaatje University, Kimberley, South Africa, 25 - 27 November 2020 (2020): http://hdl.handle.net/10204/11753 en_ZA
dc.identifier.vancouvercitation Sefara TJ, Mokgonyane T. Emotional speaker recognition based on machine and deep learning; 2020. http://hdl.handle.net/10204/11753 . en_ZA
dc.identifier.ris TY - Conference Presentation
AU - Sefara, Tshephisho J
AU - Mokgonyane, TB
AB - Speaker recognition is a method which recognises a speaker from the characteristics of a voice. Speaker recognition technologies have been widely used in many domains. Most speaker recognition systems have been trained on normal, clean recordings; however, the performance of these systems tends to degrade when recognising emotional speech. This paper presents an emotional speaker recognition system trained using machine and deep learning algorithms with time, frequency and spectral features on an emotional speech database acquired from the Ryerson AudioVisual Database of Emotional Speech and Song (RAVDESS). We trained and compared the performance of five machine learning models (Logistic Regression, Support Vector Machine, Random Forest, XGBoost, and k-Nearest Neighbor) and three deep learning models (Long Short-Term Memory network, Multilayer Perceptron, and Convolutional Neural Network). After the evaluation of the models, the deep neural networks performed well compared to the machine learning models, attaining the highest accuracy of 92% and outperforming the state-of-the-art models in emotional speaker detection from speech signals.
DA - 2020-11
DB - ResearchSpace
DP - CSIR
J1 - 2nd International Multidisciplinary Information Technology and Engineering Conference (IMITEC), Sol Plaatje University, Kimberley, South Africa, 25 - 27 November 2020
KW - Ryerson AudioVisual Database of Emotional Speech and Song
KW - RAVDESS
KW - Neural networks
KW - Machine learning
KW - Emotional recognition
KW - Speaker recognition
LK - https://researchspace.csir.co.za
PY - 2020
SM - 978-1-7281-9521-6
SM - 978-1-7281-9520-9
T1 - Emotional speaker recognition based on machine and deep learning
TI - Emotional speaker recognition based on machine and deep learning
UR - http://hdl.handle.net/10204/11753
ER - en_ZA
dc.identifier.worklist 24192 en_US

