Emotional speaker recognition based on machine and deep learning

Sefara, Tshephisho J; Mokgonyane, TB

dc.contributor.author	Sefara, Tshephisho J
dc.contributor.author	Mokgonyane, TB
dc.date.accessioned	2021-02-12T09:18:14Z
dc.date.available	2021-02-12T09:18:14Z
dc.date.issued	2020-11
dc.identifier.citation	Sefara, T.J. & Mokgonyane, T. 2020. Emotional speaker recognition based on machine and deep learning. http://hdl.handle.net/10204/11753 .	en_ZA
dc.identifier.isbn	978-1-7281-9521-6
dc.identifier.isbn	978-1-7281-9520-9
dc.identifier.uri	http://hdl.handle.net/10204/11753
dc.description.abstract	Speaker recognition is a method that recognises a speaker from the characteristics of a voice. Speaker recognition technologies are widely used in many domains. Most speaker recognition systems are trained on normal, clean recordings; however, their performance tends to degrade when recognising emotional speech. This paper presents an emotional speaker recognition system trained with machine and deep learning algorithms, using time, frequency and spectral features, on an emotional speech database acquired from the Ryerson AudioVisual Database of Emotional Speech and Song (RAVDESS). We trained and compared the performance of five machine learning models (Logistic Regression, Support Vector Machine, Random Forest, XGBoost, and k-Nearest Neighbor) and three deep learning models (Long Short-Term Memory network, Multilayer Perceptron, and Convolutional Neural Network). In the evaluation, the deep neural networks performed better than the machine learning models, attaining the highest accuracy of 92% and outperforming the state-of-the-art models in emotional speaker detection from speech signals.	en_US
dc.format	Fulltext	en_US
dc.language.iso	en	en_US
dc.relation.uri	DOI: 10.1109/IMITEC50163.2020.9334138	en_US
dc.relation.uri	https://www.spu.ac.za/index.php/ieee-imitec-2020-programme/	en_US
dc.relation.uri	https://ieeexplore.ieee.org/xpl/conhome/9334048/proceeding	en_US
dc.source	2nd International Multidisciplinary Information Technology and Engineering Conference (IMITEC), Sol Plaatje University, Kimberley, South Africa, 25 - 27 November 2020	en_US
dc.subject	Ryerson AudioVisual Database of Emotional Speech and Song	en_US
dc.subject	RAVDESS	en_US
dc.subject	Neural networks	en_US
dc.subject	Machine learning	en_US
dc.subject	Emotional recognition	en_US
dc.subject	Speaker recognition	en_US
dc.title	Emotional speaker recognition based on machine and deep learning	en_US
dc.type	Conference Presentation	en_US
dc.description.pages	8pp	en_US
dc.description.note	Copyright: 2020 IEEE. This is the preprint version of the work.	en_US
dc.description.cluster	Next Generation Enterprises & Institutions
dc.description.impactarea	Data Science	en_US
dc.identifier.apacitation	Sefara, T. J., & Mokgonyane, T. (2020). Emotional speaker recognition based on machine and deep learning. http://hdl.handle.net/10204/11753	en_ZA
dc.identifier.chicagocitation	Sefara, Tshephisho J, and TB Mokgonyane. "Emotional speaker recognition based on machine and deep learning." 2nd International Multidisciplinary Information Technology and Engineering Conference (IMITEC), Sol Plaatje University, Kimberley, South Africa, 25 - 27 November 2020 (2020): http://hdl.handle.net/10204/11753	en_ZA
dc.identifier.vancouvercitation	Sefara TJ, Mokgonyane T, Emotional speaker recognition based on machine and deep learning; 2020. http://hdl.handle.net/10204/11753 .	en_ZA
dc.identifier.ris	TY - Conference Presentation AU - Sefara, Tshephisho J AU - Mokgonyane, TB AB - Speaker recognition is a method that recognises a speaker from the characteristics of a voice. Speaker recognition technologies are widely used in many domains. Most speaker recognition systems are trained on normal, clean recordings; however, their performance tends to degrade when recognising emotional speech. This paper presents an emotional speaker recognition system trained with machine and deep learning algorithms, using time, frequency and spectral features, on an emotional speech database acquired from the Ryerson AudioVisual Database of Emotional Speech and Song (RAVDESS). We trained and compared the performance of five machine learning models (Logistic Regression, Support Vector Machine, Random Forest, XGBoost, and k-Nearest Neighbor) and three deep learning models (Long Short-Term Memory network, Multilayer Perceptron, and Convolutional Neural Network). In the evaluation, the deep neural networks performed better than the machine learning models, attaining the highest accuracy of 92% and outperforming the state-of-the-art models in emotional speaker detection from speech signals. DA - 2020-11 DB - ResearchSpace DP - CSIR J1 - 2nd International Multidisciplinary Information Technology and Engineering Conference (IMITEC), Sol Plaatje University, Kimberley, South Africa, 25 - 27 November 2020 KW - Ryerson AudioVisual Database of Emotional Speech and Song KW - RAVDESS KW - Neural networks KW - Machine learning KW - Emotional recognition KW - Speaker recognition LK - https://researchspace.csir.co.za PY - 2020 SM - 978-1-7281-9521-6 SM - 978-1-7281-9520-9 T1 - Emotional speaker recognition based on machine and deep learning TI - Emotional speaker recognition based on machine and deep learning UR - http://hdl.handle.net/10204/11753 ER -	en_ZA
dc.identifier.worklist	24192	en_US

Files in this item

Name: RS_24192_Emotional ...

Size: 1.100Mb

Format: PDF

This item appears in the following Collection(s)

Conference Publications
