Many services generate or archive vast amounts of spoken audio data on a daily basis. It is infeasible for humans to annotate all of this data by listening to it, so automatic methods are needed to process the data and create appropriate transcriptions or assign relevant annotations. Many important informational aspects are associated with audio data, but one common component is the topic under discussion. Knowing the topic can help process the data in a variety of ways: segmenting an audio stream by dynamic topic tracking, clustering similar audio recordings by topic, or improving ASR outputs by using appropriate language models. In this work, the best spoken audio topic identification system achieved an error rate of 17.6%, using an ASR system that produced an average word error rate of 57% and a supervised latent Dirichlet allocation topic modelling technique. The proposed language model topic modelling technique produced the worst results, highlighting its sensitivity to high ASR word error rates. The support vector machine topic classifier, which made use of a simplified term-weighted feature vector, performed comparably to one using term frequency-inverse document frequency feature vectors.
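For readers unfamiliar with the word error rate quoted above, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the ASR hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are assumptions made for illustration; this is not the scoring tool used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()   # assumption: simple whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a word error rate of 57% means that, on average, more than half of the reference words would have to be edited to recover the correct transcript.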
Reference:
Kleynhans, N. 2014. An investigation into spoken audio topic identification using the Fisher Corpus. Proceedings of the 2014 PRASA, RobMech and AfLat International Joint Symposium, Cape Town, South Africa, 27-28 November 2014, pp 1-5
Kleynhans, N. (2014). An investigation into spoken audio topic identification using the Fisher Corpus. Pattern Recognition Association of South Africa. http://hdl.handle.net/10204/7899