This paper compares the recognition accuracy of a phoneme-based automatic speech recognition (ASR) system with that of a grapheme-based system, using Afrikaans as a case study. The former system is developed using a conventional pronunciation dictionary, while the latter uses the letters of each word directly as the acoustic units to be modelled. We ensure that the pronunciation dictionary we use is highly accurate and then investigate the extent to which ASR performance degrades when the dictionary is removed. We analyse this effect at different data set sizes and classify the causes of performance degradation. We find that relative error rates are highly dependent on word category, with grapheme-based ASR outperforming phoneme-based ASR in certain categories; this points towards strategies for compensating for grapheme-based inaccuracies.
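In a grapheme-based system, the pronunciation dictionary is replaced by a trivial mapping from each word to its own letters. The sketch below shows that idea; it is a minimal illustration, not the authors' implementation. The function names, the NFC normalisation, the lower-casing, and the choice to treat accented letters such as ê and ô as distinct units (while dropping non-letters such as apostrophes) are all assumptions made for this example.

```python
import unicodedata


def grapheme_pronunciation(word: str) -> list[str]:
    """Map a word to a sequence of grapheme units, one per letter.

    Assumptions (hypothetical, for illustration only):
    - text is NFC-normalised so that accented letters such as 'ê' are a
      single code point rather than a base letter plus a combining mark;
    - case is ignored;
    - accented letters are kept as distinct units;
    - non-letter characters (e.g. apostrophes) are discarded.
    """
    normalised = unicodedata.normalize("NFC", word).lower()
    return [ch for ch in normalised if ch.isalpha()]


def build_grapheme_lexicon(words: list[str]) -> dict[str, list[str]]:
    """Build a hypothetical grapheme lexicon: word -> list of grapheme units."""
    return {w: grapheme_pronunciation(w) for w in words}


if __name__ == "__main__":
    # A few Afrikaans words, printed as simple whitespace-separated lexicon lines.
    lexicon = build_grapheme_lexicon(["hond", "skêr", "môre"])
    for word, units in lexicon.items():
        print(word, " ".join(units))
```

No expert knowledge is needed to build such a lexicon. However, a letter like "g" or "e" must then be modelled across all of its pronunciations, which is one plausible source of the category-dependent errors described above.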
Reference:
Basson, W.D. and Davel, M.H. 2012. Comparing grapheme-based and phoneme-based speech recognition for Afrikaans. In: PRASA 2012, CSIR International Convention Centre, Pretoria, 29-30 November 2012. http://hdl.handle.net/10204/6492