Author: Botha, GR; Barnard, E
Date: Nov 2007
The authors investigate the factors that determine the performance of text-based language identification, with a particular focus on the 11 official languages of South Africa, using n-gram statistics as features for classification. For a fixed ...

Author: Zulu, PN; Botha, G; Barnard, E
Date: 2007
Two methods for objectively measuring similarities and dissimilarities between the 11 official languages of South Africa are described. The first concerns the use of n-grams. The confusions between different languages in a text-based language ...