Analyzing Genetic Connections between Languages by Matching Consonant Classes
Peter Turchin, Ilia Peiros, Murray Gell-Mann
The idea that the Turkic, Mongolian, Tungusic, Korean, and Japanese languages are genetically related (the “Altaic hypothesis”) remains controversial within the linguistic community. In an effort to resolve such controversies, we propose a simple approach to analyzing genetic connections between languages. The Consonant Class Matching (CCM) method uses strict phonological identification and permits no shifts in meaning. This allows us to estimate the probability that the observed similarities between two or more languages arose by chance alone. The CCM procedure yields reliable statistical inferences about historical connections between languages: it classifies languages correctly for well-known families (Indo-European and Semitic) and does not appear to yield false positives. The quantitative patterns of similarity that we document for languages within the Altaic family are similar to those in the non-controversial Indo-European family. Thus, if the Indo-European family is accepted as real, the same conclusion should also apply to the Altaic family.
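As a rough illustration of the kind of computation described above, the following Python sketch compares two word lists aligned by meaning, counts the meanings whose words fall into the same consonant classes, and estimates how often that many matches would arise by chance. Everything specific here is an assumption made for illustration and is not taken from the paper: the consonant-class table `CLASS_OF`, the use of the first two consonants as a word's "skeleton", the permutation test in `chance_probability`, and the invented toy word lists `TOY_A` and `TOY_B`.

```python
import random

# Hypothetical, simplified consonant classes (an assumption for this sketch,
# not the class table used by the authors).
CLASS_OF = {}
for label, members in {
    "P": "pbfv", "T": "td", "S": "sz", "K": "kgx",
    "M": "m", "N": "n", "R": "rl", "W": "w", "J": "j",
}.items():
    for ch in members:
        CLASS_OF[ch] = label


def skeleton(word, length=2):
    """Return the classes of the first `length` consonants; other letters are skipped."""
    classes = [CLASS_OF[ch] for ch in word.lower() if ch in CLASS_OF]
    return tuple(classes[:length])


def count_matches(list_a, list_b):
    """Count positions (meanings) whose two words share an identical, non-empty skeleton."""
    total = 0
    for a, b in zip(list_a, list_b):
        sk = skeleton(a)
        if sk and sk == skeleton(b):
            total += 1
    return total


def chance_probability(list_a, list_b, trials=10000, seed=0):
    """Permutation test: shuffle which meaning each word of list_b is paired with,
    and estimate how often the shuffled match count reaches the observed one."""
    rng = random.Random(seed)
    observed = count_matches(list_a, list_b)
    shuffled = list(list_b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if count_matches(list_a, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (trials + 1)


# Invented toy strings aligned by meaning slot; these are not real language data.
TOY_A = ["pata", "mura", "kano", "tilu", "sona"]
TOY_B = ["bado", "mila", "gani", "supa", "riko"]

if __name__ == "__main__":
    observed, p = chance_probability(TOY_A, TOY_B)
    print(f"matches: {observed}, estimated chance probability: {p:.4f}")
```

Because matches are only counted between words in the same meaning slot, the sketch mirrors the "no shifts in meaning" restriction. Shuffling the meaning alignment is one simple way to approximate a chance baseline, and real word lists would be far longer than five items.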