Introduction

Since the Swedish officer Philipp Johann von Strahlenberg first proposed a relationship between Altaic and some other languages in the first half of the eighteenth century, Altaic linguistics has accumulated nearly 300 years of research. Einführung in die Altaische Sprachwissenschaft by the Finnish scholar G. J. Ramstedt (1952, 2004) laid the theoretical foundation of Altaic studies and established the “Altaic Theory” as an independent theory by separating the “Altaic languages” from the “Ural-Altaic languages”. The publications of the American scholar N. N. Poppe (1897-1991), including Altaisch und Urtürkisch (Ungarische Jahrbücher, Bd. VI, Berlin, 1926), Introduction to Mongolian Comparative Studies (MS-FOU 110, Helsinki, 1955), Vergleichende Grammatik der Altaischen Sprachen (Otto Harrassowitz, Wiesbaden, 1960) and Introduction to Altaic Linguistics (Otto Harrassowitz, Wiesbaden, 1965), then advanced research on the theory. Although historical comparative linguists have long supported the Altaic Theory with evidence from phonetic features, word formation methods, syntactic structures, phonetic rules and cognate words, many scholars still doubt or oppose it. For example, the British scholar G. Clauson, the German scholar G. Doerfer and others maintain that Turkic had a strong influence on Mongolian and that Mongolian in turn had a great influence on Tungusic (Clauson 1959, 1962; Doerfer 1966). In their view, the common components (similarities) among the Turkic, Mongolian and Manchu-Tungusic languages arose from original structural similarity and from language contact (borrowing and mutual influence). These scholars argue that the Altaic languages share only typological similarity rather than genetic kinship. One important piece of evidence they cite is the absence of common numerals and cognate words between Mongolian and Turkic languages. Scholars such as J. Benzing, L. Ligeti, K. Grönbech, D. Sinor and A. Róna-Tas hold that it is too early to conclude that the “Altaic languages” are etymologically related. In short, some scholars oppose the Altaic Theory, while others call for more research before drawing conclusions.

Scholars evidently recognize that commonalities and similarities in phonetic features, word formation and syntactic structure are not sufficient to establish the features of a proto-language. Such features may be shared rather than inherited, and so they cannot by themselves show that the languages have a common origin. This is a major difference between historical comparative linguistics and linguistic typology. For example, although the Uralic and Altaic languages show some similarities in phonetic features, word formation and syntactic structure, phonetic correspondences and cognate words between them are scarce. Therefore, most scholars do not support the theory of kinship between the Uralic and Altaic languages (Huhe 2013a, b).

Altaic languages are mainly spoken across the vast area of North and Central Asia and in some parts of Europe. According to statistics, about 100 million people speak an Altaic language, excluding Japan. How many languages belong to the Altaic language family? Because scholars classify language families, languages and dialects differently, no consensus has been reached; according to the most recent views of most scholars, there are about 50 Altaic languages. Altaic languages are widely spoken in China, including Uyghur, Kazakh, Uzbek, Western Yugu, Kirgiz, Tatar, Salar and Tuvan (Turkic language family); Mongolian, Monguor, Dawoer, Dongxiang, Eastern Yugu and Baoan (Mongolic language family); and Manchu, Xibo, Ewenki, Oroqen and Hezhe (Manchu-Tungusic family). Although China is the birthplace of Altaic languages, Chinese scholars have contributed far less to the Altaic Theory than foreign scholars have. In recent years, scholars such as Qinggeltai, Geng Shimin, Litipu Tohuti and Chaoke have published research works; however, most of these are descriptive studies of a single language family or comparative studies within one language family. Comparative studies of Altaic languages from the perspective of historical comparative linguistics have rarely been reported (Hugjiltu 2004).

Since the Permanent International Altaistic Conference was established at the 24th International Congress of Orientalists in Munich, Germany, in 1957, 62 international Altaic conferences had been held by 2019, continuously advancing international Altaic studies. However, owing to the complexity of the Altaic Theory and the worldwide shortage of research resources, the Altaic Theory remains at the stage of hypothesis. To make a breakthrough in Altaic research, we should (1) actively conduct comparative studies of Altaic languages (language history) using the theories and methodologies of historical comparative linguistics; (2) conduct quantitative and qualitative research on language ontology using modern science and technology, such as experimental phonetics, computational linguistics and statistics; (3) more importantly, apply archaeology, anthropology, ethnology (folklore) and historiography to examine the living traces left by the Altaic ethnic groups; and (4) investigate the genetic information and ethnic differences of these groups by applying anatomy, genetics and especially DNA technology. To date, scholars have mainly focused on the first and third methods, which rely on living traces, even though most of the Altaic ethnic groups are nomadic and very few written documents and relics exist. The second and fourth methods, which are empirical and can draw on rich resources, are rarely used. This paper applies the second method (experimental phonetics).

Theory and methods

Since the early 1990s, we have conducted quantitative and qualitative studies on the segmental and suprasegmental phonetic features of Mongolic dialects and, more broadly, Altaic languages by applying the theories and methods of experimental phonetics. These studies yielded findings that differ from those of traditional linguistic studies, and we have proposed new theories, methods and viewpoints to address problems in phonetics and prosody that cannot be resolved by traditional linguistic means. However, our previous research focused mainly on the average values of the acoustic parameters of a single language and paid less attention to the distribution pattern and variation (range and trend) of phonetic segments in acoustic space. In general, we paid too much attention to describing the synchronic (static) status of segments while ignoring their historical (dynamic) evolution.

Since 2013, to facilitate research on the acoustic features of speech, our research team has developed software tools that automatically label and retrieve the acoustic parameters of phonetic features. We also developed the “Unified Platform Software” (Unified Platform for Speech Acoustic Parameters of Chinese Minority Languages), which supports querying, exporting and analyzing the acoustic parameters of speech. So far, the Unified Platform contains acoustic parameters (vowel, consonant, segmental and suprasegmental features) of 10 minority languages. Each language has a word list of 1000-2000 polysyllabic words pronounced at reading speed. Based on the Unified Platform (Huhe et al. 2009), by analyzing the “vowel acoustic dynamic distribution”, “voice acoustic distribution pattern” and “voice acoustic distribution type” of single languages, we found that the similarity of the acoustic distribution patterns of segments is related to language closeness. By comparing the similarities of the “vowel acoustic triangle” and the “acoustic distribution pattern” between Mongolic and other Altaic languages, we examined the closeness or relevance between these languages (Huhe 2013a, b, 2016, 2019). Through this empirical research, we came to believe that our results and conclusions can be used to verify and correct the conclusions obtained in historical comparative linguistics and to consolidate the Altaic Theory. At present, we focus mainly on this issue so as to verify the relevance among languages. The research roadmap is shown in Fig. 1.

Fig. 1 The Research Roadmap

To describe relationships among languages, we use the terms similarity, closeness, relevance and kindred (or kinship) to evaluate the distances among languages; the four terms are ordered from the most distant relationship to the nearest. If two languages are highly similar, they are close to each other; if they are very close, they may have relevance. However, kinship cannot be concluded from relevance alone, because it involves multiple complicated factors that are beyond the scope of this paper.

Our proposed “acoustic distribution model of phonetic segments” is visible and measurable. To distinguish the individuality and generality of the acoustic distribution patterns of phonetic segments, we suggest that the acoustic distribution characteristics of the speech of a single language be called the “acoustic distribution pattern” (actual system), and that the original model reconstructed by analyzing and comparing the “acoustic distribution patterns” of several languages be called the “phonemic distribution pattern” (reconstructed system). The acoustic distribution pattern (modern and actual) of Altaic vowels is shown in Fig. 2. The phonemic distribution pattern (ancient and reconstructed) of Altaic vowels is shown in Fig. 3. In both figures, ellipses represent phonemes and allophones. The numbers and scopes of the phonemes and allophones differ between the two figures, indicating the evolutionary trend of the phonological systems. In Figs. 2 and 3, all data come from male informants (MGYM - Mongolian, UGYM - Uyghur, EWKY - Ewenki).

Fig. 2 Acoustic distribution patterns of vowels in Uyghur, Mongolian and Ewenki (actual)

Fig. 3 Phonemic distribution pattern of vowels in Uyghur, Mongolian and Ewenki (reconstructed)

First, the first and second formants of all vowels in the first syllable of each word in each language were extracted using the Unified Platform, so that the acoustic distribution pattern of vowels in each language could be computed. Second, based on these patterns, we plotted the acoustic distribution pattern of vowels in each language (Figs. 2 and 3). Third, we calculated the similarity between each pair of languages with the “histogram distance method” and the “block histogram method” (see Tables 1, 2 and 3 and Note [1]). Finally, the similarity and closeness were analyzed from the perspectives of historical comparative linguistics and experimental phonetics, and the vowel phonemic pattern of the original language was reconstructed.
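As an illustration of the first step, the following minimal Python sketch bins (F1, F2) measurements into a normalized two-dimensional histogram, which is one possible numerical representation of a vowel distribution pattern. It is not the Unified Platform's implementation: the function name `formant_histogram`, the frequency ranges, the bin count and the sample tokens are all assumptions made for illustration, and the method described in this paper compares images of the plotted patterns rather than raw formant tables.

```python
from typing import List, Sequence, Tuple


def formant_histogram(
    formants: Sequence[Tuple[float, float]],
    f1_range: Tuple[float, float] = (200.0, 1000.0),  # assumed F1 axis limits in Hz
    f2_range: Tuple[float, float] = (500.0, 2500.0),  # assumed F2 axis limits in Hz
    bins: int = 8,                                    # assumed grid resolution
) -> List[float]:
    """Return a flattened, normalized bins x bins histogram of (F1, F2) pairs."""
    counts = [0] * (bins * bins)
    kept = 0
    for f1, f2 in formants:
        inside_f1 = f1_range[0] <= f1 < f1_range[1]
        inside_f2 = f2_range[0] <= f2 < f2_range[1]
        if not (inside_f1 and inside_f2):
            continue  # skip tokens that fall outside the chosen area
        row = int((f1 - f1_range[0]) / (f1_range[1] - f1_range[0]) * bins)
        col = int((f2 - f2_range[0]) / (f2_range[1] - f2_range[0]) * bins)
        row, col = min(row, bins - 1), min(col, bins - 1)  # guard against rounding
        counts[row * bins + col] += 1
        kept += 1
    if kept == 0:
        raise ValueError("no formant pairs fall inside the given ranges")
    return [c / kept for c in counts]


# Hypothetical first-syllable vowel tokens as (F1, F2) in Hz; not data from the paper.
tokens = [(300.0, 2200.0), (320.0, 2100.0), (700.0, 1300.0), (350.0, 800.0)]
hist = formant_histogram(tokens)
```

Because the histogram is normalized to sum to one, patterns built from word lists of different sizes remain comparable.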

Table 1 Comparison of acoustic distribution patterns of vowels in three languages (actual)
Table 2 Comparison of acoustic distribution patterns of reconstructed vowels in three languages (reconstructed)
Table 3 Comparison of acoustic distribution patterns of vowels in three languages (parameter)

The histogram distance algorithm was applied to measure the similarity of two images. First, the histograms of the two images, Hista and Histb, were calculated. Then, normalized similarity coefficients of the two histograms (Bhattacharyya distance and histogram intersection distance) were computed. The Bhattacharyya distance measures the similarity of two discrete or continuous probability distributions. It is closely related to the Bhattacharyya coefficient, which measures the amount of overlap between two statistical samples. The Bhattacharyya coefficient can also be used to measure the separability of classes.
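The two measures named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names and the toy histograms `hist_a` and `hist_b` are hypothetical, and both inputs are assumed to be histograms with the same number of bins.

```python
import math
from typing import List, Sequence


def normalize(hist: Sequence[float]) -> List[float]:
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have a positive total")
    return [h / total for h in hist]


def bhattacharyya_coefficient(p: Sequence[float], q: Sequence[float]) -> float:
    """BC(p, q) = sum over bins of sqrt(p_i * q_i); 1 means identical, 0 means disjoint."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """D_B(p, q) = -ln(BC(p, q)); infinite when the histograms do not overlap."""
    bc = bhattacharyya_coefficient(p, q)
    return math.inf if bc == 0 else -math.log(bc)


def histogram_intersection(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of bin-wise minima; equals 1 for identical normalized histograms."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(p, q))


# Toy histograms (hypothetical bin counts, not data from the paper).
hist_a = normalize([2, 2, 0, 0])
hist_b = normalize([1, 1, 1, 1])
overlap = bhattacharyya_coefficient(hist_a, hist_b)
shared = histogram_intersection(hist_a, hist_b)
```

Both the coefficient and the intersection lie between 0 and 1 for normalized histograms, so either can be read as a percentage similarity; which convention the reported percentages follow is not stated in this text.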

Results

Similarity comparison of vowel acoustic distribution patterns (actual)

As Table 1 shows, the similarity between Mongolian and Ewenki (with the highest value reaching 85%) is higher than that between Mongolian and Uyghur or between Uyghur and Ewenki. The results are as follows:

“Histogram Distance” (similarity from large to small): 85% (Mongolian–Ewenki) > 79% (Mongolian–Uyghur) > 76% (Uyghur–Ewenki).

“Block Histogram Distance” (similarity from large to small): 67% (Mongolian–Ewenki) > 54% (Mongolian–Uyghur) > 52% (Uyghur–Ewenki).

Table 3 (calculated from acoustic parameters) shows that the similarity between Mongolian and Ewenki (with the highest value reaching 69%) is higher than that between Mongolian and Uyghur or between Uyghur and Ewenki. The calculated results are as follows:

Male: 69% (Mongolian–Ewenki) > 65% (Mongolian–Uyghur) > 57% (Uyghur–Ewenki).

Female: 64% (Mongolian–Ewenki) > 59% (Mongolian–Uyghur) > 46% (Uyghur–Ewenki).

Similarity comparison of vowel acoustic distribution pattern (reconstructed)

Table 2 shows that the similarity values of reconstructed vowel patterns of the three languages are basically consistent with the above results. For example, the highest similarity between Mongolian and Ewenki is 71%.

“Histogram Distance” (similarity from large to small):

Male: 71% (Mongolian–Ewenki) > 69% (Mongolian–Uyghur) > 60% (Uyghur–Ewenki).

Female: 69% (Mongolian–Ewenki) > 67% (Uyghur–Ewenki) > 62% (Mongolian–Uyghur).

“Block Histogram Distance” (similarity from large to small):

Male: 57% (Mongolian–Ewenki) > 54% (Mongolian–Uyghur) > 52% (Uyghur–Ewenki).

Female: 57% (Mongolian–Ewenki) > 53% (Uyghur–Ewenki) > 51% (Mongolian–Uyghur).

Similarity comparison of the cardinal vowels of the three languages using the logarithmic quotient model

Vowel normalization algorithms extract the essential features of vowels by eliminating pronunciation variance caused by the speaker, the context and other factors. After evaluating the performance of several classical vowel normalization models (Johnson 2005), Zhou Xuewen proposed a high-performance vowel normalization algorithm, the “logarithmic quotient model” (Xuewen 2013; Xuewen and Long 2017). We applied this model to normalize the formant values of the three cardinal vowels (a, i, u) of the three languages and compared their articulation distances. Table 4 shows the distances between the averaged normalized values of the three vowels in the three languages. The right-most column (the sum of the distances of the three vowels) shows that the distance between Mongolian and Ewenki is the smallest (0.090), indicating that Mongolian and Ewenki are the most similar pair.

Table 4 Distances among normalized values of the three vowels
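The summation behind the right-most column of Table 4 can be sketched as follows. The logarithmic quotient model itself is not specified in this text, so the sketch takes already-normalized vowel means as input; the Euclidean metric, the function names and the placeholder values (`lang_x`, `lang_y`, `lang_z`) are assumptions made for illustration and are not data from the paper.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple

# vowel label -> (normalized F1, normalized F2), averaged over tokens
VowelMeans = Dict[str, Tuple[float, float]]


def summed_vowel_distance(
    lang_a: VowelMeans,
    lang_b: VowelMeans,
    vowels: Sequence[str] = ("a", "i", "u"),
) -> float:
    """Sum of per-vowel distances between two languages (Euclidean, by assumption)."""
    return sum(math.dist(lang_a[v], lang_b[v]) for v in vowels)


def closest_pair(languages: Dict[str, VowelMeans]) -> Tuple[str, str]:
    """Return the pair of language names with the smallest summed distance."""
    return min(
        combinations(sorted(languages), 2),
        key=lambda pair: summed_vowel_distance(languages[pair[0]], languages[pair[1]]),
    )


# Placeholder values chosen only to make the arithmetic easy to follow.
demo = {
    "lang_x": {"a": (0.0, 0.0), "i": (0.0, 0.0), "u": (0.0, 0.0)},
    "lang_y": {"a": (3.0, 4.0), "i": (0.0, 1.0), "u": (0.0, 0.0)},
    "lang_z": {"a": (0.0, 1.0), "i": (0.0, 0.0), "u": (0.0, 0.0)},
}
nearest = closest_pair(demo)
```

Summing over the three cardinal vowels gives a single figure per language pair, and the pair with the smallest sum is read as the most similar.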

The relevance between the similarity of vowel acoustic distribution pattern and language closeness

Some scholars maintain that the common elements (similarities) among Altaic languages result from primitive structural similarity as well as mutual contact and influence (borrowing or interaction). Typological similarity among Altaic languages does exist. In this paper, we focus only on the closeness of the three languages. Both in Figs. 2 and 3 (qualitatively) and in the similarity values in Tables 1, 2 and 3 (quantitatively), we find that the similarity of vowel acoustic distribution patterns between Mongolian and Ewenki is higher than that between Mongolian and Uyghur. Therefore, we conclude that, in terms of vowels, Mongolian and Ewenki are closer to each other than Mongolian and Uyghur are. In addition, the similarity values in Tables 1, 2 and 3 are above 50% in nearly all comparisons (the single exception is 46% for Uyghur and Ewenki female speakers in Table 3), indicating remarkable typological similarity among the three languages.

Discussion and conclusion

This paper proposes the hypothesis that the similarity of the acoustic distribution patterns of vowels is related to language closeness. By calculating and comparing the similarity values of the distribution patterns of first-syllable vowels in Mongolian, Uyghur and Ewenki (Altaic language family) on the basis of the Unified Platform, we examined the closeness and relevance among the three languages and reached the following preliminary conclusions:

Mongolian and Ewenki languages are close relatives. They and the Uyghur language are distant relatives and share typological similarity.

We maintain that languages in the same language family share original and common “language genetic information” (abbreviated as “language DNA”). In acoustic space, this “language DNA” is reflected as an “acoustic parameter model of speech and prosody” (abbreviated as “speech acoustic model”). Like the DNA of organisms, “language DNA” contains the “linguistic genetic information” inherited from the languages from which contemporary languages originated. Although languages have undergone varying degrees of variation, change and evolution over their long history, the original and common “language DNA” of a language family is stable. By comparing the similarities found in the “acoustic parameter models” of different languages, we can identify the original common “DNA” of a language family.

Although many problems remain to be clarified and solved, this interdisciplinary study has pioneering implications and raises some important issues. (1) The acoustic distribution pattern of vowels includes both modern and historical sounds (clues to evolution). How can historical phonetics be examined accurately from a synchronic perspective, and how should the relevance between synchronic and diachronic phonetics be understood and explained? (2) Although the phonetic system is stable, it also evolves, and the synchronic model cannot fully reflect the diachronic model. What is the relationship between phonetic stability and change? (3) To what extent do modern phonetic patterns reflect historical patterns, and to what extent do they reflect change? (4) How can modern experimental phonetic theory be applied to assist studies of the historical origins of phonetics? (5) As an assessment of the closeness or relevance of languages, the similarity value (index) needs to be further quantified. As the research continues, we expect that these problems can be addressed by calculating and comparing the acoustic distribution patterns and pattern similarity values of segmental and suprasegmental acoustics, thus advancing the development of linguistics, linguistic typology, historical comparative linguistics and anthropology.

At present, the above discussions and results remain at an experimental and exploratory stage; thus, no final conclusions have been drawn. In addition, features of consonants and syllables are not included in this paper. However, because the results of this research rely closely on image recognition and vowel normalization technologies, we expect that, with more advanced technologies and more acoustic data, our research will be further expanded and deepened and the resulting conclusions will become more accurate and convincing.

Note.

[1] The Bhattacharyya distance and the Bhattacharyya coefficient are named after A. Bhattacharyya, a statistician who worked at the Indian Statistical Institute in the 1930s.

(1) Bhattacharyya distance

For discrete probability distributions p and q defined over the same domain X, the Bhattacharyya distance is \({D}_B\left(p,q\right)=-\ln \left( BC\left(p,q\right)\right)\),

where \(BC\left(p,q\right)=\sum \limits_{x\in X}\sqrt{p(x)q(x)}\) is the Bhattacharyya coefficient.

For continuous probability distributions p and q, the Bhattacharyya coefficient is \(BC\left(p,q\right)=\int \sqrt{p(x)q(x)}\kern0.5em dx\).

In both cases, \(0\le BC\le 1\) and \(0\le {D}_B\le \infty\). \({D}_B\) does not satisfy the triangle inequality, whereas the Hellinger distance \(\sqrt{1- BC}\) does.

For multivariate Gaussian distributions \({p}_i=N\left({m}_i,{P}_i\right)\), the Bhattacharyya distance is \({D}_B=\frac{1}{8}{\left({m}_1-{m}_2\right)}^T{P}^{-1}\left({m}_1-{m}_2\right)+\frac{1}{2}\ln \left(\frac{\det \kern0.5em P}{\sqrt{\det \kern0.5em {P}_1\kern0.5em \det \kern0.5em {P}_2}}\right)\), where \({m}_i\) and \({P}_i\) are the means and covariances of the distributions and \(P=\frac{P_1+{P}_2}{2}\).

In this case, the first term of the Bhattacharyya distance is related to the Mahalanobis distance.
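For the two-dimensional case that arises when each vowel is summarized by the mean and covariance of its (F1, F2) values, the Gaussian formula above can be evaluated as in the following minimal sketch. Restricting the code to 2 x 2 covariance matrices, and the helper names `det2`, `inv2` and `gaussian_bhattacharyya_2d`, are assumptions made to keep the example dependency-free; the input values are illustrative only.

```python
import math
from typing import List, Sequence

Matrix2 = List[List[float]]


def det2(m: Matrix2) -> float:
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m: Matrix2) -> Matrix2:
    """Inverse of a 2 x 2 matrix."""
    d = det2(m)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def gaussian_bhattacharyya_2d(
    m1: Sequence[float], p1: Matrix2, m2: Sequence[float], p2: Matrix2
) -> float:
    """D_B between N(m1, P1) and N(m2, P2) for two-dimensional Gaussians."""
    # P is the average of the two covariance matrices.
    p = [[(p1[i][j] + p2[i][j]) / 2 for j in range(2)] for i in range(2)]
    p_inv = inv2(p)
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    # First term: (1/8) (m1 - m2)^T P^{-1} (m1 - m2), the Mahalanobis-like part.
    quad = sum(diff[i] * p_inv[i][j] * diff[j] for i in range(2) for j in range(2))
    mean_term = quad / 8
    # Second term: (1/2) ln(det P / sqrt(det P1 * det P2)).
    cov_term = 0.5 * math.log(det2(p) / math.sqrt(det2(p1) * det2(p2)))
    return mean_term + cov_term


identity = [[1.0, 0.0], [0.0, 1.0]]
wide = [[4.0, 0.0], [0.0, 4.0]]
shifted = gaussian_bhattacharyya_2d([0.0, 0.0], identity, [2.0, 0.0], identity)
rescaled = gaussian_bhattacharyya_2d([0.0, 0.0], identity, [0.0, 0.0], wide)
```

With equal covariances only the first term contributes, and with equal means only the second does, which makes the roles of the two terms easy to see.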

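(2) Bhattacharyya coefficient

The Bhattacharyya coefficient is an approximate measure of the overlap between two samples a and b. The overlapping area is divided into n sub-zones, and \(\Sigma {\mathrm{a}}_i\) and \(\Sigma {\mathrm{b}}_i\) denote the numbers of members of samples a and b in the i-th sub-zone.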

$$\mathrm{Bhattacharyya}=\sum \limits_{i=1}^n\sqrt{\left(\Sigma {\mathrm{a}}_i\cdot \Sigma {\mathrm{b}}_i\right)}$$

This algorithm measures image similarity by computing differences between vectors. It has two advantages. First, a histogram is easy to normalize. Second, the similarity between two images with different resolutions can be computed easily and efficiently from their histograms.
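The second advantage can be seen in a small sketch: once a histogram is normalized by the number of pixels, two images that show the same pattern at different resolutions yield the same histogram. The function name `intensity_histogram`, the four-bin setting and the toy grayscale images are assumptions made for illustration.

```python
from typing import List, Sequence


def intensity_histogram(
    image: Sequence[Sequence[int]], bins: int = 4, levels: int = 256
) -> List[float]:
    """Normalized histogram of gray levels (0 .. levels - 1) for a 2D image."""
    counts = [0] * bins
    pixels = 0
    for row in image:
        for value in row:
            counts[min(value * bins // levels, bins - 1)] += 1
            pixels += 1
    if pixels == 0:
        raise ValueError("image is empty")
    # Dividing by the pixel count removes the dependence on image size.
    return [c / pixels for c in counts]


# The same half-dark, half-bright pattern at 2 x 2 and at 4 x 4 pixels.
small = [[0, 255], [0, 255]]
large = [[0, 0, 255, 255] for _ in range(4)]
hist_small = intensity_histogram(small)
hist_large = intensity_histogram(large)
```

Here both histograms put half of the pixels in the darkest bin and half in the brightest one, so any histogram-based similarity measure treats the two images as identical despite their different sizes.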