Usage of HMM-Based Speech Recognition Methods for Automated Determination of a Similarity Level Between Languages

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1119)


The problem of automatically determining language similarity (or even of defining a distance on the space of languages) can be approached in different ways: working with phonetic transcriptions, with speech recordings, or with both. For recordings, we propose and test an HMM-based approach. In the first part of the article we successfully perform language detection; we then calculate distances between the HMM-based models using different metrics and divergences. The Kullback-Leibler divergence is the only one that gave good results, meaning that the calculated distances between languages correspond to the analytical understanding of the similarity between them. Although it does not work very well, we conclude that the method is usable, but that some other methods could be more rational.
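
To illustrate the divergence named above, here is a minimal sketch; it is not the authors' implementation. It assumes that each HMM state emits from a diagonal-covariance Gaussian, and it computes the closed-form Kullback-Leibler divergence between two such densities together with a symmetrised sum, since the divergence itself is not symmetric. The abstract does not say how the comparison is extended to whole HMMs, and all function names below are hypothetical.

```python
import math


def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) in nats for two univariate Gaussians (closed form)."""
    return 0.5 * (
        math.log(var_q / var_p)
        + (var_p + (mu_p - mu_q) ** 2) / var_q
        - 1.0
    )


def kl_diag_gaussian(means_p, vars_p, means_q, vars_q):
    """KL(P || Q) for diagonal-covariance Gaussians.

    With a diagonal covariance the dimensions are independent,
    so the divergence is the sum of the per-dimension divergences.
    """
    return sum(
        kl_gaussian(mp, vp, mq, vq)
        for mp, vp, mq, vq in zip(means_p, vars_p, means_q, vars_q)
    )


def symmetric_kl(means_p, vars_p, means_q, vars_q):
    """Symmetrised divergence KL(P || Q) + KL(Q || P).

    Assumption for illustration only: symmetrising is one common way
    to obtain a distance-like quantity from the KL divergence.
    """
    return (
        kl_diag_gaussian(means_p, vars_p, means_q, vars_q)
        + kl_diag_gaussian(means_q, vars_q, means_p, vars_p)
    )
```

For example, `kl_gaussian(0.0, 1.0, 1.0, 1.0)` evaluates to 0.5 nats, while swapping two densities with different variances changes the value, which is why the symmetrised form is shown alongside it.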


Distance between languages · Hidden Markov models · Kullback-Leibler divergence



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Latvia, Riga, Latvia
