
Phonetic String Matching for Languages with Cyrillic Alphabet

  • Viacheslav Paramonov
  • Alexey Shigarov
  • Gennady Ruzhnikov
  • Evgeny Cherkashin
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 852)

Abstract

Using phonetic similarity to compare text strings and eliminate misprints is a significant issue in philology and is widely applied in automatic text checking. Most existing phonetic algorithms are designed for processing English words. Their comparison quality may degrade for other languages, especially those with rich morphology and non-Latin alphabets, e.g. East Slavic languages written in Cyrillic. We propose an approach to the phonetic comparison of Russian words. It is based on detecting letters and letter sequences that are pronounced similarly according to the rules of the language. The resulting phonetic representations of words are encoded with prime numbers. The paper considers the efficiency of the proposed algorithm. The algorithm was also adapted for phonetic processing of the Mongolian language.
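The abstract describes the approach only at a high level. The Python sketch below illustrates the general idea under stated assumptions: a few well-known Russian pronunciation rules (silent soft and hard signs, akanye, word-final devoicing, "тс" read as [ц], doubled letters) rewrite a word into a normalized form, and each remaining letter is then mapped to a distinct prime. The rule list, the prime assignment, and the names `normalize`, `phonetic_code` and `sounds_alike` are hypothetical; this is not the authors' rule set or coding scheme, which the abstract does not specify.

```python
# Hypothetical sketch of rule-based phonetic normalization for Russian words,
# followed by prime-number coding. NOT the authors' published rules or scheme.

# Assumed letter and letter-sequence rules, applied in this order.
SEQUENCE_RULES = [
    ("ё", "е"),   # ё is commonly written as е
    ("ь", ""),    # soft sign has no sound of its own
    ("ъ", ""),    # hard sign has no sound of its own
    ("тс", "ц"),  # "-тся" and "-ться" are both pronounced with [ц]
    ("о", "а"),   # akanye: unstressed о sounds like а (stress is ignored here)
]

# Assumed word-final devoicing of paired voiced consonants.
FINAL_DEVOICING = {"б": "п", "в": "ф", "г": "к", "д": "т", "ж": "ш", "з": "с"}

# Letters that can remain after normalization (ё, ъ, ь are removed by the rules).
ALPHABET = "абвгдежзийклмнопрстуфхцчшщыэюя"


def primes(n):
    """Return the first n primes using trial division."""
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found


# Each normalized letter gets its own prime (an assumed assignment).
PRIME_OF = dict(zip(ALPHABET, primes(len(ALPHABET))))


def normalize(word):
    """Rewrite a word into an approximate phonetic form."""
    w = word.lower()
    for src, dst in SEQUENCE_RULES:
        w = w.replace(src, dst)
    if w and w[-1] in FINAL_DEVOICING:
        w = w[:-1] + FINAL_DEVOICING[w[-1]]
    # Collapse doubled letters such as "сс" in "касса".
    collapsed = []
    for ch in w:
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    return "".join(collapsed)


def phonetic_code(word):
    """Encode the normalized word as a tuple of primes, skipping other characters."""
    return tuple(PRIME_OF[ch] for ch in normalize(word) if ch in PRIME_OF)


def sounds_alike(a, b):
    """Treat two words as phonetically similar if their codes are equal."""
    return phonetic_code(a) == phonetic_code(b)


if __name__ == "__main__":
    print(sounds_alike("учится", "учиться"))  # True
    print(sounds_alike("код", "кот"))         # True
    print(sounds_alike("кот", "кит"))         # False
```

Because the sketch applies akanye to every "о" without stress information, it merges more words than real pronunciation would; a practical implementation would need stress data or context-sensitive rules.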

Keywords

Natural language processing; Phonetic algorithms; String comparison; Cyrillic letters

Notes

Acknowledgement

The reported study was supported in part by RFBR (grants 18-07-00758, 17-57-44006, 16-07-00411), RFBR and Government of Irkutsk Region – grant 17-47-380007. Experiments were performed on the resources of the Shared Equipment Centre of Integrated information and computing network of Irkutsk Research and Educational Complex http://net.icc.ru.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Viacheslav Paramonov (1, 2)
  • Alexey Shigarov (1, 2)
  • Gennady Ruzhnikov (1)
  • Evgeny Cherkashin (1, 2, 3)
  1. Matrosov Institute for System Dynamics and Control Theory of SB RAS, Irkutsk, Russia
  2. Institute of Mathematics, Economics and Informatics, Irkutsk State University, Irkutsk, Russia
  3. National Research Irkutsk State Technical University, Irkutsk, Russia
