Phonetic String Matching for Languages with Cyrillic Alphabet

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 852)

Abstract

Using phonetic similarity to compare text strings and eliminate misprints is a significant problem in philology, and it is widely applied in automatic text checking. Most current phonetic algorithms are designed for processing English words. Comparison quality may decrease for non-English languages, especially for languages that have rich morphology and use non-Latin alphabets, e.g. East Slavic languages written in Cyrillic. We propose an approach to the phonetic comparison of Russian words. It is based on detecting letters and letter sequences that have similar pronunciation according to the rules of the language. The resulting phonetic representations of the words are coded by prime numbers. The paper considers the efficiency of the proposed algorithm. The algorithm was also adapted for phonetic processing of the Mongolian language.
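
The general idea described in the abstract (replace letters and letter sequences that sound alike, then code the result with prime numbers) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not give the authors' rule table, so the substitution rules, the names `SEQUENCE_RULES`, `LETTER_RULES`, `phonetic_form`, and `phonetic_code`, and the choice to code a word as a tuple of primes are all hypothetical and much cruder than real Russian pronunciation rules.

```python
# Illustrative sketch only: the rules and the prime assignment below are
# assumptions, not the rule set published by the authors.

ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"

# Hypothetical letter-sequence rules, applied before single-letter rules.
SEQUENCE_RULES = [("тс", "ц"), ("дс", "ц"), ("сч", "щ")]

# Hypothetical single-letter rules: voiced consonants map to their
# voiceless counterparts, similar-sounding vowels are merged, and the
# hard and soft signs are dropped.
LETTER_RULES = {
    "б": "п", "в": "ф", "г": "к", "д": "т", "ж": "ш", "з": "с",
    "о": "а", "я": "а", "е": "и", "ё": "и", "э": "и", "ы": "и", "ю": "у",
    "ъ": "", "ь": "",
}


def primes():
    """Yield 2, 3, 5, ... using trial division by earlier primes."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def phonetic_form(word):
    """Return a simplified phonetic spelling of a Russian word."""
    word = word.lower()
    for src, dst in SEQUENCE_RULES:
        word = word.replace(src, dst)
    out = []
    for ch in word:
        ch = LETTER_RULES.get(ch, ch)
        # Skip dropped letters and collapse adjacent duplicates.
        if ch and (not out or out[-1] != ch):
            out.append(ch)
    return "".join(out)


# Assign one prime to each letter class that survives the rules.
CLASSES = sorted({LETTER_RULES.get(c, c) for c in ALPHABET} - {""})
PRIME_OF = dict(zip(CLASSES, primes()))


def phonetic_code(word):
    """Code the phonetic form as a tuple of primes (non-letters ignored)."""
    return tuple(PRIME_OF[c] for c in phonetic_form(word) if c in PRIME_OF)


print(phonetic_form("молоко"))                               # малака
print(phonetic_code("молоко") == phonetic_code("малако"))    # True
print(phonetic_code("счастье") == phonetic_code("щастье"))   # True
print(phonetic_code("город") == phonetic_code("голод"))      # False
```

In this sketch two words match when their prime tuples are equal, so a misspelling that preserves pronunciation under the assumed rules ("малако" for "молоко") compares equal to the correct form, while a word with a genuinely different consonant does not. How the primes are actually combined in the authors' algorithm is not stated in the abstract.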

Notes

  1. Yandex keyword statistics service.

Acknowledgement

The reported study was supported in part by RFBR (grants 18-07-00758, 17-57-44006, 16-07-00411) and by RFBR and the Government of Irkutsk Region (grant 17-47-380007). Experiments were performed on the resources of the Shared Equipment Centre of the Integrated Information and Computing Network of the Irkutsk Research and Educational Complex, http://net.icc.ru.

Author information

Corresponding author

Correspondence to Viacheslav Paramonov.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Paramonov, V., Shigarov, A., Ruzhnikov, G., Cherkashin, E. (2019). Phonetic String Matching for Languages with Cyrillic Alphabet. In: Borzemski, L., Świątek, J., Wilimowska, Z. (eds) Information Systems Architecture and Technology: Proceedings of 39th International Conference on Information Systems Architecture and Technology – ISAT 2018. ISAT 2018. Advances in Intelligent Systems and Computing, vol 852. Springer, Cham. https://doi.org/10.1007/978-3-319-99981-4_28
