Abstract
Using phonetic similarity to compare text strings and eliminate misprints is an important problem in philology, and it is widely used in automatic text checking. Most existing phonetic algorithms are designed for processing English words. Comparison quality may decrease for other languages, especially those that have rich morphology and use non-Latin alphabets, e.g. East Slavic languages written in Cyrillic. We propose an approach to the phonetic comparison of Russian words. It is based on detecting letters and letter sequences that have similar pronunciation according to the rules of the language. The resulting phonetic representations of the words are coded by prime numbers. The paper considers the efficiency of the algorithm. The algorithm has also been adapted for phonetic processing of the Mongolian language.
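The abstract does not list the actual pronunciation rules or the exact prime-number coding scheme, so the following is only a minimal sketch of the general idea under stated assumptions. The equivalence table `CANONICAL`, the set `SILENT`, and the functions `phonetic_form` and `phonetic_code` are hypothetical names introduced here for illustration; the rules are deliberately crude (for example, voiced consonants are always mapped to their voiceless counterparts regardless of position), and coding a word as the sequence of primes assigned to its canonical letters is an assumption, not the authors' published scheme.

```python
# Illustrative sketch only: the rules below are assumptions, not the paper's rule set.

RUSSIAN_LETTERS = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"

# Hypothetical equivalence classes: each letter maps to a canonical representative.
CANONICAL = {
    # voiced consonant -> voiceless counterpart (applied everywhere, a simplification)
    "б": "п", "в": "ф", "г": "к", "д": "т", "ж": "ш", "з": "с",
    # vowels that are often confused in unstressed positions (assumed merging)
    "о": "а", "я": "а", "е": "и", "ё": "и", "э": "и", "ы": "и", "ю": "у",
}

# Soft and hard signs carry no sound of their own and are dropped in this sketch.
SILENT = {"ь", "ъ"}


def first_primes(n):
    """Return the first n prime numbers using trial division."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def phonetic_form(word):
    """Reduce a word to canonical letters and collapse adjacent duplicates."""
    result = []
    for ch in word.lower():
        if ch in SILENT:
            continue
        ch = CANONICAL.get(ch, ch)
        if result and result[-1] == ch:
            continue  # doubled letters are treated as a single sound
        result.append(ch)
    return "".join(result)


# Assign a distinct prime to every canonical letter (assumed coding scheme).
_CANONICAL_ALPHABET = sorted(
    {CANONICAL.get(ch, ch) for ch in RUSSIAN_LETTERS if ch not in SILENT}
)
PRIME_OF = dict(zip(_CANONICAL_ALPHABET, first_primes(len(_CANONICAL_ALPHABET))))


def phonetic_code(word):
    """Code a word as the tuple of primes of its canonical letters.

    Characters outside the Russian alphabet are skipped.
    """
    return tuple(PRIME_OF[ch] for ch in phonetic_form(word) if ch in PRIME_OF)


if __name__ == "__main__":
    # A correct spelling and a plausible misprint receive the same code.
    print(phonetic_code("молоко") == phonetic_code("малако"))
    # Words that differ in a stressed vowel remain distinct.
    print(phonetic_code("кот") == phonetic_code("кит"))
```

With these assumed rules, two strings are considered phonetically similar when their codes are equal. A realistic rule set would need to be position-sensitive and to handle multi-letter sequences, which this sketch does not attempt.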
Notes
1. Yandex keyword statistics service.
Acknowledgement
The reported study was supported in part by RFBR (grants 18-07-00758, 17-57-44006, 16-07-00411) and by RFBR and the Government of Irkutsk Region (grant 17-47-380007). Experiments were performed on the resources of the Shared Equipment Centre of the Integrated Information and Computing Network of the Irkutsk Research and Educational Complex (http://net.icc.ru).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Paramonov, V., Shigarov, A., Ruzhnikov, G., Cherkashin, E. (2019). Phonetic String Matching for Languages with Cyrillic Alphabet. In: Borzemski, L., Świątek, J., Wilimowska, Z. (eds) Information Systems Architecture and Technology: Proceedings of 39th International Conference on Information Systems Architecture and Technology – ISAT 2018. ISAT 2018. Advances in Intelligent Systems and Computing, vol 852. Springer, Cham. https://doi.org/10.1007/978-3-319-99981-4_28
DOI: https://doi.org/10.1007/978-3-319-99981-4_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99980-7
Online ISBN: 978-3-319-99981-4
eBook Packages: Intelligent Technologies and Robotics (R0)