Abstract
Greeklish-to-Greek transcription is a challenging task because it cannot be accomplished by a direct one-to-one mapping between Greek characters and symbols of the Latin alphabet. Greeklish users do not follow a standardized way of transliteration, and the resulting ambiguity makes transcribing Greeklish back to the Greek alphabet difficult. Although a plethora of deterministic approaches to the task exists, this paper presents a non-deterministic, vocabulary-free approach based on Random Forests, an ensemble classification methodology from Data Mining. The approach produces comparable and even better results and supports argot and other linguistic peculiarities. Using data from real users, drawn from a variety of sources such as blogs, forums, and email lists, as well as artificial data from a robust stochastic Greek-to-Greeklish transcriber, the proposed approach achieves satisfactory results in the range of 91.5%-98.5%, which is comparable to an alternative commercial approach.
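The ambiguity described above can be illustrated with a minimal sketch. It is not the method of the paper: the candidate table below is a hypothetical, deliberately incomplete assumption, and the function name is invented for illustration. Digraphs, accents, and final sigma are ignored. The sketch only shows that a single Latin symbol may stand for several Greek letters, so a character-by-character lookup yields several candidate words rather than one.

```python
from itertools import product

# Hypothetical, incomplete table (an assumption for illustration only):
# each Latin character maps to the Greek letters it may stand for.
CANDIDATES = {
    "a": ["α"],
    "e": ["ε"],
    "i": ["ι", "η", "υ"],  # three Greek letters pronounced alike
    "o": ["ο", "ω"],       # two Greek letters pronounced alike
    "k": ["κ"],
    "l": ["λ"],
    "m": ["μ"],
    "r": ["ρ"],
}


def candidate_transcriptions(word):
    """Return every Greek string reachable by per-character lookup."""
    # Characters missing from the table are passed through unchanged.
    options = [CANDIDATES.get(ch, [ch]) for ch in word.lower()]
    return ["".join(combo) for combo in product(*options)]


if __name__ == "__main__":
    for greek in candidate_transcriptions("kalimera"):
        print(greek)
```

Under this table, "kalimera" yields three candidates, only one of which (καλημερα, unaccented) corresponds to the intended word, and the number of candidates grows multiplicatively with each ambiguous symbol. Choosing among such candidates is the kind of decision a learned classifier has to make.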
Copyright information
© 2011 IFIP International Federation for Information Processing
About this paper
Cite this paper
Panteli, A., Maragoudakis, M. (2011). A Random Forests Text Transliteration System for Greek Digraphia. In: Iliadis, L., Maglogiannis, I., Papadopoulos, H. (eds) Artificial Intelligence Applications and Innovations. EANN 2011, AIAI 2011. IFIP Advances in Information and Communication Technology, vol 364. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23960-1_24
DOI: https://doi.org/10.1007/978-3-642-23960-1_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23959-5
Online ISBN: 978-3-642-23960-1
eBook Packages: Computer Science (R0)