Abstract
Millions of manuscripts and printed texts are available in the Ottoman language. Automatic categorization of Ottoman texts would make these documents much more accessible in applications ranging from historical investigation to literary analysis. In this work, we use transcribed versions of Ottoman literary texts in the Latin alphabet and show that effective automatic text categorization techniques can be developed for the Ottoman language. For this purpose, we use two fundamentally different machine learning methods, Naïve Bayes and Support Vector Machines, and employ four style markers: most frequent words, token lengths, two-word collocations, and type lengths. In the experiments, we use the collected works (divans) of ten different poets: two poets from each of five centuries, from the 15th to the 19th. The experimental results show that highly accurate classification by poet and by time period is possible. Based on statistical analysis, we recommend which style marker and machine learning method should be used in future studies.
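To make the four style markers concrete, each one can be turned into a numeric feature vector that a Naïve Bayes or SVM classifier consumes. The Python sketch below shows one plausible reading of them and is not the authors' implementation. The tokenizer, the length cap of 15, the vocabulary sizes, and every function name (`tokenize`, `token_length_profile`, `mfw_profile`, and so on) are hypothetical. Token lengths are counted over every running word, while type lengths count each distinct word once. The most-frequent-word and collocation vocabularies are meant to be chosen once from the training corpus and then applied to every text.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical helpers illustrating the four style markers; parameter
# choices (max_len, vocabulary size k) are assumptions, not the paper's.

def tokenize(text):
    """NFC-normalize, lowercase, and keep runs of letters (incl. â, î, ü, ş)."""
    text = unicodedata.normalize("NFC", text).lower()
    return re.findall(r"[^\W\d_]+", text)

def _length_profile(words, max_len):
    """Relative frequency of word lengths 1..max_len; longer words share the last bin."""
    words = list(words)
    n = len(words) or 1  # avoid division by zero on empty input
    counts = Counter(min(len(w), max_len) for w in words)
    return [counts[k] / n for k in range(1, max_len + 1)]

def token_length_profile(tokens, max_len=15):
    # Every running word counts, so frequent short words dominate.
    return _length_profile(tokens, max_len)

def type_length_profile(tokens, max_len=15):
    # Each distinct word (type) counts once, regardless of its frequency.
    return _length_profile(set(tokens), max_len)

def most_frequent_words(corpus_tokens, k):
    """Vocabulary of the k most frequent words in the training corpus."""
    return [w for w, _ in Counter(corpus_tokens).most_common(k)]

def mfw_profile(tokens, vocab):
    """Relative frequency of each vocabulary word in one text."""
    counts = Counter(tokens)
    n = len(tokens) or 1
    return [counts[w] / n for w in vocab]

def _bigrams(tokens):
    # Adjacent word pairs, i.e. two-word collocations.
    return Counter(zip(tokens, tokens[1:]))

def top_collocations(corpus_tokens, k):
    """The k most frequent adjacent word pairs in the training corpus."""
    return [bg for bg, _ in _bigrams(corpus_tokens).most_common(k)]

def collocation_profile(tokens, vocab):
    """Relative frequency of each selected word pair in one text."""
    counts = _bigrams(tokens)
    n = max(len(tokens) - 1, 1)  # number of adjacent pairs in the text
    return [counts[bg] / n for bg in vocab]

if __name__ == "__main__":
    toks = tokenize("Gül bülbül gül bahar gül")
    print(toks)                            # ['gül', 'bülbül', 'gül', 'bahar', 'gül']
    print(token_length_profile(toks)[:6])  # [0.0, 0.0, 0.6, 0.0, 0.2, 0.2]
    print(most_frequent_words(toks, 1))    # ['gül']
```

Which unit is profiled (a single poem, a fixed-size block, or a whole divan) and how the vectors are weighted before training are design choices that this sketch leaves open.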
Acknowledgments
This work is partially supported by the Scientific and Technical Research Council of Turkey (TÜBİTAK) under grant number 109E006. Any opinions, findings, conclusions, or recommendations expressed in this article belong to the authors and do not necessarily reflect those of the sponsor.
Copyright information
© 2011 Springer-Verlag London Limited
About this paper
Cite this paper
Can, E.F., Can, F., Duygulu, P., Kalpakli, M. (2011). Automatic Categorization of Ottoman Literary Texts by Poet and Time Period. In: Gelenbe, E., Lent, R., Sakellari, G. (eds) Computer and Information Sciences II. Springer, London. https://doi.org/10.1007/978-1-4471-2155-8_6
DOI: https://doi.org/10.1007/978-1-4471-2155-8_6
Publisher Name: Springer, London
Print ISBN: 978-1-4471-2154-1
Online ISBN: 978-1-4471-2155-8
eBook Packages: Engineering, Engineering (R0)