Automatic Categorization of Ottoman Literary Texts by Poet and Time Period

  • Conference paper
Computer and Information Sciences II

Abstract

Millions of manuscripts and printed texts are available in the Ottoman language. Automatic categorization of these texts would make them far more accessible for applications ranging from historical investigation to literary analysis. In this work, we use transcribed versions of Ottoman literary texts in the Latin alphabet and show that effective automatic text categorization techniques can be developed for the Ottoman language. For this purpose, we use two fundamentally different machine learning methods, Naïve Bayes and Support Vector Machines, and employ four style markers: most frequent words, token lengths, two-word collocations, and type lengths. In the experiments, we use the collected works (divans) of ten poets: two poets from each of five hundred-year periods, ranging from the 15th to the 19th century. The experimental results show that highly accurate classification by poet and by time period is possible. Using statistical analysis, we recommend which style marker and machine learning method to use in future studies.
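As a rough illustration of the four style markers named in the abstract, the following Python sketch counts them for a single text. It is not the authors' implementation. It assumes whitespace tokenization, that "token lengths" are measured over every word occurrence while "type lengths" are measured over distinct words, and that a two-word collocation is any pair of adjacent tokens. The function name `style_markers`, the parameter `top_k`, and the sample string are hypothetical.

```python
from collections import Counter


def style_markers(text, top_k=3):
    """Count four style markers for one transcribed text (illustrative only)."""
    # Assumption: lowercase and split on whitespace; the paper's tokenizer may differ.
    tokens = text.lower().split()
    types = set(tokens)  # distinct words
    return {
        # The top_k most frequent words in the text.
        "most_frequent_words": [w for w, _ in Counter(tokens).most_common(top_k)],
        # Length histogram over every word occurrence.
        "token_lengths": Counter(len(t) for t in tokens),
        # Length histogram over distinct words only.
        "type_lengths": Counter(len(t) for t in types),
        # Counts of adjacent word pairs.
        "collocations": Counter(zip(tokens, tokens[1:])),
    }


# Made-up sample string, not taken from any divan.
markers = style_markers("gul gibi gul yuz gul gibi", top_k=1)
print(markers["most_frequent_words"])
print(dict(markers["token_lengths"]))
print(dict(markers["type_lengths"]))
```

Count vectors of this kind could then be passed to a Naïve Bayes or Support Vector Machine classifier, with the poet or the century as the class label.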

Acknowledgments

This work is partially supported by the Scientific and Technical Research Council of Turkey (TÜBİTAK) under the grant number 109E006. Any opinions, findings and conclusions or recommendations expressed in this article belong to the authors and do not necessarily reflect those of the sponsor.

Author information

Corresponding author

Correspondence to Fazli Can.

Copyright information

© 2011 Springer-Verlag London Limited

About this paper

Cite this paper

Can, E.F., Can, F., Duygulu, P., Kalpakli, M. (2011). Automatic Categorization of Ottoman Literary Texts by Poet and Time Period. In: Gelenbe, E., Lent, R., Sakellari, G. (eds) Computer and Information Sciences II. Springer, London. https://doi.org/10.1007/978-1-4471-2155-8_6

  • DOI: https://doi.org/10.1007/978-1-4471-2155-8_6

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-2154-1

  • Online ISBN: 978-1-4471-2155-8

  • eBook Packages: Engineering (R0)
