Linguistic Corpora: A View from Turkish

  • Mustafa Aksan
  • Yeşim Aksan
Chapter in the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

Usage-based linguistic studies have gained new insights as corpus-based and corpus-driven analyses have advanced in recent years. Linguists working in different domains have turned to corpora as a major source for studying language at all levels of representation, and corpus linguistics is currently evolving into a sophisticated methodology for extracting and analyzing data. Building and using corpora in Turkish linguistics is a recent undertaking, initially motivated by natural language processing (NLP) research. The number of available corpora is increasing. Linguistic research has begun to test hypotheses against attested data and to uncover lexical and grammatical patterns of use that went unnoticed in the absence of corpus data. Advances in NLP research, together with the tools it provides for corpus building and annotation, further contribute to corpus studies in Turkish linguistics.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Mustafa Aksan, Mersin University, Mersin, Turkey
  • Yeşim Aksan, Mersin University, Mersin, Turkey

Personalised recommendations