Adam Kilgarriff’s Legacy to Computational Linguistics and Beyond

  • Roger Evans
  • Alexander Gelbukh
  • Gregory Grefenstette
  • Patrick Hanks
  • Miloš Jakubíček
  • Diana McCarthy
  • Martha Palmer
  • Ted Pedersen
  • Michael Rundell
  • Pavel Rychlý
  • Serge Sharoff
  • David Tugwell
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9623)


The 2016 CICLing conference was dedicated to the memory of Adam Kilgarriff, who died the year before. Adam leaves behind a tremendous scientific legacy, and those working in computational linguistics, other fields of linguistics, and lexicography are indebted to him. This paper is a summary review of some of Adam’s main scientific contributions. It is not and cannot be exhaustive, and it is written by only a small selection of his large network of collaborators. Nevertheless, we hope it will provide a useful summary for readers wanting to know more about the origins of the work, events and software that are so widely relied upon by scientists today, and that will undoubtedly continue to be so for the foreseeable future.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Roger Evans (1)
  • Alexander Gelbukh (2)
  • Gregory Grefenstette (3)
  • Patrick Hanks (4)
  • Miloš Jakubíček (5, 6)
  • Diana McCarthy (7), corresponding author
  • Martha Palmer (8)
  • Ted Pedersen (9)
  • Michael Rundell (10)
  • Pavel Rychlý (5, 6)
  • Serge Sharoff (11)
  • David Tugwell (12)

  1. University of Brighton, Brighton, UK
  2. CIC, Instituto Politécnico Nacional, Mexico City, Mexico
  3. IHMC, Ocala, USA
  4. University of Wolverhampton, Wolverhampton, UK
  5. Lexical Computing, Brighton, UK
  6. Masaryk University, Brno, Czech Republic
  7. DTAL, University of Cambridge, Cambridge, UK
  8. University of Colorado, Boulder, USA
  9. University of Minnesota, Minneapolis, USA
  10. Lexicography MasterClass, Brighton, UK
  11. University of Leeds, Leeds, UK
  12. Independent Researcher, Edinburgh, UK
