The Colloquial WordNet: Extending Princeton WordNet with Neologisms

  • John P. McCrae
  • Ian Wood
  • Amanda Hicks
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10318)


Princeton WordNet is one of the most important resources for natural language processing, but it has not been updated for over ten years and is not suitable for analyzing the fast-moving language used on social media. We propose an extension to WordNet with new terms found on Twitter and Reddit that cover emergent or vulgar language usage. In addition to our extraction methodology, we analyze the new terms to provide information about how new words are entering the English language. Finally, we discuss publishing this resource both as linguistic linked open data and as part of the Global WordNet Association’s Interlingual Index.


WordNet, Neologisms, Slang, Linked data, Lexicography



This work was supported in part by the Science Foundation Ireland under Grant Number SFI/12/RC/2289 (Insight) and NIH/NCATS Clinical and Translational Science Awards to the University of Florida UL1 TR000064/UL1 TR001427. The content is solely the responsibility of the authors and does not necessarily represent the official views of NIH/NCATS.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Insight Center for Data Analytics, National University of Ireland, Galway, Galway, Ireland
  2. Department of Health Outcomes and Policy, University of Florida, Gainesville, USA
