The Colloquial WordNet: Extending Princeton WordNet with Neologisms

  • Conference paper
Language, Data, and Knowledge (LDK 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10318)

Abstract

Princeton WordNet is one of the most important resources for natural language processing, but it has not been updated for over ten years and is not suitable for analyzing the fast-moving language used on social media. We propose an extension to WordNet with new terms found on Twitter and Reddit that cover emergent or vulgar language usage. In addition to our extraction methodology, we analyze the new terms to provide information about how new words are entering the English language. Finally, we discuss publishing this resource both as linguistic linked open data and as part of the Global WordNet Association’s Interlingual Index.

Notes

  1. This end point provides a sample of approximately 1% of all tweets.

  2. https://github.com/lucasdnd/simple-reddit-crawler.

  3. Compiled at http://norvig.com/ngrams/.

  4. http://www.urbandictionary.com.

  5. See http://wordnetweb.princeton.edu/perl/webwn?s=post.

  6. http://blog.oxforddictionaries.com/2013/08/what-is-the-origin-of-twerk/.

  7. http://colloqwn.linguistic-lod.org/.

  8. http://globalwordnet.github.io/schemas/.

  9. We have aimed to combine this resource with our data, but discussions with the authors on licensing have been inconclusive.

Acknowledgements

This work was supported in part by the Science Foundation Ireland under Grant Number SFI/12/RC/2289 (Insight) and NIH/NCATS Clinical and Translational Science Awards to the University of Florida UL1 TR000064/UL1 TR001427. The content is solely the responsibility of the authors and does not necessarily represent the official views of NIH/NCATS.

Author information

Corresponding author

Correspondence to John P. McCrae.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

McCrae, J.P., Wood, I., Hicks, A. (2017). The Colloquial WordNet: Extending Princeton WordNet with Neologisms. In: Gracia, J., Bond, F., McCrae, J., Buitelaar, P., Chiarcos, C., Hellmann, S. (eds) Language, Data, and Knowledge. LDK 2017. Lecture Notes in Computer Science, vol 10318. Springer, Cham. https://doi.org/10.1007/978-3-319-59888-8_17

  • DOI: https://doi.org/10.1007/978-3-319-59888-8_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59887-1

  • Online ISBN: 978-3-319-59888-8

  • eBook Packages: Computer Science, Computer Science (R0)
