
Profiling Idioms: A Sociolexical Approach to the Study of Phraseological Patterns

  • Conference paper
Computational and Corpus-Based Phraseology (EUROPHRAS 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11755)


Abstract

This paper introduces a novel approach to the study of lexical and pragmatic meaning called ‘sociolexical profiling’, which aims to correlate the use of lexical items with author-attributed demographic features, such as gender, age, profession, and education. The approach was applied in a case study of a set of English idioms derived from the Pattern Dictionary of English Verbs (PDEV), a corpus-driven lexical resource that defines verb senses in terms of the phraseological patterns in which a verb typically occurs. For each selected idiom, a gender profile was generated from data extracted from the Blog Authorship Corpus (BAC) in order to establish whether any statistically significant differences can be detected in the way men and women use idioms in everyday communication. A quantitative and qualitative analysis of the gender profiles was subsequently performed, enabling us to test the validity of the proposed approach. We believe that sociolexical profiling, if performed on a large scale, will have important implications for several areas of research, including corpus lexicography, translation, creative writing, forensic linguistics, and natural language processing.
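
To make the notion of a gender profile more concrete, the following is a minimal sketch of how one might be computed for a single idiom. It is an illustration under stated assumptions, not the procedure used in the paper: the abstract does not name a significance test, so the two-cell log-likelihood statistic (G2) common in corpus linguistics is assumed here, and the function names, the idiom, and all counts are hypothetical.

```python
import math


def log_likelihood(count_a, total_a, count_b, total_b):
    """Two-cell log-likelihood (G2) for one item in two subcorpora.

    Assumption: this test is chosen for illustration only; the paper's
    abstract does not state which significance test was used.
    """
    combined_rate = (count_a + count_b) / (total_a + total_b)
    expected_a = total_a * combined_rate
    expected_b = total_b * combined_rate
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def gender_profile(idiom, female_hits, female_tokens, male_hits, male_tokens):
    """Return per-million-word frequencies and G2 for one idiom.

    All arguments are hypothetical inputs: hit counts for the idiom and
    total token counts of the female- and male-authored subcorpora.
    """
    g2 = log_likelihood(female_hits, female_tokens, male_hits, male_tokens)
    return {
        "idiom": idiom,
        "female_pmw": female_hits / female_tokens * 1_000_000,
        "male_pmw": male_hits / male_tokens * 1_000_000,
        "g2": g2,
        # 3.84 is the chi-square critical value for df = 1 at p < 0.05
        "significant": g2 > 3.84,
    }


if __name__ == "__main__":
    # Invented counts, for illustration only
    print(gender_profile("example idiom", 120, 1_000_000, 60, 1_000_000))
```

Normalising to occurrences per million words keeps the two subcorpora comparable when their sizes differ, and the low frequencies typical of idioms are one reason a likelihood-based statistic is a plausible choice in a sketch like this.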


Notes

  1. This is not surprising; idioms are known to generally occur with very low frequencies in most corpora.


Author information


Correspondence to Sara Može or Emad Mohamed.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Može, S., Mohamed, E. (2019). Profiling Idioms: A Sociolexical Approach to the Study of Phraseological Patterns. In: Corpas Pastor, G., Mitkov, R. (eds) Computational and Corpus-Based Phraseology. EUROPHRAS 2019. Lecture Notes in Computer Science, vol 11755. Springer, Cham. https://doi.org/10.1007/978-3-030-30135-4_23


  • DOI: https://doi.org/10.1007/978-3-030-30135-4_23


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30134-7

  • Online ISBN: 978-3-030-30135-4

  • eBook Packages: Computer Science, Computer Science (R0)
