Burstiness of Verbs and Derived Nouns

Pierrehumbert, Janet B.

doi:10.1007/978-3-642-30773-7_8


Abstract

The frequencies of words vary with the discourse context, because any given word is more relevant to some topics of discussion than to others. In the statistical natural language processing literature, the term burstiness is used to characterize the tendency of topical words to occur repeatedly in bursts, separated by lulls in which they occur more rarely. This article builds on the study of word burstiness by Altmann et al. (PLoS ONE 4:e7678, 2009). That study analyzed the archive of the USENET discussion group talk.origins, developed a novel method for quantifying burstiness, and showed that the burstiness of words is strongly correlated with their semantic type (in the sense of Montague semantics). Using the same dataset, I here explore the burstiness of abstract derived nouns (such as argument) in relation to their verb stems (e.g. argue) and frequency-matched nonderived nouns (such as science). I ask whether the burstiness of the derived form is inherited from the stem along with other stem features, such as the argument structure, or whether it is determined by the deverbal suffix. Overall, derived nouns pattern just like nonderived nouns, indicating that the suffix acts like the morphological head in determining the discourse statistics. This finding is interpreted in the light of Carlson’s theory of dialogue games (Carlson in Dialogue games: An approach to discourse analysis. Synthese language library, vol. 17. Reidel, Dordrecht, 1983).
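
To make the notion of bursts and lulls concrete, the following is a minimal sketch of how burstiness might be measured from the gaps between successive occurrences of a word. It does not reproduce the quantification method of Altmann et al. (2009), which the abstract does not spell out. As a stand-in it uses a simple, generic coefficient, (sigma - mu) / (sigma + mu), computed from the mean mu and standard deviation sigma of the inter-arrival gaps. The function names and the toy token sequences are illustrative assumptions, not part of the chapter.

```python
from statistics import mean, pstdev


def inter_arrival_times(tokens, word):
    """Return the gaps (in tokens) between successive occurrences of `word`."""
    positions = [i for i, token in enumerate(tokens) if token == word]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def burstiness(gaps):
    """Generic burstiness coefficient (sigma - mu) / (sigma + mu).

    This is an illustrative stand-in, NOT the measure used in the chapter.
    It is -1 for perfectly regular gaps, and it grows toward +1 as
    occurrences cluster into bursts separated by long lulls.
    """
    if len(gaps) < 2:
        raise ValueError("need at least two gaps to estimate burstiness")
    mu = mean(gaps)
    sigma = pstdev(gaps)
    return (sigma - mu) / (sigma + mu)


if __name__ == "__main__":
    # Hypothetical toy data: "w" recurs every third token (no bursts).
    regular = ["w", "x", "x"] * 5
    # Hypothetical toy data: two bursts of "w" separated by a long lull.
    clustered = ["w"] * 4 + ["x"] * 20 + ["w"] * 4

    print(burstiness(inter_arrival_times(regular, "w")))
    print(burstiness(inter_arrival_times(clustered, "w")))
```

On these toy sequences the evenly spaced word receives the minimum score and the clustered word a positive one, even though the two words differ little in overall frequency. That contrast is the sense in which burstiness carries information beyond word frequency.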


References

Altmann, Eduardo G., Janet B. Pierrehumbert, and Adilson E. Motter. 2009. Beyond word frequency: Bursts, lulls, and scaling in the temporal distributions of words. PLoS ONE 4: e7678. doi:10.1371/journal.pone.0007678
Altmann, Eduardo G., Janet B. Pierrehumbert, and Adilson E. Motter. 2011. Niche as a determinant of word fate in online groups. PLoS ONE 6: e19009. doi:10.1371/journal.pone.0019009
Anderson, John R., and Robert Milson. 1989. Human memory: An adaptive perspective. Psychological Review 96: 703–719.
Baayen, R.H., Lee H. Wurm, and Joanna Aycock. 2007. Lexical dynamics for low-frequency complex words: a regression study across tasks and modalities. The Mental Lexicon 2: 419–463. doi:10.1075/ml.2.3.06baa
van Benthem, Johan. 1989. Logical constants across varying types. Notre Dame Journal of Formal Logic 30: 315–342.
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3: 993–1022.
Blevins, Juliette, and Andrew Wedel. 2009. Inhibited sound change: An evolutionary approach to lexical competition. Diachronica 26: 143–183. doi:10.1075/dia.26.2.01ble
Bookstein, Abraham, and Don R. Swanson. 1974. Probabilistic models for automatic indexing. Journal of the American Society for Information Science 25: 312–318. doi:10.1002/asi.4630250505
Bybee, Joan. 2001. Phonology and language use. Vol. 94 of Cambridge studies in linguistics. Cambridge: Cambridge University Press.
Carlson, Lauri. 1983. Dialogue games: An approach to discourse analysis. Vol. 17 of Synthese language library. Dordrecht: Reidel.
Chomsky, Noam. 1970. Remarks on nominalizations. In Readings in English transformational grammar, eds. Roderick A. Jacobs and Peter S. Rosenbaum, 184–221. Waltham: Ginn.
Church, Kenneth W. 2000. Empirical estimates of adaptation: The chance of two Noriegas is closer to p/2 than p². In Proceedings of the 17th conference on computational linguistics (COLING 2000), 180–186. Stroudsburg: Association for Computational Linguistics.
Church, Kenneth W., and William A. Gale. 1995. Poisson mixtures. Natural Language Engineering 1: 163–190. doi:10.1017/S1351324900000139
Dennett, Daniel C., and John Haugeland. 1987. Intentionality. In The Oxford companion to the mind, ed. Richard L. Gregory, 383–386. London: Oxford University Press.
Hay, Jennifer. 2003. Causes and consequences of word structure. London: Routledge.
Heller, Jordana, and Janet B. Pierrehumbert. 2011. Word burstiness improves models of word reduction in spontaneous speech. In Architectures and mechanisms for language processing (AMLaP 2011), Paris. http://amlap2011.files.wordpress.com/2011/08/129_pdf.pdf.
Heller, Jordana, Janet B. Pierrehumbert, and David N. Rapp. 2010. Predicting words beyond the syntactic horizon: Word recurrence distributions modulate on-line long-distance lexical predictability. In Architectures and mechanisms for language processing (AMLaP 2010). York: University of York.
Hoeksema, Jack. 1992. The head parameter in morphology and syntax. In Language and cognition 2: Yearbook 1992 of the research group for linguistic theory and knowledge representation of the University of Groningen, eds. Dicky Gilbers and Sietze Looyenga, 119–132. Groningen: Universiteitsdrukkerij Groningen.
Katz, Slava M. 1996. Distribution of content words and phrases in text and language modelling. Natural Language Engineering 2: 15–59.
Kintsch, Walter. 1974. The representation of meaning in memory. The experimental psychology series. Hillsdale: Erlbaum.
Lijffijt, Jefrey, Panagiotis Papapetrou, Kai Puolamäki, and Heikki Mannila. 2011. Analyzing word frequencies in large text corpora using inter-arrival times and bootstrapping. In Proceedings of European conference on machine learning and knowledge discovery in databases (ECML PKDD 2011). Part II, eds. Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, and Michalis Vazirgiannis, Vol. 6912 of Lecture notes in artificial intelligence, 341–357. Berlin: Springer.
Montague, Richard. 1973. The proper treatment of quantification in ordinary English. In Approaches to natural language, eds. Jaakko Hintikka, Julius Moravcsik, and Patrick Suppes, 221–242. Dordrecht: Reidel.
Montemurro, Marcelo A., and Damián H. Zanette. 2002. Entropic analysis of the role of words in literary texts. Advances in Complex Systems 5: 7–17.
Nigam, Kamal, Andrew Kachites McCallum, Sebastian Thrun, and Tom Mitchell. 2000. Text classification from labeled and unlabeled documents using EM. Machine Learning 39: 103–134. doi:10.1023/A:1007692713085
Partee, Barbara H. 1992. Syntactic categories and semantic type. In Computational linguistics and formal semantics, eds. Michael Rosner and Roderick Johnson, Studies in natural language processing, 97–126. Cambridge: Cambridge University Press.
Sarkar, Avik, Paul Garthwaite, and Anne de Roeck. 2005. A Bayesian mixture model for term re-occurrence and burstiness. In Proceedings of the 9th conference on computational natural language learning (CoNLL), 48–55.
Sharkey, Noel E., and D.C. Mitchell. 1985. Word recognition in a functional context: The use of scripts in reading. Journal of Memory and Language 24: 253–270. doi:10.1016/0749-596X(85)90027-0
Singer, Murray, Peter Andrusiak, Paul Reisdorf, and Nancy L. Black. 1992. Individual differences in bridging inference processes. Memory & Cognition 20: 539–548. doi:10.3758/BF03199586
Tanenhaus, Michael K., and Sarah Brown-Schmidt. 2008. Language processing in the natural world. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 363: 1105–1122.
von Fintel, Kai. 1995. The formal semantics of grammaticalization. In Proceedings of NELS 25. Vol. 2 of Papers from the workshops on language acquisition & language change GLSA, 175–189.

Author information

Authors and Affiliations

Department of Linguistics, and Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
Janet B. Pierrehumbert

Corresponding author

Correspondence to Janet B. Pierrehumbert.

Editor information

Editors and Affiliations

Department of Literature, Area Studies and, University of Oslo, Niels Henrik Abels vei 36, Oslo, 0315, Oslo, Norway
Diana Santos
Department of Modern Languages, University of Helsinki, Unioninkatu 40, Helsinki, 00014, Finland
Krister Lindén
School of Computing & Informatics, University of Nairobi, Chiromo Campus RM 105, Nairobi, 00100, Kenya
Wanjiku Ng’ang’a

About this chapter

Cite this chapter

Pierrehumbert, J.B. (2012). Burstiness of Verbs and Derived Nouns. In: Santos, D., Lindén, K., Ng’ang’a, W. (eds) Shall We Play the Festschrift Game?. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30773-7_8

DOI: https://doi.org/10.1007/978-3-642-30773-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30772-0
Online ISBN: 978-3-642-30773-7
eBook Packages: Computer Science (R0)