Abstract
The frequencies of words vary with the discourse context, because any given word is more relevant to some topics of discussion than to others. In the statistical natural language processing literature, the term burstiness is used to characterize the tendency of topical words to occur repeatedly in bursts, separated by lulls in which they occur more rarely. This article builds on the study of word burstiness by Altmann et al. (PLoS ONE 4:e7678, 2009). The study analyzed the archive of the USENET discussion group talk.origins, developed a novel method for quantifying burstiness, and showed that the burstiness of words is strongly correlated with their semantic type (in the sense of Montague semantics). Using the same dataset, I here explore the burstiness of abstract derived nouns (such as argument) in relation to their verb stems (e.g. argue) and frequency-matched nonderived nouns (such as science). I ask whether the burstiness of the derived form is inherited from the stem along with other stem features, such as the argument structure, or whether it is determined by the deverbal suffix. Overall, derived nouns pattern just like nonderived nouns, indicating that the suffix acts like the morphological head in determining the discourse statistics. This finding is interpreted in the light of Carlson’s theory of dialogue games (Carlson in Dialogue games: An approach to discourse analysis. Synthese language library, vol. 17. Reidel, Dordrecht, 1983).
References
Altmann, Eduardo G., Janet B. Pierrehumbert, and Adilson E. Motter. 2009. Beyond word frequency: Bursts, lulls, and scaling in the temporal distributions of words. PLoS ONE 4: e7678. doi:10.1371/journal.pone.0007678
Altmann, Eduardo G., Janet B. Pierrehumbert, and Adilson E. Motter. 2011. Niche as a determinant of word fate in online groups. PLoS ONE 6: e19009. doi:10.1371/journal.pone.0019009
Anderson, John R., and Robert Milson. 1989. Human memory: An adaptive perspective. Psychological Review 96: 703–719.
Baayen, R.H., Lee H. Wurm, and Joanna Aycock. 2007. Lexical dynamics for low-frequency complex words: a regression study across tasks and modalities. The Mental Lexicon 2: 419–463. doi:10.1075/ml.2.3.06baa
van Benthem, Johan. 1989. Logical constants across varying types. Notre Dame Journal of Formal Logic 30: 315–342.
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3: 993–1022.
Blevins, Juliette, and Andrew Wedel. 2009. Inhibited sound change: An evolutionary approach to lexical competition. Diachronica 26: 143–183. doi:10.1075/dia.26.2.01ble
Bookstein, Abraham, and Don R. Swanson. 1974. Probabilistic models for automatic indexing. Journal of the American Society for Information Science 25: 312–318. doi:10.1002/asi.4630250505
Bybee, Joan. 2001. Phonology and language use. Vol. 94 of Cambridge studies in linguistics. Cambridge: Cambridge University Press.
Carlson, Lauri. 1983. Dialogue games: An approach to discourse analysis. Vol. 17 of Synthese language library. Dordrecht: Reidel.
Chomsky, Noam. 1970. Remarks on nominalizations. In Readings in English transformational grammar, eds. Roderick A. Jacobs and Peter S. Rosenbaum, 184–221. Waltham: Ginn.
Church, Kenneth W. 2000. Empirical estimates of adaptation: The chance of two Noriegas is closer to p/2 than p². In Proceedings of the 17th conference on computational linguistics (COLING 2000), 180–186. Stroudsburg: Association for Computational Linguistics.
Church, Kenneth W., and William A. Gale. 1995. Poisson mixtures. Natural Language Engineering 1: 163–190. doi:10.1017/S1351324900000139
Dennett, Daniel C., and John Haugeland. 1987. Intentionality. In The Oxford companion to the mind, ed. Richard L. Gregory, 383–386. London: Oxford University Press.
Hay, Jennifer. 2003. Causes and consequences of word structure. London: Routledge.
Heller, Jordana, and Janet B. Pierrehumbert. 2011. Word burstiness improves models of word reduction in spontaneous speech. In Architectures and mechanisms for language processing (AMLaP 2011), Paris. http://amlap2011.files.wordpress.com/2011/08/129_pdf.pdf.
Heller, Jordana, Janet B. Pierrehumbert, and David N. Rapp. 2010. Predicting words beyond the syntactic horizon: Word recurrence distributions modulate on-line long-distance lexical predictability. In Architectures and mechanisms for language processing (AMLaP 2010). York: University of York.
Hoeksema, Jack. 1992. The head parameter in morphology and syntax. In Language and cognition 2: Yearbook 1992 of the research group for linguistic theory and knowledge representation of the University of Groningen, eds. Dicky Gilbers and Sietze Looyenga, 119–132. Groningen: Universiteitsdrukkerij Groningen.
Katz, Slava M. 1996. Distribution of content words and phrases in text and language modelling. Natural Language Engineering 2: 15–59.
Kintsch, Walter. 1974. The representation of meaning in memory. The experimental psychology series. Hillsdale: Erlbaum.
Lijffijt, Jefrey, Panagiotis Papapetrou, Kai Puolamäki, and Heikki Mannila. 2011. Analyzing word frequencies in large text corpora using inter-arrival times and bootstrapping. In Proceedings of European conference on machine learning and knowledge discovery in databases (ECML PKDD 2011). Part II, eds. Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, and Michalis Vazirgiannis, Vol. 6912 of Lecture notes in artificial intelligence, 341–357. Berlin: Springer.
Montague, Richard. 1973. The proper treatment of quantification in ordinary English. In Approaches to natural language, eds. Jaakko Hintikka, Julius Moravcsik, and Patrick Suppes, 221–242. Dordrecht: Reidel.
Montemurro, Marcelo A., and Damián H. Zanette. 2002. Entropic analysis of the role of words in literary texts. Advances in Complex Systems 5: 7–17.
Nigam, Kamal, Andrew Kachites McCallum, Sebastian Thrun, and Tom Mitchell. 2000. Text classification from labeled and unlabeled documents using EM. Machine Learning 39: 103–134. doi:10.1023/A:1007692713085
Partee, Barbara H. 1992. Syntactic categories and semantic type. In Computational linguistics and formal semantics, eds. Michael Rosner and Roderick Johnson, Studies in natural language processing, 97–126. Cambridge: Cambridge University Press.
Sarkar, Avik, Paul Garthwaite, and Anne de Roeck. 2005. A Bayesian mixture model for term re-occurrence and burstiness. In Proceedings of the 9th conference on computational natural language learning (CoNLL), 48–55.
Sharkey, Noel E., and D.C. Mitchell. 1985. Word recognition in a functional context: The use of scripts in reading. Journal of Memory and Language 24: 253–270. doi:10.1016/0749-596X(85)90027-0
Singer, Murray, Peter Andrusiak, Paul Reisdorf, and Nancy L. Black. 1992. Individual differences in bridging inference processes. Memory & Cognition 20: 539–548. doi:10.3758/BF03199586
Tanenhaus, Michael K., and Sarah Brown-Schmidt. 2008. Language processing in the natural world. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 363: 1105–1122.
von Fintel, Kai. 1995. The formal semantics of grammaticalization. In Proceedings of NELS 25. Vol. 2 of Papers from the workshops on language acquisition & language change GLSA, 175–189.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Pierrehumbert, J.B. (2012). Burstiness of Verbs and Derived Nouns. In: Santos, D., Lindén, K., Ng’ang’a, W. (eds) Shall We Play the Festschrift Game?. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30773-7_8
DOI: https://doi.org/10.1007/978-3-642-30773-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30772-0
Online ISBN: 978-3-642-30773-7
eBook Packages: Computer Science (R0)