Burstiness of Verbs and Derived Nouns

  • Janet B. Pierrehumbert

Abstract

The frequencies of words vary with the discourse context, because any given word is more relevant to some topics of discussion than to others. In the statistical natural language processing literature, the term burstiness is used to characterize the tendency of topical words to occur repeatedly in bursts, separated by lulls in which they occur more rarely. This article builds on the study of word burstiness by Altmann et al. (PLoS ONE 4:e7678, 2009). That study analyzed the archive of the USENET discussion group talk.origins, developed a novel method for quantifying burstiness, and showed that the burstiness of words is strongly correlated with their semantic type (in the sense of Montague semantics). Using the same dataset, I here explore the burstiness of abstract derived nouns (such as argument) in relation to their verb stems (e.g. argue) and frequency-matched nonderived nouns (such as science). I ask whether the burstiness of the derived form is inherited from the stem along with other stem features, such as the argument structure, or whether it is determined by the deverbal suffix. Overall, derived nouns pattern just like nonderived nouns, indicating that the suffix acts like the morphological head in determining the discourse statistics. This finding is interpreted in the light of Carlson’s theory of dialogue games (Carlson in Dialogue games: An approach to discourse analysis. Synthese language library, vol. 17. Reidel, Dordrecht, 1983).

Keywords

Word Frequency, Semantic Type, Argument Structure, Complex Word, Common Noun
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Altmann, Eduardo G., Janet B. Pierrehumbert, and Adilson E. Motter. 2009. Beyond word frequency: Bursts, lulls, and scaling in the temporal distributions of words. PLoS ONE 4: e7678. doi: 10.1371/journal.pone.0007678
  2. Altmann, Eduardo G., Janet B. Pierrehumbert, and Adilson E. Motter. 2011. Niche as a determinant of word fate in online groups. PLoS ONE 6: e19009. doi: 10.1371/journal.pone.0019009
  3. Anderson, John R., and Robert Milson. 1989. Human memory: An adaptive perspective. Psychological Review 96: 703–719.
  4. Baayen, R.H., Lee H. Wurm, and Joanna Aycock. 2007. Lexical dynamics for low-frequency complex words: a regression study across tasks and modalities. The Mental Lexicon 2: 419–463. doi: 10.1075/ml.2.3.06baa
  5. van Benthem, Johan. 1989. Logical constants across varying types. Notre Dame Journal of Formal Logic 30: 315–342.
  6. Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3: 993–1022.
  7. Blevins, Juliette, and Andrew Wedel. 2009. Inhibited sound change: An evolutionary approach to lexical competition. Diachronica 26: 143–183. doi: 10.1075/dia.26.2.01ble
  8. Bookstein, Abraham, and Don R. Swanson. 1974. Probabilistic models for automatic indexing. Journal of the American Society for Information Science 25: 312–318. doi: 10.1002/asi.4630250505
  9. Bybee, Joan. 2001. Phonology and language use. Vol. 94 of Cambridge studies in linguistics. Cambridge: Cambridge University Press.
  10. Carlson, Lauri. 1983. Dialogue games: An approach to discourse analysis. Vol. 17 of Synthese language library. Dordrecht: Reidel.
  11. Chomsky, Noam. 1970. Remarks on nominalizations. In Readings in English transformational grammar, eds. Roderick A. Jacobs and Peter S. Rosenbaum, 184–221. Waltham: Ginn.
  12. Church, Kenneth W. 2000. Empirical estimates of adaptation: The chance of two Noriegas is closer to p/2 than p². In Proceedings of the 17th conference on computational linguistics (COLING 2000), 180–186. Stroudsburg: Association for Computational Linguistics.
  13. Church, Kenneth W., and William A. Gale. 1995. Poisson mixtures. Natural Language Engineering 1: 163–190. doi: 10.1017/S1351324900000139
  14. Dennett, Daniel C., and John Haugeland. 1987. Intentionality. In The Oxford companion to the mind, ed. Richard L. Gregory, 383–386. London: Oxford University Press.
  15. Hay, Jennifer. 2003. Causes and consequences of word structure. London: Routledge.
  16. Heller, Jordana, and Janet B. Pierrehumbert. 2011. Word burstiness improves models of word reduction in spontaneous speech. In Architectures and mechanisms for language processing (AMLaP 2011), Paris. http://amlap2011.files.wordpress.com/2011/08/129_pdf.pdf.
  17. Heller, Jordana, Janet B. Pierrehumbert, and David N. Rapp. 2010. Predicting words beyond the syntactic horizon: Word recurrence distributions modulate on-line long-distance lexical predictability. In Architectures and mechanisms for language processing (AMLaP 2010). York: University of York.
  18. Hoeksema, Jack. 1992. The head parameter in morphology and syntax. In Language and cognition 2: Yearbook 1992 of the research group for linguistic theory and knowledge representation of the University of Groningen, eds. Dicky Gilbers and Sietze Looyenga, 119–132. Groningen: Universiteitsdrukkerij Groningen.
  19. Katz, Slava M. 1996. Distribution of content words and phrases in text and language modelling. Natural Language Engineering 2: 15–59.
  20. Kintsch, Walter. 1974. The representation of meaning in memory. The experimental psychology series. Hillsdale: Erlbaum.
  21. Lijffijt, Jefrey, Panagiotis Papapetrou, Kai Puolamäki, and Heikki Mannila. 2011. Analyzing word frequencies in large text corpora using inter-arrival times and bootstrapping. In Proceedings of European conference on machine learning and knowledge discovery in databases (ECML PKDD 2011). Part II, eds. Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, and Michalis Vazirgiannis, Vol. 6912 of Lecture notes in artificial intelligence, 341–357. Berlin: Springer.
  22. Montague, Richard. 1973. The proper treatment of quantification in ordinary English. In Approaches to natural language, eds. Jaakko Hintikka, Julius Moravcsik, and Patrick Suppes, 221–242. Dordrecht: Reidel.
  23. Montemurro, Marcelo A., and Damián H. Zanette. 2002. Entropic analysis of the role of words in literary texts. Advances in Complex Systems 5: 7–17.
  24. Nigam, Kamal, Andrew Kachites McCallum, Sebastian Thrun, and Tom Mitchell. 2000. Text classification from labeled and unlabeled documents using EM. Machine Learning 39: 103–134. doi: 10.1023/A:1007692713085
  25. Partee, Barbara H. 1992. Syntactic categories and semantic type. In Computational linguistics and formal semantics, eds. Michael Rosner and Roderick Johnson, Studies in natural language processing, 97–126. Cambridge: Cambridge University Press.
  26. Sarkar, Avik, Paul Garthwaite, and Anne de Roeck. 2005. A Bayesian mixture model for term re-occurrence and burstiness. In Proceedings of the 9th conference on computational natural language learning (CoNLL), 48–55.
  27. Sharkey, Noel E., and D.C. Mitchell. 1985. Word recognition in a functional context: The use of scripts in reading. Journal of Memory and Language 24: 253–270. doi: 10.1016/0749-596X(85)90027-0
  28. Singer, Murray, Peter Andrusiak, Paul Reisdorf, and Nancy L. Black. 1992. Individual differences in bridging inference processes. Memory & Cognition 20: 539–548. doi: 10.3758/BF03199586
  29. Tanenhaus, Michael K., and Sarah Brown-Schmidt. 2008. Language processing in the natural world. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 363: 1105–1122.
  30. von Fintel, Kai. 1995. The formal semantics of grammaticalization. In Proceedings of NELS 25. Vol. 2 of Papers from the workshops on language acquisition & language change GLSA, 175–189.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Department of Linguistics and Northwestern Institute on Complex Systems, Northwestern University, Evanston, USA
