Which Words Do You Remember? Temporal Properties of Language Use in Digital Archives

  • Nina Tahmasebi
  • Gerhard Gossen
  • Thomas Risse
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7489)


Knowing how terms behave in written text can help us tailor models, algorithms, and resources to improve access to digital libraries and to answer information needs in archives that span long time periods. In this paper we investigate the behavior of written English in blogs in comparison to traditional texts from the New York Times, The Times Archive, and the British National Corpus. We show that user-generated content, much like spoken content, differs in its characteristics from ‘professionally’ written text and exhibits more dynamic behavior.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Nina Tahmasebi (1)
  • Gerhard Gossen (1)
  • Thomas Risse (1)
  1. L3S Research Center, Hannover, Germany
