
Detecting Events in a Million New York Times Articles

  • Tristan Snowsill
  • Ilias Flaounas
  • Tijl De Bie
  • Nello Cristianini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6323)

Abstract

We demonstrate a newly developed event detection method for text streams on over a million articles from the New York Times corpus. The method is designed to operate in a predominantly on-line fashion, reporting new events within a specified timeframe. Events are found by detecting significant changes in the statistical properties of the text, which are stored and updated efficiently in a suffix tree.

This particular demonstration shows how our method is effective at discovering both short- and long-term events (which are often denoted topics), and how it automatically copes with topic drift on a corpus of 1 035 263 articles.
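
The idea of flagging significant changes in text statistics can be illustrated with a small sketch. It is not the authors' implementation: it assumes word bigrams counted in a plain dictionary as a stand-in for the suffix tree, and a simple binomial z-score with add-one smoothing as a stand-in for whatever significance test the method uses. The function names, the reference/current window split, and the threshold are hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def ngram_counts(docs, n=2):
    """Count word n-grams over a list of documents.

    Assumption: a flat Counter replaces the suffix tree named in the abstract.
    """
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def surprising_ngrams(reference_docs, current_docs, n=2, z_threshold=3.0):
    """Return (n-gram, z-score) pairs that are over-represented in the
    current window relative to the reference window, highest z first.

    Assumption: a binomial z-score with add-one smoothing; the threshold
    of 3.0 is an arbitrary illustrative value.
    """
    ref = ngram_counts(reference_docs, n)
    cur = ngram_counts(current_docs, n)
    ref_total = sum(ref.values())
    cur_total = sum(cur.values())
    if ref_total == 0 or cur_total == 0:
        return []
    vocab = len(set(ref) | set(cur))
    flagged = []
    for gram, observed in cur.items():
        # Smoothed reference rate, so unseen n-grams get a nonzero expectation.
        p = (ref[gram] + 1) / (ref_total + vocab)
        expected = p * cur_total
        std = sqrt(cur_total * p * (1 - p))
        if std == 0:
            continue
        z = (observed - expected) / std
        if z >= z_threshold:
            flagged.append((gram, z))
    flagged.sort(key=lambda item: -item[1])
    return flagged
```

In an on-line setting, the current window would slide forward over the stream and its counts would be folded into the reference statistics. That folding is one simple way a detector of this kind could adapt as a topic drifts.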

Keywords

event detection, suffix tree, New York Times

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Tristan Snowsill (1)
  • Ilias Flaounas (2)
  • Tijl De Bie (1)
  • Nello Cristianini (1, 2)

  1. Department of Engineering Mathematics, University of Bristol
  2. Department of Computer Science, University of Bristol
