Detecting Events in a Million New York Times Articles

  • Tristan Snowsill
  • Ilias Flaounas
  • Tijl De Bie
  • Nello Cristianini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6323)

Abstract

We present a demonstration of a newly developed text stream event detection method on over a million articles from the New York Times corpus. The event detection is designed to operate in a predominantly on-line fashion, reporting new events within a specified timeframe. Events are found by detecting significant changes in the statistical properties of the text, where those properties are efficiently stored and updated in a suffix tree.

This particular demonstration shows how our method is effective at discovering both short- and long-term events (which are often denoted topics), and how it automatically copes with topic drift on a corpus of 1 035 263 articles.
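The idea of reporting events as significant changes in text statistics can be illustrated with a minimal sketch. It is not the authors' implementation: a plain dictionary of word n-gram counts stands in for the suffix tree, and a simple binomial z-score stands in for whatever significance test the method actually uses. The class name `StreamEventDetector` and the parameters `n_max`, `z_threshold` and `min_count` are hypothetical, chosen only for this illustration.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n_max=3):
    """Yield every word n-gram of length 1..n_max as a tuple."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


class StreamEventDetector:
    """Illustrative detector: flags n-grams whose count in the current
    window is surprisingly high compared with their historical rate.

    Assumption: a Counter replaces the suffix tree of the original method.
    """

    def __init__(self, n_max=3, z_threshold=3.0, min_count=3):
        self.n_max = n_max              # longest n-gram tracked
        self.z_threshold = z_threshold  # minimum z-score to report
        self.min_count = min_count      # ignore very rare n-grams
        self.history = Counter()        # n-gram counts over all past windows
        self.history_total = 0          # total n-grams seen so far

    def process_window(self, documents):
        """Score one time window of documents, then fold it into history."""
        current = Counter()
        for doc in documents:
            current.update(ngrams(doc.lower().split(), self.n_max))
        total = sum(current.values())

        events = []
        if self.history_total > 0 and total > 0:
            for gram, count in current.items():
                if count < self.min_count:
                    continue
                # Smoothed historical rate, strictly between 0 and 1.
                p = (self.history[gram] + 1) / (self.history_total + 2)
                expected = total * p
                z = (count - expected) / sqrt(expected * (1 - p))
                if z >= self.z_threshold:
                    events.append((gram, z))

        # On-line update: the window just scored becomes part of the baseline.
        self.history.update(current)
        self.history_total += total

        # Most surprising n-grams first; ties broken by the n-gram itself.
        events.sort(key=lambda e: (-e[1], e[0]))
        return events
```

Calling `process_window` once per time window (for example, once per day of articles) returns the n-grams that are unusually frequent in that window. Because every window is folded into the baseline after scoring, a phrase that keeps appearing gradually stops being surprising, which is one simple way a detector of this kind could absorb topic drift.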

Keywords

event detection, suffix tree, New York Times

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Tristan Snowsill (1)
  • Ilias Flaounas (2)
  • Tijl De Bie (1)
  • Nello Cristianini (1, 2)

  1. Department of Engineering Mathematics, University of Bristol
  2. Department of Computer Science, University of Bristol
