Detecting Events in a Million New York Times Articles

  • Tristan Snowsill
  • Ilias Flaounas
  • Tijl De Bie
  • Nello Cristianini
Conference paper

DOI: 10.1007/978-3-642-15939-8_46

Part of the Lecture Notes in Computer Science book series (LNCS, volume 6323)
Cite this paper as:
Snowsill T., Flaounas I., De Bie T., Cristianini N. (2010) Detecting Events in a Million New York Times Articles. In: Balcázar J.L., Bonchi F., Gionis A., Sebag M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science, vol 6323. Springer, Berlin, Heidelberg

Abstract

We present a demonstration of a newly developed text stream event detection method on over a million articles from the New York Times corpus. The event detection is designed to operate in a predominantly on-line fashion, reporting new events within a specified timeframe. The event detection is achieved by detecting significant changes in the statistical properties of the text where those properties are efficiently stored and updated in a suffix tree.
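The idea of detecting significant changes in text statistics can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps the suffix tree for a plain dictionary of word n-gram counts, and it uses a simple z-score under a binomial approximation as a stand-in for whatever significance test the paper uses. The function names (`ngrams`, `count_window`, `detect_events`) and the parameters `n_max`, `z_threshold` and `min_count` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n_max=3):
    """Yield every word n-gram of length 1..n_max as a tuple."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def count_window(docs, n_max=3):
    """Count n-gram occurrences over all documents in one time window.

    Assumption: a flat Counter stands in for the suffix tree, which would
    store the same counts far more compactly and for all n-gram lengths.
    """
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(doc.lower().split(), n_max))
    return counts


def totals_by_length(counts):
    """Sum the counts separately for each n-gram length."""
    totals = Counter()
    for gram, c in counts.items():
        totals[len(gram)] += c
    return totals


def detect_events(history, current, z_threshold=3.0, min_count=3):
    """Return (ngram, z) pairs that are over-represented in `current`.

    Assumption: the historical rate of an n-gram is treated as the success
    probability of a binomial, and the z-score of the current count is
    compared with a fixed threshold.
    """
    hist_totals = totals_by_length(history)
    cur_totals = totals_by_length(current)
    events = []
    for gram, c in current.items():
        if c < min_count:
            continue  # ignore n-grams too rare to judge
        n = len(gram)
        # Smoothed historical rate; always strictly between 0 and 1.
        p = (history.get(gram, 0) + 1) / (hist_totals[n] + 2)
        expected = cur_totals[n] * p
        std = sqrt(cur_totals[n] * p * (1 - p))
        z = (c - expected) / std
        if z >= z_threshold:
            events.append((gram, z))
    events.sort(key=lambda pair: -pair[1])
    return events
```

In an on-line setting, one would call `count_window` on each new batch of articles, compare it with the accumulated history using `detect_events`, and then merge the batch into the history with `history.update(current)`. Letting old counts decay over time would be one plausible way to cope with topic drift, though that choice is also an assumption here.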

This particular demonstration shows how our method is effective at discovering both short- and long-term events (which are often denoted topics), and how it automatically copes with topic drift on a corpus of 1 035 263 articles.

Keywords

event detection, suffix tree, New York Times

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Tristan Snowsill (1)
  • Ilias Flaounas (2)
  • Tijl De Bie (1)
  • Nello Cristianini (1, 2)

  1. Department of Engineering Mathematics, University of Bristol
  2. Department of Computer Science, University of Bristol