Machine Learning and Knowledge Discovery in Databases

Volume 6323 of the series Lecture Notes in Computer Science, pp. 615-618

Detecting Events in a Million New York Times Articles

  • Tristan Snowsill, affiliated with the Department of Engineering Mathematics, University of Bristol
  • Ilias Flaounas, affiliated with the Department of Computer Science, University of Bristol
  • Tijl De Bie, affiliated with the Department of Engineering Mathematics, University of Bristol
  • Nello Cristianini, affiliated with the Department of Engineering Mathematics and the Department of Computer Science, University of Bristol

Abstract

We present a demonstration of a newly developed event detection method for text streams, applied to over a million articles from the New York Times corpus. The method is designed to operate predominantly online, reporting new events within a specified time frame. Events are detected as significant changes in the statistical properties of the text, and these properties are stored and updated efficiently in a suffix tree.
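The abstract describes the approach only at a high level. As a rough illustration, and not the authors' algorithm, the sketch below stores word n-gram counts in a depth-limited suffix trie (a simpler, uncompressed stand-in for a true suffix tree), builds one trie for a background period and one for a recent window, and flags n-grams whose window count is far above what the background rate predicts. The class and function names (`NgramSuffixTrie`, `burst_scores`, `detect_events`), the parameters `max_depth`, `threshold` and `min_count`, the pseudo-count smoothing and the Poisson-approximation z-score are all hypothetical choices made for this example.

```python
import math


class TrieNode:
    """Node of a depth-limited, word-level suffix trie (hypothetical sketch)."""

    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0


class NgramSuffixTrie:
    """Counts every word n-gram up to max_depth by inserting truncated suffixes.

    This is an uncompressed trie, not a linear-space suffix tree; it is used
    here only to show how substring statistics can be stored and updated.
    """

    def __init__(self, max_depth=3):
        self.root = TrieNode()
        self.max_depth = max_depth
        self.total = 0  # number of token positions inserted so far

    def add_document(self, tokens):
        # Insert each suffix, cut to max_depth words. Every node on the path
        # counts one occurrence of the n-gram spelled from the root to it.
        for start in range(len(tokens)):
            node = self.root
            for token in tokens[start:start + self.max_depth]:
                node = node.children.setdefault(token, TrieNode())
                node.count += 1
            self.total += 1

    def count(self, ngram):
        node = self.root
        for token in ngram:
            node = node.children.get(token)
            if node is None:
                return 0
        return node.count

    def ngrams(self):
        # Depth-first enumeration of (ngram tuple, count) pairs.
        stack = [((), self.root)]
        while stack:
            prefix, node = stack.pop()
            for token, child in node.children.items():
                gram = prefix + (token,)
                yield gram, child.count
                stack.append((gram, child))


def burst_scores(window, background, min_count=2):
    """Poisson-approximation z-score of each window n-gram against its
    background rate, smoothed with a pseudo-count of 1 (assumed scoring rule)."""
    scores = {}
    for gram, observed in window.ngrams():
        if observed < min_count:
            continue
        rate = (background.count(gram) + 1) / (background.total + 1)
        expected = rate * window.total
        scores[gram] = (observed - expected) / math.sqrt(expected)
    return scores


def detect_events(background_docs, window_docs, max_depth=3,
                  threshold=3.0, min_count=2):
    """Return (ngram, score) pairs whose score reaches threshold, best first."""
    background = NgramSuffixTrie(max_depth)
    for doc in background_docs:
        background.add_document(doc)
    window = NgramSuffixTrie(max_depth)
    for doc in window_docs:
        window.add_document(doc)
    scores = burst_scores(window, background, min_count)
    flagged = [(g, s) for g, s in scores.items() if s >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


background = [
    "the market closed higher today".split(),
    "the city council met today".split(),
    "the team won the game".split(),
] * 5
window = [
    "earthquake hits the city".split(),
    "earthquake damage in the city".split(),
    "rescue teams search after earthquake".split(),
    "the team won the game".split(),
]
# Flags only ('earthquake',), with a score of about 5.5.
print(detect_events(background, window, max_depth=2))
```

In an online setting one would presumably add each arriving document to the window trie and periodically fold the window into the background; a genuine suffix tree (for example one built with Ukkonen's algorithm) would store all substrings in linear space instead of capping n-gram length.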

This demonstration shows that our method is effective at discovering both short- and long-term events (the latter are often called topics), and that it copes automatically with topic drift, on a corpus of 1 035 263 articles.

Keywords

event detection, suffix tree, New York Times