Focused crawler for events

  • Mohamed M. G. Farag
  • Sunshin Lee
  • Edward A. Fox
Article

DOI: 10.1007/s00799-016-0207-1

Cite this article as:
Farag, M.M.G., Lee, S. & Fox, E.A. Int J Digit Libr (2017). doi:10.1007/s00799-016-0207-1

Abstract

There is a need for an integrated event focused crawling system to collect Web data about key events. When a disaster or other significant event occurs, many users try to locate the most up-to-date information about that event. Yet there is little systematic collecting and archiving of event information anywhere. We propose intelligent event focused crawling for automatic event tracking and archiving, ultimately leading to effective access. We developed an event model that can capture key event information and incorporated that model into a focused crawling algorithm. For the focused crawler to leverage the event model in predicting webpage relevance, we developed a function that measures the similarity between two event representations. We then conducted two series of experiments to evaluate our system on two recent events: the California shooting and the Brussels attack. The first experiment series evaluated the effectiveness of our proposed event model representation when assessing the relevance of webpages. Our event model-based representation outperformed the baseline method (topic-only), showing better precision, recall, and F1-score, with an improvement of 20% in F1-score. The second experiment series evaluated the effectiveness of the event model-based focused crawler for collecting relevant webpages from the WWW. Our event model-based focused crawler outperformed the state-of-the-art baseline focused crawler (best-first), achieving a higher harvest ratio with an average improvement of 40%.
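The abstract does not spell out the components of the event model or the exact form of the similarity function. As a minimal sketch, assume (hypothetically) that an event is represented by three bags of words, for topic, location, and date, and that two representations are compared by a weighted average of per-component cosine similarities. The names `make_event`, `cosine`, and `event_similarity`, the equal default weights, and the sample terms are illustrative assumptions, not the authors' definitions.

```python
import math
from collections import Counter
from typing import Dict, Optional

# Hypothetical event representation (assumption): one bag of words per component.
EventRep = Dict[str, Counter]
COMPONENTS = ("topic", "location", "date")


def make_event(topic: str, location: str, date: str) -> EventRep:
    """Build a toy event representation from whitespace-separated terms."""
    return {
        "topic": Counter(topic.lower().split()),
        "location": Counter(location.lower().split()),
        "date": Counter(date.lower().split()),
    }


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def event_similarity(e1: EventRep, e2: EventRep,
                     weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of per-component cosine similarities (assumed scheme)."""
    weights = weights or {c: 1.0 for c in COMPONENTS}
    total = sum(weights.values())
    score = sum(w * cosine(e1.get(c, Counter()), e2.get(c, Counter()))
                for c, w in weights.items())
    return score / total if total else 0.0


# Hypothetical seed event and candidate webpage.
seed = make_event("mass shooting attack", "san bernardino california", "december 2015")
page = make_event("shooting suspects police", "san bernardino", "2015")
print(round(event_similarity(seed, page), 3))                   # 0.619
print(round(event_similarity(seed, page, {"topic": 1.0}), 3))   # 0.333
```

Passing `weights={"topic": 1.0}` reduces the score to a topic-only comparison, similar in spirit to the abstract's topic-only baseline; in this toy example the location and date evidence raises the score from about 0.333 to about 0.619.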
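The second experiment series compares crawlers by harvest ratio, the fraction of fetched pages that are judged relevant. A generic best-first crawler, the kind of baseline the abstract names, can be sketched with a priority queue in which each unvisited URL inherits the relevance score of the page that linked to it. Everything below is hypothetical: the in-memory `WEB` dictionary stands in for real HTTP fetching, `keyword_score` is a toy relevance function, and the 0.5 threshold is arbitrary. Under this sketch's assumptions, an event model-based variant would keep the same loop and pass in an event similarity score instead.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical in-memory "web": url -> (page text, outlinks).
WEB: Dict[str, Tuple[str, List[str]]] = {
    "seed": ("brussels attack airport metro", ["a", "b"]),
    "a": ("brussels metro explosion victims", ["c"]),
    "b": ("football scores weekend", ["d"]),
    "c": ("attack suspects arrested brussels", []),
    "d": ("recipes for dinner", []),
}
QUERY = {"brussels", "attack", "metro", "airport", "explosion"}


def keyword_score(text: str) -> float:
    """Toy relevance score: fraction of a page's distinct terms found in QUERY."""
    terms = set(text.lower().split())
    return len(terms & QUERY) / len(terms) if terms else 0.0


def best_first_crawl(seeds, fetch, score, threshold, max_pages):
    """Always expand the frontier URL with the highest priority.

    A URL's priority is the relevance score of the page that linked to it.
    Returns (fetched URLs, relevant URLs, harvest ratio).
    """
    frontier = [(-1.0, url) for url in seeds]  # heapq is a min-heap, so negate
    heapq.heapify(frontier)
    seen = set(seeds)
    fetched, relevant = [], []
    while frontier and len(fetched) < max_pages:
        _, url = heapq.heappop(frontier)
        text, outlinks = fetch(url)
        page_score = score(text)
        fetched.append(url)
        if page_score >= threshold:
            relevant.append(url)
        # Children of low-scoring pages get low priority but are not discarded.
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-page_score, link))
    harvest_ratio = len(relevant) / len(fetched) if fetched else 0.0
    return fetched, relevant, harvest_ratio


fetched, relevant, hr = best_first_crawl(["seed"], WEB.__getitem__, keyword_score, 0.5, 10)
print(fetched)   # ['seed', 'a', 'b', 'c', 'd']
print(relevant)  # ['seed', 'a', 'c']
print(hr)        # 0.6
```

Because the scoring function is a parameter, comparing a topic-only scorer with an event-aware one changes only the `score` argument, which keeps the harvest ratio comparison between the two crawlers like-for-like.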
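The first experiment series reports precision, recall, and F1-score for judging webpage relevance. Using the standard set-based definitions, these can be computed as in the short sketch below; the function name and the page IDs are invented for illustration.

```python
def precision_recall_f1(predicted: set, actual: set):
    """Set-based precision, recall, and F1 for relevance judgments."""
    tp = len(predicted & actual)  # pages correctly labeled relevant
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical example: pages a classifier labeled relevant vs. human judgments.
p, r, f1 = precision_recall_f1({"p1", "p2", "p3", "p4"}, {"p2", "p3", "p5"})
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.5 0.667 0.571
```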

Keywords

Event archiving, Focused crawling, Web archiving, Event modeling, Digital libraries

Funding information

  • National Science Foundation (US), grant number IIS-1319578

Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  • Mohamed M. G. Farag (1)
  • Sunshin Lee (1)
  • Edward A. Fox (1)
  1. Virginia Tech, Blacksburg, USA
