A Systematic Analysis of Sentence Update Detection for Temporal Summarization

Part of the Lecture Notes in Computer Science book series (LNISA, volume 10193)

Abstract

Temporal summarization algorithms filter large volumes of streaming documents and emit sentences that constitute salient event updates. Systems developed for this task typically combine traditional retrieval and document summarization algorithms in an ad-hoc fashion to filter sentences inside documents. Retrieval and summarization algorithms, however, were developed to operate on static document collections. A deep understanding of the limitations of these approaches when applied to a temporal summarization task is therefore necessary. In this work we present a systematic analysis of the methods used for retrieval of update sentences in temporal summarization, and demonstrate the limitations and potential of these methods by examining the retrievability and the centrality of event updates, as well as the existence of intrinsic characteristics that distinguish update from non-update sentences.

Keywords

  • Temporal summarization
  • Content analysis
  • Event modeling


Notes

  1. TREC TS focuses on large events with a wide impact, such as natural catastrophes (storms, earthquakes), conflicts (bombings, protests, riots, shootings) and accidents.

  2. In total we extract 8,471 unigrams and 1,169,276 bigrams using the log-likelihood ratio weighting scheme.

  3. We discard event types for which there is not enough annotated data available.

  4. http://trec-kba.org/kba-stream-corpus-2014.shtml.

  5. Word2Vec was trained on the set of gold standard updates from the TREC TS 2013 and TREC TS 2014 collections.

  6. No documents were released for event 7, hence the white row in the heatmap.

  7. For events 14, 21, 24 and 25 we cannot report any centrality scores across relevant documents, because LexRank could not handle the size of the data; hence the white rows in columns (B) of the heatmap. The average values for the precision measures below the heatmap are computed excluding these events.
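
The notes above mention a log-likelihood ratio weighting scheme for extracting unigrams and bigrams, but this page does not give the exact formulation. The following is a minimal sketch, assuming the common frequency-profiling form of the statistic, in which a term's frequency in one corpus is compared against its frequency in a second corpus. The function names, the choice of an "update" versus "background" corpus, and the toy tokens are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood ratio for one term (assumed frequency-profiling form).

    a, b: frequency of the term in corpus 1 and corpus 2
    c, d: total number of tokens in corpus 1 and corpus 2
    """
    # Expected frequencies if the term were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero observed frequency contributes nothing (0 * log 0 is taken as 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def rank_terms(update_tokens, background_tokens):
    """Score every term seen in either corpus and sort by descending score."""
    f1, f2 = Counter(update_tokens), Counter(background_tokens)
    c, d = sum(f1.values()), sum(f2.values())
    scores = {t: log_likelihood(f1[t], f2[t], c, d) for t in set(f1) | set(f2)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    updates = ["killed", "killed", "storm"]
    background = ["storm", "the", "the"]
    for term, score in rank_terms(updates, background):
        print(f"{term}\t{score:.3f}")
```

The statistic is symmetric: it is high for terms over-represented in either corpus, so a real pipeline would also need to check in which corpus the relative frequency is higher. Bigrams can be scored the same way by passing lists of token pairs instead of single tokens.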

Acknowledgements

This research was supported by the Dutch national program COMMIT. All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors.

Author information

Corresponding author

Correspondence to Evangelos Kanoulas.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Gârbacea, C., Kanoulas, E. (2017). A Systematic Analysis of Sentence Update Detection for Temporal Summarization. In: , et al. Advances in Information Retrieval. ECIR 2017. Lecture Notes in Computer Science, vol 10193. Springer, Cham. https://doi.org/10.1007/978-3-319-56608-5_33

  • DOI: https://doi.org/10.1007/978-3-319-56608-5_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56607-8

  • Online ISBN: 978-3-319-56608-5

  • eBook Packages: Computer Science, Computer Science (R0)