The Past Web, pp 225–242

Automatic Generation of Timelines for Past-Web Events

Abstract

Despite significant advances in web archive infrastructures, the problem of exploring the historical heritage preserved by web archives is yet to be solved. Timeline generation emerges in this context as one possible solution for automatically producing summaries of news over time. Such summaries give users a better sense of how reported events, entities, stories or topics evolve, for example by summarising the most important news about a politician, an organisation or a locality. Web archives play an important role here by providing access to a historical set of preserved information. This characteristic makes them an irreplaceable infrastructure and a valuable source of knowledge for timeline generation. Accordingly, the authors of this chapter developed “Tell me Stories” (http://archive.tellmestories.pt), a news summarisation system built on top of the infrastructure of Arquivo.pt, the Portuguese web archive, to automatically generate a timeline summary of a given topic. In this chapter, we begin by providing a brief overview of the most relevant research conducted on the automatic generation of timelines for past-web events. Next, we describe the architecture and some use cases for “Tell me Stories”. Our system demonstrates how web archives can be used as infrastructures to develop innovative services. We conclude this chapter by enumerating open challenges in this field and possible future directions in the general area of temporal summarisation in web archives.

Acknowledgements

Arian Pasquali and Vítor Mangaravite were financed by National Funds through the Portuguese funding agency, FCT (Fundação para a Ciência e a Tecnologia), within project UIDB/50014/2020. Ricardo Campos and Alípio Jorge were financed by the ERDF (European Regional Development Fund) through the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020, and by National Funds through the Portuguese funding agency, FCT (Fundação para a Ciência e a Tecnologia) within project PTDC/CCI-COM/31857/2017 (NORTE-01-0145-FEDER-03185). This funding fits under the research line of the Text2Story project.

Corresponding author

Correspondence to Ricardo Campos.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Campos, R., Pasquali, A., Jatowt, A., Mangaravite, V., Jorge, A.M. (2021). Automatic Generation of Timelines for Past-Web Events. In: Gomes, D., Demidova, E., Winters, J., Risse, T. (eds) The Past Web. Springer, Cham. https://doi.org/10.1007/978-3-030-63291-5_18

  • DOI: https://doi.org/10.1007/978-3-030-63291-5_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63290-8

  • Online ISBN: 978-3-030-63291-5

  • eBook Packages: Computer Science, Computer Science (R0)