Abstract
Despite significant advances in web archive infrastructures, exploring the historical heritage preserved by web archives remains an unsolved problem. Timeline generation emerges in this context as one possible solution for automatically producing summaries of news over time. With such summaries, users can gain a better sense of reported news events, entities, stories or topics over time, for example by getting a summary of the most important news about a politician, an organisation or a locality. Web archives play an important role here by providing access to a historical set of preserved information. This characteristic makes them an irreplaceable infrastructure and a valuable source of knowledge for the process of timeline generation. Accordingly, the authors of this chapter developed “Tell me Stories” (http://archive.tellmestories.pt), a news summarisation system built on top of the infrastructure of Arquivo.pt, the Portuguese web archive, to automatically generate a timeline summary of a given topic. In this chapter, we begin by providing a brief overview of the most relevant research conducted on the automatic generation of timelines for past-web events. Next, we describe the architecture and some use cases for “Tell me Stories”. Our system demonstrates how web archives can be used as infrastructures to develop innovative services. We conclude this chapter by enumerating open challenges in this field and possible future directions in the general area of temporal summarisation in web archives.
Acknowledgements
Arian Pasquali and Vítor Mangaravite were financed by National Funds through the Portuguese funding agency, FCT (Fundação para a Ciência e a Tecnologia), within project UIDB/50014/2020. Ricardo Campos and Alípio Jorge were financed by the ERDF (European Regional Development Fund) through the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020, and by National Funds through the Portuguese funding agency, FCT (Fundação para a Ciência e a Tecnologia) within project PTDC/CCI-COM/31857/2017 (NORTE-01-0145-FEDER-03185). This funding fits under the research line of the Text2Story project.
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Campos, R., Pasquali, A., Jatowt, A., Mangaravite, V., Jorge, A.M. (2021). Automatic Generation of Timelines for Past-Web Events. In: Gomes, D., Demidova, E., Winters, J., Risse, T. (eds) The Past Web. Springer, Cham. https://doi.org/10.1007/978-3-030-63291-5_18
DOI: https://doi.org/10.1007/978-3-030-63291-5_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63290-8
Online ISBN: 978-3-030-63291-5
eBook Packages: Computer Science, Computer Science (R0)