Exploding TV Sets and Disappointing Laptops: Suggesting Interesting Content in News Archives Based on Surprise Estimation

  • Conference paper
Advances in Information Retrieval (ECIR 2021)

Abstract

Many archival collections have recently been digitized and made available to the wider public. The documents they contain, however, tend to have limited appeal for ordinary users, since their content may appear obsolete and uninteresting. Archival document collections can become more attractive if suitable content is recommended to users. This research proposes a new research direction, Archival Content Suggestion, which aims to discover interesting content in long-term document archives that preserve information on a society's history and heritage. To realize this objective, we propose two unsupervised approaches for automatically discovering interesting sentences in news article archives. Our methods detect interesting content by comparing information written in the past with information created in the present, making use of a surprise effect. Experiments on the New York Times corpus show that our approaches effectively retrieve interesting content.
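The past-versus-present comparison can be illustrated with a toy sketch. This is not the authors' method, which the abstract does not specify in detail. It swaps in a simple smoothed unigram log-ratio as a stand-in for surprise, and all function names, parameters, and sentences below are hypothetical.

```python
import math
from collections import Counter


def unigram_probs(sentences, vocab, alpha=1.0):
    """Laplace-smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def surprise(sentence, past_probs, present_probs):
    """Mean log-ratio of past to present word probability.

    Higher values mean the wording was ordinary in the past collection
    but is unusual in the present one (an assumed proxy for surprise).
    """
    words = [w for w in sentence.lower().split() if w in past_probs]
    if not words:
        return 0.0
    ratios = (math.log(past_probs[w] / present_probs[w]) for w in words)
    return sum(ratios) / len(words)


def rank_by_surprise(past, present, n=5):
    """Return the n past sentences with the highest toy surprise score."""
    vocab = {w for s in past + present for w in s.lower().split()}
    past_probs = unigram_probs(past, vocab)
    present_probs = unigram_probs(present, vocab)
    return sorted(
        past,
        key=lambda s: surprise(s, past_probs, present_probs),
        reverse=True,
    )[:n]


# Hypothetical toy collections, not taken from the New York Times corpus.
past_sentences = ["tv sets explode in homes", "the mayor opened a new park"]
present_sentences = ["tv sets stream films in homes", "the mayor opened a new park"]
top = rank_by_surprise(past_sentences, present_sentences, n=1)
```

In this toy setting, the sentence about exploding TV sets ranks first because "explode" appears in the past collection but not in the present one. A realistic system would need far larger collections and a richer model than unigram counts.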

Notes

  1. https://chroniclingamerica.loc.gov/.

  2. https://books.google.com/.

  3. https://archive.org/.

  4. For example: https://allthatsinteresting.com/interesting-history-facts, https://www.thefactsite.com/100-history-facts/, https://parade.com/1099930/marynliles/history-facts/.

  5. One could imagine a service that automatically detects interesting sentences or headlines for broad topics and publishes them daily on web portals of underlying document archives.

  6. https://www.wikipedia.org/.

  7. http://trec.nist.gov/.

  8. https://lucene.apache.org/solr/.

  9. https://www.nltk.org/.

  10. We also experimented with embedding models, but they did not perform better.

  11. https://www.figure-eight.com/.

  12. We set n=5 as the number of top sentences returned for each top-ranked topic in the Topic-based MRRW method, and for each top-ranked topic pair in the Topic Pair-based MRRW and Topic co-occurrence methods.

  13. Anecdotally, this particular example triggered childhood memories for one of the authors. His grandparents owned a USSR-produced TV set and often warned him not to sit close to it when he visited their home. Only now could he understand that his relatives' fears were not without substance. On a more general note, exploring news archives offers opportunities to learn about history and might sometimes even lead to serendipitous discoveries and recollections, as this example demonstrates.


Acknowledgments

This work has been partially funded by a MEXT JSPS Grant-in-Aid. Ricardo Campos, one of the authors of this paper, was financed by the ERDF – European Regional Development Fund through the North Portugal Regional Operational Programme (NORTE 2020), under PORTUGAL 2020, and by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project PTDC/CCI-COM/31857/2017 (NORTE-01-0145-FEDER-03185). This funding falls under the research line of the Text2Story project. The first author was employed by Kyoto University when the first version of this paper was created.

Author information

Corresponding authors

Correspondence to Adam Jatowt, Ricardo Campos, or Masatoshi Yoshikawa.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Jatowt, A., Hung, IC., Färber, M., Campos, R., Yoshikawa, M. (2021). Exploding TV Sets and Disappointing Laptops: Suggesting Interesting Content in News Archives Based on Surprise Estimation. In: Hiemstra, D., Moens, MF., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2021. Lecture Notes in Computer Science, vol 12656. Springer, Cham. https://doi.org/10.1007/978-3-030-72113-8_17

  • DOI: https://doi.org/10.1007/978-3-030-72113-8_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72112-1

  • Online ISBN: 978-3-030-72113-8

  • eBook Packages: Computer Science, Computer Science (R0)
