Constructing a Recipe Web from Historical Newspapers

  • Marieke van Erp
  • Melvin Wevers
  • Hugo Huurdeman
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11136)

Abstract

Historical newspapers provide a lens on the customs and habits of the past. For example, recipes published in newspapers show what and how people ate, and how they thought about food. The challenge is that newspaper data is often unstructured and highly varied. Digitised historical newspapers pose a further challenge: fluctuations in OCR quality. Together, these factors make it difficult to locate and extract recipes. We present an approach based on distant supervision and automatically extracted lexicons to identify recipes in digitised historical newspapers, to generate recipe tags, and to extract ingredient information. We report OCR quality indicators and their impact on the extraction process. We enrich the recipes with links to information on the ingredients. Our research shows how natural language processing, machine learning, and semantic web technologies can be combined to construct a rich dataset from heterogeneous newspapers for the historical analysis of food culture.

Keywords

Natural language processing; Information extraction; Food history; Digitised newspapers; Digital humanities

Notes

Acknowledgements

The authors thank the National Library of the Netherlands for making available the newspaper collection for research purposes as well as for organising the HackaLOD hackathon with Rijksmuseum and Netwerk Digitaal Erfgoed where this project got started. We thank Jesse de Does for the OCR quality measure, Marten Postma and Emiel van Miltenburg for querying Open Dutch WordNet, and Richard Zijdeman for fruitful discussions on the dataset concept. No Hawaiian pizzas were consumed during the writing of this paper.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Marieke van Erp (1)
  • Melvin Wevers (1)
  • Hugo Huurdeman (2)
  1. KNAW Humanities Cluster, DHLab, Amsterdam, The Netherlands
  2. Universiteit van Amsterdam, Amsterdam, The Netherlands
