Finding Person Relations in Image Data of News Collections in the Internet Archive

Müller-Budack, Eric; Pustu-Iren, Kader; Diering, Sebastian; Ewerth, Ralph

doi:10.1007/978-3-030-00066-0_20

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11057)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

Multimedia content on the World Wide Web is growing rapidly and contains valuable information for many applications in different domains. The Internet Archive initiative has gathered billions of time-versioned web pages since the mid-nineties. However, this huge amount of data is rarely labeled with appropriate metadata, so automatic approaches are required to enable semantic search. Normally, the textual content of the Internet Archive is used to extract entities and their possible relations across domains such as politics and entertainment, whereas image and video content is usually disregarded. In this paper, we introduce a system for person recognition in the image content of web news stored in the Internet Archive. The system thus complements entity recognition in text and allows researchers and analysts to track media coverage and relations of persons more precisely. Based on a deep learning face recognition approach, we propose a system that detects persons of interest and gathers sample material, which is subsequently used to identify them in the image data of the Internet Archive. We evaluate the performance of the face recognition system on an appropriate standard benchmark dataset and demonstrate the feasibility of the approach with two use cases.
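
The abstract does not spell out how sample material is matched against archived images or how relations are derived. The following is only a minimal sketch under stated assumptions: faces are assumed to be already encoded as feature vectors by some face recognition model, a face is assigned to a person when its cosine similarity to one of that person's sample vectors reaches a threshold, and a "relation" is approximated by counting how often two identified persons appear in the same image. The function names, the threshold of 0.8, and the co-occurrence measure are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(face_vector, exemplars, threshold=0.8):
    """Return the best-matching person name, or None if nobody is close enough.

    exemplars maps a person name to a list of sample feature vectors.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    best_name, best_score = None, threshold
    for name, samples in exemplars.items():
        if not samples:
            continue  # no sample material gathered for this person
        score = max(cosine_similarity(face_vector, s) for s in samples)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def count_cooccurrences(images, exemplars, threshold=0.8):
    """Count how often each pair of identified persons shares an image.

    images is a list of images, each given as a list of face feature vectors.
    """
    pair_counts = Counter()
    for faces in images:
        names = {identify(f, exemplars, threshold) for f in faces}
        names.discard(None)  # ignore faces that matched nobody
        for pair in combinations(sorted(names), 2):
            pair_counts[pair] += 1
    return pair_counts
```

In this sketch, each person keeps several sample vectors and the best-scoring one decides the match, which mirrors the idea of gathering sample material per person of interest; any real system would need a threshold tuned on benchmark data.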

Notes

1. Results can be found at: http://vis-www.cs.umass.edu/lfw/results.html.
2. The entity list can be found at: https://github.com/TIB-Visual-Analytics/PIIA.

Acknowledgement

This work is financially supported by the German Research Foundation (DFG: Deutsche Forschungsgemeinschaft, project number: EW 134/4-1). The work was partially funded by the European Commission for the ERC Advanced Grant ALEXANDRIA (No. 339233, Wolfgang Nejdl).

Author information

Authors and Affiliations

Leibniz Information Centre for Science and Technology (TIB), Hannover, Germany
Eric Müller-Budack, Kader Pustu-Iren, Sebastian Diering & Ralph Ewerth
L3S Research Center, Leibniz Universität Hannover, Hannover, Germany
Eric Müller-Budack & Ralph Ewerth

Corresponding author

Correspondence to Eric Müller-Budack.

Editor information

Editors and Affiliations

University Carlos III, Madrid, Spain
Eva Méndez
USI, Università della Svizzera Italiana, Lugano, Switzerland
Fabio Crestani
INESC TEC, Faculty of Engineering, University of Porto, Porto, Portugal
Cristina Ribeiro
INESC TEC, Faculty of Engineering, University of Porto, Porto, Portugal
Gabriel David
INESC TEC, Faculty of Engineering, University of Porto, Porto, Portugal
João Correia Lopes

Cite this paper

Müller-Budack, E., Pustu-Iren, K., Diering, S., Ewerth, R. (2018). Finding Person Relations in Image Data of News Collections in the Internet Archive. In: Méndez, E., Crestani, F., Ribeiro, C., David, G., Lopes, J. (eds) Digital Libraries for Open Knowledge. TPDL 2018. Lecture Notes in Computer Science, vol 11057. Springer, Cham. https://doi.org/10.1007/978-3-030-00066-0_20

DOI: https://doi.org/10.1007/978-3-030-00066-0_20
Published: 05 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00065-3
Online ISBN: 978-3-030-00066-0
eBook Packages: Computer Science, Computer Science (R0)
