
Content-based video retrieval in historical collections of the German Broadcasting Archive


Abstract

The German Broadcasting Archive maintains the cultural heritage of radio and television broadcasts of the former German Democratic Republic (GDR). The uniqueness and importance of this video material attract considerable scientific interest in its content. In this paper, we present a system for automatic video content analysis and retrieval that facilitates search in historical collections of GDR television recordings. It relies on a distributed, service-oriented architecture and includes video analysis algorithms for shot boundary detection, concept classification, person recognition, text recognition, and similarity search. The combination of these search modalities allows users to obtain answers to a wide range of queries and yields satisfactory results in a short time. The performance of the system is evaluated on 2500 h of GDR television recordings.
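To illustrate the similarity-search modality mentioned above, the following is a minimal, hypothetical sketch of query-by-example retrieval over per-keyframe feature vectors. The function names (build_index, query), the 128-dimensional placeholder features, and the brute-force cosine search in NumPy are assumptions made for illustration only; they do not reflect the authors' actual implementation or architecture.

```python
# Minimal sketch (not the authors' implementation): query-by-example
# similarity search over keyframe feature vectors. Feature extraction
# (e.g. from a CNN) is assumed to happen elsewhere; random vectors
# stand in as placeholders here.
import numpy as np


def build_index(features: np.ndarray) -> np.ndarray:
    """L2-normalize keyframe feature vectors so a dot product equals cosine similarity."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.clip(norms, 1e-12, None)


def query(index: np.ndarray, query_vec: np.ndarray, top_k: int = 10):
    """Return indices and scores of the top_k keyframes most similar to query_vec."""
    q = query_vec / max(np.linalg.norm(query_vec), 1e-12)
    scores = index @ q
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keyframe_features = rng.normal(size=(1000, 128))    # placeholder features
    index = build_index(keyframe_features)
    hits, scores = query(index, keyframe_features[42])  # query by example
    print(hits[:5], scores[:5])
```

In a production setting, the exhaustive dot product would typically be replaced by an approximate nearest-neighbor index or binary hash codes to keep response times short on archive-scale collections.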




Acknowledgements

This work is financially supported by the German Research Foundation (DFG; Funding Programme: “Förderung herausragender Forschungsbibliotheken”; Project: “Bild- und Szenenrecherche in historischen Beständen des DDR-Fernsehens im Deutschen Rundfunkarchiv durch automatische inhaltsbasierte Videoanalyse”; CR 456/1-1, EW 134/1-1, FR 791/12-1).

Author information

Correspondence to Markus Mühling.


About this article

Cite this article

Mühling, M., Meister, M., Korfhage, N. et al. Content-based video retrieval in historical collections of the German Broadcasting Archive. Int J Digit Libr 20, 167–183 (2019). https://doi.org/10.1007/s00799-018-0236-z

