Multimedia Tools and Applications, Volume 63, Issue 2, pp 287–329

CONTENTUS—technologies for next generation multimedia libraries

Automatic multimedia processing for semantic search
  • Jan Nandzik
  • Berenike Litz
  • Nicolas Flores-Herr
  • Aenne Löhden
  • Iuliu Konya
  • Doris Baum
  • André Bergholz
  • Dirk Schönfuß
  • Christian Fey
  • Johannes Osterhoff
  • Jörg Waitelonis
  • Harald Sack
  • Ralf Köhler
  • Patrick Ndjiki-Nya

Abstract

An ever-growing amount of digitized content urges libraries and archives to integrate new media types from a large number of origins, such as publishers, record labels and film archives, into their existing collections. This is a challenging task, since both the multimedia content itself and the associated metadata are inherently heterogeneous: the different sources lead to different data structures, data quality and trustworthiness. This paper presents the CONTENTUS approach towards an automated media processing chain for cultural heritage organizations and content holders. Our workflow allows for unattended processing from media ingest to availability through our search and retrieval interface. We aim to provide a set of tools for processing digitized print media, audio/visual media, and speech and music recordings. Media-specific functionalities include quality control for the digitization of still image and audio/visual media, and restoration of the most common quality issues encountered with these media. Furthermore, the CONTENTUS tools include content analysis modules such as segmentation of printed, audio and audio/visual media, optical character recognition (OCR), speech-to-text transcription, speaker recognition and the extraction of musical features from audio recordings, all aimed at a textual representation of the information inherent in the media assets. Once the information has been extracted and transcribed in textual form, media-independent processing modules offer extraction and disambiguation of named entities as well as text classification. All CONTENTUS modules are designed to be flexibly recombined within a scalable workflow environment using cloud computing techniques. In the next step, analyzed media assets can be retrieved and consumed through a search interface using all available metadata. 
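The idea of recombinable modules can be illustrated with a minimal sketch, assuming each module is a plain function that enriches a shared asset record and that a processing chain is simply an ordered list of modules chosen per media type. The names `MediaAsset`, `run_chain`, `CHAINS` and `GAZETTEER` are hypothetical; the transcription step is a stub standing in for OCR or speech-to-text, and the entity step is a naive dictionary lookup rather than real named entity extraction and disambiguation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MediaAsset:
    """Hypothetical record passed from module to module."""
    asset_id: str
    media_type: str                      # e.g. "print" or "speech"
    text: str = ""                       # textual representation produced by analysis
    entities: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)  # names of applied modules, in order


Module = Callable[[MediaAsset], MediaAsset]


def quality_control(asset: MediaAsset) -> MediaAsset:
    # Placeholder: a real module would assess digitization quality here.
    asset.log.append("quality_control")
    return asset


def transcribe_stub(asset: MediaAsset) -> MediaAsset:
    # Stand-in for OCR or speech-to-text: keeps existing text, otherwise fakes a transcript.
    if not asset.text:
        asset.text = f"<transcript of {asset.asset_id}>"
    asset.log.append("transcribe")
    return asset


# Toy gazetteer (hypothetical); a real system would use trained NER and disambiguation.
GAZETTEER = {"Berlin": "location", "Goethe": "person"}


def extract_entities(asset: MediaAsset) -> MediaAsset:
    # Media-independent step: operates only on the textual representation.
    tokens = (tok.strip(".,;:") for tok in asset.text.split())
    asset.entities = [tok for tok in tokens if tok in GAZETTEER]
    asset.log.append("extract_entities")
    return asset


# Media-specific front ends share the same media-independent back end.
CHAINS: Dict[str, List[Module]] = {
    "print": [quality_control, transcribe_stub, extract_entities],
    "speech": [transcribe_stub, extract_entities],
}


def run_chain(asset: MediaAsset) -> MediaAsset:
    # Apply the modules configured for this media type, in order.
    for module in CHAINS[asset.media_type]:
        asset = module(asset)
    return asset


if __name__ == "__main__":
    doc = run_chain(MediaAsset("doc-1", "print", text="Goethe lived in Weimar, not Berlin."))
    print(doc.entities, doc.log)
```

Swapping the media-specific front end (for example, a speech transcriber instead of quality control plus OCR) leaves the media-independent back end untouched, which is what routing everything through a textual representation buys.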
The search engine combines Semantic Web technologies, which represent relations between the media and entities such as persons, locations and organizations, with a full-text approach for searching within the information transcribed in the preceding processing steps. The CONTENTUS unified search interface integrates text, images, audio and audio/visual content. Queries can be narrowed and expanded in an exploratory manner, and search results can be refined by disambiguating entities and topics. Furthermore, semantic relationships not only become apparent but can also be navigated.
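The combination of entity relations and full-text search can be sketched minimally, assuming an in-memory inverted index for the full-text part and a set of (subject, predicate, object) triples as a simplified stand-in for the Semantic Web layer. The documents, entity identifiers such as `person:Willy_Brandt`, and the functions `full_text`, `refine`, `neighbours` and `expand` are hypothetical illustrations, not the CONTENTUS search engine or its data model.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical transcribed documents (e.g. OCR or speech-to-text output).
DOCS: Dict[str, str] = {
    "d1": "Speech by Willy Brandt in Berlin",
    "d2": "Radio report on the Berlin Wall",
    "d3": "Interview about Bonn politics",
}

# Hypothetical entity annotations produced by named entity extraction.
ANNOTATIONS: Dict[str, Set[str]] = {
    "d1": {"person:Willy_Brandt", "place:Berlin"},
    "d2": {"place:Berlin"},
    "d3": {"place:Bonn"},
}

# Relations between entities as (subject, predicate, object) triples,
# a simplified stand-in for an RDF graph.
TRIPLES = {
    ("person:Willy_Brandt", "bornIn", "place:Lübeck"),
    ("person:Willy_Brandt", "mayorOf", "place:Berlin"),
}


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: lower-cased term -> ids of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


INDEX = build_index(DOCS)


def full_text(query: str) -> Set[str]:
    """Conjunctive full-text match: documents containing every query term."""
    postings = [INDEX.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


def refine(hits: Set[str], entity: str) -> Set[str]:
    """Narrow results to documents annotated with a disambiguated entity."""
    return {doc for doc in hits if entity in ANNOTATIONS.get(doc, set())}


def neighbours(entity: str) -> Set[str]:
    """Entities linked to `entity` by any relation, in either direction."""
    outgoing = {o for s, _, o in TRIPLES if s == entity}
    incoming = {s for s, _, o in TRIPLES if o == entity}
    return outgoing | incoming


def expand(entity: str) -> Set[str]:
    """Exploratory expansion: documents about the entity or any neighbour."""
    targets = {entity} | neighbours(entity)
    return {doc for doc, ents in ANNOTATIONS.items() if ents & targets}
```

Here `refine` corresponds to narrowing results by a disambiguated entity, while `expand` follows entity relations to widen the result set in an exploratory manner. A production system would use an RDF store and a dedicated full-text engine instead of Python dictionaries.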

Keywords

Automatic information extraction; Semantic Web; Workflow automation; Multimedia search and retrieval; Digital libraries; Content analysis

Notes

Acknowledgements

We kindly thank Andreas Hess for initial input, Jan Hannemann for coordinating and supporting the review process, and Klaus Bossert for reviewing the paper. The project CONTENTUS was funded by the German Federal Ministry of Economy and Technology under the funding reference “01MQ07003”.

Our project partners are:

•    German National Library (Project lead, content provision and semantic technologies),

•    Deutsche Thomson OHG (Processing of audio/visual media),

•    Fraunhofer Institute for Intelligent Analysis and Information Systems (Text and speech processing, workflow engine),

•    Fraunhofer Heinrich Hertz Institute (Audio/visual and still image processing),

•    Hasso Plattner Institut (User interface, personalization; since 2009),

•    Institut für Rundfunktechnik GmbH (Requirements specification, content provision),

•    Moresophy GmbH (User interface and semantic technologies; until 2009), and

•    Mufin GmbH (Audio processing and classification).

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Jan Nandzik (1, corresponding author)
  • Berenike Litz (2)
  • Nicolas Flores-Herr (1)
  • Aenne Löhden (2)
  • Iuliu Konya (3)
  • Doris Baum (3)
  • André Bergholz (3)
  • Dirk Schönfuß (4)
  • Christian Fey (5)
  • Johannes Osterhoff (6)
  • Jörg Waitelonis (6)
  • Harald Sack (6)
  • Ralf Köhler (7)
  • Patrick Ndjiki-Nya (8)

  1. Acosta Consult GmbH, Frankfurt am Main, Germany
  2. Deutsche Nationalbibliothek, Informationstechnik, Frankfurt am Main, Germany
  3. Fraunhofer IAIS, Sankt Augustin, Germany
  4. mufin GmbH, Büro Dresden, Dresden, Germany
  5. Institut für Rundfunktechnik GmbH, Production Systems TV, München, Germany
  6. Hasso-Plattner-Institut für Softwaresystemtechnik GmbH, Potsdam, Germany
  7. Technicolor, Corporate Research Division, Hanover Image Processing Lab, Deutsche Thomson OHG, Hannover, Germany
  8. Fraunhofer-Institut für Nachrichtentechnik Heinrich-Hertz-Institut, Berlin, Germany