Multimedia Tools and Applications, Volume 75, Issue 15, pp 9073–9094

A scalable summary generation method based on cross-modal consensus clustering and OLAP cube modeling

  • Gabriel Sargent
  • Karina R. Perez-Daniel
  • Andrei Stoian
  • Jenny Benois-Pineau
  • Sofian Maabout
  • Henri Nicolas
  • Mariko Nakano Miyatake
  • Jean Carrive


Video summarization is a core problem in managing the growing amount of content in multimedia databases. An efficient video summary should display an overview of the video content, and most existing approaches fulfill this goal. However, such an overview does not let the user reach all details of interest selectively and progressively. This paper proposes a novel scalable summary generation approach based on the On-Line Analytical Processing (OLAP) data cube. Such a structure integrates tools like the drill-down operation, which allows multiple descriptions of a dataset to be browsed efficiently at increasing levels of detail. We adapt this model to video summary generation by expressing a video within a cross-media feature space and by performing clusterings according to particular subspaces. Consensus clustering is used to guide the subspace selection strategy at small dimensions, as the novelty brought by the least consensual subspaces is of interest for refining a summary. Our approach is designed for weakly structured content such as cultural documentaries. We evaluate it on a corpus of cultural archives provided by the French Audiovisual National Institute (INA), using information retrieval metrics that handle single and multiple reference annotations. Overall, the approach improves on two baseline systems performing random and arbitrary segmentations, showing a better balance between Precision and Recall.
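The idea of ranking feature subspaces by how much their clusterings agree can be illustrated with a minimal sketch. It is not the method of the paper, whose details are not given here. It assumes each subspace has already produced a partition of the same video segments, and it uses the Rand index as a stand-in agreement measure. The subspace names (`color`, `motion`, `audio`), the label lists, and the helper `rank_by_consensus` are all hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two
    items in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def rank_by_consensus(partitions):
    """Sort subspaces from least to most consensual (hypothetical helper).

    partitions maps a subspace name to its list of cluster labels.
    A subspace's score is its mean Rand index against all the others.
    """
    scores = {}
    for name, labels in partitions.items():
        others = [
            rand_index(labels, other_labels)
            for other_name, other_labels in partitions.items()
            if other_name != name
        ]
        scores[name] = sum(others) / len(others) if others else 1.0
    return sorted(scores, key=scores.get), scores


# Hypothetical clusterings of six video segments in three subspaces.
partitions = {
    "color": [0, 0, 0, 1, 1, 1],
    "motion": [0, 0, 0, 1, 1, 1],
    "audio": [0, 1, 0, 1, 0, 1],
}
order, scores = rank_by_consensus(partitions)
print(order[0])  # the least consensual subspace under this toy measure
```

In this toy setting, the subspace at the head of `order` disagrees most with the others, so it would be the first candidate for refining a summary during a drill-down step. Any other partition agreement measure could be substituted for the Rand index.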


Video summarization, Scalability, Cross-media space, Consensus clustering, Data cube, Drill down



This work is supported by the French National Research Agency grant ANR-11-IS02-001 within the joint French-Mexican project Mex-Culture. We are grateful to the Institut National de l’Audiovisuel (INA, France) for providing us with the video content we used to set up the evaluation. The authors thank Michel Crucianu and Marin Ferecatu for valuable discussions and master's student Elie Génard for his efficient help in conducting computational experiments.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Gabriel Sargent (1)
  • Karina R. Perez-Daniel (2)
  • Andrei Stoian (1)
  • Jenny Benois-Pineau (3)
  • Sofian Maabout (3)
  • Henri Nicolas (3)
  • Mariko Nakano Miyatake (2)
  • Jean Carrive (4)

  1. Vertigo-CEDRIC, CNAM, Paris Cedex 03, France
  2. SEPI, ESIME Culhuacan National Polytechnic Institute IPN, Unidad Profesional Adolfo López Mateos, Mexico City, Mexico
  3. LaBRI, University of Bordeaux, Talence, France
  4. Institut National de l’Audiovisuel - INA Expert, Bry-sur-Marne, France
