Salient Montages from Unconstrained Videos

  • Min Sun
  • Ali Farhadi
  • Ben Taskar
  • Steve Seitz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8695)


We present a novel method for generating salient montages from unconstrained videos by finding “montageable moments” and identifying the salient people and actions to depict in each montage. Our method addresses the need for concise visualizations of the increasingly large number of videos captured with portable devices. Our main contributions are (1) the process of finding salient people and moments to form a montage, and (2) the application of this method to videos taken “in the wild,” where the camera moves freely. Accordingly, we demonstrate results on videos from head-mounted cameras, where the camera moves constantly, as well as on videos downloaded from YouTube. Our approach can operate on videos of any length; some will contain many montageable moments, while others may have none. We demonstrate that a novel “montageability” score can be used to retrieve results with relatively high precision, which allows us to present high-quality montages to users.


Keywords: video summarization; video saliency detection



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Min Sun (1)
  • Ali Farhadi (1)
  • Ben Taskar (1)
  • Steve Seitz (1)
  1. University of Washington, Seattle, USA
