
Lifelog Scene Change Detection Using Cascades of Audio and Video Detectors

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9010)

Abstract

The advent of affordable wearable devices with video cameras has established a new form of social data, lifelogs, in which people's lives are captured on video. The enormous amount of lifelog data and the need for on-site processing demand new, fast video processing methods. In this work, we experimentally investigate seven hours of lifelogs and report two novel findings: (1) audio cues are exceptionally strong for lifelog processing; (2) cascades of audio and video detectors improve accuracy and enable processing faster than the video frame rate (super frame rate). We first construct strong detectors using state-of-the-art audio and visual features: Mel-frequency cepstral coefficients (MFCC), colour (RGB) histograms, and local patch descriptors (SIFT). In the second stage, we construct a cascade of the trained detectors and optimise the cascade parameters. Separating the detector training and cascade optimisation stages simplifies training and results in a fast and accurate processing pipeline.
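The cascade idea in the abstract can be illustrated with a minimal sketch. It assumes that per-window audio feature vectors are already computed (for example, averaged MFCCs; MFCC extraction is omitted), uses a joint RGB histogram as the video cue, and scores a possible scene change as the L1 distance between consecutive windows. Running the audio detector first, the early-exit thresholds (`audio_reject`, `audio_accept`, `video_threshold`), all helper names, and the grid-search tuning are hypothetical illustrations, not the authors' detectors or their cascade optimisation procedure; SIFT-based features are not shown.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]


def rgb_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Joint RGB colour histogram with `bins` levels per channel, L1-normalised."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        qr, qg, qb = (min(c // step, bins - 1) for c in (r, g, b))
        hist[(qr * bins + qg) * bins + qb] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Change score between two windows: larger means more dissimilar."""
    return sum(abs(x - y) for x, y in zip(a, b))


@dataclass(frozen=True)
class CascadeParams:
    audio_reject: float     # hypothetical: audio score below this -> stop early, "no change"
    audio_accept: float     # hypothetical: audio score above this -> stop early, "change"
    video_threshold: float  # hypothetical: threshold for the video stage (ambiguous windows only)


def cascade_decision(audio_score: float, video_score: Callable[[], float],
                     p: CascadeParams) -> Tuple[bool, bool]:
    """Return (is_change, video_stage_was_evaluated)."""
    if audio_score < p.audio_reject:
        return False, False
    if audio_score > p.audio_accept:
        return True, False
    return video_score() > p.video_threshold, True


def detect_changes(audio_feats: Sequence[Sequence[float]],
                   frames: Sequence[Sequence[Pixel]],
                   p: CascadeParams) -> Tuple[List[int], int]:
    """Flag window t (t >= 1) as a scene change relative to window t - 1."""
    changes: List[int] = []
    video_calls = 0
    for t in range(1, len(audio_feats)):
        audio_score = l1_distance(audio_feats[t - 1], audio_feats[t])

        def video_score() -> float:
            # Histograms are computed lazily, only when the cascade needs them.
            return l1_distance(rgb_histogram(frames[t - 1]), rgb_histogram(frames[t]))

        is_change, used_video = cascade_decision(audio_score, video_score, p)
        video_calls += int(used_video)
        if is_change:
            changes.append(t)
    return changes, video_calls


def f1_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """F1 with exact window-index matching (no tolerance window)."""
    tp = len(set(pred) & set(truth))
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(truth)
    return 2 * precision * recall / (precision + recall)


def tune(audio_feats, frames, truth: Sequence[int],
         grid: Sequence[float]) -> Optional[CascadeParams]:
    """Hypothetical grid search: maximise F1, break ties by fewer video evaluations."""
    best_key, best_p = None, None
    for lo, hi, vt in product(grid, repeat=3):
        if lo > hi:
            continue
        p = CascadeParams(lo, hi, vt)
        pred, calls = detect_changes(audio_feats, frames, p)
        key = (f1_score(pred, truth), -calls)
        if best_key is None or key > best_key:
            best_key, best_p = key, p
    return best_p


if __name__ == "__main__":
    red, blue = [(255, 0, 0)] * 4, [(0, 0, 255)] * 4
    audio = [[0.0], [0.5], [1.0], [3.0]]  # toy stand-ins for per-window MFCC vectors
    frames = [red, red, blue, blue]
    params = CascadeParams(audio_reject=0.2, audio_accept=1.5, video_threshold=1.0)
    print(detect_changes(audio, frames, params))  # → ([2, 3], 2)
```

In the toy run, windows 1 and 2 are ambiguous on audio and are settled by the video stage (rejected and accepted, respectively), while window 3 is accepted on audio alone, so only two of the three windows pay for histogram computation. Keeping detector scoring separate from the threshold search mirrors, in spirit, the two-stage training described in the abstract.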



Author information

Corresponding author: Joni-Kristian Kämäräinen.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Mahkonen, K., Kämäräinen, JK., Virtanen, T. (2015). Lifelog Scene Change Detection Using Cascades of Audio and Video Detectors. In: Jawahar, C., Shan, S. (eds) Computer Vision - ACCV 2014 Workshops. ACCV 2014. Lecture Notes in Computer Science, vol 9010. Springer, Cham. https://doi.org/10.1007/978-3-319-16634-6_32


  • DOI: https://doi.org/10.1007/978-3-319-16634-6_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16633-9

  • Online ISBN: 978-3-319-16634-6

  • eBook Packages: Computer Science, Computer Science (R0)
