Lifelog Scene Change Detection Using Cascades of Audio and Video Detectors

Mahkonen, Katariina; Kämäräinen, Joni-Kristian; Virtanen, Tuomas

doi:10.1007/978-3-319-16634-6_32

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9010)

Abstract

The advent of affordable wearable devices with a video camera has established a new form of social data, lifelogs, in which people's lives are captured on video. The enormous amount of lifelog data and the need for on-site processing demand new, fast video processing methods. In this work, we experimentally investigate seven hours of lifelogs and point out novel findings: (1) audio cues are exceptionally strong for lifelog processing; (2) cascades of audio and video detectors improve accuracy and enable fast (super frame rate) processing speed. We first construct strong detectors using state-of-the-art audio and visual features: Mel-frequency cepstral coefficients (MFCC), colour (RGB) histograms, and local patch descriptors (SIFT). In the second stage, we construct a cascade of the trained detectors and optimise the cascade parameters. Separating the detector and cascade optimisation stages simplifies training and results in a fast and accurate processing pipeline.
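
The abstract does not specify how the cascade makes its decisions, so the following is only a minimal sketch of the general idea of a detector cascade, not the authors' method. It assumes a cheap audio detector runs first and a costlier colour-histogram detector runs only when the audio score is inconclusive. The names `audio_score`, `colour_score` and `cascade_decide`, the L1 distance, the window dictionary layout, and all thresholds are hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is (score function, reject threshold, accept threshold).
Stage = Tuple[Callable[[dict], float], float, float]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def audio_score(window: dict) -> float:
    # Hypothetical cheap detector: distance between mean audio feature
    # vectors (for example MFCCs) before and after the candidate boundary.
    return l1_distance(window["audio_before"], window["audio_after"])


def colour_score(window: dict) -> float:
    # Hypothetical costlier detector: distance between normalised colour
    # histograms before and after the candidate boundary.
    return l1_distance(window["hist_before"], window["hist_after"])


def cascade_decide(window: dict, stages: List[Stage]) -> Tuple[bool, int]:
    """Return (is_scene_change, number_of_stages_evaluated).

    Assumes `stages` is non-empty and ordered from cheapest to costliest.
    """
    score = 0.0
    for used, (score_fn, reject_below, accept_above) in enumerate(stages, start=1):
        score = score_fn(window)
        if score < reject_below:
            return False, used  # confidently "no change": stop early
        if score > accept_above:
            return True, used  # confidently "change": stop early
    # No stage was confident: compare the last score with the midpoint
    # of the last stage's two thresholds (an arbitrary illustrative rule).
    _, reject_below, accept_above = stages[-1]
    return score > (reject_below + accept_above) / 2, len(stages)


# Illustrative thresholds only; in practice they would be tuned on data.
stages = [(audio_score, 0.5, 2.0), (colour_score, 0.2, 0.8)]

quiet = {"audio_before": [1.0, 2.0], "audio_after": [1.1, 2.1],
         "hist_before": [0.5, 0.5], "hist_after": [0.5, 0.5]}
ambiguous = {"audio_before": [1.0, 2.0], "audio_after": [1.5, 2.5],
             "hist_before": [0.9, 0.1], "hist_after": [0.1, 0.9]}

quiet_result = cascade_decide(quiet, stages)          # audio stage rejects
ambiguous_result = cascade_decide(ambiguous, stages)  # video stage decides
```

In this sketch the `quiet` window is rejected by the audio stage alone, so the colour histogram is never computed, while the `ambiguous` window falls through to the second stage. Skipping the costly stages for most windows is what lets a cascade run quickly.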

Author information

Authors and Affiliations

Department of Signal Processing, Tampere University of Technology, Tampere, Finland
Katariina Mahkonen, Joni-Kristian Kämäräinen & Tuomas Virtanen

Corresponding author

Correspondence to Joni-Kristian Kämäräinen.

Editor information

Editors and Affiliations

Center for Visual Information Technology, International Institute of Information Technology, Hyderabad, India
C. V. Jawahar
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Shiguang Shan

About this paper

Cite this paper

Mahkonen, K., Kämäräinen, JK., Virtanen, T. (2015). Lifelog Scene Change Detection Using Cascades of Audio and Video Detectors. In: Jawahar, C., Shan, S. (eds) Computer Vision - ACCV 2014 Workshops. ACCV 2014. Lecture Notes in Computer Science, vol 9010. Springer, Cham. https://doi.org/10.1007/978-3-319-16634-6_32

DOI: https://doi.org/10.1007/978-3-319-16634-6_32
Published: 12 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16633-9
Online ISBN: 978-3-319-16634-6
eBook Packages: Computer Science, Computer Science (R0)
