Abstract
We present a learning framework for fusion-based video retrieval systems that explicitly optimizes given performance metrics. Real-world computer vision systems serve sophisticated user needs, and domain-specific performance metrics are used to monitor their success. However, the conventional approach to learning under such circumstances is to blindly minimize standard error rates and hope that the targeted performance metrics improve, which is clearly suboptimal. In this work, we develop a novel scheme that directly optimizes such targeted performance metrics during learning. Our experimental results on two large consumer video archives are promising and showcase the benefits of the proposed approach.
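To make the idea of optimizing a metric directly more concrete, the following is a minimal sketch and not the authors' formulation. It assumes late fusion by a weighted sum of detector scores, picks F1 as the target metric, and replaces the hard retrieval decision with a sigmoid so that the metric becomes differentiable. All names (`soft_f1`, `fit_fusion`, `alpha`, `lr`) and the choice of plain gradient ascent are illustrative assumptions.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fuse(w, b, x):
    # Late fusion (assumed): weighted sum of per-detector scores plus a bias.
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def soft_f1(w, b, scores, labels, alpha=5.0):
    # Smoothed F1: each video counts as "retrieved" to degree p in (0, 1).
    # Uses 2*TP + FP + FN = sum(p) + sum(labels) for labels in {0, 1}.
    p = [sigmoid(alpha * fuse(w, b, x)) for x in scores]
    tp = sum(pi * yi for pi, yi in zip(p, labels))
    denom = sum(p) + sum(labels) + 1e-12
    return 2.0 * tp / denom

def soft_f1_grad(w, b, scores, labels, alpha=5.0):
    # Analytic gradient of soft_f1 with respect to the weights and the bias.
    p = [sigmoid(alpha * fuse(w, b, x)) for x in scores]
    tp = sum(pi * yi for pi, yi in zip(p, labels))
    denom = sum(p) + sum(labels) + 1e-12
    gw, gb = [0.0] * len(w), 0.0
    for x, pi, yi in zip(scores, p, labels):
        d_metric = (2.0 * yi * denom - 2.0 * tp) / denom ** 2
        dz = d_metric * alpha * pi * (1.0 - pi)  # chain rule through sigmoid
        for j, xj in enumerate(x):
            gw[j] += dz * xj
        gb += dz
    return gw, gb

def fit_fusion(scores, labels, steps=300, lr=0.1, alpha=5.0):
    # Gradient ascent on the smoothed metric; returns the best iterate seen.
    k = len(scores[0])
    w, b = [1.0 / k] * k, -0.5
    best = (soft_f1(w, b, scores, labels, alpha), list(w), b)
    for _ in range(steps):
        gw, gb = soft_f1_grad(w, b, scores, labels, alpha)
        w = [wj + lr * gj for wj, gj in zip(w, gw)]
        b += lr * gb
        f = soft_f1(w, b, scores, labels, alpha)
        if f > best[0]:
            best = (f, list(w), b)
    return best[1], best[2]
```

Calling `fit_fusion(scores, labels)` with one list of detector scores per video and labels in {0, 1} returns fusion weights and a bias. A larger `alpha` makes the surrogate track the true F1 more closely, at the cost of a harder optimization problem.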
Additional information
This work was supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20069. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/NBC, or the U.S. Government.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, I., Oh, S., Byun, B., Perera, A.G.A., Lee, CH. (2012). Explicit Performance Metric Optimization for Fusion-Based Video Retrieval. In: Fusiello, A., Murino, V., Cucchiara, R. (eds) Computer Vision – ECCV 2012. Workshops and Demonstrations. ECCV 2012. Lecture Notes in Computer Science, vol 7585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33885-4_40
DOI: https://doi.org/10.1007/978-3-642-33885-4_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33884-7
Online ISBN: 978-3-642-33885-4
eBook Packages: Computer Science, Computer Science (R0)