Multimedia Systems, Volume 17, Issue 5, pp 421–433

Personalized video similarity measure

Interactive Multimedia Computing

Abstract

As an effective technique for managing and exploring large-scale video collections, personalized video search has received considerable attention in recent years. A key problem in developing such techniques is how to design and evaluate similarity measures. Most existing approaches simply adopt the traditional Euclidean distance or one of its variants. Consequently, they generally suffer from two main disadvantages: (1) low effectiveness, meaning poor retrieval accuracy, one of the main reasons being that very little research has addressed effective fusion schemes for integrating the multimodal information (e.g., text, audio and visual) in video sequences; and (2) poor scalability, because the development of video similarity metrics is largely disconnected from that of the relevant database access methods (indexing structures). This article reports a new distance metric, called personalized video distance, that effectively fuses information about individual preferences and multimodal properties into a compact signature. In addition, a novel hashing-based indexing structure has been designed to speed up retrieval and improve scalability. A comprehensive set of empirical studies has been carried out on two large video test collections with carefully designed queries of varying complexity. We observe significant improvements over existing techniques in several respects.
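The abstract does not give the formula for the personalized video distance or the hash family used by the index, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes each video carries one feature vector per modality (the modality names `text`, `audio`, `visual` and the helpers `signature`, `distance` and `PStableLSH` are hypothetical), folds a user's per-modality preference weights into a single compact signature, and indexes the signatures with standard p-stable (Gaussian) locality-sensitive hashing for Euclidean distance, a well-known scheme swapped in here for the paper's own indexing structure.

```python
import math
import random
from collections import defaultdict

# Assumption: three modality blocks with hypothetical names; the paper's
# actual features and fusion weights are not given in the abstract.
MODALITIES = ("text", "audio", "visual")


def signature(video, prefs):
    """Fuse per-modality vectors into one compact signature.

    Block m is scaled by sqrt(w_m), where w_m is the user's preference
    weight for modality m normalized to sum to 1. The plain Euclidean
    distance between two signatures then equals
    sqrt(sum_m w_m * ||q_m - v_m||^2), a preference-weighted fusion.
    """
    total = sum(prefs[m] for m in MODALITIES)
    sig = []
    for m in MODALITIES:
        scale = math.sqrt(prefs[m] / total)
        sig.extend(scale * x for x in video[m])
    return sig


def distance(s, t):
    """Euclidean distance between two signatures of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, t)))


class PStableLSH:
    """Euclidean LSH with Gaussian (2-stable) projections.

    Each hash is h(x) = floor((a . x + b) / w); k hashes form one bucket
    key, and num_tables independent tables raise the chance that near
    neighbours collide in at least one of them.
    """

    def __init__(self, dim, num_tables=4, k=3, w=1.0, seed=0):
        rng = random.Random(seed)
        self.w = w
        self.tables = []
        for _ in range(num_tables):
            funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
                     for _ in range(k)]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, x):
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / self.w)
            for a, b in funcs
        )

    def insert(self, item_id, x):
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, x)].append((item_id, x))

    def query(self, x, top=1):
        """Gather candidates from colliding buckets, then rank by exact distance."""
        candidates = {}
        for funcs, buckets in self.tables:
            for item_id, y in buckets.get(self._key(funcs, x), []):
                candidates[item_id] = y
        ranked = sorted(candidates, key=lambda i: distance(x, candidates[i]))
        return ranked[:top]


if __name__ == "__main__":
    prefs = {"text": 1.0, "audio": 0.5, "visual": 2.0}  # hypothetical user profile
    videos = {
        "v1": {"text": [0.1, 0.2], "audio": [0.0], "visual": [0.9, 0.1]},
        "v2": {"text": [0.8, 0.7], "audio": [1.0], "visual": [0.1, 0.9]},
    }
    sigs = {vid: signature(feats, prefs) for vid, feats in videos.items()}
    index = PStableLSH(dim=len(sigs["v1"]), w=4.0)
    for vid, sig in sigs.items():
        index.insert(vid, sig)
    print(index.query(sigs["v1"]))  # prints ['v1']: an exact copy always collides with itself
```

Scaling each modality block by the square root of its weight is one possible design choice: it keeps the fused measure a plain Euclidean distance, so an off-the-shelf Euclidean hash family can index it directly. The trade-off is that signatures depend on the user's weights, so this sketch would need one index per user or preference profile.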

Keywords

Video search, Similarity measure, Indexing structure, Scalability

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  School of Information Systems, Singapore Management University, Singapore