Discovering Video Clusters from Visual Features and Noisy Tags

  • Arash Vahdat
  • Guang-Tong Zhou
  • Greg Mori
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8694)


We present an algorithm for automatically clustering tagged videos. Collections of tagged videos are commonplace; however, discovering video clusters within them is not trivial. Direct methods that operate only on visual features ignore the tags, a valuable source of information that is usually available. Clustering videos solely on these tags is error-prone, since the tags are typically noisy. To address these problems, we develop a structured model that considers the interaction between visual features, video tags and video clusters. We model tags from visual features, and correct noisy tags by checking visual appearance consistency. Videos are then clustered using the refined tags together with the visual features. We learn the clustering through a max-margin framework, and demonstrate empirically that this algorithm produces more accurate clustering results than baseline methods based on tags, visual features, or both. Further, qualitative results show that the clustering can discover sub-categories and more specific instances of a given video category.
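The pipeline outlined in the abstract (check tags against visual appearance, then cluster on refined tags plus visual features) can be illustrated with a toy sketch. This is not the paper's structured max-margin model: it swaps in nearest-neighbour majority voting for tag correction and plain k-means for clustering. The function names, the synthetic data, and the choice of k are all hypothetical and serve only to show the flow of information.

```python
import numpy as np


def refine_tags(features, tags, k=3):
    """Majority-vote each binary tag over the k visually nearest videos (self included).

    Assumption: a stand-in for the paper's tag correction, not its actual model.
    """
    diffs = features[:, None, :] - features[None, :, :]
    dists = (diffs ** 2).sum(axis=2)               # pairwise squared distances
    neighbours = np.argsort(dists, axis=1)[:, :k]  # k nearest videos per row
    votes = tags[neighbours].mean(axis=1)          # fraction of neighbours with each tag
    return (votes > 0.5).astype(int)


def kmeans(points, n_clusters, n_iters=20):
    """Plain k-means with deterministic farthest-point initialisation.

    Assumption: a stand-in for the paper's max-margin clustering.
    """
    centers = [points[0]]
    for _ in range(1, n_clusters):
        # distance from every point to its closest already-chosen centre
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iters):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assignment = d.argmin(axis=1)
        for j in range(n_clusters):
            if np.any(assignment == j):
                centers[j] = points[assignment == j].mean(axis=0)
    d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


# Synthetic example: two visually distinct groups of four videos each.
features = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1], [5.3, 5.3],
])
# Two binary tags per video; videos 2 and 6 carry the wrong tag (noise).
noisy_tags = np.array([
    [1, 0], [1, 0], [0, 1], [1, 0],
    [0, 1], [0, 1], [1, 0], [0, 1],
])

refined = refine_tags(features, noisy_tags, k=3)
# Cluster on visual features and refined tags jointly.
labels = kmeans(np.hstack([features, refined]), n_clusters=2)
```

In this toy setting the two mislabelled videos take on the tags of their visual neighbours, and the joint representation separates the two groups. The real model learns the tag and cluster interactions jointly rather than in two fixed stages.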


Keywords: Visual Feature, Spectral Cluster, Event Category, Home Video, Video Category
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Supplementary material

Electronic Supplementary Material: 978-3-319-10599-4_34_MOESM1_ESM.pdf (PDF, 234 KB)


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Arash Vahdat (1)
  • Guang-Tong Zhou (1)
  • Greg Mori (1)

  1. School of Computing Science, Simon Fraser University, Canada
