Fast Approximate Nearest Neighbor Methods for Example-Based Video Search

Part of the Studies in Computational Intelligence book series (SCI, volume 409)

Abstract

The cost of computer storage is steadily decreasing. Many terabytes of video data can be easily collected using video cameras in public places for modern surveillance applications, or stored on video sharing websites. However, the growth in CPU speeds has recently slowed to a crawl. This situation implies that while the data is being collected, it cannot be cheaply processed in time. Searching such vast collections of video data for useful information requires radically different approaches, calling for algorithms with sub-linear time complexity.

One application of search in a large data set is query-by-example: a video clip is used as a query, and an algorithm finds a set of similar clips in the collection. A naive solution to such a problem would specify a similarity metric and exhaustively compute this similarity between the query and every other video clip in the collection. The clips with the highest similarity values are then returned as the answer set. However, as the number of videos in the collection grows, such computation becomes prohibitively expensive. To achieve sub-linear growth, any large-scale algorithm needs to exploit properties of the data that remove the need to compute explicit distances between the query point and every other point in the set. To this end, Approximate Nearest Neighbor (ANN) methods have recently become popular. These algorithms provide a trade-off between the accuracy of finding nearest neighbors and the corresponding computational complexity. As a result, searches in very large datasets can be performed very quickly, albeit at the cost of a few incorrect matches.
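The naive exhaustive search described above can be sketched as follows. This is a minimal illustration only: the Euclidean distance, the function names, and the toy two-dimensional "clips" are assumptions made for the example, not the features or metric used in the chapter.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_query(query, collection, k=1, distance=euclidean):
    """Return the indices of the k items closest to the query.

    Every item in the collection is compared with the query, so the
    number of distance computations grows linearly with the collection.
    """
    order = sorted(range(len(collection)),
                   key=lambda i: distance(query, collection[i]))
    return order[:k]

# Hypothetical feature vectors standing in for video clips.
clips = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
result = brute_force_query([1.0, 1.0], clips, k=2)
print(result)  # [1, 3]
```

Each query here costs N distance evaluations plus a sort, which is the cost that sub-linear methods aim to avoid.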

Most of the recent work in developing ANN methods has been done for data points that lie in a Euclidean space. However, several applications in computer vision, such as object and human activity recognition, use non-Euclidean data. State-of-the-art Euclidean ANN methods do not perform well when applied to these datasets. In this chapter, we present algorithms for performing ANN on manifolds 1) by explicitly considering the Riemannian geometry of the non-Euclidean manifold and 2) by taking advantage of the kernel trick in non-Euclidean spaces where performing Riemannian computations is expensive. For a data set with N samples, the proposed methods are able to retrieve similar objects with time complexity as low as O(K), where K ≪ N. We test and evaluate our methods on both synthetic and real datasets and obtain results better than the state of the art.
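To give a feel for how a non-Euclidean distance and a small set of K reference points can reduce query cost, the sketch below indexes points on the unit sphere by their nearest cluster center under the geodesic (great-circle) distance. It is a generic illustration under stated assumptions, not the algorithm proposed in the chapter: the sphere as the manifold, the hand-picked centers, and all function names are hypothetical.

```python
import math

def normalize(v):
    """Scale v to unit length so that it lies on the unit sphere."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

def geodesic(u, v):
    """Great-circle distance between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding

def build_index(points, centers):
    """Assign each point index to the bucket of its nearest center."""
    buckets = {c: [] for c in range(len(centers))}
    for i, p in enumerate(points):
        nearest = min(range(len(centers)),
                      key=lambda c: geodesic(p, centers[c]))
        buckets[nearest].append(i)
    return buckets

def ann_query(q, points, centers, buckets):
    """Compare q with the K centers, then search only the chosen bucket.

    The answer is approximate: the true nearest neighbor may sit in a
    neighboring bucket that is never examined.
    """
    c = min(range(len(centers)), key=lambda j: geodesic(q, centers[j]))
    candidates = buckets[c]
    if not candidates:
        return None
    return min(candidates, key=lambda i: geodesic(q, points[i]))

# Hypothetical data: five points and K = 3 hand-picked centers.
points = [normalize(p) for p in
          [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.9, 0.1], [0, 0, 1]]]
centers = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
buckets = build_index(points, centers)
match = ann_query(normalize([0.1, 1.0, 0.0]), points, centers, buckets)
print(match)  # 2
```

A query costs K center comparisons plus a scan of one bucket, so the center comparisons dominate when buckets are small; the price is the occasional missed neighbor noted in the docstring.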

Keywords

Cluster Center, Binary Code, Near Neighbor, Geodesic Distance, Neighbor Method
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer Berlin Heidelberg 2012

Authors and Affiliations

  1. Center for Imaging Science, Johns Hopkins University, Baltimore, USA
  2. Mitsubishi Electric Research Laboratories, Cambridge, USA
