Keyframe Retrieval by Keypoints: Can Point-to-Point Matching Help?

  • Wanlei Zhao
  • Yu-Gang Jiang
  • Chong-Wah Ngo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4071)


Bag-of-words representation with visual keypoints has recently emerged as an attractive approach for video search. In this paper, we study the degree of improvement obtained when a point-to-point (P2P) constraint is imposed on the bag-of-words representation. We investigate two tasks, near-duplicate keyframe (NDK) retrieval and high-level concept classification, covering parts of the TRECVID 2003 and 2005 datasets. For P2P matching, we propose a one-to-one symmetric keypoint matching strategy to reduce the effect of noise during keyframe comparison. In addition, we propose a new multi-dimensional index structure that speeds up the matching process through keypoint filtering. Through experiments, we demonstrate that the P2P constraint can significantly boost the performance of NDK retrieval, while showing competitive accuracy in concept classification in the broadcast domain.
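To make the idea of one-to-one symmetric matching concrete, the following is a minimal sketch that assumes the constraint can be read as mutual nearest-neighbor matching between the keypoint descriptors of two keyframes. This reading, the Euclidean distance, the function name `symmetric_matches`, and the optional `max_dist` threshold are all illustrative assumptions and not the exact procedure of the paper; the proposed index structure is not modeled here.

```python
import numpy as np

def symmetric_matches(desc_a, desc_b, max_dist=None):
    """Return pairs (i, j) where desc_a[i] and desc_b[j] are mutual nearest neighbors.

    Hypothetical helper: assumes Euclidean distance between descriptors.
    """
    if len(desc_a) == 0 or len(desc_b) == 0:
        return []
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    nn_ab = dist.argmin(axis=1)  # nearest keypoint in B for each keypoint in A
    nn_ba = dist.argmin(axis=0)  # nearest keypoint in A for each keypoint in B
    pairs = []
    for i, j in enumerate(nn_ab):
        # Keep the pair only if the choice is mutual, so each keypoint
        # takes part in at most one match.
        if nn_ba[j] == i and (max_dist is None or dist[i, j] <= max_dist):
            pairs.append((i, int(j)))
    return pairs

# Toy 2-D "descriptors" (assumed data, for illustration only).
desc_a = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
desc_b = np.array([[0.1, 0.0], [5.2, 5.0]])
print(symmetric_matches(desc_a, desc_b))
```

In this toy case the second keypoint of the first keyframe is dropped, because its nearest neighbor in the second keyframe prefers a different partner. The number of surviving pairs could then serve as a simple similarity score between two keyframes, although the scoring used in the paper may differ.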


Keywords: Color Histogram; Background Clutter; Locality Sensitive Hash; Maximally Stable Extremal Region; Video Search





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Wanlei Zhao (1)
  • Yu-Gang Jiang (1)
  • Chong-Wah Ngo (1)
  1. Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
