Discovery of Image Versions in Large Collections

  • Conference paper
Advances in Multimedia Modeling (MMM 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4352)

Abstract

Image collections may contain multiple copies, versions, and fragments of the same image. Storage or retrieval of such duplicates and near-duplicates may be unnecessary and, in the context of collections derived from the web, their presence may represent infringements of copyright. However, identifying image versions is a challenging problem, as they can be subject to a wide range of digital alterations, and is potentially costly as the number of image pairs to be considered is quadratic in collection size. In this paper, we propose a method for finding the pairs of near-duplicates based on manipulation of an image index. Our approach is an adaptation of a robust object recognition technique and a near-duplicate document detection algorithm to this application domain. We show that this method requires only moderate computing resources, and is highly effective at identifying pairs of near-duplicates.
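The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it names: using an index to surface likely near-duplicate pairs without comparing all n(n-1)/2 image pairs. It assumes each image has already been reduced to a set of discrete feature keys (for example, hashed or quantized local descriptors). The function name `candidate_pairs` and the parameters `min_shared` and `max_postings` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(image_features, min_shared=2, max_postings=50):
    """Return {(a, b): shared_count} for image pairs sharing enough feature keys.

    image_features: dict mapping an image id to an iterable of hashable
    feature keys (assumed to be precomputed; ids must be mutually sortable).
    min_shared: hypothetical threshold on the number of shared keys.
    max_postings: hypothetical cap; keys occurring in more images than this
    are skipped, since very common keys say little about duplication.
    """
    # Inverted index: feature key -> set of image ids containing that key.
    index = defaultdict(set)
    for image_id, keys in image_features.items():
        for key in set(keys):
            index[key].add(image_id)

    # Count co-occurrences only for images that share a postings list,
    # so pairs with nothing in common are never touched.
    shared = defaultdict(int)
    for ids in index.values():
        if len(ids) < 2 or len(ids) > max_postings:
            continue
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


if __name__ == "__main__":
    images = {
        "a": [1, 2, 3, 4],
        "b": [2, 3, 4, 9],   # shares three keys with "a"
        "c": [7, 8],
        "d": [8, 10],        # shares only one key with "c"
    }
    print(candidate_pairs(images))  # {('a', 'b'): 3}
```

The work done here is proportional to the sizes of the postings lists rather than to the square of the collection size, which is why capping or pruning long lists matters in a sketch like this. Any pair returned would still need verification by a more careful image comparison.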

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Foo, J.J., Sinha, R., Zobel, J. (2006). Discovery of Image Versions in Large Collections. In: Cham, TJ., Cai, J., Dorai, C., Rajan, D., Chua, TS., Chia, LT. (eds) Advances in Multimedia Modeling. MMM 2007. Lecture Notes in Computer Science, vol 4352. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69429-8_44

  • DOI: https://doi.org/10.1007/978-3-540-69429-8_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69428-1

  • Online ISBN: 978-3-540-69429-8

  • eBook Packages: Computer Science, Computer Science (R0)
