Discovery of Image Versions in Large Collections

Foo, Jun Jie; Sinha, Ranjan; Zobel, Justin

doi:10.1007/978-3-540-69429-8_44

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4352)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Image collections may contain multiple copies, versions, and fragments of the same image. Storage or retrieval of such duplicates and near-duplicates may be unnecessary and, in the context of collections derived from the web, their presence may represent infringements of copyright. However, identifying image versions is a challenging problem, as they can be subject to a wide range of digital alterations, and is potentially costly as the number of image pairs to be considered is quadratic in collection size. In this paper, we propose a method for finding the pairs of near-duplicates based on manipulation of an image index. Our approach is an adaptation of a robust object recognition technique and a near-duplicate document detection algorithm to this application domain. We show that this method requires only moderate computing resources, and is highly effective at identifying pairs of near-duplicates.
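
The abstract does not describe the index manipulation in detail, so the following is only a minimal sketch of the general idea that an index can avoid comparing all pairs. It assumes each image has already been reduced to a set of discrete feature keys (the keys, the function name `candidate_pairs`, and the parameters `min_shared` and `max_postings` are hypothetical, not taken from the paper). Only images that share a posting list are ever paired, so the quadratic scan over the whole collection is replaced by work proportional to the posting-list sizes.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(features_by_image, min_shared=2, max_postings=1000):
    """Return {(a, b): shared_count} for image pairs that share
    at least `min_shared` feature keys.

    Illustrative only: the feature keys are assumed to be hashable
    summaries of local image features, produced elsewhere.
    """
    # Inverted index: feature key -> set of image ids containing it.
    index = defaultdict(set)
    for image_id, keys in features_by_image.items():
        for key in set(keys):
            index[key].add(image_id)

    # Count, for every pair that co-occurs in a posting list,
    # how many feature keys the two images have in common.
    shared = defaultdict(int)
    for postings in index.values():
        # Skip keys seen in one image only, and (hypothetical heuristic)
        # very common keys whose posting lists would be costly to expand.
        if len(postings) < 2 or len(postings) > max_postings:
            continue
        for a, b in combinations(sorted(postings), 2):
            shared[(a, b)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Toy collection with made-up feature keys.
collection = {
    "original.jpg": ["k1", "k2", "k3", "k4"],
    "cropped.jpg": ["k2", "k3", "k4"],
    "unrelated.jpg": ["k9", "k4"],
}
pairs = candidate_pairs(collection)
print(pairs)
```

In this toy collection only the original and its cropped version share enough keys to be reported; the unrelated image shares a single key with each and falls below the threshold.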

Author information

Authors and Affiliations

School of Computer Science & IT, RMIT University, Melbourne, 3001, Australia
Jun Jie Foo, Ranjan Sinha & Justin Zobel

Editor information

Editors and Affiliations

School of Computer Engineering, Nanyang Technological University, Block N4, Nanyang Avenue, 639798, Singapore
Tat-Jen Cham & Deepu Rajan
School of Computer Engineering, Nanyang Technological University, 639798, Singapore
Jianfei Cai
IBM T.J. Watson Research Center, Yorktown Heights, P.O. Box 704, 10598, New York, USA
Chitra Dorai
National University of Singapore, 3 Science Dr, 117543, Singapore
Tat-Seng Chua
Center for Multimedia and Network Technology, School of Computer Engineering, Nanyang Technological University, 639798, Singapore
Liang-Tien Chia

About this paper

Cite this paper

Foo, J.J., Sinha, R., Zobel, J. (2006). Discovery of Image Versions in Large Collections. In: Cham, TJ., Cai, J., Dorai, C., Rajan, D., Chua, TS., Chia, LT. (eds) Advances in Multimedia Modeling. MMM 2007. Lecture Notes in Computer Science, vol 4352. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69429-8_44

DOI: https://doi.org/10.1007/978-3-540-69429-8_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69428-1
Online ISBN: 978-3-540-69429-8
eBook Packages: Computer Science (R0)
