Active hashing and its application to image and text retrieval

Zhen, Yi; Yeung, Dit-Yan

doi:10.1007/s10618-012-0249-y

Published: 04 February 2012

Data Mining and Knowledge Discovery
Volume 26, pages 255–274 (2013)

Abstract

In recent years, hashing-based methods for large-scale similarity search have attracted considerable research interest in the data mining and machine learning communities. While unsupervised hashing methods have shown promise for metric similarity, they cannot handle semantic similarity, which is usually given in the form of labeled point pairs. To overcome this limitation, recent work on semi-supervised hashing aims to learn hash functions from metric and semantic similarity simultaneously. Existing semi-supervised hashing methods can be regarded as passive hashing, since they assume that the labeled pairs are provided in advance. In this paper, we propose a novel framework, called active hashing, which actively selects the most informative labeled pairs for hash function learning. Specifically, it identifies the most informative points to label and constructs labeled pairs accordingly. Under this framework, we use data uncertainty as a measure of informativeness and develop a batch mode algorithm to speed up active selection. We empirically compare our method with a state-of-the-art passive hashing method on two benchmark data sets, showing that the proposed method can reduce labeling cost and overcome the limitations of passive hashing.
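
The selection loop described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the paper's definition: hash functions are taken to be linear (one bit per hyperplane), "data uncertainty" is approximated by how close a point's projections lie to the hyperplanes, and the batch step is a naive top-k pick that does not reproduce the authors' batch mode algorithm. The names `certainty`, `select_batch`, and `build_pairs` are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Vector = Sequence[float]


def projections(x: Vector, hyperplanes: Sequence[Vector]) -> List[float]:
    """Dot product of x with each hyperplane normal (one value per hash bit)."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in hyperplanes]


def hash_code(x: Vector, hyperplanes: Sequence[Vector]) -> List[int]:
    """Binary code: bit k is 1 when x lies on the non-negative side of hyperplane k."""
    return [1 if p >= 0 else 0 for p in projections(x, hyperplanes)]


def certainty(x: Vector, hyperplanes: Sequence[Vector]) -> float:
    """Assumed measure: Euclidean norm of the projections.

    A small value means x sits close to the hyperplanes, so its bits
    could easily flip; such points are treated as the most uncertain.
    """
    return math.sqrt(sum(p * p for p in projections(x, hyperplanes)))


def select_batch(pool: Sequence[Vector],
                 hyperplanes: Sequence[Vector],
                 batch_size: int) -> List[int]:
    """Naive batch selection: indices of the batch_size least certain points."""
    ranked = sorted(range(len(pool)), key=lambda i: certainty(pool[i], hyperplanes))
    return ranked[:batch_size]


def build_pairs(selected: Sequence[int],
                labels: Dict[int, Hashable]) -> List[Tuple[int, int, int]]:
    """Turn point labels into labeled pairs: +1 for same label, -1 otherwise."""
    pairs = []
    for a in range(len(selected)):
        for b in range(a + 1, len(selected)):
            i, j = selected[a], selected[b]
            pairs.append((i, j, 1 if labels[i] == labels[j] else -1))
    return pairs


# Toy usage: two axis-aligned hyperplanes and a pool of four 2-D points.
hyperplanes = [[1.0, 0.0], [0.0, 1.0]]
pool = [[2.0, 2.0], [0.1, -0.1], [-3.0, 1.0], [0.0, 0.2]]
chosen = select_batch(pool, hyperplanes, batch_size=2)
# Labels would come from a human annotator; these are made up for the demo.
oracle_labels = {1: "cat", 3: "dog"}
pairs = build_pairs(chosen, oracle_labels)
```

In a full loop, the resulting pairs would be handed to a semi-supervised hashing learner, the hyperplanes re-estimated, and selection repeated until the labeling budget is spent.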

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clearwater Bay Road, Kowloon, Hong Kong, China
Yi Zhen & Dit-Yan Yeung

Corresponding author

Correspondence to Yi Zhen.

Additional information

Responsible editor: Bing Liu.

About this article

Cite this article

Zhen, Y., Yeung, DY. Active hashing and its application to image and text retrieval. Data Min Knowl Disc 26, 255–274 (2013). https://doi.org/10.1007/s10618-012-0249-y

Received: 04 July 2011
Accepted: 12 January 2012
Published: 04 February 2012
Issue Date: March 2013
DOI: https://doi.org/10.1007/s10618-012-0249-y
