Query Concept Learning

Chang, Edward Y.

doi:10.1007/978-3-642-20429-6_3

Abstract

With multimedia databases, it is difficult to specify queries directly and explicitly. Relevance feedback interactively learns a user’s desired output, or query concept, by asking the user whether certain proposed multimedia objects (e.g., images, videos, and songs) are relevant or not. To be effective, a learning algorithm must learn a user’s query concept accurately and quickly while asking the user to label only a small number of data instances. In addition, the concept-learning algorithm should consider the complexity of a concept when determining its learning strategies. This chapter\(^\dagger\) presents the use of support vector machine active learning in a concept-dependent way (\(\hbox{SVM}^{\rm CD}_{\rm Active}\)) for conducting relevance feedback. A concept’s complexity is characterized using three measures: hit-rate, isolation, and diversity. To reduce concept complexity and thereby improve concept learnability, a multimodal learning approach is designed that uses the semantic labels of data instances to adjust the sampling strategy and the sampling pool of \(\hbox{SVM}^{\rm CD}_{\rm Active}\). Empirical study on several datasets shows that active learning outperforms traditional passive learning, and that concept-dependent learning is superior to concept-independent relevance-feedback schemes.

^†© ACM, 2004. This chapter is written based on the author’s work with Simon Tong [1], Kingshy Goh, and Wei-Cheng Lai [2]. Permission to publish this chapter is granted under copyright licenses #2587600971756 and #2587601214143.
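The relevance-feedback loop described in the abstract can be illustrated with a minimal pool-based active learning sketch. This is not the chapter's \(\hbox{SVM}^{\rm CD}_{\rm Active}\) algorithm: it assumes a linear classifier trained by sub-gradient descent on the hinge loss as a stand-in for a kernel SVM, it asks the user (here a hypothetical `oracle` function) to label the instances closest to the current decision boundary, and it omits the concept-dependent adjustments entirely. All function and parameter names are illustrative.

```python
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1):
    """Sub-gradient descent on the L2-regularised hinge loss.

    A simple stand-in for an SVM solver (assumption, not the chapter's method).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (dot(w, xi) + b)
            # Shrink the weights (regularisation step).
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Update only when the instance violates the margin.
            if margin < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def relevance_feedback(pool, oracle, seed_idx, rounds=3, per_round=4):
    """Run a few feedback rounds and return the model, labels, and a ranking."""
    # Labels collected so far: pool index -> +1 (relevant) or -1 (irrelevant).
    labeled = {i: oracle(pool[i]) for i in seed_idx}

    def fit():
        idx = sorted(labeled)
        return train_linear_svm([pool[i] for i in idx],
                                [labeled[i] for i in idx])

    w, b = fit()
    for _ in range(rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        # Most ambiguous instances first: smallest distance to the boundary.
        unlabeled.sort(key=lambda i: abs(dot(w, pool[i]) + b))
        for i in unlabeled[:per_round]:
            labeled[i] = oracle(pool[i])  # ask the user for feedback
        w, b = fit()
    # Final retrieval: rank the whole pool by decision value, best first.
    ranked = sorted(range(len(pool)), key=lambda i: -(dot(w, pool[i]) + b))
    return w, b, labeled, ranked


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [[rng.random(), rng.random()] for _ in range(200)]
    # Hypothetical query concept: points above the anti-diagonal are relevant.
    oracle = lambda x: 1 if x[0] + x[1] > 1.0 else -1
    seeds = [next(i for i, x in enumerate(pool) if oracle(x) == 1),
             next(i for i, x in enumerate(pool) if oracle(x) == -1)]
    w, b, labeled, ranked = relevance_feedback(pool, oracle, seeds)
    print(len(labeled), "labels requested; top-5 indices:", ranked[:5])
```

Querying the instances nearest the boundary is one common sampling heuristic for SVM active learning; the abstract's point is that this strategy and the sampling pool should additionally be adapted to the complexity of the concept, which the sketch does not attempt.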

Notes

1. The input space (denoted as X in the machine learning and statistics literature) is the original space in which the data vectors are located, and the feature space (denoted as F) is the space into which the data are projected, either linearly or non-linearly.
2. Query expansion is a vital component of any retrieval system, and it remains a challenging research area. This component is not yet deployed in our system and is part of our ongoing research.
3. Unlike some recently developed systems [23] that contain a semantic layer between image features and queries to assist query refinement, our system does not have an explicit semantic layer. We argue that a hard-coded semantic layer can make a retrieval system restrictive. Dynamically learning the semantics of a query concept is more flexible and hence makes the system more useful.
4. Query concepts such as “animals”, “women”, and “European architecture” do not reside contiguously in the space formed by the image features.
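The distinction in Note 1 between the input space X and the feature space F can be made concrete with a small sketch. It assumes a degree-2 polynomial feature map on two-dimensional inputs, chosen here purely for illustration and not taken from the chapter, and checks that an inner product computed in F equals a kernel evaluated directly in X.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Illustrative non-linear map from input space X (2-D) to feature space F (3-D)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, evaluated in the input space."""
    return dot(x, z) ** 2


if __name__ == "__main__":
    x, z = (1.0, 2.0), (3.0, -1.0)
    # Both quantities agree up to floating-point rounding.
    print(dot(phi(x), phi(z)), poly2_kernel(x, z))
```

A kernel SVM relies on this identity: it works with inner products in F without ever constructing the projected vectors explicitly.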

References

1. S. Tong, E. Chang, Support vector machine active learning for image retrieval, in Proceedings of the ACM International Conference on Multimedia, pp. 107–118 (2001)
2. K.S. Goh, E.Y. Chang, W.C. Lai, Multimodal concept-dependent active learning for image retrieval, in Proceedings of the ACM International Conference on Multimedia, pp. 564–571 (2004)
3. D. Lewis, W. Gale, A sequential algorithm for training text classifiers, in Proceedings of the Seventeenth Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval (Springer, Heidelberg, 1994), pp. 3–12
4. A. McCallum, K. Nigam, Employing EM in pool-based active learning for text classification, in Proceedings of the Fifteenth International Conference on Machine Learning (Morgan Kaufmann, San Francisco, 1998), pp. 350–358
5. C. Burges, A tutorial on support vector machines for pattern recognition, in Proceedings of ACM KDD, pp. 121–167 (1998)
6. V. Vapnik, Estimation of Dependences Based on Empirical Data (Springer, Heidelberg, 1982)
7. C. Campbell, N. Cristianini, A. Smola, Query learning with large margin classifiers, in Proceedings of the Seventeenth International Conference on Machine Learning, pp. 111–118 (2000)
8. G. Schohn, D. Cohn, Less is more: active learning with support vector machines, in Proceedings of the Seventeenth International Conference on Machine Learning, pp. 839–846 (2000)
9. S. Tong, D. Koller, Support vector machine active learning with applications to text classification, in Proceedings of the 17th International Conference on Machine Learning, pp. 401–412 (June 2000)
10. M. Mandel, G. Poliner, D. Ellis, Support vector machine active learning for music retrieval. Multimed. Syst. 12(1), 3–13 (2006)
11. T. Mitchell, Generalization as search. Artif. Intell. 28, 203–226 (1982)
12. J. Shawe-Taylor, N. Cristianini, Further results on the margin distribution, in Proceedings of the Twelfth Annual Conference on Computational Learning Theory, pp. 278–285 (1999)
13. V. Vapnik, Statistical Learning Theory (Wiley, NY, 1998)
14. R. Herbrich, T. Graepel, C. Campbell, Bayes point machines: estimating the Bayes point in kernel space, in International Joint Conference on Artificial Intelligence Workshop on Support Vector Machines, pp. 23–27 (1999)
15. K. Brinker, Incorporating diversity in active learning with support vector machines, in Proceedings of the Twentieth International Conference on Machine Learning (ICML), pp. 59–66 (August 2003)
16. N. Roy, A. McCallum, Toward optimal active learning through sampling estimation of error reduction, in Proceedings of the Eighteenth International Conference on Machine Learning (ICML), pp. 441–448 (August 2001)
17. W.C. Lai, K. Goh, E.Y. Chang, On scalability of active learning for formulating query concepts, in Proceedings of Workshop on Computer Vision Meets Databases (CVDB) in cooperation with ACM International Conference on Management of Data (SIGMOD), pp. 11–18 (2004)
18. R. Agrawal, Fast algorithms for mining association rules in large databases, in Proceedings of VLDB, pp. 487–499 (1994)
19. R. Duda, P. Hart, D.G. Stork, Pattern Classification, 2nd edn. (Wiley, New York, 2001)
20. C. Li, E. Chang, H. Garcia-Molina, G. Wiederhold, Clindex: Approximate similarity queries in high-dimensional spaces. IEEE Trans. Knowl. Data Eng. (TKDE) 14(4), 792–808 (2002)
21. E. Chang, B. Li, MEGA: the maximizing expected generalization algorithm for learning complex query concepts. ACM Trans. Inf. Syst. 21(4), 347–382 (2003)
22. E. Chang, K. Goh, G. Sychay, G. Wu, Content-based soft annotation for multimodal image retrieval using Bayes point machines. IEEE Trans. Circuits Syst. Video Technol. Special Issue Concept. Dynamical Aspects Multimed. Content Descr. 13(1), 26–38 (2003)
23. J. Wang, J. Li, G. Wiederhold, SIMPLIcity: semantics-sensitive integrated matching for picture libraries, in Proceedings of ACM Multimedia Conference, pp. 483–484 (2000)
24. C. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, Oxford, 1998)
25. M. Kearns, U. Vazirani, An Introduction to Computational Learning Theory (MIT Press, USA, 1994)
26. T.M. Mitchell, Machine Learning (McGraw-Hill, NY, 1997)
27. X.S. Zhou, T.S. Huang, Comparing discriminating transformations and SVM for learning during multimedia retrieval, in Proceedings of ACM Conference on Multimedia, pp. 137–146 (2001)
28. X.S. Zhou, T.S. Huang, Relevance feedback for image retrieval: a comprehensive review. ACM Multimed. Syst. J., Special Issue on CBIR 8, 536–544 (2003)
29. K.S. Jones, P. Willett (eds.), Readings in Information Retrieval (Morgan Kaufmann, San Francisco, July 1997)
30. K. Porkaew, K. Chakrabarti, S. Mehrotra, Query refinement for multimedia similarity retrieval in MARS, in Proceedings of ACM International Conference on Multimedia, pp. 235–238 (1999)
31. L. Wu, C. Faloutsos, K. Sycara, T.R. Payne, Falcon: feedback adaptive loop for content-based retrieval, in Proceedings of the 26th VLDB Conference, pp. 279–306 (September 2000)
32. M. Ortega-Binderberger, S. Mehrotra, Relevance feedback techniques in the MARS image retrieval system. Multimed. Syst. 9(6), 535–547 (2004)
33. L. Breiman, Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
34. L. Breiman, Arcing classifiers. Ann. Statist. 26(3), 801–849 (1998)
35. A. Grove, D. Schuurmans, Boosting in the limit: maximizing the margin of learned ensembles, in Proceedings of 15th National Conference on Artificial Intelligence (AAAI), pp. 692–699 (1998)
36. R. Schapire, Y. Freund, P. Bartlett, W. Lee, Boosting the margin: a new explanation for the effectiveness of voting methods, in Proceedings of the Fourteenth International Conference on Machine Learning (Morgan Kaufmann, San Francisco, 1997), pp. 322–330
37. H. Wu, H. Lu, S. Ma, A practical SVM-based algorithm for ordinal regression in image retrieval, in Proceedings of ACM International Conference on Multimedia, pp. 612–621 (2003)
38. T. Dietterich, G. Bakiri, Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res. 2, 263–286 (1995)
39. G. James, T. Hastie, Error coding and substitution PaCTs, in Proceedings of NIPS (1997)
40. M. Moreira, E. Mayoraz, Improved pairwise coupling classification with error correcting classifiers, in Proceedings of ECML, pp. 160–171 (April 1998)
41. D. Cohn, Z. Ghahramani, M. Jordan, Active learning with statistical models. J. Artif. Intell. Res. 4, 129–145 (1996)
42. N. Cesa-Bianchi, Y. Freund, D. Haussler, D.P. Helmbold, R.E. Schapire, M.K. Warmuth, How to use expert advice. J. ACM 44(3), 427–485 (1997)
43. T. Jaakkola, H. Siegelmann, Active information retrieval, in Proceedings of NIPS, pp. 777–784 (2001)
44. S. Tong, E. Chang, Support vector machine active learning for image retrieval, in Proceedings of ACM International Conference on Multimedia, pp. 107–118 (October 2001)
45. Y. Freund, H. Seung, E. Shamir, N. Tishby, Selective sampling using the query by committee algorithm. Mach. Learn. 28, 133–168 (1997)
46. H. Seung, M. Opper, H. Sompolinsky, Query by committee, in Proceedings of the Fifth Workshop on Computational Learning Theory (Morgan Kaufmann, San Francisco, 1992), pp. 287–294
47. I. Dagan, S. Engelson, Committee-based sampling for training probabilistic classifiers, in Proceedings of the Twelfth International Conference on Machine Learning (Morgan Kaufmann, San Francisco, 1995), pp. 150–157
48. T. Joachims, Text categorization with support vector machines, in Proceedings of ECML (Springer, Heidelberg, 1998), pp. 137–142
49. S.T. Dumais, J. Platt, D. Heckerman, M. Sahami, Inductive learning algorithms and representations for text categorization, in Proceedings of the Seventh International Conference on Information and Knowledge Management (ACM Press, NY, 1998), pp. 148–155
50. I.J. Cox, M.L. Miller, S.M. Omohundro, P.N. Yianilos, PicHunter: Bayesian relevance feedback for image retrieval, in Proceedings of International Conference on Pattern Recognition, pp. 361–369 (August 1996)
51. I.J. Cox, M.L. Miller, T.P. Minka, T.V. Papathomas, P.N. Yianilos, The Bayesian image retrieval system, PicHunter: theory, implementation and psychological experiments. IEEE Trans. Image Process. 9(1), 20–37 (2000)
52. E.Y. Chang, B. Li, G. Wu, K.S. Goh, Statistical learning for effective visual information retrieval (invited paper), in Proceedings of IEEE International Conference on Image Processing (ICIP), pp. 609–612 (2003)
53. J.J. Rocchio, Relevance feedback in information retrieval, in The SMART Retrieval System: Experiments in Automatic Document Processing, ed. by G. Salton (Prentice Hall, NJ, 1971), pp. 313–323
54. Y. Ishikawa, R. Subramanya, C. Faloutsos, Mindreader: querying databases through multiple examples, in Proceedings of VLDB, pp. 218–227 (1998)
55. M. Ortega, Y. Rui, K. Chakrabarti, A. Warshavsky, S. Mehrotra, T.S. Huang, Supporting ranked boolean similarity queries in MARS. IEEE Trans. Knowl. Data Eng. 10(6), 905–925 (1999)
56. M. Ortega, Y. Rui, K. Chakrabarti, S. Mehrotra, T.S. Huang, Supporting similarity queries in MARS, in Proceedings of ACM International Conference on Multimedia, pp. 403–413 (1997)
57. Y. Rui, T.S. Huang, M. Ortega, S. Mehrotra, Relevance feedback: A power tool in interactive content-based image retrieval. IEEE Trans. Circuits Syst. Video Technol. 8(5), 644–655 (1998)
58. K. Porkaew, S. Mehrotra, M. Ortega, Query reformulation for content based multimedia retrieval in MARS, in Proceedings of ICMCS, pp. 747–751 (1999)
59. M. Flickner, H. Sawhney, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, P. Yanker, Query by image and video content: the QBIC system. IEEE Comput. 28(9), 23–32 (1995)
60. A. Gupta, R. Jain, Visual information retrieval. Commun. ACM 40(5), 69–79 (1997)
61. K.A. Hua, K. Vu, J.H. Oh, Sammatch: a flexible and efficient sampling-based image retrieval technique for image databases, in Proceedings of ACM Multimedia, pp. 225–234 (1999)
62. B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intell. 18(8), 837–842 (1996)
63. J.R. Smith, S.F. Chang, VisualSEEk: a fully automated content-based image query system, in Proceedings of ACM Multimedia, pp. 87–98 (1996)
64. J.Z. Wang, G. Wiederhold, O. Firschein, S.X. Wei, Wavelet-based image indexing techniques with partial sketch retrieval capability, in Proceedings of the ADL, pp. 13–24 (May 1997)

Author information

Authors and Affiliations

Google Inc., Mountain View, CA, 94306, USA
Edward Y. Chang

Corresponding author

Correspondence to Edward Y. Chang.

About this chapter

Cite this chapter

Chang, E.Y. (2011). Query Concept Learning. In: Foundations of Large-Scale Multimedia Information Management and Retrieval. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20429-6_3

DOI: https://doi.org/10.1007/978-3-642-20429-6_3
Published: 26 August 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20428-9
Online ISBN: 978-3-642-20429-6
eBook Packages: Computer Science, Computer Science (R0)