Knowledge and Information Systems, Volume 35, Issue 2, pp 249–283

A survey on instance selection for active learning

  • Yifan Fu
  • Xingquan Zhu
  • Bin Li
Regular Paper

Abstract

Active learning aims to train an accurate prediction model at minimum cost by labeling the most informative instances. In this paper, we survey existing work on active learning from an instance-selection perspective and classify it into two categories with a progressive relationship: (1) active learning based merely on the uncertainty of independent and identically distributed (IID) instances, and (2) active learning that further takes instance correlations into account. Using this categorization, we summarize major approaches in the field along with their technical strengths and weaknesses, followed by a simple runtime performance comparison and a discussion of emerging active learning applications and the instance-selection challenges therein. This survey intends to provide a high-level summary of active learning and to motivate interested readers to consider instance-selection approaches when designing effective active learning solutions.
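
The two categories named in the abstract can be illustrated with a minimal sketch. It is not taken from the survey: the function names, the toy pool, and the choice of Shannon entropy and mean cosine similarity as the uncertainty and correlation measures are all assumptions made for illustration. The first selector scores each unlabeled instance independently by uncertainty; the second weights that uncertainty by how similar the instance is to the rest of the pool, which is one common way of taking instance correlations into account.

```python
import math

def entropy(probs):
    """Shannon entropy of a class-probability vector (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_by_uncertainty(probs_per_instance):
    """Category 1 (assumed form): pick the instance with maximum predictive entropy."""
    scores = [entropy(p) for p in probs_per_instance]
    return max(range(len(scores)), key=scores.__getitem__)

def select_by_uncertainty_and_density(probs_per_instance, features):
    """Category 2 (assumed form): entropy weighted by mean similarity to the pool."""
    n = len(features)
    scores = []
    for i in range(n):
        others = [cosine(features[i], features[j]) for j in range(n) if j != i]
        density = sum(others) / max(len(others), 1)
        scores.append(entropy(probs_per_instance[i]) * density)
    return max(range(n), key=scores.__getitem__)

# Hypothetical toy pool: predicted class probabilities and feature vectors.
# Instance 1 is the most uncertain but lies far from the other instances.
pool_probs = [[0.9, 0.1], [0.5, 0.5], [0.55, 0.45], [0.6, 0.4]]
pool_features = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.1], [1.0, 0.2]]

print(select_by_uncertainty(pool_probs))
print(select_by_uncertainty_and_density(pool_probs, pool_features))
```

On this toy pool the uncertainty-only selector picks the isolated instance, while the density-weighted selector prefers an almost equally uncertain instance that sits among similar neighbors. In practice the probabilities would come from a trained classifier rather than being written by hand.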

Keywords

Active learning survey; Instance selection; Uncertainty sampling; Instance correlations

Copyright information

© Springer-Verlag London Limited 2012

Authors and Affiliations

  1. Centre for Quantum Computation and Intelligent Systems (QCIS), Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia
