
Active learning for object classification: from exploration to exploitation

Data Mining and Knowledge Discovery

Abstract

Classifying large datasets without any a priori information is a problem in numerous tasks. In industrial environments especially, diverse measurement devices and sensors produce huge amounts of data, yet we still rely on a human expert to give the data a meaningful interpretation. Because the amount of data that must be classified manually is critical, we need to reduce the number of learning episodes that involve human interaction as much as possible. In addition, real-world applications require the learner to converge stably to a solution that is close to the optimal one. We present a new self-controlled exploration/exploitation strategy for selecting the data points to be labeled by a domain expert, in which the potential of each data point is computed from a combination of its representativeness and the uncertainty of the classifier. A new Prototype Based Active Learning (PBAC) algorithm for classification is introduced. We compare the results with other active learning approaches on several benchmark datasets.
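The selection step described above, where a data point's potential mixes representativeness with classifier uncertainty, can be illustrated with a small sketch. It is not the PBAC algorithm from the paper: every scoring choice below is an assumption made for illustration. Representativeness is approximated by a Gaussian-kernel density estimate, uncertainty by the margin between the two most probable classes, and the exploration/exploitation trade-off by a weight `alpha` that follows a fixed linear schedule rather than the paper's self-controlled strategy. The function names (`minmax`, `representativeness`, `uncertainty`, `exploration_weight`, `select_query`) and the `bandwidth` parameter are hypothetical.

```python
import numpy as np


def minmax(v):
    # Rescale a score vector to [0, 1]; a constant vector maps to all zeros.
    v = np.asarray(v, dtype=float)
    span = v.max() - v.min()
    return np.zeros_like(v) if span == 0 else (v - v.min()) / span


def representativeness(X, bandwidth=1.0):
    # Hypothetical density score: mean Gaussian-kernel similarity of each
    # point to all points in the pool (dense regions score higher).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2)).mean(axis=1)


def uncertainty(proba):
    # Margin uncertainty: 1 minus the gap between the two most likely classes.
    # Assumes proba has shape (n_points, n_classes) with n_classes >= 2.
    top2 = np.sort(proba, axis=1)[:, -2:]
    return 1.0 - (top2[:, 1] - top2[:, 0])


def exploration_weight(iteration, n_iterations):
    # Hypothetical linear schedule: 1.0 (pure exploration) at the first
    # iteration, falling to 0.0 (pure exploitation) at the last one.
    return max(0.0, 1.0 - iteration / max(1, n_iterations - 1))


def select_query(X, proba, labeled, alpha, bandwidth=1.0):
    # Potential = alpha * representativeness + (1 - alpha) * uncertainty,
    # with both scores normalized to [0, 1] before mixing.
    rep = minmax(representativeness(X, bandwidth))
    unc = minmax(uncertainty(proba))
    potential = alpha * rep + (1.0 - alpha) * unc
    potential[labeled] = -np.inf  # never re-query points the expert already labeled
    return int(np.argmax(potential))
```

With `alpha = 1.0` the query falls in the densest region of the unlabeled data; with `alpha = 0.0` it goes to the point whose class probabilities are closest to a tie. Normalizing both scores before mixing keeps one term from dominating merely because of its scale.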

Author information

Correspondence to Nicolas Cebron.

Additional information

Responsible editor: Pierre Baldi.

About this article

Cite this article

Cebron, N., Berthold, M.R. Active learning for object classification: from exploration to exploitation. Data Min Knowl Disc 18, 283–299 (2009). https://doi.org/10.1007/s10618-008-0115-0
