Abstract
Margin-based active learning remains the most widely used active learning paradigm due to its simplicity and empirical successes. However, most works are limited to binary or multiclass prediction problems, thus restricting the applicability of these approaches to many complex prediction problems where active learning would be most useful. For example, machine learning techniques for natural language processing applications often require combining multiple interdependent prediction problems, generally referred to as learning in structured output spaces. In many such application domains, complexity is further managed by decomposing a complex prediction into a sequence of predictions where earlier predictions are used as input to later predictions, commonly referred to as a pipeline model. This work describes methods for extending existing margin-based active learning techniques to these two settings, thus increasing the scope of problems to which active learning can be applied. We empirically validate these proposed active learning techniques by reducing the annotated data requirements on multiple instances of synthetic data, a semantic role labeling task, and a named entity and relation extraction system.
Notes
\(I\left[\kern-0.15em\left[ p \right]\kern-0.15em\right]\) is an indicator function such that \(I\left[\kern-0.15em\left[ p \right]\kern-0.15em\right] = 1\) if \(p\) is true and 0 otherwise.
Empirical discrepancies between the performance reported in this work and that of [54] are accounted for by the use of averaged Perceptron and smaller batch sizes during instance selection.
References
Abney S (2002) Bootstrapping. In: Proceedings of the annual meeting of the association for computational linguistics (ACL), pp 360–367
Allwein EL, Schapire RE, Singer Y (2000) Reducing multiclass to binary: a unifying approach for margin classifiers. J Mach Learn Res 1:113–141
Anderson B, Moore A (2005) Active learning for hidden Markov models: objective functions and algorithms. In: Proceedings of the international conference on machine learning (ICML), pp 9–16
Angluin D (1988) Queries and concept learning. Mach Learn 2(4):319–342
Balcan M-F, Beygelzimer A, Langford J (2006) Agnostic active learning. In: Proceedings of the international conference on machine learning (ICML), pp 65–72
Balcan M-F, Broder A, Zhang T (2007) Margin-based active learning. In: Proceedings of the annual ACM workshop on computational learning theory (COLT), pp 35–50
Balcan M-F, Hanneke S, Wortman J (2008) The true sample complexity of active learning. In: Proceedings of the annual ACM workshop on computational learning theory (COLT), pp 45–56
Baldridge J, Osborne M (2004) Active learning and the total cost of annotation. In: Proceedings of the conference on empirical methods for natural language processing (EMNLP), pp 9–16
Baram Y, El-Yaniv R, Luz K (2004) Online choice of active learning algorithms. J Mach Learn Res 5:255–291
Becker M (2008) Active learning: an explicit treatment of unreliable parameters. PhD thesis, University of Edinburgh
Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. In: Proceedings of the annual ACM workshop on computational learning theory (COLT), pp 92–100
Brinker K (2004) Active learning of label ranking functions. In: Proceedings of the international conference on machine learning (ICML), pp 129–136
Bunescu RC (2008) Learning with probabilistic features for improved pipeline models. In: Proceedings of the conference on empirical methods for natural language processing (EMNLP), pp 670–679
Campbell C, Cristianini N, Smola A (2000) Query learning with large margin classifiers. In: Proceedings of the international conference on machine learning (ICML), pp 111–118
Carreras X, Marquez L (2004) Introduction to the CoNLL-2004 shared tasks: semantic role labeling. In: Proceedings of the annual conference on computational natural language learning (CoNLL)
Castro RM, Nowak RD (2007) Minimax bounds for active learning. In: Proceedings of the annual ACM workshop on computational learning theory (COLT), pp 5–19
Chan YS, Ng HT (2007) Domain adaptation with active learning for word sense disambiguation. In: Proceedings of the annual meeting of the association for computational linguistics (ACL), pp 49–56
Chang M-W, Do Q, Roth D (2006) Multilingual dependency parsing: a pipeline approach. In: Recent advances in natural language processing. Springer, Berlin, pp 195–204
Chang M-W, Ratinov L, Rizzolo N, Roth D (2008) Learning and inference with constraints. In: Proceedings of the national conference on artificial intelligence (AAAI), pp 1513–1518
Cohn D, Atlas L, Ladner R (1994) Improving generalization with active learning. Mach Learn 15(2):201–222
Cohn DA, Ghahramani Z, Jordan MI (1996) Active learning with statistical models. J Artif Intell Res 4:129–145
Collins M (2002) Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. In: Proceedings of the conference on empirical methods for natural language processing (EMNLP), pp 1–8
Culotta A, McCallum A (2005) Reducing labeling effort for structured prediction tasks. In: Proceedings of the national conference on artificial intelligence (AAAI), pp 746–751
Dagan I, Engelson SP (1995) Committee-based sampling for training probabilistic classifiers. In: Proceedings of the international conference on machine learning (ICML), pp 150–157
Dasgupta S (2004) Analysis of a greedy active learning strategy. In: The conference on advances in neural information processing systems (NIPS), pp 337–344
Dasgupta S, Hsu D, Monteleoni C (2007) A general agnostic active learning algorithm. In: The conference on advances in neural information processing systems (NIPS), vol 20, pp 353–360
Dasgupta S, Kalai AT, Monteleoni C (2005) Analysis of perceptron-based active learning. In: Proceedings of the annual ACM workshop on computational learning theory (COLT), pp 249–263
Daumé III H, Langford J, Marcu D (2009) Search-based structured prediction. Mach Learn 75(3):297–325
Davis PC (2002) Stone soup translation: the linked automata model. PhD thesis, Ohio State University
Donmez P, Carbonell J (2008) Optimizing estimated loss reduction for active sampling in rank learning. In: Proceedings of the international conference on machine learning (ICML), pp 248–255
Donmez P, Carbonell JG, Bennett PN (2007) Dual strategy active learning. In: Proceedings of the European conference on machine learning (ECML), pp 116–127
Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley-Interscience, New York
Finkel JR, Manning CD, Ng AY (2006) Solving the problem of cascading errors: approximate Bayesian inference for linguistic annotation pipelines. In: Proceedings of the conference on empirical methods for natural language processing (EMNLP), pp 618–626
Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1):119–139
Freund Y, Schapire RE (1999) Large margin classification using the perceptron algorithm. Mach Learn 37(3):277–296
Godbole S, Harpale A, Sarawagi S, Chakrabarti S (2004) Document classification through interactive supervision of document and term labels. In: Proceedings of the European conference on principles and practice of knowledge discovery in databases (PKDD), pp 185–196
Hanneke S (2007) A bound on the label complexity of agnostic active learning. In: Proceedings of the international conference on machine learning (ICML), pp 353–360
Hanneke S (2007) Teaching dimension and the complexity of active learning. In: Proceedings of the annual ACM workshop on computational learning theory (COLT), pp 66–81
Har-Peled S, Roth D, Zimak D (2002) Constraint classification for multiclass classification and ranking. In: The conference on advances in neural information processing systems (NIPS), pp 785–792
Hinton G, Sejnowski TJ (1999) Unsupervised learning: foundations of neural computation. MIT Press, Cambridge
Hwa R (2004) Sample selection for statistical parsing. Comput Linguist 30(3):253–276
Kearns MJ, Schapire RE, Sellie LM (1994) Toward efficient agnostic learning. Mach Learn 17(2–3):115–141
Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the international conference on machine learning (ICML), pp 282–289
Laws F, Schütze H (2008) Stopping criteria for active learning of named entity recognition. In: Proceedings of the international conference on computational linguistics (COLING), pp 465–472
Luo T, Kramer K, Goldgof DB, Hall LO, Samson S, Remsen A, Hopkins T (2005) Active learning to recognize multiple types of plankton. J Mach Learn Res 6:589–613
Nguyen HT, Smeulders A (2004) Active learning using pre-clustering. In: Proceedings of the international conference on machine learning (ICML), pp 623–630
Och FJ, Tillmann C, Ney H (1999) Improved alignment models for statistical machine translation. In: Proceedings of the conference on empirical methods for natural language processing (EMNLP), pp 20–28
Olsson F (2009) A literature survey of active machine learning in the context of natural language processing. Technical report, Swedish Institute of Computer Science
Punyakanok V, Roth D, Yih W-T, Zimak D (2005) Learning and inference over constrained output. In: Proceedings of the international joint conference on artificial intelligence (IJCAI), pp 1124–1129
Punyakanok V, Roth D, Yih W, Zimak D (2004) Semantic role labeling via integer linear programming inference. In: Proceedings of the international conference on computational linguistics (COLING)
Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Francisco
Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286
Rai P, Saha A, Daumé III H, Venkatasubramanian S (2010) Domain adaptation meets active learning. In: NAACL workshop on active learning for NLP (ALNLP)
Roth D, Small K (2006) Margin-based active learning for structured output spaces. In: Proceedings of the European conference on machine learning (ECML), pp 413–424
Roth D, Small K (2008) Active learning for pipeline models. In: Proceedings of the national conference on artificial intelligence (AAAI), pp 683–688
Roth D, Small K, Titov I (2009) Sequential learning of classifiers for structured prediction problems. In: Proceedings of the international conference on artificial intelligence and statistics (AISTATS), pp 440–447
Roth D, Yih W-T (2004) A linear programming formulation for global inference in natural language tasks. In: Proceedings of the annual conference on computational natural language learning (CoNLL), pp 1–8
Roth D, Yih W-T (2005) Integer linear programming inference for conditional random fields. In: Proceedings of the international conference on machine learning (ICML), pp 737–744
Roth D, Yih W-T (2007) Global inference for entity and relation identification via a linear programming formulation. In: Introduction to statistical relational learning
Scheffer T, Wrobel S (2001) Active learning of partially hidden Markov models. In: Proceedings of the ECML/PKDD workshop on instance selection
Schohn G, Cohn D (2000) Less is more: active learning with support vector machines. In: Proceedings of the international conference on machine learning (ICML), pp 839–846
Sekine S, Sudo K, Nobata C (2002) Extended named entity hierarchy. In: Proceedings of the international conference on language resources and evaluation (LREC), pp 1818–1824
Settles B (2009) Active learning literature survey. Technical Report 1648, University of Wisconsin-Madison
Settles B, Craven M (2008) An analysis of active learning strategies for sequence labeling tasks. In: Proceedings of the conference on empirical methods for natural language processing (EMNLP), pp 1069–1078
Shen D, Zhang J, Su J, Zhou G, Tan C-L (2004) Multi-criteria-based active learning for named entity recognition. In: Proceedings of the annual meeting of the association for computational linguistics (ACL), pp 589–596
Small K (2005) Interactive learning protocols for natural language applications. PhD thesis, University of Illinois at Urbana-Champaign
Tang M, Luo X, Roukos S (2002) Active learning for statistical natural language parsing. In: Proceedings of the annual meeting of the association for computational linguistics (ACL), pp 120–127
Taskar B, Guestrin C, Koller D (2003) Max-margin Markov networks. In: The conference on advances in neural information processing systems (NIPS)
Thompson CA, Califf ME, Mooney RJ (1999) Active learning for natural language parsing and information extraction. In: Proceedings of the international conference on machine learning (ICML), pp 406–414
Tomanek K, Hahn U (2009) Semi-supervised active learning for sequence labeling. In: Proceedings of the annual meeting of the association for computational linguistics (ACL), pp 1039–1047
Tomanek K, Wermter J, Hahn U (2007) An approach to text corpus construction which cuts annotation costs and maintains reusability of annotated data. In: Proceedings of the conference on empirical methods for natural language processing (EMNLP), pp 486–495
Tong S, Koller D (2001) Support vector machine active learning with applications to text classification. J Mach Learn Res 2:45–66
Tsochantaridis I, Hofmann T, Joachims T, Altun Y (2004) Support vector machine learning for interdependent and structured output spaces. In: Proceedings of the international conference on machine learning (ICML), pp 823–830
Valiant LG (1984) A theory of the learnable. Commun ACM 27(11):1134–1142
Vapnik VN (1999) The nature of statistical learning theory, 2nd edn. Springer, Berlin
Vlachos A (2008) A stopping criterion for active learning. Comput Speech Lang 22(3):295–312
Waterman DA (1986) A guide to expert systems. Addison-Wesley, Reading
Yan R, Yang J, Hauptmann A (2003) Automatically labeling video data using multiclass active learning. In: Proceedings of the international conference on computer vision (ICCV), pp 516–523
Zhu J, Wang H, Hovy EH (2008) Learning a stopping criterion for active learning for word sense disambiguation and text classification. In: Proceedings of the international joint conference on natural language processing (IJCNLP), pp 366–372
Zhu J, Wang H, Hovy EH (2008) Multi-criteria-based strategy to stop active learning for data annotation. In: Proceedings of the international conference on computational linguistics (COLING), pp 1129–1136
Zhu X (2005) Semi-supervised learning literature survey. Computer Sciences Technical Report 1530, University of Wisconsin-Madison
Acknowledgments
The authors would like to thank Ming-Wei Chang, Alex Klementiev, Vasin Punyakanok, Nick Rizzolo, and the reviewers for their helpful comments regarding this work. This work has been partially funded by NSF grant ITR IIS-0428472, a research grant from Motorola Labs, DARPA funding under the Bootstrap Learning Program, and by MIAS, a DHS funded Center for Multimodal Information Access and Synthesis at UIUC.
Cite this article
Small, K., Roth, D. Margin-based active learning for structured predictions. Int. J. Mach. Learn. & Cyber. 1, 3–25 (2010). https://doi.org/10.1007/s13042-010-0003-y
DOI: https://doi.org/10.1007/s13042-010-0003-y