Abstract
In this paper, we present a new methodology for conducting active learning in a single-pass on-line learning context. Single-pass active learning can be understood as an approach for reducing the annotation effort of users and operators in on-line classification problems, in which the true class labels of new incoming samples are usually unknown. This reduction in effort is achieved by selecting the most informative samples, that is, those that contribute most to improving the predictive performance of incremental classifiers. Our approach builds upon certainty-based sample selection in connection with version-space reduction. Two new reliability concepts are investigated and developed in connection with evolving fuzzy classifiers: conflict and ignorance. Conflict models the extent to which a new query point lies in the conflict region between two or more classes and therefore reflects the level of certainty in the classifier's prediction. Ignorance represents the distance of a new query point from the training samples seen so far; in extended form, it integrates the actual variability of the version space. The choice of the model architecture used for on-line classification scenarios (evolving fuzzy classifiers) is motivated in the paper. Results on real-world binary and multi-class streaming classification data show that our single-pass active learning approach yields evolving classifiers whose performance is similar to that of classifiers using all samples for adaptation, while the annotation effort in terms of the number of class label requests is reduced by up to 90%.
Notes
Results on CD imprint data could not be included as the state-of-the-art active learning software crashed after some iterations on this data set.
References
Angelov P, Filev D, Kasabov N (2010a) Editorial to evolving systems. Evol Syst 1(1):1–2
Angelov P, Filev D, Kasabov N (2010b) Evolving intelligent systems—methodology and applications. Wiley, New York
Angelov P, Lughofer E, Zhou X (2008) Evolving fuzzy classifiers using different model architectures. Fuzzy Sets Syst 159(23):3160–3182
Angelov P, Zhou X (2008) Evolving fuzzy-rule-based classifiers from data streams. IEEE Trans Fuzzy Syst 16(6):1462–1475
Backer SD, Scheunders P (2001) Texture segmentation by frequency-sensitive elliptical competitive learning. Image Vis Comput 19(9–10):639–648
Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res 11:1601–1604
Bifet A, Kirkby R (2011) Data stream mining—a practical approach. Technical report, Department of Computer Sciences, University of Waikato, New Zealand
Bordes A, Ertekin S, Weston J, Bottou L (2005) Fast kernel classifiers with online and active learning. J Mach Learn Res 6:1579–1619
Bouchachia A (2009) Incremental induction of classification fuzzy rules. In: IEEE workshop on evolving and self-developing intelligent systems (ESDIS) 2009. Nashville, USA, pp 32–39
Bouchachia A (2010) An evolving classification cascade with self-learning. Evol Syst 1(3):143–160
Bouchachia A, Mittermeir R (2006) Towards incremental fuzzy classifiers. Soft Comput 11(2):193–207
Chu W, Zinkevich M, Li L, Thomas A, Zheng B (2011) Unbiased online active learning in data streams. In: Proceedings of the KDD 2011. San Diego, California
Cohn D, Atlas L, Ladner R (1994) Improving generalization with active learning. Mach Learn 15(2):201–221
Cohn D, Ghahramani Z, Jordan M (1996) Active learning with statistical models. J Artif Intell Res 4(1):129–145
Condurache A (2002) A two-stage-classifier for defect classification in optical media inspection. In: Proceedings of the 16th international conference on pattern recognition (ICPR’02), vol 4. Quebec City, Canada, pp 373–376
Dagan I, Engelson S (1995) Committee-based sampling for training probabilistic classifiers. In: Proceedings of the 12th international conference on machine learning, pp 150–157
Diehl C, Cauwenberghs G (2003) SVM incremental learning, adaptation and optimization. In: Proceedings of the international joint conference on neural networks, vol 4. Boston, pp 2685–2690
Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining. Boston, MA, pp 71–80
Domingos P, Hulten G (2001) Catching up with the data: research issues in mining data streams. In: Proceedings of the workshop on research issues in data mining and knowledge discovery. Santa Barbara, CA
Donmez P, Carbonell J (2008) Proactive learning: cost-sensitive active learning with multiple imperfect oracles. In: Proceedings of the CIKM 2008 conference. Napa Valley, California
Eitzinger C, Heidl W, Lughofer E, Raiser S, Smith J, Tahir M, Sannen D, van Brussel H (2010) Assessment of the influence of adaptive components in trainable surface inspection systems. Mach Vis Appl 21(5):613–626
Fukumizu K (2000) Statistical active learning in multilayer perceptrons. IEEE Trans Neural Netw 11(1):17–26
Fürnkranz J (2002) Round robin classification. J Mach Learn Res 2:721–747
Gacto M, Alcala R, Herrera F (2011) Interpretability of linguistic fuzzy rule-based systems: an overview of interpretability measures. Inf Sci 181(20):4340–4360
Gama J (2010) Knowledge discovery from data streams. Chapman and Hall/CRC, Boca Raton
Hartert L, Sayed-Mouchaweh M, Billaudel P (2010) A semi-supervised dynamic version of fuzzy k-nearest neighbors to monitor evolving systems. Evol Syst 1(1):3–15
Hisada M, Ozawa S, Zhang K, Kasabov N (2010) Incremental linear discriminant analysis for evolving feature spaces in multitask pattern recognition problems. Evol Syst 1(1):17–27
Hu W, Hu W, Xi N, Maybank S (2009) Unsupervised active learning based on hierarchical graph-theoretic clustering. IEEE Trans Syst Man Cybern Part B Cybern 39(5):1147–1161
Hühn J, Hüllermeier E (2009) FR3: A fuzzy rule learner for inducing reliable classifiers. IEEE Trans Fuzzy Syst 17(1):138–149
Hüllermeier E, Brinker K (2008) Learning valued preference structures for solving classification problems. Fuzzy Sets Syst 159(18):2337–2352
Ishibuchi H, Nakashima T (2001) Effect of rule weights in fuzzy rule-based classification systems. IEEE Trans Fuzzy Syst 9(4):506–515
Jackson P (1999) Introduction to expert systems. Addison Wesley Pub Co Inc., Edinburgh Gate
Kruse R, Gebhardt J, Palm R (1994) Fuzzy systems in computer science. Verlag Vieweg, Wiesbaden
Kuncheva L (2000) Fuzzy classifier design. Physica-Verlag, Heidelberg
Lemos A, Caminhas W, Gomide F (2012) Adaptive fault detection and diagnosis using an evolving fuzzy classifier. Inf Sci. (in press). doi:10.1016/j.ins.2011.08.030
Leng G, McGinnity T, Prasad G (2005) An approach for on-line extraction of fuzzy rules using a self-organising fuzzy neural network. Fuzzy Sets Syst 150(2):211–243
Lewis D, Catlett J (1994) Heterogeneous uncertainty sampling for supervised learning. In: Proceedings of the 11th international conference on machine learning. New Brunswick, New Jersey, pp 148–156
Lughofer E (2011) All-pairs evolving fuzzy classifiers for on-line multi-class classification problems. In: Proceedings of the EUSFLAT 2011 conference. Elsevier, Aix-Les-Bains, France, pp 372–379
Lughofer E (2011) Evolving fuzzy systems—methodologies, advanced concepts and applications. Springer, Berlin
Lughofer E (2011) On-line incremental feature weighting in evolving fuzzy classifiers. Fuzzy Sets Syst 163(1):1–23
Lughofer E, Angelov P, Zhou X (2007) Evolving single- and multi-model fuzzy classifiers with FLEXFIS-Class. In: Proceedings of FUZZ-IEEE 2007. London, UK, pp 363–368
Lughofer E, Bouchot JL, Shaker A (2011) On-line elimination of local redundancies in evolving fuzzy systems. Evol Syst 2(3):165–187
Lughofer E, Smith JE, Caleb-Solly P, Tahir M, Eitzinger C, Sannen D, Nuttin M (2009) Human-machine interaction issues in quality control based on on-line image classification. IEEE Trans Syst Man Cybern Part A Syst Hum 39(5):960–971
Muslea I (2000) Active learning with multiple views. PhD thesis, University of Southern California
Nauck D, Kruse R (1998) NEFCLASS-X–a soft computing tool to build readable fuzzy classifiers. BT Technol J 16(3):180–190
Oza NC, Russell S (2001) Online bagging and boosting. In: Proceedings of the 8th international workshop on artificial intelligence and statistics 2001 (AI and STATISTICS 2001). Morgan Kaufmann, Key West, Florida, pp 105–112
Pang S, Ozawa S, Kasabov N (2005) Incremental linear discriminant analysis for classification of data streams. IEEE Trans Syst Man Cybern Part B Cybern 35(5):905–914
Roy N, Mccallum A (2001) Toward optimal active learning through sampling estimation of error reduction. In: Proceedings of the 18th international conference on machine learning. Morgan Kaufmann, pp 441–448
Schölkopf B, Smola A (2002) Learning with kernels: support vector machines, regularization, optimization and beyond. MIT Press, London
Sculley D (2007) Online active learning methods for fast label efficient spam filtering. In: Proceedings of the fourth conference on email and AntiSpam. Mountain View, California
Settles B (2010) Active learning literature survey. Technical report, Computer Sciences Technical Report 1648, University of Wisconsin Madison
Shilton A, Palaniswami M, Ralph D, Tsoi AC (2005) Incremental training of support vector machines. IEEE Trans Neural Netw 16(1):114–131
Thompson C, Califf M, Mooney R (1999) Active learning for natural language parsing and information extraction. In: Proceedings of 16th international conference on machine learning. Bled, Slovenia, pp 406–414
Tong S, Koller D (2001) Support vector machine active learning with application to text classification. J Mach Learn Res 2:45–66
Tuia D, Volpi M, Copa L, Kanevski M, Muñoz-Marí J (2011) A survey of active learning algorithms for supervised remote sensing image classification. IEEE J Sel Topics Signal Process 5(3):606–617
Utgoff P (1989) Incremental induction of decision trees. Mach Learn 4(2):161–186
Vapnik V (1998) Statistical learning theory. Wiley, New York
Varma M, Zisserman A (2004) Unifying statistical texture classification frameworks. Image Vis Comput 22:1175–1183
Zvarova J (2006) Decision support systems in medicine. In: Zielinski K, Duplaga M, Ingram D (eds) Information technology solutions for healthcare. Springer, London, pp 182–204
This work was funded by the Austrian fund for promoting scientific research (FWF, contract number I328-N23, acronym IREFS). This publication reflects only the authors’ views.
Appendix
Here we provide the pseudo-code for our single-pass active learning scheme using evolving fuzzy classifiers with single and multi-model (all-pairs) architecture.
Algorithm 1 Single-Pass Active Learning for Binary Classification Problems

1. Input: Current evolved classifier \({\mathbb{C}}\) [rules in the form (4)] with C rules, class frequency matrix h (yielding conf_ij), current accumulated accuracy of the classifier Acc(N − 1) (Acc(0) = 0), flag denoting whether ignorance is used (=1) or not (=0), thr as threshold for the conflict level; new input query sample \({\bf x}_N\) at time instance N.
2. Predict the class label L of \({\bf x}_N\) after (9) using \({\mathbb{C}}\).
3. Obtain the confidence (conflict degree) \(conf_L \in [0.5,1]\) of the prediction after (7).
4. IF flag == 1
   (a) Determine the ignorance state ign_s of \({\bf x}_N\).
5. ELSE
   (a) ign_s = 0
6. END IF
7. IF ign_s == 1 OR conf_L < thr
   (a) Get the real class label y_N of \({\bf x}_N\).
   (b) Update the accumulated one-step-ahead accuracy Acc(N) by (25).
   (c) UpdateClassifier(\({\mathbb{C}, h, \{{\bf x}_N,y_N\}}\)).
8. (From time to time, evaluate the classifier on a separate test set.)
9. Output: Updated classifier \({\mathbb{C}}\) (the old classifier in case active learning suggests no update) and its current accuracy Acc(N).
For on-line simulations with batch stored data sets, obtaining the real class label in Step 7 (a) comes for free, as the real class labels are usually stored along with the feature vectors contained in the data sets (which is the case for all data sets used in our experimental study). For real on-line applications and installations, the class labels often need to be obtained as feedback from operators. The condition in Step 7 tries to reduce the number of such feedbacks as much as possible (as discussed throughout Sect. 4 and verified in Sect. 5). Step 8 is optional and serves to validate the active learning approach by inspecting the accuracy trend line over time on a separate test data set from the same application, as is done in Sect. 5.2.3 when comparing with a state-of-the-art batch active learning approach.
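The selection logic of Algorithm 1 can be sketched in a few lines of Python. The toy classifier below (one incrementally updated centroid per class, a distance-ratio confidence and a fixed-radius ignorance test) is purely illustrative and stands in for the evolving fuzzy classifier; all names, criteria and parameter values are assumptions, not the paper's actual method.

```python
import math

class TinyEvolvingClassifier:
    """Illustrative stand-in for an evolving fuzzy classifier: one centroid
    per class, updated incrementally in a single pass."""
    def __init__(self, ignorance_radius=2.0):
        self.centroids = {}  # class label -> (center, sample count)
        self.ignorance_radius = ignorance_radius

    def predict_with_confidence(self, x):
        # Confidence in [0.5, 1] from nearest vs. second-nearest centroid.
        dists = sorted((math.dist(c, x), y)
                       for y, (c, _) in self.centroids.items())
        if len(dists) < 2:
            return (dists[0][1] if dists else None), 0.5
        (d1, label), (d2, _) = dists[0], dists[1]
        conf = d2 / (d1 + d2) if (d1 + d2) > 0 else 1.0
        return label, conf

    def ignorance(self, x):
        # 1 if x lies far from all training samples seen so far, else 0.
        if not self.centroids:
            return 1
        d = min(math.dist(c, x) for c, _ in self.centroids.values())
        return 1 if d > self.ignorance_radius else 0

    def update(self, x, y):
        # Incremental mean update of the winning class centroid.
        if y not in self.centroids:
            self.centroids[y] = (list(x), 1)
        else:
            c, n = self.centroids[y]
            c = [ci + (xi - ci) / (n + 1) for ci, xi in zip(c, x)]
            self.centroids[y] = (c, n + 1)

def single_pass_active_learning(clf, stream, thr=0.8, use_ignorance=True):
    """Process the stream once; query a label only when conflict or
    ignorance indicates an informative sample (Step 7 of Algorithm 1)."""
    n_queries = 0
    for x, y_true in stream:
        _, conf = clf.predict_with_confidence(x)
        ign = clf.ignorance(x) if use_ignorance else 0
        if ign == 1 or conf < thr:
            clf.update(x, y_true)  # y_true plays the operator's role
            n_queries += 1
    return n_queries
```

On a stream with two well-separated classes, almost all label requests occur early, while each class region is still unknown; once both regions are covered, most samples are skipped, which mirrors the annotation savings reported in the paper.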
Algorithm 2 Single-Pass Active Learning for Multi-Class Classification Problems

1. Input: Current evolved classifiers \({\mathbb{C}_{k,l}}, l > k\), one for each class pair (k, l) after (12), each with C_{k,l} rules in the form (4); class frequency matrices h_{k,l}; current accumulated accuracy of the classifier Acc(N − 1) (Acc(0) = 0); flag denoting whether ignorance is used (=1) or not (=0); thr_con as threshold for the conflict level; new input query sample \({\bf x}_N\) at time instance N.
2. FOR all class pairs (k, l) with l > k
   (a) Compute the conflict degree conf*_{k,l}.
3. END FOR
4. Predict the class label L of \({\bf x}_N\) after the scoring formula in (14), using conf*_{k,l} (instead of conf_{k,l}).
5. IF flag == 1
   (a) Determine the ignorance state ign_s of \({\bf x}_N\).
6. ELSE
   (a) ign_s = 0
7. END IF
8. IF ign_s == 1 OR Condition (16) holds (using conf*_{k,l} in the first part and conf_{k,l} in the second part)
   (a) Get the real class label y_N of \({\bf x}_N\).
   (b) Update the accumulated one-step-ahead accuracy Acc(N) by (25).
   (c) For l < y_N: UpdateClassifier(\({\mathbb{C}_{l,y_N}; h_{l,y_N}; \{{\bf x}_N,2\}}\)).
   (d) For l > y_N: UpdateClassifier(\({\mathbb{C}_{y_N,l}; h_{y_N,l}; \{{\bf x}_N,1\}}\)).
9. (From time to time, evaluate the classifier on a separate test set.)
10. Output: Updated classifiers \({\mathbb{C}_{k,l}}\) (the old classifiers in case active learning suggests no update) and the current overall accuracy Acc(N).
Note that if y_N is the higher class label in the pair (l, y_N) (Step 8 (c)), the binary classifier \({\mathbb{C}_{l,y_N}}\) is updated with class #2; otherwise, \({\mathbb{C}_{y_N,l}}\) is updated with class #1 (Step 8 (d)). Thus, only the upper right triangular matrix of binary classifiers is ever updated (the lower left matrix is assumed to produce reciprocal preference degrees conf).
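This routing of a newly labeled sample to the upper-triangular matrix of binary classifiers (Steps 8 (c) and (d)) can be sketched as follows; the `classifiers` mapping and the plain lists standing in for binary classifiers are illustrative assumptions, not the paper's data structures.

```python
def update_all_pairs(classifiers, n_classes, x, y):
    """Route the labeled sample (x, y) to every binary classifier whose
    class pair contains y, always indexing the smaller class first so that
    only the upper-right triangular matrix C_{k,l}, l > k, is touched."""
    for l in range(n_classes):
        if l == y:
            continue
        if l < y:
            # Pair (l, y): y is the higher label -> internal class #2.
            classifiers[(l, y)].append((x, 2))
        else:
            # Pair (y, l): y is the lower label -> internal class #1.
            classifiers[(y, l)].append((x, 1))
    return classifiers
```

For example, with three classes and a sample of class 1, the pairwise classifiers (0, 1) and (1, 2) are updated (with internal classes #2 and #1, respectively), while (0, 2) remains untouched.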
The routine 'UpdateClassifier' is outlined in Algorithm 3.
Algorithm 3 Routine UpdateClassifier(\({\mathbb{C}, h, \{{\bf x},y\}}\))

1. Input: Current classifier \({\mathbb{C}}\) with C rules, class frequency matrix h, new input query sample \({\bf x}\) with real class label y.
2. IF \({\bf x}\) fulfills the rule evolution criterion
   (a) Evolve a new rule (usually, the center of the new rule is set to the current sample, \(c_C={\bf x}\), and its spread to some initial value).
   (b) C = C + 1.
   (c) Append a row to the class frequency matrix with h_{C,y} = 1 and h_{C,k} = 0 for k ≠ y.
3. ELSE
   (a) Obtain the index of the rule with maximal activation level, \(i^* = \text{argmax}_{1 \le i \le C}\,\mu_i({\bf x})\).
   (b) Increment the corresponding class frequency matrix entry: \(h_{i^*,y} = h_{i^*,y} + 1\).
   (c) Update the rule(s) of the classifier \(\mathbb{C}\).
4. Output: Updated classifier \({\mathbb{C}}\) and updated class frequency matrix h.
Regarding the concrete rule evolution criterion (Step 2) and the update of the current rule(s) of the classifier [Step 3 (c)], any evolving fuzzy classification method from the literature can be used; see Lughofer (2011) for a survey. In all our experiments in Sect. 5, we used the FLEXFIS-Class approach, which is based on incremental vector quantization employing a vigilance concept; see Lughofer (2011) for details.
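A minimal sketch of the UpdateClassifier routine is given below, under simplifying assumptions: a Gaussian rule activation and a plain distance threshold as rule evolution criterion stand in for the actual FLEXFIS-Class criteria, and the antecedent update in the ELSE branch is omitted; all names and parameter values are hypothetical.

```python
import math

def update_classifier(rules, h, x, y, evolve_dist=2.0):
    """Sketch of Algorithm 3. rules: list of (center, spread) tuples;
    h: list of per-rule class-frequency dicts (one row per rule)."""
    def activation(center, spread, x):
        # Gaussian rule activation mu_i(x) (illustrative choice).
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        return math.exp(-d2 / (2 * spread ** 2))

    if not rules or min(math.dist(c, x) for c, _ in rules) > evolve_dist:
        # Step 2: evolve a new rule centered at the current sample,
        # with an initial spread, and append a frequency-matrix row.
        rules.append((list(x), 1.0))
        h.append({y: 1})
    else:
        # Step 3: winner rule i* = argmax_i mu_i(x), then increment
        # the corresponding class frequency entry h_{i*, y}.
        i_star = max(range(len(rules)),
                     key=lambda i: activation(rules[i][0], rules[i][1], x))
        h[i_star][y] = h[i_star].get(y, 0) + 1
        # (Rule antecedent update, e.g. center/spread adaptation, omitted.)
    return rules, h
```

Starting from an empty rule base, the first sample evolves a rule; a nearby sample only increments that rule's frequency row; a distant sample evolves a second rule, matching the IF/ELSE branches of Algorithm 3.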
Cite this article
Lughofer, E. Single-pass active learning with conflict and ignorance. Evolving Systems 3, 251–271 (2012). https://doi.org/10.1007/s12530-012-9060-7