Abstract
Although memory-based classifiers offer robust classification performance, their widespread use on embedded devices is hindered by the devices' limited memory. Moreover, embedded devices often operate in environments where the data evolves over time, which entails frequent updates to the in-memory training data. A viable way to deal with the memory constraint is to use Exemplar Learning (EL) schemes, which learn a small, highly informative subset of the training data (called the exemplar set) that fits in memory. However, traditional EL schemes have several drawbacks that make them inapplicable to embedded devices: (1) they have high memory overheads and cannot handle incremental updates to the exemplar set, (2) they cannot be customized to produce exemplar sets of an arbitrary user-defined size that fits in memory, and (3) they learn exemplar sets based on local neighborhood structures that do not offer robust classification performance. In this paper, we propose two novel EL schemes, \(\mathsf{EBEL}\) (Entropy-Based Exemplar Learning) and \(\mathsf{ABEL}\) (AUC-Based Exemplar Learning), that overcome these shortcomings. We show that our schemes efficiently incorporate new training data while maintaining high-quality exemplar sets of any user-defined size, and we present a comprehensive experimental analysis showing excellent classification-accuracy versus memory-usage tradeoffs for the proposed methods.
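To make the setting concrete, the sketch below illustrates one plausible shape of incremental, AUC-based exemplar selection in the spirit of \(\mathsf{ABEL}\); it is an illustrative assumption, not the paper's actual algorithm. It keeps a fixed budget of k exemplars, folds in each new training batch, and greedily discards the points whose removal costs the least validation AUC (computed via the Wilcoxon-Mann-Whitney statistic) under a 1-nearest-neighbor scorer. All function names (`knn_scores`, `auc`, `incremental_update`) are hypothetical.

```python
import numpy as np

def knn_scores(ex_X, ex_y, X):
    """Score each row of X by (distance to nearest negative exemplar)
    minus (distance to nearest positive exemplar); larger means the
    1-NN rule leans more strongly toward the positive class.
    Labels in ex_y are assumed to be 0/1."""
    d = np.linalg.norm(X[:, None, :] - ex_X[None, :, :], axis=2)
    return d[:, ex_y == 0].min(axis=1) - d[:, ex_y == 1].min(axis=1)

def auc(scores, y):
    """AUC via the Wilcoxon-Mann-Whitney statistic: the fraction of
    (positive, negative) pairs ranked correctly, counting ties as 1/2."""
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def incremental_update(ex_X, ex_y, new_X, new_y, k, val_X, val_y):
    """Merge a new batch into the current exemplar set, then greedily
    delete the point whose removal hurts validation AUC the least,
    until exactly k exemplars (the user-defined budget) remain.
    Assumes k >= 2 and that both classes are present."""
    X = np.vstack([ex_X, new_X])
    y = np.concatenate([ex_y, new_y])
    while len(y) > k:
        best_auc, best_i = -np.inf, None
        for i in range(len(y)):
            mask = np.ones(len(y), dtype=bool)
            mask[i] = False
            if y[mask].min() == y[mask].max():  # keep both classes represented
                continue
            a = auc(knn_scores(X[mask], y[mask], val_X), val_y)
            if a > best_auc:
                best_auc, best_i = a, i
        X = np.delete(X, best_i, axis=0)
        y = np.delete(y, best_i)
    return X, y
```

With a memory budget of, say, k = 20 exemplars, each arriving batch would be folded in via `ex_X, ex_y = incremental_update(ex_X, ex_y, batch_X, batch_y, 20, val_X, val_y)`. An entropy-based variant in the spirit of \(\mathsf{EBEL}\) would keep the same greedy loop and swap the AUC objective for an entropy criterion (e.g., one estimated with Parzen windows). This naive backward elimination costs O(n) AUC evaluations per deletion; the sketch is meant only to convey the interface of such a scheme: a fixed, user-defined budget k and batch-wise incremental updates.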
Additional information
Editors: Walter Daelemans, Bart Goethals, Katharina Morik.
Cite this article
Jain, A., Nikovski, D. Incremental exemplar learning schemes for classification on embedded devices. Mach Learn 72, 189–203 (2008). https://doi.org/10.1007/s10994-008-5067-5