Controlled Permutations for Testing Adaptive Classifiers

  • Indrė Žliobaitė
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6926)

Abstract

We study the evaluation of online classifiers that are designed to adapt to changes in the data distribution over time (concept drift). A standard procedure for evaluating such classifiers is test-then-train, which iteratively uses each incoming instance first for testing and then for updating the classifier. Comparing classifiers on the basis of such a test risks giving biased results, since the dataset is processed only once, in a fixed sequential order. Such a test shows how well classifiers adapt when changes happen at fixed time points, whereas the ultimate goal is to assess how well they would adapt when changes of a similar type happen unexpectedly. To reduce the risk of biased evaluation, we propose running multiple tests with permuted data. A random permutation is not suitable, as it makes the data distribution uniform over time and destroys the adaptive learning problem. We develop three permutation techniques with theoretical control mechanisms that ensure that the different distributions in the data are preserved while the data order is perturbed. The idea is to manipulate blocks of data, keeping individual instances close together. Our permutations reduce the risk of biased evaluation by making it possible to analyze the sensitivity of classifiers to variations in the data order.

Keywords

Concept Drift, Evaluation Bias, Original Order, Data Order, Machine Learn Research
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Indrė Žliobaitė (1)
  1. Smart Technology Research Center, Bournemouth University, Poole, UK
