Knowledge and Information Systems, Volume 60, Issue 2, pp 591–615

Survey of distance measures for quantifying concept drift and shift in numeric data

  • Igor Goldenberg
  • Geoffrey I. Webb
Survey Paper


The models in deployed machine learning systems are necessarily learned from historical data, but they are often applied to current data. When the world changes, the learned models can lose fidelity. Such changes to the statistical properties of data over time are known as concept drift. Similarly, models are often learned in one context but need to be applied in another; this is called concept shift. Quantifying the magnitude of drift or shift, especially for covariate drift or shift or in unsupervised learning, requires measures of distance between distributions. In this paper, we survey such distance measures with respect to their suitability for estimating the magnitude of drift and shift between samples of numeric data.
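As a concrete illustration of measuring the magnitude of drift between two samples of numeric data, here is a minimal Python sketch, not the authors' method. It bins a reference (historical) sample and a current sample on a shared equal-width grid and computes the Hellinger distance between the two histograms. The function names `histogram_proportions` and `hellinger_drift` and the default of 10 bins are illustrative assumptions, and the result depends on the binning chosen.

```python
import math
from typing import List, Sequence


def histogram_proportions(sample: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Fraction of `sample` falling in each of `bins` equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in sample:
        # Clamp so that x == hi lands in the last bin instead of overflowing.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(sample)
    return [c / n for c in counts]


def hellinger_drift(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Hellinger distance between the binned empirical distributions of two samples.

    Hypothetical helper (assumption: a shared equal-width grid over the pooled range).
    Returns a value in [0, 1]: 0 for identical histograms, 1 for disjoint bins.
    """
    lo = min(min(reference), min(current))
    hi = max(max(reference), max(current))
    if hi == lo:  # every value identical: histograms coincide, no measurable drift
        return 0.0
    p = histogram_proportions(reference, lo, hi, bins)
    q = histogram_proportions(current, lo, hi, bins)
    # Bhattacharyya coefficient: overlap between the two discrete distributions.
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # max() guards against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - bc))
```

For example, `hellinger_drift([0, 1, 2, 3, 4, 5], [3, 4, 5, 6, 7, 8])` places half of each sample's mass in shared bins, giving a distance of about 0.71. Histogram-based estimates are simple but sensitive to the number of bins, particularly in more than one dimension.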


Keywords: Multivariate concept drift · Mahalanobis distance · Hotelling distance · Hellinger distance · Kullback–Leibler divergence
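Several of the keyworded measures have closed forms when each sample is summarized by a fitted normal distribution. The sketch below is univariate for brevity and assumes both samples are approximately normal with nonzero variance and at least two points each. It computes the Mahalanobis distance between the means (using the pooled variance), the two-sample Hotelling T² statistic (which in one dimension equals the squared pooled t statistic), and the Hellinger distance and Kullback–Leibler divergence between the fitted normals. The function name `gaussian_drift_measures` is hypothetical; in the multivariate case the variances become covariance matrices.

```python
import math
from statistics import mean, variance
from typing import Dict, Sequence


def gaussian_drift_measures(a: Sequence[float], b: Sequence[float]) -> Dict[str, float]:
    """Distances between univariate normal distributions fitted to samples a and b.

    Hypothetical helper; assumes len(a), len(b) >= 2 and nonzero sample variances.
    """
    n1, n2 = len(a), len(b)
    m1, m2 = mean(a), mean(b)
    v1, v2 = variance(a), variance(b)  # unbiased sample variances
    diff2 = (m1 - m2) ** 2

    # Pooled variance, as used by the two-sample t / Hotelling statistic.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    mahalanobis = math.sqrt(diff2 / pooled)
    # Two-sample Hotelling T^2; in one dimension it equals the squared t statistic.
    hotelling_t2 = (n1 * n2 / (n1 + n2)) * diff2 / pooled

    # Closed-form Hellinger distance between N(m1, v1) and N(m2, v2).
    bc = math.sqrt(2 * math.sqrt(v1 * v2) / (v1 + v2)) * math.exp(-diff2 / (4 * (v1 + v2)))
    hellinger = math.sqrt(max(0.0, 1.0 - bc))

    # KL(N(m1, v1) || N(m2, v2)); asymmetric, so swapping a and b changes it.
    kl = 0.5 * (math.log(v2 / v1) + (v1 + diff2) / v2 - 1.0)

    return {
        "mahalanobis": mahalanobis,
        "hotelling_t2": hotelling_t2,
        "hellinger": hellinger,
        "kl_divergence": kl,
    }
```

With `a = [1, 2, 3, 4, 5]` and `b = [2, 3, 4, 5, 6]` (both with variance 2.5, means 3 and 4), T² is 1.0 and the KL divergence is 0.2. Note that the Mahalanobis distance and T² only respond to a change in mean, whereas the Hellinger and KL measures also respond to a change in variance.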



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. Faculty of Information Technology, Monash University, Clayton, Australia
