
Nonparametric Independence Tests: Space Partitioning and Kernel Approaches

  • Arthur Gretton
  • László Györfi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5254)

Abstract

Three simple and explicit procedures for testing the independence of two multi-dimensional random variables are described. Two of the associated test statistics (L1 and log-likelihood) are defined when the empirical distribution of the variables is restricted to finite partitions. A third test statistic is defined as a kernel-based independence measure. All three tests reject the null hypothesis of independence when the test statistic is large. The large deviation and limit distribution properties of all three statistics are given. These results yield distribution-free, strongly consistent tests of independence, as well as asymptotically α-level tests. The performance of the tests is evaluated experimentally on benchmark data.
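The abstract names the three statistics without giving formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. For scalar samples in [0, 1] it computes an L1 statistic and a log-likelihood statistic on a fixed product of equal-width cells, and, as the kernel-based measure, the biased empirical Hilbert-Schmidt Independence Criterion (HSIC) with Gaussian kernels. The exact formulas, the factor 2 in the log-likelihood statistic, the partition, the kernel and its bandwidth are all assumptions here, and the helper names `cell`, `partition_statistics`, `gaussian_gram` and `hsic_biased` are hypothetical.

```python
import math
import random
from collections import Counter


def cell(v, bins):
    """Index of the equal-width cell of [0, 1] containing v (assumes 0 <= v <= 1)."""
    return min(int(v * bins), bins - 1)  # v == 1.0 goes into the last cell


def partition_statistics(xs, ys, bins_x, bins_y):
    """Return (L1, log-likelihood) statistics on a fixed product partition.

    nu is the joint empirical distribution over cells A x B; mu1 and mu2 are
    the empirical marginals. Assumed forms (not taken from the paper's text):
      L1 = sum_{A,B} |nu(A x B) - mu1(A) mu2(B)|
      I  = 2 sum_{A,B} nu(A x B) log(nu(A x B) / (mu1(A) mu2(B)))
    """
    n = len(xs)
    a = [cell(x, bins_x) for x in xs]
    b = [cell(y, bins_y) for y in ys]
    joint, mu1, mu2 = Counter(zip(a, b)), Counter(a), Counter(b)

    l1 = 0.0
    for i in range(bins_x):
        for j in range(bins_y):
            nu = joint.get((i, j), 0) / n  # empty cells have frequency 0
            l1 += abs(nu - (mu1.get(i, 0) / n) * (mu2.get(j, 0) / n))

    ll = 0.0
    for (i, j), count in joint.items():  # empty cells contribute 0 log 0 = 0
        nu = count / n
        ll += nu * math.log(nu / ((mu1[i] / n) * (mu2[j] / n)))
    return l1, 2.0 * ll


def gaussian_gram(vs, sigma):
    """Gram matrix of the Gaussian kernel exp(-(u - v)^2 / (2 sigma^2))."""
    return [[math.exp(-(u - v) ** 2 / (2 * sigma ** 2)) for v in vs] for u in vs]


def hsic_biased(xs, ys, sigma_x=0.2, sigma_y=0.2):
    """Biased empirical HSIC, trace(K H L H) / n^2, with H = I - (1/n) 1 1^T."""
    n = len(xs)
    K = gaussian_gram(xs, sigma_x)
    L = gaussian_gram(ys, sigma_y)
    # (H K H)_ij = K_ij - mean_i - mean_j + grand mean, since K is symmetric
    means = [sum(row) / n for row in K]
    grand = sum(means) / n
    # trace(K H L H) = trace((H K H) L) = sum_ij (HKH)_ij L_ij, L symmetric
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += (K[i][j] - means[i] - means[j] + grand) * L[i][j]
    return total / n ** 2


if __name__ == "__main__":
    random.seed(0)
    n = 200
    x = [random.random() for _ in range(n)]
    # Dependent sample: y is x plus small noise, clipped back into [0, 1]
    y_dep = [min(max(v + random.gauss(0.0, 0.05), 0.0), 1.0) for v in x]
    y_ind = [random.random() for _ in range(n)]  # independent of x
    for name, y in [("dependent", y_dep), ("independent", y_ind)]:
        l1, ll = partition_statistics(x, y, 5, 5)
        print(f"{name}: L1={l1:.3f}  log-lik={ll:.3f}  HSIC={hsic_biased(x, y):.4f}")
```

In line with the abstract, larger values of each statistic point towards dependence. Turning a value into an accept/reject decision requires the thresholds obtained from the paper's large deviation or limit distribution results, which this sketch does not reproduce.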

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Arthur Gretton, MPI for Biological Cybernetics, Tübingen, Germany
  • László Györfi, Budapest University of Technology and Economics, Budapest, Hungary