Hypothesis testing for topological data analysis

Abstract

Persistent homology is a vital tool for topological data analysis. Previous work has developed some statistical estimators for characteristics of collections of persistence diagrams. However, tools that provide statistical inference for observations that are persistence diagrams are limited. Specifically, there is a need for tests that can assess the strength of evidence against a claim that two samples arise from the same population or process. This expository paper provides an introduction to randomization-style null hypothesis significance tests (NHST) and shows how they can be used with sets of persistence diagrams. The hypothesis test is based on a loss function that comprises pairwise distances between the elements of each sample and all the elements in the other sample. We use this method to analyze a range of simulated and experimental data. Through these examples we experimentally explore the power of the p-values. Our results show that the randomization-style NHST based on pairwise distances can distinguish between samples from different processes, which suggests that its use for hypothesis tests on persistence diagrams is reasonable. We demonstrate its application on a real dataset of fMRI data of patients with ADHD.
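To make the idea concrete, the following is a minimal sketch of a randomization-style test driven by a matrix of pairwise distances. It is an illustration under stated assumptions, not the authors' implementation: the function names, the use of the mean cross-sample distance as the loss, and the normalization are all hypothetical choices, and the toy data are plain numbers standing in for persistence diagrams (in practice the matrix would hold distances between diagrams, computed by whatever diagram metric is in use). The p-value adds one to both numerator and denominator so that it can never be zero, in the spirit of Phipson and Smyth (2010).

```python
import random


def cross_sample_loss(dist, group_a, group_b):
    """Mean distance between every item of group_a and every item of group_b.

    Assumption: this is one plausible reading of a loss built from
    cross-sample pairwise distances; the paper's exact form may differ.
    """
    total = sum(dist[i][j] for i in group_a for j in group_b)
    return total / (len(group_a) * len(group_b))


def randomization_test(dist, n_a, n_perms=999, seed=0):
    """Randomization test on a precomputed symmetric distance matrix.

    The first n_a rows/columns of dist belong to sample A, the rest to
    sample B. Returns the observed loss and a one-sided p-value.
    """
    indices = list(range(len(dist)))
    observed = cross_sample_loss(dist, indices[:n_a], indices[n_a:])

    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(n_perms):
        # Relabel the pooled observations at random, keeping group sizes.
        rng.shuffle(indices)
        stat = cross_sample_loss(dist, indices[:n_a], indices[n_a:])
        if stat >= observed:
            at_least_as_large += 1

    # Add-one correction: a permutation p-value should never be zero.
    p_value = (at_least_as_large + 1) / (n_perms + 1)
    return observed, p_value


# Toy stand-in data: real numbers in place of persistence diagrams,
# with absolute difference in place of a diagram distance.
sample_a = [0.0, 0.1, 0.2, 0.3, 0.4]
sample_b = [5.0, 5.1, 5.2, 5.3, 5.4]
pooled = sample_a + sample_b
dist = [[abs(x - y) for y in pooled] for x in pooled]

observed, p_value = randomization_test(dist, n_a=len(sample_a))
print(f"observed loss = {observed:.3f}, p-value = {p_value:.4f}")
```

Because the two toy samples are well separated, the observed cross-sample loss sits in the upper tail of the permutation distribution and the p-value is small. Working from a precomputed distance matrix keeps the test independent of how the distances between diagrams are obtained.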

Notes

  1. The reader should note that in Turner et al. (2014b) the focus is on the L1 distance.

References

  1. Baddeley, A., Silverman, B.: A cautionary example on the use of second-order methods for analyzing point patterns. Biometrics. 40(4), 1089–1093 (1984)

  2. Baddeley, A., Turner, R., et al.: Spatstat: an R package for analyzing spatial point patterns. J. Stat. Softw. 12(6), 1–42 (2005)

  3. Balakrishnan, S., Fasy, B., Lecci, F., Rinaldo, A., Singh, A., Wasserman, L.: Statistical inference for persistent homology. (2013). arXiv:1303.7117

  4. Bendich, P., Edelsbrunner, H., Kerber, M.: Computing robustness and persistence for images. Vis. Comput. Graph. IEEE Trans. 16(6), 1251–1260 (2010)

  5. Berger, J.: Could Fisher, Jeffreys and Neyman have agreed on testing? Stat. Sci. 18(1), 1–32 (2003)

  6. Biscio, C., Møller, J.: The accumulated persistence function, a new useful functional summary statistic for topological data analysis, with a view to brain artery trees and spatial point process applications. (2016). arXiv:1611.00630

  7. Bubenik, P.: Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res. 16(1), 77–102 (2015)

  8. Bubenik, P., Kim, P.T.: A statistical approach to persistent homology. Homol. Homotopy Appl. 9(2), 337–362 (2007)

  9. Casella, G., Berger, R.L.: Statistical Inference. Duxbury Press, Belmont (1990)

  10. Cericola, C., Johnson, I., Kiers, J., Krock, M., Purdy, J., Torrence, J.: Extending hypothesis testing with persistence homology to three or more groups. (2016). arXiv:1602.03760

  11. Cerri, A., Ferri, M., Giorgi, D.: Retrieval of trademark images by means of size functions. Graph. Models 68(5), 451–471 (2006)

  12. Chazal, F., Glisse, M., Labruère, C., Michel, B.: Optimal rates of convergence for persistence diagrams in topological data analysis. (2013). arXiv:1305.6239

  13. Edgington, E. S., Onghena, P.: Randomization Tests, 4th edn. Chapman & Hall/CRC, Boca Raton (2007)

  14. Ellis, S. P., Klein, A.: Describing high-order statistical dependence using “concurrence topology”, with application to functional MRI brain data. (2012). arXiv:1212.1642

  15. Gamble, J., Heo, G.: Exploring uses of persistent homology for statistical analysis of landmark-based shape data. J. Multivariate Anal. 101(9), 2184–2199 (2010)

  16. Gao, J. X.: Visionlab. WWW. (2004). http://visionlab.uta.edu/shape_data.htm

  17. Hatcher, A.: Algebraic topology. Cambridge University Press (2002)

  18. Latecki, L. J., Lakamper, R., Eckhardt, T.: Shape descriptors for non-rigid shapes with a single closed contour. In: Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on, vol. 1, pp. 424–429. IEEE (2000)

  19. Mileyko, Y., Mukherjee, S., Harer, J.: Probability measures on the space of persistence diagrams. Inverse Probl. 27(12), 124007 (2011)

  20. Pawitan, Y.: In All Likelihood: Statistical Modelling and Inference Using Likelihood. Clarendon Press, Oxford (2001)

  21. Phipson, B., Smyth, G.K.: Permutation p-values should never be zero: calculating exact p-values when permutations are randomly drawn. Stat. Appl. Genet. Mol. Biol. 9(1) (2010)

  22. Robins, V., Turner, K.: Principal component analysis of persistent homology rank functions with case studies of spatial point patterns, sphere packing and colloids. (2015). arXiv:1507.01454

  23. Sikora, T.: The MPEG-7 visual standard for content description-an overview. Circuits Syst. Video Technol. IEEE Trans. 11(6), 696–702 (2001)

  24. Turner, K.: Means and medians of sets of persistence diagrams. (2013). arXiv:1307.8300

  25. Turner, K., Mileyko, Y., Mukherjee, S., Harer, J.: Fréchet means for distributions of persistence diagrams. Discret. Comput. Geom. 52(1), 44–70 (2014a)

  26. Turner, K., Mukherjee, S., Boyer, D.M.: Persistent homology transform for modeling shapes and surfaces. Inf. Inference 3(4), 310–344 (2014b)

  27. Welsh, A.H.: Aspects of Statistical Inference. Wiley, New York (1996)

Acknowledgements

We thank Steve Ellis and Arno Klein for providing us with the persistence diagrams produced in their work. The authors would like to acknowledge the assistance of the Defence Science Institute in facilitating this work.

Author information

Corresponding author

Correspondence to Katharine Turner.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

About this article

Cite this article

Robinson, A., Turner, K. Hypothesis testing for topological data analysis. J Appl. and Comput. Topology 1, 241–261 (2017). https://doi.org/10.1007/s41468-017-0008-7

Keywords

  • Persistence diagram
  • Permutation test
  • Null hypothesis test
  • Topological data analysis

Mathematics Subject Classification

  • 62G10
  • 62G09
  • 55N35