Abstract
We introduce a new model for planar point processes, with the aim of capturing the structure of point interaction and spread in persistence diagrams. Persistence diagrams themselves are a key tool of topological data analysis (TDA), crucial for the delineation and estimation of global topological structure in large data sets. To a large extent, the statistical analysis of persistence diagrams has been hindered by difficulties in providing replications, a problem that was addressed in an earlier paper, which introduced a procedure called replicating statistical topology (RST). Here we significantly improve on the power of RST via the introduction of a more realistic class of models for the persistence diagrams. In addition, we introduce to TDA the idea of bagplotting, a powerful technique from non-parametric statistics well adapted for differentiating between topologically significant points and noise in persistence diagrams. Outside the setting of TDA, our model provides a setting for fashioning point processes, in any dimension, in which both local interactions between the points and global constraints on the overall shape of the point cloud are important and perhaps competing.
Notes
Originally, “al freír de los huevos lo verá” (you will see when the eggs are fried). See de Cervantes (1605).
References
Adcock, A., Carlsson, E., Carlsson, G.: The ring of algebraic functions on persistence bar codes. Homol. Homotopy Appl. 18(1), 381–402 (2016). https://doi.org/10.4310/HHA.2016.v18.n1.a21
Adler, R., Bobrowski, O., Weinberger, S.: Crackle: the homology of noise. Discrete Comput. Geom. 52, 680–704 (2014)
Adler, R.J., Taylor, J.E.: Random Fields and Geometry. Springer Monographs in Mathematics. Springer, New York (2007)
Adler, R.J., Taylor, J.E.: Topological Complexity of Smooth Random Functions, volume 2019 of Lecture Notes in Mathematics. Springer, Heidelberg (2011). ISBN 978-3-642-19579-2. Lectures from the 39th Probability Summer School held in Saint-Flour, 2009, École d’Été de Probabilités de Saint-Flour. [Saint-Flour Probability Summer School]
Adler, R.J., Taylor, J.E.: Applications of random fields and geometry: foundations and case studies. (2016). Early (but not always complete). https://robert.net.technion.ac.il/files/2016/08/hrf1.pdf
Adler, R.J., Agami, S., Pranav, P.: Modeling and replicating statistical topology and evidence for CMB nonhomogeneity. Proc Natl Acad Sci 114(45), 11878–11883 (2017)
Agami, S., Adler, R.J.: Modeling of persistent homology (2017). arXiv:1711.01570
Barbarossa, S., Tsitsvero, M.: An introduction to hypergraph signal processing. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6425–6429 (2016 March). https://doi.org/10.1109/ICASSP.2016.7472914
Bardeen, J., Bond, J., Kaiser, N., Szalay, A.: The statistics of peaks of Gaussian random fields. Astrophys. J. 304, 15–61 (1986)
Bendich, P., Marron, J.S., Miller, E., Pieloch, A., Skwerer, S.: Persistent homology analysis of brain artery trees. Ann. Appl. Stat. 10(1), 198–218 (2016). https://doi.org/10.1214/15-AOAS886
Besag, J.: Spatial interaction and the statistical analysis of lattice systems. J. R. Stat. Soc. Ser. B 36, 192–236 (1974). (With discussion by D. R. Cox, A. G. Hawkes, P. Clifford, P. Whittle, K. Ord, R. Mead, J. M. Hammersley, and M. S. Bartlett and with a reply by the author)
Bobrowski, O., Kahle, M.: Topology of random geometric complexes: a survey. J. Appl. Comput. Topol. 1(3), 331–364 (2018). https://doi.org/10.1007/s41468-017-0010-0
Bobrowski, O., Kahle, M., Skraba, P.: Maximally persistent cycles in random geometric complexes. Ann. Appl. Probab. 27(4), 2032–2060 (2017a). https://doi.org/10.1214/16-AAP1232
Bobrowski, O., Mukherjee, S., Taylor, J.E.: Topological consistency via kernel estimation. Bernoulli 23(1), 288–328 (2017b). https://doi.org/10.3150/15-BEJ744
Boissonnat, J.-D., Chazal, F., Yvinec, M.: Geometry and Topological Analysis. Cambridge University Press, Cambridge (2018)
Brooks, S., Gelman, A., Jones, G., Meng, X.-L.: Handbook of Markov Chain Monte Carlo. Chapman and Hall, Boca Raton (2011)
Bubenik, P.: Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res. 16, 77–102 (2015a)
Bubenik, P.: Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res. 16(1), 77–102 (2015b)
Burnham, K.P., Anderson, D.R.: Model Selection and Multimodel Inference. A Practical Information-Theoretic Approach, second edn. Springer, New York (2002)
Carlsson, G.: Topology and data. Bull. Am. Math. Soc. (N.S.) 46(2), 255–308 (2009)
Carlsson, G.: Topological pattern recognition for point cloud data. Acta Numer. 23, 289–368 (2014). https://doi.org/10.1017/S0962492914000051
Chalmond, B.: Modeling and Inverse Problems in Imaging Analysis, Volume 155 of Applied Mathematical Sciences. Springer, New York (2003). https://doi.org/10.1007/978-0-387-21662-1. (Translated from the French, With a foreword by Henri Maître)
Chazal, F., Fasy, B.T., Lecci, F., Michel, B., Rinaldo, A., Wasserman, L.: Robust topological inference: distance to a measure and kernel distance. J. Mach. Learn. Res. 18, 5845 (2017)
Cheng, D., Schwartzman, A.: Multiple testing of local maxima for detection of peaks in random fields. Ann. Stat. 45(2), 529–556 (2017). https://doi.org/10.1214/16-AOS1458
Cole, A., Shiu, G.: Persistent homology and non-Gaussianity. J. Cosmol. Astropart. Phys. 2018(03), 025 (2018)
Coles, P.: Statistical geometry and the microwave background. Mon. Not. R. Astron. Soc. 234(3), 509–531 (1988)
de Cervantes, M.: El ingenioso hidalgo don Quijote de la Mancha. Valladolid, Spain (1605)
Duong, T.: Kernel density estimation and kernel discriminant analysis for multivariate data in R. J. Stat. Softw. 21(7), 1–16 (2007). https://doi.org/10.18637/jss.v021.i07
Edelsbrunner, H.: A Short Course in Computational Geometry and Topology. Springer Briefs in Applied Sciences and Technology. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-05957-0
Edelsbrunner, H., Harer, J.: Persistent homology: a survey. In: Surveys on Discrete and Computational Geometry, Volume 453 of Contemporary Mathematics, pp. 257–282. American Mathematical Society, Providence, RI (2008). https://doi.org/10.1090/conm/453/08802
Edelsbrunner, H., Harer, J.: Computational Topology: An Introduction. American Mathematical Society, Providence, RI (2010)
Fasy, B.T., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S., Singh, A.: Confidence sets for persistence diagrams. Ann. Stat. 42(6), 2301–2339 (2014). https://doi.org/10.1214/14-AOS1252
Feldbrugge, J., van Engelen, M.: Analysis of Betti numbers and persistence diagrams of 2-dimensional Gaussian random fields. BSc. thesis, University of Groningen (2012)
Friston, K.J., Holmes, A.P., Worsley, K.J., Poline, J.-P., Frith, C.D., Frackowiak, R.S.: Statistical parametric maps in functional imaging: a general linear approach. Hum. Brain Mapp. 2(4), 189–210 (1994)
Ghrist, R.: Elementary Applied Topology. Createspace, Scotts Valley (2014)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer Series in Statistics. Data Mining, Inference, and Prediction, second edn. Springer, New York (2009). https://doi.org/10.1007/978-0-387-84858-7
Hough, J.B., Krishnapur, M., Peres, Y., Virág, B.: Zeros of Gaussian Analytic Functions and Determinantal Point Processes, Volume 51 of University Lecture Series. American Mathematical Society, Providence (2009)
Ji, C., Seymour, L.: A consistent model selection procedure for Markov random fields based on penalized pseudolikelihood. Ann. Appl. Probab. 6(2), 423–443 (1996). https://doi.org/10.1214/aoap/1034968138
Kahle, M.: Topology of random simplicial complexes: a survey. In: Algebraic Topology: Applications and New Directions, Volume 620 of Contemporary Mathematics, pp. 201–221. The American Mathematical Society, Providence, RI (2014). https://doi.org/10.1090/conm/620/12367
Liu, X., Zuo, Y.: CompPD: A MATLAB package for computing projection depth. J. Stat. Softw. 65(2), 1–21 (2015). https://doi.org/10.18637/jss.v065.i02
Hubert, M., Van der Veeken, S.: Outlier detection for skewed data. J. Chemom. 22(3–4), 235–246 (2008). https://doi.org/10.1002/cem.1123
Mileyko, Y., Mukherjee, S., Harer, J.: Probability measures on the space of persistence diagrams. Inverse Probl. 27(12), 124007 (2011). https://doi.org/10.1088/0266-5611/27/12/124007
Nicolau, M., Levine, A.J., Carlsson, G.: Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proc. Natl. Acad. Sci. 108(17), 7265–7270 (2011). https://doi.org/10.1073/pnas.1102826108
Oudot, S.: Persistence Theory: From Quiver Representations to Data Analysis, Volume 209 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI (2015). https://doi.org/10.1090/surv/209
Owada, T., Bobrowski, O.: Convergence of persistence diagrams for topological crackle. ArXiv e-prints (2018)
Petri, G., Expert, P., Turkheimer, F., Carhart-Harris, R., Nutt, D., Hellyer, P.J., Vaccarino, F.: Homological scaffolds of brain functional networks. J. R. Soc. Interface 11, 101 (2014). https://doi.org/10.1098/rsif.2014.0873
Pranav, P., Adler, R.J., Buchert, T., Edelsbrunner, H., Jones, B.J.T., Schwartzman, A., Wagner, H., van de Weygaert, R.: Unexpected topology of the temperature fluctuations in the cosmic microwave background. Astron. Astrophys. 627, A163 (2019). https://doi.org/10.1051/0004-6361/201834916
Robert, C.P., Casella, G.: Monte Carlo Statistical Methods. Springer Texts in Statistics, 2nd edn. Springer, New York (2004). https://doi.org/10.1007/978-1-4757-4145-2
Robinson, A., Turner, K.: Hypothesis testing for topological data analysis. ArXiv e-prints (2013 Oct)
Rousseeuw, P., Ruts, I., Tukey, J.: The bagplot: a bivariate boxplot. Am. Stat. 53, 382–387 (1999)
Schwartzman, A., Gavrilov, Y., Adler, R.J.: Multiple testing of local maxima for detection of peaks in 1D. Ann. Stat. 39(6), 3290–3319 (2011). https://doi.org/10.1214/11-AOS943
Serfling, R.: Depth functions in nonparametric multivariate inference. In: Data Depth: Robust Multivariate Analysis, Computational Geometry and Applications, Volume 72 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pp. 1–16. The American Mathematical Society, Providence, RI (2006)
Seymour, L., Ji, C.: Approximate Bayes model selection procedures for Gibbs–Markov random fields. J. Stat. Plan. Inference 51(1), 75–97 (1996). https://doi.org/10.1016/0378-3758(95)00071-2
Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Monographs on Statistics and Applied Probability. Chapman & Hall, London (1986). https://doi.org/10.1007/978-1-4899-3324-9
Sousbie, T.: The persistent cosmic web and its filamentary structure: I. theory and implementation. Mon. Not. R. Astron. Soc. 414(1), 350–383 (2011). https://doi.org/10.1111/j.1365-2966.2011.18394.x
Sousbie, T., Pichon, C., Kawahara, H.: The persistent cosmic web and its filamentary structure: II. illustrations. Mon. Not. R. Astron. Soc. 414(1), 384–403 (2011). https://doi.org/10.1111/j.1365-2966.2011.18395.x
Stoehr, J.: Statistical inference methods for Gibbs random fields. Theses, Université Montpellier (2015 Oct). https://hal.archives-ouvertes.fr/tel-01241085
Stoehr, J.: A review on statistical inference methods for discrete Markov random fields (2017). https://hal.archives-ouvertes.fr/hal-01462078v2
Tukey, J.W.: Mathematics and the picturing of data. In: Proceedings of the International Congress of Mathematicians (Vancouver, B. C., 1974), vol. 2, pp. 523–531. Canadian Mathematical Congress, Montreal, Quebec (1975)
Turner, K., Mileyko, Y., Mukherjee, S., Harer, J.: Fréchet means for distributions of persistence diagrams. Discrete Comput. Geom. 52(1), 44–70 (2014). https://doi.org/10.1007/s00454-014-9604-7
van de Weygaert, R., Vegter, G., Edelsbrunner, H., Jones, B.J.T., Pranav, P., Park, C., Hellwing, W.A., Eldering, B., Kruithof, N., Bos, E.G.P.P., Hidding, J., Feldbrugge, J., ten Have, E., van Engelen, M., Caroli, M., Teillaud, M.: Alpha, betti and the megaparsec universe: on the topology of the cosmic web. In: Gavrilova, M.L., Tan, C.K., Mostafavi, M.A. (eds.) Transactions on Computational Science XIV, pp. 60–101. Springer, Berlin (2011)
Wand, M.P., Jones, M.C.: Multivariate plug-in bandwidth selection. Comput. Stat. 9(2), 97–116 (1994)
Wasserman, L.: All of Statistics. Springer Texts in Statistics. A Concise Course in Statistical Inference. Springer, New York (2004). https://doi.org/10.1007/978-0-387-21736-9
Wasserman, L.: Topological data analysis. Ann. Rev. Stat. Appl. 5, 501–532 (2018). https://www.annualreviews.org/doi/10.1146/annurev-statistics-031017-100045
Zomorodian, A.: Topology for Computing, Volume 16 of Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge (2005). https://doi.org/10.1017/CBO9780511546945
Acknowledgements
We are grateful to Katherine Turner who, at a conference in Japan, suggested that we should be able to improve on the model of Adler et al. (2017) by incorporating information on the global shape of the persistence diagrams into the model. It was an insightful and useful suggestion. We are also indebted to the incisive comments of a referee, who asked a number of pointed questions and pinned us down on a number of questionable, or at least unproven, claims. As a result, this version of the paper is somewhat longer than the original one, but, hopefully, more precise and somewhat clearer as well.
Additional information
Robert J. Adler, Sarit Agami: Research supported in part by URSAT: Understanding Random Systems via Algebraic Topology, ERC Advanced Grant 320422 and Israel Science Foundation, Grant 2539/17.
Appendices
1.1 Simulating from \({\bar{f}}^G\)
While the MCMC calculations of the paper are well summarised in Algorithm 1 of Sect. 5.4, sampling from the distribution \({\bar{f}}^G(x)\) of (12) itself requires some care. While reasonably straightforward, there are numerical subtleties, and so for completeness we describe the full procedure here.
Recall that \({\bar{f}}^G(x)\) is just the kernel density estimate \({\hat{f}}^G\), restricted to \(\mathbb {R}\times \mathbb {R}_+\), and normalised. To sample from it, we first denote by R the smallest rectangular subset of the half plane which includes the set \(\{x\in \mathbb {R}\times \mathbb {R}_+: {\bar{f}}^G(x)>\varepsilon \}\), for some \(\varepsilon >0\) that is case specific. Then divide R into \(I_1\times I_2\) equal-sized rectangles \(I_{ij}\), where \(I_1\) and \(I_2\) are typically of the order of 100, but, again, case specific.
The second step involves assigning probabilities to these rectangles, which, a priori, could be done by integrating \({\bar{f}}^G\) over each one. However, noting that the original empirical density \({\hat{f}}^G\) comes from a Gaussian kernel, considerable computational time is saved by first defining its integrated version
where \(\Phi _\Sigma \) is the Gaussian (cumulative) distribution function corresponding to the Gaussian kernel in the definition of \({\hat{f}}^G\). Extend \({\hat{F}}^G\) to a measure on rectangles in the usual way, and define the probabilities (which now sum to 1)
By taking any linear enumeration of the indices (i, j) it is now trivial to choose a rectangle at random, according to these probabilities, by the inverse transform method [e.g. Robert and Casella (2004), Brooks et al. (2011)].
Having chosen a rectangle, we now choose a point uniformly, at random, from it. This is the value \(x^*\) taken for Step 3 of Algorithm 1.
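The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the paper: the kernel density estimate is taken to be an equal-weight mixture of Gaussian kernels with a diagonal bandwidth matrix (standard deviations `h1`, `h2`, both hypothetical names), so that the mass of each rectangle factorises into a product of one-dimensional Gaussian distribution function differences. The rectangle R is passed in by the caller as `bounds` rather than derived from a threshold \(\varepsilon \), and all function names are illustrative.

```python
import math
import random
from bisect import bisect_left
from itertools import accumulate


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cell_masses(points, h1, h2, x_edges, y_edges):
    """Mass the Gaussian KDE assigns to each grid rectangle, in row-major order.

    Assumption: equal-weight kernels with diagonal bandwidth (h1, h2), so the
    mass of a rectangle is a product of two 1D CDF differences per data point.
    """
    n = len(points)
    masses = []
    for i in range(len(x_edges) - 1):
        for j in range(len(y_edges) - 1):
            m = 0.0
            for a, b in points:
                px = norm_cdf((x_edges[i + 1] - a) / h1) - norm_cdf((x_edges[i] - a) / h1)
                py = norm_cdf((y_edges[j + 1] - b) / h2) - norm_cdf((y_edges[j] - b) / h2)
                m += px * py
            masses.append(m / n)
    return masses


def make_sampler(points, h1, h2, bounds, n1=100, n2=100, rng=None):
    """Return a function that draws one point per call from the gridded KDE."""
    rng = rng or random.Random()
    x_lo, x_hi, y_lo, y_hi = bounds  # y_lo >= 0 keeps samples in the half plane
    x_edges = [x_lo + (x_hi - x_lo) * i / n1 for i in range(n1 + 1)]
    y_edges = [y_lo + (y_hi - y_lo) * j / n2 for j in range(n2 + 1)]
    masses = cell_masses(points, h1, h2, x_edges, y_edges)
    total = sum(masses)
    # Normalise so the rectangle probabilities sum to 1, then accumulate them
    # along a linear enumeration of the index pairs (i, j).
    cdf = list(accumulate(m / total for m in masses))

    def draw():
        # Inverse transform: first cell whose cumulative probability reaches u.
        u = rng.random()
        k = min(bisect_left(cdf, u), len(cdf) - 1)
        i, j = divmod(k, n2)
        # Uniform draw inside the chosen rectangle.
        x = rng.uniform(x_edges[i], x_edges[i + 1])
        y = rng.uniform(y_edges[j], y_edges[j + 1])
        return (x, y)

    return draw
```

Calling `make_sampler(points, h1, h2, bounds)` once and then invoking the returned `draw()` repeatedly avoids recomputing the rectangle probabilities for every proposal. For a full (non-diagonal) bandwidth matrix the rectangle masses would instead need a bivariate Gaussian distribution function.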
1.2 The model of Adler et al. (2017)
The model originally developed in Adler et al. (2017), as with the one used in the current paper, was a Gibbs distribution, and so can be described through its Hamiltonian, as below, retaining the notation of Sect. 4. We shall do this only for projected persistence diagrams, so that each point x in the diagram is of the form \(x=(x^{(1)},x^{(2)})\in \mathbb {R}\times \mathbb {R}_+\).
Define
where \({\bar{x}}^{(1)}=N^{-1} \sum _{i=1}^N x_i^{(1)}\), so \(\sigma _H^2\) is the variance of the horizontal coordinates of the points. On the other hand, \(\sigma _V^2\) is the square of the \(L_2\) norm of the vertical coordinates, rather than the centred variance (because of the non-negativity of the \(x^{(2)}\)).
For integral \(K> 0\), a collection \(\Theta = (\theta _H,\theta _V,\theta _1,\ldots ,\theta _K)\) of \(\mathbb {R}\)-valued parameters, and a \(\delta >0\), define the Hamiltonian
With this Hamiltonian replacing the one defined by (8) the remaining development in Adler et al. (2017)—in particular that of an appropriate pseudo-likelihood model—is parallel to that in Sect. 4.
We note, though, the main differences between the models. The first is the parameter \(\delta \), which limits nearest neighbour interactions to those neighbours that are closer than \(\delta \). This had a mild numerically stabilising effect in the model defined by (15) which, for reasons that are not entirely clear, disappeared in the model of the current paper. Consequently, we no longer use it. The second is that the first two terms in the Hamiltonian, involving second moments, were intended to play the role that the empirical density \({\bar{f}}^G\) plays in the current paper; viz. they controlled the overall shape of the random diagrams, and worked “against” the control resulting from the nearest neighbour interactions. However, as shown by most of the examples in Sect. 6 (in particular the Gaussian excursion set and non-concentric circles examples), these terms were not able to capture many of the subtleties found in persistence diagrams. Furthermore, as the MCMC simulations progressed, the simulated diagrams had a tendency to move towards the diagonal, in a fashion that was inconsistent with their overall use.
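To make the ingredients described above concrete, the following Python sketch assembles a Hamiltonian of this general type. It is an assumed form for illustration only and should not be read as the exact expression of Adler et al. (2017): the second moment terms are both normalised by N, the k-th interaction term is taken to be the sum, over points, of the distance to the k-th nearest neighbour, counted only when that distance is less than \(\delta \), and all function names are hypothetical.

```python
import math


def second_moments(points):
    """Horizontal variance and (uncentred) vertical mean square of the points.

    Assumption: both quantities are normalised by the number of points N.
    """
    n = len(points)
    mean_h = sum(p[0] for p in points) / n
    var_h = sum((p[0] - mean_h) ** 2 for p in points) / n
    msq_v = sum(p[1] ** 2 for p in points) / n
    return var_h, msq_v


def knn_term(points, k, delta):
    """Sum of k-th nearest neighbour distances, keeping only those below delta."""
    total = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        if k <= len(dists) and dists[k - 1] < delta:
            total += dists[k - 1]
    return total


def hamiltonian(points, theta_h, theta_v, thetas, delta):
    """Assumed form: theta_H * sigma_H^2 + theta_V * sigma_V^2 + sum_k theta_k * L_k."""
    var_h, msq_v = second_moments(points)
    value = theta_h * var_h + theta_v * msq_v
    for k, theta_k in enumerate(thetas, start=1):
        value += theta_k * knn_term(points, k, delta)
    return value
```

In this sketch the two second moment terms act on the cloud as a whole, while the `knn_term` contributions depend only on local configurations, which is the competition between global shape and local interaction discussed above.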
Cite this article
Adler, R.J., Agami, S. Modelling persistence diagrams with planar point processes, and revealing topology with bagplots. J Appl. and Comput. Topology 3, 139–183 (2019). https://doi.org/10.1007/s41468-019-00035-w
DOI: https://doi.org/10.1007/s41468-019-00035-w
Keywords
- Applied topology
- Persistence diagram
- Random fields
- Gibbs distribution
- Topological inference
- Replicating statistical topology
- Bagplots