
Modelling persistence diagrams with planar point processes, and revealing topology with bagplots

Journal of Applied and Computational Topology

Abstract

We introduce a new model for planar point processes, with the aim of capturing the structure of point interaction and spread in persistence diagrams. Persistence diagrams themselves are a key tool of topological data analysis (TDA), crucial for the delineation and estimation of global topological structure in large data sets. To a large extent, the statistical analysis of persistence diagrams has been hindered by difficulties in providing replications, a problem that was addressed in an earlier paper, which introduced a procedure called replicating statistical topology (RST). Here we significantly improve on the power of RST via the introduction of a more realistic class of models for the persistence diagrams. In addition, we introduce to TDA the idea of bagplotting, a powerful technique from non-parametric statistics that is well adapted to differentiating between topologically significant points and noise in persistence diagrams. Outside the setting of TDA, our model provides a framework for fashioning point processes, in any dimension, in which both local interactions between the points and global constraints on the overall shape of the point cloud are important, and perhaps competing.




Notes

  1. Originally, “al freír de los huevos lo verá” (you will see when the eggs are fried). See de Cervantes (1605).

References


Acknowledgements

We are grateful to Katherine Turner who, at a conference in Japan, suggested that we should be able to improve on the model of Adler et al. (2017) by incorporating information on the global shape of the persistence diagrams into the model. It was an insightful and useful suggestion. We are also indebted to the incisive comments of a referee, who asked a number of pointed questions and pinned us down on a number of questionable, or at least unproven, claims. As a result, this version of the paper is somewhat longer than the original one, but, hopefully, more precise and somewhat clearer as well.

Author information

Corresponding author

Correspondence to Robert J. Adler.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Robert J. Adler, Sarit Agami: Research supported in part by URSAT: Understanding Random Systems via Algebraic Topology, ERC Advanced Grant 320422 and Israel Science Foundation, Grant 2539/17.

Appendices


1.1 Simulating from \({\bar{f}}^G\)

The MCMC calculations of the paper are summarised in Algorithm 1 of Sect. 5.4, but sampling from the distribution \({\bar{f}}^G(x)\) of (12) itself requires some care. Although the procedure is reasonably straightforward, it involves some numerical subtleties, so for completeness we describe it in full here.

Recall that \({\bar{f}}^G(x)\) is just the kernel density estimate \({\hat{f}}^G\), restricted to \(\mathbb {R}\times \mathbb {R}_+\), and normalised. To sample from it, we first denote by R the smallest rectangular subset of the half plane that contains the set \(\{x\in \mathbb {R}\times \mathbb {R}_+: {\bar{f}}^G(x)>\varepsilon \}\), for some case-specific \(\varepsilon >0\). We then divide R into \(I_1\times I_2\) equal-sized rectangles \(I_{ij}\), where \(I_1\) and \(I_2\) are typically of the order of 100 but, again, case specific.
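The first step can be sketched in Python as follows. This is a minimal illustration under assumptions not made in the text: the Gaussian kernel of \({\hat{f}}^G\) is taken to have a diagonal covariance with standard deviations `s1` and `s2`, the set \(\{{\bar{f}}^G>\varepsilon \}\) is located approximately by scanning a finite evaluation grid inside a user-supplied search window, and all function names are hypothetical.

```python
import math


def kde_unnormalised(x, points, s1, s2):
    """hat f^G at x = (x1, x2), ASSUMING a diagonal kernel covariance diag(s1^2, s2^2)."""
    total = 0.0
    for p1, p2 in points:
        z1 = (x[0] - p1) / s1
        z2 = (x[1] - p2) / s2
        total += math.exp(-0.5 * (z1 * z1 + z2 * z2))
    return total / (len(points) * 2.0 * math.pi * s1 * s2)


def half_plane_mass(points, s2):
    """Mass of hat f^G on R x R_+, i.e. the constant turning hat f^G into bar f^G."""
    # With a diagonal kernel, P(second coordinate >= 0) = Phi(p2 / s2) for each kernel.
    return sum(0.5 * (1.0 + math.erf(p2 / (s2 * math.sqrt(2.0))))
               for _, p2 in points) / len(points)


def bounding_rectangle(points, s1, s2, eps, box, m=200):
    """Approximate the smallest rectangle R containing {x : bar f^G(x) > eps}.

    `box` = (x1_min, x1_max, x2_max) is a hypothetical search window, scanned on an
    (m + 1) x (m + 1) grid; the result is accurate only to within one grid step.
    """
    x1_min, x1_max, x2_max = box
    z = half_plane_mass(points, s2)
    hits = []
    for a in range(m + 1):
        u = x1_min + (x1_max - x1_min) * a / m
        for b in range(m + 1):
            v = x2_max * b / m  # only the half plane x2 >= 0 is scanned
            if kde_unnormalised((u, v), points, s1, s2) / z > eps:
                hits.append((u, v))
    if not hits:
        raise ValueError("eps too large: no grid point exceeds it")
    us = [h[0] for h in hits]
    vs = [h[1] for h in hits]
    return min(us), max(us), min(vs), max(vs)


def grid_edges(lo, hi, count):
    """Edges of `count` equal-width cells covering [lo, hi] (count + 1 values)."""
    return [lo + (hi - lo) * k / count for k in range(count + 1)]
```

For example, `bounding_rectangle(pts, 0.3, 0.3, 1e-3, (-2.0, 4.0, 4.0))` followed by `grid_edges` on each side yields the edges of the \(I_1\times I_2\) cells.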

The second step involves assigning probabilities to these rectangles, which, a priori, could be done by integrating \({\bar{f}}^G\) over each one. However, noting that the original empirical density \({\hat{f}}^G\) comes from a Gaussian kernel, considerable computational time is saved by first defining its integrated version

$$\begin{aligned} {\hat{F}}^G (x) \ {\mathop {=}\limits ^{\Delta }}\ \frac{1}{n} \sum _{i=1}^{n}\Phi _{\Sigma } (x-x_{i}), \end{aligned}$$

where \(\Phi _\Sigma \) is the Gaussian (cumulative) distribution function corresponding to the Gaussian kernel in the definition of \({\hat{f}}^G\). Extend \({\hat{F}}^G\) to a measure on rectangles in the usual way, and define the probabilities (which now sum to 1)

$$\begin{aligned} p_{ij} \ = \ \frac{{\hat{F}}^G(I_{ij})}{\sum _{k=1}^{I_1}\sum _{l=1}^{I_2}{\hat{F}}^G(I_{kl})}. \end{aligned}$$
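For a Gaussian kernel the rectangle masses \({\hat{F}}^G(I_{ij})\) can be computed in closed form from normal distribution functions, which is the point of introducing \({\hat{F}}^G\). The sketch below assumes, purely for illustration, a diagonal \(\Sigma \) with standard deviations `s1` and `s2`, so that each kernel's rectangle probability factorises into a product of one-dimensional normal probabilities; a general \(\Sigma \) would instead need a bivariate normal distribution function combined by inclusion-exclusion over the four corners. Function names are hypothetical.

```python
import math


def std_normal_cdf(t):
    """Standard normal distribution function Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def rect_mass(points, s1, s2, a1, b1, a2, b2):
    """hat F^G([a1, b1] x [a2, b2]), ASSUMING Sigma = diag(s1^2, s2^2).

    With a diagonal Sigma each kernel's rectangle probability is a product of two
    one-dimensional normal probabilities, so no 2-D CDF is needed.
    """
    total = 0.0
    for p1, p2 in points:
        m1 = std_normal_cdf((b1 - p1) / s1) - std_normal_cdf((a1 - p1) / s1)
        m2 = std_normal_cdf((b2 - p2) / s2) - std_normal_cdf((a2 - p2) / s2)
        total += m1 * m2
    return total / len(points)


def cell_probabilities(points, s1, s2, edges1, edges2):
    """p[i][j] = hat F^G(I_ij) / sum_{k,l} hat F^G(I_kl) over the grid given by its edges."""
    masses = [[rect_mass(points, s1, s2,
                         edges1[i], edges1[i + 1], edges2[j], edges2[j + 1])
               for j in range(len(edges2) - 1)]
              for i in range(len(edges1) - 1)]
    total = sum(sum(row) for row in masses)
    return [[m / total for m in row] for row in masses]
```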

By taking any linear enumeration of the indices \((i,j)\), it is now trivial to choose a rectangle at random, according to these probabilities, by the inverse transform method [e.g. Robert and Casella (2004), Brooks et al. (2011)].
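The inverse transform step over a row-major enumeration \(k = iI_2 + j\) might look as follows (a hypothetical helper, with indices starting at 0 as usual in Python): accumulate the probabilities, draw a uniform variate, and locate the first cumulative sum that exceeds it.

```python
import bisect
import itertools
import random


def choose_cell(p, rng):
    """Pick (i, j) with probability p[i][j] by inverse transform on a row-major enumeration."""
    flat = [pij for row in p for pij in row]   # linear enumeration k = i * I2 + j
    cdf = list(itertools.accumulate(flat))
    u = rng.random() * cdf[-1]                 # scaling guards against sum(p) drifting from 1
    k = bisect.bisect_right(cdf, u)            # first k with cdf[k] > u
    k = min(k, len(flat) - 1)                  # defensive clamp at the upper end
    return divmod(k, len(p[0]))                # back to (i, j)
```

Using `bisect_right` means cells with zero probability are never selected, since their cumulative sum never strictly exceeds the uniform draw.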

Having chosen a rectangle, we now choose a point uniformly at random from it. This is the value \(x^*\) taken for Step 3 of Algorithm 1.
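Combining the two random choices, proposals \(x^*\) can be drawn as in the sketch below. Here Python's `random.Random.choices`, which samples from cumulative weights, stands in for the explicit inverse transform; the function name and the representation of the grid by its cell edges are assumptions made for illustration.

```python
import random


def sample_x_star(p, edges1, edges2, rng, size=1):
    """Draw `size` proposals x*: pick cell I_ij with probability p[i][j], then a point
    uniformly at random inside [edges1[i], edges1[i+1]] x [edges2[j], edges2[j+1]]."""
    n_cols = len(p[0])
    flat = [pij for row in p for pij in row]   # row-major enumeration of the cells
    out = []
    for k in rng.choices(range(len(flat)), weights=flat, k=size):
        i, j = divmod(k, n_cols)
        out.append((rng.uniform(edges1[i], edges1[i + 1]),
                    rng.uniform(edges2[j], edges2[j + 1])))
    return out
```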

1.2 The model of Adler et al. (2017)

The model originally developed in Adler et al. (2017), like the one used in the current paper, was a Gibbs distribution, and so can be described through its Hamiltonian, as below, retaining the notation of Sect. 4. We shall do this only for projected persistence diagrams, so that each point x in the diagram is of the form \(x=(x^{(1)},x^{(2)})\in \mathbb {R}\times \mathbb {R}_+\).

Define

$$\begin{aligned} \sigma _H^2 = \sum _{x\in {{\tilde{x}}}_N} \big (x^{(1)} - {\bar{x}}^{(1)}\big )^2,\ \ \ \sigma _V^2 = \sum _{x\in {{\tilde{x}}}_N} \big (x^{(2)} \big )^2, \end{aligned}$$

where \({\bar{x}}^{(1)}=N^{-1} \sum _{i=1}^N x_i^{(1)}\), so that \(\sigma _H^2\) is, up to the factor N, the empirical variance of the horizontal coordinates. On the other hand, \(\sigma _V^2\) is the square of the \(L_2\) norm of the vertical coordinates, rather than their centred variance (because the \(x^{(2)}\) are non-negative).
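In code, the two second moments are simple sums. The sketch below (hypothetical function name, diagram given as a list of \((x^{(1)},x^{(2)})\) pairs) makes explicit that neither is divided by \(N\) and that only \(\sigma _H^2\) is centred.

```python
def second_moments(diagram):
    """Return (sigma_H^2, sigma_V^2) for a projected diagram of (x1, x2) pairs, x2 >= 0."""
    n = len(diagram)
    mean1 = sum(x1 for x1, _ in diagram) / n
    sigma_h2 = sum((x1 - mean1) ** 2 for x1, _ in diagram)  # centred, not divided by N
    sigma_v2 = sum(x2 ** 2 for _, x2 in diagram)            # uncentred, since x2 >= 0
    return sigma_h2, sigma_v2
```

For instance, `second_moments([(0.0, 1.0), (2.0, 3.0)])` returns `(2.0, 10.0)`.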

For an integer \(K> 0\), a collection \(\Theta = (\theta _H,\theta _V,\theta _1,\ldots ,\theta _K)\) of \(\mathbb {R}\)-valued parameters, and a \(\delta >0\), define the Hamiltonian

$$\begin{aligned} H_{\delta ,\Theta }^K({{\tilde{x}}}_N) &= \theta _H \sigma ^2_H +\theta _V \sigma ^2_V \\ &\quad + \delta ^{-2} \sum _{k=1}^K \theta _k \sum _{i=1}^N \sum _{z\in {\mathcal N}_k(x_i)} \Vert z-x_i\Vert \, \mathbb {1}_{\{\Vert z-x_i \Vert \le \delta \}}. \end{aligned} \tag{15}$$
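A direct evaluation of (15) is sketched below. Since \({\mathcal N}_k(x_i)\) is defined in Sect. 4 rather than here, the sketch assumes that \({\mathcal N}_k(x_i)\) contains just the k-th nearest neighbour of \(x_i\) among the other points of the diagram; the function name is hypothetical, and the \(O(N^2\log N)\) neighbour search is chosen for clarity, not speed.

```python
import math


def hamiltonian_15(diagram, theta_h, theta_v, thetas, delta):
    """Evaluate H^K_{delta,Theta} of (15) for a projected diagram of (x1, x2) pairs.

    ASSUMPTION: N_k(x_i) = {k-th nearest neighbour of x_i}; thetas = [theta_1, ..., theta_K].
    """
    n = len(diagram)
    mean1 = sum(p[0] for p in diagram) / n
    sigma_h2 = sum((p[0] - mean1) ** 2 for p in diagram)
    sigma_v2 = sum(p[1] ** 2 for p in diagram)
    interaction = 0.0
    for i, xi in enumerate(diagram):
        # Sorted distances to all other points: entry k - 1 is the k-th nearest neighbour.
        dists = sorted(math.dist(xi, z) for j, z in enumerate(diagram) if j != i)
        for k, theta_k in enumerate(thetas, start=1):
            # Only neighbours within distance delta contribute (the indicator in (15)).
            if k <= len(dists) and dists[k - 1] <= delta:
                interaction += theta_k * dists[k - 1]
    return theta_h * sigma_h2 + theta_v * sigma_v2 + interaction / delta ** 2
```

For the three points \((0,0)\), \((3,0)\), \((0,4)\) with \(K=1\), \(\theta _1=1\) and \(\delta =10\), the interaction term is \((3+3+4)/100=0.1\).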

With this Hamiltonian replacing the one defined by (8), the remaining development in Adler et al. (2017), in particular that of an appropriate pseudo-likelihood model, is parallel to that in Sect. 4.

We note, however, the main differences between the models. The first is the parameter \(\delta \), which restricts nearest neighbour interactions to those neighbours that are closer than \(\delta \). This had a mild numerically stabilising effect in the model defined by (15) which, for reasons that are not entirely clear, disappeared in the model of the current paper. Consequently, we no longer use it. The second is that the first two terms in the Hamiltonian, involving second moments, were intended to play the role that the empirical density \({\bar{f}}^G\) plays in the current paper; viz. they controlled the overall shape of the random diagrams, and worked “against” the control resulting from the nearest neighbour interactions. However, as shown by most of the examples in Sect. 6 (in particular the Gaussian excursion set and non-concentric circles examples), these terms were not able to capture many of the subtleties found in persistence diagrams. Furthermore, as the MCMC simulations progressed, the simulated diagrams had a tendency to move towards the diagonal, in a fashion that was inconsistent with their overall use.


About this article


Cite this article

Adler, R.J., Agami, S. Modelling persistence diagrams with planar point processes, and revealing topology with bagplots. J Appl. and Comput. Topology 3, 139–183 (2019). https://doi.org/10.1007/s41468-019-00035-w



  • DOI: https://doi.org/10.1007/s41468-019-00035-w
