Abstract
Identifying and representing clusters in time-varying network data is of particular importance when studying collective behaviors emerging in nature, in mobile device networks or in social networks. Based on combinatorial, categorical, and persistence theoretic viewpoints, we establish a stable functorial pipeline for the summarization of the evolution of clusters in a time-varying network. We first construct a complete summary of the evolution of clusters in a given time-varying network over a set of entities X, which takes the form of a formigram. This formigram can be understood as a certain Reeb graph \(\mathcal {R}\) which is labeled by subsets of X. By applying Möbius inversion to the formigram in two different manners, we obtain two dual notions of diagram: the maximal group diagram and the persistence clustergram, both of which are in the form of an ‘annotated’ barcode. The maximal group diagram consists of time intervals annotated by their corresponding maximal groups (a notion due to Buchin et al.), implying that we recognize the notion of maximal groups as a special instance of the generalized persistence diagram of Patel. On the other hand, the persistence clustergram is mostly obtained by annotating the intervals in the zigzag barcode of the Reeb graph \(\mathcal {R}\) with certain merging/disbanding events in the given time-varying network. We show that both diagrams are complete invariants of formigrams (or equivalently of the trajectory grouping structure of Buchin et al.) and thus contain more information than the Reeb graph \(\mathcal {R}\).
Data Availability
Data sharing not applicable to this article.
Notes
The function c associated to F is not unique and thus we refer to \(\textrm{im}(c)\) as ‘a’ set of critical points and not as ‘the’ set of critical points.
For instance, reflexivity follows since the identity tripod \(X\xleftarrow {\textrm{id}_X} X\xrightarrow {\textrm{id}_X} X\) is a 0-tripod between \(\mathcal {G}_X\) and \(\mathcal {G}_X\).
Silhouettes in this paper have no relation with the persistence silhouettes in [24].
In this section groups do not stand for groups in abstract algebra. See Remark 6.18.
In the preprint of this work, we also considered formigrams derived from directed dynamic graphs [56].
References
Adams, H., Ghosh, D., Mask, C., Ott, W., Williams, K.: Efficient evader detection in mobile sensor networks. arXiv preprint arXiv:2101.09813 (2021)
Adams, H., Carlsson, G.: Evasion paths in mobile sensor networks. Int. J. Robot. Res. 34(1), 90–104 (2015)
Azumaya, G., et al.: Corrections and supplementaries to my paper concerning Krull-Remak-Schmidt’s theorem. Nagoya Math. J. 1, 117–124 (1950)
Bauer, U., Ge, X., Wang, Y.: Measuring distance between Reeb graphs. In: Proceedings of the Thirtieth Annual Symposium on Computational Geometry, pp. 464–473 (2014)
Bauer, U., Munch, E., Wang, Y.: Strong Equivalence of the Interleaving and Functional Distortion Metrics for Reeb Graphs. In: Arge, L., Pach, J. (eds.) 31st International Symposium on Computational Geometry (SoCG 2015). Leibniz International Proceedings in Informatics (LIPIcs), vol. 34, pp. 461–475. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2015). https://doi.org/10.4230/LIPIcs.SOCG.2015.461. http://drops.dagstuhl.de/opus/volltexte/2015/5146
Bauer, U., Lesnick, M.: Induced matchings and the algebraic stability of persistence barcodes. J. Comput. Geom. 6(2), 162–191 (2015)
Benkert, M., Gudmundsson, J., Hübner, F., Wolle, T.: Reporting flock patterns. Comput. Geom. 41(3), 111–125 (2008)
Birkhoff, G.: Lattice Theory, vol. 25. American Mathematical Society, Providence (1948)
Bjerkevik, H.B.: On the stability of interval decomposable persistence modules. Discrete Comput. Geom. 66(1), 92–121 (2021)
Bondy, J., Murty, U.: Graph Theory (Graduate Texts in Mathematics). Springer, New York (2008)
Botnan, M.B.: Interval decomposition of infinite zigzag persistence modules. Proc. Am. Math. Soc. 145(8), 3571–3577 (2017)
Botnan, M., Lesnick, M.: Algebraic stability of zigzag persistence modules. Algebraic Geom. Topol. 18(6), 3133–3204 (2018)
Bredon, G.E.: Sheaf Theory, vol. 170. Springer, New York (2012)
Bubenik, P., Scott, J.A.: Categorification of persistent homology. Discrete Comput. Geom. 51(3), 600–627 (2014)
Buchin, K., Buchin, M., van Kreveld, M.J., Speckmann, B., Staals, F.: Trajectory grouping structure. JoCG 6(1), 75–98 (2015)
Burago, D., Burago, Y., Ivanov, S.: A Course in Metric Geometry. AMS Graduate Studies in Math., vol. 33. American Mathematical Society, Providence (2001)
Carlsson, G., Mémoli, F.: Multiparameter hierarchical clustering methods. In: Classification as a Tool for Research, pp. 63–70. Springer (2010)
Carlsson, G.: Topology and data. Bull. Am. Math. Soc. 46(2), 255–308 (2009)
Carlsson, G., Mémoli, F.: Characterization, stability and convergence of hierarchical clustering methods. J. Mach. Learn. Res. 11, 1425–1470 (2010)
Carlsson, G., Mémoli, F.: Classifying clustering schemes. Found. Comput. Math. 13(2), 221–252 (2013)
Carlsson, G., De Silva, V.: Zigzag persistence. Found. Comput. Math. 10(4), 367–405 (2010)
Carlsson, G., Zomorodian, A.: The theory of multidimensional persistence. Discrete Comput. Geom. 42(1), 71–93 (2009)
Chazal, F., Cohen-Steiner, D., Glisse, M., Guibas, L.J., Oudot, S.: Proximity of persistence modules and their diagrams. In: Proceedings of 25th ACM Symposium on Computational Geometry, pp. 237–246 (2009)
Chazal, F., Fasy, B.T., Lecci, F., Rinaldo, A., Wasserman, L.: Stochastic convergence of persistence landscapes and silhouettes. In: Proceedings of the Thirtieth Annual Symposium on Computational Geometry, pp. 474–483 (2014)
Chowdhury, S., Mémoli, F.: Explicit geodesics in Gromov-Hausdorff space. Electron. Res. Announc. 25, 48–59 (2018)
Clause, N., Kim, W.: Spatiotemporal Persistent Homology Computation Tool. https://github.com/ndag/PHoDMSs (2020)
Clause, N.: Zigzag Persistent Homology and Dynamic Networks. https://github.com/ndag/DynGraphZZ (2021)
Cohen-Steiner, D., Edelsbrunner, H., Harer, J.: Stability of persistence diagrams. Discrete Comput. Geom. 37(1), 103–120 (2007)
Curry, J.M.: Sheaves, cosheaves and applications. PhD thesis, University of Pennsylvania (2014)
Curry, J., Patel, A.: Classification of constructible cosheaves. Theory Appl. Categories 35(27), 1012–1047 (2020)
De Silva, V., Ghrist, R.: Coordinate-free coverage in sensor networks with controlled boundaries via homology. Int. J. Robot. Res. 25(12), 1205–1222 (2006)
De Silva, V., Munch, E., Patel, A.: Categorified Reeb graphs. Discrete Comput. Geom. 55(4), 854–906 (2016)
De Silva, V., Ghrist, R., et al.: Homological sensor networks. Notices Am. Math. Soc. 54(1) (2007)
Dey, T.K., Hou, T.: Computing zigzag persistence on graphs in near-linear time. In: 37th International Symposium on Computational Geometry (SoCG 2021). Leibniz International Proceedings in Informatics (LIPIcs), vol. 189, pp. 30:1–30:15. Schloss Dagstuhl–Leibniz-Zentrum für Informatik (2021). https://doi.org/10.4230/LIPIcs.SoCG.2021.30
Dey, T.K., Hou, T.: Updating zigzag persistence and maintaining representatives over changing filtrations. arXiv preprint arXiv:2112.02352 (2021)
Di Fabio, B., Landi, C.: The edit distance for Reeb graphs of surfaces. Discrete Comput. Geom. 55(2), 423–461 (2016)
Edelsbrunner, H., Harer, J.: Persistent homology—a survey. Contemp. Math. 453, 257–282 (2008)
Edelsbrunner, H., Letscher, D., Zomorodian, A.: Topological persistence and simplification. Discrete Comput. Geom. 28, 511–533 (2002)
Gabriel, P.: Unzerlegbare Darstellungen I. Manuscr. Math. 6(1), 71–103 (1972)
Gamble, J., Chintakunta, H., Krim, H.: Applied topology in static and dynamic sensor networks. In: 2012 International Conference on Signal Processing and Communications (SPCOM), pp. 1–5 (2012). IEEE
Ghrist, R., Riess, H.: Cellular sheaves of lattices and the Tarski Laplacian. Homol. Homotopy Appl. 24(1), 325–345 (2022)
Gonzalez-Diaz, R., Jimenez, M.-J., Medrano, B.: Spatiotemporal barcodes for image sequence analysis. In: International Workshop on Combinatorial Image Analysis, pp. 61–70 (2015). Springer
Gudmundsson, J., van Kreveld, M.: Computing longest duration flocks in trajectory data. In: Proceedings of the 14th Annual ACM International Symposium on Advances in Geographic Information Systems, pp. 35–42 (2006). ACM
Gudmundsson, J., van Kreveld, M., Speckmann, B.: Efficient detection of patterns in 2d trajectories of moving points. Geoinformatica 11(2), 195–215 (2007)
Hajij, M., Wang, B., Scheidegger, C., Rosen, P.: Visual detection of structural changes in time-varying graphs using persistent homology, 125–134 (2018). IEEE
Huang, Y., Chen, C., Dong, P.: Modeling herds and their evolvements from trajectory data. In: International Conference on Geographic Information Science, pp. 90–105 (2008). Springer
Hwang, S.-Y., Liu, Y.-H., Chiu, J.-K., Lim, E.-P.: Mining mobile group patterns: A trajectory-based approach. In: PAKDD, vol. 3518, pp. 713–718 (2005). Springer
Jardine, N., Sibson, R.: Mathematical Taxonomy, p. 286. Wiley, London (1971). (Wiley Series in Probability and Mathematical Statistics)
Jeung, H., Yiu, M.L., Zhou, X., Jensen, C.S., Shen, H.T.: Discovery of convoys in trajectory databases. Proc. VLDB Endow. 1(1), 1068–1080 (2008)
Kalnis, P., Mamoulis, N., Bakiras, S.: On discovering moving clusters in spatio-temporal data. In: SSTD, vol. 3633, pp. 364–381 (2005). Springer
Kerber, M., Morozov, D., Nigmetov, A.: Geometry Helps to Compare Persistence Diagrams. ACM, New York (2017)
Kim, W., Mémoli, F., Smith, Z.: Analysis of dynamic graphs and dynamic metric spaces via zigzag persistence. In: Topological Data Analysis, pp. 371–389. Springer, (2020)
Kim, W., Mémoli, F., Smith, Z.: Clustering behavior summary of dynamic metric data (2017). https://research.math.osu.edu/networks/formigrams
Kim, W., Mémoli, F., Stefanou, A.: Interleaving by parts for persistence in a poset. arXiv preprint arXiv:1912.04366 (2019)
Kim, W., Mémoli, F.: Formigrams: Clustering summaries of dynamic data. In: Proceedings of 30th Canadian Conference on Computational Geometry (CCCG18) (2018)
Kim, W., Mémoli, F.: Stable signatures for dynamic graphs and dynamic metric spaces via zigzag persistence. arXiv preprint arXiv:1712.04064v4 (2017)
Kim, W., Mémoli, F.: Generalized persistence diagrams for persistence modules over posets. J. Appl. Comput. Topol. 5(4), 533–581 (2021)
Kim, W., Mémoli, F.: Spatiotemporal persistent homology for dynamic metric spaces. Discrete Comput. Geom. 66(3), 831–875 (2021)
Li, Z., Ding, B., Han, J., Kays, R.: Swarm: mining relaxed temporal moving object clusters. Proc. VLDB Endow. 3(1–2), 723–734 (2010)
Mac Lane, S.: Categories for the Working Mathematician, vol. 5. Springer, New York (2013)
McCleary, A., Patel, A.: Bottleneck stability for generalized persistence diagrams. Proc. Am. Math. Soc. 148(7), 3149–3161 (2020)
McCleary, A., Patel, A.: Edit distance and persistence diagrams over lattices. SIAM J. Appl. Algebra Geom. 6(2), 134–155 (2022)
Mémoli, F.: A distance between filtered spaces via tripods. arXiv preprint arXiv:1704.03965 (2017)
Mitchell, B.: Theory of Categories, vol. 17. Academic Press, Washington, DC (1965)
Morozov, D., Beketayev, K., Weber, G.: Interleaving distance between merge trees. Discrete Comput. Geom. 49, 22–45 (2013)
Munch, E.: Applications of persistent homology to time varying systems. PhD thesis, Duke University (2013)
Parrish, J.K., Hamner, W.M.: Animal Groups in Three Dimensions: How Species Aggregate. Cambridge University Press, Cambridge (1997)
Patel, A.: Reeb spaces and the robustness of preimages. PhD thesis, Duke University (2010)
Patel, A.: Generalized persistence diagrams. J. Appl. Comput. Topol. 1(3), 397–419 (2018)
Puuska, V.: Erosion distance for generalized persistence modules. Homol. Homotopy Appl. 22(1), 233–254 (2020)
Reynolds, C.W.: Flocks, herds and schools: a distributed behavioral model. ACM SIGGRAPH Comput. Graph. 21(4), 25–34 (1987)
Rolle, A., Scoccola, L.: Stable and consistent density-based clustering. arXiv preprint arXiv:2005.09048 (2020)
Rossetti, G., Cazabet, R.: Community discovery in dynamic networks: a survey. ACM Comput. Surv. 51(2), 1–37 (2018)
Rota, G.-C.: On the foundations of combinatorial theory I. Theory of Möbius functions. Probab. Theory Relat. Fields 2(4), 340–368 (1964)
Rubenstein, M., Ahler, C., Nagpal, R.: Kilobot: A low cost scalable robot system for collective behaviors. In: 2012 IEEE International Conference on Robotics and Automation, pp. 3293–3298 (2012). IEEE
Schmiedl, F.: Shape matching and mesh segmentation. PhD thesis, Technische Universität München (2015)
Schmiedl, F.: Computational aspects of the Gromov-Hausdorff distance and its application in non-rigid shape matching. Discrete Comput. Geom. 57(4), 854–880 (2017)
Sinhuber, M., Ouellette, N.T.: Phase coexistence in insect swarms. Phys. Rev. Lett. 119(17), 178003 (2017)
Sumpter, D.J.: Collective Animal Behavior. Princeton University Press, Princeton (2010)
Topaz, C.M., Ziegelmeier, L., Halverson, T.: Topological data analysis of biological aggregation models. PLoS ONE 10(5), e0126383 (2015)
van Goethem, A., van Kreveld, M., Löffler, M., Speckmann, B., Staals, F.: Grouping Time-Varying Data for Interactive Exploration. In: 32nd International Symposium on Computational Geometry (SoCG 2016). Leibniz International Proceedings in Informatics (LIPIcs), vol. 51, pp. 61:1–61:16. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik (2016). https://doi.org/10.4230/LIPIcs.SoCG.2016.61. http://drops.dagstuhl.de/opus/volltexte/2016/5953
van Kreveld, M., Löffler, M., Staals, F., Wiratma, L.: A refined definition for groups of moving entities and its computation. Int. J. Comput. Geom. Appl. 28(02), 181–196 (2018)
Vehlow, C., Beck, F., Auwärter, P., Weiskopf, D.: Visualizing the evolution of communities in dynamic graphs. Comput. Graph. Forum 34(1), 277–288 (2015). (Wiley Online Library)
Vieira, M.R., Bakalov, P., Tsotras, V.J.: On-line discovery of flock patterns in spatio-temporal data. In: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 286–295 (2009). ACM
Wang, Y., Lim, E.-P., Hwang, S.-Y.: Efficient algorithms for mining maximal valid groups. VLDB J. 17(3), 515–535 (2008)
Wikipedia: Formicarium—Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/wiki/Formicarium. Accessed 12 Dec 2021
Wiratma, L., van Kreveld, M., Löffler, M., Staals, F.: An experimental evaluation of grouping definitions for moving entities. In: Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 89–98 (2019)
Xian, L., Adams, H., Topaz, C.M., Ziegelmeier, L.: Capturing dynamics of time-varying data via topology. Found. Data Sci. 4(1), 1–36 (2022)
Acknowledgements
This work was partially supported by NSF grants IIS-1422400, CCF-1526513, DMS-1723003, and CCF-1740761. We thank Michael Lesnick for useful comments about the paper. We also thank Zane Smith for providing an example of non-planar formigram in Example 3.14. WK thanks Amit Patel for beneficial discussions regarding topics related to Sect. 6.
Ethics declarations
Conflict of interest/Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Editor in Charge: Kenneth Clarkson
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A Details and Proofs
A.1 Bottleneck distance
Recall that injective partial functions are referred to as matchings. We use \(\sigma :A\nrightarrow B\) to denote a matching \(\sigma \subset A\times B\) between sets A and B. The canonical projections of \(\sigma \) onto A and B are denoted by \(\textrm{coim} (\sigma )\) and \(\textrm{im} (\sigma )\), respectively.
Many equivalent expressions for the bottleneck distance have been given in the TDA literature. We adopt the following form from [6]. Recall Notation 2.12. Letting \({\mathcal {A}}\) be a multiset of intervals in \(\textbf{R}\) and \(\varepsilon \ge 0\),
\[{\mathcal {A}}^{\varepsilon }:=\left\{ \langle b,d\rangle \in {\mathcal {A}}: b+\varepsilon <d\right\} .\]
Note that \({\mathcal {A}}^0={\mathcal {A}}\).
Definition A.1
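([6]) Let \({\mathcal {A}}\) and \({\mathcal {B}}\) be multisets of intervals in \(\textbf{R}\). We define a \(\delta \)-matching between \({\mathcal {A}}\) and \({\mathcal {B}}\) to be a matching \(\sigma :{\mathcal {A}}\nrightarrow {\mathcal {B}}\) such that \({\mathcal {A}}^{2\delta }\subset \textrm{coim} (\sigma )\), \({\mathcal {B}}^{2\delta }\subset \textrm{im} (\sigma )\), and if \(\sigma \langle b,d\rangle =\langle b',d'\rangle \), then
\[\langle b,d\rangle \subset \langle b'-\delta ,d'+\delta \rangle \quad \text{and}\quad \langle b',d'\rangle \subset \langle b-\delta ,d+\delta \rangle ,\]
with the convention \(+\infty +\delta =+\infty \) and \(-\infty -\delta =-\infty \). We define the bottleneck distance \(d_\textrm{B}\) by
\[d_\textrm{B}({\mathcal {A}},{\mathcal {B}}):=\inf \left\{ \delta \in [0,\infty ): \text{there exists a } \delta \text{-matching between } {\mathcal {A}} \text{ and } {\mathcal {B}}\right\} .\]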
We declare \(d_\textrm{B}({\mathcal {A}},{\mathcal {B}})=+\infty \) when there is no \(\delta \)-matching between \({\mathcal {A}}\) and \({\mathcal {B}}\) for any \(\delta \in [0,\infty )\).
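As a rough illustration of Definition A.1, the following Python sketch checks whether a candidate matching is a \(\delta \)-matching. It is not taken from the paper: the function name `is_delta_matching`, the encoding of intervals as `(b, d)` pairs, and the encoding of \(\sigma \) as a dictionary of indices are hypothetical choices. The sketch assumes, following the conventions of [6], that \({\mathcal {A}}^{\varepsilon }\) consists of the intervals with \(b+\varepsilon <d\) and that matched intervals \(\langle b,d\rangle \), \(\langle b',d'\rangle \) must satisfy \(\langle b,d\rangle \subset \langle b'-\delta ,d'+\delta \rangle \) and \(\langle b',d'\rangle \subset \langle b-\delta ,d+\delta \rangle \). Endpoint decorations (open versus closed) are ignored.

```python
def is_delta_matching(A, B, sigma, delta):
    """Check the delta-matching conditions for two finite barcodes.

    A, B  : lists of intervals (b, d); endpoints may be +/- float("inf").
    sigma : dict sending an index of A to an index of B (a partial function).
    """
    # A matching is an injective partial function.
    if len(set(sigma.values())) != len(sigma):
        return False
    # Every interval of A longer than 2*delta must be matched.
    for i, (b, d) in enumerate(A):
        if b + 2 * delta < d and i not in sigma:
            return False
    # Every interval of B longer than 2*delta must be hit by sigma.
    hit = set(sigma.values())
    for j, (b, d) in enumerate(B):
        if b + 2 * delta < d and j not in hit:
            return False
    # Matched intervals must contain each other after thickening by delta.
    for i, j in sigma.items():
        (b, d), (b2, d2) = A[i], B[j]
        if not (b2 - delta <= b and d <= d2 + delta):
            return False
        if not (b - delta <= b2 and d2 <= d + delta):
            return False
    return True
```

Under these assumptions, the bars (0, 4) and (0.5, 4.5) are 0.5-matched by pairing them, and they are also 2-matched by the empty matching, since neither bar is longer than \(2\delta =4\). Python's float arithmetic already respects the convention \(+\infty +\delta =+\infty \).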
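A.2 Proof of Theorem 4.7
We recall the Gromov-Hausdorff distance between metric spaces. Let \((X,d_X)\) and \((Y,d_Y)\) be any two metric spaces and let \(R:\ X\xleftarrow {\varphi _X} Z\xrightarrow {\varphi _Y} Y\) be a tripod between X and Y. Then, the distortion of R is defined as
\[\textrm{dis}(R):=\sup _{z,z'\in Z}\left|d_X\left( \varphi _X(z),\varphi _X(z')\right) -d_Y\left( \varphi _Y(z),\varphi _Y(z')\right) \right|.\]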
Definition A.2
(Gromov-Hausdorff distance [16, Sect. 7.3]) Let \((X,d_X)\) and \((Y,d_Y)\) be any two compact metric spaces. Then,
\[d_\textrm{GH}\left( (X,d_X),(Y,d_Y)\right) :=\frac{1}{2}\inf _{R}\,\textrm{dis}(R),\]
where the infimum is taken over all tripods R between X and Y. In particular, any tripod R between X and Y is said to be an \(\varepsilon \)-tripod between \((X,d_X)\) and \((Y,d_Y)\) if \(\textrm{dis}(R)\le \varepsilon \).
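For very small finite metric spaces, the distortion and the Gromov-Hausdorff distance can be evaluated by exhaustive search, which may help when checking examples by hand. The sketch below is an illustration only and rests on two assumptions that are standard but not spelled out here: \(d_\textrm{GH}\) is half the infimal distortion, and, for the purpose of computing distortion, a tripod can be replaced by the relation \(\{(\varphi _X(z),\varphi _Y(z)):z\in Z\}\subset X\times Y\), whose two projections are surjective. The function names and the encoding of metrics as nested lists are hypothetical. The search is exponential in \(|X|\cdot |Y|\), consistent with the hardness results discussed in this appendix.

```python
from itertools import product


def distortion(R, dX, dY):
    """Distortion of a relation R (list of index pairs) between two finite metric spaces."""
    return max(abs(dX[x][x2] - dY[y][y2]) for (x, y) in R for (x2, y2) in R)


def gromov_hausdorff(dX, dY):
    """Brute-force d_GH: half the minimal distortion over all relations
    R in X x Y whose projections onto X and onto Y are both surjective."""
    n, m = len(dX), len(dY)
    pairs = list(product(range(n), range(m)))
    best = float("inf")
    for mask in range(1, 2 ** len(pairs)):
        R = [p for k, p in enumerate(pairs) if (mask >> k) & 1]
        covers_X = {x for x, _ in R} == set(range(n))
        covers_Y = {y for _, y in R} == set(range(m))
        if covers_X and covers_Y:
            best = min(best, distortion(R, dX, dY))
    return best / 2
```

For example, comparing a two-point space of diameter 1 with a one-point space leaves a single admissible relation, whose distortion is 1.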
Proposition A.3
Let \((X,d_X)\) and \((Y,d_Y)\) be any two finite metric spaces. Then, there exist two DGs \(\mathcal {G}_X=\left( V_X(\cdot ),E_X(\cdot )\right) \) and \(\mathcal {G}_Y=\left( V_Y(\cdot ),E_Y(\cdot )\right) \) corresponding to \((X,d_X)\) and \((Y,d_Y)\) respectively such that
\[d_\textrm{I}^{\textrm{dynG}}(\mathcal {G}_X,\mathcal {G}_Y)=2\cdot d_\textrm{GH}\left( (X,d_X),(Y,d_Y)\right) .\]
Proof
Let \(D\ge 0\) be the diameter of \((X,d_X)\). For \(t\in \textbf{R}\), we define:
We define \(\mathcal {G}_X\) by \(t\mapsto (V_X(t),E_X(t))\). Define \(\mathcal {G}_Y\) similarly. We show that \(d_\textrm{I}^{\textrm{dynG}}(\mathcal {G}_X,\mathcal {G}_Y)\ge 2\cdot d_\textrm{GH}\left( (X,d_X),(Y,d_Y)\right) .\) To this end, suppose that for some \(\varepsilon \ge 0\), \(R:\ X\xleftarrow {\varphi _X} Z\xrightarrow {\varphi _Y} Y\) is any \(\varepsilon \)-tripod between \(\mathcal {G}_X\) and \(\mathcal {G}_Y\) (cf. Definition 4.4). Then, by the construction of \(\mathcal {G}_X,\mathcal {G}_Y\), it must hold that \(\left|d_X\left( \varphi _X(z),\varphi _X(z')\right) -d_Y\left( \varphi _Y(z),\varphi _Y(z')\right) \right|\le \varepsilon \) for all \(z,z'\in Z.\) The other inequality \(d_\textrm{I}^{\textrm{dynG}}(\mathcal {G}_X,\mathcal {G}_Y)\le 2\cdot d_\textrm{GH}\left( (X,d_X),(Y,d_Y)\right) \) can be proved similarly. \(\square \)
Definition A.4
An ultrametric space is a metric space (X, d) in which the following ultra-triangle inequality holds: for all \(x,y,z\in X\),
\[d(x,z)\le \max \left\{ d(x,y),d(y,z)\right\} .\]
If d is only a pseudometric satisfying this inequality, then d is called an ultra-pseudometric.
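The ultra-triangle inequality \(d(x,z)\le \max \{d(x,y),d(y,z)\}\) is easy to test on a finite distance matrix. The following sketch is an illustrative helper whose name and matrix encoding are assumptions, not part of the paper. It verifies the ultra-pseudometric axioms, so distinct points at distance zero are allowed.

```python
def is_ultra_pseudometric(d):
    """Return True if the square matrix d is symmetric, non-negative,
    zero on the diagonal, and satisfies the ultra-triangle inequality."""
    n = len(d)
    for x in range(n):
        if d[x][x] != 0:
            return False
        for y in range(n):
            if d[x][y] < 0 or d[x][y] != d[y][x]:
                return False
            for z in range(n):
                # Ultra-triangle inequality: d(x, z) <= max(d(x, y), d(y, z)).
                if d[x][z] > max(d[x][y], d[y][z]):
                    return False
    return True
```

A quick sanity check is that every triangle in an ultrametric space is isosceles with the two longest sides equal. The matrix with distances 1, 3, 3 passes, while the path metric with distances 1, 1, 3 fails.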
Proof of Theorem 4.7
Pick any two ultrametric spaces \((X,u_X)\) and \((Y,u_Y)\). Then, by Proposition A.3, there exist DGs \(\mathcal {G}_X=\left( V_X(\cdot ),E_X(\cdot )\right) \) and \(\mathcal {G}_Y=\left( V_Y(\cdot ),E_Y(\cdot )\right) \) such that the interleaving distance between \(\mathcal {G}_X\) and \(\mathcal {G}_Y\) is identical to twice the Gromov-Hausdorff distance \(\Delta :=d_\textrm{GH}((X,u_X),(Y,u_Y))\) between \((X,u_X)\) and \((Y,u_Y)\). However, according to [77, Cor. 3.8], \(\Delta \) cannot be approximated within any factor less than 3 in polynomial time, unless \(P=NP\). The author shows this by observing that any instance of the 3-partition problem can be reduced to an instance of the bottleneck \(\infty \)-Gromov-Hausdorff distance (\(\infty \)-BGHD) problem between ultrametric spaces (cf. [77, p. 865]). The proof follows. \(\square \)
A.3 Details about Remark 4.18
Remark A.5
(Interleaving between dendrograms) When \(\theta _X,\theta _Y\) are dendrograms over sets X and Y respectively, let \(R:\ X\xleftarrow {\varphi _X} Z\xrightarrow {\varphi _Y} Y\) be an \(\varepsilon \)-tripod between \(\theta _X\) and \(\theta _Y\). Since both \(\theta _X\) and \(\theta _Y\) get coarser as \(t\in \textbf{R}\) increases, the interleaving condition in Definition 4.11 can be rewritten as follows: for all \(t\in \textbf{R}\) it holds that \(\theta _X(t) \le _{R} \theta _Y(t+\varepsilon )\) and \(\theta _Y(t) \le _{R} \theta _X(t+\varepsilon )\) (cf. Definition 4.10).
Let X be a finite set and let \(\theta _X\) be a dendrogram over X (cf. Remark 3.7). Recall from [19] that this \(\theta _X\) induces a canonical ultra-pseudometric \(u_X:X\times X\rightarrow \textbf{R}_+\) on X (cf. Definition A.4) defined by
\[u_X(x,x'):=\min \left\{ t\ge 0: x \text{ and } x' \text{ belong to the same block of } \theta _X(t)\right\} .\]
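The passage from a dendrogram to its ultra-pseudometric can be sketched in a few lines. The code below assumes, as in [19], that \(u_X(x,x')\) is the smallest parameter \(t\) at which \(x\) and \(x'\) fall in the same block of \(\theta _X(t)\). The dendrogram encoding (a list of `(t, partition)` pairs sorted by increasing `t`, with the last partition consisting of a single block) and the function name are hypothetical choices made for this illustration.

```python
def ultrametric_from_dendrogram(levels, points):
    """levels: list of (t, blocks) sorted by increasing t, where blocks is a
    list of sets partitioning `points` and the partitions get coarser with t.
    The last level is assumed to have a single block, so that every pair
    of points is eventually merged."""
    u = {}
    for x in points:
        for y in points:
            # First (hence smallest) t at which x and y share a block.
            u[(x, y)] = next(
                t for t, blocks in levels
                if any(x in b and y in b for b in blocks)
            )
    return u
```

For a dendrogram on {a, b, c} in which a and b merge at time 1 and all three points merge at time 3, this gives u(a, b) = 1 and u(a, c) = u(b, c) = 3.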
Proposition A.6
Given any two dendrograms \(\theta _X,\theta _Y\) over sets X, Y, respectively, let \(u_X,u_Y\) be the canonical ultra-pseudometrics on X and Y, respectively. Then, \(d_{\textrm{I}}^\textrm{F}(\theta _X,\theta _Y)=2\ d_{\textrm{GH}}((X,u_X), (Y,u_Y)).\)
Proof
We first show that the LHS \(\ge \) the RHS. Let \(\varepsilon \ge 0\) and let \(R:\ X\xleftarrow {\varphi _X} Z\xrightarrow {\varphi _Y} Y\) be any \(\varepsilon \)-tripod between the two dendrograms \(\theta _X\) and \(\theta _Y\). Let \((x,y),(x',y')\in R\) and let \(t:=u_X(x,x')\). This implies that \(x,x'\) belong to the same block of the partition \(\theta _X(t).\) Since \(\theta _X(t)\le _R \bigvee _{[t]^\varepsilon }\theta _Y=\theta _Y(t+\varepsilon )\), y and \(y'\) must belong to the same block of \(\theta _Y(t+\varepsilon )\), and in turn this implies that \(u_Y(y,y')\le t+\varepsilon =u_X(x,x')+\varepsilon \). By symmetry, we also have \(u_X(x,x')\le u_Y(y,y')+\varepsilon \) and in turn \(\left|u_X(x,x')-u_Y(y,y')\right|\le \varepsilon \). By Definition A.2, this implies that \(d_\textrm{GH}((X,u_X),(Y,u_Y))\le \varepsilon /2.\)
Next, we prove the opposite inequality. Let \(R:\ X\xleftarrow {\varphi _X} Z\xrightarrow {\varphi _Y} Y\) be a tripod between X and Y such that \(\textrm{dis}(R)=\varepsilon .\) It suffices to show that for all \(t\in \textbf{R}\), \(\theta _X(t)\le _R\theta _Y(t+\varepsilon )\) and \(\theta _Y(t)\le _{R}\theta _X(t+\varepsilon )\). By symmetry, we only prove that \(\theta _X(t)\le _R\theta _Y(t+\varepsilon )\) for all \(t\in \textbf{R}\). For \(t<0\), since \(\theta _X(t)=\emptyset \), \(\theta _X(t)\le _R\theta _Y(t+\varepsilon )\) trivially holds. Now pick any \(t\ge 0\) and pick any \((x,y),(x',y')\in R\). Assume that \(x,x'\) belong to the same block of \(\theta _X(t),\) implying that \(u_X(x,x')\le t.\) Since \(\left|u_X(x,x')-u_Y(y,y')\right|\le \varepsilon \), we know \(u_Y(y,y')\le t+\varepsilon ,\) and hence \(y,y'\) belong to the same block of \(\theta _Y(t+\varepsilon )\). Therefore, \(\theta _X(t)\le _R\theta _Y(t+\varepsilon )\) for all \(t\in \textbf{R}\). \(\square \)
Theorem A.7
(Complexity of computing \(d_{\textrm{I}}^\textrm{F}\)) Fix \(\rho \in (1,6)\). It is not possible to obtain a \(\rho \) approximation to the distance \(d_{\textrm{I}}^\textrm{F}(\theta _X,\theta _Y)\) between formigrams in time polynomially depending on \(\left|X\right|,\left|Y\right|,\left|\textrm{crit}(\theta _X)\right|\), \(\left|\textrm{crit}(\theta _Y)\right|\) unless \(P=NP\).
Proof
Pick any two dendrograms and invoke Proposition A.6 to reduce the problem to the computation of the Gromov-Hausdorff distance between the ultra-pseudometric spaces associated to the dendrograms. The rest of the proof follows along the same lines as that of Theorem 4.7. \(\square \)
A.4 Proof of Theorem 6.30
Theorem 6.30 directly follows from Theorem A.9 below.
We explicitly represent the colimit of \(M:\textbf{ZZ}\rightarrow \textbf{set}\) as follows. For \(k,l\in \textbf{ZZ}\), assume that \(x\in M{(k)}\) and \(y\in M{(l)}\). We write \(x\sim y\) if k and l are comparable and one of x and y is mapped to the other via the internal map between M(k) and M(l). The colimit of M is the pair \(\left( C,(i_k)_{k\in \textbf{ZZ}}\right) \) described as follows:
\[C:=\left( \coprod _{k\in \textbf{ZZ}}M(k)\right) \Big /\approx ,\]
where \(\approx \) is the equivalence relation generated by the relations \(x_k\sim x_{l}\) for \(x_k\in M(k)\) and \(x_{l}\in M(l)\) with k, l being comparable. For the quotient map \(q:\coprod _{k\in \textbf{ZZ}}M(k)\rightarrow C\), each \(i_k\) is the composition \(M(k)\hookrightarrow \coprod _{k\in \textbf{ZZ}}M(k) {\mathop {\rightarrow }\limits ^{q}} C\).
Let \(I\in \textbf{Int}(\textbf{ZZ})\). For any functor \(N:I\rightarrow \textbf{set}\), we can construct the limit and colimit of N in the same way; namely, in the above description, replace M and \(\textbf{ZZ}\) by N and I, respectively. In what follows, we use this explicit construction whenever considering colimits of (interval restrictions of) \(\textbf{ZZ}\)-indexed \(\textbf{set}\)-diagrams.
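The quotient description of the colimit translates directly into a union-find computation. The sketch below is a simplified illustration and not the paper's construction verbatim: instead of the poset \(\textbf{ZZ}\) it takes a finite zigzag path of finite sets indexed by 0, 1, 2, ..., with each structure map given as a dictionary. All names and the input encoding are assumptions.

```python
def colimit_classes(sets, maps):
    """sets: list of lists, sets[k] holding the elements of the k-th set.
    maps: list of (src, dst, f) where f is a dict from sets[src] to sets[dst].
    Returns the equivalence classes of the disjoint union, each class being
    a set of (index, element) pairs."""
    parent = {(k, x): (k, x) for k, S in enumerate(sets) for x in S}

    def find(a):
        # Walk up to the representative, compressing the path on the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Identify each element with its image under every structure map.
    for src, dst, f in maps:
        for x, y in f.items():
            parent[find((src, x))] = find((dst, y))

    classes = {}
    for a in parent:
        classes.setdefault(find(a), set()).add(a)
    return list(classes.values())
```

Tagging elements with their index keeps the union disjoint even when the same label appears in several sets, which mirrors the coproduct in the formula for \(C\).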
Definition A.8
Let \(I\in \textbf{Int}(\textbf{ZZ})\) and let \(N:I\rightarrow \textbf{set}\) be any functor. Let \(c\in \varinjlim N\). We define the support of c as
\[\textrm{supp}(c):=\left\{ k\in I: c\in \textrm{im}(i_k)\right\} .\]
In particular, if \(\textrm{supp}(c)=I\), we call c a full component of the functor N.
Given \(M:\textbf{ZZ}\rightarrow \textbf{set}\) and \(I\in \textbf{Int}(\textbf{ZZ})\), we denote the number of full components of \(M\vert _I\) by \(\textrm{full}(M\vert _I)\). Recall Notation 6.4.
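Once the colimit is available as a list of equivalence classes, supports and full components are a matter of bookkeeping. The sketch below assumes that the support of a component \(c\) is the set of indices \(k\) at which some element of \(N(k)\) is sent to \(c\), and that each class is encoded as a set of `(index, element)` pairs. Both the encoding and the function names are hypothetical.

```python
def support(component):
    """Indices k at which the component has at least one representative."""
    return {k for k, _ in component}


def count_full_components(classes, indices):
    """Number of components whose support is the whole index set."""
    index_set = set(indices)
    return sum(1 for c in classes if support(c) == index_set)
```

In the first test case below, no component meets all three indices, so the count is 0. In the second, one component is represented at every index and is therefore full.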
Theorem A.9
([57, Cor. 4.10]) For any functor \(M:\textbf{ZZ}\rightarrow \textbf{set}\), the multiplicity of I in \(\textrm{barc}(\mathcal {F}\circ M)\) is
Proof of Theorem 6.30
By Proposition 6.10, for every \(I\in \textbf{Int}(\textbf{ZZ})\),
Therefore, by Theorem A.9, it suffices to show that for all \(J\in \textbf{Int}(\textbf{ZZ})\). If \(J\in \textbf{Int}(\textbf{ZZ})\) is not a subset of \(\textrm{supp}(\theta _X)\), then clearly . Now assume that \(J\in \textbf{Int}(\textbf{ZZ})\) is contained in \(\textrm{supp}(\theta _X)\). Then, since \(\theta _X\) is saturated, . Also, \(\left|\bigvee _J\theta _X\right|\) is equal to the number of full components of \(\textrm{Reeb}(\theta _X)\vert _J\), completing the proof. \(\square \)
A.5 From unlabeled formigrams to persistent cluster counting functors
Let \(\theta _X\) be a formigram. We shall prove that the persistent counting functor (cf. Definition 6.33) can be obtained from the unlabeled formigram of \(\theta _X\) (cf. Definition 3.12 (i)).
Proposition A.10
Let \(\theta \) be the unlabeled formigram of \(\theta _X\). For any \(I\in \textbf{Int}(\textbf{ZZ})\), consider the canonical limit-to-colimit morphism \(\varphi _I:\varprojlim \theta \vert _I\rightarrow \varinjlim \theta \vert _I\) in the category \(\textbf{Part}\). Then, .
Proof
By Proposition 5.1, \(\varprojlim \theta \vert _I\cong \bigwedge _I\theta _X\) and \(\varinjlim \theta \vert _I\cong \bigvee _I\theta _X\), and the morphism \(\varphi _I\) in \(\textbf{Part}\) is the inclusion \(\bigcap _{t\in I}\bigcup \theta _X(t)\hookrightarrow \bigcup _{t\in I}\bigcup \theta _X(t)\). Now by Proposition 5.8 (ii) the desired isomorphism follows. \(\square \)
Proposition A.10 implies that we can extract from \(\theta \): Namely, equals the number of blocks in the second entry of \(\textrm{coim}(\varphi _I)\). Reciprocally, one may wonder whether contains enough information to reconstruct \(\theta \). That is not true; there exists a pair of formigrams which have identical persistent cluster functor, whereas their underlying weighted/unweighted Reeb graphs are different. This implies that their unlabeled formigrams are also different.
A.6 Details from Section 7
A.6.1 Details from Section 7.2
Proof of Proposition 7.5
Clearly, \(\mathcal {R}_\delta ^1(\gamma _X)\) is a function \(\textbf{R}\rightarrow \textrm{Graph}(X)\). We show that \(\mathcal {R}_\delta ^1(\gamma _X)\) is cosheaf-inducing (cf. Definition 2.17). First we prove that locally \(\mathcal {R}_\delta ^1(\gamma _X)\) admits only finitely many points of discontinuity (those points are called critical points). Let \(I\subset \textbf{R}\) be any nonempty finite interval. For \(i,j\in X:=\{1,\ldots ,n\}\), let \(f_{i,j}:=d_X(\cdot )(i,j):\textbf{R}\rightarrow \textbf{R}_+\). Note that discontinuity points of \(\mathcal {R}_\delta ^1(\gamma _X)\) can occur only at endpoints of connected components of the set \({f_{i,j}}^{-1}(\delta )\) for some \(i,j\in X\). Fix any \(i,j\in X\). Then, by Definition 7.4, the set \({f_{i,j}}^{-1}(\delta )\cap I\) has only finitely many connected components and thus there are only finitely many endpoints arising from those components. Since the set X is finite, this implies that \(\mathcal {R}_\delta ^1(\gamma _X)\) can have only finitely many points of discontinuity in I. Fix any point \(c\in \textbf{R}\) on which \(\mathcal {R}_\delta ^1(\gamma _X)\) is discontinuous. Consider the following two subsets of \(X\times X\):
\[A(c,\delta ):=\left\{ (i,j): i<j\in X,\ d_X(c)(i,j)\le \delta \right\} ,\qquad B(c,\delta ):=\left\{ (i,j): i<j\in X,\ d_X(c)(i,j)>\delta \right\} .\]
The continuity of \(d_X(\cdot )(i,j)\) for each \((i,j)\in X\times X\) guarantees that there exists \(\varepsilon >0\) such that
\[B(c,\delta )\subset B(t,\delta )\quad \text{for all } t\in (c-\varepsilon ,c+\varepsilon ),\]
and in turn
\[A(t,\delta )\subset A(c,\delta )\quad \text{for all } t\in (c-\varepsilon ,c+\varepsilon ),\]
since \(A(t,\delta )\cup B(t,\delta )=\{(i,j):i<j\in X\}\) for all \(t\in \textbf{R}.\) This implies that the graph \(\mathcal {R}_\delta ^1(\gamma _X(c))\) contains \(\mathcal {R}_\delta ^1(\gamma _X(t))\) as a subgraph for each \(t\in (c-\varepsilon ,c+\varepsilon )\). \(\square \)
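The local behaviour established in this proof can be observed on a toy example. The sketch below assumes that the graph \(\mathcal {R}_\delta ^1(\gamma _X(t))\) has an edge between i and j exactly when \(d_X(t)(i,j)\le \delta \). The function name `rips_edges` and the encoding of a DMS as a Python function `d(t, i, j)` are hypothetical choices.

```python
def rips_edges(d, n, t, delta):
    """Edge set of the Rips graph at time t and scale delta.

    d : function (t, i, j) -> distance between points i and j at time t.
    n : number of points, labelled 0, ..., n-1.
    """
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if d(t, i, j) <= delta
    }


def toy_distance(t, i, j):
    # Two points whose mutual distance at time t is |t|.
    return 0.0 if i == j else abs(t)
```

With `toy_distance` and \(\delta =1\), the point \(c=1\) is critical: the edge is present at \(t=1\) and for slightly smaller \(t\), and absent for slightly larger \(t\). In both cases the edge set at nearby times is contained in the edge set at \(c\), as the proof asserts.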
A.6.2 Details from Section 7.3
Details about \(d_{\textrm{I},\lambda }^\textrm{dynM}\). We investigate further properties of the metrics in the family \(\left\{ d_{\textrm{I},\lambda }^\textrm{dynM}\right\} _{\lambda \in [0,\infty )}\). In particular, a discussion about stable invariants of DMSs with respect to the metrics \(d_{\textrm{I},\lambda }^\textrm{dynM}\) for \(\lambda >0\) can be found in [58].
Remark A.11
Let \(\lambda >0\). The distance \(d_{\textrm{I},\lambda }^\textrm{dynM}\) between any two bounded DMSs is finite. More specifically, for any r-bounded DMSs \(\gamma _X=(X,d_X(\cdot ))\) and \(\gamma _Y=(Y,d_Y(\cdot ))\) for some \(r>0\), any tripod R between X and Y is a \((\lambda , \frac{r}{\lambda })\)-tripod between \(\gamma _X\) and \(\gamma _Y\). This implies that
\[d_{\textrm{I},\lambda }^\textrm{dynM}(\gamma _X,\gamma _Y)\le \frac{r}{\lambda }.\]
Definition A.12
(Equivalent tripods) Let X, Y be any two sets. For any two tripods R and S between X and Y, we say that R and S are equivalent if \((x,y)\in R\) if and only if \((x,y)\in S\).
Remark A.13
Let \(\gamma _X=(X,d_X(\cdot ))\) and \(\gamma _Y=(Y,d_Y(\cdot ))\) be any two DMSs. Suppose that R and S are equivalent tripods between X and Y (cf. Definition A.12). Then, it is not difficult to check that for any \(\lambda ,\varepsilon \ge 0\), R is a \((\lambda ,\varepsilon )\)-tripod between \(\gamma _X\) and \(\gamma _Y\) if and only if S is a \((\lambda ,\varepsilon )\)-tripod between \(\gamma _X\) and \(\gamma _Y\).
Proof of Theorem 7.12
We prove the triangle inequality. Take any DMSs \(\gamma _X,\gamma _Y\) and \(\gamma _W\) over X, Y and W, respectively. For some \(\varepsilon ,\varepsilon '>0\), let \(R_1:\ X\xleftarrow {\varphi _X} Z_1\xrightarrow {\varphi _Y} Y\) and \(R_2:\ Y\xleftarrow {\psi _Y} Z_2\xrightarrow {\psi _W} W\) be any \((\lambda ,\varepsilon )\)-tripod between \(\gamma _X\) and \(\gamma _Y\) and \((\lambda ,\varepsilon ')\)-tripod between \(\gamma _Y\) and \(\gamma _W\) (cf. Definition 7.8), respectively. Consider the set \(Z:=\left\{ (z_1,z_2)\in Z_1\times Z_2:\varphi _Y(z_1)=\psi _Y(z_2)\right\} \) and let \(\pi _1:Z\rightarrow Z_1\) and \(\pi _2:Z\rightarrow Z_2\) be the canonical projections to the first and the second coordinate, respectively. Define the tripod \(R_2\circ R_1\) between X and W as in equation (2). It is not difficult to check that \(R_2\circ R_1\) is a \((\lambda ,\varepsilon +\varepsilon ')\)-tripod between \(\gamma _X\) and \(\gamma _W\) and thus we have \(d_{\textrm{I},\lambda }^\textrm{dynM}(\gamma _X,\gamma _W)\le d_{\textrm{I},\lambda }^\textrm{dynM}(\gamma _X,\gamma _Y)+ d_{\textrm{I},\lambda }^\textrm{dynM}(\gamma _Y,\gamma _W).\)
Next assume that \(d_{\textrm{I},\lambda }^\textrm{dynM}(\gamma _X,\gamma _Y)=0\). We outline the proof of the fact that \(\gamma _X\) and \(\gamma _Y\) are isomorphic (cf. Definition 7.3). Because there are only finitely many tripods between X and Y up to equivalence (cf. Definition A.12), \(d_{\textrm{I},\lambda }^\textrm{dynM}(\gamma _X,\gamma _Y)=0\) implies that there must be a certain tripod \(R:\ X\xleftarrow {\varphi _X} Z\xrightarrow {\varphi _Y} Y\) between X and Y such that R is a \((\lambda ,\varepsilon )\)-tripod between \(\gamma _X\) and \(\gamma _Y\) for every \(\varepsilon >0\). In order to show that \(\gamma _X\) and \(\gamma _Y\) are isomorphic, one needs to prove that R is in fact a \((\lambda ,0)\)-tripod. After that, invoke Definition 7.1, (ii) and (iii) to verify that the multivalued map \(\varphi _Y\circ \varphi _X^{-1}:X\rightrightarrows Y\) is in fact a bijection from X to Y.
Lastly, by Remark A.11, for \(\lambda >0\), \(d_{\textrm{I},\lambda }^\textrm{dynM}\) is finite between bounded DMSs. \(\square \)
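The composition of tripods used in the triangle inequality above is a fiber product over Y. The following is a minimal sketch for finite sets, assuming a tripod is stored as a list Z together with two dictionaries giving its surjective legs; the function and variable names (`compose_tripods`, `omega_X`, `omega_W`, and the toy data) are illustrative and do not come from the paper.

```python
from itertools import product

def compose_tripods(Z1, phi_X, phi_Y, Z2, psi_Y, psi_W):
    """Return the fiber product Z = {(z1, z2) : phi_Y(z1) == psi_Y(z2)}
    together with its legs Z -> X and Z -> W."""
    Z = [(z1, z2) for z1, z2 in product(Z1, Z2) if phi_Y[z1] == psi_Y[z2]]
    omega_X = {z: phi_X[z[0]] for z in Z}  # phi_X composed with the first projection
    omega_W = {z: psi_W[z[1]] for z in Z}  # psi_W composed with the second projection
    return Z, omega_X, omega_W

def is_surjective(mapping, codomain):
    """Check that a dictionary-encoded map hits every element of the codomain."""
    return set(mapping.values()) == set(codomain)

# Toy data (hypothetical): a tripod between X and Y, and one between Y and W.
X, Y, W = ["x1", "x2"], ["y1", "y2", "y3"], ["w1"]
Z1 = [0, 1, 2]
phi_X = {0: "x1", 1: "x2", 2: "x2"}
phi_Y = {0: "y1", 1: "y2", 2: "y3"}
Z2 = ["a", "b", "c"]
psi_Y = {"a": "y1", "b": "y2", "c": "y3"}
psi_W = {"a": "w1", "b": "w1", "c": "w1"}

Z, omega_X, omega_W = compose_tripods(Z1, phi_X, phi_Y, Z2, psi_Y, psi_W)
```

Because both legs into Y are surjective, every element of X and of W is reached from Z, so the composite is again a tripod; in the toy data Z has three elements.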
Remark A.14
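(For \(\lambda >0\), \(d_{\textrm{I},\lambda }^\textrm{dynM}\) generalizes the Gromov-Hausdorff distance) Let \(\lambda >0 \). Given any two constant DMSs \(\gamma _X\equiv (X,d_X)\) and \(\gamma _Y\equiv (Y,d_Y)\), the value \(d_{\textrm{I},\lambda }^\textrm{dynM}(\gamma _X,\gamma _Y)\) equals the Gromov-Hausdorff distance between \((X,d_X)\) and \((Y,d_Y)\) up to the multiplicative constant \(\frac{\lambda }{2}\). Indeed, for any tripod \(R:\ X\xleftarrow{\varphi _X}Z\xrightarrow{\varphi _Y}Y\) between X and Y, condition (15) reduces to \(\left| d_X\left( \varphi _X(z),\varphi _X(z')\right) -d_Y\left( \varphi _Y(z),\varphi _Y(z')\right) \right| \le \lambda \varepsilon \) for all \(z,z'\in Z\).
Therefore, \(d_\textrm{GH}\left( (X,d_X),(Y,d_Y)\right) =\frac{\lambda }{2}\cdot d_{\textrm{I},\lambda }^\textrm{dynM}(\gamma _X,\gamma _Y).\)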
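For very small constant DMSs the relation in Remark A.14 can be checked by brute force. The sketch below enumerates all correspondences between two finite metric spaces (a correspondence plays the role of a tripod with Z = R), computes the Gromov-Hausdorff distance as half the minimal distortion, and then rescales it. It assumes that Remark A.14 reads \(d_\textrm{GH}=\frac{\lambda }{2}\cdot d_{\textrm{I},\lambda }^\textrm{dynM}\) for constant DMSs; the function names are illustrative, and the enumeration is exponential in \(|X|\cdot |Y|\), so it is only meant for toy inputs.

```python
from itertools import product

def distortion(R, dX, dY):
    """Largest value of |dX(x, x') - dY(y, y')| over pairs (x, y), (x', y') in R."""
    return max(abs(dX[x][xp] - dY[y][yp]) for (x, y), (xp, yp) in product(R, R))

def gromov_hausdorff(dX, dY):
    """Half the minimal distortion over all correspondences between X and Y."""
    nX, nY = len(dX), len(dY)
    pairs = list(product(range(nX), range(nY)))
    best = float("inf")
    for mask in range(1, 2 ** len(pairs)):
        R = [p for i, p in enumerate(pairs) if (mask >> i) & 1]
        # keep only subsets whose two projections are surjective
        if {x for x, _ in R} == set(range(nX)) and {y for _, y in R} == set(range(nY)):
            best = min(best, distortion(R, dX, dY))
    return best / 2

def interleaving_of_constant_dms(dX, dY, lam):
    """Rescaled Gromov-Hausdorff distance, under the assumption stated above."""
    return 2 * gromov_hausdorff(dX, dY) / lam

# Two-point spaces with interpoint distances 1 and 3 (toy data).
dX = [[0, 1], [1, 0]]
dY = [[0, 3], [3, 0]]
gh = gromov_hausdorff(dX, dY)
```

In this toy case the two bijections have distortion 2 and every larger correspondence has distortion 3, so the Gromov-Hausdorff distance is 1.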
We have the following bilipschitz-equivalence relation between the metrics \(d_{\textrm{I},\lambda }^\textrm{dynM}\) for different \(\lambda >0\).
Proposition A.15
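(Bilipschitz-equivalence) For all \(0<\lambda <\lambda '\), \(d_{\textrm{I},\lambda '}^\textrm{dynM}\le d_{\textrm{I},\lambda }^\textrm{dynM}\le \frac{\lambda '}{\lambda }\cdot d_{\textrm{I},\lambda '}^\textrm{dynM}.\)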
Proof
Fix any two DMSs \(\gamma _X\) and \(\gamma _Y\) over X and Y. That \(d_{\textrm{I},\lambda '}^\textrm{dynM}(\gamma _X,\gamma _Y)\le d_{\textrm{I},\lambda }^\textrm{dynM}(\gamma _X,\gamma _Y)\) follows from the observation that any \((\lambda ,\varepsilon )\)-tripod R between \(\gamma _X\) and \(\gamma _Y\) is also a \((\lambda ',\varepsilon )\)-tripod (cf. Definition 7.8). We next prove \(d_{\textrm{I},\lambda }^\textrm{dynM}(\gamma _X,\gamma _Y)\le \frac{\lambda '}{\lambda }\cdot d_{\textrm{I},\lambda '}^\textrm{dynM}(\gamma _X,\gamma _Y).\) For some \(\varepsilon \ge 0\) let R be any \((\lambda ',\varepsilon )\)-tripod between \(\gamma _X\) and \(\gamma _Y\). It suffices to show that R is also a \((\lambda ,\frac{\lambda '}{\lambda }\varepsilon )\)-tripod. Fix any \(t\in T.\) Then, since \([t]^{\varepsilon }\subset [t]^{\left( \frac{\lambda '}{\lambda }\varepsilon \right) }\), \(\bigvee _{[t]^{\left( \frac{\lambda '}{\lambda }\varepsilon \right) }}d_X\le \bigvee _{[t]^{\varepsilon }}d_X\le _{R}d_Y(t)+\lambda '\varepsilon =d_Y(t)+\lambda \left( \frac{\lambda '}{\lambda }\varepsilon \right) .\)
By symmetry, we also have \(\bigvee _{[t]^{\left( \frac{\lambda '}{\lambda }\varepsilon \right) }}d_Y\le _{R}d_X(t)+\lambda \left( \frac{\lambda '}{\lambda }\varepsilon \right) ,\) as desired. \(\square \)
Appendix B Distance between weighted Reeb graphs
In this section we introduce a distance between weighted Reeb graphs which mediates between \(d_{\textrm{I}}^\textrm{F}\) and \(d_{\textrm{I}}^\textrm{R}\) (cf. Definition 2.15 and Table 3).
Proposition B.1
(Realization as an unlabeled formigram) For any weighted Reeb graph \(F:\textbf{Int}\rightarrow \omega \textbf{set}\), there exists an unlabeled formigram \(\theta :\textbf{Int}\rightarrow \textbf{Part}\) such that \(F\cong \mathcal {A}\circ \theta \).
The proof is straightforward and we omit it. In general, a realization of a weighted Reeb graph as an unlabeled formigram is not unique; see Example 3.13. Proposition B.1 allows us to define the following dissimilarity measure on weighted Reeb graphs. From equation (3), recall how to define \(d_{\textrm{I}}^\textrm{F}\) between unlabeled formigrams. For all weighted Reeb graphs \(F,G:\textbf{Int}\rightarrow \omega \textbf{set}\), we define: \(W(F,G):=\inf \left\{ d_{\textrm{I}}^\textrm{F}(\theta ,\theta '):\ \theta ,\theta '\ \textrm{unlabeled formigrams with}\ F\cong \mathcal {A}\circ \theta \ \textrm{and}\ G\cong \mathcal {A}\circ \theta '\right\} .\)
Since a realization of a weighted Reeb graph as an unlabeled formigram is not necessarily unique, computing W(F, G) may require taking multiple realizations of F and G into account. As a consequence, W does not necessarily satisfy the triangle inequality, and thus we consider the maximal sub-dominant metric of W [20] as a metric on weighted Reeb graphs:
Definition B.2
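(Metric on weighted Reeb graphs) For any two weighted Reeb graphs \(F,G:\textbf{Int}\rightarrow \omega \textbf{set}\), \(d_{\textrm{I}}^{\omega \textrm{R}}(F,G):=\inf \left\{ \sum _{i=0}^{n-1}W(F_i,F_{i+1}):\ F=F_0,F_1,\ldots ,F_n=G\right\} ,\) where the infimum is taken over all finite chains of weighted Reeb graphs from F to G.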
\(d_{\textrm{I}}^{\omega \textrm{R}}\) is the greatest metric on weighted Reeb graphs among those upper bounded by W. The metric \(d_{\textrm{I}}^{\omega \textrm{R}}\) mediates between \(d_{\textrm{I}}^\textrm{F}\) and \(d_{\textrm{I}}^\textrm{R}\):
Theorem B.3
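For any two formigrams \(\theta _X\) and \(\theta _Y\), let \(\omega (\theta _X)\) and \(\omega (\theta _Y)\) be their weighted Reeb graphs. Then, \(d_{\textrm{I}}^\textrm{R}\left( \textrm{Reeb}(\theta _X),\textrm{Reeb}(\theta _Y)\right) \le d_{\textrm{I}}^{\omega \textrm{R}}\left( \omega (\theta _X),\omega (\theta _Y)\right) \le d_{\textrm{I}}^\textrm{F}(\theta _X,\theta _Y).\) (B4)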
Proof
From the definition of \(d_{\textrm{I}}^\textrm{R}\) and Proposition 4.13, we have that
Since \(d_{\textrm{I}}^{\omega \textrm{R}}\) is the greatest metric on weighted Reeb graphs among those upper bounded by W, the left inequality in (B4) follows. The right inequality in (B4) directly follows from the definition of \(d_{\textrm{I}}^{\omega \textrm{R}}\). \(\square \)
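The passage from W to the greatest metric bounded above by W can be illustrated on a finite family of objects. The sketch below assumes that W is given as a symmetric matrix with zero diagonal on finitely many objects, and computes chain-sum infima with the Floyd-Warshall shortest-path recursion; this finite setting and the name `maximal_subdominant_metric` are assumptions made for illustration, whereas Definition B.2 ranges over all weighted Reeb graphs.

```python
def maximal_subdominant_metric(W):
    """Greatest function d <= W satisfying the triangle inequality,
    computed as all-pairs shortest path lengths for the weights W."""
    n = len(W)
    d = [row[:] for row in W]  # copy, so that W is left untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# Toy dissimilarity violating the triangle inequality: W[0][2] > W[0][1] + W[1][2].
W = [[0, 1, 5],
     [1, 0, 1],
     [5, 1, 0]]
d = maximal_subdominant_metric(W)
```

Here the entry between objects 0 and 2 drops from 5 to 2, the length of the chain through object 1, and all other entries are unchanged.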
The metric \(d_{\textrm{I}}^{\omega \textrm{R}}\) is strictly less discriminative than \(d_{\textrm{I}}^\textrm{F}\) but strictly more discriminative than \(d_{\textrm{I}}^\textrm{R}\):
Example B.4
(i) Consider \(\theta _X\) and \(\theta _Y\) in Example 3.13. Since the underlying weighted Reeb graphs of \(\theta _X\) and \(\theta _Y\) are isomorphic, their distance in \(d_{\textrm{I}}^{\omega \textrm{R}}\) is zero. However, by Remark 4.12 (i), we have that \(d_{\textrm{I}}^\textrm{F}(\mathcal {U}_L^X\circ \theta _X,\mathcal {U}_L^Y\circ \theta _Y)=d_{\textrm{I}}^\textrm{F}(\theta _X,\theta _Y)>0\).
(ii) Let F and G be weighted Reeb graphs depicted in Fig. 14. Their unweighted Reeb graphs \(\mathcal {A}\circ F\) and \(\mathcal {A}\circ G\) are clearly isomorphic and thus \(d_{\textrm{I}}^\textrm{R}(\mathcal {A}\circ F,\mathcal {A}\circ G)=0\). On the other hand, \(d_{\textrm{I}}^{\omega \textrm{R}}(F,G)= 1/2;\) this follows from the observation that both F and G are uniquely realized (up to natural isomorphism) by unlabeled formigrams \(\theta \) and \(\theta '\), which leads to \(d_{\textrm{I}}^{\omega \textrm{R}}(F,G)=d_{\textrm{I}}^\textrm{F}(\theta ,\theta ')\). Also, it is not difficult to check that \(d_{\textrm{I}}^\textrm{F}(\theta ,\theta ')=1/2\).
Appendix C Smoothing formigrams
The goal of this section is to establish a few basic properties of smoothing of formigrams. In particular, we reveal its effect on the zigzag barcodes of formigrams and its compatibility with smoothing of Reeb graphs in [32]; see Propositions C.2 and C.5.
Recall that a formigram \(\theta _X\) is regarded as either a cosheaf-inducing function \(\textbf{R}\rightarrow \textrm{SubPart}(X)\) or a constructible cosheaf \(\textbf{Int}\rightarrow \textrm{SubPart}(X)\) (cf. Definition 3.5, Remark 2.18 (i) and (ii)). By Definition 2.23 and Proposition 2.25, a smoothing operation on formigrams can be induced via the join operation on subpartitions. Namely, \(S_\varepsilon \theta _X\) sends each \(I\in \textbf{Int}\) to \(\bigvee _{I^\varepsilon }\theta _X:=\bigvee \{\theta _X(t):t\in I^\varepsilon \}\).
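The smoothing \(S_\varepsilon \theta _X\) can be sketched on a formigram that has been sampled at finitely many times. The code below assumes a subpartition is stored as a set of disjoint frozensets and that the formigram is known only on a finite time grid, so it approximates the join over \(I^\varepsilon \) by a join over the sampled times in the window; the names `join` and `smooth` and the sample data are illustrative.

```python
def join(subpartitions):
    """Finest subpartition coarser than every input: blocks that overlap are merged."""
    blocks = []
    for P in subpartitions:
        for B in P:
            B = set(B)
            rest = []
            for C in blocks:
                if C & B:
                    B |= C          # absorb every existing block that meets B
                else:
                    rest.append(C)
            blocks = rest + [B]
    return {frozenset(B) for B in blocks}

def smooth(times, theta, eps):
    """Sampled version of S_eps: at each time t, join theta(s) over sampled s with |s - t| <= eps."""
    return [join(P for s, P in zip(times, theta) if abs(s - t) <= eps) for t in times]

# Toy formigram over X = {a, b, c}, sampled at times 0, 1, 2.
times = [0, 1, 2]
theta = [
    {frozenset("a"), frozenset("b"), frozenset("c")},
    {frozenset("ab"), frozenset("c")},
    {frozenset("a"), frozenset("bc")},
]
smoothed = smooth(times, theta, eps=1)
```

With \(\varepsilon =1\), the value at time 1 becomes the single block {a, b, c}, since the windows around it see both the merge of a with b and the merge of b with c.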
Remark C.1
(Comparison with robust grouping structure) The use of the join operation is an important element that distinguishes our notion of smoothing from the robust grouping structure in [15]. In particular, given a dynamic metric space (DMS), the induced formigram of this DMS (which is obtained by combining Definition 3.10 and Proposition 7.5) can be smoothed out using the join operation. We emphasize that this smoothing operation is intrinsic in contrast to the robust grouping structure from [15]. Namely, our smoothing operation can be carried out without constructing any topological space in the spatiotemporal ambient space of the DMS as illustrated in [15, Fig. 11]. One consequence of this ‘intrinsicality’ is that, when a dynamic graph is the input data (as opposed to a dynamic metric space), we can smooth out its induced formigram (cf. Definition 3.10), while [15] does not propose such a method. Since the coordinates of entities are not always available in applications (e.g. sensor networks [31, 33], low-cost swarm robots [75], etc.), this intrinsicality is a desirable property.
Given a formigram \(\theta _X\), Fig. 15 illustrates both the relationship between \(\textrm{Reeb}(\theta _X)\) and \(\textrm{Reeb}(S_\varepsilon \theta _X)\) and the relationship between their zigzag barcodes. The following proposition precisely describes the relationship between \({\bar{c}}(\theta _X)\) and \({\bar{c}}(S_\varepsilon \theta _X)\). For any \(r\in \textbf{R}\), we define \(-\infty +r\) to be \(-\infty \).
Proposition C.2
Let \(\theta _X\) be a formigram over X and let \(\varepsilon \ge 0\). Then, we have the following bijection between \({\bar{c}}(\theta _X)\) and \({\bar{c}}(S_\varepsilon \theta _X)\) (cf. Fig. 15):
Recall the free functor \(\mathcal {F}:\textbf{set}\rightarrow \textbf{vec}\) (cf. Definition 2.6) and the fact that any constructible cosheaf \(\textbf{Int}\rightarrow \textbf{vec}\) is interval decomposable (cf. Proposition 2.19).
Lemma C.3
Given any constructible cosheaf \(F:\textbf{Int}\rightarrow \textbf{set}\), the barcode of \(\mathcal {F}\circ F:\textbf{Int}\rightarrow \textbf{vec}\) cannot include any interval of the form \([b,a]_{\textrm{BL}}:=\{(x,y)\in \textbf{Int}: x\le a <b\le y\}\) for \(a<b\) in \(\textbf{R}\) (cf. Fig. 16).
Proof
Since F is constructible and is therefore defined via colimits over restrictions of a zigzag diagram over \(\textbf{R}\) (cf. Definition 2.14), for any \(J\in \textbf{Int}\) and for any \(x\in F(J)\), there exist \(t\in J\) and \(y\in F([t,t])\) such that \(F([t,t]\subset J)(y)=x\). This property directly implies that the interval module \(I^{[b,a]_{\textrm{BL}}}\) cannot be a summand of \(\mathcal {F}\circ F\). \(\square \)
Proof of Proposition C.2
By Definitions 2.20 and 2.23, \({\bar{c}}(S_\varepsilon \theta _X)\) is equal to the multiset
where \(\textbf{R}_{y=x+2\varepsilon }\) is the line \(y=x+2\varepsilon \) identified with the real line via the bijection \((r-\varepsilon ,r+\varepsilon )\leftrightarrow r\). The table (C5) is directly obtained by Lemma C.3 and the block decomposability of \(\mathcal {F}\circ \mathcal {C}\circ \theta _X\) [12, Sect. 3]. \(\square \)
The bijective correspondence of barcodes given in Proposition C.2 directly implies the following:
Corollary C.4
Let \(\theta _X\) be any formigram over X. Then, for \(\varepsilon \ge 0\),
The smoothing operations defined for formigrams and Reeb graphs (cf. Definition 2.23) are compatible in the following sense:
Proposition C.5
Let \(\theta _X\) be a formigram over X. Then, for any \(\varepsilon \ge 0\),
Proof
Let \(I\in \textbf{Int}\). We have:
\(\square \)
Formigrams change in a continuous manner under \(\varepsilon \)-smoothing:
Proposition C.6
For any \(\varepsilon \ge 0\) and any formigram \(\theta _X\), \(d_{\textrm{I}}^\textrm{F}(S_\varepsilon \theta _X,\theta _X)\le \varepsilon .\)
Proof
Consider the tripod \(R:\ X\xleftarrow{\textrm{id}_X}X\xrightarrow{\textrm{id}_X}X\) and check that R is an \(\varepsilon \)-tripod between \(S_\varepsilon \theta _X\) and \(\theta _X\). \(\square \)
The following proposition is analogous to [32, Prop. 4.14]:
Proposition C.7
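For any \(\varepsilon \ge 0\), \(S_\varepsilon \) is a contraction on formigrams, i.e. for any formigrams \(\theta _X\) and \(\theta _Y\), \(d_{\textrm{I}}^\textrm{F}(S_\varepsilon \theta _X,S_\varepsilon \theta _Y)\le d_{\textrm{I}}^\textrm{F}(\theta _X,\theta _Y).\)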
Proof
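For \(\delta \ge 0\), assume that \(R:\ X\xleftarrow{\varphi _X}Z\xrightarrow{\varphi _Y}Y\) is a \(\delta \)-tripod between \(\theta _X\) and \(\theta _Y\). We claim that R is also a \(\delta \)-tripod between \(S_\varepsilon \theta _X\) and \(S_\varepsilon \theta _Y\). First, we remark that \(\varphi _X^*S_\varepsilon \theta _X=S_\varepsilon \varphi _X^*\theta _X\). Indeed, for any \(I\in \textbf{Int}\), \((\varphi _X^*S_\varepsilon \theta _X)(I)=\varphi _X^*(S_\varepsilon \theta _X(I))=\varphi _X^*(\theta _X(I^\varepsilon ))=(S_\varepsilon \varphi _X^*\theta _X)(I)\). Therefore, \(S_\delta \left( \varphi _X^*S_\varepsilon \theta _X\right) =S_\delta S_\varepsilon \left( \varphi _X^*\theta _X\right) =S_\varepsilon S_\delta \left( \varphi _X^*\theta _X\right) \ge S_\varepsilon \left( \varphi _Y^*\theta _Y\right) =\varphi _Y^*S_\varepsilon \theta _Y,\)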
and by symmetry we have \(S_\delta \left( \varphi _Y^*S_\varepsilon \theta _Y\right) \ge \varphi _X^*S_\varepsilon \theta _X\), completing the proof. \(\square \)
Appendix D About the 0-slack interleaving distance between DMSs
We clarify the computational complexity of \(d_\textrm{I}^{\textrm{dynM}}\) (cf. Theorem 7.15) and provide a few examples of computing \(d_\textrm{I}^{\textrm{dynM}}\).
Computational complexity of \(d_\textrm{I}^{\textrm{dynM}}\).
We relate the Gromov-Hausdorff distance between two given ultrametric spaces to the interleaving distance \(d_\textrm{I}^{\textrm{dynM}}\) between certain DMSs induced by those ultrametric spaces. Then, invoking results from F. Schmiedl’s PhD thesis [76, 77] we obtain the claim of Theorem 7.15.
Given an ultrametric space \((X,u_X)\), define a DMS \({\mathcal {D}}(X,u_X):=(X,d_X(\cdot ))\) where for all \(x,x'\in X\) and for all \(t\in \textbf{R}\), \( d_X(t)(x, x'):=\max (0,u_X(x,x')-t)\). Note that for any \(x,x'\in X\), \(d_X(\cdot )(x,x'):\textbf{R}\rightarrow \textbf{R}_+\) is decreasing down to zero and that \(d_X(0)=u_X\) is a legitimate metric (i.e. not just a pseudo-metric), so that the second item of Definition 7.1 is satisfied. Furthermore, note that \({\mathcal {D}}(X,u_X)\) is clearly piecewise linear and that the set of breakpoints is \(S_{{\mathcal {D}}(X,u_X)} = \{u_X(x,x'),\,x,x'\in X\}.\) Recall Definition A.2.
Proposition D.1
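For any two ultrametric spaces \((X,u_X)\) and \((Y,u_Y)\) we have \(d_\textrm{I}^{\textrm{dynM}}\left( {\mathcal {D}}(X,u_X),{\mathcal {D}}(Y,u_Y)\right) =2\cdot d_\textrm{GH}\left( (X,u_X),(Y,u_Y)\right) .\)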
Proof
Let \({\mathcal {D}}(X,u_X)= (X,d_X(\cdot ))\) and \({\mathcal {D}}(Y,u_Y)=(Y,d_Y(\cdot ))\). Observe that for any \(x,x'\in X\), any \(t\in \textbf{R}\), and any \(\varepsilon \ge 0\), \(\min _{s\in [t]^\varepsilon }d_X(s)(x,x') = d_X(t+\varepsilon )(x,x')\) since \(d_X\) is decreasing over time. Thus, for some \(\varepsilon \ge 0\), a tripod \(R:\ X\xleftarrow{\varphi _X}Z\xrightarrow{\varphi _Y}Y\) is an \(\varepsilon \)-tripod between \((X,d_X(\cdot ))\) and \((Y,d_Y(\cdot ))\) (cf. Definition A.2) if and only if for all \(z,z'\in Z\) and for all \(t\in \textbf{R}\), \(d_X(t+\varepsilon )\left( \varphi _X(z),\varphi _X(z')\right) \le d_Y(t)\left( \varphi _Y(z),\varphi _Y(z')\right) \) and \(d_Y(t+\varepsilon )\left( \varphi _Y(z),\varphi _Y(z')\right) \le d_X(t)\left( \varphi _X(z),\varphi _X(z')\right) \), if and only if for all \(z,z'\in Z\) and for all \(t\in \textbf{R}\), \(\max \left( 0,u_X\left( \varphi _X(z),\varphi _X(z')\right) -t-\varepsilon \right) \le \max \left( 0,u_Y\left( \varphi _Y(z),\varphi _Y(z')\right) -t\right) \) and \(\max \left( 0,u_Y\left( \varphi _Y(z),\varphi _Y(z')\right) -t-\varepsilon \right) \le \max \left( 0,u_X\left( \varphi _X(z),\varphi _X(z')\right) -t\right) \), if and only if for all \(z,z'\in Z\), \(\left| u_X\left( \varphi _X(z),\varphi _X(z')\right) -u_Y\left( \varphi _Y(z),\varphi _Y(z')\right) \right| \le \varepsilon ,\) that is, the distortion of R with respect to \(u_X\) and \(u_Y\) is at most \(\varepsilon \). Since the Gromov-Hausdorff distance is half the infimum of such distortions over all tripods, the claim follows,
completing the proof. \(\square \)
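The two elementary facts used in this proof can be checked numerically on a small example. The sketch below builds \({\mathcal {D}}(X,u_X)\) from an ultrametric given as a matrix, compares the minimum over the window \([t-\varepsilon ,t+\varepsilon ]\) with the value at \(t+\varepsilon \), and tests the one-sided inequality \(\max (0,a-t-\varepsilon )\le \max (0,b-t)\) at sampled times. The helper names, the sampling of the window, and the toy ultrametric are assumptions made for illustration.

```python
def dms_from_ultrametric(u):
    """Return t -> d(t), where d(t)[i][j] = max(0, u[i][j] - t)."""
    def d(t):
        return [[max(0.0, uxy - t) for uxy in row] for row in u]
    return d

def window_min(d, t, eps, i, j, steps=200):
    """Minimum of d(s)[i][j] over sampled s in [t - eps, t + eps]."""
    return min(d(t - eps + 2 * eps * k / steps)[i][j] for k in range(steps + 1))

def one_sided_condition(a, b, eps, ts):
    """Check max(0, a - t - eps) <= max(0, b - t) at every sampled time t in ts."""
    return all(max(0.0, a - t - eps) <= max(0.0, b - t) for t in ts)

# Toy ultrametric on three points: every triangle is isosceles with a short base.
u = [[0, 2, 3],
     [2, 0, 3],
     [3, 3, 0]]
d = dms_from_ultrametric(u)
ts = list(range(-5, 6))  # integer sample times keep the arithmetic exact
```

For instance, with a = 3 and b = 2 the one-sided condition holds at \(\varepsilon =1\) and fails at \(\varepsilon =0.5\), matching the requirement \(a-\varepsilon \le b\).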
Proof of Theorem 7.15
Pick any two ultrametric spaces \((X,u_X)\) and \((Y,u_Y)\). Then, by Proposition D.1, the interleaving distance between \({\mathcal {D}}(X,u_X)\) and \({\mathcal {D}}(Y,u_Y)\) is identical to twice the Gromov-Hausdorff distance \(\Delta :=d_\textrm{GH}((X,u_X),(Y,u_Y))\) between \((X,u_X)\) and \((Y,u_Y)\). The rest of the proof follows along the same lines as that of Theorem 4.7. \(\square \)
Next we discuss a few computational examples of \(d_\textrm{I}^{\textrm{dynM}}\). Let \(\psi :\textbf{R}\rightarrow \textbf{R}_+\) be any continuous function that is not identically zero. Then, for any finite metric space \((X,d_X')\) we have the DMS \(\gamma _X^\psi = (X,d_X^\psi (\cdot ))\) where for \(t\in \textbf{R}\), \(d_X^\psi (t):=\psi (t)\cdot d_X'.\)
Example D.2
(An interleaved pair of DMSs I) This example refers to Fig. 17. Fix the two-point metric space \((X,d_X')\) with \(X=\{x,x'\}\) and \(d_X'(x,x')=1\), and consider two DMSs \(\gamma _X^{\psi _0}=(X,d_X^{\psi _0})\) and \(\gamma _X^{\psi _1}=(X,d_X^{\psi _1})\) where, for \(t\in \textbf{R}\), \(\psi _0(t)=1+\cos (t)\), \(\psi _1(t)=1+\cos (t+\pi /4)\). Then, \(\gamma _X^{\psi _0}\) and \(\gamma _X^{\psi _1}\) are \(\varepsilon \)-interleaved if and only if for \(i,j\in \{0,1\}\), \(i\ne j\), and for all \(t\in \textbf{R}\), \(S_\varepsilon (\psi _i)(t):=\min _{s\in [t]^\varepsilon }\psi _i(s)=\left( \bigvee _{[t]^\varepsilon }d_X^{\psi _i}\right) (x,x')\le d_X^{\psi _j}(t)(x,x')=\psi _j(t)\). In fact, this inequality holds if and only if \(\varepsilon \ge \pi /4\), and hence \(d_\textrm{I}^{\textrm{dynM}}\left( \gamma _X^{\psi _0},\gamma _X^{\psi _1}\right) =\pi /4\) (cf. Fig. 17).
The following example generalizes the previous one.
Example D.3
(An interleaved pair of DMSs II) Fix the two-point metric space \((X,d_X')\) with \(X=\{x,x'\}\) and \(d_X'(x,x')=1\), and consider two DMSs \(\gamma _X^{\psi _0}=(X,d_X^{\psi _0})\) and \(\gamma _X^{\psi _1}=(X,d_X^{\psi _1})\) where, for \(t\in \textbf{R}\), \(\psi _0(t)=1+\cos (\omega t)\), \(\psi _1(t)=1+\cos (\omega (t+\tau ))\), for fixed \(\omega >0\) and \(0<\tau <\frac{2\pi }{\omega }\). Since in this case \(\psi _1(t) = \psi _0(t+\tau )\) for all t, one would expect that the interleaving distance between \(\gamma _X^{\psi _0}\) and \(\gamma _X^{\psi _1}\) is able to uncover the precise value of \(\tau \), up to the periodicity of \(\psi _0\). In this respect, we have: \( d_\textrm{I}^{\textrm{dynM}}(\gamma _X^{\psi _0},\gamma _X^{\psi _1})=\min \Big (\tau ,\ \frac{2\pi }{\omega }-\tau \Big )=:\eta (\omega ,\tau ). \)
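The values in Examples D.2 and D.3 can be approximated numerically. The sketch below assumes the interleaving condition of Example D.2, namely \(\min _{s\in [t]^\varepsilon }\psi _i(s)\le \psi _j(t)\) for \(i\ne j\) and all t, with \([t]^\varepsilon =[t-\varepsilon ,t+\varepsilon ]\). It evaluates the window minimum of the cosine exactly, tests the condition on a grid over one period, and locates the smallest admissible \(\varepsilon \) by bisection. The grid size, the tolerance, and the function names are illustrative choices, so the result is only an approximation of \(\eta (\omega ,\tau )\).

```python
import math

def window_min_cos(a, b):
    """Exact minimum of cos over the interval [a, b]."""
    # index of the smallest odd multiple of pi that is >= a
    k = math.ceil((a - math.pi) / (2 * math.pi))
    if math.pi + 2 * math.pi * k <= b:
        return -1.0
    # no interior minimum, so the minimum is attained at an endpoint
    return min(math.cos(a), math.cos(b))

def is_interleaved(eps, omega, tau, n=2000, tol=1e-9):
    """Test min over [t - eps, t + eps] of psi_i <= psi_j(t) on a grid of one period."""
    period = 2 * math.pi / omega
    for i in range(n):
        t = period * i / n
        m0 = 1 + window_min_cos(omega * (t - eps), omega * (t + eps))
        m1 = 1 + window_min_cos(omega * (t - eps + tau), omega * (t + eps + tau))
        psi0 = 1 + math.cos(omega * t)
        psi1 = 1 + math.cos(omega * (t + tau))
        if m0 > psi1 + tol or m1 > psi0 + tol:
            return False
    return True

def interleaving_distance(omega, tau, iters=50):
    """Bisection for the smallest eps passing the test (the test is monotone in eps)."""
    lo, hi = 0.0, 2 * math.pi / omega  # a window of two full periods always passes
    for _ in range(iters):
        mid = (lo + hi) / 2
        if is_interleaved(mid, omega, tau):
            hi = mid
        else:
            lo = mid
    return hi

approx_d2 = interleaving_distance(1.0, math.pi / 4)  # setting of Example D.2
approx_d3 = interleaving_distance(2.0, 2.5)          # a case where 2*pi/omega - tau is smaller
```

In the second call the period is \(\pi \), so the shift \(\tau =2.5\) is closer to a full period than to zero and the expected value is \(\pi -2.5\) rather than \(\tau \).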
Appendix E Higher dimensional persistent homology barcodes of dynamic metric spaces.
In this section we discuss the extendibility of Theorem 7.14. The zigzag barcodes \({\bar{c}}(\theta _X)\) and \({\bar{c}}(\theta _Y)\) in Theorem 7.14 encode the clustering behaviors of the given DMSs for a fixed scale \(\delta \ge 0\).
However, we do not need to restrict ourselves to clustering features of DMSs. Imagine that a flock of birds flies while keeping a circular arrangement from time \(t=0\) to \(t=1\). Regarding this flock as a DMS (trajectory data in \(\textbf{R}^3\)), we may want to have an interval containing [0, 1] in its 1-dimensional homology barcode. This idea can actually be implemented as follows.
For a fixed \(\delta \ge 0\), we substitute the Rips complex functor \(\mathcal {R}_\delta \) for the Rips graph functor \(\mathcal {R}_\delta ^1\) in Proposition 7.13. What we obtain is a dynamic simplicial complex or zigzag simplicial filtration, a generalization of Definition 3.1, induced from any tame DMS \(\gamma _X\). We then can apply the k-th homology functor to this zigzag simplicial filtration for each \(k\ge 0\) in order to obtain a \(\textbf{vec}\)-valued constructible cosheaf over \(\textbf{R}\). This zigzag module will be a signature summarizing the time evolution of k-dimensional homological features of \(\gamma _X\). By virtue of Proposition 2.19 we eventually obtain the k-th homology barcode \({\bar{c}}\left( \textrm{H}_k\left( \mathcal {R}_\delta (\gamma _X)\right) \right) \) of \(\gamma _X\) with respect to the fixed scale \(\delta \ge 0\); see also [35] for the computation of \({\bar{c}}\left( \textrm{H}_k\left( \mathcal {R}_\delta (\gamma _X)\right) \right) \) for various \(\delta \). In particular, the 0-th homology barcode of the resulting zigzag module coincides with \({\bar{c}}\left( \pi _0\left( \mathcal {R}_\delta ^1(\gamma _X)\right) \right) \) as defined in Theorem 7.14.
A natural question is then to ask whether our stability theorem (cf. Theorem 7.14) can be extended to higher dimensional homology barcodes:
Question E.1
For any pair of tame DMSs \(\gamma _X=(X,d_X(\cdot ))\) and \(\gamma _Y=(Y,d_Y(\cdot ))\), is it true that for any \(\delta \ge 0\) and for any \(k\ge 1\),
Interestingly, we found a family of counter-examples that indicates that stability, as expressed by Theorem 7.14, is a phenomenon which seems to be essentially tied to clustering (i.e. \(\textrm{H}_0\)) information.
Theorem E.2
For each integer \(k\ge 1\) there exist two different tame DMSs \(\gamma _{X_k}\) and \(\gamma _{Y_k}\), and \(\delta _k\ge 0\) such that \( d_\textrm{I}^{\textrm{dynM}}\left( \gamma _{X_k},\gamma _{Y_k}\right) <\infty \) but such that the bottleneck distance between the barcodes of \(\textrm{H}_k\left( {\mathcal {R}}_{\delta _k}\left( \gamma _{X_k}\right) \right) \) and \(\textrm{H}_k\left( {\mathcal {R}}_{\delta _k}\left( \gamma _{Y_k}\right) \right) \) is infinite.
Proof
Fix any \(k\ge 1\). We will illustrate DMSs \(\gamma _{X_k}\) and \(\gamma _{Y_k}\) as collections of trajectories of points in \(\textbf{R}^{k+1}\), with the metric inherited from the Euclidean metric of \(\textbf{R}^{k+1}\) across all \(t\in \textbf{R}\). For \(k=1\) or \(k=2\), see Fig. 18.
Define \(\gamma _{X_k}\) to be the constant DMS consisting of \(2(k+1)\) points \(\pm e_i=(0,\ldots ,0,\pm 1,0,\ldots ,0)\in \textbf{R}^{k+1}\) for \(i=1,2,\ldots ,k+1\). On the other hand, define \(\gamma _{Y_k}\) to be obtained from \(\gamma _{X_k}\) by substituting the still point \(+e_1\) of \(\gamma _{X_k}\) by the oscillating point \((1+\sin ^2(t))e_1=(1+\sin ^2(t),0,\ldots ,0)\) for \(t\in \textbf{R}\).
It is not difficult to check that \(d_\textrm{I}^{\textrm{dynM}}\left( \gamma _{X_k},\gamma _{Y_k}\right) \le \pi /2\). However, with the connectivity parameter \(\delta =\sqrt{2}\), their barcodes of the k-th zigzag persistent homology are \({\bar{c}}\left( \textrm{H}_k\left( \mathcal {R}_\delta (X_k)\right) \right) =\{(-\infty ,\infty )\}\) and \({\bar{c}}\left( \textrm{H}_k\left( \mathcal {R}_\delta (Y_k)\right) \right) =\{[n\pi ,n\pi ]:n\in \textbf{Z}\}\), respectively. Therefore, \(d_\textrm{B}\left( {\bar{c}}\left( \textrm{H}_k\left( \mathcal {R}_\delta (X_k)\right) \right) , {\bar{c}}\left( \textrm{H}_k\left( \mathcal {R}_\delta (Y_k)\right) \right) \right) =+\infty .\) \(\square \)
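The combinatorial mechanism behind this counter-example can be inspected directly. The sketch below builds the point sets of \(\gamma _{X_k}\) and \(\gamma _{Y_k}\) at a given time and lists the edges of the Rips complex at scale \(\delta =\sqrt{2}\), assuming the convention that two points span an edge when their distance is at most \(\delta \). It does not compute homology; it only shows that the 1-skeleton is that of the cross-polytope boundary (a k-sphere) exactly when the oscillating point returns to \(+e_1\), and that this point is isolated otherwise. The function names and the numerical tolerance are illustrative.

```python
import math
from itertools import combinations

def points_X(k):
    """The 2(k+1) points +e_i, -e_i in R^(k+1); index 0 is +e_1."""
    pts = []
    for i in range(k + 1):
        for sign in (1.0, -1.0):
            p = [0.0] * (k + 1)
            p[i] = sign
            pts.append(tuple(p))
    return pts

def points_Y(k, t):
    """Same points, with +e_1 replaced by the oscillating point (1 + sin^2(t)) e_1."""
    pts = points_X(k)
    pts[0] = (1.0 + math.sin(t) ** 2,) + (0.0,) * k
    return pts

def rips_edges(pts, delta, tol=1e-9):
    """Index pairs at Euclidean distance <= delta (up to a small tolerance)."""
    return {(i, j) for i, j in combinations(range(len(pts)), 2)
            if math.dist(pts[i], pts[j]) <= delta + tol}

delta = math.sqrt(2)
k = 2
edges_X = rips_edges(points_X(k), delta)                      # octahedron: 12 edges
edges_Y_at_pi = rips_edges(points_Y(k, math.pi), delta)       # same graph as X
edges_Y_mid = rips_edges(points_Y(k, math.pi / 2), delta)     # moving point is isolated
```

For general k the static configuration has \(2k(k+1)\) edges, and away from the times \(n\pi \) the \(2k\) edges at the moving point disappear; because of the tolerance, times within a very small distance of \(n\pi \) are not distinguished from \(n\pi \) itself.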
About this article
Cite this article
Kim, W., Mémoli, F. Extracting Persistent Clusters in Dynamic Data via Möbius Inversion. Discrete Comput Geom 71, 1276–1342 (2024). https://doi.org/10.1007/s00454-023-00590-1
DOI: https://doi.org/10.1007/s00454-023-00590-1
Keywords
- Persistence diagram
- Persistent homology
- Möbius inversion
- Dynamic metric space
- Dynamic graph
- Clustering
- Reeb graph