The promises of persistent homology, machine learning, and deep neural networks in topological data analysis of democracy survival

Quality & Quantity

Abstract

This paper presents a new approach to survival analysis that uses topological data analysis (TDA) within Bayesian statistics, combined with machine learning algorithms suited to time-to-event data. It brings topological invariance into the analysis through persistent homology. TDA demonstrates the existence and statistical significance of a kind of unmeasured heterogeneity that originates in the topology of the data as a whole. Combined with machine learning, persistent homology offers a rich set of ways to analyze data and to build predictive models that are optimized using inherent topological invariants, such as one-dimensional loops, as regularization. Specifically, the paper incorporates persistent homology effects into the analysis of survival data in three ways through functional principal component analysis (FPCA): first, by converting topological invariants into FPCA factors that shape the Bayesian statistical analysis of time-to-event data; second, by using FPCA measures of topological invariants to regularize the optimization of the data and of the posterior distributions of the Bayesian estimation; third, by using FPCA factors of measures of topological invariants in machine learning algorithms and deep neural networks suited to survival data, as a way of going beyond the usual parametric and semi-parametric models of survival analysis. The approach is illustrated through a running example of multi-frailty survival analysis of democracies over the period 1950–2010.
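
As a minimal illustration of what persistent homology records, the sketch below computes the zero-dimensional persistence pairs (connected components) of a Vietoris-Rips filtration with a union-find structure. It is not the paper's pipeline: the abstract relies on one-dimensional loops, which require a full boundary-matrix reduction of the kind provided by libraries such as GUDHI. The function name `h0_persistence` and the toy point cloud are hypothetical, and only the Python standard library is assumed.

```python
from itertools import combinations
from math import dist, inf


def h0_persistence(points):
    """Return (birth, death) pairs of the connected components of a Rips filtration."""
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))

    def find(i):
        # Follow parent links to the component representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges enter the filtration in order of increasing length.
    edges = sorted(
        (dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge: one of them dies at this scale.
            parent[root_i] = root_j
            pairs.append((0.0, length))
    # The component that survives every merge never dies.
    pairs.append((0.0, inf))
    return pairs


# Hypothetical toy data: two tight clusters on a line, 9 units apart.
cloud = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
pairs = h0_persistence(cloud)
print(pairs)
```

On this toy cloud the two short bars come from points merging within each cluster, while the bar that dies at 9.0 records the gap between the clusters; long-lived bars of this kind are what persistence treats as structure rather than noise.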


Notes

  1. More generally, a k-simplex is “geometrically the convex hull of its \(k+1\) affinely positioned vertices” (Sizemore et al. 2019, p. 659). See the Appendix for more discussion.

  2. Homology groups of a space X “encode information about the way in which the simplices in successive dimensions are glued together” (Rabadan and Blumberg 2019, p. 102), as “a sequence of vector spaces \(H_{\cdot }(X)\), the dimensions of which count various types of linearly independent [topological] holes in X” (Ghrist 2017, p. 2).

  3. The data were created using the Python package scikit-learn (Pedregosa et al. 2011), https://scikit-learn.org/stable/

  4. I thank one of the reviewers for raising the question of whether these effects might be spurious. I examine this issue fully in a later section.

  5. Two widely used metrics in persistent homology, the Wasserstein and bottleneck distances, are summarized in the Appendix.

  6. A precise definition of persistence intensity is given in the Appendix.

  7. This method is implemented in the Python package scikit-fda, https://fda.readthedocs.io/en/latest/

  8. The covariates are as follows. ccode: country code (from 2 to 920); case: country index (from 0 to 100); episode: breakdown episode (0 if none); censored: whether the country was censored (no breakdown) during the study period (1950–2010); event: whether the country has a breakdown episode; time (in years): time to event between breakdowns, if any, or until censoring occurs (if a country has a breakdown episode and is not censored, the time counter is reset to 1); year: year of observation (1950–2010); exec_pres: presidential system; exec_mix: mixed system (president plus parliament); prevd_mil: prior regime was a military dictatorship; prevd_civ: prior regime was a civilian dictatorship; prevd_mon: prior regime was a monarchist dictatorship; NoColony: never a colony; BritColony: former British colony; cs: civil society; pi: party institutionalization; lnGDPMad: log of GDP per capita (as measured by Maddison); GDPGrowthMad: GDP growth (as measured by Maddison); dempregion: percentage of democracies in the region. The names of all countries included in this dataset are listed in the Appendix.

  9. The following two equations are borrowed from the package scikit-survival at https://github.com/sebp/scikit-survival/blob/v0.19.0.post1/sksurv/metrics.py

  10. These results for time-dependent AUC are obtained using the package scikit-survival for machine learning survival analysis, with the data split into training (80%) and testing (20%) sets of 2437 and 610 observations, respectively (with the requirement that the time range for test data is not within the time range of training data, to avoid problems of perfect predictability).

  11. This way of modeling survival analysis using a log-cdf for the censored cases is adopted from the Python package PyMC (Salvatier et al. 2016) for Bayesian models, which in turn adopts it from Stan's modeling of survival analysis.

  12. See (Carriere et al. 2021; Leygonie et al. 2022; Conti et al. 2022; Chen et al. 2019; Moor et al. 2020) and references therein. For a survey of Topological Machine Learning Methods, see (Hensel et al. 2021).

  13. For a quick introduction to topology optimization in engineering, see https://en.wikipedia.org/w/index.php?title=Topology_optimization&oldid=1116600501

  14. These results are summarized in Theorem 4.2 of Carriere et al. (2021), which provides explicit conditions ensuring the convergence of stochastic subgradient descent for functions of persistence (the main condition being a Lipschitz condition).

  15. I thank the reviewers for encouraging me to address this important issue.

  16. For more efficient convergence the discrete time-to-event was converted to a log scale in the MCMC computation.

  17. A good summarizing reference for the machine classification scheme is Hastie et al. (2009). For the TDA functionals I used what is available in the package (GUDHI 2022).

  18. This benefited from an example in the scikit-learn documentation (Pedregosa et al. 2011), https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_iris.html

  19. This last step is my amendment to the algorithm constructed in Hensel et al. (2022).

  20. These are: Random Survival Forests, Fast Survival SVM, Fast Kernel Survival Support Vector Machine, Hinge Loss Survival SVM, Minlip Survival Analysis, Naive Survival SVM, Componentwise Gradient Boosting Survival Analysis, Gradient Boosting Survival Analysis, and Extra Survival Trees. They are packaged in scikit-survival (Pölsterl 2020), https://scikit-survival.readthedocs.io/en/stable/api/index.html.

  21. PE, PL, PI, SIl and TOP stand, respectively, for persistence entropy, landscape, image, silhouette and topological vector.
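
Of the vectorizations named in Note 21, persistence entropy is the simplest to state: it is the Shannon entropy of the bar lengths of a persistence diagram after they are normalized to sum to one. The sketch below is a standard-library illustration under stated assumptions, not the computation used in the paper: the function name `persistence_entropy` is hypothetical, the natural logarithm is assumed, and bars with infinite death time are assumed to have been removed or truncated beforehand.

```python
from math import log


def persistence_entropy(diagram):
    """Shannon entropy of the normalized bar lengths of a persistence diagram.

    `diagram` is a list of finite (birth, death) pairs; zero-length bars are ignored.
    """
    lifetimes = [death - birth for birth, death in diagram if death > birth]
    total = sum(lifetimes)
    if total == 0:
        # No bars of positive length: define the entropy as zero.
        return 0.0
    # p_i = l_i / L, entropy = -sum_i p_i * log(p_i)
    return -sum((l / total) * log(l / total) for l in lifetimes)


# Hypothetical diagram with bar lengths 1, 1 and 2, so p = (0.25, 0.25, 0.5).
entropy = persistence_entropy([(0.0, 1.0), (0.0, 1.0), (0.0, 2.0)])
print(entropy)
```

The entropy is largest when all bars have equal length and falls toward zero as a single bar dominates, which is why it serves as a one-number summary of how evenly persistence is spread across features.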

References

  • Adams, Henry, Emerson, Tegan, Kirby, Michael, Neville, Rachel, Peterson, Chris, Shipman, Patrick, Chepushtanova, Sofya, Hanson, Eric, Motta, Francis, Ziegelmeier, Lori: Persistence images: a stable vector representation of persistent homology. J. Mach. Learn. Res. 18(8), 1–35 (2017)

  • Bansal, Aasthaa, Heagerty, Patrick J.: A comparison of landmark methods and time-dependent ROC methods to evaluate the time-varying performance of prognostic markers for survival outcomes. Diagnos. Prognost. Res. 3(1), 14 (2019)

  • Bauer, Ulrich, Lange, Carsten, Wardetzky, Max: Optimal topological simplification of discrete functions on surfaces. Discrete Comput. Geom. 47(2), 347–377 (2012)

  • Bernhard, Michael, Hicken, Allen, Reenock, Christopher, Lindberg, Staffan I.: Parties, civil society, and the deterrence of democratic defection. Stud. Comp. Int. Dev. 55(1), 1–26 (2020)

  • Berry, Eric, Chen, Yen-Chi, Cisewski-Kehe, Jessi, Fasy, Brittany Terese: Functional summaries of persistence diagrams. J. Appl. Comput. Topol. 4(2), 211–262 (2020)

  • Bubenik, Peter: Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res. 16, 77–102 (2015)

  • Cang, Zixuan, Wei, Guo-Wei: Persistent cohomology for data with multicomponent heterogeneous information. SIAM J. Math. Data Sci. 2(2), 396–418 (2020)

  • Carlsson, Gunnar: Topology and Data. Bull. Am. Math. Soc. 46, 255–308 (2009)

  • Carriere, M., Chazal, F., Glisse, M., Ike, Y., Kannan, H., Umeda, Y.: Optimizing persistent homology based functions. pp. 1294–1303 of: Meila, Marina, Zhang, Tong (eds), Proceedings of the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 139. PMLR (2021)

  • Chazal, F.: The structure and stability of persistence modules. Springer (2016)

  • Chazal, Frederic, Michel, Bertrand: An introduction to topological data analysis: fundamental and practical aspects for data scientists. Front. Artificial Intell. 4, 108 (2021)

  • Chen, C., Ni, X., Bai, Q., Wang, Y.: A Topological Regularizer for Classifiers via Persistent Homology. pp. 2573–2582 of: Chaudhuri, Kamalika, Sugiyama, Masashi (eds), Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 89. PMLR (2019)

  • Conti, F., Moroni, D., Pascali, M.A.: A topological machine learning pipeline for classification. Mathematics 10(17) (2022)

  • Edelsbrunner, H., Letscher, D., Zomorodian, A.: Topological persistence and simplification. Discrete Comput. Geom. 28(4), 511–533 (2002)

  • Fawcett, Tom: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006)

  • Fefferman, Charles, Mitter, Sanjoy, Narayanan, Hariharan: Testing the manifold hypothesis. J. Am. Math. Soc. 29(4), 983–1049 (2016)

  • Gameiro, Marcio, Hiraoka, Yasuaki, Obayashi, Ippei: Continuation of point clouds via persistence diagrams. Physica D 334, 118–132 (2016)

  • Ghrist, Robert: Barcodes: the persistent topology of data. Bull. Am. Math. Soc. 45, 61–75 (2008)

  • Ghrist, Robert: Elementary Applied Topology, ed. 1.0. Createspace (2014)

  • Ghrist, Robert: Homological algebra and data. Math. Data, IAS/Park City Math. 25, 273–325 (2017)

  • Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer (2009)

  • Hatcher, A.: Algebraic Topology. Cambridge University Press (2002)

  • Hensel, F., Moor, M., Rieck, B.: A survey of topological machine learning methods. Front. Artificial Intell. 4 (2021)

  • Hensel, Felix, Glisse, Marc, Chazal, Frédéric, de Surrel, Thibault, Carriere, Mathieu, Lacombe, Theo, Kurihara, Hiroaki, Ike, Yuichi: RipsNet: A general architecture for fast and robust estimation of the persistent homology of point clouds. Proceed. Mach. Learn. Res. 196, 96–106 (2022)

  • Kvamme, Håvard, Borgan, Ørnulf, Scheel, Ida: Time-to-event prediction with neural networks and Cox regression. J. Mach. Learn. Res. 20(129), 1–30 (2019)

  • Lambert, Jérôme, Chevret, Sylvie: Summary measure of discrimination in survival models based on cumulative/dynamic time-dependent ROC curves. Stat. Methods Med. Res. 25(5), 2088–2102 (2014)

  • Leygonie, Jacob, Oudot, Steve, Tillmann, Ulrike: A framework for differential calculus on persistence barcodes. Found. Comput. Math. 22(4), 1069–1131 (2022)

  • Duchin, M., Needham, T., Weighill, T.: The (homological) persistence of gerrymandering. Foundations of Data Science 1–42 (2021)

  • Moor, M., Horn, M., Rieck, B., Borgwardt, K.: Topological autoencoders. pp. 7045–7054 of: Daumé III, Hal, Singh, Aarti (eds), Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119. PMLR (2020)

  • Obayashi, Ippei, Hiraoka, Yasuaki, Kimura, Masao: Persistence diagrams with linear machine learning models. J. Appl. Comput. Topol. 1(3), 421–449 (2018)

  • Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  • Pölsterl, Sebastian: scikit-survival: a library for time-to-event analysis built on top of scikit-learn. J. Mach. Learn. Res. 21(212), 1–6 (2020)

  • Poulenard, Adrien, Skraba, Primoz, Ovsjanikov, Maks: Topological function optimization for continuous shape matching. Comput. Graphics Forum 37(5), 13–25 (2018)

  • Rabadan, Raul, Blumberg, Andrew J.: Topological Data Analysis for Genomics and Evolution: Topology in Biology. Cambridge University Press, Cambridge (2019)

  • Ross, Lauren N.: Distinguishing topological and causal explanation. Synthese 198(10), 9803–9820 (2021)

  • Salvatier, John, Wiecki, Thomas V., Fonnesbeck, Christopher: Probabilistic programming in Python using PyMC3. PeerJ Comput. Sci. 2, e55 (2016)

  • Sizemore, Ann E., Phillips-Cremins, Jennifer E., Ghrist, Robert, Bassett, Danielle S.: The importance of the whole: topological data analysis for the network neuroscientist. Netw. Neurosci. 3(3), 656–673 (2019)

  • Skraba, P., Turner, K.: Wasserstein Stability for Persistence Diagrams. (2021) arXiv:2006.16824v3 [math.AT]

  • Tauzin, G., Lupo, U., Tunstall, L., Pérez, J.B., Caorsi, M., Medina-Mardones, A., Dassatti, A., Hess, K.: giotto-tda: A Topological Data Analysis Toolkit for Machine Learning and Data Exploration (2020)

  • Tauzin, Guillaume, Lupo, Umberto, Tunstall, Lewis, Pérez, Julian Burella, Caorsi, Matteo, Medina-Mardones, Anibal M., Dassatti, Alberto, Hess, Kathryn: giotto-tda: a topological data analysis toolkit for machine learning and data exploration. J. Mach. Learn. Res. 22(39), 1–6 (2021)

  • The GUDHI Project. GUDHI User and Reference Manual. 3.5.0 edn. GUDHI Editorial Board (2022)

  • Turner, Katharine: Medians of populations of persistence diagrams. Homol., Homotopy and Appl. 22(1), 255–282 (2020)

  • Turner, Katharine, Mileyko, Yuriy, Mukherjee, Sayan, Harer, John: Fréchet means for distributions of persistence diagrams. Discrete Comput. Geom. 52(1), 44–70 (2014)

  • Wasserman, Larry: Topological Data Analysis. Annual Review of Statistics and Its Application 5(1), 501–532 (2018)

  • Zhang, Jianfei, Chen, Lifei, Ye, Yanfang, Guo, Gongde, Chen, Rongbo, Vanasse, Alain, Wang, Shengrui: Survival neural networks for time-to-event prediction in longitudinal study. Knowl. Inf. Syst. 62(9), 3727–3751 (2020)

  • Zomorodian, A.: Topology for Computing. Cambridge University Press (2009)

Funding

The author declares that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Corresponding author

Correspondence to Badredine Arfi.

Ethics declarations

Conflict of interest

The author has no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The electronic supplementary material consists of Supplementary file 1 (PDF, 7798 KB).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Arfi, B. The promises of persistent homology, machine learning, and deep neural networks in topological data analysis of democracy survival. Qual Quant 58, 1685–1727 (2024). https://doi.org/10.1007/s11135-023-01708-6

  • DOI: https://doi.org/10.1007/s11135-023-01708-6
