The promises of persistent homology, machine learning, and deep neural networks in topological data analysis of democracy survival

Arfi, Badredine

doi:10.1007/s11135-023-01708-6

Published: 15 July 2023

Volume 58, pages 1685–1727, (2024)

Quality & Quantity

Badredine Arfi ORCID: orcid.org/0000-0002-0905-7359¹

Abstract

This paper presents a new approach to survival analysis using topological data analysis (TDA) within Bayesian statistics, combined with machine learning algorithms suitable for time-to-event data. The paper brings into the analysis aspects of topological invariance through what is known as persistent homology. TDA demonstrates the existence and statistical significance of a kind of unmeasured heterogeneity originating from the topology of the data as a whole. Combined with machine learning tools, persistent homology provides a rich set of ways to analyze data and build predictive models that are optimized using inherent topological invariants, such as one-dimensional loops, as regularization. Specifically, this paper incorporates persistent homology effects in three ways in the analysis of survival data through the technique of functional principal component analysis (FPCA): first, by using topological invariants converted into FPCA factors that shape Bayesian statistical analysis of time-to-event data; second, by using FPCA measures of topological invariants in regularizing the process of optimizing the data and the posterior distributions of the Bayesian estimation; third, by using FPCA factors of measures of topological invariants in machine learning algorithms and deep neural networks suitable for analyzing survival data, as a way of going beyond the usual parametric and semi-parametric models of survival analysis. The approach is illustrated through a running example of multi-frailty survival analysis of democracies in the period 1950–2010.
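
The abstract does not spell out an algorithm, so the following is only a minimal sketch of what persistent homology records. It assumes a Vietoris-Rips filtration on a small point cloud and tracks only dimension 0 (connected components), which is simpler than the one-dimensional loops the paper uses. The function name `h0_persistence` and the toy point cloud are illustrative assumptions, not part of the paper.

```python
import itertools
import math


def h0_persistence(points):
    """Return (birth, death) pairs of connected components.

    Assumption: a Vietoris-Rips filtration, where an edge between two
    points appears at their Euclidean distance. Every component is born
    at scale 0 and dies when it merges into another component.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by the scale at which they appear.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge: one of them dies at this scale.
            parent[root_i] = root_j
            pairs.append((0.0, length))

    # One component survives at every scale.
    pairs.append((0.0, math.inf))
    return pairs


# Hypothetical toy data: three points on a line.
cloud = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
diagram = h0_persistence(cloud)
print(diagram)
```

Long-lived pairs (large death minus birth) are the features usually read as signal, and short-lived ones as noise. Summaries of such diagrams are what the paper converts into FPCA factors.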

Notes

More generally, a k-simplex is “geometrically the convex hull of its \(k+1\) affinely positioned vertices” (Sizemore et al. 2019, p. 659). See the Appendix for more discussion.
Homology groups of a space X “encode information about the way in which the simplices in successive dimensions are glued together” (Rabadan and Blumberg 2019, p. 102), as “a sequence of vector spaces \(H_{\cdot }(X)\), the dimensions of which count various types of linearly independent [topological] holes in X” (Ghrist 2017, p. 2).
The data were created using the Python package scikit-learn (Pedregosa et al. 2011), https://scikit-learn.org/stable/
I thank one of the reviewers for raising the issue of whether one might be dealing with spurious effects in this regard. I fully examine this issue in a later section.
Two widely used notions of a metric in persistent homology, the Wasserstein and bottleneck distances, are summarized in the Appendix.
A precise definition of persistence intensity is given in the Appendix.
This method has been implemented in the Python package scikit-fda, https://fda.readthedocs.io/en/latest/
Here are the specifics of the covariates: ccode: country code (from 2 to 920); case: country index (from 0 to 100); episode: breakdown episode (0 if none); censored: whether the country was censored (no breakdown) during the study period (1950–2010); event: whether a country has a breakdown episode; time (in years): time to event between breakdowns, if any, or until censoring occurs (if a country has a breakdown episode and is not censored, the time counter is reset to 1); year: year of observation (1950–2010); exec_pres: presidential system; exec_mix: mixed (president plus parliament); prevd_mil: prior regime military dictatorship; prevd_civ: prior regime civilian dictatorship; prevd_mon: prior regime monarchist dictatorship; NoColony: never a former colony; BritColony: former British colony; cs: civil society; pi: party institutionalization; lnGDPMad: log of GDP per capita (as measured by Maddison); GDPGrowthMad: GDP growth (as measured by Maddison); dempregion: percent of democracies in the region. The names of all countries included in this dataset are listed in the Appendix.
The following two equations are borrowed from the package scikit-survival at https://github.com/sebp/scikit-survival/blob/v0.19.0.post1/sksurv/metrics.py
These results for time-dependent AUC are obtained using the package scikit-survival for machine learning survival analysis, with the data split into training (80%) and testing (20%) sets of 2437 and 610 observations, respectively (with the requirement that the time range of the test data is not within the time range of the training data, to avoid problems of perfect predictability).
This way of modeling survival analysis using a log-cdf for the censored cases is adopted from the Python package PyMC (Salvatier et al. 2016) for Bayesian models, which in turn follows the Stan approach to modeling survival analysis.
See Carriere et al. (2021), Leygonie et al. (2022), Conti et al. (2022), Chen et al. (2019), Moor et al. (2020), and references therein. For a survey of topological machine learning methods, see Hensel et al. (2021).
For a quick introduction to topology optimization in engineering, see https://en.wikipedia.org/w/index.php?title=Topology_optimization&oldid=1116600501
These results are summarized in Theorem 4.2 in Carriere et al. (2021), which provides explicit conditions ensuring the convergence of stochastic subgradient descent for functions of persistence (the main condition being a Lipschitz condition).
I thank the reviewers for encouraging me to address this important issue.
For more efficient convergence, the discrete time-to-event was converted to a log scale in the MCMC computation.
A good summarizing reference for the machine classification scheme is Hastie et al. (2009). For the TDA functionals I used what is available in the GUDHI package (GUDHI 2022).
This benefited from an example in the scikit-learn documentation (Pedregosa et al. 2011), https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_iris.html
This last step is my amendment to the algorithm constructed in Hensel et al. (2022).
These are: Random Survival Forests, Fast Survival SVM, Fast Kernel Survival Support Vector Machine, Hinge Loss Survival SVM, Minlip Survival Analysis, Naive Survival SVM, Componentwise Gradient Boosting Survival Analysis, Gradient Boosting Survival Analysis, Extra Survival Trees. They are packaged in scikit-survival (Pölsterl 2020), https://scikit-survival.readthedocs.io/en/stable/api/index.html.
PE, PL, PI, SIl and TOP respectively stand for persistence entropy, persistence landscape, persistence image, persistence silhouette, and topological vector.
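
The note above on modeling censored cases can be illustrated with a small sketch. It assumes that "log-cdf" there refers to the log of the complementary CDF (the survival function), which is the standard likelihood contribution of a right-censored observation. It also assumes a Weibull model purely for illustration; the notes do not state which distribution the paper uses. All function names are hypothetical and this is not the PyMC or Stan API.

```python
import math


def weibull_log_pdf(t, shape, scale):
    """Log density of a Weibull(shape, scale) at time t > 0."""
    z = t / scale
    return math.log(shape / scale) + (shape - 1.0) * math.log(z) - z ** shape


def weibull_log_survival(t, shape, scale):
    """Log of the complementary CDF, log S(t) = -(t / scale) ** shape."""
    return -((t / scale) ** shape)


def censored_log_likelihood(times, events, shape, scale):
    """Sum log-likelihood contributions under right-censoring.

    Observed events contribute the log density; censored cases only tell
    us the event happened after time t, so they contribute log S(t).
    """
    total = 0.0
    for t, observed in zip(times, events):
        if observed:
            total += weibull_log_pdf(t, shape, scale)
        else:
            total += weibull_log_survival(t, shape, scale)
    return total


# Hypothetical data: one breakdown observed at year 1, one case censored at year 4.
value = censored_log_likelihood([1.0, 4.0], [True, False], shape=1.0, scale=2.0)
print(value)
```

In a Bayesian setting, a quantity of this form would be added to the log posterior alongside the priors on `shape` and `scale`.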

References

Adams, Henry, Emerson, Tegan, Kirby, Michael, Neville, Rachel, Peterson, Chris, Shipman, Patrick, Chepushtanova, Sofya, Hanson, Eric, Motta, Francis, Ziegelmeier, Lori: Persistence images: a stable vector representation of persistent homology. J. Mach. Learn. Res. 18(8), 1–35 (2017)
Bansal, Aasthaa, Heagerty, Patrick J.: A comparison of landmark methods and time-dependent ROC methods to evaluate the time-varying performance of prognostic markers for survival outcomes. Diagnos. Prognost. Res. 3(1), 14 (2019)
Bauer, Ulrich, Lange, Carsten, Wardetzky, Max: Optimal topological simplification of discrete functions on surfaces. Discrete Comput. Geom. 47(2), 347–377 (2012)
Bernhard, Michael, Hicken, Allen, Reenock, Christopher, Lindberg, Staffan I.: Parties, civil society, and the deterrence of democratic defection. Stud. Comp. Int. Dev. 55(1), 1–26 (2020)
Berry, Eric, Chen, Yen-Chi, Cisewski-Kehe, Jessi, Fasy, Brittany Terese: Functional summaries of persistence diagrams. J. Appl. Comput. Topol. 4(2), 211–262 (2020)
Bubenik, Peter: Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res. 16, 77–102 (2015)
Cang, Zixuan, Wei, Guo-Wei: Persistent cohomology for data with multicomponent heterogeneous information. SIAM J. Math. Data Sci. 2(2), 396–418 (2020)
Carlsson, Gunnar: Topology and Data. Bull. Am. Math. Soc. 46, 255–308 (2009)
Carriere, M., Chazal, F., Glisse, M., Ike, Y., Kannan, H., Umeda, Y.: Optimizing persistent homology based functions. pp. 1294–1303 of: Meila, Marina, Zhang, Tong (eds), Proceedings of the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 139. PMLR (2021)
Chazal, F.: The structure and stability of persistence modules. Springer (2016)
Chazal, Frederic, Michel, Bertrand: An introduction to topological data analysis: fundamental and practical aspects for data scientists. Front. Artificial Intell. 4, 108 (2021)
Chen, C., Ni, X., Bai, Q., Wang, Y.: A Topological Regularizer for Classifiers via Persistent Homology. pp. 2573–2582 of: Chaudhuri, Kamalika, Sugiyama, Masashi (eds), Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 89. PMLR (2019)
Conti, F., Moroni, D., Pascali, M.A.: A topological machine learning pipeline for classification. Mathematics 10(17) (2022)
Edelsbrunner, H., Letscher, D., Zomorodian, A.: Topological persistence and simplification. Discrete Comput. Geom. 28(4), 511–533 (2002)
Fawcett, Tom: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006)
Fefferman, Charles, Mitter, Sanjoy, Narayanan, Hariharan: Testing the manifold hypothesis. J. Am. Math. Soc. 29(4), 983–1049 (2016)
Gameiro, Marcio, Hiraoka, Yasuaki, Obayashi, Ippei: Continuation of point clouds via persistence diagrams. Physica D 334, 118–132 (2016)
Ghrist, Robert: Barcodes: the persistent topology of data. Bull. Amer. Math. Soc. 45, 61–75 (2008)
Ghrist, Robert: Elementary Applied Topology, ed. 1.0. Createspace (2014)
Ghrist, Robert: Homological algebra and data. Math. Data, IAS/Park City Math. 25, 273–325 (2017)
Hastie, T., Tibshirani, R., Friedman, J.: The elements of statistical learning: data mining, inference, and prediction, Second Edition. Springer (2009)
Hatcher, A.: Algebraic Topology. Cambridge University Press (2002)
Hensel, F., Moor, M., Rieck, B.: A survey of topological machine learning methods. Front. Artificial Intell. 4 (2021)
Hensel, Felix, Glisse, Marc, Chazal, Frédéric, de Surrel, Thibault, Carriere, Mathieu, Lacombe, Theo, Kurihara, Hiroaki, Ike, Yuichi: RipsNet: A general architecture for fast and robust estimation of the persistent homology of point clouds. Proceed. Mach. Learn. Res. 196, 96–106 (2022)
Kvamme, Håvard, Borgan, Ørnulf, Scheel, Ida: Time-to-Event Prediction with Neural Networks and Cox Regression. J. Mach. Learn. Res. 20(129), 1–30 (2019)
Lambert, Jérôme, Chevret, Sylvie: Summary measure of discrimination in survival models based on cumulative/dynamic time-dependent ROC curves. Stat. Methods Med. Res. 25(5), 2088–2102 (2014)
Leygonie, Jacob, Oudot, Steve, Tillmann, Ulrike: A framework for differential calculus on persistence barcodes. Found. Comput. Math. 22(4), 1069–1131 (2022)
Duchin, M., Needham, T., Weighill, T.: The (homological) persistence of gerrymandering. Foundations of Data Science 1–42 (2021)
Moor, M., Horn, M., Rieck, B., Borgwardt, K.: Topological Autoencoders. pp. 7045–7054 of: III, Hal Daumé, Singh, Aarti (eds), Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119. PMLR (2020)
Obayashi, Ippei, Hiraoka, Yasuaki, Kimura, Masao: Persistence diagrams with linear machine learning models. J. Appl. Comput. Topol. 1(3), 421–449 (2018)
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Pölsterl, Sebastian: scikit-survival: a library for time-to-event analysis built on top of scikit-learn. J. Mach. Learn. Res. 21(212), 1–6 (2020)
Poulenard, Adrien, Skraba, Primoz, Ovsjanikov, Maks: Topological function optimization for continuous shape matching. Comput. Graphics Forum 37(5), 13–25 (2018)
Rabadan, Raul, Blumberg, Andrew J.: Topological Data Analysis for Genomics and Evolution: Topology in Biology. Cambridge University Press, Cambridge (2019)
Ross, Lauren N.: Distinguishing topological and causal explanation. Synthese 198(10), 9803–9820 (2021)
Salvatier, John, Wiecki, Thomas V., Fonnesbeck, Christopher: Probabilistic programming in Python using PyMC3. PeerJ Comput. Sci. 2, e55 (2016)
Sizemore, Ann E., Phillips-Cremins, Jennifer E., Ghrist, Robert, Bassett, Danielle S.: The importance of the whole: topological data analysis for the network neuroscientist. Netw. Neurosci. 3(3), 656–673 (2019)
Skraba, P., Turner, K.: Wasserstein Stability for Persistence Diagrams. (2021) arXiv:2006.16824v3 [math.AT]
Tauzin, Guillaume, Lupo, Umberto, Tunstall, Lewis, Pérez, Julian Burella, Caorsi, Matteo, Medina-Mardones, Anibal M., Dassatti, Alberto, Hess, Kathryn: giotto-tda: a topological data analysis toolkit for machine learning and data exploration. J. Mach. Learn. Res. 22(39), 1–6 (2021)
The GUDHI Project. GUDHI User and Reference Manual. 3.5.0 edn. GUDHI Editorial Board (2022)
Turner, Katharine: Medians of populations of persistence diagrams. Homol. Homotopy Appl. 22(1), 255–282 (2020)
Turner, Katharine, Mileyko, Yuriy, Mukherjee, Sayan, Harer, John: Fréchet means for distributions of persistence diagrams. Discrete Comput. Geom. 52(1), 44–70 (2014)
Wasserman, Larry: Topological Data Analysis. Annual Review of Statistics and Its Application 5(1), 501–532 (2018)
Zhang, Jianfei, Chen, Lifei, Ye, Yanfang, Guo, Gongde, Chen, Rongbo, Vanasse, Alain, Wang, Shengrui: Survival neural networks for time-to-event prediction in longitudinal study. Knowl. Inf. Syst. 62(9), 3727–3751 (2020)
Zomorodian, A.: Topology for Computing. Cambridge University Press (2009)

Funding

The author declares that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Authors and Affiliations

Department of Political Science, University of Florida, Gainesville, FL 32618, USA
Badredine Arfi

Corresponding author

Correspondence to Badredine Arfi.

Ethics declarations

Conflict of interest

The author has no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 7798 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Arfi, B. The promises of persistent homology, machine learning, and deep neural networks in topological data analysis of democracy survival. Qual Quant 58, 1685–1727 (2024). https://doi.org/10.1007/s11135-023-01708-6

Accepted: 24 June 2023
Published: 15 July 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s11135-023-01708-6
