Topological Deep Learning: A Review of an Emerging Paradigm

Topological data analysis (TDA) provides insight into data shape. The summaries obtained by these methods are principled global descriptions of multi-dimensional data that exhibit stable properties such as robustness to deformation and noise. Such properties are desirable in deep learning pipelines, but they are typically obtained using non-TDA strategies. This is partly caused by the difficulty of combining TDA constructs (e.g. barcodes and persistence diagrams) with current deep learning algorithms. Fortunately, we are now witnessing a growth of deep learning applications embracing topologically-guided components. In this survey, we review the nascent field of topological deep learning by first revisiting the core concepts of TDA. We then explore how the use of TDA techniques has evolved over time to support deep learning frameworks, and how they can be integrated into different aspects of deep learning. Furthermore, we touch on the use of TDA for analyzing existing deep models, which we call deep topological analytics. Finally, we discuss the challenges and future prospects of topological deep learning.


Introduction
Topological data analysis (TDA) is a relatively recent amalgam of theory and algorithms that aim to obtain a geometric and topological understanding of data from real-world applications. The approach to data employed in TDA fundamentally differs from that of statistical learning. Rather than finding summary statistics, estimators, fitting approximate distributions, clustering, or training neural nets, TDA seeks to understand the properties of the geometric object, often a manifold, on which the data resides. This reflects the common intuition that data tends to lie on, or close to, a lower-dimensional manifold embedded in a high-dimensional feature space. In this article, we sometimes refer to this as the data manifold.
The main goal of TDA is to infer information about the global structure of the data manifold, such as its connectivity and the presence of multi-dimensional holes. An important property of the topological information obtained is its invariance to continuous deformation and scaling. This property also lends itself to robustness against perturbation and noise. Another benefit is the versatility of TDA methods, owing mostly to the abstract origins of algebraic topology. The methods are applicable to a wide variety of data types and objects, including point clouds in Euclidean spaces, categorical data, images, and functions. Due to these aspects, the absence of parameters to tune, and the fundamental mathematical nature of the TDA approach, it is appealing to incorporate TDA into deep neural networks.
There has been much recent activity in co-opting topological approaches for deep learning; however, considerable open questions remain as to what the leading approach should be, owing to many computational and theoretical concerns. The TDA methods discussed in this paper form only a small part of the ever-expanding interface between topological data analysis and machine learning. We did our best to choose works that have a historical and linear connection with deep learning approaches, to improve understandability.
This survey provides the broader machine learning community with a convenient starting point to explore how TDA has been integrated with deep learning. To the best of our knowledge, this is the first work that comprehensively covers topological deep learning and organizes the research works in this field in a unified taxonomy (Section 3).
We start in Section 2 by introducing the key theoretical concepts of TDA and their representations for learning. In Section 3 we explain how topological approaches can fit into different deep learning constructs, such as learnable features, feature transformations, and loss functions. In Section 3.4 we shed light on a promising use of TDA to understand and dissect trained deep models, called deep topological analytics.
We continue in Section 4.1 with a discussion of the known challenges of TDA and its adaptation to deep neural networks. We further discuss future directions and adjacent applications of topological deep learning, and we present some current libraries. Finally, we make some concluding remarks in Section 5.
Notations: We write $X \in \mathbb{R}^{n \times d}$ to denote the data set, where $n$ is the number of samples and $d$ the number of features or dimensions. We write $M$ to denote the underlying data manifold, which for the purposes of this survey is a locally Euclidean space embedded in $\mathbb{R}^d$. We write BD and PD as abbreviations of barcode diagram and persistence diagram.

Overview of TDA
An object's topology is broadly defined as the characteristics that remain invariant under continuous deformation, as if the object were made of soft rubber. The number of connected components an object has, the holes or voids it contains, and how it loops back on itself are a few examples of topological properties. In a sense, topological information can be considered qualitative. For example, if we demonstrate that data points lie on two totally disconnected sub-manifolds, this suggests that the data comes from two very distinct sources, or that the underlying system has two distinct states.
A central concept is that of homology, a powerful tool for characterizing the topological features of a space. Homology is an abstract concept that is difficult to work with, and its general definition is beyond the scope of this paper and even of most of the TDA literature [Carlsson, 2009]. In essence, the k-th homology group $H_k$ (where $0 \le k \le d$) characterizes the set of k-dimensional holes (or voids) in a topological space; for $k = 0$, these are the connected components. A 1-dimensional hole can be traced around with a 1-dimensional loop (like a loop of string), whereas a 2-dimensional hole is a void, for example, the void within a hollow sphere in 3 dimensions. These k-dimensional holes are counted by the Betti numbers: the k-th Betti number $\beta_k$ is defined as the rank of $H_k$, which in general can be quite difficult to compute. Fortunately, there are some spaces for which the Betti numbers are relatively straightforward to compute.

Simplicial complexes and persistent homology
The k-th homology is much more convenient to work with when we restrict ourselves to simplicial complexes, which are structures built upon discrete sets.This is the natural domain for data-driven and machine learning applications.
A simplex can be considered a generalization of a triangle or tetrahedron; it is the simplest polytope of any given dimension. A simplex in zero dimensions is a point, in one dimension a line segment, in two dimensions a triangle, in three dimensions a tetrahedron, and so on. We use k-simplex to refer to a simplex of dimension k. Note that any simplex is composed of faces, which are themselves simplices of lower dimension. A simplicial complex K is a collection of simplices with two properties: each face of a simplex in K must also be in K, and the intersection of any two simplices of K is either empty or a face of both of them.
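The two defining properties can be made concrete with a small sketch. Assuming simplices are represented abstractly as sets of vertex labels (rather than as geometric polytopes), the helpers below (`proper_faces`, `closure`, and `is_simplicial_complex` are names chosen for illustration) build the smallest complex containing some given simplices and check the face-closure property.

```python
from itertools import combinations


def proper_faces(simplex):
    """All non-empty proper faces of a simplex given as a collection of vertex labels."""
    vertices = tuple(simplex)
    return {frozenset(f) for r in range(1, len(vertices))
            for f in combinations(vertices, r)}


def closure(simplices):
    """Smallest simplicial complex (a set of frozensets) containing the given simplices."""
    K = set()
    for s in simplices:
        K.add(frozenset(s))
        K |= proper_faces(s)
    return K


def is_simplicial_complex(K):
    """Face-closure check: every face of every simplex in K must also be in K.
    For vertex sets, the intersection of two simplices is a subset of both and
    hence a common face, so this is the only property that needs checking."""
    return all(face in K for s in K for face in proper_faces(s))
```

For example, `closure([(0, 1, 2), (2, 3)])` contains nine simplices: four vertices, four edges, and one triangle, whereas a lone edge without its endpoints fails the check.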
Consider each point in our data set X to be a vertex (a 0-simplex). We can define a set of 1-simplices as connections between pairs of vertices, 2-simplices on triples of vertices, and so on. Thus we build a simplicial complex K that gives some sense of "connectivity" between data points; it can be thought of as a hypergraph on X. Note that K is not uniquely determined by X.
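Homological information is much more easily obtained for a simplicial complex; in particular, the k-th Betti number can be obtained through tractable linear algebra. The Betti numbers in this setting are closely related to the Euler characteristic, which gives the relationship between the numbers of vertices, edges, and faces in a polyhedron ($\chi = V - E + F$); for a simplicial complex, $\chi$ equals both the alternating count of its simplices and the alternating sum $\sum_k (-1)^k \beta_k$ of its Betti numbers $\beta_k$.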
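To illustrate the "tractable linear algebra" claim, the following sketch computes Betti numbers as $\beta_k = \dim C_k - \operatorname{rank}\partial_k - \operatorname{rank}\partial_{k+1}$, where $\partial_k$ is the boundary matrix sending each k-simplex to its (k-1)-faces. It assumes coefficients in GF(2), so matrices are 0/1 and ranks can be computed by XOR elimination, and a complex given by its maximal simplices; all function names are illustrative.

```python
from itertools import combinations


def gf2_rank(rows):
    """Rank over GF(2) of a 0/1 matrix whose rows are encoded as integer bitmasks."""
    basis = {}  # leading bit -> basis row having that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]  # eliminate the leading bit
    return len(basis)


def closure(maximal_simplices):
    """All non-empty faces of the given simplices, as frozensets of vertices."""
    return {frozenset(face) for s in maximal_simplices
            for r in range(1, len(s) + 1) for face in combinations(s, r)}


def betti_numbers(maximal_simplices):
    """Betti numbers over GF(2): beta_k = dim C_k - rank(d_k) - rank(d_{k+1})."""
    by_dim = {}
    for s in closure(maximal_simplices):
        by_dim.setdefault(len(s) - 1, []).append(s)
    top = max(by_dim)
    # position of every k-simplex, used as a column index in the boundary matrix
    index = {k: {s: i for i, s in enumerate(by_dim[k])} for k in by_dim}

    def boundary_rank(k):
        if k <= 0 or k > top:
            return 0  # d_0 and maps above the top dimension are zero
        rows = []
        for s in by_dim[k]:
            mask = 0
            for v in s:
                mask |= 1 << index[k - 1][s - {v}]  # face obtained by dropping v
            rows.append(mask)
        return gf2_rank(rows)

    return [len(by_dim[k]) - boundary_rank(k) - boundary_rank(k + 1)
            for k in range(top + 1)]


def euler_characteristic(maximal_simplices):
    """Alternating count of simplices: V - E + F - ..."""
    return sum((-1) ** (len(s) - 1) for s in closure(maximal_simplices))
```

On the hollow boundary of a tetrahedron, `betti_numbers` returns `[1, 0, 1]` (one component, no loops, one enclosed void), and the Euler characteristic 4 - 6 + 4 = 2 matches 1 - 0 + 1.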
The goal now is to construct simplicial complexes on X that reflect the underlying topology of M. This is done by varying a scale parameter, typically a radius $r > 0$. The Čech complex and the Vietoris-Rips complex are two typical constructions [Chazal and Michel, 2021]. The Čech complex $C_r(X)$ includes a k-simplex on $(k + 1)$ vertices of X if the balls of radius r centered on these vertices have a non-empty common intersection. The Vietoris-Rips (or simply Rips) complex $V_r(X)$ includes a k-simplex on any set of $(k + 1)$ vertices whose pairwise distances are all less than r [Zomorodian, 2010]. These two constructions can yield very different simplicial complexes on the same data set with the same r.
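A brute-force sketch of the Rips construction, following the "pairwise distances less than r" rule above, is shown below. The function name, the `max_dim` cutoff, and the use of Euclidean distance are assumptions for illustration. Because the Rips complex is fully determined by its edges, only pairwise distances are needed, which is one reason it is cheaper to build than the Čech complex.

```python
import math
from itertools import combinations


def rips_complex(points, r, max_dim=2):
    """Vietoris-Rips complex up to dimension max_dim: a simplex on every vertex set
    whose pairwise Euclidean distances are all < r. Simplices are sorted index tuples."""
    n = len(points)
    close = {(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) < r}
    simplices = [(i,) for i in range(n)]  # every point is a 0-simplex
    for k in range(1, max_dim + 1):
        for s in combinations(range(n), k + 1):
            # a k-simplex is included iff all of its edges are short enough
            if all(pair in close for pair in combinations(s, 2)):
                simplices.append(s)
    return simplices
```

On the four corners of a unit square, r = 1.1 yields only the four sides (8 simplices in total), while r = 1.5 also admits both diagonals and all four triangles (14 simplices).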
Persistent homology is obtained through a filtration $F$, which is a growing sequence of sub-complexes:
$$K_0 \subseteq K_1 \subseteq \cdots \subseteq K_m = K.$$
Two commonly used examples of filtrations are the families of simplicial complexes $C_r(X)$ or $V_r(X)$ obtained with increasing radius $r$. As we vary $r$, these constructions naturally reflect different aspects of the topology of M. These simplicial complexes are monotonically nested with increasing $r$, i.e. for two radii $r \le r'$ we have $C_r(X) \subseteq C_{r'}(X)$ and $V_r(X) \subseteq V_{r'}(X)$.
The key idea is to track changes in topological features as they appear and disappear over the filtration. As we increase r, we may see new loops created, separate components connected, or holes filled in. We record the lifetime of each topological feature with respect to r, that is, its appearance (at $b_i$, its birth) and its disappearance (at $d_i$, its death).
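For 0-dimensional features (connected components) of the Rips filtration, births and deaths can be tracked with a union-find structure: every point is born at r = 0, and a component dies at the length of the first edge that merges it into another component. The sketch below assumes a Euclidean point cloud and is essentially Kruskal's minimum-spanning-tree algorithm; higher-dimensional features (loops, voids) require a full boundary-matrix reduction and are not covered.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """0-dimensional barcode of the Rips filtration of a point cloud.
    Every point is born at r = 0; a component dies when an edge merges it."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # edges enter the filtration in order of increasing length
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge merges two components: one of them dies
            parent[ri] = rj
            bars.append((0.0, length))
    bars.append((0.0, math.inf))  # one component never dies
    return bars
```

For the collinear points (0, 0), (1, 0), and (5, 0), it returns the bars [0, 1), [0, 4), and [0, ∞).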

Representations of persistent homology
The set of birth and death coordinates obtained from the filtration forms the backbone of persistent homology. The two most popular representations of this information are barcode diagrams and persistence diagrams [Carlsson, 2009]. The multi-set of intervals $(b_i, d_i)$ forms the barcode diagram (BD), the name coming from the visual representation of the intervals as stacked line segments. In the persistence diagram (PD), the lifetime of each feature is represented by a point in $\mathbb{R}^2$ with coordinates $(b_i, d_i)$. A filtration may have several copies of the same birth-death interval, which is represented in the PD by giving the point $(b_i, d_i)$ an integer-valued multiplicity. It is important to note that the BD and PD contain equivalent information, and one can define a bijection between the two. From here onwards we use the term PD to refer to either construct unless BD is explicitly referred to.
A data set's PD contains a wealth of topological information. Features with a long persistence interval $(d_i - b_i)$ are considered likely to reflect true topological features of the underlying manifold M. These features are represented in the PD by points far from the diagonal. A short persistence interval describes a feature that is possibly generated by noise or is otherwise insignificant; such features are represented by points close to the diagonal. Hence, points in the PD further from the diagonal are considered more informative.
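The "distance from the diagonal" criterion can be sketched in a few lines. In the ℓ∞ norm, the distance from a point $(b_i, d_i)$ to the diagonal is $(d_i - b_i)/2$, half its persistence. The cutoff `threshold` below is a hypothetical user choice, not a value prescribed by the works discussed here.

```python
def persistence(point):
    """Lifetime d - b of a diagram point (b, d)."""
    b, d = point
    return d - b


def distance_to_diagonal(point):
    """l-infinity distance from (b, d) to the diagonal, attained at the
    diagonal point ((b + d) / 2, (b + d) / 2)."""
    return persistence(point) / 2


def significant_features(diagram, threshold):
    """Keep the points whose persistence exceeds a (hypothetical) threshold."""
    return [p for p in diagram if persistence(p) > threshold]
```

For instance, in the diagram `[(0.0, 0.1), (0.2, 2.0), (0.5, 0.55)]` only `(0.2, 2.0)` survives a threshold of 0.5; the other two points sit close to the diagonal.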
Comparing the PDs of two objects is a way to assess their topological similarity.In the next section, we discuss various methods to represent them in manners suitable for machine learning and computation.

Homological feature vectorizations
Most machine learning methods assume that the input data resides in $\mathbb{R}^d$ or, more generally, in some Hilbert space $\mathcal{H}$. Hence they cannot be directly applied to data sets of PDs, and the multi-set information contained in a PD needs to be represented in some vector format. This process is called vectorization, and it requires the definition of a continuous map $f: \mathrm{PD} \to \mathcal{H}$. There is a plethora of different methods in the literature, and different choices of vectorization technique have some subtle consequences [Ali et al., 2022]. It is important to note that these vectorization methods can be thought of as handcrafted feature engineering rather than feature learning. In this section, we discuss various strategies that have evolved over time.
A simple approach for representing PDs is to use their statistical properties, such as the sum, mean, variance, maximum, and minimum [Ali et al., 2022]. The total Betti number of a certain filtration can also be used as a summary representation [Cang et al., 2015]. These approaches yield scalar summaries and lose information, but they can still be useful.
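A minimal sketch of such statistical summaries, assuming they are computed over the finite lifetimes $d_i - b_i$ (statistics of births or deaths could be collected the same way) and that infinite bars are simply dropped:

```python
import math
import statistics


def diagram_statistics(diagram):
    """Summary statistics of the finite lifetimes d - b of a persistence diagram."""
    lifetimes = [d - b for b, d in diagram if math.isfinite(d)]
    return {
        "sum": sum(lifetimes),
        "mean": statistics.mean(lifetimes),
        "variance": statistics.pvariance(lifetimes),  # population variance
        "max": max(lifetimes),
        "min": min(lifetimes),
    }
```

Each entry is a single number, which illustrates the information loss mentioned above: quite different diagrams can share the same statistics.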
Another approach is to vectorize BDs using histogram-like methods [Cang and Wei, 2017]. The basic idea is to discretize the BD along the filtration axis into equal-sized bins and count the number of persistence intervals in each bin. Alternatively, tropical coordinates defined on the space of BDs are a useful and stable algebraic representation [Kališnik, 2018].
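The binning idea can be sketched as follows. The exact counting rule varies between works; here we assume, for illustration, that an interval is counted in every bin it overlaps, and the bin range and bin count are user choices.

```python
def barcode_histogram(barcode, start, stop, n_bins):
    """For each of n_bins equal-width bins covering [start, stop), count the
    intervals (b, d) that overlap that bin (an assumed counting rule)."""
    width = (stop - start) / n_bins
    counts = [0] * n_bins
    for b, d in barcode:
        for k in range(n_bins):
            lo, hi = start + k * width, start + (k + 1) * width
            if b < hi and d > lo:  # the interval (b, d) intersects [lo, hi)
                counts[k] += 1
    return counts
```

With four unit bins on [0, 4), the barcode `[(0.0, 1.0), (0.5, 3.0), (2.5, inf)]` becomes the fixed-length vector `[2, 1, 2, 1]`.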
Yet another approach is to construct various forms of persistence functions from PDs. These functions are readily vectorized themselves; however, it is also convenient to work with them directly for many tasks [Bubenik, 2020; Adams et al., 2017]. Examples of such persistence functions include the persistence landscape [Bubenik, 2020], persistence Betti numbers [Edelsbrunner et al., 2002], the persistence Betti function [Xia et al., 2017], and persistence surfaces and persistence images [Adams et al., 2017].
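As one concrete example, the k-th persistence landscape $\lambda_k(t)$ is the k-th largest value, over the points of the diagram, of the "tent" functions $\max(0, \min(t - b_i, d_i - t))$. The sketch below evaluates it on a user-chosen grid `ts`; sampling on a fixed grid is one simple way to turn the function into a vector.

```python
def landscape(diagram, k, ts):
    """Sample the k-th persistence landscape (k = 1, 2, ...) at the points in ts."""
    values = []
    for t in ts:
        # tent of each point: rises from b, peaks at (b + d) / 2, falls back to 0 at d
        tents = sorted((max(0.0, min(t - b, d - t)) for b, d in diagram), reverse=True)
        values.append(tents[k - 1] if k <= len(tents) else 0.0)
    return values
```

For the diagram `[(0, 4), (1, 3)]` sampled at t = 0, 1, 2, 3, 4, the first landscape is `[0, 1, 2, 1, 0]` and the second is `[0, 0, 1, 0, 0]`.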
A useful feature representation technique called persistence codebooks [Zieliński et al., 2020] uses bag-of-words quantization to summarize the points of a PD as a fixed-size vector. Chevyrev et al. [Chevyrev et al., 2020] proposed persistence paths, a feature map for barcodes.
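The bag-of-words idea can be sketched as a nearest-codeword histogram. In practice the codebook would be learned, for instance by clustering points pooled from training diagrams; here the codewords are simply passed in, and the normalization is an illustrative choice rather than the exact scheme of [Zieliński et al., 2020].

```python
import math


def codebook_vector(diagram, codewords):
    """Assign each (b, d) point to its nearest codeword and return the normalized
    histogram of assignments: one entry per codeword, whatever the diagram size."""
    counts = [0] * len(codewords)
    for p in diagram:
        nearest = min(range(len(codewords)), key=lambda j: math.dist(p, codewords[j]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts
```

With the two hypothetical codewords `(0, 0.5)` and `(0, 3)`, a diagram with three short-lived points and one long-lived point maps to `[0.75, 0.25]`.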
Representations range from simple to complex structures. To obtain better structural representations, there is scope to investigate new vectorization methods that can benefit topological learning models. Note, however, that when a large feature vector is used to represent PDs, the curse of dimensionality comes into play. In this case, variable selection, regularization approaches, or dropout methods should be considered [Pun et al., 2022].
In addition, it is important to be able to compare different PDs. To this end, various metrics have been proposed, such as the bottleneck distance [Mileyko et al., 2011] and adaptations of the Gromov-Hausdorff and Wasserstein metrics [Bubenik et al., 2017]. Many other metrics have been considered in the literature as well. A central consideration is the stability of vectorizations and metrics, which we discuss further in Section 3.3.
As discussed, vectorization methods can be used in input space; kernel-based models are another important way to combine PD information with machine learning models [Kwitt et al., 2015]. Since metrics can be modified into kernels, various approaches have been proposed to induce kernel functions from PD information [Pun et al., 2022] and use them in traditional machine learning methods such as kernel PCA and SVMs. Topology-based kernel methods have been used successfully in various ways [Zhu et al., 2016; Kwitt et al., 2015]; however, they suffer from scalability issues [Pun et al., 2022], as training typically scales poorly with the number of samples (e.g., roughly cubically for kernel SVMs). We do not discuss topological kernel methods any further in this paper.
Many of the aforementioned methods have advantageous stability properties with respect to standard metrics in TDA, such as the Wasserstein or bottleneck distances. However, they all share the same drawback: the mapping from topological summaries to representations compatible with existing learning techniques is pre-defined. It is therefore fixed and agnostic to the specific learning task, which can make it suboptimal. The phenomenal success of deep neural networks suggests that learning representations (i.e. feature learning) is often preferable.

Topological Deep Learning (TDL)
Topological representations that incorporate structural information hold great promise for topological deep learning models [Hofer et al., 2017]. Combining these cues with deep learning approaches has inherent benefits in various applications. Conversely, deep learning approaches can help overcome some common hurdles faced by TDA approaches in estimating robust topological features. The incorporation of topological concepts into deep learning has only recently been investigated, and its general benefits include the following:
• Global features that would otherwise be inaccessible via traditional feature maps can be extracted efficiently and robustly from the input data.
• TDA is versatile and adaptable, meaning that we are not limited to specific problems and types of data (such as images, sensor measurements, time series, graphs, etc.).
• TDA has been shown to be noise-resistant in several different problems, including the classification of 3D surface meshes, the recognition of 2D object shapes, the manifold of natural image patches, analyzing activity patterns of the visual cortex, and clustering [Pun et al., 2022; Ali et al., 2022].
• With suitable filtrations, TDA can be applied to arbitrary data structures, even without any preprocessing.
• A new trend is emerging that allows efficient backpropagation through persistent homology components. This is a long-standing challenge in TDA (discussed further in Section 3.3), but topological layers are now becoming compatible with deep learning and end-to-end training schemes.
We reiterate that although combining TDA (more specifically, persistent homology) with deep learning has demonstrated success, there are still some theoretical and computational challenges in applying TDA to data. We discuss these issues at length in Section 4.1.
In the rest of this section, we investigate TDA for deep learning through lenses of different magnifications and perspectives, as shown in Figure 1. In particular, we explore various ways of using persistent homology. Sections 3.1 to 3.3 focus on the on-training integration of TDA, that is, building topological neural architectures. However, a holistic view should also consider TDA's contribution post-training (deep topological analytics), which uses TDA to study the 'shape' of a trained model. Thus, in Section 3.4 we review works that studied deep model complexity and interpretability using TDA.

Learning Topological Features Embedding
In this section, we extend the discussion of fixed vectorization methods (Section 2.3) by introducing deep learnable vectorization (i.e. embedding). A key advantage here is the possibility of leveraging the deep model to simultaneously learn the vectorization of the data and the representation for the target task. For example, we may parameterize the vectorization of a persistence diagram PD into an embedding vector $V \in \mathbb{R}^d$ by neural layers $f_w$, where $w$ denotes the trainable parameters. Guided by the task loss, we can efficiently learn the mapping $f_w: \mathrm{PD}_x \mapsto V_x$ and automatically answer the question of which family of vectorizations works best for the given task.
Handling PDs with neural networks is the focus of many deep topological embedding works. Generally, deep vectorization layers for PDs should be continuous and permutation invariant with respect to the input; the latter requirement is motivated by the set nature of the persistence diagram. Hofer et al. [Hofer et al., 2017] introduced the first learnable deep vectorization of PDs. It adopts a permutation invariant transformation by evaluating the PD's points against Gaussian(s) whose mean and variance are learned during training. Since permutation invariance has been explored in other deep learning problems (e.g. Deep Sets [Zaheer et al., 2017] for point clouds), some vectorization techniques for PDs were borrowed from them. For example, PersLay [Carrière et al., 2019] builds on DeepSets to embed extended PDs that encode graphs, and uses them for graph classification. Recently, transformers were used for PD embedding. The Persformer [Reinauer et al., 2021] architecture outperformed competing approaches on synthetic and graph tasks while offering some interpretability features. Note that transformers without positional encoding can be made as expressive as Deep Sets, so the permutation invariance requirement can be maintained.
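A simplified sketch of a permutation-invariant PD layer in the spirit of Hofer et al. is given below: each output is a sum of Gaussian responses over the diagram's points, so reordering the points cannot change it. It omits the coordinate transformation and the special handling of near-diagonal points used in the original layer and shows only the forward pass; `centers` and `scales` are hypothetical stand-ins for parameters that an autodiff framework would learn.

```python
import math


def gaussian_structure_elements(diagram, centers, scales):
    """Simplified permutation-invariant PD layer with one output per Gaussian element.
    centers[j] = (mu_b, mu_d) and scales[j] = (s_b, s_d) would be learned in training."""
    out = []
    for (mu_b, mu_d), (s_b, s_d) in zip(centers, scales):
        # summing over points makes the output independent of their order
        out.append(sum(math.exp(-(s_b * (b - mu_b)) ** 2 - (s_d * (d - mu_d)) ** 2)
                       for b, d in diagram))
    return out
```

The sum plays the role of the pooling step in Deep Sets; replacing it with a mean or maximum would also preserve permutation invariance.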
Beyond PDs, deep embedding has been explored for other topological signatures. For example, PLLay [Kim et al., 2020] provides a layer for embedding persistence landscapes. PLLay's claim of robustness to extreme topological distortion is backed by a tight stability bound that is independent of the input complexity.
Topological embedding transforms topological input with a complex structure into a vector representation compatible with deep models. As discussed in this section, the process uses a custom topological input layer for embedding. In the next section, we explore topological components that enhance deep learning representations and usually have the flexibility to be plugged in anywhere in the network.

Integration of Topological Representations
Representation learning is the process of learning features from data that can be used to improve the accuracy of the model.Deep learning excels in this regard thanks to its powerful feature learning, but having a good representation goes further than achieving good performance on a target task [Bengio et al., 2013].For example, TDA's stability can make deep representation resilient to input perturbation [de Surrel et al., 2022].Below we review two categories of deep topological representations.
Constrained Representations: One approach is to train deep neural networks to learn representations that preserve the persistent homology of the input data. Again, TDA's versatility ensures the feasibility of this, as the topological signature can be computed for both the input and the internal representation. For example, Topological Autoencoders [Moor et al., 2020] perform this alignment through a loss that minimizes the divergence between the topologies of the input and the latent representation (both captured by PDs).
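A simplified sketch of the 0-dimensional version of this idea, following our reading of [Moor et al., 2020]: for a Rips filtration, the edges that destroy connected components are the edges of a minimum spanning tree, and the loss compares input-space and latent-space distances on the edges selected in each space, $\tfrac12\lVert A^X[\pi^X] - A^Z[\pi^X]\rVert^2 + \tfrac12\lVert A^Z[\pi^Z] - A^X[\pi^Z]\rVert^2$, where $A$ are distance matrices and $\pi$ the selected edges. Plain Python lists stand in for tensors; in training, the latent distances would be computed inside an autodiff framework on each mini-batch.

```python
import math
from itertools import combinations


def distance_matrix(points):
    return [[math.dist(p, q) for q in points] for p in points]


def persistence_pairing(A):
    """Edges (i, j) that destroy a connected component in the Rips filtration of the
    distance matrix A: the edges of a minimum spanning tree (Kruskal + union-find)."""
    parent = list(range(len(A)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i, j in sorted(combinations(range(len(A)), 2), key=lambda e: A[e[0]][e[1]]):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((i, j))
    return pairs


def topological_loss(X, Z):
    """0-dimensional topological autoencoder loss between inputs X and latent codes Z."""
    AX, AZ = distance_matrix(X), distance_matrix(Z)
    loss_xz = 0.5 * sum((AX[i][j] - AZ[i][j]) ** 2 for i, j in persistence_pairing(AX))
    loss_zx = 0.5 * sum((AZ[i][j] - AX[i][j]) ** 2 for i, j in persistence_pairing(AZ))
    return loss_xz + loss_zx
```

Mapping the points (0, 0), (1, 0), (5, 0) to (0, 0), (2, 0), (10, 0) doubles the two spanning-tree edge lengths and gives a loss of 17, while an identical latent configuration gives 0.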
Augmented Representations: Another approach to topological representation is augmenting the deep features with topological signatures. Persistence Enhanced Graph Network (PEGN) [Zhao et al., 2020] developed a graph spatial convolution that builds on persistent homology. Normally, convolution filters are made adaptive to local graph structures by using node degree information. In contrast, PEGN weights the message passing between nodes by neighborhood information captured by persistence images. Moreover, Graph Filtration Learning (GFL) [Hofer et al., 2020] makes the readout operation (a graph pooling-like operation) of a Graph Neural Network (GNN) topologically aware: BDs are computed from a filtration on the node features and then vectorized. Interestingly, the filtration function is learned end-to-end. Topological Graph Layer (TOGL) [Horn et al., 2022] extends GFL's idea and learns multiple filtrations of a graph (rather than one) in an end-to-end manner.
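To make the idea of a filtration on node values concrete, the sketch below computes the 0-dimensional diagram of a graph under a sublevel (lower-star) filtration: each node enters at its value and each edge at the larger value of its endpoints, and when two components merge, the younger one dies (the "elder rule"). The node values are given directly here; in GFL they would be produced by a learned function of the node features, which this sketch does not reproduce.

```python
import math


def graph_h0_diagram(values, edges):
    """0-dimensional persistence diagram of a graph under the sublevel filtration
    of node values: a node enters at its value, an edge at the larger endpoint value."""
    parent = list(range(len(values)))
    birth = list(values)  # birth value of the component rooted at each node

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    diagram = []
    for u, v in sorted(edges, key=lambda e: max(values[e[0]], values[e[1]])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge closes a cycle: no 0-dimensional event
        if birth[ru] > birth[rv]:
            ru, rv = rv, ru  # elder rule: rv is the younger component
        death = max(values[u], values[v])
        if birth[rv] < death:  # skip zero-persistence points on the diagonal
            diagram.append((birth[rv], death))
        parent[rv] = ru
    roots = {find(i) for i in range(len(values))}
    diagram += [(birth[r], math.inf) for r in sorted(roots)]  # surviving components
    return diagram
```

On the path graph with node values 0, 2, 1, the component born at 1 merges into the one born at 0 when the middle node enters at 2, giving the diagram `[(1, 2), (0, inf)]`.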
Unlike embedding layers (e.g. PersLay [Carrière et al., 2019]) that expect a pre-specified input type (e.g. PDs), the topological representation layers discussed in this section enjoy more flexibility regarding their input and placement in the network. This comes at the cost of requiring careful design choices and guarantees on the layer characteristics (e.g. consistency of gradients in [Hofer et al., 2020]).

Topological Loss
The most common approach for leveraging topology in deep learning is to incorporate a topological penalty into the loss. The popularity of this approach stems from the fact that loss-based integration is straightforward and does not require changing the architecture or adding layers. The only caveat is that the loss should be differentiable and easy to compute. As noted previously, topological features capture the complex structure of the data, so a topological loss can guide a deep model towards robust representations that are more likely to be insensitive to typical nuisances in real-world datasets, such as noise and outliers. An example is a common persistence loss [Hu et al., 2019], which minimizes the difference between a predicted persistence diagram $\mathrm{PD}_X$ and the true diagram $\mathrm{PD}_Y$:
$$\mathcal{L}_{\text{topo}}(\mathrm{PD}_X, \mathrm{PD}_Y) = \min_{\eta \in \Pi(\mathrm{PD}_X, \mathrm{PD}_Y)} \sum_{t \in \mathrm{PD}_X} \lVert t - \eta(t) \rVert_2^2 \qquad (1)$$
where $\Pi(\mathrm{PD}_X, \mathrm{PD}_Y)$ denotes the set of bijections between the two diagrams (points may also be matched to the diagonal). This has been used either as a standalone loss or as a regularizer (i.e. augmenting another loss) [Hu et al., 2019] in applications such as semantic segmentation [Hu et al., 2019] and generative modeling [Wang et al., 2020].
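As discussed in Section 3.1, PDs do not lend themselves to vector representations in Euclidean space. Moreover, the PD is not, in general, a differentiable function of its input, whereas differentiability is a key requirement for backpropagation. One strategy to resolve this is to leverage a divergence or metric that can handle PDs. The p-Wasserstein distance and the bottleneck distance are popular choices:
$$W_p(\mathrm{PD}_X, \mathrm{PD}_Y) = \left( \inf_{\eta \in \Pi(\mathrm{PD}_X, \mathrm{PD}_Y)} \sum_{t \in \mathrm{PD}_X} \lVert t - \eta(t) \rVert_q^p \right)^{1/p} \qquad (2)$$
$$d_B(\mathrm{PD}_X, \mathrm{PD}_Y) = \inf_{\eta \in \Pi(\mathrm{PD}_X, \mathrm{PD}_Y)} \sup_{t \in \mathrm{PD}_X} \lVert t - \eta(t) \rVert_q \qquad (3)$$
where $t$ is a point $(b_i, d_i) \in \mathbb{R}^2$ of $\mathrm{PD}_X$, $\Pi(\mathrm{PD}_X, \mathrm{PD}_Y)$ denotes the set of bijections between $\mathrm{PD}_X$ and $\mathrm{PD}_Y$, and $\lVert \cdot \rVert_q$ is the $\ell_q$ norm (typically $q = \infty$). The bottleneck distance is thus the smallest, over all bijections, of the largest distance between matched points; it is the limit of (2) as $p \to \infty$. Both diagrams are augmented with the points of the diagonal (with infinite multiplicity), so that a feature without a counterpart can be matched to the diagonal at a cost proportional to its persistence. Diagrams are compared within the same homological dimension, which ensures that the matched topological features are comparable.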
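For intuition, the p-Wasserstein and bottleneck distances defined above can be computed by brute force on tiny diagrams: each diagram is padded with diagonal slots for the other's points, and every bijection of the padded sets is enumerated. This takes factorial time and assumes finite diagrams, so it is only a sketch for checking small cases; practical implementations solve the matching with algorithms such as the Hungarian method.

```python
import math
from itertools import permutations


def lq(p, t, q):
    """l_q distance between two points of the plane (q may be math.inf)."""
    dx, dy = abs(p[0] - t[0]), abs(p[1] - t[1])
    return max(dx, dy) if q == math.inf else (dx ** q + dy ** q) ** (1 / q)


def to_diagonal(p, q):
    """l_q distance from p = (b, d) to its nearest diagonal point ((b+d)/2, (b+d)/2)."""
    m = (p[0] + p[1]) / 2
    return lq(p, (m, m), q)


def cost_matrix(X, Y, q):
    """Square cost matrix after padding each diagram with diagonal slots."""
    m, n = len(X), len(Y)
    C = [[0.0] * (m + n) for _ in range(m + n)]
    for i in range(m + n):
        for j in range(m + n):
            if i < m and j < n:
                C[i][j] = lq(X[i], Y[j], q)      # point matched to point
            elif i < m:
                C[i][j] = to_diagonal(X[i], q)   # X point matched to the diagonal
            elif j < n:
                C[i][j] = to_diagonal(Y[j], q)   # Y point matched to the diagonal
            # diagonal-to-diagonal entries stay 0
    return C


def wasserstein(X, Y, p=2, q=math.inf):
    """p-Wasserstein distance by enumerating all bijections (tiny diagrams only)."""
    C = cost_matrix(X, Y, q)
    size = len(C)
    best = min(sum(C[i][s[i]] ** p for i in range(size))
               for s in permutations(range(size)))
    return best ** (1 / p)


def bottleneck(X, Y, q=math.inf):
    """Bottleneck distance: smallest achievable largest matching cost (tiny diagrams only)."""
    C = cost_matrix(X, Y, q)
    size = len(C)
    return min(max((C[i][s[i]] for i in range(size)), default=0.0)
               for s in permutations(range(size)))
```

For example, `bottleneck([(0.0, 2.0)], [(0.0, 2.5)])` is 0.5 (direct matching), whereas matching a lone point `(0.0, 2.0)` against an empty diagram costs its distance to the diagonal, 1.0.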
The initial popularity of the bottleneck distance is perhaps fueled by a stability theorem [Cohen-Steiner et al., 2005] for PDs of continuous functions. According to this theorem, the bottleneck distance between the diagrams of two functions $f$ and $g$ is controlled by their $L_\infty$ distance, that is,
$$d_B(\mathrm{PD}(f), \mathrm{PD}(g)) \le C \, \lVert f - g \rVert_\infty$$
for some constant $C$ (the original theorem gives $C = 1$ for tame continuous functions). In effect, this means that the diagrams are stable with respect to small perturbations of the underlying data. A similar stability result exists for the p-Wasserstein distance under additional assumptions. These results are the foundation of the stability guarantees in recent deep learning works, such as the stability of the Heat Kernel Signature in graphs [Carrière et al., 2019] and the stability of mini-batch-based diagram distances in Topological Autoencoders [Moor et al., 2020].
Among the limitations of (2) and (3) is the high computational budget these distances require when the number of points is large. As the distances require a point-wise matching, the computational complexity is $O(n^3)$ for $n$ points [Anirudh et al., 2016]. Also, in many applications [Wang et al., 2020; Chen et al., 2019], we aim to learn a model $f_w$ that aligns a predicted diagram $\mathrm{PD}_P$ with a target (i.e. ground truth) diagram $\mathrm{PD}_T$ by gradually moving the points of $\mathrm{PD}_P$ towards $\mathrm{PD}_T$. This is typically achieved by moving $w$ in the direction of $-\nabla_w \mathcal{L}_{\text{topo}}$, where $\mathcal{L}_{\text{topo}}$ is the topological loss, which assumes that the loss is differentiable w.r.t. the diagram. While the Wasserstein distance satisfies this requirement in general, it can have some instability issues [Solomon et al., 2021]. Below, we select a few representative papers using topological losses in various applications and show how they handle these issues.
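Why a loss on diagram points can still produce gradients is easiest to see in one dimension. In the sublevel filtration of a 1D signal f, every birth and death value equals some entry f[i], so once the pairing is fixed (which holds locally when the values are distinct), the derivative of a diagram-based loss simply scatters back onto those entries. The toy loss below, the sum of squared lifetimes, is a hypothetical example chosen for illustration rather than one of the cited losses.

```python
def sublevel_pairs(f):
    """Finite 0-dimensional persistence pairs of a 1D signal f (sublevel filtration on
    the path graph), as (birth_index, death_index): birth = f[bi], death = f[di]."""
    parent = list(range(len(f)))
    oldest = list(range(len(f)))  # index of the minimum that created each component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    # edge (i, i + 1) enters the filtration at max(f[i], f[i + 1])
    for i in sorted(range(len(f) - 1), key=lambda e: max(f[e], f[e + 1])):
        ru, rv = find(i), find(i + 1)
        if f[oldest[ru]] > f[oldest[rv]]:
            ru, rv = rv, ru  # elder rule: the component with the higher minimum dies
        di = i if f[i] >= f[i + 1] else i + 1  # node whose value creates the edge
        if oldest[rv] != di:  # skip zero-persistence pairs
            pairs.append((oldest[rv], di))
        parent[rv] = ru
    return pairs


def persistence_loss_and_grad(f):
    """Toy loss: sum of squared lifetimes (d - b)^2, with its gradient w.r.t. f.
    Each birth/death is an entry of f, so the gradient is a sparse scatter."""
    loss, grad = 0.0, [0.0] * len(f)
    for bi, di in sublevel_pairs(f):
        life = f[di] - f[bi]
        loss += life ** 2
        grad[di] += 2 * life
        grad[bi] -= 2 * life
    return loss, grad
```

Descending this gradient lowers the local maxima and raises the non-global minima, i.e. it simplifies the signal's topology; the gradient agrees with finite differences as long as the perturbation does not change the pairing.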
In generative modeling, TopoGAN [Wang et al., 2020] uses a slightly modified 1-Wasserstein distance to align the diagrams of generated and real images in medical imaging applications. The loss ignores the death times and focuses only on the birth times of the diagram features. Framed in this way, the loss becomes similar to the Sliced Wasserstein distance [Peyré et al., 2019], which can be computed efficiently and is still differentiable. A similar loss was used by [Hu et al., 2019] for segmentation to encourage the deep model to produce output whose topology is close to the ground truth: the cross-entropy loss is augmented with the 2-Wasserstein loss between persistence diagrams. To alleviate the computational burden, the method performs the calculation on a single small image patch at a time. In [Clough et al., 2022], the authors rely on Betti numbers for semi-supervised image segmentation. A notable advantage here is that the output of a network trained on a small set of labeled images can still capture the actual Betti numbers correctly. This gives us the opportunity to first train the model on a small labeled dataset guided by the Betti number loss $\mathcal{L}_\beta$, and then fine-tune it on a large unlabeled dataset guided by a loss that incorporates $\mathcal{L}_\beta$. Since Betti number estimation is robust on the unlabeled data, $\mathcal{L}_\beta$ regularizes the second stage of training (fine-tuning). In classification, [Chen et al., 2019] uses a topological regularizer. To speed up the computation, it focuses on homological dimension zero, where persistence computations are particularly fast.
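Restricting the comparison to one coordinate helps because optimal matching on the real line is easy: pairing the sorted values is optimal. The sketch below assumes both diagrams have the same number of points, a simplification that real losses must relax (e.g. by matching extra points to the diagonal), and it is not TopoGAN's exact loss. The sliced Wasserstein distance averages such one-dimensional problems over projection directions.

```python
def birth_only_w1(births_x, births_y):
    """1-Wasserstein matching cost between two equal-size multisets of birth times.
    On the real line, pairing the sorted values gives an optimal bijection."""
    if len(births_x) != len(births_y):
        raise ValueError("this sketch assumes diagrams with equally many points")
    return sum(abs(a - b) for a, b in zip(sorted(births_x), sorted(births_y)))
```

Sorting costs O(n log n), compared with the O(n^3) matching needed for the full two-dimensional distances.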

Deep Topological Analytics
The complementary value of TDA goes beyond on-training integration and constructing topological neural architectures. In fact, leveraging TDA methods post-training can be even more insightful and powerful. Currently, researchers use TDA to address deep learning transparency [Liu et al., 2020], to study model complexity [Rieck et al., 2019], and even to track down answers to seemingly mysterious aspects of deep learning, e.g. why deep networks outperform shallow ones [Naitzat et al., 2020]. These efforts are centered around analyzing deep models using TDA approaches; hence, we call this deep topological analytics. Due to space limitations, we explore only two aspects of it below.
Quantifying structural complexity: [Watanabe and Yamana, 2021] treat a neural network as a weighted graph $G(V, E)$, where V denotes the network's neurons and E its edges, weighted by relevance scores computed from the weights. By computing persistence features (e.g. Betti numbers) across a filtration, we can gain insight into the network's complexity. For example, an increase in the first Betti number (the occurrence of a cycle among a set of neurons) can reflect the complexity of the knowledge in the deep neural network. In [Rieck et al., 2019], the authors follow the same line and further develop training optimization strategies (e.g. early stopping) informed by homological features.
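The quantity behind such analyses can be sketched simply: for a graph, $\beta_1 = |E| - |V| + \beta_0$ counts independent cycles. The sketch below keeps the edges whose weight is at least a threshold and reports $\beta_1$ as the threshold decreases; the thresholding direction and the example weights are assumptions for illustration, not the exact procedure of [Watanabe and Yamana, 2021].

```python
def graph_betti(n_vertices, edges):
    """(beta_0, beta_1) of a simple graph, using beta_1 = |E| - |V| + beta_0."""
    parent = list(range(n_vertices))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for u, v in edges:
        parent[find(u)] = find(v)  # union the two components
    beta_0 = len({find(i) for i in range(n_vertices)})
    return beta_0, len(edges) - n_vertices + beta_0


def betti_1_curve(n_vertices, weighted_edges, thresholds):
    """beta_1 of the subgraph keeping edges with weight >= t, for each threshold t."""
    return [graph_betti(n_vertices, [(u, v) for u, v, w in weighted_edges if w >= t])[1]
            for t in thresholds]
```

For four hypothetical neurons with edge weights 0.9, 0.8, 0.7 around a triangle plus two weaker edges closing a second loop, lowering the threshold through 1.0, 0.75, 0.6, and 0.3 yields the curve `[0, 0, 1, 2]`.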
Visual exploration of models: another use of TDA here is to provide post-hoc explanations and/or visual exploration of the internal functioning of deep models. For example, topological information provides insight into the overall structure of high-dimensional functions. The authors of [Liu et al., 2020] use this to offer a scalable visual exploration tool for data-driven (and black-box) models. This is an important research problem whose key challenge is making the exploration intuitive. They also use topological splines to visualize the high-dimensional error landscape of the models. Similarly, TopoAct [Rathore et al., 2021] offers insight into the representations learned by neural networks and provides a visual exploration tool for studying topological summaries of activation vectors.

Discussion
TDA is a steadily developing and promising area, with successes in a wide variety of applications. However, there are open questions in applying TDA with deep neural networks. In this section, we highlight several open challenges for future research on deep TDA in both practical and theoretical aspects, and paint a speculative picture by outlining what persistent homology may hold for the future. We also note some open-source implementations for researchers to get started.

Challenges
Despite the success of TDA and its use in deep learning, notable challenges remain; we describe a few of them here.
Computational cost: Many aspects of calculating persistent homology are computationally intractable. The construction of the Čech complex for a given r is known to be an NP-hard task. Computing Betti numbers is also infeasible for very large complexes. The cost of calculating TDA information adds to the already computationally expensive deep learning routines.
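A simple count illustrates how quickly complexes grow. Once r exceeds the diameter of X, every set of $k + 1$ of the $n$ points spans a k-simplex of the Rips complex (and of the Čech complex), so:

```latex
\begin{aligned}
\#\{k\text{-simplices}\} &= \binom{n}{k+1}, \\
\sum_{k=0}^{n-1} \binom{n}{k+1} &= 2^n - 1, \\
n = 100,\ k \le 2:\quad \binom{100}{1} + \binom{100}{2} + \binom{100}{3} &= 100 + 4950 + 161700 = 166750.
\end{aligned}
```

Even truncated at dimension 2, one hundred points already give more than 1.6 × 10^5 simplices before any homology is computed.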
Lack of a universal framework for vectorization: There is no universally accepted framework for incorporating topological information into deep learning. This is a theoretical matter as well as a computational one; for example, there is a lack of strong theory for encoding persistence diagrams as vectors, as discussed earlier. There have been a variety of ad hoc solutions of varying merit, recently cataloged in [Ali et al., 2022]. Alternatively, vectorization methods have been chosen as part of learning strategies [Hofer et al., 2017; Moor et al., 2020].
Statistical guarantees: Throughout this article we have not discussed the statistical aspects of persistence that arise from finite sampling. For example, there is no guarantee that the PD derived from X reflects the true homology of M. The framework for understanding the statistical robustness of persistence information is still evolving. Some simple verification strategies, such as sub-sampling and cross-validation, have been used in the literature [Chazal and Michel, 2021]. There is scope to further understand issues such as the minimum number of data points required to guarantee robust PDs. Furthermore, persistence is not well understood from a probabilistic point of view, e.g., the distribution of persistence diagrams induced by a distribution of shapes.
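This is a minimal sketch of the sub-sampling idea, assuming 0-dimensional Rips persistence. In that setting every point is born at 0, and a component dies at the edge length where it merges into another. These death times are exactly the minimum spanning tree edge lengths, recovered here with Kruskal's algorithm. The helper names, the synthetic two-cluster data, and the chosen summary statistic (the largest finite death) are illustrative assumptions. A stable feature should reappear across subsamples, but this is a heuristic check rather than a statistical guarantee.

```python
import random
from math import dist


def h0_deaths(points):
    """Finite death times of 0-dimensional Rips persistence (MST edge lengths)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # two components merge: one of them dies at this scale
            parent[ri] = rj
            deaths.append(length)
    return deaths


def subsample_feature(points, frac, trials, seed=0):
    """Largest finite H0 death on repeated random subsamples, to gauge its stability."""
    rng = random.Random(seed)
    k = max(2, int(frac * len(points)))
    return [max(h0_deaths(rng.sample(points, k))) for _ in range(trials)]


if __name__ == "__main__":
    gen = random.Random(1)
    # Two synthetic blobs about 10 units apart (illustrative data only).
    blob_a = [(gen.uniform(0, 0.5), gen.uniform(0, 0.5)) for _ in range(20)]
    blob_b = [(10 + gen.uniform(0, 0.5), gen.uniform(0, 0.5)) for _ in range(20)]
    vals = subsample_feature(blob_a + blob_b, frac=0.5, trials=5)
    print([round(v, 2) for v in vals])  # each value lies between about 9.5 and 10.5
```

The gap between the two blobs persists in every subsample, while the short within-blob deaths do not stand out. A feature whose value swung widely across subsamples would deserve less trust.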
High-dimensional learning challenge: There is no theoretical framework describing which topological features to expect in high-dimensional data. While abstract topological spaces can be enormously complex in high dimensions, we do not know whether real data behaves similarly. Moreover, high-dimensional homological features are unattainable in practice due to computational cost. In any case, the sensitivity of PDs to sampling and noise is not well understood in high dimensions. This makes learning the underlying topology of the data for use in deep neural networks challenging.
The need for a good backpropagation strategy: The differentiability of PDs and other homological quantities is not guaranteed, nor is it always well understood. This makes backpropagation through deep neural networks that incorporate topological signatures extremely challenging, or feasible only under special conditions [Moor et al., 2020].
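One common workaround can be sketched for 0-dimensional Rips persistence. If the persistence pairing is held fixed, each finite death time is the length of one specific edge. A loss built from death times is therefore differentiable through those pairwise distances. The sketch below computes the total H0 persistence of a point cloud and its gradient, using hypothetical function names and a plain-Python gradient in place of an autodiff framework. It is valid only where the pairing is locally constant (distinct edge lengths, no coincident points), which is exactly the kind of special condition mentioned above. It is not a reproduction of any cited method.

```python
from math import dist


def h0_pairing_edges(points):
    """Edges (i, j, length) at which 0-dim Rips components merge (the MST edges)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((i, j, length))
    return pairs


def total_persistence_and_grad(points):
    """Loss = sum of finite H0 death times, plus its gradient w.r.t. each coordinate.

    With the pairing fixed, d||x_i - x_j|| / d x_i = (x_i - x_j) / ||x_i - x_j||,
    and the gradient w.r.t. x_j is the negative of that.
    """
    grad = [[0.0] * len(p) for p in points]
    loss = 0.0
    for i, j, length in h0_pairing_edges(points):
        loss += length
        for d in range(len(points[i])):
            g = (points[i][d] - points[j][d]) / length
            grad[i][d] += g
            grad[j][d] -= g
    return loss, grad


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.5)]
    loss, grad = total_persistence_and_grad(pts)
    print(loss)  # -> 3.5 (merge edges of length 1.0 and 2.5)
    # Finite-difference check on the y-coordinate of the third point.
    h = 1e-6
    bumped = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.5 + h)]
    print((total_persistence_and_grad(bumped)[0] - loss) / h, grad[2][1])
```

Minimizing this loss pulls merging points together. The gradient jumps wherever a small perturbation changes which edge pairs with which component.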
Capturing multi-variate persistence: In some cases, multiple concurrent filtrations are needed to fully capture the topology of the data manifold, especially for data in higher dimensions. This leads to multi-variate (multi-parameter) persistence, where the birth and death of topological features are indexed by several filtration parameters rather than one. Unlike the one-dimensional BD we have discussed so far, this notion of persistence has no complete discrete invariant. Practical use of multi-variate persistence in deep learning would therefore require new theoretical frameworks and better computational methods.

Successes and Future Directions
Deep TDA has demonstrated potential in a variety of challenging settings. Because PH information is invariant to continuous deformation, TDA is well suited to settings where objects should have consistent shapes but may be transformed in some way. TDA also helps bridge the gap between structural information and prior knowledge. If we have prior knowledge of the topology of a class of objects, PDs are an effective tool for classifying and comparing data against this class, even in the presence of noise or limited data. This robustness carries over well to deep learning.
TDA can produce good results on small datasets. This is especially useful in medical imaging, where expense and privacy concerns limit data acquisition [Byrne et al., 2021]. TDA has also been used in other settings with limited or noisy data, such as power forecasting [Senekane et al., 2021], segmenting aerial photography [Mosinska et al., 2018], and astronomy [Murugan and Robertson, 2019].
In some applications, topological information may be more significant than statistical (pixel-wise) information. For example, in [Vukicevic et al., 2017], detecting holes between heart chambers is more important than inferring the thickness of septal walls. For these applications, a loss function combining topological and statistical terms can be weighted in favor of topology when training a network.
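One generic way to express this trade-off is a convex combination of a statistical term and a topological term. The form below is an assumption, not the formulation of any cited work:

```latex
\mathcal{L}(\theta) = (1 - \lambda)\,\mathcal{L}_{\mathrm{stat}}(\theta) + \lambda\,\mathcal{L}_{\mathrm{topo}}(\theta), \qquad \lambda \in [0, 1]
```

In this form, $\theta$ denotes the network parameters. $\mathcal{L}_{\mathrm{stat}}$ could be a pixel-wise loss such as cross-entropy. $\mathcal{L}_{\mathrm{topo}}$ could be any term comparing topological summaries, for example a distance between the PDs of the prediction and of the ground truth. Moving $\lambda$ toward 1 favors topology.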
Because PH encapsulates global structure, topological loss functions could suppress small false positives or false negatives, particularly in computer vision tasks. For example, in image segmentation, morphological operations or conditional random field-based techniques are used to remove local errors, but they have no knowledge of global topology. The benefit of a PH-based loss is that the correct global topology can be propagated while retaining local label smoothness.
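The following minimal sketch illustrates why pixel-wise errors can understate topological mistakes. It computes the Betti numbers of a binary segmentation mask. It assumes the usual digital-topology convention of an 8-connected foreground and a 4-connected background, with β1 counting background regions that do not touch the image border. The masks and function names are hypothetical. Because Betti numbers are integer-valued, this is an evaluation check rather than a differentiable loss; PH-based losses obtain gradients through persistence instead.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def components(mask, value, neighbors):
    """Connected components (as sets of (row, col)) of cells equal to `value`."""
    h, w = len(mask), len(mask[0])
    seen, comps = set(), []
    for r in range(h):
        for c in range(w):
            if mask[r][c] != value or (r, c) in seen:
                continue
            seen.add((r, c))
            comp, queue = set(), deque([(r, c)])
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                comp.add((y, x))
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] == value and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            comps.append(comp)
    return comps


def betti_2d(mask):
    """(beta0, beta1): 8-connected foreground pieces and enclosed 4-connected holes."""
    h, w = len(mask), len(mask[0])
    beta0 = len(components(mask, 1, N8))
    beta1 = sum(
        1
        for comp in components(mask, 0, N4)
        if not any(y in (0, h - 1) or x in (0, w - 1) for y, x in comp)
    )
    return beta0, beta1


def pixel_error(a, b):
    """Fraction of pixels on which two masks disagree."""
    cells = [(u, v) for ra, rb in zip(a, b) for u, v in zip(ra, rb)]
    return sum(u != v for u, v in cells) / len(cells)


if __name__ == "__main__":
    target = [[0] * 6 for _ in range(6)]
    for y, x in [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (3, 3)]:
        target[y][x] = 1  # a ring enclosing the hole at (2, 2)
    spurious = [row[:] for row in target]
    spurious[5][5] = 1  # one false-positive pixel
    broken = [row[:] for row in target]
    broken[1][2] = 0  # one false-negative pixel opens the ring
    for name, pred in [("target", target), ("spurious", spurious), ("broken", broken)]:
        print(name, betti_2d(pred), round(pixel_error(target, pred), 3))
```

Both wrong predictions differ from the target in only 1 of 36 pixels, yet each changes a Betti number. A topology-aware loss is meant to penalize exactly these global errors.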
It would be interesting to explore sophisticated deep learning architectures that learn mappings between high-dimensional data and their corresponding PDs or other topological representations, building on [de Surrel et al., 2022]. Moreover, deep learning may yet yield new kinds of topological representations beyond PDs that are robust to different data deformations. PH could also find further applications in multi-class open-set problems, where data may belong to unknown classes. If the topology among classes is relatively consistent, then the labels of objects from unknown classes can be better predicted.

Implementations
There are a number of open-source implementations of TDA available to practitioners. Here we present two libraries that have interfaces with deep learning architectures.
GUDHI is an open-source library that implements relevant geometric data structures and TDA algorithms, and it can be integrated into the TensorFlow framework. PersLay [Carrière et al., 2019] and RipsLayer are implementations built on GUDHI that learn persistence representations from complexes and PDs. They support automatic differentiation and are readily integrated into deep learning architectures.
Giotto-deep is an open-source extension of the Giotto-TDA library. It aims to provide seamless integration between TDA and deep learning on top of PyTorch. The developers aim to provide several off-the-shelf architectures that use topology both for pre-processing data (with a variety of available methods) and within neural networks. One such example is Persformer [Reinauer et al., 2021].

Conclusion
The recent growth in TDA, together with the established efficacy of deep learning, has made the integration of these techniques inevitable. There is no universal paradigm for combining TDA and deep learning, and this article surveyed numerous ways in which these frameworks have benefited each other. We began with an overview of the key TDA concepts and then reviewed TDA in deep learning from a variety of perspectives. Finally, we described numerous challenges and opportunities that remain in this field, as well as some observed successes.

Figure 1: Topological Deep Learning introduces TDA methods to deep models, leading to topological neural architectures that can potentially address deep learning limitations. This is done by plugging in topological components for (a) learning feature embeddings (Embedding, Section 3.1), (b) enhancing the learned representations (Representations, Section 3.2), and/or (c) regularizing the model with a topological loss (Loss, Section 3.3). Beyond that, (d) TDA can be used post-training to reveal insights into trained models (interpretability, Section 3.4).