Signal classification with a point process distance on the space of persistence diagrams

Abstract

In this paper, we consider the problem of signal classification. First, the signal is translated into a persistence diagram through the use of delay-embedding and persistent homology. Endowing the data space of persistence diagrams with a metric from point processes, we show that it admits statistical structure in the form of Fréchet means and variances, and we establish a classification scheme. In contrast with the Wasserstein distance, this metric accounts for changes in small persistence and changes in cardinality. The classification results using this distance are benchmarked on both synthetic data and real acoustic signals, and it is demonstrated that this classifier outperforms current signal classification techniques.
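
As a rough illustration of the first step described above, the following Python sketch builds a delay embedding of a sampled signal. The function name `delay_embed` and the parameters `dim` and `tau` are assumptions made for this example; the embedding dimension and delay used in the paper are not stated in this excerpt. The persistent homology step (computed, for instance, with software such as Ripser) is not shown.

```python
import math

def delay_embed(signal, dim=2, tau=1):
    """Return the delay vectors (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n_vectors = len(signal) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("signal too short for this dim and tau")
    return [
        tuple(signal[i + j * tau] for j in range(dim))
        for i in range(n_vectors)
    ]

# A sine wave with period 40 samples, embedded with a quarter-period delay,
# traces out a closed loop (here the unit circle) in the plane.
signal = [math.sin(2 * math.pi * k / 40) for k in range(200)]
cloud = delay_embed(signal, dim=2, tau=10)
```

The loop in the resulting point cloud is the kind of feature that a one-dimensional persistence diagram is designed to record.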

References

  1. Adcock A, Carlsson E, Carlsson G (2016) The ring of algebraic functions on persistence bar codes. Homol Homotopy Appl 18(1):381–402

  2. Adler RJ, Bobrowski O, Weinberger S (2014) Crackle: the homology of noise. Discrete Comput Geom 52(4):680–704

  3. Azimi-Sadjadi MR, Yang Y, Srinivasan S (2007) Acoustic classification of battlefield transient events using wavelet subband features. In: Proceedings of SPIE defense and security symposium, p 6562

  4. Bampasidou M, Gentimis T (2014) Modeling collaborations with persistent homology. arXiv preprint arXiv:1403.5346

  5. Bauer U (2015) Ripser. https://github.com/Ripser/ripser

  6. Bogert BP, Healy MJ, Tukey JW (1963) The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum and saphe cracking. In: Proceedings of the symposium on time series analysis, chapter, vol 15, pp 209–243

  7. Bubenik P (2015) Statistical topological data analysis using persistence landscapes. J Mach Learn Res 16(1):77–102

  8. Carlsson G (2009) Topology and data. Bull Am Math Soc 46(2):255–308

  9. Chazal F, Cohen-Steiner D, Glisse M, Guibas LJ, Oudot SY (2009) Proximity of persistence modules and their diagrams. In: Proceedings of the twenty-fifth annual symposium on Computational geometry. ACM, pp 237–246

  10. Cohen-Steiner D, Edelsbrunner H, Harer J, Mileyko Y (2010) Lipschitz functions have \(L_p\)-stable persistence. Found Comput Math 10(2):127–139

  11. Dhanalakshmi P, Palanivel S, Ramalingam V (2009) Classification of audio signals using SVM and RBFNN. Expert Syst Appl 36(3):6069–6075

  12. Edelsbrunner H, Harer J (2010) Computational topology: an introduction. American Mathematical Society, Providence

  13. Emrani S, Gentimis T, Krim H (2015) Persistent homology of delay embeddings and its application to wheeze detection. IEEE Signal Process Lett 21(4):459–463

  14. Fasy BT, Kim J, Lecci F, Maria C, Rouvreau V (2015) TDA: statistical tools for topological data analysis. R package version 1.4.1 (the included GUDHI is authored by Clement Maria, Dionysus by Dmitriy Morozov, and PHAT by Ulrich Bauer, Michael Kerber and Jan Reininghaus). https://CRAN.R-project.org/package=TDA

  15. Garrett D, Peterson DA, Anderson CW, Thaut MH (2003) Comparison of linear, nonlinear, and feature selection methods for eeg signal classification. IEEE Trans Neural Syst Rehabil Eng 11:141–166

  16. Hatcher A (2002) Algebraic topology. Cambridge University Press, Cambridge

  17. Kerber M, Morozov D, Nigmetov A (2016) Geometry helps to compare persistence diagrams. In: Proceedings of the eighteenth workshop on algorithm engineering and experiments, pp 103–112

  18. Krim H, Gentimis T, Chintakunta H (2016) Discovering the whole by the coarse: a topological paradigm for data analysis. IEEE Signal Process Mag 33(2):95–104

  19. Kuhn HW (1955) The Hungarian method for the assignment problem. Naval Res Log Q 2:83–87

  20. Law K, Stewart A, Zygalakis K (2015) Data assimilation: a mathematical introduction. Springer, Berlin

  21. Lum PY, Singh G, Lehman A, Ishkanov T, Vejdemo-Johansson M, Alagappan M, Carlsson J, Carlsson G (2013) Extracting insights from the shape of complex data using topology. Sci Rep 3(3):1236

  22. Maroulas V, Nebenführ A (2015) Tracking rapid intracellular movements: a Bayesian random set approach. Ann Appl Stat 9(2):926–949

  23. Mileyko Y, Mukherjee S, Harer J (2011) Probability measures on the space of persistence diagrams. Inverse Problems 27(12):124007

  24. Nicolau M, Levine A, Carlsson G (2011) Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proc Nat Acad Sci 108(17):7265–7270

  25. Oppenheim AV, Schafer RW (2004) From frequency to quefrency: a history of the cepstrum. IEEE Signal Process Mag 21:95–106

  26. Reininghaus J, Huber S, Bauer U, Kwitt R (2015) A stable multi-scale kernel for topological machine learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4741–4748

  27. Robins V, Turner K (2016) Principal component analysis of persistent homology rank functions with case studies of spatial point patterns, sphere packing and colloids. Physica D 334:99–117

  28. Schuhmacher D, Vo B, Vo B (2008) A consistent metric for performance evaluation of multi-object filters. IEEE Trans Signal Process 56:3447–3457

  29. Seversky LM, Davis S, Berger M (2016) On time-series topological data analysis: new data and opportunities. In: The IEEE conference on computer vision and pattern recognition, pp 59–67

  30. Sherwin J, Sajda P (2013) Musical experts recruit action-related neural structures in harmonic anomaly detection: evidence for embodied cognition in expertise. Brain Cogn 83:190–202

  31. Srinivas U, Nasrabadi NM, Monga V (2013) Graph-based multi-sensor fusion for acoustic signal classification. In: 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 261–265

  32. Takens F (1980) Detecting strange attractors in turbulence. In: Dynamical systems and turbulence, Warwick 1980. Lecture notes in mathematics, vol 898, pp 366–381

  33. Turner K, Mileyko Y, Mukherjee S, Harer J (2014) Fréchet means for distributions of persistence diagrams. Discrete Comput Geom 52(1):44–70

  34. Venkataraman V, Ramamurthy KN, Turaga P (2016) Persistent homology of attractors for action recognition. In: 2016 IEEE international conference on image processing (ICIP), pp 4150–4154

  35. Xia K, Wei GW (2014) Persistent homology analysis of protein structure, flexibility, and folding. Int J Numer Methods Biomed Eng 30(8):814–844

  36. Zhang H, Nasrabadi NM, Huang TS, Zhang Y (2011) Transient acoustic signal classification using joint sparse representation. In: 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 2220–2223

Acknowledgements

VM would like to thank the Army Research Office for its support via Grant W911NF-17-1-0313. Both authors would like to thank Dr. Tung-Duong Tran-Luu for providing the Army Research Lab’s acoustic signal dataset and for useful discussions. The authors would also like to thank five anonymous reviewers for their comments, which substantially improved the manuscript.

Author information

Corresponding author

Correspondence to Vasileios Maroulas.

Appendices

Appendix A

Proof that \(d^c_p\) is a metric. We adapt the proof from Schuhmacher et al. (2008) to the space \(P_W\). From Definition 5, it is clear that \(d^c_p \ge 0\), that \(d^c_p\) is symmetric, and that it satisfies the identity of indiscernibles. It remains to show the triangle inequality. We consider three persistence diagrams \(\mathbb {D}_1 = (t_1,\ldots ,t_\ell ), \mathbb {D}_2 = (u_1,\ldots ,u_n), \mathbb {D}_3 = (v_1,\ldots ,v_m)\). Assume that \(\ell \le n\) and that at most one of the cardinalities is zero. Since W is a closed and bounded subset of \(\mathbb {R}^2\), we may choose dummy points \((a_i)_{i \in \mathbb {N}}\) and \((b_i)_{i \in \mathbb {N}}\), each at least distance c from W and from each other. The two cases we must consider are \(\ell \le n \le m\) and \(\ell ,m \le n\).

We first treat the case \(\ell \le n \le m\). Extend the persistence diagram \(\mathbb {D}_1\) with points \(t_{\ell + j} = a_j\) for \(1 \le j \le m - \ell \), and similarly extend \(\mathbb {D}_2\) with \(u_{n + j} = b_j\) for \(1 \le j \le m-n\). This way, the cardinality difference in Eq. (2) is zero. Moreover, after the dummy points have been added, let \(\eta \) and \(\nu \) be the minimizing permutations from \(\mathbb {D}_1\) to \(\mathbb {D}_3\) and from \(\mathbb {D}_3\) to \(\mathbb {D}_2\), respectively. Then, by Eq. (2) and the fact that \(a \le c^pm\) implies \(\frac{a}{m} \le \frac{a + c^p(n-m)}{n}\), we have

$$\begin{aligned} d^c_p(\mathbb {D}_1,\mathbb {D}_2)= & {} \left( \frac{1}{n} \min _{\pi \in \varPi _n} \sum _{i=1}^n \min (c,||t_i - u_{\pi (i)}||_\infty )^p\right) ^\frac{1}{p}\nonumber \\\le & {} \left( \frac{1}{m} \min _{\pi \in \varPi _m} \sum _{i=1}^m \min (c,||t_i - u_{\pi (i)}||_\infty )^p\right) ^\frac{1}{p} \end{aligned}$$
(12)

The right hand side of Eq. (12) can further be bounded by

$$\begin{aligned}&\left( \frac{1}{m} \sum _{i=1}^m \min (c,||t_i - v_{\eta (i)}||_\infty )^p + \min (c,||v_i - u_{\nu (i)}||_\infty )^p\right) ^\frac{1}{p}\nonumber \\&\quad \le \left( \frac{1}{m} \sum _{i=1}^m \min (c,||t_i - v_{\eta (i)}||_\infty )^p\right) ^\frac{1}{p} + \left( \frac{1}{m} \sum _{i=1}^m \min (c,||v_i - u_{\nu (i)}||_\infty )^p\right) ^\frac{1}{p}\nonumber \\&\quad = d^c_p(\mathbb {D}_1,\mathbb {D}_3) + d^c_p(\mathbb {D}_3,\mathbb {D}_2) \end{aligned}$$
(13)

Note that in Eq. (13), we map from \(\mathbb {D}_1\) to \(\mathbb {D}_3\) optimally (via the permutation \(\eta \)) and then from \(\mathbb {D}_3\) to \(\mathbb {D}_2\) optimally (via the permutation \(\nu \)).

The second case is \(\ell ,m \le n\). As above, take \(\eta \) and \(\nu \) to be the minimizing permutations from \(\mathbb {D}_1\) to \(\mathbb {D}_3\) and from \(\mathbb {D}_3\) to \(\mathbb {D}_2\), respectively. Then, similarly, we have that

$$\begin{aligned} d^c_p(\mathbb {D}_1,\mathbb {D}_2)= & {} \left( \frac{1}{n} \min _{\pi \in \varPi _n} \sum _{i=1}^n \min (c,||t_i - u_{\pi (i)}||_\infty )^p\right) ^\frac{1}{p}\nonumber \\\le & {} \left( \frac{1}{m} \sum _{i=1}^m \min (c,||t_i - v_{\eta (i)}||_\infty )^p + \min (c,||v_i - u_{\nu (i)}||_\infty )^p\right) ^\frac{1}{p}\nonumber \\\le & {} \left( \frac{1}{m} \sum _{i=1}^m \min (c,||t_i - v_{\eta (i)}||_\infty )^p\right) ^\frac{1}{p}\nonumber \\&+\, \left( \frac{1}{m} \sum _{i=1}^m \min (c,||v_i - u_{\nu (i)}||_\infty )^p\right) ^\frac{1}{p}\nonumber \\= & {} d^c_p(\mathbb {D}_1,\mathbb {D}_3) + d^c_p(\mathbb {D}_3,\mathbb {D}_2) \end{aligned}$$
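
As an illustration (not part of the original proof), the padding construction used above can be sketched in Python. The function name `dcp_distance`, the representation of a diagram as a list of (birth, death) pairs, and the brute-force search over permutations are assumptions made for this sketch. Eq. (2) is not reproduced in this appendix, so the formula is reconstructed from the expressions appearing in the proof: the smaller diagram is padded with dummy points, each of which contributes the cutoff cost \(c^p\).

```python
from itertools import permutations

def dcp_distance(d1, d2, c=1.0, p=2):
    """Brute-force sketch of the cutoff distance between two small diagrams."""
    # Order the diagrams so that `short` has at most as many points as `long_`.
    short, long_ = (d1, d2) if len(d1) <= len(d2) else (d2, d1)
    n = len(long_)
    if n == 0:
        return 0.0  # two empty diagrams coincide
    # Pad the smaller diagram with dummy points (None), each matched at cost c.
    padded = list(short) + [None] * (n - len(short))

    def cost(t, u):
        if t is None:
            return c ** p
        sup = max(abs(t[0] - u[0]), abs(t[1] - u[1]))  # sup-norm distance
        return min(c, sup) ** p

    # Exhaustive minimization over all matchings; feasible only for small n.
    best = min(
        sum(cost(t, long_[j]) for t, j in zip(padded, perm))
        for perm in permutations(range(n))
    )
    return (best / n) ** (1.0 / p)

D1 = [(0.0, 1.0)]
D2 = [(0.0, 1.0), (0.2, 0.3)]
d12 = dcp_distance(D1, D2, c=0.5, p=2)  # (0.25 / 2) ** 0.5, about 0.354
```

The exhaustive search is used only to keep the sketch dependency-free. The minimization is an assignment problem, for which the Hungarian method (Kuhn 1955) is the standard polynomial-time approach.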

Appendix B

Proof of Lemma 1

By Appendix A, this is a metric space. We first show completeness. Let \(\{\mathbb {D}_n\}_{n=1}^\infty \) be a Cauchy sequence of persistence diagrams. It is clear that for some \(k_0\), we have that \(j,l \ge k_0\) implies \(|\mathbb {D}_j| = |\mathbb {D}_l| = k\), so we may assume without loss of generality that the associated cardinalities are equal. Fix an \(\epsilon > 0\). There is an N such that for \(n,m > N\), \(d_p^c(\mathbb {D}_n,\mathbb {D}_m) < \epsilon \). In particular, since their cardinalities are the same, we have that

$$\begin{aligned} d^c_p(\mathbb {D}_n,\mathbb {D}_m) = \left( \frac{1}{k}\min _{\pi \in \varPi _k} \sum _{i=1}^k \left\| x^n_i - x^m_{\pi (i)}\right\| _\infty ^p\right) ^\frac{1}{p} < \epsilon \end{aligned}$$

and so we have that, for a given point \(x^n_i \in \mathbb {D}_n\),

$$\begin{aligned} \left\| x^n_i - x^m_{\pi (i)}\right\| _\infty < (k)^\frac{1}{p}\epsilon \end{aligned}$$

where \(\pi \) is the minimizing permutation.
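
For completeness, the step from the averaged bound to the pointwise bound can be spelled out as follows. This short derivation uses only the fact that every summand is nonnegative, with \(\pi \) the minimizing permutation:

$$\begin{aligned} \frac{1}{k}\sum _{j=1}^k \left\| x^n_j - x^m_{\pi (j)}\right\| _\infty ^p< \epsilon ^p \;\Longrightarrow \; \left\| x^n_i - x^m_{\pi (i)}\right\| _\infty ^p \le \sum _{j=1}^k \left\| x^n_j - x^m_{\pi (j)}\right\| _\infty ^p< k\epsilon ^p \;\Longrightarrow \; \left\| x^n_i - x^m_{\pi (i)}\right\| _\infty < k^\frac{1}{p}\epsilon \end{aligned}$$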

Thus, there is a sequence of points \(x^n_i, x^{n+1}_{\pi _{n+1}(i)}, x^{n+2}_{\pi _{n+2}(i)},\ldots \) such that the distance between any two points in this sequence is less than \(2(k)^\frac{1}{p}\epsilon \) via the triangle inequality, where \(\pi _{n+\alpha }\) is the minimizing permutation between persistence diagrams \(\mathbb {D}_n\) and \(\mathbb {D}_{n+\alpha }\). This is a Cauchy sequence in W under the \(\infty \)-norm. Since W is complete, this sequence converges to some limit \(x_i \in W\). Repeating this for each element in \(\mathbb {D}_n\), we generate a persistence diagram \(\mathbb {D}^*\) consisting of points \((x_1,\ldots ,x_k)\) chosen as the limits above.

Therefore, for any fixed \(\epsilon > 0\), since each sequence above converges to the corresponding limit, there is some N such that for \(j > N\) we have \(||x^{j}_{i} - x_i||_\infty < \epsilon \) for each i. This implies that

$$\begin{aligned} d_p^c(\mathbb {D}_j,\mathbb {D}^*)= & {} \left( \frac{1}{k}\min _{\pi \in \varPi _k} \sum _{i=1}^k ||x^j_i - x_{\pi (i)}||_\infty ^p\right) ^\frac{1}{p} \le \left( \frac{1}{k} \sum _{i=1}^k ||x^j_i - x_i||_\infty ^p\right) ^\frac{1}{p} \\< & {} \left( \frac{1}{k} k\epsilon ^p\right) ^\frac{1}{p} = \epsilon \end{aligned}$$

Since every Cauchy sequence converges to a limit in this space, the space is complete.

Finally, it remains to show separability. Consider the space \(P_{\mathbb {Q} \cap W,k}\) of all persistence diagrams with points in \(\mathbb {Q} \cap W\) and cardinality less than or equal to k; this space is countable. Then for any persistence diagram \(\mathbb {D}\) and any \(\epsilon > 0\), find \(\mathbb {D}_q \in P_{\mathbb {Q} \cap W,k}\) such that \(|\mathbb {D}| = |\mathbb {D}_q| = k\) and for all \(x_i \in \mathbb {D}\), there is a corresponding \(y_{x_i} \in \mathbb {D}_q\) such that \(||x_i - y_{x_i}||_\infty ^p \le \epsilon \). Then

$$\begin{aligned} d^c_p(\mathbb {D},\mathbb {D}_q)^p = \frac{1}{k} \min _{\pi \in \varPi _k} \sum _{i=1}^k ||x_i - y_{\pi (i)}||_\infty ^p \le \frac{1}{k} \sum _{i=1}^k ||x_i - y_{x_i}||_\infty ^p \le \frac{1}{k} \sum _{i=1}^k \epsilon = \epsilon \end{aligned}$$

so that \(d^c_p(\mathbb {D},\mathbb {D}_q) \le \epsilon ^\frac{1}{p}\), which can be made arbitrarily small.

\(\square \)

Cite this article

Marchese, A., Maroulas, V. Signal classification with a point process distance on the space of persistence diagrams. Adv Data Anal Classif 12, 657–682 (2018). https://doi.org/10.1007/s11634-017-0294-x

Keywords

  • Classification of time series
  • Data space of persistence diagrams
  • Wasserstein metric
  • Cardinality
  • Persistent homology

Mathematics Subject Classification

  • 62H30
  • 62M10
  • 54H99
  • 62P30