
Simplicial band depth for multivariate functional data

  • Regular Article
Advances in Data Analysis and Classification

Abstract

We propose notions of simplicial band depth for multivariate functional data that extend the univariate functional band depth. The proposed simplicial band depths provide simple and natural criteria to measure the centrality of a trajectory within a sample of curves. Based on these depths, a sample of multivariate curves can be ordered from the center outward and order statistics can be defined. Properties of the proposed depths, such as invariance and consistency, can be established. A simulation study shows the robustness of this new definition of depth and the advantages of using a multivariate depth versus the marginal depths for detecting outliers. Real data examples from growth curves and signature data are used to illustrate the performance and usefulness of the proposed depths.

References

  • Apanasovich TV, Genton MG, Sun Y (2012) A valid Matérn class of cross-covariance functions for multivariate random fields with any number of components. J Am Stat Assoc 107:180–193

  • Cheng A, Ouyang M (2001) On algorithms for simplicial depth. In: Proceedings of the 13th Canadian conference on computational geometry, vol 1, pp 53–56

  • Cuevas A, Febrero M, Fraiman R (2007) Robust estimation and classification for functional data via projection-based depth notions. Comput Stat 22:481–496

  • Ferraty F, Vieu P (2006) Nonparametric functional data analysis. Springer, New York

  • Fraiman R, Muniz G (2001) Trimmed means for functional data. Test 10:419–440

  • Genton MG, Johnson C, Potter K, Stenchikov G, Sun Y (2014) Surface boxplots. Stat 3:1–11

  • Gervini D (2012) Outlier detection and trimmed estimation for general functional data. Statistica Sinica 22:1639–1660

  • Gneiting T, Kleiber W, Schlather M (2010) Matérn cross-covariance functions for multivariate random fields. J Am Stat Assoc 105:1167–1177

  • Ieva F, Paganoni M (2013) Depth measures for multivariate functional data. Commun Stat 42(7):1265–1276

  • Liu RY (1990) On a notion of data depth based upon random simplices. Ann Stat 18:405–414

  • López-Pintado S, Jörnsten R (2007) Functional data analysis via extensions of the band depth. IMS Lecture Notes-Monograph Series, Inst Math Stat 54:103–120

  • López-Pintado S, Romo J (2007) Depth-based inference for functional data. Comput Stat Data Anal 51:4957–4968

  • López-Pintado S, Romo J (2009) On the concept of depth for functional data. J Am Stat Assoc 104(486):718–734

  • López-Pintado S, Wei Y (2011) Depth for sparse functional data. In: Ferraty F (ed) Recent advances in functional data analysis and related topics. Springer, Berlin, pp 209–212

  • Matérn B (1960) Spatial variation. Springer, New York

  • Ramsay JO, Hooker G, Graves S (2009) Functional data analysis with R and MATLAB. Springer, New York

  • Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer, New York

  • Rousseeuw PJ, Ruts I (1996) Bivariate location depth. Appl Stat 45:516–526

  • Stein ML (1999) Interpolation of spatial data: some theory for Kriging. Springer, Berlin

  • Sun Y, Genton MG (2011) Functional boxplots. J Comput Graph Stat 20:313–334

  • Sun Y, Genton MG (2012a) Adjusted functional boxplots for spatio-temporal data visualization and outlier detection. Environmetrics 23:54–64

  • Sun Y, Genton MG (2012b) Functional median polish. J Agric Biol Environ Stat 17:354–376

  • Sun Y, Genton MG, Nychka D (2012) Exact fast computation of band depth for large functional datasets: how quickly can one million curves be ranked? Stat 1:68–74

Author information

Corresponding author

Correspondence to Marc G. Genton.

Appendix

Proof of Theorem 1

  1(a).

    Let \(\mathbf {T}(\mathbf {x})\) be the combined function defined as \(\mathbf {T}(\mathbf {x}(t))=\mathbf {A}(t)\mathbf {x}(t)+\mathbf {b}(t)\), where \(t \in \mathcal {I}\). Assume it satisfies the standard assumptions for the linear transformation presented in Theorem 1. By definition,

    $$\begin{aligned} \textit{SBD}(\mathbf {x},P_\mathbf X )&= P\{ \mathbf {x}(t) \in \text{ simplex }\{\mathbf {X}_{1}(t),\ldots ,\mathbf {X}_{p+1}(t)\}, \forall t \in \mathcal{I}\}. \end{aligned}$$

    For any fixed \(t \in \mathcal{I}\), the point \(\mathbf {x}(t)\) lies in \(\text{ simplex }\{\mathbf {X}_{1}(t),\ldots ,\mathbf {X}_{p+1}(t)\}\) if and only if the point \(\mathbf {A}(t)\mathbf {x}(t)+\mathbf {b}(t)\) lies in \(\text{ simplex }\{\mathbf {A}(t)\mathbf {X}_{1}(t)+\mathbf {b}(t),\ldots ,\mathbf {A}(t)\mathbf {X}_{p+1}(t)+\mathbf {b}(t)\}\), and therefore,

    $$\begin{aligned} \textit{SBD}(\mathbf {T}(\mathbf {x}),P_\mathbf{T(X) })=\textit{SBD}(\mathbf {x},P_\mathbf X ). \end{aligned}$$

    \(\square \)

  1(b).

    Let \(g\) be a one-to-one transformation of the interval \(\mathcal{I}\). Since \(g\) maps \(\mathcal{I}\) onto itself, \(g(t)\) ranges over all of \(\mathcal{I}\) as \(t\) does. Hence \(\mathbf {x}(g(t))\in \text{ simplex }\{\mathbf {X}_{1}(g(t)),\ldots ,\mathbf {X}_{p+1}(g(t))\}\) for all \(t \in \mathcal{I}\) if and only if \(\mathbf {x}(t) \in \text{ simplex }\{\mathbf {X}_{1}(t),\ldots ,\mathbf {X}_{p+1}(t)\}\) for all \(t \in \mathcal{I}\), and therefore,

    $$\begin{aligned} \textit{SBD}(\mathbf {x}(g),P_\mathbf{X (g)})=\textit{SBD}(\mathbf {x},P_\mathbf{X }). \end{aligned}$$

    \(\square \)

  2.

    \(\textit{SBD}(\mathbf {x},P_\mathbf X )\) converges to zero when the supremum norm of each component of \(\mathbf {x}\) tends to infinity. Specifically,

    $$\begin{aligned} \underset{\min _{k=1,\ldots , p} \left\| x_k\right\| _{\infty }\ge M}{\sup } \textit{SBD}(\mathbf {x},P_\mathbf X )\longrightarrow 0,\text { \ \ when }M\rightarrow \infty , \end{aligned}$$

    where \(x_k\) is the \(k\)th component of the multivariate function \(\mathbf {x}\). We establish first by contradiction the inclusion

    $$\begin{aligned}&\left\{ (\mathbf {X}_{1},\ldots ,\mathbf {X}_{p+1})\text {: }\mathbf {x}(t)\in \text{ simplex }\{\mathbf {X}_{1}(t),\ldots ,\mathbf {X}_{p+1}(t)\}, \forall t\in \mathcal{I} \right\} \\&\quad \subset \bigcup _{r=1}^{p+1}\bigcup _{k=1}^{p}\left\{ (\mathbf {X}_{1},\ldots ,\mathbf {X}_{p+1}):\left\| X_{rk}\right\| _{\infty }\ge \left\| x_{k}\right\| _{\infty }\right\} \end{aligned}$$

    where \(X_{rk}\) is the \(k\)th component of the multivariate function \(\mathbf {X}_r\). If \(\mathbf {x}(t)\in \text{ simplex }\{ \mathbf {X}_{1}(t),\mathbf {X}_{2}(t),\ldots ,\mathbf {X}_{p+1}(t)\}\) for all \(t\in \mathcal I\), then, for each \(k\) and \(t\in \mathcal I,\)

    $$\begin{aligned} \underset{r=1,\ldots ,p+1}{\min }\left\{ X_{rk}(t)\right\} \le x_{k}(t)\le \underset{r=1,\ldots ,p+1}{\max }\left\{ X_{rk}(t)\right\} . \end{aligned}$$
    (7)

    Assume, to the contrary, that \(\left\| X_{rk}\right\| _{\infty }<\left\| x_{k}\right\| _{\infty }\) for each \(k=1,\ldots ,p\) and \(r=1,\ldots ,p+1\); this implies that, for each \(r\) and \(k\), we have

    $$\begin{aligned} \underset{ t \in \mathcal I}{\max }\left| X_{rk}(t)\right| <\underset{t \in \mathcal I}{\max }\left| x_{k}(t)\right| . \end{aligned}$$

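    Let \(t_k\) be a point where the maximum of \(\left| x_{k}(t)\right| \) is achieved. Then, for all \(r=1,\ldots ,p+1\), \(\left| X_{rk}(t_k)\right| \le \left\| X_{rk}\right\| _{\infty }<\left\| x_{k}\right\| _{\infty }=\left| x_{k}(t_k)\right| \), so \(x_{k}(t_k)\) lies strictly outside the range of the values \(X_{rk}(t_k)\), which contradicts (7). Therefore,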

    $$\begin{aligned}&\underset{\min _{k=1,\ldots , p} \left\| x_k\right\| _{\infty }\ge M}{\sup }\textit{SBD}(\mathbf {x},P_\mathbf {X}) \\&\quad = \underset{\min _{k=1,\ldots , p} \left\| x_k\right\| _{\infty }\ge M}{\sup }P\left( \mathbf {x}(t)\in \text{ simplex }\{\mathbf {X}_{1}(t),\ldots ,\mathbf {X}_{p+1}(t)\}, \forall t \in \mathcal I\right) \\&\quad \le \underset{\min _{k=1,\ldots , p} \left\| x_k\right\| _{\infty }\ge M}{\sup }\sum _{r=1}^{p+1}\sum _{k=1}^{p}P\left( \left\| X_{rk}\right\| _{\infty }\ge \left\| x_k\right\| _{\infty }\right) \\&\quad \le \sum _{r=1}^{p+1}\sum _{k=1}^{p}\underset{\min _{k=1,\ldots , p} \left\| x_k\right\| _{\infty }\ge M}{\sup }P\left( \left\| X_{rk}\right\| _{\infty }\ge \left\| x_k\right\| _{\infty }\right) \end{aligned}$$

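    where, since \(\left\| x_k\right\| _{\infty }\ge M\) for every \(k\) over the range of the supremum, each term of the last sum is at most \(P\left( \left\| X_{rk}\right\| _{\infty }\ge M\right) \), which tends to zero as \(M\rightarrow \infty \) provided \(\left\| X_{rk}\right\| _{\infty }\) is almost surely finite. This implies that \({\sup }_{\min _{k=1,\ldots , p} \left\| x_k\right\| _{\infty }\ge M}\textit{SBD}(\mathbf {x},P_\mathbf {X})\longrightarrow 0\) when \(M\rightarrow \infty .\) \(\square \)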
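As an informal numerical illustration of part 1(a), the following Python sketch evaluates an empirical analogue of \(\textit{SBD}\) for bivariate curves (\(p=2\)) observed on a common grid: the proportion of triples of sample curves whose triangle contains \(\mathbf {x}(t_j)\) at every grid point. It then checks that the value is unchanged by a pointwise affine map. The function names (`in_triangle`, `sbd_n`, `affine`), the toy data and this particular sample version are assumptions made for the illustration, not the authors' implementation. Invertibility of \(\mathbf {A}(t_j)\) is assumed, and degenerate (collinear) triangles are not handled.

```python
from itertools import combinations


def in_triangle(x, a, b, c):
    """Return True if the planar point x lies in the closed triangle a, b, c."""
    def cross(o, u, v):
        # Twice the signed area of the triangle (o, u, v).
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])

    signs = (cross(a, b, x), cross(b, c, x), cross(c, a, x))
    # x is inside (or on the boundary) when the three signs never disagree.
    return not (any(s < 0 for s in signs) and any(s > 0 for s in signs))


def sbd_n(x, curves):
    """Proportion of triples of curves whose triangle contains x at every grid point."""
    triples = list(combinations(curves, 3))
    hits = sum(
        all(in_triangle(x[j], c1[j], c2[j], c3[j]) for j in range(len(x)))
        for c1, c2, c3 in triples
    )
    return hits / len(triples)


def affine(curve, A, b):
    """Map each grid value x(t_j) to A[j] x(t_j) + b[j] (A[j] is 2x2, b[j] has length 2)."""
    return [
        (A[j][0][0] * u + A[j][0][1] * v + b[j][0],
         A[j][1][0] * u + A[j][1][1] * v + b[j][1])
        for j, (u, v) in enumerate(curve)
    ]


# Toy sample: four bivariate curves observed at two grid points (hypothetical data).
curves = [
    [(0, 0), (0, 0)],
    [(4, 0), (4, 0)],
    [(0, 4), (0, 4)],
    [(4, 4), (-4, -4)],
]
x = [(1, 2), (1, 2)]

# One invertible matrix per grid point (determinants 2 and -1) and one shift per grid point.
A = [[[2, 1], [0, 1]], [[0, 1], [1, 3]]]
b = [(5, -1), (-2, 7)]

depth_before = sbd_n(x, curves)
depth_after = sbd_n(affine(x, A, b), [affine(c, A, b) for c in curves])
print(depth_before, depth_after)
```

For this toy sample only the triple formed by the first three curves contains \(\mathbf {x}\) at both grid points, so both depths equal 0.25. Integer coordinates keep the orientation tests exact, which avoids rounding questions near triangle edges.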

Proof of Theorem 2

  1.

    Let \(\mathbf {T}(\mathbf {x}(t))=\mathbf {A}(t)\mathbf {x}(t)+\mathbf {b}(t)\) be a linear transformation satisfying the standard assumptions in Theorem 1. By definition,

    $$\begin{aligned} \textit{MSBD}(\mathbf {x},P_\mathbf {X})=E(\lambda [t\in \mathcal{I}, \text{ s.t. } \mathbf {x}(t)\in \text{ simplex }\{\mathbf {X}_{1}(t),\ldots ,\mathbf {X}_{p+1}(t)\}]). \end{aligned}$$
    (8)

    For any fixed \(t \in \mathcal{I}\), the point \(\mathbf {x}(t)\) lies in \(\text{ simplex }\{\mathbf {X}_{1}(t),\ldots ,\mathbf {X}_{p+1}(t)\}\) if and only if the point \(\mathbf {A}(t)\mathbf {x}(t)+\mathbf {b}(t)\) lies in \(\text{ simplex }\{\mathbf {A}(t)\mathbf {X}_{1}(t)+\mathbf {b}(t),\ldots ,\mathbf {A}(t)\mathbf {X}_{p+1}(t)+\mathbf {b}(t)\}\), and therefore,

    $$\begin{aligned} \textit{MSBD}(\mathbf {T}(\mathbf {x}),P_\mathbf{T(X) })=\textit{MSBD}(\mathbf {x},P_\mathbf X ). \end{aligned}$$

    \(\square \)

  2.

    The monotonicity property follows directly from expression (4) and the monotonicity property satisfied by the simplicial depth \(\textit{SD}\) defined in Eq. (3). \(\square \)

  3.

    If \(P_{\mathbf {X}}\) has a unique center of symmetry \(\mathbf {y}\in C(\mathcal{I},{\mathbb {R}}^{p})\), in the sense that \(P_{\mathbf {X}-\mathbf {y}}=P_{\mathbf {y}-\mathbf {X}}\), then for every \(t\in \mathcal{I}\), \(\mathbf {y}(t)\) is the center of symmetry of \(P_{\mathbf {X}(t)}\). The result then follows from the alternative expression of \(\textit{MSBD}\) in (4), since \(\textit{SD}\) is maximized at the center of symmetry. \(\square \)

  4.

    The vanishing at infinity property follows directly from expression (4) and the vanishing at infinity property of \(\textit{SD}\). \(\square \)
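Parts 2 to 4 rely on the alternative expression (4), which is not reproduced in this appendix. As a sketch of how an expression of that kind can be obtained from (8), assume that the indicator below is jointly measurable in \(t\) and the sample, so that the expectation and the integral can be interchanged (Fubini–Tonelli), and that \(\textit{SD}(\cdot ,P_{\mathbf {X}(t)})\) denotes the probability that a random simplex built from \(p+1\) independent draws from \(P_{\mathbf {X}(t)}\) contains the point, as in Liu (1990). Then

$$\begin{aligned} \textit{MSBD}(\mathbf {x},P_\mathbf {X})&= E\left( \int _{\mathcal{I}} I[\mathbf {x}(t)\in \text{ simplex }\{\mathbf {X}_{1}(t),\ldots ,\mathbf {X}_{p+1}(t)\}]\,\mathrm {d}\lambda (t)\right) \\&= \int _{\mathcal{I}} P\left( \mathbf {x}(t)\in \text{ simplex }\{\mathbf {X}_{1}(t),\ldots ,\mathbf {X}_{p+1}(t)\}\right) \mathrm {d}\lambda (t) \\&= \int _{\mathcal{I}} \textit{SD}(\mathbf {x}(t),P_{\mathbf {X}(t)})\,\mathrm {d}\lambda (t). \end{aligned}$$

Written this way, \(\textit{MSBD}\) is the \(\lambda \)-integral of the pointwise simplicial depths, which is why pointwise properties of \(\textit{SD}\) carry over to \(\textit{MSBD}\).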

Proof of Theorem 3

By interchanging the two sums in the definition of \(\textit{MSBD}_{n}\), one can write

$$\begin{aligned} \textit{MSBD}_{n}(\mathbf {x})=\frac{1}{k}\sum _{1\le j \le k}\frac{1}{\binom{n}{p+1}}\ \sum _{1\le i_{1}<\cdots <i_{p+1}\le n} \ I [\mathbf {x}(t_j)\in \text{ simplex }\{\mathbf {x}_{i_{1}}(t_j),\ldots ,\mathbf {x}_{i_{p+1}}(t_j)\}], \end{aligned}$$

which is equivalent to

$$\begin{aligned} \textit{MSBD}_{n}(\mathbf {x})=\frac{1}{k}\underset{1\le j \le k}{\sum }\textit{SD}_n(\mathbf {x}(t_j)), \end{aligned}$$

where \(\textit{SD}_n(\mathbf {x}(t_j))\) is the sample simplicial depth of \(\mathbf {x}(t_j)\) as defined in Eq. (6). Also, note that one can write \(\textit{MSBD}(\mathbf {x},P_\mathbf {X})=\frac{1}{k}{\sum }_{1\le j \le k}\textit{SD}(\mathbf {x}(t_j),P_{\mathbf {X}(t_j)})\), where \(\textit{SD}(\mathbf {x}(t_j),P_{\mathbf {X}(t_j)})\) is the population simplicial depth of \(\mathbf {x}(t_j)\) as defined in Eq. (3). Therefore,

$$\begin{aligned}&\sup _{\mathbf {x}\in ({\mathbb {R}}^{p})^k}|\textit{MSBD}_{n}(\mathbf {x})-\textit{MSBD}(\mathbf {x},P_\mathbf {X})|\\&\quad =\sup _{\mathbf {x}\in ({\mathbb {R}}^{p})^k}\left| \frac{1}{k}\underset{1\le j \le k}{\sum }\!\!\left( \textit{SD}_{n}(\mathbf {x}(t_j))-\textit{SD}(\mathbf {x}(t_j),P_{\mathbf {X}(t_j)})\right) \right| . \end{aligned}$$

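By the triangle inequality, the right-hand side is at most \(\frac{1}{k}{\sum }_{1\le j \le k}\sup _{\mathbf {y}\in {\mathbb {R}}^{p}}|\textit{SD}_{n}(\mathbf {y})-\textit{SD}(\mathbf {y},P_{\mathbf {X}(t_j)})|\), where \(\textit{SD}_{n}\) in the \(j\)th term is computed from the sample values at \(t_j\). By the uniform consistency of the sample simplicial depth proven in Liu (1990), each of these \(k\) terms converges to zero almost surely. Since \(k\) is finite, we can conclude that the sample \(\textit{MSBD}_n\) converges uniformly almost surely to the population \(\textit{MSBD}\) as \(n\rightarrow \infty \):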

$$\begin{aligned} \sup _{\mathbf {x}\in ({\mathbb {R}}^{p})^k}|\textit{MSBD}_{n}(\mathbf {x})-\textit{MSBD}(\mathbf {x},P_\mathbf {X})| \overset{a.s.}{\longrightarrow }0. \end{aligned}$$

\(\square \)
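The interchange of sums used at the start of this proof can be checked numerically. The Python sketch below computes \(\textit{MSBD}_{n}\) for bivariate curves (\(p=2\)) on a common grid in both orders: as an average over grid points of the pointwise sample simplicial depth, and as an average over triples of curves of the share of grid points at which \(\mathbf {x}\) lies inside the triangle. The function names, the toy data and the orientation-based triangle test (standing in for a general simplex membership test) are assumptions made for the illustration. Collinear triples are not handled.

```python
from itertools import combinations


def in_triangle(x, a, b, c):
    """Return True if the planar point x lies in the closed triangle a, b, c."""
    def cross(o, u, v):
        # Twice the signed area of the triangle (o, u, v).
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])

    signs = (cross(a, b, x), cross(b, c, x), cross(c, a, x))
    return not (any(s < 0 for s in signs) and any(s > 0 for s in signs))


def sample_sd(point, cloud):
    """Sample simplicial depth in the plane: share of triangles from cloud containing point."""
    triples = list(combinations(cloud, 3))
    return sum(in_triangle(point, a, b, c) for a, b, c in triples) / len(triples)


def msbd_n_by_time(x, curves):
    """Average over grid points of the pointwise sample simplicial depth."""
    k = len(x)
    return sum(sample_sd(x[j], [c[j] for c in curves]) for j in range(k)) / k


def msbd_n_by_triple(x, curves):
    """Average over triples of curves of the share of grid points where x is inside."""
    k = len(x)
    triples = list(combinations(curves, 3))
    shares = [
        sum(in_triangle(x[j], c1[j], c2[j], c3[j]) for j in range(k)) / k
        for c1, c2, c3 in triples
    ]
    return sum(shares) / len(triples)


# Toy sample: four bivariate curves observed at two grid points (hypothetical data).
curves = [
    [(0, 0), (0, 0)],
    [(4, 0), (4, 0)],
    [(0, 4), (0, 4)],
    [(4, 4), (-4, -4)],
]
x = [(1, 2), (1, 2)]

print(msbd_n_by_time(x, curves), msbd_n_by_triple(x, curves))
```

On this toy sample two of the four triangles contain \(\mathbf {x}(t_j)\) at each grid point, so both orderings give 0.5. The by-time form is the one the consistency argument uses, since it reduces the problem to \(k\) separate finite-dimensional simplicial depths.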

About this article

Cite this article

López-Pintado, S., Sun, Y., Lin, J.K. et al. Simplicial band depth for multivariate functional data. Adv Data Anal Classif 8, 321–338 (2014). https://doi.org/10.1007/s11634-014-0166-6
