1 Introduction

The paper under discussion proposes a non-standard approach to functional data analysis based on the concept of the shape of a function. I congratulate the authors on such a stimulating piece of work. Among its most remarkable contributions, I would like to highlight the following: a formal definition of the shape of a function f as an equivalence class [f]; the definition of the square-root velocity function (SRVF) \(q_f\) of a function f; and the use of the SRVF to describe the shape [f] of f, to align sets of functions optimally, and to define the shape distance \(d_{\mathcal {S}}([f],[g])\equiv d_{\mathcal {S}}([q_f],[q_g])\) between the shapes of two functions f and g. The abundant examples in the article show the advantages of SRVF-based alignment over alternative approaches.

I would like to emphasize the relevance of using the shape metric \(d_{\mathcal {S}}\) to perform shape-based functional data analysis, and to point out several directions in which \(d_{\mathcal {S}}\) could complement the many tools covered in the paper under discussion. Specifically, I will mention distance-based dimensionality reduction (Sect. 2) and distance-based regression (Sect. 3).

2 Shape-distance-based dimensionality reduction

Multidimensional scaling (MDS; see, for instance, Borg and Groenen 2005) is a family of dimensionality reduction methods which, starting from an inter-individual distance matrix D, aim to produce a low-dimensional configuration (a matrix with as many rows as observations in the data set, and a reduced number of columns) where the inter-row Euclidean distances approximately reproduce the entries in D. See Delicado (2011) for an application of MDS to functional data, including a comparison to principal component analysis (PCA).
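The core MDS computation can be sketched with base R alone. This toy example uses an ordinary Euclidean distance matrix for a cloud of points; in the shape setting, the same call would be applied to the matrix of shape distances \(d_{\mathcal {S}}([q_i],[q_j])\).

```r
# Sketch: classical (metric) MDS from an inter-individual distance matrix,
# using base R's cmdscale. The toy distance matrix below is Euclidean; in
# the shape setting it would be replaced by the shape-distance matrix.
set.seed(1)
X <- matrix(rnorm(20 * 3), ncol = 3)   # 20 observations in R^3
D <- dist(X)                           # inter-individual distance matrix
conf2 <- cmdscale(D, k = 2)            # low-dimensional configuration:
                                       # 20 rows, 2 columns
# The inter-row Euclidean distances in conf2 approximately reproduce D:
approx_err <- mean(abs(as.matrix(dist(conf2)) - as.matrix(D)))
```

Any of the manifold learning extensions mentioned below can be plugged in at the same point, since they also take a distance (or dissimilarity) matrix as input.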

In the last three decades, several extensions of MDS, also known as manifold learning methods, have been proposed in statistics and machine learning, with the common goal of recovering low-dimensional data configurations in the presence of nonlinearity. We can mention local MDS (Chen and Buja 2009), Isomap (Tenenbaum et al. 2000), and t-SNE (t-distributed stochastic neighbor embedding; van der Maaten and Hinton 2008), among others. See the R package dimRed (Kraemer et al. 2018), which collects 18 different dimensionality reduction methods, and Hernández-Roig et al. (2021) for an application of manifold learning to functional data.

To illustrate shape-distance-based dimensionality reduction, we simulate bimodal functions similar to those used in the paper. For \(i=1,\dots ,n=51\), let \((Z_{1i},Z_{2i},Z_{3i})\) be independent observations of a three-dimensional normal with mean (1, 1, 0), standard deviations (0.1, 0.1, 1) and correlation \(\rho \) between components. Let \(R_{1i} = 1 + 0.3\cos (\pi /4+Z_{1i})\), \(R_{2i} = 1 + 0.3\sin (\pi /4+Z_{2i})\) and \(A_i=2\Phi (Z_{3i})-1 \sim U(-1,1)\), where \(\Phi \) denotes the distribution function of the standard normal. We use \(\rho =0.99\) to force \((R_{1i},R_{2i},A_i)\) to have an intrinsically one-dimensional nonlinear joint distribution. For \(t\in [0,1]\), define the warping functions \(\gamma _i(t)=t+A_i t(1-t)\). Finally, the functional data (see Fig. 1, left panel) are defined as

$$\begin{aligned} f_i(t)= R_{1i} \exp (-0.5 (\gamma _{i}(t)-1.5)^2) + R_{2i} \exp (-0.5 (\gamma _{i}(t)+1.5)^2). \end{aligned}$$
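A minimal R sketch of this simulation follows. Two details are our assumptions rather than statements from the text: the bump amplitudes are taken to be \(R_{1i}\) and \(R_{2i}\) (Gaussian bumps with negative exponent, so the functions are bimodal), and the warped argument is rescaled from [0, 1] to [-3, 3] so that the bump centers \(\pm 1.5\) fall inside the domain.

```r
# Sketch of the simulation of n = 51 bimodal functions. The rescaling of
# the warped argument to [-3, 3] is an assumption, added so that the two
# Gaussian bumps at -1.5 and 1.5 lie inside the evaluation interval.
library(MASS)  # for mvrnorm
set.seed(1)
n <- 51; rho <- 0.99
sdv <- c(0.1, 0.1, 1)
Corr  <- rho + (1 - rho) * diag(3)            # 1 on diagonal, rho off it
Sigma <- diag(sdv) %*% Corr %*% diag(sdv)
Z  <- mvrnorm(n, mu = c(1, 1, 0), Sigma = Sigma)
R1 <- 1 + 0.3 * cos(pi / 4 + Z[, 1])
R2 <- 1 + 0.3 * sin(pi / 4 + Z[, 2])
A  <- 2 * pnorm(Z[, 3]) - 1                   # A_i ~ U(-1, 1)
tt <- seq(0, 1, length.out = 101)
f  <- t(sapply(seq_len(n), function(i) {
  g <- tt + A[i] * tt * (1 - tt)              # warping gamma_i(t)
  s <- 6 * g - 3                              # assumed rescaling to [-3, 3]
  R1[i] * exp(-0.5 * (s - 1.5)^2) + R2[i] * exp(-0.5 * (s + 1.5)^2)
}))
# f is an n x 101 matrix: one simulated function per row
```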
Fig. 1 Left: Simulated bimodal functions. Right: Response in the scalar-on-function regression as a function of the first principal component (based on the correlation matrix) scores of \((Z_{1i},Z_{2i},Z_{3i})\)

The function time_warping in the R package fdasrvf (Tucker 2023) has been used to compute the SRVF functions \(q_i\), to align the functions \(f_i\), and to compute the shape distance. Let \(D_2\) be the \(\mathbb {L}^2\) distance matrix with \(ij\) element \(d_2(q_i,q_j)=\Vert q_i-q_j\Vert \), and let \(D_{\mathcal {S}}\) be the distance matrix with \(ij\) element the shape distance \(d_{\mathcal {S}}([q_i],[q_j])\). Table 1 summarizes the results of applying four different distance-based dimensionality reduction methods, first with \(D_2\) and then with \(D_{\mathcal {S}}\). By construction, the results of MDS with \(D_{\mathcal {S}}\) coincide with those of the shape fPCA defined in the article (not shown here; we have used the function fdasrvf::vertFPCA). It can be seen that all four methods improve when using \(D_{\mathcal {S}}\) instead of \(D_2\). The improvement is most pronounced for classical MDS, probably because shape-based alignment is able to linearize the originally nonlinear structure of the functional data set, and MDS based on Euclidean distances works best for linear data configurations. These facts are additional arguments in favor of shape-based functional data analysis, as proposed by Srivastava and co-authors. Among the nonlinear dimensionality reduction methods, Isomap performs best, followed by t-SNE. Remarkably, both are able to discover the true one-dimensional structure of the data even before the functional data are aligned, that is, when they are based simply on \(D_2\).
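For readers who want a self-contained picture of where \(D_2\) comes from, the following base-R sketch approximates the SRVFs by finite differences and assembles the \(\mathbb {L}^2\) distance matrix on a common grid. It uses the standard SRVF formula \(q_f = \mathrm{sign}(f')\sqrt{|f'|}\); the fdasrvf package computes all of this far more carefully, so this is only a bare-bones illustration on two toy functions.

```r
# Sketch: SRVFs by finite differences and the L2 distance matrix D2.
# This is a rough approximation; fdasrvf should be used in practice.
srvf <- function(f, tt) {
  df <- diff(f) / diff(tt)          # forward differences approximate f'
  sign(df) * sqrt(abs(df))          # q_f = sign(f') * sqrt(|f'|)
}
l2 <- function(q1, q2, tt) {
  sqrt(sum((q1 - q2)^2) * mean(diff(tt)))   # Riemann-sum L2 norm
}
tt <- seq(0, 1, length.out = 101)
F  <- rbind(sin(2 * pi * tt), sin(2 * pi * tt^2))  # two toy functions
Q  <- t(apply(F, 1, srvf, tt = tt))
D2 <- outer(seq_len(nrow(Q)), seq_len(nrow(Q)),
            Vectorize(function(i, j) l2(Q[i, ], Q[j, ], tt)))
```

The shape-distance matrix \(D_{\mathcal {S}}\) differs from \(D_2\) in that each pairwise distance is minimized over warpings, which is exactly what the alignment step of time_warping provides.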

Table 1 Correlations (in absolute value) between the first principal component (based on the correlation matrix) scores of \((Z_{1i},Z_{2i},Z_{3i})\), and the first dimension scores obtained by four distance-based dimensionality reduction methods using \(D_2\) and \(D_{\mathcal {S}}\)

3 Shape-distance-based regression

Based on classical MDS, Cuadras and co-authors developed the distance-based linear model (DB-LM) (see Cuadras et al. 1996 and references therein), which uses the principal coordinates (the output of metric MDS) as explanatory variables in a linear regression model. Without explicitly fitting the linear model, DB-LM obtains a closed-form expression for the fitted values directly from the distance matrix. Extensions of DB-LM with applications to functional data analysis can be found in Boj et al. (2010) (where local linear distance-based scalar-on-function regression is defined) and in Boj et al. (2016) (where both parametric and nonparametric distance-based versions of the generalized linear model are developed, and an application to logistic regression with a functional predictor is included). The R package dbstats (Boj et al. 2022) implements these methods, which are ready to be applied when the distance between functions is computed by the shape distance \(d_{\mathcal {S}}\).
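The DB-LM idea can be illustrated in a few lines of base R: extract principal coordinates from the distance matrix and regress the response on them. This sketch uses a toy Euclidean distance matrix; dbstats::dblm obtains the same fitted values directly from the distance matrix without forming the coordinates explicitly.

```r
# Sketch of the DB-LM idea: principal coordinates (metric MDS) of the
# distance matrix are used as regressors in an ordinary linear model.
# With a Euclidean toy distance the coordinates recover the original
# predictors up to rotation, so the linear fit is essentially exact.
set.seed(1)
X <- matrix(rnorm(40 * 2), ncol = 2)
y <- X[, 1] - 2 * X[, 2] + rnorm(40, sd = 0.1)
D  <- dist(X)                 # Euclidean here; shape distance in general
pc <- cmdscale(D, k = 2)      # principal coordinates
fit <- lm(y ~ pc)             # distance-based linear model, via MDS
r2  <- summary(fit)$r.squared
```

Replacing D by the shape-distance matrix \(D_{\mathcal {S}}\) (and lm by dblm or ldblm) gives the shape distance-based regressions used in the example below.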

For instance, scalar-on-function distance-based regression with the shape distance is the analogue of pre-registering the functions before fitting the regression. In the simulated bimodal functions example, we define a scalar response \(Y_i\) as \(R_i\), the range of the values of \(f_i(t)\), plus a noise term \(\varepsilon _i\sim N(0,\sigma ^2)\), where \(\sigma \) is chosen so that \({\text {cor}}(Y_i,R_i)^2\approx 0.85\) (see Fig. 1, right panel).

We fit a linear shape distance-based regression using the function dbstats::dblm, which gives an R-squared of 0.40 (far from the 0.85 achievable if \(R_i\) were known). The reason for this poor performance is the nonlinear relationship between the response \(Y_i\) and the variables \((Z_{1i}, Z_{2i}, Z_{3i})\) that generated the functions \(f_i\). To take nonlinearities into account, we fit a nonparametric local distance-based regression model (using dbstats::ldblm). Now the R-squared is 0.87. Similar results are obtained when fitting an additive model (using mgcv::gam; Wood 2017) with the first three principal components of the vertical shape fPCA as explanatory variables.

Finally, we note that the distance-based approach equivalent to separating shape and phase and including both in the regression would consist of additionally computing \(\mathbb {L}^2\) distances between the estimated phase functions \(\gamma _i\) (let \(D_{\mathcal {P}}\) be the phase distance matrix), combining \(D_{\mathcal {S}}\) and \(D_{\mathcal {P}}\) into a new shape-and-phase distance matrix \(D_{\mathcal {P}\mathcal {S}}= \sqrt{\alpha D_{\mathcal {S}}^2 + (1-\alpha ) D_{\mathcal {P}}^2}\) (element-wise, with \(\alpha \in [0,1]\) a tuning parameter), and computing the distance-based scalar-on-function regression from \(D_{\mathcal{P}\mathcal{S}}\).
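The element-wise combination of the two distance matrices is a one-liner in R; the helper below (our own illustrative function, not part of any package) makes the convex combination of squared distances explicit.

```r
# Element-wise combination of shape (Ds) and phase (Dp) distance matrices,
# following D_PS = sqrt(alpha * Ds^2 + (1 - alpha) * Dp^2).
combine_dist <- function(Ds, Dp, alpha = 0.5) {
  stopifnot(all(dim(Ds) == dim(Dp)), alpha >= 0, alpha <= 1)
  sqrt(alpha * Ds^2 + (1 - alpha) * Dp^2)
}
Ds  <- matrix(c(0, 1, 1, 0), 2, 2)   # toy shape distances
Dp  <- matrix(c(0, 3, 3, 0), 2, 2)   # toy phase distances
Dps <- combine_dist(Ds, Dp, alpha = 0.5)
# off-diagonal element: sqrt(0.5 * 1 + 0.5 * 9) = sqrt(5)
```

The resulting matrix Dps can be passed directly to dbstats::dblm or dbstats::ldblm, with \(\alpha\) chosen, for instance, by cross-validation.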