1 Introduction

The paper under discussion proposes a non-standard approach to functional data analysis based on the concept of the shape of a function. I congratulate the authors on such a stimulating piece of work. Among its most remarkable contributions, I would like to highlight the following: a formal definition of the shape of a function f as an equivalence class [f]; the definition of the square-root velocity function (SRVF) \(q_f\) of a function f; and the use of the SRVF to describe the shape [f] of f, to align sets of functions optimally, and to define the shape distance \(d_{\mathcal {S}}([f],[g])\equiv d_{\mathcal {S}}([q_f],[q_g])\) between the shapes of two functions f and g. The abundant examples in the article show the advantages of SRVF-based alignment over alternative approaches.

I would like to emphasize the relevance of using the shape metric \(d_{\mathcal {S}}\) to perform shape-based functional data analysis, and to point out several directions in which \(d_{\mathcal {S}}\) could complement the many tools covered in the paper under discussion. Specifically, I will mention distance-based dimensionality reduction (Sect. 2) and distance-based regression (Sect. 3).

2 Shape-distance-based dimensionality reduction

Multidimensional scaling (MDS; see, for instance, Borg and Groenen 2005) is a family of dimensionality reduction methods which, starting from an inter-individual distance matrix D, aim to produce a low-dimensional configuration (a matrix with as many rows as observations in the data set, and a reduced number of columns) where the inter-row Euclidean distances approximately reproduce the entries in D. See Delicado (2011) for an application of MDS to functional data, including a comparison to principal component analysis (PCA).
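The core MDS computation can be sketched with base R alone. This toy example uses an ordinary Euclidean distance matrix for a cloud of points; in the shape setting, the same call would be applied to the matrix of shape distances \(d_{\mathcal {S}}([q_i],[q_j])\).

```r
# Sketch: classical (metric) MDS from an inter-individual distance matrix,
# using base R's cmdscale. The toy distance matrix below is Euclidean; in
# the shape setting it would be replaced by the shape-distance matrix.
set.seed(1)
X <- matrix(rnorm(20 * 3), ncol = 3)   # 20 observations in R^3
D <- dist(X)                           # inter-individual distance matrix
conf2 <- cmdscale(D, k = 2)            # low-dimensional configuration:
                                       # 20 rows, 2 columns
# The inter-row Euclidean distances in conf2 approximately reproduce D:
approx_err <- mean(abs(as.matrix(dist(conf2)) - as.matrix(D)))
```

Any of the manifold learning extensions mentioned below can be plugged in at the same point, since they also take a distance (or dissimilarity) matrix as input.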

In the last three decades, several extensions of MDS, also known as manifold learning methods, have been proposed in statistics and machine learning, with the common goal of recovering low-dimensional data configurations in the presence of nonlinearity. We can mention local MDS (Chen and Buja 2009), Isomap (Tenenbaum et al. 2000), and t-SNE (t-distributed stochastic neighbor embedding; van der Maaten and Hinton 2008), among others. See the R package dimRed (Kraemer et al. 2018), which collects 18 different dimensionality reduction methods, and Hernández-Roig et al. (2021) for an application of manifold learning to functional data.

To illustrate shape-distance-based dimensionality reduction, we simulate bimodal functions similar to those used in the paper. For \(i=1,\dots ,n=51\), let \((Z_{1i},Z_{2i},Z_{3i})\) be independent observations of a three-dimensional normal with mean (1, 1, 0), standard deviations (0.1, 0.1, 1) and correlation \(\rho \) between components. Let \(R_{1i} = 1 + 0.3\cos (\pi /4+Z_{1i})\), \(R_{2i} = 1 + 0.3\sin (\pi /4+Z_{2i})\) and \(A_i=2\Phi (Z_{3i})-1 \sim U(-1,1)\), where \(\Phi \) denotes the distribution function of the standard normal. We use \(\rho =0.99\) to force \((R_{1i},R_{2i},A_i)\) to have an intrinsically one-dimensional nonlinear joint distribution. For \(t\in [0,1]\), define the warping functions \(\gamma _i(t)=t+A_i t(1-t)\). Finally, the functional data (see Fig. 1, left panel) are defined as

$$\begin{aligned} f_i(t)= R_{1i} \exp (-0.5 (\gamma _{i}(t)-1.5)^2) + R_{2i} \exp (-0.5 (\gamma _{i}(t)+1.5)^2). \end{aligned}$$
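A minimal R sketch of this simulation follows. Two details are our assumptions rather than statements from the text: the bump amplitudes are taken to be \(R_{1i}\) and \(R_{2i}\) (Gaussian bumps with negative exponent, so the functions are bimodal), and the warped argument is rescaled from [0, 1] to [-3, 3] so that the bump centers \(\pm 1.5\) fall inside the domain.

```r
# Sketch of the simulation of n = 51 bimodal functions. The rescaling of
# the warped argument to [-3, 3] is an assumption, added so that the two
# Gaussian bumps at -1.5 and 1.5 lie inside the evaluation interval.
library(MASS)  # for mvrnorm
set.seed(1)
n <- 51; rho <- 0.99
sdv <- c(0.1, 0.1, 1)
Corr  <- rho + (1 - rho) * diag(3)            # 1 on diagonal, rho off it
Sigma <- diag(sdv) %*% Corr %*% diag(sdv)
Z  <- mvrnorm(n, mu = c(1, 1, 0), Sigma = Sigma)
R1 <- 1 + 0.3 * cos(pi / 4 + Z[, 1])
R2 <- 1 + 0.3 * sin(pi / 4 + Z[, 2])
A  <- 2 * pnorm(Z[, 3]) - 1                   # A_i ~ U(-1, 1)
tt <- seq(0, 1, length.out = 101)
f  <- t(sapply(seq_len(n), function(i) {
  g <- tt + A[i] * tt * (1 - tt)              # warping gamma_i(t)
  s <- 6 * g - 3                              # assumed rescaling to [-3, 3]
  R1[i] * exp(-0.5 * (s - 1.5)^2) + R2[i] * exp(-0.5 * (s + 1.5)^2)
}))
# f is an n x 101 matrix: one simulated function per row
```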
Fig. 1 Left: Simulated bimodal functions. Right: Response in the scalar-on-function regression as a function of the first principal component (based on the correlation matrix) scores of \((Z_{1i},Z_{2i},Z_{3i})\)

The function time_warping in the R package fdasrvf (Tucker 2023) has been used to compute the SRVF functions \(q_i\), to align the functions \(f_i\), and to compute the shape distance. Let \(D_2\) be the \(\mathbb {L}^2\) distance matrix with \(ij\) element \(d_2(q_i,q_j)=\Vert q_i-q_j\Vert \), and let \(D_{\mathcal {S}}\) be the distance matrix with \(ij\) element the shape distance \(d_{\mathcal {S}}([q_i],[q_j])\). Table 1 summarizes the results of applying four different distance-based dimensionality reduction methods, first with \(D_2\) and then with \(D_{\mathcal {S}}\). By construction, the results of MDS with \(D_{\mathcal {S}}\) coincide with those of the shape fPCA defined in the article (not shown here; we have used the function fdasrvf::vertFPCA). It can be seen that all four methods improve when using \(D_{\mathcal {S}}\) instead of \(D_2\). The improvement is most pronounced for classical MDS, probably because shape-based alignment is able to linearize the originally nonlinear structure of the functional data set, and MDS based on Euclidean distances works best for linear data configurations. These facts are additional arguments in favor of shape-based functional data analysis, as proposed by Srivastava and co-authors. Among the nonlinear dimensionality reduction methods, Isomap performs best, followed by t-SNE. Remarkably, both are able to discover the true one-dimensional structure of the data even before the functional data are aligned, that is, when they are based simply on \(D_2\).
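For readers who want a self-contained picture of where \(D_2\) comes from, the following base-R sketch approximates the SRVFs by finite differences and assembles the \(\mathbb {L}^2\) distance matrix on a common grid. It uses the standard SRVF formula \(q_f = \mathrm{sign}(f')\sqrt{|f'|}\); the fdasrvf package computes all of this far more carefully, so this is only a bare-bones illustration on two toy functions.

```r
# Sketch: SRVFs by finite differences and the L2 distance matrix D2.
# This is a rough approximation; fdasrvf should be used in practice.
srvf <- function(f, tt) {
  df <- diff(f) / diff(tt)          # forward differences approximate f'
  sign(df) * sqrt(abs(df))          # q_f = sign(f') * sqrt(|f'|)
}
l2 <- function(q1, q2, tt) {
  sqrt(sum((q1 - q2)^2) * mean(diff(tt)))   # Riemann-sum L2 norm
}
tt <- seq(0, 1, length.out = 101)
F  <- rbind(sin(2 * pi * tt), sin(2 * pi * tt^2))  # two toy functions
Q  <- t(apply(F, 1, srvf, tt = tt))
D2 <- outer(seq_len(nrow(Q)), seq_len(nrow(Q)),
            Vectorize(function(i, j) l2(Q[i, ], Q[j, ], tt)))
```

The shape-distance matrix \(D_{\mathcal {S}}\) differs from \(D_2\) in that each pairwise distance is minimized over warpings, which is exactly what the alignment step of time_warping provides.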

Table 1 Correlations (in absolute value) between the first principal component (based on the correlation matrix) scores of \((Z_{1i},Z_{2i},Z_{3i})\), and the first dimension scores obtained by four distance-based dimensionality reduction methods using \(D_2\) and \(D_{\mathcal {S}}\)

3 Shape-distance-based regression

Based on classical MDS, Cuadras and co-authors developed the distance-based linear model (DB-LM) (see Cuadras et al. 1996 and references therein), which uses the principal coordinates (the output of metric MDS) as explanatory variables in a linear regression model. Without explicitly fitting the linear model, DB-LM obtains a closed-form expression for the fitted values directly from the distance matrix. Extensions of DB-LM with applications to functional data analysis can be found in Boj et al. (2010) (where local linear distance-based scalar-on-function regression is defined) and in Boj et al. (2016) (where both parametric and nonparametric distance-based versions of the generalized linear model are developed, and an application to logistic regression with a functional predictor is included). The R package dbstats (Boj et al. 2022) implements these methods, which are ready to be applied when the distance between functions is computed by the shape distance \(d_{\mathcal {S}}\).
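The DB-LM idea can be illustrated in a few lines of base R: extract principal coordinates from the distance matrix and regress the response on them. This sketch uses a toy Euclidean distance matrix; dbstats::dblm obtains the same fitted values directly from the distance matrix without forming the coordinates explicitly.

```r
# Sketch of the DB-LM idea: principal coordinates (metric MDS) of the
# distance matrix are used as regressors in an ordinary linear model.
# With a Euclidean toy distance the coordinates recover the original
# predictors up to rotation, so the linear fit is essentially exact.
set.seed(1)
X <- matrix(rnorm(40 * 2), ncol = 2)
y <- X[, 1] - 2 * X[, 2] + rnorm(40, sd = 0.1)
D  <- dist(X)                 # Euclidean here; shape distance in general
pc <- cmdscale(D, k = 2)      # principal coordinates
fit <- lm(y ~ pc)             # distance-based linear model, via MDS
r2  <- summary(fit)$r.squared
```

Replacing D by the shape-distance matrix \(D_{\mathcal {S}}\) (and lm by dblm or ldblm) gives the shape distance-based regressions used in the example below.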

For instance, scalar-on-function distance-based regression with the shape distance is the analogue of pre-registering the functions before fitting the regression. In the simulated bimodal functions example, we define a scalar response \(Y_i\) as \(R_i\), the range of the values of \(f_i(t)\), plus a noise term \(\varepsilon _i\sim N(0,\sigma ^2)\), where \(\sigma \) is chosen so that \({\text {cor}}(Y_i,R_i)^2\approx 0.85\) (see Fig. 1, right panel).

We fit a linear shape distance-based regression using the function dbstats::dblm, which gives an R-squared of 0.40 (far from the 0.85 achievable if \(R_i\) were known). The reason for this poor performance is the nonlinear relationship between the response \(Y_i\) and the variables \((Z_{1i}, Z_{2i}, Z_{3i})\) that generated the functions \(f_i\). To take nonlinearities into account, we fit a nonparametric local distance-based regression model (using dbstats::ldblm). Now the R-squared is 0.87. Similar results are obtained when fitting an additive model (using mgcv::gam; Wood 2017) with the first three principal components of the vertical shape fPCA as explanatory variables.

Finally, we note that the distance-based approach equivalent to separating shape and phase and including both in the regression would consist of additionally computing \(\mathbb {L}^2\) distances between the estimated phase functions \(\gamma _i\) (let \(D_{\mathcal {P}}\) be the phase distance matrix), combining \(D_{\mathcal {S}}\) and \(D_{\mathcal {P}}\) into a new shape-and-phase distance matrix \(D_{\mathcal {P}\mathcal {S}}= \sqrt{\alpha D_{\mathcal {S}}^2 + (1-\alpha ) D_{\mathcal {P}}^2}\) (element-wise, with \(\alpha \in [0,1]\) a tuning parameter), and computing the distance-based scalar-on-function regression from \(D_{\mathcal{P}\mathcal{S}}\).
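The element-wise combination of the two distance matrices is a one-liner in R; the helper below (our own illustrative function, not part of any package) makes the convex combination of squared distances explicit.

```r
# Element-wise combination of shape (Ds) and phase (Dp) distance matrices,
# following D_PS = sqrt(alpha * Ds^2 + (1 - alpha) * Dp^2).
combine_dist <- function(Ds, Dp, alpha = 0.5) {
  stopifnot(all(dim(Ds) == dim(Dp)), alpha >= 0, alpha <= 1)
  sqrt(alpha * Ds^2 + (1 - alpha) * Dp^2)
}
Ds  <- matrix(c(0, 1, 1, 0), 2, 2)   # toy shape distances
Dp  <- matrix(c(0, 3, 3, 0), 2, 2)   # toy phase distances
Dps <- combine_dist(Ds, Dp, alpha = 0.5)
# off-diagonal element: sqrt(0.5 * 1 + 0.5 * 9) = sqrt(5)
```

The resulting matrix Dps can be passed directly to dbstats::dblm or dbstats::ldblm, with \(\alpha\) chosen, for instance, by cross-validation.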