Robust curve registration using the t distribution
 151 Downloads
 2 Citations
Abstract
In functional data analysis, it is often of interest to discover a general common pattern, or shape, of the function. When the subjectspecific amplitude and phase variation of data are not of interest, curve registration can be used to separate the variation from the data. Shapeinvariant models (SIM), one of the registration methods, aim to estimate the unknown sharedshape function. However, the use of SIM and of general registration methods assumes that all curves have the sharedshape in common and does not consider the existence of outliers, such as a curve, whose shape is inconsistent with the remainder of the data. Therefore, we propose using the t distribution to robustify SIMs, allowing outliers of amplitude, phase, and other errors. Our SIM can identify and classify the three types of outliers mentioned above. We use simulation and an empirical data set to evaluate the performance of our robust SIM.
Keywords
Outliers Functional data analysis Registration Mixedeffect model Robust estimation1 Introduction
Data that are observed repeatedly from subjects at different times can be assumed to be generated from underlying smooth functions. Functional data analysis (Ramsay and Silverman 2005) is concerned with the underlying curves.
If we are interested in discovering a general pattern or shape exhibited by the function that underlies the functions of all the subjects, then the aim of data analysis becomes the estimation of the unknown sharedshape function from the observed data. A naive approach to achieving that objective is simply to take the crosssectional mean of smoothed observations. However, repeatedly observed data often contain variation along the horizontal and vertical axes, making the estimation of the sharedshape difficult. The variations along the horizontal and vertical axes are often referred to as the amplitude and phase variation, respectively.
Curve registration, also called curve alignment in biology, is a method for separating the amplitude and phase variation from observed data when they are not of interest in data analysis. One possible objective of registration is to compare all of the curves without the uninteresting variation. In this case, registration can be considered as a preprocessing step of data analysis, in the sense that curves are transformed to enable all transformed curves to be as similar as possible, with those transformed (registered) curves then used for the subsequent data analysis. Another possible objective of registration is to estimate the unknown sharedshape function with the uninformative variation removed. In this case, the registration itself can be considered as a statistical analysis, in the sense that we are interested in a “true” sharedshape function and in estimating it from observed data. Though these objectives can be overlapping and many registration methods can achieve both objectives, we focus in this paper on the second objective for registration.
There is substantial literature on registration. Examples include landmark registration (Kneip and Gasser 1992; Kneip and Engel 1995), continuous monotone registration (Ramsay and Li 1998), registration by a selfmodeling warping function (Gervini and Gasser 2004), and FisherRao metricbased registration (Srivastava et al. 2011). There is also other work in this area such as Sakoe and Chiba (1978), Gervini and Gasser (2005), James (2007), Telesca and Inoue (2008), Raket et al. (2014) and Earls and Hooker (2017).
Selfmodeling nonlinear regression, commonly known as the use of a shapeinvariant model (SIM), is also considered to be a curve registration method (Lawton et al. 1972; Kneip and Gasser 1988), and can be used to achieve the second objective above. There is also substantial previous work on SIMs, including a method using kernel density estimation (Kneip and Engel 1995), nonparametric maximum likelihood estimation (Rønn 2001), considering a more flexible time transformation (Brumback and Lindstrom 2004), and simultaneously registering and clustering curves (Gaffney and Smyth 2005; Liu and Yang 2009; Zhang and Telesca 2014; Szczesniak et al. 2014). In addition, many studies such as Lindstrom (1995), Ke and Wang (2001), Rønn (2001) and Gaffney (2004) define a SIM model as a nonlinear mixedeffect model, for which the sharedshape function is considered as fixed effects and other subjectspecific variation as random effects.
Though the use of SIMs and other registration methods allows the presence of subjectspecific amplitude and phase variation in data, they assume that all relevant curves have the shared shape. In other words, they do not consider the existence of “outliers” in data. Outliers can be generally defined as observation values, whose data are inconsistent with the others (Welsh and Richardson 1997). In this paper as outliers, we consider a curve, whose shape is different from the remainder of the data, or one having considerably large amplitude or phase variation. If the data contain such outliers, those outliers will negatively affect the estimation of the sharedshape function. Especially, Gaussian distributions, which are quite sensitive to outliers, are often used for SIMs.
Consequently, in this paper, we address this issue by proposing a new kind of SIM that is more robust to outliers. Specifically, we use a mixedeffect model type SIM and assume a t distribution for the subject variation such as amplitude, phase, and other errors including the shape, allowing the existence of outliers for each component. The characteristic feature of our method is the formulation of our SIM to enable it to be related easily to a linear mixedeffect model (LMM, e.g., Verbeke 1997), with the result that we can utilize the advantages of an LMM using a t distribution. Specifically, our SIM can provide useful information for identification and classification of three types of outliers, amplitude, phase, and other errors. In addition, using the hierarchical structure of a t distribution, we derive an efficient expectation maximization (EM)type algorithm to estimate parameters (Dempster et al. 1977; Meng and Rubin 1993; Liu and Rubin 1994), which can be considered as a simple iteratively reweighted least squares, and also, allows to efficiently estimate the robust tuning parameter along with other parameters.
The remainder of this paper is organized as follows. In Sect. 2, we introduce our proposed model and derive the EM algorithm for estimating parameters. In addition, we show how our model can be used to separate amplitude, phase, and other errors, which can be used to identify outlier types. In Sects. 3 and 4, we use simulation and an empirical data set, respectively, to evaluate the performance of our model.
2 Robust shapeinvariant model
In this section, we propose a new method, robust SIMs, for estimating the sharedshape function without being affected by outliers. Various means of formulating an SIM have been proposed (Lawton et al. 1972; Rønn 2001; Gaffney 2004). In this paper, we choose the SIM defined in Liu and Yang (2009) as the base model to robustify, because using that SIM enables us to relate our model to LMMs more easily. As a result of this relationship, we can derive some properties that are useful for parameter estimation and identification of outlier types.
2.1 Definition of robust shapeinvariant model
In robust SIM, the parameters \(\varvec{\eta }\) and \(\sigma _r\) (\(r = 1,2,3\)) are estimated given \((t_{im}, y_{im})\)\((i = 1,\ldots ,n\,;\,m = 1,\ldots ,M_i)\). Especially, in robust SIM, the number of basis function p must be prespecified.
Thus, the model (1) can be interpreted as follows. We assume that the observed data \(y_{im}\)\((m = 1,\ldots ,M_i)\) consist of an unknown sharedshape function \(\phi (t)\) and that there is also other subject variation in the data, viz., the amplitude variation \(\alpha _i\), phase variation \(\beta _i\), and other errors \(\varepsilon _{im}\). This \(\varepsilon _{im}\) can be considered to contain not only observation errors, but also the shape difference from the sharedshape that cannot be explained by the simple shift variation \(\alpha _i\) and \(\beta _i\).
Moreover, applying the Taylor expansion has the advantage that the resulting model (1) includes all random variables \(\alpha _i\), \(\beta _i\) and \(\varepsilon _{im}\) in the separate terms linearly related to \(y_{im}\). This makes it easier to relate our model to LMM, as explained in later subsections.
2.2 Relationship to the linear mixedeffect model
2.3 Parameter estimation
 Estep

given the kth current estimator \(\varvec{\theta }^{(k)}\), the auxiliary function of (6) which we call the Qfunction, denoted by \(Q(\varvec{\theta }\,;\,\varvec{\theta }^{(k)})\) is calculated.
 Mstep

The parameters \(\varvec{\theta }^{(k)}\) are updated by maximizing the Qfunction.
To estimate parameters using the ECM or ECME algorithm, we need to derive Qfunction. As the Qfunction, the conditional expectation of the loglikelihood function for complete data is used. “The complete data” indicates that augmented data of the observed data and “missing data” that are not actually observed but assumed to be observed. In our case, \(\varvec{y}_i\) are observed data, while \(\alpha _i\) and \(\beta _i\) are considered as missing data. However, since all data are tdistributed, the resulting loglikelihood would still be complicated. Therefore, we use the gammanormal hierarchical structure of the t distribution to derive the Qfunction (e.g., Lange et al. 1989; Liu and Rubin 1995).
 Initialization:

Given \(\varvec{y}_i\) and \(\varvec{t}_i=(t_{i1},\ldots ,t_{iM_i})\) (\(i=1,\ldots ,n\)), determine basis functions \(b_{j}\) (\(j=1,\ldots ,p\)) and p. When estimating \(\varvec{h}\), the group indicator \(g_{i1},\ldots ,g_{iV}\) should also be given. Then randomly generate initial values for parameters \(\varvec{\theta }^{(0)} = (\varvec{\eta }^{(0)},\sigma _1^{(0)2},\sigma _2^{(0)2},\sigma _3^{(0)2}, \varvec{h}^{(0)})\), and set the number of iteration to \(k=0\) and convergence criterion \(\epsilon\).
 Estep 1:
 Given the current estimate \(\varvec{\theta }^{(k)}\) at the kth iteration, the Qfunction is obtained by computing$$\begin{aligned} \hat{\tau }_i^{(k)}= & {} \mathrm{E}_{\varvec{\theta }^{(k)}}[\tau _i\,\,\varvec{y}_i]=\frac{h_i^{(k)}+M_i}{h_i^{(k)}+d_i^{(k)2}}\nonumber \\ {\mathrm{where}}\,\,\, d_i^{(k)2}= & {} (\varvec{y}_i\varvec{B}_i\varvec{\eta }^{(k)})^\mathrm{{T}} \varvec{V}_i^{(k)1}(\varvec{y}_i\varvec{B}_i\varvec{\eta }^{(k)}) \end{aligned}$$(12)and for \(r = 1,2,3\)$$\begin{aligned} \varvec{V}_i^{(k)}= & {} \sigma ^{(k)2}_1\varvec{1}_{M_i}\varvec{1}_{M_i}^\mathrm{{T}}+\sigma ^{(k)2}_2\varvec{D}_i\varvec{\eta }^{(k)}\varvec{\eta }^{(k)T} \varvec{D}_i^\mathrm{{T}}+\sigma ^{(k)2}_3\varvec{I}_{M_i} \end{aligned}$$(13)$$\begin{aligned} \mathrm{E}_{\varvec{\theta }^{(k)}}[\varvec{u}_{ri}\,\,\varvec{y}_i,\tau _i]&=\sigma ^{(k)2}_r\varvec{Z}_{ri}^\mathrm{{T}}\varvec{V}_i^{(k)1}(\varvec{y}_i\varvec{B}_i\varvec{\eta }^{(k)})\nonumber \\ \mathrm{V}_{\varvec{\theta }^{(k)}}[\varvec{u}_{ri}\,\,\varvec{y}_i,\tau _i]&=\sigma ^{(k)2}_r\varvec{I}_{M_i}\sigma ^{(k)4}_r\varvec{Z}_i^\mathrm{{T}}\varvec{V}_i^{(k)1}\varvec{Z}_i\nonumber \\ \mathrm{E}_{\varvec{\theta }^{(k)}}[\varvec{u}_{ri}^\mathrm{{T}}\varvec{u}_{ri}\,\,\varvec{y}_i,\tau _i]&={\mathrm{tr}}(\mathrm{V}_{\varvec{\theta }^{(k)}}[\varvec{u}_{ri}\,\,\varvec{y}_i,\tau _i])+\mathrm{E}_{\varvec{\theta }^{(k)}}[\varvec{u}_{ri}\,\,\varvec{y}_i,\tau _i]^\mathrm{{T}}\mathrm{E}_{\varvec{\theta }^{(k)}}[\varvec{u}_{ri}\,\,\varvec{y}_i,\tau _i]. \end{aligned}$$(14)
 CMstep 1:
 Fix \(\varvec{\theta }_2^{(k)}\), and update \(\varvec{\theta }_1^{(k+1)}\) by maximizing conditional expectation of (8),...,(10), which leads to$$\begin{aligned} \varvec{\eta }^{(k+1)}= & {} \left( \frac{\hat{\tau }_i^{(k)}}{\sigma ^{(k)2}_3}\mathrm{E}_{\varvec{\theta }^{(k)}}[\varvec{X}_i^\mathrm{{T}}\varvec{X}_i\varvec{\eta }\,\,\varvec{y}_i,\tau _i]\right) ^{1}\frac{\hat{\tau }_i^{(k)}}{\sigma ^{(k)2}_3}\mathrm{E}_{\varvec{\theta }^{(k)}}[\varvec{X}_i^\mathrm{{T}}(\varvec{y}_i\alpha _i\varvec{1}_{M_i})\,\,\varvec{y}_i,\tau _i]\nonumber \\ \sigma ^{2(k+1)}_r= & {} \frac{1}{n}\sum _{i=1}^n\mathrm{E}_{\varvec{\theta }^{(k)}}[\varvec{u}_{ri}^\mathrm{{T}}\varvec{u}_{ri}\,\,\varvec{y}_i,\tau _i],\qquad (r = 1,2,3). \end{aligned}$$(15)
 Estep 2:

Given the current estimate, define \(\varvec{\theta }^{(k+1/2)} = (\varvec{\eta }^{(k+1)},\sigma _1^{(k+1)2}.\)
\(\sigma _2^{(k+1)2},\sigma _3^{(k+1)2}, \varvec{h}^{(k)})\). Then, calculating (12) using \(\varvec{\theta }^{(k+1/2)}\) instead of \(\varvec{\theta }^{(k)}\).
 CMstep 2:
 Fix \(\varvec{\theta }_1^{(k+1)}\), and update \(\varvec{\theta }_2^{(k+1)} = \varvec{h}^{(k+1)}\) by maximizing either function below:
 (multicycle version of ECM)
 Obtain \(\varvec{h}^{(k+1)}\) by maximizing conditional expectation of (11), that is$$\begin{aligned} \bar{h}_v^{(k+1)}= & {} \arg \max _{h}\sum _{i=1}^ng_{iv}\bigg \{\frac{h}{2}\left( \log \frac{h}{2}+\mathrm{E}_{\varvec{\theta }^{(k+1/2)}}[\log \tau _i\,\,\varvec{y}_i]\hat{\tau }_i^{(k+1/2)}\right) \\&\qquad \qquad \qquad \qquad \qquad \quad \log {\Gamma }\left( \frac{h}{2}\right) \bigg \},\quad (v=1,\ldots ,V). \end{aligned}$$
 (ECME)
 Obtain \(\varvec{h}^{(k+1)}\) by maximizing (6), which leads to$$\begin{aligned} \bar{h}_v^{(k+1)}= & {} \arg \max _{h}\sum _{i=1}^ng_{iv}\bigg \{\log {\Gamma }\left( \frac{h}{2}\right) +\log {\Gamma }\left( \frac{h+M_i}{2}\right) +\frac{h}{2}\log h\\& \frac{h+M_i}{2}\log \left( h+d_{i}^{(k+1/2)2}\right) \bigg \}. \end{aligned}$$
 Convergence test

Compute \(\log L(\varvec{\theta }^{(k+1)})\), the value of the objective function (6) using updated parameters and, for \(k>1\), if \(\log L(\varvec{\theta }^{(k+1)})\log L(\varvec{\theta }^{(k)})<\epsilon\), terminate; otherwise, let \(k = k+1\) and return to Estep 1.
From these updating formulas, this algorithm can be considered as iteratively reweighted least squares (Lange et al. 1989), and \(\hat{\tau }_i\) can be interpreted as a weight for the subject i on estimation. This weight is determined by the Maharanobis distance \(d^2_i\) in (13); for a subject, whose distance is larger, that is, whose observations are more distant from the mean, the weight of the subject on estimation decreases.
2.4 Decomposition of amplitude, phase and other errors
Though the principal objective of robust SIM is to estimate the sharedshape function without being affected by outliers, we can also use our robust SIM to identify subjects having outlying observations: subjects having very large Maharanobis distances \(d_i\) calculated using (13) compared to others could be considered as outliers.
Furthermore, using the relationship with LMM explained in Sect. 2.2, we can also identify which outlier type, \(\alpha _i\), \(\beta _i\) or \(\varvec{\varepsilon }_i\) the subject has. Specifically, as we allow a randomness in the \(\alpha _i\), \(\beta _i\) and \(\varvec{\varepsilon }_i\) component, a subject identified as having outliers can include outliers in either one component or several. Then, there could be a situation in which we want to identify subjects having only \(\varvec{\varepsilon }_i\) outliers, that is, subjects whose shape is inconsistent with others, or only \(\alpha _i\) outliers, that is, those having amplitude outliers. In that case, decomposing \(d_i\), where \(\varvec{\eta }\) and \(\alpha _i\), \(\beta _i\) are replaced with those current estimates, enables us to see the distance from the mean for each component separately.
Note that this type of decomposition had originally been performed in the LMM framework (Pinheiro et al. 2001), in which the distance is decomposed into the random effect \(\varvec{\gamma }_i\) and other error \(\varvec{\varepsilon }_{i}\) components. However, in this paper, this idea is extended to the further decomposition of random effects into amplitude and phase components, enabling us to handle the shift variation along the horizontal and vertical axes separately.
3 Simulation study
In this section, we describe a simulation we conducted to evaluate the performance of robust SIM compared to the existing methods.
3.1 Data generation
3.2 Setting and evaluation
In this simulation, we considered a full factorial design with \(c = 0.2,0.5\), \(s = 5, 10\) and \(n = 30, 60\). We also considered a “nonnoisy situation”, with no outlying observations (i.e., \(c = 0\) and \(s = 1\)). For each combination of all \(2 \times 2 \times 2 + 2\) cases (\(+2\) is for nonnoisy situation in \(n = 30\) and 60), we randomly generated 50 different data sets. For each data set, we applied the method and evaluated.
To estimate a robust SIM, we need to prespecify the applied basis functions and the way of handling the degrees of freedom \(h_i\). As basic functions, we used the \(p = 5\) Bsplines shown in Fig. 1. To estimate \(h_i\), we considered all cases (i), (ii), and (iii) mentioned in Sect. 2.1. Specifically, in case (i), we fixed \(h_i = h\) for all \(i=1,\ldots ,n\) varying h as 1, 5, 10, or 20. In case (ii), h was estimated by a datadriven approach. In case (iii), where \(h_i\) is estimated in groups, we allocated a proportion \(1  c\) of the subjects generated from a distribution with a smaller variance to group 1, and the remaining c proportion of the subjects to group 2. Note that as no outlying observations are generated in \(s = 1\) and \(c = 0\) situation, robust SIM (ii) and (iii) are equivalent for this situation, so only (ii) is applied.
To evaluate the performance of our robust SIM, we examine the accuracy of estimation for the true sharedshape function and compare it with several existing registration methods. In this simulation, we additionally applied following methods: (iv) SIM with a Gaussian assumption, that is, the model (1) in which a Gaussian distribution is assumed for all \(\alpha _i\), \(\beta _i\) and \(\varepsilon _{im}\) (Gaussian SIM), (v) landmark registration (Kneip and Gasser 1992), (vi) continuous monotone registration (Lange et al. 1989), (vii) registration by a selfmodeling warping function (Gervini and Gasser 2004), and (viii) Fisher–Rao metricbased registration (Srivastava et al. 2011). Methods (v) and (vi) were applied using the R package “fda”, method (vii) was implemented in Matlab code obtained from the author’s webpage^{1}, and method (viii) was applied using the R package “fdasrvf”. To handle the subjectvarying timepoints in these software packages, we first smoothed the observed data using the true Bspline basis functions, and applied the values of the smoothed function at the \(1,\ldots ,T\) timepoints in each method.
Landmarks must be specified in advance to apply (v). Here, two curving points in Fig. 1 are specified as landmarks, obtained as follows. Each observed datum \(y_{im}\) is smoothed using the same Bspline basis functions, as shown in Fig. 1, and from the resulting smoothed functions \(f_i\) (\(i = 1,\ldots ,n\)), the first landmark is set to be the timepoint \(\ell _{i1} = \arg \min _{t\in \{1,\ldots ,T\}}\{f^{\prime }(t) = 0\,\wedge \,f^{\prime \prime }(t)>0\}\), and the second is the timepoint \(\ell _{i2} = \arg \min _{t\in \{1,\ldots ,T\}}\{f^{\prime }(t) = 0\,\wedge \,f^{\prime \prime }(t)<0\}\). Note that we assume that \(\ell _{i1}<\ell _{i2}\) for all i.
For (vi) and (vii), we must specify a “target function” to which all functions are aligned in registration. Here, we use a median function as a target. That is, we take the median of coefficients of smoothed data \(f_1,\ldots ,f_n\), and use that median as the coefficient for the Bspline basis. For the hyperparameters in (vii), we specify the number of components to \(q_1 = 2\) and the number of basis functions as \(q_2 = 6\), because Gervini and Gasser (2004) recommends \(q_2 = 3q_1\). for methods that require initial values for estimation, we used 50 initial random starts.
To evaluate the performance, we calculate the mean squared error (MSE). That is, letting \(\hat{\phi }\) be the estimated sharedshape function, MSE is defined as \([\sum _{m=1}^{T}(\hat{\phi }(t_m)\phi (t_m))^2/T]^{1/2}\), where \(t_m = 1,2,\ldots ,T\). The \(\hat{\phi }\) in each method is constructed as follows. In (i) and (ii), we use \(\hat{\phi }(t_m) = \sum _{j=1}^p\hat{\eta }_jB_j(t_m)\)\((m = 1,\ldots ,T)\), where \(\hat{\eta }_j\) (\(j = 1,\ldots ,p\)) are estimated coefficients. For other methods, we use \(\hat{\phi }(t_m) = \mathrm{median}(\hat{f}_1(t_m),\ldots ,\hat{f}_n(t_m))\) (\(m = 1,\ldots ,T\)), where \(\hat{f}_i\)\((i = 1,\ldots ,n)\) are registered functions. Here, we take the median of all registered functions rather than mean, because if we were to take the mean to define \(\hat{\phi }\), the resulting MSE of some methods would be too large to compare with the other methods, possibly because there might be some severely misregistered functions due to outliers.
3.3 Results
Figures 2 (left) and 3 show the results for \(n = 30\) in nonnoisy and noisy situation, respectively. While the MSE results of robust and Gaussian SIM are similar in nonnoisy situation, the differences between them are apparent in noisy situation, especially for \(c = 0.5\). This indicates that SIM assuming a Gaussian distribution is sensitive to outliers.
In addition, the performance of existing methods worsens as c increases, while that of robust SIM appears not to be affected much by an increase in c. Comparing the situation between \(s = 5\) and 10, robust SIM appears to perform similarly for \(s = 10\) than for \(s = 5\). This indicates that robust SIM can be robust especially for distant outliers.
Though the general tendencies of MSE results by (i), (ii), and (iii) in robust SIM are similar, it seems that robust SIM with a smaller h in (i) or (iii) have better results in accuracy. Specifically, for \(h = 10, 20\) in (i) and (ii), the increase in c does affect the accuracy, while with \(h = 1, 5\) and (iii), the accuracy seems stable for all situations.
Figures 2 (right) and 4 show the results for \(n = 60\). For all methods the variance of MSE is reduced and the median of MSE is improved, compared to \(n = 30\). However, for \(c = 0.5\), the variance of MSE is increased for \(h = 10, 20\), (ii), and existing methods (iv),\(\ldots\),(viii).
Table 1 shows the degrees of freedom estimated by the robust SIM in case (ii) and (iii). \(h_i\) estimated by (ii) and (iii) for group 2 (having outlying observations) tends to be smaller than the one by (iii) for group 1. This indicates that the algorithm efficiently provides robustness to the model. In particular, in (iii) case, \(h_i\) is adjusted during each iteration, thereby conferring robustness to a subset of the data rather than the whole data set.
Estimated degrees of freedom by robust SIM with (ii) and (iii) case
n  c  s  (ii)  (iii) group 1  (iii) group 2  

Mean  SD  Mean  SD  Mean  SD  
30  0  1  761.41  54.60  –  –  –  – 
0.2  20  13.63  2.56  127.83  19.33  3.38  0.55  
50  8.66  0.97  132.52  18.03  2.43  0.37  
0.5  20  47.10  33.39  130.02  17.85  4.24  0.72  
50  5.91  0.81  125.45  19.50  2.60  0.31  
60  0  1  1337.75  420.25  –  –  –  – 
0.2  20  13.60  1.84  164.99  24.26  3.44  0.41  
50  8.56  0.79  164.49  26.00  2.42  0.26  
0.5  20  27.32  22.54  161.48  13.13  4.06  0.39  
50  5.81  0.49  155.80  14.95  2.59  0.20 
4 Application
4.1 Data and setting
In this section, we use an empirical electrocardiogram (ECG) data set to illustrate the performance of robust SIM. The ECG data set was obtained from the UCR Time Series Classification and Clustering Archive^{2}, which was analyzed in Olszewski (2001). The ECG database contains 200 data sets (corresponding to subjects in this paper), each consisting of measurements at 96 timepoints during one heartbeat. Each measurement was recorded by an electrode placed on the body. The first 100 subjects were defined as the training data, while the latter 100 were the test data. Each of the 200 data sets were labeled as “normal” or “abnormal.”
The purpose of this data analysis was twofold; first, it estimated the sharedshape functions from both the training and test data; second, it identified the outlier types among the subjects in test data, using the estimated sharedshape function as a reference curve. Therefore, to apply the data in this analysis, we picked up only “normal” subjects from the training data, and both “normal” and “abnormal” subjects from the test data. To reduce the computational burden, we used the data of 40 subjects, and 80% from the training data, and 20% from the test data with an equal number of normal and abnormal subjects. All subjects were randomly chosen.
In addition, we estimated the degrees of freedom in each group [corresponding to case (iii) in Sect. 2.1]. We prespecified \(g_{iv}\) (\(v=1,\ldots ,V\,;\,i=1,\ldots ,n\)) with \(V = 2\), and set the subjects from the training and test data as groups 1 and 2, respectively. Note that we did not define groups by “normal” and “abnormal”, because the label of the test data was assumed to be unknown in this analysis.
Robust SIM requires prespecification of p. To determine p, we use Akaike Information Criteria (AIC) as Liu and Yang (2009) suggested, which is defined as \(AIC=2\log \hat{L}+2K\), where \(\log \hat{L}\) is the maximized log likelihood and K is the total number of free parameters, which is \(p + 3 + V\). \(p = 8, 9, 10, 11\) are used as the candidate values. Same as the simulation study, we use order 4 and the \(p  4\) equidistant interior knots. Finally, we ran the algorithm with ten different initial values.
4.2 Result
Akaike information criterion (AIC) results for different p
p  8  9  10  11 

AIC  \(\) 443.22  \(\) 634.91  \(\) 811.03  \(\) 491.63 
Furthermore, using the decomposition in Sect. 2.4, we identify outlier types for each subject. Figure 6 shows each separated distance of amplitude, phase, and other errors, \(d_{1i}^2, d_{2i}^2\) and \(d_{3i}^2\) in (16). We can see that the subject \(i = 34, 36, 40\) appear to have large \(d_{i2}^2\), while several subjects such as \(i = 37, 38, 39\) have large \(d_{i3}^2\). At the same time, Figure 7 depicted the observations of subjects having the three largest \(d_{i2}^2\) and \(d_{i3}^2\) values using solid and dashed lines. This indicates that the shapes of observations having large \(d_{i2}^2\) with solid lines in Fig. 7 (left) appear similar to the shared shape, though these have the large phase difference from the shared one. On the other hand, the shape of observations having large \(d_{i3}^2\) in Figure 7 (right) with a dashed line appears to be inconsistent with that of the other observations. Furthermore, those with solid lines in right figure have shapes that are more similar to the shared one than the dashed line; however, the difference in shape from the shared one appears to be more than a simple shift variation.
5 Discussion
In this paper, we developed robust SIM using the t distribution. Our main objective was to estimate the shared unknown shape function without being affected by outliers. In addition, we showed that robust SIM can provide useful information for identifying outlier types.
Specifically, in our robust SIM, by applying the first order Taylor expansion on SIM, we separated \(\alpha _i\), \(\beta _i\) and \(\varvec{\varepsilon }_i\), making it easier to relate our model to LMM using the t distribution to utilize the properties of that model: we can decompose the distance into separate components, which can be used to identify outlier types. In addition, utilizing the hierarchical structure of t distribution, we could obtain efficient multicycle ECM and ECME algorithms which can be considered as a iteratively reweighted least squares and allow to estimate the degrees of freedom.
In the simulation, we showed that the estimation of our robust SIM is more stable than the other existing methods, especially when \(c = 0.5\). In an empirical data application, we illustrated the performance of our robust SIM with case (iii), showing how the decomposition of distance can be utilized to identify outlier types.
Finally, we mention three possible directions for future work. First, our simulation study suggests that estimating \(h_i\) by group improves the accuracy of the result. However, prespecifying the groups is a complicated task in many practical situations. One can heuristically estimate \(g_{iv}\) from the data by a heuristic algorithm such as the kmeans algorithm (e.g., MacQueen 1967). Specifically, when \(h_i\) is fixed, CMstep obtains \(v^*\) which contributes the increase of objective function the most from \(v=1,\ldots ,V\), and sets \(g_{iv^*}^{(k+1)}=1\) and all other elements to zero, and repeat the iteration. However, this approach increases the computational burden and slows the convergence. Therefore, an efficient and practical means of calculating \(h_i\) is highly desired.
Second, we might wish to vary \(h_i\) for the various components \(\alpha _i\), \(\beta _i\) and \(\varvec{\varepsilon }_i\). For example, in the ECG data, a larger \(h_i\) for \(\alpha _i\) than \(\beta _i\) and \(\varvec{\varepsilon }_i\) might be beneficial.
Finally, robust SIM explicitly parameterizes only the shift difference along the horizontal and vertical axes, not the scale difference. However, there could be a situation, where this is not enough. For example, we could say that the dashed line in Fig. 7 (left) might exhibit the scale difference from the shared shape, and therefore, it could be beneficial to extend robust SIM to distinguish such outliers in scale difference.
Note that these possible extensions of robust SIM are not straightforward, because the computations of the resulting model would become considerably more complicated. Therefore, we must consider efficient algorithms that guarantee a high level of estimation accuracy.
Footnotes
Notes
Acknowledgements
We are grateful to the Reviewers for their comments which have helped us to improve the manuscript.
References
 Brumback LC, Lindstrom MJ (2004) Self modeling with flexible, random time transformations. Biometrics 60(2):461–470MathSciNetCrossRefzbMATHGoogle Scholar
 De Boor C (1978) A practical guide to splines. Springer, New YorkCrossRefzbMATHGoogle Scholar
 Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Statistical Methodology) 39(1):1–38MathSciNetzbMATHGoogle Scholar
 Earls C, Hooker G (2017) Variational bayes for functional data registration, smoothing, and prediction. Bayesian Anal 12(2):557–582MathSciNetCrossRefzbMATHGoogle Scholar
 Gaffney S (2004) Probabilistic curvealigned clustering and prediction with regression mixture models. Ph.D. thesis, University of California, IrvineGoogle Scholar
 Gaffney SJ, Smyth P (2005) Joint probabilistic curve clustering and alignment. In: Advances in neural information processing systems, pp 473–480Google Scholar
 Gervini D, Gasser T (2004) Selfmodelling warping functions. J R Stat Soc Ser B (Statistical Methodology) 66(4):959–971MathSciNetCrossRefzbMATHGoogle Scholar
 Gervini D, Gasser T (2005) Nonparametric maximum likelihood estimation of the structural mean of a sample of curves. Biometrika 92(4):801–820MathSciNetCrossRefzbMATHGoogle Scholar
 James GM (2007) Curve alignment by moments. Ann Appl Stat 1(2):480–501MathSciNetCrossRefzbMATHGoogle Scholar
 Ke C, Wang Y (2001) Semiparametric nonlinear mixedeffects models and their applications. J Am Stat Assoc 96(456):1272–1298MathSciNetCrossRefzbMATHGoogle Scholar
 Kneip A, Engel J (1995) Model estimation in nonlinear regression under shape invariance. Ann Stat 23(2):551–570MathSciNetCrossRefzbMATHGoogle Scholar
 Kneip A, Gasser T (1988) Convergence and consistency results for selfmodeling nonlinear regression. Ann Stat 16(1):82–112MathSciNetCrossRefzbMATHGoogle Scholar
 Kneip A, Gasser T (1992) Statistical tools to analyze data representing a sample of curves. Ann Stat 20(3):1266–1305MathSciNetCrossRefzbMATHGoogle Scholar
 Lange KL, Little RJ, Taylor JM (1989) Robust statistical modeling using the \(t\) distribution. J Am Stat Assoc 84(408):881–896MathSciNetGoogle Scholar
 Lawton W, Sylvestre E, Maggio M (1972) Self modeling nonlinear regression. Technometrics 14(3):513–532CrossRefzbMATHGoogle Scholar
 Lindstrom MJ (1995) Selfmodelling with random shift and scale parameters and a freeknot spline shape function. Stat Med 14(18):2009–2021CrossRefGoogle Scholar
 Liu C, Rubin DB (1994) The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81(4):633–648MathSciNetCrossRefzbMATHGoogle Scholar
 Liu C, Rubin DB (1995) ML estimation of the \(t\) distribution using EM and its extensions, ECM and ECME. Stat Sin 5(1):19–39MathSciNetzbMATHGoogle Scholar
 Liu X, Yang MC (2009) Simultaneous curve registration and clustering for functional data. Comput Stat Data Anal 53(4):1361–1376MathSciNetCrossRefzbMATHGoogle Scholar
 MacQueen (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, University of California Press, Berkeley, CA, vol 1, pp 281–297Google Scholar
 Matos LA, Prates MO, Chen MH, Lachos VH (2013) Likelihoodbased inference for mixedeffects models with censored response using the multivariate\(t\) distribution. Stat Sin 23:1323–1345MathSciNetzbMATHGoogle Scholar
 Meng XL, Rubin DB (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80(2):267–278MathSciNetCrossRefzbMATHGoogle Scholar
 Olszewski RT (2001) Generalized feature extraction for structural pattern recognition in timeseries data. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PAGoogle Scholar
 Pinheiro JC, Liu C, Wu YN (2001) Efficient algorithms for robust estimation in linear mixedeffects models using the multivariate \(t\) distribution. J Comput Graph Stat 10(2):249–276MathSciNetCrossRefGoogle Scholar
 Raket LL, Sommer S, Markussen B (2014) A nonlinear mixedeffects model for simultaneous smoothing and registration of functional data. Pattern Recognit Lett 38:1–7CrossRefGoogle Scholar
 Ramsay JO, Li X (1998) Curve registration. J R Stat Soc Ser B (Statistical Methodology) 60(2):351–363MathSciNetCrossRefzbMATHGoogle Scholar
 Ramsay JO, Silverman BW (2005) Functional data analysis. Springer, New YorkzbMATHGoogle Scholar
 Rønn BB (2001) Nonparametric maximum likelihood estimation for shifted curves. J R Stat Soc Ser B (Statistical Methodology) 63(2):243–259MathSciNetCrossRefzbMATHGoogle Scholar
 Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust Speech Signal Process 26(1):43–49CrossRefzbMATHGoogle Scholar
 Song PXK, Zhang P, Qu A (2007) Maximum likelihood inference in robust linear mixedeffects models using multivariate \(t\) distributions. Stat Sin 17:929–943MathSciNetzbMATHGoogle Scholar
 Srivastava A, Wu W, Kurtek S, Klassen E, Marron JS (2011) Registration of functional data using FisherRao metric. arXiv preprint arXiv:11033817
 Szczesniak RD, Viele K, Cooper RL (2014) Mixtures of selfmodelling regressions. J Biom Biostat 5(4):1–8Google Scholar
 Telesca D, Inoue LYT (2008) Bayesian hierarchical curve registration. J Am Stat Assoc 103(481):328–339MathSciNetCrossRefzbMATHGoogle Scholar
 Verbeke G (1997) Linear mixed models for longitudinal data. Springer, New YorkCrossRefzbMATHGoogle Scholar
 Welsh A, Richardson A (1997) Approaches to the robust estimation of mixed models. In: Maddala GS, Rao CR (eds) Handbook of statistics. Elsevier, Amsterdam, pp 343–384Google Scholar
 Zhang Y, Telesca D (2014) Joint clustering and registration of functional data. arXiv preprint arXiv:14037134
Copyright information
OpenAccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.