Longitudinal Associations Between Timing of Physical Activity Accumulation and Health: Application of Functional Data Methods

Accelerometers are widely used for tracking human movement and provide minute-level (or even 30 Hz) physical activity (PA) records for detailed analysis. Instead of using day-level summary statistics to assess these densely sampled inputs, we implement functional principal component analysis (FPCA) approaches to study the temporal patterns of PA data from 245 overweight/obese women at three visits over a 1-year period. We apply longitudinal FPCA to decompose the PA inputs, incorporating subject-specific variability, and then test the association between these patterns and obesity-related health outcomes using multiple mixed-effects regression models. With the proposed methods, the longitudinal patterns in both the densely sampled inputs and the scalar outcomes are investigated and connected. The results show that the health outcomes are strongly associated with PA variation at both the subject and visit levels. In addition, we show that the timing of PA during the day can impact changes in outcomes, a finding that would not be possible with day-level PA summaries. Thus, our findings imply that longitudinal FPCA can elucidate temporal patterns of PA inputs at multiple levels. Furthermore, exploring the relationship between PA patterns and health outcomes can be useful for establishing weight-loss guidelines.

Proof of Equation 7. For given eigenfunctions and eigenvalues, the BLUP for the principal component scores $\beta = (\xi_{11}, \ldots, \xi_{1N_U}, \ldots, \xi_{N1}, \ldots, \xi_{NN_U}, \zeta_{111}, \ldots, \zeta_{1J_11}, \ldots, \zeta_{N1N_V}, \ldots, \zeta_{Nn_NN_V})$ has the usual form [...] not invertible, and only the generalized inverse of $Z\Lambda Z'$ can be used [2]. For our implementation, $N_U$ and $N_V$ are in general small numbers, and the number of time grid points is always considerably larger. In this case, [...]. Thus the expression in Equation 7 is the BLUP for $\beta$.
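The role of the dimension argument in the proof can be illustrated numerically. The following is a minimal sketch, not the paper's Equation 7: it assumes that `Z` stacks the eigenfunctions evaluated on the time grid, that `lam` holds the eigenvalues on the diagonal of $\Lambda$, and that there is a noise variance `sigma2 > 0` (all names and values here are hypothetical). Under these assumptions the standard mixed-model BLUP $\Lambda Z'(Z\Lambda Z' + \sigma^2 I)^{-1}x$ equals $(Z'Z + \sigma^2\Lambda^{-1})^{-1}Z'x$, which only requires inverting a matrix whose size is the (small) number of scores rather than the (large) number of grid points.

```python
import numpy as np

def blup_scores(Z, lam, sigma2, x):
    """Standard BLUP: Lambda Z' (Z Lambda Z' + sigma2 I)^{-1} x (grid-sized solve)."""
    Lam = np.diag(lam)
    V = Z @ Lam @ Z.T + sigma2 * np.eye(Z.shape[0])
    return Lam @ Z.T @ np.linalg.solve(V, x)

def blup_scores_small(Z, lam, sigma2, x):
    """Algebraically equivalent form that solves only a K x K system."""
    A = Z.T @ Z + sigma2 * np.diag(1.0 / lam)
    return np.linalg.solve(A, Z.T @ x)

# Hypothetical toy setting: 600 grid points, 8 scores in total.
rng = np.random.default_rng(0)
n_grid, K = 600, 8
Z = rng.normal(size=(n_grid, K))      # stand-in for eigenfunctions on the grid
lam = np.linspace(2.0, 0.5, K)        # assumed eigenvalues
sigma2 = 0.25                         # assumed noise variance
beta_true = rng.normal(size=K) * np.sqrt(lam)
x = Z @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n_grid)

b_large = blup_scores(Z, lam, sigma2, x)
b_small = blup_scores_small(Z, lam, sigma2, x)
print("max difference between the two forms:", np.max(np.abs(b_large - b_small)))
```

The second form is the practical one when the number of scores is much smaller than the grid length, which is the situation described in the proof.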

S.2 Simulation Results
We simulate data based on ideas implemented in [1]. For each simulation setting, we generate 100 replicates with $n = 100$ subjects in each dataset. We assume a balanced design with $n_i = 3$ visits for each subject, and the time variable $T_{ij}$ is generated by standardizing the visits ($j = 1, 2, 3$) to have unit variance. $M = 300$ is the total number of observations in each simulation replicate. The functional curves $X_{ij}(t)$, each of length 600, are generated as follows, [...]
where the number of eigenfunctions is $N_U = N_V = 4$ and the scores $\xi_{il}$ and $\zeta_{ijm}$ are mutually independent. The eigenfunction bases are set as [...], where $\phi^{U0}_l$ and $\phi^{U1}_l$ are orthogonal but are correlated with $\phi^V_m$ if $m \neq 1$. The true eigenvalues have two scenarios, [...].
For each of the 100 simulated datasets, we implemented the longitudinal FPCA as described in Section 3.2 to estimate the eigenfunctions, eigenvalues, scores and predicted functional trajectories. As a first assessment of estimation accuracy, we computed the normalized errors between the estimated and true values of the subject-level and visit-level scores. The results are displayed in Figure S.1 and show that the scores are estimated without bias. These results agree with the simulation results in Greven et al. (2010) [1], which provides a more complete list of simulation examples. As a second assessment of accuracy, we calculated the residual MSE for the three ways of computing the residuals $R_{ij}(t)$ described in Section 3.2. The residual MSE is defined as the total mean squared count difference per observation between the predicted and observed activity curves, i.e. $\frac{1}{M}\sum_{i,j}\bigl(\sum_t R_{ij}(t)\bigr)^2$. The results are displayed in Figure S.2, and the findings are discussed in Section 3.2.
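The simulation design above can be sketched in code. This is an illustration under stated assumptions, not the authors' simulation script: it assumes the generating model takes the form used in [1], $X_{ij}(t) = \sum_l \xi_{il}\{\phi^{U0}_l(t) + T_{ij}\phi^{U1}_l(t)\} + \sum_m \zeta_{ijm}\phi^V_m(t) + \varepsilon_{ij}(t)$, and it uses hypothetical Fourier eigenfunctions, hypothetical eigenvalues and a hypothetical noise level, since the actual choices are not recoverable here. The helper `residual_mse` is also a hypothetical name; it implements the residual MSE as defined in the text.

```python
import numpy as np

rng = np.random.default_rng(2024)

n, n_visits, n_grid = 100, 3, 600   # subjects, visits per subject, grid length
N_U = N_V = 4                       # number of eigenfunctions at each level
M = n * n_visits                    # total number of curves per replicate
t = np.linspace(0.0, 1.0, n_grid)

# Visit index j = 1, 2, 3 standardised to unit variance.
# ASSUMPTION: sample variance (ddof=1) is used, giving T = (-1, 0, 1).
j = np.arange(1, n_visits + 1, dtype=float)
T = (j - j.mean()) / j.std(ddof=1)

# HYPOTHETICAL eigenfunctions, eigenvalues and noise level (not the paper's).
k = np.arange(1, N_U + 1)[:, None]
phi_U0 = np.sin(2 * np.pi * k * t)                     # (N_U, n_grid)
phi_U1 = np.cos(2 * np.pi * k * t)                     # (N_U, n_grid)
phi_V = np.sqrt(2) * np.sin(2 * np.pi * (k + 4) * t)   # (N_V, n_grid)
lam_U = 0.5 ** np.arange(N_U)
lam_V = 0.5 ** np.arange(N_V)
sigma = 0.1

# Mutually independent scores with variances given by the eigenvalues.
xi = rng.normal(size=(n, N_U)) * np.sqrt(lam_U)
zeta = rng.normal(size=(n, n_visits, N_V)) * np.sqrt(lam_V)
eps = rng.normal(scale=sigma, size=(n, n_visits, n_grid))

# Subject-level intercept and slope in T, plus visit-level deviation and noise.
X = ((xi @ phi_U0)[:, None, :]
     + T[None, :, None] * (xi @ phi_U1)[:, None, :]
     + zeta @ phi_V
     + eps)                                            # (n, n_visits, n_grid)

def residual_mse(X_obs, X_hat):
    """(1/M) * sum over curves of (sum over t of R_ij(t))^2."""
    R = X_obs - X_hat
    return np.mean(R.sum(axis=-1) ** 2)

print(X.shape, M, T)
```

Passing the fitted curves from each of the three residual definitions to `residual_mse` would give the quantities summarised in the boxplots.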

Figure S.1. Boxplots of the normalized biases of the estimated principal component scores for the subject-level process, $(\hat{\xi}_{il} - \xi_{il})/\sqrt{\lambda^U_l}$ (left), and the visit-level process, $(\hat{\zeta}_{ijm} - \zeta_{ijm})/\sqrt{\lambda^V_m}$ (right), based on simulation scenario 1 (top) and scenario 2 (bottom) with 100 replicates. The red line marks zero.

Figure S.2. Boxplots of residual MSE from stepwise decomposition based on simulation scenario 1 (a) and scenario 2 (b) with 100 replicates. From left to right, the mean squared errors are acquired from three ways of computing residuals: [...]

Figure S.3. Stepwise decomposition of two examples of PA records with raw count inputs (black) and estimated curves at each visit (red, blue): (a) is an example with a large first level-1 principal component score but a small first level-2 principal component score; (b) is an example with a small first level-1 principal component score but a large first level-2 principal component score.

Figure S.4. Boxplot of individual daily-average activity magnitude (sum of activity magnitudes over 600 minutes divided by 600) overall and at baseline, 6 months and 12 months. It illustrates an increase in PA magnitudes after the baseline visit.

Table S.4. Percentages of average variance explained by different levels of principal components. The cumulative variation is the sum of row entries for the current row. The last row presents the cumulative variance for each column.
[...] 1, S.2 and S.3. Compared with the results in the original manuscript (Table 2 and S.4), averaging over daily inputs does not meaningfully alter the results of either the functional PCA or the regression procedures. Specifically, although the variation explained by level-1 PCs is lower when only a random day is considered, the overall variance explained by level-1 and level-2 PCs together is similar in general. Also, in the regression models, while point estimates vary, the findings are consistent across approaches, namely that higher person-level PC scores 1 and 2 are associated with lower log(insulin) and BMI. In fact, averaging the inputs could reduce the influence of random movements or activity on a single day, thus reducing noise in the data.

Percentages of average variance explained by different levels of principal components and regression results of log(insulin) and BMI on the first two level-1 and level-2 principal component scores using three-day average data.

Percentages of average variance explained by different levels of principal components and regression results of log(insulin) and BMI on the first two level-1 and level-2 principal component scores using up to seven-day average data.

Figure S.3 provides two examples from our dataset of the step-wise reconstruction of the activity curves after the eigen-decomposition. The first example has a large first level-1 principal component score but a small first level-2 principal component score, and vice versa for the second example. It further illustrates that principal component scores can inform the variation explained at the subject and visit levels. Specifically, minimal between-visit difference is seen in the first example, i.e., adding the visit-level component to the subject-level curves does not improve the fit materially, indicating that the majority of the variation is explained at the subject level. In the second example, which has a larger level-2 principal component score, the figure shows more pronounced variation between the two visits, and thus the visit-level component is needed to recapitulate the trends of the original data.
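The step-wise reconstruction can be sketched as follows. This is a toy illustration, not the analysis code: the eigenfunctions, scores, noise level and the helper names `stepwise_reconstruction` and `visit_level_gain` are all hypothetical, and the relative reduction in squared error is used only as a convenient summary of how much the visit-level component improves the fit. Example "a" mimics a large level-1 score with small level-2 scores, and example "b" the reverse.

```python
import numpy as np

def stepwise_reconstruction(mu, phi_U0, phi_U1, phi_V, xi, zeta, T):
    """Return (subject-level curves, subject + visit-level curves) for one subject."""
    # Step 1: mean plus subject-level intercept and slope in the visit time T.
    subject = mu + xi @ phi_U0 + np.outer(T, xi @ phi_U1)
    # Step 2: add the visit-specific deviations.
    full = subject + zeta @ phi_V
    return subject, full

def visit_level_gain(observed, subject, full):
    """Relative reduction in squared error from adding the visit-level part."""
    sse_subject = np.sum((observed - subject) ** 2)
    sse_full = np.sum((observed - full) ** 2)
    return 1.0 - sse_full / sse_subject

# Hypothetical toy setting: one eigenfunction per level, two visits.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 600)
T = np.array([-1.0, 1.0])
mu = np.zeros_like(t)
phi_U0 = (np.sqrt(2) * np.sin(2 * np.pi * t))[None, :]
phi_U1 = (np.sqrt(2) * np.cos(2 * np.pi * t))[None, :]
phi_V = np.ones((1, t.size))

examples = {
    "a": (np.array([5.0]), np.array([[0.1], [-0.1]])),   # large level 1, small level 2
    "b": (np.array([0.1]), np.array([[3.0], [-3.0]])),   # small level 1, large level 2
}

gains = {}
for name, (xi, zeta) in examples.items():
    subject, full = stepwise_reconstruction(mu, phi_U0, phi_U1, phi_V, xi, zeta, T)
    observed = full + rng.normal(scale=0.5, size=full.shape)  # simulated raw record
    gains[name] = visit_level_gain(observed, subject, full)

print(gains)
```

In this toy setting the gain is close to zero for example "a" and close to one for example "b", mirroring the qualitative contrast described for the two individuals.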
Table S.1. Percentages of average variance explained by different levels of principal components and regression results of log(insulin) and BMI on the first two level-1 and level-2 principal component scores using data from a single random day.
Linear mixed-effects regression results of health outcomes on scaled total activity counts and MVPA, respectively. Both total activity counts and MVPA exhibit a negative association with health outcomes, which supports the PCR results.