Quantile regression for longitudinal functional data with application to feed intake of lactating sows

,


Introduction
This paper considers quantile regression for longitudinal data in the presence of functional covariates.It is motivated by data on the daily feed intake of lactating sows, where the aim is to study how temperature in the stable, or cell, during the day affects the feed intake, in particular for sows that eat scarcely.This is of interest because poor nutrition in the lactation period may lead to health downsides, both for the sows and the piglets, and production inefficiency.Daily temperature is measured every fifth minute and is therefore naturally treated as a functional covariate, and the study is longitudinal with both the feed intake and daily temperature profiles recorded for up to 21 days for each sow.
Quantile regression, first introduced by Koenker and Bassett Jr (1978), is a wellestablished framework from statistics and econometrics.It is suitable when the analysis aims at describing and quantifying the association between covariates and quantiles of the distribution of the response variable.In particular, it allows to robustly target not only the central parts of the response distribution, but also the more extreme regions.For overviews, see the seminal monograph by Koenker (2005) and for more recent developments, see Koenker et al. (2017).
Analyses of longitudinal data, including quantile regression, must account for the dependence between observations from the same subject in order to provide valid inference.A common approach is to include subject-specific effects in the model for the quantiles and use penalization, see for example Koenker (2004), Lamarche (2010), Harding and Lamarche (2017), Gu and Volgushev (2019), and Fasiolo et al. (2021a).We adopt the same approach for this paper.Alternatives include Kato et al. (2012) and Galvao and Kato (2016), who treated subject-specific parameters as fixed effects without penalization, and Canay (2011), who used a two-step procedure where subject-specific parameters are first estimated as fixed effects and then plugged in as offsets in a standard quantile regression (see also Besstremyannaya and Golovan (2019)).
Quantile regression with functional covariates, similar to scalar-on-function mean regression, describes the association between a quantile of the response and a functional covariate using an inner product between the functional covariate and an unknown smooth coefficient function.As it is common in nonparametric regression, we approximate the coefficient function using a finite basis representation, and thus the infinite-dimensional estimation problem is converted to a finite-dimensional one.Pre-specified spline functions and eigenfunctions obtained from the spectral decomposition of the functional covariates' covariance operator are the most popular choices for selecting the basis functions, and they have both been used for quantile regression.For example, Cardot et al. (2005) and Park et al. (2019) used splines, whereas Kato (2012), Chen and Müller (2012) and Li et al. (2022) used eigenfunctions.A related research area is additive quantile regression where the effect of a scalar covariate is modeled via a smooth function (Fenske et al., 2013;Greven and Scheipl, 2017;Geraci, 2019;Fasiolo et al., 2021a).
In this paper we consider functional quantile regression for scalar response and functional covariates, which are both observed repeatedly for many clusters or subjects.To the best of our knowledge no papers in the literature are devoted to this situation.We consider a set-up with longitudinal data and allow for the effect of the functional covariate on the quantile to evolve over observation time.We use penalized splines to handle the functional covariates and penalized cluster-or subject-specific intercepts to account for the dependence within clusters or subjects.The resulting model can be represented in a framework easily implementable using existing software from Fasiolo et al. (2021a).Moreover, we point out bias and variance issues of the estimators and propose adjustments obtained with bootstrap, using resampling techniques from Battagliola et al. (2022) for bias adjustment and from Galvao and Montes-Rojas (2015) for computation of standard errors.Altogether, our analysis gives new insight to the eating behavior of lactating sows, our ultimate goal.In particular, the analysis indicates that the association between temperature in the stable and the feed intake gets increasingly stronger after delivery.
The paper is structured as follows.The model framework and estimation methodology are described in Sections 2 and 3. We analyse the lactation data in Section 4 and summarise and discuss findings in Section 5.The appendix provides details about the bootstrap procedures and the practical implementation.Finally, additional results from the application, results from simulation studies, and example code can be found in the supplementary materials.

Framework
We ), where i = 1, . . ., N denotes clusters, and j = 1, . . ., n i denotes repeated measurements within cluster i. Observations from different clusters are assumed to be independent, but there may be within-cluster correlation.Covariates X ij (•) are square-integrable functions on a closed interval S ⊂ R, i.e., X ij (•) ∈ L 2 (S).In practice they are often observed on a dense grid {s 1 , . . ., s H } ⊂ S and possibly with measurement errors.
We are concerned with quantile regression.Let τ ∈ (0, 1) be a fixed quantile level, and assume that the τ -quantile for the conditional distribution of the j-th observation Y ij from cluster i given covariates X ij (•) and t ij takes the form where u i (dependence of τ suppressed in notation) specifies a cluster-specific level.The target parameters of the analysis are the intercept functions α τ (•) and the regression coeffient function β τ (•, •) which are both assumed to be common for all clusters.In particular, the functional covariate affects the τ -quantile in the same way for all clusters.Without further restrictions, α τ (•) is identifiable up to an additive constant, and β τ (t, •) is identifiable up to an additive component in the orthogonal complement of the vector space spanned by the functional covariates X ij (•).
The notation in (2.1) reflects that we think of data as emerging from a two-step process: u i is a sample of i.i.d.random variables with mean zero, and Y ij 's are then generated independently from a model with τ -quantile (2.1).The restriction E[u i ] = 0 ensures full (asymptotic) identifiability of α τ (•).We emphasize that (2.1) does not specify the full conditional distribution of Y ij given {X ij (•), t ij } in cluster i, only its τ -quantile, and we suggest to use it for one or a few quantile levels of particular interest.
With the two-step data generating process, we may also consider the τ -quantile of the conditional distribution of Y ij given {X ij (•), t ij } marginally over all clusters.The association between this implied marginal τ -quantile and X ij may not take a form similar to that of (2.1).Ignoring the cluster-specific parameters, i.e., fitting the quantile regression model (2.1) with all u i = 0, would therefore not target α τ (•) and β τ (t, •).This is an important difference compared to the associated mean regression mixed-effects model where the conditional and marginal means would be described by the same coefficient function, such that an analysis based on the marginal model would lead to reliable estimates (but possibly wrong inference).See the supplementary materials and Battagliola et al. (2022) for further considerations on marginal versus conditional models and analyses in quantile mixed-effects models.

Estimation methodology
Two main challenges arise for the estimation of the model (2.1) compared to classical quantile regression for independent data with scalar covariates: how to represent the longitudinal and longitudinal functional coefficients α τ (•) and β τ (•, •), and how to handle the cluster-specific intercepts u i .

Representation of the functional coefficient and smooth intercept
Firstly, we assume that t → α τ (t) and (s, t) → β τ (s, t) depend smoothly on time.This allows us to use tools from additive models (Wood, 2017) and hence approximate the functions along the t-coordinate with some basis functions {ψ l (•)} L l=1 .For simplicity, we choose to use the same basis functions for both coefficient functions.Secondly, in the functional regression literature it is also common to have a finite-dimensional representation of functional coefficients in the s-coordinate.Let {ϕ d (•)} D d=1 be basis functions for that purpose, and represent β τ (•, •) as a tensor product smooth with different bases for the two coordinates.To be specific, we write where a τ l 's and δ τ dl 's are unknown coefficients.In the application we use cubic spline bases; in the following we describe the methodology for general penalized splines.
With the representations in (3.1), the integral in (2.1) is approximated with and we can work with finite-dimensional version of model (2.1), namely Here, ξ d,ij = S ϕ d (s)X ij (s)ds, and the coefficients {a τ l } l and {δ τ dl } dl and the subjectspecific intercepts {u i } i are the unknown parameters.In practice, the integrals ξ d,ij are approximated with Riemann sums.

Estimation of coefficients and random intercepts
In standard quantile regression, parameters are estimated by minimizing an empirical loss, , where Q τ ij is short for the level τ quantile for observation j of cluster i and depends on the model parameters, and l τ is an appropriate loss function, typically the check loss function v → v(τ − 1 (v<0) ) (Koenker and Bassett Jr, 1978).The method proposed by Fasiolo et al. (2021a), named "QGAM", offers a flexible framework to model longitudinal quantile regression with scalar covariates and is implemented in an accompanying R package qgam.QGAM uses a smooth approximation of the check loss function in order to make it differentiable so common computational optimizers like the Newton method can be used for the minimization problem.
We adapt QGAM to functional covariates.We penalize the subject-specific intercepts {u i } i and coefficients {a τ l } L l=1 and {δ τ dl } D,L d=1,l=1 , as it is common in the additive mixedeffects models literature.In particular, for the former an 2 -penalty is added to the loss function, while for the latter penalty terms accounting for wiggliness in the s-and t-directions are added (Wood, 2017, Chapter 5).The tuning parameters defining the degree of penalization are selected as part of the procesdure as implemented in the qgam package, see Fasiolo et al. (2021a) for details.In the following, we refer to the extension of QGAM to functional covariates as "fQGAM".

Covariates observed with noise
In the previous sections the covariate functions were assumed to be observed densely and without measurement noise.Now, consider the more realistic situation with measurement noise and possibly sparse sampling, and denote by W ij,h the observations corresponding to points s h , i.e., where { ij,h } ijh are iid.random variables with mean zero, and mutually independent of the underlying functions.We propose to carry out a preliminary smoothing step and proceed with the analysis from Section 3.2 with the unobserved values X ij (s) replaced by their fitted/predicted values Xij (s).
There are many smoothing techniques available for functional data, e.g., kernel-based methods, smoothing splines, and smoothing with data-driven bases, see for example Ramsay and Silverman (2005).We choose to represent { Xij (•)} ij with eigenfunctions arising from Functional Principal Component Analysis (FPCA), by assuming independence over both i and j.The number of eigenfunctions should be large enough to capture the primary modes of variation of {W ij,h } ijh , but small enough to get smooth reconstructed functions.The choice is usually based on a preset Percentage of Variance Explained (PVE).Several implementations of FPCA are available depending on the sampling pattern of the functional data (dense or sparse, same or different sampling locations, missing values), see e.g.Yao et al. (2003), Xiao et al. (2018), and references therein.We use the fast covariance estimation (FACE) method from Xiao et al. (2016) in this work, ignoring potential dependence among functions.This is not inappropriate functions (see e.g.Goldsmith et al. (2012)).Approaches that account for dependence within subject or cluster have been discussed by Greven et al. (2010); Chen and Müller (2012); Park and Staicu (2015); Koner and Staicu (2023), to name a few.

Bootstrap procedures for variance assessment and bias adjustment
In the sow data application we are mainly interested in estimation and inference for quantiles and the differences between quantiles at specified directions of the functional covariate.It is known from the literature on quantile regression for longitudinal data with scalar covariates, that estimators may be biased and that it is difficult to properly assess the sampling variability of the estimators without resampling methods (Kato et al., 2012;Galvao and Montes-Rojas, 2015;Battagliola et al., 2022).We have seen in simulation studies (available in the supplementary materials) that the problems persist when covariates are functional, and we propose to use bootstrap strategies for variance estimation and bias adjustment.
Recall the quantile model (2.1) with repeated measurements of functional covariates and responses for each subject.Our target parameters are described as follows.Consider a fixed time point t, a function X(•) ∈ L 2 (S) and response Y .The corresponding linear predictor at level τ is and is interpreted as the τ -quantile for a typical subject (with u = 0).The function X(•) may or may not be one of the functions in the dataset.Furthermore, consider two functional covariates . For a fixed cluster, i.e. a fixed u and a fixed measurement time t, the corresponding difference in the τ -quantile is is the difference in quantile for a fixed subject when X(•) is changed in direction ∆X(•).In the following we refer to Q τ Y |X,0 (t) or D τ (t) as targets θ of interest and let θ denote the corresponding estimate calculated from estimates of the coefficients β τ (•, •) and α τ (•) in the representation (3.1).
First, consider estimation of var( θ).The estimated model coefficients, {â τ l } l and { δτ dl } dl , from qgam are accompanied with a variance-covariance matrix which can be used for computation of a standard error for the estimator θ.We refer to these standard errors as model-based standard errors.However, penalization of random effects is likely to cause underestimation of the true sampling variation of θ.As Galvao and Montes-Rojas (2015), we resample complete subject data with replacement to compute standard errors.Specifically, let θ1 , . . ., θB be estimates when the estimation procedure is applied to B bootstrap datasets, and use sd boot ( θ) = sd( θ1 , . . ., θB ) as an estimate for var( θ).Details of the sampling procedure can be found in the appendix.
Second, consider estimation of bias( θ).As documented by Battagliola et al. (2022), bias can occur even for large samples, caused by a combination of the incidental parameter problem (the number of parameters increase with sample size, Neyman and Scott (1948); Lancaster (2000)), non-linearity of quantiles, and penalization of the subjectspecific intercepts.Block resampling cannot be used for bias adjustment because the target parameter of interest is not computable under the bootstrap distribution; see also Karlsson (2009) who obtained little or no effect in an attempt to adjust for bias in a nonlinear quantile regression for longitudinal data.Instead, we propose to combine standard resampling of estimated random effects with wild bootstrap of residual terms, using the technique developed by Battagliola et al. (2022).The purpose is to generate bootstrap datasets under a distribution where the true value of the target parameter coincides with θ (the estimate obtained from the observed data), such that the bias can be estimated from the bootstrap estimates.Specifically, let θ1 , . . ., θB be estimated values of θ for B bootstrap datasets, then bias is estimated as bias boot ( θ) = 1 B B b=1 ( θb − θ).We explain the sampling procedure in more detail in the appendix.
The block resampling method and the sampling method suggested by Battagliola et al. (2022) differ in several ways.While the former is completely non-parametric, the latter relies on the model.Another important difference is that the covariate functions are resampled (together with the responses) by the block resampling method, but kept exactly as in the dataset in the approach of Battagliola et al. (2022).As a consequence, the procedure based on wild bootstrap would underestimate the variance of the estimator θ.Our suggested solution is to combine the estimated bias and estimated standard deviation from the two bootstrap sampling methods, respectively, to construct confidence intervals for the target θ.If the distribution of θ is well approximated by a normal distribution, and standard errors and bias are estimated as described above, it is natural to define approximate 1 − α confidence intervals as θ − bias boot ( θ) ± q 1−α/2 sd boot ( θ).
(3.5) Battagliola et al. (2022) demonstrated in a wide variety of simulation settings with clustered data and scalar covariates that bias was greatly reduced or removed, with the above bootstrap sampling process combining resampled cluster-specific intercepts and wild bootstrap for the residuals, and Galvao and Montes-Rojas (2015) demonstrated that sampling variation of estimators is measured appropriately with block resampling.We acknowledge that investigations of the coverage properties of the confidence intervals in the current setting would be of interest, but leave it for future research and focus on the application below.

Lactating sows' feed intake
Our ultimate goal is to study the impact of thermal conditions on the daily food intake for lactating sows, taking into account the progression of food intake over lactation days.(compromised reproductive system) as well as for its litter.In particular, low food intake may lead to increased body weight loss and reduced milk production, implying slower and poorer weight gain of the litter (see for instance Rosero et al. (2016), Bloemhof et al. (2013), Renaudeau and Noblet (2001), Johnston et al. (1999), Renaudeau et al. (2001)).
In this paper we focus on understanding the association between low quantile levels of feed intake and the hourly temperature and how this association varies over time.

Description and preprocessing of data
The data comes from a commercial research unit in Oklahoma, where 480 sows were monitored from July to October 2013.The animals were divided into 21 groups and then assigned to cells, where they were kept under observations during the lactation period for up to 21 days.For each sow at each lactation day, the food intake (in kg) is available, as well as the cell temperature (in °C), measured every five minutes for 24 hours from 2.00 pm to 1.59 pm the following day.Moreover, the parity of each sow is registered, i.e., the number of pregnancies the animal had before the current one.We will consider parity as a measure of age: a sow is "young" if it is at its first pregnancy and it is "old" otherwise.Previous studies have shown that younger and older sows behave differently (Staicu et al., 2020), so we analyze data from young and older sows separately.There are 475 sows in total with 237 young sows and 238 old ones, respectively.
The data are illustrated in Figure 1.Feed intake profiles are plotted in the left panel with profiles from three randomly selected sows from each age group highlighted.Although there is large within-sow variation over lactation days, it is also clear that some sows tend be have low (or high) feed intake throughout, calling for a subjectspecific component in the model.In a preprocessing step we smoothed the temperature curves with FACE (see Section 3.3), using a PVE of 99.99% so that most of the features of the curves are maintained.The reconstructed daily temperature trajectories are used in the analysis.Staicu et al. (2020) used a longitudinal dynamic functional regression framework for mean regression for the same data with emphasis on prediction of response trajectories.Park et al. (2019) carried out separate quantile regression analyses for a derived variable at three selected lactation days.For each day separately, the cumulative distribution function (CDF) was first estimated and then inverted to estimate quantiles of interest.In contrast, we carry out quantile regression for all lactation days simultaneously using the model framework and estimation method introduced in Sections 2 and 3, and our analysis provides estimates and confidence bands for the temperature effect on quantiles of feed intake.In particular, we consider the estimated quantiles of the feed intake when the daily temperature corresponds to the pointwise 20% and 80% quantiles of the smoothed temperature curves.These two temperature profiles, denoted Temp 20 (•) and Temp 80 (•), respectively, and shown in Figure 1 in blue and red, are the most extreme temperature quantiles considered by Park et al. (2019).The pointwise median temperature curves is also plotted in Figure 1 (green).Figure 1: The lactation data.Left: daily feed intake profiles over lactation days of young sows (upper panel) and of old sows (lower panel) with three randomly selected profiles (black) in each group.For some sows data are only available in a subset of the lactation period.Right: smoothed temperature curves (grey), as well as the pointwise temperature quantiles curves at quantile levels 20% (blue), 50% (green) and 80% (red) based on the whole dataset.

Estimated quantiles of feed intake
Denote the observed data by {(FI ij , Temp ij (•), t ij )} ij .For each sow i = 1, . . ., N (N = 237 or N = 238) and repeated measurement j = 1, . . ., n i (n i ranging from 7 to 21), FI ij refers to the daily feed intake expressed in kg, Temp ij (•) to the smoothed temperature function in °C recorded over a day and t ij to the lactation day.We allow for a subjectspecific intercept u i to account for the correlation of observations from the same sow.
For each age group, we consider the model where S represents a whole day from 2.00 pm to 1.59 pm.We approximate the smooth intercept α τ (•) using ten cubic splines and the coefficient function β τ (•, •) using a tensor product of ten cubic splines in both directions, with cyclic splines for the s-direction.

Qτ
T emp 20 ,0 (t) = ατ (t)+ S Temp 20 (s) βτ (s, t)ds, Qτ T emp 80 ,0 (t) = ατ (t)+ S Temp 80 (s) βτ (s, t)ds, plotted over t, for each age group and for τ = 0.1, 0.5.Notice that no random effects are included in the predictions such that their interpretation is for a "typical sow".The solid curves are estimated profiles, and we see a clear distinction between low (blue) and high (red) temperatures, at least from around lactation day five.High temperatures negatively influence the appetite of the sows-they tend to eat more in cooler conditions-and this difference increases over time, particularly at the 0.1 level.The group of young and old sows have similar quantiles of feed intake at the very beginning of their lactation period, followed by a steep increase up to around lactation day 5, a short period with constant feed intake, and a final increase up to a stable plateau.However, estimated increments appear to be smaller for young sows.The dashed curves show the bias-adjusted estimates.Although the bias adjustment is hardly visible, it is actually significantly different from zero at many instances at a 5% significance level (based on pointwise one-sample t-tests on the estimates from the bootstrap data).Notice that predicted quantiles at different quantile levels are plotted on different scales.
In order to illustrate the estimated temperature effects more clearly, Figure 3 shows the difference between the estimated feed intake quantile profiles for low and high temperatures, i.e.Dτ (t) = Qτ T emp 20 ,0 (t) − Qτ T emp 80 ,0 (t).The black curves and confidence bands show the estimates without adjustment and the corresponding model-based 95% pointwise confidence interval based on the variance-covariance matrix extracted from the model fit, and the orange curves and confidence bands show the bias-adjusted estimates and confidence bands obtained by bootstrap, as in equation (3.5).We used 100 bootstrap samples for bias adjustment as well as for computation of confidence intervals, cf.Section 3.4.Bias adjustment is most noticeable for young sows at the 0.1 quantile, and the bootstrap generated confidence bands are always wider than the model-based ones.Simulation results have indicated that the model-based standard errors underestimate the actual variation (see the supplementary materials), so we prefer the bootstrap generated confidence intervals.For completeness, the profiles obtained from bootstrap datasets are shown in Figure S6 in the supplementary materials.Irrespective of the method used for construction of confidence bands and of the bias adjustment, the overall conclusion is the same: No temperature effect is found early in the lactation period (up to around day five), but at later days quantiles of feed intake are negatively affected by high temperature, both at 0.1 and 0.5 quantile levels.In general, the influence of temperature on the quantiles becomes more prominent along the lactation period, but there are certain differences between age groups and between quantile levels.At quantile level 0.1 the difference in estimates has an increasing trend along lactation days for both groups of sows, while at the median the difference in estimates reaches a maximum of approximately 0.5 around lactation day 10-13 and then flattens out.This might indicate that the sows that eat less are those particularly sensible to the environmental temperature.

Comparison with simpler models
Now, let us turn to a comparison of the model (4.1), with three simpler alternatives.The first modification has β(s, t) ≡ β A (s) such that the temperature curves still have functional effects, but with same effect across lactation days; this would correspond to profiles of differences in Figure 3 being constant.The second modification has β(s, t) ≡ β B (t).Then the model (4.1) becomes such that the quantile depends on the temperature curve only through its integral or, equivalently, the average temperature over the day, and it is no longer a functional quantile regression model.The third modification combines the two previous sub-models; it has β(s, t) ≡ β C , such that and the temperature effect is the same across days and depends on the average temperature over the day only.
We measure goodness-of-fit with the AIC values based on the log-likelihood corresponding to the Extended log-F (ELF) distribution (Fasiolo et al., 2021a) and the effective degrees of freedom (EDF) known from additive models (Wood, 2017).The EDF is partitioned into two parts: the effective degrees of freedom for the smooth coefficients, denoted EDF α,β , and the degrees of freedom corresponding to the subject-specific intercepts, denoted EDF u .
The results are displayed in Table 1.For both groups the AIC values are notably smaller for the most complex model than for its competitors for quantile level τ = 0.1, while the values are closer among the models at the median.This indicates that it is particularly important to allow for time-varying coefficients and functional effects at lower quantiles.At level τ = 0.5 the most complex model is still selected for old sows, but for younger animals the smallest AIC is the one from model (4.2).Furthermore, in all cases, the AIC values from the model with β B (t) are smaller than the AIC value from the model with β A (s), indicating that it is more important to account for the temperature variation in the development along the lactation period than over the day.
For both groups and at both quantile levels, EDF α,β is the highest for the model (4.1), as expected, since it describes variation in both the s and t direction.Both EDF α,β and EDF u are larger when estimation is carried out at the median rather than at the 10% level; most likely because there is more information in the data to estimate the median, which in turn allows for higher flexibility.Finally, EDF u is always between 188 and 207, and thus smaller than 237 and 238, the number of young and old sows, respectively, so random effects are penalized to some degree.

Estimated effect of temperature
Finally, we turn the attention to the estimated coefficient function βτ (•, •) in model (4.1).
It is important to keep in mind that the estimated functional coefficient is only identifiable up to elements belonging to the orthogonal complement of the space spanned by the basis functions used in the finite dimensional representation of the functional covariates.Thus the interpretation of βτ (•, •) must be taken with caution.Figure 4 shows βτ (•, •) for each age group and at quantile levels 0.1 and 0.5, respectively.In each panel, s → βτ (s, t) is plotted for t fixed at each lactation day, on a colour scale that ranges from orange to green as the longitudinal time t goes by.Recall that, by construction, the development in both s and t direction is smooth, and the functions are cyclic over day.The estimated coefficient functions are predominantly negative, corresponding to an overall negative effect of temperature, cf. Figure 1.With the risk of overinterpretation, we see that the impact of temperature on feed intake is most prominent in the morning hours (from about 8 am to about 12, a bit later at late lactation days for young sows at the median).This is also the time of the day with the largest differences in temperature effects between lactation days.Sensitivity against temperature appears to increase over lactation days, but stabilizes earlier for young compared to older sows.

Discussion
This work was motivated by the study of heat stress effects on lactating sows.We used a model framework for scalar-on-function quantile regression for clustered or longitudinal data where dependence within cluster/subject is taken into account by including clusteror subject-specific intercept parameters.Estimation relies on basis expansions, more specifically penalized splines.As an alternative, an eigenfunction basis could be used, with the number of basis functions selected with an AIC criterion (inspired by Kato (2012)).This was studied in Battagliola (2021) and gave more wiggly estimates of β τ (•, •), compared to those in Figure 4. We prefer the spline expansions over the eigenfunction expansions, in particular because it avoids the extra step with selection of the size of the basis.We adapted existing software, making the methodology more easily applicable for practitioners, see the appendix for implementation details and the supplementary materials fore example code.
Our analysis helped uncover some interesting insights for the sow data application.First, the feed intake quantiles are similar for younger and older sows close to giving birth, but increase faster and to a higher level for older than younger sows, suggesting that sows at their second or later pregnancy acclimatize faster to the environment.Second, a high temperature in the stable affects feed intake negatively except for the early days in the lactation period; this is the case for both younger and older sows, and both at the median and at the 0.1 quantile levels.At early lactation days, the temperature effect is not significant, and also similar for both groups of sows.Third, the estimated temperature effect is generally larger at the 0.1 level compared to the 0.5 level, suggesting that the lower tail of daily feed intake for a sow is more sensible to variation in temperature; however this should be investigated further.Fourth, there is an increasing trend of the temperature effect throughout the lactation period at the 0.1 quantile level, and steeper for the older sows.This is confirmed by model comparisons where models with time-varying temperature effect are preferred over models with constant temperature effects.Fifth, for both groups and at the 0.1 quantile level, the model with functional effect of temperature over the day is preferred over a model which includes the average temperature only.This suggests that the shape, not only the level, of the temperature profile affects the fraction of sows with a low feed intake.
We have focused on models with a single functional covariate, but they could be extended to include more than one functional covariate or a mixture of functional and scalar covariates in a straight-forward way.Moreover, several grouping levels could be included as random effects; this could be relevant in the application because sows were kept together in the stables.It remains to study the robustness of estimates in such more complex models.The proposed models have similar flavor as models from Brockhaus et al. ( 2020), but we were not able to get reliable estimates with the accompanying software.
We adjusted estimates and standard errors with bootstrap methods.The sampling schemes have been used and studied in simpler models (Galvao and Montes-Rojas, 2015;Battagliola et al., 2022), but further examination would be interesting in the current setup.Another future research topic is the development of hypothesis testing procedures for the regression coefficient functions; see Li et al. (2022).The recent approaches of Abramowicz et al. (2018) and Pini et al. (2023) might be helpful for this purpose.Preliminary ideas involve test statistics computed as integrals over the domain S of pointwise test statistics and bootstrap computations for evaluation of their null distributions.The main challenge lies in designing appropriate permutation schemes that comply with the dependence structures in the data.
For a target of interest, θ, and estimates θ 1 , . . ., θB computed from B bootstrap samples, the bias of θ is estimated as explained in the main text.Wild bootstrap was introduced by Wu (1986) and Liu (1988) for mean regression, and adapted to quantile regression by Feng et al. (2011).Results in Feng et al. (2011), Wang et al. (2018) and Battagliola et al. (2022) indicate that wild bootstrap captures asymmetry and heteroskedasticity better than ordinary resampling of residuals.

Implementation
We used the software environment R (R Core Team, 2023) for the computations.The FACE method used for smoothing is implemented in the function fpca.face, which is part of package refund (Goldsmith et al., 2023).It can handle functional data observed on a dense or a sparse grid and also allows for missing values.One specifies either the selected PVE (pve) or the number of principal components (npc) of choice.The resulting eigenfunctions, the functional mean, and the predicted/smoothed functions are evaluated and returned at a dense grid.
In order to fit fQGAM, we rely on the package qgam (Fasiolo et al., 2021b).It includes the qgam function, which is a wrapper of the function gam (Wood, 2017;Wood and Scheipl, 2020).The call to qgam has the following structure: qgam(y ~formula, qu=tau, data=data) where y is the response and formula specifies any ordinary covariates, smooth effects, and random effects to include in the model.The quantile level of interest τ is passed to qu, and the entry data specifies the data frame of interest.The formula for qgam works with the same syntax as for gam.Hence, smooth terms are included with s() in the univariate case and with te() in the multivariate case.In both cases the user can choose the type and the number of basis functions by passing the arguments bs and k, respectively.Importantly for this paper, the functional covariate is included in the smooth term with the option by = Xhat, where Xhat is the matrix of (possibly) presmoothed functional covariate values, along with the grids of times points of observation, in form of a matrix.The subject-specific intercept is included as a smooth term with bs='re'.
The output from qgam is a gamObject, which stores several quantities related to the model and the estimation process, such as twice the log-likelihood (logLik) and the estimated effective degrees of freedom (edf2), which are used for computation of AIC values in our application.Moreover, Vp, the variance-covariance matrix of all estimated coefficients, is available, and can be used to compute model-based standard errors and confidence bands for functions of the parameters, such as the targets mentioned in Section 3.4.Estimated quantiles for new covariate functions can be computed with the function predict.gam,and the function allows to exclude one or more terms from the model, such as the subject-specific intercepts, in the prediction.
Finally, we have implemented the bootstrap schemes described in Section 3.4, namely block bootstrap and wild bootstrap, and the R functions are available as part of Supplementary Material in Section S3.Moreover, data preparation, implementation of fQGAM and bootstrap-based inference results are provided in an example using a dataset which is available in the refund package (Goldsmith et al., 2023).

Figure 2 :
Figure 2: Predicted quantiles corresponding to the 20% and 80% pointwise temperature profiles.Bootstrap-adjusted estimates are shown with dotted curves.The left column refers to sows at their first pregnancy, while right one refers to the older sows.Results at quantile levels τ = 0.1 and τ = 0.5 are shown in the top and bottom row, respectively.Notice that predicted quantiles at different quantile levels are plotted on different scales.

Figure 3 :
Figure 3: Estimated differences in quantiles between the pointwise 20% and 80% temperature curves, both without (solid black) and with (solid orange) bias adjustment.The corresponding pointwise confidence intervals, based on the model solely or on bootstrap, are illustrated with dashed curves.The left column refers to sows at their first pregnancy, while the right column refers to the older sows.Results at levels τ = 0, 1 and τ = 0.5 are shown in the top and bottom row, respectively.