Spatial joint models through Bayesian structured piecewise additive joint modelling for longitudinal and time-to-event data

Joint models for longitudinal and time-to-event data simultaneously model longitudinal and time-to-event information to avoid bias by combining usually a linear mixed model with a proportional hazards model. This model class has seen many developments in recent years, yet joint models including a spatial predictor are still rare and the traditional proportional hazards formulation of the time-to-event part of the model is accompanied by computational challenges. We propose a joint model with a piecewise exponential formulation of the hazard using the counting process representation of a hazard and structured additive predictors able to estimate (non-)linear, spatial and random effects. Its capabilities are assessed in a simulation study comparing our approach to an established one and highlighted by an example on physical functioning after cardiovascular events from the German Ageing Survey. The Structured Piecewise Additive Joint Model yielded good estimation performance, also and especially in spatial effects, while being double as fast as the chosen benchmark approach and performing stable in an imbalanced data setting with few events.


Introduction
Biometrical studies often capture time-to-event and longitudinal data on the same topic simultaneously.
Frequently used examples are the count of CD4 lymphocytes in HIV-positive patients and their time till onset of AIDS (Faucett and Thomas 1996;Wulfsohn and Tsiatis 1997;Rizopoulos 2011) or the level of serum bilirubin and other liver biomarkers in primary biliary cirrhosis patients and time to death (Crowther, Abrams, and Lambert 2013;Hickey et al. 2018).Other examples include PSA cancer marker and progression to recurrence of prostate cancer (Jacqmin-Gadda et al. 2010), autoantibody titers in children preceding the onset of Type 1 diabetes (Köhler, Beyerlein, et al. 2017) or physical functioning after a cardiovascular event and death (Rappl, Mayr, and Waldmann 2022).Separate analysis of these longitudinal and time-to-event outcomes leads to biased estimates and to avoid this both should be modelled jointly.These joint models consist of two submodels: A longitudinal submodel and a survival submodel with both being linked through an association parameter.While Wulfsohn and Tsiatis (1997) and Henderson, Diggle, and Dobson (2000) proposed to maximize the likelihood of a joint model via an Expectation-Maximization (EM) algorithm, Faucett and Thomas (1996) used a Bayesian Gibbs-sampling approach.In recent years advances have been made into statistical boosting (Waldmann et al. 2017;Griesbach, Groll, and Bergherr 2021).Software is available for all three estimation approaches across various statistical computation platforms, of which especially R hosts a number of wellestablished packages such as JM (Rizopoulos 2010), JMbayes (Rizopoulos 2016), joineRML (Hickey et al. 2018) and bamlss (Umlauf et al. 2021).Comparisons of selections of available software can be found in Yuen and Mackinnon (2016) and Rappl, Mayr, and Waldmann (2022).Traditionally the longitudinal submodel of a joint model is a linear mixed model (LMM) and the time-to-event submodel is a proportional hazards (PH) model, though other variants are possible.Depending on the scaling of the longitudinal outcome a generalized linear mixed model (GLMM) (Faucett, Schenker, and Elashoff 1998;Rizopoulos et al. 2008;Viviani, Alfó, and Rizopoulos 2014) or quantile regression model (Y.Huang and Chen 2016;Zhang et al. 2019) might be better suited.An alternative to the PH-models in the time-to-event submodel are accelerated failure time models (Tseng, Hsieh, and Wang 2005; Y. Huang and Chen 2016) and in certain data situations competing risks models are best suited (X.Huang et al. 2011;Andrinopoulou et al. 2014;Blanche et al. 2015).Also models with multivariate longitudinal outcomes are in use (Lin, McCulloch, and Mayne 2002;Rizopoulos and Ghosh 2011;Mauff et al. 2020) as are location-scale models (Barrett et al. 2019).Köhler, Umlauf, et al. (2017) expanded joint models to structured additive joint models with possibly smooth random effects and established non-linear association structures (2018) both via a Bayesian flexible tensor-product approach using Newton-Raphson procedures and derivative-based Metropolis-Hastings sampling.A good historic overview on joint models can be found in Tsiatis and Davidian (2004), while Alsefri et al. (2020) give a concise summary of recent developments in Bayesian joint models in particular.Still joint models with a spatial component are rare.Martins, Silva, and Andreozzi (2016) and Martins, Silva, and Andreozzi (2017) have described estimation of a Bayesian joint model with a spatial effect and a Weibull baseline hazard using OpenBUGS and WinBUGS respectively.The above mentioned Bayesian tensor-product approach by Köhler, Umlauf, et al. (2017) implemented in the R package bamlss also has the capability of estimating spatial joint models.In terms of model formulation both methods have in common that they use a PH-model for the survival submodel.However, assuming a parametric baseline hazard such as a Weibull hazard can be restrictive and derivative-based Metropolis-Hastings algorithms are computationally expensive as well as may prove sensitive towards data with few events.Therefore, in this paper we propose a Bayesian joint model with a structured additive LMM for the longitudinal outcome, but exchange the time-to-event submodel for a piece-wise additive mixed model (PAMM).The latter has been suggested by Bender, Groll, and Scheipl (2018) for modelling survival times based on the proportionality of a time-to-event process with a Poisson-distributed count process (Friedman 1982) thus expanding the available options for time-to-event models (e.g.accelerated failure times, competing risk).This formulation allows for estimation of the baseline hazard without any assumptions about its distributional form and is similarly flexible to the Köhler, Umlauf, et al. (2017) model with respect to the inclusion of (non-)linear, spatial and random effects.At the same time, it reduces runtimes by about 50% (compared to an established method) and has proven stable in imbalanced data settings with few events.The rest of the paper is structured as follows: In the next section, the methodology of piece-wise additive joint models is described in more detail and our extension of the concept is explained.In section three the results of a simulation study comparing our approach to an established one to proof the feasibility of the model formulation, its ability to estimate spatial effects and its runtime performance.We then apply this method to an example of physical functioning from the German Aging Survey.Section five concludes with some final remarks and further technical details can be found in the Appendix.

Theoretical background
In its original form the Joint Model assumes a linear mixed model (LMM) for the longitudinal outcome and a proportional hazards model (PH) for the time-to-event outcome (Wulfsohn and Tsiatis 1997;Faucett and Thomas 1996;Henderson, Diggle, and Dobson 2000).Let y denote the vector of longitudinal outcomes across all individuals i = {1, . . ., n} and observations times points t.Further, let λ(t) be the vector of individual specific risks to experience an event at time t proportional to the baseline hazard λ 0 (t) and based on the observed event or censoring times T and event indicator δ.Then in its most generic variant the original joint model takes the form where η s and η l are survival and respectively longitudinal submodel specific predictors and η ls is the shared predictor, via which both model parts are connected.The parameter α quantifies the association between the longitudinal and the time-to-event outcome.In the following, the addendum •(t) to denote time-varying predictors is dropped for ease of notation, since the subsequent concepts can be applied to time-varying and -constant predictors alike.Also note that, while it is theoretically possible to estimate time-varying survival predictors η s (t), the time-varying covariates included in that predictor may be prone to measurement error and it is therefore in most cases better to model them jointly.
The predictors η • are additive and may include (non-)linear, geographical or random effects of potentially time-varying covariates , where f k is a function representing the respective effect and p • denotes the predictor specific number of covariates.Restrictions apply to random effects, which need to be part of the shared predictor η ls , and geographical effects, of which there can only be one in the model for identifiability reasons.Reformulating predictor η • in matrix notation yields where Z k is an effect appropriate design matrix and γ k a vector of corresponding effect coefficients.For the Bayesian estimation of this model the generic prior for the coefficients γ k is proportional to a normal distribution with zero mean, variance σ 2 γ k and penalty matrix For non-linear and spatial effects the penalty matrix K k is rank deficient and as a result prior (4) is partially improper.Linear effects.For a vector γ k = (γ k1 , . . ., γ k J k ) of J k linear fixed effects the penalty matrix K k is an J k × J k identity matrix I J k reducing (4) to a J k -variate normal distribution.An alternative is to set p(γ kj | •) ∝ const ∀j = 1, . . ., J k .The corresponding design matrix Z k is a matrix of covariates of order n × J k , where n denotes the number of observations.Random effects.In the case of joint models random effects appear in the shared predictor exclusively.Thus let n be the number of individuals and n i be the number of observations per individual i, so that the total number of observations amounts to N = n i=1 n i .Further, let u i be a vector of observations (or 1 for random intercepts) of length n i specific to individual i.Then Z k is a matrix of vectors u i of order N × n, i.e.Z k = diag(u 1 , . . ., u n ) and γ k is a vector of random effects b i of length n, γ k = (b 1 , . . ., b n ) .The penalty matrix K k then is an n × n identity matrix I n .Non-linear effects.Modelling non-linear effects follows the Bayesian P-spline approach with Z k being a matrix of B-spline basis functions evaluated at observations x i (t).Then γ k is a vector of corresponding basis coefficients.The common choice of prior for these basis coefficients is a first or second order random walk.This is achieved by setting the penalty matrix K k equal to D D, i.e.K k = D D, where D is a matrix of first or second order differences.Spatial effects.For spatial effects Z k is assumed to be an n × S incidence matrix (potentially also N × S, for spatio-temporal observations) with an entry of 1 if observation i ∀ i = 1, . . ., n originates from location s ∀ s = 1, . . ., S with S unique locations and 0 otherwise.The corresponding coefficients γ k follow a Markov random field (MRF) prior achieved via the penalty matrix K k .K k is an adjacency matrix of order S × S with entries as the number of neighbours |n(s)| only when locations s and r are neighbours (s ∼ r) of the form The variance parameters of the coefficient distributions σ 2 γ k as well as the model variance σ 2 ε will a priori follow inverse gamma distributions, in particular

The piecewise expontential representation of the time-to-event submodel
The idea behind a PH-model is that an individual's hazard at time t is determined by an individual specific deviation of an underlying baseline hazard λ 0 (t) at time t.In mathematical notation a generic PH-model looks similar to (2) and takes the form λ(t) = λ 0 (t) exp{η}, where η represents an unspecified predictor.The aim of estimating such a model then is quantifying the coefficients governing η and determining λ 0 (t) over time t given the times to event T and the events δ.Now this approach can be re-written as an equivalent log-linear Poisson-model.This is achieved by dividing the continuous observation time t = (0, t max ] into J intervals and counting the events δ j in any given interval j.The intervals are specified by the boundaries 0 = κ 0 < • • • < κ J = t max and assuming constant baseline hazards λ j within each interval the generic PH-model changes to a piecewise exponential model of the form Then this formulation is proportional to a Poisson regression of the events δ j in intervals j = 1, . . ., J with expected value E(δ j ) in the sense that , where E(δ j ) = exp{log λ j + η + o j } with transformed exposure times o j = (o 1j , . . ., o nj ) of each individual i in each interval j as offsets (exp{o ij } = t ij ) (Friedman 1982).This further generalises to a piecewise additive mixed model (PAMM) when the interval-specific log-baseline hazard log λ j is represented as a smooth function of time f 0 (t j ) instead of a step-function and the predictor η contains (non-)linear, geographical and/or random effects (Bender, Groll, and Scheipl 2018).This form of estimation requires the data to be structured differently than in the conventional way.Table 1 gives an example of this data augmentation and more details can be found in Bender, Groll, and Scheipl (2018).
(ref:dataaug) Illustration of data augmentation used for applying Poisson regression.Data augmentation in this toy example was carried out using pammtools (Bender and Scheipl 2018).

Structured Piecewise Additive Joint Models (SPAJM)
Transferring this counting process representation to the context of joint models changes the notation thereof to The likelihoods then follow the distributions for the longitudinal and the time-to-event submodel respectively.

Posterior estimation and implementation
Posterior estimation of this model is accomplished via a Markov Chain Monte Carlo (MCMC) sampler, which in short is a combination of Gibbs-sampling and a Metropolis-Hastings (MH)-algorithm with iteratively weighted least squares (IWLS) proposals.The steps of this sampler are outlined in the following: 0. Initiate starting values for parameter vector θ [0] = (θ γ ls,p ls ) θ [0]  s = (γ

Survival effects: IWLS-MH-update
For k = 1, . . ., p s determine γ s,k as follows: s,k , working weights W s and working observations ỹs .The definition of working weights and observations is given in appendix A.2. Accept draw γ * s,k with probability

Shared effects: IWLS-MH-update
For k = 1, . . ., p ls determine γ [t] ls,k as follows: s,k , working weights W ls and working observations ỹls .The definition of working weights and observations is given in appendix A.3.Accept draw γ * ls,k with probability

Update variance parameters: Gibbs-update
• Model variance Let N = n i=1 n i be the total number of longitudinal observations as the sum of all observations n i per individual i across all individuals n.Draw σ In ã0 and b0 use η [t] l and η [t] ls .

• Effect variance
For In ãk• and bk• use γ •,k• .The above described algorithm is implemented in the current developer version of the statistical software BayesX (Belitz et al. 2022).

Simulation Study
With the following simulation study we want to (a) illustrate the flexibility of the SPAJM with regard to effect specification, (b) highlight its capability for estimating spatial effects and (c) confirm its computational advantage by comparing the performance of our approach to an already existing one.In order to meet intention (a) the simulated model will be maximally generic, i.e. include various types of effects in all possible predictors alongside a non-linear baseline hazard, and to meet intention (b) the model will comprise a spatial effect.Since the spatial effect can only be located in one of the predictors for identifiability reasons we will look at three settings to determine whether the quality of performance is location specific: Setting 1 The spatial effect is located in the shared predictor η ls , Setting 2 it is located in the survival predictor η s and Setting 3 in the longitudinal predictor η l .
Lastly, to ascertain intention (c) the runtimes of our approach will be contrasted to an already existing one.In terms of software we will use the BayesX implementation of the SPAJM and benchmark it against the similarly flexible joint model implementation of the tensor-product approach using Newton-Raphson procedures and derivative-based Metropolis-Hastings sampling by Köhler, Umlauf, et al. (2017) in the R package bamlss (Umlauf et al. 2021).We will use the current developer version of BayesX (Belitz et al. 2022) as well as bamlss version 1.1-8 on R-4.1.2(R Core Team 2022).

Setup
We generate longitudinal measurements y(t) for n = 200 individuals over n i = 6 individual specific, original time points each in the range of t ∈ (0, 1) according to the generic model given in ( 5) and ( 6) with α = −0.3 and the following predictors with the non-linear functions f 1 (x) = 0.5 x + 15 φ(2(x − 0.2)) − φ(x + 0.4) and f 2 (x) = sin(x).All covariates x ls• and x s• are simulated as time constant with the exception of x ls3 , which is simulated time dependent just like covariates x l• , with all x •• ∼ U (−1, 1).Further the model variance is set to σ 2 ε = 0.5 and the variances of the random intercepts and slopes are set to σ 2 b0 = σ 2 b1 = 2. True survival times T * i are determined based on a Weibull baseline hazard function λ 0 (t) = pqt q−1 with scale p = 0.4 and shape q = 1.5.The event times are then set to T i = min(T * i , 1) with event indicator δ i = 1 if T * i ≤ 1 and δ i = 0 otherwise for censored individuals.For a more realistic censoring scenario we apply in addition uniform censoring U (0, 1) to 50% of the censored individuals.
The spatial effect is based on the map of counties in western Germany available from the R package BayesX and calculated as f geo = sin(c x ) • cos(0.5 c y ) with c x and c y being the scaled x-and y-coordinates respectively of the centroids of each region.The regions are then randomly distributed across the individuals.For each setting we use R = 100 replications.Convergence is achieved in BayesX by using 70000 iterations per run with a burn-in of 10000 and a thinning factor of 60 and in bamlss by using 44000 iterations (54000 with f geo in η l ) with a burn-in of 4000 and a thinning factor of 40 (50).In order to compare the results of both implementations we calculate the mean squared error (MSE), bias and coverage of the 95%-high density interval (HDI) of the posterior distribution of each parameter and compare runtimes between BayesX and bamlss.

Results
The outcome of the estimation performance of the simulation study can be found in Figure 1 and the computational performance is illustrated in Figure 2. The summarized results in Figure 1 already make it clear that a joint model with a piecewise additive formulation of the survival submodel is is equal in terms of effect estimation to its established PH counterpart given the small MSE and bias values as well as the high coverage rates.Detailed results of the individual effects can be found in the appendix in Figure 4, which confirm this high level impression.Both methods exhibit the largest deviation from the true data in the shared predictors η ls .In terms of estimation any effect in this predictor belongs to the most demanding to estimate, as the corresponding likelihood features both model parts.Thus the larger bias here is to be expected.Furthermore, it quickly becomes clear that BayesX outperforms bamlss in the estimation results of the shared predictors η ls and the survival predictors η s .The reason for the performance of bamlss in the shared predictors η ls is due to the random effects, which can be seen from the more detailed Figure 4 in the Appendix.Their estimates remain rather small, which is why their high density intervals do not cover the simulated (true) random effects b 0 and b 1 , which in turn affects the overall results for the shared predictor η ls .Similarly the survival predictors η s perform rather weak with bamlss, which is mainly due to the rather large bias in the association α (see Figure 4).The estimation procedure implemented in bamlss is in fact tailored to identify advanced association structures in joint models, which is why the bias in α is highly likely a result of the underestimation of the random effects.Only in the estimation of the longitudinal predictor η l did bamlss surpass BayesX, which is interesting, since the formulation implemented in bamlss does not extend to longitudinal-only-predictors.It assumes η l to be a part of η ls , but since the data of η l is simulated such that it is not associated with the survival part of the model, the results of η l under bamlss are more precise than those of η ls .Figure 1 further demonstrates the capability of both methods to estimate spatial effects, while it also shows the indifference to the position (in η ls , η s or η l ) of the spatial effect within the model.First of all, the figure indicates a stable performance of the spatial effect f geo in both implementations independent of the predictor it belongs to.Secondly, also the other predictors remain very stable in their performance regardless of the simulation setting.If the position of the geographical effect f geo mattered, it would not just show in the estimation accuracy of the effect itself, but it would also affect the effects in other parts of the model, which is not the case here.Again the reason for the bamlss results are similar to before.The estimation results for f geo exhibit the same behaviour as for the random effects: They remain surprisingly small, resulting in a larger bias and thus only achieving a rather low coverage.Lastly, in terms of computational cost the piecewise additive approach in BayesX has an advantage over the PH approach in bamlss with lower runtimes (see Figure 2).With both methods Setting 1 with the geographic effect f geo in the shared predictor η ls is the most time consuming.But this is also the most complex setting in terms of estimation, therefore, the increased runtime is not surprising.Setting 2 with f geo in the survival predictor η s and Setting 3 with f geo in the longitudinal predictor η l are less complex from an estimation perspective, which is also evident in the short runtimes.More detailed descriptive statistics on the runtimes can be found in the appendix in Table 3.

Physical Functioning after a Caesura
In 2015 the World Health Organsiation (WHO) concluded in their "World Report on Ageing and Health" that the physical capacity dimension of "Healthy Ageing" still suffers from a lack of understanding.Physical capacity can be measured as functional health (aka physical functioning), which decreases naturally over time until death.However, certain physiological events have the power to alter the trajectory of an individual's functional health both in a negative and positive way, among them heart attacks, strokes or diagnoses of cancer (caesura, WHO 2015).While the longitudinal modelling of these trajectories is already of interest, the trajectories themselves influence an individual's survival time.Therefore, a joint model is appropriate to capture both these aspects of the data.To examine the development of physical functioning after a caesura in Germany we will resort to the German Ageing Survey (DEAS), which aims at studying the second half of life with people between 40 and 85 years old and living in Germany being eligible for study participation.The DEAS has collected information on physical functioning from a SF-36 survey, health conditions qualifying as caesurae, terminal dates and a multitude of other variables, which might help explain the development of physical functioning after a caesura, over the course of seven waves (1996,2002,2008,2011,2014,2017,2021) (Klaus and Engstler 2017;Engstler, Hameister, and Schrader 2014).Our analysis will focus on data from waves 2008 to 2021 with originally 6622 participants, of which 750 suffered from a heart attack or stroke i.e. a cardiovascular caesura, during their panel participation.Single observations, cases with missing data and caesurae with onset prior to the participant's entry into the panel were excluded from the analysis.For the remaining 636 the time of onset of the caesura was set to coincide with the interview date, in which the caesura was first reported, since the exact onset date of the caesurae is not collected.Out of 636 participants 79 (12.4%) died.
As explanatory variables for the trajectory of functional health (sf36) we consider time (t), gender (gender), the age of onset (aoo) of the caesura as well as living location of the participant on the level of European Nomenclature of Territorial Units for Statistics (NUTS) 2. In order to avoid re-identification of participants few regions had to be combined leaving now 33 regions of the original 36.The continuous and strictly positive variables SF-36 sf36, age of onset aoo and time t are scaled to the domain (0, 1).We then consider the model and estimate it with BayesX and bamlss.With 79 events out of 636 individuals the survival data is unbalanced and presents a situation that -in a standard survival analysis setting -would already prove difficult to estimate.bamlss proved to react sensitive to this imbalance, which is why we only present BayesX results here.For the linear effects they can be found in Table 2 and for the non-linear effects in Figure 3. None of the linear effects includes zero in their HDI, thus they are significantly different from zero.The association α of both model parts is negative meaning that a lower level of the modelled trajectory of physical functioning sf36 translates to a higher probability of experiencing an event.The intercept can be interpreted as a male individual at scaled age of 0.402 (i.e. an unscaled age of 40.2) years old at onset of the caesura can be expected to have an average scaled SF-36 level sf36 of 0.848 [0.827, 0.871].For women this reduces on average by -0.091 [-0.123, -0.059].Every scaled month after the caesura further reduces the level of sf36 by -0.247 [-0.277, -0.219].The age of onset aoo has in general a decreasing effect on sf36 (upper left panel Figure 3).Though it needs to be pointed out that before the scaled age of 0.55 (55 years old) the effect is positive, i.e. it increases the level of sf36 thus slowing down the natural decline of physical functioning, while for an aoo between roughly 0.55 and 0.7 (55 to 70 years old) the effect is constant around zero, i.e. it is negligible, and a caesura after an aoo of 0.7 (70 years old) has a negative effect on sf36 translating to an accelerated decline of physical functioning.In terms of living location there is a South-West against North-East (and Mid-West) divide (right panel Figure 3).People in the North-Eastern part of Germany especially in the area of Mecklenburg-Pommerania, Brandenburg and Saxony-Anhalt as well as those of the Western parts in the Dusseldorf and Cologne regions see a negative effect on their level of sf36.Those living in the South-Western part especially in South-West Baden-Wurttemberg (Black Forest region) see an increasing effect on their sf36 level.Given that the association is negative this means that the probability for an event is decreased most for people from the South West of Germany and increased most for those in the North-East and Mid-West.What these two areas have in common is that they comprise the most and least densely populated areas in Germany.This might be a starting point for further research to investigate what exactly triggers the effect to take this particular shape, since the living location in this example can be interpreted as a proxy for other variables that have not been included in the model.The baseline hazard is almost linear over time (lower left panel Figure 3), thus the risk of experiencing an event is roughly the same at all times throughout the study.

Conclusion and Discussion
The focus of this article has been on proposing a piecewise additive joint model for longitudinal and time-toevent data allowing for spatial, (non-)-linear and random effects to be included as well as estimation of the baseline hazard without any assumptions about its distributional form.In a simulation study comprising (non-)linear as well as a spatial effect it became evident that the piecewise additive approach yields results similar or better to the equally flexibly bamlss-methodology for joint models in R and that this performance is high independent of the position of the spatial effect.This method was illustrated by an example of the development of physical functioning after a caesura in people in their second half of life.The concept of piecewise additive joint models has not just proven its accuracy in estimating complex effects, but also its ability in handling unbalanced data in terms of availability of event observations.Applying the piecewise additive approach requires augmenting data, which is part of the time-to-event process.This augmentation artificially increases the size of the data set and when the original data is large, it can lead to longer runtimes.In our experience this is, however, seldom the case.Furthermore, this method could also be combined with other models in the longitudinal part of the model such as quantile regression, a location-scale model or multiple longitudinal outcomes in a multivariate joint model.Also, Bayesian variable or effect selection in this type of joint model could be investigated since very few methods for variable selection in joint models exist yet.
Here η s,k , and W [t]  s denotes the working weights and ỹ[t] s the working observations all evaluated at the current state t of the MCMC chain.The definition of working weights and observations is given further below.The acceptance probability of the IWLS proposal γ * is then •) being the likelihood evaluated at the proposal γ * s,k as well as the current state γ [t] of the effect.If the proposal then is accepted it becomes the new state γ s,k .Acceptance is established via random draws from a uniform distribution following the logic: 1. Draw u ∼ Unif(0, 1) 2. If u ≤ α then γ [t+1] = γ * else γ [t+1] = γ [t] .
For the definition of the working weights and observations consider the log-full conditional where (η s ) denotes the log-likelihood depending on predictor η s = ps k=1 Z s,k γ s,k (compare (3)) thus including γ s,k .Further, define the score vector v s as v [t]  s = ∂ (η i.e. the vector of first derivatives of (η s ) with respect to the predictor η s evaluated at the current iteration t and the working weights also evaluated at iteration t as W [t]  s = diag w 1 η s,1 , . . ., w na η [t]   s,na with n a the number of observations in the augmented dataset (compare Section 2.2 Table 1) and The vector of working observations ỹ is then determined by ỹ[t] s = η [t]  s + W [t]   s −1 v [t]  s .

A.3 Shared effects
The full conditional of the k th coefficient p(γ ls,k | y, δ, σ 2 γ ls,k , •) of the k = 1, . . ., p ls shared effects are neither tractable.Therefore, we also apply an MH-step with IWLS-proposal here.The procedure is similar to the survival specific coefficients but needs to consider the joint likelihood of both model parts.First consider the full conditional Let Z ls,k be the corresponding design matrix of effect γ ls,k and

Figure 1 :Figure 2 :
Figure1: Boxplots of mean squared error (MSE), bias and 95%-coverage for the geographic effect f geo as well as the predictors per method and simulation setting (Setting 1f geo in η ls , Setting 2f geo in η s , Setting 3f geo in η l ).The orange horizontal line marks the reference value of each statistic.

Figure 3 :
Figure 3: BayesX estimates of the smooth effect of the age of onset aoo and the geographical location on physical functioning as well as the estimated baseline hazard of the model.

Table 2 :
BayesX estimates of linear effects of physical functioning after a caesura.
being the value of γ ls,k at iteration t of the MCMC algorithm.More specifically The rest of the algorithm is analogous to the survival effects.To see how the working weights and observations build for the coefficients in the shared predictor, consider first the log-full conditional ls,k K γ ls,k γ ls,k + y (η ls ) + δ (η ls ), where y (η ls ) denotes the longitudinal part of the log-likelihood and δ (η ls ) the survival/ poisson part of the log-likelihood depending on predictor η ls = p ls k=1 Z ls,k γ ls,k (compare (3)) thus including γ ls,k .The vector of scores, i.e. first derivatives of the log-likelihoods with respect to η ls evaluated at iteration t, is log(p(γ ls,k | y, δ, σ 2 γ ls,k , •)) ∝ − γ

Table 3 :
Table 3 details descriptive measures of the run times.Statistical overview of run times of 100 replications per setting and estimation method.Method Min.1st.Qu.Median Mean 3rd.Qu.Max.