1 Introduction

In medical sciences, two types of data are usually observed simultaneously: repeated measurements and event time data. The repeated measurements are taken on the same subject at scheduled visits, and data arising from such a study design are known as longitudinal data. Event time data, also called survival data, record the time until a pre-specified event takes place. The observed repeated measurements are recorded before the occurrence of dropout or censoring.

Separate modelling techniques are available for analyzing longitudinal and event time data [1, 2], but they do not consider association patterns that may exist among different outcomes recorded on the same individuals until dropout, censoring, or the event of interest takes place. Simultaneous modelling is recommended in this regard, as it yields valid estimates by accounting for individual variability [3,4,5]. Joint modelling links the sub-models of all responses together [6,7,8].

Many studies observe more than one longitudinal outcome on the same individuals until an event of interest takes place, which leads to a multivariate setting [9, 10]. Li et al. [11] discussed binary, continuous, nominal, and ordinal types of outcomes in longitudinal studies using a mixed joint modelling strategy; event time outcomes may also be observed repeatedly [12]. These different outcomes are analyzed by combining two or more models simultaneously, such as Weibull-gamma-normal, probit-beta-normal, and Poisson-gamma-normal models.

Missing data in repeated measurements pose challenges for the analysis and all subsequent stages [13], because the relationship between outcomes and predictors depends upon the reason for missingness [14]. Missingness in longitudinal studies may arise for a variety of reasons, such as individuals losing interest in follow-up, dropout and/or death of participants, administrative failure to collect data, and censoring [15]. When data are missing, it is recommended that the missingness be taken into account in the modelling process to produce valid statistical inferences. This requires the analyst to have an idea about the process that generates the missing data, which is known as the missing data mechanism.

Rubin [16] explained in detail the different types of missing data mechanisms by making assumptions about the observed and missing data. Missingness is missing completely at random (MCAR) if the probability that a response is missing depends on neither the missing data nor the observed data. Missingness is missing at random (MAR) if the probability that a response is missing depends upon the observed data but is independent of the unobserved data. Missingness is not missing at random (NMAR) if this probability depends on the unobserved data, possibly in addition to the observed data. MCAR and MAR mechanisms are ignorable under some regularity conditions, which must be satisfied, while the NMAR mechanism is assumed to be non-ignorable [17]. The ignorability assumption requires that the process generating missingness is MAR or MCAR. Philipson et al. [18] reviewed missing data handling techniques and provided a brief comparison of methods.
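The three mechanisms can be illustrated with a small simulation. The following Python sketch is not part of the analysis in this paper; the covariate `x`, the response `y`, and all coefficients are hypothetical values chosen only to make the mechanisms visible. It deletes responses under each mechanism and reports the mean of the responses that remain.

```python
import math
import random


def expit(v):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-v))


def observed_means(n=20000, seed=1):
    """Mean of the retained responses under MCAR, MAR and NMAR deletion.

    Hypothetical setup: x is always observed, y = 0.8 * x + noise may be missing.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 0.8 * x + rng.gauss(0.0, 1.0)
        data.append((x, y))

    # Probability that y is missing under each mechanism (illustrative values)
    mechanisms = {
        "MCAR": lambda x, y: 0.3,                        # depends on nothing
        "MAR": lambda x, y: expit(-1.0 + 1.5 * x),       # depends on observed x only
        "NMAR": lambda x, y: expit(-1.0 + 1.5 * y),      # depends on the unobserved y
    }

    result = {"full": sum(y for _, y in data) / n}
    for name, prob_missing in mechanisms.items():
        kept = [y for x, y in data if rng.random() >= prob_missing(x, y)]
        result[name] = sum(kept) / len(kept)
    return result


if __name__ == "__main__":
    for name, value in observed_means().items():
        print(f"{name:5s} mean of retained y: {value: .3f}")
```

Under MCAR the retained mean stays close to the full-data mean, whereas under NMAR it is pulled downwards because large responses are more likely to be deleted. Under MAR a naive complete-case mean can still be shifted when `x` and `y` are correlated; ignorability refers to likelihood-based or Bayesian inference that uses the observed data, not to complete-case summaries.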

Discussion of the joint modelling of multivariate longitudinal and event time data with missingness is very rare in the literature. However, Njaji et al. [19] elaborated an extended shared parameter joint model framework considering MAR, using the approach of Creemers et al. [20]. In this paper, we consider the class of joint SREM proposed by Njaji et al. [19] and amend the model to account for association among continuous longitudinal, binary longitudinal, and event time responses while addressing MAR. To the best of our knowledge, no such research work has been presented to date.

The main purpose of this research article is to develop joint modelling of mixed longitudinal continuous-binary data and one event time outcome with missing covariates, using an amended SREM. Our proposed joint SREM employs mixed-effects models for longitudinal responses and a proportional hazard model for the event time response, in such a way that the random effects information is shared between the sub-models. Joint SREM is proposed by assuming that the measurement and censoring processes are independent conditional on the random effects. We adopt a Bayesian approach to obtain parameter estimates using MCMC methods. Iterative simulation of the conditional posterior distribution is performed for each parameter using the Gibbs sampler via R and OpenBUGS software [21].

This research is motivated by a dataset collected from PCa patients at one of the most renowned public hospitals in Pakistan. Patients underwent various treatments and were followed until tumor shrinkage or censoring occurred, with measurements of \({\text{PSA}}\) and \({\text{ALP}}\) taken during follow-up. While early-stage prostate cancer may not cause significant symptoms, at an advanced stage the presence of cancer can be determined by observing symptoms. If PCa is suspected based on symptoms, certain tests are required to confirm the diagnosis, and for grade detection, a crucial biomarker is the \({\text{PSA}}\) level [22,23,24].

In our research, diagnosed PCa patients are considered as study subjects and followed up until tumor shrinkage occurs, following physicians' directives with different treatments. During this period, various time-dependent and time-independent covariates are observed. PSA measurements are collected repeatedly as a continuous longitudinal response variable, while ALP measurements are collected repeatedly as a binary response variable. Tumor shrinkage is the event of interest, which is not observed for all patients due to right censoring or dropout. We propose an amended joint SREM under MAR characterization for the covariates. The SREM requires a conditional independence assumption: the measurement and the dropout processes are independent conditional on the random effects [19]. We have extended this assumption to the censoring and measurement processes, following Papageorgiou and Rizopoulos [25].

This paper is organized as follows: In Sect. 2, we introduce our motivational PCa dataset. In Sect. 3, we describe the proposed joint modelling strategy. Section 4 presents the application of the joint model to analyze the PCa dataset. The final Sect. 5 concludes with a discussion of our results.

2 Motivation: PCa dataset

The motivation of this study is based on the PCa data collected from Mayo Hospital, Lahore. Patients who were diagnosed with PCa as a primary disease were included as study subjects. Data were collected for n = 1504 patients on two longitudinal responses, \({\text{PSA}}\) and \({\text{ALP}}\), with a median follow-up time of 3.00 per patient and a range of 1 to 5. In this study, we use log(PSA) as a continuous response variable and \({\text{ALP}}\) as a binary response variable. Figure 1 shows individual trajectories for \({\text{log}}({\text{PSA}})\) in the PCa dataset. As part of this study, follow-up visits are decided by physicians based on the severity of the disease and other factors.

Fig. 1 Individual trajectories for log(PSA) in the PCa data set

Consecutive follow-up visits typically have a gap of 28 to 30 days, with the first visit involving a complete check-up and the creation of patients' record files.

Physicians prescribed blood tests over time to monitor changes in time-varying factors such as \({\text{Platelets}}\) and \({\text{Bilirubin}}\) that could affect the outcome variables, including repeated measures of \({\text{PSA}}\) and \({\text{ALP}}\), as well as the event time outcome, which is the time-to-tumor shrinkage.

Our analysis aims to investigate the association between \({\text{logPSA}}\) (mean ± sd: 1.96 ± 2.03) and \({\text{ALP}}\) (1 = high level, 0 = low level) with tumor shrinkage. Missing data occur in time-varying \({\text{Platelets}}\) and \({\text{Bilirubin}}\), with missingness rates of 61.61% and 61.57%, respectively. The primary event of interest in this study is the individuals' condition at the end of the study time, observed through tumor status (1: tumor shrinkage, 0: right-censored). Out of 1504 patients, 960 experienced events of interest, while 544 were right-censored.

Figure 2 presents the Kaplan–Meier (KM) plot for event time, separated by \({\text{Drug}}\) categories: EBRT and (ADT, prostatectomy, and combinations).

Fig. 2 KM plot for \({\text{Drug}}\) categories in the PCa data set

The primary goal of analyzing PCa data is to explore the combined impact of \({\text{PSA}}\) levels [26] and \({\text{ALP}}\) levels [27] on tumor shrinkage over time in response to \({\text{Drug}}\).

The continuous covariates, summarized as mean ± sd, are \({\text{Age}}\) (44.69 ± 13.48), \({\text{Platelets}}\) (0.69 ± 0.46), \({\text{BMI}}\) (19.44 ± 2.14), and \({\text{Bilirubin}}\) (1.46 ± 1.25). In terms of \({\text{Drug}}\) distribution, out of 1504 patients, 478 were prescribed EBRT.

For analyzing the data, categorical variables are coded into dichotomous variables. The \({\text{Drug}}\) is coded as one for (ADT, prostatectomy, and combinations), and zero for EBRT. Similarly, the \(\mathrm{Gleason Score}\) is coded as one for greater than or equal to (4 + 3) and zero for lower than or equal to (3 + 4).

The analysis aims to explore the relationships among three models: linear mixed-effects for \({\text{log}}({\text{PSA}})\), logistic mixed-effects for \({\text{ALP}}\), and the occurrence of the event "shrinkage of tumor." Additionally, a key objective of this study is to assess the impact of covariates on the outcomes. The three sub-models under investigation share the same set of covariates, including two time-varying missing covariates: \({\text{Platelets}}\) per cubic ml/1000 and serum \({\text{Bilirubin}}\) mg/dl, both recorded for individuals. Baseline covariates, observed at the study's onset, encompass the \({\text{Age}}\) of patients in years, \({\text{BMI}}\) (body mass index) of patients in kg/m2, \(\mathrm{Gleason Score}\), and \({\text{Drug}}\). The study aims to elucidate the interplay between these variables and the specified models, contributing valuable insights into the factors influencing the outcomes of interest.

3 Model formation

3.1 Joint modelling of bivariate longitudinal and event time outcomes


The model formulation comprises three sub-models: the first two describe the processes for the longitudinal outcomes, while the third pertains to the event time outcome process. Let \({{\text{y}}}_{1{\text{ij}}}\) be the continuous longitudinal response for subject \({\text{i}}\), \({\text{i}}=1,2,3,\dots ,{\text{n}}\), at time \({{\text{t}}}_{{\text{ij}}}\), \({\text{j}}=1,2,3,\dots ,{\text{m}}\), where \({\text{m}}\) is the number of repeated measurements, such that \({{\text{y}}}_{1{\text{ij}}}\) follows a linear mixed-effects model, which is written as,

$${{\text{y}}}_{1{\text{ij}}}={\upeta }_{{\text{ij}}}+{\upvarepsilon }_{{\text{ij}}},$$
(1)
$${\upeta }_{{\text{ij}}}={{\text{x}}}_{1{\text{ij}}}^{{\text{T}}}{\upbeta }_{1}+{{\text{w}}}_{1{\text{ij}}}^{T}{{\text{b}}}_{{\text{i}}},$$
(2)

where \({{\text{x}}}_{1{\text{ij}}}^{T}\) is a \({{\text{p}}}_{1}\)-dimensional vector of fixed-effects explanatory variables, \({\upbeta }_{1}\) is the vector of \({{\text{p}}}_{1}\) fixed-effects parameters, \({{\text{b}}}_{{\text{i}}}\) is a \({{\text{q}}}_{1}\)-dimensional vector of random effects, \({{\text{w}}}_{1{\text{ij}}}^{T}\) is the \({{\text{q}}}_{1}\)-dimensional design vector for the random effects, and \({\upvarepsilon }_{{\text{ij}}}\sim {\text{N}}\left(0, {\upsigma }_{\upvarepsilon }^{2}\right)\) is the error term.

Let \({{\text{y}}}_{2{\text{ij}}}\) be the binary repeated measurement for the \({\text{i}}\)th individual, \({\text{i}}=1,2,3,\dots ,{\text{n}}\), at time \({{\text{s}}}_{{\text{ij}}}\), \({\text{j}}=1,2,3,\dots ,{\text{m}}\), where \({{\text{s}}}_{{\text{ij}}}={{\text{t}}}_{{\text{ij}}}\). Given \({{\text{y}}}_{1{\text{ij}}}\), \({{\text{y}}}_{2{\text{ij}}}\) follows the logistic mixed-effects longitudinal model [28], which can be written as,

$${{\text{y}}}_{2{\text{ij}}}|{{\text{y}}}_{1{\text{ij}}}\sim {\text{Ber}}\left({\uppsi }_{{\text{ij}}}\right),$$
(3)
$${\text{logit}}\left({\uppsi }_{{\text{ij}}}\right)={{\text{x}}}_{2{\text{ij}}}^{{\text{T}}}{\upbeta }_{2}+{{\text{w}}}_{2{\text{ij}}}^{{\text{T}}}{{\text{u}}}_{{\text{i}}}+{\upgamma }_{{\text{j}}}{{\text{y}}}_{1{\text{ij}}},$$
(4)

where \({{\text{x}}}_{2{\text{ij}}}^{T}\) is a \({{\text{p}}}_{2}\)-dimensional vector of fixed-effects explanatory variables, and \({\upbeta }_{2}\) is the vector of \({{\text{p}}}_{2}\) fixed-effects parameters, which are unknown. \({{\text{u}}}_{{\text{i}}}\) is a \({{\text{q}}}_{2}\)-dimensional vector of random effects, which are unobserved, and \({{\text{w}}}_{2{\text{ij}}}^{T}\) is its design vector. The term \({\upgamma }_{{\text{j}}}\) is the association parameter, which captures the effect of the continuous response on the binary response at time \({{\text{s}}}_{{\text{ij}}}\) \(({{\text{s}}}_{{\text{ij}}}={{\text{t}}}_{{\text{ij}}})\).
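To make the dependence of the binary response on the continuous response concrete, the following Python sketch simulates one subject from Eqs. (1) to (4). It is an illustration only: it assumes design vectors of the form (1, t), random intercepts only, a single association parameter shared across visits, and hypothetical parameter values that are not estimates from this paper.

```python
import math
import random


def expit(v):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-v))


def simulate_subject(times, beta1, beta2, gamma, sigma_eps, sd_b, sd_u, rng):
    """Simulate (y1, y2, psi) for one subject under Eqs. (1)-(4).

    Illustrative assumptions: x = (1, t), a random intercept in each
    sub-model, and one gamma used at every visit.
    """
    b_i = rng.gauss(0.0, sd_b)   # random intercept of the continuous sub-model
    u_i = rng.gauss(0.0, sd_u)   # random intercept of the binary sub-model
    y1, y2, psi = [], [], []
    for t in times:
        eta = beta1[0] + beta1[1] * t + b_i                # Eq. (2)
        y1_ij = eta + rng.gauss(0.0, sigma_eps)            # Eq. (1)
        logit_psi = beta2[0] + beta2[1] * t + u_i + gamma * y1_ij  # Eq. (4)
        p = expit(logit_psi)
        y2_ij = 1 if rng.random() < p else 0               # Eq. (3)
        y1.append(y1_ij)
        y2.append(y2_ij)
        psi.append(p)
    return y1, y2, psi


if __name__ == "__main__":
    rng = random.Random(0)
    y1, y2, psi = simulate_subject(
        times=[0, 1, 2, 3, 4],
        beta1=(2.0, -0.3), beta2=(-0.5, 0.1), gamma=0.4,
        sigma_eps=1.0, sd_b=0.5, sd_u=0.5, rng=rng,
    )
    for t, (a, b, p) in enumerate(zip(y1, y2, psi)):
        print(f"visit {t}: y1 = {a: .2f}, psi = {p:.2f}, y2 = {b}")
```

A positive association parameter raises the success probability of the binary response whenever the continuous response at the same visit is large, which is the role the association parameter plays in Eq. (4).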

Let \({{\text{T}}}_{{\text{i}}}^{*}\) be the true dropout time for the \({\text{i}}\)th individual, \({\text{i}}=1,2,3,\dots ,{\text{n}}\), such that \({{\text{T}}}_{{\text{i}}}={\text{min}}({{\text{T}}}_{{\text{i}}}^{*},{{\text{C}}}_{{\text{i}}})\) represents the observed dropout or censoring time, where \({{\text{C}}}_{{\text{i}}}\) is the censoring time. \({\Delta }_{{\text{i}}}={\text{I}}({{\text{T}}}_{{\text{i}}}^{*}\le {{\text{C}}}_{{\text{i}}})\) is the event indicator, which equals \(0\) for right censoring and \(1\) for an observed event. The event time is assumed to follow a Weibull model given as,

$${{\text{T}}}_{{\text{i}}}\sim {\text{Weibull}}\left({{\text{x}}}_{3{\text{i}}}^{{\text{T}}}{\upbeta }_{3}+{{\text{b}}}_{{\text{i}}}^{*}+{{\text{u}}}_{{\text{i}}}^{*} ,{\text{r}}\right),$$
(5)

where \({{\text{x}}}_{3{\text{i}}}^{T}\) is a \({{\text{p}}}_{3}\)-dimensional vector of fixed-effects explanatory variables, \({\upbeta }_{3}\) is the vector of \({{\text{p}}}_{3}\) fixed-effects parameters, and \({\text{r}}\) is the shape parameter of the Weibull distribution. \({{\text{b}}}_{{\text{i}}}^{*}+{{\text{u}}}_{{\text{i}}}^{*}\) is the shared parameter that is associated with the random effects of the longitudinal outcomes, such that the random effects \(({{\text{b}}}_{{\text{i}}},{{\text{b}}}_{{\text{i}}}^{*})\) are assumed to follow a normal distribution with zero mean and variance–covariance matrix \({{\text{D}}}_{1}\) and are independent of \({\upvarepsilon }_{{\text{ij}}}\),

$${({\text{b}}}_{{\text{i}}},{\mathrm{ b}}_{{\text{i}}}^{*})\sim {\text{iidN}}\left(0, {{\text{D}}}_{1} \right).$$

Also,\({({\text{u}}}_{{\text{i}}},{{\text{u}}}_{{\text{i}}}^{*}) \sim \mathrm{iid N}\left(0, {{\text{D}}}_{2}\right)\), where

$${{\text{D}}}_{1}=\left(\begin{array}{cc}{{\text{D}}}_{1,11}& {{\text{D}}}_{1,12}\\ {{\text{D}}}_{1,21}& {{\text{D}}}_{1,22}\end{array}\right),\quad {{\text{D}}}_{2}=\left(\begin{array}{cc}{{\text{D}}}_{2,11}& {{\text{D}}}_{2,12}\\ {{\text{D}}}_{2,21}& {{\text{D}}}_{2,22}\end{array}\right),$$

\({{\text{D}}}_{\mathrm{1,11}}\) and \({{\text{D}}}_{\mathrm{2,11}}\) are, respectively, \({{\text{q}}}_{1}\) and \({{\text{q}}}_{2}\) dimensional matrices. Also, \({{\text{D}}}_{\mathrm{1,12}}\) and \({{\text{D}}}_{\mathrm{2,12}}\) measure the degree of association between longitudinal and event time outcomes.

The longitudinal outcomes vector \({{\text{y}}}_{{\text{i}}}=({{\text{y}}}_{1{\text{ij}}},{{\text{y}}}_{2{\text{ij}}})\) and the event time outcome \(({{\text{T}}}_{{\text{i}}},{\Delta }_{{\text{i}}})\) are independent given the random effects. The combined observed data is denoted as,

$${\text{Data}}=\left\{{{\text{y}}}_{1{\text{ij}}},{{\text{y}}}_{2{\text{ij}}},{{\text{x}}}_{1{\text{ij}}}^{T},{{\text{x}}}_{2{\text{ij}}}^{T},{{\text{w}}}_{1{\text{ij}}}^{T},{{\text{w}}}_{2{\text{ij}}}^{T},{{\text{T}}}_{{\text{i}}},{\Delta }_{{\text{i}}},{{\text{x}}}_{3{\text{i}}}^{T};\ {\text{i}}=1,2,3,\dots ,{\text{n}},\ {\text{j}}=1,2,3,\dots ,{\text{m}}\right\}.$$

Let \(\mathrm{\varnothing }\) be the vector of all unknown parameters in the joint model. The joint distribution of the observed data, conditional on the random effects, is written as,

$$\mathop \prod \limits_{{{\text{i}} = 1}}^{{\text{n}}} {\text{p}}\left( {{\text{y}}_{{\text{i}}} ,{\text{T}}_{{\text{i}}} ,{\Delta }_{{\text{i}}} {\text{|u}}_{{\text{i}}} ,{\text{u}}_{{\text{i}}}^{*} ,{\text{ b}}_{{\text{i}}} ,{\text{b}}_{{\text{i}}}^{*} ,\emptyset } \right) = \mathop \prod \limits_{{{\text{i}} = 1}}^{{\text{n}}} {\text{p}}\left( {{\text{y}}_{{1{\text{i}}}} {\text{|b}}_{{\text{i}}} ,{\upbeta }_{1} ,{\upsigma }_{\varepsilon }^{2} } \right) \times {\text{p}}\left( {{\text{y}}_{{2{\text{i}}}} {\text{|y}}_{{1{\text{i}}}} ,{\text{u}}_{{\text{i}}} ,{\upbeta }_{2} } \right) \times {\text{p}}\left( {{\text{T}}_{{\text{i}}} ,{\Delta }_{{\text{i}}} {\text{|b}}_{{\text{i}}}^{*} ,{\text{u}}_{{\text{i}}}^{*} ,{\upbeta }_{3} ,{\text{r}}} \right),$$
(6)

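The first two factors on the right-hand side of Eq. (6) can also be written as,

$${\text{p}}\left({{\text{y}}}_{1{\text{i}}}|{{\text{b}}}_{{\text{i}}},{\upbeta }_{1},{\upsigma }_{\upvarepsilon }^{2}\right)\times {\text{p}}\left({{\text{y}}}_{2{\text{i}}}|{{\text{y}}}_{1{\text{i}}},{{\text{u}}}_{{\text{i}}},{\upbeta }_{2}\right)=\prod_{{\text{j}}=1}^{{\text{m}}}{\text{p}}\left({{\text{y}}}_{1{\text{ij}}}|{{\text{b}}}_{{\text{i}}},{\upbeta }_{1},{\upsigma }_{\upvarepsilon }^{2}\right)\prod_{{\text{j}}=1}^{{\text{m}}}{\text{p}}\left({{\text{y}}}_{2{\text{ij}}}|{{\text{y}}}_{1{\text{ij}}},{{\text{u}}}_{{\text{i}}},{\upbeta }_{2}\right),$$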
(7)

where the event time contribution is \({\text{p}}\left({{\text{T}}}_{{\text{i}}},{\Delta }_{{\text{i}}}|{{\text{b}}}_{{\text{i}}}^{*},{{\text{u}}}_{{\text{i}}}^{*},{\upbeta }_{3},{\text{r}}\right)={{\text{p}}}^{{\Delta }_{{\text{i}}}}\left({{\text{T}}}_{{\text{i}}}|{{\text{b}}}_{{\text{i}}}^{*},{{\text{u}}}_{{\text{i}}}^{*},{\upbeta }_{3},{\text{r}}\right){{\text{S}}}^{1-{\Delta }_{{\text{i}}}}\left({{\text{T}}}_{{\text{i}}}|{{\text{b}}}_{{\text{i}}}^{*},{{\text{u}}}_{{\text{i}}}^{*},{\upbeta }_{3},{\text{r}}\right)\). Here, \({\text{p}}\left(.|{{\text{b}}}_{{\text{i}}}^{*},{{\text{u}}}_{{\text{i}}}^{*},{\upbeta }_{3},{\text{r}}\right)\) and \({\text{S}}(.|{{\text{b}}}_{{\text{i}}}^{*},{{\text{u}}}_{{\text{i}}}^{*},{\upbeta }_{3},{\text{r}})\) are the density and the survival functions, respectively.
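The censored event time contribution can be sketched in Python as follows. The sketch assumes the BUGS-style Weibull parameterization, with density r λ t^(r-1) exp(-λ t^r) and survival function exp(-λ t^r), where log λ equals the linear predictor plus the shared random effects; this parameterization is an assumption made for illustration, and the function name `weibull_loglik` is hypothetical.

```python
import math


def weibull_loglik(t, delta, log_lambda, r):
    """Log-likelihood contribution of one subject: delta*log f(t) + (1-delta)*log S(t).

    Assumed parameterization (BUGS style):
        f(t) = r * lam * t**(r - 1) * exp(-lam * t**r),  S(t) = exp(-lam * t**r),
    with lam = exp(log_lambda), where log_lambda stands for the linear
    predictor plus the shared random effects.
    """
    lam = math.exp(log_lambda)
    log_surv = -lam * t ** r
    log_dens = math.log(r) + log_lambda + (r - 1.0) * math.log(t) + log_surv
    return delta * log_dens + (1 - delta) * log_surv


if __name__ == "__main__":
    # Hypothetical subject: follow-up time 2, rate 0.5, shape 1
    print(weibull_loglik(2.0, 1, math.log(0.5), 1.0))  # observed event
    print(weibull_loglik(2.0, 0, math.log(0.5), 1.0))  # right-censored
```

An observed event contributes the log density, while a right-censored subject contributes only the log survival probability up to the censoring time.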

3.2 Bayesian computation of joint model

Parameters are estimated with a Bayesian approach by specifying priors for the unknown parameters, which yields the joint posterior density of all unknown quantities given the observed data as,

$${\text{p}}\left(\varnothing ,{{\text{u}}}_{{\text{i}}},{{\text{u}}}_{{\text{i}}}^{*},{{\text{b}}}_{{\text{i}}},{{\text{b}}}_{{\text{i}}}^{*}|{{\text{y}}}_{1{\text{i}}}^{\prime},{{\text{y}}}_{2{\text{i}}}^{\prime},{{\text{T}}}_{{\text{i}}},{\Delta }_{{\text{i}}}\right)\propto \prod_{{\text{i}}=1}^{{\text{n}}}\left\{\left[\prod_{{\text{j}}=1}^{{\text{m}}}\Phi \left({{\text{y}}}_{1{\text{ij}}};{{\text{x}}}_{1{\text{ij}}}^{{\text{T}}}{\upbeta }_{1}+{{\text{w}}}_{1{\text{ij}}}^{{\text{T}}}{{\text{b}}}_{{\text{i}}},{\upsigma }_{\upvarepsilon }^{2}\right)\times \frac{{\text{exp}}\left({{\text{y}}}_{2{\text{ij}}}\left({{\text{x}}}_{2{\text{ij}}}^{{\text{T}}}{\upbeta }_{2}+{{\text{w}}}_{2{\text{ij}}}^{{\text{T}}}{{\text{u}}}_{{\text{i}}}+{\upgamma }_{{\text{j}}}{{\text{y}}}_{1{\text{ij}}}\right)\right)}{1+{\text{exp}}\left({{\text{x}}}_{2{\text{ij}}}^{{\text{T}}}{\upbeta }_{2}+{{\text{w}}}_{2{\text{ij}}}^{{\text{T}}}{{\text{u}}}_{{\text{i}}}+{\upgamma }_{{\text{j}}}{{\text{y}}}_{1{\text{ij}}}\right)}\right]\times {\text{p}}\left({{\text{T}}}_{{\text{i}}},{\Delta }_{{\text{i}}}|{{\text{b}}}_{{\text{i}}}^{*},{{\text{u}}}_{{\text{i}}}^{*},{\upbeta }_{3},{\text{r}}\right)\times \Phi \left(({{\text{b}}}_{{\text{i}}},{{\text{b}}}_{{\text{i}}}^{*});0,{{\text{D}}}_{1}\right)\times \Phi \left(({{\text{u}}}_{{\text{i}}},{{\text{u}}}_{{\text{i}}}^{*});0,{{\text{D}}}_{2}\right)\right\}\times {\text{p}}\left(\varnothing \right),$$
(8)

where \(\Phi (.,\mu ,{\text{D}})\) is the density function of a multivariate normal distribution with mean \(\mu\) and covariance matrix \({\text{D}}\), and \({\text{p}}\left(\varnothing \right)\) is the joint prior distribution of the parameters. The prior distributions for the unknown parameters are,

$${\upbeta }_{1}\sim {{\text{N}}}_{{\text{p}}1}^{\prime}\left({\upmu }_{{\upbeta }_{1}},{\sum }_{{\upbeta }_{1}}\right),$$
$${\upbeta }_{2}\sim {{\text{N}}}_{{\text{p}}2}^{\prime}\left({\upmu }_{{\upbeta }_{2}},{\sum }_{{\upbeta }_{2}}\right),$$
$${\upbeta }_{3}\sim {{\text{N}}}_{{\text{p}}3}^{\prime}\left({\upmu }_{{\upbeta }_{3}},{\sum }_{{\upbeta }_{3}}\right),$$
$${\upsigma }^{2}\sim \mathrm{I\Gamma }\left({{\text{a}}}_{{\upsigma }^{2}},{{\text{b}}}_{{\upsigma }^{2}}\right),$$
$${{\text{D}}}_{1}\sim \mathrm{IWishart }\left({\uppsi }_{{{\text{D}}}_{1}},{{\text{v}}}_{{{\text{D}}}_{1}}\right),$$
$${{\text{D}}}_{2}\sim \mathrm{IWishart }\left({\uppsi }_{{{\text{D}}}_{2}},{{\text{v}}}_{{{\text{D}}}_{2}}\right),$$
$${\text{r}}\sim\Gamma \left({{\text{a}}}_{{\text{r}}},{{\text{b}}}_{{\text{r}}}\right),$$
$${\upgamma }_{{\text{j}}}\sim {\text{N}}\left({\upmu }_{{\upgamma }_{{\text{j}}}},{\upsigma }_{{\upgamma }_{{\text{j}}}}^{2}\right),{\text{j}}=\mathrm{1,2},3,\dots ,{\text{m}},$$

where \(\mathrm{I\Gamma }\left({\text{a}},{\text{b}}\right)\) and \(\Gamma \left({\text{a}},{\text{b}}\right)\), respectively, denote the inverse gamma distribution and the gamma distribution with shape parameter \({\text{a}}\) and scale parameter \({\text{b}}\). \(\mathrm{IWishart }\left(\uppsi ,{\text{v}}\right)\) represents the inverse Wishart distribution with scale matrix \(\uppsi\) and degrees of freedom \({\text{v}}\), and \({{\text{N}}}_{{\text{p}}}\left(\upmu , \sum \right)\) denotes a normal distribution with mean vector \(\upmu\) and covariance matrix \(\sum\). The hyperparameters are assumed to be known and chosen so that all priors are proper; we assigned low-informative prior distributions to all the parameters, as no previous knowledge is available for the elicitation of informative priors. Additionally, we utilize MCMC methods, including the Gibbs sampler and the Metropolis–Hastings algorithm, to iteratively draw samples from the conditional posterior distributions.
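As a toy illustration of a Metropolis–Hastings update, the following Python sketch samples only the Weibull shape parameter r from its posterior, holding the rate fixed. This is a deliberately reduced problem and not the sampler used in this paper (estimation here relies on OpenBUGS); the simulated data, the fixed rate `lam`, the hyperparameters `a_r` and `b_r`, and the proposal step size are all hypothetical.

```python
import math
import random


def log_posterior_shape(r, times, lam, a_r, b_r):
    """Unnormalized log posterior of the Weibull shape r (rate lam held fixed)."""
    if r <= 0.0:
        return -math.inf
    # Assumed density: f(t) = r * lam * t**(r - 1) * exp(-lam * t**r)
    loglik = sum(
        math.log(r) + math.log(lam) + (r - 1.0) * math.log(t) - lam * t ** r
        for t in times
    )
    # Gamma(shape a_r, scale b_r) prior on r, up to an additive constant
    logprior = (a_r - 1.0) * math.log(r) - r / b_r
    return loglik + logprior


def metropolis_shape(times, lam=1.0, a_r=1.0, b_r=10.0,
                     n_iter=3000, burn_in=1500, step=0.1, seed=7):
    """Random-walk Metropolis sampler for r; returns the post-burn-in draws."""
    rng = random.Random(seed)
    r = 1.0
    current = log_posterior_shape(r, times, lam, a_r, b_r)
    draws = []
    for iteration in range(n_iter):
        proposal = r + rng.gauss(0.0, step)            # symmetric proposal
        candidate = log_posterior_shape(proposal, times, lam, a_r, b_r)
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            r, current = proposal, candidate
        if iteration >= burn_in:
            draws.append(r)
    return draws


if __name__ == "__main__":
    data_rng = random.Random(3)
    # Simulated uncensored event times with true shape 1.5 and rate 1
    times = [data_rng.weibullvariate(1.0, 1.5) for _ in range(400)]
    draws = metropolis_shape(times)
    print("posterior mean of r:", sum(draws) / len(draws))
```

In the full joint model, an update of this kind would be applied to each parameter whose conditional posterior has no closed form, while parameters with conjugate conditionals can be drawn directly within the Gibbs sampler.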

3.3 Joint modelling with missing covariates

Specifying a model for missing covariates is essential, yet there is a limited amount of literature available on this topic [29, 30]. For ignorable missing covariates, Hartley and Hocking [31] applied the likelihood factorization method, while Schafer [32] discussed all the standard available techniques to handle incomplete multivariate data. Additionally, Little and Rubin [17] elaborated on addressing missing data problems using observed data likelihood techniques.

Let \({{\text{z}}}_{{\text{ik}}},\mathrm{ k}=1,\dots ,{{\text{K}}}_{1}\) be the time-invariant covariates with missing values. A generalized linear model (GLM) is considered for modelling them, as follows

$${\text{p}}\left({{\text{z}}}_{{\text{ik}}}|{\uptheta }_{1{\text{ik}}},{\uptau }_{1{\text{k}}}\right)={\text{exp}}\left\{{{\uptau }_{1{\text{k}}}}^{-1}\left({{\text{z}}}_{{\text{ik}}}{\uptheta }_{1{\text{ik}}}-{{\text{h}}}_{1{\text{k}}}\left({\uptheta }_{1{\text{ik}}}\right)\right)+{{\text{C}}}_{1{\text{k}}}\left({{\text{z}}}_{{\text{ik}}},{\uptau }_{1{\text{k}}}\right)\right\},$$
(9)

where \({\uptau }_{1{\text{k}}}\) is the dispersion parameter, and \({{\text{h}}}_{1{\text{k}}}\)(.) and \({{\text{C}}}_{1{\text{k}}}\)(.,.) are known functions. The formulation of the generalized linear model is completed by,

$$E\left( {{\text{z}}_{{{\text{ik}}}} } \right) = {\text{h}}_{{1{\text{k}}}} {^{\prime}}\left( {{\uptheta }_{{1{\text{ik}}}} } \right) = {\text{h}}_{{1{\text{k}}}} {^{\prime}}\left( {{\text{x}}_{{4{\text{ik}}}}^{{\text{T}}} \vartheta_{{1{\text{k}}}} } \right),$$
(10)

where the link function \({{\text{h}}}_{1{\text{k}}}^{\prime}\)(.) in (10) is the derivative of the function \({{\text{h}}}_{1{\text{k}}}\)(.) in (9), and \({{\text{x}}}_{4{\text{ik}}}^{T}\) is a vector of covariates for the regression coefficients \({\vartheta }_{1{\text{k}}}\). Also, let \({{\text{s}}}_{{\text{ijk}}},\ {\text{k}}=1,\dots ,{{\text{K}}}_{2}\), be the time-varying covariates with missing values. For those, a generalized linear mixed effects model (GLME) is considered as follows,

$${\text{p}}\left({{\text{s}}}_{{\text{ijk}}}|{\uptheta }_{2{\text{ijk}}},{\uptau }_{2{\text{k}}}\right)={\text{exp}}\left\{{{\uptau }_{2{\text{k}}}}^{-1}\left({{\text{s}}}_{{\text{ijk}}}{\uptheta }_{2{\text{ijk}}}-{{\text{h}}}_{2{\text{k}}}\left({\uptheta }_{2{\text{ijk}}}\right)\right)+{{\text{C}}}_{2{\text{k}}}\left({{\text{s}}}_{{\text{ijk}}},{\uptau }_{2{\text{k}}}\right)\right\}.$$

where \({\uptau }_{2{\text{k}}}\) is the dispersion parameter, and \({{\text{h}}}_{2{\text{k}}}\)(.) and \({{\text{C}}}_{2{\text{k}}}\)(.,.) are known functions. The formulation of the generalized linear mixed model is considered by,

$${\text{E}}\left( {{\text{s}}_{{{\text{ijk}}}} } \right) = {\text{h}}_{{2{\text{k}}}} {^{\prime}}\left( {{\uptheta }_{{2{\text{ijk}}}} } \right) = {\text{h}}_{{2{\text{k}}}} {^{\prime}}\left( {{\text{x}}_{{5{\text{ijk}}}}^{{\text{T}}} \vartheta_{{2{\text{k}}}} + {\text{w}}_{{3{\text{ijk}}}}^{T} {\text{v}}_{{{\text{ik}}}} } \right),$$

where the link function \({{\text{h}}}_{2{\text{k}}}^{\prime}\)(.) is the derivative of the function \({{\text{h}}}_{2{\text{k}}}\)(.), \({{\text{x}}}_{5{\text{ijk}}}^{T}\) is a vector of covariates for the regression coefficients \({\vartheta }_{2{\text{k}}}\), \({{\text{v}}}_{{\text{ik}}} \sim \mathrm{iid N}\left(0, {D}_{3{\text{k}}}\right)\) are the random effects, and \({{\text{w}}}_{3{\text{ijk}}}^{T}\) is the corresponding design vector.

These additional assumptions pertain to the linear predictor of time-invariant covariates or the value of the covariate at the current time in longitudinal sub-models. However, for time-varying covariates, we consider the value of the linear predictor at the first observed time as a covariate for the event time sub-model. Additionally, we need to include the following term in the conditional posterior distribution (in Eq. 8),

$$\prod_{{\text{i}}=1}^{{\text{n}}}\left[\prod_{{\text{k}}=1}^{{{\text{K}}}_{1}}{\text{p}}\left({{\text{z}}}_{{\text{ik}}}|{\uptheta }_{1{\text{ik}}},{\uptau }_{1{\text{k}}}\right)\times \prod_{{\text{k}}=1}^{{{\text{K}}}_{2}}\left\{\prod_{{\text{j}}=1}^{{\text{m}}}{\text{p}}\left({{\text{s}}}_{{\text{ijk}}}|{\uptheta }_{2{\text{ijk}}},{\uptau }_{2{\text{k}}},{{\text{v}}}_{{\text{ik}}}\right)\right\}\Phi \left({{\text{v}}}_{{\text{ik}}};0, {D}_{3{\text{k}}}\right)\right]\times {\text{p}}\left(\Theta \right),$$

where \(\Theta\) is the vector of all unknown parameters in the missing covariate modelling.

4 Analysis of PCa data

This section illustrates the analysis of the PCa data described in Sect. 2. To begin, we address models for the time-varying missing covariates \({\text{Platelets}}\) and \({\text{Bilirubin}}\). In this context, we employ two linear mixed-effects models with random intercepts, treating time as a fixed effect:

$${{\text{Platelets}}}_{{\text{ij}}}={\upmu }_{ij}^{{\text{Platelets}}}+{\varepsilon }_{ij}^{{\text{Platelets}}},\quad {\text{i}}=1,\ldots ,{\text{n}}=1504,\ {\text{j}}=1,\ldots ,{\text{m}}=5,$$
(11)
$${\upmu }_{ij}^{{\text{Platelets}}}={\varsigma }_{11}+{\varsigma }_{12}{{\text{t}}}_{{\text{ij}}}+{v}_{1{\text{i}}},$$
$${{\text{Bilirubin}}}_{{\text{ij}}}={\upmu }_{ij}^{{\text{Bilirubin}}}+{\varepsilon }_{ij}^{{\text{Bilirubin}}},$$
(12)
$${\upmu }_{ij}^{{\text{Bilirubin}}}={\varsigma }_{21}+{\varsigma }_{22}{{\text{t}}}_{{\text{ij}}}+{v}_{2{\text{i}}},$$

where \({\upvarepsilon }_{{\text{ij}}}^{{\text{Platelets}}}\sim {\text{N}}\left(0,{\upsigma }_{{\text{Platelets}}}^{2}\right)\), \({\upvarepsilon }_{{\text{ij}}}^{{\text{Bilirubin}}}\sim {\text{N}}(0,{\upsigma }_{{\text{Bilirubin}}}^{2})\), \({v}_{1{\text{i}}}\sim {\text{N}}\left(0,{\uptau }_{{\text{Platelets}}}^{2}\right)\), \({v}_{2{\text{i}}}\sim {\text{N}}(0,{\uptau }_{{\text{Bilirubin}}}^{2})\), \({\mathbf{\varsigma }}_{1}=({\varsigma }_{11},{\varsigma }_{12})\), \({\mathbf{\varsigma }}_{2}=({\varsigma }_{21},{\varsigma }_{22})\), and \({\mathbf{\varsigma }}_{1},{\mathbf{\varsigma }}_{2}\sim {N}_{2}(0,1000{{\varvec{I}}}_{2})\), where \({{\varvec{I}}}_{2}\) is the 2 × 2 identity matrix and \({\upsigma }_{{\text{Platelets}}}^{2},{\upsigma }_{{\text{Bilirubin}}}^{2},{\uptau }_{{\text{Platelets}}}^{2},{\uptau }_{{\text{Bilirubin}}}^{2}\sim \mathrm{I\Gamma }\left(0.1,0.1\right)\).

The joint modelling comprises three sub-models. For the continuous outcome \({\text{log}}({\text{PSA}})\), a linear mixed-effects model (LMM) with a random intercept and slope is specified:

$${\text{log}}\left({{\text{PSA}}}_{{\text{ij}}}\right)={\upmu }_{ij}^{{\text{PSA}}}+{\upvarepsilon }_{{\text{ij}}},$$

where,

$${\upmu }_{ij}^{{\text{PSA}}}={\upbeta }_{11}+{\upbeta }_{12}{{\text{t}}}_{{\text{ij}}}+{\upbeta }_{13}{{\text{Age}}}_{{\text{i}}}+{\upbeta }_{14}{\upmu }_{ij}^{{\text{Platelets}}}+{\upbeta }_{15}{{\text{BMI}}}_{{\text{i}}}+{\upbeta }_{16}{\upmu }_{ij}^{{\text{Bilirubin}}}+{\upbeta }_{17}{\mathrm{Gleason Score}}_{{\text{i}}}+{\upbeta }_{18}{{\text{Drug}}}_{{\text{i}}}+{{\text{b}}}_{1{\text{i}}}+{{\text{b}}}_{2{\text{i}}}{{\text{t}}}_{{\text{ij}}},$$

and \({\upvarepsilon }_{{\text{ij}}}\sim {\text{N}}(0,{\upsigma }_{\upvarepsilon }^{2})\). In this model, we cannot use the values of \({\text{Platelets}}\) and \({\text{Bilirubin}}\) directly due to missingness. Instead, we use the values of their linear predictors at time j based on models (11) and (12), respectively. The second longitudinal outcome is the binary \({\text{ALP}}\), for which a logistic mixed-effects regression model is applied as follows:

$${\text{logit}}\left({\text{P}}\left({{\text{ALP}}}_{{\text{ij}}}=1\right)\right){=\upmu }_{ij}^{{\text{ALP}}},$$

where,

$${\upmu }_{ij}^{{\text{ALP}}}={\upbeta }_{21}+{\upbeta }_{22}{{\text{s}}}_{{\text{ij}}}+{\upbeta }_{23}{{\text{Age}}}_{{\text{i}}}+{\upbeta }_{24}{\upmu }_{ij}^{{\text{Platelets}}}+{\upbeta }_{25}{{\text{BMI}}}_{{\text{i}}}+{\upbeta }_{26}{\upmu }_{ij}^{{\text{Bilirubin}}}+{\upbeta }_{27}{\mathrm{Gleason Score}}_{{\text{i}}}+{\upbeta }_{28}{{\text{Drug}}}_{{\text{i}}}+{\upgamma }_{{\text{j}}}{\upmu }_{ij}^{{\text{PSA}}}{+{\text{u}}}_{1{\text{i}}}+{{\text{u}}}_{2{\text{i}}}{{\text{s}}}_{{\text{ij}}}.$$

As in the model for the continuous outcome, the values of the linear predictors of \({\text{Platelets}}\) and \({\text{Bilirubin}}\) are used as time-varying covariates in this model.
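The substitution of linear predictors for the incomplete covariates can be sketched in Python as follows. The function names and every coefficient value in the usage example are hypothetical and chosen only to show the order of the computation: the covariate predictor is evaluated first and then plugged into the \({\text{log}}({\text{PSA}})\) mean and the \({\text{ALP}}\) probability.

```python
import math


def covariate_predictor(intercept, slope, t, random_intercept):
    """Linear predictor of a time-varying covariate, as in models (11) and (12)."""
    return intercept + slope * t + random_intercept


def log_psa_mean(beta, t, age, mu_platelets, bmi, mu_bilirubin, gleason, drug, b1, b2):
    """Mean of log(PSA) with covariate predictors in place of observed values."""
    return (beta[0] + beta[1] * t + beta[2] * age + beta[3] * mu_platelets
            + beta[4] * bmi + beta[5] * mu_bilirubin + beta[6] * gleason
            + beta[7] * drug + b1 + b2 * t)


def alp_probability(beta, gamma_j, s, age, mu_platelets, bmi, mu_bilirubin,
                    gleason, drug, mu_psa, u1, u2):
    """P(ALP = 1) from the logistic mixed-effects sub-model."""
    linear = (beta[0] + beta[1] * s + beta[2] * age + beta[3] * mu_platelets
              + beta[4] * bmi + beta[5] * mu_bilirubin + beta[6] * gleason
              + beta[7] * drug + gamma_j * mu_psa + u1 + u2 * s)
    return 1.0 / (1.0 + math.exp(-linear))


if __name__ == "__main__":
    # Hypothetical values for one patient at visit time t = 2
    mu_plat = covariate_predictor(0.70, -0.01, 2, 0.05)
    mu_bili = covariate_predictor(1.40, 0.02, 2, -0.10)
    beta_psa = (1.0, -0.2, 0.0, 2.0, 0.0, 0.1, 0.3, 0.2)
    mu_psa = log_psa_mean(beta_psa, 2, 45, mu_plat, 19.4, mu_bili, 1, 0, 0.0, 0.0)
    beta_alp = (-1.0, 0.1, 0.0, 0.5, 0.0, 0.2, 0.4, 0.1)
    print(alp_probability(beta_alp, 0.3, 2, 45, mu_plat, 19.4, mu_bili, 1, 0, mu_psa, 0.0, 0.0))
```

Because the predictors are functions of model parameters and random effects, they are defined at every visit, including visits where the covariate itself was not recorded.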

A Weibull model is specified for time to tumor shrinkage, such that \({{\text{T}}}_{{\text{i}}}\sim {\text{Weibull}}\left({\uplambda }_{{\text{i}}}^{{\text{t}}},{\text{r}}\right),\) where \({\uplambda }_{{\text{i}}}^{{\text{t}}}\) is given as follows:

$${\text{log}}\left({\uplambda }_{{\text{i}}}^{{\text{t}}}\right)={\upbeta }_{31}+{\upbeta }_{32}{{\text{Age}}}_{{\text{i}}}+{\upbeta }_{33}{\upmu }_{i1}^{{\text{Platelets}}}+{\upbeta }_{34}{{\text{BMI}}}_{{\text{i}}}+{\upbeta }_{35}{\upmu }_{i1}^{{\text{Bilirubin}}}+{\upbeta }_{36}{\mathrm{Gleason Score}}_{{\text{i}}}+{\upbeta }_{37}{{\text{Drug}}}_{{\text{i}}}+{{\text{u}}}_{3{\text{i}}}+{{\text{b}}}_{3{\text{i}}}.$$

In the event time sub-model, the baseline values of the linear predictors of \({\text{Platelets}}\) and \({\text{Bilirubin}}\) are used instead of the observed covariate values.

Also, \({{\text{b}}}_{{\text{i}}}=\left({{\text{b}}}_{1{\text{i}}},{{\text{b}}}_{2{\text{i}}},{{\text{b}}}_{3{\text{i}}}\right)\sim {\text{MVN}}\left(\left(0,0,0\right),{\Sigma }_{{\text{b}}}\right)\) and \({{\text{u}}}_{{\text{i}}}=\left({{\text{u}}}_{1{\text{i}}},{{\text{u}}}_{2{\text{i}}},{{\text{u}}}_{3{\text{i}}}\right)\sim {\text{MVN}}\left(\left(0,0,0\right),{\Sigma }_{{\text{u}}}\right)\), \(i=1,\dots ,n\), where \({\Sigma }_{{\text{b}}}\sim \mathrm{IWishart }\left({\uppsi }_{{\Sigma }_{{\text{b}}}},{{\text{v}}}_{{\Sigma }_{{\text{b}}}}\right)\) and \({\Sigma }_{{\text{u}}}\sim \mathrm{IWishart }\left({\uppsi }_{{\Sigma }_{{\text{u}}}},{{\text{v}}}_{{\Sigma }_{{\text{u}}}}\right)\), with hyperparameters \({\uppsi }_{{\Sigma }_{{\text{b}}}}={\uppsi }_{{\Sigma }_{{\text{u}}}}={{\text{I}}}_{2}\) and \({{\text{v}}}_{{\Sigma }_{{\text{b}}}}={{\text{v}}}_{{\Sigma }_{{\text{u}}}}=3\), which lead to low-informative priors.

For the remaining priors, we assume \({\upsigma }_{\upvarepsilon }^{2}\sim \mathrm{I\Gamma }\left(0.1,0.1\right)\). The regression coefficients \({\upbeta }_{11},\dots ,{\upbeta }_{18}\), \({\upbeta }_{21},\dots ,{\upbeta }_{28}\), and \({\upbeta }_{31},\dots ,{\upbeta }_{37}\) are unknown fixed-effect parameters, each with an N(0,1000) prior. The association parameter \({\upgamma }_{{\text{j}}}\), \(j=1,\dots ,m\), which links the continuous longitudinal outcome \({\text{log}}\left(\text{PSA}\right)\) to the binary longitudinal outcome \({\text{ALP}}\) at time \({\text{j}}\), is also given an N(0,1000) prior.

In addition to the proposed joint model, separate models were fitted to the data for comparison. Models were estimated by MCMC using R2OpenBUGS, with two parallel chains of 10,000 iterations each, the first 5000 of which were discarded as burn-in. Models are compared using the deviance information criterion (DIC), for which smaller values indicate a better trade-off between fit and complexity. The DIC is 39,920 for the proposed joint model and 51,290 for the separate models, so the proposed joint model is preferred. The results for these two models are summarized in Tables 1, 2 and 3.
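The burn-in and DIC steps can be illustrated on a toy problem. The sketch below assumes the standard definition \(\text{DIC}=\bar{D}+p_{D}\) with \(p_{D}=\bar{D}-D(\bar{\uptheta })\), and applies it to a hypothetical normal-mean model whose "chains" are drawn directly from a known posterior. It is not the joint model of this paper; the data, the function names, and the toy DIC are illustrative only. The two DIC values in the final lines are the ones reported in the text.

```python
import math
import random
import statistics


def deviance(mu, data, sigma=1.0):
    """Deviance, -2 * log-likelihood, of i.i.d. N(mu, sigma^2) observations."""
    n = len(data)
    ss = sum((y - mu) ** 2 for y in data)
    return n * math.log(2.0 * math.pi * sigma ** 2) + ss / sigma ** 2


def dic(draws, data):
    """Return (DIC, p_D) with DIC = D_bar + p_D and p_D = D_bar - D(posterior mean)."""
    d_bar = statistics.fmean([deviance(mu, data) for mu in draws])
    d_hat = deviance(statistics.fmean(draws), data)
    p_d = d_bar - d_hat
    return d_bar + p_d, p_d


rng = random.Random(42)
data = [rng.gauss(2.0, 1.0) for _ in range(50)]  # simulated toy data

# Stand-in for MCMC output: with sigma = 1 and a flat prior the posterior of mu is
# N(mean(data), 1/n), so each "chain" is drawn from it directly.
post_mean, post_sd = statistics.fmean(data), 1.0 / math.sqrt(len(data))
chains = [[rng.gauss(post_mean, post_sd) for _ in range(10_000)] for _ in range(2)]

burn_in = 5_000
kept = [mu for chain in chains for mu in chain[burn_in:]]  # pooled retained draws

toy_dic, p_d = dic(kept, data)
print(f"retained draws: {len(kept)}, toy DIC: {toy_dic:.1f}, p_D: {p_d:.2f}")

# DIC values reported in the text; the smaller value identifies the preferred model.
reported = {"joint": 39_920, "separate": 51_290}
preferred = min(reported, key=reported.get)
print(preferred, reported["separate"] - reported["joint"])
```

In the toy model the effective number of parameters \(p_{D}\) comes out close to one, as expected for a single unknown mean. The sketch assumes no thinning, so two chains with 5000 retained iterations each give 10,000 pooled draws.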

Table 1 Parameter estimates (Mean), standard deviation (sd), and 95% credible interval (CI) of PCa data, by applying our proposed joint SREM for the characterization of MAR and separate models
Table 2 Parameter estimates (Mean), standard deviation (sd), and 95% credible interval (CI) of PCa data for the components of variance of the random effects by applying our proposed joint SREM for the characterization of MAR and separate models
Table 3 Parameter estimates (Mean), standard deviation (sd), and 95% credible interval (CI) of PCa data for modelling \({\text{Platelets}}\) and \({\text{Bilirubin}}\)

Table 1 presents the posterior mean, standard deviation (sd), and 95% CI for each parameter in the PCa data analysis. In both the joint and separate models, \({\text{PSA}}\) decreases with every unit increase in time. \({\text{Age}}\) has a significant effect on \({\text{PSA}}\): a one-unit increase in \({\text{Age}}\) is associated with lower \({\text{PSA}}\) levels during follow-up. An increase in \({\text{Platelets}}\) is associated with an increase in \({\text{PSA}}\), whereas a one-unit increase in \({\text{BMI}}\) decreases \({\text{PSA}}\) by 0.041. \({\text{PSA}}\) is higher among patients with a \(\text{Gleason Score}\) greater than or equal to (4 + 3) than among those with a \(\text{Gleason Score}\) lower than (3 + 4). Patients who received ADT, prostatectomy, and combinations have significantly higher \({\text{PSA}}\) levels than those who received EBRT.
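The quantities reported in Tables 1 to 3 are summaries of the retained MCMC draws. As a minimal sketch, assuming equal-tailed 95% intervals and reading "significant" as "the 95% CI excludes zero" (an assumption, since the text does not define the term), the summaries for a single coefficient could be computed as follows. The draws are simulated placeholders, not output from the fitted models, and `summarize` is an illustrative name.

```python
import random
import statistics


def summarize(draws):
    """Posterior mean, sd, and equal-tailed 95% credible interval from MCMC draws."""
    cuts = statistics.quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    lower, upper = cuts[0], cuts[-1]
    return {
        "mean": statistics.fmean(draws),
        "sd": statistics.stdev(draws),
        "ci": (lower, upper),
        # Assumed reading of "significant": the 95% CI does not contain zero.
        "excludes_zero": lower > 0.0 or upper < 0.0,
    }


rng = random.Random(7)
# Hypothetical retained draws for one coefficient (not the paper's MCMC output).
draws = [rng.gauss(-0.04, 0.01) for _ in range(10_000)]
summary = summarize(draws)
print(summary)
```

A coefficient whose interval lies entirely below zero would be reported as a significant negative effect under this reading, and one whose interval straddles zero would not.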

Based on the results of the binary outcome, an increase in \({\text{BMI}}\) is associated with a significant decrease in \({\text{ALP}}\), while \({\text{ALP}}\) shows a significant increase with an increase in \({\text{Bilirubin}}\). The analysis of shared parameters indicates a positive association between continuous and binary longitudinal outcomes.

For the event time model, the risk of tumor shrinkage increases with time in patients with larger values of \({\text{BMI}}\) and \({\text{Bilirubin}}\). The hazard of tumor shrinkage is higher among patients who received EBRT than among those who received ADT, prostatectomy, and combinations.

To assess the dependence between the longitudinal markers and the event time outcome, the significance of \({\upsigma }_{{\text{u}}13}\), \({\upsigma }_{{\text{u}}33}\), \({\upsigma }_{{\text{b}}23}\), and \({\upsigma }_{{\text{b}}33}\) is examined (see Table 2). As \({\upsigma }_{{\text{u}}33}\) is significant, it confirms the association between the longitudinal markers and the event time outcome. Additionally, Table 3 presents the parameter estimates for the models of the covariates with missing values, \({\text{Platelets}}\) and \({\text{Bilirubin}}\), which are used to account for the missing data.

5 Conclusions

We have proposed an amended joint SREM with model-based handling of MAR, under the assumptions of non-informative censoring and ignorability. The approach considers longitudinal outcomes of different types (continuous and binary) alongside an event time outcome, and fits the resulting joint model of multivariate mixed longitudinal measurements and event time in a Bayesian framework using MCMC.

Our main purpose is to contribute to the understanding of PCa progression, which affects a large percentage of men. Accounting for missing observations in time-varying covariates increases the complexity of the data analysis. This work aims in particular to determine which factors affect the shrinkage of PCa tumors during treatment and how these factors interact over time [33]. Two longitudinal outcomes, \({\text{PSA}}\) and \({\text{ALP}}\), are modeled along with tumor shrinkage using a joint modelling framework, and the association among the three responses is taken into account. The analysis examines the effects on the longitudinal and event time responses of two kinds of variables: factors that do not change with time and are completely observed for all patients, and variables whose values change over time and are not observed at every time point for every patient. A joint modelling strategy is adopted to understand the joint evolution of the repeated measurements of \({\text{PSA}}\) and \({\text{ALP}}\) and the time-to-event (\({\text{TTE}}\)) process at the individual level. In contrast, separate analyses may produce biased estimates and inefficient results [6].

The linear mixed-effects models presented here are based on the assumption that dropout is ignorable (MAR). Substantial evidence indicates an association between \({\text{PSA}}\) levels and \({\text{ALP}}\), suggesting increased tumor shrinkage following treatment.

This article focuses on the MAR characterization; non-ignorable missingness (NMAR) is left for future work. Instead of the SREM, one can explore alternatives such as pattern-mixture and selection models for handling non-random missing data [18]. Additionally, a probit model may be used instead of a logistic mixed-effects model to analyze the \({\text{ALP}}\) response variable. For the random effects and error terms, various distributional choices are available beyond the normal distribution. In cases where historical data or strong prior information is available, informative priors can be employed.