A general piecewise multi-state survival model: application to breast cancer


Multi-state models are used in survival analysis to model illnesses that evolve through several stages over time. They can be built with several techniques, including non-parametric and semi-parametric methods and stochastic processes, particularly Markov processes. When the development of an illness is analysed, its progression is tracked periodically: medical reviews take place at discrete times, which gives rise to panel data. In this paper, a discrete-time piecewise non-homogeneous Markov process is constructed for modelling and analysing a multi-state illness with a general number of states. The model is built, and relevant measures, such as the survival function, transition probabilities, mean total times spent in a group of states and the conditional probability of state change, are determined. A likelihood function is built to estimate the parameters and the general number of cut-points included in the model. Time-dependent covariates are introduced, the results are obtained in matrix-algebraic form and the algorithms are shown. The model is applied to analyse the behaviour of breast cancer through a study of the relapse and survival times of 300 breast cancer patients who have undergone mastectomy. The results of this paper are implemented computationally with MATLAB and R.




  1. The degrees of freedom are determined by 7 possible transitions (1 → 1, 1 → 2, 1 → 3, 1 → C, 2 → 2, 2 → 3, 2 → C), 3 periods, 8 groups of patients divided by treatment regimen and 35 estimated parameters: (7 − 1) × (3 − 1) × (8 − 1) − 35 = 84 − 35 = 49.
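
The count in this note can be checked with a few lines of Python. This is only an arithmetic check; the variable names are illustrative and do not come from the paper.

```python
# Degrees of freedom as described in the note above.
n_transitions = 7   # 1→1, 1→2, 1→3, 1→C, 2→2, 2→3, 2→C
n_periods = 3
n_groups = 8        # groups of patients by treatment regimen
n_parameters = 35   # estimated parameters

cells = (n_transitions - 1) * (n_periods - 1) * (n_groups - 1)
degrees_of_freedom = cells - n_parameters
print(cells, degrees_of_freedom)  # 84 49
```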




Funding was provided by Ministerio de Economía y Competitividad (Grant No. FQM-307), European Regional Development Fund (ERDF) (Grant No. MTM2017-88708-P), University of Milano-Bicocca (Grant No. 2014-ATE-0228).

Author information



Corresponding author

Correspondence to Mariangela Zenga.



Appendix A

The parameters of the model are estimated by maximum likelihood. These parameters are the matrices \({\mathbf{T}}_{u}\) (or the parameters inside these matrices) and the regression coefficient vectors \({\varvec{\upbeta}}^{u}\), for \(u = 1, \ldots ,k\), together with the cut-points; all of them are estimated jointly. We assume that n items are observed, all beginning in state 1, and that item i is observed at \(m_{i}\) change times, the last being death or censoring. Because each item is observed at its change times, the value of the covariate vector and the corresponding state are observed at each of these times. Therefore, a sequence of times, states and values of the covariate vector is obtained for each item i: \(0 = t_{i,1} < t_{i,2} < \cdots < t_{{i,m_{i} }}\), \(1 = x_{1}^{i} , \ldots , \, x_{{m_{i} }}^{i}\) and \({\mathbf{z}}_{{l_{1} }}^{i} , \ldots ,{\mathbf{z}}_{{l_{{m_{i} }} }}^{i}\), respectively. Here \({\mathbf{z}}_{{l_{s} }}^{i}\) is the covariate vector for the interval that contains the time \(t_{i,s}\), for item i and \(s = 1, \ldots ,m_{i}\).

We assume \(k - 1\) unknown positive integer cut-points, \(c_{0} = 0 < c_{1} < \cdots < c_{k - 1} < c_{k} = \infty\). The likelihood function for estimating the parameters is given by

$$L\left( {c_{1} , \ldots ,c_{k - 1} ,{\mathbf{T}}_{u} ,{\varvec{\upbeta}}^{u} ,u = 1, \ldots ,k} \right) = \prod\limits_{i = 1}^{n} {\prod\limits_{s = 2}^{{m_{i} }} {h_{{x_{s - 1}^{i} ,x_{s}^{i} }} \left( {\left. {{\mathbf{T}}_{u} ,{\varvec{\upbeta}}^{u} ,u = 1, \ldots ,k} \right|t_{i,s - 1} ,t_{i,s} ,{\mathbf{z}}_{{l_{s - 1} }}^{i} , \ldots ,{\mathbf{z}}_{{l_{s} }}^{i} } \right)} } .$$

For the calculations, we define the intervals \(I_{q} = \left[ {c_{q - 1} ,c_{q} } \right[\) and \(J_{q} = \left] {c_{q - 1} ,c_{q} } \right]\) for \(q = 1, \ldots ,k\). Let \(f_{x}^{q} \left( {t,{\mathbf{z}}_{q}^{i} ;{\mathbf{T}}_{q} ,{\varvec{\upbeta}}^{q} } \right)\) be the sojourn time probability in state x at time t, calculated by using the matrix \({\mathbf{P}}_{q} \left( {{\mathbf{z}}_{q}^{i} } \right)\). Given that the state at any cut-point is known, the factors in the likelihood function have the following expressions:

  1. If \(t_{i,s - 1}\) and \(t_{i,s}\) belong to the intervals \(I_{j}\) and \(J_{j}\), respectively,

    $$h_{{x_{s - 1}^{i} ,x_{s}^{i} }} \left( {\left. {{\mathbf{T}}_{j} ,{\varvec{\upbeta}}^{j} } \right|t_{i,s - 1} ,t_{i,s} ,{\mathbf{z}}_{{l_{s - 1} }}^{i} , \ldots ,{\mathbf{z}}_{{l_{s} }}^{i} } \right) = f_{{x_{s - 1}^{i} }}^{j} \left( {t_{i,s} - t_{i,s - 1} - 1,{\mathbf{z}}_{j}^{i} ;{\mathbf{T}}_{j} ,{\varvec{\upbeta}}^{j} } \right)T_{{x_{s - 1}^{i} ,x_{s}^{i} }}^{j} \left( {{\mathbf{z}}_{j}^{i} } \right) .$$

  2. If \(t_{i,s - 1}\) and \(t_{i,s}\) belong to the intervals \(I_{j - 1}\) and \(J_{j}\), respectively,

    $$\begin{aligned} h_{{x_{s - 1}^{i} ,x_{s}^{i} }} \left( {\left. {{\mathbf{T}}_{u} ,{\varvec{\upbeta}}^{u} ,u = j - 1,j} \right|t_{i,s - 1} ,t_{i,s} ,{\mathbf{z}}_{{l_{s - 1} }}^{i} , \ldots ,{\mathbf{z}}_{{l_{s} }}^{i} } \right) = & f_{{x_{s - 1}^{i} }}^{j - 1} \left( {c_{j - 1} - t_{i,s - 1} ,{\mathbf{z}}_{j - 1}^{i} ;{\mathbf{T}}_{j - 1} ,{\varvec{\upbeta}}^{j - 1} } \right) \\ & \quad \times f_{{x_{s - 1}^{i} }}^{j} \left( {t_{i,s} - c_{j - 1} - 1,{\mathbf{z}}_{j}^{i} ;{\mathbf{T}}_{j} ,{\varvec{\upbeta}}^{j} } \right)T_{{x_{s - 1}^{i} ,x_{s}^{i} }}^{j} \left( {{\mathbf{z}}_{j}^{i} } \right). \\ \end{aligned}$$

  3. If \(t_{i,s - 1} \in I_{j} \;{\text{and}}\;t_{i,s} \in J_{q} \;{\text{with}}\;q - j \ge 2\),

    $$\begin{aligned} h_{{x_{s - 1}^{i} ,x_{s}^{i} }} \left( {\left. {{\mathbf{T}}_{u} ,{\varvec{\upbeta}}^{u} ,u = j, \ldots ,q} \right|t_{i,s - 1} ,t_{i,s} ,{\mathbf{z}}_{{l_{s - 1} }}^{i} , \ldots ,{\mathbf{z}}_{{l_{s} }}^{i} } \right) = & f_{{x_{s - 1}^{i} }}^{j} \left( {c_{j} - t_{i,s - 1} ,{\mathbf{z}}_{j}^{i} ;{\mathbf{T}}_{j} ,{\varvec{\upbeta}}^{j} } \right) \\ & \quad \times \prod\limits_{u = j + 1}^{q - 1} {f_{{x_{s - 1}^{i} }}^{u} \left( {c_{u} - c_{u - 1} ,{\mathbf{z}}_{u}^{i} ;{\mathbf{T}}_{u} ,{\varvec{\upbeta}}^{u} } \right)} f_{{x_{s - 1}^{i} }}^{q} \left( {t_{i,s} - c_{q - 1} - 1,{\mathbf{z}}_{q}^{i} ;{\mathbf{T}}_{q} ,{\varvec{\upbeta}}^{q} } \right)T_{{x_{s - 1}^{i} ,x_{s}^{i} }}^{q} \left( {{\mathbf{z}}_{q}^{i} } \right). \\ \end{aligned}$$
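
The three cases above share one pattern: remain in the previous state across every interval touched, then change state in the last interval. The following Python sketch illustrates that bookkeeping under strong simplifying assumptions that are not taken from the paper: covariates are omitted, states are 0-based, each interval u has a single one-step transition matrix `P[u-1]`, and the sojourn probability \(f_{x}^{u}(t)\) is assumed to have the geometric form `P[u-1][x][x] ** t`. The function name `likelihood_factor` is hypothetical.

```python
from bisect import bisect_left, bisect_right

def likelihood_factor(t_prev, t_next, x_prev, x_next, cuts, P):
    """One likelihood factor h for a change from x_prev (at t_prev) to x_next (at t_next).

    cuts : sorted finite cut-points [c_1, ..., c_{k-1}]
    P    : list of k one-step transition matrices; P[u-1] applies on interval u
    ASSUMPTION: f_x^u(t) is taken to be P[u-1][x][x] ** t; covariates are ignored.
    """
    c = [0] + list(cuts)                  # c[u] = c_u, with c_0 = 0
    j = bisect_right(cuts, t_prev) + 1    # t_prev lies in I_j = [c_{j-1}, c_j[
    q = bisect_left(cuts, t_next) + 1     # t_next lies in J_q = ]c_{q-1}, c_q]

    if j == q:                            # case 1: no cut-point is crossed
        stays = {j: t_next - t_prev - 1}
    else:                                 # cases 2 and 3: cut-points are crossed
        stays = {j: c[j] - t_prev}
        for u in range(j + 1, q):         # intervals crossed in full (case 3 only)
            stays[u] = c[u] - c[u - 1]
        stays[q] = t_next - c[q - 1] - 1

    h = P[q - 1][x_prev][x_next]          # the change itself, in interval q
    for u, t in stays.items():
        h *= P[u - 1][x_prev][x_prev] ** t  # remain in x_prev for t steps
    return h

# Made-up two-state matrices for k = 2 intervals with one cut-point at 5.
P_example = [[[0.9, 0.1], [0.0, 1.0]],
             [[0.8, 0.2], [0.0, 1.0]]]
h_same = likelihood_factor(0, 3, 0, 1, [5], P_example)   # case 1
h_cross = likelihood_factor(3, 8, 0, 1, [5], P_example)  # one cut-point crossed
```

In every case the sojourn lengths add up to \(t_{i,s} - t_{i,s - 1} - 1\), which is a quick consistency check on the interval bookkeeping.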

The likelihood function is maximized subject to several restrictions. The matrices \({\mathbf{P}}_{q}\) and \({\mathbf{P}}_{q} \left( {{\mathbf{z}}_{q}^{i} } \right)\) associated with the model must be stochastic matrices for any covariate vector \({\mathbf{z}}_{q}^{i}\). This restriction rules out probabilities less than zero or greater than one for any values of the parameters.
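
As a minimal illustration of this restriction, the check below tests whether a candidate matrix is stochastic. It is a sketch only: the helper name `is_stochastic` and the tolerance are assumptions, and how the covariates enter \({\mathbf{P}}_{q} \left( {{\mathbf{z}}_{q}^{i} } \right)\) is not modelled here.

```python
import math

def is_stochastic(P, tol=1e-9):
    """Return True if every entry of P lies in [0, 1] and every row sums to 1."""
    for row in P:
        if any(p < -tol or p > 1 + tol for p in row):
            return False
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            return False
    return True

print(is_stochastic([[0.9, 0.1], [0.0, 1.0]]))   # True
print(is_stochastic([[1.1, -0.1], [0.0, 1.0]]))  # False
```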

Then, the cut-points are estimated. The optimum values \(c_{1} , \ldots ,c_{k - 1}\) are those that satisfy

$$c_{1} , \ldots ,c_{k - 1} \in {\mathbb{N}}\,{\text{such}}\,{\text{that}}\,L\left( {c_{1} , \ldots ,c_{k - 1} ,{\hat{\mathbf{T}}}_{u}^{{c_{1} , \ldots ,c_{k - 1} }} ,{\hat{\mathbf{\beta }}}_{u}^{{c_{1} , \ldots ,c_{k - 1} }} ,u = 1, \ldots ,k} \right) = \mathop {\max }\limits_{{v_{j} }} \left\{ {L\left( {v_{1} , \ldots ,v_{k - 1} ,{\hat{\mathbf{T}}}_{u}^{{v_{1} , \ldots ,v_{k - 1} }} ,{\hat{\mathbf{\beta }}}_{u}^{{v_{1} , \ldots ,v_{k - 1} }} ,u = 1, \ldots ,k} \right)} \right\} ,$$

subject to \(0 < v_{j} < v_{j + 1} \,{\text{for}}\, \, j = 1, \ldots ,k - 2\) and \(v_{k - 1} < \mathop {\max }\limits_{i} \left\{ {t_{{i,m_{i} }} } \right\}\), where \(v_{j}\) is a natural number for every j. Here \(\left( {{\hat{\mathbf{T}}}_{u}^{{v_{1} , \ldots ,v_{k - 1} }} ,{\hat{\mathbf{\beta }}}_{u}^{{v_{1} , \ldots ,v_{k - 1} }} ,u = 1, \ldots ,k} \right)\) are the maximum likelihood estimates of \(\left( {{\mathbf{T}}_{u} ,{\varvec{\upbeta}}^{u} ,u = 1, \ldots ,k} \right)\) for the fixed cut-points \(v_{1} , \ldots ,v_{k - 1}\).
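
One way to read this maximization is as a search over admissible cut-point tuples, where the remaining parameters are re-estimated for each tuple. The sketch below enumerates all increasing integer tuples exhaustively. Exhaustive enumeration is an assumption made for illustration (the text does not state how the search is organised), and `profile_likelihood` is a hypothetical caller-supplied function returning the likelihood maximized over the other parameters for fixed cut-points.

```python
from itertools import combinations

def best_cut_points(k, t_max, profile_likelihood):
    """Search all integer tuples 0 < v_1 < ... < v_{k-1} < t_max.

    profile_likelihood(cuts) is a hypothetical function supplied by the caller:
    it returns the likelihood maximized over the other parameters for fixed cuts.
    """
    best_cuts, best_value = None, float("-inf")
    # combinations() over a sorted range yields strictly increasing tuples.
    for cuts in combinations(range(1, t_max), k - 1):
        value = profile_likelihood(cuts)
        if value > best_value:
            best_cuts, best_value = cuts, value
    return best_cuts, best_value

# Toy stand-in for the profile likelihood, peaked at cut-points (4, 9).
toy = lambda cuts: -sum((c - t) ** 2 for c, t in zip(cuts, (4, 9)))
cuts_hat, value_hat = best_cut_points(k=3, t_max=12, profile_likelihood=toy)
print(cuts_hat)  # (4, 9)
```

The number of tuples grows combinatorially with k and with the largest observed time, so in practice the candidate set may need to be restricted.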

The likelihood function has been implemented in MATLAB and is maximized with the MATLAB function fmincon. Because fmincon finds the minimum of a constrained nonlinear multivariable function, here with the interior-point algorithm, maximization amounts to minimizing the negative of the objective.
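
The idea of maximizing a likelihood by minimizing its negative can be shown on a toy problem. The sketch below is not the paper's implementation: it uses a one-parameter geometric sojourn model with made-up data, and a plain golden-section search stands in for fmincon. All function names are hypothetical.

```python
import math

def neg_log_likelihood(p, sojourns):
    """Negative log-likelihood when a sojourn of d steps contributes p**d * (1 - p)."""
    return -sum(d * math.log(p) + math.log(1.0 - p) for d in sojourns)

def golden_section_minimize(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) < f(d):
            b = d      # the minimum lies in [a, d]
        else:
            a = c      # the minimum lies in [c, b]
    return (a + b) / 2.0

sojourns = [3, 5, 2, 7, 4]  # made-up sojourn lengths, for illustration only
# The bounds keep p strictly inside (0, 1), a simple stand-in for the constraints.
p_hat = golden_section_minimize(lambda p: neg_log_likelihood(p, sojourns),
                                1e-6, 1.0 - 1e-6)
closed_form = sum(sojourns) / (sum(sojourns) + len(sojourns))
```

For this toy model the maximizer is available in closed form, so `p_hat` can be compared with `closed_form` as a sanity check.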

Appendix B

See Tables 11 and 12.

Table 11 Contingency table of observed and expected counts for the homogeneous model
Table 12 Contingency table of observed and expected counts for the piecewise model


Cite this article

Ruiz-Castro, J.E., Zenga, M. A general piecewise multi-state survival model: application to breast cancer. Stat Methods Appl 29, 813–843 (2020). https://doi.org/10.1007/s10260-019-00505-6



  • Survival
  • Breast cancer
  • Piecewise Markov model
  • Multi-state model