A latent variable approach for modeling recall-based time-to-event data with Weibull distribution

  • Original Paper
Computational Statistics

Abstract

The ability of individuals to recall events is influenced by the time interval between the monitoring time and the occurrence of the event. In this article, we introduce a non-recall probability function that incorporates this information into our modeling framework. We model the time-to-event using the Weibull distribution and adopt a latent variable approach to handle situations where recall is not possible. In the classical framework, we obtain point estimators using the expectation-maximization algorithm and construct the observed Fisher information matrix using the missing information principle. Within the Bayesian paradigm, we derive point estimators under suitable choices of priors and calculate highest posterior density intervals using Markov chain Monte Carlo samples. To assess the performance of the proposed estimators, we conduct an extensive simulation study. Additionally, we use age at menarche and breastfeeding datasets as examples to illustrate the effectiveness of the proposed methodology.

Acknowledgements

We are very grateful to Prof. Debasis Sengupta of the Indian Statistical Institute in Kolkata, West Bengal, India for providing the age at menarche dataset used for the analysis. We are also thankful to the editor and anonymous reviewers for their helpful and constructive comments, which have improved the quality of this manuscript.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Contributions

All authors contributed to the study conception and design. Material preparation and analysis were performed by CPY and VB. The first draft of the manuscript was written by CPY and all other authors commented on previous versions of the manuscript and helped in subsequent revisions. All authors read and approved the final manuscript.

Corresponding author

Correspondence to C. P. Yadav.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Expectations used in the E step of the E-M algorithm

While applying the E-M algorithm in Sect. 2.1, we introduced three latent variables. The conditional densities of these latent variables can be obtained by truncation. Here, we state these conditional densities in a general form, applicable to arbitrary parameter values. The conditional density of the non-recall latent variate \(t^*_{l}\) is given by

$$\begin{aligned} f_l\big (t^*_{l} | t^*_{l}<s_i,\varrho ,\varsigma \big )= \frac{f(t^*_{l};\varrho ,\varsigma )}{1-\bar{F}(s_i;\varrho ,\varsigma )} =\frac{\varrho \varsigma {t^*_{l}}^{\varrho -1} \exp \left\{ -\varsigma {t^*_{l}}^{\varrho }\right\} }{1-\exp \left\{ -\varsigma s_i^{\varrho }\right\} }. \end{aligned}$$
(25)

The conditional density of a right-censored latent variate \(t^*_{r}\) is given by

$$\begin{aligned} f_r\Big (t^*_{r} | t^*_{r}>s_i,\varrho ,\varsigma \Big )= \frac{f(t^*_{r};\varrho ,\varsigma )}{\bar{F}(s_i;\varrho ,\varsigma )} =\frac{\varrho \varsigma {t^*_{r}}^{\varrho -1} \exp \left\{ -\varsigma {t^*_{r}}^{\varrho }\right\} }{\exp \left\{ -\varsigma s_i^{\varrho }\right\} }. \end{aligned}$$
(26)

Further, the conditional density of the random variate \(w^*_{l}\) is given by

$$\begin{aligned} h_w\Big (w^*_{l} | w^*_{l}<(s_i-t^*_{l}),\omega \Big )= \frac{h(w^*_{l};\omega )}{1-\bar{H}(s_i-t^*_{l};\omega )} =\frac{\omega \exp \left\{ -\omega w^*_{l}\right\} }{1-\exp \left\{ -\omega (s_i-t^*_{l})\right\} }. \end{aligned}$$
(27)

Here, the symbols \(\varrho \) and \(\varsigma \) represent the parameters of the Weibull distribution, while \(\omega \) denotes the parameter of the exponential distribution.
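Because Equations (25) to (27) are truncated versions of standard densities, each can be sampled by inverting its conditional distribution function. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the use of inverse-CDF sampling, and the numerical values are all assumptions made for illustration. `shape` and `rate` stand for \(\varrho \) and \(\varsigma \), and `omega` for \(\omega \).

```python
import math
import random


def weibull_cdf(t, shape, rate):
    """F(t) = 1 - exp(-rate * t**shape), the Weibull CDF behind Eqs. (25)-(26)."""
    return -math.expm1(-rate * t ** shape)


def draw_weibull_below(s, shape, rate, u):
    """Inverse-CDF draw from T | T < s (Eq. 25) for a uniform u in [0, 1)."""
    p = u * weibull_cdf(s, shape, rate)  # target value of the unconditional CDF
    return (-math.log1p(-p) / rate) ** (1.0 / shape)


def draw_weibull_above(s, shape, rate, u):
    """Inverse-CDF draw from T | T > s (Eq. 26) for a uniform u in [0, 1)."""
    # S(t) / S(s) = 1 - u  implies  t**shape = s**shape - log(1 - u) / rate
    return (s ** shape - math.log1p(-u) / rate) ** (1.0 / shape)


def draw_exponential_below(c, rate, u):
    """Inverse-CDF draw from W | W < c (Eq. 27) for a uniform u in [0, 1)."""
    p = u * (-math.expm1(-rate * c))  # u * (1 - exp(-rate * c))
    return -math.log1p(-p) / rate


if __name__ == "__main__":
    rng = random.Random(0)
    s, shape, rate, omega = 1.5, 2.0, 0.5, 0.8  # illustrative values only
    t_l = draw_weibull_below(s, shape, rate, rng.random())
    t_r = draw_weibull_above(s, shape, rate, rng.random())
    w_l = draw_exponential_below(s - t_l, omega, rng.random())
    print(t_l, t_r, w_l)
```

Note that the truncation point for \(w^*_{l}\) depends on the drawn value of \(t^*_{l}\), so in this sketch the two draws are made in sequence.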

During the \((k+1)^{th}\) iteration of the E-M algorithm, the expectation terms of the latent variables are computed in the E step using the conditional densities defined in Equations (25) to (27), evaluated at the parameter values obtained in the k-th iteration. The expectation terms used in the E-M algorithm and in the construction of the Fisher information matrix are provided below:

$$\begin{aligned} \xi _1\Big (t_{li}^*;\alpha ,\beta \Big )&= E\Big [\ln (t_{li}^*) | t_{li}^*<s_i,\alpha ,\beta \Big ] =\frac{\int _{0}^{s_i} \ln (u) \alpha \beta u^{\alpha -1} \exp \left\{ -\beta u^{\alpha }\right\} du}{1-\exp \{-\beta s_i^{\alpha }\}} \nonumber \\
&=\frac{I_1(s_i)-\ln (\beta ) \Big ( 1-\exp \{-\beta s_i^{\alpha }\}\Big )}{\alpha \Big (1-\exp \{-\beta s_i^{\alpha }\}\Big )}, \nonumber \\
\xi _2\Big (t_{li}^*;\alpha ,\beta \Big )&= E\Big [{t_{li}^*}^{\alpha } | t_{li}^*<s_i,\alpha ,\beta \Big ]=\frac{\int _{0}^{s_i} u^{2 \alpha -1} \alpha \beta \exp \left\{ -\beta u^{\alpha }\right\} du}{1-\exp \{-\beta s_i^{\alpha }\}} \nonumber \\
&=\frac{1-(1+\beta s_i^{\alpha }) \exp \{-\beta s_i^{\alpha }\}}{\beta \Big (1-\exp \{-\beta s_i^{\alpha }\}\Big )}, \nonumber \\
\xi _3\Big (t_{ri}^*;\alpha ,\beta \Big )&= E\Big [\ln (t_{ri}^*) | t_{ri}^*>s_i,\alpha ,\beta \Big ]=\frac{\int _{s_i}^{\infty } \ln (u) \alpha \beta u^{\alpha -1} \exp \left\{ -\beta u^{\alpha }\right\} du}{\exp \{-\beta s_i^{\alpha }\}} \nonumber \\
&= \frac{I_2(s_i)-\ln (\beta ) \Big (\exp \{-\beta s_i^{\alpha }\}\Big )}{\alpha \exp \{-\beta s_i^{\alpha }\}}, \nonumber \\
\xi _4\Big (t_{ri}^*;\alpha ,\beta \Big )&= E\Big [{t_{ri}^*}^{\alpha } | t_{ri}^*>s_i,\alpha ,\beta \Big ]=\frac{\int _{s_i}^{\infty } u^{2 \alpha -1} \alpha \beta \exp \left\{ -\beta u^{\alpha }\right\} du}{\exp \{-\beta s_i^{\alpha }\}} \nonumber \\
&=\frac{1}{\beta }\Big (1+\beta s_i^{\alpha }\Big ), \nonumber \\
\xi _5\Big (w_{li}^*;\lambda \Big )&= E\Big [w_{li}^* | w_{li}^*<(s_i-t_{li}^*),\lambda \Big ]=\frac{\int _{0}^{s_i-t_{li}^*}u \lambda \exp \left\{ -\lambda u\right\} du}{1-\exp \{-\lambda (s_i-t_{li}^{*})\}} \nonumber \\
&=\frac{1}{\lambda } \left[ \frac{1-\Big (1+\lambda (s_i-t_{li}^{*})\Big )\exp \{-\lambda (s_i-t_{li}^*)\}}{1-\exp \{-\lambda (s_i-t_{li}^{*})\}} \right] , \nonumber \\
\xi _6\Big (t_{li}^*;\alpha ,\beta \Big )&=E\Big [{t_{li}^*}^\alpha \ln (t_{li}^*) | t_{li}^*<s_i,\alpha ,\beta \Big ]=\frac{\int _{0}^{s_i} \ln (u) \alpha \beta u^{2 \alpha -1} \exp \left\{ -\beta u^{\alpha }\right\} du}{1-\exp \{-\beta s_i^{\alpha }\}} \nonumber \\
&=\frac{ I_3(s_i)-\ln (\beta )\Big (1-(1+\beta s_i^{\alpha }) \exp \{-\beta s_i^{\alpha }\}\Big )}{\alpha \beta \Big (1-\exp \{-\beta s_i^{\alpha }\}\Big )}, \nonumber \\
\xi _7\Big (t_{li}^*;\alpha ,\beta \Big )&=E\Big [{t_{li}^*}^\alpha \Big (\ln (t_{li}^*)\Big )^2 | t_{li}^*<s_i,\alpha ,\beta \Big ]=\frac{\int _{0}^{s_i} \Big (\ln (u)\Big )^2 \alpha \beta u^{2 \alpha -1} \exp \left\{ -\beta u^{\alpha }\right\} du}{1-\exp \{-\beta s_i^{\alpha }\}} \nonumber \\
&= \frac{ I_4(s_i)+\Big (\ln (\beta )\Big )^2\Big (1-(1+\beta s_i^{\alpha }) \exp \{-\beta s_i^{\alpha }\}\Big ) -2\ln (\beta ) I_3(s_i)}{\alpha ^2 \beta \Big (1-\exp \{-\beta s_i^{\alpha }\}\Big )}, \nonumber \\
\xi _8\Big (t_{ri}^*;\alpha ,\beta \Big )&=E\Big [{t_{ri}^*}^\alpha \ln (t_{ri}^*) | t_{ri}^*>s_i,\alpha ,\beta \Big ]=\frac{\int _{s_i}^{\infty } \ln (u) \alpha \beta u^{2 \alpha -1} \exp \left\{ -\beta u^{\alpha }\right\} du}{\exp \{-\beta s_i^{\alpha }\}} \nonumber \\
&=\frac{ I_5(s_i)-\ln (\beta )\Big ((1+\beta s_i^{\alpha }) \exp \{-\beta s_i^{\alpha }\}\Big )}{\alpha \beta \exp \{-\beta s_i^{\alpha }\}}, \nonumber \\
\xi _9\Big (t_{ri}^*;\alpha ,\beta \Big )&=E\Big [{t_{ri}^*}^\alpha \Big (\ln (t_{ri}^*)\Big )^2 | t_{ri}^*>s_i,\alpha ,\beta \Big ]=\frac{\int _{s_i}^{\infty } \Big (\ln (u)\Big )^2 \alpha \beta u^{2 \alpha -1} \exp \left\{ -\beta u^{\alpha }\right\} du}{\exp \{-\beta s_i^{\alpha }\}} \nonumber \\
&=\frac{ I_6(s_i)+\Big (\ln (\beta )\Big )^2\Big ((1+\beta s_i^{\alpha }) \exp \{-\beta s_i^{\alpha }\}\Big )-2\ln (\beta ) I_5(s_i)}{\alpha ^2 \beta \exp \{-\beta s_i^{\alpha }\}}, \end{aligned}$$

where,

$$\begin{aligned}&I_1(z)=\int _{0}^{\beta z^{\alpha }}\ln (v) \exp \{-v\}dv, \quad I_2(z)=\int _{\beta z^{\alpha }}^{\infty }\ln (v) \exp \{-v\}dv, \\&I_3(z)=\int _{0}^{\beta z^{\alpha }}v \ln (v) \exp \{-v\}dv, \quad I_4(z)=\int _{0}^{\beta z^{\alpha }}v \Big (\ln (v)\Big )^2 \exp \{-v\}dv, \\&I_5(z)=\int _{\beta z^{\alpha }}^{\infty }v \ln (v) \exp \{-v\}dv, \quad I_6(z)=\int _{\beta z^{\alpha }}^{\infty }v \Big (\ln (v)\Big )^2 \exp \{-v\}dv. \end{aligned}$$
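
As a brief check on where these integrals come from, consider the change of variable \(v=\beta u^{\alpha }\) applied to the Weibull density of Equation (25), written with parameters \(\alpha \) and \(\beta \). The symbol \(v\) is a dummy integration variable introduced here for illustration. Under this substitution, \(\alpha \beta u^{\alpha -1}\exp \{-\beta u^{\alpha }\}du=\exp \{-v\}dv\), \(\ln (u)=(\ln (v)-\ln (\beta ))/\alpha \) and \(u^{\alpha }=v/\beta \). For the numerator of \(\xi _1\) this gives

$$\begin{aligned} \int _{0}^{s_i} \ln (u) \alpha \beta u^{\alpha -1} \exp \{-\beta u^{\alpha }\} du&=\frac{1}{\alpha }\int _{0}^{\beta s_i^{\alpha }} \Big (\ln (v)-\ln (\beta )\Big ) \exp \{-v\} dv \\&=\frac{1}{\alpha }\Big [ I_1(s_i)-\ln (\beta )\Big (1-\exp \{-\beta s_i^{\alpha }\}\Big )\Big ], \end{aligned}$$

and dividing by \(1-\exp \{-\beta s_i^{\alpha }\}\) recovers the stated form of \(\xi _1\). The same substitution, with the additional factor \(u^{\alpha }=v/\beta \), yields the expressions for \(\xi _2\), \(\xi _4\), \(\xi _6\) and \(\xi _8\).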

The expectation terms provided here are calculated using Monte Carlo integration techniques during simulation.
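
The source does not spell out how the Monte Carlo integration is carried out. One possible approach, sketched below under stated assumptions, is to draw from the truncated Weibull densities by inverse-CDF sampling and average the quantity of interest. The function names, sample size, seed and parameter values are hypothetical. The closed forms of \(\xi _2\) and \(\xi _4\) given above serve as a check, while \(\xi _1\) is estimated the same way because it involves \(I_1\), which has no elementary closed form.

```python
import math
import random


def mc_mean(g, draw, n):
    """Plain Monte Carlo estimate of E[g(X)] from n independent draws of X."""
    return sum(g(draw()) for _ in range(n)) / n


def xi2_closed(s, alpha, beta):
    """Closed form of E[T**alpha | T < s] as given in the appendix."""
    q = math.exp(-beta * s ** alpha)
    return (1.0 - (1.0 + beta * s ** alpha) * q) / (beta * (1.0 - q))


def xi4_closed(s, alpha, beta):
    """Closed form of E[T**alpha | T > s] as given in the appendix."""
    return (1.0 + beta * s ** alpha) / beta


def estimate_xi(s, alpha, beta, n=100_000, seed=1):
    """Monte Carlo estimates of xi_1, xi_2 and xi_4 (assumed sampling scheme)."""
    rng = random.Random(seed)
    f_s = -math.expm1(-beta * s ** alpha)  # F(s) = 1 - exp(-beta * s**alpha)

    def below():
        # Draw from T | T < s; u lies in (0, 1] so the draw is strictly positive.
        u = 1.0 - rng.random()
        return (-math.log1p(-u * f_s) / beta) ** (1.0 / alpha)

    def above():
        # Draw from T | T > s; u lies in [0, 1) so log(1 - u) is finite.
        u = rng.random()
        return (s ** alpha - math.log1p(-u) / beta) ** (1.0 / alpha)

    return {
        "xi1": mc_mean(math.log, below, n),
        "xi2": mc_mean(lambda t: t ** alpha, below, n),
        "xi4": mc_mean(lambda t: t ** alpha, above, n),
    }


if __name__ == "__main__":
    s, alpha, beta = 1.5, 2.0, 0.5  # illustrative values only
    est = estimate_xi(s, alpha, beta)
    print("xi2:", est["xi2"], "closed form:", xi2_closed(s, alpha, beta))
    print("xi4:", est["xi4"], "closed form:", xi4_closed(s, alpha, beta))
    print("xi1:", est["xi1"])
```

Agreement between the simulated and closed-form values of \(\xi _2\) and \(\xi _4\) gives some reassurance that the sampler matches Equations (25) and (26) before it is used for the terms that require numerical evaluation.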

1.2 Simulation tables

See Tables 4, 5, 6, 7.

Table 4 Mean square error (MSE) and absolute bias (AB) for ML and Bayes methods under uniform monitoring points for varying sample sizes n
Table 5 Mean square error (MSE) and absolute bias (AB) for ML and Bayes methods under exponential monitoring points for varying sample sizes n
Table 6 Average length (AL), shape and coverage probability (CP) for ML and Bayes methods under uniform monitoring points for varying sample sizes n
Table 7 Average length (AL), shape and coverage probability (CP) for ML and Bayes methods under exponential monitoring points for varying sample sizes n

1.3 MCMC convergence diagnostics

See Figs. 2, 3, 4, 5, 6, 7, 8, 9, 10.

Fig. 2 MCMC trace plots, ACF and densities of the posterior samples under gamma prior for menarche data
Fig. 3 MCMC trace plots, ACF and densities of the posterior samples under gamma prior\(^1\) for menarche data
Fig. 4 MCMC trace plots, ACF and densities of the posterior samples under log-normal prior for menarche data
Fig. 5 MCMC trace plots, ACF and densities of the posterior samples under gamma prior for breastfeeding data at Level I
Fig. 6 MCMC trace plots, ACF and densities of the posterior samples under gamma prior\(^1\) for breastfeeding data at Level I
Fig. 7 MCMC trace plots, ACF and densities of the posterior samples under gamma prior for breastfeeding data at Level II
Fig. 8 MCMC trace plots, ACF and densities of the posterior samples under gamma prior\(^1\) for breastfeeding data at Level II
Fig. 9 MCMC trace plots, ACF and densities of the posterior samples under gamma prior for breastfeeding data at Level III
Fig. 10 MCMC trace plots, ACF and densities of the posterior samples under gamma prior\(^1\) for breastfeeding data at Level III

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Panwar, M.S., Barnwal, V. & Yadav, C.P. A latent variable approach for modeling recall-based time-to-event data with Weibull distribution. Comput Stat (2024). https://doi.org/10.1007/s00180-023-01444-3

  • DOI: https://doi.org/10.1007/s00180-023-01444-3
