Journal of Statistical Theory and Practice, Volume 8, Issue 3, pp 444–459

Out-of-Sample Fusion in Risk Prediction

  • Myron Katzoff
  • Wen Zhou
  • Diba Khan
  • Guanhua Lu
  • Benjamin Kedem


The probability that mortality from certain causes exceeds high thresholds is addressed. An out-of-sample fusion method is presented in which an original sample of real data is fused, or combined, with independent computer-generated samples to estimate exceedance probabilities under a density ratio model. Because the combined sample of real and artificial data is larger than the real sample alone, the fused sample produces shorter confidence intervals than traditional methods. Numerical results show that the method maintains good coverage even in some misspecified cases.
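The fusion idea can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: a two-sample density ratio model g1(x) = exp(alpha + beta * h(x)) * g0(x) with the real data as the reference sample, the tilt h(x) = x, normal distributions for both samples, and the function names `fit_density_ratio` and `exceedance_probability`. The fit uses the well-known equivalence between the two-sample density ratio model and logistic regression on the pooled sample, and it returns only a point estimate, not the confidence intervals discussed above.

```python
import numpy as np


def fit_density_ratio(x0, x1, h=lambda t: t, n_iter=50, tol=1e-10):
    """Fit a two-sample density ratio model by logistic regression.

    Assumed model (illustrative): g1(x) = exp(alpha + beta * h(x)) * g0(x),
    where x0 is the reference (real) sample and x1 the artificial sample.
    Returns the pooled points, weights estimating the reference
    distribution on those points, and the logistic coefficients.
    """
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    t = np.concatenate([x0, x1])                       # fused sample
    y = np.concatenate([np.zeros(x0.size), np.ones(x1.size)])
    X = np.column_stack([np.ones(t.size), h(t)])       # intercept and tilt

    theta = np.zeros(2)
    for _ in range(n_iter):                            # Newton-Raphson steps
        pi = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (y - pi)
        hess = (X * (pi * (1.0 - pi))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break

    # The logistic intercept equals alpha + log(n1 / n0); the reference
    # weights are p_i = (1 - pi_i) / n0 and sum to one at the optimum.
    pi = 1.0 / (1.0 + np.exp(-X @ theta))
    weights = (1.0 - pi) / x0.size
    return t, weights, theta


def exceedance_probability(t, weights, threshold):
    """Estimate P(X > threshold) under the reference distribution."""
    return float(np.sum(weights[t > threshold]))


# Hypothetical data: the "real" sample is N(0, 1) and the artificial
# sample is N(1, 1), so the density ratio is exp(-0.5 + x).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=200)
artificial = rng.normal(1.0, 1.0, size=400)

t, w, theta = fit_density_ratio(real, artificial)
p_hat = exceedance_probability(t, w, threshold=1.5)
```

Here `p_hat` estimates the probability that the real-data distribution exceeds 1.5, using all 600 fused points rather than only the 200 real observations. The choice of tilt function and of the artificial distribution determines whether the assumed model is correctly specified.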


Keywords

Mortality; Density ratio model; Threshold probabilities; Tilt; Semiparametric; Coverage

AMS Subject Classification

Primary: 62F40; Secondary: 62F25





Copyright information

© Grace Scientific Publishing 2014

Authors and Affiliations

  • Myron Katzoff (1)
  • Wen Zhou (2)
  • Diba Khan (1)
  • Guanhua Lu (2)
  • Benjamin Kedem (1, 2), corresponding author

  1. CDC/National Center for Health Statistics, Hyattsville, USA
  2. Department of Mathematics, University of Maryland, College Park, USA
