A Bayesian solution to reconstructing centrally censored distributions


Abstract

Bayesian methods are investigated for the reconstruction of mixtures in the case of central censoring. Earlier literature has suggested that when the relationship between a continuous and a categorical variable is of interest, a cost-efficient strategy may be to measure the categorical variable only in the tails of the continuous distribution. Such samples occur in population epidemiology and gene mapping. Because central observations are not classified, the mixture component to which each central observation belongs is not known. Three cases of censoring, which correspond to differing amounts of available information, are compared. Closed-form solutions are not available, so Markov chain Monte Carlo techniques are used to estimate posterior densities. Evidence for a mixture of two populations is assessed via Bayes factors calculated using a Laplace-Metropolis estimator. Although parameter estimates appear to be satisfactory in most situations, evidence of two populations is found only when the component populations are well separated, tail sizes are not too small, or typing information is available. Extension of these methods to incorporate fixed effects is illustrated by an application to a cattle breeding experiment.
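The data-augmentation idea behind the MCMC approach can be illustrated with a minimal Gibbs sampler. This is a sketch under stated assumptions, not the authors' model: it assumes two normal components with a common variance, conjugate priors (normal means, inverse-gamma variance, uniform Beta(1, 1) mixing proportion), and a sample in which every continuous value is recorded but component labels are known only in the tails. The function name `gibbs_tail_typed_mixture`, the coding of untyped observations as `-1`, and the prior settings are hypothetical. At each sweep the central observations receive a latent label drawn from its full conditional; the typed tail observations anchor the component labels, which also limits label switching in this setting.

```python
import numpy as np


def gibbs_tail_typed_mixture(y, z, n_iter=2000, seed=0):
    """Illustrative Gibbs sampler for a two-component normal mixture.

    Assumptions (hypothetical, for illustration only):
      * y holds the continuous value for every observation;
      * z holds 0 or 1 for typed (tail) observations and -1 for
        untyped (central) observations whose component is latent;
      * common variance; priors mu_k ~ N(m0, s0sq),
        sigma2 ~ InvGamma(a0, b0), p = P(component 1) ~ Beta(1, 1).
    Returns an (n_iter, 4) array of draws of (mu0, mu1, sigma2, p).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=int)
    untyped = z < 0
    # Weak, data-scaled prior settings (an assumption of this sketch).
    m0, s0sq, a0, b0 = y.mean(), 100.0 * y.var(), 2.0, 1.0
    mu = np.quantile(y, [0.25, 0.75])  # ordered starting values
    sigma2, p = y.var(), 0.5
    labels = z.copy()
    draws = np.empty((n_iter, 4))
    for t in range(n_iter):
        # 1. Data augmentation: log-odds of component 0 versus 1 for each
        #    untyped observation (normalising constants cancel).
        yu = y[untyped]
        diff = (np.log(1 - p) - np.log(p)
                - 0.5 * ((yu - mu[0]) ** 2 - (yu - mu[1]) ** 2) / sigma2)
        prob1 = 1.0 / (1.0 + np.exp(np.clip(diff, -700, 700)))
        labels[untyped] = (rng.random(yu.size) < prob1).astype(int)
        # 2. Mixing proportion given labels: Beta(1 + n1, 1 + n0).
        n1 = int(labels.sum())
        p = rng.beta(1 + n1, 1 + labels.size - n1)
        # 3. Component means given labels and variance (conjugate normal).
        for k in (0, 1):
            yk = y[labels == k]
            prec = 1.0 / s0sq + yk.size / sigma2
            mean = (m0 / s0sq + yk.sum() / sigma2) / prec
            mu[k] = rng.normal(mean, np.sqrt(1.0 / prec))
        # 4. Common variance given labels and means (inverse gamma),
        #    drawn as the reciprocal of a gamma variate.
        resid = y - mu[labels]
        shape = a0 + 0.5 * y.size
        rate = b0 + 0.5 * np.sum(resid ** 2)
        sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)
        draws[t] = (mu[0], mu[1], sigma2, p)
    return draws
```

In this sketch, whether an observation is typed depends only on its observed value, so the selection is ignorable and the tail thresholds need not enter the likelihood; cases in which central values themselves are censored would require a different likelihood. For a Bayes-factor comparison, a Laplace-Metropolis estimate of each model's log marginal likelihood can be formed from such draws as $\log p(y) \approx \tfrac{d}{2}\log(2\pi) + \tfrac{1}{2}\log|\hat\Sigma| + \log p(y \mid \tilde\theta) + \log p(\tilde\theta)$, where $d$ is the number of parameters, $\hat\Sigma$ is the posterior covariance estimated from the draws, and $\tilde\theta$ is the draw maximizing $p(y \mid \theta)\,p(\theta)$; that step is not shown in the code.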

Key Words

Bayes factor; Finite mixture; Gibbs sampler; Laplace-Metropolis estimator; Markov chain Monte Carlo; Selective genotyping


Copyright information

© International Biometric Society 2005

Authors and Affiliations

  1. CSIRO Mathematical and Information Sciences, Queensland Bioscience Precinct, St Lucia, Australia
  2. School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
  3. Albion, Australia
