Abstract
We propose numerical and graphical methods for outlier detection in hierarchical Bayes modeling and analysis of repeated measures regression data from multiple subjects; data from a single subject are generically called a “curve”. The first stage of our model has curve-specific regression coefficients with possibly autoregressive errors of a prespecified order. The first-stage regression vectors for different curves are linked in a second-stage modeling step, possibly involving additional regression variables. Detection of the stage at which a curve appears to be an outlier, and of the magnitude and specific component of the violation at that stage, is accomplished by embedding the null model into a larger parametric model that can accommodate such unusual observations. We give two examples to illustrate the diagnostics, develop a BUGS program to compute them using MCMC techniques, and examine the sensitivity of the conclusions to the prior modeling assumptions.
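As a rough illustration of the two-stage structure described above, the following Python sketch simulates curves from a linear first stage with AR(1) errors whose coefficients are drawn from a common second-stage distribution. It is only a sketch under stated assumptions, not the model or BUGS program of the paper: the straight-line mean, the AR(1) order, and the two expansion parameters (`delta`, a second-stage shift of a curve's coefficients, and `kappa`, a first-stage inflation of a curve's error standard deviation) are hypothetical choices made for illustration. Setting `delta` to zero and `kappa` to one for every curve recovers the null model, which is the sense in which the null model is embedded in a larger parametric one.

```python
import random


def simulate_curve(times, beta, rho, sigma, rng):
    """First stage: y_t = beta[0] + beta[1] * t + e_t with AR(1) errors e_t."""
    y, prev_err = [], 0.0
    for t in times:
        # AR(1) recursion: e_t = rho * e_{t-1} + innovation
        err = rho * prev_err + rng.gauss(0.0, sigma)
        prev_err = err
        y.append(beta[0] + beta[1] * t + err)
    return y


def simulate_study(n_curves, times, mu, tau, rho, sigma, seed,
                   delta=None, kappa=None):
    """Second stage: beta_i = mu + delta_i + N(0, tau^2), componentwise.

    delta (hypothetical): dict curve index -> (shift0, shift1), a
        second-stage shift of that curve's coefficients.
    kappa (hypothetical): dict curve index -> multiplier applied to the
        first-stage error standard deviation of that curve.
    Curves absent from both dicts follow the null model.
    """
    rng = random.Random(seed)
    delta = delta or {}
    kappa = kappa or {}
    curves = []
    for i in range(n_curves):
        shift = delta.get(i, (0.0, 0.0))
        beta = [mu[k] + shift[k] + rng.gauss(0.0, tau[k]) for k in range(2)]
        curves.append(
            simulate_curve(times, beta, rho, sigma * kappa.get(i, 1.0), rng)
        )
    return curves


if __name__ == "__main__":
    # Curve 2 is a second-stage outlier (intercept shifted by 10);
    # curve 4 is a first-stage outlier (error sd inflated fivefold).
    data = simulate_study(
        n_curves=6, times=[0, 1, 2, 3, 4], mu=(1.0, 2.0), tau=(0.3, 0.1),
        rho=0.5, sigma=0.2, seed=42,
        delta={2: (10.0, 0.0)}, kappa={4: 5.0},
    )
    for i, curve in enumerate(data):
        print(i, [round(v, 2) for v in curve])
```

Simulated data of this kind can be used to check whether a stage-wise diagnostic flags the right curve at the right stage before it is applied to real repeated measures data.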
About this article
Cite this article
Peruggia, M., Santner, T.J. & Ho, YY. Detecting stage-wise outliers in hierarchical Bayesian linear models of repeated measures data. Ann Inst Stat Math 56, 415–433 (2004). https://doi.org/10.1007/BF02530534