Abstract
Longitudinal data often suffer from heavy-tailed errors and outliers, which can significantly reduce efficiency and lead to invalid inferences. Robust techniques are essential, especially in joint mean-covariance modeling, as the estimation of the covariance matrix is more sensitive to heavy-tailed errors and outliers than the estimation of the mean. Motivated by the modified Cholesky decomposition of the covariance matrix, we propose a novel semiparametric method that uses robust techniques to simultaneously estimate the mean, autoregressive coefficients, and innovation variance. We provide a practical algorithm for this method and investigate the asymptotic properties of the mean and covariance estimators. Numerical simulations demonstrate that the proposed method is efficient and stable when the dataset is contaminated with outliers and heavy-tailed errors. The new robust technique yields statistically interpretable inferences in real data analysis, whereas traditional approaches fail to provide any acceptable inferences.
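The modified Cholesky decomposition mentioned in the abstract can be illustrated with a small numerical sketch. It factors a covariance matrix Sigma as T Sigma T' = D, where T is unit lower triangular and holds the negated autoregressive coefficients, and D is diagonal and holds the innovation variances. The function name `modified_cholesky` and the AR(1) example matrix are assumptions made for illustration. The sketch shows only the decomposition, not the robust semiparametric estimator proposed in the article.

```python
import numpy as np


def modified_cholesky(sigma):
    """Return (T, D) with T unit lower triangular and D diagonal
    such that T @ sigma @ T.T equals D (illustrative helper)."""
    sigma = np.asarray(sigma, dtype=float)
    m = sigma.shape[0]
    T = np.eye(m)
    d = np.empty(m)
    d[0] = sigma[0, 0]  # the first measurement has no predecessors
    for j in range(1, m):
        # Coefficients of the best linear predictor of y_j given y_0..y_{j-1}
        phi = np.linalg.solve(sigma[:j, :j], sigma[:j, j])
        T[j, :j] = -phi  # autoregressive coefficients enter T with a minus sign
        d[j] = sigma[j, j] - sigma[:j, j] @ phi  # innovation (prediction error) variance
    return T, np.diag(d)


# Hypothetical example: AR(1)-type covariance with entries rho**|i - j|
rho = 0.5
idx = np.arange(4)
sigma = rho ** np.abs(idx[:, None] - idx[None, :])

T, D = modified_cholesky(sigma)
print(np.round(T, 3))
print(np.round(np.diag(D), 3))
```

For this AR(1)-type matrix, each row of T has a single nonzero off-diagonal entry, -rho, in the position just left of the diagonal, and the innovation variances after the first equal 1 - rho**2. Because the autoregressive coefficients are unconstrained and the log innovation variances are unconstrained, each can be modeled by regression, which is what makes the decomposition convenient for joint mean-covariance modeling.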
Acknowledgements
The author Ran sincerely thanks Prof. Kano Yutaka and Dr. Morikawa Kosuke of the Graduate School of Engineering Science at Osaka University for their generosity and for their extensive help during a difficult period in Ran's life.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Ran, M., Yang, Y. & Kano, Y. Robust semiparametric modeling of mean and covariance in longitudinal data. Jpn J Stat Data Sci 6, 625–648 (2023). https://doi.org/10.1007/s42081-023-00204-3
DOI: https://doi.org/10.1007/s42081-023-00204-3