Abstract
Analysis of data with nonignorable nonresponse is an important and challenging task. Although some methods have been developed for inference under nonignorable nonresponse, they are available only for independent data. In this paper, we develop a two-stage propensity score adjustment method to estimate longitudinal time series models with nonignorable missingness. First, the response probability, or propensity score, is estimated by solving the mean score equation based on the observed sample. Then, the inverse propensity scores are used as weights in a composite-likelihood-based estimation. The propensity-score-weighted estimating equations are shown to yield consistent and asymptotically normal estimators. Simulation studies and an application to AIDS Clinical Trial data are presented to evaluate the performance of the proposed method.
References
Bahari F, Parsi S, Ganjali M (2019) Empirical likelihood inference in general linear model with missing values in response and covariates by MNAR mechanism. Stat Pap. https://doi.org/10.1007/s00362-019-01103-0
Bickel PJ, Doksum KA (1977) Mathematical statistics: basic ideas and selected topics. Holden-Day, San Francisco
Binder DA (1983) On the variances of asymptotically normal estimators from complex surveys. Int Stat Rev 51(3):279–292
Brockwell PJ, Davis RA (1991) Time series: theory and methods, 2nd edn. Springer, New York
Davis RA, Yau CY (2011) Comments on pairwise likelihood in time series models. Stat Sin 21(1):255–277
Da Silva DN, Opsomer JD (2009) Nonparametric propensity weighting for survey nonresponse through local polynomial regression. Surv Methodol 35(2):165–176
Da Silva DN, Opsomer JD (2006) A kernel smoothing method of adjusting for unit non-response in sample surveys. Can J Stat 34(4):563–579
Eideh AAH, Nathan G (2006) Fitting time series models for longitudinal survey data under informative sampling. J Stat Plan Inference 136(9):3052–3069
Folsom RE (1991) Exponential and logistic weight adjustments for sampling and nonresponse error reduction. In: Proceedings of the Social Statistics Section, American Statistical Association, pp 197–202
Härdle W (1990) Applied nonparametric regression. Cambridge University Press, Boston
Jiang DP, Zhao PY, Tang NS (2016) A propensity score adjustment method for regression models with nonignorable missing covariates. Comput Stat Data Anal 94:98–119
Joe H, Lee Y (2009) On weighting of bivariate margins in pairwise likelihood. J Multivar Anal 100(4):670–685
Kim JK, Im J (2014) Propensity score adjustment with several follow-ups. Biometrika 101(2):439–448
Kim JK, Riddles MK (2012) Some theory for propensity-score-adjustment estimators in survey sampling. Surv Methodol 38(2):157–165
Kim JK (2011) Parametric fractional imputation for missing data analysis. Biometrika 98:119–132
Kim JK, Kim JJ (2007) Nonresponse weighting adjustment using estimated response probability. Can J Stat 35(4):501–514
Little RJA, Rubin DB (2002) Statistical analysis with missing data. Wiley, New York
Little RJA (1986) Survey nonresponse adjustments for estimates of means. Int Stat Rev 54(2):139–157
Little RJA (1988) Missing-data adjustments in large surveys. J Bus Econ Stat 6(3):287–296
Liu Q, Pierce DA (1994) A note on Gauss-Hermite quadrature. Biometrika 81(3):624–629
Liu T, Yuan X (2018) Doubly robust augmented-estimating-equations estimation with nonignorable nonresponse data. Stat Pap. https://doi.org/10.1007/s00362-018-1046-5
Ng CT, Joe H, Karlis D, Liu J (2011) Composite likelihood for time series models with a latent autoregressive process. Stat Sin 21(1):279–305
Priya RD, Kuppuswami S, Sivaraj R (2015) Bayesian based inference of missing time series values using Genetic Algorithm. Int J Hybrid Intell Syst 12:77–87
Qin J, Leung D, Shao J (2002) Estimation with survey data under nonignorable nonresponse or informative sampling. J Am Stat Assoc 97(457):193–200
Rosenbaum PR, Rubin DB (1983) The central role of the propensity score in observational studies for causal effects. Biometrika 70(1):41–55
Riddles MK, Kim JK, Im J (2016) A propensity-score-adjustment method for nonignorable nonresponse. J Surv Stat Methodol 4(2):215–245
Rubin DB (1976) Inference and missing data. Biometrika 63(3):581–592
Stubbendick AL, Ibrahim JG (2006) Likelihood-based inference with nonignorable missing responses and covariates in models for discrete longitudinal data. Stat Sin 16:1143–1167
Shao J, Wang L (2016) Semiparametric inverse propensity weighting for nonignorable missing data. Biometrika 103(1):175–187
Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London
Tang NS, Zhao PY, Zhu HT (2014) Empirical likelihood for estimating equations with nonignorably missing data. Stat Sin 24:723–747
Tseng CH, Elashoff R, Li N, Li G (2016) Longitudinal data analysis with non-ignorable missing data. Stat Methods Med Res 25(1):205–220
Van Der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge
Vasdekis VGS, Rizopoulos D, Moustaki I (2014) Weighted pairwise likelihood estimation for a general class of random effects models. Biostatistics 15(4):677–689
Wang L, Qi CC, Shao J (2019) Model-assisted regression estimators for longitudinal data with nonignorable dropout. Int Stat Rev 87(S1):S121–S138
Yuan Y, Yin GS (2010) Bayesian quantile regression for longitudinal studies with nonignorable missing data. Biometrics 66:105–114
Zhang H, Paik MC (2009) Handling missing responses in generalized linear mixed model without specifying missing mechanism. J Biopharm Stat 19(6):1001–1017
Zhang GY, Yuan Y (2012) Bayesian modelling longitudinal dyadic data with nonignorable dropout, with application to a breast cancer study. Ann Appl Stat 6(2):753–771
Zhang W, Xie F, Tan J (2020) A robust joint modeling approach for longitudinal data with informative dropouts. Comput Stat. https://doi.org/10.1007/s00180-020-00972-6
Zhao PY, Wang L, Shao J (2018) Analysis of longitudinal data under nonignorable nonmonotone nonresponse. Stat Interface 11(2):265–279
Zhou M, Kim JK (2012) An efficient method of estimation for longitudinal surveys with monotone missing data. Biometrika 99:631–648
Acknowledgements
The authors thank the Editor, the Associate Editor and referees for their constructive comments. The collaborative work described in this paper was supported by HKSAR-RGC-GRF Nos 14305517, 14601015 and 14302719 (Yau) and National Social Science Foundation of China, No. 18BTJ022 (Liu).
Appendix
Proof of the response model identifiability. The mean score function \(\bar{S}(\eta )\) is not identifiable if there exist distinct \(\eta \) and \(\eta '\) such that \(\bar{S}(\eta )=\bar{S}(\eta ')\) for all \(x_{i}\), \(\delta _{i,t}\) and \(y_{i,t}\). Specifically, the mean score function \(\bar{S}(\eta )\) in (3.7) consists of two parts, \(\sum \nolimits _{i=1}^{n}\sum \nolimits _{t=2}^{T}\delta _{i,t}s_{y_{i,t}}(\eta )\) and
The first part is not identifiable if and only if two different \(\eta \) and \(\eta '\) give the same \(s_{y_{i,t}}(\eta )=\{\delta _{i,t}-\pi _{i,t}(\eta )\}\pi ^{-1}_{i,t}(\eta )\{1-\pi _{i,t}(\eta )\}^{-1}\partial \pi _{i,t} (\eta )/\partial \eta \) for all possible values of \(x_{i}\), \(\delta _{i,t}\) and \(y_{i,t}\), where \(\pi _{i,t} (\eta )\) is defined by (2.2). Taking the derivative of \(\pi _{i,t}(\eta )\) with respect to \(\eta _{1}\) as an example, we have
Suppose that there exist \(\eta =(\eta _{1}, \eta _{2})\) and \(\eta '=(\eta '_{1}, \eta '_{2})\) such that
for all values of \(x_{i}\), \(\delta _{i,t}\) and \(y_{i,t}\). It is easy to see that \(\eta '_{1}=\eta _{1}\) must hold. Also, if \(\eta '_{1}=\eta _{1} \ne 0\), then by taking \(x_{i}\) sufficiently large, positive or negative, we have \(\delta _{i,t}\exp (-\eta _{1}x_{i}-\eta _{2}y_{i,t})+\delta _{i,t}-1=\delta _{i,t}\exp (-\eta '_{1}x_{i}-\eta '_{2}y_{i,t})+\delta _{i,t}-1\). It follows that \(\eta '_{2}=\eta _{2}\). That is, the first part is identifiable.
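As a worked illustration of the first part, suppose the response model (2.2), which is not reproduced in this appendix, has the logistic form \(\pi _{i,t}(\eta )=\{1+\exp (-\eta _{1}x_{i}-\eta _{2}y_{i,t})\}^{-1}\). This form is an assumption, chosen only because it matches the exponent \(-\eta _{1}x_{i}-\eta _{2}y_{i,t}\) appearing in the argument above. Under that assumption the \(\eta _{1}\)-component of the score would reduce as follows:

```latex
\begin{align*}
\pi_{i,t}(\eta) &= \{1+\exp(-\eta_{1}x_{i}-\eta_{2}y_{i,t})\}^{-1}
  && \text{(assumed form of (2.2))} \\
\frac{\partial \pi_{i,t}(\eta)}{\partial \eta_{1}}
  &= x_{i}\,\pi_{i,t}(\eta)\{1-\pi_{i,t}(\eta)\} \\
\{\delta_{i,t}-\pi_{i,t}(\eta)\}\,\pi^{-1}_{i,t}(\eta)\{1-\pi_{i,t}(\eta)\}^{-1}
  \frac{\partial \pi_{i,t}(\eta)}{\partial \eta_{1}}
  &= x_{i}\{\delta_{i,t}-\pi_{i,t}(\eta)\} \\
  &= x_{i}\,\pi_{i,t}(\eta)\{\delta_{i,t}\exp(-\eta_{1}x_{i}-\eta_{2}y_{i,t})+\delta_{i,t}-1\}
\end{align*}
```

The last line uses \(\pi _{i,t}(\eta )\{1+\exp (-\eta _{1}x_{i}-\eta _{2}y_{i,t})\}=1\), which is where the factor \(\delta _{i,t}\exp (-\eta _{1}x_{i}-\eta _{2}y_{i,t})+\delta _{i,t}-1\) compared in the argument comes from.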
The second part in (A1) is not identifiable if and only if two different \(\eta \) and \(\eta '\) give the same \(\pi ^{-1}_{y}(\eta )\partial \pi _{y}(\eta )/\partial \eta = \pi ^{-1}_{y}(\eta ')\partial \pi _{y}(\eta ')/\partial \eta \) and \(\pi ^{-1}_{y}(\eta )-1=\pi ^{-1}_{y}(\eta ')-1\) for all possible values of \(x_{i}\), \(\delta _{i,t}\) and \(y_{i,t}\), because \(\hat{f}(y|x_{i},\delta _{i,t}=1)\) in (A1) is obtained by (3.6) and does not depend on the parameter \(\eta \). By the same argument as for the first part, two different \(\eta \) and \(\eta '\) cannot produce the same second part. That is, the second part is also identifiable. Thus, the response model parameter \(\eta \) estimated by the mean score equation in (3.7) is identifiable.
Proof of Lemma 1
Note that
where \(z(x,y_{t};\eta )=\pi ^{-1}_{t}(\eta )\{1-\pi _{t}(\eta )\}^{-1}\partial \pi _{t}(\eta )/\partial \eta \), yielding (4.1). \(\square \)
Since \(u(\theta ; \eta )= \sum \nolimits _{k=1}^{K}\sum \nolimits _{t=k+1}^{T}\delta _{t-k}\delta _{t}\pi ^{-1}_{t-k}(\eta )\pi ^{-1}_{t}(\eta )U(\theta ;y_{t-k},y_{t})\), and the partial derivatives \(\partial U(\theta ;y_{t-k},y_{t})/ \partial \theta ^{T}\) of \(U(\theta ; y_{t-k},y_{t})\) with respect to \(\theta \) exist for \(k=1,\ldots , K\), Eq. (4.2) follows directly.
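The weighting in \(u(\theta ; \eta )\) can be illustrated numerically with a minimal sketch for \(K=1\). Everything specific below is an assumption made for illustration and is not taken from the paper: the series is a Gaussian AR(1) with coefficient `phi_true`, the response model is logistic in \(y_{t}\) with hypothetical parameters `eta0` and `eta1`, the true propensities stand in for the first-stage estimates \(\pi _{t}(\widehat{\eta })\), and the pairwise score \(U(\theta ;y_{t-1},y_{t})\) is swapped for the simpler least-squares-type estimating function \(y_{t-1}(y_{t}-\phi y_{t-1})\).

```python
import numpy as np


def expit(a):
    """Logistic function 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))


def simulate(n, T, phi, eta0, eta1, rng):
    """Simulate n stationary AR(1) series of length T with nonignorable response.

    The response indicator delta[i, t] depends on the possibly unobserved
    value y[i, t] itself (assumed logistic form, for illustration only).
    """
    y = np.empty((n, T))
    y[:, 0] = rng.normal(0.0, 1.0 / np.sqrt(1.0 - phi**2), size=n)
    for t in range(1, T):
        y[:, t] = phi * y[:, t - 1] + rng.normal(size=n)
    pi = expit(eta0 + eta1 * y)  # true response probabilities
    delta = (rng.random((n, T)) < pi).astype(float)
    return y, pi, delta


def lag1_estimate(y, delta, inv_prob):
    """Solve sum_t w_t * y_{t-1} * (y_t - phi * y_{t-1}) = 0 for phi, where
    w_t = delta_{t-1} * delta_t * inv_prob_{t-1} * inv_prob_t.

    Pairs with a missing member get weight zero, so unobserved values of y
    never enter the estimate even though the simulated array holds them.
    """
    y_prev, y_curr = y[:, :-1], y[:, 1:]
    w = delta[:, :-1] * delta[:, 1:] * inv_prob[:, :-1] * inv_prob[:, 1:]
    return np.sum(w * y_prev * y_curr) / np.sum(w * y_prev**2)


rng = np.random.default_rng(0)
phi_true = 0.5
y, pi, delta = simulate(n=50_000, T=5, phi=phi_true, eta0=0.5, eta1=1.0, rng=rng)

phi_ipw = lag1_estimate(y, delta, 1.0 / pi)          # inverse propensity weights
phi_cc = lag1_estimate(y, delta, np.ones_like(pi))   # unweighted complete pairs

print(f"true phi = {phi_true}, weighted = {phi_ipw:.3f}, unweighted = {phi_cc:.3f}")
```

In this setup the unweighted estimate uses only pairs that happen to be fully observed and is therefore affected by the selection on \(y_{t}\), whereas the weighted version targets the full-sample estimating equation. Replacing `1.0 / pi` with inverse estimated propensities would correspond to the second stage described in the abstract.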
Proof of Theorem 1
Taking expectation on \(\bar{S}(\eta )\), we have
That is, \(\bar{S}(\eta )\) is unbiased. It follows from standard asymptotic theory that \(\widehat{\eta }\) is a consistent estimator of \(\eta _{0}\). \(\square \)
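The unbiasedness step can be spelled out for a single time point. The sketch below assumes that \(s(\eta ;\delta _{t},x,y_{t})=\{\delta _{t}-\pi _{t}(\eta )\}z(x,y_{t};\eta )\), which is what the expression for \(s_{y_{i,t}}(\eta )\) in the identifiability proof and the definition of \(z\) in the proof of Lemma 1 give when combined, that the response model is correctly specified at \(\eta _{0}\), and that the true conditional distribution of \(Y_{t}\) given \((x,\delta _{t}=0)\) is used in place of its estimate.

```latex
\begin{align*}
E\{s(\eta_{0};\delta_{t},x,Y_{t}) \mid x, Y_{t}\}
  &= z(x,Y_{t};\eta_{0})\,\big[E(\delta_{t}\mid x,Y_{t})-\pi_{t}(\eta_{0})\big] = 0, \\
E\big[\delta_{t}\,s(\eta_{0};\delta_{t},x,Y_{t})
  +(1-\delta_{t})\,E\{s(\eta_{0};\delta_{t},x,Y_{t})\mid x,\delta_{t}=0\}\big]
  &= E\{s(\eta_{0};\delta_{t},x,Y_{t})\} = 0.
\end{align*}
```

The second line holds because \((1-\delta _{t})E\{s\mid x,\delta _{t}=0\}=E\{(1-\delta _{t})s\mid x,\delta _{t}\}\), so taking expectations recovers \(E\{(1-\delta _{t})s\}\).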
In fact, \(U_{PS}(\theta ; \widehat{\eta })\) is also unbiased because
Therefore, \(\widehat{\theta }_{PS}\) is a consistent estimator of \(\theta _{0}\).
Proof of Theorem 2
Let \(s(\eta )=\sum \limits _{t=2}^{T}\big \{\delta _{t}s(\eta ;\delta _{t},x,y_{t})+(1-\delta _{t})E[s(\eta ;\delta _{t},x,Y_{t})|x,\delta _{t}=0]\big \}\) and
We have
Since
the derivative \(\partial E[s(\eta ;\delta _{t},x,Y_{t})|x,\delta _{t}=0]/ \partial \eta ^{T}\) in Eq. (A8) can be written as
By \(\partial O(x,y_{t};\eta )/ \partial \eta ^{T} = - (\partial \pi (x,y_{t};\eta )/ \partial \eta ^{T}) / \pi ^{2}(x,y_{t};\eta ) = - O(x,y_{t};\eta )z^{T}(x,y_{t};\eta )\), we obtain
Combining equations (A8) and (A11), we have
Since
and
we then have
Let \(\mathcal {I}_{11}(\eta )= - E\big (\partial s(\eta )/ \partial \eta ^{T} \big )\). Since \(\widehat{\eta }\) is the solution to the mean score equation \(\bar{S}(\eta )=0\) in (3.3), we have
where
\(\bar{s}_{0}(\eta ;x)=E[s(\eta ;\delta _{t},x,Y_{t})|x,\delta _{t}=0]\) and \(\bar{z}_{0}(\eta ;x)=E[z^{T}(x,Y_{t};\eta )|x,\delta _{t}=0]\), completing the proof of Theorem 2. \(\square \)
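The identity for \(\partial O(x,y_{t};\eta )/\partial \eta ^{T}\) used in the proof of Theorem 2 can be checked in a few lines. The sketch assumes \(O(x,y_{t};\eta )=\pi ^{-1}(x,y_{t};\eta )-1\); that definition does not appear in this appendix, but it is consistent with the first equality stated in the proof and with the term \(\pi ^{-1}_{y}(\eta )-1\) in the identifiability argument. The vector \(z\) is written as the row vector \(z^{T}\) to match the \(\partial /\partial \eta ^{T}\) layout.

```latex
\begin{align*}
O(x,y_{t};\eta) &= \pi^{-1}(x,y_{t};\eta)-1
  = \frac{1-\pi(x,y_{t};\eta)}{\pi(x,y_{t};\eta)}
  && \text{(assumed definition)} \\
\frac{\partial O(x,y_{t};\eta)}{\partial \eta^{T}}
  &= -\frac{1}{\pi^{2}(x,y_{t};\eta)}\,
     \frac{\partial \pi(x,y_{t};\eta)}{\partial \eta^{T}} \\
O(x,y_{t};\eta)\,z^{T}(x,y_{t};\eta)
  &= \frac{1-\pi}{\pi}\cdot\frac{1}{\pi(1-\pi)}\,
     \frac{\partial \pi}{\partial \eta^{T}}
  = \frac{1}{\pi^{2}}\,\frac{\partial \pi}{\partial \eta^{T}}
\end{align*}
```

Comparing the last two lines gives \(\partial O/\partial \eta ^{T}=-O\,z^{T}\), with \(\pi \) abbreviating \(\pi (x,y_{t};\eta )\) in the final line.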
Proof of Theorem 3
Since \(\widehat{\psi } = (\widehat{\eta }, \widehat{\theta }_{PS})\) is the solution to
the variance of \(\widehat{\psi } = (\widehat{\eta }, \widehat{\theta }_{PS})\) can be obtained by
where \(\mathcal {I}=\mathcal {I}(\theta _{0}, \eta _{0})\) and
Further, we have
where \(\mathcal {I}_{11} = \mathcal {I}_{11}(\eta _{0})\), \(\mathcal {I}_{22} = \mathcal {I}_{22}(\theta _{0}, \eta _{0})\), \(\mathcal {I}_{21} = \mathcal {I}_{21}(\theta _{0}, \eta _{0})\). Combining equations (A19) and (A21), we have
It follows that
as \(n \rightarrow \infty \), where
That is, Theorem 3 holds. \(\square \)
Cite this article
Liu, Z., Yau, C.Y. A propensity score adjustment method for longitudinal time series models under nonignorable nonresponse. Stat Papers 63, 317–342 (2022). https://doi.org/10.1007/s00362-021-01261-0
DOI: https://doi.org/10.1007/s00362-021-01261-0