1 Introduction

In a typical clinical trial, patients are randomly assigned to groups receiving specific treatments, and each patient is expected to receive the assigned treatment throughout follow-up so that its effect on some outcome can be assessed. In practice, however, trials are rarely ideal: patients often fail to adhere to their assigned treatment and switch to another trial treatment. Such non-compliance with assigned treatments is a common feature of clinical trials.

Robins (1994) developed structural mean models (SMMs), which use randomization as an instrumental variable to cope with non-compliance without requiring a model for the mechanism of non-ignorable non-compliance (Rubin 1976). One attractive feature of SMMs is their modeling flexibility: they express the causal effect of received treatments as a function of treatments and covariates through a finite number of unknown causal parameters, without specifying the conditional expectation of potential outcomes under the control treatment. SMMs have now been proposed for continuous, discrete, and binary outcomes (Robins 1994; Vansteelandt and Goetghebeur 2003), and related structural distribution models have been developed for survival outcomes (Mark and Robins 1993; Loeys and Goetghebeur 2003).

To obtain a valid causal estimate, we must impose several assumptions. One of these is correct specification of the structural model, which can be checked numerically by evaluating the goodness of fit (GOF) of postulated SMMs to the data. For continuous outcomes, Comte et al. (2009) developed a test of the interaction between treatment and a baseline covariate, and Fischer et al. (2011) proposed a local GOF test that can detect linear, but not non-linear, effect modification by a covariate. Taguri et al. (2014) recently proposed a model selection criterion, an extension of Akaike’s information criterion (Akaike 1973), for evaluating the relative fit of candidate models using the expected Kullback–Leibler distance as a metric. However, none of these approaches provides a global GOF test that can detect arbitrary misspecifications of the assumed model structure.

In general, the validity of the estimating equations depends on whether the parametric part of the SMM is correctly specified. If the SMM is misspecified, the resulting estimating equations deviate in expectation from zero, yielding an inconsistent estimator. For valid inference, it is therefore desirable to assess the unbiasedness of the estimating equations. Diagnostic tools such as residuals have been widely used to assess the appropriateness of generalized linear models (Su and Wei 1991; Lin et al. 2002), but such methods cannot be applied to non-compliance data with an instrumental variable. The aim of this article is to develop a global GOF test for linear SMMs. The idea is to test the unbiasedness of the g-estimating equations (Robins 1994). Residual processes are constructed in the same spirit as Su and Wei (1991), Lin et al. (2002), Pan and Lin (2005), and Chen and Qin (2014). Under the null hypothesis that the g-estimating equations are unbiased, the residual processes fluctuate around zero; thus, a large absolute value of the residuals indicates model misspecification. Numerical studies show that the proposed test controls the type I error when the SMM is correctly specified. Furthermore, the proposed test detects non-linear effect modification by continuous covariates with high probability, whereas Fischer et al.’s test does not.

The remainder of this article is organized as follows. In Sect. 2, we briefly review SMMs and the g-estimation procedure. In Sect. 3, we review the method of Fischer et al. (2011) and propose a GOF test. In Sect. 4, we present a simulation study investigating the performance of the proposed test. In Sect. 5, we apply the proposed method to data from a randomized trial evaluating the impact of a primary care-based intervention on depression. Finally, in Sect. 6, we conclude with a discussion.

2 Structural mean models

We consider a randomized two-arm trial in which n patients are randomized to one of two treatments. Let R be the indicator of treatment assignment, equal to 1 (0) for the test (control) treatment. Let A indicate the treatment actually received (1: test, 0: control), let X be the vector of baseline covariates, and let Y be the continuous outcome measured at the end of the trial. We assume the observed data \( {\varvec{O}}_{i} = (R_{i}, {\varvec{X}}_{i}^{T}, A_{i}, Y_{i})^{T} \), i = 1,…, n, are n independent and identically distributed random vectors; we therefore omit the subscript i unless necessary. In contrast to the observed outcome variable Y, we define \( Y_{ra} \) with r, a = 0, 1 as the potential outcome (Rubin 1974) that would be observed if, possibly contrary to fact, R were set to r and A were set to a. We make the following three assumptions to estimate causal treatment effects:

(A1):

Stable Unit Treatment Value Assumption (SUTVA)

The potential outcome for each patient does not depend on the treatment assigned to, or the treatment actually received by, any other patient. SUTVA also implies the consistency assumption: a patient’s potential outcome under the treatment actually received is precisely his/her observed outcome. In notation, SUTVA implies that \( Y = RAY_{11} + R(1 - A)Y_{10} + (1 - R)AY_{01} + (1 - R)(1 - A)Y_{00}. \)

(A2):

Exclusion restriction

Treatment assignment affects the outcome only through its effect on treatment received. This assumption implies that \( Y_{ra} = Y_{a} \) for r, a = 0, 1. Under this assumption, \( Y_{11} = Y_{01} = Y_{1} \) is the potential outcome under the test treatment, while \( Y_{10} = Y_{00} = Y_{0} \) is that under the control treatment.

(A3):

Randomization assumption

The random assignment R and \( Y_{0} \) are conditionally independent given the baseline covariates X, i.e., \( Y_{0} \perp R \,|\, {\varvec{X}} \).

Furthermore, we assume that the average causal treatment effects follow linear SMMs (Robins 1994; Goetghebeur and Vansteelandt 2005):

$$ E[Y - Y_{0} |A,{\varvec{X}},R]\; = \;A{\varvec{Z}}({\varvec{X}},R)^{{\varvec{T}}} {\boldsymbol{\theta}}, $$
(1)

where Z(X, R) is a v-dimensional (v ≥ 1) vector that depends on (X, R), and θ is the unknown v-dimensional causal parameter vector of interest. Note that from (1), \( E[Y_{1} - Y_{0} | A = 1, {\varvec{X}}, R] = {\varvec{Z}}({\varvec{X}},R)^{T} {\boldsymbol{\theta}} \) is the effect of the treatment on the treated, conditional on the baseline covariates and the randomization indicator (X, R). For example, when \( {\varvec{Z}}({\varvec{X}},R)^{T} = (1, {\varvec{X}}^{T}) \), we allow for the possibility that the average causal effect on the treated is not constant across levels of X but changes linearly with X.

Because the full data \( (Y_{1}, Y_{0}, A, R, {\varvec{X}}^{T}) \) are only partially observed for each patient, regression methods for complete data cannot be used to fit model (1). However, from (1) and assumption A3, it follows that \( E[Y - A{\varvec{Z}}({\varvec{X}},R)^{{T}} {\boldsymbol{\theta}}|{\varvec{X}},R] = E[Y_{0} |{\varvec{X}},R] = E[Y_{0} |{\varvec{X}}]. \) Using this, a consistent estimator of θ can be obtained from a class of unbiased g-estimating functions (Robins 1994):

$$ {\boldsymbol{\psi}}({\boldsymbol{\theta}}) = (R - p){\varvec{w}}({\varvec{X}})\{ U({\boldsymbol{\theta}}) - q({\boldsymbol{X}})\} , $$
(2)

where \( p = \Pr [R = 1|{\varvec{X}}] = \Pr [R = 1] \) is the randomization probability, known by design, \( U({\boldsymbol{\theta}}) = Y - A{\varvec{Z}}({\varvec{X}},R)^{T} {\boldsymbol{\theta}} \), w(X) is a v-dimensional vector function, and q(X) is a scalar function. For given w(X) and q(X), a consistent estimator of θ (called the g-estimator) is obtained analytically by solving the g-estimating equations \( \sum\nolimits_{i = 1}^{n} {{\boldsymbol{\psi}}_{i} ({\boldsymbol{\theta}})} = 0, \) where \( {\boldsymbol{\psi}}_{i} ({\boldsymbol{\theta}}) \) is the i-th sample value of \( {\boldsymbol{\psi}}({\boldsymbol{\theta}}). \) The choices of w(X) and q(X) that lead to a semiparametric efficient estimator of θ were derived by Robins (1994). Under the homoscedasticity assumption that the error variance of the regression of U(θ) on (R, X) is constant, these choices are \( {\varvec{w}}_{{\text{opt}}} ({\varvec{X}}) = \delta_{{\text{opt}}} ({\varvec{X}})E[{\varvec{Z}}({\varvec{X}},R)|{\varvec{X}}] \) and \( q_{{\text{opt}}} ({\varvec{X}}) = E[U({\boldsymbol{\theta}})|{\varvec{X}}], \) where \( \delta_{{\text{opt}}} ({\varvec{X}}) = \Pr [A = 1|R = 1,{\varvec{X}}] - \Pr [A = 1|R = 0,{\varvec{X}}] \) is called the compliance score (Joffe and Brensinger 2003). The compliance score upweights participants, characterized by X, for whom the effect of treatment assignment on the treatment received is large, and who thus contribute more information for estimating the effect of the treatment on the outcome. Since the optimal choices are unknown functions of X, parametric models are often assumed for δ opt(X) in w opt(X) and for q opt(X). In our simulation and data analysis, we estimate Pr[A = 1|R, X] in δ opt(X) by logistic regression and assume q opt(X) is linear in X, which yields a closed-form estimator \( {\hat{\boldsymbol{\theta }}} \) (Fischer et al. 2011).
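As a concrete illustration, the following Python sketch (using numpy) computes the g-estimator for the simplest one-parameter SMM with Z = 1, taking the non-optimal working choices w(X) = 1 and q(X) = 0; the estimating equation then has the familiar closed-form instrumental-variable (Wald) solution. The data-generating model here is purely hypothetical and is chosen only to mimic non-ignorable non-compliance.

```python
import numpy as np

rng = np.random.default_rng(0)

def g_estimate(Y, A, R, p=0.5):
    """Closed-form g-estimator for the one-parameter SMM
    E[Y - Y0 | A, X, R] = A * theta, taking w(X) = 1 and q(X) = 0
    (not the efficient choices). Solving
    sum_i (R_i - p) * (Y_i - A_i * theta) = 0 for theta gives the
    instrumental-variable (Wald) ratio below."""
    return np.sum((R - p) * Y) / np.sum((R - p) * A)

# Hypothetical toy data: an unobserved confounder u drives
# non-ignorable non-compliance.
n = 20000
u = rng.normal(size=n)                                     # confounder
R = rng.binomial(1, 0.5, size=n)                           # randomization
A = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 3 * R + u))))   # received treatment
Y = 2.0 * A + u + rng.normal(size=n)                       # true theta = 2

theta_hat = g_estimate(Y, A, R)              # consistent for theta = 2
naive = Y[A == 1].mean() - Y[A == 0].mean()  # confounded as-treated contrast
```

The naive as-treated contrast is biased upward here because the confounder u raises both the probability of taking treatment and the outcome, while the g-estimator removes this bias by using R as an instrument.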
A consistent variance estimator of \( {\hat{\boldsymbol{\theta }}} \) is obtained as \( n^{ - 1} \hat{\Omega }({\hat{\boldsymbol{\theta }}})^{ - 1} \hat{\Lambda }({\hat{\boldsymbol{\theta }}})(\hat{\Omega }({\hat{\boldsymbol{\theta }}})^{ - 1} )^{{\mathbf{T}}} , \) where \( \Omega ({\boldsymbol{\theta}}) = - E[\partial {\boldsymbol{\psi}}({\boldsymbol{\theta}})/\partial {\boldsymbol{\theta}}^{{\varvec{T}}} ] \), \( \Lambda ({\boldsymbol{\theta}})\; = \text{var} [{\boldsymbol{\psi}}({\boldsymbol{\theta}})]. \)

3 Goodness of fit tests for structural mean models

3.1 Goodness of fit test proposed by Fischer et al. (2011)

Before presenting our method for assessing the fit of the SMM (1), we briefly review the GOF test proposed by Fischer et al. (2011). Their method is based on the fact that if model (1) is correctly specified, then the expected “treatment-free” outcomes \( U({\hat{\boldsymbol{\theta }}}) \) in the two arms R = 1 and R = 0 have the same regression function on X. The GOF test is conducted using the following linear regression model for \( U({\hat{\boldsymbol{\theta }}}) \) on (X, R):

$$ E[U({\hat{\boldsymbol{\theta }}})|{\varvec{X}},R]\; = \beta_{0} + {\boldsymbol{\beta}}_{ 1}^{{\varvec{T}}} {\varvec{X}}\; + \;\beta_{2} R\; + \;{\boldsymbol{\beta}}_{ 3}^{{\varvec{T}}} R{\varvec{X}}. $$
(3)

To assess whether the model fits well, one tests the null hypothesis \( H_{0} :{\varvec{\beta}}_{3} = \bf{0}. \) If the interaction terms are significant (that is, H 0 is rejected), there is evidence of lack of fit. Note that this GOF test cannot detect non-linear effect modification by X, because (3) includes only linear terms in X. We additionally note that model (3) itself will usually be misspecified. To see this, let θ * be the true value of θ. For an arbitrary θ, the following equation holds under the SMM (1):

$$ E[U({\boldsymbol{\theta}})|{\varvec{X}},R] = E[Y_{0} |{\varvec{X}}] + E[A|{\varvec{X}},R]{\varvec{Z}}({\varvec{X}},R)^{{\varvec{T}}} ({\boldsymbol{\theta}}_{\boldsymbol{*}} - {\varvec{\theta}}). $$
(4)

The third term of (4) with \( {\boldsymbol{\theta}} = {\hat{\boldsymbol{\theta }}} \) will generally be nonlinear in X when A is binary, unless \( {\hat{\boldsymbol{\theta }}} = {\boldsymbol{\theta}}_{\boldsymbol{*}} \) holds. In such cases, (3) is a misspecified model. This misspecification could affect the power of the GOF test, although the size of the test should be asymptotically equal to the nominal level, because the third term of (4) with \( {\boldsymbol{\theta}} = {\hat{\boldsymbol{\theta }}} \) has asymptotically zero expectation under the correct specification of (1).
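The procedure of Fischer et al. can be sketched as follows for a scalar covariate and the main-effect SMM (Z = 1). This is an illustrative numpy reimplementation under simplifying assumptions, not the authors' code; it fits (3) by ordinary least squares and forms a Wald statistic for the single interaction coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)

def fischer_gof_wald(Y, A, R, X, theta_hat):
    """Sketch of the Fischer et al. (2011) check for scalar X and the
    main-effect SMM: regress the 'treatment-free' outcome
    U = Y - A * theta_hat on (1, X, R, R*X) by OLS and return the Wald
    statistic for the interaction coefficient (H0: beta_3 = 0)."""
    U = Y - A * theta_hat
    D = np.column_stack([np.ones_like(X), X, R, R * X])
    beta, *_ = np.linalg.lstsq(D, U, rcond=None)
    resid = U - D @ beta
    sigma2 = resid @ resid / (len(U) - D.shape[1])    # residual variance
    cov = sigma2 * np.linalg.inv(D.T @ D)             # OLS covariance
    return beta[3] / np.sqrt(cov[3, 3])               # approx N(0,1) under H0

# Under a correctly specified main-effect SMM, the interaction is null
# and the Wald statistic should be modest.
n = 5000
X = rng.normal(size=n)
R = rng.binomial(1, 0.5, size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 4 * R + X))))
Y = 3 * X + 2.0 * A + rng.normal(size=n)
wald = fischer_gof_wald(Y, A, R, X, theta_hat=2.0)
```

Comparing |wald| with a standard normal quantile (e.g., 1.96 for a two-sided 5 % test) gives the local GOF decision; as noted above, a quadratic effect modification that is orthogonal to the linear interaction term can slip past this check.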

3.2 Proposed goodness of fit test

Rather than assuming a parametric model for \( U({\hat{\boldsymbol{\theta }}}) \) such as (3), we construct a GOF test in the spirit of Su and Wei (1991), Lin et al. (2002), Pan and Lin (2005), and Chen and Qin (2014). The idea is to test the unbiasedness of the g-estimating equations under the correct model specification. To check the validity of the assumed SMM (1), we consider the following statistic:

$$ V_{n} ({\varvec{x}}) = n^{ - 1/2} \sum\limits_{i = 1}^{n} {I({\varvec{X}}_{i} \le {\varvec{x}})(R_{i} - p)\delta_{{\text{opt}}} ({\varvec{X}}_{i} )\{ U_{i} ({\hat{\boldsymbol{\theta }}}) - q_{{\text{opt}}} ({\varvec{X}}_{i} )\} } , $$
(5)

where x is a real-valued vector of the same dimension as X, and the indicator inequality is interpreted componentwise. Under the null hypothesis that the SMM (1) is correctly specified, (5) has zero expectation for all values of x. Thus, a large value of the omnibus test statistic \( G_{n} = \sup_{{\varvec{x}}} |V_{n} ({\varvec{x}})| \) leads to the conclusion of model misspecification.
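For a scalar covariate, V n (x) only changes value at the observed X i, so it suffices to evaluate the process at the order statistics of X. The following numpy sketch implements the simple working variant with δ opt = 1 and q opt = 0 (called V 1n in the simulation section below), together with a quick illustration on hypothetical data contrasting a correctly specified and a misspecified main-effect SMM.

```python
import numpy as np

def residual_process(X, R, U, p=0.5):
    """Cumulative-sum residual process V_n for scalar X, evaluated at the
    observed covariate values (where it jumps), with delta_opt = 1 and
    q_opt = 0. Returns the process and G_n = max over observed x of |V_n|."""
    n = len(U)
    order = np.argsort(X)
    V = np.cumsum(((R - p) * U)[order]) / np.sqrt(n)
    return V, np.max(np.abs(V))

# Hypothetical data: compare a correctly specified main-effect SMM with
# a quadratic effect modification that the main-effect model misses.
rng = np.random.default_rng(2)
n = 10000
X = rng.normal(size=n)
R = rng.binomial(1, 0.5, size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 4 * R + X))))
eps = rng.normal(size=n)

def fit_and_Gn(Y):
    # main-effect g-estimate (w = 1, q = 0), then the omnibus statistic
    theta = np.sum((R - 0.5) * Y) / np.sum((R - 0.5) * A)
    _, G = residual_process(X, R, Y - A * theta)
    return G

G_null = fit_and_Gn(3 * X + 2.0 * A + eps)              # SMM correct
G_quad = fit_and_Gn(3 * X + A * (-2 + 4 * X**2) + eps)  # SMM misspecified
```

Under misspecification, the process drifts away from zero at intermediate values of x even though the full-sample estimating equation is solved exactly, which is what the supremum statistic exploits.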

To construct a GOF test, we need the distribution of V n (x). The cumulative-sum process V n (x) converges in distribution to a zero-mean Gaussian process under the null hypothesis that the SMM (1) is correctly specified. A Taylor expansion of (5) in \( {\hat{\boldsymbol{\theta }}} \) around the true value \( {\boldsymbol{\theta}}_{{\boldsymbol{*}}} \) gives

$$ V_{n} ({\varvec{x}}) \approx n^{ - 1/2} \sum\limits_{i = 1}^{n} {I({\varvec{X}}_{i} \le {\varvec{x}})(R_{i} - p)\delta_{{\text{opt}}} ({\varvec{X}}_{i} )\{ U_{i} ({\boldsymbol{\theta}}_{{\boldsymbol{*}}} ) - q_{{\text{opt}}} ({\varvec{X}}_{i} )\} } \; + \;{\boldsymbol{\eta}}({\varvec{x}};{\boldsymbol{\theta}}_{{\boldsymbol{*}}} )^{T} n^{1/2} ({\hat{\boldsymbol{\theta }}} - {\boldsymbol{\theta}}_{{\boldsymbol{*}}} ), $$

where \( {\boldsymbol{\eta}}({\varvec{x}};{\boldsymbol{\theta}}) = n^{ - 1} \sum\limits_{i = 1}^{n} {I({\varvec{X}}_{i} \le {\varvec{x}})(R_{i} - p)\delta_{{\text{opt}}} ({\varvec{X}}_{i} )\,\partial U_{i} ({\boldsymbol{\theta}})/\partial {\boldsymbol{\theta}}} \) and \( A \approx B \) means A − B = o p (1). A similar Taylor expansion gives

$$ n^{1/2} ({\hat{\boldsymbol{\theta }}} - {\boldsymbol{\theta}}_{{\boldsymbol{*}}} ) \approx \Omega ({\boldsymbol{\theta}}_{{\boldsymbol{*}}} )^{ - 1} n^{ - 1/2} \sum\limits_{i = 1}^{n} {{\boldsymbol{\psi}}_{i} ({\boldsymbol{\theta}}_{{\boldsymbol{*}}} )} . $$
(6)

Thus, substituting (6) into the expansion above, we have

$$ V_{n} ({\varvec{x}}) \approx n^{ - 1/2} \sum\limits_{i = 1}^{n} {[I({\varvec{X}}_{i} \le {\varvec{x}})(R_{i} - p)\delta_{{\text{opt}}} ({\varvec{X}}_{i} )\{ U_{i} ({\boldsymbol{\theta}}_{{\boldsymbol{*}}} ) - q_{{\text{opt}}} ({\boldsymbol{X}}_{i} )\} } \; + \;{\boldsymbol{\eta}}({\varvec{x}};{\boldsymbol{\theta}}_{{\boldsymbol{*}}} )^{T} \Omega ({\boldsymbol{\theta}}_{{\boldsymbol{*}}} )^{ - 1} {\boldsymbol{\psi}}_{i} ({\boldsymbol{\theta}}_{{\boldsymbol{*}}} )]. $$
(7)

Although the explicit form of the null distribution of (7) is hard to obtain, Su and Wei (1991) proposed a simulation-based method to approximate it. The idea is as follows. Suppose that S 1,…, S n are independent and identically distributed variables with μ = E[S] = 0 and σ 2 = E[S 2] < ∞; then, by the central limit theorem, \( n^{ - 1/2} \sum\nolimits_{i = 1}^{n} {S_{i} } \to N\left( {0,\sigma^{ 2} } \right) \). Let Z 1,…, Z n be independent standard normal random variables. Then, conditional on the original data S 1,…, S n , \( n^{ - 1/2} \sum\nolimits_{i = 1}^{n} {Z_{i} S_{i} } \sim N(0,n^{ - 1} \sum\nolimits_{i = 1}^{n} {S_{i}^{2} } ) \to N\left( {0,\sigma^{ 2} } \right) \). That is, \( n^{ - 1/2} \sum\nolimits_{i = 1}^{n} {Z_{i} S_{i} } \) has the same asymptotic distribution as \( n^{ - 1/2} \sum\nolimits_{i = 1}^{n} {S_{i} } \). Using these results, for large n, the null distribution of V n (x) is approximated by that of

$$ \tilde{V}_{n} ({\varvec{x}}) \approx n^{ - 1/2} \sum\limits_{i = 1}^{n} {[I({\varvec{X}}_{i} \le {\varvec{x}})(R_{i} - p)\delta_{{\text{opt}}} ({\varvec{X}}_{i} )\{ U_{i} ({\hat{\boldsymbol{\theta }}}) - q_{{\text{opt}}} ({\varvec{X}}_{i} )\} } + {\boldsymbol{\eta}}({\varvec{x}};{\hat{\boldsymbol{\theta }}})^{T} \hat{\Omega }({\hat{\boldsymbol{\theta }}})^{ - 1} {\boldsymbol{\psi}}_{i} ({\hat{\boldsymbol{\theta }}})]\,Z_{i} , $$
(8)

where (Z 1,…, Z n ) is a random sample from N(0,1). To approximate the null distribution of V n (x), we generate a large number of samples (Z 1,…, Z n ) from N(0,1) while fixing the data at their observed values.
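The resampling scheme can be sketched as follows. For brevity, this numpy illustration again takes δ opt = 1 and q opt = 0 and omits the correction term \( {\boldsymbol{\eta}}^{T} \Omega^{-1} {\boldsymbol{\psi}}_{i} \) that accounts for the estimation of θ, so it is a simplified approximation of the full procedure in (8), demonstrated on hypothetical data.

```python
import numpy as np

def multiplier_pvalue(X, R, U, p=0.5, B=200, rng=None):
    """Approximate p-value for G_n = sup_x |V_n(x)| by the multiplier
    method: perturb each subject's residual term by an independent N(0,1)
    draw while holding the data fixed, then compare the observed statistic
    with the resampled ones. Simplified sketch with delta_opt = 1,
    q_opt = 0, and no estimation-of-theta correction term."""
    rng = rng or np.random.default_rng()
    n = len(U)
    terms = ((R - p) * U)[np.argsort(X)]          # per-subject residual terms
    G_obs = np.max(np.abs(np.cumsum(terms))) / np.sqrt(n)
    Z = rng.standard_normal((B, n))               # multipliers, B replicates
    G_star = np.max(np.abs(np.cumsum(terms * Z, axis=1)), axis=1) / np.sqrt(n)
    return G_obs, np.mean(G_star >= G_obs)

# Hypothetical strongly misspecified example: quadratic effect modification
# analyzed with a main-effect SMM should yield a small p-value.
rng = np.random.default_rng(3)
n = 10000
X = rng.normal(size=n)
R = rng.binomial(1, 0.5, size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 4 * R + X))))
Y = 3 * X + A * (-4 + 8 * X**2) + rng.normal(size=n)
theta = np.sum((R - 0.5) * Y) / np.sum((R - 0.5) * A)   # main-effect g-estimate
G_obs, pval = multiplier_pvalue(X, R, Y - A * theta, rng=rng)
```

In practice the correction term matters for size control, so this sketch should be read as a schematic of the multiplier idea rather than as the full test.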

4 A simulation study

In this section, the performance of the proposed method is evaluated via a simulation study. Data (R, X, A, Y) were generated as follows. Let X be distributed as N(0,1) and the treatment assignment R be generated from Bernoulli(0.5). Next, the received treatment A was assigned according to the logistic model \( \text{logit} [\Pr (A = 1|R,X,\gamma )] = - 1 + 4R + X + \gamma , \) where γ follows N(0,0.25). Then, the outcome Y was generated from \( N(3X + A(k_{0} + k_{1} X + k_{2} X^{2} ) + 0.5\gamma ,\;0.25) \). This leads to the true SMM \( E[Y_{1} - Y_{0} |A = 1,X] = k_{0} + k_{1} X + k_{2} X^{2} \). The shared random effect γ gives rise to non-ignorable non-compliance. We set (k 0, k 1, k 2) = (3,0,0) for no effect modification by X; (3,0.1,0), (3,0.2,0), (3,0.3,0), (3,0.4,0), (3,0.5,0), (3,2,0), (3,5,0) for linear effect modification; and (−0.2,0,0.4), (−0.4,0,0.8), (−0.6,0,1.2), (−0.8,0,1.6), (−1,0,2), (−2,0,4), (−4,0,8) for quadratic effect modification. We set the sample size n = 500 and ran 1000 simulations for each setting.
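Reading N(m, 0.25) as mean m and variance 0.25 (an assumption about the paper's notation), one replication of this design can be generated with the following numpy sketch:

```python
import numpy as np

def simulate(n=500, k=(3.0, 0.0, 0.0), rng=None):
    """One data set from the simulation design above. The shared random
    effect gamma enters both the compliance model and the outcome,
    inducing non-ignorable non-compliance. N(m, 0.25) is read as mean m,
    variance 0.25 (standard deviation 0.5)."""
    rng = rng or np.random.default_rng()
    k0, k1, k2 = k
    X = rng.normal(size=n)
    R = rng.binomial(1, 0.5, size=n)
    gamma = rng.normal(scale=0.5, size=n)                  # N(0, 0.25)
    pA = 1 / (1 + np.exp(-(-1 + 4 * R + X + gamma)))       # logistic model
    A = rng.binomial(1, pA)
    Y = rng.normal(3 * X + A * (k0 + k1 * X + k2 * X**2) + 0.5 * gamma,
                   scale=0.5)
    return X, R, A, Y

# Instrument strength: the difference in treatment uptake between arms.
X, R, A, Y = simulate(n=20000, rng=np.random.default_rng(4))
compliance_gap = A[R == 1].mean() - A[R == 0].mean()
```

The large coefficient on R in the compliance model makes randomization a strong instrument, which is what allows the g-estimating equations to identify θ with reasonable precision at n = 500.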

For the analysis of the simulated data, we assumed the main-effect model \( E\left[ {Y_{ 1} - Y_{0} |A = 1,X} \right] = \theta \). We investigated four GOF tests: (i) Fischer: Fischer et al.’s GOF test; (ii) V 1n (x): the proposed GOF test with \( V_{n} (x) = n^{ - 1/2} \sum\nolimits_{i = 1}^{n} {I(X_{i} \le x)(R_{i} - p)U_{i} (\hat{\theta })} ; \) (iii) V 2n (x): the proposed GOF test with \( V_{n} (x) = n^{ - 1/2} \sum\nolimits_{i = 1}^{n} {I(X_{i} \le x)(R_{i} - p)\hat{\delta }_{{\text{opt}}} (X_{i} )U_{i} (\hat{\theta })} ; \) (iv) V 3n (x): the proposed GOF test with \( V_{n} (x) = n^{ - 1/2} \sum\nolimits_{i = 1}^{n} {I(X_{i} \le x)(R_{i} - p)\hat{\delta }_{{\text{opt}}} (X_{i} )\{ U_{i} (\hat{\theta }) - \hat{q}_{{\text{opt}}} (X_{i} )\} } . \) For each test, the two-sided significance level was set at 0.05.

Table 1 summarizes the empirical rejection probabilities of the four methods. In the case of no effect modification, all four GOF tests kept the nominal significance level. For linear effect modification, the power of all tests increased with the strength of the true effect. Among the four methods, Fischer et al.’s test performed best in terms of empirical power, although the power of the proposed test with V 3n (x) was only slightly lower. This is not surprising, because our GOF test is an omnibus test based on a Kolmogorov-type statistic. For quadratic effect modification, the power of the proposed tests also increased with the strength of the true effect. In contrast, the power of Fischer et al.’s test did not increase monotonically with the strength of the true effect. Among the three statistics for the proposed method, V 3n (x) performed by far the best, which indicates that using the optimal nuisance functions δ opt (X) and q opt (X) as described in Sect. 3.2 is important for the good performance of the proposed test.

Table 1 Empirical size and power of the GOF tests

5 Application

We now apply the proposed method to data derived from PROSPECT (Prevention of Suicide in Primary Care Elderly: Collaborative Trial) (Bruce and Pearson 1999; Bruce et al. 2004). Data are available at http://research.bmh.manchester.ac.uk/biostatistics/research/data. PROSPECT was a multi-site, prospective, randomized trial designed to evaluate the impact of a primary care-based intervention on reducing major risk factors (including depression) for suicide in later life. Participants were recruited from 20 primary care practices in the New York City, Philadelphia, and Pittsburgh regions. Ten pairs of practices were matched by region (urban vs suburban/rural), affiliation, size, and population type. Within these 10 pairs, practices were randomly allocated to one of two conditions: (a) an intervention based on treatment guidelines tailored for the elderly, with care management including antidepressant medication (R = 1), or (b) treatment as usual (R = 0). For illustration, we analyzed the data as if interventions were randomly assigned at the individual level. We use these data to assess the effect of antidepressant medication (A = 1: presence; A = 0: absence) on the change in the Hamilton Depression Rating Scale (HDRS) (Hamilton 1960) score from baseline to four months after randomization (Y). We use the baseline HDRS score as a baseline covariate (X), centered at the mean of the entire sample for estimation of the SMMs.

Table 2 summarizes the analysis results. We started with an intention-to-treat (ITT) analysis, which indicated that the HDRS score at four months was significantly lower in the intervention group than in the control group (ITT effect \( = \hat{E}[Y|R = 1] - \hat{E}[Y|R = 0] = - 3.62 \), 95 % confidence interval: −5.29 to −1.95). However, those who did not comply with the assigned treatment comprised 15.2 % (22/145) of the intervention group and 45.4 % (69/152) of the control group. Thus, the ITT effect would substantially underestimate the true causal effect of the treatment (that is, antidepressant medication). We then applied the following two SMMs for estimation of the causal treatment effect on the treated: (i) a one-parameter SMM including the main effect only, that is, Z(X, R) = 1; (ii) a two-parameter SMM allowing effect modification by X, that is, Z(X, R) = (1, X). Because the baseline covariate X was centered, the main-effect parameter of model (ii) is interpreted as the treatment effect at the mean value of the covariate. As shown in Table 2, the two SMMs gave much larger effect estimates than the ITT analysis did, as expected. The estimate from the two-parameter SMM suggested that the treatment effect was slightly larger for those with higher baseline HDRS scores, although this effect modification was not statistically significant. We then applied the proposed GOF test using the statistic V 3n (x) of Sect. 4, as well as the test proposed by Fischer et al. (2011). For both methods, the p value of the GOF test was large for the one-parameter model (i), indicating a good fit of the main-effect model. No difference was observed between the two GOF tests in this analysis. As noted in Taguri et al. (2014), the larger model (the two-parameter model) gave a larger p value for Fischer et al.’s test.

Table 2 Summary statistics and estimation results in the suicide prevention trial (PROSPECT)

6 Discussion

In this article, we have proposed a new global GOF test for the parametric part of SMMs. The proposed model-checking method is an objective and informative approach for numerically checking the functional form of the covariates in an SMM. Simulation studies demonstrate that the proposed test works well in terms of type I error and power for both linear and non-linear effect modification.

Although SMMs and g-estimation always provide a valid test of the no treatment effect in the presence of non-compliance (Robins 1994), the correct model specification is a fundamental assumption for consistently estimating the causal treatment effect. In this regard, assessing the GOF of the candidate SMMs is very important. Our GOF test and the model selection criterion proposed by Taguri et al. (2014) can be used as complementary approaches, with the GOF test evaluating the overall fit and the model selection criterion evaluating the relative fit of candidate models.

SMMs have been extended to handle repeated measures over time as structural nested mean models (Robins 1994), and related structural distribution models have been developed for survival outcomes (Mark and Robins 1993; Loeys and Goetghebeur 2003). Recently, Wallace et al. (2016) proposed a model assessment technique that can detect misspecification of nuisance functions in SMMs for dynamic treatment regimens, exploiting the double robustness property in observational studies. It would be interesting to investigate how to extend our method to these settings.