1 Introduction

Static and dynamic binary choice models are widely employed in microeconometric applications.Footnote 1 For these models, the fixed-effects approach is often advocated, as it allows for the estimation of partial effects of covariates that may be correlated with the individual-specific unobserved heterogeneity in a nonparametric manner. However, unless the number of time occasions T goes to infinity, the maximum likelihood (ML) estimator of fixed-effects binary choice models is inconsistent due to the incidental parameters problem, that is, the presence of nuisance parameters whose number increases with the sample size (Lancaster 2000; Neyman and Scott 1948).

A popular method to overcome the incidental parameters problem is based on the conditional inference approach for the fixed-effects logit model (Andersen 1970; Chamberlain 1980), which admits sufficient statistics for the individual intercepts. The conditional ML (CML) method produces a fixed-T consistent estimator of the slope parameters, which makes it a particularly attractive strategy given the wide availability of data sets, such as those produced by national household and workforce surveys, that are based on a rotating sampling scheme. However, a drawback of the CML approach is that plug-in estimates of the average partial effects (APEs) are not directly available, as the parameters for the individual effects are eliminated.

An alternative way to deal with the incidental parameters problem is to correct the bias of the ML estimator of the slope parameters. While the resulting estimator will not be fixed-T consistent, this approach is general and can be applied to any nonlinear model, as opposed to the CML strategy, which is specific to the logit model. Several procedures have been proposed to reduce the order of the bias of the ML estimator from \(O(T^{-1})\) to \(O(T^{-2})\). Analytical bias corrections are provided by Fernández-Val (2009), whose derivations are based on general results for static (Hahn and Newey 2004) and dynamic (Hahn and Kuersteiner 2011) nonlinear panel data models. In the same vein, a modified log-likelihood function and score equation are proposed by Bester and Hansen (2009) and Carro (2007), respectively, to achieve the same bias reduction. An alternative bias correction method relies on the panel jackknife. A general procedure for nonlinear static panel data models is proposed by Hahn and Newey (2004), whereas a split-panel jackknife estimator is developed by Dhaene and Jochmans (2015) for dynamic models.

The advantage of relying on ML estimates is that plug-in estimators of the APEs are readily available. The estimators of the APEs share the same order of bias as the ML estimator, and the corresponding corrections can be applied in a similar manner. However, especially in the case of dynamic binary choice models, the ML estimator is known to be severely biased (Heckman 1981) and the bias-corrected ML estimator exhibits a greater distortion compared to fixed-T consistent ones with short T (Carro 2007; Bartolucci and Nigro 2012). Practitioners are also usually more familiar with the CML approach, as it has become prominent in the applied literature, perhaps due to its long-standing presence in graduate textbooks (Cameron and Trivedi 2005; Hsiao 2005; Wooldridge 2010) and its built-in implementation in popular microeconometrics software.

In this paper, we propose a multiple-step procedure to estimate the APEs in the fixed-effects logit model that combines the conditional inference approach with a bias reduction method. The APEs are evaluated at the fixed-T consistent CML estimator of the slope parameters and at the ML estimator for the unobserved heterogeneity, obtained by maximizing the log-likelihood evaluated at the CML estimate. Plugging in the estimated fixed effects produces an additional source of bias in the APEs; we reduce the order of this bias from \(O(T^{-1})\) to \(O(T^{-2})\) by applying the analytical correction proposed by Fernández-Val (2009). In this respect, our approach extends the work of Stammann et al. (2016), who study the same plug-in estimator without the bias correction.

The proposed procedure cannot be directly extended to the dynamic logit model (Hsiao 2005), for which CML inference for the slope parameters is not viable in a simple form. This is overcome by Bartolucci and Nigro (2010), who propose a quadratic exponential (QE) formulation (Cox 1972) for dynamic binary panel data models, which has the advantage of admitting sufficient statistics for the individual intercepts. Furthermore, Bartolucci and Nigro (2012) propose a QE model that closely approximates the dynamic logit model, the parameters of which can easily be estimated by pseudo CML (PCML). The resulting PCML estimator is consistent in the absence of state dependence, because in this case the QE model corresponds to the dynamic logit model and, as shown by Bartolucci and Nigro (2012) by simulation, it otherwise exhibits a moderate bias. We therefore extend the proposed procedure to include PCML estimates in the APEs when a dynamic logit is specified.

As with the APE estimators based on analytical and jackknife corrections, the proposed method reduces the order of the bias from \(O(T^{-1})\) to \(O(T^{-2})\). However, such a bias is asymptotically negligible under rectangular array asymptotics, as plug-in average-effect estimators converge at the rate \(n^{-1/2}\) (Dhaene and Jochmans 2015), where n is the sample size. Yet, in spite of the asymptotic equivalence of bias-corrected and ML plug-in APE estimators, the simulation evidence provided by Dhaene and Jochmans (2015) suggests that operating some bias reduction entails a non-negligible improvement in small samples, especially with small values of T.

The proposed combination of the conditional inference approach with bias reduction provides a way to readily obtain APE estimates for the fixed-effects static and dynamic logit models. By means of an extensive simulation study, we show that the proposed approach has finite sample performance comparable to the ML and bias-corrected estimators with the static logit model, while it outperforms them when the dynamic logit is considered, especially when n and T are small. This is the result of plugging a fixed-T consistent estimator of the slope and state dependence parameters of the QE model into the APEs.

It is worth clarifying that while the CML and PCML estimators are fixed-T consistent, the asymptotic theory for the APE estimator proposed here requires both \(n,T \rightarrow \infty \), due to the presence of the ML estimates of the individual intercepts and, consequently, of the bias correction, which is derived by means of a large-T asymptotic expansion.Footnote 2 This means that a few typical drawbacks of this setting theoretically apply here as well: time effects are ruled out, as they too are incidental parameters when \(T \rightarrow \infty \), and the remaining covariates are required to be stationary (Dhaene and Jochmans 2015; Fernández-Val 2009). However, by means of dedicated simulation exercises, we show that the proposed approach is able to handle the presence of time dummies in the model specification and that violations of the stationarity assumption do not seem to affect, in practice, the finite-sample performance of any of the approaches considered.

The rest of the paper is organized as follows: In Sect. 2, we briefly discuss how the incidental parameters problem affects the APE estimator; in Sect. 3, we recall the bias correction strategies for APE estimators and then illustrate the proposed methodology and its extension to accommodate the dynamic logit model; in Sect. 4, we investigate by simulation the finite sample performance of the proposed estimator, compare it with the panel jackknife and analytical bias correction strategies, and illustrate the results of some robustness exercises; in Sect. 5, we provide a real data application based on the labour supply of married women; finally, Sect. 6 provides the main conclusions.

2 Average partial effects and the incidental parameters problem

We consider n units, indexed by \(i=1,\ldots ,n\), observed at time occasions \(t = 1, \ldots , T\). Let \(y_{it}\) be the binary response variable for unit i at occasion t and \({\varvec{x}}_{it}\) the corresponding vector of K strictly exogenous covariates. Under the static model, we assume that the \(y_{it}\) are conditionally independent, given \(\alpha _i\) and \({\varvec{x}}_{it}\), across i and t. Consider the logit formulation

$$\begin{aligned} p(y_{it} \vert {\varvec{x}}_{it}; \alpha _i, {\varvec{\beta }}) = \frac{\exp \left[ y_{it}(\alpha _i + {\varvec{x}}_{it}'{\varvec{\beta }})\right] }{1 + \exp (\alpha _i + {\varvec{x}}_{it}'{\varvec{\beta }})}, \end{aligned}$$
(1)

where \(\alpha _i\) is the individual-specific intercept and the vector \({\varvec{\beta }}\) collects the regression parameters.

The fixed-effects estimator is obtained by ML, treating each individual effect \(\alpha _i\) as a parameter to be estimated. The ML estimator of \({\varvec{\beta }}\) is obtained by concentrating out the \(\alpha _i\) as the solution to

$$\begin{aligned} \hat{{\varvec{\beta }}}= & {} \underset{\small {\varvec{\beta }}}{\textrm{argmax}} \sum _{i=1}^n \sum _{t = 1}^T \log p(y_{it}\vert {\varvec{x}}_{it}; {\hat{\alpha }}_i({\varvec{\beta }}), {\varvec{\beta }}), \\ {\hat{\alpha }}_i({\varvec{\beta }})= & {} \underset{\alpha _i}{\textrm{argmax}} \sum _{t = 1}^T \log p(y_{it}\vert {\varvec{x}}_{it}; \alpha _i, {\varvec{\beta }}). \end{aligned}$$

Notice that \({\hat{\alpha }}_i({\varvec{\beta }})\) is estimated using only the data for subject i and it is therefore not consistent for \(\alpha _{i0}\) unless \(T \rightarrow \infty \). As a consequence, with T fixed and only \(n \rightarrow \infty \), the ML estimator \(\hat{{\varvec{\beta }}}\) will be plagued by the estimation noise in \({\hat{\alpha }}_i({\varvec{\beta }})\) and will not be consistent for \({\varvec{\beta }}_0\), with \(\underset{n \rightarrow \infty }{\textrm{plim}} \, \hat{{\varvec{\beta }}} \equiv {\varvec{\beta }}_T \ne {\varvec{\beta }}_0\). This is the well-known incidental parameters problem (Lancaster 2000; Neyman and Scott 1948). In particular, Hahn and Newey (2004) show that \({\varvec{\beta }}_T = {\varvec{\beta }}_0 + B/T + O(T^{-2})\), which clarifies that \({\varvec{\beta }}_T \rightarrow {\varvec{\beta }}_0\) if \(T \rightarrow \infty \) and n is fixed. Moreover, if both \(n, T \rightarrow \infty \), \(\hat{{\varvec{\beta }}}\) will be asymptotically normal. However, Hahn and Newey (2004) show that the asymptotic distribution of the ML estimator will not be centred at its probability limit if n grows proportionally to T.
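The inner step \({\hat{\alpha }}_i({\varvec{\beta }})\) can be made concrete with a short sketch: for a fixed \({\varvec{\beta }}\), the individual intercept solves the score equation \(\sum _t (y_{it} - p_{it}) = 0\), which a one-dimensional Newton iteration handles easily. The code below is a minimal illustration, not the authors' implementation; all function names are ours:

```python
import math

def logit_prob(alpha, xb):
    # P(y = 1) under the logit model with linear index alpha + x'beta
    return 1.0 / (1.0 + math.exp(-(alpha + xb)))

def alpha_hat(beta, y, x, tol=1e-10, max_iter=100):
    """ML estimate of the individual intercept for fixed beta, solving the
    score equation sum_t (y_it - p_it) = 0 by Newton's method.
    Uses only the T observations of one unit, hence the estimation noise."""
    T, s = len(y), sum(y)
    if s == 0 or s == T:
        raise ValueError("the ML estimate of alpha_i is not finite when y_i+ is 0 or T")
    a = math.log(s / (T - s))  # starting value ignoring the covariates
    for _ in range(max_iter):
        xb = [sum(b * xk for b, xk in zip(beta, xt)) for xt in x]
        p = [logit_prob(a, v) for v in xb]
        score = sum(yt - pt for yt, pt in zip(y, p))
        hess = -sum(pt * (1.0 - pt) for pt in p)  # strictly negative
        step = score / hess
        a -= step
        if abs(step) < tol:
            break
    return a
```

Since each \({\hat{\alpha }}_i\) is fitted on T observations only, its estimation error does not vanish with n, which is precisely the mechanism behind the incidental parameters problem discussed above.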

The incidental parameters problem affects the estimation of APEs as well; these effects are usually of interest to practitioners who want to quantify the influence of some regressor x on the response probability, other things being equal. For the logit model in (1), the partial effect of covariate \(x_{itk}\) for i at time t on the probability of \(y_{it} = 1\) can be written, depending on the type of covariate, as

$$\begin{aligned} m_{itk}(\alpha _i, {\varvec{\beta }}, {\varvec{x}}_{it}) = \left\{ \begin{array}{ll} p(y_{it} = 1 \vert {\varvec{x}}_{it}; \alpha _i, {\varvec{\beta }})\left[ 1 - p(y_{it} = 1 \vert {\varvec{x}}_{it}; \alpha _i, {\varvec{\beta }})\right] \beta _k, &{} x_{itk} \,\,\, \textrm{continuous}, \\ &{} \\ p(y_{it} = 1 \vert {\varvec{x}}_{it,-k}, x_{itk} = 1 \, ; \alpha _i, {\varvec{\beta }}) - &{} \\ p(y_{it} = 1 \vert {\varvec{x}}_{it,-k}, x_{itk} = 0 \, ; \alpha _i, {\varvec{\beta }}),&{} x_{itk} \,\,\, \textrm{binary}, \\ \end{array} \right. \end{aligned}$$

where \({\varvec{x}}_{it, -k}\) denotes the subvector of all covariates but \(x_{itk}\). The true APE of the kth covariate can then be obtained by simply taking the expected value of \(m_{itk}(\alpha _i, {\varvec{\beta }}, {\varvec{x}}_{it})\) with respect to \({\varvec{x}}_{it}\) and \(\alpha _{i0}\),

$$\begin{aligned} \mu _{k0} = \int m_{itk}(\alpha _{i0}, {\varvec{\beta }}_0, {\varvec{x}}_{it})dG(\alpha _{i0},{\varvec{x}}_{it}), \end{aligned}$$

where \(G(\alpha _{i0},{\varvec{x}}_{it})\) denotes the joint distribution of \(\alpha _{i0}\) and \({\varvec{x}}_{it}\). An estimator of \(\mu _{k0}\) can be obtained by plugging in the ML estimators \(\hat{{\varvec{\beta }}}\) and \({\hat{\alpha }}_i(\hat{{\varvec{\beta }}})\), so that

$$\begin{aligned} {\hat{\mu }}_{k} = \frac{1}{nT}\sum _{i=1}^n \sum _{t=1}^T m_{itk}({\hat{\alpha }}_i(\hat{{\varvec{\beta }}}), \hat{{\varvec{\beta }}}, {\varvec{x}}_{it}). \end{aligned}$$
(2)

It is now clear that, with small T, this estimator is plagued by two sources of bias: the first stems from the estimation error introduced by \({\hat{\alpha }}_i({\varvec{\beta }})\); the second is a result of using the asymptotically biased estimator \(\hat{{\varvec{\beta }}}\).
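For a continuous covariate, the plug-in estimator in (2) is simply the sample average of \(p(1-p)\beta _k\) over all observations. A minimal sketch under our own (hypothetical) naming, with scalar data for brevity:

```python
import math

def logit_prob(alpha, x, beta):
    # response probability of the logit model (1)
    idx = alpha + sum(b * xk for b, xk in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-idx))

def plug_in_ape(alpha_hats, beta_hat, X, k):
    """Plug-in APE of Eq. (2) for a continuous covariate k:
    the average of p * (1 - p) * beta_k over all (i, t) pairs."""
    total, count = 0.0, 0
    for a_i, X_i in zip(alpha_hats, X):   # X is a list of T x K covariate blocks
        for x_it in X_i:
            p = logit_prob(a_i, x_it, beta_hat)
            total += p * (1.0 - p) * beta_hat[k]
            count += 1
    return total / count
```

Feeding this average biased estimates of \(\alpha _i\) and \({\varvec{\beta }}\) is exactly what generates the two bias components just described.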

In order to better understand how these components will affect the bias of the APE, it is useful to introduce an expansion of \({\hat{\alpha }}_i({\varvec{\beta }})\) as \(T \rightarrow \infty \):

$$\begin{aligned} {\hat{\alpha }}_i({\varvec{\beta }}) = \alpha _{i0} + \frac{\xi _i}{T} + \frac{1}{T}\sum _{t=1}^T\tau _{it} + o_p\left( \frac{1}{T}\right) , \end{aligned}$$
(3)

where \(\frac{1}{\sqrt{T}}\sum _{t=1}^T\tau _{it} {\mathop {\rightarrow }\limits ^{d}}N(0, \sigma ^2_i)\), which follows from higher-order asymptotics for time series data (Bao and Ullah 2007). Dhaene and Jochmans (2015) then show that the combined asymptotic bias is

$$\begin{aligned} \underset{n \rightarrow \infty }{\textrm{plim}}\,\, {\hat{\mu }}_k = \mu _{k0} + \frac{D + E}{T} + O(T^{-2}), \end{aligned}$$
(4)

where, specifically,

$$\begin{aligned} D= & {} \sum \limits _{j = 0}^{\infty }\textrm{E}_{T}\left[ \frac{\partial m_{itk}(\alpha _{i0}, {\varvec{\beta }}_0)}{\partial \alpha _{i}}\tau _{i,t-j} \right] + \textrm{E}_{T}\left[ \frac{\partial m_{itk}(\alpha _{i0}, {\varvec{\beta }}_0)}{\partial \alpha _{i}}\xi _{i}\right] \nonumber \\ {}{} & {} + \frac{1}{2}\textrm{E}_{T}\left[ \frac{\partial ^2 m_{itk}(\alpha _{i0}, {\varvec{\beta }}_0)}{\partial \alpha _{i}^2}\sigma ^2_{i}\right] \end{aligned}$$
(5)

is the bias generated from using \({\hat{\alpha }}_i({\varvec{\beta }})\) instead of \(\alpha _{i0}\).Footnote 3 Following Hahn and Newey (2004), this expression suggests that the bias introduced by plugging in \({\hat{\alpha }}_i({\varvec{\beta }})\) has three components: (i) the asymptotic bias of \({\hat{\alpha }}_i({\varvec{\beta }})\); (ii) the correlation between \({\hat{\alpha }}_i({\varvec{\beta }})\) and \(m_{itk}(\cdot )\), which are based on the same data; (iii) the variance of \({\hat{\alpha }}_i({\varvec{\beta }})\). This result can be clarified by noticing that D comes from an expansion of the APE around \(\alpha _{i0}\), where the term \({\hat{\alpha }}_i({\varvec{\beta }}) - \alpha _{i0}\) is characterized by (3). Furthermore,

$$\begin{aligned} E = \textrm{E}_{T}\left[ \frac{\partial m_{itk}(\alpha _{i0}, {\varvec{\beta }}_0)}{\partial {\varvec{\beta }}'}B\right] \end{aligned}$$

is the bias from plugging in \(\hat{{\varvec{\beta }}}\) instead of using \({\varvec{\beta }}_0\).

Further insights can be drawn from Expression (4). First, notice that even if a fixed-T consistent estimator of \({\varvec{\beta }}_0\) were available, the asymptotic bias of the APE estimator would still be of order \(O(T^{-1})\) because of the presence of D. Second, using a bias-corrected estimate of \(\alpha _{i}\) along with a fixed-T consistent estimator of \({\varvec{\beta }}\) would not remove the bias of order \(O(T^{-1})\), as it would not take care of the last component in D.

The sources of bias discussed above, however, have been shown to become asymptotically negligible under rectangular array asymptotics, as plug-in estimators of average effects converge at a rate slower than \((nT)^{-1/2}\). Dhaene and Jochmans (2015) summarize this property in their Theorem 5.1, which is based on the following rationale. Consider the infeasible estimator

$$\begin{aligned} \mu ^*_{k} \equiv \frac{1}{nT}\sum _{i=1}^n \sum _{t=1}^T m_{itk}(\alpha _{i0}, {\varvec{\beta }}_0, {\varvec{x}}_{it}), \end{aligned}$$

and let \(\mu _{ik}\) be the individual-specific average partial effect, with mean \(\mu _{k0}\) and finite variance. Then, \(\mu ^*_k\) can be written as:

$$\begin{aligned} \mu ^*_k = \frac{1}{n}\sum _{i=1}^n \mu _{ik} + \frac{1}{n}\sum _{i=1}^n \left( \frac{1}{T}\sum _{t=1}^Tm_{itk}(\alpha _{i0}, {\varvec{\beta }}_0, {\varvec{x}}_{it}) - \mu _{ik} \right) . \end{aligned}$$

Notice that the first term converges to \(\mu _{k0}\) at the rate \(n^{-1/2}\), whereas the second term converges to zero at the rate \((nT)^{-1/2}\), thus implying that the infeasible APE estimator will converge no faster than \(n^{-1/2}\).

From the above expression, it is straightforward to see that any feasible average-effect estimator will converge at the same rate as \(\mu ^*_k\), thus making the bias introduced by replacing \(\alpha _{i0}\) and \({\varvec{\beta }}_0\) with ML estimates, or their first-order bias-corrected versions, asymptotically negligible. However, based on their simulation evidence, Dhaene and Jochmans (2015) still suggest using some bias correction of the APE estimator in finite samples, especially when T is small. The proposed method operates such a bias reduction, as do the alternative analytical and jackknife bias corrections recalled in the following section.

3 Estimation of average partial effects

In the following, we briefly review the existing strategies based on analytical and jackknife bias corrections, which represent the benchmark for the finite sample performance of the proposed estimator. We then illustrate the proposed methodology, which combines the consistent CML estimator of \({\varvec{\beta }}_0\) and the analytical bias correction for the APE. Finally, we turn to the dynamic logit, for which the proposed procedure is based on a PCML estimator.

3.1 Existing strategies

The available bias reduction techniques for the estimation of APEs for fixed-effects binary choice models are based on either analytical or jackknife bias corrections.Footnote 4

Analytical bias corrections for the APEs amount to plugging in a bias-corrected estimate of \({\varvec{\beta }}\), say \(\hat{{\varvec{\beta }}}^{c} = \hat{{\varvec{\beta }}} - {\hat{B}}/T\), instead of the ML estimate in expression (2), along with \({\hat{\alpha }}_i(\hat{{\varvec{\beta }}}^{c})\). Doing so effectively removes the E component of the bias in (4), but the APE estimator is still plagued by the estimation noise in \({\hat{\alpha }}_i({\varvec{\beta }})\), giving rise to the D component. In order to remove it, the bias-corrected estimator of \(\mu _k\) is computed as:

$$\begin{aligned} {\hat{\mu }}^c_k = \frac{1}{nT}\sum _{i=1}^n \sum _{t=1}^T m_{itk}({\hat{\alpha }}_i(\hat{{\varvec{\beta }}}^{c}), \hat{{\varvec{\beta }}}^{c}, {\varvec{x}}_{it}) - {\hat{D}}, \end{aligned}$$
(6)

where \({\hat{D}}\) is the sample counterpart of D in (5), evaluated at \({\hat{\alpha }}_i(\hat{{\varvec{\beta }}}^{c})\) and \(\hat{{\varvec{\beta }}}^{c}\). Expressions for panel binary choice models are given in Fernández-Val (2009),Footnote 5 whose derivations are based on general results for static (Hahn and Newey 2004) and dynamic (Hahn and Kuersteiner 2011) nonlinear panel data models. For the expressions as well as for further details, we refer the reader to Hahn and Newey (2004), Fernández-Val (2009), and Hahn and Kuersteiner (2011). A similar correction of the APEs is provided by Bester and Hansen (2009), who also perform a comparison of their proposal with alternative strategies.Footnote 6 Furthermore, for the static logit model, Stammann et al. (2016) develop a computationally more efficient implementation of the bias-corrected estimator of structural parameters and APEs proposed by Hahn and Newey (2004), based on a pseudo-demeaning algorithm.

An alternative bias correction method for the APE estimator relies on the panel jackknife. A general procedure for nonlinear static panel data models is proposed by Hahn and Newey (2004). Let \(\hat{{\varvec{\beta }}}^{(t)}\) and \({\hat{\alpha }}^{(t)}_i(\hat{{\varvec{\beta }}}^{(t)})\) be the ML estimators with the tth observation excluded for every subject. Then, the jackknife corrected estimator for the APE is

$$\begin{aligned} {\hat{\mu }}^c_k = T{\hat{\mu }}_k - \frac{T-1}{T}\sum _{t=1}^T \mu _k\left( {\hat{\alpha }}^{(t)}_i(\hat{{\varvec{\beta }}}^{(t)}), \hat{{\varvec{\beta }}}^{(t)}\right) . \end{aligned}$$
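Since the correction only re-estimates the same quantity on leave-one-period-out subpanels, it can be sketched generically, treating the APE estimator as a black box. The sketch below is an illustration under our own naming (`estimator` is any user-supplied function returning a scalar), not the authors' code:

```python
def jackknife_ape(estimator, panel):
    """Panel jackknife of Hahn and Newey (2004):
    mu_c = T * mu - ((T - 1) / T) * sum_t mu^(t),
    where mu^(t) is re-estimated with period t dropped for every unit."""
    T = len(panel[0])
    mu = estimator(panel)
    leave_out = [
        estimator([[obs for s, obs in enumerate(unit) if s != t] for unit in panel])
        for t in range(T)
    ]
    return T * mu - (T - 1) / T * sum(leave_out)
```

By construction, an estimator whose bias is exactly \(b/T\) is returned without bias; with a generic smooth bias this is what reduces the order from \(O(T^{-1})\) to \(O(T^{-2})\).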

If the set of model covariates includes lagged explanatory variables, then leaving out one of the T observations at a time becomes unsuitable. Instead, a block of consecutive observations has to be considered so as to preserve the dynamic structure of the data. The so-called split-panel jackknife estimator is proposed by Dhaene and Jochmans (2015). A simple version of the estimator is the half-panel jackknife, which is based on splitting the panel into two half-panels, non-overlapping when T is even and \(T \ge 6\), each with T/2 time periods. Denote the set of half-panels as

$$\begin{aligned} S = \{S_1, S_2\}, \quad S_1 = \{1,\ldots ,T/2 \}, S_2 = \{T/2 +1,\ldots ,T \}; \end{aligned}$$

then the half-panel jackknife estimator of the APE is

$$\begin{aligned} {\hat{\mu }}^{(1/2)}_k = 2{\hat{\mu }}_k - \frac{1}{2}\left( {\bar{\mu }}^{S_1}_k + {\bar{\mu }}^{S_2}_k\right) , \end{aligned}$$

where \({\bar{\mu }}^{S_1}_k\) and \({\bar{\mu }}^{S_2}_k\) are the plug-in estimators evaluated at the ML estimators of the individual effects and slope parameters obtained using the observations in subpanels \(S_1\) and \(S_2\), respectively. Dhaene and Jochmans (2015) also illustrate generalized versions of the half-panel jackknife to deal with odd T and overlapping subpanels, as well as an alternative jackknife estimator based on the split-panel log-likelihood correction.
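The half-panel formula admits the same kind of generic sketch, again with the estimator treated as a black box (names are ours, for illustration only):

```python
def half_panel_jackknife(estimator, panel):
    """Half-panel jackknife of Dhaene and Jochmans (2015) for even T:
    mu_half = 2 * mu - (mu_S1 + mu_S2) / 2,
    with S1 the first and S2 the second half of the time periods."""
    T = len(panel[0])
    if T % 2 != 0:
        raise ValueError("this simple version requires an even T")
    half = T // 2
    mu = estimator(panel)
    mu_s1 = estimator([unit[:half] for unit in panel])
    mu_s2 = estimator([unit[half:] for unit in panel])
    return 2.0 * mu - 0.5 * (mu_s1 + mu_s2)
```

Splitting into consecutive blocks, rather than dropping scattered periods, is what preserves the dynamic structure when lagged outcomes enter the model.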

It is finally worth mentioning that jackknife and analytical higher-order bias corrections for the slope parameters have also been proposed by Dhaene and Jochmans (2015) and Dhaene and Sun (2021), respectively, and can be extended to the estimation of APEs, but their viability in dynamic models is limited. In the first case, the authors warn that, with the half-panel jackknife, the magnitude of the bias of the terms that are not eliminated increases with the order of the correction. In the second case, the correction is developed under the assumption of independent observations, thus ruling out its application to dynamic models completely.

3.2 Proposed methodology

The proposed multiple-step strategy is based on removing the two sources of bias in (4) by (a) using the fixed-T consistent CML estimator of \({\varvec{\beta }}\), \(\tilde{{\varvec{\beta }}}\), instead of the ML estimator \(\hat{{\varvec{\beta }}}\) and (b) reducing the order of bias of the APE plug-in estimator, induced by \({\hat{\alpha }}_i(\tilde{{\varvec{\beta }}})\), from \(O(T^{-1})\) to \(O(T^{-2})\) by applying the analytical bias correction of Fernández-Val (2009) reported in Eq. (6).

3.2.1 Multiple-step estimation

The first step consists of estimating the structural parameters of the logit model in (1) by CML. Taking the individual intercept \(\alpha _i\) as fixed, the joint probability of the response configuration \({\varvec{y}}_i = ( y_{i1}, \ldots , y_{iT})'\) conditional on \({\varvec{X}}_i = ({\varvec{x}}_{i1}, \ldots , {\varvec{x}}_{iT})\) can be written as:

$$\begin{aligned} p({\varvec{y}}_i \vert {\varvec{X}}_i, \alpha _i) = \frac{\exp \left( y_{i+}\alpha _i + \sum _{t=1}^Ty_{it}{\varvec{x}}_{it}'{\varvec{\beta }}\right) }{\prod _{t=1}^T \left[ 1 + \exp \left( \alpha _i + {\varvec{x}}_{it}'{\varvec{\beta }}\right) \right] }, \end{aligned}$$

where the dependence of the probability on the left-hand side on the slope parameters is suppressed to simplify the notation. It is well known that the total score \(y_{i+} = \sum _{t=1}^T y_{it}\) is a sufficient statistic for the individual intercepts \(\alpha _i\) (Andersen 1970; Chamberlain 1980). The joint probability of \({\varvec{y}}_i\) conditional on \(y_{i+}\) does not depend on \(\alpha _i\) and can therefore be written as:

$$\begin{aligned} p({\varvec{y}}_i \vert {\varvec{X}}_i, y_{i+}) = \frac{\exp \left[ \left( \sum _{t=1}^T y_{it}{\varvec{x}}_{it}\right) ' {\varvec{\beta }}\right] }{\underset{\small {\varvec{z}}: z_+ = y_{i+}}{\sum }\exp \left[ \left( \sum _{t=1}^T z_t{\varvec{x}}_{it}\right) ' {\varvec{\beta }}\right] }, \end{aligned}$$
(7)

where the denominator is the sum over all the response configurations \({\varvec{z}}\) such that \(z_+ = y_{i+}\) and where the individual intercept \(\alpha _i\) has been cancelled out. The log-likelihood function is

$$\begin{aligned} \ell ({\varvec{\beta }}) = \sum _{i=1}^n {\textbf{I}}(0<y_{i+}<T) \log p({\varvec{y}}_i \vert {\varvec{X}}_i, y_{i+}), \end{aligned}$$

where the indicator function \({\textbf{I}}(\cdot )\) is included to take into account that observations with total score \(y_{i+}\) equal to 0 or T do not contribute to the log-likelihood. The above function can be maximized with respect to \({\varvec{\beta }}\) by a Newton–Raphson algorithm using standard results on the regular exponential family (Barndorff-Nielsen 1978), so as to obtain the CML estimator \(\tilde{{\varvec{\beta }}}\), which is \(\sqrt{n}\)-consistent and asymptotically normal with fixed T (see Andersen 1970 and Chamberlain 1980 for details). Therefore, if plugged into the APE formulation (2) instead of the ML estimator \(\hat{{\varvec{\beta }}}\), the E component of the bias in (4) is removed since \(\tilde{{\varvec{\beta }}} {\mathop {\rightarrow }\limits ^{p}}{\varvec{\beta }}_0 \) as \(n \rightarrow \infty \).
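For small T, the conditional probability (7) can be computed by brute-force enumeration of the response configurations sharing the same total score, which also makes the sufficiency argument easy to verify numerically. A sketch under our own naming (exponential in T, so for illustration only):

```python
import math
from itertools import product

def conditional_prob(y, X, beta):
    """Conditional probability of Eq. (7): the configuration weight
    exp(sum_t y_t * x_t'beta), normalized over all binary z with z_+ = y_+.
    The individual intercept alpha_i has cancelled out."""
    def weight(z):
        lin = sum(zt * sum(b * xk for b, xk in zip(beta, xt))
                  for zt, xt in zip(z, X))
        return math.exp(lin)
    s = sum(y)
    denom = sum(weight(z) for z in product((0, 1), repeat=len(y)) if sum(z) == s)
    return weight(y) / denom

def joint_prob(y, X, alpha, beta):
    """Unconditional joint probability of y under the logit model (1)."""
    p = 1.0
    for yt, xt in zip(y, X):
        idx = alpha + sum(b * xk for b, xk in zip(beta, xt))
        p *= math.exp(yt * idx) / (1.0 + math.exp(idx))
    return p
```

Dividing `joint_prob` by its sum over all configurations with the same total score returns the same value for any \(\alpha _i\), which is precisely the sufficiency of \(y_{i+}\).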

In the next step, we obtain estimates of the individual intercepts \(\alpha _i\), which are not directly available as they have been cancelled out by conditioning on the total score. Our strategy is to obtain the ML estimates of \(\alpha _i\), denoted \({\hat{\alpha }}_i(\tilde{{\varvec{\beta }}})\), for those subjects such that \(0< y_{i+} < T\), by maximizing the individual term \(\sum _{t=1}^T \log p_{\tilde{\small {\varvec{\beta }}}}(y_{it} \vert {\varvec{x}}_{it}, \alpha _i)\), where \(p_{\tilde{\small {\varvec{\beta }}}}(y_{it} \vert {\varvec{x}}_{it}, \alpha _i)\) is the probability of the logit model defined in (1) evaluated at the CML estimate, namely at \({\varvec{\beta }}= \tilde{{\varvec{\beta }}}\). Like the ML estimator and the analytical and jackknife bias corrections, our proposal leads to an APE equal to zero for subjects whose response configurations are all 0s or all 1s, as the marginal effects are evaluated at the (non-finite) ML estimates of \(\alpha _i\). However, even if \({\varvec{\beta }}\) is fixed at some \(\sqrt{n}\)-consistent estimate, the bias of the ML estimator of \(\alpha _{i0}\) will still be of order \(O(T^{-1})\) because \({\hat{\alpha }}_i(\tilde{{\varvec{\beta }}}) {\mathop {\rightarrow }\limits ^{p}}\alpha _{i0}\) only as \(T \rightarrow \infty \). Stammann et al. (2016) and Bartolucci and Pigini (2019) consider such a plug-in estimator and confirm by simulation that this source of bias, although rather small for the static logit model, indeed shows up in finite samples. Moreover, Bartolucci and Pigini (2019) report that the bias is more severe for the dynamic logit model.Footnote 7 In Sect. 4, we show that correcting for the bias generated by the use of \({\hat{\alpha }}_i({\varvec{\beta }})\) instead of \(\alpha _{i0}\), denoted by D in (4), is necessary in finite samples, especially with short T.
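This second step can be sketched as a loop over units that solves each individual score equation by bisection at \({\varvec{\beta }}= \tilde{{\varvec{\beta }}}\), skipping the non-informative all-0 and all-1 configurations. A toy implementation under our own naming (a Newton update would serve equally well):

```python
import math

def _score(alpha, y, xb):
    # score of the individual log-likelihood in alpha: sum_t (y_it - p_it)
    return sum(yt - 1.0 / (1.0 + math.exp(-(alpha + v)))
               for yt, v in zip(y, xb))

def intercepts_at_cml(beta_tilde, Y, X, lo=-30.0, hi=30.0, iters=200):
    """ML estimates of alpha_i holding beta fixed at the CML estimate.
    Units with y_i+ equal to 0 or T get None (non-finite ML estimate,
    hence a zero contribution to the APE)."""
    alphas = []
    for y, X_i in zip(Y, X):
        s, T = sum(y), len(y)
        if s == 0 or s == T:
            alphas.append(None)
            continue
        xb = [sum(b * xk for b, xk in zip(beta_tilde, xt)) for xt in X_i]
        a_lo, a_hi = lo, hi   # the score is strictly decreasing in alpha
        for _ in range(iters):
            mid = 0.5 * (a_lo + a_hi)
            if _score(mid, y, xb) > 0:
                a_lo = mid
            else:
                a_hi = mid
        alphas.append(0.5 * (a_lo + a_hi))
    return alphas
```

Each returned \({\hat{\alpha }}_i(\tilde{{\varvec{\beta }}})\) still carries \(O(T^{-1})\) bias, which is what the correction in the final step targets.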

In the final step, the APEs are obtained by simply replacing the ML estimators in (2) with \(\tilde{{\varvec{\beta }}}\) and \({\hat{\alpha }}_i(\tilde{{\varvec{\beta }}})\) and reducing the bias from \(O(T^{-1})\) to \(O(T^{-2})\) by applying the bias correction proposed by Fernández-Val (2009), that is,

$$\begin{aligned} {\tilde{\mu }}_{k} = \frac{1}{nT}\sum _{i=1}^n \sum _{t=1}^T m_{itk}({\hat{\alpha }}_i(\tilde{{\varvec{\beta }}}), \tilde{{\varvec{\beta }}}, {\varvec{x}}_{it}) - {\tilde{D}}, \end{aligned}$$

where \({\tilde{D}}\) denotes the sample counterpart of (5) evaluated at \(\tilde{{\varvec{\beta }}}\) and \({\hat{\alpha }}_i(\tilde{{\varvec{\beta }}})\). It is worth stressing that the proposed estimator exhibits the same asymptotic properties as any feasible average effect estimator under rectangular array asymptotics, as outlined in Sect. 2, since the CML estimator is also proved to be \(\sqrt{nT}\)-consistent.Footnote 8

3.2.2 Standard errors

In order to derive an expression for the standard errors of the APEs \(\tilde{{\varvec{\mu }}} = ({\tilde{\mu }}_1, \ldots , {\tilde{\mu }}_K)'\), we need to account for the variability in \({\varvec{x}}_{it}\) and the use of the estimated parameters \(\tilde{{\varvec{\beta }}}\) in the first step. For the latter, we rely on the generalized method of moments (GMM) approach by Hansen (1982), also implemented by Bartolucci and Nigro (2012) for the quadratic exponential model. In particular, following Newey and McFadden (1994), we formulate the proposed multi-step procedure as the solution of the system of estimating equations

$$\begin{aligned} {\varvec{f}}({\varvec{\beta }}, {\varvec{\mu }}) = {\varvec{0}}, \end{aligned}$$

where

$$\begin{aligned} {\varvec{f}}({\varvec{\beta }}, {\varvec{\mu }})= & {} \sum _{i=1}^n {\varvec{f}}_i({\varvec{\beta }}, {\varvec{\mu }}),\nonumber \\ {\varvec{f}}_i({\varvec{\beta }},{\varvec{\mu }})= & {} \begin{pmatrix} {\varvec{\nabla }}_{\small {\varvec{\beta }}}\ell _i({\varvec{\beta }})\\ {\varvec{\nabla }}_{\mu _1}g_i({\varvec{\beta }},\mu _1)\\ \vdots \\ {\varvec{\nabla }}_{\mu _K}g_i({\varvec{\beta }},\mu _K)\\ \end{pmatrix}, \end{aligned}$$
(8)

and

$$\begin{aligned} g_i({\varvec{\beta }},\mu _k)=\frac{1}{T}\sum _{t=1}^T\left[ m_{itk}(\alpha _i({\varvec{\beta }}),{\varvec{\beta }}, {\varvec{x}}_{it})-\mu _k\right] ^2, \quad \, k=1,\ldots ,K. \end{aligned}$$

The asymptotic variance of \((\tilde{{\varvec{\beta }}}^{\prime },\tilde{{\varvec{\mu }}}^{\prime })^{\prime }\) is then

$$\begin{aligned} {\varvec{W}}(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}})= {\varvec{H}}(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}})^{-1}{\varvec{S}}(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}})[{\varvec{H}}(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}})^{-1}]^{\prime }, \end{aligned}$$
(9)

where

$$\begin{aligned} {\varvec{S}}(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}})= \sum _{i=1}^n {\varvec{f}}_i(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}}){\varvec{f}}_i(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}})^{\prime }. \end{aligned}$$

Moreover, we have that

$$\begin{aligned} {\varvec{H}}(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}})= \sum _{i=1}^n {\varvec{H}}_i(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}}), \end{aligned}$$

where

$$\begin{aligned} {\varvec{H}}_i({\varvec{\beta }},{\varvec{\mu }})= \left( \begin{array}{cc}{\varvec{\nabla }}_{\small {\varvec{\beta }}\small {\varvec{\beta }}} \, \ell _i({\varvec{\beta }}) &{} {\varvec{O}} \\ {\varvec{\nabla }}_{\small {\varvec{\mu }}\small {\varvec{\beta }}}\, {\varvec{g}}_i({\varvec{\beta }},{\varvec{\mu }}) &{} {\varvec{\nabla }}_{\small {\varvec{\mu }}\small {\varvec{\mu }}}\, {\varvec{g}}_i({\varvec{\beta }},{\varvec{\mu }}) \end{array}\right) \end{aligned}$$
(10)

is the derivative of \({\varvec{f}}_i({\varvec{\beta }},{\varvec{\mu }})\) with respect to \(({\varvec{\beta }},{\varvec{\mu }})\), with \({\varvec{O}}\) denoting a \(K \times K\) matrix of zeros, and where \({\varvec{g}}_i({\varvec{\beta }},{\varvec{\mu }})\) collects the \( g_i({\varvec{\beta }},\mu _k)\), for \(k = 1,\ldots ,K\). Expressions for the derivatives in (8) are

$$\begin{aligned} \nabla _{\small {\varvec{\beta }}} \ell _i({\varvec{\beta }}) = \sum _{t=1}^T y_{it}{\varvec{x}}_{it} - \sum _{\small {\varvec{z}}: z_+ = y_{i+}} \left( p({\varvec{z}} \vert {\varvec{X}}_i, y_{i+}) \sum _{t=1}^T z_t {\varvec{x}}_{it} \right) , \end{aligned}$$

and

$$\begin{aligned} \nabla _{\mu _k} g_i({\varvec{\beta }}, \mu _k) = - \frac{2}{T}\sum _{t=1}^T \left[ m_{itk}( \alpha _i({\varvec{\beta }}), {\varvec{\beta }}, {\varvec{x}}_{it}) - \mu _k \right] . \end{aligned}$$
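As an illustration, the score \(\nabla _{\small {\varvec{\beta }}} \ell _i({\varvec{\beta }})\) above can be evaluated by brute-force enumeration of the score-preserving configurations. The following numpy sketch mirrors the display term by term; the function and argument names are ours, and the enumeration is feasible only for moderate T.

```python
import itertools
import numpy as np

def conditional_score_i(beta, X_i, y_i):
    """Score of the conditional log-likelihood for unit i:
    sum_t y_it x_it minus its conditional expectation over all
    binary sequences z with the same total score y_i+."""
    T, _ = X_i.shape
    y_plus = int(np.sum(y_i))
    # all 0/1 sequences z with z_+ = y_i+
    configs = [np.array(z) for z in itertools.product((0, 1), repeat=T)
               if sum(z) == y_plus]
    # conditional probabilities p(z | X_i, y_i+), normalized over configs
    w = np.array([np.exp(z @ X_i @ beta) for z in configs])
    p = w / w.sum()
    expected = sum(pz * (z @ X_i) for pz, z in zip(p, configs))
    return y_i @ X_i - expected
```

At \({\varvec{\beta }} = {\varvec{0}}\) all configurations are equally likely, so the conditional expectation reduces to a simple average over the score-preserving sequences.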

The second derivatives in (10) are

$$\begin{aligned} \nabla _{\small {\varvec{\beta }}\small {\varvec{\beta }}} \ell _i({\varvec{\beta }}) = -\sum _{\small {\varvec{z}}: z_+ = y_{i+}}p({\varvec{z}} \vert {\varvec{X}}_i, y_{i+}) {\varvec{e}}({\varvec{z}}, {\varvec{X}}_i){\varvec{e}}({\varvec{z}}, {\varvec{X}}_i)', \end{aligned}$$

where

$$\begin{aligned} {\varvec{e}}({\varvec{z}}, {\varvec{X}}_i) = \sum ^T_{t=1} z_t {\varvec{x}}_{it} - \sum _{\small {\varvec{z}}: z_+ = y_{i+}}\left( p({\varvec{z}} \vert {\varvec{X}}_i, y_{i+}) \sum _{t=1}^T z_t {\varvec{x}}_{it} \right) , \end{aligned}$$

and \({\varvec{\nabla }}_{\small {\varvec{\mu }}\small {\varvec{\mu }}} \, {\varvec{g}}_i({\varvec{\beta }},{\varvec{\mu }})\) is a \(K \times K\) diagonal matrix with elements equal to 2. Finally, for the computation of the block \({\varvec{\nabla }}_{\small {\varvec{\mu }}\small {\varvec{\beta }}}\, {\varvec{g}}_i({\varvec{\beta }},{\varvec{\mu }})\), we rely on numerical differentiation. Once the matrix in (9) is computed, the standard errors for the APEs \(\tilde{{\varvec{\mu }}}\) may be obtained by taking the square root of the elements on the main diagonal of the lower-right submatrix of \({\varvec{W}}(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}})\).
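The sandwich formula (9) and the numerical differentiation step can be sketched generically as follows; the inputs are hypothetical (an \(n \times \dim\) array stacking the per-unit estimating functions \({\varvec{f}}_i\) evaluated at the estimates, and the summed Jacobian \({\varvec{H}}\)), and the names are ours.

```python
import numpy as np

def numerical_jacobian(g, theta, eps=1e-6):
    """Central finite differences, as used for the block
    grad_{mu beta} g_i that is computed numerically in the text."""
    theta = np.asarray(theta, dtype=float)
    g0 = np.atleast_1d(g(theta))
    J = np.empty((g0.size, theta.size))
    for j in range(theta.size):
        tp, tm = theta.copy(), theta.copy()
        tp[j] += eps
        tm[j] -= eps
        J[:, j] = (np.atleast_1d(g(tp)) - np.atleast_1d(g(tm))) / (2 * eps)
    return J

def sandwich_variance(f_units, H):
    """Sandwich formula (9): W = H^{-1} S (H^{-1})' with
    S = sum_i f_i f_i' built from the stacked per-unit
    estimating functions (rows of f_units)."""
    S = f_units.T @ f_units
    Hinv = np.linalg.inv(H)
    return Hinv @ S @ Hinv.T
```

Standard errors are then the square roots of the diagonal elements of the relevant submatrix of the returned matrix.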

3.2.3 Dynamic logit model

The method proposed to obtain the APEs for the static logit model cannot be applied directly to the dynamic logit model (Hsiao 2005). In the latter case, the conditional probability of \(y_{it}\) is

$$\begin{aligned} p(y_{it}\vert {\varvec{x}}_{it}, y_{i,t-1}; \alpha _i, {\varvec{\beta }}, \gamma ) = \frac{\exp \left[ y_{it}(\alpha _i + {\varvec{x}}_{it}'{\varvec{\beta }}+ y_{i,t-1}\gamma )\right] }{1 + \exp (\alpha _i + {\varvec{x}}_{it}'{\varvec{\beta }}+ y_{i,t-1}\gamma )}, \end{aligned}$$
(11)

where \(\gamma \) is the regression coefficient for the lagged response variable that measures the true state dependence. Plugging the CML estimator of \({\varvec{\beta }}\) and \(\gamma \) in the APE formulation is not viable in this case because the total score is no longer a sufficient statistic for the incidental parameters if the lag of the dependent variable is included among the model covariates. Conditioning on sufficient statistics eliminates the incidental parameters only in the special case of \(T = 3\) and no other explanatory variables (Chamberlain 1985). Honoré and Kyriazidou (2000) extend this approach to include explanatory variables and the corresponding parameters can be estimated by CML on the basis of a weighted conditional log-likelihood. However, time effects cannot be included in the model specification, and the estimator’s rate of convergence to the true parameter value is slower than \(\sqrt{n}\).

More recently, Honoré and Weidner (2020) generalize the approach by Honoré and Kyriazidou (2000) to include any type of strictly exogenous covariate, providing a \(\sqrt{n}\)-consistent generalized method of moments (GMM) estimator based on moment conditions that are free of the incidental parameters. The practical viability of their approach, however, still has to be assessed: like GMM estimators in general, this estimator suffers from a considerable small-sample bias when built on a large number of moment conditions, and the number of conditions is shown to increase rapidly with the number of time occasions and of covariates. A different perspective is taken by Bartolucci and Nigro (2010) who, instead of the dynamic logit, consider a QE formulation (Cox 1972) to model dynamic binary panel data, which has the advantage of admitting sufficient statistics for the individual intercepts.

Bartolucci and Nigro (2012) propose a QE model that more closely approximates the dynamic logit model and whose parameters can easily be estimated by PCML. Under the approximating model, each \(y_{i+}\) is a sufficient statistic for the fixed effect \(\alpha _i\). By conditioning on the total score, the joint probability of \({\varvec{y}}_i\) becomes:

$$\begin{aligned} p^*({\varvec{y}}_i\vert {\varvec{X}}_i,y_{i0},y_{i+})=\frac{\exp \left( \sum _{t=1}^Ty_{it}{\varvec{x}}_{it}'{\varvec{\beta }}-\sum _{t=1}^T{\bar{q}}_{it}y_{i,t-1}\gamma + y_{i*}\gamma \right) }{\underset{\small {\varvec{z}}:z_+=y_{i+}}{\sum } \exp \left( \sum _{t=1}^Tz_t{\varvec{x}}_{it}'{\varvec{\beta }}-\sum _{t=1}^T{\bar{q}}_{it}z_{t-1}\gamma + z_{i*}\gamma \right) } \end{aligned}$$
(12)

where \(y_{i*} = \sum _{t=1}^T y_{i,t-1}y_{it}\) and \(z_{i*} = y_{i0}z_1 + \sum _{t>1} z_{t-1}z_t\). Moreover, \({\bar{q}}_{it}\) is a function of given values of \({\varvec{\beta }}\) and \(\alpha _i\), resulting from a first-order Taylor series expansion of the log-likelihood based on (11) around \({\varvec{\beta }}= \bar{{\varvec{\beta }}}\) and \(\alpha _i = {\bar{\alpha }}_i\), \(i = 1,\ldots ,n\), and \(\gamma = 0\) (see Bartolucci and Nigro 2012, for details). The expression for \({\bar{q}}_{it}\) is then

$$\begin{aligned} {\bar{q}}_{it} = \frac{\exp ({\bar{\alpha }}_i + {\varvec{x}}_{it}'\bar{{\varvec{\beta }}})}{1 + \exp ({\bar{\alpha }}_i + {\varvec{x}}_{it}'\bar{{\varvec{\beta }}}) }. \end{aligned}$$
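For small T, the conditional probability (12) can be computed by direct enumeration, exactly as for the static case. The sketch below follows the display term by term, with \(z_0\) set to \(y_{i0}\); the function and argument names are ours.

```python
import itertools
import numpy as np

def pstar_i(beta, gamma, qbar, X_i, y_i, y_i0):
    """Conditional probability (12) of the approximating QE model
    for one unit, enumerating all z with z_+ = y_i+.
    qbar holds the plug-in probabilities qbar_it."""
    T = len(y_i)

    def num(z):
        z_lag = np.concatenate(([y_i0], z[:-1]))   # (z_0, ..., z_{T-1})
        z_star = np.sum(z_lag * z)                 # y_i0 z_1 + sum_{t>1} z_{t-1} z_t
        return np.exp(z @ X_i @ beta
                      - gamma * np.sum(qbar * z_lag) + gamma * z_star)

    configs = [np.array(z) for z in itertools.product((0, 1), repeat=T)
               if sum(z) == int(np.sum(y_i))]
    return num(np.asarray(y_i)) / sum(num(z) for z in configs)
```

By construction, the probabilities sum to one over the score-preserving configurations, and with \(\gamma = 0\) the expression collapses to the static conditional logit probability.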

Expressions for the partial effects and APEs are derived in the same way as for the static logit model. Let \({\varvec{w}}_{it} = ({\varvec{x}}_{it}', y_{i,t-1})'\) collect the \(K+1\) model covariates. Based on (11), the partial effect of covariate \(w_{itk}\) for i at time t on the probability of \(y_{it} = 1\) can be written as:

$$\begin{aligned} m_{itk}(\alpha _i, {\varvec{\theta }}, {\varvec{w}}_{it}) = \left\{ \begin{array}{ll} p(y_{it} = 1 \vert {\varvec{w}}_{it}; \alpha _i, {\varvec{\beta }}, \gamma )\left[ 1 - p(y_{it} = 1 \vert {\varvec{w}}_{it}; \alpha _i, {\varvec{\beta }}, \gamma )\right] \beta _k, &{} w_{itk} \,\,\, \textrm{continuous}, \\ &{} \\ p(y_{it} = 1\vert {\varvec{w}}_{it,-k}, w_{itk} = 1; \alpha _i, {\varvec{\beta }}, \gamma ) - &{} \\ p(y_{it} = 1 \vert {\varvec{w}}_{it,-k}, w_{itk} = 0; \alpha _i, {\varvec{\beta }}, \gamma ),&{} w_{itk} \,\,\, \textrm{binary}, \\ \end{array} \right. \end{aligned}$$

where \({\varvec{w}}_{it, -k}\) denotes the vector \({\varvec{w}}_{it}\) excluding \(w_{itk}\), and \({\varvec{\theta }}= ({\varvec{\beta }}', \gamma )'\). This expression may also be used to compute the APE of the lagged response variable. Notice that this function does not depend on \(\bar{{\varvec{\beta }}}\), since the probability in (11) does not depend on \({\bar{q}}_{it}\). The APE of the kth covariate can then be obtained by taking the expected value of \(m_{itk}(\alpha _i, {\varvec{\theta }}, {\varvec{w}}_{it})\) with respect to \({\varvec{w}}_{it}\) and \(\alpha _{i0}\), and can be written as

$$\begin{aligned} \mu _{k0} = \int m_{itk}(\alpha _{i0}, {\varvec{\theta }}_0, {\varvec{w}}_{it})dG(\alpha _{i0}, {\varvec{w}}_{it}), \end{aligned}$$

where \(G(\alpha _{i0}, {\varvec{w}}_{it})\) denotes the joint distribution of \(\alpha _{i0}\) and \({\varvec{w}}_{it}\).
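The piecewise definition of \(m_{itk}\) above translates directly into code. The following sketch (names are ours; `binary` flags a binary covariate) covers both branches, with the lagged response treated as the last element of \({\varvec{w}}_{it}\).

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def partial_effect(alpha_i, beta, gamma, w_it, k, binary):
    """Partial effect m_itk under (11), with w_it = (x_it', y_{i,t-1})'
    and theta = (beta', gamma)'."""
    theta = np.append(beta, gamma)
    if not binary:
        # derivative of the logistic probability times the coefficient
        p = logistic(alpha_i + w_it @ theta)
        return p * (1.0 - p) * theta[k]
    # difference in probabilities with w_itk switched from 0 to 1
    w1, w0 = w_it.copy(), w_it.copy()
    w1[k], w0[k] = 1.0, 0.0
    return logistic(alpha_i + w1 @ theta) - logistic(alpha_i + w0 @ theta)
```

Setting \(k\) to the position of the lagged response in \({\varvec{w}}_{it}\) and `binary=True` yields the partial effect used to measure state dependence.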

As for the static logit model, the estimate of \(\mu _{k0}\) is based on those of the \(\alpha _i\), which we obtain in the same manner as in the first step described in Sect. 3.2.1. In addition, here the CML estimation of \({\varvec{\theta }}\) based on (12) relies on a preliminary step in order to obtain \({\bar{q}}_{it}\). In the first step, a preliminary estimate of \(\bar{{\varvec{\beta }}}\) is obtained by maximizing the conditional log-likelihood

$$\begin{aligned} \ell (\bar{{\varvec{\beta }}}) = \sum _{i=1}^n {\textbf{I}}(0<y_{i+}<T)\ell _i(\bar{{\varvec{\beta }}}), \end{aligned}$$

where

$$\begin{aligned} \ell _i(\bar{{\varvec{\beta }}}) = \log \frac{\exp \left[ \left( \sum _{t=1}^T y_{it}{\varvec{x}}_{it}\right) ' \bar{{\varvec{\beta }}} \right] }{\underset{\small {\varvec{z}}: z_+ = y_{i+}}{\sum }\exp \left[ \left( \sum _{t=1}^T z_t{\varvec{x}}_{it}\right) '\bar{{\varvec{\beta }}} \right] }, \end{aligned}$$

which is the same conditional log-likelihood as for the static logit model and may be maximized by a standard Newton–Raphson algorithm. We denote the resulting CML estimator by \(\check{{\varvec{\beta }}}\). The estimate \({\check{\alpha }}_i\) is then computed by maximizing the individual log-likelihood

$$\begin{aligned} \ell _i({\bar{\alpha }}_i) = \sum _{t=1}^T \log \frac{\exp \left[ y_{it}({\bar{\alpha }}_i + {\varvec{x}}_{it}' \check{{\varvec{\beta }}}) \right] }{1 + \exp ({\bar{\alpha }}_i + {\varvec{x}}_{it}' \check{{\varvec{\beta }}})},\end{aligned}$$

where \(\check{{\varvec{\beta }}}\) is fixed. The probability \({\bar{q}}_{it}\) in (12) is estimated as \(\check{q}_{it} = \exp ({\check{\alpha }}_i + {\varvec{x}}_{it}'\check{{\varvec{\beta }}})/\left[ 1 + \exp ({\check{\alpha }}_i + {\varvec{x}}_{it}'\check{{\varvec{\beta }}}) \right] \).

In the second step, we estimate \({\varvec{\theta }}\) by maximizing the conditional log-likelihood

$$\begin{aligned} \ell ({\varvec{\theta }}) = \sum _{i=1}^n {\textbf{I}}(0<y_{i+}<T) \log p^*_{\check{\small {\varvec{q}}}_i}({\varvec{y}}_i\vert {\varvec{X}}_i,y_{i0},y_{i+}), \end{aligned}$$

where \(p^*_{\check{\small {\varvec{q}}}_i}({\varvec{y}}_i\vert {\varvec{X}}_i,y_{i0},y_{i+})\) is the joint probability in (12) evaluated at \(\check{{\varvec{q}}}_i = (\check{q}_{i1}, \ldots , \check{q}_{iT})'\). The above function can easily be maximized with respect to \({\varvec{\theta }}\) by a Newton–Raphson algorithm, so as to obtain the PCML estimator \(\tilde{{\varvec{\theta }}}\), which is a \(\sqrt{n}\)-consistent estimator of \({\varvec{\theta }}_0\) only if \(\gamma _0 = 0\), representing the special case in which the QE model corresponds to the dynamic logit model.Footnote 9 Nonetheless, Bartolucci and Nigro (2012) show that the PCML estimator has a limited bias in finite samples even in the presence of non-negligible state dependence.

The next step consists of recovering the estimates of \(\alpha _i\), \({\tilde{\alpha }}_i(\tilde{{\varvec{\theta }}})\), by maximizing the concentrated log-likelihood evaluated at \(\tilde{{\varvec{\theta }}}\). In the final step, the APEs can then be estimated by plugging \({\tilde{\alpha }}_i(\tilde{{\varvec{\theta }}})\) and \(\tilde{{\varvec{\theta }}}\) in the APE formulation and applying the same correction shown in Sect. 3.2.1, so as to obtain

$$\begin{aligned} {\tilde{\mu }}_{k} = \frac{1}{nT}\sum _{i=1}^n \sum _{t=1}^T m_{itk}({\tilde{\alpha }}_i(\tilde{{\varvec{\theta }}}), \tilde{{\varvec{\theta }}}, {\varvec{w}}_{it}) - {\tilde{D}}. \end{aligned}$$

Standard errors for \({\tilde{\mu }}_k\) can be obtained exactly in the same way as illustrated in Sect. 3.2.2 with the appropriate change of notation.

4 Monte Carlo simulation study

In the following, we illustrate the design and report the main results of the simulation studies aimed at assessing the finite sample performance of the estimators of the APEs for the static and dynamic logit models. We also discuss the results of some robustness exercises, with the related tables reported in the Appendix.

4.1 Simulation set-up

We generate data for the logit model according to the formulation proposed by Honoré and Kyriazidou (2000), that is, for \( i = 1,\ldots ,n\),

$$\begin{aligned} y_{it}= & {} {\textbf{I}}(\alpha _i + y_{i,t-1}\gamma + x_{it}\beta + \varepsilon _{it} > 0), \quad t = 1,\ldots ,T, \end{aligned}$$
(13)
$$\begin{aligned} y_{i0}= & {} {\textbf{I}}(\alpha _i + x_{i0}\beta + \varepsilon _{i0} > 0), \end{aligned}$$
(14)

where \(x_{it} \sim N(0, \pi ^2/3)\) and \(\varepsilon _{it}\) follows a standard logistic distribution for \(t = 0, \ldots , T\).

Based on the above design, we generate data from the static and dynamic logit models as follows. For the static logit model, data are generated under assumption (13) with \(\gamma = 0\) and \(\beta = 1\), for \(t=1, \ldots , T\); here the individual intercepts are given by \(\alpha _i = \sum _{t = 1}^{4} x_{it} / 4 \). For the dynamic logit model, data are generated for \(t=0,\ldots ,T\) also using (14), with \(\gamma \) in (13) taking the values 0.25, 0.5, and 0.75, \(\beta = 1\), and the individual heterogeneity generated as \(\alpha _i = \sum _{t = 0}^{3} x_{it} / 4\). We consider the same scenarios for both the static and dynamic logit models, corresponding to \(n = 100, 500\) and \(T = 4, 8, 12\), and the number of Monte Carlo replications is 1000.
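The dynamic data-generating process of (13)-(14) can be reproduced in a few lines of numpy; the sketch below uses our own function name and a seed argument for replicability, with \(x_{it} \sim N(0, \pi ^2/3)\), standard logistic errors, and \(\alpha _i = \sum _{t=0}^{3} x_{it}/4\) as in the dynamic design.

```python
import numpy as np

def simulate_dynamic_logit(n, T, gamma, beta=1.0, seed=0):
    """Generate data from (13)-(14): columns t = 0,...,T."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.pi / np.sqrt(3), size=(n, T + 1))
    alpha = x[:, :4].mean(axis=1)          # alpha_i = sum_{t=0}^{3} x_it / 4
    eps = rng.logistic(size=(n, T + 1))    # standard logistic errors
    y = np.empty((n, T + 1), dtype=int)
    y[:, 0] = (alpha + x[:, 0] * beta + eps[:, 0] > 0).astype(int)  # (14)
    for t in range(1, T + 1):
        y[:, t] = (alpha + y[:, t - 1] * gamma + x[:, t] * beta
                   + eps[:, t] > 0).astype(int)                      # (13)
    return y, x, alpha
```

Setting `gamma=0` and dropping the initial-condition column recovers the static design, up to the definition of the intercepts.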

For the static logit model, we compare the finite sample performance of the proposed APE estimator (denoted by CML-BC) with: (a) the ML plug-in estimator (ML); (b) Hahn and Newey (2004)’s jackknife bias corrected estimator (Jackknife-BC); (c) the ML estimator with the analytical bias correction (Analytical-BC) provided by Fernández-Val (2009), also mentioned in the previous section.

For the dynamic logit model, we compare the finite sample performance of the proposed APE estimator (PCML-BC) with: (a) the ML plug-in estimator; (b) Dhaene and Jochmans (2015)’s half-panel jackknife bias-corrected estimator (Jackknife-BC); (c) the analytically bias-corrected estimator (Analytical-BC) by Fernández-Val (2009). It must be noted that the half-panel Jackknife-BC estimator cannot be computed for \(T=4\).

For each scenario, we report the mean and the median of the ratio \({\tilde{\mu }}/\mu ^*\), the standard deviation of \({\tilde{\mu }}\), the rejection frequencies at the \(5\%\) and \(10\%\) nominal levels of a t-test for the true value of the APE, and the mean ratio between the estimator standard error and the standard deviation.Footnote 10

4.2 Main results

Table 1 reports the simulation results for the static logit model. The proposed estimator (CML-BC) has a good finite sample performance with both small n and T. By contrast, the Jackknife-BC and the Analytical-BC exhibit a sizable bias when \(T = 4\) and produce unreliable coverage intervals. Indeed, the simulation results for the ML estimator, especially with \(T \ge 8\), suggest that the bias correction is unnecessary in this case. It therefore emerges here, as well as from the results reported by Fernández-Val (2009), that for the APEs of static models the sources of bias are negligible not only asymptotically, as discussed in Sect. 2, but also in finite samples.

Table 1 Simulation results for \({\tilde{\mu }}\), static logit model

Bias corrections are bound to be more relevant for the dynamic logit, as ML is known to produce a severely biased estimator of the state dependence parameter in auto-regressive formulations for both linear and nonlinear models (Heckman 1981; Nickell 1981). Tables 2, 3 and 4 report the results for the partial effect relative to the state dependence parameter, \(\mu _y\). The simulation results of the APEs for the covariate, denoted \(\mu _x\), are reported in Tables 8, 9 and 10 in Appendix A.1.

Results confirm that the bias of plug-in APEs based on ML estimates is more severe than in the static case, especially for the partial effect of the state dependence parameter. While all bias corrections produce a remarkable improvement over ML, the proposed estimator outperforms the Analytical-BC and Jackknife-BC. With \(\gamma = 0.25\), the advantage is noticeable across all the scenarios considered. Furthermore, the proposed methodology seems to provide the most reliable confidence intervals among the examined estimators; in this regard, it is worth noticing that when \(T = 4\) all the estimators provide poor coverage. As for the APE of the covariate, \(\mu _x\), the PCML-BC exhibits a better performance with \(T=4\), whereas all the bias-correction strategies have comparable performance with larger values of T. These results suggest that using a fixed-T consistent estimator, even though for an approximating model, offers an advantage when the bias is sizable, as removing only the \(O(T^{-1})\) component may not provide enough of a reduction.

As discussed in Section 3.2.3, while the PCML estimator is consistent only when \(\gamma = 0\), Bartolucci and Nigro (2012) show that its finite-sample bias is limited even in presence of non-negligible state dependence. The results in Tables 2, 3 and 4 confirm that the PCML-BC APE estimator has the same behaviour: although the advantage over the alternative approaches is reduced with \(\gamma = 0.5\) and \(\gamma = 0.75\), the proposed approach remains the most effective strategy to correct APEs for the state dependence parameter, especially with short T.

Table 2 Simulation results for \({\tilde{\mu }}_y\), dynamic logit model, \(\gamma = 0.25\)
Table 3 Simulation results for \({\tilde{\mu }}_y\), dynamic logit model, \(\gamma = 0.5\)
Table 4 Simulation results for \({\tilde{\mu }}_y\), dynamic logit model, \(\gamma = 0.75\)

It is finally worth recalling that, besides plugging fixed-T (pseudo) consistent estimators of the slope parameters into the APEs, the good finite-sample performance of the proposed approach also benefits from the correction aimed at removing the bias originating from the ML estimate of the unobserved heterogeneity, denoted by D in (4). In order to illustrate the indirect bias introduced by \({\hat{\alpha }}_i({\varvec{\beta }})\), we report in Table 5 the results of a simulation exercise where we compare the performance of the proposed approach, (P)CML-BC, with that of an APE estimator based on (P)CML estimates that ignores the bias correction. The latter approach is employed by Stammann et al. (2016) and was considered in an earlier version of this work by Bartolucci and Pigini (2019). It clearly emerges that a bias reduction is needed for both the static and dynamic logit model, especially with \(T=4\). Besides, these results also highlight that plugging in fixed-T consistent estimators already brings a sizable improvement over the ML-based APEs.

Table 5 Simulation results for the static and dynamic logit, comparison between (P)CML and (P)CML-BC

4.3 Robustness exercises

In this section, we illustrate the design and discuss the results of a series of robustness exercises aimed to test the performance of the proposed approach under potentially problematic departures from the baseline setting. The results are collected in Appendix A.2.

The first set of exercises concerns the violation of standard hypotheses that are necessary for the asymptotic results on the bias correction of the ML estimator to hold. It is worth recalling that this correction is based on a large-T asymptotic expansion, which theoretically rules out the possibility of including time dummies in the model specification since, with large-T asymptotics, time fixed effects are incidental parameters as well.Footnote 11 We devise an experiment where data are generated from the dynamic logit model defined in (13)-(14), with the addition of a trending regressor \(\eta _t = -1 + \frac{2(t-1)}{T-1}\) in \([-1,1]\), and the model specification then includes time effects as well. The results for \(T = 4, 8\) are reported in Table 11 and suggest overall robustness of the approaches considered to the inclusion of time dummies, as also documented by Fernández-Val (2009).

Large-T asymptotics also requires stationarity of covariates, which may be rather restrictive in applications. Yet both Fernández-Val (2009) and Dhaene and Jochmans (2015) report similar performance of the approaches under departures from this assumption. We confirm the same behaviour for the proposed approach by means of a simulation exercise where data are generated from a logit model with a trending regressor. The design is based on the one adopted by Hahn and Newey (2004), where

$$\begin{aligned} y_{it} = {\textbf{I}}(\alpha _i + x_{it}\beta + \varepsilon _{it} > 0), \quad i = 1,\ldots ,n,\,\, t = 1,\ldots ,T, \end{aligned}$$

with \(\beta = 1\), \(\alpha _i \sim N(0,1)\), the error terms \(\varepsilon _{it}\) follow a standard logistic distribution, and

$$\begin{aligned} x_{it} = t/10 + x_{i,t-1}/2 + u_{it}, \end{aligned}$$

with \(u_{it} \sim U\left[ -0.5, 0.5\right] \) and \(x_{i0} = u_{i0}\). According to the results reported in Table 12, there are no remarkable differences with respect to the baseline scenario.Footnote 12
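The trending-regressor process can be generated as follows; the function name and the choice to discard the initial observation are ours.

```python
import numpy as np

def simulate_trending_x(n, T, seed=0):
    """x_it = t/10 + x_{i,t-1}/2 + u_it with u_it ~ U[-0.5, 0.5]
    and x_i0 = u_i0, as in the robustness design above."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-0.5, 0.5, size=(n, T + 1))
    x = np.empty((n, T + 1))
    x[:, 0] = u[:, 0]                      # x_i0 = u_i0
    for t in range(1, T + 1):
        x[:, t] = t / 10 + x[:, t - 1] / 2 + u[:, t]
    return x[:, 1:]                        # keep t = 1,...,T
```

The deterministic trend makes the covariate nonstationary, which is precisely the departure from the large-T assumptions being probed.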

The second set of exercises explores the finite-sample performance of the proposed approach with different settings for the unobserved heterogeneity. We first consider the design based on assumptions (13)-(14) with individual intercepts generated as \(\alpha _i = \sum _{t=0}^3 x_{it}/4 + (u_i -1)\), with \(u_i \sim \chi ^2_1\). The corresponding results are reported in Table 13 for the scenarios with \(T = 4,8\). As expected with fixed-effects approaches, results are unaffected by the distribution of the unobserved heterogeneity.

We then consider a shift in the distribution of the individual effects and generate them as \(\alpha _i = \sum _{t=0}^3 x_{it}/4 - 3.5\). In this way, we generate samples where the frequency of 1s amounts to about \(12\%\), whereas in the baseline setting it was around \(53\%\), thus simulating a response variable describing rare events, which is often realistic in applications. Results are reported in Table 14 and, as expected, document a deterioration in the performance of all the approaches considered. This is due to the limited availability of useful response configurations, which generates a larger small-sample bias in the parameter estimates and increases the number of instances in which the PE must be set to zero.

5 Empirical application

We apply the proposed formulation to the problem of estimating the labour supply of married women. The same empirical application is considered by Fernández-Val (2009) and Dhaene and Jochmans (2015), after the seminal work of Hyslop (1999). The sample, drawn from the Panel Study of Income Dynamics (PSID), consists of \(n = 1{,}908\) married women between 19 and 59 years of age in 1980, followed for \(T = 6\) time occasions, from 1980 to 1985, plus an additional observation in 1979 used as the initial condition in the dynamic models. We specify a static logit model for the probability of being employed at time t, conditional on the number of children of a certain age in the family, namely the number of kids between 0 and 2 years old, between 3 and 5, and between 6 and 17, and on the husband's income. We also specify a dynamic logit model, that is, we include lagged participation in the set of model covariates.

Table 6 Female labour force participation: static logit model
Table 7 Female labour force participation: dynamic logit model

The estimation results for the static logit model are reported in Table 6, which shows the ML, Hahn and Newey (2004)'s panel Jackknife-BC, Fernández-Val (2009)'s Analytical-BC, and CML estimates of the model parameters. The CML, Analytical-BC, and Jackknife-BC estimates of the parameters are all similar to each other and smaller (in absolute value) than the uncorrected ML ones. They suggest a negative effect on labour participation of having children younger than 17 in the household, as well as of the level of the husband's income. The estimated APEs obtained with the proposed method suggest that having an additional child between 0 and 2 years old reduces the probability of working by 8.9 percentage points, while having a child between 3 and 5 years old reduces the employment probability by 6.1 percentage points. The APE estimates obtained with the Analytical-BC and Jackknife-BC estimators point toward the same results, with the exception of having children between 6 and 17 years old, which appears not to be statistically significant according to our procedure.

Table 7 reports the results for the dynamic logit specification. Here, we report the ML, Dhaene and Jochmans (2015)’s half-panel Jackknife-BC, Fernández-Val (2009)’s Analytical-BC, and PCML estimates of the model parameters. The effect of the exogenous model covariates is now smaller and all the APE estimates suggest a negative and statistically significant effect of having children between 0 and 5 years old in the household.

The PCML estimator detects a strong state dependence in the labour force participation of married women, as the estimated coefficient for lagged participation amounts to 1.706. In terms of APE, this is translated into an increase of 15.7 percentage points in the probability of being employed at time t for a woman who was working in \(t-1\), with respect to a woman who was not working in \(t-1\).

6 Conclusion

We develop a multiple-step procedure to compute APEs for fixed-effects logit models that are estimated by CML. Our strategy amounts to building a plug-in APE estimator based on the fixed-T consistent CML estimator of the slope parameters and bias-corrected estimates of APEs.

The proposed estimator is asymptotically equivalent to the plug-in ML and to alternative bias-corrected APE estimators, and it exhibits a comparable finite sample performance for the static logit model. By contrast, for the dynamic logit model the proposed approach has a remarkable advantage in finite samples. In this respect, the multiple-step procedure developed here could be particularly useful for practitioners, who often deal with short-T datasets, such as rotated surveys, and/or highly unbalanced panels.