1 Introduction

Static and dynamic binary choice models are widely employed in microeconometric applications.Footnote 1 For these models, the fixed-effects approach is often advocated, as it allows for the estimation of partial effects of covariates that may be correlated with the individual-specific unobserved heterogeneity in a nonparametric manner. However, unless the number of time occasions T goes to infinity, the maximum likelihood (ML) estimator of fixed-effects binary choice models is inconsistent due to the incidental parameters problem, that is, the presence of nuisance parameters whose number increases with the sample size (Lancaster 2000; Neyman and Scott 1948).

A popular method to overcome the incidental parameters problem is based on the conditional inference approach for the fixed-effects logit model (Andersen 1970; Chamberlain 1980), which admits sufficient statistics for the individual intercepts. The conditional ML (CML) method produces a fixed-T consistent estimator of the slope parameters, which makes it a particularly attractive strategy given the wide availability of data sets, such as those produced by national household and workforce surveys, that are based on a rotating sampling scheme. However, a drawback of the CML approach is that plug-in estimates of the average partial effects (APEs) are not directly available, as the parameters for the individual effects are eliminated.

An alternative way to deal with the incidental parameters problem is to correct the bias of the ML estimator of the slope parameters. While the resulting estimator will not be fixed-T consistent, this approach is general and can be applied to any nonlinear model, as opposed to the CML strategy, which is specific to the logit model. Several procedures have been proposed to reduce the order of the bias of the ML estimator from \(O(T^{-1})\) to \(O(T^{-2})\). Analytical bias corrections are provided by Fernández-Val (2009), whose derivations are based on general results for static (Hahn and Newey 2004) and dynamic (Hahn and Kuersteiner 2011) nonlinear panel data models. In the same vein, a modified log-likelihood function and score equation are proposed by Bester and Hansen (2009) and Carro (2007), respectively, to achieve the same bias reduction. An alternative bias correction method relies on the panel jackknife. A general procedure for nonlinear static panel data models is proposed by Hahn and Newey (2004), whereas a split-panel jackknife estimator is developed by Dhaene and Jochmans (2015) for dynamic models.

The advantage of relying on ML estimates is that plug-in estimators of the APEs are readily available. The estimators of the APEs share the same order of bias as the ML estimator, and the corresponding corrections can be applied in a similar manner. However, especially in the case of dynamic binary choice models, the ML estimator is known to be severely biased (Heckman 1981) and the bias-corrected ML estimator exhibits a greater distortion compared to fixed-T consistent ones with short T (Carro 2007; Bartolucci and Nigro 2012). Practitioners are also usually more familiar with the CML approach, as it has become prominent in the applied literature, perhaps due to its long-standing presence in graduate textbooks (Cameron and Trivedi 2005; Hsiao 2005; Wooldridge 2010) and its built-in implementation in popular microeconometrics software.

In this paper, we propose a multiple-step procedure to estimate the APEs in the fixed-effects logit model that combines the conditional inference approach with a bias reduction method. The APEs are evaluated at the fixed-T consistent CML estimator of the slope parameters and at the ML estimator for the unobserved heterogeneity, obtained by maximizing the log-likelihood evaluated at the CML estimate. Plugging in the estimated fixed effects produces an additional source of bias in the APEs; we reduce the order of this bias from \(O(T^{-1})\) to \(O(T^{-2})\) by applying the analytical correction proposed by Fernández-Val (2009). In this respect, our approach extends the work of Stammann et al. (2016), who study the same plug-in estimator without the bias correction.

The proposed procedure cannot be directly extended to the dynamic logit model (Hsiao 2005), for which CML inference for the slope parameters is not viable in a simple form. This is overcome by Bartolucci and Nigro (2010), who propose a quadratic exponential (QE) formulation (Cox 1972) for dynamic binary panel data models, which has the advantage of admitting sufficient statistics for the individual intercepts. Furthermore, Bartolucci and Nigro (2012) propose a QE model that closely approximates the dynamic logit model, the parameters of which can easily be estimated by pseudo CML (PCML). The resulting PCML estimator is consistent in the absence of state dependence, because in this case the QE model corresponds to the dynamic logit model and, as shown by Bartolucci and Nigro (2012) by simulation, it otherwise exhibits a moderate bias. We therefore extend the proposed procedure to include PCML estimates in the APEs when a dynamic logit is specified.

As with the APE estimators based on analytical and jackknife corrections, the proposed method reduces the order of the bias from \(O(T^{-1})\) to \(O(T^{-2})\). However, such a bias is asymptotically negligible under rectangular array asymptotics, as plug-in average-effect estimators converge at the rate \(n^{-1/2}\) (Dhaene and Jochmans 2015), where n is the sample size. Yet, in spite of the asymptotic equivalence of bias-corrected and ML plug-in APE estimators, the simulation evidence provided by Dhaene and Jochmans (2015) suggests that operating some bias reduction entails a non-negligible improvement in small samples, especially with small values of T.

The proposed combination of the conditional inference approach with bias reduction provides a way to readily obtain APE estimates for the fixed-effects static and dynamic logit models. By means of an extensive simulation study, we show that the proposed approach has finite sample performance comparable to the ML and bias-corrected estimators with the static logit model, while it outperforms them when the dynamic logit is considered, especially when n and T are small. This is the result of plugging a fixed-T consistent estimator of the slope and state dependence parameters of the QE model into the APEs.

It is worth clarifying that while the CML and PCML estimators are fixed-T consistent, the asymptotic theory for the APE estimator proposed here requires both \(n,T \rightarrow \infty \), due to the presence of the ML estimates of the individual intercepts and, consequently, of the bias correction, which is derived by means of a large-T asymptotic expansion.Footnote 2 This means that a few typical drawbacks of this setting theoretically apply here as well: time effects are ruled out, as they too are incidental parameters when \(T \rightarrow \infty \), and the remaining covariates are required to be stationary (Dhaene and Jochmans 2015; Fernández-Val 2009). However, by means of dedicated simulation exercises, we show that the proposed approach is able to handle the presence of time dummies in the model specification and that violations of the stationarity assumption do not seem to affect, in practice, the finite-sample performance of any of the approaches considered.

The rest of the paper is organized as follows: In Sect. 2, we briefly discuss how the incidental parameters problem affects the APE estimator; in Sect. 3, we recall the bias correction strategies for APE estimators and then illustrate the proposed methodology and its extension to accommodate the dynamic logit model; in Sect. 4, we investigate by simulation the finite sample performance of the proposed estimator, compare it with the panel jackknife and analytical bias correction strategies, and illustrate the results of some robustness exercises; in Sect. 5, we provide a real data application based on the labour supply of married women; finally, Sect. 6 provides the main conclusions.

2 Average partial effects and the incidental parameters problem

We consider n units, indexed by \(i=1,\ldots ,n\), observed at time occasions \(t = 1, \ldots , T\). Let \(y_{it}\) be the binary response variable for unit i at occasion t and \({\varvec{x}}_{it}\) the corresponding vector of K strictly exogenous covariates. Under the static model, we assume that the \(y_{it}\) are conditionally independent, given \(\alpha _i\) and \({\varvec{x}}_{it}\), across i and t. Consider the logit formulation

$$\begin{aligned} p(y_{it} \vert {\varvec{x}}_{it}; \alpha _i, {\varvec{\beta }}) = \frac{\exp \left[ y_{it}(\alpha _i + {\varvec{x}}_{it}'{\varvec{\beta }})\right] }{1 + \exp (\alpha _i + {\varvec{x}}_{it}'{\varvec{\beta }})}, \end{aligned}$$
(1)

where \(\alpha _i\) is the individual-specific intercept and the vector \({\varvec{\beta }}\) collects the regression parameters.

The fixed-effects estimator is obtained by ML, treating each individual effect \(\alpha _i\) as a parameter to be estimated. The ML estimator of \({\varvec{\beta }}\) is obtained by concentrating out the \(\alpha _i\) as the solution to

$$\begin{aligned} \hat{{\varvec{\beta }}}= & {} \underset{\small {\varvec{\beta }}}{\textrm{argmax}} \sum _{i=1}^n \sum _{t = 1}^T \log p(y_{it}\vert {\varvec{x}}_{it}; {\hat{\alpha }}_i({\varvec{\beta }}), {\varvec{\beta }}), \\ {\hat{\alpha }}_i({\varvec{\beta }})= & {} \underset{\alpha _i}{\textrm{argmax}} \sum _{t = 1}^T \log p(y_{it}\vert {\varvec{x}}_{it}; \alpha _i, {\varvec{\beta }}). \end{aligned}$$

Notice that \({\hat{\alpha }}_i({\varvec{\beta }})\) is estimated using only the data for subject i and it is therefore not consistent for \(\alpha _{i0}\) unless \(T \rightarrow \infty \). As a consequence, with T fixed and only \(n \rightarrow \infty \), the ML estimator \(\hat{{\varvec{\beta }}}\) will be plagued by the estimation noise in \({\hat{\alpha }}_i({\varvec{\beta }})\) and will not be consistent for \({\varvec{\beta }}_0\), with \(\underset{n \rightarrow \infty }{\textrm{plim}} \, \hat{{\varvec{\beta }}} \equiv {\varvec{\beta }}_T \ne {\varvec{\beta }}_0\). This is the well-known incidental parameters problem (Lancaster 2000; Neyman and Scott 1948). In particular, Hahn and Newey (2004) show that \({\varvec{\beta }}_T = {\varvec{\beta }}_0 + B/T + O(T^{-2})\), which clarifies that \({\varvec{\beta }}_T \rightarrow {\varvec{\beta }}_0\) if \(T \rightarrow \infty \) and n is fixed. Moreover, if both \(n, T \rightarrow \infty \), \(\hat{{\varvec{\beta }}}\) will be asymptotically normal. However, Hahn and Newey (2004) show that the asymptotic distribution of the ML estimator will not be centred at its probability limit if n grows proportionally to T.
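The inner step \({\hat{\alpha }}_i({\varvec{\beta }})\) can be made concrete with a short sketch: for a fixed \({\varvec{\beta }}\), the individual intercept solves the score equation \(\sum _t (y_{it} - p_{it}) = 0\), which a one-dimensional Newton iteration handles easily. The code below is a minimal illustration, not the authors' implementation; all function names are ours:

```python
import math

def logit_prob(alpha, xb):
    # P(y = 1) under the logit model with linear index alpha + x'beta
    return 1.0 / (1.0 + math.exp(-(alpha + xb)))

def alpha_hat(beta, y, x, tol=1e-10, max_iter=100):
    """ML estimate of the individual intercept for fixed beta, solving the
    score equation sum_t (y_it - p_it) = 0 by Newton's method.
    Uses only the T observations of one unit, hence the estimation noise."""
    T, s = len(y), sum(y)
    if s == 0 or s == T:
        raise ValueError("the ML estimate of alpha_i is not finite when y_i+ is 0 or T")
    a = math.log(s / (T - s))  # starting value ignoring the covariates
    for _ in range(max_iter):
        xb = [sum(b * xk for b, xk in zip(beta, xt)) for xt in x]
        p = [logit_prob(a, v) for v in xb]
        score = sum(yt - pt for yt, pt in zip(y, p))
        hess = -sum(pt * (1.0 - pt) for pt in p)  # strictly negative
        step = score / hess
        a -= step
        if abs(step) < tol:
            break
    return a
```

Since each \({\hat{\alpha }}_i\) is fitted on T observations only, its estimation error does not vanish with n, which is precisely the mechanism behind the incidental parameters problem discussed above.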

The incidental parameters problem affects the estimation of APEs as well; these effects are usually of interest to practitioners who want to quantify the influence of some regressor x on the response probability, other things being equal. For the logit model in (1), the partial effect of covariate \(x_{itk}\) for i at time t on the probability of \(y_{it} = 1\) can be written, depending on the type of covariate, as

$$\begin{aligned} m_{itk}(\alpha _i, {\varvec{\beta }}, {\varvec{x}}_{it}) = \left\{ \begin{array}{ll} p(y_{it} = 1 \vert {\varvec{x}}_{it}; \alpha _i, {\varvec{\beta }})\left[ 1 - p(y_{it} = 1 \vert {\varvec{x}}_{it}; \alpha _i, {\varvec{\beta }})\right] \beta _k, &{} x_{itk} \,\,\, \textrm{continuous}, \\ &{} \\ p(y_{it} = 1 \vert {\varvec{x}}_{it,-k}, x_{itk} = 1 \, ; \alpha _i, {\varvec{\beta }}) - &{} \\ p(y_{it} = 1 \vert {\varvec{x}}_{it,-k}, x_{itk} = 0 \, ; \alpha _i, {\varvec{\beta }}),&{} x_{itk} \,\,\, \textrm{binary}, \\ \end{array} \right. \end{aligned}$$

where \({\varvec{x}}_{it, -k}\) denotes the subvector of all covariates but \(x_{itk}\). The true APE of the kth covariate can then be obtained by simply taking the expected value of \(m_{itk}(\alpha _i, {\varvec{\beta }}, {\varvec{x}}_{it})\) with respect to \({\varvec{x}}_{it}\) and \(\alpha _{i0}\),

$$\begin{aligned} \mu _{k0} = \int m_{itk}(\alpha _{i0}, {\varvec{\beta }}_0, {\varvec{x}}_{it})dG(\alpha _{i0},{\varvec{x}}_{it}), \end{aligned}$$

where \(G(\alpha _{i0},{\varvec{x}}_{it})\) denotes the joint distribution of \(\alpha _{i0}\) and \({\varvec{x}}_{it}\). An estimator of \(\mu _{k0}\) can be obtained by plugging in the ML estimators \(\hat{{\varvec{\beta }}}\) and \({\hat{\alpha }}_i(\hat{{\varvec{\beta }}})\), so that

$$\begin{aligned} {\hat{\mu }}_{k} = \frac{1}{nT}\sum _{i=1}^n \sum _{t=1}^T m_{itk}({\hat{\alpha }}_i(\hat{{\varvec{\beta }}}), \hat{{\varvec{\beta }}}, {\varvec{x}}_{it}). \end{aligned}$$
(2)

It is now clear that, with small T, this estimator is plagued by two sources of bias: the first stems from the estimation error introduced by \({\hat{\alpha }}_i({\varvec{\beta }})\); the second is a result of using the asymptotically biased estimator \(\hat{{\varvec{\beta }}}\).
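For a continuous covariate, the plug-in estimator in (2) is simply the sample average of \(p(1-p)\beta _k\) over all observations. A minimal sketch under our own (hypothetical) naming, with scalar data for brevity:

```python
import math

def logit_prob(alpha, x, beta):
    # response probability of the logit model (1)
    idx = alpha + sum(b * xk for b, xk in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-idx))

def plug_in_ape(alpha_hats, beta_hat, X, k):
    """Plug-in APE of Eq. (2) for a continuous covariate k:
    the average of p * (1 - p) * beta_k over all (i, t) pairs."""
    total, count = 0.0, 0
    for a_i, X_i in zip(alpha_hats, X):   # X is a list of T x K covariate blocks
        for x_it in X_i:
            p = logit_prob(a_i, x_it, beta_hat)
            total += p * (1.0 - p) * beta_hat[k]
            count += 1
    return total / count
```

Feeding this average biased estimates of \(\alpha _i\) and \({\varvec{\beta }}\) is exactly what generates the two bias components just described.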

In order to better understand how these components will affect the bias of the APE, it is useful to introduce an expansion of \({\hat{\alpha }}_i({\varvec{\beta }})\) as \(T \rightarrow \infty \):

$$\begin{aligned} {\hat{\alpha }}_i({\varvec{\beta }}) = \alpha _{i0} + \frac{\xi _i}{T} + \frac{1}{T}\sum _{t=1}^T\tau _{it} + o_p\left( \frac{1}{T}\right) , \end{aligned}$$
(3)

where \(\frac{1}{\sqrt{T}}\sum _{t=1}^T\tau _{it} {\mathop {\rightarrow }\limits ^{d}}N(0, \sigma ^2_i)\), which follows from higher-order asymptotics for time series data (Bao and Ullah 2007). Dhaene and Jochmans (2015) then show that the combined asymptotic bias is

$$\begin{aligned} \underset{n \rightarrow \infty }{\textrm{plim}}\,\, {\hat{\mu }}_k = \mu _{k0} + \frac{D + E}{T} + O(T^{-2}), \end{aligned}$$
(4)

where, specifically,

$$\begin{aligned} D= & {} \sum \limits _{j = 0}^{\infty }\textrm{E}_{T}\left[ \frac{\partial m_{itk}(\alpha _{i0}, {\varvec{\beta }}_0)}{\partial \alpha _{i}}\tau _{i,t-j} \right] + \textrm{E}_{T}\left[ \frac{\partial m_{itk}(\alpha _{i0}, {\varvec{\beta }}_0)}{\partial \alpha _{i}}\xi _{i}\right] \nonumber \\ {}{} & {} + \frac{1}{2}\textrm{E}_{T}\left[ \frac{\partial ^2 m_{itk}(\alpha _{i0}, {\varvec{\beta }}_0)}{\partial \alpha _{i}^2}\sigma ^2_{i}\right] \end{aligned}$$
(5)

is the bias generated from using \({\hat{\alpha }}_i({\varvec{\beta }})\) instead of \(\alpha _{i0}\).Footnote 3 Following Hahn and Newey (2004), this expression suggests that the bias introduced by plugging in \({\hat{\alpha }}_i({\varvec{\beta }})\) has three components: (i) the asymptotic bias of \({\hat{\alpha }}_i({\varvec{\beta }})\); (ii) the correlation between \({\hat{\alpha }}_i({\varvec{\beta }})\) and \(m_{itk}(\cdot )\), which are based on the same data; (iii) the variance of \({\hat{\alpha }}_i({\varvec{\beta }})\). This result can be clarified by noticing that D comes from an expansion of the APE around \(\alpha _{i0}\), where the term \({\hat{\alpha }}_i({\varvec{\beta }}) - \alpha _{i0}\) is characterized by (3). Furthermore,

$$\begin{aligned} E = \textrm{E}_{T}\left[ \frac{\partial m_{itk}(\alpha _{i0}, {\varvec{\beta }}_0)}{\partial {\varvec{\beta }}'}B\right] \end{aligned}$$

is the bias from plugging in \(\hat{{\varvec{\beta }}}\) instead of using \({\varvec{\beta }}_0\).

Further insights can be drawn from Expression (4). First, notice that even if a fixed-T consistent estimator of \({\varvec{\beta }}_0\) were available, the asymptotic bias of the APE estimator would still be of order \(O(T^{-1})\) because of the presence of D. Second, using a bias-corrected estimate of \(\alpha _{i}\) along with a fixed-T consistent estimator of \({\varvec{\beta }}\) would not remove the bias of order \(O(T^{-1})\), as it would not take care of the last component in D.

The sources of bias discussed above, however, have been shown to become asymptotically negligible under rectangular array asymptotics, as plug-in estimators of average effects converge at a rate slower than \((nT)^{-1/2}\). Dhaene and Jochmans (2015) summarize this property in their Theorem 5.1, which is based on the following rationale. Consider the infeasible estimator

$$\begin{aligned} \mu ^*_{k} \equiv \frac{1}{nT}\sum _{i=1}^n \sum _{t=1}^T m_{itk}(\alpha _{i0}, {\varvec{\beta }}_0, {\varvec{x}}_{it}), \end{aligned}$$

and let \(\mu _{ik}\) be the individual-specific average partial effect, with mean \(\mu _{k0}\) and finite variance. Then, \(\mu ^*_k\) can be written as:

$$\begin{aligned} \mu ^*_k = \frac{1}{n}\sum _{i=1}^n \mu _{ik} + \frac{1}{n}\sum _{i=1}^n \left( \frac{1}{T}\sum _{t=1}^Tm_{itk}(\alpha _{i0}, {\varvec{\beta }}_0, {\varvec{x}}_{it}) - \mu _{ik} \right) . \end{aligned}$$

Notice that the first term converges to \(\mu _{k0}\) at the rate \(n^{-1/2}\), whereas the second term converges to zero at the rate \((nT)^{-1/2}\), thus implying that the infeasible APE estimator will converge no faster than \(n^{-1/2}\).

From the above expression, it is straightforward to see that any feasible average-effect estimator will converge at the same rate as \(\mu ^*_k\), thus making the bias introduced by replacing \(\alpha _{i0}\) and \({\varvec{\beta }}_0\) with ML estimates, or their first-order bias-corrected versions, asymptotically negligible. However, based on their simulation evidence, Dhaene and Jochmans (2015) still suggest using some bias correction of the APE estimator in finite samples, especially when T is small. The proposed method operates such a bias reduction, as do the alternative analytical and jackknife bias corrections recalled in the following section.

3 Estimation of average partial effects

In the following, we briefly review the existing strategies based on analytical and jackknife bias corrections, which represent the benchmark for the finite sample performance of the proposed estimator. We then illustrate the proposed methodology, which combines the consistent CML estimator of \({\varvec{\beta }}_0\) and the analytical bias correction for the APE. Finally, we turn to the dynamic logit, for which the proposed procedure is based on a PCML estimator.

3.1 Existing strategies

The available bias reduction techniques for the estimation of APEs for fixed-effects binary choice models are based on either analytical or jackknife bias corrections.Footnote 4

Analytical bias corrections for the APEs amount to plugging in a bias-corrected estimate of \({\varvec{\beta }}\), say \(\hat{{\varvec{\beta }}}^{c} = \hat{{\varvec{\beta }}} - {\hat{B}}/T\), instead of the ML estimate in expression (2), along with \({\hat{\alpha }}_i(\hat{{\varvec{\beta }}}^{c})\). Doing so effectively removes the E component of the bias in (4), but the APE estimator is still plagued by the estimation noise in \({\hat{\alpha }}_i({\varvec{\beta }})\), giving rise to the D component. In order to remove it, the bias-corrected estimator of \(\mu _k\) is computed as:

$$\begin{aligned} {\hat{\mu }}^c_k = \frac{1}{nT}\sum _{i=1}^n \sum _{t=1}^T m_{itk}({\hat{\alpha }}_i(\hat{{\varvec{\beta }}}^{c}), \hat{{\varvec{\beta }}}^{c}, {\varvec{x}}_{it}) - {\hat{D}}, \end{aligned}$$
(6)

where \({\hat{D}}\) is the sample counterpart of D in (5), evaluated at \({\hat{\alpha }}_i(\hat{{\varvec{\beta }}}^{c})\) and \(\hat{{\varvec{\beta }}}^{c}\). Expressions for panel binary choice models are given in Fernández-Val (2009),Footnote 5 whose derivations are based on general results for static (Hahn and Newey 2004) and dynamic (Hahn and Kuersteiner 2011) nonlinear panel data models. For the expressions as well as for further details, we refer the reader to Hahn and Newey (2004), Fernández-Val (2009), and Hahn and Kuersteiner (2011). A similar correction of the APEs is provided by Bester and Hansen (2009), who also perform a comparison of their proposal with alternative strategies.Footnote 6 Furthermore, for the static logit model, Stammann et al. (2016) develop a computationally more efficient implementation of the bias-corrected estimator of structural parameters and APEs proposed by Hahn and Newey (2004), based on a pseudo-demeaning algorithm.

An alternative bias correction method for the APE estimator relies on the panel jackknife. A general procedure for nonlinear static panel data models is proposed by Hahn and Newey (2004). Let \(\hat{{\varvec{\beta }}}^{(t)}\) and \({\hat{\alpha }}^{(t)}_i(\hat{{\varvec{\beta }}}^{(t)})\) be the ML estimators with the tth observation excluded for every subject. Then, the jackknife corrected estimator for the APE is

$$\begin{aligned} {\hat{\mu }}^c_k = T{\hat{\mu }}_k - \frac{T-1}{T}\sum _{t=1}^T \mu _k\left( {\hat{\alpha }}^{(t)}_i(\hat{{\varvec{\beta }}}^{(t)}), \hat{{\varvec{\beta }}}^{(t)}\right) . \end{aligned}$$
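Since the correction only re-estimates the same quantity on leave-one-period-out subpanels, it can be sketched generically, treating the APE estimator as a black box. The sketch below is an illustration under our own naming (`estimator` is any user-supplied function returning a scalar), not the authors' code:

```python
def jackknife_ape(estimator, panel):
    """Panel jackknife of Hahn and Newey (2004):
    mu_c = T * mu - ((T - 1) / T) * sum_t mu^(t),
    where mu^(t) is re-estimated with period t dropped for every unit."""
    T = len(panel[0])
    mu = estimator(panel)
    leave_out = [
        estimator([[obs for s, obs in enumerate(unit) if s != t] for unit in panel])
        for t in range(T)
    ]
    return T * mu - (T - 1) / T * sum(leave_out)
```

By construction, an estimator whose bias is exactly \(b/T\) is returned without bias; with a generic smooth bias this is what reduces the order from \(O(T^{-1})\) to \(O(T^{-2})\).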

If the set of model covariates includes lagged explanatory variables, then leaving out one of the T observations at a time becomes unsuitable. Instead, a block of consecutive observations has to be considered so as to preserve the dynamic structure of the data. The so-called split-panel jackknife estimator is proposed by Dhaene and Jochmans (2015). A simple version of the estimator is the half-panel jackknife, which is based on splitting the panel into two half-panels, non-overlapping when T is even and \(T \ge 6\), each with T/2 time periods. Denote the set of half-panels as

$$\begin{aligned} S = \{S_1, S_2\}, \quad S_1 = \{1,\ldots ,T/2 \}, S_2 = \{T/2 +1,\ldots ,T \}; \end{aligned}$$

then the half-panel jackknife estimator of the APE is

$$\begin{aligned} {\hat{\mu }}^{(1/2)}_k = 2{\hat{\mu }}_k - \frac{1}{2}\left( {\bar{\mu }}^{S_1}_k + {\bar{\mu }}^{S_2}_k\right) , \end{aligned}$$

where \({\bar{\mu }}^{S_1}_k\) and \({\bar{\mu }}^{S_2}_k\) are the plug-in estimators evaluated at the ML estimators of the individual effects and slope parameters obtained using the observations in subpanels \(S_1\) and \(S_2\), respectively. Dhaene and Jochmans (2015) also illustrate generalized versions of the half-panel jackknife to deal with odd T and overlapping subpanels, as well as an alternative jackknife estimator based on the split-panel log-likelihood correction.
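The half-panel formula admits the same kind of generic sketch, again with the estimator treated as a black box (names are ours, for illustration only):

```python
def half_panel_jackknife(estimator, panel):
    """Half-panel jackknife of Dhaene and Jochmans (2015) for even T:
    mu_half = 2 * mu - (mu_S1 + mu_S2) / 2,
    with S1 the first and S2 the second half of the time periods."""
    T = len(panel[0])
    if T % 2 != 0:
        raise ValueError("this simple version requires an even T")
    half = T // 2
    mu = estimator(panel)
    mu_s1 = estimator([unit[:half] for unit in panel])
    mu_s2 = estimator([unit[half:] for unit in panel])
    return 2.0 * mu - 0.5 * (mu_s1 + mu_s2)
```

Splitting into consecutive blocks, rather than dropping scattered periods, is what preserves the dynamic structure when lagged outcomes enter the model.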

It is finally worth mentioning that jackknife and analytical higher-order bias corrections for the slope parameters have also been proposed by Dhaene and Jochmans (2015) and Dhaene and Sun (2021), respectively, and can be extended to the estimation of APEs, but their viability in dynamic models is limited. In the first case, the authors warn that, with the half-panel jackknife, the magnitude of the bias of the terms that are not eliminated increases with the order of the correction. In the second case, the correction is developed under the assumption of independent observations, thus ruling out its application to dynamic models completely.

3.2 Proposed methodology

The proposed multiple-step strategy is based on removing the two sources of bias in (4) by (a) using the fixed-T consistent CML estimator of \({\varvec{\beta }}\), \(\tilde{{\varvec{\beta }}}\), instead of the ML estimator \(\hat{{\varvec{\beta }}}\) and (b) reducing the order of bias of the APE plug-in estimator, induced by \({\hat{\alpha }}_i(\tilde{{\varvec{\beta }}})\), from \(O(T^{-1})\) to \(O(T^{-2})\) by applying the analytical bias correction of Fernández-Val (2009) reported in Eq. (6).

3.2.1 Multiple-step estimation

The first step consists of estimating the structural parameters of the logit model in (1) by CML. Taking the individual intercept \(\alpha _i\) as fixed, the joint probability of the response configuration \({\varvec{y}}_i = ( y_{i1}, \ldots , y_{iT})'\) conditional on \({\varvec{X}}_i = ({\varvec{x}}_{i1}, \ldots , {\varvec{x}}_{iT})\) can be written as:

$$\begin{aligned} p({\varvec{y}}_i \vert {\varvec{X}}_i, \alpha _i) = \frac{\exp \left( y_{i+}\alpha _i + \sum _{t=1}^Ty_{it}{\varvec{x}}_{it}'{\varvec{\beta }}\right) }{\prod _{t=1}^T \left[ 1 + \exp \left( \alpha _i + {\varvec{x}}_{it}'{\varvec{\beta }}\right) \right] }, \end{aligned}$$

where the dependence of the probability on the left-hand side on the slope parameters is suppressed to simplify the notation. It is well known that the total score \(y_{i+} = \sum _{t=1}^T y_{it}\) is a sufficient statistic for the individual intercepts \(\alpha _i\) (Andersen 1970; Chamberlain 1980). The joint probability of \({\varvec{y}}_i\) conditional on \(y_{i+}\) does not depend on \(\alpha _i\) and can therefore be written as:

$$\begin{aligned} p({\varvec{y}}_i \vert {\varvec{X}}_i, y_{i+}) = \frac{\exp \left[ \left( \sum _{t=1}^T y_{it}{\varvec{x}}_{it}\right) ' {\varvec{\beta }}\right] }{\underset{\small {\varvec{z}}: z_+ = y_{i+}}{\sum }\exp \left[ \left( \sum _{t=1}^T z_t{\varvec{x}}_{it}\right) ' {\varvec{\beta }}\right] }, \end{aligned}$$
(7)

where the denominator is the sum over all the response configurations \({\varvec{z}}\) such that \(z_+ = y_{i+}\) and where the individual intercept \(\alpha _i\) has been cancelled out. The log-likelihood function is

$$\begin{aligned} \ell ({\varvec{\beta }}) = \sum _{i=1}^n {\textbf{I}}(0<y_{i+}<T) \log p({\varvec{y}}_i \vert {\varvec{X}}_i, y_{i+}), \end{aligned}$$

where the indicator function \({\textbf{I}}(\cdot )\) is included to take into account that observations with total score \(y_{i+}\) equal to 0 or T do not contribute to the log-likelihood. The above function can be maximized with respect to \({\varvec{\beta }}\) by a Newton–Raphson algorithm using standard results on the regular exponential family (Barndorff-Nielsen 1978), so as to obtain the CML estimator \(\tilde{{\varvec{\beta }}}\), which is \(\sqrt{n}\)-consistent and asymptotically normal with fixed T (see Andersen 1970 and Chamberlain 1980 for details). Therefore, if plugged into the APE formulation (2) instead of the ML estimator \(\hat{{\varvec{\beta }}}\), the E component of the bias in (4) is removed since \(\tilde{{\varvec{\beta }}} {\mathop {\rightarrow }\limits ^{p}}{\varvec{\beta }}_0 \) as \(n \rightarrow \infty \).
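For small T, the conditional probability (7) can be computed by brute-force enumeration of the response configurations sharing the same total score, which also makes the sufficiency argument easy to verify numerically. A sketch under our own naming (exponential in T, so for illustration only):

```python
import math
from itertools import product

def conditional_prob(y, X, beta):
    """Conditional probability of Eq. (7): the configuration weight
    exp(sum_t y_t * x_t'beta), normalized over all binary z with z_+ = y_+.
    The individual intercept alpha_i has cancelled out."""
    def weight(z):
        lin = sum(zt * sum(b * xk for b, xk in zip(beta, xt))
                  for zt, xt in zip(z, X))
        return math.exp(lin)
    s = sum(y)
    denom = sum(weight(z) for z in product((0, 1), repeat=len(y)) if sum(z) == s)
    return weight(y) / denom

def joint_prob(y, X, alpha, beta):
    """Unconditional joint probability of y under the logit model (1)."""
    p = 1.0
    for yt, xt in zip(y, X):
        idx = alpha + sum(b * xk for b, xk in zip(beta, xt))
        p *= math.exp(yt * idx) / (1.0 + math.exp(idx))
    return p
```

Dividing `joint_prob` by its sum over all configurations with the same total score returns the same value for any \(\alpha _i\), which is precisely the sufficiency of \(y_{i+}\).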

In the next step, we obtain estimates of the individual intercepts \(\alpha _i\), which are not directly available as they have been cancelled out by conditioning on the total score. Our strategy is to obtain the ML estimates of \(\alpha _i\), denoted \({\hat{\alpha }}_i(\tilde{{\varvec{\beta }}})\), for those subjects such that \(0< y_{i+} < T\), by maximizing the individual term \(\sum _{t=1}^T \log p_{\tilde{\small {\varvec{\beta }}}}(y_{it} \vert {\varvec{x}}_{it}, \alpha _i)\), where \(p_{\tilde{\small {\varvec{\beta }}}}(y_{it} \vert {\varvec{x}}_{it}, \alpha _i)\) is the probability of the logit model defined in (1) evaluated at the CML estimate, namely at \({\varvec{\beta }}= \tilde{{\varvec{\beta }}}\). Like the ML estimator and the analytical and jackknife bias corrections, our proposal leads to an APE equal to zero for subjects whose response configurations are all 0s or all 1s, as the marginal effects are evaluated at the (non-finite) ML estimates of \(\alpha _i\). However, even if \({\varvec{\beta }}\) is fixed at some \(\sqrt{n}\)-consistent estimate, the bias of the ML estimator of \(\alpha _{i0}\) will still be of order \(O(T^{-1})\) because \({\hat{\alpha }}_i(\tilde{{\varvec{\beta }}}) {\mathop {\rightarrow }\limits ^{p}}\alpha _{i0}\) only as \(T \rightarrow \infty \). Stammann et al. (2016) and Bartolucci and Pigini (2019) consider such a plug-in estimator and confirm by simulation that this source of bias, although rather small for the static logit model, indeed shows up in finite samples. Moreover, Bartolucci and Pigini (2019) report that the bias is more severe for the dynamic logit model.Footnote 7 In Sect. 4, we show that correcting for the bias generated by the use of \({\hat{\alpha }}_i({\varvec{\beta }})\) instead of \(\alpha _{i0}\), denoted by D in (4), is necessary in finite samples, especially with short T.
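This second step can be sketched as a loop over units that solves each individual score equation by bisection at \({\varvec{\beta }}= \tilde{{\varvec{\beta }}}\), skipping the non-informative all-0 and all-1 configurations. A toy implementation under our own naming (a Newton update would serve equally well):

```python
import math

def _score(alpha, y, xb):
    # score of the individual log-likelihood in alpha: sum_t (y_it - p_it)
    return sum(yt - 1.0 / (1.0 + math.exp(-(alpha + v)))
               for yt, v in zip(y, xb))

def intercepts_at_cml(beta_tilde, Y, X, lo=-30.0, hi=30.0, iters=200):
    """ML estimates of alpha_i holding beta fixed at the CML estimate.
    Units with y_i+ equal to 0 or T get None (non-finite ML estimate,
    hence a zero contribution to the APE)."""
    alphas = []
    for y, X_i in zip(Y, X):
        s, T = sum(y), len(y)
        if s == 0 or s == T:
            alphas.append(None)
            continue
        xb = [sum(b * xk for b, xk in zip(beta_tilde, xt)) for xt in X_i]
        a_lo, a_hi = lo, hi   # the score is strictly decreasing in alpha
        for _ in range(iters):
            mid = 0.5 * (a_lo + a_hi)
            if _score(mid, y, xb) > 0:
                a_lo = mid
            else:
                a_hi = mid
        alphas.append(0.5 * (a_lo + a_hi))
    return alphas
```

Each returned \({\hat{\alpha }}_i(\tilde{{\varvec{\beta }}})\) still carries \(O(T^{-1})\) bias, which is what the correction in the final step targets.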

In the final step, the APEs are obtained by simply replacing the ML estimators in (2) with \(\tilde{{\varvec{\beta }}}\) and \({\hat{\alpha }}_i(\tilde{{\varvec{\beta }}})\) and reducing the bias from \(O(T^{-1})\) to \(O(T^{-2})\) by applying the bias correction proposed by Fernández-Val (2009), that is,

$$\begin{aligned} {\tilde{\mu }}_{k} = \frac{1}{nT}\sum _{i=1}^n \sum _{t=1}^T m_{itk}({\hat{\alpha }}_i(\tilde{{\varvec{\beta }}}), \tilde{{\varvec{\beta }}}, {\varvec{x}}_{it}) - {\tilde{D}}, \end{aligned}$$

where \({\tilde{D}}\) denotes the sample counterpart of (5) evaluated at \(\tilde{{\varvec{\beta }}}\) and \({\hat{\alpha }}_i(\tilde{{\varvec{\beta }}})\). It is worth stressing that the proposed estimator exhibits the same asymptotic properties as any feasible average effect estimator under rectangular array asymptotics, as outlined in Sect. 2, since the CML estimator is also proved to be \(\sqrt{nT}\)-consistent.Footnote 8

3.2.2 Standard errors

In order to derive an expression for the standard errors of the APEs \(\tilde{{\varvec{\mu }}} = ({\tilde{\mu }}_1, \ldots , {\tilde{\mu }}_K)'\), we need to account for the variability in \({\varvec{x}}_{it}\) and the use of the estimated parameters \(\tilde{{\varvec{\beta }}}\) in the first step. For the latter, we rely on the generalized method of moments (GMM) approach by Hansen (1982), also implemented by Bartolucci and Nigro (2012) for the quadratic exponential model. In particular, following Newey and McFadden (1994), we formulate the proposed multi-step procedure as the solution of the system of estimating equations

$$\begin{aligned} {\varvec{f}}({\varvec{\beta }}, {\varvec{\mu }}) = {\varvec{0}}, \end{aligned}$$

where

$$\begin{aligned} {\varvec{f}}({\varvec{\beta }}, {\varvec{\mu }})= & {} \sum _{i=1}^n {\varvec{f}}_i({\varvec{\beta }}, {\varvec{\mu }}),\nonumber \\ {\varvec{f}}_i({\varvec{\beta }},{\varvec{\mu }})= & {} \begin{pmatrix} {\varvec{\nabla }}_{\small {\varvec{\beta }}}\ell _i({\varvec{\beta }})\\ {\varvec{\nabla }}_{\mu _1}g_i({\varvec{\beta }},\mu _1)\\ \vdots \\ {\varvec{\nabla }}_{\mu _K}g_i({\varvec{\beta }},\mu _K)\\ \end{pmatrix}, \end{aligned}$$
(8)

and

$$\begin{aligned} g_i({\varvec{\beta }},\mu _k)=\frac{1}{T}\sum _{t=1}^T\left[ m_{itk}(\alpha _i({\varvec{\beta }}),{\varvec{\beta }}, {\varvec{x}}_{it})-\mu _k\right] ^2, \quad \, k=1,\ldots ,K. \end{aligned}$$

The asymptotic variance of \((\tilde{{\varvec{\beta }}}^{\prime },\tilde{{\varvec{\mu }}}^{\prime })^{\prime }\) is then

$$\begin{aligned} {\varvec{W}}(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}})= {\varvec{H}}(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}})^{-1}{\varvec{S}}(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}})[{\varvec{H}}(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}})^{-1}]^{\prime }, \end{aligned}$$
(9)

where

$$\begin{aligned} {\varvec{S}}(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}})= \sum _{i=1}^n {\varvec{f}}_i(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}}){\varvec{f}}_i(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}})^{\prime }. \end{aligned}$$

Moreover, we have that

$$\begin{aligned} {\varvec{H}}(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}})= \sum _{i=1}^n {\varvec{H}}_i(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}}), \end{aligned}$$

where

$$\begin{aligned} {\varvec{H}}_i({\varvec{\beta }},{\varvec{\mu }})= \left( \begin{array}{cc}{\varvec{\nabla }}_{\small {\varvec{\beta }}\small {\varvec{\beta }}} \, \ell _i({\varvec{\beta }}) &{} {\varvec{O}} \\ {\varvec{\nabla }}_{\small {\varvec{\mu }}\small {\varvec{\beta }}}\, {\varvec{g}}_i({\varvec{\beta }},{\varvec{\mu }}) &{} {\varvec{\nabla }}_{\small {\varvec{\mu }}\small {\varvec{\mu }}}\, {\varvec{g}}_i({\varvec{\beta }},{\varvec{\mu }}) \end{array}\right) \end{aligned}$$
(10)

is the derivative of \({\varvec{f}}_i({\varvec{\beta }},{\varvec{\mu }})\) with respect to \(({\varvec{\beta }},{\varvec{\mu }})\), with \({\varvec{O}}\) denoting a \(K \times K\) matrix of zeros, and where \({\varvec{g}}_i({\varvec{\beta }},{\varvec{\mu }})\) collects the \( g_i({\varvec{\beta }},\mu _k)\), for \(k = 1,\ldots ,K\). Expressions for the derivatives in (8) are

$$\begin{aligned} \nabla _{\small {\varvec{\beta }}} \ell _i({\varvec{\beta }}) = \sum _{t=1}^T y_{it}{\varvec{x}}_{it} - \sum _{\small {\varvec{z}}: z_+ = y_{i+}} \left( p({\varvec{z}} \vert {\varvec{X}}_i, y_{i+}) \sum _{t=1}^T z_t {\varvec{x}}_{it} \right) , \end{aligned}$$

and

$$\begin{aligned} \nabla _{\mu _k} g_i({\varvec{\beta }}, \mu _k) = - \frac{2}{T}\sum _{t=1}^T \left[ m_{itk}( \alpha _i({\varvec{\beta }}), {\varvec{\beta }}, {\varvec{x}}_{it}) - \mu _k \right] . \end{aligned}$$
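As an illustration, the score \(\nabla _{\small {\varvec{\beta }}} \ell _i({\varvec{\beta }})\) above can be evaluated by brute-force enumeration of the score-preserving configurations. The following numpy sketch mirrors the display term by term; the function and argument names are ours, and the enumeration is feasible only for moderate T.

```python
import itertools
import numpy as np

def conditional_score_i(beta, X_i, y_i):
    """Score of the conditional log-likelihood for unit i:
    sum_t y_it x_it minus its conditional expectation over all
    binary sequences z with the same total score y_i+."""
    T, _ = X_i.shape
    y_plus = int(np.sum(y_i))
    # all 0/1 sequences z with z_+ = y_i+
    configs = [np.array(z) for z in itertools.product((0, 1), repeat=T)
               if sum(z) == y_plus]
    # conditional probabilities p(z | X_i, y_i+), normalized over configs
    w = np.array([np.exp(z @ X_i @ beta) for z in configs])
    p = w / w.sum()
    expected = sum(pz * (z @ X_i) for pz, z in zip(p, configs))
    return y_i @ X_i - expected
```

At \({\varvec{\beta }} = {\varvec{0}}\) all configurations are equally likely, so the conditional expectation reduces to a simple average over the score-preserving sequences.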

The second derivatives in (10) are

$$\begin{aligned} \nabla _{\small {\varvec{\beta }}\small {\varvec{\beta }}} \ell _i({\varvec{\beta }}) = -\sum _{\small {\varvec{z}}: z_+ = y_{i+}}p({\varvec{z}} \vert {\varvec{X}}_i, y_{i+}) {\varvec{e}}({\varvec{z}}, {\varvec{X}}_i){\varvec{e}}({\varvec{z}}, {\varvec{X}}_i)', \end{aligned}$$

where

$$\begin{aligned} {\varvec{e}}({\varvec{z}}, {\varvec{X}}_i) = \sum ^T_{t=1} z_t {\varvec{x}}_{it} - \sum _{\small {\varvec{z}}: z_+ = y_{i+}}\left( p({\varvec{z}} \vert {\varvec{X}}_i, y_{i+}) \sum _{t=1}^T z_t {\varvec{x}}_{it} \right) , \end{aligned}$$

and \({\varvec{\nabla }}_{\small {\varvec{\mu }}\small {\varvec{\mu }}} \, {\varvec{g}}_i({\varvec{\beta }},{\varvec{\mu }})\) is a \(K \times K\) diagonal matrix with elements equal to 2. Finally, for the computation of the block \({\varvec{\nabla }}_{\small {\varvec{\mu }}\small {\varvec{\beta }}}\, {\varvec{g}}_i({\varvec{\beta }},{\varvec{\mu }})\), we rely on numerical differentiation. Once the matrix in (9) is computed, the standard errors for the APEs \(\tilde{{\varvec{\mu }}}\) may be obtained by taking the square root of the elements on the main diagonal of the lower-right submatrix of \({\varvec{W}}(\tilde{{\varvec{\beta }}},\tilde{{\varvec{\mu }}})\).
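The sandwich formula (9) and the numerical differentiation step can be sketched generically as follows; the inputs are hypothetical (an \(n \times \dim\) array stacking the per-unit estimating functions \({\varvec{f}}_i\) evaluated at the estimates, and the summed Jacobian \({\varvec{H}}\)), and the names are ours.

```python
import numpy as np

def numerical_jacobian(g, theta, eps=1e-6):
    """Central finite differences, as used for the block
    grad_{mu beta} g_i that is computed numerically in the text."""
    theta = np.asarray(theta, dtype=float)
    g0 = np.atleast_1d(g(theta))
    J = np.empty((g0.size, theta.size))
    for j in range(theta.size):
        tp, tm = theta.copy(), theta.copy()
        tp[j] += eps
        tm[j] -= eps
        J[:, j] = (np.atleast_1d(g(tp)) - np.atleast_1d(g(tm))) / (2 * eps)
    return J

def sandwich_variance(f_units, H):
    """Sandwich formula (9): W = H^{-1} S (H^{-1})' with
    S = sum_i f_i f_i' built from the stacked per-unit
    estimating functions (rows of f_units)."""
    S = f_units.T @ f_units
    Hinv = np.linalg.inv(H)
    return Hinv @ S @ Hinv.T
```

Standard errors are then the square roots of the diagonal elements of the relevant submatrix of the returned matrix.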

3.2.3 Dynamic logit model

The method proposed to obtain the APEs for the static logit model cannot be applied directly to the dynamic logit model (Hsiao 2005). In the latter case, the conditional probability of \(y_{it}\) is

$$\begin{aligned} p(y_{it}\vert {\varvec{x}}_{it}, y_{i,t-1}; \alpha _i, {\varvec{\beta }}, \gamma ) = \frac{\exp \left[ y_{it}(\alpha _i + {\varvec{x}}_{it}'{\varvec{\beta }}+ y_{i,t-1}\gamma )\right] }{1 + \exp (\alpha _i + {\varvec{x}}_{it}'{\varvec{\beta }}+ y_{i,t-1}\gamma )}, \end{aligned}$$
(11)

where \(\gamma \) is the regression coefficient for the lagged response variable that measures the true state dependence. Plugging the CML estimator of \({\varvec{\beta }}\) and \(\gamma \) in the APE formulation is not viable in this case because the total score is no longer a sufficient statistic for the incidental parameters if the lag of the dependent variable is included among the model covariates. Conditioning on sufficient statistics eliminates the incidental parameters only in the special case of \(T = 3\) and no other explanatory variables (Chamberlain 1985). Honoré and Kyriazidou (2000) extend this approach to include explanatory variables and the corresponding parameters can be estimated by CML on the basis of a weighted conditional log-likelihood. However, time effects cannot be included in the model specification, and the estimator’s rate of convergence to the true parameter value is slower than \(\sqrt{n}\).

More recently, Honoré and Weidner (2020) generalize the approach by Honoré and Kyriazidou (2000) to include any type of strictly exogenous covariate, providing a \(\sqrt{n}\)-consistent generalized method of moments (GMM) estimator based on moment conditions that are free of the incidental parameters. The practical viability of their approach, however, still has to be assessed: like GMM estimators in general, this estimator suffers from a considerable small-sample bias when built on a large number of moment conditions, and the number of conditions is shown to increase rapidly with the number of time occasions and of covariates. A different perspective is taken by Bartolucci and Nigro (2010) who, instead of the dynamic logit, consider a QE formulation (Cox 1972) to model dynamic binary panel data, which has the advantage of admitting sufficient statistics for the individual intercepts.

Bartolucci and Nigro (2012) propose a QE model that more closely approximates the dynamic logit model and whose parameters can easily be estimated by PCML. Under the approximating model, each \(y_{i+}\) is a sufficient statistic for the fixed effect \(\alpha _i\). By conditioning on the total score, the joint probability of \({\varvec{y}}_i\) becomes:

$$\begin{aligned} p^*({\varvec{y}}_i\vert {\varvec{X}}_i,y_{i0},y_{i+})=\frac{\exp \left( \sum _{t=1}^Ty_{it}{\varvec{x}}_{it}'{\varvec{\beta }}-\sum _{t=1}^T{\bar{q}}_{it}y_{i,t-1}\gamma + y_{i*}\gamma \right) }{\underset{\small {\varvec{z}}:z_+=y_{i+}}{\sum } \exp \left( \sum _{t=1}^Tz_t{\varvec{x}}_{it}'{\varvec{\beta }}-\sum _{t=1}^T{\bar{q}}_{it}z_{t-1}\gamma + z_{i*}\gamma \right) } \end{aligned}$$
(12)

where \(y_{i*} = \sum _{t=1}^T y_{i,t-1}y_{it}\) and \(z_{i*} = y_{i0}z_1 + \sum _{t>1} z_{t-1}z_t\). Moreover, \({\bar{q}}_{it}\) is a function of given values of \({\varvec{\beta }}\) and \(\alpha _i\), resulting from a first-order Taylor series expansion of the log-likelihood based on (11) around \({\varvec{\beta }}= \bar{{\varvec{\beta }}}\) and \(\alpha _i = {\bar{\alpha }}_i\), \(i = 1,\ldots ,n\), and \(\gamma = 0\) (see Bartolucci and Nigro 2012, for details). The expression for \({\bar{q}}_{it}\) is then

$$\begin{aligned} {\bar{q}}_{it} = \frac{\exp ({\bar{\alpha }}_i + {\varvec{x}}_{it}'\bar{{\varvec{\beta }}})}{1 + \exp ({\bar{\alpha }}_i + {\varvec{x}}_{it}'\bar{{\varvec{\beta }}}) }. \end{aligned}$$
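For small T, the conditional probability (12) can be computed by direct enumeration, exactly as for the static case. The sketch below follows the display term by term, with \(z_0\) set to \(y_{i0}\); the function and argument names are ours.

```python
import itertools
import numpy as np

def pstar_i(beta, gamma, qbar, X_i, y_i, y_i0):
    """Conditional probability (12) of the approximating QE model
    for one unit, enumerating all z with z_+ = y_i+.
    qbar holds the plug-in probabilities qbar_it."""
    T = len(y_i)

    def num(z):
        z_lag = np.concatenate(([y_i0], z[:-1]))   # (z_0, ..., z_{T-1})
        z_star = np.sum(z_lag * z)                 # y_i0 z_1 + sum_{t>1} z_{t-1} z_t
        return np.exp(z @ X_i @ beta
                      - gamma * np.sum(qbar * z_lag) + gamma * z_star)

    configs = [np.array(z) for z in itertools.product((0, 1), repeat=T)
               if sum(z) == int(np.sum(y_i))]
    return num(np.asarray(y_i)) / sum(num(z) for z in configs)
```

By construction, the probabilities sum to one over the score-preserving configurations, and with \(\gamma = 0\) the expression collapses to the static conditional logit probability.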

Expressions for the partial effects and APEs are derived in the same way as for the static logit model. Let \({\varvec{w}}_{it} = ({\varvec{x}}_{it}', y_{i,t-1})'\) collect the \(K+1\) model covariates. Based on (11), the partial effect of covariate \(w_{itk}\) for i at time t on the probability of \(y_{it} = 1\) can be written as:

$$\begin{aligned} m_{itk}(\alpha _i, {\varvec{\theta }}, {\varvec{w}}_{it}) = \left\{ \begin{array}{ll} p(y_{it} = 1 \vert {\varvec{w}}_{it}; \alpha _i, {\varvec{\beta }}, \gamma )\left[ 1 - p(y_{it} = 1 \vert {\varvec{w}}_{it}; \alpha _i, {\varvec{\beta }}, \gamma )\right] \beta _k, &{} w_{itk} \,\,\, \textrm{continuous}, \\ &{} \\ p(y_{it} = 1\vert {\varvec{w}}_{it,-k}, w_{itk} = 1; \alpha _i, {\varvec{\beta }}, \gamma ) - &{} \\ p(y_{it} = 1 \vert {\varvec{w}}_{it,-k}, w_{itk} = 0; \alpha _i, {\varvec{\beta }}, \gamma ),&{} w_{itk} \,\,\, \textrm{binary}, \\ \end{array} \right. \end{aligned}$$

where \({\varvec{w}}_{it, -k}\) denotes the vector \({\varvec{w}}_{it}\) excluding \(w_{itk}\), and \({\varvec{\theta }}= ({\varvec{\beta }}', \gamma )'\). This expression may also be used to compute the APE of the lagged response variable. Notice that this function does not depend on \(\bar{{\varvec{\beta }}}\), since the probability in (11) does not depend on \({\bar{q}}_{it}\). The APE of the kth covariate can then be obtained by taking the expected value of \(m_{itk}(\alpha _i, {\varvec{\theta }}, {\varvec{w}}_{it})\) with respect to \({\varvec{w}}_{it}\) and \(\alpha _{i0}\), and can be written as

$$\begin{aligned} \mu _{k0} = \int m_{itk}(\alpha _{i0}, {\varvec{\theta }}_0, {\varvec{w}}_{it})dG(\alpha _{i0}, {\varvec{w}}_{it}), \end{aligned}$$

where \(G(\alpha _{i0}, {\varvec{w}}_{it})\) denotes the joint distribution of \(\alpha _{i0}\) and \({\varvec{w}}_{it}\).
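The piecewise definition of \(m_{itk}\) above translates directly into code. The following sketch (names are ours; `binary` flags a binary covariate) covers both branches, with the lagged response treated as the last element of \({\varvec{w}}_{it}\).

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def partial_effect(alpha_i, beta, gamma, w_it, k, binary):
    """Partial effect m_itk under (11), with w_it = (x_it', y_{i,t-1})'
    and theta = (beta', gamma)'."""
    theta = np.append(beta, gamma)
    if not binary:
        # derivative of the logistic probability times the coefficient
        p = logistic(alpha_i + w_it @ theta)
        return p * (1.0 - p) * theta[k]
    # difference in probabilities with w_itk switched from 0 to 1
    w1, w0 = w_it.copy(), w_it.copy()
    w1[k], w0[k] = 1.0, 0.0
    return logistic(alpha_i + w1 @ theta) - logistic(alpha_i + w0 @ theta)
```

Setting \(k\) to the position of the lagged response in \({\varvec{w}}_{it}\) and `binary=True` yields the partial effect used to measure state dependence.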

As for the static logit model, the estimate of \(\mu _{k0}\) is based on those of the \(\alpha _i\), which we obtain in the same manner as in the first step described in Sect. 3.2.1. In addition, here the CML estimation of \({\varvec{\theta }}\) based on (12) relies on a preliminary step in order to obtain \({\bar{q}}_{it}\). In the first step, a preliminary estimate of \(\bar{{\varvec{\beta }}}\) is obtained by maximizing the conditional log-likelihood

$$\begin{aligned} \ell (\bar{{\varvec{\beta }}}) = \sum _{i=1}^n {\textbf{I}}(0<y_{i+}<T)\ell _i(\bar{{\varvec{\beta }}}), \end{aligned}$$

where

$$\begin{aligned} \ell _i(\bar{{\varvec{\beta }}}) = \log \frac{\exp \left[ \left( \sum _{t=1}^T y_{it}{\varvec{x}}_{it}\right) ' \bar{{\varvec{\beta }}} \right] }{\underset{\small {\varvec{z}}: z_+ = y_{i+}}{\sum }\exp \left[ \left( \sum _{t=1}^T z_t{\varvec{x}}_{it}\right) '\bar{{\varvec{\beta }}} \right] }, \end{aligned}$$

which is the same conditional log-likelihood as for the static logit model and may be maximized by a standard Newton–Raphson algorithm. We denote the resulting CML estimator by \(\check{{\varvec{\beta }}}\). The estimate \({\check{\alpha }}_i\) is then computed by maximizing the individual log-likelihood

$$\begin{aligned} \ell _i({\bar{\alpha }}_i) = \sum _{t=1}^T \log \frac{\exp \left[ y_{it}({\bar{\alpha }}_i + {\varvec{x}}_{it}' \check{{\varvec{\beta }}}) \right] }{1 + \exp ({\bar{\alpha }}_i + {\varvec{x}}_{it}' \check{{\varvec{\beta }}})},\end{aligned}$$

where \(\check{{\varvec{\beta }}}\) is fixed. The probability \({\bar{q}}_{it}\) in (12) is estimated as \(\check{q}_{it} = \exp ({\check{\alpha }}_i + {\varvec{x}}_{it}'\check{{\varvec{\beta }}})/\left[ 1 + \exp ({\check{\alpha }}_i + {\varvec{x}}_{it}'\check{{\varvec{\beta }}}) \right] \).

In the second step, we estimate \({\varvec{\theta }}\) by maximizing the conditional log-likelihood

$$\begin{aligned} \ell ({\varvec{\theta }}) = \sum _{i=1}^n {\textbf{I}}(0<y_{i+}<T) \log p^*_{\check{\small {\varvec{q}}}_i}({\varvec{y}}_i\vert {\varvec{X}}_i,y_{i0},y_{i+}), \end{aligned}$$

where \(p^*_{\check{\small {\varvec{q}}}_i}({\varvec{y}}_i\vert {\varvec{X}}_i,y_{i0},y_{i+})\) is the joint probability in (12) evaluated at \(\check{{\varvec{q}}}_i = (\check{q}_{i1}, \ldots , \check{q}_{iT})'\). The above function can easily be maximized with respect to \({\varvec{\theta }}\) by a Newton–Raphson algorithm, so as to obtain the PCML estimator \(\tilde{{\varvec{\theta }}}\), which is a \(\sqrt{n}\)-consistent estimator of \({\varvec{\theta }}_0\) only if \(\gamma _0 = 0\), representing the special case in which the QE model corresponds to the dynamic logit model.Footnote 9 Nonetheless, Bartolucci and Nigro (2012) show that the PCML estimator has a limited bias in finite samples even in the presence of non-negligible state dependence.

The next step consists of recovering the estimates of \(\alpha _i\), \({\tilde{\alpha }}_i(\tilde{{\varvec{\theta }}})\), by maximizing the concentrated log-likelihood evaluated at \(\tilde{{\varvec{\theta }}}\). In the final step, the APEs can then be estimated by plugging \({\tilde{\alpha }}_i(\tilde{{\varvec{\theta }}})\) and \(\tilde{{\varvec{\theta }}}\) in the APE formulation and applying the same correction shown in Sect. 3.2.1, so as to obtain

$$\begin{aligned} {\tilde{\mu }}_{k} = \frac{1}{nT}\sum _{i=1}^n \sum _{t=1}^T m_{itk}({\tilde{\alpha }}_i(\tilde{{\varvec{\theta }}}), \tilde{{\varvec{\theta }}}, {\varvec{w}}_{it}) - {\tilde{D}}. \end{aligned}$$

Standard errors for \({\tilde{\mu }}_k\) can be obtained exactly in the same way as illustrated in Sect. 3.2.2 with the appropriate change of notation.

4 Monte Carlo simulation study

In the following, we illustrate the design and report the main results of the simulation studies aimed at assessing the finite sample performance of the estimators of the APEs for the static and dynamic logit models. We also discuss the results of some robustness exercises, with the related tables reported in the Appendix.

4.1 Simulation set-up

We generate data for the logit model according to the formulation proposed by Honoré and Kyriazidou (2000), that is, for \( i = 1,\ldots ,n\),

$$\begin{aligned} y_{it}= & {} {\textbf{I}}(\alpha _i + y_{i,t-1}\gamma + x_{it}\beta + \varepsilon _{it} > 0), \quad t = 1,\ldots ,T, \end{aligned}$$
(13)
$$\begin{aligned} y_{i0}= & {} {\textbf{I}}(\alpha _i + x_{i0}\beta + \varepsilon _{i0} > 0), \end{aligned}$$
(14)

where \(x_{it} \sim N(0, \pi ^2/3)\) and \(\varepsilon _{it}\) follows a standard logistic distribution for \(t = 0, \ldots , T\).

Based on the above design, we generate data from the static and dynamic logit models as follows. For the static logit model, data are generated under assumption (13) with \(\gamma = 0\) and \(\beta = 1\), for \(t=1, \ldots , T\); here the individual intercepts are given by \(\alpha _i = \sum _{t = 1}^{4} x_{it} / 4 \). For the dynamic logit model, data are generated for \(t=0,\ldots ,T\) also using (14), with \(\gamma \) in (13) taking the values 0.25, 0.5, and 0.75, \(\beta = 1\), and the individual heterogeneity generated as \(\alpha _i = \sum _{t = 0}^{3} x_{it} / 4\). We consider the same scenarios for both the static and dynamic logit models, corresponding to \(n = 100, 500\) and \(T = 4, 8, 12\), and the number of Monte Carlo replications is 1000.
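The dynamic data-generating process of (13)-(14) can be reproduced in a few lines of numpy; the sketch below uses our own function name and a seed argument for replicability, with \(x_{it} \sim N(0, \pi ^2/3)\), standard logistic errors, and \(\alpha _i = \sum _{t=0}^{3} x_{it}/4\) as in the dynamic design.

```python
import numpy as np

def simulate_dynamic_logit(n, T, gamma, beta=1.0, seed=0):
    """Generate data from (13)-(14): columns t = 0,...,T."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.pi / np.sqrt(3), size=(n, T + 1))
    alpha = x[:, :4].mean(axis=1)          # alpha_i = sum_{t=0}^{3} x_it / 4
    eps = rng.logistic(size=(n, T + 1))    # standard logistic errors
    y = np.empty((n, T + 1), dtype=int)
    y[:, 0] = (alpha + x[:, 0] * beta + eps[:, 0] > 0).astype(int)  # (14)
    for t in range(1, T + 1):
        y[:, t] = (alpha + y[:, t - 1] * gamma + x[:, t] * beta
                   + eps[:, t] > 0).astype(int)                      # (13)
    return y, x, alpha
```

Setting `gamma=0` and dropping the initial-condition column recovers the static design, up to the definition of the intercepts.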

For the static logit model, we compare the finite sample performance of the proposed APE estimator (denoted by CML-BC) with: (a) the ML plug-in estimator (ML); (b) Hahn and Newey (2004)’s jackknife bias corrected estimator (Jackknife-BC); (c) the ML estimator with the analytical bias correction (Analytical-BC) provided by Fernández-Val (2009), also mentioned in the previous section.

For the dynamic logit model, we compare the finite sample performance of the proposed APE estimator (PCML-BC) with: (a) the ML plug-in estimator; (b) Dhaene and Jochmans (2015)’s half-panel jackknife bias-corrected estimator (Jackknife-BC); (c) the analytically bias-corrected estimator (Analytical-BC) by Fernández-Val (2009). It must be noted that the half-panel Jackknife-BC estimator cannot be computed for \(T=4\).

For each scenario, we report the mean and the median of the ratio \({\tilde{\mu }}/\mu ^*\), the standard deviation of \({\tilde{\mu }}\), the rejection frequencies at the \(5\%\) and \(10\%\) nominal levels of a t-test for the true value of the APE, and the mean ratio between the estimator standard error and the standard deviation.Footnote 10

4.2 Main results

Table 1 reports the simulation results for the static logit model. The proposed estimator (CML-BC) has a good finite sample performance with both small n and T. By contrast, the Jackknife-BC and the Analytical-BC exhibit a sizable bias when \(T = 4\) and produce unreliable coverage intervals. Indeed, the simulation results for the ML estimator, especially with \(T \ge 8\), suggest that the bias correction is unnecessary in this case. It therefore emerges here, as well as from the results reported by Fernández-Val (2009), that for the APEs of static models the sources of bias are negligible not only asymptotically, as discussed in Sect. 2, but also in finite samples.

Table 1 Simulation results for \({\tilde{\mu }}\), static logit model

Bias corrections are bound to be more relevant for the dynamic logit, as ML is known to produce a severely biased estimator of the state dependence parameter in auto-regressive formulations for both linear and nonlinear models (Heckman 1981; Nickell 1981). Tables 2, 3 and 4 report the results for the partial effect relative to the state dependence parameter, \(\mu _y\). The simulation results of the APEs for the covariate, denoted \(\mu _x\), are reported in Tables 8, 9 and 10 in Appendix A.1.

Results confirm that the bias of plug-in APEs based on ML estimates is more severe than in the static case, especially for the partial effect of the state dependence parameter. While all bias corrections produce a remarkable improvement over ML, the proposed estimator outperforms the Analytical-BC and Jackknife-BC. With \(\gamma = 0.25\), the advantage is noticeable across all the scenarios considered. Furthermore, the proposed methodology seems to provide the most reliable confidence intervals among the examined estimators; in this regard, it is worth noticing that when \(T = 4\) all the estimators provide poor coverage. As for the APE of the covariate, \(\mu _x\), the PCML-BC exhibits a better performance with \(T=4\), whereas all the bias-correction strategies have comparable performance with larger values of T. These results suggest that using a fixed-T consistent estimator, even though for an approximating model, offers an advantage when the bias is sizable, as removing only the \(O(T^{-1})\) component may not provide enough of a reduction.

As discussed in Section 3.2.3, while the PCML estimator is consistent only when \(\gamma = 0\), Bartolucci and Nigro (2012) show that its finite-sample bias is limited even in presence of non-negligible state dependence. The results in Tables 2, 3 and 4 confirm that the PCML-BC APE estimator has the same behaviour: although the advantage over the alternative approaches is reduced with \(\gamma = 0.5\) and \(\gamma = 0.75\), the proposed approach remains the most effective strategy to correct APEs for the state dependence parameter, especially with short T.

Table 2 Simulation results for \({\tilde{\mu }}_y\), dynamic logit model, \(\gamma = 0.25\)
Table 3 Simulation results for \({\tilde{\mu }}_y\), dynamic logit model, \(\gamma = 0.5\)
Table 4 Simulation results for \({\tilde{\mu }}_y\), dynamic logit model, \(\gamma = 0.75\)

It is finally worth recalling that, besides plugging fixed-T (pseudo) consistent estimators of the slope parameters into the APEs, the good finite-sample performance of the proposed approach also benefits from the correction aimed at removing the bias originating from the ML estimate of the unobserved heterogeneity, denoted by D in (4). In order to illustrate the indirect bias introduced by \({\hat{\alpha }}_i({\varvec{\beta }})\), we report in Table 5 the results of a simulation exercise where we compare the performance of the proposed approach, (P)CML-BC, with that of an APE estimator based on (P)CML estimates that ignores the bias correction. The latter approach is employed by Stammann et al. (2016) and was considered in an earlier version of this work by Bartolucci and Pigini (2019). It clearly emerges that a bias reduction is needed for both the static and dynamic logit model, especially with \(T=4\). Besides, these results also highlight that plugging in fixed-T consistent estimators already brings a sizable improvement over the ML-based APEs.

Table 5 Simulation results for the static and dynamic logit, comparison between (P)CML and (P)CML-BC

4.3 Robustness exercises

In this section, we illustrate the design and discuss the results of a series of robustness exercises aimed to test the performance of the proposed approach under potentially problematic departures from the baseline setting. The results are collected in Appendix A.2.

The first set of exercises concerns the violation of standard hypotheses that are necessary for the asymptotic results on the bias correction of the ML estimator to hold. It is worth recalling that this correction is based on a large-T asymptotic expansion, which theoretically rules out the possibility of including time dummies in the model specification since, with large-T asymptotics, time fixed effects are incidental parameters as well.Footnote 11 We devise an experiment where data are generated from the dynamic logit model defined in (13)-(14), with the addition of a trending regressor \(\eta _t = -1 + \frac{2(t-1)}{T-1}\) in \([-1,1]\), and the model specification then includes time effects as well. The results for \(T = 4, 8\) are reported in Table 11 and suggest overall robustness of the approaches considered to the inclusion of time dummies, as also documented by Fernández-Val (2009).

Large-T asymptotics also requires stationarity of covariates, which may be rather restrictive in applications. Yet both Fernández-Val (2009) and Dhaene and Jochmans (2015) report similar performance of the approaches under departures from this assumption. We confirm the same behaviour for the proposed approach by means of a simulation exercise where data are generated from a logit model with a trending regressor. The design is based on the one adopted by Hahn and Newey (2004), where

$$\begin{aligned} y_{it} = {\textbf{I}}(\alpha _i + x_{it}\beta + \varepsilon _{it} > 0), \quad i = 1,\ldots ,n,\,\, t = 1,\ldots ,T, \end{aligned}$$

with \(\beta = 1\), \(\alpha _i \sim N(0,1)\), the error terms \(\varepsilon _{it}\) follow a standard logistic distribution, and

$$\begin{aligned} x_{it} = t/10 + x_{i,t-1}/2 + u_{it}, \end{aligned}$$

with \(u_{it} \sim U\left[ -0.5, 0.5\right] \) and \(x_{i0} = u_{i0}\). According to the results reported in Table 12, there are no remarkable differences with respect to the baseline scenario.Footnote 12
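The trending-regressor process can be generated as follows; the function name and the choice to discard the initial observation are ours.

```python
import numpy as np

def simulate_trending_x(n, T, seed=0):
    """x_it = t/10 + x_{i,t-1}/2 + u_it with u_it ~ U[-0.5, 0.5]
    and x_i0 = u_i0, as in the robustness design above."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-0.5, 0.5, size=(n, T + 1))
    x = np.empty((n, T + 1))
    x[:, 0] = u[:, 0]                      # x_i0 = u_i0
    for t in range(1, T + 1):
        x[:, t] = t / 10 + x[:, t - 1] / 2 + u[:, t]
    return x[:, 1:]                        # keep t = 1,...,T
```

The deterministic trend makes the covariate nonstationary, which is precisely the departure from the large-T assumptions being probed.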

The second set of exercises explores the finite-sample performance of the proposed approach with different settings for the unobserved heterogeneity. We first consider the design based on assumptions (13)-(14) with individual intercepts generated as \(\alpha _i = \sum _{t=0}^3 x_{it}/4 + (u_i -1)\), with \(u_i \sim \chi ^2_1\). The corresponding results are reported in Table 13 for the scenarios with \(T = 4,8\). As expected with fixed-effects approaches, results are unaffected by the distribution of the unobserved heterogeneity.

We then consider a shift in the distribution of the individual effects and generate them as \(\alpha _i = \sum _{t=0}^3 x_{it}/4 - 3.5\). In this way, we generate samples where the frequency of 1s amounts to about \(12\%\), whereas in the baseline setting it was around \(53\%\), thus simulating a response variable describing rare events, which is often realistic in applications. Results are reported in Table 14 and, as expected, document a deterioration in the performance of all the approaches considered. This is due to the limited availability of useful response configurations, which generates a larger small-sample bias in the parameter estimates and increases the number of instances in which the PE must be set to zero.

5 Empirical application

We apply the proposed formulation to the problem of estimating the labour supply of married women. The same empirical application is considered by Fernández-Val (2009) and Dhaene and Jochmans (2015), after the seminal work of Hyslop (1999). The sample, drawn from the Panel Study of Income Dynamics (PSID), consists of \(n = 1{,}908\) married women between 19 and 59 years of age in 1980, followed for \(T = 6\) time occasions, from 1980 to 1985, plus an additional observation in 1979 used as the initial condition in the dynamic models. We specify a static logit model for the probability of being employed at time t, conditional on the number of children of a certain age in the family, namely the number of kids between 0 and 2 years old, between 3 and 5, and between 6 and 17, and on the husband's income. We also specify a dynamic logit model, that is, we include lagged participation in the set of model covariates.

Table 6 Female labour force participation: static logit model
Table 7 Female labour force participation: dynamic logit model

The estimation results for the static logit model are reported in Table 6, which shows the ML, Hahn and Newey (2004)'s panel Jackknife-BC, Fernández-Val (2009)'s Analytical-BC, and CML estimates of the model parameters. The CML, Analytical-BC, and Jackknife-BC estimates of the parameters are all similar to each other and smaller (in absolute value) than the uncorrected ML ones. They suggest a negative effect on labour participation of having children younger than 17 in the household, as well as of the level of the husband's income. The estimated APEs obtained with the proposed method suggest that having an additional child between 0 and 2 years old reduces the probability of working by 8.9 percentage points, while having a child between 3 and 5 years old reduces the employment probability by 6.1 percentage points. The APE estimates obtained with the Analytical-BC and Jackknife-BC estimators point toward the same results, with the exception of having children between 6 and 17 years old, which appears not to be statistically significant according to our procedure.

Table 7 reports the results for the dynamic logit specification. Here, we report the ML, Dhaene and Jochmans (2015)’s half-panel Jackknife-BC, Fernández-Val (2009)’s Analytical-BC, and PCML estimates of the model parameters. The effect of the exogenous model covariates is now smaller and all the APE estimates suggest a negative and statistically significant effect of having children between 0 and 5 years old in the household.

The PCML estimator detects a strong state dependence in the labour force participation of married women, as the estimated coefficient for lagged participation amounts to 1.706. In terms of APE, this is translated into an increase of 15.7 percentage points in the probability of being employed at time t for a woman who was working in \(t-1\), with respect to a woman who was not working in \(t-1\).

6 Conclusion

We develop a multiple-step procedure to compute APEs for fixed-effects logit models that are estimated by CML. Our strategy amounts to building a plug-in APE estimator based on the fixed-T consistent CML estimator of the slope parameters and bias-corrected estimates of APEs.

The proposed estimator is asymptotically equivalent to the plug-in ML and to alternative bias-corrected APE estimators, and it exhibits a comparable finite sample performance for the static logit model. By contrast, for the dynamic logit model the proposed approach has a remarkable advantage in finite samples. In this respect, the multiple-step procedure developed here could be particularly useful for practitioners, who often deal with short-T datasets, such as rotated surveys, and/or highly unbalanced panels.