1 Introduction

Fixed effects (FE) methods for panel data (models with observation unit–specific fixed effects) are widely applied in sociology and provide several advantages over cross-sectional methods, as different contributions have shown (e.g., Allison 2009; Brüderl and Ludwig 2015). However, among applied researchers, some confusion and misconceptions about what FE can and cannot do persist. Additionally, it seems that FE models are sometimes used without reflection. In a recent article, Hill et al. (2019) warn of the limitations of fixed effects analyses for panel data and discuss critical deficiencies of fixed effects models that, in their view, are not addressed appropriately in many studies that use FE models. Furthermore, the authors argue that in many cases researchers use fixed effects methods without discussing the limitations of these methods. However, the authors do not propose solutions to the potential issues or alternative estimation methods, which would be especially important because most of the critiques of FE models also apply to most other methods. Thus, the authors do not specify when it is appropriate to use FE models and when it is appropriate to use another method instead. In this article, we address this question by discussing the advantages and disadvantages of FE models and comparing them to simple ordinary least squares (OLS) models. We therefore take the critique of Hill et al. (2019) as a starting point for a broader discussion of the pros and cons of fixed effects estimations.

We would first like to clarify that, throughout the article, we contrast linear FE models with classical pooled OLS (POLS) models, as POLS models provide the most understandable and natural benchmarks against which to assess linear FE models. Furthermore, a growing number of scholars (e.g., Angrist and Pischke 2010; Breen et al. 2018; Mood 2010) recommend using linear models even for binary dependent variables because these models can provide unbiased and consistent estimates of average effects (Wooldridge 2010, p. 563). Thus, we examine linear FE models herein. However, most characteristics of linear FE models should also apply to other versions of fixed effects estimations (e.g., FE probit or logit) because the rationale of using fixed effects in general is independent of the specific version one has in mind.

In the next section, we start by describing what linear fixed effects methods do and which conditions are necessary for consistency compared to POLS. We also discuss the limitations and concerns that should be considered when using FE models. Overall, we argue that fixed effects estimations are likely to outperform classical cross-sectional models (e.g., OLS models) and panel models (POLS models, random effects models) in most cases.

2 What Do Fixed Effects Models Do?

To begin, we clarify what linear FE models actually do. Assume a classical linear regression (POLS) model for panel data of the following form:

$$y_{it}=x_{it}\beta +z_{i}\gamma +u_{it}+c_{i}$$

where \(x_{it}\) represents time-varying covariates, \(z_{i}\) represents time-constant covariates, \(u_{it}\) is a time-varying idiosyncratic error term, and \(c_{i}\) is the individual-specific, time-constant error term. Assume that we are interested in the effect of \(x_{it}\) on \(y_{it}\). To obtain an unbiased estimate of \(\beta\), we require the following relatively strong assumption (Wooldridge 2010, p. 257):

$$E\left(c_{i}|x_{it} \right)=E\left(c_{i}\right)=0$$

Thus, individual-specific time-varying covariates have to be uncorrelated with the time-constant error term. For example, assume that we are interested in the effect of childbirth on maternal wages, i.e., the motherhood penalty. In this case, childbirth is likely correlated with typically unobserved and time-constant characteristics, such as individual attitudes and preferences (Budig and England 2001). Thus, any estimation of \(\beta\) is likely biased and inconsistent.

Introducing fixed effects by demeaning solves this problem:

$$y_{it}-\overline{y}_{i}=\left(x_{it}-\overline{x}_{i}\right)\beta +\left(z_{i}-\overline{z}_{i}\right)\gamma +\left(u_{it}-\overline{u}_{i}\right)+\left(c_{i}-\overline{c}_{i}\right)$$
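
To make the transformation explicit, the following short derivation (a sketch in the notation above, with \(\overline{u}_{i}\) introduced here for the individual mean of the idiosyncratic error) averages the model over time for each individual and subtracts the result from the original equation:

$$\overline{y}_{i}=\overline{x}_{i}\beta +z_{i}\gamma +\overline{u}_{i}+c_{i}$$

$$y_{it}-\overline{y}_{i}=\left(x_{it}-\overline{x}_{i}\right)\beta +\left(u_{it}-\overline{u}_{i}\right)$$

Because \(z_{i}\) and \(c_{i}\) do not vary over time, they equal their own individual means and cancel, so neither \(\gamma\) nor \(c_{i}\) appears in the transformed equation.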

By removing individual-specific means from both sides of the equation, we are able to relax the strong exogeneity assumption that is required for consistent estimation via POLS models. For consistency, FE models require the following assumption to hold:

$$E\left(u_{it}|x_{it},c_{i} \right)=E\left(u_{it}\right)=0$$

That is, time-varying covariates must not correlate with the time-varying error term. This is arguably a much weaker assumption than the exogeneity assumption required for POLS models. Additionally, since \(u_{it}\) is also part of the POLS models, this assumption also must hold for consistency in the case of POLS models. For example, if education is correlated with individual ability, the estimation results of FE models will not be biased as long as ability is time-constant.

Thus, the main benefit of fixed effects estimations is that the potential sources of bias are limited in comparison to classical OLS models. In the case of OLS models, any unobserved variable that is correlated with both the outcome and the treatment variable of interest results in a biased estimate of the treatment effect. In contrast, FE models limit the sources of bias to time-varying variables that correlate with the treatment as well as with the outcome over time. In most applications, this condition is far more achievable than the strong exogeneity assumption of OLS models. For an overview of FE models, see Brüderl and Ludwig (2015).
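
The contrast between the two estimators can be illustrated with a small simulation. The following is a minimal sketch under assumed values: the panel size, the true coefficient, and the strength of the correlation between \(x_{it}\) and \(c_{i}\) are hypothetical, and the helper names `slope` and `within` are introduced here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, beta = 2000, 5, 1.0                 # hypothetical panel size and true effect

c = rng.normal(size=(n, 1))               # time-constant unobserved heterogeneity c_i
x = 0.8 * c + rng.normal(size=(n, t))     # x_it is correlated with c_i
u = rng.normal(size=(n, t))               # idiosyncratic error u_it
y = beta * x + c + u                      # no z_i in this sketch


def slope(x, y):
    """Bivariate OLS slope of y on x (with intercept) over all observations."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())


def within(v):
    """Subtract individual-specific means (the FE transformation)."""
    return v - v.mean(axis=1, keepdims=True)


beta_pols = slope(x, y)                   # pooled OLS ignores c_i
beta_fe = slope(within(x), within(y))     # FE uses within-individual variation only
print(f"POLS: {beta_pols:.3f}, FE: {beta_fe:.3f}, true beta: {beta}")
```

Under these assumptions, the POLS slope absorbs the correlation between \(x_{it}\) and \(c_{i}\) (its probability limit here is \(1+0.8/1.64\approx 1.49\)), whereas the FE slope should be close to the true value of 1.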

3 Potential Pitfalls and Limitations of Fixed Effects Estimations

In this section, we point to several important limitations of fixed effects estimations. As previously described, FE estimations are generally more credible than simple OLS regression results, but they are not without drawbacks. For each limitation, we first describe the potential issue regarding FE estimations and then contrast it with POLS models.

3.1 Unobserved Heterogeneity

In many applications, there are good reasons to still be concerned about unobserved time-varying heterogeneity. For example, some studies estimate the impact of wages on life satisfaction using fixed effects (e.g., Ferrer-i-Carbonell and Frijters 2004). In this case, there could still be omitted variables, such as chronic diseases (which we assume are partly time-varying, e.g., when one individual gets sick during the panel period), that affect both life satisfaction and wages. In such cases, FE models do not provide estimates of causal effects.

Furthermore, FE only provides unbiased estimates if treated and nontreated individuals do not differ with respect to the trend in the outcome of interest over time (Vaisey and Miles 2017). One example of this kind of unobserved heterogeneity is the marital wage premium for men. Men with better career prospects, i.e., men on steeper wage trajectories, tend to marry at higher rates because they are attractive partners. This pattern leads to an overestimation of the marital wage premium even in FE models (Ludwig and Brüderl 2018). Fixed effects individual slope (FEIS) models provide a solution to this problem by detrending the data (Brüderl and Ludwig 2015, p. 337).
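
The role of heterogeneous trends can be illustrated with a small simulation. The sketch below assumes a hypothetical balanced panel in which individuals on steeper outcome trajectories are more likely to be treated. The detrending step projects out an individual constant and an individual linear time trend, which is the idea behind FEIS; it is a simplified stand-in and not the implementation used in the cited studies.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, beta = 2000, 8, 1.0                      # hypothetical panel size and true effect
time = np.arange(t, dtype=float)

a = rng.normal(scale=0.3, size=n)              # individual-specific slopes of the outcome
c = rng.normal(size=n)                         # individual-specific levels
# Individuals on steeper trajectories are more likely to be treated from wave 4 onward.
treated = (a + rng.normal(scale=0.3, size=n)) > 0
x = (treated[:, None] & (time[None, :] >= 4)).astype(float)
y = beta * x + c[:, None] + a[:, None] * time[None, :] + rng.normal(size=(n, t))


def slope(x, y):
    """OLS slope without intercept (inputs are already centered per individual)."""
    return float((x * y).sum() / (x ** 2).sum())


def within(v):
    """Subtract individual-specific means (standard FE transformation)."""
    return v - v.mean(axis=1, keepdims=True)


# Residual-maker for an individual constant and linear trend; the same matrix
# applies to every individual because the panel is balanced.
z = np.column_stack([np.ones(t), time])
m = np.eye(t) - z @ np.linalg.solve(z.T @ z, z.T)

beta_fe = slope(within(x), within(y))          # biased by heterogeneous trends
beta_feis = slope(x @ m, y @ m)                # individual trends removed
print(f"FE: {beta_fe:.3f}, detrended FE: {beta_feis:.3f}, true beta: {beta}")
```

In this setup the standard FE estimate should overstate the effect, while the detrended estimate should be close to the true value, at the cost of using less variation.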

However, we would also like to point out that no standard model (OLS models, logit models, probit models, Poisson models, etc.) is exempt from biases induced by unobserved heterogeneity (in general), especially time-varying unobserved heterogeneity, when using panel data. As shown in the previous section, POLS models also require the FE-consistency assumption in addition to the assumption regarding time-constant unobserved heterogeneity. Thus, if FE model results are biased, then POLS model results are also biased. Therefore, we would still argue that FE models are more robust to biases than other models because linear FE models still provide consistent estimation results when the conditional mean is correctly specified, while other models, such as logit or probit models, also require distributional assumptions to be met and require an absence of omitted variables for consistency. Thus, in many cases linear FE models may not estimate a causal effect but are more robust to biases than other models are.

3.2 Group Differences

The price of the considerably weaker exogeneity assumption of FE models compared to that of OLS models is that the former cannot estimate the coefficients of time-constant covariates at the same observation level as the fixed effects, which is an important drawback of FE models. For example, this means that one cannot include a gender dummy variable in FE estimations with individual-specific fixed effects because gender is time constant and is thus removed from the estimation by the FE transformation. Thus, FE models are generally not suited for estimating absolute group differences such as gender wage gaps. However, we would also like to point out that one can still examine the interaction between group dummies and time-varying variables in FE models to estimate differences in the coefficients of covariates by group. For example, one can easily examine the interaction of gender and labor market experience in an FE model where the outcome is pay to determine whether there is a difference in the return on experience by gender; however, one cannot estimate the level difference in pay between the groups with FE models.
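
The following sketch illustrates both points with simulated data. All values (the level gap, the returns to experience, the panel size) and all variable names are hypothetical. The demeaned gender dummy is identically zero, so its coefficient cannot be estimated, whereas the interaction of gender with experience retains within-individual variation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, t = 2000, 6                                             # hypothetical panel size
female = rng.integers(0, 2, size=(n, 1)).astype(float)     # time-constant group dummy
exper = rng.uniform(0, 5, size=(n, 1)) + np.arange(t)      # experience grows by one per wave
c = rng.normal(size=(n, 1))                                # time-constant heterogeneity

# Assumed data-generating process: level gap of -0.3, return to experience
# of 0.05 for men and 0.03 for women (interaction coefficient -0.02).
logwage = (2.0 - 0.3 * female + 0.05 * exper - 0.02 * female * exper
           + c + rng.normal(scale=0.1, size=(n, t)))


def within(v):
    """Subtract individual-specific means (the FE transformation)."""
    return v - v.mean(axis=1, keepdims=True)


# The dummy has no within-individual variation, so its demeaned version is zero.
female_panel = np.broadcast_to(female, (n, t))
max_within_female = float(np.abs(within(female_panel)).max())

# The main effect of experience and its interaction with gender remain estimable.
X = np.column_stack([within(exper).ravel(), within(female * exper).ravel()])
coef, *_ = np.linalg.lstsq(X, within(logwage).ravel(), rcond=None)
print(f"max |demeaned dummy|: {max_within_female}")
print(f"return to experience: {coef[0]:.3f}, difference for women: {coef[1]:.3f}")
```

The level gap of -0.3 cannot be recovered from the transformed data, but the difference in the return to experience can.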

3.3 Classical Measurement Error

Classical (i.e., random) measurement error can be more problematic in FE estimations than in other models (Angrist and Pischke 2009, p. 225). Consider an estimation where the outcome is the wage of an individual A in a two-wave panel. Now consider that the true wage of individual A is €2550 in both waves, but due to measurement error, individual A reports a wage of €2500 in wave 1 and €2600 in wave 2. In this case, FE models are more sensitive to measurement error than OLS models because all of the variation that FE models use is within-individual variation over time and thus, in this case, stems only from measurement error. In comparison, OLS models also use time-constant variation, which largely negates the problem of measurement errors of this type because the overall wage level is mostly reported validly. This drawback makes FE models more prone to classical measurement error in such cases, which can lead to attenuation bias, i.e., coefficient estimates being biased toward 0, particularly when the mismeasured variable enters as a covariate (e.g., Swaffield 2001). Practically, this means that FE models provide relatively conservative coefficient estimates and might not detect effects that actually exist.
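
A small simulation can make the attenuation argument concrete. The sketch below assumes a covariate that varies mostly between individuals and is observed with classical measurement error; all variances and the panel size are hypothetical. To isolate the measurement error problem, \(c_{i}\) is assumed to be uncorrelated with the covariate, so POLS is not confounded here.

```python
import numpy as np

rng = np.random.default_rng(3)
n, t, beta = 5000, 4, 1.0                      # hypothetical panel size and true effect

# True covariate: between variance 1.0, within variance 0.1 (assumed values).
x_true = rng.normal(size=(n, 1)) + rng.normal(scale=np.sqrt(0.1), size=(n, t))
c = rng.normal(size=(n, 1))                    # uncorrelated with x in this sketch
y = beta * x_true + c + rng.normal(size=(n, t))

# Observed covariate with classical measurement error (variance 0.1).
x_obs = x_true + rng.normal(scale=np.sqrt(0.1), size=(n, t))


def slope(x, y):
    """Bivariate OLS slope of y on x (with intercept) over all observations."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())


def within(v):
    """Subtract individual-specific means (the FE transformation)."""
    return v - v.mean(axis=1, keepdims=True)


beta_pols = slope(x_obs, y)                    # signal share 1.1 / 1.2
beta_fe = slope(within(x_obs), within(y))      # signal share 0.1 / 0.2
print(f"POLS: {beta_pols:.3f}, FE: {beta_fe:.3f}, true beta: {beta}")
```

With these assumed variances, the share of true variation in the observed covariate is about 0.92 overall but only about 0.5 within individuals, so the FE estimate should be attenuated far more strongly than the POLS estimate.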

However, FE models are also more robust to certain types of measurement error problems than POLS models. For example, in wage regressions, wage is typically measured as gross wages. However, some people may mix up gross wages and net wages. This is a kind of measurement error that is nested within respondents; that is, people either get this question right or wrong. Thus, the measurement error varies systematically between individuals and is therefore captured by individual fixed effects. Throwing away between-individual variation and only using within-individual variation for estimations overcomes this kind of measurement error and reduces this potential source of bias. Therefore, for any given case, it is unclear whether measurement error leads to more severe biases in OLS or FE models. Nevertheless, researchers should be aware of the limitations of both estimation strategies.

3.4 Reverse Causality

Another problem that FE estimations are prone to is reverse causality (Vaisey and Miles 2017), and this problem could be widespread. Consider, for example, a regression of health on unemployment, i.e., we are interested in the effect of unemployment on individual health, using a yearly panel. In this case, using a simple FE model, it is unclear whether we actually obtain the causal effect of unemployment on health or if health instead affects the propensity to be employed. We know that both variables change between the survey years, but we do not necessarily know which one drives the effect. In the specific question of unemployment and health, there is some literature addressing this problem (e.g., Krug and Eberl 2018). However, reverse causality is a widespread problem for many research questions, and it is rarely discussed in practical applications. We would also like to point out that this problem also affects OLS estimations but could be more severe in FE estimations because the latter solely rely on intertemporal variation. There are methods that can be used to overcome reverse causality problems, such as dynamic panel estimation methods (e.g., Arellano and Bond 1991), cross-lagged panel models with fixed effects (Allison et al. 2017), or variants of event history models (e.g., Blossfeld 2001). For a comprehensive overview of this problem in panel models and potential solutions, we recommend Leszczensky and Wolbring (2019).

3.5 External Validity

Researchers using fixed effects regressions should at least be aware of issues regarding external validity, i.e., the generalizability of the estimation results. In this context, it is important to keep in mind that FE models identify effects solely based on within-individual changes, whereas POLS models also use between-individual variation (Bell and Jones 2015). Thus, FE estimates could be driven by selective groups. For example, an estimation of the wage effects of education using a fixed effects model with a general population survey will identify the monetary returns on education based on individuals who change their educational degree in adulthood but will not draw on anyone who has no variation in the schooling variable in adulthood (Hill et al. 2019). The group of individuals who attain higher educational degrees in adulthood could thus be very selective, and the effect identified is unlikely to represent the overall effect of education on wages.

In this context, we would also like to point to the literature on treatment effects (e.g., Angrist and Pischke 2009; Gangl 2010), which differentiates between the average treatment effect (ATE) and the average treatment effect on the treated (ATT). If the exogeneity assumptions of the respective estimators are met, OLS models identify the ATE, which is the generalizable treatment effect. In contrast, when the exogeneity assumption is met, FE models identify the ATT, which is the treatment effect for those who select into the treatment. In this case, the effect is not generalizable to the general population. However, identifying the ATE is much more difficult than identifying the ATT due to the strong exogeneity assumption. Arguably, it is more useful to identify an unbiased ATT than a spurious ATE. Even causal effects, such as effects obtained with an instrumental variable estimator, can be transferred to the general population only in very special cases because they identify a local average treatment effect (LATE). In our view, it is difficult to identify any treatment effects that are arguably generalizable. Even lab experiments can be problematic in terms of external validity; it is sometimes unclear whether findings from a lab experiment can be generalized to the population. The setting in an experiment when investigating the use of incentives, for example, may differ largely from real-life situations in which incentives could matter. Thus, a local but causal effect is often more useful than a biased generalized effect. Nevertheless, one should be aware of the kind of effect that FE models identify and keep in mind that when FE estimates and estimates obtained via other methods differ, it does not necessarily imply that the estimates from other methods are biased; they could simply identify a different effect.
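
The difference between the two estimands can be illustrated with simulated heterogeneous effects. The sketch below is built on hypothetical assumptions: individuals who select into the treatment have an effect of 2, all others would have an effect of 1, treatment starts in wave 3, and there are no common time effects. Individuals who are never treated have no within-individual variation in the treatment and therefore do not contribute to the FE estimate.

```python
import numpy as np

rng = np.random.default_rng(4)
n, t = 4000, 6                                       # hypothetical panel size
selects = rng.random(n) < 0.5                        # individuals who take up the treatment
tau = np.where(selects, 2.0, 1.0)                    # assumed individual treatment effects
x = (selects[:, None] & (np.arange(t)[None, :] >= 3)).astype(float)
c = rng.normal(size=(n, 1)) + selects[:, None]       # selection on time-constant traits
y = tau[:, None] * x + c + rng.normal(size=(n, t))


def slope(x, y):
    """Bivariate OLS slope of y on x (with intercept) over all observations."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())


def within(v):
    """Subtract individual-specific means (the FE transformation)."""
    return v - v.mean(axis=1, keepdims=True)


ate = float(tau.mean())                              # average effect in the whole sample
att = float(tau[selects].mean())                     # average effect among the treated
beta_fe = slope(within(x), within(y))
print(f"ATE: {ate:.2f}, ATT: {att:.2f}, FE estimate: {beta_fe:.3f}")
```

Under these assumptions, the FE estimate should be close to the effect among the treated rather than to the sample-wide average effect.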

However, we would also like to point out that there are cases in which the comparison of POLS and FE estimates seems reasonable. For example, Budig and England (2001) investigate the motherhood wage penalty using FE models and uncover biases in previous studies using only OLS models because most women tend to become mothers at some point in time, which likely makes the sample used in the FE estimation less selective than in other applications.

3.6 Large Standard Errors

A valid point raised by critics of FE analyses (e.g., Hill et al. 2019; Longhi and Nandi 2014) is that FE analyses have lower statistical power than POLS analyses because the former use only within-individual changes over time, and thus effectively fewer cases, to estimate coefficients. This concern is certainly valid, and it is true that FE makes type II errors (i.e., a false null hypothesis is not rejected) more likely (Allison 2009, p. 9). The consequence is that FE models are more conservative, which in our view is generally preferable. In this sense, FE models reduce the likelihood of false-positive findings (type I errors) that stem from confounded between-individual comparisons.
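
The loss of precision can be illustrated with a small Monte Carlo sketch. It assumes a covariate that varies mostly between individuals and a \(c_{i}\) that is uncorrelated with the covariate, so that both estimators are consistent and differ only in their sampling variability; the panel size, variances, and number of replications are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n, t, beta, reps = 200, 5, 1.0, 500            # hypothetical design


def slope(x, y):
    """Bivariate OLS slope of y on x (with intercept) over all observations."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())


def within(v):
    """Subtract individual-specific means (the FE transformation)."""
    return v - v.mean(axis=1, keepdims=True)


pols, fe = [], []
for _ in range(reps):
    # Between variance 1.0, within variance 0.1 (assumed values).
    x = rng.normal(size=(n, 1)) + rng.normal(scale=np.sqrt(0.1), size=(n, t))
    c = rng.normal(scale=0.5, size=(n, 1))     # uncorrelated with x in this sketch
    y = beta * x + c + rng.normal(size=(n, t))
    pols.append(slope(x, y))
    fe.append(slope(within(x), within(y)))

sd_pols, sd_fe = float(np.std(pols)), float(np.std(fe))
print(f"mean POLS: {np.mean(pols):.3f}, mean FE: {np.mean(fe):.3f}")
print(f"sd POLS: {sd_pols:.3f}, sd FE: {sd_fe:.3f}")
```

Both estimators should be centered on the true value in this design, but the FE estimates should scatter considerably more across replications because they discard the between-individual variation.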

At this point, we would also urge researchers not to make statements on whether there is an effect or not based on statistical significance only. In any case, one should also consider the size of coefficients. If the effect size is relatively large but the effect is statistically nonsignificant, one should be cautious about stating that there is no effect. This concern applies to analyses using small sample sizes in general and is not limited to FE models. Furthermore, the problem of low statistical power becomes less worrisome with the rise of long-running panel data such as the PSID, SOEP, and BHPS; such data provide a large amount of within-individual variation over time.

3.7 Fixed Effects as a Black Box

One point of concern is that individual time-constant heterogeneity is essentially a black box (e.g., Angrist and Pischke 2009, p. 243; Hill et al. 2019). In other words, we do not know what information and which biases are eliminated. In most cases, it is unclear which characteristics are actually captured with fixed effects models. For example, FE models might account for a large degree of variation in health status, e.g., for variation that is genetically determined, but we do not know which part of lifestyle choices related to health are captured by the model, as lifestyle may be relatively time constant. Thus, this is an issue to consider when using FE models.

However, compared to POLS models, this is an advantage of FE models because they do not necessarily require defining the black box. In contrast, in a POLS model, one has to define what is in the black box (in terms of omitted variables) to rule out omitted variable bias. Assuming that personality traits are largely time-constant but contain a certain share of time-variability, it is still better to account for a large share of potential biases instead of none at all. In some cases, it is quite clear what is definitively ruled out by fixed effects. For example, establishment fixed effects should capture sector effects, region effects, etc., which we often want to account for in any case.

3.8 Lagged Dependent Variables and State Dependence

In some cases, researchers could be interested in state dependence, i.e., the effect of the lagged outcome \(y_{it-1}\) on the outcome in the current period \(y_{it}\). This is relevant when one is, for example, interested in the stickiness of unemployment, i.e., whether unemployment in the previous period in itself, net of other control variables, determines the propensity to be unemployed in the current period. The problem is that, in this case, researchers cannot simply use the lagged outcome as a control variable because it is mechanically correlated with the transformed error term, which renders the estimation of the coefficient inconsistent. Dynamic panel data models (e.g., Arellano and Bond 1991; Blundell and Bond 1998) provide a solution to this problem by instrumenting lagged dependent variables with earlier lags or differences of the outcome (e.g., by instrumenting \(y_{it-1}\) with \(y_{it-2}-y_{it-3}\)), thus removing the mechanical correlation with the error term. However, this problem is not exclusive to panel FE models but also applies when using POLS models with lagged dependent variables.
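
The inconsistency of FE with a lagged outcome can be illustrated with a short simulation. The sketch below assumes a hypothetical first-order autoregressive process with individual effects and a short panel. As a simplified stand-in for the dynamic panel estimators cited above, it uses a basic first-difference instrumental variable estimator that instruments the lagged difference with the second lag of the outcome in levels; it is not the GMM estimator of Arellano and Bond (1991) or Blundell and Bond (1998).

```python
import numpy as np

rng = np.random.default_rng(6)
n, t, rho, burn = 10000, 6, 0.5, 50            # hypothetical design, true persistence rho

c = rng.normal(scale=0.5, size=n)              # time-constant individual effect
y = np.zeros((n, t + burn))
y[:, 0] = c / (1 - rho) + rng.normal(size=n)
for s in range(1, t + burn):
    y[:, s] = rho * y[:, s - 1] + c + rng.normal(size=n)
y = y[:, burn:]                                # keep the last t waves after burn-in


def within(v):
    """Subtract individual-specific means (the FE transformation)."""
    return v - v.mean(axis=1, keepdims=True)


# FE regression of y_it on y_it-1: the demeaned lag correlates with the demeaned error.
y_cur, y_lag = y[:, 1:], y[:, :-1]
rho_fe = float((within(y_lag) * within(y_cur)).sum() / (within(y_lag) ** 2).sum())

# First-difference IV: regress the current difference on the lagged difference,
# using the level two periods back as the instrument.
dy = np.diff(y, axis=1)                        # dy[:, j] = y[:, j + 1] - y[:, j]
dy_cur, dy_lag, z = dy[:, 1:], dy[:, :-1], y[:, :-2]
rho_iv = float((z * dy_cur).sum() / (z * dy_lag).sum())
print(f"FE: {rho_fe:.3f}, first-difference IV: {rho_iv:.3f}, true rho: {rho}")
```

With only a few waves, the FE estimate of the persistence parameter should be clearly biased downward, while the instrumented estimate should be close to the true value in a sample of this size.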

4 Conclusion

The use of fixed effects models in sociology is still increasing, and we wholeheartedly welcome this trend. However, there are still misconceptions and confusion among researchers regarding what fixed effects models can and cannot do. This article aims to provide a comprehensive and intuitive overview of this estimation method.

As we show in this paper, comparing FE models to common OLS models, FE models are still preferable in most cases. Therefore, we still encourage scientists to use FE models if they are appropriate for answering a specific research question and if panel data are available. However, we would like to urge researchers to be aware of exactly which effects these models identify and the limitations of these models. In general, we recommend using FE models, especially if (1) time-constant unobserved heterogeneity is likely to be a problem (e.g., concerning selection into the treatment), (2) one is not interested in societal group level differences, (3) time-varying unobserved heterogeneity is unlikely to pose a problem, and (4) the direction of the causal effect is theoretically clear (i.e., if there is likely no problem with reverse causality). In cases where there are still confounding factors, FE models will not provide causal estimates of the effect of interest. Regarding reverse causality, Leszczensky and Wolbring (2019, p. 21 f.) conclude that FE models are “hardly a silver bullet if causal inference is threatened by reverse causality.”

Furthermore, we would like to end on a cautious note. In most cases, it is not correct to interpret the coefficients of any standard OLS, FE, or comparable model as causal effects; rather, they should be interpreted as partial correlations, net of certain covariates. To be credible, the identification of causal effects requires an exogenous shock, as in an experiment, in most cases. In economics, this notion has led to the credibility revolution (Angrist and Pischke 2010), and quasi-experimental designs are now common in this discipline, using methods such as difference-in-differences estimations, instrumental variable regressions, and regression discontinuity designs (see, e.g., Gangl 2010 and Legewie 2012 for overviews in the field of sociology). We believe that the credible identification of causal effects requires this gold standard of identification. Even in these cases, researchers should still be aware of the limitations of their study, the validity of the instruments they use to identify causal effects, and the kinds of effects they identify (especially when using natural experiments).

While one should acknowledge that in most cases FE estimations do not identify truly causal effects, we still deem these models quite valuable. There are many topics in which there are simply no quasi-experimental settings, and it is better to have estimations of partial correlations that are taken with a grain of salt than no research on these topics at all. This article suggests that in most cases, FE models are preferable to OLS models.