Testing the binomial fixed effects logit model, with an application to female labour supply

Regression models for proportions are frequently encountered in applied work. The conditional expectation function is bounded between 0 and 1 and therefore must be nonlinear, requiring nonstandard panel data extensions. One possible approach is the binomial panel logit model with fixed effects (Machado in J Econom 119:73–98, 2004). We propose a new and simple implementation of this conditional maximum likelihood estimator for standard software. We investigate the properties of the estimator under misspecification and derive a new test for overdispersion. Estimator and test are applied in a study of contracted working volumes, measured as proportion of full-time work, for women in Switzerland.

and Wooldridge (2008) a correlated random effects probit quasi-likelihood estimator, and Ramalho et al. (2016) a class of exponential GMM estimators.
And yet, proportions and related types of data are regularly encountered in applied econometric work. They sometimes correspond to the fraction of "successes" in a sequence of Bernoulli trials. Examples are the proportion of successful patent applications (Machado 2004) and the proportion of days absent from work (Barmby et al. 2001). Similarly, variety scores (e.g. the number of applicable items in a general health questionnaire), bounded count data, as well as ratings, can be re-scaled to the [0, 1] interval. For example, the degree of (customer or life) satisfaction in surveys often has a lower bound of zero (meaning "completely dissatisfied") and some upper bound (e.g. 10, meaning "completely satisfied"), which can then be re-coded as 100%. All these variables share the key features of being discrete and bounded, and the binomial model with a logit function for the expected proportion provides a natural starting point for modelling.
For the fixed effects setting, Machado (2004) shows that the incidental parameters problem can be overcome by a conditional maximum likelihood (CML) estimator, much as is the case for the binary response logit model (Chamberlain 1980). She also provides Monte Carlo evidence indicating that the dummy variables (DV) approach is subject to an upward bias that is decreasing both in the length of the panel, T, and in the number of Bernoulli trials, K. For T > 5 and K > 5, the CML and DV approaches yield quite comparable results with minor bias (Machado 2004).
This paper advances the earlier literature in three directions: First, we show how the binomial logit fixed effects estimator can be implemented in any off-the-shelf statistical software that includes a conditional logit routine, using the idea of cloning, or data expansion. Second, we study the properties of the CML and DV estimators for the case where the binomial distributional assumption fails. The leading example is that of overdispersion, originating from random unobserved heterogeneity or dependence among the Bernoulli trials. The CML estimator is not a pseudo-ML estimator in the sense of Gourieroux et al. (1984), and it does not possess formal robustness properties. We therefore investigate the extent of bias in a series of simulation experiments. Third and finally, we derive and implement a new test for the binomial assumption, i.e. a test for the hypothesis of no dispersion, as existing tests (e.g. Dean 1992) cannot be applied because the fixed effects are not estimated by the CML estimator.
To illustrate the proposed methods, we conduct a study of the determinants of women's work behaviour in Switzerland. The outcome variable is the contracted work-time percentage, where 0 means no work and 1 means full-time work. Data are extracted from the Swiss Household Panel for the years 2012-2016. The binomial logit estimates indicate that having children is associated with substantially reduced work-time percentage, ceteris paribus. Perhaps more surprisingly, having a partner makes the effect more pronounced, whereas speaking French reduces it.

Model and estimation
A proper panel model for proportions $y_{it} \in [0, 1]$ must overcome two challenges. First, the model should respect the restricted support of the outcome and be able to handle data clustering at the end points. For instance, the log-odds transformation $\log[y_{it}/(1 - y_{it})]$ is not defined for $y_{it} = 0$ or $y_{it} = 1$. Another method facing the same limitation is beta regression, which is flexible for fitting continuous proportional data but cannot assign positive probability to the boundary values. Second, direct control for unobserved time-invariant individual heterogeneity (that may or may not be correlated with the regressors) using a dummy for each cross-sectional unit is subject to the incidental parameters problem, leading to inconsistent estimation of structural parameters when the length of the panel $T$ is fixed. Machado (2004) addresses these two issues by considering a binomial logit model with fixed effects. The application she had in mind used information on the number of patent applications and patents granted at the firm level to estimate the probability of obtaining a patent (i.e. the proportion of patents granted). She derived a consistent conditional maximum likelihood estimator based on the following assumptions:

Assumption 1 Let $Y_{it} = K y_{it}$, where $K$ is a known integer and

$$Y_{it} \mid x_{it}, \alpha_i \sim \text{Binomial}(K, p_{it}). \tag{1}$$

Here, $K$ is the number of "trials", $Y_{it} = K y_{it}$ is the "number of successes", and $y_{it}$ is the proportion, or fraction of successes, for observation unit $i$ in period $t$.

Assumption 2 Let the expected proportion depend on covariates $x_{it}$ and an individual-specific effect $\alpha_i$ as follows:

$$E(y_{it} \mid x_{it}, \alpha_i) = p_{it} = \Lambda(x_{it}'\beta + \alpha_i) = \frac{\exp(x_{it}'\beta + \alpha_i)}{1 + \exp(x_{it}'\beta + \alpha_i)}, \tag{2}$$

where $x_{it}$ and $\alpha_i$ can be correlated.

Assumption 3 Observations are independent between individuals and, conditional on group effects $\alpha_i$, serially uncorrelated.

The objective of the analysis is estimation of $\beta$. Under Assumptions 1-3, the joint binomial density for $Y_{i1}, Y_{i2}, \ldots, Y_{iT}$ conditional on $\sum_t Y_{it}$ is given by (see Machado 2004)

$$P\Big(Y_{i1}, \ldots, Y_{iT} \,\Big|\, \sum_t Y_{it}\Big) = \frac{\prod_t \binom{K}{Y_{it}} \exp\big(\sum_t Y_{it}\, x_{it}'\beta\big)}{\sum_{q \in Q_i} \prod_t \binom{K}{q_t} \exp\big(\sum_t q_t\, x_{it}'\beta\big)}, \tag{3}$$

where $Q_i = \{(q_1, q_2, \ldots, q_T) \mid q_t \in \{0, 1, 2, \ldots, K\}, \sum_t q_t = \sum_t Y_{it}\}$. The conditional binomial approach eliminates the fixed effects $\alpha_i$, which appear in the numerator and denominator with the same power. Observations for which $\sum_t Y_{it} = 0$ or $\sum_t Y_{it} = KT$ have a conditional probability of 1 and do not contribute to estimation of $\beta$. For proportion data, such outcomes tend to be much less prevalent than they are for binary outcomes.
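The cancellation of the fixed effect can be checked numerically. The following is a minimal sketch, assuming a single scalar regressor and a brute-force enumeration of the set Q_i (practical only for small K and T). The function names are hypothetical and not taken from Machado (2004). It computes the conditional probability once from a formula that involves only β, and once as a ratio of joint binomial probabilities evaluated at an arbitrary value of α_i.

```python
from itertools import product
from math import comb, exp


def logistic(v):
    return 1.0 / (1.0 + exp(-v))


def same_total(Y, K):
    """All count vectors q in {0..K}^T with the same total as Y (the set Q_i)."""
    return [q for q in product(range(K + 1), repeat=len(Y)) if sum(q) == sum(Y)]


def conditional_prob(Y, x, beta, K):
    """P(Y_1..Y_T | sum_t Y_t), written without any reference to alpha."""
    def weight(q):
        w = exp(beta * sum(qt * xt for qt, xt in zip(q, x)))
        for qt in q:
            w *= comb(K, qt)
        return w
    return weight(Y) / sum(weight(q) for q in same_total(Y, K))


def joint_prob(Y, x, beta, alpha, K):
    """Unconditional joint binomial probability; this one depends on alpha."""
    p = 1.0
    for Yt, xt in zip(Y, x):
        lam = logistic(beta * xt + alpha)
        p *= comb(K, Yt) * lam ** Yt * (1.0 - lam) ** (K - Yt)
    return p


def conditional_via_joint(Y, x, beta, alpha, K):
    """Same conditional probability, obtained as a ratio of joint probabilities."""
    denom = sum(joint_prob(q, x, beta, alpha, K) for q in same_total(Y, K))
    return joint_prob(Y, x, beta, alpha, K) / denom


Y = (1, 2, 0)            # successes in T = 3 periods, K = 2 trials each
x = (0.3, -0.5, 0.8)     # illustrative regressor values
direct = conditional_prob(Y, x, 2.0, 2)
via_a = conditional_via_joint(Y, x, 2.0, -1.3, 2)
via_b = conditional_via_joint(Y, x, 2.0, 0.7, 2)
```

Both routes give the same number for any value of α_i, which is the sense in which conditioning on the total removes the incidental parameter.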
In principle, the Machado (2004) approach solves an important problem in the analysis of panel data for proportions. In contrast to Papke and Wooldridge (2008), it is "semi-parametric", as there is no need to specify the relationship between the individual effect and the regressors, and also no need to add an assumption on the distribution of the individual effects. And yet, subsequent applications have been few, perhaps, because the estimator has a couple of limitations. First, the estimator is not readily available in standard econometric software packages. We therefore develop a simple modification that makes it easily implementable in standard software. And second, the binomial assumption may be violated, and the properties of the estimator under misspecification are unknown so far. We provide such a misspecification analysis in Sect. 2.3, and also derive a test for the binomial assumption in Sect. 3. In addition, it is important to point out that the binomial fixed effects estimator can be applied in a broader range of situations than hitherto considered, i.e. beyond those relating to the number (or proportion) of successes in a sequence of K independent Bernoulli trials. Even in the absence of such a process, the model can be a good starting point for fractions and shares, as we illustrate in an application to work-time percentages.

An alternative implementation
To understand how the binomial logit fixed effects estimator can be implemented using any off-the-shelf statistical software with a conditional logit routine, note that the binomial distribution arises as the sum of K independent Bernoulli trials. Therefore, two estimators are equivalent: one based on a binomial log-likelihood function and the other based on a Bernoulli log-likelihood for an expanded dataset.
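For the expanded dataset, one simply generates a sequence of $K$ copies of each observation $(i, t)$, keeping the regressors unchanged, where the proportion $y_{it}$ is replaced by a sequence of 0/1 indicator variables $d_{ijt}$, $j = 1, \ldots, K$, in arbitrary order such that

$$\sum_{j=1}^{K} d_{ijt} = K y_{it} = Y_{it}. \tag{4}$$

It follows that $d_{ijt}$ and $y_{it}$ have the same CEF:

$$E(d_{ijt} \mid x_{it}, \alpha_i) = E(y_{it} \mid x_{it}, \alpha_i) = \Lambda(x_{it}'\beta + \alpha_i). \tag{5}$$

The logit (Bernoulli) log-likelihood function of the expanded dataset is given by

$$\ell(\beta, \alpha) = \sum_i \sum_t \sum_j \big[ d_{ijt} \log \Lambda_{it} + (1 - d_{ijt}) \log(1 - \Lambda_{it}) \big], \qquad \Lambda_{it} = \Lambda(x_{it}'\beta + \alpha_i). \tag{6}$$

This log-likelihood function is equal to the binomial log-likelihood as well as to the Bernoulli quasi-log-likelihood (Papke and Wooldridge 1996, replacing $Y_{it}$ by $y_{it}$ and $(K - Y_{it})$ by $(1 - y_{it})$), up to an additive constant, and the three ML estimators are therefore identical. Similarly, the conditional density function for individual $i$ can be written as:

$$P\Big(d_{i11}, \ldots, d_{iKT} \,\Big|\, \sum_t \sum_j d_{ijt}\Big) = \frac{\exp\big(\sum_t \sum_j d_{ijt}\, x_{it}'\beta\big)}{\sum_{s \in S_i} \exp\big(\sum_t \sum_j s_{ijt}\, x_{it}'\beta\big)}, \tag{7}$$

where $S_i = \{s \in \{0, 1\}^{KT} \mid \sum_t \sum_j s_{ijt} = \sum_t \sum_j d_{ijt}\}$. Compared with Eq. (3), the number of $s$ such that $\sum_j s_{ijt} = q_{it}$ is $\binom{K}{q_{it}}$ in each period $t$, for given $q$. Equation (7) is therefore basically the same as Eq. (3), except for the term $\prod_t \binom{K}{Y_{it}}$ in the numerator of (3). But this term does not depend on any parameter and thus does not affect the first-order condition for the maximum of the log-likelihood function. Specifically, the conditional Bernoulli log-likelihood function is given by:

$$\ell_c(\beta) = \sum_i \Big[ \sum_t \sum_j d_{ijt}\, x_{it}'\beta - \log \sum_{s \in S_i} \exp\Big(\sum_t \sum_j s_{ijt}\, x_{it}'\beta\Big) \Big], \tag{8}$$

with the first derivative

$$\frac{\partial \ell_c(\beta)}{\partial \beta} = \sum_i \Big[ \sum_t \sum_j d_{ijt}\, x_{it} - \frac{\sum_{s \in S_i} \big(\sum_t \sum_j s_{ijt}\, x_{it}\big) \exp\big(\sum_t \sum_j s_{ijt}\, x_{it}'\beta\big)}{\sum_{s \in S_i} \exp\big(\sum_t \sum_j s_{ijt}\, x_{it}'\beta\big)} \Big], \tag{9}$$

which is the same as that of the conditional binomial model and therefore will yield the same consistent estimator of $\beta$, after elimination of the fixed effects. From now on, we refer to this estimator as the Blogit (for binomial logit) conditional maximum likelihood estimator, or in short, Blogit CML, in contrast to the inconsistent binomial estimator with dummy variables included for each cross-sectional unit, Blogit DV. Of course, there can be situations where the expansion approach becomes practically infeasible: as $K$ gets large, for instance because proportions are measured at the granularity of percentage points, the number of binary observations per individual entering the conditional Bernoulli log-likelihood increases from $T$ to $100 \times T$, at which point one may run into computational constraints.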
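The cloning step and the claimed equivalence can be illustrated with a short sketch. It assumes a scalar regressor and evaluates both conditional log-likelihoods by brute-force enumeration for a single individual, so it is only meant for tiny K and T. The helper names (`expand`, `cond_loglik_binomial`, `cond_loglik_bernoulli`) are hypothetical. In applied work the expanded rows would instead be passed to a conditional logit routine.

```python
from itertools import product
from math import comb, exp, log, prod


def expand(cells, K):
    """Clone each (i, t, x, Y) cell into K binary rows (i, t, j, x, d)."""
    rows = []
    for i, t, x, Y in cells:
        for j in range(K):
            # The first Y clones are coded as successes; the order is arbitrary.
            rows.append((i, t, j, x, 1 if j < Y else 0))
    return rows


def cond_loglik_binomial(Y, x, beta, K):
    """Conditional binomial log-likelihood of one individual (enumeration)."""
    def weight(q):
        return prod(comb(K, qt) for qt in q) * exp(
            beta * sum(a * b for a, b in zip(q, x)))
    Q = [q for q in product(range(K + 1), repeat=len(Y)) if sum(q) == sum(Y)]
    return log(weight(Y) / sum(weight(q) for q in Q))


def cond_loglik_bernoulli(rows, beta):
    """Conditional logit log-likelihood of one individual's expanded rows."""
    d = [r[4] for r in rows]
    x = [r[3] for r in rows]
    S = [s for s in product((0, 1), repeat=len(d)) if sum(s) == sum(d)]
    num = exp(beta * sum(a * b for a, b in zip(d, x)))
    den = sum(exp(beta * sum(a * b for a, b in zip(s, x))) for s in S)
    return log(num / den)


K = 2
Y = (1, 2, 0)                      # successes per period
x = (0.3, -0.5, 0.8)               # illustrative regressor values
rows = expand([(0, t, x[t], Y[t]) for t in range(3)], K)

# Difference between the two log-likelihoods at two values of beta.
gaps = [cond_loglik_binomial(Y, x, b, K) - cond_loglik_bernoulli(rows, b)
        for b in (0.5, 2.0)]
```

The gap between the two log-likelihoods is the same at every β (here the log of the product of binomial coefficients), so both are maximized at the same value of β.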

Overdispersion
Departures from the binomial proportions model can take a number of forms. The first one is a violation of the independence assumption for the underlying Bernoulli trials. Positive dependence, or contagion, among the sequence of Bernoulli trials causes overdispersion, a conditional variance exceeding the binomial variance $K p_{it}(1 - p_{it})$. Another violation stems from "random unobserved heterogeneity". This is in addition to the time-invariant unobserved heterogeneity $\alpha_i$. Random unobserved heterogeneity is time- and individual-specific, as well as unrelated to $x_{it}$. Specifically, it means that $p_{it}$ is no longer a constant but rather a random variable, say $\tilde{p}_{it}$. Marginalizing over $\tilde{p}_{it}$ then leads to a mixture model that is characterized by overdispersion as well. Depending on the distribution of $\tilde{p}_{it}$, proportions can, for example, have a U-shaped probability function even conditional on $\alpha_i$ and $x_{it}$, i.e. probability mass stacked at the endpoints of 0 and 1, which is never the case for a binomial distribution, which has either a single mode or two adjacent modes.
A prominent example of a continuous mixture is the beta-binomial model, where the conditional probability is

$$\tilde{p}_{it} \sim \text{Beta}\big(\phi \Lambda_{it},\; \phi (1 - \Lambda_{it})\big), \tag{10}$$

and where $\phi > 0$ is a parameter that determines the degree of overdispersion. It is straightforward to show that a beta-binomial distribution with this parameterization has expectation $K \Lambda_{it}$ and variance

$$\text{Var}(Y_{it} \mid x_{it}, \alpha_i) = K \Lambda_{it} (1 - \Lambda_{it}) \Big[ 1 + \frac{K - 1}{\phi + 1} \Big]. \tag{11}$$

Thus, the variance of the beta-binomial model is proportional to that of the binomial model. Overdispersion increases in $K$, the number of trials, and it decreases in the parameter $\phi$. The binomial variance is obtained for $K = 1$, or in the limit, for $\phi \to \infty$, which also means that $\text{Var}(\tilde{p}_{it}) \to 0$.
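The variance inflation factor 1 + (K − 1)/(φ + 1) can be verified by direct summation over the beta-binomial probability function. This is a small sketch under the parameterization with beta parameters φp and φ(1 − p). The function names are illustrative.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(k, K, p, phi):
    """P(Y = k) when Y | p_tilde ~ Binomial(K, p_tilde) and
    p_tilde ~ Beta(phi * p, phi * (1 - p))."""
    a, b = phi * p, phi * (1.0 - p)
    return comb(K, k) * exp(log_beta(k + a, K - k + b) - log_beta(a, b))


def moments(K, p, phi):
    """Mean and variance obtained by summing over the support 0..K."""
    pmf = [betabinom_pmf(k, K, p, phi) for k in range(K + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var


K, p, phi = 10, 0.3, 4.0
mean, var = moments(K, p, phi)
binomial_var = K * p * (1.0 - p)
inflation = 1.0 + (K - 1) / (phi + 1.0)   # variance inflation factor
```

For these illustrative values, the mean equals Kp and the variance equals the binomial variance times the inflation factor, up to floating-point error.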
In general, fixed effects conditional maximum likelihood estimators are not consistent if the underlying model is misspecified. The reason is that the first-order condition is not a moment condition for the mean, but rather a function of the conditional probabilities. However, it might still be the case that the CML estimator works satisfactorily as long as the degree of overdispersion, in other words, the departure from the binomial assumption, is not too large. We will explore this type of robustness in a series of simulation experiments. We thereby extend results by Machado (2004), who considered the severity of the incidental parameters problem and the small sample properties of the CML estimator under the maintained assumption of a correctly specified binomial model. In our simulations, this assumption is dropped.

Simulation study
The simulation experiments employ two different data generating processes: one where the binomial assumption is satisfied, and the other, based on the beta-binomial model, where overdispersion is present. Unobserved time-invariant individual heterogeneity is positively correlated with the regressor in both cases. The degree of overdispersion is varied from 10 to 200%.
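Both set-ups use the same logit conditional expectation function with a single regressor,

$$E(y_{it} \mid x_{it}, \alpha_i) = \Lambda(\beta_0 + \beta_1 x_{it} + \alpha_i), \tag{12}$$

where $\beta_0 = 0$, $\beta_1 = 2$ and the size of the cross section is either $N = 100$ or $N = 500$. The time dimension takes the values $T = 2$, $T = 5$ and $T = 10$.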
The regressor $x_{it}$ is drawn from a uniform distribution with support $[-1, 1]$ and therefore has a mean of 0 and a variance of 1/3. Draws are independent both across individuals and over time. We make a correlated random effects assumption:

$$\alpha_i = \sqrt{T}\, \bar{x}_i + \varepsilon_i, \tag{13}$$

where $\varepsilon_i \sim N(0, 1)$. It follows that the correlation between $\alpha_i$ and $\bar{x}_i$ is 0.5, a substantial amount. Once the mean is given, the dependent variable is obtained by generating pseudorandom numbers from either a binomial or a beta-binomial distribution. Specifically, we first draw integer random numbers from a (beta) binomial distribution with parameters $K$ and $\Lambda(x_{it}\beta_1 + \alpha_i)$ and then divide the result by the number of categories $K$, e.g.

$$y_{it} = \frac{1}{K}\, \text{Binomial}\big(K, \Lambda(x_{it}\beta_1 + \alpha_i)\big). \tag{14}$$
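The binomial data generating process can be sketched in a few lines. The form of the correlated random effect used below, α_i = √T·x̄_i + ε_i, is an assumption: it is one specification that delivers a correlation of 0.5 between α_i and x̄_i when ε_i is standard normal and x_it is uniform on [−1, 1]. The function names and the seed are likewise illustrative.

```python
import random
from math import exp, sqrt


def logistic(v):
    return 1.0 / (1.0 + exp(-v))


def simulate_panel(N, T, K, beta1=2.0, seed=0):
    """Simulate proportions y_it = Y_it / K with Y_it binomial given alpha_i."""
    rng = random.Random(seed)
    panel = []
    for i in range(N):
        x = [rng.uniform(-1.0, 1.0) for _ in range(T)]
        xbar = sum(x) / T
        # Assumed correlated random effect (see lead-in): corr(alpha, xbar) = 0.5.
        alpha = sqrt(T) * xbar + rng.gauss(0.0, 1.0)
        y = []
        for t in range(T):
            p = logistic(beta1 * x[t] + alpha)
            successes = sum(rng.random() < p for _ in range(K))  # Binomial(K, p)
            y.append(successes / K)
        panel.append({"i": i, "x": x, "y": y, "alpha": alpha, "xbar": xbar})
    return panel


def correlation(a, b):
    """Sample Pearson correlation of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(va * vb)


panel = simulate_panel(N=20000, T=5, K=2, seed=1)
r = correlation([u["alpha"] for u in panel], [u["xbar"] for u in panel])
```

With a large cross section, the sample correlation `r` should be close to 0.5, and every simulated proportion is a multiple of 1/K.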
Ignoring the presence of the individual-specific component and estimating the marginal, pooled model instead has two effects:
• The estimate of $\beta_1$ is upward biased due to the positive correlation between $x_{it}$ and $\alpha_i$.
• The estimate of $\beta_1$ is downward biased due to omitted heterogeneity. In the probit model, there is a closed form expression for this bias (Wooldridge 2002). In the logit model, it needs to be computed numerically, but the direction is the same.
Which one of the two biases is larger is an empirical matter. The DV estimator, on the other hand, suffers from the standard incidental parameters bias that is upward (Abrevaya 1997). Table 1 shows the simulation results based on 1000 replications, for a sample size of $N = 100$. The mean and standard deviation of estimated coefficients across replications are reported. Three estimators were used: Blogit CML, Blogit DV, and pooled logit, respectively. Similar to Machado (2004), we find that the Blogit CML model estimates the true structural slope parameter very well even for small samples. There is a 2% upward bias for $T = K = 2$ that vanishes quickly as either $T$ or $K$ increases. The sampling variability decreases not only in $T$ but also in $K$, albeit at a less than $\sqrt{K}$ rate. The Blogit DV estimator has a larger bias and a larger standard error, and hence a higher mean squared error, in all settings. The bias becomes small as $T$ and $K$ increase. For instance, for $T = 10$ and $K = 10$, the mean Blogit DV estimate is 2.025, whereas the mean Blogit CML estimate is 2.000. On the other hand, the pooled logit estimator has no tendency to converge to the true parameter $\beta_1 = 2$, over- or underestimating it depending on $K$ and $T$. In the lower panel of Table 1, simulations are repeated for a larger sample, with $N = 500$ instead of $N = 100$. The qualitative conclusions remain unchanged.

Beta-binomial DGP
Simulations from the beta-binomial model add a further step: instead of directly obtaining binomial responses with (conditional on $x_{it}$ and $\alpha_i$) success probability $p_{it} = \Lambda(\beta_0 + \beta_1 x_{it} + \alpha_i)$, $\tilde{p}_{it}$ is now drawn from a beta distribution with mean $p_{it}$:

$$\tilde{p}_{it} \sim \text{Beta}\big(\phi\, p_{it},\; \phi (1 - p_{it})\big). \tag{15}$$

From (11), we know that the multiplicative variance inflation factor depends both on $K$ and $\phi$. To keep the degree of overdispersion constant for $K = 2, 5, 10$, we adjust $\phi$ accordingly. For example, for 10% overdispersion and $K = 2$, we have $1 + (K - 1)/(\phi + 1) = 1.1$, so $\phi = 9$. As a practical limitation, common beta random number generators set lower bounds (above the theoretical ones of 0) for the two parameters. In Stata, for example, these are given by 0.05 and 0.15, respectively. From (15) we see that attempts to draw from the beta using arguments violating these bounds are more likely to arise when the mean is close to zero or one, or when $\phi$ is small (and therefore the degree of overdispersion is large). Since such occurrences only depend on exogenous factors, dropping these cases does not invalidate the estimation procedure. However, it affects the effective sample size and thus leads to higher standard errors than would otherwise be the case.
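The adjustment of φ for a target degree of overdispersion amounts to solving the variance inflation factor for φ. A minimal sketch, with a hypothetical function name, is given below. It also makes explicit that, under this parameterization, the inflation factor cannot exceed K (the limit as φ approaches 0), so a given target is only attainable when it is below K − 1.

```python
def phi_for_overdispersion(K, delta):
    """Solve 1 + (K - 1) / (phi + 1) = 1 + delta for phi.

    delta is the relative excess variance, e.g. 0.1 for 10% overdispersion.
    A positive phi exists only if 0 < delta < K - 1.
    """
    if K < 2:
        raise ValueError("overdispersion requires K >= 2")
    if not 0.0 < delta < K - 1:
        raise ValueError("need 0 < delta < K - 1 for a positive phi")
    return (K - 1) / delta - 1.0


phi_small = phi_for_overdispersion(2, 0.1)    # the worked example in the text
phi_large = phi_for_overdispersion(10, 2.0)   # 200% overdispersion with K = 10
```

For K = 2 and 10% overdispersion this reproduces φ = 9.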
Figures 1 and 2 plot the relative biases of Blogit CML and Blogit DV against the degree of overdispersion, for $N = 100$ and $N = 500$, respectively. Overdispersion varies from 10% to 200%. (The full results on the means and standard deviations of the estimators for each DGP are given in Tables 6 and 7 in "Appendix.") Three key patterns emerge. First, overdispersion leads to an upward bias of both the Blogit CML and the Blogit DV estimators. The bias increases in the amount of overdispersion. Second, the Blogit CML estimator always dominates the Blogit DV estimator, both in terms of bias and standard error. The same pattern was already found for the binomial case, and it persists in the presence of overdispersion. Third, for a given degree of overdispersion, the bias is decreasing in $T$ as well as in $K$. However, increasing $K$ alone does not necessarily lead to a reduction in estimation bias, because it increases the amount of overdispersion, ceteris paribus. Again, results are qualitatively similar for $N = 500$ (see Fig. 2). The overall conclusion is that the Blogit CML estimator maintains a rather good performance even if the binomial model is misspecified, as long as the degree of overdispersion is modest, or else, as long as $T$ is large. To take the two extreme cases, for $N = 100$ and $K = T = 2$, the mean estimate with 10% overdispersion is 2.1, a 5% upward bias. For $K = T = 10$, the mean estimate with 100% overdispersion is 2.049, a 2.45% upward bias.

Table 1 Simulation results under the binomial distribution

A test for overdispersion
Existing binomial tests for $y_{it}$, e.g. Dean's (1992) score test or regression-based tests regressing squared residuals $(y_{it} - \hat{\Lambda}_{it})^2$ on $\hat{\Lambda}_{it}(1 - \hat{\Lambda}_{it})$, require estimates $\hat{\Lambda}_{it}$ in order to obtain conditional variances $\text{Var}(y_{it} \mid x_{it})$. However, the Blogit CML approach does not give us $\hat{\alpha}_i$, so this is not feasible. To ascertain the validity of the Blogit CML model assumption, i.e. that $K y_{it}$ is binomially distributed conditional on $\alpha_i$ and $x_{it}$, we propose an alternative approach that uses $\hat{\beta}$ but does not require estimates of $\alpha_i$, based on taking differences.
To start, consider a binary random variable $M_{it}$ defined by a draw from a Bernoulli distribution with mean $y_{it}$, $M_{it} \sim \text{Bernoulli}(y_{it})$. Clearly, the conditional mean is

$$E(M_{it} \mid x_{it}, \alpha_i) = E(y_{it} \mid x_{it}, \alpha_i) = \Lambda_{it}.$$

The basic idea of the test is to compare the variance of the difference $Y_{it} - Y_{is}$ with that of the difference $M_{it} - M_{is}$, for pairs of observations where the underlying probabilities $p_{it} = \Lambda_{it}$ are the same (or similar) for the two periods. For notational simplicity, let $t = 1$ and $s = 2$. In such cases, outcomes $Y_{i1}, Y_{i2}$ can be regarded under $H_0$ as random draws from i.i.d. binomial distributions and the variance of $Y_{i1} - Y_{i2}$ should be equal to the sum of binomial variances, under assumptions A1 and A3. On the other hand, the Bernoulli draws from the same distributions have standard variances. If there is over- or under-dispersion, the variance of $Y_{i1} - Y_{i2}$ will be larger or smaller than the variance calculated from Bernoulli draws.
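Specifically, consider a variable $z_i$ that contrasts the squared difference of the outcomes with its counterpart for the Bernoulli draws. Conditional on $y_{i1}, y_{i2}$, the moments of the Bernoulli draws are known. Therefore, under A1, A2 and A3, the expectation of $z_i$ can be expressed in terms of the variances $\text{Var}(Y_{it})$. Under the binomial assumption, $\text{Var}(Y_{it}) = K \Lambda_{it}(1 - \Lambda_{it})$, and it follows that the expected value of $z_i$ is zero under the null hypothesis of binomial dispersion as long as $x_{i1}'\beta = x_{i2}'\beta$.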
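To make the logic of this moment condition concrete, the following worked derivation uses one candidate construction of $z_i$. The specific form of $z_i$ below is an assumption made for illustration: it replaces the Bernoulli draws $M_{it}$ by their conditional second moment given $y_{i1}, y_{i2}$, and it is not necessarily the exact definition used by the authors. The symbol $\eta$ denotes the relative excess variance, with $\eta = 0$ under the binomial null.

```latex
% Illustrative construction (an assumption, not taken from the source):
\begin{align*}
z_i &= (Y_{i1} - Y_{i2})^2 - K\,\bigl(y_{i1} + y_{i2} - 2\,y_{i1}y_{i2}\bigr), \\
% For independent draws M_{it} ~ Bernoulli(y_{it}), using M_{it}^2 = M_{it}:
E\bigl[(M_{i1} - M_{i2})^2 \mid y_{i1}, y_{i2}\bigr]
    &= y_{i1} + y_{i2} - 2\,y_{i1}y_{i2}. \\
% With \Lambda_{i1} = \Lambda_{i2} = \Lambda_i and independence over time given \alpha_i:
E\bigl[(Y_{i1} - Y_{i2})^2\bigr] &= \operatorname{Var}(Y_{i1}) + \operatorname{Var}(Y_{i2}), \\
E\bigl[K\,(y_{i1} + y_{i2} - 2\,y_{i1}y_{i2})\bigr] &= 2K\,\Lambda_i(1 - \Lambda_i), \\
% Hence, if Var(Y_{it}) = K \Lambda_i (1 - \Lambda_i)(1 + \eta):
E(z_i) &= 2K\,\Lambda_i(1 - \Lambda_i)\,\eta .
\end{align*}
```

Under this construction, $E(z_i) = 0$ when the variance is binomial, $E(z_i) > 0$ under overdispersion and $E(z_i) < 0$ under underdispersion, and the unknown $\alpha_i$ never has to be estimated.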
One possible alternative to the null of a binomial variance is given by the beta-binomial model, where the variance is

$$\text{Var}(Y_{it}) = K \Lambda_{it}(1 - \Lambda_{it})(1 + \eta),$$

and $\eta$ is equal to $\eta = \frac{K - 1}{\phi + 1} > 0$. In this case, overdispersion originates from random unobserved heterogeneity.

Case I: discrete covariates
Define the set of individuals with the same expectations over time, $A = \{i : \Lambda_{i1} = \Lambda_{i2}\}$, for which $E(z_i \mid i \in A) = 0$ holds. With a time-invariant fixed effect $\alpha_i$ and a single regressor, the set $A$ is equal to $\{i : x_{i1} = x_{i2}\}$. In general, the set $A$ is broader, including all cases where $x_{i1}'\beta = x_{i2}'\beta$. In most cases, it will be possible to find such a set $A$ if all covariates are finite discrete variables, assuming that the $x$-values are drawn from a stationary distribution. The test term for discrete $x_{it}$ is defined as:

$$\tau_A = \frac{1}{|A|} \sum_{i \in A} z_i,$$

where $|A|$ represents the number of elements in $A$. Under $H_0$, $\tau_A \xrightarrow{p} 0$. Further, by the central limit theorem (CLT), the statistic $\tau_A$ converges to a normal distribution,

$$\sqrt{|A|}\, \tau_A \xrightarrow{d} N(0, \sigma_A^2),$$

where $\sigma_A^2 = \text{Var}(z_i \mid i \in A)$. In practice, $\sigma_A^2$ is replaced by the sample variance $\hat{\sigma}_A^2$. So we reject the binomial distribution assumption at the α% significance level if

$$\left| \frac{\sqrt{|A|}\, \tau_A}{\hat{\sigma}_A} \right| > c,$$

where the critical value $c$ is the $(1 - \alpha/2)$-percentile of the standard normal distribution. Individuals in the set $A$ do not contribute to the estimation of the Blogit CML model, since their time-constant $x_{it}$ are cancelled out along with the fixed effects. Nonetheless, they are needed for generating our dispersion test. This nonparametric method to build a test is similar to finding proper cell estimators in matching theory, but likewise faces the curse of dimensionality. It is hard to find the set $A$ when the dimension of $x_{it}$ becomes larger. If $|A|$ shrinks, the convergence rate $\sqrt{|A|}$ will decrease and the estimator $\tau_A$ will converge more slowly.
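Given values of z_i and an indicator for membership in the set A, the discrete version of the test reduces to a one-sample t-type statistic. The sketch below takes the z_i as given inputs and does not compute them. The function name and the default critical value of 1.96 (a two-sided 5% test) are illustrative choices.

```python
from math import sqrt
from statistics import mean, stdev


def discrete_dispersion_test(z, in_A, crit=1.96):
    """Return (tau_A, t-statistic, reject) using only individuals in A.

    z    : list of z_i values, one per individual
    in_A : list of booleans, True if the individual's expectations are
           the same in both periods
    """
    z_A = [zi for zi, member in zip(z, in_A) if member]
    n_A = len(z_A)
    if n_A < 2:
        raise ValueError("need at least two individuals in A")
    tau_A = mean(z_A)                  # sample mean of z_i over A
    sigma_A = stdev(z_A)               # sample standard deviation over A
    t_stat = sqrt(n_A) * tau_A / sigma_A
    return tau_A, t_stat, abs(t_stat) > crit


# Toy inputs: centred values give no rejection, shifted values do.
tau0, t0, reject0 = discrete_dispersion_test(
    [1.0, -1.0, 2.0, -2.0, 5.0], [True, True, True, True, False])
tau1, t1, reject1 = discrete_dispersion_test(
    [1.0, 1.2, 0.8, 1.1, 0.9], [True] * 5)
```

The individual with `in_A = False` is ignored in the first call, which is why the statistic is exactly zero there.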

Case II: continuous covariates
The set $A = \{i : \Lambda_{i1} = \Lambda_{i2}\}$ is empty or very small when $x_{i1}$ and $x_{i2}$ are continuous. A more general method uses a kernel estimator for the conditional expectation $E(z_i \mid \Lambda_{i1} - \Lambda_{i2} = 0)$. The main idea is to put more weight on individuals with smaller $|\Lambda_{i1} - \Lambda_{i2}|$. Since we do not observe the underlying expectations $\Lambda_{it}$ directly, we find the set $A$ by using the observables $x_{it}$. Under the assumption of a single scalar regressor and time-invariant unobserved heterogeneity, we can decompose the conditional expectation (17) by a Taylor expansion at $x_{i2}$. As the fixed effect $\alpha_i$ is cancelled out, an alternative conditional expectation function can be defined in terms of $\Delta_i = (x_{i1} - x_{i2})\beta$, namely $\tau(\Delta_i) = E(z_i \mid \Delta_i)$. Then, under the binomial assumption, $\tau(0) = 0$. The result generalizes to a vector-valued $x$, in which case $\Delta_i = (x_{i1} - x_{i2})'\beta$. The next step is to build a kernel estimator for $\tau(0)$, where $h$ is the kernel bandwidth for $\Delta_i$ and $K(\Delta_i / h)$ is the kernel function (not to be confused with the number of trials $K$). For a given sample, $\Delta_i$ needs to be replaced by $\hat{\Delta}_i = (x_{i1} - x_{i2})'\hat{\beta}$, where $\hat{\beta}$ is estimated. We can use the Blogit CML estimator for estimation, as it is consistent under the binomial null hypothesis. We construct a local estimate $\hat{\tau}$ for the object of interest $\tau(0)$ (see Pagan and Ullah 1999):

$$\hat{\tau} = \frac{\sum_i K(\hat{\Delta}_i / h)\, z_i}{\sum_i K(\hat{\Delta}_i / h)},$$

where the specific kernel function is chosen for simplicity.

Asymptotic properties
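Let $f = f(\Delta = 0)$ denote the continuous density function of the random variable $\Delta$ at point 0. The kernel density estimator $\hat{f}$ for $f$ is

$$\hat{f} = \frac{1}{nh} \sum_i K(\hat{\Delta}_i / h).$$

In addition, rewrite $z_i$ as the sum of its conditional expectation $E(z_i \mid \Delta_i) = \tau(\Delta_i)$ and an error term $u_i$, such that

$$z_i = \tau(\Delta_i) + u_i.$$

The estimator $\hat{\tau}$ is a combination of $\hat{f}$ and $z_i$:

$$\hat{\tau} = \frac{1}{nh\hat{f}} \sum_i K(\hat{\Delta}_i / h)\, z_i.$$

Taking the expectation of $\hat{\tau}$, we obtain a bias that is proportional to $h^2$.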
To guarantee consistency of the estimator $\hat{\tau}_n$, convergence of the mean square error to zero is required. The MSE is equal to $\text{MSE}(\hat{\tau}) = \text{Bias}(\hat{\tau})^2 + \text{Var}(\hat{\tau})$. So the bias of $\hat{\tau}_n$ should decrease to zero as $n$ increases, which requires $h \to 0$ as $n \to \infty$. Besides the convergence condition for the bias, we also consider the asymptotic performance of the variance of $\hat{\tau}$. Using a result on the variance of conditional expectations from Pagan and Ullah (1999), we obtain a variance of order $1/(nh)$. To make sure that the MSE converges at the fastest speed, the squared bias and the variance should converge at the same rate: $h^4 \propto \frac{1}{nh}$. Otherwise, the slower speed dominates the convergence rate. Thus, $h$ is of order $h \propto n^{-1/5}$ and, by the central limit theorem, $\hat{\tau}$ is asymptotically normal. Here, $\sigma^2 = \text{Var}(z_i \mid \Delta = 0)$, with the same definition as in the discrete case (Eq. 20). In practice, we standardize $\hat{\Delta}_i$ at first and set the bandwidth to $h = 0.9\, n^{-1/5}$. The approximate bias is estimated from the sample, and the resulting standardized statistic can be used as a t-test.
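The kernel-weighted local estimate at zero can be sketched as a weighted average of the z_i with weights that decline in the distance of the estimated index difference from zero. The Gaussian kernel used here is an assumption, since the text only states that the kernel is chosen for simplicity. Scaling the index differences by their standard deviation without centring them (so that the evaluation point 0 is preserved) is likewise an assumption about what "standardize" means. All function names are hypothetical, and no bias correction is attempted.

```python
from math import exp, pi, sqrt
from statistics import pstdev


def gaussian_kernel(u):
    """Standard normal density, used as the kernel (assumed choice)."""
    return exp(-0.5 * u * u) / sqrt(2.0 * pi)


def rule_of_thumb_bandwidth(n):
    """Bandwidth h = 0.9 * n^(-1/5) for a standardized index."""
    return 0.9 * n ** (-0.2)


def local_estimate_at_zero(z, delta_hat, h):
    """Kernel-weighted average of z_i, with weights K(delta_hat_i / h)."""
    weights = [gaussian_kernel(d / h) for d in delta_hat]
    return sum(w * zi for w, zi in zip(weights, z)) / sum(weights)


def kernel_test_input(z, delta_hat):
    """Scale the index by its standard deviation, then estimate tau(0)."""
    scale = pstdev(delta_hat)
    scaled = [d / scale for d in delta_hat]   # scaling only, no centring
    h = rule_of_thumb_bandwidth(len(z))
    return local_estimate_at_zero(z, scaled, h), h


flat = local_estimate_at_zero([3.0, 3.0, 3.0], [-0.4, 0.1, 0.9], 0.5)
peaked = local_estimate_at_zero([0.0, 10.0, 0.0], [-1.0, 0.0, 1.0], 0.1)
tau_hat, h_used = kernel_test_input([0.2, -0.1, 0.4, 0.0], [0.5, -0.2, 1.0, 0.1])
```

With a small bandwidth the estimate is dominated by observations whose index difference is near zero, which mirrors the role of the set A in the discrete case.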

Multiple periods
The test can be extended to multiple time periods. With $T = 2$, there is a single moment condition for $E(z_i \mid \Delta_i = 0)$ that can be tested. For $T > 2$, one possibility is to combine $T - 1$ such moment conditions into a single test statistic. In the discrete case, a moment function $g_{i,t}$ is defined for each set $A_t = \{i : x_{i,t} = x_{i,t+1}\}$, $t = 1, \ldots, T - 1$. As we derived before, $E(g_{i,t}) = 0$.
In matrix form, the sample moments are collected over $t$, with $n_t = |A_t|$ the cardinality of set $A_t$. Denote $n = (n_1, \ldots, n_{T-1})'$. To calculate the sample variance-covariance matrix $\hat{S}$, we replace off-diagonal elements with pairwise sample covariances and diagonal ones with the sample variances of $g_t$, from which a test statistic can be derived. In the continuous case, the moment conditions are kernel-weighted and, under the null hypothesis, again satisfy $E(g_{i,t}) = 0$. These moment conditions can be written in matrix form for individuals $i = 1, \ldots, N$. With $\hat{S}$ denoting the sample variance, a test statistic $J$ is obtained.

The chi-square test rejects the binomial distribution assumption at the α% significance level if $J \geq \chi^2_{\alpha}(T - 1)$.

Simulation study
We conduct a number of simulation experiments to examine the performance of these tests under two scenarios. In the first setting, explanatory variables are discrete (in fact, there is a single binary regressor, to keep things as simple as possible), while the explanatory variable is continuous in the second. The remaining aspects of the DGP regarding fixed effects, expectation functions and parameters setting are the same as those in Sect. 2.3. Table 2 presents rejection rates, i.e. the relative number of times that our test rejects the binomial assumption over 1000 replications, when x is discrete. x it is either 0 or 1 with equal probability. In this case, Pr(x i1 = x i2 ) = 50%, and on average half of the observations will be in the set A of individuals with the same expectations over time and thus informative for computing the test statistic. As before, the number of time periods increases from T = 2 to T = 10, and binomial parameter from K = 2 to K = 10.
The first row of each subpanel shows results without overdispersion, i.e. sampling from a binomial DGP applies. In this case, the rejection rates are equivalent to the proportion of type-I errors and ideally should be close to the nominal size of the test, in this case 5%. The lower part of each subpanel shows the rejection rates of $H_0$ when $H_0$ is false, i.e. the power.
When we implemented the multi-period discrete test as described in Sect. 3.4, we found that the size of the test was seriously distorted when $T$ was large. For $T = 10$, the rejection rates under the binomial assumption were 44.7% for $N = 100$, and 12.9% for $N = 500$. For larger $N$, there is a convergence to the nominal size, but it is rather slow (e.g. 8.5% rejection rate for $N = 1000$). The reason for this test behaviour is the poor estimation of the covariance elements of the weighting matrix $\hat{S}$. For example, when $T = 3$, the covariance between $g_{i,1}$ and $g_{i,2}$ is estimated based on the small subset of observations for which $x_{i1} = x_{i2} = x_{i3}$. The imprecise estimation of the covariances for small $N$ leads to a large sampling variability of $\hat{S}$, and this problem increases with $T$. As an alternative, we therefore show in Table 2 simulation results where all off-diagonal elements of $\hat{S}$ were set to zero, leading to a better performance of the test in small samples. We also note that the continuous, kernel-based test does not suffer from this problem.
Reassuringly, we find that the test has some power against the alternative of rather modest overdispersion (10%), in particular for $N = 500$, $K = 2$ and $T = 10$, where around 36% of wrong null hypotheses are rejected. As the dispersion degree increases, the power of the test also grows, and it reaches 100% for DGPs where overdispersion, the number of observations, and the number of time periods are large.

Table 2 Simulation results for rejection rates when x is discrete. $x_{it}$ is a binary variable with 50% probability equal to 0 or 1. The remaining DGP is the same as in Table 1; the null hypothesis is that of binomial dispersion.
In Table 3, we show the results for the kernel-weighted test statistics for continuous regressors. $x_{it}$ is drawn from a uniform distribution with support $[-1, 1]$, with mean 0 and variance 1/3. The general patterns regarding type-I errors and power of the tests are mostly similar to those of Table 2. As in Table 2, the power of the test tends to decrease in $K$, for a given overall degree of overdispersion, but this tendency is more uniform in the continuous version of the test. This indicates that the power of the test reacts differently to the two parameters driving overdispersion, and in particular that it is more sensitive to changes in $\phi$ than to changes in $K$. The combined results from our simulation experiments are reassuring: on the one hand, modest amounts of overdispersion cause only minor bias of the Blogit CML estimator; on the other hand, the test we derive has good power properties against medium- or high-dispersion alternatives to the binomial assumption.

Application to labour supply
In this illustrative application of the binomial estimator and overdispersion test, we re-consider the association between fertility and female labour supply. Data are from the Swiss Household Panel (SHP) for the years 2012-2016. The SHP is an ongoing longitudinal survey of people residing in Switzerland. It collects information on a large range of topics on living conditions, both objective and subjective, including work, fertility and health. We restrict the analysis to women aged 25-45, who participated in the survey at least twice during the 5-year period. This gives us a panel of 5854 person-year observations for 1712 different women.
There exists a huge literature modelling female labour supply, a large part of which is devoted to the endogeneity of the fertility decision. We want to make a different point here, namely that the labour supply outcome, i.e. the amount of time a woman decides to spend in market work, fits in principle into the empirical framework discussed in this paper and hence can be analysed using the methods proposed here: empirically, the number of days or hours worked is discrete, and it has a lower bound of zero, as well as an upper bound, and can thus be expressed as a proportion.
Modelling labour supply as a fraction of time may be promising in particular in institutional settings where employment contracts offer various part-time options. A case in point is Switzerland, where vacancies are advertised, and work contracts are written, using full-time fractions. For instance, 60% work-time means that the worker is employed for the equivalent of 3 days per week and also is paid 60% of a full-time salary. In practice, the large majority of agreed-upon work-time percentages are multiples of 10%. Figure 3 shows the distribution of work-time percentages for the sample of women extracted from the Swiss Household Panel. Here, the data are pooled over the five years. The relative frequency of zeros is 14.4%, meaning that the estimated participation rate in our sample for this age group is 85.6%, a number very close to the official statistic published by the Federal Statistical Office (BfS 2016). Although there are peaks for non-work and for full-time work, all intermediate values are present in the data.

Table 3 Simulation results for rejection rates when x is continuous

In particular, Fig. 3 documents that, in Switzerland, the vast majority of women do not work full-time. A question one can then ask is: how does the work-time percentage vary with the presence of children in the household? Box-plots in Fig. 4 show, for our data, a clear negative association between work and children. The median work-time percentage drops from 80% or higher for those aged 30 or below to 50% for women in their early 40s. At the same time, older women are more likely to have children. The key assumption of the following analysis is that we can treat 10 times the work-time percentage as a binomial variable with outcomes 0, 1, ..., 10, where the mean will be modelled as a function of covariates as well as individual-specific time-invariant fixed effects.
It is of course difficult to imagine the work-time decision as literally arising from an underlying sequence of K independent Bernoulli trials. Nevertheless, the binomial model can provide a useful approximation to the distribution of work-time percentages, in particular since it conditions on fixed effects and hence is compatible with an unconditional "W-shaped" outcome distribution as observed in Fig. 3. Our illustrative application abstracts from additional complexities, such as potential differences between the participation decision (i.e. the extensive margin) and the intensive margin, and, beyond the inclusion of individual fixed effects, the endogeneity of the fertility decision. Clearly, these are important concerns, and they should be addressed in further extensions of the approach.

Table 4 provides some descriptive statistics (means and standard deviations) for both the dependent and the explanatory variables used in the estimation. The average work-time percentage is 56%, with a standard deviation of 0.34. Under the binomial assumption, the standard deviation for a fraction with a mean of 0.56 is equal to $\sqrt{0.56(1 - 0.56)/10} = 0.157$, substantially below the observed standard deviation of 0.346. Hence, there is evidence of overdispersion at the marginal level.
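The marginal comparison above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the function name is hypothetical, and the inputs are the mean, the number of "trials" K = 10 and the observed standard deviation quoted in the text. It only concerns the marginal distribution and says nothing about dispersion conditional on the fixed effects.

```python
import math

def binomial_sd_of_proportion(mean, trials):
    """Standard deviation of y/K when y ~ Binomial(K, mean)."""
    return math.sqrt(mean * (1.0 - mean) / trials)

mean_share = 0.56     # average work-time percentage reported in the text
K = 10                # work-time measured in steps of 10%
observed_sd = 0.346   # observed standard deviation reported in the text

implied_sd = binomial_sd_of_proportion(mean_share, K)
# Ratio of observed to binomial variance; values above 1 point to
# overdispersion at the marginal level.
variance_ratio = (observed_sd / implied_sd) ** 2

print(round(implied_sd, 3))
print(round(variance_ratio, 2))
```

A variance ratio well above one at the marginal level is suggestive only, since part of the excess variation is absorbed by the individual fixed effects.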

Results
Women have an average age of 36.3 years, and 63.1% report having at least one child in the year they are surveyed. For 58.4% of person-year observations, there is a partner present in the household. The health status is captured by a 5-point scale for self-assessed health, where 0 means "not well at all" and 4 means "very well". We treat it as a cardinal scale for simplicity, and also abstract from its potential endogeneity to working or having children. Finally, we include information on language region. There is quite a bit of evidence that work norms differ between the French- and the German-speaking parts of Switzerland, with some stigma attached to working mothers, in particular during the first years of the child's life (see Steinhauer 2018). This stigma seems to be stronger in the German-speaking part of Switzerland (65% of our sample) than in the French-speaking part (29% of our sample). Our final estimation model includes four year dummies, age-squared (the linear age term is dropped; alternatively, one could identify the linear age effect by setting a second year effect equal to zero), indicators for the presence of a child and partner, and the health variable. Since language region is mostly constant over time, it is near-collinear with the fixed effects when applying the Blogit CML or Blogit DV estimators, and we therefore only include its interaction with the child-indicator variable.
As is the case for the binary logit model with fixed effects, DV estimation of the binomial model is subject to the perfect prediction problem (see, for example, Kunz et al. 2018). Outcomes for women whose work-time percentage is either 0% or 100% in each year are perfectly predicted, meaning that the associated dummy coefficient will tend to minus or plus infinity, respectively. For the Blogit CML, perfect prediction formally does not arise, as the $\alpha_i$'s are not estimated. However, all such observations mechanically have a log-conditional likelihood value of zero and thus do not contribute to the estimation of $\beta$ either. To use the same estimation sample everywhere, we drop all perfectly predicted outcomes from the outset, leading to a final sample size of 4661 person-year observations for the work-time percentage model.
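The sample restriction described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the paper: the toy panel, the tuple layout `(person_id, year, y)` and the function name are hypothetical, and `y` denotes 10 times the work-time share. A woman is dropped when her outcomes are at the lower bound in every year or at the upper bound in every year, which is equivalent to her total being 0 or K times the number of years observed.

```python
from collections import defaultdict

K = 10  # number of "trials": work-time measured in steps of 10%

# Hypothetical toy panel: (person_id, year, y), y = 10 * work-time share
panel = [
    (1, 2012, 0), (1, 2013, 0), (1, 2014, 0),   # never works
    (2, 2012, 10), (2, 2013, 10),               # always full-time
    (3, 2012, 8), (3, 2013, 6), (3, 2014, 6),   # outcome varies
    (4, 2012, 0), (4, 2013, 4),                 # outcome varies
]

def drop_perfectly_predicted(rows, trials):
    """Keep only persons whose outcome total is strictly between the bounds.

    A total of 0 means y = 0 in every year; a total of trials * T means
    y = trials in every year. Both cases are perfectly predicted.
    """
    outcomes = defaultdict(list)
    for person_id, _year, y in rows:
        outcomes[person_id].append(y)
    keep = {
        person_id
        for person_id, ys in outcomes.items()
        if 0 < sum(ys) < trials * len(ys)
    }
    return [row for row in rows if row[0] in keep]

estimation_sample = drop_perfectly_predicted(panel, K)
print(len(estimation_sample))  # person-year observations retained
```

Note that a woman with a constant interior outcome (for example 60% in every year) is retained by this rule, since only outcomes at the two bounds are perfectly predicted.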
Regression results are given in Table 5. The first column shows the estimated coefficients from the Blogit CML and the second those from the Blogit DV model. The last two columns add corresponding (binary) logit models for the extensive margin model (work yes/no), again using alternatively the CML or DV estimators. Standard errors are clustered at the individual level. The linear age term has been dropped due to collinearity in a model with individual and year fixed effects.
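When interpreting magnitudes, we note the recent suggestion by Kemp and Santos Silva (2016) and focus on expected (semi-) elasticities. These can be estimated without knowledge of $\alpha_i$ and are thus very suitable for our conditional maximum likelihood approach. For the binomial proportion model with $E(y_{it} \mid x_{it}, \alpha_i) = \Lambda_{it}$, where $\Lambda_{it} = \exp(\alpha_i + x_{it}'\beta)/[1 + \exp(\alpha_i + x_{it}'\beta)]$ is the logit function, we obtain

$$\frac{\partial \log E(y_{it} \mid x_{it}, \alpha_i)}{\partial x_{it}} = \beta\,(1 - \Lambda_{it}), \qquad E\left[\frac{\partial \log E(y_{it} \mid x_{it}, \alpha_i)}{\partial x_{it}}\right] = \beta\,[1 - E(\Lambda_{it})].$$

A good estimator of the overall mean of $\Lambda_{it}$ is the sample mean of the outcome, $\bar{y} = 0.55$, so that the CML estimates $\hat{\beta}$ can be multiplied by 0.45 to obtain an estimate of the population average semi-elasticities with respect to changes in the associated covariate.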
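The scaling by one minus the mean outcome can be checked numerically. The sketch below assumes a logit conditional mean with a linear index; the function names, the coefficient of -2 and the values of the fixed effect and covariate are hypothetical and chosen only for illustration. It compares the analytical semi-elasticity with a finite-difference approximation and then applies the averaging step with the sample mean of 0.55 reported in the text.

```python
import math

def logistic(z):
    """Logit mean function."""
    return 1.0 / (1.0 + math.exp(-z))

def semi_elasticity(beta, alpha, x):
    """Analytical d log E(y | x, alpha) / dx for a logit mean."""
    return beta * (1.0 - logistic(alpha + beta * x))

def numerical_semi_elasticity(beta, alpha, x, h=1e-6):
    """Central finite-difference approximation of the same derivative."""
    def log_mean(v):
        return math.log(logistic(alpha + beta * v))
    return (log_mean(x + h) - log_mean(x - h)) / (2.0 * h)

def average_semi_elasticity(beta, mean_outcome):
    """Expected semi-elasticity, replacing the mean of the logit by y-bar."""
    return beta * (1.0 - mean_outcome)

beta, alpha, x = -2.0, 0.7, 1.0   # illustrative values only
print(semi_elasticity(beta, alpha, x))
print(numerical_semi_elasticity(beta, alpha, x))
print(average_semi_elasticity(beta, 0.55))  # coefficient scaled by 0.45
```

The last line does not require the fixed effect at all, which is why the calculation is compatible with conditional maximum likelihood estimation.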
From columns (1) and (2) of Table 5, we find a large negative association between having a child and the amount of work. The point estimate of the main effect is about -2, which means that not having a child increases the expected work-time percentage by about 90 percent. This effect is highly statistically significant, as are two of the three interaction effects: having a child reduces the work-time percentage more if a partner is present than otherwise, underlining the relevance of pecuniary motives for work, and the need to "make ends meet". The labour supply response of women to having children is about half as large for French-speaking women as it is for German-speaking women.

In this application, the Blogit CML and the Blogit DV results are very similar. The DV results are always a bit larger in absolute value, but the difference never exceeds 5%. This resonates with our simulation results, because both T and K are relatively large. Nevertheless, the joint test for the binomial assumption derived in Sect. 3.3 indicates a clear rejection (test value of 37.7 with a $\chi^2_{0.95}$ critical value of 9.5). This rejection due to overdispersion was already foreshadowed, although not logically implied because of the conditional nature of the test, by the high proportion of no work (zero) and full-time work (100%) evident in Fig. 3. However, we know from the simulation results (Tables 1, 2) that even with 50% overdispersion, the bias of the Blogit CML is small for K = 10 and T = 5, a setting similar to the current application. At the same time, the probability of rejecting a false $H_0$ is very close to 1 (see Table 3). On a practical note, the CML estimator can be computed much faster than the DV estimator, by a factor of about 10 in our case. The speed problem of DV models would be exacerbated in applications with more cross-sectional units, to the point where computation of the Blogit DV estimator may become infeasible in the current Stata/R setting.
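The rejection decision can be illustrated with a short calculation. The degrees of freedom of the test are not stated in this section; the sketch below assumes 4 degrees of freedom, which is an inference from the quoted critical value of 9.5 and not a figure taken from the paper. For 4 degrees of freedom the chi-square distribution function has the closed form $1 - e^{-x/2}(1 + x/2)$, so no statistical library is needed. All function names are hypothetical.

```python
import math

def chi2_cdf_df4(x):
    """Chi-square CDF with 4 degrees of freedom (assumed, see lead-in)."""
    return 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0)

def chi2_quantile_df4(p, lo=0.0, hi=100.0, tol=1e-10):
    """Invert the CDF by bisection; the CDF is increasing in x."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf_df4(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

test_value = 37.7                      # test statistic reported in the text
critical = chi2_quantile_df4(0.95)     # 95% critical value under df = 4
p_value = 1.0 - chi2_cdf_df4(test_value)

print(round(critical, 1))
print(test_value > critical)           # True means the null is rejected
print(p_value)
```

Under this assumption the reported statistic lies far in the upper tail, in line with the clear rejection described in the text.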
In the last two columns of Table 5, we allow for a comparison with results from a more conventional binary logit extensive margin estimator. A first point to note is that the effective sample becomes much smaller, since all observations with variation in the positive range only, i.e. percentages between 10 and 100%, are now coded as "1" and thus become perfectly predicted. Their variation does not contribute to estimation, the usable sample size drops by 3/4, and the standard errors of the estimated coefficients increase accordingly. We had to drop the interaction between speaking Italian and having children, as it could not be estimated in the reduced sample.
The estimated coefficients tend to be substantially larger, but they are not directly comparable. To obtain the implied expected semi-elasticities for the probability of work, coefficients need to be multiplied by the non-participation rate, 0.145 in this case, compared to a factor of 0.45 applicable in the first two columns. Based on the CML estimates, some of the extensive margin semi-elasticities are smaller than the overall semi-elasticities (like the main effect of having a child), and some of them larger (such as self-rated health). In terms of statistical significance, we find that the health and partner coefficients were not significant in the work-time percentage model, but they are in the participation model. And in terms of point estimates, the interaction between speaking French and having children just offsets the main effect of having at least one child, meaning that there is no difference in participation probabilities for French-speaking mothers and non-mothers, although some labour supply responsiveness was found in the work-time percentage model for the combined extensive and intensive margin effect. Also, the participation model suffers from a massive incidental parameters bias, since the point estimates for the DV estimator exceed those of the CML estimator by 50% on average.

Concluding remarks
Although Machado (2004) introduced the fixed effects binomial model as a method for proportions of successes in a sequence of Bernoulli trials, it can be used for discrete bounded outcomes, or fractions, more generally. However, it remained an open question whether or not the conditional binomial logit maximum likelihood estimator is robust to misspecification. In this paper, we focus on the consequences of overdispersion as it originates, for instance, from neglected unobserved heterogeneity. We show in simulation experiments that the Blogit CML estimator maintains a rather good performance even if the binomial model is misspecified, as long as the length of the panel T is sufficiently large, or the degree of overdispersion is modest.
We then derive a test of the null hypothesis that the binomial assumption is valid, based on departures from the implied binomial variance function. The test computes the variance of within-individual outcome differences. For the subset of observations whose regressors do not change over time, the mean difference is zero (and close to zero if regressors do not differ "too much") and it is possible to construct and compare two variance estimates, one with and one without the binomial assumption, that both do not depend on fixed effects. This is essential, as fixed effects are not estimated by the Blogit CML estimator. Our simulation experiments show that the test has good power properties against the alternative of medium or large degrees of overdispersion. But these are exactly the cases where the bias of the Blogit CML estimator becomes noticeable.
In our empirical application, we study an outcome related to women's work behaviour, namely the contracted work-time percentage. In our sample of mid-aged women obtained from the Swiss Household Panel, 65% of all women report working part-time, i.e. a percentage between 10 and 90%. The empirical analysis using the fixed effects binomial logit model predicts substantially different work-time percentages for mothers and non-mothers. Having a partner makes the difference more pronounced, whereas speaking French reduces it. We show how these coefficients can be interpreted in terms of expected semi-elasticities even if the fixed effects are not estimated. In comparison with the fixed effects logit estimation for the participation model, far fewer observations are lost in the work-time percentage model due to perfect prediction, contributing to a much more precise estimation of the model parameters.
In future work, we will consider alternative estimators that could be pursued if the binomial null hypothesis is rejected. If the logit conditional expectation function is to be kept, a binomial logit correlated random effects model is a possible approach. Such a model would explicitly account for overdispersion, by assuming for instance that random, time-varying unobserved heterogeneity follows a normal distribution with mean depending on the regressors.
Funding Open Access funding provided by University of Zurich.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.