Modified-likelihood estimation of fixed-effect models for dyadic data

We consider point estimation and inference based on modifications of the profile likelihood in models for dyadic interactions between n agents featuring agent-specific parameters. The maximum-likelihood estimator of such models has bias and standard deviation of order n-1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n^{-1}$$\end{document} and so is asymptotically biased. Estimation based on modified likelihoods leads to estimators that are asymptotically unbiased and likelihood ratio tests that exhibit correct size.


Introduction
A growing literature has uncovered the importance of interactions between agents through networks as drivers for economic and social outcomes.A leading approach to statistical modeling of dyadic interaction is through the inclusion of agent-specific parameters (see, e.g., Snijders 2011 for many references).A specific example that has received substantial attention in the recent literature is the β-model for network formation.There, agent fixed effects serve to capture degree heterogeneity in link formation and the inclusion of dyad-level covariates reflects homophily; see, e.g., Graham (2017), Jochmans (2018), and Dzemski (2019).
Estimation of fixed-effect models for dyadic data is non-standard as the number of parameters grows with the sample size in a similar manner as in the classic incidental-parameter problem for one-way panel data discussed in Neyman and Scott (1948).Under so-called dense-network asymptotics, common parameters in regular models can be consistently estimated, but inference is plagued by asymptotic bias; see (Fernández-Val andWeidner 2016, 2018) and Graham (2017), for examples, discussion, and approaches to bias-correct the point estimator.
In this paper we look at generic estimation problems for undirected dyadic data and consider inference based on modifying the likelihood function in the spirit of Pace and Salvan (2006) and Arellano andHahn (2006, 2007).In its most general form, the modified likelihood is a bias-corrected version of the profile likelihood, that is, of the likelihood after having profiled-out the nuisance parameters.The adjustment is both general and simple in form, involving only the score and Hessian of the likelihood with respect to the nuisance parameters.The adjustment term removes the leading bias from the profile likelihood and leads to asymptotically unbiased inference and likelihood ratio statistics that are χ 2 -distributed under the null.The form of the adjustment can be specialized by using the likelihood structure, as in DiCiccio et al. (1996).
We work out the modifications to the profile likelihood in a linear version of the β-model and (in appendix) in a linear version of the (Bradley and Terry 1952) model for paired comparisons.These simple illustrations give insight in how the adjustments work.We next apply them to the β-model in the simulation designs of Graham (2017).We find that both modifications dramatically improve on maximum likelihood in terms of bias and mean squared error as well as reliability of statistical inference, and that they are considerably more reliable than ex post bias-correction of the maximumlikelihood estimator.

Fixed-effect models for dyadic data
We consider data on dyadic interactions between n agents.For each of n(n−1) /2 distinct agent pairs (i, j) with i < j we observe the random variable z i j , which may be a vector.The density of z i j (relative to some dominating measure) takes the form f (z i j ; ϑ, β i , β j ), where ϑ and β 1 , . . ., β n are unknown Euclidean parameters.We may observe an outcome y i j generated by pair (i, j) together with a vector of dyad characteristics x i j , in which case we have z i j = (y i j , x i j ) , and we could consider the distribution of the outcome conditional on the covariates.In what follows, we take the z i j to be (conditionally) independent.Models of this form are relevant in many areas.Examples include the analysis of network formation as mentioned before, but also the study of strategic behavior among agents (Bajari et al. 2010), and the construction of rankings (Bradley and Terry 1952).Our goal is to perform inference on ϑ treating the β i as fixed effects.
The log-likelihood is where we let β = (β 1 , . . ., β n ) .For simplicity of exposition we ignore any normalization that may be needed on β to achieve identification.When a normalization of the form c(β) = 0 is needed, everything to follow goes through on replacing (ϑ, β) by the constrained likelihood (ϑ, β) − λ c(β), where λ denotes the Lagrange multiplier.We give a detailed example in appendix.
It is useful to recall that the maximum-likelihood estimator of ϑ can be expressed as is the profile likelihood.
Inference based on the profile likelihood performs poorly because the estimation noise in β(ϑ) introduces non-negligible bias.Moreover, in regular settings, so that bias and standard deviation are of the same order of magnitude.Consequently, the maximum-likelihood estimator is asymptotically biased.

Modified profile likelihood
In its simplest form, modified likelihoods can be understood as yielding a superior approximation to the target likelihood Moreover, the profile likelihood is the sample counterpart to this infeasible likelihood.
Replacing β(ϑ) with β(ϑ) introduces bias that leads to invalid inference.To see this suppose that where we introduce .
An expansion of the profile likelihood around β(ϑ) yields Combining the two expansions and taking expectations then shows that the bias of the profile likelihood is of the form the variance of V (ϑ).Equation (2.1) is a conventional asymptotically linear representation of the estimator of the fixed effects; see, e.g., Rilstone et al. (1996).Low-level conditions for it to go through in specific models are provided in (Fernández-Val andWeidner 2016, 2018).The difficulty in the current case, as opposed to say the one-way panel data model (as dealt with in Hahn and Newey 2004), is to handle the non-sparse nature of the Hessian matrix.
With (2.2) in hand, a modified likelihood is where we define the plug-in estimators for matrices In large samples, this modification removes the leading bias from the profile likelihood.Consequently, in large samples, the likelihood ratio statistic has correct size and will have bias o(n −1 ).Furthermore, under usual regularity conditions, we have the limit result as n → ∞, where we let be the Fisher information for ϑ.
The only point at which the likelihood setting has been used so far is in the statement of the limit distribution of θ − ϑ, where the expression for the asymptotic variance exploits the information equality.Bias-corrected estimation-using the same formula for the bias as before-thus carries over to more general extremumtype estimation problems; the only change being that, now, the asymptotic variance is Alternatively, following the arguments in Arellano and Hahn (2007) we can exploit the likelihood structure to get which validates the alternative modified likelihood see DiCiccio et al. (1996).Its maximizer, say θ, satisfies the same asymptotic properties as θ.

Illustration: a linear ˇ-model
Consider the following extension of the classic many normal means problem of Neyman and Scott (1948).Data are generated as 123 SERIEs (2023) 14:417-433 and are independent across dyads.The likelihood function for all parameters (ignoring additive constants) is Its first two derivatives with respect to the β i are Solving for the maximum-likelihood estimator of β i gives βi = zi − z for any ϑ.The profile likelihood is therefore and its maximizer is Some tedious but straightforward calculations yield which confirms that the maximum-likelihood estimator of ϑ suffers from asymptotic bias.Moreover, as n → ∞.
To set up the modified likelihood, first note that and that It is then easily seen that From this we obtain and its maximizer Clearly, this estimator removes the leading bias from the maximum-likelihood estimator.Moreover, , which shows that the remaining bias in the point estimator is small relative to its standard deviation.
As an alternative correction, we may exploit the likelihood structure to adjust the profile likelihood by the term where c is a constant that does not depend on ϑ.This yields the modification whose maximizer satisfies This estimator is exactly unbiased.
To give an idea of the magnitude of the bias in this problem, Table 1 contains the bias and standard deviation of the estimators θ, θ, and θ for various sample sizes n and variance parameter fixed to ϑ = 1.These results are invariant to the value of the β i and can be interpreted as relative bias for general values of ϑ.

Application to the ˇ-model
The β-model of network formation models Bernoulli outcome variables as having success probability where F(a) = (1+e −a ) −1 is the logit link function.We now present the results from a Monte Carlo experiment.The designs are borrowed from Graham (2017).All designs are of the following form.Let u i ∈ {−1, 1} so that P(u i = 1) = 1 2 .We generate the dyad covariate as and the fixed effects as where v i ∼ Beta(λ 1 , λ 2 ).We set μ = −λ 1 (λ 1 + λ 2 ) −1 , so that μ + v i has mean zero and will consider several choices for the parameters (γ 1 , γ 2 ) and (λ 1 , λ 2 ).The parameter choices are summarized in Table 2.In the first four designs (A1-A4), the β i are drawn independently of x i j from symmetric Beta distributions.In the next four designs (B1-B4) the β i are generated from skewed distributions that depend on u i (and thus correlate with the regressor x i j ).For both the designs labeled A and B, the average number of observed links per agent goes down as we move from the first design (A1 and B1) to the fourth design (A4 and B4).The average number of links decreases from about 50% to 12%.This is clear from the second block of Table 2, which contains the average, minimum, and maximum number of links per agent (in percentages).We simulate 10, 000 data sets for each design for n ∈ {25, 50, 75, 100} and ϑ = 1.Because the results across designs are qualitatively very similar, we present the full set of results only for Design A1 (Table 3).Tables 4 and 5 provide the results for n ∈ {50, 100} for all designs.Each table contains the mean and median bias of ϑ, θ, and θ, along with their standard deviation and their interquartile range (both across the Monte Carlo replications).The tables also provide the empirical size of the likelihood ratio test for the null that ϑ = 1 for theoretical size α ∈ {.05, .10}.Inference results based on the Wald statistic, using a plug-in estimator of I (ϑ), are very similar and not reported for brevity.
Because the results for n = 100 can be compared (up to Monte Carlo error) to the numerical results collected in Graham (2017, Table 2), Table 5 contains two additional columns in which we reproduce the results for his analytically bias-corrected maximum-likelihood estimator ( θBC ) and his 'tetrad logit' estimator ( θTETRAD ).The latter is based on moment conditions that are free of β i using a sufficiency argument.Bias-correcting θ does not salvage the likelihood ratio statistic, and the conditional likelihood function of the 'tetrad logit' estimator is a quasi-likelihood and, therefore, does not satisfy the information equality.Hence, the results on size for these two estimators are based on the Wald statistic.
Table 3 clearly shows that both the bias and standard deviation of θ are of order n −1 .Consequently, the likelihood ratio test is size distorted even in large samples.Point estimation through the modified likelihoods gives estimators with small bias relative to their standard error.Even for n = 25, the bias is only about 20% of the bias in maximum likelihood estimator.In larger samples, the estimators are essentially unbiased.Both θ and θ are also less volatile than is θ.This phenomenon has been observed elsewhere; we refer to Schumann et al. (2022).Thus, here, bias-correction Simulation results for the maximum-likelihood estimator ( θ), the modified profile-likelihood estimator ( θ), and the modified profile-likelihood estimator exploiting the likelihood structure ( θ) does not come at the cost of an increase in dispersion.Together with the substantial decrease in mean squared error, inference, too, improves dramatically.The likelihood ratio statistics for ˙ (ϑ) and ¨ (ϑ) have near-theoretical size for all n.
To give a more complete picture on inference via modifying the profile likelihood Fig. 1 presents power curves for the likelihood ratio statistic that go along with Table 3.The curves for ˆ (ϑ) (solid lines) are symmetric but not correctly centered, reflecting the fact that they are size distorted.This is so for all sample sizes and significance levels considered.Modifying the likelihood shifts the power curve so that the likelihood ratio test is (approximately) size correct.This is done without significantly altering the shape of the power curves.For the smallest sample size considered (n = 25; upper two plots) there is a small difference in power between the likelihood ratio test for ˙ (ϑ) (dashed lines) and ¨ (ϑ) (dashed-dotted lines); the former has slightly higher power than the latter for alternatives ϑ > 1 and slightly less power for ϑ < 1.This difference vanished rapidly as n increases, however, which is in line with the similar performance of both corrections observed in Table 3.
Tables 4 and 5 show that all conclusions from Design A1 carry over to the other designs.Moreover, the introduction of correlation between regressors and heterogeneous coefficients or skewing the distribution from which the latter are drawn does not prevent the modified likelihood to improve on maximum likelihood both in terms of point estimation and inference.A comparison of the two tables clearly shows that both the bias and standard deviation of θ shrink by a factor of one half as n doubles, again illustrating that both are of order n −1 .The subsequent reduction in bias by considering θ and θ and improvement in size are manifested for all designs.Power curves for the likelihood ratio statistic based on the profile likelihood ˆ (ϑ) (solid lines), the modified profile likelihood ˙ (ϑ) (dashed lines), and the modified profile likelihood that exploits the likelihood structure ¨ (ϑ) (dashed-dotted lines) for different sample sizes (vertically; n ∈ {25, 50, 75, 100}) and size (horizontally; left: α = .10,right: α = .05) Table 5 further shows that the modified-likelihood approach outperforms biascorrection of the maximum-likelihood estimator in Designs A3 and B3 and, in particular, in Designs A4 and B4.There, bias-correction of maximum likelihood introduces rather substantial additional bias relative to θ.The additional bias also leads to a large deterioration of the empirical size of the Wald statistic associated with θBC , Simulation results for the maximum-likelihood estimator ( θ), the modified profile-likelihood estimator ( θ), and the modified profile-likelihood estimator exploiting the likelihood structure ( θ) with actual sizes ranging up to seven times the nominal size.This type of sensitivity of analytical bias-correction has equally been observed in panel data applications; see (Dhaene and Jochmans 2015) and Higgins and Jochmans (2023).The performance of the modified likelihood is comparable to Graham's 'tetrad logit' estimator θTETRAD in terms of bias, and it tends to be somewhat more accurate in terms of the empirical size of the associated hypothesis tests.Moreover, inference based on the 'tetrad logit' estimator is conservative in all designs even though, with n = 100 and therefore 4, 950 dyadic observations, the sample size is large.In addition, the 'tetrad logit' estimator is computationally prohibitive in large networks.Simulation results for the maximum-likelihood estimator ( θ), the modified profile-likelihood estimator ( θ), the modified profile-likelihood estimator exploiting the likelihood structure ( θ, the bias-corrected maximum-likelihood estimator ( θBC ), and the 'tetrad logit' estimator ( θTETRAD Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix: A linear Bradley-Terry model
As an alternative to the specification of the Neyman and Scott (1948) model with complementarities, now suppose that independently across dyads.This model is overparameterized as, clearly, the mean of the β i is not identified.A common normalization in this type of model is n i=1 β i = 0 (Simons and Yao 1999), and we will maintain it here.The constrained likelihood is where λ is the Lagrange multiplier for our normalization constraint.The first-order condition for the constrained problem for β i for a given ϑ equals j>i z i j − j<i z ji ϑ − n ϑ β i = 0.
This gives βi = j>i z i j − j<i z ji n = zi (say) for all i and any ϑ.Observe that the sign of βi is driven by the comparison of the magnitudes of i< j z i j and i> j z ji .Also note that n i=1 βi = 0 holds.We therefore have and with it, the maximum-likelihood estimator θ = 2 n(n − 1) n i=1 i< j (z i j − zi + z j ) 2 .123 Simulation results for the maximum-likelihood estimator ( θ), the modified profile-likelihood estimator ( θ), and the modified profile-likelihood estimator exploiting the likelihood structure ( θ)

Table 2
Simulation designs for the β-model Power curves.Design A1 for all n.