1 Introduction

A growing literature has uncovered the importance of interactions between agents through networks as drivers of economic and social outcomes. A leading approach to the statistical modeling of dyadic interaction is through the inclusion of agent-specific parameters (see, e.g., Snijders 2011 for many references). A specific example that has received substantial attention in the recent literature is the \(\beta \)-model for network formation. There, agent fixed effects serve to capture degree heterogeneity in link formation and the inclusion of dyad-level covariates reflects homophily; see, e.g., Graham (2017), Jochmans (2018), and Dzemski (2019).

Estimation of fixed-effect models for dyadic data is non-standard, as the number of parameters grows with the sample size in a manner similar to the classic incidental-parameter problem for one-way panel data discussed in Neyman and Scott (1948). Under so-called dense-network asymptotics, common parameters in regular models can be consistently estimated, but inference is plagued by asymptotic bias; see Fernández-Val and Weidner (2016, 2018) and Graham (2017) for examples, discussion, and approaches to bias-correcting the point estimator.

In this paper we look at generic estimation problems for undirected dyadic data and consider inference based on modifying the likelihood function in the spirit of Pace and Salvan (2006) and Arellano and Hahn (2006, 2007). In its most general form, the modified likelihood is a bias-corrected version of the profile likelihood, that is, of the likelihood after the nuisance parameters have been profiled out. The adjustment is both general and simple in form, involving only the score and Hessian of the likelihood with respect to the nuisance parameters. It removes the leading bias from the profile likelihood and leads to asymptotically unbiased inference and to likelihood ratio statistics that are \(\chi ^2\)-distributed under the null. The form of the adjustment can be specialized by exploiting the likelihood structure, as in DiCiccio et al. (1996).

We work out the modifications to the profile likelihood in a linear version of the \(\beta \)-model and (in the appendix) in a linear version of the Bradley and Terry (1952) model for paired comparisons. These simple illustrations give insight into how the adjustments work. We next apply them to the \(\beta \)-model in the simulation designs of Graham (2017). We find that both modifications dramatically improve on maximum likelihood in terms of bias, mean squared error, and the reliability of statistical inference, and that they are considerably more reliable than ex post bias-correction of the maximum-likelihood estimator.

2 Fixed-effect models for dyadic data

We consider data on dyadic interactions between n agents. For each of the \(\nicefrac {n(n-1)}{2}\) distinct agent pairs \((i,j)\) with \(i<j\) we observe the random variable \(z_{ij}\), which may be a vector. The density of \(z_{ij}\) (relative to some dominating measure) takes the form \( f(z_{ij};\vartheta ,\beta _i,\beta _j), \) where \(\vartheta \) and \(\beta _1,\ldots ,\beta _n\) are unknown Euclidean parameters. We may observe an outcome \(y_{ij}\) generated by pair \((i,j)\) together with a vector of dyad characteristics \(x_{ij}\), in which case we have \(z_{ij}=(y_{ij}, x_{ij}^\prime )^\prime \), and we could consider the distribution of the outcome conditional on the covariates. In what follows, we take the \(z_{ij}\) to be (conditionally) independent. Models of this form are relevant in many areas. Examples include the analysis of network formation as mentioned before, but also the study of strategic behavior among agents (Bajari et al. 2010) and the construction of rankings (Bradley and Terry 1952). Our goal is to perform inference on \(\vartheta \) treating the \(\beta _i\) as fixed effects.

The log-likelihood is

$$\begin{aligned} \ell (\vartheta ,\beta ) = \sum _{i=1}^n\sum _{j>i} \log f(z_{ij};\vartheta ,\beta _i,\beta _j), \end{aligned}$$

where we let \(\beta =(\beta _1,\ldots ,\beta _n)^\prime \). For simplicity of exposition we ignore any normalization that may be needed on \(\beta \) to achieve identification. When a normalization of the form \(c(\beta )=0\) is needed, everything to follow goes through on replacing \(\ell (\vartheta ,\beta )\) by the constrained likelihood \(\ell (\vartheta ,\beta )-\lambda \, c(\beta )\), where \(\lambda \) denotes the Lagrange multiplier. We give a detailed example in the appendix.

It is useful to recall that the maximum-likelihood estimator of \(\vartheta \) can be expressed as

$$\begin{aligned} {\hat{\vartheta }} = \arg \max _\vartheta {\hat{\ell }}(\vartheta ), \end{aligned}$$

where \({\hat{\ell }}(\vartheta ) = \ell (\vartheta ,{\hat{\beta }}(\vartheta ))\) is the profile likelihood, with

$$\begin{aligned} {\hat{\beta }}(\vartheta ) = \arg \max _\beta \ell (\vartheta ,\beta ) \end{aligned}$$

the maximum-likelihood estimator of the fixed effects for given \(\vartheta \).
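In practice, the inner maximization defining \({\hat{\beta }}(\vartheta )\) can be carried out numerically for each candidate \(\vartheta \). The following is a minimal sketch of this profiling step (not from the paper; `loglik` is a hypothetical user-supplied function returning \(\ell (\vartheta ,\beta )\)):

```python
# Minimal sketch of numerical profiling; `loglik(theta, beta, data)` is a
# hypothetical callable returning the dyadic log-likelihood ell(theta, beta).
import numpy as np
from scipy.optimize import minimize

def profile_loglik(theta, data, n, loglik):
    """Return max_beta loglik(theta, beta, data) and the maximizer beta_hat(theta)."""
    res = minimize(lambda b: -loglik(theta, b, data), x0=np.zeros(n))
    return -res.fun, res.x  # profile-likelihood value and beta_hat(theta)
```

For the models considered below, the inner problem is concave in \(\beta \), so this step is well behaved; Newton-type updates that exploit the structure of the Hessian are faster in large networks.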

Inference based on the profile likelihood performs poorly because the estimation noise in \({\hat{\beta }}(\vartheta )\) introduces non-negligible bias. Indeed, in regular settings,

$$\begin{aligned} {\mathbb {E}}({\hat{\vartheta }}-\vartheta ) = O(n^{-1}), \qquad \qquad {\mathbb {E}}(({\hat{\vartheta }}-{\mathbb {E}}({{\hat{\vartheta }}}))^2)=O(n^{-2}), \end{aligned}$$

so that bias and standard deviation are of the same order of magnitude. Consequently, the maximum-likelihood estimator is asymptotically biased.

2.1 Modified profile likelihood

In their simplest form, modified likelihoods can be understood as yielding a superior approximation to the target likelihood

$$\begin{aligned} \ell (\vartheta ) = \ell (\vartheta ,\beta (\vartheta )), \qquad \beta (\vartheta ) = \arg \max _\beta {\mathbb {E}}(\ell (\vartheta ,\beta )). \end{aligned}$$

The profile likelihood is the sample counterpart to this infeasible target likelihood. Replacing \(\beta (\vartheta )\) with \({\hat{\beta }}(\vartheta )\) introduces bias that leads to invalid inference. To see this, suppose that

$$\begin{aligned} {\hat{\beta }}(\vartheta )-\beta (\vartheta ) = \Sigma (\vartheta )^{-1} V(\vartheta ) + O_p(n^{-1}), \end{aligned}$$
(2.1)

where we introduce the score vector and the expected negative Hessian of the likelihood with respect to the nuisance parameters,

$$\begin{aligned} V(\vartheta ) = \left. \frac{\partial \ell (\vartheta ,\beta )}{\partial \beta } \right| _{\beta =\beta (\vartheta )}, \qquad \Sigma (\vartheta ) = - \, {\mathbb {E}}\left( \left. \frac{\partial ^2 \ell (\vartheta ,\beta )}{\partial \beta \, \partial \beta ^\prime } \right| _{\beta =\beta (\vartheta )} \right) . \end{aligned}$$
An expansion of the profile likelihood around \(\beta (\vartheta )\) yields

$$\begin{aligned} {\hat{\ell }}(\vartheta )-\ell (\vartheta ) = ({\hat{\beta }}(\vartheta )-\beta (\vartheta ))^\prime V(\vartheta ) - \frac{1}{2} ({\hat{\beta }}(\vartheta )-\beta (\vartheta ))^\prime \Sigma (\vartheta ) ({\hat{\beta }}(\vartheta )-\beta (\vartheta )) + O_p(n^{-1/2}). \end{aligned}$$

Substituting (2.1) into this expansion gives \({\hat{\ell }}(\vartheta )-\ell (\vartheta ) = \frac{1}{2}\, V(\vartheta )^\prime \Sigma (\vartheta )^{-1} V(\vartheta ) + O_p(n^{-1/2})\). Taking expectations then shows that the bias of the profile likelihood is of the form

$$\begin{aligned} {\mathbb {E}} ({\hat{\ell }}(\vartheta )-\ell (\vartheta )) = \frac{1}{2} \textrm{trace} (\Sigma (\vartheta )^{-1} \Omega (\vartheta )) + O(n^{-1/2}) \end{aligned}$$
(2.2)

for

$$\begin{aligned} \Omega (\vartheta ) = {\mathbb {E}}[V(\vartheta )\, V(\vartheta )^\prime ], \end{aligned}$$

the variance of \(V(\vartheta )\).

Equation (2.1) is a conventional asymptotically linear representation of the estimator of the fixed effects; see, e.g., Rilstone et al. (1996). Low-level conditions for it to go through in specific models are provided in Fernández-Val and Weidner (2016, 2018). The difficulty in the current case, as opposed to, say, the one-way panel data model (as dealt with in Hahn and Newey 2004), lies in handling the non-sparse nature of the Hessian matrix.

With (2.2) in hand, a modified likelihood is

$$\begin{aligned} {\dot{\ell }}(\vartheta ) = {\hat{\ell }}(\vartheta ) - \frac{1}{2} \textrm{trace} ( {\hat{\Sigma }}(\vartheta )^{-1}{\hat{\Omega }}(\vartheta ) ), \end{aligned}$$

where we define the plug-in estimators

$$\begin{aligned} {\hat{\Sigma }}(\vartheta ) = {\hat{\Sigma }}(\vartheta ,{\hat{\beta }}(\vartheta )), \qquad {\hat{\Omega }}(\vartheta ) = {\hat{\Omega }}(\vartheta ,{\hat{\beta }}(\vartheta )), \end{aligned}$$

for matrices

$$\begin{aligned} -({\hat{\Sigma }}(\vartheta ,\beta ))_{i,j} = \left\{ \begin{array}{cl} \sum _{k>i} \frac{\partial ^2 \log f(z_{ik};\vartheta ,\beta _i,\beta _k)}{\partial \beta _i^2} + \sum _{k<i} \frac{\partial ^2 \log f(z_{ki};\vartheta ,\beta _k,\beta _i)}{\partial \beta _i^2} & \quad \text {if } i=j \\ \frac{\partial ^2 \log f(z_{ij};\vartheta ,\beta _i,\beta _j)}{\partial \beta _i\,\partial \beta _j} & \quad \text {if } i<j \\ \frac{\partial ^2 \log f(z_{ji};\vartheta ,\beta _j,\beta _i)}{\partial \beta _i\,\partial \beta _j} & \quad \text {if } i>j \end{array} \right. \end{aligned}$$

and

$$\begin{aligned} ({\hat{\Omega }}(\vartheta ,\beta ))_{i,j} = \left\{ \begin{array}{cl} \sum _{k>i} \left( \frac{\partial \log f(z_{ik};\vartheta ,\beta _i,\beta _k)}{\partial \beta _i}\right) ^2 + \sum _{k<i} \left( \frac{\partial \log f(z_{ki};\vartheta ,\beta _k,\beta _i)}{\partial \beta _i}\right) ^2 & \quad \text {if } i=j \\ \frac{\partial \log f(z_{ij};\vartheta ,\beta _i,\beta _j)}{\partial \beta _i} \, \frac{\partial \log f(z_{ij};\vartheta ,\beta _i,\beta _j)}{\partial \beta _j} & \quad \text {if } i<j \\ \frac{\partial \log f(z_{ji};\vartheta ,\beta _j,\beta _i)}{\partial \beta _i}\, \frac{\partial \log f(z_{ji};\vartheta ,\beta _j,\beta _i)}{\partial \beta _j} & \quad \text {if } i>j \end{array} \right. \end{aligned}$$
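For a generic density f, \({\hat{\Sigma }}(\vartheta )\) and \({\hat{\Omega }}(\vartheta )\) can be assembled dyad by dyad from these formulas. Below is a minimal sketch of the resulting computation of \({\dot{\ell }}(\vartheta )\), assuming a hypothetical user-supplied callable `derivs` that returns the per-dyad first and second derivatives of \(\log f\) with respect to \((\beta _i,\beta _j)\); this is an illustration, not the authors' implementation:

```python
import numpy as np

def trace_adjusted_loglik(theta, z, beta_hat, profile_value, derivs):
    """Compute l_dot(theta) = l_hat(theta) - 0.5 * trace(Sigma^{-1} Omega).

    `derivs(z_ij, theta, bi, bj)` is a hypothetical callable returning
    (gi, gj, hii, hjj, hij): the first and second derivatives of
    log f(z_ij; theta, bi, bj) with respect to (beta_i, beta_j)."""
    n = len(beta_hat)
    Sigma = np.zeros((n, n))  # minus the Hessian of the likelihood in beta
    Omega = np.zeros((n, n))  # outer products of the score in beta
    for i in range(n):
        for j in range(i + 1, n):
            gi, gj, hii, hjj, hij = derivs(z[i][j], theta, beta_hat[i], beta_hat[j])
            Sigma[i, i] -= hii
            Sigma[j, j] -= hjj
            Sigma[i, j] -= hij
            Sigma[j, i] -= hij
            Omega[i, i] += gi ** 2
            Omega[j, j] += gj ** 2
            Omega[i, j] += gi * gj
            Omega[j, i] += gi * gj
    return profile_value - 0.5 * np.trace(np.linalg.solve(Sigma, Omega))
```

Solving the linear system \({\hat{\Sigma }}(\vartheta ) X = {\hat{\Omega }}(\vartheta )\) rather than forming the inverse explicitly is both cheaper and more stable.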

This modification removes the leading bias from the profile likelihood. Consequently, in large samples, the likelihood ratio statistic has correct size and

$$\begin{aligned} {\dot{\vartheta }} = \arg \max _\vartheta {\dot{\ell }}(\vartheta ), \end{aligned}$$

will have bias \(o(n^{-1})\). Furthermore, under the usual regularity conditions, we have the limit result

$$\begin{aligned} ({\dot{\vartheta }}-\vartheta ) \overset{a}{\sim }N\left( 0,\frac{I(\vartheta )^{-1}}{n(n-1)/2}\right) \end{aligned}$$

as \(n\rightarrow \infty \), where we let

$$\begin{aligned} I(\vartheta ) = -\lim _{n\rightarrow \infty } \frac{{\mathbb {E}}\left( \partial ^2 \ell (\vartheta )/\partial \vartheta \, \partial \vartheta ^\prime \right) }{n(n-1)/2} \end{aligned}$$

be the Fisher information for \(\vartheta \).

The only point at which the likelihood setting has been used so far is in the statement of the limit distribution of \({\dot{\vartheta }}-\vartheta \), where the expression for the asymptotic variance exploits the information equality. Bias-corrected estimation, using the same formula for the bias as before, thus carries over to more general extremum-type estimation problems; the only change is that the asymptotic variance then takes the sandwich form \( I(\vartheta )^{-1} \Omega (\vartheta ) \, I(\vartheta )^{-1}, \) where \(\Omega (\vartheta )\) now denotes the limit variance of the (normalized) score of the objective function with respect to \(\vartheta \).

Alternatively, following the arguments in Arellano and Hahn (2007), we can exploit the likelihood structure to get

$$\begin{aligned} \frac{1}{2} \textrm{trace} ( {\hat{\Sigma }}(\vartheta )^{-1}{\hat{\Omega }}(\vartheta ) ) = - \frac{1}{2} \log (\det {\hat{\Sigma }}(\vartheta )) + \frac{1}{2} \log (\det {\hat{\Omega }}(\vartheta ) ) + O(n^{-1}), \end{aligned}$$

which validates the alternative modified likelihood

$$\begin{aligned} \ddot{\ell }(\vartheta ) = {\hat{\ell }}(\vartheta ) + \frac{1}{2} \log (\det {\hat{\Sigma }}(\vartheta )) - \frac{1}{2} \log (\det {\hat{\Omega }}(\vartheta ) ); \end{aligned}$$

see DiCiccio et al. (1996). Its maximizer, say \(\ddot{\vartheta }\), satisfies the same asymptotic properties as \({\dot{\vartheta }}\).
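The determinant-based variant differs only in the final step. Reusing the matrices built in the previous sketch, and computing log-determinants via `slogdet` for numerical stability (again a sketch, not the authors' implementation):

```python
import numpy as np

def det_adjusted_loglik(profile_value, Sigma, Omega):
    """l_ddot(theta) = profile likelihood + 0.5 log det Sigma - 0.5 log det Omega."""
    _, logdet_S = np.linalg.slogdet(Sigma)  # sign-and-log-det avoids overflow
    _, logdet_O = np.linalg.slogdet(Omega)
    return profile_value + 0.5 * logdet_S - 0.5 * logdet_O
```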

2.2 Illustration: a linear \(\beta \)-model

Consider the following extension of the classic many normal means problem of Neyman and Scott (1948). Data are generated as

$$\begin{aligned} z_{ij} \sim N(\beta _i+\beta _j,\vartheta ), \end{aligned}$$

and are independent across dyads. The likelihood function for all parameters (ignoring additive constants) is

$$\begin{aligned} \ell (\vartheta ,\beta ) = -\frac{1}{2}\frac{n(n-1)}{2} \log \vartheta - \frac{1}{2} \sum _{i=1}^n\sum _{j>i} \frac{(z_{ij}-\beta _i-\beta _j)^2}{\vartheta }. \end{aligned}$$

Its first two derivatives with respect to the \(\beta _i\) are

$$\begin{aligned} \frac{\partial \ell (\vartheta ,\beta )}{\partial \beta _i} = \sum _{j>i} \frac{z_{ij}-\beta _i-\beta _j}{\vartheta } + \sum _{j<i} \frac{z_{ji}-\beta _j-\beta _i}{\vartheta } \end{aligned}$$

and

$$\begin{aligned} \frac{\partial ^2\ell (\vartheta ,\beta )}{\partial \beta _i\partial \beta _j} = \left\{ \begin{array}{cl} -\frac{n-1}{\vartheta } & \quad \text {if } i=j \\ -\frac{1}{\vartheta } & \quad \text {if } i\ne j \end{array}\right. . \end{aligned}$$

Let \({\tilde{z}}_i=(n-2)^{-1}\sum _{j>i} z_{ij}+(n-2)^{-1}\sum _{j<i} z_{ji}\) and \({\overline{z}}=(2(n-1))^{-1}\sum _{i=1}^n {\tilde{z}}_i\). Solving for the maximum-likelihood estimator of \(\beta _i\) gives \( {\hat{\beta }}_i = {\tilde{z}}_i - {\overline{z}} \) for any \(\vartheta \). The profile likelihood is therefore

$$\begin{aligned} {\hat{\ell }}(\vartheta ) = -\frac{1}{2}\frac{n(n-1)}{2} \log \vartheta - \frac{1}{2} \sum _{i=1}^n \sum _{j>i} \frac{ (z_{ij}-({\tilde{z}}_i - {\overline{z}})-({\tilde{z}}_j - {\overline{z}}) )^2}{\vartheta }, \end{aligned}$$

and its maximizer is

$$\begin{aligned} {\hat{\vartheta }} = \frac{2}{n(n-1)} \sum _{i=1}^n \sum _{j>i} ( z_{ij}-({\tilde{z}}_i - {\overline{z}})-({\tilde{z}}_j - {\overline{z}}))^2. \end{aligned}$$
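These closed-form expressions are easy to verify numerically. A minimal sketch, assuming the data are stored in a symmetric array `Z` with \(Z[i,j]=z_{ij}\) (the diagonal is ignored):

```python
import numpy as np

def normal_means_mle(Z):
    """Closed-form MLE in the linear beta-model: returns (theta_hat, beta_hat)."""
    n = Z.shape[0]
    z_tilde = (Z.sum(axis=1) - np.diag(Z)) / (n - 2)  # sums over j != i
    z_bar = z_tilde.sum() / (2 * (n - 1))
    beta_hat = z_tilde - z_bar
    resid = Z - beta_hat[:, None] - beta_hat[None, :]
    iu = np.triu_indices(n, k=1)                       # dyads with i < j
    theta_hat = (resid[iu] ** 2).mean()                # average squared residual
    return theta_hat, beta_hat
```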

Some tedious but straightforward calculations yield

$$\begin{aligned} {\mathbb {E}}({\hat{\vartheta }}-\vartheta ) = -\frac{2}{n-1}\, \vartheta , \qquad \textrm{var}({\hat{\vartheta }}) = \frac{n-3}{n-1}\, \frac{2\, \vartheta ^2}{n(n-1)/2}, \end{aligned}$$

which confirms that the maximum-likelihood estimator of \(\vartheta \) suffers from asymptotic bias. Moreover,

$$\begin{aligned} \sqrt{\frac{n(n-1)}{2}} ({\hat{\vartheta }}-\vartheta ) \overset{d}{\rightarrow }\ N\big (-\sqrt{2}\vartheta , ({\sqrt{2}\vartheta })^2\big ), \end{aligned}$$

as \(n\rightarrow \infty \).

To set up the modified likelihood, first note that \({\hat{\Sigma }}(\vartheta ) = ((n-2)\, I_n + \iota _n \iota _n^\prime )/\vartheta \), where \(\iota _n\) denotes the n-vector of ones, so that the Sherman–Morrison formula yields its inverse:

$$\begin{aligned} ({\hat{\Sigma }}(\vartheta ))_{i,j} = \left\{ \begin{array}{cl} \frac{n-1}{\vartheta } & \quad \text {if } i = j \\ \frac{1}{\vartheta } & \quad \text {if } i\ne j \end{array} \right. , \quad ({\hat{\Sigma }}(\vartheta )^{-1})_{i,j} = \left\{ \begin{array}{rl} \frac{\vartheta }{2}\frac{2n-3}{(n-1)(n-2)} & \quad \text {if } i = j \\ -\frac{\vartheta }{2}\frac{1}{(n-1)(n-2)} & \quad \text {if } i\ne j \end{array} \right. , \end{aligned}$$

Next, observe that

$$\begin{aligned} ({\hat{\Omega }}(\vartheta ))_{i,j} = \left\{ \begin{array}{cl} \sum _{k>i} \frac{(z_{ik}-({\tilde{z}}_i -{\overline{z}})-({\tilde{z}}_k-{\overline{z}}))^2}{\vartheta ^2} + \sum _{k<i} \frac{(z_{ki}-({\tilde{z}}_k -{\overline{z}})-({\tilde{z}}_i-{\overline{z}}))^2}{\vartheta ^2} & \quad \text {if } i = j \\ \frac{(z_{ij}-({\tilde{z}}_i -{\overline{z}})-({\tilde{z}}_j-{\overline{z}}))^2}{\vartheta ^2} & \quad \text {if } i< j \\ \frac{(z_{ji}-({\tilde{z}}_j -{\overline{z}})-({\tilde{z}}_i-{\overline{z}}))^2}{\vartheta ^2} & \quad \text {if } i> j \end{array} \right. . \end{aligned}$$

It is then easily seen that

$$\begin{aligned} \frac{1}{2} \textrm{trace} ({\hat{\Sigma }}(\vartheta )^{-1}{\hat{\Omega }}(\vartheta )) = \frac{1}{2}\, \frac{2}{n-1} \sum _{i=1}^n \sum _{j>i} \frac{(z_{ij}-({\tilde{z}}_i -{\overline{z}})-({\tilde{z}}_j-{\overline{z}}))^2}{\vartheta }. \end{aligned}$$

From this we obtain

$$\begin{aligned} {\dot{\ell }}(\vartheta ) = -\frac{1}{2}\frac{n(n-1)}{2} \log \vartheta - \left( 1+\frac{2}{n-1}\right) \frac{1}{2} \sum _{i=1}^n \sum _{j>i} \frac{ (z_{ij}-({\tilde{z}}_i - {\overline{z}})-({\tilde{z}}_j - {\overline{z}}) )^2}{\vartheta }, \end{aligned}$$

and its maximizer

$$\begin{aligned} {\dot{\vartheta }} = \frac{n+1}{n-1} \, {\hat{\vartheta }} = {\hat{\vartheta }} + \frac{2}{n-1} {\hat{\vartheta }}. \end{aligned}$$

Clearly, this estimator removes the leading bias from the maximum-likelihood estimator. Moreover,

$$\begin{aligned} {\mathbb {E}}({\dot{\vartheta }}-\vartheta )= -\left( \frac{2}{n-1}\right) ^2 \vartheta , \qquad \textrm{var}({\dot{\vartheta }}) = \frac{(n+1)^2(n-3)}{(n-1)^3}\, \frac{2\, \vartheta ^2}{n(n-1)/2}, \end{aligned}$$

which shows that the remaining bias in the point estimator is small relative to its standard deviation.

As an alternative correction, we may exploit the likelihood structure and adjust the profile likelihood by the term

$$\begin{aligned} \frac{1}{2} \log (\det {\hat{\Sigma }}(\vartheta )) - \frac{1}{2} \log (\det {\hat{\Omega }}(\vartheta )) = \frac{n}{2}\, \log \, \vartheta + c, \end{aligned}$$

where c is a constant that does not depend on \(\vartheta \). This yields the modification

$$\begin{aligned} \ddot{\ell }(\vartheta ) = -\frac{1}{2}\frac{n(n-3)}{2} \log \vartheta - \frac{1}{2} \sum _{i=1}^n \sum _{j>i} \frac{ (z_{ij}-({\tilde{z}}_i - {\overline{z}})-({\tilde{z}}_j - {\overline{z}}) )^2}{\vartheta }, \end{aligned}$$

whose maximizer satisfies

$$\begin{aligned} {\mathbb {E}}(\ddot{\vartheta } -\vartheta ) = 0, \qquad \textrm{var}(\ddot{\vartheta }) = \frac{2\, \vartheta ^2}{n(n-3)/2}. \end{aligned}$$

This estimator is exactly unbiased.
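Since \({\dot{\vartheta }}\) and \(\ddot{\vartheta }\) are exact scalar multiples of \({\hat{\vartheta }}\) in this model (\({\dot{\vartheta }} = \frac{n+1}{n-1}\,{\hat{\vartheta }}\) and, from the first-order condition for \(\ddot{\ell }\), \(\ddot{\vartheta } = \frac{n-1}{n-3}\,{\hat{\vartheta }}\)), the bias formulas above are easily verified by simulation. A quick Monte Carlo sketch (not from the paper), reusing `normal_means_mle` from the sketch above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta, n_rep = 25, 1.0, 20_000
beta = rng.normal(size=n)
bias = {"mle": 0.0, "dot": 0.0, "ddot": 0.0}
for _ in range(n_rep):
    E = np.triu(rng.normal(scale=np.sqrt(theta), size=(n, n)), k=1)
    Z = beta[:, None] + beta[None, :] + E + E.T            # symmetric dyadic data
    theta_hat, _ = normal_means_mle(Z)
    bias["mle"] += theta_hat - theta
    bias["dot"] += (n + 1) / (n - 1) * theta_hat - theta   # trace adjustment
    bias["ddot"] += (n - 1) / (n - 3) * theta_hat - theta  # determinant adjustment
for k in bias:
    print(k, bias[k] / n_rep)  # approx -2/(n-1), -4/(n-1)^2, and 0 (times theta)
```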

To give an idea of the magnitude of the bias in this problem, Table 1 contains the bias and standard deviation of the estimators \({\hat{\vartheta }}\), \({\dot{\vartheta }}\), and \(\ddot{\vartheta }\) for various sample sizes n, with the variance parameter fixed at \(\vartheta =1\). These results are invariant to the values of the \(\beta _i\) and, because all three estimators scale linearly in \(\vartheta \), can be interpreted as relative biases for general values of \(\vartheta \).

Table 1 Many normal means

3 Application to the \(\beta \)-model

The \(\beta \)-model of network formation models Bernoulli outcome variables as having success probability

$$\begin{aligned} {\mathbb {P}}(y_{ij}=1\vert x_{ij};\vartheta ,\beta _i,\beta _j) = F(\beta _i+\beta _j+x_{ij}^\prime \vartheta ), \end{aligned}$$

where \(F(a)=(1+e^{-a})^{-1}\) is the logit link function. We now present the results from a Monte Carlo experiment. The designs are borrowed from Graham (2017) and all take the following form. Let the \(u_i\in \lbrace -1, 1 \rbrace \) be independent with \({\mathbb {P}}(u_i=1) = \frac{1}{2}\). We generate the dyad covariate as

$$\begin{aligned} x_{ij} = u_i \, u_j, \end{aligned}$$

and the fixed effects as

$$\begin{aligned} \beta _i = \mu + \gamma _1 \, \frac{1+u_i}{2} + \gamma _2 \, \frac{1-u_i}{2} + v_i, \end{aligned}$$

where \(v_i\sim \textrm{Beta}(\lambda _1,\lambda _2)\). We set \(\mu = - \lambda _1 (\lambda _1+\lambda _2)^{-1}\) so that \(\mu +v_i\) has mean zero, and we consider several choices for the parameters \((\gamma _1,\gamma _2)\) and \((\lambda _1,\lambda _2)\). The parameter choices are summarized in Table 2. In the first four designs (A1–A4), the \(\beta _i\) are drawn independently of \(x_{ij}\) from symmetric Beta distributions. In the next four designs (B1–B4), the \(\beta _i\) are generated from skewed distributions that depend on \(u_{i}\) (and thus correlate with the regressor \(x_{ij}\)). For both the A and B designs, the average number of observed links per agent goes down as we move from the first design (A1 and B1) to the fourth design (A4 and B4), with the average fraction of links per agent decreasing from about \(50\%\) to \(12\%\). This is clear from the second block of Table 2, which contains the average, minimum, and maximum number of links per agent (in percentages).
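A hedged sketch of this data-generating process follows; the function and its argument names are ours, not Graham's, and the parameter values \((\gamma _1,\gamma _2,\lambda _1,\lambda _2)\) should be taken from Table 2:

```python
import numpy as np

def simulate_design(n, theta, gamma1, gamma2, lam1, lam2, rng):
    """Generate (Y, X) for one network under the design described above."""
    u = rng.choice([-1.0, 1.0], size=n)                # P(u_i = 1) = 1/2
    X = np.outer(u, u)                                 # x_ij = u_i * u_j
    mu = -lam1 / (lam1 + lam2)                         # centers mu + v_i at zero
    v = rng.beta(lam1, lam2, size=n)
    beta = mu + gamma1 * (1 + u) / 2 + gamma2 * (1 - u) / 2 + v
    index = beta[:, None] + beta[None, :] + theta * X
    p = 1.0 / (1.0 + np.exp(-index))                   # logit link
    links = np.triu(rng.random((n, n)) < p, k=1)       # independent dyads, i < j
    Y = (links | links.T).astype(float)
    return Y, X
```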

Table 2 Simulation designs for the \(\beta \)-model

We simulate 10,000 data sets for each design for \(n\in \lbrace 25, 50, 75, 100\rbrace \) and \(\vartheta =1\). Because the results across designs are qualitatively very similar, we present the full set of results only for Design A1 (Table 3). Tables 4 and 5 provide the results for \(n\in \lbrace 50,100 \rbrace \) for all designs. Each table contains the mean and median bias of \({\hat{\vartheta }}\), \({\dot{\vartheta }}\), and \(\ddot{\vartheta }\), along with their standard deviation and their interquartile range (both across the Monte Carlo replications). The tables also provide the empirical size of the likelihood ratio test of the null that \(\vartheta =1\) at nominal size \(\alpha \in \lbrace .05,.10 \rbrace \). Inference results based on the Wald statistic, using a plug-in estimator of \(I(\vartheta )\), are very similar and not reported for brevity.
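For completeness, here is a sketch of the likelihood ratio test reported in these tables, where `ell_mod` stands for any of the (modified) profile likelihoods above; all names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

def lr_test(ell_mod, theta0, alpha=0.05):
    """Likelihood-ratio test of theta = theta0 for scalar theta."""
    res = minimize_scalar(lambda t: -ell_mod(t))   # unrestricted maximizer
    lr = 2.0 * (-res.fun - ell_mod(theta0))        # LR statistic
    return lr, lr > chi2.ppf(1 - alpha, df=1)      # compare with chi-square(1)
```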

Because the results for \(n=100\) can be compared (up to Monte Carlo error) with the numerical results collected in Graham (2017, Table 2), Table 5 contains two additional columns in which we reproduce the results for his analytically bias-corrected maximum-likelihood estimator (\({\hat{\vartheta }}_{\textrm{BC}}\)) and his 'tetrad logit' estimator (\({\hat{\vartheta }}_{\textrm{TETRAD}}\)). The latter is based on moment conditions that are free of the \(\beta _i\) by a sufficiency argument. Bias-correcting \({\hat{\vartheta }}\) does not salvage the likelihood ratio statistic, and the conditional likelihood underlying the 'tetrad logit' estimator is a quasi-likelihood and, therefore, does not satisfy the information equality. Hence, the size results for these two estimators are based on the Wald statistic.

Table 3 \(\beta \)-model. Design A1 for all n

Table 3 clearly shows that both the bias and the standard deviation of \({\hat{\vartheta }}\) are of order \(n^{-1}\). Consequently, the likelihood ratio test is size distorted even in large samples. Point estimation through the modified likelihoods gives estimators with small bias relative to their standard error. Even for \(n=25\), the bias is only about \(20\%\) of that of the maximum-likelihood estimator. In larger samples, the estimators are essentially unbiased. Both \({\dot{\vartheta }}\) and \(\ddot{\vartheta }\) are also less volatile than \({\hat{\vartheta }}\). This phenomenon has been observed elsewhere; we refer to Schumann et al. (2022). Thus, here, bias-correction does not come at the cost of increased dispersion. Together with the substantial decrease in mean squared error, inference, too, improves dramatically: the likelihood ratio statistics for \({\dot{\ell }}(\vartheta )\) and \(\ddot{\ell }(\vartheta )\) have near-nominal size for all n.

Fig. 1 Power curves. Design A1 for all n. Power curves for the likelihood ratio statistic based on the profile likelihood \({\hat{\ell }}(\vartheta )\) (solid lines), the modified profile likelihood \({\dot{\ell }}(\vartheta )\) (dashed lines), and the modified profile likelihood that exploits the likelihood structure \(\ddot{\ell }(\vartheta )\) (dash-dotted lines), for different sample sizes (vertically; \(n\in \lbrace 25, 50, 75, 100 \rbrace \)) and nominal sizes (horizontally; left: \(\alpha =.10\), right: \(\alpha =.05\))

To give a more complete picture of inference based on the modified profile likelihoods, Fig. 1 presents power curves for the likelihood ratio statistic that accompany Table 3. The curves for \({\hat{\ell }}(\vartheta )\) (solid lines) are symmetric but not correctly centered, reflecting the size distortion of the associated test. This is so for all sample sizes and significance levels considered. Modifying the likelihood shifts the power curve so that the likelihood ratio test is (approximately) size correct, without significantly altering the shape of the curve. For the smallest sample size considered (\(n=25\); upper two plots) there is a small difference in power between the likelihood ratio tests for \({\dot{\ell }}(\vartheta )\) (dashed lines) and \(\ddot{\ell }(\vartheta )\) (dash-dotted lines); the former has slightly higher power against alternatives \(\vartheta >1\) and slightly lower power against \(\vartheta <1\). This difference vanishes rapidly as n increases, however, in line with the similar performance of the two corrections observed in Table 3.

Table 4 \(\beta \)-model. All designs for \(n=50\)
Table 5 \(\beta \)-model. All designs for \(n=100\)

Tables 4 and 5 show that all conclusions from Design A1 carry over to the other designs. Introducing correlation between the regressor and the heterogeneous coefficients, or skewing the distribution from which the latter are drawn, does not prevent the modified likelihoods from improving on maximum likelihood in terms of both point estimation and inference. A comparison of the two tables also shows that the bias and standard deviation of \({\hat{\vartheta }}\) shrink by half as n doubles, again illustrating that both are of order \(n^{-1}\). The reduction in bias achieved by \({\dot{\vartheta }}\) and \(\ddot{\vartheta }\) and the accompanying improvement in size are apparent in all designs.

Table 5 further shows that the modified-likelihood approach outperforms bias-correction of the maximum-likelihood estimator in Designs A3 and B3 and, in particular, in Designs A4 and B4. There, bias-correction of maximum likelihood introduces rather substantial additional bias relative to \({\hat{\vartheta }}\). The additional bias also leads to a large deterioration of the empirical size of the Wald statistic associated with \({\hat{\vartheta }}_{\textrm{BC}}\), with actual sizes of up to seven times the nominal size. This type of sensitivity of analytical bias-correction has also been observed in panel-data applications; see Dhaene and Jochmans (2015) and Higgins and Jochmans (2023). The performance of the modified likelihoods is comparable to Graham's 'tetrad logit' estimator \({\hat{\vartheta }}_{\textrm{TETRAD}}\) in terms of bias, and they tend to be somewhat more accurate in terms of the empirical size of the associated hypothesis tests. Moreover, inference based on the 'tetrad logit' estimator is conservative in all designs, even though, with \(n=100\) and therefore 4,950 dyadic observations, the sample size is large. In addition, the 'tetrad logit' estimator is computationally prohibitive in large networks.