Maximum Augmented Empirical Likelihood Estimation of Categorical Marginal Models for Large Sparse Contingency Tables

Categorical marginal models (CMMs) are flexible tools for modelling dependent or clustered categorical data, when the dependencies themselves are not of interest. A major limitation of maximum likelihood (ML) estimation of CMMs is that the size of the contingency table increases exponentially with the number of variables, so even for a moderate number of variables, say between 10 and 20, ML estimation can become computationally infeasible. An alternative method, which retains the optimal asymptotic efficiency of ML, is maximum empirical likelihood (MEL) estimation. However, we show that MEL tends to break down for large, sparse contingency tables. As a solution, we propose a new method, which we call maximum augmented empirical likelihood (MAEL) estimation and which involves augmentation of the empirical likelihood support with a number of well-chosen cells. Simulation results show good finite sample performance for very large contingency tables.

Categorical marginal models (CMMs; Bergsma et al., 2009; also see, e.g., Bergsma, 1997; Bergsma & Rudas, 2002; Bartolucci et al., 2007; Colombi & Forcina, 2001; Evans & Forcina, 2013; Lang & Agresti, 1994; Lang, 1996; Molenberghs & Lesaffre, 1999; Rudas & Bergsma, 2023) are flexible tools to model location, spread, and association in dependent or clustered categorical data, when the dependence itself is not of interest. CMMs require data in a table format for input; that is, for a dataset with N respondents and J categorical variables, CMMs require a (vectorized) J-variate contingency table, where each cell corresponds to a response pattern, and the frequencies within the cells represent the observed frequencies of each response pattern. The only assumption of the CMMs under consideration is that the cell frequencies in the contingency table follow a multinomial distribution, rendering a very flexible method.
CMMs can be a valuable psychometric tool since they allow for null-hypothesis significance testing (NHST) of complex coefficients without the need to specify a parametric model or impose additional assumptions. In psychometrics, NHST often occurs under the assumption of a parametric model. For example, testing measurement invariance across several groups is typically done under a structural equation model (e.g., Cheung & Rensvold, 2002). However, rather than testing H_0 (the null hypothesis of interest), we implicitly test H*_0: H_0 plus the assumption that the structural equation model fits the data. Rejecting H*_0 does not provide information about H_0 because H*_0 should be rejected either when H_0 is false or when the structural equation model does not fit the data (cf. Jorgensen et al., 2017). In other fields of psychometrics (e.g., nonparametric modeling, classical test theory) and applied statistics, there is no comprehensive parametric modeling framework. In such situations, it becomes particularly valuable if the assumptions required for NHST are easily satisfied, ensuring that the null hypothesis of interest is not excessively confounded by data failing to meet the assumptions, thus maintaining a close approximation between H*_0 and H_0. The CMM assumption that cell frequencies follow a multinomial distribution is very lenient, implying only that every response pattern should, in principle, be observable.
The process of relaxing assumptions for NHST can be a time-consuming endeavor spanning several years. For instance, in the case of NHST for Cronbach's alpha, there exists a history of research papers progressively relaxing the required assumptions. Feldt derived tests for three types of null hypotheses on Cronbach's alpha: alpha equals some criterion value (Feldt, 1969), alpha is equal across groups (Feldt, 1965), and alpha is equal across different measurements (Feldt, 1980). Feldt assumed that alpha asymptotically follows an F distribution. This assumption was subsequently relaxed by Van Zyl et al. (2000), who derived a distribution without restricting the covariances; Maydeu-Olivares et al. (2007), who relaxed the assumptions of Feldt's first hypothesis by deriving asymptotically distribution-free interval estimates for alpha; Maydeu-Olivares et al. (2010), who proposed testing Feldt's hypotheses in a structural equation modeling framework; and, ultimately, Kuijpers et al. (2013), who proposed using CMMs for testing Feldt's hypotheses. Each successive paper demonstrated significant improvements in the properties of NHST for Cronbach's alpha compared to its predecessors.
In some cases, no hypothesis tests are available, leaving CMMs as a possible option to derive them. For example, Van der Ark et al. (2008) used CMMs to develop NHST for Mokken's (1971) scalability coefficients, which allows testing scalability coefficients for item pairs, individual items, and scales across groups and across measurement occasions. Finally, we note that CMMs can be used in conjunction with latent variable models, although this needs further development. We refer to Bergsma et al. (2009) for other applications of CMMs, and to Bergsma et al. (2009, 2013), who introduced CMMs with latent variables.
CMMs can be estimated using the maximum likelihood (ML) method, which has many favorable properties, including asymptotic efficiency. A serious limitation of the ML method is that estimation is infeasible for large contingency tables, as ML requires the computation of an expected frequency for each cell in the contingency table. This curse of dimensionality may be an important reason why CMMs have failed to become popular in psychometrics. Most psychological and educational tests consist of many variables (usually referred to as items), yielding an extremely large number of possible response patterns and, therefore, extremely large contingency tables. For example, Raven's Advanced Progressive Matrices (Raven et al., 2003), measuring general intelligence, consists of 48 binary items, which yields a contingency table of 2^48 ≈ 2.81 × 10^14 cells; and the personality inventory NEO-PI-R (Costa & McCrae, 2008), measuring five personality traits, consists of 48 five-category items per trait, which yields a contingency table of 5^(5×48) ≈ 5.66 × 10^167 cells. Lloyd (2000) estimated that if every particle in the universe could be used as part of a huge computer, it could store approximately 10^90 bits. Hence, for contingency tables based on psychological and educational tests, the required computer capacity easily exceeds the ultimate physical limits of computation, whereas the ML estimation procedure for CMMs implemented in the R package cmm (Bergsma & Van der Ark, 2023) cannot handle more than a few million cells.
In this paper, we give a new adaptation of the ML estimation procedure to solve the above problem. Although there are alternative estimation procedures that may be used to estimate CMMs, we preferred to stay within an ML framework, as ML guarantees asymptotic efficiency, whereas alternative estimation methods for contingency tables, such as generalized estimating equations (GEEs; e.g., Qaqish & Liang, 1992) and composite likelihood (e.g., Varin et al., 2011), are not asymptotically efficient, and weighted least squares (Grizzle et al., 1969; a.k.a. the GSK method) is sensitive to sparsity in the marginal distribution (cf. Rudas & Bergsma, 2023). In addition, an adaptation of the ML approach is easy to fit into the existing software.
Initially, we considered the empirical likelihood method (Owen, 2001; Qin & Lawless, 1994), a data-driven, nonparametric estimation method. The core idea behind the empirical likelihood method is to construct a likelihood function directly from the observed data, without assuming any specific underlying probability distribution; that is, given vector-valued data x_1, ..., x_N, an empirical likelihood is the likelihood of a probability distribution with support {x_1, ..., x_N} (Owen, 2001). In the context of CMMs, the empirical likelihood method involves constructing the likelihood solely from cells with nonzero frequencies, while regarding cells with zero frequency as structural zeroes and setting their estimated probability to zero. Given that the number of cells with nonzero frequencies cannot exceed the sample size, and in the case of psychological and educational test data the sample size rarely exceeds 10,000, the empirical likelihood method serves as a computationally feasible alternative to ML. We abbreviate the method of maximizing the empirical likelihood subject to model constraints as MEL.
Unfortunately, the support {x_1, ..., x_N} belonging to the empirical likelihood may be too small (i) to estimate the parameters of a CMM, or, even if this can be done, (ii) to estimate the asymptotic covariance matrix of the ML estimators of the parameters of the CMM. We will refer to these two problems as the first- and second-order estimation problems, respectively (see Appendix A for more details). The first problem has also been called the empty set problem (Grendár & Judge, 2009). As far as we are aware, the second problem has not yet been described in the literature. The solution to these problems which we propose in this paper is to augment the empirical likelihood support with a number of well-chosen points, and we will refer to the method of maximizing the resulting empirical likelihood as maximum augmented empirical likelihood (MAEL). Note that as the sample size goes to infinity, assuming no structural zeroes, the probability that all cells in a contingency table have a positive count goes to 1, so for categorical data MEL, MAEL, and ML are asymptotically equivalent.
MEL and MAEL estimators work asymptotically (as N → ∞) because, with probability tending to 1, they are equivalent to the ML estimator. That justifies testing goodness of fit and making inferences for parameters in the same way as we would with ML. Two related methods, called adjusted empirical likelihood (Chen et al., 2008) and balanced augmented empirical likelihood (Emerson & Owen, 2009; also see Nguyen et al., 2015; Xia & Liu, 2019), have been considered for continuous data. These methods augment the data set with one or two additional observations. In contrast, our methodology consists of augmenting only the support of the distributions corresponding to the empirical likelihood with additional points, without adding any observations to the data.
The remainder of the paper is organized as follows. In Sect. 1, we give a brief overview of and notation for CMMs. In Sect. 2, we describe ML and MEL estimation for CMMs and introduce MAEL estimation. In Sect. 3, we present two simulation studies. Study 1 compares the convergence rate and computation time of ML, MEL, and MAEL estimation for small contingency tables, and Study 2 investigates the Type I error rate of CMMs using MAEL estimation for small and large contingency tables, as well as the bias and variance of the model parameters. In Sect. 4, we briefly discuss the advantages and disadvantages of MAEL estimation in relation to other, non-likelihood-based estimation procedures. In Appendix A, we describe the first- and second-order estimation problems in some generality, whereas Appendix B gives details of the estimation algorithm used.

1. CMMs

Consider the categorical variables X_1, ..., X_j, ..., X_J with X_j ∈ {0, ..., g_j}. Let x_1, ..., x_i, ..., x_N be i.i.d. data points, where each x_i = (x_{i1}, ..., x_{iJ}) consists of the scores of the ith respondent on the variables X_1, ..., X_J. The data can be collected in a J-way contingency table of observed frequencies with L = ∏_{j=1}^{J} (g_j + 1) cells. The observed frequency of response pattern (x_1, ..., x_J) on variables (X_1, ..., X_J) is denoted by n^{X_1⋯X_J}_{x_1⋯x_J}. The observed frequencies in the contingency table are collected in an L × 1 vector n, arranged in lexicographical order; that is, the digit in the last row of the corresponding response pattern changes fastest and the digit in the first row changes slowest. As an example, Eq. 1 shows the vector n containing the observed frequencies of the response patterns pertaining to the scores of N = 130 respondents on J = 3 binary variables, a, b, and c. If it is clear which variables are involved, then the superscript may be omitted. Marginal frequencies are denoted by removing the appropriate variable(s) from the superscript and the corresponding score(s) from the subscript. In some formulas, the subscript i in n_i is used as an index; for example, Σ_i n_i means the sum over all elements of n.
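The lexicographical arrangement described above can be sketched in a few lines of Python (a hypothetical helper for illustration, not part of the cmm package):

```python
import itertools
import numpy as np

def vectorize_table(data, n_cats):
    """Collect observed frequencies of response patterns in lexicographical
    order: the last variable's digit changes fastest, the first slowest.

    data   : (N, J) integer array of scores
    n_cats : number of categories per variable
    """
    patterns = list(itertools.product(*[range(g) for g in n_cats]))
    counts = {p: 0 for p in patterns}
    for row in data:
        counts[tuple(row)] += 1
    return np.array([counts[p] for p in patterns])

# Three binary variables: cells ordered (0,0,0), (0,0,1), ..., (1,1,1)
data = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 1], [1, 1, 1]])
n = vectorize_table(data, [2, 2, 2])
# n = [1, 2, 0, 0, 0, 0, 0, 1]
```

The toy data matrix here is made up; `itertools.product` yields exactly the lexicographical cell order used for n.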
The probability that a randomly drawn respondent has response pattern x_1, ..., x_J, given that the CMM of interest is true, is denoted by π^{X_1⋯X_J}_{x_1⋯x_J}. The expected frequencies and probabilities are collected in vectors m and π, respectively, in the same manner as the observed frequencies were collected in n. ML estimates of m and π are denoted by m̂ and π̂, respectively. Without any constraints imposed upon the data, m̂ = n and π̂ = n/N.
Let A be a matrix of zeroes and ones, so that A^T m consists of the relevant marginals from the contingency table. A CMM is defined by constraints of the form

f(A^T m) = Zβ,    (2)

where f is an appropriate function, Z is a design matrix of full column rank, and β is a vector of parameters. For estimation purposes, parameter β is eliminated from the equation as follows. Let B be the orthogonal complement of the column space spanned by the columns of Z (i.e., B^T Z = 0 and the concatenated matrix (B Z) is square and non-singular). By pre-multiplying both sides of Eq. 2 by B^T, the CMM is written as a set of constraints:

B^T f(A^T m) = 0.    (3)

Note that parameter β can be obtained from Eq. 2 by

β = (Z^T Z)^{-1} Z^T f(A^T m).    (4)

The constraint formulation B^T f(A^T m) = 0 (cf. Eq. 3) is computationally convenient since it allows the Lagrange multiplier technique to be used, and asymptotic theory has been developed using this formulation (Aitchison & Silvey, 1958; Lang, 2005). In addition, the parameter formulation f(A^T m) = Zβ (Eq. 2) is not possible if B^T is of full column rank because Z, the orthogonal complement of B, does not exist. Therefore, the parameter formulation of CMMs will be disregarded from here on.
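The elimination of β can be illustrated numerically: the sketch below computes an orthogonal complement B of Z via the SVD and recovers β as in Eq. 4. The one-parameter design matrix Z is made up for the example (three marginals constrained to a common value):

```python
import numpy as np

def orth_complement(Z):
    """Basis B of the orthogonal complement of the column space of Z,
    computed via the SVD, so that B.T @ Z = 0 and (B Z) is square and
    non-singular."""
    u, s, vh = np.linalg.svd(Z.T)
    rank = int((s > 1e-12).sum())
    return vh[rank:].T          # right singular vectors with zero singular value

# Made-up one-parameter design: three marginals share one parameter beta.
Z = np.array([[1.0], [1.0], [1.0]])
B = orth_complement(Z)          # 3 x 2; B.T @ Z == 0

# Recovering beta from f(A'm) = Z beta, as in Eq. 4:
f_ATm = Z @ np.array([0.7])     # constructed so that beta = 0.7
beta = np.linalg.solve(Z.T @ Z, Z.T @ f_ATm)
```

Any basis of the null space of Z^T works here; the SVD is simply a numerically stable way to obtain one.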
For notational convenience, we replace B^T f(A^T m) by g(m), so the shortest notation for a CMM is

g(m) = 0.    (5)

Let D be the number of constraints in Eq. 5, that is, the length of vector g(m). The fit of the CMM can be investigated by comparing n and the ML estimate under the model, m̂, using the likelihood ratio test statistic (G^2) or Pearson's chi-square test statistic (X^2), which have an asymptotic chi-square distribution with D degrees of freedom if the model is true. Example 1 shows a simple CMM following the build-up in Eqs. 2, 3, 4, and 5, whereas Example 2 shows a CMM that has been used in psychometrics.
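A minimal sketch of the two fit statistics, using the usual convention that cells with n_i = 0 contribute zero to G^2 (the toy frequency vectors below are made up, not the data of Eq. 1):

```python
import numpy as np

def fit_statistics(n, m):
    """Likelihood-ratio statistic G^2 and Pearson statistic X^2 comparing
    observed frequencies n with expected frequencies m under the model."""
    n = np.asarray(n, float)
    m = np.asarray(m, float)
    pos = n > 0                                    # zero cells add 0 to G^2
    g2 = 2.0 * np.sum(n[pos] * np.log(n[pos] / m[pos]))
    x2 = np.sum((n - m) ** 2 / m)
    return g2, x2

g2, x2 = fit_statistics([10, 20, 30], [15, 15, 30])
```

Both statistics would be referred to a chi-square distribution with D degrees of freedom.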
Example 1. Consider n in Eq. 1. Suppose that we want to fit the CMM that prescribes marginal homogeneity. First, pre-multiplying m by design matrix A^T (Eq. 2) yields the required margins. To write the CMM as a set of constraints, f(A^T m) is pre-multiplied by constraint matrix B^T (cf. Eq. 3, left-hand side) and set to zero. Because Z is the orthogonal complement of B, with (Z^T Z)^{-1} = 1, parameter β (which in this case is one-dimensional) can be obtained by Eq. 4. The conventional short notation g(m) = 0 (Eq. 5) is obtained by letting g(m) = B^T f(A^T m) (Eq. 9). The vector of expected frequencies that is closest (in an ML sense) to n (Eq. 1) and meets the requirement of Eq. 9 is given by m̂ in Eq. 10. Comparing n given in Eq. 1 and m̂ given in Eq. 10 yields G^2 = 2.6107 (df = 2, p = .2711). Using a nominal Type I error rate of α = .05, the hypothesis of marginal homogeneity should not be rejected.
Example 2. Item-scalability coefficient H_j (j = 1, ..., J) is used in Mokken scale analysis (e.g., Mokken, 1971; Sijtsma & Van der Ark, 2017) and expresses the strength of the relationship between item j and the other items in the test, comparable with a regression coefficient in a regression model. One of the criteria of a Mokken scale is that all coefficients H_j are greater than some lower bound c. The lower bound that is used as a default is c = 0.30 (Sijtsma & Molenaar, 2002). Hence, a relevant question is whether all H_j > 0.30. Coefficients H_j are not independent of each other, and CMMs can be used to control for this nuisance dependence and test all coefficients simultaneously. Under the assumption that the items are numbered in ascending order of the probability of answering the item correctly (i.e., item 1 is the least popular or most difficult item, item J the most popular or least difficult item), item-scalability coefficients H_j, j = 1, ..., J, for dichotomous items (Mokken, 1971, p. 151) are defined as in Eq. 11. Consider the observed frequencies in Eq. 1. Let H = (H_a, H_b, H_c)^T be a vector containing the item-scalability coefficients of items a, b, and c. Equation 11 shows that H is a function of m. The constraints H − (0.3, 0.3, 0.3)^T = 0 define a CMM (Eq. 5); we refer to Van der Ark et al. (2008) for computational details.
The sample values of H_j for the vector of observed frequencies in Eq. 1 are H_a = 0.231, H_b = 0.164, and H_c = 0.055. Fitting the CMM that all item-scalability coefficients equal 0.3 to the data in Eq. 1 yields G^2 = 14.84 (df = 3, p = 0.0023). Using a nominal Type I error rate of α = 0.05, the hypothesis H = (0.3, 0.3, 0.3)^T should be rejected.
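Sample values of the item-scalability coefficients can be computed directly from a data matrix. The sketch below assumes the covariance-ratio form of H_j (observed covariances of item j with the other items, divided by the maximum covariances attainable given the item popularities), which is a standard way to write Mokken's definition for dichotomous items; the toy data are made up:

```python
import numpy as np

def item_scalability(X):
    """Item-scalability coefficients H_j for an (N, J) 0/1 data matrix X,
    using the ratio of summed observed covariances to summed maximum
    covariances given the item popularities."""
    X = np.asarray(X, float)
    p = X.mean(axis=0)                        # item popularities P(X_j = 1)
    cov = np.cov(X, rowvar=False, bias=True)  # population covariances
    J = X.shape[1]
    H = np.empty(J)
    for j in range(J):
        num, den = 0.0, 0.0
        for k in range(J):
            if k != j:
                num += cov[j, k]
                den += min(p[j], p[k]) - p[j] * p[k]  # max covariance
        H[j] = num / den
    return H

# A perfect Guttman pattern: both coefficients equal 1.
H = item_scalability(np.array([[0, 0], [0, 1], [1, 1]]))
```

For real data the coefficients are then compared with the lower bound c, as in the CMM above.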

ML and MEL Estimation
Assuming that the frequency vector n follows a multinomial distribution, the likelihood function is, up to a multiplicative constant,

L(m | n) = ∏_{i=1}^{L} (m_i / N)^{n_i}.    (12)

The maximum likelihood estimate m̂ maximizes L(m | n) subject to the model constraint

g(m) = 0    (13)

and the multinomial constraint

Σ_i m_i = N.    (14)

In Appendix B, an algorithm for finding m̂ is given. For multinomial distributions, MEL estimation is similar to ML estimation, with the difference that all cells for which n_i = 0 are treated as structural zeros. The MEL estimate of m maximizes L(m | n) subject to Eqs. 13 and 14 and the structural-zero constraint

m_i = 0 whenever n_i = 0.    (15)

MEL estimation can be done using the same algorithm as ML estimation because the cells i for which n_i = 0 can simply be left out of the estimation procedure. For MEL estimation, fewer cells need to be estimated, which makes the procedure faster and more suitable for large contingency tables than ML estimation.
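In code, the step from ML to MEL amounts to restricting all vectors (and the constraint function g) to the cells with a positive observed count — a hypothetical sketch with a made-up frequency vector:

```python
import numpy as np

def mel_support(n):
    """Reduce the full frequency vector n to the empirical-likelihood
    support: keep only cells with positive observed counts, recording
    their positions so the constraints can be restricted accordingly."""
    n = np.asarray(n)
    idx = np.flatnonzero(n > 0)   # cells i with n_i > 0
    return n[idx], idx

n = np.array([30, 0, 25, 0, 0, 45, 0, 30])
n_star, idx = mel_support(n)
# n_star = [30, 25, 45, 30]: L* = 4 of L = 8 cells remain
```

The same maximization algorithm then runs on the L* retained cells only, which is what makes MEL feasible for large tables.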
In general, a superscripted asterisk indicates that the cells i for which n_i = 0 are left out; that is, L* is the number of cells for which n_i > 0, and n* is the vector of length L* of nonzero observed frequencies (i.e., n* contains those n_i that are greater than zero). The corresponding expected frequencies and probabilities are denoted m* and π*, respectively, and g*(m*) equals g(m) with the elements of m corresponding to zero observed cells set to zero. Example 3 illustrates MEL estimation.

Example 3. This example illustrates MEL estimation of the CMM in Example 1, for the vector of observed frequencies in Eq. 1 (see Eq. 16). In Eq. 16, n^{abc}_{100} has been omitted, which implies that m^{abc}_{100} is fixed to zero and not considered in the estimation procedure. The CMM in Eq. 9 under MEL reduces to a constraint involving only the cells in n*. Comparing n* given in Eq. 16 with the corresponding MEL estimate m̂* then yields the fit of the model.

The First-and Second-Order Estimation Problems for CMMs
Unfortunately, the support {x_1, ..., x_N} belonging to the empirical likelihood may be too small for the CMM to be estimated and for inference to be carried out. We identify two problems, which are described more formally and in some more generality in Appendix A. We say that the first-order estimation problem occurs if the equation g*(m*) = 0 does not have any solutions. This is also known as the empty set problem (Grendár & Judge, 2009). The second-order estimation problem occurs if the empirical likelihood support is too small to estimate the covariance matrix of the estimated marginal parameters. Occurrence of the first-order problem implies occurrence of the second-order problem, and absence of the second-order problem implies absence of the first-order problem. If the second-order problem occurs, inference for the model is problematic. The first- and second-order estimation problems can occur for MEL estimation with sparse observed contingency tables, as illustrated next.
Example 4. Consider a 2 × 2 contingency table and a vector of observed frequencies for which it can be verified that g*(m*) = m_01 × 1 = 0 does not have any solutions; that is, the first-order estimation problem (or empty set problem) occurs, and hence so does the second-order one. If, on the other hand, we observe a table for which the first-order problem does not occur, the second-order problem may still arise. Assuming Poisson sampling for simplicity, we have var(g(n) − g(m)) = 4m_01 + 4m_10. Under empirical likelihood, this is zero; that is, the variance of the marginal parameter cannot be estimated, and the second-order problem occurs.

MAEL Estimation
A solution to the first- and second-order estimation problems is obtained by augmenting the empirical likelihood support with a number of support cells; we call the resulting method maximum augmented empirical likelihood (MAEL) estimation. The question arises which cells to add. For CMMs, there is a fairly natural choice. In particular, suppose the order-k marginal distributions are of interest for a particular CMM. Then clearly, to avoid the first-order estimation problem, the support must contain, for every marginal cell, at least one cell in the contingency table contributing to it. Hence, this is the least augmentation that should be done for the empirical likelihood support. To avoid the second-order estimation problem, note that the covariance between observed marginals is a function of higher-order marginals; for example, for two binary variables,

cov(n_{1+}, n_{+1}) = N(π_{11} − π_{1+} π_{+1}),

where a plus in the subscript denotes summation over that subscript. If the relevant higher-order marginals are estimable, the second-order estimation problem can typically be avoided.
If the second-order estimation problem occurs, it can be resolved by augmenting the empirical likelihood support so that each of the relevant higher-order marginals has one or more cells contributing to it.We found that the methodology is not affected much by which cells were chosen.In practice, we randomly added cells, which gave good results.
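The random augmentation step might be sketched as follows; this is a hypothetical helper for illustration, and the implementation in the cmm package may differ:

```python
import numpy as np

rng = np.random.default_rng(1)

def augment_support(n, n_extra):
    """MAEL support: all cells with positive counts plus n_extra randomly
    chosen zero cells (the method is reported to be insensitive to which
    zero cells are chosen)."""
    n = np.asarray(n)
    support = set(np.flatnonzero(n > 0))
    zero_cells = np.flatnonzero(n == 0)
    extra = rng.choice(zero_cells, size=min(n_extra, len(zero_cells)),
                       replace=False)
    return np.sort(np.array(list(support | set(extra))))

n = np.array([30, 0, 25, 0, 0, 45, 0, 30])
cells = augment_support(n, n_extra=2)   # L* = 4 nonzero cells + 2 zero cells
```

In practice one would add enough zero cells that every relevant (higher-order) marginal has at least one contributing cell in the support.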
The notation is as follows. For ML estimation, all L cells of n are considered, and for MEL estimation, only the L* cells with a positive observed count, collected in n*, are considered. MAEL can be regarded as an intermediate estimation method, considering the L* cells with a positive observed count plus a number of cells with zero observed count, chosen to avoid the first- and second-order estimation problems. Let L† be such that L* ≤ L† ≤ L, and let n†, m†, and π† denote the augmented vectors of observed frequencies, expected frequencies, and probabilities, respectively.
Example 6 explores some possibilities to augment the empirical likelihood support for a small example, illustrating that the fit of a CMM deteriorates dramatically when too few cells are added to n*.

Example 6. Suppose the marginal homogeneity CMM in Eq. 9 is the CMM of interest. The ML estimate is m̂ = (0, 32.5, 0, 32.5, 32.5, 0, 32.5, 0)^T with G^2 = 180.22 (df = 2). For MEL estimation, the second-order estimation problem occurs, because Eq. 9 reduces to the constraints in Eq. 21. The rows of the design matrix in Eq. 21 contain only nonnegative elements, and the constraints imply that m^{abc}_{100} = m^{abc}_{110} = 0. But since n^{abc}_{100} > 0 and n^{abc}_{110} > 0, the likelihood function is zero whenever Eq. 21 holds; that is, G^2 = ∞.
The problem of a zero likelihood can be circumvented by adding n^{abc}_{011} to n*. Then we obtain m̂† = (0, 54.167, 32.5, 21.67, 21.67)^T with G^2 = 232.92 (df = 2). G^2 is now much closer to the G^2 of the ML solution.

Comparing ML, MEL, and MAEL
Two studies compared the ML, MEL, and MAEL estimation procedures for three CMMs relevant for psychology and the educational sciences:

1. Model "Alpha". Kuijpers et al. (2013) showed that testing whether Cronbach's alpha (α) equals a certain benchmark can be done using a CMM with 1 degree of freedom. Model "Alpha" is α = .8, because .8 is an arbitrary but commonly used benchmark to assess the quality of the test-score reliability (see, e.g., Nunnally, 1978).

2. Model "H_j". For a set of J items, Van der Ark et al. (2008) showed that testing whether each item-scalability coefficient H_j (j = 1, ..., J) equals the researcher-specified lower-bound value c can be done using a CMM with J degrees of freedom. Let H = (H_1, ..., H_J)^T. Model "H_j" is H = 0.3 · 1_J, as .3 is the default value for lower bound c provided by software programs for Mokken scale analysis.

3. Model "Mean". Bergsma et al. (2009, pp. 185-188) showed that testing equality of means of J variables can be done using a CMM with J − 1 degrees of freedom. Investigating equality of means may be useful when investigating whether a set of items is parallel (e.g., Lord & Novick, 1968, pp. 47-50).

Study 1 is an exploratory simulation study to investigate the convergence rate and computation time under various settings. The tables are small to allow ML estimation. In Study 2, we investigated the Type I error rate of CMMs estimated with MAEL for realistic numbers of items in psychological and educational test data. We considered tables ranging from small (16 cells) to enormous (1.1 × 10^12 cells). In addition, we investigated the bias and variance of parameter β. ML estimation was not considered because it is feasible only for small tables, and MEL estimation was not considered because in most cases the algorithm runs into singularity problems and, consequently, does not converge.

Population Models and Estimation
Both Study 1 and Study 2 required population models (i.e., the vector of probabilities π) that comply with the constraints of the CMM under consideration (i.e., "Model Alpha", "Model H_j", or "Model Mean" for J items). The population models were constructed as follows. First, we constructed a two-parameter logistic model (2PLM), a popular item response theory model (Birnbaum, 1968), for which the location and discrimination parameters were selected (by trial and error) such that data generated from that 2PLM were close, in a loose sense, to the requirements of the CMM under consideration. Next, we generated 1,000 response patterns from the 2PLM. Then, using ML (Study 1) or MAEL (Study 2), the CMM under consideration was estimated for the generated data, and the resulting estimated probabilities were used as the probabilities π of the data-generating model. Finally, N observations were sampled from π. This data-generating procedure yields expected frequencies m that meet the constraints of the CMM of interest and have a relatively close fit to the 2PLM.
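The first step of this pipeline, generating response patterns from a 2PLM, can be sketched as follows; the discrimination (alpha) and location (beta) parameters below are made up, not the values used in the studies:

```python
import numpy as np

rng = np.random.default_rng(123)

def generate_2plm(N, alpha, beta):
    """Generate N response patterns from a two-parameter logistic model:
    P(X_ij = 1 | theta_i) = 1 / (1 + exp(-alpha_j * (theta_i - beta_j)))."""
    theta = rng.standard_normal(N)               # latent trait values
    logits = alpha * (theta[:, None] - beta)     # (N, J) array of logits
    p = 1.0 / (1.0 + np.exp(-logits))
    return (rng.random((N, len(beta))) < p).astype(int)

X = generate_2plm(1000, alpha=np.array([1.5, 1.0, 2.0]),
                  beta=np.array([-0.5, 0.0, 0.5]))
```

The generated matrix X would then be tabulated (cf. Eq. 1) and used to fit the CMM whose estimated probabilities serve as π.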
In Study 1, a certain percentage of the probabilities from the population model was deliberately set to zero, so as to create conditions with many zero cells.The cells in π that were set to zero were randomly selected, and afterwards π was rescaled.Note that setting random cells to zero is useful to investigate convergence, but makes investigation of Type I error and bias impossible.
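The zeroing-and-rescaling step can be sketched as a hypothetical helper:

```python
import numpy as np

rng = np.random.default_rng(7)

def zero_and_rescale(pi, proportion):
    """Set a randomly selected proportion of the cells of pi to zero and
    rescale the remaining cells so the probabilities again sum to one."""
    pi = np.asarray(pi, float).copy()
    k = int(round(proportion * len(pi)))
    pi[rng.choice(len(pi), size=k, replace=False)] = 0.0
    return pi / pi.sum()

# Zero 25% of the cells of a uniform 16-cell table (J = 4 binary items).
pi = zero_and_rescale(np.full(16, 1 / 16), proportion=0.25)
```

After rescaling, the 12 remaining cells of this uniform example each carry probability 1/12.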
The CMMs under consideration were estimated using the generated data as input, employing the R package cmm (Bergsma & Van der Ark, 2023), which offers MAEL estimation starting from version 1.0. All CMMs received uniform starting values and a maximum of 1,000 iterations. The code is available on the Open Science Framework at https://osf.io/yz8rm/.

Study 1: convergence Rates and Computation Times
For N = 50, we investigated the effect of four independent variables on convergence rate and computation time. Estimation Procedure had three levels: ML, MEL, and MAEL. Type of CMM had three levels: "Model Alpha", "Model H_j", and "Model Mean". For "Model Alpha" the criterion value was set to the sample value plus .2, and for "Model H_j" the criterion value was set to the average of the sample H_j values. For convenience, the criterion values depend on the sample values; because Study 1 investigated only computation time and convergence rate, sample-dependent criterion values are not a problem. Minimum Percentage of Cells with Zero Observed Frequency (U) had three levels: 0% (none), 25% (small percentage), and 75% (large percentage). Number of Items (J) had two levels: 4 dichotomous items, yielding L = 16 possible response patterns, and 8 items, yielding L = 256 response patterns. The number of items was kept small to allow for ML estimation. Hence, we had a 3 (Estimation Method) × 3 (CMM) × 3 (U) × 2 (J) experimental design with a total of 54 cells. Each cell in the experimental design was replicated 1,000 times. For a small extra design (100 replications), we estimated CMMs with 10 items (L = 1024) to demonstrate the sharp increase in computation time.

Table 1. Convergence rates (percentage) and median computation times in seconds for ML, MEL, and MAEL, for three different CMMs, two numbers of items (J), and three percentages of unobservable response patterns (U), based on 1,000 (J = 4, 8) and 100 (J = 10) replications.
Table 1 shows that for the smallest tables (J = 4 and J = 8), both ML and MAEL almost always converged, whereas MEL often broke down for models "H j " and "Mean".For J = 10, ML ran into memory problems for models "H j " and "Mean", whereas MEL almost always broke down.For Model "Alpha", convergence results were satisfactory for all three estimation methods.
The distribution of the computation time was positively skewed; therefore, we report the median rather than the mean computation time. Naturally, MAEL and MEL were at least as fast as ML, ranging from just as fast to more than 200 times faster. As the number of items increased, the computation time increased dramatically (Table 1, columns 4-6), especially for ML estimation. For 4 and 8 items, the computation time was still reasonable in all cases (never longer than 100 s), but for 10 items (L = 1024) some runs took up to 30 min for Model "Alpha".
The results show that even for moderately large tables, ML may run into memory problems.Moreover, the results show that the first-and second-order estimation problems are omnipresent so that MEL often breaks down.This leaves MAEL as the viable candidate for estimating CMMs for large sparse contingency tables.

Study 2: Type I Error Rate
For MAEL estimation, we investigated the effect of the type of CMM, the number of items, and sample size on the Type I error rate and on the bias and standard deviation of model parameter β (Eq. 2). As in Study 1, Type of CMM had three levels: "Model Alpha" (the criterion value was set to 0.8), "Model H_j" (the criterion value was set to 0.3), and "Model Mean". For "Model Alpha" and "Model H_j", parameter β is fixed to β = 0.8 and β = 0.3 · 1_J, respectively. Hence, bias and standard deviation of β̂ were investigated only for Model "Mean", where β equals the overall mean item score. Moreover, we studied four levels of number of items: 4 (L = 16), 8 (L = 256), 20 (L = 1,048,576), and 40 (L ≈ 1.1 × 10^12); and three levels of sample size (N = 250, N = 500, and N = 1000). Hence, we had a 3 (CMM) × 4 (J) × 3 (N) experimental design with a total of 36 cells. Each cell in the experimental design was replicated 10,000 times for J = 4 and J = 8 items and 1,000 times for J = 20 and J = 40 items. The empirical Type I error rate over the replications was compared to the nominal Type I error rate of 0.05, the mean value of β̂ − β over replications was used to estimate the bias, and the standard deviation of β̂ over replications was used as an estimate of the standard error of β̂.
Table 2 shows the Type I error rates for all cells in the design.In most cells, the Type I error rates are close to the nominal Type I error rate.For models with many degrees of freedom estimated using a relatively small sample size, the models are too liberal.For 40 items, models "H j " and "Mean" have 40 and 39 degrees of freedom, respectively.For N = 250, this results in approximately 6 observations per degree of freedom.Hence, the poor performance is not so much due to the large table as due to the increase in degrees of freedom.Results are satisfactory if the sample size per degree of freedom exceeds 25 (see Fig. 1).
For "Model Mean", the bias of β̂ (not tabulated) was negligible in all cases, and the estimated standard error (Table 3) behaved as expected; that is, if N doubles, the estimated standard error decreases approximately by a factor √2.
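The √2 pattern follows from the standard error of a mean being proportional to 1/√N. A quick simulation (our own illustration, using arbitrary standard-normal draws rather than item scores) reproduces it:

```python
import math
import random
import statistics

random.seed(1)

def sd_of_mean(n, reps=4000):
    """Standard deviation, over `reps` replications, of the mean of
    n standard-normal draws; theoretically equal to 1/sqrt(n)."""
    means = [statistics.fmean(random.gauss(0, 1) for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(means)

# Doubling the sample size shrinks the standard error by about sqrt(2).
ratio = sd_of_mean(250) / sd_of_mean(500)
print(round(ratio, 2))  # close to sqrt(2), i.e., about 1.41
```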

Discussion
CMMs have potential for application to psychological data, but an important reason that this potential has so far not been realized may be that up to now ML estimation of CMMs could only be applied to contingency tables for a limited number of categorical variables (up to, say, 10-20 variables, depending on the number of categories per variable). The present paper shows that this limitation can be resolved by the newly introduced maximum augmented empirical likelihood (MAEL) estimation method, a procedure that considers all nonzero cells in the table (i.e., cells with at least one observation) and some well-chosen zero cells in the table (i.e., cells with no observations). MAEL can be thought of as lying in between maximum empirical likelihood (MEL) estimation, which considers only nonzero cells in the table and consequently suffers from the first-order and second-order estimation problems, and maximum likelihood (ML), which considers all cells in the table and runs into memory problems if the table is large.
The asymptotic distribution of the ML estimators of marginal parameters is known (Lang, 2005) and depends only on the covariance matrix of the sample marginal distributions. In contrast to MEL, MAEL allows this covariance matrix to be estimated, due to the augmentation step. Simulation Study 2 shows that this estimation is done sufficiently well in a number of practical settings; in particular, the asymptotic distribution of the ML estimators also provides a good approximation of the distribution of the MAEL estimators. The asymptotic distributions of the ML and MAEL estimators are identical. MAEL estimation has advantages compared to alternative methods that can be used to estimate CMMs for large contingency tables, namely the weighted least squares method (Grizzle et al., 1969; a.k.a. the GSK method), generalized estimating equations (GEEs; e.g., Qaqish & Liang, 1992), and composite likelihood (e.g., Varin et al., 2011). A comparison of GSK and GEE with ML estimation is given in Rudas and Bergsma (2023). All four methods can be used to estimate CMMs for almost arbitrarily large contingency tables, but the only methods with guaranteed optimal asymptotic efficiency are MAEL and GSK. Unlike MAEL, however, GSK is sensitive to sparsity of the marginal distributions (Bergsma et al., 2013; see also the discussion of Berkson, 1980). Like GEE and GSK, MAEL estimation is computationally fast, and like ML but unlike GEE, it is asymptotically efficient. Furthermore, MAEL is less sensitive to sparsity of the marginal distributions than GSK. Thus, MAEL seems to be the preferred method for estimating CMMs. Researchers should take heed that if the ratio of the sample size to the degrees of freedom becomes too small (say, less than 25), the Type I error rates may be too liberal. This is not a feature of MAEL per se, but holds for all models that are too complex for the number of observations. Composite likelihood estimation is a possibly attractive alternative for estimating CMMs. It was not considered in this study because estimation procedures are not yet available for CMMs, whereas MAEL fits nicely into the ML framework and the software that is already available for CMMs. In addition, composite likelihood is a quasi-likelihood method, and hence asymptotic efficiency is lost, whereas ML, and hence MAEL and MEL, are asymptotically efficient (Aitchison & Silvey, 1958; Lang, 2005).
Let θ be defined as the solution of

E ψ(X, θ) = 0, (24)

for some function ψ. For example, if ψ(x, θ) = x − θ, then (24) implies that θ = EX. Denote the population value of θ by θ_0. Suppose we have observed x_1, ..., x_N, which are i.i.d. and distributed as X. The MEL estimator θ̂ of θ defined by (24) maximizes the empirical likelihood over all distributions with support {x_1, ..., x_N} whose probabilities p_1, ..., p_N satisfy

Σ_{i=1}^{N} p_i ψ(x_i, θ) = 0. (25)

The first-order estimation problem occurs if (25) does not have a solution (this problem is also known as the empty set problem; see Grendár & Judge, 2009). The best-known example is the case that θ = EX while the population mean lies outside the convex hull of {x_1, ..., x_N} (Qin & Lawless, 1994).
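For ψ(x, θ) = x − θ, the constraint in (25) has a solution with nonnegative weights only if θ lies in the convex hull of the data, which for scalar x is the interval from min x_i to max x_i. A small sketch of this feasibility check (our own illustration, not part of the estimation procedure):

```python
def el_mean_feasible(xs, theta):
    """Check whether weights p_i >= 0 with sum(p_i) = 1 and
    sum(p_i * (x_i - theta)) = 0 exist; for scalar data this holds
    iff theta lies in the convex hull [min(xs), max(xs)]."""
    return min(xs) <= theta <= max(xs)

xs = [2, 3, 5, 7]                   # hypothetical sample
print(el_mean_feasible(xs, 4.0))    # True: constraint (25) is solvable
print(el_mean_feasible(xs, 9.0))    # False: the empty set problem occurs
```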
Let F be the distribution function of X. Under some conditions on ψ and F, θ̂ has an asymptotic multivariate normal distribution; in particular,

√N (θ̂ − θ_0) →_d N(0, V_F(θ_0)),

for some covariance matrix V_F(θ_0) depending on F. The second-order estimation problem occurs if there does not exist a distribution function G with empirical support {x_1, ..., x_N} such that V_F(θ_0) = V_G(θ_0).
Example 7. To illustrate the second-order estimation problem, consider a 2 × 2 contingency table with cell probabilities π_ij > 0 (i, j ∈ {0, 1}), and let θ be the log ratio of marginal odds; that is,

θ = log[(π_1+ / π_0+) / (π_+1 / π_+0)].

(For details on how to define ψ such that this θ is the solution of (24), see Owen, 2001.) If one observed marginal count is zero, then θ̂ = ±∞ under empirical likelihood; that is, the first-order estimation problem occurs, since −∞ < θ_0 < ∞. If the two observed off-diagonal cell counts are zero, then θ̂ = 0 under empirical likelihood, and as a consequence V_G(θ̂) = 0 for any distribution G with support on the two diagonal cells. However, assuming no structural zeros in the table, V_F(θ_0) > 0, and therefore the second-order estimation problem occurs. In this case, the first-order estimation problem occurs in addition to the second-order one if θ_0 ≠ 0.
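A numeric illustration of Example 7 (our own sketch; the empirical estimate of θ is simply the plug-in log ratio of the marginal odds):

```python
import math

def log_marginal_odds_ratio(n):
    """Plug-in estimate of theta = log[(pi_1+/pi_0+)/(pi_+1/pi_+0)]
    for a 2x2 table of counts n[i][j]; log(0) is taken as -inf."""
    r0, r1 = n[0][0] + n[0][1], n[1][0] + n[1][1]   # row margins
    c0, c1 = n[0][0] + n[1][0], n[0][1] + n[1][1]   # column margins
    lg = lambda v: math.log(v) if v > 0 else -math.inf
    return lg(r1) - lg(r0) - lg(c1) + lg(c0)

# One marginal count is zero: theta-hat is infinite (first-order problem).
print(log_marginal_odds_ratio([[4, 3], [0, 0]]))   # -inf

# Off-diagonal counts are zero: theta-hat = 0, and any distribution on the
# diagonal has equal margins, so V_G(theta-hat) = 0 (second-order problem).
print(log_marginal_odds_ratio([[6, 0], [0, 4]]))   # ~0.0
```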
A special case of the second-order estimation problem has been identified earlier by Bergsma et al. (2012), who called it the zero-likelihood problem. It occurs if the empirical likelihood is zero for all solutions of (25). In this case, for any distribution G with support {x_1, ..., x_N}, V_G(θ) is a matrix of zeros, unequal to V_F(θ_0); hence, the second-order estimation problem occurs. We propose a solution for both estimation problems by augmentation of the support of the empirical likelihood, resulting in an estimation procedure lying in a spectrum with ML at one extreme and MEL at the other, which we call maximum augmented empirical likelihood (MAEL) estimation.

B Algorithm for Maximum Likelihood Estimation
Under some regularity conditions, the maximum likelihood estimates under model (5) are a saddle point of the Lagrangian log-likelihood

L(m, μ, λ) = n^T log(m) + μ(N − Σ_i m_i) + λ^T B^T g(A^T m), (26)

where μ and λ are Lagrange multipliers. In Eq. 26, n^T log(m) is the unconstrained kernel of the log-likelihood, and the Lagrangian terms are added to satisfy the multinomial sampling constraint Σ_i m_i = N (Eq. 14) and the model constraint (Eq. 13). Bergsma (1997, pp. 89-95) developed a Fisher scoring algorithm to find the ML estimates of the constrained expected frequencies m (Eq. 5) or, equivalently, the constrained cell probabilities π. This algorithm is a modification of the Lagrangian algorithms by Aitchison and Silvey (1958) and Lang and Agresti (1994). It can be shown that μ̂ = 1, so Eq. 26 can be simplified to

L(m, λ) = n^T log(m) + λ^T B^T g(A^T m). (27)

The ML estimates of m and λ are obtained by means of an iterative procedure that determines a saddle point of this Lagrangian. We take the derivative of L(m, λ) with respect to log m rather than m because the former yields simpler expressions.
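The saddle-point mechanics can be seen in a minimal case. For a 2 × 2 table with the single constraint h(m) = log m11 + log m22 − log m12 − log m21 = 0 (independence), setting the derivative of the Lagrangian (with μ = 1) with respect to log m to zero gives m_i = n_i ± λ, leaving one scalar equation h(m) = 0 for λ. The following is our own toy illustration of this idea (solved by bisection rather than Bergsma's general Fisher scoring algorithm):

```python
import math

def fit_independence(n11, n12, n21, n22, tol=1e-12):
    """Toy saddle point of L(m, lambda) = n'log(m) + lambda*h(m) for a
    2x2 table with h(m) = log(m11*m22/(m12*m21)) = 0. Stationarity in
    log m gives m = (n11+l, n12-l, n21-l, n22+l); bisection finds the
    lambda l with h(m) = 0."""
    def h(l):
        return (math.log(n11 + l) + math.log(n22 + l)
                - math.log(n12 - l) - math.log(n21 - l))
    lo = -min(n11, n22) + 1e-9   # keep all fitted counts positive
    hi = min(n12, n21) - 1e-9
    while hi - lo > tol:         # h is increasing in lambda on (lo, hi)
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if h(mid) < 0 else (lo, mid)
    l = (lo + hi) / 2
    return (n11 + l, n12 - l, n21 - l, n22 + l)

m = fit_independence(30, 10, 20, 40)
print([round(v, 6) for v in m])   # [20.0, 20.0, 30.0, 30.0]
```

For these counts the result coincides with the closed-form independence fit (row total × column total)/N, which provides a convenient check on the saddle-point solution.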

Figure 1. Type I error rates by the ratio of sample size and degrees of freedom in Study 2. Dashed lines are the limits of the 95% confidence interval of the Type I error rate due to Monte Carlo error.

Table 2. Type I error rate for MAEL estimation of three different CMMs, four different numbers of items (J), and three different sample sizes (N), based on 1000 replications.

Table 3. Estimated standard error of CMM parameter estimate β̂ for "Model Mean", for four different numbers of items (J) and three different sample sizes (N), based on 1000 (J = 20 and J = 40) and 10,000 (J = 4 and J = 8) replications.