Abstract
Large sample theory establishes the asymptotic normality of the maximum likelihood estimator of the person parameter in the two-parameter logistic (2PL) model. In short tests, however, the assumption of normality can be grossly wrong. As a consequence, intended coverage rates may be exceeded, and confidence intervals turn out to be overly conservative. Methods from higher-order theory, more specifically saddlepoint approximations, are a convenient way to deal with small-sample problems. Confidence bounds obtained by these means hold the approximate confidence level for a broad range of the person parameter. Moreover, an approximation to the exact distribution makes it possible to compute median unbiased estimates (MUE) that are as likely to overestimate as to underestimate the true person parameter. Additionally, in small samples, these MUE are less mean-biased than the often-used maximum likelihood estimator.
References
Agresti, A. (2001). Exact inference for categorical data: recent advances and continuing controversies. Statistics in Medicine, 20, 2709–2722.
Agresti, A. (2002). Categorical data analysis (2nd ed.). Hoboken: Wiley.
Agresti, A., & Gottard, A. (2007). Nonconservative exact small-sample inference for discrete data. Computational Statistics & Data Analysis, 51, 6447–6458.
Agresti, A., & Min, Y. (2001). On small-sample confidence intervals for parameters in discrete distributions. Biometrics, 57, 963–971.
Aït-Sahalia, Y., & Yu, J. (2006). Saddlepoint approximations for continuous-time Markov processes. Journal of Econometrics, 134, 507–551.
Baker, F.B., & Kim, S.H. (2004). Item response theory: parameter estimation techniques (2nd ed.). New York: CRC Press.
Barndorff-Nielsen, O. (1986). Inference on full or partial parameters on the standardized signed log likelihood ratio. Biometrika, 73(2), 307–322.
Bedrick, E.J. (1997). Approximating the conditional distribution of person fit indexes for checking the Rasch model. Psychometrika, 62(2), 191–199.
Bedrick, E.J., & Hill, J.R. (1992). An empirical assessment of saddlepoint approximations for testing a logistic regression parameter. Biometrics, 48(2), 529–544.
Birnbaum, A. (1964). Median-unbiased estimators. Bulletin of Mathematical Statistics, 11, 25–34.
Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71(3), 425–440.
Brazzale, A.R. (1999). Approximate conditional inference in logistic and loglinear models. Journal of Computational and Graphical Statistics, 8(3), 653–661.
Brazzale, A.R. (2000). Practical small-sample parametric inference. Unpublished doctoral dissertation, Ecole Polytechnique Fédérale de Lausanne, Switzerland.
Brazzale, A.R. (2005). Hoa: An R package bundle for higher order likelihood inference. R News, 5(1), 20–27. (ISSN 1609-3631).
Brazzale, A.R., & Davison, A.C. (2008). Accurate parametric inference for small samples. Statistical Science, 23(4), 465–484.
Brazzale, A.R., Davison, A.C., & Reid, N. (2007). Applied asymptotics: case studies in small-sample statistics. Cambridge: Cambridge University Press.
Brown, G.W. (1947). On small-sample estimation. The Annals of Mathematical Statistics, 18(4), 582–585.
Butler, R.W. (2000). Reliabilities for feedback systems and their saddlepoint approximation. Statistical Science, 15(3), 279–298.
Butler, R.W. (2007). Saddlepoint approximations with applications. New York: Cambridge University Press.
Casella, G., & Berger, R. (2002). Statistical inference. Pacific Grove: Duxbury/Thomson Learning.
Chieffo, A., Stankovic, G., Bonizzoni, E., Tsagalou, E., Iakovou, I., Montorfano, M., et al. (2005). Early and mid-term results of drug-eluting stent implantation in unprotected left main. Circulation, 111, 791–795.
Cox, D. (2006). Principles of statistical inference. New York: Cambridge University Press.
Davison, A.C. (2003). Statistical models. New York: Cambridge University Press.
Davison, A.C. (1988). Approximate conditional inference in generalized linear models. Journal of the Royal Statistical Society Series B (Methodological), 50(3), 445–461.
Davison, A.C., Fraser, D., & Reid, N. (2006). Improved likelihood inference for discrete data. Journal of the Royal Statistical Society Series B, 68(Part 3), 495–508.
DeMars, C. (2010). Item response theory. New York: Oxford University Press.
Doebler, A., Doebler, P., & Holling, H. (2013). Optimal and most exact confidence intervals for person parameters in item response theory models. Psychometrika, 78(1), 98–115.
Esseen, C.-G. (1945). Fourier analysis of distribution functions. A mathematical study of the Laplace–Gaussian law. Acta Mathematica, 77(1), 1–125.
Fischer, G.H. (2007). Rasch models. In C. Rao & S. Sinharay (Eds.), Handbook of statistics: Vol. 26. Psychometrics (pp. 515–585). Amsterdam: North-Holland.
Fox, J.-P. (2010). Bayesian item response modeling. New York: Springer.
Hall, P. (1982). Improving the normal approximation when constructing one-sided confidence intervals for binomial or Poisson parameters. Biometrika, 69(3), 647–652.
Hall, P. (1992). On the removal of skewness by transformation. Journal of the Royal Statistical Society, Series B (Methodological), 54(1), 221–228.
Hambleton, R.K., & Zhao, Y. (2005). Item response theory (IRT) models for dichotomous data. In B. Everitt & D. Howell (Eds.), Encyclopedia of statistics in behavioral science (pp. 982–990). Chichester: Wiley.
Hirji, K.F. (2006). Exact analysis of discrete data. Boca Raton: Chapman & Hall/CRC Press.
Hirji, K.F., Tsiatis, A.A., & Mehta, C.R. (1989). Median unbiased estimation for binary data. American Statistician, 43(1), 7–11.
Hoijtink, H., & Boomsma, A. (1995). On person parameter estimation in the dichotomous Rasch model. In G.H. Fischer & I.W. Molenaar (Eds.), Rasch models. Foundations, recent developments, and applications (pp. 54–68). New York: Springer.
Johnson, N.L., Kemp, A.W., & Kotz, S. (2005). Univariate discrete distributions (3rd ed.). Hoboken: Wiley.
Kay, S., Nuttall, A., & Baggenstoss, P. (2001). Multidimensional probability density function approximations for detection, classification, and model order selection. IEEE Transactions on Signal Processing, 49(10), 2240–2252.
Klauer, K.C. (1991a). Exact and best confidence intervals for the ability parameter of the Rasch model. Psychometrika, 56(2), 535–547.
Klauer, K.C. (1991b). An exact and optimal standardized person test for assessing consistency with the Rasch model. Psychometrika, 56(2), 213–228.
Kolassa, J. (1997). Infinite parameter estimates in logistic regression, with application to approximate conditional inference. Scandinavian Journal of Statistics, 24(4), 523–530.
Lehmann, E. (1951). A general concept of unbiasedness. The Annals of Mathematical Statistics, 22(4), 587–592.
Lehmann, E. (1999). Elements of large sample theory (1st ed.). New York: Springer.
Lehmann, E., & Casella, G. (1998). Theory of point estimation (2nd ed.). New York: Springer.
Lehmann, E., & Romano, J. (2005). Testing statistical hypotheses (3rd ed.). New York: Springer.
Levin, B. (1990). The saddlepoint correction in conditional logistic likelihood analysis. Biometrika, 77(2), 275–285.
Liou, M., & Yu, L.-C. (1991). Assessing statistical accuracy in ability estimation: a bootstrap approach. Psychometrika, 56(1), 55–67.
Lord, F.M. (1983). Unbiased estimators of ability parameters, of their variance, and of their parallel-forms reliability. Psychometrika, 48(2), 233–245.
Lugannani, R., & Rice, S. (1980). Saddle point approximation for the distribution of the sum of independent random variables. Advances in Applied Probability, 12(2), 475–490.
Molenaar, I., & Hoijtink, H. (1990). The many null distributions of person fit indices. Psychometrika, 55(1), 75–106.
Ogasawara, H. (2012). Asymptotic expansions for the ability estimator in item response theory. Computational Statistics, 27(4), 661–683.
Ogasawara, H. (2013). Asymptotic properties of the Bayes and pseudo Bayes estimators of ability in item response theory. Journal of Multivariate Analysis, 114, 359–377.
Pace, L., & Salvan, A. (1997). Principles of statistical inference from a neo-Fisherian perspective. Singapore: World Scientific.
Pace, L., & Salvan, A. (1999). Point estimation based on confidence intervals: exponential families. Journal of Statistical Computation and Simulation, 64, 1–21.
Pfanzagl, J. (1970a). Median unbiased estimates for m.l.r.-families. Metrika, 15(1), 30–39.
Pfanzagl, J. (1970b). On the asymptotic efficiency of median unbiased estimates. The Annals of Mathematical Statistics, 41(5), 1500–1509.
Pfanzagl, J. (1972). On median unbiased estimates. Metrika, 18(1), 154–173.
Pierce, D.A., & Peters, D. (1992). Practical use of higher order asymptotics for multiparameter exponential families. Journal of the Royal Statistical Society Series B, 54(3), 701–737.
R Development Core Team (2009). R: A language and environment for statistical computing [Computer software manual], Vienna, Austria. Available from http://www.R-project.org. (ISBN 3-900051-07-0).
Read, C.B. (2006). Median unbiased estimators. In S. Kotz, N.L. Johnson, N. Balakrishnan, C.B. Read, & B. Vidakovic (Eds.), Encyclopedia of statistical sciences (2nd ed., Vol. 7, pp. 4713–4715). New York: Wiley-Interscience.
Reeve, B., & Mâsse, L. (2004). Item response theory modeling for questionnaire evaluation. In S. Presser et al. (Eds.), Methods for testing and evaluating survey questionnaires (pp. 247–273). Hoboken: Wiley.
Reid, N. (1988). Saddlepoint methods and statistical inference. Statistical Science, 3(2), 213–238.
Rogers, L., & Zane, O. (1999). Saddlepoint approximations to option prices. The Annals of Applied Probability, 9(2), 493–503.
Routledge, R. (1994). Practicing safe statistics with the mid-p*. Canadian Journal of Statistics, 22(1), 103–110.
Salvan, A., & Hirji, K. (1991). Asymptotic equivalence of conditional median unbiased and maximum likelihood estimators in exponential families. Metron, 49, 219–232.
Severini, T.A. (2000). Likelihood methods in statistics. New York: Oxford University Press.
Small, C.G. (2010). Expansions and asymptotics for statistics. Boca Raton: Chapman & Hall/CRC Press.
Srivastava, M., & Yau, W. (1989). Saddlepoint method for obtaining tail probability of Wilks’ likelihood ratio test. Journal of Multivariate Analysis, 31, 117–126.
Stuart, A., & Ord, J. (1987). Kendall’s advanced theory of statistics (5th ed., Vol. 1). New York: Oxford University Press.
van der Linden, W.J., & Glas, C.A.W. (Eds.) (2000). Computerized adaptive testing: theory and practice. Dordrecht: Kluwer Academic.
Wang, S., & Carroll, R.J. (1999). High-order accurate methods for retrospective sampling problems. Biometrika, 86(4), 881–897.
Warm, T.A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427–450.
Young, G., & Smith, R. (2005). Essentials of statistical inference. New York: Cambridge University Press.
Appendices
Appendix A. Derivation of the Saddlepoint Approximation for the 2PL
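The probability of the response vector \(\mathbf{x}_{v}\) in the 2PL is
\[P(\mathbf{x}_{v}|\theta_{v},\boldsymbol{\beta},\boldsymbol{\alpha})=\frac{\exp(\theta_{v}\sum_{i=1}^{n}x_{vi}\alpha_{i})\exp(-\sum_{i=1}^{n}x_{vi}\alpha_{i}\beta_{i})}{\prod_{i=1}^{n}[1+\exp\{\alpha_{i}(\theta_{v}-\beta_{i})\}]}, \tag{A.1}\]
and the probability of obtaining the weighted sum score \(\sum_{i=1}^{n} x_{vi}\alpha_{i}\) is
\[P\Bigl(\sum_{i=1}^{n}x_{vi}\alpha_{i}\Big|\theta_{v},\boldsymbol{\beta},\boldsymbol{\alpha}\Bigr)=\frac{\exp(\theta_{v}\sum_{i=1}^{n}x_{vi}\alpha_{i})\sum_{\mathbf{x}_{v}|\sum x_{vi}\alpha_{i}}\exp(-\sum_{i=1}^{n}x_{vi}\alpha_{i}\beta_{i})}{\prod_{i=1}^{n}[1+\exp\{\alpha_{i}(\theta_{v}-\beta_{i})\}]}, \tag{A.2}\]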
where \(\sum_{\mathbf{x}_{v}|\sum x_{vi}\alpha_{i}}\) denotes summation over all response vectors having the same weighted sum score \(\sum_{i=1}^{n} x_{vi}\alpha_{i}\). In the context of the Rasch model, this sum is called \(\gamma_{r_{v}}\), the elementary symmetric function. If the weighted sums corresponding to the response vectors are all unique, formulae (A.1) and (A.2) agree exactly. Here we can observe an interesting feature of the exponential family, namely that the density of the data points \(\mathbf{x}_{v}\) is of the same family as the density of the sufficient statistic (e.g., Casella & Berger 2002, p. 217). Writing \(s=s(\mathbf{x})\) for the sufficient statistic and \(c(\theta)\) for the normalizing constant, the density of the data is
\[f(\mathbf{x};\theta)=\frac{f_{0}(\mathbf{x})\exp\{\theta s(\mathbf{x})\}}{c(\theta)}. \tag{A.3}\]
The density of the sufficient statistic is then
\[f(s;\theta)=\frac{F_{0}(s)\exp(\theta s)}{c(\theta)}, \tag{A.4}\]
with the sole difference given in the functions \(f_{0}(\mathbf{x})\) and \(F_{0}(s)\). Comparing this to Equations (A.1) and (A.2), we see that \(\exp(-\sum_{i=1}^{n}x_{vi}\alpha_{i}\beta_{i})=f_{0}(\mathbf{x}_{v})\) and \(\sum_{\mathbf{x}_{v}|\sum x_{vi}\alpha_{i}}\exp(-\sum_{i=1}^{n} x_{vi}\alpha_{i}\beta_{i})=F_{0}(r_{v})\). In order to simplify the notation, the weighted sum score \(\sum_{i=1}^{n} x_{vi}\alpha_{i}\) will be abbreviated in the sequel as \(w_{v}\). Because in exponential family theory (e.g., Lehmann & Casella 1998; Davison 2003) the log of the normalizing constant, i.e., the log of the common denominator of Equations (A.1) and (A.2), is the cumulant generating function, we abbreviate \(\log \prod_{i=1}^{n} [1+\exp \{\alpha_{i}(\theta_{v}-\beta_{i}) \} ]\) by \(K(\theta_{v})\).
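The relation between the response-vector probabilities and the score probabilities can be checked numerically on a very short test. The following Python sketch enumerates all response vectors and sums their 2PL probabilities by weighted sum score. The item parameters and the function names (`K`, `response_prob`, `score_distribution`) are made up for illustration and are not taken from the article.

```python
import itertools
import math


def K(theta, alpha, beta):
    """Log of the 2PL normalizing constant: log prod_i [1 + exp{alpha_i (theta - beta_i)}]."""
    return sum(math.log1p(math.exp(a * (theta - b))) for a, b in zip(alpha, beta))


def response_prob(x, theta, alpha, beta):
    """Probability of a single response vector x under the 2PL."""
    w = sum(xi * a for xi, a in zip(x, alpha))                     # weighted sum score
    log_f0 = -sum(xi * a * b for xi, a, b in zip(x, alpha, beta))  # log f_0(x)
    return math.exp(theta * w + log_f0 - K(theta, alpha, beta))


def score_distribution(theta, alpha, beta, digits=10):
    """Distribution of the weighted sum score by brute-force enumeration."""
    dist = {}
    for x in itertools.product((0, 1), repeat=len(alpha)):
        # Round the score so that equal sums map to the same dictionary key.
        w = round(sum(xi * a for xi, a in zip(x, alpha)), digits)
        dist[w] = dist.get(w, 0.0) + response_prob(x, theta, alpha, beta)
    return dist


# Hypothetical item parameters (illustration only).
alpha = [0.8, 1.0, 1.3, 1.7]
beta = [-1.0, -0.3, 0.4, 1.2]

dist_2pl = score_distribution(0.5, alpha, beta)
dist_rasch = score_distribution(0.5, [1.0] * 4, beta)

print(len(dist_2pl), "distinct weighted scores; total probability", sum(dist_2pl.values()))
print(len(dist_rasch), "distinct raw scores with equal discriminations")
```

With these discriminations every response vector has its own weighted score, so each score probability is the probability of a single response vector. With equal discriminations, several vectors share a raw score and their probabilities are summed.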
Exponential tilting is a technique that allows any density to be embedded in an exponential family, and, by tilting an exponential family, a density of the same family is obtained (Davison 2003, p. 168). Any arbitrary density can be multiplied by exp(sϕ), and, by renormalizing this product, a valid density function is obtained.
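The distribution of a response vector \(\mathbf{x}_{v}\) in the 2PL model is given in Equation (A.1). Multiplying this density by \(\exp(w_{v}\phi_{v})\) and normalizing the product yields
\[P(\mathbf{x}_{v}|\theta_{v}+\phi_{v},\boldsymbol{\beta},\boldsymbol{\alpha})=\frac{\exp\{-K(\theta_{v})\}\exp(\theta_{v}w_{v})f_{0}(\mathbf{x}_{v})\exp(w_{v}\phi_{v})}{\sum_{\mathbf{x}}\exp\{-K(\theta_{v})\}\exp(\theta_{v}w)f_{0}(\mathbf{x})\exp(w\phi_{v})}. \tag{A.5}\]
The summation in the denominator runs over all possible response vectors \(\mathbf{x}\), with \(w=\sum_{i=1}^{n}x_{i}\alpha_{i}\) denoting the weighted sum score of \(\mathbf{x}\). The above equation can be expressed in a slightly different way by writing the term in the denominator that does not depend on \(w\) before the summation sign
\[P(\mathbf{x}_{v}|\theta_{v}+\phi_{v},\boldsymbol{\beta},\boldsymbol{\alpha})=\frac{\exp\{-K(\theta_{v})\}\exp\{(\theta_{v}+\phi_{v})w_{v}\}f_{0}(\mathbf{x}_{v})}{\exp\{-K(\theta_{v})\}\sum_{\mathbf{x}}\exp\{(\theta_{v}+\phi_{v})w\}f_{0}(\mathbf{x})}. \tag{A.6}\]
Now, the first factors in the numerator and denominator cancel, and from Equation (A.1) we have the result that the remaining term in the denominator must be
\[\sum_{\mathbf{x}}\exp\{(\theta_{v}+\phi_{v})w\}f_{0}(\mathbf{x})=\prod_{i=1}^{n}[1+\exp\{\alpha_{i}(\theta_{v}+\phi_{v}-\beta_{i})\}]=\exp\{K(\theta_{v}+\phi_{v})\}. \tag{A.7}\]
Starting from Equation (A.6), the probability \(P(\mathbf{x}_{v}|\theta_{v}+\phi_{v},\boldsymbol{\beta},\boldsymbol{\alpha})\) can hence be written as
\[P(\mathbf{x}_{v}|\theta_{v}+\phi_{v},\boldsymbol{\beta},\boldsymbol{\alpha})=P(\mathbf{x}_{v}|\theta_{v},\boldsymbol{\beta},\boldsymbol{\alpha})\exp\{\phi_{v}w_{v}+K(\theta_{v})-K(\theta_{v}+\phi_{v})\}. \tag{A.8}\]
This equation can be solved for the initial density \(P(\mathbf{x}_{v}|\theta_{v},\boldsymbol{\beta},\boldsymbol{\alpha})\), and one obtains
\[P(\mathbf{x}_{v}|\theta_{v},\boldsymbol{\beta},\boldsymbol{\alpha})=P(\mathbf{x}_{v}|\theta_{v}+\phi_{v},\boldsymbol{\beta},\boldsymbol{\alpha})\exp\{K(\theta_{v}+\phi_{v})-K(\theta_{v})-\phi_{v}w_{v}\}. \tag{A.9}\]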
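The tilting argument can be verified numerically. The Python sketch below tilts the 2PL distribution at \(\theta_{v}\) by a factor \(\exp(w\phi_{v})\), renormalizes by brute force, and compares the result with the 2PL distribution at \(\theta_{v}+\phi_{v}\). It then recovers the original probability from the tilted one. All parameter values and function names are hypothetical and chosen only for illustration.

```python
import itertools
import math


def K(theta, alpha, beta):
    """Cumulant generating function of the weighted sum score in the 2PL."""
    return sum(math.log1p(math.exp(a * (theta - b))) for a, b in zip(alpha, beta))


def weighted_score(x, alpha):
    return sum(xi * a for xi, a in zip(x, alpha))


def prob(x, theta, alpha, beta):
    """2PL probability of the response vector x."""
    log_f0 = -sum(xi * a * b for xi, a, b in zip(x, alpha, beta))
    return math.exp(theta * weighted_score(x, alpha) + log_f0 - K(theta, alpha, beta))


def tilted_prob(x, theta, phi, alpha, beta):
    """Multiply P(x | theta) by exp(w * phi) and renormalize over all response vectors."""
    def weight(y):
        return prob(y, theta, alpha, beta) * math.exp(weighted_score(y, alpha) * phi)

    total = sum(weight(y) for y in itertools.product((0, 1), repeat=len(alpha)))
    return weight(x) / total


# Hypothetical values (illustration only).
alpha = [0.8, 1.0, 1.3, 1.7]
beta = [-1.0, -0.3, 0.4, 1.2]
theta, phi = 0.2, 0.9
x = (1, 0, 1, 0)

tilted = tilted_prob(x, theta, phi, alpha, beta)   # tilt and renormalize
shifted = prob(x, theta + phi, alpha, beta)        # 2PL evaluated at theta + phi

# Solve for the original probability from the tilted one.
recovered = shifted * math.exp(
    K(theta + phi, alpha, beta) - K(theta, alpha, beta) - phi * weighted_score(x, alpha)
)

print(tilted, shifted)
print(recovered, prob(x, theta, alpha, beta))
```

Both printed pairs should agree up to floating-point error, which illustrates that tilting the 2PL only shifts the person parameter.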
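By comparing Equation (A.9) to Equations (A.3) and (A.4), one sees that, in order to obtain an approximation to the density of the sufficient statistic \(w_{v}\), we have to exchange \(P(\mathbf{x}_{v}|\theta_{v},\boldsymbol{\beta},\boldsymbol{\alpha})\) with \(P(w_{v}|\theta_{v},\boldsymbol{\beta},\boldsymbol{\alpha})\) and likewise for \(P(\mathbf{x}_{v}|\theta_{v}+\phi_{v},\boldsymbol{\beta},\boldsymbol{\alpha})\). This last term can be approximated by the first term of an Edgeworth expansion of a standardized sum \(S_{n}^{*}=(S_{n}-n\mu)/(\sqrt{n}\sigma)\) of independent variables \(Y_{1},\ldots,Y_{n}\) (Pace & Salvan 1997). Here \(S_{n} =\sum_{i=1}^{n} Y_{i}\), \(\mu=E(Y)\) and \(\sigma^{2}=\operatorname{Var}(Y)\). With \(\varphi\) denoting the standard normal density, the Edgeworth expansion for such a sum is given by
\[f_{S_{n}^{*}}(x)=\varphi(x)\Bigl[1+\frac{\rho_{3}(S_{n}^{*})}{6}H_{3}(x)+\frac{\rho_{4}(S_{n}^{*})}{24}H_{4}(x)+\frac{\rho_{3}(S_{n}^{*})^{2}}{72}H_{6}(x)\Bigr]+O(n^{-3/2}), \tag{A.10}\]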
where \(\rho_{r}(S_{n}^{*})=\kappa_{r}\) are the respective standardized cumulants. The first two cumulants are given by \(\rho_{1}=0\) and \(\rho_{2}=1\), and \(H_{r}(x)\) are Tchebycheff–Hermite polynomials (e.g., Stuart & Ord 1987). The odd-ordered Hermite polynomials vanish at \(x=0\), the mean of the standardized sum. As can be seen, including only the first term of the above expansion approximates the density of the normalized sum as a normal density with error of order \(O(n^{-1/2})\), in accordance with the central limit theorem (e.g., Young & Smith 2005). Including the second and third term corrects for skewness and kurtosis and the error will drop to \(O(n^{-1})\), and with the fourth term to \(O(n^{-3/2})\). Since the Edgeworth expansion achieves greatest accuracy at the mean, where \(H_{3}(x)\) vanishes and leaves an error of \(O(n^{-1})\), the mean of \(P(w_{v}|\theta_{v}+\phi_{v},\boldsymbol{\beta},\boldsymbol{\alpha})\) is needed. Again, from the theory of exponential families we have that the expectation of a variable is given by the first derivative of the cumulant generating function. Equating the expectation to the sufficient statistic gives the so-called maximum likelihood equation
\[K'(\theta_{v})=\sum_{i=1}^{n}\frac{\alpha_{i}\exp\{\alpha_{i}(\theta_{v}-\beta_{i})\}}{1+\exp\{\alpha_{i}(\theta_{v}-\beta_{i})\}}=w_{v}, \tag{A.11}\]
which is solved by the maximum likelihood estimator \(\hat{\theta}_{v}\). Since the distribution \(P(w_{v}|\theta_{v}+\phi_{v},\boldsymbol{\beta},\boldsymbol{\alpha})\) has the same sufficient statistic \(w_{v}\), it is also maximized at \(\hat{\theta}_{v}\), and hence \(\phi_{v}=\hat{\theta}_{v}-\theta_{v}\). For standardized sums the first term of the Edgeworth expansion (Pace & Salvan 1997, Chap. 10) is the standard normal density. The Jacobian of the transformation (Casella & Berger 2002, pp. 120, 158) for the univariate case is the reciprocal of the standard deviation, which in turn is obtained as the square root of the second derivative of the cumulant generating function. Hence, the approximation to the desired density is written as
\[P(w_{v}|\hat{\theta}_{v},\boldsymbol{\beta},\boldsymbol{\alpha})=\frac{1}{\sqrt{2\pi K''(\hat{\theta}_{v})}}+\text{Rest}, \tag{A.12}\]
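where Rest stands for the terms not accounted for. Note that \(K''(\hat{\theta}_{v})\), the variance of the variable \(w_{v}\), must not be confused with the asymptotic variance of the estimate \(\hat{\theta}_{v}\), which of course is given by the reciprocal of \(K''(\hat{\theta}_{v})\). Keep in mind that \(\phi_{v}=\hat{\theta}_{v}-\theta_{v}\), so that Equation (A.9) can now be written as
\[P(w_{v}|\theta_{v},\boldsymbol{\beta},\boldsymbol{\alpha})\approx\frac{1}{\sqrt{2\pi K''(\hat{\theta}_{v})}}\exp\{K(\hat{\theta}_{v})-K(\theta_{v})-(\hat{\theta}_{v}-\theta_{v})w_{v}\}. \tag{A.13}\]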
This is the saddlepoint approximation to the probability of a certain weighted sum score \(w_{v}\) for a given ability level \(\theta_{v}\) and item parameters \(\boldsymbol{\beta}\), \(\boldsymbol{\alpha}\).
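A minimal Python sketch of this approximation is given below. It assumes the form \(\exp\{K(\hat{\theta}_{v})-K(\theta_{v})-(\hat{\theta}_{v}-\theta_{v})w_{v}\}/\sqrt{2\pi K''(\hat{\theta}_{v})}\), solves the maximum likelihood equation by bisection, and compares the result with the exact score probability obtained by enumeration. To make the comparison with point probabilities meaningful, the illustration uses equal discriminations (the Rasch case), where the scores lie on a unit lattice. The item parameters, the bisection bounds, and all function names are assumptions made for this example.

```python
import itertools
import math


def K(theta, alpha, beta):
    """Cumulant generating function K(theta)."""
    return sum(math.log1p(math.exp(a * (theta - b))) for a, b in zip(alpha, beta))


def K1(theta, alpha, beta):
    """First derivative of K: expected weighted sum score."""
    return sum(a / (1.0 + math.exp(-a * (theta - b))) for a, b in zip(alpha, beta))


def K2(theta, alpha, beta):
    """Second derivative of K: variance of the weighted sum score."""
    total = 0.0
    for a, b in zip(alpha, beta):
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        total += a * a * p * (1.0 - p)
    return total


def mle(w, alpha, beta, lo=-30.0, hi=30.0):
    """Solve K'(theta) = w by bisection; requires 0 < w < sum(alpha)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if K1(mid, alpha, beta) < w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def saddlepoint(w, theta, alpha, beta):
    """Saddlepoint approximation to P(w | theta) under the assumed form."""
    t_hat = mle(w, alpha, beta)
    expo = K(t_hat, alpha, beta) - K(theta, alpha, beta) - (t_hat - theta) * w
    return math.exp(expo) / math.sqrt(2.0 * math.pi * K2(t_hat, alpha, beta))


def exact(w, theta, alpha, beta):
    """Exact P(w | theta) by enumerating all response vectors with score w."""
    total = 0.0
    for x in itertools.product((0, 1), repeat=len(alpha)):
        if abs(sum(xi * a for xi, a in zip(x, alpha)) - w) < 1e-9:
            log_f0 = -sum(xi * a * b for xi, a, b in zip(x, alpha, beta))
            total += math.exp(theta * w + log_f0 - K(theta, alpha, beta))
    return total


# Hypothetical 10-item test with equal discriminations (illustration only).
alpha = [1.0] * 10
beta = [-1.5 + i / 3.0 for i in range(10)]
theta, w = 0.0, 5.0

approx = saddlepoint(w, theta, alpha, beta)
truth = exact(w, theta, alpha, beta)
print("saddlepoint:", approx, "exact:", truth, "ratio:", approx / truth)
```

Because the approximation and the exact probability share the factor \(\exp\{\theta_{v}w_{v}-K(\theta_{v})\}\), their ratio does not depend on \(\theta_{v}\); it depends only on the score and the item parameters.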
Appendix B. Polynomial Interpolation
In order to obtain stable values of \(r^{*}\) (Equation (16)), polynomial interpolation is done for values of \(\theta\) that lie in an interval \(\hat{\theta}\pm 0.5 \cdot j(\hat{\theta})^{-1/2}\). First, the term \(r\) for all values of \(\theta\) (Equation (10)) is modeled as a tenth-order polynomial in \(u\) (Equation (14))
The coefficients are fitted by the least-squares criterion. Next the fraction r/u is determined by
After that the logarithm of this fraction in turn is modeled by a polynomial regression model with predictors given by the powers of r
The fraction of the logarithmic term and r is estimated by
The modified likelihood ratio is finally given by r minus the above term
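The interpolation steps above can be sketched in Python with NumPy. Equations (10), (14), and (16) are not reproduced in this appendix, so the sketch assumes the standard forms for a one-parameter exponential family: \(r=\operatorname{sign}(\hat{\theta}-\theta)\sqrt{2\{\ell(\hat{\theta})-\ell(\theta)\}}\) with \(\ell(\theta)=\theta w-K(\theta)\), \(u=(\hat{\theta}-\theta)\,j(\hat{\theta})^{1/2}\) with \(j(\hat{\theta})=K''(\hat{\theta})\), and \(r^{*}=r-\log(r/u)/r\). Fitting both polynomials without an intercept, so that the divisions by \(u\) and \(r\) are exact, is also an assumption of this sketch, as are the item parameters and function names.

```python
import math
import numpy as np


def K(theta, alpha, beta):
    return float(np.sum(np.log1p(np.exp(alpha * (theta - beta)))))


def K1(theta, alpha, beta):
    return float(np.sum(alpha / (1.0 + np.exp(-alpha * (theta - beta)))))


def K2(theta, alpha, beta):
    p = 1.0 / (1.0 + np.exp(-alpha * (theta - beta)))
    return float(np.sum(alpha ** 2 * p * (1.0 - p)))


def mle(w, alpha, beta, lo=-30.0, hi=30.0):
    """Solve K'(theta) = w by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if K1(mid, alpha, beta) < w else (lo, mid)
    return 0.5 * (lo + hi)


def r_and_u(theta, t_hat, w, alpha, beta):
    """Assumed forms of the signed likelihood root r and the Wald-type statistic u."""
    dev = 2.0 * ((t_hat * w - K(t_hat, alpha, beta)) - (theta * w - K(theta, alpha, beta)))
    r = math.copysign(math.sqrt(max(dev, 0.0)), t_hat - theta)
    u = (t_hat - theta) * math.sqrt(K2(t_hat, alpha, beta))
    return r, u


def powers(z, order):
    """Design matrix with columns z, z^2, ..., z^order (no intercept)."""
    return np.column_stack([z ** k for k in range(1, order + 1)])


def smoothed_r_star(thetas, t_hat, w, alpha, beta, order=10):
    r, u = np.array([r_and_u(t, t_hat, w, alpha, beta) for t in thetas]).T
    a = np.linalg.lstsq(powers(u, order), r, rcond=None)[0]            # r as polynomial in u
    ratio = sum(a[k - 1] * u ** (k - 1) for k in range(1, order + 1))  # r / u
    b = np.linalg.lstsq(powers(r, order), np.log(ratio), rcond=None)[0]
    corr = sum(b[k - 1] * r ** (k - 1) for k in range(1, order + 1))   # log(r / u) / r
    return r - corr                                                    # r minus the above term


# Hypothetical items and observed weighted score (illustration only).
alpha = np.array([0.8, 1.0, 1.3, 1.7, 1.1])
beta = np.array([-1.0, -0.3, 0.4, 1.2, 0.0])
w = 0.8 + 1.0 + 1.7                      # responses (1, 1, 0, 1, 0)

t_hat = mle(w, alpha, beta)
half_width = 0.5 / math.sqrt(K2(t_hat, alpha, beta))
thetas = t_hat + np.linspace(-half_width, half_width, 41)
r_star = smoothed_r_star(thetas, t_hat, w, alpha, beta)

# Direct evaluation is still stable at the edge of the interval.
r0, u0 = r_and_u(thetas[0], t_hat, w, alpha, beta)
direct_edge = r0 - math.log(r0 / u0) / r0
print(r_star[0], direct_edge)
```

At the edge of the interval the smoothed value should be close to the direct one, while at \(\hat{\theta}\) itself the direct formula is undefined (0/0) and the smoothed value stays finite.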
Cite this article
Biehler, M., Holling, H. & Doebler, P. Saddlepoint Approximations of the Distribution of the Person Parameter in the Two Parameter Logistic Model. Psychometrika 80, 665–688 (2015). https://doi.org/10.1007/s11336-014-9405-1
DOI: https://doi.org/10.1007/s11336-014-9405-1