
Hierarchical Bayesian Modeling for Test Theory Without an Answer Key


Cultural Consensus Theory (CCT) models have been applied extensively across research domains in the social and behavioral sciences to explore shared knowledge and beliefs. CCT models operate on response data in which the answer key is latent. The current paper enhances the application of these models by developing the appropriate specifications for hierarchical Bayesian inference. A primary contribution is the methodology for integrating covariates into CCT models. More specifically, both person- and item-related parameters are introduced as random effects that can respectively account for patterns of inter-individual and inter-item variability.



  1. The guessing bias parameter in the GCM is a person-specific, cognitive latent variable, which should not be confused with the item-specific guessing rate parameter of the three parameter logistic model (Birnbaum 1968).

  2. If an interpretation of the estimated values on their original unit scale is of interest, these values can be approximated through the transformation of variables technique (see e.g., Mood, Graybill, & Boes, 1974). However, these approximations do not account for the limits of the unit scale; therefore, when the variances on the transformed scale are large, the approximations are biased towards the extremes.

  3. This is an example of an ‘adjunct’ island. In this data set, there were four types of islands investigated: adjunct, subject, whether, and complex noun-phrase islands (see Sprouse et al. 2011).

  4. In particular, there are four types of phrases: adjunct, subject, whether, and complex noun-phrases. The 64 items in the questionnaire are composed of each phrase-type having eight tokens as islands and eight tokens as non-islands. In each eight-token set, four tokens had a long distance between the wh-word and its canonical position, while four had a short distance. Since the data set is used here for demonstration purposes, for the sake of simplicity these four tokens were collapsed over each condition (see Sprouse et al. 2011 for details).


  1. Baer, R.D., Weller, S.C., de Alba Garcia, J.G., Glazer, M., Trotter, R., Pachter, L., et al. (2003). A cross-cultural approach to the study of the folk illness nervios. Culture, Medicine and Psychiatry, 27, 315–337.

  2. Batchelder, W.H., & Anders, R. (2012). Cultural consensus theory: comparing different concepts of cultural truth. Journal of Mathematical Psychology, 56, 316–332.

  3. Batchelder, W.H., Kumbasar, E., & Boyd, J. (1997). Consensus analysis of three-way social network data. The Journal of Mathematical Sociology, 22, 29–58.

  4. Batchelder, W.H., & Riefer, D.M. (1999). Theoretical and empirical review of multinomial process tree modeling. Psychonomic Bulletin & Review, 6(1), 57–86.

  5. Batchelder, W.H., & Romney, A.K. (1988). Test theory without an answer key. Psychometrika, 53, 71–92.

  6. Batchelder, W.H., Strashny, A., & Romney, A. (2010). Cultural Consensus Theory: aggregating continuous responses in a finite interval.

  7. Bimler, D. (2013). Two applications of the points-of-view model to subject variations in sorting data. Quality and Quantity, 47(2), 775–790.

  8. Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F.M. Lord & M.R. Novick (Eds.), Statistical theories of mental test scores. Reading: Addison-Wesley.

  9. Buhrmester, M., Kwang, T., & Gosling, S.D. (2011). Amazon’s Mechanical Turk: a new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6(1), 3–5.

  10. Comrey, A.L. (1962). The minimum residual method of factor analysis. Psychological Reports, 11, 15–18.

  11. Congdon, P. (2003). Applied Bayesian modelling (Vol. 394). New York: Wiley.

  12. De Boeck, P. (2008). Random item IRT models. Psychometrika, 73, 533–559.

  13. De Boeck, P., & Wilson, M. (2004). Explanatory item response models: a generalized linear and nonlinear approach. New York: Springer.

  14. Dey, D.K., Gelfand, A.E., Swartz, T.B., & Vlachos, P.K. (1998). A simulation-intensive approach for checking hierarchical models. Test, 7(2), 325–346.

  15. Fischer, G., & Molenaar, I. (1995). Rasch models: foundations, recent developments, and applications. New York: Springer.

  16. Gelman, A., Carlin, J., Stern, H., & Rubin, D. (2004). Bayesian data analysis. New York: Chapman & Hall.

  17. Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press.

  18. Hopkins, A. (2011). Use of network centrality measures to explain individual levels of herbal remedy cultural competence among the Yucatec Maya in Tabi, Mexico. Field Methods, 23(3), 307–328.

  19. Hruschka, D.J., Kalim, N., Edmonds, J., & Sibley, L. (2008). When there is more than one answer key: cultural theories of postpartum hemorrhage in Matlab, Bangladesh. Field Methods, 20, 315–337.

  20. Iannucci, A., & Romney, A. (1990). Consensus in the judgment of personality traits among friends and acquaintances. Journal of Quantitative Anthropology, 4, 279–295.

  21. Jackman, S. (2009). Bayesian analysis for the social sciences. New York: Wiley.

  22. Karabatsos, G., & Batchelder, W.H. (2003). Markov chain estimation methods for test theory without an answer key. Psychometrika, 68, 373–389.

  23. Klauer, K. (2010). Hierarchical multinomial processing tree models: a latent-trait approach. Psychometrika, 75, 70–98.

  24. Kruschke, J.K. (2011). Doing Bayesian data analysis: a tutorial with R and BUGS. New York: Academic Press.

  25. Lee, M.D. (2011). How cognitive modeling can benefit from hierarchical Bayesian models. Journal of Mathematical Psychology, 55, 1–7.

  26. Lunn, D.J., Thomas, A., Best, N., & Spiegelhalter, D. (2000). WinBUGS—a Bayesian modeling framework: concepts, structure, and extensibility. Statistics and Computing, 10, 325–337.

  27. Macmillan, N.A., & Creelman, C.D. (2005). Detection theory: a user’s guide (2nd ed.). Mahwah: Erlbaum.

  28. Merkle, E.C., Smithson, M., & Verkuilen, J. (2011). Hierarchical models of simple mechanisms underlying confidence in decision making. Journal of Mathematical Psychology, 55, 57–67.

  29. Miller, E. (2011). Maternal health and knowledge and infant health outcomes in the Ariaal people of northern Kenya. Social Science & Medicine, 73(8), 1266–1274.

  30. Mood, A.M., Graybill, F.A., & Boes, D.C. (1974). Introduction to the theory of statistics. New York: McGraw-Hill.

  31. Morey, R.D. (2011). A Bayesian hierarchical model for the measurement of working memory capacity. Journal of Mathematical Psychology, 55(1), 8–24.

  32. Oravecz, Z., Vandekerckhove, J., & Batchelder, W. H. (in press). Bayesian cultural consensus theory. Field Methods.

  33. Plummer, M. (2011). Rjags: Bayesian graphic models using MCMC (R package version 2.2.0-3). http://CRAN.R-project.org/package=rjags.

  34. Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.

  35. Raudenbush, S.W., & Bryk, A.S. (2002). Hierarchical linear models: applications and data analysis methods. Newbury Park: Sage.

  36. Riefer, D.M., Knapp, B.R., Batchelder, W.H., Bamber, D., & Manifold, V. (2002). Cognitive psychometrics: assessing storage and retrieval deficits in special populations with multinomial processing tree models. Psychological Assessment, 14(2), 184.

  37. Robert, C.P., & Casella, G. (2004). Monte Carlo statistical methods. New York: Springer.

  38. Romney, A.K., & Batchelder, W.H. (1999). Cultural consensus theory. In R. Wilson & F. Keil (Eds.), The MIT encyclopedia of the cognitive sciences (pp. 208–209). Cambridge: MIT Press.

  39. Romney, A.K., Weller, S.C., & Batchelder, W.H. (1986). Culture as consensus: a theory of culture and informant accuracy. American Anthropologist, 88(2).

  40. Rouder, J., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12, 573–604.

  41. Scheibehenne, B., Rieskamp, J., & Wagenmakers, E. J. (2013). Testing adaptive toolbox models: a Bayesian hierarchical approach. Psychological Review, 120, 39–64.

  42. Smith, J.B., & Batchelder, W.H. (2010). Beta-MPT: multinomial processing tree models for addressing individual differences. Journal of Mathematical Psychology, 54(1), 167–183.

  43. Snijders, T., & Bosker, R. (1999). Multilevel analysis: an introduction to basic and advanced multilevel modeling. Thousand Oaks: Sage.

  44. Spearman, C.E. (1904). ‘General intelligence’ objectively determined and measured. The American Journal of Psychology, 15, 72–101.

  45. Spiegelhalter, D.J., Best, N.G., Carlin, B.P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B, 64, 583–640.

  46. Sprouse, J., Fukuda, S., Ono, H., & Kluender, R. (2011). Reverse island effects and the backward search for a licensor in multiple wh-questions. Syntax, 14(2), 179–203.

  47. Weller, S.C. (2007). Cultural consensus theory: applications and frequently asked questions. Field Methods, 19, 339–368.

  48. Weller, S.C., Baer, R.D., Pachter, L.M., Trotter, R., Glazer, M., de Alba Garcia, J.G., et al. (1999). Latino beliefs about diabetes. Diabetes Care, 22, 722–728.

  49. Weller, S.C., Pachter, L.M., Trotter, R.T., & Baer, R.D. (1993). Empacho in four Latino groups: a study of intra- and inter-cultural variation in beliefs. Medical Anthropology, 15(2), 109–136.

  50. Wetzels, R., Vandekerckhove, J., Tuerlinckx, F., & Wagenmakers, E.J. (2010). Bayesian parameter estimation in the expectancy valence model of the Iowa gambling task. Journal of Mathematical Psychology, 54, 14–27.

  51. Yoshino, R. (1989). An extension of the “test theory without answer key” by Batchelder and Romney and its application to an analysis of data on national consciousness. Proceedings of the Institute of Statistical Mathematics, 37, 171–188. (in Japanese).


Work on this paper was supported by grants to the authors from the Army Research Office (ARO) and from the Oak Ridge Institute for Science and Education (ORISE). We would like to thank Jon Sprouse for making his grammaticality data set available to us.

We would also like to thank the four anonymous reviewers and Joachim Vandekerckhove for their useful comments.

Author information

Correspondence to Zita Oravecz.


Appendix A. Proof of Identifiability for the General Condorcet Model

Let the parameters of the GCM be (D,G,Z), with parameter spaces \((0,1)^{N}\), \((0,1)^{N}\), and \(\{0,1\}^{M}\), respectively. Let Ω be the parameter space of (D,G,Z). Let Y be an N×M matrix of 1s and 0s. Let S be the space of all Y. Let \(h:\Omega\rightarrow\Pi\), where Π is the space of all probability distributions over Y. Let p(y∣(D,G,Z)) be a particular probability density function in Π.


In this context a model is identified in case h is one-to-one, meaning that two different sets of parameters necessarily produce different probability density functions over Y.

Observation 1

The model is not identified if we allow all items to be false or if we allow all items to be true. Let Z be all 1s and Z′ be all 0s; then so long as

$$ \forall i \quad D_i + (1-D_i)g_i = \bigl(1-D'_i\bigr)g'_i, $$

the model gives identical probabilities.
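As a numerical illustration of Observation 1, the following minimal sketch checks that an all-true key and an all-false key can yield the same response probability for an informant. The function name `p_true` and all numeric values are hypothetical choices made for this example; only the probability expression itself comes from the equations in this appendix.

```python
def p_true(z, d, g):
    """P(Y = 1) for one informant-item pair: z*d + (1 - d)*g,
    with answer-key entry z, competence d and guessing bias g."""
    return z * d + (1 - d) * g

# Hypothetical informant under the all-true key (Z_k = 1 for every item).
d, g = 0.5, 0.4
target = p_true(1, d, g)       # D + (1 - D) g

# Under the all-false key, any (D', g') with (1 - D') g' = target matches it.
d_alt = 0.2
g_alt = target / (1 - d_alt)   # must stay inside (0, 1) to be admissible

same = abs(p_true(1, d, g) - p_true(0, d_alt, g_alt)) < 1e-12
print(same, 0 < g_alt < 1)
```

Because the two parameter sets give the same probability for every item, no amount of response data could distinguish them, which is why the two extreme keys are excluded in Observation 2.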

Observation 2

If we exclude these two extremes from the space of possible Zs, the model is identified. We need to show that

$$\begin{aligned} & \forall\mathbf{Y} \in\mathbf{S}: \\ &\quad p({\mathbf{Y}} \mid\mathbf{D}, \mathbf{G}, \mathbf{Z}) = p\bigl({\mathbf {Y}} \mid \mathbf{D}', \mathbf{G}', \mathbf{Z}' \bigr) \\ &\quad \Rightarrow\mathbf{D} = \mathbf{D}',\quad \mathbf{G} = \mathbf{G}',\quad \mathbf{Z} = \mathbf{Z}'. \end{aligned}$$

Suppose Z=Z′. Then, since the all-true and all-false keys are excluded, there are \(Z_{k}=1=Z'_{k}\) and \(Z_{l}=0=Z'_{l}\). From these we have for all informants i: \(D_{i} + (1-D_{i})g_{i} = D'_{i} + (1-D'_{i})g'_{i}\) and \((1-D_{i})g_{i} = (1-D'_{i})g'_{i}\). Subtracting the second equation from the first gives \(D_{i} = D'_{i}\), and then \(g_{i}=g'_{i}\). So if the model is not identified it must be that Z≠Z′. Pick k (without loss of generality, since primed and unprimed can be swapped) with \(Z_{k}=1\), \(Z'_{k}=0\). For all informants i we have

$$ D_i + (1-D_i)g_i = \bigl(1-D'_i\bigr)g'_i. \tag{A.1} $$

Next pick j such that \(Z'_{j}=1\); this is possible because we have eliminated the case where Z′ is all zeros. If \(Z_{j}=1\) then for all i,

$$ D_i + (1-D_i)g_i = D'_i+ \bigl(1-D'_i \bigr)g'_i, $$

and coupled with Equation (A.1), this is not possible. On the other hand, if \(Z_{j}=0\), then for all informants i, we have

$$ (1-D_i)g_i = D'_i+ \bigl(1-D'_i\bigr)g'_i, $$

and coupled with Equation (A.1), this is not possible. Thus the model is identified.
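The two "not possible" steps can be spelled out as follows; this is a sketch that uses only Equation (A.1), the two case equations above, and the assumption that every competence parameter lies in the open interval (0,1). In the first case the left-hand sides coincide, so the right-hand sides are equated; in the second case the case equation is subtracted from Equation (A.1):

$$\begin{aligned} Z_j = 1:&\quad \bigl(1-D'_i\bigr)g'_i = D'_i + \bigl(1-D'_i\bigr)g'_i \;\Rightarrow\; D'_i = 0, \\ Z_j = 0:&\quad D_i = \bigl(1-D'_i\bigr)g'_i - D'_i - \bigl(1-D'_i\bigr)g'_i \;\Rightarrow\; D_i = -D'_i. \end{aligned}$$

Both conclusions contradict \(D_i, D'_i \in (0,1)\).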

Appendix B. Posterior Distributions of the HGCMs

The posterior distribution given the data for the HGCM with Gaussian population distributions can be derived as follows. For notational convenience, all of the person- and item-specific parameters are respectively collected into corresponding vectors (i.e., θ, g, δ, and Z). First, the conditional posterior for the model in which the probability variables have population distributions assigned on the transformed variable scale is derived as

$$\begin{aligned} &\operatorname{Pr}\bigl(\boldsymbol{\theta}, \boldsymbol{\beta}_{\theta}, \sigma^2_{\theta}, \boldsymbol{g}, \boldsymbol{ \beta}_g, \sigma^2_g, \boldsymbol{\delta}, \boldsymbol{\beta}_{\delta}, \sigma^2_{\delta}, \mathbf{Z}, \pi \mid \mathbf{Y}\bigr) \\ &\quad\propto \prod_{i=1}^I \prod _{k=1}^K \bigl( Z_k D_{ik} + (1 - D_{ik})g_i \bigr)^{(Y_{ik} \equiv1)} \\ &\qquad{}\times\bigl(-Z_k D_{ik} + D_{ik} +(1 - D_{ik}) (1-g_i) \bigr)^{(Y_{ik} \equiv0)} \\ &\qquad{} \times\prod_{i=1}^I \mathrm{Normal}_2 \left (\left [ \begin{array}{c} \mathrm{logit} (\theta_{i}) \\ \mathrm{logit} (g_i) \end{array} \right ] \biggm| \left[ \begin{array}{c} \boldsymbol{\beta}_{l (\theta)} \\ \boldsymbol{\beta}_{l(g)} \end{array} \right] , \boldsymbol{\Sigma}_{l(\theta g)} \right ) \\ &\qquad{}\times\prod_{k=1}^K \mathrm{Normal}\bigl(\mathrm{logit}(\delta_{k}) \mid \boldsymbol{ \beta}_{l(\delta)}, \sigma^2_{\delta}\bigr) \prod _{k=1}^K \mbox {Bernoulli}(Z_k \mid\pi) \\ &\qquad{}\times\mathrm{Normal}_{J+1}(\boldsymbol{\beta}_{l(\theta)} \mid0, 10 \mathbf{I}_{J+1}) \mathrm{Normal}_{J+1}(\boldsymbol{ \beta}_{l(g)} \mid0, 1000 \mathbf{I}_{J+1}) \\ &\qquad{}\times\mathrm{Normal}_{H+1}(\boldsymbol{\beta}_{l(\delta)} \mid0, 10 \mathbf{I}_{H+1}) \mbox{Uniform}( \pi\mid0,1) \\ &\qquad{}\times \mbox{Inverse-Wishart}(\mathbf{I}_{2}, 3) \mbox{Inverse-Gamma}\bigl(\sigma ^2_{\delta} \mid0.01, 0.01 \bigr). \end{aligned}$$

After the proportionality sign, the first double product describes the likelihood of the parameters given the data, based on Equations (3). It is followed by the products of the population densities of the person-specific parameters, as specified in Equations (8). The next line describes the population densities of the item-specific parameters as in Equations (9) and (10). Finally, the last two lines multiply all the above with the prior densities as chosen in Equations (17), (19), (21) and (20).
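To make the likelihood term concrete, here is a minimal sketch of the log of the double product, written in Python with the standard library only. The function name `gcm_log_likelihood` and the small data set are hypothetical; the competence values D are passed in directly rather than computed from θ and δ. The second factor of the likelihood, \(-Z_k D_{ik} + D_{ik} + (1-D_{ik})(1-g_i)\), equals one minus the first factor, which is what the code uses.

```python
import math

def gcm_log_likelihood(Y, D, g, Z):
    """Sum over informants i and items k of log P(Y_ik | Z_k, D_ik, g_i)."""
    total = 0.0
    for i, row in enumerate(Y):
        for k, y in enumerate(row):
            p1 = Z[k] * D[i][k] + (1 - D[i][k]) * g[i]   # P(Y_ik = 1)
            total += math.log(p1 if y == 1 else 1 - p1)
    return total

# Hypothetical example: 2 informants, 2 items.
Y = [[1, 0], [1, 1]]            # observed true/false responses
D = [[0.8, 0.6], [0.5, 0.5]]    # competence of informant i on item k
g = [0.5, 0.4]                  # guessing bias per informant
Z = [1, 0]                      # candidate answer key
ll = gcm_log_likelihood(Y, D, g, Z)
print(ll)
```

In a full posterior evaluation this quantity would be added to the log population densities and log prior densities listed in the remaining lines of the expression.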

The only modification for the prior settings of the HGCM with the beta population distributions concerns the variance parameters. As \(\theta_{i}\) and \(g_{i}\) are sampled univariately, priors have to be set for their ‘precision’ parameters (that determine their population variance). As is typically done for precision parameters, a moderately diffuse Gamma distribution can be chosen, where

$$\begin{aligned} \tau_\theta&\sim\mbox{Gamma}(1, 0.1). \end{aligned} \tag{B.1}$$

Then \(\tau_{g}\) as well as \(\tau_{\delta}\) for item difficulty are set similarly.

Note that the \(\mathrm{HGCM}_{\mathcal{B}}\) is different only in terms of the population distributions for the item- and person-specific parameters. As discussed earlier, the beta distribution is parameterized in terms of regression coefficients and a precision parameter, as specified in Equations (16) and (B.1), and the posterior is written as

$$\begin{aligned} &\operatorname{Pr}(\boldsymbol{\theta}, \boldsymbol{\beta}_{\theta}, \tau_{\theta}, \boldsymbol{g}, \boldsymbol{\beta}_g, \tau_g, \boldsymbol{\delta}, \boldsymbol{\beta}_{\delta}, \tau_{\delta}, \mathbf{Z}, \pi \mid\mathbf{Y})\\ &\quad\propto \prod _{i=1}^I \prod _{k=1}^K \bigl( Z_k D_{ik} + (1 - D_{ik})g_i \bigr)^{(Y_{ik} \equiv1)} \\ &\qquad{}\times\bigl(-Z_k D_{ik} + D_{ik}+(1 - D_{ik}) (1-g_i) \bigr)^{(Y_{ik} \equiv0)} \\ &\qquad{} \times\prod_{i=1}^I \mathrm{Beta} ( \theta_{i} \mid\boldsymbol {\beta}_{\theta} , \tau_{\theta} ) \\ &\qquad{} \times\prod_{i=1}^I \mathrm{Beta} (g_{i} \mid\boldsymbol{\beta }_{g} , \tau_{g} ) \\ &\qquad{} \times\prod_{k=1}^K \mathrm{Beta} ( \delta_{k} \mid \boldsymbol{\beta }_{\delta}, \tau_{\delta}) \prod_{k=1}^K \mathrm{Bernoulli}(Z_k \mid\pi) \\ &\qquad{}\times\mathrm{Normal}_{J+1}(\boldsymbol{\beta}_{\theta} \mid0, 10 \mathbf{I}_{J+1}) \mathrm{Normal}_{J+1}(\boldsymbol{ \beta}_{g} \mid0, 1000 \mathbf{I}_{J+1}) \\ &\qquad{}\times\mathrm{Normal}_{H+1}(\boldsymbol{\beta}_{\delta} \mid0, 10 \mathbf{I}_{H+1}) \mbox{Uniform}( \pi\mid0,1) \\ &\qquad{} \times\mbox{Gamma}(\tau_\theta \mid1, 0.1) \mbox{Gamma}( \tau_g \mid1, 0.1)\mbox{Gamma}(\tau_\delta \mid1, 0.1). \end{aligned}$$
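The beta population densities above are indexed by regression coefficients and a precision rather than by the usual two shape parameters. The sketch below shows one way to map between the two, assuming the common mean-precision form in which the mean is the inverse logit of the linear predictor and the shapes are mean × precision and (1 − mean) × precision. This form is an assumption of the example, as are the function names and the covariate and coefficient values.

```python
import math

def inv_logit(x):
    """Inverse logit, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def beta_shapes(covariates, coeffs, precision):
    """Shape parameters (a, b) of a beta density with the given
    covariate-dependent mean and precision (assumed parameterization)."""
    mean = inv_logit(sum(x * b for x, b in zip(covariates, coeffs)))
    return mean * precision, (1.0 - mean) * precision

# Hypothetical person with an intercept term and one covariate.
a, b = beta_shapes([1.0, 0.5], [0.2, -0.4], precision=20.0)
mean = a / (a + b)                           # recovers the regression mean
variance = mean * (1 - mean) / (a + b + 1)   # shrinks as precision grows
print(a, b, mean, variance)
```

Under this parameterization a + b equals the precision, so a larger precision concentrates the population distribution around its covariate-dependent mean.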

Appendix C. JAGS Code for the HGCMs

C.1 Normal Population Distributions


   model{
   # Likelihood: response of informant i to item k
   for (i in 1:n){
      for (k in 1:m){
        # Denominator reconstructed (lost in extraction), assuming the
        # Rasch-type competence form of the GCM
        D[i,k]  <- (theta[i]*(1-delta[k]))/
                   (theta[i]*(1-delta[k]) + delta[k]*(1-theta[i]))
        pY[i,k] <- g[i] - D[i,k]*g[i] + D[i,k]*Z[k]
        Y[i,k]  ~ dbern(pY[i,k])}}
   # Person-specific parameters: bivariate normal on the logit scale
   for (i in 1:n){
          mean_theta[i]    <- Xtheta[i,]%*%coeff_theta
          mean_g[i]        <- Xg[i,]%*%coeff_g
          mean_thetag[i,1] <- mean_theta[i]
          mean_thetag[i,2] <- mean_g[i]
          logit_thetag[i,1:2] ~ dmnorm(mean_thetag[i,1:2],
                                       precM_thetag[1:2,1:2])
          theta[i] <- ilogit(logit_thetag[i,1])
          g[i]     <- ilogit(logit_thetag[i,2])}
   # Item-specific parameters: answer key and item difficulty
   for (k in 1:m){
          Z[k]           ~ dbern(PI)
          delta[k]       <- ilogit(logit_delta[k])
          mean_delta[k]  <- Xdelta[k,]%*%coeff_delta
          logit_delta[k] ~ dnorm(mean_delta[k],prec_delta)}
   # Priors on the regression coefficients (dnorm takes a precision)
   for (cov in 1:nrofthetacov){
        coeff_theta[cov] ~ dnorm(0, 0.01)}
   for (cov in 1:nrofgcov){
        coeff_g[cov] ~ dnorm(0, 0.01)}
   # Covariance, variances and correlation from the precision matrix
   cov_thetag[1:2,1:2] <- inverse(precM_thetag[1:2, 1:2])
   var_theta           <- cov_thetag[1,1]
   var_g               <- cov_thetag[2,2]
   corr_thetag         <- cov_thetag[1,2]/
                          sqrt(var_theta*var_g)
   precM_thetag[1:2, 1:2] ~ dwish(ID[1:2,1:2], 3)
    ID[1,1] <- 1
    ID[2,2] <- 1
    ID[1,2] <- 0
    ID[2,1] <- 0
   for (cov in 1:nrofdeltacov){
        coeff_delta[cov] ~ dnorm(0, 0.01)}
   prec_delta ~ dgamma(0.01, 0.01)
   var_delta  <- 1/prec_delta
   PI ~ dunif(0, 1)}
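Before fitting a model of this kind to real data, it can be useful to check it on simulated responses with known parameters. The following Python sketch (standard library only) generates such data. It assumes the Rasch-type competence form D = θ(1−δ) / (θ(1−δ) + δ(1−θ)), which is an assumption of this example, and uses the same response probability as `pY[i,k]` in the code above. The function names and all parameter values are hypothetical.

```python
import random

def competence(theta, delta):
    """Assumed Rasch-type competence of ability theta on difficulty delta."""
    num = theta * (1 - delta)
    return num / (num + delta * (1 - theta))

def simulate_responses(theta, g, delta, Z, seed=0):
    """Draw a binary response matrix Y (informants x items)."""
    rng = random.Random(seed)
    Y = []
    for th, gi in zip(theta, g):
        row = []
        for dk, zk in zip(delta, Z):
            d = competence(th, dk)
            p = gi - d * gi + d * zk   # same expression as pY[i,k]
            row.append(1 if rng.random() < p else 0)
        Y.append(row)
    return Y

theta = [0.9, 0.7, 0.5]        # informant abilities
g = [0.5, 0.4, 0.6]            # guessing biases
delta = [0.3, 0.5, 0.7, 0.5]   # item difficulties
Z = [1, 0, 1, 0]               # true answer key
Y = simulate_responses(theta, g, delta, Z)
print(Y)
```

Passing a matrix like `Y` to the model, together with design matrices containing at least an intercept column, allows the recovered answer key to be compared against the known `Z`.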

C.2 Beta Population Distributions


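   model{
   # Likelihood: response of informant i to item k
   for (i in 1:n){
      for (k in 1:m){
        # Denominator reconstructed (lost in extraction), assuming the
        # Rasch-type competence form of the GCM
        D[i,k]  <- (theta[i]*(1-delta[k]))/
                   (theta[i]*(1-delta[k]) + delta[k]*(1-theta[i]))
        pY[i,k] <- g[i] - D[i,k]*g[i] + D[i,k]*Z[k]
        Y[i,k]  ~ dbern(pY[i,k])}}
   # Person-specific parameters: beta with covariate-dependent mean
   for (i in 1:n){
        mean_theta[i] <- ilogit(Xtheta[i,]%*%coeff_theta)
        mean_g[i]     <- ilogit(Xg[i,]%*%coeff_g)
        theta[i] ~ dbeta(mean_theta[i]*prec_theta,
                         (1-mean_theta[i])*prec_theta)
        g[i]     ~ dbeta(mean_g[i]*prec_g,
                         (1-mean_g[i])*prec_g)}
   # Item-specific parameters: answer key and item difficulty
   for (k in 1:m){
        Z[k]          ~ dbern(PI)
        mean_delta[k] <- ilogit(Xdelta[k,]%*%coeff_delta)
        delta[k]      ~ dbeta(mean_delta[k]*prec_delta,
                              (1-mean_delta[k])*prec_delta)}
   # Priors on the regression coefficients (dnorm takes a precision)
   for (cov in 1:nrofthetacov){
        coeff_theta[cov] ~ dnorm(0, 0.01)}
   for (cov in 1:nrofgcov){
        coeff_g[cov] ~ dnorm(0, 0.01)}
   for (cov in 1:nrofdeltacov){
        coeff_delta[cov] ~ dnorm(0, 0.01)}
   # Priors on the precisions, set through their square roots
   prec_theta_root ~ dgamma(1, 0.1)
   prec_theta      <- pow(prec_theta_root,2)
   prec_g_root ~ dgamma(1, 0.1)
   prec_g      <- pow(prec_g_root,2)
   prec_delta_root ~ dgamma(1, 0.1)
   prec_delta      <- pow(prec_delta_root,2)
   PI ~ dunif(0, 1)}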
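When reading these prior settings it helps to recall that JAGS parameterizes `dnorm` by mean and precision and `dgamma` by shape and rate. The short Python sketch below (standard library only) works out what the values in the code imply on more familiar scales; the helper names are hypothetical and the formulas are the standard moments of the normal and gamma distributions.

```python
import math

def normal_sd_from_precision(tau):
    """Standard deviation of a normal with precision tau."""
    return 1.0 / math.sqrt(tau)

def gamma_moments(shape, rate):
    """Mean E[X] and second raw moment E[X^2] of Gamma(shape, rate)."""
    mean = shape / rate
    second = shape * (shape + 1) / rate ** 2
    return mean, second

# dnorm(0, 0.01): prior standard deviation of each regression coefficient.
coef_sd = normal_sd_from_precision(0.01)

# The code draws the square root of each precision from dgamma(1, 0.1) and
# squares it, so the prior mean of the precision is E[root^2].
root_mean, prec_mean = gamma_moments(1, 0.1)
print(coef_sd, root_mean, prec_mean)
```

Note that the code places the Gamma(1, 0.1) prior on the square root of each precision, whereas the text of Appendix B states the prior on the precision itself; the calculation above follows the code.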

About this article

Cite this article

Oravecz, Z., Anders, R. & Batchelder, W.H. Hierarchical Bayesian Modeling for Test Theory Without an Answer Key. Psychometrika 80, 341–364 (2015). https://doi.org/10.1007/s11336-013-9379-4

Key words

  • Cultural Consensus Theory
  • Bayesian statistics
  • hierarchical model
  • covariate modeling