1 Introduction

Multilevel models—also known as hierarchical or mixed models—are used to model data with levels or hierarchies. Such structures arise naturally in many everyday situations given the hierarchical or nested organization of the sampling units, and these models are widely employed in fields such as the social sciences, medicine, education, and reliability (Gaviria Morera 2005). According to Gaviria Morera (2005) and McElreath (2015), some of their most notable advantages are that they (1) provide better estimates for repeat sampling, that is, when several observations arise from the same individual, (2) explicitly model variation when the research questions involve variation among individuals, and (3) make it possible to avoid data transformations.

Multilevel models are mainly characterized by modeling random effects within groups at different levels. In Bayesian statistics, these models are said to arise naturally because of an inherent multilevel structure: the prior distribution represents one level of the model and the likelihood function constitutes the final stage, from which the posterior distribution is obtained (Pinheiro and Bates 2006; Mason 1985; Bornmann et al. 2016; Gelman et al. 2013; Ntzoufras 2011).

Given the hierarchical nature of everyday situations, Bayesian multilevel logistic regression models have been increasingly used in recent years (Ayalew 2020; Aychiluhm et al. 2020; Gañan-Cardenas et al. 2021; Młynarczyk et al. 2021; Jabessa and Jabessa 2021; Fagbamigbe et al. 2021; Grogan-Kaylor et al. 2021; Sherwood et al. 2021). For example, Fagbamigbe et al. (2021) employed a Bayesian multilevel model to examine the main risk factors and regional variations in maternal mortality in Ethiopia. Similarly, Grogan-Kaylor et al. (2021) analyzed the association of physical punishment and nonphysical discipline with child socio-emotional functioning using Bayesian multilevel logistic regression models.

Other authors who have also employed multilevel models in their research studies include: Jara et al. (2008), Birlutiu et al. (2010), Tang and Duan (2014), De la Cruz et al. (2016), Lu et al. (2017) and Wang et al. (2019). For instance, Birlutiu et al. (2010) modeled multitask learning in hearing-impaired subjects using a hierarchical approach. Wang et al. (2019) presented a 10-minute trivia game-based activity for the intuitive understanding of confidence intervals. They fitted a mixed-effects Bayesian logistic model using noninformative priors for the coefficients and variance parameters. In this paper, we will develop an activity similar to that proposed by Wang et al. (2019), based on the results of two questionnaires that were administered to university students throughout the first academic semester of 2018.

For their part, Pérez et al. (2017) proposed the Scaled Beta2 (SBeta2) distribution as an alternative to the inverse-gamma distribution for modeling variances and standard deviations (or, more generally, scales). They presented it as a flexible and tractable family that can be used to model scales in both hierarchical and nonhierarchical settings. In addition, they demonstrated that, for certain values of the hyperparameters, the mixture of a normal and an SBeta2 distribution yields a closed-form marginal. Hence, we will here employ the SBeta2 and inverse-gamma distributions to model the variance parameters in the 2-level Bayesian multilevel logistic model. In addition, the resulting estimates of the models fitted with these two distributions will be compared by means of a simulation study to determine which distribution has the best predictive ability.

The rest of this paper is structured as follows. In Sect. 2, we provide an overview of Bayesian multilevel logistic regression and present the general models employed here. In Sect. 3, we introduce the SBeta2 distribution used to model the variance parameters. In Sect. 4, we present a simulation study to compare the performance of the proposed models. In Sect. 5, we apply the proposed models to real-life data. Finally, we outline the conclusions of this study.

2 Bayesian multilevel logistic regression

Multilevel models are extensions of regression models, with data structured in groups and coefficients that can vary by group. From the Bayesian perspective, Bayesian models are said to have an inherently multilevel or hierarchical structure. The prior distribution, \(\xi (\theta \mid \phi )\), of a model with prior parameters \(\phi \) can be considered one level of the hierarchy, with the likelihood as the final stage of the Bayesian model; combining the two through Bayes’ theorem yields the posterior distribution, \(\xi (\theta \mid y) \propto f (y\mid \theta )\xi (\theta \mid \phi )\). A key aspect of these models is that \(\phi \) is unknown; therefore, it has its own prior distribution, \(\xi (\phi )\). The appropriate Bayesian posterior distribution is that of the vector \((\phi , \theta )\) (Mason 1985; Bornmann et al. 2016; Gelman et al. 2013; Ntzoufras 2011). The joint prior distribution is given by

$$\begin{aligned} \xi (\phi ,\theta )=\xi (\theta \mid \phi )\xi (\phi ). \end{aligned}$$
(1)

Then, the joint posterior distribution is given by

$$\begin{aligned} \xi (\phi ,\theta \mid y)\propto \xi (\phi ,\theta )f(y\mid \phi ,\theta )=\xi (\theta \mid \phi )\xi (\phi )f(y\mid \theta ). \end{aligned}$$
(2)

This latter simplification holds because y and \(\phi \) are conditionally independent given \(\theta \). That is, the data distribution \(f(y\mid \phi ,\theta )\) depends only on \(\theta \); the hyperparameters \(\phi \) affect y only through \(\theta \) (Gelman et al. 2013).
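As an illustration (added here for concreteness), for the 2-level models introduced below, with \(\phi =(\mu _{\theta }, \sigma ^{2}_{\theta })\) and \(\theta =(\theta _{1},\dots ,\theta _{J})\), Eq. (2) takes the form

$$\begin{aligned} \xi (\mu _{\theta },\sigma ^{2}_{\theta },\theta \mid y)\propto \left[ \prod _{j=1}^{J}f(y_{j}\mid \theta _{j})\right] \left[ \prod _{j=1}^{J}N(\theta _{j}\mid \mu _{\theta },\sigma ^{2}_{\theta })\right] \xi (\mu _{\theta })\,\xi (\sigma ^{2}_{\theta }), \end{aligned}$$

where the group-level normal distribution plays the role of \(\xi (\theta \mid \phi )\) and the priors on \(\mu _{\theta }\) and \(\sigma ^{2}_{\theta }\) play the role of \(\xi (\phi )\).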

In Bayesian multilevel logistic regression models, the assumption of exchangeability is employed when there is no information on the structure of the parameters of interest. It is assumed that all random parameters come from a common distribution and that their ordering does not affect the model or the results. In other words, the joint prior distribution of these parameters is invariant to random permutations of them (Ntzoufras 2011). For further elaboration on this concept, see the studies by Bernardo and Smith (2000) and Gelman et al. (2013).

Bayesian multilevel (or hierarchical) logistic regression models can be used to model clustered data with a binary response variable. Logistic regression is considered a standard way of modeling this type of variable (i.e., data that take the value 1 or 0). Such variables arise, for instance, in social, medical, and natural-science studies, in which the response is usually the presence or absence of a characteristic of interest (Peng et al. 2002; Gelman and Hill 2006; Pregibon 1981; Ntzoufras 2011). Accordingly, the dependent or response variable, y, is assigned a value of 1 if the characteristic of interest is present and a value of 0 if it is not.

In logistic regression, a single outcome variable, \(y_{i}\) \((i = 1,\dots , n)\), follows a Bernoulli probability function that takes the value of 1 with probability \(\pi _{i}\) or the value of 0 with probability \(1 - \pi _{i}\) and is denoted by \(y_{i} \sim \text{ Bernoulli }(\pi _{i})\). This type of regression adopts the logit link, which, besides being the most suitable choice because it is the canonical link, has a simple and convenient interpretation based on the odds, \(\pi _{i}/(1 - \pi _{i})\) (King and Zeng 2001; Ntzoufras 2011).
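As a brief numerical illustration (added here for concreteness), if \(\pi _{i}=0.8\), the odds are \(\pi _{i}/(1-\pi _{i})=4\) and \(\text{ logit }(\pi _{i})=\log 4 \approx 1.39\); conversely, given a linear predictor \(\eta _{i}\), the probability is recovered through the inverse link

$$\begin{aligned} \pi _{i}=\frac{\exp (\eta _{i})}{1+\exp (\eta _{i})}. \end{aligned}$$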

Let us suppose that, in a study seeking to model the correct answers in a questionnaire administered to groups of students, \(y_{ij}=1\) (\(i=1,\dots , N_{j}\); \(j=1,\dots , J\)) if the \(i^{th}\) question is correctly answered by the individual in the \(j^{th}\) group and \(y_{ij}=0\) if it is not. In this case, the Bayesian hierarchical logistic model is given by

$$\begin{aligned} y_{j} \sim \text{ Binomial }(\pi _{j},N_{j}), \end{aligned}$$
(3)

where \(N_{j}\) is the number of questions in group j and \(y_{j}=\sum \limits _{i=1}^{N_{j}}y_{ij}\). The link function is given by

$$\begin{aligned} \text{ logit }(\pi _{j})= & {} \log \left( \frac{\pi _{j}}{1-\pi _{j}}\right) =\theta _{j}, \end{aligned}$$
(4)
$$\begin{aligned}&\theta _{j} \sim N(\mu _{\theta }, \sigma ^{2}_{\theta }), \end{aligned}$$
(5)

for \(j=1,\ldots ,J\), where J is the number of groups, \(\mu _{\theta }\) follows a noninformative normal distribution, and \(\sigma ^{2}_{\theta }\) has a distribution defined on the positive values.

If the questionnaire is assumed to be administered twice within each group, the previous model is then given by

$$\begin{aligned} y_{kj} \sim \text{ Binomial }(\pi _{kj},N_{kj}), k=1,2 \ \text{ and } \ j=1, \ldots ,J. \end{aligned}$$
(6)

The link function is given by

$$\begin{aligned} \text{ logit }(\pi _{kj})= & {} \log \left( \frac{\pi _{kj}}{1-\pi _{kj}}\right) =a_{j}+\theta _{j}I(k=1), \end{aligned}$$
(7)
$$\begin{aligned}&\theta _{j} \sim N(\mu _{\theta }, \sigma ^{2}_{\theta }), \end{aligned}$$
(8)

where \(y_{kj}\) is the number of correct answers of the \(j^{th}\) group in the \(k^{th}\) questionnaire. Parameter \(a_{j}\) denotes the log odds of a correct answer in the \(j^{th}\) group for the second questionnaire, while \(\theta _{j}\) is the change in those log odds for the first questionnaire, so that \(\exp (\theta _{j})\) is the odds ratio of a correct answer between the two questionnaires.

Noninformative prior distributions are commonly employed for the parameters \(\mu _{\theta }\), \(a_{j}\), and \(\sigma ^{2}_{\theta }\). In the specialized literature, \(\mu _{\theta }\) and \(a_{j}\) are typically assigned \(N(0, 1000)\) priors, whereas \(\sigma ^{2}_{\theta }\) is often modeled using inverse-gamma, half-Cauchy, uniform, or SBeta2 distributions (Ntzoufras 2011; Pérez et al. 2017). We will here use the SBeta2 and inverse-gamma distributions to model the variance parameters.

In this paper, the data from the application in Sect. 5 are modeled using Bayesian logistic models and a two-step approach. First, both 1-level and 2-level models are fitted to each questionnaire separately. Then, a similar procedure is carried out, but this time including an indicator variable that accounts for the results of the two questionnaires jointly. In both scenarios, the results are compared by means of the Deviance Information Criterion (DIC) in order to identify which model exhibits the best predictive ability. In addition to being employed for assessing model adequacy (Spiegelhalter et al. 2002), this criterion is considered a generalization of the Akaike Information Criterion (AIC) and is used to compare hierarchical or multilevel models (Ntzoufras 2011).
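For completeness, and following Spiegelhalter et al. (2002), the DIC is defined as

$$\begin{aligned} \text{ DIC }=\overline{D(\theta )}+p_{D}, \qquad p_{D}=\overline{D(\theta )}-D(\bar{\theta }), \end{aligned}$$

where \(D(\theta )=-2\log f(y\mid \theta )\) is the deviance, \(\overline{D(\theta )}\) is its posterior mean, \(\bar{\theta }\) is the posterior mean of the parameters, and \(p_{D}\) is the effective number of parameters; lower DIC values indicate better expected predictive performance.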

3 Scaled beta2 distribution

The SBeta2 distribution is a scaled version of the inverse beta distribution and is employed to model precisions and variances in Bayesian hierarchical models. This distribution extends previous proposals, such as those of Gelman (2006) and Berger (2006), for modeling the variance parameter. It is defined as

$$\begin{aligned} \text{ SBeta2 }(\psi \mid p,q,b)=\frac{\Gamma (p+q)}{\Gamma (p)\Gamma (q)b} \frac{\left( \frac{\psi }{b}\right) ^{(p-1)}}{\left( \frac{\psi }{b}+1\right) ^{(p+q)}} \ \text{ for } \ \psi>0, b>0, p>0, q>0. \end{aligned}$$
(9)

Variable \(\psi \) can be represented as the scaled odds, \(\psi =b\,\frac{\pi }{1-\pi }\), where \(\pi \sim \text{ Beta }(\pi \mid p,q)\). The expected value and variance of \(\psi \) are given by

$$\begin{aligned} E(\psi )=\frac{p\,b}{q-1} \ \text{ for } \ q>1. \end{aligned}$$
(10)
$$\begin{aligned} Var(\psi )=\frac{p(p+q-1)\,b^{2}}{(q-1)^{2}(q-2)} \ \text{ for } \ q>2. \end{aligned}$$
(11)

In the \(\text{ SBeta2 }(p,q,b)\) distribution, parameter p controls the behavior of the distribution at the origin; parameter q, that in the right-hand tail; and parameter b, the scale. This distribution is characterized by its reciprocity, i.e., if \(\psi \sim \text{ SBeta2 }(p,q,b)\), then \(1/\psi \sim \text{ SBeta2 }(q,p,1/b).\) Other properties of this distribution are described in the study by Pérez et al. (2017). The expression \(\psi =\frac{\pi }{1-\pi }b\), with \(\pi \sim \text{ Beta }(p,q)\), can be used to generate random variables \(\psi \sim \text{ SBeta2 }(p,q,b)\). The main limitations of the SBeta2 are that, in practice, it is difficult to choose the value of the scale parameter and that the SBeta2 is not a conjugate distribution, which complicates the computational process. Pérez et al. (2017) suggest that, in order to obtain robust inference with an \(\text{ SBeta2 }(p,q,b)\) distribution, the value of parameter q should be between 0 and 1 because the smaller the value of q, the heavier the tail of the distribution. They also recommend selecting a value between 0.5 and 1 for p when the variance distribution is centered around zero, in order to avoid very strong shrinkage towards the mean.
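To illustrate the generating expression above, the following R sketch (our own illustration, not code from Pérez et al. 2017) draws SBeta2 variates through the beta representation and empirically checks the reciprocity property.

rsbeta2 <- function(n, p, q, b) {
  u <- rbeta(n, p, q)   # u ~ Beta(p, q)
  b * u / (1 - u)       # psi = b * u / (1 - u) ~ SBeta2(p, q, b)
}

set.seed(1)
psi <- rsbeta2(1e5, p = 1, q = 1, b = 4)    # draws from SBeta2(1, 1, 4)
ref <- rsbeta2(1e5, p = 1, q = 1, b = 1/4)  # direct draws from SBeta2(1, 1, 1/4)
# Reciprocity check: the quantiles of 1/psi and of ref should be close
round(quantile(1 / psi, c(0.25, 0.50, 0.75)), 3)
round(quantile(ref, c(0.25, 0.50, 0.75)), 3)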

Fig. 1 SBeta2 distribution for the variance parameter with different values of p, q, and b

In light of the above, we here use two forms of the SBeta2 distribution, SBeta2(1,1,4) and \(\text{ SBeta2 }(1/2,1/2,625)\) (both plotted in Fig. 1), and the inverse-gamma distribution, IG(\(\alpha \), \(\beta \)), where parameter \(\alpha \) denotes the shape and parameter \(\beta \) the scale. In particular, we consider \(\alpha =\beta =0.001\) to model the variance parameters in the 2-level Bayesian multilevel logistic regression model (Llera and Beckmann 2016; Gelman 2006; Pérez et al. 2017; Rojas and Ramírez 2019). Care is needed when using the inverse-gamma distribution because studies such as those of Berger (2006), Gelman (2006), Pérez et al. (2017), and Rojas and Ramírez (2019) have outlined the drawbacks of using it as a prior distribution; for instance, it can lead to misleading posterior estimates or even improper posterior distributions.
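The following R sketch (an illustration we add, reusing rsbeta2 from the sketch above) draws from the three variance priors considered here so that their mass near the origin and their tail behavior can be inspected; it uses the fact that if \(X \sim \text{Gamma}(a, \text{rate}=b)\), then \(1/X \sim \text{IG}(a,b)\).

set.seed(2)
n <- 1e5
sig2_ig  <- 1 / rgamma(n, shape = 0.001, rate = 0.001)  # IG(0.001, 0.001)
sig2_sb1 <- rsbeta2(n, 1, 1, 4)                         # SBeta2(1, 1, 4)
sig2_sb2 <- rsbeta2(n, 1/2, 1/2, 625)                   # SBeta2(1/2, 1/2, 625)
# Summaries on the log10 scale make the mass near zero and the heavy tails visible
summary(log10(sig2_ig))
summary(log10(sig2_sb1))
summary(log10(sig2_sb2))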

4 Simulation study: comparing the 1-level and 2-level Bayesian multilevel logistic regression models

This section presents a simulation study in which the scenarios were designed to illustrate the behavior of the Bayesian multilevel logistic regression model. Data were generated from a binomial distribution for different groups and proportions of successes. The focus of this simulation study is to predict the proportions of successes, as well as to compare the predictive ability of the 1-level and 2-level Bayesian multilevel logistic models. The procedure we used is as follows (an illustrative R sketch of Steps 1, 2, and 4 is given after the list):

  1. Four groups of different sizes \(N_{j}\) (\(j=1,\dots ,4\)) are considered; the sample sizes are 3, 6, 10, and 15, as shown in Table 1.

  2. For each group, the number of correct answers (successes) is generated from a binomial distribution with a probability of success at three levels (p = 25%, 50%, and 75%).

  3. With each group, the 1-level and 2-level Bayesian multilevel logistic regression models are fitted using the bugs function of the R2WinBUGS package (Sturtz et al. 2010). This function takes as inputs, among other values, the group sizes, the number of successes obtained in Step 2, the number of iterations (40,000), and a burn-in of 10% (4,000) to discard the initial, unstable part of the chains.

  4. The convergence of the posterior distributions of the parameters is tested using the chains obtained in Step 3. The KPSS statistic (Kwiatkowski et al. 1992), implemented in the tseries package (Trapletti et al. 2019) in R (R Core Team 2019), is calculated together with its truncation parameter and the corresponding p-value. If the p-value is below \(\alpha =0.05\), \(H_{0}\) (the Markov chain has reached the stationary distribution) is rejected. In addition, graphical analyses are performed based on the lags, ergodic averages, and densities. Other convergence diagnostics can be found in the studies by Cowles and Carlin (1996) and Brooks and Roberts (1998).

  5. The probability intervals, along with the DIC, are calculated in each scenario to assess the predictive ability of the models.
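The following R sketch (our illustration, not the authors' original code) shows Steps 1, 2, and 4; the vector chain is only a placeholder for a posterior chain obtained in Step 3.

library(tseries)                            # provides kpss.test()

set.seed(3)
N <- c(3, 6, 10, 15)                        # group sizes, as in Table 1
p <- 0.25                                   # probability of success (also 0.50 and 0.75)
y <- rbinom(length(N), size = N, prob = p)  # simulated correct answers per group (Step 2)

# Convergence check (Step 4) on a stored chain, e.g. that of pi_1; 'chain' is a
# placeholder of length 40,000 - 4,000 = 36,000 (iterations minus burn-in).
chain <- rnorm(36000)
kpss.test(chain, null = "Level")            # H0: the chain is (level-)stationary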

Table 1 Sample sizes per group

The prior distributions considered to model the variance parameter in the 2-level Bayesian multilevel logistic regression models were selected based on the recommendations provided by Pérez et al. (2017). In particular, we employed SBeta2(1,1,4), SBeta2(1/2,1/2,625), and the inverse-gamma distribution traditionally used for this purpose, i.e., IG(0.001, 0.001). For the 1-level Bayesian logistic regression model, we used noninformative normal distributions, N(0, 1000) (Ntzoufras 2011). The following are the models proposed here (a BUGS-language sketch of M1 is given after the list):

  • Model 1 (M1). With SBeta2(1,1,4)

    $$\begin{aligned}&y_{j} \sim \text{ Binomial }(\pi _{j},N_{j})\\&\text{ logit }(\pi _{j})=\theta _{j}\\&\theta _{j} \sim N(\mu _{\theta }, \sigma ^{2}_{\theta })\\&\mu _{\theta } \sim N(0, 1000)\\&\sigma ^{2}_{\theta } \sim \text{ SBeta2 }(1,1,4) \end{aligned}$$
  • Model 2 (M2). SBeta2(1,1,4) in Model 1 is replaced with SBeta2(1/2,1/2,625):

    $$\begin{aligned} \sigma ^{2}_{\theta } \sim \text{ SBeta2 }(1/2,1/2,625). \end{aligned}$$

  • Model 3 (M3). SBeta2(1,1,4) in Model 1 is replaced with IG(0.001, 0.001):

    $$\begin{aligned} \sigma ^{2}_{\theta } \sim \text{ IG }(0.001, 0.001). \end{aligned}$$

  • Model 4 (M4). 1-level model

    $$\begin{aligned}&y_{j} \sim \text{ Binomial }(\pi _{j},N_{j})\\&\text{ logit }(\pi _{j})=\theta _{j}\\&\theta _{j} \sim N(0, 1000) \end{aligned}$$
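As a reference implementation (a sketch that reflects our own assumptions about the coding details, not the authors' exact code), M1 can be written in the BUGS language, expressing the SBeta2(1,1,4) prior through its beta representation, and fitted from R with the bugs function of R2WinBUGS; the N(0, 1000) priors are read as having variance 1000, i.e., precision 0.001.

library(R2WinBUGS)

m1_model <- "
model {
  for (j in 1:J) {
    y[j] ~ dbin(pi[j], N[j])
    logit(pi[j]) <- theta[j]
    theta[j] ~ dnorm(mu.theta, tau.theta)
  }
  mu.theta ~ dnorm(0, 0.001)        # N(0, 1000), with 1000 read as a variance
  u ~ dbeta(1, 1)
  sigma2.theta <- 4 * u / (1 - u)   # sigma2 ~ SBeta2(1, 1, 4) via the beta representation
  tau.theta <- 1 / sigma2.theta     # BUGS parameterizes the normal by its precision
}
"
writeLines(m1_model, "m1.txt")

sim_data <- list(J = 4, N = c(3, 6, 10, 15), y = y)  # y as simulated in Step 2
inits    <- function() list(mu.theta = 0, u = 0.5, theta = rep(0, 4))
fit_m1   <- bugs(sim_data, inits, parameters.to.save = c("pi", "sigma2.theta"),
                 model.file = "m1.txt", n.chains = 1, n.iter = 40000, n.burnin = 4000)
fit_m1$DIC                                           # DIC value used in the comparisons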

Convergence was tested by analyzing the autocorrelation, trace plot, density, and ergodic average. The autocorrelations between the values of \(\pi _{1}\) generated at different lags are shown in Table 2, considering the group of size 3 and a 25% probability of correct answers (or successes). According to the information in this table, there is a low association between the values of the parameter generated at the different lags.

Figures 2 and 3 present the trace plots, density, and ergodic average of the chain of values generated from the posterior distribution of \(\pi _{1}\). As can be seen in these figures, after discarding the first 4000 values as burn-in, the unstable initial behavior is eliminated, and the posterior distribution of the parameter of interest exhibits a stationary behavior. In addition to the aforementioned graphs, the KPSS test was applied to parameter \(\pi _{1}\), yielding a statistic of 0.29075, with a truncation parameter of 17 and a p-value of 0.1. This indicates that there is not enough sample evidence to reject \(H_{0}\), which states that the Markov chain has reached the stationary distribution. Similar results were obtained with M2 and M3, considering different parameters, groups, and percentages of successes.

Table 2 Lag autocorrelations for \(\pi _{1}\) of the model with \(\sigma ^{2}_{\theta } \sim \text{ SBeta2 }(1,1,4)\)
Fig. 2 Trace plot and density function of one of the simulated chains of \(\pi _{1}\) in model 1

Fig. 3 Ergodic averages of one of the simulated chains of \(\pi _{1}\) in model 1

Figures 4 and 5 display the probability intervals of M1, M2, M3, and M4 for the different groups and percentages of correct answers (p = 25%, 50%, and 75%). From these figures, we observe that M4 provides the widest intervals; and M2, the narrowest ones. This is explained by the fact that the variability within each group is not considered in the 1-level model. Also, the intervals become more precise as the sample size increases. Similar results were obtained for the groups of sizes 10 and 15.

Fig. 4 Average percentage of correct answers in the posterior results of group 1 in the different models for \(p=25\%\), 50%, and 75%

Fig. 5 Average percentage of correct answers in the posterior results of group 2 in the different models for \(p=25\%\), 50%, and 75%

Table 3 Deviance Information Criterion (DIC) of the 1-level and 2-level models by group (\(N_{j}\)), considering the percentages of successes (p)

Table 3 provides the DIC results for the four models (M1, M2, M3, and M4), the different groups, and the percentages of correct answers (p). As this table shows, M2 has the best predictive ability because its DIC is the lowest in all scenarios, whereas the 1-level model (M4) exhibits the lowest predictive ability because of its higher DIC. Based on this, the prior distributions used to model the variance parameter in the 2-level Bayesian multilevel logistic regression models are found to produce better results in terms of predictive ability, as they yield more precise intervals and lower DIC values.

5 Application: modeling the correct answers of university students

Table 4 shows the correct answers of the university students by group. These data were collected from two questionnaires administered to students from two Higher Education Institutions (HEIs) in Medellín (Colombia)—Universidad Nacional de Colombia (UN) (Medellín campus) and Instituto Tecnológico Metropolitano (ITM)—and using a procedure similar to that proposed by Wang et al. (2019). In their study, these authors present a 10-minute trivia game-based activity for the intuitive understanding of confidence intervals. In addition, they explain how the activity can be implemented one or more times in an inferential statistics course.

For data collection, we followed a three-step procedure. First, we formulated 10 general-interest questions, the answers to which were to be given in the form of an interval (a minimum and a maximum value within which respondents considered the answer to each question would fall). Then, students were given 10 minutes to respond. Finally, the students’ responses were tabulated.

The questionnaires were administered to statistical inference students from the aforementioned universities (nine groups from UN and three from ITM) at the beginning and end of the first academic semester of 2018. It should be noted that permission was requested from the governing body of each HEI to conduct the experiment, and students were informed that the information provided would be employed for academic purposes. This experiment could be useful in examining how students understand basic statistical concepts such as confidence intervals and confidence levels in statistical tests.

To analyze the results, we first considered students’ correct answers in the two questionnaires administered throughout the academic semester. Then, a 2-level Bayesian multilevel logistic regression model was fitted to model the proportion of correct answers from questionnaires 1 and 2 separately, considering M1, M2, and M3. The purpose of this was to predict students’ percentage of correct answers in the two questionnaires, as well as to see whether there were any discrepancies in the forecasts when taking into account other values of the parameters in the prior distributions. A 1-level model (M4) was also considered and compared to the other models in order to identify which has the best predictive ability (Tables 5 and 6).

Subsequently, a single 2-level model including an independent variable that distinguishes the two questionnaires was fitted, using the same prior distributions as in the aforementioned models; the corresponding 1-level model was also fitted. The results of these fits are presented in Table 7.

Table 4 Students’ correct answers per questionnaire and university

To model students’ performance in terms of the probabilities of correct answers within the groups, M1, M2, M3, and M4 were fitted. If the students in group j correctly answer question i, then \(y_{ij}=1\); otherwise, \(y_{ij}=0\), for \(j=1,2,\ldots ,12\) and \(i=1,2,\ldots ,N_{j}\), where \(N_{j}\) is the number of questions in group j.

The model in Eq. (6), with link function (7), was used to simultaneously model both questionnaires. If the students in group j correctly answer question i in questionnaire k, then \(y_{ikj}=1\); otherwise, \(y_{ikj}=0\), for \(j=1,2,\ldots ,12\), \(i=1,2,\ldots ,N_{kj}\), and \(k=1,2\), where \(N_{kj}\) is the number of questions in group j for questionnaire k. The following are the models proposed here (a sketch of the corresponding BUGS linear predictor is given after the list):

  • Model 5 (M5). With SBeta2(1,1,4)

    $$\begin{aligned}&Y_{kj} \sim \text{ Binomial }(\pi _{kj},N_{kj})\\&\text{ logit }(\pi _{kj})= a_{j}+\theta _{j}I(k=1)\\&\theta _{j} \sim N(\mu _{\theta }, \sigma ^{2}_{\theta })\\&\mu _{\theta } \sim N(0, 1000)\\&a_{j} \sim N(0, 1000)\\ & \sigma ^{2}_{\theta } \sim \text{ SBeta2 }(1,1,4) \end{aligned}$$
  • Model 6 (M6). SBeta2(1,1,4) in model 5 is replaced with SBeta2(1/2,1/2,625)

    $$\begin{aligned} \sigma ^{2}_{\theta } \sim \text{ SBeta2 }(1/2,1/2,625). \end{aligned}$$
  • Model 7 (M7). SBeta2(1,1,4) in Model 5 is replaced with IG(0.001,0.001):

    $$\begin{aligned} \sigma ^{2}_{\theta } \sim \text{ IG }(0.001, 0.001). \end{aligned}$$

  • Model 8 (M8). 1-level model

    $$\begin{aligned}&Y_{kj} \sim \text{ Binomial }(\pi _{kj},N_{kj})\\&\text{ logit }(\pi _{kj})= a_{j}+\theta _{j}I(k=1)\\&\theta _{j} \sim N(0, 1000)\\&a_{j} \sim N(0, 1000) \end{aligned}$$
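The only part that changes with respect to the BUGS sketch of M1 in Sect. 4 is the linear predictor, which now includes the questionnaire indicator \(I(k=1)\); one possible coding (again our assumption, not the authors' code) is the following fragment, with the remaining priors specified as in M1.

m5_core <- "
  for (k in 1:2) {
    for (j in 1:J) {
      y[k, j] ~ dbin(pi[k, j], N[k, j])
      logit(pi[k, j]) <- a[j] + theta[j] * equals(k, 1)  # equals(k, 1) encodes I(k = 1)
    }
  }
  for (j in 1:J) {
    theta[j] ~ dnorm(mu.theta, tau.theta)
    a[j] ~ dnorm(0, 0.001)                               # N(0, 1000) read as a variance
  }
"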

Tables 5 and 6 show the posterior results of the model fits for questionnaires 1 and 2, respectively, and Table 7 provides the results for both questionnaires in the same model. Tables 5 and 6 present the average (mean), the Standard Deviation (SD), and the probability interval (2.5%, 97.5%) per group, while Table 7 reports similar results but without the standard deviation. As can be seen from these tables, \(\sigma _{\theta }\) is lower in M2 and M6 and higher in M4 and M7. This is explained by the fact that, although this within-group variability exists, it is not modeled in the latter models.

As observed in Table 5, UN2 and UN3 (i.e., \(\pi _{5}\) and \(\pi _{6}\), respectively) are the groups with the highest probability of correct answers in the different models, while ITM1 (i.e., \(\pi _{1}\)) is the one with the lowest probability. Regarding the intervals, M4 is found to have the lowest precision, whereas M2 shows the highest precision. Also, from Fig. 6, we can see that M4 produces the widest intervals, while M2 provides the narrowest. This may be due to the fact that variability within each group is not considered in M4.

Fig. 6 Average percentage of correct answers per group in the posterior results of questionnaire 1 for the different models

After the models were compared based on the resulting DIC values, M3 was found to be the model with the best ability to predict the percentage of correct answers because it had the lowest DIC (76.81), while M4 exhibited the lowest predictive ability, with a DIC of 84.81. It should be noted that M1 and M2 also yielded good results because their DIC was 77.01 and 77.21, respectively.

Table 5 Posterior summary of the results of questionnaire 1 using the 1-level and 2-level Bayesian multilevel logistic regression models

Table 6 presents the posterior results for questionnaire 2. As can be seen, UN3 (i.e., \(\pi _{6}\)) is the group with the highest probability of correct answers in the different models, while ITM3 (i.e., \(\pi _{1}\)) is the one with the lowest probability. As for the intervals shown in this table, we observe a pattern (illustrated in Fig. 7) similar to that in Table 5: M4 produces the widest intervals, whereas M2 provides the smallest intervals. The resulting DIC values of M1, M2, M3, and M4 are similar to those calculated with the results of questionnaire 1 (81.78, 82.12, 81.93, and 84.14, respectively). However, in this case, M1 exhibits the best ability to predict the percentages of correct answers.

Table 6 Posterior summary of the results of questionnaire 2 using the 1-level and 2-level Bayesian multilevel logistic regression models
Fig. 7 Average percentage of correct answers per group in the posterior results of questionnaire 2 for the different models

Finally, Table 7 summarizes the results of M5, M6, M7, and M8 in terms of the probabilities of correct answers, which are similar to those presented in Tables 5 and 6. From this table, it can be seen that UN3 (i.e., \(\pi _{6}\)) is the group with the highest probability of correct answers, while ITM1 and ITM3 (i.e., \(\pi _{1}\) and \(\pi _{3}\), respectively) are the groups with the lowest probabilities. Also, when both questionnaires were compared using the odds ratio, we observed that students improved their probability of success in the ITM2, UN1, UN3, UN4, and UN9 groups (i.e., \(OR_{2}\), \(OR_{4}\), \(OR_{6}\), \(OR_{7}\), and \(OR_{12}\), respectively), with ITM2 showing the greatest improvement. In the remaining groups, no such improvement was observed.

The behavior of the probability intervals is similar to that observed in Figs. 6 and 7: The model with the least precise probability intervals is M8 and that with the most precise intervals is M6. The DIC values of M5, M6, M7, and M8 were 165.1, 167.5, 165.7, and 169.2, respectively. According to this, M5 is the model with the best predictive ability.

Table 7 Posterior summary of the results of questionnaires 1 and 2 using the 1-level and 2-level Bayesian multilevel logistic regression models with an explanatory variable

6 Concluding remarks

Bayesian multilevel models are commonly used to represent data with levels or hierarchies, such as students nested within universities. Since there is often no information about the variance parameter, it is usually modeled using noninformative distributions. In this paper, we employed two versions of the SBeta2 distribution and the inverse-gamma distribution to model this parameter.

According to the results, the probability intervals are wider in the 1-level model than in the 2-level models. This is explained by the fact that, although variability within the groups exists, it is not considered in the former. In addition, the 1-level model can provide erroneous results when comparing the groups. For instance, as observed in Fig. 6, the 1-level model suggests that the percentages of correct answers in UN2 and UN3 differ from that in ITM1 when, once the variability within each group is taken into account, there is in fact no such difference. Likewise, the 2-level models were found to have greater predictive ability than the 1-level model because they had lower DIC values.

When the probability of success remains the same, 2-level Bayesian multilevel logistic regression models exhibit better predictive ability when the variance parameter is modeled with SBeta2 distributions rather than with an inverse-gamma distribution, and they outperform the 1-level Bayesian logistic regression model. Based on the estimated DIC values, similar results regarding the improved predictive ability of the 2-level models with respect to the 1-level model are obtained when the probabilities of success and the sample sizes vary.

Finally, the probability intervals of the 2-level Bayesian multilevel logistic regression models proved to be more precise when SBeta2 distributions, rather than an inverse-gamma distribution, were employed to model the random effect, and more precise than those of the 1-level model.