1 Introduction

Biostatistics education at public health schools faces a persistent question: how to balance conceptual understanding with procedural skills. The distinction between conceptual understanding and mere knowledge of procedures has been thrown into sharp relief time and again in introductory statistics courses designed for non-statistics majors [3, 25, 28, 29, 34]. As a result, the traditional approach, which focused heavily on the use of cut-and-dried statistical procedures, has largely given way to a variety of new approaches that emphasize a deep understanding of statistical concepts [1, 25, 34]. In the past few years, the pedagogical reform movement in the teaching of introductory statistics has accelerated considerably [4, 14, 31, 38], partly owing to the recent convergence of statistics and data science [20].

In contrast, few comparable systematic attempts have been made to explore effective ways of teaching this same audience follow-on biostatistics courses that focus on a particular branch of statistics, such as categorical data analysis, although recent years have seen encouraging efforts to find innovative ways to teach non-statistics majors advanced topics normally covered in follow-on statistics courses. Such topics include mixed-effects models [12], principal component analysis [11], and cluster analysis [5]. Public health students enroll in follow-on biostatistics courses either to fulfil degree requirements or, when taking them as electives, solely to increase their research competency. There is an emerging consensus that training for public health researchers and practitioners should include statistical methods beyond those covered in today’s introductory biostatistics courses [21]. The relative lack of scholarly research on pedagogical methods for follow-on statistics courses therefore poses urgent challenges to public health schools.

The question of how to teach conceptual knowledge effectively in follow-on biostatistics courses is inextricably tied to the question of whether statistics should be taught as a branch of mathematics. Whereas the answer may be debatable in the broad context of statistics education [8], in the present context the answer is that statistics should be taught as statistics to public health students. One important determining factor is the wide variety of backgrounds among public health students. Simpson [36] was among the earliest statistics educators to recognize the wide variety of undergraduate backgrounds as a pedagogically important characteristic of public health graduate students. She delineated the spectrum of these students’ mathematical ability as ranging from “hated maths at school and avoided it ever since” to “enjoy maths.” This frustrating picture, painted by Simpson nearly 30 years ago, remains largely unaltered at today’s public health schools. As a result, a major portion of graduate degree-seeking students at today’s public health schools lack calculus-based mathematical training and are unaccustomed to mathematical reasoning. In a recent example of teaching statistics as statistics to public health students, the instructor wove together high school algebra and concept-driven computing exercises to help students digest conceptual knowledge [48]. The goal of such a course is to use mathematics as an effective tool to improve students’ conceptual understanding of statistics; it does not specifically aim to elevate students to a higher level of mathematical sophistication.

In principle, the central aim of a follow-on biostatistics course is the same as that of an introductory course. As reiterated recently by Conway IV et al. [9], the instructor should aim to help students “build procedural fluency from conceptual understanding.” In practice, however, the instructor inevitably faces challenges posed by subject matter that is often far more complex. The recent work of Cai & Wang [5] illustrates the nature of such a challenge. These researchers successfully avoided formal mathematical formulation in teaching the basic principle undergirding a clustering algorithm: they used a square bulletin board and pushpins of assorted colors to let their students grasp the essence of the algorithm. However, such ingenious pedagogical devices often result from a combination of diligent research and serendipity, and hence cannot be easily generalized to other topics. By contrast, the computational approach reported in a recent study [48] can be more readily generalized as a dominant pedagogical tool for developing some follow-on biostatistics courses.

In this paper I show the feasibility of adapting this approach to teach longitudinal data analysis to public health students. Longitudinal studies play a prominent role in epidemiology [6], but the subject of longitudinal data analysis is among the most challenging for public health students. To make the original approach more suitable for this purpose, I weave computational thinking (CT) into a battery of written or digital artifacts (as Wass et al. [43, p. 321] would call them) that aid students in learning longitudinal data analysis. CT has been identified as a fundamental skill comparable to reading, writing and arithmetic in a child’s education [44], and is now regarded as a foundational competency for being an informed citizen of the 21st century [17]. Moreover, CT is believed by some to have the potential to “help create a mindset that empowers students to simultaneously think both statistically and computationally” [20]. The method reported here can thus be viewed as an example of “CT-in-biostatistics-learning,” a special case of what Grover & Pea [17] called “CT-in-STEM-learning.”

2 Background characteristics of students

The longitudinal data analysis course at my institution was tailored for its MPH (master of public health) and DrPH (doctor of public health) students. The MPH program offers six concentrations, including environmental health, epidemiology and biostatistics, and the DrPH program offers three concentrations. The course’s goal is to train these students as informed consumers of biostatistics rather than as innovators of statistics. Thus, a motto of the course is: “promote procedural fluency buttressed by a sound conceptual understanding.” To achieve that goal, the instructor must cope effectively with diverse student backgrounds by decoupling conceptual knowledge from higher mathematics. This has been a dominant theme in research on the teaching of introductory statistics courses [39], but the problem is accentuated in a follow-on course by the considerably more challenging subject matter, as exemplified by Cai & Wang [5] and by Zheng [48]. The following is a representative cross section of students who participated in my past longitudinal data analysis classes, presented here to help the reader appreciate the rationale behind the proposed pedagogy.

On the surface, Student I was far from ready to take a concept-centric biostatistics course. As an undergraduate, she majored in nursing. She enrolled in my longitudinal data analysis class as a second-year MPH student in epidemiology. She had not taken any mathematics courses beyond high school, but she took an elective SAS programming course in the preceding semester. SAS [32], widely used in public health research, is a comprehensive statistical software suite, and students taking this elective programming course acquire basic programming knowledge by writing code in SAS’s scripting language. Student I may seem to fall at the lower extreme of the spectrum, but participants with similar backgrounds were not uncommon. For example, Student F had an undergraduate degree in molecular biology. She took an algebra course and a trigonometry course as an undergraduate. Before enrolling in my longitudinal data analysis class, she took a data management course and a categorical data analysis course. The data management course, offered by my school, was her first introduction to computing, where she learned basic concepts about databases and code writing. She improved her SAS coding skills while taking my categorical data analysis course, as that course weaves SAS coding into the learning of conceptual statistical knowledge [48].

At the other end of the spectrum were students who had partially completed an introductory calculus sequence and who also had some coding experience before coming to my school. For example, Student S majored in biology as an undergraduate. She took AP Calculus in high school and then took Calculus II for the life sciences in college. In her undergraduate days, she learned the R language via a statistics course and acquired a rudimentary knowledge of the Python language via an introductory computer science course.

A sizeable portion of students had work experience before enrolling in my school, and some continued to work while pursuing a public health degree through my school’s online degree program [47]. One such example is Student C, who worked as a staff member at a government health agency while taking my longitudinal data analysis course. She majored in public health as an undergraduate. Her quantitative skills came from taking a statistics course and an epidemiology course as part of her public health curriculum, and from taking a chemistry course that involved mathematics. In addition, she learned the basics of SQL, a widely used database query language, and learned to use another major statistical software package in her work.

3 Exploring constructivism and the zone of proximal development

Education research in the last two decades or so has repeatedly pointed to an idea that is often couched in the language of constructivism [13, 15, 26]. Two aspects of constructivist theory are particularly relevant to the present study. First, conceptual knowledge cannot simply be dispensed as if it were as obvious a fact as the triangle inequality in elementary geometry, because students must engage in a meaning-making process to assimilate the knowledge. Second, the meaning-making process, also known as knowledge elaboration, must be carefully geared to a student’s prior knowledge. As von Glasersfeld [16] put it, “concepts cannot be simply transferred from teachers to students—they have to be conceived.” While this claim may appear new and refreshing to some, observations of this kind can be traced to much earlier investigators. For instance, in the 1930s, the pioneering psychologist Vygotsky [41, p. 170] made the following claim in the context of child psychology.

... pedagogical experience demonstrates that direct instruction in concepts is impossible. It is pedagogically fruitless. The teacher who attempts to use this approach achieves nothing but a mindless learning of words, an empty verbalism that simulates or imitates the presence of concepts in the child. Under these conditions, the child learns not the concept but the word, and this word is taken over by the child through memory rather than thought. Such knowledge turns out to be inadequate in any meaningful application. This mode of instruction is the basic defect of the purely scholastic verbal modes of teaching which have been universally condemned. It substitutes the learning of dead and empty verbal schemes for the mastery of living knowledge.

The above quote is not meant to draw a parallel between public health students learning biostatistical concepts and school children learning words that designate abstract concepts. However, the quote is likely to resonate with many biostatistics instructors at public health schools who have attempted to explain abstract concepts such as the maximum likelihood estimate (MLE) of an odds ratio.

One solution to this pedagogical problem lies in the theory of constructivism. Schmidt [33] explained the relevance of constructivism from an information-processing perspective. Information-processing theory identifies three principles essential for successful learning of new knowledge: activation of relevant prior knowledge, provision of a context resembling the situations in which the new knowledge will be applied (dubbed encoding specificity), and stimulation of knowledge elaboration. The practice of knowledge elaboration in statistics is almost as old as statistics itself. In any PhD statistics curriculum, proofs of theorems and derivations of formulas are a recurring theme. The reason for teaching proofs and derivations is not merely to develop students’ theoretical research ability; the proofs and derivations offer students ample opportunities to elaborate on the conceptual knowledge underpinning the theorems and formulas.

Considering the disparate backgrounds of public health students sampled in the preceding section, a proof-based approach is out of reach for most of them, and an alternative way to foster knowledge elaboration is needed. For example, Shillam et al. [34] relied on the use of real-world data and technology to help pharmacy students develop conceptual understanding in an online introductory biostatistics course. Using real-world data is conducive to honing students’ ability to apply as well as to understand biostatistics. However, from the information-processing perspective, the method of Shillam et al. aims more at encoding specificity than at knowledge elaboration. Learning to apply ready-made statistical procedures to real-world data is not always an effective way to stimulate knowledge elaboration. For instance, the statistical concept of the deviance would be hard for public health students to grasp if the instructor merely showed them real-world data examples. As demonstrated recently [48], a hands-on, computational approach gives students a genuine opportunity to elaborate on that concept.

This paper offers a similar computational route for teaching longitudinal data analysis to public health students. The main impetus for extending this approach to a new biostatistics course is that the computational approach provides a level playing field for students with disparate mathematical readiness. The reasons for this advantage come to the fore when one views the students’ backgrounds from the perspective of the zone of proximal development (ZPD). The concept of the ZPD was proposed by Vygotsky [40, p. 84] to underscore the distinction between a child’s actual development level and her potential development level. In the context of child psychology, a child’s actual development level indicates the difficulty level of tasks that she can accomplish independently, while her potential development level indicates the difficulty level of tasks that she can accomplish with a tutor’s assistance. A child’s potential development level therefore refers to her next level of performance achievable with a tutor’s assistance, which may bear no relation to her long-term or lifetime potential. The concept of the ZPD has long been used fruitfully in primary and secondary education, but its application in higher education is a relatively new phenomenon. One reason for this delay was given by Wass & Golding [42]: “Teachers in higher education often do not have formal training as teachers and, therefore, have rarely been exposed to the ZPD as a theory.” Wass & Golding [42] used the ZPD successfully to facilitate the teaching of critical thinking in zoology, and Murphy [30, Chap. 9] expounded on how to explore the enormous potential of the ZPD in higher education.

As can be seen from the cross section of student backgrounds given in the preceding section, using a theorem-proof approach to elaborate on conceptual knowledge is impractical for most students, because this sort of skill is not close enough to their attained mathematical levels; that is, it falls outside their ZPD. By contrast, the computational approach exemplified in recent works by Zheng [47,48,49] is appreciably closer to most students’ ZPD. There are several contributing factors. First, public health schools have placed increasing emphasis on computing literacy in recent years, so students with scant computing experience acquire basic computing skills by the end of their first academic year. My school’s data management course and SAS programming course allow students such as Student I to transition to the higher level of computer coding that my longitudinal data analysis course requires. Second, society’s increasing reliance on data science propels students (either before or after enrolling in public health schools) to seek opportunities to enhance their computing skills, whether to improve job performance (e.g., Student C’s effort to learn SQL coding) or for self-improvement (e.g., Student I’s effort to acquire a SAS certificate). These and other factors conspire to equalize the effect of prior computing knowledge on learning conceptual statistical knowledge, but there are no comparable mechanisms equalizing the effect of students’ prior mathematical knowledge.

Another factor that makes the proposed approach feasible is that it puts imitative activity to good use. The integral role of imitation in learning within the ZPD has long been recognized; as Chaiklin [7] put it, the concept of the ZPD was constructed around Vygotsky’s technical concept of imitation. To help flatten the learning curve, instructors should view computing and coding as a means to catalyze students’ knowledge elaboration process; acquisition of sophisticated coding skills is not the primary focus. Most of the computing exercises are imitative in nature as far as coding ability is concerned, which allows students to focus on re-creating the meaning of the statistical concepts embedded in the exercises. As the examples in the ensuing section will make clear, these imitative problems can hardly be solved by mindlessly copying tutorial examples, because students must develop a degree of understanding in the process of solving them.

With the ZPD thus identified, the focus can now shift to the construction of computing tasks that serve as scaffolding to help students assimilate conceptual knowledge that would otherwise be beyond their reach. Scaffolding is a helpful metaphor proposed by Wood et al. [45] and is now widely used in ZPD theory. Constructing scaffolding for a specific content area is a daunting challenge for instructors, who are the ultimate implementers and testers of any learning theory.

4 A computational thinking approach in detail

As a detailed case study, this section offers concrete examples from several categories of problems whose conceptual understanding would lie outside a typical student’s ZPD without well-designed computational exercises acting as stepping stones. The learning process is further eased by concentrating first on cases in which the outcome variables are continuous. Only after students have developed a decent conceptual understanding of the key ideas encountered in the continuous-outcome case does the focus shift to the more prevalent case of categorical outcome variables. As a consequence, the multivariate normal distribution is the first hurdle for students to overcome; it also serves as an efficient vehicle for imparting the fundamental concept of the likelihood principle, which is the gateway to students’ active participation in knowledge elaboration.

4.1 The normal distribution is the pedagogical backbone

The multivariate normal density function should be introduced in a heuristic manner, with the univariate normal density and its bell-shaped curve used as a basis for analogies. Facts about vectors and matrices are discussed only on a need-to-know basis. For example, determinants and inverses are defined only intuitively through concrete numerical examples, as students only need to appreciate that the multivariate normal density function transforms an arbitrary point in two- or higher-dimensional space to a positive number, just as the univariate normal density function does in one-dimensional space. To help students assimilate this basic idea, I ask students to compare the analytic expression

$$\begin{aligned} f(y)= \frac{1}{\sqrt{(2\pi )^n (\det \Sigma )}} \exp \left\{ -\frac{1}{2} (y-\mu )' \Sigma ^{-1} (y-\mu )\right\} \end{aligned}$$

with the following SAS implementation:

(Figure a: a SAS implementation of the multivariate normal density function)
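A minimal SAS/IML sketch of such an implementation is given below; it is not the original classroom code, and the values of mu, Sigma, and y are purely illustrative.

proc iml;
   /* Illustrative inputs: a bivariate normal with mean mu and covariance Sigma */
   mu    = {0, 0};                        /* mean vector */
   Sigma = {1 0.5, 0.5 1};                /* covariance matrix */
   y     = {0.2, -0.1};                   /* point at which to evaluate f(y) */
   n     = nrow(y);
   quad  = (y - mu)` * inv(Sigma) * (y - mu);               /* quadratic form */
   f     = exp(-0.5 * quad) / sqrt((2*constant("pi"))##n * det(Sigma));
   print f;
quit;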

Students then use this implementation to get a feel for the density function through numerical exercises. For example, they are given a point at which the density exceeds unity and are asked to think about why this does not violate the requirement that the total probability be unity. They are also given two points, one closer to the mean vector than the other, and are asked to verify numerically that the point “nearer” to the mean vector yields a larger density value.

4.2 The likelihood function is key to knowledge elaboration

As shown previously [47, 48], the likelihood function enables students to concretize a number of important abstract concepts, allowing them to develop a deep understanding of rather mathematical ideas via hands-on, intuitive computational exercises. With a newly acquired working knowledge of the multivariate normal density function, students are now in a position to elaborate on the likelihood principle in a longitudinal data setting. As a warm-up exercise for mainstream models, students are asked to fit a normal model to the following real-world data.

Rat    Week 1   Week 2   Week 3   Week 4   Week 5
1        61       86      109      120      129
2        59       80      101      111      122
3        53       79      100      106      133
4        59       88      100      111      122
5        51       75      101      123      140
6        51       75       92      100      119
7        56       78       95      103      108
8        58       69       93      116      140
9        46       61       78       90      107
10       53       72       89      104      122

These data are from a study reported by Box [2]; the weekly weights were recovered by accumulating the weekly weight increases, as was done by Lindsey [24, p. 150]. Students are required to adopt the following parameterized mean vector:

$$\begin{aligned}{}[\mu +p_1, \mu +p_2, \mu +p_3, \mu +p_4, \mu ] \end{aligned}$$
(1)

and are also asked to assume a covariance matrix of the form \(\sigma ^2 \times A\) with A being a \(5\times 5\) AR(1) matrix.

A conventional approach might direct students’ attention to two procedural skills: data reorganization (see, e.g., Hedeker & Gibbons [18, p. 32]) and code writing for model fitting. Students can master these skills relatively quickly. The conventional approach would then treat output interpretation as the finale of the learning process. Output interpretation touches the surface of conceptual knowledge, but it is not bona fide knowledge elaboration. To deepen students’ understanding of the computer output, I ask students to code the log-likelihood function and then use their code to verify the MLEs of the model parameters produced by a reputable statistical package such as SAS [32]. As Fig. 1 shows, in the process of accomplishing this task, a student learns the precise meaning of the parameter estimates, sees how the likelihood principle works in practice, and deepens her understanding of the multivariate normal density function. After successfully coding the likelihood function, students can further enhance their understanding by visualizing it. To focus students’ attention on the essence of the likelihood principle, I designed a problem that lets students see how the likelihood varies with a particular parameter while the other parameters are held fixed at their respective MLEs. Figure 1C, produced by a student, shows that the log likelihood indeed reaches its maximum at \(\mu =\hat{\mu }\). In this sequence of computational exercises, students learn several CT skills such as problem decomposition and debugging, but they acquire these skills by solving interesting statistical problems and by studying worked in-class examples that focus on statistical concepts. This approach can help instructors avoid the pitfall of teaching CT in a manner disconnected from the disciplinary content that CT aims to serve [23].
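For concreteness, a minimal SAS/IML sketch of such a log-likelihood (not an actual student submission) is shown below; it assumes the mean vector (1) and an AR(1) covariance, and the values in theta are illustrative placeholders that a student would replace with the PROC MIXED estimates when verifying \(-2\log L\).

proc iml;
   /* Rat body weights: 10 rats by 5 weekly measurements (table in Section 4.2) */
   y = {61  86 109 120 129,
        59  80 101 111 122,
        53  79 100 106 133,
        59  88 100 111 122,
        51  75 101 123 140,
        51  75  92 100 119,
        56  78  95 103 108,
        58  69  93 116 140,
        46  61  78  90 107,
        53  72  89 104 122};

   start loglik(theta) global(y);
      mu   = theta[1];   p   = theta[2:5];
      sig2 = theta[6];   rho = theta[7];
      m = (mu + p)` || mu;                 /* mean vector (1): [mu+p1,...,mu+p4, mu] */
      A = j(5, 5, 1);                      /* AR(1) matrix: A[i,j] = rho**|i-j| */
      do i = 1 to 5;
         do j = 1 to 5;
            A[i,j] = rho##abs(i-j);
         end;
      end;
      Sigma  = sig2 * A;
      Sinv   = inv(Sigma);
      logdet = log(det(Sigma));
      ll = 0;
      do r = 1 to nrow(y);                 /* sum the log densities over the 10 rats */
         e  = y[r, ] - m;
         ll = ll - 0.5*(5*log(2*constant("pi")) + logdet + e * Sinv * e`);
      end;
      return(ll);
   finish;

   /* Illustrative values only; substitute the MLEs reported by PROC MIXED */
   theta  = {124, -69, -48, -28, -16, 60, 0.5};
   neg2ll = -2 * loglik(theta);
   print neg2ll;
quit;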

Fig. 1

An illustration of how knowledge elaboration enriches students’ learning experience. Panel a shows a student’s work in model fitting and output interpretation. Panel b shows a student’s work in elaborating on the concept of the likelihood function: the student verifies the maximized likelihood by coding the log-likelihood function and evaluating it at the parameter estimates taken from the SAS output. In Panel c a student uses visualization to help understand the likelihood principle

4.3 Exploring incomplete data

The above computational exercise capitalizes on students’ innate curiosity and natural inclination toward hands-on activities. The satisfaction derived from verifying the likelihood function through concept-driven computations can sustain students’ interest in learning conceptual knowledge throughout the semester and beyond. The following exercise piques students’ interest in exploring a new idea: the accommodation of incomplete data.

Textbooks often hail the capability to accommodate incomplete data as a distinctly attractive feature of modern longitudinal data methods. Initially, students may be impressed, but excitement soon gives way to curiosity. This provides an opportunity for students to further elaborate on several key ideas. Hence, students are asked to revisit the foregoing computational exercise under the assumption that the first rat lacks the second and fourth measurements and the second rat lacks the third measurement.

After studying an in-class example, students can see that the covariance matrix for the first rat should be proportional to the matrix

$$\begin{aligned} \begin{pmatrix} 1 & \rho^2 & \rho^4 \\ \rho^2 & 1 & \rho^2 \\ \rho^4 & \rho^2 & 1 \end{pmatrix}. \end{aligned}$$

Similarly, they can handily write down the covariance matrix for the second rat. Students can then modify their code from the previous exercise to confirm the maximized log likelihood value. Like the foregoing exercise, this problem aims to induce students to construct knowledge for themselves, a practice falling into the domain of experiential learning. Students are implicitly led to use several CT techniques such as abstraction, decomposition and generalization to compute the likelihood function. For example, coding the two covariance matrices (see a student’s work in Fig. 2A) and modifying the loop inside the function mylik (see a student’s work in Fig. 2B) hone students’ ability to identify patterns or similarities and to adapt an existing algorithm to solve similar problems. In this learning process, students do not try to memorize any important facts related to the accommodation of incomplete data, yet they are likely to end up internalizing the knowledge without realizing it.
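A minimal SAS/IML sketch of this step is to build the full 5 by 5 AR(1) matrix and subset it by the observed weeks; the value of rho below is illustrative.

proc iml;
   rho = 0.5;                              /* illustrative value */
   A = j(5, 5, 1);
   do i = 1 to 5;
      do j = 1 to 5;
         A[i,j] = rho##abs(i-j);           /* full AR(1) matrix for 5 weeks */
      end;
   end;
   obs1 = {1 3 5};                         /* rat 1: weeks 2 and 4 missing */
   obs2 = {1 2 4 5};                       /* rat 2: week 3 missing */
   A1 = A[obs1, obs1];                     /* the 3 by 3 matrix displayed above */
   A2 = A[obs2, obs2];
   print A1, A2;
quit;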

Fig. 2

A student’s work to develop an understanding of how to accommodate incomplete data in a normal model. Panel a: coding three types of autoregressive covariance matrices. The first two matrices are for the two rats missing one or two measurements, while the last matrix is for rats with complete data. Panel b: The student coded the log-likelihood function, and found that her own \(-2\log L\) matched the same quantity produced by SAS proc mixed

4.4 Nurturing model building ability

The importance of teaching modeling to non-statistics majors is increasingly appreciated by statistics educators [12, 38]. A follow-on course on longitudinal data analysis should foster among students a sense of model building as part of routine biostatistical practice, because ready-made models are not as common in longitudinal data analysis as they are in an introductory course. Models based on the multivariate normal distribution are an ideal starting point. The following stripped-down linear growth model was designed to help students see what a growth model can do in practice.

$$\begin{aligned} \begin{aligned} \log (w(t))&= {\left\{ \begin{array}{ll} a_1+b_1 t \ \ \ \text {for control}\\ a_2+b_2 t \ \ \ \text {for treatment}\\ \end{array}\right. } \end{aligned} \end{aligned}$$
(2)

Students were asked to fit this model to the rat body weight data of the first two groups of rats from the same study mentioned earlier [2]. Note that the first 17 observations constitute the control group.

With their growing understanding of the multivariate normal distribution, students can readily see that equation (2) amounts to an economical way of specifying the mean vector of a normal distribution (in contrast to the relatively more wasteful specification in model (1)). The need for creating a time variable and a treatment dummy is easily understood in the context of this simple model, and the subsequent SAS syntax for model fitting can also be mastered with little effort. Coding the likelihood function of the model, albeit more time consuming, helps students internalize the likelihood principle and better appreciate the results from model-fitting routines. Figure 3 shows the learning workflow from data organization to model fitting to coding the likelihood function and verifying the quantity \(-2\log L\) (the label SAS uses for minus twice the maximized log likelihood).
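For concreteness, the model-fitting step might look like the following sketch (not the course’s actual code). It assumes a long-format data set rat_long with variables rat, group, week, and logwt = log(weight), and adopts an AR(1) within-rat covariance as one possible choice.

proc mixed data=rat_long method=ml;
   class rat group;
   model logwt = group group*week / noint solution;   /* a1, a2, b1, b2 of model (2) */
   repeated / subject=rat type=ar(1);                 /* within-rat AR(1) errors */
run;

With METHOD=ML, the \(-2\log L\) reported in the fit statistics is the quantity students then verify with their own likelihood code.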

Fig. 3

A typical workflow of the learning process. Panel a: data organization and model fitting; Panels b and c: coding the likelihood function and verifying \(-2\log L\)

4.5 Simulation facilitates elaboration on a random intercept logit model

As shown repeatedly by statistics educators [19, 27, 35], Monte Carlo simulation plays a unique role in statistics education. In the following example, simulation catalyzed students’ elaboration on an otherwise elusive concept related to a random intercept logit model.

The random intercept logit model may appear unfathomable to students, partly because the likelihood function is not representable by elementary functions. Reliance on a calculus-based method at the level of Hedeker & Gibbons [18] is not conducive to generating understanding for most public health students, although the standard integral symbol can still be retained as notational shorthand for a weighted-averaging operation. Simulation, a major component of CT, is a convenient vehicle for knowledge elaboration in the present context.

To help students use simulation as a tool for knowledge elaboration, in a video lecture I discuss the mental health study example in Hedeker & Gibbons [18, p.175] from a slightly different perspective, putting emphasis on the likelihood function. My discussion begins with the definition of the linear component

$$\begin{aligned} \eta _{ij}=\beta _0+\beta _1\times drug_i +\beta _2\times \sqrt{t_{ij}} +\beta _3\times drug_i\times \sqrt{t_{ij}}+v_{0i} \end{aligned}$$

where \(v_{0i}\) are drawn from \(\text{ Normal }(0,\sigma _v^2)\). Because the observed responses of the first subject are (1, 0, 0, 1), the first subject’s contribution to the likelihood function is

$$\begin{aligned} \int _{-\infty }^{\infty } \left[ \frac{1}{1+e^{-\eta _{11}}}\times \frac{1}{1+e^{\eta _{12}}} \times \frac{1}{1+e^{\eta _{13}}}\times \frac{1}{1+e^{-\eta _{14}}}\times \phi (v_{01})\right] dv_{01} \end{aligned}$$

with \(\phi (\cdot )\) denoting a normal density function with mean zero and variance \(\sigma _v^2\). Students are asked to interpret the integral sign \(\int\) merely as taking a weighted average according to a normal curve.
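To make the averaging interpretation concrete, the integral above can be approximated by simulation. The following SAS/IML sketch uses hypothetical parameter values, measurement times, and drug status for the first subject; it is meant only to illustrate the idea, not to reproduce the estimates in Hedeker & Gibbons.

proc iml;
   call randseed(2023);
   /* Hypothetical values for illustration only */
   b0 = -0.5;  b1 = -0.2;  b2 = -0.8;  b3 = 0.1;      /* beta_0, ..., beta_3 */
   sigv = 1.5;                                         /* SD of the random intercept */
   drug = 1;                                           /* subject 1's drug indicator */
   t    = {0 1 2 3};                                   /* measurement times */
   nsim = 100000;
   v = j(nsim, 1, .);
   call randgen(v, "Normal", 0, sigv);                 /* draws of v_01 */
   eta1 = b0 + b1*drug + b2*sqrt(t[1]) + b3*drug*sqrt(t[1]) + v;
   eta2 = b0 + b1*drug + b2*sqrt(t[2]) + b3*drug*sqrt(t[2]) + v;
   eta3 = b0 + b1*drug + b2*sqrt(t[3]) + b3*drug*sqrt(t[3]) + v;
   eta4 = b0 + b1*drug + b2*sqrt(t[4]) + b3*drug*sqrt(t[4]) + v;
   /* Integrand for outcomes (1, 0, 0, 1), evaluated at each draw of v */
   g  = (1/(1+exp(-eta1))) # (1/(1+exp(eta2))) # (1/(1+exp(eta3))) # (1/(1+exp(-eta4)));
   L1 = g[:];                                          /* averaging replaces the integral */
   print L1;
quit;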

After studying this worked example, students explore a subset of the real-world data generated by the study of Sommer et al. [37]. This data set consisted of 1200 observations on 275 preschool Indonesian children. The outcome variable was presence or absence of respiratory infection. Students are asked to explore the logit model

$$\begin{aligned} \eta _{ij}=\beta _0+v_{i} +\beta _1 \text{ Age}_{ij} +\beta _2\text{ Xer } +\beta _3\text{ Gender } +\beta _4 \text{ Cos } +\beta _5 \text{ Sin } +\beta _6 \text{ Height4age } +\beta _7\text{ Stunted } \end{aligned}$$

where \(v_{i}\) is a subject-specific random intercept. Definitions of the predictors in the above equation are given in Zeger & Karim [46, p.83]. For example, Xer is a dummy variable indicating whether a child had xerophthalmia, and Stunted is a dummy variable indicating whether the child’s height was below 85% of the height expected for age. As in the previous examples, model fitting and output interpretation are relatively easy, so no student work is shown here. The elaboration part is more engaging and more time-consuming, and debugging may consume a considerable amount of time for some students. A student’s work in verifying the likelihood function using simulation is shown in Fig. 4.
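One way to carry out the model-fitting step in SAS is PROC GLIMMIX with adaptive quadrature; the sketch below assumes a data set named indon with a subject identifier id and a binary outcome infection, and the lowercase predictor names are likewise assumptions.

proc glimmix data=indon method=quad;
   class id;
   model infection(event='1') = age xer gender cos sin height4age stunted
         / dist=binary link=logit solution;
   random intercept / subject=id;          /* subject-specific random intercept v_i */
run;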

Another challenge is how to teach the concept of marginalization. The analytic approximation method given by Hedeker & Gibbons [18, p.179] is somewhat opaque to most public health students. However, the essence of marginalization is simply averaging a quantity over the whole population, and students can better appreciate this point through simulation. Hence, students are asked to use simulation to compute the following probabilities and to compare the results with those obtained by the analytic approach: let gender=1, stunted=1, sin=1, cos=0, xer=1 and height4age=0, and find the probabilities of infection for age \(= -30, -20, \ldots , 20, 30\) (age was centered at 36 months). Figure 5 shows a student’s work; her results were quite similar to those she obtained by the analytic method (not shown here).
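A sketch of the kind of simulation students write is given below; the coefficients and the random-intercept standard deviation are hypothetical placeholders, and in the exercise students plug in their own fitted estimates.

proc iml;
   call randseed(2023);
   /* Hypothetical coefficients b0-b7 and random-intercept SD, for illustration only */
   b0=-2.0; b1=-0.06; b2=0.6; b3=-0.4; b4=-0.5; b5=0.2; b6=-0.03; b7=0.4;
   sigv = 1.2;
   nsim = 100000;
   v = j(nsim, 1, .);
   call randgen(v, "Normal", 0, sigv);
   /* Covariate profile from the exercise: xer=1, gender=1, cos=0, sin=1,
      height4age=0, stunted=1 */
   ages  = do(-30, 30, 10);                 /* age centered at 36 months */
   pmarg = j(1, ncol(ages), .);
   do k = 1 to ncol(ages);
      eta = b0 + v + b1*ages[k] + b2*1 + b3*1 + b4*0 + b5*1 + b6*0 + b7*1;
      pmarg[k] = mean(1/(1+exp(-eta)));     /* average subject-level probabilities over v */
   end;
   print ages, pmarg;
quit;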

Fig. 4

A student’s work showing how to use simulation to compute the log-likelihood function under a random intercept logit model (Panel a). The \(-2\log L\) quantity she computed matched the same quantity given by SAS (Panel b)

Fig. 5

A student’s work using simulation to compute marginalized probabilities under a random intercept logit model

5 Evidence of feasibility

One limitation of this study is the small number of students involved, as enrollment in a follow-on biostatistics course tends to be considerably lower than in an introductory course. As a result, it has not yet been possible to conduct a rigorous assessment of the new approach. The reader should therefore view the observations recounted here as the author’s personal experience; the assertions are not necessarily supported by the kind of evidence that only a large-scale experiment can provide. Still, evidence of the feasibility of the new approach has emerged from two course evaluation surveys, both administered by the Texas A&M University Office of Institutional Effectiveness & Evaluation. Note that conventional intervention assessment tools cannot be used directly, because the new approach shifted attention from the learning of knowledge in declarative or procedural form to the elaboration of conceptual knowledge.

The 2020 survey had six respondents. Two items are particularly relevant. The first item is the statement “On the whole, this was a good course.” Students had five options: SA (strongly agree), A (agree), U (undecided), D (disagree) and SD (strongly disagree). Three students chose “SA,” two chose “A,” and one chose “D.” The second item is the statement “On the whole, the information learned in the course was valuable to me.” Four students chose “SA,” but two chose “U.” These results suggest that most students were receptive to the new teaching method, but some students needed more individualized help.

The 2021 survey adopted a new format. Three items threw light on the feasibility of the new approach. Item A was stated as follows. “This course helped me learn concepts or skills as stated in course objectives/outcome.” Students were asked to choose an integer between 1 and 4 according to the following criteria.

1. This course did not help me learn the concepts or skills.
2. This course only slightly helped me learn the concepts or skills.
3. This course moderately helped me learn the concepts or skills.
4. This course definitely helped me learn the concepts or skills.

Six students out of a class of 8 responded, with five choosing a “4” and one choosing a “3.” The survey reported an average score of 3.83, which is indicative of students’ favorable perception of the course’s effectiveness. If students had been indifferent about the course’s effectiveness, they would have responded randomly. Under this randomness assumption, a simple Monte Carlo test shows that Prob(an average score \(\ge 3.8333)\approx 0.0017\). That is, a statistical test of the course’s effectiveness yields an approximate p value of 0.002.
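The Monte Carlo test itself takes only a few lines of SAS/IML; under the null hypothesis, each of the six respondents picks 1, 2, 3, or 4 with equal probability.

proc iml;
   call randseed(2021);
   nrep   = 1000000;
   probs  = {0.25 0.25 0.25 0.25};
   scores = j(nrep, 6, .);                      /* 6 respondents per replicate */
   call randgen(scores, "Table", probs);        /* random responses on the 1-4 scale */
   avg = scores[ , :];                          /* average score in each replicate */
   hit = (avg >= 3.8333);
   p   = hit[:];                                /* estimated p value; exact value is 7/4096 */
   print p;
quit;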

Item B was phrased as follows: “In this course, I engaged in critical thinking and/or problem solving.” Students again had four options: 1: Never; 2: Seldom; 3: Often; 4: Frequently. Of the same six respondents, four gave a “4” and two gave a “3,” for an average score of 3.67. A similar Monte Carlo exercise gives an approximate p value of 0.007. This is an encouraging indication of students’ active engagement in knowledge elaboration despite their disparate backgrounds.

Item C is perhaps of the greatest interest to those who may consider adapting the new approach in their own teaching. It read: “The instructor’s teaching methods contributed to my learning.” Students could choose an integer between 1 and 3 to indicate one of the following opinions: 1: Did not contribute; 2: Contributed a little; 3: Contributed a lot. Five students chose a “3,” but one chose a “1.” Clearly, five students welcomed the new teaching method, but the same method failed to bring one student’s learning into the ZPD. Future research may explore ways to provide more individualized scaffolding to bring more students into the ZPD.

6 Discussion

Longitudinal data analysis is an important follow-on biostatistics course for public health students. This paper has shown that by shifting to a unique computational approach, the instructor can bring conceptual knowledge learning into most students’ ZPD. Once inside their ZPD, students learn conceptual knowledge through knowledge elaboration that is structured and aided by carefully devised computational exercises. This paper highlights an important role of CT in those hands-on computational exercises.

The potential of CT in education is widely recognized. However, as Czerkawski & Lyman [10] noted, incorporating CT into higher education faces higher hurdles than in K-12 education, because at the college level the use of CT in teaching and learning is highly dependent on the subject matter. Integrating CT into curriculum content has been identified as an effective, synergistic approach to deepening students’ understanding of that content as well as sharpening their CT skills [22]. The present work, along with previous works [47,48,49], exemplifies this idea in the context of biostatistics education for public health students. Injecting CT into a biostatistics course is less an objective than a means of catalyzing knowledge elaboration. The present work also exemplifies an oft-overlooked distinction, which some believe [10] is important to the promotion of computational thinking in higher education: applying CT skills is not the same as applying computers to data crunching. As shown in the foregoing examples, solving a problem (data crunching) can require far less computer coding than elaborating on the underlying key concepts.

As shown previously in a course on categorical data analysis [47, 48], the likelihood function can be used extensively as a lever with which to facilitate knowledge elaboration. Even in the case of generalized estimating equations (GEE), which do not rely on the likelihood, students can develop a deeper appreciation of the strengths and shortcomings of the GEE approach when they have an intuitive grasp of the likelihood principle. From a statistical perspective, an emphasis on the likelihood function via a hands-on approach enables students to develop a deep understanding of the all-important likelihood principle, which cannot be accomplished merely by examining a model’s defining equation and reading its verbal explanation in a textbook. From a CT perspective, reconstructing the likelihood function for a concrete problem via computer coding nurtures students’ inclination to think by way of models. The actual coding process provides students with ample opportunity to learn basic CT skills, such as problem decomposition, simulation and debugging. CT skills are intended solely as a vehicle for fostering knowledge elaboration, and students’ acquisition of CT skills is a by-product of learning biostatistics. The pedagogic approach discussed here and elsewhere [48, 49] suggests a practical way of integrating CT training into public health biostatistics curricula.