Introduction

Authentic learning environments are used in many fields to emphasize the relevance of knowledge in the out-of-school world and overcome the problem of inert knowledge (Nachtigall et al., 2022). In science education, context-based learning is a central instructional method for implementing authentic learning in the classroom. Consequently, many science curricula around the world have incorporated context-based learning (e.g., Waddington et al., 2007).

Over the last 40 years, context-based learning (CBL) projects have been launched and evaluated in all parts of the world (Taconis et al., 2016). Although these are referred to under different names, such as the Science-Technology-Society approach (Aikenhead, 1994), Salter’s Chemistry (Bennett & Lubben, 2006), “Chemie im Kontext” (Parchmann et al., 2006), or socio-scientific issues (Sadler, 2009), they all pursue identical objectives. Scientific concepts should be integrated into authentic contexts to emphasize the connection between these concepts and the real world. This should make learning more relevant and provide an answer to the question of why it is necessary to learn science (need-to-know principle; Gilbert, 2006). Most research has investigated the cognitive and motivational benefits of CBL (Bennett et al., 2007; Ültay & Çalık, 2012). Less attention has been paid to exploring which contextual learning environments, and specifically which contexts, are more effective than others. To systematically investigate the effects of specific contexts and their characteristics, van Vorst et al. (2015) introduced a framework to operationalize these characteristics. In this framework, authenticity is described as a central characteristic of contexts, which can be established either through a connection to the everyday life of students or through uncommon scientific phenomena. Furthermore, it can be assumed that authentic learning in chemistry can be established particularly by references to the chemical laboratory, as a laboratory is a typical working area for chemists. Prior research suggests that different authentic contexts have different cognitive and motivational benefits (Fechner et al., 2015; Kölbach & Sumfleth, 2013; Podschuweit & Bernholt, 2018; Sevian et al., 2018a, 2018b; van Vorst et al., 2018) and are appropriate for different students to stimulate situational interest (Habig et al., 2018).
Students also choose different contexts depending on their characteristics (e.g., prior knowledge and interest in chemistry; Güth & van Vorst, 2023; see also van Vorst & Aydogmus, 2021). Hence, it can be assumed that the choice of different contexts can have an impact on students’ individually perceived authenticity (Shaffer & Resnick, 1999). According to self-determination theory (Ryan & Deci, 2020), offering choices leads to the satisfaction of the basic psychological need for autonomy, which has a positive effect on intrinsic motivation. Research points mostly to positive effects of choice on intrinsic motivation, situational interest, and performance (Høgheim & Reber, 2017; Patall et al., 2008; Reber et al., 2009). Effects on cognitive load in this context have been investigated less so far (e.g., S. Schneider et al., 2018). However, some studies suggest that it is not the act of choosing that is beneficial, but rather the congruence of the selected option with personal values and interests (Katz & Assor, 2007; Patall, 2013; Wilde et al., 2018). Little is known about whether these results can be generalized to the choice between authentic contexts. Therefore, we pose the question: To choose or not to choose? Should learners choose or not choose between different contextualized tasks?

This paper aims to bridge this research gap through an experimental study in a context-based learning environment in which chemistry learners can choose different contexts or get contexts that do or do not match their individual characteristics.

Context-based learning

Despite the widespread use of CBL, the academic discourse is still characterized by a certain disagreement regarding the meaning of the term context (see Podschuweit & Bernholt, 2018). Finkelstein (2005) proposed a model that distinguishes three levels of the term: the interaction between the task, the student, and the scientific concept (first level) is surrounded by the situation in the classroom (second level) and the idioculture of the learning group (third level). Gilbert (2006) took up the innermost level of the context term and described four perspectives on the definition of contexts: context as the direct application of concepts (1), context as reciprocity between concepts and applications (2), context as provided by personal mental activity (3), and context as the social circumstances (4). Our consideration of context is at the task level (Finkelstein, 2005) and in line with Gilbert’s (2006) second and third perspectives. The context is constructed by an authentic situation from the real world in a learning task. This situation not only serves as a starting point for the development of scientific concepts (Bennett et al., 2007), but also frames the entire learning process like a storyline (Nentwig et al., 2007).

Based on a similar definition, van Vorst et al. (2015) developed a framework to further describe contexts and their characteristics in chemistry education. For this purpose, they have summarized and systematized context characteristics from the literature. Authenticity is described as a central contextual characteristic resulting from the interaction between context and students (van Vorst et al., 2015). Therefore, the context should be related to an actual, realistic, and genuine experience that students can encounter (Weiss & Müller, 2015). Authentic contexts can vary in their familiarity for learners by using situations, objects, or activities that relate to students’ immediate real-life experiences (Broman & Simon, 2015; Campbell & Lubben, 2000; George & Lubben, 2002) or to uncommon phenomena that do not or only rarely occur in the students’ everyday life (Kasanda et al., 2005). In chemistry education, the laboratory is probably the most authentic learning environment for students as chemistry is practiced here (e.g., Prins et al., 2008).

Empirical findings indicate that CBL is motivationally beneficial (Bennett et al., 2007; Sevian et al., 2018a, 2018b; Taasoobshirazi & Carr, 2008; Ültay & Çalık, 2012), but effects depend mainly on the specific context (Broman et al., 2018, 2020; Podschuweit & Bernholt, 2018; Sevian et al., 2018a, 2018b). Sevian et al. (2018a, 2018b) compared two university courses that introduced kinetic gas theory through two different contexts. The students of the first course identified the gas in a balloon by representing themselves as a human model, while the students of the second course used a computer simulation to develop an approach to reduce the CO2 concentration in the atmosphere. Students in the first course indicated a better understanding of particle motion, while those in the second course showed more advanced thinking and better application of chemical terminology. Podschuweit and Bernholt (2018) conducted a study with secondary school students to investigate the effects of heterogeneous and homogeneous contextual learning settings. In the homogeneous setting, students learned with contexts related to power plants, while the heterogeneous setting included contexts from physics, biology, and chemistry. The researchers found that using different contexts better addressed the characteristics of different learners and led to improved transfer performance. However, it is unclear which contexts are more effective for learning than others. Fechner et al. (2015) demonstrated in an experimental study, in which students worked on inquiry-based tasks, that everyday contexts triggered a higher level of situational interest than contexts from the laboratory. Van Vorst et al. (2018) revealed that contexts with uncommon phenomena lead to a higher emotional valence than everyday contexts. This effect was moderated by the learners’ prior knowledge and individual interest in chemistry.
Learners with high interest and prior knowledge showed higher emotional valence when learning with uncommon phenomena, whereas learners with low interest and prior knowledge showed higher emotional valence with everyday contexts (Habig et al., 2018). This effect was also reflected in students’ context choice (van Vorst & Aydogmus, 2021). Students with low prior knowledge in chemistry, interest in chemistry, and chemistry-related self-concept often chose everyday contexts. Uncommon phenomena were chosen by students with higher prior knowledge, interest, and self-concept because of surprising information. Students with the highest prior knowledge, interest, and self-concept chose laboratory contexts. In each student group, no differences were found in terms of task-related satisfaction, situational interest, and cognitive load depending on the chosen context (Güth & van Vorst, 2023). Accordingly, we assume that each student chooses a context that is congruent with his or her values and interests. For example, learners were likely to choose the task that interested them most and that best corresponded to their prior knowledge (Reber et al., 2009).

Effects of choice

In self-determination theory, autonomy plays a central role as one of the three basic psychological needs from which intrinsic motivation emerges. Ryan and Deci (2017, p. 97) refer to autonomy as a “…vehicle through which the organisation of the personality proceeds and through which other psychological needs are actualised.” Offering choice is an instructional method to meet the basic need for autonomy (Ryan & Deci, 2020).

Research shows ambiguous results on the effectiveness of choice. Schraw et al. (1998) found no effects from choosing a reading text in terms of interest or performance in an experimental study. Choice between an essay or a crossword puzzle did not influence cognitive or affective engagement (Flowerday & Schraw, 2003). Choosing names, animals, or numbers did not improve vocabulary learning among fifth and sixth graders (D’Ailly, 2004). In contrast, choice was beneficial to students’ interests (Reber et al., 2009). Høgheim and Reber (2017) showed that choice positively influences the triggered situational interest but not the maintained situational interest. S. Schneider et al. (2018) conducted two experimental studies and demonstrated that the provision of choice reduced intrinsic cognitive load, which is related to perceived task difficulty. This result contradicts the cognitive load theory (Sweller et al., 1998), which considers invested mental effort and perceived task difficulty as key measures of cognitive load (Schmeck et al., 2015). Choosing between learning materials might increase the cognitive load as it requires redundant cognitive processes for learning.

A meta-analysis synthesized empirical findings and concluded that offering choices is mostly beneficial for intrinsic motivation, engagement, and performance (Patall et al., 2008). However, these effects are often confounded by interest in the chosen option. In their literature review, Katz and Assor (2007) concluded that when choice is separated from the congruence of the options to the person’s values, interests, and goals, the mere act of choosing is not motivating. This assumption is also supported by recent research. Patall (2013) reported that choosing between tasks related to participants’ personal goals and life concerns does not further increase motivation. Flowerday and Shell (2015) demonstrated in a path model that situational interest after reading a text is determined by topic interest and not by the choice of the text. A quasi-experimental study in biology classes found that students were most intrinsically motivated when they could learn with their preferred option but did not have to choose it first (Wilde et al., 2018). Consequently, it is sufficient to get what you want even if it is not self-chosen.

The present study

The focus of the present study is to investigate the effects of context choice and congruence between students’ characteristics and different authentic contexts on task-related satisfaction, situational interest, and cognitive load. Based on the literature, we expected that students show higher task-related satisfaction (Hypothesis 1a) and situational interest (Hypothesis 1b) when they learn with an authentic context which is congruent with their individual characteristics. We assumed that the choice of context has no effect on task-related satisfaction (Hypothesis 2a) or situational interest (Hypothesis 2b) if students learn in a context that corresponds to their individual characteristics. Regarding cognitive load, we expected that the choice of context would reduce cognitive load (Hypothesis 3).

Method

Participants and design

To test our hypotheses, we conducted an experimental study with 355 students from 20 classes (7 secondary schools) in the German federal state of North Rhine-Westphalia. All students and schools voluntarily participated in the study. To comply with the current curriculum, the study was conducted in the third year of learning chemistry, in which the content of acidic and alkaline solutions is to be taught. Because of the ongoing COVID-19 pandemic, many students were absent due to illness. Assuming that the values were missing completely at random (Lüdtke et al., 2007; Rubin, 1976), listwise deletion led to a total sample of N = 217 students from 19 classes. Students were between 14 and 18 years old (M = 15.03, SD = 0.80); 45.62% of the students were male and 47.47% were female, with 6.91% not indicating gender.

Students worked on three sequential subtasks on the content of acidic and alkaline solutions integrated in different contexts. Situational interest and cognitive load were measured after each of the three subtasks. At the end of the learning unit, we also measured students’ task-related satisfaction. Students’ reading comprehension was assessed as a control variable. We integrated three different treatments into the experiment following the design used by Wilde et al. (2018):

  1. Students were able to choose a contextual task independently (choose & match)
  2. Students were assigned a contextual task that matches their individual characteristics (no choose & match)
  3. Students were assigned a contextual task that did not match their individual characteristics (no choose & no match)

We developed a predictive model using supervised machine learning to assign the context-based tasks. The predictive model suggested an appropriate context based on the students’ measured characteristics (prior knowledge in chemistry, chemistry-related self-concept, interest in chemistry, choice motives). Thus, students in the second and third treatments did not have to choose a context beforehand for their matching or non-matching context to be identified. Consequently, these students were not deprived of a previously granted choice, which would have a negative impact on intrinsic motivation (Patall et al., 2008).

Development of the predictive model

The predictive model was developed using machine learning and data from a previous study. In this earlier study, we analyzed which students chose which contexts for learning. Learners who were similar in terms of their individual characteristics showed no differences in task-related satisfaction, situational interest, and cognitive load after completing self-selected contextual tasks (Güth & van Vorst, 2023). We assume that self-selected contexts are congruent with the individual characteristics of the students (see also Reber et al., 2009). Supervised learning algorithms were applied to estimate the parameters of a model (Kuhn & Johnson, 2013) that predicts students’ context choices (output) as accurately as possible based on individual characteristics (input). The model can subsequently be used to predict the context choice based on students’ characteristics.

We used the tidymodels packages (Kuhn & Wickham, 2020; see Kuhn & Silge, 2022) in R to develop the predictive model. For classification, we tuned several models and compared their performance (see supplementary information, SI). Precision, recall, and F1-score were used as measures to evaluate the performance of the different models. Precision can be understood as the ratio of correctly identified positive cases in a category to all predicted positive cases. Recall is the ratio of correctly identified positive cases to all actual positive cases (true positives + false negatives). The harmonic mean, called the F1-score, is calculated to take both ratios into account (Géron, 2020). We report macro averages for all metrics across all categories to assess the performance of the entire model. In this way, all categories are considered equally (Kuhn & Silge, 2022).
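The three metrics described above can be illustrated with a minimal sketch. It is written in Python rather than in the R/tidymodels workflow used in the study, and the confusion matrix is purely hypothetical (it does not reproduce Table 1). The function name `macro_metrics` is an assumption made for this illustration. Macro averaging is implemented here as the unweighted mean of the per-category values, so every category counts equally.

```python
def macro_metrics(confusion):
    """Macro-averaged precision, recall, and F1 from a square confusion matrix.

    confusion[i][j] is the number of cases whose actual category is i
    and whose predicted category is j.
    """
    n = len(confusion)
    precisions, recalls, f1_scores = [], [], []
    for k in range(n):
        true_pos = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Hypothetical counts (rows: actual, columns: predicted) for three
# categories: everyday, uncommon, laboratory. Not data from the study.
example = [
    [8, 1, 1],
    [2, 5, 3],
    [1, 2, 7],
]
print(macro_metrics(example))
```

Because each category contributes equally to the macro average, weak predictions for a rarely chosen category lower the overall score as much as weak predictions for a frequently chosen one.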

An artificial neural network showed the highest performance in the test set (macro F1-score = 0.614, macro precision = 0.647, macro recall = 0.597). As shown in Table 1, the neural network had difficulties in predicting the choice of uncommon and laboratory contexts.

Table 1 Confusion matrix for predicting students’ context choice in our held-out test set

It must be stated that the prediction by the neural network does not completely correspond to the actual context choice of the students (cf. “Results”). Some learners will therefore be assigned a context that does not match their individual characteristics. We took this finding into account by asking the students at the end of the unit whether they were satisfied with the context (cf. “Measures”).

Context-based learning environment

During the intervention, students worked individually in a context-based learning environment on a tablet computer. The learning environment consisted mainly of text and picture-based learning material. To avoid further effects on situational interest or satisfaction, we did not use interactive elements, videos, or experiments. In designing the learning environment, we used principles (e.g., multimedia principle, signaling principle) from the cognitive theory of multimedia learning to support information processing (Mayer, 2020). Three different versions of the learning environment were available, each built around a different context. We used the framework of van Vorst et al. (2015) to develop the contexts (Table 2). Except for the laboratory context, all contexts were from the human body and human disease topic area, as gender differences for these were found to be smaller than for other topics (OECD, 2016; Sjøberg & Schreiner, 2010). The dimensionality of the context characteristics and the affiliation of the contexts to the characteristics were analyzed in a previous study (Güth, 2023). It was confirmed that the contexts also correspond to the assumed context characteristics from the students’ point of view.

Table 2 Developed contextual tasks and their characteristic affiliation

Each of the three differently contextualized tasks consisted of three subtasks on the content of acidic and alkaline solutions, which is crucial in the current curriculum for the third year of learning chemistry. The first task dealt with pH indicators, the second with acids and their properties, and the third with the neutralization reaction (see Fig. 1 for an excerpt from the neutralization task in the context “Why brushing your teeth is so important”).

Fig. 1 Extract from the neutralization task in the context “Why brushing your teeth is so important” translated into English

In each of the subtasks, a new aspect of the context was considered. The context framed the whole unit as a story (Nentwig et al., 2007). The sequence of the learning unit was based on the prototypical sequence of context-oriented learning from the German project “Chemie im Kontext” (Parchmann et al., 2006). More information about the learning environment is available in SI.

Measures

Reading comprehension

We measured students’ reading comprehension using an established instrument from W. Schneider et al. (2017). Satisfactory values for retest reliability were identified by W. Schneider et al. (2017) during test development.

Content-knowledge in chemistry

Content-knowledge in chemistry has been operationalized as the knowledge of fundamental basic concepts (structure and composition of matter, chemical reaction, energy) of the German educational standards acquired in the first 2 years of learning chemistry (Walpuski et al., 2011). We used 57 multiple-choice single-select items from Celik (2022) in a balanced incomplete block design. A computed Rasch model with partially fixed item parameters, which were determined in a previous study (Güth & van Vorst, 2023), shows a good fit (Bond et al., 2021) and sufficient reliability (0.87 ≤ wMNSQ ≤ 1.22; −2.18 ≤ t ≤ 2.39; WLE-Reliability = 0.61). We calculated weighted likelihood estimates (WLEs) in the TAM package (Robitzsch et al., 2022) in R as a measure of content knowledge.

Interest in chemistry

Students’ interest in chemistry was measured by four scales: individual interest in chemistry (15 items, e.g., “What we do in chemistry class interests me,” Fechner et al., 2015), content-related interest (3 items, e.g., “I am interested in the properties of acidic and basic solutions,” adapted from Habig et al., 2018b), topic-related interest (5 items, e.g., “I find it exciting to look at the human body from a chemical perspective,” adapted from Elster, 2007), and extrinsic motivation in chemistry (3 items, e.g., “It is important for me personally to get a good grade in chemistry,” adapted from Glynn & Koballa, 2006). All items were rated on a 4-point Likert scale from “does not agree at all” (1) to “totally agree” (4). We examined construct validity using a confirmatory factor analysis (CFA) with the lavaan package (Rosseel, 2012) in R and assumed sufficient model fit (Hu & Bentler, 1999; Marsh et al., 2010), χ2(293) = 658.65, p < 0.001, CFI = 0.921, TLI = 0.912, RMSEA = 0.062, SRMR = 0.054. Reliabilities were reasonable (0.73 ≤ ω ≤ 0.94).

Chemistry-related self-concept

We assessed the chemistry-related self-concept with a self-concept (8 items, e.g., “Chemistry comes easy to me”) and self-efficacy scale (3 items, e.g., “If I try hard, I can easily keep up in chemistry”) originally developed by Hoffmann et al. (1998) and adapted for chemistry education by Habig et al. (2018). Items were also rated on a 4-point Likert scale from “does not agree at all” (1) to “totally agree” (4). CFA showed sufficient model fit (χ2 (43) = 78.756, p < 0.001, CFI = 0.983, TLI = 0.978, RMSEA = 0.051, SRMR = 0.029) and excellent reliabilities (ωSelf-concept = 0.91, ωSelf-efficacy = 0.85).

Motives for choosing a context-oriented task

Students’ motives for choosing a contextual task were measured for the assignment with the predictive model. Based on the study by van Vorst and Aydogmus (2021), personal relevance (4 items, e.g., “…because the task is about a topic that I also encounter in everyday life”), surprising information (3 items, e.g., “…because I was very surprised by some of the information in the text”), and interest and curiosity (3 items, e.g., “…because the topic of the task interests me”) were surveyed as meaningful motives for choice. CFA implied sufficient construct validity, χ2 (31) = 78.573, p < 0.001, CFI = 0.941, TLI = 0.915, RMSEA = 0.067, SRMR = 0.061. Apart from the personal relevance scale (ω = 0.44), good reliabilities were found (surprising information: ω = 0.79; interest and curiosity: ω = 0.76).

Situational interest

Situational interest of the students was captured according to the concept of Krapp (2007) through the emotional valence (3 items, e.g., “I am excited about the topic of the next task”), value-related valence (3 items, e.g., “The topic of the assignment was of personal importance to me”), and the epistemic component of interest (4 items, e.g., “I would like to learn more about the topics covered in today’s task”). We adapted items from Engeln (2004) that were rated on a 4-point Likert scale. Since we were able to identify high latent correlations between emotional and value-related valence in the CFA, we specified a common latent factor for emotional and value-related valence which we called affective valence. Since the items were used at several measurements, we tested for longitudinal measurement invariance according to the specifications of Mackinnon et al. (2022). After stepwise evaluation of configural, metric, scalar, and strict measurement invariance, partial strict measurement invariance (see Steenkamp & Baumgartner, 1998) was confirmed (see Table 7 in appendix). Reliabilities were satisfactory for each scale across each measurement (0.74 ≤ ω ≤ 0.90).

Cognitive load

Cognitive load was assessed with two single-item rating scales of perceived task difficulty (“How easy or difficult were the tasks to understand?”, Kalyuga et al., 1999) and invested mental effort (“When working on and understanding the tasks, my overall mental effort was…”, Paas, 1992). We used a German version with 7 points (Schmeck et al., 2015), ranging from very low (1) to very high (7) for invested mental effort and from very easy (1) to very difficult (7) for perceived task difficulty.

Satisfaction

At the end of the contextual learning unit, we used a single item to ask the learners whether they would work on this context again. For this, we presented learners with standardized descriptions of all contexts. Possible answers were “No, I would prefer not to work on any of the topics” (1), “No, I would work on another topic” (2), and “Yes, I would work on the same topic again” (3). We always displayed the name of the context instead of the term topic.

Procedure

Data collection was carried out in regular chemistry classes in three lessons (each 45 min, see Fig. 2). For the entire study, students were given a tablet computer and a tablet pen. In the first lesson, students were instructed about the procedure of the study. An individual anonymized code was given to each student. Students were not informed about the actual purpose of the study. We assessed content-knowledge in chemistry, interest in chemistry, chemistry-related self-concept, and motives for choosing a context in a web application on the tablet computer. After the lesson, we randomly assigned the students to a treatment. Based on the previously measured characteristics (content-knowledge in chemistry, interest in chemistry, chemistry-related self-concept, choice motives), the prediction model determined a context that was congruent or incongruent with the individual characteristics. In the next lesson, we provided a list with the individual code of each student and the learning material to be worked on. The title of the learning material was pseudonymized so that students did not know what treatment they were getting or in which context their peers were learning. Included in the learning material of the choose & match treatment were the three standardized context descriptions that students were asked to use to decide on a learning material. Once the decision was made, the learners were directed to the appropriate learning material. The learning material of the other treatments only contained the context description of the assigned context. After completing the first subtask, students were surveyed regarding their situational interest and cognitive load. In the remaining time, the learners worked on the second subtask and answered the items on situational interest and cognitive load again. In the last lesson, students worked on the third sub-task. Afterwards, we measured situational interest and cognitive load again. 
We also used an item to evaluate the students’ satisfaction. Subsequently, students completed the reading comprehension test.

Fig. 2 Summary of the procedure

Data analysis

All analyses were conducted in R (Version 4.2.3, R Core Team, 2022). A multivariate analysis of variance (MANOVA) was conducted to identify differences between the treatments in the control variables. χ2-tests were performed to investigate differences between nominal variables. Assumptions were checked and, unless otherwise stated, were met (Field et al., 2012).

Since there is a nested structure in our data (e.g., measurements nested in students, students nested in classes), we performed multilevel analyses with the lme4 package (Bates et al., 2015) to test our hypotheses. Due to the small number of classes (N = 19), we did not model the class level (Singmann & Kellen, 2019). We proceeded as follows for each dependent variable that was measured repeatedly: In a first step, we computed a null model without fixed effects to determine the within-group and between-group variance and calculated the intraclass correlation (ICC). Then, we computed a full model with fixed effects for time, treatment, and processed context, as well as their two-way and three-way interaction effects. Repeated measurements (time) were modeled as a categorical variable to investigate differences between measurement points (Eid et al., 2017). In this case, no random slope can be estimated for time because there is only one observation per participant at each measurement point: the random slope variance would be completely confounded with the random error variance, and the model would not be identifiable (Barr, 2013; Barr et al., 2013). Accordingly, we computed a multilevel model with a random intercept for each participant. We used the Kenward-Roger approximation to calculate p-values for the fixed effects in the lmerTest package (Kuznetsova et al., 2017), where the degrees of freedom are estimated for an F-test. Compared to the likelihood-ratio test of nested models, this method does not yield anti-conservative p-values for smaller samples (Kuznetsova et al., 2017; Luke, 2017). Pairwise contrast analyses were performed in the emmeans package (Lenth, 2022) to identify which factor levels of the fixed effects differed significantly (Field et al., 2012). Estimated marginal means were used to account for dependencies in the data (Singmann & Kellen, 2019). For pairwise comparisons, Bonferroni correction was used.
We calculated r to measure effect size and interpreted it according to the empirically determined guidelines of Lovakov and Agadullina (2021) for social psychology (small: r = 0.12, medium: r = 0.24, large: r = 0.41).
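As a brief illustration of the Bonferroni correction applied to the pairwise comparisons, the following Python sketch multiplies each unadjusted p-value by the number of comparisons and caps the result at 1. The function name `bonferroni` and the example p-values are assumptions made for this illustration. The study itself relied on the emmeans package in R.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values for a family of comparisons.

    Each p-value is multiplied by the number of comparisons in the
    family and capped at 1.0.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted p-values for three pairwise comparisons
# (e.g., three treatments compared with each other); not study data.
unadjusted = [0.004, 0.030, 0.600]
print(bonferroni(unadjusted))
```

With three comparisons, an unadjusted p-value must fall below 0.05 / 3 to remain significant at the 5% level after correction.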

Results

Comparability of treatments

First, we investigated whether randomization resulted in comparable groups. Using a MANOVA and Pillai’s Trace, we found no significant differences between the groups regarding the control variables, V = 0.11, F(18, 414) = 1.33, p = 0.19, η2p = 0.05. Treatments were comparable in terms of gender, χ2 (2) = 0.29, p = 0.87, V = 0.03. However, treatments differed in terms of the contexts worked on, χ2 (4) = 13.27, p = 0.01, V = 0.17. Students from the no choose & no match group worked more often on the laboratory context. Demographic data and mean values of control variables for each treatment can be found in Table 8 (appendix).

Validation of the context assignment by the neural network

To verify how valid the context assignment by the neural network was, we compared the actual context choice of the choose & match treatment with the prediction by the neural network. The other treatments could not be used for this because they did not choose a context. However, the choose & match treatment was a random subset of the whole sample, so the result was assumed to be representative of the whole sample. Performance decreased compared to the test set (macro F1-score = 0.404, macro precision = 0.424, macro recall = 0.405). As already shown, the prediction by our neural network did not completely match the actual context choice of the students (Table 3). We addressed this issue in the further data analysis.

Table 3 Confusion matrix for predicting students’ context choice in the choose & match treatment

Effects on situational interest

In line with the results of the CFA, we computed separate multilevel models for the affective valence and the epistemic component of situational interest. The intraclass correlation (ICC) indicated that 66.7% of the variance in affective valence and 74.7% of the variance in the epistemic component was attributable to the individual student. Multilevel modeling was therefore necessary. In the next step, we computed the full model (Table 4).
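The ICC can be read as the share of total variance in the null model that lies between students. A minimal Python sketch of this ratio follows; the function name is hypothetical and the variance components are illustrative numbers, not estimates from the study.

```python
def intraclass_correlation(between_var, within_var):
    """ICC = between-student variance / (between-student + residual variance)."""
    total = between_var + within_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total


# Illustrative variance components from a random-intercept null model.
icc = intraclass_correlation(between_var=3.0, within_var=1.0)
print(f"ICC = {icc:.2f}")  # ICC = 0.75
```

An ICC close to zero would mean that repeated measurements from the same student are hardly more alike than measurements from different students, in which case a model without a random intercept would lose little.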

Table 4 Multilevel analysis for predicting the affective valence of situational interest, epistemic component of situational interest, invested mental effort, and perceived task difficulty by time, treatment, processed context, and their two-way and three-way interactions

We found a significant main effect of time on the affective valence (Fig. 3a) and epistemic component (Fig. 3c) of situational interest (see Footnote 2). Affective valence decreased from the first to the second measurement with a medium effect size, b = 0.27, t(415) = 7.84, p < 0.001, r = 0.36, but did not decrease further from the second to the third measurement, b = 0.02, t(415) = 0.68, p = 0.98, r = 0.03. A similar decrease from the first to the second measurement was found for the epistemic component, b = 0.13, t(415) = 4.19, p < 0.001, r = 0.20, again with no further decrease from the second to the third measurement, b = 0.01, t(415) = 0.17, p = 0.99, r = 0.008.

In addition, there were significant main effects of treatment on the affective valence (Fig. 3b) and epistemic component (Fig. 3d) of situational interest. At each measurement point (see Table 10), affective valence did not differ significantly between the choose & match and the no choose & match treatment, whereas it was higher in both of these treatments than in the no choose & no match treatment. The same pattern of differences was evident for the epistemic component (Table 11), although the effects for the comparisons of the choose & match and the no choose & match treatment with the no choose & no match treatment were slightly smaller (0.14 ≤ r ≤ 0.20). Neither the main effect of the processed context nor the two-way and three-way interaction effects were significant.

Fig. 3

Estimated marginal means (95% CI) of affective valence of situational interest across duration of the contextual learning unit (a) by treatment (b) and estimated marginal means (95% CI) of epistemic component of situational interest across duration of the contextual learning unit (c) by treatment (d)

Effects on cognitive load

Data analyses were performed separately for perceived task difficulty and invested mental effort. A total of 8.4% of the variance in mental effort was attributable to the individual student. The proportion of variance in task difficulty attributable to the individual student was much higher at 27.1%. Although both ICCs were lower than for situational interest, we conducted a multilevel analysis (Table 4).

There were significant main effects of time on mental effort (Fig. 4a) and task difficulty (Fig. 4c). Mental effort increased from the first to the second measurement, b = −0.21, t(400) = −15.47, p < 0.001, r = 0.61, and remained constant from the second to the third measurement, b = −0.06, t(407) = −0.45, p = 0.98, r = 0.02. Similarly, perceived task difficulty increased from the first to the second measurement, b = −1.25, t(400) = −9.55, p < 0.001, r = 0.43, but then remained stable, b = −0.03, t(407) = −0.24, p = 0.99, r = 0.01.

Fig. 4

Estimated marginal means (95% CI) of invested mental effort across duration of the contextual learning unit (a) by treatment (b) and estimated marginal means (95% CI) of perceived task difficulty across duration of the contextual learning unit (c) by treatment (d)

Furthermore, we found a significant main effect of the treatment on mental effort (Fig. 4b). Students from the choose & match treatment reported higher mental effort at the second measurement point than students from the no choose & match treatment, b = 0.78, t(520) = 2.96, p = 0.01, r = 0.13. This difference was not significant at the other measurement points. We found no significant effects of the treatment on perceived task difficulty.

Effects on task-related satisfaction

At the end of the context-based learning unit, we asked students whether they would work on the context again. A total of 40.09% of the students indicated that they would like to work on the same context again. In contrast, 43.87% answered that they would prefer to work on one of the other contexts. Only a small proportion of students (16.13%) did not want to work on any of the contexts again. A χ2-test revealed a significant association between students’ satisfaction and treatment with a small to medium effect, χ2 (4) = 25.24, p < 0.001, V = 0.24. Students from the two match treatments (choose & match and no choose & match) wanted to work on the same context again more often than students from the no match treatment. In contrast, students from the no match treatment more often indicated that they would prefer to work on a different context (Fig. 5).

Fig. 5

Task-related satisfaction of the treatments

Effects of choosing non-matching contextual tasks

Since the satisfaction survey at the end of the learning unit showed that not all students had received a matching or non-matching context as intended, we adapted the treatments (Table 5). Demographic data and the descriptive data of the adapted treatments are available in Table 13. A MANOVA using Pillai’s Trace showed that the adapted treatments differed with respect to the control variables, V = 0.223, F(27, 621) = 1.84, p = 0.005, η2p = 0.07. Post hoc ANOVAs indicated differences with a medium effect size in content-related interest, F(3, 213) = 5.20, p = 0.002, η2 = 0.07. No differences were found in the other control variables, including gender (χ2 (3) = 3.42, p = 0.33, V = 0.13) and processed context (χ2 (6) = 10.04, p = 0.12, V = 0.15).

Table 5 Description and number of students (n) of the adapted treatments

We repeated the multilevel analysis with the adapted treatments for situational interest (Table 6), as we had identified meaningful effects there. Since the treatments differed in content-related interest, we included it as a grand-mean-centered covariate. As we had found no main effect of context and the treatments were comparable in terms of processed context, we omitted the processed context to obtain a more parsimonious model.

Table 6 Multilevel analysis for predicting the affective valence and epistemic component of situational interest by time, adapted treatment, content-related interest, and their two-way and three-way interactions

Not surprisingly, significant main effects of time and content-related interest were evident for the affective valence (see Footnote 3 and Table 14). In addition, there was still a significant main effect of treatment (Fig. 6). The pairwise contrast analysis with Bonferroni correction confirmed the previously identified differences between the choose & match and no choose & no match treatments as well as between the no choose & match and no choose & no match treatments at each measurement, with small to medium effect sizes (Table 15). Students in the choose & no match treatment showed affective valence comparable to that of the other treatments at the first and second measurements. At the third measurement, their affective valence was below that of the choose & match (b = 0.42, t(345) = 3.02, p = 0.01, r = 0.16) and no choose & match treatments (b = 0.53, t(345) = 4.20, p < 0.001, r = 0.22). There were no differences from the no choose & no match treatment, b = −0.04, t(345) = −0.39, p = 0.99, r = 0.02. The significant cross-level time × treatment interaction on affective valence and the subsequent contrast analysis showed different trends for students in the no choose & match and choose & no match treatments from the first to the third measurement, b = −0.33, t(417) = −3.11, p = 0.036, r = 0.15. For the epistemic component of situational interest, effects were similar, although the differences were not significant at every measurement point (Table 16). Moreover, the time × treatment interaction was not significant.

Fig. 6

Estimated marginal means (95% CI) of affective valence of situational interest across duration of the contextual learning unit by adapted treatment (a) and estimated marginal means (95% CI) of epistemic component of situational interest across duration of the contextual learning unit by adapted treatment (b)

Discussion

In this experiment, we investigated the effects of choosing between authentic contexts on task-related satisfaction, situational interest, and cognitive load. In particular, we analyzed whether the congruence between individual characteristics and the different contexts plays a decisive role in this regard.

For this purpose, we trained a machine learning model that predicts the context choice from the individual characteristics of the students. The results show that the prediction made by the neural network does not fully correspond to the choice of context made by the students. There may be several reasons for this. Our neural network uses latent variables to predict students’ context choice. Measurement error can cause ML algorithms to poorly model relationships between latent variables and students’ contextual choices (Jacobucci & Grimm, 2020). Accordingly, the low reliability of the test for measuring prior knowledge in chemistry as well as of the scale for measuring the choice motive of personal relevance could also have led to underfitting of the true relationship. Furthermore, it must be questioned whether the selected variables (e.g., prior knowledge in chemistry, interest in chemistry) are sufficient to accurately predict students’ context choices. It is possible that, in addition to other personal variables, task-related or situational variables are also decisive here. Class imbalance and small sample size in the training set could also be relevant (Géron, 2020).

Regardless of the congruence of the context with personal characteristics or the possibility of choice, the affective valence and the epistemic component of situational interest decreased over the duration of the context-based learning unit. The decrease from the first to the second subtask is possibly due to increased perceived task difficulty or a novelty effect of the work in the context-oriented learning environment on the tablet computer (e.g., Palmer, 2009). We found no significant main effect of the processed context on the affective valence and epistemic component of situational interest, so it cannot be assumed that a context with a certain kind of characteristic generally leads to higher situational interest (see also van Vorst et al., 2018; Habig et al., 2018; Güth & van Vorst, 2023).

In our data, we find support for the hypothesis that the congruence between the characteristics of a context and the individual characteristics of the students is crucial for beneficial effects on situational interest and satisfaction. After each subtask, students who learned in a matching context reported higher affective valence and a higher epistemic component of situational interest than students who learned in a non-matching context. At the same time, there were no differences at any measurement point between students who were assigned a matching context and students who chose the context themselves. Task-related satisfaction also showed that learners would be more likely to work on the contextual task again if they had learned in a context that was congruent with their individual characteristics. However, it became evident that some students who were supposed to work with a matching context would not work on the context again. Consequently, some students from the match treatments worked in a non-matching context. Based on this result, we adapted the treatments. This resulted in a choose & no match group, which has already been called for in previous research (e.g., Flowerday & Shell, 2015; Wilde et al., 2018). The reasons for the low task-related satisfaction after choosing a contextualized task in this group could be the thematic narrowness of the tasks or a fundamental lack of interest in chemistry.

Based on the adapted treatments, we were able to confirm the results of the original treatments with the assignment of contexts by the neural network (and Hypotheses 1a, 1b, 2a, and 2b). The choice of a non-matching context seems to provide a short-term advantage for the affective valence of situational interest compared to the assignment of a non-matching context.

The results are consistent with previous research. Beneficial effects result only from congruence between characteristics of the context and characteristics of the learners. Provision of choice is irrelevant here (Flowerday & Shell, 2015; Patall, 2013; Wilde et al., 2018). When processing non-matching contextualized tasks, choice offers a short-term advantage in terms of the affective valence of situational interest. Thus, context choice is a short-term interest trigger (Høgheim & Reber, 2017) in the sense of a catch factor (see Mitchell, 1993). Accordingly, the congruence between the choice options and the individual characteristics of the student seems to be more decisive for satisfying the basic psychological need for autonomy than the opportunity to choose. Possibly, matching contextual tasks lead to higher personal authenticity (Shaffer & Resnick, 1999).

Regarding cognitive load, we found that both mental effort and perceived task difficulty increased with a large effect from the first to the second subtask. This marked increase in cognitive load is probably attributable to the shift from the macroscopic to the sub-microscopic level within the subtasks (Treagust et al., 2003). We found a main effect of the treatment on invested mental effort, but not on perceived task difficulty. Contrary to Hypothesis 3 and the results of S. Schneider et al. (2018), students who had to choose reported higher mental effort only at the second measurement. However, these findings are in line with cognitive load theory: offering choices leads to additional cognitive processes that increase cognitive load (Sweller et al., 1998). It is surprising that this difference appears only after the second subtask. Further research is needed to clarify whether this is a substantial effect or a statistical artefact.

Limitations and future directions

Our results are mainly limited by the performance of the neural network. The students’ context choice does not completely match the prediction by the neural network. We have taken this into account by restructuring the treatments using task-related satisfaction, but it is still important to consider how accurate the assignment of learners to a treatment is. Since comparable models have hardly been trained in research, it remains to be clarified what performance can be expected from a machine learning model using latent variables.

In this work, we assumed that, owing to the short duration of the intervention, the individual learning requirements do not change during the learning process, so that the matching or non-matching contextual task remains the same. Future studies covering a longer period of time should investigate to what extent changes in individual characteristics lead to a change in the matching or non-matching contextual task.

The generalizability of our findings is essentially limited by the narrowly defined topic and chemical content of our context-based tasks. It would be desirable to investigate whether the results can also be replicated with other tasks. Our results do not indicate whether the contexts used were differentially authentic for the learners. In an earlier study, we only examined whether the contexts corresponded to the assumed characteristics (everyday, uncommon, laboratory). It is unclear whether contexts with these characteristics are perceived as having different levels of authenticity. Since authentic learning is also characterized by collaborative work on inquiry-based tasks (Nachtigall et al., 2022), subsequent research should focus on such learning environments. The question that arises here is whether the results can also be replicated for inquiry-based tasks that are embedded in different contexts. It would also be relevant to investigate how congruence between context and students can be established in collaborative learning environments.

Within this study, we did not survey learners’ perceived autonomy. In future work, it would therefore be desirable to assess whether learners who are allowed to choose a context experience the same level of autonomy as learners who are given a matching context. Resulting effects on cognitive and motivational outcomes would be relevant.

Conclusions

This paper provides strong evidence that students should learn in authentic contexts that match their individual characteristics in order to improve their affective learning outcomes. To return to the initial question of the paper: the effects of choice depend on whether the context is congruent with the students’ requirements. If students learn in a context that fits their characteristics, it does not matter whether they have chosen the context themselves or not. Choice between incongruous contexts brings a short-term advantage in situational interest. However, offering choice seems to be an easily implementable instructional method to create congruence between students and contexts. This requires at least one context related to students’ goals, interests, or values. Teachers and scholars need to consider the individual characteristics of the class or population when providing contexts for authentic learning in chemistry education to enhance affective learning outcomes for students. The extent to which these results are also reflected in cognitive learning outcomes remains to be clarified.