Introduction

Non-formal learning

Formal learning in institutions like schools or universities is based on a curriculum that usually focuses on knowledge acquisition and ends with a certification of learning goals and qualifications (Aßmann, 2013; Leu, 2014). In contrast, non-formal learning is also goal-oriented but takes place outside the formal learning systems. Its content is not determined through a curriculum and the learning process does not require final certification (Rogers, 2014). Non-formal learning has a complementary and sometimes compensatory function to formal learning (Commission of the European Communities, 2001). One institution that can complement school-based formal learning is the out-of-school lab. As an institution especially addressing schoolchildren, it offers additional and different learning opportunities compared to schools (Glowinski & Bayrhuber, 2011; Haupt et al., 2013). In out-of-school labs, the contents of projects do not need to be part of a (school) curriculum (Euler, 2005) but can be a thematic continuation or deepening (Euler & Schüttler, 2020). Participation in the projects is optional and the topics are thus freely selectable. Another important characteristic of out-of-school labs is that, depending on the project, they aim to offer students the opportunity to act like real researchers and, therefore, to learn authentically (Betz, 2018; Glowinski & Bayrhuber, 2011; Sommer et al., 2018). According to the criteria of LernortLabor (2019) and Haupt et al. (2013), the Alfried Krupp-Schülerlabor der Wissenschaften, in which we conducted our study, functions as a classical out-of-school lab in our study design: students visit the lab with their whole class, and the topic of the project is an extension of the school curriculum in physics. Further, the project day was didactically preplanned, and the goal that students should reach by experimenting was set in advance.

Authenticity in learning situations

Working (authentically) in out-of-school labs means that students’ learning is inspired by researchers’ working approaches. Such inquiry learning involves using methods from the respective discipline (Betz, 2018; Glowinski & Bayrhuber, 2011; Haupt et al., 2013). In science, for example, this means “planning and conducting investigations, drawing conclusions, revising theories, and communicating results” (Lee & Songer, 2003, p. 923), but also working collaboratively and using typical materials of the respective discipline (Nachtigall et al., 2022). This is also in line with students’ expectations when visiting an out-of-school learning setting in the context of science lessons: students expect more time for experimentation than in school as well as working collaboratively (Nachtigall & Rummel, 2022). Further, they expect that the experiments are student-centered and that they can pursue their own ideas during the experimentation process (Garner & Eilks, 2015; Schwarzer & Parchmann, 2015). Additionally, students also expect a visit to an out-of-school lab to be authentic in the sense that it is similar to working like real researchers (Schwarzer & Parchmann, 2015).

Authentic learning, and along with it the perceived authenticity, has different advantages compared to school learning: It has a motivating effect on students (Betz, 2018; Glowinski & Bayrhuber, 2011; Nachtigall et al., 2018) as well as effects on students’ (situational) interest (Nachtigall & Rummel, 2021; Pawek, 2009), which are factors relevant for learning (Krapp, 1992), and especially important for making career decisions (Woods-McConney et al., 2013). It allows students to experience natural contexts addressing real problems, and thereby links the more abstract learning at school with the application of knowledge to the real world (Gerstenmaier & Mandl, 1995; Herrington & Oliver, 2000; Stamer et al., 2021). This should lead to the acquisition of scientific knowledge (Scharfenberg & Bogner, 2014). Further, through authentic learning situations, students can gain insights into the real work of scientists (Euler, 2005; Reimann et al., 2020; Scharfenberg & Bogner, 2014) and break down corresponding stereotypes, for example, that scientists are primarily male and wear a white coat (Christidou, 2011; Hagenkötter et al., 2021; Höttecke, 2001). This is important so that students can, for example, make informed decisions about their future careers (Euler, 2005; Scharfenberg & Bogner, 2014; Stamer et al., 2019, 2021) as well as to learn scientific reasoning for everyday life problems (Chinn & Malhotra, 2002; Mansfield & Reiss, 2020). Recent studies showed that students did not know about important aspects of scientists’ work, like interpreting data or collaboration between different researchers, and therefore, their judgments of authenticity in a learning situation were biased by their limited knowledge (Stamer et al., 2021).

Undoubtedly, there is a difference between researchers and students, for example regarding prior knowledge about the content and the inquiry process, but also regarding time and resource restrictions. Because of this, real research can be overwhelming for students (Lee & Songer, 2003). Therefore, it must be acknowledged that the authenticity of a learning situation can be classified on a continuum between a didacticized learning situation and the real world of scientists (Betz et al., 2016; Sommer et al., 2020).

Betz et al. (2016) suggested a model in which perceived authenticity in learning situations is shaped by personal and learning setting characteristics. Perceived authenticity, in turn, may influence, for example, students’ situational interest, concept development, and competencies. According to the model, aspects of the teaching situation such as location, method, content, material, innovation, and the instructor may influence perceived authenticity and can be shaped by the instructor, whereas personal characteristics of the students such as gender, prior knowledge, epistemic beliefs, and individual interest cannot be selected by the instructor. For example, a location could seem more authentic if it has been prepared primarily for scientific purposes. A method is considered authentic if it is part of the scientific methods used in the discipline (Finger et al., 2022; Sommer et al., 2020). In any case, the teaching situation needs to be evaluated regarding its authenticity against the background of the discipline. The model has already been tested in the context of out-of-school labs through various studies in different subjects (history: Mierwald, 2020; linguistics: Betz, 2018; social sciences: Nachtigall et al., 2018; for an overview, see Nachtigall & Rummel, 2021); however, empirical results on the assumptions postulated by the model were rather mixed. Betz (2018) conducted a study with the same tasks, but at two different places: in schools and in an out-of-school lab. When comparing the effects of the location on students’ perceived authenticity and their situational interest, the out-of-school lab was perceived as more authentic, and the situational interest was higher in the out-of-school lab. Nachtigall et al. (2018) as well as Nachtigall and Rummel (2021), for instance, varied the authenticity of the method from direct instruction to an approach closer to the work of (social) scientists, namely productive failure, and assessed students’ perceived authenticity as well as their situational interest. The productive failure approach simulated features of scientific inquiry: students worked in an initial phase on a complex problem without prior guidance and only later received canonical solutions to discover the limitations of their own ideas and to learn from their mistakes. In the direct instruction approach, typical solution strategies were presented to students before they worked on the problem. Contrary to the authors’ hypotheses, students in the more authentic productive failure approach did not report higher perceived authenticity than students who participated in the less authentic direct instruction approach. Additionally, Stamer et al. (2021) conducted a study in an out-of-school lab in which they compared two conditions: watching videos of authentic scientists working and afterwards conducting the experiments versus just conducting the experiments. They found that students who also watched the videos had a better idea of scientists’ activities. Concerning the material, Schüttler et al. (2021) conducted a study in physics that compared students’ perceived authenticity of low-cost versus high-end laboratory equipment. As expected, they found that students’ perceived authenticity was significantly higher when the material was equivalent to that of real researchers. As can be seen, there are different possibilities to foster perceived authenticity in out-of-school learning opportunities. The rather mixed results for the variation of the method led us to conduct a study with varying methods in physics.

Experimenting as a learning method in physics

Experimenting is established with different functions in physics classes (Euler et al., 2015; Lazonder & Harmsen, 2016). To foster learning, it is necessary that students work not only hands-on but also with their minds, following their own ideas (Hofstein & Lunetta, 2004). This can be achieved through inquiry-based learning by allowing students to investigate a phenomenon in a “more authentic [way]” (Hofstein & Lunetta, 2004, p. 30), which means similar to the way used by scientists. According to the definition of Lazonder and Harmsen (2016), inquiry is a method where:

students conduct experiments, make observations or collect information in order to infer the principles underlying a topic or domain. These investigations are governed by one or more research questions, either provided by the teacher or proposed by the students; adhere (loosely) to the stages outlined in the scientific method; and can be performed with […] tangible materials […]. (p. 682)

Nevertheless, it is debated whether inquiry-based learning supports effective learning of physical concepts (Hattie, 2009; Kirschner et al., 2006; Lazonder & Harmsen, 2016). In their meta-analysis, Lazonder and Harmsen (2016) conclude that guidance seems to be necessary for concept development through inquiry-based learning. However, students can be supported with different levels of guidance depending on their level of knowledge and experiences during the experimentation process (Bell et al., 2005). Bell et al. (2005), for example, distinguish in their model between four levels depending on the information about the research question, the method, and the solution that is given to the students. Further, Tesch and Duit (2004) found that in German physics lessons, the level of openness is rather low, as students have nearly no opportunity to plan or analyze an experiment on their own. Chinn and Malhotra (2002) found similar results when analyzing textbook tasks as well as tasks designed by researchers: students mostly do not follow their own research questions, often do not need to conduct several studies, or do not have the task to develop a theory based on their experimental results. This is in line with the findings of Börlin (2012) that students in German physics classes only seldom have the chance to decide which experiments to conduct or in which order they want to conduct the experiments during station learning. Thus, students cannot conduct an inquiry process directly without any instructional support, for example from teachers. Rather, they need more support, especially at the beginning, which can then be decreased (Bell et al., 2005). Further, Horstendahl et al. (2000) postulated that students with high competencies and high motivation can conduct experiments with fewer instructions, whereas students with low competencies and high motivation should receive more instructions to avoid overwhelming them. In line with this, Euler (2005) found that students with less interest in science need more guidance during an experimentation process. Overall, it must be considered that inquiry for school students is only relatively authentic compared to the work of scientists, as it is not close to authentic inquiry tasks (Chinn & Malhotra, 2002).

When using experimentation for concept acquisition, the predict-observe-explain approach (POE) by White and Gunstone (1992) is often suggested (Hofstein & Lunetta, 2004) as it specifies the question and the task. During POE, students first need to predict the outcome of an experiment, then conduct the experiment and observe it, and afterwards give explanations for the observed situation. In the last step, differences between predictions and observations may occur, and students are given the opportunity to discuss these discrepancies. As a result, POE helps students to investigate on their own, with the goal of working minds-on as well as hands-on (Hofstein & Lunetta, 2004; White & Gunstone, 1992).

The present study

In the present mixed methods study, we examined students’ perceived authenticity and their perception of authentic research by varying the intended authenticity of the method. Therefore, we used an experimentation process with two different levels of guidance in an out-of-school lab. The students investigated through an experimentation process the optical phenomenon of the so-called sun thalers (Schlichting, 1995). This refers to the round spots of light that appear on the ground under a canopy of leaves when sunlight passes through the holes. We decided to use this phenomenon as students have no prior knowledge about it because varying the light source is not part of the German school curriculum and, furthermore, because the experimental setup is simple and not dangerous. Students were asked to investigate this phenomenon in small groups: They were given four different light sources (single and multiple LED spots, one elongated and one extended bulb) and three differently shaped apertures (circular, triangular, and rhombic) to examine the pattern that occurs (Wosilait et al., 1998). We had two different intervention groups: Guided experimenting students received instruction on which five combinations of light source and aperture they should use in which order. The combinations considered the variable control strategy and the bridging strategy (Clement, 1993). The sequence of the combinations got increasingly cognitively demanding and closer to the real situation of sun thalers. For the sixth and final experiment, the light source and aperture were freely selectable for guided experimenting students. Self-determined working students needed to select the combinations of light source and aperture for all six experiments on their own since they were not given any specifications for this.

Research questions and hypotheses

Our research was led by three research questions and their related hypotheses: In Research Question 1, we aimed to explore the influence that the level of guidance has on students’ perceived authenticity. With regard to this question, we hypothesized the following:

H1.1: Self-determined experimenting students experience their learning as being more authentic than guided experimenters.

Following Betz et al. (2016) and Finger et al. (2022), we assumed that self-determined experimentation can be located closer to the work of researchers on the continuum between a didactic learning setting and the real professional world. Furthermore, we expected the following in particular:

H1.2: Self-determined experimenters have a higher perceived authenticity than guided experimenters, especially on the dimension “methods”.

The difference between the self-determined and the guided intervention was the method used during the experimentation process: the self-determined intervention was intended to be relatively more authentic because it involved fewer instructions and was thus closer to the real world of scientists. Further, as students expect student-centered experimentation processes in small groups in out-of-school learning settings (Garner & Eilks, 2015; Schwarzer & Parchmann, 2015), the self-determined intervention was closer to students’ expectations for out-of-school lab visits. Therefore, in addition to H1.1, we especially expected a difference regarding this component of perceived authenticity in favor of self-determined experimenters, as their experimentation process involved fewer instructions and was accordingly more open to students pursuing their own ideas.

In Research Question 2, we examined what reference students use in their assessment judging their perceived authenticity. The questionnaire for the assessment of perceived authenticity in science education by Finger et al. (2022), which is used in this study, is based on a specific understanding of authentic research, but there is no guarantee that students completing the questionnaire will have the same understanding of research in mind. Even when students are really working like researchers, it is probable that they do not recognize it, because they have limited knowledge about the work of researchers (Stamer et al., 2021). Previous studies have already shown that students have different ideas about science and scientists, for example, that scientists are mostly men in a laboratory, who can be role models for society, but their personality can also be eccentric and their actions immoral (for an overview see Christidou, 2011). Thus, we complemented the quantitative assessment of perceived authenticity with student interviews to explore the understanding of authentic research of students from our sample and to better understand their judgments. This analysis was explorative.

Finally, in Research Question 3, we investigated to what extent perceived authenticity predicts students’ learning outcomes. Thereby, we hypothesized the following:

H3.1: Students with higher perceived authenticity show higher learning outcomes than students with lower perceived authenticity.

According to the model by Betz et al. (2016), perceived authenticity has, among other things, an impact on knowledge acquisition, and therefore, with a higher perceived authenticity, the learning outcomes should be higher. Nevertheless, empirical findings are ambiguous in this regard: Scharfenberg et al. (2007) found significantly higher learning success for groups experimenting in a laboratory than for groups learning without experiments in school, a comparison that favors the laboratory in terms of both a more authentic location and a more authentic method. In contrast, Nachtigall et al. (2018) could not find a significant effect of perceived authenticity on the learning outcome.

Methods

Conceptualization of the mixed methods study

In order to gain a deeper understanding of students’ learning through authentic experimenting in a non-formal learning setting and their perceptions of authentic research, we conducted a mixed methods study (Johnson et al., 2007). We used a (convergent) parallel design for our study as both qualitative and quantitative methods are performed independently, but the respective data were collected (nearly) simultaneously and were combined in the interpretation of the results (Schoonenboom & Johnson, 2017; Teddlie & Tashakkori, 2010). Both sets of data were used to focus on the research questions from different perspectives.

Sample

Our sample consists of N = 142 seventh and eighth grade students (female = 63, male = 77, diverse = 2; \({M}_{{\text{Age}}}\) = 13.04, \({SD}_{{\text{Age}}}\) = 0.65) from six different classes of two German secondary schools. Teachers voluntarily registered for the project day based on their interests. The participating classes were randomly divided into small groups that were alternately assigned to one of the two interventions. In each class, one group per intervention was randomly selected to be videographed during the experimentation process and interviewed after the project day. For the quantitative data analysis, the sample was reduced to N = 139 valid cases, where each instrument was filled out. N = 69 students in 25 different groups experimented guided, while N = 70 students distributed over 23 different groups experimented self-determined. The interviews were voluntary and took place a few days after the visit to the out-of-school lab in the afternoon during the students’ free time. Thus, the sample for the qualitative content analysis is based on N = 24 students from 11 different groups.

Study design

In the following, the design of the intervention as well as the procedure will be presented.

Design of the intervention

In our study, students experimented either guided or self-determined during an unassisted experimentation process to explore the phenomenon of the sun thalers. To create a cognitive conflict, the students in both groups were first shown that behind a triangular aperture, there is a triangular light spot if it is illuminated with a punctual light source, but surprisingly, a circular light spot if an extended circular light source is used. Then, they were instructed to resolve this conflict in a series of experiments. Guided experimenting students received a prescribed sequence of the combinations of light sources (punctual light source, multiple (i.e., 28) LED spots, elongated light source, and one extended circular bulb) and apertures (circular, triangular, rhombic) for the first five of their six experiments. The sequence was based on the variable control strategy combined with the bridging strategy (Clement, 1993), closing the gap between the two initial experiments step by step. Only for the sixth experiment were students allowed to freely select the combination of light source(s) and aperture. In contrast, self-determined experimenters received the same materials but no instruction on when to use which light source or which aperture. Therefore, they needed to decide for all six of their experiments which material they wanted to use to investigate the phenomenon.
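To illustrate the size of the design space from which both groups drew their six experiments, the following minimal sketch enumerates all pairings of the four light sources and three apertures named above. The identifiers are hypothetical labels chosen for illustration; the prescribed five-step sequence of the guided group is not reproduced here because it is not specified in this section.

```python
from itertools import product

# Labels are illustrative stand-ins for the materials described in the text.
LIGHT_SOURCES = [
    "punctual light source",
    "multiple LED spots",
    "elongated light source",
    "extended circular bulb",
]
APERTURES = ["circular", "triangular", "rhombic"]


def all_combinations():
    """Return every (light source, aperture) pairing available to the students."""
    return list(product(LIGHT_SOURCES, APERTURES))


combos = all_combinations()
for light_source, aperture in combos:
    print(f"{light_source} + {aperture} aperture")
```

With four light sources and three apertures there are twelve possible pairings, so six experiments can cover at most half of them, which is why the choice and order of combinations matters for both interventions.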

When using the introduced authenticity model by Betz et al. (2016) for classifying our study design into their proposed learning setting characteristics, we varied the method. The self-determined intervention is intended to be relatively more authentic as students, like researchers, did not receive direct instructions about which materials to use and when. Therefore, self-determined experimenters are not able to follow a more “simple, algorithmic procedure” (Hodson, 1999), which is one of the myths about science that is often propagated in science classes (Hodson, 1999). It must be noted that the given materials (i.e., light sources and apertures) may not be perceived as particularly authentic materials by the students, since they are quite ordinary and (partly) familiar to the students (Braund & Reiss, 2006; Nachtigall et al., 2022). But they are not dangerous and thus can be used by students independently without strict supervision for experimenting in small groups. Working in small groups should contribute to the authenticity (Betz et al., 2016), because it can be regarded as more similar to researchers’ work mode (Chinn & Malhotra, 2002). However, the limited amount of time in the out-of-school lab project can be seen as a restriction compared to authentic research because there is a difference between an authentic timeframe and an educational timeframe for an inquiry process (Hod & Sagy, 2019), and in our study, the authentic timeframe was limited by the project day.

With regard to the other aspects of authenticity, i.e., location, instructor, and innovation, the following applied: students worked in a real laboratory in an out-of-school lab at a university as location. The instructor was a real researcher, even if her appearance did not correspond to typical students’ conceptions about scientists (Hagenkötter et al., 2021; Höttecke, 2001) as the instructor was younger and female and did not wear glasses. The context of the sun thalers is a real-life phenomenon, which can be observed in students’ everyday life. Investigating such an observable phenomenon through simplified model experiments in the laboratory is a common approach for inquiry tasks in classrooms, even when the simplification is a known limitation of model experiments (Chinn & Malhotra, 2002). However, the novelty value for science is low since the phenomenon has already been explored. Nevertheless, the phenomenon was unknown to the students and, therefore, could be an innovative topic for experimentation.

Overall, the self-determined intervention is intended to be relatively more authentic on the continuum between a didacticized learning setting and the world of real researchers (Betz et al., 2016; Hod & Sagy, 2019; Sommer et al., 2020), but it needs to be noted that it is still not authentic research in the eyes of scientists, as there are some restrictions (e.g., less authentic material due to its characteristics) for the project day as explained above.

Procedure

Each participating class completed the same project day, which lasted 4.5 h in total with an additional 1.5 h of break time. The day started with the pre-test, which was to be filled out on a computer. This was followed by a repetition of rectilinear light propagation, in which the light ray paths from the various light sources to be used in the experimentation process were considered. Further, we tried to induce a cognitive conflict through two demonstration experiments. First, we showed a punctual light source directed at a triangular-shaped aperture, which resulted in a triangular-shaped light spot, and afterwards, we used the extended circular light source, which produced a circular image for the same aperture. This image is formed because the triangular images of the many point-like light sources of the extended light source overlap in a circle (Schlichting, 1995). Since imaging with extended light sources is rarely considered in school, this image was not expected by students, causing surprise and a cognitive conflict. Thus, the students were challenged to resolve this conflict in their small groups with experiments following the question: What influence do the shape of the light source and aperture have on the image?

After the necessary experimental material had been distributed, all students were given 10 min to freely explore the materials and develop their first ideas. Next came a period of 64 to 87 min (\({M}_{\text{Length Exp. process}}\) = 80 min, \({SD}_{\text{Length Exp. process}}\) = 9 min) with more structured investigations in which students conducted six experiments according to the POE scheme in their small groups (White & Gunstone, 1992). Each participant was encouraged to write down his or her predictions, discoveries, and explanations in a prepared lab book.

Having finished the experiments, the students answered the post-test and the questionnaire about perceived authenticity on the computer. Finally, the results of the experimentation process were discussed with the whole class and applied to the phenomenon of the sun thalers. For organizational reasons, interviews with two randomly selected small groups of each class could not take place on the same day. For eight groups, the interview was conducted between 1 and 3 days after the project day, three groups had the interview 16 or 25 days later, and one group dropped out. Usually, the interviews were conducted via the video conferencing system Zoom. In one case, the interview took place at the school. The interviews lasted between 13:23 and 36:20 min.

Measures

Perceived authenticity

We measured students’ perceived authenticity with the questionnaire of Finger et al. (2022), which is based on the model of authenticity by Betz et al. (2016). The questionnaire allows a validated multidimensional assessment of perceived authenticity for out-of-school lab learning settings. The basis of the questionnaire is the understanding that authentic research in an out-of-school lab is when students experience the everyday life and working methods of researchers in their discipline as well as the complexity of their field of work (Finger et al., 2022). The questionnaire consists of 13 items (Cronbach’s α = 0.84) that reflect the four dimensions of authenticity: method (4 items; Cronbach’s α = 0.70; “I myself proceeded as one proceeds in research.”), location (3 items; Cronbach’s α = 0.82; “I worked at a place where research was being done.”), instructor (3 items; Cronbach’s α = 0.67; “I was assisted by a real researcher.”), and innovation (3 items; Cronbach’s α = 0.79; “I helped research answer an important question.”). Each item was assessed on a 5-point Likert scale from 1 = completely wrong to 5 = completely correct. Further, all items are located close to the reality of scientists on the authenticity continuum.
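For readers unfamiliar with the reliability coefficient reported above, the following minimal sketch shows how Cronbach’s α can be computed from item-level responses using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of sum scores). The function name and the toy responses are hypothetical and invented for illustration; they are not study data, and the original analyses were run in SPSS.

```python
from statistics import variance


def cronbach_alpha(items):
    """Compute Cronbach's alpha.

    items: list of k lists, one per item, each holding the scores of the
    same n respondents in the same order.
    """
    k = len(items)
    n = len(items[0])
    # Sum score of each respondent across all items.
    totals = [sum(item[i] for item in items) for i in range(n)]
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Invented toy responses on a 5-point scale (3 items, 6 respondents).
toy_items = [
    [4, 5, 3, 2, 4, 5],
    [4, 4, 3, 2, 5, 5],
    [3, 5, 2, 2, 4, 4],
]
print(round(cronbach_alpha(toy_items), 2))
```

Values closer to 1 indicate that the items of a scale vary together; the subscale values reported above (0.67 to 0.82) would be computed separately for the items of each dimension.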

Implicit subjective understanding of authentic research

We conducted group interviews with the small groups that were selected to be videographed. Each interview consisted of five parts regarding the experimentation process and students’ perception of it: We started the interviews with an open question about the way of experimenting, whether the students already had experience with it, and what was different compared to experimenting in school. Next, we asked students to describe and explain their specific procedure during the experimentation process. This was followed by questions about authentic research. In the fourth part, students told us about their knowledge gained through the experiments, and finally, we finished the interviews with students reporting their personal challenges. All interviews were semi-structured, and each part started with an open conversation prompt that was followed up by more specific questions if topics were not already covered in the talk (Brenner, 2006). This allowed us to strengthen the comparability between the different group interviews. Furthermore, students of that age are used to a certain level of guidance in discussions from school, which makes the situation more familiar (adapted from Grecu et al., 2022). In this paper, our focus for the analysis is on the part about what authentic research is for the participating students. This part of the interviews covered the students’ ideas of what research is and whether they think they conducted research during the project day. Through this, we wanted to analyze students’ implicit subjective understandings of authentic research (König & Volmer, 2020).

Prior and post knowledge test

Prior to and after the experimentation process, students answered a knowledge test about geometrical optics. In the pre-test, four different areas from geometrical optics were covered by 19 items: light propagation (5), shadow (5), aperture (5), and reflection (4) (items, partly adapted, from Teichrew & Erb, 2019; Haagen-Schützenhöfer & Hopf, 2013; Mavanga, 2001; McDermott & Shaffer, 2009; and self-constructed items). We used two-tier items: The first tier assessed students’ knowledge about the topic and the second tier assessed students’ conceptual reasoning. Based on a Rasch analysis and a classical discriminatory power analysis applied to the pre-test data, eight items had to be excluded. Thus, the pre-test results of the students are calculated based on 11 items (Cronbach’s α = 0.59).

To keep the test time short, only items related to aperture imaging were used in the post-test. For this purpose, four additional items were added to the five items of the pre-test. Due to the results of the Rasch analysis and the discriminatory power analysis, we had to exclude three items. Thus, all results related to the post-test are based on the remaining six items (Cronbach’s α = 0.56).

For each item, students could receive up to two points in our rating: one for the correct first tier and the second point if both tiers were answered correctly. If just the second tier was answered correctly, students did not receive a point. Since we dropped the items concerning light propagation in the post-test, we do not expect the repetition of rectilinear light propagation and light ray paths to have a direct impact on the post-test results.
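The scoring rule described above can be sketched as follows. This is a minimal illustration, assuming each answer is represented as a pair of booleans (first tier correct, second tier correct); the function names and the example answers are hypothetical.

```python
def score_item(tier1_correct: bool, tier2_correct: bool) -> int:
    """Score one two-tier item with 0, 1, or 2 points.

    One point for a correct first tier; the second point only if both
    tiers are correct. A correct second tier alone earns no point.
    """
    if not tier1_correct:
        return 0
    return 2 if tier2_correct else 1


def score_test(answers) -> int:
    """Sum the item scores over a list of (tier1, tier2) pairs."""
    return sum(score_item(tier1, tier2) for tier1, tier2 in answers)


# Invented answer pattern for the six remaining post-test items.
example_answers = [
    (True, True),
    (True, False),
    (False, True),
    (False, False),
    (True, True),
    (True, False),
]
print(score_test(example_answers))
```

Under this rule, the maximum score is twice the number of items, that is, 22 points for the 11 pre-test items and 12 points for the six post-test items.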

Results

For all calculations (except the preliminary analyses), the sample is based on N = 139 valid cases. The sample for the qualitative content analysis is based on N = 24 students from 11 small groups. Quantitative analyses were conducted with SPSS 28 and the Rasch analysis software Winsteps (version 4.5.4). For qualitative analyses, MAXQDA 2020 was used.

Before the more in-depth analyses, we calculated a χ2-test regarding gender as well as ANOVAs regarding students’ grades in physics, mathematics, and German and regarding prior knowledge to ensure that both intervention groups were comparable. It has to be noted that not all students wanted to share their grades (valid cases out of N = 139 students—physics: N = 105; mathematics: N = 133; German: N = 135).

The samples for the two interventions did not differ statistically significantly regarding gender (χ2 (2) = 2.63, p = .27, ϕ = 0.14). Neither differences in grades (physics, F(1,103) = 1.29, p = .26, η2 = 0.01, 95% CI [0.00; 0.08]; mathematics, F(1,131) = 1.59, p = .21, η2 = 0.01, 95% CI [0.00; 0.07]; German, F(1,133) < 1) nor in the pre-test (F(1,137) = 1.07, p = .30 (two-tailed), η2 = 0.01, 95% CI [0.00; 0.06]) were found between guided and self-determined experimenting groups.
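As a plausibility check, the reported effect sizes can be recovered from the test statistics. The sketch below assumes the textbook conversions for a one-way ANOVA (η2 = F·df1 / (F·df1 + df2)) and for a χ2-test (ϕ = √(χ2/N)); the function names are hypothetical and the calculation is not taken from the authors' analysis scripts.

```python
from math import sqrt


def eta_squared_from_f(f_value, df_effect, df_error):
    """Eta squared of a one-way ANOVA, recovered from F and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def phi_from_chi2(chi2, n):
    """Phi coefficient recovered from a chi-square statistic and the sample size."""
    return sqrt(chi2 / n)


# Values as reported in the text
print(round(eta_squared_from_f(1.29, 1, 103), 2))  # physics grades
print(round(eta_squared_from_f(1.59, 1, 131), 2))  # mathematics grades
print(round(eta_squared_from_f(1.07, 1, 137), 2))  # pre-test
print(round(phi_from_chi2(2.63, 139), 2))          # gender
```

All four values round to the effect sizes given above (0.01, 0.01, 0.01, and 0.14).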

What influence does the level of guidance have on students’ perceived authenticity (RQ1)?

First, descriptive statistics are presented in Table 1. Since not all dimensions were assessed with the same number of items in the FEWAW, the mean score is calculated for each dimension. This makes the dimensions comparable on the same scale from 1 (completely wrong) to 5 (completely correct).
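A minimal sketch of this per-dimension averaging follows. The item labels and the number of items per dimension are hypothetical, since the allocation of FEWAW items to dimensions is not listed here.

```python
from statistics import mean


def dimension_means(responses, items_by_dimension):
    """Average the 1-5 ratings of all items that belong to each dimension."""
    return {
        dimension: mean(responses[item] for item in items)
        for dimension, items in items_by_dimension.items()
    }


# Hypothetical ratings of one student and a hypothetical item allocation
ratings = {"m1": 4, "m2": 2, "l1": 5, "l2": 4, "l3": 3}
allocation = {"method": ["m1", "m2"], "location": ["l1", "l2", "l3"]}
print(dimension_means(ratings, allocation))
```

Because each dimension score is a mean rather than a sum, dimensions with two items and dimensions with three items end up on the same 1 to 5 scale.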

Table 1 Means, standard deviations, and one-way analyses of variance for the four dimensions of perceived authenticity

We then calculated a multivariate analysis of variance (MANOVA) with the four scales of the questionnaire as dependent variables and the intervention group as factor to compare students’ perceived authenticity based on their level of guidance. The MANOVA revealed no differences between guided and self-determined experimenting students on the combined dimensions of perceived authenticity (F(4, 134) = 0.90, p = .464, ηp2 = 0.026, Wilks’ Λ = 0.974) (H1.1). Furthermore, as can be seen in Table 1, even for the dimension method, where we expected a difference in perceived authenticity depending on the level of guidance, no significant difference was found (H1.2).
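The reported multivariate statistics are mutually consistent, which can be checked with a short worked calculation. It assumes the standard relationships for a design with two groups and four dependent variables, where s = min(4, 1) = 1, so that the F approximation is exact.

```latex
\[
\eta_p^2 = 1 - \Lambda^{1/s} = 1 - 0.974 = 0.026,
\qquad
F = \frac{1 - \Lambda}{\Lambda} \cdot \frac{df_2}{df_1}
  = \frac{0.026}{0.974} \cdot \frac{134}{4} \approx 0.89
\]
```

The small gap to the reported F = 0.90 is presumably due to Λ being rounded to three decimals.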

Accordingly, our hypotheses that self-determined experimenting students perceived the learning situation as more authentic (H1.1), especially with respect to the method (H1.2), cannot be confirmed. Guided experimenters even reported slightly higher scores on all scales, although the differences were minimal and not significant.

What reference do students use in their assessment judging their perceived authenticity (RQ2)?

For the qualitative analysis, we first created anonymous transcripts of the interviews according to the guidelines of Lamnek and Krell (2016). Afterwards, we conducted a qualitative content analysis (Kuckartz & Rädiker, 2022) with the goal of extracting students’ individual subjective understanding (König & Volmer, 2020) of research and clustering these statements into a category system that we developed deductively-inductively. We began by marking students’ statements that referred to what they understand by research and what they do not regard as such. These statements were then assigned to the categories of method, location, instructor, and innovation from the questionnaire of Finger et al. (2022). We also found statements that did not belong to any of the four dimensions; these were first marked as such and later assigned to our inductively developed categories. For this purpose, we elaborated on the implicit subjective understanding concealed behind each statement by examining the students’ expressed perception of research. Furthermore, similar statements were clustered into categories. The coding was done iteratively in multiple coding cycles by the first author, and the results were discussed in the team several times.

Finally, the created codes of students’ implicit subjective understanding of authentic research could be assigned to 13 categories (for an overview, see Table 2). These categories are presented in the following and are illustrated by short quotations from students’ statements in the interviews (Elo et al., 2014; Kuckartz, 2019; see Footnote 2). The quotations were translated from German and were smoothed out linguistically. We first consider the categories used in the FEWAW questionnaire.

Table 2 Overview of our category system
  1. Method

    The dimension method refers to statements about the guidelines during a research process. These ranged from researchers having no guidelines at all (“We had instructions. And researchers have these mostly I think probably not” (January 17, 2022, guided)) to researchers sticking to predefined processes (“[it was typical for research that] we got a chronological order of the experiments” (December 6, 2021, guided)).

  2. Location with specific rules

    For the second dimension location, we found students’ statements expressing the idea that researchers work in special professional laboratories (“we were sitting in a laboratory where actually other things are being researched” (February 7, 2022, self-determined)) where certain rules apply (“I find that is so typical for research that you have rules that you have to follow.” (December 7, 2021, guided)). Students were unanimous in this regard.

  3. Instructor

    Regarding the dimension instructor from the questionnaire, no references were found in the interview data.

  4. Innovation

    The category innovation includes statements that refer to the extent to which research or the subject of research and the resulting outcome must be something new. Thereby, students differentiated between something new for themselves (“so for us, I think it was already research, because we didn’t know what the outcome would be” (December 6, 2021, guided)) and for researchers (“that’s something that quite a few people have done before us and for me, that would be when you really find something new.” (December 13, 2021, guided)). Two statements also indicated that it is research even when contents are already explored if you get closer to the true answer (“that we have researched for ourselves, but we came closer to the correct result, so we could already guess the result a little bit, therefore I would say that we have already researched but not like real scientists.” (December 13, 2021, guided)).

    In addition to the four dimensions from the questionnaire, we extracted the following categories inductively from the material: 5. research interest, 6. topic, 7. scientific practices including the subcategory experimenting, 8. working mode, 9. time constraints, 10. material and devices, 11. collaboration, 12. hints, and 13. belonging to our data acquisition. The categories 7 to 12 are all part of the work of researchers, but we decided not to categorize them as subcategories of “method.” Statements in the category “method” were especially focused on the level of openness in guidelines during a research process, while statements in the categories 7 to 12 were independent from the different levels of guidance.

  5. Research interest

    While the category innovation includes statements about the degree of novelty, the category research interest includes statements about the goal, purpose, or motives of research and researchers, i.e., the interest in knowledge in general. Students in our sample understand researchers as people who ask themselves questions and find answers to these questions, for example by using experiments. This self-questioning is assumed to be intrinsically motivated, as can be seen in the following statement: “so you ask yourself all your life questions about it and then you just want to find out, so then you do research, and you also think not only about the possible things, but also about the things that are actually not possible and then you just investigate it.” (December 9, 2021, self-determined)

  6. Topic

    The category topic covers statements about researchers having different subjects and contents of interest in knowledge such as “[Researchers] work with carbon dioxide or so” (January 17, 2022, self-determined) and “Everything can be the content of research” (December 6, 2021, self-determined). These statements range from less to more specific ideas of topics.

  7. Scientific practices including the subcategory experimenting

    Statements about scientific practices include, for example, documenting or testing hypotheses, “[we] really researched, because everything that was done was documented. One has also, so at least WE have also documented in detail, so that one can derive something from it afterwards” (December 9, 2021, self-determined) or “we just proceeded according to such a scheme, always justified [our results], we have thought about why […] and just also have made sketches” (December 7, 2021, guided). Further, statements specifically about experimenting as a scientific practice were assigned to the subcategory experimenting. Examples from this category are “[Researchers] experiment” (December 13, 2021, self-determined) and “[It speaks for research that] we found out everything on our own with experiments.” (December 6, 2021, self-determined).

  8. Working mode

    In contrast, statements about the working mode emphasize behavior during the work, such as “[we did not work like researchers,] our concentration speaks against it, I would say.” (December 7, 2021, self-determined).

  9. Time constraints

    Students’ statements unanimously indicate that researchers have no time constraints, for example, “They don’t have such time constraints, of course, so I think they’re freer with time” (December 7, 2021, guided) and “[I do not think it is typical that] we always had to wait for the others until they had finished their experiment and we always only had a given time how long we [were allowed to] need” (February 7, 2022, self-determined).

  10. Material and devices

    Statements in the category material and devices refer to the subjective understanding of the equipment used by researchers such as “[Research is] with things that you don’t have in your everyday household, that you need certain material for it” (January 17, 2022, self-determined). Students were in agreement that research equipment is not commonplace but can be used in other research projects as well.

  11. Collaboration

    Students think that researchers are working together in groups (collaboration) as was the case during the out-of-school project day. An example is “[…] in research to what I have noticed so far, often [one] researches among other people, that you discuss with, and that was for me just quite like research on that day, that you did something in a group” (December 6, 2021, self-determined).

  12. Hints

    However, the interviewed students have the understanding that researchers (unlike in the out-of-school project) do not receive feedback or advice from outside of their research group (hints): “Not typical for research was that you got support […] if you had questions, they were answered” (December 9, 2021, self-determined).

  13. Belonging to our data acquisition

    Statements related to our testing such as “[that we needed to answer] a knowledge test on a computer” (February 7, 2022, self-determined) were categorized as belonging to our data acquisition.

To what extent does perceived authenticity predict students’ learning outcomes (RQ3)?

Descriptive results of the learning outcome in the form of the post-test results can be seen in Table 3. As the ANOVA shows, there is no significant difference in the learning outcome depending on the received level of guidance.

Table 3 Means, standard deviations, and one-way analysis of variance for post-test results

A mediation analysis was conducted to analyze whether learners’ perceived authenticity of the method (M) mediates the effect of the intervention (X) on the post-test results (Y) (H3.1). Considering the bootstrap confidence interval (see Fig. 1), the mediation analysis revealed no indirect effects of the intervention on the post-test results. Further, the mediation analysis demonstrated that students who perceived the method as more authentic achieved better post-test results (see Fig. 1).

Fig. 1 Mediation model with post-test results as outcome variable

Discussion

Authentic learning situations allow students to work like real researchers and apply their knowledge to the real world, which is usually not possible through more abstract learning in school (Gerstenmaier & Mandl, 1995; Herrington & Oliver, 2000). This has, for example, motivating effects as well as effects on conceptual development and competencies (Betz et al., 2016; Glowinski & Bayrhuber, 2011). But a learning situation intended to be more authentic is not always subjectively perceived as more authentic by the students (Gulikers et al., 2008; Nachtigall et al., 2018), and therefore, the expected effects could not always be measured (Nachtigall et al., 2018). Consequently, subjectively perceived authenticity must be taken into account when analyzing effects on, for example, conceptual development. In this mixed methods study, we compared the effects of different levels of guidance on perceived authenticity (RQ1). In addition, the understanding of authentic research that students expressed in the interviews was analyzed (RQ2). Finally, we investigated whether students with a higher perceived authenticity have a higher learning outcome than students with a lower perceived authenticity (RQ3).

For RQ1, we did not find a difference in the perceived authenticity depending on the level of guidance (H1.1). This applies both to the summarized perceived authenticity and to the single dimensions method, location, instructor, and innovation. Although this result stands in contrast to our hypotheses (H1.1, overall perceived authenticity; H1.2, perceived authenticity of the method), it is consistent with the results of Nachtigall and Rummel (2021) for learning activities in an out-of-school lab for social sciences in which the authenticity level of learning activities likewise did not affect students’ perceived authenticity. These findings could be explained by the proposal of Nachtigall (2019) that more prominent factors may overshadow the other dimensions of an authentic learning situation. In our case, this would refer to the location, which was perceived as most authentic by students. Following Hod and Sagy (2019), the timeframe has an impact on the authenticity of a learning situation. In our study, there were two different timeframes: Overall, our project day lasted from 9 am to 3 pm and students worked all day on the same topic. This is closer to a normal working day and could be in favor of the perceived authenticity. On the other hand, the time for experimenting was limited for each experiment, and thus, this could lower the perceived authenticity. Therefore, it is possible that the timeframe had a limiting influence on students’ perceived authenticity.

Considering RQ2, it can be assumed that students have a different perception of research and how researchers work than the questionnaire of Finger et al. (2022) expects. For instance, the item “I myself proceeded as one proceeds in research.” does not specify how one proceeds in research, and therefore, students needed to use their own understanding of authentic research for answering this item; as can be seen in the qualitative analysis, students had different understandings of an authentic method. Gulikers et al. (2008) found similar results: When students and teachers were asked about their perception of authenticity of the same learning situation, students rated the authenticity significantly lower than teachers. Further, the overview of Christidou (2011) is in line with our ambiguous results as in her meta-analysis, students had different images of science as well as of scientists. When interpreting the results of the questionnaire, it must be considered that students may have different understandings of research. Consequently, the results cannot simply be aggregated by intervention and compared with each other when we cannot guarantee that students share the same understanding of research. Hence, the test instrument should be supplemented with a request that students explain their understanding of research so that the results can be interpreted properly. Thus, the mixed methods approach was beneficial for gaining a deeper understanding of students’ responses and their interpretation of authentic research.

Regarding the learning outcome, we did not find a difference depending on the intervention. Further, when analyzing the effect of the perceived authenticity of the method on the learning outcome (RQ3), the mediation analysis demonstrated a significant, positive effect of the perceived authenticity of the method on the post-test results. This result suggests that the mediator variable exerted a higher effect on the post-test results than the intervention during the experimentation process. As there was no mediation effect of the perceived authenticity of the method, mediation through other variables might better explain our results. Such variables could be personal characteristics such as situational interest or motivation, which were not assessed in this study. Both may be improved through authentic learning situations (Betz et al., 2016; Itzek-Greulich et al., 2017; Schüttler et al., 2021) and correlate with perceived authenticity (Nachtigall & Rummel, 2021). It was shown previously that self-determined experimenting had a positive effect on students’ intrinsic motivation, which itself often leads to a better learning outcome (Euler, 2005).

Surprisingly, however, the different interventions themselves did not have an impact on the learning outcome. This result might be attributable to three possible reasons: (a) a lack of prior knowledge needed to produce learning-relevant situations during self-determined experimenting, (b) a lack of preparation and post-processing of the project day, and (c) measures that did not capture the differences in authenticity as well as in the learning outcome. With regard to students’ lack of prior knowledge (a), we observed that the sequence of experiments of self-determined experimenting groups was mostly not based on the variable control strategy for choosing the next combination of light source and aperture. When counting how many self-determined experimenting groups selected experiments based on variable control strategies, just nine out of 23 groups were able to control the variables for three successive experiments. No group based at least four of the six experiments on the variable control strategy. Thus, it is possible that self-determined experimenters did not produce learning-relevant situations in which they could have better investigated the phenomenon of sun thalers. This could be explained by students’ missing prior experiences from school learning regarding conducting their own experiments. Out of the 70 self-determined experimenters, 54 reported that they (really) seldom conduct experiments in school, which may have been exacerbated by the restrictions during the corona pandemic. But with the rather limited scope for action due to the limited possibilities for combining light source and aperture as well as the restriction to six experiments, it seems to make no difference for the learning outcome whether the guided experimenters investigated the phenomenon systematically or the self-determined experimenters unsystematically. Future studies could therefore examine whether the intervention has a significant impact on the perception of the authenticity as well as on the learning outcome in a more complex setting with a correspondingly greater scope for action. In more complex learning settings, students may, in fact, require a less authentic but more supportive approach, particularly if it offers them the necessary structure to facilitate learning. Repeated visits might be beneficial, especially for self-determined experimentation, as it is then possible to learn how to systematically create situations that are relevant to learning (Zehren et al., 2013).

Regarding the lack of preparation or post-processing of the project day in school (b), we did not offer teachers such material. Previous studies have found that the effects of the out-of-school lab are more sustainable when students received materials and time at school for preparing as well as post-processing the project day (Euler, 2005; Guderian, 2007; Reimann et al., 2020). For example, Reimann et al. (2020) found a stabilization in knowledge for students with preparation as well as follow-up phases, while the knowledge of students without preparation or follow-up at school decreased. Garner and Eilks (2015) emphasized the relevance of integration into the school “as essential elements for the success of out-of-school learning” (p. 1199) because students are then less likely to perceive the experiences as unrelated events. However, even if student laboratory projects are integrated in such a way, positive effects do not automatically occur and may even fail to appear (Schwarzer, 2020).

Finally, we cannot guarantee that our measures were able to capture differences between the self-determined and the guided experimenting students regarding their perceived authenticity as well as their learning outcome (c) as the difference between both interventions was rather small. It has to be noted that it is possible that there were differences between both interventions, for example, in regard to students’ perceived autonomy as self-determined students were allowed to make more decisions than guided students, which we did not measure.

Overall, the results of this study show little impact of the method on the perceived authenticity and of the perceived authenticity on the learning outcome. Our study suggests that one reason might be that students do not really know what it means to work like a researcher. This is in line with the findings of Stamer et al. (2021). This means that there is a need to improve students’ understanding of authentic science to achieve the goals of authentic learning. Namely, these goals are effects on motivation (Betz, 2018; Glowinski & Bayrhuber, 2011; Nachtigall et al., 2018) as well as (situational) interest in science (Nachtigall & Rummel, 2021; Pawek, 2009), informed (career) decisions (Euler, 2005; Scharfenberg & Bogner, 2014; Stamer et al., 2019, 2021; Woods-McConney et al., 2013), acquisition of scientific knowledge (Scharfenberg & Bogner, 2014), and scientific reasoning (Chinn & Malhotra, 2002; Mansfield & Reiss, 2020).

Limitations

The present study has some important limitations. Firstly, our knowledge tests each have rather low reliability. Even after item exclusion by reliability and Rasch analyses, reliabilities are Cronbach’s α = 0.59 (pre-test) and Cronbach’s α = 0.56 (post-test). This means that our knowledge tests did not consistently measure students’ knowledge about geometrical optics in general (pre-test) and aperture imaging in particular (post-test) (Boone et al., 2014). The low reliability in both tests can be explained by the response behavior of students who do not have a stable concept. The problem of a rather low reliability regarding knowledge tests in this age group is not untypical (see for example the studies of Chu et al. (2009) and Härtig et al. (2019)). Another limitation is that according to the model by Betz et al. (2016), the perceived authenticity is influenced by personal characteristics such as interest, which we did not assess. Therefore, we cannot analyze our sample regarding personal characteristics as predictors for perceived authenticity. Further, as already discussed, students’ perceived authenticity may vary and even be lower than the perception of teachers of the same situation (Gulikers et al., 2008). Students’ perception of authenticity was not assessed in advance, but only at the end of the project day through the questionnaire and after the project day through the interviews. Finally, the students in our sample are only from two different schools. Thus, the generalizability of the results may be limited.

Implications for further research

Our study has several implications for future research. First of all, due to the positive effects of preparation as well as follow-up of student lab projects on interest, motivation, and knowledge development (Euler, 2005; Guderian, 2007; Reimann et al., 2020), both should be integrated. One possibility would be to prepare a lesson, including materials, for schools in which basic knowledge like light propagation is repeated so that students are more familiar with it. For the follow-up, students could discuss open questions with their teachers or the instructors of the project day. Using the videoconference system Zoom for this could make it even easier, as research has shown (Yonai et al., 2022).

Additionally, the interaction between the different dimensions of authenticity (location, method, instructor, and innovation) should be examined. In the present study as well as in the studies by Nachtigall and Rummel (2021) and Nachtigall et al. (2018), the perceived authenticity of the location may have overshadowed the perception of the learning method and have thus led to nonsignificant results. It remains to be investigated whether the method, as a conceptual aspect during a research process, is perceived as more authentic by the students when the location, as a physically tangible aspect during a research process, is (perceived) less authentic, for example, because the project day takes place in the school and not in an out-of-school lab.

Further, as our interview data showed, students had different understandings of research and did not use the same understanding as a basis for answering the questionnaire. Another implication for further research could be to explain to students explicitly what research is and which domain-specific methods are used, so that they have a comparable idea of research.

In addition, it would be interesting to identify different types that cluster students with similar understandings of research. For this purpose, a larger sample than our qualitative sample may be needed, as well as a selection of the developed categories that students should respond to for easier comparability, either during an interview or through a questionnaire. The questionnaire could be created based on our results. Further, we conducted a qualitative content analysis of students’ subjective understanding of authentic research due to the scope of our data. Another possibility would be a more reconstructive way of analyzing students’ understanding, as this would allow researchers to gain a deeper insight.

Finally, we did not examine whether the different groups had a productive, collaborative experimentation experience with relevant learning situations. Accordingly, we also did not analyze the effect of group work on individual perceptions of authenticity. This would be interesting because different group constellations could influence the learning experience, for example, when one group member takes the role of the teacher and guides the remaining group members. This could have a negative impact on the perceived authenticity of the method, as the method would become less open.