Introduction

Implementing digital technologies in teaching can increase the quality of instruction by enabling novel learning processes, such as collaborative forms of learning, adaptive learning tasks or subject-specific applications, including simulations or digital measurement (Nerdel & von Kotzebue, 2020). However, meta-analyses show that implementing digital technologies does not guarantee high-quality teaching (e.g. Baker et al., 2018; Kates et al., 2018). As the OECD (2015, p. 4) aptly summarises: “Technology can amplify great teaching but great technology cannot replace poor teaching.” Consequently, teaching quality depends on how well digital media and technologies are implemented and on the resulting didactic added value (Backfisch et al., 2021). Research on teachers’ professional competences assumes that teachers’ objective professional knowledge (especially pedagogical content knowledge, PCK) and their motivational beliefs are central factors influencing the quality of instruction (e.g. Förtsch et al., 2016; Kunter et al., 2013). Teachers’ professional knowledge of digital technology use is typically assessed through self-report measures. Although economical, these instruments show limited validity in assessing complex and integrated knowledge about using digital technologies in teaching (e.g. Akyuz, 2018; Scherer et al., 2017). Self-report measures of TPACK (technological pedagogical content knowledge) seem instead to assess teachers’ current confidence in integrating technology into instruction and are thus likely to be an indicator of technology-related self-efficacy beliefs (e.g. Backfisch et al., 2020; Lachner et al., 2019; Willermark, 2018). In addition to self-efficacy beliefs and personal attitudes, motivational beliefs are significant predictors of both the intention to implement technology and the frequency of its use in the classroom (Farjon et al., 2019).

However, the quality of these implementations cannot yet be predicted, because little research has been done on technology-enhanced teaching (Backfisch et al., 2021). Studies show that self-assessments of TPACK reflect teachers’ actual knowledge and actions only to a small extent (Drummond & Sweeney, 2017; von Kotzebue, 2022). On the other hand, Backfisch et al. (2020) showed that utility beliefs do predict the instructional quality and degree of innovation of lessons described by teachers. While there are already many studies on subject-specific PCK (e.g. Förtsch et al., 2017; Park & Chen, 2012; van Driel et al., 1998), few studies have examined performance-assessed TPACK and the quality of instruction. Förtsch et al. (2017) showed that biology-specific PCK measured through performance assessment influences the instructional quality of biology lessons, which in turn affects student achievement and situational interest. PCK research in biology education has identified this connection with various foci, such as the use of models, technical terms or cognitive activation (e.g. Dorfner et al., 2020; Förtsch et al., 2017, 2018). Since TPACK is an extension or specification of PCK, TPACK might also be connected to higher teaching quality, student performance and interest.

Since previous studies mostly rely on self-reported rather than performance-assessed TPACK, there is still little empirical evidence on the cognitive structure of TPACK (e.g. Archambault & Barnett, 2010; Lin et al., 2013; Scherer et al., 2017). Exceptions that focus on performance-assessed TPACK include Akyuz (2018) and Lachner et al. (2021). There are also very few biology-specific TPACK instruments (a self-report measure by Mahler & Arnold, 2022, and a performance assessment by von Kotzebue, 2022), so studies providing more detailed information about TPACK in this field are needed. Besides the low number of studies analysing the quality of technology-enhanced instruction (planning) and the use of technology, the different dimensions (knowledge, beliefs, quality of instruction) are mostly examined in isolation, so that no correlations between them can be reported (e.g. Schmid et al., 2021). A further point of criticism is that subject-specific test instruments are hardly ever used. For example, lesson plans from different subjects are evaluated together, and statements about TPACK are then derived from them. However, TPACK is by definition subject-specific, which is why lesson plans should be categorised by subject and evaluated separately (Schmid et al., 2021). The present study aims to contribute to closing these research gaps. To this end, biology-specific TPACK (self-reported and performance-assessed) and beliefs are assessed together, and the influence of these constructs on the quality of lesson plans on the topic of honey bees prepared by prospective biology teachers is analysed.

For the subject of biology, the potential added value lies, on the one hand, in the fact that functional structures and complex processes that are invisible to the eye, some of them because they are too small, too fast or too slow, become comprehensible with the help of images or simulations (e.g. Schwanewedel et al., 2018). On the other hand, digital tools such as sensors, hardware and software can significantly support or even enable the subject-specific ways of working in biology: research, measurement/data acquisition, data processing and presentation, and simulation/modelling (e.g. Becker et al., 2020; Nerdel, 2017). This technology-enabled way of working also reflects the everyday work of biologists, who are mostly supported by digital tools. Both in the working world and in science teaching, the aim is not to replace previous ways of working but to support them (Schaal et al., 2013; Schwanewedel et al., 2018).

Theoretical Background

TPACK is the most prominent and frequently used framework for describing teachers’ professional knowledge for effectively integrating educational technology into the classroom (Koehler et al., 2014). Extending Shulman’s (1986) model with technological knowledge (TK) resulted in the TPACK framework, which consists of seven components in total (e.g. Koehler et al., 2014). The TPACK component (technological pedagogical content knowledge) lies at the intersection of the three basic components (CK, PK, TK). Consequently, this component refers to content-specific teaching strategies using educational technology (Koehler et al., 2014). It encompasses several areas, such as specifically combining content, pedagogy and technology to present subject matter in different ways. Varying the presentation of subject matter, in turn, helps students with different backgrounds understand it and facilitates learning (Pamuk et al., 2015; von Kotzebue, 2022). There are two different perspectives on how the components of the TPACK model are related and how they develop (e.g. Schmid et al., 2021): an integrative view and a transformative view. The integrative view assumes that the TPACK component results from the integration of the other six components (CK, PK, TK, PCK, TCK, TPK) and is thus related to each of them (Schmid et al., 2021; von Kotzebue, 2022). The transformative view assumes that the TPACK component is directly affected by TPK, TCK and PCK, but not by the basic components TK, PK and CK (Schmid et al., 2021; von Kotzebue, 2022). In addition, the overlap of knowledge components is assumed to be more than the sum of its parts, so that the TPACK component is an independent form of knowledge that goes beyond the underlying components (see also Schmid et al., 2021). Consequently, the TPACK component is considered the central component for successful technology integration in subject-specific teaching (e.g. Lachner et al., 2021). The present study adopts the transformative view of TPACK. Owing to the relevance of the TPACK component, and to keep the models as parsimonious as possible, it focuses on the TPACK component only, but measures it with different methods.

TPACK Assessment Tools and Beliefs Towards Learning with Digital Technologies

The TPACK components can be assessed using different instruments (e.g. Koehler et al., 2014), with self-report measures currently being the most commonly used method. This method is very economical in terms of test design, and test administration and evaluation are comparatively quick and cost-efficient too (Scherer et al., 2018). However, criticism of self-report measures is increasing, especially concerning the assessment of knowledge (e.g. Willermark, 2018). Nevertheless, this method enables the measurement of self-efficacy beliefs and self-confidence in media-related teaching skills (Scherer et al., 2017, 2018). The constructs captured by self-report measures are therefore important predictors of the intention to use and the frequency of use of digital media in the classroom (e.g. Farjon et al., 2019; Scherer et al., 2019), because self-efficacy beliefs influence the acceptance of educational technologies (see Lachner et al., 2019). In order to assess and promote professional knowledge components, objective instruments that directly assess knowledge, such as performance assessments, are needed (e.g. Lachner et al., 2019). The development and testing of these instruments are still in their early stages (Lachner et al., 2021). Although such tasks assess knowledge directly, only a small sub-area can be covered because their design, administration and evaluation are time-consuming. So far, hardly any explicitly subject-specific instruments have been used in this context (e.g. Akyuz, 2018), although subject specificity is a core aspect of the TPACK model. In order to assess teachers’ TPACK, it is necessary to examine what they can actually do with technology in their respective subjects (Voogt et al., 2013).

In addition to TPACK, beliefs about learning with digital technologies in the classroom can be assessed. According to the well-known technology acceptance model (TAM; Davis, 1989), the behavioural intention to use media (for teaching) depends on attitudes towards technology, with perceived ease of use and perceived usefulness as the factors influencing technology use. Studies show that almost 40% of the variance in teachers’ usage intentions can be explained by TAM variables (Scherer & Teo, 2019). Perceived usefulness of digital media in the classroom and self-efficacy beliefs regarding digital media use largely predict intentions to use digital media in teaching (Scherer et al., 2019). In addition, teachers’ motivational conditions are considered an essential prerequisite for the successful implementation of digital media (Scherer & Teo, 2019). In particular, a positive attitude towards using digital media is considered a crucial determinant of effective media use (Backfisch et al., 2021).

Triangulating different test instruments is advisable when examining TPACK. First, the quality of self-assessments depends on a person’s ability to assess their own knowledge appropriately, so such measures may primarily capture self-confidence. Second, determining TPACK via performance assessment is still relatively new (e.g. Voogt et al., 2013). To investigate more closely how these instruments are related to instruction or instructionally relevant actions, lesson plans can additionally be analysed. However, previous studies that include lesson plans or observations of actual lessons rarely relate them to self-reported TPACK (e.g. Harris & Hofer, 2011; Ocak & Baran, 2019; Valtonen et al., 2020).

Instructional Quality and SAMR Model

PCK is regarded as a central prerequisite for successful lesson planning (e.g. Gess-Newsome, 1999); hence, it can be assumed that TPACK plays an equally important role in planning technology-enhanced subject lessons. Nevertheless, there has been comparatively little empirical research on instructional planning (e.g. Krüger & Großmann, 2020), and both an operationalisation of this competence and general quality criteria are lacking. However, there are recurring aspects that are characteristic of good teaching. The present study draws on six characteristics or criteria that are frequently found in the literature on instructional quality; in selecting them, care was taken to ensure that they can be assessed in lesson plans. The first criterion is the cognitive level at which students are required to work. Cognitive demands are an important quality criterion of instruction (e.g. Förtsch et al., 2017). Since students’ cognitive processes are not directly observable, the tasks given to them in class are often analysed instead (Stein & Lane, 1996). Cognitively demanding tasks lead to better processing of content (e.g. Lipowsky et al., 2009). When tasks are analysed, they are categorised according to the level of cognitive effort students must exert to complete them successfully (Stein et al., 2009), for instance, reproduction of knowledge.

The second quality criterion of instruction is student proximity, that is, the relevance and relatability of the content to students’ everyday lives. Using authentic problems promotes in-depth and applicable knowledge, whereas using highly abstract examples supports neither (Schwan, 2005). Moreover, factors such as authenticity and information about the general and practical relevance of the content can increase intrinsic motivation (Lewalter & Geyer, 2005).

Another quality criterion is whether and to what extent students can work in a self-determined and creative way during lessons. This criterion is based on the basic need for autonomy, which is defined as a person’s natural aspiration to experience themselves as an independent “center of action” (Krapp, 1998). Students rarely get the opportunity to use and work with digital technologies on their own in class (e.g. Kramer et al., 2019). However, considering the potential of digital media for biology lessons, for example (see the “Introduction” section), the specific added value lies in the active use of simulations or subject-specific tools; thus, self-activity is also a quality criterion. Digital technologies have the potential to enhance learning through active learning approaches (Tamim et al., 2011). Moreover, student activity can be categorised in more detail using the ICAP model (Chi & Wylie, 2014), which distinguishes four levels of (visible) learning activity in ascending order of quality: passive, active, constructive and interactive. Kramer et al. (2019) showed that digital media are primarily used in biology classes to present subject content; hence, the teacher is active while students are passive.

In addition to these five rather general quality criteria, the sixth criterion relates specifically to technology-enhanced instruction. When planning lessons, it is important to decide whether to use digital media and which media to use. Determining whether learning objectives can be pursued better through technology is therefore an integral part of this decision-making process. For instance, Puentedura’s (2014) SAMR model describes four levels at which digital media use can be located: substitution, augmentation, modification and redefinition. If digital media merely substitute conventional media without any functional improvement, they can generally be dispensed with.

Aims and Research Questions

The current study examines the impact of biology teacher students’ TPACK (self-assessed and performance-assessed) and their beliefs about learning with digital technologies in the classroom on their ability to integrate technology into high-quality lesson plans. Since research in this area is still at a very early stage and biology-specific test instruments are largely missing, most of the instruments used in this study were newly developed and tested. Moreover, to increase validity, different test formats were administered together so that their relationships with concrete lesson planning could be analysed comparatively.

Therefore, the current study focused on the following research aims:

  • (A1) Objective assessment and description of the quality of technology-based lesson planning by means of theory-based criteria.

  • (A2) Analysis of the relationships between study-related (general) factors and self-reported/performance assessed TPACK, and beliefs and their influence on the quality of the lesson plans. In doing so, the following research questions arise:

    • RQ1. What are the correlations between study-related factors (number of semesters, teaching experience), self-reported and performance-assessed TPACK, beliefs about learning with digital technologies in the classroom, and the quality of lesson plans with technology integration (instructional quality and SAMR)?

    • RQ2. What impact do self-reported TPACK, performance-assessed TPACK and beliefs about learning with digital technologies in the classroom have on the quality of biology-specific lesson plans with technology integration?

  • (A3) Comparative analysis of lesson plans that scored particularly well on technological and general instructional quality characteristics and lesson plans that scored particularly poorly.

Methods

Sample

Eighty-two biology teacher students at an Austrian university participated in this study (64 female, 18 male). Their mean age was 24.21 years (SD = 4.64). On average, they were in their 8th semester of the biology teacher training programme (M = 8.25; SD = 2.75). All participants studied at least one, and in rare cases two, other teaching subjects in addition to biology. Through school internships or employment at schools alongside the master’s programme, Austrian teacher students already have some teaching experience. Participants had an average of 50 h of teaching experience, although the number of hours already taught varied greatly (M = 50.18; SD = 164.74). Data were collected over three semesters from biology teacher students attending a biology education course. As part of their coursework, participants were required to create a lesson plan on a given topic. In addition, the pre-service teachers voluntarily completed a questionnaire assessing their demographic data and their TPACK via a self-report measure and a performance assessment. They completed the questionnaire before creating the lesson plan. Participants were required to read and agree to an informed consent form before participating in the study. Participation was voluntary and anonymous.

Materials

Questionnaire

The TPACK self-report measure consists of five items based on the TPACK scale by Schmidt et al. (2009), which were extended and reformulated by the authors to fit the context of biology lessons (based on Mahler & Arnold, 2022). An example item is “I can use technical media to visualise biological processes for my students”. All items were rated on a 5-point Likert scale ranging from “strongly disagree” to “strongly agree”. Cronbach’s α is .84, suggesting good internal consistency.
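For readers less familiar with the reliability coefficient reported for the questionnaire scales, Cronbach’s α for k items is α = k/(k − 1) × (1 − Σ s²ᵢ / s²ₓ), where s²ᵢ is the variance of item i and s²ₓ the variance of the sum score. The following minimal Python sketch computes it from a respondents × items matrix. The function name and the ratings are hypothetical illustrations, not the study’s data or analysis code.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix (rows = respondents)."""
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the respondents' sum scores
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert ratings: 4 respondents x 3 items
ratings = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
]
print(round(cronbach_alpha(ratings), 2))  # 0.97
```

Because α grows when items covary strongly relative to their individual variances, heterogeneous tasks (as in the performance assessment) tend to yield lower values than a homogeneous rating scale.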

The TPACK performance assessment was newly developed following established PCK instruments (e.g. Jüttner et al., 2013; von Kotzebue & Nerdel, 2015) and consists primarily of open-ended tasks on honey bees for secondary education. All four items relate to requirements of lesson planning with digital technologies (Sailer et al., 2021b; von Kotzebue et al., 2020) and focus primarily on the competence areas data acquisition, data processing and presentation of the DiKoLAN framework (von Kotzebue et al., 2021). Each item begins with a teaching scenario at a certain grade level in which the situation and the biology-specific and technology-related tasks are briefly presented. Owing to the heterogeneity of the tasks, Cronbach’s α is only .64, which is nevertheless still considered acceptable.

Two TPACK items focus on knowledge of how to deal with student cognitions and two on knowledge about instruction. An example item focusing on knowledge of how to deal with student cognitions is as follows: “Imagine several 7th grade students asking you the following question at the end of the last biology lesson: ‘Why are honey bees lazy at noon and do not fly?’ In the upcoming lesson you want to address the content ‘water stress of plants’ and that this can lead to a decrease in midday flights. When the outside temperature is too high, flowering plants that provide nectar for honeybees suffer from lack of water. Water stress results in reduced nectar production by flowering plants. Therefore, visiting flowers, especially around midday, may be less attractive to honeybees. In bullet points, describe as many different ways as possible in which you would address the students’ misconception of the phenomenon described above using one or more digital representations for each approach.” The score is the number of correctly described approaches. An approach was regarded as correct if it described appropriate content and a digital representation suitable for resolving the students’ difficulties, e.g. a simulation of honey bees’ foraging flights at different temperatures that compares a “hot noon” with a “normal” to “cool” noon, showing that honey bees do fly at midday and that flight behaviour depends on outside temperature rather than time of day.

An example of an item on knowledge about instruction is as follows:

“Imagine that a teacher at your school gives you his lesson materials, which you can use or modify to use in your 7th grade classroom. His lesson plan looks like this:

The students each sit in front of a computer and are given the task of watching an animation (duration 7 min) saved on the desktop and reading through a text as an explanation, which they have been handed out as a worksheet. The topic is the different activities in the life of a honey bee. After all students have watched the animation and read the text on the worksheet, the teacher/you write a summary notebook entry on the whiteboard about the entire topic, which the students transfer to their worksheets.”

“Indicate what you would take directly from this lesson plan and what you would change and how (e.g., changes in the sequence, media used, ...). Describe your planning in bullet points.”

Participants are also shown excerpts (images) from the animation and an excerpt from the corresponding worksheet. In addition to deficits in the lesson concept, such as the students’ passivity or the simultaneous processing of the text and the very long animation, there are also deficiencies in the animation and the worksheet. In the animation, for example, the honey bees are portrayed incorrectly and in a trivialised way, which can lead to misconceptions and a lack of realism. The text contains, among other things, content errors. The score is the number of correctly identified points of criticism that should be changed; in addition, participants are expected to make a concrete suggestion for change in each case.

To obtain a valid test instrument with unambiguous coding, the self-report survey and the performance assessment were reviewed in several expert interviews (n = 6; science educators and experienced teachers). A coding manual was used to ensure objective coding of the performance assessment. Coding reliability was examined by double coding 21 questionnaires. Cohen’s kappa for the performance-assessed TPACK dimension (κ = .90) indicates high agreement between the two raters and thus objective and reliable measurement (Landis & Koch, 1977). In addition, the construct validity of the two instruments (self-report measure, performance assessment) was initially tested in a previous study by means of discriminant and convergent validity (von Kotzebue, 2022). That validation study examined the full instruments, each consisting of items on the four T-dimensions of the TPACK model (TK, TCK, TPK and TPACK). In the present study, only the TPACK dimension of the two instruments was included because of the additional inclusion of the lesson plans and the rather small sample size for path models.
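Cohen’s kappa corrects the observed agreement p_o between two coders for the agreement p_e expected by chance from their marginal code frequencies: κ = (p_o − p_e) / (1 − p_e). The Python sketch below computes the unweighted coefficient. The text does not state whether a weighted variant was used for ordinal codes, so unweighted kappa is an assumption here, and the two code sequences are invented for ten coded units rather than taken from the study.

```python
from collections import Counter


def cohens_kappa(coder_a: list, coder_b: list) -> float:
    """Unweighted Cohen's kappa for two coders rating the same units."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same, non-empty set of units")
    n = len(coder_a)
    # Observed agreement: share of units that received identical codes
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of both coders' marginal proportions, summed over codes
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / n**2
    if p_e == 1:
        # Both coders used one identical code throughout; kappa is undefined,
        # and this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes (e.g. SAMR levels 1 to 4) assigned by two coders
coder_a = [1, 2, 2, 3, 4, 1, 2, 3, 3, 2]
coder_b = [1, 2, 2, 3, 4, 1, 3, 3, 3, 2]
print(round(cohens_kappa(coder_a, coder_b), 2))  # 0.86
```

Here the coders agree on 9 of 10 units (p_o = .90), but with p_e = .29 the chance-corrected value is somewhat lower; under Landis and Koch’s (1977) benchmarks, values above .80 are read as almost perfect agreement.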

Beliefs about learning with digital technologies in the classroom were assessed using an eight-item scale by Vogelsang et al. (2019; adapted from Richter et al., 2001). An example item is “Using digital media enables self-determined learning to a high degree”. All items were rated on a 5-point Likert scale ranging from “strongly disagree” to “strongly agree”; Cronbach’s α is .82.

Lesson Plans

Biology teacher students were asked to plan a lesson (100 min) for secondary school students on the topic of honey bees, using various digital media. The lesson’s topic was specified as follows: “What do honey bees need to live?” The plan had to include the following content: bee-friendly and bee-unfriendly landscapes; food sources and plants that honey bees can find in the respective landscapes; and transport of food to the beehive. At least three learning objectives pursued in the lesson had to be specified. Moreover, the lesson plan had to incorporate at least one digital medium that the participants created themselves in addition to planning the lesson, and at least one further digital medium that they found through online research. Participants were provided with a template requiring a detailed description of the chronological flow of the lesson (i.e. lesson activity, duration of the activity, form of student work, technologies used during the lesson).

The lesson plans were coded based on the descriptions in the lesson outlines (articulation schemata) and the digital technologies specified. First, each lesson was broken down into activities (e.g. drawing a bee-friendly environment on paper; creating a word cloud on the topic “Which insects do you know?” using Mentimeter). Activities were then coded according to whether or not they included digital technologies.

Different coding categories were used for activities implementing digital technologies. First, instructional quality was coded using five categories: cognitive level, student proximity, self-activity, self-determination and ICAP (Cronbach’s α = .65). In addition, the implementation of digital technologies was coded according to the SAMR model, which classifies digital media use into four levels according to the extent to which it adds value compared with analogue (non-digital) media. The category “cognitive level” was coded as 1 = reproduction, 2 = reorganisation and 3 = transfer. Student proximity, self-activity and self-determination were coded as 1 = not implemented, 2 = partially implemented and 3 = implemented. ICAP and SAMR were coded in ascending order according to the four levels of the respective model: 1 = passive, 2 = active, 3 = constructive, 4 = interactive; and 1 = substitution, 2 = augmentation, 3 = modification, 4 = redefinition. After all activities that included digital technology use had been coded, the average of each category across these activities was calculated for each lesson plan. Two trained coders independently coded 20% of the lesson plans (n = 17), yielding high inter-coder reliabilities (Cohen’s kappa for the instructional quality categories: κ = .795 to 1.000; for SAMR: κ = .808; Landis & Koch, 1977). Therefore, the remaining lesson plans were coded by one coder.
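The per-plan aggregation described above can be sketched in Python. The dictionary layout, the key names and the example plan are hypothetical; only the coding scales (1 to 3 for the first four categories, 1 to 4 for ICAP and SAMR) and the rule of averaging each category over the activities that use digital technology come from the text. How the five instructional quality categories were combined into the single instructional quality score used in the analyses is not specified here, so the sketch stops at the category means.

```python
from statistics import mean

# Category names follow the coding scheme described above; the data
# layout itself is a hypothetical illustration.
CATEGORIES = ("cognitive_level", "student_proximity", "self_activity",
              "self_determination", "icap", "samr")


def plan_scores(activities: list[dict]) -> dict[str, float]:
    """Average each category over the activities that use digital technology."""
    digital = [a for a in activities if a["digital"]]
    if not digital:
        raise ValueError("lesson plan contains no digital activity to score")
    # Non-digital activities are excluded, so they need no category codes
    return {cat: mean(a[cat] for a in digital) for cat in CATEGORIES}


# Hypothetical lesson plan with two digital activities and one analogue activity
example_plan = [
    {"digital": True, "cognitive_level": 1, "student_proximity": 2,
     "self_activity": 3, "self_determination": 1, "icap": 2, "samr": 1},
    {"digital": False},
    {"digital": True, "cognitive_level": 2, "student_proximity": 3,
     "self_activity": 3, "self_determination": 2, "icap": 4, "samr": 3},
]
print(plan_scores(example_plan))
```

Averaging rather than summing keeps plans with different numbers of digital activities on the same scale as the original codes.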

Results

Descriptive Results

The three questionnaire constructs showed the following means and standard deviations: TPACK self-report measure (TPACK_srm; M = 3.82, SD = .69), TPACK performance assessment (TPACK_pa; M = 2.00, SD = .85) and beliefs about learning with digital technologies in the classroom (beliefs; M = 3.79, SD = .51). On average, lesson plans contained M = 5.33 activities per lesson, of which M = 3.95 involved digital media (SD = 1.67; Min = 1.0; Max = 9.0) and M = 1.38 did not (SD = 1.12; Min = 0.0; Max = 5.0). The individual categories showed mostly low mean scores with clear differences between categories, for example between self-activity (M = 2.54) and self-determination (M = 1.40) (Table 1).

Table 1 Descriptive results of the lesson plan categories

Intercorrelations and Path Model

To answer the first research question, intercorrelations were first calculated (Table 2) between the study-related (general) criteria (number of semesters, teaching experience), the questionnaire constructs (TPACK_srm, TPACK_pa, beliefs) and the instructional planning constructs (SAMR, instructional quality). TPACK_pa correlates significantly with both categories of instructional planning: SAMR (r = .250, p = .026) and instructional quality (r = .353, p = .001). TPACK_srm correlates significantly with the number of semesters (r = .310, p = .005), and beliefs correlate with the implementation of SAMR (r = .243, p = .032). The two categories of lesson planning correlate significantly with each other (r = .435, p < .001).

Table 2 Intercorrelations between the study-related (general) criteria (number of semesters, teaching experience), the questionnaire constructs (TPACK_srm, TPACK_pa, beliefs) and instructional planning (SAMR, instructional quality)

Next, to answer the second research question, these relationships were analysed in a path model using Mplus 8 (Muthén & Muthén). The number of semesters and teaching experience were assumed to be possible predictors of the questionnaire constructs, and TPACK_srm, TPACK_pa and beliefs were assumed to act as predictors of the quality of lesson planning (see Fig. 1). The model has acceptable fit (estimator WLSMV, χ² = 9.31, df = 7, χ²/df = 1.33; CFI = .94, RMSEA = .06). The only significant predictor of TPACK_srm is the number of semesters (β = .31, SE = .10, p = .003). However, TPACK_srm does not show significant effects on the lesson planning categories. In contrast, TPACK_pa has significant effects on SAMR (β = .28, SE = .10, p = .007) and instructional quality (β = .35, SE = .10, p < .001). Beliefs are a significant predictor of SAMR implementation in lesson planning (β = .26, SE = .10, p = .010) but not of instructional quality (β = .17, SE = .10, p = .107). The model explains 15% of the variance in SAMR (R² = .15, SE = .08, p = .053) and 18% of the variance in instructional quality (R² = .18, SE = .08, p = .019).

Fig. 1

Path model of the study-related (general) criteria (number of semesters, teaching experience), the constructs of the questionnaire (TPACK_srm and TPACK_pa, beliefs) and the instructional planning (SAMR, instructional quality); black paths, significant values; grey paths, non-significant values

Explorative Analysis of Lesson Plans with High and Low Scores on SAMR and on Instructional Quality Criteria

To gain greater insight into how individual lesson plans are distributed with regard to SAMR and instructional quality, these two variables are visualised in a scatter plot (see Fig. 2). The broad distribution shows that some lesson plans score very well on the given criteria and others very poorly. Luci’s and Jessica’s lesson plans stand out as “poor”, while Paulina’s and Rita’s are conspicuously “good”. The authors of the more successful lesson plans have higher TPACK_pa and beliefs scores than the authors of the less successful ones, whereas TPACK_srm shows a more mixed picture (see Table 3). The differences between successful and less successful lesson planning, according to the selected criteria, are illustrated below using the four selected lesson plans as examples (see also Fig. 3).

Fig. 2

Scatter plot of the distribution of lesson plans with respect to the criteria SAMR and instructional quality

Table 3 Questionnaire and lesson-planning results of the four selected good and poor lesson plans
Fig. 3

Scores of the four selected good and poor lesson plans on the lesson-planning criteria

ICAP

In the consolidation phase of Paulina’s lesson, students work through five stations in groups. At every station, they work interactively and solve problems together. For example, students are asked to use tablets and search the web to find out which seed mix would be suitable for an ideal bee meadow. In contrast, Jessica’s lesson plan includes a passive activity in the elaboration phase: the teacher gives a talk on bee mortality using a PowerPoint presentation.

Student Proximity

Rita’s lesson includes a student-focused activity at the beginning of the lesson. During the introductory phase, students leave the school building and observe which pollinators are attracted to which plant colours in a meadow. They document their observations in Google Docs using their tablets. In contrast, Jessica’s lesson plan includes an activity in which the topic is not presented in a student-centred way and no connection to everyday life is established: during the elaboration phase, the teacher presents a continuous text without meaningful illustrations on a PowerPoint slide.

SAMR

The activity in Rita’s lesson described under student proximity illustrates the modification level of the SAMR model, since it introduces new forms of work and tasks. Using Google Docs (a chart is created automatically) facilitates a collaborative and production-oriented student work phase. Jessica addresses the same topic (bee-friendly environments) in her lesson, with the teacher presenting pictures of different environments on PowerPoint slides. This activity corresponds to the substitution level of the SAMR model.

Cognitive Level

In Rita’s lesson, the students are asked to comment on, reflect on and interpret the observations made in the previous activity (observing a meadow), again using tablets (requirement area 3: transfer).

Self-determination

Regarding this category, Rita’s lesson includes an activity with a high degree of student self-determination: students are asked to present creatively how they imagine a future without bees, using a medium of their choice.

Self-activity

As mentioned above, the students in Paulina’s lesson work through five stations. At every station, they are given the opportunity to act on their own; for example, they can work independently with a learning programme on the bee colony.

Discussion

The present study aimed to examine biology student teachers’ ability to produce high-quality technology-enhanced lesson plans and how TPACK (self-reported and performance-assessed) and their beliefs about learning with digital technologies in the classroom influence the quality of these lesson plans. To this end, three sub-goals were defined.

First, the quality of lesson planning was objectively assessed and described using theory-based criteria. The individual categories showed low mean values, although they also differed from one another. The low level of student self-determination in the lesson plans is striking. This seems problematic, since situationally appropriate autonomy is an important condition for optimising the experience of competence: the successful completion of a task is only experienced as confirmation of one’s own ability if the task has been solved largely independently (Lewalter et al., 1998). The cognitive level of the planned activities is also low (reproduction or reorganisation). Previous studies in German-speaking countries have likewise pointed to a low level of cognitive demand in STEM lessons (e.g. Förtsch et al., 2017; Kunter et al., 2013). However, cognitively demanding tasks lead to better processing of content (Lipowsky et al., 2009) and thus to deeper cognitive analysis (Craik & Lockhart, 1972). Consequently, they can foster higher student achievement and a deeper, more conceptual understanding of the content (e.g. Förtsch et al., 2017; Lipowsky et al., 2009; Stein & Lane, 1996). According to the SAMR model, most activities involving technology use are classified as substitution or augmentation, and with regard to the ICAP model, the mean level of learning activity is active. This might seem like a low level of learning quality and activity. Compared to other studies, however, it is above average, as those studies show that students are usually passive when technology is used during instruction (e.g. Kramer et al., 2019; Sailer et al., 2021a).

Second, correlations between the quality of lesson plans and self-reported as well as performance-assessed TPACK, beliefs, number of semesters studied and teaching experience were analysed. The study shows that self-reported TPACK is not a significant predictor of the quality of lesson planning. In contrast, performance-assessed TPACK is a significant predictor of SAMR and instructional quality in lesson plans. Furthermore, SAMR can be significantly predicted by beliefs. However, performance-assessed TPACK and beliefs explain relatively little of the variance in the two criteria (SAMR and instructional quality), so many other factors probably influence the quality of lesson plans with technology integration. Self-reported TPACK is a poor predictor of the ability to integrate technologies into lesson plans with high quality, which is consistent with recent findings (Backfisch et al., 2020; Schmid et al., 2021). Studies linking self-reported TPACK to more objective measures have also shown little to no association (e.g., Akyuz, 2018; Krauskopf & Forssell, 2018; Schmid et al., 2021; von Kotzebue, 2022). Nonetheless, (prospective) teachers’ self-reported TPACK is an important characteristic because it may reflect their confidence to integrate technology and may therefore be an indicator of their technology-related self-efficacy beliefs (e.g. Lachner et al., 2021). In turn, TPACK self-efficacy is related to the frequency of technology use (e.g. Farjon et al., 2019; Scherer et al., 2018). However, frequency of technology use is not connected to the quality of technology integration. Rather, quality depends on the perceived usefulness of technology for instruction (see Backfisch et al., 2020) and on performance-assessed TPACK.

Third, two strikingly good and two strikingly poor lesson plans were compared to gain more detailed insight into how differences in quality on the selected criteria manifest themselves. Recurring characteristics of good and poor lesson plans, as defined by the chosen criteria, emerged. Good lesson plans were characterised by students solving problems interactively and collaboratively with the help of technology. Furthermore, the students were asked to reflect on and interpret their observations, and they were given the opportunity to present their results in a self-determined and creative way. During the lesson, a connection was made between the topic of bees and the students’ lives. In poor lesson plans, students were often passive and listened to the teacher’s PowerPoint presentations. Furthermore, no attempt was made to connect the topic to the students’ everyday lives and thus to demonstrate its relevance to them. Technology mostly served as a mere substitute for the analogue medium without any apparent added value.

Limitations

This study has four major limitations. First, only student teachers were included. Because they still have little practical experience, they may be less able to answer self-assessment items on TPACK or on beliefs about learning with digital technologies in the classroom. Schmid et al. (2021) aptly describe this as an “in-between” stage of knowledge development: student teachers have already acquired knowledge but still lack the practical experience to assess their knowledge and skills adequately (see Park et al., 1988). Second, concerning knowledge components, only TPACK was included. Although TPACK is the central area of knowledge for teaching with digital technologies, studies show that PCK and PK are also important in this context (e.g. Backfisch et al., 2020; Lachner et al., 2019). Third, the items on TPACK performance assessment and lesson planning were related to the topic of honeybees, whereas the items on self-reported TPACK and beliefs were more general. This raises the questions of whether the results for performance-assessed TPACK and lesson planning can be transferred to other topics, whether these two constructs are therefore more closely related to each other than to the other two, and whether the results would differ if all items explicitly referred to the same topic. Fourth, to draw conclusions about the validity of lesson planning as a predictor of instructional quality, real lessons, or at least elements of them, should be included in the analyses, as should students’ achievement and motivation.

Conclusion

The present results show that beliefs and performance-assessed TPACK play a critical role in the quality of technology integration in instruction. It can therefore be assumed that self-reported TPACK measures, as previously used with student teachers, are not sufficient on their own to predict the ability to integrate technology into teaching with high quality. The present study is characterised by several unique features. First, it focused exclusively on biology student teachers, thus addressing the criticism that most TPACK studies are not subject-specific and analyse different subject areas together (e.g. Schmid et al., 2021). Second, in addition to self-reports, a subject-specific performance assessment was used within a scenario approach in which participants were asked to plan a technology-enhanced lesson (see also Backfisch et al., 2020; Harris & Hofer, 2011; Kramarski & Michalsky, 2010). Using scenarios, qualitative aspects of teachers’ technology integration can be assessed in a highly contextualised manner (Backfisch et al., 2020). Studies have shown that such scenario approaches can trigger cognitive and motivational processes similar to those of authentic (classroom) practice (Bolzer et al., 2015; Robinson & Clore, 2001). However, coding lesson plans is very time-consuming, especially when, unlike in other studies (including Schmid et al., 2021), theory-based features adapted from research on teaching quality are analysed rather than surface features such as frequency of technology use. Such analyses are therefore only suitable for small samples (see also Lachner et al., 2019). In short, the combination of biology-specific constructs measured simultaneously in this study is unique in research on technology integration in teaching.

Further research should include real lessons or educational technologies created by students, as well as the effectiveness of technology use. Furthermore, in addition to biology student teachers, practising biology teachers should be included to analyse the predictive value of self-reported and performance-assessed TPACK and of beliefs about learning with digital technologies in the classroom for instructional quality, and to compare these results with those of the student teachers.