1 Introduction

Argumentation is a vital form of human cognition insofar as we are constantly dealing with conflicting information and making decisions in our daily lives, which implies weighing up arguments and counterarguments (Besnard & Hunter, 2008). Competence in argumentation is a hallmark of scientifically literate citizens, and it is central to collaborative learning and critical reasoning among students (Osborne, 2010). Indeed, a culture of argumentation in the science classroom is important for helping students learn to evaluate evidence (Erduran & Jimenez-Aleixandre, 2007). For this to be achieved, it is essential that both in-service and preservice science teachers have the argumentation skills required of twenty-first century citizens (Tan et al., 2017). According to Zhao et al. (2021), however, preservice science teachers are not sufficiently prepared for teaching argumentation, and hence, there is a need for specific argumentation instruction (Capkinoglu et al., 2021).

A number of learning progressions for argumentation in science have been described in the literature (Bravo-Torija & Jiménez-Aleixandre, 2018; Lee et al., 2014; Osborne et al., 2016), and they could form the basis of instruction for science teachers aimed at helping them to improve their argumentation competence and to incorporate argumentation-based learning into their classroom practice. However, approaches to argumentation instruction designed specifically for preservice science teachers and based on validated and reliable learning progressions are scarce (de Sá Ibraim & Justi, 2016), and those which have been reported focus on specific aspects of argumentation skills, such as the use and evaluation of evidence, dialectics, counter-argumentation, or critique (Brocos & Jiménez-Aleixandre, 2020; Cebrián-Robles et al., 2018; von der Mühlen et al., 2019; Zhao et al., 2021). Furthermore, and as noted by de Sá Ibraim and Justi (2016), little is known about the impact that instruction of this kind might have on the argumentation skills of preservice science teachers.

In the broadest sense, it has been suggested that teachers are able to implement instruction based on scientific practices in the classroom if they have experienced approaches of this kind during their training (Martínez-Chico et al., 2019). Accordingly, Zembal-Saul et al. (2002) highlight the importance of providing preservice elementary science teachers (hereinafter, PEST) with opportunities to build their argumentation competence prior to embarking on their professional career, as in this way they will be better able to develop the same skills of critical thinking and scientific reasoning in their own students, a necessary step in learners’ progress towards understanding scientific concepts and applying them in real life (Zembal-Saul, 2009). By teaching science as argument, young learners can be helped to understand how scientific knowledge is constructed and how to base their life decisions on evidence (Zembal-Saul et al., 2002). In a similar vein, Boyer (2016) stresses the need for PEST to learn how to construct evidence-based arguments and explanations and to incorporate this into their classroom instruction, insofar as the quality of young learners’ argumentation is enhanced when teachers are able to show them the structure and basis of different arguments and to encourage debate.

In order to help students develop argumentation competence, it is necessary to plan instruction carefully. An important consideration when planning instruction is the topics or contexts that will be addressed. Research suggests that argumentation and critical reasoning skills may be enhanced if they are contextualized to controversial socioscientific issues (SSIs) (Jiménez-Tenorio et al., 2020; Levinson, 2006). These are issues on which society and the scientific community are often divided, insofar as the needs or wishes of certain social groups may not concur with the views and proposals of scientists (Mauriz & Evagorou, 2020). This highlights the importance of collaboration and dialogue between science and society so as to determine the best course of action to tackle social problems (Moreno-Díaz & Jiménez-Liso, 2012). Socioscientific issue–based instruction has therefore emerged as an effective way for students to contextualize their science learning within a complex social and political context (Hancock et al., 2019).

A contemporary example of a controversial SSI is breastfeeding. This is a relevant topic to address with PEST because it is the starting point for human nutrition (Illescas-Navarro et al., 2019) and an important aspect of health education (Martínez-Roche, 2000).

In light of the above, the present study aims to shed light on the link between learning progressions and instruction, specifically by analyzing the impact of instruction related to the SSI of breastfeeding and based on a previously validated learning progression (Osborne et al., 2016) on the scientific argumentation competence of a group of PEST.

2 Theoretical Framework

2.1 Learning Progressions for Argumentation in Science

According to Osborne et al. (2016), learning progressions in science describe possible pathways through which students’ understanding and use of scientific concepts, explanations, and related practices may, with instruction, become more sophisticated over time. Investigating and assessing students’ argumentation competence within the context of a learning progression can therefore provide useful information for the design of instructional modules (Osborne et al., 2016). In the view of Upahi and Ramnarain (2022), the growing interest of researchers in learning progressions is grounded in the potential they hold to align curriculum, instruction, and assessment.

A key purpose of learning progressions is to establish the steps that students might be expected to pass through in their conceptual and/or skills development (Corcoran et al., 2009). In science, this process begins with basic notions and progresses through several stages towards deeper levels of understanding (Bravo-Torija & Jiménez-Aleixandre, 2018). If PEST have a clear idea of the stages through which students may pass as they progress in their learning, they will be better placed when it comes to planning learning goals and choosing their approach to instruction and assessment (Corcoran et al., 2009).

In the field of science education, Toulmin’s (1958) argument pattern (TAP) has provided the framework for the design of various learning progressions (Berland & McNeill, 2010; Bravo-Torija & Jiménez-Aleixandre, 2018; Lee et al., 2014; Osborne et al., 2016) and for assessing students' scientific argumentation skills (Erduran, 2018; Erduran et al., 2004; Syerliana et al., 2018; Zhao et al., 2021). The TAP model starts with the construction of an argument and identification of its key elements (backing, warrant, and claim), and then progresses to more complex levels involving counter-claims and rebuttals. As defined by Toulmin, a claim is an assertion put forward publicly for general acceptance, warrants (or grounds) are the specific facts that are relied on to support a given claim, and backings are generalizations that make explicit the set of experiences relied on to establish the trustworthiness of the ways of arguing applied in a particular case (Erduran et al., 2004).

Berland and McNeill (2010) used Toulmin’s model to identify the key elements of an argumentative product: claim, evidence, reasoning, and rebuttal. They designed a progressive learning model based on three dimensions: the instructional context, the argumentative product, and the argumentative process. These dimensions are used to support students in constructing and justifying knowledge claims over time and at different school levels.

Inspired in part by Toulmin’s model, Lee et al. (2014) proposed a framework for assessing students' scientific argumentation comprising five performance levels of increasing complexity. At level 1, students are simply able to make or identify a scientific claim without supporting evidence, while at level 5, they can distinguish conditions where their scientific arguments hold true and recognize limitations associated with the various elements of their arguments.

Osborne et al. (2016) designed and validated a learning progression that considers claims, evidence, and warrants (as in the TAP model), together with additional elements, namely counter-arguments, counter-critique, and comparative argument. Each level of their learning progression is assigned an alphanumeric code: the number (0, 1, or 2) indicates one of three broad levels of argumentation, differentiated by the degree of justification they imply, while the letter (a, b, c, or d) indicates the degree of difficulty within a given level. For example, level 0b (identifying a claim) indicates that a student is able to identify another person's claim without providing justification for it, one level of difficulty above level 0a (constructing a claim), in which the student merely states a relevant claim. A defining feature of this learning progression is that it operationalizes argumentation as a combination of both construction and critique.

Bravo-Torija and Jiménez-Aleixandre (2018) outlined a learning progression for the use of evidence to support scientific arguments in the context of decision-making. The rationale for this focus was that the use of evidence is a central feature of knowledge assessment and argumentation. Their model comprises five levels of complexity, from level 1, where a student is able to identify and extract information in response to a problem but has difficulty connecting it to other knowledge, to level 5, characterized by the ability to evaluate options based on the available evidence and to put forward arguments by synthesizing evidence from multiple sources, both those that support and those which challenge their chosen option.

However, despite the importance that educational researchers have ascribed to learning progressions and the efforts made to develop validated learning progressions for argumentation, it remains unclear how these progressions might inform the design of instructional modules on scientific argumentation. That is to say, although learning progressions have been used to evaluate the outcomes of instruction, the literature to date has not explicitly examined how they might provide a framework for the design of teaching itself. For the present work, we have chosen to follow the learning progression of Osborne et al. (2016), both because it includes the highest levels of progression and because we consider it important to differentiate between the levels of construction and critique.

2.2 Improving the Scientific Argumentation Competence of Preservice Science Teachers

Implementing these learning progressions in scientific argumentation requires appropriate instructional strategies. Although the literature includes several proposals for improving the argumentation competence of elementary and middle-school students, few have been targeted at preservice science teachers. Table 1 summarizes the characteristics of studies that have explored this topic with preservice teachers, indicating the focus of instruction and the teaching strategies used.

Table 1 Characteristics of studies that have explored argumentation competence among preservice science teachers, indicating the instructional strategies used

It can be seen in Table 1 that most of the studies use SSIs as a context in which to promote the argumentation competence of preservice teachers (i.e., the aim is that they learn to argue), while two studies are focused on both the learning and teaching of argumentation. The studies employ a variety of instructional strategies (normally two or three), the most widely used being justification of arguments and decisions, evaluation of arguments, and debates. Other less commonly used techniques include role play and 360° feedback. The studies by Brocos and Jiménez-Aleixandre (2020), Cebrián-Robles et al. (2018), Capkinoglu et al. (2021), Zhao et al. (2021), Cayci (2020), and Aydeniz and Ozdilek (2016) all required preservice teachers to justify arguments and decisions related to real-life problems or SSIs.

With regard to role play, Simonneaux (2001) suggests that its use enables students to experience different points of view regarding SSIs, thereby helping to develop their argumentation competence. In a similar vein, Cebrián-Robles et al. (2018) point out that preparing and enacting a role play requires students to search for evidence, draw up arguments (Tekin et al., 2020), and critique the views put forward by others.

The SSIs addressed in these studies required students to search for information, whether as support for a subsequent role play (Cebrián-Robles et al., 2018) or to inform their decision making (Brocos & Jiménez-Aleixandre, 2020). Cayci (2020) proposed a different SSI each week and gave students a few days in which to gather information prior to classroom discussion of the topic in question. Türkӧz and Öztürk (2019) similarly asked students to extract key information from videos as a precursor to debate. In fact, debates have commonly been used as a way of addressing SSIs and as a strategy to illustrate the importance of argumentation in teaching science (Capkinoglu et al., 2021), as the basis for producing a consensus position on real-life problems (Brocos & Jiménez-Aleixandre, 2020), and as an element in the design and teaching of argumentation activities (Aydeniz & Ozdilek, 2016).

With respect to the evaluation of arguments, two approaches are described in the literature. One is limited to identifying the elements of an argument (whether in a text or video), while the other involves both identifying elements and evaluating the quality of arguments (using rubrics). Regarding the former, the studies by Aydeniz and Ozdilek (2016), Capkinoglu et al. (2021, 2022), Cayci (2020), Cebrián-Robles et al. (2018), and Türkӧz and Öztürk (2019) all required students to identify elements of arguments in a written text, whereas in Cebrián-Robles et al. (2018), Boyer (2016), and Tekin et al. (2020), this was achieved by annotating a video. As for the second approach (i.e., both identifying elements and evaluating the quality of arguments), Cebrián-Robles et al. (2018) and Capkinoglu et al. (2022) designed a rubric based on the TAP framework to evaluate the arguments of preservice teachers. Tekin et al. (2020) similarly used a rubric to assess the level of scientific knowledge that students had used in making their arguments.

Turning to assessment strategies, Cebrián-Robles et al. (2018) and Capkinoglu et al. (2022) both used the 360° evaluation system as a learning resource. This involves a number of steps. Students begin by putting forward an argument in response to a question, and each student then has to evaluate their own argument and that of a classmate. They then receive feedback from the teacher regarding the quality of the arguments put forward, thus encouraging further reflection.

Another strategy that may be useful for developing the argumentation competence of PEST is what is known as cartography of controversies (Cabello-Garrido et al., 2021). This approach involves mapping out the relationships between the different actants (human and non-human agents) involved in a given controversy (Latour, 2005). Producing the map is a collaborative task that requires students to search for information and to identify arguments and counter-arguments of relevance to the SSI being addressed. This process helps to foster students' critical thinking and their ability to reason scientifically (Christodoulou et al., 2021).

2.3 Breastfeeding as a Context for Argumentation

Breastfeeding is an example of an SSI (Illescas-Navarro et al., 2019), in that it involves the intersection of scientific knowledge with social and cultural considerations. From the scientific perspective, breastfeeding is considered to have numerous benefits for the health of both baby and mother (Stuebe, 2009). For newborns, it offers protection against infections and disease and boosts cognitive development (Gartner et al., 2005), while for mothers it is a preventive factor in relation to breast and ovarian cancer (Stuebe, 2009). There are cases, however, where babies cannot be breast-fed and formula feeding is necessary. Examples would be when an infant is diagnosed with classic galactosemia, or when the mother has untreated active tuberculosis, is receiving chemotherapy, or is taking certain prescribed medications (Gartner et al., 2005). Similarly, some women may choose formula feeding because of breast or nipple pain due to musculoskeletal impairment (Charette & Théroux, 2019). Mothers with esthetic breast implants may likewise find it difficult to breastfeed if the procedure has damaged the mammary glands or nerve tissue, hampering milk production (Cheng et al., 2018).

The choice of breast milk or formula is also influenced by both social and cultural factors, such as balancing work and family commitments (Giménez López et al., 2015) or negative attitudes to breastfeeding in public (Morris et al., 2020). There are also numerous myths that may dissuade mothers from breastfeeding, for example, the belief that some mothers produce low-quality milk that is less nutritious (Padró, 2019). Another issue that cuts across both the scientific and social domains concerns the extent to which health professionals have received adequate preparation to support breastfeeding mothers, and whether their personal attitudes (e.g., with regard to extended breastfeeding) may lead them to offer wrong or confusing advice, leading mothers to discontinue breastfeeding (Cockerham-Colas et al., 2012).

Mention should also be made of the mother’s psychological and emotional wellbeing as a factor to consider. Montgomery et al. (2006) reported a relationship between breastfeeding and psychological stress, although Loret de Mola et al. (2016) concluded that mothers who breastfeed are less likely to experience more severe symptoms of depression. For their part, Lamontagne et al. (2008) highlight the social pressure that mothers often experience, especially when they have to cease breastfeeding because the infant does not latch on to the breast or finds sucking difficult. These mothers may feel guilty and that they have failed, and research suggests they often receive insufficient support and advice to help them switch to formula milk (Larsen & Kronborg, 2013). In a similar vein, Crossley (2009) argues that breastfeeding has become a moral imperative underpinned by the idea that breast is best, an idea reinforced by the misconception that what is natural (in this case, breast milk) is inherently better as it is pure and perfect (Lake, 2005). As Beckett and Hoffman (2005) point out, there are now social movements seeking to promote a return to what is natural, whether in relation to food, clothes or medicine. A final issue to consider concerns the role of fathers. Rempel et al. (2017) found that mothers’ breastfeeding intentions and behaviors, including the duration of breastfeeding, were influenced by fathers’ support, which may involve making sure the mother is comfortable when breastfeeding, taking care of household tasks, and valuing her through direct expressions of appreciation.

As an SSI, breastfeeding therefore involves several controversies:

How long should mothers breastfeed? The WHO recommends exclusive breastfeeding for the first 6 months, after which it may continue up to 2 years of age or beyond as a complement to other nutritious foods (World Health Organization, 2011). However, mothers who opt for extended breastfeeding cite a lack of social approval as the main constraint against continuing (Li et al., 2002).

Should mothers breastfeed in public? Research suggests that mothers who breastfeed in public often feel uncomfortable and vulnerable, and also that some people consider that mothers who do so are self-absorbed and inconsiderate and that breastfeeding is disgusting (Hauck et al., 2021; Morris et al., 2020). In this context, Komodiki et al. (2014) argue that increasing acceptance of breastfeeding in public would increase rates of exclusive breastfeeding and improve health outcomes for both the baby and mother, while Hauck et al. (2021) emphasize the need to foster women’s confidence and build communities in which breastfeeding is recognized as a cultural norm. However, Komodiki et al. (2014) also note that attitudes toward breastfeeding in public are shaped by cultural and religious factors, and hence, there is no single global view of this issue.

Breast milk or formula? Regarding the controversy over the choice of breast milk or formula, there is a large body of support for breastfeeding, due to its benefits for the health of baby and mother (Gartner et al., 2005; Giménez López et al., 2015; Stuebe, 2009). However, as already noted, there are various reasons why a mother may choose formula, including the fear that the baby is not ingesting enough from the breast, the inconvenience or fatigue associated with breastfeeding, or the inability of the baby to 'latch on' to the breast (Merritt, 2018).

Should breastfeeding be baby-led or scheduled? There is evidence to suggest that baby-led breastfeeding aids digestion, helps the baby to lose less weight during the immediate post-natal period, and is associated with a reduced incidence of neonatal hyperbilirubinemia (Fallon et al., 2014; World Health Organization, 1998). As to why mothers may opt for the scheduled approach, restricting both the frequency and length of breastfeeds (Fallon et al., 2014), possible reasons include the need to balance work and family commitments, a lack of social approval or insufficient support from health professionals (Cockerham-Colas et al., 2012).

Given these controversies, we consider that breastfeeding offers a suitable context in which to implement and evaluate an instructional module aimed at developing the argumentation competence of PEST.

3 Aim and Research Questions

The overall aim of this study was to analyze whether an instructional module focused on the SSI of breastfeeding and based on the learning progression described by Osborne et al. (2016) could help to improve the argumentation competence of PEST. To this end, the outcomes obtained in a group of students who received the instructional module (experimental group) were compared with those in a group of control students who did not. Outcomes were assessed using the same two assessment tasks in both groups: one task related to the SSI of breastfeeding (the focus of the instructional module, and for which scientific knowledge is required to construct and critique arguments), while the other concerned a school lunch program (for which domain-specific knowledge is not necessary). The specific research questions addressed were as follows:

  1. What is the initial (pretest) level of argumentation competence among students in the experimental and control groups?

  2. What level of argumentation competence is observed in the experimental group following their participation in the instructional module (post-test), and how does this compare with that observed in the control group at the same time-point?

  3. To what extent, if at all, is the argumentation competence acquired by students following the instructional module on the SSI of breastfeeding (experimental group) transferable to another context that does not require domain-specific knowledge to construct or critique arguments, and how do these outcomes compare with those observed among controls?

4 Method

4.1 Participants

Participants were 106 students enrolled during the 2019–2020 academic year in the third year of the Bachelor's in Elementary Education, a 4-year degree program offered by the University of [BLINDED FOR REVIEW]. For the purposes of the study, they were divided into two groups: experimental (n = 57) and control (n = 49). None of the students in either group had previously received instruction in scientific argumentation as part of their degree program. In the context of the Teaching Science module that all students had to complete, those in the experimental group received 18 h of formal instruction on scientific argumentation, focused specifically on the SSI of breastfeeding (this instruction is described in detail in the next section). By contrast, students in the control group received generic instruction in argumentation based on the TAP framework (Erduran et al., 2004), but the activities were not structured around a learning progression and did not include any aspects related to the SSI of breastfeeding. The latter is the standard way of teaching argumentation on our university’s degree program for PEST. Accordingly, the approach used in the experimental group constitutes an innovation, both in terms of how argumentation is taught and the SSI on which it is based, and it is therefore necessary to examine its impact (as we do in the present study) before extending it to all students.

The new instructional module on scientific argumentation (experimental group) was implemented by the first author of the present study. Her qualifications include a bachelor’s in physiotherapy and two master’s degrees, one in new research trends in health science, the other in secondary education (human health). She has also received training in scientific argumentation and its use with preservice teachers. Students in the control group, who only worked with the usual content of the Teaching Science module, were taught by a faculty member with a bachelor’s in biology and a master’s in secondary education (biology and geology).

4.2 Instructional Module

The aim of the instructional module was to improve the argumentation competence of PEST in the context of breastfeeding as an SSI, and it was designed based on the learning progression for scientific argumentation described by Osborne et al. (2016). The activities addressed three of the controversies associated with breastfeeding that we mentioned previously: How long should mothers breastfeed? Should mothers breastfeed in public? and Breast milk or formula? A fourth controversy (Should breastfeeding be baby-led or scheduled?) was used as the basis for one of the pretest/posttest activities, but it was not addressed as part of the instruction.

The instruction involved a sequence of eight activities with a total duration of 18 h (Palma-Jiménez et al., 2021). Each activity was designed to last for 2 h, with the exception of the role play, for which 4 h were set aside. Particular emphasis was placed on the use of information and communication technologies (ICT) and reflection, because both are key elements of students’ training. The assessment tools used included 360° feedback (Tee & Ahmed, 2014), digital rubrics (Cebrián-Robles & Franco-Mariscal, 2018), and a video annotation platform (Cebrián-Robles et al., 2019).

The learning progression of Osborne et al. (2016) was used to design the sequence of activities according to the following criteria:

  a) The set of activities covers all the levels of the learning progression. Figure 1 shows the sequence of activities included in the instructional module, indicating in each case the name of the activity, the controversy addressed, and the level or levels of the learning progression (Osborne et al., 2016) to which it corresponds, using the alphanumeric code employed by these authors (0a, 1a, 2a, etc.).

  b) The activities are sequenced in such a way that students progress through the different levels (from easier to more complex). Thus, activities 2 to 7 correspond primarily to levels 0 and 1 of the learning progression, and only in activity 8 do the majority of tasks correspond to level 2 performance. The exception is activity 1, which is designed to explore students' existing ideas and requires them to construct a complete argument (level 1c in the learning progression).

  c) The two dimensions considered in the learning progression (construction and critique) are addressed in as balanced a way as possible. However, taking into account the performance level of the PEST, the first activities in the sequence focused more on the initial levels of the critique dimension (0b, 0d, and 1d), which are more difficult than the equivalent levels in the construction dimension (0a, 0c, and 1c).

Fig. 1 Sequence of activities included in the instructional module

We will now briefly describe each of these activities.

  1. Introduction to scientific argumentation: We began by asking students the following question: “Do you think infants should be breastfed beyond the age of two years? Justify your answer”. Accordingly, students had to construct a complete argument (i.e., construct a claim and a warrant and provide evidence). This question and their answers formed the starting point for the next activity. In the second part of this initial activity, students were introduced to the concept of argumentation and the different elements that make up an argument, using the TAP model (Erduran et al., 2004) as a guide. We also described and discussed the different levels of the learning progression.

  2. 360° feedback with rubrics: The instructor began by returning to the question asked in activity 1 and showing students an example of how they might assess a complete argument. In order to apply the 360° feedback system (Tee & Ahmed, 2014), we used the CoRubric platform and a digital rubric designed by the authors to assess the arguments students had offered in response to the question. The task for students was, first, to assess the argument put forward by a classmate, and second, to assess their own argument to the same question. They then received feedback from the instructor. With respect to levels of the learning progression, this activity corresponded to identifying a complete argument.

  3. Annotating a text: Students were given a text by Gómez Fdez-Vegue (2015) in which reference is made to the controversy of how long mothers should breastfeed. Working individually, they had to identify and highlight in the text the key parts of the argument (claim, warrant, and evidence). The instructor was available to answer any queries they had. This was followed by a group session in which each student described how they had tackled the activity, with the class as a whole providing corrections.

  4. Annotating a video: Students watched a video by Garrido (2018) that offers an explanation and defense of extended breastfeeding. Working individually and using the CoAnnotation platform, students had to highlight and label those fragments of the video that corresponded to a claim, a warrant, and evidence. This was followed by a group session in which each student shared their response to the task, with the class as a whole providing corrections.

  5. Debate using Kialo: Students were first asked to search for information about the controversy of whether mothers should breastfeed in public. They then formed workgroups comprising 4–6 students each, with the class as a whole being divided into two teams, one of which would argue in favor of breastfeeding in public, the other against. The next stage involved a debate using Kialo, a platform that enables debates to be conducted online in real time. The debate had three parts: first, each group had to construct an argument consistent with the team to which they had been assigned, that is to say, either for or against breastfeeding in public; second, each group had to write an alternative counter-argument or provide a counter-critique for an argument put forward by a group on the other team; and third, each group had to respond with an argument to the counter-argument or counter-critique they had received from a group on the opposing team. Throughout this activity the instructor was on hand to answer any queries the students had, and she also acted as debate moderator.

  6.

    Analysis of product labels: According to Bravo-Torija and Jiménez-Aleixandre (2018), one of the things that students find most difficult is interpreting data and establishing connections between different sets of data so as to integrate them within their argument justifications. This activity was designed to address this issue. Students were first given information about the nutritional content of breast milk, alongside two product labels from different infant formulas. Then, working in small groups (4–6 students), they had to construct two arguments in favor of either breast milk or one of the two formulas, basing their arguments on an analysis of the three sources of evidence they had been given (i.e., the nutritional composition of each milk). Each group then shared their arguments with the class as a whole, with the instructor providing corrections where necessary.

  7.

    Cartography of controversy: The task for students in this activity was to analyze the controversy, breast milk or formula? This required them to engage primarily with two levels of argumentation: constructing a complete argument and providing an alternative counter argument. The instructor began by introducing them to the concept and method of mapping controversies. Based on actor-network theory (Latour, 2005), this entails analyzing a controversy and producing a map showing (1) the relationships between the different actants involved (i.e., ideas, people, or objects with some relationship to the controversy) and (2) the spheres of influence (referred to as poles) in which they operate. By way of an example, students were shown a cartography illustrating the dominant model of meat production and consumption in Western countries (Cabello-Garrido et al., 2021). Working in small groups, students then produced their own maps of the controversy, breast milk or formula? The instructor was on hand throughout to answer any queries. Students’ maps were created using the drawing tool in Google Drive, and each group uploaded its map to a shared folder, thus enabling classmates to comment on and compare the various maps produced.

  8.

    Role play: The controversy, breast milk or formula? also provided the focus for the role play, which was designed based on the approach described by Cruz-Lorite et al. (2020). This activity comprised two stages, each lasting 2 h. In the first stage, students were told that the role play would take the form of a television debate involving a panel of ten participants, five arguing in favor of breast milk and five in favor of formula. Additional roles were that of the program presenter, who would act as the debate moderator, and that of the TV audience, who would decide at the end who had won the debate. Students were then divided into small groups (4–6 members) and assigned one of the aforementioned roles. Each group was given a role card describing the role they would represent, with space for them to write down the arguments they were going to use, as well as the sources of information consulted. They then had 1 week in which to prepare for the debate. The second stage of this activity involved the actual role play, which had two parts. In part 1, each group nominated a spokesperson who would present the group’s position. The task for the rest of the group was to act as advisers and to take notes about the strengths and weaknesses of the arguments put forward by other roles. In this first part, the spokesperson of each group defended their position (i.e., in favor of breast milk or of formula) using the arguments that the group had devised during the preparatory week and written down on the role card.
After all ten participants had stated their case, there was a 10-min break in which each group had to prepare counter arguments and counter-critiques to use in the second part of the role play, which took the form of a debate where students would need to engage with more complex levels of the learning progression: providing an alternative counter argument, providing a counter-critique, constructing a one-sided comparative argument, providing a two-sided comparative argument, and constructing a counter claim with justification. At the end of the debate, the group representing the TV audience chose a winning side (i.e., breast milk or formula) based on the quality of the arguments put forward.

It can be seen that the instructional module comprises a broad series of activities, addresses three of the main controversies surrounding the SSI of breastfeeding, and makes use of a variety of teaching strategies, including computer-supported ones (Scheuer et al., 2010). Consequently, and given that it has been designed based on a validated learning progression, we consider that it could improve the scientific argumentation competence of PEST.

4.3 Data Collection: Pretest/Posttest Assessment

To analyze the impact of the instructional module on students’ argumentation competence, we designed a pretest/posttest assessment tool comprising two tasks: one related to the topic of breastfeeding, which required scientific knowledge to construct or critique arguments, and another focused on the issue of school lunch, which did not require domain-specific knowledge. The questions for each task were formulated and sequenced in such a way as to cover most levels of the learning progression for scientific argumentation (Osborne et al., 2016) that underpinned the instruction.

The breastfeeding task (Supplementary material, part 1) concerned the controversy over baby-led versus scheduled breastfeeding. Students first had to read a fragment of text extracted from Fallon et al. (2014), after which they had to answer a series of questions. The school lunch task (Supplementary material, part 2) concerned proposed changes to school lunch programs and was an adaptation of an assessment task used by Osborne et al. (2016). This task was chosen for three reasons: first, it is relevant to the professional context of PEST, as it concerns how children are fed at school, an important topic in elementary education; second, it was validated by Osborne et al. (2016) as a control assessment task; and third, it does not require domain-specific scientific knowledge to construct or critique arguments, insofar as the forms of evidence that appear in the text refer exclusively to financial aspects of the school lunch program in the case of questions SL2 to SL5. Although answers to question SL1 could be conditioned by knowledge of conditions such as diabetes, celiac disease, food intolerances, and childhood obesity, or by what individual PEST understand a healthy diet to be, this scientific knowledge was not considered: our aim in this task was to focus on the ability to argue, regardless of the scientific concepts that might be addressed. In translating and adapting the task described by Osborne et al. (2016) to our sociocultural setting, the only major changes made involved references to government agencies (i.e., references to the US government and the US Department of Agriculture were changed to the equivalent agencies in our setting), and hence, we do not consider that the validity of the task is undermined in any way.

To avoid the pretest/posttest assessment being overly time-consuming, each task covered a maximum of nine levels of the learning progression. Of these, only five are comparable across the two tasks; they were chosen so as to enable us to assess aspects of all three of the broad levels (0, 1, and 2) described by Osborne et al. (2016).

4.4 Validation of the Pretest/Posttest Assessment Tool

To ensure that the proposed assessment tasks were adequate (i.e., in terms of wording, structure, readability) for the educational level of our PEST, and to confirm our proposed linking of questions to learning progression levels (hereinafter, LPLs), we submitted the tool for appraisal and validation by seven professionals with extensive experience of argumentation (four were university teachers of science education, and three were secondary education science teachers). Following their analysis and feedback, the wording of some questions was changed slightly, and we also revised the LPL to which one of the questions was linked. Regarding the latter, question 5 in the school lunch task was originally linked to LPL 2a (providing a counter-critique), but the experts recommended changing this to LPL 2b (constructing a one-sided comparative argument). With respect to the wording of questions, this was changed when the experts initially disagreed about the LPL to which the question should be linked. This was the case for four of the original questions in the breastfeeding task and two in the school lunch task. The final version of the assessment tool (following validation by experts) is shown in parts 1 and 2 of the Supplementary material.

4.5 Data Analysis: Design of the Rubrics

To analyze students’ responses to the questions, we created two rubrics, one for each of the assessment tasks (see Supplementary material, parts 3 and 4). The rubrics were developed through an iterative process involving all the authors of this paper and taking into account students’ responses and the LPLs, and their design was informed by a rubric used in a previous study of argumentation competence in PEST (Cebrián-Robles et al., 2018). For each question in both rubrics, we established different performance levels, indicating the extent to which a student’s response was judged to show achievement of the LPL to which the question corresponded. Each performance level was assigned a number (1, 2, 3, etc.) so as to enable a score to be awarded for a student’s response to each question.

Two of the levels included in the learning progression, namely constructing a complete argument (LPL 1c) and providing a two-sided comparative argument (LPL 2c), were assessed by summing the performance scores for multiple elements of a single question, one in each of the two assessment tasks. The first question in the school lunch task (SL1; see Supplementary material, parts 2 and 4) required students to construct a claim (LPL 0a), provide evidence (LPL 0c), and construct a warrant (LPL 1a), which together reflect the ability to construct a complete argument (LPL 1c). Accordingly, a score for the latter ability can be obtained by summing the scores for its three constituent elements; based on the number of performance levels defined in the rubric for each of these elements (see Supplementary material, part 4), this score ranges between 0 and 9. For example, if a student’s response was rated as performance level 1 for constructing a claim, level 2 for providing evidence, and level 1 for constructing a warrant, they would obtain a score of 4 out of a maximum possible 9 for constructing a complete argument.
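The composite scoring described above can be sketched in a few lines of code. This is an illustrative sketch, not the authors' actual scoring procedure: the element names are invented, and the 0–3 range per element is an assumption inferred from the stated 0–9 composite range.

```python
# Hypothetical sketch of the composite score for "constructing a complete
# argument" (LPL 1c): sum the rubric performance levels awarded for its
# three constituent elements (claim, evidence, warrant).
# Assumption: each element is scored 0-3, giving a composite range of 0-9.

ELEMENTS = ("claim_0a", "evidence_0c", "warrant_1a")

def complete_argument_score(levels: dict) -> int:
    """Sum the performance levels for claim, evidence, and warrant."""
    for name in ELEMENTS:
        if not 0 <= levels[name] <= 3:
            raise ValueError(f"performance level for {name} must be 0-3")
    return sum(levels[name] for name in ELEMENTS)

# The worked example from the text: levels 1 (claim), 2 (evidence),
# and 1 (warrant) yield a composite of 4 out of a maximum of 9.
print(complete_argument_score({"claim_0a": 1, "evidence_0c": 2, "warrant_1a": 1}))  # prints 4
```

The same summation pattern applies to the two-sided comparative argument (LPL 2c), where the "for" and "against" sub-scores are summed instead.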

The fifth question in the breastfeeding task required students to construct a one-sided comparative argument for and/or against a stated opinion (see Supplementary material, part 3). Accordingly, a student whose response set out an argument both for the opinion with which they agreed and against the opinion with which they disagreed may be considered to have provided a two-sided comparative argument (LPL 2c), a score for which could be obtained by summing the performance levels achieved in each case; based on the number of performance levels defined in the rubric for the two types of argument (for and against), this score ranges between 0 and 8. For example, if a student’s response was rated as performance level 1 for the argument in favor and level 2 for the argument against, they would obtain a score of 3 out of a possible maximum of 8 for providing a two-sided comparative argument (LPL 2c).

4.6 Validation of Rubrics

The rubrics were validated through a three-stage process. First, and following the method described by Sadler and Zeidler (2005), we used investigator triangulation to build credibility and confirmability in the analysis of data. Specifically, two researchers independently analyzed 20% of students’ responses to the two assessment tasks so as to build consensus regarding their interpretation. We then calculated the percentage agreement between raters, computing Cohen’s kappa coefficient (Cohen, 1960) as a measure of reliability (Cohen et al., 2007). Kappa ranges from −1 to 1, with 1 indicating perfect agreement. Inter-rater agreement for responses to the breastfeeding task was 82.1% (Cohen’s kappa = 0.57; p < 0.001), which may be interpreted as moderate agreement (Landis & Koch, 1977). Agreement for the school lunch task was 88.1% (Cohen’s kappa = 0.74; p < 0.001), indicating substantial agreement (Landis & Koch, 1977). Finally, all responses for which inter-rater agreement was below 80% were reviewed again by the researchers until consensus was reached regarding their interpretation. This process led to a slight modification being made to the rubric for the breastfeeding task. Specifically, a new performance level was added for the scoring of question 2 (corresponding to LPL 0c, providing evidence) so as to account for responses where students based their answer on the experience of non-health professional family members or friends.
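For readers unfamiliar with the reliability statistics above, the following is a minimal sketch of how percentage agreement and Cohen's kappa are computed for two raters coding the same responses. The rating vectors are invented for illustration and do not reproduce the study's data.

```python
# Sketch: inter-rater reliability for two raters (invented data).
from collections import Counter

def percent_agreement(r1, r2):
    """Share of responses on which both raters assigned the same code."""
    return 100.0 * sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Cohen's (1960) kappa: agreement corrected for chance."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n       # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum((c1[c] / n) * (c2[c] / n) for c in set(c1) | set(c2))  # expected by chance
    return (p_o - p_e) / (1 - p_e)

rater1 = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2]
rater2 = [1, 2, 3, 3, 1, 2, 3, 2, 1, 2]
print(percent_agreement(rater1, rater2))  # 80.0
print(round(cohens_kappa(rater1, rater2), 3))
```

Note that kappa is lower than raw agreement because it discounts the agreement expected by chance, which is why the text reports both figures.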

To further illustrate how the rubrics were applied in practice, we will now provide some examples of responses given by students in the experimental group to the two assessment tasks, indicating the LPL and the performance level reflected in the response.

4.6.1 Breastfeeding Task

Constructing a claim (0a): “I agree more with breastfeeding on demand, that is, baby-led.” Performance level 3, which corresponds to “an opinion is clearly expressed,” as the response makes its position clear.

Providing evidence (0c): “Each baby will feed at a different rate, so it’s not a good idea to set specific times, because depending on the baby you might give them more or less milk than they really need [ …] babies need adequate nutrition and the best way of achieving this is feeding on demand, because they’ll be full when they’ve had enough.” Performance level 6.

Identifying evidence (0d): “The evidence is that babies feed according to their needs, they take in the amount they need, and this favors digestion.” Performance level 3.

Providing an alternative counter argument (1d): “With scheduled breastfeeding the baby does not feed according to what they need, whereas feeding on demand means that the baby’s needs will be met. And with feeding on demand, milk production is baby-led and so their needs are always met, you avoid the problem of too little or too much. Also, this way of doing things means that no limits are put on the bond between baby and mother […].” Performance level 4.

Providing a counter-critique (2a): “Attachment in the first years of life is not a bad thing, it’s something positive because it creates more ties with the mother, and this doesn’t mean that the child is going to be overly dependent on the mother when they’re older.” Performance level 3.

Constructing a one-sided comparative argument (for) (2b): “From the point of view of the baby’s health I think Andrea’s argument is better because she backs it up by saying that baby-led feeding helps with digestion, because the baby feeds according to need, taking in the amount of milk that is necessary.” Performance level 5.

Constructing a one-sided comparative argument (against) (2b): “I agree with Andrea, because her argument focuses on the baby’s health, she refers to the amount of milk needed for correct development, and concludes that it’s better to give the baby what they want or need, not feeding according to the mother’s timetable. By contrast, Marina doesn’t refer to the baby’s health, she only talks about the baby’s independence.” Performance level 5.

Providing a two-sided comparative argument (2c): “I agree with Andrea because the baby knows when it’s hungry and will want to feed, and when it’s full it will stop. But in Marina’s argument, the baby might already be full and then we’re forcing it to feed, and this might train the body to take in more food than is really needed, which might lead to obesity later on in life.” Performance level 5.

Constructing a counter claim with justification (2d): “Breastfeeding on demand helps to ensure that the baby’s needs are met, that they get the amount of milk that is needed, […]. Also, because the baby controls their own appetite, there’s less of a risk of them becoming overweight in later life. It’s also been shown that jaundice is less common when babies are adequately breast fed. Therefore, I think that breastfeeding on demand is the best option for satisfying the needs of a newborn baby.” Performance level 5.

4.6.2 School Lunch Task

Constructing a claim (0a): “I agree more with the idea that schools in Andalusia should adopt the new school lunch program [...].” Performance level 3.

Identifying a claim (0b): “That it would be best not to adopt the new program.” Performance level 3.

Providing evidence (0c): “[...] because they will get money from the regional government, and also because the new menus are designed to promote children’s health [...]” Performance level 4.

Constructing a warrant (1a): “[...] The program is about offering children a balanced diet that is rich in vitamins through a variety of foods, so it would be a positive change.” Performance level 5.

Identifying a warrant (1b): “The idea is that with the money that would be saved, education could be improved by hiring more teachers and offering more after-school activities.” Performance level 4.

Constructing a complete argument (1c): “I agree more with the idea that schools in Andalusia should adopt the new school lunch program, because they will get money from the regional government, and also because the new menus are designed to promote children’s health. The program is about offering children a balanced diet that is rich in vitamins through a variety of foods, so it would be a positive change.” Summing the scores for constructing a claim (performance level 3), providing evidence (performance level 4), and constructing a warrant (performance level 5).

Providing an alternative counter argument (1d): “I think your argument overlooks the importance of health and adequate nutrition for children, which is essential for their development, and without it they might have problems later in adult life. […]. If children don’t have adequate health, then other money will have to be found to deal with the consequences of poor diets, which affects things like anxiety or obesity. Therefore, I think setting aside this money is a good way of promoting children’s nutrition and health.” Performance level 4.

Providing a counter-critique (2a): “In my view, this choice should be based on what is best for children’s health, not the money that is available. Children’s health should come first, as their proper development depends on it, and in this way you avoid health problems later on in life. Also, what’s the point of having more teachers and more after-school activities if children aren’t healthy enough to take part in them?” Performance level 2.

Constructing a one-sided comparative argument (a better one) (2b): “Joanne’s argument focuses on children’s health and good nutrition, which is crucial for their physical and psychological development. By contrast, Christine’s argument is based on the financial aspect and on areas that have nothing to do with nutrition. […]. Therefore, I think that Joanne’s argument is better because she is pointing out that a good diet is essential and necessary, it can improve children’s quality of life, whereas Christine overlooks these aspects and pays no attention to health.” Performance level 4.

4.7 Statistical Analysis

Application of the Kolmogorov-Smirnov test indicated that the data obtained from the two assessment tasks were not normally distributed (p < 0.05 for all variables and both groups). Consequently, the data were analyzed using non-parametric tests: (a) the Wilcoxon signed-rank test to analyze pretest vs. post-test differences within each group, and (b) the Mann–Whitney U test to investigate differences between the groups at pretest and at post-test. The effect size (r) was calculated using the equation r = Z/√N, where Z is the value of the statistical test, and N is the sample size (Fritz et al., 2012). In the case of the Wilcoxon test, the sample size was N × 2. Effect sizes were interpreted according to Cohen’s (1988) criteria: value > 0.5, large; between 0.3 and 0.5, moderate; below 0.3, small.
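The statistical pipeline above can be sketched as follows. This is an illustration with invented scores, not the study's data; moreover, since the paper does not state how Z was obtained, the sketch recovers |Z| from the two-sided p-value via the normal quantile, which is one common approach.

```python
# Sketch of the non-parametric analysis (invented scores, not the study's data).
import math
from scipy import stats

pre  = [2, 3, 1, 4, 2, 3, 2, 1, 3, 2, 4, 1]   # hypothetical pretest scores
post = [4, 5, 3, 5, 4, 4, 3, 2, 5, 4, 5, 3]   # hypothetical post-test scores

# (a) Within-group change: Wilcoxon signed-rank test on paired scores.
w_stat, w_p = stats.wilcoxon(pre, post)

# Effect size r = Z / sqrt(N); per the paper, N for the Wilcoxon test is
# the number of pairs times two. |Z| is recovered from the two-sided p-value.
z = stats.norm.isf(w_p / 2)
r = z / math.sqrt(len(pre) * 2)
print(f"W = {w_stat}, p = {w_p:.4f}, r = {r:.2f}")

# (b) Between-group comparison at a single time point: Mann-Whitney U test
# (here the two invented vectors simply stand in for two independent groups).
u_stat, u_p = stats.mannwhitneyu(pre, post, alternative="two-sided")
print(f"U = {u_stat}, p = {u_p:.4f}")
```

With these invented data, every post-test score exceeds its pretest counterpart, so the Wilcoxon statistic is 0 and the effect size falls in Cohen's "large" band (r > 0.5).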

5 Results and Discussion

In this section, we begin by describing the performance of both groups of students (experimental and control) on the two assessment tasks at pretest. We then compare these pretest results with those obtained at post-test, exploring differences between groups so as to analyze the impact of the instructional module.

5.1 Performance at Pretest

With respect to the first research question, Table 2 shows the results obtained by both groups at pretest for each of the LPLs, linked to the corresponding items of the two assessment tasks (breastfeeding and school lunch).

Table 2 Results obtained by the two groups at pretest on each of the assessment tasks, according to levels of the learning progression

The data in Table 2 show that the two groups of students started out with a very similar level of argumentation competence. A significant difference was only observed in relation to two LPLs, providing evidence (LPL 0c) in the breastfeeding task (U = 1048, p = 0.023; r = 0.220) and constructing a claim (LPL 0a) in the school lunch task (U = 1228, p = 0.044; r = 0.196). In both cases, the difference was due to better performance overall in the control group, although the effect sizes were small.

At pretest, students achieved only intermediate or low performance levels (50% of maximum possible or less) on the majority of LPLs. This reflects the findings of Zhao et al. (2021), who similarly noted that preservice science teachers struggle to construct or identify (evaluate) arguments, suggesting that they are not well-prepared to teach argumentation in the classroom. Consistent with this view, Capkinoglu et al. (2021) recently concluded that preservice science teachers require at least a few hours of explicit formal instruction in argumentation, as was provided to the experimental group in the present study, and we would argue that instruction should ideally be framed within the context of learning progressions such as that employed here (Osborne et al., 2016). In this respect, the pretest performance of our PEST may be considered consistent with the degree of difficulty implied by the different levels of the learning progression (Osborne et al., 2016). In line with the results obtained by Cebrián-Robles et al. (2018), constructing a claim (LPL 0a) appears to be the easiest level for students to achieve, followed by providing evidence (LPL 0c) and constructing a warrant (LPL 1a). It is worth noting, however, that most responses to question SL1.3, which required students to construct a warrant (LPL 1a), were based on an “it’s what the experts recommend” type of argument (performance level 2) (Jiménez-Aleixandre, 2010), rather than on the use of evidence that was then connected to the claim (performance level 3). This is illustrated by the following response, corresponding to a student in the experimental group: “The first option, because this diet will have been studied by professionals who know what is best for children’s development”.

To explore the influence of scientific knowledge at pretest, we compared responses to questions corresponding to the same LPL across the two assessment tasks: breastfeeding (scientific knowledge required) and the school lunch program (domain-specific knowledge not necessary). Results were very similar for the questions that required students to construct a claim (LPL 0a), probably because this is the lowest level of the learning progression. Although providing evidence (LPL 0c) is considered a more complex task (Brocos & Jiménez-Aleixandre, 2020), the results in this case were again similar across the two tasks. When the question required students to provide an alternative counter argument (LPL 1d), both groups performed better in the task that did not necessitate domain-specific knowledge (i.e., school lunch program). As Mason and Scirica (2006) note, scientific knowledge helps to generate counter arguments, and hence one would expect students to produce lower quality counter arguments on a task that requires such knowledge, prior to instruction on the issue in question. Finally, there were no significant differences between the two tasks on questions that required students to provide a counter-critique (LPL 2a) or construct a one-sided comparative argument (LPL 2b). This is likely due to the level of complexity that these questions imply, insofar as they both require the integration of two elements of argumentation (constructing and critiquing). According to Osborne et al. (2016), constructing an argument from its different elements is easier than critiquing another person's argument, which is a more abstract task.

These results from the pretest assessment support the need for an instructional module that introduces students to the process of constructing an argument and which also includes strategies such as role play that require critical thinking skills.

5.2 Analysis of the Impact of Instruction

To address the second research question, we analyzed the level of argumentation competence achieved by students in the experimental group following their participation in the instructional module (posttest), and compared the outcomes with those observed in the control group at the same time-point. Below, we present the results obtained from the analysis of responses to the breastfeeding task (Table 3) at pretest and posttest and for both groups.

Table 3 Results from the analysis of responses to the breastfeeding task before (pretest) and after (post-test) application of the instructional module in the experimental group, and comparison with controls

It can be seen in Table 3 that students in the control group showed no significant post-test improvement in performance on any of the LPLs. Consequently, in the rest of this section, we will focus solely on the results obtained in the experimental group.

Based on the results obtained in the experimental group, the instructional module had a considerable impact on students’ argumentation competence: of the eight LPLs analyzed in the breastfeeding task (Table 3), performance improved significantly following instruction on seven (five with a moderate effect size and two with a small effect size), the only exception being providing a counter-critique (LPL 2a).

Constructing a claim (LPL 0a) is the lowest level in the learning progression and, as we have seen, students’ performance in this respect was generally already good at pretest. This would account for the small effect size of improvement following the instructional module.

A greater difference was observed with regard to providing evidence (LPL 0c), with students progressing from performance level 1 (no evidence provided and answer based on the experience of non-health professional family or friends) to level 5, indicating that their responses included evidence that was connected to the claim and which was also scientifically sound. This is recognized as being a complex ability (Bravo-Torija & Jiménez-Aleixandre, 2018); indeed, Sandoval and Millwood (2005) found that students were aware of the need to cite data and use evidence but often did not cite sufficient evidence to support their claims. By way of illustration, consider the following pretest and post-test responses given by the same student:

“From watching what women around me do when they're breastfeeding.” (pretest, performance level 1)

“Babies need adequate nutrition during the first months of their life, so baby-led feeding means that the baby will take in the amount of milk they need when they need it. The baby will learn to eat according to need, which is important for avoiding weight problems in adulthood. Babies feed at their own pace, and it’s not helpful to establish a timetable. When they’re smaller they feed more slowly and get tired in the process, so with a schedule they might not have enough time to take in enough milk and feel full, and then they’ll need to feed again a short time later. Also, this approach can prevent babies from feeling anxious if their cries of hunger are not responded to.” (post-test, performance level 5)

Although identifying evidence (LPL 0d) is a slightly more complex ability than providing evidence (LPL 0c), the results again indicated a significant improvement (with a moderate effect size) in students’ performance. This could be due to the content of the instructional module (Fig. 1), in which several activities required students to identify different elements of an argument.

Regarding the LPLs that imply both constructing and critiquing of arguments, we observed a significant improvement (with a moderate effect size) in students’ performance in relation to providing an alternative counter argument (LPL 1d), providing a two-sided comparative argument (LPL 2c), and constructing a counter claim with justification (LPL 2d). They also improved with respect to constructing a one-sided comparative argument (LPL 2b), although in this case the effect size was small. Providing a counter-critique (LPL 2a) was the only LPL of this kind for which no significant differences were observed following the instructional module. It should be highlighted that when it came to constructing a counter claim with justification (LPL 2d), students were able to provide a different argument to that of either Andrea or Marina (see question B6 in Supplementary material, part 1), drawing on more kinds of evidence than were included in the source text and, in the majority of cases, reaching performance level 3 (out of 4) of the rubric. This contrasts strongly with their responses at pretest, where most of them provided an argument that was similar to Marina’s or which was incorrect from a scientific point of view (performance level 1). These results support the idea that the more information or data, especially of a scientific kind, that is made available to students (in this case, through instruction on the SSI of breastfeeding), the more convincing and persuasive their arguments will be (Capkinoglu et al., 2021). It should be noted, however, that our students did not reach the highest performance level for this LPL (level 4), as despite providing a different argument to the ones they were asked to consider, they did not explain why their argument was better.

The absence of a significant improvement in students’ ability to provide a counter-critique (LPL 2a) may be due to how the corresponding question was worded, insofar as they were asked to produce a counter argument to Marina’s (see question B4 in Supplementary material, part 1), but not explicitly to critique it. As regards the small effect size obtained for the improvement in students’ performance in constructing a one-sided comparative argument (LPL 2b), this may again have to do with the wording of the corresponding question. Specifically, students were asked which of two arguments (one in favor of baby-led breastfeeding, the other in favor of the scheduled approach) they considered to be the strongest, and at both pretest and post-test the majority limited themselves to arguing in support of the opinion that they agreed with, rather than critiquing the argument with which they disagreed (Kuhn et al., 2017).

5.3 Transfer of Argumentation Competence

To address the third research question, we analyzed whether the level of argumentation competence acquired by students in the experimental group following instruction on the SSI of breastfeeding was transferred to another context that did not require domain-specific knowledge to construct or critique arguments (school lunch task), and we compared the results with those obtained by controls. Below we present the results obtained from the analysis of responses to the school lunch task (Table 4) at pretest and post-test and for both groups.

Table 4 Results from the analysis of responses to the school lunch task before (pretest) and after (post-test) application of the instructional module in the experimental group, and comparison with controls

As in the case of the breastfeeding task, there were no significant differences between the pretest and post-test performance of students in the control group on the school lunch task. Consequently, the following analysis of transfer of argumentation competence is based solely on the results obtained in the experimental group.

Table 5 summarizes the findings for each LPL, indicating for each of the assessment tasks the effect size of significant differences between pretest and post-test performance.

Table 5 Summary of results for each LPL when comparing the pretest and post-test performance of the experimental group on the two assessment tasks

It can be seen in Tables 4 and 5 that of the nine LPLs analyzed in the school lunch task, a significant improvement in performance was observed following the instructional module for eight of them (five with a moderate effect size and three with a small effect size).

The impact observed was generally greater for LPLs related to constructing an argument (i.e., providing evidence [LPL 0c], constructing a warrant [LPL 1a], and constructing a complete argument [LPL 1c]) than for LPLs corresponding to critiquing (i.e., identifying a claim [LPL 0b] and identifying a warrant [LPL 1b]). These results support the notion that critiquing an argument is a more difficult task than constructing one (Osborne et al., 2016). The exception to this pattern of results was the small effect size obtained for the improvement in performance in constructing a claim (LPL 0a), although as in the case of the breastfeeding task, this is likely due to the high performance level that students already had at pretest.

In relation to constructing a warrant (LPL 1a), students progressed from using an “it’s what the experts recommend” type of argument (performance level 2) (Jiménez-Aleixandre, 2010) to providing evidence and connecting it to their claim (performance level 3). Similarly, prior to the instructional module many students were unable to identify a warrant (LPL 1b) as such, that is to say, they identified the kind of backing used in the argument but not the actual warrant linking it to the claim (e.g., in response to question SL3, they only stated that Christine’s argument was related to costs; performance level 2). Following the instructional module, however, the majority of students were able to identify the warrant (performance level 3). As Syerliana, Muslim, and Setiawan (2018) point out, justification requires not simply a suitable backing (scientific, social, economic, ethical knowledge, etc.) but also a warrant that connects this backing to a claim.

Regarding the ability to construct a one-sided comparative argument (LPL 2b), many students at pretest expressed support for Joanne’s argument without explaining why it was better than Christine’s (question SL5, performance level 1). In the post-test assessment, however, the majority not only expressed agreement with Joanne’s argument but also indicated why it was the stronger of the two they were asked to consider (performance level 2).

Finally, and in relation to the LPLs that involve both constructing and critiquing arguments, we observed a significant improvement (with a moderate effect size) in students’ ability to provide an alternative counter argument (LPL 1d) and to construct a one-sided comparative argument (LPL 2b). These improvements may be due to the content of the instructional module (Fig. 1), in which several activities were linked to these two LPLs. Conversely, there was no significant improvement in students’ ability to provide a counter-critique (LPL 2a). As we argued when discussing the results for the breastfeeding task, this could be due to how the corresponding question was worded, insofar as students were not explicitly asked to critique the argument they had been presented with (see question SL4 in Supplementary material, part 2).

6 Conclusions

The results obtained lead us to draw the following conclusions, which we will present according to the three research questions:

RQ 1: What is the initial (pretest) level of argumentation competence among students in the experimental and control groups?

  • Students in both groups initially achieved only intermediate or low performance levels (50% of maximum possible or less) on all LPLs, with the exception (on both assessment tasks) of constructing a claim, on which many students reached the highest level.

  • The two groups of PEST started out with very similar levels of argumentation competence. Significant differences were only observed for two LPLs: providing evidence (LPL 0c) in the breastfeeding task and constructing a claim (LPL 0a) in the school lunch task. In both cases, the difference reflected better overall performance in the control group, although the effect sizes were small.

  • Scientific knowledge appears to have been most relevant in the case of questions associated with level 1 of the learning progression (LPL 1a-d), those involving warrants or justifications. By contrast, there were no significant differences between the two assessment tasks (one requiring scientific knowledge, the other not) on questions linked to the lower level (LPL 0a-d) or the higher level (LPL 2a-d) of the learning progression, most likely because these questions involve a lower and a higher level of complexity, respectively.

  • The pretest performance of our PEST may be considered consistent with the degree of difficulty implied by the different levels of the learning progression (Osborne et al., 2016).

RQ 2: What level of argumentation competence is observed in the experimental group following their participation in the instructional module (post-test), and how does this compare with that observed in the control group at the same time-point?

  • Students in the control group showed no improvement in performance on any of the LPLs between the two assessment points for either task. This suggests that without explicit instruction, students will not develop their argumentation competence, not even on tasks that do not require domain-specific knowledge.

  • In the case of the experimental group, the instructional module had a considerable impact on their argumentation competence, insofar as significant pretest/post-test differences were observed for all of the LPLs analyzed, with the exception (on both assessment tasks) of providing a counter-critique (LPL 2a).

In addition to demonstrating how formal instruction can develop the argumentation competence of PEST in relation to the different levels of the learning progression proposed by Osborne et al. (2016), our use of assessment rubrics provided a further level of detail about the specific kinds of improvement that were achieved. For instance, we were able to observe how many students initially provided no evidence in support of their claim or simply relied on an “it’s what the experts recommend” type of argument, whereas after instruction they were able to incorporate one or more pieces of evidence and connect it to their claim. The data obtained through the rubrics also highlighted the importance of domain-specific knowledge when constructing arguments about SSI such as breastfeeding.

RQ 3: To what extent, if at all, is the argumentation competence acquired by students following the instructional module on the SSI of breastfeeding (experimental group) transferable to another context that does not require domain-specific knowledge to construct or critique arguments, and how do these outcomes compare with those observed among controls?

  • The argumentation skills that students in the experimental group acquired during the instructional module, focused on the SSI of breastfeeding, were found to be transferable to a different context (the school lunch program). Of the nine LPLs analyzed in the school lunch task, a significant improvement in performance was observed following the instructional module for eight of them.

This suggests that by contextualizing instruction around a SSI (Sadler & Zeidler, 2005), in this case breastfeeding, students were helped not only to develop their argumentation competence in science but also to transfer their new skills to another everyday context and issue that did not require domain-specific knowledge to construct or critique arguments.

This research has gone a step further in the sense of basing this instruction on validated learning progressions, which we consider to be a novel contribution to the field of argumentation teaching. Using a learning progression in combination with various teaching strategies to design an instructional argumentation module helped PEST to improve their argumentation skills (with the exception of some specific levels, such as providing a counter-critique) and to transfer these skills to another context not used during teaching. Overall, the results obtained support our initial premise and the view of various authors (de Sá Ibraim & Justi, 2016) regarding the importance of providing PEST with formal instruction in argumentation and basing this instruction on validated learning progressions.

7 Strengths, Limitations, and Implications

In this study, we sought to design an instructional module based on a previously validated learning progression for scientific argumentation (Osborne et al., 2016), with the latter also serving as the framework for assessing the impact of instruction on the argumentation competence of PEST. Although we were able to design teaching activities covering all levels of the learning progression, assessing students’ performance was somewhat more challenging. In this respect, a limitation of the present study is the fact that the two assessment tasks did not include questions for exactly the same LPLs, the rationale being that to do so would make the assessment too long and time consuming for students. Consequently, although we were able to compare students’ performance in relation to several of the key elements of the learning progression described by Osborne et al. (2016), a more complete analysis that considers all its levels is now required. That said, the assessment tool did cover all three of the broad levels of the learning progression (0, 1, and 2), and it would appear to be suitable for evaluating them. Indeed, the rubrics were specifically designed with the aim of providing greater analytical detail about students’ performance in relation to each of these levels, and the nature of the results obtained suggests that this goal was achieved. We therefore believe that the assessment tool used here is a useful complement to the learning progression described by Osborne et al. (2016) and that it can help to provide a more precise evaluation of the extent to which students are able to achieve the different levels.

With respect to the difficulties in constructing and critiquing arguments reported in the literature, we consider that the teaching module used in this research has, according to the results obtained, contributed to improving the skills of PEST in both dimensions of argumentation, although to a greater extent in the case of construction. We therefore consider it important that argumentation teaching modules include activities such as debate through Kialo or role-playing, which make it possible to address the more advanced levels of critique in the learning progression of Osborne et al. (2016). Likewise, we suggest addressing both dimensions in an integrated way across the sequence of learning activities, devoting special attention at the beginning of the sequence, as was done in this module, to the identification of argumentative elements, that is, the first levels of the critique dimension, which is usually the most complex.

We consider that the instructional module described in this study addresses the need to engage preservice science teachers in learning activities that can enhance their argumentation competence (Boyer, 2016). Presumably, an improvement in the abilities and facility of PEST with scientific argumentation and their reliance on scientific evidence carries with it the prospect of their being able to meaningfully translate this understanding to their classrooms and support elementary students in constructing scientific claims and arguing from evidence (McNeill & Knight, 2013). Therefore, the next step, as Zembal-Saul (2009) describes, would be for them to carry this forward into their future professional practice and apply what they have learned to build the scientific reasoning skills of their own students.

A task for future research would be to apply our instructional module and assessment tool in the context of a different SSI to that considered here so as to provide further support for their utility in developing and evaluating the argumentation competence of preservice science teachers. It should also be noted that the activities which form part of our instructional module were designed to address different levels of the learning progression described by Osborne et al. (2016). Consequently, the decision as to which activities might be employed in future applications will depend on the educational level of students and the specific learning progression levels that one wishes to address, that is to say, it is not mandatory to follow the sequence of activities described in the present study. Finally, future studies might also consider adapting the activities used here so as to address other learning progression levels and then analyze the impact on students’ argumentation competence.