1 Introduction

There is increasing interest in teaching probability at different educational levels, given that recent research suggests that even young children develop ideas of probability with adequate instruction (Batanero & Álvarez-Arroyo, 2023). Probability is an area with the same relevance as calculus and algebra; however, it is the only branch that deals with uncertainty, leading to a new type of reasoning (Sharma, 2015). It applies many mathematical methods and has connections to proportionality, combinatorics, functions, and mathematical logic (Van Dooren, 2014). Another peculiarity of probability is that some conclusions are not validated deductively, as they are sometimes based on the analysis of statistical data. The existence of propositions (such as the law of large numbers), given with a probability, is also specific (Batanero & Borovcnik, 2016).

Probabilistic knowledge is essential for students, because of the existence of randomness in many daily situations, such as reacting to media reports or making decisions when facing risk (Gigerenzer, 2015). In addition, a basic understanding of probability is also needed to apply statistical inference, which is useful in varied fields, and simulation makes its study now possible for all educational levels (Bargagliotti et al., 2020).

To respond to this need, probability is introduced in primary education in Spain and continues being studied in secondary and high schools with conditional probability, Bayes' theorem, and probability distributions (Ministerio de Educación y Formación Profesional, 2022). However, teaching is based almost exclusively on learning definitions and properties and solving textbook problems (Muñiz-Rodríguez et al., 2016). It is unusual to confront students with real problems in which they must apply their probabilistic literacy and reasoning (Álvarez-Arroyo et al., 2022a).

However, developing students’ adequate probabilistic literacy and reasoning requires previous education of teachers in this component of mathematical knowledge and related didactic knowledge. Although research on teachers’ knowledge to teach probability is increasing, we have not found investigations that assess teachers’ probabilistic literacy and reasoning in everyday situations or their related didactic–mathematical knowledge.

The main objective of this paper is to present the design and administration of a questionnaire to a group of prospective secondary school mathematics teachers (PTs), asking them to interpret a report on traffic accidents taken from the media. Specifically, three research questions guided the research: 1) Do prospective teachers use adequate probabilistic reasoning and literacy to solve probabilistic tasks related to the news report? 2) Are they capable of analysing the probabilistic content of the tasks? and 3) Can they predict the students’ difficulties in solving the same task?

Below we present the previous research, theoretical framework, and results for these questions.

2 Previous research

We summarise research dealing with probabilistic reasoning and literacy, and with teachers’ knowledge to teach probability directly related to this paper.

2.1 Probabilistic reasoning and literacy

Most previous research on probabilistic reasoning and literacy (see Section 3.1) has been linked to decision-making under uncertainty and was developed under the heuristics and biases programme (Kahneman et al., 1982), summarised in Gilovich et al. (2002), where heuristics are intuitive strategies in problem solving. This research described different heuristics, such as representativeness (expecting that a small sample will have the main characteristics of the population) or availability (judging events that are easy to remember as more probable). In the fallacy of composition (Chernoff et al., 2018), individuals incorrectly generalise to a whole (e.g. compound events in flipping several coins) a property of a part (equiprobability of events when flipping a single coin).

Other reasoning biases appear in conditional probabilities (Díaz & de la Fuente, 2007; Falk, 1986), which are involved in the task posed in this study. One of them is the fallacy of the transposed conditional, which consists of confusing a conditional probability \(P(A|B)\) with its transposed \(P(B|A)\) without understanding the asymmetry of conditional probabilities (Borovcnik, 2016). Another bias is confusing conditioning and causation. Díaz & de la Fuente (2007) investigated these biases in a wide sample of university students who correctly solved probability problems. The pervasiveness of these biases has produced recent research that analysed different visualisations that might help students overcome these biases (described in Batanero & Álvarez-Arroyo, 2023).

The probabilistic reasoning of high school students in sampling tasks is supported by their understanding of variability, independence, and randomness (Sánchez & Valdez, 2017). Moreover, their reasoning about distributions is based on the ideas of sample space, combinatory, classical and frequentist approaches to probability, random variables, distribution, and variability (Sánchez et al., 2018).

Álvarez-Arroyo et al. (2022a, b) analysed high school students’ probabilistic reasoning when interpreting news reports. They reported that many students presented the transposed conditional bias and confused conditioning and causation when interpreting conditional probabilities (Álvarez-Arroyo et al., 2022b). Other students failed to compute compound probabilities and interpret small probabilities (Álvarez-Arroyo et al., 2022a).

2.2 Teachers’ knowledge to teach probability

Previous research on teachers’ probability knowledge has mainly focussed on probabilistic knowledge of specific concepts. For example, Chernoff et al. (2018) described the fallacy of composition in Canadian secondary school PTs’ understanding of randomness. Gómez et al. (2016) and Vásquez and Alsina (2015) studied Spanish primary school PTs and Chilean in-service teachers’ mathematical knowledge of the different meanings of probability, as did Ingram (2022) with secondary mathematics and science PTs. The general conclusion was the lack of capacity to relate the different meanings of probability. In Hourigan and Leavy (2020), Irish primary school PTs demonstrated an understanding of probability fairness, but some of them presented representativeness bias. Díaz et al. (2012) analysed the understanding of conditional probability of secondary school PTs and found a good capacity to compute probability, but a high incidence of transposed conditional bias, confusion between conditional and compound probability, and incorrect conceptions of independence. Similar biases were reported by Brückler and Milin Šipuš (2023) in PTs from Croatia, adapting items from Díaz and de la Fuente (2007) to the context of the COVID-19 pandemic.

We can classify research on teachers’ didactic knowledge into different components (see Section 3). Concerning the knowledge of students and specialised knowledge, Gómez et al. (2013) analysed primary school PTs’ knowledge to teach the frequentist meaning of probability. Although participants recognised the correct and incorrect answers of students and could explain the reasons for their errors, they identified few mathematical objects in the task when asked to do so.

Martin et al. (2022) investigated the attitudes and beliefs of in-service teachers and found agreement in the social usefulness and interest of probability in education and in the improvement of teachers’ attitudes after teaching probability. Regarding knowledge of curriculum, teachers in Martin et al. (2022) agreed on the relevance of the theoretical approach to probability, the main probabilistic concepts to teach, and the use of traditional tasks.

Concerning the knowledge of teaching resources, in-service primary school teachers successfully adapted games of chance to be used in teaching (Malaspina & Malaspina, 2020) and constructed fair and unfair games for specific random generators (Hourigan & Leavy, 2020). However, some primary school PTs in Alonso-Castaño et al. (2021) posed problems that were inadequate for their students. In-service secondary school teachers used various tasks, manipulative resources, and technology to exploit the frequentist approach to probability, and there was agreement on the value of technological tools in teaching (Martin et al., 2022). Moreover, simulation activities educated teachers on mathematical and technological knowledge (Biehler et al., 2018).

In summary, research related to the current study has focussed on heuristics and biases in decision making under uncertainty, students’ probabilistic reasoning in different situations, and isolated components of teachers’ knowledge. Such research has not considered the probabilistic literacy and reasoning of teachers when interpreting media news or their related didactic knowledge.

3 Theoretical framework

We base our study on the ideas of probabilistic literacy and reasoning and the Teacher Didactic Mathematical Knowledge model described below.

3.1 Probabilistic literacy and reasoning

Gal (2005) defined probability literacy as “the knowledge and dispositions that students may need to develop to be considered literate regarding real-world probabilistic matters” (p. 40). According to the author, it implies appropriate use of probability language and competence to compute probabilities in varied situations, and it helps people understand probability statements in situations such as interpreting news reports. A probabilistically literate person appreciates the role of probability in different contexts and can pose critical questions when encountering information about random situations.

Probability literacy also involves a basic knowledge of main probability concepts, which includes the fundamental stochastic ideas (Heitele, 1975): 1) Probability in its different interpretations (classical, frequentist, subjective); 2) Sample space (set of observable events in the random experiment); 3) Equidistribution (observing that in some experiments all the possible outcomes have the same probability); 4) Addition rule, to compute the probability of a compound event as the sum of the probabilities of the simple disjoint events that compose it; 5) Independence and product rule, which help solve probabilities in compound experiments; 6) Conditional probability, which formalises the change in our degree of belief in a random event with new information; 7) Combinatorics, or techniques to compute the number of elements in the sample space; 8) Random variable, which introduces the study of probability distributions, such as the normal model; 9) Law of large numbers, which connects statistics and probability, and is the base for the frequentist meaning of probability; and 10) Sampling, which serves to generalize information from samples to populations.

Probability literacy also implies a critical stance towards probabilistic information; understanding and controlling one’s own feelings regarding uncertainty (Lilleholt, 2019); overcoming mistaken beliefs; and holding positive attitudes towards probability.

In this study, we also assess probabilistic reasoning, which arises when solving probability problems and using arguments to prove a probabilistic assertion (Sánchez & Valdez, 2017). Following Ben-Zvi and Garfield’s (2004, p. 7) definition of statistical reasoning, we can see probabilistic reasoning as how people reason with probability ideas and make sense of probability information. It requires judgments to establish the credibility of evidence (whether the information is sufficient and relevant to the problem) and its inferential strength (whether such information can be generalised) (Schum, 2001).

According to Borovcnik (2016), probabilistic reasoning includes competence in choosing the probabilistic model to apply in each situation, ability to discriminate between conditioning and causation, and correctly interpreting probabilistic statements. Other components of probabilistic reasoning (Batanero & Borovcnik, 2016) are:

  1. Becoming aware of the influence of prior probabilities on probability judgements. Realising that many probabilities depend on other (prior) probabilities and must be considered.

  2. Recognising the asymmetry of conditional probabilities. While independence is a symmetric relationship, conditional probability is not.

  3. Correctly interpreting small probabilities. When an event has a negligible probability, we usually assume that it will not occur. However, when the experiment is repeated many times, the event tends to occur (Burns et al., 2010).
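
The third component in the list above can be illustrated numerically. The following is a minimal sketch, assuming independent repetitions of the experiment with a constant probability p; the function name, the value p = 0.001, and the numbers of repetitions are hypothetical choices made only for illustration and are not taken from the cited works.

```python
def prob_at_least_once(p: float, n: int) -> float:
    """Probability that an event of probability p occurs at least once
    in n independent repetitions (assumption: independence, constant p)."""
    return 1.0 - (1.0 - p) ** n

# Hypothetical values chosen only for illustration.
p = 0.001
for n in (1, 100, 1000, 10000):
    print(n, round(prob_at_least_once(p, n), 4))
```

Under these assumptions, the probability of observing the event at least once grows with the number of repetitions, even though the probability in a single trial remains negligible.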

3.2 Teacher didactic-mathematical knowledge

Extensive research has focused on the education of mathematics teachers (e.g., Carrillo et al., 2018; Llinares, 2023), and frameworks have been suggested to describe the components of teachers’ knowledge.

We used the Didactic-Mathematical Knowledge (DMK) model (Godino, 2009; Pino-Fan & Godino, 2015; Pino-Fan et al., 2015), which is divided into three dimensions: mathematical, didactic, and metadidactic. The mathematical dimension refers to mathematical knowledge “per se”, the didactic facet includes the knowledge needed to teach a given topic, and the metadidactic dimension allows the teacher to identify and manage the norms and contextual restrictions that condition the teaching of mathematics (Breda et al., 2017).

First, we focus on PTs’ mathematical knowledge related to probabilistic literacy and reasoning. According to the DMK model, teachers need Common Content Knowledge (CCK), which is shared by teachers and students and allows teachers to solve the problem-situations of the educational level at which they teach. To solve the tasks posed to them, the PTs in the sample should use their probabilistic literacy and reasoning skills, which are part of their probabilistic CCK.

Teachers also need Advanced Content Knowledge (ACK) of the mathematical concept being studied at a more advanced level; this knowledge provides the teacher with the mathematical foundations necessary to link the mathematical object to other mathematical or extra mathematical ideas, and to the same topic in more advanced courses. In the paper, we centre on the teachers’ ability to solve problems related to the probability of a complementary event and conditional probability, as well as the critical reading of probability information. Such knowledge appears in the Spanish secondary school curriculum guidelines and should be taught to high school students and therefore is part of teachers’ CCK.

The DMK model considers the following facets of didactic knowledge: epistemic (specialised mathematical knowledge or mathematical knowledge specific to teaching); ecological (knowing the relation of the topic with other themes in the curriculum and with society); cognitive (understanding the students' learning, difficulties and reasoning); affective (managing students and own attitudes, beliefs and emotions); mediational (familiarity with teaching resources and technology) and interactional (managing the classroom discourse).

This study focuses on the epistemic and cognitive components. The epistemic facet includes specific mathematical knowledge that allows teachers to use different mathematical representations and procedures, various arguments and justifications, and identify the knowledge at play when solving a mathematical task (Pino-Fan et al., 2015). To evaluate part of the PTs’ knowledge in this facet, we asked them to identify the fundamental stochastic ideas in a task they had previously solved.

The cognitive facet consists of knowledge about the students’ reasoning and learning and helps teachers predict their possible solutions to mathematical tasks. It also serves to foresee students’ difficulties and misconceptions while solving mathematical problems. To evaluate part of this knowledge, we asked the PTs to predict their students’ difficulties when solving a task that they had previously solved.

4 Method

4.1 Sample and setting

In Spain, any person who wants a mathematics teacher position in a public or private secondary (students aged from 12 to 16 years) or high school (16 to 18 years) must complete a university degree (4 years of university studies). Afterward, they must complete a specific master’s programme designed to provide graduates with didactic competencies, curricular knowledge, and teaching practice; this programme is also compulsory to obtain a position in private secondary and high schools.

The sample consisted of 66 PTs enrolled in the master’s programme, which included all participants in the academic year 2021–2022 at the University of Granada. Half of the participants had completed a university degree in mathematics, and the remainder had undertaken other scientific degrees (e.g., statistics, physics, architecture, or engineering). All of them had taken one or several university courses in advanced statistics, including probability and inference. Their probabilistic knowledge was homogeneous; therefore, we did not separate the results by the participants’ career. These participants had been taught fundamental statistical ideas in a previous lesson and were given a summary of Heitele (1975) for consultation. Furthermore, they had previously worked on identifying the mathematical content and students’ errors in other tasks.

4.2 Questionnaire

The PTs individually completed a questionnaire as part of a course unit on statistics education, which included the identification of students’ possible difficulties and the design of additional probabilistic tasks using other news reports. In this paper, we analyse the questions displayed in Fig. 1, which are based on a news report about traffic accidents published in Spanish media. The first six questions were adapted from Álvarez-Arroyo et al. (2022b), who selected the topic to foster students’ interest in statistics and make them aware of the possible effects of alcohol and drug consumption.

Fig. 1 Questionnaire given to participants

In this study, we posed the same tasks to the participants to evaluate their probabilistic CCK and investigate our first research question. To evaluate the epistemic facet of their DMK and answer the second research question, we added question 7, in which participants were asked to analyse the mathematical content of the task. Finally, to evaluate the cognitive facet of their DMK, we added question 8, in which they were asked to identify students’ potential difficulties in the task (research question 3). Once the written reports with the participants’ solutions to the different questions were collected, we performed a content analysis (Neuendorf, 2016). The coding was reviewed by the authors until a consensus was reached in case of disagreement.

4.3 Categories of responses to the mathematical questions

To answer the first six questions, which evaluate their CCK (first research question), the PTs should read the report and understand the questions posed. Let us consider the events: D = “the person died”, TP = “testing positive for alcohol or drugs”, TN = “testing negative”, and S = “wearing a seatbelt”. Then, the problem data are: \(P(TP|D)=0.43\) and \(P(\overline{S}|D)=0.23\), where \(\overline{S}\) (not wearing the seatbelt) is complementary to S. Participants should identify these data to answer questions 1 to 5. In Question 6 they should provide a reasoning that requires critical reading of the report.

We describe below the categories of responses in questions 1–6, which develop those used by Álvarez-Arroyo et al. (2022b). Table 6 with examples of responses to different categories is presented in the Appendix (participants have been coded as Px).

4.3.1 Probability of the complementary event

In Question 1, participants should compute the conditional probability \(P(TN|D)=0.57\), which is complementary to \(P(TP|D)\). In Question 2, we demand the probability \(P(S|D)=0.77\), complementary to \(P(\overline{S}|D)\). In questions 1 and 2, we classified the responses as follows: R1) Correct answers; R2) Confusing the probability requested and its complement; R3) Indicating that there was not enough data in the report to compute the probability; and R4) Other arithmetical errors or confusing the data.
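
The arithmetic behind the expected answers to questions 1 and 2 can be sketched as follows. This is a minimal illustration using only the two values given in the report; the function and variable names are our own hypothetical labels.

```python
def complement(p: float) -> float:
    """Probability of the complementary event, under the same condition."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a probability must lie in [0, 1]")
    return 1.0 - p

p_tp_given_d = 0.43      # P(TP|D), given in the report
p_not_s_given_d = 0.23   # P(not S|D), given in the report

p_tn_given_d = complement(p_tp_given_d)    # P(TN|D), Question 1
p_s_given_d = complement(p_not_s_given_d)  # P(S|D), Question 2

print(round(p_tn_given_d, 2), round(p_s_given_d, 2))  # prints 0.57 0.77
```

Both results remain conditional on the event D; the complement is taken within the group of people who died, not within all drivers.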

4.3.2 Conditioning and causation

Questions 3 to 5 test the ability to distinguish a conditional probability and its transposed and to differentiate between conditioning and causation, two components of probabilistic reasoning included in the participants’ CCK. The requested probabilities are \(P(D|TP)\) in Question 3, \(P(D|TP)\) and \(P(D|TN)\) in Question 4, and \(P(D|S)\) and \(P(D|\overline{S})\) in Question 5. These probabilities should be obtained using Bayes' theorem; however, this is not possible because the report omits some data, specifically, the probabilities of consuming alcohol or drugs, wearing a seatbelt, and dying in both cases. Thus, PTs should notice the missing data and justify that a conditional probability does not imply cause and effect.
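
To see why the data in the report are insufficient, the sketch below applies Bayes' theorem, P(D|TP) = P(TP|D) · P(D) / P(TP), keeping P(TP|D) = 0.43 fixed and varying the two missing quantities. The pairs of values for P(D) and P(TP) are hypothetical and invented only for illustration, since the report gives neither; the function and scenario names are likewise our own.

```python
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0.0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b

p_tp_given_d = 0.43  # P(TP|D), the only relevant figure in the report

# HYPOTHETICAL scenarios: (P(D), P(TP)) do NOT appear in the report.
scenarios = {"A": (0.01, 0.05), "B": (0.01, 0.20)}

for name, (p_d, p_tp) in scenarios.items():
    p_d_given_tp = bayes(p_tp_given_d, p_d, p_tp)
    print(name, round(p_d_given_tp, 4))
```

With the same P(TP|D), the two invented scenarios yield different values of P(D|TP), and neither equals 0.43. This is the asymmetry that the transposed conditional bias overlooks.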

We classified the responses in the following way: R1) Correct answer, where the PT distinguishes conditional probability and its transposed, as well as conditioning and causation; R2) Offering missing information but not clearly pointing to the difference between conditional probability and cause-effect; R3) Applying contextual knowledge without indicating which data were missing to solve the problem, not clearly demonstrating that the PT distinguished conditioning and causation; R4) Distinguishing a conditional probability and its transposed, with no reference to conditioning and causation; R5) Distinguishing conditioning and causation but confusing a conditional probability and its transposed; and R6) Confusing conditional probability with its transposed and mixing up causation and conditioning.

4.3.3 Identifying missing data

The correct response to Question 6 implies identifying the information missing in the report to answer the different questions. It involves posing critical questions, which is part of statistical literacy (Gal, 2005), and interpreting probabilistic statements, a component of probabilistic reasoning (Borovcnik, 2016). Both components are part of the teachers’ CCK. We coded the answers as follows: R1) Correct responses, providing the missing data; R2) Correct responses not providing the missing data; R3) Offering irrelevant information; R4) Suggesting that the information is incorrect because we deal with a sample; R5) Pointing to the need for public awareness on the issue, without reference to the problem data; and R6) Not questioning the information or providing confusing information.

4.4 Categories of responses to didactical questions

4.4.1 Knowledge in the epistemic facet

In Question 7, related to the epistemic facet (research question 2), the PTs should analyse the mathematical content of questions 1–6 they had previously solved and identify the fundamental ideas (Heitele, 1975) involved in its solution. An a priori analysis of the task revealed the following fundamental ideas, which were used as categories for this question:

  1. Probability. All the questions proposed involve computation or comparison of some probability. There is no equiprobability in the different events intervening (for example, wearing or not a seatbelt). Consequently, it is not possible to apply the Laplace rule, and the probabilities should be estimated from the frequency information about the different events.

  2. Sample space. Participants should identify the different events in the random experiments intervening in each question: consumption or not of alcohol or drug; wearing or not a seatbelt; and dying or not.

  3. Equidistribution, for each elementary event intervening (for example, each of the drivers who tested positive in drug or alcohol consumption). However, the compound events in the situation (e.g., testing positive, testing negative) are not equiprobable.

  4. The addition rule does not appear explicitly but intervenes in questions 1 and 2 through the probability of the complementary event. Similarly, although no question is directly related to the product rule, we indirectly identify this rule when computing each conditional probability because we deal with compound experiments.

  5. Dependence and independence of events should be analysed because the likelihood of dying is different depending on wearing a seatbelt and drug consumption.

  6. Conditional probabilities are requested in the task, where the PTs should identify the event and the condition and not confuse a conditional probability with its transposed or causation.

  7. The law of large numbers appears indirectly because we estimate the probability using the frequentist approach.

  8. Sampling, because although the report refers to all traffic accidents, it can be seen as a sample of accidents in other years.
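
The frequentist estimation mentioned in the list above (items 1 and 7) can be illustrated with a short simulation. This is a minimal sketch, assuming independent Bernoulli trials generated with a fixed seed; the value 0.43 is used only as an illustrative "true" probability, and the function name and sample sizes are hypothetical choices.

```python
import random

def relative_frequency(p: float, n: int, seed: int = 0) -> float:
    """Relative frequency of an event of probability p in n simulated
    independent trials (assumption: Bernoulli trials, fixed seed)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n

p = 0.43  # used here only as an illustrative 'true' probability
for n in (10, 100, 1000, 100000):
    print(n, relative_frequency(p, n))
```

In such a simulation the relative frequency typically fluctuates for small n and stabilises near p for large n, which is the behaviour that the law of large numbers describes.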

4.4.2 Knowledge in the cognitive facet

To assess the cognitive facet of the PTs' DMK, in Question 8, we asked the participants to identify the foreseeable errors of high school students when solving questions 1–6. We built the categories of analysis by considering the difficulties that high school students revealed in Álvarez-Arroyo et al. (2022b). These categories were classified into four groups: a) Formulating the problem; b) Conceptual understanding; c) Application of procedures; and d) Interpreting the results. Table 7 with examples of responses to different categories is presented in the Appendix.

FE. Errors in formulating the problem. These errors appear when the report is incorrectly read, there is incorrect identification of the data provided, or faulty translation of the questions into mathematical problems (mathematisation of the questions). This may happen when: FE1) The student might not understand the questions posed or not identify the intervening variables or events; FE2) The problem is not solved because not all the information needed was available; and FE3) There is difficulty in progressing from the verbal statement of the task to developing a mathematical problem.

CE. Conceptual errors. Solving the task requires identifying and applying different concepts, and students might confuse some of these concepts. These conceptual errors were classified as follows: CE1) Not recognising the conditional probability, since in the task this probability is defined verbally instead of symbolically; CE2) Confusing an event and its complementary in questions 1 and 2; CE3) Not distinguishing a conditional probability and its transposed or misapplying Bayes’ theorem in questions 3 to 5; CE4) Confusing dependence and independence of the events “testing positive in alcohol or drugs”, “wearing the seatbelt”, and “dying in the accident”; and CE5) Equating conditioning with causation.

PE. Procedural errors. The procedures required to reply to the different questions are mostly reduced to computing the probabilities of the complementary event. Potential errors in these procedures were classified as follows: PE1) incorrect computation of probability, PE2) incorrect computation of proportion or percentages, and PE3) misapplication of the addition or product rules.

RE. Reasoning and interpretation of results. An objective of the task was to increase the critical attitude of PTs when facing incomplete probabilistic information or interpreting counterintuitive results. Potential errors in this interpretation are: RE1) Misinterpreting counterintuitive results; RE2) Probabilistic reasoning biases, such as confusing condition and causation; and RE3) Lacking a critical attitude to face missing or biased information.

5 Results

Below we present the results for each of the three research questions.

5.1 Teachers’ probabilistic reasoning and literacy

To answer research question 1, the responses involving teachers’ CCK (Godino, 2009; Pino-Fan et al., 2015) are classified into three groups: computing the probability of a complementary event, discriminating between causation and conditioning, and identifying missing data.

5.1.1 Probability of a complementary event

Table 1 presents the results obtained by computing the probability of a complementary event. Most responses to the first two questions were correct; thus, in general, there was a good understanding of the question statement and the participants correctly identified the data in the news report. In addition, the probability of the complementary event was correctly computed, and its application was recognized by the PTs.

Table 1 Percentages of responses to questions 1 and 2

A small group of participants repeated the data in the report by incorrectly interpreting the question posed and confusing a probability with its complementary (R2). They were unable to identify the type of probability intervening in the situation and thus missed a component of probability literacy (Gal, 2005). Another failure in statistical literacy was shown in response R3, which implies an uncritical reading of the information, since both the number of drivers killed (232) and the percentage of those who tested positive (43%) were provided in the report. Other errors were related to improper calculation of the percentage by considering an incorrect total.

These results were similar to those of high school students in Álvarez-Arroyo et al. (2022b), although high school students produced a slightly higher number of errors in Question 2.

5.1.2 Conditioning and causation

Table 2 presents the results for questions 3 to 5. The percentage of correct responses (R1), where the participant rightly argued that the question could not be answered due to lack of information and pointed to some missing data required to solve the question, was small. However, the partly correct responses (categories R2 to R5) totalled 39.4%, 28.8%, and 22.7%, respectively, in the three questions.

Table 2 Percentages of responses to questions 3, 4 and 5

Globally, the results in these questions were poor because of the existence of biases affecting the participants’ probabilistic reasoning (Borovcnik, 2016) that they might transmit to their future students. In all three questions, a high percentage of participants (more than half) explicitly reasoned according to the transposed conditional bias (Falk, 1986) (R5 and R6). These participants assumed that the requested probability was given as data in the problem, because they confused a probability \({\mathrm{P}}\left({\mathrm{D}}|{\mathrm{TP}}\right)\) with its transposed \({\mathrm{P}}\left({\mathrm{TP}}|{\mathrm{D}}\right).\) The report provides the probability that a person who died tested positive, and the PTs interpreted this data as the probability that a person who had consumed drugs or alcohol died (with similar reasoning regarding seatbelt use). Moreover, these PTs attributed a relationship between cause and effect to the conditional probability.

Only 19.7% to 36.4% of participants explicitly distinguished between conditioning and causation, and understood that there might be other causes of death apart from the information provided in the report (categories R1 and R5). Finally, we found PTs with both biases (R6) or that provided other incorrect responses. The biases in conditional probability reasoning confirm those found in other prospective teachers by Díaz et al. (2012), where 40% of prospective teachers reasoned according to the transposed conditional bias in a multiple-choice item and by Brückler and Milin Šipuš (2023). Other incorrect responses included suggesting that the data were based on a sample and were not reliable, or confusing conditional and compound probability.

The percentage of PTs in our sample with biases in conditional probability reasoning was similar to that reported by Álvarez-Arroyo et al. (2022b) with high school students.

In other words, the results suggest that these biases persist despite the participants’ extensive training in probability; a possible explanation is that formal teaching of probability at university does not address these misconceptions.

5.1.3 Identifying missing data

Table 3 reproduces the results for question 6, which evaluates the critical interpretation of probabilistic statements by PTs (another component of probabilistic literacy; Gal, 2005). The results in this question were better. Although only 45.5% of the PTs correctly identified the missing data needed to solve the questions posed (R1), another 21.2% answered correctly without specifying those data: they were aware that the information was incomplete but did not state what information was needed.

Table 3 Percentages of responses to question 6

Adding these two categories, 66.7% of the PTs provided correct responses, revealing a critical capacity to interpret probabilistic information in this question. The participants critically analysed the information in the report, correctly interpreted the probability data, and indicated some missing data, thus applying all of these characteristics of probabilistic reasoning.

It is worth noting that 15.2% of PTs suggested information that was irrelevant and would not help solve the problem. A small percentage offered, as their criticism of the information provided by the media in the task, the need for greater awareness of the risk of driving after substance consumption or without a seatbelt; although this response is irrelevant from the point of view of probability, it is valuable for the education of PTs. Finally, 6% did not question the information or did not reply. The participants in category R4 suggested that the information could not be trusted because it was obtained from a sample. However, even if the data were obtained from a sample, the sample size was large enough to ensure a good estimate of the probabilities involved.

In Álvarez-Arroyo et al. (2022b), high school students showed weaker probabilistic reasoning in this question, since only 23.7% and 17.1% of them replied correctly or identified the missing data. In addition, 30.3% of them suggested irrelevant information, 7.9% pointed to the need for public awareness, and 11.8% did not reply. Moreover, no participant in our study relied on the authority principle to interpret the data, as did 7.9% of students in Álvarez-Arroyo et al. (2022b). In summary, the PTs in the sample revealed a higher level of critical reading of the information than those students.

5.2 Knowledge in the epistemic facet

To answer research question 2, Table 4 presents the frequency of each of the fundamental stochastic ideas (Heitele, 1975) identified by the PTs in response to question 7. Because each participant generally provided several fundamental ideas, the sum of frequencies was higher than 66 (the sample size). On average, 3.1 fundamental ideas were cited per participant, which is a small number compared with our a priori analysis (Section 4.4).

Table 4 Frequency and percentage of participants identifying different fundamental ideas (question 7)

Ideas easily identified by the PTs were, in decreasing order of frequency: probability, sample space, conditional probability, and sampling; however, not all of the participants identified these ideas despite their relevance in the task. Less frequent was the understanding that dependence and independence or the addition rule were involved in solving the task. Even if participants implicitly applied these ideas to answer the questions, they could not recognise them explicitly when we asked them to.

Other ideas correctly quoted, but to a much lesser extent (frequencies from 15 to 1), were random variable, the product rule, equidistribution, law of large numbers, and combinatorics. Thus, all the ideas of our a priori analysis (Section 4.4) were identified, but the number of PTs who quoted them was negligible. With regard to ideas incorrectly identified, because they do not intervene in the situation, five PTs suggested simulation, one correlation, and another variance.

In summary, we observed scarce knowledge in the epistemic facet of probability in the PTs in our sample. These results coincide with those of Gómez et al. (2016) and Vásquez and Alsina (2015), whose participants also showed limited competence in identifying the mathematical content in probability tasks.

5.3 Knowledge in the cognitive facet

To answer research question 3, Table 5 presents the frequency of students’ potential errors identified by the PTs in the task solution (question 8). Because each participant provided several potential difficulties, the sum of frequencies was higher than 66 (the sample size).

Table 5 Frequency and percentage of potential errors identified by participants in the task solution (question 8)

Conceptual errors were cited most frequently, although only a third of participants argued that students might not recognise the conditional probabilities in the task. Fifteen PTs warned that students might fail to distinguish a conditional probability from its transpose or might misapply Bayes’ theorem, and a few of them proposed that the students might not appreciate this dependence or mentioned the confusion between conditioning and causation.

Errors in interpreting the problem statement or in its formulation were the second most frequently cited category. Most often, these PTs indicated that the students might not understand the statement or might not identify the intervening variables or events; other PTs answered that the students might not reply because they considered that not all the information needed was available. Finally, a few participants noticed the difficulty of progressing from the verbal statement of the task to a mathematical problem.

Regarding procedural errors, some PTs alluded to incorrect computation of probabilities, and a few reported that there might be incorrect computation of proportions or percentages or misapplication of the sum or product rules.

The data provided in the report could suggest ideas contrary to students’ knowledge and experience of traffic accidents. Students need probabilistic reasoning (Borovcnik, 2016) to interpret these results correctly, and some of them may fail. However, only a few participants pointed to the difficulty of interpreting the results: three of them argued that it was difficult to work with counterintuitive information. Most participants, therefore, did not pay attention to the critical stance towards probabilistic information, which is a component of probabilistic literacy (Gal, 2005) and was needed to solve the task. A few participants pointed to reasoning biases related to conditional probability, and only four recognised that the lack of a critical attitude towards missing or biased information might be an obstacle for students, even though probabilistic reasoning requires sound judgement to establish the credibility of the evidence (Schum, 2001).

Finally, 30 incorrect responses pointed to errors that did not apply in the tasks, such as confusing the sample and population or incorrectly solving equations.

In summary, we observed poor knowledge of PTs in the cognitive facet, given the small number of difficulties predicted (2.3 on average). Again, our results agree with those of Gómez et al. (2016) and Vásquez and Alsina (2015), whose participants also showed poor competence in recognising students’ errors in probability tasks.

6 Discussion and conclusions

This study investigates the probabilistic reasoning and literacy of a sample of prospective secondary school teachers when interpreting news from the media. Their capacity to identify the fundamental stochastic ideas needed to solve the task, and their ability to predict their students’ potential difficulties when solving the same task were also analysed.

Regarding the first research question, although it is not possible to generalise to other teachers, as we worked with a non-random sample, the results of our study indicate that most participants in the sample put into practice their probabilistic knowledge and reasoning, which are part of their probabilistic CCK. These participants showed their knowledge of probabilistic language and competence in identifying the problem data and computing probabilities, particularly in the first two questions posed to them.

A substantial part of the sample correctly interpreted probabilistic statements and chose the probabilistic model to be applied in the situation in the other questions. In fact, and according to Sánchez and Valdez (2017), their probabilistic reasoning was evidenced when solving a non-routine probability problem and using arguments to prove the truth of a probabilistic assertion. Moreover, in some questions some participants made efficient use of contextual knowledge, which they used to compensate for the lack of data and as a support to discriminate between conditioning and causation, two additional components of probabilistic reasoning, according to Borovcnik (2016). All of these results were better than those of high school students in Álvarez-Arroyo et al. (2022b), and predictable, given the high formal knowledge in probability of participants in our study.

The study outcomes, however, point to important shortcomings in the participants’ probabilistic reasoning, with a notable proportion of them not recognising the difference between conditioning and causation (questions 3 to 5) and incorrectly interpreting probabilistic statements, two basic components of probabilistic reasoning suggested by Borovcnik (2016) and Batanero and Borovcnik (2016). Also noteworthy is the confusion of a conditional probability with its transpose, which has already been shown in previous research with students (e.g., Díaz & de la Fuente, 2007; Falk, 1986) and prospective teachers (Brückler & Milin Šipuš, 2023; Díaz et al., 2012). The percentage of PTs in our sample with biases in conditional probability reasoning was similar to that reported by Álvarez-Arroyo et al. (2022b) with high school students. This prevalence of biases despite the participants’ extensive training in probability raises a concern about the formal teaching of probability at university.

Finally, some participants lacked the capacity to make judgements to establish the credibility of evidence in the report, another component of probabilistic reasoning (Schum, 2001). This capability is not formally taught to students, but teachers need it to teach probability adequately.

In response to the second and third research questions in this study, the didactic knowledge of these prospective teachers in both the epistemic and cognitive facets was poor, even after master’s degree training. The responses to questions 7 and 8 showed the participants’ limited competence in identifying the fundamental stochastic ideas (Heitele, 1975) in the task they had previously solved and their difficulty in predicting the potential errors of their students. Similar results were also found by Gómez et al. (2013) and Vásquez and Alsina (2015) in primary school prospective teachers.

Consequently, we should reflect that high mathematical preparation alone neither ensures the competence teachers need to teach probability adequately nor suffices to overcome their probabilistic reasoning biases. A limitation of the teaching experiment was that the final questions in the task did not sufficiently help develop the epistemic and cognitive facets of teachers’ knowledge; we therefore advise more practice in performing epistemic and cognitive analyses in the education of teachers.

These results open a line of research to develop new tasks such as the one analysed in this paper, which complement traditional problem solving in teaching and teacher education. On the one hand, these activities make visible the usefulness of probability in everyday life while reinforcing probabilistic reasoning; on the other hand, they make teachers aware of their own reasoning biases. These tasks should be complemented with questions that address the remaining didactic facets (affective, interactional, mediational, and ecological) and the meta-didactic knowledge in teachers’ education. Simulation of the situations analysed in the tasks might also help improve the probabilistic reasoning and didactic knowledge of prospective teachers (Biehler et al., 2018).