1 Introduction

Teleological explanations are causal explanations that refer to a purpose or a goal (Ariew, 2007; Brock & Kampourakis, 2023; Kampourakis, 2020; Lennox, 1992; Walsh, 2008). For instance, when we say that a person is going to the bookstore in order to buy a book, we are explaining their actions in terms of an intention. Or when we say that animals have hearts in order to pump blood, we explain the presence of hearts in terms of their function in the body. When students say that organisms change their features in order to adapt to their environments, they explain the changes they observe in terms of a perceived need. Finally, we can say that we design airplanes with long wings and powerful engines in order for them to be able to fly, describing the design underlying their features. What matters is that in all of the aforementioned explanations, there is a common feature: something exists or happens in order for something else to occur. All such accounts are teleological explanations of different kinds, but not all of them are scientifically legitimate.

Research in psychology has found that teleological explanations are used in different contexts by people of different cultural backgrounds (e.g., Kelemen, 1999; Kelemen et al., 2005; Kelemen et al., 2013; Rottman et al., 2017; Schachner et al., 2017). Therefore, teleological explanations seem to be due to a cognitive bias that emerges in childhood and persists into adulthood, and so, it is important to consider the legitimacy of teleological explanations in teaching. Most importantly, teleological explanations are not necessarily wrong. However, the adjective “teleological” has been extensively used in the science education literature to describe students’ misconceptions, especially in biology (e.g., Kampourakis & Zogza, 2008; Lennox & Kampourakis, 2013; Trommler & Hammann, 2020). Therefore, it is useful to consider which teleological explanations are legitimate and which are not, rather than reject them altogether.

Teleological explanations describing the intentions of conscious agents are perfectly legitimate. So, when we say that we go to the bookstore in order to buy a book, or that we design airplanes with specific features in order for them to be able to fly, we describe our own intentions to achieve a goal, and therefore, we explain why we did whatever we did in order to achieve that goal. These teleological explanations based on intentions and design are legitimate as an account of our behavior or of the construction or use of artifacts. However, it has been found in research that these kinds of explanations are often extrapolated to nature, because people—especially young children—perceive the features of organisms and of natural objects in the same way that they perceive the features of artifacts (Kelemen, 2012).

Teleological explanations are quite prevalent in biology where they are often indicated by phrases such as “in order to,” “so that,” and “for the sake of.” Whereas in the past teleological explanations were considered to be illegitimate in biology, philosophers have long shown that some kinds of teleological explanations in biology can be legitimate (e.g., Lennox, 1993; Brandon, 1981). A careful analysis of teleological explanations shows that there are different kinds, with some being based on design and intentions and others being based on natural selection (Lennox & Kampourakis, 2013). When it comes to science teaching and learning, legitimacy is determined not by the structure of the explanation itself but by the underlying causal account; that is, legitimacy depends on whether the causes referred to match causes in the world. Legitimate teleological explanations reflect underlying causal structures, even if the causes are not referred to directly. By contrast, illegitimate teleological explanations cite as causes entities that do not bring about the phenomena of interest, or in some cases even reverse causality. Teleological explanations based on design or need are considered illegitimate in biology, whereas those based on natural selection are considered to be legitimate. So, among the examples mentioned above, it is legitimate to say that animals have hearts in order to pump blood, because we thus explain the presence of hearts in terms of their function in the body. In short, something exists because of what it does, and so, we can say that it exists for doing it. In contrast, saying that organisms change their features in order to adapt to their environments is an illegitimate teleological explanation because it accounts for the changes observed in terms of a perceived need (Kampourakis, 2020).

In contrast to biology, teleological explanations are a relatively understudied topic in physics (Brock & Kampourakis, 2023). Therefore, in the present paper, we contrast teleological explanations in biology and physics textbooks. The large quantity of text in the sample required a systematic and exhaustive analysis, so we adopted text-mining methods. Our analysis is based on a corpus consisting of eight English science textbooks. The English context was chosen because England is a country where national examination boards work with publishers to produce textbooks which are widely used in schools. The widespread use of standard texts means that the explanations in the examination board books potentially influence a large cohort of students’ understanding of scientific ideas and are, therefore, worthy of study. The search was carried out by defining a list of phrases that marked potential teleological statements. Our intention in this analysis is not to generalize about the prevalence of forms of teleological explanations in science textbooks but to identify patterns in a sample of legitimate and illegitimate teleological explanations and to reflect on the usefulness of the text-mining approach in the context of science education research.

2 Theoretical Background

2.1 Legitimate and Illegitimate Teleological Explanations in Physics

Whilst their legitimacy in the context of biology education has been acknowledged (Kampourakis, 2020; Lennox & Kampourakis, 2013), until recently, discussion has suggested that teleological explanations in the context of physics education are likely to be illegitimate (Trommler & Hammann, 2020). In physics, the entities of interest are often inanimate objects or abstract concepts which do not have agency, and therefore, teleological explanations implying their intentions are often considered illegitimate. However, we have recently argued that there are cases of legitimate teleological explanations in physics education (Brock & Kampourakis, 2023). Our argument draws on Lange’s (2017) model of constraint-based teleology. Lange observes that some explanations are legitimate even when they do not specify a causal agent. Legitimate causal explanations can highlight constraints that make an outcome inevitable, without directly referring to a cause. Consider the claim that a “… compact star will shrink to minimize the total energy, eventually collapsing to a black hole” (Schaffner-Bielich, 2020, pp. 86–87). The active cause in the star’s collapse, gravitational force, is not referred to. Instead, the author cites a universal constraint, the conservation of energy, to indicate a necessary final state, the reduction in total energy of the system. In legitimate constraint teleological explanations, the end state required by a constraint can be implied by referring to the constraint itself. For example, in proposing the existence of a force, “… a force called radiation-reaction force, must be present in order to avoid violating the conservation law of energy” (Cornille, 2003, p. 388), an end state in which the final energy of a system is equal to its initial value is implied. Lange (2017) argues that constraints of this form can indicate which outcomes are inevitable and which are impossible and are a legitimate explanatory form.

A condition on constraints is that they must be axiomatic and invariant across contexts (Lange, 2017). We therefore propose that teleological explanations that draw on laws, such as the law of conservation of energy in the example above, are legitimate because they rely on universal constraints. By contrast, teleological explanations that invoke rules or laws that are only true in some situations and lack the invariance required of constraints are not legitimate. For example, consider the explanation of a rotating object: “…for there to be a resultant force towards the centre…, the frictional force must increase” (OCR, 2019, p. 28). In this case, the change in frictional force is explained by the need for a centripetal force, a condition that is only found in rotating systems. The requirement for a resultant force acting towards the center of rotation is not a necessary condition; it is not present in all systems; hence, this form of teleology, we argue, is illegitimate. Some examples of legitimate and illegitimate teleological explanations (Brock & Kampourakis, 2023), with an indication of the reason for their legitimacy, are shown in Table 1.

Table 1 A summary of the application of teleological explanation in the context of physics (from Brock & Kampourakis, 2023)

Searching for the examples in Table 1 to support our argument for the existence of legitimate teleological explanations in physics raised the question of how frequent such explanations are in science textbooks. Table 1 was generated in an unsystematic fashion, from spontaneous notes from our reading and from an unstructured search. To develop a robust analysis of the frequency of different types of teleological explanations, we developed a text-mining approach.

2.2 Text Mining as a Means to Analyze a Large Corpus

Human language is a structured communication system based on grammar and vocabulary, with the distinctive feature of being compositional. We can combine or rearrange words to form new sentences with little effort. It is also referential, allowing us to refer to people, objects, or events from the past or future. To understand human language, however, we must overcome several challenges related to its peculiarities (Tsourakis, 2022). For instance, when we write, we tend to omit a lot of common-sense knowledge, assuming that the reader possesses it. The inherent ambiguity of natural language cannot be resolved without the proper context. Consider, for example, the word break, which can be interpreted as a pause from doing something, but it can also refer to a personal or social separation. Besides lexical ambiguity, we can encounter syntactic ambiguity, as in the following phrase: The fish is ready to eat. Is the fish ready to be fed, or can we eat the fish now? Similarly, in the case of identifying irony and sarcasm, positive or negative words can connote the opposite of their normal meaning, such as yeah, sure. Even worse, human texts can contain stereotypes and biases that prohibit their use in systems for the general public. Finally, when dealing with languages that lack the necessary linguistic resources, such as data, tools, and language technologies, we may face constraints that hinder our ability to analyze relevant material.

Linguistics is the main field for studying human languages and applying scientific methods to questions about their nature and function (Department of Linguistics, The Ohio State University, 2022). In recent decades, computational approaches to linguistic questions have gained ground, giving rise to text mining, which refers to the processes of extracting useful and relevant information from unstructured textual data (Tsourakis, 2022). Text mining’s ability to identify patterns, relationships, and insights within this data source makes it an attractive alternative to manual human scrutiny for quickly processing large quantities of text. It also has the advantage of being less biased than manual methods as it relies on data-driven insights rather than human intuition or subjectivity. With the exponential growth of digital content, text-mining techniques allow for a more comprehensive understanding of science education texts, including teaching materials and research papers. For instance, researchers might identify the underlying themes or topics within a large corpus, find terms and concepts of interest, extract the attitudes and opinions of writers in a particular area, and so forth (e.g., Byeon et al., 2021). However, text mining can lack contextual understanding and, thus, lead to incorrect interpretation of nuances in a corpus. Often, domain-specific knowledge is needed to analyze scientific texts accurately, meaning algorithmic techniques can misclassify phrases. Having humans involved in the analysis process provides a means to interpret textual data categorized automatically, generating insights that can lead to informed decisions. We use this hybrid analytical approach in this paper.

2.3 Text Mining in Science Education Research

A recent systematic review searched publications for studies that used text mining in the context of science and mathematics education (Shin & Shim, 2021). Shin and Shim identified 64 articles that applied the technique between 2010 and 2019, 41 in the context of science education. The two most common uses for the technique in science education were for modeling students’ cognition by analyzing their responses to probes for research purposes and in automated assessment. The authors found only two studies which used text mining for the same end as us, documentary analysis. Wahlberg and Gericke (2018) used text mining to analyze how protein synthesis was described in Swedish secondary textbooks. Reitsma et al. (2012) used text mining to examine the alignment of US curriculum documents from different standard authoring bodies. In addition to the research identified by Shin and Shim (2021), one additional study applied text-mining techniques in the context of science education. Jiang and McComas (2014) analyzed popular science texts for the inclusion of content related to the nature of science. In reflecting on the application of text mining to the context of science education, Jiang and McComas (2014) concluded:

The successful application of the text mining technique in the current study opened a new branch for science education research, which invites more applications of such technique on the analysis of other aspects of science textbooks, popular science writing, or any other materials involved in science teaching and learning. (p. 1804)

Our study extends the so far limited application of text mining in science education research to the examination of teleology and considers the affordances and limitations of the technique. We ask two questions: In what contexts do physics and biology textbook authors make use of legitimate and illegitimate teleological explanations? To what extent is text mining a useful approach for identifying instances of legitimate and illegitimate teleological explanations in school science textbooks?

3 Methods

3.1 Textbook Sample

We approached publishers of school science textbooks in England with a request for electronic copies of textbooks used to teach physics and biology curricula to 11–18-year-old students. In response, we received a set of eight textbooks suitable for the analysis from a single publisher (Table 2). The books form a convenience sample—our intention is not to draw general conclusions about textbook authors’ use of teleology; rather, we aim to report usage in the case of books from one publisher and to reflect on the potential of the text-mining approach. The texts included biology and physics textbooks focused on preparation for external examinations at age 16 (AQA Physics and Biology - GCSE) and for examination at age 18 (AQA Biology and Physics A-level and Advanced Biology/Physics for You). In the English system, in many schools, one teacher (who may be a specialist in biology, chemistry, or physics) teaches content across all three disciplines to students aged 11–14 years old. The sample included two textbooks (Activate 1 and 2) with biology, chemistry, and physics content aimed at 11–14-year-old students. Given that our focus is on the legitimacy of some forms of teleology in biology and physics, the analysis mainly focuses on text from the biology and physics books, with a smaller number of sentences drawn from the general science books (which include biology and physics topics).

Table 2 Summary of the eight texts analyzed (B, biology; P, physics; S, science. Note, in the paper, the books are referred to by their codes, B1, B2, etc.)

3.2 Using Text Mining to Identify Teleological Explanations

For this exploratory study, we analyzed the convenience sample of three physics, three biology, and two science textbooks (see Table 2), using the AntConc corpus analysis software (Anthony, 2022). Researchers and linguists use this software to analyze and explore large corpora of texts, as it supports several languages and can handle texts in various formats, including plain text, HTML, XML, and PDF. In our case, all textbooks were available in PDF format. After specifying a list of pertinent terms for teleology, we loaded the whole corpus into the tool. Then, we performed a Key-Word-In-Context (KWIC) search for each term, providing concordance results. The search results display a keyword surrounded by a context of several words on either side. This technique allows us to see how words and phrases are commonly used in a corpus of texts. Initially, the phrases we looked for were the following: “in order to”; “in order that”; “for the sake of”; “to this end”; “for this reason”; “to achieve this”; “and so”; “as”; “as a result of”; “because”; “due to”; “hence”; “since”; “so that”; “therefore”; and “thus.” We used the maximum context size offered by the tool, equal to 25 tokens on the left of the search term and 25 tokens on the right. In addition, the software provides color highlighting for the term and the surrounding text to enhance readability. All results were saved into an Excel file for further processing.
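The KWIC search described above can be illustrated with a minimal Python sketch. It is not the implementation used by AntConc: the function names `tokenize` and `kwic`, the simple regular-expression tokenizer, and the lowercasing of the text are assumptions made for illustration. The sample text is adapted from two textbook excerpts quoted later in this paper, and only three of the search phrases are shown.

```python
import re

# Three of the search phrases listed in the text (a subset, for brevity).
PHRASES = ["in order to", "so that", "to achieve this"]

def tokenize(text):
    """Split text into lowercase word tokens (a simplification, not AntConc's tokenizer)."""
    return re.findall(r"\w+(?:[-'’]\w+)*", text.lower())

def kwic(text, phrase, window=25):
    """Return (left context, keyword, right context) for every occurrence of phrase."""
    tokens = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    hits = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + n:i + n + window]
            hits.append((" ".join(left), " ".join(target), " ".join(right)))
    return hits

# Sample text adapted from excerpts quoted in this paper.
sample = ("The active wastes are dealt with in order to reduce the hazards. "
          "The block swings so that the centre of mass rises a vertical distance of 0.15 m.")

for phrase in PHRASES:
    for left, key, right in kwic(sample, phrase, window=5):
        print(f"{left} [{key}] {right}")
```

Because the sketch matches whole tokens rather than substrings, a short term such as “as” is not found inside words like “mass” or “wastes.” The default window of 25 tokens corresponds to the maximum context size reported above.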

In the next step, the first coder examined the search results and classified the excerpts surrounding the search phrases into four categories: legitimate teleology, illegitimate teleology, not teleological, and unclear. To facilitate the annotation task, we used dropdown menus in Excel with predefined options based on these categories. A second coder then checked the first classification, and disagreements were resolved through discussion. To limit the number of terms reported in the tables below, we set a criterion of removing terms for which 5% or fewer of the instances were categorized as legitimate teleology in either the biology or physics books.
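The inclusion criterion can be expressed as a simple proportion check. The sketch below is illustrative only: the function names `legitimate_share` and `meets_criterion` are hypothetical, the counts are the physics figures reported in Sect. 4.1, and the check is applied to one subject at a time because the text does not spell out how the biology and physics proportions were combined.

```python
from fractions import Fraction

# (legitimate, total) counts for three terms in the physics books, as reported in Sect. 4.1.
physics_counts = {
    "to achieve this": (4, 4),
    "in order to": (35, 36),
    "so that": (142, 199),
}

def legitimate_share(legit, total):
    """Exact proportion of instances coded as legitimate teleology."""
    return Fraction(legit, total)

def meets_criterion(legit, total, threshold=Fraction(5, 100)):
    """Keep a term only if more than 5% of its instances are legitimate.

    Terms at 5% or below (and terms with no instances) are removed.
    """
    return total > 0 and legitimate_share(legit, total) > threshold

for term, (legit, total) in physics_counts.items():
    share = float(legitimate_share(legit, total))
    print(f"{term}: {share:.1%} legitimate, kept={meets_criterion(legit, total)}")
```

Using exact fractions avoids floating-point rounding at the boundary, so a term with exactly 5% legitimate instances is reliably excluded.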

4 Results

Our results report data related to the frequency of use of different forms of teleology in the sample textbooks. We first report counts of types of teleological explanations in the physics textbooks in our sample, consider the challenges of categorization and the limitations of text mining, and conclude with a comparison between teleological explanations in the physics and biology textbooks we studied.

4.1 The Prevalence of Teleological Explanations in Physics Textbooks in the Sample

We searched all the sample texts for the phrases mentioned in Sect. 3. The following four phrases were not found in any of the physics textbooks: “in order that”; “for the sake of”; “to this end”; “for this reason.” The number of instances of the other terms that met the 5% criterion for inclusion discussed above and the number of legitimate and illegitimate teleological explanations among them are presented in Table 3. As shown there, except for the phrases “so that” and “in order to,” the number of legitimate teleological explanations is relatively low. It is also interesting that the number of legitimate teleological explanations is always higher than the number of illegitimate ones. Many of these legitimate cases relate not to natural phenomena, but to human intentions or intentional behavior. Therefore, even though the number of legitimate teleological explanations is not high overall, it is nevertheless important that legitimate teleological explanations are used in textbooks. As teleology has often been associated with misconceptions, in our view, the kinds of legitimate teleological explanations that exist in textbooks merit consideration.

Table 3 The number of incidences of phrases in the physics texts (P1, P2, P3) indicating teleological explanations, categorized as legitimate or illegitimate, unclear, and not teleological

Table 3 indicates that, in our sample of physics books, three terms were particularly likely to be associated with legitimate teleology. These are “to achieve this” (4/4 cases), “in order to” (35/36 cases), and “so that” (142/199 cases). No terms were particularly likely to be associated with illegitimate teleology. We attempt to explain the pattern by considering subcategories of teleology. The examples of legitimate teleology in physics textbooks were coded into the subcategories of constraint and intentional teleology (discussed below) by two coders. Illustrative examples of legitimate and illegitimate forms of these two subcategories, drawn from all search terms, are shown in Table 4.

Table 4 Illustrative examples of subcategories of legitimate and illegitimate teleological explanations for different terms in the physics textbooks. Bold text added for emphasis

The examples in Table 4 include two categories of legitimate teleological explanations in physics: legitimate intentional teleology and legitimate constraint-based teleology. The examples of legitimate intentional teleology in the table, which relate to vehicles (the “hence” example) and electrical plugs (using “so that”), refer to the intentions of the designers of objects. Elsewhere (Brock & Kampourakis, 2023), we have argued that an additional form of legitimate teleological explanation in physics arises when constraints are cited as causes. For example, the entry in Table 4 for the term “because” is “The ball loses an equal amount of momentum, because the total momentum is conserved.” The law of conservation of momentum is given as the cause of the ball’s change in momentum, an argument from a constraint. Constraint-based teleology is premised on the assumption that constraints have sufficient necessity; that is, they are invariant across contexts and over time (Lange, 2017). The principle of conservation of momentum in this case is an example of a constraint due to its invariance. Relationships that do not have a sufficient degree of necessity, we have suggested, cannot be legitimately cited as causes. For example, an illegitimate example in Table 4 refers to Ohm’s law: “The p.d. V must also fall exponentially. V = I R, and so I is proportional to V. ∴ The current I must fall exponentially” (P5, p. 307). Ohm’s law is an example of a ceteris paribus law (Cartwright, 1980), that is, not a true law, because the relationship between the variables only holds if some condition is met (temperature remains constant). We have argued that such cases are illegitimate because there are contexts in which the relationship is not true. Amongst the cases of legitimate teleology in physics textbooks, we thus identified two categories: cases that refer to an agent as a cause, and cases that cite a constraint.
In total, in the physics books, we found 171 cases of legitimate constraint teleology and 66 of legitimate intentional teleology. Table 3 shows an interesting pattern: some phrases are more likely to indicate legitimate teleological explanations than others. Phrases that met the inclusion criterion and were only found in legitimate uses in the physics context were “in order to,” “to achieve this,” and “want(s) to” (Table 3). After the cases had been coded as legitimate or illegitimate, they were coded into categories by type of explanation (Table 5).

Table 5 Subcategories of legitimate teleological explanations in physics textbooks by term

For our sample of physics textbooks, some terms are more likely to be linked to one category of legitimate teleology. Consider the case of “due to,” a term that was cut from the main analysis because of its low number of legitimate cases. Both legitimate instances of “due to” are examples of constraint (“Due to conservation of energy, ….” (P2, p. 12) and “…due to the invariance of the speed of light” (P3, p. 501)). In the case of “in order to,” in physics contexts, 34 out of 35 instances were coded as legitimate agential teleology (for example, “the active [radioactive] wastes are dealt with in order to reduce the hazards” (P1, p. 487)). It is interesting to observe that “due to” is more likely to indicate a constraint, whereas “in order to” tends to indicate human action. Science education textbook writers might also note that the terms “as” (6% of cases categorized as illegitimate), “so that” (6%), and “because” (5%) had relatively high proportions of instances in which their use was categorized as illegitimate (though the total counts for “as” and “because” are relatively low). For example, in the case of “as,” a force, friction, is implied to have a goal: “As it is trying to stop things moving, friction always acts parallel to the surfaces in contact and in the” (P3, p. 20). When coding, we found that in the cases of “in order to” and “to achieve this,” the inter-rater agreement was high (97.3% and 100%, respectively) and the likelihood of teleological legitimacy was similarly high (97.3% and 100%, respectively). This agreement can be explained by data in Table 5: both “in order to” and “to achieve this” are very likely to indicate human agency, making them relatively uncontroversial to classify. The two terms that were particularly likely to be associated with legitimate teleology, “to achieve this” (4/4 cases) and “in order to” (35/36 cases), are both terms in which the majority of instances are agential. The relative ease of identifying human action may explain the high proportion of legitimate applications.

4.2 Some Cases of Disagreement in Coding

The use of text mining might imply that instances of particular forms of explanation can be easily searched for, by their association with particular phrases (like those in Table 3). However, our dual coding process emphasizes that the relationship between words and meaning is complex, and raters differed in their interpretation of whether particular claims were teleological or not. Ambiguity in language, subjectivity, and contextual factors contribute to these disagreements (Aroyo & Welty, 2015). Ambiguity arises from multiple meanings and unclear references, subjectivity stems from different perspectives and biases, and contextual factors vary based on cultural background and domain-specific knowledge. For example, in the case of “so that,” the coders were in agreement in about 69% of the cases. This is a relatively low percentage of agreement (an analysis of thresholds of coder agreement suggests that 90% agreement is accepted by all and 80% agreement by many researchers (Neuendorf, 2002)), and normally, the coders would discuss their criteria and recode the data. In contrast, the inter-coder agreement for “in order to” and “to achieve this” was 97.3% and 100%, respectively, and in general, there was high inter-coder agreement. In cases where there was disagreement, we discussed the cases and, in most instances, agreed on the categorization of the disciplinary expert coder. However, we have decided to report the results of this initial coding of phrases including “so that” here (see Table 6), because we found some interesting and consistent patterns of disagreement.
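Percent agreement of the kind reported above is the share of excerpts to which both coders assigned the same category. A minimal sketch follows. The function names and the label lists are hypothetical (they are not data from this study), and the label “U” for unclear is an assumed abbreviation alongside the L, I, and NT codes used in Table 6.

```python
from collections import Counter

def percent_agreement(coder1, coder2):
    """Percentage of items on which two coders assigned the same label."""
    if len(coder1) != len(coder2):
        raise ValueError("both coders must label the same items")
    if not coder1:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(coder1, coder2))
    return 100 * matches / len(coder1)

def disagreement_patterns(coder1, coder2):
    """Count each (coder 1 label, coder 2 label) pair where the coders differ."""
    return Counter((a, b) for a, b in zip(coder1, coder2) if a != b)

# Hypothetical labels for four excerpts (not data from the study).
# L = legitimate, I = illegitimate, NT = not teleological, U = unclear (assumed code).
coder1 = ["L", "I", "NT", "L"]
coder2 = ["L", "L", "NT", "L"]

print(percent_agreement(coder1, coder2))      # 75.0
print(disagreement_patterns(coder1, coder2))
```

Tallying the disagreeing label pairs is one way to surface consistent patterns of disagreement between coders. Note that percent agreement does not correct for agreement expected by chance.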

Table 6 Selected examples of coding differences for “so that” in physics textbooks. Bold text added for emphasis. L = Legitimate, I = Illegitimate, NT = Not teleological

The disagreements between coders in the examples above can be categorized into two groups, which can be thought of as operating in a hierarchy. First, different interpretations of terms determined whether an excerpt was categorized as teleological or not. The conjunction “so that” can be read with two meanings, the first indicating a purpose (The man ate the cake so that his hunger was satisfied) and the second indicating an end state, but one that is not causal, where “so that” is read to mean “in such a way as to” (The leaf fell and landed so that it was perfectly covering the stone). In the case of the final example in Table 6, “The block swings so that the centre of mass rises a vertical distance of 0.15 m” (P5, p. 121, emphasis added), the sentence can be read as a claim that the swing occurs in order for the center of mass to reach a particular height. Alternatively, the “so that” clause can be taken to mean “in such a way that”; that is, the sentence means the mass swings in such a way that the center of mass rises 0.15 m, or simply “The block swings so the center of mass rises a vertical distance of 0.15 m.” Strictly, the meaning of “so that” indicates purpose (Collins, n.d.) and the sentence should be categorized as a teleological case. It can be further categorized as illegitimate (if the intention is ascribed to the block) or legitimate (if agency is attributed to the person who swung the object, but who does not appear in the sentence). In this case, it might be assumed that the author did not intend to ascribe agency to an inanimate object.

Second, constraints are defined as having necessity; that is, they apply over time and across contexts (Lange, 2017). For an explanation to be categorized as legitimate constraint teleology, the conditions cited must have the property of necessity. However, the threshold of necessity required for a condition to be judged as a constraint is unclear. Consider the sentence: “The free electrons in each sphere now spread out, so that the charge is distributed evenly over the surface of each sphere” (P5, p. 289, emphasis added). The end state in this case is a uniform distribution of charge. The conditions that cause that distribution can be interpreted in different ways. First, the uniform distribution of charge can be seen as arising from a variational principle, whereby systems tend to an end state with a local minimum of electrical potential energy. Variational principles related to the conservation of energy can be seen as legitimate constraints (Brock & Kampourakis, 2023). Alternatively, the end state of uniform distribution might be interpreted as contingent on the absence of other local charges, and hence, the condition can be interpreted as lacking the necessity expected of a constraint. Detecting cases of legitimate constraint teleology in the context of physics can be subtle and open to interpretation. Consider, for example, the following case: “The total p.d. across both lamps is 6 V. This is shared between the two lamps, so that each lamp has a p.d. of 3 V across it” (P5, p. 245, emphasis added). One interpretation is that the sentence implies some agency in the circuit; that is, there is an intention behind the distribution of potential difference, which would be illegitimate. Alternatively, the claim can be seen as a form of constraint teleology: the potential differences across the bulbs must add to the supply EMF, due to conservation of energy, a legitimate constraint. We chose to code the sentence as legitimate constraint teleology.
In the context of a falling object, the sentence “Drag and weight may eventually balance so that the projectile falls at its terminal velocity” (P5, p. 48, emphasis added) was interpreted as illegitimate by the first coder as it implies an inanimate object, the projectile, has an intention, and legitimate by the second coder as an instance of constraint teleology. On discussion, it was agreed that whilst the reading of “so that” indicates a condition linked to Newton’s first law of motion, that is, no net force leads to motion at constant velocity, the constraint is not necessary, as there can be periods of acceleration and motion at constant velocity depending on the conditions. The case was agreed as an instance of illegitimate teleology.

Two cases in Table 6 refer to the human senses and allow an application of a category of legitimate teleology from the biological context—that of functions. The disciplinary expertise (or lack thereof) of the coders may underlie the disagreement in this case. Coder 1 categorized both examples as illegitimate, due to the implication that waves travel with the intention of being perceived.

When the waves reach your ears, they make your eardrums vibrate in and out so that you hear sound. (P2, p.180)

The light reflects/transmits off an object that is luminous/non-luminous into your eye so that you see it. (P3, p. 137)

An alternative reading of these cases, preferred by coder 2 and agreed upon, is that they are cases of biological functional teleology, a legitimate form of teleology (discussed above). Such differences in coding suggest that the categorization of some statements can require discussion and may not be easily automated, as we discuss in the conclusion.

4.3 Comparison to the Respective Findings in Biology Textbooks

Teleology in biology is a topic that has been widely researched and has been considered a conceptual obstacle to understanding evolution (e.g., Kampourakis & Zogza, 2009) and heredity (e.g., Stern et al., 2022). However, as explained in Sect. 1, the problem is not teleology and teleological explanations per se, but the underlying design stance (Kampourakis, 2020). Students can legitimately explain that something exists for a purpose or role, insofar as this purpose or role has emerged (evolved) through natural processes such as natural selection. So, the key question is not whether teleological explanations are used in biology textbooks, but rather whether these are legitimate or not. In biology, illegitimate teleological explanations are those that are based on design, intentionality for non-conscious entities, or other anthropomorphic accounts. For instance, a rabbit can run and hide in order to avoid a predator; this is a behavior (intentional or instinctive, it does not make much difference). However, a rabbit cannot change its color in order to conceal itself and hide from the predator, nor can a population of rabbits gradually change their color in order to conceal themselves, unless this occurs through a process of natural selection (there is variation, and some individuals have a survival and reproduction advantage over others).

As in the case of the physics textbooks, for the three biology textbooks we limited the search to terms for which the percentage of legitimate teleological explanations was over 5% across the books (see Table 7).

Table 7 The number of incidences of phrases in the biology texts (B1, B2, B3) indicating teleological explanations, categorized as legitimate or illegitimate, not teleological, and unclear

The sentences containing these phrases were coded independently by two coders, and in all cases, there was more than 95% agreement. The legitimate teleological explanations in Table 7 were classified into five different kinds of cause (see Table 8):

  • Explanations related to human intention, that is, about something that was done (in nature or in the laboratory) in order for humans to achieve something.

  • Explanations referring to a physiological function, that is, something happening within organisms—above the cellular level—that contributed to something or maintained their physiology.

  • Explanations related to a molecular function, that is, something happening within or around cells that contributed to something or maintained their physiology.

  • Explanations citing an adaptation, in most cases with some form of the term “adapt-” itself included.

  • Explanations referring to animal behavior.

Table 8 Subcategories of legitimate teleological explanations in biology textbooks by term

Most legitimate teleological explanations were found to be related to the phrases “in order to” and “so that,” with almost 100% agreement between the two coders. Let us consider the sentences that contained the phrase “in order to” (see Table 9; note that there was one instance of “in order that”), all 110 of which were coded as legitimate teleological:

  • 50 referred to a human intention, that is, about something that was done (in nature or in the laboratory) in order for humans to achieve something.

  • 30 referred to a physiological function, that is, something happening within organisms—above the cellular level—that contributed to something or maintained their physiology.

  • 18 referred to a molecular function, that is, something happening within or around cells that contributed to something or maintained their physiology.

  • 9 referred to an adaptation, in most cases with some form of the term “adapt-” itself included.

  • 3 referred to animal behavior.

Table 9 Examples of subcategories of legitimate teleological explanations in biology textbooks containing the phrase “in order to.” Bold text added for emphasis

The phrase “so that” was also mostly associated with legitimate teleological explanations, although not every instance was legitimate. Overall, we found 150 sentences with “so that,” of which 130 were legitimate teleological, 4 were illegitimate teleological, 12 were non-teleological, and 4 were unclear. Among the 130 legitimate teleological explanations, we found the following categories (Table 10).

Table 10 Examples of subcategories of legitimate teleological explanations in biology textbooks containing the phrase “so that.” Bold text added for emphasis

  • 59 referred to a human intention, that is, about something that was done (in nature or in the laboratory) in order for humans to achieve something (in several cases there were also instructions to students using the textbooks).

  • 34 referred to a physiological function, that is, something happening within organisms - above the cellular level - that contributed to something or maintained their physiology.

  • 29 referred to a molecular function, that is, something happening within or around cells that contributed to something or maintained their physiology.

  • 4 had unclear categorizations.

  • 2 referred to an adaptation, in one case with the term “adapt” itself included.

  • 1 referred to an animal behavior.

  • 1 referred to a constraint of the kind found in physics textbooks.
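The counts reported above for “so that” can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python; the dictionary names and the `shares` helper are our own illustrative choices and are not part of the coding procedure used in the study. Only the counts themselves come from the text.

```python
# Counts reported in the text for biology-textbook sentences containing "so that".
so_that_codes = {
    "legitimate": 130,
    "illegitimate": 4,
    "non-teleological": 12,
    "unclear": 4,
}

# Subcategories reported for the 130 legitimate cases.
so_that_legitimate = {
    "human intention": 59,
    "physiological function": 34,
    "molecular function": 29,
    "unclear categorization": 4,
    "adaptation": 2,
    "animal behavior": 1,
    "constraint": 1,
}


def shares(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


total_sentences = sum(so_that_codes.values())
total_legitimate = sum(so_that_legitimate.values())
code_shares = shares(so_that_codes)
subcategory_shares = shares(so_that_legitimate)

# The subcategory counts should add up to the number of legitimate cases.
assert total_legitimate == so_that_codes["legitimate"]

for label, pct in code_shares.items():
    print(f"{label}: {pct}%")
```

A tally of this kind makes it easy to confirm that the subcategory counts add up to the reported totals, and to express each category as a share of the sentences retrieved.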

Furthermore, among the four sentences in which instances of illegitimate teleology were found, three referred to some kind of intentionality and one to design. Cases of illegitimate teleology in the biology textbooks are presented in Table 11.

Table 11 Examples of subcategories of illegitimate teleological explanations in biology textbooks containing the phrase “so that.” Bold text added for emphasis

Given these results, it is now interesting to compare the findings in the physics and the biology textbooks. Table 12 presents an overview of the results, which reveals some interesting similarities. Legitimate teleological explanations related to “in order to” and “so that,” the two most prevalent terms, occur with approximately the same frequency in both physics and biology textbooks.

Table 12 Summary of the counts and percentages of illegitimate and legitimate teleological explanations related to “in order to” and “so that” in disciplinary textbooks

The findings in Table 12 are interesting from both a theoretical and a methodological point of view. Having made a case for the legitimacy of teleological explanations in physics based on constraints, we find it interesting that the sample physics textbooks include a substantial number of legitimate explanations of this form. Within this small sample, markers of legitimate teleological explanations seem to differ in prevalence between the biology and physics textbooks, with “in order to” being nearly three times as common in biology as in physics, perhaps because of the agential implication of the term. The more neutral “so that,” which can be applied both to agents and constraints, has a similar prevalence across the disciplines. In considering the value of text mining as a research approach, the table presents some grounds for optimism, at least in the context of searching for teleological explanations in textbooks. In the physics, biology, and general science textbooks in the sample, the term “in order to” is associated with a high probability of identifying legitimate teleological explanations. Such findings may act as useful guidance and caveats for science education textbook writers, teachers, and editors. When writing a sentence with the phrases “so that” and “in order to,” authors can take our findings as a prompt that they may be using a teleological explanation of some form and should consider the guidance we have discussed on the legitimacy of various uses. As such, our analysis has led to practically useful advice for those writing scientific explanations.

5 Conclusions

This exploratory study confirms that legitimate forms of teleological explanations are used by the science textbook authors in our samples, and the findings suggest several recommendations for teachers, science textbook writers, and researchers interested in using text-mining approaches. Before considering the implications of our findings, it is worth re-emphasizing the caveat that the conclusions are based on a small set of textbooks from one publisher. Therefore, we will avoid generalizations about the prevalence of legitimate and illegitimate teleological explanations in science textbooks in general. First, we note that, in the set of science textbooks analyzed, both legitimate and illegitimate biological and physical teleological explanations, of a number of types, were found. This suggests a simplistic categorization of teleology as illegitimate, or legitimate only in the contexts of agents, should be discarded. Our discussion of coding indicates judgements of the legitimacy of teleological explanations can be subtle and nuanced. When writing a sentence (or giving a verbal explanation in class) using one of the terms (e.g., “in order to”) linked to both legitimate and illegitimate uses, textbook authors and teachers should reflect on the conditions we suggest for legitimate use. In some cases, for example, in the context of judging whether a constraint is sufficiently necessary, judgements can be subjective and depend on semantic inferences (see the discussion above about the meaning of “so that”). We hope that the identification of markers that are likely to identify legitimate forms of teleological explanation reliably, across contexts (notably, “so that” and “in order to,” in the context of the books analyzed), can be replicated with larger data sets to provide general guidance to authors and teachers.

An aim of our study was to consider the potential of text mining as a research tool in science education. Our exploratory study highlights the potential of the tool, but also suggests that, even when text searching is systematic, reading language relies on subjective interpretations of meaning, so that coding text, at least in the case of teleological explanation, is also somewhat subjective. We found, during the coding process, that the context of phrases was significant for our judgements. For example, consider this fragment: “…the area over which diffusion can take place is greatly increased. Therefore the rate of diffusion is also greatly increased, so that much more of a substance moves in a given time” (B2, p. 15). When considered in isolation, the fragment might be judged a candidate for a legitimate constraint teleological explanation: as the area of some surface is increased, the rate of diffusion across the surface will increase due to the rules of probability. However, by expanding the search, the full passage is revealed to be:

By folding up the membrane of a cell, or the tissue lining an organ, the area over which diffusion can take place is greatly increased. Therefore the rate of diffusion is also greatly increased, so that much more of a substance moves in a given time. (B2, p. 15)

The reference to the function of the intestine suggests the explanation should be categorized as legitimate functional teleology. It is difficult to predict how much text might be required to reliably categorize a fragment. Despite such uncertainties, there is scope for greater automation of the search process we used. For example, the phrase “in order to conserve” might be used to find cases of legitimate constraint teleology related to conservation laws (for example, “in order to conserve energy,” “in order to conserve momentum,” etc.).
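The suggestion of searching for “in order to conserve” can be illustrated with a short pattern-matching sketch. The Python code below is only one possible implementation, not the procedure used in our study: the pattern, the function name `find_conservation_markers`, and the sample sentences are all invented for illustration, and the sentences are not taken from the textbooks analyzed.

```python
import re

# Hypothetical pattern: "in order to conserve" followed by the conserved quantity.
CONSERVE = re.compile(r"\bin order to conserve\s+(\w+)", re.IGNORECASE)


def find_conservation_markers(text):
    """Return (quantity, matched phrase) pairs for each occurrence in text."""
    return [(m.group(1).lower(), m.group(0)) for m in CONSERVE.finditer(text)]


# Invented sentences for illustration only; they are not from the textbooks.
sample = (
    "The trolleys move apart with equal and opposite momenta "
    "in order to conserve momentum. "
    "In order to conserve energy, the ball slows as it rises. "
    "She saved her notes in order to revise later."
)

hits = find_conservation_markers(sample)
for quantity, phrase in hits:
    print(f"{quantity}: {phrase!r}")
```

A search of this kind would also return phrases such as “in order to conserve water,” which may describe a biological function or a human intention rather than a conservation law, so the retrieved sentences would still need to be read and coded.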

Contemporary accounts suggest there is a lack of agreement over criteria for “good” scientific explanations (Alameh & Abd-El-Khalick, 2018). De Regt and Dieks (2005) and Alameh and Abd-El-Khalick (2018) propose a pragmatic approach to explanation that draws on a “toolbox” of different approaches in different contexts. Rather than expecting there to be definitive rules for distinguishing legitimate from illegitimate explanatory approaches, we argue that there is likely to be some scope for interpretation in the classification of cases, and hence, a fully automated approach to text mining may not be possible. Whilst human coding of cases is responsive, it is also time consuming and inconsistent; by contrast, automated coding is rapid and reliable but lacks the flexibility of human intelligence (Heap et al., 2017). A hybrid mining approach (as in our study, which used an automated initial search procedure to create a manageable data set for human coding) might have the benefits of both approaches. Text mining can, by reporting patterns of current usage, support the development of recommendations for textbook authors and teachers writing instructional material. In the case of teleology, for example, our search suggests that authors and teachers writing teleological explanations in physics should be cautious when they use the phrases “and so,” “as,” “in order to,” and “therefore.” In our sample, these phrases were associated with relatively high probabilities of illegitimate uses of teleology (albeit the total incidence of such cases was low). In this way, text-mining approaches have the potential to offer guidance to writers of science education textbooks and other teaching resources about appropriate use of scientific language.
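The automated first step of such a hybrid approach might be sketched as follows. This is a minimal Python illustration under our own assumptions, not the software used in the study: the names `MARKERS`, `split_sentences`, and `candidates`, the record fields, and the `context` parameter are hypothetical. The sketch retrieves each sentence containing a marker phrase together with a chosen number of preceding sentences, and leaves the code field empty for the human coders. The passage used is the one from B2 quoted above.

```python
import re

# The two most prevalent markers discussed in the text.
MARKERS = ("so that", "in order to")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def candidates(text, context=1):
    """Yield one record per marker found in a sentence, with preceding context."""
    sentences = split_sentences(text)
    for i, sentence in enumerate(sentences):
        for marker in MARKERS:
            if marker in sentence.lower():
                yield {
                    "marker": marker,
                    "sentence": sentence,
                    # Up to `context` sentences before the match.
                    "context": sentences[max(0, i - context):i],
                    # Left blank: legitimacy is judged by the human coders.
                    "code": None,
                }


passage = (
    "By folding up the membrane of a cell, or the tissue lining an organ, "
    "the area over which diffusion can take place is greatly increased. "
    "Therefore the rate of diffusion is also greatly increased, so that "
    "much more of a substance moves in a given time."
)

records = list(candidates(passage, context=1))
for record in records:
    print(record["marker"], "|", record["context"], "|", record["sentence"])
```

With `context=0` only the matching sentence is returned, which corresponds to reading the fragment in isolation; increasing `context` retrieves more of the surrounding text. The simple splitter used here would break on abbreviations such as “p.d.” or “e.g.,” so a more robust sentence tokenizer would probably be needed for real textbook text.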

The data set used in this study is relatively small (eight texts), and future studies, by drawing on larger corpora, may be able to make more generalizable claims about language use in science education texts. For example, it would be interesting to determine patterns in the prevalence of categories of legitimate teleological explanations (e.g., human intention, physiological function, molecular function) in a larger sample of textbooks. More generally, text mining could be used to determine authors’ explanatory approaches in different contexts. Given the suggestion that approaches to explanation are contextual (Alameh & Abd-El-Khalick, 2018; De Regt & Dieks, 2005) and not necessarily clearly determined by theoretical rules, empirical data on usage patterns (for example, what forms of explanation are common in different topics within disciplines?) will be useful for authors and teachers. Data related to trends in usage over time would also be of interest. We encourage researchers in science education to adopt the text-mining approach, as it has the potential to develop novel knowledge about the customary linguistic usage of a field and hence support clarity of communication in science education texts.