Contents

  • Introduction
  • 1. What are taxonomic species?
  • Prototypical concepts
  • Subjectivity: neotypological concepts (NTCs)
  • Objectification: phytography and species taxon concepts (STCs)
  • The structure of species taxon concepts
  • Nomenclature
  • Description
  • Diagnosis
  • Cited specimens: the hypodigm
  • Other elements of the species taxon concept
  • 2. The need to operationalise
  • Names alone are not enough
  • Modelling species taxon concepts: building the data set
  • Modelling species taxon concepts: computational analysis
  • Missing data
  • Character types
  • Updating and optimising data sets that represent STCs
  • Automated species identification
  • 3. Online publication outputs from alpha taxonomy
  • Specimen character data sets
  • Geographical point data sets
  • Computational analyses used
  • Specimen images
  • 4. A standard reference system for species taxon concepts
  • Glossary
  • Acknowledgements
  • References

Introduction

Taxonomic species are the groups recognised by plant taxonomists in formal taxonomic literature: floras, monographs, taxonomic revisions of genera and species groups and papers presenting species new to science. They are the bedrock of plant alpha taxonomy (see glossary); recent estimates for terrestrial plants put their number at over 350,000 (Nic Lughadha et al. 2016; Freiberg et al. 2020). They are known scientifically primarily from published treatments (taxon concepts) which include a detailed phenotypic (usually morphological) description as the core data of each species.

The focus of the present review is this vast array of existing species taxon concepts, which, more than any other body of information, has the best claim to be the current overall description of plant biodiversity, despite its many shortcomings. I am thus less concerned with advocating how new species ought to be conceived and more with attempting to understand what sort of knowledge taxonomic species represent, and why they continue to be used and described. Despite oft-expressed worries that taxonomists are becoming ever fewer in number, new published accounts of species abound, not only as recently discovered taxa, but as revisions of various kinds, and in Floras (Frodin 2001). Along with this, new herbaria are constantly being created around the world and older ones renovated and expanded (Index Herbariorum 2021).

All this invites reflection on the fact that classical plant taxonomy is still an active field of continuing relevance, despite the extraordinary advances in phylogenetics and ecology and the recognition that there are huge clades of microorganisms and fungi which are still largely unknown and whose exploration and description are being carried out using molecular systematics rather than morphology (Wu et al. 2019).

DNA provides a means of forming groups of individuals with far-reaching interpretability within the framework of evolutionary theory, and this approach is so powerful that most systematists today accept, even if only implicitly, that molecular data will ultimately become the machine language of taxonomy (Bateman 2018). Recent advances in capturing DNA from herbarium specimens (Baker et al. 2019) show that in the future macrobiological species could come to be based on molecular delimitation at sample sizes at least as great as those represented in the world's herbaria.

Integrative taxonomy, or integrative species delimitation, has arisen recently as a methodological approach in the light of the progressive influence of molecular data (e.g. Dayrat 2005; Bateman 2012). One aspect of this data integration is to find the optimal consilience between molecular data sets and data from fields such as ecology, behaviour, cytology, and others, in order to establish broadly based species taxa. Another is to reconcile, in particular, molecular and morphological patterns, because taxonomists of macrobiological organisms implicitly (and explicitly) acknowledge the importance of being able to recognise species from their morphological appearance. Understanding why this is so is one of my main preoccupations here.

The need for morphological diagnosability (Muñoz-Rodríguez et al. 2022) is widely argued on the basis of practicality, which is to say that, in addition to more profound explanations based on the latest biological research, it is incumbent on taxonomists to also provide society with a description of biological diversity that everyone, with a little effort, can understand. But it also seems that the question goes deeper, and is connected with the limitations of our human cognition. Taxonomists and the general public alike use their brains to comprehend and recognise species. Our mental concepts of species taxa are our main tools when navigating the complexity of biological nature, and in one form or another are common to all humanity, as ethnobiologists have shown (Berlin 1992). Without being able to mentally "visualise" species somehow, it is very difficult to think about them as entities, at least for most of the purposes for which people use them cognitively. Pursuing this line of thought leads well beyond the conventional limits of systematics to cognitive psychology and in particular to the research field of concepts, which overlaps with philosophy, linguistics and artificial intelligence, among others. My own tentative reconnaissance of this complex landscape has been illuminating in one particular respect, which is how taxonomists create, communicate and use their subjective mental ideas of classical taxonomic species. I was impressed particularly by the thought that when a taxonomist sits down to describe or delimit a species, he or she already has a group concept in mind; it seems inescapable and yet how does it come about, and with so little effort? This prompted two further considerations which motivated this review. First, it is instructive to see classical taxonomy as a brain-to-brain process in which the published taxon concepts are objectifications necessary for mental communication from one person to another and are limited by the cognitive constraints imposed by this translation process. Second, taxonomic species are the essential medium by which the findings of deep evolutionary research can be made comprehensible and useful to everyone. These two assertions lead to a practical conclusion: taxonomic species need to be expressed as computable matrices as well as classical taxon concepts, if they are to be fully engaged in future integrative species delimitation.

The starting point for this paper is thus that taxonomic species are based on the same cognitive processes that underlie the formation of mental concepts in general (Smith & Medin 1981; Lakoff 1987; Atran 1990; Margolis & Laurence 1999; Widdows 2004; Gärdenfors 2014; Hannan et al. 2019). Rather than hypotheses arising from evolutionary theory, taxonomic species begin as concepts resulting from an in-built mental ability to categorise the biological world around us (Berlin 1992). These group concepts, initially subjective, are refined and elaborated by the process of taxonomic description, which serves to make them communicable. Once formalised, named and put into the public domain, taxonomic species provide a set of biological units for two main purposes: to serve as the starting framework for evolutionary biological investigations (in the wide sense, including genetics, ecology, population biology, etc. as well as phylogeny), and to provide a system of categories by means of which the practical, cultural and scientific needs of society can be reconciled (Davis & Heywood 1963: 1 – 2). These two aims often conflict (Hey 2001; Mayo 2022), but they can be understood as the result of a continual dialectic: taxonomic species represent a preliminary categorisation of the biological environment, whilst evolutionary biology interrogates and modifies them in the light of further research.

There is also an important difference in the premises of these two lines of work: constructing taxonomic species is, at least initially, a subjective cognitive process that imposes a perceptual grid on the observed world of organisms, and brings the species into existence as simplified abstractions from a complex environment. Evolutionary biologists on the other hand seek out what organic units really exist and for this reason they cannot commit themselves a priori to the reality of taxonomic species, only to their usefulness as a set of initial working hypotheses. Some systematists indeed do not accept that species are adequate concepts for comprehending biodiversity (e.g. Martynov & Korshunova 2022).

This difference in premises calls for an explanation of my use of the terms "objective" and "objectify". As regards taxonomic species, these terms are used here to express how taxonomists communicate their idea of a given species by description, diagnosis and citation of specimens, processes which objectify concepts previously held subjectively in the mind of the taxonomist. Evolutionary biologists, on the other hand, investigate the ontological reality of species (see e.g. Burma 1954; Stevens 1994; Cuerrier et al. 1996), and for them objectivity concerns the degree to which species as concepts can be corroborated by a wide range of other kinds of investigation (Bateman 2011).

Challenges to the validity of taxonomic species can also be divided into two main kinds. On the one hand, there is the further accumulation of data resulting from wider and more intensive sampling in the field, which may alter the delimitation of the species as groups based primarily on the phenotype — this corresponds to the traditional work of taxonomic revision. On the other, there is the confrontation of taxonomic species with data patterns that can be interpreted more powerfully using hypotheses derived from evolutionary theory, e.g. genetic and cladistic hypotheses based on DNA data.

In what follows, I examine classical species taxonomy as a cyclic workflow (Sterner & Lidgard 2018) rather than a fixed end product, and follow its progression from intuitive discernment of groups to objectified description of taxonomic species (kinds), then to their later modification to fit the constantly emerging data from field exploration and the research fields of evolutionary biology in the broad sense (ecology, genetics, phylogeny, biogeography, etc.).

The first part of this paper considers how taxonomists conceive the species of taxonomy using the classical methodology still current, i.e. what do individual taxonomic species mean, conceptually and cognitively for the taxonomist who creates and uses them? This seems a necessary preliminary to the development of effective computational models of taxonomic species, and involves a reflection on how descriptive treatments are structured and what these elements contribute to the overall concept of the taxon.

The second part considers how species taxon concepts could be operationalised so that they can participate more effectively in integrative taxonomy (Sterner & Lidgard 2014). Modelling species as lineages using genetic, ecological and geographical data sets (including ecological niche modelling and functional trait analysis) has created a growing need for access to the variation that underlies species taxon concepts, which the classical descriptive formats do not adequately provide.

In the shorter third and fourth parts some potential outputs from the species taxonomy workflow are mentioned, and a brief discussion is given concerning the reference system for taxon concepts which is accumulating online. The mobilisation of the traditional descriptive data offers a way forward for more effective and agile updating and refinement of species taxon concepts through online systems of collaboration among taxonomists and their institutions. The traditional taxonomic framework currently acts as a sampling frame and reference system for the data of evolutionary biology (including ecology, genetics, environmental modelling and conservation), but operationalisation could augment this with modelling and simulation, using the data on which most descriptive species taxonomy continues to be based.

Some of the terminology used here can be confusing because of differing usage across the literature concerned with species in the fields of systematics, pattern analysis, cognitive psychology and philosophy. I have provided a glossary of terms at the end in an attempt to make the arguments easier to follow.

1. What are taxonomic species?

Taxonomic species are vaguely defined concepts (see glossary) with prototypical effects. They exist subjectively in the minds of taxonomists and also in relatively objectified form as taxon concepts set out within standard descriptive frameworks.

Prototypical concepts

Various authors have discussed the relevance of the work of cognitive psychologists and ethnobiologists to understanding the conceptual basis of the taxonomic species of biological science (e.g. Atran 1990; Berlin 1992; Hey 2001; Winsor 2003; Zachos 2016). Hey's exposition is particularly significant in the extended argument presented for distinguishing between taxonomic species (taxa, ontological kinds, cognitive entities) and the real species (ontological individuals, evolutionary groups) that we presume to exist outside human cognition and which are the object of evolutionary research [Footnote 1].

Cognitive psychologists have shown that people automatically categorise objects under kind concepts. Their majority view is that these concepts are not the same as logical classes or mathematical sets, i.e. definable by a finite number of necessary and sufficient conditions, in other words, by an essence; the latter is regarded as the classical view of concepts and although it has the merit of providing clear conceptual boundaries, it is no longer widely accepted by cognitive scientists as a generally applicable explanation of the structure of concepts (Smith & Medin 1981; Lakoff 1987; Murphy 2002; Hannan et al. 2019), despite remaining important in theoretical discussions (Margolis & Laurence 1999).

Although the mental structure of "kind" concepts remains controversial, experimental studies of categorisation, particularly those of E. Rosch and colleagues, showed that concepts exhibit prototypical effects (Rosch 1978). That is, the category of objects instantiated by a concept has a central collection of typical members, with the more peripheral individuals becoming increasingly atypical, and the category boundaries are vague; i.e. when we categorise intuitively we look for typicality rather than boundaries. Typicality can be expressed through the idea of a central prototype, which may be understood as an abstraction (analogous to a mean vector), but also by several actual exemplars, which taken together represent the central idea of a concept (Murphy 2002: 75; Winsor 2003 [Footnote 2]). Fuzzy sets (Zadeh 1982; Berkes & Berkes 2009) are a formalised analogy of the notion of a category with vague boundaries; more familiar to taxonomists is the similar if less precise idea of polythetic groups (Beckner 1959; Sneath & Sokal 1973).
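The analogy between a prototype and a mean vector, and between a vague category and a fuzzy set, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the measurements are invented, and the exponential typicality function and its `scale` parameter are just one convenient way to obtain graded membership, not a model proposed in the literature cited above.

```python
import math

# Hypothetical measurements (leaf length cm, leaf width cm, petal number)
# for five specimens already sorted into one group; the values are invented.
specimens = [
    [10.0, 4.0, 5.0],
    [11.0, 4.5, 5.0],
    [9.5, 3.8, 5.0],
    [10.5, 4.2, 5.0],
    [14.0, 6.0, 6.0],  # an atypical, peripheral individual
]


def prototype(rows):
    """Central prototype expressed as the mean vector of the group."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def typicality(row, proto, scale=2.0):
    """Graded membership in (0, 1]: 1 at the prototype, decaying with distance.

    The exponential form and the value of `scale` are illustrative assumptions.
    """
    return math.exp(-math.dist(row, proto) / scale)


proto = prototype(specimens)
scores = [typicality(s, proto) for s in specimens]

for s, score in zip(specimens, scores):
    print(s, round(score, 3))
```

The scores fall away smoothly from the centre of the group, so no sharp boundary separates members from non-members; any cut-off would have to be imposed from outside.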

Rosch also demonstrated that within the hierarchy of categories recognised by human cognition in any given domain (e.g. plants), there is a basic-level category which is more distinctive and information-rich than any other level. Ethnobiologists (e.g. Berlin 1992; Zamudio & Hilgert 2015) have established that prototypical basic categories of biological kinds are universally recognised in human cultures, and also that folk biological kinds are not restricted to those which are useful. Recognition of biological diversity through the medium of categories is a fundamental aspect of people's perception of their environment. It has also been established that because of their greater familiarity with biodiversity, experts recognise finer-grained categories than non-experts, forest people than urban people, etc. (Murphy 2002: 229 – 233; Remagnino et al. 2017: 81 – 95).

Atran (1990, 1999) traced the development of biological taxonomy from its folk biological origins. He argued that folk biological taxa are cognitively autonomous from modern scientific taxa, and owe their stability and universality in human society to their possession of cognitive essences or natures; this is a notion that presumes a fixed, if not explicitly definable core of properties. By focussing on essences, Atran criticised prototype theory as inadequate for explicating natural kinds when viewed from the folk perspective. But he also recounts the gradual historical transformation of essence-based taxa into their modern evolution-based versions, in which essences are untenable (e.g. Mayr 1982; Winsor 2003; Wilkins 2009). Atran's arguments expose the fundamental role that taxonomy has in mediating communication between science and society in regard to biodiversity [Footnote 3].

Philippe Lherminier (2008, 2009, 2014, 2018; Lherminier & Solignac 2005) places the notion of species within a broad modern societal context and argues that the categorisation of organisms into species is a fundamental cultural phenomenon, necessary not only for scientific ends, but for all human communication that involves discourse about the organic world. In the current context this is particularly significant with regard to debates on the value for human society of biodiversity, its conservation, its extinction and our own identity. Because of its very general approach, Lherminier's work highlights the universal need for a working system of species categories, which can be seen as the first and unique responsibility of taxonomists.

Subjectivity: neotypological concepts (NTCs)

One of the consequences of accepting that people already carry about in their heads a large informal library of prototypical concepts for biological groups is that a taxonomist is unlikely to encounter a situation in which no prior categorisations are relevant; more-or-less unconsciously, we have all acquired a wide range of biological kind concepts (plant, tree, palm, fern). We therefore never approach a taxonomic task with a clean mental slate but instead with an a priori framework on which we can build more refined concepts.

The first and most important phase of the taxonomic workflow is to discern (perceive, discover, see glossary) the species, which is, initially at least, a subjective process. It involves the mental construction by the taxonomist of a concept of each species which can be held in memory and which becomes more objective later in the workflow. This cognitive process was described by Davis & Heywood (1963: 11) and they christened the resulting subjective structures as neotypological concepts of species, referred to henceforth as NTCs (see glossary). They correspond to the concepts of cognitive scientists and are characterised by prototypical effects, i.e. when applied to real individuals, e.g. as specimens or living plants, they result in polythetic groups with typical and non-typical members and vague boundaries (i.e. the categories of cognitive psychology).

This initial intuitive phase of the taxonomic workflow was called tâtonnement ("trial and error") by A. P. de Candolle (1813: 67) [Footnote 4] (see also Sneath & Sokal 1973: 20; Winsor 2003: 391). At this stage only subjective analytical processes seem to be involved. As far as scientific biological taxonomy is concerned, the formation of NTCs is a "black box" process that cannot be precisely modelled because the cognitive mechanisms involved are not understood in detail and the data acquired by each taxonomist are essentially unrecoverable in their totality. Models of the taxonomic grouping process produced by biologists focus on more objective methods (phenetics, cladistics, etc.) and bypass the subjectivity inherent in alpha taxonomy. However, understanding the workflow that produces taxonomic species necessitates consideration of this subjective phase since it is the most fundamental part of the whole process. Engagement with cognitive science could be fruitful, e.g. for clues to find appropriate computational methods to model NTCs (e.g. Tversky 1977; Tversky & Gati 1978; Hannan et al. 2019). How humans categorise the world is a major research field (e.g. Murphy 2002) and its results are highly relevant to this first phase of the alpha taxonomic workflow even if no general consensus yet exists on how categorisation works in detail (Hey 2001; Murphy 2002; Gärdenfors 2014; Hannan et al. 2019).

Taxonomists themselves, perhaps understandably, have paid relatively little attention to "tâtonnement". Some authors have described the process, e.g. Diels (1921), Hitchcock (1925), Sprague (1940: 448 – 449), Mayr (1942: 12 – 13), Mayr et al. (1953: 72 – 73), Mayr (1969), Cullen (1968: 176 – 177), Leenhouts (1968: 26 – 27) and Jeffrey (1982: 13 – 17). These accounts mostly describe how kinds/species are recognised during the examination or naming of a miscellaneous collection of specimens or when starting on the revision of the species of a taxonomic group. This stage is often represented as a sorting process: the specimens are sorted by visual inspection into groups whose component individuals are judged to be the "same", meaning that the differences between the groups at this stage are based largely on overall subjective impressions. Cullen (1968) argued that this was the result of a comparison of specimen Gestalten and that the diagnostic characters of species were determined only after the initial groups had been established cognitively. Sprague (1940: 449) and Jeffrey (1982: 13) referred to the discernment of kinds respectively as "appreciating what is called the 'facies' of a plant" and as "identification" (as distinct from "naming"). In practice, the sorter also uses a hierarchy of concepts they have already learned, e.g. a plant collector will sort specimens into families or possibly genera before sending them to a specialist, depending on their prior knowledge.

During the tâtonnement phase, the taxonomist thus builds her cognitive NTC of each species by a scanning or "training" process (in the data-mining sense). She gathers visual data from examining living plants and specimens and studies published species treatments. Usually the taxonomist will write notes and make measurements, dissections and drawings to check visual impressions or investigate the occurrence of particular character states; this includes first-hand observations in the field and in cultivated plant collections. By determining available specimens using existing keys and descriptions the taxonomist becomes familiar with the characters used to describe and delimit the taxonomic species recognised by her predecessors. The details of this process have so far been little studied (see e.g. chapter 5 in Remagnino et al. 2017).

From a practical standpoint, i.e. the recognition and determination of individuals to species, a taxonomist's NTCs are her key working tools. The effort required to create a "library" of NTCs is the critical initial creative phase of carrying out taxonomic work. The NTC that a taxonomist has of a species is constantly under mental review and its effectiveness depends on the memory retention powers of the taxonomist and the constantly changing range of individuals and information scanned.

Objectification: phytography and species taxon concepts (STCs)

The mental construction of the NTCs necessarily precedes the objective processes of description and delimitation used to reify (see glossary) taxonomic species as groups (categories) of individuals; the description and delimitation of a taxon require its prior existence in some conceptual form. This is the central "mystery" of taxonomic species. How does it come about that a species can be described or delimited without a prior grouping concept? The preceding section provides a solution to this conundrum. There is a prior concept, the NTC, but it can only be understood (and investigated) from a subjective cognitive standpoint.

Another question then arises — is a given taxonomic species (species A): i) the subjective mental concept each taxonomist has of it (the NTC), or ii) an objectified published description (and if so which, if there is more than one?), or iii) a published diagnosis, or iv) a finite collection of specimens so named by taxonomists in herbaria (which taxonomists?, which herbaria?), or v) the potentially infinite aggregate of plants in nature which correspond to the description and diagnosis? In the case of highly endangered species, a finite set of plants could probably be established in practice at any one time, but their reproduction introduces indefinability in the time dimension.

In reflecting on this quandary it is worth remembering how the practical workflow operates. In any given context the taxonomist uses her NTC of species A to form categories of individual plants corresponding to that species. These categories may be formed directly, as in the operation of plant identification: in determining individuals of species A from a collection of specimens or in the field, a category is formed representing that species without the interpolation of a description. In published studies, the taxonomist uses available material to create a description and diagnosis to which the species name points and which she and others can then use as an aid to the determination of individual plants. The description and diagnosis are an objectified version of the taxonomist's NTC of species A. They vary according to the geographical scale of the study (monograph, flora account, etc.). Published studies also vary in how many of the specimens examined and determined by the taxonomist are actually cited. This suggests that from an operational standpoint, a given taxonomic species can be understood as a process, or workflow, rather than a single conceptual entity. For a taxonomist, taxonomic species A consists of a processor: a complex of subjective NTC and objective descriptions and diagnoses (many species, perhaps a majority, exist as a collection of different descriptions rather than just one). Categories of individuals are produced "on the fly" by inputting different individuals to this processor and outputting as members of species A those that correspond [Footnote 5].
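The processor idea can be sketched in code, with the important caveat that only the objectified part (a description expressed as permitted ranges) is modelled; the subjective NTC is not. The character names, the ranges, the specimen values and the `min_fraction` threshold are all hypothetical, and the polythetic matching rule is one simple choice among many.

```python
# Hypothetical objectified description of species A: permitted range per character.
DESCRIPTION_A = {
    "leaf_length_cm": (8.0, 12.0),
    "leaf_width_cm": (3.0, 5.0),
    "petal_count": (5, 5),
}


def matches(individual, description, min_fraction=0.6):
    """Return True if the individual agrees with enough of the description.

    Membership is polythetic: no single character is required, only that a
    sufficient fraction of the scored characters fall within range.
    Characters not scored for the individual are skipped.
    The threshold value is an arbitrary assumption.
    """
    scored = [c for c in description if individual.get(c) is not None]
    if not scored:
        return False
    hits = sum(
        description[c][0] <= individual[c] <= description[c][1] for c in scored
    )
    return hits / len(scored) >= min_fraction


def categorise(individuals, description):
    """Form the category 'on the fly' from whichever individuals are input."""
    return [name for name, ind in individuals.items() if matches(ind, description)]


individuals = {
    "spec_1": {"leaf_length_cm": 10.0, "leaf_width_cm": 4.0, "petal_count": 5},
    "spec_2": {"leaf_length_cm": 12.5, "leaf_width_cm": 4.8, "petal_count": 5},
    "spec_3": {"leaf_length_cm": 20.0, "leaf_width_cm": 9.0, "petal_count": 4},
    "spec_4": {"leaf_length_cm": 9.0, "petal_count": 5},  # width not scored
}

members = categorise(individuals, DESCRIPTION_A)
print(members)
```

Feeding a different set of individuals to the same function yields a different category, which is the sense in which the species behaves as a process rather than as a fixed list of members.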

In a given taxonomic publication, the taxonomist commits herself to a particular reification of species A, based on her prior NTC, a particular sample of individuals, and a particular set of characteristics. This commitment implies a proposition that is embedded in the contemporary worldview of the taxonomist — "the group of individuals which can be more-or-less delimited with these characters, based on the sample of individuals which I have examined, represents a species that exists in nature". What the taxonomist understands by "species" depends on the period when the description was published. Since the Modern Synthesis (Huxley 1940, 1942; Mayr & Provine 1980) a species has been understood by almost all biologists as the result of evolutionary processes, but this has not greatly changed the workflow of recognising and describing taxonomic species in angiosperms (Winston 1999). While this can be understood as evidence of the persistent usefulness of classical methods, it has also acted as a brake on methodological innovation in descriptive species taxonomy (Bateman 2011, 2022).

Unlike the tâtonnement phase, the process of description and delimitation of plant species (and other taxa) has received much more attention and is traditionally known as phytography. Linnaeus's Philosophia botanica (Linnaeus 1751; Freer 2003) is the foundational work of this field and has been followed by other treatises, among which may be cited A. P. de Candolle (1813, 1819), Lindley (1832), A. L. P. P. de Candolle (1880), Diels (1921), Stearn (1973) and Winston (1999, for all groups of organisms). The objectification of taxonomic species takes place when the species treatment is placed in the public domain by publication. At this point they become part of the canon of biological taxonomy. If any species has not previously been recognised and is published with a new name correct according to the rules of nomenclature (Turland et al. 2018), the date and place of publication represents its formal beginning as a recognised concept. But any newly published species treatment, including revised versions of those published previously with the same binomial, constitutes a unique taxon concept of that species name and so all such treatments can be considered new to some extent [Footnote 6].

The notion of the taxon concept (taxonomic concept, potential taxon) is a very useful idea introduced by Berendsohn (1995) and Geoffroy & Berendsohn (2003), that facilitates the discrimination of groups published with the same scientific name at different times and places but which differ in some significant aspect such as sample size, character set or geographical scope. It highlights the fact that every taxonomist, in preparing a species treatment, bases it not only on their unique subjective view, but also on an objectively different sample of characters and individuals. A species taxon concept (STC) is a complex statement consisting of nomenclature, description, diagnosis (including key entries), cited specimens, images and geographical and ecological characterisation, located in a particular, dated, published taxonomic treatment by a particular authorship. In creating species taxon concepts, modern taxonomists attempt to model units of biodiversity which make sense intuitively to both scientist and layperson and which are also consilient (Wilson 1998; Bookstein 2014) with the diachronic lineages and synchronic reproductive communities that evolutionary biologists investigate. As Hey (2001) discusses at length, this correspondence can never be more than approximate because our mental cognitive apparatus (Riedl 1986) imposes a prototypical structure on NTCs and hence STCs, and also tends to establish concept boundaries that are more discrete than those to be expected in real evolutionary groups.
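As a rough illustration of how the elements listed above might be held together as a single record, the following sketch defines a small data structure. The field names, the placeholder values and the "sec." label format are assumptions introduced here for illustration; they are not a published schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeciesTaxonConcept:
    """One dated, published treatment of a species name (hypothetical schema)."""

    name: str          # binomial to which the treatment applies
    authorship: str    # author(s) of this treatment, not of the name
    year: int          # year of publication of the treatment
    publication: str   # where the treatment was published
    description: str = ""
    diagnosis: str = ""
    cited_specimens: tuple = ()  # identifiers of specimens cited
    images: tuple = ()
    distribution: str = ""       # geographical characterisation
    ecology: str = ""

    def label(self):
        """Label distinguishing this concept from others with the same name."""
        return f"{self.name} sec. {self.authorship} {self.year}"


# Two treatments sharing one binomial but based on different samples.
concept_1 = SpeciesTaxonConcept(
    name="Genus species",
    authorship="Author A",
    year=1990,
    publication="Hypothetical regional flora",
    cited_specimens=("Collector 101", "Collector 102"),
)
concept_2 = SpeciesTaxonConcept(
    name="Genus species",
    authorship="Author B",
    year=2015,
    publication="Hypothetical monograph",
    cited_specimens=("Collector 101", "Collector 250", "Collector 251"),
)

print(concept_1.label())
print(concept_2.label())
```

The two records share a name but are unequal as objects, mirroring the point that the same binomial can denote different taxon concepts in different treatments.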

The structure of species taxon concepts (STCs)

Taxonomic species are taxa, and the STC notion is simply a refinement of a taxon. Hey (2001) preferentially refers to species taxa as categories, using that term in the sense deployed by cognitive psychology (category versus concept, see glossary). He also discusses taxa as natural kinds, a philosophical concept on which there is a large and discordant literature (see e.g. Atran 1990; Bird & Tobin 2018 and references therein), which among other things, turns on the question of whether natural kinds have essences. To avoid these complications, and to understand a species taxon from a functional (workflow) perspective, an STC can be viewed as a processor which assigns individuals to groups called taxonomic species. Although this is analogous to the way the definition (intension) of a classical class or set can function, an STC is no simple definition, but rather a complex of distinct elements which together offer a multifaceted perspective rather than a single view. Each of these elements is discussed in turn:

nomenclature. The indispensable nomenclatural element of the STC consists primarily of its binomial name, correctly formulated and published by the botanist according to the rules of the nomenclatural code (Turland et al. 2018). Other information may be present (type specimen citation, literature references, synonyms, etc.), but the function of the species name is that it labels and denotes the taxon concept (Footnote 7). From the very beginning of the nomenclatural code (A. L. P. P. de Candolle 1867, 1868, Art. 46), and even since Linnaeus's Species plantarum (Linnaeus 1753), the most vital connection has always been between the binomial name and a descriptive statement to which it applies: it has long been fundamental in scientific taxonomy that without a descriptive statement to point towards, the name in itself has no meaning. This critical link is correctly made via the nomenclatural type specimen. However, as many authors have pointed out (Simpson 1940 is especially lucid on this question), the type does not have the role of representing a prototypical individual of an STC; what is typical will depend on the sample of individuals on which the STC is based and this changes as field exploration and data analytical techniques progress.

description. Under the interpretation put forward in this paper, the taxonomic description of a species is the result of objectifying (reifying) the cognitive NTC that a taxonomist has already conceived mentally and which she has used to pick out a representative collection of individuals (usually dried specimens). Based on this sample, she prepares a description which consequently reflects the same prototypical characteristics as her NTC. The description has three major features as a group-defining device: i) to present the most typical values (and states) of the characters or variables used, based on direct observations of the sample of individuals seen, ii) to express the range of variation observed, and iii) to provide a word painting that awakens in the reader's imagination a picture of the most typical members of the species.

Because the history of botanical description consists of the development of textual rather than mathematical modes of expression, an extensive glossary of technical descriptive terms has been developed and deployed to present the typical (average, commonest) values of descriptive characters and their ranges of variation (A. P. Candolle 1813; Lindley 1832; A. L. P. P. Candolle 1880; Diels 1921/1924; Stearn 1973; Freer 2003). Winsor's important discussion of the method of exemplars in classical taxon descriptions (Winsor 2003) demonstrates how their prototypical characteristics derive from the practical workflows used by Linnaeus and Cuvier for describing genera, in which a single species was selected to prepare the description and the characters of the other congenerics were systematically compared until a core set of characters for the taxon was obtained (Cuvier 1828; Whewell 1847; Stearn 1957; Farber 1976). Hitchcock (1925: 86 – 93) provides a detailed account of the preparation of a species description which begins with the choice of a "single average specimen as the basis" and goes on to explain how character variation should be dealt with. Winsor (2003) highlights William Whewell's analysis in which he argued, in resonant Victorian prose, that natural classes, normally only vaguely delimited, were more effectively defined by a central representative element rather than a boundary (Whewell 1847; see Footnote 8).

A geometric model of a species description is a cloud of points (specimens/individuals) within a multidimensional feature space, each of whose axes corresponds to a character used in the description. Towards the margins the points are more rarefied and towards the core region they are denser. The core region best characterises the species as a group concept and its textual representation is the goal of a species description. This point cloud metaphor corresponds to a fuzzy concept of species (McCloskey & Glucksberg 1978) in which the typicality of individuals is expressed by their aggregate group membership values.

The entity represented by a taxonomic species description can thus be thought of as a polythetic group, a multidimensional fuzzy set, or a prototypical concept, all different ways of expressing a group characterised by varying typicality of its members and vague boundaries.
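The point cloud model can be made concrete with a small numerical sketch. The specimen names, the two characters and the exponential typicality score below are all illustrative assumptions, not part of any published STC; a real analysis would also standardise the characters before measuring distances.

```python
import math

# Hypothetical specimen-by-character data: two quantitative characters per
# specimen (say, leaf length and leaf width in cm). All values are invented.
specimens = {
    "spec_A": (10.0, 4.0),
    "spec_B": (11.0, 4.5),
    "spec_C": (9.0, 3.5),
    "spec_D": (16.0, 7.0),  # a marginal individual, far from the core
}


def centroid(points):
    """Mean position of the point cloud in character space."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


def typicality(points_by_id):
    """Score each specimen in (0, 1]: 1 at the centroid, falling with distance.

    The exponential decay is an assumed, arbitrary choice of membership
    function; it only illustrates graded (fuzzy) group membership.
    """
    pts = list(points_by_id.values())
    c = centroid(pts)
    dists = {k: math.dist(p, c) for k, p in points_by_id.items()}
    scale = sum(dists.values()) / len(dists) or 1.0  # mean distance as scale
    return {k: math.exp(-d / scale) for k, d in dists.items()}


scores = typicality(specimens)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

In this toy sample the specimen nearest the centroid receives the highest score and the outlying specimen the lowest, which is the sense in which a description targets the dense core rather than the margin of the cloud.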

diagnosis. The diagnostic element of the species taxon concept is an attempt to fit a classical set definition onto the polythetic entity represented in the description. The diagnosis emphasises the boundary definition (delimitation) rather than the core (typicality) and corresponds to an intensional class definition, that is, a set of properties that an individual must have in order to be assigned to the species: "If an individual A exhibits state m in character x, state n in character y, ... etc., then A belongs to species S". These properties may be provided in a key or in notes following the description; in accounts of new species there may be a separate diagnosis expressing such a definition. The search for maximally diagnostic characters plays a crucial role in preparing the STC. The point of the diagnosis is not to define the taxonomic species as a classical set, but to help the user to determine the species of individual plants by highlighting the gaps between the species. The boundaries of the character definitions presented are only aspirational and contingent on the scope of a taxonomic study; they will almost certainly suffer later modification, e.g. with the availability of more collections and the inclusion of new character data. The taxonomist works to ensure that these diagnoses provide the most discrete delimitations possible with the material examined. Using a statistical analogy, the diagnosis and key constitute a best fit to the conceptual object represented by the description.
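The conditional form of a diagnosis can be sketched as a simple rule check. The species "S", its characters and their states are invented for illustration, and the three-valued result (match, no match, undecidable) is an assumed design choice to reflect specimens that lack an organ.

```python
# Hypothetical diagnosis for an invented species "S": every listed character
# must show one of the allowed states for the rule to be satisfied.
DIAGNOSIS_S = {
    "leaf_arrangement": {"opposite"},
    "petal_number": {5},
    "fruit_type": {"capsule", "berry"},
}


def matches_diagnosis(individual, diagnosis):
    """Return True, False, or None (undecidable: a character was not observed)."""
    undecided = False
    for character, allowed_states in diagnosis.items():
        state = individual.get(character)
        if state is None:
            undecided = True      # character not observable on this specimen
        elif state not in allowed_states:
            return False          # one failed character excludes the individual
    return None if undecided else True


flowering_only = {"leaf_arrangement": "opposite", "petal_number": 5}
complete = {"leaf_arrangement": "opposite", "petal_number": 5, "fruit_type": "capsule"}
excluded = {"leaf_arrangement": "alternate", "petal_number": 5, "fruit_type": "capsule"}

print(matches_diagnosis(complete, DIAGNOSIS_S))
print(matches_diagnosis(flowering_only, DIAGNOSIS_S))
print(matches_diagnosis(excluded, DIAGNOSIS_S))
```

A rule of this kind is all-or-nothing, which is why it fits a polythetic group only approximately: a single atypical state excludes an individual that the description would still accommodate.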

cited specimens: the hypodigm. The published list of the specimens which accompanies the description and diagnosis is yet a third way of viewing the STC, in this case as an extension (Kripke 1972; Putnam 1973, 1975; Richards 2010: 185 – 186), that is, the set of individuals to which the description and diagnosis apply. The cited specimens are a way of demonstrating ostensively what the species is, in a way that invites the observer to deploy their mental Gestalt capability. The citation list allows in principle an observer to comprehend the STC by simply looking at the specimens — "Species A is like this, and this and this" — rather than become involved in detailed character observation and analysis. The specimen list is also crucially important in pointing to an authenticated sampling framework for researchers who wish to examine other character fields such as anatomy (micromorphology), phytochemistry and DNA. In the parlance of data-mining, the specimen list represents a training set (Ripley 1996; James et al. 2013). Since it is these specimens which are the objective foundation of all the other elements, the list is also vital when another taxonomist tests the STC against a new version based on different data — "Do these individuals still form an optimal group when, e.g. new characters are added? other specimens are included? an independent data set is substituted? etc."

In practice, the purpose of citing specimens (other than the type) in an STC is to present a sample from a population of unknown but presumed very large size. The list is usually composed of a selection of the specimens seen by the taxonomist and judged to be most typical, i.e. determined with the greatest confidence. Using the point cloud model, the cited specimens will normally be those forming the central core of the STC. Taxonomists may draw a distinction between those specimens they are prepared to cite in an STC (which is a published statement) and the often greater number they have determined with this species name in herbaria, which will include many they are not so confident about but which are judged to be better placed within this STC than any other (Footnote 9). Simpson (1940, 1961) coined the term hypodigm for all the specimens seen and judged by a taxonomist to belong to a particular taxon. My account of the cited specimen list describes what might be expected in a monographic treatment, but in many floristic treatments and other kinds of published format, there are editorial restrictions on the number of specimens that may be cited.

In principle, the description and diagnosis must fit the cited specimens as accurately as their essentially textual mode of presentation allows. But although the purpose of the STC is that it should apply to a very large number of individuals representing an inferred evolutionary species or lineage, in fact the taxonomist can only guarantee that it fits the individuals studied, as is the case for quantitative models generally (Bookstein 2014). It is thus unsurprising that with the growth of herbarium collections, succeeding taxonomists find that their STC differs from preceding ones with the same binomial (Footnote 10).

other elements of the species taxon concept. The images that often accompany an STC are the element which fits most perfectly the idea of an exemplar as previously discussed. When the image is a line drawing, the STC is represented visually as a single plant. It answers the question "what does this species (and its component organs) look like?" Most images are not intended to represent variation, but rather present an individual which is judged to be a typical (centroidal) example of the individuals studied.

The geographical and ecological information presented in an STC has normally been gathered from specimen labels, supplemented by the taxonomist's own field observations. Objectively, its range and specificity are constrained by the list of cited specimens, since these are the authenticated specimens included in the published taxon concept. This is critically important information for users of species taxonomy: geographical and altitudinal locations, habitats, e.g. vegetation and substrate types, life forms, and the months in which the plants flower and fruit.

The significance of the ecological and geographical characterisation of a taxonomic species cannot be overemphasised. The link between verified location of specimens and their authoritative taxonomic determination is a vital output of the alpha taxonomic workflow. These are the data that establish most rigorously the geographical distribution of the STC and are used to establish habitat profiles for species using GIS software. It is worth noting that long before the development of modern phylogenetic analysis, geographical and ecological envelopes were already considered to be characteristic features of taxonomic species and keys to understanding speciation, as expressed for example by the geographical-morphological method (Wettstein 1898; Rothmaler 1955; Davis & Heywood 1963), the Rassenkreis theory (Rensch 1929, 1934) and allopatric speciation and the theory of subspecies (Mayr 1942, 1963).

2. The need to operationalise

Taxonomic species function in science as static indexes: a binomial name attached to a piece of data implies a link to standard sets of properties (e.g. morphological, geographical, ecological, etc.) or reference to a distinct and potentially recoverable evolutionary lineage. In reality these links are mostly aspirational because no consensus exists on current species taxon concepts and they are neither easily available nor accessible computationally.

Names alone are not enough

Scientists use taxonomic species names to index their biological data, but the name alone is less useful than it appears. Because there is no standard system and because there may be a diversity of STCs for each species name, a binomial name alone does not point directly to a scientifically verifiable STC unless the author has made a particular effort to do so. Take, for instance, a published table in a biological paper in which data values of some kind are listed along with the binomial names of the taxa. These names represent identifications of the study material, but there is rarely any reference or means of establishing which STC was used by the identifier, e.g. a recent monograph. At most, one can expect to find the author abbreviation of the botanist who first published the name, e.g. L., Hook.f., Vahl, Engl. etc. The author's name can lead the investigator to the original STC, e.g. Linnaeus (1753), and thence to a type specimen, but this data trail does not often lead to the current STC. Finding the latter, given a species binomial, is a problem that usually only a trained taxonomist can solve and even then can be very time-consuming. Even citation of voucher specimens (e.g. for GenBank) leaves the link between published data and STC only half made. Who but a specialist can verify the current STC to which the voucher belongs?

Taxonomic species are upstream in the workflow of other biological disciplines; their recognition and description precede further studies. Most STCs have never been subject to any other investigations beyond their description; they are premises rather than conclusions. As premises, STCs have to be formally named in order to enter scientific discourse, and this is the reason why biological nomenclature at species level is based primarily on taxonomic species rather than groups produced by other research areas.

However, taxonomic species are subject to constant change because of accumulating knowledge about organisms resulting from new research, including field exploration. Each (published) STC differs from all its predecessors. In the absence of monographic studies none can be regarded as a universal standard, and even a monographic STC can serve as one only temporarily, since it too will go out of date. Change in STCs is also often accompanied by nomenclatural changes.

In recent years an enormous amount of work has gone into tackling this information labyrinth and today online databases such as Scratchpads (Smith et al. 2012), The Plant List (2013), the Leipzig Catalogue of Vascular Plants (Freiberg et al. 2020), Biodiversity Heritage Library (BHL 2021), Catalogue of Life (Bánki et al. 2021), Global Biodiversity Information Facility (GBIF 2021), International Plant Names Index (IPNI 2021), Plants of the World Online (POWO 2021), Tropicos (2021), World Checklist of Selected Plant Families (WCSP 2021), World Checklist of Vascular Plants (WCVP 2021), and World Flora Online (WFO 2021) make it easier to trace names to STCs and to synonyms. However, the problem remains how to decide which STC is currently optimal, given the data available.

Modern biodiversity science generates ever-increasing amounts of data, especially in ecology, genetics, phylogeny, conservation and bioprospection. How and whether this data should be integrated into the taxonomic delimitation of species is a major issue; Barkley (2000) gives an illuminating account of the contemporary context and workflow for describing taxonomic species. The STCs of angiosperms are still mostly based on the data available on herbarium specimens, and the account presented here of the recognition and description of taxonomic species still applies as regards angiosperm STCs generated by contemporary systematics (Bebber et al. 2010; Wood et al. 2017; Grace et al. 2021). This is probably because the whole process still depends on data capture by visual scanning of plants and plant specimens and a largely cryptic, cognitive group-forming process (subjective data analysis). By contrast, the data used in other biodiversity research is not so easily scanned subjectively: it is captured and stored (often by instruments) in the form of data matrices (data sets), and the group-discovering analysis is computational.

The rise of integrative species delimitation is a response to this problem (e.g. Bateman 1999; Sites & Marshall 2003; Dayrat 2005; Duminil & di Michele 2009; Bateman 2011; Bateman 2012; Scotland & Wood 2012; Wheeler et al. 2012; Edwards & Knowles 2014; Williams et al. 2014; Pante et al. 2015; Wood et al. 2015; Bateman 2018; Muñoz-Rodríguez et al. 2019; Yang et al. 2019; Grace et al. 2021; Bateman 2022), largely motivated by the increasingly large amounts of molecular data available for systematic research. The goal is that taxonomic species should reflect not just the patterns of the data which were used to generate them, i.e. mostly morphology, but also the ever-richer harvest of new information. However, it is clear that without transforming STCs into data sets, the contribution of morphological data to new integrative STCs will be inadequate, and consist mainly of the initial species determination by a taxonomist of the material used. The search for consilient patterns of data from, e.g. morphology, genetics, geography and ecology, can only engage morphology-based species taxon concepts properly if they are in matrix form, like the other information being analysed.

Not all integrative studies include a morphological component, but when present it is normally provided by a morphometric analysis of data newly sampled from individuals that have been determined to species or infraspecies taxa using pre-existing STCs, e.g. Bateman & Denholm (1988), Joly & Bruneau (2007), Bateman et al. (2013), Su et al. (2015), Yang et al. (2019). These morphological data sets could be said to represent the prior species taxon concepts in the absence of computable data stemming from the STCs themselves. In essence, the creation of these morphometric matrices amounts to the substitution of the published STC by the new data set. In the absence of a morphometric analysis, the taxonomic input to integrative species concepts consists essentially of indexing by a binomial (i.e. acts of species determination — essentially a black box process not accessible to analysis) that may not even point to a particular accepted taxon concept. The danger here is the possibility of an increasing disjunction of the results of biodiversity science from the species taxonomic framework on which it is based (Pante et al. 2015).

Modelling species taxon concepts: building the data set

Converting species taxon concepts into matrices began with the development of interactive keys and computer-generated descriptions (Pankhurst 1970, 1991) and has probably been most widely implemented using the DELTA system (Dallwitz 1974, 1984); more recently developed systems for the same purpose are LUCID (Lucidcentral 2020) and Xper2 (Ung et al. 2010a, b).

Since the purpose of an interactive key is species (or other taxon) identification, each taxonomic species is represented by a single character state vector of the most diagnostic features; the expression of character variation is more limited than the content of a full description. Thus, although the character vector of a species in a key matrix is a representation of its taxon concept, it by no means expresses the full extent of the data underlying the description. Furthermore, there is often no direct reference in a key that anchors the character vector to a full, published STC, thus weakening its status as a standard. In effect, the key matrix vector of a taxonomic species is an enriched version of the diagnostic element of its STC rather than a full expression of all the elements.

A different approach has been pioneered by Don Kirkup and co-workers, which has as its starting point the published STCs of standard botanical literature, e.g. a major Flora such as the Flora Zambesiaca (Kirkup et al. 2005). This entails the transformation of the character information embedded in species, genera and family descriptions using data-mining techniques for text, re-presenting it as data matrices which can be used for taxon identification, and searching for correlations between characters, habitats and geographical locations (Tucker & Kirkup 2014). This approach holds the potential for including all the information that can be harvested automatically from a published STC (nomenclature, description, diagnosis, cited material, ecology and geography) in a database, giving it wider scope than the species instantiations of matrix-based keys. The limitation here is that the STCs of each species are still fundamentally represented as a single vector. Its prototypical properties are not available for analytical treatment since the multivariate and polythetic variation expressed by the grouped individuals that are the basis of the STC are presented in only rudimentary form, at most as simple ranges of character values. Operationalising published STCs in this way is an important step forward, but does not necessarily lead to a clear indication of which STC should be regarded as the most comprehensive, or generally accepted, when several exist for the same binomial — this can only be established by a monographic treatment.
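The loss of information involved in representing an STC as a single vector can be illustrated with a minimal sketch. The specimen rows, character names and summary format below are invented for illustration and do not reproduce the data model of any of the systems cited above.

```python
# Hypothetical specimen-level matrix: one row per specimen, invented values.
specimen_rows = [
    {"leaf_length_cm": 10.2, "petal_colour": "white"},
    {"leaf_length_cm": 11.0, "petal_colour": "white"},
    {"leaf_length_cm": 14.5, "petal_colour": "pink"},
]


def summarise(rows):
    """Collapse a specimen matrix into one species-level vector.

    Quantitative characters become (min, max) ranges; qualitative characters
    become the sorted list of states observed.
    """
    summary = {}
    for character in rows[0]:
        values = [r[character] for r in rows if r.get(character) is not None]
        if all(isinstance(v, (int, float)) for v in values):
            summary[character] = (min(values), max(values))
        else:
            summary[character] = sorted(set(values))
    return summary


species_vector = summarise(specimen_rows)
print(species_vector)
```

The summary records that leaves range from 10.2 to 14.5 cm and that petals are white or pink, but not that the longest leaf belongs to the pink-flowered specimen. Covariation of this kind is exactly the polythetic structure that only the specimen-level matrix retains.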

A full presentation of the character data in matrix form of the individuals constituting the basis of an STC, directly linked to a traditional published STC and authored by an acknowledged taxonomic authority, has been pioneered by very few botanists, but prominent amongst them is Andrew Henderson (e.g. Henderson 2005, 2006, 2011, 2012). His palm monographs and accompanying data sets can be regarded as attempts to operationalise his species taxon concepts and the matrices are freely available online for re-computation by whoever is interested in taking the analytical techniques further. The data is fully documented and the only important missing element is a complete and repeatable protocol for analysing the data, which future studies with software languages like R (R Core Team 2020) should be able to supply using reproducible research outputs such as R Markdown (Xie et al. 2018).

Henderson's data sets hold out the future prospect of a standard training set for taxonomic species in cases where STCs have been transformed into computable data. If we envisaged the maintenance and development of such data sets for each taxonomic species, e.g. by taxonomists represented by Scratchpad communities (Smith et al. 2012), the goal of a standard or consensus STC could in principle be achieved. Conversely, as long as STCs continue to be placed in the public domain through the medium of traditional publication formats (even when online), it is likely that the present situation will continue, where multiple STCs are available for the same binomial and informed choice of the best one to use remains something that only the specialists themselves are capable of making.

Kilian et al. (2015) have provided a detailed description of a taxonomic workflow based on capturing and organising data from individual specimens using the European Distributed Institute of Taxonomy Cyberplatform in such a way as to make possible accumulation, management and online publication of the resulting data sets. The workflow is designed to utilise both genetic and phenotypic data types. This is an implementation which provides a way forward for a system that could be used generally for plant species taxon concepts.

Modelling species taxon concepts: computational analysis

missing data. Transforming an STC into an operational data matrix involves the solution of difficult problems. One of these is the heterogeneity of sampling at the character level, which is a consequence of missing data. The specimen data on which a plant description is based are usually a patchwork in which the number of observations varies from character to character. This results in a matrix with many missing data and widely different sample sizes between characters, e.g. leaf characters in most plant species descriptions are likely to be based on a larger sample than, say, the number of ovules per ovary locule. Dealing with missing data in matrices is itself a field of research in computational statistics (e.g. Schafer 1997; Josse et al. 2020) and there are many approaches available.

The textual format of species descriptions disguises the shortcomings of missing data. The descriptive statements of the different characters appear equivalent, when in reality the samples on which they are based may differ considerably, from one to many hundreds of individuals. This is a consequence of the way the taxonomist aggregated the specimens initially into groups. Specimens differ considerably in the characters they present, even when fulfilling the normal requirement of including vegetative and reproductive organs in the same collection. One collection may have flowers but no fruits, and another fruits but no flowers and these organs may be at different stages of development. Some vegetative organs, e.g. tubers, may have been collected in only a few cases. In the initial phase of subjective grouping, the taxonomist makes judgements of similarity among the specimens and decides which ones will form the group of individuals to be described. Thus even if only a single fruiting specimen is available in the specimen set, provided its inclusion is regarded as reliable on the basis of the other characters it exhibits, then it may be used to contribute data on fruits to the taxonomic description.

This mental construction is in itself a model and in its way is flexible and effective. We implicitly make the assumption that statements about one character — based on observations recorded from 50 individuals — are equivalent to those about a different character based on, say, five individuals. We take the typical values of two such characters to be equivalent, despite the differing probabilities that they are, given the difference in their sample sizes (Jardine & Sibson 1970). If we were to model this computationally it could be seen as a question of generating simulated data for the empty matrix cells, based on a set of assumed probability distributions for the characters. While this looks like cheating, in fact it is no more than an explicit model of our intuitive expectations stimulated by the text. This kind of approach could be pushed further. A description could be used as a framework for constructing a model in which every character had a hypothesised probability distribution, parametrised by the available cited specimens and other verifiable data. This could be used to generate a set of virtual individuals for simulating populations of taxonomic species described from very few individuals. Modelling species in this way is comparable to ecological niche modelling in which geographical and ecological properties parametrised on real locations of species individuals are used to extend the known areas of occurrence of species in both space and time, based on probabilistic simulation.
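The idea of generating simulated values for empty matrix cells can be sketched as follows. This is a deliberately simple illustration under stated assumptions: each quantitative character is treated as independently normally distributed, the matrix values are invented, and the spread assumed for a character with a single observation (ASSUMED_CV) is an arbitrary placeholder that a real study would have to justify, for example from related species.

```python
import random
import statistics

# Hypothetical character matrix: None marks a character that could not be
# observed on that specimen. All values are invented.
matrix = {
    "leaf_length_cm": [10.2, 11.0, 9.5, 10.8, None],
    "fruit_length_cm": [None, None, 2.1, None, None],  # one fruiting specimen
}

ASSUMED_CV = 0.15  # assumed coefficient of variation when fewer than 2 observations


def fit_normal(values):
    """Return (mean, sd, n) for the observed values of one character.

    Assumes at least one observation exists for the character.
    """
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    if len(observed) >= 2:
        sd = statistics.stdev(observed)
    else:
        sd = ASSUMED_CV * mean  # spread borrowed from an assumption, not data
    return mean, sd, len(observed)


def simulate_missing(data, seed=0):
    """Fill empty cells with draws from the fitted normal; keep observed cells."""
    rng = random.Random(seed)
    filled = {}
    for character, values in data.items():
        mean, sd, _ = fit_normal(values)
        filled[character] = [
            v if v is not None else max(0.0, rng.gauss(mean, sd)) for v in values
        ]
    return filled


filled = simulate_missing(matrix)
for character, values in matrix.items():
    print(character, "n observed =", fit_normal(values)[2])
```

Reporting the number of observations alongside each character makes explicit the inequality of sample sizes that the textual description conceals.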

character types. Another obstacle to transforming taxonomic descriptions into matrices is the mixture of qualitative and quantitative data types (Abbott et al. 1985; Podani 2000). Most computational methods used in systematics require either qualitative or quantitative data and cannot deal with both. Various tactics have been adopted in the past to circumvent this difficulty, including division of the data set into qualitative and quantitative subsets or the conversion of quantitative variables into qualitative characters or vice versa. Gower (1971) created a coefficient which allowed the combination of quantitative and qualitative data into similarity or distance measures and this has been widely used (Legendre & Legendre 1998: 258 – 260; Gordon 1999). Podani (1999) developed Gower's coefficient further so that ordinal data could be treated as a specific data type. However, modern data-mining techniques offer a greater range of approaches, some of which can handle mixed data, e.g. Hastie et al. (2009), James et al. (2013). Modelling taxonomic species in the future is likely to focus on adapting these more recent methods.
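For readers unfamiliar with Gower's coefficient, the following minimal sketch shows its basic form for two specimens scored on a mixture of quantitative and qualitative characters. Quantitative characters contribute one minus the absolute difference divided by the character's range, qualitative characters contribute 1 for a match and 0 otherwise, and characters missing in either specimen are left out of the average. The characters, values and ranges are invented, and equal character weights are assumed.

```python
def gower_similarity(a, b, kinds, ranges):
    """Gower similarity between two specimens scored on mixed characters.

    kinds[k] is "quant" or "qual"; ranges[k] is the range of character k in
    the whole data set (ignored for qualitative characters). Characters scored
    None in either specimen are skipped. Returns None if nothing is comparable.
    """
    total = 0.0
    compared = 0
    for x, y, kind, char_range in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # missing in at least one specimen: no contribution
        if kind == "quant":
            score = 1.0 - abs(x - y) / char_range if char_range else 1.0
        else:
            score = 1.0 if x == y else 0.0
        total += score
        compared += 1
    return total / compared if compared else None


# Hypothetical characters: leaf length (cm), leaf arrangement, fruit length (cm).
kinds = ("quant", "qual", "quant")
ranges = (10.0, None, 4.0)
specimen_1 = (12.0, "opposite", 3.0)
specimen_2 = (7.0, "opposite", None)  # no fruit on this specimen

similarity = gower_similarity(specimen_1, specimen_2, kinds, ranges)
print(similarity)
```

Here the leaf lengths contribute 1 - 5/10 = 0.5, the matching leaf arrangement contributes 1, and fruit length is skipped, giving a similarity of 0.75 over the two comparable characters.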

Updating and optimising data sets that represent species taxon concepts

The primary purpose of data sets representing STCs is not so much to replace the traditional workflow, but to add to it 1) a more convenient way to update the STC as new collections accumulate, 2) the computational comparison of STCs with data sets from other fields and 3) the optimisation of STCs in the light of reciprocal illumination, i.e. necessary changes to the STC caused by insights from other data (genetic, ecological, etc.).

Alpha taxonomy is a continual work-in-progress, a constant workflow, not merely the description of new species. Operationality implies that the STCs should be maintained in the public domain and constantly updated. New individuals and new characters are recruited as research progresses. The growth of the data sets means that they must be regularly re-analysed to optimise the partitions of individuals that correspond to taxonomic species. Reverse-engineering the data sets back into textual descriptions is also needed, as pioneered by the DELTA project (Dallwitz 2018), since the demand for this format will surely remain. This implies that with time, online data sets will replace text descriptions as the foundational scientific basis of the STCs, and text descriptions will take on the role of products derived from them.

Seeking a running optimum for the partition of individuals into groups based largely on morphological data and using computational methods would not make taxonomic species any less provisional within the context of biology as a whole. It is a heuristic and its optimality is confined to the data on which it is based. Its continued justification lies in the fact that at least for complex organisms like angiosperms, phenotypic STCs bridge the gap between the steadily growing complexities of evolutionary species (in the sense of Hey 2001) as revealed by evolutionary biologists, and the wider scientific and general public who look to taxonomists to provide them with a comprehensible library of science-based species (Garnett et al. 2020; Grace et al. 2021).

Automated plant species identification

The rapidly expanding field of automatic species identification (MacLeod 2007; Cope et al. 2012; Lee et al. 2017; Remagnino et al. 2017; Wäldchen & Mäder 2017 – 2018; Wäldchen et al. 2018) depends on the existence of training sets of images of plants or plant parts which have been reliably determined to species by expert taxonomists, and thus on species taxon concepts. These techniques are a response to an expanding need for species identification from many different parts of society.

The development of deep learning convolutional neural networks (CNNs) has made possible very high levels of accuracy using completely automatic processing of images. The most advanced techniques do not require the structural and terminological classification of plant morphology and its taxonomic characters; instead the software builds feature vectors directly from the images supplied (Remagnino et al. 2017; Wäldchen et al. 2018). The crucial point is that the images must have been grouped taxonomically prior to processing. The increased implementation of such advanced approaches to identification is likely to generate more demand for expert taxonomic indexing of the training sets used. If morphological data continue to play a role in macrobiological species taxon concepts, as argued here, then operationalising them online could help meet such a demand.

3. Online publication outputs from alpha taxonomy

In addition to the conventional publication of alpha taxonomy in books and journal papers there is also a need for at least some of the products of taxonomic research to be available in other, more readily assimilable formats. It is evident that traditional formats are unsuited to mobilising those parts of taxonomic outputs which are of most immediate practical use to other scientists (Pante et al. 2015). I list here some products which might be published online routinely as part of the workflow of alpha taxonomy:

Specimen character data sets with standardised authorship, date of publication and other relevant metadata

Traditional STC description is a flexible format which does not depend on the availability of complete data for each specimen. It does not, however, easily allow incremental accumulation of observations. In conventional practice, each revision of an STC requires the taxonomist to start from scratch, newly scoring the characters from the same and other specimens. Like any other scientific observation-gathering exercise, scoring characters can be done well or badly. The value of publishing specimen character vectors is that it would allow taxonomists to check the work of their predecessors when necessary, or have a rational basis on which to accept previously recorded observations. In this way, high quality data sets can be accumulated (e.g. Henderson 2011; Kissling et al. 2019), which can include systematic sampling of plant characters by digital image capture. As refined products these are scientific outputs at least as valuable as conventional papers.
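As an illustration of how published specimen character vectors could accumulate across revisions, the following is a minimal Python sketch, not a proposed standard. The class `CharacterVector`, its field names, the specimen identifier and the character names are all hypothetical. The only ideas taken from the text are that each set of observations carries authorship and date metadata, that missing data are permitted, and that later workers can check earlier scores.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CharacterVector:
    """One scorer's observations of one specimen (all names hypothetical)."""
    specimen_id: str   # e.g. a herbarium barcode
    scored_by: str     # authorship of the observations
    published: str     # date of publication, ISO format
    # Character name -> observed value; None marks a character that
    # could not be scored on this specimen (missing data are allowed).
    scores: Dict[str, Optional[float]] = field(default_factory=dict)


def accumulate(records: List[CharacterVector]) -> Dict[str, Dict[str, List[float]]]:
    """Pool scored values per specimen and character, skipping missing data."""
    pooled: Dict[str, Dict[str, List[float]]] = {}
    for rec in records:
        by_char = pooled.setdefault(rec.specimen_id, {})
        for name, value in rec.scores.items():
            if value is not None:
                by_char.setdefault(name, []).append(value)
    return pooled


def disagreements(pooled: Dict[str, Dict[str, List[float]]],
                  tolerance: float = 0.0) -> List[Tuple[str, str]]:
    """Return (specimen, character) pairs whose scores differ by more than tolerance."""
    return sorted(
        (spec, name)
        for spec, by_char in pooled.items()
        for name, values in by_char.items()
        if max(values) - min(values) > tolerance
    )


# Two hypothetical scorers record the same specimen in different revisions.
earlier = CharacterVector("SPEC-0001", "Taxonomist A", "2011-05-01",
                          {"leaf_length_cm": 12.0, "spathe_length_cm": None})
later = CharacterVector("SPEC-0001", "Taxonomist B", "2019-03-15",
                        {"leaf_length_cm": 14.5, "spathe_length_cm": 8.0})

pooled = accumulate([earlier, later])
print(pooled["SPEC-0001"]["leaf_length_cm"])   # [12.0, 14.5]
print(disagreements(pooled, tolerance=1.0))    # [('SPEC-0001', 'leaf_length_cm')]
```

Keeping each scorer's record immutable and pooling only at query time means that an earlier observation is never overwritten, so a reviser can either accept it or re-score the specimen and let the discrepancy show.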

Geographical point data sets with standardised authorship, date of publication and other relevant metadata

Geographical point data already has a general global framework (GBIF 2021 — the Global Biodiversity Information Facility), but its intrinsic importance would be better served if it were a standard output of taxonomic revision because its value depends on production and validation by a relevant expert. Each geographical point, representing a preserved specimen combining a validated geographical location and date and an authoritative taxonomic determination linked to an STC, is a permanent datum with high value. It is probably the most instantly useful output of a taxonomic study from the standpoint of other scientists. In environmental studies which collect chemical and physical data e.g. from the atmosphere or the oceans, such data points are generated by automatic data gathering stations. In the case of taxonomic-geographical data, each point requires expert human intervention. Online publication of such high quality data serves a need and deserves recognition.
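The components of a single geographical point named above (a preserved specimen, a validated location and date, and a determination linked to an STC) can be sketched as a small record type. This is an illustrative assumption only: the class `GeoPoint`, its field names and its checks are hypothetical and do not reproduce the GBIF data model or any published standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class GeoPoint:
    """One specimen-based occurrence record (hypothetical field names)."""
    specimen_id: str     # identifier of the preserved specimen
    latitude: float      # decimal degrees, -90 to 90
    longitude: float     # decimal degrees, -180 to 180
    collected: date      # collection date
    determined_as: str   # species name applied by the expert
    stc_reference: str   # citation of the STC the determination follows
    determined_by: str   # the expert responsible for the determination

    def __post_init__(self) -> None:
        # Basic range checks; expert validation of the locality itself
        # cannot be automated and is not attempted here.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")
        if not self.stc_reference.strip():
            raise ValueError("a determination must cite the STC it follows")


# A hypothetical record; every value below is invented for illustration.
point = GeoPoint("SPEC-0001", -12.5, -41.3, date(1998, 7, 4),
                 "Genus species", "Author (2011)", "A. Taxonomist")
print(point.determined_as, point.stc_reference)   # Genus species Author (2011)
```

Requiring a non-empty `stc_reference` reflects the point made in the text that a name alone is not enough: the record states whose concept of the species the determination follows.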

Computational analyses used

Reproducible research methods, e.g. complete documentation of analyses using R Markdown (Xie et al. 2018), would help researchers to refine and improve pattern analyses of online data sets. This would strengthen the objective basis of STCs and make it easier for taxonomists to develop the skills of computational analysis by copying the approaches of other workers and using their data for training. As with online publication of data sets, full metadata including authorship, date of publication, etc. is needed so that authors receive due credit without sacrificing ease of access for other workers.

Document specimens cited in published species taxon concepts with online images or links to repositories where they are available

Online images allow taxonomists to check the data given in descriptions and diagnoses (including key entries), add new data to STCs and check the label data transcriptions provided in databases, which are often incomplete or error-prone, or omit important information such as recognition of the handwriting of earlier specialists.

4. A standard reference system for species taxon concepts

For centuries, an explicit goal of taxonomists has been to create a world reference system in which all known plant species are described. This is still a goal of systematists, and an expectation both of other scientists, who depend on taxonomic identifications, and of the general public, who require a descriptive explanation of biodiversity. The idea of a general system has motivated the construction of the comprehensive online taxonomic databases mentioned earlier, as well as more recent major international research projects like APG (Angiosperm Phylogeny Group 2016; Stevens 2001 onwards) and PAFTOL (Baker et al. 2019). The rapid growth of e-taxonomy websites focusing on individual groups and regions has been inspired by the potential for taxonomists to collaborate on monographs of individual taxa; some of these sites are mentioned here. Smith et al. (2012) provide access to hundreds of Scratchpad monographic sites, including CATE-Araceae (2021, Araceae; see also Haigh et al. 2008); some especially notable sites are GrassBase (Clayton et al. 2006 onwards; see also Vorontsova et al. 2015), Solanaceae Source (PBI Solanum Project 2021), Palmweb (2020, Arecaceae), Kilian et al. (2021, Cichorieae-Asteraceae), Borsch et al. (2015, Caryophyllales), Carvalho (2013 onwards, Caricaceae), and Wilkie et al. (2008 onwards, Sapotaceae), among others that could be mentioned. Two important sites with major regional treatments that include species descriptions are Flora do Brasil (2020) and eFloras (2021, including North America, China and Chile).

The notion of a world reference system goes back a long way (e.g. Linnaeus 1753) and formerly resembled, in imagination at least, a monumental edifice something like a pyramid in continual construction, where successive generations of taxonomists labour to set their chiselled stones for posterity. The modern version imagines the world reference system transformed from printed literature into a system of interoperable websites distributed throughout the world. This is an advance on its predecessor in offering the prospect of constant updating and cheap and rapid dissemination of the resulting information, the lack of which is a prime defect of placing reference systems in the public domain through the print medium alone.

However, advances in evolutionary biological research have placed another obstacle in the path of achieving the world system. There is widespread dissatisfaction with traditional, morphology-based species taxon concepts of the kind which make up the overwhelming majority of taxonomic species in circulation, and indeed of current production (see any plant taxonomic journal or recent Flora). Viewed from the perspective of a conservation scientist, population geneticist, ecological niche modeller and population or species-level phylogeneticist, the classical angiosperm species taxon concept can appear outmoded. Bateman (2022), for example, presents a critique of the inadequacy of classical plant species taxon concepts in orchid taxonomy, and argues the need for species taxa to be based on statistically adequate population-level sampling, and to result from the analytical congruence of different data types, at least morphological and molecular genetic. The results of research on evolutionary groups (in the sense of Hey 2001) often show little correlation with published STCs and the dependence on morphology is viewed with scepticism unless backed up with genetic evidence. Motivation among scientists for uploading the existing assemblage of hundreds of thousands of STCs onto the internet is consequently also weak; the feeling is that species should be more robust as evolutionary hypotheses before being presented as scientifically established entities.

Integrative species delimitation is a response to this sense of the lack of scientific credibility of morphology-based STCs. It is driven by the view that formally named and delimited species should be based on as wide a range of data as necessary, and is justified because molecular data at population scales have much greater interpretative power, given the background of population genetic theory. Systematists are understandably keen to base new STCs on molecular genetic data whenever this is possible.

This situation can be compared with the progress of molecular phylogenetics (Angiosperm Phylogeny Group 2016; Baker et al. 2019). Here, the ordinal, familial, and increasingly the generic structure of angiosperm classification has undergone a transformation whereby molecular genetic data now provide the machine language of the supraspecific classification, and the morphological patterns which are then fitted onto the molecular clades can be thought of as a high-level language that serves to make the tree of life comprehensible to society at large (Judd et al. 2016). However, ultimately this overall phylogenetic hierarchy is based on classical STCs, which comprise the sampling frame for all such studies. Supraspecific clades can be built analytically using Hennigian theory, but species themselves are not analytically tractable without a single acceptable definition of a species, which remains beyond our grasp (Zachos 2016). Evolutionary hypotheses require the prior hypotheses of taxonomic species, and the latter are still mostly constructed by the process described earlier. Taxonomic species thus have the role of axioms — prior assumptions necessary for downstream research, within the overall workflow of systematics and evolutionary biology.

In point of fact, species are established by convention, even if this may not be immediately obvious. Taxonomists tend to specialise in particular groups or floristic regions. As there is always much more to do than taxonomists to do it, their species taxa are for the most part uncontested. The convention is to accept the species of a taxonomist unless there is reason not to do so. The increase in ways of proposing species groups that has arisen from evolutionary biological studies brings an expanded requirement for convention. Since species delimitation has no single theory-based analytic solution, and since different delimitations are being proposed in some cases, based on different data types, there is a need for systematists to find ways of homing in on consensus delimitations, at least in regard to the "world system" that is to be presented to the public as the library of biodiversity (footnote 11).

Criticisms of consensus taxonomy hold that restrictions on the independence of researchers should not be encouraged, and express a suspicion that consensus will impose rigid bureaucracy on creativity and discovery (e.g. Carvalho et al. 2014; Garnett & Christidis 2017; Thomas et al. 2018; Garnett et al. 2020). On the other hand, the need for an overall world system of species is universally recognised, and in itself represents no threat to independent research (Godfray et al. 2007; Scoble et al. 2007; Clark et al. 2009). An analogy might be made with the Periodic Table of chemistry (Scerri 2020) and its relationship to the discoveries of particle physics; these two fields appear to coexist, presumably because their primary aims are not the same and they complement each other in forming a bridge from society in general to the frontiers of current research.

There is general acceptance that genetic data should play a key role in discerning and describing species and this might be taken to be the fundamental goal of integrative species taxonomy. How effectively is genetic data being integrated with morphological STCs? Integrative taxonomy is downstream from alpha taxonomy. Studies usually take existing published STCs as an initial framework (by taxonomic determination of the study materials), generate new data and search algorithmically for groups which are then recognised as species. In the search for the optimal grouping pattern, the data that formed the basis of the previous taxonomist's morphological STCs plays little part, beyond labelling the individuals and perhaps providing some prominent diagnostic characters. But in fact, morphological STCs are elaborate concepts built on the assessment of complex variation, not just names and simple definitions. As we have seen earlier, the patterns of morphological variation inherent in the STCs, derived from the specimens on which they are based, remain locked away and inaccessible because of the traditional format of an STC. Potential patterns of consilience between morphological, genetic and ecological data may thus be missed. The aims of integrative taxonomy would therefore be well-served by operationalising morphological/phenotypic STCs. Numerous published morphology-based phenetic and cladistic analyses have already provided data sets, although many are not readily available. However, these kinds of studies only rarely set out to represent the STC itself using the canonical character set of the monographer. Instead, they are often the result of a sampling workflow similar to that of, e.g. a molecular systematic study; first the study material is identified using classical intuitive procedures, and then sampled for the specific goals of the study. That is to say, the foundational concept of the species taxon remains in the background and is not fully articulated.

Another benefit of the computability of morphological STCs would be to facilitate reverse-engineering from a newly minted integrated STC. Species taxon concepts resulting from integrative analyses would then be automatically convertible into morphological avatars in standard format as the basis for formalised descriptions and nomenclature (as in the DELTA system, Dallwitz 2018; see footnote 12). A workflow of this nature could obviate the need to find molecular formats for formal species descriptions, at least in higher plants, leaving intact the current system of publication and relieving evolutionary biologists of the need to learn the specialised protocols of descriptive alpha taxonomy. Species taxonomy (species-as-taxa) and evolutionary biology (species-as-evolutionary-groups) would be combined in a single workflow without the need to propose drastic changes to current formal taxonomic and nomenclatural practice.

Realistically, the majority of STCs are likely to remain morphology-based for the foreseeable future while integrative taxonomy slowly penetrates the system at species level (Pante et al. 2015). Completing this is a very big and complex task, but recently there have been calls by systematists for urgent action in respect of consensus species delimitation and the production of monographs. These proposals are driven by the increasingly acute crisis of biodiversity loss and climate change. Grace et al. (2021) set out these challenges and advocate the accelerated construction of monographs targeted on priority taxa and based on large-scale molecular sequencing. Muñoz-Rodríguez et al. (2019), based on ideas proposed earlier by Scotland & Wood (2012), present a recent working example of such a monograph, as does Bateman (2018, 2022). At the same time, Garnett et al. (2020) draw attention to major issues that need to be resolved in creating a robust scientific framework for a system of the world's species that is acceptable to both taxonomists and their users. Earlier, Bateman (2011) presented a searching analysis of the role of descriptive taxonomy in the light of climate change. Working in parallel to these synthesising ideas, an ever-greater number of cryptic species is being discovered by molecular systematics research (Monro & Mayo 2022).

The nature of scientific endeavour implies innovation and change in ideas and conceptual frameworks, but the complexity of grappling with the biodiversity crisis requires the taxonomic framework for species to be made more easily accessible. One message of this review is that there is already such a framework — the species taxon concepts currently regarded as accepted by taxonomists. Although good only in parts (like the parson's egg), this is in fact the framework that everybody is using, dispersed across a largely hard-copy literature. The foundations of a distributed online version of this framework are already visible in the form of databases such as Catalogue of Life (Bánki et al. 2021), Plants of the World Online (POWO 2021), World Flora Online (WFO 2021), and Leipzig Catalogue of Vascular Plants (Freiberg et al. 2020), but for the most part the STCs themselves still need to be uploaded. As this progresses, the needs of integrative taxonomy will, I believe, require the development of computable formats for traditional phenotypic STCs, so that what has already been achieved by taxonomists can be utilised fully in the future. This in turn will require methodological innovation and increased effort in training taxonomists. The idea is to move further towards the collective goal that all systematists probably share: how are we going to produce a global system for species biodiversity which is scientifically sound, available to all and kept up to date?

Glossary

alpha taxonomy: see classical taxonomy. Alpha taxonomy is a term coined by Turrill (1938): " 'Traditional', 'orthodox', or 'alpha' taxonomy is based on morphology ...". Davis & Heywood (1963: 3 – 4) elaborate this further: "[The pioneer and consolidation phases of the classification of the world's flora] correspond to the 'alpha' classification of Turrill ..., a preliminary classification based almost entirely on external morphology." Although these authors were referring to the taxonomic hierarchy as a whole, my focus is on species, which from a cognitive standpoint can be said to represent the most basic level of categorisation (Rosch 1978; Atran 1990).

basic-level groups/categories: Rosch et al. (1976) proposed that "In taxonomies of concrete objects, there is one level of abstraction at which the most basic category cuts are made. Basic categories are those which carry the most information … and are … the most differentiated from one another." Murphy (2002: 210) explains basic level categories thus: "Of all the possible categories in a hierarchy to which a concept belongs, a middle level of specificity, the basic level, is the most natural, preferred level at which to conceptually carve up the world. The basic level can be seen as a compromise between the accuracy of classification at a maximally general level and the predictive power of a maximally specific level".

category: see also concept. Murphy (2002: 5) understands categories as classes of objects, as distinct from a mental (subjective) representation (i.e. a concept). Categories are said to instantiate, i.e. make manifest, mental concepts and in this sense they can be understood to be more objective than concepts-as-mental-representations. A category can also be understood as equivalent to the extension of a class of objects, i.e. the collection of objects themselves, as distinct from the intension of the class, which is the definition or list of criteria that define an object's membership of the class.

In systematics, however, the species category means the collection or class of all taxa deemed to be species. The widely discussed range of species concepts (Zachos 2016) consists of various definitions of the species category.

categorisation: "Categorisation means assigning objects to concepts" (Hannan et al. 2019: 97).

class: this term is used here loosely and interchangeably with group, although in logic and philosophy a class is more strictly treated as a collection of entities which share a unique set of necessary and sufficient properties.

classical taxonomy: used here for the construction of species taxon concepts, as described in the main text, in which descriptions and diagnoses use plant morphology as the primary source of character data, naming rules follow the Code, and the species are based on and verifiable from permanent reference collections of herbarium specimens, in the tradition established by Linnaeus. This paradigm is often referred to as Linnean taxonomy, at least for species, but might better be called Candollean taxonomy, since both the data and the structure of STCs were established in their modern format by A. P. de Candolle (1818), and the nomenclatural Code was established by his son Alphonse in 1867. Other terms expressing much the same idea are traditional, alpha and orthodox taxonomy.

classification: according to Jeffrey (1982), classification is "the assignment of like objects to recognisable groups". In computer-driven pattern recognition, classification means the same thing as identification in biological systematics, which implies the pre-existence of the groups to which the objects are assigned. In systematics, however, classification means both the creation of a hierarchy of classes and the hierarchical system itself; the word's Latin roots (classis, facio) mean the creation of classes. In this review I have sometimes referred to the system of species as a classification, without reference to a hierarchy, although partition would be a more strictly correct term for this meaning.

clustering: see discern

concept: This term is used here in more than one sense. The first is for describing the mental representation of taxonomic species, and here I use the term concept in the sense most commonly understood today by cognitive psychologists (e.g. Smith & Medin 1981; Murphy 2002: 5; Hannan et al. 2019: 97), which excludes strict definition. This kind of concept is a purely subjective representation which we use to aggregate objects intuitively into categories. Murphy (2002: 5): "In general, I try to use the word concepts to talk about mental representations of classes of things, ..." Estes (1994: 241) gives this definition: "a general idea or understanding, especially one derived from specific instances ... and taking the form of a knowledge structure that enables or mediates categorisations". He holds that the mark of having mastered a concept is the ability to categorise objects or events of a domain in ways that could not be accomplished in the absence of the concept. Just how concepts are represented mentally is still controversial in cognitive psychology, so that to some extent it has to be treated as a black box — something that comes into existence in our minds, acquires a name, and allows us to categorise objects (mental and concrete), but whose detailed cognitive structure remains mysterious.

Second, the terms species taxon concept and taxon concept (see entries) have a more complex and objectified structure and include the hypodigm (cited specimens), which is a category.

Third, species concepts in systematics are definitions of the species category (see entry for category) and are intended to be precise, i.e. members must possess a limited number of necessary and sufficient properties specified by each definition. This corresponds to the classical concept of cognitive psychologists and seeks to attain something close to the rigour of logical classes or mathematical sets.

consilience: according to Wikipedia (https://en.wikipedia.org/wiki/Consilience, 9 Sept. 2021), "consilience … is the principle that evidence from independent, unrelated sources can "converge" on strong conclusions." This conforms to the lengthier discussions of Wilson (1998) and Bookstein (2014).

delimitation: concerns finding the boundaries of a group, and implies the prior existence of that group even if only vaguely conceived, i.e. delimitation is downstream from discernment.

determination: is a technical term used by taxonomists and is very similar to identification but with the added meaning that the operation concludes with the application of the correct name to the taxon.

diachronic: of comparisons at different periods.

dialectic: discussion and reasoning by dialogue as a method of intellectual investigation; this is one of various definitions given by Merriam-Webster (https://www.merriam-webster.com/dictionary/dialectic).

discern: There are a number of words (delimit, discern, discover, distinguish, identify, match, perceive, recognise, reveal, typologise) which are used variously and inconsistently to express either or both 1) the initial segmentation of objects into groups, or 2) the determination (identification, classification) of an individual as a member of a predefined group; in both cases the naming of the resulting groups takes place after their discovery, discernment, delimitation, etc.

According to the taxonomic species workflow described in this paper, the first stage corresponds to the formation of the NTC, i.e. a subjective mental concept that emerges, as it were, from a mist of variation previously undifferentiable into discernible groups. I use discern as a very general term for this mental act, although it could well be argued that distinguish, discover, perceive or reveal serve this purpose equally well. Delimit and typologise are more specific, the former implying discernment by drawing a boundary, and the latter by establishing a central representation of some sort. Identify and match both imply the subjective recognition of sameness between entities that could be either groups or individuals, but without more specific conditions. Recognise implies the prior cognitive existence of some group to which the object in question belongs. Clustering is a process similar to discernment but with the implication that it is the result of mathematical operations.

It may be relevant to this discussion that every species is discerned within a cognitive framework (universe of discourse — UD) that includes other individuals and groups and so discernment of taxa is always a comparative process, even when this is happening intuitively. Conversely, single species cannot be discerned alone; there is always a cognitive background of pre-existing group concepts.

discover: (of a species) implies the uncovering of a pre-existing entity (dis-cover). It therefore carries a rather "hard" idea of how organisms are formed into groups, and differs from discernment of a species by downplaying the distorting interference of taxonomists' own cognitive apparatus in picking out species from the diversity of individual organisms.

epistemology, epistemological: the study or theory of the nature and grounds of knowledge, especially with reference to its limits and validity (https://www.merriam-webster.com/dictionary/epistemology). In our context, the question of what sort of evidence is needed to show that species exist is an epistemological one.

extension: the set of things denoted by a term (Richards 2010: 186). Equivalent to category as used here. In the present context the extension of a named species taxon concept would correspond to all the individuals included by its author.

group: used here very generally, meaning any collection of objects, with or without a definition.

hypodigm: Simpson's definition (Simpson 1961: 185; see also Simpson 1940) is: "The hypodigm of a given taxonomist at a given time and for a given taxon consists of all the specimens personally known to him at that time, considered by him to be unequivocal members of that taxon, and used collectively as the sample on which his inferences as to the population are based". This can be taken to correspond to the specimens cited by an author in a monographic-level STC. Other formats (e.g. Flora treatments) may editorially restrict the number of published specimen citations allowed, and then establishing the hypodigm of a given STC can become more difficult, requiring a search for herbarium specimens determined by the author at the relevant period.

identification: is commonly used to mean 1) the same as determination, that is, the act of assigning an individual organism to a pre-existing group which may or may not be named. Identification can also mean 2) the same as matching ("identity parade", Jeffrey 1982), a meaning derived lexically as identi-fication from Latin identitas, facere = to make identical. This suggests that the meaning of identification is not restricted to assigning an individual to a class (the usual sense in taxonomy) but also includes the sense of matching one individual with another. Confusingly, computational pattern analysts use the word "classification" for meaning 1).

instantiation: this describes the process by which someone creates a category of objects by applying their mental concept to real world phenomena. An example would be the determination of a collection of plant specimens to a particular species. The set of specimens thus determined is the result of a process of categorisation by the botanist, and is an instantiation of their concept of the species. The individual specimens are then instances of the species.

intension: the meaning of a term as determined by its extension (Richards 2010: 186). Equivalent to the definition of a term, or a list of criteria that an object must satisfy to be assigned membership of the extension of the term. In the present context, the intension of a species name would correspond to the description and diagnosis of an STC with that binomial.

matching: describes a comparison of two individuals or objects which results in the decision that they are indistinguishable ("the same", Jeffrey 1982), or better, that they are so similar that they cannot be (easily) distinguished. Similarity is an undefinable and subjective term (Quine 1969; Rieppel 2006) but nevertheless understandable by everyone; it might be compared with Euclid's idea of common notions (Heath 1956: 155): ideas that are assumed a priori to be universally comprehended.

naming: is the process of applying a scientific name to an already recognised taxon. A species is first discerned (a subjective cognitive process), then described (a taxonomic process of objectification) and finally named (a nomenclatural process). Nomenclatural type specimens are the key tool used in assigning names to species.

neotypological concept of a species (NTC): Davis & Heywood (1963: 11): "… every taxonomist builds up what one might call a typological picture based on his experience of each species; this is derived from the material which he has seen in the field, herbarium, garden and laboratory and from figures and descriptions in botanical literature. He will thus come to a concept of the species which includes a greater or lesser amount of its actual variability. We might call this the neotypological concept …"

NTC: see neotypological concept of a species

ontology, ontological: a particular theory about the nature of being or the kinds of things that have existence (https://www.merriam-webster.com/dictionary/ontology). According to Hofweber (2017), "As a first approximation, ontology is the study of what there is"; in our context, the question whether species really exist is an ontological one.

orthodox taxonomy: see classical taxonomy

partition: a non-hierarchical division of individuals into mutually exclusive categories, e.g. specimens into taxonomic species.

polythetic group: a group in which none of the character states shared by its members are alone necessary or sufficient for membership of the group; see discussion in Sneath & Sokal (1973: 21 – 23).
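The definition of a polythetic group can be made concrete with a small worked check. The sketch below is an assumption-laden illustration rather than a method from the literature cited: specimens are reduced to sets of character states, the function names and state names are hypothetical, and "sufficient" is judged only within the sample of specimens supplied.

```python
from typing import Dict, Set


def necessary_states(members: Dict[str, Set[str]]) -> Set[str]:
    """States present in every member of the group."""
    return set.intersection(*members.values()) if members else set()


def sufficient_states(members: Dict[str, Set[str]],
                      non_members: Dict[str, Set[str]]) -> Set[str]:
    """States found in at least one member and in no sampled non-member."""
    inside = set().union(*members.values())
    outside = set().union(*non_members.values())
    return inside - outside


def is_polythetic(members: Dict[str, Set[str]],
                  non_members: Dict[str, Set[str]]) -> bool:
    """True if no single state is necessary or sufficient for membership."""
    return (not necessary_states(members)
            and not sufficient_states(members, non_members))


# Each member shares two of three states with the others, but no state is
# held by all members, and every state also occurs outside the group.
group = {
    "a": {"hairy_leaf", "red_fruit"},
    "b": {"red_fruit", "winged_stem"},
    "c": {"hairy_leaf", "winged_stem"},
}
others = {
    "x": {"hairy_leaf"},
    "y": {"red_fruit"},
    "z": {"winged_stem"},
}
print(is_polythetic(group, others))   # True

# By contrast, a group whose members all carry a state seen nowhere else
# is defined by that state and is not polythetic.
defined = {"p": {"blue_flower", "hairy_leaf"}, "q": {"blue_flower"}}
print(is_polythetic(defined, others))   # False
```

In the first example membership depends on the combination of states, which is why a description and diagnosis built from many characters can circumscribe a group that no single key character would.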

prototypical concept: a concept which has one or more typical members as its point of reference; "The prototypes of a concept are those with maximum typicality, or the "best examples"" (Hannan et al. 2019: 23).

recognise: see discern; both terms imply a subjective cognitive process but recognition implies reiteration, i.e. that the concept (usually a taxon) in question has previously been discerned.

reification, reify: see categorisation. From Latin res (thing) and facio, facere (to make), i.e. "to make into a thing". This describes the transformation of a purely mental concept into a concrete instantiation consisting of a category (class, set, group) of individual objects.

segmentation: is a term from pattern analysis which implies the division of some predefined space or universe of objects into subdivisions using an algorithm. The implication here is that subdivisions are made using clear criteria similar to Aristotelian class division or the structure of an identification key.

similarity: see matching

species taxon concept: a complex published statement consisting of nomenclature, description, diagnosis (including key entries), cited specimens, images and geographical and ecological characterisation, located in a particular, dated, published taxonomic treatment by a particular authorship.

STC: see species taxon concept

synchronic: of comparisons at the same period.

taxon concept: see species taxon concept

traditional taxonomy: see classical taxonomy

type, typology, typological, typologise: Typology concerns the recognition of the most typical individuals of a group, that is, those that are most central to its meaning. Typicality is a characteristic of groups in which the component members have differing degrees of membership, as in fuzzy sets (Zadeh 1982); the boundaries of such groups are also often poorly defined. Establishing typologies (typologising) of a range of objects (concrete or abstract) is a routine activity in many fields, e.g. linguistics, archaeology, psychology, aesthetics, etc. and involves setting up a series of groups with unique sets of properties into which individuals or objects in that domain can be classified. A type is thus a typical member of such a group and serves to characterise a group by expressing it as a central tendency or summary; the statistical notion of a mean or centroid is fundamentally the same idea (see footnote 8 on Whewell's oft-cited description of a type).
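The comparison of a type with a statistical mean or centroid can be shown with a short numerical sketch. It assumes, purely for illustration, that each specimen is represented by a vector of quantitative characters and that typicality is measured by unscaled Euclidean distance from the centroid; the function names, specimen labels and measurements are hypothetical, and in practice characters would usually need to be standardised first.

```python
from math import dist
from statistics import fmean
from typing import Dict, List, Sequence, Tuple


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Mean of each character across all specimens."""
    return [fmean(column) for column in zip(*vectors)]


def rank_by_typicality(specimens: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Order specimens from most to least typical (nearest the centroid first)."""
    centre = centroid(list(specimens.values()))
    distances = ((name, dist(vector, centre)) for name, vector in specimens.items())
    return sorted(distances, key=lambda pair: pair[1])


# Hypothetical measurements: (leaf length in cm, petiole length in cm).
specimens = {
    "s1": (10.0, 2.0),
    "s2": (12.0, 3.0),
    "s3": (20.0, 7.0),
}
print(centroid(list(specimens.values())))                    # [14.0, 4.0]
print([name for name, _ in rank_by_typicality(specimens)])   # ['s2', 's1', 's3']
```

Here specimen s2 lies closest to the centroid and would serve as the most typical member, while s3 is a peripheral member of the same group. Membership is graded by distance rather than decided by a sharp boundary.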

In systematics, the adjective typological is often used to refer to the taxa of classical botany, including STCs, with the intention of contrasting them with groups recognised by methods derived strictly from evolutionary theory, e.g. cytotypes, molecular genetic clades and cryptic species based on interbreeding barriers. This is in fact a false contrast, since such groups generally also involve typical members and vague boundaries, e.g. clades defined by synapomorphies which are not present in all derived lineages due to later homoplasy. When used judiciously, however, the term remains useful. In the present paper, the interpretation of taxonomic species as based on prototypical concepts leads inexorably to their recognition as typological concepts, because they result in categories (groups of individual plants) that have typical and atypical members and between which it can be impossible to draw a sharp boundary. However, the adjective typological carries a negative connotation in current biology, owing to its prominence in the "Essentialism Story" (e.g. Winsor 2003, 2006; Wilkins 2009; Witteveen 2015, 2016, 2018a, b). What this controversy brings into focus is the fact that the taxa of classical taxonomy have a strong subjective basis, with the consequence that species are strictly undefinable by analysis, although they can be established by convention.
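
The idea of a type as a central tendency, with members of differing typicality, can be illustrated with a minimal Python sketch. The measurements are hypothetical, and the exponential membership function and its scale value are assumptions chosen for simplicity; they are one of many possible ways of turning distance from a centroid into a graded degree of membership, not a method taken from this paper.

```python
import math


def centroid(points):
    """Mean of each character across all individuals (the 'type' as a summary)."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def typicality(point, centre, scale):
    """Graded membership in (0, 1]: 1 at the centroid, declining with distance.

    The exponential form and the scale parameter are illustrative assumptions.
    """
    return math.exp(-math.dist(point, centre) / scale)


# Hypothetical measurements for four individuals: [leaf length, petal count].
individuals = [[10, 2], [12, 2], [11, 2], [20, 6]]

centre = centroid(individuals)
scores = [typicality(p, centre, scale=5.0) for p in individuals]

most_typical = max(range(len(individuals)), key=lambda i: scores[i])
least_typical = min(range(len(individuals)), key=lambda i: scores[i])
print("centroid:", centre)
print("most typical individual:", individuals[most_typical])
print("least typical individual:", individuals[least_typical])
```

No individual need coincide with the centroid, and no score marks a natural cut-off between members and non-members, which is the sense in which such groups have typical and atypical members but no sharp boundary.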

type, nomenclatural: According to the current code of nomenclature (Turland et al. 2018), the nomenclatural type of a plant species is an herbarium specimen selected as such (holotype) by the original author and published with the original description: the species epithet will thenceforth be permanently attached to this specimen. Whatever subsequent taxonomic changes that species may undergo (e.g. merging with another species), this name-bearing specimen will always be the ultimate reference for the date of the name's valid publication, which is critical for deciding which name is the correct one to apply. This is the primary purpose of a nomenclatural type specimen, but it will not necessarily be typical of a given STC, since this depends on the extent of the hypodigm. Thus, if a taxonomist carries out a revision and merges three species previously recognised as distinct, a choice must be made as to which of the three names is the correct one to use for the newly delimited STC. In straightforward cases, the first name to have been published validly according to the rules becomes the correct name of the species, and its type automatically becomes the type specimen of the newly circumscribed species. There continues to be controversy as to what function nomenclatural types serve; see the discussion by Witteveen (2018b).
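
The straightforward case described above, in which the earliest validly published name takes priority when species are merged, can be sketched as follows. This is a deliberately simplified illustration: the names, dates and specimen labels are hypothetical, and the sketch ignores the many further provisions of the code of nomenclature that can override simple date priority.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PublishedName:
    name: str                      # hypothetical species name
    published: date                # date of publication
    type_specimen: str             # label of the name-bearing specimen
    validly_published: bool = True


def correct_name(candidates):
    """Return the earliest validly published name among merged species.

    Simplified assumption: only the publication date and validity matter.
    """
    valid = [c for c in candidates if c.validly_published]
    if not valid:
        raise ValueError("no validly published name among the candidates")
    return min(valid, key=lambda c: c.published)


# Three hypothetical species merged in a revision.
merged = [
    PublishedName("Genus alpha", date(1850, 3, 1), "holotype A"),
    PublishedName("Genus beta", date(1823, 6, 15), "holotype B"),
    PublishedName("Genus gamma", date(1801, 1, 1), "holotype C",
                  validly_published=False),
]

chosen = correct_name(merged)
print(chosen.name, "with type", chosen.type_specimen)
```

The selected name brings its own type specimen with it, whether or not that specimen is typical of the newly circumscribed species.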