Short and sweet: an analysis of the length of parasite species names

In its advice to taxonomists, the International Commission on Zoological Nomenclature (ICZN) recommends that scientific species names be compact, memorable, and easy to pronounce. Here, using a dataset of over 3000 species of parasitic helminths described in the past two decades, we investigate trends in the length of Latin specific names (i.e., epithets) chosen by taxonomists. Our results reveal no significant temporal change in the length of species epithets as a function of year of description, with annual averages fluctuating around the overall average length of just over 9 letters. We also found that the lengths of species epithets did not differ among the host taxa from which the parasites were recovered; however, acanthocephalan species have been given longer species epithets than other helminth taxa. Finally, although species epithets were shorter than genus names for three-quarters of the species in our dataset, we detected no relationship between the length of species epithets and that of genus names across all species included, i.e., there was no evidence that shorter species epithets are chosen to compensate for long genus names. We conclude by encouraging parasite taxonomists to follow the recommendations of the ICZN and choose species epithets that are, as much as possible, compact and easy to remember, pronounce and spell.

Supplementary Information: The online version contains supplementary material available at 10.1007/s11230-022-10058-0.


Introduction
From the 18th century, when Carl Linnaeus proposed his taxonomic classification scheme, to the central place it now occupies in the framework of the International Commission on Zoological Nomenclature (ICZN; https://www.iczn.org/), the Latin binomial system has been a cornerstone of taxonomy and of all efforts to inventory Earth's biodiversity. For ease of use, the ICZN and various commentators (e.g., Šlapeta 2013) recommend names that are compact, memorable, and/or easy to pronounce. However, many species names pose challenges to both scientists and lay people who must pronounce, remember or spell them. How well have taxonomists, including parasite taxonomists, followed these recommendations when naming newly discovered species?
There is much variation in the length of species names. According to Wikipedia (https://en.wikipedia.org/wiki/List_of_short_species_names), the shortest binomial species names (genus and species names combined) are 4 letters long, including the bat Ia io Thomas 1902 from tropical Asia. Species names cannot be any shorter than this. Also according to Wikipedia (https://en.wikipedia.org/wiki/List_of_long_species_names), the bacterium Myxococcus llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogochensis Chambers et al. 2020 (73 letters for genus and species names combined) had the longest binomial species name as of 2022. It is named after the site in Wales where it was discovered, which has one of the longest place names in the world. It is hardly easy to spell or pronounce, let alone remember. According to the same source, the parasite with the longest name is the trematode Epithelionematobothrium mulloidichthydis Yamaguti, 1970, with 39 letters; the nematode Hysterothylacium deardorffoverstreetorum Knoff et al. 2012 also has a name with 39 letters. Both are challenging to write or say out loud.
When choosing a name for a new species, taxonomists face a compromise. On the one hand, short and simple names should be more appealing because they are easy to remember. On the other hand, longer names may be necessary to fully capture whom or what the authors want to honour, or the information they want to convey about the species (where it was found, what it looks like, what host species it infects). It remains unclear whether any patterns in name length emerge from the vast number of parasitic helminth species named to date. Do species names differ in length among parasite taxa, or among the host taxa in which the parasites are found, possibly reflecting different traditions or preferences among the taxonomists working on those taxa? For each major parasite taxon, a small number of prolific researchers account for the vast majority of new species described and named, so we might expect their influence to be reflected in the names chosen. Also, has the average length of new species names changed over time, perhaps reflecting a growing preference for shorter names? Finally, are shorter species epithets more frequently chosen for species belonging to genera with long names, as an effort (perhaps subconscious) to keep the full binomial name within a reasonable length?
Here, we address the above questions using a large dataset on the names of helminth parasite species described since 2000. We focus exclusively on species epithets, not on genus names or full binomial names, because many genera were named well before the starting year of our dataset. In other words, except when a new genus is erected, the morphology and genetics of a new species determine which existing genus it belongs to; only the species epithet is chosen from scratch by the authors of the species description.

Methods
We used the dataset compiled by , which comprises information on each new species description of trematodes, cestodes, monogeneans, nematodes, and acanthocephalans published between 2000 and 2020, inclusively, in the following 8 journals: Acta Parasitologica (data from 2000-2005 missing for this journal), Comparative Parasitology, Folia Parasitologica, Journal of Helminthology, Journal of Parasitology, Parasitology International, Parasitology Research, and Systematic Parasitology. We updated this dataset with data from new species described in the same 8 journals in 2021. Although helminth descriptions are also published in other journals, these 8 journals capture a large proportion of published descriptions, and provide a large enough sample for the present analysis. The full dataset is provided as Supplementary Information.
For each species description, in addition to the Latin binomial name of the new species, the dataset includes the following information: (i) the higher taxon to which the parasite belongs (trematodes, cestodes, monogeneans, nematodes, or acanthocephalans); (ii) the host taxon it parasitises (invertebrates, mammals, birds, reptiles, amphibians, or fish including elasmobranchs); (iii) the number of letters in both the genus name and species epithet; (iv) the year of publication; and (v) the journal in which it was published.
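As a minimal sketch of the length measure in item (iii), assuming that "number of letters" means alphabetic characters only and that the genus and epithet are the first two words of the name string, the hypothetical helper below splits a binomial and counts the letters in each part. It is an illustration, not the script used to build the dataset.

```python
def letter_counts(name: str) -> tuple[int, int]:
    """Return (genus letters, epithet letters) for a 'Genus epithet ...' string.

    Hypothetical helper for illustration. Assumption: only the first two
    whitespace-separated words are used, so any trailing authority and year
    (e.g. 'Yamaguti, 1970') is ignored, and only alphabetic characters count.
    """
    genus, epithet = name.split()[:2]
    return sum(ch.isalpha() for ch in genus), sum(ch.isalpha() for ch in epithet)


for name in ["Ia io Thomas 1902",
             "Epithelionematobothrium mulloidichthydis Yamaguti, 1970",
             "Hysterothylacium deardorffoverstreetorum Knoff et al. 2012"]:
    genus_len, epithet_len = letter_counts(name)
    print(f"{name}: {genus_len} + {epithet_len} = {genus_len + epithet_len} letters")
```

Applied to the examples quoted in the Introduction, this gives 4 letters for Ia io and 39 for each of the two long parasite names.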
Our analysis tested for taxonomic or temporal patterns in the length of parasite species epithets, as well as for a relationship between the lengths of genus names and species epithets. For this, we used the length (number of letters) of species epithets as the response variable in a generalized linear mixed model (GLMM) with a Poisson distribution, fitted with the lme4 package (Bates et al. 2015) in the R computing environment (R Core Team 2022). The fixed effects (predictors) were the length of the genus name, the parasite's higher taxon (5 levels: trematodes, cestodes, monogeneans, nematodes, and acanthocephalans), the host taxon (6 levels: invertebrates, fish, amphibians, reptiles, birds and mammals), and the year of publication (2000 to 2021; ordered variable). For the two categorical factors, based on earlier pairwise analyses, 'acanthocephalan' was chosen as the reference level (included in the intercept) for parasite taxa because it tended to differ from other taxa, whereas 'amphibian' was chosen arbitrarily as the reference level for host taxa because no difference was seen among host taxa. Interactions were omitted from the model because the number of possible combinations was too large for meaningful interpretation. The journal in which the species was described was included as a random factor, to account for non-independence among species descriptions and for possible (though unlikely) editorial pressures creating consistent inter-journal differences in the length of species epithets.
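Written out, the model described above can be sketched as follows. The notation is ours rather than the authors', and the year term is left as a generic $f(\cdot)$ because the exact coding of the "ordered variable" is not stated. For species $i$ described in journal $j$, $Y_{ij}$ is the number of letters in the epithet, $G_{ij}$ the number of letters in the genus name, $\mathbf{p}_{ij}$ a vector of indicators for the four non-reference parasite taxa (acanthocephalans as reference), $\mathbf{h}_{ij}$ a vector of indicators for the five non-reference host taxa (amphibians as reference), and $u_j$ the journal random intercept:

$$
\begin{aligned}
Y_{ij} &\sim \operatorname{Poisson}(\lambda_{ij}),\\
\log \lambda_{ij} &= \beta_0 + \beta_1 G_{ij} + \boldsymbol{\gamma}^{\top}\mathbf{p}_{ij} + \boldsymbol{\delta}^{\top}\mathbf{h}_{ij} + f(\text{year}_{ij}) + u_j,\\
u_j &\sim \mathcal{N}\!\left(0,\ \sigma^{2}_{\text{journal}}\right).
\end{aligned}
$$

Because of the log link, each fixed effect acts multiplicatively on expected epithet length: holding the other predictors fixed, the coefficient $\gamma_k$ for parasite taxon $k$ multiplies the expected length by $e^{\gamma_k}$ relative to acanthocephalans.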

Results
Our 22-year dataset (2000-2021 inclusive) comprised 3016 species names, with monogeneans and nematodes accounting for the majority of species (Table 1). The lengths of species epithets followed an approximately Poisson distribution (Fig. 1), with an overall average length of 9.2 letters (range 3 to 20). The longest epithets are far from easy to pronounce or spell (Table 2). Several species epithets in our dataset were used for more than one species. The most popular were the 7-letter 'gibsoni' (used for 13 species), followed by 'brayi' (5 letters; 11 species) and 'vietnamensis' (12 letters; 9 species). These were treated as separate entries because they were chosen by their authors independently of each other, even if they share the same etymology (i.e., eponyms of the eminent taxonomists David Gibson and Rod Bray, and the country of collection). The full dataset is provided as Supplementary Information.
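The descriptive summaries reported here (mean, range, and reused epithets) take only a few lines of code. The sketch below is a minimal illustration on a small hypothetical list of epithets, not on the actual dataset; as in our analysis, each reuse of an epithet counts as a separate entry.

```python
from collections import Counter
from statistics import mean


def summarise_epithets(epithets):
    """Hypothetical helper: mean, min and max length, plus epithets used more than once."""
    lengths = [sum(ch.isalpha() for ch in e) for e in epithets]
    # most_common() orders epithets from most to least frequently used
    reused = {e: n for e, n in Counter(epithets).most_common() if n > 1}
    return mean(lengths), min(lengths), max(lengths), reused


# Hypothetical toy sample; the real data are in the Supplementary Information.
sample = ["gibsoni", "brayi", "gibsoni", "vietnamensis", "brayi", "gibsoni", "minutus"]
avg, shortest, longest, reused = summarise_epithets(sample)
print(f"mean {avg:.1f} letters, range {shortest}-{longest}, reused: {reused}")
```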
The GLMM results confirmed that the journal in which a species description was published accounted for a trivial proportion (<1%) of the remaining variance in the length of species epithets (Table 3). The findings also indicate that the species epithets of acanthocephalans are longer than those of species in other taxa; the difference was significant for trematodes, cestodes, and monogeneans, but fell just short of significance for nematodes (Table 3). On average, epithets of acanthocephalan species were about one letter longer than those of species in other taxa (Table 1). In contrast, the analysis revealed no significant variation in the length of species epithets among the host taxa from which the parasites were recovered.
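As a rough worked illustration of how a one-letter difference maps onto the log scale of a Poisson GLMM (our arithmetic, using round numbers rather than the estimates in Table 3): if species in other taxa average about 9.2 letters and acanthocephalans about 10.2, then, with acanthocephalans as the reference level, the coefficient $\beta_k$ for a non-acanthocephalan taxon $k$ would be roughly

$$
e^{\beta_k} \approx \frac{9.2}{10.2} \approx 0.90
\quad\Longrightarrow\quad
\beta_k \approx \ln 0.90 \approx -0.10 ,
$$

so a coefficient of about $-0.1$ corresponds to an epithet roughly 10%, or about one letter, shorter.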
The analysis also uncovered no effect of year of publication on the length of species epithets (Table 3). The annual average length of species epithets has fluctuated slightly over time around the overall average (Fig. 2), showing no evidence of any clear and consistent temporal trend.
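A quick descriptive check of the kind summarised in Fig. 2 is to average epithet length within each year and fit a straight line through the annual means. The sketch below does this on hypothetical toy records; note that this ordinary least-squares slope is a simpler, swapped-in summary, not the GLMM test reported in Table 3.

```python
from collections import defaultdict
from statistics import mean


def annual_means(records):
    """Hypothetical helper: records are (year, epithet_length) pairs."""
    by_year = defaultdict(list)
    for year, length in records:
        by_year[year].append(length)
    return {year: mean(lengths) for year, lengths in sorted(by_year.items())}


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (here: letters per year)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical toy records, not taken from the dataset.
records = [(2000, 9), (2000, 10), (2001, 8), (2001, 10), (2002, 9), (2002, 10)]
means = annual_means(records)
print(means, ols_slope(list(means), list(means.values())))
```

A slope close to zero, relative to the year-to-year scatter of the means, is what "no clear and consistent temporal trend" looks like in this simple summary.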
Finally, the GLMM found no relationship between the length of species epithets and that of genus names (Table 3). On average, the length of species epithets remains essentially constant regardless of the length of the genus name (Fig. 3). However, species epithets are generally shorter than genus names: the species epithet was shorter than the genus name for 2260 species (74.9% of cases), the same length for 238 species (7.9%), and longer for 518 species (17.2%).
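The three-way breakdown above is simple arithmetic on paired genus and epithet lengths. The hypothetical sketch below classifies each pair and converts counts to percentages of the total; the synthetic pairs in the usage example are constructed only to reproduce the reported counts (2260, 238 and 518 of 3016), not taken from the dataset.

```python
def compare_lengths(pairs):
    """Hypothetical helper: classify (genus_length, epithet_length) pairs by whether
    the epithet is shorter than, the same length as, or longer than the genus name."""
    counts = {"shorter": 0, "same": 0, "longer": 0}
    for genus_len, epithet_len in pairs:
        if epithet_len < genus_len:
            counts["shorter"] += 1
        elif epithet_len == genus_len:
            counts["same"] += 1
        else:
            counts["longer"] += 1
    total = sum(counts.values())
    percentages = {k: round(100 * v / total, 1) for k, v in counts.items()}
    return counts, percentages


# Synthetic pairs that only reproduce the counts reported in the Results.
pairs = [(10, 9)] * 2260 + [(9, 9)] * 238 + [(9, 10)] * 518
counts, percentages = compare_lengths(pairs)
print(counts, percentages)
```

With these counts the percentages come out at 74.9, 7.9 and 17.2, matching the text.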

Discussion
The inventory of parasite biodiversity on Earth is far from complete. The number of new parasite species discovered and described every year has been increasing in recent decades, and several hotspots of biodiversity are yet to be fully explored for the parasite species they harbour (Poulin & Presswell 2016; Jorge & Poulin 2018; Carlson et al. 2020). It is therefore a good time to re-examine some of the practices associated with the description of new parasite species, including the choice of species epithets.
Our findings indicate that, on average, the length of epithets given to newly described helminth species has not changed over the past two decades. However, we found a small but significant taxonomic bias: acanthocephalans are generally given slightly longer epithets than species in other helminth taxa. The reasons for this difference are unclear, but it may simply arise from the personal preferences of the most prolific taxonomists specialising in acanthocephalans. Interestingly, acanthocephalans also tend to have the longest genus names. In our dataset, counting every occurrence of a genus name separately, even when the same genus appears multiple times, genus names of acanthocephalans are 14.4 letters long on average, compared with 13.4 for monogeneans, 13.0 for cestodes, 12.2 for trematodes, and 11.3 for nematodes. Therefore, acanthocephalans generally have the longest binomial Latin names of all helminths. This should perhaps be taken into consideration when naming new acanthocephalan species in the future.
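Counting every occurrence of a genus, rather than each distinct genus once, can shift the average. The hypothetical sketch below computes both versions for a toy list of acanthocephalan genus names (Pomphorhynchus, 14 letters; Acanthocephalus, 15 letters); the averages quoted above use the per-occurrence version.

```python
from statistics import mean


def mean_genus_length(genera, per_occurrence=True):
    """Hypothetical helper: mean genus-name length in letters.

    per_occurrence=True counts a genus once for every new species assigned to it
    (the approach described in the text); False counts each distinct genus once.
    """
    names = genera if per_occurrence else set(genera)
    return mean(sum(ch.isalpha() for ch in g) for g in names)


# Hypothetical toy list: two new species in Pomphorhynchus, one in Acanthocephalus.
genera = ["Pomphorhynchus", "Pomphorhynchus", "Acanthocephalus"]
print(mean_genus_length(genera))                        # per occurrence
print(mean_genus_length(genera, per_occurrence=False))  # per distinct genus
```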
We also found no evidence of a negative relationship between the length of species epithets and that of genus names. Had we found one, such a relationship would have suggested an attempt, whether conscious or not, to compensate for very long genus names by choosing a short species epithet for new species assigned to such genera. Perhaps such efforts to pair long genus names with short epithets should be encouraged in the future, to keep the overall binomial name within a reasonable length.
Earlier, we used the same dataset to assign species epithets to 5 broad etymological categories, based on the source of inspiration used to name each new species. Species were categorised as named for their morphology, for their host, for their type locality, after an eminent scientist, or for something else. A preliminary look at whether the length of species epithets differed among these categories, or among subcategories within the 5 main ones, indicated that it does not (data not shown). Thus, the inspiration behind a species epithet does not appear to influence its length.
The only comparable study we are aware of is an investigation of the names of over 48,000 spider species described since the middle of the 18th century (Mammola et al. 2022). The frequency distribution of the lengths of species epithets among spiders is strikingly similar to ours, with the same shape and mode, resulting in practically the same average length. In addition, there was no temporal change in the average length of species epithets among spiders, even though the spider dataset spanned more than two centuries (Mammola et al. 2022), whereas ours covered just over two decades. It seems therefore that the few patterns we observed are not unique to parasite species, but may reflect broader practices in taxonomy.
Previous commentators have provided guidelines for forming species names that conform to the grammatical rules of Latin, and/or for the correct usage of species names after they are coined (Sangster & Pope 2000; Notton et al. 2011; Šlapeta 2013; Vendetti & Garland 2019). Based on our findings, we would like to remind all readers of Recommendation 25C of the ICZN (https://www.iczn.org/), which states that names should be chosen with their subsequent users in mind, so that they are, as far as possible, compact, euphonious and memorable. The only absolute requirement for a species epithet is that it must be unique amongst known species within the same genus. We therefore encourage taxonomists to choose species epithets no longer than 12-13 letters, which seems to be the maximum that most biologists would be comfortable with, whatever their native language. Shorter names are not necessarily 'sweeter', i.e., more pleasant-sounding when spoken aloud, but they are likely easier to pronounce. As is true of scientific jargon in general, simpler is often better.
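As a minimal sketch of these two guidelines, the hypothetical checker below flags a proposed epithet that is already used within the same genus (the one absolute requirement mentioned above) or that exceeds a letter ceiling, defaulting to 13, the upper end of our suggested 12-13 letters. It is a simplification: a formal homonymy check under the ICZN can involve more than an exact string comparison, and the ceiling is a recommendation, not a rule.

```python
def check_epithet(candidate, epithets_in_genus, max_letters=13):
    """Hypothetical helper: list problems with a proposed epithet (empty list = none found)."""
    problems = []
    # Exact, case-insensitive match only; not a full ICZN homonymy check.
    if candidate.lower() in {e.lower() for e in epithets_in_genus}:
        problems.append("already used in this genus")
    n_letters = sum(ch.isalpha() for ch in candidate)
    if n_letters > max_letters:
        problems.append(f"{n_letters} letters exceeds {max_letters}")
    return problems


print(check_epithet("brayi", ["gibsoni"]))
print(check_epithet("deardorffoverstreetorum", []))
```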
Funding Open Access funding enabled and organized by CAUL and its Member Institutions. DdAD was supported by a University of Otago Doctoral Scholarship during this study. The study received no other specific funding.

Data availability
The full dataset is available as Supplementary Material.

Declarations
Competing interests The authors declare no competing interests.

Ethics approval Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.