Introduction

Current approaches used to name within-species plant virus phylogenetic groups are often haphazard, misleading and lacking in logical basis, especially with well-studied plant viruses. The kinds of names employed range from ones based on single biological properties or geographical, country and place-association designations to combinations of such names with ones identifying genomic sequence differences. Biologically based names include those given to the first isolate found within a particular group based on the host species or cultivar in which it was first detected, a characteristic symptom it causes in infected plant foliage, fruit or storage organs, or the pathotype or strain group defined by resistance genes to which it belonged. Such a diverse range of group names might have seemed appropriate initially when few, mostly incomplete nucleotide sequences were available. However, their use is becoming increasingly unsustainable as numbers of sequenced isolates of the same virus from new host species and different parts of the world (many of which result from recombination or reassortment) continue to increase. Serious consequences for plant virus research and disease management might arise from incorrect assumptions made when current phylogenetic group names misrepresent the breadth of properties of their group members.

An important contributory factor is the rapid increase in numbers of complete genome sequences becoming available and changing the landscape of phylogenetic analyses [19]. Using complete genome sequences gives much more information and allows recombination to be taken into account in interpretation of phylogenetic trees. In contrast, employing partial genome sequences for phylogenetic analysis can produce misleading phylogenetic trees, which, when recombination occurs, often result in incorrect deductions about the positioning of isolates in within-species phylogenetic groups [18–20, 29]. Other factors increasingly impacting on the sequences that make up within-species phylogenetic groups include emergence of known and novel viruses in new hosts or world regions driven by (i) new encounters between wild and crop plants at interfaces between wild and managed vegetation, (ii) rapidly expanding world trade in plants and plant products moving viruses and vectors around the world, (iii) agricultural intensification, extensification and diversification underway to feed the burgeoning world population, (iv) encroaching urbanization as population centers expand, and (v) climate change causing plant viruses to adapt, shift hosts and change their geographical distributions [1, 3, 4, 7, 10–14, 24].

How widespread is use of misleading names for within-species phylogenetic groups? What errors are likely from misleading group names and what would the implications of continued inaction be for plant virus molecular and biological research and disease management? What worldwide benefits would result from an improved nomenclature system? Does the improved system proposed have any problems or pitfalls? This opinion piece seeks to answer such questions.

The proposal

We suggest a concise and consistent system for within-species phylogenetic group names for plant viruses that would help to avoid the kinds of problems identified here. No alterations are proposed to current demarcation criteria and thresholds, which normally involve strong bootstrap support within phylogenetic trees, currently used to establish such groups. Within-species phylogenetic groups are needed to distinguish isolates with one type of sequence from others of the same virus within other types of sequence groups. The term ‘strain’ is reserved for a group of isolates within the same virus sharing a common biological property rather than being defined by any sequence difference. We recommend the following:

  • Avoid names for within-species phylogenetic groups defined by geographical distributions and place-association designations, biological properties, or combined genome sequencing and biological/geographical descriptions.

  • Use Latinized numerals for phylogenetic groups instead, the sequential numbering being historical rather than down the page each time a new phylogenetic tree is produced.

  • When a new within-species group is identified, the next numeral in the sequence is allocated to it, regardless of its location in the tree relative to its neighbors.

  • Retain names based on biological differences for biological strain nomenclature.

  • When new sequences are added, wherever possible, update information on the full spectrum of known biological properties and geographical distributions of the isolates within each phylogenetic group by incorporating this information into the sequence background information in sequence databases (this may not always be possible with metagenomics or pooled data from next-generation sequencing).
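The numbering rule in the recommendations above can be illustrated with a short sketch. It is only one possible reading of the proposal, not a tool the authors describe: the `GroupRegistry` class, its field names, and the placeholder host and region labels are all hypothetical. The sketch assigns the next Latinized numeral in historical order whenever a new group is registered, and keeps biological and geographical information alongside the group rather than in its name.

```python
from dataclasses import dataclass, field


def to_roman(n: int) -> str:
    """Convert a positive integer to a Latinized (Roman) numeral."""
    if n <= 0:
        raise ValueError("group numbers start at 1")
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


@dataclass
class GroupRegistry:
    """Hypothetical registry of within-species phylogenetic groups."""
    virus: str
    # Maps group numeral -> metadata that is deliberately kept out of the name.
    groups: dict = field(default_factory=dict)

    def add_group(self) -> str:
        # The numeral depends only on how many groups already exist
        # (historical order), never on where the group sits in a tree.
        numeral = to_roman(len(self.groups) + 1)
        self.groups[numeral] = {"hosts": set(), "regions": set()}
        return numeral

    def record_isolate(self, numeral: str, host: str, region: str) -> None:
        # Each new isolate widens the documented spectrum of its group.
        self.groups[numeral]["hosts"].add(host)
        self.groups[numeral]["regions"].add(region)


# Placeholder values, for illustration only.
registry = GroupRegistry("example virus")
first = registry.add_group()
second = registry.add_group()
registry.record_isolate(first, "host A", "region X")
registry.record_isolate(first, "host B", "region Y")
```

Because groups are only ever appended, a numeral that has been assigned never changes when later trees are drawn in a different order, which is the property the second and third recommendations ask for.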

Our suggested revision allows for discussion of the sequences that make up each within-species group to occur without any preconceived notion based on their phylogenetic group names, such as the assumption that they were all isolated from one particular host species or came from one geographic region. Before we recommended it for three other viruses [12, 19, 20, 23], this same nomenclature system was already in use with potato virus X [21] and hardenbergia mosaic virus [28]. There are other alternatives, such as using a mixture of Latinised numerals and alphabet letters (as with zucchini yellow mosaic virus [5]), alphabet letters alone, or standard numerals. However, on balance, Latinised numerals seem most appropriate, with letters reserved for subgroups within groups. It is important to clarify that this proposal only applies to within-species virus phylogenetic group names, and not to actual virus species names themselves, which are overseen by the International Committee on Taxonomy of Viruses (ICTV). The ICTV does not include within-species plant virus nomenclature within its remit.

Use of Latinized numerals removes misleading names and ambiguities in host range, pathogenicity or geography inherent to traditional within-species plant virus phylogenetic group nomenclature. Instead, it provides a ‘neutral’ naming system that lacks information on the biological properties or geographical distributions of isolates within each group. However, such information is sometimes needed not only by researchers but also by quarantine and biosecurity authorities, diagnostic laboratories, disease-management programs, and plant breeders dealing with virus issues. Therefore, whenever new, complete, virus isolate genome sequences with distinct biological properties, or from a different geographical region, are added to a sequence database, the data provided need to include information on their known biological properties and geographical distributions. Database repositories should be encouraged to request this information. In addition, authors of future research papers describing new plant virus isolates should be encouraged to include such information in their writeups. It would then be readily available and so able not only to inform the end users mentioned above but also to avoid perpetuating misleading within-species phylogenetic group names.

How widespread is the problem?

Use of illogical within-species phylogenetic group names is very widespread. Examples include many well-known and economically important viruses, e.g., the RNA viruses bean yellow mosaic virus (BYMV), plum pox virus (PPV), potato virus Y (PVY), sweet potato feathery mottle virus (SPFMV), turnip mosaic virus (TuMV) (all potyviruses), grapevine leafroll virus 2 (GRLV2) (a closterovirus), potato virus S (PVS) (a carlavirus), and sweet potato chlorotic stunt virus (SPCSV) (a crinivirus). Explanations of the nomenclature issues associated with BYMV, PVY, SPCSV and SPFMV are provided below, and for GRLV2, PPV, PVS and TuMV as supplementary information.

BYMV provides an example of misleading biological within-species phylogenetic group names. Its first group names resulted from coat protein (CP) sequence analysis of available isolates and the names of their original hosts. At first it appeared that BYMV originated with a generalist group that evolved before plant domestication within communities of wild plant species belonging to diverse families. After the advent of agriculture, specialised host-specific groups developed within crop domestication centres. The phylogenetic group names selected corresponded to original isolation hosts, resulting in the names ‘broad bean’, ‘lupin’, ‘pea’, ‘canna’ and ‘monocot’ for the specialist groups, and ‘general’ for the original ‘generalist’ group [29]. However, when a comprehensive phylogenetic study of complete genomes was done, it became clear that these host-dependent names were unjustified as new isolates were found to occur in members of host plant species, genera or families other than the ones originally used to name the group [19]. Recombination was found to have played an important role in the evolutionary history of the virus, and assumptions about host adaptation and geographical origins of sequenced isolates in relation to crop domestication centres proved incorrect [20]. Use of Latinised numerals for phylogenetic groups resolved all the inconsistencies (Fig. 1).

Fig. 1
Example of misleading biologically based phylogenetic within-species group names for the species Bean yellow mosaic virus. The names used became unsustainable following an increase in numbers of complete genome sequences, presence within specialist groups of isolates from different host species from those after which the group was originally named and findings that recombination splits the former “general” group into separate groups. Use of Latinized numerals to name groups overcame all such issues (modified from reference [19])

Geographically based names are used for phylogenetic groupings with SPCSV, and both geographical and biological nomenclature with SPFMV. Two major phylogenetic groups of SPCSV were named SPCSV-EA and SPCSV-WA, with EA standing for ‘East African’ and WA for ‘West African’. However, SPCSV-EA now contains sequences from South America and East Asia, and SPCSV-WA includes sequences from South and North Africa, South and North America, Europe and East Asia [6, 26, 30]. SPFMV includes the ‘East African’ phylogenetic group SPFMV-EA, but this now contains sequences from South Africa, Oceania, Southeast Asia and South America [9, 17, 27]. Similarly, the biologically named ‘Russet Crack’ phylogenetic group SPFMV-RC now contains isolates that do not cause this symptom in sweet potato tuberous roots. Using Latinized numerals instead of geographically and biologically based names resolved all these inconsistencies.
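The mismatch described for the SPCSV ‘East African’ group can be expressed as a simple check. This is a minimal sketch, not a method from the cited studies: the function name is hypothetical, and listing East Africa itself among the member origins is an assumption based on the group's name; the other two regions are those given in the text.

```python
def regions_not_implied(implied_region: str, member_regions: set[str]) -> set[str]:
    """Return member origins that the group's geographical name does not cover."""
    return {region for region in member_regions if region != implied_region}


# Origins for the SPCSV 'East African' group: East Africa is assumed from the
# name; South America and East Asia are the additional origins cited in the text.
east_african_group = {"East Africa", "South America", "East Asia"}

unexpected = regions_not_implied("East Africa", east_african_group)
print(sorted(unexpected))  # prints ['East Asia', 'South America']
```

Any non-empty result means the geographical name already understates where members of the group occur, which is the situation a neutral numeral avoids.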

PVY provides an example involving biological, geographical and sequence names. Here, the single biological properties used to name phylogenetic groupings include names based on whether the earliest isolate(s) within a group induced particular kinds of symptoms in potato tubers or foliage, whether it infected some hosts naturally but not others, developed necrotic phenotypes in potato cultivars with hypersensitivity genes or in tobacco, and the name of the cultivar in which an isolate was first detected. In addition, its phylogenetic group names are also based on genome sequencing and geographical or place-association designations [16, 25]. As the number of complete non-recombinant and recombinant PVY genome sequences from different world regions grows, these names become increasingly unsustainable. Examples include (i) a recombinant phylogenetic group named after PVY isolates that induced tuber necrosis (PVY-NTN) although it was later found that the ability to cause this symptom was expressed by isolates in non-recombinant and other recombinant groupings; (ii) a recombinant group named after potato cultivar ‘Wilga’ from which the first isolates within this group originated (PVY-Wi), although this group was subsequently found to contain sequences from other cultivars; (iii) the use of the same names for two biological and non-recombinant phylogenetic groupings (PVY-O and PVY-C) although it is now known that phylogenetic PVY-O contains sequences from biological PVY-O and PVY-Z isolates and phylogenetic PVY-C contains sequences from biological PVY-C and PVY-D isolates; and (iv) the biological strain group PVY-N, which contains two non-recombinant phylogenetic groups, the ‘North American’ group (PVY-NA-N) and the ‘European’ group (PVY-N, sometimes designated PVY-EU-N), although sequences from both groups are now found on other continents [12, 16, 20, 25]. This is clearly an unsustainable method for naming PVY within-species phylogenetic groups. Employing Latinised numerals for its groups overcame all these problems (see Fig. 2 in reference [20]).

Similar problems also occur with DNA viruses. For example, in their paper on geminivirus phylogenetic ‘strain’ demarcation and nomenclature, Fauquet et al. [8] recommended employing a sequence identity threshold of 85-93 % to separate within-species groups and distinguishing host or symptom descriptors to name them, or, when distinguishing descriptors were unavailable, using alphabet letters instead. However, this approach of using host or symptom descriptors suffers from the same weakness that, as the numbers of sequenced isolates of the same virus from new host species and different parts of the world increase, such biologically based phylogenetic group names are unlikely to continue to reflect the properties of all isolates in the enlarged group. Similar problems also occur with cauliflower mosaic virus (a caulimovirus), where within its two major within-species phylogenetic groups A and B, there are subgroups identified by country names, each of which also contains isolates belonging to another subpopulation from elsewhere [31].

Problems associated with misleading within-species phylogenetic nomenclature are not unique to plant virology. For example, geneticists studying the Fork head gene family in species ranging from yeast to humans reached a point where they had >100 family members with a naming system for transcription factors consisting of multiple classification systems. This produced nomenclature problems resembling the within-species phylogenetic nomenclature issues with plant viruses. A nomenclature committee proposed a system of classes and subclasses where classes were designated by a letter, and subclasses by an Arabic numeral, providing a consistent method of identifying within-species phylogenetic groups [15].

Without change, what errors are likely, with what implications?

When within-species phylogenetic group-specific molecular diagnostic tools are designed, it is essential to ensure that the correct sequences are chosen. This can become problematic when the information indicated by a phylogenetic group name does not reflect the properties of all its isolates. Inexperienced people could make incorrect assumptions in such instances, resulting in flawed diagnostic tools and incorrect virus strain targeting. Such targeting errors by diagnostic services of biosecurity or quarantine authorities could potentially lead to failure to prevent dangerous strains from being introduced to countries, regions or continents that were formerly free of them, or alternatively, to unjustified impediments to international trade. It could also cause inaccuracies in monitoring and surveillance programs designed to establish incidences of virus strains of particular biological significance. Diagnostic laboratories undertaking routine tests for virus strains on commercial plant samples from agricultural industries and large-scale virus screening programs could then provide incorrect results. For example, when the targeted group has a broader host range than is implied by its phylogenetic group name, the presence of a virus strain identified as being part of a particular within-species phylogenetic group with a host plant name associated with it may be incorrectly deemed to be of low risk to crops. Moreover, incorrect labeling of strain-specific virus resistance traits sometimes occurs in crop cultivars. A likely cause would be the assignment of biological names to within-species phylogenetic groups based on strain groups or pathotypes that elicit strain-specific virus resistance genes. 
Molecular marker and conventional virus resistance screening procedures using molecular diagnostic tools could then be compromised, resulting in flawed choices of parental lines for crosses to breed new cultivars, e.g., with PVY (see supplementary information for explanation). The European Cultivated Potato Database (http://www.europotato.org) illustrates a situation where misunderstandings arising from misleading phylogenetic group names are one possible cause of errors arising from flawed selection of virus strains for use in screening. This database provides many examples of relatively meaningless and often-conflicting data about resistance to common potato viruses (see supplementary information for explanation). Use of flawed diagnostic tools and incorrect targeting of virus strains arising from misleading within-species phylogenetic group names could sometimes compromise plant virus molecular biology research involving virus-host interactions, evolutionary pathways and molecular epidemiology. It would also compromise plant virus research on epidemiology, disease impacts, control measures and integrated management.

Benefits of proposal adoption

Use of Latinised numerals to name within-species phylogenetic groups (i) eliminates the possibility of incorrect deductions about properties of virus isolates within phylogenetic groups with misleading names, (ii) prevents potentially serious mistakes being made due to incorrect targeting of virus strains using inappropriate molecular tools, and (iii) avoids within-species names becoming increasingly misleading as numbers of virus sequences increase and virus and vector shifts gather pace due to expanding trade and agriculture, and global warming. Minimizing such targeting errors would also help biosecurity authorities to avoid unnecessarily impeding or constraining international trade in plants and plant products, or failing to prevent inadvertent introductions of dangerous plant virus strains. It would ensure that monitoring for dangerous virus strains could be conducted effectively and aid in providing appropriate disease management recommendations, such as deployment of cultivars with correctly identified virus-resistance traits.

Would proposal adoption cause any problems?

With some viruses, within-species phylogenetic group names might still identify the biologically significant traits of all group members accurately according to current knowledge. Citrus tristeza virus (CTV) provided the only example we found that held up as having within-species phylogenetic groups containing isolates with uniform biologically significant traits. The 10 complete genome sequences of CTV that are available fit into three phylogenetic groups with members that cause severe, intermediate or mild symptoms in orange or grapefruit trees [2, 22]. However, this relationship seems unlikely to hold once further complete genome sequences are added.

There is always the potential that unforeseen consequences could arise from changing established naming conventions. In this case, due to lack of any biological or geographical information being provided when Latinized numerals are used, the change to a ‘neutral’ within-species phylogenetic group naming convention might occasionally increase the risk of introducing dangerous virus strains to new geographical locations or hosts. We suggest that a ‘neutral’ system where information regarding biological properties, geographical location, country names, etc. is avoided in naming groups of within-species virus sequences but such information is available readily from other sources (e.g., in sequence database entries that are freely accessible to everyone) is of far greater value than inappropriately labelled and interpreted phylogenetic trees. Adopting this procedure should overcome any objections to changing current within-species nomenclature approaches to use of Latinized numerals because the relevant information would be available to researchers and other end users without detracting from the advantage of having an unbiased system.

Conclusion

We hope this opinion piece will stimulate much discussion about the deficiencies of current plant virus within-species phylogenetic nomenclature and the implications of future inaction on this subject. We recommend serious consideration be given to wide-scale adoption of our Latinized numeral nomenclature proposal for within-species phylogenetic group names as, when accompanied by relevant supporting information, it overcomes all of the drawbacks inherent in the current haphazard approach. Our recommendations have implications both internationally and nationally for quarantine and biosecurity authorities, diagnostic laboratories, disease-management programs, plant breeders and researchers.