Two misleading words in reports of virus discovery: little things mean a lot
To the Editor:
An elegant definition of “virus” was offered by Andre Lwoff in 1957. That definition is somewhat outdated in light of modern studies of viruses, but it remains a fascinating and erudite read. His extensive definition of a virus could be paraphrased as an entity (a) having nucleic acid, (b) replicating as nucleic acid only, (c) not growing or dividing but replicating by a template mechanism, (d) not possessing its own energy system (energy being provided by the host cell which it parasitizes), and (e) being infectious. Numerous other definitions of “virus” have been published, for example that of Raoult and Forterre, who proposed that a virus is “a capsid-encoding organism that is composed of proteins and nucleic acids, self-assembles in a nucleocapsid and uses a ribosome-encoding organism for the completion of its life cycle”. As far as we are aware, no definition of “virus” is exclusively based on genomics. Indeed, the late Nobel Laureate Sir Peter Medawar once succinctly described a virus as “bad news wrapped in protein”.
Improper use of the word “virus” somehow slips under the radar of some editors these days. Journals have published papers in which the authors detect nucleic acids by molecular methods, sequence them, construct elaborate phylogenetic trees to determine whether they are from recognized or previously unrecognized viruses, and then incorrectly use the word “virus” when they should use the phrase “nucleic acid sequence” ([4–7]; numerous other citations could be presented). In contrast, and as an example of proper use of terms and ingenious technique, the paper by Krüger et al. reports a biological property of an ostensible henipavirus from an African bat, a virus which has not been isolated. Genomic RNA of this virus had been detected in fecal samples of a bat, the open reading frames of the fusion and attachment proteins of the putative virus were inserted into expression plasmids, and expression was compared with that of a well-characterized henipavirus from Malaysia. These surface glycoproteins were then shown to induce syncytium formation in bat cell cultures. Similarly ingenious techniques have been used to partially characterize other viruses before they were isolated [9, 10]. Still, complete phenotypic characterization awaits the isolation of a virus, which can then be phenotyped.
A virus comprises, in part, one or more RNA, DNA, or RNA + DNA nucleic acid sequences, but there is more to a virus than that. A forensic DNA sample is from a person, but the sample is not the person from whom it was obtained. Without an actual virus isolate (difficult or perhaps impossible to obtain at this time for certain viruses), there is no virus to fully characterize. One may argue that a virus cannot be identified without an isolate, but that argument clearly disregards modern standards and conflicts with modern concepts. Virus taxonomy now is based principally on virus genomics, the study of virus genes and their functions. Without knowledge of those genetic functions, nucleotide sequences are merely descriptions of chemical characteristics; they do not provide phenotypic information regarding biological properties and, except for comparisons with the analogous sequences of characterized viruses, do not provide necessary and sufficient insights into virus ecology. Conversely, without sequence information, all we have are descriptions of biological characteristics, which hearkens back to the days when viruses were classified by the diseases they cause, the sizes and shapes of their virions, and their replication characteristics in various cell culture systems and hosts. Unquestionably, two viral genomic sequences, even complete sequences, that differ by only a single nucleotide or a few nucleotides may be of interest in terms of virus evolution, changes in tissue tropism, pathogenesis, antigenicity, and host specificity, but these characteristics cannot be verified without having actual virus isolates to compare. If a newly isolated virus clearly is antigenically or genetically related to an established, well-recognized virus, knowing its entire nucleotide sequence is unnecessary for preliminary identification, but that virus cannot be considered fully characterized until its genome is completely sequenced. This is particularly important for segmented viruses and natural reassortants (see references [11, 12]).
The other commonly used but poorly descriptive word is “novel”. If a sequence differs from a previously recognized sequence by a few or even many nucleotides, is this so novel that it is worthy of publication? Deposition of the sequence in a gene bank certainly is worthwhile, and recognition of sequence variants is important epidemiologically, but slight variants, while “novel”, are of little value unless one can demonstrate some phenotypic or biologic significance. Such so-called “novel viruses” or “novel genotypes” are what in the past were called “strains”, “subtypes” or “variants”, but they usually were defined by biological or serological means using actual virus isolates, with most not shown to have biological significance or usefulness.
The International Committee on Taxonomy of Viruses (ICTV) had accepted Van Regenmortel’s definition of a virus species and originally defined it as: “a polythetic class of viruses that constitutes a replicating lineage and occupies a particular ecological niche”. The word “polythetic” has recently been omitted from this definition, which now reads: “A species is a monophyletic group of viruses whose properties can be distinguished from those of other species by multiple criteria”. (See comments by Van Regenmortel and Murphy at http://talk.ictvonline.org/discussions/ictv1/f/63/t/3930.aspx). Regardless of which definition one prefers, neither applies to a nucleotide sequence, so a nucleotide sequence alone cannot be used to define a virus or a virus species.
Of what epidemiologic, biologic, or taxonomic use is knowing that the feces of an unidentified bat, captured in a general location on an unspecified date, contained a partial genomic sequence related to that of an amphibian virus? In this superficial example we can only surmise that the virus from which the sequence originated infects both the bat and the amphibian, or that the bat had eaten an amphibian, or that the bat had eaten some other life form that had eaten an amphibian.
There is no question that a partial sequence is suggestive and may very well lead to additional findings after additional studies. For example, the detection of Marburg virus RNA in Egyptian rousettes (Rousettus aegyptiacus) and bats of other species by Swanepoel et al. led to the isolation of Marburg virus from Egyptian rousettes by Towner et al.
Misuse of words is misleading and unscientific. Acceptance of ambiguous reports of fragments of genome sequences cannot replace complete genome sequence data in defining the discovery of a virus. However, recognition of this deficiency may also be an opportunity to move toward a solution to the dilemma created by the question “What is a virus?” We suggest that virologists, taxonomists, journal editors, philosophers, and perhaps others convene a session or sessions to define “sequence quality” and, ultimately, the word “virus”. Perhaps Virology Division News could serve as the facilitator for such an effort, but however it is done, in this era we need a new definition of “virus”.
Conflict of interest
Neither author has a conflict of interest in regard to this manuscript.