Mitochondrial genome announcements need to consider existing short sequences from closely related species to prevent taxonomic errors

The reconstruction of complete mitochondrial genomes (mitogenomes) has considerable potential to clarify species relationships in cases where morphological analysis and DNA sequencing of individual genes are inconclusive. However, the trend to use only mitogenomes for the phylogenies presented in mitogenome announcements carries the inherent risk that the study species’ taxonomy is incorrect because no mitogenomes have yet been reconstructed for its sister species. Here, I illustrate this problem using the mitogenomes of two seahorses, Hippocampus capensis and H. queenslandicus. Both specimens used for mitogenome reconstruction originated from traditional Chinese medicine markets rather than native habitats. Although mitogenome phylogenies placed these specimens correctly among the seahorses from which mitogenomes were available at the time, incorporating single-marker sequences from closely related species into the phylogenies revealed that both mitogenomes are problematic. The mitogenome of the endemic South African H. capensis did not cluster among single-marker DNA sequences of seahorses from the species’ native habitat, but among sequences submitted under the names H. casscsio, H. fuscus and H. kuda that originated from all over the Indo-Pacific, including China. Phylogenetic placement of the mitogenome of H. queenslandicus within a cluster of seahorses that also included H. spinosissimus confirms an earlier finding that H. queenslandicus is a synonym of H. spinosissimus, a widespread Indo-Pacific species that also occurs in China. It is recommended that mitogenome announcements incorporate available single-marker sequences of closely related species, not only mitogenomes. The reconstruction of mitogenomes can exacerbate taxonomic confusion if existing information is ignored.


Introduction
Molecular methods represent a means of confirming species status that requires little taxonomic expertise (Hebert et al. 2003), and such methods are now often used to support the distinctness of newly described species from morphologically similar taxa (Rius and Teske 2013; González et al. 2014; Zhang et al. 2016; Short et al. 2020), or challenge the validity of species descriptions (Teske et al. 2007a, b). While the vast majority of studies have used single-marker datasets (e.g., DNA barcoding), advances in DNA sequencing technology have now made it possible to sequence entire organelle genomes, including mitochondrial genomes (or mitogenomes) in animals (Nelson et al. 2012) and chloroplast genomes in plants (Li et al. 2015).
A crucial component of organelle genome announcements is the reconstruction of a phylogenetic tree that confirms the correct placement of the new genome among closely related species. In the recent literature, there has been a clear trend to construct such trees using only published organelle genomes. For example, the journal Mitochondrial DNA prefers such trees to be constructed using readily alignable proteome sequences, to the exclusion of previously published data from single genetic markers. In taxonomic groups for which few organelle genomes have been published, this has unavoidably resulted in the publication of phylogenies that do not include any congeners of the study species (Jooste et al. 2019; Jagatap et al. 2019; Tladi et al. 2020; Emami-Khoyi et al. 2021). However, even in comparatively well studied taxonomic groups, there is an inherent risk that the organelle genome of a species that was misidentified appears to cluster correctly because its sister species are missing from the phylogeny.
Here, I highlight this problem using two mitogenomes of seahorses (genus Hippocampus Rafinesque, 1810). Seahorses are teleost fishes from the family Syngnathidae, which have a highly specialized morphology that includes an upright posture, a prehensile tail, fused jaws, an exoskeleton and a male brood pouch. While these unique morphological features make all members of the genus easy to recognize as seahorses, distinguishing between the different species is difficult because of considerable morphological variation that is affected by factors such as age, gender and the environment (Lourie et al. 2016). As a result, a large number of synonyms exists, and the number of species accepted as valid in recent taxonomic treatments has ranged from 23 to 83 (Lourie et al. 2016).
Several seahorse mitogenomes have recently been published (Zhang et al. 2015; Wang et al. 2016, 2019; Hou et al. 2016; Cheng et al. 2017; Chen et al. 2018; Ge et al. 2018; Lai et al. 2019; Jahari et al. 2020). The increased level of resolution provided by mitogenomes can be expected to significantly improve taxonomic resolution in cases where single-marker datasets, such as those used for DNA barcoding, provide insufficient information for distinguishing closely related species. However, as I demonstrate below, the reconstruction of mitogenomes from specimens that were not identified by taxonomic experts and that were not compared with taxonomically reliable single-marker datasets from their congeners can exacerbate the taxonomic confusion in the genus Hippocampus.
One of the recently published seahorse mitogenomes, that of the endangered Knysna seahorse, Hippocampus capensis Boulenger, 1900 (Ge et al. 2018), was reconstructed using DNA of a specimen obtained from the Bozhou Chinese herbal medicine market (Anhui Province, China). The import of African seahorses for Traditional Chinese Medicine (TCM) markets in Asia is common, and is believed to be responsible for the depletion of local stocks (T. Mkare, KMFRI, Mombasa, Kenya and I. da Silva, Universidade Lúrio, Pemba, Mozambique, pers. comm.). For example, the mitogenome of the East African H. camelopardalis was generated using a specimen from the same TCM market in Bozhou (Lai et al. 2019), and both the West African seahorse H. algiricus and the South African H. capensis were reported from a Taiwanese TCM market (Chang et al. 2013). The sale of H. capensis at these markets is a concern because this species is endangered, and endemic to three estuaries on the temperate south coast of South Africa (Teske et al. 2003; Mkare et al. 2017). The illegal collection of these seahorses for TCM markets can be expected to rapidly drive this rare species to extinction. The morphological identification of the Chinese specimen was relatively vague, and based on the absence of a coronet, a short snout and the absence of spines on the body. A comparison of the image provided in the mitogenome announcement with H. capensis suggested that the identification was questionable, with the difference in snout length being the most obvious morphological discrepancy (Teske et al. 2005). In contrast, the specimen identified as H. capensis from a TCM market in Taiwan (Chang et al. 2013) resembles South African H. capensis more closely. Subsequent to the mitogenome announcement, specimens from other TCM markets were genetically identified as H. capensis by means of DNA barcoding (Hou et al. 2018; Wang et al. 2020), likely using the mitogenome as a reference.
Another recently published mitogenome, that of the Queensland seahorse, Hippocampus queenslandicus Horne, 2001 (Hou et al. 2016), was also constructed using DNA originating from a specimen from a TCM market (Chengdu, Sichuan Province, China). The species status of the Queensland seahorse was challenged shortly after the original species description on the grounds of both morphological and genetic data (Teske et al. 2007b), and a recent taxonomic revision concluded that H. queenslandicus is a synonym of H. spinosissimus Weber, 1913 (Lourie et al. 2016). Nonetheless, single-marker sequences of Chinese specimens identified as Queensland seahorses were recently submitted to the GenBank repository (Hou et al. 2017, 2018).
In this paper, I show how incorporating short DNA sequence data from various mitochondrial markers into mitogenome phylogeny reconstructions can help to clarify the taxonomic status of the study species.

Methods
To determine the phylogenetic placement of the Chinese seahorses used for the mitogenome reconstructions of H. capensis and H. queenslandicus, I compared their sequences with partial sequences of mitochondrial cytochrome b (CYTB), cytochrome c oxidase subunit 1 (COX1) and control region (CR; also known as d-loop) of conspecifics or closely related species. The reason for selecting these markers is that they are available from a comparatively large number of species and locations.
Sequences of closely related seahorses were identified by conducting BLAST searches with the blastn algorithm (Altschul et al. 1990), using the mitogenomes and their CYTB, COX1 and CR regions, as well as the corresponding sequences of species that are closely related to them based on previous phylogenetic or taxonomic information. These sequences showed > 95% identity. Given the large number of sequences, only some representative samples from larger datasets were included in phylogeny reconstructions. The sequences were aligned in MEGA7 (Kumar et al. 2016) using the ClustalW alignment tool (Thompson et al. 2002). For each genetic marker, the most suitable model of nucleotide evolution was identified using the Bayesian Information Criterion (Schwarz 1978), with the following details: H. capensis (CYTB): Tamura 3-parameter model, abbreviated as T92 (Tamura 1992) [...] (Bouckaert et al. 2014) using default parameters. As this program does not incorporate the T92 model, the second most optimal BIC model was specified, which was the HKY model (Hasegawa et al. 1985) for all datasets except CR of H. queenslandicus, where the TN93 model scored higher (Tamura and Nei 1993). For each dataset, 10 million iterations were run, with every 1000th tree saved, and the first 10% of trees removed in TreeAnnotator as burn-in. Tracer v1.7 (Rambaut et al. 2018) was used to check for convergence and to ensure that effective sample sizes were > 200 for all parameters. The resulting phylogenetic trees were visualized in FigTree v1.4 (Rambaut and Drummond 2012). To assess congruence between different methods of phylogeny reconstruction, additional phylogenetic trees were reconstructed in MEGA7 using the neighbour-joining algorithm (Saitou and Nei 1987) and maximum likelihood, in this case specifying the BIC models listed above, and complete deletion of missing data.
Nodal support for neighbour-joining and maximum likelihood trees was based on 1000 bootstrap replications (Felsenstein 1985), and added to the Bayesian phylogenetic trees for comparison.
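The model-selection and sampling steps described above can be illustrated with a minimal sketch. It assumes the textbook BIC formula and a simplified count of saved trees that ignores the initial state. All log-likelihoods, parameter counts, the alignment length and the function names are hypothetical and were chosen only to reproduce the situation in which the best-ranked model (T92) is unavailable and the next-best supported model is used instead. These are not the values obtained in this study.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion (Schwarz 1978); lower values are better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_supported_model(fits, n_sites, supported):
    """Rank candidate models by BIC and return the best one that the
    downstream program supports, together with the full ranking.

    fits maps a model name to (log-likelihood, number of free parameters).
    """
    ranked = sorted(fits, key=lambda m: bic(fits[m][0], fits[m][1], n_sites))
    for model in ranked:
        if model in supported:
            return model, ranked
    raise ValueError("no supported model among the candidates")


def retained_trees(iterations, sample_every, burnin_percent):
    """Number of saved trees left after discarding the burn-in
    (simplified: the tree logged at the initial state is not counted)."""
    saved = iterations // sample_every
    discarded = saved * burnin_percent // 100
    return saved - discarded


# Hypothetical fits, for illustration only (TN93 = Tamura-Nei model).
example_fits = {
    "T92": (-1500.0, 2),
    "HKY": (-1499.5, 4),
    "TN93": (-1499.0, 5),
}
chosen, ranking = best_supported_model(
    example_fits, n_sites=600, supported={"HKY", "TN93"}
)
print(ranking, chosen)
print(retained_trees(10_000_000, 1000, 10))
```

With the settings reported above (10 million iterations, every 1000th tree saved, 10% burn-in), this simplified count leaves 9000 trees per dataset for summarizing.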

Results
Phylogenetic trees recovered previously published H. capensis sequences of known origin (some of which I collected myself in the species' native habitat and can thus vouch for their authenticity) as strongly supported monophyletic clusters (shown in green in Fig. 1). The Chinese specimen used for the mitogenome reconstruction of H. capensis (in red) was not part of these clusters, but was grouped with sequences of the recently described H. casscsio Zhang, Qi, Wang & Lin, 2016, which was subsequently found along much of China's coastline using DNA barcoding (Zhang et al. 2017), specimens identified as H. fuscus Rüppell, 1838 from the Mediterranean, the Red Sea, the east coast of Africa and the Arabian Sea, as well as H. kuda Bleeker, 1852 from various locations throughout the Indo-West Pacific. The COX1 tree included H. capensis sequences from China (Wang et al. 2020) that may represent the same population referred to as H. casscsio in the other trees (in blue), and that likely used the mitogenome as a reference for genetic identification.
In the phylogenetic trees constructed for H. queenslandicus (Fig. 2), the mitogenome clustered among sequences identified as H. queenslandicus and H. spinosissimus (including two mitogenomes of the latter species that likely also originated from China as they were reconstructed by researchers from that country). The lack of differentiation between these sequences confirms previous findings that H. queenslandicus is a synonym of H. spinosissimus (Teske et al. 2007b). The sequences from Queensland (Australia) were originally used to reject the species status, and their species names should be changed to H. spinosissimus in the GenBank repository. In contrast, the Chinese samples of this species (COX1 only) were generated recently (Hou et al. 2017, 2018), and the COX1 portion of the H. queenslandicus mitogenome was likely used to identify them genetically.
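The kind of check underlying these results, namely whether a specimen falls inside the cluster formed by reference sequences of known origin, can be sketched as a simple monophyly test on a rooted tree. The nested-tuple tree, the tip labels and the function names below are hypothetical simplifications for illustration and do not reproduce the topology of Fig. 1 or Fig. 2.

```python
def collect_clades(tree, out):
    """Return the set of tip labels below a node of a rooted tree given as
    nested tuples, appending the tip set of every node to the list `out`."""
    if isinstance(tree, str):
        tips = frozenset([tree])
    else:
        tips = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.append(tips)
    return tips


def is_monophyletic(tree, tips):
    """True if the given tips make up exactly one clade of the rooted tree."""
    found = []
    collect_clades(tree, found)
    return frozenset(tips) in found


# Hypothetical toy tree: two reference sequences from the native habitat form
# one cluster, and the market specimen groups with other named sequences.
toy_tree = (
    (
        ("capensis_native_1", "capensis_native_2"),
        (("capensis_mitogenome_TCM", "casscsio_1"), "fuscus_1"),
    ),
    "outgroup",
)

native = {"capensis_native_1", "capensis_native_2"}
print(is_monophyletic(toy_tree, native))
print(is_monophyletic(toy_tree, native | {"capensis_mitogenome_TCM"}))
```

In this toy example the native reference sequences form a clade on their own, whereas adding the market specimen to that set breaks monophyly. A test of this kind only becomes informative once sequences of the close relatives are included in the tree.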

Discussion
The results confirm that the Chinese specimen used to reconstruct the mitogenome of H. capensis was misidentified, and that the mitogenome of H. queenslandicus merely confirmed a previous finding that this species name is a synonym of H. spinosissimus. Further, while the incorporation of genetic data into the species description of the Chinese seahorse H. casscsio is commendable, the fact that the new species was only compared with Chinese seahorses, while published data from more closely related species were ignored, likely resulted in nothing more than the description of yet another synonym. These seahorses clustered among sequences of H. fuscus, and in the CYTB and CR trees, these clusters did not include H. kuda (this trend is confirmed in trees comprising larger datasets). In a recent revision of seahorse taxonomy, H. fuscus was considered to be a synonym of H. kuda (Lourie et al. 2016). Nonetheless, the fact that H. fuscus seems to be morphologically distinct from H. kuda because of the absence of a coronet (Teske et al. 2005), a trait that it shares with H. capensis (Teske et al. 2005) and H. casscsio, suggests that it is worthy of renewed taxonomic attention. Incomplete lineage sorting between the Arabian Sea and the South China Sea indicates that this species may be represented in the vast, unsampled region separating these two locations. Due to the relatively low information content of the mitochondrial markers used here, it cannot be ruled out that H. casscsio and H. fuscus are distinct species that have diverged very recently. However, until this is confirmed with mitogenome data from a large number of samples and locations, or high-resolution genomic data (Teske et al. 2019), the species status of H. casscsio should be rejected. Nodal support values in the phylogenetic trees constructed here were generally low (particularly for the methods using bootstrapping).
However, the fact that phylogenetic trees in mitogenome announcements typically have very high nodal support values should not be considered as evidence for their quality. Phylogenetic resolution is lower when lineage sorting between closely related species that comprise shallow phylogenies is incomplete (Nater et al. 2015), and the omission of such taxa in mitogenome announcements artificially increases nodal support.
While the generation of mitogenomic data can be expected to clarify phylogenetic relationships between taxonomically problematic species, it is important that such resources are compared with as many previously generated sequences as possible. The fact that the phylogenetic trees presented in mitogenome announcements typically only include previously published complete mitogenomes, to the exclusion of shorter fragments from more closely related species, can prevent the discovery of misidentifications. Failure to integrate existing molecular data (in addition to ignoring taxonomic information based on comprehensive morphological analyses) will merely exacerbate the already confusing situation concerning taxonomically problematic groups, such as Hippocampus. Mitogenomes are likely considered to be more authoritative than single-marker datasets, as indicated by several DNA barcoding studies of Chinese seahorses referred to as H. capensis and H. queenslandicus having used the mitogenomes of the misidentified specimens discussed in this paper for genetic identification (Hou et al. 2017, 2018; Wang et al. 2020). The publication of taxonomically problematic mitogenomes will result in the advances in sequencing technology presenting an obstacle rather than a solution to resolving phylogenetic relationships in taxonomically challenging groups.