1 Background

Heterobranchia [1] is divided into Allogastropoda (=Lower Heterobranchia) and Euthyneura. The former is characterized by the presence of “Tri-ganglionate” state and the later (Euthyneura) by the “Penta-ganglionate” state by the acquisition of a pair of “Parietal ganglia” [1, 2]. The word “Euthyneura” was coined by Spengel [3] for uniting the taxa that reverted from the crossed state of visceral loop to uncross state during the process of detorsion. However, this is not considered as a synapomorphic character for “Euthyneura” due to the presence of this condition in few “Streptonerous” taxa [1]. Euthyneura is further divided into Opisthobranchia and Pulmonata. Each of these two groups is distinct on the basis of several synapomorphic characters of the nervous system [2]. Despite these synapomorphic characters, different molecular phylogenetic studies have yielded Lower Heterobranchia, Euthyneura, and Pulmonata as non-monophyletic [4,5,6,7,8,9,10].

Over the past few decades, with the advent of molecular techniques, there is growing information available with regard to the phylogenetic affinities among different groups of gastropods by means of widely used mitochondrial (16S, COI) and nuclear molecular markers (18S, 28S) in reconstructing their phylogenetic affinities. However, there is a need for new nuclear markers in studying gastropodan phylogenies especially with regard to deeper divergences. Histone-3 (H3) protein is one among the four highly conserved core nucleosome proteins that associate with DNA [11]. H3 gene, a nuclear marker is slow evolving when compared to the fast evolving mitochondrial markers like 16S and COI. Therefore, H3 gene is useful in studying the deep level relationships (for obtaining resolution at deeper nodes), while the widely used mitochondrial markers (16S, COI) are useful in resolving the relationships at the level of genera (Dinapoli et al. [12]). Colgan et al. [13] first used H3 marker for studying the phylogenetic relationships among Arthropoda along with U2 snRNA gene.

With regard to the utility of H3 gene in the phylogenetics of Gastropoda, it was used either alone [12] or in combination with different molecular markers [14, 15]. In all these studies, taxa from Pulmonata were poorly represented. Further, there are no comprehensive phylogenetic studies available on Pulmonata on the basis of H3 gene or in combination with other molecular makers to date. Hence, the objective of the present study is to evaluate the phylogenetic affinities among different groups of Pulmonata by including more number of sequences available for major pulmonate groups from GenBank along with the sequences obtained from the present study by means of H3 sequence information. The present work is also carried out with a view to evaluate the utility of this marker to understand the phylogeny of Pulmonata, especially with regard to divergence of internal nodes.

2 Methods

2.1 Collection of specimens

Three different species of pulmonate gastropods viz., Laevicaulis alte, Gyraulus convexiusculus, and Lymnaea (Pseudosuccinea) acuminata form rufescens were collected for the present study. Laevicaulis alte was collected from the Selapadu village from the state of Andhra Pradesh, India. Gyraulus convexiusculus and Lymnaea (Pseudosuccinea) acuminata form rufescens were collected from the “Ganesha” pond located at Sri Sathya Sai Prasanthi Nilayam Ashram, Andhra Pradesh, India. All the three specimens were sent to the Zoological Survey of India, Kolkata, for their identification.

2.2 Isolation of genomic DNA, PCR, and sequencing of H3 gene

The genomic DNA was isolated from the foot muscle tissue of snails using NucleoSpin® Tissue kit (Macherey-Nagel) by following the manufacturer’s instructions.

H3, a nuclear protein coding gene, was amplified using the primers H3F- ATGGCTCGTACCAAGCAGACVGC and H3R- ATATCCTTRGGCATRATRGTGAC [14]. The three species of gastropods were so far not sequenced for H3 gene and the analyzed sequences of these three taxa were deposited for the first time to the National Centre for Biotechnology Information (NCBI) and DNA Databank of Japan (DDBJ). Apart from these, a few more sequences were retrieved from NCBI GenBank and were included in the present study (Table 1).

Table 1 List of taxa utilized in the present study (including the taxa sequenced in the present study—marked by asterisk), taxonomic position, and accession numbers

PCR was carried out in a final volume of 20 μl with 1X Phire PCR buffer (contains 1.5 mM MgCl2), 0.2 mM dNTPs, 0.25 μM forward and reverse primers, 0.2 U Phire hot start II DNA polymerase, 0.15 mg/ml BSA, 3% DMSO, 0.5 M Betaine, 1 μl genomic DNA, and the final volume was adjusted to 20 μl using molecular biology grade water. PCR amplification was performed in a thermal cycler (GeneAmp PCR system 9700, Applied Biosystems) with the following amplification profile: denaturation at 98 °C for 30 s followed by 40 cycles of denaturation at 98 °C for 5 s, annealing at 54 °C for 10 s, extension at 72 °C for 15 s followed by final extension at 72 °C for 60 s.

PCR products were purified using ExoSAP-IT™ (GE Healthcare) following the manufacturer’s instructions and were sequenced using the same primers that were used for PCR at the Regional Facility for DNA Fingerprinting, Thiruvananthapuram, India.

The obtained chromatograms were visualized and necessary editing process was carried out in DNA Baser v4.20.0 [16] for obtaining consensus sequences. The sequences thus obtained were searched against the BLAST program of NCBI to check for contamination and subsequently submitted to DDBJ and NCBI GenBank and were assigned the following accession numbers LC198673, KX514447 - KX514448.

2.3 Multiple sequence alignment

Nucleotide sequences retrieved from GenBank were verified whether they were amplified by the same set of primers used in the present study or from the same regions of Histone 3 gene. After confirmation, nucleotide sequences obtained in the present study as well as those retrieved from NCBI GenBank (for few sequences) were trimmed at 5′ end in order to bring the sequences into reading frame. This ensures that stop codons are not present after the translation of nucleotide sequences to amino acids. Non-coding nucleotides were also removed from the 3′-end. Further, sequences obtained in the present study as well as those retrieved from NCBI GenBank were reverse complemented wherever the gene was in the complement (minus) strand.

Multiple sequence alignment was performed in MEGA v6.06 [17] using the inbuilt muscle program [18] with default parameters. Alignment of nucleotide sequences was guided by the translated amino acids. Regions of ambiguous alignments were excluded using online Gblocks server [19] under “less stringent” and “more stringent” options.

2.4 Model selection

Suitable substitution models for the datasets were determined using MrModeltest v2.3 [20] executed in PAUP* 4.0a152 [21] on the basis of Akaike Information Criterion (AIC). GTR+I+G model of nucleotide substitution was used for Dataset III in the present study.

2.5 Tests for substitution saturation

The following datasets were considered for the determination of substitution saturation using the entropy-based test of Xia et al. [22] implemented in DAMBE v6.4.48 [23]. Apart from this,p-distances were plotted against the distance obtained under GTR model in DAMBE for determining the saturation of a given loci.

  1. 1.

    Dataset I. Original unmasked sequence alignment

  2. 2.

    Dataset II. Sequence alignment refined by GBlocks under “more stringent” option

  3. 3.

    Dataset III. Sequence alignment refined by GBlocks under “less stringent” option

H3 gene was analyzed on the basis of its codons—codon 1, codon 2, codon 3, codons 1 & 2 combined, and codons 1, 2, & 3 combined.

In Dataset 3, codon 1, codon 2, codons 1, 2, & 3 (combined), and codons 1 & 2 (combined) did not show saturation. For codon 3, only few taxa were saturated. Hence, the nucleotide sequences of codons 1, 2, & 3 were combined (dataset III) and subsequently used for phylogenetic analyses in the present study. The dataset III consisted of 231 nucleotides.

Plots of p-distances against the evolutionary distances obtained under GTR model for codons 1, 2, & 3 combined for all the three datasets showed saturation of transitions and showed similar pattern across all the three datasets. Similarly, for codons 1 & 2 combined for all the three datasets, transitions are saturated. Codon 3 of all the three datasets showed a high degree of substitution saturation for both the transitions and transversions.

2.6 Phylogenetic analyses

Bayesian inference (BI) of phylogeny (Fig. 1) and maximum likelihood (ML) analysis (Fig. 2) were carried out in MrBayes v3.2.5 [24] and raxmlGUI 1.5b2 [25] respectively under GTR+I+G model.

Fig. 1
figure 1

Bayesian phylogram depicting the relationships among taxa on the basis of H3 (codons 1, 2 & 3 combined) gene. Indicated on the nodes are posterior probability (PP) values. Only PP values > 0.5 are shown

Fig. 2
figure 2

Maximum likelihood phylogram depicting the relationships among taxa on the basis of H3 (codons 1, 2 & 3 combined) gene. Indicated on the nodes are the bootstrap values. Tree scale of 0.01 represents 1 substitution per 100 nucleotides. Only bootstrap values > 50 are shown

ML analysis was carried out for 1000 bootstrap replications. Bayesian analysis was carried out for 10 million generations with two independent runs. Each run consisted of four chains (3 heated and 1 cold) with default heating parameters. Sampling was done at the end of every 1000th generation with 25% of the initial trees discarded as burnin. The average standard deviation of the split frequencies at the end of 10 million generations was 0.006199. TRACER v1.6 [26] was used to determine the effective sample size (ESS) and it was found to be > 500.

The phylograms were visualized by importing the “.con” and “bipartitions.tre” files from MrBayes and RAxML respectively into iTOL v3 - http://itol.embl.de/ [27]. Apart from iTOL (Fig. 2), TreeGraph 2.14.0–771 beta was also used for annotating the phylogenetic tree (Fig. 1) [28]. In the subsequent section, bootstrap and posterior probability values were represented either as (1/100) which means a PP of 1 and a bootstrap value of 100 or in parenthesis next to the abbreviated form of the respective analysis or towards the end of a sentence discussing the same.

3 Results

3.1 Topology of the phylogenetic trees

ML tree (Fig. 2) was comparatively resolved than BI tree (Fig. 1). However, none of the deeper nodes in ML tree had bootstrap support compared to Bayesian tree. As a result, reliable inferences from ML tree could not be drawn regarding the overall relationships within Heterobranchia, except for a few bootstrap-supported external nodes. As a result, one should not consider the following conclusions inferred from the ML tree—paraphyly of Pulmonata, polyphyly of Opisthobranchia, and lower Heterobranchia.

From the topologies of the phylogenetic trees recovered in the present study, it was not possible to clearly assign the monophyly of lower Heterobranchia, Opisthobranchia, Pulmonata, and Euthyneura. The concept of Euthyneura seems to be non-existent as lower Heterobranchia was not basal in ML and BI trees.

3.2 Lower Heterobranchia

A total of five lower Heterobranch taxa were included in the present study.

Pupa solidula, Turbonilla sp, Rictaxis punctocaelatus, and Micromelo undata formed a monophyletic clade in BI (0.98) and ML (63) phylograms. This emerged as a sister clade to Lymnaeidae and Planorbidae with moderate support in BI (0.62). Acteonoidea recovered as non-monophyletic in BI and ML trees.

Orbitestella sp formed a monophyletic clade with an Opisthobranch Haminoea hydatis in both BI (0.99) and ML (56) analyses.

None of the lower Heterobranch taxa emerged basal in ML and BI phylograms.

3.3 Opisthobranchia

Of all the Opisthobranch taxa included in the present study,Acanthodoris sps formed a monophyletic clade. The relationships of the remaining Opisthobranch taxa were not clearly resolved.

Aplysiomorpha: Akera bullata formed a basal clade in BI and ML trees. In BI tree, it formed a monophyletic clade with the outgroup taxaPila conica (Caenogastropoda).

Sacoglossa: Plakobranchus ocellatus was the next basal taxon in BI (0.71) and ML trees.

Cephalaspidea: Haminoea hydatis formed a monophyletic clade with a lower Heterobranch taxon in BI (0.99) and ML (56) trees.

Nudipleura:

Anadoridoidea: Acanthodoris brunnea and Acanthodoris lutea formed a monophyletic clade in BI (1) and ML (100) analyses and emerged as a sister clade to Ellobiidae.

Bathydoridoidea: Bathydoris clavigera formed a monophyletic clade with Tylodina perversa (Umbraculoidea- Tylodinidae) in ML (56) tree.

3.4 Pulmonata

Siphonariidae: Siphonaria concinna and Siphonaria denticulata formed a monophyletic clade in BI (0.94) and ML analyses.

Hygrophila:

Lymnaeidae: Two species viz.,Lymnaea acuminata and Lymnaea stagnalis formed a monophyletic clade in BI and ML trees with strong node support values (0.99/89).

Planorbidae: Planorbidae represented by five species formed a monophyletic clade with strong support both in BI (1) and ML (89) trees.

Lymnaeidae and Planorbidae in turn formed a monophyletic clade in BI (0.84) and ML trees. This clade emerged as a sister clade to Lower Heterobranchia.

Eupulmonata:

Ellobiidae: Of the four Ellobiids Pedipes jouani, Pedipes ovalis, Carychium minimum, and Ophicardelus ornatus included in the present study, Pedipes jouani and Pedipes ovalis formed a monophyletic clade in BI (0.98) and ML (82) phylograms. They emerged as a sister clade to Opisthobranch taxa Acanthodoris brunnea and Acanthodoris lutea in BI (0.64) and ML trees.

Carychium minimum, Ophicardelus ornatus, and Salinator solida (Amphiboloidea) together formed a monophyletic clade in BI (0.55) and ML trees. These two clades were differently positioned in the phylogenetic trees.

Systellommatophora:

Onchidiidae: Of the six Onchidiids included in the present study, Onchidium sp emerged as a basal taxon in both BI (0.80) and ML analyses. Remaining five taxa formed a monophyletic clade in BI (1) and ML (91) trees.

Veronicellidae: Laevicaulis alte and Veronicella cubensis formed a monophyletic clade in both BI (1) and ML (76) analyses.

Most importantly, Onchidiidae and Veronicellidae did not group together in BI and ML phylograms to form a monophyletic Systellommatophora. Each of the two families was positioned differently in ML and BI trees.

Stylommatophora: Stylommatophora recovered monophyletic in both BI (0.52) and ML analyses.

4 Discussion

In the present study, H3 sequence information was used to assess the phylogenetic relationships within Heterobranchia, in particular to Pulmonata. In the present study, H3 gene sequences of taxa belonging to major pulmonate groups’ viz. Planorbidae, Ellobiidae, Onchidiidae, Veronicellidae, and Stylommatophora (represented by Helicidae and Orculidae) were retrieved from GenBank and included along with the sequences obtained in the present study for studying the phylogenetic relationships among pulmonates.

Relationships among Opisthobranchia, Lower Heterobranchia, and Pulmonata could not be determined from the topology of the trees recovered. ML tree is better resolved compared to Bayesian tree. However, none of the inner nodes were bootstrap supported. In ML and Bayesian trees, there was no step wise or progressive arrangement of taxa from Lower Heterobranchia to Pulmonata.

A few of the pulmonate groups’ viz. Planorbidae, Veronicellidae, and Orculidae recovered monophyletic with node support in both BI and ML trees. Stylommatophora formed a monophyletic clade in both ML and BI trees with high node support seen only for BI tree. Systellommatophora is non-monophyletic. It is interesting to observe that Onchidium sp emerged as a basal taxon from the rest of the Onchidiids in ML and BI trees. This is similar to the results obtained in the study of Dinapoli et al. [12] where Onchidium sp emerged separately from the remaining onchidiids.

The monophyly of Eupulmonata consisting of Systellommatophora and Stylommatophora and also the concept of Geophila [29] was not observed from the topologies of ML and BI trees as Stylommatophora and Systellommatophora were positioned differently in the phylograms. Taxa from Lower Heterobranchia emerged as a sister clade to Pulmonata and Opisthobranchia.

Further, the gene sequence information for H3 marker is not available for several taxa when compared to 16S, 18S, 28S, and COI. A simple search in the international nucleotide databases (DDBJ, NCBI, and EMBL) shall reveal this fact. As an example, Systellommatophora consists of three families’ viz. Onchidiidae under the superfamily Onchidioidea, Veronicellidae and Rathouisiidae under the superfamily Veronicelloidea. As at April 2019, there is only one sequence available forVeronicella cubensis belonging to the family Veronicellidae in NCBI GenBank. Hence, Veronicellidae is under-represented and the nucleotide sequence of H3 gene obtained from the present study for Laevicaulis alte belonging to this family was reported for the first time. Similarly, there is only one Histone-3 gene sequence available for Lymnaea stagnalis under “Lymnaeoidea/ Lymnaeidae”. In the present study, Lymnaea acuminata of this family was sequenced and this report is also the first of its kind.

Colgan et al. [14, 15] have utilized H3 gene in combination with other marker(s) for studying the phylogeny of Gastropoda. In their studies, pulmonates were poorly represented and consisted of only five pulmonates. Whereas, the present study included a large number of pulmonate taxa representing major groups in Pulmonata. In contrast to the results obtained in the present study, Euthyneura recovered monophyletic in these two studies. In the study of Colgan [14], within Euthyneura, opisthobranchs were paraphyletic with respect to pulmonates. When H3 gene alone was used, it resulted in an unresolved topology (polytomy). In the study of Colgan [15], Lower Heterobranchia (“Heterostropha”) and Heterobranchia are rarely monophyletic and monophyly of Opisthobranchia could not be determined due to tree topologies. This is more or less similar to the results obtained in the present study, where Lower Heterobranchia and Opisthobranchia did not emerge monophyletic.

Dinapoli et al. [12] had utilized the H3 gene alone in reconstructing the phylogeny of Heterobranchia. Unlike the present study, her study consisted of only nine pulmonates and most of the pulmonate groups were not represented. The phylograms lacked resolution as a result of which the monophyly of Pulmonata and Opisthobranchia were not obtained. Similar to the results obtained in the study, there is a good resolution at the level of genera.

5 Conclusion

The present study, for the first time reported the phylogeny of Pulmonata with increased number of taxa on the basis of H3 gene. This is in contrast to most of the phylogenetic studies, which focused on few of the widely used markers viz. 16S, 18S, 28S, and COI. Further, in order to obtain accurate results on the phylogenetic relationships among Heterobranchia and in particular to Pulmonata, there is a need for the utility of new molecular markers like H3 gene; particularly to study the divergences of deep nodes. However, phylogenies on the basis of H3 gene lacked resolution owing to high grade of conservation and high codon usage bias that results in partially obscured phylogenetic signal [12]. In the present study, a few of the major groups in Pulmonata recovered monophyletic. In the future, sequencing more number of taxa for H3 gene from different groups of Pulmonata, with increased sequence lengths may shed deeper insights into the phylogeny of Pulmonata and Heterobranchia.