Background

Eukaryotic genomes are highly variable in structure and size because of the presence of vast quantities of repetitive DNA [1, 2]. Satellite DNAs (satDNAs) are a common component, accounting for a substantial part of the genome in most animals and plants (reviewed by [3,4,5]). In general, a genome has a varied number of satDNA families (the satellitome) [6] with varying nucleotide sequences and genomic abundance [7,8,9,10,11,12,13]. Although some examples of small arrays scattered throughout euchromatin have been documented [6, 14,15,16,17,18,19,20,21,22], these sequences are often found in centromeres and in pericentromeric and subtelomeric heterochromatic areas [4, 10, 23]. Long regarded as mere “junk DNA,” satDNAs have since been shown by several studies to have a role in a variety of biological processes, including gene regulation [24], centromere function [25], chromatin modulation [26], and spatial chromosomal structure [27,28,29]. The combination of cytogenetic and genomic studies has proven useful in elucidating numerous aspects of genome evolution and organization [30, 31], with particular emphasis on repetitive DNAs [6, 32,33,34]. Furthermore, owing to the tandemly repeated genomic organization of satDNAs, their study in non-model organisms has been boosted in the last few years, especially by the development of several assembly-free pipelines designed to use raw reads [35,36,37,38]. In this context, several satDNA catalogs have been characterized in a variety of invertebrate and vertebrate species [6, 34, 39,40,41,42,43,44].

Although related species sharing a common ancestor share the same ancestral library of satDNA families, differential amplification of the different satDNAs, and of different variants of each of them, leads to differentiated satDNA profiles in each species [5, 7, 10]. This involves the replacement of some satDNA families by others at specific chromosomal sites, such as centromeres. The rate of change is very rapid, and satDNA sequences represent one of the fastest evolving genomic components. This often leads to high levels of interspecific sequence diversity even among closely related species, which exhibit very different satDNA profiles (both quantitative and qualitative) in their genomes [4, 10, 45]. However, the rate of satDNA change can be altered (accelerated or slowed) by various factors such as the location and organization of the repeated sequences [46], functional constraints [42, 47,48,49,50,51], biological factors [8, 11], or population and evolutionary factors [41, 52, 53]. In this regard, it is particularly intriguing to investigate why, for some satDNAs, this process is slower than expected, so that they persist over long periods, spanning tens (or even hundreds) of millions of years, in the same chromosomal location in an entire group of related species [32, 42, 47, 52,53,54,55,56]. It is also important to examine the effects of slow rates of molecular and morphological evolution described in a species group [52] on the evolution of the satellitome.

Crocodilians are one of the oldest extant vertebrate lineages, demonstrating evolutionary success and morphological resilience over many millions of years [57]. Extant species have preserved physical and ecological traits for nearly 100 million years, unlike other vertebrates that have undergone significant diversification [58,59,60]. Crocodilians have a key position in vertebrate phylogeny because, together with dinosaurs, pterosaurs, and modern birds, they compose the archosaurs, a monophyletic group [61,62,63]. Crocodylia is classified into three families: Crocodylidae, Gavialidae, and Alligatoridae, with approximately 27 species [64]. The family Alligatoridae is made up of eight species that are divided into four genera: Melanosuchus, Paleosuchus, and Caiman, which belong to the subfamily Caimaninae, and Alligator, which forms the monogeneric subfamily Alligatorinae. Except for the genus Alligator, whose two species, A. mississippiensis and A. sinensis, are limited to the Southeastern United States and China, respectively, all the other six species are presently found in South America, being more widespread in Brazil [57, 65].

The karyotypes of all current Alligatoridae species were recently revised using conventional differential staining and up-to-date molecular cytogenetic approaches [66]. Although there is a limited amount of diversity and a certain level of karyotype stasis (with diploid numbers equal to 2n = 42 and 2n = 32 for all Caimaninae (Caiman, Paleosuchus, and Melanosuchus) and Alligatorinae (Alligator) species, respectively), their genomic content revealed significant interspecific divergence [66].

Here, we performed the first broad-scale comparative analysis of the satellitomes of Alligatoridae. By combining genomic and chromosomal data, we identified and compared the full catalogs of satDNA families (i.e., the satellitomes) of 5 of the 8 extant Alligatoridae species, revealing ancestral patterns of evolution and enabling investigation into how satDNA families evolve over time. The results revealed strong sequence conservatism among Alligatoridae species, with very limited diversity in their satDNA library. Furthermore, fluorescence in situ hybridization (FISH) assays in all 8 extant Alligatoridae species showed that the same satDNA orthologs can exhibit different hybridization patterns, indicating their high evolutionary dynamics.

Results

Bioinformatic satDNA characterization

After several iterations (C. yacare = 5, C. latirostris = 7, M. niger = 4, P. trigonatus = 4, and A. sinensis = 2), we characterized 39 satDNAs in Alligatoridae, with repeat unit lengths ranging from 23 nt (PtrSat11-23) to 6317 nt (ClaSat02-6317) and an average A + T content of 46.9%. Specific features of the satDNAs in each species are summarized in Table 1. The number of iterations performed for each species depended on the results obtained in each round: when no new tandem repeats were discovered in a given round, the analysis was not continued. Thus, for example, in the case of A. sinensis, no tandem sequences were discovered in the third iteration. Iterations using RepeatExplorer2 after TAREAN did not return any characterized satellite DNA for the five species analyzed.
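The stopping rule described above can be sketched as a simple loop. This is only a minimal illustration and not the RepeatExplorer2/TAREAN workflow itself: `find_satellites` and `filter_reads` are hypothetical placeholders for a clustering run and a read-filtering step, the idea that reads matching already-found satellites are removed between rounds is an assumption based on the filtering step mentioned in the Methods, and the toy detector and toy reads are invented purely so that the example runs.

```python
from collections import Counter


def mine_until_exhausted(reads, find_satellites, filter_reads, max_rounds=20):
    """Repeat detection rounds until a round reports no new satellites.

    find_satellites and filter_reads are hypothetical callables standing in
    for a clustering run and a read-filtering step, respectively.
    Returns the catalog and the number of rounds that found something.
    """
    catalog = []
    productive_rounds = 0
    for _ in range(max_rounds):
        found = find_satellites(reads)
        if not found:  # stopping rule: nothing new in this round
            break
        catalog.extend(found)
        productive_rounds += 1
        # Assumption: reads matching known satellites are removed so that
        # less abundant repeats can be detected in the next round.
        reads = filter_reads(reads, found)
    return catalog, productive_rounds


def toy_find(reads):
    """Toy detector: report the most common label if it occurs at least twice."""
    counts = Counter(r for r in reads if r != "single_copy")
    if not counts:
        return []
    label, n = counts.most_common(1)[0]
    return [label] if n >= 2 else []


def toy_filter(reads, found):
    """Toy filter: drop every read whose label was just reported."""
    return [r for r in reads if r not in found]


# Invented toy data: two repeat labels of different abundance plus unique reads.
toy_reads = ["satX"] * 5 + ["satY"] * 3 + ["single_copy"] * 4
catalog, rounds = mine_until_exhausted(toy_reads, toy_find, toy_filter)
print(catalog, rounds)
```

In this toy run, the count returned is the number of rounds that yielded at least one repeat, which matches how the iteration counts are reported above (the final, empty round is not counted).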

Table 1 General features of Alligatoridae satellitomes characterized with TAREAN. SF = superfamily, RUL = repeat unit length, TSI = tandem structure index. Divergence per satDNA was expressed as the percentage of Kimura divergence. SatDNAs that have 50% or more identity belong to the same group

In general, the alligators and caimans analyzed here exhibited few satDNAs (a minimum of three in A. sinensis and a maximum of 13 in C. latirostris) and low satDNA diversity both within and between species. SatDNAs from the same species with similarity greater than 50% and less than 80% were classified in the same superfamily, while satDNAs from different species with similarity greater than 50% were placed in the same group; four such groups were distinguished.
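The similarity thresholds used for this classification can be written down as a small decision function. The sketch below is a hedged illustration: the 50%, 80%, and 95% cut-offs come from the text and the Methods, but the function names, the use of connected components (single linkage) to assemble groups, and the placeholder names and similarity values are assumptions made for the example and are not the authors' script.

```python
def classify_pair(similarity_pct, same_species):
    """Label a pair of satDNAs from their percent similarity.

    Thresholds follow the text: >= 95% same variant, >= 80% variant,
    >= 50% superfamily (within a species) or group (between species).
    """
    if not 0 <= similarity_pct <= 100:
        raise ValueError("similarity must be a percentage between 0 and 100")
    if not same_species:
        return "same group" if similarity_pct >= 50 else "unrelated"
    if similarity_pct >= 95:
        return "same variant"
    if similarity_pct >= 80:
        return "variant of the same family"
    if similarity_pct >= 50:
        return "same superfamily"
    return "unrelated"


def group_by_similarity(names, similarities, threshold=50.0):
    """Assemble groups as connected components of the similarity graph.

    Assumption: two sequences end up in the same group if they are linked
    by a chain of pairwise similarities at or above the threshold.
    """
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the component root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), sim in similarities.items():
        if sim >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Placeholder names and invented similarity values, for illustration only.
names = ["satA", "satB", "satC", "satD"]
similarities = {
    ("satA", "satB"): 72.0,
    ("satB", "satC"): 55.0,
    ("satA", "satC"): 40.0,
    ("satC", "satD"): 10.0,
}
groups = group_by_similarity(names, similarities)
print(groups)
```

Note that with single linkage, satA and satC fall into one group through satB even though their direct similarity is below 50%; a stricter rule (for example, requiring every pair to pass the threshold) would give different groups.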

Based on sequence alignments, four main groups of satDNAs sharing at least 50% similarity were identified. Three of them encompassed satDNAs from at least four species: group 1 (N = 19 satDNAs shared among Caimaninae), group 2 (N = 8 satDNAs shared among Caimaninae), and group 3 (N = 6 satDNAs shared among Alligatoridae). Group 4 comprised satDNAs from two species (N = 2 satDNAs shared among Caimaninae) (Additional file 1: Fig. S1; Additional file 2: Table S1; Table 1). The remaining 4 satDNAs did not show any similarity with other sequences (Table 1). This classification helped us to delimit the origin of some satDNAs and follow their diversification patterns in each species.

We also performed one comparative RepeatExplorer2 run, pooling reads from all the analyzed species into a single dataset. The results corroborated our previous analyses: satDNAs from groups 1–4 were found, as well as other tandem repeats not classified within these groups (Additional file 2: Table S2). A general clustermap of the genomic abundances of satDNAs exhibiting a maximum of 20% divergence in each species was also generated and is in accordance with the phylogenetic relationships (Fig. 1). As expected, the more distant two species are, the fewer satDNAs they share. For instance, A. sinensis shares almost no satDNAs with the other species.

Fig. 1

Clustermap evidencing presence/absence and abundance of the satDNAs across Alligatoridae species. Abundances were calculated as the log10 of the proportion of short reads masked as satDNAs with a maximum of 20% of divergence and normalized by single-copy genes. On the left, hierarchical calculated clusters; on the right, species name and the proposed phylogeny for the group with their respective divergence times, based on data generated by [65]

Expanding our analyses, as we found significant RUL variability in each of those groups, we generated a global dot-plot with sequences from the first three groups (groups 1, 2, and 3) (Fig. 2). As expected, sequences belonging to the same group showed similarities in the dotplots, which also indicated that most longer satDNAs within groups probably emerged from the diversification of pre-existing shorter satDNAs.

Fig. 2

Sequence alignment of satellite repeats in groups 1, 2, and 3, demarcated by arrows in the alignments, representing the subunits from which the sequences in each group originated. In addition, dotplots of each satDNA, presenting the internal repetitions among all monomers, are also indicated

We generated dotplots for each satDNA monomer supporting this view (Fig. 2). Thus, for example, among group 1 satDNAs we observed that the 41-bp satellites are made up of a structure composed of two subrepeats (21 + 20 bp) (Fig. 2). Comparisons between these two subrepeats suggest that a satDNA of about 20 bp in length must have existed and that a new satDNA composed of 41-bp repeats emerged through a process of duplication and subsequent divergence (in fact, one of the satellites in this group, PtrSat11-23, is 23 bp long). Therefore, when comparing the mean divergence that exists between 41-bp repeat units (inter-repeat divergence) with the mean divergence that exists between 20/21 bp subunits that compose each repeat (intra-repeat divergence), we always find that the former is smaller than the latter (Additional file 2: Table S3 and Additional file 1: Fig. S2). Satellites of this group with lengths longer than 41 bp show a complex pattern of several cycles of duplication and divergence of subunits of about 40 bp or more. For example, the analysis of ClaSat06-1063 demonstrates a complex evolutionary pattern based on different cycles of duplication and divergence of sub-repeats of approximately 40/80 bp (including intervening sequences) (Additional file 1: Fig. S3). Similarly, group 2 satellites that are composed of 60 bp repeat units have a pattern of 20 bp subunits that again point to a formation of 60-bp satellites from smaller satellites (in fact, one of the satellites in this group, ClaSat12-24, is 24 bp long) (Fig. 2). Also in this case, mean inter-repeat divergence is smaller than mean intra-repeat divergence (Additional file 2: Table S4). Finally, group 3 is the only example of shared satDNAs between all the analyzed species here. Repeat monomers of 40 bp are predominant among these satDNAs, and dot-plot analysis revealed a heterogeneous structure based on two different subrepeats (29 bp + 11 bp, Fig. 2). 
In this case again, mean intra-repeat divergence is greater than mean inter-repeat divergence (Additional file 2: Table S5 and Additional file 1: Fig. S4). Remarkably, in A. sinensis, a 96-bp-long satDNA was characterized in this group, and its monomer sequence reveals a complex structure composed of two 40-bp subrepeats and an intervening sequence (40 bp + 16 bp + 40 bp, Fig. 2).
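The comparison between inter-repeat and intra-repeat divergence can be illustrated with a short calculation. The sketch below is a hedged example under several assumptions: it uses the Kimura 2-parameter distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions; it assumes sequences are already aligned and that the two subunits of each repeat have equal length; and the toy repeats are invented. Which distance measure underlies the supplementary tables is not stated in this section, so this should not be read as the authors' exact procedure.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log/math.sqrt raise ValueError if the sequences are too divergent
    # for the model (non-positive argument).
    return abs(-0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q)))


def mean_inter_repeat(repeats):
    """Mean pairwise distance between whole repeat units."""
    pairs = list(combinations(repeats, 2))
    return sum(k2p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_intra_repeat(repeats, split):
    """Mean distance between the two subunits inside each repeat unit."""
    return sum(k2p_distance(r[:split], r[split:]) for r in repeats) / len(repeats)


# Invented toy repeats: each 20-bp unit is made of two 10-bp subunits.
toy_repeats = [
    "ACGTACGTAC" + "ACATACGTGC",
    "ACGTACGTAC" + "ACATACGTGT",
    "ACGTACGTAC" + "ACATACGTGC",
]
inter = mean_inter_repeat(toy_repeats)
intra = mean_intra_repeat(toy_repeats, split=10)
print(f"inter-repeat: {inter:.3f}, intra-repeat: {intra:.3f}")
```

In the toy data, the subunits within a repeat differ more from each other than whole repeats differ among themselves, which is the pattern expected when an older duplication of a short unit is followed by divergence of the subunits and later amplification of the longer repeat.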

BLAST searches against the genome of A. sinensis revealed that satDNAs classified as groups 1 and 2 were not found in this species, while matches were observed for sequences belonging to group 3 (CyaSat02-40, ClaSat05-40, MniSat04-40, PtrSat10-40, and AsiSat03-96) and group 4 (ClaSat04-536 and PtrSat09-490). BLAST searches also resulted in matches against ClaSat02-6317 and ClaSat13-398, satDNAs that are not classified in any group (results are summarized in Additional file 2: Table S6). These results suggest that groups 1 and 2 of sequences emerged after the split of Caimaninae and Alligatorinae, while groups 3 and 4 are shared among the representatives of both subfamilies. In addition, Alligatorinae-specific AsiSat01-1717 and AsiSat02-60 satDNAs returned abundant significant matches, as expected.

ClaSat02-6317 and ClaSat13-398 produced multiple hits against the A. sinensis genome (n = 6264 and 1712, respectively). Remarkably, the TSI obtained for ClaSat13-398 was low (TSI = 0.21), suggesting that this sequence is dispersed along the genome. Although ClaSat02-6317 exhibited a higher TSI (TSI = 0.74), we hypothesize that this is due to its larger monomer size. Since the fragments of paired-end sequencing are usually around 300–400 bp and the monomer of this satDNA is > 6000 bp, the obtained TSI most likely reflects mapping within the same monomer rather than in adjacent monomers. In fact, neither ClaSat02-6317 nor ClaSat13-398 shows FISH hybridization signals in this species, supporting their scattering as short tandems throughout the genome. Interestingly, a RepeatMasker search on the vertebrate database of Repbase revealed that the former is homologous to endogenous retroviruses (ERVs) (62% identity; 70% of the element) and the latter to LINE sequences (70% identity; 53% of the element) (Supplemental Table S7). On the other hand, this search also revealed that AsiSat01-1717 shared 54% and 82% of its sequence with two satellite DNAs previously found in the Nile crocodile (NCBI accession numbers: OP480175 and OP480176).

Chromosomal location of satDNAs with differential abundance between species

We analyzed the chromosomal location, in all Alligatoridae species, of the satDNAs that were successfully amplified by PCR: those belonging to group 1 (ClaSat01-41, ClaSat06-1063, ClaSat07-320, ClaSat08-800, and ClaSat11-547), group 2 (ClaSat10-60), group 3 (ClaSat05-40), and group 4 (ClaSat04-536), in addition to the two found exclusively in A. sinensis (AsiSat01-1717 and AsiSat02-60). Additionally, the ungrouped satellites ClaSat02-6317 and ClaSat13-398 were tested, but neither yielded visible FISH signals in any species (data not shown).

Concerning the ClaSatDNAs, except for the satellite ClaSat04-536 (group 4), which showed no hybridization signals in any species, all the other satDNA sequences were found in (peri)centromeric heterochromatin regions in all Caimaninae species (Figs. 3, 4, and 5). Both alligators (A. sinensis and A. mississippiensis) showed no hybridization signal for any of the ClaSatDNAs investigated (data not shown), which is in accordance with the clustermap analysis. Here, to illustrate, we present the results for representative ClaSatDNAs selected from each of the major groups identified (Figs. 3, 4, and 5). The satDNA ClaSat01-41, belonging to group 1 (the most frequent group in each species), was mapped in two chromosomal pairs in all Caimaninae species except P. trigonatus, which did not display any hybridization signal (Fig. 3). However, despite sharing the same motifs, ClaSatDNAs from group 1 showed some divergent and species-specific chromosomal location patterns among species (Additional file 1: Figs. S5–S6). For satellites in groups 2 and 3, numerous chromosomal pairs containing these sequences were found in nearly all species (Figs. 4 and 5). M. niger is distinctive for displaying hybridization signals on only two chromosomal pairs for group 3 satellites (Fig. 5d).

Fig. 3

Metaphase chromosomes from C. crocodilus (a), C. latirostris (b), C. yacare (c), M. niger (d), P. palpebrosus (e), and P. trigonatus (f) after in situ mapping of ClaSat01-41 (group 1). The satDNA FISH signals are highlighted in green (ATTO488 labeled) or red (ATTO550 labeled) and the chromosomes were counterstained with DAPI (blue). Scale bar = 20 μm

Fig. 4

Metaphase chromosomes from C. crocodilus (a), C. latirostris (b), C. yacare (c), M. niger (d), P. palpebrosus (e), and P. trigonatus (f) after in situ mapping of ClaSat10-60 (group 2). The satDNA FISH signals are highlighted in red (ATTO550 labeled) and the chromosomes were counterstained with DAPI (blue). Scale bar = 20 μm

Fig. 5

Metaphase chromosomes from C. crocodilus (a), C. latirostris (b), C. yacare (c), M. niger (d), P. palpebrosus (e), and P. trigonatus (f) after in situ mapping of ClaSat05-40 (group 3). The satDNA FISH signals are highlighted in red (ATTO550 labeled) and the chromosomes were counterstained with DAPI (blue). Scale bar = 20 μm

In addition, we mapped the two satDNAs exclusive to the A. sinensis genome (AsiSat01-1717 and AsiSat02-60) in all Alligatoridae species. Both AsiSatDNAs showed hybridization signals only in Alligator species. While AsiSat01-1717 was restricted to several chromosomes, AsiSat02-60 was mapped in all centromeres of both species (Fig. 6). Collectively, our analyses revealed, for A. sinensis, that (i) although group 3 and group 4 satellites are present in the A. sinensis genome, these satellites are poorly represented and possibly organized in short tandems scattered throughout the genome, as can be deduced from TSI values, BLAST searches, and FISH: hybridization signals were not visible, and the satellites exhibited high TSI but a low number of alignments in BLAST; and (ii) this contrasts with the alligator-specific satellites, which appear clustered in long arrays at discrete loci, consistent with their conspicuous FISH bands, high TSI values, and large number of alignments in BLAST.

Fig. 6

Metaphase chromosomes from A. sinensis (a and c) and A. mississippiensis (b and d) after in situ mapping with AsiSat01-1717 (a and b) and AsiSat02-60 (c and d) probes. The satDNA FISH signals are highlighted in red (ATTO550 labeled) and the chromosomes were counterstained with DAPI (blue). Scale bar = 20 μm

Discussion

Although the complete genomes of both alligators (A. sinensis and A. mississippiensis) were characterized some years ago [63, 67, 68], genome-wide investigations of satDNAs in this group of organisms had never been undertaken. SatDNAs are well known to be underrepresented in genome assemblies [4], particularly in genomes assembled using short-read sequencing technology, as is the case with alligators. In this context, knowledge about satDNAs in crocodilians was limited to just a few works [69, 70]. Given that high-throughput satellitome analysis has been very enlightening for understanding satDNA evolution in various organisms, we used a combined chromosomal and genomic approach to describe, for the first time, the satellitomes of members of all current Alligatoridae genera. Over a period of ~ 70 Myr (million years), many satDNA sequences have remained shared among the species, supporting the hypothesis that they are derived from small sequences, as shown in Fig. 2. Furthermore, subsequent fluorescence in situ hybridization experiments revealed distinct hybridization patterns for the same orthologous satDNAs.

After mining satellite DNAs using well-established bioinformatic pipelines [6, 36], we found that alligators’ satellitomes are among the smallest catalogs described to date, varying between 3 and 13 satDNAs in A. sinensis and C. latirostris, respectively. In recent years, several satellitomes from a wide range of species, including plants and animals, have been identified [6, 42,43,44, 71,72,73]. These investigations showed that satellite DNA profiles are very dynamic. For example, characiform fish satellitomes display significant quantitative and qualitative variation, with some species exhibiting a few dozen satDNAs [44], while others can show more than one hundred [74]. Here, we found that all alligators are similarly satDNA-poor, which constitutes a common trend in this group.

Novel satDNA families can emerge by variable mechanisms and from multiple genomic regions, like introns, transposable elements, and/or existing satDNA families [38, 75, 76]. Our findings indicated that there was little intraspecific variation in satellite DNA, indicating that most new satellite sequences evolved from pre-existing ones. For instance, C. latirostris exhibited 13 satDNAs, but six and three of them were grouped as superfamilies (sequences showing more than 50% of similarity and less than 80%), named here as groups 1 and 2, respectively (Table 1). Interestingly, this limited diversity is also apparent at the interspecific level, where over 90% of the 39 satDNAs described for Alligatoridae can be categorized into 4 main groups of sequences. After their origin, new longer satellites derived from the complex diversification of shorter ones already existing in the genome throughout different and successive cycles of duplication and divergence, which has been extensively documented in other species [46, 52, 72].

The long-term evolution of satellite DNA catalogs in related species can be explained by the library hypothesis. Fundamentally, it states that changes in the profiles of satDNAs among species are mostly quantitative in the “library,” rather than multiple de novo origins [77]. Here, we could track the origin of the ancestral forms of satDNAs belonging to groups 1–4 to, at least, the common ancestor of Caimaninae (groups 1, 2, and 4) and Alligatoridae (group 3). We observed a substantial degree of similarity in satDNAs among species, with only four being species-specific. The long-term maintenance of satDNAs is notable. In this context, the conservation could be related to the acquisition of cellular function [42, 47,48,49,50,51, 77], particular genomic organization [32], or slow rates of evolution [52]. Previous studies found slow rates of molecular evolution within crocodilians [63]; thus, we hypothesize that satDNAs also evolved slowly in this group (as discussed below). In squamate reptiles, while the great majority of sequences are of recent origin and only observed in closely related species [78,79,80,81,82,83], several (and most common ones) are largely conserved in unrelated species [84].

The chromosomal mapping analysis revealed that all characterized satellites showed the same general chromosomal location (i.e., large peri- and centromeric blocks) among species, with specific patterns for each one (Figs. 3, 4, 5, and 6 and Additional file 1: Figs. S5–S6). On the other hand, it is interesting that group 1 satellites, despite being the most abundant in the Caimaninae genome, show a visible block of FISH signal in only two chromosomal pairs. Because a specific satDNA sequence can display a variety of array structures (dispersed and/or clustered into long and nonrandom arrangements) among species, FISH yields a range of labeling patterns at the chromosomal level. This is particularly true, for example, for the group 3 ClaSat05-40 because, although abundant in the genome of A. sinensis (as indicated by our BLAST results in Additional file 2: Table S6 and the clustermap using RepeatMasker data), it exhibits a non-clustered organization, which prevented in situ experiments from producing any detectable hybridization signals at the chromosomal level. We hypothesize that this could well explain the FISH patterns observed in Caimaninae for group 1 satDNAs, although we cannot verify this, as we do not have the complete sequence of their genomes, nor are these satDNAs present in the A. sinensis genome for comparison. In this context, different satellites of groups 1 and 2 show TSI values that are compatible with a dual organization, forming both loci visible by FISH and short arrays scattered throughout the genome that are not detectable by FISH (Table 1).

On the other hand, it is remarkable that two of the satellites studied in this paper (ClaSat02-6317 and ClaSat13-398), which appear to be dispersed according to BLAST and FISH results, are related to mobile elements and show homology with such elements over a substantial part of their sequences, which suggests that these satellites have evolved from this type of element. Specifically, ClaSat02-6317 is related to ERVs, while ClaSat13-398 is related to LINEs. There is increasing evidence that TEs are a major source of satellites (Šatović-Vukšić and Plohl, 2023) and these results support this proposal. Interestingly, it has been shown that the majority of within-crocodilian TE activity is derived from ERVs (Chong et al. 2014; Sotero-Caio et al. 2017). Our results therefore also support that these elements can constitute a source of satellites in Crocodylia.

Our current findings are in line with the karyotype patterns described for the family, which show a stable dichotomy between the genera Alligator (2n = 32) and Caiman, Melanosuchus, and Paleosuchus (2n = 42), with 2n = 32 representing the most likely ancestral state [revised in 66]. The two main divergent karyotype groups to which these reptiles belong are reflected in both the specificities of their respective satDNA libraries in terms of their sequence composition and chromosomal locations. However, all the satDNAs were mapped in the constitutive heterochromatin that is limited to the pericentromeric areas in all Alligatoridae species [66]. It is reasonable to consider that some of these satellites would be a component of the centromeric chromatin, much like in other species [4, 5]. Although the presence of multiple dispersed loci composed of a single copy or a few tandem copies of a satDNA is a fact today [23], the accumulation of satDNAs (as well as other repetitive DNA families) in centromeres and in heterochromatic regions is characteristic, as observed in many other groups [4, 23, 85,86,87]. Such colocalization (i.e., the tendency to occupy similar locations on non-homologous chromosomes) might have been facilitated by the reunion of centromeres at the first meiotic prophase bouquet [6, 88]. This is especially true in Caimaninae since the karyotypes of all species are dominated by acrocentric chromosomes. In this context, the existence of large and small chromosomes in Caimaninae could be favoring the structural differences at the (peri)centromeric level between different chromosomes [89].

Both alligator species, A. sinensis and A. mississippiensis, displayed hybridization signals for only two (AsiSat01-1717 and AsiSat02-60) of all the investigated satDNAs (Fig. 6). Furthermore, AsiSat02-60 was mapped in all centromeres of both Alligator species. That is, these two species have conserved the same (peri)centromeric satDNA in all their chromosomes, underscoring its possible important role in centromeric and pericentromeric organization, a role that may be shared with AsiSat01-1717 in some chromosomes. Alligatorinae diverged long ago (~ 70 Myr) from Caimaninae and have highly rearranged karyotypes (2n = 32) that are predominantly metacentric, in contrast with all Caimaninae species, which have 2n = 42 chromosomes and karyotypes dominated by acrocentric chromosomes [66]. We have proposed that 2n = 32 represents the likely ancestral state and that the karyotype diversification in Caimaninae was followed by a series of Robertsonian rearrangements in which centric fissions played a key role [66]. Accordingly, alligators’ satellitomes are among the smallest catalogs described until now for any species, with only 3 satDNAs identified.

Taken together, the data obtained in this work allow us to conclude that this group of ancient species, which has survived on Earth for more than 100 Myr, has a very small common catalog of satDNA families. Nevertheless, each of the two lineages analyzed (Caimaninae and Alligatorinae), which have been diverging for more than 70 Myr, is differentiated by the satDNAs that have been amplified in each group at the centromeric level. What stands out in this study is that these satellites have been conserved during all this time and persist for reasons that we analyze below. While the same satellite has been conserved in centromeres of Alligatorinae species for about 70 Myr, the chromosomal rearrangements that have taken place in the Caimaninae lineage would have caused the emergence and diversification of new satellite DNAs that have replaced them in the (peri)centromeric regions. Some of them, such as those of group 3, were already present in a dispersed form in the ancestral genome of Alligatoridae, as was possibly the case with the satDNAs of group 4 and the ungrouped satDNAs ClaSat02-6317 and ClaSat13-398 (Additional file 2: Table S2), still dispersed in all Alligatoridae species. In fact, the replacement of some satDNAs by others is common at the centromeric level even among closely related species in both animals and plants (reviewed in [4, 5, 90]; see also the “Background” section). In the case of Alligatoridae, the slow evolution of their genomes may in turn also be a contributing factor, as was suggested for satDNAs from sturgeons [52, 53]. Extant crocodiles have limited rates of morphological [91, 92], molecular [63], and karyotype diversification [66, 93, 94]. Likewise, the present-day satellitomes (particularly those of the Caimaninae species) share common satDNA libraries, despite their long time of divergence.
Therefore, the following questions arise: (a) why have they also changed so little in such a highly variable genome fraction over such an enormous span of time?; (b) would such low genetic, karyotype, and morphological variability be related to the low number of extant crocodilian species?

Crocodylomorpha (a clade that comprises living and extinct crocodilians) first appeared roughly 250 million years ago, and its 28 existing species are among the biggest living ectothermic animals. As a result, their survival over such a long geological period is of great evolutionary importance. They do, however, have a rich fossil history that includes hundreds of extinct species, revealing a hidden past of incredible variety and complexity [95, 96]. Oaks [65] has questioned the traditional notion of crocodiles as old “living fossils,” arguing that most extant crocodilians are remnants of formerly successful lineages in terms of diversity and range. Crocodylomorpha is the only pseudosuchian lineage to have survived the Triassic-Jurassic (TJ) extinction event, which happened around 200 million years ago [97, 98]. Furthermore, after the mid-Miocene climatic optimum, there was a huge drop in crocodilian diversity, which coincided with global cooling and glacial advancement. During the Pliocene, the number of taxa is believed to have decreased from around 26 to 8, representing the greatest extinction rate over the previous 100 million years [99]. As a result, the selection of an “evolutionary package” with genomic, chromosomal, morphological, and physiological features similar to those currently observed among extant species most likely resulted from drastic demographic declines or founder events and represented an evolutionary response to a long-term bottleneck history.

Conclusions

This study is the first to offer a comparative mapping of the satDNA families in Alligatoridae. We observed some level of interspecific divergence despite strong sequence conservatism throughout Caimaninae. The results show that orthologous satDNAs can display different hybridization patterns among species, reflecting their evolutionary dynamics. After several rounds of mining, we found that alligators’ satellitomes are among the smallest satDNA libraries described so far, with just four groups of satDNAs and four species-specific sequences across all species, possibly representing ancestral features of the group that have been conserved throughout the crocodilians for a long time. Additional studies of repetitive DNAs in the other families of Crocodylia will be important to trace the evolution of these sequences and to provide more information about chromosomal evolution in reptiles.

Methods

Samples, DNA extraction, and chromosomal preparation

Table 2 summarizes the collecting sites, number, and sex of the individuals used in this investigation. The sampling is similar to that previously examined by [66]. Chromosomal preparations were obtained from in vitro blood cultures [100, 101]. Genomic DNA (gDNA) was extracted from blood stored in 100% ethanol using the standard phenol–chloroform–isoamyl alcohol procedure [102].

Table 2 Species, sample size (N), sex, and locality of the analyzed individuals. The species whose satellitomes were studied are highlighted in bold

Sequencing data

Two broad-snouted caimans (C. latirostris, one male and one female) and one Schneider’s smooth-fronted caiman (P. trigonatus, female) were selected for low-pass shotgun sequencing on the BGISEQ-500 platform at BGI (BGI Shenzhen Corporation, Shenzhen, China), yielding 2.76, 2.76, and 2.67 Gb, respectively (Additional file 2: Table S7). Raw reads are available in the NCBI Sequence Read Archive (SRA-NCBI) under the accession numbers SRR19901397 (C. latirostris male), SRR19901398 (C. latirostris female), and SRR19901554 (P. trigonatus female). To search for and compare satDNAs in other Alligatoridae species, we also collected genomic data available in the SRA-NCBI for the Yacare caiman Caiman yacare (SRR1609243), the black caiman Melanosuchus niger (SRR1609245), and the Chinese alligator Alligator sinensis (SRR953089), thus encompassing all the extant Alligatoridae genera. The general features of the sequencing data are summarized in Additional file 2: Table S8.

Satellite DNA characterization and comparative analyses

After gathering sequencing data for all the species as mentioned earlier, we performed quality (Q > 30) and adapter trimming with Trimmomatic [103] for each library separately. We then characterized the satDNAs in each species. We performed several iterations of RepeatExplorer2 [36] and filtered the identified satDNAs with DeconSeq [104] following the protocol of [6]. We analyzed 2 × 500,000 reads in each iteration until no low- or high-confidence satellite DNA was found. After multiple iterations, we filtered and removed multigene families (5S rDNA and/or U snDNA) from the catalog. Then, we performed a similarity search among the remaining sequences with RepeatMasker using a custom Python script (https://github.com/fjruizruano/ngs-protocols/blob/master/rm_homology.py), grouping them as the same sequence variant (≥ 95% similarity), variants of the same family (≥ 80% similarity), or different satDNAs sharing the same superfamily (≥ 50% similarity) in each species [6]. After that, we estimated Kimura divergence with the Kimura 2-parameter model, using the script calcDivergenceFromAlign.pl of the RepeatMasker suite, and abundance values for all satDNA families with the “cross_match” option in RepeatMasker [105]. We used 2 × 5,000,000 reads for each library, except for Melanosuchus niger and Caiman yacare, whose libraries had fewer reads; for these, the analysis was performed with 2 × 1,213,376 and 2 × 1,608,245 reads, respectively (Table 1; Additional file 1: Fig. S1). Genomic abundance of each satDNA was given as the number of mapped reads in each satDNA divided by the number of analyzed nucleotides. Finally, we classified each satellite in order of decreasing abundance, as Ruiz-Ruano et al. [6] suggested. The specific features of each satDNA are shown in Table 1. Each catalog of satDNAs was deposited in GenBank with accession numbers OP169024–OP169026 (A. sinensis), OP169027–OP169032 (C. yacare), OP169033–OP169038 (M. niger), OP169039–OP169049 (P. trigonatus), and OP169050–OP169062 (C. latirostris). One additional and independent RepeatExplorer2 run was performed with a concatenated genomic library containing 150,000 reads from each species, using the “Perform comparative analysis” option.
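The similarity thresholds and the abundance-based ordering described in this paragraph can be illustrated with a short sketch. It is not the rm_homology.py script: the function names `classify_similarity` and `rank_by_abundance` and the label strings are hypothetical, and the input is assumed to be a pairwise similarity (in percent) and per-satDNA abundance values that have already been obtained from RepeatMasker.

```python
def classify_similarity(similarity_pct: float) -> str:
    """Map a pairwise similarity (percent) to a relationship label.

    Thresholds follow the text: >= 95% same sequence variant,
    >= 80% variant, >= 50% same superfamily.
    The label strings themselves are hypothetical.
    """
    if not 0.0 <= similarity_pct <= 100.0:
        raise ValueError("similarity must be a percentage between 0 and 100")
    if similarity_pct >= 95.0:
        return "same_variant"
    if similarity_pct >= 80.0:
        return "variant"
    if similarity_pct >= 50.0:
        return "same_superfamily"
    return "unrelated"


def rank_by_abundance(abundance: dict[str, float]) -> list[str]:
    """Return satDNA identifiers sorted by decreasing genomic abundance."""
    return sorted(abundance, key=abundance.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(classify_similarity(87.5))
    print(rank_by_abundance({"satA": 0.001, "satB": 0.02, "satC": 0.005}))
```

Checking the thresholds from the highest to the lowest means each pair receives the most specific label it qualifies for.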

To compare the satellitomes of multiple species, we performed a similarity search with RepeatMasker (https://github.com/fjruizruano/ngs-protocols/blob/master/rm_homology.py) considering all the de novo-characterized satDNA sequences. Then, we aligned the monomers of all satDNAs showing at least 50% similarity with MUSCLE [106]. In addition, we generated individual self-dotplots of the satDNA sequences and a general one with Flexidot [107].

For a general visualization of abundance and presence/absence of each satDNA in the different species, we ran RepeatMasker [105] against the complete catalog of Alligatoridae using each of the genomic libraries. After that, we normalized read coverage of the samples relative to single-copy genes. For this, we retrieved three single-copy genes in Sauropsida (options: Present in all species; Single-copy in all species) from OrthoDB (https://www.orthodb.org/; accessed on July 30th) and mapped the genomic libraries against the genes using bowtie2 [108] with the presets --sensitive and --local. Then, a normalization factor was calculated as: [(number of mapped reads × read sizes × gene sizes)/number of analyzed reads] (Additional file 2: Table S9). Finally, we summed the log10 of normalized read counts from RepeatMasker (0 to 20% of Kimura divergence). With the final matrix (Additional file 2: Table S10), we generated a clustermap (Fig. 1) with Seaborn using the seaborn.clustermap function (https://seaborn.pydata.org/generated/seaborn.clustermap.html).
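As a minimal sketch of the arithmetic in this paragraph, the normalization factor and the log10 summation could be written as follows. The function names are hypothetical, the formula is transcribed literally from the text, and two points are assumptions rather than statements of the original workflow: read counts are keyed by integer Kimura divergence (in percent), and bins with a zero count are skipped because log10(0) is undefined.

```python
import math


def normalization_factor(mapped_reads: int, read_size: int,
                         gene_size: int, analyzed_reads: int) -> float:
    """Literal transcription of the formula in the text:
    (mapped reads x read size x gene size) / analyzed reads."""
    if analyzed_reads <= 0:
        raise ValueError("analyzed_reads must be positive")
    return (mapped_reads * read_size * gene_size) / analyzed_reads


def summed_log10(normalized_counts: dict[int, float],
                 max_divergence: int = 20) -> float:
    """Sum log10 of normalized read counts for divergence bins 0..max_divergence.

    Assumption: keys are Kimura divergence in percent; zero counts are skipped.
    """
    return sum(
        math.log10(count)
        for divergence, count in normalized_counts.items()
        if 0 <= divergence <= max_divergence and count > 0
    )


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(normalization_factor(10, 100, 1000, 1_000_000))
    print(summed_log10({0: 10.0, 5: 100.0, 21: 10.0}))
```

A matrix of such values (satDNAs by species) is the kind of input that seaborn.clustermap accepts, for example as a pandas DataFrame.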

Taking advantage of the fact that the genome of Alligator sinensis is available in the NCBI (GCA_000455745.1), we also conducted a BLAST search (blastn, word size = 11, e-value = 1e-6) of the entire list of satDNAs against this genome, which was assembled using Illumina HiSeq 2000 [67]. We did not perform any structural or quantitative analysis of array sizes and/or organization because only short reads were employed for this assembly [67]. Consequently, the BLAST searches were mainly informative about the presence or absence of satDNAs in the genome of A. sinensis. To estimate the degree of tandem structure of the satDNAs in this species, we calculated the Tandem Structure Index (TSI), as described in [71]. This value is calculated as the quotient of the number of paired reads mapped against a satDNA and the total number of reads (https://github.com/fjruizruano/SatIntExt). Thus, higher TSI values indicate the occurrence of longer arrays in the analyzed species. Note that, because the FISH probes were labeled and hybridized in groups, the TSI values are not fully suitable for comparison with the FISH results of satDNAs within groups 1 to 4.
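The TSI quotient described above can be sketched in a few lines. This is not the SatIntExt implementation: the function name and parameter names are hypothetical. Reading “the total number of reads” as the reads mapped to the same satDNA is an assumption, and the range check depends on it.

```python
def tandem_structure_index(paired_mapped_reads: int, total_reads: int) -> float:
    """Quotient of paired reads mapped against a satDNA and the total reads.

    Assumption: total_reads counts the reads mapped to the same satDNA,
    so the index lies between 0 and 1.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= paired_mapped_reads <= total_reads:
        raise ValueError("paired_mapped_reads must be between 0 and total_reads")
    return paired_mapped_reads / total_reads


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(tandem_structure_index(80, 100))
```

Under this reading, a value close to 1 means that most reads mapping to the satDNA have a mate that also maps to it, which is what longer arrays would produce.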

Primer design and polymerase chain reaction (PCR)

We designed primer pairs for 12 satellite DNA families characterized from C. latirostris and two satellite DNA families characterized from A. sinensis, creating convergent primers for satellites larger than 1000 bp and divergent primers for satellites smaller than 1000 bp. We verified whether those primer pairs anchor in conserved regions of the monomers and used them for PCR amplification in all Alligatoridae species. The PCRs contained 1 × PCR buffer, 1.5 mM of MgCl2, 200 µM of each dNTP, 0.5 µL of each primer, 10–100 ng/µL of gDNA, and 0.2 µL of Taq DNA polymerase in a total volume of 25 µL. The PCR program included an initial denaturation at 95 °C for 7 min, followed by 34 cycles at 95 °C for 45 s, 61 °C for 1 min, 72 °C for 1 min, and a final extension at 72 °C for 7 min. The PCR products were checked on a 2% agarose gel.
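The primer orientation rule and the cycling program can be expressed as a small worked example. All names below are hypothetical. The text does not say how a monomer of exactly 1000 bp is handled, so the sketch raises an error in that case. The computed duration covers the programmed hold times only and ignores ramping between temperatures.

```python
# Cycling program transcribed from the text; hold times in seconds.
PCR_PROGRAM = {
    "initial_denaturation_s": 7 * 60,   # 95 °C for 7 min
    "cycles": 34,
    "cycle_steps_s": (45, 60, 60),      # 95 °C 45 s, 61 °C 1 min, 72 °C 1 min
    "final_extension_s": 7 * 60,        # 72 °C for 7 min
}


def primer_orientation(monomer_length_bp: int) -> str:
    """Convergent primers above 1000 bp, divergent primers below 1000 bp."""
    if monomer_length_bp <= 0:
        raise ValueError("monomer length must be positive")
    if monomer_length_bp > 1000:
        return "convergent"
    if monomer_length_bp < 1000:
        return "divergent"
    # Exactly 1000 bp is not covered by the rule stated in the text.
    raise ValueError("orientation for exactly 1000 bp is not specified")


def program_hold_time_s(program: dict) -> int:
    """Total programmed hold time in seconds, excluding ramp times."""
    per_cycle = sum(program["cycle_steps_s"])
    return (program["initial_denaturation_s"]
            + program["cycles"] * per_cycle
            + program["final_extension_s"])


if __name__ == "__main__":
    print(primer_orientation(250))           # short monomer
    print(program_hold_time_s(PCR_PROGRAM))  # seconds of programmed holds
```

With the values given in the text, the programmed holds add up to 420 + 34 × 165 + 420 = 6450 s, or 107.5 min.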

Probe labeling and fluorescence in situ hybridization (FISH)

Except for ClaSat03-183, ClaSat09-285, ClaSat12-24, and AsiSat03-96, all the other satDNAs were successfully amplified and the PCR products were labeled with Atto550-dUTP (red) or Atto488-dUTP (green) according to the manufacturer’s recommendations using the Nick-Translation mix kit (Jena Bioscience, Jena, Germany). The probes were then hybridized in all other Alligatoridae species according to the methodology reported by [109]. To corroborate the FISH results, at least 30 metaphase spreads were examined in each individual. Photos were obtained with CoolSNAP on an Olympus BX50 microscope (Olympus Corporation, Ishikawa, Japan), and the images were processed using Image-Pro Plus 4.1 software (Media Cybernetics, Silver Spring, MD, USA).