Background

The Komodo dragon (Varanus komodoensis) is the world’s largest extant lizard, weighing up to 75–100 kg and measuring up to three meters in length. This species of monitor lizard, indigenous to Komodo and nearby islands in southern Indonesia (Fig. 1), is a relic of the very large varanids that once populated Indonesia and Australia, most of which died out, along with other megafauna, after the Pleistocene [1]. Komodo dragons are endangered, and because of their vulnerable status they are actively conserved in zoos around the world and in Komodo National Park, a UNESCO World Heritage site [2]. They are believed to have evolved from other varanids in Australia, first appearing approximately 4 million years ago [1].

Fig. 1

Komodo dragon (Varanus komodoensis). Tujah, a large male Komodo dragon residing at the St. Augustine Alligator Farm Zoological Park, and the source of the DNA used in the present study. Photograph courtesy of the St. Augustine Alligator Farm Zoological Park in St. Augustine, Florida

On their native Indonesian islands, Komodo dragons are the dominant terrestrial predators, even though their diet is based mainly on carrion [3]. The saliva of wild dragons (as opposed to zoo-kept animals) has been found to contain as many as 58 species of bacteria, many of them pathogenic [3,4,5]; these bacteria may also contribute to the dragons' effectiveness as predators. The lizards themselves appear to be unaffected by these bacteria, despite biting each other in fights and having bleeding gums during feedings. Furthermore, their plasma has been shown to have potent antimicrobial properties [6]. We therefore hypothesized that Komodo dragons have robust innate immunity, and that this innate immunity may be partially mediated by antimicrobial peptides.

There are few studies of the reptilian immune response; however, as in mammals, reptiles have both innate and adaptive immune responses, with cell-mediated and humoral components. Reptiles depend primarily on an efficient innate immune response, because their adaptive immune response does not consistently show evidence of immunological memory [7].

Innate immunity, which includes chemokines and cytokines, provides the first line of defense against infection in higher vertebrates and is partially mediated by antimicrobial host-defense peptides [8, 9]. These peptides play complex roles in host defense against infection, exhibiting a range of pathogen-directed antimicrobial effects as well as host-directed immunomodulatory, chemotactic, inflammomodulatory and wound-healing properties [8, 9]. The role and prevalence of antimicrobial peptides in the innate immune response of reptiles are only now beginning to be understood [10,11,12,13,14,15]. Several groups have shown that the plasma and cell extracts of crocodiles, alligators and Komodo dragons have antimicrobial properties [6, 10, 16,17,18,19,20]. Recently, our group has made significant technical advances in developing a method for the identification and characterization of native antimicrobial peptides (the BioProspector process), which we employed to discover novel, non-canonical, active antimicrobial peptides in alligator plasma [21,22,23] and Komodo dragon plasma [24, 25].

The major classes of antimicrobial host-defense peptides in vertebrates include defensins and cathelicidins [8, 9]. These peptides are produced as part of the host-defense innate immune response by cells throughout the body, including epithelium, endothelium and white blood cells. Like most cationic antimicrobial host-defense peptides, defensins and cathelicidins tend to be relatively small peptides (< 100 amino acids in length) that simultaneously exhibit cationic and amphipathic qualities. They are generally membrane-active peptides that can disrupt bacterial membrane integrity as part of their antimicrobial mechanism. The cationic and amphipathic properties of these peptides contribute to their ability to preferentially target and disrupt bacterial membranes, which tend to be rich in anionic lipids, rather than host cell membranes, whose outer surfaces tend to be predominantly neutral in nature.
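
The amphipathic character described above is often quantified with the Eisenberg hydrophobic moment, which treats each residue's hydrophobicity as a vector pointing around an idealized α-helix (100° per residue) and measures the length of the vector sum. The sketch below is illustrative only and is not part of the analysis reported here; the function name is hypothetical, and the hydrophobicity values are assumed to be the Eisenberg consensus scale and should be checked against the published table before use.

```python
import math

# Assumed Eisenberg consensus hydrophobicity values (verify against the
# published scale before relying on the numbers).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def hydrophobic_moment(seq: str, angle_deg: float = 100.0) -> float:
    """Eisenberg hydrophobic moment of `seq` (hypothetical helper).

    Residue i contributes a vector of length H_i at angle i * angle_deg;
    100 degrees per residue corresponds to an ideal alpha-helix.
    """
    delta = math.radians(angle_deg)
    sin_sum = 0.0
    cos_sum = 0.0
    for i, aa in enumerate(seq.upper()):
        h = EISENBERG[aa]
        sin_sum += h * math.sin(i * delta)
        cos_sum += h * math.cos(i * delta)
    return math.hypot(sin_sum, cos_sum)


# Leu on the helix face where cos(i * 100 deg) > 0, Lys on the opposite face.
amphipathic = "".join(
    "L" if math.cos(math.radians(100 * i)) > 0 else "K" for i in range(18)
)
print(round(hydrophobic_moment(amphipathic), 2))  # large moment
print(round(hydrophobic_moment("L" * 18), 2))     # ~0: five full turns cancel
```

A sequence whose hydrophobic residues all fall on one face of the helix gives a large moment, whereas a uniform 18-residue sequence (exactly five helical turns at 100° per residue) gives a moment of essentially zero. Many tools report the moment divided by sequence length; that normalization is left out here.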

The family of vertebrate defensin peptides includes the alpha-, beta-, theta- and ovodefensin subclasses; alpha- and theta-defensins are unique to mammals, and ovodefensins to birds and reptiles [26, 27]. Peptides in each subclass adopt compact three-dimensional conformations stabilized by characteristic conserved patterns of cysteine residues and the associated disulfide bond networks. These disulfide bond networks are critical to each subclass's ability to adopt well-defined structures, which are essential to their antimicrobial and host-directed properties.

Cathelicidins are another major class of host-defense antimicrobial peptides and are unique to vertebrates [28]. The functional cathelicidin peptides exhibit diverse sequences and structures; however, they are distinguished by the presence of a conserved N-terminal pre-pro-cathelin domain in the cathelicidin precursor proteins [29]. Cathelicidins are often packaged in the azurophilic granules of neutrophils and have been identified in chicken heterophils (avian white blood cells) [30]. The detailed characteristics of each peptide subclass are described in the relevant sections below.

Advances in genomic techniques and the availability of sequenced genomes have rapidly expanded our understanding of innate immunity genes across vertebrate classes. The anole lizard has been found to have genes for most of the major classes of antimicrobial peptides produced by mammals and other vertebrates, including β-defensins and cathelicidins [13]. As in birds, genes for α-defensin peptides have not been reported to date in reptiles; this class of antimicrobial peptides appears to be restricted to mammals [13]. However, the status of antimicrobial peptide genes in the Komodo dragon has not been determined, owing to the lack of a published Komodo dragon genome. The dragons' tolerance of regular exposure to potentially pathogenic bacteria in their saliva, together with their apparent resistance to bacterial infection, suggests that the Komodo dragon's evolutionary adaptations may extend to its innate immunity and the host-defense peptides it employs.

To extend our earlier study of Komodo dragon cationic antimicrobial peptides [24], we obtained genomic DNA and RNA from Komodo dragon blood samples and sequenced them, providing a Komodo dragon-specific DNA sequence database to facilitate de novo peptide sequencing [24].

Here, we report the sequencing, assembly, and analysis of the Komodo dragon genome. This work also provides evidence of the robust innate immunity of these lizards and will be a valuable resource for researchers studying the evolution and biology of the endangered Komodo dragon. The analysis reported here focuses on genes associated with innate immunity and host-defense peptides; however, further investigation of the Komodo dragon genome may have a broader impact on our understanding of the biology and evolution of reptiles.

Results and discussion

Cell types in Komodo dragon blood

A blood sample was obtained from a Komodo dragon named Tujah at the St. Augustine Alligator Farm Zoological Park in accordance with required safety and regulatory procedures, and with appropriate approvals. At the time of collection, we intended to obtain both genomic DNA for sequencing and mRNA to generate a cDNA library to facilitate our proteomic studies. In birds, heterophils (white blood cells) are known to express multiple antimicrobial peptides [30]. Antimicrobial peptides identified from chicken heterophils exhibit significant antimicrobial [31, 32] and host-directed immunomodulatory activities [29]. Accordingly, after obtaining an initial sample of fresh Komodo dragon blood, we allowed the white blood cells to settle out of the blood and collected them, because they were likely to be involved in antimicrobial peptide expression. The collected white blood cells were then divided evenly: half were processed to isolate genomic DNA for sequencing and library generation, and the other half were reserved for mRNA extraction for our proteomic studies.

We then prepared blood smears and identified the cell types observed. Immune cell identification in Komodo dragon blood is challenging because little published literature is available for reference. The cell types observed in Wright-stained blood smears are shown in Fig. 2. We identified these cells based on their similarity to the immune cells we had previously identified in American alligator blood [12]. Of particular interest were the large, elongated, nucleated red blood cells of this reptile. We were also able to identify heterophils (similar to granulocytes), a probable source of cathelicidin peptides, as well as monocytes and lymphocytes.

Fig. 2

Komodo dragon red blood cells and immune cells. Blood cells from a Komodo dragon were visualized by Wright stain and imaged at 40×. Cell types are identified as: A. nucleated red blood cell, B. monocyte, C. lymphocyte, and D. heterophil

A second sample of Komodo dragon blood was later collected and processed for genomic DNA extraction by Dovetail Genomics for additional sequencing. The researchers at Dovetail Genomics did not separate white blood cells, and instead extracted DNA from cells pelleted directly from whole blood.

Assembly and annotation of the Komodo dragon genome

Previous analyses of Komodo dragon erythrocytes using flow cytometry estimated the genome size at approximately 1.93 Gb [33]. Using deep Illumina sequencing and Dovetail approaches, we obtained a 1.60 Gb draft genome assembly, similar in size to the 1.78 Gb genome of the lizard A. carolinensis [34]. The draft assembly contains 67,605 scaffolds, with an N50 of 23.2 Mb (Table 1). A total of 17,213 genes were predicted, of which 16,757 (97.35%) were annotated. Completeness estimates with CEGMA [35] were 56% (‘complete’) and 94% (‘partial’). Repeats are estimated to make up 35.05% of the genome, of which LINEs account for the largest share (38.4%) and SINEs for 5.56% (Additional file 1: Fig. S1 & Additional file 2: Table S1). Genomic data will be available at NCBI with raw sequencing reads deposited in the Sequence Read Archive (#SRP161190), and the genome assembly at DDBJ/ENA/GenBank under the accession #VEXN00000000. The assembly version described in this paper is VEXN01000000.
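
The N50 reported in Table 1 is a standard contiguity statistic. A minimal sketch of how it is computed is shown below; the function name is our own, and the lengths in the example are made-up toy values rather than the Komodo dragon scaffolds.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("empty input")


# Toy scaffold lengths (not the Komodo dragon data).
print(n50([10, 8, 5, 3, 2, 1, 1]))  # 8
```

In the toy example the total is 30, so half is 15; the two longest sequences (10 + 8 = 18) already pass that threshold, so the N50 is 8.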

Table 1 Genome assembly attributes

Identification of potential innate immunity and antimicrobial peptide genes

Innate immunity is critical to the evolutionary success of reptiles, but it remains poorly understood in these animals. Innate immunity is defined as those aspects of immunity that do not involve antibodies or T cells. Innate immune responses to invading pathogens can include the expression of cytokines; the activation and recruitment of macrophages, leukocytes and other white blood cells; and the expression of antimicrobial peptides such as defensins and cathelicidins [13, 15].

In this work, we have taken a genomics-based approach [36] to identifying innate immunity genes in the Komodo dragon genome. We sequenced the Komodo dragon genome and examined it for genes and gene clusters encoding important innate immunity antimicrobial peptides (β-defensins, ovodefensins and cathelicidins), which are likely to contribute to innate immunity in this giant lizard.

β-Defensin and related genes in Komodo genome

Defensins are one example of disulfide-stabilized antimicrobial peptides; β-defensins are a uniquely vertebrate family of disulfide-stabilized, cationic antimicrobial peptides involved in resistance to microbial colonization at epithelial surfaces [37,38,39]. β-Defensin peptides are defined by a characteristic six-cysteine motif with conserved cysteine residue spacing (C–X6–C–X(3–5)–C–X(8–10)–C–X6–CC) [40] and an associated disulfide bonding pattern (Cys1–Cys5, Cys2–Cys4 and Cys3–Cys6); however, variations in the number of and spacing between cysteine residues have been observed. As with other cationic antimicrobial peptides, β-defensins typically exhibit a net positive (cationic, basic) charge.
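
The canonical spacing can be written directly as a regular expression. The sketch below is a hypothetical screening helper, not the gene-finding pipeline used in this study: X is restricted to non-cysteine residues so that each C in the pattern corresponds to one of the six motif cysteines, and the spacing before the terminal CC doublet is exposed as a parameter because many Komodo dragon genes deviate from the canonical six residues at that position.

```python
import re


def beta_defensin_regex(last_gap=(6, 6)):
    """Compile C-X6-C-X(3-5)-C-X(8-10)-C-X(last_gap)-CC (hypothetical helper).

    X is restricted to non-cysteine residues; last_gap is the (min, max)
    number of residues between the fourth cysteine and the terminal CC.
    """
    lo, hi = last_gap
    return re.compile(rf"C[^C]{{6}}C[^C]{{3,5}}C[^C]{{8,10}}C[^C]{{{lo},{hi}}}CC")


def find_beta_defensin_motifs(protein, last_gap=(6, 6)):
    """Return 0-based (start, end) spans of non-overlapping motif matches."""
    pattern = beta_defensin_regex(last_gap)
    return [(m.start(), m.end()) for m in pattern.finditer(protein.upper())]


# Synthetic domain with canonical spacing 6-4-9-6, embedded in flanking residues.
domain = "C" + "A" * 6 + "C" + "A" * 4 + "C" + "A" * 9 + "C" + "A" * 6 + "CC"
print(find_beta_defensin_motifs("MK" + domain + "R"))  # [(2, 33)]
```

For example, `find_beta_defensin_motifs(seq, last_gap=(5, 7))` would also accept domains with five or seven residues before the terminal CC, at the cost of more permissive matching.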

One of the first extensive reports of an in vivo role for β-defensin expression in reptiles describes the inducible expression of β-defensins in wounded anole lizards (Anolis carolinensis) [10, 11, 14, 41,42,43]. Reptile neutrophils appear to have granules that contain both cathelicidin-like peptides and β-defensin peptides. β-Defensin-like peptides are also found in reptile eggs [26]. It is well known that some lizard species can shed their tails to escape predators, and that these tails then regenerate from the wound site without inflammation or infection. β-Defensin peptides are expressed both within the azurophilic granulocytes in the wound bed and in the associated epithelium [41, 43], and are observed in phagosomes containing degraded bacteria. The wound shows a distinct lack of inflammation, which is associated with regeneration, and two β-defensins in particular are expressed at high levels in the healing tissues [10, 42]. Overall, β-defensins appear to play a significant role in wound healing and regeneration in the anole lizard [44].

β-Defensin genes generally reside in clusters within vertebrate genomes [45, 46]. In humans, as many as 33 β-defensin genes have been identified in five clusters [47, 48]. Recent analyses of the genomes of several avian species, including duck, zebra finch and chicken, revealed that each contains a β-defensin cluster [49,50,51,52]. A β-defensin-like gene cluster has also recently been identified in the anole lizard (Prickett, M.D., unpublished work in progress), a species closely related to the Komodo dragon [13]. Interestingly, the cathepsin B gene (CTSB) has been identified as a strong marker for β-defensin clusters in humans, mice, and chickens [51]. We therefore searched the Komodo genome for CTSB as a potential marker to aid in identifying its β-defensin cluster(s).

Through these analyses, we identified a total of 66 potential β-defensin genes in the Komodo dragon genome, of which 18 appear to be Komodo dragon-specific (Table 2). These genes vary in cysteine spacing, gene size, the number of cysteine residues that make up the β-defensin domain, and the number of β-defensin domains. With respect to the conserved cysteine spacing (C–X6–C–X(3–5)–C–X(8–10)–C–X6–CC), especially toward the end of the motif, we found considerable variability: five Komodo dragon β-defensin genes have seven residues between the last cysteines, 16 have six, 42 have five, and three exhibit more complex cysteine spacing patterns (Table 2).
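
The tally in the preceding paragraph amounts to counting, for each six-cysteine domain, the residues between the fourth cysteine and the terminal CC doublet. A minimal sketch follows, assuming that "between the last cysteines" refers to this C4–C5 gap; the helper names are hypothetical, and the catch-all "complex" bucket is a simplification that would also capture eight- and ten-cysteine domains, so it is not expected to reproduce the Table 2 groupings exactly.

```python
from collections import Counter


def last_gap(domain):
    """Residues between the 4th cysteine and the terminal CC doublet.

    Assumes a six-cysteine domain whose last two cysteines are adjacent;
    anything else raises ValueError.
    """
    pos = [i for i, aa in enumerate(domain.upper()) if aa == "C"]
    if len(pos) != 6 or pos[5] - pos[4] != 1:
        raise ValueError("not a six-cysteine domain ending in CC")
    return pos[4] - pos[3] - 1


def tally_last_gaps(domains):
    """Count domains by C4-C5 gap; non-conforming domains go to 'complex'."""
    counts = Counter()
    for d in domains:
        try:
            counts[last_gap(d)] += 1
        except ValueError:
            counts["complex"] += 1
    return counts


def make_domain(gap):
    """Synthetic domain with spacing 6-4-9-gap (illustration only)."""
    return "C" + "A" * 6 + "C" + "A" * 4 + "C" + "A" * 9 + "C" + "A" * gap + "CC"


print(tally_last_gaps([make_domain(5), make_domain(5), make_domain(6),
                       make_domain(7), "CACAC"]))
```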

Table 2 Identified Komodo dragon Defensin genes grouped based on scaffold locations of gene clusters

As in birds and other reptiles, the majority of Komodo dragon defensin genes appear to reside in two separate clusters within the same syntenic block (Fig. 3). One is a β-ovodefensin cluster flanked on one end by the gene for XK, Kell blood group complex subunit-related family, member 6 (XKR6) and on the other by the gene for myotubularin related protein 9 (MTMR9). The intercluster region of approximately 400 kb includes the genes for family with sequence similarity 167, member A (FAM167A); BLK proto-oncogene, Src family tyrosine kinase (BLK); farnesyl-diphosphate farnesyltransferase 1 (FDFT1); and cathepsin B (CTSB), which flanks the β-defensin cluster (Fig. 3). In birds, turtles, and crocodilians, the other end of the β-defensin cluster is followed by the gene for translocation associated membrane protein 2 (TRAM2). As with all of the other squamate (lizard and snake) genomes surveyed, the gene flanking the far end of the β-defensin cluster cannot be definitively determined at present, because no squamate genome with an intact cluster is available.
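
Defining a cluster by its flanking genes reduces to taking the genes that lie between two markers on the same scaffold. The sketch below assumes a simple hypothetical input (gene name and start coordinate for the genes on one scaffold); the marker names in the example follow the order described above, but the coordinates and the OVOD_a, OVOD_b and BD_x labels are invented placeholders.

```python
from typing import List, Tuple

Gene = Tuple[str, int]  # (gene name, start coordinate) on one scaffold


def genes_between(genes: List[Gene], flank_a: str, flank_b: str) -> List[str]:
    """Names of genes lying strictly between two flanking marker genes.

    Genes are ordered by start coordinate; the flanks may be given in
    either order. Raises ValueError if a flank is absent.
    """
    names = [name for name, _ in sorted(genes, key=lambda g: g[1])]
    i, j = sorted((names.index(flank_a), names.index(flank_b)))
    return names[i + 1:j]


# Placeholder scaffold: marker order as described in the text,
# invented coordinates and invented defensin labels.
scaffold = [
    ("XKR6", 100), ("OVOD_a", 200), ("OVOD_b", 300), ("MTMR9", 400),
    ("FAM167A", 500), ("BLK", 600), ("FDFT1", 700), ("CTSB", 800),
    ("BD_x", 900),
]
print(genes_between(scaffold, "XKR6", "MTMR9"))  # ['OVOD_a', 'OVOD_b']
```

The same call with "MTMR9" and "CTSB" returns the intercluster genes, while a flank that is missing from the scaffold (as for the far end of the squamate β-defensin cluster) raises an error rather than guessing a boundary.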

Fig. 3

β-defensin gene family clusters. Scaffold locations of the identified Komodo dragon defensin and ovodefensin genes, highlighting the defensin and ovodefensin clusters in the Komodo dragon genome

The end of the cluster could be flanked by XPO1, by TRAM2, or by neither. Two of the three genes found with TRAM2 on scaffold 45 (VkBD80a and VkBD80b) are nearly identical and may be the result of an assembly artifact. These genes are orthologs of the final gene in the avian, turtle, and crocodilian β-defensin clusters. The anole ortholog of this gene is isolated and is not associated with TRAM2, XPO1, or any other β-defensin, and no β-defensins are found near anole TRAM2. Two of the seven genes associated with XPO1 are orthologous to one of the five anole genes associated with XPO1, but in neither species can it be determined whether these genes belong to the main β-defensin cluster or to an additional cluster. The snake orthologs are associated with TRAM2 but are not part of the cluster.

Structural diversity

Diversity can also be seen in the structure of the β-defensin domain. Typically, a β-defensin gene consists of two or three exons: one encoding the signal peptide, one encoding the propiece and the six-cysteine β-defensin domain, and, in some cases, a short third exon. Variations in the number of β-defensin domains, exon size, exon number, cysteine spacing, and/or the number of cysteines in the β-defensin domain can be found in all reptilian species surveyed (unpublished). Three β-defensins have two defensin domains (VkBD7, VkBD34, and VkBD43), and one has three (VkBD39). The Komodo dragon β-defensin genes VkBD12, VkBD13, and VkBD14 and their anole orthologs have atypically large exons. The group of β-defensins between VkBD16 and VkBD21 also has atypically large exons. Atypical spacing between cysteine residues is found in three β-defensins: VkBD20 (1–3–9–7), VkBD57 (3–4–8–5), and VkBD79 (3–10–16–6). Four β-defensins have additional cysteine residues in the β-defensin domain: VkBD6, with 10 cysteines, and a group of three (VkBD16, VkBD17, and VkBD18) with eight.
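
The four-number notation used above can be generated mechanically from a domain sequence. The sketch below assumes, as our reading of the notation, that the numbers are the residue counts between consecutive cysteines from C1 to C5 (so a canonical domain reads 6–4–9–6); the helper names are hypothetical, and only the first five cysteines are inspected, so domains with extra cysteines need separate handling.

```python
CANONICAL_RANGES = [(6, 6), (3, 5), (8, 10), (6, 6)]  # C1-C2, C2-C3, C3-C4, C4-C5


def cysteine_gaps(domain):
    """Residue counts between consecutive cysteines C1..C5 (hypothetical helper)."""
    pos = [i for i, aa in enumerate(domain.upper()) if aa == "C"]
    if len(pos) < 5:
        raise ValueError("need at least five cysteines")
    return [pos[k + 1] - pos[k] - 1 for k in range(4)]


def format_pattern(domain):
    """Render the gaps in the 'a–b–c–d' style used in the text."""
    return "–".join(str(g) for g in cysteine_gaps(domain))


def is_atypical(domain, ranges=CANONICAL_RANGES):
    """True if any gap falls outside its allowed (min, max) range."""
    return any(not lo <= g <= hi for g, (lo, hi) in zip(cysteine_gaps(domain), ranges))


# Synthetic domain reproducing a 1–3–9–7 spacing.
odd = "C" + "A" + "C" + "A" * 3 + "C" + "A" * 9 + "C" + "A" * 7 + "CC"
print(format_pattern(odd), is_atypical(odd))  # 1–3–9–7 True
```

With the strict canonical ranges, a C4–C5 gap of five residues is flagged as atypical, even though that is the most common spacing among the Komodo dragon genes; passing a relaxed `ranges` argument avoids flagging those.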

The two β-defensin domains of VkBD7 are homologous to the single β-defensin domain of VkBD8, with orthologs in other squamate species. In the anole lizard A. carolinensis, there are two orthologs: LzBD6, with one β-defensin domain, and the non-cluster LzBD82, with two β-defensin domains. The snake orthologs (SnBD5 and SnBD6) each have one β-defensin domain. VkBD34 is an ortholog of LzBD39 in anoles and of SnBD15 in snakes. VkBD39 and VkBD43 consist of three and two homologous β-defensin domains, respectively, which are homologous to the third exons of LzBD52, LzBD53, and LzBD55, each of which has two non-homologous β-defensin domains. VkBD40, with one β-defensin domain, is homologous to the second exons of LzBD52, LzBD53, LzBD54 (which has one defensin domain), and LzBD55.

An increase in the number of cysteines in the β-defensin domain creates the possibility of forming additional disulfide bridges. An example of this variation is found in the psittacine β-defensin AvBD12 (Psittaciformes) [52]. The β-defensin domain of VkBD6 appears to contain 10 cysteines, four of which are part of an extension following a typical β-defensin domain with an additional paired cysteine (C-X6-C-X4-C-X9-C-X6-CC-X7-C-X7-CC-X5-C). The three Komodo dragon β-defensins VkBD16, VkBD17, and VkBD18, in addition to having atypical cysteine spacing, also have eight cysteines within a typical number of residues. The β-defensin following this group, VkBD19, is a paralog of these three genes; however, its β-defensin domain contains the more typical six cysteine residues.
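
Extended patterns such as the one given for VkBD6 can be checked by parsing the spacing notation. This hypothetical parser handles only fixed gaps written as Xn (not ranges such as X(3–5)) and returns the cysteine count and the domain length. Parsing the VkBD6 pattern gives 10 cysteines over 54 residues, consistent with the text, and therefore at most five disulfide bridges if every cysteine were paired.

```python
import re


def parse_cys_pattern(pattern):
    """Return (cysteine count, domain length) for notation like 'C-X6-C-X4-CC'.

    Hypothetical helper: tokens are separated by '-' (en dashes are accepted),
    a run of C's adds that many cysteines, and 'Xn' adds n other residues.
    Range tokens such as 'X(3-5)' are not supported and raise ValueError.
    """
    n_cys = length = 0
    for token in pattern.replace("–", "-").split("-"):
        if re.fullmatch(r"C+", token):
            n_cys += len(token)
            length += len(token)
        elif (m := re.fullmatch(r"X(\d+)", token)):
            length += int(m.group(1))
        else:
            raise ValueError(f"unsupported token: {token!r}")
    return n_cys, length


vkbd6 = "C-X6-C-X4-C-X9-C-X6-CC-X7-C-X7-CC-X5-C"  # pattern quoted in the text
n_cys, length = parse_cys_pattern(vkbd6)
print(n_cys, length, n_cys // 2)  # 10 54 5
```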

The structures of these Komodo dragon β-defensin genes remain to be confirmed with supporting evidence. Anole lizards also show a number of atypical structural elements, including additional exons outside the β-defensin domain and larger exons.

Analyses of the peptide sequences encoded by the newly identified Komodo dragon β-defensin genes revealed that the majority (53 of 66) are predicted to have a net positive charge under physiological conditions, as is typical for this class of antimicrobial peptide (Table 3). Notably, however, four peptides (VkBD10, VkBD28, VkBD30 and VkBD34) are predicted to be weakly cationic or neutral (+0.5 to 0) at pH 7, while nine peptides (VkBD3, VkBD4, VkBD11, VkBD19, VkBD23, VkBD26, VkBD35, VkBD36 and VkBD37) are predicted to be weakly to strongly anionic. Because β-defensins are typically cationic and their positive charge contributes to their antimicrobial activity, these findings suggest that, although these peptides exhibit canonical β-defensin structural features and reside in β-defensin gene clusters, one or more of these genes may not encode β-defensin-like peptides or canonical β-defensins.
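
Net charge at pH 7, as reported in Table 3, is commonly estimated with the Henderson–Hasselbalch equation summed over ionizable groups. The sketch below is not necessarily the calculation behind Table 3. The pKa values are an assumed set close to the EMBOSS defaults; tools differ slightly, which can change borderline calls such as the weakly cationic peptides. By default, cysteines are treated as disulfide-bonded, and therefore uncharged, as expected in a folded defensin. The function name is hypothetical.

```python
# Assumed pKa values (close to EMBOSS defaults); other tools use other sets.
PKA_BASIC = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq, ph=7.0, free_cys=False):
    """Henderson-Hasselbalch estimate of peptide net charge (hypothetical helper).

    Cysteines count as ionizable only if free_cys=True; by default they are
    assumed to be tied up in disulfide bonds, as in a folded defensin.
    """
    seq = seq.upper()
    basic = ["Nterm"] + [aa for aa in seq if aa in PKA_BASIC]
    acidic = ["Cterm"] + [aa for aa in seq
                          if aa in "DEY" or (free_cys and aa == "C")]
    # Fraction protonated (positive) for basic groups, deprotonated for acidic.
    positive = sum(1 / (1 + 10 ** (ph - PKA_BASIC[g])) for g in basic)
    negative = sum(1 / (1 + 10 ** (PKA_ACIDIC[g] - ph)) for g in acidic)
    return positive - negative


print(round(net_charge("KKKK"), 2), round(net_charge("DDDD"), 2))
```

A peptide dominated by lysine and arginine comes out strongly positive and one dominated by aspartate and glutamate strongly negative; where to draw the line for "weakly cationic or neutral" is a separate choice not specified here.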

Table 3 Physical properties of identified β-defensin peptides

Identification of Komodo dragon ovodefensin genes

Ovodefensin genes have been found in multiple avian and reptile species [26] and are expressed in egg white and other tissues. Ovodefensins, including the chicken peptide gallin (Gallus gallus OvoDA1), have been shown to have antimicrobial activity against Gram-negative E. coli and Gram-positive S. aureus. Presumptive β-ovodefensins are found in a cluster in the same syntenic block as the β-defensin cluster in birds and reptiles. Nineteen β-ovodefensins have been found in A. carolinensis (one with an eight-cysteine β-defensin domain) and five in snakes (four with an eight-cysteine β-defensin domain) (Prickett, M.D., unpublished work in progress). The Komodo dragon cluster consists of six β-ovodefensins (Tables 4 and 5). Two of these may be Komodo dragon-specific. VkOVOD1, which is a pseudogene, is an ortholog of SnOVOD1 as well as of the first β-ovodefensin in turtles and crocodilians. The defensin domains of VkOVOD3, VkOVOD4, and VkOVOD6 contain eight cysteines, and these genes are orthologs of SnOVOD2, SnOVOD3, and SnOVOD5, respectively. VkOVOD4 and VkOVOD6 are also orthologs of LzOVOD14.

Table 4 Ovodefensin peptides predicted in the Komodo dragon genome
Table 5 Physical properties of identified ovodefensin peptides

Identification of the Komodo dragon cathelicidin genes

Cathelicidin peptide genes have recently been identified in reptiles through genomics approaches [13]. Several cathelicidin peptide genes have been identified in birds [52, 54,55,56,57,58], snakes [59, 60] and the anole lizard [11, 14, 61]. The release of functional cathelicidin antimicrobial peptides has been observed from chicken heterophils, suggesting that reptilian heterophils may also be a source of these peptides [30, 62]. Alibardi et al. have identified cathelicidin peptides expressed in anole lizard tissues, including in association with heterophils [11, 14, 61]. Cathelicidin antimicrobial peptides are thought to play key roles in innate immunity in other animals [29] and so likely play a similar role in the Komodo dragon.

In anole lizards, the cathelicidin gene cluster, consisting of four genes, is organized as follows: <FASTK> cathelicidin cluster <KLHL18>. We searched for a similar cathelicidin cluster in the Komodo dragon genome. Searching the Komodo dragon genome for cathelicidin-like genes revealed a cluster of three genes with a “cathelin-like domain” (the first requirement of a cathelicidin gene), located at one end of scaffold 84. However, this region of scaffold 84 has assembly issues, including gaps, isolated exons, and duplications. The identified Komodo dragon cathelicidin genes have been named after their anole orthologs. Two of the Komodo dragon cathelicidins (Cathelicidin2 and Cathelicidin4.1) lie in sections with no assembly issues. By contrast, Cathelicidin4.2 was constructed using a diverse set of exons 1–3 and a misplaced exon 4 to create a complete gene, which is paralogous to Cathelicidin4.1. Because the cluster is found at one end of the scaffold, there may be additional cathelicidins that are not captured in this assembly.

A common feature of cathelicidin antimicrobial peptide gene sequences is that the N-terminal cathelin domain contains at least four cysteines. In our studies of alligator and snake cathelicidins, we also noted that the last cysteine is typically followed by a three-residue “VRR” or similar sequence that immediately precedes the predicted C-terminal cationic antimicrobial peptide [12, 13, 15, 60, 63]. Additional requirements of a cathelicidin antimicrobial peptide gene sequence are that the C-terminal region encodes a peptide with a net positive charge, that this peptide is typically encoded by the fourth exon, and that it is typically about 35 aa long (range 25–37) [13, 15]. Because the protease responsible for cleaving and releasing the functional antimicrobial peptide in vivo is not known, the exact cleavage site is difficult to predict. The predicted amino acid sequences of each identified Komodo dragon cathelicidin gene candidate are listed in Table 6. Applying these criteria to each sequence, we predicted whether each candidate cathelicidin gene is likely to encode an antimicrobial peptide.
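
The screening criteria listed above can be written as a short filter. This sketch is our illustration rather than the procedure used to build Tables 6 and 7. The "VRR-like" signal is assumed to be any V-x-R triplet, which covers both VRR and the VTR found in the Komodo dragon candidates. The charge is a crude count of K/R minus D/E. The fourth cysteine is taken as the end of the cathelin domain. The exon-4 criterion is not checked, because gene structure is not part of the input. Function names are hypothetical.

```python
import re

# Assumed "VRR-like" processing signal: V, any residue, R (covers VRR and VTR).
SIGNAL = re.compile(r"V.R")


def crude_charge(peptide):
    """+1 per K/R, -1 per D/E; histidine treated as neutral."""
    return (sum(peptide.count(a) for a in "KR")
            - sum(peptide.count(a) for a in "DE"))


def predict_cathelicidin_peptide(precursor, window=10, min_len=25, max_len=37):
    """Return the candidate C-terminal peptide, or None if a criterion fails.

    Hypothetical helper encoding the criteria in the text: >= 4 cysteines,
    a VRR-like signal within `window` residues after the 4th cysteine,
    a 25-37 residue C-terminal peptide, and a net positive charge.
    """
    seq = precursor.upper()
    cys = [i for i, aa in enumerate(seq) if aa == "C"]
    if len(cys) < 4:
        return None
    start = cys[3] + 1                       # first residue after the 4th Cys
    match = SIGNAL.search(seq[start:start + window])
    if match is None:
        return None
    peptide = seq[start + match.end():]      # everything after the signal
    if not min_len <= len(peptide) <= max_len:
        return None
    return peptide if crude_charge(peptide) > 0 else None


# Synthetic precursor: invented Cys-containing prefix followed by the
# VK-CATH4.2 exon 4 sequence quoted in the text.
toy = "MCAAACAAACAAAC" + "LDRVTRRRWRRFFQKAKRFVKRHGVSIAVGAYRIIG"
print(predict_cathelicidin_peptide(toy))
```

Because the processing protease is unknown, the output is only a candidate cleavage product; shifting the assumed signal by one or two residues changes the predicted peptide.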

Table 6 Predicted cathelicidin antimicrobial peptide gene sequences

The predicted N-terminal protein sequence of Cathelicidin2_VARKO (VK-CATH2) contains four cysteines (underlined, Table 6). However, unlike the alligator and related cathelicidin sequences [12, 13, 15], there is no obvious “VRR” or similar sequence in the ~10 amino acids following the last cysteine residue. In addition, analysis of the 35 C-terminal amino acids reveals a predicted peptide lacking a net positive charge. For these reasons, we predict that the Cathelicidin2_VARKO gene does not encode an active cathelicidin antimicrobial peptide at its C-terminus (Table 7).

Table 7 Predicted active cathelicidin peptides and calculated properties (APD3 [64])

For the identified Cathelicidin4.1_VARKO gene, the predicted cathelin domain includes the requisite four cysteine residues (Table 6), and the sequence “VTR” is present within 10 amino acids of the last cysteine, similar to the “VRR” sequence in the alligator cathelicidin gene [12, 13, 15]. The 33-aa C-terminal peptide following the “VTR” sequence is predicted to have a net charge of +12 at physiological pH, and a large portion of it is predicted to be helical [65, 66], which is consistent with known cathelicidins, the majority of which contain segments with significant helical structure [67]. Finally, analysis of the sequence with the Antimicrobial Peptide Database indicates that it is potentially a cationic antimicrobial peptide [64]. Hence, we predict that this gene likely encodes an active cathelicidin antimicrobial peptide, which we call VK-CATH4.1 (Table 7).

In addition, this peptide shows some homology to other known antimicrobial peptides in the Antimicrobial Peptide Database [64], with a particularly high degree of sequence similarity to cathelicidin peptides identified from squamates; Table 8 shows the alignment of VK-CATH4.1 with representative known peptides from the database. Thus, the predicted VK-CATH4.1 peptide has many of the hallmark characteristics of a cathelicidin peptide and is a strong candidate for further study.

Table 8 Comparison to other cathelicidins

For the identified Cathelicidin4.2_VARKO gene, the predicted cathelin domain includes the requisite four cysteine residues (Table 6). As in the Cathelicidin4.1_VARKO gene, the sequence “VTR” is present within 10 amino acids of the fourth cysteine residue and immediately precedes the C-terminal segment, which encodes a 30-aa peptide that is predicted to be antimicrobial [64]. This C-terminal peptide is predicted to have a net charge of +10 at physiological pH, and it shows varying degrees of homology to other known antimicrobial peptides in the Antimicrobial Peptide Database [64]. Thus, like VK-CATH4.1, this candidate peptide exhibits many of the hallmark characteristics of cathelicidin peptides and is a second strong candidate for further study. Table 8 shows the homology and alignment of VK-CATH4.2 with known peptides from the Antimicrobial Peptide Database. Finally, the gene sequence encoding the functional peptide VK-CATH4.2 is found on exon 4, the typical location of the active cathelicidin peptide. This exon encodes the peptide sequence LDRVTRRRWRRFFQKAKRFVKRHGVSIAVGAYRIIG.
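
The length and charge stated for VK-CATH4.2 can be checked directly from the exon 4 sequence quoted above. The sketch takes the residues after the first "VTR" as the predicted peptide and uses a crude charge count (K and R as +1, D and E as −1, histidine ignored). Under these assumptions, it reproduces the 30-aa length and +10 net charge given in the text. The helper name is hypothetical.

```python
EXON4 = "LDRVTRRRWRRFFQKAKRFVKRHGVSIAVGAYRIIG"  # as quoted in the text


def residues_after(seq, signal="VTR"):
    """Residues following the first occurrence of `signal`."""
    idx = seq.find(signal)
    if idx < 0:
        raise ValueError(f"{signal!r} not found")
    return seq[idx + len(signal):]


peptide = residues_after(EXON4)
# Crude net charge: lysine/arginine +1, aspartate/glutamate -1, His ignored.
charge = sum(peptide.count(a) for a in "KR") - sum(peptide.count(a) for a in "DE")
print(peptide, len(peptide), charge)  # RRWRRFFQKAKRFVKRHGVSIAVGAYRIIG 30 10
```

The single aspartate in exon 4 lies before the VTR signal, so it does not contribute to the charge of the released peptide under this cleavage assumption.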

The predicted peptide VK-CATH4.2 is highly homologous to the predicted C-terminal peptides of other cathelicidin genes from A. carolinensis, G. japonicus, and P. bivittatus (Table 8). Residues 2–27 of VK-CATH4.2 are 65% identical and 80% similar to the anole Cathelicidin-2 like predicted C-terminal peptide (XP_008116755.1, aa 130–155). Residues 2–30 of VK-CATH4.2 are 66% identical and 82% similar to the gecko Cathelicidin-related predicted C-terminal peptide (XP_015277841.1, aa 129–151). Finally, residues 2–24 of VK-CATH4.2 are 57% identical and 73% similar to the Cathelicidin-related OH-CATH-Like predicted C-terminal peptide (XP_007445036.1, aa 129–151).
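
Percent identity and similarity depend on both the alignment and the definition of "similar". The sketch below computes both over an ungapped alignment of equal-length strings. It uses a simplified set of amino acid groups that we assume for illustration only. BLAST-style similarity ("positives") is instead defined by positive BLOSUM62 scores, so this helper will not reproduce the figures above exactly. The function names are hypothetical.

```python
# Simplified similarity groups, assumed for illustration only.
GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def similar(x, y):
    """Identical residues, or residues in the same assumed group."""
    gx, gy = GROUP_OF.get(x), GROUP_OF.get(y)
    return x == y or (gx is not None and gx == gy)


def identity_similarity(a, b):
    """Percent identity and similarity over an ungapped equal-length alignment."""
    if len(a) != len(b) or not a:
        raise ValueError("expected two aligned sequences of equal, non-zero length")
    a, b = a.upper(), b.upper()
    identical = sum(x == y for x, y in zip(a, b))
    alike = sum(similar(x, y) for x, y in zip(a, b))
    return 100 * identical / len(a), 100 * alike / len(a)


print(identity_similarity("KLDA", "RLEG"))  # (25.0, 100.0)
```

In the toy example, only the leucines are identical, but K/R, D/E and A/G each fall in a shared group, so every position counts as similar.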

Conclusions

Reptiles, including Komodo dragons, are evolutionarily ancient and are found in diverse, microbially challenging environments, and accordingly they appear to have evolved robust innate immune systems. These features suggest that reptiles may express interesting antimicrobial peptides. A few reptilian antimicrobial peptides, including defensins and cathelicidins, have previously been identified and shown to have broad-spectrum antimicrobial and antifungal activities. While defensins and cathelicidins are known in three of the four orders of reptiles (Testudines, Crocodilia, and Squamata), few peptides have been identified to date in lizards and none in varanids (including the Komodo dragon).

Genes encoding antimicrobial peptides involved in innate immunity have previously been found in birds and reptiles, some of them localized within clusters in the genome. Cathelicidin genes have been identified in birds and in reptiles, including crocodilians, lizards and snakes. Clusters of β-defensin genes were recently identified in birds by a member of our team [52]. While the origins of these gene clusters have not been well established, clustering may have biological significance, potentially helping to coordinate the expression of these genes. If so, these functionally related loci may have been selectively maintained during the evolution of reptile and avian innate immunity.

This paper presents a new genome, that of the Komodo dragon, the largest extant lizard and the largest vertebrate known to reproduce through parthenogenesis. Annotated genomes have been published for only a limited number of lizard species; the present Komodo dragon genome is the first varanid genome assembly to be reported and will therefore help to expand our understanding of lizard evolution in general. We present an annotated genome containing 17,213 predicted genes. While many aspects of the evolution and biology of the Komodo dragon merit study, we chose to focus on innate immunity, specifically antimicrobial peptides, as this was the source of our interest in the Komodo genome [24].

Antimicrobial peptides have been described in mammals, birds, amphibians and fish but remain poorly characterized in reptiles, despite the central position of this class in vertebrate evolution. We have sought to address this gap through our prior studies of antimicrobial peptides from birds [52], alligators [12, 21,22,23], snakes [12, 60, 63, 69,70,71,72], and now the Komodo dragon [24, 25].

In the present study, we report the identification of genes encoding Komodo dragon defensin and cathelicidin peptides. We identified 66 potential β-defensin genes, 18 of which appear to be unique to the Komodo dragon; the remaining 48 appear to have homologs in anole lizards and/or snakes. As in avian genomes, the Komodo dragon genome contains no α-defensin genes; this class of antimicrobial peptides appears to be restricted to mammals [13]. We also identified six potential β-ovodefensin genes. These β-defensin and β-ovodefensin genes are located in defensin gene clusters within the genome.

In addition to defensins, we identified three potential cathelicidin genes in the genome; however, further analysis showed that one of them does not actually encode a cathelicidin peptide. The remaining two genes, Cathelicidin4.1_VARKO and Cathelicidin4.2_VARKO, are predicted to encode functional cathelicidin peptides at the C-terminal end of the precursor. These peptides show significant similarity to other reptile cathelicidins. However, because the identified defensin and cathelicidin gene clusters appear to lie near scaffold edges, they may not represent the full complement of defensin and cathelicidin genes present in the Komodo dragon genome.

The defensin and cathelicidin genes and gene clusters identified here resemble those reported for the anole lizard and snakes, but they also show characteristics unique to the Komodo dragon. We anticipate that these findings will contribute to a deeper understanding of innate immunity and antimicrobial peptides in reptiles, and in vertebrates in general.

Methods & experimental procedures

Komodo dragon blood samples

Komodo dragon (Varanus komodoensis) blood was collected by staff at the St. Augustine Alligator Farm Zoological Park (St. Augustine, FL) in compliance with relevant guidelines, using protocols approved by the GMU IACUC (GMU IACUC# 0266). Blood was collected in plastic blood collection tubes containing K2EDTA as the anticoagulant. Samples were immediately placed on ice and shipped on ice overnight to GMU.

Library preparation and multiplexing

Genomic DNA was prepared from fresh Komodo dragon blood that had been enriched for leukocytes by a settling protocol (24 h, 37 °C, 5% CO2). DNA-seq libraries were constructed using the PrepX ILM DNA Library Reagent Kit (Catalog No. 400044, Lot No. F0199) on the Apollo 324 robot (WaferGen, CA). Briefly, 150 ng of genomic DNA was resuspended in 50 μl of nuclease-free water and fragmented to 200–250 bp on a Covaris M220 using a 300 bp program (peak incident power 50 W, duty factor 20%, 200 cycles per burst, treatment time 75 s). The fragment ends were repaired and an ‘A’ base was added to the 3′ ends, preparing the fragments for ligation to adapters that carry a single ‘T’ overhang at their 3′ end. The adapters enable PCR amplification and hybridization to the flow cell. After ligation, excess adapters were removed, and 300 ± 50 bp fragments (225 bp insert) were enriched for library amplification by PCR. The resulting library was validated on an Agilent 2100 Bioanalyzer and quantified using a Quant-iT dsDNA HS Kit (Invitrogen) and qPCR. Samples were multiplexed according to their qPCR quantification so that reads would be distributed similarly across the multiplexed samples.
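Multiplexing by qPCR quantification amounts to pooling libraries in roughly equal molar amounts. The sketch below assumes an equimolar target (the text states only the goal of a similar read distribution) and uses the common approximation of 660 g/mol per base pair of double-stranded DNA to convert a mass concentration into molarity; the function names and example values are hypothetical.

```python
from typing import Dict

# Approximate average mass of one base pair of double-stranded DNA (g/mol).
DSDNA_G_PER_MOL_PER_BP = 660.0


def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration to molarity (nM).

    nM = (ng/ul * 1e6) / (660 g/mol/bp * fragment length in bp)
    """
    return conc_ng_per_ul * 1e6 / (DSDNA_G_PER_MOL_PER_BP * mean_fragment_bp)


def pooling_volumes(conc_nM: Dict[str, float], fmol_each: float) -> Dict[str, float]:
    """Volume (ul) of each library needed so that every library contributes fmol_each.

    Uses 1 nM == 1 fmol/ul, so volume = amount / concentration.
    """
    volumes = {}
    for name, c in conc_nM.items():
        if c <= 0:
            raise ValueError(f"library {name!r} has non-positive concentration")
        volumes[name] = fmol_each / c
    return volumes


if __name__ == "__main__":
    # Hypothetical qPCR results for two libraries (nM).
    print(pooling_volumes({"lib_A": 10.0, "lib_B": 4.0}, fmol_each=20.0))
    # Hypothetical 2 ng/ul library with a 300 bp mean fragment size.
    print(round(ng_per_ul_to_nM(2.0, 300), 2))
```

With these hypothetical inputs, 2 µl of lib_A and 5 µl of lib_B each supply 20 fmol to the pool.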

Chicago library preparation

High molecular weight genomic DNA was extracted from blood cells collected from fresh Komodo dragon whole blood, and a Chicago library was prepared as described previously [73]. Briefly, ≥ 0.5 μg of high molecular weight genomic DNA (50 kbp mean fragment size) was extracted from whole Komodo dragon blood using a Qiagen blood and cell midi kit, reconstituted into chromatin in vitro, and fixed with formaldehyde. The fixed chromatin was digested with MboI, the 5′ overhangs were filled in with biotinylated nucleotides, and the free blunt ends were ligated. After ligation, crosslinks were reversed and the DNA was purified from protein. The purified DNA was treated to remove biotin that was not internal to ligated fragments. The DNA was then sheared to a mean fragment size of ~ 350 bp, and sequencing libraries were generated using NEBNext Ultra enzymes and Illumina-compatible adapters. Biotin-containing fragments were isolated using streptavidin beads before PCR enrichment of the library.
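Because MboI cuts at GATC and leaves a 5′ GATC overhang, filling in both ends and blunt-ligating them recreates the site twice in tandem, giving GATCGATC at each proximity-ligation junction. The sketch below, with hypothetical function names and toy reads, locates MboI sites and estimates the fraction of reads spanning such a junction; it illustrates a simple check one might run on Chicago reads and is not part of the HiRise pipeline.

```python
from typing import Iterable, List

MBOI_SITE = "GATC"  # palindromic, so scanning one strand finds sites on both strands
# Fill-in of two MboI-cut ends followed by blunt ligation duplicates the site.
LIGATION_JUNCTION = MBOI_SITE + MBOI_SITE  # "GATCGATC"


def find_sites(seq: str, motif: str = MBOI_SITE) -> List[int]:
    """Return 0-based start positions of every (possibly overlapping) motif match."""
    seq = seq.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def junction_fraction(reads: Iterable[str]) -> float:
    """Fraction of reads that contain the MboI fill-in ligation junction."""
    total = with_junction = 0
    for read in reads:
        total += 1
        if LIGATION_JUNCTION in read.upper():
            with_junction += 1
    return with_junction / total if total else 0.0


if __name__ == "__main__":
    toy_reads = ["AAGATCGATCTT", "ACGTACGT", "GATCGATC", "TTTT"]
    print(find_sites("AAGATCTTGATC"))
    print(junction_fraction(toy_reads))
```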

Cluster generation and HiSeq paired-end sequencing

Libraries were clustered onto a flow cell using Illumina’s TruSeq PE Cluster Kit v3-cBOT-HS (PE-401-3001) and sequenced on an Illumina HiSeq 2500. The Chicago library was sequenced in a 2 × 101 PE Rapid Run (153 M read pairs) using the TruSeq SBS Kit v3-HS (200 cycles) (FC-401-3001), while the Virginia Bioinformatics Institute Genomics Core provided a 2 × 151 PE Rapid Run (149 M read pairs) using a TruSeq Rapid SBS Kit-200 cycle (2500) (FC-402-4001) and two TruSeq Rapid SBS Kit-50 cycles (FC-402-4002).
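The run sizes above can be converted into nominal sequencing yield with simple arithmetic (read pairs × 2 reads × read length). The sketch below also divides by the 1599 Mbp size of the initial draft assembly described in the HiRise scaffolding subsection to give a rough average depth. These are upper-bound estimates that assume untrimmed, full-length, uniformly mapped reads; they are not figures reported by the authors, and the helper names are hypothetical.

```python
def raw_yield_gbp(read_pairs_millions: float, read_length_bp: int) -> float:
    """Nominal yield in Gbp for a paired-end run: pairs x 2 reads x read length."""
    return read_pairs_millions * 1e6 * 2 * read_length_bp / 1e9


def nominal_depth(yield_gbp: float, assembly_mbp: float) -> float:
    """Average depth if every base mapped uniformly onto an assembly of this size."""
    return yield_gbp * 1e3 / assembly_mbp


if __name__ == "__main__":
    runs = {"Chicago 2 x 101": (153, 101), "Shotgun 2 x 151": (149, 151)}
    for label, (pairs_m, length) in runs.items():
        gbp = raw_yield_gbp(pairs_m, length)
        print(f"{label}: {gbp:.1f} Gbp, ~{nominal_depth(gbp, 1599):.0f}x of 1599 Mbp")
```

This gives roughly 30.9 Gbp (about 19×) for the Chicago run and 45.0 Gbp (about 28×) for the 2 × 151 run, before any trimming or duplicate removal.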

Scaffolding the draft genome with HiRise

N50 is defined as the scaffold length L such that scaffolds of length L or longer together account for at least 50% of the total assembly length. The initial Komodo dragon draft assembly (FASTA format) was generated at Virginia Tech from Illumina 150 PE reads with Celera Assembler 8.2 using default parameters [74]; it totaled 1599 Mbp, with a scaffold N50 of 35.8 kbp. This assembly, additional Illumina shotgun reads (100 PE), and the Chicago library reads (FASTQ format) were used as input to HiRise, a software pipeline designed specifically to assemble genomes using Chicago library sequence data [73]. Shotgun and Chicago library reads were aligned to the draft input assembly using a modified SNAP read mapper (http://snap.cs.berkeley.edu). HiRise analyzed the separations of Chicago read pairs mapped within draft scaffolds to build a likelihood model, which was then used to identify putative misjoins and to score prospective joins. After scaffolding, shotgun reads were used to close gaps between contigs.
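The N50 definition can be made concrete in a few lines: sort scaffold lengths from longest to shortest and return the length at which the running total first reaches half of the assembly size. This is a generic sketch of the standard statistic, not code taken from Celera Assembler or HiRise.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Scaffolds of this length or longer now cover at least half the assembly.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # total 54; 10 + 9 + 8 = 27, so N50 = 8
```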

Genome annotation and completeness

Repeat families were identified using the de novo modeling package RepeatModeler (http://www.repeatmasker.org/RepeatModeler/), and the de novo identified repeat sequences were combined with manually selected vertebrate repeats from RepBase (https://www.girinst.org/repbase/) to form a customized repeat library. Assembly sequences were masked using RepeatMasker (v4.0.3, http://www.repeatmasker.org/) with the parameters “-s -a -nolow” and this customized repeat library. Protein-coding genes were then predicted using MAKER2 [75]. As protein homology evidence, MAKER2 used anole lizard (A. carolinensis, version AnoCar2.0) and python (P. bivittatus, version bivittatus-5.0.2) protein sequences downloaded from Ensembl (www.ensembl.org) and RefSeq (www.ncbi.nlm.nih.gov/refseq); as expression evidence, it used the previously assembled RNA-seq data [24]; and this evidence was integrated with predictions from Blastx, SNAP [76] and Augustus [77]. The SNAP HMM file was generated by training on anole lizard gene sequences, and the Augustus model file was generated by training on 3026 vertebrate core genes from the genome completeness assessment tool BUSCO [78]. The predicted genes were subsequently used as queries in a Blastx search of the NCBI non-redundant (NR) protein database (http://www.ncbi.nlm.nih.gov/). Blastx alignments with an e-value greater than 1e−10 were discarded, and the top hit was used to annotate each query gene. Assembly completeness was estimated using CEGMA by examining 248 core eukaryotic genes [35].
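The Blastx filtering rule (discard alignments with an e-value above 1e−10, then annotate each gene with its top hit) can be sketched as follows. The sketch assumes BLAST+ tabular output in the default `-outfmt 6` column order, whose last two columns are evalue and bitscore, and treats the highest bit score as the top hit; the authors' actual parsing and tie-breaking are not described, and the helper name is hypothetical.

```python
from typing import Dict, Iterable, Tuple

EVALUE_COL, BITSCORE_COL = 10, 11  # 0-based columns in default -outfmt 6 output


def top_hits(lines: Iterable[str], max_evalue: float = 1e-10) -> Dict[str, str]:
    """Map each query to its best-scoring subject, ignoring weak alignments."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[EVALUE_COL]), float(cols[BITSCORE_COL])
        if evalue > max_evalue:
            continue  # alignment not significant enough to annotate with
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


if __name__ == "__main__":
    hits = [
        "gene1\tXP_0001\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-40\t200",
        "gene1\tXP_0002\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150",
        "gene2\tXP_0003\t50.0\t50\t25\t0\t1\t50\t1\t50\t1e-5\t40",
    ]
    print(top_hits(hits))  # gene2 is dropped because 1e-5 > 1e-10
```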

Transcriptome

A transcriptome generated from RNA isolated from Komodo dragon blood cells has been described previously [24] and was used here to aid annotation of the assembly. Briefly, 280–300 bp libraries (160–180 bp insert) were generated, clustered onto a flow cell using Illumina’s TruSeq PE Cluster Kit v3-cBOT-HS, and sequenced on an Illumina HiSeq 2500 using the TruSeq SBS Kit v3-HS (300 cycles, 2 × 150 cycle paired-end).

Identification of defensin and cathelicidin genes within the genome

Lizard and snake defensin and cathelicidin genes had been identified in prior analyses of the published genomes of Anolis carolinensis [34], Ophiophagus hannah (king cobra) [79], Python bivittatus (Burmese python) [80], the pit viper Protobothrops mucrosquamatus (https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Protobothrops_mucrosquamatus/100/), and the European adder Vipera berus berus (https://www.ncbi.nlm.nih.gov/bioproject/170536) (https://www.hgsc.bcm.edu/reptiles/european-adder-genome-project) (Additional file 3: Table S2). These data were used in our analyses of the Komodo dragon genome. Genes from A. carolinensis (β-defensins, ovodefensins, cathelicidins, and genes flanking the defensin and cathelicidin clusters) were used as queries in TBLASTN searches against the Komodo dragon genome. Because β-defensins are highly diverse, homology searches alone cannot identify the entire β-defensin repertoire, so a combination of strategies was used. Genomic scaffolds containing hits were extracted, and genes identified by BLAST were manually curated using Artemis [19]. Scaffolds with hits to β-defensins were then examined manually for additional genes carrying the characteristic β-defensin motif and signal peptide that the initial BLAST search had missed. Gene structures were determined from previously annotated A. carolinensis orthologs when possible.
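Part of the manual search described above, scanning translated sequence for the six-cysteine β-defensin signature, can be approximated with a regular expression. The pattern below loosely brackets the commonly cited β-defensin consensus spacing C-X6-C-X4-C-X9-C-X6-CC, allowing some slack in each gap. The spacing ranges are an illustrative assumption, not the criteria the authors applied in Artemis, and reptile β-defensins may deviate from them; the toy sequence is not a real peptide.

```python
import re
from typing import List, Tuple

# Six cysteines separated by non-cysteine gaps; ranges loosely bracket C-X6-C-X4-C-X9-C-X6-CC.
BETA_DEFENSIN_MOTIF = re.compile(r"C[^C]{4,8}C[^C]{3,5}C[^C]{8,10}C[^C]{5,7}CC")


def find_beta_defensin_motifs(protein: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in BETA_DEFENSIN_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    # Toy sequence built to match the consensus spacing exactly.
    toy = "MKT" + "C" + "A" * 6 + "C" + "G" * 4 + "C" + "K" * 9 + "C" + "R" * 6 + "CC"
    print(find_beta_defensin_motifs(toy))
    print(find_beta_defensin_motifs("MKVLLAAGG"))  # no cysteine motif
```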

Annotated β-defensin genes were named with the genus and species initials (Vk) as a prefix and a five-letter species abbreviation as a suffix (VkBDx_VARKO), and were numbered in order following CTSB on scaffold 210. β-Ovodefensins were named similarly and numbered in order following MTMR9 (VkOVODx_VARKO). β-Defensins on scaffold 826 were numbered using anole orthologs as a reference for gene order, and β-defensins on other scaffolds were named after their anole orthologs. Cathelicidins were also named after their anole orthologs.
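The positional naming convention can be illustrated with a small helper that numbers genes outward from an anchor gene such as CTSB. The coordinates, the temporary gene IDs, the function name, and the `direction` parameter (which side of the anchor the cluster lies on) are all hypothetical; the sketch only shows the numbering logic, not the authors' curation workflow.

```python
from typing import Dict, List, Tuple


def name_following_anchor(
    genes: List[Tuple[str, int]],
    anchor_pos: int,
    prefix: str,
    direction: int = 1,
    suffix: str = "_VARKO",
) -> Dict[str, str]:
    """Number genes on one side of an anchor gene by increasing distance from it.

    genes: (temporary_id, start coordinate) pairs on the same scaffold.
    direction: +1 to name genes at higher coordinates than the anchor, -1 for lower.
    """
    # Keep only genes lying on the requested side of the anchor.
    side = [(gid, pos) for gid, pos in genes if (pos - anchor_pos) * direction > 0]
    # Closest gene to the anchor gets number 1.
    side.sort(key=lambda g: (g[1] - anchor_pos) * direction)
    return {gid: f"{prefix}{i}{suffix}" for i, (gid, _) in enumerate(side, start=1)}


if __name__ == "__main__":
    toy_genes = [("g_a", 5000), ("g_b", 2000), ("g_c", 9000), ("g_up", 500)]
    print(name_following_anchor(toy_genes, anchor_pos=1000, prefix="VkBD"))
```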

Peptide prediction

Predicted amino acid sequences were compared with other known protein sequences using the NCBI BLASTp tool (https://www.ncbi.nlm.nih.gov) [81, 82]. The size, charge, helicity and other properties of the proposed antimicrobial peptides were predicted using the Antimicrobial Peptide Database (APD3) Calculation and Prediction tool (http://aps.unmc.edu/AP/prediction/prediction_main.php) [64]. Homology searches against other peptides in the APD3 database were then performed using the option the tool offers after the calculation and prediction step.
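For orientation, two of the simpler properties reported by peptide calculators such as APD3, length and net charge, together with a hydrophobic-residue fraction, can be approximated directly from sequence. This sketch counts Lys and Arg as +1 and Asp and Glu as −1 at neutral pH, ignoring His and the termini, and its hydrophobic residue set is an assumption; APD3's own definitions may differ, so this is not a substitute for that tool. The function name and toy peptide are hypothetical.

```python
from typing import Dict

POSITIVE = set("KR")
NEGATIVE = set("DE")
# Assumed hydrophobic residue set; other tools may use a different set.
HYDROPHOBIC = set("AILMFWVC")


def peptide_summary(seq: str) -> Dict[str, float]:
    """Length, approximate net charge at neutral pH, and hydrophobic fraction."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    charge = sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)
    hydrophobic = sum(aa in HYDROPHOBIC for aa in seq) / len(seq)
    return {"length": len(seq), "net_charge": charge, "hydrophobic_fraction": hydrophobic}


if __name__ == "__main__":
    print(peptide_summary("KKLLKK"))  # hypothetical cationic toy peptide
```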