Data description

The genome of a female Abyssinian cat (“Cinnamon”, who resides at the University of Missouri-Columbia, USA) was sequenced at 1.8× and 3.0× whole genome shotgun (WGS) coverage at Agencourt Inc. For Fca-6.2, an additional 12× coverage of 454 reads and BAC ends was sequenced, assembled with CABOG [1] and analysed at Washington University, St. Louis (USA) [2]. Fca-6.2 is anchored to chromosome coordinates with two physical framework maps: a radiation hybrid map [3] and a short tandem repeat (STR) linkage map [4]. Further, 1,943 distinct sites from a recently built linkage map, genotyped on an Illumina custom cat single nucleotide polymorphism (SNP) array of ≈60,000 SNPs, are also mapped to the assembly.

Here we present a genome browser, Genome Annotation Resource Fields (GARfield) [5], which displays the Fca-6.2 assembly and its annotated genome features. Table 1 lists the features annotated in the cat genome assembly and available in GARfield; they are described and illustrated in Additional file 1 of this Data Note. The genome features detected in Fca-6.2 include a merged list of 21,865 genes derived from a comparative gene identification strategy using BLAST alignments of gene exons from eight reference mammalian gene maps (human, chimpanzee, macaque, dog, cow, horse, rat, and mouse) obtained from the Ensembl Gene 75 database [6]. In addition, whole genome methylation sites and a methylome bisulfite sequence pattern of cat whole blood cells are presented, previewing epigenetic profiling in important complex disease associations, including diseases with viral and neoplastic etiology.

Table 1 Annotated cat genome features available as genome browser tracks for GARfield and UCSC genome browsers

Approximately 55.7% of the cat genome is composed of repetitive elements of familiar classes (LINEs, SINEs, satellite DNA, LTRs and others). We report more than 25 novel families of complex tandem repeat elements in the cat genome uncovered by multiple repeat detection algorithms. We searched for STR-microsatellite loci useful in population and forensic applications. Putative PCR primers for 53,710 STR loci are annotated. We also mapped known feline endogenous retroviral loci (full length RD114, FeLV, FERV) and detected 125 kb of partial retroviral genome sequences dispersed across the cat genome. Nuclear mitochondrial (Numt) DNA pseudogenes derived from ancient transposition from cytoplasmic mitochondrial chromosomes to nuclear chromosomal positions comprise 176 kb in addition to the Lopez-Numt, a 7.8 kb element tandem-repeated 38–76 times on Chromosome D2 previously described in the 1.8× analysis of Cinnamon’s genome [7].
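The STR search mentioned above can be illustrated with a minimal Python sketch. It is not the pipeline used to build the GARfield tracks; it only shows the general idea of scanning a sequence for perfect tandem repeats. The function name `find_strs`, the unit sizes (2 to 6 bp) and the minimum of five copies are illustrative assumptions, not values taken from this Data Note.

```python
import re


def find_strs(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, end, unit, copies) for perfect tandem repeats in seq.

    Coordinates are 0-based, end-exclusive. Thresholds are illustrative
    assumptions, not the parameters used for the published annotation.
    """
    # Lazy unit match so the shortest repeating unit is tried first;
    # the back-reference then requires at least min_repeats - 1 more copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:
            # Skip homopolymer runs such as AAAA..., which are not
            # di- to hexanucleotide microsatellites.
            continue
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start(), m.end(), unit, copies))
    return hits


example = "GGG" + "CA" * 8 + "TTT"
print(find_strs(example))
```

A real screen would also tolerate imperfect repeats and would pass the flanking sequence of each hit to a primer design tool; both steps are outside the scope of this sketch.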

The earlier 3,078,438 feline single nucleotide variants (SNVs) [7, 8] from largely non-repetitive regions of the cat genome are supplemented with 99,494 newly annotated SNPs plus 8,355 detected indels. In addition, we performed an assisted assembly with 40× Illumina SOLID DNA sequence coverage of Sylvester, a European wildcat (F. silvestris silvestris), a wild representative of the species from which cats were domesticated approximately 10,000 years ago [9]. Genome variations (SNVs and indels) between F. silvestris and F. catus are reported here, and both species’ genomes and their associated data have been uploaded to the GARfield genome browser (see Availability of supporting data section).

Our annotation resolved cat homologues of 743,362 evolutionarily constrained elements (ECEs) recently identified in the human genome by alignment to 29 different mammalian genomes [10]. These were compared to the conserved sequence blocks obtained by the reciprocal best match (RBM) screen for cat genes with seven mammalian genomes (human, chimpanzee, macaque, dog, cow, rat and mouse). A conservative alignment approach implicated 54% of the human ECE sequence, comprising ≈3% of the cat genome. A total of 3,182 feline microRNA (miRNA) homologues were detected and mapped based upon homology to miRNA sequences from 36 species with miRNA sequences described in the miRBase database [11]. Finally, we screened the genome sequence for copy number variation and segmental duplications. All annotated features listed in Table 1 are described in detail in Additional file 1 and tracked in the GARfield genome browser.
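The idea behind a reciprocal best match screen can be sketched as follows. This is a generic illustration under stated assumptions, not the procedure used for the annotation: the function name `reciprocal_best_matches`, the input layout (a dictionary of scored hits per query) and the gene identifiers are all hypothetical.

```python
def reciprocal_best_matches(a_to_b, b_to_a):
    """Return sorted (a, b) pairs that are each other's top-scoring hit.

    a_to_b and b_to_a map a query id to a list of (subject id, score)
    tuples, e.g. parsed from alignment output. Hypothetical layout.
    """
    def best(hits):
        # Keep only the highest-scoring subject for every query.
        return {
            query: max(subjects, key=lambda s: s[1])[0]
            for query, subjects in hits.items()
            if subjects
        }

    best_ab = best(a_to_b)
    best_ba = best(b_to_a)
    # A pair is reciprocal when the best hit of a is b and the best hit of b is a.
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical identifiers and scores, for illustration only.
cat_vs_human = {
    "catG1": [("humA", 250.0), ("humB", 90.0)],
    "catG2": [("humA", 200.0)],
}
human_vs_cat = {
    "humA": [("catG1", 245.0), ("catG2", 190.0)],
    "humB": [("catG2", 80.0)],
}
print(reciprocal_best_matches(cat_vs_human, human_vs_cat))
```

Here only `catG1` and `humA` are retained, because `catG2` selects `humA` as its best hit while `humA` prefers `catG1`. Ties are broken by input order in this sketch; a production screen would need an explicit rule.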

Availability of supporting data

The assembly sequences are available at the NCBI RefSeq database (accession numbers #PRJNA175699 and #PRJNA253950). The annotated features are available in the Genome Annotation Resource Fields (GARfield) genome browser (http://garfield.dobzhanskycenter.org) and the UCSC Genome Browser (http://genome.ucsc.edu), which links to a Dobzhansky Center Hub (http://public.dobzhanskycenter.ru/Hub/hub.txt); see Section 2 of Additional file 1 for instructions. Supplementary tables and figures that refer to GARfield features are given in Additional file 1 and listed in Table 1.
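As a convenience, a link that opens the UCSC Genome Browser with the hub attached can be composed programmatically. The sketch below assumes the browser's `db` and `hubUrl` URL parameters and assumes that the UCSC database name for this assembly is `felCat5`; neither is stated in this Data Note, and the instructions in Additional file 1 remain the reference. The helper name `ucsc_hub_link` is hypothetical.

```python
from urllib.parse import urlencode

# Hub address as given in the text above.
HUB_URL = "http://public.dobzhanskycenter.ru/Hub/hub.txt"


def ucsc_hub_link(hub_url, db, position=None):
    """Build a UCSC Genome Browser link that attaches a track hub.

    db is the UCSC assembly database name (assumed, not from the paper);
    position is an optional region such as "chrD2:1-100000".
    """
    params = {"db": db, "hubUrl": hub_url}
    if position:
        params["position"] = position
    # urlencode percent-encodes the hub address so it survives as one parameter.
    return "http://genome.ucsc.edu/cgi-bin/hgTracks?" + urlencode(params)


link = ucsc_hub_link(HUB_URL, "felCat5")  # "felCat5" is an assumption
print(link)
```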

Sequence and variation data are available in NCBI (SAMN02795853 for Boris the cat and SAMN02898152 for the wildcat), and supporting data are also available in the GigaDB repository [12].