Objective

Dogs (Canis familiaris) were probably the earliest domesticated animals and have been human companions since ancient times [1, 2]. Archaeological findings and genetic research indicate that dog breeds are derived from wild wolves [3,4,5]. In Southwest Asia, large-scale farming spread within the so-called Fertile Crescent (FC), where plants and animals were independently domesticated [6, 7]. Extensive cultural advances occurred in the Zagros area of present-day Iraq and Iran, which connects the Iranian plateau and Mesopotamia [8]. Dogs were depicted frequently in Southwest Asia [1, 9]. Consequently, one notable view places the primary location of dog domestication in Southwest Asia, likely the Middle East [1]. Moreover, the Middle East has been implicated by the considerable sharing of alleles between dog breeds and wolves [10]; however, this inference has been questioned because of dog-wolf hybridization, as reported in previous studies [11,12,13]. The dog is a striking example of phenotypic variation under artificial selection and demographic forces, but the genetic basis of this diversity is not yet completely clear. Therefore, whole-genome resequencing data from Iranian local canids will give researchers an opportunity to trace the origin of dog domestication. We first carried out genome sequencing of six Iranian local canids: two hunting dogs (Saluki breed), a mastiff dog (Qahderijani ecotype) and three wolves (Table 1). We then used these data to identify genomic variants and their effects in dogs and wolves [14].

Table 1 Overview of whole-genome sequence data files of six Iranian canids

Data description

We collected blood samples from three Iranian local dogs and three Iranian local wolves at six different sites in Iran, with the approval of the owners. The Saluki dogs were sampled on Jamil Tavanaei’s personal farms in Kurdistan province (Sanandaj and Bijar), and the Qahderijani dog was sampled on Alireza Hoseini’s private farm in Isfahan province. One of the wolf samples was collected from the Kerman zoological garden in Kerman province, and the others were collected from the Eram zoological garden in Tehran province. DNA was extracted with the phenol/chloroform method. For sequencing library preparation, the genomic DNA was sheared into fragments of 300–500 bp, which were then end-repaired, “A”-tailed, and ligated to Illumina sequencing adapters. The ligated products with sizes of 400–500 bp were selected on 2% agarose gels and then amplified by LM-PCR. Illumina paired-end whole-genome resequencing of the six individuals was performed on an Illumina HiSeq 2500 system (http://www.berrygenomics.com). Both nuclear and mitochondrial genomes were sequenced. We generated 287.5 Gb of data with a uniform read length of 150 bp. A total of 1,884,054,828 short reads were generated across the six individuals. After filtering, the total high-quality sequence data per sample ranged from 42.1 Gb to 51 Gb, and the coverage varied from 14.51X to 17.15X. Across samples, the mean insert size ranged from 280.06 to 331.86 bp, and its standard deviation ranged from 27.12 to 33.94 bp.

The quality of the raw sequence reads was assessed with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). We used BWA (v.0.7.15) [15] to align the sequence data to the dog reference genome (CanFam3.1), downloaded from Ensembl (http://asia.ensembl.org/Canis_lupus_familiaris/Info/Index). The alignment quality was assessed with SAMtools v.1.9 using the flagstat and depth commands [16]. The short reads mapped to the reference genome at a mean depth of 16X and covered 99% of the reference assembly. The mapping output files were preprocessed using SAMtools [16], the Picard tools (http://broadinstitute.github.io/picard/) and GATK [17]. We applied a variome detection pipeline to these data using CNVnator [18], BreakDancer [19], DELLY [20] and Bedtools [21] [14]. Finally, we compared the effects of the variome between the dog and wolf genomes using the Sorting Intolerant from Tolerant (SIFT) algorithm [19], Ensembl annotation [22] and the DAVID tool [23] [14]. The data presented herein, together with our previously reported mitochondrial DNA sequences of Iranian dogs [11], will provide useful resources for understanding the genetic structure of Iranian dogs and for testing hypotheses on dog origin and domestication.
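The alignment and quality-check steps described above can be sketched as follows. This is only a minimal illustration, not the authors' actual commands: the text names BWA and the SAMtools flagstat and depth commands, but the choice of the `bwa mem` algorithm, the thread count, the file names, and the helper functions `alignment_commands` and `summarize_depth` are all assumptions made for the example. The commands are built as argument lists rather than executed, and the summary function expects the three-column output (chromosome, position, depth) of `samtools depth`.

```python
from typing import Dict, Iterable, List


def alignment_commands(ref: str, fq1: str, fq2: str, sample: str,
                       threads: int = 4) -> Dict[str, List[str]]:
    """Build argument lists for aligning one paired-end sample and checking it.

    Assumptions (not stated in the text): the `bwa mem` algorithm, the thread
    count, and the output file naming. The reference must already be indexed
    with `bwa index`.
    """
    bam = f"{sample}.sorted.bam"
    return {
        # Align paired-end reads; BWA writes SAM to standard output.
        "align": ["bwa", "mem", "-t", str(threads), ref, fq1, fq2],
        # Coordinate-sort SAM read from standard input into a BAM file.
        "sort": ["samtools", "sort", "-o", bam, "-"],
        # Summary counts of mapped and properly paired reads.
        "flagstat": ["samtools", "flagstat", bam],
        # Per-position depth; -a also reports positions with zero depth.
        "depth": ["samtools", "depth", "-a", bam],
    }


def summarize_depth(lines: Iterable[str]) -> Dict[str, float]:
    """Summarize `samtools depth` output (tab-separated: chrom, pos, depth).

    Returns the mean depth over all reported positions and the breadth of
    coverage, i.e. the fraction of reported positions with depth > 0.
    """
    positions = 0
    covered = 0
    total_depth = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        depth = int(line.split("\t")[2])
        positions += 1
        total_depth += depth
        if depth > 0:
            covered += 1
    if positions == 0:
        return {"mean_depth": 0.0, "breadth": 0.0}
    return {"mean_depth": total_depth / positions,
            "breadth": covered / positions}


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmds = alignment_commands("CanFam3.1.fa", "s1_R1.fq.gz", "s1_R2.fq.gz", "s1")
    for step, argv in cmds.items():
        print(step, " ".join(argv))

    # Toy depth records, not real data.
    toy = ["chr1\t1\t10", "chr1\t2\t0", "chr1\t3\t20", "chr1\t4\t30"]
    print(summarize_depth(toy))
```

The breadth value is only meaningful when zero-depth positions are present in the input, which is why the sketch passes `-a` to `samtools depth`. In practice the "align" output would be piped into the "sort" step, for example with `subprocess.Popen`.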

Limitations

The small sample size for the dog and wolf populations is a limitation of our work: we were able to generate genome sequence data from only three wolves and three dogs. In addition, the short reads were produced at a mean depth of 16X, a moderate depth that might not be suitable for some genomic analyses.