Abstract
There are over 350 genetically distinct breeds of domestic dog, which display considerable variation in morphology, physiology, and disease susceptibility. The genome sequence of the domestic dog was assembled and released in 2005. It yielded an estimated 20,000 protein-coding genes, a major asset for the scientific community that uses the dog as a genetic model for biomedical research and for comparative and evolutionary studies. Although the canine gene set was predicted using a combination of ab initio methods, homology studies, motif analysis, and similarity-based programs, it still lacks in-depth annotation of noncoding genes, alternative splicing, pseudogenes, regulatory regions, and gene gain and loss events. Such analyses could benefit from new sequencing technologies such as RNA-Seq, which would make better use of the canine genetic system for tracking disease genes. Here, we review the catalog of canine protein-coding genes and the search for missing genes, and we propose rationales for the accurate identification of noncoding genes through next-generation sequencing.
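The abstract proposes identifying noncoding genes from RNA-Seq data. The following is a purely illustrative sketch, not the authors' pipeline. It shows one common first filter applied to assembled transcripts: keep transcripts that are at least 200 nt long and lack a long open reading frame (ORF) on either strand. The thresholds (200 nt, 100 codons), the ATG-to-stop ORF definition, and the function names are assumptions chosen for illustration. Published pipelines usually add further evidence, such as conservation-based coding-potential scores and searches against protein databases.

```python
# Illustrative sketch only: thresholds and names are hypothetical, not from the article.
MIN_TRANSCRIPT_LEN = 200  # nt; a widely used lower bound for "long" noncoding RNAs
MAX_ORF_CODONS = 100      # transcripts with a longer ORF are treated as putatively coding

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercase ACGTN)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_codons(seq: str) -> int:
    """Length in codons (ATG included, stop excluded) of the longest ATG-initiated
    ORF over all six reading frames. An ORF that runs off the 3' end without a
    stop codon is still counted, up to the last complete codon."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            codons = [strand[i:i + 3] for i in range(frame, len(strand) - 2, 3)]
            start = None  # codon index of the ATG opening the current ORF
            for idx, codon in enumerate(codons):
                if start is None and codon == "ATG":
                    start = idx
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, idx - start)
                    start = None
            if start is not None:  # ORF truncated at the transcript end
                best = max(best, len(codons) - start)
    return best


def classify_transcript(seq: str) -> str:
    """Label a transcript as too short, putatively coding, or a noncoding candidate."""
    if len(seq) < MIN_TRANSCRIPT_LEN:
        return "too_short"
    if longest_orf_codons(seq) > MAX_ORF_CODONS:
        return "putative_coding"
    return "candidate_noncoding"


if __name__ == "__main__":
    coding_like = "ATG" + "AAA" * 150 + "TAA"   # 151-codon ORF
    noncoding_like = "TAG" * 80                  # no ATG on either strand
    for name, s in [("coding_like", coding_like), ("noncoding_like", noncoding_like)]:
        print(name, classify_transcript(s))
```

A length-plus-ORF filter is cheap and transparent, but it is only a coarse screen. Short functional ORFs and long ORFs that arise by chance both occur, so candidates surviving this step would normally be reassessed with additional coding-potential evidence before they are annotated as noncoding genes.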
Acknowledgments
We acknowledge the Centre National de la Recherche Scientifique and the University of Rennes 1 for funding. TD was supported by the Conseil Régional de Bretagne, and AV was supported by the European Commission (FP7-LUPA, GA-201370). We thank Jocelyn Plassais for the dog photographs.
Cite this article
Derrien, T., Vaysse, A., André, C. et al. Annotation of the domestic dog genome sequence: finding the missing genes. Mamm Genome 23, 124–131 (2012). https://doi.org/10.1007/s00335-011-9372-0