Mammalian Genome, Volume 23, Issue 1–2, pp 124–131

Annotation of the domestic dog genome sequence: finding the missing genes

  • Thomas Derrien
  • Amaury Vaysse
  • Catherine André
  • Christophe Hitte

Abstract

There are over 350 genetically distinct breeds of domestic dog that present considerable variation in morphology, physiology, and disease susceptibility. The genome sequence of the domestic dog was assembled and released in 2005, providing an estimated 20,000 protein-coding genes that are a great asset to the scientific community that uses the dog system as a genetic biomedical model and for comparative and evolutionary studies. Although the canine gene set was predicted using a combination of ab initio methods, homology studies, motif analysis, and similarity-based programs, it still requires a deep annotation of noncoding genes, alternative splicing, pseudogenes, regulatory regions, and gain and loss events. Such analyses could benefit from new sequencing technologies (RNA-Seq) to better exploit the advantages of the canine genetic system in tracking disease genes. Here, we review the catalog of canine protein-coding genes and the search for missing genes, and we propose rationales for an accurate identification of noncoding genes through next-generation sequencing.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Thomas Derrien (1)
  • Amaury Vaysse (1)
  • Catherine André (1)
  • Christophe Hitte (1)

  1. Institut de Génétique et Développement de Rennes, CNRS-UMR6061, Université de Rennes 1, Rennes, France