Science China Life Sciences, Volume 62, Issue 4, pp 579–593

Origination and evolution of orphan genes and de novo genes in the genome of Caenorhabditis elegans

  • Wenyu Zhang
  • Yuanxiao Gao
  • Manyuan Long
  • Bairong Shen (Email author)
Research Paper


Orphan genes, which lack detectable homologues in other lineages, could contribute to a variety of biological functions. However, how they originate and function remains largely unknown. Here, using a comprehensive and systematic computational pipeline, we identified 893 orphan genes in the lineage of C. elegans, of which only a small fraction (0.9%) were derived from transposable elements. We also identified six new protein-coding genes that originated de novo from non-coding DNA sequences in the genome of C. elegans. The authenticity and functionality of these orphan genes and de novo genes are supported by three lines of evidence: transcriptional data, in silico proteomic data, and fixation status data in wild populations. Orphan genes and de novo genes exhibited simple gene structures, with shorter proteins and fewer exons, and were frequently X-linked. RNA-seq data analysis showed that these orphan genes are enriched for expression during embryonic development and in the gonad, and their potential function in early development was further supported by gene ontology enrichment analysis. Meanwhile, de novo genes showed significant expression in the gonad, and functional enrichment analysis of the genes co-expressed with these de novo genes suggested that they may be involved in signal transduction pathways and metabolic processes. Our results present the first systematic evidence on the evolution of orphan genes and the de novo origin of genes in nematodes and their impacts on functional and phenotypic evolution, and thus could shed new light on the importance of these new genes.
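The definition of an orphan gene used above (no detectable homologue outside the focal lineage) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `find_orphans`, the input layout (query gene, subject species, E-value), and the E-value cutoff are all hypothetical and chosen only to show the filtering idea.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical significance cutoff; the abstract does not state the one used.
EVALUE_CUTOFF = 1e-3


def find_orphans(
    genes: Iterable[str],
    hits: Iterable[Tuple[str, str, float]],
    focal_lineage: Set[str],
    evalue_cutoff: float = EVALUE_CUTOFF,
) -> List[str]:
    """Return genes with no significant hit outside the focal lineage.

    hits holds (query_gene, subject_species, evalue) tuples, for example
    parsed from a homology search against other genomes (assumed input).
    """
    has_outside_homologue: Set[str] = set()
    for gene, species, evalue in hits:
        # Only significant hits in species outside the lineage count.
        if species not in focal_lineage and evalue <= evalue_cutoff:
            has_outside_homologue.add(gene)
    # Preserve the input order of genes in the result.
    return [g for g in genes if g not in has_outside_homologue]


if __name__ == "__main__":
    toy_genes = ["geneA", "geneB", "geneC"]
    toy_hits = [
        ("geneA", "C. briggsae", 1e-30),     # significant outside hit
        ("geneB", "C. elegans", 0.0),        # hit inside the focal lineage
        ("geneC", "D. melanogaster", 5.0),   # outside hit, not significant
    ]
    print(find_orphans(toy_genes, toy_hits, {"C. elegans"}))
```

In this toy input only geneA has a significant hit outside the lineage, so geneB and geneC are retained as orphan candidates. A real analysis would also need to account for annotation gaps and search sensitivity, which this sketch ignores.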


Caenorhabditis elegans, orphan genes, de novo genes



We are grateful to Li Zhang for providing helpful suggestions on de novo gene identification analysis. Computing was supported by the EEgrid cluster of the University of Chicago. This work was supported by the National Natural Science Foundation of China (31600670 to W. Zhang, 31670851 to B. Shen).

Supplementary material

11427_2019_9482_MOESM1_ESM.pptx (55 kb)
Supplementary material, approximately 54.5 KB.
11427_2019_9482_MOESM2_ESM.xlsx (58 kb)
Supplementary material, approximately 57.6 KB.
11427_2019_9482_MOESM3_ESM.docx (20 kb)
Supplementary Table S2 Geography distribution of 40 Caenorhabditis elegans wild strains
11427_2019_9482_MOESM4_ESM.xlsx (59 kb)
Supplementary material, approximately 58.7 KB.
11427_2019_9482_MOESM5_ESM.xlsx (665 kb)
Supplementary material, approximately 665 KB.



Copyright information

© Science China Press and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Wenyu Zhang (1, 2)
  • Yuanxiao Gao (3)
  • Manyuan Long (2, 4)
  • Bairong Shen (1, Email author)
  1. Institutes for Systems Genetics, West China Hospital, Sichuan University, Chengdu, China
  2. Department of Ecology and Evolution, the University of Chicago, Chicago, USA
  3. Department of Evolutionary Theory, Max-Planck Institute for Evolutionary Biology, Plön, Germany
  4. Committee on Genetics, the University of Chicago, Chicago, USA
