Origination and evolution of orphan genes and de novo genes in the genome of Caenorhabditis elegans

  • Research Paper
Science China Life Sciences

Abstract

Orphan genes, which lack detectable homologues in other lineages, could contribute to a variety of biological functions. However, their origins and functional mechanisms remain largely unknown. Here, using a comprehensive and systematic computational pipeline, we identified 893 orphan genes in the C. elegans lineage, of which only a small fraction (0.9%) were derived from transposable elements. We also identified six new protein-coding genes that originated de novo from non-coding DNA sequences in the C. elegans genome. The authenticity and functionality of these orphan genes and de novo genes are supported by three lines of evidence: transcriptional data, in silico proteomic data, and fixation status in wild populations. Orphan genes and de novo genes have simple gene structures, with shorter proteins and fewer exons, and are frequently X-linked. RNA-seq data analysis showed that the orphan genes are enriched for expression in embryonic development and in the gonad, and their potential function in early development was further supported by gene ontology enrichment analysis. The de novo genes, meanwhile, were significantly expressed in the gonad, and functional enrichment analysis of their co-expressed genes suggested that they may be involved in signal transduction pathways and metabolic processes. Our results present the first systematic evidence on the evolution of orphan genes and the de novo origin of genes in nematodes, and on their impact on functional and phenotypic evolution, and thus could shed new light on the importance of these new genes.
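The definition used above, genes that lack detectable homologues in other lineages, can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Hit` record layout, the function name `find_orphan_candidates`, the example gene names, and the E-value cutoff of 1e-3 are all assumptions chosen for illustration.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One homology-search hit (hypothetical record layout)."""
    gene: str       # query gene identifier
    species: str    # species of the matched sequence
    evalue: float   # E-value reported by the search tool


def find_orphan_candidates(
    genes: Iterable[str],
    hits: Iterable[Hit],
    lineage: set[str],
    max_evalue: float = 1e-3,  # assumed cutoff, not taken from the paper
) -> list[str]:
    """Return genes that have no significant hit outside `lineage`."""
    # Genes with at least one significant match in a species outside the lineage.
    has_outside_homologue = {
        h.gene
        for h in hits
        if h.species not in lineage and h.evalue <= max_evalue
    }
    return sorted(g for g in genes if g not in has_outside_homologue)


if __name__ == "__main__":
    genes = ["geneA", "geneB", "geneC"]  # made-up identifiers
    hits = [
        Hit("geneA", "C. elegans", 0.0),          # self hit only
        Hit("geneB", "C. elegans", 0.0),
        Hit("geneB", "D. melanogaster", 1e-20),   # significant outside hit
        Hit("geneC", "D. melanogaster", 5.0),     # outside hit, not significant
    ]
    print(find_orphan_candidates(genes, hits, lineage={"C. elegans"}))
```

In this toy input, `geneB` is excluded because it has a significant match outside the lineage, while `geneA` and `geneC` remain as candidates. A real analysis would also need the additional checks the abstract mentions, such as screening for transposable-element origin and confirming expression.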

Acknowledgements

We are grateful to Li Zhang for helpful suggestions on the de novo gene identification analysis. Computing was supported by the EEgrid cluster of the University of Chicago. This work was supported by the National Natural Science Foundation of China (31600670 to W. Zhang, 31670851 to B. Shen).

Author information

Corresponding author

Correspondence to Bairong Shen.

Cite this article

Zhang, W., Gao, Y., Long, M. et al. Origination and evolution of orphan genes and de novo genes in the genome of Caenorhabditis elegans. Sci. China Life Sci. 62, 579–593 (2019). https://doi.org/10.1007/s11427-019-9482-0

  • DOI: https://doi.org/10.1007/s11427-019-9482-0
