Advertisement

Building a sequence map of the pig pan-genome from multiple de novo assemblies and Hi-C data

  • Xiaomeng Tian
  • Ran Li
  • Weiwei Fu
  • Yan Li
  • Xihong Wang
  • Ming Li
  • Duo Du
  • Qianzi Tang
  • Yudong Cai
  • Yiming Long
  • Yue Zhao
  • Mingzhou LiEmail author
  • Yu JiangEmail author
Research Paper

Abstract

Pigs were domesticated independently in the Near East and China, indicating that a single reference genome from one individual is unable to represent the full spectrum of divergent sequences in pigs worldwide. Therefore, 12 de novo pig assemblies from Eurasia were compared in this study to identify the missing sequences from the reference genome. As a result, 72.5 Mb of non-redundant sequences (∼3% of the genome) were found to be absent from the reference genome (Sscrofa11.1) and were defined as pan-sequences. Of the pan-sequences, 9.0 Mb were dominant in Chinese pigs, in contrast with their low frequency in European pigs. One sequence dominant in Chinese pigs contained the complete genic region of the tazarotene-induced gene 3 (TIG3) gene which is involved in fatty acid metabolism. Using flanking sequences and Hi-C based methods, 27.7% of the sequences could be anchored to the reference genome. The supplementation of these sequences could contribute to the accurate interpretation of the 3D chromatin structure. A web-based pan-genome database was further provided to serve as a primary resource for exploration of genetic diversity and promote pig breeding and biomedical research.

Keywords

pan-genome pig reference genome 3D chromatin structure presence-absence variation 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Notes

Acknowledgements

This work was supported by the National Natural Science Foundation of China (31822052 and 31572381) to Y.J and the Science & Technology Support Program of Sichuan (2016NYZ0042 and 2017NZDZX0002) to M.Z.L. We thank the High Performance Computing platform of Northwest A&F University for their assistance with the computing.

Supplementary material

11427_2019_9551_MOESM1_ESM.docx (2.2 mb)
Tian, et al. Building a sequence map of the pig pan-genome from multiple de novo assemblies and Hi-C data
11427_2019_9551_MOESM2_ESM.xls (19 mb)
Supplementary material, approximately 18.9 MB.

References

  1. Ai, H., Fang, X., Yang, B., Huang, Z., Chen, H., Mao, L., Zhang, F., Zhang, L., Cui, L., He, W., et al. (2015). Adaptation and possible ancient interspecies introgression in pigs identified by whole-genome sequencing. Nat Genet 47, 217–225.CrossRefPubMedGoogle Scholar
  2. Arumemi, F., Bayles, I., Paul, J., and Milcarek, C. (2013). Shared and discrete interacting partners of ELL1 and ELL2 by yeast two-hybrid assay. ABB 04, 774–780.CrossRefGoogle Scholar
  3. Blanco, E., Parra, G., and Guigo, R. (2007). Using geneid to identify genes. Curr Protoc Bioinformatics Chapter 4, Unit 4.3.Google Scholar
  4. Burge, C.B., and Karlin, S. (1998). Finding the genes in genomic DNA. Curr Opin Struct Biol 8, 346–354.CrossRefPubMedGoogle Scholar
  5. Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., and Madden, T.L. (2009). BLAST+: architecture and applications. BMC BioInf 10, 421.CrossRefGoogle Scholar
  6. Casper, J., Zweig, A.S., Villarreal, C., Tyner, C., Speir, M.L., Rosenbloom, K.R., Raney, B.J., Lee, C.M., Lee, B.T., Karolchik, D., et al. (2017) OUP accepted manuscript. Nucleic Acids Res.Google Scholar
  7. Christopoulos, A., Ligoudistianou, C., Bethanis, P., and Gazouli, M. (2018). Successful use of adipose-derived mesenchymal stem cells to correct a male breast affected by Poland Syndrome: a case report. J Surg Case Rep 2018(7), rjy151.CrossRefPubMedPubMedCentralGoogle Scholar
  8. Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J. S., and Ren, B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380.CrossRefPubMedPubMedCentralGoogle Scholar
  9. Doerks, T., Copley, R.R., Schultz, J., Ponting, C.P., and Bork, P. (2002). Systematic identification of novel protein domain families associated with nuclear functions. Genome Res 12, 47–56.CrossRefPubMedPubMedCentralGoogle Scholar
  10. Dong, P., Tu, X., Chu, P.Y., Lü, P., Zhu, N., Grierson, D., Du, B., Li, P., and Zhong, S. (2017). 3D chromatin architecture of large plant genomes determined by local A/B compartments. Mol Plant 10, 1497–1509.CrossRefPubMedGoogle Scholar
  11. Durand, N.C., Shamim, M.S., Machol, I., Rao, S.S.P., Huntley, M.H., Lander, E.S., and Aiden, E.L. (2016). Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst 3, 95–98.CrossRefPubMedPubMedCentralGoogle Scholar
  12. Fang, X., Mou, Y., Huang, Z., Li, Y., Han, L., Zhang, Y., Feng, Y., Chen, Y., Jiang, X., Zhao, W., et al. (2012). The sequence and analysis of a Chinese pig genome. Gigascience 1, 16.CrossRefPubMedPubMedCentralGoogle Scholar
  13. Frantz, L.A.F., Schraiber, J.G., Madsen, O., Megens, H.J., Cagan, A., Bosse, M., Paudel, Y., Crooijmans, R.P.M.A., Larson, G., and Groenen, M.A.M. (2015). Evidence of long-term gene flow and selection during domestication from analyses of Eurasian wild and domestic pig genomes. Nat Genet 47, 1141–1148.CrossRefPubMedGoogle Scholar
  14. Frazee, A.C., Pertea, G., Jaffe, A.E., Langmead, B., Salzberg, S.L., and Leek, J.T. (2015). Ballgown bridges the gap between transcriptome assembly and expression analysis. Nat Biotechnol 33, 243–246.CrossRefPubMedPubMedCentralGoogle Scholar
  15. Golicz, A.A., Bayer, P.E., Barker, G.C., Edger, P.P., Kim, H.R., Martinez, P. A., Chan, C.K.K., Severn-Ellis, A., McCombie, W.R., Parkin, I.A.P., et al. (2016). The pangenome of an agronomically important crop plant Brassica oleracea. Nat Commun 7, 13390.CrossRefPubMedPubMedCentralGoogle Scholar
  16. Gordon, S.P., Contreras-Moreira, B., Woods, D.P., Des Marais, D.L., Burgess, D., Shu, S., Stritt, C., Roulin, A.C., Schackwitz, W., Tyler, L., et al. (2017). Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure. Nat Commun 8, 2184.CrossRefPubMedPubMedCentralGoogle Scholar
  17. Groenen, M.A.M., Archibald, A.L., Uenishi, H., Tuggle, C.K., Takeuchi, Y., Rothschild, M.F., Rogel-Gaillard, C., Park, C., Milan, D., Megens, H.J., et al. (2012). Analyses of pig genomes provide insight into porcine demography and evolution. Nature 491, 393–398.CrossRefPubMedPubMedCentralGoogle Scholar
  18. Guirao-Rico, S., Ramirez, O., Ojeda, A., Amills, M., and Ramos-Onsins, S. E. (2018). Porcine Y-chromosome variation is consistent with the occurrence of paternal gene flow from non-Asian to Asian populations. Heredity 120, 63–76.CrossRefPubMedGoogle Scholar
  19. Hirsch, C.N., Foerster, J.M., Johnson, J.M., Sekhon, R.S., Muttoni, G., Vaillancourt, B., Peñagaricano, F., Lindquist, E., Pedraza, M.A., Barry, K., et al. (2014). Insights into the maize pan-genome and pan-transcriptome. Plant Cell 26, 121–135.CrossRefPubMedPubMedCentralGoogle Scholar
  20. Jeong, H., Song, K.D., Seo, M., Caetano-Anollés, K., Kim, J., Kwak, W., Oh, J.D., Kim, E.S., Jeong, D.K., Cho, S., et al. (2015). Exploring evidence of positive selection reveals genetic basis of meat quality traits in Berkshire pigs through whole genome sequencing. BMC Genet 16, 104.CrossRefPubMedPubMedCentralGoogle Scholar
  21. Kent, W.J. (2002). BLAT—The BLAST-like alignment tool. Genome Res 12, 656–664.CrossRefPubMedPubMedCentralGoogle Scholar
  22. Kim, D., Langmead, B., and Salzberg, S.L. (2015). HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12, 357–360.CrossRefPubMedPubMedCentralGoogle Scholar
  23. Knight, P.A., and Ruiz, D. (2013). A fast algorithm for matrix balancing. IMA J Numer Anal 33, 1029–1047.CrossRefGoogle Scholar
  24. Kumar, S., Stecher, G., and Tamura, K. (2016). MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol 33, 1870–1874.CrossRefPubMedGoogle Scholar
  25. Larson, G., Dobney, K., Albarella, U., Fang, M., Matisoo-Smith, E., Robins, J., Lowden, S., Finlayson, H., Brand, T., Willerslev, E., et al. (2005). Worldwide phylogeography of wild boar reveals multiple centers of pig domestication. Science 307, 1618–1621.CrossRefPubMedGoogle Scholar
  26. Leung, D., Jung, I., Rajagopal, N., Schmitt, A., Selvaraj, S., Lee, A.Y., Yen, C.A., Lin, S., Lin, Y., Qiu, Y., et al. (2015). Integrative analysis of haplotype-resolved epigenomes across human tissues. Nature 518, 350–354.CrossRefPubMedPubMedCentralGoogle Scholar
  27. Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760.CrossRefPubMedPubMedCentralGoogle Scholar
  28. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and Durbin, R. (2009). The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079.CrossRefPubMedPubMedCentralGoogle Scholar
  29. Li, M., Chen, L., Tian, S., Lin, Y., Tang, Q., Zhou, X., Li, D., Yeung, C.K.L., Che, T., Jin, L., et al. (2017). Comprehensive variation discovery and recovery of missing sequence in the pig genome using multiple de novo assemblies. Genome Res 27, 865–874.CrossRefPubMedPubMedCentralGoogle Scholar
  30. Li, M., Tian, S., Jin, L., Zhou, G., Li, Y., Zhang, Y., Wang, T., Yeung, C.K.L., Chen, L., Ma, J., et al. (2013). Genomic analyses identify distinct patterns of selection in domesticated pigs and Tibetan wild boars. Nat Genet 45, 1431–1438.CrossRefPubMedGoogle Scholar
  31. Li, R., Li, Y., Zheng, H., Luo, R., Zhu, H., Li, Q., Qian, W., Ren, Y., Tian, G., Li, J., et al. (2010). Building the sequence map of the human pan-genome. Nat Biotechnol 28, 57–63.CrossRefPubMedGoogle Scholar
  32. Li, Y., Zhou, G., Ma, J., Jiang, W., Jin, L., Zhang, Z., Guo, Y., Zhang, J., Sui, Y., Zheng, L., et al. (2014). De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits. Nat Biotechnol 32, 1045–1052.CrossRefPubMedGoogle Scholar
  33. Lieberman-Aiden, E., van Berkum, N.L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B.R., Sabo, P.J., Dorschner, M.O., et al. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293.CrossRefPubMedPubMedCentralGoogle Scholar
  34. McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., et al. (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20, 1297–1303.CrossRefPubMedPubMedCentralGoogle Scholar
  35. Monat, C., Pera, B., Ndjiondjop, M.N., Sow, M., Tranchant-Dubreuil, C., Bastianelli, L., Ghesquière, A., and Sabot, F. (2016). de novo assemblies of three Oryza glaberrima accessions provide first insights about pan-genome of African rices. Genome Biol Evol evw253.Google Scholar
  36. Morgulis, A., Gertz, E.M., Schäffer, A.A., and Agarwala, R. (2006). WindowMasker: window-based masker for sequenced genomes. Bioinformatics 22, 134–141.CrossRefPubMedGoogle Scholar
  37. Neafsey, D.E., Waterhouse, R.M., Abai, M.R., Aganezov, S.S., Alekseyev, M.A., Allen, J.E., Amon, J., Arcà, B., Arensburger, P., Artemov, G., et al. (2015). Highly evolvable malaria vectors: The genomes of 16 Anopheles mosquitoes. Science 347, 1258522–43.CrossRefPubMedGoogle Scholar
  38. Pertea, M., Pertea, G.M., Antonescu, C.M., Chang, T.C., Mendell, J.T., and Salzberg, S.L. (2015). StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33, 290–295.CrossRefPubMedPubMedCentralGoogle Scholar
  39. Rao, S.S.P., Huntley, M.H., Durand, N.C., Stamenova, E.K., Bochkov, I.D., Robinson, J.T., Sanborn, A.L., Machol, I., Omer, A.D., Lander, E.S., et al. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680.CrossRefPubMedPubMedCentralGoogle Scholar
  40. Ron, G., Globerson, Y., Moran, D., and Kaplan, T. (2017). Promoter-enhancer interactions identified from Hi-C data using probabilistic models and hierarchical topological domains. Nat Commun 8, 2237.CrossRefPubMedPubMedCentralGoogle Scholar
  41. Schatz, M.C., Maron, L.G., Stein, J.C., Hernandez Wences, A., Gurtowski, J., Biggers, E., Lee, H., Kramer, M., Antoniou, E., Ghiban, E., et al. (2014). Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica. Genome Biol 15, 506.PubMedPubMedCentralGoogle Scholar
  42. Shen, W., Le, S., Li, Y., and Hu, F. (2016). SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS ONE 11, e0163962.CrossRefPubMedPubMedCentralGoogle Scholar
  43. Sherman, R.M., Forman, J., Antonescu, V., Puiu, D., Daya, M., Rafaels, N., Boorgula, M.P., Chavan, S., Vergara, C., Ortega, V.E., et al. (2019). Assembly of a pan-genome from deep sequencing of 910 humans of African descent. Nat Genet 51, 30–35.CrossRefPubMedGoogle Scholar
  44. Stanke, M., Keller, O., Gunduz, I., Hayes, A., Waack, S., and Morgenstern, B. (2006). AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34, W435–W439.CrossRefPubMedPubMedCentralGoogle Scholar
  45. Sun, C., Hu, Z., Zheng, T., Lu, K., Zhao, Y., Wang, W., Shi, J., Wang, C., Lu, J., Zhang, D., et al. (2017). RPAN: rice pan-genome browser for ∼3000 rice genomes. Nucleic Acids Res 45, 597–605.CrossRefPubMedGoogle Scholar
  46. Uyama, T., Ichi, I., Kono, N., Inoue, A., Tsuboi, K., Jin, X.H., Araki, N., Aoki, J., Arai, H., and Ueda, N. (2012). Regulation of peroxisomal lipid metabolism by catalytic activity of tumor suppressor H-rev107. J Biol Chem 287, 2706–2718.CrossRefPubMedGoogle Scholar
  47. Vaccari, C.M., Romanini, M.V., Musante, I., Tassano, E., Gimelli, S., Divizia, M.T., Torre, M., Morovic, C.G., Lerone, M., Ravazzolo, R., et al. (2014). De novo deletion of chromosome 11q12.3 in monozygotic twins affected by Poland Syndrome. BMC Med Genet 15, 63.CrossRefPubMedPubMedCentralGoogle Scholar
  48. Wang, X., Zheng, Z., Cai, Y., Chen, T., Li, C., Fu, W., and Jiang, Y. (2017). CNVcaller: highly efficient and widely applicable software for detecting copy number variations in large populations. GigaScience 6.Google Scholar
  49. Wong, K.H.Y., Levy-Sakin, M., and Kwok, P.Y. (2018). De novo human genome assemblies reveal spectrum of alternative haplotypes in diverse populations. Nat Commun 9, 3040.CrossRefPubMedPubMedCentralGoogle Scholar
  50. Xiao, S., Xie, D., Cao, X., Yu, P., Xing, X., Chen, C.C., Musselman, M., Xie, M., West, F.D., Lewin, H.A., et al. (2012). Comparative epigenomic annotation of regulatory DNA. Cell 149, 1381–1392.CrossRefPubMedPubMedCentralGoogle Scholar
  51. Xie, C., Mao, X., Huang, J., Ding, Y., Wu, J., Dong, S., Kong, L., Gao, G., Li, C.Y., and Wei, L. (2011). KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases. Nucleic Acids Res 39, W316–W322.CrossRefPubMedPubMedCentralGoogle Scholar
  52. Yan, G., Zhang, G., Fang, X., Zhang, Y., Li, C., Ling, F., Cooper, D.N., Li, Q., Li, Y., van Gool, A.J., et al. (2011). Genome sequencing and comparison of two nonhuman primate animal models, the cynomolgus and Chinese rhesus macaques. Nat Biotechnol 29, 1019–1023.CrossRefPubMedGoogle Scholar
  53. Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B. E., Nussbaum, C., Myers, R.M., Brown, M., Li, W., et al. (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biol 9, R137.CrossRefPubMedPubMedCentralGoogle Scholar
  54. Zhao, Q., Feng, Q., Lu, H., Li, Y., Wang, A., Tian, Q., Zhan, Q., Lu, Y., Zhang, L., Huang, T., et al. (2018). Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice. Nat Genet 50, 278–284.CrossRefPubMedGoogle Scholar

Copyright information

© Science China Press and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Xiaomeng Tian
    • 1
  • Ran Li
    • 1
  • Weiwei Fu
    • 1
  • Yan Li
    • 2
  • Xihong Wang
    • 1
  • Ming Li
    • 1
  • Duo Du
    • 1
  • Qianzi Tang
    • 2
  • Yudong Cai
    • 1
  • Yiming Long
    • 1
  • Yue Zhao
    • 1
  • Mingzhou Li
    • 2
    Email author
  • Yu Jiang
    • 1
    Email author
  1. 1.Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and TechnologyNorthwest A&F UniversityYanglingChina
  2. 2.Institute of Animal Genetics and Breeding, College of Animal Science and TechnologySichuan Agricultural UniversityChengduChina

Personalised recommendations