Building a sequence map of the pig pan-genome from multiple de novo assemblies and Hi-C data
Pigs were domesticated independently in the Near East and China, so a single reference genome from one individual cannot represent the full spectrum of divergent sequences in pigs worldwide. In this study, 12 de novo pig assemblies from Eurasia were therefore compared to identify sequences missing from the reference genome. In total, 72.5 Mb of non-redundant sequences (∼3% of the genome) were absent from the reference genome (Sscrofa11.1) and were defined as pan-sequences. Of these pan-sequences, 9.0 Mb were dominant in Chinese pigs but occurred at low frequency in European pigs. One sequence dominant in Chinese pigs contained the complete genic region of tazarotene-induced gene 3 (TIG3), which is involved in fatty acid metabolism. Using flanking sequences and Hi-C-based methods, 27.7% of the pan-sequences could be anchored to the reference genome. Supplementing the reference with these sequences could contribute to a more accurate interpretation of the 3D chromatin structure. A web-based pan-genome database is also provided as a primary resource for exploring genetic diversity and for promoting pig breeding and biomedical research.
Keywords: pan-genome, pig, reference genome, 3D chromatin structure, presence-absence variation
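The abstract describes pan-sequences that are "dominant in Chinese pigs" but at low frequency in European pigs. The following is a minimal sketch of how such a classification could be expressed from a presence-absence table. It is an illustration only, not the authors' pipeline: the sample names, the toy presence-absence calls, the function names, and the frequency thresholds (0.75 and 0.25) are all hypothetical assumptions.

```python
from typing import Dict, List


def population_frequency(presence: Dict[str, bool], samples: List[str]) -> float:
    """Return the fraction of the given samples in which a sequence is present."""
    if not samples:
        raise ValueError("population has no samples")
    return sum(1 for s in samples if presence[s]) / len(samples)


def is_dominant_in(
    presence: Dict[str, bool],
    focal: List[str],
    other: List[str],
    min_focal: float = 0.75,  # hypothetical threshold, not from the paper
    max_other: float = 0.25,  # hypothetical threshold, not from the paper
) -> bool:
    """True if the sequence is frequent in `focal` and rare in `other`."""
    return (
        population_frequency(presence, focal) >= min_focal
        and population_frequency(presence, other) <= max_other
    )


# Toy data: hypothetical sample identifiers and presence-absence calls.
chinese = ["CN1", "CN2", "CN3", "CN4", "CN5"]
european = ["EU1", "EU2", "EU3", "EU4", "EU5"]

pav = {
    # Present in every Chinese sample, absent from every European sample.
    "seq_A": {s: s.startswith("CN") for s in chinese + european},
    # Present in every sample, so not population-specific.
    "seq_B": {s: True for s in chinese + european},
}

for name, calls in pav.items():
    label = "dominant in Chinese pigs" if is_dominant_in(calls, chinese, european) else "not population-specific"
    print(f"{name}: {label}")
```

In practice the presence-absence calls would come from read alignment or assembly comparison, and the thresholds would need to be chosen and justified for the populations at hand.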
This work was supported by the National Natural Science Foundation of China (31822052 and 31572381) to Y.J. and the Science & Technology Support Program of Sichuan (2016NYZ0042 and 2017NZDZX0002) to M.Z.L. We thank the High Performance Computing platform of Northwest A&F University for assistance with computing.