Abstract
Next-generation sequencing technologies have enabled high-density genotyping of large numbers of samples. SNP calling pipelines now produce up to millions of markers, which then need to be filtered in various ways depending on the type of analysis. One of the main challenges still lies in managing a growing volume of genotyping files that are difficult to handle for many applications. Here, we provide a practical guide for efficiently managing large genomic variation data using Gigwa, a user-friendly, scalable and versatile application that may be deployed either remotely on web servers or on a local machine.
Key words
- NoSQL database
- SNP markers
- INDELs
- VCF
- Web tool
- Interoperability
Acknowledgments
This work was technically supported by the CIRAD-UMR AGAP and IRD-Itrop HPC Data Centers of the South Green Bioinformatics platform (https://www.southgreen.fr/). The authors would also like to thank Manuel Ruiz for providing the original idea, Valentin Guignon for developing a Gigwa module for Drupal, and the BrAPI community for interactions that helped improve and promote the software.
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Sempéré, G., Larmande, P., Rouard, M. (2022). Managing High-Density Genotyping Data with Gigwa. In: Edwards, D. (ed) Plant Bioinformatics. Methods in Molecular Biology, vol 2443. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2067-0_21
DOI: https://doi.org/10.1007/978-1-0716-2067-0_21
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2066-3
Online ISBN: 978-1-0716-2067-0
eBook Packages: Springer Protocols
