Managing High-Density Genotyping Data with Gigwa

Part of the Methods in Molecular Biology book series (MIMB, volume 2443)

Abstract

Next-generation sequencing technologies have enabled high-density genotyping of large numbers of samples. SNP calling pipelines now produce up to millions of markers, which then need to be filtered in various ways depending on the type of analysis. One of the main challenges still lies in managing an increasing volume of genotyping files, which are difficult to handle for many applications. Here, we provide a practical guide for efficiently managing large genomic variation data using Gigwa, a user-friendly, scalable, and versatile application that may be deployed either remotely on web servers or on a local machine.

Key words

  • NoSQL database
  • SNP markers
  • INDELs
  • VCF
  • Web tool
  • Interoperability

Acknowledgments

This work was technically supported by the CIRAD-UMR AGAP and IRD-Itrop HPC Data Centers of the South Green Bioinformatics platform (https://www.southgreen.fr/). The authors would also like to thank Manuel Ruiz for providing the original idea, Valentin Guignon for developing a Gigwa module for Drupal, and the BrAPI community for interactions that helped improve and promote the software.

Author information

Corresponding author

Correspondence to Pierre Larmande.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Sempéré, G., Larmande, P., Rouard, M. (2022). Managing High-Density Genotyping Data with Gigwa. In: Edwards, D. (eds) Plant Bioinformatics. Methods in Molecular Biology, vol 2443. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2067-0_21

  • DOI: https://doi.org/10.1007/978-1-0716-2067-0_21

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2066-3

  • Online ISBN: 978-1-0716-2067-0

  • eBook Packages: Springer Protocols