Multicore and Cloud-Based Solutions for Genomic Variant Analysis

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 7640)

Abstract

Genomic variant analysis is a complex process for finding and studying genome mutations. It requires analyses and tests from both biological and statistical points of view. Biological data for this kind of analysis are typically stored in the Variant Call Format (VCF), in gigabyte-sized files that conventional software cannot process efficiently.
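
As an illustration of the file format mentioned above, the following is a minimal sketch of reading VCF data lines one at a time, so that a gigabyte-sized file never has to be held in memory at once. It is not the HPG implementation: the names `VcfRecord`, `parse_vcf_line` and `iter_vcf_records` are hypothetical, and the sketch assumes only the eight fixed, tab-separated VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), ignoring any per-sample genotype columns.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Union


class VcfRecord(NamedTuple):
    """Hypothetical container for the eight fixed VCF columns."""
    chrom: str
    pos: int
    variant_id: str
    ref: str
    alt: List[str]
    qual: str
    filters: str
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one tab-separated VCF data line (assumption: at least 8 columns)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, variant_id, ref, alt, qual, filters, info = fields[:8]

    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info_dict: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            info_dict[key] = value if sep else True

    # ALT may list several comma-separated alternate alleles.
    return VcfRecord(chrom, int(pos), variant_id, ref, alt.split(","),
                     qual, filters, info_dict)


def iter_vcf_records(lines: Iterable[str]) -> Iterator[VcfRecord]:
    """Yield records lazily, skipping header lines (starting with '#') and blanks."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)
```

Because `iter_vcf_records` accepts any iterable of lines, an open file handle can be passed directly and records are produced as the file is read.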

In this paper, we introduce part of the High Performance Genomics (HPG) project, whose goal is to develop a collection of efficient, open-source software applications for genomics. The paper focuses on HPG Variant, a suite for determining the effect of mutations and for conducting genome-wide and family-based analyses, built on a multi-tier architecture based on the CellBase database and a RESTful web service API. Two user clients are also provided: an HTML5 web client and a command-line interface, both using a back end parallelized with OpenMP. Along with HPG Variant, a library for handling VCF files and a collection of utilities for preprocessing them have been developed.

Performance comparisons with other applications such as PLINK, GenABEL, SNPTEST and VCFtools show favorable results.

Keywords

  • Multicore
  • OpenMP
  • web service
  • genomic variant analysis
  • mutation

References

  1. Majewski, J., Schwartzentruber, J., Lalonde, E., Montpetit, A., Jabado, N.: What can exome sequencing do for you? J. Med. Genet. 48(9), 580–589 (2011)

  2. Danecek, P., Auton, A., Abecasis, G., Albers, C.A., Banks, E., DePristo, M.A., Handsaker, R., Lunter, G., Marth, G., Sherry, S.T., McVean, G., Durbin, R., 1000 Genomes Project Analysis Group: The Variant Call Format and VCFtools. Bioinformatics 27, 2156–2158 (2011)

  3. Tarraga, J., Gonzalez, C.Y., Requena, V., Dopazo, J., Medina, I.: High Performance Genomics (HPG) Project, http://docs.bioinfo.cipf.es/projects/hpg-project

  4. Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A., Bender, D., Maller, J., Sklar, P., de Bakker, P.I., Daly, M.J., Sham, P.C.: PLINK: a toolset for whole-genome association and population-based linkage analysis. Am. J. Hum. Genet. 81(3), 559–575 (2007)

  5. Purcell, S.: PLINK v1.07, http://pngu.mgh.harvard.edu/purcell/plink/

  6. Aulchenko, Y.S., Ripke, S., Isaacs, A., van Duijn, C.M.: GenABEL: an R library for genome-wide association analysis. Bioinformatics 23(10), 1294–1296 (2007)

  7. Marchini, J., Howie, B.: Genotype imputation for genome-wide association studies. Nat. Rev. Genet. 11(7), 499–511 (2010)

  8. Medina, I., De María, A., Bleda, M., Salavert, F., Alonso, R., Gonzalez, C.Y., Dopazo, J.: VARIANT: Command Line, Web service, and Web interface for fast and accurate functional characterization of variants found by Next Generation Sequencing. Nucleic Acids Res., Web Server Issue 40(W1), W54–W58 (2012), http://docs.bioinfo.cipf.es/projects/variant/wiki/

  9. Bleda, M., Tarraga, J., De Maria, A., Salavert, F., Garcia-Alonso, L., Celma, M., Martin, A., Dopazo, J., Medina, I.: CellBase, a comprehensive collection of RESTful web services for retrieving relevant biological information from heterogeneous sources. Nucleic Acids Res., Web Server Issue 40(W1), W609–W614 (2012)

  10. Hindorff, L.A., Sethupathy, P., Junkins, H.A., Ramos, E.M., Mehta, J.P., Collins, F.S., Manolio, T.A.: Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA (2009)

  11. Bleda, M., Tarraga, J., De Maria, A., Salavert, F., Garcia-Alonso, L., Celma, M., Martin, A., Dopazo, J., Medina, I.: CellBase v1, http://docs.bioinfo.cipf.es/projects/cellbase/wiki

  12. Medina, I., De Maria, A., Alonso, R., Salavert, F., Dopazo, J.: Genome Maps, a new generation of genome browser based on HTML5. Unpublished work, http://genomemaps.org/

  13. Thurston, A.D.: Parsing Computer Languages with an Automaton Compiled from a Single Regular Expression. In: Ibarra, O.H., Yen, H.-C. (eds.) CIAA 2006. LNCS, vol. 4094, pp. 285–286. Springer, Heidelberg (2006)

  14. Anderson, C.A., Pettersson, F.H., Clarke, G.M., Cardon, L.R., Morris, A.P., Zondervan, K.T.: Data quality control in genetic case-control association studies. Nat. Protoc. 5, 1564–1573 (2010)

  15. Ott, J., Kamatani, Y., Lathrop, M.: Family-based designs for genome-wide association studies. Nat. Rev. Genet. 12, 465–474 (2011)

  16. Clarke, G.M., Anderson, C.A., Pettersson, F.H., Cardon, L.R., Morris, A.P., Zondervan, K.T.: Basic statistical analysis in genetic case-control studies. Nat. Protoc. 6(2), 121–133 (2011)

  17. Bolk Gabriel, S., Salomon, R., Pelet, A., Angrist, M., Amiel, J., Fornage, M., Attie-Bitach, T., Olson, J.M., Hofstra, R., Buys, C., Steffann, J., Munnich, A., Lyonnet, S., Chakravarti, A.: Segregation at three loci explains familial and population risk in Hirschsprung disease. Nature Genet. 31, 89–93 (2002)

  18. Emison, E.S., McCallion, A.S., Kashuk, C.S., Bush, R.T., Grice, E., Lin, S., Portnoy, M.E., Cutler, D.J., Green, E.D., Chakravarti, A.: A common sex-dependent mutation in a RET enhancer underlies Hirschsprung disease risk. Nature 434, 857–863 (2005)

  19. Laird, N., Horvath, S., Xu, X.: Implementing a unified approach to family based tests of association. Genet. Epidemiol. 19(suppl. 1), S36–S42 (2000)

  20. Dudbridge, F.: Likelihood-based association analysis for nuclear families and unrelated subjects with missing genotype data. Hum. Hered. 66, 87–98 (2008)

  21. Purcell, S.: PLINK/SEQ v0.08, http://atgu.mgh.harvard.edu/plinkseq/

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

González, C.Y., Bleda, M., Salavert, F., Sánchez, R., Dopazo, J., Medina, I. (2013). Multicore and Cloud-Based Solutions for Genomic Variant Analysis. In: , et al. Euro-Par 2012: Parallel Processing Workshops. Euro-Par 2012. Lecture Notes in Computer Science, vol 7640. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36949-0_30

  • DOI: https://doi.org/10.1007/978-3-642-36949-0_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36948-3

  • Online ISBN: 978-3-642-36949-0

  • eBook Packages: Computer Science, Computer Science (R0)