
Multicore and Cloud-Based Solutions for Genomic Variant Analysis

  • Cristina Y. González
  • Marta Bleda
  • Francisco Salavert
  • Rubén Sánchez
  • Joaquín Dopazo
  • Ignacio Medina
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7640)

Abstract

Genomic variant analysis is a complex process that makes it possible to find and study genome mutations. It requires analyses and tests from both biological and statistical points of view. The data for this kind of analysis are typically stored in the Variant Call Format (VCF), in gigabyte-sized files that cannot be processed efficiently with conventional software.
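Each VCF data line holds eight mandatory tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by FORMAT and per-sample genotype columns, while header lines begin with `#`. The following minimal C sketch splits one data line in place. `vcf_record_t` and `parse_vcf_line` are hypothetical names chosen for illustration, not the API of the HPG VCF library, and the sketch skips work a real parser would need, such as splitting INFO sub-fields, handling multi-allelic ALT values and validating POS.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type: the eight mandatory VCF columns.
 * String fields point into the caller's line buffer (no copies). */
typedef struct {
    char *chrom;
    long  pos;
    char *id;
    char *ref;
    char *alt;
    char *qual;
    char *filter;
    char *info;
} vcf_record_t;

/* Parses one VCF line in place by overwriting tabs with '\0'.
 * Returns 1 for header lines ("##..." or "#CHROM..."), 0 on success,
 * and -1 if fewer than eight columns are present.
 * FORMAT and sample columns, if any, are left unparsed. */
int parse_vcf_line(char *line, vcf_record_t *rec) {
    assert(line != NULL && rec != NULL);
    if (line[0] == '#') return 1;

    line[strcspn(line, "\r\n")] = '\0';   /* drop trailing newline */

    char *fields[8];
    char *cursor = line;
    for (int i = 0; i < 8; i++) {
        if (cursor == NULL) return -1;     /* ran out of columns */
        fields[i] = cursor;
        char *tab = strchr(cursor, '\t');
        if (tab != NULL) {
            *tab = '\0';                   /* terminate this field */
            cursor = tab + 1;
        } else {
            cursor = NULL;                 /* this was the last column */
        }
    }

    rec->chrom  = fields[0];
    rec->pos    = strtol(fields[1], NULL, 10);
    rec->id     = fields[2];
    rec->ref    = fields[3];
    rec->alt    = fields[4];
    rec->qual   = fields[5];
    rec->filter = fields[6];
    rec->info   = fields[7];
    return 0;
}
```

Because the fields point into the caller's buffer, parsing a line needs no per-field allocation, which is one way to keep per-line cost low when a file holds millions of records.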

In this paper, we introduce part of the High Performance Genomics (HPG) project, whose goal is to develop a collection of efficient, open-source software applications for genomics. The paper focuses mainly on HPG Variant, a suite for determining the effect of mutations and for conducting genome-wide and family-based analyses, built on a multi-tier architecture based on the CellBase database and a RESTful web service API. Two user clients are also provided, an HTML5 web client and a command-line interface, both relying on a back-end parallelized with OpenMP. Alongside HPG Variant, we have developed a library for handling VCF files and a collection of utilities for preprocessing them.
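As an illustration of how a per-variant analysis can be parallelized with OpenMP, the sketch below computes a basic allelic chi-square statistic (1 degree of freedom, no continuity correction) for every variant in a case-control genotype matrix. It is a hypothetical example, not HPG Variant's code: the function names, the coding of genotypes as alternate-allele counts (0, 1, 2) with -1 for missing calls, and the row-major matrix layout are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Pearson chi-square (1 df, no continuity correction) for the 2x2 table
 *              alt   ref
 *   cases       a     b
 *   controls    c     d
 * Returns 0.0 when any margin is zero (statistic undefined). */
double allelic_chi2(double a, double b, double c, double d) {
    double n = a + b + c + d;
    double denom = (a + b) * (c + d) * (a + c) * (b + d);
    if (denom == 0.0) return 0.0;
    double diff = a * d - b * c;
    return n * diff * diff / denom;
}

/* Hypothetical layout: geno[v * n_samples + s] holds the number of
 * alternate alleles (0, 1 or 2) of sample s at variant v, or -1 if
 * missing; is_case[s] is nonzero for cases. Writes one statistic per
 * variant into chi2[]. Variants are independent, so the outer loop is
 * split across threads; each iteration writes only chi2[v]. Without
 * OpenMP support the pragma is ignored and the loop runs serially. */
void allelic_test_all(const signed char *geno, const int *is_case,
                      long n_variants, long n_samples, double *chi2) {
    assert(geno != NULL && is_case != NULL && chi2 != NULL);
    #pragma omp parallel for schedule(static)
    for (long v = 0; v < n_variants; v++) {
        double a = 0, b = 0, c = 0, d = 0;   /* thread-private counts */
        const signed char *row = geno + v * n_samples;
        for (long s = 0; s < n_samples; s++) {
            int g = row[s];
            if (g < 0) continue;              /* skip missing calls */
            if (is_case[s]) { a += g; b += 2 - g; }
            else            { c += g; d += 2 - g; }
        }
        chi2[v] = allelic_chi2(a, b, c, d);
    }
}
```

Compiled with `gcc -fopenmp`, the variant loop is distributed across threads; since no two iterations write the same element of `chi2`, no locking is needed. Static scheduling is a reasonable choice here because every iteration does roughly the same work (one pass over the samples).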

Performance comparisons with other applications such as PLINK, GenABEL, SNPTEST and VCFtools show favorable results.

Keywords

Multicore, OpenMP, web service, genomic variant analysis, mutation

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Cristina Y. González (1)
  • Marta Bleda (1, 2)
  • Francisco Salavert (1, 2)
  • Rubén Sánchez (1)
  • Joaquín Dopazo (1, 2, 3)
  • Ignacio Medina (1, 3)

  1. Computational Genomics Institute, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
  2. CIBER de Enfermedades Raras (CIBERER), Valencia, Spain
  3. Functional Genomics Node, Instituto Nacional de Bioinformática (INB) at CIPF, Valencia, Spain
