Web Services Interface to Run Protein Sequence Tools on Grid, Testcase of Protein Sequence Alignment

  • Christophe Blanchet
  • Christophe Combet
  • Vladimir Daric
  • Gilbert Deléage
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4345)

Abstract

Bioinformatics analysis of the data produced by high-throughput biology, for instance genome projects, is one of the major challenges of the coming years. Such analysis requires access to up-to-date databanks (of sequences, patterns, 3D structures, etc.) and to relevant algorithms (for sequence similarity, multiple alignment, pattern scanning, etc.). GPS@ (Grid Protein Sequence Analysis, http://gpsa-pbil.ibcp.fr) is a Web portal devoted to bioinformatics applications on the grid. It is the grid release of the NPS@ bioinformatics portal and wraps the mechanisms required to submit bioinformatics analyses to the grid infrastructure. For example, we have put online two multiple alignment Web Services that submit their computing jobs to a remote grid environment. One is accessible through a classical Web interface with a simple Web browser; the other can be used through a SOAP and workflow client such as Taverna or Triana. These Web Services can run the submitted alignment on two different computing environments: a local, classical cluster of 30 CPUs, and a large-scale distributed one, the grid platform of the EU-EGEE project (more than 20,000 CPUs available at the European scale).
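
The abstract does not describe how GPS@ actually builds its grid jobs. As an illustration only, the sketch below shows the kind of job description such a wrapper might generate before handing an alignment to the EGEE middleware, which describes jobs in a Job Description Language (JDL) of `Attribute = value;` lines. The choice of ClustalW as the alignment program, its install path on the worker node, the file names, and the helper names are all hypothetical assumptions, not details taken from the paper.

```python
import os


def quote(value: str) -> str:
    # JDL string literals are double-quoted; escape backslashes and embedded quotes.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def jdl_list(values) -> str:
    # JDL lists are written as {"a", "b", ...}.
    return "{" + ", ".join(quote(v) for v in values) + "}"


def build_alignment_jdl(fasta_path: str,
                        executable: str = "/usr/local/bin/clustalw") -> str:
    """Hypothetical helper: describe one multiple alignment job in JDL.

    The executable path is an assumption about the worker node's software
    installation, not a documented GPS@ setting.
    """
    # Sandbox files are staged into the job's working directory, so the
    # program is given bare file names rather than local paths.
    name = os.path.basename(fasta_path)
    aln = os.path.splitext(name)[0] + ".aln"
    attributes = [
        ("Executable", quote(executable)),
        ("Arguments", quote(f"-infile={name} -outfile={aln}")),
        ("StdOutput", quote("std.out")),
        ("StdError", quote("std.err")),
        # Files shipped from the submitting machine to the worker node.
        ("InputSandbox", jdl_list([fasta_path])),
        # Files brought back to the submitter after the job completes.
        ("OutputSandbox", jdl_list(["std.out", "std.err", aln])),
    ]
    return "".join(f"{key} = {value};\n" for key, value in attributes)


if __name__ == "__main__":
    print(build_alignment_jdl("data/seqs.fasta"), end="")
```

Listing the FASTA file in the input sandbox and the `.aln` result in the output sandbox lets the middleware move the data in both directions, so the remote worker node does not need a file system shared with the portal.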

Keywords

Bioinformatics · Grid computing · Web Services · Protein Sequence Analysis

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Christophe Blanchet (1)
  • Christophe Combet (1)
  • Vladimir Daric (1)
  • Gilbert Deléage (1)

  1. Institut de Biologie et Chimie des Protéines (IBCP UMR 5086); CNRS; Univ. Lyon 1; IFR128 BioSciences Lyon-Gerland, Lyon, France