
High Throughput Comparison of Prokaryotic Genomes

  • Luciana Carota
  • Lisa Bartoli
  • Piero Fariselli
  • Pier L. Martelli
  • Ludovica Montanucci
  • Giorgio Maggi
  • Rita Casadio
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4967)

Abstract

This work addresses the optimization of grid computing performance for a data-intensive, high-"throughput" comparison of protein sequences. We borrow the word "throughput" from telecommunication science to mean the number of concurrent independent jobs on the grid. All the proteins of 355 completely sequenced prokaryotic organisms were compared to find common traits of prokaryotic life, producing in parallel tens of gigabytes of information to store, duplicate, check, and analyze. To support a large number of concurrent runs with data access on shared storage devices, and to keep the data format manageable, the output was stored in many flat files organized in a semantic logical/physical directory structure. Because many concurrent runs can cause a read bottleneck on the same storage device, we propose methods to optimize the grid computation by balancing wide data access against the emergence of read bottlenecks. The proposed analytical approach not only optimizes the duration of the overall task, but also checks whether the estimated duration is compliant with the scientific requirements and whether grid computing is really advantageous compared to execution on a local farm.
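The trade-off between concurrency and read bottlenecks can be illustrated with a toy model. The sketch below is not the analytical approach of the paper, whose details are not given in the abstract. It assumes that jobs run in waves of equal size, that each job reads a fixed amount of data, and that the shared storage bandwidth is split evenly among concurrent readers. All function names, parameters, and numbers are hypothetical.

```python
import math


def estimated_duration(n_jobs, concurrency, compute_s, read_bytes,
                       per_job_bw, shared_bw):
    """Toy wall-clock estimate (seconds) for n_jobs independent jobs.

    Assumption: a job reads at its own link speed (per_job_bw, bytes/s)
    until the shared device (shared_bw, bytes/s) saturates; beyond that
    point the shared bandwidth is divided evenly among concurrent jobs.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be >= 1")
    effective_bw = min(per_job_bw, shared_bw / concurrency)
    job_time = compute_s + read_bytes / effective_bw
    waves = math.ceil(n_jobs / concurrency)  # jobs run in full waves
    return waves * job_time


def best_concurrency(n_jobs, max_concurrency, **params):
    """Concurrency level in 1..max_concurrency with the smallest estimate."""
    return min(range(1, max_concurrency + 1),
               key=lambda k: estimated_duration(n_jobs, k, **params))


def grid_is_worthwhile(grid_s, local_s, deadline_s):
    """True if the grid estimate meets the deadline and beats the local farm."""
    return grid_s <= deadline_s and grid_s < local_s


# Hypothetical workload: 100 jobs, 60 s of CPU and 1 GB of input each.
params = dict(compute_s=60, read_bytes=1e9, per_job_bw=1e8, shared_bw=5e8)
grid_s = estimated_duration(100, 10, **params)   # 10 concurrent grid jobs
local_s = estimated_duration(100, 5, **params)   # 5 slots on a local farm
print(grid_s, local_s, grid_is_worthwhile(grid_s, local_s, deadline_s=1000))
```

In this simplified model the per-job read time grows with the number of concurrent readers once the shared device saturates, so adding jobs yields diminishing returns. A realistic estimate would also need scheduling overheads and measured storage behaviour, which are outside this sketch.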

Keywords

Virtual Organization, Prokaryotic Genome, Service Probability, Blast Comparison, Blast Output
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Luciana Carota (1)
  • Lisa Bartoli (2)
  • Piero Fariselli (2)
  • Pier L. Martelli (2)
  • Ludovica Montanucci (2)
  • Giorgio Maggi (3)
  • Rita Casadio (2)

  1. CNAF-INFN, National Institute of Nuclear Physics, Bologna, Italy
  2. Biocomputing Group, University of Bologna, Italy
  3. BA-INFN, National Institute of Nuclear Physics, Bari, Italy
