Scalable Repositories for Virtual Clusters

  • Paolo Anedda
  • Simone Leo
  • Massimo Gaggero
  • Gianluigi Zanetti
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6043)


For a large class of scientific data analysis applications, the sheer size of the datasets makes it increasingly important to be able to perform the analysis directly where the data are stored, rather than on remote computational clusters. One possible strategy is to use virtual clusters, which guarantee a high degree of isolation from the underlying physical computational infrastructure and require only a very compact initial description. Deploying, saving and restoring HPC-dedicated virtual clusters, however, places a different class of requirements on the virtual machine management infrastructure, in particular on storage I/O, whose scalability limits are easily reached. Here we discuss an alternative approach based on a storage model that exploits the WORM (write once, read many) character of the data used by VM management to increase, in a scalable way, the aggregate data bandwidth available to virtual-cluster-level operations, and we provide preliminary results indicating that it is a viable solution.
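The bandwidth argument in the abstract can be illustrated with a back-of-envelope model. The sketch below is not the authors' implementation or measurement: it assumes an idealized network in which a single image server is limited by one link, while write-once image data replicated over several storage nodes can be read from all of them in parallel. The function names, parameters and the figures in the usage note are hypothetical.

```python
def deploy_time_central(n_vms, image_gb, server_gbps):
    """Seconds to deliver n_vms images when every read shares one server link.

    Assumption: the server link is the only bottleneck (no disk or protocol overhead).
    """
    total_gbit = n_vms * image_gb * 8  # gigabytes -> gigabits
    return total_gbit / server_gbps


def deploy_time_distributed(n_vms, image_gb, node_gbps, n_storage_nodes):
    """Seconds to deliver n_vms images when immutable image data is spread
    over n_storage_nodes and reads are balanced evenly across them.

    Assumptions: replicas already exist (replication cost is ignored) and
    every storage node and every reader has a link of node_gbps.
    """
    # Aggregate bandwidth is capped both by the storage side and by the readers.
    aggregate_gbps = node_gbps * min(n_storage_nodes, n_vms)
    total_gbit = n_vms * image_gb * 8
    return total_gbit / aggregate_gbps


if __name__ == "__main__":
    # Hypothetical figures: 64 VMs, 4 GB images, 1 Gbit/s links, 16 storage nodes.
    print(deploy_time_central(64, 4, 1.0))
    print(deploy_time_distributed(64, 4, 1.0, 16))
```

Under these assumptions the deployment time falls in proportion to the number of storage nodes serving the read-only data, which is the kind of scaling the abstract refers to. Real deployments would also be limited by disk throughput, replication cost and network contention, none of which is modeled here.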





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Paolo Anedda (1)
  • Simone Leo (1)
  • Massimo Gaggero (1)
  • Gianluigi Zanetti (1)
  1. CRS4 Distributed Computing Group, Pula, Italy
