Scale and performance in semantic storage management of data grids

  • Stergios V. Anastasiadis
  • Syam Gadde
  • Jeffrey S. Chase
Regular contribution

Abstract

Data grids are middleware systems that offer secure shared storage of massive scientific datasets over wide-area networks. The main challenge in their design is to provide reliable storage, search, and transfer of numerous or large files over geographically dispersed heterogeneous platforms. The Storage Resource Broker (SRB) is an example of a system that provides these services and that has been deployed in multiple high-performance scientific projects during the past few years. In this paper, we take a detailed look at several of its functional features and examine its scalability using synthetic and trace-based workloads. Unlike traditional file systems, SRB uses a commodity database to manage both system- and user-defined metadata. We quantitatively evaluate this decision and draw conclusions about its implications for the system architecture and performance characteristics. We find that the bulk transfer facilities of SRB scale well, and we identify the bottleneck resources across different data search and transfer tasks. We examine the sensitivity of performance to several configuration parameters and detail how different internal operations contribute to overall performance.

Keywords

Data grids, Middleware systems, Distributed storage systems, Semantic Web

Copyright information

© Springer-Verlag 2004

Authors and Affiliations

  • Stergios V. Anastasiadis (1, 2)
  • Syam Gadde (2)
  • Jeffrey S. Chase (1)
  1. Department of Computer Science, Duke University, Durham, USA
  2. Duke-UNC Brain Imaging and Analysis Center, Durham, USA