Distributed and Parallel Databases, Volume 30, Issue 5, pp 325–350

ROARS: a robust object archival system for data intensive scientific computing

  • Hoang Bui
  • Peter Bui
  • Patrick Flynn
  • Douglas Thain

DOI: 10.1007/s10619-012-7103-5

Cite this article as:
Bui, H., Bui, P., Flynn, P. et al. Distrib Parallel Databases (2012) 30: 325. doi:10.1007/s10619-012-7103-5

Abstract

As scientific research becomes more data intensive, there is an increasing need for scalable, reliable, and high-performance storage systems. Such data repositories must provide both data archival services and rich metadata, and must integrate cleanly with large-scale computing resources. ROARS is a hybrid approach to distributed storage that provides both large, robust, scalable storage and efficient rich metadata queries for scientific applications. In this paper, we present the design and implementation of ROARS, focusing primarily on the challenge of maintaining data integrity across long time scales. We evaluate the performance of ROARS on a storage cluster, comparing it with the Hadoop distributed file system and a centralized file server. We observe that ROARS has read and write performance that scales with the number of storage nodes, and integrity-checking time that scales with the size of the largest node. We demonstrate the ability of ROARS to function correctly through multiple system failures and reconfigurations. ROARS has been in production use for over three years as the primary data repository for a biometrics research lab at the University of Notre Dame.
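The abstract states that integrity checking scales with the size of the largest node rather than with the total archive. One way such a property can arise is for every node to audit only its own replicas, with all nodes working concurrently. The Python sketch below illustrates that idea under stated assumptions: a hypothetical metadata catalog mapping each object ID to a SHA-256 checksum, and a hypothetical placement map listing which nodes should hold each replica. The names `audit_node`, `audit_cluster`, and `find_missing` and the in-memory data layout are illustrative only, not ROARS's actual interfaces or algorithm.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical layout (not ROARS's real schema):
#   nodes:     node name -> {object_id -> stored bytes}
#   catalog:   object_id -> expected SHA-256 hex digest
#   placement: object_id -> list of node names that should hold a replica
Store = Dict[str, bytes]
Issue = Tuple[str, str]  # (node, object_id)


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def audit_node(node: str, store: Store, catalog: Dict[str, str]) -> List[Issue]:
    """Recompute checksums for every replica on one node and flag any that
    differ from the catalog or that the catalog does not know about."""
    bad = []
    for object_id, data in store.items():
        expected = catalog.get(object_id)
        if expected is None or sha256_hex(data) != expected:
            bad.append((node, object_id))
    return bad


def audit_cluster(nodes: Dict[str, Store], catalog: Dict[str, str]) -> List[Issue]:
    # Each node scans only its local replicas and all nodes run at once,
    # so wall-clock time is bounded by the node holding the most data.
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        futures = [pool.submit(audit_node, name, store, catalog)
                   for name, store in nodes.items()]
        issues: List[Issue] = []
        for future in futures:
            issues.extend(future.result())
    return sorted(issues)


def find_missing(nodes: Dict[str, Store],
                 placement: Dict[str, List[str]]) -> List[Issue]:
    """Report replicas the placement map expects but the node lacks."""
    missing = []
    for object_id, holders in placement.items():
        for node in holders:
            if object_id not in nodes.get(node, {}):
                missing.append((node, object_id))
    return sorted(missing)
```

For example, if node `n2` holds a corrupted copy of object `b`, `audit_cluster` reports `("n2", "b")`, and a node that was supposed to hold `b` but does not is reported by `find_missing`. A real archive would also need to schedule re-replication from a healthy copy; that step is omitted here.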

Keywords

Distributed storage, Distributed system, Archive system

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Hoang Bui (1)
  • Peter Bui (1)
  • Patrick Flynn (1)
  • Douglas Thain (1)
  1. University of Notre Dame, Notre Dame, USA