Abstract
The internet has provided a means to share scientific data across groups and disciplines, enabling integrated research that extends beyond the local computing environment. But the organization and curation of data pose challenges, both because some data are sensitive and must be protected from unauthorized use, and because datasets are heterogeneous and large in both size and number. Moreover, metadata is coming to the fore as a means not only of discovering datasets of interest but also of organizing them. SDSC has developed data management systems to facilitate the use of published digital objects. The associated infrastructure includes persistent archives for managing technology evolution, data handling systems for collection-based access to data, collection management systems for organizing information catalogs, digital library services for manipulating data sets, and data grids for federating multiple collections. Together, these components provide systems for digital object management, information management, and knowledge management. We discuss examples of the application of the technology, including distributed collections and data grids for astronomical sky surveys, high energy physics data collections, ecology, and art image digital libraries.
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rajasekar, A.K., Moore, R.W. (2001). Data and Metadata Collections for Scientific Applications. In: Hertzberger, B., Hoekstra, A., Williams, R. (eds) High-Performance Computing and Networking. HPCN-Europe 2001. Lecture Notes in Computer Science, vol 2110. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48228-8_8
DOI: https://doi.org/10.1007/3-540-48228-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42293-8
Online ISBN: 978-3-540-48228-4
eBook Packages: Springer Book Archive