Data and Metadata Collections for Scientific Applications
The Internet has provided a means to share scientific data across groups and disciplines, enabling integrated research that extends beyond the local computing environment. However, organizing and curating these data is challenging. The data are often sensitive and must be protected from unauthorized use. They are also heterogeneous and large in volume, both in the size of individual datasets and in their number. Metadata is becoming increasingly important, both as a means of discovering datasets of interest and as a way of organizing them. SDSC has developed data management systems to facilitate the use of published digital objects. The associated infrastructure includes persistent archives for managing technology evolution, data handling systems for collection-based access to data, collection management systems for organizing information catalogs, digital library services for manipulating data sets, and data grids for federating multiple collections. Together, these components provide systems for digital object management, information management, and knowledge management. We discuss examples of the application of this technology, including distributed collections and data grids for astronomical sky surveys, high-energy physics data collections, ecology, and art image digital libraries.
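The sketch below shows one way the ideas above could fit together. These ideas are collection-based access through logical names, metadata-driven discovery, access control lists, and a data grid that federates several collections under one namespace. It assumes a simple in-memory design. All names (`DigitalObject`, `Collection`, `discover`, `resolve`, `federate`) are hypothetical illustrations. They are not the interface of SDSC's actual systems.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical catalog entry: one logical object with physical replicas."""
    logical_name: str
    replicas: list[str]                                  # physical locations (hypothetical)
    metadata: dict[str, str] = field(default_factory=dict)
    readers: set[str] = field(default_factory=set)       # simple access control list


class Collection:
    """Hypothetical collection catalog: logical namespace, metadata, and ACLs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: dict[str, DigitalObject] = {}

    def register(self, obj: DigitalObject) -> None:
        self.objects[obj.logical_name] = obj

    def discover(self, user: str, **criteria: str) -> list[str]:
        """Return logical names the user may read whose metadata match every criterion."""
        return sorted(
            o.logical_name
            for o in self.objects.values()
            if user in o.readers
            and all(o.metadata.get(k) == v for k, v in criteria.items())
        )

    def resolve(self, user: str, logical_name: str) -> str:
        """Map a logical name to its first physical replica, enforcing the ACL."""
        obj = self.objects[logical_name]
        if user not in obj.readers:
            raise PermissionError(f"{user} may not read {logical_name}")
        return obj.replicas[0]


def federate(*collections: Collection) -> Collection:
    """Hypothetical data-grid federation: one namespace over several collections.

    Each logical name is prefixed with its collection name to avoid clashes.
    """
    grid = Collection("grid")
    for coll in collections:
        for obj in coll.objects.values():
            grid.register(DigitalObject(
                logical_name=f"{coll.name}/{obj.logical_name}",
                replicas=list(obj.replicas),
                metadata=dict(obj.metadata),
                readers=set(obj.readers),
            ))
    return grid
```

For example, federating a sky-survey collection with an ecology collection lets a user run one attribute query across both. Only the objects that user's ACL entries permit are returned. Users work only with logical names, so the physical replicas behind them can move to new storage systems without changing how the data are named. That property is one plausible way to support managing technology evolution.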
Keywords: Digital Library; Data Grid; Access Control List; Data Intensive Computing; Information Management Technology