
Journal of Grid Computing, Volume 1, Issue 2, pp 101–116

Distributed Generation of NASA Earth Science Data Products

  • Bruce R. Barkstrom
  • Thomas H. Hinke
  • Shradha Gavali
  • Warren Smith
  • William J. Seufzer
  • Chaumin Hu
  • David E. Cordner

Abstract

The objective of this work is to develop Grid-based approaches through which NASA data centers can become active participants in serving data users by transforming archived data into the specific form each user needs. This approach involves generating custom data products from data stored in multiple NASA data centers. We describe a prototype developed to explore how Grid technology can facilitate this multi-center product generation. Our initial example of a custom data product is phenomena-based subsetting: producing a subset of a large collection of data based on the subset's association with a phenomenon, such as a mesoscale convective system (severe storm) or a hurricane. We demonstrate that this subsetting can be performed on data located at a single data center or at multiple data centers. We also describe a system that performed customized data product generation using a combination of commodity processors deployed at a NASA data center, Grid technology to access these processors, and data mining software that intelligently selects where to perform processing based on data location and the availability of compute resources. This demonstration also suggests that we could create a catalog of phenomena-related data held at multiple data centers, in which the catalog contains references to the original data in their different locations. Such a catalog is important for giving other users efficient access to the data belonging to an identified phenomenon.
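The site-selection and catalog ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Site` class, the `choose_site` and `catalog_phenomenon` functions, the center names, and the file identifiers are all hypothetical, and the ranking rule (fewest missing files first, then most idle processors) is an assumed stand-in for whatever policy the prototype actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A hypothetical data center, as seen by the scheduler."""
    name: str
    free_cpus: int      # processors currently idle at this center
    files: frozenset    # identifiers of data files archived there


def choose_site(sites, needed_files):
    """Return the site with idle processors that is missing the fewest
    needed files; ties go to the site with more idle processors.
    (Assumed policy, for illustration only.)"""
    needed = frozenset(needed_files)
    usable = [s for s in sites if s.free_cpus > 0]
    if not usable:
        raise RuntimeError("no data center has idle processors")
    return min(usable, key=lambda s: (len(needed - s.files), -s.free_cpus))


def catalog_phenomenon(sites, matching_files):
    """List (site name, file id) references for files tied to a phenomenon.
    The catalog points at the original data; nothing is copied."""
    wanted = set(matching_files)
    return sorted((s.name, f) for s in sites for f in s.files if f in wanted)


# Made-up inventory for three hypothetical centers.
sites = [
    Site("center_a", free_cpus=0, files=frozenset({"f1", "f2", "f3"})),
    Site("center_b", free_cpus=8, files=frozenset({"f1", "f2"})),
    Site("center_c", free_cpus=16, files=frozenset({"f3"})),
]
best = choose_site(sites, ["f1", "f2", "f3"])
catalog = catalog_phenomenon(sites, ["f2", "f3"])
```

In this made-up inventory, `center_a` holds every file but has no idle processors, so the sketch falls back to `center_b`, which would need only one file transferred. The catalog stores references to each site that holds a matching file, so a later user can be directed to the data without it being duplicated.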

Grid mining, phenomena-based subsetting, product generation, subsetting


Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Bruce R. Barkstrom (1)
  • Thomas H. Hinke (2)
  • Shradha Gavali (3)
  • Warren Smith (4)
  • William J. Seufzer (1)
  • Chaumin Hu (3)
  • David E. Cordner (1)

  1. NASA Langley Research Center, USA
  2. NASA Ames Research Center, USA
  3. Advanced Management Technology Inc. and NASA Ames Research Center, USA
  4. Computer Science Corporation and NASA Ames Research Center, USA
