Data Management in Dynamic Environment-driven Computational Science

  • Yogesh L. Simmhan
  • Sangmi Lee Pallickara
  • Nithya N. Vijayakumar
  • Beth Plale
Part of the IFIP International Federation for Information Processing book series (IFIPAICT, volume 239)


Advances in numerical modeling, computational hardware, and problem-solving environments have driven the growth of computational science over the past decades. Science gateways, based on service-oriented architectures and scientific workflows, take yet another step toward democratizing access to advanced numerical and scientific tools, computational resources, and massive data storage, and toward fostering collaboration. Dynamic, data-driven applications, such as those found in weather forecasting, present interesting challenges to science gateways; these challenges are being addressed as part of the LEAD Cyberinfrastructure project. In this article, we discuss three important data-related problems faced by such adaptive, data-driven environments: managing a user’s personal workspace and metadata on the Grid, tracking the provenance of scientific workflows and data products, and continuous data mining over observational weather data.
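
To make the second problem, provenance tracking, more concrete, the following is a minimal sketch of how lineage between workflow steps and data products might be recorded and queried. It is not the LEAD project's provenance service or its API; the class `ProvenanceStore`, its methods, and the step and product names (`ingest`, `radar_raw`, `forecast_output`, and so on) are hypothetical and chosen only for illustration. Each workflow invocation records the products it consumed and generated, and a product's full ancestry is recovered by walking those records backward.

```python
from collections import defaultdict


class ProvenanceStore:
    """Hypothetical in-memory lineage store (illustration only)."""

    def __init__(self) -> None:
        # data product id -> id of the workflow invocation that generated it
        self.generated_by: dict[str, str] = {}
        # invocation id -> ids of the data products it consumed
        self.used: dict[str, list[str]] = defaultdict(list)

    def record_invocation(self, invocation: str, inputs: list[str], outputs: list[str]) -> None:
        """Record that `invocation` consumed `inputs` and generated `outputs`."""
        self.used[invocation].extend(inputs)
        for product in outputs:
            self.generated_by[product] = invocation

    def ancestors(self, product: str) -> set[str]:
        """Return every data product that `product` was transitively derived from."""
        seen: set[str] = set()
        stack = [product]
        while stack:
            current = stack.pop()
            invocation = self.generated_by.get(current)
            if invocation is None:
                continue  # no recorded producer, e.g. raw observational data
            for parent in self.used[invocation]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Hypothetical three-step forecast workflow.
store = ProvenanceStore()
store.record_invocation("ingest", ["radar_raw"], ["radar_qc"])
store.record_invocation("assimilate", ["radar_qc", "surface_obs"], ["initial_state"])
store.record_invocation("forecast", ["initial_state"], ["forecast_output"])
lineage = store.ancestors("forecast_output")
print(sorted(lineage))
```

Here `lineage` contains `initial_state`, `radar_qc`, `surface_obs`, and `radar_raw`. Keeping the "generated by" and "used" relations separate means that raw observations, which have no recorded producer, naturally terminate the backward traversal.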

Key words

LEAD; science gateways; cyberinfrastructure; data & metadata management; provenance; data quality; data mining; streams

Copyright information

© International Federation for Information Processing 2007

Authors and Affiliations

  • Yogesh L. Simmhan, Sangmi Lee Pallickara, Nithya N. Vijayakumar, and Beth Plale: Computer Science Department, Indiana University, Bloomington, USA