An Architecture for Archiving and Post-Processing Large, Distributed, Scientific Data Using SQL/MED and XML

  • Mark Papiani
  • Jasmin L. Wason
  • Denis A. Nicole
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1777)


We have developed a Web-based architecture and user interface for archiving and manipulating results of numerical simulations being generated by the UK Turbulence Consortium on the United Kingdom’s new national scientific supercomputing resource. These simulations produce large datasets, requiring Web-based mechanisms for storage, searching and retrieval of simulation results in the range of hundreds of gigabytes. We demonstrate that the new DATALINK type, defined in the draft SQL Management of External Data (SQL/MED) standard, which facilitates database management of distributed external data, can help to overcome problems associated with limited bandwidth. We show that a database can meet the apparently divergent requirements of storing both the relatively small simulation result metadata and the large result files in a unified way, whilst maintaining database security, recovery and integrity. By managing data in this distributed way, the system allows post-processing of archived simulation results to be performed directly, without the cost of having to rematerialise them to files. This distribution also reduces access bottlenecks and processor loading. We also show that separating the user interface specification from the user interface processing can provide a number of advantages. We provide a tool that automatically generates a default user interface specification, in the form of an XML document, for a given database. The XML document can be customised to change the appearance of the interface. Our architecture can archive not only data in a distributed fashion, but also applications. These applications are loosely coupled to the datasets (in a many-to-many relationship) via XML-defined interfaces. They provide reusable server-side post-processing operations such as data reduction and visualisation.
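The idea of generating a default interface specification from the database itself can be illustrated with a minimal sketch. It is not the authors' tool: SQLite stands in for the production database, the table and column names are hypothetical, and the XML element and attribute names (`interface`, `table`, `column`, `label`) are invented for illustration. SQLite does not implement SQL/MED, so `DATALINK` here is only a declared type name that the catalogue reports back, not a managed link to an external file.

```python
import sqlite3
import xml.etree.ElementTree as ET


def default_ui_spec(conn: sqlite3.Connection) -> ET.Element:
    """Build a default XML interface description from the database catalogue."""
    root = ET.Element("interface")  # hypothetical element name
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        table_el = ET.SubElement(root, "table", name=table_name)
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        for _cid, col_name, col_type, _notnull, _default, _pk in columns:
            ET.SubElement(
                table_el,
                "column",
                name=col_name,
                type=col_type,
                # Default display label; a user could edit this in the XML.
                label=col_name.replace("_", " ").title(),
            )
    return root


# Hypothetical schema: small metadata columns plus a reference to a large file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE simulation ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT,"
    " result_file DATALINK)"  # assumption: type name only, no SQL/MED semantics
)

spec = default_ui_spec(conn)
print(ET.tostring(spec, encoding="unicode"))
```

Because the specification is an ordinary XML document, changing a `label` attribute or removing a `column` element would alter the generated interface without touching the code that renders it, which is the separation the abstract describes.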


Keywords (added by machine, not by the authors): Scientific Data; Query Form; Document Type Definition; File Server; External File





Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Mark Papiani (1)
  • Jasmin L. Wason (1)
  • Denis A. Nicole (1)
  1. Department of Electronics and Computer Science, University of Southampton, Southampton, UK
