Introduction

The solar system includes the Sun, the interplanetary medium (solar wind), the Earth, planets, and smaller bodies orbiting around the Sun. In the context of the present paper we restrict ourselves to consideration of the ionised matter (plasma) in the solar system, excluding the solid, liquid, and non-ionised gaseous parts of the atmospheres of these bodies, as well as the interior of the Sun itself. Historically, the study of these different regions of the solar system plasma environment has proceeded somewhat independently, and the supporting data systems have evolved at different speeds. But the study of global phenomena requires simultaneous access to data from many different sources, and this is still rather more difficult than it need be.

The need for modern data handling and analysis facilities for space physics has long been identified. For example, a series of three papers (Scudder et al. 1986a, b, c) published in 1986 was based upon the detailed analysis of data from ten instruments on the ISEE 1–2 mission, for just one crossing of the Earth’s bow shock. When presented at a scientific congress the obvious question was asked: “Can this analysis be repeated for ~20 different bow shock crossings?”. After more than three years spent acquiring and comparing the data used, it seemed unrealistic to think of repeating the task ~20 times using the technology of the time. Since then, the arrival of the Internet has simplified data transport, but the problem of handling and comparing plasma data in a multi-satellite, multi-mission environment remains very real.

A modern user-friendly data analysis environment requires mutual collaboration between scientists in the field and data centre experts: only the Principal Investigator (PI) fully understands his instrument and its associated data sets, while data specialists are studying the technical issues of interoperability. This paper gives a brief survey of these technical issues, and of one solution currently being implemented.

Context

Many years ago astronomers coined the expression “Virtual Observatory” (VO). In June 2003 the International Virtual Observatory Alliance (IVOA; Footnote 1) was formed with a mission to “facilitate the international coordination and collaboration necessary for the development and deployment of the tools, systems and organizational structures necessary to enable the international utilization of astronomical archives as an integrated and interoperating virtual observatory”. The IVOA now comprises national projects in 16 different countries.

The motivation of the astronomers was reinforced by practical considerations:

  A.1 Astronomical objects are so far away that the point of observation on the Earth is of little importance.

  A.2 The majority of objects observed are not varying rapidly in time.

  A.3 The observations are object-oriented; more precisely, the primary key for indexing most data is the location of the observed object in the sky.

  A.4 Astronomy benefits from the widespread use of near standard formats.

A.1 and A.2 imply that observations made at different places at different times can be usefully compared. A.3 facilitates retrieval of the required data, and A.4 facilitates the data comparison itself.

Solar system science can make a comparable list.

  S.1 Solar system observations generally require knowledge of the position of the instrument making the observation. Plasma instruments are mounted on satellites which actually fly through the regions whose properties are being measured, while other spacecraft are sent to observe the Sun from different directions, or to observe, or even explore in situ, the planets.

  S.2 The observed phenomena are time-varying. Collisionless shocks, discontinuities, waves and turbulence are immaterial and can be, and usually are, rapidly varying or transient. Nevertheless, these are the phenomena which determine the characteristics such as density, velocity and temperature of the plasma which fills most of the solar system.

  S.3 The data are organised as long time-series of similar measurements (in-situ observations), or as images of unique events (solar eruptions, auroras, or rings of Jupiter).

  S.4 Although the Flexible Image Transport System (FITS; Footnote 2) format, originally developed by astronomers in the late 1970s, is now also used for solar system images, the in-situ data come in almost as many different formats as there are laboratories which provide the instruments.

While the scientific motivation for a VO is uniformly strong throughout the astronomical community, it is clear that solar system data presents inherent technical difficulties, which have caused solar system VO development to proceed more slowly, and along different lines.

Method

The space physics data environment consists of thousands of relatively small datasets plus many large datasets. There are tens to hundreds of data centres or data providers (repositories) scattered worldwide, with only very loose links (if any) between them. These sites offer data with very diverse formats for both the data itself, and the metadata which describes that data. The data comes from a multitude of past and present missions: Fig. 1 shows many of the currently operational solar system missions. Front-line research requires easy comparison of data from different instruments on the same or on different missions. Data which is not archived and available to the science community will not be widely used, and will eventually be lost.

Fig. 1 Currently operational missions

Archiving of specific subsets of space physics data often becomes the task of national data centres, which house most of the very large datasets. These data centres were set up to aggregate, preserve and promote the use of their data, and each set its own internal standards about how best to do this. The activities associated with “archiving” in space physics have been detailed, for example, by Harvey and Huc (2003). The need to search for data worldwide across multiple sites was identified in 1998 when ISTP asked four data centres (NSSDC, CDPP, SwRI, RAL) to suggest a solution. This initiative eventually led to the formal creation, in October 2003, of the international Space Physics Archive Search and Extract (SPASE) consortium (Footnote 3), composed of the original four plus other data centres (Harvey et al. 2004). Since 2005 the NASA/Living With a Star Targeted Research and Technology (LWS/TR&T) program has supported US members of this consortium, while European members have been supported from their own resources.

In October 2004 an international workshop organised by NASA/LWS assembled about 100 research scientists and data engineers at Greenbelt, MD. The output was “A Framework for Space and Solar Physics Virtual Observatories” (Bentley et al. 2004). This document laid the foundations for the organisation of the data environment where, two years on, the following categories are identified:

  • End users are the scientists in the field who submit their user requirements to the VOs. This is probably the group which is the least organised, yet which stands to gain most from a well-organised global data management service.

  • Data providers supply the data made available. In space physics it is generally the PI who is finally responsible for both the nature and the quality of the data products from his instrument. PI collaboration is essential for adequate metadata and documentation, and everybody (including funding agencies) must recognise the effort required to generate the products requested by the end users, in adequate number and quality, and in a timely way.

  • Data repositories preserve data and metadata and make them accessible over the Internet. This archiving service includes ensuring the provision of adequate catalogues, the respect of certain rules for the description of both data and metadata, maintenance of the network access, etc., plus attending to the long-term evolution of formats and physical support medium technology. These activities can be undertaken by the PI himself, but it is more reasonable that he concentrate on his research while entrusting archiving to a specialist centre. Data repositories vary from instrument specific to broadly based, and are generally supported by national or international agencies.

  • Service providers supply software which may be used to perform a service, such as special operations on the data files, comparison of data, or manipulation of auxiliary data, such as finding orbital conjunctions.

  • Service centres make services available. Examples are the Satellite Situation Centre (Footnote 4), which provides orbital information including such things as satellite conjunctions and magnetic field line footprints, and the Collaborative Sun-Earth Connector, CoSEC (Footnote 5), which facilitates comparison of data.

  • Virtual observatories provide uniform access to data and services for some particular group of users, whose perimeter is defined by considerations of scientific interest, resources, and geo-politics. Specific communities have terms adapted to their own needs depending, for example, on what they observe and the way they observe it. But often the source of funding of a VO is better identified than its precise scientific scope.

  • SPASE promotes standards and guidelines to facilitate interoperability within and between VOs serving solar system physics research. SPASE may occasionally provide infrastructure when this cannot be identified as being the responsibility of a particular VO or data repository.

Early in 2006 NASA announced the creation of five new VOs in the domain of Solar System physics. The list of solar system VOs is shown in Table 1, the last five being the ones recently created by NASA. VSTO is managed by NCAR and funded principally by NSF, and has a partial scientific overlap with VSPO. EGSO is the only true virtual solar system observatory in Europe. The functionality expected of a VO is described by Bentley et al. (2004), page 14: a VO federates data resources, but is not responsible for the data made available. Some data centres currently hold volumes of data comparable with those available via some of the above-listed VOs, for example,

  • NSSDC, National Space Science Data Centre (Footnote 6);

  • CDPP, Centre de Données de la Physique des Plasmas (Footnote 7);

  • ESA, Cluster Active Archive (CAA) (Footnote 8);

nevertheless, they are not VOs.

Table 1 Existing and new solar system virtual observatories

All the above-listed VOs and data centres, and others yet to come such as planetology, need some level of mutual interoperability in order to support the study of global solar system phenomena, such as Sun-Earth relations (including space weather), and comparative planetology. They need an overall “umbrella” organisation within which they can discuss and decide how to manage their mutual interoperability requirements and thus meet the science requirements.

Space Physics Archive Search and Extract

SPASE is an international, community-based organization with the goal of facilitating data search and retrieval across the Space and Solar Physics data environment.

The SPASE Charter (Footnote 9) states that the objectives are progressive, starting with data discovery (on the basis of scientific criteria) and retrieval, and evolving towards providing the ability for the user to ingest this data automatically into his own local applications. This latter goal has not yet been attained.

The first requirement for different data centres to be able to operate together collectively is adequate standardisation. In particular, a common description of data and services is essential. This is why the major part of the SPASE activity so far has centred on the data model and its associated dictionary, described in “The SPASE data model”. Its objectives are to conceptualize the domain of space physics data and resources, to provide a standard method of describing resources, and to provide a formal dictionary (set of representational terms) to describe space physics resources. This will enable (in order of increasing metadata requirements):

  • one-stop searching for data in multiple data repositories;

  • intercomparison of similar quantities from different data sets in different data centres through common terminology mapping (e.g. visible radiation vs. optical radiation);

  • identification of disparate data granules overlapping in time;

  • recovery of data granules together with all the information necessary for their immediate analysis.

It is the ability of individual VOs to provide suitably formatted data and metadata which determines the feasibility of each of these objectives.
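As an illustration of the third objective, the identification of granules overlapping in time can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the `Granule` record, its field names, and the example identifiers and dates are all invented for the illustration and are not taken from the SPASE data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class Granule:
    """Hypothetical minimal granule description: an identifier and a time span."""
    granule_id: str
    start: datetime
    stop: datetime


def overlaps(a: Granule, b: Granule) -> bool:
    """Half-open spans [start, stop) overlap when each starts before the other stops."""
    return a.start < b.stop and b.start < a.stop


def find_overlaps(left: List[Granule], right: List[Granule]) -> List[Tuple[str, str]]:
    """Return the identifier pairs of granules from two data sets that overlap in time."""
    return [(a.granule_id, b.granule_id) for a in left for b in right if overlaps(a, b)]


# Invented example granules from two different (hypothetical) repositories.
field_granules = [
    Granule("field-001", datetime(2003, 5, 29, 0), datetime(2003, 5, 29, 12)),
    Granule("field-002", datetime(2003, 5, 29, 12), datetime(2003, 5, 30, 0)),
]
particle_granules = [
    Granule("particle-001", datetime(2003, 5, 29, 6), datetime(2003, 5, 29, 18)),
    Granule("particle-002", datetime(2003, 5, 30, 6), datetime(2003, 5, 30, 12)),
]

pairs = find_overlaps(field_granules, particle_granules)
print(pairs)
```

The pairwise comparison is adequate for small catalogues; for large ones, sorting both lists by start time and sweeping through them would probably be preferable. The point of the sketch is that the test itself is trivial once every repository describes the time span of its granules in the same way.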

To meet its objectives, SPASE has assembled a team of domain experts (scientists), information specialists and technologists to advance the goal of establishing standards for sharing space physics resources. Discussion is conducted by e-mail and fortnightly teleconferences, and regular twice-yearly meetings are convened to reach a formal consensus and release stable versions of the data model.

The model must also have a formal encoding to permit its exploitation, and it has been decided that this should be XML. The SPASE team has started to test the data model and its XML implementation with user scenarios and real-world data resources. It has been greatly aided by the newly created VOs, which have both the mandate to archive data in their discipline area, and the resources to do it. They have produced much feedback which has enabled the data model to be refined in response to community needs. The SPASE team is also encouraging support and adoption of its data model by providing tools and a reference implementation.
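To give an idea of what an XML encoding of a resource description might involve, the following sketch builds a small description with the Python standard library. The element names (`Spase`, `NumericalData`, `ResourceID`, `InstrumentID`, `Parameter`, `Name`) and the identifiers are assumptions chosen for illustration; they should not be read as the actual SPASE schema, which is defined by the released data model.

```python
import xml.etree.ElementTree as ET


def describe_dataset(resource_id: str, instrument_id: str, parameters: list) -> str:
    """Encode a dataset description as XML; element names are illustrative only."""
    root = ET.Element("Spase")
    dataset = ET.SubElement(root, "NumericalData")
    ET.SubElement(dataset, "ResourceID").text = resource_id
    ET.SubElement(dataset, "InstrumentID").text = instrument_id
    for name in parameters:
        parameter = ET.SubElement(dataset, "Parameter")
        ET.SubElement(parameter, "Name").text = name
    return ET.tostring(root, encoding="unicode")


# Hypothetical identifiers and parameter names, invented for the example.
xml_text = describe_dataset("example/dataset-1", "example/instrument-1", ["density", "velocity"])
print(xml_text)
```

Because the result is plain XML, any repository or VO can parse it with generic tools, which is the practical benefit of agreeing on a formal encoding.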

The SPASE data model

Version 1.0.0 of the SPASE data model was released on 23 November 2005. In the light of numerous questions and suggestions, an updated Version 1.1.0 was released on 31 August 2006. This version was frozen for use by the individual VOs (listed in caption of Fig. 1) who plan to adhere to this model. But development of the model continues within the SPASE working group. In particular, this development aims to describe individual physical parameters at a level adequate to support Application Programming Interfaces and, of course, to take into account user comments upon Version 1.1.0. Version 1.2.0 contains extensions based on community needs, and was released on 22 May 2007.

To simplify management of the metadata, SPASE has identified “resources”, each of which describes a set of information which is likely to be referenced by other resources or used to describe a dataset or other type of product. SPASE has the following resource types:

  • Catalog

  • Display Data

  • Numerical Data

  • Granule

  • Instrument

  • Observatory

  • Person

  • Registry

  • Repository

  • Service

Thus one observatory (satellite, or ground facility) may house many instruments, each of which produces many sets of numerical data, each of which includes several different physical parameters. Each dataset has one or more persons responsible for it, and is available from at least one repository.
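The relationships just described can be sketched as a set of linked records. The class and field names below are assumptions made for this illustration and do not reproduce the SPASE schema; the sketch only encodes the stated relationships (an observatory houses instruments, an instrument produces datasets, a dataset has parameters, at least one responsible person and at least one repository).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NumericalData:
    """A dataset: several physical parameters, >= 1 person, >= 1 repository."""
    resource_id: str
    parameters: List[str]
    person_ids: List[str]
    repository_ids: List[str]

    def __post_init__(self) -> None:
        # Enforce the two "at least one" rules stated in the text.
        if not self.person_ids:
            raise ValueError("a dataset needs at least one responsible person")
        if not self.repository_ids:
            raise ValueError("a dataset needs at least one repository")


@dataclass
class Instrument:
    resource_id: str
    datasets: List[NumericalData] = field(default_factory=list)


@dataclass
class Observatory:
    """A satellite or ground facility housing one or more instruments."""
    resource_id: str
    instruments: List[Instrument] = field(default_factory=list)

    def all_parameters(self) -> List[str]:
        """List every physical parameter reachable from this observatory."""
        return [p for i in self.instruments for d in i.datasets for p in d.parameters]


# Hypothetical identifiers, invented for the example.
dataset = NumericalData("data/example-1", ["density", "velocity"],
                        ["person/example-pi"], ["repository/example"])
observatory = Observatory("observatory/example-sat",
                          [Instrument("instrument/example-plasma", [dataset])])
print(observatory.all_parameters())
```

Referring to persons and repositories by identifier, rather than embedding them, reflects the idea that a resource is described once and referenced by other resources.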

Note that SPASE versions are numbered Version n.m.p, where changes of these indices have the following significance:

  • p   changes for each new internal (SPASE working group) version. It is intended that p = 0 for all versions released to serve for implementation purposes within data repositories and/or VOs.

  • m   changes each time a new version is released for implementation. This new version contains no major structural changes, but new categories of resource, additional values for enumerated keywords of existing resources, or other such changes may have been introduced. Data centres may wish to update existing metadata descriptions, but this remains optional depending upon the changes made and their service level requirements.

  • n   changes whenever there has been a major change requiring that data centres revise some part of their existing data descriptions to remain SPASE compatible. It is intended that such changes occur rather infrequently!
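The numbering rules above lend themselves to a simple automated check. The following sketch, in which the function names and the returned labels ("required", "optional", "none") are choices made for this illustration only, decides what a data centre would need to do when a newer version of the model appears. It assumes that the newer version is not older than the one used for the existing descriptions.

```python
from typing import NamedTuple


class SpaseVersion(NamedTuple):
    n: int  # major change: existing descriptions must be revised
    m: int  # new release for implementation: update is optional
    p: int  # internal working-group version (0 for released versions)


def parse_version(text: str) -> SpaseVersion:
    """Parse a version string of the form 'n.m.p'."""
    n, m, p = (int(part) for part in text.split("."))
    return SpaseVersion(n, m, p)


def is_implementation_release(version: SpaseVersion) -> bool:
    """Versions released for implementation are intended to have p = 0."""
    return version.p == 0


def update_needed(described_with: SpaseVersion, current: SpaseVersion) -> str:
    """Classify the effort implied by moving existing metadata to 'current'."""
    if current.n != described_with.n:
        return "required"
    if current.m != described_with.m:
        return "optional"
    return "none"


print(update_needed(parse_version("1.0.0"), parse_version("1.2.0")))
```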

Results

The first draft of the SPASE data model and dictionary results from several years of work by a few data centres. They engaged in this task because they realised it to be essential both for an operational worldwide VO for solar system physics, and probably for their own survival too. All the data centres listed in “Method” have indicated their desire to provide SPASE descriptions of their holdings and participate regularly in SPASE activities. This is important, because it is in the VOs and especially their associated data repositories that most of the people actually responsible for archiving and describing space physics data are to be found.

Use of SPASE involves a certain effort. Whether or not it is SPASE-compatible, each individual data centre or repository must have its own internal data model and dictionary, so as to exercise full control over its own system development. For example, the dates of upgrade of a data centre system depend upon many factors external to the data centre itself:

  • the dates of local operating system upgrades;

  • major updates to supporting (e.g., database or graphical) software, and

  • support of launch and commissioning campaigns, special scientific workshops, or similar events.

Therefore to be SPASE-compatible a data repository must convert as many as possible of its internal data descriptions to the current version of the SPASE model and dictionary. This may be:

  • relatively easy if the data centre internal data model resembles the SPASE model, involving essentially only the translation of some keywords, but

  • much more difficult if the data model is different; in addition to keywords, at some level the metadata syntax must be understood.

The conversion need not be perfect, but information not converted cannot be used by SPASE. VOs making their metadata available in SPASE-compatible form are encouraged to display the logo “SPASE inside” on their data access home page. As for any communications initiative, the return on investment is slow until the investment nears completion, but can be spectacular thereafter.
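The easier of the two conversion cases, translation of keywords, can be sketched as a table lookup. The internal keywords and their targets in `KEYWORD_MAP` are invented for this illustration; a real data centre would supply its own table. The sketch also returns the keywords it could not convert, since that information is not usable on the SPASE side.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from one data centre's internal keywords to
# SPASE-style terms; both sides are assumptions made for this example.
KEYWORD_MAP: Dict[str, str] = {
    "mission": "Observatory",
    "experiment": "Instrument",
    "pi_name": "Person",
}


def convert(record: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Translate known keywords and report those left unconverted."""
    converted: Dict[str, str] = {}
    unconverted: List[str] = []
    for key, value in record.items():
        target = KEYWORD_MAP.get(key)
        if target is None:
            unconverted.append(key)  # not visible to SPASE-based searches
        else:
            converted[target] = value
    return converted, unconverted


internal_record = {
    "mission": "ExampleSat",
    "experiment": "Example magnetometer",
    "tape_label": "T-0042",
}
converted, unconverted = convert(internal_record)
print(converted, unconverted)
```

When the internal data model differs structurally from the SPASE model, a flat table of this kind is no longer sufficient and the conversion must also restructure the metadata.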

European considerations

There are currently three operational European data activities (in order of conception):

  • CDPP – Centre de Données de la Physique des Plasmas, http://cdpp.cesr.fr/ is a national centre for archiving space plasma data which opened its service in October 1999 (Harvey et al. 2003).

  • EGSO – European Grid of Solar Observations, http://www.mssl.ucl.ac.uk/grid/egso/ is a VO led by UCL-MSSL (Footnote 10) and established under the Information Systems Technology Priority of the European Community Framework Program 5 (which has now ended).

  • CAA – Cluster Active Archive, http://caa.estec.esa.int/ opened in February 2006 and is funded by ESA to make publicly available data of the highest possible quality from the four-spacecraft Cluster mission.

How do these three entities fit into the scheme presented previously?

  • EGSO is a VO, and should participate as such.

  • CDPP and CAA are magnetospheric data centres, which are respectively multi-mission and project-oriented. They are not the only ones: we may mention the MEDOC (Footnote 11) and BASS2000 (Footnote 12) data associated with EGSO. There is also the embryonic EuroPlaNet/IDIS: how should this fit in?

EGSO, CDPP and CAA are all members of SPASE, but there is presently no other coordination at the European level.

There are several reasons for thinking that more European coordination would be desirable.

  • The funding situation in Europe is complex: both ESA and the European Commission have declared their interest in preserving and exploiting the data but both have tight budgets so that, except for CAA, most of the support presently comes from national funding agencies. Precious resources must not be wasted by inadvertent duplication.

  • National language support in Europe: it is difficult to evaluate its importance, which can fluctuate in unpredictable ways, but it is certainly extremely high for the invaluable “Outreach” aspects of VO activity.

  • NASA has expressed an interest in interfacing to a more global view of European VO activity than is presently available.

  • It seems particularly important that a complete registry be maintained in Europe; this could be either a mirror of the US central SPASE registry, or a registry which is managed and updated independently.

Virtual observatories and space weather

The requirements for a VO are primarily driven by the research community, whose priorities are:

  • being able to locate and recover the data, even long after the event

  • being able to exploit it without undue software development.

The requirements for an operational space weather service are different. The primary requirement is rapid access to the required data, which implies that:

  • its location is known, and

  • processing software already exists.

VO services will be increasingly used for the research activities associated with Space Weather. For example, Hanuise et al. (2006) have recently studied the impact of solar flares on the Earth’s magnetosphere, using the flares observed by three SOHO instruments during 2003 May 27–30 (see the cited reference for the acronyms). The effects near the Earth were studied using data from the satellites ACE, Cluster, GOES-10, GOES-12 and CHAMP, plus ground-based facilities including EISCAT and SuperDARN, and magnetic observatories at low and high latitude, including the IMAGE magnetometer chain. Measurements of the total electron content and scintillation were derived from GPS data, and magnetospheric activity indices were obtained from the Danish Meteorological Institute, the WDC for Geomagnetism and Space Magnetism in Kyoto, and the GeoForschungsZentrum in Potsdam. This detailed article concludes “we can often qualitatively understand and predict the impact of solar events on geospace, and we can predict that certain space weather effects are likely to occur. But we are still far from being able to provide quantitatively accurate predictions as far as the timing and the level of intensity of space storm and space weather effects are concerned.” Better tools to facilitate work in this domain are clearly needed.

The use of standard formats and data descriptions would almost certainly accelerate the integration of new data sources or services into an operational service, although it remains to evaluate the impact of the handicap (speed of execution) of using generic software to read these standard formats in an operational environment.

VO activity is an integral part of GRID (Globalisation des Ressources Informatiques et des Données) resource-sharing technology (see, for example, Enabling Grids for E-sciencE, EGEE; Footnote 13), but in the near future it is unlikely to venture into the better-known GRID domain of sharing hardware resources. Today there is only one domain where this is required: space weather, with its need to perform the maximum amount of analysis in the shortest possible time whenever the onset of a “Space weather event” has been detected. But this is an operational service level requirement, not a scientific research requirement.

Discussion

SPASE participants will continue to use the SPASE V1.2 data description to describe as many data sets as possible. This will expose a few shortcomings of the model, while use of the data model for data searching will test its functionality. Essential modifications will be incorporated into future releases. In parallel, the description of physical parameters will be enriched until it can support Applications Programming Interfaces. After expiry of the current NASA LWS/TR&T SPASE contract in 2007, SPASE activity in the USA has been promised “permanent support” from NSSDC.

SPASE may well be the only international forum to offer the possibility for solar system VOs from all continents to collaborate. Support in Europe is presently limited to the support of some individual national agencies within the framework of their national contributions to the IVOA. It would be good if some Europe-wide coordination of the solar system virtual activity could be organised. During one of its semi-annual round-table meetings convened recently in Toulouse, the SPASE consortium devoted one day, 8 June 2006, to European activity. Several presentations by European national and international representatives showed that the desire for more integrated European collaboration was certainly present, but that no one agency felt mandated to take the first step. The presentations can be found at the URL http://www.cesr.fr/~harvey/SPASE_060608.htm. How can permanent European collaboration and coordination be established?

SPASE is a solar system counterpart to the (much larger) current activity of IVOA, the international organisation which coordinates the international astronomical virtual observatory. IVOA caters more for extra-solar system astronomy, but it has not closed the door on solar-system objectives and, indeed, in many European countries the solar system scientific community is funded by the same budget line as the extra-solar system astronomers. Despite their data models differing for reasons outlined in “Context”, SPASE and IVOA have much underlying commonality. They are presently advancing in parallel, and it would make sense for them to work towards the long-term goal of common management. This is the direction taken by the ASTRONET consortium (Footnote 14) of European national astronomical funding agencies, which is supported by the European Commission: it has indicated a strong interest in solar system physics.

Conclusions

Any successful multi-mission, multi-national analysis system will require adequate access to adequate metadata, and the SPASE consortium is attempting to provide standards, tools and management procedures for a global data analysis system for solar system plasma astronomy. The science studies cited in “Introduction” and “Virtual observatories and space weather” concerned respectively plasma micro-processes and space weather. To complete the picture, we mention a third study, of the propagation of an interplanetary disturbance away from the Sun, and the plasma phenomena seen at the different planets encountered on the way.

Prangé et al. (2004) have used planetary auroral storms to trace an interplanetary shock from the Sun to Saturn, and thus explain strong transient polar emission detected on Saturn around 11:30 UT on 2000 December 07 by the Space Telescope Imaging Spectrograph aboard HST. At the time the Sun, Earth, Jupiter and Saturn were nearly aligned, which enabled the following deductions:

  • Of the series of CME-driven interplanetary shocks observed by the LASCO coronagraph on SOHO on November 01–10, five were directed towards the Earth and evolved into interplanetary shocks observed by WIND and ACE two days later, with associated geomagnetic storms observed by POLAR. Earth-based radio astronomical observations of Jupiter showed that by November 18–24 they had merged into a single long-duration event at Jupiter.

  • Extrapolation of the interplanetary model supported by the Jupiter observations showed that the solar wind event would have passed Jupiter December 02–08. The authors conclude that the HST observations were almost certainly caused by this event via an “as-yet-unknown” interaction with Saturn’s magnetosphere.

All the studies mentioned here were long and tedious to perform, but are clearly essential to our understanding of the solar system in general, and our local “Space weather” environment in particular. Today the different categories of participant described in “Method”, and the role of each participant according to his category, seem fairly well established for the foreseeable future. What is much less clear is the way in which these different categories of participant, especially the VOs, will be created and funded uniformly across the world. The organisation of the world-wide data system involves many players: first and foremost the data providers and end users, but also other important groups such as the various scientific projects, data centres and, above all, the funding agencies, national and international. Any realistic system architecture must take account of this reality.