SPASE 2.0: a standard data model for space physics
- King, T., Thieman, J. & Roberts, D.A. Earth Sci Inform (2010) 3: 67. doi:10.1007/s12145-010-0053-4
SPASE—for Space Physics Archive Search and Extract—is a group with a charter to promote collaboration and sharing of data for the Space Plasma Physics community. A major activity is the definition of the SPASE Data Model which defines the metadata necessary to describe resources in the broader heliophysics data environment. The SPASE Data Model is primarily a controlled vocabulary with hierarchical relationships and with the ability to form associations between described resources. It is the result of many years of effort by an international collaboration (see http://www.spase-group.org) to unify and improve on existing Space and Solar Physics data models. The genesis of the SPASE group can be traced to 1998 when a small group of individuals saw a need for a data model. Today SPASE has a large international participation from many of the major space research organizations. The design of the data model is based on a set of principles derived from evaluation of the existing heliophysics data environment. The development guidelines for the data model are consistent with ISO-2788 (expanded in ANSI/NISO Z39.19) and the administration for the data model is comparable to that described in the ISO standards ISO-11179 and ISO-20943. Since the release of version 1.0 of the data model in 2005, the model has undergone a series of evolutions. SPASE released version 2.0 of its data model in April 2009. This version presents a significant change from the previous release. It includes the capability to describe a wider range of data products and to describe expert annotations which can be associated with a resource. Additional improvements include an enhanced capability to describe resource associations and a more unified approach to describing data products. Version 2.0 of the SPASE Data Model provides a solid foundation for continued integration of worldwide research activities and the open sharing of data.
Keywords: SPASE; Data model; Ontology; Vocabulary; Heliophysics; Space physics; Standard
Space Physics Archive Search and Extract (SPASE) is the moniker of a group of representatives from a number of space physics data archives, space science virtual observatories (data systems created for the purpose of providing access to metadata and data in a particular discipline, Weigel et al. 2009) and other data providers. The goal of the SPASE group is to establish standards to enable and enhance collaboration and sharing of space physics data. The main focus of the SPASE group has been the definition of a data model for space physics data. The group also undertakes tasks to demonstrate the viability of the data model and its ability to enable interoperability among data providers, archives and data services which serve the space physics community. The goal is to make it possible for data and other resources in the data environment to be easily registered, found, accessed, and used.
The progress from concept to specification was relatively slow in the early years of SPASE. It was recognized that a more intense effort, with funding for personnel to spend a significant fraction of their time on the work, was needed. In 2005 a proposal to create the Space Physics Archive Search and Extract project was funded through the NASA Living With a Star program. This provided support for U.S. participation, and a more intense effort was undertaken. At this time it was determined that the main part of the SPASE effort should be devoted to a common SPASE Data Model to serve as a metadata medium for information and data exchange among the widespread and diverse data archives within the community. The first operational version of the SPASE Data Model (version 1.0) was released in November 2005. In 2005 NASA solicited proposals to establish thematic virtual observatories for the heliophysics community, and the SPASE Data Model was adopted as the metadata standard to enable interoperability. In 2008 the initial SPASE grant ended, and NASA committed to longer-term support for SPASE by making it an infrastructure component of the Heliophysics Data Environment (HPDE, http://hpde.gsfc.nasa.gov). Since the inception of the SPASE group, through more than 4000 emails, over a hundred biweekly teleconferences, and at least six face-to-face meetings, the SPASE Data Model has continued to be developed; as of March 2010 the latest version is 2.1.0.
The SPASE Data Model is now being used by a variety of groups within the United States, Europe, Canada and Japan. In the U.S. the SPASE Data Model is an infrastructure component of NASA’s Heliophysics Data Environment (HPDE). The HPDE consists of nine Heliophysics Virtual Observatories (VxOs) and a growing number of service providers (e.g., Autoplot (http://www.autoplot.org), the Heliophysics Event List Manager (HELM), etc.). The VxOs use the SPASE Data Model to document both existing and newly acquired resources. Service providers and the VxOs use the assigned unique resource identifiers to retrieve the resources (data) and their metadata. Information from the resource descriptions is harvested to enable search engines and specialty services. The SuperMAG project of the National Science Foundation also provides SPASE Data Model compliant descriptions for its resources, enabling cross-agency sharing of data, something that was remarkably limited in the past.
Both the Canadian Space Science Data Portal (CSSDP) and the Cluster Active Archive (CAA) in Europe are using the SPASE Data Model. The European Union’s HELIO project will be able to consume SPASE Data Model compliant descriptions and intends to provide such descriptions as well. In Japan the Inter-university Upper atmosphere Global Observation NETwork (IUGONET) has chosen SPASE as its metadata exchange standard.
SPASE data model development
The design of the data model is based on a set of principles derived from an evaluation of the existing heliophysics data environment:
- Data are self-documented. Data resources have internal schemas or structures for storing values.
- Resources are distributed. There are many providers of resources, and these providers can be located anywhere in the world.
- An online resource has a Uniform Resource Locator (URL). If a resource is online, it can be accessed and retrieved using its URL.
- The data environment is continuously evolving. New resources are actively generated, either as part of an ongoing experiment or as a result of analysis and assessment.
Along with the design principles, a basic tenet was to allow the data model to evolve based on necessity. The earliest version of the data model (released in 2005) was intentionally rudimentary. It contained basic definitions for the Person, Observatory, Instrument, NumericalData, DisplayData and Catalog classes. The basic model was extended as existing resources were described using SPASE and, occasionally, a resource could not be adequately described with the existing data model. Such issues were raised and discussed by the SPASE group, and suggestions for improvements were formulated. When a consensus was reached, the changes were made to the data model. This has resulted in an organic, yet controlled, evolution of the SPASE Data Model.
The specification of the SPASE Data Model uses terminology which is slightly unconventional, but is easily mapped to current standardized nomenclature. This is in part due to the timeframe of the origins of SPASE (1998) and the decision to adopt terminology commonly understood by the SPASE community. The SPASE community is led by scientists who are assisted by computer scientists, data engineers, and data developers working within the space physics domain. For example, the SPASE consortium uses the terms “Resource”, “Element”, and “Container”, which correspond respectively to the terms “Class”, “Attribute”, and “Component Class” in the discipline-independent Unified Modeling Language (UML). The SPASE consortium also defined a set of development procedures and policies which have been found to align well with several international standards for metadata.
The SPASE Data Model is primarily a controlled vocabulary with hierarchical relationships and the ability to form associations between described resources. The selection of element names (dictionary terms) in the controlled vocabulary follows the guidelines outlined in (ISO-2788 1986) and expanded in (ANSI/NISO Z39.19 2005). Element names are also consistent with the naming and identification principles described in (ISO-11179-5 2005). That is, each element name in the vocabulary has one and only one meaning; element names are always in singular form; compound element names are used as necessary to convey the semantics of the element; and lists (enumerations) are used to clearly differentiate aspects of a conceptual set. For example, the element that captures how readily a resource can be accessed has the name “Availability”. Other elements have multi-word (compound) names to capture their full semantics, for example the class of “range” elements, which includes “Azimuthal Angle Range”, “Polar Angle Range”, “Energy Range”, “Wavelength Range”, and “Frequency Range”.
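The naming rules above can be made concrete with a small sketch. It assumes, for illustration only, that an XML element name is formed by joining the words of a dictionary term ("Energy Range" becomes "EnergyRange"), and it enforces the one-name, one-meaning rule by rejecting two terms that collapse to the same name. The function names are hypothetical and not part of any SPASE tool.

```python
# Minimal sketch (assumption): a dictionary term such as "Energy Range" is
# written as an XML element name by removing the spaces ("EnergyRange").
RANGE_TERMS = [
    "Azimuthal Angle Range",
    "Polar Angle Range",
    "Energy Range",
    "Wavelength Range",
    "Frequency Range",
]


def element_name(term: str) -> str:
    """Join the words of a dictionary term into one UpperCamelCase name.

    Only the first letter of each word is changed, so acronyms such as
    "URL" in "Access URL" are preserved.
    """
    return "".join(word[0].upper() + word[1:] for word in term.split())


def build_dictionary(terms: list) -> dict:
    """Map element names to dictionary terms.

    Raises ValueError if two terms collapse to the same element name,
    because each element name must have one and only one meaning.
    """
    names = {}
    for term in terms:
        name = element_name(term)
        if name in names:
            raise ValueError(f"duplicate element name {name!r} for {term!r}")
        names[name] = term
    return names


if __name__ == "__main__":
    for name, term in build_dictionary(RANGE_TERMS + ["Availability"]).items():
        print(f"{term} -> {name}")
```

The duplicate check is the useful part: compound names only stay unambiguous if the vocabulary itself refuses collisions.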
Terms are organized into whole-part relationships (classes). The SPASE Data Model employs both hierarchical relationships (taxonomy) of classes and associative relationships. Some classes have polyhierarchical relationships in that they can be included in more than one other class. Every term in the SPASE vocabulary (dictionary) has a single definition which is compliant with the Data Element Formulation Rules detailed in (ISO-11179-4 2004).
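Polyhierarchy means a class can sit under several parents, so a term can be reached along more than one whole-part path. The sketch below uses a small, illustrative subset of containment relationships (assumed for the example, not the full SPASE 2.0 model) and shows how parents and paths could be looked up.

```python
# Illustrative subset (assumed, not the full SPASE 2.0 model) of which
# containers each class may include. "ResourceHeader" and
# "AccessInformation" appear under several parents (polyhierarchy).
CONTAINS = {
    "NumericalData": ["ResourceID", "ResourceHeader", "AccessInformation"],
    "DisplayData": ["ResourceID", "ResourceHeader", "AccessInformation"],
    "Catalog": ["ResourceID", "ResourceHeader", "AccessInformation"],
    "ResourceHeader": ["ResourceName", "Description", "Contact"],
    "AccessInformation": ["RepositoryID", "AccessURL"],
}


def parents_of(term: str) -> list:
    """Return every class that directly contains ``term``, sorted by name."""
    return sorted(parent for parent, children in CONTAINS.items() if term in children)


def paths_to(term: str, root: str) -> list:
    """Return every whole-part path from ``root`` down to ``term``."""
    if root == term:
        return [[root]]
    paths = []
    for child in CONTAINS.get(root, []):
        for tail in paths_to(term, child):
            paths.append([root] + tail)
    return paths


if __name__ == "__main__":
    print(parents_of("ResourceHeader"))
    print(paths_to("AccessURL", "NumericalData"))
```

Associative relationships (one resource pointing at another by identifier) are a separate mechanism and are not represented in this containment map.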
The administration of the SPASE Data Model adheres to the principles of the ISO-11179 (Metadata Registry) standard, although SPASE does not formally adopt all aspects of this standard. Each element in the SPASE Data Model is a Data Element Concept with a fully specified representation (value domain and data type), and the terms are organized into object classes (ISO-11179-1 2004). Each item (element, container, enumeration) in the SPASE Data Model is an Administered Item (ISO-11179-3 2004) and is assigned a version number which is synchronized with the classification scheme of a data model release.
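As a rough illustration of "fully specified representation" plus a release-synchronized version, the sketch below models a data element as a name, a data type, an optional enumerated value domain and a version string. The class, its three data types, the release label and the enumeration values for "Availability" are assumptions made for the example, not the SPASE dictionary's actual encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    """Sketch of an administered data element: a name, a fully specified
    representation (data type and value domain) and a version number."""
    name: str
    data_type: str  # "Text", "Numeric" or "Enumeration" in this sketch
    version: str    # expected to match the data model release
    value_domain: Optional[frozenset] = None  # allowed values for enumerations

    def accepts(self, value: str) -> bool:
        """Check a candidate value against this element's representation."""
        if self.data_type == "Enumeration":
            return self.value_domain is not None and value in self.value_domain
        if self.data_type == "Numeric":
            try:
                float(value)
            except ValueError:
                return False
            return True
        if self.data_type == "Text":
            return True
        raise ValueError(f"unknown data type {self.data_type!r}")


def out_of_sync(elements: list, release: str) -> list:
    """Return names of elements whose version does not match the release."""
    return [e.name for e in elements if e.version != release]


if __name__ == "__main__":
    # Hypothetical enumeration values, used only to exercise the check.
    availability = DataElement("Availability", "Enumeration", "2.0.0",
                               frozenset({"Online", "Offline"}))
    print(availability.accepts("Online"), availability.accepts("Maybe"))
```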
Each described resource is assigned a unique resource identifier of the form scheme://authority/path, where “scheme” is “spase” for those resources administered through the SPASE framework, “authority” is the unique identifier for the naming authority within the data environment, and “path” is the unique local identifier of the resource within the context of the “authority”. Since SPASE manages all aspects of its data model, ISO-11179 administered item attributes such as “Submitting Organization” and “Stewardship Contact” are implied element attributes and are not maintained in the SPASE dictionary.
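Because the identifier has a URI-like scheme, authority and path, the standard library's URL parser is enough to split it. The sketch below assumes the identifier carries no query or fragment part and rejects anything that does; the identifier in the example is made up.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class ResourceID(NamedTuple):
    scheme: str     # "spase" for resources administered through SPASE
    authority: str  # naming authority within the data environment
    path: str       # unique local identifier within that authority


def parse_resource_id(text: str) -> ResourceID:
    """Split a scheme://authority/path identifier into its three parts."""
    parts = urlparse(text)
    if parts.scheme != "spase":
        raise ValueError(f"not a SPASE resource identifier: {text!r}")
    if parts.query or parts.fragment:
        raise ValueError(f"unexpected '?' or '#' component in {text!r}")
    path = parts.path.strip("/")
    if not parts.netloc or not path:
        raise ValueError(f"missing authority or path in {text!r}")
    return ResourceID(parts.scheme, parts.netloc, path)


if __name__ == "__main__":
    # Hypothetical identifier; the authority and path are invented.
    print(parse_resource_id("spase://ExampleAuthority/NumericalData/Example/PT1M"))
```

Keeping the authority separate from the path reflects the administration model: uniqueness of the path only has to be guaranteed within one naming authority.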
The Data Model specification is implementation neutral and does not require any particular implementation. However, the reference implementation chosen by the SPASE consortium uses XML (eXtensible Markup Language). The SPASE group generates an XML schema for each version of the data model and a corresponding XML Metadata Interchange (XMI) document, which can be used in Unified Modeling Language (UML) tools. There are also XML stylesheets for converting the metadata to HTML or OAI (Open Archives Initiative) formats. Tools for exploring the data model are also available. These include a dictionary search capability, a linear tree view and a tree-based explorer (available from http://www.spase-group.org/).
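To give a feel for the XML reference implementation, the following sketch builds and reads back a deliberately incomplete NumericalData description with Python's standard ElementTree module. The namespace URI and the small set of elements used are assumptions for illustration; a real description must follow the published XML schema for the targeted version.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed namespace URI for SPASE XML documents; check it against the
# published schema for the version you target.
NS = "http://www.spase-group.org/data/schema"
NSMAP = {"s": NS}


def q(tag: str) -> str:
    """Return a namespace-qualified tag name in ElementTree's {uri}tag form."""
    return f"{{{NS}}}{tag}"


def make_description(resource_id: str, name: str, version: str = "2.0.0") -> str:
    """Build a minimal NumericalData description (far from schema-complete)."""
    ET.register_namespace("", NS)  # serialize SPASE tags without a prefix
    root = ET.Element(q("Spase"))
    ET.SubElement(root, q("Version")).text = version
    data = ET.SubElement(root, q("NumericalData"))
    ET.SubElement(data, q("ResourceID")).text = resource_id
    header = ET.SubElement(data, q("ResourceHeader"))
    ET.SubElement(header, q("ResourceName")).text = name
    return ET.tostring(root, encoding="unicode")


def read_resource_id(xml_text: str) -> Optional[str]:
    """Extract the ResourceID of the NumericalData resource, if present."""
    root = ET.fromstring(xml_text)
    return root.findtext("s:NumericalData/s:ResourceID", namespaces=NSMAP)


if __name__ == "__main__":
    doc = make_description("spase://ExampleAuthority/NumericalData/Example", "Example data")
    print(doc)
    print(read_resource_id(doc))
```

The standard library does not validate against an XML schema; a third-party package such as lxml (its `XMLSchema` class) could be used for that step.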
Enhancements which led to Version 2.0 include the addition of terms to support both active and passive wave-related data, an important distinction for wave studies in space physics. The model for the components of multi-dimensional data was expanded from an exclusively Cartesian model (classic X, Y, Z) to support the angular components used in cylindrical and spherical coordinate systems. An Annotation resource was added to allow additional information to be associated with existing resources. This information can be typed to describe an event, anomaly or feature discovered in a resource. The Granule resource, which is used to describe individual files that are part of a conceptual resource (for example, a NumericalData resource), was enhanced to support the tight coupling of related sources (files), such as thumbnail and browse images, ancillary metadata, and layout information stored as separate files. Additional improvements were made to enhance the capability to describe resource associations. One resource can now be associated with another through the relationships Child, Event Of, Derived From, Observed By, Part Of, Revision Of and Other.
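The fixed list of association types lends itself to an enumeration, so that an unknown relationship is rejected when a description is read. The sketch below is hypothetical: the `Link` structure, the helper functions and the identifiers in the example are invented; only the seven relationship names come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Association(Enum):
    """Association types listed for Version 2.0 of the data model."""
    CHILD = "Child"
    EVENT_OF = "Event Of"
    DERIVED_FROM = "Derived From"
    OBSERVED_BY = "Observed By"
    PART_OF = "Part Of"
    REVISION_OF = "Revision Of"
    OTHER = "Other"


@dataclass(frozen=True)
class Link:
    """One directed association: ``source`` <relation> ``target``."""
    source: str
    relation: Association
    target: str


def make_link(source: str, label: str, target: str) -> Link:
    """Build a link from a text label; unknown labels raise ValueError."""
    return Link(source, Association(label), target)


def targets(links: list, source: str, relation: Association) -> list:
    """Return the resources that ``source`` is associated with via ``relation``."""
    return [link.target for link in links
            if link.source == source and link.relation is relation]


if __name__ == "__main__":
    # Hypothetical identifiers: an annotation marking an event in a data set,
    # and a derived product pointing back at its source data.
    links = [
        make_link("spase://Example/Annotation/Storm1", "Event Of",
                  "spase://Example/NumericalData/Mag"),
        make_link("spase://Example/NumericalData/MagAvg", "Derived From",
                  "spase://Example/NumericalData/Mag"),
    ]
    print(targets(links, "spase://Example/Annotation/Storm1", Association.EVENT_OF))
```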
Modification and improvement of the Data Model remain the prime tasks for the SPASE consortium as users report on their experiences in the use of the model. The SPASE group is also working to improve support and services, including better metadata editors, improved stylesheets, standardization of registry protocols, and tools to aid the migration of metadata to new versions of the data model and to other data model standards, such as that of the Planetary Data System (http://pds.nasa.gov). Other services will be developed as needs are identified. In addition, the documentation for the SPASE Data Model is undergoing a process of evolution to improve the information available to both new and experienced users. This includes tutorials, guideline documents, and references to examples and helpful auxiliary information.
Development is also proceeding on a number of services to be offered to the community (see http://www.spase-group.org/tools). One central tool is the SMWG (SPASE Metadata Working Group) Resource Registry, which serves as a ready source of information about the spacecraft, instruments, personnel, and other entities that can be referenced in data descriptions. There will also be interfaces to reports that could be useful for data descriptions. Once data have been found and extracted, data visualization capabilities are always useful, so services such as Autoplot for graphic display have been updated to understand SPASE Resource Identifiers for accessing data resources. A search capability across the space physics domain is also a valuable aid to those wishing to describe data. A centralized search service for NASA resources is located at the Virtual Space Physics Observatory (http://vspo.gsfc.nasa.gov). To carry out searches across agencies and between thematic virtual observatories, two Application Programming Interfaces are presently under consideration: the SPASE Query Language, or SPASE-QL (Narock and King 2008), and a Representational State Transfer (REST) interface that is part of the SPASE registry service (http://www.spase-group.org/tools/collection/).
Domain-specific data models and ontologies are making the seamless exchange of data across groups, agencies and international boundaries possible. With sufficient support (parsers, services, etc.), adoption can be easy. With full and concise documentation, utilizing SPASE technologies can be quick and (nearly) effortless. The SPASE group is working towards this goal. For more details, please visit the SPASE website at http://www.spase-group.org.
The SPASE community welcomes additional participation from those who have data in this domain or simply an interest in being involved in the data community. If you have resources such as data, event lists, or anything else that might be useful to the space physics data environment, contact one of the authors of this paper, representatives of the participating virtual observatories, or a member of one of the institutions listed in Table 1 to get started. If you have a lot of resources to share, consider establishing a personal virtual observatory within the domain and operating a SPASE registry in order to join the federated environment.
This work was supported in part by the National Aeronautics and Space Administration under Grant No. NNX09AF15G.