Introduction

The Japan Aerospace Exploration Agency (JAXA)’s Hayabusa2 spacecraft explored the C-type near-Earth asteroid (162173) Ryugu and collected ~5.4 g of samples in total from two surface locations (Morota et al. 2020; Tachibana et al. 2022; Yada et al. 2022). The samples were transported to the Extraterrestrial Sample Curation Center (ESCuC) of the Institute of Space and Astronautical Science (ISAS), JAXA (Abe 2021; Yada et al. 2023), where they were investigated through the initial description (Phase1 curation) in a non-destructive and contamination-free manner (Yada et al. 2022; Pilorget et al. 2022; Nakato et al. 2022). The collected samples were classified into three categories: (1) individual particles longer than 1 mm along the long axis, (2) aggregate samples consisting of particles shorter than 1 mm, and (3) gas samples stored in the gas tanks (Okazaki et al. 2017; Miura et al. 2022; Okazaki et al. 2022a).

The initial description (Yada et al. 2022; Pilorget et al. 2022; Hatakeda et al. 2023; Cho et al. 2022) includes the following preliminary analyses of each particle: (1) mass, (2) stereomicrograph, (3) size, (4) infrared reflectance spectroscopy, (5) infrared hyperspectral imaging using MicrOmega, (6) six-band multi-band imaging, and (7) shape modeling by stereo imaging. These descriptions were used to allocate samples to the project-led initial analysis teams and to the Phase2 curation teams for further analyses. Note that 10% of the total mass of the Ryugu particles was transferred to the curation office at NASA Johnson Space Center under the memorandum of understanding (MOU) between JAXA and NASA. The entire initial description dataset was assigned a single digital object identifier (DOI) as a curatorial dataset (https://doi.org/10.17597/ISAS.DARTS/CUR-Ryugu-description, ASRG et al. 2022). Assigning a DOI to the dataset is in line with the FAIR Data Principles (Wilkinson et al. 2016).

The JAXA Astromaterials Science Research Group (ASRG) developed a web-based curatorial database system, the Ryugu Sample Database System (RS-DBS; https://darts.isas.jaxa.jp/curation/hayabusa2/). The RS-DBS consists of a file server, a database server, and a web server on the Data Archives and Transmission System (DARTS) (Miura et al. 2000). Similar databases for curation purposes have been reported for managing geochemical datasets obtained through a series of analyses (Yachi et al. 2013; Uesugi et al. 2016). Researchers have used, and will continue to use, the RS-DBS to select the samples they propose to analyze through the JAXA Ryugu Sample Announcement of Opportunity (AO). All analytical results from the initial analysis teams, the Phase2 curation teams, and the AO-allocated community studies will be added to the RS-DBS.

The samples allocated to the six sub-teams of the initial analysis team were investigated for a year, from June 2021 to May 2022. Advanced curatorial activities and descriptions by the two Phase2 curation teams started in parallel. Major scientific outcomes have already been published (e.g., Nakamura et al. 2022a; Yokoyama et al. 2022; Ito et al. 2022; Nakamura et al. 2022b; Okazaki et al. 2022a; Okazaki et al. 2022b; Noguchi et al. 2022; Naraoka et al. 2023; Yabuta et al. 2023). After the first-year analytical campaign, the analyzed and/or processed samples were returned to ESCuC, with some exceptions: samples consumed by destructive analysis and samples made radioactive by neutron activation analysis. The returned samples will be available to the community through an upcoming AO.

In this report, we describe the contents and structure of the RS-DBS in detail.

Contents of the RS-DBS

The RS-DBS (Fig. 1) shows Ryugu particles with unique identification (ID) numbers together with selected analytical results (microscopic image, mass, FT-IR, MicrOmega, multi-band spectroscopy, and stereo imaging as of March 2023).

Fig. 1

The web interface of the RS-DBS summarizes a set of analytical information on each sample. Users can sort the center table and select samples by a specific item and by mass and size range. The table layout can be changed in the “Display style” panel at the top left (e.g., to the thumbnail style). The “Search constraints” panel on the left provides a keyword-based sample search function. The “Cart” checkbox in the left column of the table allows users to build a customized sample list. The table in this figure shows the search result for samples that include FT-IR, MicrOmega, multi-band images, and stereo images.

Sample information

Samples registered in the RS-DBS are classified as follows: (i) particles larger than 1 mm in size, (ii) aggregates (5–10 mg each, in dishes) consisting of particles smaller than 1 mm, (iii) gas samples extracted from the sample container (Miura et al. 2022; Okazaki et al. 2022a), and (iv) processed samples returned after analysis from research groups such as the Initial Analysis teams, the Phase2 curation teams, and the AO participants. The samples classified as (iv) are in various forms, such as ultra-thin sections prepared by FIB (focused ion beam) or UMT (ultramicrotome), polished epoxy mounts, indium-pressed mounts, potted-butt epoxy mounts, IOM (insoluble organic matter) extractions, residues after sub-sample processing (e.g., fragmented particles), and liquid solutions.

The name of a Ryugu sample consists of a prefix indicating the sample catcher used (i.e., A, B, or C) followed by a four-digit number, with numbers increasing in order of naming. A sub-sample derived from a parent sample is named after the parent, followed by a hyphen and a string of alphanumeric characters and underscores designated by the researchers. Duplicate names are not allowed. Curatorial staff or researchers may assign common names as nicknames for samples, which can be registered in the RS-DBS. The relationships between a parent sample and its sub-samples, as well as the nicknames, are recorded in the database.
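The naming rule can be illustrated with a short validation routine. The following is a minimal sketch rather than the RS-DBS implementation: the function names are hypothetical, and it assumes that a sub-sample may itself have sub-samples (i.e., that several hyphenated suffixes may be chained), which is not stated explicitly above.

```python
import re

# Catcher prefix (A, B, or C), four digits, then optional "-suffix" parts.
# Allowing more than one nested suffix is an assumption of this sketch.
SAMPLE_NAME = re.compile(r"([ABC])([0-9]{4})((?:-[A-Za-z0-9_]+)*)")


def parse_sample_name(name):
    """Split a sample name into catcher, serial number, and parent name."""
    match = SAMPLE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid sample name: {name!r}")
    catcher, number, suffix = match.groups()
    # A sub-sample's parent is everything before its last hyphen.
    parent = name.rsplit("-", 1)[0] if suffix else None
    return {"catcher": catcher, "number": int(number), "parent": parent}


def register(name, registry):
    """Add a validated name to the registry (a set), rejecting duplicates."""
    parse_sample_name(name)
    if name in registry:
        raise ValueError(f"duplicate sample name: {name!r}")
    registry.add(name)
```

With this sketch, a made-up sub-sample name such as `C0002-FIB_01` is accepted and reports `C0002` as its parent, whereas registering the same name twice raises an error.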

The sample descriptions to be registered are based on the previously established database system for the asteroid Itokawa samples (Uesugi et al. 2016). The descriptions are (1) the sample name, which is a unique ID following the nomenclature described above, (2) the sample size, i.e., the length (mm) and mass (mg) for a particle, the volume (mL) and pressure (Pa) in the bottle for a gas sample, and the volume (mL) for a liquid solution sample, (3) the analysis history (e.g., FT-IR, MicrOmega, and other data from the initial description) and the distribution sites (the Initial Analysis teams, Phase2 curation teams, the AOs, and NASA), (4) the names of the current sample container and storage, (5) the current status describing sample quality (e.g., kept in the clean chamber or exposed to the atmosphere, unprocessed or processed), (6) availability for the AO and current loan status, and (7) published papers related to the sample. Details are shown in Table 1.

Table 1 Sample descriptions available on the web. The descriptions contain the name, size or amount, analysis history, storage, condition, availability for the Ryugu Sample AO, and scientific reference information related to the sample

Analytical data

The analytical data derived from the initial description (Yada et al. 2022) are archived in the RS-DBS for each sample. As of March 2023, there are six types of measurements: (1) microscopic images (NIKON SMZ1270i, Miyazaki et al. 2023) with the Feret diameter for each particle; (2) the mass measured with an electronic microbalance (Mettler-Toledo XP4042, Miyazaki et al. 2023); (3) an infrared reflectance spectrum over the 1 to 5 µm wavelength range acquired with a Fourier transform infrared spectrometer (FT-IR, JASCO VIR-300, Hatakeda et al. 2023); (4) an infrared hyperspectral image acquired with MicrOmega (Pilorget et al. 2022), which is a 256 × 250 pixel image with 22 µm spatial resolution covering the 0.99 to 3.65 µm wavelength range; (5) multi-band images taken with the same filter set (ul: 0.39 µm, b: 0.48 µm, v: 0.55 µm, Na: 0.59 µm, w: 0.70 µm, x: 0.85 µm) as the Hayabusa2 optical navigation telescopic camera (ONC-T; Sugita et al. 2019; Cho et al. 2022); and (6) a 3D shape model by stereo imaging (Cho et al. 2022). The detailed items available on the web are shown in Table 2. All data processed in the initial description are stored in the file server on DARTS.

Table 2 Analytical data available on the web. There are six types of measurements derived from the Initial Description (Yada et al. 2022): (1) microbalance; (2) microscopic imaging; (3) FT-IR (Hatakeda et al. 2023); (4) MicrOmega (Pilorget et al. 2022); (5) multi-band spectroscopy (Cho et al. 2022); and (6) stereo-imaging system (Cho et al. 2022)

Structure of the RS-DBS

The RS-DBS consists of three main components: (1) a file server, (2) a Relational Database Management System (RDMS), and (3) a web interface (Fig. 2). We used the Data Archives and Transmission System (DARTS) at ISAS/JAXA as the file server. PostgreSQL is used for the database system, and an Apache server with PHP (PHP: Hypertext Preprocessor) and JavaScript is used for the web interface. These well-known open-source technologies are de facto standards, so they are expected to reduce system development and maintenance costs and to be stable in operation.

Fig. 2

Components and workflow of the RS-DBS. The RS-DBS consists of a relational database management system (RDMS), a file server, and a web interface/server. The workflow starts from the initial description work in the clean chamber. All analytical data are stored in the file server. Some data are processed further locally at the curation facility before being registered in the database server. The data in the file server are displayed through the web interface, and users can access the searchable sample catalog through the internet.

File server

All raw measurement data obtained in the initial description at the curation facility, i.e., the instrument output files without any post-processing, are archived and backed up on local disk drives. All work logs (operators’ handwritten notes), including the operators’ names, measurement dates, snapshots of the instrument parameters used, etc., are also stored on the local hard disk drive in PDF format. Almost all the raw data are processed to improve accessibility, usability, and comprehensibility for users. The available raw and processed data are stored in a file server on DARTS. The file server holds one directory per sample, named after the sample, and each sample directory contains measurement result directories that store the corresponding series of measurement data (Fig. 3). The data stored in DARTS will be archived securely for at least 30 years under the ISAS data security policy.

Fig. 3

Directory structure of the file server on DARTS. Analytical data files are generally named after the sample, and the data are organized by sample name. In the file server, the upper-level directories are named after the samples, and the lower-level directories are named after the analytical items. Under each analytical item directory, the processed data files and the data files output from the instrument are stored together.
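The directory layout described above can be expressed as a small path-building helper. This is only a sketch under stated assumptions: the archive root, the directory names for the analytical items, and the file name are hypothetical placeholders, since the actual names used on DARTS are not given here.

```python
from pathlib import PurePosixPath

# Hypothetical directory names for the six analytical items;
# the actual names on DARTS may differ.
ANALYSIS_DIRS = ("microscope", "mass", "ftir", "micromega", "multiband", "stereo")


def data_path(root, sample_name, item, filename):
    """Build <root>/<sample name>/<analytical item>/<file name>."""
    if item not in ANALYSIS_DIRS:
        raise ValueError(f"unknown analytical item: {item!r}")
    return PurePosixPath(root) / sample_name / item / filename


def group_by_sample(paths, root):
    """Map each sample name to its analytical items, given paths under root."""
    index = {}
    for path in paths:
        # The first two components below the root are sample name and item.
        sample_name, item = PurePosixPath(path).relative_to(root).parts[:2]
        index.setdefault(sample_name, set()).add(item)
    return index
```

For instance, `data_path("/archive", "A0001", "ftir", "spectrum.csv")` yields `/archive/A0001/ftir/spectrum.csv`. Raw and processed files for one analytical item share the same directory, so no further level is needed.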

Relational Database Management System (RDMS)

The RDMS of the RS-DBS manages a sample data table, measurement data tables, and a sample loan data table (Fig. 4). The ID in the sample data table is defined as a primary key (a PostgreSQL constraint) and is used to relate the data tables to one another. Each table has a subordinate table that describes its data files. The sample data table holds the essential information required for curation and sample allocation. The measurement data tables hold the descriptions of the measurements listed in Table 2. The sample loan data table records the history of sample distribution outside the curation facility and is used to manage on-loan samples. The subordinate tables for data files (i.e., the sample file data table, the measurement file data tables, and the sample loan file data table) register the addresses of files stored in the file server, linking each database record to its designated data file.

Fig. 4

Entity-Relationship (ER) diagram of the Relational Database Management System (RDMS) in the RS-DBS. An ER diagram is a visual model used in database design that represents the entities and their relationships within a database graphically. In this ER diagram, the detailed items within each table are hidden, and only the primary keys are shown, to give a clear view of the main structure and the connections between the tables in the RDMS. The Sample ID in the Sample Data Table is the primary key. It has a one-to-many relationship with the Sample File Data Table, each Measurement Data Table, and the Sample Loan Data Table. The Measurement ID in each Measurement Data Table is the primary key and has a one-to-one relationship with the subordinate Measurement File Data Table. The Sample Loan ID in the Sample Loan Data Table is the primary key and has a one-to-one relationship with the Sample Loan File Data Table.

The data are generally updated two or three times a month. We developed a user-friendly data registration interface for the RDMS that allows operators to register data with a spreadsheet form instead of SQL commands.
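The table relationships described in this section can be sketched as a small schema. The example below is an illustration only, not the RS-DBS schema: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for PostgreSQL so that it runs without a server, shows a single measurement table (FT-IR) in place of all six, and every table name, column name, and inserted value is hypothetical.

```python
import sqlite3

# One-to-many: sample -> sample_file, ftir_measurement, sample_loan.
# One-to-one: each file table reuses its parent's key as its own primary key.
SCHEMA = """
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE          -- duplicate names are rejected
);
CREATE TABLE sample_file (
    file_id   INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    path      TEXT NOT NULL                 -- address on the file server
);
CREATE TABLE ftir_measurement (
    measurement_id INTEGER PRIMARY KEY,
    sample_id      INTEGER NOT NULL REFERENCES sample(sample_id)
);
CREATE TABLE ftir_measurement_file (
    measurement_id INTEGER PRIMARY KEY
                   REFERENCES ftir_measurement(measurement_id),
    path           TEXT NOT NULL
);
CREATE TABLE sample_loan (
    loan_id   INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id)
);
CREATE TABLE sample_loan_file (
    loan_id INTEGER PRIMARY KEY REFERENCES sample_loan(loan_id),
    path    TEXT NOT NULL
);
"""


def create_demo_db():
    """Create the sketch schema in memory and insert made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO sample VALUES (1, 'A0001')")
    conn.execute("INSERT INTO ftir_measurement VALUES (10, 1)")
    conn.execute(
        "INSERT INTO ftir_measurement_file VALUES (10, 'A0001/ftir/spectrum.csv')"
    )
    return conn


def ftir_files(conn, sample_name):
    """Follow sample -> measurement -> file to list the linked file paths."""
    rows = conn.execute(
        """SELECT f.path
           FROM sample s
           JOIN ftir_measurement m ON m.sample_id = s.sample_id
           JOIN ftir_measurement_file f ON f.measurement_id = m.measurement_id
           WHERE s.name = ?
           ORDER BY f.path""",
        (sample_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

Reusing the parent key as the primary key of each file table is one simple way to enforce a one-to-one relationship; the production system may implement it differently.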

Web interface

The web interface (Fig. 1) is designed to give user-friendly access to the information: (1) users can select a view style for the interface, such as a sample list with or without images, a thumbnail list, or a detailed data sheet for each sample; (2) users can search for samples with specific characteristics such as name, form, measurement history, size range, and mass range; and (3) users can sort the sample list by specific characteristics. A search function for sample composition is not yet available; it could be added once the sample compositions have been categorized based on the initial description. The displayed table can be downloaded as a comma-separated values (CSV) file. An export function is also available to download all sample and measurement descriptions in a single JavaScript Object Notation (JSON) file.
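The search, sort, and CSV download functions described above can be mimicked offline on a list of sample records, for example records loaded from the JSON export. The following is a rough sketch under assumptions: the layout of the exported JSON is not specified here, so the field names (`name`, `mass_mg`, `size_mm`, `measurements`) are hypothetical, and the demo records are made-up values, not catalog entries.

```python
import csv
import io

# Made-up records for illustration only; not real catalog entries.
DEMO_SAMPLES = [
    {"name": "A0001", "mass_mg": 2.0, "size_mm": 1.5, "measurements": ["FT-IR", "MicrOmega"]},
    {"name": "A0002", "mass_mg": 8.0, "size_mm": 3.0, "measurements": ["FT-IR"]},
    {"name": "C0001", "mass_mg": 4.0, "size_mm": 2.0, "measurements": []},
]


def _in_range(value, bounds):
    """True if no bounds are given or value lies within (low, high) inclusive."""
    return bounds is None or (value is not None and bounds[0] <= value <= bounds[1])


def search(samples, mass_mg=None, size_mm=None, measured=(), sort_key="name"):
    """Filter by mass range, size range, and required measurements, then sort."""
    hits = [
        s for s in samples
        if _in_range(s.get("mass_mg"), mass_mg)
        and _in_range(s.get("size_mm"), size_mm)
        and set(measured) <= set(s.get("measurements", []))
    ]
    return sorted(hits, key=lambda s: s[sort_key])


def to_csv(samples, columns=("name", "mass_mg", "size_mm")):
    """Serialize the selected columns of the records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(samples)
    return buffer.getvalue()
```

For example, `search(DEMO_SAMPLES, mass_mg=(1, 5))` keeps the two demo records whose mass lies between 1 and 5 mg, and `to_csv` turns the result into text that can be saved as a CSV file.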

Concluding remarks

We have developed a web-based curatorial database system for the Hayabusa2-returned samples (RS-DBS) as a sample catalog from which users worldwide can choose suitable samples and propose them for a loan through the JAXA Ryugu Sample AO. The RS-DBS describes the curatorial information, each sample’s characteristics, data and analysis history, and sample loan status. The analytical data are stored securely in DARTS for the long term, no less than 30 years. We have assigned a DOI to the Ryugu sample dataset and continue to follow the FAIR Data Principles. With respect to these principles, there is still room for improvement in applying standardized and searchable metadata to individual analytical data and in assigning persistent identifiers (PIDs), such as the International Generic Sample Number (IGSN; IGSN organization), to individual samples.

The JAXA curation team is going to improve the web interface to make it more user-friendly. This will be one of the curation activities intended to maximize the science output from future returned samples, namely those from NASA’s Origins, Spectral Interpretation, Resource Identification, Security-Regolith Explorer (OSIRIS-REx) (Lauretta et al. 2019) and JAXA’s Martian Moons eXploration (MMX) (Usui et al. 2020) missions. The RS-DBS is expected to be improved through continuous data registration from Phase1 and Phase2 curation, the initial analysis, and the JAXA Ryugu Sample AO activities.