A cosmic-ray database update: CRDB v4.1

The cosmic-ray database, CRDB, has been gathering cosmic-ray data for the community since 2013. We present a new release, CRDB v4.1, providing many new quantities and data sets, with several improvements made on the code and web interface, and with new visualisation tools. CRDB relies on the MySQL database management system, jquery and table-sorter libraries for queries and sorting, and PHP web pages and AJAX protocol for displays. A REST interface enables user queries from command line or scripts. A new (pip-installable) CRDB python library is developed and extensive jupyter notebook examples are provided. This release contains cosmic-ray dipole anisotropy data, high-energy $\bar{p}/p$ upper limits, some unpublished LEE and AESOP lepton time series, many more ultra-high energy data, and a few missing old data sets. It also includes high-precision data from the last three years, in particular the hundreds of thousands of data points from the AMS-02 and PAMELA time series (time-dependent plots are now enabled). All these data are shown in a gallery of plots, which can be easily reproduced from the public notebook examples. CRDB contains 316,126 data points from 504 publications, in 4111 sub-experiments from 131 experiments.


Introduction
Owing to the quantity and variety of data gathered in cosmic-ray (CR) physics, a central shared database (DB) assuring data quality, completeness, and traceability is an asset for the community. Although the oldest datasets mostly have a historical value, the low-energy data still trace and give a unique perspective on the 11-year Solar cycle [e.g. 1,2], and may also be of unforeseen use in the future. (Contact: crdb@lpsc.in2p3.fr)
The Cosmic-Ray DataBase (CRDB) team has been distributing a growing body of CR data since its first public release in 2013 [3]. In a recent update, CRDB v4.0 [4], existing data on (groups of) ultra-heavy elements (Z > 30), upper limits on anti-nuclei (Z ≤ −2), and a selected sample of ultra-high-energy (UHE) CRs from ground experiments were included. In CRDB v4.0, the DB structure and the submission data format were also revised, and users were provided with a REST interface to extract both CR data and solar modulation levels (in their own codes and scripts), with overall more flexibility and more keywords to select the data queried.
In this release, CRDB v4.1, besides uploading data from the last three years (from AMS-02, CALET, DAMPE, PAMELA, etc.), we take advantage of an agreement with our colleagues from the KCDC DB (https://kcdc.iap.kit.edu/) [5] to complete our sample of UHECR data. We also add energy-dependent anisotropy data, including and extending those presented in [6]. We also correct the meta-data and provide a few unpublished low-energy lepton and positron fraction data from the LEE, AESOP, and AESOP-Lite balloon flights (operated over a 50-year time period). Because an incredibly large body of time-dependent data has been released by the AMS-02 experiment, we provide a new interface to ease the visualisation of these time series; these data are now the most numerous by far in CRDB. One of the main novelties of this release is a new standalone python library for the plotting of CRDB data, which should further ease their distribution and use by the community at large. We also took the opportunity of this release to fix some mistakes in the data and meta-data, and to improve the code (behind the scenes) and the web interface; the most important changes are documented and available on CRDB's webpage (https://lpsc.in2p3.fr/crdb), and briefly described later on.
The paper is organised as follows: Sect. 2 recalls the DB structure and the few changes made in this release; Sect. 3 presents the web interface and its novelties, and also introduces the new public python library to query and display CRDB data (outside of the website); Sect. 4 highlights the new data added in this version; we conclude in Sect. 5. Appendix A motivates our new (and hopefully more rational) convention for the energy units in CRDB.

Database structure
In CRDB, data are separated into two broad categories, namely the data (CR data points and data uncertainties) and the meta-data (data about the data): the latter include the data taking periods, the description of the experiment, links to the associated publications, etc. The DB structure, shown in Fig. 1, has only slightly changed since our last release. Its most important features are recalled below, and we use MONOSPACE font to easily identify the DB table names and keys.

Data points and energy axis (DATA table)
Data points are described in the DATA table (see Fig. 1). Each entry has a unique ID and corresponds to a measured VALUE or upper limit (if the boolean IS_UPPER_LIMIT is set to 1) within an energy bin [E_BIN_L, E_BIN_U] or at the mean energy bin value E_MEAN (footnote 3). The data point is also associated with a sub-experiment and publication via its SUBEXP_PUBLI_ID key (whose value points to a SUBEXP_PUBLI table entry, see Sect. 2.5).
To cover the different energy types provided in the original publications, the energy axis (E-AXIS) of each data point must be set to ETOT, EK, R, EKN, or ETOTN. These types correspond to and are given in units of, respectively, total energy E_tot in GeV, kinetic energy E_k in GeV, rigidity R in GV, kinetic energy per nucleon E_k/n = E_k/A in GeV, and total energy per nucleon E_tot/n = E_tot/A in GeV: as discussed in Appendix A, we changed the energy unit convention of E_k/n from GeV/n into GeV in this version. For the data, CRDB enables asymmetric statistical (VALUE_ERRSTAT_L and VALUE_ERRSTAT_U) and systematic (VALUE_ERRSYST_L and VALUE_ERRSYST_U) uncertainties.
Footnote 3: If only E_MEAN is provided in the publication, we set E_BIN_L = E_BIN_U = E_MEAN. If both E_BIN_L and E_BIN_U are provided but not E_MEAN, we set E_MEAN = (E_BIN_L × E_BIN_U)^{1/2}. Finally, some experiments define their last energy bin as all events above a given energy: in that case, we manually set an upper bin value at least 100 times the lower bin value.
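The bin-filling rules quoted for E_MEAN, E_BIN_L, and E_BIN_U can be sketched in a few lines of Python. This is only an illustration: the function name `fill_energy_bin` and its arguments are hypothetical, and the factor of exactly 100 for open-ended bins is an assumption (the text only says "at least 100 times").

```python
import math

def fill_energy_bin(e_mean=None, e_bin_l=None, e_bin_u=None, open_factor=100.0):
    """Return (E_BIN_L, E_MEAN, E_BIN_U) completed with the rules described above.

    Hypothetical helper, not part of CRDB; energies are in the unit of the chosen axis.
    """
    if e_bin_l is None and e_bin_u is None:
        if e_mean is None:
            raise ValueError("need E_MEAN or at least a lower bin edge")
        # Only the mean energy was published: collapse the bin onto it.
        return e_mean, e_mean, e_mean
    if e_bin_l is None:
        raise ValueError("an upper edge without a lower edge is not covered here")
    if e_bin_u is None:
        # Open-ended last bin: assumed upper edge of open_factor times the lower edge.
        e_bin_u = open_factor * e_bin_l
    if e_mean is None:
        # Geometric mean of the bin edges.
        e_mean = math.sqrt(e_bin_l * e_bin_u)
    return e_bin_l, e_mean, e_bin_u
```

The geometric mean is the natural bin centre when bins are evenly spaced in log-energy, which is the usual case for CR spectra.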

Quantities and conversions (CR_QUANTITY table)
The measured quantity is either a single CR quantity NUM_ID or a ratio of two CR quantities NUM_ID/DEN_ID, where both NUM_ID and DEN_ID point to entries in the CR_QUANTITY table. These entries are identified by an ID (set manually), a SYMBOL, and a NAME. The keys A, Z, and M_AMU (for the atomic mass number, charge, and mass in a.m.u.) are non-null for isotopes, only the key Z can be filled for elements, and all keys are set to zero for groups of elements (or compound quantities) and dipole anisotropy data.
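As a rough illustration of how the A, Z, and M_AMU keys distinguish the kinds of quantities, the sketch below classifies an entry from these three keys alone. The class `CRQuantity` and its `kind` method are hypothetical names, the example values are illustrative, and leptons or anti-nuclei may well be handled differently in the actual DB.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CRQuantity:
    """Simplified stand-in for a CR_QUANTITY entry (hypothetical class)."""
    symbol: str
    a: int = 0          # atomic mass number (key A)
    z: int = 0          # charge (key Z)
    m_amu: float = 0.0  # mass in a.m.u. (key M_AMU)

    def kind(self):
        # Isotopes: A, Z, and M_AMU are all non-null.
        if self.a != 0 and self.z != 0 and self.m_amu != 0.0:
            return "isotope"
        # Elements: only Z is filled.
        if self.z != 0:
            return "element"
        # Groups of elements, compound quantities, anisotropy: all keys are zero.
        return "group"
```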
In CRDB queries, the data conversion from one energy axis to another is enabled (see Table A.1 in [4]). The conversion is exact for individual fluxes of CR isotopes or leptons and for ratios of leptons, and also for $\bar{p}/p$ (this last conversion was not implemented in the previous release), but it is impossible for generic ratios, compound quantities, or anisotropy data. Nevertheless, an approximate conversion can still be enforced for fluxes of elements (or groups of elements) if these quantities have a CR isotope proxy; this proxy is enabled via the PROXY_ID key in the CR_QUANTITY table (this key was previously in a separate and redundant table that we removed in this release).
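For a single isotope, the exact conversion between energy axes follows from relativistic kinematics. The sketch below is a minimal illustration, not CRDB's implementation: the function names are hypothetical, natural units (c = 1) are used, and the value of the atomic mass unit is rounded. It converts a kinetic energy per nucleon to the other axes, and rescales a differential flux from EKN to R with the Jacobian dE_k/n/dR = |Z| p / (A E_tot).

```python
import math

AMU_GEV = 0.931494  # atomic mass unit in GeV (rounded)

def axes_from_ekn(ekn, a, z, m_amu):
    """All five energy axes (GeV, or GV for R) from kinetic energy per nucleon."""
    m = m_amu * AMU_GEV
    ek = ekn * a                      # total kinetic energy
    etot = ek + m                     # total energy
    p = math.sqrt(etot**2 - m**2)     # momentum
    return {"EKN": ekn, "EK": ek, "ETOT": etot, "ETOTN": etot / a, "R": p / abs(z)}

def ekn_from_rigidity(r, a, z, m_amu):
    """Inverse conversion: rigidity (GV) to kinetic energy per nucleon (GeV)."""
    m = m_amu * AMU_GEV
    p = r * abs(z)
    return (math.sqrt(p**2 + m**2) - m) / a

def flux_ekn_to_rigidity(flux_ekn, ekn, a, z, m_amu):
    """Rescale a differential flux dJ/dE_k/n into dJ/dR at the same point."""
    axes = axes_from_ekn(ekn, a, z, m_amu)
    p = axes["R"] * abs(z)
    return flux_ekn * abs(z) * p / (a * axes["ETOT"])
```

For elements, a single (A, M_AMU) pair is not defined, which is why a proxy isotope is needed and the conversion becomes approximate.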

Meta-data for experiments and modulation level (EXP, SUBEXP, and SUBEXP_IMAGE tables)
Definition and description. CR data are taken from experiments described in the EXP table (see Fig. 1). Each experiment has a TYPE (balloon, ground, or space), a unique ID (set internally in the DB), a name (EXPNAME), a starting year (DATE), and optionally a website (HTML); we stress that the experiment name is mainly used to better regroup and sort sub-experiments in the Experiments/Data website tab.
Sub-experiments (SUBEXP table) have an ID and are attached to a single experiment (EXP_ID). They make it possible to tag and distinguish, for the same experiment: (i) data obtained from different data taking periods; (ii) data taken from distinct sub-detectors or reconstructed from different analysis types; (iii) data obtained using external third-party models or different assumptions. Sub-experiments have a NAME (footnote 5), a short DESCRIPTION (detector or detection technique), additional INFO (e.g. location for balloon flights, GPS coordinates for ground-based detectors, etc.), and an IMAGE_ID (see next). For each sub-experiment, we also provide a single value (set to zero by default) for a possible energy-scale relative uncertainty (ESCALE_RELERR).
[Fig. 1, caption fragment: The meta-data (publication, experiment and sub-experiment names and infos) are stored in the EXP, SUBEXP, SUBEXP_IMAGE, and PUBLI tables, with SUBEXP_PUBLI a bridge table enabling to access and link these various meta-data. The ISOTOPE_PROXY table is used to define the rules for energy-axis conversions of CR fluxes (see App. A.4 of [4] or in the new 'Caveats/Tips' web page, see Sect. 3.1). The LOG_QUERIES table keeps track of the number and origin of the visits.]
In this release, we also added the new SUBEXP_IMAGE table (see Fig. 1). Previously, the detector images were kept in a separate directory with file names based on the EXPNAME and/or sub-experiment NAME keys. In the new table, we have the image itself (DATA key) with its unique ID key, along with a brief description if needed (DESCRIPTION key). This avoids storing duplicate images and makes it easier to check that an image is present for every sub-experiment.
Footnote 5 (sub-experiment names): [...] relevant, the Monte Carlo generator used to analyse the data (e.g. IceTop SIBYLL2.1 for UHECR data). The two exceptions to form the sub-experiment names are for the case of a combined analysis, i.e. names based on the concatenation of the two experiment names, e.g. IceCube+IceTop (2010/06-2013/05), and unnamed balloons (concatenation of Balloon and their flight dates). For the dates, the chosen format is YYYY/MM (shortened to YYYY if the month is unknown), with a single date for a shorter-than-a-month data taking period, e.g. Balloon (1966/05), or two dates otherwise, e.g. IMP7 (1973/05-1973/08); if the month is unknown, we only quote the year (or range of years).
Solar modulation level. Especially important for the interpretation of low-energy data (below a few hundred GeV), we must provide (i) the DISTANCE to the Sun of the sub-experiment (almost all experiments are at 1 a.u., but a few satellites, Ulysses and Voyager, have also taken data at different positions inside and outside the Solar cavity) and (ii) the exact list of start-stop DATES of the data taking periods. These two pieces of information allow us to calculate and fill SMALL_PHI, the average modulation level over the corresponding data taking periods, in the force-field approximation [7,8]. Actually, SMALL_PHI contains different estimates of ⟨ϕ(t)⟩, all calculated from the same neutron monitor data, but based on slightly different modellings: the values tagged [Uso05] and [Uso17] are based on monthly average public values from [9,10], while those tagged [Ghe17] are based on daily average values from [1]. In CRDB, all queried data are returned with their calculated SMALL_PHI value, but users are obviously free to discard or re-calculate it; by default, the returned values are [Ghe17], which can also be calculated for any time period from the Solar modulation tab (see Sect. 3.1).
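To make the role of the averaged modulation level concrete, the sketch below averages a daily ϕ(t) over a list of start-stop dates and applies the standard force-field formula to an interstellar spectrum expressed in kinetic energy per nucleon. Everything named here is an assumption made for illustration: `mean_phi`, `force_field`, and the toy spectrum `toy_lis` are hypothetical, and a plain daily average is only one possible definition of ⟨ϕ(t)⟩.

```python
from datetime import date, timedelta

M_P = 0.938272  # proton mass in GeV

def mean_phi(periods, phi_of_day):
    """Plain average of a daily modulation level over (start, stop) date pairs."""
    total, n_days = 0.0, 0
    for start, stop in periods:
        day = start
        while day <= stop:
            total += phi_of_day(day)
            n_days += 1
            day += timedelta(days=1)
    return total / n_days

def force_field(j_is, ekn, phi, a=1, z=1, m_nuc=M_P):
    """Top-of-atmosphere flux at kinetic energy per nucleon ekn (GeV).

    j_is: interstellar flux as a function of kinetic energy per nucleon,
    phi: modulation level in GV, m_nuc: mass per nucleon in GeV.
    """
    ekn_is = ekn + abs(z) / a * phi   # energy before entering the heliosphere
    # Ratio of squared momenta per nucleon at Earth and in interstellar space.
    ratio = ekn * (ekn + 2.0 * m_nuc) / (ekn_is * (ekn_is + 2.0 * m_nuc))
    return ratio * j_is(ekn_is)

def toy_lis(ekn):
    """Made-up interstellar proton spectrum, for illustration only."""
    return 1.0e4 * (ekn + M_P) ** -2.7
```

With phi = 0 the function returns the interstellar flux unchanged; for a falling spectrum, any positive phi lowers the flux at a given energy, most strongly at low energy.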

Meta-data for publications (PUBLI table)
Almost all data in CRDB are taken from peer-reviewed publications. The main exceptions are data from balloon flights before the 1990s, which were published only in the proceedings of the biennial International Cosmic-Ray Conference. Each publication is stored in the PUBLI table (see Fig. 1) with a unique ID (set internally) and an HTML key, taken to be the publication ADS (Astrophysics Data System) identifier (e.g. 2014A&A...569A..32M). This identifier allows us to retrieve and fill in a standardised manner the REF and BIBTEX keys via the ADS API (footnote 9). The original publications are stored in CRDB (for the administrators) but cannot be made publicly available because of publication rights.
Because some data sets are sometimes re-analysed and reported in a new publication, we set the SUPERSEDED_BY key of the obsolete one to the ID value of the new one (it is left empty if it is not superseded).This allows us to enforce that queries to CRDB always return the most recent data, discarding the deprecated ones.We nevertheless keep track of these superseded data in the 'Experiments/Data' tab (see Sect. 3.1), where old and new publications are shown.
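The effect of the SUPERSEDED_BY key on queries can be illustrated with plain Python dictionaries. This is a sketch under simple assumptions: the rows are made up, the helper names are hypothetical, and the actual selection in CRDB is done in the database queries.

```python
def current_publications(publis):
    """Keep only publications that are not superseded."""
    return [p for p in publis if p["SUPERSEDED_BY"] is None]

def latest_version(publi_id, publis):
    """Follow the SUPERSEDED_BY chain to the most recent publication ID."""
    by_id = {p["ID"]: p for p in publis}
    seen = set()
    while by_id[publi_id]["SUPERSEDED_BY"] is not None:
        if publi_id in seen:
            raise ValueError("cycle in SUPERSEDED_BY chain")
        seen.add(publi_id)
        publi_id = by_id[publi_id]["SUPERSEDED_BY"]
    return publi_id

# Made-up rows: publication 1 was re-analysed in 2, itself re-analysed in 3.
demo_publis = [
    {"ID": 1, "SUPERSEDED_BY": 2},
    {"ID": 2, "SUPERSEDED_BY": 3},
    {"ID": 3, "SUPERSEDED_BY": None},
    {"ID": 4, "SUPERSEDED_BY": None},
]
```

The cycle check is the kind of consistency test that is useful for maintenance, since a mistyped ID could otherwise make a chain loop forever.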

Tying data and meta-data (SUBEXP_PUBLI table)
The full description of the data requires the data themselves, the sub-experiment that measured them, and the publication where they appeared. The SUBEXP_PUBLI bridge table (see Fig. 1) handles situations where several sub-experiments are reported in the same publication. Each data set with a unique ID is tied to a sub-experiment (SUBEXP_ID) and a publication (PUBLI_ID). In addition, in this table, we keep track of the date at which each dataset was uploaded in CRDB (DATE_UPLOAD), and also of all CR QUANTITIES whose data were provided in this publication. While both these keys are unused in data queries, they are useful for maintenance and cross-checks of the DB.
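The role of the bridge table can be shown with a small in-memory database. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, keeps only a handful of the keys named in the text, and fills the tables with made-up rows (the sub-experiment names and the bibcode are placeholders). It is not the actual CRDB schema or query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE SUBEXP (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE PUBLI (ID INTEGER PRIMARY KEY, HTML TEXT, SUPERSEDED_BY INTEGER);
CREATE TABLE SUBEXP_PUBLI (ID INTEGER PRIMARY KEY, SUBEXP_ID INTEGER,
                           PUBLI_ID INTEGER, DATE_UPLOAD TEXT);
CREATE TABLE DATA (ID INTEGER PRIMARY KEY, SUBEXP_PUBLI_ID INTEGER,
                   E_MEAN REAL, VALUE REAL);

-- One publication reporting two sub-experiments (placeholder values).
INSERT INTO SUBEXP VALUES (1, 'DemoExp (2000/01)'), (2, 'DemoExp (2001/01)');
INSERT INTO PUBLI VALUES (1, 'PLACEHOLDER_BIBCODE', NULL);
INSERT INTO SUBEXP_PUBLI VALUES (1, 1, 1, '2023-01-01'), (2, 2, 1, '2023-01-01');
INSERT INTO DATA VALUES (1, 1, 1.0, 10.0), (2, 1, 2.0, 5.0), (3, 2, 1.0, 8.0);
""")

# Each data point reaches its sub-experiment and publication through SUBEXP_PUBLI;
# superseded publications are skipped.
rows = con.execute("""
SELECT s.NAME, p.HTML, d.E_MEAN, d.VALUE
FROM DATA d
JOIN SUBEXP_PUBLI sp ON d.SUBEXP_PUBLI_ID = sp.ID
JOIN SUBEXP s ON sp.SUBEXP_ID = s.ID
JOIN PUBLI p ON sp.PUBLI_ID = p.ID
WHERE p.SUPERSEDED_BY IS NULL
ORDER BY s.NAME, d.E_MEAN
""").fetchall()
```

Because the data point only stores SUBEXP_PUBLI_ID, the same publication can be attached to any number of sub-experiments without duplicating publication meta-data.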

Web interface and queries
CRDB runs on free open-source software with a classical LAMP solution: Linux operating system, Apache HTTP server, MySQL database, and PHP scripting language. The server is hosted at the LPSC laboratory, and was recently migrated to more recent versions of the operating system, the DB, and PHP. The DB RAM was extended from 512 MB to 2048 MB to handle the larger requests from the newly added time-series data (see Sect. 4.6). The CRDB website is organised in tabs providing different entry points to explore the DB data and meta-data. The webpages use the AJAX (asynchronous JavaScript and XML) web development technique for efficiency and speed. In addition to the few improvements made on the existing website tabs, we added two new ones in this release (see Sect. 3.1). To query, sort, and show the DB content, the web interface relies on jquery, jquery-ui, jquery.cluetip, and table-sorter. There are two ways for users to query data: either from the Data extraction tab (see below) or from a direct command-line call (bypassing the website) via the REST interface (also see below). The latter functionality has been fully exploited in this release, with the development of a new dedicated CRDB python library. This library is described and used to generate a gallery of plots in Sect. 3.2.
Footnote 9: https://github.com/adsabs/adsabs-dev-api

Web pages: content and novelties
We briefly describe below the content and noteworthy improvements made on the tabs. We also added a new tab to list a few caveats and tips related to the data preparation and transformations.
- Welcome tab: entry point of the website, where the DB content, tools, people involved, code status, etc. are highlighted. In this release, we also added a gallery of plots to advertise the variety of data in CRDB.
- Caveats/Tips tab: there are a few subtleties in the way the data (and meta-data) are handled in CRDB. Indeed, at the collection stage, the information on the data is sometimes partial, and somewhat subjective choices need to be made to be able to implement them nonetheless. Then, at the query stage, combinations and conversions are enabled, with some degree of approximation as well. Users probably do not pay a lot of attention to these details, and this is probably fine most of the time. Whereas the details and caveats about these procedures are made explicit in the CRDB publications [3,4], the most relevant ones are gathered here in one place. This should help users identify data for which going back to the original publication is necessary.
- Data extraction tab: queries of user-selected CR quantities with various options (sub-experiment names, dates, energy unit, etc.). The retrieved data include the ones matching exactly the query but also, if selected, extra sets based on energy conversions (Table A.1 of [4]) and data combinations (App. A of [3]); we added in this release the trivial but forgotten transformation rule to get Y/X from data published as X/Y. The data retrieved are then plotted and listed in a pop-up window and can be downloaded in various formats: in this release we added an extra option, 'csv (as import)', enabling to retrieve the data and all their meta-data (format similar to the one described in the Submit data tab, see below) [...]
- Solar modulation tab: [...] Behind the scenes, a cron scheduler downloads NM data daily from NMDB. It also calculates the associated ϕ_FF, whose values can be retrieved for a selected time period and resolution (from 10 minutes up to a month), either directly from this tab, or from a REST interface. In this release, we fixed several minor bugs (as listed on the website), and more importantly, we fixed the broken REST interface and the daily update.
- Submit data tab: how to format and send a csv file to CRDB.
- Useful links tab: online resources related to CR data.
- Admin tab: maintenance tools to check broken or inconsistent entries and missing meta-data, detailed procedure to upload data in the DB. This tab is restricted to authenticated users (i.e. CRDB maintainers).
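The Y/X transformation rule mentioned for the Data extraction tab amounts to inverting a published ratio X/Y. How CRDB propagates the uncertainties is not stated in the text; the sketch below makes the assumption that the edges of the uncertainty interval are simply mapped through the inversion, which swaps the lower and upper errors. The function name is hypothetical.

```python
def invert_ratio(value, err_lo, err_up):
    """Return (1/value, new_err_lo, new_err_up) for Y/X from a published X/Y.

    Assumption: the interval [value - err_lo, value + err_up] is mapped
    edge by edge, so the upper edge of X/Y becomes the lower edge of Y/X.
    """
    if value <= 0.0 or value - err_lo <= 0.0:
        raise ValueError("ratio and its lower edge must be positive")
    inverse = 1.0 / value
    new_err_lo = inverse - 1.0 / (value + err_up)
    new_err_up = 1.0 / (value - err_lo) - inverse
    return inverse, new_err_lo, new_err_up
```

For example, a ratio of 0.5 with symmetric errors of 0.1 becomes 2.0 with a lower error of about 0.33 and an upper error of 0.5: the errors are no longer symmetric after inversion. A first-order propagation (error divided by the squared ratio) would be an equally defensible choice for small uncertainties.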

Python access to CRDB (and notebook)
CRDB provides a REST interface, which can be used from any programming language to automate downloading and processing data in scripts and programs. A tutorial on how to do this is available. Since Python is the dominant scripting language for data processing, we further provide a ready-made solution for Python users that simplifies and standardises queries from scripts. Users of this library do not need to learn the REST API; the library handles it internally. The corresponding Python package, called crdb, can be installed with the standard tool pip from the Python Package Index. The main function is crdb.query, which performs a query to the database through keyword arguments; these are validated internally so that user errors are caught early and clear error messages are returned. The tabular output of a query is transformed by this function into a structured Numpy array [11], which allows for efficient processing in Python. Each query is automatically cached to disk for 30 days, to accelerate repeated calls to crdb.query (which are frequent during the development of a script or program) and to reduce the load on the server. Further utility functions allow users to easily generate lists of citations for the data sets they queried from the DB. All functions are well documented, and the documentation can be accessed with Python's built-in help() command.
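The idea of caching each query on disk for 30 days can be sketched with the standard library alone. This is not the crdb package's implementation: `cached_query`, its arguments, and the JSON file layout are hypothetical, and `fetch` stands for whatever function actually contacts the server.

```python
import hashlib
import json
import time
from pathlib import Path

MAX_AGE = 30 * 24 * 3600  # 30 days in seconds

def cached_query(params, fetch, cache_dir, max_age=MAX_AGE, now=time.time):
    """Return fetch(params), reusing a result stored on disk if it is recent enough.

    params: dict of query keywords (must be JSON-serialisable),
    fetch: callable doing the real (slow) request, hypothetical here,
    now: clock function, replaceable for testing.
    """
    # Identical keyword sets map to the same cache file.
    key = hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()
    path = Path(cache_dir) / f"{key}.json"
    if path.exists() and now() - path.stat().st_mtime < max_age:
        return json.loads(path.read_text())
    result = fetch(params)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result
```

Keying the cache on the sorted keyword arguments means that repeated runs of the same script hit the disk instead of the server, while any change in the query triggers a fresh request.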
The Python package also provides a command-line interface, which allows users to perform queries and store the results in one of the ASCII formats supported by the CRDB data extraction system. In this case, the query is specified using command-line arguments, the latter mirroring those of crdb.query. Example code on how to make standard plots in Python can be found in the gallery, and we show in Figs. 2 and 3 a few plots illustrating the variety, coverage, and completeness of CRDB's data. More plots are shown in the next section, and all of them are available from CRDB's public gallery notebook.

New datasets in CRDB v4.1
In addition to regular data updated since the last release (Sect. 4.1), the content of CRDB has evolved in several directions. In this release, we (i) add dipolar anisotropy data (Sect. 4.2); (ii) take advantage of a partnership with KCDC to gradually move from a limited sample to completeness of UHECR data (Sect. 4.3); (iii) include high-energy upper limits on antiproton fluxes from ground experiments (Sect. 4.4); (iv) correct and complete low-energy lepton data from the LEE, AESOP, and AESOP-Lite balloons flown over 50 years (Sect. 4.5); (v) expand time series data thanks to the recently released AMS-02 daily and PAMELA monthly data (Sect. 4.6).

Data uploaded since CRDB v4.0
Many data from AMS-02, CALET, DAMPE, etc. have been published since our last release. These data sets should ideally have been uploaded in CRDB shortly after their publication, but were only prepared for this release. We also took the opportunity of this release to upload a few old datasets that were not yet in CRDB. Rather than a detailed and cumbersome description of all these new data sets, which are listed in Table 1, we prefer to highlight below some of their most salient features.
[Table 1, fragment of the list of quantities: 63,6Cu/Cu, 64,66,67,68,70Zn/Zn, 67,69,71Ga/Ga, 70,71,72,73,74,76Ge/Ge, 73,75As/As, (80+82)Se/Se, 79,81Br/Br, (74+75+76+77+78)Se/Se, (78+79+80+81+82)Kr/Kr, (83+84+85+86)Kr/Kr, (84+85+86)Sr/Sr, (87+88)Sr/Sr]
To start with, the first 7 years of AMS-02 data [28], along with other publications by the AMS collaboration [29,30,31], all uploaded in this release, now provide the most comprehensive set of data from a single experiment. These data are in the GV to TV rigidity range, and correspond to fluxes and ratios of leptons, antiprotons, and nuclei from H to Si, plus Fe. Moreover, in addition to the above AMS-02 data, we have uploaded the recent CALET [33,34,35,36,37], DAMPE [39,40,41], ISS-CREAM [43], and NUCLEON [44,45,46,47,48] data, which provide the most precise set of direct measurement data in the TeV domain and above; these data are key to investigate possible breaks and features in the spectra, and the consistency between direct and indirect measurement data.
Some of the new data sets uploaded also explore in a unique way the composition of ultra-heavy CRs (UHCR). Indeed, recent ACE-CRIS data [27] unveil the isotopic content of CR elements Z = 30−38, complementing the elemental fractions measured by Tiger and SuperTiger (already in CRDB); a further extension to the range 41 ≤ Z ≤ 56 should soon be available from SuperTiger [52]. For even heavier (and rarer) elements, very few experiments have provided data so far. In addition to Ariel6, HEAO3-HNE, UHCRE-LDEF, and Trek data (already in CRDB), we added the Skylab data [50].
The last piece of UHCR data that we decided to add in this release are those from the OLIMPIYA experiment. The latter uses olivine crystals contained in stony-iron meteorites (pallasites) as CR detectors. At variance with satellite experiments that provide measurements of ultra-heavy GCRs accumulated over an exposure time of a few years, the OLIMPIYA experiment provides measurements of GCRs accumulated over up to hundreds of Myr; these two complementary techniques offer a glimpse of the GCR time evolution.
The OLIMPIYA data uploaded in this release are taken from [24,53], which supersede a previous analysis presented in [54].

Anisotropy data
Ground-based detectors with high event statistics allow the study of anisotropies in the arrival directions of CRs. Of particular interest here is the dipole anisotropy predicted by diffusion theory, which allows us to study the nearby CR source distribution and diffuse CR transport in our local magnetic environment [e.g., 6].
While the true dipole anisotropy is represented by an amplitude and two phases, the data-driven reconstruction method of ground-based observatories allows only the reconstruction of the projection of the dipole vector onto the equatorial plane. Conventionally, this projection is characterised by the (projected component of the) amplitude and the phase in right ascension. These new dipole anisotropy data are indicated in the DB by two new entries, namely DipoleAmplitude and DipolePhase; we have chosen a convention where DipolePhase ∈ [−180 [...]
The dipole data in terms of total energy ETOT are shown in Fig. 4. Note that the limited statistics of CR experiments in the PeV-EeV energy region have so far only yielded upper limits on the dipole anisotropy. In the DB, we indicate this by providing both the best amplitude and its upper limit as separate entries. As visible in Fig. 4, the dipole amplitude and phase data from different observatories can show strong deviations beyond statistical uncertainties. This is related to hidden (and often unquantified) systematic effects, corresponding to the partial sky coverage of experiments and the reconstruction method.
Furthermore, experimental collaborations often provide a number of updates of their anisotropy studies as the event statistics accumulate. We have chosen to include all the data publicly available, but note that the later data sets are usually meant to supersede the earlier ones. Finally, note that some of the (especially older) data have been extracted from publications that give rather limited information on the methodology used. We have chosen to include these at face value, but recommend exercising caution when using these data for quantitative studies. The experiments and associated references for all these data are gathered in Table 2.
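The projection of a full dipole vector onto the equatorial plane, as described for the anisotropy data, can be written in a few lines. The sketch assumes equatorial Cartesian components (x towards right ascension 0°, z towards the north celestial pole); the function names are hypothetical, and the phase range produced by `atan2`, (−180°, 180°], is a property of this sketch rather than a statement about the convention stored in CRDB.

```python
import math

def equatorial_projection(dx, dy, dz):
    """Projected amplitude and right-ascension phase (degrees) of a dipole vector.

    dz, the component along the Earth's rotation axis, is the part that
    ground-based reconstructions cannot access; it is ignored here.
    """
    amplitude = math.hypot(dx, dy)
    phase = math.degrees(math.atan2(dy, dx))
    return amplitude, phase

def projection_from_direction(amplitude, ra_deg, dec_deg):
    """Same projection, starting from the full amplitude and dipole direction."""
    projected = amplitude * math.cos(math.radians(dec_deg))
    dx = projected * math.cos(math.radians(ra_deg))
    dy = projected * math.sin(math.radians(ra_deg))
    return equatorial_projection(dx, dy, 0.0)
```

A dipole pointing at declination δ thus appears with an amplitude reduced by cos δ, which is one reason why projected amplitudes are lower bounds on the true dipole amplitude.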

UHECR data from KCDC
Considering the vast number of academic databases and search engines for locating and accessing published scientific data, unified access to published datasets and spectra is still in its early stages. This is due to the large variety of experiments and thus the large variety of measured data. In cooperation with CRDB, the 'KASCADE Cosmic-ray Data Centre' (KCDC) is taking a step towards simplification, by embedding the UHECR data from KCDC, i.e. data from extensive air shower experiments, into CRDB. The advantage of such an extensive collection of UHECR data is that data from other experiments can be obtained relatively quickly. KCDC is already a demonstrator and partner of PUNCH4NFDI (https://www.punch4nfdi.de/), the consortium of particle, astroparticle, astro-, hadron and nuclear physics within the German National Research Data Infrastructure, NFDI, which aims to unify the methodical approach of open data in this field.
[Fig. 4: Equatorial dipole amplitude and phase of the CR anisotropy inferred by various experiments (see Table 2 for references). The figure is available from the gallery notebook.]
The KCDC is a web-based interface where initially the scientific data from the completed air-shower experiment KASCADE-Grande was made available for the astroparticle community as well as for the interested public. Besides a DataShop to download the reconstructed data of KASCADE-Grande and the meta-data, KCDC offers more than 100 cosmic-ray spectra from about 25 different ground-based high-energy CR experiments published between 1984 and 2021 for download. The data sets available cover an energy range from about 10^12 eV to more than 10^20 eV for all-particle spectra (keyword AllParticles in CRDB) as well as for mass groups like p, He up to Fe or heavy and light respectively, derived from the unfolding procedure for different high-energy interaction models like QGSJet, EPOS and also SIBYLL, mostly embedded in the CORSIKA simulation package: CORSIKA (COsmic Ray event SImulation for KAscade) was written especially for KASCADE and has been extended since then to become the world's standard simulation package in the field of cosmic-ray air shower simulations.
While the KASCADE-Grande experimental data in KCDC are also accessible via an API, the spectra points and metadata, stored in a postgres database, can only be selected and displayed on the website after registration. Thus, a partnership with CRDB was set up with the aim of creating a basis for this data exchange and providing the community with a common interface to these merged spectra data. The KCDC data sets are now being reformatted to meet the requirements of CRDB, to supplement its very extensive content with data from ground-based air shower experiments. The spectra uploaded on CRDB at the time of this release are listed in Table 3; they represent about 50% of the full data being prepared, and a sample of these data can be seen in Fig. 2. To match the requirements of UHECR measurements, the data quantity list DATA_QTY had to be extended by two more groups, the He-C-group and the Si-Fe-group.
To find out more about the real meaning of the particle spectra like helium, oxygen and so on, their mixtures as well as the mixtures of different high-energy interaction models, users should refer to the original papers.

Upper limit on high-energy $\bar{p}/p$
With the angular resolution of ground cosmic-ray detectors reaching below the degree level in the 1990s, it became possible to observe a deficit of events from the direction of the Moon or the Sun (∼ 0.5°): the Moon or Sun shadow technique was first used to calibrate their angular resolution and pointing accuracy. Actually, the position of the shadow is offset from the true location of the blocking bodies owing to the deflection of cosmic rays in the geomagnetic field, with the shadow shifted westward (resp. eastward) for positively (resp. negatively) charged particles. This allowed several experiments to set upper limits on the $\bar{p}/p$ ratio above TeV energies [129,130,131,132,133,134].
These upper limits were added in CRDB, along with the older upper limits obtained from the observed charge ratio of muons [135]. These new datasets are shown in Fig. 5 and listed in Table 4.
[Fig. 5, caption fragment: [...] 4). The figure is available from the gallery notebook.]

LEE, AESOP, and AESOP-Lite balloon flights
From 1968 to 2011, the LEE (Low Energy Electrons) balloon-borne instrument [136] was launched over 35 times. LEE provided the longest series of CR electron measurements (e− + e+) over a time period that covers about four solar cycles. These data are particularly relevant to the study of the solar modulation of electrons with energies up to about 20 GeV. In CRDB v4.1, we reorganized the existing LEE data from 1968 to 1994. Data points taken from figures were updated with the actual values when private communication with the authors was possible. Data post-1994 were also added to the database. Indeed, the spectra for the years 1997 to 2000 were never fully published. However, flight data were analyzed using the same method as that outlined in [137], and the spectrum values at 1.2 GeV only were published in [138]. The full spectra for these years were provided by the authors (Paul Evenson, 2023) and uploaded in CRDB. These data are shown in the top panel of Fig. 6 along with other measurements from experiments at similar energies. We also show on this plot time series of He (second panel), NM count rates (third panel), and Solar modulation values calculated from these count rates (fourth panel).
From 1994 to 2011, the AESOP (Anti-Electron Sub Orbital Payload) balloon-borne instrument [139] flew on multiple occasions with the primary objective of studying the charge-sign dependence of the solar modulation of electrons from a few hundred MeV to a few GeV. In CRDB v4.1, we reorganized the existing AESOP e+/(e− + e+) data and updated [...]
The AESOP-Lite apparatus is the successor of LEE and AESOP. Its primary objectives are to search for the origin of low-energy electrons in the electron spectrum between 20-300 MeV, and to provide a baseline electron spectrum at 1 au for the measurements of the Voyager probes currently transmitting data from outside the heliosphere. The e−, e+, and e+/(e− + e+) data from AESOP-Lite's maiden flight from Sweden in 2018 [145] were added to CRDB; future data will be added too.
The metadata of all these balloon flights were updated using information from the original publications. When not available, the information from the stratospheric balloon flight catalogue StratoCat was used. The balloon flight names as encoded in CRDB, along with the associated publications, are listed in Table 5.
Thanks to its large acceptance and high statistics, AMS-02 was able, for the first time, to provide daily averaged fluxes of H, He, and He/H from 2011 to 2019 [156,157], and e− from 2011 to 2021 [158]: these data are now the dominant body of data in CRDB, with about 200 000 data points over ∼ 3000 days.
We also added the recently published He time series of PAMELA from 2006 to 2013. Owing to its smaller acceptance and statistics, the data were averaged over one Carrington rotation (∼1 month) in the first three years [159], and over three Carrington rotations later because of a random failure of a few front-end chips in the tracking system [...] particularly significant after 2009 [160]; this corresponds to ∼3000 new data points in CRDB (in E_k/n and R), as retrieved from the 'CRDB@ASI' database 21 [161]. We also added a few positron fraction data points taken from three different time periods [162]. The latter paper also provides 3-month averages (2006-2016) of the e+/e− ratio, but normalised to the unspecified 2006 value, so we did not add them to CRDB.
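To make the time granularity of these series concrete, the sketch below groups a daily series into bins of one or several Carrington rotations. It only illustrates the binning: it is not the procedure used by the PAMELA collaboration, the function name and the input series are hypothetical, and the mean synodic rotation period of 27.2753 days is an assumption not stated in the text above.

```python
from collections import defaultdict

CARRINGTON_DAYS = 27.2753  # assumed mean synodic Carrington rotation period (days)

def rebin_by_rotation(days, fluxes, n_rot=1, t0=0.0):
    """Average (day, flux) samples in bins of n_rot Carrington rotations.

    days are times in days since an arbitrary reference; t0 is the start of
    the first bin. Returns a time-sorted list of
    (bin_centre_day, unweighted_mean_flux, n_samples).
    """
    width = n_rot * CARRINGTON_DAYS
    bins = defaultdict(list)
    for t, f in zip(days, fluxes):
        bins[int((t - t0) // width)].append(f)
    return [
        (t0 + (k + 0.5) * width, sum(v) / len(v), len(v))
        for k, v in sorted(bins.items())
    ]

# Made-up daily series: 60 days of a slowly rising flux (arbitrary units).
days = list(range(60))
fluxes = [100.0 + 0.5 * d for d in days]

one_rotation = rebin_by_rotation(days, fluxes, n_rot=1)
three_rotations = rebin_by_rotation(days, fluxes, n_rot=3)
```

The mean is unweighted for simplicity; a real analysis would weight each sample by its exposure or uncertainty.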
To better visualise these data, we added a new query option in the web interface to plot data as a function of time (instead of energy). The direct benefit is that data from similar energy bands can be followed over long time periods. This is illustrated in Fig. 7, available from the gallery notebook 15.
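The idea behind a time-dependent display can be sketched as follows: keep only the data points whose energy bin lies inside a chosen band, and order them by the mid-time of their data-taking period. This is a minimal illustration with made-up records and a hypothetical record layout, not the code of the web interface or of the CRDB python library.

```python
from datetime import datetime

# Hypothetical records: (e_low, e_high [GeV], t_start, t_stop, value).
points = [
    (1.0, 1.5, datetime(2011, 6, 1), datetime(2011, 6, 2), 12.1),
    (10.0, 15.0, datetime(2011, 6, 1), datetime(2011, 6, 2), 0.31),
    (1.0, 1.5, datetime(2009, 1, 1), datetime(2009, 2, 1), 15.4),
    (1.1, 1.4, datetime(2015, 3, 1), datetime(2015, 3, 2), 13.0),
]

def time_series(points, e_min, e_max):
    """Return (mid_time, value) pairs for energy bins inside [e_min, e_max], sorted by time."""
    series = []
    for e_lo, e_hi, t0, t1, value in points:
        if e_lo >= e_min and e_hi <= e_max:
            # Mid-point of the data-taking period.
            series.append((t0 + (t1 - t0) / 2, value))
    return sorted(series)

series = time_series(points, 1.0, 1.5)
values = [v for _, v in series]
```

The resulting pairs can be passed directly to any plotting tool with time on the horizontal axis.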

Conclusions and future releases
We have presented in this paper CRDB v4.1, an update of the CR database hosted at LPSC. On the technical side, this update involved a migration of the CRDB server and a slight simplification of the DB structure. On the code side, a few minor bugs have been fixed, the queried data can now be returned in a more complete csv format (which includes all metadata), and we fixed a missing combination rule for the data. On the web interface side, we added a new plotting capability to display CRs as a function of time, and added two new tabs: one lists all caveats related to the preparation of the data uploaded in CRDB and to the (sometimes approximate) transformation rules applied to the queried data; the other provides a gallery of plots advertising and illustrating the diversity of CRDB data. This gallery and many other plots can be generated from our new public python CRDB library, and notebook examples are provided on the git page 15.
21 https://tools.ssdc.asi.it/CosmicRays/
On the content side, we enlarged the scope and content of CRDB, with the addition of dipole anisotropy data, high-energy upper limits on p̄/p, a large number of UHECR datasets, and also time series data. The latter include recently released AMS-02 daily and PAMELA monthly data, but also yearly data from the LEE, AESOP, and AESOP-Lite balloons taken over a 50-year period. We also updated CRDB with all the GCR data published in the last three years, and added a couple of older data sets that had escaped our attention until now.
The path to future developments is not very clear and also depends on the feedback from the community. Indeed, CRDB now accounts for most galactic and extragalactic CR data, in terms of quantities that can be cast as 1D data vectors (as opposed to skymaps or higher-dimension datacubes). Missing datasets should consist mostly of old time series from satellite experiments, which are difficult both to track and to retrieve from the publications: owners and authors of such datasets are welcome to get in touch with us. If need be, other quantities related to UHECR data could also be added in the future, like ⟨ln A⟩. In any case, looking at present and future high-precision CR data, we stress that the current format to store uncertainties in CRDB is already limited and should probably be improved at some time in the future. Indeed, data from the last generation of CR detectors already come with broken-down contributions from various systematics, whereas only the total systematics can be stored in CRDB. This issue will worsen when covariance matrices of uncertainties start to be released as well (as is already the case, for instance, for the most recent Pierre Auger data).
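The limitation can be illustrated with a small numerical sketch. Assuming independent systematic sources combined in quadrature (an assumption made here for illustration), a total-only format keeps one number per bin, which matches the diagonal of the covariance matrix but carries no information on bin-to-bin correlations. The breakdown, the source names, and the constant-correlation model below are all made up.

```python
import math

# Made-up systematic breakdown for three energy bins (same units as the flux).
syst = {
    "acceptance": [0.30, 0.20, 0.10],
    "unfolding": [0.40, 0.10, 0.05],
}

def total_systematic(breakdown):
    """Quadrature sum per bin: the single number a total-only format keeps."""
    n = len(next(iter(breakdown.values())))
    return [math.sqrt(sum(s[i] ** 2 for s in breakdown.values())) for i in range(n)]

def covariance(breakdown, rho):
    """Covariance matrix summed over sources.

    Diagonal terms are sum_k s_k[i]^2; off-diagonal terms are
    sum_k rho[k] * s_k[i] * s_k[j], where rho[k] is an assumed constant
    bin-to-bin correlation coefficient for source k.
    """
    n = len(next(iter(breakdown.values())))
    return [
        [
            sum((1.0 if i == j else rho[k]) * s[i] * s[j] for k, s in breakdown.items())
            for j in range(n)
        ]
        for i in range(n)
    ]

total = total_systematic(syst)
# Assume a fully correlated acceptance and an uncorrelated unfolding error.
cov = covariance(syst, rho={"acceptance": 1.0, "unfolding": 0.0})
```

Here `cov[i][i]` equals `total[i] ** 2`, whereas the off-diagonal terms depend on the assumed correlations and cannot be recovered from `total` alone.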
The CRDB team will continue uploading newly published CR data, but we also encourage collaborations to prepare their data in the CRDB submission format if they wish them to be distributed quickly via CRDB. Comments, questions, suggestions, and corrections are welcome and should be sent to crdb@lpsc.in2p3.fr.
Fig. 7 Electron time series from AMS-02 [156,158] and PAMELA [154], see Table 2. The figure is available from the gallery notebook 15.

Fig. 1 Tables and keys in the MySQL structure of CRDB. The data (energy, values, and uncertainties) are stored in the DATA and CR_QUANTITY tables. The metadata (publication, experiment and sub-experiment names and information) are stored in the EXP, SUBEXP, SUBEXP_IMAGE, and PUBLI tables, with SUBEXP_PUBLI a bridge table that links these various metadata. The ISOTOPE_PROXY table is used to define the rules for energy-axis conversions of CR fluxes (see App. A.4 of [4] or the new 'Caveats/Tips' web page, see Sect. 3.1). The LOG_QUERIES table keeps track of the number and origin of the visits.

Fig. 2 Selected plots from the gallery, obtained from the CRDB python library and available from the gallery notebook 15: fluxes of selected species (top), multiplied by E_k^2.6 in the top-right panel; energy dependence of high-energy CR groups of elements (bottom).

Fig. 3 Selected plots from the gallery, obtained from the CRDB python library and available from the gallery notebook 15: electron and positron fluxes (top), and comparison of elemental abundances in the Solar system and in GCRs (bottom).

Fig. 5 Ā/A and Z̄/Z ratios in CRDB; the very few data points for upper limits on (Z ≤ −2)/(Z ≥ 2), (Z ≤ −3)/(Z ≥ 3) and (Z ≤ −6)/(Z ≥ 6) are not shown. The orange crosses with downward arrows correspond to the new p̄/p upper limits at high energy added in this release (see Table 4). The figure is available from the gallery notebook 15.

Fig. 6 First and second panels: GCR fluxes of low-energy e− + e+ and He over the last 70 years, illustrating the 11-year Solar cycle ('balloons' in the legend refers to unnamed balloons). Third panel: NM count rate from the Thule NM station, retrieved from the NEST NMDB interface at https://www.nmdb.eu/nest/help.php#helptres. Bottom panel: Solar modulation level reconstructed from NM data [e.g., 149], as retrieved from CRDB's Solar Modulation REST interface, whose values are based on [150, 151, 1], or as retrieved from https://cosmicrays.oulu.fi/phi/ [10]. The figure is available from the gallery notebook 15.
[...] sorting and readability of the numerous unnamed balloon flight series (i.e. balloons launched multiple times over the years by the same team and analysed in several publications), we regrouped them into fewer and more informative names, e.g. Nuclear emulsions 1950-1968, Muon Telescope 1957-1995, etc.
- REST/CRDB.py tab: details how to query CRDB from a stand-alone script, with the same options as the ones provided in the Data extraction tab (datasets retrieved from the website or from the REST interface with the same selection and options are the same). We also provide a simple command-line example (to run in a terminal) using curl. This capability is taken advantage of and extended in this release thanks to a new standalone python library to retrieve and display data, for instance from a python notebook (see Sect. 3.2).
- Solar modulation tab: gives access, for any time interval, to the force-field modulation level (see Sect. 2.3).
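As an illustration of what a scripted query might look like, the sketch below only assembles a query URL with the python standard library and does not contact the server. The base URL, the parameter names (`num`, `den`, `energy_type`), and the value `EKN` are assumptions and should be checked against the REST/CRDB.py tab before use; the helper `build_query_url` is hypothetical and is not part of the CRDB python library.

```python
from urllib.parse import urlencode

# Assumed endpoint: verify it against the REST/CRDB.py tab of the website.
BASE_URL = "https://lpsc.in2p3.fr/crdb/rest.php"

def build_query_url(num, den=None, energy_type="EKN", **options):
    """Assemble a REST query URL (hypothetical helper, assumed parameter names)."""
    params = {"num": num, "energy_type": energy_type}
    if den:
        # A denominator turns the query into a ratio, e.g. B/C.
        params["den"] = den
    # Any extra keyword is passed through unchanged as a query parameter.
    params.update(options)
    return BASE_URL + "?" + urlencode(params)

url = build_query_url("B", den="C")
```

The resulting string can then be passed to curl in a terminal or to `urllib.request.urlopen` in a script.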

Table 1
List of 'regular' datasets added in the last three years, sorted according to their experiment TYPE: balloon (from Balloon to SOKOL), ground (OLIMPIYA only), and space (from ACE-CRIS to Voyager2).

Table 5
Lepton data from the LEE, AESOP, and AESOP-Lite balloons flown over a 50-year time period, re-organised, corrected, and with a few new data sets added in CRDB v4.1.