1 Introduction

xstar [1, 2] is a plasma modeling code widely used by the astronomical X-ray community. It generates high-resolution synthetic spectra by deriving the ionic charge states and level populations of the plasma atomic constituents from a comprehensive treatment of the relevant physical processes, assuming steady-state equilibrium in a photoionized or collisionally ionized gas. The scientific accuracy of xstar thus relies on an extensive atomic database meticulously compiled over the past 20 years from in-house computations (see [3, 4] and references therein) and a variety of other sources (e.g., [5, 6, 7]). Owing to its extensive treatment of photoionization, particularly at the high energies associated with the K edges of ions with \(Z\le 30\), and of high-density effects [8], xstar is useful for modeling luminous compact objects such as accreting black holes and neutron stars.

With the advent of the big-data era [9, 10, 11], astronomical spectrum modeling has evolved from running monolithic codes with well-prescribed data outputs to custom workflows, often deployed in high-performance-computing (HPC) environments, that combine different tools through Python. Furthermore, Python is often used beyond model fitting to examine the physics of the best-fitting model in detail, so as to constrain more versatilely the physical mechanisms at hand or to discover new ones. We are following this trend by upgrading xstar into a general-purpose calculator of non-LTE plasmas (in particular, photoionized and collisionally ionized gases) based on a set of Python tools collectively referred to as PyXstar. Similar initiatives have already been pursued by other spectrum modeling codes, such as PyAtomDB [12], PyNeb [13, 14], and ChiantiPy [6], which have been well received by the astronomical community.

PyXstar currently comprises four independent modules that allow the user to: map and exploit the xstar FITS output files; get “under the hood” to decipher the intricate intermediate steps of synthetic spectrum generation; run model grids; and interact with the atomic database. Regarding the latter, the user is encouraged to visualize, revise, and modify the current datasets if custom-made versions are desired. Moreover, the curation, provenance, and evaluation (accuracy and completeness) of the database are of prime importance. Steps are being taken to facilitate database maintenance by storing the master version in a relational model accessible through the SQLite library. The ultimate goal of PyXstar is to enable scientists to easily visualize the atomic species and specific spectral features that the plasma imprints on the observed spectrum and, additionally, to understand the assumed equilibrium, for example, by identifying which processes contribute to the heating and cooling.

The present report gives an overview of this work in progress. In Sect. 2, we describe the technical profile of the xstar database, its data curation policies, and the PyXstar database retrieval scheme. We review the modular structure of the package in Sect. 3 and illustrate its functionality in Sects. 4 and 5 with database retrieval functions and objects. Brief comparisons with other Python modules are given in Sect. 6, followed by our conclusions in Sect. 7.

2 The XSTAR database

xstar contains an atomic database of around 870 MB of radiative and collisional data to support the modeling of plasmas of light chemical elements (\(Z\le 30\)) at electron temperatures \(T> 10^4\) K and densities \(n_e \lesssim 10^{24}\) cm\(^{-3}\). The master database currently consists of flat ASCII files that are transcribed to a FITS file before each public version is released. In this form, the database consists of four long arrays of integers, floats, and characters that are read into main memory when the code is invoked. A pointer structure is then derived to ensure fast, direct database access throughout the plasma modeling process.
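To make the flat-array idea concrete, here is a minimal Python sketch of records packed into contiguous integer and float arrays with a precomputed pointer table. The record layout, the (ion, rtype) key, and the names pack and lookup are hypothetical and do not reproduce the actual xstar FITS format (the character array is omitted); the point is only that, once the offsets are known, a dataset is reached by direct slicing instead of a scan.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record: (ion, rtype, integer data, float data).
Record = Tuple[str, int, List[int], List[float]]
# Pointer entry: (int offset, int count, float offset, float count).
Pointer = Tuple[int, int, int, int]


def pack(records: List[Record]) -> Tuple[List[int], List[float], Dict[Tuple[str, int], List[Pointer]]]:
    """Concatenate all records into two flat arrays and index them by (ion, rtype)."""
    ints: List[int] = []
    floats: List[float] = []
    pointers: Dict[Tuple[str, int], List[Pointer]] = defaultdict(list)
    for ion, rtype, idata, rdata in records:
        # Remember where this record starts in each flat array before appending it.
        pointers[(ion, rtype)].append((len(ints), len(idata), len(floats), len(rdata)))
        ints.extend(idata)
        floats.extend(rdata)
    return ints, floats, dict(pointers)


def lookup(ints: List[int], floats: List[float],
           pointers: Dict[Tuple[str, int], List[Pointer]],
           ion: str, rtype: int) -> List[Tuple[List[int], List[float]]]:
    """Return every record of one dataset by slicing the flat arrays (no scan)."""
    return [(ints[ip:ip + ni], floats[rp:rp + nr])
            for ip, ni, rp, nr in pointers.get((ion, rtype), [])]
```

For example, lookup(*pack(records), "o_iii", 1) returns the slices of every record filed under that key.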

The xstar database curation policies regarding new data and maintenance specify that the tabulations and units of the original sources are preserved. As a result, the database spans an extensive variety of rate and data types that have been inventoried in [2, 4]. In the face of this complexity, PyXstar aims to give the user expedited data access and manipulation capabilities through a series of easy-to-use Python functions and object classes.

Fig. 1

Data manipulation scheme of TOPbase, the Opacity Project atomic database, based on two data structures: the view and the table. Retrieved datasets can be viewed or plotted in a terminal, printed, or stored on disk. Figure is reproduced from Fig. 2 of [15] with permission from Revista Mexicana de Astronomía y Astrofísica (http://www.astroscu.unam.mx/RMxAA)

The PyXstar database retrieval scheme follows the view–table framework implemented in TOPbase, the Opacity Project atomic database [15] (see Fig. 1). In this approach, a disk search is performed through a single command whose series of positional arguments, referred to as the descriptor, produces a view of the database in main memory. The view can then be further manipulated in main memory through the table logical data structure (e.g., by row selection, exclusion, and sorting) to meet the user's requirements. Views and tables can be displayed on the screen, printed, or stored on and retrieved from disk, and tables can also be plotted. The original TOPbase user interface was command-based and was accessed remotely on the database host at the Centre de Données astronomiques de Strasbourg (CDS) through the SSH (telnet at the time) network protocol. However, with the advent of the World Wide Web in the early 1990s, the user interface was rapidly transcribed to the HTML markup language, which reduced the functionality of the table structure [16].

A key difference of the PyXstar data retrieval scheme is that database searches are performed in main memory. Views are procured through a Python function by means of descriptors specified as keyword (non-positional) arguments. The table data structure is fully exploited with Pandas dataframes or Astropy data tables for comprehensive, easy-to-use big-data analysis. Further data manipulation capabilities are provided by data objects implemented as Python classes. A central class is Ion, which bears the basic attributes of an ionic species such as its atomic number, electron number, charge, ionization potential, and ground-level configuration. This class also activates two subclasses, Level and Transition, with a second layer of attributes and methods: for the former, the level configuration, spin multiplicity, total orbital angular momentum, statistical weight, energy, and radiative and Auger widths; and for the latter, the transition wavelength, A-value, Auger rate, and effective collision strengths. Inter-class methods are also implemented; for instance, the level radiative and Auger widths in the Level subclass are both derived through the Transition subclass.
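The keyword-descriptor view and the subsequent table operations can be mimicked with the standard library alone. In PyXstar the table role is played by Pandas dataframes or Astropy tables; the function view, the class Table, and the row fields below are hypothetical and only illustrate the idea that a view is filtered once and then refined in memory.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Table:
    """In-memory table; every operation returns a new Table and leaves the old one intact."""
    rows: List[Row] = field(default_factory=list)

    def select(self, keep: Callable[[Row], bool]) -> "Table":
        return Table([r for r in self.rows if keep(r)])

    def exclude(self, drop: Callable[[Row], bool]) -> "Table":
        return Table([r for r in self.rows if not drop(r)])

    def sort(self, column: str, reverse: bool = False) -> "Table":
        return Table(sorted(self.rows, key=lambda r: r[column], reverse=reverse))

    def column(self, name: str) -> List[Any]:
        return [r[name] for r in self.rows]


def view(database: List[Row], **descriptor: Any) -> Table:
    """Return the rows matching every keyword of the descriptor, e.g. ion='o_iii'."""
    return Table([r for r in database
                  if all(r.get(key) == value for key, value in descriptor.items())])
```

For instance, view(db, ion="o_iii", kind="level").sort("energy") orders the O iii level rows by energy without modifying db itself.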

3 PyXstar

The PyXstar blueprint currently consists of four modules that concentrate on well-defined tasks to enhance the user experience and plasma modeling scope of the xstar code. We briefly describe them in Sects. 3.1–3.4.

3.1 PyXstar_model

This module streamlines the running of xstar models from a Jupyter notebook in different computing environments and provides Python functions to access and manipulate the data contained in the FITS output files. Ample use is made of the astropy.io.fits Python module.

The first version of this module addresses three running environments: a local HEASoft installation, a local Docker container, and a remote Docker container accessed through SSH tunneling. Input parameters can be specified through a Python dictionary or an interactive Jupyter widget (IPyWidgets). The output datasets include, among others: plasma parameters; ionic abundances and column densities; heating and cooling rates; and line, radiative recombination, and continuum spectra.
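A minimal sketch of how a parameter dictionary might be turned into a command for each of the three environments follows. It relies only on the FTOOLS convention of name=value command-line parameters and on the Docker CLI's ssh:// host URLs; the parameter names, the image name xstar-image, and the mount paths are assumptions, and the image is assumed to have HEASoft already initialized in its default environment.

```python
from typing import Dict, List, Optional


def xstar_args(params: Dict[str, object]) -> List[str]:
    """Render a parameter dictionary as FTOOLS-style name=value arguments."""
    return ["xstar"] + [f"{name}={value}" for name, value in params.items()]


def build_command(params: Dict[str, object], environment: str = "local",
                  image: str = "xstar-image", hostdir: str = "/tmp/xstar_run",
                  workdir: str = "/work", host: Optional[str] = None) -> List[str]:
    """Return the argument list that would run one xstar model.

    environment: "local" (HEASoft on PATH), "docker" (local container),
    or "remote" (container on another machine, reached over SSH).
    """
    args = xstar_args(params)
    if environment == "local":
        return args
    docker = ["docker"]
    if environment == "remote":
        if host is None:
            raise ValueError("remote runs need host='user@hostname'")
        # The Docker CLI can address a remote daemon through an ssh:// host URL.
        docker += ["-H", f"ssh://{host}"]
    elif environment != "docker":
        raise ValueError(f"unknown environment: {environment!r}")
    # Mount a host directory so that the FITS output files outlive the container.
    return docker + ["run", "--rm", "-v", f"{hostdir}:{workdir}", "-w", workdir, image] + args
```

The resulting list can be handed to subprocess.run(cmd, check=True); passing a list rather than a shell string avoids quoting problems.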

Fig. 2

xstar model flowchart showing the main stages of the computation and the spatial zones and passes in the iterative cycles

3.2 PyXstar_uth

Figure 2 shows the flowchart of an xstar model consisting of several plasma spatial zones. From a set of input parameters that include, among others, the elemental abundances, temperature, density, ionization parameter, luminosity, and turbulence velocity, the code reads the atomic database from disk, works out a set of pointers to access its components, and determines the radiation flux. It then computes the ionization balance and level populations by imposing thermal equilibrium, which yields a temperature and tabulations of the line and continuum opacities and emissivities. The inter-zone heat transfer is worked out, and the code proceeds with the next iteration. Several passes of the zonal cycle can be prescribed to ensure the desired convergence.

The aim behind PyXstar_uth is to compartmentalize this double cycle (zones and passes) into Python functions and classes, giving the user interactive and scripting capabilities at every stage and providing access to intermediate data that are unavailable when running the code through PyXstar_model. The Python functions have been coded as wrappers of the Fortran xstar routines using the F2PY interface generator. We are also rewriting these Python wrappers in Cython to compare performance, with the intention of porting xstar to a more modern programming language.
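The zone and pass structure of Fig. 2 can be illustrated with a toy calculation. In the sketch below, each zone's temperature follows from thermal equilibrium (heating equal to cooling, solved by bisection in log T), the flux is attenuated from zone to zone, and passes are repeated until the temperatures stop changing. The heating law, cooling law, and temperature-dependent opacity are invented for illustration and bear no relation to the xstar physics; only the control flow is meant to mirror the flowchart.

```python
import math
from typing import Callable, List, Tuple


def solve_temperature(heating: Callable[[float], float],
                      cooling: Callable[[float], float],
                      t_lo: float = 1e3, t_hi: float = 1e9,
                      tol: float = 1e-8) -> float:
    """Thermal equilibrium: bisection in log10(T) for heating(T) = cooling(T)."""
    def net(logt: float) -> float:
        t = 10.0 ** logt
        return heating(t) - cooling(t)

    lo, hi = math.log10(t_lo), math.log10(t_hi)
    if net(lo) * net(hi) > 0:
        raise ValueError("heating - cooling does not change sign in the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net(lo) * net(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 10.0 ** (0.5 * (lo + hi))


def zonal_cycle(flux0: float, n_zones: int = 3, tau0: float = 0.1,
                max_passes: int = 10, rtol: float = 1e-6) -> Tuple[List[float], int]:
    """Toy zone/pass iteration; returns the zone temperatures and the passes used."""
    temps = [1.0e4] * n_zones                 # initial guess seen by the first pass
    for n_pass in range(1, max_passes + 1):
        new: List[float] = []
        flux = flux0
        for i in range(n_zones):              # zones from the source outward
            local_flux = flux
            new.append(solve_temperature(lambda t: local_flux,             # toy heating
                                         lambda t: 1.0e-3 * math.sqrt(t)))  # toy cooling
            # Toy opacity of zone i, evaluated with its temperature from the previous pass.
            flux *= math.exp(-tau0 * math.sqrt(1.0e4 / temps[i]))
        converged = max(abs(a - b) / b for a, b in zip(new, temps)) < rtol
        temps = new
        if converged:
            return temps, n_pass
    return temps, max_passes
```

Because each zone only needs the previous-pass temperatures of the zones inside it, this toy converges after as many passes as there are zones.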

Two types of data are addressed by PyXstar_uth: (i) the basic atomic parameters that make up the database and (ii) derived data computed in plasma models, such as transition rates, heating–cooling rates, level populations, ionization fractions, opacities, and emissivities. In the present report, we are mainly concerned with the initial setup stage (see Fig. 2), when the atomic database is loaded into main memory; thus, the Python functions and classes chosen are those designed to display information about its components and methods, which are further described in Sects. 4 and 5.

3.3 PyXstar_grid

Within the context of the HEASoft spectral fitting code xspec, grids of xstar models can be implemented with the xstar2xspec Perl script. From user specifications, the FTOOLS xstinitable and xstar2table create a job list that is run and consolidated into table models. A parallel version of this procedure has been devised independently in C++ with the Message Passing Interface (MPI) [17]. In the present work, we have developed a Python version of the xstar2xspec script based on the FTOOLS Python wrappers of HEASoftPy, which runs in parallel on a multi-core processor through the multiprocessing module.
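The job-list expansion and its parallel execution can be sketched with the standard library alone. The parameter names (rlogxi, column) and run_model are hypothetical placeholders; a real worker would launch xstar (for instance through HEASoftPy) and, presumably, run each job in its own directory, because xstar writes its output files to the current working directory.

```python
import itertools
from multiprocessing import Pool
from typing import Dict, List, Sequence


def build_job_list(grid: Dict[str, Sequence[float]]) -> List[Dict[str, float]]:
    """Expand a parameter grid into one parameter dictionary per model."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in itertools.product(*(grid[name] for name in names))]


def run_model(params: Dict[str, float]) -> Dict[str, object]:
    """Hypothetical stand-in for one xstar run; a real version would invoke
    xstar and report where its FITS output was written."""
    return {**params, "status": 0}


def run_grid(jobs: List[Dict[str, float]], processes: int = 1) -> List[Dict[str, object]]:
    """Run all jobs, in parallel worker processes when processes > 1."""
    if processes > 1:
        with Pool(processes) as pool:
            return pool.map(run_model, jobs)   # results come back in job order
    return [run_model(job) for job in jobs]
```

On platforms that start worker processes by spawning, call run_grid(jobs, processes=4) from under an if __name__ == "__main__": guard. Since Pool.map returns results in job order, they stay aligned with the grid when consolidated into a table model.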

3.4 PyXstar_DB

As mentioned in Sect. 2, the master xstar database is structured as a collection of flat ASCII files that do not conform to a relational model, which makes maintenance and updating time-consuming. PyXstar_DB remaps the database onto an SQLite engine, reducing record indexing and enforcing table relationships through keys. Dataset provenance and completeness are high-priority issues. Implementation of the new database is leading to a more comprehensive and simplified inventory of the xstar data and rate types.
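A relational layout of the kind described can be sketched with Python's built-in sqlite3 module. The tables, columns, and placeholder rows are hypothetical and do not reproduce the PyXstar_DB schema; they show how keys tie transitions to existing levels and how a provenance table can be joined in.

```python
import sqlite3

SCHEMA = """
CREATE TABLE source (
    source_id INTEGER PRIMARY KEY,
    reference TEXT NOT NULL
);
CREATE TABLE ion (
    ion_id INTEGER PRIMARY KEY,
    z INTEGER NOT NULL,
    n INTEGER NOT NULL,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE level (
    ion_id INTEGER NOT NULL REFERENCES ion(ion_id),
    lvl INTEGER NOT NULL,
    energy REAL NOT NULL,
    source_id INTEGER NOT NULL REFERENCES source(source_id),
    PRIMARY KEY (ion_id, lvl)
);
CREATE TABLE transition (
    ion_id INTEGER NOT NULL,
    llo INTEGER NOT NULL,
    lup INTEGER NOT NULL,
    rtype TEXT NOT NULL,
    value REAL NOT NULL,
    source_id INTEGER NOT NULL REFERENCES source(source_id),
    FOREIGN KEY (ion_id, llo) REFERENCES level(ion_id, lvl),
    FOREIGN KEY (ion_id, lup) REFERENCES level(ion_id, lvl)
);
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with key enforcement switched on."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript(SCHEMA)
    return con


def transitions_with_provenance(con: sqlite3.Connection, ion_name: str):
    """List the transitions of one ion together with the reference of each record."""
    sql = """
        SELECT t.llo, t.lup, t.rtype, t.value, s.reference
        FROM transition AS t
        JOIN ion AS i ON i.ion_id = t.ion_id
        JOIN source AS s ON s.source_id = t.source_id
        WHERE i.name = ?
        ORDER BY t.llo, t.lup
    """
    return con.execute(sql, (ion_name,)).fetchall()
```

Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on each connection; with it, a transition pointing to a nonexistent level is rejected with sqlite3.IntegrityError.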

4 Database function

The PyXstar_uth function that displays components of the atomic database,

$$\begin{aligned} \mathtt {get\_data(ion,rtype,llo,lup)} \end{aligned}$$
(1)

requires the following keyword arguments: \(\texttt{ion}\), identifying the ionic species in xstar notation (e.g., ‘o_iii’) or as the tuple (Z, N) of its atomic (Z) and electron (N) numbers; \(\texttt{rtype}\), the dataset of interest, specified by a character acronym or the corresponding xstar ratetype integer [3, 4]; and the integers \(\texttt{llo}\) and \(\texttt{lup}\), denoting, respectively, the lower- and upper-level indices of the ion. With the default lower level (\(\texttt{llo}=0\)), all the transitions to the given upper level \(\texttt{lup}\) are returned (i.e., from every \(\texttt{llo}< \texttt{lup}\)), while with the default upper level (\(\texttt{lup}=0\)), all the transitions from the given lower level \(\texttt{llo}\) are returned (i.e., to every \(\texttt{lup}> \texttt{llo}\)). The option \(\texttt{llo}= \texttt{lup}=0\) returns all the transitions of the ion.
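The argument conventions just described can be mimicked in a few lines. The helpers below (roman_to_int, ion_to_zn, select_transitions) are hypothetical illustrations, not PyXstar internals; they assume the xstar ion label has the form element_spectrum, as in ‘o_iii’, and that each transition record is a tuple (llo, lup, value) with llo < lup.

```python
from typing import List, Sequence, Tuple, Union

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50}
# Element symbols for Z = 1..30, the range covered by the database.
ELEMENTS = ["h", "he", "li", "be", "b", "c", "n", "o", "f", "ne",
            "na", "mg", "al", "si", "p", "s", "cl", "ar", "k", "ca",
            "sc", "ti", "v", "cr", "mn", "fe", "co", "ni", "cu", "zn"]


def roman_to_int(numeral: str) -> int:
    """Convert a spectroscopic Roman numeral (e.g. 'iii') to an integer."""
    values = [ROMAN[c] for c in numeral.lower()]
    total = 0
    for i, v in enumerate(values):
        # Subtract when a smaller numeral precedes a larger one (e.g. 'iv').
        total += -v if i + 1 < len(values) and values[i + 1] > v else v
    return total


def ion_to_zn(ion: Union[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Accept either 'o_iii' or (Z, N) and return (Z, N)."""
    if isinstance(ion, tuple):
        return ion
    symbol, spectrum = ion.lower().split("_")
    z = ELEMENTS.index(symbol) + 1
    charge = roman_to_int(spectrum) - 1       # O III is O^2+
    return z, z - charge


def select_transitions(transitions: Sequence[Tuple[int, int, float]],
                       llo: int = 0, lup: int = 0) -> List[Tuple[int, int, float]]:
    """0 acts as a wildcard: llo=0 keeps every lower level, lup=0 every upper level."""
    return [t for t in transitions
            if (llo == 0 or t[0] == llo) and (lup == 0 or t[1] == lup)]
```

For example, ion_to_zn("o_iii") gives (8, 6), and select_transitions(records, lup=3) keeps every transition ending on level 3.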

PyXstar_uth is invoked in the usual Python manner, which loads the atomic database into main memory, where it resides for the rest of the user session. The graphics library Matplotlib and the module of mathematical functions math are also conveniently imported in this initial step:

figure b

Here and in Sects. 4.1–4.3, we adopt the cell-based format of the Jupyter notebook interface. We illustrate the functionality of \(\mathtt {get\_data()}\) with database searches of energy levels, radiative bound–bound transitions, and photoionization cross sections.

4.1 Energy levels

The function displays the attributes of energy level llo of ion. The default, \(\mathtt {llo=0}\), displays all the energy levels, while \(\mathtt {llo=-1}\) displays the continuum, i.e., the ionization potential of the species. As an example, in the cell below we search for all the levels of C-like O iii:

figure c

The search output is displayed as a Pandas dataframe, which gives a remarkably concise and comprehensive view of the data. Note also that the level electron configurations in the xstar database follow the Witthoeft notation described in “Appendix A.”
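The level conventions (llo = 0 for all levels, llo = −1 for the continuum) can be illustrated with a small hypothetical helper. The energies below are approximate NIST values for the five lowest O iii levels, quoted for illustration only and not read from the xstar database; the labels use conventional configuration notation rather than the Witthoeft notation, and get_levels and binding_energies are our own names.

```python
from typing import Dict, List, Union

CM_PER_EV = 8065.544  # wavenumber (cm^-1) equivalent to 1 eV

# Approximate NIST energies of the five lowest O III levels (cm^-1),
# for illustration only; not read from the xstar database.
O_III_LEVELS: List[Dict[str, object]] = [
    {"lvl": 1, "label": "2s2 2p2 3P0", "e_cm": 0.0},
    {"lvl": 2, "label": "2s2 2p2 3P1", "e_cm": 113.178},
    {"lvl": 3, "label": "2s2 2p2 3P2", "e_cm": 306.174},
    {"lvl": 4, "label": "2s2 2p2 1D2", "e_cm": 20273.27},
    {"lvl": 5, "label": "2s2 2p2 1S0", "e_cm": 43185.74},
]
O_III_IP_EV = 54.9355  # approximate ionization potential of O III (eV)


def get_levels(levels: List[Dict[str, object]], ip_ev: float,
               llo: int = 0) -> Union[float, List[Dict[str, object]]]:
    """Mimic the llo convention: 0 -> all levels, -1 -> continuum (IP), k -> level k."""
    if llo == -1:
        return ip_ev
    if llo == 0:
        return list(levels)
    return [lv for lv in levels if lv["lvl"] == llo]


def binding_energies(levels: List[Dict[str, object]], ip_ev: float) -> Dict[int, float]:
    """Ionization energy (eV) of each level: the IP minus its excitation energy."""
    return {lv["lvl"]: ip_ev - lv["e_cm"] / CM_PER_EV for lv in levels}
```

With these values, binding_energies(O_III_LEVELS, O_III_IP_EV)[4] is the energy needed to ionize O iii from its 1D2 level.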

4.2 Bound–bound radiative transitions

Radiative attributes (wavelength, A-value, and gf-value) of a bound–bound transition between levels \(\texttt{llo}\) and \(\texttt{lup}\) of ion are also listed with this function. In the cell below, we retrieve the corresponding data for the transition between levels 7 and 3 in O iii:

figure d

The output dataframe conveniently lists records longer than the monitor width by wrapping them with the continuation character ‘\’.
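The three radiative attributes listed above are not independent. For an upper level u decaying to a lower level l, the standard relation between the weighted oscillator strength and the A-value (Gaussian units) is

$$\begin{aligned} g_l f_{lu} = \frac{m_e c}{8\pi ^2 e^2}\,\lambda ^2 g_u A_{ul} \approx 1.4992\times 10^{-16}\,\lambda ^2 g_u A_{ul}, \end{aligned}$$

with \(\lambda\) in Å and \(A_{ul}\) in s\(^{-1}\). A small Python sketch of this conversion, useful for consistency checks on retrieved records, follows; the function names are ours and are not part of PyXstar.

```python
import math

# CGS constants (CODATA 2018)
M_E = 9.1093837015e-28       # electron mass (g)
C = 2.99792458e10            # speed of light (cm s^-1)
E_CHARGE = 4.80320471e-10    # elementary charge (statcoulomb)

# m_e c / (8 pi^2 e^2); the factor 1e-16 converts lambda^2 from cm^2 to Angstrom^2.
GF_CONST = M_E * C / (8.0 * math.pi ** 2 * E_CHARGE ** 2) * 1.0e-16


def gf_from_a(wavelength_aa: float, g_upper: float, a_value: float) -> float:
    """Weighted oscillator strength g_l f_lu from the A-value A_ul (s^-1)."""
    return GF_CONST * wavelength_aa ** 2 * g_upper * a_value


def a_from_gf(wavelength_aa: float, g_upper: float, gf: float) -> float:
    """Inverse relation: A_ul (s^-1) from g_l f_lu."""
    return gf / (GF_CONST * wavelength_aa ** 2 * g_upper)
```

Comparing gf_from_a(wavelength, g_u, A) with a tabulated gf-value flags records whose three attributes are mutually inconsistent.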

4.3 Photoionization cross sections

The function retrieves fits or tabulations of the photoionization cross section \(\sigma (E)\) as a function of energy for the transition from level \(\texttt{llo}\) of the parent ion to level \(\texttt{lup}\) of the daughter ion. Below, we search for the photoionization cross section of the ground level of O iii leaving O iv in its ground level:

figure e

Two data products are returned, y[0] listing the transition identifiers

figure f

showing that the cross section has two contributions (data types 49 and 53):

figure g

The correspondence between the \(\mathtt {y[0]}\) transition identifiers and \(\mathtt {y[1]}\) tabulations is

$$\begin{aligned} \mathtt {y[0].iloc[}i\mathtt {]} \rightleftharpoons \mathtt {y[1][}i\mathtt {]} \end{aligned}$$
(2)

We can then plot the cross section using the Matplotlib graphics library,

figure h

showing the two contributions. This plot is useful for checking the matching accuracy of the multi-component representation frequently used in the xstar database to cover cross sections over a wide energy range. The figure also illustrates how data in Pandas dataframes are readily plotted with Matplotlib, e.g., by applying an anonymous Python lambda function to process column values.
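The matching check can also be made quantitative. The sketch below assumes each component is available as a pair of energy-sorted lists (energies, cross sections); this layout, the function names, and the linear interpolation are assumptions (log–log interpolation would be more usual for cross sections spanning several decades), and it is not how PyXstar stores y[1].

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

# (energies, cross sections), both sorted by increasing energy
Component = Tuple[Sequence[float], Sequence[float]]


def interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Linear interpolation inside a sorted table; raises outside its range."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("energy outside the tabulated range")
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def matching_jump(low: Component, high: Component, e_match: float) -> float:
    """Relative discontinuity between two components at the matching energy."""
    s_low = interp(e_match, *low)
    s_high = interp(e_match, *high)
    return abs(s_high - s_low) / s_low


def splice(low: Component, high: Component, e_match: float) -> Tuple[List[float], List[float]]:
    """Join the components: low-energy points below e_match, high-energy points from it on."""
    pts = ([(e, s) for e, s in zip(*low) if e < e_match]
           + [(e, s) for e, s in zip(*high) if e >= e_match])
    return [e for e, _ in pts], [s for _, s in pts]
```

A jump of a few per cent at the matching energy would show up in the plot as a visible step; with Pandas, the same per-column processing would typically be written with Series.apply and a lambda.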

5 Data objects

Data objects implemented as Python classes open further programming pathways to make the most of the xstar database components. We introduce the Ion class, which exhibits several attributes and two subclasses: Level and Transition. To examine their possibilities, we instantiate the class with O iii and list its attributes:

figure i
figure j

The attributes of the instantiated ion may be accessed with the usual dot notation, for instance, the ionization potential:

figure k

Similarly, the methods of the Level and Transition subclasses can be inventoried

figure l
figure m

and accessed

figure n
figure o

The default \(\mathtt {llo=lup=0}\) gives the whole dataset as a list of tuples.
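A minimal sketch of such data objects, written with dataclasses, follows. The class layout (composition rather than Python inheritance), the attribute names, the toy units, and the definition of the radiative width as the sum of A-values out of a level are assumptions for illustration, not the PyXstar implementation; an Auger width would be derived analogously from Auger rates.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Transition:
    llo: int
    lup: int
    wavelength: float   # Angstrom
    a_value: float      # radiative rate (s^-1)


@dataclass
class Level:
    index: int
    energy: float       # toy units
    g: int              # statistical weight
    ion: Optional["Ion"] = field(default=None, repr=False, compare=False)

    def radiative_width(self) -> float:
        """Total radiative decay rate (s^-1): the sum of A-values out of this level."""
        return sum(a for _, _, _, a in self.ion.transitions(lup=self.index))


@dataclass
class Ion:
    z: int
    n: int
    ip: float           # ionization potential (eV)
    levels: List[Level] = field(default_factory=list, repr=False)
    lines: List[Transition] = field(default_factory=list, repr=False)

    @property
    def charge(self) -> int:
        return self.z - self.n

    def add_level(self, index: int, energy: float, g: int) -> Level:
        level = Level(index, energy, g, ion=self)   # the level keeps a link to its ion
        self.levels.append(level)
        return level

    def add_transition(self, llo: int, lup: int, wavelength: float, a_value: float) -> None:
        self.lines.append(Transition(llo, lup, wavelength, a_value))

    def level(self, index: int) -> Level:
        return next(lv for lv in self.levels if lv.index == index)

    def transitions(self, llo: int = 0, lup: int = 0) -> List[Tuple[int, int, float, float]]:
        """llo = lup = 0 (default) returns the whole dataset as a list of tuples."""
        return [(t.llo, t.lup, t.wavelength, t.a_value) for t in self.lines
                if (llo == 0 or t.llo == llo) and (lup == 0 or t.lup == lup)]
```

Attribute and method inventories of the kind listed above can be produced with vars(obj) and inspect.getmembers(Level, inspect.isfunction).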

6 Comparison with other spectral modeling tools

We briefly review here two spectral modeling tools that use Python interfaces to display their atomic databases.

6.1 PyNeb

PyNeb [13, 14] is a Python package widely used in nebular physics for the analysis of emission lines. Relying on an extensive atomic database, it solves the equilibrium equations to obtain the level populations, critical densities, and line emissivities. By comparing the theoretical emissivity ratios with observed line intensity ratios, the electron temperature and density and the chemical abundances may be estimated.
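PyNeb solves the full multilevel problem; the textbook two-level case below only illustrates the balance it generalizes and is a sketch with toy parameters, not PyNeb code. Collisional excitation balances radiative plus collisional de-excitation, \(n_1 n_e q_{12} = n_2 (A_{21} + n_e q_{21})\), with \(q_{21} = 8.629\times 10^{-6}\,\Upsilon /(g_2 T^{1/2})\) cm\(^3\) s\(^{-1}\) and \(q_{12} = (g_2/g_1)\, q_{21} \exp (-\Delta E/kT)\); the critical density is \(n_c = A_{21}/q_{21}\).

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant (eV K^-1)


def q_down(upsilon: float, g_upper: int, t: float) -> float:
    """Collisional de-excitation rate coefficient q21 (cm^3 s^-1)."""
    return 8.629e-6 * upsilon / (g_upper * math.sqrt(t))


def two_level_ratio(n_e: float, t: float, upsilon: float, g1: int, g2: int,
                    a21: float, de_ev: float) -> float:
    """Steady-state population ratio n2/n1 of a two-level atom."""
    q21 = q_down(upsilon, g2, t)
    q12 = (g2 / g1) * q21 * math.exp(-de_ev / (K_B_EV * t))
    return n_e * q12 / (a21 + n_e * q21)


def critical_density(t: float, upsilon: float, g2: int, a21: float) -> float:
    """Density at which collisional de-excitation equals radiative decay."""
    return a21 / q_down(upsilon, g2, t)
```

At high density the ratio tends to the Boltzmann value \((g_2/g_1)\exp (-\Delta E/kT)\), and at \(n_e = n_c\) it is exactly half of it; line-ratio diagnostics exploit this density and temperature dependence.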

In the object-oriented architecture of PyNeb, the ion object is implemented through the Atom and RecAtom classes, with a suite of methods to tabulate and plot the radiative and collisional data. For instance, to list the effective collision strength for the transition between levels 2 and 1 in O iii at \(T=10^4\) K, the atomic data files are fetched, the ion is instantiated, and the parameter is listed:

figure p

A salient aspect of the database curation in this system is its reliance on collections rather than selections of available datasets. These datasets are transcribed to a prescribed format and laboriously re-indexed to match the NIST energy-level order [7], which is standard throughout the database. The default datasets are carefully chosen, but substituting them with other listed datasets in any spectral model is straightforward. This option makes PyNeb particularly useful for determining the impact of the atomic data on nebular plasma models [18] and for atomic data assessment [19, 20].
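Re-indexing to an energy-ordered scheme is mechanical once the level energies are known. The sketch below assumes the target order is simply ascending energy (ties broken by the original index) and that transitions are (llo, lup, value) tuples; energy_order_map and reindex_transitions are hypothetical names, and this is an illustration, not PyNeb's procedure.

```python
from typing import Dict, List, Sequence, Tuple


def energy_order_map(energies: Dict[int, float]) -> Dict[int, int]:
    """Map original level indices to new 1-based indices in ascending energy."""
    ordered = sorted(energies, key=lambda k: (energies[k], k))
    return {old: new for new, old in enumerate(ordered, start=1)}


def reindex_transitions(transitions: Sequence[Tuple[int, int, float]],
                        mapping: Dict[int, int]) -> List[Tuple[int, int, float]]:
    """Relabel (llo, lup, value) records, keeping llo < lup in the new numbering."""
    out = []
    for llo, lup, value in transitions:
        a, b = mapping[llo], mapping[lup]
        out.append((min(a, b), max(a, b), value))
    return sorted(out)
```

Every transition must be relabeled through the same map as the levels; otherwise a dataset silently attaches rates to the wrong pair of levels.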

Atomic datasets in xstar, on the other hand, are mostly custom-computed or selected from other databases, and, when the database is updated, those being replaced are discarded. Furthermore, it is considerably more difficult to replace datasets piecemeal (e.g., partial photoionization cross sections) for an N-electron ion in xstar, as they can be tightly coupled to those of the (\(N{-}1\))-electron system.

6.2 PyAtomDB

AtomDB [12] is an atomic database compiled to underpin the spectral modeling of collisionally excited plasmas, mainly in the ultraviolet and X-ray bands. Spectral modeling is carried out with the PyAtomDB Python package, which replaced the previous apec C code. Although it is mainly tailored to compute derived data from the database, e.g., rate coefficients, level populations, and charge state distributions, PyAtomDB also allows interactive viewing of the raw atomic parameters.

For instance, to get the A-value of transition 2–1 in O iii, the corresponding atomic data file is loaded and its headings and attributes are listed:

figure q
figure r

It may be noted that the requested data type is indicated by a character string, ‘LA’ (we have implemented similar acronyms to denote atomic data types in PyXstar) and that the source references are included in the transition attributes.

An interesting aspect of the PyAtomDB database views is that a requested dataset is downloaded from the host server the first time it is used and is then accessed locally for further manipulation; thus, the user can manage the database files in the home file space. In contrast, the complete PyXstar atomic database resides on local disk and is loaded into main memory when the module is imported. This leads to better performance but involves larger data volumes. Moreover, PyAtomDB output data are formatted as Python dictionaries and lists or FITS files, while PyXstar interfaces with Pandas dataframes, which considerably facilitate data processing and analysis. Finally, PyAtomDB has a module with a wide variety of useful functions for manipulating atomic parameters and facilitating data processing.
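The two access strategies can be contrasted in a short sketch. LazyFileCache and load_all are hypothetical names, and the fetch callable stands in for a download from the host server; this is not PyAtomDB's or PyXstar's implementation.

```python
import os
from typing import Callable, Dict, Iterable


class LazyFileCache:
    """Fetch a dataset the first time it is requested, then read it locally."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], bytes]):
        self.cache_dir = cache_dir
        self.fetch = fetch                    # e.g. an HTTP download, injected here
        os.makedirs(cache_dir, exist_ok=True)

    def get(self, name: str) -> bytes:
        path = os.path.join(self.cache_dir, name)
        if not os.path.exists(path):          # first use: download and store on disk
            data = self.fetch(name)
            with open(path, "wb") as fh:
                fh.write(data)
        with open(path, "rb") as fh:          # later uses: local read only
            return fh.read()


def load_all(names: Iterable[str], fetch: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Eager alternative: every dataset is read into memory up front."""
    return {name: fetch(name) for name in names}
```

The lazy cache keeps memory and disk use proportional to what is actually requested, whereas the eager load pays the full cost once and then serves every request from memory.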

7 Conclusions

Within the context of the atomic database of the spectral modeling code xstar, we have given an overview of current developments of a Python package for scaling up its data processing capabilities. These broadly involve the input, output, raw, and intermediate data, with both plasma modeling and database curation in mind.

For performance, functions dealing with the raw and intermediate data have been coded as Python wrappers of the xstar Fortran subroutines. However, Cython versions have also been implemented for comparison and to look for alternatives to ensure long-term sustainability and maintenance of the system. The use of Pandas dataframes for data processing has been emphasized as well as the translation of the database from flat ASCII files to a relational model using the SQLite machinery. The ultimate intent is to allow the user to modify the atomic datasets in plasma modeling.