GRID-SITES: Gridded Solar Iterative Temperature Emission Solver for Fast DEM Inversion

The increasing size of solar datasets demands highly efficient and robust analysis methods. This paper presents an approach that can increase the computational efficiency of differential emission measure (DEM) inversions by an order of magnitude or higher, with the efficiency factor increasing with the size of the input dataset. The method, named the Gridded Solar Iterative Temperature Emission Solver (Grid-SITES) is based on grouping pixels according to the similarity of their intensities in multiple channels, and solving for one DEM per group. This is shown to be a valid approach, given a sufficiently high number of grid bins for each channel. The increase in uncertainty arising from the quantisation of the input data is small compared to the general measurement and calibration uncertainties. In this paper, we use the Solar Iterative Temperature Emission Solver (SITES) as the core method for the DEM inversion, although Grid-SITES provides a general framework which may be used with any DEM inversion method, or indeed any large multi-dimensional data inversion problem. The method is particularly efficient for processing larger images, offering a factor of 30 increase in speed for a 10 megapixel image. For a time series of observations, the gridded results can be passed sequentially to each new image, with new populated bins added as required. This process leads to increasing efficiency with each new image, with potential for a ≈100\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}${\approx\,}100$\end{document} increase in efficiency dependent on the size of the images.


Introduction
Extreme ultraviolet (EUV) observations are a crucial source of information on the low solar corona. Qualitatively, EUV images provide a window into the structure and dynamics of the atmosphere. Quantitatively, both EUV images and spectroscopy constitute the most important information of the plasma characteristics of the transition region and low corona. A major part of this information is provided by differential emission measure (DEM) analysis. A DEM is a powerful diagnostic of the coronal plasma -it is an estimate of the emission, or the total number of electrons squared along the observed line of sight (similar to a column mass), at a given temperature. EUV measurements in multiple bandpasses (or spectral emission lines) with different temperature responses, allow the estimation of a DEM. DEM analysis is core to many important studies of various solar atmospheric features and events, including coronal mass ejections in their initial stage of eruption Hannah and Kontar, 2013), active regions and loops (Aschwanden et al., 2013;Warren, Winebarger, and Brooks, 2012;Del Zanna, 2013), flares (Fletcher et al., 2013;Dudík et al., 2014;Sun, Cheng, and Ding, 2014;Su et al., 2018;Hernandez-Perez et al., 2019), EUV waves (Kozarev et al., 2011;Vanninathan et al., 2015), and coronal dimmings (Vanninathan et al., 2018;Veronig et al., 2019).
As new EUV instruments are developed, the temporal, spatial and spectral resolution becomes ever finer. Since 2010, the Atmospheric Imaging Assembly (AIA: Lemen et al., 2012) aboard the Solar Dynamics Observatory (SDO: Pesnell, Thompson, and Chamberlin, 2012) provides very fine temporal and spatial resolution of the Sun at multiple wavelengths. Even as the community continues to develop methods to digest the large volume of data from AIA/SDO, new instruments are planned and tested with even finer resolution (e.g. High-Resolution Coronal Imager (Hi-C); see Cirtain et al., 2013). In the context of the size of the AIA/SDO dataset, or the data collection rate, DEM inversion methods are computationally expensive. Larger DEM studies of broad regions over long time periods are rare (e.g. Morgan and Taroyan, 2017).
To the best of our knowledge, the most computationally fast method is that of Cheung et al. (2015), based on Simplex optimisation of a set of smooth basis functions, or a sparse matrix. This method can process around 10 5 pixels per second on a standard desktop computer. The Solar Iterative Temperature Emission Solver (SITES) inversion method (Morgan and Pickering, 2019, hereafter Paper I) creates an initial DEM estimate through a direct redistribution of observed intensities across temperatures according to the temperature response function of the measurement, and iteratively improves on this estimate through calculation of intensity residuals. The resulting DEMs replicate model DEMs well in tests on AIA synthetic data. It is simple in concept and implementation and is non-subjective in the sense that no prior constraints are placed on the solutions other than positivity and smoothness, and can process a 1000 DEMs per second on a standard desktop computer. This is similar to the speed of regularised matrix inversion-based methods such as Hannah and Kontar (2012) or Plowman, Kankelborg, and Martens (2013).
This paper provides a general framework for improving the efficiency of applying DEM inversions to data, with emphasis on AIA images. It is not a new DEM inversion method, but an approach to pre-sorting input measurements to enable faster DEM inversion. Thus Grid-SITES may be used with any inversion method, although SITES is used throughout this work as the core inversion method. See Paper I for a description of SITES and an overview of other DEM inversion methods.
The Grid-SITES concept and method is presented in Section 2, and its validity confirmed in Section 3. Section 4 describes the additional uncertainty introduced by the gridding method. Results, and comparison of Grid-SITES to directly inverted DEMs are shown in Section 5. Section 6 describes the application of Grid-SITES to a time series of observations, and Section 7 shows the inversion of full-resolution AIA observations. A brief summary is given in Section 8.

The Gridding Concept and Method
The gridding concept assumes that many of the pixels in an AIA observation will have a very similar set of intensities across all channels, and further assumes that these similar intensities will result in similar DEMs. Both these assumptions are shown to be valid in this section, offering a great saving in computational efficiency through calculating only one DEM for each group of pixels which are deemed similar. Given the relatively high calibration uncertainties (Boerner et al., 2014), in addition to the errors arising from the DEM inversion, the added uncertainty of grouping non-identical, yet similar measurements together to give a single DEM is shown to be reasonable, and adds only a small additional error to the resulting DEM.
A set of AIA observations made at 01 January 00:00 UT are used to introduce the gridding concept and method. Six EUV channels are included: 94,131,171,193,211 and 335 Å. For convenience, the 'synoptic' images are used (http://jsoc2.stanford.edu/data/aia/ synoptic/), which are reduced in spatial dimensions to 1024 × 1024 pixels through averaging, thus 16 pixels in the original high-resolution data are combined to create one synoptic pixel. Figure 1 shows a colour composite image of this observation set to give context. The boxed region enclosing pixels x = [350, 700] and y = [600, 850] is used as a case study. The region contains a small active region and quiet Sun. In order to test the validity of gridding, a direct DEM inversion is applied to each pixel, using the SITES method of Paper I.
Of the 351 × 251 = 88,101 pixels in the region, a few are discarded due to spurious high (> 1.4 × 10 4 DN) or low (< −5 DN) values in one or more channel. Of the remaining pixels, the intensity in each channel is converted to a logarithmic (base ten) intensity, and the minimum and maximum 0.1% percentile calculated. These are listed in Table 1, along with the percentage of pixels falling outside the minimum-maximum range. Figure 2 shows the distribution of log intensities for each channel. The number of grid bins for each channel, n i , (with each channel indexed by subscript i) is set between a minimum of n 0 = 16 and a maximum of n 1 = 26 according to each channel's median log intensity,Ĩ i , through whereĨ min andĨ max are the minimum and maximum median intensity over all channels, respectively, and the curtailed square brackets represent rounding up to the nearest integer. Thus the lowest intensity channel has 16 bins and the highest has 26. Creating a six-dimensional histogram of the six channel intensities using standard histogram programs would result in an array of several tens to hundreds of millions of elements (depending on the choice of the number of bins in each channel). The number of bins presented in Table 1 would give a histogram with over 85 × 10 6 bins (the product over all channels of the number of bins). However, for any observation most of these elements have no members. For the example dataset used here, only ≈ 2 × 10 4 elements are populated. That is, if a six-dimensional histogram was constructed using the parameters of Table 1, it would have over 85 × 10 6 elements, but only ≈ 2 × 10 4 elements would be populated based on the input intensities. For any AIA observation, only a relatively small number of unique intensity groupings, or populated bins, exist. The number of populated bins for the example observation (≈ 2 × 10 4 ) is around 20% of the total number of pixels in the observation (≈ 8.8 × 10 4 ), which suggests that a ≈ 5-fold increase in efficiency is possible if DEM inversions are applied only to the populated bins. Whilst this is not a huge efficiency increase, larger images offer greater efficiency. For example, including all the pixels on the disk for Figure 1 A context image from 01 January 00:00 UT, created from a set of AIA "synoptic" observations. The boxed region is used to demonstrate the method. All seven AIA channels contribute to this composite, with the temperature response of each channel between 0.05 and 7.0 MK specifying that channel's contribution to the red, green and blue colour channels of the output images. The image is processed with multiscale Gaussian normalisation to enhance fine-scale structure (Morgan and Druckmüller, 2014). Note that this is a processed image provided for spatial context, thus is not directly representative of the observed numerical values used for the analysis. Table 1 Parameters for the histogram binning of the AIA channels (used to create Figure 2), showing the 0.1 percentile minimum and maximum log intensity (DN s −1 pix −1 ), number of bins, the width of each bin in logarithmic intensity, and the percentage of pixels outside the histogram range.   Table 1. this "synoptic" observation (≈ 0.5 megapixels) results in ≈ 5 × 10 4 groups (or populated bins), which offers a ≈ 10-fold efficiency increase. For a full-resolution image of the same observation date, there are ≈ 8.5 megapixels on the disk, giving ≈ 10 5 groups -a 100-fold increase in speed compared with processing each pixel separately. This increase in efficiency with a greater number of input pixels can be explained by considering the distributions of intensities in each channel. Plotting histograms of intensities for, say, the 171 Å channel for the small example region gives the distribution of Figure 2c. The histogram for a full-disk image for the same channel would show a similar distribution, albeit with a higher number of pixels per bin. The same argument will hold for a six-dimensional histogram over all channels. Since the gridding scheme will only seek to calculate one DEM per populated bin, increased efficiency is achieved. An efficient way to find these unique groupings for a large number of pixels, without creating a large histogram, is: i) Channels are indexed with subscript i. Each channel's intensities are converted to logarithmic intensity, I i . This gives a more even distribution of values compared to using intensities directly. ii) Establish each channel's minimum, m 0i , and maximum, m 1i , logarithmic intensity as well as the number of bins n i for each channel. We use a robust minimum and maximum, and a higher number of bins is allocated to higher-signal channels as described above in the context of Figure 2 and Table 1. iii) Interpolate the intensity value of each pixel into the appropriate bin index for that channel using (2) Figure 3 Distribution of the number of groups that contain a certain number of pixels. For example, there are > 10 4 groupings containing only one pixel, and only one group containing the maximum 124 pixels. The y-axis has a logarithmic scale, so a constant of one has been added to the number of groups to avoid numerical error.
The truncated square brackets indicate rounding down to the nearest integer, thus k i gives the bin number for each measurement for that channel. The intensity at a given pixel and channel is transformed into the appropriate bin index k i for that pixel and channel, according to the number of bins, n i , and minimum/maximum intensity values. iv) Initialise a main index, K, equal to k 5 (the bin number index of the last (sixth) channel in use), and repeat for each channel from i 4 (the fifth channel) down to i 0 (the first channel): thus converting the k i into a one-dimensional index giving the location of each pixel within a multi-dimensional histogram. The k i give the one-dimensional bin number at each channel (thus range from 0 to n i − 1 = e.g. 16), whilst K gives the overall bin number in the multi-dimensional histogram (thus can range from 0 to e.g. 8.5 × 10 6 for the small example region). Note that the choice of allocation of each histogram dimension to a particular AIA channel is not important, as long as this order remains consistent throughout the processing. We simply use increasing wavelength to allocate channels to each dimension. v) Finally, the unique values of K are identified. These are extracted to give the subset K u , or the set of unique combinations of channel intensities.
For the example region, there are 23,581 elements to K u . Figure 3 shows the number of groups against the number of pixels in each group. Around half the members of K u contain only one contributing pixel. The majority of elements of K u contain only a small number of pixels, below 10, with a few groups containing over 60 pixels. The maximum number of pixels in a group is 124.

Validity of Gridding
We wish to test the assumption that grouping similar pixels, according to the set number of intensity bins, give rise to similar DEMs. An arbitrary group containing 20 similar pixels is chosen, and each pixel's channel intensities are used to calculate DEMs directly using the method of Paper I. Figure 4a shows the small range of each channel's intensities, and their correct correspondence to the intensity binning ranges. Figure 4b shows the 20 individual DEMs as grey lines. They are very closely distributed, reflected in their small standard deviation shown as red error bars. The mean DEM is shown as a bold black line, and the black error bars show the estimated uncertainty in the DEM, calculated from the measurement errors and calibration uncertainties according to Equation 6 of Paper I. The spread of the 20 DEMs is far smaller than the estimated DEM uncertainty, showing that, for this group, the assumption is valid.
This type of analysis can be extended to all grid groups. For each group that contain three or more pixels, the set of directly calculated DEMs and uncertainties are recorded, and a mean DEM and uncertainty calculated (as shown as a bold line and black error bars for the example of Figure 4b). A test of the validity of the gridding process is the number of that group's DEMs that fit within the DEM uncertainty bounds. Figure 5a shows the percentage of pixels that fall outside of these bounds as a function of temperature, calculated across all 7309 groups containing three or more pixels. This value is calculated only for DEM bins above 10% of the maximum DEM peak (for each group), otherwise the result becomes dominated by the errors at small DEM values, and division by small numbers (for example, Figure 5 (a) The percentage of pixels, at each DEM temperature bin, that have Grid-SITES output DEMs that lie outside the range of the estimated DEM inversion uncertainty as calculated by direct DEM inversion of each pixel separately. This deviation beyond the inherent inversion uncertainty is therefore due to the error introduced by the grid or histogram bin resolution. (b) The mean absolute deviation of each pixel's DEM from the group's mean DEM (mean of its gridding group), divided by the estimated DEM inversion uncertainties for that group. Both these plots have been calculated for only DEM values above 10% of the maximum DEM peak (maximum for each group's mean DEM), to avoid division by small numbers. see Figure 4b at higher temperatures). Only a small percentage of pixels have a part of their DEM beyond the natural uncertainty range. That is, the gridding of intensity values into bins leads to only a small increase in uncertainty. An analytical estimate of this increase is given later. Figure 5b shows a similar measure -the absolute deviation of each pixel's DEM from its group's mean DEM, divided by the group's DEM uncertainty. This is averaged across all 7309 groups containing three or more pixels, for DEM values above 10% of the peak values (as above). This shows that the grouped DEMs tend to lie very close to their group's mean, within a range less than 30% of the natural uncertainty on average. This section shows that gridding pixels into groups of similar intensities is a valid procedure for improving the efficiency of DEM inversion. Increasing the number of bins for all channels would increase the accuracy of the gridding procedure, at the expense of computational speed.

DEM Gridding Uncertainty
The set of unique channel intensity combinations, K u , is calculated. The small number of pixels not included in the range of K u are inverted directly. For each element of K u , the mean intensity and measurement error is calculated for each channel (this is of course not necessary for elements of K u populated by only one measurement). The intensity and error is passed to a suitable DEM inversion method (in this case, Paper I) for processing. The resulting DEM, D, and uncertainty, d, is recorded for each K u . A new error, d , is calculated that includes the additional error introduced due to the gridding. At a temperature bin subscripted j the error is where subscript i is the index of each channel (i = 0, 1, . . . 5), I i is the mean intensity for the current grid bin (non-logarithmic), I i is the width of the bin, and S ij is the relative response of each channel, i, at temperature bin, j . S ij is calculated from the temperature response of each channel, R ij (as given by the AIA Solarsoft routines) by so that, at a given temperature bin, the relative responses sum to unity over all channels. See Equation 1 and related text of Paper I for more detail. Given a reasonable choice for the number of bins in each channel, I i is small compared to the measurement errors, and the additional error introduced by the gridding method is thus relatively small. In this paper, the choice of 16 to 26 bins depending on each channels median intensity, is a compromise between computational efficiency and the accuracy of the DEM output.
As each K u is processed, the DEM is mapped back into the original x, y pixels by identifying which elements of K are equal to the current value of K u . All these pixels are then assigned to that DEM and the procedure is completed when all elements of K u are thus processed.

Results and Comparison
This section compares DEM inversions gained directly from SITES and through the Grid-SITES method across the region of interest (boxed region in Figure 1). Figure 6 shows the DEM for both methods at three different temperatures. At all temperatures, the maps are very similar, appearing identical to the eye. Figure 7 shows the same comparison, but for the fractional emission measure (FEM). The FEM is the ratio of emission at a given temperature over the total emission integrated over all temperature at that pixel, introduced in Paper I. In this case, some minor differences can be seen. The Grid-SITES maps look slightly more pixelated and non-smooth in some isolated regions. This is due to the DEM results being quantised into discrete steps according to the gridding output.
Figures 8a -c show the percentage deviation of the SITES and Grid-SITES DEMs at three different temperatures. The deviation, , is calculated as where D s is the SITES DEM, inverted directly, and D g is the Grid-SITES DEM. To avoid large relative deviations where D s is small, these values are only shown for regions where D s is larger than 5% of the maximum D s at that temperature. The majority of pixels have small deviations, at a few percent, with only small regions or isolated pixels above ±10%. Figure 8d shows the correlation between the D s and D g profiles, calculated across the temperature range. This is a measure of how similar the DEM profiles are, regardless of their absolute values. All correlations are above 0.9, with the vast majority very close to 1.
The results of this section shows that Grid-SITES is a valid approach to improving the efficiency of DEM inversions. The quality of the match between directly inverted DEMs and gridded DEMs can be further improved by increasing the number of bins for each channel, at the expense of computational efficiency.

Application to Time Series
A highly efficient application of Grid-SITES is for processing a time series of observations. A solution grid is constructed by defining the minimum and maximum intensities, and number of bins to the first observation as described in the method above. Grid-SITES is then applied to the first observation, and the resulting DEMs recorded for each required populated bin. A new solution grid is calculated for the second observation, but using the same parameters (number of bins, minimum and maximum intensities) as the first observation. For this second observation, many of the populated grid bins will also be populated by the first observation, thus the DEMs can be immediately read from the previous grid. This avoids calculating DEMs for grid points common to both sets of observations, and only new grid points require processing, offering increased efficiency. The updated grid can then be used for subsequent observations and the process repeated, with a decreasing number of new grid points requiring DEM processing at each time step. Figure 9 shows, for a time series of ten observations, the percentage of pixels that require processing. Results are shown both for the small example region containing 87,918 pixels, and for the whole solar disk containing 570,240 pixels. For the small region, the first time step requires processing of 27% of the pixels in order to populate the initial grid. For the second time step, this drops to 12%. By the tenth time step, only 5% of the pixels require processing. For the whole disk, the first time step requires 10% of pixels to be processed, dropping to 4% by the second time step, and down to just 1.5% by the tenth time step -thus a factor of 63 faster than processing directly.

Figure 9
The percentage of image pixels that must be processed for a time series of AIA observations, with the grid solutions maintained and updated between each set of observations (see text). The crosses are for the small example region (87,918 pixels), the diamonds are for the whole solar disk (570,240 pixels). A series of ten observations are processed, with the first observation at 1 January 00:00 UT, with a one hour increment.
The AIA calibration, or the temperature response curves, vary over time as the detectors degrade. This time variation is slow and linear, but has occasional discontinuous steps at certain dates, and it is different across channels. Thus a Grid-SITES solution grid is only valid for a certain time period around the date for which the grid is created. During times when the calibration changes slowly and linearly, a period of one or two weeks is probably acceptable. Greater care is needed during times of discontinuous large jumps in response curves. Given this consideration, the DEM solution grid can be saved, and used for processing of other datasets as long as they are collected within an acceptable time period. Care must also be taken with a time series of observations containing rapid changes in intensities, for example during a flare or a coronal dimming. In this case, the intensity ranges for each channel that form the initial grid parameters should be set at ranges appropriate for the whole series. The AIA data file headers contain useful DATAMIN and DATAMAX fields. Figure 10 shows the FEM resulting from applying Grid-SITES to a full-resolution whole disk observation of 01 January 00:00 UT (the same observation is visualised in Figure 1). Three temperatures are shown: i) 0.5 MK: This is at the lower limit of valid temperatures for SITES inversion (see Paper I). The FEM at this temperature is dominated by coronal holes and the quietest Sun far from active regions. ii) 1.6 MK: At this temperature, coronal holes, the quietest Sun, and active regions have low FEM. The FEM is dominated by broad regions surrounding active regions. The high FEM values in these regions (≈ 15%) indicate DEMs that peak strongly near 1.6 MK. iii) 3.1 MK: Only the active regions have high FEM at this temperature.

Application to Full-disk Images
The full-disk observation of Figure 10 has just over 10 7 pixels in the processed field of view. Grid-SITES finds 3.33 × 10 5 groupings of pixels which gives a 30-fold increase in computational efficiency. This observation was processed in around ten minutes on a 3.7 GHz Intel Core i5 desktop PC with 32 Gb memory, compared with 6 hours without Grid-SITES. If the calculated grid was reused for other observations made close in time, full-disk images may be processed in less than a minute.
In high-resolution Grid-SITES DEM (or FEM) maps, the quantisation of the input data (and output DEMs) can be seen as discontinuous steps in off-limb regions. This is seen particularly clearly above limb active regions in the bottom panel of Figure 10. One possible improvement would be an interpolation scheme to avoid these discrete steps. The interpolation is not straightforward since it must be done in the multi-dimensional space of the input data grid, and the populated grid bin may have neighbouring bins that are empty, thus contain missing information for interpolation. In principle, if a working interpolation scheme is achieved, the Grid-SITES DEMs should more closely match directly inverted DEMs.

Summary
Grouping similar groups of pixels according to their logarithmic intensity is a valid approach to improving the efficiency of DEM inversions for large sets of data. For small images (e.g. ≈ 10 5 pixels) the efficiency increase is ≈ 5-fold, for larger images or time series of images (e.g. ≈ 10 7 pixels) the efficiency is closer to 100-fold. The gridding leads to output DEMs that agree well with directly inverted DEMs, with only a small relative increase in uncertainty.
The accuracy of Grid-SITES is dictated by the choice of grid resolution. Finer grids (higher number of bins) lead to higher accuracy and less efficiency. In this paper, 16 -26 bins are allocated to each channel according to that channel's typical intensities, giving a sensible compromise between accuracy and efficiency. In the limit of very fine grids, the result and Figure 10 Fractional emission measure (FEM) at a temperature of 0.5 MK (top), 1.6 MK (middle) and 3.1 MK (bottom) for a full-resolution observation made at 01 January 00:00 UT, and inverted using Grid-SITES. The colour bar shows the percentage FEM for each pixel at each temperature. efficiency becomes equal to inverting the data directly per pixel. Thus Grid-SITES offers a flexible working environment according to the purpose of the inversion. For example, a user can choose a coarse binning for a very fast inversion to give initial results, and a fine grid for more detailed further analysis. Greater efficiency gains are offered by larger datasets, and a grid can be saved for use on other data as long as the user is aware of certain periods when the AIA response calibration changes rapidly.
In this work, Grid-SITES uses SITES as the core inversion method, although the same gridding approach can be used with any DEM inversion method. This can facilitate the application of more than one inversion method to the same dataset, giving additional confidence on the inversion results. This work has used AIA images exclusively, although Grid-SITES can obviously be used for inverting any suitable dataset. Furthermore, the gridding scheme can be used for analysis other than DEMs, where a time-consuming procedure is to be applied to a large multi-dimensional dataset.
The incentive for developing Grid-SITES is to analyse large datasets and enabling, for example, large-scale statistical studies of active regions, or of coronal changes over long time-scales using AIA/SDO. Given a decent desktop computer, Grid-SITES enables rapid DEM inversions of a time series of full resolution, full-disk AIA observations -in minutes rather than hours. The IDL routines for Grid-SITES, the DEM inversion SITES method, plus the FEM visualisation method will be made available via the Solarsoft software package.