Review of image quality measures for solar imaging

The observations of solar photosphere from the ground encounter significant problems due to the presence of Earth's turbulent atmosphere. Prior to applying image reconstruction techniques, the frames obtained in most favorable atmospheric conditions (so-called lucky frames) have to be carefully selected. However, the estimation of the quality of images containing complex photospheric structures is not a trivial task and the standard routines applied in night-time Lucky Imaging observations are not applicable. In this paper we evaluate 36 methods dedicated for the assessment of image quality which were presented in the rich literature over last 40 years. We compare their effectiveness on simulated solar observations of both active regions and granulation patches, using reference data obtained by the Solar Optical Telescope on the Hindoe satellite. To create the images affected by a known degree of atmospheric degradation, we employ the Random Wave Vector method which faithfully models all the seeing characteristics. The results provide useful information about the methods performance depending on the average seeing conditions expressed by the ratio of the telescope's aperture to the Fried parameter, $D/r_0$. The comparison identifies three methods for consideration by observers: Helmli and Scherer's Mean, Median Filter Gradient Similarity, and Discrete Cosine Transform Energy Ratio. While the first one requires less computational effort and can be used effectively virtually in any atmospherics conditions, the second one shows its superiority at good seeing ($D/r_0<4$). The last one should be considered mainly for the post-processing of strongly blurred images.


Introduction
Ground-based telescopes suffer from the degradation of image quality due to the turbulent nature of Earth's atmosphere. This phenomenon, frequently termed as "seeing", prevents large-aperture telescopes from achieving their theoretical angular resolution. Even the best observing sites do not allow for observations in the visible with the resolution higher then the diffraction limit of a 20 cm telescope. This means, that long exposures of both very small and extremely large telescope show the same angular resolution.
The Fried parameter, r 0 , is a quantity which describes the average atmospheric conditions. The Fried parameter is the distance across which the expected change of the wavefront phase is exactly 1/2π. It may be also understood as the size of the telescope's aperture over which the theoretical diffraction limit may be easily achieved. Since at the best observational sites r 0 rarely reaches 20 cm in the visible (λ = 0.5 µm) (Socas-Navarro et al., 2005), the long exposures obtained from any telescope do not expose the resolution higher than the one reached by a 20 cm telescope. Importantly, the D/r 0 ratio is frequently utilized to determine how far the resolution of images obtained by a telescope of D diameter is from its theoretical diffraction limit.
So far, numerous techniques, both hardware-and software-based, have been developed to enhance the resolution of astronomical observations. Probably the most prominent hardware-based approach is adaptive optics (AO, Hardy, 1998), in which the compensation of wavefront distortions is performed directly by a deformable mirror. The AO for solar imaging differs from the one used in nighttime observations (Rimmele and Marino, 2011;Rimmele, 2004). There is no point-like object for wavefront sensing which is performed usually with Shack-Hartmann sensor. Instead, the cross-correlation between images observed in individual subapertures has to be calculated to estimate the shape of wavefront (Löfdahl, 2010). Significantly poorer daytime atmospheric conditions put much higher demands on the updating frequency of deformable mirror. Fortunately, there is also more light available for sensors which enables such observations.
A popular example of software-based approach to improve the quality of images is the method called Lucky Imaging (LI, Scharmer, 1989;Law, Mackay, and Baldwin, 2006). Due to the availability of very fast and low-noise cameras, this method has recently become very popular high-resolution acquisition technique in the visible. However, it is dedicated to smaller telescopes (< 2 m), since it relies on the fact, that there is only a small chance to obtain a diffraction-limited image in a series of very short exposures (i.e., shorter than the coherence time of the atmosphere -usually up to several milliseconds). This chance, estimated by Fried (1978), is relatively high for smaller apertures (e.g., D/r 0 < 2), while it quickly becomes negligible for greater mirrors.
There are many combinations of software and hardware techniques of highresolution imaging. As an example, Colodro-Conde et al. (2017) or Mackay et al. (2012) present the fusion of Lucky Imaging and adaptive optics (AOLI). In Schödel and Girard (2012) the deconvolution of a series of short exposures is shown as another possible way to enhance the quality of astronomical images. A wide range of speckle-interferometry methods is also available (Saha, 2007;Tokovinin and Cantarutti, 2008). They involve such approaches as the aperture masking (Monnier et al., 2004;Ireland, 2013) or speckle bispectral reconstructions (Lohmann, Weigelt, and Wirnitzer, 1983;Hofmann and Weigelt, 1993). Some of them have been successfully used for a long time in solar imaging (von der Lühe, 1993).
Either with or without AO, the observer acquires a series of images and then applies several post-processing steps (e.g., Denker et al., 2005). One of them is always the selection of best exposures, i.e., the ones having the highest image quality. Such lucky imaging has been very popular for a long time in groundbased solar observatories due to its apparent simplicity and very low hardware requirements (only a fast camera is necessary) (Beckers, 1989). However, the most advanced imagers utilizing lucky exposures are far from being simple (Ferayorni et al., 2016). The rejection of less useful frames is possible due to the high intensity of observed object which allows for acquiring thousands of exposures per second at relatively low resolution or tens of frames if large-format cameras are utilized. The selection is also essential for reaching the high quality of final outcomes regardless of the complexity of succeeding image reconstruction (e.g., simple stacking, deconvolution, or bi-spectral analysis).
The assessment of the temporal quality of registered solar images becomes challenging due to the complex character of observed scene. It is impossible to utilize a basic quality metric frequently used in lucky imaging -the intensity of the brightest pixel in a speckle pattern -as it requires a well isolated pointlike object (star). Instead, a widely employed method for quality assessment of solar images is the root-mean-square contrast (rms-contrast) which assumes the uniform isotropic properties of granulation (Danilovic et al., 2008;Denker et al., 2005;Scharmer et al., 2010). Unfortunately, the method has several drawbacks like a significant dependency of its effectiveness on the wavelength (Albregtsen and Hansen, 1977) and the sensitivity to the structural contents of the image (Deng et al., 2015). This implies that also the tip-tilt effects, which move observed patch slightly changing its contents, will introduce additional noise to quality estimation as the image features, like sunspots or pores, will move in and out of the analyzed region.
Motivated by a recent work by Deng et al. (2015) introducing an objective image quality metric to solar observations, we decided to explore which of the numerous quality metrics (QMs) available in the rich literature can be employed for selection of solar frames. This is carried out by investigating the correlation between the outcomes of QMs and the known strength of simulated turbulence. Our review includes 36 QMs with many varying implementations. Implementations refers to different gradient operators, kernel sizes, thresholding techniques, etc., without implying conceptual changes. We utilize reference images from the Solar Optical Telescope (SOT) on board the Hindoe satellite Tsuneta et al. (2008) and use an advanced method for modeling atmospheric turbulences, the Random Wave Vectors (RWV, Voitsekhovich et al., 1999) method. The scintillation noise is also included to faithfully reflect all the seeing characteristics. Moreover, we check the computational efficiency of the QMs to indicate which methods are more suitable for application in high-speed real-time image processing.

Turbulence simulation
Since the observations from the space are not disturbed by the atmosphere, as the reference data for our experiments we used the images obtained by the SOT. From a wide range of registered images in Hindoe database 1 , we selected a one which contained several regions of enhanced magnetic activity (sunspots). The image was originally obtained at green continuum wavelength, on 29 Nov. 2015 at 21:19:44UT. In the Fig. 1 we show the selected reference and indicate the positions of six extracted 100 × 100-pixel patches, (roughly 5 × 5 each, image scale of 0.054" pixel −1 ). Patches W1, W2, and W3 include sunspots, while W4, W5, and W6 contain granulation.
To investigate the response of various QMs for a given strength of turbulence, we modeled the transfer function of the atmosphere utilizing RWV method. The method allows for reliable modeling of amplitude and phase fluctuations of the incoming optical wavefront. Since a patch of solar photosphere, which is used for quality assessment, can be arbitrarily small we assumed that it is within the isoplanatic angle. Thus, anisoplanatic effects, more complicated for simulation, are neglected in this study, which will otherwise significantly complicate the simulations. Using RVW we generated 1000 blurring kernels (speckles patterns) for ten distinctive scenarios of seeing conditions: D/r 0 = 1, 2, ..., 10. We assumed that such a range is representative for observations at best observing sites. Each generated kernel consisted of 1024×1024 pixels and the resolution was two times higher than required by the Nyquist limit, i.e, a single pixel corresponded to an angular size of λ/4D (λ -wavelength, D -telescope diameter). We opted for such oversampling to be able to combine simulated speckle kernels with reference Hindoe images. Each blurring kernel was normalized so that the summed intensity over all pixels is unity. Exemplary kernels with the corresponding longexposure seeing disks are presented in Fig. 2. Evidently, the poorer the seeing conditions (higher D/r 0 ), the more complex is the kernel shape. The simulated tip-tilt effect is also visible as a displacement of the kernel's centroid.
Atmospheric scintillation results in varying attenuation of flux collected by a telescope. This type of noise was recently investigated by Osborn et al. (2015). The authors present the following formula for estimating the relative variance of the total intensity of the observed object:   where D is telescope diameter, γ is the zenith distance , h obs is the altitude of the observatory, H is the scale height of the atmospheric turbulence, generally assumed to be ∼ 8000 m, C Y is the scaling coefficient which can be determined from turbulence profilers (SCIDARs) and was estimated between 1.30−1.67 for best observing sites (see Tab. 1 in the original work Osborn et al. (2015)).
To include such fluctuations in our simulated images, we multiplied each kernel by a random variable with an expected value of unity and a standard deviation equal to σ s . We assumed (1) the telescope size D = 0.5 m according to the size of Hindoe/SOT instrument, (2) the observations at zenith distance of γ = 60 • and (3) a low scintillation index, C y = 1.5, which is expected for (4) high-altitude observatory, h obs = 3000 m. For such conditions the relative scintillation noise is σ s = 0.032.
To properly convolve a blurring kernel with a reference patch, the scale of a kernel has to be the same as the image scale. For the assumptions given above, i.e., D = 0.5 m telescope observing at λ = 550 nm, we obtain an image scale of 0.055" pixel −1 . This means that the sampling of blurring kernels (D/4λ) and the assumed telescope size allow for convolving kernels with the Hindoe images without any prescalling. Several examples of solar images degraded by simulated blurring kernels are presented in Fig. 3.
The time-dependent quality, when observing stellar objects, can be determined from the relative intensity of the brightest pixel in the normalized kernel. This is a widely accepted approach to select the sharpest frames in the nighttime lucky imaging (Law, Mackay, and Baldwin, 2006). However, in our case the object is not point-like and shows complex structures. Thus, we decided to use the quality measure based on the amount of energy preserved from the original frequency spectrum. Since the original image is convolved with a blurring kernel and the original amplitudes in frequency spectrum are multiplied by the amplitudes of a kernel, the value of proposed quality measure can be calculated by summing squared amplitudes in 2-D Fourier transform of a kernel. According to the Parseval's theorem, this also equals to the sum of squared intensities of a kernel directly in the image plane.
Since turbulence is a random process, there is also a possibility that the temporal seeing conditions will become much better or much worse than the ones indicated by the average D/r 0 (Fried, 1978). This is in fact what we also observed in our data. For all ten sets of simulated kernels (D/r 0 = 1, 2, ..., 10), the spread of temporal quality is exposed in the histograms in Fig. 4. In the upper part of Fig. 4 we plotted the amount of preserved energy of the original image over all 1000 kernels for three values of D/r 0 . Clearly, the quality for the average D/r 0 = 4 can sometimes outperform the conditions registered at D/r 0 = 1. Therefore, D/r 0 expresses only the average blurring strength while it can not be taken as a reliable estimator of the quality of individual frames.

Methods
A set of 35 state-of-the-art QMs was provided in the review of Pertuz, Puig, and Garcia (2013). The authors consider the most popular methods dedicated for the assessment of focus in complex scenes by means of contrast measurement. The methods can be categorized into six families of operators, based on: gradients (GRA), Laplacians (LAP), wavelets (WAV), intensity statistics (STA), discretecosine transform (DCT), and the miscellaneous methods (MIS), i.e., the ones that can not be categorized into any other group. An overview of the methods with their abbreviations and references is given in Tab. 1.
Within the set of techniques one can find the most popular method used in solar image processing, i.e., rms-contrast measure, which was called Normalized Gray-level Variance and abbreviated as STA5 (we followed this nomenclature). The most recent QM proposed by Deng et al. (2015), Median Filter Gradient Similarity (GRA8), was included in our comparison as well.
To improve the effectiveness of QMs for solar observations we had to adjust the parameters of many techniques. This was carried out experimentally by tuning the parameter considering the properties of observed solar scene and analyzing the algorithm's details. Moreover, for some of the methods we proposed major, but still simple, modifications which allowed for enhancing their effectiveness. This resulted in creation of various implementations of the most of methods (labeled as version A, B, etc.). The set of investigated parameter values and/or the details of applied modifications are given in Tab. 2. The distance parameters, like radius or size of local filtering window, are expressed in pixels (in our case 1 pixel = 0.055"). The thresholds are given in relative values, which means that before applying thresholding the intensities in a patch were normalized such that the intensities cover the range from zero to unity. Several methods have two adjustable parameters, therefore we evaluated various combinations of their values. Including all the modifications, we end up in a total number of 105 implementations of 36 techniques which are included in our comparison.

Data analysis
To assess the performance of all methods we investigated the correlation between the results of QMs and the actual expected quality Q. As stated before, the quality was estimated by the total preserved energy with respect to the reference, undisturbed image. We observed that the relation between these two quantities is almost always nonlinear. Fortunately, it is sufficient that the dependency is monotonic as it allows us to distinguish between better and worse atmospheric conditions. Therefore, to estimate the effectiveness of the methods, instead of Pearson's correlation coefficient, we used the Spearman rank-order correlation C s . Such a coefficient is insensitive to any nonlinearities in observed dependencies.
As an example in the upper panel of Fig. 5 we show significantly different dependencies between the expected quality Q and the outcomes of two QMs (LAP1 and MISB, on the left and right side, respectively) calculated for image patch W 1. Each set of D/r 0 was presented with a different marker/color. We decided to calculate the Spearman correlation coefficient in two ways: (1) within a set of observations for each D/r 0 and (2) across all the simulated conditions D/r 0 = 1 − 10. We present the corresponding results of C s in each D/r 0 subset in the bar plots of Fig. 5. The correlation depends on D/r 0 so that it becomes apparent which range of atmospheric conditions is most suitable for a given method. For example, the LAP1 method allows for effective estimation of image quality in D/r 0 = 1 while MIS1B is completely useless in this range. The proposed approach to calculate the correlation coefficient permitted estimating the performance of each method for a range of typical atmospheric conditions.    A−E number of diagonal elements from the matrix of eigenvalues (k) k = {1, 3, 5, 7, 9} STA3 --STA4 A−E size of the window in which the local variance is computed (w) w = {3, 5, 7, 11, 15} STA5 --STA6 A In the example presented in Fig. 5 the superiority of LAP1 over MIS1B is also visible for C s calculated over whole set of D/r 0 . However, the difference is significantly smaller than in particular groups as both methods achieve C s > 0.9 (0.98 and 0.90 for LAP1 and MIS1B, respectively). This is due to the wide range of considered Q values, which makes the correlation between the quantities more evident and leads to the elevated value of the C s coefficient. Although the differences are not so significant, the correlation for such a wide range of Q values still allows for comparison of the overall effectiveness of QMs, however, within much narrower range of C s values.

Results and discussion
The results of the comparison are presented in Tab. 4 and Fig. 6. While in Tab. 4 we present only the best four methods for each D/r 0 , the average performance obtained for whole QM families is summarized in Fig. 6. We divided the results into two categories, according to the contents of observed image: W1, W2, and W3 -active regions, and W4, W5, and W6 -granulation.
According to the results presented in Tab. 4, there is no single winner covering all possible atmospherics conditions and scene characteristics. However, there are  Table 4. The rankings of best quality measures for various D/r 0 seeing conditions and six analyzed patches: W1, W2, W3 -active regions and W4, W5, W6 -granulation areas. The lower bar plots show the rankings for wide-range correlation accross all atmospheric conditions D/r 0 = 1 − 10. three techniques which showed distinctively better performance and, therefore, they are discussed below. One of the methods with very impressive performance, is Helmli and Scherer's Mean method (MIS5) proposed by Helmli and Scherer (2001). This technique provided very good results for whole range of D/r 0 conditions and both types of observed solar regions. For all atmospheric conditions it is always among the four best methods. As indicated by distinctively higher correlation coefficients calculated over D/r 0 = 1 − 10, its usefulness should be considered when the seeing is highly varying or unknown. In summary, this method should be the first choice among all tested techniques.
The second noteworthy method is the Median Filter Gradient Similarity (GRA8) method, which was recently proposed by Deng et al. (2015). It shows the best performance for very good atmospherics conditions (D/r 0 < 4). However, to achieve the high effectiveness we had to slightly modify the method by (1) combining both horizontal and vertical gradients and (2) changing the size of kernel. The superiority of the method is evident especially for active regions (W1−W3), while it is slightly less efficient when assessing only granulation patches (W4−W6), especially for D/r 0 > 2. Interestingly, this technique was not included in the best 12 methods indicated by wide-range correlations, D/r 0 = 1 − 10 which indicates that it should not be applied for observations with poor or unknown atmospherics conditions. The results of the GRA8 method recommend it for the utilization in observations assisted by AO since in this case the image quality is significantly enhanced, and the effective D/r 0 is small.
The last method, which provides good results is the DCT Energy Ratio proposed by Shen and Chen (2006). Its performance is the highest for moderate and poor atmospherics conditions when observing granulation regions. This method is second best method when the whole range of D/r 0 is taken into account, for patches W4-W6. The parameter of this method -the size of sub-block -should be selected accordingly to the blurring strength, i.e., the larger the D/r 0 , the larger the sub-block.
The method frequently used in solar observations, rms-contrast or normalized gray-level variance (STA5), showed surprisingly poor performance. Since the rms-contrast measure was originally developed for granulation regions, with isotropic and uniform characteristics, we investigated how the method performed for this type of scene. The results presented in Fig. 7 indicate that indeed the techniques allows for achieving significantly higher correlation values for granulation. Still, the outcomes based on rms-contrast were much less correlated with the expected image quality than the results achieved by the best techniques. Therefore, it did not appeared in any ranking presented in Tab. 4. Our conclusions agree with the observations made by Deng et al. (2015) who also indicate relatively low efficiency of the common rms-contrast measure.
Some interesting conclusions can be drawn from the average performance of each QM family presented in Fig. 6. The Laplacian-based methods (LAP) show very good average correlation with D/r 0 for both types of observed scenes. While the LAP-based methods are indicated only several times in the rankings presented in Tab. 4, they still should be considered as a base for new, better techniques. The biggest difference between the efficiency when changing from granulation to active regions, can be observed in statistical methods (STA). This agrees well with the results shown in Fig. 7 and appears due to the assumption made in STA methods that regions should expose uniform properties across the observed field. The strange shape of GRA dependency visible in the right plot of Fig. 6 arises due to the characteristics of Median Filter Gradient Similarity (GRA8). As it was stated before, the performance of this method rises significantly as the D/r 0 decrease. This behavior biases the average efficiency of GRA family. The possibility of using a method in real-time image selection can be especially important, as it allows for recording only the most useful data. Therefore, for completeness of our comparison, we measured the time required for processing a single image patch by each method. The evaluations were performed on a single core of IntelCore i7-3930K operating at 3.2 Ghz. The procedures were repeated 10 4 times to obtain the average execution time. The results are presented in Tab. 5 wherein we show the methods and their average computation time (singleimage analysis) in seconds, alternately.
The results show that real-time frame selection for the assumed size of image patch is possible for frame rates more than 1000fps for most of the analyzed methods. This can be, however, not true if one wants to process large, full resolution, frames and/or if several steps of image calibration have to be applied. For such a case, the usage of graphics processing units (GPU) and accompanying optimization of a code should be considered. However, in our comparison we decided to perform measurements on a single CPU core, so that the further reduction of execution time with a multi-core machine can be estimated. For most of the included methods, the analysis is performed independently in local sub-regions of whole image, which makes them easily parallelized. We observed that the Median Filter Gradient Similarity (GRA8 ) requires visibly more computation time (>0.01 s) than MIS5 (<0.002 s). It is mostly due to the median filtering which requires sorting pixels in a sliding window. This indicates that GRA8, compared to MIS5, would require more computing resources and/or better code optimization to operate at high frame rates utilized frequently in solar observations. Unfortunately, the Discrete Cosine Transform techniques (DCT1 and DCT2 ) have significantly higher execution time, which makes them useless for real-time computation. These methods should be considered only in the post-processing. The fastest method was the one most frequently utilized in solar observations, STA5. Unfortunately its mediocre performance is not compensated by the distinctively higher computational efficiency.

Conclusions
The assessment of image quality is an integral part of observations of the solar photosphere and chromosphere. It is essential to precisely select only the images of the highest fidelity before performing consecutive image reconstruction. Unfortunately the complexity and variety of observed structures make the estimation of image quality a challenging task. There is no single point-like object available, therefore, the selection techniques utilized in nighttime lucky imaging are not applicable.
In this study we decided to employ 36 techniques with various implementations and evaluated their precision in selecting the best exposures. The methods were based on various principles (gradients, intensity statistics, wavelets, Laplacians, Discrete Cosine Transforms) and were published over the last 40 years. Additionally, we enhanced their performance by applying simple modifications and by adjusting their tunable parameters.
In the comparison we employed reference images, containing both active regions and granulation areas, obtained by the Hidnoe satellite. The selected patches were degraded by convolution with blurring kernels generated by RWV method which faithfully reflects the seeing characteristics. We assumed a wide range of relative atmospheric conditions, starting from nearly undisturbed observations at D/r 0 = 1 to mediocre seeing at D/r 0 = 10. The reference quality of each simulated image was objectively estimated by measuring the amount of energy preserved in the Fourier spectrum of the original image. Eventually, to assess the efficiency of the compared techniques, we calculated Spearman's correlation coefficient between the outcomes of each method and the expected image quality.
The results of our comparison showed that the efficiency depends on the strength of atmospheric turbulence. For good seeing, D/r 0 < 4, the best method is the Median Filter Gradient Similarity (Deng et al., 2015), (GRA8), the most recent technique proposed for solar observations. Importantly, the original idea had to be slightly modified to enhance the method's performance. On the other hand, when the seeing conditions cover a wide range, the most efficient method is the Helmli and Scherer's Mean (Helmli and Scherer, 2001), (MIS5). This method should be considered when observing without AO or if the seeing conditions are unknown. The last distinctive method was the DCT Energy Ratio (Shen and Chen, 2006) which showed high performance in poor atmospheric conditions when observing granulation regions. The measurements of execution time indicated that of the three mentioned techniques the Helmli and Scherer's Mean has significantly higher computational efficiency which recommends it for utilization with extremely high frame rates and/or larger image patches.