Bayesian Super-Resolution of Spectroscopic Data

The resolution of spectroscopy, which delivers valuable insights and knowledge in various research fields, is sometimes limited by the number of multi-channel detectors employed. For example, in Raman spectroscopy using charge coupled device (CCD) detectors, the resolution is limited by the number of CCD arrays, and it is difficult to acquire spectroscopic data with high resolution over a wide range. Here we describe a methodology to increase the resolution as well as the signal-to-noise (S/N) ratio by applying Bayesian super-resolution to the analysis of spectroscopic data. In the present method, the hyperparameters for the Bayesian super-resolution are first determined by a virtual experiment imitating actual experimental data, and the precision of the super-resolution reconstruction is confirmed by calculating the errors from the ideal values. To validate the super-resolution of spectroscopic data, we applied this method to the analysis of Raman spectra. From 200 Raman spectra of a reference Si substrate with a resolution of about 0.8 cm⁻¹, super-resolution reconstruction with a resolution of 0.01 cm⁻¹ was successfully achieved with the promised precision. From the super-resolution spectrum, the Raman scattering peak of the reference Si substrate was estimated as 520.55 (+0.12, -0.09) cm⁻¹, which is comparable to the precisely determined value from previous works. The present methodology can be applied to various kinds of spectroscopic analysis, leading to increased precision in the analysis of spectroscopic data and the ability to detect slight differences in spectral peak positions and shapes.


Introduction
Spectroscopy is utilized in various research fields such as physics, chemistry, agriculture, and medical science, and delivers valuable insights and knowledge.
Spectroscopic data based on light, X-rays and electrons are often obtained using charge coupled device (CCD) detectors [1][2][3][4], and their resolution is sometimes limited by the number of pixel arrays in the CCD detectors. In Raman spectroscopy, for example, peak positions shift slightly depending on the temperature, carrier concentration and strain in semiconductors and low-dimensional materials [5][6][7][8][9][10][11][12][13][14]. Because of the limited number of CCD arrays in a Raman apparatus, increased resolution is achieved at the expense of a narrower range of measured wavelengths. Therefore, it is almost impossible to acquire high-resolution Raman spectra over a wide range.
Curve fitting with a model function such as a Lorentzian, Gaussian or Voigt function is used to analyse slight shifts in the Raman peak positions. However, actual Raman peak profiles do not completely follow the model functions, and the difference between the actual profile and the modelled profile produces a systematic error in the peak positions [6,15,16]. Although spectroscopic data can be acquired with sub-pixel displacements to improve the pixel resolution, the precision of the displacements limits the achievable super-resolution [17]. This is true not only for Raman spectroscopy but also for other spectroscopic methods [18,19].
Bayesian "image" super-resolution is a method that combines a set of low-resolution images of the same view, with sub-pixel displacements relative to each other, using Bayes' rule in order to obtain a single image of higher resolution. In this method, the sub-pixel displacements and the high-resolution image are deduced from a set of low-resolution images assuming a prior distribution [20][21][22]. In the present study, we applied the concept of Bayesian super-resolution to spectroscopic data to improve the pixel resolution, which is limited by the number of multi-channel detectors such as CCD arrays.

Bayesian super-resolution
In this section, the procedure of Bayesian super-resolution of spectroscopic data is described, following that of Bayesian image super-resolution [21][22][23]. One of the most important differences between images and spectroscopic data is the dimension of the data. While image data is two-dimensional, spectroscopic data is one-dimensional, with one variable corresponding to the horizontal axis (e.g. wavenumber in the case of Raman spectra).
Reconstruction-type super-resolution aims to generate a high-resolution spectrum $x$ from a given set of low-resolution observed spectra
$$D = \{ y_t \mid t = 1, 2, \ldots, T \}. \tag{1}$$
Bayesian super-resolution estimates the registration parameters $\theta = \{\theta_t\}$ using the maximum-marginalized-likelihood (MML) rule:
$$\hat{\theta} = \arg\max_{\theta} L(\theta), \tag{2}$$
where $L(\theta)$ is the log marginal likelihood:
$$L(\theta) = \ln p(D \mid \theta) = \ln \int p(D \mid x, \theta)\, p(x)\, dx. \tag{3}$$
After obtaining the estimated registration parameters $\hat{\theta}$, the high-resolution spectrum $\hat{x}$ is deduced as the expected value of the posterior distribution:
$$\hat{x} = \int x\, p(x \mid D, \hat{\theta})\, dx. \tag{4}$$
Here, we use a prior that represents smoothness constraints, as given in ref. [23]:
$$p(x) \propto \exp\Big( -\frac{\rho}{2} \sum_{i \sim j} (x_i - x_j)^2 \Big) = \exp\Big( -\frac{\rho}{2}\, x^{\mathsf T} A x \Big), \tag{5}$$
where $\rho$ is a precision parameter that determines the strength of the prior belief, $i \sim j$ denotes that the values $i$ and $j$ are adjacent, and the summation is taken over all pairs of neighbouring values. We express the set of neighbours of the value $i$ on the high-resolution spectrum as $N(i)$. In this case, $p(x)$ becomes a Gaussian distribution because the exponent of Eq. (5) is never positive and is a quadratic function of $x$. $A$ is a symmetric matrix that is derived as follows:
$$A_{ij} = \begin{cases} |N(i)| & (i = j) \\ -1 & (i \sim j) \\ 0 & (\text{otherwise}). \end{cases} \tag{6}$$
The likelihood is defined according to the assumption that the observed spectrum $y_t$ is obtained by an operation in which the high-resolution spectrum $x$ is geometrically transformed and corrupted by Gaussian noise. In the present study, only horizontal translation of the spectral data is considered. This operation can be represented by the following equation using the registration parameters $\theta_t$:
$$y_t = W(\theta_t)\, x + \varepsilon_t, \tag{7}$$
where $W(\theta_t)$ is a non-square matrix representing the geometrical transformation and $\varepsilon_t$ is Gaussian noise with uniform precision (inverse variance) $\beta$. We sometimes write $W_t = W(\theta_t)$ for the sake of simplicity. The likelihood is given by
$$p(D \mid x, \theta) = \prod_{t=1}^{T} \mathcal{N}\big( y_t \mid W_t x,\ \beta^{-1} I \big). \tag{8}$$
The expectation-maximization (EM) algorithm is used to search for the registration parameters [24].
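As an illustration of the two matrices introduced above, the following NumPy sketch builds the smoothness matrix A for a one-dimensional chain of points and one possible translation matrix W(θt). The function names `smoothness_matrix` and `translation_matrix` are hypothetical, and modelling W(θt) as linear interpolation of the high-resolution spectrum at the shifted low-resolution positions is an assumption made here for illustration; the text only states that W(θt) is a non-square matrix representing horizontal translation.

```python
import numpy as np

def smoothness_matrix(n):
    """Matrix A for a 1-D chain of n points.

    A[i, i] = |N(i)| (1 at the two ends, 2 elsewhere), A[i, j] = -1 for
    adjacent points and 0 otherwise, so that x @ A @ x equals the sum of
    squared differences between neighbouring values.
    """
    D = np.diff(np.eye(n), axis=0)  # (n-1, n) first-difference operator
    return D.T @ D

def translation_matrix(grid, positions):
    """Assumed form of W(theta_t): linear interpolation onto `positions`.

    `grid` is the uniform high-resolution axis; `positions` are the
    low-resolution sample positions (already shifted) and are assumed to
    lie inside the range of `grid`.
    """
    step = grid[1] - grid[0]
    u = (positions - grid[0]) / step                  # fractional grid index
    k = np.clip(np.floor(u).astype(int), 0, grid.size - 2)
    frac = u - k
    W = np.zeros((positions.size, grid.size))
    rows = np.arange(positions.size)
    W[rows, k] = 1.0 - frac                           # weight of left node
    W[rows, k + 1] = frac                             # weight of right node
    return W
```

With this choice each row of W sums to one, so a constant spectrum is mapped to the same constant, and the prior exponent can be evaluated as `-0.5 * rho * x @ A @ x`.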
In the E step, the posterior distribution of the high-resolution spectrum $x$ is calculated:
$$p(x \mid D, \theta) = \mathcal{N}(x \mid \mu, \Sigma), \tag{9}$$
where
$$\Sigma = \Big( \rho A + \beta \sum_{t} W_t^{\mathsf T} W_t \Big)^{-1}, \tag{10}$$
$$\mu = \beta\, \Sigma \sum_{t} W_t^{\mathsf T} y_t. \tag{11}$$
In the M step, the following expected squared error, derived from the expected likelihood, is minimized with respect to $\theta_t$:
$$E(\theta) = \sum_{t} \Big[ \lVert y_t - W(\theta_t)\, \mu \rVert^2 + \mathrm{Tr}\big( W(\theta_t)\, \Sigma\, W(\theta_t)^{\mathsf T} \big) \Big]. \tag{12}$$
The procedure of super-resolution for spectroscopic data is summarized in Algorithm 1. The spectroscopic data $D$ is composed of observed values $y_t$ (corresponding to the values of "intensity" in Raman spectra, for example) at each horizontal position $\theta_t^{p}$ (with values of "wavenumber"), which is determined by the measurement apparatus (for example, the interval of the data points is not constant in CCD Raman). In addition, the deviation of the horizontal position from the ideal position, $\theta_t^{c}$, exists as a hidden parameter. Thus, in the present case, $\theta_t$ is composed of $\theta_t^{p}$ and $\theta_t^{c}$. In the present study, the expected squared error shown in Eq. (12) was optimized with respect to the parameter $\theta_t^{c}$ by brute-force search because of its multimodality.
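The E and M steps can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard Gaussian posterior for this model (precision ρA + β Σt WtᵀWt), models W(θt) as linear interpolation (an assumption), and uses hypothetical names (`interp_matrix`, `e_step`, `expected_sq_error`, `m_step_brute_force`). The brute-force M step simply evaluates the expected squared error of one spectrum on a list of candidate shifts and keeps the smallest.

```python
import numpy as np

def interp_matrix(grid, positions):
    """Assumed form of W(theta_t): linear interpolation onto `positions`."""
    step = grid[1] - grid[0]
    u = (positions - grid[0]) / step
    k = np.clip(np.floor(u).astype(int), 0, grid.size - 2)
    W = np.zeros((positions.size, grid.size))
    rows = np.arange(positions.size)
    W[rows, k] = 1.0 - (u - k)
    W[rows, k + 1] = u - k
    return W

def e_step(Ws, ys, A, rho, beta):
    """Posterior mean and covariance of x for fixed registration parameters."""
    precision = rho * A + beta * sum(W.T @ W for W in Ws)
    Sigma = np.linalg.inv(precision)
    mu = beta * Sigma @ sum(W.T @ y for W, y in zip(Ws, ys))
    return mu, Sigma

def expected_sq_error(W, y, mu, Sigma):
    """Expected squared error of one spectrum under the posterior of x."""
    residual = y - W @ mu
    return residual @ residual + np.trace(W @ Sigma @ W.T)

def m_step_brute_force(grid, nominal, y, mu, Sigma, candidates):
    """Return the candidate shift with the smallest expected squared error.

    `nominal` plays the role of the apparatus positions and each candidate
    is a trial value of the hidden deviation for this spectrum.
    """
    errors = [
        expected_sq_error(interp_matrix(grid, nominal + c), y, mu, Sigma)
        for c in candidates
    ]
    return candidates[int(np.argmin(errors))]
```

In a full EM loop, `e_step` would be called with the current shifts, `m_step_brute_force` would then update the shift of every spectrum, and the two steps would be repeated until the shifts stop changing. An exhaustive grid of candidates avoids getting trapped in one of the local minima that make gradient-based updates unreliable here.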

Determination of hyperparameters
In the algorithm for our Bayesian super-resolution of spectroscopic data, there are two hyperparameters, ρ and β. In this section, the procedure for determining these hyperparameters is described. We set the value of 1/β to the variance of the background noise of the observed spectroscopic data. The value of ρ was determined from virtual spectroscopic data generated by the following procedure. First, the experimental data were fitted to a Lorentzian, and the values of the peak height ($I_0$), peak position ($x_0$), vertical offset ($F_0$) and full width at half maximum (FWHM) ($w$) were estimated. The Lorentz function $F(x)$ is defined as follows:
$$F(x) = F_0 + I_0\, \frac{(w/2)^2}{(x - x_0)^2 + (w/2)^2}. \tag{13}$$
Then, the virtual spectroscopic data were generated from the Lorentz function using the estimated values, with Gaussian noise and the geometrical transformation represented by the registration parameters $\theta_t$. Using the virtual data, the value of ρ was determined so as to minimize the error from the Lorentz function. Using these hyperparameters, the super-resolution spectrum was deduced from the actual data with the promised precision.
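The generation of virtual data described in this section can be sketched as below. The Lorentzian is written in the height/FWHM form implied by the parameter list (peak height, peak position, vertical offset, FWHM). All numerical values are illustrative placeholders rather than the fitted values of the paper, the function names `lorentzian` and `virtual_spectra` are hypothetical, and drawing the horizontal deviations uniformly over one pixel is an assumption.

```python
import numpy as np

def lorentzian(x, I0, x0, w, F0):
    """Lorentz function with peak height I0, position x0, FWHM w, offset F0."""
    half = w / 2.0
    return F0 + I0 * half**2 / ((x - x0) ** 2 + half**2)

def virtual_spectra(nominal, shifts, noise_sd, rng, **peak):
    """One noisy low-resolution spectrum per shift (rows of the result).

    Each spectrum samples the Lorentzian at `nominal + shift` and adds
    Gaussian noise with standard deviation `noise_sd` (variance 1/beta).
    """
    clean = np.stack([lorentzian(nominal + c, **peak) for c in shifts])
    return clean + rng.normal(0.0, noise_sd, size=clean.shape)

# Illustrative values only (assumptions, not the fitted values of the paper).
peak = dict(I0=1000.0, x0=520.5, w=4.0, F0=50.0)
nominal = 500.0 + 0.8 * np.arange(50)        # pixel spacing of about 0.8 cm^-1
rng = np.random.default_rng(0)
shifts = rng.uniform(0.0, 0.8, size=200)     # assumed hidden deviations
spectra = virtual_spectra(nominal, shifts, noise_sd=5.0, rng=rng, **peak)
```

Because the true curve is known for virtual data, the reconstruction obtained for each trial value of ρ can be compared directly with `lorentzian` evaluated on the high-resolution grid.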

Experimental procedure
The Si substrate for positron defect measurements (NMIJ CRM 5606-a) was obtained from the National Metrology Institute of Japan. Itoh and Shirono have reported, from intensive research with reliable estimation, that the Raman shift of NMIJ CRM 5606-a is 520.45 ± 0.28 cm⁻¹ [25]. Raman spectroscopy measurements were performed at room temperature using a Renishaw inVia Raman microscope. The wavelength of the incident laser was 532 nm and the groove density of the grating was 3000 gr/mm. The pixel resolution of the obtained spectra was about 0.8 cm⁻¹ at around 520 cm⁻¹. A 5× objective lens was used and the acquisition time was 1 second. Two hundred Raman spectra were obtained while changing the horizontal offset value by 0.01 cm⁻¹.

Results and discussion
Figure 1(a) shows one of the Raman spectra taken from the Si substrate. Owing to the edge filter, the background level showed a steep change at around 100 cm⁻¹. In addition to the Raman shift of Si at around 520 cm⁻¹, leakage of Rayleigh scattering at around 0 cm⁻¹ was observed. All Raman shifts were evaluated by fitting to Lorentzian functions (Fig. 1(b)), and the peak intensities and peak positions were confirmed to be randomly distributed, as shown in Fig. 1(c). The parameters obtained by fitting, as well as the evaluated standard deviation of the background noise, are summarized in Table II.
Two hundred sets of virtual spectroscopic data were generated, and the error of the super-resolution spectra from the true values calculated from the Lorentz function was evaluated for different values of ρ. In the present case, the wavenumber resolution was set to 0.01 cm⁻¹, which is about 80 times finer than the experimental resolution of about 0.8 cm⁻¹.
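The scan over ρ can be illustrated with a simplified sketch. To keep it short, the horizontal shifts are treated as known (no registration search), the grid step is 0.05 cm⁻¹ instead of 0.01 cm⁻¹, only 40 virtual spectra are used, and W is modelled as linear interpolation; all of these are assumptions for illustration, as are the peak parameters and the function names (`reconstruct`, `scan_rho`).

```python
import numpy as np

def lorentzian(x, I0, x0, w, F0):
    half = w / 2.0
    return F0 + I0 * half**2 / ((x - x0) ** 2 + half**2)

def interp_matrix(grid, positions):
    """Assumed form of W: linear interpolation onto `positions`."""
    step = grid[1] - grid[0]
    u = (positions - grid[0]) / step
    k = np.clip(np.floor(u).astype(int), 0, grid.size - 2)
    W = np.zeros((positions.size, grid.size))
    rows = np.arange(positions.size)
    W[rows, k] = 1.0 - (u - k)
    W[rows, k + 1] = u - k
    return W

def reconstruct(grid, positions, ys, rho, beta):
    """Posterior mean of the high-resolution spectrum for known positions."""
    D = np.diff(np.eye(grid.size), axis=0)
    precision = rho * (D.T @ D)               # smoothness prior term
    rhs = np.zeros(grid.size)
    for pos, y in zip(positions, ys):
        W = interp_matrix(grid, pos)
        precision += beta * (W.T @ W)         # data term
        rhs += beta * (W.T @ y)
    return np.linalg.solve(precision, rhs)

def scan_rho(rhos, grid, positions, ys, beta, truth):
    """Root-mean-square error from the true curve for each trial rho."""
    return np.array([
        np.sqrt(np.mean((reconstruct(grid, positions, ys, r, beta) - truth) ** 2))
        for r in rhos
    ])

# Virtual experiment with illustrative (assumed) parameters.
rng = np.random.default_rng(0)
peak = dict(I0=100.0, x0=520.5, w=4.0, F0=10.0)
noise_sd = 1.0
grid = np.linspace(510.0, 530.0, 401)          # 0.05 cm^-1 step
nominal = 510.0 + 0.8 * np.arange(25)          # 0.8 cm^-1 pixel spacing
positions = [nominal + c for c in rng.uniform(0.0, 0.8, size=40)]
ys = [lorentzian(p, **peak) + rng.normal(0.0, noise_sd, p.size) for p in positions]
rhos = np.logspace(-2, 3, 6)
errors = scan_rho(rhos, grid, positions, ys, 1.0 / noise_sd**2,
                  lorentzian(grid, **peak))
best_rho = rhos[int(np.argmin(errors))]
```

A small ρ leaves the noise of individual pixels in the reconstruction, whereas a large ρ flattens the peak, so the error curve has a minimum in between. The location of that minimum depends on the noise level and the peak shape, which is why the virtual data should imitate the actual measurement.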
From the result shown in Fig. 2, the value of ρ was determined so as to minimize the error. The super-resolution Raman spectrum was reconstructed by the Bayesian super-resolution procedure, using the hyperparameters determined by the virtual experiment, with a wavenumber resolution of 0.01 cm⁻¹. Figure 3 shows the reconstruction result of the super-resolution Raman spectrum from the 200 experimental datasets.
Although the spectrum shown over the wide range (Fig. 3(a)) seems almost the same as the experimental spectrum shown in Fig. 2(a), the magnified spectra clearly show the results of the super-resolution procedure. The shape of the Rayleigh scattering peak at around 0 cm⁻¹ shown in Fig. 3(b) was not an ideal Gaussian but asymmetric. This is probably due to the actual condition of the edge filters. At around 300 cm⁻¹, very weak peaks originating from two-phonon Raman scattering [26,27] were observed, as shown in Fig. 3(c).
Compared with the measured spectra, not only higher resolution but also a reduction of noise was achieved. At around 520 cm⁻¹, single-phonon Raman scattering was observed, and its peak did not have the ideal Lorentz function shape (Fig. 3(d)). Further magnification of the spectrum shows the asymmetrical shape of the peak top, which may be caused by deviation of the optical alignment from the ideal.
The deviation from the ideal Lorentz function for the Raman scattering, as well as that from the ideal Gaussian function for the leakage of Rayleigh scattering, leads to systematic errors in the peak positions obtained from fitting. In contrast, we can determine the peak positions as the local maxima with a wavenumber resolution of 0.01 cm⁻¹ and with the promised precision estimated from the variance calculated by Eq. (10) (Table III).
The current results indicate that multiple measurements of Raman spectra obtained by a commercial CCD Raman apparatus, combined with Bayesian super-resolution, can reach a precision in the estimation of the Raman shift similar to that obtained by high-reliability methods. Furthermore, with the current procedure, in which the hyperparameters are determined by a virtual experiment, we can reconstruct high-resolution spectra with decreased noise and excellent precision by Bayesian super-resolution.
The present methodology can be applied not only to Raman spectroscopy but also to all kinds of spectroscopy utilizing multi-channel detectors, such as electron energy loss spectroscopy (EELS), energy-dispersive X-ray spectroscopy (EDX), Auger electron spectroscopy (AES), X-ray photoelectron spectroscopy (XPS), X-ray fluorescence spectroscopy (XRF), X-ray diffraction (XRD), optical emission and absorption spectroscopy, photoluminescence spectroscopy (PL) and cathodoluminescence spectroscopy (CL). Although the current methodology cannot improve the spectral resolution beyond the digital resolution of the measured spectroscopic data, precise estimation of features such as spectral peak positions and shapes is possible without changing the high-resolution setup of the spectroscopy apparatus.

Conclusion
We established a methodology to increase the resolution of spectroscopic data by applying Bayesian super-resolution to their analysis. In the present method, the hyperparameters are determined by a virtual experiment imitating the experimental data, and the precision of the super-resolution reconstruction is confirmed by calculating the errors from the ideal values. To validate the performance of the super-resolution, we acquired 200 Raman spectra with a resolution of about 0.8 cm⁻¹ from a reference Si substrate (NMIJ CRM 5606-a). Using the determined hyperparameters, super-resolution reconstruction with a resolution of 0.01 cm⁻¹ was successfully achieved with the promised precision. From the super-resolution spectrum, the Raman scattering peak of the reference Si substrate was estimated as 520.55 (+0.12, -0.09) cm⁻¹, which is comparable to the precisely determined value. The present methodology can be applied to various kinds of spectroscopic analysis, leading to increased precision in the analysis of spectroscopic data and the ability to detect slight differences in spectral peak positions and shapes.