1 Introduction

Unlike other branches of physics, astrophysics cannot apply the third pillar of the scientific method, experimentation. After observing nature and conjecturing laws that govern its behavior, astronomers cannot carry out experiments that confirm or falsify the theory. Experimentation is then substituted by new observations conducted to check the theoretical predictions. The intrinsic inability for directly measuring the celestial objects adds a special difficulty to the astrophysical tasks. We do not have thermometers, weighing scales, tachometers, magnetometers that can directly gauge the physical conditions in the object. Rather we have to be content with indirect evidence or inferences obtained from the only real astrophysical measurements, namely those related to light. The intensity and polarization properties for visible light, the associated electric field for radio frequencies, or the energy or momentum of high energy photons, as functions of space, wavelength, and time, can be fully quantified with errors that are directly related to the accuracy of the instruments.Footnote 1 From these real measurements, the observational astronomer must deduce or infer the physical quantities that characterize the object with uncertainties that depend on both the experimental errors and the assumptions that allow him/her to translate light-derived quantities into the object quantities. Observational astrophysics could hence probably be defined as the art of inferring the physical quantities of heavenly bodies from real measurements of the light received from them.

Somehow, these astrophysical tasks can be mathematically seen as a mapping between two spaces, namely the space of observables and that of the object’s physical quantities. The success of the astronomer then depends on his/her ability (the art) to characterize not only the mapping but the two spaces. On the observable side, what really matters is the specific choice of measurable parameters and how well they are measured; that is, how many light parameters are obtained (the signal) and which are the measurement errors (the noise). On the object’s physical condition side, what is substantive is the selection of quantities to be inferred. Of course, the finer the—affordable—detail in describing any of the two spaces, the better. The keyword is affordable because infinite resolution does not exist in the real world: a compromise is always in order between the number of available observables and the number of inferred physical quantities. The representation of both spaces therefore needs approximations that constrain the sub-spaces to be explored and how they are described: which Stokes parameters, with which wavelength and time sampling, and with which instrument profile and resolution on the one hand, and, on the other, which quantities and how they are assumed to vary on the object with time and space. Concerning the mapping, this should represent the physics that generates the observables from the given physical conditions in the object and thus illustrates the dependence of the observables on given physical quantities. Understanding this physics is crucial if the researcher is to select observables that are as “orthogonal” as possible; that is, that depend mostly on one physical quantity and not on the others. Certainly, the physics mapping needs approximations as well. These approximations depend a great deal on the observables and on the object’s physical quantities; for example, the assumptions cannot be the same if you have fully sampled Stokes profiles or just a few wavelength samples; different hypotheses apply for physical quantities that do or do not vary with depth in the atmosphere, or that are expected to present a given range of magnitudes. Therefore, mappings may include (often over-simplistic) one-dimensional calibration curves between a given observable parameter and a given physical quantity, or complicated multidimensional relationships between observables and quantities that require the definition of a metric or distance in at least one of the two spaces.

Even in the simplest situations, the relationship between observables and quantities does not have to be linear and may depend on the specific sub-space of the physical parameters. For example, a calibration curve based on the weak-field approximation may apply for a given range of magnetic fields but saturate for stronger ones (see Sects. 2.4, 3.1.2). But, when the problem can be assumed to be multidimensional, covariances appear because single observables rarely depend on just a single quantity (see Sect. 7). For example, a given spectral line Stokes V profile can seemingly grow or weaken by the same amount owing to changes in temperature or magnetic field strength (e.g., del Toro Iniesta and Ruiz Cobo 1996). An example can be seen in Fig. 1, where two apparently equal V profiles come from two different atmospheres. With all these ingredients at hand, the astrophysical analysis of observations is a non-linear, fully involved, topological task where many decisions have to be made (the art) and, hence, cannot be taken for granted.

Fig. 1
figure 1

Left panel: Open circles Stokes V profile in units of the continuum intensity of the Fe i line at 630.25 nm synthesized in a model atmosphere in hydrostatic equilibrium, 2000 K cooler than the del Toro Iniesta et al. (1994) model, with a constant longitudinal magnetic field of 800 G, a gradient in velocity from \(2\,\) \(\hbox {km}\,\hbox {s}^{-1}\)at the bottom to \(0\,\) \(\hbox {km}\,\hbox {s}^{-1}\)at the top of the photosphere, and a macroturbulence velocity of 1 \(\hbox {km}\,\hbox {s}^{-1}\). Solid line Stokes V profile of the same line (normalized the same way), synthesized in a model atmosphere 305 K hotter that the former, 270 G weaker, and with a higher macroturbulence velocity of 2.06 \(\hbox {km}\,\hbox {s}^{-1}\). Right panel T and B stratifications for the two models

The techniques by which astronomers have obtained information about the physical conditions in the object have evolved in parallel to technological advancements; that is, to the available means we have of gathering such information. The community has gradually enhanced its knowledge from medium-band measurements including one or several spectral lines to very fine wavelength sampling of the four Stokes profiles of single or multiple spectral lines; from old curves of growth for equivalent widths to highly sophisticated techniques that include the solution of the radiative transfer equation (RTE). The finer the information, the more complete the physical description.

Following Socas-Navarro (2001), let us consider the simplest case of having a single observable parameter, the Doppler displacement with respect to the rest position of the spectral line, \(\varDelta \lambda \), and a single physical quantity to derive, the line-of-sight (LOS) velocity, \(v_{\mathrm{LOS}}\). Imagine that we measure \(\varDelta \lambda \) by finding the minimum (or the maximum in the case of an emission line) of the intensity profile. The biunivocal mapping between the one-dimensional space of observables—that containing all possible Doppler displacements—and the one-dimensional space of physical quantities—that of LOS velocities—is given by the Doppler formula

$$\begin{aligned} v_{\mathrm{LOS}} = \frac{\varDelta \lambda }{\lambda _0} c, \end{aligned}$$

where \(\lambda _0\) stands for the vacuum rest wavelength position of the line and c for the speed of light. This simple inference relationship requires at least three implicit physical assumptions for the Doppler displacement to be properly defined and measured; namely that (a) the solar feature is spatially resolved, (b) the line is in pure absorption (or pure emission), and (c) \(v_{\mathrm{LOS}}\) is constant along the LOS. First, if we have unresolved structures we cannot ascribe the inferred velocity to any of them. Second, lines with core reversals, either in absorption or in emission, do not qualify for the extremum-finding method. And third, as soon as we have an asymmetric profile, \(\varDelta \lambda \) can no longer be properly defined for the line but for a given height through the profile, and then the mapping in Eq. (1) immediately loses its meaning. While in the case of a constant velocity, we properly infer that velocity, in the presence of gradients we infer a value corresponding only to the—in principle unknown—layers where the core of our line has been formed (typically the highest layers of the atmosphere). We measure a velocity but we do not know which one. Strictly speaking, the same measurement corresponds to different physical quantities depending on the assumptions. Of course we could complicate our problem a little and try to determine the stratification of LOS velocities with height, or simply estimate a gradient, by measuring the so-called bisector, the geometric position of those points equidistant from both wings of the profile at a given depth. At that point, our spaces have increased their dimensions and Eq. (1) is no longer the sole ingredient of our mapping because we must add some more physical assumptions to interpret the different displacements of the bisector in terms of velocities at different heights in the atmosphere. Hence, depending on the assumed physics, the quantitative results may change. This easy example has been used to illustrate that even the simplest inference is dependent on physical assumptions. This is an inherent property of astrophysical measurements and no one can escape from it: the same observable can mean different things depending on the assumed underlying physics. Most of the criticisms of the inversion techniques that are reviewed in this paper often come from this lack of uniqueness of the results. Many authors claim that the inversion of the RTE is an ill-posed problem. This being true, one should realize that astrophysics itself is indeed ill-conditioned, and this is a fact we have to deal with, either willingly or not.

The physics connecting the object quantities with the observable parameters is of paramount significance and deserves a little consideration at this point. Radiative transfer is the discipline encompassing the generation and transport of electromagnetic radiation through the solar (stellar) atmosphere. Hence, the mapping between the two spaces will be based upon it and depend on its degrees of approximation. The specification of the radiation field through a scattering atmosphere was first formulated as a physical problem by Strutt (1871a, b, 1881, 1899). In the astrophysical realm, the problem was posed in the works by Schuster (1905) and Schwarzschild (1906) without taking polarization into account. After that, although not known to the astrophysical community, Soleillet (1929) presented a theory of anisotropic absorption that is nothing but a rigorous formulation of the radiative transfer equation. Very importantly, he used the formalism proposed by Stokes (1852) to deal with partially polarized light. It was not, however, until the works by Chandrasekhar (1946a, b, 1947) that the transfer problem of polarized light was settled as an astrophysical problem on its own. The Stokes formalism has regularly been used since then in the astronomical literature. After Hale’s (1908) discovery of sunspot magnetic fields, the interpretation of the solar (stellar) spectrum of polarized light became necessary and a full theory has been developed since the mid 1950s. The first modern formulation of an equation of radiative transfer for polarized light was presented by Unno (1956), who also provided a solution in the simplified case of a Milne–Eddington (ME) atmosphere. Only absorption processes were taken into account and a complete description had to wait until the works by Rachkovsky (1962a, b, 1967), who also included dispersion effects (the so-called magneto-optical effects). These two derivations were phenomenological and somewhat heuristic. A rigorous derivation of the radiative transfer equation (RTE) based on quantum electrodynamics was obtained by Landi Degl’Innocenti and Landi Degl’Innocenti (1972). Later, four derivations of the RTE from basic principles of classical physics were published by Jefferies et al. (1989), Stenflo (1991; see also Stenflo 1994), Landi Degl’Innocenti (1992; see also Landi Degl’Innocenti and Landolfi 2004), and del Toro Iniesta (2003b). A discussion of the RTE and the several assumptions used in various available inference techniques is deferred to Sect. 2.

Certainly, any inference has to be based on solutions of the RTE because it relates the observable Stokes spectrum with the unknowns of the problem; namely, the physical quantities characterizing the state of the atmosphere they come from. No matter how simplified such solutions can be, it is natural to compare the observations with theoretical calculations in prescribed sets of physical quantities. The comparison of observational and synthetic parameters results in values for the sought-for quantities that may be refined in further iterations by changing the theoretical prescriptions. This trial-and-error method can be practical when the problem is very simple (involving a few free parameters) but can become unsuitable for practical use if the number of free parameters is large. Even automated trial-and-error—i.e., Monte Carlo—methods may fail to converge to a reliable set of physical conditions in the medium. Some more educated techniques are needed to finally work out that convergence between observed and synthetic parameters.

Generally speaking, any method in which information about the integrand of an integral equation is obtained from the resulting value of the integral is called an inversion method. In our particular case, it is straightforward to write the synthetic Stokes spectra as an integral involving a kernel that depends on the physical conditions of the atmosphere [see Eq. (8)]. In fact, the emergent formal solution of the RTE is the most basic type of integral equation, namely a Fredholm equation of the first type, because both integration limits are fixed. Consequently, we will call inversion codes or inversion techniques those methods that (almost) automatically succeed in finding reliable physical quantities from a set of observed Stokes spectra because we shall understand that they indeed automatically solve that integral equation. There is a whole variety of flavors depending on the several hypotheses that can be assumed, but all of them share the characteristic feature of automatically minimizing a distance in the topological space of observables. The idea had already been clearly explained in the seminal work by Harvey et al. (1972): “Solve for \(\varvec{B}\) on the bases of best fit of the observed profiles to the theoretical profiles”. And the free parameters for such a best fit were found through least squares minimization of the profile differences. They obtained only an average longitudinal field component because their Stokes Q and U observations were not fully reliable and magneto-optical effects were not taken into account, but the fundamental idea underlying many of the current techniques can already be found in that very paper, including a simple two-component model to describe the possible existence of spatially unresolved magnetic fields.

In a thorough study using synthetic Stokes profiles, Auer et al. (1977) proposed a new inversion method based on Unno’s theory and tested its behavior in the presence of several realistic circumstances, such as asymmetric profiles, magnetic field gradients, magneto-optical effects, and unresolved magnetic features. This technique was later generalized by Landolfi et al. (1984) to include magneto-optical and damping effects. The numerical check of the code was fairly successful but neither the original code by Auer et al. (1977) nor the new one by Landolfi et al. (1984) were applied to observations. Independently of the latter authors, the preliminary studies by Skumanich and Lites (1985), Lites and Skumanich (1985) and Skumanich et al. (1985) jelled in what has been one of the most successful ME inversion codes so far by Skumanich and Lites (1987), later extended by Lites et al. (1988) to mimic a chromospheric rise in the source function (see Sect. 2.3). This code has been extensively used with observational data, most notably those obtained with the Advanced Stokes Polarimeter (Elmore et al. 1992).

Based on the thin flux tube approximation, Keller et al. (1990) proposed an inversion code for extracting physical information not from the Stokes profiles themselves but from several parameters calculated from I and V observations of a plage and a network. Two years later, Solanki et al. (1992a) presented a new inversion code whereby from the whole Stokes I and V profiles they selected among a handful of prescribed temperature stratifications and inferred height-independent magnetic field strength and inclination, Doppler shift, filling factor (surface fraction in the resolution element covered by magnetic fields), macro- and micro-turbulent velocities, and some atomic parameters of the spectral line. The very same year, Ruiz Cobo and del Toro Iniesta (1992) introduced SIR, an acronym for Stokes Inversion based on Response functions. Like the former codes, SIR ran a non-linear, least-squares, iterative Levenberg–Marquardt algorithm but with a remarkable step-forward feature: physical quantities characterizing the atmosphere were allowed to vary with optical depth. The increase of free parameters can generate a singularity problem: the variation of some atmospheric parameters may not produce any change on the synthetic spectra or, in other cases, different combinations of the perturbation of several parameters may produce the same change in the spectra. The success of SIR lies in regularizing the problem through a tailored Singular Value Decomposition method (SVD). This allows, in principle, to look for any arbitrarily complex atmospheric stratification. The three components of the magnetic field, the LOS velocity, the temperature stratification, and the microturbulence may have any height profile. The code also infers height-independent microturbulent velocity and filling factor. The possibility exists for also fitting some atomic parameters (e.g., Allende Prieto et al. 2001), but they are typically fixed in practice. The code can be applied to any number of spectral lines that are observed simultaneously. SIR has been successful in a large number of observing cases and its use is still spreading among the community.

Following SIR’s strategy (that is, using response functions, nodes, Levenberg–Marquardt, and SVD), an evolution of the Solanki et al. (1992a) code called SPINOR was presented by Frutiger and Solanki (1998) that also allowed for height variations of the physical quantities and included the possibility of multi-ray calculations assuming the thin flux tube approximation. Sánchez Almeida (1997) proposed an original inversion code under the MIcro-Structured Magnetic Atmosphere (MISMA) hypothesis (see Sects. 2.5, 3.2.2). In 2000, the codes by Socas-Navarro et al. (2000, NICOLE—NLTE Inversion Code based on the Lorien Engine—) and by Bellot Rubio et al (2000; see also 1997) were presented. The first (based on an earlier code by Socas-Navarro et al. (1998) without taking either polarization or magnetic fields into account) included non-LTE radiative transfer (see Sect. 2.1), and the second was specifically designed for analyzing Stokes I and V profiles in terms of the thin flux tube approximation by using an analytic shortcut for radiative transfer proposed by del Toro Iniesta et al (1995, see Sect. 3.2.3). On their hand, Rees et al. (2000) proposed a Principal Component Analysis (PCA), which worked by creating a database of synthetic Stokes profiles by means of an SVD technique. In such a database, given eigenprofiles are obtained that are later used as a basis for expanding the observed Stokes profiles. Hence, the description of observations can be made with the help of a few coefficients, thus speeding up the inversion process. One year later, LTE Inversion based on Lorien Iterative Algorithm (LILIA), a code with similar properties as SIR, was presented by Socas-Navarro (2001) and Fast Analysis Technique for the Inversion of Magnetic Atmospheres (FATIMA), a PCA code, was introduced by Socas-Navarro et al. (2001). A different technique was proposed by Carroll et al. (2001, see also Socas-Navarro 2003) that used artificial neural networks (ANNs) whereby the system was trained with a set of synthetic Stokes profiles. The structure obtained therefrom finds the solution for the free parameters by interpolating among the known ones. Although the training can be slow, the inversion of observational data is very fast. In practice, both the synthetic training set of ANNs and the synthetic database of PCA have employed ME profiles to keep the implementation feasible. Otherwise, the number of free parameters would render the two techniques impracticable. A PCA code to analyze the Hanle effect in the He i D\(_{3}\) line was developed by López Ariste and Casini (2003, see also, Casini et al. 2005).

A substantial modification of the original SIR code, called SIRGAUSS, was presented by Bellot Rubio (2003) in which the physical scenario included the coexistence of an inclined flux tube—that is pierced twice by the LOS—within a background. Such a scenario is used to describe an uncombed field model of sunspot penumbrae (Solanki and Montavon 1993). An evolution of this inversion code, called SIRJUMP, was later used by Louis et al. (2009) that was able to infer possible discontinuities in the physical quantities along the LOS. A further code presented by Asensio Ramos (2004) was able to deal with the Zeeman effect in molecular lines. The very same year, Lagg et al. (2004) published HeLIx, an ME inversion code that dealt with the Hanle and the Zeeman effect in the He i line at 1083 nm. Another ME inversion code was presented by Orozco Suárez and del Toro Iniesta (2007) with the helpful feature that was written in IDL, so that it is easily manipulated by relatively inexperienced users and employed as a routine in high-level programming pipelines. Also in 2007, Bommier et al. took over the Landolfi et al. (1984) method and extended it to include unresolved magnetic structures. Unfortunately, they fail to obtain the magnetic field strength and the filling factor separately; only their product is reliable. Self-consistent levels of confidence in the ME inversion results were estimated through the code proposed by Asensio Ramos et al. (2007a) using Bayesian techniques. A rigorous treatment of optical pumping, atomic level polarization, level crossings and repulsions, Zeeman, Paschen–Back, and Hanle effects on a magnetized slab was included in HAZEL (Asensio Ramos et al. 2008), with which analysis of the He i D\(_3\) and the multiplet at 1083 nm can be carried out.

Oriented to its extensive use with the data coming from the Helioseismic and Magnetic Imager (Graham et al. 2003) aboard the Solar Dynamics Observatory, Borrero et al. (2011) presented Very Fast Inversion of the Stokes Vector (VFISV), a new ME code but with several further approximations and simplifying assumptions to make it significantly faster than other available codes. Mein et al. (2011) presented an alternative inversion code in which, with a significant number of simplifying assumptions on top of the ME approximation (such as Stokes I profiles being Gaussians and magneto-optical effects being almost negligible), some moments of the Stokes profiles are used to retrieve the vector magnetic field and the LOS velocity. In 2012, a significant step forward was provided by van Noort, who combined spectral information with the known spatial degradation effects on two-dimensional maps to obtain a consistent restoration of the atmosphere across the whole field of view. An aim similar to van Noort’s is followed by Ruiz Cobo and Asensio Ramos (2013), who, by means of a regularized method (indeed based on PCA), deconvolve the spectropolarimetric data that are later inverted with SIR. Based on the concept of sparsity, Asensio Ramos and de la Cruz Rodríguez (2015) have proposed a novel technique that allows the inversion of two-dimensional (potentially three-dimensional) maps at once.

The interested reader can complement this chronological overview with the reviews by del Toro Iniesta and Ruiz Cobo (1995, 1996, 1997), Socas-Navarro (2001), del Toro Iniesta (2003a), Bellot Rubio (2006) and Asensio Ramos et al. (2012) and the didactical introductions and discussions by Stenflo (1994), del Toro Iniesta (2003b) and Landi Degl’Innocenti and Landolfi (2004). A critical discussion on the different techniques and the specific implementations will be developed through the paper, which is structured as follows: the basic assumptions of radiative transfer are discussed in Sect. 2; the following two sections discuss the approximations used for the model atmospheres and the Stokes profiles; an analysis of the forward problem, namely the synthesis of the Stokes spectrum, is presented in Sect. 5, which is followed by an analysis of the sensitivities of spectral lines to physical quantities (Sect. 6); the basics of inversion techniques are analyzed in Sect. 7 and a discussion on inversion results presented in Sect. 8; finally, Sect. 9 summarizes the conclusions. An appendix proposes an optimum way of initializing the inversion codes through the use of classical estimates.

2 Radiative transfer assumptions

The propagation of electromagnetic energy through a stellar atmosphere—and its eventual release from it—is a significantly complex, non-linear, three-dimensional, and time-dependent problem where the properties of the whole atmosphere are involved. From deep layers up to the stellar surface, the coupling between the radiation field and the atmospheric matter implies non-local effects that can connect different parts of the atmosphere. In other words, the state of matter and radiation at a given depth may depend on that at the other layers: light emitted at one point can be absorbed or scattered at another to release part or all of its energy.

The description of the whole system, matter plus radiation field, needs to resort to the solution of the coupled equations that describe the physical state of the atomic system and that of the radiation traveling through it. Therefore, we have to simultaneously solve the so-called statistical equilibrium equations and the radiative transfer equation. The first assumption we shall make is that radiative transfer is one dimensional; that is, that the transfer of radiative energy perpendicular to the line of sight can be neglected in the matter–radiation coupling. For most solar applications so far, this assumption has been seen to be valid. Since the purpose of this paper is not directly related to either of the two systems of equations, let us simply point out what their main characteristics and ingredients are, and how the whole problem can be simplified in different situations. We refer the interested reader to the book by Landi Degl’Innocenti and Landolfi (2004) for a full and rigorous account of all the details.

Most classical radiative transfer descriptions in the literature do not deal with polarization. They are typically qualified as radiative transfer studies for unpolarized light but the name is ill-chosen. Formally speaking, those analyses are for light traveling through homogeneous and isotropic media (del Toro Iniesta 2003b). As a consequence of that heritage, the community is used to speak about atomic level populations either calculated through the Boltzmann and Saha equations (the LTE approximation; see Sect. 2.2) or not (the non-LTE case; see Sect. 2.1). These isotropic descriptions of the transfer problem, however, are not valid when a physical agent such as a vector magnetic field establishes a preferential direction in the medium, hence breaking the isotropy. Moreover, the outer layers of a star are a clear source of symmetry breaking. The exponential density decrease with height makes the radiation field anisotropic: outward opacity is much smaller than inward opacity. This should also be the case with collisions between particles: they are more probable at the bottom than at the top of the atmosphere. In such a situation, the probability is not zero for the various degenerate levels of the atom (with respect to energy) to be not evenly populated and for non-zero coherences or phase relations between them to exist. The atomic system is then said to be polarized and its state is best described with the so-called density operator, \(\varvec{\rho }\), that provides the probabilities of the sublevels being populated (hence the populations) along with the possible correlations or interferences between every pair. In the standard representation that uses the eigenvectors of the total angular momentum, \(\varvec{J}^2\), and of its third component, \(\varvec{J}_z\), as a basis, the density matrix element

$$\begin{aligned} \rho \left( \alpha j m, \alpha ' j' m'\right) = \left\langle \alpha j m | \rho | \alpha ' j' m'\right\rangle \end{aligned}$$

represents the coherence or phase interference between the different magnetic sublevels characterized by their angular momentum quantum numbers. In Eq. (2), \(\alpha \) and \(\alpha '\) stand for supplementary quantum numbers relative to those operators that commute with \(\varvec{J}^2\) and \(\varvec{J}_z\). Certainly, the diagonal matrix elements \(\rho _{\alpha } (j m, j m) \equiv \rho (\alpha j m, \alpha j m)\) represent the populations of the magnetic sublevels and the sum

$$\begin{aligned} n_j = \sum _m \rho _{\alpha } (j m, j m) = \sum _{m=-j}^{j} \langle \alpha j m | \rho | \alpha j m \rangle \end{aligned}$$

accounts for the total population of the level characterized by the j quantum number.

At all depths in the atmosphere, evolution equations for these density matrix elements have to be formulated that describe their time (t) variations due to the transport of radiation, on the one hand, and to collisions among particles on the other. All interactions with light—namely, pure absorption (A), spontaneous emission (E), and stimulated emission (S)—have to be considered. All kinds of collisions—namely, inelastic (I), superelastic (S), and elastic (E) collisions—have to be taken into account. Inelastic collisions induce transitions between any level \(|\alpha jm\rangle \) and an upper level \(|\alpha _u j_u m_u\rangle \) with a consequent loss in kinetic energy. Superelastic collisions induce transitions to a lower energy level \(|\alpha _l j_lm_l\rangle \) with an increase in the kinetic energy of collision. Finally, elastic collisions induce transitions between degenerate levels \(|\alpha jm\rangle \) and \(|\alpha jm'\rangle \); in these, the colliding particle keeps its energy during the interaction. The statistical equilibrium equations (4) and (5) that follow for radiative and collisional interactions, respectively, have slightly different application ranges. The former are valid for the multi-term atom representation and can even be used in the Paschen–Back regime, while the latter are only valid for the special case of the multi-level atom representation (although they can be generalized to the multi-term representation).Footnote 2 We make them explicit here for illustrative purposes only and refer the interested reader to the Landi Degl’Innocenti and Landolfi’s (2004) monograph for details. According to that work, the radiative interaction equations in the magnetic field reference frameFootnote 3 can be written as

$$\begin{aligned}&\frac{\mathrm{d}}{{\mathrm{d}} t} \rho _{\alpha } (jm,j'm') = -2\pi {\mathrm{i}} \, \nu _{\alpha } (jm,j'm') \rho _{\alpha } (jm,j'm') \nonumber \\&\quad +\sum _{\alpha _l j_l m_l j'_l m'_l}^{\,} \rho _{\alpha _l} (j_l m_l,j'_lm'_l) \, T_A \left( \alpha jmj'm',\alpha _l j_l m_lj'_lm'_l\right) \nonumber \\&\quad +\,\sum _{\alpha _u j_u m_u j'_u m'_u}^{\,} \rho _{\alpha _u} (j_um_u,j'_um'_u) \left[ T_E \left( \alpha jmj'm', \alpha _u j_u m_u j'_u m'_u\right) \right. \nonumber \\&\quad \left. +\,T_S \left( \alpha jmj'm', \alpha _u j_u m_u j'_u m'_u\right) \right] \nonumber \\&\quad -\,\sum _{j''m''}^{\,} \Bigl \{ \rho _{\alpha } (jm,j''m'') \left[ R_A (\alpha j'm'j''m'') + R_E (\alpha j''m''j'm') + R_S (\alpha j''m''j'm') \right] \Bigr . \nonumber \\&\quad +\,\rho _{\alpha } (j''m'',j'm') \left[ R_A (\alpha j''m''jm) + R_E (\alpha jmj''m'') + R_S (\alpha jmj''m'')\right] \Bigr \}, \end{aligned}$$

where \(\nu _{\alpha } (jm,j'm')\) is the frequency difference between the two sublevels and the T’s and R’s are radiative rates of coherence transfer and relaxation among the sublevels, respectively. Now, the collisional interactions give

$$\begin{aligned}&\frac{\mathrm{d}}{{\mathrm{d}} t} \rho _{\alpha } (jm,jm') = \sum _{\alpha _l j_l m_l m'_l}^{\,} C_I (\alpha j mm',\alpha _l j_l m_l m'_l) \rho _{\alpha _l} (j_lm_l,j_lm'_l) \nonumber \\&\quad +\,\sum _{\alpha _u j_u m_u m'_u}^{\,} C_S (\alpha j mm',\alpha _u j_u m_u m'_u) \rho _{\alpha _u} (j_um_u,j_um'_u) \nonumber \\&\quad +\,\sum _{m'' m'''}^{\,} C_E (\alpha j mm',\alpha j m'' m''') \rho _{\alpha } (jm'',jm''') \nonumber \\&\quad -\sum _{m''}^{\,} \left[ \frac{1}{2} X(\alpha jmm'm'') \rho _{\alpha }(jm,jm'') + \frac{1}{2} X(\alpha jm'mm'')^* \rho _{\alpha } (jm'',jm') \right. \nonumber \\&\quad -\,\left. \frac{1}{2} X_E(\alpha jmm'm'') \rho _{\alpha }(jm,jm'') +\frac{1}{2} X_E(\alpha jm'mm'')^* \rho _{\alpha } (jm'',jm') \right] , \end{aligned}$$

where the C’s are collisional transfer rates between levels and the X’s are relaxation rates. The indices refer to the corresponding type of collisions and the asterisk denotes the complex conjugate.

With the standard notation for the Stokes pseudo-vector \(\varvec{I} \equiv (I,Q,U,V)^{\scriptscriptstyle \mathrm{T}}\), where index T stands for the transpose, the radiative transfer equation can be written as (e.g., del Toro Iniesta 2003b)

$$\begin{aligned} \frac{\mathrm{d} \varvec{I}}{\mathrm{d} \tau _{\mathrm{c}}} = \mathbf{K} (\varvec{I} - \varvec{S}), \end{aligned}$$

where \(\tau _{\mathrm{c}}\) is the optical depth at the continuum wavelength, \(\mathbf{K}\) stands for the propagation matrix, and \(\varvec{S}\) is the so-called source function vector. Since the continuum spectrum of radiation can safely be assumed flat within the wavelength span of a spectral line and non-polarized as far as currently reachable polarimetric accuracies are concerned, the optical depth, defined as

$$\begin{aligned} \tau _{\mathrm{c}} \equiv \int _s^{s_{\mathrm{lim}}} \chi _{\mathrm{cont}} \, \mathrm{d}s, \end{aligned}$$

is the natural length scale for radiative transfer. Note that the origin of optical depth (\(\tau _{\mathrm{c}} = 0\)) coincides with the outermost boundary of geometrical distances (\(s_{\mathrm{lim}}\)) and is taken where the observer is located so that \(\tau _{\mathrm{c}}\)’s are actual depths in the atmosphere. In Eq. (7), \(\chi _{\mathrm{cont}}\) is the continuum absorption coefficient (the fraction of incoming electromagnetic energy withdrawn from the radiation field per unit of length through continuum formation processes). The propagation matrix deals with absorption (withdrawal of the same amount of energy from all polarization states), pleochroism (differential absorption for the various polarization states), and dispersion (transfer among the various polarization states). The product of K and \(\varvec{S}\) accounts for emission. The RTE can then be considered as a conservation equation: the energy and polarization state of light at a given point in the atmosphere can only vary because of emission, absorption, pleochroism, and dispersion. Equation (6) is strictly valid only under the assumption that the energy and polarization state of light are independent of time. To be more specific, we have assumed that the rate of change of the Stokes parameter profiles is much slower than the radiative and collisional relaxation time scales involved in the problem.

A formal solution to the general RTE was proposed for the first time by Landi Degl’Innocenti and Landi Degl’Innocenti (1985), according to whom, the observed Stokes profiles at the observer’s optical depth (\(\tau _{\mathrm{c}}=0\)) read

$$\begin{aligned} \varvec{I}(0) = \int _{0}^{\infty } \mathbf{O} (0,\tau _{\mathrm{c}}) \mathbf{K} (\tau _{\mathrm{c}}) \varvec{S} (\tau _{\mathrm{c}}) \mathrm d \tau _{\mathrm{c}}, \end{aligned}$$

where \(\mathbf{O}\) is the so-called evolution operator, and a semi-infinite atmosphere has been assumed as usual. The solution is called formal because it is not a real solution as long as the evolution operator (and the propagation matrix and the source function vector) are not known. Unfortunately, no easy analytical expression can in general be found for \(\mathbf{O}\). Only in some particular cases, such as that in Sect. 2.3, can a compact form for the evolution operator and an analytic solution of the RTE be obtained. In all other cases, numerical evaluations of \(\mathbf{O}\) and solutions of the transfer equation are necessary. The emergent Stokes spectrum is obtained through an integral of a product of three terms all over the whole atmosphere. Claiming that some of the Stokes parameters are proportional to one of the matrix elements of \(\mathbf{K}\) is, at the very least, adventurous. This proportionality can only take place in very special circumstances (e.g., Sects. 2.4, 3.1.2).

2.1 The non-local thermodynamic equilibrium problem

Being a vector differential equation, the RTE should indeed be considered as a set of four coupled differential equations. These can only be solved independently in specific media, either isotropic or very simplified ones. But the situation is far more complicated since both K and \(\varvec{S}\) depend on the material properties described by \(\rho _{\alpha } (jm,j'm')\), as well as on external fields such as a macroscopic velocity or a magnetic field. For their part, the radiative and collisional transfer and relaxation rates do depend on the radiation field. Therefore, Eqs. (4), (5) and (6) describe a very involved, non-local, non-linear problem, known as the non-local thermodynamic equilibrium (NLTE) problem and must be consistently solved altogether. The numerical solution of all those coupled equations requires iterative procedures that are summarized in Fig. 2.

Fig. 2
figure 2

Block diagram of the Stokes profile synthesis under NLTE conditions

By a model atmosphere we understand the set of thermodynamic variables (usually two, e.g., temperature and pressure, T and p), dynamic (the macroscopic, bulk line-of-sight velocity field, \(v_{\mathrm{LOS}}\)), magnetic (the vector field \(\varvec{B}\), represented by B, the strength, \(\gamma \), the inclination with respect to the LOS, and \(\varphi \), the azimuth), and possibly some other, ad hoc variables (such as the micro- and macro-turbulence velocities, \(\xi _{\mathrm{mic}}\) and \(\xi _{mac}\), the filling factor, f—the area fraction of the resolution pixel that is filled with the unknown atmosphere—and so forth). All these variables have to be specified as functions of the optical depth. Numerically, that model can be represented by a vector \(\varvec{x}\) of \(np+r\) components, n being the number of depth grid points throughout the atmosphere, p the number of physical quantities varying with depth, and r the number of quantities that are assumed constant throughout the LOS. For example, one such model atmosphere would look like

$$\begin{aligned} \varvec{x}\equiv & {} [T(\tau _1), T(\tau _2), \ldots , T(\tau _n), p(\tau _1), p(\tau _2), \ldots , p(\tau _n), B(\tau _1), B(\tau _2), \ldots , B(\tau _n),\nonumber \\&\gamma (\tau _1), \gamma (\tau _2), \ldots , \gamma (\tau _n), \varphi (\tau _1),\varphi (\tau _2), \ldots , \varphi (\tau _n), \nonumber \\&v_{\mathrm{LOS}}(\tau _1), v_{\mathrm{LOS}}(\tau _2), \ldots , v_{\mathrm{LOS}}(\tau _n), \xi _{\mathrm{mic}}, \xi _{\mathrm{mac}}, f]^{\scriptscriptstyle \mathrm{T}}, \end{aligned}$$

where we have assumed specifically that both micro- and macro-turbulence (as well as the filling factor) are constant with depth. This assumption is based on the fact that experience teaches that the increase in spatial resolution reached with new instruments makes less and less necessary the use of such ad hoc parameters.

Once this model atmosphere is set, the necessary ingredients for the RTE and the statistical equilibrium equations can be calculated. The solution of the RTE has to be compared with that coming from it after modification driven by the new density matrix elements resulting from the solution of the statistical equations. If the differences are considered small compared with a given threshold, then a new synthetic set of Stokes parameters has been found. If not, the equilibrium equations have to be modified in order to iterate the procedure until convergence is reached. The direct problem of obtaining the Stokes spectrum of a given line coming out from a given model atmosphere then turns out to be very complex. It cannot always be computed with the necessary speed and accuracy. Approximations are, thus, in order.

2.2 The local thermodynamic equilibrium approximation

Imagine now that coherences among the Zeeman sublevels can be neglected, and that all of them are evenly populated. That is, assume that

$$\begin{aligned} \rho \left( \alpha jm,\alpha ' j'm'\right) = \delta _{\alpha \alpha '} \delta _{jj'} \delta _{mm'} \rho _{\alpha j}, \end{aligned}$$

where \(\delta \) is Kronecker’s delta. In such conditions, \(n_{j} = (2j+1) \rho _{\alpha j}\). Assume also that \(n_{j}\) and the population of other ionic species can be evaluated through the equations of thermodynamic equilibrium at the local temperature (the Boltzmann and Saha laws; e.g., Gray 2005). This assumption will be valid only in the case that the photon mean free path (\(\ell = 1/\chi _{\mathrm{cont}}\))Footnote 4 is small compared to the scale of variation of the physical quantities, i.e., when the atomic populations depend only upon the values of the local physical quantities. Besides, it can be shown that if Kirchoff’s law is further assumed, (e.g., Landi Degl’Innocenti and Landolfi 2004) the source function vector reduces to

$$\begin{aligned} \varvec{S} = (B_{\nu } (T), 0, 0, 0)^{\scriptscriptstyle \mathrm{T}}, \end{aligned}$$

where, \(B_{\nu } (T)\) is the Planck function at the local temperature. These are the conditions of the so-called local thermodynamic equilibrium approximation (LTE) and have automatically decoupled the RTE from the material equations. Then, if LTE can be supposed for a given spectral line, the synthesis of its Stokes profiles simplifies significantly because iterative procedures are no longer needed. This is graphically explained in Fig. 3.

Fig. 3
figure 3

Block diagram of the Stokes profile synthesis under LTE conditions

In some circumstances, it may be useful to relax the fulfillment of the Boltzman law and, instead, admit that \(\rho _{\alpha j}\) deviate from the LTE values, \(\hat{\rho }_{\alpha j}\), so that

$$\begin{aligned} \beta _j = \frac{\rho _{\alpha j}}{\hat{\rho }_{\alpha j}} \end{aligned}$$

are departure coefficients that measure how far the conditions are from LTE. Thus, although radiative transfer remains with the LTE scheme sketched in Fig. 3, the second block is affected by Eq. (12) and the \(\beta \)’s are needed to calculate the level populations. As we are going to see, this departure-coefficient approximation can be very useful for formulating NLTE inversion procedures (see Sect. 7.2.3).

Fig. 4
figure 4

Stokes I, LTE source function for various atmospheric models: the umbral model E by Maltby et al. (1986, black line), the penumbral model by del Toro Iniesta et al. (1994, red line), the plage model by Solanki (1986, blue line), and the quiet-Sun models by Gingerich et al. (1971, purple line) and Vernazza et al. (1981, green line)

2.3 The Milne–Eddington approximation

An even more simplified approximation is obtained by further assuming that thermodynamics is sufficiently described with a source function that depends linearly on the continuum optical depth,

$$\begin{aligned} \varvec{S} = (S_{0} + S_{1}\tau _{\mathrm{c}}) \, \varvec{e}_{0}, \end{aligned}$$

where \(\varvec{e}_{0} \equiv (1,0,0,0)^{\scriptscriptstyle \mathrm{T}}\), and that the other physical quantities (\(\varvec{B},\,v_{\mathrm{LOS}}\), etc.) in the model are constant throughout the atmosphere, hence defining a constant K. Figure 4 shows the LTE source function (the first component of the vector in Eq. 11) at 525 nm for several realistic model atmospheres, namely, the umbral model E by Maltby et al. (1986 black line), the penumbral model by del Toro Iniesta et al. (1994, red line), the plage model by Solanki (1986, blue line), and the quiet-Sun models by Gingerich et al. (1971, yellow line) and Vernazza et al. (1981, green line). The hypothesis of linearity does not seem very accurate for all the models. Nevertheless, in spite of its seemingly unrealistic nature, when we are dealing with a weak spectral line, the optical depth interval at which the line is sensitive to the atmospheric quantities is usually small enough to consider that a linear source function is not a bad approximation. There is wide experience in showing how useful the ME approximation is for inferring average values of the magnetic field vector and the LOS velocity, starting with the paper by Skumanich and Lites (1987, for a check with other approaches see Westendorp Plaza 1998). The key point is that the RTE has an analytic solution (Stokes \(\varvec{I}\) at \(\tau _{\mathrm{c}} = 0\)) under these assumptions (e.g., del Toro Iniesta 2003b):

$$\begin{aligned} \varvec{I} (0) = (S_0 + \mathbf{K}^{-1} S_1) \, \varvec{e}_0. \end{aligned}$$

The analytic character of the solution helps in grasping many of the relevant features in line formation; it cannot reproduce Stokes line asymmetries,Footnote 5 though (Auer and Heasley 1978). Using this useful feature, Landi Degl’Innocenti and Landi Degl’Innocenti (1985) had the clever idea of tailoring the functional shape of the source function so that it might be used to synthesize chromospheric line profiles while preserving an analytic solution because of the constancy with depth of the propagation matrix. Atomic polarization is neglected in this modeling. The so-called “field-free approximation” is assumed. The latter grants substitution of the scalar components of the source function for those corresponding to the same atom in the absence of a magnetic field (Rees 1969). Later on, Lites et al. (1988) elaborated Landi Degl’Innocenti and Landi Degl’Innocenti’s idea and proposed a new source function that was incorporated into their inversion code to interpret the observed profiles of the Mg i b lines at 517.27 and 518.36 nm. Specifically, they wrote the RTE in terms of the line center optical depth, \(\tau _{0}\), which remains the same as in Eq. (6) but substituting K by \(\mathbf{K}' \equiv r_0 \mathbf{K}\), where \(r_0\) is the continuum-to-line absorption coefficient ratio and with a new source function \(\varvec{S}'\) that follows from two distinct continuum and line source functions given by

$$\begin{aligned} \begin{aligned} \varvec{S}_{\mathrm{cont}}&= \varvec{S}, \\ \displaystyle \varvec{S}_{\mathrm{lin}}&= \varvec{S} - \sum _{i=1}^2 A_i \mathrm{e}^{-\varepsilon _i \tau _0}, \end{aligned} \end{aligned}$$

where \(\varvec{S}\) is defined in Eq. (13). The exponential shape of the last two terms in \(\varvec{S}_{\mathrm{lin}}\) tries to mimic the consequences in the source function of the actual chromospheric rise of temperature. The A’s and \(\varepsilon \)’s are free parameters that can be tuned to fit the observed profiles. With this formulation, the analytic solution of the transfer equation (at \(\tau _{0} = 0\)) turns out to be

$$\begin{aligned} \varvec{I}(0) = \left[ S_0 + \mathbf{K}'^{-1} S_1 - \sum _{i=1}^{2} A_i (\mathbf{K}' + \varepsilon _i \mathbbm {1}) (\mathbf{K}' - r_0 \mathbbm {1}) \right] \varvec{e}_0, \end{aligned}$$

where \(\mathbbm {1}\) stands for the identity \(4\times 4\) matrix.Footnote 6

Further exploiting the analytic character of the Milne–Eddington solution, slight modifications in the assumptions were also suggested by Landolfi and Landi Degl’Innocenti (1996) to deal with small velocity gradients and even with discontinuities along the LOS. In summary, we can say that approximations to the RTE predicated on keeping the K matrix constant or almost constant are useful and still a field for exploitation in observational work.

2.4 The weak-field approximation

A further simplification of radiative transfer is sometimes used. When the magnetic field can be assumed constant with depth and weak enough, the resulting Stokes V profile of many lines turns out to be proportional to the longitudinal component of the field, regardless of the remaining physical quantities (see Sect. 3.1.2). Under this assumption (and for not extremely weak fields since linear polarization is zero to first order approximation), the ratio between Stokes U and Q is proportional to the tangent of twice the field azimuth. The weakness of the field is guaranteed provided that (e.g., Landi Degl’Innocenti and Landolfi 2004)

$$\begin{aligned} g_{\mathrm{eff}} \frac{\varDelta \lambda _{\mathrm{B}}}{\varDelta \lambda _{\mathrm{D}}} \ll 1, \end{aligned}$$

where \(g_{\mathrm{eff}}\) is the effective Landé factor of the line, \(\varDelta \lambda _{\mathrm{B}}\) is the Zeeman splitting, and \(\varDelta \lambda _{\mathrm{D}}\) is the Doppler width of the line. The effective Landé factor is given by

$$\begin{aligned} g_{\mathrm{eff}} = \frac{1}{2} (g_u + g_l) + \frac{1}{4} (g_u-g_l) \left[ j_u(j_u + 1) - j_l(j_l+1)\right] , \end{aligned}$$

where \(g_u\) and \(g_l\) are the Landé factors of the upper and lower level of the transition, respectively. In LS coupling, those factors are functions of the quantum numbers:

$$\begin{aligned} g = \frac{3}{2} + \frac{s(s+1)-l(l+1)}{2j(j+1)}. \end{aligned}$$

The Zeeman splitting is given by

$$\begin{aligned} \varDelta \lambda _{\mathrm{B}} = \frac{\lambda _0^2 e_0 B}{4\pi mc^{2}}, \end{aligned}$$

where \(\lambda _0\) is the central, rest wavelength of the line, \(e_0\) and m are the charge and mass of the electron, B is the magnetic field strength, and c stands for the speed of light. For its part, the Doppler width is given by

$$\begin{aligned} \varDelta \lambda _{\mathrm{D}} = \frac{\lambda _0}{c} \sqrt{\frac{2kT}{m_a} + \xi _{\mathrm{mic}}^2}, \end{aligned}$$

where T is the temperature, k is the Boltzmann constant, and \(m_a\) is the mass of the atom.

From a formal point of view, Eq. (17) is a good conditioning inequality. However, in practical terms, one should establish what is meant by much less than 1. This is addressed in Sect. 3.1.2 but we can be sure that the wider the line, the more the weak-field approximation applies. Hence, broad chromospheric lines are good candidates for using it. One of the first attempts at measuring a magnetic field with a chromospheric line, known to the authors of this review, was carried out as early as 1990 by Martínez Pillet et al. who (photographically) observed Stokes I and V profiles of the Ca ii H line and interpreted them in terms of the weak-field approximation. This approach remains useful as interest in the chromosphere increases (e.g., de la Cruz Rodríguez et al. 2013).

2.5 The MISMA hypothesis

Driven by the ubiquitous appearance of Stokes profile asymmetries in observations, Landi Degl’Innocenti (1994) suggested considering the atmospheric physical quantities, instead of deterministic stratifications, to have stochastic distributions about mean values with possible correlation effects among them. Assuming that the source function nevertheless varies linearly with depth through the whole atmosphere and that the propagation matrix stays constant at the spatial scale of each of the realizations of such a common stochastic distribution, he found an analytic solution for the transfer equation. Certainly inspired by the Landi Degl’Innocenti’s proposal, Sánchez Almeida et al. (1996) put forward a new approach. Realizing that the wavelength symmetries in the propagation matrix elements do indeed avoid such Stokes profile asymmetries in the absence of LOS velocity gradients in the regular formulation of the transfer problem (Landi Degl’Innocenti 1992), they proposed that the solar atmosphere may be pervaded by MIcro-Structured Magnetic Atmospheres (MISMAs). The hypothesis implies a highly inhomogeneous atmosphere at scales much smaller than the photon mean free path whereby the integration of Eq. (6) turns out to be very difficult. An alternative formulation is thus in order by locally averaging the propagation matrix and the emission vector. The resulting equation reads

$$\begin{aligned} \frac{\mathrm{d} \varvec{I}}{\mathrm{d} s} = - \left< \mathbf{K}' \right> (\varvec{I} -\varvec{S}'). \end{aligned}$$

It formally looks very much like the regular RTE but is formulated in terms of geometrical distances, s; \(\mathbf{K}' = \chi _{\mathrm{cont}} \mathbf{K}\);

$$\begin{aligned} \varvec{S}' \equiv \left\langle {\mathbf{K}}' \right\rangle ^{-1} \left\langle {\mathbf{K}}' \varvec{S} \right\rangle ; \end{aligned}$$

and the averages are taken over a distance \(\varDelta s\) that may vary along the optical path. The distance \(\varDelta s\) is supposed to be still smaller than \(\ell \) for Stokes I to be assumed constant within its range. In addition, the averages are considered to vary smoothly along the line of sight.

With all these assumptions, Eq. (22) is formally the same as Eq. (6). All the mathematical tools developed to solve the latter can be used to find a solution to the former. This is so despite the (numerically) inconvenient formulation in terms of geometrical distances: it requires either non-equally-spaced grid points or an increase in computation time. The good news is that, since correlations may exist among the physical parameters of the microstructures, the symmetry properties of matrix \(\left< \mathbf{K}' \right>\) are automatically destroyed. Hence, asymmetric Stokes profiles can appear naturally.

3 Degrees of approximation in the model atmospheres

Provided that physical atmospheric quantities are bounded functions of the optical depth, we can safely expect that they are either continuous or have some jump (Heaviside-like) discontinuities throughout the line formation region. Therefore, except for the discontinuity points, a Taylor expansion approximation seems simple and sensible. The good feature of Taylor expansions is that you can keep them at a given order of approximation that can be subsequently increased if needed. The sequential approach is of great help in following the principle of Occam’s razor—lex parsimoniae—which, in our opinion, should prevail in the interpretational work. The question arises as to whether an order of approximation is useful or whether it should be increased to give account of the observations. The answer must be found in the degree of accuracy with which we are trying to reproduce the observables. Hence, it has to do with the balance between the signal and the noise: if the next order of approximation only introduces variations that are below, say, three times the rms noise, \(\sigma \), then its use is discouraged. If, on the contrary, the difference between the observed and synthetic profiles is greater than \(3\sigma \), its use may be advisable.Footnote 7 Let us postpone the discussion to the following sections and present here the various atmospheres we are considering. We start with the zeroth order approximation and assume that physical quantities are constant with depth to continue with gradients, higher order variations, and jumps or discontinuities.

Fig. 5
figure 5

Examples of ME Stokes profiles of the Fe i line at 617.3 nm as observed with an instrument whose Gaussian spectral PSF has a FWHM of 6 pm. Two model atmospheres are used that differ only in the magnetic field strength: \(B=1200\) G for the black lines and 200 G for the red ones

3.1 Constant physical quantities

Let us distinguish among three possibilities, namely, the Milne–Eddington approximation, the weak-field approximation, and an atmosphere where \(\varvec{B}\) and \(v_{\mathrm{LOS}}\) are constant but where thermodynamics is properly accounted for with a realistic stratification of temperature.Footnote 8

3.1.1 The Milne–Eddington atmosphere

As commented on in Sect. 2.3, a Milne–Eddington atmosphere provides an analytic solution to the RTE. With nine parameters, the Stokes profiles of a spectral line can be synthesized. The model parameters are the three components of the magnetic field, \(B,\,\gamma \), and \(\varphi \), the LOS velocity, \(v_{\mathrm{LOS}}\), and the so-called thermodynamic parameters: the line-to-continuum absorption coefficient, \(\eta _0\) (\(=\)1/\(r_0\)), the Doppler width of the line, \(\varDelta \lambda _{\mathrm{D}}\), the damping parameter, a, and the two coefficients for the source function, \(S_0\) and \(S_1\). The actual values of \(\eta _0,\,\varDelta \lambda _{\mathrm{D}}\), and a may vary significantly throughout the atmosphere. Therefore, assigning one single value for each may be, say, risky. Experience, however, indicates that this is possible. Reasonable fits to actual data can be obtained with this approximation and we can even understand the relationship between the single-valued parameters and their actual stratification (Westendorp Plaza 1998). Only Stokes profiles with definite symmetry properties can be formed in an ME atmosphere. Stokes \(I,\,Q\), and U are even functions of wavelength while Stokes V is odd. This is a consequence of the absence of velocity gradients (Auer and Heasley 1978) and will be discussed later in Sect. 5. Figure 5 shows two examples of ME profiles corresponding to the Fe i line at 617.3 nm as observed with an instrument whose (Gaussian) spectral profile (point spread function, PSF) has a full width at half maximum (FWHM) of 6 pm. The thermodynamic model parameters are \(\eta _0 = 5.06,\,\varDelta \lambda _{\mathrm{D}} = 2.6\) pm, \(a= 0.22,\,S_{0} = 0.1\), and \(S_{1} = 0.9\); they come from a fit to the FTS spectrum (Kurucz et al. 1984; Brault and Neckel 1987). The magnetic inclination and azimuth are both equal to \(30^{\circ }\); \(B=1200\) G for the black lines and 200 G for the red ones.

Fig. 6
figure 6

Maximum of the Stokes V profile as a function of the magnetic field strength for a longitudinal field (left panel). Maximum of the Stokes Q profile as a function of the square magnetic field strength (right panel). Asterisks correspond to an instrumental profile FWHM of 6 pm and diamonds to a FWHM of 8.8 pm. Red lines represent linear fits to the points; blue lines display quadratic fits; green lines correspond to fits for fields weaker than 200 G

3.1.2 The weak-field atmosphere

As stated in Sect. 2.4, when B is constant with depth and very weak, then the Stokes V profile turns out to be proportional to the longitudinal component of the magnetic field independently of the remaining quantities. It can be shown (e.g., Landi Degl’Innocenti and Landolfi 2004) that

$$\begin{aligned} V(\lambda ) \simeq - g_{\mathrm{eff}} \, \varDelta \lambda _{\mathrm{B}} \cos \gamma \, \frac{\partial I_{\mathrm{nm}}}{\partial \lambda }, \end{aligned}$$

where \(I_{\mathrm{nm}}\) is the non-magnetic Stokes I profile, corresponding to the line in the absence of a magnetic field. Equation (24) has been key for many magnetic inferences. In fact, written as \(V = CB _{\parallel }\), it is known as the magnetographic equation since it provides a calibration of the magnetographic signal. When magnetographs used only one or two wavelength samples of the circular polarization, the magnetographic equation was indeed the only means of obtaining estimates of the component of the magnetic field along the line of sight. Nowadays, with modern magnetographs providing more samples in all four Stokes parameters, that equation is still useful for morphological, qualitative estimates but cannot be trusted everywhere and under all circumstances. The modern way to evaluate C indeed implies some radiative transfer calculations in given model atmospheres (e.g., Martínez Pillet et al. 2011), and these calculations readily show that the approximation saturates at low magnetic field strengths. In the left panel of Fig. 6, we plot the maximum of the Stokes V profile as a function of the field strength (the field is along the LOS, \(\gamma =0^{\circ }\)) with an instrumental profile FWHM of 6 pm (asterisks) and of 8.8 pm (diamonds). In solid lines, the linear (red) and quadratic (blue) fits are also shown. Only strengths up to 600 G are plotted because the relationship is evidently nonlinear above that threshold. For weaker fields, it is apparent that the instrumental broadening of the profiles helps linearity to hold as differences between the linear and quadratic fits are smaller for the broader PSF. Those differences are for most of the points above \(3 \cdot 10^{-3} I_{\mathrm{c}}\); that is, more than \(3\sigma \), with \(\sigma \) being the noise level of the polarization continuum signal of typical observations. Such differences are clearly detectable by current means. Hence, the approximation loses validity for yet weak fields. Deviations from linearity are even clearer if one sees the green lines in the figure, which correspond to linear fits including only data points for which B is less than 200 G. In our example, the weak field approximation for the Stokes V peaks breaks down at fields stronger than 300 G with a FWHM of 6 pm and stronger than 400 G with a FWHM of 8.8 pm. Certainly, if the instrument has a narrower spectral PSF or if the noise is smaller, the approximation fails earlier. The approximation clearly worked better for older instruments.

Fig. 7
figure 7

Differences between the Stokes V profile and its weak-field approximation (left column) and differences between the Stokes I profile and that for a zero field strength. Colors indicate values of the longitudinal component of the field. The dashed horizontal lines mark the \(3\sigma \) level of typical, modern observations. The upper row is for a FWHM of 6 pm and the bottom one for a FWHM of 8.8 pm. Colors correspond to 600 G (black), 500 G (red), 400 G (blue), 300 G (green), 200 G (purple), and 100 G (dark green)

Further arguments can be supplied for the user to be cautious about weak field assumptions with typical, visible photospheric lines. The first one is that Eq. (24) is hardly applicable, as shown in Fig. 7, not only because Stokes V does not follow it but because Stokes I deviates from \(I_{\mathrm{nm}}\) even sooner (and, up to first order, \(I=I_{\mathrm{nm}}\) must hold for Eq. (24) to be valid; e.g., Landi Degl’Innocenti and Landolfi 2004). In the left column of the figure, the differences between the left-hand and the right-hand members of the equation are plotted. Colors correspond to 600 G (black), 500 G (red), 400 G (blue), 300 G (green), 200 G (purple), and 100 G (dark green). The dashed, horizontal purple lines mark the \(3\sigma \) level. The upper row is for a FWHM of 6 pm and the bottom row is for a FWHM of 8.8 pm. The plots in the left column are of course consistent with the results from Fig. 6. Those in the right column are illustrative of how Stokes I varies with the magnetic field strength. Differences between the various profiles can easily be discerned above the \(3\sigma \) level. When the profiles themselves are affected by noise, unlike in these plots, detecting the differences may be more difficult but the message is clear: contrary to the common belief, the Stokes V profile is not the only tool for estimating the longitudinal component of weak magnetic fields; Stokes I helps a lot and should not be forgotten.

The second argument concerns the diagnostic capability for typical lines to disentangle B from \(\gamma \) in the weak-field regime. Most statements about the only accurate retrieval to be the longitudinal magnetic field component are based on Eq. (24), as if it were the only available tool from radiative transfer. Stokes profiles other than V are often obliterated. It is easy to understand (e.g., Landi Degl’Innocenti and Landolfi 2004), however, that the mere deviations between I and \(I_{\mathrm{nm}}\) we have seen in Fig. 7 should imply the appearance of linear polarization signals (provided that the inclination is different from zero): such Stokes I deviations from \(I_{\mathrm{nm}}\) are second order terms in an expansion of all four Stokes profiles.Footnote 9 At second order, Stokes Q and U are no longer zero (or below the noise) either and start to provide additional information. It can also be proven (e.g., Landi Degl’Innocenti and Landolfi 2004) that \(Q \propto B^2 \sin ^2 \gamma \), as shown in the right panel of Fig. 6, where the maximum of Stokes Q is plotted against \(B^2\) for a field that is inclined \(45^{\circ }\) with respect to the vertical.Footnote 10 Here, deviations between linear and quadratic fits are smaller than for the V case (note that the Y scale is an order of magnitude smaller) but the interesting point is that, above \(B=200\) G, linear polarization signals begin to be larger than \(3\sigma \) and, hence, detectable.

A third argument we want to bring to the reader’s attention is related to the common belief that weak fields are hardly distinguished from strong fields (say above 1 kG) with a filling factor significantly smaller than 1. We will return to this issue in Sects. 6.2 and 8.2, as the problem has already been discussed in the literature (e.g., del Toro Iniesta et al. 2010). Let us only mention here that the loss of linearity of Stokes V above, say, 400 G and, most importantly, the behavior of Stokes I are reasons enough for the two types of atmospheres to be distinguished by observational means.

If Eq. (24) were universally accepted, then it would indicate that the RTE is almost useless since the emergent profile is proportional to one of the matrix elements of \(\mathbf{K}\). Elementary mathematics readily explain that this is not possible except for, perhaps, a small value range of fields. In summary, we must acknowledge that Stokes V is not proportional to the longitudinal component of the magnetic field.

3.1.3 Constant vector magnetic field and LOS velocity

There is still a third option to deal with constant \(\varvec{B}\) and \(v_{\mathrm{LOS}}\). Imagine that the atmosphere is a regular one as far as thermodynamics is concerned but where the magnetic and dynamic quantities do not vary with depth. Since the propagation matrix is no longer constant, no analytic solution of the RTE is available.Footnote 11 One is then led to use numerical techniques to synthesize the spectrum. The atmosphere, however, is greatly simplified since the number of parameters is reduced. This can be very helpful for quicker analyses of the data or as a makeshift for more elaborate subsequent approaches that include variations of \(\varvec{B}\) and \(v_{\mathrm{LOS}}\) with the optical depth. This is the approximation used for the first version of the SPINOR code (Solanki et al. 1992a) or as an option in the SIR code (Ruiz Cobo and del Toro Iniesta 1992).

3.2 Physical quantities varying with depth

The community has gathered a great deal of evidence about variations of \(\varvec{B}\) and \(v_{\mathrm{LOS}}\) along the optical path everywhere over the solar disk. In addition, physical laws such as those of magnetic flux and mass conservations demand that these quantities vary with optical depth in a number of structures. The approximations in the former subsections cannot then be considered but as first-step approaches or simplified descriptions of reality. In any case, we can safely assume that stratifications of the physical quantities are bounded functions of \(\tau _{\mathrm{c}}\) (or whichever variable parameterizing the optical path), as we admitted in the beginning of this section.

A historical landmark for the full acknowledgement of LOS velocity gradients from an observational point of view was established by the discovery by Mickey and Orrall (1974) and Illing et al. (1974a, b, 1975) of a broadband circular polarization in sunspots. The true explanation was already suggested in the last of those papers, although schematically founded on the assumption of two slabs with different velocity and magnetic field strengths. The broadband observations were soon related to spectral line net circular polarization (the integral of the Stokes V profile over the wavelength span of the line): Grigorjev and Katz (1975) computed all four Stokes profiles in the presence of an LOS velocity gradient and certainly obtained asymmetric profiles; later on, Auer and Heasley (1978) demonstrated that a necessary and sufficient condition for such a net circular polarization had to be found in velocity gradients along the line of sight, although they were neglecting magneto-optical effects. Rigorous derivations (including dispersion effects) have later been obtained and can be found, for example, in the elegant work by Landi Degl’Innocenti and Landi Degl’Innocenti (1981). The symmetry properties of the propagation matrix elements predict no net circular polarization (or Stokes V area asymmetry) in the absence of an LOS velocity gradient. Other mechanisms such as insufficient spatial resolution that implies mixtures of individual atmospheres within a pixel, may produce asymmetries in the peaks (the so-called amplitude asymmetries) but the integral of V will remain zero. Therefore, any net circular polarization is unambiguous observational evidence for the presence of velocity gradients. And Stokes V area asymmetries are observed practically everywhere. Unfortunately, no such unambiguous evidence exists for the presence of magnetic field gradients, although we know on physical grounds there are plenty of them, such as those through magnetic canopies where a magnetic layer is overlaying a non-magnetic one.

3.2.1 Parameterizing the stratifications

Among the numerical codes relevant to this review (see Sect. 7) there are some that acknowledge variations of \(\varvec{B}\) and \(v_{\mathrm{LOS}}\). We deal here with what might be called “normal” or “regular” stratifications, such as those employed by Ruiz Cobo and del Toro Iniesta (1992), Frutiger and Solanki (1998), Socas-Navarro et al. (2000) and Socas-Navarro (2001), and leave some others, devoted to specific solar features, to the following paragraphs.

Since the number of depth grid points used for the numerical integration of the RTE can be high, it may be advisable to reduce the degrees of freedom of the variations with depth of the physical quantities. As commented on above, a reasonable approach would be to follow higher order polynomials in a stepwise form. From constant values to linear, parabolic, third-order polynomial dependences, and so on. Then, if we assume, for instance, that \(v_{\mathrm{LOS}}\) is linear with \(\tau _{\mathrm{c}}\), we only need to specify the velocity at two grid points (nodes in SIR’s terminology) and three if it is parabolic, hence reducing the number of free parameters of the model. We do not need to specify \(T,\,\varvec{B}\), and \(v_{\mathrm{LOS}}\) at every single point we use for solving the RTE but only at a few of them. We shall see in Sect. 7 that one can go even further with this kind of approach and consider more involved optical depth dependences if necessary.

3.2.2 The MISMA atmosphere

As we explained in Sect. 2.5, the MISMA hypothesis guarantees the appearance of Stokes profile asymmetries but at the expense of introducing a significant number of extra free parameters. In fact, even in the simplest MISMA atmosphere (Sánchez Almeida and Landi Degl’Innocenti 1996), where all the micro-structures are described by ME atmospheres, one has in principle as many as ten free parameters per needed component (also known as micro-structure). To the nine regular ME parameters, the volume occupation fraction for each micro-structure must be added. In more complicated MISMAs, the number of parameters is even higher (Sánchez Almeida 1997). Moreover, in spite of the very detailed physical description where equilibrium equations are required for slender flux tubes, the inclination and azimuth of the magnetic field are kept constant throughout the whole atmosphere, which does not seem very realistic (Canopies are found almost everywhere owing to the fanning out of magnetic field lines with height). Last, but not least, when the structuring of the atmosphere is established at sizes comparable to \(\ell \), the average propagation matrix does not result in an RTE as in Eq. (22), which is no longer valid. Modern observations with continuously increasing spatial resolution do indeed show this kind of structuring both in quiet and active regions and sunspots. For example, single magnetic flux tubes of approximately 150 km size have been fully resolved by Lagg et al. (2010); their evolution followed for half an hour by Requerey et al. (2014); and the internal structure of network magnetic structures revealed (Martínez González et al. 2012) with Sunrise/IMaX observations (Martínez Pillet et al. 2011; Barthol et al. 2011). In our opinion, the MISMA hypothesis, being a clever idea for producing asymmetries, is advisable as a “when-all-else-fails” atmosphere but there are yet conventional radiative transfer treatments that provide reasonable interpretation of the observations.

3.2.3 Other special atmospheres

This subsection is devoted to three special cases where the physical scenario envisaged to explain the observations requires a specific configuration that is not intended to be universally valid. Those specific configurations, however, help in interpreting the Stokes profiles emerging from given solar features.

Interlaced atmospheres Imagine that you can assume that your line of sight is piercing a number n of alternate boundaries \(\left\{ s_{i} \right\} _{i=1,\ldots ,n} (s_{1}<s_{2}<\cdots <s_{n})\) between two distinct atmospheres, as when observing from a side two identical thin flux tubes that are close but not stuck to each other. In such a scenario, the structuring of the atmosphere is comparable in size with \(\ell \) and, therefore, the MISMA hypothesis does not hold. If you happen to know the solution, \(\varvec{I}_{\pm 1}\), of the RTE in each of the two atmospheres, labeled \(\pm 1\), del Toro Iniesta et al. (1995) found out that the formal solution to the problem is

$$\begin{aligned} \varvec{I} (s) = \varvec{I}_{+1} (s) + \sum _{i=1}^{n} (-1)^{n-i} \left[ \prod _{j=i}^{n} \mathbf{O}_{(-1)^{n-j}} (s_{j+1}, s_{j}) \right] \varDelta \varvec{I} (s_{i}), \end{aligned}$$

for any \(s \in [s_{n}, s_{\mathrm{lim}}]\), where the \(+1\) atmosphere is assumed to be the outermost one, \(\varDelta \varvec{I} (s_{i}) \equiv \varvec{I}_{-1} (s_{i}) - \varvec{I}_{+1} (s_{i})\), and \(\mathbf{O}_{\pm 1}\) are the evolution operators for both atmospheres (e.g., Landi Degl’Innocenti and Landi Degl’Innocenti 1985). Equation (25) is at the root of the flux-tube inversion code by Bellot Rubio et al. (1997, see Bellot Rubio et al. 1996, as well). A different treatment of discontinuities along the line of sight was proposed by Borrero et al. (2003) where the density of depth grid points is increased in the discontinuity neighborhood.

Atmospheres with Gaussian profiles The existence of net circular polarization in the penumbrae of sunspots was also the driver for Bellot Rubio (2003) to propose an implementation of the uncombed model by Solanki and Montavon (1993). The scenario is based on two components; namely, a magnetic component and a penumbral magnetic flux tube, the latter occupying a fractional area of the resolution element. The model parameters of the penumbral tube are built by Gaussian modifications (in depth) of those in the background. All the Gaussians have the same width and are located at the same depth, but their amplitudes depend on the specific model parameter, of course. With these premises, the SIR code was modified into the so-called SIRGAUS code, which has been used, among others by Jurčák et al. (2007), Jurčák and Bellot Rubio (2008), Ishikawa et al. (2010) and Quintero Noda et al. (2014).

Atmospheres with jump discontinuities Discontinuities can be treated numerically by decreasing the depth grid step or by using Eq. (25). A specific implementation of such discontinuities was first used by Louis et al. (2009) for an analysis of sunspot light bridges. Like SIRGAUS, it is based on a modification of the SIR code to take this particular scenario into account. In it, two magnetic atmospheres coexist in the resolution pixel: a background atmosphere whereby \(\varvec{B}\) and \(v_{\mathrm{LOS}}\) are constant with depth, and another magnetic atmosphere where those quantities have a Heaviside-like discontinuity. This code (called SIRJUMP) has also been used in practice, e.g., by Martínez González et al. (2012) and Sainz Dalda et al. (2012).

4 Degrees of approximation in the Stokes profiles

Since the ultimate goal of inversions is the bona fide reproduction of observed profiles, an analysis of the properties of Stokes spectra as functions of the wavelength is in order. Such an analysis should be aimed at finding the most conspicuous characteristics of the profiles in order for these characteristics to be the best reproduced among all the features. In other words, if, for instance, a given Stokes \(I (\lambda )\) profile shows only small deviations from a Gaussian, we should aim to obtain the Gaussian that best simulates the profile and identify the model parameters responsible for this bulk behavior. In some cases we may be satisfied just with this “coarse”, or not very detailed, description and leave small deviations or nuances to further, in-depth analysis that might even be carried out separately. As we are going to see, this approximation of incremental complexity for the profiles is well in line with the successive approximations we have described for the model atmospheres in Sect. 3.

The Stokes \(Q,\,U\), and V profiles and Stokes I in line depression; that is,

$$\begin{aligned} I_{\mathrm{d}} \equiv 1 - \frac{I}{I_{\mathrm{c}}}, \end{aligned}$$

as functions of \(x \equiv \lambda -\lambda _{0}\) (where \(\lambda _{0}\) is the central wavelength of the line), can be decomposed as sums of even and odd functions of x, as any other function defined over \(\mathbbm {R}\).Footnote 12 Specifically, if we call S(x) any one of the profiles, then

$$\begin{aligned} S (x) = S_{+} (x) + S_{-} (x), \end{aligned}$$


$$\begin{aligned} S_{+} (x) \equiv \frac{S(x) + S(-x)}{2} \,\,\, \mathrm{and} \,\,\, S_{-} (x) \equiv \frac{S(x) - S(-x)}{2}. \end{aligned}$$

By construction, \(S_{+}\) is even and \(S_{-}\) is odd.Footnote 13

Fig. 8
figure 8

Differences between the Stokes profiles of the Fe i line at 617.3 nm as synthesized in two model atmospheres that differ in the LOS velocity. See text for details

This parity property is very interesting because, as we have seen in former Sections, the Stokes profiles of any line formed in the absence of velocity gradients have definite symmetry (parity) properties. Since asymmetries in regular profiles are relatively small, that is, the profiles usually display a predominant parity character (even for Stokes \(I,\,Q\), and U, and odd for Stokes V), a sum of even and odd profiles may give account of the observed spectra as if the opposite parity component was indeed a perturbation related to velocity gradients. This can explain the success of ME inversion codes for fitting many observations (cf. Westendorp Plaza 1998; Orozco Suárez et al. 2010). The ME atmosphere accounts for the main bulk of the observed Stokes profiles. In Fig. 8 we plot the differences among the Stokes profiles of the Fe i line at 617.3 nm as synthesized in two model atmospheres. Both have the HSRA (Gingerich et al. 1971) stratification of temperature with \(B = 1500\) G and \(\gamma =\varphi = 30^\circ \). One of the models has a constant \(v_{\mathrm{LOS}} = 1.87\) \(\hbox {km}\,\hbox {s}^{-1}\)and the other a small gradient from \(v_{\mathrm{LOS}} = 2\) \(\hbox {km}\,\hbox {s}^{-1}\)at the bottom of the atmosphere through 1.75 \(\hbox {km}\,\hbox {s}^{-1}\)at the top. Both have \(\xi _{\mathrm{mac}} = 1\) \(\hbox {km}\,\hbox {s}^{-1}\)and have been convolved with the IMaX PSF. Note that these differential profiles display almost the opposite parity character to their corresponding Stokes profiles. A description with only \(S_{+}\) or \(S_{-}\) can thus provide a first approach analysis to a large number of observations. Indeed, \(S_{+}\) should be good for \(I,\,Q\), and U, and \(S_{-}\) for V as the differences are smaller than our “nominal” noise of \(10^{-3} \, I_{\mathrm{c}}\). This, of course, cannot always be the case. Very peculiar Stokes profiles are often observed as our polarization accuracy increases. For instance, Sigwarth et al. (1999) first reported the observation of one-lobed V profiles that were later studied in detail by Grossmann-Doerth et al. (2000) and Sigwarth (2001). Most of these profiles are found in the internetwork (e.g., Sainz Dalda et al. 2012).

Fig. 9
figure 9

First six eigenprofiles for Stokes \(I,\,Q,\,U\), and V. They are obtained from observations of the Fe i line pair at 630.1 and 603.2 nm. Image reproduced with permission from Rees et al. (2000), copyright by ESO

A different description of the Stokes profiles as functions of wavelength was proposed by Rees et al. (2000) who suggested that they can be described as sums of given principal components or eigenprofiles. If those eigenprofiles are contained in a database and are properly selected, they can increasingly give account of the profile shapes just by increasing the number of principal components in the expansion. An example of such eigenprofiles is given in Fig. 9. By adding these components properly weighted, the corresponding Stokes profiles are synthesized. This is the basis for all the PCA inversion techniques presented so far and the concept is fairly simple.

A similar approach to that of PCA was proposed by del Toro Iniesta and López Ariste (2003), based on the fact that Stokes \(I_{\mathrm{d}},\,Q,\,U\), and V belong to \(\mathbbm {L}^2\), the space of square integrable functions over \(\mathbbm {R}\). Since \(\mathbbm {L}^2\) is a Hilbert space with a well defined scalar product, an exact, infinite expansion of the profiles is possible in terms of any of the several bases of the space. Among those basis systems, del Toro Iniesta and López Ariste selected the family of Hermite functions, \(h_n (x)\), because of the similarity between the shapes of the first few elements of the family and the Stokes profiles (see Fig. 10). Somehow, the Hermite functions (see the aforementioned paper for a definition) provide a suitable basis for approximating the observed profiles with finite expansions of a few terms. Apart from their possible use in inversion codes that has not been investigated so far, the expansion of Stokes profiles in terms of Hermite functions has been used by Toussaint et al. (2012) for compressing observed Stokes profiles and by Harker (2012) for an automatic solar active region detection. The first authors, after expansion of the Stokes profiles, only keep the coefficients compressed with a conventional algorithm. This way they reduce the storage space by a factor 20 while keeping most of the information virtually noise free. The latter author discriminates the different active regions after looking at the complexity of the emerging Stokes profiles as described by their Hermite-function expansion coefficients.

Fig. 10
figure 10

First six Hermite functions. The abscissa has to be understood as a normalized wavelength (by the Doppler width of the line, for instance)

5 A synthesis approach

As described in the introduction, an approximate knowledge as to how the Stokes profiles react to the various model parameters is advisable as it helps to select the adequate observables as “orthogonal” as possible. The ideal way to explore the diagnostic capabilities of Stokes profiles is by means of response functions (see Sect. 6). Tackling the problem head-on, that is, synthesizing the profiles in different model atmospheres, may help grasp basic ideas on the Stokes profile behavior, though. The idea is to study how the Stokes profiles vary when the model parameters are modified. This is the aim of this section.

5.1 Constant atmospheres

As we have been doing in the two previous sections, let us start with the easiest case of atmospheres that do not vary with optical depth and, specifically, with ME atmospheres, since their analytic solution of the RTE enables a quick numerical overview of the space of model parameters. Figure 11 shows Stokes \(I,\,Q\), and V for the Fe i line at 617.3 nm with the same thermodynamic parameters used in Figs. 5 and 7. Since the linear polarization \(L^{2} \equiv Q^{2} + U^{2}\) is rotationally invariant and \(\varphi \) is constant throughout the ME atmosphere, we are assuming to have selected the preferred reference frame where U is identically zero, so that \(L=Q\). In practical terms we have selected \(\varphi = 0^{\circ }\) for all the profiles. The magnetic field strength is \(B=300\), 500, 900, and 1200 G for the four rows from top to bottom. The magnetic inclination is encoded in color: \(\gamma =0^{\circ }\) (dark green), 15\(^{\circ }\) (purple), 30\(^{\circ }\) (pink), 45\(^{\circ }\) (green), 60\(^{\circ }\) (blue), 75\(^{\circ }\) (red), and \(90^{\circ }\) (black). If we again assume a typical noise \(\sigma =10^{-3} I_{\mathrm{c}}\), then the small differences in the core of Stokes I for \(B=300\) may not be detected and a neat distinction between \(\gamma = 0^{\circ }\) and \(15^{\circ }\) or \(\gamma = 90^{\circ }\) and \(75^{\circ }\) may hardly be reachable with Stokes Q and V at a \(3\sigma \) level. Most certainly, however, there should not be any problem to distinguish between 0\(^{\circ }\), 30\(^{\circ }\), 60\(^{\circ }\) and \(90^{\circ }\). Of course, the dependences of I and V on \(\gamma \) are significant enough when the field is stronger. One should not be restricted to longitudinal components even for this small field strength, and the situation improves further for lower noise levels. This is a well-known issue: in polarimetric observations we are photon starved, indeed as much as night-time astronomers may be for detecting very faint objects.

Fig. 11
figure 11

Stokes \(I,\,Q\), and V as functions of the magnetic field inclination: dark green is for \(\gamma =0\), purple for 15, pink for 30, green for 45, blue for 60, red for 75, and black for \(90^{\circ }\). The magnetic field strength is different for the various rows: from top to bottom, \(B=300\), 500, 900, and 1200 G. The magnetic azimuth is identically zero

Fig. 12
figure 12

Stokes \(I,\,Q\), and V as functions of the magnetic field strength: black is for \(B=1200\), red for 900, blue for 500, green for 300, and purple for 200 G. The magnetic field inclination is different for the various rows: from top to bottom, \(\gamma =15^{\circ }\), 30\(^{\circ }\), 60\(^{\circ }\) and \(75^{\circ }\). The magnetic azimuth is identically zero

The alternative way to gauge the sensitivity of the Stokes profiles to B and \(\gamma \) by synthesizing the profiles is shown in Fig. 12. The rows correspond to different inclinations: \(\gamma =15^{\circ }\), 30\(^{\circ }\), 60\(^{\circ }\)and \(75^{\circ }\) from top to bottom. The magnetic field strength is this time encoded in colors: \(B=1200\) (black), 900 (red), 500 (blue), 300 (green), and 200 G (yellow). This complementary view shows that the dependence on B is in fact stronger than on \(\gamma \). Therefore, properly sampled Stokes profiles with enough polarimetric accuracy should be able to provide the required information to infer the magnetic field strength and inclination separately for most of the strength spectrum. Weaker fields will have bigger uncertainties for sure, but they should not imply a theoretical inability. As a matter of fact, the weaker the fields we want to explore, the smaller the noise we need in our observations, but this is somehow obvious.

5.2 Depth-dependent atmospheres

In the solar atmospheres, physical quantities do vary with depth. Acknowledging such variations almost always implies resorting to numerical solutions of the transfer equation. That was first done by Beckers (1969a, b). Numerical results by Staude (1969, 1970) and by Wittmann (1971, see Fig. 13) soon appeared and numerical codes were described (e.g., Wittmann 1974; Landi Degl’Innocenti 1976). Those first numerical codes capable of synthesizing the Stokes profiles were based on the fourth-order Runge–Kutta algorithm that is very accurate at the price of being very computationally expensive. A generalization of the method by Feautrier (1964) to polarized light was proposed by Auer et al. (1977) and later modified by Rees and Murphy (1987) in order to take magneto-optical effects into account. A fast solution of the RTE, after being reformulated as an integral Volterra equation (first suggested by Staude 1969), was formulated by Rees et al. (1989) with their so-called DELO method.

Fig. 13
figure 13

Comparison between the observed and computed Stokes I profile of the Fe i line at 630.25 nm. Image adapted from Wittmann (1971)

An improvement in accuracy and computational speed was obtained by Bellot Rubio et al. (1998) with an Hermitian method based on developing the Stokes vector as a fourth-order polynomial with depth. In Fig. 14 we show an example of a synthesis of the same line as in Fig. 13, where clear asymmetries in wavelength in the Stokes profiles can be seen. According to Sects. 3.2 and 4, such asymmetries have been produced by the variation with depth of physical quantities.

Learning how the various profile features of the four Stokes parameters depend on the many model parameters is certainly difficult and cannot be summarized in this paper. Experience, however, can train a researcher to be able to deduce—many times after a quick glance (the art)—a specific stronger \(v_{\mathrm{LOS}}\) or B here or there in the atmosphere. The situation is therefore much more complicated than for the ME case and one should rely upon inversions.

Fig. 14
figure 14

Stokes profiles of Fe i line at 630.25 nm as synthesized with an Hermitian method. Image reproduced with permission from Bellot Rubio et al. (1998), copyright by AAS

5.3 MHD simulations

The advent of magnetohydrodynamic (MHD) simulations such as those by Vögler (2003), Vögler et al. (2005) and Rempel (2012) has opened a new window on the exploration of the solar photosphere. They have enabled calculations that may help to envision what is expected from observations and to interpret them. The simulations also provide predictions that can be confronted with them. The enrichment has been remarkable because the realistic atmospheres resulting from the simulations have physical quantities varying along the optical path without any a priori assumptions and may be closer to the actual Sun than other simplified atmospheres. MHD simulations have been used to test the reliability of inversion techniques (Orozco Suárez et al. 2010; de la Cruz Rodríguez et al. 2012). In the first of these works, a confirmation of the predictions by Sánchez Almeida et al. (1996) was found: if your inference technique assumes magnetic fields and velocities constant with depth and you use it on data coming from an atmosphere where these quantities are depth dependent, the result is just the average of the actual stratification weighted with the generalized response function to perturbations of that quantity. In the second of these papers, the non-LTE inversion code called NICOLE is tested. Here we report on preliminary results by Harker et al. (2016) to illustrate the role of simulations as a tool to determine the optimum wavelength sample of Stokes profiles. This is a particularly interesting topic that is very relevant to the observational (and hence interpretational) work. Are the available samples enough for capturing all the information encoded in the Stokes profiles? What is the optimum sampling one should use with a new instrument under development depending on the goals such an instrument aims to fulfill?

Figure 15 shows the Stokes I (in depression), \(Q,\,U\), and V profiles of the Fe i line at 630.25 nm across a slit over a sunspot simulation by Rempel (2012) (left column panels) and their corresponding power spectra (right column panels). The simulation contains the transition from the quiet Sun (at both sides of the X dimension) through the penumbra and the umbra of a sunspot. The power spectra of Stokes \(V,\,Q\), and U are wider than that of Stokes I as a natural consequence of their shapes. Therefore, a cut-off frequency is better found in the polarization profiles. In this example, Shanon’s critical sampling interval is around 1.25 pm/pixel with the remarkable fact that no convolution with an instrumental PSF has been applied to the profiles. This value is coarser than that provided by several ground-based spectrographs with resolutions about \(R\simeq 10^{6}\) that would be considered too fine for the required diagnostics. As soon as the profiles are observed by an instrument with a finite width PSF, the Nyquist frequency will shrink to smaller values. Hence, the critical sampling will be coarser. These kinds of calculations can therefore help in designing new instruments and, as far as this paper is concerned, in deciding the adequate spectral sampling for any synthesis or inversion code: if the sampling is too fine, we are wasting computational time; if the sampling is too coarse, we are neglecting available information.

Fig. 15
figure 15

Stokes profiles (I is in line depression) across a sunspot simulation by Rempel (2012) (left column) and power spectra (right column). Image adapted from Harker et al. (2016)

6 Response functions

The needle in an old-fashioned ammeter is useful as far as it moves when a current circulates through the electric circuit. If, for instance, the needle does not move when a given very weak or very strong current passes through, then the ammeter is useless or insensitive to those intensities. Keeping this analogy, our observables, the Stokes profiles are useful for inferring given physical quantities as long as they change when those physical quantities vary. Of course, to be detectable, the change should be larger than the noise. Therefore, the correct question to ask of a given spectropolarimetric proxy whether it senses, for example, \(T,\,B\), or \(v_{\mathrm{LOS}}\) is how much it modifies when \(T,\,B\), or \(v_{\mathrm{LOS}}\) change. The preceding section has been an attempt in that direction: we have been changing the various atmospheric quantities and checking the modifications in the Stokes profiles. We have proceeded in the direct way, that is, through the solution of the differential RTE. This direct approach is not very useful in practice, though, as we already recognized in Sect. 5.2. Which quantity is to be modified first, at which optical depths, and by how much? If the problem is the simplest we talked about in the introduction (that of measuring velocities from the line core wavelength), then there may be some room for the direct approach. If not, the diagnostic capabilities of the Stokes profiles have to be further explored with the final goal of proceeding the inverse way, that is, of solving the integral equation known as the formal solution of the RTE (Eq. 8).

Here, we have a difficult problem where the observables depend nonlinearly on the unknowns. The nonlinear character is clear: the observables—on the left-hand side of Eq. (8)—are equal to an integral of the product of three terms, each depending strongly non-linearly on the physical quantities that characterize the model atmosphere (e.g., del Toro Iniesta 2003b). Changes in the Stokes spectrum are then very difficult to predict when modifications in the physical parameters occur. As in many other branches of physics, the diagnostic tools come out trough a linearization analysis. We can assume, for instance, that in a very special regime, when perturbations are small enough, changes occur linearly. These are the basics of linearization that, in the realm of solar physics were introduced for non-polarized light by Mein (1971, see also Canfield 1976; Caccin et al. 1977) through the so-called weighting functions, although the name of response functions (RFs) did not appear in the literature until the work by Beckers and Milkey (1975). Since polarization was not taken into account, those analyses were only strictly valid for isotropic media or, as far as we are concerned, for non-magnetic atmospheres. Response functions were introduced within polarized radiative transfer by the brothers Landi Degl’Innocenti and Landi Degl’Innocenti (1977) but RFs were paid little attention to until the works by Landi Degl’Innocenti and Landolfi (1982, 1983), Grossmann-Doerth et al. (1988), Sánchez Almeida (1992) and Ruiz Cobo and del Toro Iniesta (1992, 1994). The latter found that the perturbations \(\delta x_{i}\) exerted to the \(p+r\) atmospheric quantities (p of them varying with height and r constant) induce modifications \(\delta \varvec{I} (0)\) to the observed Stokes profile given by

$$\begin{aligned} \delta \varvec{I} (0) = \sum _{i=1}^{p+r} \int _0^{\infty } \varvec{R}_i (\tau _{\mathrm{c}}) \, \delta x_i (\tau _{\mathrm{c}}) \, \mathrm{d}\tau _{\mathrm{c}}, \end{aligned}$$


$$\begin{aligned} \varvec{R}_i (\tau _{\mathrm{c}}) \equiv \mathbf{O}(0, \tau _{\mathrm{c}}) \, \left[ \mathbf{K} (\tau _{\mathrm{c}}) \frac{\partial \varvec{S}}{\partial x_i} - \frac{\partial \mathbf{K}}{\partial x_i} (\varvec{I} - \varvec{S}) \right] . \end{aligned}$$

Therefore, the modification of \(\varvec{I} (0)\) is given by a sum of terms, each related to one of the atmospheric quantities characterizing the medium. The terms are integrals over the whole atmosphere of the model atmospheric quantities weighted by the RFs. The physical meaning in Eq. (29) is straightforward: imagine that we change only a given quantity (\(T,\,B,\,v_{\mathrm{LOS}}\), or any other) with a magnitude unity (i.e., 1 K, 1 G, 1 \(\hbox {km}\,\hbox {s}^{-1}\), etc.) in the narrow surroundings of a given continuum optical depth \(\tau _0\); then, the subsequent modification in the emergent Stokes spectrum is just the value of the corresponding RF at that optical depth:

$$\begin{aligned} \delta \varvec{I} (0) = \varvec{R}_i (\tau _0). \end{aligned}$$

Then, since the Stokes profiles are usually recorded normalized to some reference value (e.g., the average,—unpolarized—continuum intensity of the quiet Sun), units for RFs are inverse units of the corresponding quantity. That is, the response function to perturbations of temperature is measured in K\(^{-1}\); the response to perturbations in the magnetic field strength is measured in G\(^{-1}\); and so on. Thus, a response function can be defined as the modification that the Stokes spectrum experiences when the medium undergoes a unit perturbation of a given physical quantity at a given very narrow region in optical depth. Equation (30) tells us that these modifications build upon the variations of the propagation matrix and the source function vector with respect to the physical quantities and their evolution through the atmosphere as driven by the evolution operator. The two variations have an opposite sign. This means that they are somehow competing as one could expect. While \(\varvec{S}\) represents the sources of photons, \(\mathbf{K}\) represents the sinks. Indeed we know that we do not only speak about photon removal but also about pleochroism and dispersion but, certainly, the propagation matrix role is somehow similar to that of a withdrawal. The counterbalancing between \(\varvec{S}\) and \(\mathbf{K}\) is very important to understanding radiative transfer because some analyses forget it and only account for the effects of \(\mathbf{K}\) (absorption in the non-polarized case). This was clearly pointed out and explained by Ruiz Cobo and del Toro Iniesta (1994).

Equation (29) suggests that RFs play the role of partial derivatives of the observed Stokes profiles with respect to the atmospheric quantities once they have been discretized. This role is even more clear when we go down to the real world of a quadrature formula for that equation. Model atmospheres are usually described numerically by a grid of points that are spaced in logarithmic optical depth. Let \(\varDelta (\log \tau _{\mathrm{c}})\) be that spacing. If we call \(x_{i,j} \equiv x_i (\tau _j)\) and \(\varvec{R}_{i,j} \equiv \varvec{R}_i (\tau _j)\), then Eq. (29) can be written as

$$\begin{aligned} \delta \varvec{I}(0) = \sum _{i=1}^p \sum _{j=1}^n a_j \varvec{R}_{i,j} \, \delta x_{i,j} + \sum _{k=1}^r \varvec{R}'_k \, \delta x_k, \end{aligned}$$

where \(a_j = \varDelta (\log \tau _{\mathrm{c}}) \ln 10 \, c_j \tau _j\), with \(c_j\) being the quadrature coefficients. Therefore, if we include \(a_j\) in the RFs, as one usually does in graphical representations, then Eq. (32) shows the Stokes spectrum modifications as linear expansions of the new variables \(x_{i,j}\) and \(x_k\). The first term on the right-hand side corresponds to those physical quantities that vary with depth; the second stands for those that are assumed to be constant.Footnote 14 In summary, we can say that RFs are indeed partial derivatives of \(\varvec{I}(0)\) with respect to the (numerical) atmospheric parameters and, thus, they directly provide the sensitivities of the Stokes spectrum to perturbations of the physical conditions in the medium. Examples of these RFs are plotted in Figs. 16, 17, 18 and 19. They have been evaluated for Stokes \(I,\,Q,\,U\), and V, respectively, of the Fe i line at 630.25 nm to perturbations of the temperature (top row panels), of the magnetic field strength (middle row panels), and of the LOS velocity (bottom row panels). The RF values are multiplied by \(10^6\,\)K\(^{-1},\,10^6\,\)G\(^{-1}\), and \(10^4\,(\hbox {km}\,\hbox {s}^{-1})^{-1}\). The two columns correspond to two different model atmospheres. That in the left-hand columns has the temperature stratification of the HSRA model (Gingerich et al. 1971), a constant \(B=2000\) G, \(\gamma = 30^\circ \), and \(\varphi = 60^\circ \); the plasma is at rest in this model. That in the right-hand columns has a 500 K cooler temperature, and a magnetic field 500 G weaker, 20\(^\circ \) less inclined, and an azimuth of 10\(^\circ \); \(v_{\mathrm{LOS}} = 1.58 + 0.3 \log \tau _{\mathrm{c}}\) (\(\hbox {km}\,\hbox {s}^{-1}\)).

Equation (32) hints at a way for calculating RFs through what could be called the brute force method. This method is a four-step procedure: (1) synthesis of the Stokes spectrum in a given model atmosphere; (2) perturbation of just one of the (numerical) atmospheric parameters by a small amount and synthesis of the spectrum in the new model atmosphere; (3) calculation of the ratio between the difference of the two spectra and the perturbation; (4) repetition of steps (2) and (3) for each optical depth, for each wavelength sample, and for the remaining Stokes parameters. This is a formidable calculation as soon as the number of free parameters is large. Fortunately, Eq. (30) provides a shortcut since the evolution operator, the propagation matrix, and the source function vector have to be calculated anyway in every synthesis of the spectrum. With only the added calculations of the derivatives, one can easily calculate RFs at the same time as the RTE is solved. This property is extremely useful for inversion codes, as we shall see in Sect. 7.

Equation (32) also offers an explicit explanation of the astrophysical ill conditioning we commented on in Sect. 1: the same modification of \(\varvec{I}(0)\) may be produced by perturbations of different quantities or by perturbations of a single physical quantity but at several optical depths. That is, the effects of temperature can be similar to those of the magnetic field strength or the effects of perturbing B at \(\log \tau _{\mathrm{c}} = -0.5\) can be the same as those of perturbing B at \(\log \tau _{\mathrm{c}} = -3\). Therefore, we cannot say that the changes \(\delta \varvec{I}(0)\) are produced by perturbations of this physical parameter or that other without considering all of them at the same time. Cross-talk among some parameters may appear and, then, the retrieval of those parameters will be less reliable (see e.g., Sect. 6.2).

Fig. 16
figure 16

RFs of Stokes I to perturbations of T (top panels), B (middle panels), and \(v_{\mathrm{LOS}}\) (bottom panels). Units are \(10^{-6}\,\)K\(^{-1},\,10^{-6}\,\)G\(^{-1}\), and \(10^{-4}\,(\hbox {km}\,\hbox {s}^{-1})^{-1}\).The two columns correspond to two different model atmospheres. That in the left-hand column has the temperature stratification of the HSRA model (Gingerich et al. 1971), a constant \(B=2000\) G, \(\gamma = 30^\circ \), and \(\varphi = 60^\circ \); the plasma is at rest in this model. That in the right-hand column has a 500 K cooler temperature, and a magnetic field 500 G weaker, 20\(^\circ \) less inclined, and an azimuth of 10\(^\circ \); \(v_{\mathrm{LOS}} = 1.58 + 0.3 \log \tau _{\mathrm{c}}\) (\(\hbox {km}\,\hbox {s}^{-1}\))

Fig. 17
figure 17

Same as Fig. 16 for Stokes Q

Fig. 18
figure 18

Same as Fig. 16 for Stokes U

Fig. 19
figure 19

Same as Fig. 16 for Stokes V

6.1 Properties of response functions

A glance at Figs. 16, 17, 18 and 19 readily tells us that some RFs are bigger than the others. This means that our line is more sensitive to some physical quantities than to others. However, the fact that RFs are measured in inverse units of those for their corresponding parameters makes it difficult to compare their relative sensitivity. In this regard, relative RFs shed some light. If we consider relative perturbations \(\delta x_{i,j}{/}x_{i,j}\), then we can define relative RFs \(\tilde{\varvec{R}}_{i,j} \equiv \varvec{R}_{i,j} x_{i,j}\). Hence, \(\tilde{\varvec{R}}_{i,j}\) speak about the response of the Stokes spectrum to relative (i.e., dimensionless) perturbations. Experience shows that relative RFs to T perturbations are the biggest at all depths and wavelengths, clearly indicating that temperature is the most important quantity in line formation. (Indeed, temperature is the physical quantity that governs the thermodynamical state of the material medium because we assume that hydrostatic equilibrium prevails throughout our model atmospheres. After this assumption, pressure, the necessary second thermodynamic variable gets automatically prescribed.) Response functions to temperature perturbations start being different from zero at the deepest layers when compared to the remaining quantities. This is because the second term in the right-hand side of Eq. (30) goes to zero as the continuum optical depth tends to infinity. This was explained by Ruiz Cobo and del Toro Iniesta (1994). This physical fact implies that spectral lines tend to be insensitive at these low layers to the other physical quantities.

The Stokes profile wavelength symmetries are preserved in RFs: in the absence of velocity gradients, RFs of Stokes \(I,\,Q\), and U to any perturbation are even functions of wavelength and RFs of Stokes V are odd. This means that, in fact, velocity gradients increase the diagnostic capabilities of spectral lines. In their absence, half of the profile is useless since the information they provide is exactly the same as the other half.Footnote 15

For given purposes, we can conceive constant perturbations with depth, in spite of the quantity being depth dependent. Owing to their nature, some physical quantities may be assumed constant with depth (e.g., macro- and micro-turbulent velocity, or any of the ME free parameters). In such cases, constant perturbations are in order. If so, then, Eq. (29) tells us that the resulting modification in the Stokes spectrum is given by the product of such a constant perturbation times the integral of the corresponding RF over the whole atmosphere. Hence, we can say that the RF to a constant perturbation, \(\varvec{R}'\) in Eq. (32), is directly the integral of the regular response function or, in numerical terms,

$$\begin{aligned} \varvec{R}'_k \equiv \sum _{j=1}^n a_j \varvec{R}_{k,j}. \end{aligned}$$

As shown by Ruiz Cobo and del Toro Iniesta (1994), RFs play the role of a PSF in the general theory of linear systems. Under this general theory, our system—the Stokes spectrum—experiences an input (the perturbation) and provides an output, \(\delta \varvec{I} (0)\). If the input is a Dirac delta, then the output is the corresponding value of the response function. If the input is harmonic throughout the atmosphere, then the response is the Fourier transform of the RF.

Response functions are model dependent. This property is extremely important in our understanding of spectral line sensitivities and, thus, in the inversion of the RTE. Instead of being a drawback, such a model dependence helps in disentangling the effects produced by the different quantities in distinct model atmospheres. Even in fixed model atmospheres, it is very difficult to discard one quantity or the other at once, however, and most of them have to be retrieved at the same time. Once this is carried out, one can theoretically understand the meaning of measurements (Sánchez Almeida et al. 1996; del Toro Iniesta 2003b).

Last, but not least, the linear nature of RFs allows us to generalize them to any linear combination of Stokes profile wavelength samples. This property helps us understand what can be extracted from different proxies that are traditional in solar polarimetry but, most importantly, helps the astronomer in taking influence of the instrument into account. Since most instruments act as linear systems on light, the detected spectrum is a convolution of the actual spectrum with the instrument spectral PSF. Convolution is linear and, thus, one can easily conclude (Ruiz Cobo and del Toro Iniesta 1994) that the RFs of the convolved spectrum are nothing but those of the original spectrum convolved as well with the PSF.Footnote 16

6.2 Analytic response functions

When all the terms on the right-hand side of Eq. (30) can be calculated analytically (see Sect. 2.3), RFs are necessarily analytic and then we can use them to gain some physical insight into the diagnostic capabilities of the Stokes spectrum about the physical quantities that characterize the medium. This is the case of the ME approximation, where all the quantities are constant with depth. There, index j in Eq. (32) drops and (after inclusion of the coefficient into the RF) we can properly write:

$$\begin{aligned} \varvec{R}_i (\lambda ) = \frac{\partial \varvec{I}(\lambda )}{\partial x_i}. \end{aligned}$$

That is, that RFs are strict partial derivatives of the Stokes spectrum with respect to the free parameters of the problem (Orozco Suárez and del Toro Iniesta 2007). In Eq. (34) we have removed the \(\tau _{\mathrm{c}} = 0\) indicator in the emergent Stokes spectrum and, rather, we have made explicit the dependence of RFs on wavelength. We have just seen in the former subsection that these RFs are indeed integrals over \(\tau _{\mathrm{c}}\) and, hence, the dependence on it disappears. This analytic character is particularly important in the controversial discussion about the possibility of disentangling B, from \(\alpha \) (the filling factor) in the case that our magnetic features are not fully spatially resolved. In Fig. 20 we reproduce Fig. 3 from del Toro Iniesta et al. (2010). It shows RFs to (constant) perturbations in those two quantities plus \(v_{\mathrm{LOS}}\) in a ME atmosphere. As one can clearly see, both Stokes V RFs to \(\alpha \) and to B perturbations are almost proportional among themselves and to the Stokes V profile itself. This can easily be traced back to the expected behavior from the weak field approximation, as expressed in Eq. (24). From this proportionality we should conclude that it is indeed very difficult to discern the values of B and \(\alpha \) separately. If it were not for Stokes I, the widespread belief that only \(\alpha B\) or \(\alpha B \cos \gamma \) can be retrieved would be true. However, the Stokes I RFs to \(\alpha \) and B are neatly different from each other and this necessarily implies that we have the means of inferring the two quantities independently. Stokes Q and U also help in disentangling the field strength and the filling factor for similar reasons as soon as they are above the noise level.

Fig. 20
figure 20

Stokes I (left panel) and V (right panel) RFs to \(v_{\mathrm{LOS}}\) (solid, black lines), to B (dotted, blue lines), and to \(\alpha \) (dashed, red lines) in the weak-field case. Perturbations of 10 m s\(^{-1}\) for \(v_{\mathrm{LOS}}\), of 10 G for B, and of 0.1 for \(\alpha \) have been assumed. Image reproduced with permission from del Toro Iniesta et al. (2010), copyright by AAS

Among other features of RFs, the latter authors showed how the thermodynamical parameters of the ME atmosphere can have cross-talk among themselves: their RFs are fairly similar in shape, so that their effects can be misinterpreted by the inversion codes. Notably, the RFs to perturbations of \(B,\,\gamma ,\,\varphi \), and \(v_{\mathrm{LOS}}\) are markedly different from one another and with respect to those of \(\eta _0,\,\varDelta \lambda _{\mathrm{D}}\), and a. This explains the good result of ME inversion codes in accurately retrieving the magnetic and dynamic parameters while the thermodynamic parameters are sometimes wrong. Our conclusion is also consistent with, and explains, the findings by Lagg et al. (2004) who decided to leave the damping parameter fixed with minor changes in the fitted magnetic and dynamic parameters while \(\varDelta \lambda _{\mathrm{D}}\) and \(\eta _0\) were significantly affected. Linearity can also be useful for spatially coupled inversion techniques (see Sect. 7.5.1 below).

7 Inversion techniques

Once we have discussed all the ingredients and assumptions, we can face the main problem in astrophysics, namely that of making theory and observations compatible. In other words, we can face the inversion problem by deriving the unknown physical quantities through comparison between observed and synthetic Stokes profiles.Footnote 17 Figure 21 describes how the problem gets complicated as compared to the mere forward problem in Fig. 2.

Fig. 21
figure 21

Block diagram of the inversion problem under NLTE conditions

One can clearly see how a new overarching loop is present that indicates the needs for changing the model atmosphere if the synthetic Stokes spectra do not properly fit the observed ones. The problem turns out to be formidable and requires new, specific assumptions that make it tractable. In particular, some of the quantities have to be calculated from the strict NLTE conditions (see Sect. 7.2.3).

Even the simpler LTE problem gets complicated, and indeed becomes iterative, regardless of acknowledging the stratification in the physical quantities of the atmosphere (see Fig. 22). The needs for modifying such a model atmosphere according to the deviations between observed and synthetic profiles makes a loop necessary after calculating both the synthetic spectra and their derivatives with respect to the free parameters. Fortunately, we know how to calculate these derivatives through RFs at the same time as we synthesize the Stokes spectrum with little extra computational effort.

Fig. 22
figure 22

Block diagram of the inversion problem under LTE conditions

Looking for convergence means measuring the distance between observed and synthetic profiles in the space of observables. Any inversion procedure must have a threshold below which the user can consider that convergence has been reached because the fit cannot be further improved within the current assumptions.

7.1 Topology in the space of observables

The topological problem depicted in the introduction for the inversion problem (or any astrophysical inference) needs to be substantiated in minimizing a distance: a metric in the space of the observables. Since Stokes \(I_{\mathrm{d}},\,Q,\,U\), and V belong to \(\mathbbm {L}^2\), the quadratic norm of the difference turns out to be the natural distance between any two profiles. We want to approximate two sets of profiles, so that all four Stokes parameters should be taken into account. Therefore, when in practice we deal with discrete samples, the sought-for distance can be written as

$$\begin{aligned} \chi ^2 (\varvec{x}) \equiv \frac{1}{N_{\mathrm{f}}} \sum _{s=0}^3 \sum _{i=1}^q \left[ I_s^\mathrm{obs} (\lambda _i) - I_s^\mathrm{syn} (\lambda _i; \varvec{x}) \right] ^2 \, w_{s,i}^2, \end{aligned}$$

where index s runs for the four Stokes parameters, we assume q wavelength samples, and \(N_{\mathrm{f}}\) stands for the number of degrees of freedom, that is, the difference between the number of observables (4q) and that of the free parameters (the number of elements in \(\varvec{x},\,np+r\); see Sect. 2.1 and Eq. 9).

\(\chi ^2 (\varvec{x})\) is a merit function of the atmospheric quantities that measures the distance between the observed and the synthetic profiles and has to be minimized in order to achieve a good fit. Having a normalized merit function to the degrees of freedom is useful to warn the user not to use an unreasonably large number of free parameters as compared with the number of observables. In such a case, \(\chi ^2\) would turn out to be always too big. The weights \(w_{s,i}\) can be used to favor some data more than the others. For instance, one can set them to the inverse of the measurement errors. For many applications they are simply kept at unity.

We can look at \(\chi ^2 (\varvec{x})\) as a scalar field in an (\(np+r\))-dimensional space. Since the number of dimensions may be too large, the minimization problem may turn out to be intractable. Before going to specific techniques that make it affordable, let us consider the paths through which we can look for the minimum of the merit function. That is, we have to find the derivatives of \(\chi ^2\) with respect to the atmospheric free parameters. Ruiz Cobo and del Toro Iniesta (1992, see Ruiz Cobo and del Toro Iniesta 1994; del Toro Iniesta 2003b as well) showed that such derivatives are directly given by the RFs:

$$\begin{aligned} \frac{\partial \chi ^2}{\partial x_m} = \frac{2}{N_{\mathrm{f}}} \sum _{s=0}^3 \sum _{i=1}^q \left[ I_s^\mathrm{obs} (\lambda _i) - I_s^\mathrm{syn} (\lambda _i; \varvec{x}) \right] \, w_{s,i}^2 \, R_{m,s} (\lambda _i), \end{aligned}$$

where, for the sake of a more compact notation, index m runs from 1 to \(np+r\) (including constant and variable physical quantities), the quadrature coefficients in Eq. (32) are assumed to be included in the RFs when needed, and no distinction is made between \(\varvec{R}\)’s and \(\varvec{R}'\)’s. The same authors also demonstrated that the second derivatives can be approximated by

$$\begin{aligned} \frac{\partial ^2 \chi ^2}{\partial x_m \partial x_k} \simeq \frac{2}{N_{\mathrm{f}}} \sum _{s=0}^3 \sum _{i=1}^q w_{s,i}^2 \, \left[ R_{m,s} (\lambda _i) \, R_{k,s} (\lambda _i) \right] . \end{aligned}$$

Regardless of the way we approach its minimum in the hyperspace of parameters, that \(\chi ^2\) is the natural metric is reinforced by the fact that other metrics have been tried (e.g., Lagg et al. 2004) that have finally converged to almost the same formulation as in Eq. (35) (Lagg et al. 2007).

7.2 Levenberg–Marquardt based inversions

The process of profile fitting is nothing but the successive (and iterative) approximation of synthetic Stokes profiles until they reach a minimum distance to the observed ones. Hence, an initial guess model atmosphere is needed to start the procedure. Step by step, the model will be modified, so that the resulting synthetic Stokes profiles will approach more and more the observations. When we are close enough to the \(\chi ^{2}\) minimum, an approximate, parabolic motion may be useful:

$$\begin{aligned} \chi ^{2} (\varvec{x} + \delta \varvec{x}) \simeq \chi ^{2} (\varvec{x}) + \delta \varvec{x}^{\scriptscriptstyle \mathrm{T}}(\nabla \chi ^{2} + \frac{1}{2} \mathbf{H}' \delta \varvec{x}), \end{aligned}$$

where the elements of the gradient are given by Eq. (36) and \(\mathbf{H}'\) is one half of the Hessian matrix, whose elements are given by Eq. (37), that is, \(H'_{m,k} = \partial ^2 \chi ^2/\partial x_m \partial x_k\). In Eq. (38) a scalar product is understood between a transposed (row) vector and a regular (column) vector. When we are very near the minimum, it is clear that the second term in the right-hand side of Eq. (38) should be zero, and this is done in the Levenberg–Marquardt (LM) algorithm (e.g., Press et al. 1986) by requiring that

$$\begin{aligned} \nabla \chi ^{2} + \mathbf{H} \delta \varvec{x} = \varvec{0}, \end{aligned}$$

where the new matrix \(\mathbf{H}\) is defined by

$$\begin{aligned} 2 H_{ij} \equiv \left\{ \begin{array}{lll} H'_{ij} (1 + \lambda ), &{}\quad \text{ if } &{}\quad i=j, \\ H'_{ij}, &{}\quad \text{ if } &{}\quad i\ne j, \end{array} \right. \end{aligned}$$

where \(\lambda \) is an ad hoc parameter that helps tuning the algorithm for it to work as if the approximation is almost first order (\(\lambda \) is large) or fully second order (when \(\lambda \) is small). \(\lambda \) is changed in every step in the iteration, depending on how far or close we are to the minimum as indicated by the variation of \(\chi ^2\).

At the end of the procedure we will most likely not find the true minimum but, hopefully, will be close enough to neglect the gradient term in Eq. (38). In such a case we can write

$$\begin{aligned} \varDelta \chi ^2 = \delta \varvec{x}^{\scriptscriptstyle \mathrm{T}}\mathbf{H}' \delta \varvec{x}. \end{aligned}$$

The good news about this relationship is that, since the Hessian matrix is made up of RFs, one can finally obtain an expression for the inversion uncertainties in the physical quantities that are functions of the RFs (see del Toro Iniesta 2003b).

$$\begin{aligned} \sigma _m^2 \simeq \frac{2}{np+r} \frac{{\sum _{s=0}^3 \sum _{i=1}^q} \left[ I_s^\mathrm{obs} (\lambda _i) - I_s^\mathrm{syn} (\lambda _i; \varvec{x}) \right] ^2 \, w_{s,i}^2}{{\sum _{s=0}^3 \sum _{i=1}^q} R^2_{m,s} (\lambda _i) w_{s,i}^2}. \end{aligned}$$

Certainly, the larger the RFs, the smaller the uncertainties.

7.2.1 Problems in practice

Nodes and singular value decomposition With an LM algorithm, inversion of the RTE reduces in summary to solving Eq. (39), which implies the inversion of the modified Hessian matrix. One can certainly not expect the same practical problems when \(\mathbf{H}\) is built for an ME inversion or for a more general assumption where physical quantities vary with depth. Already in the ME case, \(\mathbf{H}\) has dimensions \(9\times 9\) or \(10 \times 10\) (if the filling factor is assumed to be different from unity). Inverting a \(10 \times 10\) matrix is not difficult but, in the more general case, when the atmosphere is parameterized with a depth grid of 20 or 30 points, the Hessian may have several tens or even hundreds of elements in both dimensions. Inverting such matrices is by no means an easy numerical task.

A second problem can appear in practice as \(\mathbf{H}\) may be a quasi-singular (numerically singular) matrix because of the different sensitivities of the Stokes parameters to the various physical quantities that may vary even by orders of magnitude. One particular Stokes parameter of one specific spectral line may not be sensitive to a given physical quantity at given depths in the atmosphere. We already know, for instance, that, about or below \(\log \tau _{\mathrm{c}} = 0\), only temperature leaves its fingerprints on the spectrum: the profiles are insensitive to the other quantities (see Sect. 6.1). Hence, the corresponding matrix elements in \(\mathbf{H}\) will be close to zero, so that they hamper the Hessian matrix inversion. Here we report on the way SIR deals with these two problems. Other inversion techniques (e.g., LILIA, NICOLE, MILOS) apply similar procedures, although no much explicit information is available. The first problem can be circumvented by using several iteration cycles in each of which the number of free parameters is fixed and increased successively from cycle to cycle. The inversion of quasi-singular matrices is usually carried out through the singular value decomposition technique (SVD; e.g., Press et al. 1986).

Nodes and equivalent response functions Imagine that we only have one physical quantity to deal with in the inversion. Then, the number of free parameters is n, the number of depth grid points. Our Hessian is an \(n \times n\) matrix. A practical way out of this involved numerical problem is found (Ruiz Cobo and del Toro Iniesta 1992) by assuming that all depth grid point perturbations are not free but bound by some interpolation formula. For example, we can use polynomial splines. This assumption allows to consider any number \(n^{\prime }\) of free parameters from 1 through n. If such a number is 1, we assume we are applying a constant perturbation, whatever the original stratification is. The perturbation will be linear if the number is 2, parabolic if it is 3, and so on. As explained in del Toro Iniesta (2003b), the use of nodes requires the evaluation of equivalent RFs at the nodes, \(\tilde{\varvec{R}}\)’s, in order to take information from the whole atmosphere into account. With this technique, the equivalent of Eq. (32) in practice becomes

$$\begin{aligned} \delta \varvec{I}^\mathrm{syn} (\lambda _l) = \sum _{m=1}^{n'p+r} \tilde{\varvec{R}}_m (\lambda _l) \, \delta y_m, \end{aligned}$$

where \(y_m\) is a new notation for the free parameters at the nodes.Footnote 18 Constant and depth-varying physical quantities are treated the same in Eq. (43), and the quadrature coefficients are assumed to be included in the RF definitions.

Different sensitivities to the various free parameters Since, by construction, the Hessian matrix is real and symmetric, its inversion is done through diagonalization. The quasi-singularity of the Hessian matrix shows up as some of the diagonal elements, \(\gamma _k\), being too close to zero to be inverted with accuracy. This could be for two reasons. The first one is that, within the free parameters belonging to the same physical quantity, some depths are necessarily more important than others: as we have seen in Sect. 6, RFs tend to zero at some depths. This problem is numerically solved by setting \(1/\gamma _k\) to zero, whenever \(\gamma _k\) is considered too small (under a given threshold). By doing so, we are not (over)correcting a parameter that has no relevance at this time.Footnote 19 The second reason for singularity is that some physical quantities are more important than others. Therefore, the sensitivities of Stokes profiles to perturbations of those quantities can be larger (even by an order of magnitude) than perturbations to less significant quantities. This problem is overcome by using relative instead of absolute RFs as we explained in Sect. 6.1. Nevertheless, and in order to make sure that all physical quantities are considered during each inversion cycle, the zeroing of the less significant diagonal elements is applied separately to each physical quantity.

Fig. 23
figure 23

Convergence rate (logarithm of the merit function in Eq. 35) comparison between an inversion run by using fixed initial guesses for the physical quantities (black line) and the same by using the approximate estimates given in the “Appendix” (red line). One thousand Stokes profiles of the Fe i line at 617.3 nm have been used in the experiment and average results are plotted. They have been synthesized with the MELANIE code (Socas-Navarro 2001) with uniformly distributed field strengths between 0 and 2000 G, inclinations and azimuths between 0\(^\circ \) and 180\(^\circ \), and LOS velocities between −2 and 2 \(\hbox {km}\,\hbox {s}^{-1}\). The profiles are sampled at 30 wavelengths regularly spaced every 2 pm. A C-programmed version of MILOS (Orozco Suárez and del Toro Iniesta 2007) has been used. No noise has been added to the profiles

Initialization The lack of uniqueness we were discussing in the introduction may be revealed in a dependence on the initialization parameters. The community has realized this fact for a long time and codes such as SIR or HeLIx have been explicitly tested and showed robust against different initializations (Ruiz Cobo and del Toro Iniesta 1992; Lagg et al. 2004). Such robustness can nonetheless depend on the specific Stokes profiles and model atmosphere. Therefore, an advisable practice when doubts arise is to use several different initial guesses for estimating the uncertainties in the results for each physical quantity. Several attempts have been performed as well for finding an ideal initialization guess, including specific genetic algorithm procedures only for getting the initial guess (Skumanich, private communication). In our opinion, having initializations almost as complicated as the inversion itself does not make much sense.

While preparing a given application, we discovered an outstanding, very economical way of making optimum initial guesses. Such an initialization is very much in the line we have been supporting throughout the paper, namely, the usefulness of a step-by-step approach. Using the classical center-of-gravity (Rees and Semel 1979) and weak-field approximations of the “Appendix”, we have obtained a remarkable acceleration in the convergence, as shown in Fig. 23.Footnote 20 According to Uitenbroek (2003), the center of gravity technique has the remarkable property of being quite insensitive to the spectral resolution of the data. We can add that the results in Fig. 23, which have been obtained from profiles sampled at 30 wavelengths, are indeed fairly similar to those for six wavelength samples like those foreseen for the SO/PHI instrument (SO/PHI is the acronym for the Polarimetric and Helioseismic Imager for the ESA’s Solar Orbiter mission; see Solanki et al. 2015).

Consistency among different ME implementations Milne–Eddington inversion techniques have become widely used to infer the vector magnetic field and the LOS velocity of the solar plasma. The physical interpretation of ME results was first investigated by Westendorp Plaza (1997, 1998) while comparing them to those obtained with SIR. A check of the theoretically predicted result and that obtained in practice was carried out in the latter paper for the HAO-ASP code (Skumanich et al. 1985; Lites et al. 1988). Predictions by Sánchez Almeida et al. (1996) were confirmed: (1) measurements are essentially the result of averaging the actual parameter stratification with the corresponding generalized response function; (2) the so-called ME thermodynamic parameters had little correlation with the actual (quickly varying with optical depth) thermodynamic parameters. Further practice has usually shown that these “thermodynamic” parameters may not be very consistent among different runs for the same spectral line, while \(\varvec{B}\) and \(v_{\mathrm{LOS}}\) are fairly accurate. This finding has been physically explained by Orozco Suárez and del Toro Iniesta (2007) who explored the shapes of ME response functions: RFs to perturbations in \(\eta _{0},\,\varDelta \lambda _{D}\), and a are fairly similar among themselves and, hence, cross-talk may appear between every two parameters; this is not the case, however, with RFs to perturbations in \(\varvec{B}\) and \(v_{\mathrm{LOS}}\), which are neatly different.

Although the consistency among different versions of the ME inversion is therefore guaranteed through physical analysis, the various implementations may have different numerical approximations and the technique can even be different (e.g., LM, genetic algorithms, PCA, etc.). Motivated by this fact, Borrero et al. (2014) have checked the consistency among the HAO-ASP, HeLIx, and VFISV codes. They have found a positive confirmation of the previous results by using MHD simulations as a test bench instead of ME Stokes profiles.

7.2.2 Automatic selection of nodes

In several specific problems, the optimum choice for parameterizing the atmosphere is actually an art, i.e., a skill that arises after the continuous exercise of intuitive skills. Codes like SIR help the user by enabling several node choices for each parameter. Some different choices can yield similar fits; others can make it impossible to reach a convergent solution. As we have commented on in several places of the present paper, a generally good approach is lex parsimoniae: between two solutions reaching a similar fit quality, that with fewest nodes should be selected. However, in practice, one cannot repeat the inversion several times in order to choose the optimum number of nodes for each parameter.

The current version of SIR includes an algorithm that automatically selects such a number of nodes for every parameter in each iteration. The algorithm is based on the quest for the roots or zeros of the partial derivative of \(\chi ^2\) with respect to each parameter, as written in Eq. (36). Let a be one of the atmospheric quantities varying with optical depth and \(a_p\) its value at \(\tau _p\); that is, \(a_p\) is one of the elements in the model atmosphere of Eq. (9). Let us call \(d_{a_p} \equiv (\partial \chi ^2/\partial a_p)\). Let us also suppose that we are only dealing with the intensity profile of one spectral line, and that, at a given iterative step, \(I^\mathrm{obs} > I^\mathrm{syn}\) for all wavelengths. If \(R_{p,0} (\lambda _i)\) is positive for all wavelengths and all optical depths, it is then clear that \(d_{a_p}\) will also be positive at all depths. To get a better fit, then, we will need to increase a everywhere and just one node might be enough. Following this reasoning, it is easy to conclude that the number of nodes for a given physical quantity should be related to the number of times that the derivative \(d_a\) changes its sign over the optical depth range. Obviously, as the derivative depends on the observational data, it is influenced by noise and, consequently, spurious zeros should be eliminated. Consequently, the algorithm determines the number of nodes after looking for positive relative maxima, and negative relative minima, larger in absolute value than a given threshold. An example of the behavior of this automatic selection feature in SIR is shown in Sect. 8.1.

This automatic selection of the number of nodes can be considered a quantitative implementation of the principle of Occam’s razor. Others are indeed possible. An alternative was presented by Asensio Ramos (2006). This author uses the minimum description length principle to effectively find the optimum number of expansion coefficients in PCA-based inversion techniques or the optimum number of nodes for the various atmospheric parameters in the SIR code. This problem is also addressed by Asensio Ramos et al. (2007b), who estimated the intrinsic dimensionality of spectropolarimetric data based on nearest neighbor considerations and applying the principle of maximum likelihood.

7.2.3 A non-LTE inversion technique

The only available non-LTE inversion technique so far is called NICOLE by Socas-Navarro et al. (2015), which is an evolution of the IAC-NLTE code by Socas-Navarro et al. (2000). In turn, the latter was adapted from a previous one for non-polarized problems by Socas-Navarro et al. (1998). Since even the minute details of the code are extensively described in those papers, let us simply stress here the main assumptions underlying the code for those potential users to know the validity framework of the results.

Regarding the minimization technique, NICOLE employs an LM algorithm that proceeds very similarly to SIR by using response functions. Since RFs cannot strictly be calculated in this specific non-linear, non-local problem, the fixed departure coefficients (FDC) approximation is used to deal with the derivatives of the LTE atomic level populations once the \(\beta \)’s in Eq. (12) are fixed from a previous calculation (Socas-Navarro et al. 1998). Although the approximation is not exact and, indeed, the authors show deviations from the correct values, FDC is good enough for the purposes of getting RFs that pave the way for the code to find the minimum distance between the observed and the synthetic profiles. The second important approximation in NICOLE is the field-free approximation (Rees 1969), as we already mentioned in Sect. 2.3. It consists in obtaining the departure coefficients from an unpolarized, non-LTE code and uses them in a formal solution of the RTE. This way, the \(\beta \)’s are decoupled from the magnetic field. According to the authors, this approximation is valid because the actual level populations are governed by strong UV (weakly split) lines and those lines with large Zeeman splittings are weak enough not to have a significant influence on the statistical equilibrium equations.

NICOLE only deals with polarization induced by the Zeeman effect. Hence, any polarization produced by scattering or depolarization through the Hanle effect are not taken into account. Other assumptions, such as the validity of complete frequency redistribution, may have implications for applications in specific spectral lines. It has recently been used in the analysis of the Ca ii line at 854.2 nm (de la Cruz Rodríguez et al. 2015).

7.3 Database-search inversions

The foundations of Principal Component Analysis inversion codes have already been explained in Sect. 4. The expansion of Stokes profiles as linear combinations of eigenprofiles is at the root of the technique. A set of synthetic Stokes profiles of a given spectral line, obtained with a large number of model atmospheres, is used as a training set to decompose each synthetic and observed profile into a sum of a small number of such eigenprofiles. The inversion topological problem, then, is reduced to a search in the low-dimensional space generated by the eigenprofiles or, more specifically, in the space of the expansion coefficients. The technique has proved to be efficient for quick inversions of the observations and looks very promising as a classification tool for profiles that can later be examined with more detailed techniques. We say so because no PCA code so far has been envisaged to go further than an ME or a slab atmosphere. This is natural in a way. One can build a database where a few constant parameters can get values within given ranges but the mere construction of such a database would be a formidable problem if variations with optical depth of the atmospheric physical quantities are considered. The technique, therefore, is unable to deal with gradients and the like, which—we know—populate a large fraction of the magnetic Sun. Nevertheless, as a first-order approach it is extremely useful everywhere and is eventually the only available tool to explore the behavior of some solar features (e.g., López Ariste and Casini 2003; Casini et al. 2005).

According to Skumanich and López Ariste (2002), the leading orders of the PCA expansion may have a direct (approximate) interpretation in terms of values for the physical quantities, specifically for the LOS velocity and the vector magnetic field. This result is in line with our discussion in Sects. 3 and 4 about successive approximations in the complexity of both the model atmospheres and the profiles. Since the profile database has to be created for each spectral line or group of lines, PCA is very well suited to analyze those spectral lines whose radiative transfer is particularly complicated, either because physical mechanisms other than the Zeeman effect are involved in their formation, or because the scenario is morphologically difficult, as in prominences, spicules, and other chromospheric structures (López Ariste and Casini 2002, 2005; Casini et al. 2005, 2009). Extensions of the technique to stellar problems have also been proposed already (Semel et al. 2006; Ramírez Vélez et al. 2006; Martínez González et al. 2008; Paletou 2012; Paletou et al. 2015).

A particularly interesting feature of the PCA technique is that once the observed Stokes profiles are expanded in terms of the eigenprofiles they become less noisy. This can be helpful for several applications (e.g., Martínez González et al. 2008; Ruiz Cobo and Asensio Ramos 2013).

7.4 Other algorithm inversions

7.4.1 Artificial neural network inversions

Artificial neural networks (ANNs) are systems through which a multidimensional input is translated into a multidimensional output by means of a non-linear mapping. The mapping (or, better, the parameters for the mapping) is (are) obtained by a previous process called training where the system is presented with inputs whose target outputs are already known. The process of training can be long and tedious but, once it is finished, the ANN can deal with new inputs with an extremely quick performance. Within the realm of solar physics, only multi-layer perceptrons have been proposed (Socas-Navarro 2003; Carroll and Staude 2001). In these specific ANNs, the input, composed of N neurons, is sequentially—layer by layer—transformed into an output of N neurons as well, of which a subset are the M elements of the target. Following the notation by Socas-Navarro (2005), the propagation rule between the input (layer 0) and the output (layer L) is given by

$$\begin{aligned} Y_n^l = f_{l} \left( \sum _{j=1}^{N} W_{n,j}^{l} Y_n^{l-1} + \beta _{n}^{l} \right) , \end{aligned}$$

where \(Y_n^l\) represents the contents of neuron n in layer \(l,\,W_{n,j}^{l}\) stands for the synaptic weight connecting that neuron with neuron j in layer \(l-1\), and \(\beta _{n}^{l}\) is a bias level. One or more layers may have a non-linear activation function \(f_{l}\) which depends on the specific implementation. In fact, the two above mentioned papers use a different f.

For ANNs, the topological problem is dealt with during the training process, while the W’s, the \(\beta \)’s, and the parameters defining f are determined. It reduces to (Carroll and Staude 2001) minimizing a quadratic distance similar to:

$$\begin{aligned} \xi ^{2} = \sum _{i=1}^{P} \sum _{k=1}^{M} \left( Y_{k}^{L} - T_{k}^{i} \right) ^{2}, \end{aligned}$$

where P is the total number of training input-target vector pairs and \(T_{k}^{i}\) is the kth target value for the ith training pair.

Artificial neural networks have been used very seldom with actual data. In fact, as far as we know, only the two applications by Socas-Navarro (2005) and by Carroll and Kopf (2008) have been published so far.

7.4.2 Genetic algorithm inversions

A general description of a genetic algorithm (GA), along with an implementation of the so-called PIKAIA code, was given by Charbonneau (1995). When compared to simple steepest ascent or descent techniques, these genetic algorithms are an alternative to, e.g., LM. The former can easily be stuck in local minima while GA and LM look and (eventually) find global minima of the multi-variable merit (or fitness) function. Some authors claim (e.g., Lagg et al. 2004) that GA techniques are more robust in finding global minima than LM, but no direct comparison is known to the authors of this review.

Devised for the specific problem of exploiting the chromospheric diagnostic capabilities of the He i multiplet at 1083 nm, Lagg et al. (2004) presented the so-called HeLIx code (HeLIx is an acronym for Helium Line Information eXtraction). The code is a direct adaptation of the PIKAIA routine to the He i multiplet formation. The line formation problem includes both the Zeeman and Hanle effects, and the presence of two blending photospheric lines of Si i and Ca i. Therefore, much care has to be taken with the analyzed wavelengths (some have to be weighted to zero) and, above all, with the complex, forward radiative transfer problem. The latter is dealt with for the photospheric Si i line by using the synthesis part of the SPINOR code (although no results from it are reported). The non-LTE effects in the He i triplet are neglected because the line is mostly optically thin. A simple ME atmosphere is assumed instead. The Hanle effect treatment is based on the forward scattering case studied by Trujillo Bueno et al. (2002). The code later evolved (now it is called HeLIx \(^+\)) to include the incomplete Paschen–Back effect (Socas-Navarro et al. 2004; Sasso et al. 2006) and to consider all the properties of the so-called constant property slab model by Trujillo Bueno et al. (2005, 2002) through the addition of the forward synthesis code by Landi Degl’Innocenti (1982) with extensions by Merenda (2008). Now, the code borrows the synthesis calculation module from HAZEL Asensio Ramos et al. 2008, which is based on the quantum theory of polarization described in Landi Degl’Innocenti and Landolfi (2004). The inversion strategy of HAZEL is based on the DIRECT algorithm, a deterministic global optimization technique different from the stochastic PIKAIA method used in HeLIx \(^+\). Given these differences, carrying out a direct comparison between the two codes with both numerical and actual observations would be a very interesting exercise, useful for the whole community. This would bring a gauge of pros and cons of GA versus LM algorithms. As a general rule of thumb we can say that genetic algorithm inversions become feasible methods whenever the evaluation of the merit function is extremely fast because this kind of algorithms require the evaluation of the merit function a thousand or a million times.

7.4.3 Bayesian inversions

An alternative technique to the inversion problem that adds some extra statistical information on the results, namely confidence levels on the free parameters, has been proposed by Asensio Ramos et al. (2007a), based on Bayes’ theorem. According to that theorem, once the posterior distribution \(p(\varvec{x}|\varvec{I}^\mathrm{obs})\) is known, the position of its maximum indicates the most probable (a.k.a. optimum) combination of parameters that best fits the observations \(\varvec{I}^\mathrm{obs}\). The posterior (probability) distribution represents how much we know of the parameters once the observational data set is taken into account. As the reader may have already imagined, the Bayesian inversion is nothing but a maximization of \(p(\varvec{x}|\varvec{I}^\mathrm{obs})\) instead of a minimization of the \(\chi ^2\) merit function. We are going to see that, indeed, the two optimization problems collapse to exactly the same result in given cases where the a priori knowledge of the problem inserted in the calculations is the simplest (and—probably—the safest). Let us explain a little what we mean by this a priori knowledge.

According to the theorem, the posterior distribution is proportional to the product of a prior distribution \(p(\varvec{x})\) and the likelihood distribution, \(p(\varvec{I}^\mathrm{obs}|\varvec{x})\):

$$\begin{aligned} p(\varvec{x}|\varvec{I}^\mathrm{obs}) \propto p(\varvec{x}) \, p(\varvec{I}^\mathrm{obs}|\varvec{x}). \end{aligned}$$

The likelihood distribution measures the probability that a given model atmosphere \(\varvec{x}\) can produce synthetic Stokes profiles \(\varvec{I}^\mathrm{syn}\) that fit the observed ones, \(\varvec{I}^\mathrm{obs}\). If the noise distributions are normal and typically independent of wavelength as it is usual to assume, then the likelihood is defined as (Mackay 2003)

$$\begin{aligned} p(\varvec{I}^\mathrm{obs}|\varvec{x}) = \mathrm{e}^{-\frac{1}{2} \chi ^2 (\varvec{x})}, \end{aligned}$$

where \( \chi ^2 (\varvec{x})\) is given in Eq. (35). Imagine now, for a moment, that the prior is identically unity, \(p(\varvec{x}) = \) constant (within a reasonable range), which corresponds to a case where no a priori assumptions are made about the model parameters (all possibilities are equally probable). In such a case, Eq. (46) becomes

$$\begin{aligned} p(\varvec{x}|\varvec{I}^\mathrm{obs}) \propto \mathrm{e}^{-\frac{1}{2} \chi ^2 (\varvec{x})} \end{aligned}$$

and indicates that maximizing the posterior distribution is exactly the same as minimizing \(\chi ^2\). No matter what the optimization algorithm used for the inversion, introducing Eq. (35) into Eq. (48), we have the way for estimating confidence levels as given by the (multidimensional) posterior probability distribution. Two-dimensional cuts of \(p(\varvec{x}|\varvec{I}^\mathrm{obs})\) allow one to explore the possible degeneracies or cross-talks among each pair of model physical quantities. As a matter of fact, and according to our discussions in Sect. 6, response functions and uncertainties derived from Eq. (42) provide qualitatively similar confidence levels although Bayes’ theorem supplies a more graphical approach (see figures in Asensio Ramos et al. 2007a). The only difficulty is then properly sampling the hyperspace of parameters (see below).

The prior distribution contains the information we may have of the model parameters without taking the observations into account. If all the model physical quantities are statistically independent, then

$$\begin{aligned} p(\varvec{x}) = \prod _i^{np+r} p(x_i). \end{aligned}$$

Unless other physical information is available, the typical assumptions one can make on the free parameters are the range of reliable values for each of them. Thus, a useful model for the prior distribution can be given by

$$\begin{aligned} p(x_i) = H\left( x_i, x_i^\mathrm{min}, x_i^\mathrm{max}\right) , \end{aligned}$$

where H(xab) is the typical top-hat function

$$\begin{aligned} H(x,a,b) = \left\{ \begin{array}{ll} {\displaystyle \frac{1}{b-a}} &{}\quad \mathrm{if} \,\, a \le x \le b, \\ 0 &{}\quad \mathrm{otherwise.} \end{array} \right. \end{aligned}$$

Establishing a prior, therefore, is analogous to the assumptions made by SIR on the (spline) smooth variations of the physical quantities along the atmosphere. The useful feature of \(p(\varvec{x})\) is that you can even consider correlations between the quantities and model them accordingly. This is in our opinion, however, a risky exercise because over fanciful correlations can be conceived that turn into an even more involved interpretation of the results. One has to make sure that the specific conditions of the problem enable this or that a priori assumption on parameter cross-talk.

In summary, if either \(p(\varvec{x}) = \) constant or is given by Eqs. (49), (50), and (51), the optimization problem is the same as that described in Sect. 7.1 and, in principle, the LM algorithm could be used as well. The missing ingredient is the sampling of the free parameter hyperspace. An alternative method is then in order. Sampling the parameter space means repeating the synthesis of Stokes profiles many, many times. Typically, one needs of the order of \(10^{np+r}\) samples of the posterior distribution. When the number of free parameters is high, such a brute-force method becomes impracticable. Asensio Ramos et al. (2009) propose using a “not-so-brute-force”, Markov chain Monte Carlo method, where marginalized distributions of parameters can be obtained. The educated successive sampling grows linearly with the number of free parameters instead of exponentially. The decrease in computational cost has allowed the authors to deal both with ME atmospheres and with general LTE atmospheres where the physical quantities vary with depth (Asensio Ramos et al. 2012).

7.5 Inversions accounting for spatial degradation

A significant step forward has been adopted by three different techniques after acknowledging the spatial effects of non-ideal instruments (van Noort 2012; Ruiz Cobo and Asensio Ramos 2013; Asensio Ramos and de la Cruz Rodríguez 2015). While the spectral PSF of the instruments was soon incorporated into the inversion codes,Footnote 21 the spatial blurring could not be satisfactorily dealt with until these works. Note that, in spectropolarimetry, an extended spatial PSF not only degrades the quality and contrast of images but also introduces a spurious polarization signal that can be misinterpreted as magnetic fields and LOS velocities. Several attempts were proposed for mitigating (or circumventing) this spatial contamination, such as using a non-polarized, global quiet-Sun average (Skumanich and Lites 1987) or a non-polarized, local (1\(^{\prime \prime }\) neighborhood) average (Orozco Suárez et al. 2007a). In these examples, the magnetic structure is assumed to contribute with a filling factor \(\alpha \). None of these can be considered fully consistent but they do provide an improvement in robustness and reliability of the results. So-called spatially-coupled inversions (van Noort 2012), regularized deconvolution inversions (Ruiz Cobo and Asensio Ramos 2013), and sparse inversions (Asensio Ramos and de la Cruz Rodríguez 2015) attack the problem directly although through different means. The first technique uses the SPINOR code and the second employs SIR; a combination of the fast iterative shrinkage-thresholding algorithm (Beck and Teboulle 2009) and the restarting scheme by O’Donoghue and Candès (2015) is chosen for the third algorithm.

7.5.1 Spatially-coupled inversions

From a formal point of view, the gradient of the merit function and the Hessian matrix were very helpful in explaining the second order approximation that is behind the Levenberg–Marquardt algorithm. Instead of using \(\nabla \chi ^2\), let us think of the Jacobian matrix \(\mathbf{J}\) of the system, which is made up of the derivatives of all the data points (all wavelength samples of the four Stokes profiles) with respect to the free parameters. That is, the elements of \(\mathbf{J}\) are just the individual terms in the summation of Eq. (36). It is then clear that the approximation in Eq. (37) is equivalent to saying that the Hessian matrix \(\mathbf{H}' \simeq \mathbf{J}^{\scriptscriptstyle \mathrm{T}}\mathbf{J}\) and that \(\mathbf{H} = \mathbf{H}' - {\text{ diag }} \left[ \mathbf{H}' \right] \). Consider now the inversion of the RTE for a whole spectropolarimetric image of \(n\times m\) pixels individually (the uncoupled case). We can build a big (block-diagonal) new Jacobian \(\mathbf{J}\) with the ensemble of \(\mathbf{J}_{k,l}\) of all the individual pixels:

$$\begin{aligned} \mathbf{J} = \left( \begin{array}{ccccc} \mathbf{J}_{1,1} &{}\quad \mathbf{0} &{}\quad \cdots &{}\quad \cdots &{}\quad \mathbf{0} \\ \mathbf{0} &{}\quad \mathbf{J}_{1,2} &{}\quad \mathbf{0} &{}\quad \cdots &{}\quad \mathbf{0} \\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots &{}\quad \vdots \\ \mathbf{0} &{}\quad \cdots &{}\quad \cdots &{}\quad \mathbf{J}_{n,m-1} &{}\quad \mathbf{0} \\ \mathbf{0} &{}\quad \cdots &{}\quad \cdots &{}\quad \cdots &{}\quad \mathbf{J}_{n,m} \end{array} \right) . \end{aligned}$$

The new (big) \(\mathbf{H}'\) readily becomes block diagonal, with the blocks being the \(\mathbf{H}'_{i,j}\) of all the individual pixels:

$$\begin{aligned} \mathbf{H}' = \left( \begin{array}{ccccc} \mathbf{H}'_{1,1} &{}\quad \mathbf{0} &{}\quad \cdots &{}\quad \cdots &{}\quad \mathbf{0} \\ \mathbf{0} &{}\quad \mathbf{H}'_{1,2} &{}\quad \mathbf{0} &{}\quad \cdots &{}\quad \mathbf{0} \\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots &{}\quad \vdots \\ \mathbf{0} &{}\quad \cdots &{}\quad \cdots &{}\quad \mathbf{H}'_{n,m-1} &{}\quad \mathbf{0} \\ \mathbf{0} &{}\quad \cdots &{}\quad \cdots &{}\quad \cdots &{}\quad \mathbf{H}'_{n,m} \end{array} \right) . \end{aligned}$$

The inverse of this matrix (or that of matrix \(\mathbf{H}\)) is easy to obtain by individually inverting each of its block components.

Let us address now the spatially coupled inversion problem, where the effects of an assumed uniform PSF \(\varphi (x,y)\) across the image are taken into account. The Jacobian can now be written as

$$\begin{aligned} \mathbf{J} = \left( \begin{array}{rrcrr} \varphi _{0,0} \mathbf{J}_{1,1} &{}\quad \varphi _{0,-1} \mathbf{J}_{1,2} &{}\quad \cdots &{}\quad \varphi _{1-n,2-m} \mathbf{J}_{n,m-1} &{}\quad \varphi _{1-n,1-m} \mathbf{J}_{n,m} \\ \varphi _{0,1} \mathbf{J}_{1,1} &{}\quad \varphi _{0,0} \mathbf{J}_{1,2} &{}\quad \cdots &{}\quad \varphi _{1-n,3-m} \mathbf{J}_{n,m-1} &{}\quad \varphi _{1-n,2-m} \mathbf{J}_{n,m}\\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots &{}\quad \vdots \\ \varphi _{n-1,m-2} \mathbf{J}_{1,1} &{}\quad \varphi _{n-1,m-3} \mathbf{J}_{1,2} &{}\quad \cdots &{}\quad \varphi _{0,0} \mathbf{J}_{n,m-1} &{}\quad \varphi _{0,-1} \mathbf{J}_{n,m} \\ \varphi _{n-1,m-1} \mathbf{J}_{1,1} &{}\quad \varphi _{n-1,m-2} \mathbf{J}_{1,2} &{}\quad \cdots &{}\quad \varphi _{0,1} \mathbf{J}_{n,m-1} &{}\quad \varphi _{0,0} \mathbf{J}_{n,m} \end{array} \right) , \end{aligned}$$

which is no longer diagonal. Nevertheless, since the influence of the PSF should be relatively short and not involve all the image points, the resulting \(\mathbf{J}\) has a significant number of zero elements (it is sparse). If we call \(Y \equiv \varphi *\varphi \) the autocorrelation function of the PSF, it can be shown that (van Noort 2012)

$$\begin{aligned} \mathbf{H}' = \left( \begin{array}{rrr} Y_{0,0} \mathbf{J}_{1,1}^{\scriptscriptstyle \mathrm{T}}\mathbf{J}_{1,1} &{}\quad \cdots &{}\quad Y_{1-n,1-m} \mathbf{J}_{1,1}^{\scriptscriptstyle \mathrm{T}}\mathbf{J}_{n,m} \\ Y_{0,1} \mathbf{J}_{1,2}^{\scriptscriptstyle \mathrm{T}}\mathbf{J}_{1,1} &{}\quad \cdots &{}\quad Y_{1-n,2-m} \mathbf{J}_{1,2}^{\scriptscriptstyle \mathrm{T}}\mathbf{J}_{n,m}\\ \vdots &{}\quad \ddots &{}\quad \vdots \\ Y_{n-1,m-2} \mathbf{J}_{n,m-1}^{\scriptscriptstyle \mathrm{T}}\mathbf{J}_{1,1} &{}\quad \cdots &{}\quad Y_{0,-1} \mathbf{J}_{n,m-1}^{\scriptscriptstyle \mathrm{T}}\mathbf{J}_{n,m} \\ Y_{n-1,m-1} \mathbf{J}_{n,m}^{\scriptscriptstyle \mathrm{T}}\mathbf{J}_{1,1} &{}\quad \cdots &{}\quad Y_{0,0} \mathbf{J}_{n,m}^{\scriptscriptstyle \mathrm{T}}\mathbf{J}_{n,m} \end{array} \right) . \end{aligned}$$

Inverting matrix \(\mathbf{H}'\) is beyond our reach. However, the inversion of \(\mathbf{H}\) is affordable—at least approximately—because the linear system is sparse, although the number crunching problem is formidable. The authors propose strategies for the approximate \(\mathbf{H}'\) and recognize that human intervention (the artistic part) is in the end more needed in these coupled inversions than in regular uncoupled ones, as expected.

Although the improvement with respect to former techniques is clear and the procedure is opening a new avenue for physical inferences in the solar photosphere (see Fig. 24), a few unsatisfactory oscillations appear here and there in the application to actual observations. Such oscillations show up a caveat of the technique: the possible amplification of high frequencies and, hence, of noise. It can be argued that the spatially coupled inversions carry out convolutions instead of deconvolutions, but it is also true that any spurious, high-frequency signal may be compatible with the inverted models, provided it is washed out by the convolution with the PSF.

What is not clear either to the authors of the present review is the quite unexpected quantitative results in some applications. For example, van Noort et al. (2013) claim to have found magnetic field strengths of 7.5 kG at \(\log \tau = 0\) in some parts of the penumbra. These values can easily break the observational paradigm where fields stronger of 4 kG have very seldom been observed. Since they may represent a new paradigm, they should be accompanied by an estimate of uncertainties that is not present in the paper. First of all, the fits obtained for the V profile in their Fig. 3, seem to be different from the observations by at least an order of magnitude greater than the noise at several wavelengths. Such a fit cannot be considered satisfactory. But even more important is the fact that the quoted field strength corresponds to very deep layers in the atmosphere. As explained by Ruiz Cobo and del Toro Iniesta (1994), the second term in Eq. (30) rapidly tends to zero at low layers because the difference between the Stokes profiles and the source function vector quickly vanishes at these layers. In these circumstances, it is easy to see that values for the magnetic quantities at \(\log \tau = 0\) are extremely uncertain because the RFs go to zero (Eq. 42). Unless otherwise justified, those strong values at low layers cannot be interpreted but as (not-very-accurate) extrapolations of the magnetic field strength global stratification if SPINOR uses the equivalent response functions at the nodes of Sect. 7.2.1.Footnote 22 If instead, the RFs are the regular ones, then we are afraid that the (uncoupled) inversion strategy should be modified by changing the nodes to other places in the atmosphere with greater sensitivity to perturbations in the magnetic and dynamic physical quantities.

Fig. 24
figure 24

Stokes images in the wing (\(+\)5.7 pm) Fe i line at 630.25 nm before (left half) and after (right half) spatially coupled inversion for the Stokes parameters I (top left), Q (top right), U (bottom left) and V (bottom right). Image reproduced with permission from van Noort (2012), copyright by ESO

7.5.2 Regularized deconvolution inversions

A much simpler and computationally cheaper approach has been proposed by Ruiz Cobo and Asensio Ramos (2013). Based on the idea of Stokes profile expansion in terms of the principal components provided by a regular PCA technique, instead of deconvolving the Stokes profile images wavelength by wavelength, which is a very expensive and risky process,Footnote 23 deconvolution is applied to the PCA coefficient images. The resulting Stokes profiles after deconvolution are then inverted with SIR. The procedure is neat and simple, the uncertainties in the determination of physical quantities can be obtained through Eq. (42), and the possible overcorrections due to an excess in deconvolution can easily be controlled.

The idea is to assume that PCA expansions up to a degree D are valid to describe the profiles fully before they reach the telescope. Under such an assumption, the observed Stokes profiles can be written as

$$\begin{aligned} \varvec{I} (\lambda ) = \sum _{i=1}^{D} (\varvec{\omega }_i *\varvec{P}) \, \phi _i (\lambda ) + \varvec{N}, \end{aligned}$$

where \(\varvec{\omega }_i\) are the weights to the PCA eigenprofiles \(\phi _i (\lambda ),\,\varvec{P}\) stands for the spatial PSF of the instrument, and N is the noise (assumed independent of wavelength). One of the interesting features of this regularized deconvolution is that the noise contamination is largely minimized since the real signal is usually contained in the first few coefficients. Then, the \(4\times N\) images \((\varvec{\omega }_i *\varvec{P}) + \varvec{N}\) are deconvolved by a Richardson–Lucy algorithm (Richardson 1972; Lucy 1974). The resulting deconvolved Stokes profiles are then inverted with SIR.

The main objection one may find to this technique is the same as that for PCA, namely that the most, say, peculiar Stokes profiles, which indeed reveal very interesting physics, are not fully fit since one would need more PCA terms. The logical solution for such a problem would be to increase D but this may not be advisable as the final computation time would be too long. A practical circumvention can be provided by classification techniques (e.g., k-means clustering) that allow the grouping of the different Stokes profiles according to purely morphological criteria (MacQueen 1967; Steinhaus 1957; Lloyd 1982).Footnote 24 After such a classification, those peculiar profiles can be identified. One can then carry out the PCA expansion tailored to each group of profiles, hence optimizing the global performance. An increase in expansion terms may only be needed for small fractions of pixels in an image, thus keeping the whole inversion efficient.

7.5.3 Sparse inversions

An extremely interesting new generation of inversion methods has been proposed by Asensio Ramos and de la Cruz Rodríguez (2015). Based on the concept of sparsity or compressibility, this technique allows us to tackle the inversion of 2D maps—and potentially 3D data sets—all at once. The underlying idea of sparsity is the intrinsic redundancy of the data. That is, data can be projected to a parameter space where a reduced set of variables can fully describe that data set. Provided that a linear transformation exists between the data set and the new—small sized—parameter space, an affordable inversion of 2D maps can be carried out. In this first paper the authors present a 2D ME inversion based on a wavelet transformation of the model parameter space. They show that reducing the dimensionality of the model unknowns by a factor between three and five yields results comparable to—or even better than—pixel-to-pixel inversion.

Time saving is among the advantages of this kind of methods, along with their ability to easily compensate for the effects of the telescope PSF and the regularization of solutions introduced by the sparsity hypothesis. Among the drawbacks we find the apparent impossibility of using LM methods—since the Hessian matrix scales as the square of the number of free parameters. The authors suggest using proximal algorithms that can increase the convergence speed of the standard gradient descent method that it is currently used.

7.6 Summary of inversion techniques

This section summarizes in Table 1 all the past and current inversion techniques that have been proposed or are in use for solar physics. A distinction is made between those techniques that assume physical quantities that are constant with optical depth and those that allow the quantities to vary over the photosphere. LM stands for Levenberg–Marquardt, ANN for artificial neural networks, GA for genetic algorithm, B for Bayesian, and GD for gradient descent. The overwhelming majority of codes uses the Levenberg–Marquardt algorithm in order to find the minimum distance between the observed and the synthetic profiles.Footnote 25 \(^{,}\) Footnote 26 \(^{,}\) Footnote 27

Table 1 Inversion techniques

8 Discussion on inversion results

8.1 Increasing complexity in the model atmospheres

Following the approach of Sects. 3 and 4, we want to discuss in this section how a given set of Stokes profiles can be fit with several assumptions about the stratification of the model atmosphere physical quantities. This discussion sheds light on the ill-conditioning issue we reported on in Sect. 1: the same profile can be interpreted in several ways, depending on the complexity of the assumed model atmosphere; the only limiting factor for increasing such a complexity should be the noise in the observations and a reasonable dose of the principle of Occam’s razor. To illustrate our discussion we shall be using SIR to carry out all the necessary calculations, in a similar way to that followed by Bellot Rubio (2006). Given the SIR strategy explained in Sect. 7.2.1, the natural way to making a given physical quantity stratification more complex is by increasing the number of nodes and, hence, the polynomial degree of the spline that is assumed to describe such a stratification. Therefore, we shall be inverting the “observed” profiles with various sets of nodes for each of the different atmospheric quantities.

Fig. 25
figure 25

Left column \(v_{\mathrm{LOS}},\,B,\,\gamma \), and \(\varphi \) stratifications with optical depth for the observed model atmosphere (black lines) and for the resulting models from the mode 1 (green lines), mode 2 (red lines), and mode 3 (blue lines) runs of SIR. Gray shaded areas cover the uncertainty region of the mode 3 solution. Middle column the corresponding Stokes profiles. Right column differences between the observed and inverted Stokes profiles. The abscissa for both the middle and right columns shows the wavelength centered at 630 nm

Among all possible node combinations or modes, we have selected only six for the sake of simplicity. Each mode is characterized by the number of nodes, \(n_{\mathrm{B}}\), used for \(B,\,\gamma ,\,\varphi \), and \(v_{\mathrm{LOS}}\). The number of nodes for \(T,\,n_{\mathrm{T}}\), is higher by two nodes than that for the magnetic and dynamic quantities, except for mode 1 where \(n_{\mathrm{T}} = 2\). Mode 1 can be called “à la ME” because it has just \(n_{\mathrm{B}} = 1\). Since the starting guess model atmosphere has constant \(\varvec{B}\) and \(v_{\mathrm{LOS}}\), only constant values for these quantities can result from this inversion. Mode 2 has \(n_{\mathrm{B}} = 2\), so that linear stratifications are allowed in this mode. Mode 3 has \(n_{\mathrm{B}} = 3\); hence, parabolic stratifications can come out from SIR in this mode. Mode 4 has \(n_{\mathrm{B}} = 5\) and the stratification of the magnetic and dynamic quantities can be quartic. Mode 5 has \(n_{\mathrm{B}} = 7\) and the stratifications can be of order 6. Finally, mode 6 uses the automatic node selection algorithm described in Sect. 7.2.2.

We have built a penumbral model atmosphere after making up a bit one of the resulting models from inversion of a Hinode observation. We will call it hereafter the observed model. Our choice is driven by the shape of the Stokes profiles emerging from such an atmosphere. They are far from being typical even and odd functions of wavelength. The stratifications for \(v_{\mathrm{LOS}},\,B,\,\gamma \), and \(\varphi \) in the observed model are plotted (from top to bottom) with black lines in the left panels of Figs. 25 and 26. With this model atmosphere, we have synthesized the two Fe i lines at 630.1 and 630.2 nm, convolved them with the Hinode spectropolarimeter PSF, sampled with the instrument wavelength sampling interval, and finally added noise to a level of \(10^{-3} \, I_{\mathrm{c}}\). The so-obtained Stokes profiles will be called the observed Stokes profiles and are plotted in black lines in the middle panels of Figs. 25 and 26 (despite being barely discerned). Besides the observed model and profiles, both figures display the resulting models and fit profiles from the corresponding mode runs of SIR. Green, red, and blue lines correspond to modes 1, 2, and 3, respectively, in Fig. 25, and to modes 4, 5, and 6, respectively, in Fig. 26. The right panels of the two figures show the differences between the observed and the fit profiles in the corresponding colored lines. These differences provide a direct measure of the fit quality.

Fig. 26
figure 26

Same as in Fig. 25 but with red, green, and blue lines representing modes 4, 5, and 6, respectively

The à-la-ME inversion yields a fairly good fit although, as expected, it is unable to reproduce the asymmetries in the profiles. The typical misfit is never larger than 10%. The results given by the à-la-ME inversion coincide with the actual values at \(\log \tau _{\mathrm{c}} \simeq -1.5\). The exact coincidence takes place at different depths for each quantity but the important qualitative message to be extracted is that, in spite of its simplicity, the ME approximation is able to retrieve the atmospheric quantities at the mid-photosphere. Looking at the differences in the right panels of Fig. 25, the parity rules we commented on in Sect. 4 seem not to operate but the reason is clear: the ME model is still too far from the observed model for linearity to hold. Moreover, given the strong asymmetry shown by the profiles, the specific wavelength around which we should symmetrize or anti-symmetrize the profiles has to be calculated because it is certainly different from the nominal rest wavelength of the spectral line. The linear stratification mode does a better job as it provides a mean gradient with which asymmetries start to be reproduced. The fits clearly improve with the parabolic mode. Note that, besides having reduced the misfits, the stratifications of \(v_{\mathrm{LOS}},\,B,\,\gamma \), and \(\varphi \) are fairly well mimicked between \(\log \tau _{\mathrm{c}} = -3\) and \(\log \tau _{\mathrm{c}} = -0.5\). Note that the uncertainties evaluated with Eq. (42)—and displayed in the figure with shaded gray areas—indicate very well the range of reliability of the resulting stratifications. This is very important in practice where the observed model atmosphere is unknown. In our example, the uncertainties for the three components of the vector magnetic field are almost compatible with the linear stratification results. This is not the case for the LOS velocity where deviations are apparent and pave the road for more complex stratifications to improve the fits. In spite of this fact, the real gauge for deciding to proceed in the increase of nodes through the optical path is noise. As we have been discussing in many places in this paper, only if the differences between the observed and the synthetic profiles are larger than a few times the rms noise can we expect to obtain improvements with alternative model atmospheres. Since our noise still looks small enough when compared with the Stokes profile differences, we try with modes 4, 5, and 6. The retrieved model atmospheres are better than the former and, in particular, the uncertainty shaded areas in the top panels of Fig. 26 (they correspond to mode 5) indicate that indeed the range of reliability has extended up to \(\log \tau _{\mathrm{c}} = 0\). Notice that the size in the difference panels of the figure for mode 4 indicates that there is still some room to improve the fits, while mode 5 and 6 have profile differences compatible with the noise of the observations. Therefore, we cannot go any further (but indeed the fit quality is superb).

Fig. 27
figure 27

Area (\(\delta A\)) and amplitude (\(\delta a\)) asymmetry of the Stokes V profile as a function of the number of nodes for B. Dashed lines correspond to the asymmetries of the observed profile while shaded areas mark the uncertainties introduced by a noise of \(10^{-3}\, I_{\mathrm{c}}\)

This example illustrates the ability we can have to retrieve very complex stratifications when both the noise is low and asymmetries are present. The latter feature is indeed important. On the one hand, if no asymmetries are present in the observed Stokes profiles we can readily discard part of the complexity: no variations of the LOS velocity with optical depth are present. On the other hand, and very remarkably, asymmetries increase the amount of available information: if profiles are symmetric (either even or odd), half of them can be thrown away although the retrievals will be noisier (see Sect. 6.1).

Since Stokes profile asymmetries have driven most of the evolution of concepts in inversion techniques and, in general, in radiative transfer, a further check on the way our numerical experiments have been able to reproduce those asymmetries is in order. Let us consider the typical definitions for the Stokes V amplitude, \(\delta a\), and area, \(\delta A\), asymmetries:

$$\begin{aligned} \delta a \equiv \frac{a_{\mathrm{b}} - a_{\mathrm{r}}}{a_{\mathrm{b}} + a_\mathrm{r}}, \quad \delta A \equiv \frac{A_{\mathrm{b}} - A_{\mathrm{r}}}{A_{\mathrm{b}} + A_{\mathrm{r}}}, \end{aligned}$$

where \(a_{\mathrm{b}}\) and \(a_{\mathrm{r}}\) stand for the amplitudes of the Stokes V blue and red lobes, and \(A_{\mathrm{b}}\) and \(A_{\mathrm{r}}\) do for the unsigned areas of those lobes, respectively. Figure 27 describes the performance of the different modes in reproducing \(\delta a\) and \(\delta A\). Obviously, mode 1 shows zero asymmetries. Mode 2 approaches significantly the amplitude asymmetry but not the area one in this example. Mode 4 almost reproduce both quantities. The noise level (represented by the horizontal shaded areas) is so low, however, that only mode 5 and 6 provide an exact result.

Fig. 28
figure 28

Same as Figs. 25 and 26 but for a simulated IMaX observation

Certainly, if we decrease the amount of information by, for example, decreasing the number of wavelength samples and/or the polarization signal (because of the magnetic field being weaker) we cannot retrieve such complex model atmospheres any longer or, said otherwise, the range of reliability of the results will narrow down significantly so that low-order polynomial approximations are enough to give account of the observations. To exemplify this case, we have simulated a typical Sunrise/IMaX observation in mode V5-6, that is, five wavelength samples and six accumulations. The IMaX Fe i line at 525.02 nm was synthesized in a quiet-sun model atmosphere (again taken from the actual observations), convolved with the IMaX PSF, sampled at \(-8,\,-4\), 4, 8, and 22.7 pm from line center, and added noise at a level of \(10^{-3}\, I_{\mathrm{c}}\). Since \(Q_{\mathrm{c}} = U_{\mathrm{c}} = V_{\mathrm{c}} = 0\) except perhaps in cases of very large LOS velocities, we just count in a total of \(5\times 4-3=17\) observables and, consequently, cannot afford to retrieve more than 17 unknowns. These 17 free parameters would cope with our mode 3 (five nodes for T and three nodes for \(v_{\mathrm{LOS}},\,B,\,\gamma \), and \(\varphi \)). Indeed, this number is too high in practice and can only be reached in cases of strong asymmetries for the reasons we have just explained above: symmetries in the profiles reduce the degrees of freedom. Let us then consider mode 2 as the maximum achievable run and invert the observed profiles. Figure 28 is similar to Figs. 25 and 26 and the color codes are the same.Footnote 28 The conclusions are clear, the à-la-ME mode provides fair values, and the linear approximation gives a reliable gradient on the physical quantities at around \(\log \tau _{\mathrm{c}} = -1.5\).

8.2 Inversion retrievals of weak fields

The reliability of inversion retrievals from zones with weak fields is a continuous matter of debate. Concerns are often published with different levels of arguments. As in any other aspect of life, criticism is always more prevalent than praise in any community. This is the case when discussing the ability of spectropolarimetric observations to distinguish weak fields and their inclinations. Most discussions are strongly biased by the fairly common misconception of Stokes V being proportional to the longitudinal component of the magnetic field. We have shown in Sect. 3.1.2 that this approximation is valid only for a very limited range of values, and that the important observational parameter when dealing with weak signals is noise. Stokes profiles other than V also provide information about \(\varvec{B}\). As long as the signal is not buried by noise, radiative transfer is powerful enough to provide sufficiently accurate magnetic quantities.

Sometimes the criticisms, are not correctly interpreted. An example is the evidence shown by Martínez González et al. (2006) that the pair of Fe i lines at 630 nm is not able to provide a single model for a scenario in which two depth-dependent atmospheres, one magnetic and another non-magnetic, fill a spatial resolution element of about 1\(^{\prime \prime }\). This true result has often been interpreted as if the famous pair of lines were unable to provide a reliable inference of the vector magnetic field, and that only infrared lines were valid for such a diagnostic. del Toro Iniesta et al. (2010) explained that the scenario used by the former authors was perhaps too complicated for the available information. That is, that the visible line profiles are not enough to cope with the number of free parameters in a two-component, depth-dependent atmosphere. A simpler model atmosphere with just one magnetic component (and the other non-magnetic) may fit the profiles well enough. The latter authors gave both theoretical and observational arguments to defend the hypothesis that visible lines are reasonable diagnostics even for weak magnetic fields, in spite of infrared lines being more sensitive.

Later, Borrero and Kobel (2011, see also 2012, 2013) raised doubts about the retrievals of fairly inclined fields when the polarization signals are very weak because they come from quiet, internetwork regions (e.g., Orozco Suárez et al. 2007b; Lites et al. 2008). In our opinion, again, their claims are partial misinterpretations of the results, as we try to demonstrate with the calculations that follow.

Consider the pair of Fe i lines at 630 nm sampled as in the Hinode (Kosugi et al. 2007) spectropolarimeter (Lites et al. 2001). We have synthesized these lines in a quiet-Sun model for constant magnetic field strengths of 10, 15, 20, 25, 40, 50, 60, 75, 90, and 100 G. The inclination and azimuth may take any value but preserve an isotropic distribution, so that each of these values is equally probable. The LOS velocity has been set to zero for all the profiles. One thousand Stokes profile sets have been calculated for each value of B. Once synthesized, white noise has been added to the profiles with a standard deviation of \(10^{-3}\) or \(3.3 \cdot 10^{-4} I_{\mathrm{c}}\), simulating \(\mathrm{S/N} = 1000\) or 3000.Footnote 29 The synthetic profiles were then inverted with SIR à-la ME (two nodes in T and one node for the rest of parameters). The results are summarized in Fig. 29.

Fig. 29
figure 29

Left panel output magnetic field strength from the inversion as a function of input strength. Colors correspond to the two different values of the S/N. The dashed line marks the bisector of the first quadrant. Error bars represent rms values from the 1000 inversions. Right panel distributions of the output field inclinations from the \(\mathrm{S/N = 1000}\) inversions. Colors correspond to values of the field strength indicated in the inset. The input distribution of inclinations is represented by the dashed line

Fields weaker than 75 G (25 G) from the \(\mathrm{S/N = 1000}\) (3000) inversions tend to be overestimated but in none of the cases is the excess output such that it would retrieve too strong a magnetic field. As a matter of fact, the results illustrate very well the fair reliability of the B inference in practically all circumstances. Again, when noise decreases the inversion results improve. The magnetic inclination results, represented in the right panel of the figure, are also very illustrative of what is going on. Only results from the \(\mathrm{S/N} = 1000\) experiments are plotted. One can clearly see that the weakest fields tend to result in an excess of inclined fields. This is, however, an expected result that has nothing to do with any special inability of the visible lines or with the sought after proportionality between Stokes V and the longitudinal component of the magnetic field. It is rather a consequence of the noise dominating the polarization signals. When the field is very weak, Stokes V is very small, barely exceeding the noise level. At the same time, Stokes Q and U (which should theoretically be zero) simply show noise. Since the V signals are not sufficiently larger than the linear polarization signals, the inversion code has no other option than to interpret the observations as very inclined magnetic fields: it is mostly fitting noise in Q and U. The situation clearly improves as the field strength increases. The inclination distribution is well recovered for \(B=100\,\)G fields, even when S/N is only 1000.

Let us now consider a distribution of magnetic field strengths according to the probability density function (PDF) obtained by Orozco Suárez et al. (2007b) from Hinode observations. With these field strengths and an isotropic inclination distribution such as that for Fig. 29, we have synthesized 10,000 Stokes profiles to which white noise of rms amplitude of \(\sigma = 10^{-3} \cdot I_{\mathrm{c}}\) was added. À-la ME inversions with SIR have been carried out. Both the inputs (black lines) and the results (red lines) are plotted in the upper panels of Fig. 30. Fields weaker than 20 G are slightly overestimated, but above 60 or 70 G the strength PDF is very nicely recovered. The inclination PDF shows an excess of horizontal fields in detriment of the more vertical ones. The same PDFs are shown in the lower panels for a selection of pixels where the maximum polarization signal (\(\mathrm{max} \{|V|,\sqrt{(Q^{2}+U^{2})} \}\)) is greater than \(4\sigma \). As one can clearly see, fields weaker than 10 G and a good portion of horizontal fields almost disappear. The inversions react very well. The underestimation of small inclinations and the excess of large ones can be attributed to insufficient S/N. When the experiments are repeated with higher signal-to-noise ratios, the PDFs agree accordingly better.

Fig. 30
figure 30

Upper panels magnetic field strength (left) and inclination (right) PDFs. B follows the lognormal PDF from Orozco Suárez et al. (2007b) with \(B_{0} = 36.7\) and \(\sigma = 1.2\). Inclinations follow the same random, isotropic distribution as for Fig. 29. Bottom panels same as the upper ones but only for those points where the polarization signal is higher than a threshold (see text for details). Black lines correspond to the input and red lines to the output from à-la ME inversions

9 Conclusions

The inversion of the radiative transfer equation has been presented as a topological problem that maps the space of observables, the Stokes parameters, onto the space of the object physical quantities. The dependences of such a mapping on the definition of the two spaces implies a number of assumptions that are explicitly or implicitly made by any inference technique, regardless of it being called an inversion or not. Such assumptions determine to a great extent the uncertainties in the astronomical inferences, which depend on both the measurement errors and the analysis technique.

In the observational space, one has to select the parameters to be measured and the level of noise with which such measurements are carried out. Signals (measured parameters) are useful insofar they vary after a modification in the object physical quantities. For the variation to be detectable it should be higher than the noise. If the signal does not change above noise levels after a perturbation in the physical quantities, then it is useless and must be discarded. In the object physical space, the number of quantities and their assumed stratification with depth in the atmosphere are the key variables. If a physical quantity at a given depth in the atmosphere produces no measurable effect on the Stokes spectrum, then this quantity should not be looked for through the inversion process. The number of these physical quantities should not exceed the number of observables.

The mapping between the two spaces is nothing but radiative transfer. Depending on the spectral line and the way we measure it (that is, the number of wavelength samples, the width of the spectral PSF, etc.) the transfer can be studied through the full non-LTE problem, the LTE approximation or, rather, through further simplifications such as the Milne–Eddington approximation, the weak field approximation, etc. Strictly speaking, no available inversion technique deals with the full non-LTE problem and the only non-LTE code, NICOLE, relies on several approximations such as the fixed departure coefficient approximation or the field-free approximation in order to make the numerical problem tractable.

We have provided arguments in favor of proceeding through a step-by-step approach in which the complexity of the problem increases sequentially until convergence has been reached. In this sense we strongly recommend initializing inversions with classical estimates of B and \(v_{\mathrm{LOS}}\) as provided by the center of gravity technique, and estimates of \(\gamma \) and \(\varphi \) as provided by the weak field approximation. The criterion for convergence has to be established in terms of noise: if the (rms) difference between observed and synthetic Stokes profiles is less than the typical noise of observations, increasing complexity in the object physical description adds no information. In this regard, we have given both conceptual and technical arguments for the MISMA hypothesis and inversion technique to be abandoned. At the other extreme of the “complexity spectrum”, the weak field approximation must only be used with much care and mainly for very broad chromospheric lines. Nothing in the transfer equation indicates that Stokes V is proportional to \(B \cos \gamma \). Only some matrix elements of K are. After integration through the atmosphere, the proportionality is most probably lost. Moreover, the information provided by the other Stokes parameters helps in disentangling the magnetic field strength from the inclination. In particular, Stokes I very soon departs from the zero field conditions that are strictly necessary for the weak field approximation to apply.

The step-by-step approach we suggest (and which is indeed implemented in the SIR inversion code) is to be preferred for two reasons: on the one hand, the atmospheric stratification of physical quantities can be described by a Taylor-expansion-like method where it is assumed to be first constant, then linear, later quadratic, and so on; on the other hand, the Stokes profiles (I in line depression), as functions of the wavelength, belong to \(\mathbbm {L}^2\) and, hence, can be expanded in terms of orthonormal bases such as those provided by the Hermite functions or those built for PCA techniques.

The various algorithms used for the inversion problem have been reviewed. They include the most widespread Levenberg–Marquardt algorithm, database search inversions, artificial neural networks, genetic algorithms, and Bayesian inferences. The most promising techniques for the near future, namely those which include spatial degradation by the telescope, are also discussed. Among these we find the so-called spatially-coupled inversions by van Noort (2012), the regularized deconvolution inversions by Ruiz Cobo and Asensio Ramos (2013), and the sparse inversions by Asensio Ramos and de la Cruz Rodríguez (2015). Suggestions for improving their performance and reliability are also given.

This paper ends with a discussion on a pair of topics that might seem controversial. The first one is a description of how the current implementation of SIR deals with a step-by-step approach. The sequential improvement on the fits is explicitly shown. The second topic discusses the reliability of weak field retrievals. The idea of a theoretical inability of Zeeman-sensitive spectral lines in the visible for inferring weak fields accurately is refuted. Instead, the root of the problem is shown to be in the signal-to-noise ratio of the observations. If noise is suitably low, then radiative transfer provides the necessary tools for accurate retrievals. Of course, uncertainties will be proportionally larger than when signals are bigger (i.e., stronger fields), but there are no reasons for not trusting the inversion results.