Determination of stellar parameters for Ariel targets: a comparison analysis between different spectroscopic methods

Ariel has been selected as the next ESA M4 science mission and is expected to be launched in 2028. During its 4-year mission, Ariel will observe the atmospheres of a large and diversified population of transiting exoplanets. A key factor in achieving the scientific goals of Ariel is the selection strategy used to define the input target list. A meaningful choice of targets requires accurate knowledge of the properties of the planet-hosting stars, and this knowledge must be obtained well before launch. In this work, we present the results of a benchmarking analysis of three different spectroscopic techniques used to determine stellar parameters for a selected number of targets belonging to the Ariel reference sample. We aim to consolidate a method that will be used to homogeneously determine the stellar parameters of the complete Ariel reference sample. Homogeneous, accurate and precise derivation of stellar parameters is crucial for characterising exoplanet-host stars and, in turn, is a key factor for the accuracy of the planet properties.

discovered in the next few years by Gaia [33,51], TESS [38], CHEOPS [11] and the upcoming space surveys, such as PLATO [37], along with ground-based surveys, like WASP [35], NGTS [66], TRAPPIST [24], HARPS [29] and ESPRESSO [32], CARMENES [36] and SPIRoU [6]. The recent success of ground-based, space transit and radial velocity searches has ushered exoplanet research into an era of characterisation studies with the goal to investigate the nature, formation, and evolutionary history of the detected objects.
Initially, these studies focused on understanding the internal structure of exoplanets. The planet radius can be measured from the transit light-curve, and the planet mass from spectroscopic Doppler measurements. The bulk density then gives the first hints of the internal structure of the exoplanet and of its gas/ice/rock ratios. However, a reliable estimate of the density requires accurate knowledge of both the planetary radius (with a precision of up to 5%) and the planetary mass (up to 10%) [8,65]. Additionally, the absolute values of the planetary radius and mass rely on the precise determination of the radius and mass of the exoplanet-host star. The derivation of these two stellar values is in turn strongly connected to the effective temperature (T eff), surface gravity (log g), and metallicity of the star. Thus, the planetary properties are critically dependent on the properties of their host stars [2,50,57].
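As a rough illustration of why both precisions matter, the fractional uncertainty on the bulk density (proportional to M/R³) can be propagated to first order from the fractional uncertainties on mass and radius. The sketch below assumes independent Gaussian errors; the function name is illustrative and is not taken from the cited works.

```python
import math

def density_fractional_error(frac_err_mass, frac_err_radius):
    """First-order fractional error on rho = M / (4/3 * pi * R^3).

    Assumes independent errors. The radius enters to the third power,
    so its fractional error is weighted by a factor of 3.
    """
    return math.sqrt(frac_err_mass ** 2 + (3.0 * frac_err_radius) ** 2)

# Precisions quoted in the text: 10% on the mass and 5% on the radius.
frac = density_fractional_error(0.10, 0.05)
print(f"fractional density error: {frac:.3f}")
```

Under these assumptions, the quoted precisions correspond to a density uncertainty of roughly 18%, dominated by the radius term.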
Furthermore, transiting planets offer one of the best opportunities to characterise planetary atmospheres. In-transit spectroscopy as well as secondary transit studies [12,20,34,44,54,62,68] using space observatories, such as Hubble and Spitzer, and some ground-based observatories have yielded the detection of some important molecules in the planetary atmospheres of a limited number of targets, or have identified the presence of clouds, probing the thermal structure and providing some constraints on the planet properties. However, the available data are still too sparse to provide a consistent interpretation, and the results achieved highlight the main limitations of the existing facilities: very narrow wavelength coverage, observations over a wider spectral range that are usually not simultaneous and therefore introduce systematic noise, insufficient time allocated to exoplanet science, and, more generally, the lack of a dedicated space-based exoplanet spectroscopy mission. Thus, our current knowledge of exoplanetary atmospheric and thermal characteristics is still very limited.
Ariel (Atmospheric Remote-sensing Infrared Exoplanet Large-survey) has been selected as the next ESA M4 science mission [55] and is expected to be launched in 2028. During its 4-year mission, Ariel will observe the atmospheres of a statistically representative sample (∼1000) of transiting gaseous (Jupiters, Saturns, Neptunes) and rocky (super-Earths and sub-Neptunes) planets using transit spectroscopy in the 1.10-7.8 μm spectral range and photometry in three narrow bands in the optical. The proposed wavelength range covers all the expected major atmospheric gases, from H2O, CO2, CH4, NH3, HCN and H2S up to the more exotic metallic compounds, such as TiO and VO, and condensed species. Ariel is designed as a dedicated survey mission for transit, eclipse and phase-curve spectroscopy, providing a homogeneous dataset with a consistent pipeline and a well-defined target selection strategy, maximising the scientific yield. Transit and eclipse spectroscopy are based on the differential analysis of the star and planet spectra in and out of transit, which allows planetary atmospheric signals of 10-100 ppm relative to the star to be measured. Such small signals require knowledge of the host star spectrum at least at the same level as the planetary signal, in order to map any intrinsic stellar variation (i.e. due to magnetic activity and convective turbulence) and avoid confusing it with planetary features [39].
Another important point is that information on the host star composition is critical to separate the signatures left on the planet by its formation, evolution and migration processes, from those due to the specific chemistry of the host star [63]. Indeed, recent studies suggest that planetary O/H, C/H, C/O ratios and metallicity with respect to the stellar values could provide stronger constraints on the planet formation region and their migration mechanisms (see [27] and references therein for a recent review), but similar considerations apply to other elements (e.g., N, S, Ti, Al, [63]).
Finally, an increasing number of studies have pointed towards the existence of correlations between the properties of the host stars and the characteristics and frequency of their planetary systems. In this respect, the correlation between the stellar metallicity and the frequency of giant planets [41,49,64], the connections between radius and metallicity [13,43] and between eccentricity and metallicity [3,67], and the role of the abundances of other elements in the host stars [4,16,17] are only a few examples of results that are taking a clearer shape as new planet discoveries accumulate, shedding light on many still-missing details of planet formation and evolution.
Such works rely upon homogeneously and precisely derived stellar parameters. Therefore, homogeneous derivation of stellar parameters using high-quality data is crucial for characterising exoplanet-host stars [5,42,47], and in turn, is fundamental to improve the accuracy of the planet properties.
It is important to underline how a well-defined target selection strategy and the definition of the input target list with a statistically significant dataset in the range of relevant stellar/planet parameters have a fundamental role for maximising the scientific yield and the achievement of scientific goals of Ariel. A meaningful choice of the targets requires an accurate study of the stellar properties that need to be derived in advance and continuously updated as the mission approaches launch and the target list evolves with the new exoplanet discoveries.
In this context, we have started a benchmarking analysis between three different spectroscopic techniques used to determine stellar parameters for selected targets belonging to the Ariel Reference Sample [19]. Our goal is to consolidate a method that will be applied to homogeneously determine the stellar parameters for the complete Ariel Reference Sample. More generally, we refer also to the works of [26] and [45] where comparisons between the results from different spectroscopic techniques are discussed.
For a global approach to characterising the stars of the Ariel Reference Sample, see also [15], which gives an overview of the methods used to determine stellar fundamental parameters, elemental abundances, activity indices, and stellar ages for the Ariel Reference Sample and, in particular, presents results for the homogeneous estimation of the elemental abundances of Al, Mg, Si, C and N and of the activity indices S and log(R' HK).
In the following sections we describe the star sample analysed, we give an overview of the methodology applied to derive the stellar parameters and then we present a comparative analysis of the results.

Star sample
On the basis of the Ariel capabilities, a list of targets (the Ariel Reference Sample) to be observed during the primary mission life was prepared for Phase A [19,69]. The sample includes ∼1000 potential targets with FGKM host stars (typically brighter than K = 11 mag) and planets ranging in size from Jupiter down to Earth-like planets, with temperatures between 500 and 2500 K and bulk densities between 0.10 and 10.0 g/cm³.
We started our analysis by cross-matching the Ariel Reference Sample with the stars available in SWEET-Cat (Stars With ExoplanETs Catalogue; https://www.astro.up.pt/resources/sweet-cat/) and selected all the common targets with the parameter source "flag=1", which marks the stars analysed homogeneously (see [42]). Our starting sub-sample thus includes 155 FGK stars in the range 5<V(mag)<16, whose spectra have a signal-to-noise ratio (S/N) between ∼50 and ∼800 (see Fig. 1). The stellar parameters available in SWEET-Cat for these 155 stars were taken as the baseline for our comparison analysis. A detailed description of the spectroscopic data used to derive the SWEET-Cat stellar parameters is presented in [47] and [42].
We then re-determined the stellar parameters of this sub-sample using two different methods: FAMA (Fast Automatic MOOG Analysis, [28]), based on equivalent-width analysis, and FASMA (Fast Analysis of Spectra Made Automatically, [59]), based on spectral synthesis. For our work, we used the same archival spectra employed for the SWEET-Cat analysis. The characteristics of each spectrograph and the number of stars considered in our analysis for each instrument are listed in Table 1. Excluding the stars for which the FAMA and FASMA codes did not converge (spectra dominated by fringing, very low S/N, fast rotators, very cool stars), we obtained results for about 93% of the sample.
The individual stellar parameters derived by FAMA and FASMA for each analysed star are provided as online material in the format of Table 2.

SWEET-Cat
SWEET-Cat is a catalogue of stellar parameters, taken in general from the literature, for the planet-hosting stars listed in the Extrasolar Planets Encyclopaedia (http://exoplanets.eu/). The catalogue is continuously updated as new planets are announced and new stellar parameters are derived. In particular, for stars with spectra acquired with high-resolution spectrographs (mainly HARPS, UVES, FEROS) and high signal-to-noise ratios (mostly S/N>100), the stellar parameters are obtained in a homogeneous way (these targets are flagged with 1, as mentioned above). The method used to derive the stellar parameters is described in, e.g., [41] and [49]. Briefly, local thermodynamic equilibrium (LTE) is assumed, and a grid of Kurucz plane-parallel model atmospheres (ATLAS9, [25]) is used together with the ARES code [48], which measures the line equivalent widths (EWs). The stellar parameters (effective temperature T eff, surface gravity log g, microturbulence velocity V micro, and iron abundance [Fe/H]) are derived using the MOOG code (version 2002; [46]) and the line list of [49], by imposing excitation and ionization equilibrium on the weak Fe I and Fe II lines. The errors on the atmospheric parameters were derived as in [41] and references therein. The stellar masses listed in SWEET-Cat were derived with the calibration presented in [56], using the spectroscopic parameters as input. Following [42], a correction was applied in the cases in which the calibration gives values between 0.7 and 1.3 M⊙. The errors on these mass values are computed as in [42] by means of a Monte Carlo analysis: for each star, 10 000 random values of effective temperature, surface gravity, and stellar metallicity were drawn from Gaussian distributions, and the central value of the mass and its 1σ uncertainty were derived from the resulting mass distribution.
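The Monte Carlo error estimate on the stellar mass can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_mass_calibration` is a made-up placeholder and is not the calibration of [56], the input uncertainties are treated as independent, and we take the median and the standard deviation of the mass distribution as the "central value" and the 1σ uncertainty, which the text does not specify.

```python
import random
import statistics

def toy_mass_calibration(teff, logg, feh):
    """Placeholder mass relation in solar masses (NOT the calibration of [56])."""
    return (teff / 5777.0) ** 0.7 * 10 ** (0.1 * feh) * 10 ** (-0.05 * (logg - 4.44))

def monte_carlo_mass(teff, logg, feh, e_teff, e_logg, e_feh,
                     calibration=toy_mass_calibration, n_draws=10_000, seed=42):
    """Propagate Gaussian parameter errors through a mass calibration.

    Returns (median, standard deviation) of the resulting mass distribution.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    masses = [
        calibration(rng.gauss(teff, e_teff),
                    rng.gauss(logg, e_logg),
                    rng.gauss(feh, e_feh))
        for _ in range(n_draws)
    ]
    return statistics.median(masses), statistics.stdev(masses)

# Illustrative solar-like input values and uncertainties (made up).
mass, e_mass = monte_carlo_mass(5777.0, 4.44, 0.0, 50.0, 0.10, 0.05)
print(f"M = {mass:.2f} +/- {e_mass:.2f} Msun")
```

Passing the calibration as an argument keeps the sampling logic independent of the specific mass relation, so the placeholder can be swapped for the real calibration and its correction.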

FAMA analysis
The aim of FAMA is to compute the atmospheric parameters and abundances of a large number of stars from measurements of equivalent widths (EWs), in a way that is as automatic and as free of subjective choices as possible. It is based on the simultaneous search for three equilibria: excitation equilibrium, ionization balance, and the relationship between log n(FeI) and EW/λ. FAMA also evaluates the statistical errors on individual element abundances and the errors due to the uncertainties in the stellar parameters. The convergence criteria are not fixed a priori but are based on the quality of the spectra. The code is described in [28]. For our work, we first measured the EWs with DOOp (DAOSPEC Option Optimiser, [14]), an automatic tool developed within the Gaia-ESO survey. DOOp is based on the DAOSPEC code [53] and uses Gaussian fits to measure EWs. We adopted the line list of the Gaia-ESO survey [23], which includes atomic parameters (log gf and damping coefficients) for a large number of lines in the spectral range 4200-6800 Å. We adopted the MARCS model atmospheres (plane-parallel and spherical) [22].
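The three equilibrium conditions can be expressed as diagnostics computed from the line-by-line iron abundances. The sketch below is not FAMA code: the function names, the input layout, and the fixed tolerances are hypothetical (FAMA itself ties its convergence criteria to the quality of the spectra), and the reduced equivalent width is taken as log10(EW/λ), as is conventional.

```python
def slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def equilibrium_diagnostics(fe1_lines, fe2_abundances):
    """Diagnostics for the three equilibria.

    fe1_lines: list of (excitation potential in eV, log10(EW/lambda), log n(Fe I)).
    fe2_abundances: list of log n(Fe II) values.
    """
    ep = [line[0] for line in fe1_lines]
    rew = [line[1] for line in fe1_lines]
    abund = [line[2] for line in fe1_lines]
    mean_fe1 = sum(abund) / len(abund)
    mean_fe2 = sum(fe2_abundances) / len(fe2_abundances)
    return {
        "excitation_slope": slope(ep, abund),    # mainly sensitive to T eff
        "rew_slope": slope(rew, abund),          # mainly sensitive to microturbulence
        "ionization_diff": mean_fe1 - mean_fe2,  # mainly sensitive to log g
    }

def is_converged(diag, tol_slope=0.01, tol_diff=0.05):
    """Hypothetical fixed tolerances, used here only for illustration."""
    return (abs(diag["excitation_slope"]) < tol_slope
            and abs(diag["rew_slope"]) < tol_slope
            and abs(diag["ionization_diff"]) < tol_diff)

# Made-up line data with a residual trend against excitation potential.
lines = [(1.0, -5.2, 7.45), (2.0, -5.0, 7.50), (3.0, -4.9, 7.55), (4.0, -5.1, 7.60)]
diag = equilibrium_diagnostics(lines, [7.50, 7.52])
print(diag, is_converged(diag))
```

In an iterative analysis, the stellar parameters would be adjusted until all three diagnostics fall within the chosen tolerances.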

FASMA analysis
The analysis with FASMA is based on the spectral synthesis technique, wrapped around the radiative transfer code MOOG (version 2019; [46]). FASMA creates synthetic spectra on the fly and delivers the best-fit parameters (effective temperature, surface gravity, iron abundance, and projected rotational velocity) through a non-linear least-squares fit (Levenberg-Marquardt algorithm). The line list consists mainly of iron lines initially obtained from VALD3 [40]. The regions of spectral synthesis are defined as small intervals (±2 Å) around these iron lines. The atomic data are calibrated to match the spectra of the Sun and Arcturus but, given the stronger constraints on the solar parameters, we gave higher weight to the Sun. The damping parameters are based on the ABO theory [7] when available; otherwise, we use the Blackwell approximation. The model atmospheres are interpolated from the ATLAS grid [25] in LTE. The minimization is a two-step procedure: we first obtain the best-fit values using fixed solar macro- and microturbulence, and then refine the results in a second step with macro- and microturbulence updated from empirical relations. The macroturbulence velocity is set by the calibration of [18], and the microturbulence by the calibrations for either dwarfs [60] or giants [1]. The uncertainties are derived from the covariance matrix constructed by the non-linear least-squares fit. The methodology is described in detail in [58,59], where it is tested on both giant and dwarf samples, including stars with higher rotation.
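To illustrate how a Levenberg-Marquardt fit yields both best-fit parameters and uncertainties from the covariance matrix, the following sketch fits a toy Gaussian absorption line within a ±2 Å window. It is not FASMA's implementation: the two-parameter line model stands in for a MOOG synthetic spectrum, and all names, starting values and noise levels are assumptions made for the example.

```python
import math
import random

def model(x, depth, sigma):
    """Toy absorption line: unit continuum minus a Gaussian profile."""
    return 1.0 - depth * math.exp(-0.5 * (x / sigma) ** 2)

def jacobian_row(x, depth, sigma):
    """Partial derivatives of the model with respect to (depth, sigma)."""
    g = math.exp(-0.5 * (x / sigma) ** 2)
    return -g, -depth * g * x ** 2 / sigma ** 3

def chi2(xs, ys, p):
    return sum((y - model(x, *p)) ** 2 for x, y in zip(xs, ys))

def normal_equations(xs, ys, p):
    """Elements of J^T J = [[a, b], [b, c]] and of J^T r = (g0, g1)."""
    a = b = c = g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        j0, j1 = jacobian_row(x, *p)
        r = y - model(x, *p)
        a += j0 * j0
        b += j0 * j1
        c += j1 * j1
        g0 += j0 * r
        g1 += j1 * r
    return a, b, c, g0, g1

def levenberg_marquardt(xs, ys, p0, n_iter=50, lam=1e-3):
    p, current = tuple(p0), chi2(xs, ys, p0)
    for _ in range(n_iter):
        a, b, c, g0, g1 = normal_equations(xs, ys, p)
        # Damped normal equations: (J^T J + lam * diag(J^T J)) delta = J^T r
        ad, cd = a * (1.0 + lam), c * (1.0 + lam)
        det = ad * cd - b * b
        trial = (p[0] + (cd * g0 - b * g1) / det,
                 p[1] + (ad * g1 - b * g0) / det)
        trial_chi2 = chi2(xs, ys, trial)
        if trial_chi2 < current:   # accept the step and reduce the damping
            p, current, lam = trial, trial_chi2, lam / 10.0
        else:                      # reject the step and increase the damping
            lam *= 10.0
    # Covariance matrix: residual variance times the inverse of J^T J.
    a, b, c, _, _ = normal_equations(xs, ys, p)
    det = a * c - b * b
    s2 = current / (len(xs) - 2)
    errors = (math.sqrt(s2 * c / det), math.sqrt(s2 * a / det))
    return p, errors

# Synthetic "observed" line with made-up depth, width and noise.
rng = random.Random(1)
xs = [-2.0 + 0.05 * i for i in range(81)]
ys = [model(x, 0.5, 0.3) + rng.gauss(0.0, 0.005) for x in xs]
best, errs = levenberg_marquardt(xs, ys, p0=(0.3, 0.5))
print(best, errs)
```

The damping factor interpolates between a Gauss-Newton step (small damping) and a short gradient-like step (large damping), which is what makes the algorithm robust to a poor starting guess.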

Comparison among the methods
We compare our results, looking for possible outliers and trends among the parameters obtained by the three methods. In this preliminary work, we explore the results of the three analyses and compare them with external benchmarks, such as the surface gravities derived from light-curves in the literature and from Gaia parallaxes and photometry, to identify the ranges in which each method performs best.
The bulk of the T eff results for SWEET-Cat and FAMA is in good agreement (within 1σ), although there are some outliers: 28 stars show a difference in T eff above the 1σ level but still within ∼417 K (3σ level). Among these outliers, 7 stars have a large v sin i (>10 km/s). The T eff spread is more evident for stars at the lower and upper ends of the considered T eff range. In addition, there is an offset between the two methods, with a mean difference (red dashed central line) of ∼70 K.
For the FASMA analysis, the bulk of the T eff results is likewise in good agreement with SWEET-Cat (within 1σ), although outliers are also present. For stars at the lower and upper ends of the T eff range of interest, the dispersion between the two methods increases. An offset between the two methods is present, with a mean difference (red dashed line) of ∼34 K. We note that some of these outliers are not in common with the outliers found using the FAMA method.
Figure 2 (central panels) shows the comparison of log g derived using the FAMA (left panel) and FASMA (right panel) methods with the results given in SWEET-Cat. The colour scale represents the SWEET-Cat T eff. There are trends in both the FAMA and FASMA results: the differences between their log g and those of SWEET-Cat increase with log g. The differences are particularly large at high surface gravity (log g > 4.6 dex). In the log g range of ∼4-4.6 dex, the trend is almost negligible.
In addition, FAMA gives, on average, lower gravities, with a mean offset of about −0.18 dex. One of the most important results of the analysis, which we wish to emphasise, is that despite the differences in the derived stellar parameters, the three methods converge to very similar final metallicities [Fe/H]. Indeed, the three methods rely on different line lists, with different sets of atomic data, which makes them converge on slightly different sets of T eff and log g values that satisfy the conditions of excitation equilibrium and ionization balance. T eff and log g tend to vary along a local minimum, along which the value of [Fe/H] usually remains more nearly constant.
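The mean offsets and the outlier counts discussed in this section could be computed along the following lines. This is a sketch under assumptions: we take σ to be the standard deviation of the star-by-star differences and measure outliers relative to the mean offset, neither of which is stated explicitly in the text; the sign convention (method minus reference) and the input values are also illustrative.

```python
import statistics

def compare_methods(reference, other, n_sigma=1.0):
    """Mean offset, scatter and outliers of `other` relative to `reference`.

    Both inputs list the same parameter (e.g. T eff) for the same stars,
    in the same order.
    """
    diffs = [o - r for r, o in zip(reference, other)]
    mean_offset = statistics.mean(diffs)
    scatter = statistics.stdev(diffs)
    # Indices of stars deviating from the mean offset by more than n_sigma * scatter.
    outliers = [i for i, d in enumerate(diffs)
                if abs(d - mean_offset) > n_sigma * scatter]
    return {"mean_offset": mean_offset, "scatter": scatter, "outliers": outliers}

# Made-up T eff values (K) for four stars, for illustration only.
teff_reference = [5000.0, 5500.0, 6000.0, 6500.0]
teff_other = [5050.0, 5560.0, 6070.0, 6900.0]
print(compare_methods(teff_reference, teff_other))
```

The same function applies unchanged to log g or [Fe/H], and `n_sigma=3.0` gives the 3σ envelope.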

SWEET-Cat vs FAMA/FASMA: [Fe/H]
In Fig. 3, we show the differences in metallicity between the FAMA and FASMA results and SWEET-Cat, as a function of the differences in the derived surface gravities. The symbols are colour-coded by the differences in the derived T eff, again with respect to the SWEET-Cat values. After removing the systematic offsets (+0.05 dex for FAMA and -0.05 dex for FASMA with respect to the SWEET-Cat scale), the differences in [Fe/H] are negligible for variations in surface gravity within ±0.3 dex of the SWEET-Cat values. In the comparison between FAMA and SWEET-Cat, the agreement in [Fe/H] is still good even for differences in log g of ∼ -0.3 dex. Larger differences in log g (> +0.3 dex) correspond to higher [Fe/H] in SWEET-Cat than obtained with the other methods. This means that, moving to higher effective temperatures, the discrepancies between the methods also increase for [Fe/H], so the hottest temperature regime remains critical for obtaining reliable metallicities as well.
In conclusion, the stars in the parameter space closest to the solar values (T eff = 5000-6000 K, log g = 4.2-4.6 dex) are those for which the three methods are in best agreement. Among the three parameters, the best agreement is reached for [Fe/H] over the whole parameter space. The most critical parameter is, however, the surface gravity: external comparisons (using Gaia data, asteroseismic data, isochrones) are needed to evaluate the results of the different methods. A correct evaluation of all stellar parameters is indeed fundamental for a precise characterisation of the host star, including the determination of its age [9].

Evaluation of the accuracy and precision of the stellar parameters
In this section, we discuss some indirect checks on the accuracy and precision of the stellar parameters. We focus especially on the stellar surface gravity, a parameter that usually cannot be well constrained by spectroscopic methods (e.g. [30,61]). This has an impact on the calculation of the other stellar parameters (T eff and [Fe/H]) and subsequently on the derivation of the stellar chemical abundances.

Comparison with surface gravity derived from trigonometric distances using Gaia DR2
We compare the log g derived using the three spectroscopic methods with the trigonometric log g based on Gaia DR2 photometry and parallax. Photometric gravities have been obtained using the following equation:
$$\log g = \log(M/M_\odot) + 0.4\,M_{\rm bol} + 4\log T_{\rm eff} - 12.505,$$
where M/M⊙ is the stellar mass (in solar mass units) provided by the SWEET-Cat database, M bol is the bolometric magnitude, obtained from the luminosity published in the Gaia DR2 catalogue [21] using the relation M bol = 4.75 − 2.5 log(L/L⊙), and T eff is the Gaia photometric T eff. Since the stars in our sample are nearby and thus the reddening is negligible, we can derive M bol from the Gaia luminosities and the distances obtained directly by inverting the parallaxes. In Fig. 4 (left top and bottom panels) we show the comparison (no outliers are removed in the plot). Surface gravities from SWEET-Cat appear to be slightly underestimated at low T eff and strongly overestimated at high T eff. FASMA shows the same behaviour at high T eff. No clear trend with T eff appears for the FAMA surface gravities.
Fig. 4 Left panels: comparison between spectroscopic (SWEET-Cat, FAMA and FASMA) and trigonometric log g as a function of T eff (top) and of trigonometric log g (bottom). The corresponding T eff for the trigonometric log g is derived from the Gaia measurements. Right panels: comparison between spectroscopic (SWEET-Cat, FAMA and FASMA) and log g derived from the light-curves as a function of T eff (top) and of light-curve log g (bottom). Average errors are also indicated at the side of each plot. For the parameter differences, we considered as error the larger value among the average errors from each method.
In Table 3 we divide the sample into three temperature regimes: T eff < 5000 K, 5000 K < T eff < 6000 K, and T eff > 6000 K. We compute the mean differences between the log g from Gaia and the spectroscopic log g from the three methods, and their standard deviations (1σ).
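The trigonometric gravity can be evaluated numerically as sketched below, using the commonly adopted form log g = log(M/M⊙) + 0.4 M bol + 4 log T eff − 12.505 together with the M bol relation quoted in the text. The zero point −12.505 is an assumption of this sketch; it corresponds to solar values of roughly log g = 4.44 and T eff = 5777 K with M bol = 4.75 for the Sun. The function names are illustrative.

```python
import math

def bolometric_magnitude(lum_solar):
    """M_bol from the luminosity in solar units: M_bol = 4.75 - 2.5 log10(L/Lsun)."""
    return 4.75 - 2.5 * math.log10(lum_solar)

def trigonometric_logg(mass_solar, lum_solar, teff):
    """log g (cgs) from mass (Msun), luminosity (Lsun) and T eff (K).

    The zero point -12.505 is an assumed value (see the accompanying text).
    """
    m_bol = bolometric_magnitude(lum_solar)
    return (math.log10(mass_solar) + 0.4 * m_bol
            + 4.0 * math.log10(teff) - 12.505)

# Sanity check with solar values: the result should be close to 4.44.
print(f"{trigonometric_logg(1.0, 1.0, 5777.0):.2f}")

# A made-up hotter, more luminous star for comparison.
print(f"{trigonometric_logg(1.3, 3.0, 6400.0):.2f}")
```

Because the mass enters only logarithmically, a 10% error on the mass shifts log g by about 0.04 dex, whereas T eff enters with a factor of 4 in the logarithm.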
In the coolest regime, all methods tend to underestimate the surface gravities, with differences consistent with each other within the errors. The spectral synthesis method FASMA provides gravities that are, on average, in better agreement with the trigonometric ones. In the intermediate regime, with 5000 K < T eff < 6000 K, the three methods show the best agreement with the log g from Gaia: FAMA slightly underestimates the trigonometric gravities, while SWEET-Cat and FASMA slightly overestimate them. The most challenging regime is the hottest one, with T eff > 6000 K. In this temperature range, there is no agreement between the spectroscopic methods and the trigonometric one (FASMA and SWEET-Cat overestimate the trigonometric log g, while FAMA slightly underestimates it). In the last column, we report the mean differences for the whole sample: slightly negative for SWEET-Cat and FASMA, and positive for FAMA. However, it is clear that global mean differences mask the most critical regimes, namely the hottest and coolest stars of the sample.
Table 3 Mean differences log g Gaia − log g spec with standard deviation (1σ) and median differences (in parentheses) in three T eff intervals. Columns: Method; T eff < 5000 K; 5000 K < T eff < 6000 K; T eff > 6000 K; Total T eff.
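The statistics per temperature regime (mean, standard deviation and median of log g Gaia − log g spec) can be computed as in the following sketch. The treatment of stars lying exactly on a regime boundary and the input values are assumptions made for illustration; they do not reproduce the actual sample.

```python
import statistics

# Regime boundaries from the text; inclusive lower edges are an assumption.
REGIMES = (
    ("T eff < 5000 K", lambda t: t < 5000.0),
    ("5000 K < T eff < 6000 K", lambda t: 5000.0 <= t < 6000.0),
    ("T eff > 6000 K", lambda t: t >= 6000.0),
)

def regime_statistics(teff, logg_gaia, logg_spec):
    """Return {regime: (mean, stdev, median)} of log g(Gaia) - log g(spec).

    A positive mean indicates that the spectroscopic method underestimates
    the trigonometric gravity in that regime.
    """
    table = {}
    for label, in_regime in REGIMES:
        diffs = [g - s for t, g, s in zip(teff, logg_gaia, logg_spec)
                 if in_regime(t)]
        if len(diffs) < 2:
            table[label] = None  # too few stars to compute a standard deviation
            continue
        table[label] = (statistics.mean(diffs),
                        statistics.stdev(diffs),
                        statistics.median(diffs))
    return table

# Made-up values for six stars, two per regime.
teff = [4800.0, 4900.0, 5500.0, 5600.0, 6200.0, 6300.0]
logg_gaia = [4.60, 4.60, 4.40, 4.40, 4.20, 4.20]
logg_spec = [4.50, 4.40, 4.40, 4.50, 4.40, 4.50]
for label, stats in regime_statistics(teff, logg_gaia, logg_spec).items():
    print(label, stats)
```

Reporting the median alongside the mean, as in Table 3, limits the influence of the few strongly discrepant stars in the hottest and coolest regimes.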

Comparison with log g derived from the light-curves
We compare the trigonometric and spectroscopic gravities with those derived from the light-curves in the literature and available in the SWEET-Cat database. In Fig. 4 (right top and bottom panels) we show the difference between the log g derived from the light-curves (literature values were found for 48 targets) and the corresponding spectroscopic log g listed in SWEET-Cat (green filled squares) and obtained through FASMA (red filled triangles) and FAMA (blue filled circles), as a function of T eff and of the light-curve log g. First, we notice the good agreement between the trigonometric and light-curve surface gravities, even though the two methodologies are quite different; this is an encouraging result (see also Fig. 5). For the spectroscopic methods, considerations similar to those of the previous section apply: FAMA shows a somewhat higher dispersion and a positive offset, but almost no trend. FASMA displays the lowest dispersion, but tends to overestimate log g at high T eff. SWEET-Cat seems to overestimate log g at high T eff and underestimate it at low T eff. Clearly the hottest region is the most critical one, both because hot stars have fewer absorption lines useful for constraining the photospheric parameters and because stellar rotation might become important, making it more difficult to measure absorption lines, which can be blended. Moreover, in the low-temperature regime, molecular bands can blend with and hide the atomic lines.

Comparison with isochrones
Another important check is based on the comparison with theoretical isochrones, computed in the T eff vs log g plane for a set of ages, keeping the metallicity constant at [M/H]=0.058. Figure 6 shows the Kiel diagrams (log g vs T eff) for the values of log g and T eff listed in SWEET-Cat, obtained by FAMA and by FASMA, and derived through Gaia photometry and parallax, respectively. The PARSEC isochrones [10] in a range of ages are over-plotted: the outermost tracks correspond to log(age/yr)  Figure 6 (left top panel) shows that the SWEET-Cat gravities do not follow the isochrones and appear to be overestimated at the highest temperatures and underestimated at the lowest temperatures. In Fig. 6 (right top panel) we present the same plot for the results obtained with FAMA: T eff and log g follow the expected trend, but there is an offset towards lower gravities. In Fig. 6 (left bottom panel) we show the results of FASMA. The dispersion is lower; however, at high temperature there is still an overestimation of log g. Finally, in Fig. 6 (right bottom panel) the Gaia parameters are displayed; these give the best match with the PARSEC isochrones.

Conclusions
In this first test, we have compared the analyses of a sample of ∼150 spectra (high S/N and high spectral resolution). In the parameter space close to the solar values (T eff = 5000-6000 K, log g = 4.2-4.6 dex), the agreement among the three methods is good. However, at low and high temperatures some methods tend to under- or overestimate the surface gravity log g. This might have important effects on the derived stellar abundances. External comparisons (using trigonometric log g, log g from light-curves and from isochrones) confirm the trends. Corrections for these trends are already available (see, e.g., [17,31]). In the next tests we plan to apply them, using corrections from both asteroseismology and light-curves to provide more realistic gravities. In this context, we also refer to the wider discussion in the literature on the disagreement between spectroscopic and evolutionary or photometric log g values (see, e.g., [52,57]).
However, it is important to note that, despite the differences in the derived photospheric parameters, especially in log g, the three spectroscopic methods agree very well on the final metallicity, except in the hottest temperature regime. Owing to the high quality of the Gaia photometry and parallaxes for the Ariel targets, a viable and welcome possibility is to adopt the surface gravity homogeneously derived from Gaia to compute chemical abundances.