1 Introduction

Recent successes in the characterisation of extrasolar planets are also always tales of characterising the instrument response function in unprecedented detail. Always being at the edge of technical feasibility means that instrument calibration, observing strategy as well as data analysis and modelling are interdependent. In the light of the EChO ESA-M3 mission concept [1], such interdependence becomes important in the study of engineering decisions and instrument trade-offs. In other words, one needs to simulate the full observational and data analysis chain in order to gauge the impact the instrument concept has on the achievable error bar of the detection. Such a feat requires an advanced mission end-to-end simulator as well as an advanced data analysis pipeline. In this paper, we discuss the data analysis pipeline which is used in conjunction with the mission simulator, EChOSim [2]. The EChOSim data pipeline (from here on EChOSim-DP) is a stand-alone software package custom built for EChOSim but with easy adaptability to other instruments and data sets in mind.

The method by which the EChO mission will characterise the nature of extrasolar planets is by time resolved spectroscopy of their atmospheres, in particular of transiting extrasolar planets. Briefly, when an exoplanet transits in front of its host star (in our line of sight) we observe a diminishing of the stellar flux due to the obscuration by the planet. The depth of the resulting lightcurve allows us to estimate the planetary radius (given the stellar radius is known). This we refer to as transit (or primary eclipse) observation. Should the exoplanet feature an extended atmosphere, we expect some of the stellar light to filter through the terminator region of the planetary atmosphere. Here we are sensitive to molecules absorbing the stellar light at specific wavelengths. We hence perceive a variation of transit depths depending on the wavelength range observed. These variations constitute the signatures of an exoplanetary absorption spectrum. Similarly, we can observe the occultation (or secondary eclipse) where the thermal contribution of the exoplanet’s day-side is lost to the observer as the planet passes behind its host star. The study of transmission and emission spectroscopy is now a well established field for both space and ground based observations of exoplanetary atmospheres (e.g. [3–23]; also see [24] for a comprehensive review).

1.1 EChOSim

EChOSim is the EChO mission end-to-end simulator. EChOSim implements a detailed simulation of the major observational and instrumental effects, and associated systematics. It also allows the influence of individual instrumental and astrophysical parameters to be studied and thus represents a key tool in the optimisation of the instrument design. Observation and calibration strategies, data reduction pipelines and analysis tools can all be designed effectively using the realistic outputs produced by EChOSim [2, 25]. The simulation output closely mimics standard STScI FITS files, allowing for a high degree of compatibility with standard astronomical data reduction routines.

1.2 Examples

We illustrate individual steps in EChOSim-DP using diagrams. Unless specified otherwise, we follow a single data processing run of EChOSim simulated data of the hot-Neptune 55 Cnc e. EChOSim was run to simulate the Chemical Census mode of EChO, in which we co-add (in the case of 55 Cnc e) five eclipse observations to obtain a minimal signal-to-noise (S/N) of the final spectrum of S/N ∼ 5.

For this we assume spectra reconstructed with resolving powers of 50, 50, 30, 30, 30 for the VNIR, SWIR, MWIR-1, MWIR-2, and LWIR channels. The native resolving powers of individual detectors can exceed these requirements; see [1] for a review of the proposed EChO observing modes.

2 Data reduction

The EChOSim-DP is a stand-alone package delivered with the EChOSim code but can easily be adapted to observations produced by any spectrograph. It is written in fully object orientated Python allowing for cross-platform compatibility and easy adaptability through its modular design. EChOSim-DP is subdivided into five main modules: 1) The data and parameter read-in and object initialisation, 2) data reduction, going from two dimensional focal plane illuminations to 1D time series data, 3) time series de-trending using non-parametric de-trending algorithms, 4) lightcurve fitting using simplex-downhill minimisations as well as Markov Chain Monte Carlo (MCMC) techniques, 5) collection of results and computation of the final spectrum. We summarise this flow in Fig. 1.

Fig. 1

Flowchart of the EChOSim-DP design. The pipeline is subdivided into five main modules (contained as individual python classes): 1) Object initialisation and data read, collating all input data and parameter files and performing format conversions where necessary, 2) Data reduction, reducing the two dimensional focal plane images to 1D wavelength dependent time series, 3) de-trending all or individual time-series using non-parametric machine learning techniques, 4) model fitting the final lightcurve, 5) collecting all data and calculating the final spectrum

2.1 Configuration and data formats

The output of EChOSim follows the standard FITS file conventions with the aim of making the raw data generated by EChOSim as universally readable as possible. The payload of EChO is subdivided into individual channels defined as: VNIR (0.4 - 2.5 μm), SWIR (2.5 - 5.0 μm), MWIR-1 (5.0 - 8.5 μm), MWIR-2 (8.5 - 11.0 μm) and LWIR (11.0 - 16.0 μm). For a detailed description of the individual channels we refer the reader to [1] and publications in this special issue. Due to varying detector array sizes, it is not possible to combine all focal plane read-outs (for an individual frame) in one conventional FITS data-cube. EChOSim hence utilises extensions to the Primary FITS Header Data Unit (PrimaryHDU). This allows the inclusion of meta data on each detector as well as additional auxiliary information carried in binary tables (BinaryHDUs). EChOSim produces one FITS file per integration interval, resulting in 10s to 100s of files per simulated observation run (Fig. 2). Whilst the high number of output files produced seems cumbersome, it reflects the data handling strategies of current space and ground based instruments. EChOSim-DP is designed to be fully compatible with this customised FITS convention using a custom-built read-in routine based on the PyFITS package. EChOSim-DP can also natively read single HDU FITS files generated by other instruments.

Fig. 2

Top left: Focal plane of the mid-IR2 detector as read in by EChOSim-DP. Bottom and right: cross cuts through the focal plane along the spectral and spatial directions respectively. Most flux is contained within three pixels of the spatial direction

Auxiliary information contained in BinaryHDUs includes: the EChOSim generated stellar limb-darkening grid, EChOSim generated noiseless stellar, zodi and thermal fluxes from the instrument and its optical elements, EChOSim generated exoplanetary eclipse/transit depths, and EChOSim generated Keplerian solutions. If specified by the user, EChOSim-DP can use these auxiliary data to calculate exact time series normalisation constants and eclipse/transit models to estimate best-case scenarios.

EChOSim-DP specific parameters are specified in a separate ascii file and parsed using the Python ConfigParser module.
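As an illustration of this kind of parameter handling, the following minimal sketch parses a small INI-style parameter file with the Python standard library (the module is named `configparser` in Python 3). The section and option names (`extraction`, `binning`, `mcmc` and their keys) are hypothetical and are not taken from the EChOSim-DP distribution.

```python
import configparser

# Hypothetical parameter file; section and option names are illustrative only.
EXAMPLE_CONFIG = """
[extraction]
method = optimal
psf_model = gaussian

[binning]
mode = constant_R
resolving_power = 50

[mcmc]
iterations = 20000
burn_in = 1000
"""


def load_parameters(text):
    """Parse pipeline parameters from an INI-style string into a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "method": parser.get("extraction", "method"),
        "psf_model": parser.get("extraction", "psf_model"),
        "binning_mode": parser.get("binning", "mode"),
        "resolving_power": parser.getfloat("binning", "resolving_power"),
        "iterations": parser.getint("mcmc", "iterations"),
        "burn_in": parser.getint("mcmc", "burn_in"),
    }


params = load_parameters(EXAMPLE_CONFIG)
```

Typed accessors such as `getint` and `getfloat` convert values at read time, so a malformed entry fails early rather than deep inside a processing module.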

2.2 Flat-fielding and bad-pixel rejection

Before spectra are extracted, the focal plane data is flat-field subtracted and scanned for bad-pixels. The flat field is provided by EChOSim and constitutes an inter-pixel sensitivity variation map of the detector. In the current implementation no other flat fielding is provided by EChOSim. After flat-fielding, each frame is scanned for 3σ flux variant pixels (either from cosmic ray hits or otherwise) which are masked and subsequently excluded from further analysis.
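The bad-pixel scan can be sketched as below. This is only one possible reading of the 3σ criterion: we assume that each pixel is compared against its own median over time, with a robust scatter estimate from the median absolute deviation. The function name and the (time, y, x) cube layout are illustrative assumptions, and the input is taken to be already flat-fielded.

```python
import numpy as np


def mask_outlier_pixels(cube, n_sigma=3.0):
    """Return a boolean mask (True = rejected) for a (time, y, x) data cube."""
    median = np.median(cube, axis=0)
    # Robust scatter: 1.4826 * MAD approximates sigma for Gaussian noise.
    mad = np.median(np.abs(cube - median), axis=0)
    sigma = 1.4826 * mad
    # Pixels with zero scatter are never flagged.
    sigma = np.where(sigma > 0, sigma, np.inf)
    return np.abs(cube - median) > n_sigma * sigma


rng = np.random.default_rng(0)
cube = rng.normal(100.0, 1.0, size=(50, 8, 16))
cube[10, 3, 5] += 50.0  # simulated cosmic-ray hit in one frame

mask = mask_outlier_pixels(cube)
masked_cube = np.ma.masked_array(cube, mask=mask)  # excluded from later steps
```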

2.3 Focal plane binning

Given current detector design specifications, the native spectral resolution (\(R=\lambda/\Delta\lambda\)) of EChO can exceed that required by the science case. EChOSim-DP provides two available spectral binning formats: 1) constant R, (1); 2) constant Δλ, (2):

$$ {\Delta} x = \frac{\lambda R(\lambda_{mid}) }{2{\Delta}_{pix}} $$
(1)
$$ {\Delta} x= \frac{\lambda_{mid} R(\lambda_{mid}) }{2{\Delta}_{pix}} $$
(2)

where Δx is the binning interval along the spectral axis in pixels, λ and \(\lambda_{mid}\) the wavelength and central wavelength in μm and \(\Delta_{pix}\) the pixel size in μm. Note that EChO spectrometers sample each spectral resolving element with two detector pixels. Figure 3 shows Δx for both binning methods as a function of λ. Binning is performed directly on the focal plane before spectral extraction. This increases S/N and avoids potential biasing of the data.
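To illustrate binning on the focal plane, the sketch below sums pixel columns into bins whose width grows as λ/R. It is a generic constant-R scheme built from a per-column wavelength solution and is not a transcription of (1); the linear dispersion, array sizes and function names are assumptions made for the example.

```python
import numpy as np


def constant_r_bin_starts(wavelength, resolving_power):
    """Index of the first pixel column of each bin, with bin width ~ lambda / R."""
    starts = [0]
    edge = wavelength[0] * (1.0 + 1.0 / resolving_power)
    for i, lam in enumerate(wavelength):
        if lam >= edge:
            starts.append(i)
            edge = lam * (1.0 + 1.0 / resolving_power)
    return np.array(starts)


def bin_focal_plane(frame, starts):
    """Sum the pixel columns of a (spatial, spectral) frame within each bin."""
    return np.add.reduceat(frame, starts, axis=1)


wavelength = np.linspace(2.5, 5.0, 500)  # um, hypothetical linear dispersion
frame = np.ones((16, 500))               # hypothetical focal plane frame
starts = constant_r_bin_starts(wavelength, resolving_power=50)
binned = bin_focal_plane(frame, starts)
```

Summing columns conserves the total counts, so the binned frame can be passed to the extraction step unchanged.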

Fig. 3

Showing binning steps Δx in pixels as a function of wavelength for the two spectral binning modes available in EChOSim-DP. The native resolutions of the detectors (VIS, NIR, MIR1, MIR2, FIR) are R = 330, 530, 52, 103, 62. Red-solid line shows the constant in R binning; blue-discontinuous line shows constant in Δλ binning to resolutions of R = 50, 50, 30, 30, 30 respectively

2.4 Optimal extraction

After the data has been binned, we extract the raw spectrum along the spatial axis for each individual time stamp. At each integration time, the raw spectrum is extracted from the data by fitting a model of the PSF to the point-like dispersed signal of the star + planet flux. Two standard extraction techniques are available in EChOSim-DP: photometric window extraction and optimal extraction. The photometric window extraction is the simplest spectral extraction technique and consists of summing detector counts contained in a box of fixed spatial axis width. This method is very robust in low background observations and when the instrument PSF is not known with adequate precision. Optimal extraction weighs individual pixel columns with the optimal PSF of the detector and creates a very tightly fit ‘extraction window’. This method is preferable in high background observations when the instrument PSF is well determined. EChO will have a well characterised PSF across individual detectors. Here optimal extraction techniques are preferable since observations in the mid to far-IR channels can feature significant zodiacal and thermal backgrounds as well as increased dark current rates (Fig. 4). For the remainder of this paper we will only consider optimal extraction techniques.
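The difference between the two techniques can be sketched as follows. The optimal extraction here uses the familiar profile and inverse-variance weighting; it is a simplified stand-in and not the EChOSim-DP implementation. The frame is assumed to be background subtracted, and the Gaussian profile, array sizes and function names are illustrative assumptions.

```python
import numpy as np


def box_extract(frame, centre, half_width):
    """Sum counts in a fixed spatial window around `centre` for every column."""
    lo, hi = centre - half_width, centre + half_width + 1
    return frame[lo:hi, :].sum(axis=0)


def optimal_extract(frame, profile, variance):
    """Profile-weighted flux estimate for every spectral column."""
    profile = profile / profile.sum(axis=0, keepdims=True)  # unit sum along y
    weights = profile / variance
    return (weights * frame).sum(axis=0) / (weights * profile).sum(axis=0)


ny, nx = 21, 40
y = np.arange(ny)[:, None]
profile = np.exp(-0.5 * ((y - 10) / 1.5) ** 2) * np.ones((1, nx))
true_flux = np.linspace(1000.0, 2000.0, nx)
frame = true_flux * profile / profile.sum(axis=0, keepdims=True)
variance = np.ones_like(frame)  # uniform noise assumed for the example

flux_box = box_extract(frame, centre=10, half_width=5)
flux_opt = optimal_extract(frame, profile, variance)
```

In this noiseless case both estimates agree with the input flux; the weighting matters once the background noise in the profile wings becomes comparable to the signal.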

Fig. 4

Showing the extracted flux for a single frame as function of wavelength. Blue-continuous line Optimally extracted flux before background subtraction; red-discontinuous line estimated background counts measured on the off-axis spatial direction

Two extraction options are available: 1) Unconstrained PSF, 2) EChOSim PSF with Fine Guidance Sensor (FGS) offset data.

  • Option 1: This is the least constrained extraction. Depending on user input, EChOSim-DP fits a Gaussian or Generalised Gaussian Distribution (GGD) PSF along the spatial axis. The GGD is given by

    $$ PSF_{ggd} = \frac{\beta}{2 \alpha {\Gamma}(1/\beta)} \exp \left ( - \left [ \frac{|(\mu_{y} + {\Delta} y(t)) - y|}{\alpha} \right ]^{\beta} \right ) $$
    (3)

    where \(\mu_y\) is the mean position of the spectrum along the spatial axis y for all frames, Δy(t) is a time dependent offset from the mean, α is a scale parameter and in this case equivalent to \(\alpha=2\sigma_y\), and \(\sigma_y\) signifies the width of the PSF. The shape parameter β introduces a kurtosis argument in the Gaussian distribution. We retrieve the Normal PSF by setting β=2 and obtain leptokurtic and platykurtic distributions for β<2 and β>2 respectively. We do not assume skew of the PSF in the spatial direction. The PSF shape can either be left as a free parameter (to be fitted from the data) or specified as user input. Equation (3) is convolved with the detector response function assumed by EChOSim to obtain the extraction profile.

    $$ \mathcal{P}(y,t) = PSF(y,t) \otimes \mathcal{R}(y) $$
    (4)

    where ⊗ is the convolution operator and the detector response [26] is given by

    $$\begin{array}{*{20}l} &\mathcal{R}(y; {\Delta}_{pix},l_{y}) = \\ & = \frac{\text{tan}^{-1} \left (\text{tanh}(\frac{{\Delta}_{pix}-y}{4l_{y}}) \right ) - \text{tan}^{-1} \left (\text{tanh}(-\frac{{\Delta}_{pix}-y}{4l_{y}}) \right ) }{\text{tan}^{-1} \left (\text{tanh}(\frac{{\Delta}_{pix}}{4l_{y}}) \right ) - \text{tan}^{-1} \left (\text{tanh}(-\frac{{\Delta}_{pix}}{4l_{y}}) \right )} \end{array} $$
    (5)

    where \(\Delta_{pix}\) is the pixel size in μm and \(l_y\) the diffusion length in μm.

  • Option 2: Here we assume a Gaussian PSF (by setting β=2) with a fixed width given by \(\sigma_y = F_{\#} K_y \lambda\), where \(F_{\#}\) is the effective focal length of the telescope in μm, \(K_y\) is the PSF aberration parameter and λ the wavelength in μm. We hold \(\mu_y\) fixed at an EChOSim specified value and obtain the time dependent offset Δy(t) from the EChOSim provided fine guidance sensor (FGS) centroiding.

We note that for current simulations we use a Gaussian PSF. This is through lack of calibration data of the instrument in the current study phase. EChOSim-DP natively supports the inclusion of more realistic PSF functions available in future simulations.
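For reference, the GGD profile of (3) can be evaluated as in the sketch below. The function name and the example parameter values are assumptions; the convolution with the detector response of (5) is not included.

```python
import math

import numpy as np


def ggd_psf(y, mu, dy, alpha, beta):
    """Generalised Gaussian profile of Eq. (3), normalised to unit area."""
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * np.exp(-((np.abs((mu + dy) - y) / alpha) ** beta))


# Spatial pixel grid and three shapes: peaked, Gaussian and flat-topped.
y = np.arange(21.0)
peaked = ggd_psf(y, mu=10.0, dy=0.3, alpha=2.0, beta=1.0)
gaussian = ggd_psf(y, mu=10.0, dy=0.3, alpha=2.0, beta=2.0)
flat_top = ggd_psf(y, mu=10.0, dy=0.3, alpha=2.0, beta=4.0)
```

Leaving `beta` free lets a fit absorb moderate departures from a Gaussian shape without introducing skew, at the cost of one extra parameter.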

The centroiding is provided as part of the auxiliary information BinaryHDUs and consists of a time series of y-positional offsets sampled at a frequency of 1 Hz. EChOSim-DP downsamples the positional offsets to the integration times specified in the FITS headers. The downsampling operation correctly reflects the error in the positional offset Δy(t) and the associated flux error.
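One simple way to downsample such a series, shown here purely as an assumption about how the step could be done, is to average the 1 Hz samples falling within each integration and to carry the standard error of that mean as the positional uncertainty. The function name and the fixed number of samples per integration are illustrative.

```python
import numpy as np


def downsample_offsets(offsets, samples_per_integration):
    """Average offsets within each integration; return mean and standard error."""
    offsets = np.asarray(offsets, dtype=float)
    n_int = offsets.size // samples_per_integration
    # Drop trailing samples that do not fill a complete integration.
    blocks = offsets[: n_int * samples_per_integration].reshape(
        n_int, samples_per_integration
    )
    mean = blocks.mean(axis=1)
    err = blocks.std(axis=1, ddof=1) / np.sqrt(samples_per_integration)
    return mean, err


rng = np.random.default_rng(3)
offsets_1hz = rng.normal(0.0, 0.01, size=605)  # hypothetical jitter series
dy, dy_err = downsample_offsets(offsets_1hz, samples_per_integration=10)
```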

2.5 Background subtraction

EChOSim-DP calculates the background by computing the median (or mean given user input) focal plane illumination \(4\sigma_y\) away from \(\mu_y\). The background flux is integrated over the area (in pixels) of the extraction profile and subtracted from the extracted flux.
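A minimal sketch of this step is given below, assuming a (spatial, spectral) frame, a box-shaped extraction window and a constant background. The function names and the test frame are hypothetical.

```python
import numpy as np


def estimate_background(frame, mu_y, sigma_y, n_sigma=4.0, use_mean=False):
    """Per-column background from rows further than n_sigma * sigma_y from mu_y."""
    y = np.arange(frame.shape[0])
    off_axis = np.abs(y - mu_y) > n_sigma * sigma_y
    stat = np.mean if use_mean else np.median
    return stat(frame[off_axis, :], axis=0)


def subtract_background(extracted_flux, background_per_pixel, n_pixels):
    """Remove the background integrated over the extraction area (in pixels)."""
    return extracted_flux - background_per_pixel * n_pixels


ny, nx = 31, 20
y = np.arange(ny)[:, None]
profile = np.exp(-0.5 * ((y - 15) / 1.5) ** 2)
profile = profile / profile.sum()
star_flux = np.full(nx, 1000.0)
frame = star_flux * profile + 5.0  # star plus 5 counts/pixel of background

background = estimate_background(frame, mu_y=15, sigma_y=1.5)
raw_flux = frame[10:21, :].sum(axis=0)  # 11-pixel extraction window
clean_flux = subtract_background(raw_flux, background, n_pixels=11)
```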

2.6 PSF instabilities

Simulations of PSF variability due to pointing jitter have been shown to result in an overall flux error of \(\sim 10^{-5}\) to \(1\times10^{-4}\), but significantly higher for the spectral ranges of the NIR instrument (2-5 μm), where uncorrected flux errors can reach \(5\times10^{-3}\) levels. This is to be expected as the SWIR instrument features a smaller pixel size. Effects due to telescope and optical bench thermal drifts are found to be negligible in the wavelength ranges below 14 μm and temperatures below 50 K. We refer the reader to [2, 25] for further information.

Intra-visit (i.e. within the observation of an eclipse/transit event) thermal-mechanical distortions and/or other external forcing functions can introduce additional noise on the FGS centroiding information. This has been accounted for by adding a Gaussian centroiding error with a 10 milli-arcsecond rms amplitude, following the outcome of the industrial studies (priv. com.). Inter-visit (i.e. from observation of one eclipse/transit to the next) variations in the FGS PSF are not considered as drifts can be calibrated upon acquisition of the target.

2.7 Normalisation

The final step is the normalisation of the data to the out of transit (OOT) baseline. Similarly to Section 2.7 the normalisation can either be estimated from the data itself by calculating the OOT mean or normalised using noiseless stellar fluxes provided by EChOSim

$$ F_{norm}(\lambda,t) = \frac{F_{total}(\lambda,t)}{F_{star}(\lambda,t)}. $$
(6)

The noiseless flux measure provided by EChOSim allows the idealised case to be calculated, where a perfect knowledge of the stellar spectrum (and activity) is assumed. We discuss the more complex case in Section 4.2.
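Both normalisation options can be sketched in a few lines. The array layout (rows as wavelength channels, columns as time stamps) and the function names are assumptions made for the example.

```python
import numpy as np


def normalise_oot(flux, oot_mask):
    """Divide each channel's time series by its own out-of-transit mean."""
    baseline = flux[:, oot_mask].mean(axis=1, keepdims=True)
    return flux / baseline


def normalise_stellar(flux_total, flux_star):
    """Eq. (6): divide by the noiseless stellar flux."""
    return flux_total / flux_star


# Two wavelength channels, six time stamps, eclipse in the two central samples.
flux = np.array([[10.0, 10.0, 9.0, 9.0, 10.0, 10.0],
                 [20.0, 20.0, 19.0, 19.0, 20.0, 20.0]])
oot = np.array([True, True, False, False, True, True])

norm_from_data = normalise_oot(flux, oot)
norm_from_model = normalise_stellar(flux, np.array([[10.0], [20.0]]))
```

With noisy data the first option inherits the scatter of the OOT mean, whereas the second does not.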

3 Data de-trending

After the data has been reduced to 1D time series, EChOSim-DP can attempt a de-correlation of wavelength correlated non-Gaussian systematics. These systematics tend to be due to array wide fluctuations of quantum efficiencies, insufficient flat-fielding, slit-loss effects and pointing jitter. These complex non-Gaussian signals have been shown to be important effects in real instruments [14, 27–29]. EChOSim implements inter and intra-pixel variations and non-Gaussian pointing jitter noise. Other non-linear noise sources such as correlated astrophysical noise (e.g. stellar pulsation, stellar spots and faculae noise) will be included in future releases of EChOSim.

Here we implement the ACICA de-trending algorithm [29]. Based on blind-deconvolution using Independent Component Analysis [27, 30, 31], we estimate the common non-Gaussian time and wavelength correlated signals and construct a systematic noise model which is then used to correct each individual time series. The advantage of these types of de-trending algorithms over others such as Gaussian Processes [32] is their non-parametric nature. This guarantees a high degree of objectivity in the de-trending as well as a simple implementation into existing code (due to the lack of parameterisation required).

4 Lightcurve modelling

Once the data is reduced and de-correlated, the pipeline provides several means of model fitting the resulting lightcurves. The modelling is divided into two main modes: Radiometric and Dynamic. In the simplest model assumption, the radiometric case, we simply calculate the error bar from the out-of-transit (OOT) scatter of the time series and estimate the transit depth by taking the ratio of in-transit (IT) and OOT data. For the Dynamic case we use a full transiting planet model [33] and iteratively fit for the transit depth parameter using a simplex-downhill algorithm as well as a Markov Chain Monte Carlo (MCMC) routine.

4.1 Radiometric data analysis

For most cases, and for the sake of computational efficiency, the simplistic radiometric model results are desired for EChOSim observations. Let us assume a secondary eclipse measurement of an exoplanet. In the simplest radiometric case, we calculate the transit depth via the simple relation

$$ \delta = F_{out} - F_{in} $$
(7)

where δ is the transit depth, \(F_{out}\) is the baseline flux (blue line in Fig. 5) and is defined as

$$ F_{out}~=~\frac{1}{N(t_{0-1},t_{4-5})} \left (\sum\limits_{t=t_{0}}^{t_{1}} F_{t} + \sum\limits_{t=t_{4}}^{t_{5}} F_{t} \right ) $$
(8)

where t is the time index, N the number of observations in time, \(t_{0-1}\) defines the pre-ingress baseline and \(t_{4-5}\) the post-egress baseline (see Fig. 5). Similarly we define the in-transit flux as

$$ F_{in} = \frac{1}{N(t_{2-3})} \sum\limits_{t=t_{2}}^{t_{3}} F_{t} $$
(9)
Fig. 5

Single Mandel & Agol (2002) eclipse model. The discontinuous blue line marks the out of transit baseline. The discontinuous green line marks the in-transit flux and δ defines the transit depth. Discontinuous red lines note the contact points \(t_{1-4}\)

Equation (9) is valid for the secondary eclipse case and mid-IR transit cases where limb-darkening is negligible. To avoid the effect of limb-darkening in the case of primary eclipses in the near-IR, we borrow the ‘correct’ transit depths from EChOSim’s auxiliary output files. Note that this is a valid procedure since we are dealing with an over simplistic model here. The dynamic model fitting does not assume auxiliary data. Given (7), we calculate the error on δ as the sum of squares of the time series error

$$ \sigma_{total}/\sqrt{N} = \sqrt{\sigma_{out}^{2} / N_{out}+ \sigma_{in}^{2} /N_{in}} = \frac{\sqrt{2}\sigma}{\sqrt{N}} $$
(10)

where N is the number of observations and we assume that \(N_{out}=N_{in}=2N\) as well as \(\sigma_{out}=\sigma_{in}\).
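The radiometric estimate of (7) to (10) can be sketched as follows. The function name, the integer time index and the simulated lightcurve are assumptions for the example. For two windows of equal length, here taken as \(N_{out}=N_{in}=N\) with equal scatter σ, the returned error reduces to \(\sqrt{2}\sigma/\sqrt{N}\).

```python
import numpy as np


def radiometric_depth(flux, t, contacts):
    """Depth (Eq. 7) and its error from the OOT and in-transit windows.

    contacts = (t0, t1, t2, t3, t4, t5), following the labelling of Fig. 5.
    """
    t0, t1, t2, t3, t4, t5 = contacts
    out = ((t >= t0) & (t <= t1)) | ((t >= t4) & (t <= t5))  # Eq. (8) window
    inn = (t >= t2) & (t <= t3)                              # Eq. (9) window
    depth = flux[out].mean() - flux[inn].mean()
    err = np.sqrt(flux[out].var(ddof=1) / out.sum()
                  + flux[inn].var(ddof=1) / inn.sum())
    return depth, err


rng = np.random.default_rng(1)
t = np.arange(600)
sigma, true_depth = 1e-4, 1e-3
flux = np.ones(t.size)
flux[(t >= 200) & (t <= 399)] -= true_depth  # flat-bottomed eclipse
flux += rng.normal(0.0, sigma, t.size)

depth, err = radiometric_depth(flux, t, contacts=(0, 99, 200, 399, 500, 599))
```

Samples between the contact points (ingress and egress) are excluded from both windows, so the shape of the ramps does not enter the estimate.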

4.1.1 Interpretation of radiometric model

The assumption \(\sigma_{out}=\sigma_{in}\) seems straightforward as one expects the photometric stability not to vary significantly between out-of-eclipse and in-eclipse times. The radiometric error as in (10) is the correct error treatment for the observation of a single lightcurve at a single wavelength with equal lengths of out-of-transit and in-transit measurements. It assumes that no additional knowledge of the baseline (out-of-transit) flux is available and describes the state of largest ignorance, i.e. \(\sigma \rightarrow \sqrt {2} \sigma \). Should additional knowledge of the baseline flux be available (via the calibration of the wavelength dependent stellar spectrum), we can reduce the normalisation error on the baseline. Hence for a perfect knowledge of the baseline flux level \(\sigma _{total} \rightarrow \sigma \).

4.2 Dynamic data analysis

Going beyond the radiometric model assumptions, EChOSim-DP has two additional time-resolved lightcurve model modes: 1) Simplex and 2) MCMC (Fig. 6).

Fig. 6

Schematic outline of EChO observations illustrating the changing baseline flux levels. Here blue curves illustrate the stellar out of transit spectra and the green curve the in-transit spectrum of the star. In the case of a secondary eclipse, the green curve represents the stellar spectrum only whilst the blue curve is star+planetary flux

In the simplex case, we fit an analytical lightcurve model [33] to each individual lightcurve in wavelength space, λ. It fully supports eccentric orbit calculations following [34] and allows all model parameters to be fitted. For lightcurves in wavelength ranges below 5 μm we assume stellar limb-darkening for primary eclipses. Here we linearly interpolate the quadratic limb-darkening coefficients of [35] or read the limb-darkening coefficient grid provided by EChOSim to provide an exact match. For the model minimisation we use a simplex-downhill algorithm [36, 37]. In this simple minimisation scheme, we obtain the error bar on the model fit using (10). Each modelling run creates a new model-fitting object in the data pipeline which allows multiple model runs (radiometric as well as dynamic) to be executed in the same instance of the EChOSim-DP.

We furthermore include a more computationally intensive Markov Chain Monte Carlo routine in EChOSim-DP. This routine allows us to investigate more complex scenarios and potential prior dependence (should prior knowledge on the exoplanetary or stellar spectrum be known). The posterior on the model parameter 𝜃 can be written as

$$ p(\hat{\theta} | F) \propto \mathcal{L}(\theta) \pi(\theta). $$
(11)

where \(\mathcal {L}(\theta )\) is the model likelihood and π(𝜃) the prior distribution on the parameter θ (Figs. 7 and 8). Whilst we here only consider 𝜃 to be the transit depth parameter, EChOSim-DP natively supports the inclusion of other free parameters, such as orbital (e.g. ephemeris, eccentricity, orbital inclination) as well as free-floating limb-darkening parameters. In a typical EChO observation, these additional parameters are thought to be well determined by previous studies and are assumed to be fixed. The likelihood is here assumed to be Gaussian and is given by

$$ \mathcal{L}(\theta, \mathbf{d}) = \frac{1}{\sigma \sqrt{2 \pi}} \text{exp} \left [ - \frac{1}{2} \sum\limits_{t=0}^{t=N} \left (\frac{d_{t} - {\Phi}_{t}(\theta)}{\sigma} \right )^{2} \right ] $$
(12)

where d is the data column vector, and \(d_t\) and \(\Phi_t(\theta)\) are the datum and lightcurve model at given time-stamp t.
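To make the sampling of (11) with the Gaussian likelihood of (12) concrete, the sketch below runs a plain random-walk Metropolis sampler on the eclipse depth. It is deliberately simpler than an adaptive scheme: the box-shaped eclipse model, the flat prior, the fixed step size and all function names are assumptions for the example, and the constant prefactor of (12) is dropped since it cancels in the acceptance ratio.

```python
import numpy as np


def box_eclipse(t, depth, t_start, t_end):
    """Toy eclipse model: unit baseline with a flat-bottomed dip."""
    model = np.ones(t.shape, dtype=float)
    model[(t >= t_start) & (t <= t_end)] -= depth
    return model


def log_likelihood(depth, t, data, sigma, t_start, t_end):
    """Logarithm of Eq. (12), up to an additive constant."""
    resid = (data - box_eclipse(t, depth, t_start, t_end)) / sigma
    return -0.5 * np.sum(resid ** 2)


def metropolis(log_post, start, step, n_iter, burn_in, seed=0):
    """Random-walk Metropolis sampler for a single parameter."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_iter)
    current, current_lp = start, log_post(start)
    for i in range(n_iter):
        proposal = current + step * rng.normal()
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain[burn_in:]


rng = np.random.default_rng(2)
t = np.arange(300.0)
sigma, true_depth = 1e-4, 1e-3
data = box_eclipse(t, true_depth, 100, 199) + rng.normal(0.0, sigma, t.size)


def log_post(depth):
    # Flat prior: the log-posterior equals the log-likelihood up to a constant.
    return log_likelihood(depth, t, data, sigma, 100, 199)


chain = metropolis(log_post, start=1e-3, step=2e-5, n_iter=5000, burn_in=1000)
depth_mean, depth_err = chain.mean(), chain.std()
```

A histogram of `chain` approximates the posterior of the depth; its standard deviation serves as the error bar on the fitted depth.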

Fig. 7

Normalised lightcurve of secondary eclipse of 55 Cnc e (5 eclipses co-added). Red line analytic lightcurve model [33] with the eclipse depth δ as only free parameter. Note the lack of stellar limb-darkening in secondary eclipses and hence a very discrete ingress and egress

Fig. 8

Histogram of MCMC chain run for 50,000 iterations. The histogram approximates the posterior distribution of the transit depth parameter δ for the model fit shown in Fig. 7

We use the PyMC package implementing the adaptive Metropolis-Hastings algorithm of [38]. The MCMC chains are typically run with 20,000 iterations, taking the minimised result of the simplex-downhill algorithm as starting value to minimise burn-in time [39], which we restrict to 1000 iterations. We here present the univariate version of the likelihood as in most cases all transit parameters but the depth, δ, are fixed. To minimise parameter covariances for multiple free parameters one can follow the parameterisations of [40] or [41]. Using a Bayesian approach, we can investigate more complex model solutions such as the impact of the stellar variability on the normalisation of individual lightcurves. Figure 6 illustrates a time series observation of a transiting exoplanet over a wide range of wavelengths. Here the blue curves represent the stellar spectrum, the black curves the time dependent flux variation due to the transiting extrasolar planet with the green line marking the minimum flux. As discussed in Section 4.1.1, if all time series measurements are assumed to be independent of each other (i.e. not correlated in wavelength), we must assume an error of \(\sqrt {2}\sigma \) on the measurement, given the uncertainty of the OOT normalisation. However, it is clear from Fig. 6 that the OOT flux of individual time series is correlated in λ through the stellar spectrum. For a perfect correlation (i.e. absolute knowledge of the correct normalisation of the individual time series) the measurement error hence reduces to σ. Hence the normalisation error, \(\sigma_{norm}\), is bound by \( 0 \leq \sigma _{norm} \leq \sqrt {2}\).

We can now express the likelihood of our observation, \(\mathcal {L}\), as product of the likelihood of the lightcurve model, \(\mathcal {L}(\theta )\) and the stellar spectrum model \(\mathcal {L}(\varphi )\). Note that by taking the product we implicitly assume statistical independence between lightcurve and stellar spectra models and below we explicitly assume a Gaussian noise model

$$ \mathcal{L} = \mathcal{L}(\theta) \mathcal{L}(\varphi) = e^{-\frac{1}{2} \chi^{2}(\theta)} ~e^{-\frac{1}{2} \chi^{2}(\varphi)} $$
(13)

where \(\chi^{2}\) is the chi-squared distribution. We can now write the log-likelihood as follows

$$\begin{array}{@{}rcl@{}} \text{log} \mathcal{L} &=& -\frac{1}{2} \sum\limits^{N}_{t=1} \left (\frac{ F_{t,\lambda} - {\Phi}(\theta_{t})}{\sigma_{t}} \right )^{2} \\ &&-\frac{1}{2} \sum\limits^{M}_{\lambda=1} \left (\frac{ \bar{F}_{t=t_{2-3},\lambda} - {\Psi}(\theta_{\lambda})}{\sigma_{\lambda}} \right )^{2} \end{array} $$
(14)

where \(\Phi(\theta_t)\) is the lightcurve model for given time index t, \(\Psi(\theta_\lambda)\) is the stellar model for given wavelength index λ, M is the number of resolution elements in the spectrum and \(\sigma_t\) and \(\sigma_\lambda\) are the flux uncertainties on the time series and the stellar spectrum respectively. Note that these error terms are not equivalent and also note that \(\bar {F}_{t=t_{2-3},\lambda }\) is the averaged stellar spectrum over the time interval \(t_2\) to \(t_3\).
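Equation (14) translates directly into a function of the two residual vectors, as sketched below. The argument names are assumptions; the models Φ and Ψ are taken as already evaluated on the time and wavelength grids.

```python
import numpy as np


def joint_log_likelihood(flux_t, model_t, sigma_t,
                         spec_obs, spec_model, sigma_lam):
    """Eq. (14): lightcurve term (sum over t) plus stellar term (sum over lambda)."""
    lightcurve_term = -0.5 * np.sum(((flux_t - model_t) / sigma_t) ** 2)
    stellar_term = -0.5 * np.sum(((spec_obs - spec_model) / sigma_lam) ** 2)
    return lightcurve_term + stellar_term


# Four time samples, each 1 sigma from the model; three spectral samples, each 2 sigma.
flux_t = np.array([1.1, 0.9, 1.1, 0.9])
model_t = np.ones(4)
spec_obs = np.array([10.2, 20.2, 30.2])
spec_model = np.array([10.0, 20.0, 30.0])

logl = joint_log_likelihood(flux_t, model_t, 0.1, spec_obs, spec_model, 0.1)
```

Because the two terms are summed, a tight constraint on the stellar spectrum penalises baseline levels that the lightcurve data alone would allow.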

5 Outputs

Two types of outputs are provided: spectra in ascii format and Python pickle objects. For each individual lightcurve fitting, EChOSim-DP provides an ascii file containing wavelength, measured flux and error. The pickle file contains all parameters, intermediate and final data products, allowing for an exact reproducibility of results. Figure 9 shows the final spectrum for 55 Cnc e in the Chemical Census mode (blue error bars). Figures 10 and 11 show the same simulation for the Origins and Rosetta stone observing modes of EChO.
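The two output types can be sketched as below. The file names, the column header and the contents of the `state` dictionary are hypothetical and do not describe the actual EChOSim-DP output layout.

```python
import os
import pickle
import tempfile

import numpy as np


def write_outputs(directory, wavelength, flux, error, state):
    """Write a three-column ascii spectrum and a pickle of the full run state."""
    ascii_path = os.path.join(directory, "spectrum.txt")
    pickle_path = os.path.join(directory, "run.pkl")
    np.savetxt(ascii_path, np.column_stack([wavelength, flux, error]),
               header="wavelength_um flux error")
    with open(pickle_path, "wb") as handle:
        pickle.dump(state, handle)
    return ascii_path, pickle_path


wavelength = np.array([1.0, 2.0, 3.0])
flux = np.array([1.0e-4, 1.2e-4, 1.5e-4])
error = np.array([2.0e-5, 2.0e-5, 3.0e-5])
state = {"n_eclipses": 5, "wavelength": wavelength, "flux": flux}

with tempfile.TemporaryDirectory() as tmp:
    ascii_path, pickle_path = write_outputs(tmp, wavelength, flux, error, state)
    table = np.loadtxt(ascii_path)
    with open(pickle_path, "rb") as handle:
        restored = pickle.load(handle)
```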

Fig. 9

Final spectrum generated from EChOSim-DP outputs for 55 Cnc e secondary eclipse run in Chemical census mode (i.e. 5 eclipses stacked, R = 50 for λ<5μm and R = 30 for λ>5μm). Blue error bars derived from EChOSim-DP. Grey: planetary emission spectrum read into EChOSim. We marked prominent emission/absorption features

Fig. 10

Final spectrum generated from EChOSim-DP outputs for 55 Cnc e secondary eclipse run in Origins mode (i.e. 17 eclipses stacked, R = 100 for λ<5μm and R = 30 for λ>5μm). Blue error bars derived from EChOSim-DP. Grey: planetary emission spectrum read into EChOSim. We marked prominent emission/absorption features

Fig. 11

Final spectrum generated from EChOSim-DP outputs for 55 Cnc e secondary eclipse run in Rosetta Stone mode (i.e. 65 eclipses stacked, R = 300 for λ<5μm and R = 30 for λ>5μm). Blue error bars derived from EChOSim-DP. Grey: planetary emission spectrum read into EChOSim. Inset is a zoom into the 2.2 – 2.5 μm wavelength region. We marked prominent emission/absorption features

6 Discussion & conclusion

EChOSim-DP is a custom built data reduction and analysis pipeline for the EChOSim end-to-end mission simulator of the EChO mission concept.

Despite its customised nature, we have developed the pipeline with easy adaptability (through its fully object-orientated programming) to other instruments and data-sets in mind. The pipeline features state of the art data de-correlation algorithms as well as a full Bayesian analysis implementation via adaptive MCMC. Both these aspects, the de-trending as well as the exploration of stellar variability, are not required for the current version of EChOSim (version 3.x) but are included in anticipation of future releases. These releases will have special emphasis on realistic stellar noise simulations [42] as well as more advanced non-Gaussian instrument systematics.