One Flow to Correct Them all: Improving Simulations in High-Energy Physics with a Single Normalising Flow and a Switch

Simulated events are key ingredients in almost all high-energy physics analyses. However, imperfections in the simulation can lead to sizeable differences between the observed data and simulated events. The effects of such mismodelling on relevant observables must be corrected either effectively via scale factors, with weights or by modifying the distributions of the observables and their correlations. We introduce a correction method that transforms one multidimensional distribution (simulation) into another one (data) using a simple architecture based on a single normalising flow with a boolean condition. We demonstrate the effectiveness of the method on a physics-inspired toy dataset with non-trivial mismodelling of several observables and their correlations.


Introduction
Monte Carlo simulations play a key role in the data analysis of high-energy physics experiments. The simulation of final-state particles from scattering events and the simulation of their interactions with the detector material are used in many applications. Important examples are the development of particle reconstruction algorithms, the calibration of the properties of the reconstructed particles, the optimization of event-level signal-background classifiers, and the estimate of signal and background contributions to selected phase spaces. While these simulations often provide a very good description of the data, residual imperfections in the simulations can lead to sizeable deviations from the observed data. This can result in reduced performance of algorithms that were developed based on simulated events, biased signal or background estimates, or biased calibrations.
The effect of such mismodelling on a relevant observable, for example, the efficiency for reconstructing a certain particle, is often mitigated by applying so-called "scale factors" to the simulations. These scale factors do not attempt to correct the underlying mismodelling itself but aim to effectively remove the data-simulation differences for the given application. They come with associated statistical and systematic uncertainties, which can limit the sensitivity of measurements and searches in high-energy experiments.
An alternative is to address the underlying mismodelling of the relevant observables, such as the features that are used to reconstruct a certain particle. Such an approach promises to be more general, as any algorithm that is based on the corrected features is expected to show a better agreement of simulation and data. One approach is to derive weights for simulated events. These can be obtained from a classifier that is trained to distinguish simulation and data [1][2][3][4]. However, a disadvantage of the weighting approach is that observables that are not part of the training may show a worse agreement between simulations and data after the weights are applied. The weighting approach may also lead to increased uncertainties due to the limited number of samples in the simulation in the case of large weights. Another approach is to morph the observables to correspond to the multidimensional target distribution so that only certain observables are modified. This approach has been implemented with chained quantile regressions [5], generative adversarial networks [6], input convex neural networks [7] with optimal transport properties [8], generative diffusion networks [9] and normalising flows [10][11][12]. In the latter approach, normalising flows [13,14] are trained as a bijective transformation between the complex multidimensional distribution in the input space and a simple "base distribution" of the same dimensionality, often a multivariate Gaussian distribution. Since monotonically increasing bijective transformations are used, normalising flows preserve the quantiles of the distributions, which makes them suitable for morphing.
We propose a morphing procedure based on a single normalising flow that is conditioned on a boolean ("IsData") that encodes whether the input is drawn from the simulation or the data. We train the normalising flow simultaneously on both datasets to learn a conditional mapping between the input distributions and the base distribution. After the normalising flow is trained, we map samples from simulation to the base distribution of the flow and switch the boolean condition before mapping back to the input space, effectively treating them as samples from data. Our approach differs from previous work on normalising flows for morphing simulations. One proposed strategy is to learn a mapping from input to base distribution only for the data and sample from the base distribution to produce corrected values for the simulation [10]. Another strategy is a chain of normalising flows, one for each dimension in the input space, similar to chained quantile regression [11]. In Ref. [12], five different methods were studied: The "base transfer" method uses a combination of two normalising flows that map the two input spaces to the same base distribution. In the "unidirectional transfer" method, the base distribution for one of the normalising flows is given by the distribution in the other input space, which is then mapped to a base distribution by a second normalising flow. The "flows for flows" approach extends the unidirectional transfer method to be bidirectional, using a third normalising flow, so that both input spaces are mapped to separate base distributions. Two additional proposals in Ref. [12] extend the flows-for-flows approach with constraints on the learned transfer maps.
Our approach is novel and simple: we propose to train only one normalising flow that learns the mappings for data and simulation simultaneously. Similar to the base transfer method, we train a mapping to the same base distribution for both input spaces. However, the use of only one flow simplifies the training procedure and reduces the time spent on optimisation. It results in an effective sharing of parameters between the mappings of the two input spaces to the base distribution.
We study our approach in a toy example that captures several aspects of realistic applications in high-energy-physics experiments: (a) the observables follow different marginal probability density functions; (b) the observables are partially correlated; (c) the probability density functions and the structure of the correlations depend on ancillary variables; (d) the probability density functions for data and simulation differ in their shapes, their correlations, and their dependence on the ancillary variables. We investigate to what extent our morphing approach can correct the marginal distributions and their correlations, and we test whether a multivariate classifier can still separate data from the corrected simulation. In addition, we describe a preprocessing step that transforms discontinuous input distributions into multimodal distributions and significantly improves the quality of the corrections.
We introduce the setup of the normalising flow in Section 2 and evaluate its performance on well-known two-dimensional benchmarks for generative models (checkerboard, two moons, four circles). In Section 3, we describe the generation of the physics-inspired toy dataset. The quality of the corrections on this dataset is discussed in Section 4. We present our conclusions in Section 5.
2 Correcting simulations with one normalising flow

Normalising flows for morphing distributions
Normalising flows are a class of generative models designed for the effective learning and sampling of multivariate probability distributions. They are constructed by parametrising a transformation that maps a complex input distribution to a tractable base distribution of the same dimension, d. Such a transformation, f : R^d → R^d, must be invertible and hence ensures a one-to-one correspondence between the input probability density function (PDF), p_x(x), and the base PDF, p_z(z). Given that the composition of multiple invertible functions remains invertible, it is common practice to construct the transformation f as a composition of N invertible transformations, f = f_N ∘ ⋯ ∘ f_1, to increase the expressiveness of the model [14,15]. If f is differentiable, one can express the PDF in the input space, p_x(x), as a function of the base PDF and the Jacobian matrix of the transformation, J_f, with the change of variables formula as

p_x(x) = p_z(f(x)) |det J_f(x)| . (1)

For a composition of transformations, the log-likelihood function becomes

log p_x(x) = log p_z(f(x)) + Σ_{i=1}^{N} log |det J_{f_i}(z_{i−1})| , (2)

where the z_i = f_i(z_{i−1}) are the intermediate variables of the composition, with z_0 = x. With a suitable choice of the base PDF p_z and the transformation f, equation (2) has a closed form and can be used to train the normalising flow with the negative log-likelihood as loss. The transformations f_i are typically parameterised by neural networks, referred to as "auxiliary networks" in the following. We use masked autoregressive flows [16] based on spline transformations [17][18][19][20]. The autoregressive property of the flow leads to an efficient computation of the Jacobian determinant. The auxiliary networks are implemented as multilayer perceptrons with masked connections, for which we use MADE blocks [21] (Masked Autoencoder for Distribution Estimation). Using splines in normalising flows has the advantage that they can approximate more complex distributions than affine transformations while still being simple to invert. We use neural spline flows [20], which are based on monotonic rational quadratic splines. The monotonicity of the transformation is important for morphing distributions, because it ensures that the quantiles of the distributions are preserved during the transformation. When a transformation is monotonic, the order of data points is maintained. This means that the same proportion of data points will fall below any given threshold in both the original and transformed distributions. Consequently, the quantiles, which represent these thresholds, remain unchanged.
The normalising flow is simultaneously trained on both datasets, i.e. simulation and data, mapping them to a shared base distribution. This approach is illustrated in Figure 1 (top). We condition the flow on a boolean variable ("IsData"), which enables the learning of distinct mappings to the common base distribution for both simulation and data. To correct the simulation, we use the trained flow to map samples from simulation to the base distribution, switch the boolean, and use the inverse transformation to map back to the data input space. This procedure effectively performs a quantile morphing between simulation and data, using the base distribution as an intermediary, as depicted in Figure 1 (bottom).
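The quantile-morphing effect of the boolean switch can be illustrated in one dimension, where empirical CDFs play the role of the two learned mappings to a shared base distribution. The following is our own minimal numpy sketch of the concept, not the flow itself; the distributions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D example: "simulation" and "data" follow different distributions.
sim = rng.normal(loc=0.0, scale=1.0, size=50_000)
data = rng.normal(loc=0.5, scale=1.5, size=50_000)

def morph(x, source, target):
    # Forward pass: map to the shared (uniform) base via the source CDF,
    # i.e. the 1D analogue of mapping to the base with IsData = 0.
    u = np.searchsorted(np.sort(source), x) / len(source)
    # Inverse pass with the switched condition: inverse target CDF,
    # i.e. mapping back to the input space with IsData = 1.
    return np.quantile(target, np.clip(u, 0.0, 1.0))

corrected = morph(sim, sim, data)

# The corrected simulation now follows the data distribution,
# while the ordering (and hence the quantiles) of the events is preserved.
print(np.mean(corrected), np.std(corrected))
```

Because the mapping is monotonic, an event at the 30% quantile of the simulation ends up at the 30% quantile of the data, which is exactly the quantile preservation illustrated in Figure 1 (bottom).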

Two-dimensional benchmarks
For a first evaluation of the single-flow approach, we check its capability to morph two sets of commonly used two-dimensional benchmark datasets for generative models. In the first test, we morph the checkerboard distribution to the four-circles distribution. In the second test, we morph the checkerboard distribution to the two-moons distribution. In both cases, we also check the inverse morphing performance of the flow. The checkerboard and four-circles datasets are taken from the GitHub repository of Ref. [12], and the two-moons dataset was generated using scikit-learn [22]. One million samples were drawn from each distribution, using 60% for training, 10% for validation and 30% for testing.
We use the neural spline implementation of the Zuko package [23] in PyTorch [24]. We set the number of bins in each spline to 8. For the MADE blocks, we use two hidden layers with 128 nodes each and ReLU activation. We use the ADAM optimiser [25] with a cosine annealing learning-rate scheduler [26]. The training is performed until the validation loss does not improve over 15 epochs. For the two-dimensional benchmark examples, the normalising flow is composed of four monotonic rational-quadratic transformations.
Figure 2 shows the result of the morphing, where each plot displays the original distributions on the left and the resulting morphed distributions on the right. Our approach is able to reproduce even the sharp edges and discontinuous features in both sets of distributions. We note that it is simple to extend the boolean condition to a one-hot encoding in order to morph between more than two domains with a single flow. We illustrate this in a three-domain example in Appendix C.

Generation of the physics-inspired dataset
The toy dataset is divided into the two toy classes "data" and "simulation" and includes seven variables. Three of the variables are ancillary variables that are loosely inspired by kinematic features in typical high-energy physics applications, whereas the other four variables can be interpreted as informative features that discriminate between signal and background. The informative features are conditioned on the ancillary variables to enhance the complexity of the multivariate distribution and make the dataset more realistic. The structure of the conditions is different for data and simulation. This encodes the mismodelling deep into the structure of the multivariate distribution, making the task of correcting simulated events to data more challenging. In addition to the conditioning, non-trivial correlations are included between several of the seven variables. The ancillary features "p_T" and "η" are inspired by the typical shapes of transverse momentum and absolute pseudorapidity distributions from decays of massive particles in collider experiments. The distribution of the p_T variable is exponential with different scale parameters for data and simulation. The η variable is drawn from a uniform distribution in the interval [−2, 2], and a Gaussian smearing with a mean of unity and different standard deviations for data and simulation is applied to smooth the edges of the interval. Finally, the absolute value is taken to obtain the η values. The third ancillary feature, the noise "N", is uniformly distributed in the interval [0, 3]. The noise variable can either correspond to the azimuthal angle, which might have non-trivial correlations with informative features in the case of detector asymmetries, or to a variable that is related to the pileup conditions at hadron-collider experiments.
The four informative features are divided into two families, A and B. The two variables v_1^A and v_2^A are drawn from uncorrelated Gaussian distributions. The mean and the standard deviation are different for data and simulation and depend on the ancillary features. With increasing η, the distributions are shifted to the left. The distributions become increasingly narrow for high p_T, whereas larger noise leads to broader distributions. Overall, both v_i^A distributions roughly resemble Gaussian distributions, although the non-trivial conditions lead to slightly non-Gaussian effects, such as heavy or asymmetric tails and flattened or compressed peaks. The distributions for the v_1^B and v_2^B variables are discontinuous. A certain fraction of the samples is assigned a value of zero and the rest of the values are drawn from shifted exponential distributions. This corresponds to a density mixture of a Dirac delta at zero with a tail to the right. The fraction of samples at zero increases with p_T and decreases with larger N values. The scale of the exponential distributions for the events in the tail increases with larger η and N values. Again, these effects differ in magnitude between data and simulation.
As a final step, the seven variables are endowed with non-trivial correlations using the mcerp package [27,28]. While the marginal distributions before and after the artificial correlation are very similar, we are able to impose non-trivial correlations between all features, which increases the complexity of the dataset. The resulting correlation matrices differ significantly between data and simulation. Figure 3 shows the marginal distributions of the seven variables for data and simulation. Two-dimensional visualisations with 68% contours are given in Appendix A.
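The generation recipe described above can be sketched in numpy as follows. This is a simplified illustration of the structure of the dataset with made-up parameter values (the actual values are given in Table 1, and the correlation-inducing step with mcerp is omitted):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Ancillary variables (illustrative parameter values).
pt = rng.exponential(scale=1.0, size=n)                       # exponential p_T
eta = np.abs(rng.uniform(-2, 2, n) * rng.normal(1, 0.1, n))   # smeared |eta|
noise = rng.uniform(0, 3, n)                                  # uniform noise N

# Family A: Gaussians whose mean and width depend on the ancillaries:
# shifted left with increasing eta, narrower at high p_T, broader with noise.
mean_a = 1.0 - 0.2 * eta
std_a = np.maximum(1.0 - 0.1 * pt + 0.1 * noise, 0.1)
v1a = rng.normal(mean_a, std_a)

# Family B: mixture of a Dirac delta at zero and a shifted exponential tail.
p_zero = np.clip(0.3 + 0.05 * pt - 0.1 * noise, 0.0, 1.0)
at_zero = rng.uniform(size=n) < p_zero
scale_b = 1.0 + 0.2 * eta + 0.2 * noise
v1b = np.where(at_zero, 0.0, 1.0 + rng.exponential(scale_b))  # shift x_t = 1
```

A second set of samples with slightly different parameter values would play the role of the "simulation" class, so that the conditional structure of the mismodelling differs between the two classes.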
We generate two statistically independent datasets for toy data and toy simulation, respectively, with ten million events each. More technical details for the generation of the dataset are given in Appendix B. The code for generating the dataset is publicly available.

4 Training and results on the physics-inspired dataset

Preprocessing and training
Figure 4 illustrates the forward pass of the normalising flow for the example of the informative feature v_1^B. In the forward pass, all four informative features are transformed to the four-dimensional base distribution via autoregressive rational quadratic splines, as described in Section 2. The MADE blocks are conditioned on the IsData boolean and the three ancillary variables p_T, η and N. Thus, the learning objective is the conditional probability p(v_1^A, v_2^A, v_1^B, v_2^B | p_T, η, N, IsData). This setup corresponds to the typical application of this correction method in high-energy physics, where the model is trained in a control region and applied in a signal region with different distributions of the ancillary features. The preprocessing of the input variables includes a reweighting step for the ancillary variables and a smoothing step for discontinuous informative features. Additionally, all variables are studentised, i.e. we use (v_i − m_v)/s_v, with m_v and s_v the sample mean and the sample standard deviation of the given sample of variable v, respectively. The reweighting and smoothing procedures are introduced below.

The reweighting step ensures that the distributions of the ancillary variables match well between data and simulation. The ancillary variables are used as conditions for the normalising flow so that the correction is determined as a function of these variables. The idea is that the corrections can be used in samples that show a different distribution in the ancillary variables than the distribution of the training sample. Thus, the reweighting step avoids effects from differences between data and simulation in the ancillary variables. The reweighting is performed simultaneously in the three ancillary variables with 16 bins for p_T, 16 bins for η and 10 bins for N. The binning is chosen such that each bin contains approximately the same number of events.
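A reweighting of this kind can be sketched with numpy histograms on quantile-based (equal-population) bin edges; the per-event weight for a simulated event is the ratio of data to simulation counts in its three-dimensional bin. This is our own simplified illustration, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def equal_population_edges(x, n_bins):
    # Bin edges such that each bin holds roughly the same number of events.
    return np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))

def reweight(sim, data, n_bins=(16, 16, 10)):
    # sim, data: arrays of shape (N, 3) holding (p_T, eta, N).
    edges = [equal_population_edges(data[:, i], b) for i, b in enumerate(n_bins)]
    h_data, _ = np.histogramdd(data, bins=edges)
    h_sim, _ = np.histogramdd(sim, bins=edges)
    ratio = np.divide(h_data, h_sim, out=np.ones_like(h_data), where=h_sim > 0)
    # Look up the 3D bin of each simulated event and assign the ratio as weight.
    idx = [np.clip(np.searchsorted(e, sim[:, i], side="right") - 1, 0, b - 1)
           for i, (e, b) in enumerate(zip(edges, n_bins))]
    return ratio[tuple(idx)]

sim = rng.normal(0.1, 1.0, size=(50_000, 3))
data = rng.normal(0.0, 1.0, size=(50_000, 3))
w = reweight(sim, data)
```

After applying the weights, the simulated distribution of the ancillary variables matches the data within the binning resolution, so that residual differences no longer leak into the learned conditional mappings.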
Discontinuous informative features are subject to a smoothing step, making them continuous. The reason is that normalising flows use differentiable transformations to map the original distribution to a differentiable base distribution. As this can only be approximate for discontinuous input distributions, the overall performance of normalising flows for the morphing application may suffer in such cases [29]. In our dataset, this applies to the variables v_1^B and v_2^B. The smoothing step aims to fill gaps between continuous parts of distributions. In our dataset, the discontinuity consists of a peak at zero that is followed by a gap and a smoothly falling tail. Figure 5 (left) shows the feature v_2^B as an example. We substitute any zero value with a random value drawn from a triangular distribution with a fixed slope and then move the tail close to the end point of the triangle. We chose a triangular distribution for simplicity and because it fulfils the requirement of not overlapping with the continuous tail, so that the inversion of the random sampling is unambiguous. In the case of our dataset, we include an additional logarithmic transformation that spreads the values within the previous gap closer to the tail. The result of the smoothing step for v_2^B is shown in Figure 5 (right).
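The essence of the smoothing step is an invertible substitution: the peak at zero is replaced by samples from a triangular distribution placed entirely below the start of the tail, so that mapping back is unambiguous. The following is a simplified sketch of that idea with illustrative values (the logarithmic transformation and the tail shift described above are omitted):

```python
import numpy as np

rng = np.random.default_rng(3)

X_T = 1.0  # start of the continuous tail (x_t); values in (0, X_T) are empty

def smooth(v):
    # Replace the peak at zero by samples from a triangular distribution
    # on [-1, 0], which cannot overlap with the tail at v >= X_T.
    out = v.astype(float).copy()
    zeros = out == 0.0
    out[zeros] = rng.triangular(left=-1.0, mode=0.0, right=0.0, size=zeros.sum())
    return out

def unsmooth(v):
    # Inversion is unambiguous: anything below the tail start maps back to 0.
    out = v.copy()
    out[out < X_T] = 0.0
    return out

# Mixture of a peak at zero and a shifted exponential tail, as in v_i^B.
v = np.where(rng.uniform(size=10_000) < 0.4,
             0.0, X_T + rng.exponential(1.0, 10_000))
assert np.array_equal(unsmooth(smooth(v)), v)  # the substitution is invertible
```

After the flow has morphed the smoothed variable, applying `unsmooth` restores the discrete peak, so the correction can migrate events between the peak and the tail without ever presenting a discontinuous density to the flow.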
As a final preprocessing step, all four informative features and the three ancillary variables are studentised. This also applies to the smoothed and transformed v_i^B variables. As for the benchmark datasets in Section 2, the generated datasets are again separated into training, validation and test datasets with a split of 60%, 10% and 30%. The whole dataset is composed of 2.5 million data samples and 2.5 million simulation samples. We do not use the entire generated dataset, as this is not necessary for the successful training of the model.
A normalising flow as introduced in Section 2, and illustrated in Figure 4, is used to correct the simulation in the physics-inspired dataset, using six monotonic rational-quadratic transformations. We did not perform a detailed hyperparameter optimisation, as we observed stable and satisfactory results without extensive tuning. As a cross-check, we trained a model with double the number of transformations and three layers in the auxiliary networks, but we did not observe any significant differences in the performance of the corrections. Early stopping is used to end the training process once the loss on the validation dataset fails to improve for 15 consecutive epochs.

Evaluation of the corrections
We evaluate the quality of the corrections by checking the agreement between data and corrected simulation in (a) the marginal distributions of the four informative features, (b) the Pearson correlation coefficients between all seven variables in the dataset, and (c) the output distribution of a boosted decision tree (BDT) that is trained to distinguish between data and corrected simulation.
The marginal distributions of the four informative features for uncorrected and corrected simulations and the data are shown in Figure 6. The distributions are normalised to unit area. The agreement between simulation and data is strongly improved by the normalising-flow corrections. We observe that the agreement after the correction is at the level of 1-2% in the bulk of the distributions for all four informative features. In the tails of the distributions, where the uncertainties due to the limited size of the test dataset (calculated from the variance of the sum of squared event weights) are of the order of a few per cent, the corrected simulation still agrees with the data within these uncertainties. The very good agreement in the discontinuous distributions v_1^B and v_2^B illustrates that the smoothing step in the preprocessing of these variables was successful. This is especially true for v_1^B, where a substantial migration is required from the peak at zero to the tail of the distribution for the simulation and the data to match.
Figure 7 shows the marginal distributions of the corrections applied to simulation for the four informative features. The continuous variables, v_1^A and v_2^A, show a smoothly falling distribution with the maximum at zero. The discontinuous variables, v_1^B and v_2^B, show a sharp peak at zero and then a second maximum. While the peak at zero comes from samples that were not moved in v_1^B or v_2^B, respectively, the second maximum originates from migrations from the original peaks to the tails of the distributions and vice versa.
To assess whether the normalising-flow correction can also capture correlations correctly, we compare the Pearson correlation coefficients (ρ) between all seven variables (informative features and ancillary variables) for simulation and data. In Figure 8, we show the difference between the correlation coefficients between nominal simulation and data on the one hand and corrected simulation and data on the other hand. The correlation coefficients for data and nominal simulation display notable differences, as induced during the generation of the dataset. The comparison between data and corrected simulation reveals a significant improvement. After the correction, the agreement in the correlation coefficients is below or at the level of 1%. This shows that the normalising flow is able to learn and correct the non-trivial correlation structure in the datasets.
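A comparison of this kind reduces to differencing two Pearson correlation matrices, which can be sketched with np.corrcoef (the toy arrays below are our own illustration):

```python
import numpy as np

def correlation_difference(a, b):
    # a, b: arrays of shape (N, n_vars). Returns the matrix of differences
    # of the Pearson correlation coefficients, in per cent.
    return 100.0 * (np.corrcoef(a, rowvar=False) - np.corrcoef(b, rowvar=False))

# Toy check: two 2-variable datasets with correlation 0.8 and 0.0, respectively.
rng = np.random.default_rng(5)
z = rng.normal(size=(100_000, 2))
a = np.stack([z[:, 0], 0.8 * z[:, 0] + 0.6 * z[:, 1]], axis=1)
b = rng.normal(size=(100_000, 2))
diff = correlation_difference(a, b)  # off-diagonal entries near 80 (per cent)
```

Applied once to (nominal simulation, data) and once to (corrected simulation, data), this yields the two panels of Figure 8.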
As a final assessment of the quality of the corrections, we train a BDT to distinguish corrected simulated samples from data. We determine the classification power of the BDT to check how well the two underlying multivariate probability densities agree [30]. The idea is that the better the agreement between data and simulation, the more challenging it is for the BDT to differentiate between data samples and corrected simulation samples. For comparison, we also train a separate BDT to distinguish data samples from uncorrected simulated samples.
The BDTs are trained with the XGBoost package [31], using logistic regression for binary classification as the loss function. The training is stopped once the loss does not improve for 30 boosting rounds. A learning rate of 0.1 is used, and the trees can grow to a maximum depth of 10. For the training and evaluation of the BDTs, only the events from the original normalising-flow test dataset are used, which is again divided into a training (60%) and test dataset (40%).
The output distributions of the BDT that was trained to distinguish data and corrected simulation are shown in Figure 9. The distributions for data and corrected simulation are very similar. ROC curves for the two BDTs are shown in Figure 10. The classifier is sufficiently powerful to distinguish data and uncorrected simulated events with an area under the curve (AUC) of 0.88 on the test dataset. Distinguishing data from corrected simulation is significantly more challenging: The ROC curve is close to the ROC curve for a random guess and has an AUC of 0.52 on the test dataset. These results indicate that the normalising-flow correction can also correct complex structural differences between the variables in the toy datasets that go beyond what can be captured with Pearson correlation coefficients.
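For reference, the AUC can be computed directly from classifier scores as a rank statistic (the Mann-Whitney U), which makes the interpretation "AUC ≈ 0.5 means indistinguishable" easy to check. The toy scores below are our own illustration, not the BDT outputs of this study:

```python
import numpy as np

def auc(scores_pos, scores_neg):
    # Probability that a random positive-class score exceeds a random
    # negative-class score (Mann-Whitney U), which equals the area
    # under the ROC curve for continuous scores.
    all_scores = np.concatenate([scores_pos, scores_neg])
    ranks = np.argsort(np.argsort(all_scores)) + 1.0  # 1-based ranks
    r_pos = ranks[: len(scores_pos)].sum()
    n_pos, n_neg = len(scores_pos), len(scores_neg)
    return (r_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(4)
# Identical score distributions -> AUC near 0.5 (classes indistinguishable);
# well-separated score distributions -> AUC close to 1.
auc_same = auc(rng.normal(0, 1, 10_000), rng.normal(0, 1, 10_000))
auc_sep = auc(rng.normal(2, 1, 10_000), rng.normal(0, 1, 10_000))
print(auc_same, auc_sep)
```

In this language, the measured AUC of 0.52 for data versus corrected simulation means the BDT is barely better than assigning random scores to both classes.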
We observed reduced BDT classification performance, i.e. improved performance of the corrections, when we improved the reweighting of the ancillary variables in the preprocessing from uniform binning to binning with approximately the same number of events per bin. This indicates that part of the residual difference between corrected simulation and data stems from small differences in the ancillary variables and not from imperfections in the normalising-flow morphing.
As an alternative to the autoregressive structure, we perform a test with a normalising flow based on GLOW coupling blocks [32], where the affine transformation is replaced by the monotonic rational-quadratic transforms. When training the BDT to discriminate between data and simulations corrected with this alternative setup, we observe very similar performance, with an AUC on the test dataset of 0.52. We also study the behaviour of the corrections over the full range of the ancillary features. This is important for many applications in high-energy physics, in which the corrections are typically obtained from a phase space with a different distribution of the ancillary variables than the phase space where the corrections are ultimately applied. We partition the events into 8 equally populated bins in p_T, η and N. For each bin, we obtain the 25%, 50% and 75% quantiles of the output distribution of the BDT that we trained to distinguish the data from the corrected simulation. Figure 11 shows these quantiles for data and corrected simulations as a function of p_T, η and N. The curves for data and corrected simulations are very similar, which indicates that the corrections are stable as a function of the three ancillary variables.

Conclusions
We have introduced an approach for morphing two multidimensional distributions that is based on a single autoregressive normalising flow and a boolean variable as a condition. We have applied this approach to correct simulation to correspond closer to data in a physics-inspired toy dataset of four informative features and three ancillary variables. The toy dataset presents non-trivial correlations between variables and pronounced differences in the distributions between simulation and data as a function of the ancillary variables. In addition, we have described a smoothing procedure for discontinuous input distributions that is beneficial as a preprocessing step for normalising flows.
We find an agreement at the level of 1-2% in the bulk of the marginal distributions of corrected simulation and data. In the tails of the marginal distributions, corrected simulation and data agree at the level of a few per cent. In addition, we find large improvements in the agreement between the correlations in simulation and data. Using a boosted decision tree to classify corrected simulation and data, we observe that both classes are hardly distinguishable after morphing and that the performance of the morphing is stable as a function of the ancillary variables.
The morphing approach with a single normalising flow shows excellent performance and seems a promising tool for the correction of complex simulated distributions in high-energy physics and beyond. Finally, we note that it is simple to extend the boolean switch to a multiclass condition to morph between more than two domains (cf. Appendix C).

A Two-dimensional visualisation of the dataset
Figure 12 visualises the physics-inspired dataset by showing two-dimensional 68% contours of each pair of variables. This shows their non-trivial correlations and highlights that these are different between toy simulation and toy data.

Figure 12: Visualisation of the physics-inspired dataset in selected ranges for simulation (blue) and data (orange). On the diagonal, the marginal distributions are shown. On the off-diagonal, the 68% contours obtained from kernel density estimation are depicted. The ancillary variables are defined in a way that p_T is unitless, η takes only positive values and N is defined in the interval [0, 3].

B Details of the dataset generation
The p_T variable is distributed according to the exponential density given in equation (3). The specific values for the parameters that were used to generate the samples for this study are given in Table 1. After the generation of these uncorrelated variables, the induce_correlations function of the mcerp package [27,28] is used to bestow non-trivial correlations among all seven variables. The degenerate values at zero for the v_i^B variables are smeared with small uniform noise to avoid that the correlation algorithm fails, as it is based on the per-variable ranking of the input data. The smeared values are mapped back to zero after the correlations are induced.

C Morphing between three domains
The single-flow morphing can easily be extended to morph between more than two domains by using a one-hot encoding instead of a boolean condition. We illustrate this in Figure 13 by showing the results of morphing between three two-dimensional distributions: checkerboard, two moons and four circles. As in the two-domain case in Section 2.2, the single-flow approach is able to reproduce the sharp edges and discontinuous features of the distributions.

Figure 1 :
Figure 1: Top: Illustration of the single-flow morphing. The normalising flow is trained to map both data and simulation to the same base distribution. The flow is conditioned on a boolean that encodes whether the input is drawn from simulation or data. Bottom: Illustration of the preservation of quantiles during the morphing from simulation to data space using the base distribution as an intermediary.

Figure 2 :
Figure 2: The upper plots show the morphing from the checkerboard distribution into the four-circles distribution (left) and into the two-moons distribution (right). The lower plots illustrate the inverted transformation.

Figure 3 :
Figure 3: Marginal distributions for the seven variables in the data and simulation datasets. The three ancillary variables are shown in the upper figures and the four informative features in the lower figures. The ancillary variables are defined in a way that p_T is unitless, η takes only positive values and N is defined in the interval [0, 3].

Figure 4 :
Figure 4: Illustration of the forward pass of the normalising flow for the example of informative feature v_1^B. The four informative features are transformed by the autoregressive structure one at a time. The ancillary variables and the IsData boolean are conditional inputs to the Masked Autoencoder for Distribution Estimation (MADE) neural network, which generates the parameters for the rational quadratic splines that transform the variables.

Figure 5 :
Figure 5: Distribution of the informative feature v_2^B before (left) and after the smoothing and the logarithmic transformation (right) for nominal simulation and data, normalised to unit area. The first bin in the distribution on the right includes the underflow. The last bin in both distributions includes the overflow.

Figure 6 :
Figure 6: Nominal simulation (blue), corrected simulation (red) and data (black) marginal distributions (normalised to unit area) for the informative features. The two panels below each figure show the ratio of nominal/corrected simulation and the data with a range of 1.0 ± 0.2 and with a closer zoom of 1.00 ± 0.05. Markers that are not shown in the ratio plots are out of the y-axis range. The first bin in the upper distributions includes the underflow. The last bin in all distributions includes the overflow.

Figure 7 :
Figure 7: Marginal distributions of the corrections applied to simulation for the four informative features ("distance traveled").

Figure 8 :
Figure 8: Differences of the Pearson correlation coefficients (ρ) between nominal toy simulation and toy data (left) and between corrected simulation and data (right). The ρ values are given in per cent. Values of ρ with |ρ| < 1% are not shown as numbers but are only encoded by the bin colour.

Figure 10 :
Figure 10: ROC curves for BDTs trained to separate between data and nominal (red) and corrected (green) simulation, evaluated on the test datasets (solid). The ROC curve for a random guess is shown as a black dashed line for reference.

Figure 11 :
Figure 11: 50% (solid curves) and 25% and 75% quantiles (dashed curves) of the BDT output distribution as a function of p_T (top, left), η (top, right) and N (bottom) for data (green) and corrected simulation (red). The quantiles are extracted in ten equally populated bins and the different values are shown at the bin centres. The ancillary variables are defined in a way that p_T is unitless, η takes only positive values and N is defined in the interval [0, 3]. Uncertainties from the limited number of simulated samples are shown as bands.

f(p_T; β_p) = (1/β_p) exp(−p_T/β_p) (3)

with scale parameter β_p. The values for the η variable are generated according to η = |u · n|, where the two factors are distributed according to a uniform and a Gaussian distribution, respectively: u ∼ U(−2, 2) and n ∼ N(1, η_s). The noise N is uniformly distributed in the interval [0, 3]. The values for the v_i^A variables are drawn from uncorrelated Gaussian distributions with conditioned means of μ − f_μ · η and standard deviations of max(σ − f_σ · p_T + 0.1 · N, 0.1). The v_i^B values are drawn from a mixture distribution composed of a Dirac delta peak at zero and a shifted exponential distribution. Its density has five parameters: r, f_r, x_t, β_B and f_B. The probability of assigning a value of zero is given by r + f_r · p_T − 0.1 · N. The exponential distribution is shifted by x_t to the right and the scale parameter is given by β_B + f_B · η + 0.2 · N. The parameter values in the v_i^A and v_i^B families differ for the individual variables.

Table 1 :
Parameter settings for the generation of the toy dataset used in this study.The values in the upper (lower) part were used to generate toy data (simulation).

Figure 13 :
Figure 13: Morphing between three domains: The upper row shows the morphing from the checkerboard to the two-moons and four-circles datasets. The middle (bottom) row shows the morphing from the two-moons (four-circles) to the other two datasets.