EmulART: Emulating radiative transfer—a pilot study on autoencoder-based dimensionality reduction for radiative transfer models

Dust is a major component of the interstellar medium. Through scattering, absorption and thermal re-emission, it can profoundly alter astrophysical observations. Models of dust composition and distribution are necessary to better understand and curb its impact on observations. A new approach for the serial and computationally inexpensive production of such models is presented here. Traditionally these models are studied with the help of radiative transfer modelling, a critical tool for understanding the impact of dust attenuation and reddening on the observed properties of galaxies and active galactic nuclei. Such simulations present, however, an approximately linear computational cost increase with the desired information resolution. Our new efficient model generator proposes a denoising variational autoencoder (or, alternatively, PCA) for spectral compression, combined with an approximate Bayesian method for spatial inference, to emulate high information radiative transfer models from low information models. For a simple spherical dust shell model with anisotropic illumination, our proposed approach successfully emulates the reference simulation starting from less than 1% of the information. Our emulations of the model at different viewing angles present median residuals below 15% across the spectral dimension and below 48% across the spatial and spectral dimensions.
EmulART infers estimates for ∼85% of the information missing from the input, all within a total running time of around 20 minutes, estimated to be 6× faster than the present target high information resolution simulations, and up to 50× faster when applied to more complicated simulations.


Introduction
Cosmic dust is ubiquitous in the Universe, particularly present in the interstellar medium (ISM) (Zhukovska et al, 2008) and in the line of sight towards astrophysical objects such as supernova remnants (Rho et al, 2009), galaxies (Dunne et al, 2011) and active galactic nuclei (AGN, Haas et al, 2000). Dust grains absorb and scatter UV/optical radiation and re-emit that energy at infrared wavelengths; they are thus responsible for both attenuation and reddening of light in the line of sight, which also impact distance measurements for cosmology when using "standard candles" such as supernovae (Betoule et al, 2014). Moreover, scattering off dust grains and dichroic absorption in the dusty medium may lead to polarization of the light as it traverses the interstellar medium.
The effect these processes (dust absorption, scattering and emission) have on the light detected from astronomical objects must be accounted for when studying their intrinsic properties. Moreover, the nature of the dust can also inform us about physical and chemical processes related to its own history (formation, variation in composition, growth and destruction) in different astrophysical structures such as accretion disks, clouds and galaxies, as well as about its interaction with magnetic fields through dust grain alignment. The analysis of the aforementioned interactions requires performing radiative transfer (RT) calculations (Steinacker et al, 2013), which allow us to simulate the light path, from the source to the observer, depending on the physical properties of the emitting source embedded in dust structures of different geometries, sizes and compositions. By comparing various RT models, based on different simulated properties, to global and pixel-by-pixel spectral energy distributions (SEDs) obtained in astronomical observations, we can infer valuable information on the properties of the light sources, as well as on the distribution and properties of the dust (André et al, 2010; Cox et al, 2012; Fritz et al, 2012).
Observations of molecular clouds (Beech, 1987; Falgarone et al, 1991) have shown that dust distributions are often inhomogeneous and complex; consequently, understanding their intrinsic properties requires 3D radiative transfer calculations. This type of non-local and non-linear problem requires calculations which are computationally very costly (Steinacker et al, 2013), which has prompted the search for alternative, non-analytic, approximate ways to address them. One of the most successful is Monte Carlo Radiative Transfer (MCRT, Mattila, 1970; Roark et al, 1974).
Monte Carlo Radiative Transfer methods simulate a large number of test photons that propagate, from their emission by a source, through the dusty medium. At every stage, the characteristics that define their paths are determined by generating random numbers from the probability density function best suited to each process they may undergo (absorption, scattering, re-emission, etc.). At the end of the simulation, the radiation field is recovered from a statistical analysis of the photon paths. As an ensemble, the distribution of particles provides a good representation of the radiative transfer, as long as a sufficient number of photons is chosen. Stellar Kinematics Including Radiative Transfer (SKIRT, Baes et al, 2011; Camps and Baes, 2015) is an MCRT suite that offers built-in source templates, geometries, dust characterizations, spatial grids and instruments, as well as an interface through which a user can easily describe a physical model. The user can in this way avoid coding the physics that describes both the source (e.g. AGN or galaxy type, observation perspective, emission spectrum) and the environment between the simulated source and observer (dust grain type and orientation, dust density distribution, etc.), and instead design a model of modular complexity by following a Q&A prompt (itself adaptable to the user's expertise).
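The photon-packet life cycle described above can be sketched in a few lines. The following toy Monte Carlo is our own illustration, not SKIRT code: the names (`propagate_packet`, `escape_fraction`) are invented, the geometry is a uniform slab rather than a configurable 3D grid, and scattering is isotropic. It shows the two essential samplings: path lengths drawn from an exponential optical-depth distribution, and interaction types drawn against the albedo.

```python
import math
import random

def propagate_packet(tau_max, albedo, rng):
    """Random walk of one photon packet through a uniform slab.

    tau_max: total optical depth of the slab along z.
    albedo:  probability that an interaction is a scattering (vs absorption).
    Returns True if the packet escapes through the far side of the slab.
    """
    tau_z = 0.0      # optical depth travelled along z
    mu = 1.0         # direction cosine; the packet starts moving "up"
    while True:
        # Path length to the next interaction: exponentially distributed.
        tau_step = -math.log(rng.random())
        tau_z += tau_step * mu
        if tau_z >= tau_max:
            return True        # escaped through the slab
        if tau_z < 0.0:
            return False       # left through the illuminated side
        if rng.random() > albedo:
            return False       # absorbed by a dust grain
        mu = 2.0 * rng.random() - 1.0   # isotropic scattering: new direction

def escape_fraction(n_packets, tau_max, albedo, seed=1):
    """Statistical analysis of the ensemble of packet paths."""
    rng = random.Random(seed)
    escaped = sum(propagate_packet(tau_max, albedo, rng) for _ in range(n_packets))
    return escaped / n_packets

# Optically thin slabs let most packets through; thick slabs do not.
thin = escape_fraction(20000, 0.05, 0.6)
thick = escape_fraction(20000, 5.0, 0.6)
```

As in any MCRT run, the estimates only become smooth as `n_packets` grows, which is exactly the cost/quality trade-off discussed below.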
MCRT simulations suffer from computational limitations, namely the memory requirement scaling with the volume grid density, and the processing time scaling quasi-linearly with the number of photons simulated (Camps and Baes, 2020). Autoencoders (Zhuang et al, 2021) together with collocation strategies (Guo et al, 2019) have been applied to solve complex interactions such as the ones described above; our approach differs in that it attempts to upscale the information density within a simulated data product instead. Considering that the objects and phenomena modeled by MCRT simulations present non-random spatial structures with heavily correlated spectral features, we tackle the computational cost issue by developing an emulator that achieves HPN-like (high photon number) MCRT models from LPN-like (low photon number) MCRT simulations, implementing an autoencoder neural network in combination with the Integrated Nested Laplace Approximation (INLA, Rue et al, 2009), an approximate method for Bayesian inference of spatial maps modeled with Gaussian Markov random fields. The results are then compared against an analogous implementation employing principal component analysis (PCA).
Section 2 provides a brief overview of the employed methods. Section 3 describes our pipeline architecture and some of the steps that led to its development. Results and performance evaluation follow in Section 4. Section 5 presents our perspective on the significance of the obtained results and provides the steps to follow in order to both improve and generalize them in future developments.
All files concerning this work (SKIRT simulations, R scripts, neural network models, emulation products and performance statistics) can be obtained from our repository.

Methods
To reduce the computational cost of SKIRT simulations, while compromising the quality of the resulting models as little as possible, an autoencoder (a dimensionality reduction neural network) is implemented to compress the spectral information within LPN spectroscopic data cubes. Approximate Bayesian inference is then performed with INLA on the spatial information of the compressed feature maps. Lastly, the reconstructed feature maps are decompressed to yield an HPN emulation.

SKIRT
As previously stated, SKIRT allows the creation of models through a Q&A prompt.
Through it the user can configure any one- to three-dimensional geometry and combine multiple sources and media components, each with their own spatial distribution and physical properties, by either employing the built-in library or importing components from hydrodynamic simulations. Media types include dust, electrons and neutral hydrogen; the user can configure their own mixture, including optical properties and size distributions, or simply choose from the available built-in mixtures. The included instruments allow the "observation" and recording of spectra, broad-band images or spectroscopic data cubes.
SKIRT uses a Monte Carlo method for tracing photon packets through the spatial domain (both regular and adaptive spatial grids are available, as well as some optimized for 1D or 2D models); the wavelengths of these packets are sampled from the source/medium spectrum (the wavelength grids can be configured separately for storing the radiation field and for discretizing medium emission spectra). As they progress through the spatial grid cells, photon packets can experience different physical interactions, such as multiple anisotropic scattering, absorption and (re-)emission by the transfer medium, Doppler shifts due to kinematics of sources and/or media, and polarization caused by scattering off dust grains as well as polarized emission, among others.
The present application is intended for the combination of both spatial and spectral information. The outputs used here are spectroscopic data cubes in the flexible image transport system (FITS) format (Wells et al, 1981), composed of 2D spatial flux distributions at the desired wavelength bins.
The simulated spatial flux densities vary according to the number of photons/photon packets simulated. The simulation starts by assigning energy to those photons following the spectral energy distribution (SED) of a given astrophysical source, ensuring that, no matter how many photons are simulated, the spectral information is preserved. Simulations with a lower photon number (LPN) will consequently display fewer spatial positions with information (non-zero flux), with some of those pixels having higher flux and some lower, i.e. the SED will have a lower signal-to-noise ratio than in simulations with a higher photon number (HPN). SKIRT has already been employed in the study of various galaxies (De Looze et al, 2014; Viaene et al, 2017; Verstocken et al, 2020), AGN (Stalevski et al, 2016, 2019) and other objects.
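The LPN/HPN behaviour just described can be mimicked with a toy numpy experiment (this is our illustration, not SKIRT output): each deposited packet carries `total_flux / n_photons`, so the total flux is preserved by construction, while simulating fewer packets leaves more zero-flux pixels.

```python
import numpy as np

def simulate_map(n_photons, grid=50, total_flux=1.0, seed=0):
    """Deposit n_photons over a grid; each carries total_flux / n_photons.

    Mimics how an MCRT image is built: the per-packet weight shrinks as
    more packets are simulated, so total flux is preserved while the
    spatial sampling becomes denser.
    """
    rng = np.random.default_rng(seed)
    # Toy spatial distribution: a 2D Gaussian blob of emission.
    xy = rng.normal(loc=grid / 2, scale=grid / 8, size=(n_photons, 2))
    ix = np.clip(xy.astype(int), 0, grid - 1)
    img = np.zeros((grid, grid))
    np.add.at(img, (ix[:, 0], ix[:, 1]), total_flux / n_photons)
    return img

lpn = simulate_map(10**3)   # few packets: sparse, noisy map
hpn = simulate_map(10**6)   # many packets: dense, smooth map
lpn_filled = np.count_nonzero(lpn) / lpn.size   # fraction of non-zero pixels
hpn_filled = np.count_nonzero(hpn) / hpn.size
```

Both maps integrate to the same total flux, but the LPN map concentrates it in fewer, noisier pixels, which is the missing-information regime the emulator starts from.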

Dimensionality Reduction
Dimensionality reduction methods can, more familiarly, be thought of as compressors. These are methods that analyse and transform data from a high-dimensional space into a low-dimensional space while attempting to retain as many meaningful properties of the original data as possible. Working in high-dimensional spaces can be undesirable for many reasons: raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data usually becomes computationally intractable. Popular dimensionality reduction techniques in astronomy include PCA (Krone-Martins and Moitinho, 2014; Logan and Fotopoulou, 2020) and Non-Negative Matrix Factorization (NMF, Ren et al, 2018; Boulais et al, 2021).

Denoising Variational Autoencoders
Autoencoders (AEs) are a type of neural network architecture employed to learn compressed representations of data. In such architectures the input and output layers have the same dimension, and the hidden layers display a funneling-in-and-out scheme in the number of neurons per layer, with the middle layer having the fewest neurons and the input and output layers the most. A model built this way can be seen as the coupling of a compressor/encoder and a decompressor/decoder, the first generating a (ideally) more fundamental representation of the data, and the second bringing it back to its initial feature space. After training, the encoder can be coupled at the beginning of other architectures, providing them with more efficient features to learn from. It is interesting to note that AEs have shown promise as auxiliary tools in citizen science projects, such as the Radio Galaxy Zoo (Ralph et al, 2019).
Alternative ways of training this kind of network exist, such as:
• Having multiple instances of each data point, resulting from the injection of noise or from a set of transformations. Each instance of the same data point is then matched to the same output, with the aim of making the model robust to noise and/or invariant to those transformations (a denoising autoencoder, DAE).
• Having the middle layer composed of two complementary layers of neurons (a mean layer and a standard deviation layer) instead of a single layer. Complemented with an appropriate loss function, the model will learn approximate distributions of values instead of single values, making it more robust and allowing the decoder to also become a generator of different, yet statistically identical, examples (a variational autoencoder, VAE).
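As an illustration of the first strategy, the sketch below trains a small numpy autoencoder on noisy inputs matched to clean targets. It is a plain denoising AE on synthetic data, not the DVAE used in this work (no stochastic latent layer, no KL term), and every name and dimension in it is illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Clean "spaxels": 200 samples of a smooth 16-feature spectrum family.
t = np.linspace(0.0, 1.0, 16)
amps = rng.uniform(0.5, 1.5, size=(200, 1))
clean = amps * np.sin(2.0 * np.pi * t)[None, :]

# Corrupted copies: noisy inputs matched to the same clean targets,
# which is the denoising training scheme described above.
noisy = clean + rng.normal(scale=0.1, size=clean.shape)

# One-hidden-layer autoencoder, 16 -> 4 -> 16, trained with plain
# gradient descent on the mean squared reconstruction error.
W1 = rng.normal(scale=0.1, size=(16, 4)); b1 = np.zeros(4)
W2 = rng.normal(scale=0.1, size=(4, 16)); b2 = np.zeros(16)
lr = 0.1

def forward(x):
    h = np.tanh(x @ W1 + b1)      # encoder: 4 latent features
    return h, h @ W2 + b2         # decoder: back to 16 features

def loss(out):
    return float(np.mean((out - clean) ** 2))

loss_before = loss(forward(noisy)[1])

for _ in range(500):
    h, out = forward(noisy)
    err = (out - clean) / len(noisy)       # gradient of the batch-mean SSE
    gW2 = h.T @ err; gb2 = err.sum(axis=0)
    dh = err @ W2.T * (1.0 - h ** 2)       # backprop through tanh
    gW1 = noisy.T @ dh; gb1 = dh.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

loss_after = loss(forward(noisy)[1])   # reconstruction vs the *clean* targets
```

Because the targets are the clean spectra, the network cannot simply copy its input: minimizing the loss forces it to map corrupted spaxels back toward the underlying signal.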
Jiwoong Im et al (2015) showed that the DVAE (denoising variational autoencoder), by introducing a corruption model, can lead to an encoder that covers a broader class of distributions when compared to a regular VAE (the corruption model may, however, remove information necessary for reconstruction). In this context the loss function to be minimized, L_DVAE, is given by the weighted sum of the Kullback-Leibler divergence (which influences the encoder) and the reconstruction error (see Eq. 1), similarly to a VAE, with the differences that the encoder now learns to approximate a prior p(z) given corrupted examples and that the reconstruction error can be interpreted as a denoising criterion:

L_DVAE = a_1 KL( q(z | ỹ) ∥ p(z) ) − a_2 E_{q(z | ỹ)}[ log p(y | z) ],  (1)

where KL refers to the Kullback-Leibler divergence, ỹ ∼ q(ỹ | y) is a sample of the corruption model, p(z) is the prior for the latent features, q(z | ỹ) models the encoder, p(y | z) models the decoder/generator and (a_1, a_2) are weights.
For more details the reader is referred to Jiwoong Im et al (2015) and Doersch (2016).

Principal Component Analysis
Principal Component Analysis (PCA) is a method that analyzes the feature space of the training data and creates orthogonal vectors (linear combinations of the initial variables) whose directions indicate the most variability. These new vectors of the transformed data set are called eigenvectors, or principal components (PCs), while the eigenvalues are the coefficients attached to the eigenvectors and give the relative amount of variance carried by each principal component.
PCA transforms the original space through rotation of its axes and rescaling of their ranges. The first PC is aligned with the direction of largest variance in the data. The second PC also maximizes the variance, while being orthogonal to the first, and so on for the remaining PCs.
Mathematically, these directions can be determined through the covariance matrix, as expressed in Eq. 2:

Σ_{ff'} = (1/N) Σ_{i=1}^{N} (X_{if} − X̄_f)(X_{if'} − X̄_{f'}),  (2)

where X̄_f is the mean of all values at feature f and N is the total number of data points (for more details and a modern review of PCA we suggest Jolliffe and Cadima, 2016). Once Σ_{ff'} is diagonalized, the PCs are its eigenvectors, the first PC being the one with the largest associated eigenvalue, and so on.
PCs are uncorrelated and frequently the information is compressed into the first K components, with K ≪ M (where M is the total number of features of the original space).
A data point from the original data set can then be reasonably recovered through Eq. 3:

X ≈ X̄ + Σ_{m=1}^{K} c_m P_m,  (3)

where X̄ represents the mean of all data points, P_m is the m-th PC and c_m is the projection of the data point on P_m.
Using all M PCs, the reconstruction becomes identical to the original data; but since the new basis captures a large fraction of the variance in a small number of components K, dimensionality reduction is achieved by keeping only those. In this work we used two approaches to determine K (see Section 3.4).
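A minimal numpy illustration of the procedure in Eqs. 2-3 (on synthetic data of our own making): build the covariance of the mean-centred features, diagonalize it, keep K ≪ M components, and reconstruct.

```python
import numpy as np

rng = np.random.default_rng(0)

# 500 data points in M=10 features, with most variance in 2 directions.
latent = rng.normal(size=(500, 2)) * np.array([5.0, 2.0])
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + rng.normal(scale=0.1, size=(500, 10))

# Covariance matrix of the mean-centred features (Eq. 2).
Xbar = X.mean(axis=0)
cov = (X - Xbar).T @ (X - Xbar) / len(X)

# PCs are the eigenvectors of the covariance, sorted by eigenvalue.
eigval, eigvec = np.linalg.eigh(cov)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]

# Keep K << M components, project, and reconstruct (Eq. 3).
K = 2
P = eigvec[:, :K]                 # (10, K) principal components
c = (X - Xbar) @ P                # projections c_m of each data point
X_rec = Xbar + c @ P.T            # approximate reconstruction

explained = eigval[:K].sum() / eigval.sum()
rms_err = np.sqrt(np.mean((X - X_rec) ** 2))
```

Here two components explain almost all of the variance, so the 10-feature points survive a 5× compression with a reconstruction error at the level of the injected noise.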
Further discussion of the importance of PCA and its applications in astronomy can be found in, e.g., Sasdelli et al (2014).

Spatial Approximate Bayesian Inference
Bayesian inference (BI) refers to a family of methods of statistical inference in which a hypothetical probabilistic model is updated, following Bayes' theorem (Eq. 4), whenever new data are obtained. Bayes' theorem allows the calculation of a posterior distribution p(θ | y) (the conditional probability of θ given y) by weighting in p(y | θ) (the likelihood of y given θ), p(θ) (the estimated probability distribution of θ before observing y, also designated the prior), and p(y) (the marginal probability of y, obtained by integrating θ out of p(y | θ)):

p(θ | y) = p(y | θ) p(θ) / p(y).  (4)

This kind of update is of utmost relevance in the dynamical analysis of data streams or in the analysis of correlated data, and has been proposed in astronomy for problems ranging from the study of variable stars (Zorich et al, 2020) to 3-D mapping of the Milky Way (Babusiaux et al, 2020).
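As a toy illustration of such updating, the snippet below applies Bayes' theorem (Eq. 4) twice over a discrete grid of parameter values for a Bernoulli success probability, the posterior of the first batch of data becoming the prior of the second:

```python
from math import comb

# Discrete prior over theta, the success probability of a Bernoulli process.
thetas = [i / 100 for i in range(1, 100)]       # candidate parameter values
prior = [1 / len(thetas)] * len(thetas)         # flat p(theta)

def update(prior, k, n):
    """One application of Bayes' theorem for k successes in n trials."""
    likelihood = [comb(n, k) * t**k * (1 - t) ** (n - k) for t in thetas]
    evidence = sum(l * p for l, p in zip(likelihood, prior))   # p(y)
    return [l * p / evidence for l, p in zip(likelihood, prior)]

posterior = update(prior, k=7, n=10)       # first batch of data
posterior = update(posterior, k=8, n=10)   # posterior becomes the new prior

mean = sum(t * p for t, p in zip(thetas, posterior))
```

After both updates the posterior mean sits near the pooled success rate (15 of 20 trials), showing how each batch of data refines the same probabilistic model.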
Approximation techniques (Tierney and Kadane, 1986;Hinton and van Camp, 1993;Minka, 2013) have been developed over the years in order to help curb the very time-consuming process of sampling the whole likelihood p(y | θ).
To this end, we here implement the Integrated Nested Laplace Approximation (INLA, Rue et al, 2009).

Integrated Nested Laplace Approximation
INLA is an approximate BI method that accounts for spatial correlations between observed data points to recover an assumed Gaussian latent field; in doing so, it is not only capable of predicting unobserved points of that field, but also of correcting noisy observed ones, as well as of associating a variance to its estimates. The joint posterior distribution of the latent effects, x, and of the hyper-parameters, θ, given a set of observations y = (y_1, ..., y_n), follows from Bayes' theorem:

p(x, θ | y) = p(y | x, θ) p(x, θ) / p(y) ∝ p(y | x, θ) p(x, θ).  (5)

Each observation y_i is treated with a latent Gaussian effect x_i (a Gaussian distribution of mean value µ_i and standard deviation σ_i), with i ∈ {1, ..., n}. The observations are conditionally independent given the latent effects x and the hyper-parameters θ; the model likelihood is then:

p(y | x, θ) = ∏_{i=1}^{n} p(y_i | x_i, θ).  (6)

The joint distribution of the latent effects and the hyper-parameters, p(x, θ), can be written as p(x | θ) p(θ), where p(θ) represents the prior distribution of the hyper-parameters. It is assumed that the spatial information can be treated as a discrete sampling of an underlying continuous spatial field, a latent Gaussian Markov random field (GMRF), that takes into account the spatial correlations, and whose hyper-parameters are inferred in the process.
For a GMRF, the distribution of the latent effects is:

p(x | θ) ∝ |Q(θ)|^(1/2) exp( −(1/2) x^T Q(θ) x ),  (8)

where Q(θ) represents a precision matrix, or inverse of a covariance matrix, which depends on a vector of hyper-parameters θ. This kernel matrix is what actually treats the spatial correlation between neighboring observations. Using equation 6, the joint posterior distribution of the latent effects and hyper-parameters can be written as:

p(x, θ | y) ∝ p(θ) |Q(θ)|^(1/2) exp( −(1/2) x^T Q(θ) x + Σ_{i=1}^{n} log p(y_i | x_i, θ) ).  (9)

Instead of obtaining the exact posterior distribution from equation 9, INLA approximates the posterior marginals of the latent effects and of the hyper-parameters; its key methodological feature is to use appropriate approximations for the following integrals:

p(x_i | y) = ∫ p(x_i | θ, y) p(θ | y) dθ,  (10)

p(θ_j | y) = ∫ p(θ | y) dθ_{−j},  (11)

where θ_{−j} is the vector of hyper-parameters θ without element θ_j.
INLA constructs nested approximations:

p̃(x_i | y) = ∫ p̃(x_i | θ, y) p̃(θ | y) dθ,   p̃(θ_j | y) = ∫ p̃(θ | y) dθ_{−j},

where p̃(· | ·) denotes an approximated posterior density. Using the Laplace approximation, the posterior marginal of the hyper-parameters at a specific value θ = θ_j can be written as:

p̃(θ_j | y) ∝ p(x, θ_j, y) / p̃_G(x | θ_j, y), evaluated at x = x*(θ_j),

where p̃_G(x | θ, y) is the Gaussian approximation to the full conditional of x, and x*(θ_j) is the mode of the full conditional of x for a given θ_j. The posterior marginals of the latent effects are then numerically integrated as follows:

p̃(x_i | y) ≈ Σ_j p̃(x_i | θ_j, y) p̃(θ_j | y) Δ_j,

where Δ_j represents the integration step.
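The role of the precision matrix Q(θ) can be illustrated with a toy 1-D GMRF in numpy (our own construction, with the hyper-parameter fixed rather than inferred): a random-walk prior combined with a handful of noisy observations yields the posterior mean of the whole field. Since likelihood and prior are both Gaussian here, the posterior is exact; INLA's Laplace machinery is what extends this picture to non-Gaussian likelihoods.

```python
import numpy as np

n = 100
# Precision matrix Q of a 1-D random-walk GMRF: a first-difference
# penalty, sparse and banded, which is what makes GMRF algebra fast.
D = np.diff(np.eye(n), n=1, axis=0)        # first-difference operator
kappa = 50.0                               # smoothness hyper-parameter (fixed)
Q = kappa * D.T @ D + 1e-6 * np.eye(n)     # tiny ridge keeps Q invertible

# Sparse noisy observations of a smooth latent field.
rng = np.random.default_rng(3)
truth = np.sin(np.linspace(0, 2 * np.pi, n))
obs_idx = rng.choice(n, size=15, replace=False)
sigma = 0.05
y = truth[obs_idx] + rng.normal(scale=sigma, size=obs_idx.size)

# Gaussian posterior of the latent field under Gaussian noise:
# posterior precision = Q + A^T A / sigma^2,
# posterior mean solves (posterior precision) @ mu = A^T y / sigma^2.
A = np.zeros((obs_idx.size, n))
A[np.arange(obs_idx.size), obs_idx] = 1.0
post_prec = Q + A.T @ A / sigma**2
mu = np.linalg.solve(post_prec, A.T @ y / sigma**2)

rms = np.sqrt(np.mean((mu - truth) ** 2))  # field recovered from 15 of 100 points
```

The precision matrix encodes exactly the neighbor-to-neighbor correlation discussed above: unobserved points borrow strength from nearby observed ones, which is how the pipeline later fills in missing pixels.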
A good approximation for p̃(x_i | θ, y) is required, and INLA offers three different options: Gaussian approximation, Laplace approximation and simplified Laplace approximation (Rue et al, 2009). In this work we used the simplified Laplace approximation, which represents a compromise between the accuracy of the Laplace approximation and the reduced computational cost achieved with the Gaussian approximation.
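The idea behind the Laplace approximation itself can be shown on a one-dimensional toy posterior of our own choosing: replace the density by a Gaussian centred on its mode, with variance set by the curvature of the log-posterior there, and compare against exact moments computed on a grid.

```python
import numpy as np

# Toy target: a Beta(8, 4) posterior kernel for a rate parameter theta.
a, b = 8.0, 4.0

def logpost(th):
    return (a - 1) * np.log(th) + (b - 1) * np.log(1 - th)

# Mode from d/dt logpost = 0  ->  t* = (a-1)/(a+b-2)
t_star = (a - 1) / (a + b - 2)
# Curvature: d2/dt2 logpost = -(a-1)/t^2 - (b-1)/(1-t)^2
curv = -(a - 1) / t_star**2 - (b - 1) / (1 - t_star) ** 2
sd_laplace = np.sqrt(-1.0 / curv)   # Gaussian sd from the curvature

# Exact moments of the same posterior, computed on a dense grid.
t = np.linspace(1e-6, 1 - 1e-6, 200001)
w = np.exp(logpost(t)); w /= w.sum()
mean_exact = (t * w).sum()
sd_exact = np.sqrt(((t - mean_exact) ** 2 * w).sum())
```

The Gaussian sits at the mode rather than the mean and slightly misstates the spread of this skewed posterior, which is why INLA refines the plain Gaussian approximation with the (simplified) Laplace variants.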
INLA has been shown (Rue et al, 2009) to compare favourably with other approximate inference methods, such as variational Bayes (Hinton and van Camp, 1993) and expectation-propagation (Minka, 2013); these methods are, however, not only slower, but they also struggle with estimating the variance of the posterior, since they execute iterative calculations instead of the analytic approximations used by INLA (Rue et al, 2009).
INLA nonetheless suffers from some limitations. The first, already mentioned above, is that to get meaningful results the latent field to be inferred must be Gaussian (which is not always the case) and must display conditional independence properties. The second is that fast inference requires the number of hyper-parameters (which characterize the parameter models) to be smaller than 6, and the number of available observations of the field to be much smaller than the size of that field.
INLA is freely available as an R package (Bachl et al, 2019), and it has already been shown: 1) to be capable of recovering structures in scalar and vector fields, with great fidelity, out of sparse sets of observations, and even of inferring structures never seen before; 2) to be robust to noise injections (González-Gaitán et al, 2019). We refer to Rue et al (2009) and Gómez-Rubio (2021) for more details on the mathematical background of INLA and the methods it employs.

Implementation
This section describes the dataset, the combination of a DVAE (or PCA) with INLA to enhance low information density SKIRT simulation data cubes, and the tools used to do so.
Our pilot pipeline aims to emulate radiative transfer models.

Dataset
In this work 30 SKIRT simulations were used, for separate purposes. All simulations model a spherical dust shell, composed of silicates and graphites, surrounding a bright point source with anisotropic emission (Stalevski et al, 2016), as defined by Eq. 17 following Netzer (1987), where θ is the polar angle of the coordinate system. Each realization is a cube of 300 by 300 pixel maps at 103 distinct wavelength bins. The first 39 wavelength bins were discarded (leaving us with 64) for displaying very low signal of randomly scattered emission (fewer than 0.0001% of the pixels at these wavelengths display flux density different from 0). The final dataset thus includes 90,000 spaxels 5 per cube, each spaxel with 64 fluxes, or "features", at wavelength bins ranging from ∼1 µm to 1 mm 6.
The 30 realizations differ from each other by up to three parameters: the tilt angle, φ, of the object as seen by the observer (0°, face-on, and 90°, edge-on 7); the optical depth 8, τ_9.7 9, of the dust shell (0.05, 0.1 and 1.0); and the amount 10 of photon packets simulated, N_p ∈ {10^4, 10^5, 10^6, 10^7, 10^8}. For each particular τ_9.7 and φ combination, the corresponding N_p = 10^8 realization was regarded as the HPN reference, or "ground truth", for the purpose of evaluating the performance of our routines, since it yields the highest information density, while all other simulations, with N_p ∈ {10^4, 10^5, 10^6, 10^7}, were considered LPN simulations. Fig. 1 illustrates the difference between HPN references across different τ_9.7 and φ combinations, while Fig. 2 shows LPN models with differing N_p, keeping τ_9.7 = 0.05 and φ = 0°. Tab. 1 displays the difference in quality between the individual spaxels 11 that compose each LPN realization and the HPN reference; the median, M, and mean absolute deviation (MAD) of the normalized residuals (see Eq. 18) of every pixel within each LPN realization; as well as the total information ratio (TIR) for all realizations, here defined as the ratio between the number of pixels with flux different from 0 in an LPN input or emulation, N'_{X≠0}, and that same number for the HPN reference, N_{X≠0} (see Eq. 19), as an information metric to balance against the normalized residuals 12:

res(X', X) = |X' − X| / X,  (18)

TIR = N'_{X≠0} / N_{X≠0},  (19)

where X' is an estimated flux and X the corresponding reference value.

5 By spaxel we refer to the array across the spectral dimension at given pixel spatial coordinates.
6 This wavelength range covers almost completely the infrared part of the light spectrum.
7 The face-on/edge-on terminology refers to the shape of a disk-like object as seen by an observer when the tilt angle between the plane of the disk and the plane of the observer is, respectively, 0° or 90°.
8 The optical depth, τ, describes the fraction of light that is transmitted through a material following the relation I_T = I_0 e^(−τ), where I_0 is the incoming light and I_T the outgoing light.
9 Optical depth at wavelength 9.7 µm, corresponding to the peak emission wavelength of silicates.
10 Throughout this text the symbol N_X will stand for "amount/number/quantity of X".
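Under our reading of Eqs. 18-19 (the exact display forms were set out above only from the surrounding definitions), the metrics just defined can be computed as follows, on toy arrays rather than simulation output:

```python
import numpy as np

def total_information_ratio(lpn, hpn):
    """TIR (Eq. 19): ratio of non-zero-flux pixel counts, LPN over HPN."""
    return np.count_nonzero(lpn) / np.count_nonzero(hpn)

def normalized_residuals(est, ref):
    """|X' - X| / X (Eq. 18) over reference pixels with non-zero flux.

    When the estimate X' is 0 the residual is 100%: there is
    effectively no estimate for that reference value.
    """
    mask = ref != 0
    return np.abs(est[mask] - ref[mask]) / ref[mask]

rng = np.random.default_rng(7)
hpn = rng.random((4, 4))                                      # toy reference map
lpn = hpn * rng.choice([0.0, 1.0], size=hpn.shape, p=[0.4, 0.6])  # drop ~40%

tir = total_information_ratio(lpn, hpn)
res = normalized_residuals(lpn, hpn)
med = np.median(res)
mad = np.mean(np.abs(res - med))   # mean absolute deviation, as in Tab. 1
```

In this toy case every surviving pixel is exact (residual 0) and every dropped pixel has residual 1, so the median/MAD pair summarizes quality while the TIR summarizes quantity of information.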
The realizations were split into two subsets: one to train the autoencoder and perform the first batch of tests of the emulation pipeline, labeled AESet and described in Section 3.1.1; and another, comprised exclusively of data the autoencoder did not see during training, to better assess EmulART's performance, labeled EVASet and described in Section 3.1.2.

11 We discriminate spaxels according to their completeness along the spectral dimension. Spaxels that have 0 flux at all wavelength bins are classified as "null spaxels"; spaxels that have positive flux at all wavelength bins are classified as "full spaxels"; spaxels in between the two previous cases are classified as "partial spaxels"; finally, the "empty spaxel" information metric is the difference between the amount of null spaxels within the HPN reference and a given LPN realization.
12 Should the estimate, X', be 0, the normalized residual will be 100%, meaning there is no actual estimate for the reference value, X.

(Tab. 1 caption) The quantity and quality of the available information is further described by the total information ratio (TIR) and the amounts of spaxels of different qualities. Null spaxels are spaxels with zero information; empty spaxels are null spaxels that were not tagged as such in the HPN reference; full spaxels are spaxels that have information at every wavelength; and partial spaxels are those that are neither null nor full.

AESet
This subset is comprised of 5 SKIRT simulation outputs of the same model: a spherical shell of dust, composed of silicates and graphites, with optical depth τ_9.7 = 0.05, surrounding a bright point source with anisotropic emission seen face-on, φ = 0°, making the emission appear isotropic. The only parameter differing across the realizations in AESet was N_p ∈ {10^4, 10^5, 10^6, 10^7, 10^8} (see Fig. 2). The N_p = 10^8 realization is the HPN one and was used as the reference, or "ground truth", both for training the autoencoder model and for evaluating the performance of EmulART.
Since the goal of our methodology is to reconstruct the reference simulation, the spaxels of all cubes within AESet were used to train the DVAE model.
For this we split AESet into a training set (5/6) and a test set (1/6). Spaxels of different cubes that share the same spatial coordinates were assigned to the same ensemble. This strategy, which aims for the DVAE model to achieve a denoising capability, consists in always matching input spaxels, resulting from realizations with different N_p, to the HPN reference's version at the output layer.
In SKIRT, choosing to simulate fewer photons results in an output with more zero-flux-density pixels, which in turn means that more spaxels will be null at more wavelength bins. Even though this is different from noise, it is akin to missing data, whose impact we aim to curb by implementing a DVAE architecture.
Before training the DVAE model we perform some spaxel selection and pre-processing tasks on AESet, which are thoroughly described in Appendix A.

EVASet
This subset is comprised of the 25 SKIRT simulation cubes that also model a spherical shell of silicates and graphites surrounding a bright anisotropic point source, but with different combinations of φ, τ_9.7 and N_p from those in AESet.
To all realizations in EVASet we applied the same feature selection and flux de-normalization tasks described in Section 3.1.1 and Appendix A.
The 20 LPN realizations within EVASet (N_p ∈ {10^4, 10^5, 10^6, 10^7}) were used as input to EmulART for a deeper assessment of the capabilities of our emulation pipeline. The remaining 5 HPN cubes (those with N_p = 10^8) were used as references to compute performance metrics.
A list detailing the parameters of the SKIRT simulations used in this work, as well as their split into AESet and EVASet, can be consulted in Appendix B. We measured performance by the percentage residuals both of the individually reconstructed spaxels of the test subset of AESet and of each of AESet's cubes' integrated SEDs. We selected the set of hyper-parameters listed below for being the most consistent across different tests. Fig. 4 shows the validation loss closely following the training loss, indicating successful convergence of the model to our data.

Training the DVAE
• Latent feature amount: 8
• Activation function: SELU (Klambauer et al, 2017) and sigmoid (for the output layer only)

14 Number of samples that will be passed through the network at one time. An epoch is complete once all samples have been passed through the network. Increasing the batch size accelerates the completion of each epoch but may also degrade the quality of the model.
15 Limit, between 0 and 1, for the weight of bias neurons.
16 The learning rate is a parameter that determines the step size to take, at each iteration, in the direction determined by the optimization function so as to reach the minimum of the loss function.
17 Number of epochs to wait before implementing a change or stopping the training procedure.
18 A custom variation of the one presented in the Keras documentation, as the original version of this loss function makes use of the mean squared error instead.
19 A stochastic gradient descent method based on adaptive estimation of first-order and second-order moments.
20 Number of epochs to wait, with no significant improvement in validation loss, before reducing the learning rate.
21 Number of epochs to wait, with no significant improvement in validation loss, before stopping the training process.
22 Ratio between the new and the old learning rate.

DVAE Emulation Pipeline
Within EmulART, the feature space is first compressed with the variational encoder; the latent space is then sampled and the resulting latent feature spatial maps are reconstructed with INLA; finally, the reconstructed wavelength (original feature space) maps are recovered with the decoder. Additionally, to conform the data to each of the different stages of the pipeline, we perform some operations described below.
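This flow can be sketched as a simple composition, with toy linear stand-ins for the trained encoder, INLA, and decoder (the function names, the identity INLA stand-in, and the array sizes are illustrative assumptions, not the pipeline's actual interfaces):

```python
import numpy as np

rng = np.random.default_rng(0)
NW, NZ = 105, 8  # wavelength bins (assumed value) and the 8 latent features

# Toy linear stand-ins; the real pipeline uses the trained DVAE
# encoder/decoder and INLA for the spatial reconstruction.
W_enc = rng.standard_normal((NW, NZ)) * 0.01
W_dec = rng.standard_normal((NZ, NW)) * 0.01

def encode(spaxels):               # (n_spaxels, NW) -> (n_spaxels, NZ)
    return spaxels @ W_enc

def spatial_reconstruct(z_maps):   # INLA stand-in: identity on full maps
    return z_maps

def decode(z):                     # (n_spaxels, NZ) -> (n_spaxels, NW)
    return z @ W_dec

def emulate(cube):                 # cube: (ny, nx, NW) flux density cube
    ny, nx, nw = cube.shape
    z = encode(cube.reshape(ny * nx, nw))       # 1. spectral compression
    z_maps = z.reshape(ny, nx, NZ)              # 2. latent feature spatial maps
    z_rec = spatial_reconstruct(z_maps)         # 3. spatial inference
    out = decode(z_rec.reshape(ny * nx, NZ))    # 4. back to wavelength space
    return out.reshape(ny, nx, nw)
```

The point of the sketch is the ordering: compression and decompression act spaxel-by-spaxel along the spectral axis, while the spatial inference acts map-by-map on the latent features in between.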
2. Determine the value range, R f , of the map;
3. Offset the value range of the map so that the new minimum is 0;
4. For the offset map, Z f (x, y), determine the minimum positive value, m 1 f , and divide it by the square of R f to obtain the new minimum, m 2 f ;
5. Obtain the final map, Z f (x, y), by offsetting the value range by m 2 f .
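A minimal sketch of these range transformations (the function name is hypothetical, the variable names follow the text, and the edge case of a constant map is not handled):

```python
import numpy as np

def offset_map(zmap):
    """Range transformation applied to a latent feature map before INLA
    (a sketch of steps 2-5 above; step 1 is not reproduced in the text)."""
    r = zmap.max() - zmap.min()   # 2. value range R_f of the map
    z = zmap - zmap.min()         # 3. offset so that the new minimum is 0
    m1 = z[z > 0].min()           # 4. minimum positive value m_1f ...
    m2 = m1 / r**2                #    ... divided by R_f squared gives m_2f
    return z + m2                 # 5. offset the value range by m_2f
```

The net effect is a map whose values are all strictly positive, with the former zeros pushed to a small positive floor that scales with the map's dynamic range.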

PCA Emulation Pipelines
For the PCA emulation pipeline, a new PCA model of the input feature space is constructed every time (unlike EmulART, whose DVAE model was trained on a subset of AESet, as described in Section 3.2). Once that model is constructed, two approaches are followed: the first uses the elbow method (see Fig. 6) to find the threshold number of components, K, that explains the data without over-fitting it, uses INLA to spatially reconstruct those component maps, and then returns them to the original feature space. In the second approach, 8 principal components are used (the same number as the latent features in the DVAE emulation pipeline).
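The elbow-based choice of K can be sketched with a plain SVD-based PCA; since the exact elbow criterion used here is not specified, the maximum-gap-to-chord rule below is an assumption:

```python
import numpy as np

def pca_elbow_k(spaxels):
    """Fit a PCA model of the input feature space (rows = spaxels,
    columns = wavelengths) and pick the number of components K with a
    simple elbow criterion (a sketch; the paper's exact rule may differ)."""
    x = spaxels - spaxels.mean(axis=0)
    # Explained variance ratio per component via SVD of the centered data.
    s = np.linalg.svd(x, compute_uv=False)
    var = s**2 / (s**2).sum()
    # Elbow: the point of the cumulative-variance curve farthest above the
    # straight line joining its first and last points.
    c = np.cumsum(var)
    t = np.linspace(c[0], c[-1], c.size)
    return int(np.argmax(c - t)) + 1
```

For inputs whose variance is dominated by a few components this picks a small K, while noisier LPN inputs flatten the curve and push K upwards, consistent with the K > 30 values reported below.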
After some preliminary tests, the data range transformations described in the previous section, applied before and after spatial reconstruction, were also removed, as they failed to improve execution time or reconstruction accuracy. Moreover, spaxel de-normalization (see Section 3.1.1) was removed as it led to the underestimation of the integrated SEDs of the reference model (possible reasons for this are presented in Section 4.1.1).
A scheme presenting the most relevant operations for both implementations of the PCA emulation pipeline can be found in Appendix D.

Results and Discussion
In this section we present and discuss some of the results from testing EmulART on AESet, which are compared against the results obtained with the PCA emulation implementations, and on EVASet. Our goal in first using AESet was to evaluate the performance of the pipeline as a whole, mostly because the decoder network was not trained on latent and INLA reconstructed spaxels.
The realizations from EVASet were then used to better gauge its performance.
We created the emulations using different LPN inputs (N p ∈ {10 4 , 10 5 , 10 6 , 10 7 }). Because INLA performs faster on sparse maps, for each of those LPN realizations an emulation was performed by sampling different percentages of each latent feature map. These sampling percentages resulted from sampling 1 pixel in each bin of 2 × 2, 3 × 3 and 5 × 5 pixels, corresponding respectively to 25%, 11% and 4% of the spatial data. With the intent of reducing the influence of null spaxels on the spatial inference, 90% of the null spaxels were rejected from each map's sampling pool.
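The regular-grid sampling with null-spaxel rejection described above can be sketched as follows (the function interface is hypothetical; the 10% retention of null spaxels is taken from the text):

```python
import numpy as np

def sample_grid(zmap, step, null_mask, rng, keep_null_frac=0.1):
    """Keep 1 pixel per step x step bin of a latent feature map
    (step = 2, 3, 5 -> 25%, 11%, 4% of the pixels) and reject 90% of
    null spaxels from the sampling pool. Returns sampled coordinates."""
    ny, nx = zmap.shape
    yy, xx = np.meshgrid(np.arange(0, ny, step), np.arange(0, nx, step),
                         indexing="ij")
    idx = np.stack([yy.ravel(), xx.ravel()], axis=1)
    keep = []
    for y, x in idx:
        # Drop 90% of grid points that fall on null spaxels.
        if null_mask[y, x] and rng.random() > keep_null_frac:
            continue
        keep.append((int(y), int(x)))
    return keep
```

INLA then only sees the values at the returned coordinates, which keeps the inference fast while still anchoring the null region with a few zero-valued points.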
Our analysis of the results consisted of inspecting how well EmulART reproduces the spectral and spatial features of the reference simulations, as well as the total computational time it took for the emulation to be completed.
To evaluate the spectral reconstruction we looked at the normalized residuals (see Eq. 18) between the integrated SEDs 13 of our emulations and of the HPN reference. We also inspected the spatial maps of the compressed features, looking for spatial distributions compatible with physical properties of the simulated model. The spatial reconstruction was also evaluated through the median and MAD of the normalized residuals of our emulations, as well as of their LPN inputs 26 , at each wavelength. For the statistical analysis of the residuals, reference pixels with value 0 were not considered, since the metric diverges for them; for this reason the TIR for all emulations and simulations was calculated as well.
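A sketch of these per-wavelength statistics, assuming Eq. 18 has the usual percentage form 100 × (emulation − reference) / reference (the exact expression is not reproduced in this section):

```python
import numpy as np

def residual_stats(emul, ref):
    """Median and MAD of the normalized percentage residuals of one
    wavelength map, ignoring reference pixels with value 0 (for which
    the metric diverges)."""
    mask = ref != 0
    res = 100.0 * (emul[mask] - ref[mask]) / ref[mask]
    med = np.median(res)
    mad = np.median(np.abs(res - med))
    return med, mad
```

Because zero-valued reference pixels are excluded here, a complementary global metric (the TIR) is still needed to account for them, as the text notes.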

AESet Predictions
In this section we present and discuss the results obtained emulating the HPN reference of AESet using the different LPN realizations within it.
The upper panels of Fig. 7 show that the emulations' integrated SEDs reproduce the shape of the reference's: a slow rise in the 1 µm to 8 µm range, the two emission bumps in the 8 µm to 20 µm range, and the steep decline towards longer wavelengths. Moreover, Tab. 2 displays the median and MAD of the residuals of the integrated SEDs for the LPN input realizations before and after being de-normalized (as described in Section 3.1.1), as well as those of the different emulations obtained from them. It is clear that using N p ≥ 10 6 realizations as input yields emulations whose integrated SEDs closely follow the reference's (median residuals smaller than 15%) throughout the whole wavelength range, independently (within this subset) of the sampling percentage chosen for the spatial inference task.
From the residuals of the emulations' integrated SEDs, shown in the lower panels of Fig. 7, we conclude that: shorter wavelengths yield higher residuals; more input data for the spatial reconstruction yields a better emulation, but at the cost of an increased run time 27 , as can be confirmed in Tab. 3; and that using the N p = 10 6 realization as input greatly improves the quality of the emulation relative to the two lowest photon number alternatives.
Looking at Tabs. 1 and 3 we can also compare the overall performance of the pipeline at estimating information that was not available in the LPN input realizations. The amounts of differently classified spaxels in both LPN inputs and respective emulations show, together with the median of the normalized residuals, that EmulART successfully estimates information missing from the input. Fig. 8 shows the comparison, at wavelength 9.28 µm, between the emulations resulting from the LPN inputs (see Fig. 2) and the HPN reference. Once again we can see that with the N p = 10 6 realization the emulations start to display resemblances to the HPN reference, not only in the range of flux density values but also in the morphology that emerges from their distribution.

27 INLA's reconstruction is influenced as much by the amount of data it takes as input as by how that data is spatially distributed. Though on average a larger uniform sample will be better than a smaller uniform sample, it is possible that a particular smaller sample exists with a distribution that better captures the information of the field and yields a better reconstruction. In the present case, we avoid this variability by using a regular sampling grid.

Table 2: Comparison of statistics for the residuals of the integrated SEDs, for the case of a dust shell with τ 9.7 = 0.05 and φ = 0 • , for the different LPN realizations (columns 2 and 3), for those same realizations after being de-normalized (columns 4 and 5), as described in Section 3.1.1, and for the resulting emulations (columns 7 and 8), while using different sampling amounts (column 6) for the spatial reconstruction.

Table 3: Statistics on how the different emulations, for the case of a dust shell with τ 9.7 = 0.05 and φ = 0 • , compare to the HPN reference. The median normalized residuals and TIR display great improvement over all LPN realizations (see Tab. 1). The emulations that used the N p = 10 6 and N p = 10 7 realizations as input display a negative amount of empty spaxels, indicating that beyond reconstructing all empty spaxels they also inferred some of the reference's nulls.
From Tabs. 2 and 3, as well as from Figs. 7 and 8, it would be natural to conclude that using the N p = 10 7 realization as input brings the most benefit in terms of the amount of information inferred by the emulation as well as its accuracy. Moreover, as can be seen in the second column of Tab. 3, the run time of the emulations depends more on the amount of information sampled for the spatial reconstruction than on the N p of the LPN input. Nevertheless, considering how SKIRT's run time for a model scales with the simulated N p , the choice of LPN input to use with EmulART should weigh the time it takes to produce that LPN input as well as the quality of the emulation expected from it.

PCA Pipeline Predictions
The results of testing the PCA pipelines on AESet were processed in the same way as EmulART's (statistical indicators regarding the residuals of the emulation and of its spatial integration were calculated, as well as the TIR; the number of different types of spaxels and the execution time were also measured).
The execution times 28 for the pipeline implementing an 8 PC model were indistinguishable from those of EmulART on the same input, while the execution time for the pipeline implementing a PCA model with the number of components, K, determined by the elbow method was in general much higher, since for all but one of the LPN inputs K was larger than 30 (the time per component map was very similar, depending mostly on the amount of spatial information sampled).

Tabs. 4 and 5 present the most significant indicators to be compared to EmulART's. At first sight the performance of both PCA models within the emulation pipeline seems similar to, and in some cases even better than, EmulART's, showing statistically similar residuals and presenting residuals of the spatially integrated SEDs 2 to 3 times lower. They are, however, unable to consistently recover complete spaxel information, which leads to poor spatial reconstructions at longer wavelengths, even when using K > 8 (see Fig. 9), as can be inferred from the TIR values (as well as the number of full and partial spaxels).
28 These do not include the time it took to train the PCA models. Training time for each PCA model was below 10 seconds, while training the DVAE took around 18 hours (∼ 6500× longer). This was expected given the methods themselves as well as the differences between training procedures: the DVAE model was trained with 6× the amount of data of each PCA model, with the purpose of generalizing across simulations resulting from different photon packet numbers, while each PCA model was trained on the LPN simulation that was used as input for emulation, to compare against the results of EmulART.

In the present application we thus find the implementation of a DVAE model for spectral compression to be justified. Unlike the PCA models, it captures non-linear relationships between the spectral features while achieving comparable results. Despite some loss regarding the reconstruction of the integrated SED profile, compared to the PCA models it achieves an equal or higher compression rate and equal or lower execution time, and, most importantly, the spatial structure can successfully be reconstructed by the remaining parts of the pipeline.
As such, the test results of EmulART on EVASet are compared against its results on AESet. Results obtained with the PCA emulation pipelines on EVASet did not offer a different perspective from the one above and as such will not be discussed further here (a successful implementation of PCA in an emulation pipeline is described in Smole et al (2022, submitted)).

EVASet Predictions
In this section we present and discuss the results obtained by emulating the HPN references of EVASet using the respective LPN realizations within it.
The DVAE model was not trained on any data within this set, which allows us to evaluate whether it manages to accurately predict HPN-like spaxels from LPN spaxels that originate from simulations with different τ 9.7 and φ values.
First we tested EmulART on realizations with τ 9.7 = 0.05 and φ = 90 • ; we then applied the pipeline to different LPN realizations with τ 9.7 ∈ {0.1, 1.0}. Similarly to the τ 9.7 = 0.05 and φ = 0 • emulations, the edge-on, φ = 90 • , emulations appear to preserve the spectral information well, reproducing the slow rise in the 1 µm to 8 µm range, the two emission bumps in the 8 µm to 20 µm range, and the steep decline towards longer wavelengths, as can be seen in Fig. 10. Tab. 6 shows that sampling 4% of the spatial information of the N p = 10 6 realization is enough to obtain median integrated residuals below 15%.
We note however the abnormal performance of the emulations that took as input the N p = 10 7 realizations, which display higher median integrated residuals than those that used different samplings of the N p = 10 6 LPN. This may indicate that one or more of the spatial data manipulation modules, or their interfaces, should be improved upon.

Table 6: Comparison of statistics for the residuals of the spatially integrated SEDs, for the case of a dust shell with τ 9.7 = 0.05 and φ = 90 • , for the different LPN realizations (columns 2 and 3), for those same realizations after being de-normalized (columns 4 and 5), as described in Section 3.1.1, and for the resulting emulations (columns 7 and 8).
As for the emulations using as input LPN realizations of simulations with τ 9.7 ∈ {0.1, 1.0}, at both tilt angles, we observe that both the shape and the flux density value range of the integrated SED degrade as τ 9.7 increases. As shown in Fig. 11, the emulations' integrated SEDs fail to reproduce the shape of the HPN references, reproducing instead the shape that characterized the realizations present in AESet, a clear sign of over-fitting of the DVAE model. This can be solved by expanding the training set of our DVAE architecture to include spaxels originating from simulations with different optical depths.
For τ 9.7 = 1.0, in both φ cases, we observe the influence of the first wavelengths (see Fig. 12) on the overall residuals 29 . Fig. 13 shows that though the overall morphology of the spatial distribution is well recovered, the emulations' flux density value range is drastically underestimated, while the contrast between the central and peripheral regions is considerably higher than what the HPN references display.
These results appear to show that our pipeline is capable of recovering 40% to 60% of the emergent spatial information of HPN MCRT models from LPN realizations, taking as input as little as 0.04% of the information that would be present in the HPN model, all while preserving 85% to 95% of the spectral information.
Furthermore, the results also show a clear bias in the performance of the DVAE model as a compressor and decompressor of spectral information, with the performance degrading substantially as the LPN input models depart from the optical depth, τ 9.7 , values present within the training set.
Further details of the results we obtained with AESet and EVASet are discussed in Appendix E.
29 For more details consult the data products available in the repository.

Fig. 7: Emulations' integrated SEDs resulting from spatial inference using 4% (7a), 11% (7b) and 25% (7c) samples of the spatial information, and the respective normalized residuals, for the case of a dust shell with τ 9.7 = 0.05 and φ = 0 • . The HPN reference is represented in black (•), the emulation based on the N p = 10 4 realization in red, on the N p = 10 5 in green (+), on the N p = 10 6 in blue (×) and on the N p = 10 7 in cyan.

Summary
We report the development of a pipeline that implements, in conjunction, a denoising variational autoencoder (or alternatively PCA) and a reconstruction method based on approximate Bayesian inference, INLA, with the purpose of emulating high information-like MCRT simulations using LPN MCRT models, created with SKIRT, as input. With this approach we aim at the rapid expansion of libraries of synthetic models against which to compare future observations. The positive preliminary indicators presented here show that such a framework is worth pursuing further, with multiple avenues to explore.
Conditions for systematically measuring the computational cost are necessary to properly evaluate the merit of this approach. However, in this work our aim was to qualitatively assess the potential of this method when applied to MCRT simulation images. In this pilot study we chose a very simple and computationally inexpensive model of a centrally-illuminated spherical dust shell, whose reference simulations took around two hours to compute. Thus, in our particular examples, we reduced the computational time by approximately 6×. Nevertheless, the computational cost of SKIRT simulations scales with the amount of photon packets simulated, the spatial resolution of the grid and the actual geometrical complexity of the model. As for our emulation pipeline, its computational cost is only noticeably impacted when increasing the size and density of the spatial grid to be processed by INLA. This leads us to believe that a generalized version of this pipeline may expedite, by up to 50×, the study of dust environments through this kind of radiative transfer models.
Further exploration of the proposed DVAE architecture is being undertaken, via expansion and diversification of the training set to improve the prediction of the spectral features.
Other approaches to be tested include the use of dropout (cutting connections, mid-training, whose weights are below a given threshold) and incremental learning (training a model on new data with the starting point of the network being the weights obtained in a previous training session). The first serves as a feature selection tool, removing connections of little importance, which also prevents the model from over-fitting the training data; the second would be of great use to quickly adapt an already trained model to new data, which is important in the context of emulating simulations.
To improve the reconstruction of the spatial features, non-uniform sampling methods based on the spatial density of information may help with the latent features that result from compressing models simulated with an insufficient number of photons (N p < 10 6 , in the particular case of our study). Alternative pre-INLA data pre-processing, such as data value range manipulations and sampling grids, may also be worth exploring.

Declarations
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Appendix A Data Pre-processing

A.1 Identifying and Sampling Null-Spaxels
The SKIRT simulations of the spherical models result in a peripheral region filled with spaxels whose value is 0 at all wavelength bins; these will henceforth be referred to as null-spaxels. We perform two routines to identify and filter out null-spaxels, to curb their potential impact both when training the DVAE model and when (within the emulation procedure) reconstructing the spatial information of the latent feature maps:
1. Each feature (wavelength) of each spaxel is checked; if all features of a spaxel have value 0, then that spaxel is temporarily tagged as a null-spaxel;
2. An iterative process then checks the immediate neighborhood of each previously tagged spaxel (a 5 pix × 5 pix grid centered on the spaxel being tested) for other tagged spaxels; if a minimum of 4 neighbors share the null-spaxel tag, then that spaxel is tagged as a true null-spaxel, otherwise its temporary tag is removed.
The second routine serves to discriminate spaxels that have zero information at given wavelengths, possibly as a result of a low photon count simulation, from actual null-spaxels: spaxels resulting from the lack of photons traveling from the source to the observer along that direction, whether by direct emission, by scattering, or by absorption and re-emission by dust. Fig. A1 illustrates the results of routines 1 and 2.

Fig. A1: On the left, in black, the spaxels temporarily tagged as null-spaxels by routine 1; on the right, also in black, the final selection of null-spaxels that resulted from applying routine 2 with a search grid of 5 pix × 5 pix around each temporary null-spaxel. The result on the right shows that, for this particular case of a spherical shell of dust surrounding an isotropic emitter, we only find null-spaxels on the outskirts of the shell, as expected.
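Routines 1 and 2 can be sketched as follows (a single-pass approximation of the iterative routine 2 described above; the function name is illustrative):

```python
import numpy as np

def tag_null_spaxels(cube, min_neighbors=4):
    """Two-pass null-spaxel identification for a (ny, nx, n_wavelengths)
    cube, sketching routines 1 and 2 of Appendix A.1."""
    # Routine 1: temporarily tag spaxels that are zero at every wavelength.
    tmp = ~cube.any(axis=2)
    # Routine 2: keep the tag only if at least `min_neighbors` tagged
    # spaxels fall inside the 5x5 pixel window centered on the spaxel.
    ny, nx = tmp.shape
    final = np.zeros_like(tmp)
    for y, x in zip(*np.nonzero(tmp)):
        window = tmp[max(0, y - 2):y + 3, max(0, x - 2):x + 3]
        if window.sum() - 1 >= min_neighbors:  # exclude the spaxel itself
            final[y, x] = True
    return final
```

An isolated zero-valued spaxel (likely a low photon count artifact) loses its temporary tag, while contiguous zero regions, such as the shell's outskirts, survive as true null-spaxels.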
After identifying the null-spaxels we sample them to incorporate the filtered dataset.
For the DVAE training, these null-spaxels could have been removed completely, but we decided to reduce their incidence rate instead. Since there is no guarantee that their latent representation (where the spatial reconstruction will take place) will also be a null-vector, we want the DVAE model to learn null-spaxels; too many of them would nevertheless be uninformative and might lead to unintended biases. For this reason we sample 10% of them, reducing the null-spaxel prevalence in the training dataset from ∼25% to ∼2.5%. Empty spaxels (spaxels tagged as null in the LPN realizations but non-null in the respective HPN reference) were completely removed from training.
For the emulation process, and more specifically for the spatial reconstruction of the latent feature maps, we sample 10% of the null-spaxels selected by the spatial sampling grid, since INLA is faster with little data and that amount is sufficient for the null-region to be properly recovered 30 . After the emulation process, spaxels identified at this stage as null-spaxels have their values set to 0.

A.2 Reshaping Data Structures
Once all spaxels are selected, the data cubes are reshaped into a 2D structure, with a given pixel corresponding to a row and its spectral information on the columns (see Fig. 5), to match the input format of the DVAE. The inverse reshaping, from a 2D to a 3D structure, is performed after the DVAE output layer.
Within the emulation pipeline the data is reshaped two more times. First, after sampling the latent feature distributions (to obtain the Z arrays, as seen in Fig. 5), the resulting sample is reshaped from a 2D to a 3D structure to obtain latent feature spatial maps (the 8 1D arrays become 8 2D maps), whose spatial information can then be reconstructed with INLA. The second reshape is performed on INLA's reconstructed results to conform the data to the decoder's accepted input format, once again going from a 3D to a 2D structure.
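These reshapes are plain, lossless array operations; a sketch with assumed sizes (the actual map dimensions and number of wavelength bins are not stated here):

```python
import numpy as np

ny, nx, nw, nz = 32, 32, 105, 8  # assumed sizes; nz = 8 latent features

cube = np.arange(ny * nx * nw, dtype=float).reshape(ny, nx, nw)

# Cube -> 2D: one row per spaxel, wavelength bins on the columns (DVAE input).
flat = cube.reshape(ny * nx, nw)

# Latent sample (n_spaxels, nz) -> nz spatial maps for INLA ...
z = np.zeros((ny * nx, nz))
z_maps = z.reshape(ny, nx, nz)

# ... and back to the decoder's accepted 2D input format.
z_flat = z_maps.reshape(ny * nx, nz)

# The inverse reshape recovers the original cube exactly.
assert np.array_equal(flat.reshape(ny, nx, nw), cube)
```

Since NumPy reshapes are row-major views, no values are copied or reordered beyond the index bookkeeping, so the round trip is exact.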
30 Different simulated objects with different distributions of regions with 0 emission may need a higher sampling of null-spaxels.

Appendix B Dataset
Tab. B1 presents the list of SKIRT simulations used in this work. Every realization models a spherical dust shell, composed of silicates and graphites, surrounding a bright point source with anisotropic emission, with the realizations differing in parameters such as the tilt angle, φ, the optical depth, τ 9.7 , and the amount of photon packets simulated, N p .

Table B1: Parametrical description of the SKIRT realizations used in this work, along with the distribution of these realizations by subset. Realizations marked with (*) were not parsed as input to EmulART while testing the pipeline, being used merely as HPN references for the purpose of computing performance metrics.

Appendix C Latent Features
In this section we briefly discuss the relationship between each of the compressed features, as well as how they may relate to some physical attributes of the simulated models.
The PCC measures how strongly linearly correlated different parameters are, and is defined as the ratio between the covariance of the parameters and the product of their standard deviations (Eq. C1):

ρ X,Y = cov(X, Y) / (σ X σ Y) (C1)

In the map of latent feature Z1 (Fig. C2a) we can relate the spatial distribution to the hot dust emission, at wavelengths 1-3 µm, of the region close to the sublimation zone around the central source. Conversely, the map of latent feature Z2 (Fig. C2b) outlines the larger structure further away from the center, which corresponds to the colder dust emission at longer wavelengths.
As mentioned in Section 3.1, the model used throughout this work is that of a bright point-like source with anisotropic emission (Stalevski et al, 2016), where the emission is defined by Eq. 17, following Netzer (1987). Looking at the map of latent feature Z8 (Fig. C2c), the geometric signature of the model is clearly identifiable.
31 Though our pipeline yields 8 latent features for every spaxel, we chose to show only these 3 maps, since the remaining latent feature maps were either very similar to these or no other information could be extracted from them.
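The PCC as defined above can be computed directly, e.g. between two flattened latent feature maps (a minimal sketch):

```python
import numpy as np

def pcc(a, b):
    """Pearson correlation coefficient: cov(a, b) / (std(a) * std(b))."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    cov = ((a - a.mean()) * (b - b.mean())).mean()
    return cov / (a.std() * b.std())
```

Values near ±1 indicate near-redundant latent features, while values near 0 suggest the features encode independent (linear) information.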

Appendix E Result Analysis
This section presents a detailed view of the results obtained with both AESet and EVASet. As mentioned in Section 4, for every τ 9.7 and φ case the realization with N p = 10 8 was considered as the HPN reference, or "ground truth".
The remaining LPN realizations and the emulations obtained from those were compared to the HPN reference in regards to their spectral and spatial features.
For the spectral analysis we computed the normalized residuals (Eq. 18) of the spatially integrated SEDs, compared the shape of those SEDs and inspected the spatial maps of the DVAE compressed features looking for distributions compatible with the physical properties of the HPN reference.
For the spatial analysis we computed the normalized residuals and measured statistical descriptors, the median and the mean absolute deviation (MAD), for each wavelength bin spatial map, for both LPN inputs and the resulting emulations. To measure the impact of INLA on the reconstruction of input spaxels, we count how many full, partial and null spaxels are present in the HPN reference, the LPN input and the resulting emulations, and compute the TIR of each.
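The spaxel bookkeeping can be sketched as below; the full/partial/null terminology follows the text, while the exact TIR definition is not reproduced here:

```python
import numpy as np

def classify_spaxels(cube):
    """Count full, partial and null spaxels in a (ny, nx, nw) cube:
    full = non-zero at every wavelength, null = zero at every wavelength,
    partial = everything else."""
    nonzero = (cube != 0).sum(axis=2)
    nw = cube.shape[2]
    full = int((nonzero == nw).sum())
    null = int((nonzero == 0).sum())
    partial = int(cube.shape[0] * cube.shape[1] - full - null)
    return full, partial, null
```

Comparing these counts between an LPN input and its emulation quantifies how many missing wavelength bins the pipeline filled in.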

E.1.1 Spectral Predictions
In this section we discuss how well the spectral information is preserved after the LPN inputs of AESet are processed by EmulART, by looking into the emulations' spatially integrated SEDs.
From both Fig. 7 and Tab. 2 we determine the N p = 10 6 realization, with spaxel sampling of 4%, to be the most favorable input for our pipeline. This simulation is the result of 100× fewer photons simulated than the reference, and with the corresponding emulation using as input 4% of its spaxels, which results in 2500× less input information than is present in the HPN reference.
This emulation presents median residuals of ∼85% between 1 and 4 µm, ∼33% between 4 µm and 1 mm and ∼42% over the whole spectral range, and 2.8% when considering the SED that results from the spatial integration of each wavelength map (less than 2% worse than the realization used as input). It also reconstructs over 71,000 spaxels, presenting almost twice the amount of full spaxels of the reference. Appendix F shows how the pipeline goes even further, with the emulated individual spaxels presenting smoother profiles than the references.

E.1.2 Spatial Predictions
This section looks at the reconstructions of sparse samples of pixels from latent feature maps and the spatial distributions that result after returning those to the original spectral space.
The impact of INLA's spatial reconstruction is evidenced by the emulations' residuals having lower median values than all realizations but the reference, and mean deviations of the same order of magnitude. In the case of the emulations performed using the N p = 10 7 realizations as input this is even clearer, with the median of the spatial residuals at wavelengths greater than 5 µm going below 30%. Furthermore, as Fig. E1 shows, there are two transitions in quality evidenced by the gaps between the median residuals. It is also shown that all lower photon count simulations have their residuals dominated by lack of information (the reader is reminded that a 100% normalized residual means that the pixel holds no information), and that the spatial inference at lower wavelengths results in large over-estimations. The absence of information, which heavily influences the statistical descriptors of the simulations' residuals, is thus curbed. This is confirmed when comparing the TIR values, where we can see that the emulation based on the lowest N p realization has a TIR of 98.3% compared to the reference realization (see Tab. 2), while the N p = 10 7 realization's is 65.2% (see Tab. 1).

Fig. E1
Fig. E1: Median of the normalized residuals, at each wavelength's spatial map, for every emulation obtained with N p = 10 4 (red), N p = 10 5 (green), N p = 10 6 (blue) and N p = 10 7 (cyan) realizations, for the case of a dust shell with τ 9.7 = 0.05 and φ = 0 • . Emulations whose spatial inference was performed using as input 25% of the data are represented by ( ), 11% by (+) and 4% by (×). The dashed black line marks the same metric for the LPN realizations.

Fig. E2 shows the spatial distribution of the emulations' residuals at some wavelengths; it is clear that:
• The residuals are correlated with the distance to the center of the object, with the highest residuals populating the outer edge and the lowest in the center.
The above results were expected given that: we performed a uniform spatial sampling of an object with a spherically symmetric dust distribution; more spatial information is available at the center of the simulated object than in the outskirts; and, at shorter wavelengths, there is overall less information, as seen in the previous section, from which to build the spatial reconstruction.

Fig. E2: Spatial distribution of residuals, as defined in Eq. 18, for the emulation produced using as input a 4% sample of the N p = 10 6 realization, for the case of a dust shell with τ 9.7 = 0.05 and φ = 0 • , at wavelengths 0.84 µm (Fig. E2a), 1.85 µm (Fig. E2b), 9.28 µm (Fig. E2c) and 211.35 µm (Fig. E2d).
It is harder to assess how accurate INLA's inference of the outskirt regions is. Those regions, given the spatial distribution of the simulated model, tend to be less dense. Nevertheless, at most wavelengths, and over most of the spatial extent of the object, the residuals place the emulated values at the same order of magnitude as the reference values. Moreover, it is in these regions that INLA recovers more information, filling in the gaps.

E.2.1 Spectral Predictions
In this section we discuss how well the spectral information is preserved after the LPN inputs of EVASet with τ 9.7 = 0.05 are processed by EmulART, by once again looking into the emulations' spatially integrated SEDs.
Regarding the case with τ 9.7 = 0.1, Fig. E3 and Tab. E1 show that, for both tilt angles, while the shape of the emulations' integrated spectra still follows closely that of the references, the residuals are now over 2× higher than with τ 9.7 = 0.05. The results for the cases with τ 9.7 = 1.0 confirm this trend of degrading quality of the pipeline's spectral predictions. As seen in Fig. 11, the emulation's integrated spectrum completely misses the shape of the reference spectrum, keeping the shape displayed for the cases with τ 9.7 = 0.05 and τ 9.7 = 0.1, a sign of over-fitting which is confirmed when inspecting individual spaxels (see Appendix F). It also underestimates the flux density and overestimates the relative magnitude of the two emission bumps in the 8 µm to 20 µm range.
The median residual of the total flux density per wavelength, displayed in Tab. E1, further confirms the poor quality of the spectral predictions.

E.2.2 Spatial Predictions
Regarding spatial information, for emulations with LPN inputs of τ_9.7 = 0.05 and φ = 90° we observe the same behavior in the performance metrics as in the φ = 0° case. Tab. E2 shows that the overall median residuals decrease as N_p increases. The median residuals per wavelength, displayed in Fig. E4, present a similar behavior to those of the φ = 0° case.
Fig. E3: Emulations' integrated SEDs, resulting from spatial inference using 25% of spatial information, and the respective normalized residuals, for the cases of dust shells with τ_9.7 = 0.1, φ = 90° (Fig. E3a) and φ = 0° (Fig. E3b). The HPN reference is represented in black (•), the emulation based on the N_p = 10^4 realization in red, on the N_p = 10^5 in green (+), on the N_p = 10^6 in blue (×) and on the N_p = 10^7 in cyan.
In Fig. E6 the prediction for the peripheral region is more isotropic than the reference; this is even clearer when consulting the spatial distribution of residuals for these emulations in Fig. E7.
As with the τ_9.7 = 1.0 cases, for both tilt angle cases of τ_9.7 = 0.1 we observe a similar trend in the overall residuals. Fig. E9 shows that while the overall morphology of the spatial distribution is well recovered, the flux density value range of the emulations is underestimated, though not as considerably as in the τ_9.7 = 1.0 cases, and the contrast between the central and peripheral regions is higher than in the HPN references. This indicates a bias within the DVAE model.
Fig. E4: Median of the normalized residuals, at each wavelength's spatial map, for every emulation obtained with N_p = 10^4 (red), N_p = 10^5 (green), N_p = 10^6 (blue) and N_p = 10^7 (cyan) realizations, for the case of a dust shell with τ_9.7 = 0.05 and φ = 90°. Emulations whose spatial inference was performed taking as input 25% of the data are represented by ( ), 11% by (+) and 4% by (×). The dashed black line marks the same metric for the realizations used as input.
For both the face-on and edge-on cases, at all displayed positions, we can make the same assessment: the shape of the emulated spaxels does not follow that of the HPN references. As the simulated model's dust density increases, so does the likelihood of scattering and absorption of light by the medium, and consequently the peak emission features are flattened into the continuum. Figs. F4 and F5 show that the emulation of the spaxels in those regions is one to two orders of magnitude too low, while the relative flux between the emission peak, at 9.7 µm, and the continuum is overestimated; the allocation of flux along the spaxels does not match that of the reference. For the outer spaxels (Fig. F6), the flux mismatch is greater at longer wavelengths, yet the emulation spaxels still present a peak emission feature while the references no longer show it. This is a clear sign that the DVAE model is biased.
Fig. F1: Comparison of some emulation spaxels located at row = 150, col = 150, against the HPN references and the LPN input (indicated in each plot), for the case of a dust shell with τ_9.7 = 0.05 and φ = 0°. The emulations were performed using 25% of the LPN input spatial information. Panels: (a) N_p = 10^4, (b) N_p = 10^5, (c) N_p = 10^6, (d) N_p = 10^7. HPN reference spaxels in black (•), emulations in gold, and the LPN inputs, upon which the emulation was performed, in pink (+).
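A simple diagnostic of whether a spaxel retains the 9.7 µm feature is its peak-to-continuum flux ratio. A minimal sketch, assuming 1-D wavelength and flux arrays and illustrative continuum anchor wavelengths (not the paper's actual procedure):

```python
import numpy as np

def peak_to_continuum(wavelengths, flux, peak_um=9.7, continuum_um=(8.0, 13.0)):
    """Ratio between the flux density at an emission feature (by default
    the 9.7 um feature discussed in the text) and the mean flux density
    at two continuum anchor wavelengths; the anchor choice is illustrative.
    `wavelengths` and `flux` are 1-D arrays of equal length."""
    f_peak = np.interp(peak_um, wavelengths, flux)
    f_cont = np.mean([np.interp(w, wavelengths, flux) for w in continuum_um])
    return f_peak / f_cont

# Toy spaxel: flat continuum plus a Gaussian feature centred at 9.7 um.
wavelengths = np.linspace(8.0, 13.0, 51)
flux = 1.0 + np.exp(-0.5 * ((wavelengths - 9.7) / 0.4) ** 2)
print(round(peak_to_continuum(wavelengths, flux), 2))  # prints 2.0
```

A ratio near 1 in a reference spaxel but well above 1 in the corresponding emulated spaxel would quantify the spurious feature described above.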