Decoding Quantum Field Theory with Machine Learning

We demonstrate how one can use machine learning techniques to bypass the technical difficulties of designing an experiment and translating its outcomes into concrete claims about fundamental features of quantum fields. In practice, all measurements of quantum fields are carried out through local probes. Despite measuring only a small portion of the field, such local measurements have the capacity to reveal many of the field’s global features. This is because, when in equilibrium with their environments, quantum fields store global information locally, albeit in a scrambled way. We show that neural networks can be trained to unscramble this information from data generated from a very simple one-size-fits-all local measurement protocol. To illustrate this general claim we will consider three non-trivial features of the field as case studies: a) how, as long as the field is in a stationary state, a particle detector can learn about the field’s boundary conditions even before signals have time to propagate from the boundary to the detector, b) how detectors can determine the temperature of the quantum field even without thermalizing with it, and c) how detectors can distinguish between Fock states and coherent states even when the first and second moments of all their quadrature operators match. Each of these examples uses the exact same simple fixed local measurement protocol and machine-learning ansatz successfully. This supports the claim that the framework proposed here can be applied to nearly any kind of local measurement on a quantum field to reveal nearly any of the field’s global properties in a one-size-fits-all manner.


I. INTRODUCTION
Our current best understanding of nature comes from quantum field theory (QFT). However, the process of obtaining experimental information through measurements from QFTs is arguably difficult to formalize. For example, projective measurements in QFT are incompatible with its relativistic nature: they cannot be localized [1], they can introduce ill-defined operations [2] and can enable superluminal signaling even in simple setups [3-5]. Nevertheless, from high-energy physics experiments at the LHC to the capture of light at the human retina, quantum fields are subject to measurements where data is extracted through their interaction with localized probes. Such probes (e.g., atoms being excited by the electromagnetic field) can generally be modeled by particle detectors [6-8]. Particle detectors perform indirect measurements on the field that are well-defined [9-15] and physically meaningful [16,17], allowing us to formulate a consistent measurement theory for quantum fields [18,19].
However, it is not always obvious how we are to translate outcomes of measurements performed on local probes into claims about concrete features of a quantum field. This is even more difficult if the measurements are local (as they are, since realistic probes are necessarily localized systems) and we want to determine global properties of a field. Luckily for us, a thermalized quantum field stores information about its global structure locally, albeit in a very scrambled way [20-25]. The question is then how sophisticated our local measurement protocol (and the subsequent data analysis) must be in order to determine something of interest about the field from local measurement data. It is conceivable that if one is allowed sufficient measurements on a sufficiently large array of probes, one should be able to resolve any feature of interest of a quantum field, at least in principle. We say 'in principle' because, except for very few simple cases, there is usually no direct way of translating the theoretical predictions of particle detectors into specific values for the targeted features or parameters of the field. That makes it even more challenging to translate the readouts of local detectors into concrete claims about the field that they measure. Typically, the best one can do when trying to distinguish different field configurations is to look at differences in the detectors' transition probabilities (see, e.g., [8, 26-30]).
Moreover, the probes we use to measure quantum fields are usually simple in nature, and certainly much simpler, with far smaller Hilbert spaces, than the QFT itself. Additionally, in most cases we will not have access to a large array of probes with which to sample the field, but rather only a handful. Because of this, translating measurement data (e.g., a set of zeros and ones generated by measuring a two-level particle detector) into concrete claims about the field seems, a priori, a very complicated task.
In general, one may not expect a context-free solution to this question. One might expect that for each feature of the QFT we may be interested in, we will need to define 1) a different local measurement protocol and 2) a different data analysis procedure. This assumption, i.e., that in general one needs different experiments and data processing techniques to measure different things, is arguably accepted without question in experimental physics. Our goal here is to show that this is not the case. Concretely, we will show that fixing a simple local measurement protocol and a basic machine-learning data analysis ansatz is enough to unravel a wide variety of non-local features of the QFT with a one-size-fits-all method.
The core of this paper is to develop a universal framework to extract information about a QFT through local measurements. To demonstrate the breadth of this framework we will apply it to three examples. Namely, we will show how local probes can 1) learn about boundary conditions even before a signal has time to propagate from the boundary to the probe, 2) learn the KMS temperature of a field with great accuracy even before a probe has time to thermalize with the field, and 3) distinguish Fock and coherent states even when the first and second moments of all their quadrature operators match.
Our goal is to show that combining machine learning techniques with detector model tools from QFT allows us to avoid the complexity of designing specific local measurement protocols and translating their outcomes into concrete claims about the field. Rather, one can use a simple all-purpose measurement protocol. This offloads all the complexity to the data processing, which is handled via machine learning. The success of this framework in the three substantially different case studies presented should also be strong evidence of its generality: it can be applied regardless of the specific targeted feature of the QFT, hence in a universal, context-free manner.

II. GENERAL FRAMEWORK
In this Section we will introduce the general measurement protocol and data processing framework that constitute the core of our proposal. To make things concrete, let us assume that we have some foliation of spacetime associated to the coordinate frame (t, x) and a probe system, D, coupled to a quantum field, φ̂(t, x), in a local way. In particular, we will take the probe to be coupled linearly to the field via one of its observables μ̂_d(t) in the interaction picture as

Ĥ_int(t) = λ χ(t) μ̂_d(t) ⊗ ∫ dx F(x) φ̂(t, x),    (1)

where the switching function χ(t) and the smearing function F(x) characterize the locality of the interaction in time and space respectively. This coupling is motivated by the Unruh-DeWitt model [7,50], which captures the fundamental features of the light-matter interaction when exchange of angular momentum can be neglected [16,17,51,52]. Imagine that we are interested in some global property of the field; for instance, we might want to know whether this field lives in a spacetime with an open topology versus a closed one. To determine this, we might carefully design a measurement protocol, M(top), where we let a probe (or an array of them) interact with the field in a localized manner, and then perform measurements on the probe (or array of probes) to extract some data D ∈ R^{N_m}, where N_m is the number of measurements performed on the probe. A very simple example would be if M(top) specifies that we are to couple one single probe to the field for 1 second and then measure μ̂_d, repeating this process N_m times.
In addition to specifying the data collection process, we must also design a data-analysis function, f_top,M(top) : D → {open, closed}, which will (hopefully) determine with high accuracy the topology of spacetime from the data. Here, f_X,M is a data analysis function which produces claims about a feature X from data generated by the measurement procedure M. Simply put, given a feature of interest X, M(X) is the measurement scheme designed by the experimentalist to inform us about X specifically, and f_X,M(X) is the "dictionary" that translates the experimental data collected during the measurement into explicit statements about X, with some degree of accuracy.
The problem now is how we might go about designing a good M(top) and f_top,M(top). In designing M(top) one might try to find measurements which are particularly well suited for the identification of the topology. In particular, one might wish to produce data which is, somehow, explicitly revealing of the field's topology. In other words, if the experimenter does a good job designing M(top), then f_top,M(top) could be relatively trivial, i.e., the data would require little analysis. Unfortunately, this approach is not generally available to us. QFT forces us to use local measurements, and as a consequence some non-trivial data analysis is required to piece these local probe measurements together into a global picture that tells us about topology [21].
However, even if we could find a suitable M(top) and f_top,M(top) to determine the topology of a spacetime through the measurement of a quantum field, their utility would be limited to this particular feature. If we are interested in a different feature, F (the charge of the field, its mass, entanglement structure, the spacetime geometry, etc.), we would likely need to design a very different measurement protocol, M(F), which is well-suited to F, and a new data analysis function, f_F,M(F). While we may not have to design M(F) and f_F,M(F) from scratch (many intuitions and much previous knowledge may be applicable), this redesign is likely to be non-trivial. This seemingly uncontroversial statement, that you need different experiments and data processing techniques to measure different things, is one of the central (often unquestioned) tenets of experimental design in physics.
In this light, a question we address in this paper is whether, when we shift our interest to feature F, we can be lazy and keep our old measurement protocol, M(top). This would completely transfer the burden from designing a measurement protocol to designing a data-analysis function. In particular, we would now have to construct a function, f_F,M(top), which extracts information about F from data produced by M(top).
One may think this lazy strategy will not work for two main reasons. Firstly, one may have the intuition that since M(top) was not designed with feature F in mind, finding a good f_F,M(top) is likely to be difficult, if not impossible. Indeed, if we then changed our interest to some other feature F′ we would face the same difficult (if not impossible) task of finding a good f_F′,M(top). Secondly, one may be inclined to think that even if M(top) somehow does extract a sufficient amount of information to be able to discern from it a wide variety of features, then it is likely to be an excessively complicated measurement procedure.
However, as we will show in this paper, both of these concerns can be overcome. In particular, we will present a simple fixed local measurement protocol, M_0, designed without any specific feature of the QFT in mind. Despite this simplicity, we will show that the data produced by M_0 can be processed to produce accurate conclusions about a wide variety of features of the QFT. Moreover, we will show that the data-analysis functions f_F,M_0 which produce these conclusions can be easily generated from a basic machine-learning ansatz using standard supervised learning techniques.
This also connects with a fundamental question in the measurement theory of QFT: how do quantum fields store information? The fact that untargeted local measurements can be used to track non-local features of a QFT can ultimately be traced back to the fact that quantum fields tend to store global information locally (albeit perhaps in a scrambled way). Hence, simple local measurements are sufficient to extract global information, and machine learning is an effective way of unscrambling it; indeed, extracting non-trivial features from complex data sets is exactly the kind of task that machine learning is well-suited for. As we will see, the ability of neural networks to unscramble global features of the field from local probe information informs us about how fields store information as they react to changes in their environment.

A. Simple Fixed Measurement Procedure
In this Subsection we propose a simple measurement protocol (what we denoted M_0 above) to produce labeled data from a probe coupled locally to a quantum field. This data can then be processed to learn about different features of the QFT.
In order to make things concrete, we will now specify some details about the probe and its interaction with the field. It is critical to note that the methods discussed in this paper are not dependent on the particular details of the field, the probe, or their interaction, but we will particularize the choice of probe to the usual Unruh-DeWitt model in Eq. (1), a common simple (yet realistic [16,17,51,52]) model for particle detectors probing quantum fields.
Consider a local probe coupled to a quantum field linearly via (1). For illustration purposes, we model the probe as a harmonic oscillator with free Hamiltonian Ĥ_d = ℏω_d (p̂_d² + q̂_d²)/2, where q̂_d and p̂_d are the probe's unitless quadrature operators satisfying [q̂_d, p̂_d] = i𝟙. We take the probe to couple to the field via μ̂_d = q̂_d. Our measurement procedure is as follows:

1. Initialize the field according to some conditions labeled y. For instance, for a quantum field in an optical cavity, if y labels boundary conditions, this would mean preparing a cavity with those boundary conditions and letting the field equilibrate with the cavity walls. If y labels temperatures, this would mean preparing a cavity at that temperature and letting the field thermalize with the cavity.
2. Initialize the probe to its ground state. Couple the probe locally to the field at time t = 0 (i.e., the switching function χ(t) is zero before t = 0).
3. At time t_m = T_min > 0, perform a projective measurement of the probe's q̂_d quadrature and record the result.

4. Repeat steps 1 to 3, but this time measuring the p̂_d quadrature, and then once more measuring r̂_d := (q̂_d + p̂_d)/√2.

5. Repeat steps 1 to 4 a total of N_times − 1 more times, increasing t_m by an amount ∆t each time.

6. Repeat this whole process N_tom times.
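As an illustrative sketch (not the authors' code), the loop structure of the steps above could be organized as follows, with a placeholder `measure_quadrature` standing in for one simulated or experimental run of steps 1 to 3:

```python
import numpy as np

rng = np.random.default_rng(0)

def measure_quadrature(quad, t_m, y, rng):
    # Placeholder for one run of steps 1-3: prepare the field with label y,
    # couple the probe at t = 0, and projectively measure the given
    # quadrature at time t_m. Here we simply draw from a unit Gaussian.
    return rng.normal()

def run_protocol(y, T_min=1.0, dt=0.5, N_times=4, N_tom=100, rng=rng):
    """Collect raw measurement data for one label y (steps 1-6)."""
    data = {}
    for m in range(N_times):                      # step 5: sweep t_m
        t_m = T_min + m * dt
        for quad in ("q", "p", "r"):              # steps 3-4: three quadratures
            data[(quad, m)] = np.array(
                [measure_quadrature(quad, t_m, y, rng)
                 for _ in range(N_tom)]           # step 6: N_tom repetitions
            )
    return data

D_raw = run_protocol(y="open")
```

In an actual experiment or simulation, `measure_quadrature` would return outcomes drawn from the evolved probe state rather than a stand-in distribution.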
This (simple and untargeted) measurement procedure yields raw data D_raw ∈ R^{N_m}, where N_m = 3 N_times N_tom is the total number of measurements performed. The goal of the N_tom repetitions is to increase the precision with which we can calculate the averages ⟨q̂_d⟩, ⟨p̂_d⟩, and ⟨r̂_d⟩. That is, the higher N_tom, the more accurate the state tomography we can perform on the detector. This data comes along with an associated label, y. We collect N_samples of these pairs, (D_raw, y), where each of these N_samples data points D_raw is associated with a generally different label y. As we will see, this simple local (in both time and space) interaction of the detector with the field produces enough information to determine a variety of non-local properties of the field. To accomplish this, we will use this labelled data to train a neural network. Once trained, the neural network will be able to accurately predict the correct label y when given new unlabeled data. In this way we will be able to learn about non-local features of the field from our local measurement data and the trained network. Some non-local features of the QFT might be more effectively captured by using more than one detector, so that we can use not only the outcomes of their individual measurements but also the correlations between them. In this sense, note that the above measurement scheme can be straightforwardly generalized to use arrays of local probes.
As a final remark, it is worth noticing that it is not necessary that the training (and validation) data come from actual experiments as described above. Indeed, it will often be convenient to train the neural network using simulated data from the available theoretical models to prepare the network to identify features in experimental data.

B. Data Preprocessing
While we could train our neural network directly on our N_samples labeled data points, (D_raw, y), in order to improve and speed up the training we will first compress and preprocess the data.
Note that each of our N_tom measurements of q̂_d(t_m), where t_m = T_min + m∆t, m ∈ {0, . . ., N_times − 1}, are independent and identically distributed; they each come from identical independent experiments. We can summarize these N_tom measurement outcomes, q_k, via their sample mean, sample variance, and sample fourth central moment. We can similarly compress our measurements of r̂_d and p̂_d at each time t_m = T_min + m∆t. Depending on our needs, we might want to include higher order (e.g., eighth) sample central moments. However, we will see that including the sample fourth moments (and often even only the second moments) in our compressed data is enough to allow our proposed machine learning methods to address our three physical examples, and potentially to answer a very general breadth of questions about the quantum field.
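For concreteness, the compression of the N_tom outcomes at a single measurement time into these three sample moments can be sketched as follows (a minimal NumPy illustration, not the authors' implementation):

```python
import numpy as np

def compress(samples):
    """Summarize i.i.d. measurement outcomes by their sample mean,
    sample variance, and sample fourth central moment."""
    mean = samples.mean()
    var = samples.var(ddof=1)              # unbiased sample variance
    m4 = np.mean((samples - mean) ** 4)    # sample fourth central moment
    return mean, var, m4

# Example: 10,000 simulated outcomes from a zero-mean normal distribution
# with standard deviation 2. For a normal distribution the fourth central
# moment is 3*sigma^4 = 48.
rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=2.0, size=10_000)
mean, var, m4 = compress(x)
```

The same function would be applied to the q, p, and r outcomes at every measurement time t_m.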
Once compressed, our data is described by the time series of the sample moments of q(t_m), r(t_m), and p(t_m). That is, our compressed data can be represented by a vector D_c ∈ R^d where d = 9 N_times (three sample moments for each of the three quadratures at each of the N_times measurement times). Notice that even after compression the data will generally be high-dimensional, since N_times will be very large.
After compression we perform standard preprocessing [53]: we center the data, perform principal component analysis, and whiten the data. The details of our preprocessing are discussed in Appendix A. Let us call the preprocessed data D_p. Next we will discuss how our neural network is trained and validated on the N_samples labeled data points (D_p, y).
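A minimal sketch of this preprocessing pipeline (centering, PCA, and whitening via a singular value decomposition; the component-truncation option is our assumption, as the actual details are given in Appendix A):

```python
import numpy as np

def preprocess(X, n_components=None):
    """Center the rows of X, apply PCA, and whiten (illustrative sketch)."""
    Xc = X - X.mean(axis=0)                      # center the data
    # PCA via SVD of the centered data matrix: Xc = U diag(s) Vt.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    if n_components is not None:                 # optional truncation
        U, s, Vt = U[:, :n_components], s[:n_components], Vt[:n_components]
    # Whitening: coordinates along the principal axes, rescaled so each
    # component has unit sample variance.
    return U * np.sqrt(X.shape[0] - 1)

# Example: strongly correlated synthetic features become decorrelated,
# unit-variance components after preprocessing.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 9)) @ rng.normal(size=(9, 9))
Xw = preprocess(X)
cov = np.cov(Xw, rowvar=False)   # approximately the identity matrix
```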

C. Neural Network Training
In this Section we lay out how neural networks provide a data-analysis ansatz (f_F,M_0 in Sec. II) which we can use to process our data. In particular, we will discuss how we can analyze the data produced by our simple fixed local measurement protocol, M_0, to arrive at accurate conclusions about a wide variety of (non-local) features of the QFT.
Neural networks model complicated high-dimensional functions by alternately applying tunable affine transformations (controlled by weights and biases) and fixed non-linear transformations (i.e., activation functions) to their inputs. Fig. 1 illustrates a simple neural network architecture one might use for classifying features of interest (in our example, the topology of the spacetime) based on local probe measurement data on a quantum field. The circles in Fig. 1 represent the fixed activation functions and the lines represent the tunable weights and biases. The neural network takes as input the local probe measurement data and outputs a probability assignment for each possible value of the feature of interest (whether the spacetime topology is open or closed).
In order to find the proper settings for the network's weights and biases (in the example in Fig. 1, those settings which accurately predict the spacetime topology) we follow a supervised training procedure. Supervised training requires labeled data, i.e., many datapoints, D_p, each paired with a label, y, indicating the result that our data analysis should produce. In the topology example this is y ∈ {open, closed}. These labels may also be continuous; for instance, if we were interested in determining, e.g., the field's mass, m, we would have y = m ∈ R^+.
Given N_samples labelled datapoints, we divide these into data for training the neural network (N_train = 0.75 N_samples) and for validating its accuracy (N_valid = 0.25 N_samples). We then define a cost function to characterize how wrong our network's predictions are over the training data. The network's weights and biases are adjusted to minimize this cost function. Once the training is complete, the network's accuracy is evaluated on the validation data.
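The training procedure just described can be sketched on a synthetic stand-in for the labeled data (a toy one-hidden-layer network trained by full-batch gradient descent; the actual architecture and stochastic training are described in Appendix A):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in for the labeled pairs (D_p, y): two Gaussian blobs
# in 2D, with a binary label that is a deterministic function of the data.
N_samples = 400
X = rng.normal(size=(N_samples, 2)) + np.where(
    rng.random(N_samples) < 0.5, 2.0, -2.0)[:, None]
y = (X.mean(axis=1) > 0).astype(float)

# 75% / 25% train / validation split, as in the text.
n_train = int(0.75 * N_samples)
Xtr, ytr = X[:n_train], y[:n_train]
Xva, yva = X[n_train:], y[n_train:]

# Toy one-hidden-layer network trained on a cross-entropy cost.
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)                  # hidden-layer activations
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # output class probability
    return h, p.ravel()

lr = 0.5
for _ in range(1000):
    h, p = forward(Xtr)
    err = (p - ytr)[:, None] / len(ytr)       # d(cost)/d(output logit)
    gW2, gb2 = h.T @ err, err.sum(axis=0)
    dh = (err @ W2.T) * (1.0 - h**2)          # backpropagate through tanh
    gW1, gb1 = Xtr.T @ dh, dh.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# Accuracy of the trained network on the held-out validation data.
accuracy = float(((forward(Xva)[1] > 0.5) == yva).mean())
```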
The network architecture (fully connected feedforward) and training procedure (stochastic gradient descent) that we use are standard [53]. Our code can be found on GitHub [54], and a summary of our network architecture, data preprocessing, and training process is given in Appendix A.

III. TWO METHODS OF SIMULATING DATA GENERATION
In the previous Section we discussed how the probe will interact with the field, how it will be measured, how the resulting data will be processed, how this data will be used to train a neural network, and how the accuracy of this network will be validated. In Sections IV, V, and VI we will apply this process to three examples: remote boundary sensing, thermometry, and quantum state discrimination, respectively. To be able to do this we need to first discuss how exactly we produce the labeled data in a general QFT framework. This is what we do in this Section.
Since we are but poor theoreticians, rather than having a real-life probe interact with a real-life field to generate our training data, we instead simulate both the probe-field interaction and the probe's subsequent measurement. Notice, however, that training and validating with simulated data is not necessary: if one were to work with actual gathered experimental data, the exact same analysis would apply. The methods discussed in this paper are independent of how the data is generated, be it by experiment or by simulation. Moreover, our methods are independent of exactly how we simulate the data generation process. To demonstrate this we will simulate the probe's response in two different ways: one involving a lattice approximation and one involving Dirac-delta interactions between the probe and the field in a continuous optical cavity. In both cases we will take the interaction Hamiltonian between the probe and the field as a realistic and accepted model for the light-matter interaction of atoms with the electromagnetic field [16,17,51,52].

A. Method 1: Lattice Approximation
For two of the three examples that we will analyze (boundary sensing and thermometry) we simulated the probe's response to the field using a lattice approximation. This approximation is not necessary, but it does greatly simplify the analysis. It can be motivated as follows.
From (1), if the region where F(x) is non-negligible has a lengthscale σ, the probe will not couple to field modes with wavenumber |k| ≫ σ^{-1} [62]. Taking the probe to have a Gaussian smearing with standard deviation σ, the coupling to the high-frequency field modes is exponentially suppressed. This motivates a UV-cutoff [63,64] at |k| ≤ K := 16/σ (see Appendix B for technical details).
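The exponential suppression that justifies this cutoff is easy to quantify: the Fourier transform of a Gaussian smearing of width σ is itself Gaussian, so the coupling to a mode of wavenumber k falls off as exp(−σ²k²/2). A quick numerical check at the cutoff scale K = 16/σ:

```python
import numpy as np

def smearing_ft(k, sigma):
    """Magnitude (up to normalization) of the Fourier transform of a
    Gaussian smearing of width sigma: exp(-sigma^2 k^2 / 2)."""
    return np.exp(-sigma**2 * k**2 / 2)

sigma = 1.0            # units are irrelevant; only sigma*k matters
K = 16 / sigma         # the UV-cutoff used in the text
suppression = smearing_ft(K, sigma)   # exp(-128): utterly negligible
```

The coupling to modes at the cutoff is suppressed by a factor of exp(−128) ≈ 10⁻⁵⁶, which is why discarding them does not perceptibly alter the probe's response.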
Cutting these high-frequency modes out of both the interaction Hamiltonian (1) and the field's free Hamiltonian (6) strongly simplifies the situation without substantially affecting the probe's response to the field. Typically, UV-cutoffs are at odds with locality assumptions, as they break the causality of the theory [11]. However, for our choice of scales, the UV-cutoff only introduces a relative error in the probe's response below a fraction of a percent (see Appendix B for details), and therefore the field theory remains effectively causal to any perceptible accuracy. The causal behaviour of this scenario is explicitly demonstrated in our first example (see Sec. IV).
Taking the UV-cutoff can be seen as placing a bandlimit on the field. The Nyquist-Shannon sampling theorem then allows us to exactly reconstruct this bandlimited field from a discrete lattice of sample points. We provide some discussion in Appendix B. This yields the lattice Hamiltonians of Eq. (7), where a = π/K is the lattice spacing, x_n := n a, and λ_0 = λℏ/√(am) is the strength of the probe-field coupling. We have defined dimensionless field operators q̂_n := √(am/ℏ²) φ̂(t, x_n) and p̂_n := √(a/m) π̂(t, x_n) satisfying the canonical commutation relations [q̂_i, p̂_j] = i δ_ij 𝟙.
Since the operators q̂_n and p̂_n are associated with the field operators in a localized region, we say that (q̂_n, p̂_n) define spatial modes.
It is also possible to restrict the field theory to a finite region of space (think of an optical cavity or a finite-size transmission line). We can incorporate this into our general formalism, which results in an additional IR-cutoff after we restrict the field to the region x ∈ [0, L], where L = N a.
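As an illustration of the resulting structure (with simplified, hypothetical couplings rather than the exact coefficients of Eq. (7)), the quadratic-form matrix F of such a finite nearest-neighbour chain can be assembled and checked as follows:

```python
import numpy as np

def chain_F(N, omega=1.0, g=0.2):
    """Quadratic-form matrix F for N lattice sites with nearest-neighbour
    q-q coupling, in the ordering X = (q_1, p_1, ..., q_N, p_N).
    Illustrative couplings only; the actual lattice Hamiltonian is Eq. (7)."""
    F = np.zeros((2 * N, 2 * N))
    for n in range(N):
        F[2 * n, 2 * n] = omega          # q_n^2 on-site term
        F[2 * n + 1, 2 * n + 1] = omega  # p_n^2 on-site term
    for n in range(N - 1):               # nearest-neighbour q_n q_{n+1} term
        F[2 * n, 2 * (n + 1)] = F[2 * (n + 1), 2 * n] = g
    return F

F = chain_F(5)
# H = (1/2) X^T F X must be symmetric, and positive-definite for g < omega/2.
eigs = np.linalg.eigvalsh(F)
```

Restricting the field to x ∈ [0, L] corresponds here simply to the finite number of sites N, and modifying the last site-to-site coupling (as in Sec. IV) amounts to changing the final off-diagonal entries of F.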
Given the above Hamiltonian, we can use Gaussian Quantum Mechanics [65-67] to efficiently simulate the dynamics. This is because the Hamiltonians controlling the dynamics are all quadratic in the field and probe quadrature operators, and because the initial states of the probe and field are Gaussian states (i.e., states with Gaussian Wigner functions²). Together these guarantee that the probe and field remain in Gaussian states throughout their evolution. As such, their quantum states are fully characterized by the first and second moments of the vector of quadrature operators X̂ = (q̂_d, p̂_d, q̂_1, p̂_1, . . .)^⊺, which we can collect together in a displacement vector, ⟨X̂⟩, and a covariance matrix, σ. Unitary dynamics for the joint density matrix, ρ̂ → Û ρ̂ Û†, corresponds to symplectic(-affine) evolution in phase space. In particular, the changes in the displacement vector and the covariance matrix are given by ⟨X̂⟩ → S ⟨X̂⟩ + d and σ → S σ S^⊺, where S is a symplectic matrix and d is a vector. Concretely, S = T exp(∫ dt Ω F(t)), where F(t) is the symmetric matrix that satisfies Ĥ = Ĥ_d + Ĥ_φ + Ĥ_int(t) = ½ X̂^⊺ F(t) X̂, Ω is the symplectic form, [X̂_j, X̂_k] = i Ω_jk 𝟙, and T is the time-ordering symbol. We note that since Ĥ has no terms linear in X̂ we have d = 0. Moreover, we note that S can be calculated non-perturbatively much more efficiently than Û [66-72].
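For a time-independent quadratic Hamiltonian, the time-ordered exponential reduces to an ordinary matrix exponential, S = exp(t Ω F). A one-mode sketch (using scipy; illustrative parameters) showing that S is symplectic and how the covariance matrix evolves:

```python
import numpy as np
from scipy.linalg import expm

# One-mode example: H = (omega/2)(q^2 + p^2), with X = (q, p).
omega = 2.0
F = omega * np.eye(2)                 # symmetric matrix in H = (1/2) X^T F X
Omega = np.array([[0.0, 1.0],
                  [-1.0, 0.0]])       # symplectic form, [X_j, X_k] = i Omega_jk

t = 0.3
S = expm(Omega @ F * t)               # time-independent case of the evolution

# S must be symplectic: S Omega S^T = Omega.
sympl_check = S @ Omega @ S.T

# Evolve a vacuum-like covariance matrix: sigma -> S sigma S^T.
# For a free oscillator this is a phase-space rotation, so sigma is unchanged.
sigma = 0.5 * np.eye(2)
sigma_t = S @ sigma @ S.T
```

In the multi-mode lattice case, F is the full (2N + 2)-dimensional matrix including the probe and the probe-field coupling, and S is computed the same way.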
Once we have computed the evolved joint covariance matrix and displacement vector, we can easily isolate the reduced state of the probe system: in the Gaussian formalism, partial tracing over the field amounts to simply discarding the field entries of the displacement vector and covariance matrix, keeping only the probe's entries.

² For an introduction to the phase space quantum mechanics formalism in arbitrary dimensions, see, e.g., Section 4.7 in [68].
These values determine the distributions from which our measurements of q̂_d, p̂_d and r̂_d should be drawn.
Namely, the outcomes of the measurements are normally distributed, where N(µ, σ) denotes the normal distribution with mean µ and variance σ. This can be straightforwardly justified from the fact that the partial Wigner function of the probe is still Gaussian and that its marginals correspond to the position and momentum distributions. Moreover, from the evolved probe state and N_tom we can determine the distributions from which the sample mean and sample variance are drawn. For instance, the sample variance of N_tom normal outcomes follows a rescaled χ²(N_tom − 1) distribution, where χ²(k) is the chi-squared distribution with k degrees of freedom. Note that while the sample means are distributed normally, the sample variances are not.
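A small numerical sketch of this sampling step (with illustrative moment values, not those of any particular simulation): draw N_tom outcomes from the probe's Gaussian marginal and form the sample mean and sample variance.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative evolved probe state: Gaussian q_d marginal with these moments.
mu, var = 0.7, 1.3
N_tom = 2000

# Simulated tomography outcomes, drawn from N(mu, var).
q = rng.normal(mu, np.sqrt(var), size=N_tom)

sample_mean = q.mean()        # distributed as N(mu, var / N_tom)
sample_var = q.var(ddof=1)    # distributed as var/(N_tom-1) * chi2(N_tom-1)
```

With N_tom = 2000, the sample mean concentrates around µ with standard error √(var/N_tom), while the sample variance concentrates around var with a (slightly skewed) chi-squared-shaped spread.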

B. Method 2: Non-Gaussian Wigner Functions
Although working within the Gaussian formalism when possible is very convenient for calculational purposes, the techniques discussed in the previous Subsection can be easily extended to drop the assumption that the field and probe are initially in Gaussian states. We will show this explicitly in our third example scenario, in which we will attempt to distinguish between a coherent state (which is Gaussian) and a Fock state (which is not).
Even when the initial state's Wigner function W(ξ) is non-Gaussian we can still understand the dynamics in terms of phase space evolution [68]. A Gaussian unitary transformation Û_G (i.e., evolution generated by a quadratic Hamiltonian) still corresponds to a symplectic transformation for the vector of quadrature operators of the form described in Sec. III A, so that the dynamics induced by our unitary evolution operator, ρ̂ → Û_G(t) ρ̂ Û_G(t)†, can be equivalently written in the Heisenberg picture as X̂ → S(t) X̂ + d(t) for some symplectic matrix S(t) and some vector d(t).
For our particular case, d(t) = 0 since Ĥ does not have any terms linear in X̂. This transformation entails, in turn, a transformation in the state's Wigner function, independent of the Gaussianity of the state. Mathematically, this transformation can be written as W(ξ) → W(S(t)^{-1} ξ). We can use this to determine the final reduced probe state from the initial probe-field state by integrating over all of the field variables, W_d(q_d, p_d) = ∫ dζ W(q_d, p_d, ζ), where ζ = (q_1, p_1, q_2, p_2, . . .) runs over all the field variables. Note that if the initial probe-field state is non-Gaussian then the final probe state will also generally be non-Gaussian. That is, we cannot characterize it by its first and second moments alone. In general, this final probe state will have non-trivial higher moments. However, the higher moments of the final probe distribution can be calculated straightforwardly from the higher moments of the initial probe-field distribution. We can use the central limit theorem to generate the sample means, sample variances and sample fourth central moments of Eqs. (2), (3) and (4), computing them from the second, fourth and eighth central moments of the marginal distribution of the q quadrature of the probe when N_tom is sufficiently large. In particular, if we consider states with ⟨q̂_d⟩ = 0, as we will in the example in Sec. VI, these distributions are determined by µ_n, the n-th moment of the marginal distribution of the q quadrature of the probe.
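As a concrete illustration of why higher moments matter here (in illustrative units where the q-marginal of the first Fock state is |ψ₁(q)|² ∝ q² e^{−q²}, so that ⟨q²⟩ = 3/2 and ⟨q⁴⟩ = 15/4), one can sample a Fock-state marginal and a Gaussian with matched first and second moments and compare their fourth moments:

```python
import numpy as np

rng = np.random.default_rng(5)
N_tom = 50_000

# q-marginal of the first Fock state: |psi_1(q)|^2 ~ q^2 exp(-q^2).
# Then q^2 is Gamma(3/2)-distributed, with a random sign for q.
u = rng.gamma(1.5, size=N_tom)
q_fock = np.sqrt(u) * rng.choice([-1.0, 1.0], size=N_tom)

# Gaussian samples with the same first and second moments (<q>=0, <q^2>=3/2).
q_gauss = rng.normal(0.0, np.sqrt(1.5), size=N_tom)

# Matched first and second sample moments, but different fourth moments:
# 15/4 = 3.75 for the Fock marginal vs. 3*(3/2)^2 = 6.75 for the Gaussian.
m4_fock = np.mean(q_fock ** 4)
m4_gauss = np.mean(q_gauss ** 4)
```

This is precisely the kind of higher-moment information the compressed data of Sec. II B retains and that the network exploits in the state-discrimination example of Sec. VI.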

IV. A FIRST PHYSICAL EXAMPLE: REMOTE BOUNDARY SENSING
As a first application of our framework, we will use measurements of a local probe near one end of a cavity (x ≈ 0) to learn about the location of the boundary at the other end of the cavity (x ≈ L).
In this example we will employ the lattice approximation discussed in the previous Section. That is, the field will be approximated as a finite chain of harmonic oscillators which we will call spatial modes. Note from Eq. (7) that these spatial modes have a nearest-neighbor coupling. As we will see, the lattice approximation does not significantly impact the relativistic compliance of our setup. Indeed, we will show that, in practice, signals do not propagate superluminally in our lattice.
To simulate different positions of the far boundary, we will modify the coupling in Eq. (7) so as to single out the coupling, Ĥ_last, between the two oscillators furthest from the probe. That is, if there are N harmonic oscillators in the lattice, we will consider modifications to the coupling between oscillator N and oscillator N − 1. We summarize the modified couplings under consideration in Table I.
For the first case (y = 1) we take Ĥ_last to be just the same as every other site-to-site coupling, so that the boundary is at the last site of the lattice. In the second case (y = 2) we take Ĥ_last = 0, thus setting the boundary at the second-to-last site. We added a third case (y = 3) to measure the time that it takes for the probe to detect a perturbation coming from the boundary. To do so, we consider a time-dependent coupling between the last two oscillators which turns on at t = 0. In each of these three cases we assume that the field has thermalized to the ground state of its t < 0 Hamiltonian well before t = 0. Recall that t = 0 marks the instant when the probe first couples to the field. In this example, we take the switching function χ(t) to be constant for t > 0.
In the first two cases the t ≥ 0 field Hamiltonian is exactly the same as the t < 0 Hamiltonian. All that changes at t = 0 in these cases is that the probe couples to the field at one end of the cavity, x ≈ 0. We thus expect disturbances to begin propagating away from x = 0 beginning at t = 0. In the third case the field Hamiltonian suddenly changes at t = 0. This change is localized around x ≈ L. In this case we thus expect disturbances to begin propagating away from both x = 0 and x = L beginning at t = 0. In this third case, we take the last spatial mode to be in a highly-squeezed state (8 dB [73]) before t = 0 so that it produces a notable disturbance when it couples.
By comparing cases 2 and 3 we can explicitly measure the signal-propagation speed on the lattice. In these cases the field is in exactly the same state prior to t = 0; they differ only by the local disturbance at t = 0 and x ≈ L in case 3. For t > 0 the disturbance at x ≈ L begins propagating through the lattice towards the probe. We can define the effective signalling time as the time it takes the probe to differentiate between cases 2 and 3. If the probe is able to differentiate cases 1 and 2 in less than this signalling time, it cannot be due to having received a signal from the boundary. We can compare this signalling time to the light-crossing time of the cavity (t = L/c) to see if our probe is receiving any faster-than-light signals due to the approximations employed.
We consider a detector of roughly atomic size, with a Gaussian smearing function of width σ = 53 pm (the Bohr radius). Taking the UV-cutoff K = 16/σ gives us a lattice spacing a = π/K = 10.4 pm. We take the boundary to be at L = 90 σ = 457 a = 4.7 nm. We set the detector's excitation energy ℏω d = 130 eV and the field's mass mc 2 = 1 eV. The field mass is much smaller than any other energy scale in the problem (effectively massless). Finally, we investigate the non-perturbative strong-coupling regime where λ 0 = ℏω d = 130 eV. Note that the choice of parameters is for demonstration purposes; similar results are also obtained for a large set of different parameters.
In Fig. 2 we show the performance of the neural network (solid line). The green triangle lines show the causal behaviour of the setup: when we send a signal from the far boundary to the detector (coupling a new oscillator at t = 0) the neural network accuracy indicates that the probe does not receive the signal before ≈ 15 as. A comparison between this and a conservative estimate of the boundary-to-edge-of-detector light-crossing time, (L − 5σ)/c = 15 as (vertical red line), shows that our toy model displays good causal behaviour. This confirms that the lattice approximation did not compromise the relativistic structure that is essential for the model to be a faithful simulation of a quantum field theory in a cavity.
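As a quick sanity check, the length and time scales quoted above follow from one another. A short script (with the Bohr radius and c hardcoded; the variable names are ours) reproduces them:

```python
import math

# Sanity check of the scales quoted in the text.
c = 2.998e8             # speed of light, m/s
sigma = 52.9e-12        # detector smearing width: the Bohr radius, ~53 pm
K = 16 / sigma          # UV cutoff
a = math.pi / K         # lattice spacing, a = pi/K
L = 90 * sigma          # boundary position, L = 90 sigma

print(f"a = {a * 1e12:.1f} pm")    # ~10.4 pm
print(f"L = {L * 1e9:.2f} nm")     # ~4.8 nm

# Conservative light-crossing time from the boundary to the edge of the
# detector (taken 5 sigma from its center):
t_signal = (L - 5 * sigma) / c
print(f"(L - 5 sigma)/c = {t_signal * 1e18:.1f} as")  # ~15 as
```

This confirms that the ≈ 15 as at which the network first detects the case-3 signal matches the light-crossing estimate.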
The blue circle line represents the ability of the neural network to sense the boundary at the far end of the cavity. Here, the information about the boundary has had time to spread all over space during the equilibration process before t = 0: the ground state knows locally about its boundary conditions. Indeed, the network accuracy shows that the nature of the field boundary can be resolved long before any signal from the boundary propagates to the detector. This allows the probe to see the boundary 'without light', that is, in the vacuum state of the theory and well before the light-crossing time of the lattice.
Notice that we do observe an increase in the accuracy of the neural network as the measurement times move further away from the start of the probe's coupling. This is not related to signalling from the boundary, as the comparison between cases 2 and 3 shows, but rather to the fact that the more time we let the detector interact with the field, the more knowledge it gathers about the infrared structure of the field state, where the information about the boundary lies.
In itself, this 'seeing without light' phenomenon is not a new result. It has been seen and understood in a number of different contexts [27][28][29][30]. What is new here is the explicit collection of (simulated) data and the direct translation of these local measurement results into claims about the field's boundary conditions. Remarkably, the local measurement protocol and data-analysis ansatz were not tailored to the detection of boundaries. In fact (as we will soon see) the exact same local measurement protocol and data-analysis ansatz can be used to make accurate claims about a wide range of other features of the QFT, and not only to "see without light".

A. Near Optimality of the Neural Network
Unlike the more complex scenarios in the sections below, this particular example admits a more conventional statistical treatment that will allow us to discuss in some detail how we know that the neural network in this example is behaving near-optimally.
Recall from (17) and (18) that we know the distributions from which the sample first and second moments are drawn. Thus, ultimately, we know the distribution from which our compressed data D c is drawn. In particular we know the distributions p(D c |y = 1, θ), p(D c |y = 2, θ), and p(D c |y = 3, θ) for each of the three cases listed in Table I, where θ contains all the other parameters of the measurement setup (e.g., coupling times). These distributions are simple, yet made out of rather unwieldy combinations of Gaussian and χ 2 distributions.
Suppose that we are presented with some compressed data D c and asked to guess whether it came from a y = 1 or a y = 2 case. This is, in fact, the exact question we are repeatedly asking our neural network. For this binary classification task, the optimal solution is known: we ought to guess whichever y makes p(D c |y, θ) larger (see Appendix C for details). The success rate of this strategy is p success (θ) = (1 + TV 12 (θ))/2, where TV 12 (θ) ∈ [0, 1] is the total variation distance between p(D c |y = 1, θ) and p(D c |y = 2, θ).
If we could calculate TV 12 (θ) then we would have a tight upper bound on the accuracy achievable by any method aimed at distinguishing these two cases. In particular, we would have an upper bound on the validation accuracy achievable by any neural network (with any architecture, training time, training method, etc.). Unfortunately, for the rather cumbersome combination of Gaussian and χ 2 distributions pertaining to this example, TV is not calculable in closed form. However, in the large N tom regime these distributions each simplify to multivariate Gaussian distributions. While the total variation distance between multivariate Gaussian distributions is unknown in general, we can compute the Hellinger distance H(θ) ∈ [0, 1] between them. The Hellinger distance bounds the total variation distance above and below as (see Appendix C for details) H(θ) 2 ≤ TV 12 (θ) ≤ H(θ) √(2 − H(θ) 2 ). Thus from H(θ) we have upper and lower bounds on the optimal validation accuracy. These bounds are plotted in dashed lines in Fig. 2. We have a guarantee that the optimal performance possible for any network (with any architecture, training time, training method, etc.) lies between the dashed lines. The fact that the network validation accuracy tracks these bounds indicates that the network is near optimal. Moreover, the fact that the upper bound in the signal case (green dashed line) stays close to 50% before ≈ 15 as indicates that there is no way to process the data to learn where the boundary is before this time. This shows that the lattice approximations used for this example do not violate relativistic causality; no neural network can extract the signal from the data before relativity says it can, simply because the information is not there yet.
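For intuition, the standard Hellinger bound H² ≤ TV ≤ H√(2 − H²) is easy to verify numerically in a toy one-dimensional version of the problem, with two Gaussians standing in for p(D c |y = 1, θ) and p(D c |y = 2, θ). The Gaussian parameters below are arbitrary illustrative choices, not those of the experiment:

```python
import numpy as np

# Two 1D Gaussians standing in for p(Dc|y=1) and p(Dc|y=2).
mu1, s1 = 0.0, 1.0
mu2, s2 = 0.8, 1.3   # illustrative values only

x = np.linspace(-10, 10, 200001)
dx = x[1] - x[0]
p1 = np.exp(-(x - mu1)**2 / (2 * s1**2)) / np.sqrt(2 * np.pi * s1**2)
p2 = np.exp(-(x - mu2)**2 / (2 * s2**2)) / np.sqrt(2 * np.pi * s2**2)

# Total variation distance and the optimal guessing success rate.
TV = 0.5 * np.sum(np.abs(p1 - p2)) * dx
p_success = 0.5 * (1 + TV)

# Hellinger distance between 1D Gaussians, via the Bhattacharyya coefficient.
BC = np.sqrt(2 * s1 * s2 / (s1**2 + s2**2)) \
     * np.exp(-(mu1 - mu2)**2 / (4 * (s1**2 + s2**2)))
H = np.sqrt(1 - BC)

# The Hellinger distance brackets the total variation distance:
assert H**2 <= TV <= H * np.sqrt(2 - H**2)
print(f"TV = {TV:.3f}, optimal success rate = {p_success:.3f}")
```

The same bracketing, applied to the large-N tom Gaussian limits of the actual measurement distributions, is what produces the dashed lines in Fig. 2.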
It is worth remarking that the network used was not designed with this problem in mind, and yet it performs almost optimally. This suggests that the network is good at extracting and processing all the information contained in the data produced by the measurement protocol. Therefore, it is reasonable to expect near-optimal behaviour in the other examples as well, especially since we have changed neither the network architecture nor the training procedure. Notice in particular that we did not need to feed the neural network any data about the parameters of the experiment: σ, K, a, L, ω d , and m. We did, however, rely on the network being trained on data obtained in the same physical system that it was tested on later. If needed, the network could be trained on data from QFTs with a range of different parameters, so that it also learns to distinguish the dependence on these parameters from the dependence on the feature we are actually interested in.

V. A SECOND PHYSICAL EXAMPLE: THERMOMETRY
To showcase the broad applicability of our framework, we now consider a very different problem while keeping the exact same measurement protocol, the same coupling between probe and field, and the same data-analysis ansatz.
We consider a probe motivated by a superconducting circuit undergoing a long-range interaction with an open transmission line in a thermal state. Such systems do not couple strongly to frequencies above 50 GHz [62,74,75]. Assuming a Gaussian profile we can match this behavior by taking 3/σ = 50 GHz/c, i.e., σ = 18 mm. These numbers are motivated by [62]. Taking our UV-cutoff in the field at K = 16/σ = 267 GHz/c gives lattice spacing a = π/K = 3.5 mm. We couple the circuit to the center of a transmission line of length L = 100 a = 19.6 σ = 353 mm, with Dirichlet boundary conditions. We take the circuit to have an energy gap typical of such systems, ω d = 10 GHz, and the field to have a mass mc 2 /ℏ = 0.1 GHz, much smaller than the other energy scales. We again consider the strong-coupling regime, λ 0 = ℏω d , as before. Using these parameters we trained the network to estimate the field's temperature based only on measurements of the local probe. For each base temperature T we generated our labeled data by simulating how the probe would respond to a quantum field of temperature y selected uniformly from the range [0.9T, 1.1T ], i.e., the range T ± 10%. For each T , we trained a neural network through regression to accurately predict the temperature of the field, y. To validate the accuracy of the network we determined the fraction of the validation data for which the network's prediction fell within ±0.01T of the correct label y. By random chance one would expect the network to guess the correct temperature to within this accuracy 10% of the time. This is what one would expect when the coupling time is zero, since the probe has not yet learned anything about the field. This is confirmed by the 10% validation accuracy shown in Fig. 3 when the coupling time is zero.
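The 10% chance-level baseline follows directly from the widths involved: a window of ±0.01T covers one tenth of a label range of width 0.2T. A quick Monte Carlo sketch (our own toy numbers) confirms this for an uninformed network that always guesses the central temperature T:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1.0                                       # base temperature (arbitrary units)
y = rng.uniform(0.9 * T, 1.1 * T, 200_000)    # true temperatures, T +/- 10%

guess = T                                     # uninformed guess: the midpoint
accuracy = np.mean(np.abs(guess - y) <= 0.01 * T)
print(f"chance-level accuracy = {accuracy:.3f}")  # ~0.10
```

This is the zero-coupling-time floor visible in Fig. 3.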
As the coupling time increases, the network becomes more accurate. For each base temperature T the network reaches nearly 100% of the validation data labeled correctly (to within y ± 0.01T ). It can do so even before the interaction's thermalization time, which is lower-bounded by the detector's Heisenberg time, 1/ω d = 100 ps (red vertical line), the smallest timescale that the detector can resolve. Note that the neural network can determine the temperature very accurately even for very low transmission line temperatures (sub-mK).
As in the previous example of boundary sensing, the possibility of measuring temperature without letting the thermometer thermalize is not a new result in itself. It is known that thermometry at times shorter than the thermalization time of the thermometer is possible [76][77][78][79]. What this example shows is that the field temperature can indeed be reconstructed in times as short as the Heisenberg time of the thermometer using the exact same local measurement protocol and data-analysis ansatz we used for boundary sensing, thus adding temperature to the arguably long list of features of the QFT that can be reconstructed with our framework using this very innocent choice of data sampling and processing.

VI. A THIRD PHYSICAL EXAMPLE: DISCRIMINATION BETWEEN FOCK AND COHERENT STATES
In this Section we would like to showcase the effectiveness of our framework in an example that assumes neither 1) the Gaussianity of the probe/field states, nor 2) a UV cutoff/bandlimit/lattice discretization. With this in mind, here we apply the proposed measurement framework to the quantum optical problem of distinguishing Fock states (like a single-photon state) from low-amplitude coherent states (produced by stimulated emission) when the expectation of the number of photons in the state is the same. Consider a massless scalar field in a (1+1)-dimensional cavity of length L with Dirichlet boundary conditions. We can consider the mode decomposition of this field, where the dimensionless quadrature operators qn and pm satisfy canonical commutation relations [q n , pm ] = iδ nm 1 1 and where the mode frequencies are ω ℓ = c k ℓ = π ℓ c/L. We will take the field state to be the vacuum for all modes except for the lowest-frequency one (the ℓ = 1 mode). We will try to determine the initial state of the ℓ = 1 mode by measuring a probe coupled locally to the field in the center of the cavity. We take the ℓ = 1 mode to be in either a) a Fock state |N ⟩ with N excitations, or b) a phase-averaged coherent state with N excitations on average, that is, a coherent state |α⟩ for some α ∈ C with |α| 2 = ⟨n⟩ = N but with an unknown phase. In other words, we will consider two initial probe-field Wigner functions, each given by the product of the probe's vacuum Wigner function W 0 (q d , p d ) and either W Fock or W PAC below for the ℓ = 1 mode, where q d and p d are the probe variables and where W 0 (q, p) = e −q 2 −p 2 /π. Note that neither of these field states is Gaussian. The Wigner function of a Fock state is [80] W Fock (q, p; N ) = ((−1) N /π) e −q 2 −p 2 L N (2(q 2 + p 2 )), where L N (x) is the N -th Laguerre polynomial. For the unknown-phase coherent state, the fact that we do not know (and therefore average over) the phase makes this a non-Gaussian state. The Wigner function of a phase-averaged coherent state (PAC) is [81] W PAC (q, p; N ) = (1/π) e −q 2 −p 2 −2N I 0 (2√(2N (q 2 + p 2 ))), where I 0 is the modified Bessel function of the first kind. Moreover, we note that these two states have exactly the same first and second moments: ⟨q 1 ⟩ = ⟨p 1 ⟩ = 0 and ⟨q 1 2 ⟩ = ⟨p 1 2 ⟩ = N + 1/2 in both cases. Thus no analysis of these two field states in terms of their first
and second moments can differentiate them; these field states can only be distinguished by methods which are sensitive to their third and higher order moments.
It is a non-trivial task to determine which of these states the field is in. Suppose that (forgoing the localized probe system temporarily) we are somehow able to measure one of the quadrature operators of the lowest mode (e.g., q1 ) directly. The outcome of this measurement would be drawn from the marginal distributions of W Fock (q, p; N ) and W PAC (q, p; N ). These are shown in Fig. 4a for the case of N = 4. The total variation distance between these marginals is TV ≈ 0.29, so the best odds one can hope for given a single measurement outcome are (1 + TV)/2 ≈ 64.5%.
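These properties can be checked numerically using the standard Fock and phase-averaged coherent state Wigner functions in the convention W 0 (q, p) = e −q 2 −p 2 /π. The sketch below (our own grid-integration script, not the paper's code) verifies that both states are normalized with the same second moments, and reproduces the quoted total variation distance of their marginals:

```python
import numpy as np
from scipy.special import eval_laguerre, i0

N = 4                                  # number of excitations
q = np.linspace(-8, 8, 801)
p = np.linspace(-8, 8, 801)
Q, P = np.meshgrid(q, p)
r2 = Q**2 + P**2
dq = q[1] - q[0]

# Textbook Wigner functions in the convention W0(q,p) = exp(-q^2-p^2)/pi.
W_fock = ((-1)**N / np.pi) * np.exp(-r2) * eval_laguerre(N, 2 * r2)
W_pac = (1 / np.pi) * np.exp(-r2 - 2 * N) * i0(2 * np.sqrt(2 * N * r2))

for W in (W_fock, W_pac):
    norm = W.sum() * dq * dq               # should be 1
    q2 = (Q**2 * W).sum() * dq * dq        # <q^2>, should be N + 1/2 for both
    assert abs(norm - 1) < 1e-6
    assert abs(q2 - (N + 0.5)) < 1e-4

# Marginals (quadrature distributions, Fig. 4a) and their TV distance.
m_fock = W_fock.sum(axis=0) * dq
m_pac = W_pac.sum(axis=0) * dq
TV = 0.5 * np.abs(m_fock - m_pac).sum() * dq
print(f"TV of the marginals = {TV:.2f}")   # the text quotes TV ~ 0.29 for N = 4
```

The equality of the second moments is exactly why first- and second-moment statistics cannot separate the two states.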
In actuality we will attempt to distinguish these field states from a harmonic oscillator probe's response to the field, as explained in Sec. II. This probe will pick up information from all of the field modes, much of which is irrelevant to the task at hand. Our machine learning algorithm will need to learn to distinguish the irrelevant noise from the (already weak) signal of the ℓ = 1 mode.
The only relevant difference with the measurement procedure of Secs. IV and V is the switching function. While the same results would be obtained with the same coupling protocol as in the previous section, for ease of analytical treatment in this manuscript we consider a switching function χ(t) = δ(t) + δ(t − t m ), where t m is a time just before we measure the probe. Specifically, the probe undergoes a strong sudden interaction with the field at t = 0. Then both the probe and field evolve freely for a time t m . The probe undergoes another strong sudden interaction with the field at t = t m . Finally, we measure one of the probe operators (q d , pd or rd ). This measurement procedure is repeated N tom times for each probe operator and at each of the N times measurement times t m . It is important to note that since the field state is not Gaussian, these probe measurement values will not be distributed normally. The distributions they are drawn from are ultimately derived from the ones in Fig. 4a and are much noisier due to vacuum noise from the other field modes that also couple to the probe (we do not carry out any single-mode or rotating-wave approximations).
As in the boundary sensing and thermometry examples, we record the sample means and sample variances of these N tom measurements in our compressed data. However, as discussed above, we will need more than just first and second moments to handle this problem. Thus we additionally include the central fourth moments of the distribution of sampled data, as explained in Sec. III B. In Appendices D and E we calculate non-perturbatively the second, fourth and eighth moments of the probe's quadrature operators for the two field states that are relevant for the example in Sec. VI. Note that we do not require any more measurements; we just need the network to process higher moments from the same sample of measurements used previously (i.e., less compression). We then train our neural network on many examples of this compressed data until it can accurately classify whether any given data came from an interaction with a Fock state or a coherent state. We use exactly the same neural network architecture, loss function and optimization method as in the two previous examples.
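The compression step itself is simple. A sketch of how raw measurement records could be reduced to sample moments (the function and array names are ours, not the paper's code):

```python
import numpy as np

def compress(samples):
    """Reduce N_tom raw outcomes of one probe observable at one measurement
    time to its sample mean, sample variance and central fourth moment."""
    mean = samples.mean()
    var = ((samples - mean)**2).mean()
    m4 = ((samples - mean)**4).mean()
    return np.array([mean, var, m4])

rng = np.random.default_rng(1)
# Mock record: N_tom = 5000 outcomes for each of 3 probe observables
# (q_d, p_d, r_d) at each of N_times = 10 measurement times.
raw = rng.normal(size=(10, 3, 5000))

D_c = np.array([[compress(raw[t, o]) for o in range(3)] for t in range(10)])
D_c = D_c.reshape(-1)          # flatten into the compressed data vector
print(D_c.shape)               # (90,): 9 entries per measurement time
```

Three moments for each of three observables gives 9 numbers per measurement time, matching the compressed-data dimension d = 9 N times and the 90 input neurons used later.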
We consider an optical cavity of length L = 1 cm and a probe with a Gaussian smearing function of width σ = 0.1 mm placed at the center of the cavity, x = L/2 = 0.5 cm. As discussed above, in each run of the experiment the probe strongly interacts with the cavity at two times: first at t = T min = 0, then at t = T min + n ∆t for n = 1, 2, . . ., N times , with ∆t = 6.67 ps. As N times = 10, we have that T max = 66.7 ps.
Fig. 4b shows the validation accuracy of the neural network given different values for the probe frequency, ω d , and for a fixed tomography size, N tom = 5000. This shows that the neural network can successfully distinguish between a Fock and a PAC state given the sample fourth moments. The neural network can distinguish the two states with almost 100% accuracy in a wide range of detector gaps. This plot also provides some physical insight: at resonance, i.e., when the probe frequency is the same as the frequency of the mode of interest (in Fig. 4b, a solid vertical line corresponds to ω d = ω 1 = πc/L = 94.2 GHz), we obtain an improvement in the accuracy of the neural network. Furthermore, there is a second peak at double the frequency of the first mode (in Fig. 4b, the dashed vertical line corresponds to ω d = 2 ω 1 = 188.4 GHz). This is expected since when the detector energy gap is an integer multiple of the mode frequency, resonance occurs and the detector is more sensitive to getting excited by capturing field excitations.

[Fig. 4: a) The marginal distributions for the N = 4 Fock state (solid) and for the phase-averaged coherent state with |α| 2 = 4 (dashed). The distributions are not Gaussian and they have the same mean and variance, making them impossible to distinguish with simple statistical analysis of first and second moments. b) The validation accuracy of a neural network trained to distinguish two field states from the measurements of a local detector coupled to the field. In particular the network differentiates vacuum cavity states with the following two modifications: 1) the lowest field mode is in an N -particle Fock state or 2) the lowest field mode is in a coherent state with expectation ⟨n 1 ⟩ = N particles and unknown phase.]

We also show in Fig. 5 how the accuracy a) increases rapidly with the number of measurements, and b) how it evolves during training. It is worth noticing that as the number of measurements increases, the neural network becomes increasingly fast at reaching a stable validation accuracy. Fig. 5a also shows that the validation accuracy almost saturates with a moderate number of measurements (N tom ∼ 10 4 ), which supports the experimental viability of the proposal. In previous examples, we considered particularly large tomographic sizes (N tom = 10 22 in Sec. IV, and N tom = 10 20 in Sec. V) to ensure that the validation accuracy that we analyzed was the sole result of the network's ability to unscramble the information from the local measurements, and that no role was played by possible inaccuracies in the data it was fed with, even though these potential inaccuracies, and the resilience of the protocol against them, do play an important role in experiments. Moreover, it should be taken into account that the number of measurements N tom does not necessarily translate into sequential repetitions of the measurement protocol. For instance, collective measurements made on ensembles of particles (e.g., an atomic gas) can be translated into averages of individual observables calculated over the number of particles of the ensemble, which is typically already of the order of the Avogadro number (N A ∼ 10 24 ). In this kind of setup, tomographic sizes such as those considered in the two previous examples are clearly within the reach of experiments.
The success of the measurement framework in this last example shows that its applicability is not restricted to simple Gaussian systems. Indeed, we emphasize that here we have recovered a feature of the field without using the Gaussianity of the probe/field states or a UV cutoff/bandlimit/lattice discretization.

VII. CONCLUSIONS
Local measurements of a quantum field can reveal information about its global features. We have shown that we can use machine learning techniques to unscramble the information about QFTs acquired by localized probes with a one-size-fits-all method, thus avoiding the necessity of designing a specific measurement protocol and data-analysis function for each feature we might be interested in. More concretely, we have demonstrated how to read out non-local features of a QFT from the outcomes of a fixed measurement protocol with local experiments, processed through a neural network with a generic architecture and training procedure.
As particular examples to showcase the power of the proposed machine learning framework we have examined three case studies: i) how a local probe can see a wall far away from it, in the vacuum and without actively sending signals to bounce off it, ii) how a local probe that is not given enough time to thermalize can still accurately determine the temperature of a quantum field, and iii) how detectors can accurately distinguish between Fock states and coherent states even when the first and second statistical moments of their observables match. To do so, in all cases we used the same simple measurement protocol, which was not adapted to the particular toy problems considered in this paper. Yet we were able to distinguish with high levels of accuracy the relevant features of the field we were after in each case. This is evidence of the potential of these methods to accommodate experimental needs. Namely, the use of machine learning techniques in the context of quantum field theory takes the complexity burden out of the design of experimental protocols and puts it on the data processing, which neural networks can deal with efficiently.
The techniques we present in this paper are general and of wide applicability. This paves the way to the use of machine learning techniques in more complicated scenarios such as distinguishing gravitational backgrounds [29,82], global state tomography [35] with local probes, certifying entanglement in analogue Hawking radiation [83], and maybe even new experimental proposals seeking direct evidence of yet untested QFT phenomena such as the Unruh effect [50]. In each of these scenarios the response of local probes like the ones in this paper is often used to study features of the QFT, so the techniques proposed in this manuscript are directly applicable. Finally, the methods developed here are directly translatable to many-body quantum physics, where they can be used to address the problem of measuring many-body observables with local probes in, e.g., quantum phase transitions [84].

Appendix A: Training and validation of the neural network

As we described in the main text, our measurement procedure and compression produces labeled data consisting of data D c ∈ R d , where d = 9 N times , and an associated label y. To begin training we collect n = N samples instances of this labeled data into an N samples × d design matrix X = (D 1 , . . ., D n ) ⊺ and a vector of labels y = (y 1 , . . ., y n ) ⊺ . We then portion off 75% of this data (N train = 0.75 N samples ) to be used for training the neural network, X train and y train , leaving the other 25% (N valid = 0.25 N samples ) as validation data, X valid and y valid , which we will ultimately use to test the accuracy of the trained network. Note that the network will not be exposed to any of the validation data during training.
We begin processing our data by subtracting off the mean of the training data, X → X − X avg train . Next we perform principal component analysis (PCA), which finds a representation of our data without linear correlations. To do this we compute the covariance matrix of our training data and perform a singular value decomposition on it, C train = V ⊺ Λ V , where V = (v 1 , . . ., v d ) ⊺ is the matrix of singular vectors, v j , and Λ = diag(λ 1 , . . ., λ d ) is the matrix of singular values, λ j ∈ R + . The singular vectors are the directions in which our data varies independently, and the singular values indicate "how much" variance there is in each direction. Using this decomposition we can rewrite our data in the singular basis by taking X → V X. After this transformation, the training data has a diagonal covariance matrix, namely Λ. Finally, we can whiten the data by taking X → Λ −1/2 X. The covariance matrix of the training data is now the identity matrix. We do this in order to force the neural network to take into account all components of the data, since they come from sources that are not supposed to be directly comparable in magnitude. Note that we have not used PCA to compress our data; that is, we have not cut any small singular values out of Λ, as is commonly done.
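In code, the centering, PCA rotation, and whitening amount to a few lines. A sketch with synthetic data (here we store samples as columns; the paper's own conventions may differ):

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic training data: rows are the d features, columns are samples.
d, n = 6, 10_000
mix = rng.normal(size=(d, d))
X = mix @ rng.normal(size=(d, n))      # correlated, unequal-variance features

X = X - X.mean(axis=1, keepdims=True)  # subtract the training mean
C = X @ X.T / n                        # covariance matrix of the training data
lam, V = np.linalg.eigh(C)             # C = V diag(lam) V^T
X = V.T @ X                            # rotate into the singular basis
X = X / np.sqrt(lam)[:, None]          # whiten: unit variance in every direction

C_white = X @ X.T / n                  # now the identity matrix
assert np.allclose(C_white, np.eye(d), atol=1e-8)
```

No components are discarded, mirroring the text: the transformation decorrelates and rescales but does not compress.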
The data is now ready to begin training the neural network. As discussed in the main text, neural networks work by alternately applying tuneable linear-affine transformations (controlled by weights and biases) and fixed non-linear transformations (the activation function) to their inputs. See Fig. 1 for a schematic of a neural network that can be used to classify the topology of a QFT based on local probe measurement data.
We will now use the architecture in Fig. 1 as a basic illustrative example. In this example the network accepts a 5-dimensional input, x (0) , into the left-most layer of the network (note that in the examples discussed in the main text the input dimension is much larger). In passing this data to the next layer of the network, a linear-affine transformation is applied to x (0) , as x (1) = A (1) x (0) + b (1) . The weight matrix A (1) here has dimensions 7 × 5 and the bias vector has dimension 7, such that x (1) is 7-dimensional. The 7 × 5 + 7 = 42 values which determine this linear-affine transformation are left as free parameters to be optimized during training. Next, a fixed non-linear function, G (1) , is applied element-wise to each entry of x (1) , yielding z (1) = G (1) (x (1) ). For instance, G (1) may be the hyperbolic tangent function or a rectified linear unit.
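The layer just described can be sketched directly. Here tanh is chosen as an example activation, and the weights are random placeholders since training is discussed below:

```python
import numpy as np

rng = np.random.default_rng(3)
x0 = rng.normal(size=5)        # 5-dimensional input layer

A1 = rng.normal(size=(7, 5))   # weight matrix A^(1), dimensions 7 x 5
b1 = rng.normal(size=7)        # bias vector b^(1), dimension 7
x1 = A1 @ x0 + b1              # tuneable linear-affine transformation
z1 = np.tanh(x1)               # fixed non-linearity, applied element-wise

# 7*5 + 7 = 42 free parameters determine this layer, as in the text.
n_params = A1.size + b1.size
print(n_params)  # 42
```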
There are two different problem types we need to design a network for: classification and regression. In classification, our network is tasked with deciding which of several classes (given by a discrete label y) our data belongs to. In this scenario we take the number of neurons in the final layer to be equal to the number of classes, and the final activation function to be a softmax. This ensures that the network's final output is a probability distribution that can be interpreted as the probability that the initial data belongs to each class. In regression, our network is tasked with assigning the data a continuous label y. In this case we take the final layer to have a single neuron.
In the examples discussed in the paper we considered a network consisting of 90 neurons on the input layer, 30 in the intermediate (hidden) layer and either two or one neurons in the final layer, depending on the example. In the boundary sensing and state discrimination examples we have two neurons in the final layer. In the thermometry cases we have only one neuron in the final layer. All of the non-linear activation functions were taken to be leaky rectified linear units [53].
The network's weights and biases are tuned to minimize the error of the network's predictions over the training set. To quantify this error we define the following cost functions: for classification, C = −(1/N train ) Σ k ỹ k · log z k , and for regression, C = (1/N train ) Σ k (z k − y k ) 2 , where ỹ k is the one-hot encoding of the k-th data point's label and z k is the network's output for the k-th training point. For the classification scenario, our cost function is the relative entropy between the network's probability assignment and the expected result. For the regression case, the cost function is the mean square error. To help reduce overfitting (that is, an excessively close alignment of the network's model to the training data that might end up worsening its performance with real, or validation, data) we add an L 2 regularizer to this cost function, λ 2 ||A|| 2 2 , for some chosen λ 2 . This reduces the complexity of the model by penalizing the network for using large weights. Additionally, when training the network, we randomly "drop" some fraction of the neurons. This forces the network to be more robust. The sum of the cost function and the regularizer is then minimized by stochastic gradient descent [53].
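A minimal sketch of the two cost functions plus the L 2 penalty (the function and variable names are ours; a toy 2-class example, not the paper's code):

```python
import numpy as np

def classification_cost(z, y_onehot):
    """Cross-entropy (relative entropy to a one-hot label) averaged over the
    training set; z holds the softmax outputs, one row per sample."""
    return -np.mean(np.sum(y_onehot * np.log(z), axis=1))

def regression_cost(z, y):
    """Mean square error between predictions z and continuous labels y."""
    return np.mean((z - y)**2)

def l2_regularizer(weights, lam2):
    """Penalty on large weights: lam2 times the sum of squared weight entries."""
    return lam2 * sum(np.sum(A**2) for A in weights)

# Tiny example: 2-class problem, 3 samples.
z = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
y = np.array([[1, 0], [0, 1], [1, 0]])
cost = classification_cost(z, y)
print(f"{cost:.4f}")  # -> 0.2798
```

In practice this total (cost plus regularizer, with dropout active) is what the stochastic gradient descent loop minimizes.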
Appendix B: Free scalar QFT on the lattice

As we discussed in the main text, we can motivate a UV-cutoff for the field-probe system through the length scale of the probe's smearing function. To see this, let us expand the field-probe interaction Hamiltonian in terms of plane-wave modes, in which the Fourier transform F (k) of the smearing function F (x) appears. Note that F (k) determines how strongly the probe couples to each of the field modes. If the smearing function decays fast enough outside of a region of size ∼ σ (e.g., F (x) is a Gaussian with standard deviation σ) then F (k) has an approximate width ∼ 1/σ. That is, the probe does not couple strongly to modes with wavenumber |k| ≫ σ −1 . Thus by considering a probe with an effectively finite spatial extent we are automatically considering a soft UV-cutoff in the interaction of field and probe. If F (k) decays sufficiently fast, we can neglect the coupling to the modes above some large UV threshold, say |k| > K (e.g., for a Gaussian profile we can take K = 16/σ). This yields an effective coupling with the hard-cutoff smearing F uv (k) = F (k) for |k| ≤ K and F uv (k) = 0 otherwise.
By the Nyquist-Shannon sampling theorem we can then reconstruct our UV-cutoff smearing function as F uv (x) = Σ n F uv (x n ) S n (x/a), where a = π/K is the spacing of the discrete positions, x n = n a, and where S n (r) = sinc(r − n) = sin(π(r − n))/(π(r − n)). Note that in general F uv (x n ) ̸ = F (x n ). This means that in order to recover the UV-cutoff smearing function we cannot sample the original smearing function, but its bandlimited version instead. However, precisely because we are assuming that F (k) is effectively bandlimited, we can approximate F uv (x n ) ≃ F (x n ). Indeed, for the particular case of a Gaussian with standard deviation σ and K = 16/σ, it can be straightforwardly shown that |F (x) − F uv (x)| ≲ 10 −59 for every real x. Now, note that S n (r) decays only polynomially for large r. Thus in general our UV-cutoff smearing function will have polynomial tails, as all bandlimited functions do. This might seem to be in contradiction with our previous approximation, since F uv (x) will decay polynomially while F (x) does so exponentially. However, the bound above shows precisely that these differences in the rate of decay are only relatively significant in the regions in which the order of magnitude of both F (x) and F uv (x) is already negligible. We can therefore faithfully approximate the UV-cutoff smearing function by sampling F (x) instead of F uv (x), i.e., we redefine F uv (x) = Σ n F (x n ) S n (x/a). This function is still bandlimited and so still has polynomial tails; however, the coefficients F (x n ), which, as we will soon see, tell us how the probe couples to the lattice sites, are sampled directly from the original smearing function.
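The quality of this approximation is easy to verify numerically for the Gaussian case. In ordinary double precision the 10 −59 bound is of course swamped by the ~10 −16 floating-point floor, but the reconstruction from samples F (x n ) is exact to machine precision:

```python
import numpy as np

sigma = 1.0
K = 16 / sigma
a = np.pi / K                   # lattice spacing

def F(x):
    return np.exp(-x**2 / (2 * sigma**2))   # Gaussian smearing profile

# Bandlimited reconstruction from the samples F(x_n), x_n = n a;
# np.sinc(r) = sin(pi r)/(pi r), so S_n(x/a) = np.sinc(x/a - n).
n = np.arange(-80, 81)          # covers |x_n| up to ~15 sigma; tails negligible
x = np.linspace(-3, 3, 1001)
F_uv = np.sum(F(n * a)[None, :] * np.sinc(x[:, None] / a - n[None, :]), axis=1)

err = np.max(np.abs(F(x) - F_uv))
print(f"max |F - F_uv| = {err:.1e}")  # at the double-precision floor
assert err < 1e-10
```

This confirms that sampling the Gaussian itself, rather than its bandlimited version, loses nothing in practice.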
Since F uv as defined above is bandlimited, we can define the UV-cutoff interaction Hamiltonian by replacing F with F uv , where we note that the UV-cutoff smearing function effectively induces a UV-cutoff of the field operator, φuv . Next, we note that since φuv (t, x) is bandlimited we can express it as a sum of sinc functions as φuv (t, x) = Σ n φuv (t, x n ) S n (x/a). Recomputing the UV-cutoff interaction Hamiltonian using these sinc representations, we find that the probe couples to the field amplitudes at the lattice sites with weights given by the samples F (x n ), where we have used the orthonormality of the collection {S m (r)} in the L 2 norm. Thus, by taking a hard UV-cutoff on the probe's smearing function we automatically find that the probe effectively only couples to the field at the discrete positions x n = n a.
Notice that so far we are not implying that the field itself has a UV-cutoff or that the space it lives on is discretized. We have only discussed an approximation of the probe coupling. We could study the field theory as is, without an explicit UV-cutoff, but for our purposes it is convenient to consider that the field is also bandlimited. We apply this UV-cutoff to the field by removing the field modes with |k| > K, yielding φ_uv(t, x). Note that since these operators are now bandlimited we can express them in terms of their sinc representations. The UV-cutoff field Hamiltonian is then the free field Hamiltonian for this bandlimited QFT, where we have again used the operators' sinc representations and the L² orthonormality of {S_m(r)} to express the integral as a sum. This way, we have completely reduced the dynamics to the field amplitudes and momenta at the points (t, x_n). One may think that the fact that the Hamiltonian Ĥ_uv has terms with the field derivatives ∂_x φ_uv(t, x_n) shows that we are still dealing with a continuum space, not a lattice. Surprisingly this is not the case: these continuous derivative terms are understandable in terms of the discrete lattice with no approximation.
To see this, note that as discussed above, bandlimited functions can be perfectly represented on a lattice. The derivative of a bandlimited function is itself a bandlimited function. Thus, for bandlimited functions, derivatives are perfectly understandable on a lattice. This is facilitated by the following remarkable derivative formula, which is exact for bandlimited functions: when f is bandlimited with bandwidth K and a ≤ π/K, then ∂_x f(x_k) = Σ_{n≠k} f(x_n) (−1)^(k−n) / (a (k − n)). Moreover, if the Fourier transform of f is mostly supported in [−K, K] with thin tails (e.g., Gaussian tails) outside this region, then this is a very good derivative approximation. We can apply this logic to the field operator as well. By Eq. (B7), the derivative of the sinc profile, ∂_x S_n(x/a), is bandlimited and so can itself be written as a sum of sinc profiles. Carrying this out for k ∈ Z, and using the L² orthonormality of {S_m(r)} again, we see that the derivative ∂_x φ(t, x_k) has an expression in terms of the field operators at (t, x_n).
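This lattice derivative formula can be verified directly on an effectively bandlimited function (a Gaussian, our choice here): the alternating non-local sum over all other sites reproduces the continuum derivative essentially exactly.

```python
import numpy as np

# For bandlimited f with bandwidth K and a = pi/K, the derivative at a lattice
# site is an exact (non-local) sum over all other sites:
#   f'(x_k) = sum_{n != k} f(x_n) (-1)^(k-n) / (a (k - n)).
sigma, K = 1.0, 16.0
a = np.pi / K

f = lambda x: np.exp(-x**2 / (2 * sigma**2))
f_prime = lambda x: -x / sigma**2 * np.exp(-x**2 / (2 * sigma**2))

k = 5                                    # evaluate the derivative at x_k = 5a
n = np.arange(k - 400, k + 401)
n = n[n != k]
terms = f(n * a) * (-1.0) ** (k - n) / (a * (k - n))
deriv = np.sum(terms)

assert abs(deriv - f_prime(k * a)) < 1e-10
```

For this Gaussian the aliasing error is of order e^(−128), so the residual is again pure floating-point roundoff.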
If we use Eq. (B12) in Eq. (B9), we get an expression for the UV-cutoff Hamiltonian that is fully defined within the 1D lattice {x_n}. Our bandlimited QFT is perfectly lattice-representable. Indeed, as discussed in [85], despite what you may have heard, there are perfectly Lorentzian lattice theories.
Note, however, that the derivative understood in terms of the lattice sites is in a sense extremely non-local: it involves all n ≠ k. How is it that a perfectly local operation in the continuum (i.e., differentiation) is here being exactly represented by a non-local operation on the lattice sites? This issue is discussed at length in [86], but the ultimate resolution is as follows: the lattice sites themselves ought to be thought of as non-local objects, each associated with overlapping sinc profiles. Thus, our perfectly local differentiation in the continuum is carried out in terms of the lattice via a non-local combination of non-local objects.
If, however, we want to think of the lattice sites themselves as being local objects undergoing nearest-neighbour interactions, then the dynamics must be modified further. To achieve this, instead of the exact formula Eq. (B12) we take a nearest-neighbour finite-difference approximation. We note that the resulting operators satisfy the commutation relations [φ_uv(t, x_n), π_uv(t, x_m)] = iℏ (δ_nm / a) 1. Finally, rewriting this Hamiltonian in terms of the dimensionless operators q_n = √(am/ℏ²) φ_uv(t, x_n) and p_n = √(a/m) π_uv(t, x_n), which satisfy the commutation relations [q_i, p_j] = i δ_ij 1, yields the bandlimited field Hamiltonian of Eqs. (7) and (8). The implementation of this bandlimitation has introduced essentially three changes to the scenario being considered: 1) a UV-cutoff in the interaction Hamiltonian, that is, the probe no longer couples to high frequency modes; 2) a UV-cutoff in the field Hamiltonian, that is, the field no longer contains high frequency modes; and 3) a discrete approximation for the derivative. The effects of the first two changes on the probe's response to the field are exponentially suppressed with increasing K and quickly become irrelevant for our calculations. The effect of the third change is more subtle, since the approximation that lies behind it is more drastic in nature, as becomes apparent by comparing the expression for the discretized derivative and the exact expression in Eq. (B12). The discrete approximation for the derivative changes the dynamics of the field; namely, it modifies the dispersion relation from the continuum form ℏω_k to a lattice-modified form ℏω'_k. Note that ℏω_k ≥ ℏω'_k, as seen in Fig. 6a). Note also that the dispersion relation is mostly modified at high frequencies, that is, at frequencies to which the probe does not couple strongly, as also shown in Fig. 6a). One objection one may have, however, is that this modified dispersion relation could allow for superluminal signals in these high frequency modes; but in practice, as we show, the probe does not couple to the field modes that behave pathologically.
To quantify how much the dispersion relation has changed in the modes that couple to the probe, we define the "average relative error" in ℏω_k: the average relative difference between the modified and unmodified dispersion relations at each frequency, weighted by the strength of the probe's coupling to that frequency. We have computed the average relative error for various cutoffs, field masses, and probe sizes in Fig. 6b). To investigate how the average relative error decreases as we increase K we use a series of inequalities, where the first inequality follows from ℏω_k ≥ ℏω'_k ≥ 0 and the second from ℏω_k ≥ ℏck. For the case in which the smearing function is a Gaussian of variance σ², the final expression is mass-independent and can be computed in closed form (B17). That is, we expect the error made by the discrete approximation to be polynomially suppressed as we increase Kσ.
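A rough numerical sketch of this suppression is below. We hedge two assumptions of our own here (the paper's exact expressions are not reproduced in this excerpt): the lattice dispersion is taken to be the standard nearest-neighbour form ω'_k = √(m² + (2/a)² sin²(ka/2)), and the coupling weight is taken Gaussian, exp(−k²σ²/2), with ℏ = c = 1.

```python
import numpy as np

m, sigma = 1.0, 1.0

def avg_rel_error(K, num=20001):
    # Average of (w - w')/w over k in [0, K], weighted by the probe coupling.
    a = np.pi / K
    k = np.linspace(0.0, K, num)
    w = np.sqrt(m**2 + k**2)                                  # continuum
    wp = np.sqrt(m**2 + (2.0 / a * np.sin(k * a / 2.0))**2)   # lattice (assumed form)
    weight = np.exp(-k**2 * sigma**2 / 2.0)
    return np.sum(weight * (w - wp) / w) / np.sum(weight)

e8, e16 = avg_rel_error(8.0), avg_rel_error(16.0)
assert e16 < e8 < 0.05    # error shrinks as K sigma grows
assert e16 < 0.01         # sub-percent at K sigma = 16
```

This reproduces the qualitative behaviour of Fig. 6b): sub-percent error at Kσ = 16, falling polynomially with the cutoff.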
In the main text we have chosen Kσ = 16 in both the boundary sensing and thermometry examples. This places the upper bound on the average relative error at 0.32%. In the remote boundary sensing and thermometry examples we have (taking ℏ = c = 1) mσ = 0.006 and mσ = 0.00027 respectively. The average relative error can be computed numerically in each case, yielding 0.16% in both cases.

Consider data drawn from two multivariate normal distributions with some means, µ_q and µ_r, and some covariances, Σ_q and Σ_r. The Hellinger distance between two such multivariate normal distributions is given by [87]

H²(P_q, P_r) = 1 − [det(Σ_q)^(1/4) det(Σ_r)^(1/4) / det(Σ)^(1/2)] exp(−(1/8) ∆µ⊺ Σ⁻¹ ∆µ), (C2)

where ∆µ = µ_r − µ_q and Σ = (Σ_r + Σ_q)/2. Thus if we can compute the means and covariances of our data in the central limit, we can find bounds for the neural network's optimal performance. Due to the Gaussian nature of our setup, all of our measurement results were drawn from normal distributions. Moreover, in the main text we discussed how our data can be compressed by just considering the sample means and sample variances of each quadrature at each time point. For clarity we will restrict the following discussion to the results, q_k, of our Ntom measurements of q at some t = Tmin + m∆t, where m ∈ {0, . . . , Ntimes − 1}. The sample mean and variance of these measurement outcomes are distributed as

q̄ = (1/Ntom) Σ_{k=1}^{Ntom} q_k ∼ N(µ_q, σ_qq/Ntom), and s_q = (1/Ntom) Σ_{k=1}^{Ntom} (q_k − q̄)² ∼ (σ_qq/Ntom) χ²(Ntom − 1),

where χ²(k) is the chi-squared distribution with k degrees of freedom [88] and µ_q = ⟨q⟩ and σ_qq = ⟨q²⟩ − ⟨q⟩² are the probe's first moment and variance in q at time t. Due to the Gaussian nature of our setup, these moments can be efficiently computed [69,89]. Moreover, for independent identically distributed normal data, q_k, the sample mean and variance are sufficient statistics [90], such that this compression is lossless. Note that the compressed data is not normally distributed. However, for large Ntom, we can apply the central limit theorem, yielding q̄ ∼ N(µ_q, σ_qq/Ntom) and s_q ∼ N(σ_qq, 2σ_qq²/Ntom). The same discussion applies equally well to our measurements of p and r at each time. Thus, in the high-tomography regime, our compressed data is normally distributed across the measurement times between Tmin and Tmax = Tmin + (Ntimes − 1) ∆t. Knowing this distribution we can compute the Hellinger distance (C2) and place bounds on the optimal classification rate.
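The Hellinger distance between two multivariate normals is straightforward to evaluate numerically; the sketch below implements the standard closed-form expression (the means and covariances are illustrative placeholders):

```python
import numpy as np

def hellinger(mu_q, S_q, mu_r, S_r):
    # H^2 = 1 - det(Sq)^(1/4) det(Sr)^(1/4) / det(Sbar)^(1/2)
    #         * exp(-(1/8) dmu^T Sbar^{-1} dmu),  Sbar = (Sq + Sr)/2
    S_bar = (S_q + S_r) / 2.0
    dmu = mu_r - mu_q
    coeff = (np.linalg.det(S_q)**0.25 * np.linalg.det(S_r)**0.25
             / np.linalg.det(S_bar)**0.5)
    h2 = 1.0 - coeff * np.exp(-dmu @ np.linalg.solve(S_bar, dmu) / 8.0)
    return np.sqrt(max(h2, 0.0))

mu = np.zeros(3)
S = np.eye(3)
assert hellinger(mu, S, mu, S) == 0.0      # identical distributions
h = hellinger(mu, S, mu + 0.1, S)          # slightly shifted means
assert 0.0 < h < 1.0
```

Feeding in the computed means and covariances of the compressed data for the two hypotheses then yields the bound on the optimal classification accuracy plotted in the figures.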
1. The effect of Gaussian unitaries on non-Gaussian states

Consider a scenario in which the initial state of the probe-field system is non-Gaussian with well-known moments, e.g., the probe in its (Gaussian) ground state and the field in a Fock state. Suppose that the joint system undergoes a generic Gaussian unitary interaction, Û. Such a transformation maps Gaussian states to Gaussian states, and in this case there is a well-known relationship between the first and second moments of the initial and final states. Conversely, when applied to a non-Gaussian state, the result of such a transformation is another non-Gaussian state. Note that in this case, since the field is in a non-Gaussian state, the reduced state of the probe may end up being non-Gaussian even if its initial state was Gaussian.
However, despite this complication, just as in the Gaussian case there is still a relatively simple systematic relationship between all of the higher moments before and after this transformation. That is, given the moments of the initial non-Gaussian state, we can efficiently determine all of the moments of the final non-Gaussian state.
To show this, we first note that Gaussian unitary transformations correspond to symplectic transformations in phase space. That is, if we act on each component of the vector of operators, X, via some Gaussian unitary, Û, the result is a symplectic(-affine) transformation of the phase space vector X itself. Namely, Û† X Û = S X + d, for some symplectic transformation S with S Ω S⊺ = Ω (where Ω is the symplectic form) and some displacement vector d. Note that in the above equation Û acts as a linear map on the system's Hilbert space and acts on X component-wise. On the other hand, S is a linear map on the system's phase space and acts on X as a phase space vector, yielding linear combinations of its (operator-valued) components.
We note that this Gaussian unitary (or the equivalent symplectic) relationship is a property of the dynamics alone, independent of the system state. Thus, we can use this characterization of Gaussian unitaries to understand their effect on non-Gaussian states. The effect that a Gaussian unitary has on a state's Wigner function (even a non-Gaussian one) is

W_{ÛρÛ†}(ξ) = W_ρ(S⁻¹(ξ − d)). (D4)

That is, the effect of a Gaussian unitary is just to transform the original Wigner function by applying a linear-affine transformation to the joint phase space variables. Note that ξ in the above equation is the real-valued vector of phase space variables, ξ := (q_d, p_d, q_1, p_1, q_2, p_2, . . . )⊺. For simplicity we will now restrict our attention to situations with zero displacement, d = 0, since this is the case relevant for this paper. Next, we can use (D4) to determine the moments of the final probe distribution from the initial probe-field moments. For instance, suppose that we are interested in the fourth moment of the probe's quadrature q_d after the interaction with the field. We can calculate this as follows. Let q_d = (1, 0, 0, . . . )⊺ be a phase space vector such that q_d⊺ ξ = q_d; that is, q_d isolates q_d from the vector of phase space variables ξ. This allows us to rewrite the desired fourth moment as

⟨q_d⁴⟩_{ÛρÛ†} = ∫ dξ (q_d⊺ ξ)⁴ W_ρ(S⁻¹ ξ),

where we have assumed d = 0 for simplicity. Making the canonical change of variables ξ′ = S⁻¹ ξ we have

⟨q_d⁴⟩_{ÛρÛ†} = ∫ dξ′ (Q_d⊺ ξ′)⁴ W_ρ(ξ′), where we have defined Q_d := S⊺ q_d.

Note that for all symplectic transformations det(S) = 1, such that no Jacobian factor arises in the above change of variables. The fourth moment of q_d in the final state is thus equal to the fourth moment of Q_d in the initial state. Since we have assumed that we know all of the moments of the initial non-Gaussian state, we can, at least in principle, calculate ⟨q_d⁴⟩_{ÛρÛ†}. In general, Q_d will have support over the probe portion of the phase space as well as over a great deal of the field phase space. As such, calculating ⟨q_d⁴⟩_{ÛρÛ†} requires us to know the correlations between all of the field modes. In particular we have

⟨q_d⁴⟩_{ÛρÛ†} = Σ_{i,j,k,ℓ} T_{4,i,j,k,ℓ} Q_{d,i} Q_{d,j} Q_{d,k} Q_{d,ℓ},

where T_{4,i,j,k,ℓ} = ∫ dξ ξ_i ξ_j ξ_k ξ_ℓ W_ρ(ξ) are the fourth moments of the initial Wigner function, and T_4 collects these.
The desired probe moment, ⟨q_d⁴⟩_{ÛρÛ†}, is this tensor evaluated on four copies of the phase space vector Q_d. Similarly for the fourth moments of p_d and r_d, where p_d := (0, 1, 0, . . . )⊺ and P_d := S⊺ p_d. Any other final probe moments (e.g., ⟨q_d²⟩_{ÛρÛ†} or ⟨p_d⁸⟩_{ÛρÛ†}) can be calculated in analogous ways. All that one needs is an understanding of the moments of the initial non-Gaussian state (i.e., the tensors T_2, T_4, T_8, etc.) as well as the vectors Q_d and P_d which the Gaussian unitary interaction maps into the probe's phase space.
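The change-of-variables identity underlying this calculation can be checked by Monte Carlo on any phase-space distribution, Gaussian or not. The two-mode beam-splitter S and the sampled non-Gaussian distribution below are illustrative choices of ours, not quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# A symplectic beam-splitter mixing two modes, ordering (q1, p1, q2, p2)
c, s = np.cos(0.7), np.sin(0.7)
S = np.array([[ c, 0, s, 0],
              [ 0, c, 0, s],
              [-s, 0, c, 0],
              [ 0,-s, 0, c]])
Omega = np.array([[0, 1, 0, 0], [-1, 0, 0, 0],
                  [0, 0, 0, 1], [0, 0, -1, 0]])
assert np.allclose(S @ Omega @ S.T, Omega)   # S preserves the symplectic form

# Non-Gaussian "initial state": samples of the phase-space variables
N = 50000
xi = np.column_stack([rng.uniform(-1, 1, N),
                      rng.uniform(-1, 1, N),
                      rng.exponential(1.0, N) - 1.0,
                      rng.normal(0.0, 1.0, N)])

q_d = np.array([1.0, 0.0, 0.0, 0.0])
Q_d = S.T @ q_d

# <q_d^4> in the transformed state equals <(Q_d . xi)^4> in the initial one
m_after = np.mean((xi @ S.T)[:, 0]**4)
m_before = np.mean((xi @ Q_d)**4)
assert np.isclose(m_after, m_before)
```

Per sample the two expressions are literally the same linear combination, which is exactly the content of the det(S) = 1 change of variables.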

2. Two delta pulse interaction
As we showed in Subsec. D 1, in order to construct the desired probe moments from the moments of the initial non-Gaussian state, we need to know the vectors Q_d := S⊺ q_d and P_d := S⊺ p_d. These are the vectors which the interaction maps onto the probe observables q_d and p_d. To identify these vectors we first need to compute the symplectic transformation S which is associated with our Gaussian unitary evolution.
Let us consider an interaction Hamiltonian given by (1) with a switching function which is the sum of two delta functions, specifically χ(t) = δ(t) + δ(t − t_m). In this scenario the probe undergoes a very strong, very brief interaction with the field once at t = 0 and then once again at t = t_m > 0. Between these two times, the probe evolves freely. As such, the operator with which it couples to the field, µ_d = q_d, evolves in the interaction picture as

µ_d(t) = q_d(t) = q_d cos(ω_d t) + p_d sin(ω_d t), (D10)

where ω_d is the probe's natural frequency. The full unitary map resulting from both of these sudden interactions can be rewritten in terms of the operator-valued phase space vector, X = (q_d, p_d, q_1, p_1, q_2, p_2, . . . )⊺, and a bilinear form H(t) built from two vectors u(t) and v(t), where F_n = √(2ℏc²/L) ∫ dx F(x) sin(k_n x). Note that u(t) has support only on the probe sector of the phase space. It tracks the evolution of the probe observable, µ_d = q_d, through time. Similarly, v(t) has support only on the field sector of the phase space, and it tracks the evolution of the field observable, ∫ dx F(x) φ(t, x).
The symplectic transformation S corresponding to Û is easy to compute in this case. This is due to the fact that the matrix ΩH(t) is nilpotent; in particular, (ΩH(t))² = 0. This follows from the four orthogonality relations

u(t)⊺ Ω u(t) = v(t)⊺ Ω v(t) = u(t)⊺ Ω v(t) = v(t)⊺ Ω u(t) = 0.

The four equalities hold because Ω is antisymmetric and because Ω does not mix the probe and field portions of the phase space (therefore, Ωu(t) and v(t) have no common support). The nilpotence of ΩH(t) makes the matrix exponential trivial to compute. Indeed, since ΩH(t) squares to the zero operator, we have exp(ΩH(t)) = 1 + ΩH(t). Note that these expressions are exact, not perturbative. Applying these transformations to q_d := (1, 0, 0, . . . )⊺ and p_d := (0, 1, 0, . . . )⊺ gives Q_d and P_d as desired.
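The nilpotency argument can be sketched numerically. The dimensions and vectors below are illustrative (a 2-dimensional probe sector plus two field modes); the only structural inputs are that u lives on the probe sector, v on the field sector, and H = u v⊺ + v u⊺:

```python
import numpy as np

def omega(n_modes):
    # Block-diagonal symplectic form for n_modes (q, p) pairs
    out = np.zeros((2 * n_modes, 2 * n_modes))
    for i in range(n_modes):
        out[2*i, 2*i + 1] = 1.0
        out[2*i + 1, 2*i] = -1.0
    return out

Omega = omega(3)                       # probe + 2 field modes
wd, t = 1.3, 0.8
u = np.array([np.cos(wd * t), np.sin(wd * t), 0, 0, 0, 0])   # probe sector
v = np.array([0, 0, 0.5, -0.2, 0.3, 0.1])                    # field sector

H = np.outer(u, v) + np.outer(v, u)
OH = Omega @ H
assert np.allclose(OH @ OH, 0.0)       # nilpotent: the series truncates

S = np.eye(6) + OH                     # exp(Omega H), exact
assert np.allclose(S @ Omega @ S.T, Omega)   # and S is indeed symplectic
```

Both assertions hold for any u and v with disjoint support, since x⊺Ωx = 0 for every x and u⊺Ωv = 0 by construction.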
Appendix E: Second, fourth and eighth moments for Fock and Phase-averaged states

In this Appendix we will calculate the second, fourth and eighth moments for Fock and phase-averaged coherent states. We consider the initial probe-field Wigner function defined in Eq. (36) in the main text (E1), where q_d and p_d are the probe variables, and where W_Vac(q, p) = e^(−q²−p²)/π. Note that the probe and all the modes are uncorrelated from each other. Thus, all "cross moments" factorize (e.g., ⟨q_4 p_4² q_6³ p_6⁵⟩ = ⟨q_4 p_4²⟩⟨q_6³ p_6⁵⟩). Note also that these averages are taken with respect to the Wigner function, so that when we calculate the average of a certain dynamical function f(ξ), which we will denote ⟨f(ξ)⟩, this corresponds to the expectation value of the Weyl quantized operator, i.e., the one obtained by using symmetric ordering in the quantization scheme [91][92][93].
The statistics of the vacuum Wigner function are ⟨q²⟩ = 1/2, ⟨q⁴⟩ = 3/4, ⟨q⁶⟩ = 15/8 and ⟨q⁸⟩ = 105/16, with ⟨p^n⟩ = ⟨q^n⟩, ⟨p^n q^m⟩ = ⟨p^n⟩⟨q^m⟩, and odd moments vanishing. We only write explicitly up to the eighth moments that we need for our purposes, but of course for the modes in the vacuum all their odd moments vanish and their even moments are trivial functions of their second moments. The statistics (⟨q²⟩_N, ⟨q⁴⟩_N, ⟨q⁶⟩_N, ⟨q⁸⟩_N) of the Fock and PAC states' Wigner functions are given in Tables II and III, respectively, with ⟨p^n⟩ = ⟨q^n⟩, ⟨p^n q^m⟩ = ⟨p^n⟩⟨q^m⟩, and the odd moments vanishing. Note that the second moments match (first column). Also, the first row is the same for both tables since for N = 0 both the Fock state and the PAC reduce to the vacuum.
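These moments can be checked directly from the Wigner functions on a grid. The sketch below does this for N = 1 (Fig. 4 uses N = 4, but the matching of second moments holds for every N); the conventions assumed here are W_Vac = e^(−q²−p²)/π and a coherent-state centre at radius √(2N):

```python
import numpy as np

q = np.linspace(-7, 7, 561)
Q, P = np.meshgrid(q, q, indexing="ij")
dA = (q[1] - q[0])**2
R2 = Q**2 + P**2

# N = 1 Fock state: W_1 = (2 r^2 - 1) exp(-r^2) / pi  (Laguerre L_1)
W_fock = (2.0 * R2 - 1.0) * np.exp(-R2) / np.pi

# PAC state: uniform phase average of displaced vacua at radius sqrt(2)
thetas = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
W_pac = np.zeros_like(Q)
for th in thetas:
    W_pac += np.exp(-(Q - np.sqrt(2) * np.cos(th))**2
                    - (P - np.sqrt(2) * np.sin(th))**2) / np.pi
W_pac /= len(thetas)

def moment(W, n):
    return np.sum(Q**n * W) * dA

# Second moments match (= N + 1/2): no Gaussian statistic separates them...
assert abs(moment(W_fock, 2) - 1.5) < 1e-3
assert abs(moment(W_pac, 2) - 1.5) < 1e-3
# ...but the fourth moments differ (15/4 vs 21/4)
assert abs(moment(W_fock, 4) - 3.75) < 1e-2
assert abs(moment(W_pac, 4) - 5.25) < 1e-2
```

This makes the point of the Fock-vs-coherent case study concrete: the states are only distinguishable from their fourth and higher moments.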
In Appendix D, we obtained a general formula to calculate the moments of the probe's observables for a probe delta-coupled to a general field at times t = 0 and t = t_m. We considered a harmonic oscillator probe, µ_d = q_d, with frequency ω_d. In the next subsection, we will apply those results to the particular cases of Tables II and III.

Second moments
Using the techniques in Appendix D, the general formulas for ⟨q_d(t)²⟩, ⟨p_d(t)²⟩ and ⟨r_d(t)²⟩ for t > t_m can be obtained in terms of Q_d = S⊺ q_d and P_d = S⊺ p_d, where u(t) = (cos(ω_d t), sin(ω_d t), 0, 0, . . . )⊺. (E8) In Eqs. (E3)-(E10) we observe that the only dependence on the initial state of the field is encoded in σ_0, the covariance matrix of the initial state. Therefore, the only further particularization that we need to perform is to substitute the covariance matrix of the initial field state for each particular case, which can be computed using the values in Tables II and III.
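The statement that the second moments depend on the initial state only through σ_0 is just the covariance update rule for Gaussian unitaries, σ → S σ_0 S⊺, read off component-wise; a quick consistency check (with an arbitrary illustrative σ_0 and a nilpotent-form S of our choosing):

```python
import numpy as np

rng = np.random.default_rng(1)

Omega = np.kron(np.eye(3), np.array([[0.0, 1.0], [-1.0, 0.0]]))
A = rng.normal(size=(6, 6))
sigma0 = A @ A.T + np.eye(6)           # some valid covariance matrix

# A symplectic S of the nilpotent form exp(Omega H) = 1 + Omega H
u = np.array([1.0, 0.5, 0, 0, 0, 0])   # probe sector
v = np.array([0, 0, 0.3, -0.1, 0.2, 0.4])   # field sector
S = np.eye(6) + Omega @ (np.outer(u, v) + np.outer(v, u))

q_d = np.eye(6)[0]
Q_d = S.T @ q_d
# <q_d(t)^2> from the covariance update equals Q_d^T sigma0 Q_d
assert np.isclose(Q_d @ sigma0 @ Q_d, (S @ sigma0 @ S.T)[0, 0])
```

The same contraction with P_d (and with the vector for r_d) gives the remaining second moments.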

Eighth moments
In order to calculate the eighth moments, as we did with the fourth moments, we can take advantage of the symmetries of the tensor T_{8,i,j,k,l,m,n,o,p}. As stated before, T_{8,i,j,k,l,m,n,o,p} = T_{8,τ(i),τ(j),τ(k),τ(l),τ(m),τ(n),τ(o),τ(p)} for any 8-permutation τ. As a consequence, several terms can be grouped together into a sum over distinct index patterns. Now let us look at the terms one by one. The second term's coefficient comes from the possible combinations with six equal indices and two equal indices (these two different from the previous six). Therefore we have to choose six indices from the eight available to be equal to i and then two from the remaining two to set to j: C(8,6) C(2,2) = 28. Then, we have to sum over all the pairs (and the order of the pairs matters). Now, since T_{8,i,i,i,i,i,i,j,j} = T_{6,i,i,i,i,i,i} T_{2,j,j}, we get 28 Σ_{i,j; j≠i} T_{6,i,i,i,i,i,i} T_{2,j,j} Q_i^6(t) Q_j^2(t). The third term's coefficient comes from the possible combinations of two quadruples of equal indices, different from each other. Therefore, we have to choose four indices from the eight available to be equal to i and then four from the remaining four to set to j: C(8,4) C(4,4) = 70. Then, we have to sum over all the ordered pairs, but since in this case order does not matter, we have to divide by two. Now, since T_{8,i,i,i,i,j,j,j,j} = T_{4,i,i,i,i} T_{4,j,j,j,j}, we get 35 Σ_{i,j; j≠i} T_{4,i,i,i,i} T_{4,j,j,j,j} Q_i^4(t) Q_j^4(t). The fourth term's coefficient comes from the possible combinations of a quadruple and two pairs of equal indices, different from each other. Therefore, we have to choose four indices from the eight available to be equal to i, then two from the remaining four to set to j and then the last two to set to k: C(8,4) C(4,2) C(2,2) = 420. Then, we have to sum over all triples, taking into account that we have to divide by two because the order of the two pairs does not matter. To be careful with the expressions we define the decomposition of this term into parts (A), (B) and (C). (E24) The first term (A) involves the combination T_{2,j,j} Q_j^2(t).

(E26)
The third term (C) is analogous. The fifth term's coefficient comes from the possible combinations with four pairs of indices i, j, k, l: C(8,2) C(6,2) C(4,2) C(2,2) = 2520. Taking into account the symmetries (the order of the pairs does not matter) we have to divide by the total number of 4-permutations, which is 24. Then, naming this fifth term B_{ijkl}, we have

B_{ijkl} = 105 Σ_{i,j,k,l; j≠i, k≠i,j, l≠i,j,k} T_{8,i,i,j,j,k,k,l,l} Q_i^2(t) Q_j^2(t) Q_k^2(t) Q_l^2(t). (E29)

Taking into account the result in Eq. (E24) and the division into parts (A), (B) and (C), we obtain, for (A), an expression in terms of T_{2,j,j} Q_j^2(t).
For (B) we then obtain an analogous expression in terms of T_{2,j,j} Q_j^2(t).
For (C) we proceed analogously. Finally, we note that for i ≠ 3, 4 (i.e., for the modes in the vacuum) the combination

T_{8,i,i,i,i,i,i,i,i} − 28 T_{6,i,i,i,i,i,i} T_{2,i,i} − 35 T_{4,i,i,i,i}² + 420 T_{4,i,i,i,i} T_{2,i,i}² − 630 T_{2,i,i}⁴

vanishes, as follows from the moment identities of the Gaussian vacuum.
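The pattern multiplicities above, and the vanishing of the connected eighth-moment combination for Gaussian modes, are easy to verify. The trailing coefficients (+420, −630) in the combination are our completion of the truncated expression, fixed by the Gaussian moment identities m_{2n} = (2n−1)!! v^n; the N = 1 Fock moments used for contrast are the values implied by the conventions of this Appendix:

```python
from math import comb, factorial

# Multiplicities of the index patterns in the eighth-moment expansion
assert comb(8, 6) * comb(2, 2) == 28                      # (6,2) pattern
assert comb(8, 4) * comb(4, 4) // 2 == 35                 # (4,4), unordered pair
assert comb(8, 4) * comb(4, 2) * comb(2, 2) // 2 == 210   # (4,2,2), pairs unordered
assert comb(8, 2) * comb(6, 2) * comb(4, 2) // factorial(4) == 105  # (2,2,2,2)

# Vacuum mode: Gaussian moments with quadrature variance v = 1/2
v = 0.5
m2, m4, m6, m8 = v, 3 * v**2, 15 * v**3, 105 * v**4
combo = m8 - 28 * m6 * m2 - 35 * m4**2 + 420 * m4 * m2**2 - 630 * m2**4
assert abs(combo) < 1e-12              # vanishes for Gaussian statistics

# N = 1 Fock-state moments (assumed values): the combination is nonzero
f2, f4, f6, f8 = 1.5, 3.75, 13.125, 59.0625
combo_fock = f8 - 28 * f6 * f2 - 35 * f4**2 + 420 * f4 * f2**2 - 630 * f2**4
assert abs(combo_fock) > 1.0
```

This is why only the excited (non-Gaussian) modes contribute to the connected part of the eighth moments.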

FIG. 1. (Color online.) A schematic example of a neural network for processing local probe data to learn about a global feature of a QFT.
FIG. 2. We trained a neural network to predict (through classification) the position of the boundary of a quantum field in a cavity from local probe data gathered far from the boundary. The network was asked 1) to detect a signal sent from the boundary (green triangles) and 2) to detect a modification of the field's boundary position (blue circles). The network's accuracy (solid) along with upper and lower bounds on the theoretical optimal accuracy (dashed) are plotted as a function of the duration of the probe's interaction with the field. A point plotted at time t indicates the network's accuracy given measurements taken at Ntimes = 10 measurement times between t and the previous plot point. The network was trained on Ntrain = 11250 examples. Each example summarizes Ntom = 10^22 measurements of each of the probe's quadratures (qd, rd and pd) at each measurement time. The inset shows details of the causal response of the detector to the signal. The vertical red line is at the detector-to-boundary light-crossing time.

FIG. 3. A neural network trained to predict (through regression) the temperature of a quantum field from local probe data. The network was trained on labeled data corresponding to field temperatures from a range T ± 10%. The fraction of the validation data which the network labeled correctly to within ±1% is plotted as a function of the duration of the probe's interaction with the field. A point plotted at time t indicates the network's accuracy given measurements taken at Ntimes = 10 measurement times between t and the previous plot point. The network was trained on Ntrain = 7500 examples from each range. Each example summarizes Ntom = 10^20 measurements of each of the probe's quadratures (qd, rd and pd) at each measurement time. The vertical red line is the probe's Heisenberg time 1/ωd.

FIG. 4. (Color online.) a) The marginal distributions for the N = 4 Fock state (solid) and for the phase-averaged coherent state with |α|² = 4 (dashed). The distributions are not Gaussian and they have the same mean and variance, making them impossible to distinguish with simple statistical analysis of first and second moments. b) The validation accuracy of a neural network trained to distinguish two field states from the measurements of a local detector coupled to the field. In particular, the network differentiates vacuum cavity states with the following two modifications: 1) the lowest field mode is in an N-particle Fock state, or 2) the lowest field mode is in a coherent state with expectation ⟨n1⟩ = N particles and unknown phase.

FIG. 5. (Color online.) a) Validation accuracies attained for different tomographic sizes Ntom, for expected number of particles N = 1 and detector frequency ωd = ω1. b) Evolution of network validation accuracies with the number of training iterations, plotted for different tomographic sizes. As before, the expected number of particles is N = 1, and the frequency of the detector is that of the first mode of the lattice, ωd = ω1.

ACKNOWLEDGMENTS
The authors would like to thank Luis J. Garay for enlightening discussions. DG acknowledges support by NSERC through a Vanier Scholarship. JPG acknowledges support by a Mike and Ophelia Lazaridis Fellowship. JPG also acknowledges the support of a fellowship from "La Caixa" Foundation (ID 100010434, code LCF/BQ/AA20/11820043). EMM acknowledges support through the Discovery Grant Program of the Natural Sciences and Engineering Research Council of Canada (NSERC). EMM also acknowledges support of his Ontario Early Researcher Award. Research at Perimeter Institute is supported in part by the Government of Canada through the Department of Innovation, Science and Industry Canada and by the Province of Ontario through the Ministry of Colleges and Universities. This work was made possible by the facilities of the Shared Hierarchical Academic Research Computing Network (SHARCNET: www.sharcnet.ca) and Compute/Calcul Canada.
Appendix A: Preprocessing, Neural Network Architecture and Training Details

FIG. 6. In subfigure a) we show the probe's coupling strength to the field modes, F(k) (blue Gaussian), as a function of the mode's wavenumber, k. Note that the probe's width is taken to be σ = 1. The field's dispersion relation ℏω_k is also plotted (yellow hyperbola). Note that the field's mass is taken to be m = 1. Taking a UV-cutoff at K = 16 (vertical red dashed line) yields a modified dispersion relation, ℏω'_k (green dashed), at high frequencies. In subfigure b) we plot the average relative error in ℏω_k as a function of the cutoff K and the field mass m. This error decreases polynomially as Kσ increases. The error also decreases as the mass of the field increases. The black dashed line is a mass-independent upper bound on this error. In both subfigures we have taken ℏ = c = 1.

TABLE II. Statistics for the N-particle Fock states.

TABLE III. Statistics for the phase-averaged coherent state with ⟨n⟩ = N particles on average.