An efficient adaptive MCMC algorithm for Pseudo-Bayesian quantum tomography

We revisit the Pseudo-Bayesian approach to the problem of estimating the density matrix in quantum state tomography. Pseudo-Bayesian inference has been shown to offer a powerful paradigm for quantum tomography, with attractive theoretical and empirical results. However, the computation of (Pseudo-)Bayesian estimators, which requires sampling from complex and high-dimensional distributions, poses significant challenges that hamper their use in practical settings. To overcome this problem, we present an efficient adaptive MCMC sampling method for the Pseudo-Bayesian estimator. We show in simulations that our approach is substantially faster than the previous implementation, by at least two orders of magnitude, which is significant for practical quantum tomography.


Introduction
Quantum state tomography is a fundamentally important step in quantum information processing [Nielsen and Chuang, 2000, Paris and Řeháček, 2004]. In general, it aims at finding the underlying density matrix that describes the given state of a physical quantum system. This task is carried out using the results of measurements performed on repeated state preparations [Nielsen and Chuang, 2000].
Bayesian methods have been recognized as a powerful paradigm for quantum state tomography [Blume-Kohout, 2010]: they deal with uncertainty in meaningful and informative ways and are the most accurate approach with respect to the expected error (operational divergence), even with finite samples. Several studies have been conducted: for example, the papers [Bužek et al., 1998, Baier et al., 2007] performed numerical comparisons between Bayesian estimation and other methods on simulated data; algorithms for computing Bayesian estimators have been discussed in [Kravtsov et al., 2013, Ferrie, 2014, Kueng and Ferrie, 2015, Schmied, 2016, Lukens et al., 2020].
The Pseudo-Bayesian method for quantum tomography, introduced in [Mai and Alquier, 2017], proposes a novel approach to this problem with several attractive features. Importantly, a novel prior distribution for the quantum density matrix is introduced, based on a spectral decomposition parameterization (inspired by priors used for low-rank matrix estimation, e.g. [Alquier, 2015, Cottet and Alquier, 2016]). This prior can easily be used in any dimension and is significantly more efficient to sample from and to evaluate than the Cholesky approach of [Struchalin et al., 2016, Życzkowski et al., 2011, Seah et al., 2015]; see [Lukens et al., 2020] for more details. By replacing the likelihood with a loss function between a proposed density matrix and the experimental data, the paper [Mai and Alquier, 2017] presents two different estimators: the prob-estimator and the dens-estimator.
However, the reference [Mai and Alquier, 2017] proposes simply to approximate these two Pseudo-Bayesian estimators by naive Metropolis-Hastings algorithms, which are very slow for high-dimensional systems. Recently, a faster and more efficient sampling method was proposed for the dens-estimator; see [Lukens et al., 2020]. However, we note that the prob-estimator is shown in [Mai and Alquier, 2017] to reach the best rate of convergence known to date [Butucea et al., 2015], while the theoretical guarantee for the dens-estimator is far less satisfactory. Moreover, simulations also show that the prob-estimator yields better results than the dens-estimator.
In this paper, we present a novel, efficient adaptive Metropolis-Hastings implementation for the prob-estimator. This adaptive implementation is based on considering the whole density matrix as a single parameter to be sampled at once. Moreover, an adaptive proposal is explored based on the 'preconditioned Crank-Nicolson' sampling procedure [Cotter et al., 2013], which can eliminate the 'curse of dimensionality'; this is precisely the situation in quantum state tomography, where the dimension increases exponentially. We show in simulations that our implementation is significantly faster than the algorithm in [Mai and Alquier, 2017].
The rest of the paper is organized as follows. In Section 2, we provide the necessary background and the statistical model for the problem of quantum state tomography. In Section 3, we recall the Pseudo-Bayesian approach and the prior distribution. Section 4 presents our novel adaptive MCMC implementation for the Pseudo-Bayesian estimator. Simulation studies are presented in Section 5. Conclusions are given in Section 6.

The quantum state tomography problem
Hereafter, we provide only the background on quantum state tomography required for this paper. We remind the reader that a very nice introduction to this problem, from a statistical perspective, can be found in [Artiles et al., 2005]. Here, we have opted for the notation used in [Mai and Alquier, 2017].
The state of an n-qubit system is described by a density matrix ρ: a d × d Hermitian, positive semi-definite matrix with trace 1, where d = 2^n. In addition, physicists are especially interested in pure states, where a pure state ρ is one with rank(ρ) = 1. In practice, it often makes sense to assume that the rank of ρ is small [Gross et al., 2010, Gross, 2011, Butucea et al., 2015].
The goal of quantum tomography is to estimate the underlying density matrix ρ using measurement outcomes from many independent and identically prepared systems in the state ρ, measured by the same experimental devices.
For a qubit, it is standard procedure to measure one of the three Pauli observables σ_x, σ_y, σ_z. The outcome for each is 1 or −1, at random (the corresponding probabilities are given in (1) below). As a consequence, for an n-qubit system there are 3^n possible experimental observables. The set of all possible observables is {σ^a = σ_{a_1} ⊗ ··· ⊗ σ_{a_n} : a = (a_1, ..., a_n) ∈ E^n := {x, y, z}^n}, where the vector a identifies the experiment. The outcome for each fixed observable setting is a random vector s = (s_1, ..., s_n) ∈ {−1, 1}^n; thus there are 2^n possible outcomes in total.
Denote by R^a a random vector that is the outcome of an experiment indexed by a. By Born's rule [Nielsen and Chuang, 2000], its probability distribution is given by

∀ s ∈ {−1, 1}^n:  p_{a,s} := P(R^a = s) = Tr(ρ P^a_s),    (1)

where P^a_s := P^{a_1}_{s_1} ⊗ ··· ⊗ P^{a_n}_{s_n} and P^{a_i}_{s_i} is the orthogonal projector associated with the eigenvalue s_i ∈ {±1} in the diagonalization of σ_{a_i}, a_i ∈ {x, y, z}; that is, σ_{a_i} = P^{a_i}_{+1} − P^{a_i}_{−1}. Statistically, for each setting a ∈ E^n, the experimenter repeats the experiment corresponding to a m times and thus collects m independent copies of R^a, say R^a_1, ..., R^a_m. As there are 3^n possible settings a, we define the quantum sample size as N := m · 3^n. We refer to (R^a_i)_{i∈{1,...,m}, a∈E^n} as D (for data). Quantum state tomography thus aims at estimating the density matrix ρ from the data D.
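To make the sampling model concrete, the following R sketch (our own illustration, not the code released with the paper; simulate_setting and proj are hypothetical names) draws m outcomes for a single setting a by computing the probabilities p_{a,s} = Tr(ρ P^a_s) explicitly:

    # Pauli matrices and their eigenprojectors P^{w}_{+1}, P^{w}_{-1}
    sx <- matrix(c(0, 1, 1, 0), 2, 2)
    sy <- matrix(c(0, 1i, -1i, 0), 2, 2)
    sz <- matrix(c(1, 0, 0, -1), 2, 2)
    paulis <- list(x = sx, y = sy, z = sz)
    proj <- lapply(paulis, function(s) {
      e <- eigen(s)                     # eigenvalues returned in decreasing order
      list("1"  = e$vectors[, 1] %*% Conj(t(e$vectors[, 1])),   # eigenvalue +1
           "-1" = e$vectors[, 2] %*% Conj(t(e$vectors[, 2])))   # eigenvalue -1
    })

    simulate_setting <- function(rho, a, m) {
      n <- length(a)
      outcomes <- expand.grid(rep(list(c(1, -1)), n))   # all s in {-1, 1}^n
      probs <- apply(outcomes, 1, function(s) {
        P <- Reduce(`%x%`, lapply(seq_len(n),
                    function(i) proj[[a[i]]][[as.character(s[i])]]))
        Re(sum(diag(rho %*% P)))                        # p_{a,s} = Tr(rho P^a_s)
      })
      outcomes[sample(nrow(outcomes), m, replace = TRUE, prob = probs), , drop = FALSE]
    }

    # Example: m = 100 shots of setting a = (x, z) on the 2-qubit maximally mixed state
    D_a <- simulate_setting(diag(4) / 4, c("x", "z"), 100)

Enumerating all 2^n outcomes is for illustration only; it is exactly this exponential growth in n that makes the computational problem hard.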

Popular estimation methods
Here, we briefly recall three major classical approaches that have been adopted to estimate ρ: linear inversion, maximum likelihood, and Bayesian inference.

Linear inversion
The first and simplest method considered in quantum information processing is the 'tomographic' method, also known as linear (or direct) inversion [Vogel and Risken, 1989, Řeháček et al., 2010]. It is the analogue of the least-squares estimator in the quantum setting. This method relies on the fact that measurement outcome probabilities are linear functions of the density matrix.
More specifically, let us consider the empirical frequencies

p̂_{a,s} := (1/m) Σ_{i=1}^m 1(R^a_i = s).

Note that p̂_{a,s} is an unbiased estimator of the underlying probability p_{a,s} in (1). The inversion method is therefore based on solving the linear system of equations

p̂_{a,s} = Tr(ρ P^a_s),  a ∈ E^n, s ∈ {−1, 1}^n.    (2)

The computation of ρ̂ is quite standard and explicit formulas are classical; they can be found, e.g., in [Alquier et al., 2013]. While straightforward and providing an unbiased estimate [Schwemmer et al., 2015], it tends to produce a non-physical density matrix as output [Shang et al., 2014]: positive semi-definiteness cannot easily be satisfied or enforced.
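As a concrete special case, the following sketch (ours; linear_inversion_1qubit is a hypothetical helper reusing the Pauli matrices sx, sy, sz defined above) implements (2) for n = 1, where inversion reduces to the classical formula ρ = (I + ⟨σ_x⟩σ_x + ⟨σ_y⟩σ_y + ⟨σ_z⟩σ_z)/2 with each expectation replaced by an empirical mean:

    linear_inversion_1qubit <- function(out_x, out_y, out_z) {
      # out_w: vector of m outcomes in {-1, +1} for the Pauli observable sigma_w
      (diag(2) + mean(out_x) * sx + mean(out_y) * sy + mean(out_z) * sz) / 2
    }

    # Toy data from the pure state rho = |0><0|: x and y outcomes are fair coins,
    # z outcomes are always +1
    m <- 1000
    rho_hat <- linear_inversion_1qubit(sample(c(-1, 1), m, replace = TRUE),
                                       sample(c(-1, 1), m, replace = TRUE),
                                       rep(1, m))
    Re(eigen(rho_hat)$values)   # typically one slightly negative eigenvalue

Because the empirical Bloch vector can have norm larger than 1, ρ̂ can fail positive semi-definiteness, which is exactly the non-physicality issue mentioned above.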

Maximum likelihood
A popular approach to QST in recent years is maximum likelihood estimation (MLE). MLE aims at finding the density matrix most likely to have produced the observed data D:

ρ̂^{MLE} = argmax_ρ L(ρ; D),

where L(ρ; D) is the likelihood, i.e., the probability of observing the outcomes given the state ρ, as defined by some model [Hradil et al., 2004, James et al., 2001, Gonçalves et al., 2018]. However, MLE has some critical problems, detailed in [Blume-Kohout, 2010], including a huge computational cost. Moreover, it is a point estimate that does not account for the level of uncertainty in the result. Furthermore, these two methods (linear inversion and MLE) cannot take advantage of prior knowledge when a system is in a state ρ about which additional information is available. In particular, physicists usually focus on so-called pure states, for which rank(ρ) = 1.
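For completeness, here is a toy single-qubit MLE sketch (ours, with hypothetical names, not from the cited references) that maximizes the multinomial likelihood over the Bloch parameterization ρ(r) = (I + r_x σ_x + r_y σ_y + r_z σ_z)/2, with the constraint ||r|| ≤ 1 enforcing physicality:

    negloglik <- function(r, counts) {
      if (sqrt(sum(r^2)) > 1) return(Inf)      # reject non-physical states
      # for axis w, P(outcome = s) = (1 + s * r_w) / 2 with s in {-1, +1}
      -sum(counts$plus  * log((1 + r) / 2) +
           counts$minus * log((1 - r) / 2))
    }
    counts <- list(plus = c(520, 480, 910), minus = c(480, 520, 90))  # toy data per axis
    fit <- optim(c(0, 0, 0), negloglik, counts = counts)  # Nelder-Mead by default
    fit$par                                               # estimated Bloch vector

Even in this tiny example the optimization is iterative; in high dimensions this cost, together with the constraint handling, is part of the computational burden noted above.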

Bayesian inference
Having received increasing attention in recent years, Bayesian QST has been shown to be a promising method for this problem [Blume-Kohout, 2010, Bužek et al., 1998, Baier et al., 2007, Lukens et al., 2020]. Through Bayes' theorem, experimental uncertainty is explicitly accounted for in Bayesian estimation. More specifically, suppose a density matrix is parameterized as ρ(x) for some x; Bayesian inference is then carried out via the posterior distribution

π(x|D) ∝ L(ρ(x); D) π(x),

where L(ρ(x); D) is the likelihood (as in MLE) and π(x) is the prior distribution. Using the posterior distribution, the expected value of any function of ρ can be inferred, e.g. the Bayesian mean estimator

ρ̂^{BME} = ∫ ρ(x) π(x|D) dx.
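In practice the integral is approximated by averaging posterior samples; a minimal sketch (ours, with rho_list a hypothetical list of posterior draws of ρ):

    # Monte Carlo form of the Bayesian mean estimator: elementwise average of draws
    bayes_mean <- function(rho_list) Reduce(`+`, rho_list) / length(rho_list)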
Although recognized as a powerful approach, the numerical challenge of sampling from a high-dimensional probability distribution has prevented the widespread use of Bayesian methods for this physical problem.

Other approaches
Several other methods have also recently been introduced and studied. The reference [Cai et al., 2016] proposed a method based on the expansion of the density matrix ρ in the Pauli basis. Rank-penalized approaches were studied in [Guţȃ et al., 2012, Alquier et al., 2013]. A thresholding method was introduced in [Butucea et al., 2015].

Pseudo-Bayesian estimation
Let us consider the pseudo-posterior, studied in [Mai and Alquier, 2017], defined by

π̃_λ(dν|D) ∝ exp[−λ ℓ(ν, D)] π(dν),

where exp[−λ ℓ(ν, D)] is the pseudo-likelihood, which plays the role of the empirical evidence by giving more weight to densities ν that fit the data well; π(dν) is the prior given in Section 3.2; and λ > 0 is a tuning parameter that balances the evidence from the data against the prior information.
Taking

ℓ(ν, D) := ℓ_prob(ν, D) = Σ_{a∈E^n} Σ_{s∈{−1,1}^n} [Tr(ν P^a_s) − p̂_{a,s}]²,

the "prob-estimator" of [Mai and Alquier, 2017] is defined as the mean of the pseudo-posterior:

ρ̂^{prob}_λ := ∫ ν π̃_λ(dν|D).    (3)

This estimator is also referred to, in statistical machine learning, as a Gibbs estimator, a PAC-Bayesian estimator, or an EWA (exponentially weighted aggregate) [Catoni, 2007, Dalalyan and Tsybakov, 2008]. For the sake of simplicity, we use the shortened notation p_ν := [Tr(ν P^a_s)]_{a,s} and p̂ := [p̂_{a,s}]_{a,s}, so that ℓ_prob(ν, D) = ||p_ν − p̂||²_F, where || · ||_F is the Frobenius norm. Clearly, this distance measures the difference between the probabilities under ν and the empirical frequencies in the sample. We recall that the matrix [p̂_{a,s}]_{a,s} has dimension 3^n × 2^n.
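Once p_ν and p̂ are stored as 3^n × 2^n matrices (one row per setting a, one column per outcome s), the loss and the pseudo-likelihood weight are one-liners; a minimal R sketch with our own (hypothetical) naming:

    # ell_prob as a squared Frobenius norm
    ell_prob <- function(p_nu, p_hat) sum((p_nu - p_hat)^2)

    # unnormalized log pseudo-posterior weight of a state nu,
    # with lambda = m/2 as used later in the paper
    log_pseudo_lik <- function(p_nu, p_hat, lambda) -lambda * ell_prob(p_nu, p_hat)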

Prior distribution for the quantum density matrix
The prior distribution employed in [Mai and Alquier, 2017] is as follows: the d × d density matrix ρ is parameterized by d non-negative real numbers y_i and d complex column vectors z_i of length d. Put x = {y_1, ..., y_d, z_1, ..., z_d}; then the density matrix is

ρ(x) = Σ_{i=1}^d (y_i / Σ_{j=1}^d y_j) · z_i z_i^† / ||z_i||²,    (4)

with the prior distribution for x given by treating the weights as Gamma random variables and the vectors as standard complex Gaussian vectors,

Y_i i.i.d. ∼ Γ(α, 1),   Z_i i.i.d. ∼ CN(0, I_d).    (5)

The tuning parameter α in (5) allows the user to favor low-rank or high-rank density matrices, corresponding to pure or mixed states, respectively. In particular, the normalized weights Y_i / Σ_{j=1}^d Y_j, with Y_i i.i.d. ∼ Γ(α, 1), follow a Dirichlet distribution Dir(α), which ensures both normalization and non-negativity. An α < 1 promotes sparse draws and thus purer states, while α = 1 yields a fully uniform prior over all physically realizable states.
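A draw from this prior is cheap to generate. The sketch below is our own transcription of the parameterization above (rho_from is a hypothetical helper name):

    rho_from <- function(y, Z) {
      d <- length(y)
      w <- y / sum(y)                          # normalized weights ~ Dir(alpha)
      rho <- matrix(0 + 0i, d, d)
      for (i in 1:d) {
        v <- Z[, i] / sqrt(sum(Mod(Z[, i])^2)) # unit vector, uniform on the sphere
        rho <- rho + w[i] * v %*% Conj(t(v))   # rank-1 component w_i v_i v_i^dagger
      }
      rho                                      # Hermitian, PSD, trace 1 by construction
    }

    # one prior draw: Y_i ~ Gamma(alpha, 1), Z_i ~ CN(0, I_d)
    d <- 4; alpha <- 1
    y <- rgamma(d, shape = alpha, rate = 1)
    Z <- matrix(complex(real = rnorm(d * d), imaginary = rnorm(d * d)), d, d)
    rho0 <- rho_from(y, Z)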
Remark 2. Note that this parameterization satisfies all physicality conditions for a density matrix; the details can be found in [Mai and Alquier, 2017]. Moreover, this parameterization has been shown to be significantly more efficient to sample from and to evaluate than the Cholesky approach of [Struchalin et al., 2016, Życzkowski et al., 2011, Seah et al., 2015]; see [Lukens et al., 2020] for details.
Remark 3. The theoretical guarantees for the prob-estimator in (3) are established only for 0 < α ≤ 1. More specifically, the prob-estimator satisfies (up to a multiplicative logarithmic factor)

||ρ̂^{prob}_{λ*} − ρ_0||²_F ≤ c 3^n rank(ρ_0) / N,

which is the best rate known to date for the problem of quantum state estimation [Butucea et al., 2015], where c is a numerical constant and λ* = m/2.

A novel efficient adaptive MCMC implementation
As defined in (3), the prob-estimator requires the evaluation of an integral, which is numerically challenging due to its sophisticated features and high dimensionality. A first attempt, made in [Mai and Alquier, 2017], is to use a naive Metropolis-Hastings (MH) algorithm in which the authors iterate between a random-walk MH step for the Y_i and an independent MH step for the z_i. The approach obtains T samples x^(1), ..., x^(T), so that the integral in (3) can be approximated as

ρ̂^{prob}_λ ≈ (1/T) Σ_{t=1}^T ρ(x^(t)).

However, as also noted in [Mai and Alquier, 2017], this algorithm can suffer from slow convergence and can become arbitrarily slow as the system dimensionality increases. For the sake of self-containedness, the implementation of [Mai and Alquier, 2017] is given in the Appendix. Motivated by the recent work [Lukens et al., 2020], which proposes an efficient 'preconditioned Crank-Nicolson' [Cotter et al., 2013] sampling procedure for Bayesian quantum state estimation (improving the computation of the dens-estimator of [Mai and Alquier, 2017] only), we introduce an efficient adaptive Metropolis-Hastings implementation for the prob-estimator. We recall that the prob-estimator outperforms the dens-estimator both in theory and in simulations.
Specifically, we propose to use a modification of random-walk MH in which the previous step is scaled before adding a random move to generate the proposal z′. Following [Cotter et al., 2013], who introduced an efficient MCMC approach that eliminates the 'curse of dimensionality', termed 'preconditioned Crank-Nicolson' (pCN), we use the proposal for z_j

z′_j = sqrt(1 − β_z²) z_j^(k) + β_z ξ_j,   ξ_j ∼ CN(0, I_d),

where β_z ∈ (0, 1) is a tuning parameter. The proposal is a random walk whose previous state is scaled by the factor sqrt(1 − β_z²), which results in a slightly simpler acceptance probability. Unlike the independent proposal of [Mai and Alquier, 2017] (recovered with β_z = 1), for which the acceptance probability can vary substantially, this kind of adaptive proposal allows one to control the acceptance rate efficiently.
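In code, the proposal is one line; the sketch below (ours) makes the two regimes explicit:

    pcn_propose <- function(z, beta) {
      # fresh CN(0, I_d) draw
      xi <- complex(real = rnorm(length(z)), imaginary = rnorm(length(z)))
      sqrt(1 - beta^2) * z + beta * xi
    }
    # beta -> 1 recovers the independent proposal of [Mai and Alquier, 2017];
    # small beta gives small, highly correlated moves and a higher acceptance rate.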
The acceptance probability min{1, A(x′|x^(k))} follows the standard form for MH [Robert and Casella, 2013]. Letting p(x′|x^(k)) denote the proposal density, we have

A(x′|x^(k)) = [π̃_λ(x′|D) p(x^(k)|x′)] / [π̃_λ(x^(k)|D) p(x′|x^(k))],

where π̃_λ(·|D) denotes the pseudo-posterior density of x.
The details of the adaptive MH are given in Algorithm 1.
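For the full procedure we refer to Algorithm 1; the condensed R sketch below reflects our reading of this section. It updates the whole block Z = (z_1, ..., z_d) at once with a pCN move; since the pCN proposal is reversible with respect to the CN(0, I_d) prior, the prior and proposal densities cancel in A(x′|x^(k)) and only the tempered losses remain. The helpers ell_prob and rho_from are from the sketches above, p_of (mapping a state to its 3^n × 2^n probability matrix) is hypothetical, and for brevity the weights y are held fixed; Algorithm 1 updates them as well, with their own tuning parameter β_y.

    mh_step_z <- function(Z, y, beta_z, lambda, p_hat) {
      d  <- nrow(Z)
      Xi <- matrix(complex(real = rnorm(d * d), imaginary = rnorm(d * d)), d, d)
      Zp <- sqrt(1 - beta_z^2) * Z + beta_z * Xi     # pCN move on the whole block
      # prior/proposal terms cancel for pCN, so only the tempered losses remain:
      log_A <- lambda * (ell_prob(p_of(rho_from(y, Z)),  p_hat) -
                         ell_prob(p_of(rho_from(y, Zp)), p_hat))
      if (log(runif(1)) < log_A) Z <- Zp             # accept w.p. min(1, exp(log_A))
      Z
    }

In this form the ratio no longer involves the high-dimensional Gaussian densities, which is what keeps the acceptance rate stable as the dimension grows.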
Simulation studies

We consider the following settings for the true density matrix:
• Setting 2: a maximally mixed state (rank d), that is, ρ = (1/d) Σ_{i=1}^d ψ_i ψ_i^†, where the ψ_i are normalized vectors simulated independently from CN(0, I_d).
The experiments are conducted following Section 2 with m = 1000. The prob-estimator is employed with λ = m/2 and a prior with α = 1. We compare our adaptive MH implementation, denoted "a-MH", against the random-walk implementation of [Mai and Alquier, 2017], denoted "r-MH". We run 100 independent samplers for each experiment and compute the mean squared error (MSE) of each method, together with its standard deviation. We also measure the mean absolute error of the eigenvalues (MAEE),

MAEE(ρ̂, ρ) = (1/d) Σ_{i=1}^d |λ̂_i − λ_i|,

where λ̂_1 ≥ ··· ≥ λ̂_d and λ_1 ≥ ··· ≥ λ_d are the ordered eigenvalues of ρ̂ and ρ. The experiments are run for n = 2, 4, 6, 7 qubits (d = 4, 16, 64, 128).
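Both criteria take a few lines in R (our transcription; rho_hat and rho0 denote the estimate and the truth, and we read the squared error as the squared Frobenius distance appearing in Remark 3):

    mse  <- function(rho_hat, rho0) sum(Mod(rho_hat - rho0)^2)  # squared Frobenius error
    maee <- function(rho_hat, rho0) {
      # compare the ordered (real) spectra of the estimate and the truth
      mean(abs(sort(Re(eigen(rho_hat)$values), decreasing = TRUE) -
               sort(Re(eigen(rho0)$values),    decreasing = TRUE)))
    }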
From Figure 1, it is clear that our adaptive MH implementation is faster than the previous implementation of [Mai and Alquier, 2017] by at least two orders of magnitude as the number of qubits increases. More specifically, the data are simulated as in Setting 1 for n = 2, 4, 6, 7, for which the dimensions of the density matrix are d = 4, 16, 64, 128 and the empirical frequency matrices [p̂_{a,s}] are of dimensions 9 × 4, 81 × 16, 729 × 64 and 2187 × 128. We note that this improvement is quite significant for practical quantum tomography, where computational time is a precious resource.

Tuning parameters via acceptance rate
The tuning parameters β_y, β_z are chosen such that the acceptance rate of Algorithm 1 is approximately 0.3, following the optimal acceptance probability for random-walk Metropolis-Hastings derived under various assumptions [Gelman et al., 1997]. For example, in our experiments, for n = 2 qubits: β_y = 0.33, β_z = 0.2; for n = 3 qubits: β_y = 0.03, β_z = 0.03; and for n = 4 qubits: β_y = 0.03, β_z = 0.02 (all runs with α = 1, λ = m/2). We note that as the number of qubits n increases, these tuning parameters must be taken smaller and smaller to ensure that the 0.3 acceptance rate is maintained.
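A simple burn-in heuristic, which is our suggestion rather than part of Algorithm 1, automates this choice: increase β when the running acceptance rate exceeds the target and decrease it otherwise.

    # hypothetical burn-in heuristic: nudge beta toward a target acceptance rate
    tune_beta <- function(beta, acc_rate, target = 0.3, step = 0.1) {
      # accepting too often -> take bigger steps; too rarely -> smaller steps
      min(max(beta * exp(step * (acc_rate - target)), 1e-4), 0.999)
    }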
As an illustration, we conduct simulations with n = 4 qubits in Setting 2. It can be seen from Figure 2 that an acceptance rate around 0.3 is close to optimal, whereas a high acceptance rate such as 0.7 can cause the algorithm to become trapped at local modes, and a very small acceptance rate such as 0.1 makes the algorithm converge more slowly.

Discussion and conclusion
We have introduced an efficient sampling algorithm for Pseudo-Bayesian quantum tomography, specifically for the prob-estimator. Our approach is an adaptive Metropolis-Hastings implementation that shows a clear improvement in convergence and computational time compared with a naive MH implementation. We would like to emphasize that such an improvement is of significant importance for practical quantum state tomography.
Last but not least, faster algorithms based on optimization, such as variational inference, for Bayesian quantum tomography would be an interesting research direction. However, it should be noted that uncertainty quantification under variational inference is not yet well understood, while this matter is an important aspect of the problem of quantum state estimation.

Availability of data and code
The R codes and data used in the numerical experiments are available at: https://github.com/tienmt/bqst .

Appendix A. Naive Metropolis-Hastings
Algorithm 2 (MH implementation from [Mai and Alquier, 2017]). For t from 1 to T, we iteratively update through the following steps: Updating the Y_i's: for i from 1 to d, sample Ỹ_i ∼ h(·|Y_i^(t−1)) and accept or reject it via the MH ratio, where h is a proposal distribution given explicitly below.
Updating the V_i's: for i from 1 to d, sample Ṽ_i from the uniform distribution on the unit sphere, and set V_i^(t) = Ṽ_i with probability min{1, A(V^(t−1), Ṽ)} and V_i^(t) = V_i^(t−1) otherwise, where A(V^(t−1), Ṽ) is the acceptance ratio given below.