Attaining the Ultimate Precision Limit in Quantum State Estimation

We derive a bound on the precision of state estimation for finite dimensional quantum systems and prove its attainability in the generic case where the spectrum is non-degenerate. Our results hold under an assumption called local asymptotic covariance, which is weaker than unbiasedness or local unbiasedness. The derivation is based on an analysis of the limiting distribution of the estimator’s deviation from the true value of the parameter, and takes advantage of quantum local asymptotic normality, a useful asymptotic characterization of identically prepared states in terms of Gaussian states. We first prove our results for the mean square error of a special class of models, called D-invariant, and then extend the results to arbitrary models, generic cost functions, and global state estimation, where the unknown parameter is not restricted to a local neighbourhood of the true value. The extension includes a treatment of nuisance parameters, i.e. parameters that are not of interest to the experimenter but nevertheless affect the precision of the estimation. As an illustration of the general approach, we provide the optimal estimation strategies for the joint measurement of two qubit observables, for the estimation of qubit states in the presence of amplitude damping noise, and for noisy multiphase estimation.


Introduction
Quantum estimation theory is one of the pillars of quantum information science, with a wide range of applications, from evaluating the performance of quantum devices [1,2] to exploring the foundations of physics [3,4]. In the typical scenario, the problem is specified by a parametric family of quantum states, called the model, and the objective is to design measurement strategies that estimate the parameters of interest with the highest possible precision. The precision measure is often chosen to be the mean square error (MSE), and is lower bounded through generalizations of the Cramér-Rao bound of classical statistics [5,6]. Given n copies of a quantum state, such generalizations imply that the product MSE · n converges to a positive constant in the large n limit. Despite many efforts made over the years (see, e.g., [5][6][7][8][9][10][11][12] and [13] for a review), the attainability of the precision bounds of quantum state estimation has only been proven in a few special cases. Consider, as an example, the most widely used bound, namely the symmetric logarithmic derivative Fisher information bound (SLD bound, for short). The SLD bound is tight in the one-parameter case [5,6], but is generally non-tight in multiparameter estimation. Intuitively, measuring one parameter may affect the precision in the measurement of another parameter, and thus it is highly nontrivial to construct the optimal measurement. Another bound for multiparameter estimation is the right logarithmic derivative Fisher information bound (RLD bound, for short) [5]. Its achievability was shown for Gaussian states [5], for qubits [14,15], and for qudits [16,17]. In this sense, the RLD bound is superior to the SLD bound. However, the RLD bound holds only when the family of states to be estimated satisfies an ad hoc mathematical condition.
The most general quantum extension of the classical Cramér-Rao bound to date is the Holevo bound [5], which gives the maximum among all existing lower bounds on the error of unbiased measurements for the estimation of any family of states. The attainability of the Holevo bound was studied in the pure-state case [10] and the qubit case [14,15], and was conjectured to be generic by one of us [18]. Yamagata et al. [19] addressed the attainability question in a local scenario, showing that the Holevo bound can be attained under certain regularity conditions. However, the attaining estimator constructed therein depends on the true parameter, and therefore has limited practical interest. Meanwhile, the need for a general, attainable bound on multiparameter quantum estimation is increasing, as more and more applications are being investigated [20][21][22][23][24].
In this work we explore a new route to the study of precision limits in quantum estimation. This new route allows us to prove the asymptotic attainability of the Holevo bound in generic scenarios, to extend its validity to a broader class of estimators, and to derive a new set of attainable precision bounds. We adopt the condition of local asymptotic covariance [18] which is less restrictive than the unbiasedness condition [5] assumed in the derivation of the Holevo bound. Under local asymptotic covariance, we characterize the MSE of the limiting distribution, namely the distribution of the estimator's rescaled deviation from the true value of the parameter in the asymptotic limit of n → ∞.
Our contribution can be divided into two parts: the attainability of the Holevo bound, and the proof that the Holevo bound still holds under the weaker condition of local asymptotic covariance. To show the achievability part, we employ quantum local asymptotic normality (Q-LAN), a useful characterization of n-copy d-dimensional (qudit) states in terms of multimode Gaussian states. Q-LAN was derived for qubits in [14,15], and for full parametric models of qudit states with nondegenerate spectrum by Kahn and Guta [16,17]. Here we extend this characterization to a larger class of models, called D-invariant models, using a technique of symplectic diagonalization. For models that are not D-invariant, we derive an achievable bound, expressed in terms of a quantum Fisher information-like quantity that can be straightforwardly evaluated. Whenever the model consists of qudit states with non-degenerate spectrum, this quantity turns out to be equal to the quantity in the Holevo bound [5]. Our evaluation of the convergence is uniform on compact sets and comes with an estimate of its order, which will allow us to prove the achievability of the bound even in the global setting.
We stress that, until now, the most general proof of the Holevo bound required the condition of local unbiasedness. In particular, no previous study showed the validity of the Holevo bound under the weaker condition of local asymptotic covariance in the multiparameter scenario. To avoid employing the (local) unbiasedness condition, we focus on the discretized version of the RLD Fisher information matrix, introduced by Tsuda and Matsumoto [25]. Using this version of the RLD Fisher information matrix, we manage to handle the local asymptotic covariance condition and to show the validity of the Holevo bound in this broader scenario. Remarkably, the validity of the bound does not require finite-dimensionality of the system or non-degeneracy of the states in the model. Our result also provides a simpler way of evaluating the Holevo bound, whose original expression involved a difficult optimization over a set of operators.
The advantage of local asymptotic covariance over local unbiasedness is the following. For practical applications, the estimator needs to attain the lower bound globally, i.e., at all points in the parameter set. However, it is quite difficult to meet this desideratum under the condition of local unbiasedness, even if we employ a two-step method based on a first rough estimate of the state, followed by the measurement that is optimal in the neighbourhood of the estimate. In this paper, we construct a locally asymptotically covariant estimator that achieves the Holevo bound at every point, for any qudit submodel except those with degenerate states. Our construction proceeds in two steps. In the first step, we perform a full tomography of the state, using the protocol proposed in [26]. In the second step, we implement a locally optimal estimator based on Q-LAN [16,17]. The two-step estimator works even when the estimated parameter is not assumed to be in a local neighbourhood of the true value. The key tool to prove this property is our precise evaluation of the optimal local estimator, with convergence that is uniform on compact sets and of controlled order. Our method can be extended from the MSE to arbitrary cost functions. A comparison between the approach adopted in this work (in green) and conventional approaches to quantum state estimation (in blue) can be found in Fig. 1.
Besides the attainability of the Holevo bound, the method can be used to derive a broad class of bounds for quantum state estimation. Under suitable assumptions, we characterize the tail of the limiting distribution, providing a bound on the probability that the estimate falls out of a confidence region. The limiting distribution is a good approximation of the (actual) probability distribution of the estimator, up to a term vanishing in n. Then, we derive a bound for quantum estimation with nuisance parameters, i.e. parameters that are not of interest to the experimenter but may affect the estimation of the other parameters. For instance, the strength of noise in a phase estimation scenario can be regarded as a nuisance parameter. Our bound applies also to arbitrary estimation models, thus extending nuisance parameter bounds derived for specific cases (see, e.g., [27][28][29]). In the final part of the paper, the above bounds are illustrated in concrete examples, including the joint measurement of two qubit observables, the estimation of qubit states in the presence of amplitude damping noise, and noisy multiphase estimation.
The remainder of the paper is structured as follows. In Sect. 2 we introduce the main ideas in the one-parameter case. Our discussion of the one-parameter case requires no regularity condition for the parametric model. Then we devote several sections to introducing and deriving tools for multiparameter estimation. In Sect. 3, we briefly review the Holevo bound and Gaussian states, and derive some relations that will be useful in the rest of the paper. In Sect. 4, we introduce Q-LAN. In Sect. 5, we introduce the ε-difference RLD Fisher information matrix, which will be a key tool for deriving our bounds in the multiparameter case. In Sect. 6, we derive the general bound on the precision of multiparameter estimation. In Sect. 7, we address state estimation in the presence of nuisance parameters and derive a precision bound for this scenario. Section 8 provides bounds on the tail probability. In Sect. 9, we extend our results to global estimation and to generic cost functions. In Sect. 10, the general method is illustrated through examples.

Fig. 1. Comparison between the approach of this work (in green) and the traditional approach of quantum state estimation (in blue). In the traditional approach, one derives precision bounds based on the probability distribution function (PDF) for measurements on the original set of quantum states. The bounds are evaluated in the large n limit, and the task is to find a sequence of measurements that achieves the limiting bound. In this work, we first characterize the limiting distribution and then work out a bound in terms of the limiting distribution. This construction also provides the optimal measurement in the limiting scenario, which can be used to prove the asymptotic attainability of the bound. The analysis of the limiting distribution also provides tail bounds, which approximate the tail bounds for finite n up to a small correction, under the assumption that the cost function and the model satisfy a certain relation (see Theorem 9).
The conclusions are drawn in Sect. 11.

Remark on the notation. In this paper, we use z * for the complex conjugate of z ∈ C and A † for the Hermitian conjugate of an operator A. For the convenience of the reader, we list other frequently appearing notations and their definitions in Table 1.

Precision Bound Under Local Asymptotic Covariance: One-Parameter Case
In this section, we discuss estimation of a single parameter under the local asymptotic covariance condition, without any assumption on the parametric model.

Cramér-Rao inequality without regularity assumptions.
Consider a one-parameter model

$$\mathcal M = \{\rho_t\}_{t\in\Theta}, \qquad (1)$$

where Θ is a subset of R. In the literature it is typically assumed that the parametrization is differentiable. When this is the case, one can define the symmetric logarithmic derivative operator (SLD, in short) at t_0 via the equation

$$\frac{d\rho_t}{dt}\Big|_{t=t_0} = \frac{1}{2}\left( L_{t_0}\rho_{t_0} + \rho_{t_0} L_{t_0} \right). \qquad (2)$$

Then, the SLD Fisher information is defined as

$$J_{t_0} := \operatorname{Tr}\left[\rho_{t_0} L_{t_0}^2\right]. \qquad (3)$$

The SLD L_{t_0} is not unique in general, but the SLD Fisher information J_{t_0} is uniquely defined, because it does not depend on the choice of the SLD L_{t_0} among the operators satisfying (2). When the parametrization is C^1-continuous and ε > 0 is a small number, one has

$$F(\rho_{t_0}\,\|\,\rho_{t_0+\epsilon}) = 1 - \frac{J_{t_0}}{8}\,\epsilon^2 + o(\epsilon^2), \qquad (4)$$

where $F(\rho\,\|\,\rho') := \operatorname{Tr}\left|\sqrt{\rho}\sqrt{\rho'}\right|$ is the fidelity between two density matrices ρ and ρ'. It is called the Bhattacharyya or Hellinger coefficient in the classical case [30,31].
Here we do not assume that the parametrization (1) is differentiable. Hence, the SLD Fisher information cannot be defined by (3). Instead, following the intuition of (4), we define the SLD Fisher information J_{t_0} as the limit

$$J_{t_0} := \liminf_{\epsilon\to 0}\ \frac{8\left(1 - F(\rho_{t_0}\,\|\,\rho_{t_0+\epsilon})\right)}{\epsilon^2}. \qquad (6)$$

In the n-copy case, we have the following lemma.

Lemma 1. The SLD Fisher information of the n-copy model $\{\rho_{t_0+t/\sqrt n}^{\otimes n}\}_t$ at t = 0 equals J_{t_0}.

Proof. Using the definition (6) together with the multiplicativity of the fidelity, $F(\rho^{\otimes n}\,\|\,\sigma^{\otimes n}) = F(\rho\,\|\,\sigma)^n$, the claim follows by taking the lim inf. □

In other words, the SLD Fisher information is constant over n if we replace ε by ε/√n. To estimate the parameter t ∈ Θ, we perform on the input state a quantum measurement, which is mathematically described by a positive operator valued measure (POVM) M with outcomes in X ⊂ R. An outcome x is then mapped to an estimate of t by an estimator t̂(x). It is often assumed that the measurement is unbiased, in the following sense: a POVM M on a single input copy is called unbiased when

$$\int_{\mathcal X} \hat t\ \operatorname{Tr}\left[\rho_t M(d\hat t)\right] = t \qquad \text{for every } t\in\Theta. \qquad (9)$$

For a POVM M, we define the mean square error (MSE) V_t(M) as

$$V_t(M) := \int_{\mathcal X} \left(\hat t - t\right)^2 \operatorname{Tr}\left[\rho_t M(d\hat t)\right].$$

Then, we have the fidelity version of the Cramér-Rao inequality:

Theorem 1. For an unbiased measurement M satisfying (9) for any t, we have

$$V_{t_0}(M) + V_{t_0+\epsilon}(M) + \epsilon^2 \ \ge\ \frac{\epsilon^2}{4\left(1 - F(\rho_{t_0}\,\|\,\rho_{t_0+\epsilon})\right)}. \qquad (12)$$

When $\lim_{\epsilon\to 0} V_{t_0+\epsilon}(M) = V_{t_0}(M)$, taking the limit ε → 0, we have

$$V_{t_0}(M) \ \ge\ J_{t_0}^{-1}. \qquad (13)$$

The proof uses the notion of fidelity between two classical probability distributions: for two given distributions P and Q on a probability space X, we define the fidelity F(P‖Q) as follows. Let f_P and f_Q be the Radon-Nikodým derivatives of P and Q with respect to P + Q, respectively. Then, the fidelity F(P‖Q) can be defined as

$$F(P\,\|\,Q) := \int_{\mathcal X} \sqrt{f_P\, f_Q}\ d(P+Q). \qquad (14)$$

With the above definition, the fidelity satisfies an information processing inequality: for every classical channel G, one has F(G(P)‖G(Q)) ≥ F(P‖Q). For a family of probability distributions {P_θ}_{θ∈Θ}, we define the Fisher information as

$$J_\theta := \liminf_{\epsilon\to 0}\ \frac{8\left(1 - F(P_\theta\,\|\,P_{\theta+\epsilon})\right)}{\epsilon^2}. \qquad (15)$$

When the probability distributions are over a discrete set, their Fisher information coincides with the SLD Fisher information (6) of the corresponding diagonal density matrices.
Proof of Theorem 1. Without loss of generality, we assume t_0 = 0. We define the probability distribution P_t by P_t(B) := Tr[ρ_t M(B)]. Then, the information processing inequality of the fidelity [32] yields the bound F(ρ_{t_0}‖ρ_{t_0+ε}) ≤ F(P_0‖P_ε). Hence, it is sufficient to show (12) for the family of probability distributions {P_t}.
Let f_0 and f_ε be the Radon-Nikodým derivatives of P_0 and P_ε with respect to P_0 + P_ε. Denoting the estimate by t̂, the unbiasedness condition (9) gives

$$\int \hat t \left(f_\epsilon - f_0\right) d(P_0+P_\epsilon) = \epsilon, \qquad (16)$$

and therefore $\epsilon = \int \hat t\left(\sqrt{f_\epsilon}+\sqrt{f_0}\right)\left(\sqrt{f_\epsilon}-\sqrt{f_0}\right) d(P_0+P_\epsilon)$. Also, (14) implies the relation

$$\int \left(\sqrt{f_\epsilon}-\sqrt{f_0}\right)^2 d(P_0+P_\epsilon) = 2\left(1 - F(P_0\,\|\,P_\epsilon)\right). \qquad (17)$$

Hence, the Schwarz inequality implies

$$\epsilon^2 \ \le\ \int \hat t^{\,2}\left(\sqrt{f_\epsilon}+\sqrt{f_0}\right)^2 d(P_0+P_\epsilon)\ \cdot\ 2\left(1 - F(P_0\,\|\,P_\epsilon)\right) \ \le\ 4\left( V_0(M) + V_\epsilon(M) + \epsilon^2 \right)\left(1 - F(P_0\,\|\,P_\epsilon)\right), \qquad (18)$$

having used $\left(\sqrt{f_\epsilon}+\sqrt{f_0}\right)^2 \le 2\left(f_\epsilon + f_0\right)$ together with the unbiasedness of M at 0 and at ε. Combining (16), (17), and (18), we have (12). □
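As a sanity check on the fidelity-based definition (6), the following Python sketch (our own toy example, not part of the paper; the model, the value r = 0.8, and the function names are our choices) estimates the SLD Fisher information of a qubit family with a rotating Bloch vector from the expansion F ≈ 1 − (J/8)ε², and compares it with the known value J = r² for this model.

```python
import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def rho(t, r=0.8):
    """Qubit state with a Bloch vector of length r rotating in the x-y plane."""
    return 0.5 * (I2 + r * np.cos(t) * sx + r * np.sin(t) * sy)

def fidelity(a, b):
    """Root fidelity F(a||b) = Tr|sqrt(a) sqrt(b)|, in closed form for qubits."""
    tr = np.trace(a @ b).real
    det = (np.linalg.det(a) * np.linalg.det(b)).real
    return np.sqrt(tr + 2 * np.sqrt(max(det, 0.0)))

def sld_fisher(t0, eps=1e-4):
    """Estimate J_{t0} from the expansion F = 1 - (J/8) eps^2 + o(eps^2)."""
    return 8 * (1 - fidelity(rho(t0), rho(t0 + eps))) / eps**2

print(sld_fisher(0.3))  # close to r**2 = 0.64 for this model
```

Shrinking ε further improves the agreement until floating-point cancellation in 1 − F dominates.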

2.2. Local asymptotic covariance. When many copies of the state ρ_t are available, the estimation of t can be reduced to a local neighbourhood of a fixed point t_0 ∈ Θ. Motivated by Lemma 1, we adopt the following parametrization of the n-copy state:

$$\rho^{\,n}_{t_0,t} := \rho_{t_0 + t/\sqrt n}^{\otimes n}, \qquad t \in \sqrt n\,(\Theta - t_0),$$

having used the notation aΘ + b := {ax + b | x ∈ Θ} for two arbitrary constants a, b ∈ R.
With this parametrization, the local n-copy model is $\{\rho^{\,n}_{t_0,t}\}_t$. Assuming t_0 to be known, the task is to estimate the local parameter t ∈ R, by performing a measurement on the n-copy state ρ^n_{t_0,t} and then mapping the obtained data to an estimate t̂_n. The whole estimation strategy can be described by a sequence of POVMs m := {M_n}. In the existing works on quantum state estimation, the error criterion is defined in terms of the difference between the global estimate t_0 + t̂_n/√n and the global true value t_0 + t/√n. Instead, here we focus on the difference between the local estimate t̂_n and the true value of the local parameter t. With this aim in mind, we consider the probability distribution

$$\wp^{\,n}_{t_0,t|M_n}(B) := \operatorname{Tr}\left[\rho^{\,n}_{t_0,t}\, M_n(B + t)\right] \qquad (20)$$

for every Borel set B ⊂ R, having used the standard notation B + t := {x + t | x ∈ B}. We focus on the behavior of ℘^n_{t_0,t|M_n} in the large n limit, assuming the following condition.

Condition 1 (Local asymptotic covariance for a single parameter). A sequence of measurements m = {M_n} satisfies local asymptotic covariance when:

1. The distribution ℘^n_{t_0,t|M_n} (20) converges to a distribution ℘_{t_0,t|m}, called the limiting distribution, namely

$$\lim_{n\to\infty} \wp^{\,n}_{t_0,t|M_n}(B) = \wp_{t_0,t|m}(B)$$

for any Borel set B.

2. The limiting distribution satisfies the relation

$$\wp_{t_0,t|m} = \wp_{t_0,0|m}$$

for any t ∈ R, i.e., the limiting distribution is independent of the local parameter t.

Using the limiting distribution, we can faithfully approximate the tail probability as

$$\wp^{\,n}_{t_0,t|M_n}(B) = \wp_{t_0,t|m}(B) + \delta_n,$$

where the term δ_n vanishes with n for every fixed Borel set B. For convenience, one may be tempted to require the existence of a probability density function (PDF) of the limiting distribution ℘_{t_0,t|m}. However, the existence of a PDF is already guaranteed by the following lemma.

Lemma 2.
When a sequence m := {M_n} of POVMs satisfies local asymptotic covariance, the limiting distribution ℘_{t_0,t|m} admits a PDF, denoted by ℘_{t_0,0|m,d}.
The proof is provided in "Appendix A".

2.3. MSE bound for the limiting distribution.
As a figure of merit, we focus on the mean square error (MSE) V[℘_{t_0,t|m}] of the limiting distribution ℘_{t_0,t|m}, namely

$$V[\wp_{t_0,t|m}] := \int_{\mathbb R} \hat t^{\,2}\ \wp_{t_0,t|m}(d\hat t).$$

Note that local asymptotic covariance implies that the MSE is independent of t.
The main result of the section is the following theorem.

Theorem 2 (MSE bound for single-parameter estimation). When a sequence m := {M_n} of POVMs satisfies local asymptotic covariance, the MSE of its limiting distribution is lower bounded as

$$V[\wp_{t_0,t|m}] \ \ge\ J_{t_0}^{-1}, \qquad (25)$$

where J_{t_0} is the SLD Fisher information of the model {ρ_t}_{t∈Θ}. The Fisher information of the family of limiting distributions {℘_{t_0,t|m}}_t is upper bounded by J_{t_0}. When the PDF of ℘_{t_0,t|m} is differentiable with respect to t, the equality in (25) holds if and only if ℘_{t_0,t|m} is the normal distribution with average zero and variance J_{t_0}^{-1}.
Proof of Theorem 2. When the integral $\int_{\mathbb R} \hat t\ \wp_{t_0,0|m}(d\hat t)$ does not converge, V[℘_{t_0,t|m}] is infinite and satisfies (25). Hence, we can assume that the above integral converges. Further, we can assume that the outcome t̂ satisfies the unbiasedness condition $\int_{\mathbb R} \hat t\ \wp_{t_0,t|m}(d\hat t) = t$. Otherwise, we can replace t̂ by $\hat t_0 := \hat t - \int_{\mathbb R} \hat t'\ \wp_{t_0,0|m}(d\hat t')$, because the estimator t̂_0 has a smaller MSE than t̂ and satisfies the unbiasedness condition due to the covariance condition. Hence, Theorem 1 guarantees (26). Applying Lemma 20 to {℘_{t_0,t|m}}, we obtain (27): the inequality (a) holds by Lemma 20 from "Appendix B", the inequality (b) comes from the data-processing inequality of the fidelity, and the equation (c) follows from Lemma 1. Finally, substituting Eq. (27) into Eq. (26), we have the desired bound (25).

Now, we denote the PDF of ℘_{t_0,0|m} by ℘_{t_0,0|m,d}. In "Appendix A", the proof of Lemma 2 shows that we can apply Lemma 19 to {℘_{t_0,t|m}}_t, so that the Fisher information of this family is upper bounded by J_{t_0}.

When the PDF ℘_{t_0,t|m,d} is differentiable, to derive the equality condition in Eq. (25) we show (26) in a different way. Let l_{t_0,t}(x̂) be the logarithmic derivative of ℘_{t_0,t|m,d}(x̂), which yields the Cramér-Rao-type inequality (28). The numerator on the right-hand side of Eq. (28) can be evaluated by noticing that, by local asymptotic covariance, it equals the quantity appearing in (26). Hence, (28) coincides with (26). The denominator on the right-hand side of (28) equals the right-hand side of (26). The equality in Eq. (28) holds if and only if l_{t_0,t}(x̂) is proportional to x̂, which implies that ℘_{t_0,0|m} is the normal distribution with average zero and variance J_{t_0}^{-1}. □

The RHS of (25) can be regarded as the limiting-distribution version of the SLD quantum Cramér-Rao bound. Note that, when the limiting PDF is differentiable and the bound is attained, the probability distribution ℘^n_{t_0,t|M_n} is approximated (in the pointwise sense) by a normal distribution with average zero and variance 1/(n J_{t_0}).
Using this fact, we will show that there exists a sequence of POVMs that attains the equality in (25) at all points uniformly. The optimal sequence of POVMs is constructed explicitly in Sect. 6.
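To see the bound of Theorem 2 attained in the simplest setting, the following Monte Carlo sketch (our own illustration; the model, the measurement, and the sample sizes are assumptions, not taken from the paper) measures σ_y on each of n copies of the rotating-Bloch-vector qubit state, for which J = r², and checks that the rescaled MSE approaches the Cramér-Rao value 1/(nJ).

```python
import numpy as np

rng = np.random.default_rng(0)
r = 0.8        # Bloch vector length; SLD Fisher information J = r**2
t_true = 0.02  # small local parameter
n = 10_000     # copies per experiment
reps = 2_000   # Monte Carlo repetitions

# Measuring sigma_y on rho_t = (I + r(cos t, sin t, 0).sigma)/2 gives outcomes
# +/-1 with P(+1) = (1 + r sin t)/2; inverting the mean outcome yields an
# estimator that is locally unbiased and attains variance 1/(nJ) as n grows.
p_plus = (1 + r * np.sin(t_true)) / 2
outcomes = np.where(rng.random((reps, n)) < p_plus, 1.0, -1.0)
t_hat = np.arcsin(np.clip(outcomes.mean(axis=1) / r, -1.0, 1.0))

mse = np.mean((t_hat - t_true) ** 2)
print(mse * n * r**2)  # approaches 1, the Cramer-Rao value
```

Here the one-parameter bound is saturated by a fixed projective measurement; the multiparameter constructions of Sect. 6 are needed precisely because no such single measurement works for several parameters at once.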

2.4. Comparison between local asymptotic covariance and other conditions. We conclude the section by discussing the relation between local asymptotic covariance and other conditions that are often imposed on measurements. This subsection is not necessary for understanding the technical results in the next sections and can be skipped at a first reading.
Let us start with the unbiasedness condition. Assuming unbiasedness, one can derive the quantum Cramér-Rao bound on the MSE [5]. Holevo showed the attainability of the quantum Cramér-Rao bound when estimating displacements in Gaussian systems [5].
The disadvantage of unbiasedness is that it is too restrictive, as it is satisfied only by a small class of measurements. Indeed, the unbiasedness condition for the estimator M requires the conditions $\operatorname{Tr}\big[E\, \frac{d^i \rho_t}{dt^i}\big|_{t=t_0}\big] = 0$ for $i \ge 2$, with $E := \int \hat t\, M(d\hat t)$, as well as the condition $\operatorname{Tr}\big[E\, \frac{d\rho_t}{dt}\big|_{t=t_0}\big] = 1$. In certain situations, the above conditions might be incompatible. For example, consider a family of qubit states $\rho_t := \frac{1}{2}(I + \boldsymbol n_t \cdot \boldsymbol\sigma)$. When the Bloch vector $\boldsymbol n_t$ has a non-linear dependence on t and the set of higher-order derivatives $\frac{d^i \rho_t}{dt^i}\big|_{t=t_0}$ with $i \ge 2$ spans the space of traceless Hermitian matrices, no unbiased estimator can exist. In contrast, local asymptotic covariance involves only the first derivative $\frac{d\rho_t}{dt}\big|_{t=t_0}$, because the contribution of higher-order derivatives to the variable t̂_n has order $o(1/\sqrt n)$ and vanishes under the condition of local asymptotic covariance.
One can see that the unbiasedness condition implies local asymptotic covariance with the parametrization $\rho_{t_0+t/\sqrt n}$ in the following sense. When n (more than one) input copies are available, we can construct an unbiased estimator by applying a single-copy unbiased estimator M satisfying Eq. (9) to each copy. Denoting by x_i the i-th outcome, we take the average

$$\hat t_n := \frac{1}{n}\sum_{i=1}^n x_i, \qquad (30)$$

which satisfies the unbiasedness condition (9) for the parameter t as well. When the single-copy estimator M has variance v at t_0, which is lower bounded by the Cramér-Rao inequality, this estimator has variance v/n at t_0. In addition, the average (30) of the obtained data satisfies local asymptotic covariance: by the central limit theorem, the rescaled estimator follows a Gaussian distribution with variance v in the large n limit; the center of the Gaussian distribution is pinned at the true value of the parameter by unbiasedness; and the shape of the Gaussian is independent of the value t, depending only on t_0. Thus, local asymptotic covariance holds.
The above discussion can be extended to the multiple-copy case as follows. Suppose that M is an unbiased measurement for the ℓ-copy state ρ_t^{⊗ℓ}, where ℓ is an arbitrary finite integer. From the measurement M we can construct a measurement for the n-copy state, with n = kℓ + i and i < ℓ, by applying the measurement M k times and discarding the remaining i copies. In the following, we consider the limit where the total number n tends to infinity, while ℓ is kept fixed. When the variance of M at t_0 is v/ℓ, the average $\frac{1}{k}\sum_{i=1}^k x_i$ of the k obtained data x_1, ..., x_k, as in (30), satisfies local asymptotic covariance, i.e., the rescaled estimator follows a Gaussian distribution with variance v in the large n limit. Therefore, for any unbiased estimator, there exists an estimator satisfying local asymptotic covariance that has the same variance.
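The argument above — averaging i.i.d. outcomes of a single-copy unbiased estimator yields a limiting Gaussian whose shape depends only on t_0 — can be checked numerically. The following sketch (our own illustration; the outcome distribution is an arbitrary non-Gaussian choice with mean t and variance v) compares the empirical distribution of the rescaled deviation √n(t̂_n − t) with N(0, v).

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
v = 2.0      # single-copy variance of the unbiased estimator at t0
t = 0.5      # true parameter: outcomes x_i have mean t and variance v
n, reps = 2_000, 4_000

# Deliberately non-Gaussian, skewed single-copy outcomes with mean t and
# variance v: a shifted exponential distribution.
x = t - sqrt(v) + rng.exponential(sqrt(v), size=(reps, n))

# Rescaled deviation of the averaged estimator, sqrt(n) * (t_hat_n - t).
dev = np.sqrt(n) * (x.mean(axis=1) - t)

# By the central limit theorem, dev ~ N(0, v): the limiting distribution is
# centered at zero by unbiasedness and its shape does not depend on t.
for s in (-1.0, 0.0, 1.0):
    empirical = np.mean(dev <= s)
    gaussian = 0.5 * (1 + erf(s / sqrt(2 * v)))
    assert abs(empirical - gaussian) < 0.05
print("rescaled estimator is compatible with N(0, v)")
```

Replacing the exponential by any other distribution with the same mean and variance leaves the limiting Gaussian unchanged, which is exactly the covariance property exploited in the text.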
Another common condition, less restrictive than unbiasedness, is local unbiasedness. This condition depends on the true parameter t_0 and consists of the following two requirements:

$$\int \hat t\ \operatorname{Tr}\left[\rho_{t_0}^{\otimes \ell} M(d\hat t)\right] = t_0, \qquad (31)$$

$$\frac{d}{dt}\int \hat t\ \operatorname{Tr}\left[\rho_t^{\otimes \ell} M(d\hat t)\right]\Big|_{t=t_0} = 1, \qquad (32)$$

where ℓ is a fixed, but otherwise arbitrary, integer. The derivation of the quantum Cramér-Rao bound still holds, because it uses only the condition (32). When the parametrization ρ_t is C^1-continuous, the first derivative $\frac{d}{dt}\int \hat t\ \operatorname{Tr}[\rho_t^{\otimes \ell} M(d\hat t)]$ is continuous at t = t_0, and the local unbiasedness condition at t_0 yields local asymptotic covariance at t_0 in the same way as for Eq. (30). Another relaxation of the unbiasedness condition is asymptotic unbiasedness [11]:

$$\lim_{n\to\infty} \int \hat t\ \operatorname{Tr}\left[\rho_t^{\otimes n} M_n(d\hat t)\right] = t, \qquad (33)$$

$$\lim_{n\to\infty} \frac{d}{dt}\int \hat t\ \operatorname{Tr}\left[\rho_t^{\otimes n} M_n(d\hat t)\right] = 1. \qquad (34)$$

The condition of asymptotic unbiasedness leads to a precision bound on the MSE [34, Chapter 6]. The bound is given by the SLD Fisher information, and therefore it is attainable for Gaussian states. However, no attainable bound for qudit systems has been derived so far under the condition of asymptotic unbiasedness. Interestingly, one cannot directly use the attainability for Gaussian systems to derive an attainability result for qudit systems, despite the asymptotic equivalence between Gaussian systems and qudit systems stated by quantum local asymptotic normality (Q-LAN) (see [16,17] and Sect. 4.1). The problem is that the error of Q-LAN goes to 0 for large n, but the error in the derivative may not go to zero, and therefore the condition (34) is not guaranteed to hold. In order to guarantee attainability of the quantum Cramér-Rao bound, one could think of further loosening the condition of asymptotic unbiasedness. An attempt to avoid the problem of the Q-LAN error could be to remove condition (34) and keep only condition (33). This leads to an enlarged class of estimators, called weakly asymptotically unbiased. The problem with these estimators is that no general MSE bound is known to hold at every point t.
For example, one can find superefficient estimators [35,36], which violate the Cramér-Rao bound on a set of points. Such a set must be of zero measure in the limit n → ∞, but the violation of the bound may occur on a considerably large set when n is finite. In contrast, local asymptotic covariance guarantees the MSE bound (25) at every point t where the local asymptotic covariance condition is satisfied. All the alternative conditions for deriving MSE bounds discussed in this subsection are summarized in Table 2.

Holevo bound.
When studying multiparameter estimation in quantum systems, we need to address the tradeoff between the precision of estimation of different parameters. This is done using two quantum extensions of the Fisher information matrix: the symmetric logarithmic derivative (SLD) and the right logarithmic derivative (RLD).
Consider a multiparameter family of density operators {ρ_t}_{t∈Θ}, where Θ is an open set in R^k, k being the number of parameters. Throughout this section, we assume that ρ_{t_0} is invertible and that the parametrization is C^1 in all parameters. Then, the SLD L_j and the RLD L̃_j for the parameter t_j are defined through the equations

$$\frac{\partial\rho_t}{\partial t_j} = \frac{1}{2}\left( L_j \rho_t + \rho_t L_j \right), \qquad \frac{\partial\rho_t}{\partial t_j} = \rho_t\, \tilde L_j,$$

see, e.g., [5,6] and [15, Sect. II]. It can be seen from the definitions that the SLD L_j can always be chosen to be Hermitian, while the RLD L̃_j is in general not Hermitian. The SLD quantum Fisher information matrix J_t and the RLD quantum Fisher information matrix J̃_t are the k × k matrices defined as

$$\left(J_t\right)_{ij} := \operatorname{Tr}\left[\rho_t\, \frac{L_i L_j + L_j L_i}{2}\right], \qquad \left(\tilde J_t\right)_{ij} := \operatorname{Tr}\left[\rho_t\, \tilde L_j \tilde L_i^\dagger\right]. \qquad (35)$$

Notice that the SLD quantum Fisher information matrix J_t is a real symmetric matrix, but the RLD quantum Fisher information matrix J̃_t is not a real matrix in general. A POVM M is called an unbiased estimator for the family S = {ρ_t} when the relation

$$\int \hat t_j\ \operatorname{Tr}\left[\rho_t M(d\hat{\boldsymbol t})\right] = t_j, \qquad j = 1, \dots, k,$$

holds for any parameter t. For a POVM M, we define the MSE matrix

$$\left(V_t(M)\right)_{ij} := \int \left(\hat t_i - t_i\right)\left(\hat t_j - t_j\right) \operatorname{Tr}\left[\rho_t M(d\hat{\boldsymbol t})\right].$$

It is known that an unbiased estimator M satisfies the SLD-type and RLD-type Cramér-Rao inequalities $V_t(M) \ge J_t^{-1}$ and $V_t(M) \ge \tilde J_t^{-1}$, respectively [5]. Since it is not always possible to minimize the MSE matrix itself under the unbiasedness condition, we minimize the weighted MSE tr[W V_t(M)] for a given weight matrix W ≥ 0, where tr denotes the trace of k × k matrices. When a POVM M is unbiased, one has the RLD bound [5]

$$\operatorname{tr}\left[W V_t(M)\right] \ \ge\ C_R(W, t) := \operatorname{tr}\left[W \operatorname{Re}\left(\tilde J_t^{-1}\right)\right] + \operatorname{tr}\left|\sqrt W \operatorname{Im}\left(\tilde J_t^{-1}\right)\sqrt W\right|. \qquad (39)$$

In particular, when W > 0, the lower bound (39) is attained by the matrix $V = \operatorname{Re}\left(\tilde J_t^{-1}\right) + \sqrt W^{-1}\left|\sqrt W \operatorname{Im}\left(\tilde J_t^{-1}\right)\sqrt W\right|\sqrt W^{-1}$. The RLD bound has a particularly tractable form when the model is D-invariant:

Definition 1. The model {ρ_t}_{t∈Θ} is D-invariant at t when the space spanned by the SLD operators is invariant under the linear map D_t, where, for any operator X, D_t(X) is defined via the equation

$$i\left[\rho_t, X\right] = \frac{1}{2}\left( D_t(X)\,\rho_t + \rho_t\, D_t(X) \right),$$

and [A, B] := AB − BA denotes the commutator. When the model is D-invariant at any point, it is simply called D-invariant.
For a D-invariant model, the RLD quantum Fisher information can be computed in terms of the D-matrix, namely the skew-symmetric matrix defined as

$$\left(D_t\right)_{ij} := i\, \operatorname{Tr}\left[\rho_t \left[L_i, L_j\right]\right]. \qquad (41)$$

Precisely, the RLD quantum Fisher information has the expression [5]

$$\tilde J_t^{-1} = J_t^{-1} + \frac{i}{2}\, J_t^{-1} D_t J_t^{-1}. \qquad (42)$$

Hence, (39) becomes

$$\operatorname{tr}\left[W V_t(M)\right] \ \ge\ \operatorname{tr}\left[W J_t^{-1}\right] + \frac{1}{2}\operatorname{tr}\left|\sqrt W\, J_t^{-1} D_t J_t^{-1}\sqrt W\right|. \qquad (43)$$

For D-invariant models, the RLD bound is larger, and thus better, than the bound derived from the SLD Fisher information matrix (the SLD bound). However, when the model is not D-invariant, the RLD bound is generally not tight, and it is common to use the SLD bound in the one-parameter case. Hence, both quantum extensions of the Cramér-Rao bound have advantages and disadvantages.
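As an illustration of these definitions, the following Python sketch (our own example, not from the paper) solves the SLD equation in the eigenbasis of ρ and computes the SLD quantum Fisher information matrix for the full qubit model ρ = (I + n·σ)/2 at n = (0, 0, r), where the known result in the Bloch parametrization is J = diag(1, 1, 1/(1 − r²)).

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def sld(rho, drho):
    """Solve drho = (L rho + rho L)/2 for the Hermitian SLD L (rho full rank)."""
    d, U = np.linalg.eigh(rho)
    dr = U.conj().T @ drho @ U                # derivative in the eigenbasis
    L = 2 * dr / (d[None, :] + d[:, None])    # L_ij = 2 (drho)_ij / (d_i + d_j)
    return U @ L @ U.conj().T

r = 0.8
rho = 0.5 * (np.eye(2) + r * sz)              # Bloch vector (0, 0, r)
drhos = [0.5 * s for s in (sx, sy, sz)]       # d(rho)/d(n_x), d/d(n_y), d/d(n_z)
Ls = [sld(rho, dr) for dr in drhos]

# SLD quantum Fisher information matrix: J_ij = Tr[rho (L_i L_j + L_j L_i)/2]
J = np.array([[np.trace(rho @ (Li @ Lj + Lj @ Li) / 2).real
               for Lj in Ls] for Li in Ls])
print(np.round(J, 6))  # diag(1, 1, 1/(1 - r**2)) for the full qubit model
```

The same eigenbasis trick extends to any full-rank finite-dimensional state, since the denominators d_i + d_j are then strictly positive.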
To unify both extensions, Holevo [5] derived the following bound, which improves on the RLD bound when the model is not D-invariant. For a k-component vector X = (X_i) of operators, define the k × k matrix Z_t(X) as

$$\left(Z_t(\boldsymbol X)\right)_{ij} := \operatorname{Tr}\left[\rho_t\, X_j X_i\right]. \qquad (44)$$

Then, Holevo's bound is as follows: for any weight matrix W, one has

$$\min_{M\in \mathrm{UB}_{\mathcal M}} \operatorname{tr}\left[W V_t(M)\right] \ \ge\ C_{H,\mathcal M}(W, t) := \min_{\boldsymbol X,\, V}\left\{ \operatorname{tr}\left[W V\right] \,:\, V \ge Z_t(\boldsymbol X) \right\}, \qquad (45)$$

or, equivalently,

$$C_{H,\mathcal M}(W, t) = \min_{\boldsymbol X}\left\{ \operatorname{tr}\left[W \operatorname{Re} Z_t(\boldsymbol X)\right] + \operatorname{tr}\left|\sqrt W \operatorname{Im} Z_t(\boldsymbol X)\sqrt W\right| \right\}, \qquad (46)$$

where UB_M denotes the set of all unbiased measurements for the model M, V is a real symmetric matrix, and X = (X_i) is a k-component vector of Hermitian operators satisfying

$$\operatorname{Tr}\left[X_i\, \frac{\partial\rho_t}{\partial t_j}\right] = \delta_{ij}. \qquad (47)$$

C_{H,M}(W, t) is called the Holevo bound. When W > 0, there exists a vector X achieving the minimum in (45). Hence, similarly to the RLD case, the equality in (45) holds for W > 0 only when $V = \operatorname{Re} Z_t(\boldsymbol X) + \sqrt W^{-1}\left|\sqrt W \operatorname{Im} Z_t(\boldsymbol X)\sqrt W\right|\sqrt W^{-1}$. Moreover, we have the following proposition.
In (49), min_{X:M'} denotes the minimum over vectors X whose components X_i are linear combinations of the SLD operators in the model M'. In (50), the minimization is taken over all matrices P satisfying the constraint (P)_{ij} := δ_{ij} for i, j ≤ k; J_t' and D_t' are the SLD Fisher information matrix and the D-matrix [cf. Eqs. (35) and (41)] for the extended model S' at t' := (t, 0).
The Holevo bound is always tighter than the RLD bound:

$$C_{H,\mathcal M}(W, t) \ \ge\ C_R(W, t),$$

and the equality holds if and only if the model M is D-invariant [37]. In the above proposition, it is not immediately clear whether the Holevo bound depends on the choice of the extended model S'. In the following, we show that there is a minimum D-invariant extension of S, and thus the Holevo bound is independent of the choice of S'. The minimum D-invariant subspace in the space of Hermitian matrices is given as follows. Let V denote the span of the SLDs, and let V' be the subspace spanned by the SLDs together with their iterated images under the map D_t. Then, the subspace V' is D-invariant and contains V. What remains is to show that V' is the minimum D-invariant subspace. Let V'^⊥ be the orthogonal complement of V' with respect to the inner product defined by Tr[ρ X† Y]. We denote by P and P^⊥ the projections onto V' and V'^⊥, respectively. Each component X_i of a vector of operators X can be expressed as X_i = P X_i + P^⊥ X_i. Then, the two vectors X' := (P X_i) and X'' := (P^⊥ X_i) satisfy the inequality Z_t(X) = Z_t(X') + Z_t(X'') ≥ Z_t(X'). Substituting Eq. (35) into Eq. (47), and noticing that P^⊥ X_i has no support in V, we see that only the part P X_i contributes to the condition (47), and the minimum in (46) is attained when X'' = 0. Hence, the minimum is achieved when each component of the vector X is included in the minimum D-invariant subspace V'. Therefore, since the minimum D-invariant subspace is uniquely defined, the Holevo bound does not depend on the choice of the D-invariant model S' that extends S.

Classical and quantum Gaussian states.
For a classical system of dimension d_C, a Gaussian state is a d_C-dimensional normal distribution N[α_C, Γ_C] with mean α_C and covariance matrix Γ_C. The corresponding random variable will be denoted as Z = (Z_1, ..., Z_{d_C}) and will take values z = (z_1, ..., z_{d_C}).

For quantum systems we will restrict our attention to a subfamily of Gaussian states, known as displaced thermal states. For a quantum system made of a single mode, the displaced thermal states are defined as ρ_{α,β} := T^Q_α ρ^thm_β (T^Q_α)†, where α ∈ C is the displacement, T^Q_α is the displacement operator, â is the annihilation operator satisfying the relation [â, â†] = 1, and ρ^thm_β is a thermal state, defined as ρ^thm_β := (1 − e^{−β}) Σ_{j∈N} e^{−jβ} |j⟩⟨j|, where the basis {|j⟩}_{j∈N} consists of the eigenvectors of â†â and β ∈ (0, ∞) is a real parameter, hereafter called the thermal parameter. For a quantum system of d_Q modes, the products of single-mode displaced thermal states will be denoted as ρ_{α_Q, β_Q}, where α_Q = (α_j)_{j=1}^{d_Q} is the vector of displacements and β_Q = (β_j)_{j=1}^{d_Q} is the vector of thermal parameters. In the following we will regard α_Q as a vector in R^{2d_Q}. For a hybrid system of d_C classical variables and d_Q quantum modes, we define the canonical Gaussian state as the product of the classical and quantum Gaussian states. Equivalently, the canonical Gaussian states can be expressed in terms of the Gaussian shift operator T_α; for the classical part, we adopt the analogous shift notation. With this notation, the canonical Gaussian state G[α, Γ] is uniquely identified by the characteristic equation (60) [5].

The formulation in terms of the characteristic equation (60) can be used to generalize the notion of canonical Gaussian state [38]. Given a d-dimensional Hermitian matrix (correlation matrix) Γ = Re(Γ) + i Im(Γ) whose real part Re(Γ) is positive semidefinite, we define the operators R := (R_1, ..., R_d) via the commutation relation determined by Im(Γ). We define the general Gaussian state G[α, Γ] on the operators R as the linear functional on the operator algebra generated by R_1, ..., R_d satisfying the characteristic equation (60) [38]. Note that, although Γ is not necessarily positive semidefinite, its real part Re(Γ) is positive semidefinite. Hence, the right-hand side of Eq. (60) contains a negative semidefinite quadratic form, in the same way as for the standard Gaussian states.
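For concreteness, the characteristic equation (60) can be sketched as follows. This is a reconstruction based on Holevo's standard convention [5]; the signs and normalization factors in the paper's displayed equation may differ.

```latex
% Characteristic equation defining the (canonical or general) Gaussian state
% G[\alpha, \Gamma] (reconstruction; conventions follow Holevo [5]):
\operatorname{Tr}\bigl( G[\alpha,\Gamma]\, T_{\xi} \bigr)
  \;=\; \exp\!\Bigl( i\,\langle \xi, \alpha \rangle
        \;-\; \tfrac{1}{2}\, \xi^{T}\, \mathrm{Re}(\Gamma)\, \xi \Bigr),
  \qquad \forall\, \xi \in \mathbb{R}^{d}.
```

Since Re(Γ) ≥ 0, the quadratic form in the exponent is negative semidefinite, which is exactly the property invoked in the text below Eq. (60).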
For general Gaussian states, we have the following lemma.

Lemma 3. Given a Hermitian matrix Γ, there exists an invertible real matrix T such that the Hermitian matrix T Γ T^T is the correlation matrix of a canonical Gaussian state.
The proof is provided in "Appendix C".
In the above lemma, we can transform Γ into the block form Γ_C ⊕ Γ_Q, where Γ_C is real, by applying an orthogonal transformation. The unitary operation on the classical part is given as a scale conversion. Hence, an invertible real matrix T can be realized by the combination of a scale conversion and a linear conversion, which can be implemented as a unitary on the Hilbert space. Hence, a general Gaussian state can be obtained as the linear functional on the operator algebra that results from applying the linear conversion to a canonical Gaussian state. This construction is unique up to unitary equivalence. Indeed, Petz [38] showed a similar statement by using the Gelfand-Naimark-Segal (GNS) construction. Our derivation directly shows the uniqueness without using the GNS construction.

Lemma 4. The Gaussian states family {G[t, Γ]}_{t∈R^d} satisfies the relation (63).
This lemma shows that the inverse of the RLD Fisher information matrix is given by the correlation matrix.
Proof. Due to the coordinate conversion given in Lemma 3, it is sufficient to show the relation (63) for the canonical Gaussian states family. In that case, the desired statement has already been shown by Holevo in [5].
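The content of Eq. (63) can be sketched as follows; this is a reconstruction inferred from the sentence following the lemma ("the inverse of the RLD Fisher information matrix is given by the correlation matrix"):

```latex
% Lemma 4 for the Gaussian shift family {G[t, \Gamma]}_{t \in \mathbb{R}^d}
% (reconstruction of Eq. (63)):
\tilde{J}_{t}^{\,-1} \;=\; \Gamma ,
% i.e. the RLD quantum Fisher information matrix of the shift family is the
% inverse of the correlation matrix, independently of the shift parameter t.
```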
Therefore, as shown in "Appendix D", a D-invariant Gaussian model can be characterized as follows:

(2) The image of the linear map A^{-1}T is invariant under the application of B. (3) There exist a unitary operator U and a Hermitian matrix satisfying the stated relation.

Measurements on Gaussian states family.
We discuss the stochastic behavior of the outcome of a measurement on the c-q system generated by R = (R_j)_{j=1}^d when the state is given as a general Gaussian state G[α, Γ]. To this end, we introduce the notation ℘_{α|M}(B) := Tr G[α, Γ]M(B) for a POVM M. Then, we have the following lemma.
In this case, the weighted covariance matrix takes an explicit form. The proof is provided in "Appendix E". In the above lemma, when X = R, we abbreviate M^Γ_{P|W} as M^Γ_W. This lemma is useful for estimation in the Gaussian states family M := {G[t, Γ]}_{t∈R^d}. In this family, we consider the covariance condition
℘_{t|M}(B + t) = ℘_{0|M}(B)
for any t. This condition is equivalent to the corresponding shift-covariance property of the POVM M itself. Then, we have the following lemma for this Gaussian states family.

Corollary 1 ([5]). For any weight matrix W ≥ 0 and the above Gaussian states family M, the bound (65) holds,
where UB_M and CUB_M are the sets of unbiased and covariant unbiased estimators for the model M, respectively. Further, when W > 0, the above infimum is attained by the covariant unbiased estimator M^Γ_W, whose output distribution is the normal distribution with average t and whose covariance matrix is the real part of the correlation matrix specified in Lemma 6. This corollary can be shown as follows. Due to Lemma 4, the lower bound (43) on the weighted MSE tr W V_t(M) of an unbiased estimator M is calculated as the RHS of (65). Lemma 6 guarantees the required performance of M^Γ_W. To discuss the case when W is not strictly positive definite, we consider W_ε := W + εI. Using the above method, we can construct an unbiased and covariant estimator whose output distribution is a 2d_Q-dimensional normal distribution with average t and whose weighted MSE converges to the bound (65) as ε → 0. By combining with Proposition 1, this corollary can be extended to a linear subfamily of the k′-dimensional Gaussian family {G[t′, Γ]}_{t′∈R^{k′}}. Consider a linear map T from R^k to R^{k′}. We have the following corollary for the subfamily M := {G[T(t), Γ]}_{t∈R^k}.
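Although the displayed formula (65) is elided in this copy, for covariant unbiased estimation of the shift t in {G[t, Γ]} the optimal weighted MSE is expected to take Holevo's standard form. The following is a reconstruction under that assumption; factor conventions may differ:

```latex
% Expected form of the bound (65) for the Gaussian shift family
% {G[t, \Gamma]}_{t\in\mathbb{R}^d} (reconstruction; cf. Holevo [5]):
\inf_{M \in \mathrm{CUB}_{\mathcal{M}}} \operatorname{tr} W V_t(M)
  \;=\; \operatorname{tr}\bigl[ W\, \mathrm{Re}(\Gamma) \bigr]
  \;+\; \operatorname{tr}\bigl| \sqrt{W}\, \mathrm{Im}(\Gamma)\, \sqrt{W} \bigr| ,
% where |A| := \sqrt{A^{\dagger}A}; the second term is the genuinely quantum
% contribution coming from the non-commutativity encoded in Im(\Gamma).
```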

Corollary 2. For any weight matrix W ≥ 0, the subfamily M satisfies the corresponding weighted-MSE bound.
Further, when W > 0, we choose a vector X realizing the minimum in (49). The above infimum is attained by the covariant unbiased estimator M_W, whose output distribution is the normal distribution with average t and covariance matrix Re(Z_t(X)). The minimum in (49) can be attained when the components of X are given as linear combinations of R_1, ..., R_{k′}. Hence, the latter part of the corollary with W > 0 follows from (45) and Lemma 6. The case with non-strictly-positive W can be shown by considering W_ε in the same way as in Corollary 1.

Local Asymptotic Normality
The extension from one-parameter estimation to multiparameter estimation is quite nontrivial. Hence, we first develop the concept of local asymptotic normality, which is the key tool for constructing the optimal measurement in multiparameter estimation. Since we could derive the tight MSE bound for the Gaussian states family, it is natural to approximate the general case by a Gaussian states family, and local asymptotic normality will serve as the bridge between general qudit families and Gaussian state families.

4.1. Quantum local asymptotic normality with specific parametrization. For a quantum system of dimension d < ∞, also known as a qudit, we consider generic states, described by density matrices with full rank and non-degenerate spectrum. To discuss quantum local asymptotic normality, we need to define a specific coordinate system. To this end, we consider the neighborhood of a fixed density matrix ρ_{θ_0}, assumed to be diagonal in the canonical basis of C^d and parametrized by its eigenvalues. In the neighborhood of ρ_{θ_0}, we parametrize the states of the system via a unitary matrix U_{θ_R,θ_I}, where θ_R and θ_I are vectors of real parameters and δ_{j,k} is the Kronecker delta. We note that by this definition the components of θ_R and θ_I are in one-to-one correspondence with each other. The parameter θ = (θ_C, θ_R, θ_I) will be referred to as the Q-LAN coordinate, and the state with this parametrization, which was used by Kahn and Guta in [16,17,39], will be denoted by ρ^KG_θ. Q-LAN establishes an asymptotic correspondence between multicopy qudit states and Gaussian shift models. Using the parametrization θ = (θ_C, θ_R, θ_I), the multicopy qudit models and the Gaussian shift models are equivalent in terms of the RLD quantum Fisher information matrix:

Lemma 7. The RLD quantum Fisher information matrices of the qudit model and of the corresponding Gaussian model in the Q-LAN coordinates coincide.
The calculations can be found in "Appendix F". The quantum version of local asymptotic normality has been derived in several different forms [16,17,39], with applications in quantum statistics [12,40], benchmarks [41] and data compression [42]. Here we use the version of [17], which states that n identical copies of a qudit state can be locally approximated by a c-q Gaussian state in the large n limit. The approximation is in the following sense:

Definition 3 (Compact uniformly asymptotic equivalence of models). For every n ∈ N*, let {ρ_{t,n}}_{t∈Θ_n} and {ρ̃_{t,n}}_{t∈Θ_n} be two models of density matrices acting on Hilbert spaces H and K, respectively, where the set of parameters Θ_n may depend on n. We say that the two families are asymptotically equivalent for t ∈ Θ_n, denoted as ρ_{t,n} ≅ ρ̃_{t,n} (t ∈ Θ_n), if there exist a quantum channel T_n (i.e. a completely positive trace-preserving map) mapping trace-class operators on H to trace-class operators on K and a quantum channel S_n mapping trace-class operators on K to trace-class operators on H, which are independent of t and satisfy the condition that both simulation errors vanish in trace norm, uniformly over t ∈ Θ_n, as n → ∞. Next, we extend asymptotic equivalence to compact uniformly asymptotic equivalence. In this extension, we also describe the order of the convergence.
Given a sequence {a_n} converging to zero, for every t in a compact set K, consider two models {ρ_{t,t′,n}}_{t′∈Θ_n} and {ρ̃_{t,t′,n}}_{t′∈Θ_n}. We say that they are asymptotically equivalent for t′ ∈ Θ_n compact uniformly with respect to t with order a_n, denoted as ρ_{t,t′,n} ≅_t ρ̃_{t,t′,n} (t′ ∈ Θ_n, a_n), if for every t ∈ K there exist a quantum channel T_{n,t} mapping trace-class operators on H to trace-class operators on K and a quantum channel S_{n,t} mapping trace-class operators on K to trace-class operators on H such that the simulation errors are bounded by a_n uniformly over t ∈ K and t′ ∈ Θ_n. Notice that the channels T_{n,t} and S_{n,t} depend on t and are independent of t′.
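The elided conditions in Definition 3 presumably take the following trace-norm form; this is a reconstruction consistent with the prose above, and the exact displayed equations may differ:

```latex
% Asymptotic equivalence: two-sided simulation in trace norm,
\lim_{n\to\infty} \sup_{t \in \Theta_n}
   \bigl\| T_n(\rho_{t,n}) - \tilde{\rho}_{t,n} \bigr\|_1 = 0,
\qquad
\lim_{n\to\infty} \sup_{t \in \Theta_n}
   \bigl\| S_n(\tilde{\rho}_{t,n}) - \rho_{t,n} \bigr\|_1 = 0.
% Compact uniform version with order a_n:
\sup_{t \in K}\, \sup_{t' \in \Theta_n}
   \bigl\| T_{n,t}(\rho_{t,t',n}) - \tilde{\rho}_{t,t',n} \bigr\|_1 \le a_n,
\qquad
\sup_{t \in K}\, \sup_{t' \in \Theta_n}
   \bigl\| S_{n,t}(\tilde{\rho}_{t,t',n}) - \rho_{t,t',n} \bigr\|_1 \le a_n.
```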
In the above terminology, Q-LAN establishes an asymptotic equivalence between families of n-copy qudit states and Gaussian shift models. Precisely, one has the following.
Proposition 2 (Q-LAN for a fixed parametrization; Kahn and Guta [16,17]). For any x < 1/9, we define the set Θ_{n,x} of θ as Θ_{n,x} := {θ : ‖θ‖ ≤ n^x} (‖·‖ denotes the vector norm). Then, we have the compact uniformly asymptotic equivalence (75), where κ is a parameter satisfying κ ≥ 0.027, and N[θ_C, Γ_{θ_0}] is the multivariate normal distribution with mean θ_C and covariance matrix Γ_{θ_0} with entries Γ_{θ_0,k,l}. The conditions (73) and (74) are not enough to translate precision limits for one family into precision limits for the other. This is because such limits are often expressed in terms of the derivatives of the density matrix, whose asymptotic behaviour is not fixed by (73) and (74). In the following we will establish an asymptotic equivalence in terms of the RLD quantum Fisher information.
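Schematically, the equivalence (75) can be sketched as follows. The precise displacements and thermal parameters of the Gaussian modes are given in [16,17]; the identification below is only meant to convey the structure (one classical Gaussian for the diagonal parameters, one quantum Gaussian mode per off-diagonal pair):

```latex
% Structure of the Q-LAN correspondence (schematic; see Kahn-Guta [16,17]):
\rho^{\mathrm{KG},\,\otimes n}_{\theta_0 + \theta/\sqrt{n}}
  \;\cong\;
  N[\theta_C, \Gamma_{\theta_0}]
  \;\otimes\;
  \bigotimes_{1 \le j < k \le d} \rho_{\alpha_{jk},\,\beta_{jk}} ,
% where the classical Gaussian N[\theta_C, \Gamma_{\theta_0}] carries the
% diagonal (eigenvalue) parameters, and each single-mode displaced thermal
% state \rho_{\alpha_{jk}, \beta_{jk}} carries one pair (\theta_R^{jk}, \theta_I^{jk})
% of off-diagonal parameters through its displacement \alpha_{jk}.
```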

Quantum local asymptotic normality with generic parametrization.
In the following, we explore to what extent we can extend the Q-LAN of Proposition 2. Precisely, we derive a Q-LAN equivalence as in Eq. (75) which is not restricted to the parametrization of Eqs. (68) and (69).
In the previous subsection, we discussed the specific parametrization given in (67). In the following, we discuss a generic parametrization. Given an arbitrary D-invariant model ρ_t^{⊗n} with vector parameter t, we have the following theorem.
Theorem 3 (Q-LAN for an arbitrary parametrization). Let {ρ_t}_{t∈Θ} be a k-parameter D-invariant qudit model. Assume that ρ_{t_0} is a non-degenerate state, the parametrization is C² continuous, and J̃^{-1}_{t_0} exists. Then, there exists a constant c(t_0) such that the compact uniformly asymptotic equivalence (77) holds on the corresponding parameter set, where J̃^{-1}_{t_0} is the inverse RLD Fisher information at t_0 and κ is a parameter satisfying κ ≥ 0.027.
Proof. We choose the basis {|i⟩}_{i=1}^d that diagonalizes the state ρ_{t_0}. We denote the Q-LAN parametrization based on this basis by ρ^KG_θ, where θ_C is the parameter describing the diagonal elements of ρ_{t_0}, and we write the coordinate change as θ = f(t). Since the parametrization ρ_t is C²-continuous, the function f is also C²-continuous. Proposition 2 guarantees that

The combination of this evaluation and (78) yields (79).
The combination of Lemma 5 and (79) implies (77).

The ε-Difference RLD Fisher Information Matrix
In Sect. 2.1 we evaluated the limiting distribution in the one-parameter case, using the fidelity as a discretized version of the SLD Fisher information. In order to tackle the multiparameter case, we need to develop a similar discretization for the RLD Fisher information matrix, which is the relevant quantity for the multiparameter setting (cf. Sect. 3). In this section we define a discretized version of the RLD Fisher information matrix, extending to the multiparameter case the single-parameter definition introduced by Tsuda and Matsumoto [25], who in turn extended the corresponding classical notion [43,44].

Definition.
Let M = {ρ_t}_{t∈Θ} be a k-parameter model with the property that ρ_{t_0} is invertible. If the parametrization ρ_t is differentiable, the RLD quantum Fisher information matrix J̃_t can be written as the k × k matrix (80). The ε-difference RLD quantum Fisher information matrix J̃_{t_0,ε} is defined by replacing the partial derivatives with the finite increments (ρ_{t_0+εe_j} − ρ_{t_0})/ε, where e_j is the unit vector with 1 in the j-th entry and zero in the other entries. When the parametrization ρ_t is differentiable, one has the limit (83), where J̃_{t_0} is the RLD quantum Fisher information matrix (80). When the parametrization is not differentiable, we define the RLD Fisher information matrix J̃_{t_0} to be the limit (83), provided that the limit exists. All throughout this section, we impose no condition on the parametrization ρ_t, except for the requirement that ρ_{t_0} be invertible.
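As an illustration, the construction can be sketched in closed form. The RLD expression below is the standard one; the paper's Eq. (80) may use a different but equivalent ordering of the factors:

```latex
% RLD Fisher information in closed form (standard expression):
\tilde{J}_{t,\,ij} \;=\; \operatorname{Tr}\bigl[ (\partial_j \rho_t)\, \rho_t^{-1}\, (\partial_i \rho_t) \bigr].
% \varepsilon-difference version: each derivative is replaced by a finite increment,
\tilde{J}_{t_0,\varepsilon,\,ij} \;=\; \operatorname{Tr}\Bigl[
    \frac{\rho_{t_0+\varepsilon e_j} - \rho_{t_0}}{\varepsilon}\;
    \rho_{t_0}^{-1}\;
    \frac{\rho_{t_0+\varepsilon e_i} - \rho_{t_0}}{\varepsilon}
  \Bigr],
% which is Hermitian and converges to \tilde{J}_{t_0} as \varepsilon \to 0
% whenever the parametrization is differentiable.
```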

The ε-difference RLD Cramér-Rao inequality.
A discrete version of the RLD quantum Cramér-Rao inequality can be derived under the assumption of ε-local unbiasedness, defined as follows: the estimate is unbiased at t_0, and its expectation satisfies the finite-increment analogue of the local unbiasedness condition. Under the ε-local unbiasedness condition, Tsuda et al. [25] derived a lower bound on the MSE for the one-parameter case. In the following theorem, we extend the bound to the multiparameter case.
Theorem 4 (ε-difference RLD Cramér-Rao inequality). The MSE matrix of an ε-locally unbiased POVM M at t_0 satisfies the bound (84). Proof. For simplicity, we assume that t_0 = 0. For two vectors a ∈ C^k and b ∈ C^k, we define two observables X and Y. Then, the Cauchy–Schwarz inequality implies |Tr[X†Yρ_{t_0}]|² ≤ Tr[X†Xρ_{t_0}] Tr[Y†Yρ_{t_0}], the second equality following from ε-local unbiasedness at t_0. Note that one has Tr[Y†Yρ_{t_0}] = ⟨b|J̃_{t_0,ε}|b⟩, which implies ⟨a|V_{t_0}(M)|a⟩ ≥ ⟨a|(J̃_{t_0,ε})^{-1}|a⟩. Since a is arbitrary, the last inequality implies (84).
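The concrete observables used in the proof are elided in this copy; they presumably follow the standard pattern, with X built from the estimate and Y from the ε-difference RLD operators. A sketch under that assumption (ordering conventions in the inner product may differ):

```latex
% Sketch of the Cauchy-Schwarz argument (standard pattern; the concrete X, Y
% of the paper are elided in this copy):
X := \sum_i \bar{a}_i \int (\hat{t}_i - t_{0,i})\, M(d\hat{t}), \qquad
Y := \sum_j \bar{b}_j\, \tilde{L}_{j,\varepsilon}, \qquad
\tilde{L}_{j,\varepsilon} := \rho_{t_0}^{-1}\,
    \frac{\rho_{t_0+\varepsilon e_j} - \rho_{t_0}}{\varepsilon}.
% \varepsilon-local unbiasedness makes the cross term equal to \langle a | b \rangle
% (up to ordering conventions), and Cauchy-Schwarz then yields
% |\langle a|b\rangle|^2 \le \langle a|V_{t_0}(M)|a\rangle\,
%                           \langle b|\tilde{J}_{t_0,\varepsilon}|b\rangle,
% from which the bound (84) follows by optimizing over b.
```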
The ε-difference RLD Cramér-Rao inequality can be used to derive an information processing inequality, which states that the ε-difference RLD Fisher information matrix is non-increasing under the application of measurements. For a family of probability distributions {P_t}_{t∈Θ}, we assume that P_{t+εe_j} is absolutely continuous with respect to P_t for every j. Then, the ε-difference Fisher information is defined in terms of the Radon-Nikodým derivatives p_{t+εe_j} and p_{t+εe_i} of P_{t+εe_j} and P_{t+εe_i} with respect to P_t. We note that the papers [43,44] defined its one-parameter version in the case where the distributions are absolutely continuous with respect to the Lebesgue measure. Hence, when an estimator t̂ for the distribution family {P_t}_{t∈Θ} is ε-locally unbiased at t_0, in the same way as (84), we can show the ε-difference Cramér-Rao inequality (89). For a family of quantum states {ρ_t}_{t∈Θ} and a POVM M, we denote by J^M_{t,ε} the ε-difference Fisher information matrix of the probability distribution family {P^M_t}_{t∈Θ} defined by P^M_t := Tr Mρ_t. With this notation, we have the following lemma:

Lemma 8. For every family of quantum states {ρ_t}_{t∈Θ} and every POVM M, one has the information processing inequality (90): J^M_{t_0,ε} ≤ J̃_{t_0,ε}.
Proof. Consider the estimation of t from the probability distribution family {P^M_t}_{t∈Θ}. Following the same arguments used for the achievability of the Cramér-Rao bound with locally unbiased estimators (see, for instance, Chapter 2 of Ref. [34]), it is possible to show that there exists an ε-locally unbiased estimator t̂ at t_0 whose MSE matrix equals the inverse of J^M_{t_0,ε}. Combining the POVM M with the ε-locally unbiased estimator t̂, we obtain a new POVM M′, which is ε-locally unbiased. Applying Theorem 4 to the POVM M′, we obtain the corresponding bound, which implies (90).
We stress that (90) is a matrix inequality for Hermitian matrices: in general, J̃_{t_0,ε} has complex entries. Also note that any classical process can be regarded as a POVM. Hence, in the same way as (90), using the ε-difference Cramér-Rao inequality (89), we can show the analogous inequality for a classical process E, where J is the ε-difference Fisher information matrix of the distribution family {P_t}_{t∈Θ} and J′ is the ε-difference Fisher information matrix of the processed family {E(P_t)}_{t∈Θ}.
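The classical ε-difference Fisher information can be checked numerically. The sketch below is an illustration, not the paper's construction: it evaluates the one-parameter quantity J_ε = E_t[((p_{t+ε} − p_t)/p_t)²]/ε² for a Gaussian location family N(t, σ²) and confirms that it approaches the usual Fisher information 1/σ² as ε → 0.

```python
import numpy as np

def gauss_pdf(x, mean, sigma):
    """Density of N(mean, sigma^2) evaluated on the grid x."""
    return np.exp(-(x - mean) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def eps_fisher(t, eps, sigma, grid):
    """One-parameter epsilon-difference Fisher information
    J_eps = E_t[ ((p_{t+eps} - p_t)/p_t)^2 ] / eps^2,
    computed by Riemann summation on an equispaced grid."""
    p0 = gauss_pdf(grid, t, sigma)
    p1 = gauss_pdf(grid, t + eps, sigma)
    integrand = (p1 - p0) ** 2 / p0
    dx = grid[1] - grid[0]
    return np.sum(integrand) * dx / eps ** 2

sigma = 2.0
grid = np.linspace(-40.0, 40.0, 200001)

# Small eps: close to the Fisher information 1/sigma^2 = 0.25.
j_small = eps_fisher(0.0, 1e-3, sigma, grid)
print(j_small)
```

For N(t, σ²) the quantity has the closed form (e^{ε²/σ²} − 1)/ε², which makes the convergence to 1/σ² explicit and also shows that J_ε exceeds the Fisher information for finite ε.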

Extended models.
The lemmas in the previous subsection can be generalized to the case where an extended model M′ := {ρ_{t′}}_{t′=(t,p)} contains the original model M via ρ_t = ρ_{(t,0)}. Choosing t′_0 = (t_0, 0), we denote the ε-difference RLD Fisher information matrix at t′_0 for the family M′ by J̃_{t′_0,ε}.

Lemma 9.
For an ε-locally unbiased estimator M at t_0, there exists a k × k′ matrix P such that P_{ij} = δ_{ij} for i, j ≤ k and the corresponding bound on the MSE matrix holds. Proof of Lemma 9. For an ε-locally unbiased estimator M at t_0, there exists a k × k′ matrix P such that (94) holds. Now, we introduce a new parametrization ρ̄_η. Applying Theorem 4 to the parameter η, we obtain (96). Combining (94) and (96), we obtain the desired statement.
In the same way as Lemma 8, Lemma 9 yields the following lemma.

Lemma 10.
For any POVM M, there exists a k × k′ matrix P such that P_{ij} = δ_{ij} for i, j ≤ k and the inequality (97) holds.

Asymptotic case.
We denote by J̃^n_{t_0,ε} the ε-difference RLD Fisher information matrix of the n-copy states {ρ^{⊗n}_t}_{t∈Θ}. In the following we provide the analogue of Lemma 1 for the RLD Fisher information matrix.

Precision Bounds for Multiparameter Estimation
6.1. Covariance conditions. First, we introduce the condition imposed on our estimators. The correspondence between qudit states and Gaussian states also extends to the estimator level. We consider a generic state family M = {ρ_t}_{t∈Θ}, with the parameter space Θ being an open subset of R^k. Similarly to the single-parameter case, given a point t_0 ∈ Θ, we consider a local model ρ^n_{t_0,t} := ρ^{⊗n}_{t_0+t/√n}. Throughout this section, we assume that ρ_{t_0} is invertible. For a sequence of POVMs m := {M_n}, we introduce the condition of local asymptotic covariance as follows: Condition 2 (Local asymptotic covariance). We say that a sequence of measurements m := {M_n} satisfies local asymptotic covariance at t_0 ∈ Θ under the state family M if the probability distribution ℘^n_{t_0,t|M_n} converges to a limiting distribution ℘_{t_0,t|m} and the covariance relation holds for any t ∈ R^k.² When we need to express the outcome of ℘^n_{t_0,t|M_n} or ℘_{t_0,t|m}, we denote it by t̂.
Further, we say that a sequence of measurements m := {M_n} satisfies local asymptotic covariance under the state family M when it satisfies local asymptotic covariance at every element t_0 ∈ Θ under the state family M.
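In symbols, the elided displays in Condition 2 are expected to take the following form. This is a reconstruction: the outcome is rescaled around t_0, and the limiting distribution of the deviation is required to be shift-covariant.

```latex
% Local asymptotic covariance (schematic reconstruction):
\wp^{\,n}_{t_0,t|M_n}(B)
  := \operatorname{Tr}\bigl[ \rho^{\,n}_{t_0,t}\,
     M_n\bigl( t_0 + B/\sqrt{n} \bigr) \bigr]
  \;\xrightarrow[n\to\infty]{}\; \wp_{t_0,t|m}(B),
\qquad
\wp_{t_0,t|m}(B + t) \;=\; \wp_{t_0,0|m}(B)
\quad \forall\, t \in \mathbb{R}^k .
% Equivalently, the limiting distribution of the deviation \hat{t} - t
% does not depend on the local parameter t.
```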
Under these preparations, we obtain the following theorem by using Theorem 3.

Theorem 5. Let {ρ^{⊗n}_t}_{t∈Θ} be a k-parameter D-invariant qudit model with C² continuous parametrization. Assume that J̃^{-1}_{t_0} exists, ρ_{t_0} is a non-degenerate state, and a sequence of measurements m := {M_n} satisfies local asymptotic covariance at t_0 ∈ Θ. Then there exists a covariant POVM M^G such that ℘_{t_0,t|m}(B) = Tr G[t, J̃^{-1}_{t_0}] M^G(B) for any vector t and any measurable subset B. Here J̃_{t_0} is the RLD Fisher information of the qudit model at t_0.
To show Theorem 5, we will use the following lemma. ² The range of t is determined via the constraint t_0 + t/√n ∈ Θ. Just as in the one-parameter case, t can take any value in R^k when n is large enough, so the range of the local parameter is t ∈ R^k.

The relation (104) holds for any vector α if and only if F admits the corresponding Fourier representation.
Here ξ and y are k-dimensional vectors, |y⟩ is a (multimode) coherent state, γ_j are thermal parameters of the Gaussian state, and F^{-1}_{ξ→y}(g) denotes the inverse of the Fourier transform F_{ξ→y}(g) := ∫ dξ e^{iξ·y} g(ξ). Therefore, for a given function f(α), there uniquely exists an operator F satisfying (104).
The proof can be found in "Appendix G". Now, we are ready to prove Theorem 5.
Proof of Theorem 5. We consider without loss of generality G[t, J̃^{-1}_{t_0}] to be in the canonical form, noticing that any Gaussian state is unitarily equivalent to a Gaussian state in the canonical form, as shown by Lemma 3. For any measurable set B, we define the operator M^G(B) through the inverse Fourier transform construction of the previous lemma. From the above definition, it can be verified that M^G(B) satisfies the definition of a POVM; in particular, positivity and normalization can be checked directly from the defining expression. What remains to be shown is that the POVM {M^G(B)} satisfies the covariance condition. Eq. (107) guarantees the two required identities, and the uniqueness of the operator satisfying the condition (104) implies the covariance condition.

MSE bound for the D-invariant case.
Next, we derive a lower bound on the MSE of the limiting distribution for any D-invariant model. As an extension of the mean square error, we introduce the mean square error matrix (MSE matrix) V[℘] for a generic probability distribution ℘. Since the set of symmetric matrices is not totally ordered, we will consider the minimization of the expectation value tr W V[℘_{t_0,t|m}] for a certain weight matrix W ≥ 0. For short, we will refer to the quantity tr W V[℘_{t_0,t|m}] as the weighted MSE. Under local asymptotic covariance, one can derive lower bounds on the covariance matrix of the limiting distribution and construct optimal measurements to achieve them. In general, the attainability of the conventional quantum Cramér-Rao bounds is a challenging issue. For instance, a well-known bound is the symmetric logarithmic derivative (SLD) Fisher information bound, where J_{t_0} is the SLD Fisher information. The SLD bound is attainable in the single-parameter case, i.e. when k = 1, yet it is in general not attainable for multiparameter estimation (see, for instance, Sect. 10.1 for a concrete example).
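The elided definition of the MSE matrix presumably has the standard form; a sketch, written for the deviation variable t̂ of the limiting distribution:

```latex
% MSE matrix of a distribution \wp with outcome \hat{t} and true value t
% (standard definition; reconstruction of the elided display):
\bigl( V[\wp] \bigr)_{ij}
  \;:=\; \int \bigl( \hat{t}_i - t_i \bigr)\bigl( \hat{t}_j - t_j \bigr)\, \wp(d\hat{t}),
% and the SLD Fisher information bound referred to in the text reads
V[\wp_{t_0,t|m}] \;\ge\; \bigl( J_{t_0} \bigr)^{-1}.
```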
In the following, we derive an attainable lower bound on the weighted MSE. To this purpose, we define the set LAC(t_0) of locally asymptotically covariant sequences of measurements at the point t_0 ∈ Θ. For a model M, we focus on the minimum value of the weighted MSE over LAC(t_0). When k ≥ 2, a better choice is the RLD quantum Fisher information bound. The main result of this section is an attainable bound on the weighted MSE, relying on the RLD quantum Fisher information.
Theorem 6 (Weighted MSE bound for D-invariant models). Assume that J̃^{-1}_{t_0} exists. Consider any sequence of locally asymptotically covariant measurements m := {M_n}. The limiting distribution is evaluated as in (111), where J̃_{t_0} is the RLD quantum Fisher information. When the model is C¹ continuous and D-invariant, we have the bound (112) for the weighted MSE with weight matrix W ≥ 0 of the limiting distribution, where J_{t_0} is the SLD quantum Fisher information (35) and D_{t_0} is the D-matrix (41). When S is a D-invariant qudit model and the state ρ_{t_0} is not degenerate, we have the equality (113). Moreover, if W > 0 and ℘_{t_0,0|m} has a differentiable PDF, the equality in (112) holds if and only if ℘_{t_0,t|m} is the normal distribution with average zero and covariance matrix V_{t_0|W}. Further, when {ρ_t}_{t∈Θ} is a qudit model with C² continuous parametrization, the equality in (112) is attainable, i.e., there exist a sequence of POVMs M^{t_0,n}_W, a compact set K, and a constant c(t_0) such that the convergence to the bound is compact uniform with the order controlled by κ and c(t_0), where κ is a parameter satisfying κ ≥ 0.027.
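For D-invariant models, the elided bound (112) is expected to take the standard Holevo/RLD form built from the SLD Fisher information J_{t_0} and the D-matrix D_{t_0}. The following reconstruction should be read up to factor conventions in the definition of D:

```latex
% Expected form of the weighted MSE bound (112) for D-invariant models
% (reconstruction; compare Holevo [5], Chap. 6):
\operatorname{tr} W\, V[\wp_{t_0,t|m}]
  \;\ge\;
  \operatorname{tr}\bigl[ W\, \mathrm{Re}\bigl( \tilde{J}_{t_0}^{-1} \bigr) \bigr]
  \;+\; \operatorname{tr}\bigl| \sqrt{W}\, \mathrm{Im}\bigl( \tilde{J}_{t_0}^{-1} \bigr)\, \sqrt{W} \bigr| ,
% where, for D-invariant models, the inverse RLD Fisher information can be
% expressed through the SLD Fisher information and the D-matrix, e.g.
% \tilde{J}_{t_0}^{-1} = J_{t_0}^{-1} + \tfrac{i}{2}\, J_{t_0}^{-1} D_{t_0} J_{t_0}^{-1}
% (sign and factor conventions may differ).
```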
In the following, we prove Theorem 6 in three steps. The first step is to derive the bound (112). The second step is to show that, to achieve the equality, the limiting distribution needs to be a Gaussian with a certain covariance. The last step is to find a measurement attaining the equality. In this way, when the state is not degenerate, we can construct the measurement using Q-LAN.
Proof of Theorem 6. Impossibility part (Proofs of (111) and (112)): To give a proof, we focus on the ε-difference RLD Fisher information matrix J̃_{t_0,ε} at t_0 for a quantum states family {ρ_t}_{t∈Θ}. We denote the ε-difference Fisher information matrices for the distribution families {℘^n_{t_0,t|M_n}}_t and {℘_{t_0,t|m}}_t by J^n_{t,ε} and J^m_{t,ε}, respectively. Also, we employ the notations given in Sect. 5.4.
Applying (90) to the POVM M_n, we obtain the inequality (116) involving (1/n)J̃^n_{t_0,ε}. By taking the limit n → ∞, the combination of (116), (98) of Lemma 11, and (117) implies a bound on the ε-difference Fisher information of the limiting family. Here, in the same way as in the proof of Theorem 2, we can assume that the outcome t̂ satisfies the unbiasedness condition. Hence, the ε-difference Cramér-Rao inequality (89) implies (119). By taking the limit ε → 0, (99) of Lemma 11 implies (111). When the model is C¹ continuous and D-invariant, adding the conventional discussion for MSE bounds (see, e.g., Chapter 6 of [5]) to (119), we obtain (112). Achievability part (Proof of (113)): Next, we discuss the attainability of the bound when W > 0 and ℘_{t_0,0|m} has a differentiable PDF. In this case, we have the Fisher information matrix J^m_0 of the location shift family {℘_{t_0,t|m}}_t. Taking the limit ε → 0 in (119), we obtain (121). The equality in (112) holds if and only if V[℘_{t_0,t|m}] = V_{t_0|W} and the equality in the first inequality of (121) holds. By the same discussion as in the proof of Theorem 2, the equality in the first inequality of (121) holds only when all the components of the logarithmic derivative of the distribution family {℘_{t_0,t|m}}_t are linear combinations of the estimates of the t_i. This condition is equivalent to the condition that the family {℘_{t_0,t|m}}_t is a family of shifted normal distributions. Therefore, when W > 0, the equality condition of Eq. (112) is that ℘_{t_0,t|m} is the normal distribution with average zero and covariance matrix V_{t_0|W}. Now, we assume that the state ρ_{t_0} is not degenerate. Then, we use Q-LAN to show that there always exists a sequence of POVMs m = {M_n} satisfying the above property. We rewrite Eq. (77) of Theorem 3 as follows.
We rewrite it in lim sup form, where the notation is the same as in Theorem 3. Then, we choose the covariant POVM M^G of Theorem 5 associated with J̃^{-1}_{t_0}. Notice that when W has null eigenvalues, √W^{-1} is not properly defined. In this case, we consider W_ε := W + εI. Meanwhile, since W_ε > 0, we can repeat the above argument to find a qudit measurement that attains tr W_ε J^{-1}. Taking the limit ε → 0, the quantity tr W_ε J^{-1} converges to the equality of Eq. (113). Therefore, we can still find a sequence of measurements with Fisher information {J} that approaches the bound.

Precision bound for the estimation of generic models.
In the previous subsection, we established the precision bound for D-invariant models, where the bound is attainable and has a closed form. Here we extend the bound to arbitrary n-copy qudit models. The main idea is to extend the model to a larger D-invariant model by introducing additional parameters. When estimating parameters in a generic model S (consisting of states generated by noisy evolutions, for instance), the bound (112) may not hold. It is then convenient to extend the model to a D-invariant model S′ which contains S. Since the bound (112) holds for the new model S′, a corresponding bound can be derived for the original model S. The new model S′ has some additional parameters other than those of S, which are fixed in the original model S. Therefore, a generic quantum state estimation problem can be regarded as an estimation problem in a D-invariant model with fixed parameters. The task is to estimate parameters in a model S (globally) parameterized as t′_0 = (t_0, p_0) ∈ Θ′, where p_0 is a fixed vector and Θ′ is an open subset of R^{k′} whose restriction to R^k equals Θ. In the neighborhood of t′_0, since the vector p_0 is fixed, we have t′ = (t, 0), with 0 being the null vector of R^{k′−k} and t ∈ R^k being a vector of free parameters. In this scenario, only the parameters in t need to be estimated, and the parameters p_0 are known. Hence, the MSE matrix takes a block-diagonal form for any locally asymptotically covariant measurement sequence m. Due to the block-diagonal form of the MSE matrix, to discuss the weight matrix W in the original model S, we consider the weight matrix W′ = P^T W P in the D-invariant model S′, where P is any k × k′ matrix satisfying the constraint (P)_{ij} = δ_{ij} for i, j ≤ k, in the following way.
Theorem 7 (MSE bound for generic models). The models S and S′ are C¹ continuous and are given in the same way as in Proposition 1, with the same notation. Also, we assume that J̃^{-1}_{t_0} exists. Consider any sequence of locally asymptotically covariant measurements m := {M_n}. Then, the MSE matrix of the limiting distribution ℘_{t_0,t|m} is evaluated as follows: there exists a k × k′ matrix P such that the bound (124) holds. Moreover, if W > 0 and ℘_{t_0,0|m} has a differentiable PDF, the equality in (125) holds if and only if ℘_{t_0,t|m} is the normal distribution with average zero and the corresponding covariance matrix. Theorem 7 determines the ultimate precision limit for generic qudit models. Now, we compare it with the most general existing bound on quantum state estimation, namely the Holevo bound [5]. Let us define the ultimate precision of unbiased measurements accordingly. Since the Holevo bound still holds in the n-copy case (see [15, Lemma 4]), we have the corresponding inequality between the two quantities. There are a couple of differences between our results and existing results: the Holevo bound is derived under the unbiasedness assumption, which, as mentioned earlier, is more restrictive than local asymptotic covariance. Our bound (125) thus applies to a wider class of measurements than the Holevo bound. Furthermore, Yamagata et al. [19] showed a statement similar to (127) of Theorem 7 in a local model scenario. They did not show the compact uniformity of the convergence and had no order estimate of the convergence, whereas our evaluation (127) guarantees the compact uniformity together with the order estimate. Moreover, they did not discuss an estimator attaining the bound globally. Later, we will construct an estimator attaining our bound globally, based on the estimator given in Theorem 7. Our detailed evaluation, with the compact uniformity and the order estimate, enables us to evaluate the performance of such an estimator globally.
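For reference, the Holevo bound used in the comparison has the following standard form. This is the textbook expression from [5]; operator-ordering conventions in the definition of Z differ across references, and the paper's own numbering of the corresponding display is not preserved in this copy:

```latex
% Holevo bound for a model {\rho_t} at t_0 with weight matrix W:
C^{H}(W, t_0) \;:=\; \min_{X}\;
  \Bigl\{ \operatorname{tr}\bigl[ W\, \mathrm{Re}\, Z_{t_0}(X) \bigr]
  + \operatorname{tr}\bigl| \sqrt{W}\, \mathrm{Im}\, Z_{t_0}(X)\, \sqrt{W} \bigr| \Bigr\},
\qquad
Z_{t_0}(X)_{ij} := \operatorname{Tr}\bigl[ \rho_{t_0}\, X_j X_i \bigr],
% where the minimum runs over vectors X = (X_i) of Hermitian operators
% satisfying the local unbiasedness constraints
% \operatorname{Tr}[ (\partial_j \rho_{t_0})\, X_i ] = \delta_{ij}.
```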
Proof of Theorem 7. Impossibility part (Proofs of (124) and (125)): We denote the ε-difference Fisher information matrices for the distribution families {℘^n_{t_0,t|M_n}}_t and {℘_{t_0,t|m}}_t by J^n_{t,ε} and J^m_{t,ε}, respectively. Also, we denote the ε-difference RLD Fisher information matrix at t′_0 = (t_0, 0) of the family {ρ^{⊗n}_{t′}}_{t′} by J̃^n_{t′_0,ε}. Then, we have (117) in the same way.
Applying (97) of Lemma 10 with ε → ε/√n, we find that there exist k × k′ matrices P_n such that (130) holds. Hence, the combination of (98) of Lemma 11, (130), and (117) implies that there exists a k × k′ matrix P such that (132) holds. For the same reason as in (119), we have (133). By taking the limit ε → 0, the combination of (99) of Lemma 11 and (133) implies (124). When the model is D-invariant, we obtain (125) by using the expression (50) in the same way as (112). Achievability part (Proof of (126)): Since ρ_{t_0} is not degenerate, we can show the achievability in the same way as in Theorem 6, because we can apply Q-LAN (Theorem 3) to the model M′. The difference is the following: choosing the matrix P to achieve the minimum in (50), we employ the corresponding construction for the extended model M′.

Nuisance Parameters
For state estimation in a noisy environment, the strength of the noise is not a parameter of interest, yet it affects the precision of estimating the other parameters. In this scenario, the strength of the noise is a nuisance parameter [46,47]. To illustrate the difference between nuisance parameters and the fixed parameters discussed in the previous section, let us consider the case of a qubit clock state undergoing a noisy time evolution. To estimate the duration of the evolution, we introduce the strength of the noise as an additional parameter and consider the estimation problem in the extended model parameterized by the duration and the noise strength. The strength of the noise is usually unknown. Although it is not a parameter of interest, its value will affect the precision of our estimation, and thus it should be treated as a nuisance parameter.

Precision bound for estimation with nuisance parameters.
In this subsection, we consider state estimation in an arbitrary (k + s)-parameter model {ρ_{t,p}}_{(t,p)∈Θ̃}, where t and p are k-dimensional and s-dimensional parameters, respectively. Our task is to estimate only the parameters t; it is not required to estimate the other parameters p, which are called nuisance parameters. Hence, our estimate is k-dimensional. We call a parametric family with this structure of nuisance parameters a nuisance parameter model, and denote it by S̃ = {ρ_{t,p}}_{(t,p)∈Θ̃}. We abbreviate (t, p) as t̃.
The concept of local asymptotic covariance can be extended to a model with nuisance parameters by considering a local model ρ^n_{t̃_0,t̃} := ρ^{⊗n}_{t̃_0+t̃/√n}. Throughout this section, we assume that ρ_{t̃_0} is invertible and all the parametrizations are at least C¹ continuous. Condition 3 (Local asymptotic covariance with nuisance parameters). We say that a sequence of measurements m := {M_n} for estimating the k-dimensional parameter t satisfies local asymptotic covariance at t̃_0 = (t_0, p_0) ∈ Θ̃ under the nuisance parameter model M̃ when the probability distribution converges to a limiting distribution satisfying the corresponding covariance relation.

In (138), V is a real symmetric matrix and X = (X_i) is a k-component vector of operators satisfying the corresponding constraints.
In (139), the minimization is taken over all k × (k + s) matrices P satisfying the constraint (P)_{ij} = δ_{ij} for i ≤ k, j ≤ k + s, and J_{t′_0} and D_{t′_0} are the SLD Fisher information matrix and the D-matrix [cf. Eqs. (35) and (41)] for the extended model S′ at t′_0 := (t_0, 0). In the following, we derive an attainable lower bound on the weighted MSE. To this purpose, we define the set LAC(t̃_0) of locally asymptotically covariant sequences of measurements at the point t̃_0 ∈ Θ̃ for the nuisance parameter model M̃, and focus on the minimum value of the weighted MSE over this set. When the model S′ is D-invariant, we have the bound (144) for the weighted MSE with weight matrix W ≥ 0 of the limiting distribution.

When the model S is a D-invariant qudit model and the state ρ t 0 is not degenerate, we have
Moreover, if $W > 0$ and $\wp_{\tilde t_0,t|m}$ has a differentiable PDF, the equality in (144) holds if and only if $\wp_{\tilde t_0,t|m}$ is the normal distribution with average zero and the covariance determined by the vector $X$ realizing the minimum in (138). Here $\kappa$ is a parameter satisfying $\kappa \ge 0.027$. Before proving Theorem 8, we discuss a linear subfamily of the $k'$-dimensional Gaussian family $\{G[t', \gamma]\}_{t' \in \mathbb R^{k'}}$. Consider a linear map $T$ from $\mathbb R^{k+s}$ to $\mathbb R^{k'}$. We have the subfamily $\mathcal M := \{G[T(t,p), \gamma]\}_{(t,p)\in\mathbb R^{k+s}}$ as a nuisance parameter model. Then, the covariance condition is extended as follows: $\wp_{t,p|M}(B + t)$ is required to be independent of $(t,p)$.
Then, we have the following corollary of Lemma 6. Any unbiased estimator of the subfamily $\mathcal M = \{G[T(t,p), \gamma]\}$ with $C^1$ continuous parametrization satisfies the bound below, where UBM and CUBM are the sets of unbiased estimators and covariant unbiased estimators of the nuisance parameter model $\mathcal M$, respectively. Further, when $W > 0$, we choose a vector $X$ realizing the minimum in (49). The above infimum is attained by the covariant unbiased estimator $M_W$ whose output distribution is the normal distribution with average $t$ and the covariance matrix determined by $\mathrm{Re}(Z_t(X))$. This corollary can be shown as follows. The inequality $\inf_{M\in\mathrm{UBM}} \operatorname{tr} W V_t(M) \ge C^{NH,\mathcal M}(W, t)$ follows from the condition (140). Similarly to Corollary 2, Proposition 1 guarantees that the latter part of the corollary with $W > 0$ follows from (138) and Lemma 6. Hence, we obtain this corollary for $W > 0$. The case of non-strictly-positive $W$ can be shown by a perturbation of $W$, in the same way as in Corollary 1.
Proof of Theorem 8. Impossibility part (proofs of (143) and (144)): We denote the $\epsilon$-difference Fisher information matrix of $\{\wp_{\tilde t_0,t|m}\}_{\tilde t}$ by $J^{m}_{\tilde t_0,\epsilon}$. Due to (132), there exists a $(k+s) \times k$ matrix $P$ satisfying the following conditions.
We define the $k \times (k+s)$ matrix $\bar P$ by its rows $\bar p_i$. Then, for two vectors $a \in \mathbb R^k$ and $b \in \mathbb R^{k+s}$, we apply the Schwarz inequality to the two variables:

Lemma 14. When $\tilde S = \{\rho_{(t,p)}\}_{(t,p)\in\tilde\Theta}$ is a D-invariant $(k+s)$-parameter nuisance parameter model and $J^{-1}_{\tilde t_0}$ exists, we have
A few comments are in order. First, the nuisance parameter bound (144) reduces to the bound (112) when the parameters to estimate are orthogonal to the nuisance parameters, in the sense that the RLD Fisher information matrix $\tilde J_{\tilde t_0}$ is block-diagonal. This orthogonality is equivalent to the condition that the SLD Fisher information matrix $J_{\tilde t_0}$ and the D-matrix take block-diagonal forms. This is the case, for instance, for the simultaneous estimation of the spectrum and the Hamiltonian-generated phase of a two-level system. Under such circumstances, the inversion of the Fisher information matrix can be carried out by inverting $J_{t_0}$ and $J_N$ independently. The same precision bound is thus obtained with or without introducing nuisance parameters, and we have the following lemma.

Lemma 15. When all nuisance parameters are orthogonal to the parameters of interest, the bound with nuisance parameters (144) coincides with the D-invariant MSE bound (112).
In the case of orthogonal nuisance parameters, the estimation of the nuisance parameters does not affect the precision of estimating the parameters of interest; this no longer holds in the generic case of non-orthogonal nuisance parameters. Thanks to this fact, one can achieve the bound (144) by first measuring the nuisance parameters and then constructing the optimal measurement based on the estimated value of the nuisance parameters. On the other hand, an RLD bound [cf. Eq. (39)] can be attained if and only if the model is D-invariant. Combining these arguments with Lemma 15, we obtain the following characterization of the attainability of RLD bounds.
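As a small numerical illustration of this orthogonality discussion (our own sketch, not code from the paper), the snippet below compares the precision bound for a parameter of interest when a nuisance parameter is unknown, namely the upper-left block of the inverse of the full SLD Fisher matrix, with the bound when the nuisance parameter is known exactly; the matrix `J` is an invented example:

```python
import numpy as np

# Full SLD Fisher information for (t, p): k = 1 parameter of interest,
# s = 1 nuisance parameter. The off-diagonal entry encodes non-orthogonality.
J = np.array([[4.0, 1.0],
              [1.0, 2.0]])
k = 1

# Effective precision for t alone: upper-left k x k block of J^{-1},
# equivalently the inverse of the Schur complement J_tt - J_tp J_pp^{-1} J_pt.
J_inv = np.linalg.inv(J)
mse_with_nuisance = J_inv[:k, :k]          # bound when p is unknown
mse_known = np.linalg.inv(J[:k, :k])       # bound when p is known exactly

# Knowing the nuisance parameter can only help; equality would hold if
# the off-diagonal block vanished (orthogonal parameters, cf. Lemma 15).
assert mse_with_nuisance[0, 0] >= mse_known[0, 0]
print(mse_with_nuisance[0, 0], mse_known[0, 0])
```

With these numbers the unknown-nuisance bound is 2/7 ≈ 0.286 versus 1/4 = 0.25 when the nuisance parameter is known, reflecting the non-zero off-diagonal coupling.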

Corollary 4. An RLD bound can be achieved if and only if it has an orthogonal nuisance extension, i.e. Eq. (154) holds for some choice of nuisance parameters.
The above corollary offers a simple criterion for the important problem of the attainability of RLD bounds. In Sect. 10.3, we will illustrate the application of this criterion with a concrete example. The bound (144) can be straightforwardly computed even for complex models; for D-invariant models, the SLD operators have a uniform entry-wise expression, and one only needs to plug it into a program to evaluate the bound (144). Moreover, the bound does not rely on the explicit choice of nuisance parameters. To see this, consider another parameterization $x$ of the D-invariant model. The bound (144) comes from the RLD bound for the D-invariant model, and the RLD quantum Fisher information matrices $\tilde J_{\tilde t_0}$ and $\tilde J_{\tilde x_0}$ for the two parameterizations are connected by the Jacobian of the reparameterization. Since both parameterizations are extensions of the same model $S$, with the parameters of interest fixed, the Jacobian is block triangular with the identity acting on the parameters of interest. The relevant $k \times k$ blocks of $\tilde J^{-1}_{\tilde x_0}$ and $\tilde J^{-1}_{\tilde t_0}$ are then equal, and the bound (144) remains unchanged.
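The parametrization independence argued above can be checked numerically. In the sketch below (our own illustration, with randomly generated matrices), a reparametrization that fixes the first $k$ coordinates has a block lower-triangular Jacobian, and the $k \times k$ block of the inverse Fisher matrix entering the bound is unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
k, s = 2, 2

# Random (k+s)-dimensional Fisher matrix (symmetric positive definite).
A = rng.standard_normal((k + s, k + s))
J_t = A @ A.T + (k + s) * np.eye(k + s)

# Re-parametrize the nuisance parameters only: the Jacobian fixes the
# first k coordinates, so it is block lower triangular with I_k on top.
B = rng.standard_normal((s, s)) + 3 * np.eye(s)   # invertible nuisance block
T = np.block([[np.eye(k), np.zeros((k, s))],
              [rng.standard_normal((s, k)), B]])

J_x = T.T @ J_t @ T                                # covariant transformation

block_t = np.linalg.inv(J_t)[:k, :k]
block_x = np.linalg.inv(J_x)[:k, :k]
# The k x k block entering the bound is parametrization independent.
assert np.allclose(block_t, block_x)
```

The equality follows because the first $k$ rows of $T^{-1}$ are $[\,I_k\ \ 0\,]$, so the upper-left block of $T^{-1} J^{-1} T^{-T}$ equals that of $J^{-1}$.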

Precision bound for joint measurements. A useful implication of the nuisance parameter bound is a precision limit for the joint measurement of observables.
The main result of this subsection is the following corollary:

where $\mathrm{MSE}_{o_i}$ denotes the MSE of $o_i$ under the joint measurement and $J$ is the SLD quantum Fisher information matrix. The sum of the SLD gaps for all observables satisfies the attainable bound:
where D is the D-matrix.
The right-hand side of Eq. (157) is exactly the gap between the SLD bound and the ultimate precision limit. It provides a typical example in which the SLD bound is not attainable.
Proof. Substituting $W$ in Eq. (144) by the projection onto the subspace $\mathbb R^k$, we obtain a bound for the MSEs $\{\mathrm{MSE}_{o_i}\}$ of the limiting distributions. Here $J$ and $D$ are the SLD Fisher information matrix and the D-matrix of the extended model, and $(A)_{k\times k}$ denotes the upper-left $k \times k$ block of a matrix $A$. Substituting the above definition into Eq. (158), we obtain Corollary 5.
Specifically, in the case of two parameters, the bound (157) reduces to a bound in terms of the operators $\hat L_i = \sum_j (J^{-1})_{ji} L_j$, the SLD operators in the dual space. Next, taking the partial derivative with respect to $o_j$ on both sides of Eq. (155) and substituting in the definition of the RLD operators, we find that the observables satisfy an orthogonality relation with the SLD operators. By the uniqueness of the dual space, the relation holds for $i = 1, \ldots, k$, and the bound becomes a tradeoff relation between $\Delta o_1$ and $\Delta o_2$. Another bound expressing this tradeoff was obtained by Watanabe et al. [48]. Now, substituting $O_2$ by $\alpha O_2$ for a variable $\alpha \in \mathbb R$ in Eq. (160), we obtain a quadratic inequality in $\alpha$. For this inequality to hold for every $\alpha \in \mathbb R$, its discriminant must be non-positive, which immediately implies the bound (161). Notice that the bound (161) was derived under asymptotic unbiasedness [48], and thus its attainability was not guaranteed. Here, instead, since our bound (160) is always attainable, the bound (161) can also be achieved in any qudit model under the asymptotically covariant condition.
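The discriminant step in the derivation above is elementary and can be sanity-checked numerically; the coefficients below are invented stand-ins for the tradeoff terms:

```python
import numpy as np

# The argument in the text: if q(alpha) = a*alpha**2 + b*alpha + c >= 0
# for every real alpha (with a > 0), then the discriminant must satisfy
# b**2 - 4*a*c <= 0. Toy coefficients standing in for the tradeoff terms.
a, b, c = 2.0, 1.5, 1.0

alphas = np.linspace(-100, 100, 20001)
assert np.all(a * alphas**2 + b * alphas + c >= 0)   # q >= 0 on a grid
assert b**2 - 4 * a * c <= 0                         # hence disc <= 0
```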

Nuisance parameters versus fixed parameters. It is natural to ask what the relationship is between the nuisance parameter bound (144) and the general bound (125).
To see this, let $S = \{\rho_t\}_{t\in\Theta}$ be a generic $k$-parameter qudit model and let $\tilde S$ be a $(k+s)$-parameter D-invariant model containing $S$. When $\rho_{t_0}$ is non-degenerate, we notice that the QCR bound with nuisance parameters (144) can be rewritten in the form (162), where $P_0$ is a $k \times (k+s)$ matrix satisfying the constraint $(P_0)_{ij} = \delta_{ij}$ for any $i \le k$, $j \le k+s$. By definition, $P_0$ is a special case of $P$, and it follows straightforwardly from comparing Eq. (162) with Eq. (125) that the general MSE bound is upper bounded by the MSE bound for the nuisance parameter case. This observation agrees with the intuition that having additional information on the system is helpful for (or at least, not detrimental to) the estimation. Finally, since $\tilde J_{\tilde t_0}$ and $\tilde D_{\tilde t_0}$ are block-diagonal in the case of orthogonal nuisance parameters, equality holds for any $k \times (k+s)$ matrix satisfying the constraint $(P)_{ij} = \delta_{ij}$ for $i, j \le k$. This implies that the general bound (125) coincides with the nuisance parameter bound (144) when the nuisance parameters are orthogonal.

Tail Property of the Limiting Distribution
In the previous discussions, we focused on the MSE of the limiting distribution. Here, instead, we consider the behavior of the limiting distribution itself. The characteristic property is the tail property: given a weight matrix $W \ge 0$ and a constant $c$, we define the tail region $T_{W,c}(t)$. For a measurement $m = \{M_n\}$, the probability that the estimate $\hat t_n$ is in the tail region can be approximated by the tail probability of the limiting distribution, i.e.
up to a term vanishing in $n$. The tail property is usually harder to characterize than the MSE. Nevertheless, here we show that, under certain conditions, there exists a good bound on the tail property of the limiting distribution.

Tail property of Gaussian shift models.
Just like in the previous sections, the tail property of $n$-copy qudit models can be analyzed by studying the tail property of Gaussian shift models. In this subsection, we first derive a bound on the tail probability of Gaussian shift models. The result is of interest in its own right, and can also be used for further analysis of qudit models via Q-LAN. Consider a Gaussian shift model $G[\alpha, \beta]$ and a measurement $M_G$. Define the probability $\wp_{\alpha|M_G}(T_{W,c}(\alpha))$, where $T_{W,c}(\alpha)$ is the tail region around $\alpha$ defined as above.

For covariant POVMs, the tail probability is independent of $\alpha$, and we have the following bound on it, which can be attained by a certain covariant POVM with $W_C \ge 0$: the tail probability of the limiting distribution is bounded as in (164), where $e$ is the $2s$-dimensional vector with all entries equal to 1. For the definition of $E_s(e^{-\beta} + e/2)$, see (56). When the POVM $M_G$ is given as $M_G(B) = \int_B |\alpha_1, \ldots, \alpha_s\rangle\langle\alpha_1, \ldots, \alpha_s| \, d\alpha$, the equality in (164) holds.
The proof can be found in "Appendix H". When the model has a group covariance, a similar evaluation may be possible. For example, such an evaluation was carried out for the $n$-copy family of all pure states [49] and for the $n$-copy family of squeezed states [50, Sect. 4.1.3].
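To get a feel for tail probabilities of a Gaussian limiting distribution, the following Monte Carlo sketch (our own illustration; `V`, `W`, and `c` are invented values, not the paper's) estimates the probability that the estimate falls in a tail region $T_{W,c}$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Limiting covariance V and weight matrix W (illustrative values).
V = np.array([[1.0, 0.3], [0.3, 2.0]])
W = np.eye(2)
c = 4.0

# Tail region T_{W,c}: points whose W-weighted squared distance from the
# true value exceeds c. Estimate its probability under N(0, V).
samples = rng.multivariate_normal(np.zeros(2), V, size=200_000)
wdist = np.einsum('ni,ij,nj->n', samples, W, samples)
tail_prob = np.mean(wdist >= c)
print(f"estimated tail probability: {tail_prob:.4f}")
```

Shrinking `V` (a better measurement) or enlarging `c` lowers the tail probability, mirroring the monotonicity behind the bound.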

Tail property of D-invariant qudit models. For a k-parameter D-invariant model
for $T_c := \{x \in \mathbb R^k \mid \|x\| \ge c\}$. The equality holds if and only if $\wp_{t_0,t|m}$ is the normal distribution with average zero and covariance $V_{t_0|W}$ as defined in Eq. (114).
We note that bounds on probability distributions are usually more difficult to obtain, and more informative, than MSE bounds, as the MSE is determined by the probability distribution. Theorem 9 provides an attainable bound on the tail probability, which can be used to determine the maximal probability that the estimate falls into a confidence region $T_{W,c}$, as well as the optimal measurement. Our proof of Theorem 9 needs some preparation. First, we introduce the concept of simultaneous diagonalization in the sense of symplectic transformations. Two $2k \times 2k$ real symmetric matrices $A_1$ and $A_2$ are called simultaneously symplectically diagonalizable when there exist a symplectic matrix $S$ and two real vectors $\beta_1$ and $\beta_2$ such that the relations below hold, with $E_k$ defined in Eq. (56). Regarding the simultaneous diagonalization, we have the following property, whose proof can be found in "Appendix I". For a sequence of measurements $m := \{M_n\}$ satisfying local asymptotic covariance at $t_0 \in \Theta$, according to Theorem 5, we choose a covariant POVM $M_G$ satisfying (103). Applying Lemma 16 to the POVM $M_G$, we obtain the desired statement.
Step 2: We consider the general case. We choose the local parameter $t' := J^{-1/2}_{t_0} t$. In this coordinate, the inverse of the RLD quantum Fisher information is $I + J$, and the weight matrix has no cross term between the classical and quantum parts. Using the above discussion and Lemma 16, we obtain the desired statement.
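The symplectic conditions used in the proof can be verified numerically for simple one-mode examples; the sketch below (our own illustration) checks that rotations and squeezers preserve the symplectic form while a generic scaling does not:

```python
import numpy as np

# Standard symplectic form on R^2 (one mode): Omega = [[0, 1], [-1, 0]].
Omega = np.array([[0.0, 1.0], [-1.0, 0.0]])

# A phase-space rotation is symplectic: S Omega S^T = Omega.
theta = 0.7
S = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
assert np.allclose(S @ Omega @ S.T, Omega)

# A squeezing transformation diag(r, 1/r) is symplectic as well.
r = 2.0
Sq = np.diag([r, 1.0 / r])
assert np.allclose(Sq @ Omega @ Sq.T, Omega)

# A uniform scaling (determinant != 1) is not symplectic.
M = 2.0 * np.eye(2)
assert not np.allclose(M @ Omega @ M.T, Omega)
```

For $2\times 2$ matrices the symplectic condition is simply $\det S = 1$, which is why both the rotation and the squeezer pass the check.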

Extension to Global Estimation and Generic Cost Functions
In the previous sections, we focused on local models and cost functions of the form $\operatorname{tr} W V[\wp_{t_0,t|m}]$. In this section, our treatment is extended to global models $\{\rho_t\}_{t\in\Theta}$ (where the parameter to be estimated is not restricted to a local neighborhood) and to generic cost functions.

Optimal global estimation via local estimation.
Our optimal global estimation combines the two-step method with locally optimal estimation. The first step is the application of the full tomography proposed in [26] on $n^{1-x/2}$ copies, with outcome $\hat t_0$, for a constant $x \in (0, 2/9)$; the second step is the locally optimal estimation at $\hat t_0$, given in Sect. 6.3, on the remaining $a_{n,x} := n - n^{1-x/2}$ copies. Before the full description, we define the neighborhood $\Theta_{n,x}(t)$ of $t \in \Theta$ as in (167). Given a generic model $\mathcal M = \{\rho_t\}_{t\in\Theta}$ that does not contain any degenerate state, and a weight matrix $W > 0$, the full protocol is as follows.
(A1) Localization: Perform the full tomography proposed in [26] on $n^{1-x/2}$ copies, described by a POVM $M^{\mathrm{tomo}}_{n^{1-x/2}}$, for a constant $x \in (0, 2/9)$. The tomography outputs the first estimate $\hat t_0$ satisfying the guarantee below for any true parameter $t$. (A2) Local estimation: Based on the first estimate $\hat t_0$, apply the optimal local measurement $M^{\hat t_0, a_{n,x}}_W$ given in Theorem 7 with the weight matrix $W$. If the measurement outcome $\hat t_1$ of $M^{\hat t_0, a_{n,x}}_W$ lies in $\Theta_{n,x}(\hat t_0)$, output $\hat t_1$ as the final estimate; otherwise, output $\hat t_0$ as the final estimate.
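A classical analogue of the two-step protocol (A1)-(A2) may clarify its logic. The sketch below (our own stand-in: estimating the mean of a Gaussian distribution, with invented values of n and x) performs a rough localization on $n^{1-x/2}$ samples and a refined estimate on the rest, falling back to the first estimate when the second leaves the localization neighbourhood:

```python
import numpy as np

rng = np.random.default_rng(2)

# Classical stand-in for the two-step protocol: estimate the mean of a
# Gaussian. Step (A1) localizes with n1 = n^(1 - x/2) samples; step (A2)
# refines with the remaining a_nx = n - n1 samples and is accepted only
# if it stays inside the localization neighbourhood.
n, x, t_true = 100_000, 0.2, 1.3
n1 = int(n ** (1 - x / 2))
a_nx = n - n1

data = rng.normal(t_true, 1.0, size=n)
t0 = data[:n1].mean()                      # (A1) rough localization
t1 = data[n1:].mean()                      # (A2) refined local estimate
radius = n ** (-(1 - x) / 2)               # localization neighbourhood
t_hat = t1 if abs(t1 - t0) <= radius else t0

# The final MSE is governed by the second step: roughly 1 / a_nx.
print(n1, a_nx, t_hat)
```

The fallback in the last step mirrors the protocol's rule of outputting $\hat t_0$ when $\hat t_1$ leaves the neighbourhood, which keeps the rare failure events from dominating the MSE.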
Denoting the POVM of the whole process by $m_W = \{M^n_W\}$, we obtain the following theorem.
The bound holds for any point $t_0 \in \Theta$ and any $t \in \Theta_{n,x,c(t_0)}$ corresponding to a non-degenerate state, where $C^{S}(W, t_0)$ is the minimum weighted MSE defined in Eq. (110). More precisely, the lim sup statement holds uniformly on a compact set $K \subset \Theta$, where $V_{t_0|W}$ is defined in Eq. (146) and $\Theta_{n,x,c(t_0)}$ is defined in Eq. (76). Further, when the parameter set is bounded and $x < \kappa$, we have the following relation.
Here, we remark on the key point of the derivation. The existing papers [8,11] addressed the achievability of $\min_M \operatorname{tr} W J^{-1}_{t|M}$ with the two-step method, where $J_{t|M}$ is the Fisher information matrix of the distribution family $\{\wp_{t|M}\}_t$; this expresses the bound for separable measurements [34, Exercise 6.42], and hence it can be called the separable bound. In the one-parameter case, the separable bound equals the Holevo bound. To achieve the separable bound, one need not consider sequences of measurements, and hence no complicated convergence has to be handled; the global achievability of the separable bound can be easily shown by the two-step method [8,11]. In our setting, however, we need to handle sequences of measurements to achieve local optimality. Hence, we need to carefully consider the compact uniformity and the order estimate of the convergence in Theorem 7. In the following proof, we employ our evaluation with such a detailed analysis, as in Eq. (127).

Proof.
Step 1: Define $t_g := t_0 + t/\sqrt n$ to be the true value of the parameters. By definition, we have $\|t_g - \hat t_0\| \le n^{-(1-x)/2}$ with probability $1 - O(e^{-n^{x/2}})$, and $\|t_g - t_0\| \le c(t_0) n^{-1/2+x}$ by definition. Since the error probability vanishes exponentially, it does not affect the scaling of the MSE. In this step, we show the required convergence, which follows by combining Eq. (127) of Theorem 7, applied to $\wp^{a_{n,x}}_{t_0,t|M^{\hat t_0,a_{n,x}}}$, with the corresponding property of $\wp^{n}_{t_0,t|M^{\hat t_0,a_{n,x}}}$. Step 2: We show (170). First, we discuss the two exceptional cases $\|t_g - \hat t_0\| > n^{-(1-x)/2}$ and $\|\hat t_1 - \hat t_0\| > n^{-(1-x)/2}$. Eq. (168) guarantees that the first case occurs with exponentially small probability, and Eq. (175) together with the properties of the normal distribution bounds the probability of the second case. When $\|t_g - \hat t_0\| \le n^{-(1-x)/2}$ and $\|\hat t_1 - \hat t_0\| \le n^{-(1-x)/2}$, Eq. (172) holds under the condition $\|t_g - t_0\| \le c(t_0) n^{-1/2+x}$. Since the above evaluation is compactly uniform with respect to $t_0$, we obtain (170).
The compactness of $\Theta$ guarantees that the error $n(\hat t - t_g)(\hat t - t_g)^T$ is bounded by $nC$ with a constant $C$. Due to (178), the contribution of the first case is bounded by $nC \cdot O(e^{-n^{x/2}})$, which goes to zero.
In the second case, since $\hat t_0 = \hat t$, the error $n(\hat t - t_g)(\hat t - t_g)^T$ is bounded accordingly. Due to (179), the contribution of the second case is bounded by $n^x \cdot O(n^{-\kappa}) = O(n^{x-\kappa})$, which goes to zero.
In the third case, the corresponding bound holds. Due to (175), the contribution of the third case is bounded by $2n^x \cdot O(n^{-\kappa}) = O(n^{x-\kappa})$, which goes to zero. Therefore, we obtain (181).
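The order estimates used above can be checked numerically. In the sketch below (illustrative exponents, not the paper's constants), the contribution $n^x \cdot O(n^{-\kappa})$ decays because $x < \kappa$, while $n \cdot \exp(-n^{x/2})$ vanishes because the stretched exponent eventually dominates $\log n$:

```python
import numpy as np

x, kappa = 0.2, 0.21            # illustrative exponents with x < kappa

# Second case: n^x * O(n^{-kappa}) = O(n^{x - kappa}) -> 0 since x < kappa.
ns = 10.0 ** np.arange(2, 9)
second = ns ** (x - kappa)
assert np.all(np.diff(second) < 0)          # monotonically decreasing

# First case: n * exp(-n^{x/2}) -> 0. Work with t = log n to avoid
# underflow: log(n * exp(-n^{x/2})) = t - exp(t * x / 2) -> -infinity.
ts = np.linspace(10, 400, 40)               # n from e^10 up to e^400
log_first = ts - np.exp(ts * x / 2)
print(second[-1], log_first[-1])
```

Note that for small $x$ the exponential decay only dominates at very large $n$, which is why the log-domain comparison is used instead of evaluating $n e^{-n^{x/2}}$ directly.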

Generic cost functions.
Finally, we show that the results in this work also hold for any cost function $c(\hat t, t)$ that is bounded and has a symmetric expansion, in the sense of satisfying the following two conditions: $c(\hat t, t)$ has a continuous third derivative, so that it can be expanded accordingly. To adapt to this situation, we replace step (A2) by the following step (A2)': (A2)' Based on the first estimate $\hat t_0$, apply the optimal local measurement $M^{\hat t_0, a_{n,x}}$ corresponding to a non-degenerate state.

Theorem 11 reduces to a bound for the (actual) MSE when $c(\hat t, t) = (\hat t - t)^T W (\hat t - t)$ for $W \ge 0$. Therefore, the bounds in this work, Eqs. (125) and (144) for instance, are also attainable bounds for the MSE of any locally asymptotically unbiased measurement. Proof.
Step 1: We prove (1). Consider any sequence of asymptotically covariant measurements $m_{t_0} := \{M_{n,t_0}\}$ at $t_0$. Denote by $t_g := t_0 + t/\sqrt n$ the true value of the parameters. For a cost function $c$ satisfying (ii), we expand the cost around $t_g$. Step 2: We prove (2). We replace $W$ by $W_t$ in the proof of Theorem 10. In this replacement, (173) is replaced by the corresponding bound, where $x \in (0, 2/9)$. Hence, the contributions of the first and second cases of Step 3 of the proof of Theorem 10 go to zero.
In the third case of Step 3 of the proof, we have $\|t_g - \hat t_1\| \le 2 n^{-(1-x)/2}$. Hence, in the contribution of the third case, we can replace the expectation of $n c(\hat t_1, t_g)$ by the weighted MSE with weight $W_{t_g}$, and we obtain part (2).

Applications
In this section, we show how to evaluate the MSE bounds in several concrete examples.

Joint measurement of observables.
Here we consider the fundamental problem of the joint measurement of two observables. For simplicity we analyze qubit systems, although the approach can be readily generalized to arbitrary dimensions. The task is to simultaneously estimate the expectation values of two observables $A$ and $B$ of a qubit system. The observables can be expressed as $A = \boldsymbol a \cdot \boldsymbol\sigma$ and $B = \boldsymbol b \cdot \boldsymbol\sigma$, with $\boldsymbol\sigma = (\sigma_x, \sigma_y, \sigma_z)$ the vector of Pauli matrices. We assume without loss of generality that $|\boldsymbol a| = |\boldsymbol b| = 1$ and $\boldsymbol a \cdot \boldsymbol b \in [0, 1)$. An arbitrary qubit state can be expressed as $\rho = (I + \boldsymbol n \cdot \boldsymbol\sigma)/2$, where $\boldsymbol n$ is the Bloch vector.
With this notation, the task reduces to estimating the parameters $x := \boldsymbol a \cdot \boldsymbol n$ and $y := \boldsymbol b \cdot \boldsymbol n$.
It is also convenient to introduce a third unit vector $\boldsymbol c$ orthogonal to $\boldsymbol a$ and $\boldsymbol b$, so that $\{\boldsymbol a, \boldsymbol b, \boldsymbol c\}$ forms a (non-orthogonal) normalized basis of $\mathbb R^3$. In terms of this vector, we define the parameter $z := \boldsymbol c \cdot \boldsymbol n$. In this way, we extend the problem to the full model containing all qubit states, where $x, y$ are the parameters of interest and $z$ is a nuisance parameter. Under this parameterization, we can evaluate the SLD operators for $x$, $y$, and $z$, as well as the SLD Fisher information matrix and the D-matrix (see "Appendix J" for details); substituting these into the bound (144) yields
$$
\frac{1}{1-|\boldsymbol n|^2+x'^2+2x'y's+y'^2+z^2}
\begin{pmatrix}
1-|\boldsymbol n|^2+y'^2(1-s^2)+z^2 & -\bigl(x'y'(1-s^2)+(1-|\boldsymbol n|^2+z^2)s\bigr) \\
-\bigl(x'y'(1-s^2)+(1-|\boldsymbol n|^2+z^2)s\bigr) & 1-|\boldsymbol n|^2+x'^2(1-s^2)+z^2
\end{pmatrix}
$$
where $s := \boldsymbol a \cdot \boldsymbol b$, $x' = \frac{x - y s}{1 - s^2}$, and $y' = \frac{y - x s}{1 - s^2}$. The tradeoff between the measurement precisions for the two observables is of fundamental interest. Substituting the expressions of the D-matrix and the SLD Fisher information matrix (see "Appendix J") into Eq. (159), we obtain a bound which characterizes the precision tradeoff in joint measurements of qubit observables.
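The quantities entering the bound can be computed numerically for any qubit state. The sketch below (our own implementation; the conventions for $J$ and $D$ may differ from the paper's by normalization) solves the SLD equation $\partial\rho = (L\rho + \rho L)/2$ in the eigenbasis of $\rho$ for the Bloch-vector parametrization, and assembles the SLD Fisher matrix and the D-matrix:

```python
import numpy as np

# Pauli matrices and a qubit state rho = (I + n . sigma) / 2.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [sx, sy, sz]
n = np.array([0.3, 0.2, 0.5])                    # Bloch vector, |n| < 1
rho = 0.5 * (np.eye(2) + sum(c * p for c, p in zip(n, paulis)))

# SLD L_j solves d rho / d n_j = (L_j rho + rho L_j) / 2, with
# d rho / d n_j = sigma_j / 2. In the eigenbasis of rho,
# L_{ab} = 2 (d rho)_{ab} / (lam_a + lam_b).
lam, U = np.linalg.eigh(rho)
slds = []
for p in paulis:
    d = U.conj().T @ (0.5 * p) @ U               # derivative in eigenbasis
    L = 2 * d / (lam[:, None] + lam[None, :])
    slds.append(U @ L @ U.conj().T)

# SLD Fisher matrix J and D-matrix (up to normalization conventions):
# J_jk = Re tr(rho L_j L_k), D_jk = Im tr(rho L_j L_k).
M = np.array([[np.trace(rho @ Lj @ Lk) for Lk in slds] for Lj in slds])
J, D = M.real, M.imag
assert np.allclose(J, J.T) and np.allclose(D, -D.T)
assert np.allclose(slds[0] @ rho + rho @ slds[0], sx)  # SLD equation check
```

The same routine, applied in the $(x, y, z)$ coordinates of the text via the chain rule, yields the matrices whose blocks enter the bound (144).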

Direction estimation in the presence of noise.
Consider the task of estimating a pure qubit state $|\psi\rangle = \cos\frac\theta2 |0\rangle + e^{i\varphi}\sin\frac\theta2 |1\rangle$, which can also be regarded as determining a direction in space, as qubits are often realized in spin-1/2 systems. In a practical setup, it is necessary to take into account the effect of noise, under which the qubit becomes mixed. For noise with strong symmetry, such as depolarization, the usual MSE bound produces a good estimate of the error. For other kinds of noise, it is essential to introduce nuisance parameters and to use the techniques introduced in this paper.
As an illustration, we consider amplitude damping noise, which can be formulated as a channel depending on a parameter $\eta$. In terms of the derivative vector, the SLD for the parameter $x \in \{\theta, \varphi, \eta\}$ takes an explicit form; after some straightforward calculations, we obtain the MSE bound with nuisance parameter $\eta$. An illustration can be found in Fig. 2, with $W = I$ in Eq. (144). The minimum of the sum of the $(x,x)$-th matrix elements of the MSE matrix for $x = \theta, \varphi$ is independent of $\varphi$, which is a result of the symmetry of the problem: the D-matrix does not depend on $\varphi$, and thus an estimate of $\varphi$ can be obtained without affecting the precision of the other parameters. Notice that when the state is close to $|0\rangle$ or $|1\rangle$, it is insensitive to changes of $\theta$, resulting in the cup-shaped curves in Fig. 2. Next, we evaluate the sum of the MSEs of $\varphi$ and $\theta$ when $\eta$ is a (known) fixed parameter, using Eq. (125), and compare it to the nuisance parameter case. The result of the numerical evaluation is plotted in Fig. 3. It is clear from the plot that the variance sum is strictly lower when $\eta$ is treated as a fixed parameter, compared to the nuisance parameter case. This is a good example of how knowledge of a parameter ($\eta$) can assist the estimation of other parameters ($\varphi$ and $\theta$). It is also observed that when the noise is larger (i.e., when $\eta$ is smaller), the gain in precision from knowing $\eta$ is also bigger.
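For concreteness, amplitude damping can be simulated directly from its Kraus representation; the sketch below uses a common convention in which $\eta$ is the survival probability ($\eta = 1$ noiseless), which may differ from the paper's parametrization:

```python
import numpy as np

def amplitude_damping(rho, eta):
    """Apply the amplitude damping channel with survival probability eta."""
    K0 = np.array([[1, 0], [0, np.sqrt(eta)]], dtype=complex)
    K1 = np.array([[0, np.sqrt(1 - eta)], [0, 0]], dtype=complex)
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

# Pure state |psi> = cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>.
theta, phi = 1.0, 0.5
psi = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
rho = np.outer(psi, psi.conj())

out = amplitude_damping(rho, eta=0.8)
assert np.isclose(np.trace(out).real, 1.0)        # trace preserving
assert np.all(np.linalg.eigvalsh(out) >= -1e-12)  # output is a valid state
```

Applying the SLD machinery from the previous subsection to the output family in $(\theta, \varphi, \eta)$ reproduces the nuisance-parameter analysis numerically.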

Multiphase estimation with noise.
Here we consider a noisy version of the multiphase estimation setting [20,51]. This problem was first studied in [20], where the authors derived a lower bound for the quantum Fisher information and conjectured that it was tight. Under local asymptotic covariance, we can now derive an attainable bound and show its equivalence to the SLD bound using the orthogonality of the nuisance parameters, which proves the conjecture.
Our techniques also allow us to resolve an open issue in Ref. [20], where it was unclear whether or not the best precision depended on knowledge of the noise. Using Corollary 4, we will see that knowing the strength of the noise a priori does not help to decrease the estimation error.
The setting is illustrated in Fig. 4. Due to photon loss, the phase-shift operation is no longer unitary; instead, it corresponds to a noisy channel with a Kraus form in which $\eta = 0$ corresponds to the noiseless scenario. We consider a pure input state with $N$ photons in the "generalized NOON form". In the output state from the noisy multiphase evolution, $\rho_\eta$ is independent of $t$. Notice that the output state is supported on the finite set of orthonormal states $\{|n\rangle_j : j = 0, \ldots, d,\ n = 0, \ldots, N\}$, and thus it is within the scope of this work. In this case, $\{t_j\}$ are the parameters of interest, while $\alpha_\eta$ and $p_\eta$ can be regarded as nuisance parameters. The SLD operators for these parameters can be calculated explicitly, where $P_{H^\perp}$ refers to the projection onto the space orthogonal to $|\psi_{\eta,t}\rangle$. Notice that $p_\eta$ and $\alpha_\eta$ are orthogonal to the other parameters, in the sense that $\operatorname{Tr} \rho L_{t_j} L_{p_\eta} = \operatorname{Tr} \rho L_{\alpha_\eta} L_{p_\eta} = 0$ and $\operatorname{Tr} \rho L_{t_j} L_{\alpha_\eta} = \frac{2 i p_\eta \sin 2\alpha_\eta}{d}$. Therefore, the SLD Fisher information matrix and the D-matrix take a simple form. Substituting them into the bound (144), we immediately get an attainable bound for any locally asymptotically covariant measurement $m$. Taking $W$ to be the identity, one sees that for small $\eta$ the sum of the variances scales as $N^2/d^2$, while for $\eta \to 1$ it scales as $N^2/d$, losing the boost in scaling compared to separate measurements of the phases. The bound (186) coincides with the SLD bound and the RLD bound. By Corollary 4, we conclude that the SLD (RLD) bound can be attained in the case of joint estimation of multiple phases. In addition, we stress that the ultimate precision does not depend on whether or not the noise parameter $\eta$ is known a priori: if $\eta$ is unknown, one can obtain the same precision as when $\eta$ is known by estimating $\eta$ without disturbing the parameters of interest.

Conclusion
In this work, we completely solved the attainability problem of precision bounds for quantum state estimation under the local asymptotic covariance condition. We provided an explicit construction for the optimal measurement which attains the bounds globally.
The key building block of the optimal measurement is quantum local asymptotic normality, derived in [16,17] for a particular type of parametrization and generalized here to arbitrary parameterizations. Besides the MSE bound, we also derived a bound for the tail probability of the estimate. Our work provides a general tool for constructing benchmarks and optimal measurements in multiparameter state estimation. In Table 3, we compare our results with existing results.
Here, we remark on the relation to the results of Yamagata et al. [19], who showed a similar achievability statement in a local model scenario using a variant of local quantum asymptotic normality. In Theorem 7, we have shown compact uniformity together with an order estimate for the convergence, but they did not show such properties. In the evaluation of the global estimator, these properties of the convergence are essential. The difference between our evaluation and theirs comes from the key tools.
The key tool of our derivation is Q-LAN (Proposition 2) from [16,17], which gives a state conversion, i.e., TP-CP maps converting the state family, with a precise evaluation of the trace norm. In contrast, their method is based on the algebraic central limit theorem [38,52], which gives only the behavior of the expectations of functions of the operators $R_i$. The idea of applying this method to the achievability of the Holevo bound was first mentioned in [18]; Yamagata et al. [19] developed the detailed discussion in this direction.
Indeed, the algebraic version of Q-LAN from [38,52] can be directly applied to the vector $X$ of Hermitian matrices to achieve the Holevo bound, while the use of the state-conversion version of Q-LAN requires a somewhat complicated procedure to handle the vector $X$ of Hermitian matrices, which is the disadvantage of our approach. However, since the algebraic version of Q-LAN does not give a state conversion directly, it is quite difficult to obtain the compact uniformity and the order estimate of the convergence with it. In this paper, to overcome the disadvantage of our approach, we derived several advanced properties of Gaussian states in Sects. 3.2 and 3.3 using the symplectic structure. Using these properties, we could smoothly handle the complicated procedure needed to fill the gap between the full qudit model and an arbitrary submodel.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A. Proof of Lemma 2
In this appendix, we prove Lemma 2. To this aim, we discuss the existence of a PDF. First, we show the following lemma.

Lemma 19.
Let $P$ be a probability measure on $\mathbb R$. Define the location shift family $\{P_t\}$ by $P_t(B) := P(B + t)$. For an arbitrary disjoint decomposition $\mathcal A := \{A_i\}$ of $\mathbb R$, we assume that the probability distribution family $\{P_{\mathcal A,t}\}$ has finite Fisher information $J_{\mathcal A,t}$, where $P_{\mathcal A,t}(i) := P_t(A_i)$. We also assume that $J_t := \sup_{\mathcal A} J_{\mathcal A,t} < \infty$. Also, we define $x_+ := \inf\{x' \mid P((x', \infty)) = 0\}$ and $x_- := \sup\{x' \mid P((-\infty, x']) = 0\}$.
Then, the following holds. Proof. Assume that $x_+ < \infty$, and choose a suitable decomposition. Evaluating the fidelity between $P_{\mathcal A,0}$ and $P_{\mathcal A,t}$ then implies the existence of the density $p(x)$. Letting $d \to 0$, we find that $p(x)$ is Hölder continuous with order 1/2.
Using the previous Lemma, we are in position to prove Lemma 2.
Proof of Lemma 2. Let $\mathcal A := \{A_i\}$ be an arbitrary disjoint finite decomposition of $\mathbb R$. Let $G_{\mathcal A}$ be the coarse-graining map from a distribution on $\mathbb R$ to a distribution on the meshes $A_i$. Then, the Fisher information $J_{\mathcal A,n,t}$ of $\{G_{\mathcal A}(\wp^n_{t_0,t|M_n})\}_t$ is not greater than the Fisher information $J_t$ of $\{\rho^n_{t_0,t}\}_t$. Hence, the Fisher information $J_{\mathcal A,t}$ of $\{G_{\mathcal A}(\wp_{t_0,t|m})\}_t$ satisfies the same bound. Therefore, we can apply Lemma 19 to $\wp_{t_0,t|m}$; Lemma 19 guarantees the existence of the PDF of the limiting distribution $\wp_{t_0,t|m}$.

B. Lemmas Used for Asymptotic Evaluations
In this appendix, we prepare two lemmas for the asymptotic evaluation of information quantities of probability distributions.
Lemma 20. Assume that a sequence of pairs of probability distributions $\{(P_n, Q_n)\}$ on $\mathbb R$ converges to a pair of probability distributions $(P, Q)$ on $\mathbb R$. Then, the following inequality holds.
Proof. Let $p$ and $q$ be the Radon-Nikodým derivatives of $P$ and $Q$ with respect to $P + Q$. The claim follows from Eq. (199) together with the information-processing inequality for the fidelity; since the number of meshes is finite, the limit can be exchanged with the finite sum.
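The information-processing inequality for the fidelity invoked in the proof can be checked numerically: coarse-graining two distributions into meshes can only increase their fidelity, a consequence of the Cauchy-Schwarz inequality applied mesh by mesh. The distributions below are randomly generated for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def fidelity(p, q):
    """Classical (Bhattacharyya) fidelity of two discrete distributions."""
    return np.sum(np.sqrt(p * q))

# Two distributions on a fine grid of 100 points.
p = rng.random(100); p /= p.sum()
q = rng.random(100); q /= q.sum()

# Coarse-grain into 10 meshes by summing blocks of 10 consecutive bins.
P = p.reshape(10, 10).sum(axis=1)
Q = q.reshape(10, 10).sum(axis=1)

# Information-processing inequality: coarse-graining cannot decrease
# the fidelity.
assert fidelity(P, Q) >= fidelity(p, q) - 1e-12
print(fidelity(p, q), fidelity(P, Q))
```

Per mesh, $\sqrt{(\sum_i p_i)(\sum_i q_i)} \ge \sum_i \sqrt{p_i q_i}$ by Cauchy-Schwarz, and summing over meshes gives the inequality checked above.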

Lemma 21.
Let $\Theta$ be an open subset of $\mathbb R^{k'}$. Assume that a sequence of probability distributions $\{P_{t,n}\}_{t\in\Theta}$ on $\mathbb R^{k'}$ converges to a family of probability distributions $\{P_t\}_{t\in\Theta}$ on $\mathbb R^{k'}$. We denote their $\epsilon$-difference Fisher information matrices by $J^n_{t,\epsilon}$ and $J_{t,\epsilon}$, respectively. For a vector $t$ and $\epsilon > 0$, we also assume that there exists a Hermitian matrix $J$ such that $J^n_{t,\epsilon} \le J$. Then, $P_{t+\epsilon e_j}$ is absolutely continuous with respect to $P_t$ for $j = 1, \ldots, k'$, and the inequality (204) holds for any complex vector $a \in \mathbb C^{k'}$.
Proof of Lemma 21. Since $J^n_{t,\epsilon}$ and $J_{t,\epsilon}$ are real matrices, it is sufficient to show (204) for a real vector $a$. In this proof, we fix the vector $t$.
Step (1): We show that $P_{t+\epsilon e_j}$ is absolutely continuous with respect to $P_t$ for $j = 1, \ldots, k'$ by contradiction. Assume that there exists an integer $j$ such that $P_{t+\epsilon e_j}$ is not absolutely continuous with respect to $P_t$. Then there exists a Borel set $B \subset \mathbb R^{k'}$ such that $P_{t+\epsilon e_j}(B) > 0$ and $P_t(B) = 0$. Let $G$ be the coarse-graining map from a distribution $P$ on $\mathbb R^{k'}$ to the binary distribution $(P(B), P(B^c))$ on the two events $\{B, B^c\}$. Let $J_{t,B,\epsilon}$ and $J^n_{t,B,\epsilon}$ be the $\epsilon$-difference Fisher information matrices of $\{G(P_t)\}$ and $\{G(P_{t,n})\}$, respectively. The information-processing inequality implies that $J^n_{t,B,\epsilon} \le J^n_{t,\epsilon} \le J$. Also, $J^n_{t,B,\epsilon} \to J_{t,B,\epsilon}$ as $n \to \infty$. Hence, $J_{t,B,\epsilon} \le J$. However, the $j$-th diagonal element of $J_{t,B,\epsilon}$ is infinite, a contradiction.
Step (2): Let $p_{t+\epsilon e_j}$ be the Radon-Nikodým derivative of $P_{t+\epsilon e_j}$ with respect to $P_t$. We show (205) for $N, \epsilon > 0$ and any integer $j = 1, \ldots, k'$ by contradiction. We denote the LHS of (205) by $C_j$ and assume that there exists an integer $j$ such that $C_j > 0$. We set $R = (J)_{j,j}/C_j + 2$. Setting $B$ to be $\{x \mid p_{t+\epsilon e_j}(x) > R\}$, we repeat the same discussion as in Step (1) and obtain a contradiction.
Step (3): We show (204) for a real vector $a$. We define suitable subsets as follows. Given $R > 0$, let $G_R$ be the coarse-graining map from a distribution on $\mathbb R^{k'}$ to a distribution on the family of measurable sets $\{B \subset \mathbb R^{k'} \setminus C_R\} \cup \{C_R\}$, where $B$ is any Borel set in $\mathbb R^{k'} \setminus C_R$. Given $N > 0$ and $R > 0$, let $G_{N,R}$ be the coarse-graining map from a distribution on $\mathbb R^{k'}$ to a distribution on the corresponding family of measurable sets, where $B$ is any Borel subset of $\mathbb R^{k'} \setminus C_{N,R}$. Given $N > 0$, $R > 0$, and $N' > 0$, let $G_{N,N',R}$ be the coarse-graining map from a distribution on $\mathbb R^{k'}$ to a distribution on meshes. The Lebesgue convergence theorem guarantees the required limit. Let $J^n_{t,\epsilon,N,R,N'}$ be the $\epsilon$-difference Fisher information matrix of the distribution family $\{G_{N,R,N'}(p_{t,n})\}_t$. Then, the information-processing inequality (93) for the $\epsilon$-difference Fisher information matrix yields the corresponding bound. Since the number of meshes is finite, the limits can be exchanged. Hence, using (210), (211), (212), and (215), we have $\liminf_{n\to\infty} \langle a| J^n_{t,\epsilon} - J_{t,\epsilon} |a\rangle \ge 0$.

C. Proof of Lemma 3
Before starting our proof of Lemma 3, we prepare the following lemmas.

Lemma 22. Consider a canonical quantum Gaussian state family $\{\Phi[\theta, \beta]\}$. When a symplectic matrix $S$ satisfies $S E_q(e^{-\beta V}) S^T = E_q(e^{-\beta V})$, where $E_q$ is the matrix defined in Eq. (56), there exists a unitary operator $U_S$ realizing the transformation $S$ on the family.

Proof. Consider any coordinate $\theta' = (\theta_C, \theta'_Q)$, where $\theta'_Q$ is obtained by a reversible linear transformation $S$ acting on the Q-LAN coordinate $\theta_Q$, i.e. $\theta'_Q = S\theta_Q$.
Let $x = (q_1, p_1, \ldots, q_q, p_q)^T$. We then have an expression for the state in terms of $y := (S^{-1})^T x$, where $Z_\beta > 0$ is a normalizing constant. Now, by the definition of $E_q(x)$ in Eq. (56) and the relation $S E_q(e^{-\beta V}) S^T = E_q(e^{-\beta' V})$, $S$ must be of the block diagonal form $S = \bigoplus_i O_{s_i}$. Here $\{s_i\}$ is a partition of $\{1, \ldots, 2q\}$ with $j, k \in s_i$ if and only if $\beta_j = \beta_k$, and $O_{s_i}$ is an orthogonal matrix acting on the components $j \in s_i$. Since $\beta V$, $e^{-\beta V}$, and $\ln \beta V$ are in one-to-one correspondence, we have $S E_q(e^{-\beta V}) S^T = E_q(e^{-\beta V})$. Substituting this into Eq. (217), we see that $(S^{-1})^T$ can be regarded as a transformation of $x$. Finally, $S$ is symplectic since $S D S^T = D$, and there exists a unitary $U_S$ with the required property [50].

We now turn to the proof of Lemma 3. We introduce the classical parameters $\theta_C$ and the quantum parameters $\theta_Q$ on $\mathrm{Ker}\,\mathrm{Im}(\Gamma)$ and $\mathrm{Supp}\,\mathrm{Im}(\Gamma)$, respectively. That is, the classical parameter $\theta_C$ and the quantum parameter $\theta_Q$ are given by an invertible linear transformation $T$ such that $\theta := (\theta_C, \theta_Q) = T t$ satisfies the required separation. Since this separation is unique up to linear conversion, and since any classical Gaussian states can be converted to each other via scale conversion, the remaining problem is to show the desired statement for the quantum part.

Next, we focus on the quantum part $((T^{-1})^T \Gamma T^{-1})_Q$ of the Hermitian matrix $(T^{-1})^T \Gamma T^{-1}$. It is now convenient to define a matrix $A$ whose role is to normalize the D-matrix. Indeed, since the relevant matrix is a real symmetric matrix, there exist a symplectic matrix $S_0$ and a vector $\beta$ realizing the normal form [53]. Meanwhile, we have $S S_0 A^{-1} \mathrm{Im}(((T^{-1})^T \Gamma T^{-1})_Q) A^{-1} S_0^T S^T = \Omega_{d_Q}$ since $S$ is symplectic. Overall, when $T$ is given as $(I \oplus (S S_0 A^{-1}))T$, the desired requirement is satisfied. The uniqueness of $\beta$ is guaranteed by the uniqueness of symplectic eigenvalues. Hence, when two linear conversions $T$ and $\tilde{T}$ satisfy the condition of the statement, $T \Gamma T^T = \tilde{T} \Gamma \tilde{T}^T$.
Thus, Lemma 22 guarantees that the canonical Gaussian states $G[T^{-1}\alpha, T\Gamma T^T]$ and $G[\tilde{T}^{-1}\alpha, \tilde{T}\Gamma\tilde{T}^T]$ are unitarily equivalent.
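The uniqueness of $\beta$ above rests on the uniqueness of symplectic eigenvalues. As a numerical illustration outside the proof, the following Python sketch computes the symplectic eigenvalues of a covariance matrix as the moduli of the eigenvalues of $\Omega V$ (Williamson's theorem); the function name and example matrices are hypothetical.

```python
import numpy as np

def symplectic_eigenvalues(V):
    """Symplectic eigenvalues of a positive-definite 2q x 2q covariance
    matrix V, computed as the moduli of the eigenvalues of Omega @ V
    (Williamson's theorem); each value occurs in a +/- imaginary pair."""
    q = V.shape[0] // 2
    # Symplectic form in the (q1, p1, ..., qq, pq) ordering.
    omega = np.kron(np.eye(q), np.array([[0.0, 1.0], [-1.0, 0.0]]))
    return np.sort(np.abs(np.linalg.eigvals(omega @ V)))[::2]

# A squeezed single-mode covariance diag(s, 1/s) has symplectic eigenvalue 1;
# a thermal one diag(n, n) has symplectic eigenvalue n.
assert np.allclose(symplectic_eigenvalues(np.diag([4.0, 0.25])), [1.0])
assert np.allclose(symplectic_eigenvalues(np.diag([3.0, 3.0])), [3.0])
```

Because the symplectic spectrum is invariant under symplectic congruence, two normal forms of the same $\Gamma$ must share the same $\beta$, which is exactly the uniqueness used above.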

D. Proof of Lemma 5
(3) ⇒ (1): When a Gaussian state family is given in the form of the RHS of (64), it is clearly D-invariant. Hence, the D-invariance is equivalent to condition (2).
(2) ⇒ (3): First, we separate the system into the classical and quantum parts. In the Gaussian state family $\{G[\alpha, \Gamma]\}$ this separation can be done by considering the kernel of $\mathrm{Im}(\Gamma)$ as in the proof of Lemma 3. In the Gaussian state family $G[T(t), \Gamma]$ this separation can be done by considering the kernel of the D-matrix $D_0$ in the same way. Since the relation (64) for the classical part is easily verified, we show the relation (64) in the case where only the quantum part exists.
Under the above assumption, we define the $k \times (d-k)$ matrix $\tilde{T}$ such that $F := (T \oplus \tilde{T})$ is invertible and $T^T A^{-1} \tilde{T} = 0$. Then, Lemma 3 guarantees that $G[F(t, t'), \Gamma]$ is unitarily equivalent to $G[(t, t'), F^{-1}\Gamma(F^T)^{-1}]$. Since $T^T A^{-1} \tilde{T} = 0$, we obtain the required decomposition. Putting $t' = 0$, we obtain the condition (3).
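The construction of the complement $\tilde{T}$ with $T^T A^{-1} \tilde{T} = 0$ can be sketched numerically. The following Python fragment is an illustration with assumed dimensions (for concreteness, $T$ of size $d \times k$ and the complement of size $d \times (d-k)$, with generic random inputs): it builds $\tilde{T}$ as a basis of the kernel of $T^T A^{-1}$ and checks that $F = (T \oplus \tilde{T})$ is invertible.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 4, 2
A = rng.standard_normal((d, d))
A = A @ A.T + d * np.eye(d)              # positive-definite A
T = rng.standard_normal((d, k))          # generic full-column-rank T

# Columns of Tp span ker(T^T A^{-1}), so T^T A^{-1} Tp = 0 by construction.
M = T.T @ np.linalg.inv(A)               # k x d
_, _, Vt = np.linalg.svd(M)
Tp = Vt[k:].T                            # d x (d - k) kernel basis

F = np.hstack([T, Tp])                   # F = (T ⊕ Tp)
assert np.allclose(T.T @ np.linalg.inv(A) @ Tp, 0.0)
assert abs(np.linalg.det(F)) > 1e-8
```

Invertibility of $F$ is automatic: any vector in both the range of $T$ and the kernel of $T^T A^{-1}$ must vanish, because $T^T A^{-1} T$ is positive definite.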

E. Proof of Lemma 6
Since Lemma 3 shows that general Gaussian states can be reduced to the canonical Gaussian states, we discuss only the canonical Gaussian states.
Step 1: We show the statement in the case where we have only the quantum part and $X = \mathbb{R}$. For a given state $\rho$, we define the POVM $M[\rho]$. When $\rho$ is a squeezed state with $\operatorname{Tr} \rho Q_j = \operatorname{Tr} \rho P_j = 0$, the output distribution $\wp_{\alpha|M[\rho]}$ of $M[\rho]$ is the $2d_Q$-dimensional normal distribution with average $\alpha$ and covariance matrix [5] $E_{d_Q}(\beta) + V_\rho$, where
$$V_\rho := \begin{pmatrix} (\operatorname{Tr} Q_i Q_j \rho)_{i,j} & (\operatorname{Tr} Q_i P_j \rho)_{i,j} \\ (\operatorname{Tr} P_i Q_j \rho)_{i,j} & (\operatorname{Tr} P_i P_j \rho)_{i,j} \end{pmatrix}.$$
In the single-mode case, without loss of generality, we can assume that $W$ is a diagonal matrix $\operatorname{diag}(w_1, w_2)$, because this diagonalization can be achieved by applying an orthogonal transformation between $Q$ and $P$. In the multi-mode case, we choose a symplectic matrix $S$ such that $S W S^T$ is a diagonal matrix with diagonal elements $w_1, w_2, \ldots, w_{2d_Q}$. The matrix
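The single-mode diagonalization step can be illustrated numerically. The following minimal Python sketch (with a hypothetical weight matrix $W$, not taken from the paper) shows that the orthogonal eigenbasis rotation between $Q$ and $P$ is also symplectic in one mode, so assuming $W$ diagonal costs no generality.

```python
import numpy as np

# Hypothetical single-mode weight matrix W (symmetric, positive definite).
W = np.array([[2.0, 0.7],
              [0.7, 1.0]])

w, O = np.linalg.eigh(W)                 # W = O @ diag(w) @ O.T
if np.linalg.det(O) < 0:                 # flip a column to get a proper rotation
    O[:, 0] *= -1.0

Omega = np.array([[0.0, 1.0], [-1.0, 0.0]])

# A rotation between Q and P is both orthogonal and symplectic in one mode.
assert np.allclose(O.T @ Omega @ O, Omega)   # O preserves the symplectic form
assert np.allclose(O.T @ W @ O, np.diag(w))  # W is diagonalized
```

In the multi-mode case the analogous statement requires a genuinely symplectic (generally non-orthogonal) $S$, as used in the proof.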

G. Proof of Lemma 12
Denote by $Q(y) := \frac{1}{\pi^{k/2}} \langle y|F|y\rangle$ the Q-function of $F$ [54]. Expanding displaced thermal states into a convex combination of coherent states, Eq. (104) can be rewritten accordingly. Taking the Fourier transform $\mathcal{F}_{y\to\xi}(g) := \int dy\, e^{i y\cdot\xi} g(y)$ on both sides, we get the transformed relation. In addition, we know that the P-function $P(y)$ [55] of $F$ can be evaluated via the Q-function (see, for instance, [56]). The combination of (236) and (237) yields the desired identity. By definition of the P-function $P(y)$, $F$ satisfies the stated relation. Conversely, assume that $F$ is given by (105). Then, we choose the function $Q(\alpha)$ to satisfy (240). Applying the inverse of $\mathcal{F}_{\alpha\to\xi}$ to (239), we obtain (235). The combination of (235) and (240) implies (104).
In the classical case, the covariant measurement is unique. Hence, we obtain the extension as in Lemma 16.
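The relation between the P-function and the Q-function used in this proof is a Gaussian convolution: the Q-function equals the P-function smoothed by the vacuum kernel. The following single-mode Python check is illustrative only (for a thermal state with mean photon number $N$, each quadrature of the P-function has variance $N/2$ and the vacuum kernel variance $1/2$, so the Q-function must have variance $(N+1)/2$).

```python
import numpy as np

# Single-mode check that Q = P * (vacuum Gaussian kernel) for a thermal state.
N = 2.0
x = np.linspace(-12.0, 12.0, 2401)
dx = x[1] - x[0]

def gauss(x, var):
    return np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

P = gauss(x, N / 2.0)                # P-function quadrature marginal
vacuum = gauss(x, 0.5)               # vacuum smoothing kernel
Q_conv = np.convolve(P, vacuum, mode="same") * dx
Q_exact = gauss(x, (N + 1.0) / 2.0)  # predicted Q-function marginal
assert np.max(np.abs(Q_conv - Q_exact)) < 1e-6
```

Variances add under convolution of Gaussians, which is the content of the smoothing relation between (236) and (237) in the single-mode Gaussian case.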

I. Proof of Lemma 17
(i) ⇒ (ii): Since $S^{-1} A_2 (S^T)^{-1} = (S^T A_2^{-1} S)^{-1}$, $S^T A_1 S$, and $D$ commute with each other, we obtain the first relation. Since $S^T D S = D$, we have $S^T D = D S^{-1}$ and $D S = (S^T)^{-1} D$. Thus we obtain the second relation, which implies (ii).
(ii) ⇒ (i): By (ii), $S^{-1} A_2 (S^T)^{-1}$ commutes with $S^T A_1 S D$. Hence, there exists an orthogonal matrix $S'$ such that $S S'$ is a symplectic matrix, and $(S S')^T A_1 (S S')$ and $(S S')^{-1} A_2 ((S S')^T)^{-1}$ are diagonal matrices. Considering the inverse of $A_2^{-1}$, we obtain (i).
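The matrix identities $S^T D = D S^{-1}$ and $D S = (S^T)^{-1} D$ used in this step follow directly from $S^T D S = D$. A minimal Python sketch with a hypothetical two-mode symplectic matrix (a rotation on one mode, a squeezer on the other; the example is not from the paper):

```python
import numpy as np

Omega = np.array([[0.0, 1.0], [-1.0, 0.0]])
D = np.kron(np.eye(2), Omega)          # two-mode symplectic form

def rot(t):                            # rotation: orthogonal and symplectic
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s], [-s, c]])

def squeeze(s):                        # squeezing: symplectic, not orthogonal
    return np.diag([s, 1.0 / s])

# A two-mode symplectic S: rotation on mode 1, squeezing on mode 2.
S = np.zeros((4, 4))
S[:2, :2] = rot(0.3)
S[2:, 2:] = squeeze(2.0)

assert np.allclose(S.T @ D @ S, D)                   # S^T D S = D
assert np.allclose(S.T @ D, D @ np.linalg.inv(S))    # S^T D = D S^{-1}
assert np.allclose(D @ S, np.linalg.inv(S.T) @ D)    # D S = (S^T)^{-1} D
```

Both identities are obtained from $S^T D S = D$ by multiplying on the right by $S^{-1}$ or on the left by $(S^T)^{-1}$, exactly as in the proof.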