Attaining the ultimate precision limit in quantum state estimation

We derive a bound on the precision of state estimation for finite dimensional quantum systems and prove its attainability in the generic case where the spectrum is non-degenerate. Our results hold under an assumption called local asymptotic covariance, which is weaker than unbiasedness or local unbiasedness. The derivation is based on an analysis of the limiting distribution of the estimator's deviation from the true value of the parameter, and takes advantage of quantum local asymptotic normality, a useful asymptotic characterization of identically prepared states in terms of Gaussian states. We first prove our results for the mean square error of a special class of models, called D-invariant, and then extend the results to arbitrary models, generic cost functions, and global state estimation, where the unknown parameter is not restricted to a local neighbourhood of the true value. The extension includes a treatment of nuisance parameters, i.e. parameters that are not of interest to the experimenter but nevertheless affect the precision of the estimation. As an illustration of the general approach, we provide the optimal estimation strategies for the joint measurement of two qubit observables, for the estimation of qubit states in the presence of amplitude damping noise, and for noisy multiphase estimation.


November 5, 2018

1. Introduction.
Quantum estimation theory is one of the pillars of quantum information science, with a wide range of applications from evaluating the performance of quantum devices [1,2] to exploring the foundations of physics [3,4]. In the typical scenario, the problem is specified by a parametric family of quantum states, called the model, and the objective is to design measurement strategies that estimate the parameters of interest with the highest possible precision. The precision measure is often chosen to be the mean square error (MSE), and is lower bounded through generalizations of the Cramér-Rao bound of classical statistics [5,6]. Given n copies of a quantum state, such generalizations imply that the product MSE · n converges to a positive constant in the large n limit. Despite many efforts made over the years (see, e.g., [5,6,7,8,9,10,11,12] and [13] for a review), the attainability of the precision bounds of quantum state estimation has only been proven in a few special cases. Consider, as an example, the most widely used bound, namely the symmetric logarithmic derivative Fisher information bound (SLD bound, for short). The SLD bound is tight in the one-parameter case [5,6], but is generally non-tight in multiparameter estimation. Intuitively, measuring one parameter may affect the precision in the measurement of another parameter, which makes it highly nontrivial to construct the optimal measurement. Another bound for multiparameter estimation is the right logarithmic derivative Fisher information bound (RLD bound, for short) [5]. Its achievability was shown in the Gaussian-state case [5], the qubit case [14,15], and the qudit case [16,17]. In these cases the RLD bound is attainable and, in this sense, superior to the SLD bound. However, the RLD bound holds only when the family of states to be estimated satisfies an ad hoc mathematical condition.
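As a concrete illustration of the SLD machinery mentioned above, the following minimal numerical sketch computes the SLD and the SLD Fisher information for an assumed one-parameter qubit model ρ_t = (I + r(cos t σ_x + sin t σ_y))/2. The model, the parameter values, and all names are illustrative choices of this snippet, not objects taken from the paper.

```python
import numpy as np

# Illustrative qubit model (an assumption of this sketch, not from the paper):
# rho_t = (I + r(cos t X + sin t Y))/2.  The SLD L solves the Lyapunov
# equation  d(rho)/dt = (L rho + rho L)/2.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def rho(t, r=0.8):
    return 0.5 * (I2 + r * (np.cos(t) * X + np.sin(t) * Y))

def sld(t, r=0.8, h=1e-6):
    drho = (rho(t + h, r) - rho(t - h, r)) / (2 * h)   # central difference
    # Solve the Lyapunov equation in the eigenbasis of rho_t:
    # L_{jk} = 2 drho_{jk} / (p_j + p_k)
    p, U = np.linalg.eigh(rho(t, r))
    d = U.conj().T @ drho @ U
    L = 2 * d / (p[:, None] + p[None, :])
    return U @ L @ U.conj().T

t0, r = 0.3, 0.8
L = sld(t0, r)
J = np.trace(rho(t0, r) @ L @ L).real
print(J)   # for this rotation model a direct calculation gives J = r^2
```

For this model the Bloch vector rotates at unit speed on a circle of radius r, and a short calculation gives J = r², which the finite-difference computation reproduces.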
The most general quantum extension of the classical Cramér-Rao bound to date is the Holevo bound [5], which gives the maximum among all existing lower bounds on the error of unbiased measurements for the estimation of any family of states. The attainability of the Holevo bound was studied in the pure-state case [10] and the qubit case [14,15], and was conjectured to be generic by one of us [18]. Yamagata, Fujiwara, and Gill [19] addressed the attainability question in a local scenario, showing that the Holevo bound can be attained under certain regularity conditions. However, the attaining estimator constructed therein depends on the true parameter, and therefore has limited practical interest. Meanwhile, the need for a general, attainable bound on multiparameter quantum estimation is increasing, as more and more applications are being investigated [20,21,22,23,24].
In this work we explore a new route to the study of precision limits in quantum estimation. This new route allows us to prove the asymptotic attainability of the Holevo bound in generic scenarios, to extend its validity to a broader class of estimators, and to derive a new set of attainable precision bounds. We adopt the condition of local asymptotic covariance [18] which is less restrictive than the unbiasedness condition [5] assumed in the derivation of the Holevo bound. Under local asymptotic covariance, we characterize the MSE of the limiting distribution, namely the distribution of the estimator's rescaled deviation from the true value of the parameter in the asymptotic limit of n → ∞.
Our contribution can be divided into two parts: the attainability of the Holevo bound, and the proof that the Holevo bound still holds under the weaker condition of local asymptotic covariance. To show the achievability part, we employ quantum local asymptotic normality (Q-LAN), a useful characterization of n-copy d-dimensional (qudit) states in terms of multimode Gaussian states. Q-LAN was derived for the qubit case in [14,15], and for full parametric models of states with non-degenerate spectrum by Kahn and Guta [16,17]. Here we extend this characterization to a larger class of models, called D-invariant models, using a technique of symplectic diagonalization. For models that are not D-invariant, we derive an achievable bound, expressed in terms of a quantum Fisher information-like quantity that can be straightforwardly evaluated. Whenever the model consists of qudit states with non-degenerate spectrum, this quantity turns out to be equal to the quantity in the Holevo bound [5]. Our evaluation is uniform on compact sets and comes with an estimate of the order of convergence, which will allow us to prove the achievability of the bound even in the global setting.
We stress that, until now, the most general proof of the Holevo bound required the condition of local unbiasedness. In particular, no previous study showed the validity of the Holevo bound under the weaker condition of local asymptotic covariance in the multiparameter scenario. To avoid employing the (local) unbiasedness condition, we focus on the discretized version of the RLD Fisher information matrix, introduced by Tsuda and Matsumoto [25]. Using this version of the RLD Fisher information matrix, we manage to handle the local asymptotic covariance condition and to establish the validity of the Holevo bound in this broader scenario. Remarkably, the validity of the bound requires neither finite dimensionality of the system nor non-degeneracy of the states in the model. Our result also provides a simpler way to evaluate the Holevo bound, whose original expression involved a difficult optimization over a set of operators.
The advantage of local asymptotic covariance over local unbiasedness is the following. For practical applications, the estimator needs to attain the lower bound globally, i.e., at all points in the parameter set. However, it is quite difficult to meet this desideratum under the condition of local unbiasedness, even if we employ a two-step method based on a first rough estimate of the state, followed by the measurement that is optimal in the neighbourhood of the estimate. In this paper, we construct a locally asymptotically covariant estimator that achieves the Holevo bound at every point, for any qudit submodel except those with degenerate states. Our construction proceeds in two steps. In the first step, we perform full tomography of the state, using the protocol proposed in [26]. In the second step, we implement a locally optimal estimator based on Q-LAN [16,17]. The two-step estimator works even when the estimated parameter is not assumed to be in a local neighbourhood of the true value. The key tool to prove this property is our evaluation of the optimal local estimator, which is uniform on compact sets and includes the order of convergence. A comparison between the approach adopted in this work (in green) and the conventional approach to quantum state estimation (in blue) can be found in Figure 1.
Our method can be extended from the MSE to arbitrary cost functions. Besides the attainability of the Holevo bound, the method can be used to derive a broad class of bounds for quantum state estimation. Under suitable assumptions, we characterize the tail of the limiting distribution, providing a bound on the probability that the estimate falls out of a confidence region. The limiting distribution is a good approximation of the (actual) probability distribution of the estimator, up to a term vanishing in n. Then, we derive a bound for quantum estimation with nuisance parameters, i.e. parameters that are not of interest to the experimenter but may affect the estimation of the other parameters. For instance, the strength of noise in a phase estimation scenario can be regarded as a nuisance parameter. Our bound applies also to arbitrary estimation models, thus extending nuisance parameter bounds derived for specific cases (see, e.g., [27,28,29]). In the final part of the paper, the above bounds are illustrated in concrete examples, including the joint measurement of two qubit observables, the estimation of qubit states in the presence of amplitude damping noise, and noisy multiphase estimation.

Fig. 1. Comparison between the approach of this work (in green) and the traditional approach of quantum state estimation (in blue). In the traditional approach, one derives precision bounds based on the probability distribution function (PDF) for measurements on the original set of quantum states. The bounds are evaluated in the large n limit, and the task is to find a sequence of measurements that achieves the limiting bound. In this work, we first characterize the limiting distribution and then work out a bound in terms of the limiting distribution. This construction also provides the optimal measurement in the limiting scenario, which can be used to prove the asymptotic attainability of the bound.

The analysis of the limiting distribution also provides tail bounds, which approximate the tail bounds for finite n up to a small correction, under the assumption that the cost function and the model satisfy a certain relation (see Theorem 9).
The remainder of the paper is structured as follows. In Section 2, we introduce the main ideas in the one-parameter case; our discussion of the one-parameter case requires no regularity condition on the parametric model. In Section 3, we briefly review Gaussian states and derive some relations that will be useful in the rest of the paper. In Section 4, we introduce Q-LAN. In Section 5, we introduce the ε-difference RLD Fisher information matrix, which will be a key tool for deriving our bounds in the multiparameter case. In Section 6, we derive the general bound on the precision of multiparameter estimation. In Section 7, we address state estimation in the presence of nuisance parameters and derive a precision bound for this scenario. Section 8 provides bounds on the tail probability. In Section 9, we extend our results to global estimation and to generic cost functions. In Section 10, the general method is illustrated through examples. The conclusions are drawn in Section 11.

Table 1. Frequently used notations and their definitions.

Term | Notation | Definition
Limiting distribution | ℘_{t_0,t|m} | Eqs. (21) and (102)
SLD quantum Fisher information at t_0 | J_{t_0} | Eq. (35)
RLD quantum Fisher information at t_0 | J̃_{t_0} | Eq. (36)

Remark on the notation. In this paper, we use z* for the complex conjugate of z ∈ ℂ and A† for the Hermitian conjugate of an operator A. For convenience of the reader, we list other frequently appearing notations and their definitions in Table 1.

2. Precision bound under local asymptotic covariance: one-parameter case.
In this section, we discuss estimation of a single parameter under the local asymptotic covariance condition, without any assumption on the parametric model.
2.1. Model and SLD Fisher information. We consider a one-parameter model of quantum states

S = {ρ_t}_{t∈Θ},   (1)

where Θ is a subset of ℝ. In the literature it is typically assumed that the parametrization is differentiable. When this is the case, one can define a symmetric logarithmic derivative (SLD) operator L_{t_0} at t_0 via the equation

dρ_t/dt |_{t=t_0} = (1/2)(L_{t_0} ρ_{t_0} + ρ_{t_0} L_{t_0}).   (2)

Then, the SLD Fisher information is defined as

J_{t_0} := Tr[ρ_{t_0} L_{t_0}²].   (3)

The SLD L_{t_0} is not unique in general, but the SLD Fisher information J_{t_0} is uniquely defined, because it does not depend on the choice of the SLD L_{t_0} among the operators satisfying (2). When the parametrization is C¹-continuous and ε > 0 is a small number, one has

F(ρ_{t_0} ∥ ρ_{t_0+ε}) = 1 − (J_{t_0}/8) ε² + o(ε²),   (4)

where

F(ρ ∥ ρ′) := Tr |√ρ √ρ′|   (5)

is the fidelity between two density matrices ρ and ρ′.
Here we do not assume that the parametrization (1) is differentiable. Hence, the SLD Fisher information cannot be defined by (3). Instead, we use (4) and define the SLD Fisher information J_{t_0} as the limit

J_{t_0} := lim_{ε→0} 8(1 − F(ρ_{t_0} ∥ ρ_{t_0+ε}))/ε².   (6)

In the n-copy case, we have the following lemma.

Lemma 1 The SLD Fisher information of the n-copy model {ρ_{t_0+t/√n}^{⊗n}}_t at t = 0 equals J_{t_0}.

Proof. Using the definition (6) and the multiplicativity of the fidelity, we have

lim_{ε→0} 8(1 − F(ρ_{t_0}^{⊗n} ∥ ρ_{t_0+ε/√n}^{⊗n}))/ε² = lim_{ε→0} 8(1 − F(ρ_{t_0} ∥ ρ_{t_0+ε/√n})ⁿ)/ε² = lim_{ε→0} 8(1 − (1 − J_{t_0} ε²/(8n) + o(ε²/n))ⁿ)/ε² = J_{t_0}.

In other words, the SLD Fisher information is unchanged under n copies if we replace ε by ε/√n.
Estimation of the parameter t is implemented by performing a quantum measurement, mathematically described by a positive operator valued measure (POVM) M with outcomes in ℝ. It is often assumed that the measurement is unbiased, in the following sense: a POVM M on a single input copy is called unbiased when

∫ t̂ Tr[ρ_t M(dt̂)] = t    for every t ∈ Θ.   (9)

For a POVM M, we define the mean square error (MSE) V_t(M) as

V_t(M) := ∫ (t̂ − t)² Tr[ρ_t M(dt̂)].   (10)

Then, we have the fidelity version of the Cramér-Rao inequality.

Theorem 1 For an unbiased measurement M satisfying (9) for any t, and any ε > 0 with t_0 + ε ∈ Θ, we have

V_{t_0}(M) + V_{t_0+ε}(M) + ε² ≥ ε²/(4(1 − F(ρ_{t_0} ∥ ρ_{t_0+ε}))).   (12)

When lim_{ε→0} V_{t_0+ε}(M) = V_{t_0}(M), taking the limit ε → 0 and using (4), we have V_{t_0}(M) ≥ J_{t_0}^{-1}.

The proof uses the notion of fidelity between two classical probability distributions: for two given distributions P and Q on a probability space X, we define the fidelity F(P ∥ Q) as follows. Let f_P and f_Q be the Radon-Nikodým derivatives of P and Q with respect to P + Q, respectively. Then, we define the fidelity as

F(P ∥ Q) := ∫ √(f_P f_Q) d(P + Q).

With the above definition, the fidelity satisfies an information processing inequality: for every classical channel G, one has F(G(P) ∥ G(Q)) ≥ F(P ∥ Q). For a family of probability distributions {P_θ}_{θ∈Θ}, we define the Fisher information as

J_θ := lim_{ε→0} 8(1 − F(P_θ ∥ P_{θ+ε}))/ε².

When the probability distributions are over a discrete set, their Fisher information coincides with the SLD Fisher information of the corresponding diagonal density matrices.
Proof of Theorem 1. Without loss of generality, we assume t_0 = 0. We define the probability distribution P_t by P_t(B) := Tr[ρ_t M(B)]. Then, the information processing inequality of the fidelity [30] yields the bound F(ρ_{t_0} ∥ ρ_{t_0+ε}) ≤ F(P_0 ∥ P_ε). Hence, it is sufficient to show (12) for the family of probability distributions {P_t}.
Let f_0 and f_ε be the Radon-Nikodým derivatives of P_0 and P_ε with respect to μ := P_0 + P_ε. Denoting the estimate by t̂, unbiasedness gives

∫ t̂ (f_ε − f_0) dμ = ε.   (14)

Writing f_ε − f_0 = (√f_ε − √f_0)(√f_ε + √f_0), the Schwarz inequality implies

ε² ≤ (∫ t̂² (√f_ε + √f_0)² dμ) (∫ (√f_ε − √f_0)² dμ).   (16)

The two factors are bounded as

∫ t̂² (√f_ε + √f_0)² dμ ≤ 2 ∫ t̂² (f_ε + f_0) dμ = 2 (V_0(M) + V_ε(M) + ε²),   (17)

∫ (√f_ε − √f_0)² dμ = 2 − 2 ∫ √(f_ε f_0) dμ = 2 (1 − F(P_0 ∥ P_ε)).   (18)

Combining (16), (17), and (18), we have (12).
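The classical data-processing inequality F(G(P) ∥ G(Q)) ≥ F(P ∥ Q) invoked in the proof above can be spot-checked numerically. The random distributions and the random stochastic matrix below are assumptions of this snippet, used only to probe the inequality.

```python
import numpy as np

# For discrete distributions the fidelity is the Bhattacharyya coefficient.
def fidelity(p, q):
    return np.sum(np.sqrt(p * q))

rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.random(5); p /= p.sum()
    q = rng.random(5); q /= q.sum()
    G = rng.random((4, 5)); G /= G.sum(axis=0)   # columns: conditional distributions
    # Data-processing: coarse-graining by the channel G can only raise F
    assert fidelity(G @ p, G @ q) >= fidelity(p, q) - 1e-12
print("data-processing inequality verified on 1000 random instances")
```

The inequality follows from the Cauchy-Schwarz inequality applied columnwise, which is why no random instance can violate it beyond numerical noise.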

2.2. Local asymptotic covariance. When many copies of the state ρ_t are available, the estimation of t can be reduced to a local neighbourhood of a fixed point t_0 ∈ Θ. Motivated by Lemma 1, we adopt the following parametrization of the n-copy state:

ρ_{t_0,t}^n := ρ_{t_0+t/√n}^{⊗n},    t ∈ Δ_n := √n Θ − √n t_0,

having used the notation aΘ + b := {ax + b | x ∈ Θ} for two arbitrary constants a, b ∈ ℝ. With this parametrization, the local n-copy model is {ρ_{t_0,t}^n}_{t∈Δ_n}. Note that, for every t ∈ ℝ, there exists a sufficiently large n such that Δ_n contains t. As a consequence, one has ∪_{n∈ℕ} Δ_n = ℝ.
Assuming t_0 to be known, the task is to estimate the local parameter t ∈ ℝ, by performing a measurement on the n-copy state ρ_{t_0,t}^n and then mapping the obtained data to an estimate t̂_n. The whole estimation strategy can be described by a sequence of POVMs m := {M_n}. In the existing works on quantum state estimation, the error criterion is defined in terms of the difference between the global estimate t_0 + t̂_n/√n and the global true value t_0 + t/√n. Instead, here we focus on the difference between the local estimate t̂_n and the true value of the local parameter t. With this aim in mind, for every Borel set B ⊂ ℝ we consider the probability distribution

℘_{t_0,t|M_n}^n(B) := Tr[ρ_{t_0,t}^n M_n(B)].   (20)

We focus on the behavior of ℘_{t_0,t|M_n}^n in the large n limit, assuming the following condition.

Condition 1 (Local asymptotic covariance for a single parameter) A sequence of measurements m = {M_n} satisfies local asymptotic covariance when

1. the distribution ℘_{t_0,t|M_n}^n in (20) converges to a distribution ℘_{t_0,t|m}, called the limiting distribution, namely

℘_{t_0,t|m}(B) := lim_{n→∞} ℘_{t_0,t|M_n}^n(B)   (21)

for any Borel set B;

2. the limiting distribution satisfies the covariance relation

℘_{t_0,t|m}(B + t) = ℘_{t_0,0|m}(B)   (22)

for any t ∈ ℝ.

Using the limiting distribution, we can faithfully approximate the tail probability as

℘_{t_0,t|M_n}^n(B) = ℘_{t_0,t|m}(B) + δ_n,   (24)

where the term δ_n vanishes with n for every fixed B. For convenience, one may be tempted to require the existence of a probability density function (PDF) of the limiting distribution ℘_{t_0,t|m}. However, the existence of a PDF is already guaranteed by the following lemma.

Lemma 2 When a sequence m := {M_n} of POVMs satisfies local asymptotic covariance, the limiting distribution ℘_{t_0,t|m} admits a PDF, denoted by ℘_{t_0,t|m,d}.
The proof is provided in Appendix A.
2.3. MSE bound for the limiting distribution. As a figure of merit, we focus on the mean square error (MSE) of the limiting distribution ℘_{t_0,t|m}, namely

V[℘_{t_0,t|m}] := ∫ (t̂ − t)² ℘_{t_0,t|m}(dt̂).

Note that local asymptotic covariance implies that the MSE is independent of t.
The main result of this section is the following theorem.

Theorem 2 (MSE bound for single-parameter estimation) When a sequence m := {M_n} of POVMs satisfies local asymptotic covariance, the MSE of its limiting distribution is lower bounded as

V[℘_{t_0,t|m}] ≥ J_{t_0}^{-1},   (25)

where J_{t_0} is the SLD Fisher information of the model {ρ_t}_{t∈Θ}. Moreover, the PDF of ℘_{t_0,t|m} is upper bounded by J_{t_0}. When the PDF of ℘_{t_0,t|m} is differentiable with respect to t, the equality in (25) holds if and only if ℘_{t_0,0|m} is the normal distribution with average zero and variance J_{t_0}^{-1}.
Proof of Theorem 2. When the integral ∫ t̂² ℘_{t_0,0|m}(dt̂) does not converge, V[℘_{t_0,t|m}] is infinite and trivially satisfies (25). Hence, we can assume that the above integral converges. Further, we can assume that the estimate t̂ satisfies the unbiasedness condition ∫ t̂ ℘_{t_0,t|m}(dt̂) = t. Otherwise, we can replace t̂ by t̂_0 := t̂ − ∫ t̂ ℘_{t_0,0|m}(dt̂), because the estimator t̂_0 has a smaller MSE than t̂ and satisfies the unbiasedness condition, due to the covariance condition. Hence, Theorem 1 guarantees

V[℘_{t_0,t|m}] ≥ (lim_{ε→0} 8(1 − F(℘_{t_0,0|m} ∥ ℘_{t_0,ε|m}))/ε²)^{-1}.   (26)

Applying Lemma 20 to {℘_{t_0,t|m}}, we have

lim_{ε→0} 8(1 − F(℘_{t_0,0|m} ∥ ℘_{t_0,ε|m}))/ε² ≤(a) lim_{ε→0} lim_{n→∞} 8(1 − F(℘_{t_0,0|M_n}^n ∥ ℘_{t_0,ε|M_n}^n))/ε² ≤(b) lim_{ε→0} 8(1 − F(ρ_{t_0}^{⊗n} ∥ ρ_{t_0+ε/√n}^{⊗n}))/ε² =(c) J_{t_0}.   (27)

The inequality (a) holds by Lemma 20 from Appendix B, the inequality (b) comes from the data-processing inequality of the fidelity, and the equation (c) follows from Lemma 1. Finally, substituting (27) into (26), we obtain the desired bound (25). Now, we denote the PDF of ℘_{t_0,0|m} by ℘_{t_0,0|m,d}. In Appendix A, the proof of Lemma 2 shows that we can apply Lemma 19 to {℘_{t_0,t|m}}_t. Since the Fisher information of {℘_{t_0,t|m}}_t is upper bounded by J_{t_0}, this application guarantees that ℘_{t_0,t|m,d}(x) ≤ J_{t_0} for all x ∈ ℝ.
When the PDF ℘_{t_0,t|m,d} is differentiable, to derive the equality condition in (25), we show (26) in a different way. Let l_{t_0,t}(x) := ∂/∂t log ℘_{t_0,t|m,d}(x) be the logarithmic derivative of the PDF. By the Schwarz inequality, we have

V[℘_{t_0,t|m}] ≥ (∫ x l_{t_0,0}(x) ℘_{t_0,0|m,d}(x) dx)² / ∫ l_{t_0,0}(x)² ℘_{t_0,0|m,d}(x) dx.   (28)

The numerator on the right-hand side of (28) can be evaluated by noticing that

∫ x l_{t_0,t}(x) ℘_{t_0,t|m,d}(x) dx = d/dt ∫ x ℘_{t_0,t|m,d}(x) dx.   (29)

By local asymptotic covariance, ∫ x ℘_{t_0,t|m,d}(x) dx = t + const., and hence this quantity equals 1. The denominator on the right-hand side of (28) is the Fisher information of the family {℘_{t_0,t|m}}_t, which is upper bounded by J_{t_0}; hence (28) implies (26). The equality in (28) holds if and only if l_{t_0,0}(x) is proportional to x, which implies that ℘_{t_0,0|m} is the normal distribution with average zero and variance J_{t_0}^{-1}.
The RHS of (25) can be regarded as the limiting-distribution version of the SLD quantum Cramér-Rao bound. Note that, when the limiting PDF is differentiable and the bound is attained, the probability distribution ℘_{t_0,t|M_n}^n is approximated (in the pointwise sense) by a normal distribution with average zero and variance 1/(nJ_{t_0}). Using this fact, we will show that there exists a sequence of POVMs that attains the equality in (25) at all points uniformly. The optimal sequence of POVMs is constructed explicitly in Section 6.
2.4. Comparison between local asymptotic covariance and other conditions. We conclude the section by discussing the relation between local asymptotic covariance and other conditions that are often imposed on measurements. This subsection is not necessary for understanding the technical results of the next sections and can be skipped at a first reading.
Let us start with the unbiasedness condition. Assuming unbiasedness, one can derive the quantum Cramér-Rao bound on the MSE [5]. Holevo showed the attainability of the quantum Cramér-Rao bound when estimating displacements in Gaussian systems [5].
The disadvantage of unbiasedness is that it is too restrictive, as it is satisfied only by a small class of measurements. Indeed, the unbiasedness condition for the estimator M requires Tr[E d^iρ_t/dt^i |_{t=t_0}] = 0 for i ≥ 2, with E := ∫ t̂ M(dt̂), as well as the condition Tr[E dρ_t/dt |_{t=t_0}] = 1. In certain situations, these conditions may be incompatible. For example, consider a family of qubit states ρ_t := (1/2)(I + n_t · σ). When the Bloch vector n_t has a non-linear dependence on t and the set of higher-order derivatives d^iρ_t/dt^i |_{t=t_0} with i ≥ 2 spans the space of traceless Hermitian matrices, no unbiased estimator can exist.
In contrast, local asymptotic covariance involves only the first derivative dρ_t/dt |_{t=t_0}, because the contribution of higher-order derivatives to the variable t̂_n has order o(1/√n) and vanishes in the condition of local asymptotic covariance. One can see that the unbiasedness condition implies local asymptotic covariance with the parametrization ρ_{t_0+t/√n}, in the following sense. When n (more than one) input copies are available, we can construct an unbiased estimator by applying a single-copy unbiased estimator M satisfying Eq. (9) to each copy. Denoting the i-th outcome by x_i, we take the average

t̂_n := (1/n) Σ_{i=1}^n x_i,   (30)

which satisfies the unbiasedness condition (9) for the parameter t as well. When the single-copy estimator M has variance v at t_0, lower bounded by the Cramér-Rao inequality, the estimator (30) has variance v/n at t_0. In addition, the average (30) of the obtained data satisfies local asymptotic covariance: by the central limit theorem, the rescaled estimator √n(t̂_n − t_0) follows a Gaussian distribution with variance v in the large n limit; the center of the Gaussian distribution is pinned at the true value of the parameter by unbiasedness; and the shape of the Gaussian is independent of the value t, depending only on t_0. Thus, local asymptotic covariance holds.
The above discussion can be extended to the multi-copy case as follows. Suppose that M is an unbiased measurement for the ℓ-copy state ρ^{⊗ℓ}_{t_0+t/√n} with respect to the parameter t_0 + t/√n, where ℓ is an arbitrary finite integer. From the measurement M we can construct a measurement for the n-copy state with n = kℓ + i and i < ℓ, by applying the measurement M k times and discarding the remaining i copies. In the following, we consider the limit where the total number n tends to infinity, while ℓ is kept fixed. When the variance of M at t_0 is v/ℓ, the average of the k obtained outcomes x_1, ..., x_k satisfies local asymptotic covariance, i.e., the rescaled estimator √n((Σ_{i=1}^k x_i/k) − t_0) follows a Gaussian distribution with variance v in the large n limit. Therefore, any variance realized by an unbiased estimator can be realized under the condition of local asymptotic covariance.
Another popular condition, less restrictive than unbiasedness, is local unbiasedness. This condition depends on the true parameter t_0 and consists of the following two requirements:

∫ t̂ Tr[ρ^{⊗ℓ}_{t_0} M(dt̂)] = t_0,   (31)

d/dt ∫ t̂ Tr[ρ^{⊗ℓ}_t M(dt̂)] |_{t=t_0} = 1,   (32)

where ℓ is a fixed, but otherwise arbitrary, integer. The derivation of the quantum Cramér-Rao bound still holds, because it uses only the condition (32). When the parametrization ρ_t is C¹-continuous, the first derivative d/dt ∫ t̂ Tr[ρ^{⊗ℓ}_t M(dt̂)] is continuous at t = t_0, and local unbiasedness at t_0 yields local asymptotic covariance at t_0 in the same way as in Eq. (30).
Another relaxation of the unbiasedness condition is asymptotic unbiasedness [11]:

lim_{n→∞} ∫ x̂ Tr[ρ^{⊗n}_t M_n(dx̂)] = t,   (33)

lim_{n→∞} d/dt ∫ x̂ Tr[ρ^{⊗n}_t M_n(dx̂)] = 1.   (34)

The condition of asymptotic unbiasedness leads to a precision bound on the MSE [32, Chapter 6]. The bound is given by the SLD Fisher information, and therefore it is attainable for Gaussian states. However, no attainable bound for qudit systems has been derived so far under the condition of asymptotic unbiasedness. Interestingly, one cannot directly use the attainability for Gaussian systems to derive an attainability result for qudit systems, despite the asymptotic equivalence between Gaussian systems and qudit systems stated by quantum local asymptotic normality (Q-LAN) (see [16,17] and Section 4.1). The problem is that the error of Q-LAN goes to 0 for large n, but the error in the derivative may not go to zero, and therefore the condition (34) is not guaranteed to hold.
In order to guarantee attainability of the quantum Cramér-Rao bound, one could think of further loosening the condition of asymptotic unbiasedness. An attempt to avoid the problem of the Q-LAN error could be to remove condition (34), keeping only condition (33). This leads to an enlarged class of estimators, called weakly asymptotically unbiased. The problem with these estimators is that no general MSE bound is known to hold at every point. For example, one can find superefficient estimators [33,34], which violate the Cramér-Rao bound on a set of points. Such a set must have zero measure in the limit n → ∞, but the violation of the bound may occur on a considerably large set when n is finite. In contrast, local asymptotic covariance guarantees the MSE bound (25) at every point t where the local asymptotic covariance condition is satisfied.

3. Holevo bound and Gaussian state families
3.1. Holevo bound. When studying multiparameter estimation in quantum systems, we need to address the tradeoff between the precisions of estimation of different parameters. This is done using two quantum extensions of the Fisher information matrix: one based on the SLD and one based on the right logarithmic derivative (RLD).
Consider a multiparameter family of density operators {ρ_t}_{t∈Θ}, where Θ is an open set in ℝ^k, k being the number of parameters. Throughout this section, we assume that ρ_{t_0} is invertible and that the parametrization is C¹ in all parameters. Then, the SLD L_j and the RLD L̃_j for the parameter t_j are defined through the equations

∂ρ_t/∂t_j = (1/2)(L_j ρ_t + ρ_t L_j),    ∂ρ_t/∂t_j = ρ_t L̃_j,

see e.g. [6,5] and [15, Section II]. It can be seen from the definitions that the SLD L_j can always be chosen to be Hermitian, while the RLD L̃_j is in general not Hermitian. The SLD quantum Fisher information matrix J_t and the RLD quantum Fisher information matrix J̃_t are the k × k matrices defined as

(J_t)_{ij} := Tr[ρ_t (L_i L_j + L_j L_i)]/2,   (35)
(J̃_t)_{ij} := Tr[ρ_t L̃_j L̃_i†].   (36)

Notice that the SLD quantum Fisher information matrix J_t is a real symmetric matrix, while the RLD quantum Fisher information matrix J̃_t is in general not a real matrix. A POVM M is called an unbiased estimator for the family S = {ρ_t} when the relation

∫ t̂_j Tr[ρ_t M(dt̂)] = t_j,    j = 1, ..., k,

holds for any parameter t. For a POVM M, we define the MSE matrix

(V_t(M))_{ij} := ∫ (t̂_i − t_i)(t̂_j − t_j) Tr[ρ_t M(dt̂)].

It is known that an unbiased estimator M satisfies the SLD type and the RLD type of Cramér-Rao inequalities, V_t(M) ≥ J_t^{-1} and V_t(M) ≥ J̃_t^{-1}, respectively [5]. Since it is not always possible to minimize the MSE matrix itself under the unbiasedness condition, we minimize the weighted MSE tr[W V_t(M)] for a given weight matrix W ≥ 0, where tr denotes the trace of k × k matrices. When a POVM M is unbiased, one has the RLD bound [5]

tr[W V_t(M)] ≥ tr[W Re(J̃_t^{-1})] + tr|√W Im(J̃_t^{-1}) √W|,   (39)

where |A| := √(A†A). In particular, when W > 0, the lower bound (39) is attained by the MSE matrix

V = Re(J̃_t^{-1}) + √W^{-1} |√W Im(J̃_t^{-1}) √W| √W^{-1}.   (40)

The RLD bound has a particularly tractable form when the model is D-invariant:

Definition 1 The model {ρ_t}_{t∈Θ} is D-invariant at t when the space spanned by the SLD operators is invariant under the linear map D_t defined by

ρ_t ∘ D_t(X) := i [ρ_t, X],   (41)

where A ∘ B := (AB + BA)/2 and [A, B] := AB − BA denotes the commutator. When the model is D-invariant at every point, it is simply called D-invariant.
For a D-invariant model, the RLD quantum Fisher information can be computed in terms of the D-matrix, namely the real skew-symmetric matrix defined as

(D_t)_{ij} := (i/2) Tr[ρ_t [L_i, L_j]].   (42)

Precisely, the RLD quantum Fisher information has the expression [5]

J̃_t^{-1} = J_t^{-1} + i J_t^{-1} D_t J_t^{-1}.   (43)

Hence, (39) becomes

tr[W V_t(M)] ≥ tr[W J_t^{-1}] + tr|√W J_t^{-1} D_t J_t^{-1} √W|.   (44)

For D-invariant models, the RLD bound is larger than the bound derived from the SLD Fisher information matrix (the SLD bound), and is therefore a better bound. However, when the model is not D-invariant, the RLD bound is not necessarily tight; in particular, in the one-parameter case it is common to use the SLD bound, which is tight. Hence, both quantum extensions of the Cramér-Rao bound have advantages and disadvantages.
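The definitions above can be exercised numerically. The sketch below uses an assumed two-parameter qubit model ρ_θ = (I + θ_1 X + θ_2 Y + zZ)/2 around θ = 0 (an illustrative choice, not a model from the paper) and the RLD convention J̃_{ij} = Tr[ρ L̃_j L̃_i†]; sign and ordering conventions differ between references, so this is one consistent choice. For this model a hand calculation gives J equal to the identity and the RLD bound with W = I equal to 2 + 2z.

```python
import numpy as np

# Assumed two-parameter qubit model: rho_theta = (I + th1*X + th2*Y + z*Z)/2,
# evaluated at theta = 0.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
z = 0.5
rho = (np.eye(2) + z * Z) / 2
drho = [X / 2, Y / 2]                      # partial derivatives at theta = 0

# SLDs from the Lyapunov equation drho = (L rho + rho L)/2 (rho is diagonal)
p = np.diag(rho).real
L = [2 * d / (p[:, None] + p[None, :]) for d in drho]
J = np.array([[np.trace(rho @ (L[i] @ L[j] + L[j] @ L[i])).real / 2
               for j in range(2)] for i in range(2)])

# RLDs: drho = rho @ Ltilde  =>  Ltilde = rho^{-1} drho
Lt = [np.linalg.inv(rho) @ d for d in drho]
Jt = np.array([[np.trace(rho @ Lt[j] @ Lt[i].conj().T)
                for j in range(2)] for i in range(2)])
Jti = np.linalg.inv(Jt)

# RLD bound (39) with W = I; for skew-symmetric Im(Jti) the trace norm is
# the sum of the absolute values of its (purely imaginary) eigenvalues.
rld_bound = np.trace(Jti.real) + np.abs(np.linalg.eigvals(Jti.imag)).sum()
print(J, rld_bound)   # J = identity; RLD bound = 2 + 2z = 3.0
```

Here J̃^{-1} = J^{-1} + iJ^{-1}DJ^{-1} can be verified directly: the imaginary part of J̃^{-1} reproduces J^{-1}D J^{-1} for the D-matrix convention stated above.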
To unify both extensions, Holevo [5] derived the following bound, which improves the RLD bound when the model is not D-invariant. For a k-component vector X = (X_1, ..., X_k) of Hermitian operators, define the k × k matrix Z_t(X) as

(Z_t(X))_{ij} := Tr[ρ_t X_i X_j].

Then, Holevo's bound is as follows: for any weight matrix W, one has

min_{M ∈ UB_M} tr[W V_t(M)] ≥ C_{H,M}(W, t),   (45)

C_{H,M}(W, t) := min_{X,V} { tr[W V] : V ≥ Z_t(X) } = min_X { tr[W Re Z_t(X)] + tr|√W Im Z_t(X) √W| },   (46)

where UB_M denotes the set of all unbiased measurements for the model M, V is a real symmetric matrix, and X = (X_i) is a k-component vector of Hermitian operators satisfying

Tr[(∂ρ_t/∂t_j) X_i] = δ_{ij}.   (47)

C_{H,M}(W, t) is called the Holevo bound. When W > 0, there exists a vector X achieving the minimum in (46). Hence, similarly to the RLD case, the equality in (45) holds for W > 0 only when

V_t(M) = Re Z_t(X) + √W^{-1} |√W Im Z_t(X) √W| √W^{-1}.   (48)

Moreover, we have the following proposition.

Proposition 1 ([15, Theorem 4]) Let S = {ρ_t}_{t∈Θ} be a generic k-parameter qudit model and let S′ = {ρ_{t,p}}_{t,p} be a k′-parameter model containing S as ρ_t = ρ_{(t,0)}. When S′ is D-invariant and the inverse J̃_t^{-1} of the RLD Fisher information matrix of the model S′ exists, we have

C_{H,S}(W, t_0) = min_{X:S′} { tr[W Re Z_{t_0}(X)] + tr|√W Im Z_{t_0}(X) √W| }   (49)
= min_P { tr[W P J_{t_0}^{-1} P^T] + tr|√W P J_{t_0}^{-1} D_{t_0} J_{t_0}^{-1} P^T √W| }.   (50)

In (49), min_{X:S′} denotes the minimum over vectors X whose components X_i are linear combinations of the SLD operators of the model S′. In (50), the minimization is taken over all k × k′ matrices P satisfying the constraint P_{ij} = δ_{ij} for i, j ≤ k, and J_{t_0} and D_{t_0} are the SLD Fisher information matrix and the D-matrix [cf. Eqs. (35) and (41)] of the extended model S′ at (t_0, 0). The Holevo bound is always tighter than the RLD bound, i.e., C_{H,M}(W, t) is never smaller than the right-hand side of (39). The equality holds if and only if the model M is D-invariant [35]. In fact, the minimum D-invariant subspace in the space of Hermitian matrices is given as follows. Let V be the subspace spanned by the SLDs {L_i} of the original model M at ρ_{t_0}, and let V′ be the subspace spanned by ∪_{j=0}^∞ D_{t_0}^j(V). Then, the subspace V′ is D-invariant and contains V. Since any D-invariant subspace containing V contains V′, V′ is the minimum D-invariant subspace containing V.
In fact, the minimum is achieved when each component of the vector X lies in the minimum D-invariant subspace V′, for the following reason. Let V″ be the orthogonal complement of V′ with respect to the inner product ⟨X, Y⟩ := Tr ρ X† Y, and denote the corresponding projections by P′ and P″. Each component X_i of the vector X is written as P′X_i + P″X_i. Then, the two vectors X′ := (P′X_i) and X″ := (P″X_i) satisfy the inequality Z_t(X) = Z_t(X′) + Z_t(X″) ≥ Z_t(X′). Since only the part P′X_i contributes to the constraint (47), while the part P″X_i does not, the minimum in (46) is attained only when X″ = 0. Hence, the minimum is achieved when each component of the vector X lies in the minimum D-invariant subspace V′. Therefore, since the minimum D-invariant subspace is uniquely defined, the Holevo bound does not depend on the choice of the D-invariant model.
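For a D-invariant model, the closed-form bound reduces to tr[W J^{-1}] plus half the trace norm of √W J^{-1} D J^{-1} √W. The sketch below evaluates this numerically for the full (D-invariant) qubit tomography model. The conventions for the SLD Fisher matrix J and the D-matrix (factors and signs) follow one common choice and are assumptions here; since the D-matrix enters only through a trace norm, its sign convention does not affect the value.

```python
import numpy as np
from scipy.linalg import solve_sylvester, sqrtm

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# Full qubit model rho(t) = (I + t . sigma)/2 at an illustrative Bloch vector
t0 = np.array([0.3, 0.2, 0.4])
rho = 0.5 * (np.eye(2) + t0[0]*sx + t0[1]*sy + t0[2]*sz)
drho = [0.5*sx, 0.5*sy, 0.5*sz]     # derivatives w.r.t. the Bloch coordinates

# SLDs L_i solve the Lyapunov equation rho L + L rho = 2 drho_i
L = [solve_sylvester(rho, rho, 2*d) for d in drho]

k = 3
J = np.zeros((k, k))   # SLD Fisher information matrix
D = np.zeros((k, k))   # D-matrix (skew-symmetric), convention D_ij = i Tr rho [L_i, L_j]
for i in range(k):
    for j in range(k):
        J[i, j] = 0.5 * np.real(np.trace(rho @ (L[i] @ L[j] + L[j] @ L[i])))
        D[i, j] = np.real(1j * np.trace(rho @ (L[i] @ L[j] - L[j] @ L[i])))

W = np.eye(k)                        # weight matrix
Jinv = np.linalg.inv(J)
sW = sqrtm(W).real
M = sW @ Jinv @ D @ Jinv @ sW        # real skew-symmetric matrix
sld_bound = np.trace(W @ Jinv)
# tr|A| of a skew-symmetric A equals the sum of its singular values
holevo_bound = sld_bound + 0.5 * np.sum(np.linalg.svd(M, compute_uv=False))
print(sld_bound, holevo_bound)
```

At a nonzero Bloch vector the D-matrix does not vanish, so the bound is strictly larger than the SLD bound, illustrating the incompatibility penalty in multiparameter qubit estimation.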

3.2.
Classical and quantum Gaussian states. For a classical system of dimension d_C, a Gaussian state is a d_C-dimensional normal distribution N[α_C, Γ_C] with mean α_C and covariance matrix Γ_C. The corresponding random variable will be denoted as Z = (Z_1, …, Z_{d_C}) and will take values z = (z_1, …, z_{d_C}) ∈ R^{d_C}.
For quantum systems we will restrict our attention to a subfamily of Gaussian states, known as displaced thermal states. For a quantum system made of a single mode, the displaced thermal states have the form T^Q_α ρ^thm_β (T^Q_α)†, where α ∈ C is the displacement, T^Q_α is the displacement operator, â is the annihilation operator, satisfying the relation [â, â†] = 1, and ρ^thm_β is the thermal state, defined as ρ^thm_β := (1 − e^{−β}) Σ_j e^{−βj} |j⟩⟨j|, where the basis {|j⟩}_{j∈N} consists of the eigenvectors of â†â and β ∈ (0, ∞) is a real parameter, hereafter called the thermal parameter. For a quantum system of d_Q modes, we consider products of single-mode displaced thermal states, with β denoting the vector of thermal parameters. In the following we will regard α as a vector in R^{2d_Q}, identifying each complex displacement with its real and imaginary parts. For a hybrid system of d_C classical variables and d_Q quantum modes, we define the canonical classical-quantum (c-q) Gaussian state G[α, Γ] as the product of a classical Gaussian distribution and a multimode displaced thermal state. Equivalently, the canonical Gaussian states can be expressed in terms of the Gaussian shift operator T_α, with the analogous notation adopted for the classical part. With this notation, the canonical Gaussian state G[α, Γ] is uniquely identified by the characteristic equation (62) [5].
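As a concrete numerical check of these definitions, the following sketch builds a single-mode displaced thermal state in a truncated Fock space and verifies that its quadrature means encode the displacement. The truncation dimension and the parameter values are illustrative assumptions; the truncation error is negligible for these small parameters.

```python
import numpy as np
from scipy.linalg import expm

N = 40  # Fock-space truncation (assumed large enough for the parameters below)
a = np.diag(np.sqrt(np.arange(1, N)), 1)   # annihilation operator
adag = a.conj().T

beta = 0.7          # thermal parameter: rho_th proportional to sum_j e^{-beta j}|j><j|
alpha = 0.5 + 0.3j  # displacement

# thermal state (normalized on the truncated space)
p = np.exp(-beta * np.arange(N)); p /= p.sum()
rho_th = np.diag(p).astype(complex)

# displaced thermal state T_alpha rho_th T_alpha^dagger
T = expm(alpha*adag - np.conj(alpha)*a)    # unitary displacement operator
rho = T @ rho_th @ T.conj().T

# quadratures Q = (a + a^dag)/sqrt(2), P = (a - a^dag)/(i sqrt(2))
Q = (a + adag) / np.sqrt(2)
P = (a - adag) / (1j*np.sqrt(2))
mq = np.real(np.trace(rho @ Q))   # should equal sqrt(2) * Re(alpha)
mp = np.real(np.trace(rho @ P))   # should equal sqrt(2) * Im(alpha)
```

The displacement shifts the first moments while leaving the thermal covariance untouched, which is exactly the structure exploited by the Gaussian shift models below.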
The formulation in terms of the characteristic equation (62) can be used to generalize the notion of canonical Gaussian state [36]. Given a d-dimensional positive semi-definite matrix (correlation matrix) Γ , we define the operators R := (R 1 , . . . , R d ) via the commutation relation We define the general Gaussian state G[α, Γ ] on the operators R as the linear functional on the operator algebra generated by R 1 , . . . , R d satisfying the characteristic equation (62) [36].
With this definition, we have the following lemma.

Lemma 3 Given a Hermitian matrix Γ , there exists an invertible real matrix
T such that the Hermitian matrix T Γ T^T is the correlation matrix of a canonical Gaussian state. In particular, when Im(Γ) is invertible, the imaginary part can be brought to the standard symplectic form on the d_Q quantum modes, and the vector β is unique up to permutation. Further, when two matrices T and T̃ satisfy the above condition, the corresponding canonical Gaussian states are unitarily equivalent. The proof is provided in Appendix C.
In the above lemma, the operation on the classical part is given as a scale conversion. Hence, an invertible real matrix T can be realized by the combination of a scale conversion and a linear conversion, the latter implementable as a unitary on the Hilbert space. Indeed, Γ does not have the block form Γ_C ⊕ Γ_Q in general; however, applying an orthogonal transformation, we can bring it to the block form Γ_C ⊕ Γ_Q with Γ_C real. Hence, a general Gaussian state can be obtained as the linear functional on the operator algebra resulting from applying the linear conversion to a canonical Gaussian state. This kind of construction is unique up to unitary equivalence. Petz [36] showed a similar statement by using the Gelfand-Naimark-Segal (GNS) construction; our derivation establishes the uniqueness directly, without using the GNS construction.

Lemma 4 The Gaussian states family {G[t, Γ]}_t satisfies the relation (64), namely its RLD Fisher information matrix J̃_t obeys J̃_t^{-1} = Γ.
This lemma shows that the inverse of the RLD Fisher information matrix is given by the correlation matrix.
Proof. Due to the coordinate conversion given in Lemma 3, it is sufficient to show the relation (64) for the family of canonical Gaussian states. In that case, the desired statement has already been shown by Holevo in [5].
Therefore, as shown in Appendix D, a D-invariant Gaussian model can be characterized as follows: Lemma 5 Given a d × d strictly positive-definite Hermitian matrix Γ = A + iB (A, B real matrices) and a d × k real matrix T with k ≤ d, the following conditions are equivalent.
The image of the linear map A^{-1} T is invariant under the application of B.
(3) There exist a unitary operator U and a Hermitian matrix Γ_0 such that the corresponding decomposition holds.

Measurements on the Gaussian states family.
We discuss the stochastic behavior of the measurement outcome on the c-q system generated by R = (R_j)_{j=1}^d when the state is a general Gaussian state G[α, Γ]. To this end, we introduce the notation ℘_{α|M}(B) := Tr G[α, Γ] M(B) for a POVM M. Then, we have the following lemma.
In this case, the weighted covariance matrix takes the explicit form given in the lemma. The proof is provided in Appendix G.1.
In the above lemma, when X = R, we abbreviate M^Γ_{P|W} to M^Γ_W. This lemma is useful for estimation in the Gaussian states family M := {G[t, Γ]}_{t∈R^d}. In this family, we consider the covariance condition.
Then, we have the following corollary for this Gaussian states family. Corollary 1 ([5]) For any weight matrix W ≥ 0 and the above Gaussian states family M, the minimum weighted MSE is given by (66), where CUB_M denotes the set of covariant unbiased estimators for the model M. Further, when W > 0, the infimum is attained by the covariant unbiased estimator M^Γ_W, whose output distribution is the normal distribution with average t and covariance matrix given by the real part of the corresponding correlation matrix. This corollary can be shown as follows. Due to Lemma 4, the lower bound (43) on the weighted MSE tr W V_t(M) of an unbiased estimator M equals the RHS of (66), and Lemma 6 guarantees the required performance of M^Γ_W. To treat the case when W is not strictly positive definite, we consider W_ε := W + εI; using the above method, we can construct an unbiased and covariant estimator whose output distribution is the 2d_Q-dimensional normal distribution with average t and the corresponding covariance, and then take the limit ε → 0. We have the following corollary for the subfamily M′. When W > 0, we choose a vector X realizing the minimum in (49); the infimum is then attained by the covariant unbiased estimator M_W whose output distribution is the normal distribution with average t and whose covariance matrix realizes (49), which is possible because the components of X are linear combinations of R_1, …, R_k. Hence, the latter part of the corollary with W > 0 follows from (45) and Lemma 6. The case of a non strictly positive W can be handled by considering W_ε, in the same way as in Corollary 1.

Local asymptotic normality
The extension of the bound from one-parameter to multiparameter estimation is quite nontrivial, and it is not immediately clear whether the bound is attainable. Hence we first develop the concept of local asymptotic normality, which is the key tool for constructing the optimal measurement. Since we could derive a tight bound for the Gaussian states family, it is natural to approximate the general case by a Gaussian states family, and local asymptotic normality will serve as the bridge between general qudit families and Gaussian state families.

4.1.
Quantum local asymptotic normality with specific parametrization. For a quantum system of dimension d < ∞, also known as a qudit, we consider generic states, described by density matrices with full rank and non-degenerate spectrum. To discuss quantum local asymptotic normality, we need to define a specific coordinate system. To this end, we consider the neighborhood of a fixed density matrix ρ_{θ_0}, assumed to be diagonal in the canonical basis of C^d and parametrized as in (68), with spectrum ordered as θ_{0,1} > ⋯ > θ_{0,d−1} > θ_{0,d} > 0. In the neighborhood of ρ_{θ_0}, we parametrize the states of the system as in (69), where U_{θ^R, θ^I} is the unitary matrix defined in (70). Here θ^R and θ^I are vectors of real parameters (θ^R_{j,k})_{1≤j<k≤d} and (θ^I_{j,k})_{1≤j<k≤d}, and F^R_{j,k} (F^I_{j,k}) is the matrix |j⟩⟨k| + |k⟩⟨j| (i|j⟩⟨k| − i|k⟩⟨j|). We note that by this definition the components of θ^R and θ^I are in one-to-one correspondence. The parameter θ = (θ^C, θ^R, θ^I) will be referred to as the Q-LAN coordinate, and the state with this parametrization will be denoted by ρ^{KG}_θ [17,16,37]. Q-LAN establishes an asymptotic correspondence between multicopy qudit states and Gaussian shift models. Using the parametrization θ = (θ^C, θ^R, θ^I), the multicopy qudit model and the corresponding Gaussian shift model are equivalent in terms of the RLD quantum Fisher information matrix: Lemma 7 The RLD quantum Fisher information matrices of the qudit model and the corresponding Gaussian model in Eq. (76) coincide. The calculations can be found in Appendix E. The quantum version of local asymptotic normality has been derived in several different forms [17,16,37], with applications in quantum statistics [38,12], benchmarks [39] and data compression [40]. Here we use the version of [17], which states that n identical copies of a qudit state can be locally approximated by a c-q Gaussian state in the large n limit.
The approximation is in the following sense: Definition 3 (Compact uniformly asymptotic equivalence of models) For every n ∈ N*, let {ρ_{t,n}}_{t∈Θ_n} and {ρ̃_{t,n}}_{t∈Θ_n} be two models of density matrices acting on Hilbert spaces H and K, respectively, where the set of parameters Θ_n may depend on n. We say that the two families are asymptotically equivalent for t ∈ Θ_n, denoted as ρ_{t,n} ≅ ρ̃_{t,n} (t ∈ Θ_n), if there exist a quantum channel T_n (i.e. a completely positive trace-preserving map) mapping trace-class operators on H to trace-class operators on K and a quantum channel S_n mapping trace-class operators on K to trace-class operators on H, both independent of t, satisfying the vanishing-error conditions (74) and (75). Next, we extend asymptotic equivalence to compact uniformly asymptotic equivalence; in this extension, we also describe the order of the convergence.
Given a sequence {a_n} converging to zero, for every t in a compact set K consider two models {ρ_{t,t′,n}}_{t′∈Θ′_n} and {ρ̃_{t,t′,n}}_{t′∈Θ′_n}. We say that they are asymptotically equivalent for t′ ∈ Θ′_n, compact uniformly with respect to t and with order a_n, denoted as ρ_{t,t′,n} ≅_t ρ̃_{t,t′,n} (t′ ∈ Θ′_n, a_n), if for every t ∈ K there exist a quantum channel T_{n,t} mapping trace-class operators on H to trace-class operators on K and a quantum channel S_{n,t} mapping trace-class operators on K to trace-class operators on H satisfying the analogues of (74) and (75) with error of order a_n. Notice that the channels T_{n,t} and S_{n,t} depend on t and are independent of t′.
In the above terminology, Q-LAN establishes an asymptotic equivalence between families of n-copy qudit states and Gaussian shift models. Precisely, one has the following. Proposition 2 (Q-LAN for a fixed parametrization; Kahn and Guta [17,16]) For any x < 1/9, define the set Θ_{n,x} of parameters θ accordingly (‖·‖ denotes the vector norm). Then, we have the compact uniformly asymptotic equivalence (76), where κ is a parameter satisfying κ ≥ 0.027, and N[θ^C, Γ_{θ_0}] is the multivariate normal distribution with mean θ^C and covariance matrix Γ_{θ_0}. The conditions (74) and (75) are not enough to translate precision limits for one family into precision limits for the other. This is because such limits are often expressed in terms of the derivatives of the density matrix, whose asymptotic behaviour is not fixed by (74) and (75). In the following we will establish an asymptotic equivalence in terms of the RLD quantum Fisher information.

4.2.
Quantum local asymptotic normality with generic parametrization. In the following, we explore to what extent Q-LAN in Proposition 2 can be extended. Precisely, we derive a Q-LAN equivalence as in Eq. (76) which is not restricted to the parametrization of Eqs. (69) and (70).
In the previous subsection, we discussed the specific parametrization given in (68); in the following, we discuss a generic parametrization. Given an arbitrary D-invariant model ρ_t^{⊗n} with vector parameter t, we have the following theorem. Theorem 3 (Q-LAN for an arbitrary parametrization) Let {ρ_t}_{t∈Θ} be a k-parameter D-invariant qudit model. Assume that ρ_{t_0} is a non-degenerate state, the parametrization is C² continuous, and J̃_{t_0}^{-1} exists. Then, there exists a constant c(t_0) such that, on the corresponding parameter set, the compact uniformly asymptotic equivalence (78) holds, where J̃_{t_0}^{-1} is the RLD Fisher information at t_0 and κ is a parameter satisfying κ ≥ 0.027.
Proof. We choose the basis {|i⟩}_{i=1}^d that diagonalizes the state ρ_{t_0}. We denote the Q-LAN parametrization based on this basis by ρ^{KG|t_0}_θ; note that this parametrization depends on t_0. It is enough to consider the neighborhood of t_0 with a suitable choice of the constant c(t_0). Let f_{t_0} denote the corresponding Jacobian matrix. Since O(n^{−5/18}) is smaller than n^{−κ}, the combination of this evaluation and (79) yields (80), and the combination of Lemma 5 and (80) implies (78).

The ε-difference RLD Fisher information matrix
In Section 2.1 we evaluated the limiting distribution in the one-parameter case, using the fidelity as a discretized version of the SLD Fisher information. In order to tackle the multiparameter case, we need to develop a similar discretization for the RLD Fisher information matrix, which is the tool of choice in the multiparameter setting (cf. Section 3). In this section we define a discretized version of the RLD Fisher information matrix, extending to the multiparameter case the single-parameter definition introduced by Tsuda and Matsumoto [25], who in turn extended the corresponding classical notion [41,42].

Definition.
Let M = {ρ_t}_{t∈Θ} be a k-parameter model, with the property that ρ_{t_0} is invertible. If the parametrization ρ_t is differentiable, the RLD quantum Fisher information matrix J̃_t can be rewritten as the k × k matrix with entries

(J̃_t)_{ij} = Tr[ρ_t^{-1} (∂_i ρ_t)(∂_j ρ_t)].   (81)

The ε-difference RLD quantum Fisher information matrix J̃_{t_0,ε} is defined by replacing the partial derivatives with finite increments:

(J̃_{t_0,ε})_{ij} := (1/ε²) Tr[ρ_{t_0}^{-1} (ρ_{t_0+εe_i} − ρ_{t_0})(ρ_{t_0+εe_j} − ρ_{t_0})],   (82)

where e_j is the unit vector with 1 in the j-th entry and zero in the other entries.
When the parametrization ρ_t is differentiable, one has

lim_{ε→0} J̃_{t_0,ε} = J̃_{t_0},   (84)

where J̃_{t_0} is the RLD quantum Fisher information matrix (81). When the parametrization is not differentiable, we define the RLD Fisher information matrix J̃_{t_0} to be the limit (84), provided that the limit exists. All throughout this section, we impose no condition on the parametrization ρ_t, except for the requirement that ρ_{t_0} be invertible.
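As a sanity check of the definition, the sketch below compares the ε-difference matrix with the derivative-based expression (81) for a two-parameter qubit family. The family, the evaluation point, and the index convention (J̃)_{ij} = Tr[ρ^{-1}(∂_iρ)(∂_jρ)] are illustrative assumptions; since the family below is affine in t, the two expressions agree for every ε.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def rho(t):
    # two-parameter qubit family with a fixed z-component (illustrative choice)
    return 0.5 * (np.eye(2) + t[0]*sx + t[1]*sy + 0.5*sz)

t0 = np.array([0.2, 0.1])
rinv = np.linalg.inv(rho(t0))

def J_rld_eps(eps):
    """epsilon-difference RLD Fisher matrix: derivatives replaced by increments."""
    d = [(rho(t0 + eps*e) - rho(t0)) / eps for e in np.eye(2)]
    return np.array([[np.trace(rinv @ d[i] @ d[j]) for j in range(2)]
                     for i in range(2)])

# derivative-based RLD matrix; here drho_i = sigma_i / 2 exactly
dr = [0.5*sx, 0.5*sy]
J_rld = np.array([[np.trace(rinv @ dr[i] @ dr[j]) for j in range(2)]
                  for i in range(2)])

# the family is affine in t, so the two expressions agree up to rounding
print(np.max(np.abs(J_rld_eps(1e-3) - J_rld)))
```

Note that the off-diagonal entries are complex while the matrix as a whole is Hermitian, which is exactly the feature the ε-difference definition preserves.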

The ε-difference RLD Cramér-Rao inequality.
A discrete version of the RLD quantum Cramér-Rao inequality can be derived under the assumption of ε-local unbiasedness, defined by replacing derivatives with ε-increments in the local unbiasedness condition. Under the ε-local unbiasedness condition, Tsuda and Matsumoto [25] derived a lower bound on the MSE for the one-parameter case. In the following theorem, we extend the bound to the multiparameter case. Theorem 4 (ε-difference RLD Cramér-Rao inequality) The MSE matrix of an ε-locally unbiased POVM M at t_0 satisfies the bound

V_{t_0}(M) ≥ (J̃_{t_0,ε})^{-1}.   (85)

Proof. For simplicity, we assume that t_0 = 0. For two vectors a ∈ C^k and b ∈ C^k, we define the two observables X := Σ_i a_i ∫ x_i M(dx) and Y := Σ_j b_j (ρ_{t_0+εe_j} − ρ_{t_0}) ρ_{t_0}^{-1} / ε. Then, the Cauchy-Schwarz inequality applied to X and Y yields |Σ_i a_i b_i|² ≤ Tr[X² ρ_{t_0}] ⟨b|J̃_{t_0,ε}|b⟩, where the cross term Tr[X Y ρ_{t_0}] = Σ_i a_i b_i follows from ε-local unbiasedness at t_0. Note that one has Tr[Y† Y ρ_{t_0}] = ⟨b|J̃_{t_0,ε}|b⟩, which implies ⟨a|V_{t_0}(M)|a⟩ ≥ ⟨a|(J̃_{t_0,ε})^{-1}|a⟩. Since a is arbitrary, the last inequality implies (85).
The ε-difference RLD Cramér-Rao inequality can be used to derive an information processing inequality, stating that the ε-difference RLD Fisher information matrix is non-increasing under the application of measurements. For a family of probability distributions {P_t}_{t∈Θ}, we assume that P_{t+εe_j} is absolutely continuous with respect to P_t for every j. Then, the ε-difference Fisher information matrix J_{t,ε} is defined through the Radon-Nikodým derivatives p_{t+εe_j} and p_{t+εe_i} of P_{t+εe_j} and P_{t+εe_i} with respect to P_t; the papers [41,42] defined its one-parameter version in the case where the distributions are absolutely continuous with respect to the Lebesgue measure. Hence, when an estimator t̂ for the distribution family {P_t}_{t∈Θ} is ε-locally unbiased at t_0, in the same way as (85) we can show the ε-difference Cramér-Rao inequality (90). For a family of quantum states {ρ_t}_{t∈Θ} and a POVM M, we denote by J^M_{t,ε} the ε-difference Fisher information matrix of the probability distribution family {P^M_t}_{t∈Θ} defined by P^M_t := Tr M ρ_t. With this notation, we have the following lemma: Lemma 8 For every family of quantum states {ρ_t}_{t∈Θ} and every POVM M, one has the information processing inequality

J^M_{t_0,ε} ≤ J̃_{t_0,ε},   (91)

where J̃_{t_0,ε} is the ε-difference RLD Fisher information of the model {ρ_t}_{t∈Θ}.
Proof. Consider the estimation of t from the probability distribution family {P^M_t}_{t∈Θ}. Following the same arguments used for the achievability of the Cramér-Rao bound with locally unbiased estimators (see, for instance, Chapter 2 of Ref. [32]), it is possible to show that there exists an ε-locally unbiased estimator t̂ at t_0 whose MSE matrix attains (J^M_{t_0,ε})^{-1}. Combining the POVM M with the ε-locally unbiased estimator t̂, we obtain a new POVM M′, which is ε-locally unbiased. Applying Theorem 4 to the POVM M′, we obtain the corresponding matrix inequality, which implies (91).
We stress that (91) is a matrix inequality between Hermitian matrices: in general, J^M_{t_0,ε} has complex entries. Also, note that any classical data processing can be regarded as a POVM. Hence, in the same way as (91), using the ε-difference Cramér-Rao inequality (90), we can show the analogous inequality for a classical process E, where J is the ε-difference Fisher information matrix of the distribution family {P_t}_{t∈Θ} and J′ is the ε-difference Fisher information matrix of the distribution family {E(P_t)}_{t∈Θ}.
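To make the matrix inequality (91) concrete, the sketch below compares, for a two-parameter qubit family, the classical ε-difference Fisher matrix of a fixed four-outcome POVM with the quantum ε-difference RLD matrix, and checks that their difference is positive semidefinite. The POVM, the family, and the convention (J̃_ε)_{ij} ∝ Tr[ρ^{-1}Δ_iΔ_j] are illustrative assumptions.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def rho(t):
    return 0.5 * (np.eye(2) + t[0]*sx + t[1]*sy + 0.5*sz)

t0 = np.array([0.2, 0.1])
eps = 1e-4

# four-outcome POVM: 50/50 mixture of sigma_x and sigma_y basis measurements
vecs = [np.array([1, 1])/np.sqrt(2), np.array([1, -1])/np.sqrt(2),
        np.array([1, 1j])/np.sqrt(2), np.array([1, -1j])/np.sqrt(2)]
povm = [0.5 * np.outer(v, v.conj()) for v in vecs]

def probs(t):
    return np.array([np.real(np.trace(E @ rho(t))) for E in povm])

# classical eps-difference Fisher matrix of the outcome distribution
p0 = probs(t0)
dp = [(probs(t0 + eps*e) - p0) / eps for e in np.eye(2)]
J_cl = np.array([[np.sum(dp[i] * dp[j] / p0) for j in range(2)]
                 for i in range(2)])

# quantum eps-difference RLD Fisher matrix
rinv = np.linalg.inv(rho(t0))
d = [(rho(t0 + eps*e) - rho(t0)) / eps for e in np.eye(2)]
J_q = np.array([[np.trace(rinv @ d[i] @ d[j]) for j in range(2)]
                for i in range(2)])

# information processing: J_q - J_cl should be positive semidefinite (Hermitian)
gap = np.linalg.eigvalsh(J_q - J_cl)
print(gap)
```

Even though J_cl is real and J_q has imaginary off-diagonal entries, the difference is a Hermitian matrix whose eigenvalues are nonnegative, in line with Lemma 8.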

Extended models.
The lemmas in the previous subsection can be generalized to the case of an extended model M′ containing the original model. We denote the ε-difference RLD Fisher information matrix at t′_0 for the family M′ by J̃_{t′_0,ε}. Lemma 9 For an ε-locally unbiased estimator M at t_0, there exists a k × k′ matrix P such that P_{ij} = δ_{ij} for i, j ≤ k and the corresponding bound holds. Proof of Lemma 9: For an ε-locally unbiased estimator M at t_0, there exists a k × k′ matrix P satisfying the stated constraint. Considering the differences of the expectations of the estimate, we obtain the statement as an extension of Theorem 4.
In the same way as Lemma 8, Lemma 9 yields the following lemma. Lemma 10 For any POVM M, there exists a k × k′ matrix P such that P_{ij} = δ_{ij} for i, j ≤ k and the bound (98) holds.

5.4. Asymptotic case. We denote by J̃^n_{t_0,ε} the ε-difference RLD Fisher information matrix of the n-copy states {ρ^{⊗n}_t}_{t∈Θ}. In the following we provide the analogue of Lemma 1 for the RLD Fisher information matrix. Lemma 11 When the parametrization is C¹ continuous, the relations (99) and (100) hold. Proof of Lemma 11: Eq. (100) holds trivially. Using (84), we obtain (99).

Precision bounds for multiparameter estimation
6.1. Covariance conditions. First, we introduce the condition imposed on our estimators; the correspondence between qudit states and Gaussian states also extends to the estimator level. We address a generic state family M = {ρ_t}_{t∈Θ}, with the parameter space Θ being an open subset of R^k. Similarly to the single-parameter case, given a point t_0 ∈ Θ, we consider a local model ρ^n_{t_0,t} := ρ^{⊗n}_{t_0+t/√n}. Throughout this section, we assume that ρ_{t_0} is invertible. For a sequence of POVMs m := {M_n}, we introduce the condition of local asymptotic covariance as follows: Condition 2 (Local asymptotic covariance) We say that a sequence of measurements m := {M_n} satisfies local asymptotic covariance at t_0 ∈ Θ under the state family M if the probability distribution ℘^n_{t_0,t|M_n}(B) := Tr ρ^n_{t_0,t} M_n(B) converges to a limiting distribution ℘_{t_0,t|m} and the covariance relation holds for any t ∈ R^k.² When we need to refer to the outcome of ℘^n_{t_0,t|M_n} or ℘_{t_0,t|m}, we denote it by t̂.
Further, we say that a sequence of measurements m := {M n } satisfies local asymptotic covariance under the state family M when it satisfies local asymptotic covariance at any element t 0 ∈ Θ under the state family M.
Under these preparations, we obtain the following theorem by using Theorem 3. Theorem 5 Let {ρ^{⊗n}_t}_{t∈Θ} be a k-parameter D-invariant qudit model with C² continuous parametrization. Assume that J̃^{-1}_{t_0} exists, ρ_{t_0} is a non-degenerate state, and a sequence of measurements m := {M_n} satisfies local asymptotic covariance at t_0 ∈ Θ. Then there exists a covariant POVM M^G such that ℘_{t_0,t|m}(B) = Tr G[t, J̃^{-1}_{t_0}] M^G(B) for any vector t and any measurable subset B. Here J̃_{t_0} is the RLD Fisher information of the qudit model at t_0.
To show Theorem 5, we will use the following lemma.
Here ξ and y are k-dimensional vectors, |y⟩ is a (multimode) coherent state, the γ_j are thermal parameters of the Gaussian state, and F^{-1}_{ξ→y}(g) denotes the inverse of the Fourier transform F_{ξ→y}(g) := ∫ dξ e^{iξ·y} g(ξ). Therefore, for a given function f(α), there exists a unique operator F satisfying (105).

² The range of t is determined via the constraint t_0 + t/√n ∈ Θ. Just as in the one-parameter case, t can take any value in R^k when n is large enough. The range of the local parameter is then t ∈ R^k.
The proof can be found in Appendix F. Now, we are ready to prove Theorem 5.
Proof of Theorem 5: We consider without loss of generality G[t, J̃^{-1}_{t_0}] to be in the canonical form, noticing that any Gaussian state is unitarily equivalent to a Gaussian state in canonical form, as shown by Lemma 3. For any measurable set B, we define the operator M^G(B) accordingly. What remains to be shown is that the POVM {M^G(B)} satisfies the covariance condition. Eq. (108) guarantees the two required relations, and the uniqueness of the operator satisfying condition (105) implies the covariance condition.

6.2. MSE bound for the D-invariant case. Next, we derive a lower bound on the MSE of the limiting distribution for any D-invariant model. As an extension of the mean square error, we introduce the mean square error matrix (MSE matrix), defined as

V_{ij}[℘] := ∫ (t̂_i − t_i)(t̂_j − t_j) ℘(dt̂)

for a generic probability distribution ℘. Since the set of symmetric matrices is not totally ordered, we will consider the minimization of the expectation value tr W V[℘_{t_0,t|m}] for a given weight matrix W ≥ 0; for short, we will refer to the quantity tr W V[℘_{t_0,t|m}] as the weighted MSE. Under local asymptotic covariance, one can derive lower bounds on the covariance matrix of the limiting distribution and construct optimal measurements to achieve them. In general, the attainability of the conventional quantum Cramér-Rao bounds is a challenging issue. For instance, a well-known bound is the symmetric logarithmic derivative (SLD) Fisher information bound tr W V ≥ tr(W J^{-1}_{t_0}), where J_{t_0} is the SLD Fisher information. The SLD bound is attainable in the single-parameter case, i.e. when k = 1, yet it is in general not attainable for multiparameter estimation (see Subsection 10.1 for a concrete example).
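As a purely classical illustration of the MSE matrix and the weighted MSE (all numbers here are hypothetical), both can be estimated from samples of a limiting distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.array([0.0, 0.0])                      # true local parameter
V_true = np.array([[1.0, 0.3], [0.3, 2.0]])   # covariance of the limiting distribution
samples = rng.multivariate_normal(t, V_true, size=200_000)

# empirical MSE matrix V_ij = E[(that_i - t_i)(that_j - t_j)]
dev = samples - t
V_emp = dev.T @ dev / len(dev)

W = np.diag([1.0, 0.5])                       # weight matrix
weighted_mse = np.trace(W @ V_emp)            # close to tr(W V_true) = 2.0
print(weighted_mse)
```

The weight matrix trades off the errors on the two components; the bounds below lower-bound exactly this scalar figure of merit.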
In the following, we derive an attainable lower bound on the weighted MSE. To this purpose, we define the set LAC(t_0) of locally asymptotically covariant sequences of measurements at the point t_0 ∈ Θ. For a model M, we focus on the minimum weighted MSE over LAC(t_0). When k ≥ 2, a better choice than the SLD bound is the RLD quantum Fisher information bound.
The main result of this section is an attainable bound on the weighted MSE, relying on the RLD quantum Fisher information. Theorem 6 (Weighted MSE bound for D-invariant models) Assume that J̃^{-1}_{t_0} exists, and consider any sequence of locally asymptotically covariant measurements m := {M_n}. The MSE matrix of the limiting distribution satisfies

V[℘_{t_0,t|m}] ≥ J̃^{-1}_{t_0},   (112)

where J̃_{t_0} is the RLD quantum Fisher information. When the model is C¹ continuous and D-invariant, we have the following bound on the weighted MSE with weight matrix W ≥ 0 of the limiting distribution:

tr W V[℘_{t_0,t|m}] ≥ tr(W J^{-1}_{t_0}) + (1/2) tr |√W J^{-1}_{t_0} D_{t_0} J^{-1}_{t_0} √W|,   (113)

where J_{t_0} is the SLD quantum Fisher information (35) and D_{t_0} is the D-matrix (41). When S is a D-invariant qudit model and the state ρ_{t_0} is not degenerate, the bound (113) is attained. Moreover, if W > 0 and ℘_{t_0,0|m} has a differentiable PDF, the equality in (113) holds if and only if ℘_{t_0,t|m} is the normal distribution with average zero and covariance V_{t_0|W}. Further, when {ρ_t}_{t∈Θ} is a qudit model with C² continuous parametrization, the equality in (113) is attainable, i.e., there exist a sequence of POVMs M^{t_0,n}_W, a compact set K, and a constant c(t_0) such that the convergence (114) holds, where κ is a parameter satisfying κ ≥ 0.027. In the following, we prove Theorem 6 in three steps. The first step is to derive the bound (113). The second step is to show that, to achieve the equality, the limiting distribution needs to be a Gaussian with a specific covariance. The last step is to find a measurement attaining the equality. In this way, when the state is not degenerate, we can construct the measurement using Q-LAN.³
Proof of Theorem 6: Impossibility part⁴ (proofs of (112) and (113)): We focus on the ε-difference RLD Fisher information matrix J̃_{t_0,ε} at t_0 for the quantum state family {ρ_t}_{t∈Θ}. We denote the ε-difference Fisher information matrices of the distribution families {℘^n_{t_0,t|M_n}}_t and {℘_{t_0,t|m}}_t by J^n_{t,ε} and J^m_{t,ε}, respectively, and employ the notations of Section 5.4. Applying (91) to the POVM M_n, we obtain (117). By taking the limit n → ∞, the combination of (117), (99) of Lemma 11, and (118) yields a bound on J^m_{t,ε}. Here, in the same way as in the proof of Theorem 2, we can assume that the outcome t̂ satisfies the unbiasedness condition. Hence, the ε-difference Cramér-Rao inequality yields the corresponding bound on the limiting MSE matrix, and by taking the limit ε → 0, (100) of Lemma 11 implies (112).

³ We note that state estimation involving degenerate states has been solved only in a few special cases (see, e.g., [43] for the case of maximally mixed qubits). ⁴ Note that when the state is not degenerate, the proof of the impossibility part of Theorem 6 can be simplified using the correspondence between n-copy qudit states and Gaussian states established by Q-LAN.
When the model is C¹ continuous and D-invariant, adding the conventional discussion for MSE bounds (see, e.g., Chapter 6 of [5]) to (120), we obtain (113). Achievability part (proof of (114)): Next, we discuss the attainability of the bound when W > 0 and ℘_{t_0,0|m} has a differentiable PDF. In this case, we have the Fisher information matrix J^m_0 of the location shift family {℘_{t_0,t|m}}_t. Taking the limit ε → 0 in (120), we obtain (122). The equality in (113) holds if and only if V[℘_{t_0,t|m}] = V_{t_0|W} and the equality in the first inequality of (122) holds. By the same discussion as in the proof of Theorem 2, the equality in the first inequality of (122) holds only when all the components of the logarithmic derivative of the distribution family {℘_{t_0,t|m}}_t are linear combinations of the components of the estimate t̂. This condition is equivalent to the condition that {℘_{t_0,t|m}}_t is a family of shifted normal distributions. Therefore, when W > 0, the equality condition of Eq. (113) is that ℘_{t_0,t|m} is the normal distribution with average zero and covariance matrix V_{t_0|W}. Now, we assume that the state ρ_{t_0} is not degenerate. Then, we use Q-LAN to show that there always exists a sequence of POVMs m = {M_n} satisfying the above property. We rewrite Eq. (78) of Theorem 3 as follows.
The resulting lim sup statement uses the same notation as Theorem 3. Then, we choose the covariant POVM M^{J̃}_W constructed above. Meanwhile, since W > 0, we can repeat the above argument to find a qudit measurement that attains tr W J_ε^{-1}. Taking the limit ε → 0, the quantity tr W J_ε^{-1} converges to the equality value of Eq. (114). Therefore, we can still find a sequence of measurements with Fisher information {J_ε} that approaches the bound.

Precision bound for the estimation of generic models.
In the previous subsection, we established the precision bound for D-invariant models, where the bound is attainable and has a closed form. Here we extend the bound to arbitrary n-copy qudit models. The main idea is to extend the model to a larger D-invariant model by introducing additional parameters. When estimating parameters in a generic model S (consisting, for instance, of states generated by noisy evolutions), the bound (113) may not hold. It is then convenient to extend the model to a D-invariant model S′ which contains S. Since the bound (113) holds for the new model S′, a corresponding bound can be derived for the original model S. The new model S′ has some additional parameters besides those of S, which are fixed in the original model S. Therefore, a generic quantum state estimation problem can be regarded as an estimation problem in a D-invariant model with fixed parameters. The task is to estimate the parameters of a model S (globally) parametrized as t′_0 = (t_0, p_0) ∈ Θ′, where p_0 is a fixed vector and Θ′ is an open subset of R^{k′} that equals Θ when restricted to R^k. In the neighborhood of t′_0, since the vector p_0 is fixed, we have t′ = (t, 0), with 0 being the null vector of R^{k′−k} and t ∈ R^k a vector of free parameters. In this scenario, only the parameters in t need to be estimated, while p_0 is known. Hence, the MSE matrix of t′ has the block-diagonal form diag(V[℘_{t_0,t|m}], 0) for any locally asymptotically covariant measurement sequence m. Due to this block-diagonal form, to discuss the weight matrix W in the original model S, we consider the weight matrix W′ = P^T W P in the D-invariant model S′, where P is any k × k′ matrix satisfying the constraint (P)_{ij} := δ_{ij} for i, j ≤ k, in the following way. Theorem 7 (MSE bound for generic models) The models S and S′ are C¹ continuous and are given as in Proposition 1, with the same notation as there; we also assume that J̃^{-1}_{t′_0} exists.
Consider any sequence of locally asymptotically covariant measurements m := {M_n}. Then, the MSE matrix of the limiting distribution ℘_{t_0,t|m} is evaluated as follows: there exists a k × k′ matrix P with P_{ij} = δ_{ij} for i, j ≤ k such that (125) holds, where J̃_{t′_0} is the RLD quantum Fisher information of S′. When the model M is D-invariant, we have the bound (126) for the weighted MSE with weight matrix W ≥ 0 of the limiting distribution, with C_{H,M}(W, t_0) defined in Eq. (46). When the model M is a D-invariant qudit model and the state ρ_{t_0} is not degenerate, the bound is attained. Moreover, if W > 0 and ℘_{t_0,0|m} has a differentiable PDF, the equality in (126) holds if and only if ℘_{t_0,t|m} is the normal distribution with average zero and covariance determined by the vector X realizing the minimum in (49). Further, when the models S and S′ are qudit models with C² continuous parametrization, the equality in (126) is attainable, i.e., there exist a sequence of POVMs M^{t_0,n}_W, a compact set K, and a constant c(t_0) such that (127) and (128) hold. Theorem 7 determines the ultimate precision limit for generic qudit models. Now, we compare it with the most general existing bound on quantum state estimation, namely Holevo's bound [5]. Let us define the ultimate precision of unbiased measurements as in (129). Since the Holevo bound still holds in the n-copy case (see [15, Lemma 4]), we have (130). There are a couple of differences between our results and existing results. The Holevo bound is derived under the unbiasedness assumption, which, as mentioned earlier, is more restrictive than local asymptotic covariance; our bound (126) thus applies to a wider class of measurements than the Holevo bound. Furthermore, Yamagata et al. [19] showed a statement similar to (128) of Theorem 7 in a local model scenario, but they did not show the compact uniformity of the convergence and gave no order estimate for it, whereas our evaluation (128) guarantees compact uniformity together with an order estimate. Moreover, they did not discuss an estimator attaining the bound globally.
Later, we will construct an estimator to attain our bound globally based on the estimator given in Theorem 7. Our detailed evaluation with the compact uniformity and the order estimation enables us to evaluate the performance of such an estimator globally.
Proof of Theorem 7: Impossibility part (proofs of (125) and (126)): We denote the ε-difference Fisher information matrices for the distribution families {℘^n_{t_0,t|M_n}}_t and {℘_{t_0,t|m}}_t by J^n_{t,ε} and J^m_{t,ε}, respectively. Also, we denote the ε-difference type RLD Fisher information matrix at t'_0 = (t_0, 0) of the family {ρ^{⊗n}_{t'}}_{t'} by J̃^n_{t'_0,ε}. Then, we have (118) in the same way. Applying (98) of Lemma 10 with ε → ε/√n, there exist k × k' matrices P_n such that (131) holds. Hence, the combination of (99) of Lemma 11, (131), and (118) implies that there exists a k × k' matrix P such that (132) holds. For the same reason as (120), we have (134). Taking the limit ε → 0, the combination of (100) of Lemma 11 and (134) implies (125). When the model M' is D-invariant, we obtain (126) by using the expression (50) in the same way as (113). Achievability part (proof of (127)): Since ρ_{t'_0} is not degenerate, we can show the achievability in the same way as in Theorem 6, because we can apply Q-LAN (Theorem 3) to the model M'. The difference is the following: choosing the matrix P to achieve the minimum (50), we employ the covariant POVM MJ

Nuisance parameters
For state estimation in a noisy environment, the strength of the noise is not a parameter of interest, yet it affects the precision of estimating the other parameters. In this scenario, the strength of the noise is a nuisance parameter [44,45]. To illustrate the difference between nuisance parameters and the fixed parameters discussed in the previous section, let us consider the case of a qubit clock state undergoing a noisy time evolution, studied in [29]. To estimate the duration of the evolution, we introduce the strength of the noise as an additional parameter and consider the estimation problem in the extended model parameterized by the duration and the noise strength, distinguishing between the following two cases. In one case, the strength of the noise is known due to, e.g., some preprocessing procedure, and thus one of the two parameters is fixed. Knowing the value of this parameter, we can design an optimal estimation strategy, which will depend parametrically on the strength of the noise. In the other case, instead, the strength of the noise may be unknown. Although it is not a parameter of interest, its value will affect the precision of our estimation, and thus it should be treated as a nuisance parameter.

Precision bound for estimation with nuisance parameters.
In this subsection, we consider state estimation for an arbitrary (k + s)-parameter model {ρ_{t,p}}_{(t,p)∈Θ̃}, where t and p are k-dimensional and s-dimensional parameters, respectively. Our task is to estimate only the parameters t; it is not required to estimate the other parameters p, which are called nuisance parameters. Hence, our estimate is k-dimensional. We call a parametric family with such a structure of nuisance parameters a nuisance parameter model, and denote it by S̃ = {ρ_{t,p}}_{(t,p)∈Θ̃}. We abbreviate (t, p) as t̃.
The concept of local asymptotic covariance can be extended to a model with nuisance parameters by considering the local model ρ^n_{t̃_0,t̃} := ρ^{⊗n}_{t̃_0+t̃/√n}. Throughout this section, we assume that ρ_{t̃_0} is invertible and that all parametrizations are at least C¹ continuous.

Condition 3 (Local asymptotic covariance with nuisance parameters)
We say that a sequence of measurements m := {M_n} to estimate the k-dimensional parameter t satisfies local asymptotic covariance at t̃_0 = (t_0, p_0) ∈ Θ̃ under the nuisance parameter model M̃ when the probability distribution ℘^n_{t̃_0,t̃|M_n} satisfies the covariance relation for any t̃ = (t, p) ∈ R^{k+s}. Further, we say that a sequence of measurements m := {M_n} satisfies local asymptotic covariance under the nuisance parameter model M̃ when it satisfies local asymptotic covariance at every element t̃_0 ∈ Θ̃ under the nuisance parameter model M̃.
The quantity we want to bound is tr W V[℘_{t̃_0,t|m}], where ℘_{t̃_0,t|m} is the limiting distribution of a sequence m of locally asymptotically covariant measurements and W is a weight matrix. Since the nuisance parameters are not of interest, the weight matrix of the model S̃ is a (k + s) × (k + s) matrix whose only nonzero block is the k × k block W acting on the parameters of interest.

Lemma 13 Let S̃ = {ρ_t̃}_{t̃∈Θ̃} be a (k + s)-parameter nuisance parameter model and let S' = {ρ_{t'}}_{t'=(t̃,q)} be a k'-parameter model containing S̃ as ρ_t̃ = ρ_{(t̃,0)}. When S' is D-invariant and the inverse J̃^{-1}_{t'_0} exists, we have (139) and (140). In (139), V is a real symmetric matrix and X = (X_i) is a k'-component vector of operators satisfying the stated condition. In (140), the minimization is taken over all k × k' matrices satisfying the constraint (P)_{ij} := δ_{ij} for i ≤ k, j ≤ k + s, and J_{t'_0} and D_{t'_0} are the SLD Fisher information matrix and the D-matrix [cf. Eqs. (35) and (41)] for the extended model S' at t'_0 := (t̃_0, 0). This lemma is a statement different from [15, Theorem 4]; however, it can be shown using the method of [15, Theorem 4].
In the following, we derive an attainable lower bound for the weighted MSE. To this purpose, we define the set LAC(t̃_0) of locally asymptotically covariant sequences of measurements at the point t̃_0 ∈ Θ̃ for the nuisance parameter model M̃, and focus on the minimum value of the weighted MSE over this set.

Theorem 8 (Weighted MSE bound with nuisance parameters) The models S̃ and S' are given in the same way as in Lemma 13, with the same notation. Also, we assume that (J̃_{t'_0})^{-1} exists. Consider any sequence of locally asymptotically covariant measurements m := {M_n} for the nuisance parameter model S̃. Then, the MSE matrix of the limiting distribution is evaluated as follows. There exists a k × k' matrix P such that (144) holds. Moreover, if W > 0 and ℘_{t̃_0,t|m} has a differentiable PDF, the equality in (145) holds if and only if ℘_{t̃_0,t|m} is the normal distribution with average zero and covariance (146), where X is the vector realizing the minimum (139). Further, when the models S̃ and S' are qudit models with C² continuous parametrization, the equality in (145) is attained, i.e., there exist a sequence of POVMs M^{t̃_0,n}_W, a compact set K, and a constant c(t̃_0) such that (147) holds. Here κ is a parameter satisfying κ ≥ 0.027.
Before proving Theorem 8, we discuss a linear subfamily of the k'-dimensional Gaussian family {G[t', γ]}_{t'∈R^{k'}}. Consider a linear map T from R^{k+s} to R^{k'}. We have the subfamily M̃ := {G[T(t, p), γ]}_{(t,p)∈R^{k+s}} as a nuisance parameter model. The covariance condition is then extended accordingly, and we have the following corollary of Lemma 6.

Corollary 3 For any weight matrix W ≥ 0, the nuisance parameter model M̃ = {G[T(t, p), γ]} with C¹ continuous parametrization satisfies (141), where UB_M̃ and CUB_M̃ are the sets of unbiased estimators and covariant unbiased estimators of the nuisance parameter model M̃, respectively. Further, when W > 0, we choose a vector X realizing the minimum in (49). The above infimum is attained by the covariant unbiased estimator M_W whose output distribution is the normal distribution with average t and the covariance matrix stated above.

This corollary can be shown as follows. The inequality inf_{M∈UB_M̃} tr W V_t(M) ≥ C_{NH,M̃}(W, t) follows from the condition (141). Similarly to Corollary 2, Proposition 1 guarantees that the latter part of the corollary with W > 0 follows from (139) and Lemma 6. Hence, we obtain the corollary for W > 0. The case of a not strictly positive W can be shown in the same way as Corollary 1.
Proof of Theorem 8: Impossibility part (proofs of (144) and (145)): We denote the ε-difference Fisher information matrix of {℘_{t̃_0,t̃|m}}_t̃ by J^m_{t̃_0,ε}. Due to (133), there exists a (k + s) × k matrix P̃ satisfying the following conditions. We define the k × (k + s) matrix P̄ accordingly, and denote the relevant quantities with respect to ℘_{t̃_0,0|m} by p_i. Then, for two vectors a ∈ R^k and b ∈ R^{k+s}, we apply the Schwarz inequality to the two variables defined accordingly, where the final equation follows from the fact that the expectation of the variable t̂ − E_{(0,0)}[t̂] equals t = P̄t̃ under the distribution ℘_{t̃_0,t̃|m}, which can be shown from the covariance condition for the distribution family {℘_{t̃_0,t̃|m}}_t̃.
we obtain (145) by using the expression (140) in the same way as (126). Achievability part (proof of (146)): Since ρ_{t_0} is not degenerate, we can show the achievability part in the same way as in Theorem 6, because we can apply Q-LAN (Theorem 3) to the model M'. The difference is the following: instead of Corollary 1, we employ Corollary 3 to choose the covariant POVM MJ

A few comments are in order. First, the nuisance parameter bound (145) reduces to the bound (113) when the parameters to estimate are orthogonal to the nuisance parameters, in the sense that the RLD Fisher information matrix J̃_{t̃_0} is block-diagonal. This orthogonality is equivalent to the condition that the SLD Fisher information matrix J_{t̃_0} and the D-matrix take block diagonal forms. This is the case, for instance, for the simultaneous estimation of the spectrum and the Hamiltonian-generated phase of a two-level system. Under such circumstances, the inversion of the Fisher information matrix can be carried out by inverting J_{t_0} and J_N independently. The same precision bound is thus obtained with or without introducing nuisance parameters, and we have the following lemma.

Lemma 15 When all nuisance parameters are orthogonal to the parameters of interest, the bound with nuisance parameters (145) coincides with the D-invariant MSE bound (113).
In the case of orthogonal nuisance parameters, the estimation of the nuisance parameters does not affect the precision of estimating the parameters of interest; this no longer holds in the generic case of non-orthogonal nuisance parameters. Thanks to this fact, one can achieve the bound (145). The above corollary offers a simple criterion for the important problem of the attainability of RLD bounds. In Section 10.3, we will illustrate the application of this criterion with a concrete example. The bound (145) can be straightforwardly computed even for complex models; for D-invariant models, the SLD operators have a uniform entry-wise expression, and one only needs to plug it into a program to evaluate the bound (145). Moreover, the bound does not rely on the explicit choice of nuisance parameters. To see this, one can consider another parameterization x' of the D-invariant model. The bound (145) comes from the RLD bound for the D-invariant model, and the RLD quantum Fisher information matrices J̃_{t'_0} and J̃_{x'_0} for the two parameterizations are connected by the equation J̃_{t'_0} = A J̃_{x'_0} Aᵀ, where A is the Jacobian (∂x'/∂t') at t'_0. Since both parameterizations are extensions of the same model S satisfying P_0 t'_0 = P_0 x'_0 = t_0, the Jacobian takes a block form, and the two parameterizations yield the same bound. The main result of this subsection is the following corollary.

Corollary 5 Define the SLD gap of o_i as the difference between MSE_{o_i} and the corresponding diagonal entry of the inverse SLD Fisher information, where MSE_{o_i} denotes the MSE of o_i under the joint measurement and J is the SLD quantum Fisher information. The sum of the SLD gaps for all observables satisfies the attainable bound (158), where D is the D-matrix.
The right-hand side of Eq. (158) is exactly the gap between the SLD bound and the ultimate precision limit. It provides a typical example in which the SLD bound is not attainable.
Proof. Substituting W in Eq. (145) by the projection into the subspace R k , we obtain a bound for the MSE {MSE oi } of the limiting distributions: Here J and D are the SLD Fisher information and D-matrix for the extended model, and (A) k×k denotes the upper-left k ×k block of a matrix A. Substituting the above definition into Eq. (159), we obtain Corollary 5.
Specifically, for the case of two parameters, the bound (158) reduces to the form (161), where L̂_j := Σ_i (J^{-1})_{ji} L_i are the SLD operators in the dual space. Next, taking the partial derivative with respect to o_j on both sides of Eq. (156) and substituting the definition of the RLD operators, the observables satisfy the orthogonality relation with the SLD operators. By the uniqueness of the dual basis, we obtain the corresponding relation for i = 1, . . . , k, and the bound becomes (161). Another bound expressing the tradeoff between Δ_{o_1} and Δ_{o_2} was obtained by Watanabe et al. [46]. Now, substituting O_2 with αO_2 for a variable α ∈ R in Eq. (161), we obtain a quadratic inequality in α. For this quadratic inequality to hold for every α ∈ R, its discriminant must be non-positive, which immediately implies the bound (162). Notice that the bound (162) was derived under asymptotic unbiasedness [46], and thus it was not guaranteed to be attainable. Here, instead, since our bound (161) is always attainable, the bound (162) can also be achieved in any qudit model under the local asymptotic covariance condition.
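The discriminant step in this argument is purely algebraic and can be checked symbolically. In the sketch below, A, B, and C are placeholder coefficients (not the paper's notation) for the quadratic in α obtained after the substitution O_2 → αO_2:

```python
import sympy as sp

# Placeholder coefficients for the quadratic in alpha obtained after the
# substitution O_2 -> alpha*O_2; their concrete form is given in the text.
alpha, A, B, C = sp.symbols('alpha A B C', real=True)
q = A * alpha**2 + B * alpha + C

# If q(alpha) >= 0 for every real alpha (with A > 0), the discriminant
# must be non-positive, i.e. B**2 - 4*A*C <= 0, which is the inequality
# behind the tradeoff bound (162).
disc = sp.discriminant(q, alpha)
assert sp.simplify(disc - (B**2 - 4 * A * C)) == 0

# Sanity check on a concrete nonnegative quadratic: (alpha - 1)**2.
print(sp.discriminant((alpha - 1)**2, alpha))  # -> 0
```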

Nuisance parameters versus fixed parameters.
It is natural to ask what the relationship is between the nuisance parameter bound (145) and the general bound (126). To see this, let S = {ρ_t}_{t∈Θ} be a generic k-parameter qudit model and let S̃ be a (k + s)-parameter D-invariant model containing S. When ρ_{t_0} is non-degenerate, we notice that the quantum Cramér-Rao bound with nuisance parameters (145) can be rewritten as (163), where P_0 is a k × (k + s) matrix satisfying the constraint (P_0)_{ij} := δ_{ij} for any i, j ≤ k + s. By definition, P_0 is a special case of P, and it follows straightforwardly from comparing Eq. (163) with Eq. (126) that the general MSE bound is upper bounded by the MSE bound for the nuisance parameter case. This observation agrees with the obvious intuition that having additional information on the system is helpful for (or at least, not detrimental to) estimation. Finally, since J̃_{t̃_0} and D̃_{t̃_0} are block-diagonal in the case of orthogonal nuisance parameters, we have P(J̃_{t̃_0})^{-1}Pᵀ = J^{-1}_{t_0} and P D̃_{t̃_0} Pᵀ = D_{t_0} for any k × (k + s) matrix P satisfying the constraint (P)_{ij} := δ_{ij} for i, j ≤ k. This implies that the general bound (126) coincides with the nuisance parameter bound (145) when the nuisance parameters are orthogonal.

Tail property of the limiting distribution
In the previous discussions, we focused on the MSE of the limiting distribution. Here, instead, we consider the behavior of the limiting distribution itself. The characteristic property is the tail property: given a weight matrix W ≥ 0 and a constant c, we define the tail region T_{W,c}(t). For a measurement m = {M_n}, the probability that the estimate t̂_n falls in the tail region can be approximated by the tail probability of the limiting distribution, i.e.,

Prob[t̂_n ∈ T_{W,c}(t)] = ℘_{t_0,t|m}(T_{W,c}(t)) + ε_n,

where ε_n is a term vanishing in n. The tail property is usually harder to characterize than the MSE. Nevertheless, here we show that, under certain conditions, there exists a good bound on the tail property of the limiting distribution.
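As a numerical sketch of this approximation, assuming for illustration that the limiting distribution is a zero-mean normal and that the tail region is the set where the weighted quadratic deviation exceeds c (the matrices W, V and the constant c below are purely illustrative), the tail probability of the limiting distribution can be estimated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2x2 weight matrix W and limiting covariance V.
W = np.array([[2.0, 0.5], [0.5, 1.0]])
V = np.array([[1.0, 0.2], [0.2, 0.5]])
c = 3.0

# Sample from the zero-mean normal limiting distribution N(0, V).
samples = rng.multivariate_normal(mean=np.zeros(2), cov=V, size=200_000)

# Tail region T_{W,c}: points whose weighted quadratic deviation exceeds c.
quad = np.einsum('ni,ij,nj->n', samples, W, samples)
tail_prob = np.mean(quad > c)
print(f"estimated tail probability: {tail_prob:.4f}")
```

The same recipe applies to any candidate limiting distribution, which is how the bound of Theorem 9 below can be checked numerically for concrete models.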
8.1. Tail property of Gaussian shift models. As in the previous sections, the tail property of n-copy qudit models can be analyzed by studying the tail property of Gaussian shift models. In this subsection, we first derive a bound on the tail probability of Gaussian shift models. The result is of interest in its own right and can be used for further analysis of qudit models via Q-LAN.
Consider a Gaussian shift model G[α, β] and a measurement M^G(α̂), and define the probability ℘_{α|M^G}(T_{W,c}(α)), where T_{W,c}(α) is the tail region around α defined analogously. For covariant POVMs, the tail probability is independent of α, and we have the following bound, which can be attained by a certain covariant POVM. Consider a Gaussian shift model with s classical parameters and s pairs of quantum parameters. Assume that a POVM {M^G(B)}_{B⊂R^s×R^{2s}} is covariant and that the weight matrix W has the block form with W_C ≥ 0. Then, the tail probability of the limiting distribution is bounded as in (165), where e is the 2s-dimensional vector with all entries equal to 1. For the definition of E_s(e^{-β} + e/2), see (56). When the POVM M^G is given as M^G(B) = ∫_B |α_1, . . . , α_s⟩⟨α_1, . . . , α_s| dα, the equality in (165) holds.
The proof can be found in Appendix G. When the model has a group covariance, a similar evaluation might be possible. For example, such an evaluation was carried out for n copies of the full pure-state family [47] and for n copies of the squeezed-state family [48, Section 4.1.3].

Tail property of D-invariant qudit models.
For a k-parameter D-invariant model {ρ_t}, using Lemma 16 and Q-LAN, we have the following theorem.

Theorem 9 Let {ρ_t}_{t∈Θ} be a k-parameter D-invariant model. Assume that (J_{t_0})^{-1} exists, that ρ_{t_0} is a non-degenerate state, and that a sequence of measurements m := {M_n} satisfies local asymptotic covariance at t_0 ∈ Θ. Then the tail probability of the limiting distribution satisfies the stated bound. The equality holds if and only if ℘_{t_0,t|m} is the normal distribution with average zero and covariance V_{t_0|W} as defined in Eq. (115).

We note that bounds on probability distributions are usually more difficult to obtain, and more informative, than MSE bounds, since the MSE is determined by the probability distribution. Theorem 9 provides an attainable bound on the tail probability, which can be used to determine the maximal probability that the estimate falls into a confidence region T_{W,c}, as well as the optimal measurement. Our proof of Theorem 9 needs some preparation. First, we introduce the concept of simultaneous diagonalization in the sense of symplectic transformations. Two 2k × 2k real symmetric matrices A_1 and A_2 are called simultaneously symplectically diagonalizable when there exist a symplectic matrix S and two real vectors β_1 and β_2 such that the corresponding diagonalization holds, with E_k defined in Eq. (56). Regarding the simultaneous diagonalization, we have the following property, whose proof can be found in Appendix H:

Lemma 17 The following conditions are equivalent for two 2k × 2k real symmetric matrices A_1 and A_2.
Using Lemma 17, we obtain the following lemma.

Proof of Theorem 9: Define
Applying a suitable orthogonal transformation, we assume that A_2 = Ω_k without loss of generality.
Step 1: For simplicity, we assume that there is no classical part. First, we choose an orthogonal matrix S such that SᵀA_2S = D. Lemma 18 then guarantees that the commutation condition allows us to simultaneously diagonalize W and D_{t_0}; that is, we can choose suitable symplectic matrices. For a sequence of measurements m := {M_n} satisfying local asymptotic covariance at t_0 ∈ Θ, according to Theorem 5, we choose a covariant POVM M^G satisfying (104). Applying Lemma 16 to the POVM M^G, we obtain the desired statement.
Step 2: We consider the general case. Now, we choose the local parameter t' := J^{-1/2}_{t_0} t. In this coordinate, the inverse of the RLD quantum Fisher information takes a form in which the weight matrix has no cross term between the classical and quantum parts. Using the above discussion and Lemma 16, we obtain the desired statement.

Extension to global estimation and generic cost functions
In the previous sections, we focused on local models and cost functions of the form tr W V[℘_{t_0,t|m}]. In this section, our treatment is extended to global models {ρ_t}_{t∈Θ} (where the parameter to be estimated is not restricted to a local neighborhood) and to generic cost functions.

Optimal global estimation via local estimation.
Our optimal global estimation is obtained by combining the two-step method with locally optimal estimation. That is, the first step applies the full tomography proposed in [26] to n^{1-x/2} copies, producing the outcome t̂_0, for a constant x ∈ (0, 2/9); the second step applies the locally optimal estimation at t̂_0, given in Section 6.3, to the remaining a_{n,x} := n − n^{1-x/2} copies. Before its full description, we define the neighborhood Θ_{n,x}(t) of t ∈ Θ. Given a generic model M = {ρ_t}_{t∈Θ} that does not contain any degenerate state and a weight matrix W > 0, we describe the full protocol as follows.
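The copy budget of the two-step method can be sketched numerically; the values of n and x below are illustrative, and the protocol details are as described in the text:

```python
import numpy as np

# Illustrative sketch of the copy budget in the two-step method:
# n^{1-x/2} copies go to the tomography step, and the remaining
# a_{n,x} = n - n^{1-x/2} copies go to the locally optimal measurement.
def copy_split(n, x):
    n_tomo = int(np.floor(n ** (1 - x / 2)))
    return n_tomo, n - n_tomo

for n in (10**4, 10**6, 10**8):
    n_tomo, a_nx = copy_split(n, x=0.1)
    # The tomography fraction vanishes as n grows, so a_{n,x}/n -> 1.
    print(n, n_tomo, a_nx / n)
```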
for a compact set K ⊂ Θ, where V_{t_0|W} is defined in Eq. (147) and Θ_{n,x,c(t_0)} is defined in Eq. (77). Further, when the parameter set Θ is bounded and x < κ, we have the following relation.
Here, we remark on the key point of the derivation. The existing papers [8,11] addressed the achievability of min_M tr W J^{-1}_{t|M} with the two-step method, where J_{t|M} is the Fisher information matrix of the distribution family {℘_{t|M}}_t, which expresses the bound for separable measurements [32, Exercise 6.42]; hence it can be called the separable bound. In the one-parameter case, the separable bound equals the Holevo bound. To achieve the separable bound, one does not need to consider a sequence of measurements, and hence no complicated convergence has to be handled; the global achievability of the separable bound can be easily shown by the two-step method [8,11]. In our setting, however, we need to handle a sequence of measurements to achieve local optimality. Hence, we need to carefully consider the compact uniformity and the order estimate of the convergence in Theorem 7. In the following proof, we employ our evaluation with such a detailed analysis as in Eq. (128). Proof.
Step 1: Define t_g := t_0 + t/√n to be the true value of the parameter. By definition, we have ‖t_g − t̂_0‖ ≤ n^{-(1-x)/2} with probability 1 − O(e^{-n^{x/2}}), and ‖t_g − t_0‖ ≤ c(t_0)n^{-1/2+x} by definition. Since the error probability vanishes exponentially, it does not affect the scaling of the MSE. In this step, we show that Eq. (128) of Theorem 7 implies the corresponding relation for ℘^n_{t_0,t|M_{t̂_0},a_{n,x}}, from which we obtain the claim.

Step 2: We show (171). First, we discuss the two exceptional cases ‖t_g − t̂_0‖ > n^{-(1-x)/2} and ‖t_1 − t̂_0‖ > n^{-(1-x)/2}. Eq. (169) guarantees a bound on Tr ρ^{⊗n^{1-x/2}}_{t_g} M_tomo, and Eq. (176) together with the properties of the normal distribution implies the corresponding bound on Tr ρ^{⊗a_{n,x}}_{t_g}. When ‖t_g − t̂_0‖ ≤ n^{-(1-x)/2} and ‖t_1 − t̂_0‖ ≤ n^{-(1-x)/2}, Eq. (173) holds under the condition ‖t_g − t_0‖ ≤ c(t_0)n^{-1/2+x}, which implies the claim. Since the above evaluation is compactly uniform with respect to t_0, we have (171).
Step 3: We show (182); Eq. (182) implies (172) due to the convergence n/a_{n,x} → 1. There are three cases.
The compactness of Θ guarantees that the error n(t̂ − t_g)(t̂ − t_g)ᵀ is bounded by nC with a constant C. Due to (179), the contribution of the first case is bounded by nC · O(e^{-n^{x/2}}), which goes to zero.
In the second case, since t̂_0 = t̂, the error n(t̂ − t_g)(t̂ − t_g)ᵀ is bounded by n(n^{-(1-x)/2})² = n^x. Due to (180), the contribution of the second case is bounded by n^x · O(n^{-κ}) = O(n^{x-κ}), which goes to zero.
In the third case, since ‖t̂ − t_g‖ ≤ 2n^{-(1-x)/2}, and due to (176), the contribution of the third case is bounded by 2n^x · O(n^{-κ}) = O(n^{x-κ}), which goes to zero. Therefore, we obtain (182).

Generic cost functions.
Finally, we show that the results in this work also hold for any cost function c(t̂, t) that is bounded and has a symmetric expansion, in the sense of satisfying the following two conditions: (B1) c(t̂, t) has a continuous third derivative, so that it can be expanded with leading term (t̂ − t)ᵀ W_t (t̂ − t), where the matrix W_t ≥ 0 is a continuous function of t. (B2) c(t̂, t) ≤ C for a constant C > 0 and for any t̂, t ∈ R^k.
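Condition (B1) can be illustrated symbolically on a toy cost; the function below is only an example of a bounded cost with a symmetric quadratic expansion (here with scalar weight W_t = 1/2), not a cost appearing in the paper:

```python
import sympy as sp

# Toy cost c(t_hat, t) = 1 - cos(t_hat - t): bounded by C = 2 (condition
# (B2)) and smooth, with a symmetric expansion around t_hat = t whose
# leading term is quadratic (condition (B1)).
t_hat, t = sp.symbols('t_hat t', real=True)
c = 1 - sp.cos(t_hat - t)

expansion = sp.series(c, t_hat, t, 4).removeO()
print(expansion)  # leading quadratic term (t_hat - t)**2 / 2
```

In this example the role of W_t is played by the constant 1/2; for a vector parameter the analogous expansion produces the matrix W_t of condition (B1).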
To adapt to this situation, we replace step (A2) by the following step (A2)': (A2)' Based on the first estimate t̂_0, apply the optimal local measurement. (ii) In addition, if c also satisfies Condition (B2), m_c = {M^n_c} is locally asymptotically covariant and attains the equality in (183) at any point t_0 ∈ Θ corresponding to a non-degenerate state. Theorem 11 reduces to a bound for the (actual) MSE when c(t̂, t) = (t̂ − t)ᵀW(t̂ − t) for W ≥ 0. Therefore, the bounds in this work, e.g., Eqs. (126) and (145), are also attainable bounds for the MSE of any locally asymptotically covariant measurement.

Proof.
Step 1: We prove (1). Consider any sequence of asymptotically covariant measurements m_{t_0} := {M_{n,t_0}} at t_0. Denote by t_g := t_0 + t/√n the true value of the parameter. For a cost function c satisfying (ii), we have the claimed bound.

Step 2: We prove (2). We replace W by W_t in the proof of Theorem 10. In this replacement, (174) is replaced by the corresponding relation, where x ∈ (0, 2/9). Hence, the contributions of the first and second cases of Step 3 of the proof of Theorem 10 go to zero.
In the third case of Step 3 of that proof, we have ‖t_g − t_1‖ ≤ 2n^{-(1-x)/2}. Hence, in the contribution of the third case, we can replace the expectation of nc(t_1, t_g) by the weighted MSE with weight W_{t_g}, and we obtain part (2).

Applications.
In this section, we show how to evaluate the MSE bounds in several concrete examples.
10.1. Joint measurement of observables. Here we consider the fundamental problem of the joint measurement of two observables. For simplicity we analyze qubit systems, although the approach can be readily generalized to arbitrary dimension. The task is to simultaneously estimate the expectation values of two observables A and B of a qubit system. The observables can be expressed as A = a · σ and B = b · σ, with σ = (σ_x, σ_y, σ_z) being the vector of Pauli matrices. We assume without loss of generality that |a| = |b| = 1 and a · b ∈ [0, 1). The state of an arbitrary qubit system can be expressed as ρ = (I + n · σ)/2, where n is the Bloch vector. With this notation, the task reduces to estimating the parameters x := a · n and y := b · n.
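The paper evaluates the SLD quantities for this model analytically (Appendix I), but they can also be checked numerically. The sketch below (illustrative Bloch vector, not the paper's code) solves the SLD equation ∂_i ρ = (L_i ρ + ρ L_i)/2 as a Sylvester equation and compares the resulting SLD Fisher information with the known closed form for the Bloch parameterization:

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Pauli matrices and a qubit state rho = (I + n.sigma)/2 for an
# arbitrary interior Bloch vector n (illustrative values).
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [sx, sy, sz]

n = np.array([0.3, 0.2, 0.5])
rho = 0.5 * (np.eye(2) + sum(ni * si for ni, si in zip(n, paulis)))

# d rho / d n_i = sigma_i / 2, so the SLD equation reads
# rho L_i + L_i rho = sigma_i; solve it as a Sylvester equation.
slds = [solve_sylvester(rho, rho, si) for si in paulis]

# Symmetrized SLD Fisher information J_ij = Re Tr[rho L_i L_j].
J = np.array([[np.trace(rho @ (Li @ Lj + Lj @ Li)).real / 2
               for Lj in slds] for Li in slds])

# Known closed form for the Bloch parameterization:
# J = I + n n^T / (1 - |n|^2).
J_exact = np.eye(3) + np.outer(n, n) / (1 - n @ n)
print(np.allclose(J, J_exact))  # -> True
```

The SLDs for x, y, z used in the text are linear combinations of these L_i determined by the (non-orthogonal) basis {a, b, c}.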
It is also convenient to introduce a third unit vector c orthogonal to a and b, so that {a, b, c} form a (non-orthogonal) normalized basis of R³. In terms of this vector, we can define the parameter z := c · n. In this way, we extend the problem to the full model containing all qubit states, where x and y are the parameters of interest and z is a nuisance parameter. Under this parameterization, we can evaluate the SLD operators for x, y, and z, as well as the SLD Fisher information matrix and the D-matrix (see Appendix I for details); substituting these into the bound (145) yields a 2 × 2 bound matrix whose entries share the denominator Δ := 1 − |n|² + x'² + 2x'y's + y'² + z², with off-diagonal entries −(x'y'(1 − s²) + (1 − |n|² + z²)s)/Δ and (2,2) entry (1 − |n|² + x'²(1 − s²) + z²)/Δ, where s := a · b, x' = (x − ys)/(1 − s²), and y' = (y − xs)/(1 − s²). The tradeoff between the measurement precisions for the two observables is of fundamental interest. Substituting the expressions of the D-matrix and the SLD Fisher information matrix (see Appendix I) into Eq. (160), we obtain a relation which characterizes the precision tradeoff in joint measurements of qubit observables.

10.2. Direction estimation in the presence of noise. Consider the task of estimating a pure qubit state |ψ⟩ = cos(θ/2)|0⟩ + e^{iϕ} sin(θ/2)|1⟩, which can also be regarded as determining a direction in space, as qubits are often realized in spin-1/2 systems. In a practical setup, it is necessary to take into account the effect of noise, under which the qubit becomes mixed. For noise with strong symmetry, such as depolarization, the usual MSE bound produces a good estimate of the error. For other kinds of noise, it is essential to introduce nuisance parameters and to use the techniques introduced in this paper.
As an illustration, we consider amplitude damping noise, which can be formulated as a channel whose Kraus operators are K_0 = |0⟩⟨0| + √η|1⟩⟨1| and K_1 = √(1−η)|0⟩⟨1| (the form consistent with the Bloch vector below). After the noisy evolution, the qubit state can be expressed as ρ := (I + n · σ)/2 with n := (√η sin θ sin ϕ, √η sin θ cos ϕ, 1 − η + η cos θ). Now we regard η as a nuisance parameter, while θ and ϕ are the parameters of interest. Defining the derivative vector through the equation p_x · σ = ∂ρ/∂x, we can calculate, e.g., p_θ = (√η cos θ sin ϕ, √η cos θ cos ϕ, −η sin θ). In terms of the derivative vector, the SLD for the parameter x ∈ {θ, ϕ, η} takes a closed form, and after some straightforward calculations we obtain the MSE bound with nuisance parameter η. An illustration can be found in Figure 2, with W = I in Eq. (145). The minimum of the sum of the (x, x)-th matrix elements of the MSE matrix for x = θ, ϕ is independent of ϕ, which is a result of the symmetry of the problem: the D-matrix does not depend on ϕ, and thus an estimate of ϕ can be obtained without affecting the precision of the other parameters. Notice that when the state is close to |0⟩ or |1⟩, it is insensitive to changes of θ, resulting in the cup-shaped curves in Figure 2. Next, we evaluate the sum of the MSEs of ϕ and θ when η is a (known) fixed parameter, using Eq. (126), and compare it to the nuisance parameter case. The result of the numerical evaluation is plotted in Fig. 3. It is clear from the plot that the variance sum is strictly lower when η is treated as a fixed parameter, compared to the nuisance parameter case. This is a good example of how knowledge of a parameter (η) can assist the estimation of other parameters (ϕ and θ). It is also observed that, when the noise is stronger (i.e., when η is smaller), the gain in precision from knowing η is larger.
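The Bloch vector stated in the text can be verified directly from a Kraus representation of the channel. The sketch below assumes the Kraus operators K_0 = |0⟩⟨0| + √η|1⟩⟨1| and K_1 = √(1−η)|0⟩⟨1| (an assumption consistent with the stated Bloch vector) and checks the longitudinal and transverse components for illustrative values of η, θ, ϕ:

```python
import numpy as np

# Assumed Kraus operators of the amplitude-damping channel.
eta, theta, phi = 0.7, 1.1, 0.4
K0 = np.array([[1, 0], [0, np.sqrt(eta)]], dtype=complex)
K1 = np.array([[0, np.sqrt(1 - eta)], [0, 0]], dtype=complex)

# Input pure state |psi> = cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>.
psi = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
rho = np.outer(psi, psi.conj())
rho_out = K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

# Longitudinal Bloch component matches 1 - eta + eta*cos(theta) ...
sz = np.diag([1.0, -1.0])
nz = np.trace(rho_out @ sz).real
print(np.isclose(nz, 1 - eta + eta * np.cos(theta)))      # -> True

# ... and the transverse Bloch magnitude shrinks by sqrt(eta).
n_perp = 2 * abs(rho_out[0, 1])
print(np.isclose(n_perp, np.sqrt(eta) * np.sin(theta)))   # -> True
```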

Multiphase estimation with noise.
Here we consider a noisy version of the multiphase estimation setting [49,20]. This problem was first studied in [20], where the authors derived a lower bound for the quantum Fisher information and conjectured that it was tight. Under local asymptotic covariance, we can now derive an attainable bound and show its equivalence to the SLD bound using the orthogonality of nuisance parameters, which proves the conjecture.
Our techniques also allow us to resolve an open issue concerning the result of Ref. [20], where it was unclear whether or not the best precision depends on knowledge of the noise. Using Corollary 4, we will also see that knowing the strength of the noise a priori does not help to decrease the estimation error. Note that η = 0 corresponds to the noiseless scenario. We consider a pure input state with N photons in the "generalized NOON form". The output state from the noisy multiphase evolution then takes a form in which ρ_η is independent of t. Notice that the output state is supported on the finite set of orthonormal states {|n⟩_j : j = 0, . . . , d, n = 0, . . . , N}, and thus it falls within the scope of this work. In this case, {t_j} are the parameters of interest, while α_η and p_η can be regarded as nuisance parameters. The SLD operators for these parameters can be calculated explicitly, where P_{H⊥} denotes the projection onto the space orthogonal to |ψ_{η,t}⟩. Notice that p_η and α_η are orthogonal to the other parameters, in the sense that Tr ρ L_{t_j} L_{p_η} = Tr ρ L_{α_η} L_{p_η} = 0 and Tr ρ L_{t_j} L_{α_η} = (2i p_η sin 2α_η)/d is purely imaginary. Therefore, the SLD Fisher information matrix and the D-matrix take block diagonal forms. Substituting these into the bound (145), we immediately obtain an attainable bound for any locally asymptotically covariant measurement m. Taking W to be the identity, one sees that for small η the sum of the variances scales as N²/d², while for η → 1 it scales as N²/d, losing the boost in scaling compared to separate measurement of the phases. The bound (187) coincides with the SLD bound and the RLD bound. By Corollary 4, we conclude that the SLD (RLD) bound can be attained in the case of joint estimation of multiple phases.
In addition, we stress that the ultimate precision does not depend on whether or not the noise parameter η is known a priori: if η is unknown, one can obtain the same precision as when η is known, by estimating η without disturbing the parameters of interest. Table 3 gives a comparison between our results and existing results.

Conclusion
In this work, we completely solved the attainability problem of precision bounds for quantum state estimation under the local asymptotic covariance condition. We provided an explicit construction of the optimal measurement, which attains the bounds globally. The key building block of the optimal measurement is quantum local asymptotic normality, derived in [17,16] for a particular type of parametrization and generalized here to arbitrary parametrizations. Besides the MSE bound, we also derived a bound on the tail probability of the estimate. Our work provides a general tool for constructing benchmarks and optimal measurements in multiparameter state estimation.
Here, we remark on the relation to the results of Yamagata et al. [19], who showed a similar achievability statement in a local model scenario using a kind of local quantum asymptotic normality. In Theorem 7, we have shown the compact uniformity of our convergence together with an order estimate, whereas they did not show such properties. In the evaluation of a global estimator, these properties of the convergence are essential. The difference between our evaluation and theirs comes from the key tools. The key tool of our derivation is Q-LAN (Proposition 2) by [17,16], which provides the state conversion, i.e., the TP-CP maps converting the state family, together with a precise evaluation in trace norm. Their method, by contrast, is based on the algebraic central limit theorem [50,36], which gives only the behavior of the expectations of functions of the operators R_i. The idea of applying this method to the achievability of the Holevo bound was first mentioned in [18], and Yamagata et al. [19] developed the detailed discussion in this direction.
Indeed, the algebraic version of Q-LAN of [50,36] can be directly applied to the vector X of Hermitian matrices to achieve the Holevo bound, while the use of the state conversion of Q-LAN requires a somewhat complicated procedure to handle the vector X of Hermitian matrices, which is a disadvantage of our approach. However, since the algebraic version of Q-LAN does not give a state conversion directly, it is quite difficult to obtain the compact uniformity and the order estimate of the convergence from it. In this paper, to overcome the disadvantage of our approach, we have derived several advanced properties of Gaussian states in Sections 3.2 and 3.3 by using the symplectic structure. Using these properties, we could smoothly handle the complicated procedure needed to fill the gap between the full qudit model and an arbitrary submodel.

A. Proof of Lemma 2
In this appendix, we prove Lemma 2. To this end, we discuss the existence of a PDF. First, we show the following lemma.

Lemma 19 Let P be a probability measure on R and define the location-shift family {P_t} by P_t(B) := P(B + t). For an arbitrary disjoint decomposition A := {A_i} of R, we assume that the probability distribution family {P_{A,t}}, where P_{A,t}(i) := P_t(A_i), has finite Fisher information J_{A,t}, and that J_t := sup_A J_{A,t} < ∞. Also, we define x_+ := inf{x' | P((x', ∞)) = 0} and x_- := sup{x' | P((-∞, x')) = 0}. Then P admits a PDF p(x) on (x_-, x_+), and p(x) is bounded, i.e., sup_x p(x) < √J_0.
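As a quick numerical illustration of the bound sup_x p(x) < √J_0 (not part of the proof), the sketch below checks it for a Gaussian location-shift family, where the Fisher information is J = 1/σ² while the density peak is 1/(σ√(2π)) < 1/σ:

```python
import numpy as np

# Gaussian location family P_t = N(t, sigma^2): Fisher information J = 1/sigma^2,
# density peak sup_x p(x) = 1/(sigma*sqrt(2*pi)) < sqrt(J) = 1/sigma.
sigma = 0.7
x = np.linspace(-10.0, 10.0, 200001)
p = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
dp = np.gradient(p, x)                                       # p'(x), central differences
J = np.sum(dp**2 / np.maximum(p, 1e-300)) * (x[1] - x[0])    # int p'(x)^2 / p(x) dx
peak, sqrtJ = p.max(), np.sqrt(J)
assert peak < sqrtJ   # sup p(x) < sqrt(J), as in Lemma 19
```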
Using the previous lemma, we are in a position to prove Lemma 2.
Proof of Lemma 2: Let A := {A_i} be an arbitrary finite disjoint decomposition of R. Let G_A be the coarse-graining map from a distribution on R to a distribution on the meshes A_i. Then, the Fisher information J_{A,n,t} of {G_A(℘^n_{t_0,t|M_n})}_t is not greater than the Fisher information J_t of {ρ^n_{t_0,t}}_{t∈∆_n}. Hence, the Fisher information J_{A,t} of {G_A(℘_{t_0,t|m})}_t satisfies the finiteness condition of Lemma 19. Therefore, we can apply Lemma 19 to ℘_{t_0,t|m}; it guarantees the existence of the PDF of the limiting distribution ℘_{t_0,t|m}.
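The monotonicity step above, that coarse-graining into meshes A_i can only decrease the Fisher information, can be checked numerically. The following sketch (an illustration under the assumption of a Gaussian location family; the bin edges are arbitrary) estimates the Fisher information of the binned family by a symmetric finite difference in the shift parameter and compares it with the full Fisher information J = 1/σ²:

```python
import numpy as np
from math import erf, sqrt

def binned_fisher(bin_edges, sigma=1.0, eps=1e-4):
    """Fisher information at t = 0 of the coarse-grained location family
    {G_A(P_t)}, where P_t = N(t, sigma^2) and A is the partition of R
    induced by bin_edges; estimated by a symmetric finite difference in t."""
    def cdf(x, t):
        return 0.5 * (1.0 + erf((x - t) / (sigma * sqrt(2.0))))
    def probs(t):
        c = np.array([0.0] + [cdf(e, t) for e in bin_edges] + [1.0])
        return np.diff(c)                     # probabilities of the meshes
    p0 = probs(0.0)
    dp = (probs(eps) - probs(-eps)) / (2 * eps)
    return float(np.sum(dp**2 / np.maximum(p0, 1e-300)))

J_full = 1.0   # Fisher information of the un-coarse-grained family, 1/sigma^2
# Coarse-graining can only decrease the Fisher information.
assert binned_fisher([-1.0, 0.0, 1.0]) < J_full
assert binned_fisher(list(np.linspace(-4, 4, 33))) < J_full
```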

B. Lemmas used for asymptotic evaluations
In this appendix, we prepare two lemmas for asymptotic evaluations of information quantities of probability distributions.

Lemma 20 Assume that a sequence of pairs of probability distributions {(P_n, Q_n)} on R converges to a pair of probability distributions (P, Q) on R. Then, the inequality lim sup_{n→∞} F(P_n, Q_n) ≤ F(P, Q) holds, where F denotes the fidelity between probability distributions.
Proof. Let p and q be the Radon–Nikodým derivatives of P and Q with respect to P + Q. Given N > 0, let G_N be the coarse-graining map from a distribution on R to a distribution on finitely many Borel meshes. Then, the information processing inequality for the fidelity yields F(P_n, Q_n) ≤ F(G_N(P_n), G_N(Q_n)). Since the number of meshes is finite, we have F(G_N(P_n), G_N(Q_n)) → F(G_N(P), G_N(Q)) as n → ∞, and taking the limit with respect to the fineness of the meshes yields the claim.
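The information processing inequality invoked above, F(P, Q) ≤ F(G(P), G(Q)) for any coarse-graining G, follows from the Cauchy–Schwarz inequality applied mesh by mesh. A small numerical illustration (with arbitrary randomly drawn distributions, not taken from the text):

```python
import numpy as np

def fidelity(p, q):
    """Fidelity (Bhattacharyya coefficient) of two discrete distributions."""
    return float(np.sum(np.sqrt(p * q)))

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(12))                # two arbitrary distributions
q = rng.dirichlet(np.ones(12))

# Coarse-grain the 12 outcomes into 4 meshes of 3 outcomes each.
Gp, Gq = p.reshape(4, 3).sum(axis=1), q.reshape(4, 3).sum(axis=1)

# Information processing inequality: coarse-graining cannot decrease fidelity.
assert fidelity(p, q) <= fidelity(Gp, Gq) + 1e-12
```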

Lemma 21 Let Θ be an open subset of R^k. Assume that a sequence of probability distribution families {P_{t,n}}_{t∈Θ} on R^k converges to a family of probability distributions {P_t}_{t∈Θ} on R^k. We denote their ε-difference Fisher information matrices by J^n_{t,ε} and J_{t,ε}, respectively. For a vector t and ε > 0, we also assume that there exists a Hermitian matrix J such that J^n_{t,ε} ≤ J. Then, P_{t+ε e_j} is absolutely continuous with respect to P_t for j = 1, ..., k, and the inequality

lim inf_{n→∞} ⟨a|J^n_{t,ε} − J_{t,ε}|a⟩ ≥ 0 (205)

holds for any complex vector a ∈ C^k.
Proof of Lemma 21: Since J^n_{t,ε} and J_{t,ε} are real matrices, it is sufficient to show (205) for a real vector a. In this proof, we fix the vector t.
Step (1): We show that P_{t+ε e_j} is absolutely continuous with respect to P_t for j = 1, ..., k, by contradiction. Assume that there exists an integer j such that P_{t+ε e_j} is not absolutely continuous with respect to P_t. Then there exists a Borel set B ⊂ R^k such that P_{t+ε e_j}(B) > 0 and P_t(B) = 0. Let G be the coarse-graining map from a distribution P on R^k to the binary distribution (P(B), P(B^c)) on the two events {B, B^c}. Let J_{t,B,ε} and J^n_{t,B,ε} be the ε-difference Fisher information matrices of {G(P_t)} and {G(P_{t,n})}, respectively. The information processing inequality implies J^n_{t,B,ε} ≤ J^n_{t,ε} ≤ J. Also, J^n_{t,B,ε} → J_{t,B,ε} as n → ∞. Hence, J_{t,B,ε} ≤ J. However, the j-th diagonal element of J_{t,B,ε} is infinite, which contradicts J_{t,B,ε} ≤ J.
Step (2): Let p_{t+ε e_j} be the Radon–Nikodým derivative of P_{t+ε e_j} with respect to P_t. We show that (206) holds for N, ε > 0 and any integer j = 1, ..., k, by contradiction. We denote the LHS of (206) by C_j and assume that there exists an integer j such that C_j > 0. We set R = J_{ε;j,j}/C_j + 2. Setting B := {x | p_{t+ε e_j}(x) > R}, we repeat the same discussion as in Step (1) and obtain a contradiction.
Step (3): We show (205) for a real vector a. We define the subsets C_R and C_{N,R}. Given R > 0, let G_R be the coarse-graining map from a distribution on R^k to a distribution on the family of measurable sets {B ⊂ R^k \ C_R} ∪ {C_R}, where B is an arbitrary Borel subset of R^k \ C_R. Given N > 0 and R > 0, let G_{N,R} be the coarse-graining map from a distribution on R^k to a distribution on the family of measurable sets {B ⊂ R^k \ C_{N,R}} ∪ {C_{N,R}}, where B is an arbitrary Borel subset of R^k \ C_{N,R}. Given N' > 0, N > 0, and R > 0, let G_{N',N,R} be the coarse-graining map from a distribution on R^k to a distribution on meshes. Let J^n_{t,ε,N',N,R} be the ε-difference Fisher information matrix of the distribution family {G_{N',N,R}(P_{t,n})}_t. Then, the information processing inequality (94) for the ε-difference Fisher information matrix yields the corresponding inequalities. Hence, using (211), (212), (213), and (216), we have lim inf_{n→∞} ⟨a|J^n_{t,ε} − J_{t,ε}|a⟩ ≥ 0.

C. Proof of Lemma 3
Before starting our proof of Lemma 3, we prepare the following lemma.

Lemma 22 Consider a canonical quantum Gaussian state family {Φ[θ, β]}. When a symplectic matrix S satisfies S E_q(e^{-β·V}) S^T = E_q(e^{-β·V}), where E_q is the matrix defined in Eq. (56), there exists a unitary operator U_S such that Φ[Sθ, β] = U_S Φ[θ, β] U_S^†.

Proof. Consider any coordinate θ' = (θ'_C, θ'_Q), where θ'_Q is obtained by a reversible linear transformation S on the Q-LAN coordinate θ_Q, i.e., θ'_Q = S θ_Q. Define q̂_j := (â_j + â_j^†)/√2, p̂_j := (â_j − â_j^†)/(i√2), and x := (q̂_1, p̂_1, ..., q̂_q, p̂_q)^T. We then have the expression (218), where y := (S^{-1})^T x and Z_β > 0 is a normalizing constant. Now, by the definition of E_q(x) in Eq. (56) and S E_q(e^{-β·V}) S^T = E_q(e^{-β·V}), S must be of the block diagonal form S = ⊕_i O_{s_i}. Here {s_i} is a partition of {1, ..., 2q} such that j, k ∈ s_i if and only if β_j = β_k, and O_{s_i} is an orthogonal matrix acting on the components in s_i. Since β·V, e^{-β·V}, and ln e^{-β·V} are in one-to-one correspondence, we have S E_q(e^{-β·V}) S^T = E_q(e^{-β·V}). Substituting it into Eq. (218), we see that (S^{-1})^T can be regarded as a transformation of x. Finally, S is symplectic since S D S^T = D, and hence there exists a unitary U_S implementing this transformation [48].

Proof of Lemma 3: Using the imaginary part Im(Γ), we distinguish the classical and the quantum parts. Specifically, the kernel and support of Im(Γ) are Ker Im(Γ) := {x ∈ R^k : Im(Γ)x = 0} (220) and Supp Im(Γ) := (Ker Im(Γ))^⊥, respectively. We introduce the classical parameters θ_C and the quantum parameters θ_Q in Ker Im(Γ) and Supp Im(Γ), respectively. That is, the classical parameter θ_C and the quantum parameter θ_Q are given by an invertible linear transformation T such that θ := (θ_C, θ_Q) = T t satisfies the required form. Since this separation is unique up to linear conversions and any classical Gaussian states can be converted into each other via scale conversions, the remaining problem is to show the desired statement for the quantum part.
Next, we focus on the quantum part ((T^{-1})^T Γ T^{-1})_Q of the Hermitian matrix (T^{-1})^T Γ T^{-1}. It is now convenient to define the matrix A; the role of A is to normalize the D-matrix. Indeed, since Im(((T^{-1})^T Γ T^{-1})_Q) is skew-symmetric, A^{-1} Im(((T^{-1})^T Γ T^{-1})_Q) A^{-1} is similar to Ω_{d_Q}, namely there exists an orthogonal matrix S_0 such that S_0 A^{-1} Im(((T^{-1})^T Γ T^{-1})_Q) A^{-1} S_0^T = Ω_{d_Q}. Moreover, since S_0 A^{-1} Re(((T^{-1})^T Γ T^{-1})_Q) A^{-1} S_0^T is a real symmetric matrix, there exists a symplectic matrix S and a vector β realizing the Williamson normal form [51]. Meanwhile, we have S S_0 A^{-1} Im(((T^{-1})^T Γ T^{-1})_Q) A^{-1} S_0^T S^T = Ω_{d_Q} since S is symplectic. Overall, when T is given as (I ⊕ (S S_0 A^{-1})) T, the desired requirement is satisfied.
The uniqueness of β is guaranteed by the uniqueness of symplectic eigenvalues. Hence, when two linear conversions T and T̃ satisfy the condition of the statement, T Γ T^T = T̃ Γ T̃^T. Thus, Lemma 22 guarantees that the canonical Gaussian states G(T^{-1}α, T Γ T^T) and G(T̃^{-1}α, T̃ Γ T̃^T) are unitarily equivalent.
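The symplectic eigenvalues β invoked above can be computed without constructing the Williamson decomposition explicitly: for a positive-definite real matrix M, they are the moduli of the eigenvalues of iΩM, and they are invariant under symplectic conjugation M ↦ S M S^T. A minimal numerical sketch (the single-mode example matrix is arbitrary, not from the text):

```python
import numpy as np

def symplectic_eigenvalues(M):
    """Symplectic eigenvalues of a positive-definite 2q x 2q real matrix M:
    the moduli of the eigenvalues of i*Omega*M (each doubly degenerate),
    where Omega is the standard symplectic form."""
    q = M.shape[0] // 2
    Omega = np.kron(np.eye(q), np.array([[0.0, 1.0], [-1.0, 0.0]]))
    ev = np.linalg.eigvals(1j * Omega @ M)
    return np.sort(np.abs(ev))[::2]   # keep one value per degenerate pair

# Example: a single-mode covariance matrix squeezed by r, scaled by nu.
nu, r = 2.0, 0.5
S = np.diag([np.exp(r), np.exp(-r)])       # symplectic squeezing matrix
M = S @ (nu * np.eye(2)) @ S.T
# Williamson uniqueness: the symplectic eigenvalue nu is invariant under S.
assert np.allclose(symplectic_eigenvalues(M), [nu])
```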

D. Proof of Lemma 5
(3) ⇒ (1): When a Gaussian state family is given in the RHS of (65), it is clearly D-invariant. Hence, the D-invariance is equivalent to condition (2).
(2) ⇒ (3): First, we separate the system into the classical and the quantum parts. In the Gaussian state family {G[α, Γ]}, this separation can be done by considering the kernel of Im(Γ), as in the proof of Lemma 3. In the Gaussian state family G[T(t), Γ], this separation can be done by considering the kernel of the D-matrix D_0 in the same way. Since the relation (65) for the classical part is easily shown, we prove the relation (65) in the case when only the quantum part exists.
Under the above assumption, we define the k × (d − k) matrix T' such that F := (T ⊕ T') is invertible and T^T A^{-1} T' = 0. Then, Lemma 3 guarantees that G[F(t, t'), Γ] is unitarily equivalent to G[(t, t'), F^{-1} Γ (F^T)^{-1}]. Since T^T A^{-1} T' = 0, we obtain the desired relation (65).

F. Proof of Lemma 12
Denote by Q(y) := π^{-k/2} ⟨y|F|y⟩ the Q-function of F [52]. Expanding displaced thermal states into a convex combination of coherent states, Eq. (105) can be rewritten as (231). Taking the Fourier transform F_{y→ξ}(g) := ∫ dy e^{i y·ξ} g(y) on both sides, we get (232). In addition, we know that the P-function P(y) [53] of F can be evaluated via the Q-function as in (233) (see, for instance, [54]). The combination of (232) and (233) yields (106).

Conversely, we assume that F is given by (106). Then, we choose the function Q(α) to satisfy (235). Since F = ∫ dy F^{-1}_{ξ→y}( e^{(1/4) Σ_j ξ_j^2} F_{α→ξ}(Q(α)) ) |y⟩⟨y|, we have Q(y) = π^{-k/2} ⟨y|F|y⟩. Expanding displaced thermal states into a convex combination of coherent states, we have (236). Applying the inverse of F_{α→ξ} to (235), we obtain (231). The combination of (231) and (236) implies (105).
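The key manipulation in this proof, multiplying the Fourier transform of the Q-function by e^{(1/4)Σ_j ξ_j²} to recover the P-side transform, reflects the fact that the Q-function is the P-function smoothed by a vacuum Gaussian. The following one-dimensional sketch (with an arbitrary Gaussian playing the role of the P-function, under the convention F_{y→ξ}(g) = ∫ dy e^{iyξ} g(y)) checks this deconvolution identity numerically:

```python
import numpy as np

# If Q = P * g is the convolution of P with the unit Gaussian
# g(y) = pi^{-1/2} e^{-y^2} (the vacuum smoothing relating P- and
# Q-functions in this convention), then F(Q)(xi) * e^{xi^2/4} = F(P)(xi).
y = np.linspace(-20.0, 20.0, 4001)
dy = y[1] - y[0]
s2 = 1.5                                           # variance of the "P-function"
P = np.exp(-y**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
Q = np.exp(-y**2 / (2 * (s2 + 0.5))) / np.sqrt(2 * np.pi * (s2 + 0.5))

xi = np.linspace(-3.0, 3.0, 61)
FP = np.array([np.sum(np.exp(1j * y * k) * P) * dy for k in xi])
FQ = np.array([np.sum(np.exp(1j * y * k) * Q) * dy for k in xi])

# Deconvolution by e^{xi^2/4} turns the Q-side transform into the P-side one.
assert np.allclose(FQ * np.exp(xi**2 / 4), FP, atol=1e-6)
```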

G. Proof of Lemma 16
First, we focus on the quantum part and show that the stated relation holds when the POVM {M^G_B}_{B⊂R^{2s}} is covariant, where α_Q = (α_R, α_I) and W_Q is a diagonal matrix with eigenvalues w_j > 0. By the covariance at T_{W_Q,c}(0), there exists a state τ generating the POVM elements, where T^Q_{α_Q} is defined as in Eq. (52). Let N̂_j be the number operator on the j-th system.

When ρ is a squeezed state with Tr ρ Q̂_j = Tr ρ P̂_j = 0, the output distribution ℘_{α|M}[ρ] of M[ρ] is the 2d_Q-dimensional normal distribution with mean α and the covariance matrix given in [5]. In the single-mode case, without loss of generality, we can assume that W is the diagonal matrix diag(w_1, w_2), because this diagonalization can be achieved by applying an orthogonal transformation between Q̂ and P̂.
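The final reduction, diagonalizing the 2×2 covariance matrix W by a rotation of the Q and P quadratures, is a standard linear-algebra step; a phase-space rotation is both orthogonal and symplectic, hence unitarily implementable. A minimal sketch (the example matrix is arbitrary, not from the text):

```python
import numpy as np

# A real symmetric 2x2 covariance matrix W is diagonalized by a rotation
# of the (Q, P) quadratures, justifying the W = diag(w_1, w_2) assumption.
W = np.array([[1.8, 0.6],
              [0.6, 1.2]])
w, R = np.linalg.eigh(W)          # columns of R: orthonormal eigenvectors
if np.linalg.det(R) < 0:
    R[:, 0] *= -1                 # make det(R) = +1, i.e. a proper rotation
assert np.allclose(R.T @ W @ R, np.diag(w))   # W diagonalized by the rotation
```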