Quantum Information Processing

Volume 13, Issue 3, pp 683–707

Minimal resources identifiability and estimation of quantum channels

  • Mattia Zorzi
  • Francesco Ticozzi
  • Augusto Ferrante

Abstract

We characterize and discuss the identifiability condition for quantum process tomography, as well as the minimal experimental resources that ensure a unique solution in the estimation of quantum channels, with both direct and convex optimization methods. A convenient parametrization of the constrained set is used to develop a globally converging Newton-type algorithm that ensures a physically admissible solution to the problem. Numerical simulations are provided to support the results and indicate that the minimal experimental setting is sufficient to guarantee good estimates.

Keywords

Quantum process tomography · Maximum likelihood estimation · Quantum information

1 Introduction

Recent advances and miniaturization in laser technology and electronic devices, together with some profound results in quantum physics and quantum information theory, have generated increasing interest, over the last two decades, in the promising field of quantum information engineering. The potential of these new technologies has been demonstrated by a number of theoretical and experimental results, including intrinsically secure quantum cryptography protocols, proof-of-principle implementations of quantum computing, as well as dramatic advances in the controlled engineering of molecular dynamics, opto-mechanical devices, and a variety of other experimentally available systems. In this area, a key role is played by control, estimation and identification problems for quantum-mechanical systems [18, 24, 34, 35, 43].

In the spirit that has driven the contributions of the control-theoretical community so far [3, 4, 5, 17, 19, 22, 27, 28, 33, 39, 40, 41], namely of developing research which is both strongly motivated by emerging applications and mathematically rigorous, we consider an identification problem arising in the reconstruction of quantum dynamical models from experimental data. This is a key issue in many quantum information processing tasks [10, 11, 31, 32, 34]. For example, a precise knowledge of the behavior of a channel to be used for quantum computation or communication is needed in order to ensure that optimal encoding/decoding strategies are employed, and to verify that the noise thresholds for hierarchical error-correction protocols, or for effectiveness of quantum key distribution protocols, are met [11, 32]. In many cases of interest, for example, in free-space communication [42], channels are not stationary, and to ensure good performance, repeated and fast estimation steps would be needed as a prerequisite for adaptive encodings. In addition to this, when the goal is to embed the system used for probing the channel in a moving vehicle or a satellite, one seeks the simplest implementation, or at least a compromise between estimation accuracy and the number of experimental resources needed. This work has been motivated by the interaction of the authors with experimental groups, and the issues we address here are directly relevant to the quantum communication applications described above.

In fact, all these applications call for an answer to the following identifiability problem: are the given experimental resources sufficient to uniquely and correctly reconstruct the unknown channel from the available data? Strictly related issues include determining the minimal resources needed for correct identification and the efficiency of the minimal resources with respect to “richer” experimental settings.

In this paper, we are concerned with standard quantum process tomography (see e.g., [31, 34] and references therein; standard is used to distinguish the methods we discuss here from those that use auxiliary quantum systems as resources, see discussion below) for estimating quantum channels. More precisely, we focus on the minimal experimental resources (or quorum, in the language of [21]) needed for the identifiability of a quantum channel. Remarkably, the same result applies to all the standard quantum process tomography methods, including “inversion” and convex optimization methods. Among these, we revisit as notable examples some typical applications of the commonly used maximum likelihood (ML) method. In doing so, we pursue a rigorous presentation of the results and we try, whenever possible, to make contact with ideas and methods of (classical) system identification.

As noticed before, it is also possible to estimate a quantum channel by using an ancillary system [8, 20, 38]. Such a method is named ancilla-assisted process tomography. In the large body of literature regarding both standard quantum process tomography and ancilla-assisted process tomography, the experimental resources are usually assumed to be given. Mohseni et al. [31] compare different strategies, but focus on the role of having entangled states as an additional resource. However, it can be argued (see e.g., [44]) that the information acquired through an ancilla-assisted method by measuring a certain number \(K\) of independent observables can be equivalently gathered by the method we describe, by properly choosing \(L\) probe states and \(M\) observables such that \(L\cdot M=K\). Accordingly, our result on the minimal resources for TP channel estimation can be easily adapted to this approach. The problem we study is close in spirit to the one addressed in [36] in the study of minimal state tomography.

In the derivation of the main result, we employ a natural parametrization for the quantum channels that directly accounts for the trace-preservation constraint. The same parametrization can be used as a starting point for convex optimization methods by resorting to a general Newton-type algorithm with barriers that guarantees convergence to a physical (nonnegative) solution.

With respect to the existing (ML) approaches, we do not introduce the TP constraint through a Lagrange multiplier [31, 34], as we constrain the set of channels of the optimization problem to TP maps from the beginning. In a \(d\)-level quantum system, this allows for an immediate reduction from \(d^4\) to \(d^4-d^2\) free parameters in the estimation problem. Our analysis can be considered as complementary to the one presented in [9], where the TP assumption is relaxed to include losses. The Newton-type algorithm is then used for our numerical simulations, showing that using the minimal setting does not deteriorate the performance: the estimation accuracy depends mainly on the total number of trials. As expected, simulations also confirm that the convex optimization methods allow for more robust solutions with respect to the positivity constraints and confirm the consistency property.

The paper is structured as follows: Sect. 2 serves as an introduction to quantum channels and their linear-algebraic representation. Section 3 employs these linear-algebraic tools in deriving the core identifiability result and in proving existence and uniqueness of the solution, as well as consistency, for general convex optimization methods. Section 4 is devoted to applications of these theoretical contributions. First, we show how our results can be used to prove existence, uniqueness and consistency of the solution for two well-known ML estimation problems. Next, we show that a Newton-type algorithm can be designed to compute the estimate for this class of identification methods. Finally, numerical simulations are presented.

2 A review of quantum channels and \(\chi \)-representation

In this section, we review some basic facts about quantum channels, as well as their Choi representation and its properties [15]. Consider a \(d\)-level quantum system with associated Hilbert space \(\mathcal {H}\) isomorphic to \({\mathbb C}^d\). If \(X\) is the matrix representation of a linear operator on \(\mathcal {H},\) we shall denote by \(X^\mathrm{T}\) its transpose, while \(X^\dag \) will denote its transpose-conjugate (i.e., the matrix representation of the adjoint operator). The state of the system is described by a density matrix, namely by a positive semidefinite, unit-trace matrix
$$\begin{aligned} \rho \in \mathfrak {D}(\mathcal {H})=\left\{ \rho \in \mathbb {C}^{d\times d}| \rho =\rho ^\dag \ge 0,\,\mathrm{tr}(\rho )=1\right\} , \end{aligned}$$
which plays the role of probability distribution in classical probability. A state is called pure if \(\rho \) is an orthogonal projection matrix on a one-dimensional subspace. Measurable quantities or observables are associated with Hermitian matrices \(X=\sum _kx_k\varPi _k,\) with \(\{\varPi _k\}\) the associated spectral family of orthogonal projections. Their spectrum \(\{x_k\}\) represents the possible outcomes, and the probability of observing the \(k\)th outcome can be computed as \(p_\rho (\varPi _k)=\mathrm{tr}(\varPi _k\rho )\). More generally, indirect or generalized measurements are associated with families of nonnegative operators \(\{M_k\}\) such that \(\sum _k M_k=I\) and \(p_\rho (M_k)=\mathrm{tr}(M_k\rho ),\) also known as Positive-Operator Valued Measurements (POVM). In the rest of the paper, we consider projective measurements, but the same reasoning applies with no further complications to POVMs.
A quantum channel (in Schrödinger’s picture) is a map \( \mathcal {E}: \mathfrak {D}(\mathcal {H})\rightarrow \mathfrak {D}(\mathcal {H})\). It is well known [30, 32] that a physically admissible quantum channel must be linear and Completely Positive (CP), namely it must admit an Operator-Sum Representation (OSR)
$$\begin{aligned} \mathcal {E}(\rho )= \sum _{j=1}^{d^2}K_j\rho K_j^\dag \end{aligned}$$
(1)
where \(K_j\in \mathbb {C}^{d \times d}\) are called Kraus operators. Moreover, such a map must be Trace Preserving (TP), a necessary condition to map states to states. This is the case if
$$\begin{aligned} \sum _{j=1}^{d^2}K_j^\dag K_j=I_d \end{aligned}$$
(2)
where \(I_d\) is the \(d\times d\) identity matrix. CPTP maps can be thought of as the quantum equivalent of Markov transition matrices in the classical setting. An alternative way to describe a CPTP map is offered by the \(\chi \)-representation. Each Kraus operator \(K_j\in \mathbb {C}^{d\times d}\) can be expressed as a linear combination (with complex coefficients) of \(\{F_m\}_{m=1}^{d^2},\) where \(F_m\) is the elementary matrix \(E_{jk}\) (whose entries are all zero except the one in position \(jk\), which is \(1\)) with \(m=(j-1)d+k\). Accordingly, the OSR (1) can be rewritten as
$$\begin{aligned} \mathcal {E}(\rho )=\sum _{m,n=1}^{d^2} \chi _{m,n}F_m\rho F_n^\dag . \end{aligned}$$
(3)
Let \(\chi \) be the \(d^2 \times d^2\) matrix with element \(\chi _{m,n}\) in position \((m,n)\). It is easy to see that it must satisfy
$$\begin{aligned} \chi =\chi ^\dag \ge 0 \end{aligned}$$
(4)
and, following from (2),
$$\begin{aligned} \sum _{m,n=1}^{d^2} \chi _{m,n}F_n^\dag F_m=I_d. \end{aligned}$$
(5)
The map \(\mathcal {E}\) is completely determined by the matrix \(\chi \).
The \(\chi \) matrix can be used directly to calculate the effect of the map on a given state, the probabilities of measurement outcomes, as well as observable expectations. Before providing the explicit formulas in the next lemmas, we need to recall the definition of partial trace. Consider two finite-dimensional vector spaces \(\mathcal{V}_1,\,\mathcal{V}_2,\) with \(\dim \mathcal{V}_1=n_1,\,\dim \mathcal{V}_2=n_2\). Let us denote by \(\mathcal{M}_{n}\) the set of complex matrices of dimension \(n\times n\). Let \(\{M_j\}\) be a basis for \(\mathcal{M}_{n_1},\) and \(\{N_j\}\) be a basis for \(\mathcal{M}_{n_2},\) representing linear maps on \(\mathcal{V}_1\) and \(\mathcal{V}_2,\) respectively. Consider \(\mathcal{M}_{n_1\cdot n_2}=\mathcal{M}_{n_1}\otimes \mathcal{M}_{n_2}\): it is easy to show that the \(n_1^2\cdot n_2^2\) linearly independent matrices \(\{M_j\otimes N_k\}\) form a basis for \(\mathcal{M}_{n_1\cdot n_2},\) where \(\otimes \) denotes the Kronecker product. Thus, one can express any \(X\in \mathcal{M}_{n_1\cdot n_2}\) as
$$\begin{aligned} X=\sum _{jk}c_{jk}M_j\otimes N_k. \end{aligned}$$
The partial trace over \(\mathcal{V}_2\) is the linear map
$$\begin{aligned}&\mathrm{tr}_{2}:\mathcal{M}_{n_1\cdot n_2}\rightarrow \mathcal{M}_{n_1} \\&\qquad \,\,\, X\mapsto \mathrm{tr}_{2}(X){:=} \sum _{j,k}\left( c_{jk}\mathrm{tr}(N_k)\right) M_j. \end{aligned}$$
An analogous definition can be given for the partial trace over \(\mathcal{V}_1\). The partial trace can be also implicitly defined (without reference to a specific basis) as a linear map such that for any pair \(X\in \mathcal{M}_{n_1},\; Y\in \mathcal{M}_{n_2}\):
$$\begin{aligned} \mathrm{tr}_{2}(X\otimes Y)=\mathrm{tr}(Y)X. \end{aligned}$$
By linearity, this clearly implies
$$\begin{aligned} \mathrm{tr}((A\otimes I)B)=\mathrm{tr}\left( A\,\mathrm{tr}_{2}(B)\right) . \end{aligned}$$
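As a concrete illustration, the following minimal numerical sketch (assuming the NumPy library; the function name partial_trace_2 and the reshape-based construction are ours, purely for illustration) computes the partial trace over the second factor and checks the defining property above:

```python
import numpy as np

def partial_trace_2(X, n1, n2):
    """Partial trace over the second factor for X acting on C^{n1} (x) C^{n2}."""
    # View X as a 4-index tensor X[i1, i2, j1, j2] and sum over i2 = j2.
    return X.reshape(n1, n2, n1, n2).trace(axis1=1, axis2=3)

# Check the defining property tr_2(X (x) Y) = tr(Y) X on random matrices.
rng = np.random.default_rng(0)
n1, n2 = 2, 3
X = rng.standard_normal((n1, n1)) + 1j * rng.standard_normal((n1, n1))
Y = rng.standard_normal((n2, n2)) + 1j * rng.standard_normal((n2, n2))
assert np.allclose(partial_trace_2(np.kron(X, Y), n1, n2), np.trace(Y) * X)
```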

Lemma 1

Let \(\mathcal {E}_\chi \) be a CPTP map associated with a given \(\chi \). Then for any \(\rho \in \mathfrak {D}(\mathcal {H})\)
$$\begin{aligned} \mathcal {E}_\chi (\rho )=\mathrm{tr}_2\left( \chi (I_d\otimes \rho ^\mathrm{T})\right) . \end{aligned}$$
(6)

Proof

Let us rewrite each \(F_j\) as the corresponding elementary matrix \(E_{lm},\) with \(j=(l-1)d+m,\,k=(n-1)d+p,\) and relabel \(\chi _{jk}\) as \(\hat{\chi }_{lmnp}\) accordingly. Hence, we get
$$\begin{aligned} \chi =\sum _{l,m,n,p}\hat{\chi }_{lmnp}E_{ln}\otimes E_{mp}, \end{aligned}$$
(7)
and
$$\begin{aligned} \mathcal {E}_\chi (\rho )=\sum _{l,m,n,p} \hat{\chi }_{lmnp}E_{lm}\rho E_{pn}. \end{aligned}$$
We can also expand \(\rho =\sum _{rs}\rho _{rs}E_{rs},\) and substitute it in the above expression. Taking into account that \(E_{lm}E_{np}=\delta _{mn}E_{lp}\), and defining \([\hat{\chi }^B_{ln}]_{mp}=\hat{\chi }_{lmnp}\), we get:
$$\begin{aligned} \mathcal {E}_\chi (\rho )&= \sum _{l,m,n,p,r,s}\rho _{rs}\hat{\chi }_{lmnp}E_{lm}E_{rs}E_{pn}\\&= \sum _{l,n,r,s}\rho _{rs}\hat{\chi }_{lrns}E_{ln}= \sum _{l,n}\left( \sum _{r,s}\rho _{rs}\hat{\chi }_{lrns}\right) E_{ln}\\&= \sum _{l,n}{\mathrm{tr}}\left( \rho ^\mathrm{T}\hat{\chi }^B_{ln}\right) E_{ln}= \mathrm{tr}_2(\chi (I\otimes \rho ^{\mathrm{T}})) \end{aligned}$$
where we used the fact that \(\hat{\chi }^B_{ln}\) corresponds to the \(d\times d\) dimensional block of \(\chi \) in position \((l,n),\) and that for every pair of matrices \(X,Y,\) we can write \(\sum _{rs}X_{rs} Y_{rs}=\mathrm{tr}(X^\mathrm{T}Y)\). \(\square \)

Remark

The \(\chi \) representation is equivalent to the Choi representation for quantum channels, i.e., the one associated with the Choi matrix \(C_\mathcal{E}=\sum _{mn}E_{mn}\otimes \mathcal {E}(E_{mn})\) [35]. In fact, either by direct computation or by comparing formula (6) with its equivalent for the Choi matrix \(C_\mathcal {E}\) (see e.g., [34], chapter 2), it is easy to see that \(C_\mathcal {E}=O\chi O^\dag ,\) where \(O\) is the unique unitary such that \(O( X\otimes Y) O^\dag =Y\otimes X\) [7].

Lemma 1 leads to a useful expression for the computation of the expectations.

Corollary 1

Let us consider a state \(\rho \), a projector \(\varPi \) and a quantum channel \(\mathcal {E}\) with associated \(\chi \)-representation matrix \(\chi \). Then
$$\begin{aligned} p_{\mathcal {E}(\rho )}(\varPi )=\mathrm{tr}\left( \mathcal {E}(\rho )\varPi \right) =\mathrm{tr}\left( \chi (\varPi \otimes \rho ^{\mathrm{T}})\right) . \end{aligned}$$

Proof

It suffices to substitute (6) in \(p_{\chi ,\rho }(\varPi )=\mathrm{tr}(\mathcal {E}(\rho )\varPi ),\) and use the identity \(\mathrm{tr}((X\otimes I) Y)=\mathrm{tr}(X\mathrm{tr}_2(Y))\). \(\square \)

The TP condition (5) can also be re-expressed directly in terms of the \(\chi \) matrix.

Corollary 2

Let us consider a CP map \(\mathcal {E}_\chi \) with associated \(\chi \)-representation matrix \(\chi \). Then \(\mathcal {E}_\chi \) is TP if and only if
$$\begin{aligned} \mathrm{tr}_1(\chi )=I_d. \end{aligned}$$
(8)

Proof

Using the same notation we used in the proof of Lemma 1, we can re-express the TP condition (5) as:
$$\begin{aligned} I_d= \sum _{l,m,n,p}\hat{\chi }_{lmnp}E_{pn}E_{lm} = \sum _{l,m,p}\hat{\chi }_{lmlp}E_{pm} = \mathrm{tr}_1(\chi ). \end{aligned}$$
\(\square \)
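To make the \(\chi \)-representation concrete, the following sketch (again assuming NumPy; a plain qubit amplitude damping channel is used only as an illustrative choice, not the example analyzed later in the paper, and all function names are ours) builds \(\chi \) from a set of Kraus operators expressed in the elementary basis, applies the channel through (6), evaluates the outcome probability of Corollary 1, and verifies the TP condition (8):

```python
import numpy as np

def chi_from_kraus(kraus_ops):
    """chi of the OSR (1): chi = sum_j k_j k_j^dag, where k_j collects the
    coefficients of K_j in the elementary basis {F_m} (row-major vectorization)."""
    d = kraus_ops[0].shape[0]
    chi = np.zeros((d * d, d * d), dtype=complex)
    for K in kraus_ops:
        k = K.reshape(-1, 1)
        chi += k @ k.conj().T
    return chi

def apply_channel(chi, rho):
    """E_chi(rho) = tr_2( chi (I_d (x) rho^T) ), Eq. (6)."""
    d = rho.shape[0]
    M = chi @ np.kron(np.eye(d), rho.T)
    return M.reshape(d, d, d, d).trace(axis1=1, axis2=3)

# Illustrative qubit amplitude damping channel with gamma = 0.5.
g = 0.5
K0 = np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex)
K1 = np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)
chi = chi_from_kraus([K0, K1])

d = 2
rho = np.array([[1, 0], [0, 0]], dtype=complex)        # probe state |0><0|
Pi = np.array([[0, 0], [0, 1]], dtype=complex)         # projector |1><1|
out = apply_channel(chi, rho)
p = np.trace(chi @ np.kron(Pi, rho.T)).real            # Corollary 1
tr1 = chi.reshape(d, d, d, d).trace(axis1=0, axis2=2)  # tr_1(chi), Eq. (8)
assert np.allclose(tr1, np.eye(d))                     # TP condition holds
assert np.isclose(p, np.trace(out @ Pi).real)          # consistent with tr(E(rho) Pi)
```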

3 Identifiability condition and minimal setting

3.1 The channel identification problem

Consider the following setting: a quantum system prepared in a known pure state \(\rho \) is fed to an unknown quantum channel \(\mathcal{E}\). The system in the output state \(\mathcal {E}(\rho )\) is then subjected to a projective measurement of an observable. By noting that an observable (being represented by a Hermitian matrix in our setting) admits a decomposition in orthogonal projections representing mutually incompatible quantum events, we can without loss of generality restrict ourselves to measurements associated with orthogonal projections \(\varPi =\varPi ^\dag =\varPi ^2\) (or, more generally, positive semidefinite \(M_k\le I\) for generalized measurements). For each one of these, the outcome \(x\) is in the set \(\{0,1\},\) and can be interpreted as a sample of the (classical) random variable \(X\) which has distribution
$$\begin{aligned} P_{\chi ,\rho }(x)=\left\{ \begin{array}{ll} p_{\chi ,\rho }(\varPi ),&\; \; \hbox {if } x=1\\ 1-p_{\chi ,\rho }(\varPi ),&\; \; \hbox {if } x=0 \end{array} \right. \end{aligned}$$
(9)
where \(p_{\chi ,\rho }(\varPi )=\mathrm{tr}(\mathcal {E}_\chi (\rho )\varPi )\) is the probability that the measurement of \(\varPi \) returns outcome 1 when the state is \(\mathcal {E}_\chi (\rho )\).
Assume that the experiment is repeated with a series of known input (pure) states \(\{\rho _k\}_{k=1}^{L}\), and that, for each input state, each of the orthogonal projections \(\{\varPi _j\}_{j=1}^{M}\) is measured \(N\) times, obtaining a series of outcomes \(\{x_{l}^{jk}\}\). We consider the sampled frequencies to be our data, namely
$$\begin{aligned} f_{jk}:=\frac{1}{N}\sum _{l=1}^N x_l^{jk}. \end{aligned}$$
(10)
The channel identification problem or, as it is referred to in the physics literature, the standard quantum process tomography problem [31, 32, 34], consists in estimating a matrix \(\hat{\chi }\) satisfying constraints (3), (4) such that the corresponding Kraus map \({\mathcal {E}}_{\hat{\chi }}\) fits the experimental data in some optimal way.

3.2 Necessary and sufficient conditions for identifiability

The minimal experimental setting needed to uniquely identify an unknown quantum state has been the object of study in [13], in the framework of informationally complete measurements, and it has also been studied in detail in [36]. Here, by exploiting the similarity of the \(\chi \) matrix with a density operator, we follow a similar path and characterize the minimal set of probe states/measurements for which a quantum channel is identifiable.

It is well known [34, 37] that by imposing linear constraints associated with the TP condition (5), or equivalently (8), one reduces the \(d^4\) real degrees of freedom of \(\chi \) to \(d^4-d^2\). We shall exploit this fact by directly parameterizing \(\chi \) in a “generalized” Pauli basis (also known as Gell-Mann matrices, Fano basis or coherence vector representation in the case of states [2, 6, 34]). Usually the TP constraint is not directly included in the standard tomography method [31], since in principle it should emerge from the physical properties of the channel, or it is imposed through a (nonlinear) Lagrange multiplier in the ML approach [34]. Here, in order to investigate the minimum number of probe (input) states and measured projectors needed to uniquely determine \(\chi \), it is convenient to include this constraint from the very beginning. Doing so, we lose the possibility of exploiting a Cholesky factorization in order to impose positive semidefiniteness of \(\chi \): nonetheless, we show in Sect. 4.3 that semidefiniteness of the solution can be imposed algorithmically by using a barrier method [12].

Before proceeding to the main results, a number of definitions are in order. Consider an orthonormal basis for \(d^2\times d^2\) Hermitian matrices of the form \(\{\sigma _j\otimes \sigma _k\}_{j,k=0,1,\ldots ,d^2-1},\) where \(\sigma _0=1/\sqrt{d}\,I_d,\) while \(\{\sigma _j\}_{j=1,\ldots ,d^2-1}\) is a basis for the traceless subspace. We can now write
$$\begin{aligned} \chi =\sum _{jk}s_{jk}\sigma _j\otimes \sigma _k. \end{aligned}$$
If we now substitute it into (8), we get:
$$\begin{aligned} I_d =\mathrm{tr}_1(\chi )=\sum _{jk}s_{jk}\mathrm{tr}(\sigma _j)\sigma _k =\sum _{k}\sqrt{d} \, s_{0k}\sigma _k, \end{aligned}$$
and hence, since the \(\sigma _j\) are linearly independent, we can conclude that \(s_{00}=1,\,s_{0j}=0\) for \(j=1,\ldots ,d^2-1\). Hence, the free parameters for a TP map (at this point not necessarily CP, since we have not imposed the positivity of \(\chi \) yet) are \(d^4-d^2,\) and we can write any TP \(\chi \) as \(\chi =d^{-1} I_{d^2}+\sum _{j=1,k=0}^{d^2-1,d^2-1}s_{jk}\sigma _j\otimes \sigma _k,\) or, in a more compact notation,
$$\begin{aligned} \chi (\underline{\theta })=d^{-1} I_{d^2}+\sum _{\ell =1}^{d^4-d^2}\theta _\ell Q_\ell , \end{aligned}$$
(11)
by rearranging the double indexes \(j,k\) into a single index \(\ell ,\) and defining the corresponding \(Q_\ell =\sigma _j\otimes \sigma _k\). Thus, there exists a one-to-one correspondence between \(\chi \) and the \((d^4-d^2)\)-dimensional real vector \(\underline{\theta }=\left[ \begin{array}{lll} \theta _1 & \ldots & \theta _{d^4-d^2} \end{array} \right] ^{\mathrm{T}}\), and the \(\chi \) matrices corresponding to TP maps form an affine space, its linear part being
$$\begin{aligned} \mathcal{S}_{TP}:=\mathrm{span}\{Q_\ell \}=\mathrm{span}\left\{ \sigma _j\otimes \sigma _k\right\} _{j=1,\ldots ,d^2-1,k=0,\ldots ,d^2-1}. \end{aligned}$$
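A numerical sketch of this parametrization (NumPy assumed; the traceless basis constructed below is the generalized Gell-Mann family, which is one admissible choice of the \(\sigma _j\), and all helper names are ours) could be:

```python
import numpy as np

def hermitian_basis(d):
    """Orthonormal basis {sigma_0, ..., sigma_{d^2-1}} of Hermitian d x d matrices,
    with sigma_0 = I/sqrt(d) and the remaining elements traceless (Gell-Mann type)."""
    basis = [np.eye(d, dtype=complex) / np.sqrt(d)]
    for j in range(d):
        for k in range(j + 1, d):
            S = np.zeros((d, d), dtype=complex); S[j, k] = S[k, j] = 1 / np.sqrt(2)
            A = np.zeros((d, d), dtype=complex); A[j, k] = -1j / np.sqrt(2); A[k, j] = 1j / np.sqrt(2)
            basis += [S, A]
    for l in range(1, d):
        D = np.zeros((d, d), dtype=complex)
        D[:l, :l] = np.eye(l); D[l, l] = -l
        basis.append(D / np.sqrt(l * (l + 1)))
    return basis

def tp_parametrization(d):
    """Q_l = sigma_j (x) sigma_k with j >= 1, so chi(theta) = I/d + sum_l theta_l Q_l is TP."""
    sig = hermitian_basis(d)
    return [np.kron(sig[j], sig[k]) for j in range(1, d * d) for k in range(d * d)]

def chi_of_theta(theta, Q, d):
    chi = np.eye(d * d, dtype=complex) / d
    for t, q in zip(theta, Q):
        chi += t * q
    return chi

d = 2
Q = tp_parametrization(d)                      # d^4 - d^2 = 12 free directions for a qubit
theta = 0.01 * np.random.default_rng(1).standard_normal(len(Q))
chi = chi_of_theta(theta, Q, d)
tr1 = chi.reshape(d, d, d, d).trace(axis1=0, axis2=2)
assert np.allclose(tr1, np.eye(d))             # any theta gives a TP (not necessarily CP) chi
```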
In order to find necessary and sufficient conditions for identifiability, it is convenient to define
$$\begin{aligned} B_{jk}=(\varPi _j-\frac{r_j}{d}I)\otimes \rho ^{\mathrm{T}}_k \end{aligned}$$
(12)
where \(r_j\) is the rank of \(\varPi _j,\) or equivalently \(\mathrm{tr}(\varPi _j)=r_j\). The results we obtain next also hold if we substitute the orthogonal projectors \(\varPi _j\) in (12) with general POVM operators. Moreover, we define \(\mathcal{B}=\mathrm{span}\{B_{jk} \}_{j=1,\ldots , M,k=1,\ldots ,L}\). Intuitively, \(\mathcal{B}\) represents the space of input/output combinations that can be probed by the set of experimental resources \(\{\rho _k\},\{\varPi _j\}\) we choose. The definition of the \(B_{jk}\) is motivated by the fact that, since \(Q_\ell =\sigma _{j\ne 0}\otimes \sigma _k,\) it holds that
$$\begin{aligned} \mathrm{tr}\left( Q_\ell (\varPi _j\otimes \rho ^{\mathrm{T}}_k)\right) =\mathrm{tr}(Q_\ell B_{jk}). \end{aligned}$$
(13)
By recalling that \(\{\sigma _j\},\,j=1,\ldots ,d^2-1,\) is a basis for the traceless subspace of Hermitian matrices, it is immediate to show that \(\mathcal{B}\subseteq \mathcal{S}_{TP}\). Finally, let us introduce the function \(g\) that maps the space of TP channels into the (theoretical) set of probabilities for the input states/measured projectors combinations:
$$\begin{aligned}&g : \mathbb {R}^{d^4-d^2} \rightarrow \mathbb {R}^{M\times L}\nonumber \\&\qquad \underline{\theta }\mapsto g(\underline{\theta }) \end{aligned}$$
where the component of \(g(\underline{\theta })\) in position \((j,k)\) is defined as
$$\begin{aligned} g_{jk}(\underline{\theta })=p_{\chi (\underline{\theta }),\rho _k}(\varPi _j)=\mathrm{tr}\left( \chi (\underline{\theta })\left( \varPi _j\otimes \rho ^{\mathrm{T}}_k\right) \right) . \end{aligned}$$
(14)
The key result on identifiability is the following:

Proposition 1

 \(g\) is injective if and only if \(\mathcal {S}_{TP}= \mathcal {B}\).

Proof

Given (14), we have that
$$\begin{aligned} g_{jk}\left( \underline{\theta }_1\right) -g_{jk}\left( \underline{\theta }_2\right)&= \mathrm{tr}\left[ \left( \chi (\underline{\theta }_1)-\chi (\underline{\theta }_2)\right) \left( \varPi _j\otimes \rho ^{\mathrm{T}}_k\right) \right] \\&= \mathrm{tr}\left[ S(\underline{\theta }_1-\underline{\theta }_2)B_{jk}\right] \nonumber \\&= \langle S\left( \underline{\theta }_1-\underline{\theta }_2\right) ,B_{jk}\rangle \end{aligned}$$
where \(S(\underline{\theta }_1-\underline{\theta }_2)=\chi (\underline{\theta }_1)-\chi (\underline{\theta }_2)=\sum _{l=1}^{d^4-d^2}(\theta _{1,l}-\theta _{2,l})Q_l\in \mathcal {S}_{TP}\). So, we have that
$$\begin{aligned} g(\underline{\theta }_1)=g(\underline{\theta }_2) \; \Leftrightarrow \; \langle S\left( \underline{\theta }_1-\underline{\theta }_2\right) ,B_{jk}\rangle =0 \; \; \forall \; j,k. \end{aligned}$$
(15)
Assume \(\mathcal {S}_{TP}= \mathcal {B}:\) the only element of \(\mathcal{S}_{TP}\) for which the r.h.s. of (15) could be true is zero. Since by definition \(S(\underline{\theta }_1-\underline{\theta }_2)=0\) if and only if \(\underline{\theta }_1=\underline{\theta }_2,\; g\) is injective. On the other hand, assume that \(\mathcal {B}\subsetneq \mathcal {S}_{TP}:\) therefore, there exists \(T \ne 0\in \mathcal {S}_{TP}\bigcap \mathcal{B}^\perp \) such that
$$\begin{aligned} T=\sum _{\ell }\gamma _\ell Q_\ell ,\quad \langle T,B_{jk}\rangle =0 \; \forall j,k. \end{aligned}$$
But this would mean that \(\underline{\theta }\) and \(\underline{\theta }+\underline{\gamma }\) have the same image \(g(\underline{\theta })\), and hence, \(g\) is not injective. \(\square \)

We anticipate here that \(g\) being injective is a necessary and sufficient condition for a priori identifiability of \(\chi ,\) and thus for having a unique solution of the problem for both inversion and convex optimization-based (e.g., ML) methods, up to some basic assumptions on the cost functional. The proof of these facts is given in full detail in Sects. 3.3 and 3.4.

As a consequence of these facts, we can determine the minimal experimental resources, in terms of input states and measured projectors, needed for faithfully reconstructing \(\chi \) from noiseless data \(\{f^\circ _{jk}\}\), where \(f^\circ _{jk}=p_{\chi ,\rho _k}(\varPi _j)\). In the light of Proposition 1, the minimal experimental setting is characterized by a choice of \(\{\varPi _j,\rho _k\}\) such that \(\mathcal{S}_{TP}=\mathcal{B}\). Recalling the definition of \(\mathcal{B},\) through (12), it is immediate to see that \(\mathcal{S}_{TP}=\mathcal{B}\) if and only if \(\mathrm{span}\{\varPi _j- \frac{r_j}{d}I_d\}=\mathrm{span}\{\sigma _j,j=1,\ldots ,d^2-1\}\) and \(\mathrm{span}\{\rho _k\}=\mathcal {H}_d\). Here, \(\mathcal {H}_d\) denotes the vector space of Hermitian matrices of dimension \(d\). We can summarize this fact as a corollary of Proposition 1.

Corollary 3

 \(g\) is injective if and only if we have at least \(d^2\) linearly independent input states \(\{\rho _k\},\) and \(d^2-1\) measured projectors \(\{\varPi _j\}\) such that
$$\begin{aligned} \mathrm{span}\left\{ \varPi _j- \frac{r_j}{d}I_d\right\} =\mathrm{span}\left\{ \sigma _j,j=1,\ldots ,d^2-1\right\} . \end{aligned}$$

It is worth observing that the reduction in the number of observables is a direct consequence of the imposed trace-preserving constraint. We call such a set a minimal experimental setting. Notice that, using the terminology of [21, 34], the minimal quorum of observables consists of \(d^2-1\) properly chosen elements. While in most of the literature at least \(d^2\) observables are considered [23, 31], we showed it is in principle possible to spare a measurement channel at the output. A physically inspired interpretation for this fact is that, since we a priori know, or assume, that the map is TP, measuring the component of the observables along the identity does not provide useful information. This is clearly not true if one relaxes the TP condition, as has been done in [9]: in that case, by the same line of reasoning, \(d^2\) linearly independent observables are necessary and sufficient for \(g\) to be injective.

As an example relevant to many experimental situations, consider the qubit case, i.e., \(d=2\). A minimal set of projectors has to span the traceless subspace of \(\mathcal {H}_2\), and a minimal set of probe states has to span \(\mathcal {H}_2\): one can choose e.g.:
$$\begin{aligned} \varPi _j=\frac{1}{2}I_2+\sigma _j,\;j=x,y,z. \end{aligned}$$
$$\begin{aligned} \rho _{x,y}=\frac{1}{2}I_2+\sigma _{x,y},\quad \rho _\pm =\frac{1}{2}I_2\pm \sigma _z. \end{aligned}$$
(16)
It is clear that there is an asymmetry between the roles of outputs and inputs: in fact, exchanging the numbers of \(\{\varPi _j\}\) and \(\{\rho _k\}\) cannot lead to an injective \(g\), since \(d^2-1\) input states cannot span \(\mathcal {H}_d\).
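The identifiability condition of Proposition 1 (equivalently, Corollary 3) can also be checked numerically. The sketch below (NumPy assumed; the \(\sigma _j\) in (16) are read as the Pauli matrices divided by two, so that the \(\varPi _j\) are bona fide projectors, and the variable names are ours) builds the matrices \(B_{jk}\) of (12) for the minimal qubit setting and verifies that their span has dimension \(d^4-d^2\), i.e., that \(\mathcal{S}_{TP}=\mathcal{B}\):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Minimal qubit setting (16): three rank-one projectors and four
# linearly independent pure probe states.
projectors = [(I2 + P) / 2 for P in (X, Y, Z)]
states = [(I2 + X) / 2, (I2 + Y) / 2, (I2 + Z) / 2, (I2 - Z) / 2]

d = 2
# B_jk = (Pi_j - r_j/d I) (x) rho_k^T, Eq. (12); here r_j = 1 for all j.
B = [np.kron(Pi - np.trace(Pi).real / d * I2, rho.T)
     for Pi in projectors for rho in states]

# Stack the real-vectorized B_jk and compute the dimension of their span.
vecs = np.array([np.concatenate([b.real.ravel(), b.imag.ravel()]) for b in B])
rank = np.linalg.matrix_rank(vecs)
assert rank == d**4 - d**2      # S_TP = B, hence g is injective (Proposition 1)
```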

3.3 Standard process tomography by inversion

Assume that \(\mathcal {S}_{TP}= \mathcal {B},\) and that the data \(\{f_{jk}\}\) have been collected. Since \(f_{jk}\) is an estimate of \(p_{\chi (\underline{\theta }),\rho _k}(\varPi _j)\), consider the following least-squares problem
$$\begin{aligned} \min _{\underline{\theta }\in \mathbb {R}^{d^4-d^2}} \Vert \underline{g}(\underline{\theta })-\underline{f}\Vert \end{aligned}$$
(17)
where \(\underline{g}(\underline{\theta })\) and \(\underline{f}\) are the vectors obtained by stacking the \(g_{jk}(\underline{\theta })\) and the \(f_{jk}\), \(j=1,\ldots ,M,\,k=1,\ldots ,L\), respectively. In view of (11) and (14), we have that \(\underline{g}(\underline{\theta })=T\underline{\theta }+d^{-1}\underline{r}\) where
$$\begin{aligned} T=\left[ \begin{array}{lcl} \ddots &{} \vdots &{} \\ &{} \mathrm{tr}(B_{jk}Q_\ell ) &{} \\ &{} \vdots &{} \ddots \end{array}\right] \end{aligned}$$
(18)
and
$$\begin{aligned} \underline{r}=\left[ \begin{array}{lcl} r_1 &{} \ldots &{} r_M \\ \end{array} \right] ^{\mathrm{T}}. \end{aligned}$$
Notice that the \(\ell \hbox {th}\) column of \(T\) is formed by the inner products of \(Q_\ell \) with each \(B_{jk}\). Since \(\mathcal {S}_{TP}= \mathcal {B}\), the \(Q_\ell \) are linearly independent, and the \(B_{jk}\) generate \(\mathcal{B}\), the matrix \(T\) has full column rank, namely rank \(d^4-d^2\). Hence, in principle, one can reconstruct \(\hat{\underline{\theta }}\) as
$$\begin{aligned} \hat{\underline{\theta }}=T^\#\left( \underline{f}-\frac{1}{d} \underline{r}\right) , \end{aligned}$$
(19)
\(T^\#\) being the Moore–Penrose pseudoinverse of \(T\) [25]. If the experimental setting is minimal, \(T\) is square and the usual inverse suffices. However, as is well known, when computing \(\chi \) this way from real (noisy) data, positivity is typically lost [1, 34]. We illustrate this fact in Sect. 4.4 through numerical simulations.
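A sketch of the inversion estimate (19) (NumPy assumed; projectors, states and the list Q of the \(Q_\ell \) from the parametrization sketch above are taken as given, and the function name is ours) might read:

```python
import numpy as np

def inversion_estimate(f, projectors, states, Q, d):
    """IN method: theta_hat = T^# (f - r/d), Eqs. (18)-(19).
    f is the M x L matrix of sampled frequencies f_jk; Q is the list of Q_l."""
    B = [np.kron(Pi - np.trace(Pi).real / d * np.eye(d), rho.T)
         for Pi in projectors for rho in states]
    # One row of T per pair (j, k), one column per Q_l, entries tr(B_jk Q_l).
    T = np.array([[np.trace(b @ q).real for q in Q] for b in B])
    r = np.repeat([np.trace(Pi).real for Pi in projectors], len(states))
    return np.linalg.pinv(T) @ (f.ravel() - r / d)   # theta_hat; chi follows from (11)
```

With noiseless data \(f_{jk}=p_{\chi ,\rho _k}(\varPi _j)\), this recovers \(\underline{\theta }\) exactly; with sampled frequencies, the reconstructed \(\chi (\hat{\underline{\theta }})\) may fail to be positive semidefinite, which is precisely the issue discussed above.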

3.4 Convex methods: general framework

More robust approaches for the estimation of a physically acceptable \(\chi \) (or equivalent parametrizations) have been developed, most notably by resorting to ML methods [23, 34, 37, 45]. The optimal channel estimation problem can be stated, by using the parametrization \(\chi (\underline{\theta })=d^{-1}I_{d^2}+\sum _\ell \theta _\ell Q_\ell \) presented in the previous section, as follows: consider a set of data \(\{f_{jk}\}\) as above, and a cost functional \(J(\underline{\theta }):=h[ g(\underline{\theta })]\) where \(h: \mathbb {R}^{M\times L}\rightarrow \mathbb {R}\) is a suitable function which depends on the data \(\{f_{jk}\}\). We aim to find
$$\begin{aligned} \hat{\theta }=\arg \min _{\underline{\theta }} J(\underline{\theta }) \end{aligned}$$
(20)
subject to \(\underline{\theta }\) belonging to some constrained set \(\mathcal{C}\subset \mathbb {R}^{d^4-d^2}\). In our case
$$\begin{aligned} \mathcal{C}=\mathcal{A}_+\quad \mathrm{or }\quad \mathcal{C}=\mathcal{A}_+ \cap \mathcal{I}, \end{aligned}$$
with \(\mathcal{A}_+=\{\underline{\theta }\;|\; \chi (\underline{\theta })\ge 0 \},\) while \(\mathcal{I}=\{\underline{\theta }\;|\;0<\mathrm{tr}(\chi (\underline{\theta })(\varPi _j\otimes \rho ^{\mathrm{T}}_k))<1,\; \forall \,j,k\}\). The second constraint may be used when the cost functional is not well defined for extremal probabilities, or in order to ensure that the estimated channel exhibits some noise in each of the measured directions, as expected in realistic experimental settings. Since the analysis does not change significantly in the two settings, we will not distinguish between them where it is not strictly necessary. The following result will be instrumental in proving the existence of a unique solution.

Proposition 2

\(\mathcal{C}\) is a bounded set.

Proof

First, we remark that \(\mathcal{C}\) is neither closed nor open in general. Since \(\mathcal {C}\subset \mathcal{A}_+\), it is sufficient to show that \(\mathcal{A_+}\) is bounded or, equivalently, that a sequence \(\{\underline{\theta }_j\}_{j\ge 0}\), with \(\underline{\theta }_j\in \mathbb {R}^{d^4-d^2}\) and \(\Vert \underline{\theta }_j\Vert \rightarrow +\infty \), cannot belong to \(\mathcal{A_+}\). To this end, it is sufficient to show that, as \(\Vert \underline{\theta }_j\Vert \rightarrow +\infty \), the minimum eigenvalue of \(\chi (\underline{\theta }_j)\) tends to \(-\infty \), so that, for \(j\) large enough, \(\underline{\theta }_j\) does not satisfy the condition \(\chi (\underline{\theta }_j)\ge 0\). Notice that the map \(\underline{\theta }\mapsto \chi (\underline{\theta })\) is affine. Moreover, since the \(Q_\ell \) are linearly independent, this map is injective. Accordingly, \(\Vert \chi (\underline{\theta }_j)\Vert \) approaches infinity as \(\Vert \underline{\theta }_j\Vert \rightarrow +\infty \). Since \(\chi (\underline{\theta }_j)\) is a Hermitian matrix, \(\chi (\underline{\theta }_j)\) has an eigenvalue \(\lambda _j\) such that \(|\lambda _j|\rightarrow +\infty \) as \(\Vert \chi (\underline{\theta }_j)\Vert \rightarrow +\infty \). Recall that \(\chi (\underline{\theta }_j)\) satisfies (8) by construction, which implies that \(\mathrm{tr}(\chi (\underline{\theta }_j))=d\), namely the sum of its eigenvalues is always equal to \(d\). Thus, there exists an eigenvalue of \(\chi (\underline{\theta }_j)\) which approaches \(-\infty \) as \(j\rightarrow +\infty \), which contradicts positive semidefiniteness. So, \(\mathcal {C}\) is bounded. \(\square \)

Here we focus on the following issue: under which conditions on the experimental setting (or, mathematically, on the set \(\mathcal{B}\) defined above) does the optimization approach have a unique solution? In either of the cases above, \(\mathcal{C}\) is the intersection of convex nonempty sets: in fact, the sets of \(\underline{\theta }\) corresponding to \(\mathcal{S}_{TP}\) and to \(\chi \ge 0\) are convex, and it is immediate to verify that \(\mathcal{I}\) is convex as well; all of these contain \(\underline{\theta }=0,\) corresponding to \(\frac{1}{d}I_{d^2},\) and hence, they are nonempty. In the light of this, it is possible to derive sufficient conditions on \(J\) for existence and uniqueness of the minimum in the presence of an arbitrary constraint set \(\mathcal{C}\). Define \(\partial \mathcal{C}_0:=\partial \mathcal{C}\setminus (\partial \mathcal{C}\cap \mathcal{C})\).

Proposition 3

Assume \(h\) is continuous and strictly convex on \(g(\mathcal{C})\), and
$$\begin{aligned} \lim _{\underline{\theta }\rightarrow \partial \mathcal {C}_0 }J(\underline{\theta })=\lim _{\underline{\theta }\rightarrow \partial \mathcal {C}_0 }h[ g(\underline{\theta })]=+\infty . \end{aligned}$$
(21)
If \(\mathcal {S}_{TP}= \mathcal {B}\), then the functional \(J\) has a unique minimum point in \(\mathcal {C}\).

Proof

Since \(h\) is strictly convex on \(g(\mathcal {C})\) and the affine function \(g\), in view of Proposition 1, is injective on \(\mathcal {C}\), \(J\) is strictly convex on \(\mathcal {C}\). So, we only need to show that \(J\) attains a minimum on \(\mathcal {C}\). In order to do so, it is sufficient to show that \(J\) is inf-compact, i.e., the preimage \(J^{-1}((-\infty ,r])\) is compact for every \(r\). Existence of the minimum for \(J\) then follows from a version of the Weierstrass theorem, since an inf-compact function has closed level sets and is therefore lower semicontinuous [29, p. 56]. Define \(\underline{\theta }_0:=\left[ \begin{array}{lll} 0 & \ldots & 0 \end{array} \right] ^{\mathrm{T}}\in \mathbb {R}^{d^4-d^2}\). Observe that \(\chi (\underline{\theta }_0)={d}^{-1}I_{d^2}\ge 0\). Moreover,
$$\begin{aligned} \mathrm{tr}\left( \chi (\underline{\theta }_0)\varPi _j\otimes \rho ^{\mathrm{T}}_k\right) =\frac{ r_j}{d} <1 \; \; \forall j,k. \end{aligned}$$
(22)
Therefore, \(\underline{\theta }_0\in \mathcal {C}\); call \(J(\underline{\theta }_0)=J_0<\infty \). So, we can restrict the search for a minimum point to \(J^{-1}((-\infty ,J_0 ])\). Since \(\mathcal{C}\) is a bounded set by construction, to prove inf-compactness of \(J\) it is sufficient to guarantee that
$$\begin{aligned} \lim _{\underline{\theta }\rightarrow \partial \mathcal {C}_0}J(\underline{\theta })=+\infty . \end{aligned}$$
\(\square \)
We now discuss consistency of this method. First, notice that \(h\) depends on \(\{\varPi _j,\rho _k,f_{jk}\}\) where \(f_{jk}\) are the sample frequencies. We write \(h_{\underline{f}}\), where \([\underline{f} ]_{jk}=f_{jk}\), to make explicit the dependence of \(h\) on the matrix of \(f_{jk}\). Let \(N\) be the number of measurements performed for each pair \((\rho _k,\varPi _j)\) and denote the estimate (20) by \(\hat{\theta }_N\) to highlight its dependence on \(N\). Let \(\underline{\theta }^\circ \) be the “true” value of the parameter. We recall that the estimate is consistent if, for every \(\underline{\theta }^\circ \in {\mathcal C}\),
$$\begin{aligned} \lim _{N\rightarrow \infty } \hat{\theta }_N=\underline{\theta }^\circ \qquad \mathrm{in \ probability}. \end{aligned}$$
Let \(\mathcal{F} \subset g(\mathcal {C})\) be the class of admissible \(\underline{f}\), i.e., the class of (“true”) probabilities that are consistent with a CPTP channel. The following result allows us to derive a large class of functions \(h\) that guarantee consistency of the estimator.

Proposition 4

If for all \(\underline{f}\in \mathcal{F}\)
$$\begin{aligned} \underline{f}=\underset{\underline{x}\in \mathbb {R}^{M\times L}}{\mathrm{argmin}} h_{\underline{f}}(\underline{x}), \end{aligned}$$
then the corresponding estimate \(\hat{\theta }\) is consistent.

Proof

Let \(\underline{\theta }^\circ \) be the “true” parameter and \(\chi =\chi (\underline{\theta }^\circ )\) be the corresponding \(\chi \)-matrix of the “true” channel. First observe that, once the sample frequencies \(\underline{f}\) are fixed,
$$\begin{aligned} J(\underline{\theta })=h_{\underline{f}}[ g(\underline{\theta })]\ge h_{\underline{f}}(\underline{f}), \end{aligned}$$
so that if there exists \(\hat{\underline{\theta }}\in \mathcal{C}\) such that \(\mathrm{tr}[\chi (\hat{\underline{\theta }})(\varPi _{j}\otimes \rho _{k}^{\mathrm{T}})]=f_{jk},\) then such a \(\hat{\underline{\theta }}\) is optimal. Hence, in particular, the (unique) optimal solution corresponding to the \(f_{jk}\) equal to the “true” probabilities \(\mathrm{tr}[\chi (\varPi _j\otimes \rho _k^{\mathrm{T}})]\) is exactly \(\underline{\theta }^\circ \). On the other hand, as the number of experiments \(N\) increases, the sample frequencies \(f_{jk}\) converge in probability to the “true” probabilities \(\mathrm{tr}[\chi (\varPi _j\otimes \rho _k^{\mathrm{T}})]\). Therefore, in view of the convexity and continuity of \(J\), the corresponding optimal solution tends to the “true” parameter \(\underline{\theta }^\circ \). This proves consistency. \(\square \)

Notice that any \(h_{\underline{f}}(x)\) of the form \(D(\underline{f},x)\) where \(D\) is any (pseudo-)distance satisfies the requirements of this proposition.

4 Applications and numerical methods

In the previous section, we considered an identification paradigm for estimating quantum channels, deriving simple sufficient conditions on \(J\) which guarantee identifiability, as well as the uniqueness and the consistency of the corresponding estimate. In Sects. 4.1 and 4.2, we consider two ML problems widely considered in the literature, see for instance [23] and [26]. We directly show that these fit our framework and satisfy the sufficient conditions. In Sect. 4.3, we show that it is possible to design a globally convergent Newton algorithm for numerically computing the solution to problem (20) when the cost functional \(J\) satisfies the assumptions of Proposition 3. In Sect. 4.4, we analyze the estimation performance of the minimal setting in the particular case of the ML method of Sect. 4.1.

4.1 ML binomial functional

Assume a certain set of data \(\{f_{jk}\}\) has been obtained, by repeating \(N\) times the measurement of each pair \((\rho _k,\varPi _j)\). For technical reasons (strict convexity of the ML functional on the optimization set) and experimental considerations (noise typically irreversibly affects any state), it is typically assumed that \(0<f_{jk}<1\). The probability of obtaining a series of outcomes with \(c_{jk}=f_{jk}N\) ones for the pair \((j,k)\) is then
$$\begin{aligned} P_\chi (c_{jk})= \left( {\begin{array}{c}N\\ c_{jk}\end{array}}\right) \mathrm{tr}\left( \chi \left( \varPi _j\otimes \rho _k^{\mathrm{T}}\right) \right) ^{c_{jk}} \left[ 1-\mathrm{tr}\left( \chi \left( \varPi _j\otimes \rho _k^{\mathrm{T}}\right) \right) \right] ^{N-c_{jk}} \end{aligned}$$
(23)
so that the overall probability of \(\{c_{jk}\}\) may be expressed as \(P_\chi (\{c_{jk}\})=\prod _{j=1}^M\prod _{k=1}^L P_\chi (c_{jk})\). By adopting the ML criterion, once the \(\{c_{jk}\}\) describing the recorded data are fixed, the optimal estimate \(\hat{\chi }\) of \(\chi \) is given by maximizing \(P_\chi (\{c_{jk}\})\) with respect to \(\chi \) over a suitable set \(\mathcal {C}\). Let us consider our parametrization of the TP \(\chi (\underline{\theta })\) as in (11). If we assume \(0<\mathrm{tr}(\chi (\underline{\theta }) (\varPi _j\otimes \rho _k^{\mathrm{T}}))<1,\) since the logarithm function is monotone, it is equivalent (up to a constant emerging from the binomial coefficients) to minimize over \(\mathcal {C}=\mathcal{A}_+\cap \mathcal{I}\) the function
$$\begin{aligned} J(\underline{\theta })&= -\frac{1}{N}\log P_{\chi (\underline{\theta })}\left( \left\{ c_{jk}\right\} \right) +\frac{1}{N}\sum _{j,k}\log \left( {\begin{array}{c}N\\ c_{jk}\end{array}}\right) \nonumber \\&= -\sum _{j,k}\Big \{ f_{jk}\log \left[ \mathrm{tr}\left( \chi (\underline{\theta }) \left( \varPi _j\otimes \rho _k^{\mathrm{T}}\right) \right) \right] \nonumber \\&\quad +\,(1-f_{jk})\log \left[ 1-\mathrm{tr}\left( \chi (\underline{\theta }) \left( \varPi _j\otimes \rho _k^{\mathrm{T}}\right) \right) \right] \Big \}. \end{aligned}$$
(24)
Here, \(h(X)=-\sum _{j,k}\left[ f_{jk}\log (x_{jk})+(1-f_{jk})\log (1-x_{jk})\right] \) with \(x_{jk}=[X]_{jk}\) and \(X\in \mathbb {R}^{M\times L}\) is strictly convex on \(\mathbb {R}^{M \times L}\) because \(0<f_{jk}<1\) by assumption. Notice that \(\partial \mathcal {C}_0\) is the set of \(\underline{\theta }\in \mathcal{A}_+\) for which there exists at least one pair \((\tilde{j},\tilde{k})\) such that \(\mathrm{tr}(\chi (\underline{\theta })(\varPi _{\tilde{j}}\otimes \rho _{\tilde{k}}^{\mathrm{T}}))\) equals \(0\) or \(1\). Suppose that \(\mathrm{tr}(\chi (\underline{\theta })(\varPi _{\tilde{j}}\otimes \rho _{\tilde{k}}^{\mathrm{T}}))\rightarrow 0\) as \(\underline{\theta }\rightarrow \partial \mathcal {C}_0\). Therefore, \(\log [\mathrm{tr}(\chi (\underline{\theta })(\varPi _{\tilde{j}}\otimes \rho _{\tilde{k}}^{\mathrm{T}}))]\rightarrow -\infty \). Since \(c_{\tilde{j},\tilde{k}}>0\) by assumption, we have that
$$\begin{aligned} \lim _{\underline{\theta }\rightarrow \partial \mathcal {C}_0} J(\underline{\theta })&= -\lim _{\underline{\theta }\rightarrow \partial \mathcal {C}_0} \sum _{j,k}\Big \{ f_{jk}\log \left[ \mathrm{tr}\left( \chi (\underline{\theta })\left( \varPi _j\otimes \rho _k^{\mathrm{T}}\right) \right) \right] \\&\quad +\,(1-f_{jk})\log \left[ 1-\mathrm{tr}\left( \chi (\underline{\theta }) \left( \varPi _j\otimes \rho _k^{\mathrm{T}}\right) \right) \right] \Big \} \\&= -f_{\tilde{j},\tilde{k}} \lim _{\underline{\theta }\rightarrow \partial \mathcal {C}_0 } \log \left[ \mathrm{tr}\left( \chi (\underline{\theta })\left( \varPi _{\tilde{j}}\otimes \rho _{\tilde{k}}^{\mathrm{T}}\right) \right) \right] \\&= +\,\infty . \end{aligned}$$
In a similar way, the same conclusion is obtained in the other case, and the conditions for existence and uniqueness of the minimum in Proposition 3 are satisfied.
It remains to show the consistency of the method. Let \(y_f\in \mathbb {R}^{2ML}\) be the vector obtained by stacking \(f_{jk}\) and \(1-f_{jk}\) with \(j=1,\ldots ,M,\; k=1,\ldots ,L\). In a similar way, we define \(y_x\in \mathbb {R}^{2ML}\) by stacking \(x_{jk}\) and \(1-x_{jk}\). Note that the sum of the entries of \(y_f\) is equal to \(ML\); thus \(y_f\), as well as \(y_x\), is an unnormalized mass function. It turns out that
$$\begin{aligned} h(X)=D(y_f,y_x)-\sum _{j=1}^{2ML} y_f(j)\log (y_f(j)) \end{aligned}$$
(25)
where
$$\begin{aligned} D(y_f,y_x)=\sum _{j=1}^{2ML} y_f(j)\log \left( \frac{y_f(j)}{y_x(j)}\right) \end{aligned}$$
(26)
is the Kullback–Leibler (pseudo-)distance [16], and \(y_f(j)\) denotes the \(j\)-th entry of \(y_f\). The second term on the r.h.s. of (25) plays no role in the optimization problem (20). Accordingly, we can equivalently consider the functional \(J(\underline{\theta })=D(y_f,y_{g(\underline{\theta })})\) and, in view of Proposition 4, we conclude that the method is consistent.
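For completeness, a direct sketch of the cost (24) as a function of \(\underline{\theta }\) (NumPy assumed; Q again denotes the list of the \(Q_\ell \) from the parametrization sketch, and the function name is ours) is:

```python
import numpy as np

def binomial_ml_cost(theta, f, projectors, states, Q, d):
    """Cost (24): J(theta) = -sum_jk [ f_jk log p_jk + (1 - f_jk) log(1 - p_jk) ],
    with p_jk = tr( chi(theta) (Pi_j (x) rho_k^T) ); equals D(y_f, y_g) up to a constant."""
    chi = np.eye(d * d, dtype=complex) / d
    for t, q in zip(theta, Q):
        chi += t * q
    J = 0.0
    for j, Pi in enumerate(projectors):
        for k, rho in enumerate(states):
            p = np.trace(chi @ np.kron(Pi, rho.T)).real
            if not 0.0 < p < 1.0:          # theta outside the constraint set I
                return np.inf
            J -= f[j, k] * np.log(p) + (1.0 - f[j, k]) * np.log(1.0 - p)
    return J
```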

Finally, notice that one can minimize with respect to the other argument of the Kullback–Leibler distance, i.e., consider the functional \(J(\underline{\theta })=D(y_{g(\underline{\theta })},y_f)\). Clearly, Propositions 3 and 4 are still valid. This method can be understood as a generalization of the maximum entropy criterion in the presence of a prior. In fact, when \(y_f(j)=\frac{1}{2}\) for all \(j\), the minimization of \(J\) is equivalent to the maximization of the entropy functional \(\mathbb {H}(y_x)=-\sum _{j=1}^{2ML} y_x(j)\log (y_x(j))\).

4.2 ML Gaussian functional

Assume a certain set of data \(\{f_{jk}\}\) has been obtained. For each \(\rho _k\) consider the sample vector \(\underline{f}_k=\left[ \begin{array}{lll} f_{1k} & \ldots & f_{Mk} \end{array} \right] ^{\mathrm{T}}\in \mathbb {R}^{M}\), which can be thought of as a sample of \(\underline{p}_{\chi }^k=\left[ \begin{array}{lll} \mathrm{tr}\left( \chi \left( \varPi _1\otimes \rho _k^{\mathrm{T}}\right) \right) & \ldots & \mathrm{tr}\left( \chi \left( \varPi _M\otimes \rho _k^{\mathrm{T}}\right) \right) \end{array} \right] ^{\mathrm{T}}\). Accordingly, we can consider the probabilistic model \(\underline{f}_k=\underline{p}_{\chi }^k+\underline{v}_k\) where \(\underline{v}_k\sim \mathcal {N}(0,\Sigma )\), \(\Sigma >0\), is Gaussian noise. This noise model is a good representation of certain experimental settings in quantum optics, where the sampled frequencies are obtained with a high number of counts \(c_{j}\) and the Gaussian noise is due to the electronics of the measurement devices, typically photodiodes. In our model, we can think of each measured \(\varPi _j\) as being associated with a different device with noise component \(v_j\). Notice that the noise components are in general correlated. Let \(\underline{\mathcal {D}}_j\) denote the device associated with \(\varPi _j\). Then, \(\underline{\mathcal {D}}_j\) will measure the data \(f_{j1},\ldots ,f_{jL}\). Since \(\underline{f}_{k}\sim \mathcal {N}(\underline{p}_\chi ^k,\Sigma )\), the probability density of the outcomes \(\underline{f}_k\) is then
$$\begin{aligned} P^k_\chi (\underline{f}_{k})=\frac{1}{\sqrt{(2\pi )^M\det \Sigma }}\exp \{-\frac{1}{2}(\underline{f}_{k}-\underline{p}_{\chi }^k)^{\mathrm{T}}\Sigma ^{-1}(\underline{f}_{k}-\underline{p}_{\chi }^k)\} \end{aligned}$$
(27)
so that the overall likelihood of \(\{f_{jk}\}\) is \(P_\chi (\{f_{jk}\})=\prod _{k=1}^L P^k_\chi (\underline{f}_k)\). By adopting the ML criterion, given \(\{f_{jk}\}\), the optimal estimate \(\hat{\chi }\) of \(\chi \) is obtained by maximizing \(P_\chi (\{f_{jk}\})\) with respect to \(\chi \). Taking into account the parametrization \(\chi (\underline{\theta })\) in (11), this is equivalent to minimizing over \(\mathcal {C}=\mathcal {A}_+\) the function
$$\begin{aligned} J(\underline{\theta })&= -2\log \left( \left( (2\pi )^M\det (\Sigma )\right) ^{L/2}\, P_{\chi (\underline{\theta })}(\{f_{jk}\})\right) \nonumber \\&= \sum _{k=1}^{L}(\underline{f}_{k}-\underline{p}_{\chi (\underline{\theta })}^k)^{\mathrm{T}}\Sigma ^{-1}(\underline{f}_{k}-\underline{p}_{\chi (\underline{\theta })}^k). \end{aligned}$$
(28)
Then, it is easy to see that the conditions of Proposition 3 are satisfied. Accordingly, the minimum \(\hat{\underline{\theta }}\) of \(J\) is unique. By Proposition 4, we obtain the consistency of the method.
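A corresponding sketch for the Gaussian functional (28) (same assumptions and conventions as in the previous sketches; the function name is ours) is:

```python
import numpy as np

def gaussian_ml_cost(theta, f, Sigma, projectors, states, Q, d):
    """Cost (28): sum_k (f_k - p_k)^T Sigma^{-1} (f_k - p_k),
    with p_k the model probabilities of the parametrized channel chi(theta)."""
    chi = np.eye(d * d, dtype=complex) / d
    for t, q in zip(theta, Q):
        chi += t * q
    J = 0.0
    for k, rho in enumerate(states):
        p_k = np.array([np.trace(chi @ np.kron(Pi, rho.T)).real for Pi in projectors])
        e = f[:, k] - p_k
        J += e @ np.linalg.solve(Sigma, e)
    return float(J)
```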

4.3 A convergent Newton-type algorithm

In Proposition 3, we characterized a general identification paradigm for estimating quantum channels and showed the existence and uniqueness of its solution. Now, we face the problem of (numerically) finding the solution \(\hat{\underline{\theta }}\) minimizing \(J\) over the prescribed set.

Assume that \(\mathcal {S}_{TP}= \mathcal {B}\). Problem (20), with \(\mathcal {C}=\mathcal {A}_+\cap \mathcal {I}\), is equivalent to minimizing \(J\) over \(\mathcal {I}\) with the linear matrix inequality constraint \(\chi (\underline{\theta })\ge 0\). We rewrite the problem by making the inequality constraint implicit in the objective:
$$\begin{aligned} \hat{\underline{\theta }}= \mathop {\mathrm{argmin}}\limits _{\underline{\theta }\in \mathcal {I}}\; J(\underline{\theta })+I_-(\underline{\theta }) \end{aligned}$$
(29)
where \(I_-: \mathbb {R}^{d^4-d^2}\rightarrow \mathbb {R}\cup \{+\infty \}\) is the indicator function of the constraint \(\chi (\underline{\theta })\ge 0\):
$$\begin{aligned} I_-(\underline{\theta }):=\left\{ \begin{array}{ll} 0, & \hbox {if } \chi (\underline{\theta })\ge 0, \\ +\infty , & \hbox {elsewhere.} \\ \end{array} \right. \end{aligned}$$
(30)
The basic idea is to approximate the indicator function \(I_-\) by the convex function
$$\begin{aligned} \hat{I}_-(\underline{\theta }):=-\frac{1}{q}\log \det (\chi (\underline{\theta })) \end{aligned}$$
(31)
where \(q>0\) is a parameter that sets the accuracy of the approximation (the approximation becomes more accurate as \(q\) increases). Then, we consider the approximate problem
$$\begin{aligned} \hat{\underline{\theta }}^q=\mathop {\mathrm{argmin}}\limits _{\underline{\theta }\in \hbox {int} \left( \mathcal {C} \right) } G_q(\underline{\theta }) \end{aligned}$$
(32)
where \(\hbox {int} \left( \mathcal {C} \right) \) denotes the interior of \(\mathcal {C}\) and the convex function
$$\begin{aligned} G_q(\underline{\theta }):=q J(\underline{\theta })-\log \det (\chi (\underline{\theta })). \end{aligned}$$
(33)
The solution \(\hat{\underline{\theta }}^q\) can be computed employing the following Newton algorithm with backtracking stage:
  1. Set the initial condition \(\underline{\theta }_0\in \hbox {int} \left( \mathcal {C} \right) \).

  2. At each iteration, compute the Newton step
    $$\begin{aligned} \Delta \underline{\theta }_l =-H_{\underline{\theta }_l}^{-1} \nabla G_{\underline{\theta }_l}\in \mathbb {R}^{d^4-d^2} \end{aligned}$$
    (34)
    where
    $$\begin{aligned} \left[ \,\nabla G_{\underline{\theta }}\,\right] _s&:= \frac{\partial {G_q(\underline{\theta })}}{\partial {\theta _{s}}}=q\frac{\partial {J(\underline{\theta })}}{\partial {\theta _{s}}}-\mathrm{tr}[\chi (\underline{\theta })^{-1}Q_s]\\ \left[ \,H_{\underline{\theta }}\,\right] _{r,s}&:= \frac{\partial ^2{G_q(\underline{\theta })}}{\partial {\theta _{r}}\partial {\theta _{s}}}=q \frac{\partial ^2{J(\underline{\theta })}}{\partial {\theta _{r}}\partial {\theta _{s}}} +\mathrm{tr}[\chi (\underline{\theta })^{-1}Q_r\chi (\underline{\theta })^{-1}Q_s] \end{aligned}$$
    are the element in position \(s\) of the gradient (understood as a column vector) and the element in position \((r,s)\) of the Hessian of \(G_q\), both computed at \(\underline{\theta }\).

  3. Set \(t^0_l = 1\), and let \(t^{p+1}_l=t^p_l/2\) until both of the following conditions hold:
    $$\begin{aligned}&\underline{\theta }_l + t^p_l \Delta \underline{\theta }_l\in \hbox {int} (\mathcal {C})\\&G_q( \underline{\theta }_l + t^p_l \Delta \underline{\theta }_l )< G_q(\underline{\theta }_l)+\gamma t^p_l \nabla G_{\underline{\theta }_l}^{\mathrm{T}} \Delta \underline{\theta }_l \end{aligned}$$
    where \(\gamma \) is a real constant, \(0<\gamma <\frac{1}{2}\).

  4. Set \(\underline{\theta }_{l+1} = \underline{\theta }_l + t^p_l \Delta \underline{\theta }_l\in \hbox {int} \left( \mathcal {C} \right) \).

  5. Repeat steps 2, 3 and 4 until the condition \(\Vert \nabla G_{\underline{\theta }_l}\Vert < \epsilon \) is satisfied, where \(\epsilon \) is a (small) tolerance threshold; then set \(\hat{\underline{\theta }}^q= \underline{\theta }_l\).

Under additional mild assumptions on \(J\), which typically hold, it is possible to show that the algorithm converges globally: in the first stage it converges linearly, while in the final stage it converges quadratically. To illustrate this fact, in the Appendix we prove the global convergence of the algorithm when the cost function (24) is considered. It is then possible to show [12, p. 597] that
$$\begin{aligned} J(\hat{\underline{\theta }}) \le J(\hat{\underline{\theta }}^q)\le J(\hat{\underline{\theta }})+\frac{d}{q}. \end{aligned}$$
(35)
Hence, \(d/q\) is the accuracy (with respect to \(\hat{\underline{\theta }}\)) of the computed solution \(\hat{\underline{\theta }}^q\). This method, however, works well only when a moderate accuracy is required.
An extension of the previous procedure is given by the Barrier method [12, p. 569] which solves (29) with a specified accuracy \(\xi >0\):
  1. Set the initial conditions \(q_0>0\) and \(\underline{\theta }^{q_0}=\left[ \begin{array}{lll} 0 &\ldots & 0 \end{array} \right] ^{\mathrm{T}}\in \hbox {int} \left( \mathcal {C} \right) \).

  2. Centering step: at the \(k\)-th iteration, compute \(\hat{\underline{\theta }}^{q_k}\in \hbox {int} \left( \mathcal {C} \right) \) by minimizing \(G_{q_k}\) with starting point \(\hat{\underline{\theta }}^{q_{k-1}}\), using the Newton method previously presented.

  3. Set \(q_{k+1}=\mu q_{k}\).

  4. Repeat steps 2 and 3 until the condition \( \frac{d^2}{q_k}< \xi \) is satisfied; then set \(\hat{\underline{\theta }}= \hat{\underline{\theta }}^{q_k}\).

So, at each iteration, we compute \(\hat{\underline{\theta }}^{q_k}\) starting from the previously computed point \(\hat{\underline{\theta }}^{q_{k-1}}\) and then increase \(q_k\) by a factor \(\mu >1\). The choice of the values of the parameters \(q_0\) and \(\mu \) is discussed in [12, p. 574]. Since the Newton method used in the centering step converges globally (see the Appendix), the sequence \(\{\hat{\underline{\theta }}^{q_k}\}_{k\ge 0}\) converges to the unique minimum point \(\hat{\underline{\theta }}\) of \(J\) with accuracy \(\xi \). Moreover, the number of centering steps required to compute \(\hat{\underline{\theta }}\) with accuracy \(\xi \) starting from \(q_0\) is equal to \(\left\lceil \frac{\log \frac{d}{\xi q_0}}{\log \mu } \right\rceil +1\) [12, p. 601].
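A compact sketch of the overall scheme (NumPy assumed; J, grad_J and hess_J are user-supplied callables returning the cost, its gradient and its Hessian, e.g., for the functional of Sect. 4.1; Q is the list of the \(Q_\ell \) from the parametrization sketch; the default values of q0, mu, xi and gamma are purely illustrative) is:

```python
import numpy as np

def newton_barrier(J, grad_J, hess_J, Q, d, q0=1.0, mu=10.0, xi=1e-6,
                   gamma=0.25, eps=1e-8, max_newton=100):
    """Barrier scheme of Sect. 4.3: for increasing q, minimize
    G_q(theta) = q J(theta) - log det chi(theta) by a damped Newton method."""
    n = len(Q)

    def chi_of(theta):
        chi = np.eye(d * d, dtype=complex) / d
        for t, q_l in zip(theta, Q):
            chi += t * q_l
        return chi

    def G(theta, q):
        ev = np.linalg.eigvalsh(chi_of(theta))
        return np.inf if ev.min() <= 0 else q * J(theta) - np.sum(np.log(ev))

    theta = np.zeros(n)                     # theta_0 = 0 lies in int(C): chi = I/d > 0
    q = q0
    while d**2 / q >= xi:                   # outer (barrier) loop
        for _ in range(max_newton):         # centering step: damped Newton iterations
            chi_inv = np.linalg.inv(chi_of(theta))
            CQ = [chi_inv @ q_l for q_l in Q]
            grad = q * grad_J(theta) - np.array([np.trace(M).real for M in CQ])
            if np.linalg.norm(grad) < eps:
                break
            hess = q * hess_J(theta) + np.array(
                [[np.trace(CQ[r] @ CQ[s]).real for s in range(n)] for r in range(n)])
            step = np.linalg.solve(hess, -grad)
            t = 1.0                         # backtracking: stay in int(C), decrease G_q enough
            while G(theta + t * step, q) >= G(theta, q) + gamma * t * grad @ step:
                t /= 2.0
            theta = theta + t * step
        q *= mu
    return theta
```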

4.4 Numerical simulations

We use the following notation:
  • IN method to denote the standard quantum process tomography by inversion of Sect. 3.3.

  • ML method to denote the ML method presented in Sect. 4.1.

Here, we want to compare the performance of the IN and ML methods for the qubit case \(d=2\). Consider a set of randomly generated CPTP maps \(\{\chi _l\}_{l=1}^{100}\) and the minimal setting (16). Once the number of measurements \(N\) for each pair \((\rho _k,\varPi _j)\) is fixed, we consider the following comparison procedure:

  • At the \(l\)-th experiment, let \(\{c_{jk}^l\}\) be the data corresponding to the map \(\chi _l\). Then, compute the corresponding frequencies \(f_{jk}^l=c^l_{jk}/N\).

  • From \(\{f_{jk}^l\}\) compute the estimates \(\hat{\chi }^{IN}_l\) and \(\hat{\chi }^{ML}_l\) using IN and ML method, respectively.

  • Compute the relative errors
    $$\begin{aligned} e_{IN}(l)=\frac{\Vert \hat{\chi }^{IN}_l-\chi _l\Vert }{\Vert \chi _l\Vert },\;e_{ML}(l)=\frac{\Vert \hat{\chi }^{ML}_l-\chi _l\Vert }{\Vert \chi _l\Vert }. \end{aligned}$$
    (36)
  • When the experiments are completed, compute the mean of the relative error
    $$\begin{aligned} \mu _{IN}=\frac{1}{100}\sum _{l=1}^{100} e_{IN}(l),\;\mu _{ML}=\frac{1}{100}\sum _{l=1}^{100} e_{ML}(l). \end{aligned}$$
    (37)
In Fig. 1, the results obtained for different numbers \(N\) of measurements related to \(\{c_{jk}^l\}\) are depicted. The mean error norm of the ML method is smaller than that of the IN method, in particular when \(N\) is small (the typical situation in practice). In addition, more than half of the estimates obtained by the IN method are not positive semidefinite, i.e., not physically acceptable, even when \(N\) is sufficiently large. Finally, we observe that for both methods the mean error decreases as \(N\) grows. This fact confirms their consistency in practice.
Fig. 1

Performance comparison of the IN and ML methods. \(N\) is the total number of measurements for each \((\rho _k,\varPi _j)\), \(\mu \) is the mean relative error introduced in (37), and \(\sharp F\) denotes the number of failures of the IN method, i.e., the number of times the reconstructed \(\chi \) is not positive semidefinite

Let \(\mathcal {T}_{M,L}\) denote the set of experimental settings with \(L\) input states and \(M\) observables satisfying Proposition 1. Accordingly, the set of minimal experimental settings is \(\mathcal {T}_{d^2-1,d^2}\). Here, we consider the case \(d=2\). We want to compare the performance of the minimal settings in \(\mathcal {T}_{3,4}\) with that of settings employing more input states and observables. We shall do so by picking a test channel, finding a minimal setting that performs well, and comparing its performance with that of a nonminimal setting in \(\mathcal {T}_{M,L}\), \(M>3\), \(L\ge 4\), that performs well in its set, while the total number \(N_T\) of trials is kept fixed.

Consider the Kraus map (1) representing a perturbed amplitude damping operation \((\gamma =0.5)\) with
$$\begin{aligned} K_1=\sqrt{0.9}\left[ \begin{array}{cc} 0 & \sqrt{0.5} \\ 0 & 0 \\ \end{array} \right] ,\quad K_2=\sqrt{0.9}\left[ \begin{array}{cc} 1 & 0 \\ 0 & \sqrt{0.5} \\ \end{array} \right] , \end{aligned}$$
\(K_3=\sqrt{0.1}/2 I_2,\,K_j=\sqrt{0.1}/2\sigma _{l(j)},\,j=4,5,6,\,l(j)=x,y,z\) corresponding to the \(\chi \)-representation
$$\begin{aligned} \chi =\left[ \begin{array}{cccc} 0.95 & 0 & 0 & 0.6364\\ 0 & 0.5 & 0 & 0\\ 0 & 0 & 0.05 & 0\\ 0.6364 & 0 & 0 & 0.5 \\ \end{array} \right] . \end{aligned}$$
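As a sanity check, the following snippet verifies trace preservation, \(\sum _i K_i^\dagger K_i=I_2\), for the Kraus operators above and rebuilds the displayed matrix numerically. The reconstruction assumes that, with the ordering used here, the \(\chi \)-representation coincides with the Choi-type matrix \(\sum _{ij}\mathcal {E}(|i\rangle \langle j|)\otimes |i\rangle \langle j|\); this is an assumption on the basis convention fixed earlier in the paper.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# Kraus operators of the perturbed amplitude damping channel (gamma = 0.5)
K = [np.sqrt(0.9) * np.array([[0, np.sqrt(0.5)], [0, 0]], dtype=complex),
     np.sqrt(0.9) * np.array([[1, 0], [0, np.sqrt(0.5)]], dtype=complex),
     np.sqrt(0.1) / 2 * np.eye(2, dtype=complex),
     np.sqrt(0.1) / 2 * sx,
     np.sqrt(0.1) / 2 * sy,
     np.sqrt(0.1) / 2 * sz]

def channel(rho):
    """Apply the map E in Kraus form."""
    return sum(Ki @ rho @ Ki.conj().T for Ki in K)

# Trace preservation: sum_i K_i^dagger K_i must equal the identity.
assert np.allclose(sum(Ki.conj().T @ Ki for Ki in K), np.eye(2))

# Choi-type reconstruction (assumed basis ordering): chi = sum_{ij} E(|i><j|) (x) |i><j|
e = np.eye(2, dtype=complex)
chi = sum(np.kron(channel(np.outer(e[:, i], e[:, j])), np.outer(e[:, i], e[:, j]))
          for i in range(2) for j in range(2))
print(np.round(chi.real, 4))   # reproduces the matrix above; 0.6364 = sqrt(0.405) rounded
```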
We set the total number of trials to \(N_T=3600\). For a fixed set \(\mathcal {T}_{M,L}\), \(M\ge 3\), \(L\ge 4\), we consider the following procedure (a code sketch is given after the list):
  • Set \(N=N_T/(LM)\) and choose a randomly generated collection \(\{\mathrm{T}_m\}_{m=1}^{100}\), \(\mathrm{T}_m\in \mathcal {T}_{M,L}\).

  • Perform 50 experiments for each \(\mathrm{T}_m\). At the \(l\)-th experiment, we obtain sample data \(\{f_{jk}^m(l)\}\) corresponding to \(\chi \) and \(\mathrm{T}_m\). From \(\{f_{jk}^m(l)\}\), compute the estimate \(\hat{\chi }_{m}(l)\) using the ML method and the corresponding relative error \(e_m(l)=\Vert \hat{\chi }_m(l)-\chi \Vert / \Vert \chi \Vert \).

  • When the experiments corresponding to \(\mathrm{T}_m\) are completed, compute the mean error norm \(\mu _m=\frac{1}{50}\sum _{l=1}^{50}e_m(l) \).

  • When \(\mu _m\) is available for \(m=1,\ldots ,100\), compute
    $$\begin{aligned} \bar{\mu }_{L,M}=\min _{m\in \{1,\ldots ,100\}} \mu _m. \end{aligned}$$
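A sketch of this procedure, reusing simulate_frequencies and relative_error from the previous listing (random_setting and estimate_ML are again placeholders for routines described in the text):

```python
import numpy as np

def best_mean_error(L, M, N_T, kraus, chi_true, random_setting, estimate_ML,
                    n_settings=100, n_exp=50, seed=0):
    """bar{mu}_{L,M}: smallest mean relative error over random settings in T_{M,L}
    at a fixed total trial budget N_T (placeholders: random_setting, estimate_ML)."""
    rng = np.random.default_rng(seed)
    N = N_T // (L * M)                           # trials per pair (rho_k, Pi_j)
    mean_errors = []
    for _ in range(n_settings):                  # 100 randomly drawn settings T_m
        rhos, Pis = random_setting(L, M)
        errs = []
        for _ in range(n_exp):                   # 50 experiments per setting
            f = simulate_frequencies(kraus, rhos, Pis, N, rng)
            errs.append(relative_error(estimate_ML(f, rhos, Pis), chi_true))
        mean_errors.append(np.mean(errs))        # mu_m
    return min(mean_errors)                      # bar{mu}_{L,M}
```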
In Fig. 2, \(\bar{\mu }_{L,M}\) is depicted for different values of \(M\) and \(L\). As we can see, increasing the number of input states/observables does not improve the performance index. Analogous results have been observed with other choices of test maps and \(N_T\). Finally, Fig. 3 depicts the true \(\chi \) and the averaged estimate \(\bar{\chi }_{ML}=\frac{1}{50}\sum _{l=1}^{50}\hat{\chi }_m(l)\), with \(m=\arg \min _{m\in \{1,\ldots ,100\}} \mu _m\), for \(M=3\) and \(L=4\).
Fig. 2

\(\bar{\mu }_{L,M}\) for different values of \(L\) and \(M\)

Fig. 3

Real and imaginary parts of \(\chi \) (top) and of the averaged estimate \(\bar{\chi }_{ML}\) (bottom). To improve readability, the vertical scale of the imaginary part has been magnified to show that the errors are below 0.01

5 Conclusions

Standard quantum process tomography, or, in system-theoretic terms, identification of quantum channels, is based on repeated measurements of \(L\) quantum states along \(M\) observables. We characterized the properties and cardinality of the minimal sets that guarantee identifiability for linearly independent states and measurements. Building on these results, we have also addressed the existence, uniqueness, and consistency problems for trace-preserving channel identification techniques based on the minimization of convex cost functionals. Moreover, we presented a Newton-type numerical method for solving the problem. Finally, we performed numerical simulations which indicate that:
  • convex methods are more reliable than the inversion method, at least when the number of measurements is small;

  • experimental settings richer than the minimal one do not improve the quality of the estimate; the critical parameter, as far as the accuracy of the estimation is concerned, is the total size of the data set.

Accordingly, the minimal experimental setting for standard quantum process tomography constitutes a valid, sensible choice, given both the reliability of the solution and the reduced complexity of the experimental apparatus.

Footnotes

  1. These topics could also be studied from an abstract, frame-theoretical viewpoint [14]: however, in order to maintain contact with well-established notations and concepts in quantum information theory, we choose a more direct approach.

  2. If the optimization is constrained to \(\mathcal{A}_+\cap \mathcal{I}\), we are guaranteed that \(f_{jk}\) will tend to be positive for a sufficiently large number of trials.


Acknowledgments

The authors would like to thank Alberto Dall’Arche, Andrea Tomaello, Prof. Paolo Villoresi and Dr. Giuseppe Vallone for stimulating discussions on the topics of this paper. Work partially supported by the QFuture research grant of the University of Padova, and by the Department of Information Engineering research project “QUINTET.”

References

  1. Aiello, A., Puentes, G., Voigt, D., Woerdman, J.P.: Maximum-likelihood estimation of Mueller matrices. Opt. Lett. 31(6), 817–819 (2006)
  2. Alicki, R., Lendi, K.: Quantum Dynamical Semigroups and Applications. Springer, Berlin (1987)
  3. Altafini, C.: Feedback stabilization of isospectral control systems on complex flag manifolds: application to quantum ensembles. IEEE Trans. Autom. Control 52(11), 2019–2028 (2007)
  4. Altafini, C., Ticozzi, F.: Modeling and control of quantum systems: an introduction. IEEE Trans. Autom. Control 57(8), 1898–1917 (2012)
  5. Belavkin, V.P.: Towards the theory of control in observable quantum systems. Autom. Remote Control 44, 178–188 (1983)
  6. Benenti, G., Strini, G.: Simple representation of quantum process tomography. Phys. Rev. A 80(2), 022318 (2009)
  7. Bhatia, R.: Matrix Analysis. Springer, New York (1997)
  8. Bisio, A., Chiribella, G., D’Ariano, G.M., Facchini, S., Perinotti, P.: Optimal quantum tomography of states, measurements, and transformations. Phys. Rev. Lett. 102, 010404 (2009)
  9. Bongioanni, I., Sansoni, L., Sciarrino, F., Vallone, G.: Experimental quantum process tomography of non-trace-preserving maps. Phys. Rev. A 82(4), 042307 (2010)
  10. Boulant, N., Havel, T.F., Pravia, M.A., Cory, D.G.: Robust method for estimating the Lindblad operators of a dissipative quantum process from measurements of the density operator at multiple time points. Phys. Rev. A 67(4), 042322 (2003)
  11. Bouwmeester, D., Ekert, A., Zeilinger, A. (eds.): The Physics of Quantum Information: Quantum Cryptography, Quantum Teleportation, Quantum Computation. Springer, Berlin (2000)
  12. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
  13. Busch, P.: Informationally complete sets of physical quantities. Int. J. Theor. Phys. 30(9), 1217–1227 (1991)
  14. Casazza, P.: The art of frame theory. Taiwan. J. Math. 4(2), 129–201 (2000)
  15. Choi, M.D.: Completely positive linear maps on complex matrices. Linear Algebra Appl. 10, 285–290 (1975)
  16. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, New York (1991)
  17. Dahleh, M., Pierce, A., Rabitz, H., Pierce, A.: Control of molecular motion. Proc. IEEE 84, 6–15 (1996)
  18. D’Alessandro, D.: Introduction to Quantum Control and Dynamics. Applied Mathematics & Nonlinear Science. Chapman & Hall/CRC, London (2007)
  19. D’Alessandro, D., Dahleh, M.: Optimal control of two level quantum systems. IEEE Trans. Autom. Control 46(6), 866–876 (2001)
  20. D’Ariano, G.M., Lo Presti, P.: Quantum tomography for measuring experimentally the matrix elements of an arbitrary quantum operation. Phys. Rev. Lett. 86, 4195–4198 (2001)
  21. D’Ariano, G.M., Maccone, L., Paris, M.G.A.: Quorum of observables for universal quantum estimation. J. Phys. A: Math. Gen. 34(1), 93 (2001)
  22. Doherty, A., Doyle, J., Mabuchi, H., Jacobs, K., Habib, S.: Robust control in the quantum domain. Proc. IEEE Conf. Decis. Control 1, 949–954 (2000)
  23. Fiurášek, J., Hradil, Z.: Maximum-likelihood estimation of quantum processes. Phys. Rev. A 63(2), 020101 (2001). doi:10.1103/PhysRevA.63.020101
  24. Holevo, A.: Statistical Structure of Quantum Theory. Lecture Notes in Physics Monographs, vol. 67. Springer, Berlin (2001)
  25. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, New York (1990)
  26. James, D.F.V., Kwiat, P.G., Munro, W.J., White, A.G.: Measurement of qubits. Phys. Rev. A 64, 052312 (2001)
  27. James, M., Nurdin, H., Petersen, I.: \({H}^{\infty }\) control of linear quantum stochastic systems. IEEE Trans. Autom. Control 53(8), 1787–1803 (2008)
  28. Khaneja, N., Brockett, R., Glaser, S.: Time optimal control of spin systems. Phys. Rev. A 63, 032308 (2001)
  29. Kosmol, P.: Optimierung und Approximation. de Gruyter, Berlin (1991)
  30. Kraus, K.: States, Effects, and Operations: Fundamental Notions of Quantum Theory. Lecture Notes in Physics. Springer, Berlin (1983)
  31. Mohseni, M., Rezakhani, A.T., Lidar, D.A.: Quantum-process tomography: resource analysis of different strategies. Phys. Rev. A 77(3), 032322 (2008)
  32. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2002)
  33. Nurdin, H., James, M., Petersen, I.: Coherent quantum LQG control. Automatica 45, 1837–1846 (2009)
  34. Paris, M.G.A., Řeháček, J. (eds.): Quantum State Estimation. Lecture Notes in Physics, vol. 649. Springer, Berlin (2004)
  35. Petz, D.: Quantum Information Theory and Quantum Statistics. Springer, Berlin (2008)
  36. Řeháček, J., Englert, B.G., Kaszlikowski, D.: Minimal qubit tomography. Phys. Rev. A 70(5), 052321 (2004)
  37. Sacchi, M.F.: Maximum-likelihood reconstruction of completely positive maps. Phys. Rev. A 63(5), 054104 (2001). doi:10.1103/PhysRevA.63.054104
  38. Scott, A.J.: Optimizing quantum process tomography with unitary 2-designs. J. Phys. A: Math. Theor. 41, 055308 (2008)
  39. Ticozzi, F., Nishio, K., Altafini, C.: Stabilization of stochastic quantum dynamics via open- and closed-loop control. IEEE Trans. Autom. Control 58(1), 74–85 (2013)
  40. Ticozzi, F., Viola, L.: Analysis and synthesis of attractive quantum Markovian dynamics. Automatica 45, 2002–2009 (2009)
  41. van Handel, R., Stockton, J.K., Mabuchi, H.: Feedback control of quantum state reduction. IEEE Trans. Autom. Control 50(6), 768–780 (2005)
  42. Villoresi, P., Jennewein, T., Tamburini, F., Aspelmeyer, M., Bonato, C., Ursin, R., Pernechele, C., Luceri, V., Bianco, G., Zeilinger, A., Barbieri, C.: Experimental verification of the feasibility of a quantum channel between space and Earth. New J. Phys. 10, 033038 (2008)
  43. Wiseman, H.M., Milburn, G.J.: Quantum Measurement and Control. Cambridge University Press, Cambridge (2009)
  44. Ziman, M.: Incomplete quantum process tomography and principle of maximal entropy. Phys. Rev. A 78, 032118 (2008)
  45. Ziman, M., Plesch, M., Bužek, V., Štelmachovič, P.: Process reconstruction: from unphysical to physical maps via maximum likelihood. Phys. Rev. A 72(2), 022106 (2005)

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Mattia Zorzi (1)
  • Francesco Ticozzi (2)
  • Augusto Ferrante (2)

  1. Department of Electrical Engineering and Computer Science, University of Liège, Liège, Belgium
  2. Dipartimento di Ingegneria dell’Informazione, Università di Padova, Padua, Italy
