Classical shadows and POVMs

On several occasions, it's been suggested to me that the papers

Reconstructing quantum states with generative models (Carrasquilla et al.)
Predicting many properties of a quantum system from very few measurements (Huang et al.)

are very closely related, but that's far from obvious on the surface. Sure, they both describe methods to obtain a density operator from measurement data, but the former uses POVMs and a generative model, whereas the latter inverts a quantum channel. Those definitely sound at least a little bit different!

In their Supplementary Information, Huang et al. mention that

This classical shadow $\hat{\rho}$ corresponds to the linear inversion (or least squares) estimator of $\rho$ in the single-shot limit.

As it turns out, the method of Carrasquilla et al. is also a special case of linear least squares. Despite their outward appearance, the two approaches are the same under the hood!

Hermitian operators can't hide from least squares

Suppose there's a finite-dimensional density operator $\hat{\rho}$ to which, by some cruel twist of fate, we have access only through $$p_i = \Tr \hat{E}_i \hat{\rho}.$$ Can we reconstruct $\hat{\rho}$ from $\{ p_i \}$? The answer turns out to be "it depends on $\{ \hat{E}_i \}$", but first, let's tackle the slightly more general problem of reconstructing a finite-dimensional Hermitian operator $\hat{A}$ from $$y_i = \Tr \hat{E}_i \hat{A}$$ with Hermitian $\hat{E}_i$. Despite acting on a complex Hilbert space, Hermitian operators (for which $\hat{A}\adj = \hat{A}$) form a real inner product space. The inner product is given by $\Tr \hat{A} \hat{B}$, and we can choose an orthonormal basis $\{ \hat{F}_j \}$, so the basis expansion of $\hat{A}$ is $$\hat{A} = \sum_j a_j \hat{F}_j,$$ with the unique expansion coefficients $a_j = \Tr[\hat{F}_j \hat{A}]$.
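
Just to convince ourselves that this structure is really there, here's a quick single-qubit sanity check (my own sketch, not taken from either paper): the normalized Pauli matrices form an orthonormal basis of the Hermitian operators under the trace inner product, and any Hermitian matrix is recovered from its trace coefficients.

using LinearAlgebra

# Normalized Pauli basis {F_j}, orthonormal under the trace inner product.
id = [1 0; 0 1]; sx = [0 1; 1 0]; sy = [0 -im; im 0]; sz = [1 0; 0 -1]
Fs = [m ./ sqrt(2) for m in (id, sx, sy, sz)]

# The Gram matrix Tr[F_j F_k] should be the identity.
gram = [real(tr(Fj * Fk)) for Fj in Fs, Fk in Fs]
maximum(abs, gram - I)                              # ≈ 0

# Any Hermitian A is recovered from its coefficients a_j = Tr[F_j A].
X = rand(ComplexF64, 2, 2); A = X + X'
a = [real(tr(Fj * A)) for Fj in Fs]
maximum(abs, sum(a .* Fs) - A)                      # ≈ 0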

If we let $\{ \hat{F}_j \}$ be the standard basis in which to express matrices and vectors, we can write $$\vec{y} = \vec{E} \vec{a},$$ where the matrix entries $$E_{i j} = \Tr \hat{F}_j \hat{E}_i$$ are the expansion coefficients of $\hat{E}_i$. This is clearly a system of linear equations, and the usual considerations apply. Specifically, we can expect a unique solution iff the rank of $\vec{E}$ is equal to the dimension of the vector space, which happens when $\{ \hat{E}_i \}$ is a spanning set. In that case, $\vec{E}\trans \vec{E}$ is invertible and we find the well-known equation $$\vec{a} = (\vec{E}\trans \vec{E})\inv \vec{E}\trans \vec{y}.$$ For no reason whatsoever, we can generalize it a bit by inserting any positive definite matrix $\vec{\Lambda}$: $$\vec{a} = (\vec{E}\trans \vec{\Lambda} \vec{E})\inv \vec{E}\trans \vec{\Lambda} \vec{y}.$$ Thus, $$\hat{A} = \sum_j \left[ (\vec{E}\trans \vec{\Lambda} \vec{E})\inv \vec{E}\trans \vec{\Lambda} \vec{y} \right]_j \hat{F}_j$$ is the operator we'd like to recover from $\vec{y}$. Note that $(\vec{E}\trans \vec{\Lambda} \vec{E})\inv \vec{E}\trans \vec{\Lambda}$ is a left inverse of $\vec{E}$.
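
Numerically, that looks something like this (again my own sketch, with made-up names; the spanning set is just six random Hermitian matrices rather than any particular POVM):

using LinearAlgebra, Random
Random.seed!(0)

randherm() = (X = rand(ComplexF64, 2, 2); X + X')   # a random 2×2 Hermitian matrix

# Orthonormal basis {F_j} and a generic spanning set {E_i} of six Hermitian matrices.
id = [1 0; 0 1]; sx = [0 1; 1 0]; sy = [0 -im; im 0]; sz = [1 0; 0 -1]
Fs = [m ./ sqrt(2) for m in (id, sx, sy, sz)]
Es = [randherm() for _ in 1:6]

# E_{ij} = Tr[F_j E_i]; (EᵀE)⁻¹Eᵀ is a left inverse of E when {E_i} spans the space.
Emat = [real(tr(Fj * Ei)) for Ei in Es, Fj in Fs]
left_inv = inv(Emat' * Emat) * Emat'
maximum(abs, left_inv * Emat - I)                   # ≈ 0

# Exact recovery of a Hermitian A from y_i = Tr[E_i A].
A = randherm()
y = [real(tr(Ei * A)) for Ei in Es]
maximum(abs, sum((left_inv * y) .* Fs) - A)         # ≈ 0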

The above assumes that $\vec{y}$ is known exactly, but what if we only have noisy data? Suppose that $$\vec{\tilde{y}} = \vec{E} \vec{a} + \vecg{\varepsilon},$$ with a random perturbation $\vecg{\varepsilon}$. Knowing only $\vec{\tilde{y}}$, the best we can hope for is to find a vector $\vec{\tilde{a}}$ that minimizes the sum of squared residuals $\sum_i | \tilde{y}_i - (\vec{E} \vec{\tilde{a}})_i |^2$, because the system will generally be inconsistent. If we assume that all components of the perturbation are uncorrelated, with zero mean and identical variance, the optimal solution turns out to be given by the aptly-named least squares method: $$\vec{\tilde{a}} = (\vec{E}\trans \vec{E})\inv \vec{E}\trans \vec{\tilde{y}}.$$ If we allow each $\varepsilon_i$ to have a different variance $\sigma_i^2 > 0$, then it's better to use the more general weighted least squares: $$\vec{\tilde{a}} = (\vec{E}\trans \vec{\Lambda} \vec{E})\inv \vec{E}\trans \vec{\Lambda} \vec{\tilde{y}},$$ where $\vec{\Lambda}$ is a diagonal matrix with the entries $\Lambda_{i i} = \lambda_i = 1 / \sigma_i^2$, which weights the observations according to their relative uncertainties. Thus, we can approximate the operator we'd like to recover from $\vec{\tilde{y}}$: $$\hat{A} \approx \sum_j \left[ (\vec{E}\trans \vec{\Lambda} \vec{E})\inv \vec{E}\trans \vec{\Lambda} \vec{\tilde{y}} \right]_j \hat{F}_j.$$ It's not hard to see that averaging over the noise gives $\hat{A}$ exactly.
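
Here's a small sketch of the noisy case (hypothetical numbers and names, purely for illustration): perturb the observations with independent Gaussian noise of different variances, apply the weighted least squares formula, and note that averaging over many noisy draws converges back to the true coefficients.

using LinearAlgebra, Random, Statistics
Random.seed!(0)

E = randn(6, 4)                        # a generic overdetermined "design" matrix
a = randn(4)                           # true expansion coefficients
sig = [0.1, 0.2, 0.3, 0.1, 0.2, 0.3]   # per-observation noise standard deviations
L = Diagonal(1 ./ sig .^ 2)            # weight matrix Λ with λ_i = 1/σ_i²

wls(y) = inv(E' * L * E) * E' * L * y

# One noisy draw gives an approximation; averaging over the noise recovers a.
single = wls(E * a + sig .* randn(6))
averaged = mean(wls(E * a + sig .* randn(6)) for _ in 1:100_000)
norm(single - a), norm(averaged - a)   # the second should be much smaller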

Because $$\hat{F}_j = \sum_i \left[ (\vec{E}\trans \vec{\Lambda} \vec{E})\inv \vec{E}\trans \vec{\Lambda} \right]_{j i} \hat{E}_i,$$ we can avoid explicitly mentioning the basis $\{ \hat{F}_j \}$ and work exclusively with $\{ \hat{E}_i \}$. Performing the substitution gives us $$\hat{A} = \sum_i \left[ \vec{\Lambda} \vec{E} (\vec{E}\trans \vec{\Lambda} \vec{E})\invsq \vec{E}\trans \vec{\Lambda} \vec{y} \right]_i \hat{E}_i$$ and $$\hat{A} \approx \sum_i \left[ \vec{\Lambda} \vec{E} (\vec{E}\trans \vec{\Lambda} \vec{E})\invsq \vec{E}\trans \vec{\Lambda} \vec{\tilde{y}} \right]_i \hat{E}_i.$$ Building the matrix $\vec{E}$ directly from inner products of $\hat{E}_i$ with $\hat{F}_j$ also requires us to consider a specific basis $\{ \hat{F}_j \}$, but we can choose the basis implicitly by decomposing the matrix of mutual overlaps of $\{ \hat{E}_i \}$, with the entries $$S_{i k} = \Tr \hat{E}_i \hat{E}_k,$$ as $\vec{S} = \vec{E} \vec{E}\trans$.

State reconstruction with POVMs is least squares

How does this apply to the reconstruction of $\hat{\rho}$? Because density operators are Hermitian, all of the above is relevant for them as well, and we can obtain $$\hat{\rho} = \sum_j r_j \hat{F}_j$$ from $$p_i = \Tr \hat{E}_i \hat{\rho},$$ which we can write more compactly as $\vec{p} = \vec{E} \vec{r}$. This is particularly relevant when the $\hat{E}_i$ are the elements of a POVM, which must be positive semidefinite and sum to the identity: $$\sum_i \hat{E}_i = \hat{1};$$ the former guarantees that $0 \le p_i$ and the latter that $$\sum_i p_i = \Tr \hat{\rho} = 1.$$ If we have some apparatus that measures the state $\hat{\rho}$ using this POVM, it will give the outcome label $i$ with probability $p_i$. If we run this measurement process long enough, we might get enough data to approximate $\vec{p}$ with $\vec{\tilde{p}}$ and find $$\hat{\rho} \approx \sum_i \left[ \vec{\Lambda} \vec{E} (\vec{E}\trans \vec{\Lambda} \vec{E})\invsq \vec{E}\trans \vec{\Lambda} \vec{\tilde{p}} \right]_i \hat{E}_i,$$ where the right-hand side isn't guaranteed to be a legitimate density operator on account of the noise. Recall that $\vec{\Lambda}$ should either contain the reciprocals of the variances of the random perturbations on the diagonal, or else be any positive definite matrix in the noiseless case.

This inversion of $\vec{E}$ is the approach taken by Carrasquilla et al., but they use the simpler expression $$\hat{\rho} \approx \sum_i (\vec{S}\inv \vec{\tilde{p}})_i \hat{E}_i,$$ which requires the inverse of the overlap matrix $\vec{S}$. Because $\vec{S}$ is only invertible when $\{ \hat{E}_i \}$ is a basis, this works for the tetrahedral and Pauli-4 POVMs, but not for the Pauli-6 POVM, which has more than 4 elements. When $\vec{S}\inv$ exists, $\vec{E}\inv$ does too, so it's straightforward to show that for any invertible $\vec{\Lambda}$, we can simplify the awful formula we derived earlier all the way down to a single matrix: $$\vec{\Lambda} \vec{E} (\vec{E}\trans \vec{\Lambda} \vec{E})\invsq \vec{E}\trans \vec{\Lambda} = \vec{\Lambda} \vec{E} \vec{E}\inv \vec{\Lambda}\inv \vec{E}\invtrans \vec{E}\inv \vec{\Lambda}\inv \vec{E}\invtrans \vec{E}\trans \vec{\Lambda} = \vec{E}\invtrans \vec{E}\inv = \vec{S}\inv.$$
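
The collapse is easy to confirm numerically (a throwaway sketch with made-up names): for an invertible $\vec{E}$ and a random positive definite $\vec{\Lambda}$, the long expression and $\vec{S}\inv$ agree, and the choice of $\vec{\Lambda}$ doesn't matter.

using LinearAlgebra, Random
Random.seed!(0)

E = randn(4, 4)                  # square and invertible: {E_i} is a basis
S = E * E'                       # overlap matrix S = EEᵀ
X = randn(4, 4); L = X' * X + I  # a random positive definite Λ

big = L * E * inv(E' * L * E)^2 * E' * L
maximum(abs, big - inv(S))       # ≈ 0, for any such Λ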

As an aside, Carrasquilla et al. also point out that it's not strictly necessary to reconstruct $\hat{\rho}$ if it will only be used to compute an expectation value of a Hermitian operator, because $$\langle \hat{A} \rangle = \Tr \hat{\rho} \hat{A} = \Tr \hat{\rho} \sum_{i k} (\vec{S}\inv)_{i k} \Tr[\hat{E}_k \hat{A}] \hat{E}_i = \sum_{i k} p_i (\vec{S}\inv)_{i k} \Tr[\hat{E}_k \hat{A}].$$ If we don't limit ourselves to invertible $\vec{S}$, we can write that slightly more generally as $$\langle \hat{A} \rangle = \Tr \hat{\rho} \hat{A} = \sum_{i j k} \Tr[\hat{E}_i \hat{\rho}] \left[ \vec{E} (\vec{E}\trans \vec{E})\inv \right]_{i j} \left[ (\vec{E}\trans \vec{E})\inv \vec{E}\trans \right]_{j k} \Tr[\hat{E}_k \hat{A}],$$ which has a rather nice symmetry to it.
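
As a quick check of this shortcut (my own sketch, using the same single-qubit tetrahedral POVM that shows up in the experiments below), the expectation value computed from the probabilities and $\vec{S}\inv$ matches $\Tr \hat{\rho} \hat{A}$ directly:

using LinearAlgebra, Random
Random.seed!(0)

# Single-qubit tetrahedral POVM and its overlap matrix.
id = [1 0; 0 1]; sx = [0 1; 1 0]; sy = [0 -im; im 0]; sz = [1 0; 0 -1]
tet = [[0, 0, 1], [2sqrt(2)/3, 0, -1/3],
       [-sqrt(2)/3, sqrt(2/3), -1/3], [-sqrt(2)/3, -sqrt(2/3), -1/3]]
Es = [(id + x * sx + y * sy + z * sz) / 4 for (x, y, z) in tet]
S = [real(tr(E1 * E2)) for E1 in Es, E2 in Es]

# A random state ρ and a random observable A.
X = rand(ComplexF64, 2, 2); rho = X * X'; rho /= tr(rho)
Y = rand(ComplexF64, 2, 2); A = Y + Y'

p = [real(tr(E * rho)) for E in Es]    # measurement probabilities p_i
a = [real(tr(E * A)) for E in Es]      # Tr[E_k A]
p' * inv(S) * a - real(tr(rho * A))    # ≈ 0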

Classical shadows are least squares

On the other hand, Huang et al. start from a probability distribution over unitary operators and measurement outcomes, and define a quantum channel $\mathcal{M}$ that maps the density operator $\hat{\rho}$ to $$\mathcal{M}(\hat{\rho}) = \sum_{k \sigma} \pi_k p_k(\sigma) \hat{U}_k\adj \dyad{\sigma} \hat{U}_k,$$ where $\pi_k$ is the probability of choosing $\hat{U}_k$, which we'll take to be discrete, and $$p_k(\sigma) = \Tr \hat{U}_k \hat{\rho} \hat{U}_k\adj \dyad{\sigma}$$ is the probability of getting the outcome $\sigma$ when measuring $\hat{\rho}$ in the corresponding basis. The elements $$\hat{E}_{k \sigma} = \pi_k \hat{U}_k\adj \dyad{\sigma} \hat{U}_k$$ form a straightforward POVM, and we can write the action of the channel as $$\mathcal{M}(\hat{\rho}) = \sum_{k \sigma} \frac{1}{\pi_k} \Tr[\hat{E}_{k \sigma} \hat{\rho}] \hat{E}_{k \sigma}.$$ We'll make this a bit more generic by writing $$\mathcal{M}(\hat{\rho}) = \sum_i \lambda_i \Tr[\hat{E}_i \hat{\rho}] \hat{E}_i,$$ with $\lambda_i > 0$. As before, we'll take for granted that $\{ \hat{E}_i \}$ is a spanning set, so that $\vec{E}$ has a left inverse.
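
Here's a single-qubit sketch of this construction (my own code, with made-up names): measuring in a uniformly random Pauli basis gives exactly the Pauli-6 POVM, and the resulting channel acts as the depolarizing channel whose inverse shows up again further down.

using LinearAlgebra, Random
Random.seed!(0)

# Measure in the X, Y, or Z eigenbasis, each chosen with probability π_k = 1/3.
id = [1 0; 0 1]; sx = [0 1; 1 0]; sy = [0 -im; im 0]; sz = [1 0; 0 -1]
Us = [Matrix(eigvecs(s)') for s in (sx, sy, sz)]    # U_k rotates the k-th basis to the computational one
pis = fill(1/3, 3)
basis = [[1.0, 0.0], [0.0, 1.0]]

# POVM elements E_{kσ} = π_k U_k† |σ⟩⟨σ| U_k; they sum to the identity.
Es = [pk * Uk' * (v * v') * Uk for (pk, Uk) in zip(pis, Us) for v in basis]
maximum(abs, sum(Es) - id)                          # ≈ 0

# M(ρ) = Σ (1/π_k) Tr[E_{kσ} ρ] E_{kσ} acts as the depolarizing channel ρ ↦ (ρ + Tr[ρ] 1)/3.
X = rand(ComplexF64, 2, 2); rho = X * X'; rho /= tr(rho)
Mrho = sum(3 * real(tr(E * rho)) * E for E in Es)
maximum(abs, Mrho - (rho + tr(rho) * id) / 3)       # ≈ 0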

If we apply this channel (which is linear) to a basis state $\hat{F}_j$, we get $$\mathcal{M}(\hat{F}_j) = \sum_i \lambda_i \Tr[\hat{E}_i \hat{F}_j] \hat{E}_i = \sum_i \lambda_i E_{i j} \hat{E}_i = \sum_k (\vec{E}\trans \vec{\Lambda} \vec{E})_{j k} \hat{F}_k,$$ where we've put all the $\lambda_i$ on the diagonal of $\vec{\Lambda}$. Naturally, the inverse channel (which should also be a linear map) must give us back the basis state: $$\hat{F}_j = \sum_k (\vec{E}\trans \vec{\Lambda} \vec{E})_{j k} \mathcal{M}\inv(\hat{F}_k) = \sum_{k \ell} (\vec{E}\trans \vec{\Lambda} \vec{E})_{j k} \mathcal{W}_{k \ell} \hat{F}_\ell = \sum_\ell \delta_{j \ell} \hat{F}_\ell,$$ where the $$\mathcal{W}_{k \ell} = \Tr \hat{F}_\ell \mathcal{M}\inv(\hat{F}_k)$$ are unique on account of being the basis expansion coefficients of $\mathcal{M}\inv(\hat{F}_k)$. Hence, $$\mathcal{W}_{k \ell} = \left[ (\vec{E}\trans \vec{\Lambda} \vec{E})\inv \right]_{k \ell},$$ so we must have $$\mathcal{M}\inv(\hat{F}_k) = \sum_\ell \left[ (\vec{E}\trans \vec{\Lambda} \vec{E})\inv \right]_{k \ell} \hat{F}_\ell.$$
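
To tie this back to the matrices, here's one more single-qubit sketch (made-up names again, using the Pauli-6 POVM and the normalized Pauli basis): expanding $\mathcal{M}(\hat{F}_j)$ in the basis reproduces $\vec{E}\trans \vec{\Lambda} \vec{E}$, and building $\mathcal{M}\inv$ from its inverse really does undo the channel.

using LinearAlgebra

# Pauli-6 POVM elements and the normalized Pauli basis {F_j}.
id = [1 0; 0 1]; sx = [0 1; 1 0]; sy = [0 -im; im 0]; sz = [1 0; 0 -1]
Fs = [m ./ sqrt(2) for m in (id, sx, sy, sz)]
Es = [v * v' / 3 for s in (sx, sy, sz) for v in eachcol(eigvecs(s))]
lam = fill(3.0, length(Es))                         # λ_i = 1/π_k = 3

M(A) = sum(lam[i] * tr(Es[i] * A) * Es[i] for i in eachindex(Es))

# Expanding M(F_j) in {F_k} reproduces EᵀΛE.
Emat = [real(tr(Fj * Ei)) for Ei in Es, Fj in Fs]
M_rep = [real(tr(Fk * M(Fj))) for Fj in Fs, Fk in Fs]
maximum(abs, M_rep - Emat' * Diagonal(lam) * Emat)  # ≈ 0

# The rows of W = (EᵀΛE)⁻¹ are the coefficients of M⁻¹(F_k), so M⁻¹ ∘ M is the identity map.
W = inv(Emat' * Diagonal(lam) * Emat)
Minv(A) = sum(([real(tr(Fk * A)) for Fk in Fs]' * W)[l] * Fs[l] for l in 1:4)
maximum(maximum.(abs, [Minv(M(F)) - F for F in Fs]))  # ≈ 0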

In order to reconstruct $\hat{\rho}$, the inverse mapping should be averaged over the measurement outcomes: $$\hat{\rho} = \sum_{k \sigma} \pi_k p_k(\sigma) \mathcal{M}\inv(\hat{U}_k\adj \dyad{\sigma} \hat{U}_k),$$ or in more general terms: $$\hat{\rho} = \sum_i \lambda_i \Tr[\hat{E}_i \hat{\rho}] \mathcal{M}\inv(\hat{E}_i).$$ This can be expanded into $$\hat{\rho} = \sum_j \left[ (\vec{E}\trans \vec{\Lambda} \vec{E})\inv \vec{E}\trans \vec{\Lambda} \vec{p} \right]_j \hat{F}_j,$$ which we've encountered before. In particular, when $\vec{\tilde{p}}$ is an approximation to $\vec{p}$, the expansion turns into the weighted least squares estimator: $$\hat{\rho} \approx \sum_j \left[ (\vec{E}\trans \vec{\Lambda} \vec{E})\inv \vec{E}\trans \vec{\Lambda} \vec{\tilde{p}} \right]_j \hat{F}_j.$$ This time, each weight is given by the reciprocal of the probability $\pi_k$ of choosing the corresponding unitary $\hat{U}_k$.

Everything is least squares

Once these methods are expressed using the same language, it becomes quite clear that Carrasquilla et al. and Huang et al. are approximating the density operator $\hat{\rho}$ from measurement probabilities using the standard linear least squares procedure, but each with their own restrictions. In the former, only POVMs with invertible overlap matrices are supported, while the latter only considers POVMs with rank-1 projectors onto orthonormal basis states. Both are approximating $\hat{\rho}$ from noisy probabilities $\vec{\tilde{p}}$ as $$\hat{\rho} \approx \sum_j \left[ (\vec{E}\trans \vec{\Lambda} \vec{E})\inv \vec{E}\trans \vec{\Lambda} \vec{\tilde{p}} \right]_j \hat{F}_j$$ or, equivalently, $$\hat{\rho} \approx \sum_i \left[ \vec{\Lambda} \vec{E} (\vec{E}\trans \vec{\Lambda} \vec{E})\invsq \vec{E}\trans \vec{\Lambda} \vec{\tilde{p}} \right]_i \hat{E}_i.$$

To wrap up, we can do some quick numerical experiments with the above expressions, using Julia with StatsBase.jl. First, some setup:

julia> VERSION
v"1.7.3"
julia> using LinearAlgebra, Random; Random.seed!(1);
julia> using StatsBase; # v0.33.13
julia> fidelity(x, y) = real(tr(sqrt(sqrt(x) * y * sqrt(x))));
julia> id = [1 0; 0 1]; sx = [0 1; 1 0];
julia> sy = [0 -im; im 0]; sz = [1 0; 0 -1];
julia> rho_vs = [rand(ComplexF64, 4) for _ in 1:8];
julia> rho = sum(v * v' for v in rho_vs); rho /= tr(rho);
julia> real(rho)
4×4 Matrix{Float64}:
 0.222127  0.216931  0.163605  0.166542
 0.216931  0.276303  0.212351  0.205966
 0.163605  0.212351  0.219397  0.17293
 0.166542  0.205966  0.17293   0.282173
julia> imag(rho)
4×4 Matrix{Float64}:
  0.0          0.00321748  0.0451109  -0.0720395
 -0.00321748   0.0         0.0469695  -0.0950539
 -0.0451109   -0.0469695   0.0        -0.106679
  0.0720395    0.0950539   0.106679    0.0

We'll be trying to reconstruct this random 2-qubit density operator $\hat{\rho}$ (rho).

The tetrahedral POVM for a single qubit, $\{ \hat{E}_i \}$ (Es), has an invertible overlap matrix $\vec{S}$ (S):

julia> tet = [[0, 0, 1],
              [2sqrt(2)/3, 0, -1/3],
              [-sqrt(2)/3, sqrt(2/3), -1/3],
              [-sqrt(2)/3, -sqrt(2/3), -1/3]];
julia> bloch(x, y, z) = (id + x * sx + y * sy + z * sz) ./ 2;
julia> Es = [bloch(xyz...) ./ 2 for xyz in tet];
julia> S = hcat([[real(tr(E1 * E2)) for E1 in Es] for E2 in Es]...)
4×4 Matrix{Float64}:
 0.25       0.0833333  0.0833333  0.0833333
 0.0833333  0.25       0.0833333  0.0833333
 0.0833333  0.0833333  0.25       0.0833333
 0.0833333  0.0833333  0.0833333  0.25
julia> rank(S)
4

We can easily decompose it into $\vec{E} \vec{E}\trans$ (Emat * Emat'), and then construct $\vec{S}\inv$ (inv_mat) the roundabout way:

julia> eig = eigen(S);
julia> Emat = eig.vectors * diagm(sqrt.(eig.values));
julia> maximum(abs, Emat * Emat' .- S)
4.996003610813204e-16
julia> inv_mat = Emat * inv(Emat' * Emat)^2 * Emat';
julia> maximum(abs, inv(S) .- inv_mat)
1.0658141036401503e-14

For the reconstruction, we'll need the 16-element tensor product POVM, $\{ \hat{E}_i \otimes \hat{E}_k \}$ (Es2):

julia> Es2 = [kron(E1, E2) for E1 in Es for E2 in Es];
julia> length(Es2)
16

Using the exact probability vector $\vec{p}$ (ps_exact), we can get the exact density matrix:

julia> ps_exact = [real(tr(E12 * rho)) for E12 in Es2];
julia> sum(ps_exact) - 1.0
0.0
julia> rho_exact = sum(ps_exact .* kron(inv_mat, inv_mat) * Es2);
julia> maximum(abs, rho_exact .- rho)
4.130557004330886e-16
julia> fidelity(rho, rho_exact)
1.0000000000000013

If we draw a random sample using the exact probabilities, we can get fairly close:

julia> samples = sample(1:length(ps_exact), Weights(ps_exact), 100_000);
julia> ps_approx = counts(samples, length(ps_exact)) / length(samples);
julia> sum(ps_approx) - 1.0
0.0
julia> maximum(abs, ps_exact .- ps_approx)
0.00214799137076567
julia> rho_approx = sum(ps_approx .* kron(inv_mat, inv_mat) * Es2);
julia> maximum(abs, rho_approx .- rho)
0.009381602086482116
julia> fidelity(rho, rho_approx)
0.9987720679289586

We'll use the same setup for the Pauli-6 POVM, but this time the overlap matrix $\vec{S}$ (S) won't be invertible:

julia> p6 = hcat([eigvecs(sx), eigvecs(sy), eigvecs(sz)]...);
julia> Es = [v * v' ./ 3 for v in eachcol(p6)];
julia> S = hcat([[real(tr(E1 * E2)) for E1 in Es] for E2 in Es]...)
6×6 Matrix{Float64}:
 0.111111   0.0        0.0555556  0.0555556  0.0555556  0.0555556
 0.0        0.111111   0.0555556  0.0555556  0.0555556  0.0555556
 0.0555556  0.0555556  0.111111   0.0        0.0555556  0.0555556
 0.0555556  0.0555556  0.0        0.111111   0.0555556  0.0555556
 0.0555556  0.0555556  0.0555556  0.0555556  0.111111   0.0
 0.0555556  0.0555556  0.0555556  0.0555556  0.0        0.111111
julia> rank(S)
4
julia> inv(S)
ERROR: SingularException(6)

If we disregard the nullspace of $\vec{S}$ (S) by throwing away the corresponding eigenvectors, we can still decompose it into $\vec{E} \vec{E}\trans$ (Emat * Emat'):

julia> eig = eigen(S); eig.values
6-element Vector{Float64}:
 2.7755575615628914e-17
 2.7755575615628914e-16
 0.11111111111111106
 0.11111111111111109
 0.11111111111111127
 0.3333333333333332
julia> Emat = eig.vectors[:, 3:end] * diagm(sqrt.(eig.values[3:end]));
julia> maximum(abs, Emat * Emat' .- S)
2.220446049250313e-16

Once again, we'll need the tensor product POVM, $\{ \hat{E}_i \otimes \hat{E}_k \}$ (Es2), but this time it will have 36 elements:

julia> Es2 = [kron(E1, E2) for E1 in Es for E2 in Es];
julia> length(Es2)
36

The 1-qubit inverse can no longer be found by inverting $\vec{S}$ (S), but the general formula still holds:

julia> L = diagm([1/(1/3) for _ in 1:length(Es)]);
julia> inv_mat = L * Emat * inv(Emat' * L * Emat)^2 * Emat' * L
6×6 Matrix{Float64}:
  5.0  -4.0   0.5   0.5   0.5   0.5
 -4.0   5.0   0.5   0.5   0.5   0.5
  0.5   0.5   5.0  -4.0   0.5   0.5
  0.5   0.5  -4.0   5.0   0.5   0.5
  0.5   0.5   0.5   0.5   5.0  -4.0
  0.5   0.5   0.5   0.5  -4.0   5.0

According to Huang et al., the inverse mapping $\mathcal{M}\inv$ for each qubit is given by the inverse of the depolarizing channel: $$\mathcal{M}\inv(\hat{E}_i) = 3 \hat{E}_i - \lambda_i\inv \hat{1}.$$ From this, it immediately follows that $$\lambda_i \mathcal{M}\inv(\hat{E}_i) = \sum_k (3 \lambda_i \delta_{i k} - 1) \hat{E}_k,$$ which is equal to the expected expansion, $$\lambda_i \mathcal{M}\inv(\hat{E}_i) = \sum_k \left[ \vec{\Lambda} \vec{E} (\vec{E}\trans \vec{\Lambda} \vec{E})\invsq \vec{E}\trans \vec{\Lambda} \right]_{i k} \hat{E}_k,$$ even though $$\left[ \vec{\Lambda} \vec{E} (\vec{E}\trans \vec{\Lambda} \vec{E})\invsq \vec{E}\trans \vec{\Lambda} \right]_{i k} \ne 3 \lambda_i \delta_{i k} - 1.$$ This discrepancy is fine, because $\{ \hat{E}_i \}$ isn't a basis, so the expansion coefficients aren't unique. Regardless of the choice of expansion, we recover the same operators:

julia> shadow = 3L - ones(size(L))
6×6 Matrix{Float64}:
  8.0  -1.0  -1.0  -1.0  -1.0  -1.0
 -1.0   8.0  -1.0  -1.0  -1.0  -1.0
 -1.0  -1.0   8.0  -1.0  -1.0  -1.0
 -1.0  -1.0  -1.0   8.0  -1.0  -1.0
 -1.0  -1.0  -1.0  -1.0   8.0  -1.0
 -1.0  -1.0  -1.0  -1.0  -1.0   8.0
julia> inv_mat .- shadow
6×6 Matrix{Float64}:
 -3.0  -3.0   1.5   1.5   1.5   1.5
 -3.0  -3.0   1.5   1.5   1.5   1.5
  1.5   1.5  -3.0  -3.0   1.5   1.5
  1.5   1.5  -3.0  -3.0   1.5   1.5
  1.5   1.5   1.5   1.5  -3.0  -3.0
  1.5   1.5   1.5   1.5  -3.0  -3.0
julia> diff = kron(inv_mat, inv_mat) .- kron(shadow, shadow);
julia> maximum(maximum.(abs, diff * Es2))
1.4210854715202004e-14

As expected, this works in the case of exact probabilities:

julia> ps_exact = [real(tr(E12 * rho)) for E12 in Es2];
julia> sum(ps_exact) - 1.0
-4.440892098500626e-16
julia> rho_exact = sum(ps_exact .* kron(inv_mat, inv_mat) * Es2);
julia> maximum(abs, rho_exact .- rho)
3.3306690738754696e-16
julia> fidelity(rho, rho_exact)
1.000000000000001

and also approximate ones:

julia> samples = sample(1:length(ps_exact), Weights(ps_exact), 100_000);
julia> ps_approx = counts(samples, length(ps_exact)) / length(samples);
julia> sum(ps_approx) - 1.0
0.0
julia> maximum(abs, ps_exact .- ps_approx)
0.0011980879766291357
julia> rho_approx = sum(ps_approx .* kron(inv_mat, inv_mat) * Es2);
julia> maximum(abs, rho_approx .- rho)
0.005617328200877472
julia> fidelity(rho, rho_approx)
0.9998102691121425