Classical shadows and POVMs
2022-07-24

On several occasions, it's been suggested to me that the papers
- Reconstructing quantum states with generative models by Carrasquilla, Torlai, Melko & Aolita (arXiv:1810.10584) and
- Predicting many properties of a quantum system from very few measurements by Huang, Kueng & Preskill (arXiv:2002.08953)
are very closely related, but that's far from obvious on the surface. Sure, they both describe methods to obtain a density operator from measurement data, but the former uses POVMs and a generative model, whereas the latter inverts a quantum channel. Those definitely sound at least a little bit different!
In their Supplementary Information, Huang et al. mention that
This classical shadow corresponds to the linear inversion (or least squares) estimator of $\rho$ in the single-shot limit.
As it turns out, the method of Carrasquilla et al. is also a special case of linear least squares. Despite their outward appearance, the two approaches are the same under the hood!
Hermitian operators can't hide from least squares
Suppose there's a finite-dimensional density operator $\rho$ to which, by some cruel twist of fate, we have access only through the numbers $p_i = \operatorname{Tr}(E_i \rho)$ for some known Hermitian operators $E_i$. Can we reconstruct $\rho$ from $\mathbf{p}$? The answer turns out to be "it depends on $\{E_i\}$", but first, let's tackle the slightly more general problem of reconstructing a finite-dimensional Hermitian operator $A$ from $a_i = \operatorname{Tr}(E_i A)$ with Hermitian $E_i$. Despite acting on a complex Hilbert space, Hermitian operators (for which $A^\dagger = A$) form a real inner product space. The inner product is given by $\langle A_1, A_2 \rangle = \operatorname{Tr}(A_1 A_2)$, and we can choose an orthonormal basis $\{B_j\}$, so the basis expansion of $A$ is $A = \sum_j c_j B_j$ with the unique expansion coefficients $c_j = \operatorname{Tr}(B_j A)$.
If we let $\{B_j\}$ be the standard basis in which to express matrices and vectors, we can write $\mathbf{a} = E \mathbf{c}$, where the matrix entries $E_{ij} = \operatorname{Tr}(E_i B_j)$ are the expansion coefficients of $E_i$. This is clearly a system of linear equations, and the usual considerations apply. Specifically, we can expect a unique solution iff the rank of $E$ is equal to the dimension of the vector space, which happens when $\{E_i\}$ is a spanning set. In that case, $E^T E$ is invertible and we find the well-known equation $\mathbf{c} = (E^T E)^{-1} E^T \mathbf{a}$. For no reason whatsoever, we can generalize it a bit by inserting any positive definite matrix $W$: $\mathbf{c} = (E^T W E)^{-1} E^T W \mathbf{a}$. Thus, $A = \sum_j c_j B_j$ is the operator we'd like to recover from $\mathbf{a}$. Note that $(E^T W E)^{-1} E^T W$ is a left inverse of $E$.
The above assumes that $\mathbf{a}$ is known exactly, but what if we only have noisy data? Suppose that $\hat{\mathbf{a}} = \mathbf{a} + \boldsymbol{\epsilon}$ with a random perturbation $\boldsymbol{\epsilon}$. Knowing only $\hat{\mathbf{a}}$, the best we can hope for is to find a vector $\hat{\mathbf{c}}$ that minimizes the squares of the residual errors, $\lVert \hat{\mathbf{a}} - E \hat{\mathbf{c}} \rVert^2$, because the system will generally be inconsistent. If we assume that all components of the perturbation are uncorrelated, with zero mean and identical variance, the optimal solution turns out to be given by the aptly-named least squares method: $\hat{\mathbf{c}} = (E^T E)^{-1} E^T \hat{\mathbf{a}}$. If we allow each $\epsilon_i$ to have a different variance $v_i$, then it's better to use the more general weighted least squares, $\hat{\mathbf{c}} = (E^T W E)^{-1} E^T W \hat{\mathbf{a}}$, where $W$ is a diagonal matrix with the entries $W_{ii} = 1/v_i$, which weights the observations according to their relative uncertainties. Thus, we can approximate the operator we'd like to recover from $\hat{\mathbf{a}}$: $\hat{A} = \sum_j \hat{c}_j B_j$. It's not hard to see that averaging over the noise gives $A$ exactly.
Because the basis only enters through the matrix $E$, we can avoid explicitly mentioning it and work exclusively with the $E_i$. Performing the substitution gives us $A = \sum_{i,j} M_{ij}\, a_i E_j$ and $\hat{A} = \sum_{i,j} M_{ij}\, \hat{a}_i E_j$ with $M = W E\, (E^T W E)^{-2}\, E^T W$. Building the matrix $E$ directly from inner products of $E_i$ with $B_j$ also requires us to consider a specific basis, but we can choose the basis implicitly by decomposing the matrix $S$ of mutual overlaps of the $E_i$, with the entries $S_{ij} = \operatorname{Tr}(E_i E_j)$, as $S = E E^T$.
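Since that was rather abstract, here's a minimal Julia sketch (not taken from either paper; the operators and names are made up for illustration) that recovers a single-qubit Hermitian operator from the overlaps $a_i = \operatorname{Tr}(E_i A)$, choosing the basis implicitly through the decomposition $S = E E^T$:

# Recover a Hermitian operator A from a_i = Tr(F_i A), where the F_i are an
# arbitrary spanning set of Hermitian operators (here they happen to be a basis).
using LinearAlgebra

id = [1 0; 0 1]; sx = [0 1; 1 0]; sy = [0 -im; im 0]; sz = [1 0; 0 -1];
Fs = [(id + sx) / 2, (id - sx) / 2, (id + sy) / 2, (id + sz) / 2];

A = (id + 0.3sx - 0.2sy + 0.5sz) / 2;           # operator to recover
a = [real(tr(F * A)) for F in Fs];              # the data a_i = Tr(F_i A)

S = [real(tr(F1 * F2)) for F1 in Fs, F2 in Fs]; # overlap matrix
eig = eigen(Symmetric(S));
E = eig.vectors * diagm(sqrt.(eig.values));     # implicit basis choice: S = E * E'
M = E * inv(E' * E)^2 * E';                     # W = I here

A_rec = sum((M * a) .* Fs);                     # A = Σᵢⱼ Mᵢⱼ aᵢ Fⱼ
@show maximum(abs, A_rec - A);                  # ≈ 0

The same recipe is what the 2-qubit experiments at the end of this post use, just with actual POVM elements in place of the made-up spanning set.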
State reconstruction with POVMs is least squares
How does this apply to the reconstruction of $\rho$? Because density operators are Hermitian, all of the above is relevant for them as well, and we can obtain $\rho = \sum_{i,j} M_{ij}\, p_i E_j$ from $p_i = \operatorname{Tr}(E_i \rho)$, which we can write more compactly as $\rho = \sum_j (M \mathbf{p})_j\, E_j$. This is particularly relevant when the $E_i$ are the elements of a POVM, which must be positive semidefinite and sum to identity: the former guarantees that $p_i \ge 0$ and the latter that $\sum_i p_i = 1$. If we have some apparatus that measures the state $\rho$ using this POVM, it will give the outcome label $i$ with probability $p_i$. If we run this measurement process long enough, we might get enough data to approximate $\mathbf{p}$ with the observed frequencies $\hat{\mathbf{p}}$ and find $\hat{\rho} = \sum_{i,j} M_{ij}\, \hat{p}_i E_j$, where the right-hand side isn't guaranteed to be a legitimate density operator on account of the noise. Recall that $W$ should either contain the reciprocals of the variances of the random perturbations on the diagonal, or else be any positive definite matrix in the noiseless case.
This inversion of $\hat{\mathbf{p}}$ is the approach taken by Carrasquilla et al., but they use the simpler expression $\hat{\rho} = \sum_{i,j} (S^{-1})_{ij}\, \hat{p}_i E_j$, which requires the inverse of the overlap matrix $S$. Because $S$ is only invertible when $\{E_i\}$ is a basis, this works for the tetrahedral and Pauli-4 POVMs, but not for the Pauli-6 POVM, which has more than 4 elements. When $S^{-1}$ exists, $E^{-1}$ does too, so it's straightforward to show that for any arbitrary invertible $W$, we can simplify the awful formula we derived earlier all the way down to a single matrix: $M = W E\, (E^T W E)^{-2}\, E^T W = (E E^T)^{-1} = S^{-1}$.
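To see that collapse numerically, here's a tiny self-contained check (with an arbitrary invertible matrix standing in for $E$, so the numbers are purely illustrative) that the big expression doesn't depend on $W$ and equals the inverse of the overlap matrix:

# When E is invertible, W*E*inv(E'*W*E)^2*E'*W collapses to inv(S) for any
# positive definite W.
using LinearAlgebra

E = [1.0 0 0 0; 1 1 0 0; 1 1 1 0; 1 1 1 1];   # any invertible matrix will do
S = E * E';                                    # the corresponding overlap matrix
W = diagm([1.0, 2.0, 3.0, 4.0]);               # an arbitrary positive definite W

M = W * E * inv(E' * W * E)^2 * E' * W;
@show maximum(abs, M - inv(S));                # ≈ 0, independent of W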
As an aside, Carrasquilla et al. also point out that it's not strictly necessary to reconstruct $\rho$ if it will only be used to compute an expectation value of a Hermitian operator $O$, because $\operatorname{Tr}(O \rho) = \sum_{i,j} (S^{-1})_{ij}\, p_i \operatorname{Tr}(O E_j)$. If we don't limit ourselves to invertible $S$, we can write that slightly more generally as $\operatorname{Tr}(O \rho) = \sum_{i,j} M_{ij}\, \operatorname{Tr}(E_i \rho) \operatorname{Tr}(O E_j)$, which has a rather nice symmetry to it.
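As a quick illustration (a sketch, not code from the paper), here's how that looks for a single qubit with the tetrahedral POVM that also appears in the experiments below; the state and observable are arbitrary:

# Estimate Tr(O*rho) directly from POVM probabilities via
# Tr(O*rho) = Σᵢⱼ (S⁻¹)ᵢⱼ pᵢ Tr(O*Eⱼ), without reconstructing rho.
using LinearAlgebra

id = [1 0; 0 1]; sx = [0 1; 1 0]; sy = [0 -im; im 0]; sz = [1 0; 0 -1];
bloch(x, y, z) = (id + x * sx + y * sy + z * sz) ./ 2;
tet = [[0, 0, 1], [2sqrt(2)/3, 0, -1/3],
       [-sqrt(2)/3, sqrt(2/3), -1/3], [-sqrt(2)/3, -sqrt(2/3), -1/3]];
Es = [bloch(xyz...) ./ 2 for xyz in tet];       # tetrahedral POVM

rho = bloch(0.3, -0.1, 0.4);                    # some single-qubit state
O = sz;                                         # observable of interest

ps = [real(tr(E * rho)) for E in Es];           # measurement probabilities
S = [real(tr(E1 * E2)) for E1 in Es, E2 in Es]; # overlap matrix (invertible)
Os = [real(tr(O * E)) for E in Es];             # Tr(O*Eⱼ)

@show Os' * inv(S) * ps;                        # ≈ 0.4
@show real(tr(O * rho));                        # the exact value, 0.4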
Classical shadows are least squares
On the other hand, Huang et al. start from a probability distribution over unitary operators and measurement outcomes, and define a quantum channel $\mathcal{M}$ that maps the density operator $\rho$ to $\mathcal{M}(\rho) = \sum_U P(U) \sum_b \langle b | U \rho U^\dagger | b \rangle\; U^\dagger | b \rangle\langle b | U$, where $P(U)$ is the probability of choosing $U$, which we'll take to be discrete, and $\langle b | U \rho U^\dagger | b \rangle$ is the probability of getting the outcome $b$ when measuring in the corresponding basis. The elements $P(U)\, U^\dagger | b \rangle\langle b | U$ form a straightforward POVM, and we can write the action of the channel as $\mathcal{M}(\rho) = \sum_{U,b} \operatorname{Tr}\!\left(P(U)\, U^\dagger | b \rangle\langle b | U \rho\right) U^\dagger | b \rangle\langle b | U$. We'll make this a bit more generic by writing $\mathcal{M}(\rho) = \sum_i \frac{\operatorname{Tr}(E_i \rho)}{P(U_i)}\, E_i$ with $E_i = P(U_i)\, U_i^\dagger | b_i \rangle\langle b_i | U_i$, where the index $i$ runs over all pairs $(U, b)$. As before, we'll take for granted that $\{E_i\}$ is a spanning set, so that $E$ has a left inverse.
If we apply this channel (which is linear) to a basis state $B_j$, we get $\mathcal{M}(B_j) = \sum_i (W E)_{ij}\, E_i = \sum_k (E^T W E)_{kj}\, B_k$, where we've put all the $1/P(U_i)$ on the diagonal of $W$. Naturally, the inverse channel $\mathcal{M}^{-1}$ (which should also be a linear map) must give us back the basis state: $\mathcal{M}^{-1}(\mathcal{M}(B_j)) = \sum_k (E^T W E)_{kj}\, \mathcal{M}^{-1}(B_k) = B_j$, where the $(E^T W E)_{kj}$ are unique on account of being the basis expansion coefficients of $\mathcal{M}(B_j)$. Hence, $E^T W E$ is the matrix of $\mathcal{M}$ in this basis, so we must have $\mathcal{M}^{-1}(B_k) = \sum_l \left((E^T W E)^{-1}\right)_{lk} B_l$.
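Before moving on, we can sanity-check this for a single qubit with a small sketch (names and basis choice are illustrative): for the Pauli-6 POVM used in the experiments below, the matrix $E^T W E$ of $\mathcal{M}$ in the normalized Pauli basis should be $\operatorname{diag}(1, 1/3, 1/3, 1/3)$, the depolarizing channel whose inverse is quoted from Huang et al. further down:

# The matrix of the channel M in an orthonormal Hermitian basis is E'*W*E.
# For the single-qubit Pauli-6 POVM this should be diag(1, 1/3, 1/3, 1/3)
# in the normalized Pauli basis, i.e. the depolarizing channel.
using LinearAlgebra

id = [1 0; 0 1]; sx = [0 1; 1 0]; sy = [0 -im; im 0]; sz = [1 0; 0 -1];
Bs = [id / sqrt(2), sx / sqrt(2), sy / sqrt(2), sz / sqrt(2)];  # Tr(Bⱼ*Bₖ) = δⱼₖ
p6 = hcat([eigvecs(sx), eigvecs(sy), eigvecs(sz)]...);
Es = [v * v' ./ 3 for v in eachcol(p6)];                        # Pauli-6 POVM
W = diagm(fill(3.0, length(Es)));                               # Wᵢᵢ = 1/P(Uᵢ) = 3

E = [real(tr(Ei * Bj)) for Ei in Es, Bj in Bs];                 # Eᵢⱼ = Tr(Eᵢ*Bⱼ)
@show round.(E' * W * E, digits=10);                            # diag(1, 1/3, 1/3, 1/3)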
In order to reconstruct $\rho$, the inverse mapping should be averaged over the measurement outcomes: $\rho = \sum_{U,b} P(U)\, \langle b | U \rho U^\dagger | b \rangle\; \mathcal{M}^{-1}\!\left(U^\dagger | b \rangle\langle b | U\right)$, or in more general terms: $\rho = \sum_i W_{ii}\, p_i\, \mathcal{M}^{-1}(E_i)$. This can be expanded into $\rho = \sum_l c_l B_l$ with $\mathbf{c} = (E^T W E)^{-1} E^T W \mathbf{p}$, which we've encountered before. In particular, when $\hat{\mathbf{p}}$ is an approximation to $\mathbf{p}$, the expansion turns into the weighted least squares estimator: $\hat{\rho} = \sum_l \hat{c}_l B_l$ with $\hat{\mathbf{c}} = (E^T W E)^{-1} E^T W \hat{\mathbf{p}}$. This time, each weight $W_{ii}$ is given by the reciprocal of the probability of choosing the corresponding unitary $U_i$.
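We can also check this numerically with a quick single-qubit sketch (sampling code analogous to the experiments below; the state is arbitrary): averaging the single-shot shadows $\mathcal{M}^{-1}(U^\dagger | b \rangle\langle b | U)$, which for the Pauli-6 POVM work out to $9 E_i - \mathbb{1}$ (see the depolarizing-channel discussion below), gives exactly the same operator as plugging the empirical frequencies into the weighted least squares formula:

# Average of single-shot shadows vs. weighted least squares on the empirical
# frequencies, for a single qubit measured with the Pauli-6 POVM.
using LinearAlgebra, Random, StatsBase
Random.seed!(1);

id = [1 0; 0 1]; sx = [0 1; 1 0]; sy = [0 -im; im 0]; sz = [1 0; 0 -1];
p6 = hcat([eigvecs(sx), eigvecs(sy), eigvecs(sz)]...);
Es = [v * v' ./ 3 for v in eachcol(p6)];                  # Pauli-6 POVM
rho = (id + 0.3sx - 0.1sy + 0.4sz) / 2;                   # some single-qubit state
ps = [real(tr(E * rho)) for E in Es];

outcomes = sample(1:length(Es), Weights(ps), 10_000);
p_hat = counts(outcomes, length(Es)) / length(outcomes);

# average of the single-shot shadows M⁻¹(3Eᵢ) = 9Eᵢ - 1
rho_shadow = sum(9 * Es[i] - id for i in outcomes) / length(outcomes);

# weighted least squares with the empirical frequencies
S = [real(tr(E1 * E2)) for E1 in Es, E2 in Es];
eig = eigen(Symmetric(S));
Emat = eig.vectors[:, 3:end] * diagm(sqrt.(eig.values[3:end]));
W = diagm(fill(3.0, length(Es)));                         # Wᵢᵢ = 1/P(Uᵢ)
M = W * Emat * inv(Emat' * W * Emat)^2 * Emat' * W;
rho_wls = sum((M * p_hat) .* Es);

@show maximum(abs, rho_shadow - rho_wls);                 # ≈ 0 (identical estimators)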
Everything is least squares
Once these methods are expressed using the same language, it becomes quite clear that Carrasquilla et al. and Huang et al. are approximating the density operator from measurement probabilities using the standard linear least squares procedure, but each with their own restrictions. In the former, only POVMs with invertible overlap matrices are supported, while the latter only considers POVMs built from (rescaled) rank-1 projectors onto orthonormal basis states. Both are approximating $\rho$ from noisy probabilities $\hat{\mathbf{p}}$ as $\hat{\rho} = \sum_{i,j} \left(W E\, (E^T W E)^{-2}\, E^T W\right)_{ij}\, \hat{p}_i E_j$ or, equivalently, $\hat{\rho} = \sum_j \hat{c}_j B_j$ with $\hat{\mathbf{c}} = (E^T W E)^{-1} E^T W \hat{\mathbf{p}}$.
To wrap up, we can do some quick numerical experiments with the above expressions, using Julia with StatsBase.jl. First, some setup:
julia> VERSION
v"1.7.3"
julia> using LinearAlgebra, Random; Random.seed!(1);
julia> using StatsBase; # v0.33.13
julia> fidelity(x, y) = real(tr(sqrt(sqrt(x) * y * sqrt(x))));
julia> id = [1 0; 0 1]; sx = [0 1; 1 0];
julia> sy = [0 -im; im 0]; sz = [1 0; 0 -1];
julia> rho_vs = [rand(ComplexF64, 4) for _ in 1:8];
julia> rho = sum(v * v' for v in rho_vs); rho /= tr(rho);
julia> real(rho)
4×4 Matrix{Float64}:
0.222127 0.216931 0.163605 0.166542
0.216931 0.276303 0.212351 0.205966
0.163605 0.212351 0.219397 0.17293
0.166542 0.205966 0.17293 0.282173
julia> imag(rho)
4×4 Matrix{Float64}:
0.0 0.00321748 0.0451109 -0.0720395
-0.00321748 0.0 0.0469695 -0.0950539
-0.0451109 -0.0469695 0.0 -0.106679
0.0720395 0.0950539 0.106679 0.0
We'll be trying to reconstruct this random 2-qubit density operator (rho).
The tetrahedral POVM for a single qubit (Es) has an invertible overlap matrix (S):
julia> tet = [[0, 0, 1],
[2sqrt(2)/3, 0, -1/3],
[-sqrt(2)/3, sqrt(2/3), -1/3],
[-sqrt(2)/3, -sqrt(2/3), -1/3]];
julia> bloch(x, y, z) = (id + x * sx + y * sy + z * sz) ./ 2;
julia> Es = [bloch(xyz...) ./ 2 for xyz in tet];
julia> S = hcat([[real(tr(E1 * E2)) for E1 in Es] for E2 in Es]...)
4×4 Matrix{Float64}:
0.25 0.0833333 0.0833333 0.0833333
0.0833333 0.25 0.0833333 0.0833333
0.0833333 0.0833333 0.25 0.0833333
0.0833333 0.0833333 0.0833333 0.25
julia> rank(S)
4
We can easily decompose it as $S = E E^T$ (Emat * Emat'), and then construct (inv_mat) the roundabout way:
julia> eig = eigen(S);
julia> Emat = eig.vectors * diagm(sqrt.(eig.values));
julia> maximum(abs, Emat * Emat' .- S)
4.996003610813204e-16
julia> inv_mat = Emat * inv(Emat' * Emat)^2 * Emat';
julia> maximum(abs, inv(S) .- inv_mat)
1.0658141036401503e-14
For the reconstruction, we'll need the 16-element tensor product POVM (Es2):
julia> Es2 = [kron(E1, E2) for E1 in Es for E2 in Es];
julia> length(Es2)
16
Using the exact probability vector (ps_exact), we can get the exact density matrix:
julia> ps_exact = [real(tr(E12 * rho)) for E12 in Es2];
julia> sum(ps_exact) - 1.0
0.0
julia> rho_exact = sum(ps_exact .* kron(inv_mat, inv_mat) * Es2);
julia> maximum(abs, rho_exact .- rho)
4.130557004330886e-16
julia> fidelity(rho, rho_exact)
1.0000000000000013
If we draw a random sample using the exact probabilities, we can get fairly close:
julia> samples = sample(1:length(ps_exact), Weights(ps_exact), 100_000);
julia> ps_approx = counts(samples, length(ps_exact)) / length(samples);
julia> sum(ps_approx) - 1.0
0.0
julia> maximum(abs, ps_exact .- ps_approx)
0.00214799137076567
julia> rho_approx = sum(ps_approx .* kron(inv_mat, inv_mat) * Es2);
julia> maximum(abs, rho_approx .- rho)
0.009381602086482116
julia> fidelity(rho, rho_approx)
0.9987720679289586
We'll use the same setup for the Pauli-6 POVM, but this time the overlap matrix (S) won't be invertible:
julia> p6 = hcat([eigvecs(sx), eigvecs(sy), eigvecs(sz)]...);
julia> Es = [v * v' ./ 3 for v in eachcol(p6)];
julia> S = hcat([[real(tr(E1 * E2)) for E1 in Es] for E2 in Es]...)
6×6 Matrix{Float64}:
0.111111 0.0 0.0555556 0.0555556 0.0555556 0.0555556
0.0 0.111111 0.0555556 0.0555556 0.0555556 0.0555556
0.0555556 0.0555556 0.111111 0.0 0.0555556 0.0555556
0.0555556 0.0555556 0.0 0.111111 0.0555556 0.0555556
0.0555556 0.0555556 0.0555556 0.0555556 0.111111 0.0
0.0555556 0.0555556 0.0555556 0.0555556 0.0 0.111111
julia> rank(S)
4
julia> inv(S)
ERROR: SingularException(6)
If we disregard the nullspace of (S) by throwing away the corresponding eigenvectors, we can still decompose it into (Emat * Emat'):
julia> eig = eigen(S); eig.values
6-element Vector{Float64}:
2.7755575615628914e-17
2.7755575615628914e-16
0.11111111111111106
0.11111111111111109
0.11111111111111127
0.3333333333333332
julia> Emat = eig.vectors[:, 3:end] * diagm(sqrt.(eig.values[3:end]));
julia> maximum(abs, Emat * Emat' .- S)
2.220446049250313e-16
Once again, we'll need the tensor product POVM (Es2), but this time it will have 36 elements:
julia> Es2 = [kron(E1, E2) for E1 in Es for E2 in Es];
julia> length(Es2)
36
The 1-qubit inverse can no longer be found by inverting (S), but the general formula still holds:
julia> L = diagm([1/(1/3) for _ in 1:length(Es)]);
julia> inv_mat = L * Emat * inv(Emat' * L * Emat)^2 * Emat' * L
6×6 Matrix{Float64}:
5.0 -4.0 0.5 0.5 0.5 0.5
-4.0 5.0 0.5 0.5 0.5 0.5
0.5 0.5 5.0 -4.0 0.5 0.5
0.5 0.5 -4.0 5.0 0.5 0.5
0.5 0.5 0.5 0.5 5.0 -4.0
0.5 0.5 0.5 0.5 -4.0 5.0
According to Huang et al., the inverse mapping for each qubit is given by the inverse of the depolarizing channel: $\mathcal{M}^{-1}(X) = 3X - \operatorname{Tr}(X)\, \mathbb{1}$. From this, it immediately follows that $\mathcal{M}^{-1}(3 E_i) = 9 E_i - \mathbb{1} = \sum_j (9\delta_{ij} - 1)\, E_j$, which is equal to the expected expansion $\sum_j M_{ij}\, E_j$, even though $9\delta_{ij} - 1 \neq M_{ij}$. This discrepancy is fine, because $\{E_i\}$ isn't a basis, so the expansion coefficients aren't unique. Regardless of the choice of expansion, we recover the same operators:
julia> shadow = 3L - ones(size(L))
6×6 Matrix{Float64}:
8.0 -1.0 -1.0 -1.0 -1.0 -1.0
-1.0 8.0 -1.0 -1.0 -1.0 -1.0
-1.0 -1.0 8.0 -1.0 -1.0 -1.0
-1.0 -1.0 -1.0 8.0 -1.0 -1.0
-1.0 -1.0 -1.0 -1.0 8.0 -1.0
-1.0 -1.0 -1.0 -1.0 -1.0 8.0
julia> inv_mat .- shadow
6×6 Matrix{Float64}:
-3.0 -3.0 1.5 1.5 1.5 1.5
-3.0 -3.0 1.5 1.5 1.5 1.5
1.5 1.5 -3.0 -3.0 1.5 1.5
1.5 1.5 -3.0 -3.0 1.5 1.5
1.5 1.5 1.5 1.5 -3.0 -3.0
1.5 1.5 1.5 1.5 -3.0 -3.0
julia> diff = kron(inv_mat, inv_mat) .- kron(shadow, shadow);
julia> maximum(maximum.(abs, diff * Es2))
1.4210854715202004e-14
As expected, this works in the case of exact probabilities:
julia> ps_exact = [real(tr(E12 * rho)) for E12 in Es2];
julia> sum(ps_exact) - 1.0
-4.440892098500626e-16
julia> rho_exact = sum(ps_exact .* kron(inv_mat, inv_mat) * Es2);
julia> maximum(abs, rho_exact .- rho)
3.3306690738754696e-16
julia> fidelity(rho, rho_exact)
1.000000000000001
and also approximate ones:
julia> samples = sample(1:length(ps_exact), Weights(ps_exact), 100_000);
julia> ps_approx = counts(samples, length(ps_exact)) / length(samples);
julia> sum(ps_approx) - 1.0
0.0
julia> maximum(abs, ps_exact .- ps_approx)
0.0011980879766291357
julia> rho_approx = sum(ps_approx .* kron(inv_mat, inv_mat) * Es2);
julia> maximum(abs, rho_approx .- rho)
0.005617328200877472
julia> fidelity(rho, rho_approx)
0.9998102691121425