Classical shadows and POVMs
2022-07-24

On several occasions, it's been suggested to me that the papers
- Reconstructing quantum states with generative models by Carrasquilla, Torlai, Melko & Aolita (arXiv:1810.10584) and
- Predicting many properties of a quantum system from very few measurements by Huang, Kueng & Preskill (arXiv:2002.08953)
are very closely related, but that's far from obvious on the surface. Sure, they both describe methods to obtain a density operator from measurement data, but the former uses POVMs and a generative model, whereas the latter inverts a quantum channel. Those definitely sound at least a little bit different!
In their Supplementary Information, Huang et al. mention that
This classical shadow corresponds to the linear inversion (or least squares) estimator of $\rho$ in the single-shot limit.
As it turns out, the method of Carrasquilla et al. is also a special case of linear least squares. Despite their outward appearance, the two approaches are the same under the hood!
Hermitian operators can't hide from least squares
Suppose there's a finite-dimensional density operator $\rho$ to which, by some cruel twist of fate, we have access only through the numbers $p_i = \operatorname{Tr}(E_i \rho)$ for some known Hermitian operators $E_i$. Can we reconstruct $\rho$ from $\mathbf{p}$? The answer turns out to be "it depends on $\{E_i\}$", but first, let's tackle the slightly more general problem of reconstructing a finite-dimensional Hermitian operator $A$ from $a_i = \operatorname{Tr}(E_i A)$ with Hermitian $E_i$. Despite acting on a complex Hilbert space, Hermitian operators (for which $A^\dagger = A$) form a real inner product space. The inner product is given by $\langle A_1, A_2 \rangle = \operatorname{Tr}(A_1 A_2)$, and we can choose an orthonormal basis $\{B_j\}$, so the basis expansion of $A$ is $A = \sum_j c_j B_j$ with the unique expansion coefficients $c_j = \operatorname{Tr}(B_j A)$.
If we let $\{B_j\}$ be the standard basis in which to express matrices and vectors, we can write $\mathbf{a} = E \mathbf{c}$, where the matrix entries $E_{ij} = \operatorname{Tr}(E_i B_j)$ are the expansion coefficients of $E_i$. This is clearly a system of linear equations, and the usual considerations apply. Specifically, we can expect a unique solution iff the rank of $E$ is equal to the dimension of the vector space, which happens when $\{E_i\}$ is a spanning set. In that case, $E^T E$ is invertible and we find the well-known equation $\mathbf{c} = (E^T E)^{-1} E^T \mathbf{a}$. For no reason whatsoever, we can generalize it a bit by inserting any positive definite matrix $W$: $\mathbf{c} = (E^T W E)^{-1} E^T W \mathbf{a}$. Thus, $A = \sum_j c_j B_j$ is the operator we'd like to recover from $\mathbf{a}$. Note that $(E^T W E)^{-1} E^T W$ is a left inverse of $E$.
The above assumes that $\mathbf{a}$ is known exactly, but what if we only have noisy data? Suppose that $\hat{\mathbf{a}} = \mathbf{a} + \boldsymbol{\epsilon}$ with a random perturbation $\boldsymbol{\epsilon}$. Knowing only $\hat{\mathbf{a}}$, the best we can hope for is to find a vector $\hat{\mathbf{c}}$ that minimizes the squares of the residual errors, $\lVert \hat{\mathbf{a}} - E \hat{\mathbf{c}} \rVert^2$, because the system will generally be inconsistent. If we assume that all components of the perturbation are uncorrelated, with zero mean and identical variance, the optimal solution turns out to be given by the aptly-named least squares method: $\hat{\mathbf{c}} = (E^T E)^{-1} E^T \hat{\mathbf{a}}$. If we allow each $\epsilon_i$ to have a different variance $v_i$, then it's better to use the more general weighted least squares, $\hat{\mathbf{c}} = (E^T W E)^{-1} E^T W \hat{\mathbf{a}}$, where $W$ is a diagonal matrix with the entries $W_{ii} = 1/v_i$, which weights the observations according to their relative uncertainties. Thus, we can approximate the operator we'd like to recover from $\hat{\mathbf{a}}$: $\hat{A} = \sum_j \hat{c}_j B_j$. It's not hard to see that averaging over the noise gives $A$ exactly.
Because the basis only enters through the matrix $E$, we can avoid explicitly mentioning it and work exclusively with the $E_i$. Performing the substitution gives us $A = \sum_{i,j} M_{ij}\, a_i E_j$ and $\hat{A} = \sum_{i,j} M_{ij}\, \hat{a}_i E_j$ with $M = W E\, (E^T W E)^{-2}\, E^T W$. Building the matrix $E$ directly from inner products of $E_i$ with $B_j$ also requires us to consider a specific basis, but we can choose the basis implicitly by decomposing the matrix $S$ of mutual overlaps of the $E_i$, with the entries $S_{ij} = \operatorname{Tr}(E_i E_j)$, as $S = E E^T$.
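Since that was rather abstract, here's a minimal Julia sketch (not taken from either paper; the operators and names are made up for illustration) that recovers a single-qubit Hermitian operator from the overlaps $a_i = \operatorname{Tr}(E_i A)$, choosing the basis implicitly through the decomposition $S = E E^T$:

# Recover a Hermitian operator A from a_i = Tr(F_i A), where the F_i are an
# arbitrary spanning set of Hermitian operators (here they happen to be a basis).
using LinearAlgebra

id = [1 0; 0 1]; sx = [0 1; 1 0]; sy = [0 -im; im 0]; sz = [1 0; 0 -1];
Fs = [(id + sx) / 2, (id - sx) / 2, (id + sy) / 2, (id + sz) / 2];

A = (id + 0.3sx - 0.2sy + 0.5sz) / 2;           # operator to recover
a = [real(tr(F * A)) for F in Fs];              # the data a_i = Tr(F_i A)

S = [real(tr(F1 * F2)) for F1 in Fs, F2 in Fs]; # overlap matrix
eig = eigen(Symmetric(S));
E = eig.vectors * diagm(sqrt.(eig.values));     # implicit basis choice: S = E * E'
M = E * inv(E' * E)^2 * E';                     # W = I here

A_rec = sum((M * a) .* Fs);                     # A = Σᵢⱼ Mᵢⱼ aᵢ Fⱼ
@show maximum(abs, A_rec - A);                  # ≈ 0

The same recipe is what the 2-qubit experiments at the end of this post use, just with actual POVM elements in place of the made-up spanning set.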
State reconstruction with POVMs is least squares
How does this apply to the reconstruction of $\rho$? Because density operators are Hermitian, all of the above is relevant for them as well, and we can obtain $\rho = \sum_{i,j} M_{ij}\, p_i E_j$ from $p_i = \operatorname{Tr}(E_i \rho)$, which we can write more compactly as $\rho = \sum_j (M \mathbf{p})_j\, E_j$. This is particularly relevant when the $E_i$ are the elements of a POVM, which must be positive semidefinite and sum to identity: the former guarantees that $p_i \ge 0$ and the latter that $\sum_i p_i = 1$. If we have some apparatus that measures the state $\rho$ using this POVM, it will give the outcome label $i$ with probability $p_i$. If we run this measurement process long enough, we might get enough data to approximate $\mathbf{p}$ with the observed frequencies $\hat{\mathbf{p}}$ and find $\hat{\rho} = \sum_{i,j} M_{ij}\, \hat{p}_i E_j$, where the right-hand side isn't guaranteed to be a legitimate density operator on account of the noise. Recall that $W$ should either contain the reciprocals of the variances of the random perturbations on the diagonal, or else be any positive definite matrix in the noiseless case.
This inversion of $\hat{\mathbf{p}}$ is the approach taken by Carrasquilla et al., but they use the simpler expression $\hat{\rho} = \sum_{i,j} (S^{-1})_{ij}\, \hat{p}_i E_j$, which requires the inverse of the overlap matrix $S$. Because $S$ is only invertible when $\{E_i\}$ is a basis, this works for the tetrahedral and Pauli-4 POVMs, but not for the Pauli-6 POVM, which has more than 4 elements. When $S^{-1}$ exists, $E^{-1}$ does too, so it's straightforward to show that for any arbitrary invertible $W$, we can simplify the awful formula we derived earlier all the way down to a single matrix: $M = W E\, (E^T W E)^{-2}\, E^T W = (E E^T)^{-1} = S^{-1}$.
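To see that collapse numerically, here's a tiny self-contained check (with an arbitrary invertible matrix standing in for $E$, so the numbers are purely illustrative) that the big expression doesn't depend on $W$ and equals the inverse of the overlap matrix:

# When E is invertible, W*E*inv(E'*W*E)^2*E'*W collapses to inv(S) for any
# positive definite W.
using LinearAlgebra

E = [1.0 0 0 0; 1 1 0 0; 1 1 1 0; 1 1 1 1];   # any invertible matrix will do
S = E * E';                                    # the corresponding overlap matrix
W = diagm([1.0, 2.0, 3.0, 4.0]);               # an arbitrary positive definite W

M = W * E * inv(E' * W * E)^2 * E' * W;
@show maximum(abs, M - inv(S));                # ≈ 0, independent of W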
As an aside, Carrasquilla et al. also point out that it's not strictly necessary to reconstruct $\rho$ if it will only be used to compute an expectation value of a Hermitian operator $O$, because $\operatorname{Tr}(O \rho) = \sum_{i,j} (S^{-1})_{ij}\, p_i \operatorname{Tr}(O E_j)$. If we don't limit ourselves to invertible $S$, we can write that slightly more generally as $\operatorname{Tr}(O \rho) = \sum_{i,j} M_{ij}\, \operatorname{Tr}(E_i \rho) \operatorname{Tr}(O E_j)$, which has a rather nice symmetry to it.
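As a quick illustration (a sketch, not code from the paper), here's how that looks for a single qubit with the tetrahedral POVM that also appears in the experiments below; the state and observable are arbitrary:

# Estimate Tr(O*rho) directly from POVM probabilities via
# Tr(O*rho) = Σᵢⱼ (S⁻¹)ᵢⱼ pᵢ Tr(O*Eⱼ), without reconstructing rho.
using LinearAlgebra

id = [1 0; 0 1]; sx = [0 1; 1 0]; sy = [0 -im; im 0]; sz = [1 0; 0 -1];
bloch(x, y, z) = (id + x * sx + y * sy + z * sz) ./ 2;
tet = [[0, 0, 1], [2sqrt(2)/3, 0, -1/3],
       [-sqrt(2)/3, sqrt(2/3), -1/3], [-sqrt(2)/3, -sqrt(2/3), -1/3]];
Es = [bloch(xyz...) ./ 2 for xyz in tet];       # tetrahedral POVM

rho = bloch(0.3, -0.1, 0.4);                    # some single-qubit state
O = sz;                                         # observable of interest

ps = [real(tr(E * rho)) for E in Es];           # measurement probabilities
S = [real(tr(E1 * E2)) for E1 in Es, E2 in Es]; # overlap matrix (invertible)
Os = [real(tr(O * E)) for E in Es];             # Tr(O*Eⱼ)

@show Os' * inv(S) * ps;                        # ≈ 0.4
@show real(tr(O * rho));                        # the exact value, 0.4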
Classical shadows are least squares
On the other hand, Huang et al. start from a probability distribution over unitary operators and measurement outcomes, and define a quantum channel $\mathcal{M}$ that maps the density operator $\rho$ to $\mathcal{M}(\rho) = \sum_U P(U) \sum_b \langle b | U \rho U^\dagger | b \rangle\; U^\dagger | b \rangle\langle b | U$, where $P(U)$ is the probability of choosing $U$, which we'll take to be discrete, and $\langle b | U \rho U^\dagger | b \rangle$ is the probability of getting the outcome $b$ when measuring in the corresponding basis. The elements $P(U)\, U^\dagger | b \rangle\langle b | U$ form a straightforward POVM, and we can write the action of the channel as $\mathcal{M}(\rho) = \sum_{U,b} \operatorname{Tr}\!\left(P(U)\, U^\dagger | b \rangle\langle b | U \rho\right) U^\dagger | b \rangle\langle b | U$. We'll make this a bit more generic by writing $\mathcal{M}(\rho) = \sum_i \frac{\operatorname{Tr}(E_i \rho)}{P(U_i)}\, E_i$ with $E_i = P(U_i)\, U_i^\dagger | b_i \rangle\langle b_i | U_i$, where the index $i$ runs over all pairs $(U, b)$. As before, we'll take for granted that $\{E_i\}$ is a spanning set, so that $E$ has a left inverse.
If we apply this channel (which is linear) to a basis state $B_j$, we get $\mathcal{M}(B_j) = \sum_i (W E)_{ij}\, E_i = \sum_k (E^T W E)_{kj}\, B_k$, where we've put all the $1/P(U_i)$ on the diagonal of $W$. Naturally, the inverse channel $\mathcal{M}^{-1}$ (which should also be a linear map) must give us back the basis state: $\mathcal{M}^{-1}(\mathcal{M}(B_j)) = \sum_k (E^T W E)_{kj}\, \mathcal{M}^{-1}(B_k) = B_j$, where the $(E^T W E)_{kj}$ are unique on account of being the basis expansion coefficients of $\mathcal{M}(B_j)$. Hence, $E^T W E$ is the matrix of $\mathcal{M}$ in this basis, so we must have $\mathcal{M}^{-1}(B_k) = \sum_l \left((E^T W E)^{-1}\right)_{lk} B_l$.
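Before moving on, we can sanity-check this for a single qubit with a small sketch (names and basis choice are illustrative): for the Pauli-6 POVM used in the experiments below, the matrix $E^T W E$ of $\mathcal{M}$ in the normalized Pauli basis should be $\operatorname{diag}(1, 1/3, 1/3, 1/3)$, the depolarizing channel whose inverse is quoted from Huang et al. further down:

# The matrix of the channel M in an orthonormal Hermitian basis is E'*W*E.
# For the single-qubit Pauli-6 POVM this should be diag(1, 1/3, 1/3, 1/3)
# in the normalized Pauli basis, i.e. the depolarizing channel.
using LinearAlgebra

id = [1 0; 0 1]; sx = [0 1; 1 0]; sy = [0 -im; im 0]; sz = [1 0; 0 -1];
Bs = [id / sqrt(2), sx / sqrt(2), sy / sqrt(2), sz / sqrt(2)];  # Tr(Bⱼ*Bₖ) = δⱼₖ
p6 = hcat([eigvecs(sx), eigvecs(sy), eigvecs(sz)]...);
Es = [v * v' ./ 3 for v in eachcol(p6)];                        # Pauli-6 POVM
W = diagm(fill(3.0, length(Es)));                               # Wᵢᵢ = 1/P(Uᵢ) = 3

E = [real(tr(Ei * Bj)) for Ei in Es, Bj in Bs];                 # Eᵢⱼ = Tr(Eᵢ*Bⱼ)
@show round.(E' * W * E, digits=10);                            # diag(1, 1/3, 1/3, 1/3)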
In order to reconstruct $\rho$, the inverse mapping should be averaged over the measurement outcomes: $\rho = \sum_{U,b} P(U)\, \langle b | U \rho U^\dagger | b \rangle\; \mathcal{M}^{-1}\!\left(U^\dagger | b \rangle\langle b | U\right)$, or in more general terms: $\rho = \sum_i W_{ii}\, p_i\, \mathcal{M}^{-1}(E_i)$. This can be expanded into $\rho = \sum_l c_l B_l$ with $\mathbf{c} = (E^T W E)^{-1} E^T W \mathbf{p}$, which we've encountered before. In particular, when $\hat{\mathbf{p}}$ is an approximation to $\mathbf{p}$, the expansion turns into the weighted least squares estimator: $\hat{\rho} = \sum_l \hat{c}_l B_l$ with $\hat{\mathbf{c}} = (E^T W E)^{-1} E^T W \hat{\mathbf{p}}$. This time, each weight $W_{ii}$ is given by the reciprocal of the probability of choosing the corresponding unitary $U_i$.
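We can also check this numerically with a quick single-qubit sketch (sampling code analogous to the experiments below; the state is arbitrary): averaging the single-shot shadows $\mathcal{M}^{-1}(U^\dagger | b \rangle\langle b | U)$, which for the Pauli-6 POVM work out to $9 E_i - \mathbb{1}$ (see the depolarizing-channel discussion below), gives exactly the same operator as plugging the empirical frequencies into the weighted least squares formula:

# Average of single-shot shadows vs. weighted least squares on the empirical
# frequencies, for a single qubit measured with the Pauli-6 POVM.
using LinearAlgebra, Random, StatsBase
Random.seed!(1);

id = [1 0; 0 1]; sx = [0 1; 1 0]; sy = [0 -im; im 0]; sz = [1 0; 0 -1];
p6 = hcat([eigvecs(sx), eigvecs(sy), eigvecs(sz)]...);
Es = [v * v' ./ 3 for v in eachcol(p6)];                  # Pauli-6 POVM
rho = (id + 0.3sx - 0.1sy + 0.4sz) / 2;                   # some single-qubit state
ps = [real(tr(E * rho)) for E in Es];

outcomes = sample(1:length(Es), Weights(ps), 10_000);
p_hat = counts(outcomes, length(Es)) / length(outcomes);

# average of the single-shot shadows M⁻¹(3Eᵢ) = 9Eᵢ - 1
rho_shadow = sum(9 * Es[i] - id for i in outcomes) / length(outcomes);

# weighted least squares with the empirical frequencies
S = [real(tr(E1 * E2)) for E1 in Es, E2 in Es];
eig = eigen(Symmetric(S));
Emat = eig.vectors[:, 3:end] * diagm(sqrt.(eig.values[3:end]));
W = diagm(fill(3.0, length(Es)));                         # Wᵢᵢ = 1/P(Uᵢ)
M = W * Emat * inv(Emat' * W * Emat)^2 * Emat' * W;
rho_wls = sum((M * p_hat) .* Es);

@show maximum(abs, rho_shadow - rho_wls);                 # ≈ 0 (identical estimators)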
Everything is least squares
Once these methods are expressed using the same language, it becomes quite clear that Carrasquilla et al. and Huang et al. are approximating the density operator from measurement probabilities using the standard linear least squares procedure, but each with their own restrictions. In the former, only POVMs with invertible overlap matrices are supported, while the latter only considers POVMs built from (rescaled) rank-1 projectors onto orthonormal basis states. Both are approximating $\rho$ from noisy probabilities $\hat{\mathbf{p}}$ as $\hat{\rho} = \sum_{i,j} \left(W E\, (E^T W E)^{-2}\, E^T W\right)_{ij}\, \hat{p}_i E_j$ or, equivalently, $\hat{\rho} = \sum_j \hat{c}_j B_j$ with $\hat{\mathbf{c}} = (E^T W E)^{-1} E^T W \hat{\mathbf{p}}$.
To wrap up, we can do some quick numerical experiments with the above expressions, using Julia with StatsBase.jl. First, some setup:
julia> VERSION
v"1.7.3"
julia> using LinearAlgebra, Random; Random.seed!(1);
julia> using StatsBase; # v0.33.13
julia> fidelity(x, y) = real(tr(sqrt(sqrt(x) * y * sqrt(x))));
julia> id = [1 0; 0 1]; sx = [0 1; 1 0];
julia> sy = [0 -im; im 0]; sz = [1 0; 0 -1];
julia> rho_vs = [rand(ComplexF64, 4) for _ in 1:8];
julia> rho = sum(v * v' for v in rho_vs); rho /= tr(rho);
julia> real(rho)
4×4 Matrix{Float64}:
0.222127 0.216931 0.163605 0.166542
0.216931 0.276303 0.212351 0.205966
0.163605 0.212351 0.219397 0.17293
0.166542 0.205966 0.17293 0.282173
julia> imag(rho)
4×4 Matrix{Float64}:
0.0 0.00321748 0.0451109 -0.0720395
-0.00321748 0.0 0.0469695 -0.0950539
-0.0451109 -0.0469695 0.0 -0.106679
0.0720395 0.0950539 0.106679 0.0
We'll be trying to reconstruct this random 2-qubit density operator (rho).
The tetrahedral POVM for a single qubit (Es) has an invertible overlap matrix (S):
julia> tet = [[0, 0, 1],
[2sqrt(2)/3, 0, -1/3],
[-sqrt(2)/3, sqrt(2/3), -1/3],
[-sqrt(2)/3, -sqrt(2/3), -1/3]];
julia> bloch(x, y, z) = (id + x * sx + y * sy + z * sz) ./ 2;
julia> Es = [bloch(xyz...) ./ 2 for xyz in tet];
julia> S = hcat([[real(tr(E1 * E2)) for E1 in Es] for E2 in Es]...)
4×4 Matrix{Float64}:
0.25 0.0833333 0.0833333 0.0833333
0.0833333 0.25 0.0833333 0.0833333
0.0833333 0.0833333 0.25 0.0833333
0.0833333 0.0833333 0.0833333 0.25
julia> rank(S)
4
We can easily decompose it as $S = E E^T$ (Emat * Emat'), and then construct (inv_mat) the roundabout way:
julia> eig = eigen(S);
julia> Emat = eig.vectors * diagm(sqrt.(eig.values));
julia> maximum(abs, Emat * Emat' .- S)
4.996003610813204e-16
julia> inv_mat = Emat * inv(Emat' * Emat)^2 * Emat';
julia> maximum(abs, inv(S) .- inv_mat)
1.0658141036401503e-14
For the reconstruction, we'll need the 16-element tensor product POVM (Es2):
julia> Es2 = [kron(E1, E2) for E1 in Es for E2 in Es];
julia> length(Es2)
16
Using the exact probability vector (ps_exact), we can get the exact density matrix:
julia> ps_exact = [real(tr(E12 * rho)) for E12 in Es2];
julia> sum(ps_exact) - 1.0
0.0
julia> rho_exact = sum(ps_exact .* kron(inv_mat, inv_mat) * Es2);
julia> maximum(abs, rho_exact .- rho)
4.130557004330886e-16
julia> fidelity(rho, rho_exact)
1.0000000000000013
If we draw a random sample using the exact probabilities, we can get fairly close:
julia> samples = sample(1:length(ps_exact), Weights(ps_exact), 100_000);
julia> ps_approx = counts(samples, length(ps_exact)) / length(samples);
julia> sum(ps_approx) - 1.0
0.0
julia> maximum(abs, ps_exact .- ps_approx)
0.00214799137076567
julia> rho_approx = sum(ps_approx .* kron(inv_mat, inv_mat) * Es2);
julia> maximum(abs, rho_approx .- rho)
0.009381602086482116
julia> fidelity(rho, rho_approx)
0.9987720679289586
We'll use the same setup for the Pauli-6 POVM, but this time the overlap matrix (S) won't be invertible:
julia> p6 = hcat([eigvecs(sx), eigvecs(sy), eigvecs(sz)]...);
julia> Es = [v * v' ./ 3 for v in eachcol(p6)];
julia> S = hcat([[real(tr(E1 * E2)) for E1 in Es] for E2 in Es]...)
6×6 Matrix{Float64}:
0.111111 0.0 0.0555556 0.0555556 0.0555556 0.0555556
0.0 0.111111 0.0555556 0.0555556 0.0555556 0.0555556
0.0555556 0.0555556 0.111111 0.0 0.0555556 0.0555556
0.0555556 0.0555556 0.0 0.111111 0.0555556 0.0555556
0.0555556 0.0555556 0.0555556 0.0555556 0.111111 0.0
0.0555556 0.0555556 0.0555556 0.0555556 0.0 0.111111
julia> rank(S)
4
julia> inv(S)
ERROR: SingularException(6)
If we disregard the nullspace of (S) by throwing away the corresponding eigenvectors, we can still decompose it into (Emat * Emat'):
julia> eig = eigen(S); eig.values
6-element Vector{Float64}:
2.7755575615628914e-17
2.7755575615628914e-16
0.11111111111111106
0.11111111111111109
0.11111111111111127
0.3333333333333332
julia> Emat = eig.vectors[:, 3:end] * diagm(sqrt.(eig.values[3:end]));
julia> maximum(abs, Emat * Emat' .- S)
2.220446049250313e-16
Once again, we'll need the tensor product POVM (Es2), but this time it will have 36 elements:
julia> Es2 = [kron(E1, E2) for E1 in Es for E2 in Es];
julia> length(Es2)
36
The 1-qubit inverse can no longer be found by inverting (S), but the general formula still holds:
julia> L = diagm([1/(1/3) for _ in 1:length(Es)]);
julia> inv_mat = L * Emat * inv(Emat' * L * Emat)^2 * Emat' * L
6×6 Matrix{Float64}:
5.0 -4.0 0.5 0.5 0.5 0.5
-4.0 5.0 0.5 0.5 0.5 0.5
0.5 0.5 5.0 -4.0 0.5 0.5
0.5 0.5 -4.0 5.0 0.5 0.5
0.5 0.5 0.5 0.5 5.0 -4.0
0.5 0.5 0.5 0.5 -4.0 5.0
According to Huang et al., the inverse mapping for each qubit is given by the inverse of the depolarizing channel: $\mathcal{M}^{-1}(X) = 3X - \operatorname{Tr}(X)\, \mathbb{1}$. From this, it immediately follows that $\mathcal{M}^{-1}(3 E_i) = 9 E_i - \mathbb{1} = \sum_j (9\delta_{ij} - 1)\, E_j$, which is equal to the expected expansion $\sum_j M_{ij}\, E_j$, even though $9\delta_{ij} - 1 \neq M_{ij}$. This discrepancy is fine, because $\{E_i\}$ isn't a basis, so the expansion coefficients aren't unique. Regardless of the choice of expansion, we recover the same operators:
julia> shadow = 3L - ones(size(L))
6×6 Matrix{Float64}:
8.0 -1.0 -1.0 -1.0 -1.0 -1.0
-1.0 8.0 -1.0 -1.0 -1.0 -1.0
-1.0 -1.0 8.0 -1.0 -1.0 -1.0
-1.0 -1.0 -1.0 8.0 -1.0 -1.0
-1.0 -1.0 -1.0 -1.0 8.0 -1.0
-1.0 -1.0 -1.0 -1.0 -1.0 8.0
julia> inv_mat .- shadow
6×6 Matrix{Float64}:
-3.0 -3.0 1.5 1.5 1.5 1.5
-3.0 -3.0 1.5 1.5 1.5 1.5
1.5 1.5 -3.0 -3.0 1.5 1.5
1.5 1.5 -3.0 -3.0 1.5 1.5
1.5 1.5 1.5 1.5 -3.0 -3.0
1.5 1.5 1.5 1.5 -3.0 -3.0
julia> diff = kron(inv_mat, inv_mat) .- kron(shadow, shadow);
julia> maximum(maximum.(abs, diff * Es2))
1.4210854715202004e-14
As expected, this works in the case of exact probabilities:
julia> ps_exact = [real(tr(E12 * rho)) for E12 in Es2];
julia> sum(ps_exact) - 1.0
-4.440892098500626e-16
julia> rho_exact = sum(ps_exact .* kron(inv_mat, inv_mat) * Es2);
julia> maximum(abs, rho_exact .- rho)
3.3306690738754696e-16
julia> fidelity(rho, rho_exact)
1.000000000000001
and also approximate ones:
julia> samples = sample(1:length(ps_exact), Weights(ps_exact), 100_000);
julia> ps_approx = counts(samples, length(ps_exact)) / length(samples);
julia> sum(ps_approx) - 1.0
0.0
julia> maximum(abs, ps_exact .- ps_approx)
0.0011980879766291357
julia> rho_approx = sum(ps_approx .* kron(inv_mat, inv_mat) * Es2);
julia> maximum(abs, rho_approx .- rho)
0.005617328200877472
julia> fidelity(rho, rho_approx)
0.9998102691121425