Sampling from a different Gaussian

2017-07-14
One way to evaluate integral ratios of the form
$$\langle f \rangle = \frac{\int f(x) \, g(x) \, \mathrm{d}x}{\int g(x) \, \mathrm{d}x}$$
is to use Monte Carlo integration.
If $g$ is proportional to the pdf of a Gaussian distribution, then sampling is very easy, with many languages and environments offering a function for rejection-free sampling (e.g. Julia's `randn`).
For example, we choose the function $f(x) = (x - \mu)^2$ and a Gaussian distribution with mean $\mu = 2.5$ and standard deviation $\sigma = 1.5$. From the definition of the variance ($\sigma^2 = \langle (x - \mu)^2 \rangle$), we see that $\langle f \rangle = \sigma^2 = 2.25$. If we perform the average for $N$ samples and bin the results over many trials, we obtain the distribution of the estimates. [Figure: binned Monte Carlo estimates of $\langle f \rangle$ over many trials.]
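This direct estimate can be sketched in Julia; the seed, sample size, and variable names here are illustrative choices, not taken from the original:

```julia
using Random, Statistics

Random.seed!(0)

# Target Gaussian: mean 2.5, standard deviation 1.5.
μ, σ = 2.5, 1.5

# The function whose average we want; its mean under g is the variance.
f(x) = (x - μ)^2

# Draw samples from the Gaussian and average f over them.
SAMPLE_SIZE = 10^6
xs = σ * randn(SAMPLE_SIZE) .+ μ
estimate = mean(f.(xs))

# estimate should be close to σ^2 = 2.25.
```

Repeating this with fresh samples and binning the `estimate` values reproduces the kind of histogram described above.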
Suppose that we want to evaluate the same integral ratio as before, but by sampling from a different Gaussian distribution with mean $\mu'$ and standard deviation $\sigma'$. We introduce
$$g'(x') = e^{-\frac{(x' - \mu')^2}{2 \sigma'^2}},$$
so that sampling from it is just as easy as sampling from $g$. This gives us
$$\langle f \rangle = \frac{\int f(x) \, g(x) \, \mathrm{d}x}{\int g(x) \, \mathrm{d}x},$$
to which we apply a change of variables from $x$ to $x'$, yielding
$$\langle f \rangle = \frac{\int f(h(x')) \, g'(x') \, \mathrm{d}x'}{\int g'(x') \, \mathrm{d}x'},$$
where
$$h(x') = \frac{\sigma}{\sigma'} \left( x' - \mu' \right) + \mu$$
is chosen so that $g(h(x')) = g'(x')$. We don't need to worry about the constant Jacobian factor $\frac{\mathrm{d}x}{\mathrm{d}x'} = \frac{\sigma}{\sigma'}$, because it would appear in both the numerator and denominator.
Hence, we see that we may estimate $\langle f \rangle$ by averaging $f \circ h$ over samples drawn from $g'$. We may verify that this is the case by sampling $x'_i$ from the Gaussian with $\mu' = -1.2$ and $\sigma' = 2.8$ and evaluating
$$\frac{1}{N} \sum_{i=1}^{N} f(h(x'_i)).$$
Binning the results from several trials, we obtain the distribution of the estimates. [Figure: binned estimates obtained by sampling from $g'$ and transforming.] As expected, this distribution looks very similar to the previous one.
From an implementation perspective, this result is incredibly trivial. Since in practice we obtain random values from a standard Gaussian distribution (with zero mean and unit variance), sampling from a Gaussian with specific parameters amounts to a scaling and a shift:
xs1 = 1.5 * randn(SAMPLE_SIZE) .+ 2.5
It is then clear that sampling with any other parameters and performing the described transformation is functionally identical:
xs2 = 2.8 * randn(SAMPLE_SIZE) .+ (-1.2)
xs1 = (xs2 .- (-1.2)) * 1.5 / 2.8 .+ 2.5
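To make the equivalence concrete, here is a self-contained sketch (with an arbitrary seed and sample size) showing that averaging $f$ over samples produced either way estimates the same quantity:

```julia
using Random, Statistics

Random.seed!(0)
SAMPLE_SIZE = 10^6
f(x) = (x - 2.5)^2

# Route 1: sample directly from the target Gaussian (mean 2.5, std 1.5).
xs1 = 1.5 * randn(SAMPLE_SIZE) .+ 2.5

# Route 2: sample from the other Gaussian (mean -1.2, std 2.8),
# then shift and scale the samples into the target Gaussian.
xs2 = 2.8 * randn(SAMPLE_SIZE) .+ (-1.2)
xs1b = (xs2 .- (-1.2)) * 1.5 / 2.8 .+ 2.5

# Both averages estimate the same integral ratio, σ^2 = 2.25.
est1 = mean(f.(xs1))
est2 = mean(f.(xs1b))
```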
The case of several independent Gaussians is a straightforward generalization of the above. In principle, multiple correlated Gaussians may always be transformed into independent Gaussians, so we may stop here. However, it is interesting to see what the transformation for the sampled values is in terms of the original coordinates.
We now consider a distribution with mean vector $\vec{\mu}$ and covariance matrix $\Sigma$, whose pdf is proportional to
$$g(\vec{x}) = e^{-\frac{1}{2} (\vec{x} - \vec{\mu})^\mathrm{T} \Sigma^{-1} (\vec{x} - \vec{\mu})}.$$
For convenience, we introduce the orthogonal matrix $U$ and diagonal matrix $D$ that diagonalize it as $\Sigma = U D U^\mathrm{T}$. If we have another distribution with parameters $\vec{\mu}'$ and $\Sigma' = U' D' U'^\mathrm{T}$, then the choice
$$\vec{h}(\vec{x}') = U D^{\frac{1}{2}} D'^{-\frac{1}{2}} U'^\mathrm{T} \left( \vec{x}' - \vec{\mu}' \right) + \vec{\mu}$$
results in
$$g(\vec{h}(\vec{x}')) = g'(\vec{x}').$$
This is essentially identical to the univariate $h$, but with the introduction of the transformation matrices $U$ and $U'$ to decouple the coordinates.
As before, we have that
$$\langle f \rangle = \frac{\int f(\vec{h}(\vec{x}')) \, g'(\vec{x}') \, \mathrm{d}\vec{x}'}{\int g'(\vec{x}') \, \mathrm{d}\vec{x}'},$$
where the constant Jacobian determinant of the transformation again cancels between the numerator and denominator. Again, this is not surprising if we consider what is done operationally when sampling from independent standard Gaussian distributions.
We may check this result with concrete choices of $\vec{\mu}$, $\Sigma$ and $\vec{\mu}'$, $\Sigma'$. [Figure: distributions of the resulting estimates.] The grey curve is from sampling $g$, while the black curve is from sampling $g'$ and then transforming the sampled coordinates. The exact result (dotted) is given by evaluating the integral ratio analytically. So far, we've seen that sampling from any Gaussian is as good as any other.
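The multivariate transformation can be sketched in Julia using the eigendecompositions $\Sigma = U D U^\mathrm{T}$ and $\Sigma' = U' D' U'^\mathrm{T}$; the parameter values below are arbitrary stand-ins, not the ones used for the plots:

```julia
using LinearAlgebra, Random, Statistics

Random.seed!(1234)

# Target distribution parameters (arbitrary example values).
μ = [1.0, 2.0]; Σ = [2.0 0.5; 0.5 1.0]
# Distribution we actually sample from (also arbitrary).
μp = [0.0, -1.0]; Σp = [1.0 0.3; 0.3 2.0]

# Diagonalize both covariance matrices: Σ = U D Uᵀ.
D, U = eigen(Symmetric(Σ))
Dp, Up = eigen(Symmetric(Σp))

# Transformation matrix M = U D^(1/2) D'^(-1/2) U'ᵀ.
M = U * Diagonal(sqrt.(D ./ Dp)) * Up'

# Sample columns from N(μ', Σ') and map each point through h.
n = 10^5
xs_p = Up * Diagonal(sqrt.(Dp)) * randn(2, n) .+ μp
xs = M * (xs_p .- μp) .+ μ

# The transformed samples should have mean ≈ μ and covariance ≈ Σ.
m = mean(xs, dims=2)
C = cov(xs, dims=2)
```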
Application: Finite difference
Consider the function
$$F(\lambda) = \frac{\int f(x) \, g_\lambda(x) \, \mathrm{d}x}{\int g_\lambda(x) \, \mathrm{d}x},$$
where $\lambda$ is a scalar or a vector parameterizing the Gaussian $g_\lambda$. We may approximate its derivative using a finite difference formula:
$$\frac{\mathrm{d}F}{\mathrm{d}\lambda} \approx \frac{F(\lambda + \Delta) - F(\lambda - \Delta)}{2 \Delta}.$$
More elaborately, its log derivative is given by
$$\frac{\mathrm{d} \log F}{\mathrm{d}\lambda} = \frac{1}{F(\lambda)} \frac{\mathrm{d}F}{\mathrm{d}\lambda} \approx \frac{F(\lambda + \Delta) - F(\lambda - \Delta)}{2 \Delta \, F(\lambda)}.$$
This is equivalently written as
$$\frac{\mathrm{d} \log F}{\mathrm{d}\lambda} \approx \frac{\log F(\lambda + \Delta) - \log F(\lambda - \Delta)}{2 \Delta}.$$
Suppose that sampling is the most expensive part of the calculation and so we only wish to sample values from a single distribution. From the above discussion, we know that there are functions $h_+$ and $h_-$ such that
$$\frac{\mathrm{d}F}{\mathrm{d}\lambda} \approx \frac{1}{2 \Delta} \left( \frac{\int f(h_+(x)) \, g_\lambda(x) \, \mathrm{d}x}{\int g_\lambda(x) \, \mathrm{d}x} - \frac{\int f(h_-(x)) \, g_\lambda(x) \, \mathrm{d}x}{\int g_\lambda(x) \, \mathrm{d}x} \right),$$
where the only approximation is due to the finite difference. Specifically,
$$\frac{\int f(h_\pm(x)) \, g_\lambda(x) \, \mathrm{d}x}{\int g_\lambda(x) \, \mathrm{d}x} = F(\lambda \pm \Delta),$$
where $h_\pm$ are the appropriate coordinate transformations from $g_\lambda$ to $g_{\lambda \pm \Delta}$. Hence, we may estimate this derivative by sampling from just one distribution rather than three.
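Here is a univariate sketch of this single-distribution trick; the function, parameters, and step size are arbitrary illustrative choices. With $g_\lambda$ a Gaussian of mean $\lambda$ and fixed standard deviation $\sigma$, the transformations reduce to shifts, $h_\pm(x) = x \pm \Delta$, and for $f(x) = x^2$ we know $F(\lambda) = \lambda^2 + \sigma^2$, so $\frac{\mathrm{d}F}{\mathrm{d}\lambda} = 2\lambda$ exactly:

```julia
using Random, Statistics

Random.seed!(0)

# Toy setup: g_λ is Gaussian with mean λ, fixed std σ; f(x) = x^2.
λ, σ, Δ = 1.0, 1.5, 1e-2
f(x) = x^2

# Since only the mean changes, h±(x) = x ± Δ.
n = 10^5
xs = σ * randn(n) .+ λ

# Finite difference from ONE set of samples, evaluated at shifted points.
deriv = (mean(f.(xs .+ Δ)) - mean(f.(xs .- Δ))) / (2Δ)

# deriv estimates dF/dλ = 2λ = 2.0.
```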
For example, consider a multivariate Gaussian $g_\lambda$ whose parameters depend on $\lambda$ and match the distribution of the previous section at the point of interest. If we choose the same function $f$, then $F(\lambda)$ there is the value we were estimating in the previous section. Now we will estimate $\frac{\mathrm{d}F}{\mathrm{d}\lambda}$ with a small step $\Delta$. The distributions that we get are: [Figure: distributions of the finite difference estimates.] The black curve is generated by sampling only from $g_\lambda$ (and not $g_{\lambda + \Delta}$ or $g_{\lambda - \Delta}$). The dotted line is found by applying the finite difference method to the exact result. The grey curve (which looks like a horizontal line in the above plot) shows what happens if we sample from all three distributions separately. To really appreciate the difference between the two, it helps to zoom out: [Figure: the same distributions on a wider scale.]
Surprisingly, it seems that we not only save on sampling effort using this technique, but we also significantly reduce the standard error of the mean! This reduction might be due to a favourable cancellation of errors that is only possible when the sampled points are the same.
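This cancellation can be checked with a univariate sketch (the same toy setup as before: a Gaussian of mean $\lambda$ with $f(x) = x^2$, all sizes arbitrary), comparing the spread of the shared-sample and independent-sample estimators over repeated trials:

```julia
using Random, Statistics

Random.seed!(0)

λ, σ, Δ = 1.0, 1.5, 1e-2
f(x) = x^2
n, trials = 10^4, 100

est_shared = Float64[]   # one sample set per trial, shifted both ways
est_indep = Float64[]    # separate sample sets for λ + Δ and λ - Δ
for _ in 1:trials
    xs = σ * randn(n) .+ λ
    push!(est_shared, (mean(f.(xs .+ Δ)) - mean(f.(xs .- Δ))) / (2Δ))

    xp = σ * randn(n) .+ (λ + Δ)
    xm = σ * randn(n) .+ (λ - Δ)
    push!(est_indep, (mean(f.(xp)) - mean(f.(xm))) / (2Δ))
end

# Both estimate dF/dλ = 2λ = 2.0, but with very different spreads:
# with shared samples the difference (x + Δ)^2 - (x - Δ)^2 = 4xΔ is
# exact per point, while independent samples divide noisy fluctuations
# of the two averages by the small factor 2Δ.
```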
We may also try to evaluate the derivative of a more convoluted expression. Starting from the integral ratio $F(\lambda)$, we may always obtain its log derivative by setting
$$\frac{\mathrm{d} \log F}{\mathrm{d}\lambda} \approx \frac{F(\lambda + \Delta) - F(\lambda - \Delta)}{2 \Delta \, F(\lambda)},$$
where all three quantities on the right-hand side may be estimated from the same set of samples.
For our example, we'll use the same function $f$ and the same Gaussian $g_\lambda$ as before. We obtain a simple form for the expression. Because the normalization of $g_\lambda$ is determined by the determinant of $\Sigma$, which happens to be constant in $\lambda$, it contributes nothing to the derivative. Sampling the finite difference at a single $\lambda$ yields the following distributions: [Figure: estimates using samples from a single distribution.] [Figure: estimates using samples from all distributions separately.] Once again, we see an improvement if we only sample once.
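As a univariate sketch of the log derivative (arbitrary toy parameters, not the multivariate example from the plots): with $g_\lambda$ a Gaussian of mean $\lambda$ and $f(x) = x^2$, we have $F(\lambda) = \lambda^2 + \sigma^2$, so the exact log derivative is $2\lambda / (\lambda^2 + \sigma^2)$, and all three averages in the finite difference formula can reuse one sample set:

```julia
using Random, Statistics

Random.seed!(0)

λ, σ, Δ = 1.0, 1.5, 1e-2
f(x) = x^2
n = 10^5
xs = σ * randn(n) .+ λ

# All three averages reuse the same samples.
Fp = mean(f.(xs .+ Δ))   # estimates F(λ + Δ)
Fm = mean(f.(xs .- Δ))   # estimates F(λ - Δ)
F0 = mean(f.(xs))        # estimates F(λ)

logderiv = (Fp - Fm) / (2Δ * F0)

# Exact value: 2λ / (λ^2 + σ^2) = 2 / 3.25 ≈ 0.615.
```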
It appears that although sampling from one Gaussian and then swapping it out for a different one is formally a useless procedure, there do exist situations where it provides some benefit.