13.10 Importance Sampling
Importance sampling is a powerful variance reduction technique that exploits the fact that the Monte Carlo estimator
converges more quickly if the samples are taken from a distribution that is similar to the function in the integrand. The basic idea is that by concentrating work where the value of the integrand is relatively high, an accurate estimate is computed more efficiently (Figure 13.19).
For example, suppose we are evaluating the scattering equation, Equation (5.9). Consider what happens when this integral is estimated; if a direction is randomly sampled that is nearly perpendicular to the surface normal, the cosine term will be close to 0. All the expense of evaluating the BSDF and tracing a ray to find the incoming radiance at that sample location will be essentially wasted, as the contribution to the final result will be minuscule. In general, we would be better served if we sampled directions in a way that reduced the likelihood of choosing directions near the horizon. More generally, if directions are sampled from distributions that match other factors of the integrand (the BSDF, the incoming illumination distribution, etc.), efficiency is similarly improved.
As long as the random variables are sampled from a probability distribution that is similar in shape to the integrand, variance is reduced. We will not provide a rigorous proof of this fact but will instead present an informal and intuitive argument. Suppose we’re trying to use Monte Carlo techniques to evaluate some integral . Since we have freedom in choosing a sampling distribution, consider the effect of using a distribution , or . It is trivial to show that normalization forces
Finding such a PDF requires that we know the value of the integral, which is what we were trying to estimate in the first place. Nonetheless, for the purposes of this example, if we could sample from this distribution, each estimate would have the value
Since is a constant, each estimate has the same value, and the variance is zero! Of course, this is ludicrous since we wouldn’t bother using Monte Carlo if we could integrate directly. However, if a density can be found that is similar in shape to , variance decreases.
Importance sampling can increase variance if a poorly chosen distribution is used. To understand the effect of sampling from a PDF that is a poor match for the integrand, consider using the distribution
to compute the estimate of where
By construction, the value of the integral is one, yet using to draw samples to compute a Monte Carlo estimate will give a terrible result: almost all of the samples will be in the range , where the estimator has the value . For any estimate where none of the samples ends up being outside of this range, the result will be very inaccurate, almost 10,000 times smaller than it should be. Even worse is the case where some samples do end up being taken in the range . This will happen rarely, but when it does, we have the combination of a relatively high value of the integrand and a relatively low value of the PDF, . A large number of samples would be necessary to balance out these extremes to reduce variance enough to get a result close to the actual value, 1.
Fortunately, it’s not too hard to find good sampling distributions for importance sampling for many integration problems in graphics. As we implement Integrators in the next three chapters, we will derive a variety of sampling distributions for BSDFs, lights, cameras, and participating media. In many cases, the integrand is the product of more than one function. It can be difficult to construct a PDF that is similar to the complete product, but finding one that is similar to one of the multiplicands is still helpful.
In practice, importance sampling is one of the most frequently used variance reduction techniques in rendering, since it is easy to apply and is very effective when good sampling distributions are used.
13.10.1 Multiple Importance Sampling
Monte Carlo provides tools to estimate integrals of the form . However, we are frequently faced with integrals that are the product of two or more functions: . If we have an importance sampling strategy for and a strategy for , which should we use? (Assume that we are not able to combine the two sampling strategies to compute a PDF that is proportional to the product that can itself be sampled easily.) As discussed above, a bad choice of sampling distribution can be much worse than just using a uniform distribution.
For example, consider the problem of evaluating direct lighting integrals of the form
If we were to perform importance sampling to estimate this integral according to distributions based on either or , one of these two will often perform poorly.
Consider a near-mirror BRDF illuminated by an area light where ’s distribution is used to draw samples. Because the BRDF is almost a mirror, the value of the integrand will be close to 0 at all directions except those around the perfect specular reflection direction. This means that almost all of the directions sampled by will have 0 contribution, and variance will be quite high. Even worse, as the light source grows large and a larger set of directions is potentially sampled, the value of the PDF decreases, so for the rare directions where the BRDF is non-0 for the sampled direction we will have a large integrand value being divided by a small PDF value. While sampling from the BRDF’s distribution would be a much better approach to this particular case, for diffuse or glossy BRDFs and small light sources, sampling from the BRDF’s distribution can similarly lead to much higher variance than sampling from the light’s distribution.
Unfortunately, the obvious solution of taking some samples from each distribution and averaging the two estimators is little better. Because the variance is additive in this case, this approach doesn’t help—once variance has crept into an estimator, we can’t eliminate it by adding it to another estimator even if it itself has low variance.
Multiple importance sampling (MIS) addresses exactly this issue, with a simple and easy-to-implement technique. The basic idea is that, when estimating an integral, we should draw samples from multiple sampling distributions, chosen in the hope that at least one of them will match the shape of the integrand reasonably well, even if we don’t know which one this will be. MIS provides a method to weight the samples from each technique that can eliminate large variance spikes due to mismatches between the integrand’s value and the sampling density. Specialized sampling routines that only account for unusual special cases are even encouraged, as they reduce variance when those cases occur, with relatively little cost in general. See Figure 14.13 in Chapter 14 to see the reduction in variance from using MIS to compute reflection from direct illumination compared to sampling either just the BSDF or the light’s distribution by itself.
If two sampling distributions and are used to estimate the value of , the new Monte Carlo estimator given by MIS is
where is the number of samples taken from the distribution method, is the number of samples taken from , and and are special weighting functions chosen such that the expected value of this estimator is the value of the integral of .
The weighting functions take into account all of the different ways that a sample or could have been generated, rather than just the particular one that was actually used. A good choice for this weighting function is the balance heuristic:
The balance heuristic is a provably good way to weight samples to reduce variance.
Consider the effect of this term for the case where a sample has been drawn from the distribution at a point where the value is relatively low. Assuming that is a good match for the shape of , then the value of will also be relatively low. But suppose that has a relatively high value. The standard importance sampling estimate
will have a very large value due to being small, and we will have high variance.
With the balance heuristic, the contribution of will be
As long as ’s distribution is a reasonable match for , then the denominator won’t be too small thanks to the term, and the huge variance spike is eliminated, even though was sampled from a distribution that was in fact a poor match for the integrand. The fact that another distribution will also be used to generate samples and that this new distribution will likely find a large value of the integrand at are brought together in the weighting term to reduce the variance problem.
Here we provide an implementation of the balance heuristic for the specific case of two distributions and . We will not need a more general multidistribution case in pbrt.
In practice, the power heuristic often reduces variance even further. For an exponent , the power heuristic is