13.10 Importance Sampling

Importance sampling is a powerful variance reduction technique that exploits the fact that the Monte Carlo estimator

upper F Subscript upper N Baseline equals StartFraction 1 Over upper N EndFraction sigma-summation Underscript i equals 1 Overscript upper N Endscripts StartFraction f left-parenthesis upper X Subscript i Baseline right-parenthesis Over p left-parenthesis upper X Subscript i Baseline right-parenthesis EndFraction

converges more quickly if the samples are taken from a distribution p left-parenthesis x right-parenthesis that is similar to the function f left-parenthesis x right-parenthesis in the integrand. The basic idea is that by concentrating work where the value of the integrand is relatively high, an accurate estimate is computed more efficiently (Figure 13.19).

Figure 13.19: (1) Using a stratified uniform distribution of rays over the hemisphere gives an image with much more variance than (2) applying importance sampling and choosing stratified rays from a distribution based on the BRDF.

For example, suppose we are evaluating the scattering equation, Equation (5.9). Consider what happens when this integral is estimated; if a direction is randomly sampled that is nearly perpendicular to the surface normal, the cosine term will be close to 0. All the expense of evaluating the BSDF and tracing a ray to find the incoming radiance at that sample location will be essentially wasted, as the contribution to the final result will be minuscule. In general, we would be better served if we sampled directions in a way that reduced the likelihood of choosing directions near the horizon. More generally, if directions are sampled from distributions that match other factors of the integrand (the BSDF, the incoming illumination distribution, etc.), efficiency is similarly improved.

As long as the random variables are sampled from a probability distribution that is similar in shape to the integrand, variance is reduced. We will not provide a rigorous proof of this fact but will instead present an informal and intuitive argument. Suppose we’re trying to use Monte Carlo techniques to evaluate some integral integral f left-parenthesis x right-parenthesis normal d x . Since we have freedom in choosing a sampling distribution, consider the effect of using a distribution p left-parenthesis x right-parenthesis proportional-to f left-parenthesis x right-parenthesis , or p left-parenthesis x right-parenthesis equals c f left-parenthesis x right-parenthesis . It is trivial to show that normalization forces

c equals StartFraction 1 Over integral f left-parenthesis x right-parenthesis normal d x EndFraction period

Finding such a PDF requires that we know the value of the integral, which is what we were trying to estimate in the first place. Nonetheless, for the purposes of this example, if we could sample from this distribution, each estimate would have the value

StartFraction f left-parenthesis upper X Subscript i Baseline right-parenthesis Over p left-parenthesis upper X Subscript i Baseline right-parenthesis EndFraction equals StartFraction 1 Over c EndFraction equals integral f left-parenthesis x right-parenthesis normal d x period

Since c is a constant, each estimate has the same value, and the variance is zero! Of course, this is ludicrous since we wouldn’t bother using Monte Carlo if we could integrate f directly. However, if a density p left-parenthesis x right-parenthesis can be found that is similar in shape to f left-parenthesis x right-parenthesis , variance decreases.

Importance sampling can increase variance if a poorly chosen distribution is used. To understand the effect of sampling from a PDF that is a poor match for the integrand, consider using the distribution

p left-parenthesis x right-parenthesis equals StartLayout Enlarged left-brace 1st Row 1st Column 99.01 2nd Column x element-of left-bracket 0 comma .01 right-parenthesis 2nd Row 1st Column .01 2nd Column x element-of left-bracket .01 comma 1 right-parenthesis EndLayout

to compute the estimate of integral f left-parenthesis x right-parenthesis normal d x where

f left-parenthesis x right-parenthesis equals StartLayout Enlarged left-brace 1st Row 1st Column .01 2nd Column x element-of left-bracket 0 comma .01 right-parenthesis 2nd Row 1st Column 1.01 2nd Column x element-of left-bracket .01 comma 1 right-parenthesis period EndLayout

By construction, the value of the integral is one, yet using p left-parenthesis x right-parenthesis to draw samples to compute a Monte Carlo estimate will give a terrible result: almost all of the samples will be in the range left-bracket 0 comma .01 right-parenthesis , where the estimator has the value f left-parenthesis x right-parenthesis slash p left-parenthesis x right-parenthesis almost-equals 0.0001 . For any estimate where none of the samples ends up being outside of this range, the result will be very inaccurate, almost 10,000 times smaller than it should be. Even worse is the case where some samples do end up being taken in the range left-bracket .01 comma 1 right-parenthesis . This will happen rarely, but when it does, we have the combination of a relatively high value of the integrand and a relatively low value of the PDF, f left-parenthesis x right-parenthesis slash p left-parenthesis x right-parenthesis equals 101 . A large number of samples would be necessary to balance out these extremes to reduce variance enough to get a result close to the actual value, 1.

Fortunately, it’s not too hard to find good sampling distributions for importance sampling for many integration problems in graphics. As we implement Integrators in the next three chapters, we will derive a variety of sampling distributions for BSDFs, lights, cameras, and participating media. In many cases, the integrand is the product of more than one function. It can be difficult to construct a PDF that is similar to the complete product, but finding one that is similar to one of the multiplicands is still helpful.

In practice, importance sampling is one of the most frequently used variance reduction techniques in rendering, since it is easy to apply and is very effective when good sampling distributions are used.

13.10.1 Multiple Importance Sampling

Monte Carlo provides tools to estimate integrals of the form integral f left-parenthesis x right-parenthesis normal d x . However, we are frequently faced with integrals that are the product of two or more functions: integral f left-parenthesis x right-parenthesis g left-parenthesis x right-parenthesis normal d x . If we have an importance sampling strategy for f left-parenthesis x right-parenthesis and a strategy for g left-parenthesis x right-parenthesis , which should we use? (Assume that we are not able to combine the two sampling strategies to compute a PDF that is proportional to the product f left-parenthesis x right-parenthesis g left-parenthesis x right-parenthesis that can itself be sampled easily.) As discussed above, a bad choice of sampling distribution can be much worse than just using a uniform distribution.

For example, consider the problem of evaluating direct lighting integrals of the form

upper L Subscript normal o Superscript Baseline left-parenthesis normal p Subscript Baseline comma omega Subscript normal o Baseline right-parenthesis equals integral Underscript script upper S squared Endscripts f left-parenthesis normal p Subscript Baseline comma omega Subscript normal o Baseline comma omega Subscript normal i Baseline right-parenthesis upper L Subscript normal d Superscript Baseline left-parenthesis normal p Subscript Baseline comma omega Subscript normal i Baseline right-parenthesis StartAbsoluteValue cosine theta Subscript normal i Baseline EndAbsoluteValue normal d omega Subscript normal i Baseline period

If we were to perform importance sampling to estimate this integral according to distributions based on either upper L Subscript normal d Superscript or f , one of these two will often perform poorly.

Consider a near-mirror BRDF illuminated by an area light where upper L Subscript normal d Superscript ’s distribution is used to draw samples. Because the BRDF is almost a mirror, the value of the integrand will be close to 0 at all omega Subscript normal i directions except those around the perfect specular reflection direction. This means that almost all of the directions sampled by upper L Subscript normal d Superscript will have 0 contribution, and variance will be quite high. Even worse, as the light source grows large and a larger set of directions is potentially sampled, the value of the PDF decreases, so for the rare directions where the BRDF is non-0 for the sampled direction we will have a large integrand value being divided by a small PDF value. While sampling from the BRDF’s distribution would be a much better approach to this particular case, for diffuse or glossy BRDFs and small light sources, sampling from the BRDF’s distribution can similarly lead to much higher variance than sampling from the light’s distribution.

Unfortunately, the obvious solution of taking some samples from each distribution and averaging the two estimators is little better. Because the variance is additive in this case, this approach doesn’t help—once variance has crept into an estimator, we can’t eliminate it by adding it to another estimator even if it itself has low variance.

Multiple importance sampling (MIS) addresses exactly this issue, with a simple and easy-to-implement technique. The basic idea is that, when estimating an integral, we should draw samples from multiple sampling distributions, chosen in the hope that at least one of them will match the shape of the integrand reasonably well, even if we don’t know which one this will be. MIS provides a method to weight the samples from each technique that can eliminate large variance spikes due to mismatches between the integrand’s value and the sampling density. Specialized sampling routines that only account for unusual special cases are even encouraged, as they reduce variance when those cases occur, with relatively little cost in general. See Figure 14.13 in Chapter 14 to see the reduction in variance from using MIS to compute reflection from direct illumination compared to sampling either just the BSDF or the light’s distribution by itself.

If two sampling distributions p Subscript f and p Subscript g are used to estimate the value of integral f left-parenthesis x right-parenthesis g left-parenthesis x right-parenthesis normal d x , the new Monte Carlo estimator given by MIS is

StartFraction 1 Over n Subscript f Baseline EndFraction sigma-summation Underscript i equals 1 Overscript n Subscript f Baseline Endscripts StartFraction f left-parenthesis upper X Subscript i Baseline right-parenthesis g left-parenthesis upper X Subscript i Baseline right-parenthesis w Subscript f Baseline left-parenthesis upper X Subscript i Baseline right-parenthesis Over p Subscript f Baseline left-parenthesis upper X Subscript i Baseline right-parenthesis EndFraction plus StartFraction 1 Over n Subscript g Baseline EndFraction sigma-summation Underscript j equals 1 Overscript n Subscript g Baseline Endscripts StartFraction f left-parenthesis upper Y Subscript j Baseline right-parenthesis g left-parenthesis upper Y Subscript j Baseline right-parenthesis w Subscript g Baseline left-parenthesis upper Y Subscript j Baseline right-parenthesis Over p Subscript g Baseline left-parenthesis upper Y Subscript j Baseline right-parenthesis EndFraction comma

where n Subscript f is the number of samples taken from the p Subscript f distribution method, n Subscript g is the number of samples taken from p Subscript g , and w Subscript f and w Subscript g are special weighting functions chosen such that the expected value of this estimator is the value of the integral of f left-parenthesis x right-parenthesis g left-parenthesis x right-parenthesis .

The weighting functions take into account all of the different ways that a sample upper X Subscript i or upper Y Subscript j could have been generated, rather than just the particular one that was actually used. A good choice for this weighting function is the balance heuristic:

w Subscript s Baseline left-parenthesis x right-parenthesis equals StartFraction n Subscript s Baseline p Subscript s Baseline left-parenthesis x right-parenthesis Over sigma-summation Underscript i Endscripts n Subscript i Baseline p Subscript i Baseline left-parenthesis x right-parenthesis EndFraction period

The balance heuristic is a provably good way to weight samples to reduce variance.

Consider the effect of this term for the case where a sample upper X has been drawn from the p Subscript f distribution at a point where the value p Subscript f Baseline left-parenthesis upper X right-parenthesis is relatively low. Assuming that p Subscript f is a good match for the shape of f left-parenthesis x right-parenthesis , then the value of f left-parenthesis upper X right-parenthesis will also be relatively low. But suppose that g left-parenthesis upper X right-parenthesis has a relatively high value. The standard importance sampling estimate

StartFraction f left-parenthesis upper X right-parenthesis g left-parenthesis upper X right-parenthesis Over p Subscript f Baseline left-parenthesis upper X right-parenthesis EndFraction

will have a very large value due to p Subscript f Baseline left-parenthesis upper X right-parenthesis being small, and we will have high variance.

With the balance heuristic, the contribution of upper X will be

StartFraction f left-parenthesis upper X right-parenthesis g left-parenthesis upper X right-parenthesis w Subscript f Baseline left-parenthesis upper X right-parenthesis Over p Subscript f Baseline left-parenthesis upper X right-parenthesis EndFraction equals StartFraction f left-parenthesis upper X right-parenthesis g left-parenthesis upper X right-parenthesis n Subscript f Baseline p Subscript f Baseline left-parenthesis upper X right-parenthesis Over p Subscript f Baseline left-parenthesis upper X right-parenthesis left-parenthesis n Subscript f Baseline p Subscript f Baseline left-parenthesis upper X right-parenthesis plus n Subscript g Baseline p Subscript g Baseline left-parenthesis upper X right-parenthesis right-parenthesis EndFraction equals StartFraction f left-parenthesis upper X right-parenthesis g left-parenthesis upper X right-parenthesis n Subscript f Baseline Over n Subscript f Baseline p Subscript f Baseline left-parenthesis upper X right-parenthesis plus n Subscript g Baseline p Subscript g Baseline left-parenthesis upper X right-parenthesis EndFraction period

As long as p Subscript g ’s distribution is a reasonable match for g left-parenthesis x right-parenthesis , then the denominator won’t be too small thanks to the n Subscript g Baseline p Subscript g Baseline left-parenthesis upper X right-parenthesis term, and the huge variance spike is eliminated, even though upper X was sampled from a distribution that was in fact a poor match for the integrand. The fact that another distribution will also be used to generate samples and that this new distribution will likely find a large value of the integrand at upper X are brought together in the weighting term to reduce the variance problem.

Here we provide an implementation of the balance heuristic for the specific case of two distributions p Subscript f and p Subscript g . We will not need a more general multidistribution case in pbrt.

<<Sampling Inline Functions>>+=  
inline Float BalanceHeuristic(int nf, Float fPdf, int ng, Float gPdf) { return (nf * fPdf) / (nf * fPdf + ng * gPdf); }

In practice, the power heuristic often reduces variance even further. For an exponent beta , the power heuristic is

w Subscript s Baseline left-parenthesis x right-parenthesis equals StartFraction left-parenthesis n Subscript s Baseline p Subscript s Baseline left-parenthesis x right-parenthesis right-parenthesis Superscript beta Baseline Over sigma-summation Underscript i Endscripts left-parenthesis n Subscript i Baseline p Subscript i Baseline left-parenthesis x right-parenthesis right-parenthesis Superscript beta Baseline EndFraction period

Veach determined empirically that beta equals 2 is a good value. We have beta equals 2 hard-coded into the implementation here.

<<Sampling Inline Functions>>+= 
inline Float PowerHeuristic(int nf, Float fPdf, int ng, Float gPdf) { Float f = nf * fPdf, g = ng * gPdf; return (f * f) / (f * f + g * g); }