2.2 Improving Efficiency
Given an unbiased Monte Carlo estimator, we are in the fortunate position of having a reliable relationship between the number of samples taken and variance (and thus, error). If we have an unacceptably noisy rendered image, increasing the number of samples will reduce error in a predictable way, and—given enough computation—an image of sufficient quality can be generated.
However, computation takes time, and often there is not enough of it. The deadline for a movie may be at hand, or the sixtieth-of-a-second time slice in a real-time renderer may be coming to an end. Given the consequentially limited number of samples, the only option for variance reduction is to find ways to make more of the samples that can be taken. Fortunately, a variety of techniques have been developed to improve the basic Monte Carlo estimator by making the most of the samples that are taken; here we will discuss the most important ones that are used in pbrt.
2.2.1 Stratified Sampling
A classic and effective family of techniques for variance reduction is based on the careful placement of samples in order to better capture the features of the integrand (or, more accurately, to be less likely to miss important features). These techniques are used extensively in pbrt. Stratified sampling decomposes the integration domain into regions and places samples in each one; here we will analyze that approach in terms of its variance reduction properties. Later, in Section 8.2.1, we will return with machinery based on Fourier analysis that provides further insights about it.
Stratified sampling subdivides the integration domain $\Lambda$ into $n$ nonoverlapping regions $\Lambda_1, \Lambda_2, \ldots, \Lambda_n$. Each region is called a stratum, and they must completely cover the original domain:
$$\bigcup_{i=1}^{n} \Lambda_i = \Lambda.$$
To draw samples from $\Lambda$, we will draw $n_i$ samples from each $\Lambda_i$, according to densities $p_i$ inside each stratum. A simple example is supersampling a pixel. With stratified sampling, the area around a pixel is divided into a $k \times k$ grid, and a sample is drawn uniformly within each grid cell. This is better than taking $k^2$ independent random samples, since the sample locations are less likely to clump together. Here we will show why this technique reduces variance.
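Before turning to that analysis, here is a minimal sketch of generating such jittered samples over the unit square. The helper name and interface are hypothetical and use only standard C++; they are not pbrt’s sampler classes.

    #include <random>
    #include <vector>

    struct Point2f { float x, y; };

    // Hypothetical helper: draw one uniform sample inside each cell of a
    // k x k grid over [0,1)^2 -- that is, stratified ("jittered") samples.
    std::vector<Point2f> JitteredSamples(int k, std::mt19937 &rng) {
        std::uniform_real_distribution<float> u(0.f, 1.f);
        std::vector<Point2f> samples;
        samples.reserve(k * k);
        for (int y = 0; y < k; ++y)
            for (int x = 0; x < k; ++x)
                // Place a random point within each stratum (grid cell).
                samples.push_back({(x + u(rng)) / k, (y + u(rng)) / k});
        return samples;
    }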
Within a single stratum $\Lambda_i$, the Monte Carlo estimate is
$$F_i = \frac{1}{n_i} \sum_{j=1}^{n_i} \frac{f(X_{i,j})}{p_i(X_{i,j})},$$
where $X_{i,j}$ is the $j$th sample drawn from density $p_i$. The overall estimate is $F = \sum_{i=1}^{n} v_i F_i$, where $v_i$ is the fractional volume of stratum $i$ ($v_i \in (0, 1]$).
The true value of the integrand in stratum $i$ is
$$\mu_i = \mathrm{E}\left[f(X_{i,j})\right] = \frac{1}{v_i} \int_{\Lambda_i} f(x)\,\mathrm{d}x,$$
and the variance in this stratum is
$$\sigma_i^2 = \frac{1}{v_i} \int_{\Lambda_i} \left(f(x) - \mu_i\right)^2 \mathrm{d}x.$$
Thus, with $n_i$ samples in the stratum, the variance of the per-stratum estimator is $\sigma_i^2 / n_i$. This shows that the variance of the overall estimator is
$$V[F] = V\!\left[\sum_i v_i F_i\right] = \sum_i V[v_i F_i] = \sum_i v_i^2\, V[F_i] = \sum_i \frac{v_i^2 \sigma_i^2}{n_i}.$$
If we make the reasonable assumption that the number of samples $n_i$ is proportional to the volume $v_i$, then we have $n_i = v_i n$, and the variance of the overall estimator is
$$V[F_n] = \frac{1}{n} \sum_i v_i \sigma_i^2.$$
To compare this result to the variance without stratification, we note that choosing an unstratified sample is equivalent to choosing a random stratum $I$ according to the discrete probability distribution defined by the volumes $v_i$ and then choosing a random sample $X$ in $\Lambda_I$. In this sense, $X$ is chosen conditionally on $I$, so it can be shown using conditional probability that
$$V[F] = \frac{1}{n}\left[\sum_i v_i \sigma_i^2 + \sum_i v_i \left(\mu_i - Q\right)^2\right], \tag{2.12}$$
where $Q$ is the mean of $f$ over the whole domain $\Lambda$.
There are two things to notice about Equation (2.12). First, we know that the right-hand sum must be nonnegative, since variance is always nonnegative. Second, it demonstrates that stratified sampling can never increase variance. Stratification always reduces variance unless the right-hand sum is exactly 0. It can only be 0 when the function $f$ has the same mean over each stratum $\Lambda_i$. For stratified sampling to work best, we would like to maximize the right-hand sum, so it is best to make the strata have means that are as unequal as possible. This explains why compact strata are desirable if one does not know anything about the function $f$. If the strata are wide, they will contain more variation and will have $\mu_i$ closer to the true mean $Q$.
Figure 2.1 shows the effect of using stratified sampling versus an independent random distribution for sampling when rendering an image that includes glossy reflection. There is a reasonable reduction in variance at essentially no cost in running time.
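The same effect is easy to reproduce numerically. The following small experiment, a sketch that uses a hypothetical integrand $f(x) = x^2$ rather than anything from pbrt, repeatedly estimates $\int_0^1 f(x)\,\mathrm{d}x$ with and without stratification and reports the empirical variance of the two estimators.

    #include <cmath>
    #include <cstdio>
    #include <random>

    int main() {
        auto f = [](double x) { return x * x; };    // hypothetical integrand
        const int n = 16;                           // samples (and strata) per estimate
        const int trials = 100000;                  // number of independent estimates
        std::mt19937 rng(7);
        std::uniform_real_distribution<double> u(0., 1.);
        double mean[2] = {0, 0}, m2[2] = {0, 0};
        for (int t = 0; t < trials; ++t) {
            double indep = 0, strat = 0;
            for (int i = 0; i < n; ++i) {
                indep += f(u(rng)) / n;             // independent uniform samples
                strat += f((i + u(rng)) / n) / n;   // one sample per stratum
            }
            double est[2] = {indep, strat};
            for (int k = 0; k < 2; ++k) {           // running mean and variance
                double d = est[k] - mean[k];
                mean[k] += d / (t + 1);
                m2[k] += d * (est[k] - mean[k]);
            }
        }
        std::printf("independent variance %.2e, stratified variance %.2e\n",
                    m2[0] / (trials - 1), m2[1] / (trials - 1));
    }

On a smooth integrand like this one, stratification reduces variance substantially; the exact factor depends on the integrand and the number of strata.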
The main downside of stratified sampling is that it suffers from the same “curse of dimensionality” as standard numerical quadrature. Full stratification in $D$ dimensions with $S$ strata per dimension requires $S^D$ samples, which quickly becomes prohibitive. Fortunately, it is often possible to stratify some of the dimensions independently and then randomly associate samples from different dimensions; this approach will be used in Section 8.5. Choosing which dimensions are stratified should be done in a way that stratifies dimensions that tend to be most highly correlated in their effect on the value of the integrand (Owen 1998).
2.2.2 Importance Sampling
Importance sampling is a powerful variance reduction technique that exploits the fact that the Monte Carlo estimator
$$F_n = \frac{1}{n} \sum_{i=1}^{n} \frac{f(X_i)}{p(X_i)}$$
converges more quickly if the samples are taken from a distribution $p(x)$ that is similar to the function $f(x)$ in the integrand. In this case, samples are more likely to be taken when the magnitude of the integrand is relatively large. Importance sampling is one of the most frequently used variance reduction techniques in rendering, since it is easy to apply and is very effective when good sampling distributions are used.
To see why such sampling distributions reduce error, first consider the effect of using a distribution $p(x) \propto f(x)$, or $p(x) = c\,f(x)$. It is trivial to show that normalization of the PDF requires that
$$c = \frac{1}{\int f(x)\,\mathrm{d}x}.$$
Finding such a PDF requires that we know the value of the integral, which is what we were trying to estimate in the first place. Nonetheless, if we could sample from this distribution, each term of the sum in the estimator would have the value
$$\frac{f(X_i)}{p(X_i)} = \frac{1}{c} = \int f(x)\,\mathrm{d}x.$$
The variance of the estimator is zero! Of course, this is ludicrous since we would not bother using Monte Carlo if we could integrate $f$ directly. However, if a density $p(x)$ can be found that is similar in shape to $f(x)$, variance is reduced.
As a more realistic example, consider the narrow Gaussian function plotted in Figure 2.2(a). Its value is close to zero over most of the domain. Samples taken where the function is nearly zero are of little help in estimating the value of the integral, since they give no information about the magnitude of the bump in the function’s value around its center. With uniform sampling and the basic Monte Carlo estimator, variance is correspondingly high.
If samples are instead drawn from a piecewise-constant distribution that concentrates most of its probability around the bump,
which is plotted in Figure 2.2(b), and the estimator from Equation (2.7) is used instead, then variance is reduced substantially. A representative set of 6 points from this distribution is shown in Figure 2.2(c); we can see that most of the evaluations of $f$ are in the interesting region where it is not nearly zero.
Importance sampling can increase variance if a poorly chosen distribution is used, however. Consider instead estimating the integral of the Gaussian function using a PDF that increases the probability of sampling the function where its value is close to zero and decreases the probability of sampling it where its magnitude is larger.
Not only does this PDF generate fewer samples where the integrand is large, but when it does, the magnitude of $f(x)/p(x)$ in the Monte Carlo estimator will be especially high, since $p(x)$ is small in that region. The result is much higher variance than uniform sampling, and far higher variance than the better PDF above. In the context of Monte Carlo integration for rendering, where evaluating the integrand generally involves the expense of tracing a ray, it is desirable to minimize the number of samples taken; using an inferior sampling distribution and making up for it by evaluating more samples is an unappealing option.
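To make the contrast concrete, here is a small numerical sketch (a hypothetical example, not from pbrt) that estimates $\int_0^1 x^2\,\mathrm{d}x = 1/3$ using a well-matched PDF, $p(x) = 2x$, and a mismatched mixture PDF, $p(x) = 0.1 + 1.8(1 - x)$, which places most of its probability where the integrand is nearly zero. It reports the per-sample variance of $f(X)/p(X)$ for each.

    #include <cmath>
    #include <cstdio>
    #include <random>

    int main() {
        auto f = [](double x) { return x * x; };   // hypothetical integrand, integral = 1/3
        std::mt19937 rng(11);
        std::uniform_real_distribution<double> u(0., 1.);
        const int n = 1000000;
        double sum[2] = {0, 0}, sumSq[2] = {0, 0};
        for (int i = 0; i < n; ++i) {
            // Well-matched PDF p(x) = 2x, sampled via X = sqrt(1 - U), X in (0, 1].
            double xg = std::sqrt(1 - u(rng));
            double pg = 2 * xg;
            // Mismatched PDF p(x) = 0.1 + 1.8 (1 - x): a mixture of a uniform PDF
            // (weight 0.1) and the PDF 2 (1 - x) (weight 0.9), so most of its
            // probability lies where f is nearly zero.
            double xp = (u(rng) < 0.1) ? u(rng) : 1 - std::sqrt(1 - u(rng));
            double pp = 0.1 + 1.8 * (1 - xp);
            double est[2] = {f(xg) / pg, f(xp) / pp};
            for (int k = 0; k < 2; ++k) {
                sum[k] += est[k];
                sumSq[k] += est[k] * est[k];
            }
        }
        for (int k = 0; k < 2; ++k) {
            double mean = sum[k] / n, var = sumSq[k] / n - mean * mean;
            std::printf("%s PDF: estimate %.4f, per-sample variance %.4f\n",
                        k == 0 ? "matched" : "mismatched", mean, var);
        }
    }

Both estimators converge to the correct value; the difference lies entirely in how much variance each sample carries.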
2.2.3 Multiple Importance Sampling
We are frequently faced with integrals that are the product of two or more functions: $\int f(x)\,g(x)\,\mathrm{d}x$. It is often possible to derive separate importance sampling strategies for the individual factors, though not one that is similar to their product. This situation is especially common in the integrals involved with light transport, such as the product of the BSDF, the incident radiance, and a cosine factor in the light transport equation (1.1).
To understand the challenges involved with applying Monte Carlo to such products, assume for now the good fortune of having two sampling distributions $p_f$ and $p_g$ that match the distributions of $f$ and $g$ exactly. (In practice, this will not normally be the case.) With the Monte Carlo estimator of Equation (2.7), we have two options: we might draw samples using $p_f$, which gives the estimator
$$F = \frac{f(X)\,g(X)}{p_f(X)} = c\,g(X),$$
where $c$ is a constant equal to the integral of $f$, since $p_f(x) = f(x) / \int f(x)\,\mathrm{d}x$. The variance of this estimator is proportional to the variance of $g$, which may itself be high. Conversely, we might sample from $p_g$, though doing so gives us an estimator with variance proportional to the variance of $f$, which may similarly be high. In the more common case where the sampling distributions only approximately match one of the factors, the situation is usually even worse.
Unfortunately, the obvious solution of taking some samples from each distribution and averaging the two estimators is not much better. Because variance is additive, once variance has crept into an estimator, we cannot eliminate it by adding it to another low-variance estimator.
Multiple importance sampling (MIS) addresses exactly this issue, with an easy-to-implement variance reduction technique. The basic idea is that, when estimating an integral, we should draw samples from multiple sampling distributions, chosen in the hope that at least one of them will match the shape of the integrand reasonably well, even if we do not know which one this will be. MIS then provides a method to weight the samples from each technique that can eliminate large variance spikes due to mismatches between the integrand’s value and the sampling density. Specialized sampling routines that only account for unusual special cases are even encouraged, as they reduce variance when those cases occur, with relatively little cost in general.
With two sampling distributions $p_f$ and $p_g$ and a single sample taken from each one, $X \sim p_f$ and $Y \sim p_g$, the MIS Monte Carlo estimator is
$$F = w_f(X)\,\frac{f(X)\,g(X)}{p_f(X)} + w_g(Y)\,\frac{f(Y)\,g(Y)}{p_g(Y)}, \tag{2.13}$$
where $w_f$ and $w_g$ are weighting functions chosen such that the expected value of this estimator is the value of the integral of $f(x)\,g(x)$.
More generally, given $n$ sampling distributions $p_i$, with $n_i$ samples $X_{i,j}$ taken from the $i$th distribution, the MIS Monte Carlo estimator is
$$F_n = \sum_{i=1}^{n} \frac{1}{n_i} \sum_{j=1}^{n_i} w_i(X_{i,j})\,\frac{f(X_{i,j})}{p_i(X_{i,j})}.$$
(The full set of conditions on the weighting functions for the estimator to be unbiased are that they sum to 1 whenever $f(x) \neq 0$, $\sum_{i=1}^{n} w_i(x) = 1$, and that $w_i(x) = 0$ whenever $p_i(x) = 0$.)
Setting $w_i(x) = 1/n$ corresponds to the case of averaging the various estimators, which we have already seen is an ineffective way to reduce variance. It would be better if the weighting functions were relatively large when the corresponding sampling technique was a good match to the integrand and relatively small when it was not, thus reducing the contribution of high-variance samples.
In practice, a good choice for the weighting functions is given by the balance heuristic, which attempts to fulfill this goal by taking into account all the different ways that a sample could have been generated, rather than just the particular one that was used to do so. The balance heuristic’s weighting function for the $i$th sampling technique is
$$w_i(x) = \frac{n_i\,p_i(x)}{\sum_j n_j\,p_j(x)}. \tag{2.14}$$
With the balance heuristic and our example of taking a single sample from each of two sampling techniques, the estimator of Equation (2.13) works out to be
$$F = \frac{f(X)\,g(X)}{p_f(X) + p_g(X)} + \frac{f(Y)\,g(Y)}{p_f(Y) + p_g(Y)}.$$
Each evaluation of $f(x)\,g(x)$ is divided by the sum of all PDFs for the corresponding sample rather than just the one that generated the sample. Thus, if $p_f$ generates a sample with low probability at a point where $p_g$ has a higher probability, then dividing by $p_f(X) + p_g(X)$ reduces the sample’s contribution. Effectively, such samples are downweighted when sampled from $p_f$, recognizing that the sampling technique associated with $p_g$ is more effective at the corresponding point in the integration domain. As long as just one of the sampling techniques has a reasonable probability of sampling a point where the function’s value is large, the MIS weights can lead to a significant reduction in variance.
BalanceHeuristic() computes Equation (2.14) for the specific case of two distributions $p_f$ and $p_g$. We will not need a more general multidistribution case in pbrt.
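A sketch of such a function, consistent with Equation (2.14) for the two-technique case (pbrt’s own version may differ in details such as its floating-point type), is:

    // Balance heuristic weight for the technique that generated the sample:
    // nf and ng are the number of samples taken with each technique, and
    // fPdf and gPdf are the two PDFs evaluated at the sample's location.
    float BalanceHeuristic(int nf, float fPdf, int ng, float gPdf) {
        return (nf * fPdf) / (nf * fPdf + ng * gPdf);
    }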
In practice, the power heuristic often reduces variance even further. For an exponent $\beta$, the power heuristic is
$$w_i(x) = \frac{\left(n_i\,p_i(x)\right)^{\beta}}{\sum_j \left(n_j\,p_j(x)\right)^{\beta}}.$$
Note that the power heuristic has a similar form to the balance heuristic, though it further reduces the contribution of relatively low probabilities. Our implementation has $\beta = 2$ hard-coded; that parameter value usually works well in practice.
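A corresponding sketch with $\beta = 2$ hard-coded, following the same two-technique convention as above (again, not necessarily pbrt’s exact code), is:

    // Power heuristic weight with beta = 2; squaring further suppresses the
    // contribution of samples with relatively low PDF values.
    float PowerHeuristic(int nf, float fPdf, int ng, float gPdf) {
        float f = nf * fPdf, g = ng * gPdf;
        return (f * f) / (f * f + g * g);
    }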
Multiple importance sampling can be applied even without sampling from all the distributions. This approach is known as the single sample model. We will not include the derivation here, but it can be shown that, given an integrand $f(x)$, if a sampling technique $p_i$ is chosen from a set of techniques with probability $q_i$ and a sample $X$ is drawn from $p_i$, then the single sample estimator
$$F = \frac{w_i(X)}{q_i}\,\frac{f(X)}{p_i(X)}$$
gives an unbiased estimate of the integral. For the single sample model, the balance heuristic is provably optimal.
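As an illustration, a single-sample MIS estimator for two techniques might be organized as follows. The interface (the std::function parameters and the name SingleSampleMIS) is hypothetical and chosen only to keep the sketch self-contained.

    #include <functional>
    #include <random>

    // Choose one of two sampling techniques with probability q / (1 - q), draw a
    // single sample from it, and apply the balance heuristic weight divided by
    // the selection probability, as in the single sample model described above.
    double SingleSampleMIS(double q,
                           const std::function<double(std::mt19937 &)> &sampleF,
                           const std::function<double(std::mt19937 &)> &sampleG,
                           const std::function<double(double)> &pf,
                           const std::function<double(double)> &pg,
                           const std::function<double(double)> &integrand,
                           std::mt19937 &rng) {
        std::uniform_real_distribution<double> u(0., 1.);
        bool useF = u(rng) < q;
        double X = useF ? sampleF(rng) : sampleG(rng);
        // Balance heuristic, with the selection probabilities playing the role of
        // the per-technique sample counts; the expression below simplifies to
        // integrand(X) / (q * pf(X) + (1 - q) * pg(X)).
        double wf = q * pf(X), wg = (1 - q) * pg(X);
        double w = useF ? wf / (wf + wg) : wg / (wf + wg);
        double qi = useF ? q : 1 - q;
        double p = useF ? pf(X) : pg(X);
        return (w / qi) * integrand(X) / p;
    }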
One shortcoming of multiple importance sampling is that if one of the sampling techniques is a very good match to the integrand, MIS can slightly increase variance. For rendering applications, MIS is almost always worthwhile for the variance reduction it provides in cases that can otherwise have high variance.
MIS Compensation
Multiple importance sampling is generally applied using probability distributions that are all individually valid for importance sampling the integrand, with nonzero probability of generating a sample anywhere that the integrand is nonzero. However, when MIS is being used, it is not a requirement that all PDFs are nonzero where the function’s value is nonzero; only one of them must be.
This observation led to the development of a technique called MIS compensation, which can further reduce variance. It is motivated by the fact that if all the sampling distributions allocate some probability to sampling regions where the integrand’s value is small, it is often the case that that region of the integrand ends up being oversampled, leaving the region where the integrand is high undersampled.
MIS compensation is based on the idea of sharpening one or more (but not all) of the probability distributions—for example, by adjusting them to have zero probability in areas where they earlier had low probability. A new sampling distribution $p'$ can, for example, be defined by
$$p'(x) = \frac{\max\left(0, p(x) - \delta\right)}{\int \max\left(0, p(x) - \delta\right)\mathrm{d}x}$$
for some fixed value $\delta$.
This technique is especially easy to apply in the case of tabularized sampling distributions. In Section 12.5, it is used to good effect for sampling environment map light sources.
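For a distribution tabulated as per-bin probabilities, the sharpening step might look like the following sketch (a hypothetical helper, not the implementation used in Section 12.5):

    #include <algorithm>
    #include <vector>

    // Subtract a constant delta from every bin probability, clamp at zero, and
    // renormalize so the result is again a valid distribution; bins that had
    // low probability end up with zero probability after compensation.
    std::vector<float> CompensateTabulatedDistribution(std::vector<float> prob,
                                                       float delta) {
        float sum = 0;
        for (float &p : prob) {
            p = std::max(0.f, p - delta);
            sum += p;
        }
        if (sum > 0)    // if delta removed all probability, it was chosen too large
            for (float &p : prob) p /= sum;
        return prob;
    }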
2.2.4 Russian Roulette
Russian roulette is a technique that can improve the efficiency of Monte Carlo estimates by skipping the evaluation of samples that would make a small contribution to the final result. In rendering, we often have estimators of the form
$$\frac{f(X)\,v(X)}{p(X)},$$
where the integrand consists of some factors $f(X)$ that are easily evaluated (e.g., those that relate to how the surface scatters light) and others that are more expensive to evaluate, such as a binary visibility factor $v(X)$ that requires tracing a ray. In these cases, most of the computational expense of evaluating the estimator lies in $v$.
If $f(X)$ is zero, it is obviously worth skipping the work of evaluating $v(X)$, since its value will not affect the value of the estimator. However, if we also skipped evaluating estimators where $f(X)$ was small but nonzero, then we would introduce bias into the estimator and would systematically underestimate the value of the integrand. Russian roulette solves this problem, making it possible to also skip tracing rays when $f(X)$’s value is small but not necessarily 0, while still computing the correct value on average.
To apply Russian roulette, we select some termination probability $q$. This value can be chosen in almost any manner; for example, it could be based on an estimate of the value of the integrand for the particular sample chosen, increasing as the integrand’s value becomes smaller. With probability $q$, the estimator is not evaluated for the particular sample, and some constant value $c$ is used in its place ($c = 0$ is often used). With probability $1 - q$, the estimator is still evaluated but is weighted by the factor $1/(1 - q)$, which effectively compensates for the samples that were skipped.
We have the new estimator
$$F' = \begin{cases} \dfrac{F - q\,c}{1 - q} & \text{with probability } 1 - q \\[6pt] c & \text{otherwise.} \end{cases}$$
It is easy to see that its expected value is the same as the expected value of the original estimator:
$$E[F'] = (1 - q)\left(\frac{E[F] - q\,c}{1 - q}\right) + q\,c = E[F].$$
Russian roulette never reduces variance. In fact, unless somehow $c = F$, it will always increase variance. However, it does improve Monte Carlo efficiency if the probabilities are chosen so that samples that are likely to make a small contribution to the final result are skipped.
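In code, Russian roulette might be wrapped around an expensive estimator evaluation along the following lines; the interface here (a std::function standing in for the expensive evaluation) is hypothetical and only meant to illustrate the reweighting.

    #include <functional>
    #include <random>

    // With probability q, skip the expensive evaluation (e.g., tracing a
    // visibility ray) and return the constant c; otherwise evaluate the
    // estimator and reweight it so that the expected value is unchanged.
    double RussianRoulette(double q, double c,
                           const std::function<double()> &evaluateEstimator,
                           std::mt19937 &rng) {
        std::uniform_real_distribution<double> u(0., 1.);
        if (u(rng) < q)
            return c;
        return (evaluateEstimator() - q * c) / (1 - q);
    }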
2.2.5 Splitting
While Russian roulette reduces the number of samples, splitting increases the number of samples in some dimensions of multidimensional integrals in order to improve efficiency. As an example, consider an integral of the general form
$$\int_A \int_B f(x, y)\,\mathrm{d}y\,\mathrm{d}x. \tag{2.17}$$
With the standard importance sampling estimator, we might draw $n$ samples $X_i$ and $Y_i$ from independent distributions $p_X$ and $p_Y$, and compute
$$\frac{1}{n} \sum_{i=1}^{n} \frac{f(X_i, Y_i)}{p_X(X_i)\,p_Y(Y_i)}. \tag{2.18}$$
Splitting allows us to formalize the idea of taking more than one sample for the integral over $B$ for each sample taken in $A$. With splitting, we might take $m$ samples $Y_{i,j}$ for each sample $X_i$, giving the estimator
$$\frac{1}{n} \sum_{i=1}^{n} \frac{1}{m} \sum_{j=1}^{m} \frac{f(X_i, Y_{i,j})}{p_X(X_i)\,p_Y(Y_{i,j})}.$$
If it is possible to partially evaluate $f(X_i, \cdot)$ for each $X_i$, then we can compute a total of $nm$ samples more efficiently than if we had taken $nm$ independent values using Equation (2.18).
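The following sketch expresses this estimator directly; as before, the interface is hypothetical, with std::function parameters standing in for whatever sampling and evaluation routines a renderer would actually use.

    #include <functional>
    #include <random>

    // Splitting estimator for Equation (2.17): for each of n samples X_i ~ pX,
    // take m samples Y_{i,j} ~ pY and average them, amortizing the cost
    // associated with each X_i over m evaluations of the inner integrand.
    double SplittingEstimate(int n, int m,
                             const std::function<double(std::mt19937 &)> &sampleX,
                             const std::function<double(std::mt19937 &)> &sampleY,
                             const std::function<double(double)> &pX,
                             const std::function<double(double)> &pY,
                             const std::function<double(double, double)> &f,
                             std::mt19937 &rng) {
        double sum = 0;
        for (int i = 0; i < n; ++i) {
            double x = sampleX(rng);
            double inner = 0;
            for (int j = 0; j < m; ++j) {
                double y = sampleY(rng);
                inner += f(x, y) / (pX(x) * pY(y));
            }
            sum += inner / m;
        }
        return sum / n;
    }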
For an example from rendering, an integral of the form of Equation (2.17) is evaluated to compute the color of pixels in an image: an integral is taken over the area of the pixel where, at each point $x$ in the pixel, a ray is traced into the scene and the reflected radiance at the intersection point is computed using an integral over the hemisphere (denoted here by $B$) for which one or more rays are traced. With splitting, we can take multiple samples for each lighting integral, improving efficiency by amortizing the cost of tracing the initial ray from the camera over them.