13.8 Careful Sample Placement

A classic and effective family of techniques for variance reduction is based on the careful placement of samples in order to better capture the features of the integrand (or, more accurately, to be less likely to miss important features). These techniques are used extensively in pbrt. Indeed, one of the tasks of Samplers in Chapter 7 was to generate well-distributed samples for use by the integrators for just this reason, although at the time we offered only an intuitive sense of why this was worthwhile. Here we will justify that extra work in the context of Monte Carlo integration.

13.8.1 Stratified Sampling

Stratified sampling was first introduced in Section 7.3, and we now have the tools to motivate its use. Stratified sampling works by subdividing the integration domain normal upper Lamda into n nonoverlapping regions normal upper Lamda 1 comma normal upper Lamda 2 comma ellipsis comma normal upper Lamda Subscript n Baseline . Each region is called a stratum, and they must completely cover the original domain:

union Underscript i equals 1 Overscript n Endscripts normal upper Lamda Subscript i Baseline equals normal upper Lamda period

To draw samples from normal upper Lamda , we will draw n Subscript i samples from each normal upper Lamda Subscript i , according to densities p Subscript i inside each stratum. A simple example is supersampling a pixel. With stratified sampling, the area around a pixel is divided into a k times k grid, and a sample is drawn from each grid cell. This is better than taking k squared random samples, since the sample locations are less likely to clump together. Here we will show why this technique reduces variance.

Within a single stratum normal upper Lamda Subscript i , the Monte Carlo estimate is

upper F Subscript i Baseline equals StartFraction 1 Over n Subscript i Baseline EndFraction sigma-summation Underscript j equals 1 Overscript n Subscript i Baseline Endscripts StartFraction f left-parenthesis upper X Subscript i comma j Baseline right-parenthesis Over p Subscript i Baseline left-parenthesis upper X Subscript i comma j Baseline right-parenthesis EndFraction comma

where upper X Subscript i comma j is the j th sample drawn from density p Subscript i . The overall estimate is upper F equals sigma-summation Underscript i equals 1 Overscript n Endscripts v Subscript i Baseline upper F Subscript i , where v Subscript i is the fractional volume of stratum i ( v Subscript i Baseline element-of left-parenthesis 0 comma 1 right-bracket ).

The true value of the integrand in stratum i is

mu Subscript i Baseline equals upper E left-bracket f left-parenthesis upper X Subscript i comma j Baseline right-parenthesis right-bracket equals StartFraction 1 Over v Subscript i Baseline EndFraction integral Underscript normal upper Lamda Subscript i Baseline Endscripts f left-parenthesis x right-parenthesis normal d x comma

and the variance in this stratum is

sigma Subscript i Superscript 2 Baseline equals StartFraction 1 Over v Subscript i Baseline EndFraction integral Underscript normal upper Lamda Subscript i Baseline Endscripts left-parenthesis f left-parenthesis x right-parenthesis minus mu Subscript i Baseline right-parenthesis squared normal d x period

Thus, with n Subscript i samples in the stratum, the variance of the per-stratum estimator is sigma Subscript i Superscript 2 Baseline slash n Subscript i . This shows that the variance of the overall estimator is

StartLayout 1st Row 1st Column upper V left-bracket upper F right-bracket 2nd Column equals upper V left-bracket sigma-summation v Subscript i Baseline upper F Subscript i Baseline right-bracket 2nd Row 1st Column Blank 2nd Column equals sigma-summation upper V left-bracket v Subscript i Baseline upper F Subscript i Baseline right-bracket 3rd Row 1st Column Blank 2nd Column equals sigma-summation v Subscript i Superscript 2 Baseline upper V left-bracket upper F Subscript i Baseline right-bracket 4th Row 1st Column Blank 2nd Column equals sigma-summation StartFraction v Subscript i Superscript 2 Baseline sigma Subscript i Superscript 2 Baseline Over n Subscript i Baseline EndFraction period EndLayout

If we make the reasonable assumption that the number of samples n Subscript i is proportional to the volume v Subscript i , then we have n Subscript i Baseline equals v Subscript i Baseline upper N , and the variance of the overall estimator is

upper V left-bracket upper F Subscript upper N Baseline right-bracket equals StartFraction 1 Over upper N EndFraction sigma-summation v Subscript i Baseline sigma Subscript i Superscript 2 Baseline period

To compare this result to the variance without stratification, we note that choosing an unstratified sample is equivalent to choosing a random stratum upper I according to the discrete probability distribution defined by the volumes v Subscript i and then choosing a random sample upper X in normal upper Lamda Subscript upper I . In this sense, upper X is chosen conditionally on upper I , so it can be shown using conditional probability that

StartLayout 1st Row 1st Column upper V left-bracket upper F right-bracket 2nd Column equals upper E Subscript x Baseline upper V Subscript i Baseline upper F plus upper V Subscript x Baseline upper E Subscript i Baseline upper F 2nd Row 1st Column Blank 2nd Column equals StartFraction 1 Over upper N EndFraction left-bracket sigma-summation v Subscript i Baseline sigma Subscript i Superscript 2 Baseline plus sigma-summation v Subscript i Baseline left-parenthesis mu Subscript i Baseline minus upper Q right-parenthesis right-bracket comma EndLayout

where upper Q is the mean of f over the whole domain normal upper Lamda . See Veach (1997) for a derivation of this result.

There are two things to notice about this expression. First, we know that the right-hand sum must be nonnegative, since variance is always nonnegative. Second, it demonstrates that stratified sampling can never increase variance. In fact, stratification always reduces variance unless the right-hand sum is exactly 0. It can only be 0 when the function  f has the same mean over each stratum  normal upper Lamda Subscript i . In fact, for stratified sampling to work best, we would like to maximize the right-hand sum, so it is best to make the strata have means that are as unequal as possible. This explains why compact strata are desirable if one does not know anything about the function f . If the strata are wide, they will contain more variation and will have  mu Subscript i closer to the true mean  upper Q .

Figure 13.17 shows the effect of using stratified sampling versus a uniform random distribution for sampling ray directions for glossy reflection. There is a reasonable reduction in variance at essentially no cost in running time.

Figure 13.17: Variance is higher and the image noisier (1) when random sampling is used to compute the effect of glossy reflection than (2) when a stratified distribution of sample directions is used instead. (Compare the edges of the highlights on the ground, for example.)

The main downside of stratified sampling is that it suffers from the same “curse of dimensionality” as standard numerical quadrature. Full stratification in  upper D dimensions with  upper S strata per dimension requires  upper S Superscript upper D samples, which quickly becomes prohibitive. Fortunately, it is often possible to stratify some of the dimensions independently and then randomly associate samples from different dimensions, as was done in Section 7.3. Choosing which dimensions are stratified should be done in a way that stratifies dimensions that tend to be most highly correlated in their effect on the value of the integrand (Owen 1998). For example, for the direct lighting example in Section 13.7.1, it is far more effective to stratify the left-parenthesis x comma y right-parenthesis pixel positions and to stratify the left-parenthesis theta comma phi right-parenthesis ray direction—stratifying left-parenthesis x comma theta right-parenthesis and left-parenthesis y comma phi right-parenthesis would almost certainly be ineffective.

Another solution to the curse of dimensionality that has many of the same advantages of stratification is to use Latin hypercube sampling (also introduced in Section 7.3), which can generate any number of samples independent of the number of dimensions. Unfortunately, Latin hypercube sampling isn’t as effective as stratified sampling at reducing variance, especially as the number of samples taken becomes large. Nevertheless, Latin hypercube sampling is provably no worse than uniform random sampling and is often much better.

13.8.2 Quasi Monte Carlo

The low-discrepancy sampling techniques introduced in Chapter 7 are the foundation of a branch of Monte Carlo called quasi Monte Carlo. The key component of quasi–Monte Carlo techniques is that they replace the pseudo-random numbers used in standard Monte Carlo with low-discrepancy point sets generated by carefully designed deterministic algorithms.

The advantage of this approach is that for many integration problems, quasi–Monte Carlo techniques have asymptotically faster rates of convergence than methods based on standard Monte Carlo. Many of the techniques used in regular Monte Carlo algorithms can be shown to work equally well with quasi–random sample points, including importance sampling. Some others (e.g., rejection sampling) cannot. While the asymptotic convergence rates are not generally applicable to the discontinuous integrands in graphics because they depend on smoothness properties in the integrand, quasi Monte Carlo nevertheless generally performs better than regular Monte Carlo for these integrals in practice. The “Further Reading” section at the end of this chapter has more information about this topic.

In pbrt, we have generally glossed over the differences between these two approaches and have localized them in the Samplers in Chapter 7. This introduces the possibility of subtle errors if a Sampler generates quasi–random sample points that an Integrator then improperly uses as part of an implementation of an algorithm that is not suitable for quasi Monte Carlo. As long as Integrators only use these sample points for importance sampling or other techniques that are applicable in both approaches, this isn’t a problem.

13.8.3 Warping Samples and Distortion

When applying stratified sampling or low-discrepancy sampling to problems like choosing points on light sources for integration for area lighting, pbrt generates a set of samples left-parenthesis u 1 comma u 2 right-parenthesis over the domain left-bracket 0 comma 1 right-parenthesis squared and then uses algorithms based on the transformation methods introduced in Sections 13.5 and 13.6 to map these samples to points on the light source. Implicit in this process is the expectation that the transformation to points on the light source will generally preserve the stratification properties of the samples from left-bracket 0 comma 1 right-parenthesis squared —in other words, nearby samples should map to nearby positions on the surface of the light, and faraway samples should map to far-apart positions on the light. If the mapping does not preserve this property, then the benefits of stratification are lost.

Figure 13.18: (a) Transforming a 4 times 4 stratified sampling pattern to points on a long and thin quadrilateral light source effectively gives less than 16 well-distributed samples; stratification in the vertical direction is not helpful. (b) Samples from a left-parenthesis 0 comma 2 right-parenthesis -sequence remain well distributed even after this transformation.

Shirley’s square-to-circle mapping (Figure 13.13) preserves stratification more effectively than the straightforward mapping (Figure 13.12), which has less compact strata away from the center. This issue also explains why low-discrepancy sequences are generally more effective than stratified patterns in practice: they are more robust with respect to preserving their good distribution properties after being transformed to other domains. Figure 13.18 shows what happens when a set of 16 well-distributed sample points are transformed to be points on a skinny quadrilateral by scaling them to cover its surface; samples from a left-parenthesis 0 comma 2 right-parenthesis -sequence remain well distributed, but samples from a stratified pattern fare much less well.