8.2 Sampling and Integration

The lighting integration algorithms used throughout pbrt are based on Monte Carlo integration, yet the focus of Section 8.1 was on sampling and reconstruction. That topic is an important one for understanding aliasing and the use of filters for image reconstruction, but it is a different one than minimizing Monte Carlo integration error. There are a number of connections between Monte Carlo and both Fourier analysis and other approaches to analyzing point-sampling algorithms, however. For example, jittered sampling is a form of stratified sampling, a variance reduction technique that was introduced in Section 2.2.1. Thus, we can see that jittered sampling is advantageous from both perspectives.

Given multiple perspectives on the problem, one might ask, what is the best sampling approach to use for Monte Carlo integration? There is no easy answer to this question, which is reflected by the fact that this chapter presents six classes that implement the upcoming Sampler interface to generate sample points; a number of them offer a few variations of an underlying sampling approach, for a total of 17 different techniques.

Although some of this variety is for pedagogy, it is largely due to the fact that the question of which sampling technique is best is not easily answered. Not only do the various mathematical approaches to analyzing sampling techniques often disagree, but another difficulty comes from the human visual system: rendered images are generally for human consumption and most mathematical approaches for evaluating sampling patterns do not account for this fact. Later in this chapter, we will see that sampling patterns that lead to errors in the image having blue noise characteristics are visually preferable, yet may not have any lower numeric error than those that do not. Thus, pbrt provides a variety of options, allowing the user to make their own choice among them.

8.2.1 Fourier Analysis of Variance

Fourier analysis can also be applied to evaluate sampling patterns in the context of Monte Carlo integration, leading to insights about both variance and the convergence rates of various sampling algorithms. We will make three simplifications in our treatment of this topic here. There are more general forms of the theory that do not require these, though they are more complex. (As always, see the “Further Reading” section for more information.) We assume that:

  1. The sample points are uniformly distributed and equally weighted (i.e., importance sampling is not being used).
  2. The Monte Carlo estimator used is unbiased.
  3. The properties of the sample points are homogeneous with respect to toroidal translation over the sampling domain. (If they are not, the analysis is effectively over all possible random translations of the sample points.)

Excluding importance sampling has obvious implications, though we note that the last assumption, homogeneity, is also significant. Many of the sampling approaches later in this chapter are based on decomposing the $[0,1)^n$ sampling domain into strata and placing a single sample in each one. Homogenizing such algorithms causes some of those regions to wrap around the boundaries of the domain, which harms their effectiveness. Equivalently, homogenization can be seen as toroidally translating the function being integrated, which can introduce discontinuities that were not previously present. Nevertheless, we will see that there is still much useful insight to be had about the behavior of sampling patterns in spite of these simplifications.
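For concreteness, the following is a minimal sketch of what such a toroidal translation looks like for a 2D point set. It is illustrative only, not pbrt code; the Point2 struct and function name are placeholders.

```cpp
// Minimal sketch (not pbrt code): apply a toroidal translation to a set of
// 2D sample points in [0,1)^2. Points that cross the domain boundary wrap
// around, which is the homogenization described above.
#include <cmath>
#include <vector>

struct Point2 { double x, y; };  // illustrative stand-in for a 2D point type

std::vector<Point2> ToroidalShift(const std::vector<Point2> &pts,
                                  double dx, double dy) {
    std::vector<Point2> shifted;
    shifted.reserve(pts.size());
    for (const Point2 &p : pts) {
        double x = p.x + dx, y = p.y + dy;
        // Wrap back into [0,1) in each dimension.
        shifted.push_back({x - std::floor(x), y - std::floor(y)});
    }
    return shifted;
}
```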

Our first step is to introduce the Fourier series representation of functions, which we will use as the basis for analysis of sampling patterns for the remainder of this section. The Fourier transform assumes that the function $f(x)$ has infinite extent, while for rendering we are generally operating over the $[0,1)^n$ domain or on mappings from there to other finite domains such as the unit hemisphere. While it is tempting to apply the Fourier transform as is, defining $f(x)$ to be zero outside the domain of interest, doing so introduces a discontinuity in the function at the boundaries that leads to error due to the Gibbs phenomenon in the Fourier coefficients. Fourier series are defined over a specific finite domain and so do not suffer from this problem.

The Fourier series represents a function using an infinite set of coefficients $f_j$ for all integer-valued $j$. (We use $j$ to index coefficients in order to avoid confusion with the use of $\mathrm{i}$ for the unit imaginary number.) For the $[0,1)$ domain, the coefficients are given by

$$f_j = \int_{[0,1)} f(x)\, \mathrm{e}^{-\mathrm{i}\, 2\pi j x}\, \mathrm{d}x. \tag{8.8}$$

(Domains other than $[0,1)$ can be handled using a straightforward reparameterization.)

Expressed using the Fourier series coefficients, the original function is

$$f(x) = \sum_{j \in \mathbb{Z}} f_j\, \mathrm{e}^{-\mathrm{i}\, 2\pi j x}. \tag{8.9}$$

It can be shown that the continuous Fourier transform corresponds to the limit of the Fourier series as the extent of the domain becomes infinite.

The PSD of a function in the Fourier series basis is given by the product of each coefficient with its complex conjugate,

$$\mathcal{P}_f(j) = f_j\, \overline{f_j}.$$

In order to analyze Monte Carlo integration in frequency space, we will start by defining the sampling function $s(x)$ for a set of sample points $x_i$ as the averaged sum of $n$ samples, each represented by a delta distribution,

$$s(x) = \frac{1}{n} \sum_{i=1}^{n} \delta(x - x_i).$$
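Because $s(x)$ is an averaged sum of delta distributions, applying Equation (8.8) to it collapses the integral into a finite sum over the sample points. The following small helper, an illustrative sketch rather than pbrt code, evaluates $s_j$ and the corresponding PSD value $\mathcal{P}_s(j) = s_j\, \overline{s_j}$ for a 1D sample set; such a routine can be used to compute 1D analogs of the spectra shown later in Figures 8.17 and 8.18.

```cpp
// Minimal sketch (not pbrt code): because s(x) is an averaged sum of delta
// distributions, Equation (8.8) applied to s reduces to a finite sum, so the
// Fourier series coefficients s_j and the PSD value P_s(j) = s_j * conj(s_j)
// of a 1D sample set can be evaluated directly.
#include <complex>
#include <vector>

std::complex<double> SamplingCoefficient(const std::vector<double> &x, int j) {
    const double twoPi = 6.28318530717958647692;
    std::complex<double> sj(0, 0);
    for (double xi : x)
        sj += std::exp(std::complex<double>(0, -twoPi * j * xi));
    return sj / double(x.size());
}

double SamplingPSD(const std::vector<double> &x, int j) {
    std::complex<double> sj = SamplingCoefficient(x, j);
    return std::norm(sj);  // s_j times its complex conjugate
}
```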

Given the sampling function, it is possible to rewrite the Monte Carlo estimator as an integral:

$$\int_{[0,1)} f(x)\, \mathrm{d}x \approx \frac{1}{n} \sum_{i=1}^{n} f(x_i) = \int_{[0,1)} f(x)\, s(x)\, \mathrm{d}x. \tag{8.10}$$

It may seem like we are moving backward: after all, the point of Monte Carlo integration is to transform integrals into sums. However, this transformation is key to being able to apply the Fourier machinery to the problem.

If we substitute the Fourier series expansion of Equation (8.9) into Equation (8.10), we can find that

$$\int_{[0,1)} f(x)\, s(x)\, \mathrm{d}x = \sum_{j \in \mathbb{Z}} \overline{f_j}\, s_j.$$

From the definition of the Fourier series coefficients, we know that $f_0 = \overline{f_0} = \int f(x)\, \mathrm{d}x$. Furthermore, $s_0 = 1$ from the definition of $s(x)$ and the assumption of uniform and unweighted samples. Therefore, the error in the Monte Carlo estimate is given by

$$\left| \int_{[0,1)} f(x)\, \mathrm{d}x - \int_{[0,1)} f(x)\, s(x)\, \mathrm{d}x \right| = \left| f_0 - \sum_{j \in \mathbb{Z}} \overline{f_j}\, s_j \right| = \left| \sum_{j \in \mathbb{Z}^*} \overline{f_j}\, s_j \right|, \tag{8.11}$$

where $\mathbb{Z}^*$ denotes the set of all integers except for zero.

Equation (8.11) is the key result that gives us insight about integration error. It is worth taking the time to understand and to consider the implications of it. For example, if $f$ is band limited, then $f_j = 0$ for all $j$ after some value $j_{\mathrm{max}}$. In that case, if $s$'s sampling rate is at least equal to $f$'s highest frequency, then $s_j = 0$ for all $0 < j < j_{\mathrm{max}}$ and a zero variance estimator is the result. Only half the sampling rate is necessary for perfect integration compared to what is needed for perfect reconstruction!
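This observation is easy to verify numerically. The following standalone test, an illustrative sketch and not part of pbrt, integrates the band-limited function $f(x) = 0.5 + \cos(2\pi x)$ with $n$ equally spaced samples offset by an arbitrary toroidal shift; for every $n \ge 2$, the estimate matches the true integral $0.5$ to within floating-point rounding.

```cpp
// Minimal sketch (not pbrt code): for the band-limited f(x) = 0.5 + cos(2*pi*x),
// n >= 2 equally spaced samples (with an arbitrary toroidal offset u) give a
// zero-variance estimate of the integral over [0,1), which is 0.5.
#include <cmath>
#include <cstdio>

int main() {
    const double twoPi = 6.28318530717958647692;
    for (int n = 2; n <= 8; ++n) {
        double u = 0.3183;  // arbitrary offset in [0,1)
        double sum = 0;
        for (int i = 0; i < n; ++i) {
            double x = (i + u) / n;  // equispaced sample with toroidal offset
            sum += 0.5 + std::cos(twoPi * x);
        }
        std::printf("n = %d  estimate = %.15f\n", n, sum / n);
    }
    return 0;
}
```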

Using Equation (8.11) with the definition of variance, it can be shown that the variance of the estimator is given by the sum of products of $f(x)$'s and $s(x)$'s PSDs:

$$V\!\left[\frac{1}{n} \sum_{i=1}^{n} f(x_i)\right] = \sum_{j \in \mathbb{Z}^*} \mathcal{P}_f(j)\, \mathcal{P}_s(j). \tag{8.12}$$

This gives a clear direction about how to reduce variance: it is best if the power spectrum of the sampling pattern is low where the function’s power spectrum is high. In rendering, the function is generally not available analytically, let alone in the Fourier series basis, so we follow the usual expectation that the function has most of its frequency content in lower frequencies. This assumption argues for a sampling pattern with its energy concentrated in higher frequencies and minimal energy in the lower frequencies—precisely the same blue noise criterion that we earlier saw was effective for antialiasing.

An insight that directly follows from Equation (8.12) is that with uniform random sampling (i.e., white noise), $\mathcal{P}_s$ is the constant $1/n$, which leads to a variance of

$$\frac{1}{n} \sum_{j \in \mathbb{Z}^*} \mathcal{P}_f(j) = O\!\left(\frac{1}{n}\right),$$

which is the same variance of the Monte Carlo estimator that was derived earlier using different means in Section 2.1.4. More generally, if the PSD for a sampling technique can be asymptotically bounded, it can be shown that the technique exhibits a higher rate of variance reduction given a suitable function being integrated. One example is that in 2D, a jittered sampling pattern can achieve $O(n^{-2})$ variance, given a smooth integrand.
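The following self-contained experiment, an illustrative sketch rather than pbrt code (the integrand $\mathrm{e}^{x+y}$ is simply an arbitrary smooth test function), estimates the variance of the basic estimator empirically for independent uniform samples and for jittered samples, so the two convergence rates can be compared directly.

```cpp
// Minimal sketch (not pbrt code): empirically compare the variance of the
// basic Monte Carlo estimator for a smooth 2D integrand using (a) independent
// uniform random samples and (b) jittered (stratified) samples. Consistent
// with the text, the jittered variance should fall off roughly as O(n^-2)
// while the independent-sample variance falls off as O(n^-1).
#include <cmath>
#include <cstdio>
#include <random>

double f(double x, double y) { return std::exp(x + y); }  // smooth test integrand

int main() {
    std::mt19937 rng(7);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    for (int k = 4; k <= 32; k *= 2) {
        int n = k * k, trials = 2000;
        double mR = 0, m2R = 0, mJ = 0, m2J = 0;
        for (int t = 0; t < trials; ++t) {
            double sumR = 0, sumJ = 0;
            for (int i = 0; i < n; ++i) {
                // (a) independent uniform random sample
                sumR += f(u(rng), u(rng));
                // (b) jittered sample: one random point per k x k stratum
                double jx = ((i % k) + u(rng)) / k, jy = ((i / k) + u(rng)) / k;
                sumJ += f(jx, jy);
            }
            double eR = sumR / n, eJ = sumJ / n;
            mR += eR; m2R += eR * eR; mJ += eJ; m2J += eJ * eJ;
        }
        double vR = m2R / trials - (mR / trials) * (mR / trials);
        double vJ = m2J / trials - (mJ / trials) * (mJ / trials);
        std::printf("n = %4d  var(random) = %.3e  var(jittered) = %.3e\n",
                    n, vR, vJ);
    }
    return 0;
}
```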

Fourier analysis has also revealed that Poisson disk sampling patterns have unexpectedly bad asymptotic convergence. Poisson disk point sets are constructed such that no two points can be closer than some minimum distance $d$ (see Figure 8.16). For many years, they were believed to be superior to jittered patterns. The Poisson disk criterion is an appealing one, as it prohibits multiple samples from clumping close together, as is possible with adjacent jittered samples.

Figure 8.16: 256 sample points distributed using (a) a jittered distribution, and (b) a Poisson disk distribution. Poisson disk point sets combine some randomness in the locations of the points with some structure from no two of them being too close together.
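Dart throwing is the simplest way to generate such a point set: candidate points are drawn uniformly and rejected if they violate the minimum-distance criterion. A minimal sketch follows; it is illustrative only, not pbrt's code, and is inefficient for large point counts.

```cpp
// Minimal sketch (not pbrt code): generate a 2D Poisson disk point set by
// dart throwing, i.e., rejecting any candidate that is closer than the
// minimum distance d to a point already accepted. Simple but O(n^2) per
// candidate, and it may loop for a long time if d is too large for count.
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

struct Point2 { double x, y; };

std::vector<Point2> PoissonDiskDartThrowing(int count, double d, uint32_t seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    std::vector<Point2> pts;
    while ((int)pts.size() < count) {
        Point2 c{u(rng), u(rng)};
        bool ok = true;
        for (const Point2 &p : pts) {
            double dx = c.x - p.x, dy = c.y - p.y;
            if (dx * dx + dy * dy < d * d) { ok = false; break; }
        }
        if (ok) pts.push_back(c);
    }
    return pts;
}
```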

Part of the appeal of Poisson disk patterns is that initially they seem to have superior blue noise characteristics to jittered patterns, with a much larger range of frequencies around the origin where the PSD is low. Figure 8.17 shows the PSDs of 2D jittered and Poisson disk sample points. Both feature a spike at the origin, a ring of low energy around it, and then a transition to fairly equal-energy noise at higher frequencies.

Figure 8.17: PSDs of (a) jittered and (b) Poisson disk–distributed sample points. The origin with the central spike is at the center of each image.

Figure 8.18: Radially averaged PSDs of (a) jittered and (b) Poisson disk–distributed sample points.

Radially averaged plots of the distribution of energy in these PSDs, however, make their behavior in the low frequencies more clear; see Figure 8.18. We can see that although the Poisson disk pattern has low energy for a larger range of frequencies than the jittered pattern, its PSD retains a small amount of energy all the way down to 0, while the jittered pattern does not.

Using Fourier analysis of variance, it can be shown that due to this lingering energy, the variance when using Poisson disk sampling is never any better than $O(n^{-1})$, which is worse than jittered points for some integrands. (Though remember that these are asymptotic bounds, and that for small $n$, Poisson disk–distributed points may give lower variance.) Nevertheless, the poor asymptotic convergence for what seems like it should be an effective sampling approach was a surprise, and points to the value of this form of analysis.

8.2.2 Low Discrepancy and Quasi Monte Carlo

Outside of Fourier analysis, another useful approach for evaluating the quality of sample points is based on a concept called discrepancy. Well-distributed sampling patterns have low discrepancy, and thus the sample pattern generation problem can be considered to be one of finding a suitable pattern of points with low discrepancy.

In discussing the discrepancy of sample points, we will draw a distinction between sample sets, which are a specific number of points, and sample sequences, which are defined by an algorithm that can generate an arbitrary number of points. For a fixed number of samples, it is generally possible to distribute points in a sample set slightly better than the same number of points in a sample sequence. However, sequences can be especially useful with adaptive sampling algorithms, thanks to their flexibility in the number of points they generate.

The basic idea of discrepancy is that the quality of a set of points in a $d$-dimensional space $[0,1)^d$ can be evaluated by looking at regions of the domain $[0,1)^d$, counting the number of points inside each region, and comparing the volume of each region to the number of sample points inside. In general, a given fraction of the volume should have roughly the same fraction of the total number of sample points inside of it. While it is not possible for this always to be the case, we can still try to use patterns that minimize the maximum difference between the actual volume and the volume estimated by the points (the discrepancy). Figure 8.19 shows an example of the idea in two dimensions.

Figure 8.19: The discrepancy of a box (shaded) given a set of 2D sample points in $[0,1)^2$. One of the four sample points is inside the box, so this set of points would estimate the box's area to be $1/4$. The true area of the box is $0.3 \times 0.3 = 0.09$, so the discrepancy for this particular box is $0.25 - 0.09 = 0.16$. In general, we are interested in finding the maximum discrepancy of all possible boxes (or some other shape).

To compute the discrepancy of a set of points, we first pick a family of shapes $B$ that are subsets of $[0,1)^d$. For example, boxes with one corner at the origin are often used. This corresponds to

$$B = \left\{ [0, v_1] \times [0, v_2] \times \cdots \times [0, v_d] \right\},$$

where $0 \le v_i < 1$. Given a set of $n$ sample points $P = \{x_1, \ldots, x_n\}$, the discrepancy of $P$ with respect to $B$ is

$$D_n(B, P) = \sup_{b \in B} \left| \frac{\#\{x_i \in b\}}{n} - V(b) \right|, \tag{8.13}$$

where $\#\{x_i \in b\}$ is the number of points in $b$ and $V(b)$ is the volume of $b$.

The intuition for why Equation (8.13) is a reasonable measure of quality is that the value $\#\{x_i \in b\}/n$ is an approximation of the volume of the box $b$ given by the particular points $P$. Therefore, the discrepancy is the worst error over all possible boxes from this way of approximating the volume. When the set of shapes $B$ is the set of boxes with a corner at the origin, this value is called the star discrepancy, $D_n^*(P)$. Another popular option for $B$ is the set of all axis-aligned boxes, where the restriction that one corner be at the origin has been removed.

For some point sets, the discrepancy can be computed analytically. For example, consider the set of points in one dimension

$$x_i = \frac{i}{n}.$$

We can see that the star discrepancy of $x_i$ is

$$D_n^*(x_1, \ldots, x_n) = \frac{1}{n}.$$

For example, take the interval $b = [0, 1/n)$. Then $V(b) = 1/n$, but $\#\{x_i \in b\} = 0$. This interval (and the intervals $[0, 2/n)$, etc.) is where the largest differences between volume and fraction of points inside the volume are seen.

The star discrepancy of this point set can be improved by modifying it slightly:

$$x_i = \frac{i - \frac{1}{2}}{n}. \tag{8.14}$$

Then

$$D_n^*(x_i) = \frac{1}{2n}.$$

The bounds for the star discrepancy of a sequence of points in one dimension have been shown to be

$$D_n^*(x_i) = \frac{1}{2n} + \max_{1 \le i \le n} \left| x_i - \frac{2i - 1}{2n} \right|.$$

Thus, the earlier set from Equation (8.14) has the lowest possible discrepancy for a sequence in 1D. In general, it is much easier to analyze and compute bounds for the discrepancy of sequences in 1D than for those in higher dimensions. When it is not possible to derive the discrepancy of a sampling technique analytically, it can be estimated numerically by constructing a large number of shapes  b , computing their discrepancy, and reporting the maximum value found.
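As an example of such a numerical estimate, the following function, an illustrative sketch rather than pbrt code, approximates the 2D star discrepancy of a point set by testing many randomly chosen boxes with one corner at the origin, following Equation (8.13); because only finitely many boxes are tested, the result is a lower bound on the true supremum.

```cpp
// Minimal sketch (not pbrt code): estimate the star discrepancy of a 2D point
// set numerically by testing many boxes [0,v1] x [0,v2] with random extents
// and reporting the largest difference between the box volume and the
// fraction of points inside it, following Equation (8.13).
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

struct Point2 { double x, y; };

double EstimateStarDiscrepancy2D(const std::vector<Point2> &pts, int nBoxes,
                                 uint32_t seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double worst = 0;
    for (int i = 0; i < nBoxes; ++i) {
        double v1 = u(rng), v2 = u(rng);
        int inside = 0;
        for (const Point2 &p : pts)
            if (p.x <= v1 && p.y <= v2) ++inside;
        double diff = std::abs(double(inside) / pts.size() - v1 * v2);
        worst = std::max(worst, diff);
    }
    return worst;  // a lower bound on the true supremum over all boxes
}
```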

The astute reader will notice that according to the discrepancy measure, the uniform sequence in 1D is optimal, but Fourier analysis indicated that jittering was superior to uniform sampling. Fortunately, low-discrepancy patterns in higher dimensions are much less uniform than they are in one dimension and thus usually work reasonably well as sample patterns in practice. Nevertheless, their underlying uniformity means that low-discrepancy patterns can be more prone to visually objectionable aliasing than patterns with pseudo-random variation.

A $d$-dimensional sequence of points is said to have low discrepancy if its discrepancy is of the order

$$O\!\left(\frac{(\log n)^d}{n}\right). \tag{8.15}$$

These bounds are the best that are known for arbitrary $d$.

Low-discrepancy point sets and sequences are often generated using deterministic algorithms; we will see a number of such algorithms in Sections 8.6 and 8.7. Using such points to sample functions for integration brings us to quasi–Monte Carlo (QMC) methods. Many of the techniques used in regular Monte Carlo algorithms can be shown to work equally well with such quasi-random sample points.
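As a small preview of those constructions, the base-2 radical inverse, which generates the van der Corput sequence by reflecting the bits of an index about the binary point, is one of the simplest low-discrepancy sequences. The following is an illustrative sketch, not pbrt's implementation.

```cpp
// Minimal sketch (not the pbrt implementation): the base-2 radical inverse,
// which reverses the bits of an integer index about the binary point. The
// resulting van der Corput sequence is a classic 1D low-discrepancy sequence.
#include <cstdint>

double RadicalInverseBase2(uint32_t i) {
    // Reverse the 32 bits of i.
    uint32_t r = 0;
    for (int b = 0; b < 32; ++b) {
        r = (r << 1) | (i & 1);
        i >>= 1;
    }
    // Scale into [0,1) by dividing by 2^32.
    return r * 0x1p-32;
}
```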

The Koksma–Hlawka inequality relates the discrepancy of a set of points used for integration to the error of an estimate of the integral of a function $f$. It is:

$$\left| \int f(x)\, \mathrm{d}x - \frac{1}{n} \sum_{i} f(x_i) \right| \le D_n(B, P)\, \mathcal{V}_f,$$

where $\mathcal{V}_f$ is the total variation of the function $f$ being integrated. It is defined as

$$\mathcal{V}_f = \sup_{0 = y_1 < y_2 < \cdots < y_m = 1} \; \sum_{i=1}^{m-1} \left| f(y_{i+1}) - f(y_i) \right|,$$

over all partitions of the $[0,1)$ domain at points $y_i$. In essence, the total variation measures how much the function's value can change between points, and the discrepancy measures how effective the points used for integration are at capturing the function's variation.
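A small numerical check makes the inequality concrete. The following standalone program, an illustrative sketch rather than pbrt code, integrates $f(x) = x^2$, which is monotonic and therefore has total variation $\mathcal{V}_f = 1$, using the midpoint points of Equation (8.14), whose star discrepancy is exactly $1/(2n)$, and compares the integration error against the Koksma–Hlawka bound.

```cpp
// Minimal sketch (not pbrt code): a numerical check of the Koksma-Hlawka
// inequality for f(x) = x^2 on [0,1), whose total variation is V_f = 1 (it is
// monotonic) and whose integral is 1/3, using the midpoint points
// x_i = (i - 1/2)/n, whose star discrepancy is exactly 1/(2n).
#include <cmath>
#include <cstdio>

int main() {
    for (int n = 4; n <= 256; n *= 4) {
        double sum = 0;
        for (int i = 1; i <= n; ++i) {
            double x = (i - 0.5) / n;
            sum += x * x;
        }
        double error = std::abs(1.0 / 3.0 - sum / n);
        double bound = (1.0 / (2.0 * n)) * 1.0;  // D_n^* times V_f
        std::printf("n = %3d  |error| = %.2e  <=  bound = %.2e\n",
                    n, error, bound);
    }
    return 0;
}
```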

Given the definition of low discrepancy from Equation (8.15), we can see from the Koksma–Hlawka inequality that for a fixed dimensionality $d$, the integration error with low-discrepancy points approaches $O(n^{-1})$, which is asymptotically much better than the $O(n^{-1/2})$ error from Monte Carlo integration (Section 2.1.4). Note also that these error bounds are asymptotic; in practice, QMC usually has an even better rate of convergence.

However, because QMC integration is deterministic, it is not possible to use variance as a measure of an estimator’s quality, though of course one can still compute the mean squared error. Alternatively, the sample points can be randomized using approaches that are carefully designed not to harm their discrepancy. We will see later in the chapter that randomization can even lead to improved rates of convergence. Such approaches are randomized quasi–Monte Carlo (RQMC) methods and again allow the use of variance. RQMC is the foundation of most of pbrt’s Monte Carlo integration algorithms.

In most of this text, we have glossed over the differences between Monte Carlo, QMC, and RQMC, and have localized the choice among them in the Samplers in this chapter. Doing so introduces the possibility of subtle errors if a Sampler generates quasi-random sample points that an Integrator then improperly uses as part of an implementation of an algorithm that is not suitable for quasi Monte Carlo, though none of the integrators described in the text do so.