Monte Carlo: Basics

2.1 Monte Carlo: Basics

Because Monte Carlo integration is based on randomization, we will start this chapter with a brief review of ideas from probability and statistics that provide the foundations of the approach. Doing so will allow us to introduce the basic Monte Carlo algorithm as well as mathematical tools for evaluating its error.

2.1.1 Background and Probability Review

We will start by defining some terms and reviewing basic ideas from probability. We assume that the reader is already familiar with basic probability concepts; readers needing a more complete introduction to this topic should consult a textbook such as Sheldon Ross’s Introduction to Probability Models (2002).

A random variable is a value chosen by some random process. We will generally use capital letters to denote random variables, with exceptions made for a few Greek symbols that represent special random variables. Random variables are always drawn from some domain, which can be either discrete (e.g., a fixed, finite set of possibilities) or continuous (e.g., the real numbers ). Applying a function to a random variable results in a new random variable .

For example, the result of a roll of a die is a discrete random variable sampled from the set of events . Each event has a probability , and the sum of probabilities is necessarily one. A random variable like this one that has the same probability for all potential values of it is said to be uniform. A function that gives a discrete random variable’s probability is termed a probability mass function (PMF), and so we could equivalently write in this case.

Two random variables are independent if the probability of one does not affect the probability of the other. In this case, the joint probability of two random variables is given by the product of their probabilities:

For example, two random variables representing random samples of the six sides of a die are independent.

For dependent random variables, one’s probability affects the other’s. Consider a bag filled with some number of black balls and some number of white balls. If we randomly choose two balls from the bag, the probability of the second ball being white is affected by the color of the first ball since its choice changes the number of balls of one type left in the bag. We will say that the second ball’s probability is conditioned on the choice of the first one. In this case, the joint probability for choosing two balls and is given by

(2.1)

where is the conditional probability of given a value of .

In the following, it will often be the case that a random variable’s probability is conditioned on many values; for example, when choosing a light source from which to sample illumination, the BVHLightSampler in Section 12.6.3 considers the 3D position of the receiving point and its surface normal, and so the choice of light is conditioned on them. However, we will often omit the variables that a random variable is conditioned on in cases where there are many of them and where enumerating them would obscure notation.

A particularly important random variable is the canonical uniform random variable, which we will write as . This variable takes on all values in its domain independently and with uniform probability. This particular variable is important for two reasons. First, it is easy to generate a variable with this distribution in software—most runtime libraries have a pseudo-random number generator that does just that. Second, we can take the canonical uniform random variable and map it to a discrete random variable, choosing if

(2.2)

For lighting applications, we might want to define the probability of sampling illumination from each light in the scene based on its power relative to the total power from all sources:

Notice that these values also sum to 1. Given such per-light probabilities, could be used to select a light source from which to sample illumination.

The cumulative distribution function (CDF) of a random variable is the probability that a value from the variable’s distribution is less than or equal to some value :

(2.3)

For the die example, , since two of the six possibilities are less than or equal to 2.

Continuous random variables take on values over ranges of continuous domains (e.g., the real numbers, directions on the unit sphere, or the surfaces of shapes in the scene). Beyond , another example of a continuous random variable is the random variable that ranges over the real numbers between 0 and 2, where the probability of its taking on any particular value is proportional to the value : it is twice as likely for this random variable to take on a value around 0 as it is to take one around 1, and so forth.

The probability density function (PDF) formalizes this idea: it describes the relative probability of a random variable taking on a particular value and is the continuous analog of the PMF. The PDF is the derivative of the random variable’s CDF,

For uniform random variables, is a constant; this is a direct consequence of uniformity. For we have

PDFs are necessarily nonnegative and always integrate to 1 over their domains. Note that their value at a point is not necessarily less than 1, however.

Given an interval in the domain, integrating the PDF gives the probability that a random variable lies inside the interval:

This follows directly from the first fundamental theorem of calculus and the definition of the PDF.

2.1.2 Expected Values

The expected value of a function is defined as the average value of the function over some distribution of values over its domain . It is defined as

(2.4)

As an example, consider finding the expected value of the cosine function between and , where is uniform. Because the PDF must integrate to 1 over the domain, , so

which is precisely the expected result. (Consider the graph of over to see why this is so.)

The expected value has a few useful properties that follow from its definition:

(2.5)

We will repeatedly use these properties in derivations in the following sections.

2.1.3 The Monte Carlo Estimator

We can now define the Monte Carlo estimator, which approximates the value of an arbitrary integral. Suppose that we want to evaluate a 1D integral . Given a supply of independent uniform random variables , the Monte Carlo estimator says that the expected value of the estimator

(2.6)

, is equal to the integral. This fact can be demonstrated with just a few steps. First, note that the PDF corresponding to the random variable must be equal to , since must not only be a constant but also integrate to 1 over the domain . Algebraic manipulation using the properties from Equations (2.4) and (2.5) then shows that

Extending this estimator to multiple dimensions or complex integration domains is straightforward: independent samples are taken from a uniform multidimensional PDF, and the estimator is applied in the same way. For example, consider the 3D integral

If samples are chosen uniformly from the cube from , then the PDF is the constant value

and the estimator is

The restriction to uniform random variables can be relaxed with a small generalization. This is an important step, since carefully choosing the PDF from which samples are drawn leads to a key technique for reducing error in Monte Carlo that will be introduced in Section 2.2.2. If the random variables are drawn from a PDF , then the estimator

(2.7)

can be used to estimate the integral instead. The only limitation on is that it must be nonzero for all where .

It is similarly not too hard to see that the expected value of this estimator is the desired integral of :

We can now understand the factor of in the implementation of the RandomWalkIntegrator: directions are uniformly sampled over the unit sphere, which has area . Because the PDF is normalized over the sampling domain, it must have the constant value . When the estimator of Equation (2.7) is applied, that value appears in the divisor.

With Monte Carlo, the number of samples can be chosen arbitrarily, regardless of the dimensionality of the integrand. This is another important advantage of Monte Carlo over traditional deterministic quadrature techniques, which typically require a number of samples that is exponential in the dimension.

2.1.4 Error in Monte Carlo Estimators

Showing that the Monte Carlo estimator converges to the right answer is not enough to justify its use; its rate of convergence is important too. Variance, the expected squared deviation of a function from its expected value, is a useful way to characterize Monte Carlo estimators’ convergence. The variance of an estimator is defined as

(2.8)

from which it follows that

This property and Equation (2.5) yield an alternative expression for the variance:

(2.9)

Thus, the variance is the expected value of the square minus the square of the expected value.

If the estimator is a sum of independent random variables (like the Monte Carlo estimator ), then the variance of the sum is the sum of the individual random variables’ variances:

(2.10)

From Equation (2.10) it is easy to show that variance decreases linearly with the number of samples . Because variance is squared error, the error in a Monte Carlo estimate therefore only goes down at a rate of in the number of samples. Although standard quadrature techniques converge at a faster rate in one dimension, their performance becomes exponentially worse as the dimensionality of the integrand increases, while Monte Carlo’s convergence rate is independent of the dimension, making Monte Carlo the only practical numerical integration algorithm for high-dimensional integrals.

The characteristic of Monte Carlo’s rate of error reduction is apparent when watching a progressive rendering of a scene where additional samples are incrementally taken in all pixels. The image improves rapidly for the first few samples when doubling the number of samples is relatively little additional work. Later on, once tens or hundreds of samples have been taken, each additional sample doubling takes much longer and remaining error in the image takes a long time to disappear.

The linear decrease in variance with increasing numbers of samples makes it easy to compare different Monte Carlo estimators. Consider two estimators, where the second has half the variance of the first but takes three times as long to compute an estimate; which of the two is better? In that case, the first is preferable: it could take three times as many samples in the time consumed by the second, in which case it would achieve a variance reduction. This concept can be encapsulated in the efficiency of an estimator , which is defined as

where is its variance and is the running time to compute its value.

Not all estimators of integrals have expected values that are equal to the value of the integral. Such estimators are said to be biased, where the difference

is the amount of bias. Biased estimators may still be desirable if they are able to get close to the correct result more quickly than unbiased estimators. Kalos and Whitlock (1986, pp. 36–37) gave the following example: consider the problem of computing an estimate of the mean value of a uniform distribution over the interval from 0 to 1. One could use the estimator

or one could use the biased estimator

The first estimator is unbiased but has variance with order . The second estimator’s expected value is

so it is biased, although its variance is , which is much better. This estimator has the useful property that its error goes to 0 in the limit as the number of samples goes to infinity; such estimators are consistent. Most of the Monte Carlo estimators used in pbrt are unbiased, with the notable exception of the SPPMIntegrator, which implements a photon mapping algorithm.

Closely related to the variance is the mean squared error (MSE), which is defined as the expectation of the squared difference of an estimator and the true value,

For an unbiased estimator, MSE is equal to the variance; otherwise it is the sum of variance and the squared bias of the estimator.

It is possible to work out the variance and MSE of some simple estimators in closed form, but for most of the ones of interest in rendering, this is not possible. Yet it is still useful to be able to quantify these values. For this purpose, the sample variance can be computed using a set of independent random variables . Equation (2.8) points at one way to compute the sample variance for a set of random variables . If the sample mean is computed as their average, , then the sample variance is

(2.11)

The division by rather than is Bessel’s correction, and ensures that the sample variance is an unbiased estimate of the variance. (See also Section B.2.11, where a numerically stable approach for computing the sample variance is introduced.)

The sample variance is itself an estimate of the variance, so it has variance itself. Consider, for example, a random variable that has a value of 1 99.99% of the time, and a value of one million 0.01% of the time. If we took ten random samples of it that all had the value 1, the sample variance would suggest that the random variable had zero variance even though its variance is actually much higher.

If an accurate estimate of the integral can be computed (for example, using a large number of samples), then the mean squared error can be estimated by

The imgtool utility program that is provided in pbrt’s distribution can compute an image’s MSE with respect to a reference image via its diff option.