## 2.1 Monte Carlo: Basics

Because Monte Carlo integration is based on randomization, we will start this chapter with a brief review of ideas from probability and statistics that provide the foundations of the approach. Doing so will allow us to introduce the basic Monte Carlo algorithm as well as mathematical tools for evaluating its error.

### 2.1.1 Background and Probability Review

We will start by defining some terms and reviewing basic ideas from
probability. We assume that the reader is already familiar with
basic probability concepts; readers needing a more complete introduction to this
topic should consult a textbook such as Sheldon Ross’s *Introduction to
Probability Models* (2002).

A *random variable* $X$ is a value chosen by some random process. We
will generally use capital letters to denote random variables, with
exceptions made for a few Greek symbols that represent special random
variables. Random variables are always drawn from some domain, which can
be either discrete (e.g., a fixed, finite set of possibilities) or
continuous (e.g., the real numbers $\mathbb{R}$). Applying a function $f$ to
a random variable $X$ results in a new random variable $Y = f(X)$.

For example, the result of a roll of a die is a discrete random variable
sampled from the set of events $X_i \in \{1, 2, 3, 4, 5, 6\}$. Each
event has a probability $p_i = \frac{1}{6}$, and the sum of probabilities
$\sum_i p_i$ is necessarily one. A random variable like this one that has
the same probability for all potential values is said to be
*uniform*. A function $p(X)$ that gives a discrete random variable’s
probability is termed a *probability mass function* (PMF), and so we
could equivalently write $p(X) = \frac{1}{6}$ in this case.

Two random variables are *independent* if the probability of one does
not affect the probability of the other. In this case, the *joint
probability* $p(X, Y)$ of two random variables is given by the product of
their probabilities:
$$
p(X, Y) = p(X)\,p(Y).
$$

For example, two random variables representing random samples of the six sides of a die are independent.

For *dependent* random variables, one’s probability affects the
other’s. Consider a bag filled with some number of black balls and some
number of white balls. If we randomly choose two balls from the bag, the
probability of the second ball being white is affected by the color of the
first ball since its choice changes the number of balls of one type left in
the bag. We will say that the second ball’s probability is
*conditioned* on the choice of the first one. In this case, the
joint probability for choosing two balls $X$ and $Y$ is given by
$$
p(X, Y) = p(X)\,p(Y \mid X),
$$
where $p(Y \mid X)$ is the *conditional probability* of $Y$ given a
value of $X$.

In the following, it will often be the case that a random variable’s
probability is conditioned on many values; for example, when choosing a
light source from which to sample illumination, the `BVHLightSampler`
in Section 12.6.3 considers the 3D position of
the receiving point and its surface normal, and so the choice of light is
conditioned on them. However, we will often omit the
variables that a random variable is conditioned on in cases where there are
many of them and where enumerating them would obscure notation.

A particularly important random variable is the *canonical uniform
random variable*, which we will write as $\xi$. This variable takes on
all values in its domain $[0, 1)$ independently and with uniform probability. This particular
variable is important for two reasons. First, it is easy to generate a
variable with this distribution in software—most runtime libraries have
a pseudo-random number generator that does just that.
Second, we can take the canonical uniform random variable $\xi$ and map
it to a discrete random variable, choosing $X_i$ if
$$
\sum_{j=1}^{i-1} p_j \le \xi < \sum_{j=1}^{i} p_j.
\tag{2.1}
$$

For lighting applications, we might want to define the probability of sampling illumination from each light in the scene based on its power $\Phi_i$ relative to the total power from all sources:
$$
p_i = \frac{\Phi_i}{\sum_j \Phi_j}.
\tag{2.2}
$$

Notice that these probabilities also sum to 1. Given such per-light probabilities, $\xi$ could be used to select a light source from which to sample illumination.

The *cumulative distribution function* (CDF) $P(x)$ of a random
variable is the probability that a value from the variable’s distribution is less
than or equal to some value $x$:
$$
P(x) = \Pr\{X \le x\}.
$$

For the die example, $P(2) = \frac{1}{3}$, since two of the six possibilities are less than or equal to 2.

*Continuous random variables* take on values over ranges of continuous
domains (e.g., the real numbers, directions on the unit sphere, or the
surfaces of shapes in the scene). Beyond $\xi$, another example of a
continuous random variable is the
random variable that ranges over the real numbers between 0 and 2, where
the probability of its taking on any particular value $x$ is proportional
to the value $2 - x$: it is twice as likely for this random variable to take
on a value around 0 as it is to take one around 1, and so forth.

The *probability density function* (PDF) formalizes this idea: it
describes the relative probability of a random variable taking on a
particular value and is the continuous analog of the PMF. The PDF
$p(x)$ is the derivative of the random variable’s CDF,
$$
p(x) = \frac{\mathrm{d}P(x)}{\mathrm{d}x}.
$$
For uniform random variables, $p(x)$ is a constant; this is a direct consequence of uniformity. For $\xi$ we have
$$
p(x) = \begin{cases} 1 & x \in [0, 1) \\ 0 & \text{otherwise.} \end{cases}
$$

PDFs are necessarily nonnegative and always integrate to 1 over their
domains. Note that their value at a point $x$ is *not* necessarily
less than 1, however.

Given an interval $[a, b]$ in the domain, integrating the PDF gives the probability that a random variable lies inside the interval:
$$
P(x \in [a, b]) = \int_a^b p(x)\,\mathrm{d}x.
$$

This follows directly from the first fundamental theorem of calculus and the definition of the PDF.

### 2.1.2 Expected Values

The *expected value* $E_p[f(x)]$ of a function $f$ is defined as the
average value of the function over some distribution of values $p(x)$ over its
domain $D$. It is defined as
$$
E_p[f(x)] = \int_D f(x)\,p(x)\,\mathrm{d}x.
\tag{2.3}
$$

As an example, consider finding the expected value of the cosine function between $0$ and $\pi$, where $p$ is uniform. Because the PDF must integrate to 1 over the domain, $p(x) = 1/\pi$, so
$$
E[\cos x] = \int_0^\pi \frac{\cos x}{\pi}\,\mathrm{d}x
          = \frac{1}{\pi}\left(\sin \pi - \sin 0\right) = 0,
$$

which is precisely the expected result. (Consider the graph of $\cos x$ over $[0, \pi]$ to see why this is so.)

The expected value has a few useful properties that follow from its definition:
$$
E[a\,f(x)] = a\,E[f(x)];
\tag{2.4}
$$
$$
E\left[\sum_i f(X_i)\right] = \sum_i E[f(X_i)].
\tag{2.5}
$$

We will repeatedly use these properties in derivations in the following sections.

### 2.1.3 The Monte Carlo Estimator

We can now define the Monte Carlo estimator, which approximates the value of an arbitrary integral. Suppose that we want to evaluate a 1D integral $\int_a^b f(x)\,\mathrm{d}x$. Given a supply of independent uniform random variables $X_i \in [a, b]$, the Monte Carlo estimator says that the expected value of the estimator
$$
F_n = \frac{b-a}{n} \sum_{i=1}^{n} f(X_i),
\tag{2.6}
$$
$E[F_n]$, is equal to the integral. This fact can be demonstrated with just a few steps. First, note that the PDF $p(x)$ corresponding to the random variable $X_i$ must be equal to $1/(b-a)$, since $p$ must not only be a constant but also integrate to 1 over the domain $[a, b]$. Algebraic manipulation using the properties from Equations (2.4) and (2.5) then shows that
$$
\begin{aligned}
E[F_n] &= E\left[\frac{b-a}{n} \sum_{i=1}^{n} f(X_i)\right] \\
       &= \frac{b-a}{n} \sum_{i=1}^{n} E[f(X_i)] \\
       &= \frac{b-a}{n} \sum_{i=1}^{n} \int_a^b f(x)\,p(x)\,\mathrm{d}x \\
       &= \frac{1}{n} \sum_{i=1}^{n} \int_a^b f(x)\,\mathrm{d}x \\
       &= \int_a^b f(x)\,\mathrm{d}x.
\end{aligned}
$$

Extending this estimator to multiple dimensions or complex integration domains is straightforward: independent samples $X_i$ are taken from a uniform multidimensional PDF, and the estimator is applied in the same way. For example, consider the 3D integral
$$
\int_{z_0}^{z_1} \int_{y_0}^{y_1} \int_{x_0}^{x_1} f(x, y, z)\,\mathrm{d}x\,\mathrm{d}y\,\mathrm{d}z.
$$

If samples $X_i = (x_i, y_i, z_i)$ are chosen uniformly from the box $[x_0, x_1] \times [y_0, y_1] \times [z_0, z_1]$, then the PDF $p(X)$ is the constant value
$$
\frac{1}{(x_1 - x_0)(y_1 - y_0)(z_1 - z_0)},
$$
and the estimator is
$$
F_n = \frac{(x_1 - x_0)(y_1 - y_0)(z_1 - z_0)}{n} \sum_{i=1}^{n} f(X_i).
$$

The restriction to uniform random variables can be relaxed with a small generalization. This is an important step, since carefully choosing the PDF from which samples are drawn leads to a key technique for reducing error in Monte Carlo that will be introduced in Section 2.2.2. If the random variables $X_i$ are drawn from a PDF $p(x)$, then the estimator
$$
F_n = \frac{1}{n} \sum_{i=1}^{n} \frac{f(X_i)}{p(X_i)}
\tag{2.7}
$$
can be used to estimate the integral instead. The only limitation on $p(x)$ is that it must be nonzero for all $x$ where $|f(x)| > 0$.

It is similarly not too hard to see that the expected value of this estimator is the desired integral of $f$:
$$
\begin{aligned}
E[F_n] &= E\left[\frac{1}{n} \sum_{i=1}^{n} \frac{f(X_i)}{p(X_i)}\right] \\
       &= \frac{1}{n} \sum_{i=1}^{n} \int_a^b \frac{f(x)}{p(x)}\,p(x)\,\mathrm{d}x \\
       &= \frac{1}{n} \sum_{i=1}^{n} \int_a^b f(x)\,\mathrm{d}x \\
       &= \int_a^b f(x)\,\mathrm{d}x.
\end{aligned}
$$

We can now understand the factor of $1/(4\pi)$ in the implementation of the
`RandomWalkIntegrator`: directions are uniformly sampled over the unit
sphere, which has area $4\pi$. Because the PDF is normalized over the
sampling domain, it must have the constant value $1/(4\pi)$. When the
estimator of Equation (2.7) is applied, that value
appears in the divisor.

With Monte Carlo, the number of samples can be chosen arbitrarily, regardless of the dimensionality of the integrand. This is another important advantage of Monte Carlo over traditional deterministic quadrature techniques, which typically require a number of samples that is exponential in the dimension.

### 2.1.4 Error in Monte Carlo Estimators

Showing that the Monte Carlo estimator converges to the right answer is not
enough to justify its use; its rate of convergence is important too.
*Variance*, the expected squared deviation of a function
from its expected value, is a useful way to characterize Monte Carlo
estimators’ convergence.
The variance of an estimator $F$ is defined as
$$
V[F] \equiv E\left[\left(F - E[F]\right)^2\right],
\tag{2.8}
$$
from which it follows that
$$
V[F] = E[F^2] - E[F]^2.
\tag{2.9}
$$
Thus, the variance is the expected value of the square minus the square of the expected value.

If the estimator is a sum of independent random variables (like the Monte Carlo estimator $F_n$), then the variance of the sum is the sum of the individual random variables’ variances:
$$
V\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} V[X_i].
\tag{2.10}
$$

From Equation (2.10) it is easy to show that variance decreases linearly with the number of samples $n$. Because variance is squared error, the error in a Monte Carlo estimate therefore only goes down at a rate of $O(n^{-1/2})$ in the number of samples. Although standard quadrature techniques converge at a faster rate in one dimension, their performance becomes exponentially worse as the dimensionality of the integrand increases, while Monte Carlo’s convergence rate is independent of the dimension, making Monte Carlo the only practical numerical integration algorithm for high-dimensional integrals.

The $O(n^{-1/2})$ characteristic of Monte Carlo’s rate of error reduction is apparent when watching a progressive rendering of a scene where additional samples are incrementally taken in all pixels. The image improves rapidly for the first few samples, when doubling the number of samples is relatively little additional work. Later on, once tens or hundreds of samples have been taken, each additional doubling of the sample count takes much longer, and the remaining error in the image takes a long time to disappear.

The linear decrease in variance with increasing numbers of samples
makes it easy to compare different Monte Carlo estimators. Consider two
estimators, where the second has half the variance of the first but takes
three times as long to compute an estimate; which of the two is better? In
that case, the first is preferable: it could
take three times as many samples in the time consumed by the second, in
which case it would achieve a $3\times$ variance reduction.
This concept can be encapsulated in the *efficiency* of an estimator
$F$, which is defined as
$$
\epsilon[F] = \frac{1}{V[F]\,T[F]},
\tag{2.11}
$$
where $V[F]$ is its variance and $T[F]$ is the running time to compute its value.

Not all estimators of integrals have expected values that are equal to
the value of the integral. Such estimators are said to be *biased*, where
the difference
$$
\beta = E[F] - \int f(x)\,\mathrm{d}x
$$
is the amount of bias. Biased estimators may still be desirable if they are able to get close to the correct result more quickly than unbiased estimators. Kalos and Whitlock (1986, pp. 36–37) gave the following example: consider the problem of computing an estimate of the mean value of a uniform distribution over the interval from 0 to 1. One could use the estimator
$$
F = \frac{1}{n} \sum_{i=1}^{n} X_i,
$$
or one could use the biased estimator
$$
F = \frac{1}{2} \max(X_1, X_2, \ldots, X_n).
$$

The first estimator is unbiased but has variance with order $O(n^{-1})$. The second estimator’s expected value is
$$
E\left[\frac{1}{2} \max(X_1, X_2, \ldots, X_n)\right] = \frac{1}{2} \cdot \frac{n}{n+1} \neq \frac{1}{2},
$$
so it is biased, although its variance is $O(n^{-2})$, which is much better.
This estimator has the useful property that its error goes to 0 in
the limit as the number of samples goes to infinity; such estimators
are *consistent*.
Most of the Monte Carlo estimators used in `pbrt` are unbiased, with the
notable exception of the `SPPMIntegrator`, which implements a photon
mapping algorithm.

Closely related to the variance is the *mean squared error* (MSE),
which is defined as the expectation of the squared difference of an
estimator and the true value,
$$
\mathrm{MSE}[F] \equiv E\left[\left(F - \int f(x)\,\mathrm{d}x\right)^2\right].
$$

For an unbiased estimator, the MSE is equal to the variance; otherwise it is the sum of the variance and the squared bias of the estimator.

It is possible to work out the variance and MSE of some simple estimators in
closed form, but for most of the ones of interest in rendering, this is not
possible. Yet it is still useful to be able to quantify these values.
For this purpose, the *sample variance* can be computed using a set of
independent random variables $X_i$.
Equation (2.8) points at one way to compute the sample
variance for a set of random variables $X_i$. If the *sample mean* is
computed as their average, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, then the sample variance
is
$$
S^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2.
$$

The division by $n-1$ rather than $n$ is *Bessel’s correction*,
and ensures that the sample variance is an unbiased estimate of the
variance. (See also Section B.2.11, where a numerically
stable approach for computing the sample variance is introduced.)

The sample variance is itself an estimate of the variance, so it has variance itself. Consider, for example, a random variable that has a value of 1 99.99% of the time, and a value of one million 0.01% of the time. If we took ten random samples of it that all had the value 1, the sample variance would suggest that the random variable had zero variance even though its variance is actually much higher.

If an accurate estimate of the true value of the integral $\tilde{F} \approx \int f(x)\,\mathrm{d}x$ can be computed (for example, using a large number of samples), then the mean squared error can be estimated by
$$
\mathrm{MSE}[F] \approx \frac{1}{n} \sum_{i=1}^{n} \left(F(X_i) - \tilde{F}\right)^2.
$$

The `imgtool` utility program that is provided in `pbrt`’s distribution
can compute an image’s MSE with respect to a reference image via its
`diff` option.