2.2 Improving Efficiency

Given an unbiased Monte Carlo estimator, we are in the fortunate position of having a reliable relationship between the number of samples taken and variance (and thus, error). If we have an unacceptably noisy rendered image, increasing the number of samples will reduce error in a predictable way, and—given enough computation—an image of sufficient quality can be generated.

However, computation takes time, and often there is not enough of it. The deadline for a movie may be at hand, or the sixtieth-of-a-second time slice in a real-time renderer may be coming to an end. Given the consequently limited number of samples, the only option for variance reduction is to make the most of the samples that can be taken. Fortunately, a variety of techniques have been developed to do exactly that with the basic Monte Carlo estimator; here we will discuss the most important ones that are used in pbrt.

2.2.1 Stratified Sampling

A classic and effective family of techniques for variance reduction is based on the careful placement of samples in order to better capture the features of the integrand (or, more accurately, to be less likely to miss important features). These techniques are used extensively in pbrt. Stratified sampling decomposes the integration domain into regions and places samples in each one; here we will analyze that approach in terms of its variance reduction properties. Later, in Section 8.2.1, we will return with machinery based on Fourier analysis that provides further insights about it.

Stratified sampling subdivides the integration domain $\Lambda$ into $n$ nonoverlapping regions $\Lambda_1, \Lambda_2, \ldots, \Lambda_n$. Each region is called a stratum, and they must completely cover the original domain:

$$\bigcup_{i=1}^{n} \Lambda_i = \Lambda.$$

To draw samples from $\Lambda$, we will draw $n_i$ samples from each $\Lambda_i$, according to densities $p_i$ inside each stratum. A simple example is supersampling a pixel. With stratified sampling, the area around a pixel is divided into a $k \times k$ grid, and a sample is drawn uniformly within each grid cell. This is better than taking $k^2$ random samples, since the sample locations are less likely to clump together. Here we will show why this technique reduces variance.
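
To make the pixel example concrete, here is a short standalone C++ sketch (not pbrt code; the StratifiedGrid name and the use of std::mt19937 are illustrative assumptions) that generates one uniformly jittered sample in each cell of a $k \times k$ grid over the unit square:

#include <random>
#include <vector>

struct Point2 { double x, y; };

// Generate k*k stratified samples over [0,1)^2: one uniform sample
// inside each cell of a k x k grid, so samples cannot clump within
// a single cell.
std::vector<Point2> StratifiedGrid(int k, std::mt19937 &rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    std::vector<Point2> samples;
    samples.reserve(k * k);
    for (int i = 0; i < k; ++i)
        for (int j = 0; j < k; ++j)
            // Jitter within cell (i, j); each cell has extent 1/k.
            samples.push_back({(i + u(rng)) / k, (j + u(rng)) / k});
    return samples;
}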

Within a single stratum $\Lambda_i$, the Monte Carlo estimate is

$$F_i = \frac{1}{n_i} \sum_{j=1}^{n_i} \frac{f(X_{i,j})}{p_i(X_{i,j})},$$

where $X_{i,j}$ is the $j$th sample drawn from density $p_i$. The overall estimate is $F = \sum_i v_i F_i$, where $v_i$ is the fractional volume of stratum $i$ ($v_i \in (0, 1]$).

The true value of the integrand in stratum $i$ is

$$\mu_i = E[f(X_{i,j})] = \frac{1}{v_i} \int_{\Lambda_i} f(x) \, \mathrm{d}x,$$

and the variance in this stratum is

$$\sigma_i^2 = \frac{1}{v_i} \int_{\Lambda_i} \left( f(x) - \mu_i \right)^2 \, \mathrm{d}x.$$

Thus, with $n_i$ samples in the stratum, the variance of the per-stratum estimator is $\sigma_i^2 / n_i$. Because the per-stratum estimators are independent, the variance of the overall estimator is

$$\begin{aligned}
V[F] &= V\left[\sum v_i F_i\right] \\
     &= \sum V[v_i F_i] \\
     &= \sum v_i^2 V[F_i] \\
     &= \sum \frac{v_i^2 \sigma_i^2}{n_i}.
\end{aligned}$$

If we make the reasonable assumption that the number of samples $n_i$ is proportional to the volume $v_i$, then we have $n_i = v_i n$, and the variance of the overall estimator is

$$V[F_n] = \frac{1}{n} \sum v_i \sigma_i^2.$$

To compare this result to the variance without stratification, we note that choosing an unstratified sample is equivalent to choosing a random stratum $I$ according to the discrete probability distribution defined by the volumes $v_i$ and then choosing a random sample $X$ in $\Lambda_I$. In this sense, $X$ is chosen conditionally on $I$, so it can be shown using conditional probability that

$$V[F] = \frac{1}{n} \left[ \sum v_i \sigma_i^2 + \sum v_i \left( \mu_i - Q \right)^2 \right], \tag{2.12}$$

where $Q$ is the mean of $f$ over the whole domain $\Lambda$.

There are two things to notice about Equation (2.12). First, we know that the right-hand sum must be nonnegative, since variance is always nonnegative. Second, it demonstrates that stratified sampling can never increase variance. Stratification always reduces variance unless the right-hand sum is exactly 0. It can only be 0 when the function $f$ has the same mean over each stratum $\Lambda_i$. For stratified sampling to work best, we would like to maximize the right-hand sum, so it is best to make the strata have means that are as unequal as possible. This explains why compact strata are desirable if one does not know anything about the function $f$. If the strata are wide, they will contain more variation and will have $\mu_i$ closer to the true mean $Q$.
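
This variance reduction is easy to verify numerically. The following hedged C++ sketch (the setup is an illustrative assumption, not from pbrt) estimates $\int_0^1 x^2 \, \mathrm{d}x = 1/3$ many times, both with independent uniform samples and with stratified samples (one per stratum, so $v_i = 1/n$), and reports the empirical variance of each estimator:

#include <cstdio>
#include <random>

int main() {
    std::mt19937 rng(1234);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    auto f = [](double x) { return x * x; };
    const int n = 16, trials = 100000;
    double mean[2] = {0, 0}, meanSq[2] = {0, 0};
    for (int t = 0; t < trials; ++t) {
        double uniformSum = 0, stratSum = 0;
        for (int i = 0; i < n; ++i) {
            uniformSum += f(u(rng));            // independent samples
            stratSum   += f((i + u(rng)) / n);  // one sample per stratum
        }
        double est[2] = {uniformSum / n, stratSum / n};
        for (int k = 0; k < 2; ++k) {
            mean[k] += est[k];
            meanSq[k] += est[k] * est[k];
        }
    }
    for (int k = 0; k < 2; ++k) {
        double m = mean[k] / trials;
        std::printf("%s: mean %.5f, variance %.3g\n",
                    k == 0 ? "uniform" : "stratified",
                    m, meanSq[k] / trials - m * m);
    }
}

Because $x^2$ has a different mean in each stratum, the stratified estimator's measured variance comes out far lower here, consistent with Equation (2.12).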

Figure 2.1 shows the effect of using stratified sampling versus an independent random distribution for sampling when rendering an image that includes glossy reflection. There is a reasonable reduction in variance at essentially no cost in running time.

Figure 2.1: Variance is higher and the image noisier (a) when independent random sampling is used than (b) when a stratified distribution of sample directions is used instead. (Bunny model courtesy of the Stanford Computer Graphics Laboratory.)

The main downside of stratified sampling is that it suffers from the same “curse of dimensionality” as standard numerical quadrature. Full stratification in $D$ dimensions with $S$ strata per dimension requires $S^D$ samples, which quickly becomes prohibitive. Fortunately, it is often possible to stratify some of the dimensions independently and then randomly associate samples from different dimensions; this approach will be used in Section 8.5. Choosing which dimensions are stratified should be done in a way that stratifies dimensions that tend to be most highly correlated in their effect on the value of the integrand (Owen 1998).

2.2.2 Importance Sampling

Importance sampling is a powerful variance reduction technique that exploits the fact that the Monte Carlo estimator

$$F_n = \frac{1}{n} \sum_{i=1}^{n} \frac{f(X_i)}{p(X_i)}$$

converges more quickly if the samples are taken from a distribution $p(x)$ that is similar to the function $f(x)$ in the integrand. In this case, samples are more likely to be taken when the magnitude of the integrand is relatively large. Importance sampling is one of the most frequently used variance reduction techniques in rendering, since it is easy to apply and is very effective when good sampling distributions are used.

To see why such sampling distributions reduce error, first consider the effect of using a distribution $p(x) \propto f(x)$, or $p(x) = c f(x)$. It is trivial to show that normalization of the PDF requires that

$$c = \frac{1}{\int f(x) \, \mathrm{d}x}.$$

Finding such a PDF requires that we know the value of the integral, which is what we were trying to estimate in the first place. Nonetheless, if we could sample from this distribution, each term of the sum in the estimator would have the value

$$\frac{f(X_i)}{p(X_i)} = \frac{1}{c} = \int f(x) \, \mathrm{d}x.$$

The variance of the estimator is zero! Of course, this is ludicrous since we would not bother using Monte Carlo if we could integrate $f$ directly. However, if a density $p(x)$ can be found that is similar in shape to $f(x)$, variance is reduced.

As a more realistic example, consider the Gaussian function $f(x) = e^{-1000(x - 1/2)^2}$, which is plotted in Figure 2.2(a) over $[0, 1]$. Its value is close to zero over most of the domain. Samples $X$ with $X < 0.45$ or $X > 0.55$ are of little help in estimating the value of the integral since they give no information about the magnitude of the bump in the function’s value around $1/2$. With uniform sampling and the basic Monte Carlo estimator, variance is approximately 0.0365.

If samples are instead drawn from the piecewise-constant distribution

$$p(x) = \begin{cases} 0.1 & x \in [0, 0.45) \\ 9.1 & x \in [0.45, 0.55) \\ 0.1 & x \in [0.55, 1), \end{cases}$$

which is plotted in Figure 2.2(b), and the estimator from Equation (2.7) is used instead, then variance is reduced by a factor of approximately 6.7. A representative set of 6 points from this distribution is shown in Figure 2.2(c); we can see that most of the evaluations of $f(x)$ are in the interesting region where it is not nearly zero.

Figure 2.2: (a) A narrow Gaussian function that is close to zero over most of the range $[0, 1]$. The basic Monte Carlo estimator of Equation (2.6) has relatively high variance if it is used to integrate this function, since most samples have values that are close to zero. (b) A PDF that roughly approximates the function’s distribution. If this PDF is used to generate samples, variance is reduced substantially. (c) A representative distribution of samples generated according to (b).
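
This example can be reproduced with a short experiment. The hedged C++ sketch below (standalone, not pbrt code) samples the piecewise-constant PDF above by inverting its CDF and applies the estimator $f(X)/p(X)$; the printed variance can be compared against the roughly 0.0365 obtained with uniform sampling:

#include <cmath>
#include <cstdio>
#include <random>

int main() {
    std::mt19937 rng(7);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    auto f = [](double x) { return std::exp(-1000 * (x - 0.5) * (x - 0.5)); };
    // The piecewise-constant PDF from the text.
    auto pdf = [](double x) { return (x >= 0.45 && x < 0.55) ? 9.1 : 0.1; };
    // Invert the CDF: P(0.45) = 0.045 and P(0.55) = 0.955.
    auto sample = [](double xi) {
        if (xi < 0.045) return xi / 0.1;
        if (xi < 0.955) return 0.45 + (xi - 0.045) / 9.1;
        return 0.55 + (xi - 0.955) / 0.1;
    };
    const int n = 1000000;
    double mean = 0, meanSq = 0;
    for (int i = 0; i < n; ++i) {
        double x = sample(u(rng));
        double est = f(x) / pdf(x);  // single-sample estimator f(X)/p(X)
        mean += est;
        meanSq += est * est;
    }
    mean /= n;
    std::printf("estimate %.5f, variance %.5f\n",
                mean, meanSq / n - mean * mean);
}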

Importance sampling can increase variance if a poorly chosen distribution is used, however. Consider instead using the distribution

$$p(x) = \begin{cases} 1.2 & x \in [0, 0.4) \\ 0.2 & x \in [0.4, 0.6) \\ 1.2 & x \in [0.6, 1), \end{cases}$$

for estimating the integral of the Gaussian function. This PDF increases the probability of sampling the function where its value is close to zero and decreases the probability of sampling it where its magnitude is larger.

Not only does this PDF generate fewer samples where the integrand is large, but when it does, the magnitude of $f(x)/p(x)$ in the Monte Carlo estimator will be especially high since $p(x) = 0.2$ in that region. The result is approximately 5.4 times higher variance than uniform sampling, and nearly 36 times higher variance than the better PDF above. In the context of Monte Carlo integration for rendering, where evaluating the integrand generally involves the expense of tracing a ray, it is desirable to minimize the number of samples taken; using an inferior sampling distribution and making up for it by evaluating more samples is an unappealing option.

2.2.3 Multiple Importance Sampling

We are frequently faced with integrals that are the product of two or more functions: $\int f_a(x) f_b(x) \, \mathrm{d}x$. It is often possible to derive separate importance sampling strategies for the individual factors, though not one that is similar to their product. This situation is especially common in the integrals involved with light transport, such as in the product of BSDF, incident radiance, and a cosine factor in the light transport equation (1.1).

To understand the challenges involved with applying Monte Carlo to such products, assume for now the good fortune of having two sampling distributions $p_a$ and $p_b$ that match the distributions of $f_a$ and $f_b$ exactly. (In practice, this will not normally be the case.) With the Monte Carlo estimator of Equation (2.7), we have two options: we might draw samples using $p_a$, which gives the estimator

$$\frac{f(X)}{p_a(X)} = \frac{f_a(X) f_b(X)}{p_a(X)} = c \, f_b(X),$$

where $c$ is a constant equal to the integral of $f_a$, since $p_a(x) \propto f_a(x)$. The variance of this estimator is proportional to the variance of $f_b$, which may itself be high. Conversely, we might sample from $p_b$, though doing so gives us an estimator with variance proportional to the variance of $f_a$, which may similarly be high. In the more common case where the sampling distributions only approximately match one of the factors, the situation is usually even worse.

Unfortunately, the obvious solution of taking some samples from each distribution and averaging the two estimators is not much better. Because variance is additive, once variance has crept into an estimator, we cannot eliminate it by adding it to another low-variance estimator.

Multiple importance sampling (MIS) addresses exactly this issue, with an easy-to-implement variance reduction technique. The basic idea is that, when estimating an integral, we should draw samples from multiple sampling distributions, chosen in the hope that at least one of them will match the shape of the integrand reasonably well, even if we do not know which one this will be. MIS then provides a method to weight the samples from each technique that can eliminate large variance spikes due to mismatches between the integrand’s value and the sampling density. Specialized sampling routines that only account for unusual special cases are even encouraged, as they reduce variance when those cases occur, with relatively little cost in general.

With two sampling distributions $p_a$ and $p_b$ and a single sample taken from each one, $X \sim p_a$ and $Y \sim p_b$, the MIS Monte Carlo estimator is

$$w_a(X) \frac{f(X)}{p_a(X)} + w_b(Y) \frac{f(Y)}{p_b(Y)}, \tag{2.13}$$

where $w_a$ and $w_b$ are weighting functions chosen such that the expected value of this estimator is the value of the integral of $f(x)$.

More generally, given $n$ sampling distributions $p_i$ with $n_i$ samples $X_{i,j}$ taken from the $i$th distribution, the MIS Monte Carlo estimator is

$$F_n = \sum_{i=1}^{n} \frac{1}{n_i} \sum_{j=1}^{n_i} w_i(X_{i,j}) \frac{f(X_{i,j})}{p_i(X_{i,j})}.$$

(The full set of conditions on the weighting functions for the estimator to be unbiased are that they sum to 1 when $f(x) \neq 0$, $\sum_{i=1}^{n} w_i(x) = 1$, and that $w_i(x) = 0$ if $p_i(x) = 0$.)

Setting $w_i(X) = 1/n$ corresponds to the case of averaging the individual estimators, which we have already seen is an ineffective way to reduce variance. It would be better if the weighting functions were relatively large when the corresponding sampling technique was a good match to the integrand and relatively small when it was not, thus reducing the contribution of high-variance samples.

In practice, a good choice for the weighting functions is given by the balance heuristic, which attempts to fulfill this goal by taking into account all the different ways that a sample could have been generated, rather than just the particular one that was used to do so. The balance heuristic’s weighting function for the $i$th sampling technique is

$$w_i(x) = \frac{n_i p_i(x)}{\sum_j n_j p_j(x)}. \tag{2.14}$$

With the balance heuristic and our example of taking a single sample from each of two sampling techniques, the estimator of Equation (2.13) works out to be

$$\frac{f(X)}{p_a(X) + p_b(X)} + \frac{f(Y)}{p_a(Y) + p_b(Y)}.$$

Each evaluation of $f$ is divided by the sum of all PDFs for the corresponding sample rather than just the one that generated the sample. Thus, if $p_a$ generates a sample with low probability at a point where $p_b$ has a higher probability, then dividing by $p_a(X) + p_b(X)$ reduces the sample’s contribution. Effectively, such samples are downweighted when sampled from $p_a$, recognizing that the sampling technique associated with $p_b$ is more effective at the corresponding point in the integration domain. As long as just one of the sampling techniques has a reasonable probability of sampling a point where the function’s value is large, the MIS weights can lead to a significant reduction in variance.

BalanceHeuristic() computes Equation (2.14) for the specific case of two distributions $p_a$ and $p_b$. We will not need a more general multidistribution case in pbrt.

<<Sampling Inline Functions>>= 
Float BalanceHeuristic(int nf, Float fPdf, int ng, Float gPdf) {
    return (nf * fPdf) / (nf * fPdf + ng * gPdf);
}
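
To show how this function might be used, here is a hedged sketch of the two-sample estimator of Equation (2.13) with balance heuristic weights; MISEstimate and its callable parameters (standing in for, e.g., BSDF and light sampling routines that return a 1D sample) are illustrative assumptions, not part of pbrt:

// Hedged sketch: a single sample from each of two techniques, weighted
// by the balance heuristic. Each weight accounts for both ways the
// corresponding sample could have been generated.
template <typename F, typename SA, typename PA, typename SB, typename PB>
Float MISEstimate(F f, SA sampleA, PA pdfA, SB sampleB, PB pdfB) {
    Float X = sampleA(), Y = sampleB();
    Float wX = BalanceHeuristic(1, pdfA(X), 1, pdfB(X));
    Float wY = BalanceHeuristic(1, pdfB(Y), 1, pdfA(Y));
    return wX * f(X) / pdfA(X) + wY * f(Y) / pdfB(Y);
}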

In practice, the power heuristic often reduces variance even further. For an exponent $\beta$, the power heuristic is

$$w_i(x) = \frac{\left( n_i p_i(x) \right)^\beta}{\sum_j \left( n_j p_j(x) \right)^\beta}.$$

Note that the power heuristic has a similar form to the balance heuristic, though it further reduces the contribution of relatively low probabilities. Our implementation hard-codes $\beta = 2$; that parameter value usually works well in practice.

<<Sampling Inline Functions>>+=  
Float PowerHeuristic(int nf, Float fPdf, int ng, Float gPdf) {
    Float f = nf * fPdf, g = ng * gPdf;
    return Sqr(f) / (Sqr(f) + Sqr(g));
}

Multiple importance sampling can be applied even without sampling from all the distributions. This approach is known as the single sample model. We will not include the derivation here, but it can be shown that given an integrand $f(x)$, if a sampling technique $p_i$ is chosen from a set of techniques with probability $q_i$ and a sample $X$ is drawn from $p_i$, then the single sample estimator

$$\frac{w_i(X)}{q_i} \frac{f(X)}{p_i(X)}$$

gives an unbiased estimate of the integral. For the single sample model, the balance heuristic is provably optimal.
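
As a hedged illustration (not pbrt's API; the callable parameters are hypothetical, and u is a uniform random number used to choose the technique), the single sample estimator for two techniques with balance heuristic weights might look like the following. With those weights, $(w_i(X)/q_i)\,f(X)/p_i(X)$ simplifies to $f(X)$ divided by the mixture PDF:

// Hedged sketch of the single sample model with two techniques:
// choose technique a with probability qA, sample from it, and divide
// f(X) by the mixture density qA * pA(X) + (1 - qA) * pB(X), which is
// what the balance heuristic weights reduce to in this case.
template <typename F, typename SA, typename PA, typename SB, typename PB>
Float SingleSampleMIS(F f, SA sampleA, PA pdfA, SB sampleB, PB pdfB,
                      Float qA, Float u) {
    Float X = (u < qA) ? sampleA() : sampleB();
    return f(X) / (qA * pdfA(X) + (1 - qA) * pdfB(X));
}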

One shortcoming of multiple importance sampling is that if one of the sampling techniques is a very good match to the integrand, MIS can slightly increase variance. For rendering applications, MIS is almost always worthwhile for the variance reduction it provides in cases that can otherwise have high variance.

MIS Compensation

Multiple importance sampling is generally applied using probability distributions that are all individually valid for importance sampling the integrand, with nonzero probability of generating a sample anywhere that the integrand is nonzero. However, when MIS is being used, it is not a requirement that all PDFs are nonzero where the function’s value is nonzero; only one of them must be.

This observation led to the development of a technique called MIS compensation, which can further reduce variance. It is motivated by the fact that if all the sampling distributions allocate some probability to sampling regions where the integrand’s value is small, those regions often end up being oversampled, leaving the regions where the integrand is high undersampled.

MIS compensation is based on the idea of sharpening one or more (but not all) of the probability distributions, for example by adjusting them to have zero probability in areas where they earlier had low probability. A new sampling distribution $p'$ can, for example, be defined by

$$p'(x) = \frac{\max(0, p(x) - \delta)}{\int \max(0, p(x) - \delta) \, \mathrm{d}x},$$

for some fixed value $\delta$.

This technique is especially easy to apply in the case of tabularized sampling distributions. In Section 12.5, it is used to good effect for sampling environment map light sources.
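
For a tabulated distribution, compensation amounts to subtracting $\delta$ from each bin, clamping at zero, and renormalizing. A hedged sketch, assuming a piecewise-constant PDF with equal-width bins over $[0, 1]$ and a $\delta$ small enough that some density survives:

#include <algorithm>
#include <vector>

// Apply MIS compensation to a tabulated piecewise-constant PDF over
// [0,1]: clamp p(x) - delta at zero, then renormalize so the result
// integrates to one. Each of the n equal-width bins has width 1/n.
std::vector<double> CompensatePdf(std::vector<double> pdf, double delta) {
    double integral = 0;
    for (double &p : pdf) {
        p = std::max(0.0, p - delta);
        integral += p / pdf.size();
    }
    for (double &p : pdf)
        p /= integral;  // assumes delta left some nonzero density
    return pdf;
}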

2.2.4 Russian Roulette

Russian roulette is a technique that can improve the efficiency of Monte Carlo estimates by skipping the evaluation of samples that would make a small contribution to the final result. In rendering, we often have estimators of the form

$$\frac{f(X) \, v(X)}{p(X)},$$

where the integrand consists of some factors $f(X)$ that are easily evaluated (e.g., those that relate to how the surface scatters light) and others that are more expensive to evaluate, such as a binary visibility factor $v(X)$ that requires tracing a ray. In these cases, most of the computational expense of evaluating the estimator lies in $v$.

If $f(X)$ is zero, it is obviously worth skipping the work of evaluating $v(X)$, since its value will not affect the value of the estimator. However, if we also skipped evaluating estimators where $f(X)$ was small but nonzero, then we would introduce bias into the estimator and would systematically underestimate the value of the integrand. Russian roulette solves this problem, making it possible to also skip tracing rays when $f(X)$’s value is small but not necessarily 0, while still computing the correct value on average.

To apply Russian roulette, we select some termination probability $q$. This value can be chosen in almost any manner; for example, it could be based on an estimate of the value of the integrand for the particular sample chosen, increasing as the integrand’s value becomes smaller. With probability $q$, the estimator is not evaluated for the particular sample, and some constant value $c$ is used in its place ($c = 0$ is often used). With probability $1 - q$, the estimator is still evaluated but is weighted by the factor $1/(1 - q)$, which effectively compensates for the samples that were skipped.

We have the new estimator

$$F' = \begin{cases} \dfrac{F - qc}{1 - q} & \xi > q \\ c & \text{otherwise.} \end{cases}$$

It is easy to see that its expected value is the same as the expected value of the original estimator:

$$E[F'] = (1 - q) \left( \frac{E[F] - qc}{1 - q} \right) + qc = E[F].$$

Russian roulette never reduces variance. In fact, unless somehow $c = F$, it will always increase variance. However, it does improve Monte Carlo efficiency if the probabilities are chosen so that samples that are likely to make a small contribution to the final result are skipped.
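
In code, Russian roulette is a small wrapper around the expensive part of the estimator. A hedged sketch, where the expensive callable is a hypothetical stand-in for, e.g., tracing a shadow ray to evaluate $v(X)$:

#include <random>

// Evaluate the Russian roulette estimator F' for one sample: with
// probability q, return the constant c (here 0) without doing the
// expensive work; otherwise evaluate and reweight by 1 / (1 - q).
template <typename Fn>
double RouletteEstimate(double fOverP, double q, Fn expensive,
                        std::mt19937 &rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    const double c = 0;  // constant used in place of skipped samples
    if (u(rng) <= q)
        return c;
    double F = fOverP * expensive();  // e.g., a binary visibility factor
    return (F - q * c) / (1 - q);
}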

2.2.5 Splitting

While Russian roulette reduces the number of samples, splitting increases the number of samples in some dimensions of multidimensional integrals in order to improve efficiency. As an example, consider an integral of the general form

$$\int_A \int_B f(x, y) \, \mathrm{d}x \, \mathrm{d}y. \tag{2.17}$$

With the standard importance sampling estimator, we might draw $n$ samples from independent distributions, $X_i \sim p_x$ and $Y_i \sim p_y$, and compute

$$\frac{1}{n} \sum_{i=1}^{n} \frac{f(X_i, Y_i)}{p_x(X_i) \, p_y(Y_i)}. \tag{2.18}$$

Splitting allows us to formalize the idea of taking more than one sample for the integral over $B$ for each sample taken in $A$. With splitting, we might take $m$ samples $Y_{i,j}$ for each sample $X_i$, giving the estimator

$$\frac{1}{n} \sum_{i=1}^{n} \frac{1}{m} \sum_{j=1}^{m} \frac{f(X_i, Y_{i,j})}{p_x(X_i) \, p_y(Y_{i,j})}.$$

If it is possible to partially evaluate $f(X_i, \cdot)$ for each $X_i$, then we can compute a total of $nm$ samples more efficiently than if we had taken $nm$ independent $X_i$ values using Equation (2.18).

For an example from rendering, an integral of the form of Equation (2.17) is evaluated to compute the color of pixels in an image: an integral is taken over the area of the pixel $A$ where, at each point $x$ in the pixel, a ray is traced into the scene and the reflected radiance at the intersection point is computed using an integral over the hemisphere (denoted here by $B$), for which one or more rays are traced. With splitting, we can take multiple samples for each lighting integral, improving efficiency by amortizing the cost of tracing the initial ray from the camera over them.
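
A hedged sketch of the splitting estimator above, with hypothetical callables for the two sampling distributions (e.g., sampleX might generate a point in the pixel and sampleY a direction over the hemisphere):

// Splitting: for each of n samples X_i, take m samples Y_{i,j},
// amortizing the cost of generating (and partially evaluating) each
// X_i over m evaluations of the inner integral.
template <typename F, typename SX, typename PX, typename SY, typename PY>
double SplitEstimate(F f, SX sampleX, PX pdfX, SY sampleY, PY pdfY,
                     int n, int m) {
    double sum = 0;
    for (int i = 0; i < n; ++i) {
        double x = sampleX();
        double inner = 0;
        for (int j = 0; j < m; ++j) {
            double y = sampleY();
            inner += f(x, y) / (pdfX(x) * pdfY(y));
        }
        sum += inner / m;
    }
    return sum / n;  // average of the n per-X_i estimates
}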