5.2 Projective Camera Models

One of the fundamental issues in 3D computer graphics is the 3D viewing problem: how to project a 3D scene onto a 2D image for display. Most of the classic approaches can be expressed by a 4 × 4 projective transformation matrix. Therefore, we will introduce a projection matrix camera class, ProjectiveCamera, and then define two camera models based on it. The first implements an orthographic projection, and the other implements a perspective projection—two classic and widely used projections.

<<ProjectiveCamera Definition>>=
class ProjectiveCamera : public CameraBase {
  public:
    <<ProjectiveCamera Public Methods>>
  protected:
    <<ProjectiveCamera Protected Members>>
};

The orthographic and perspective projections both require the specification of two planes perpendicular to the viewing direction: the near and far planes. When rasterization is used for rendering, objects that are not between those two planes are culled and not included in the final image. (Culling objects in front of the near plane is particularly important in order to avoid a singularity at the depth 0 and because otherwise the projection matrices map points behind the camera to appear to be in front of it.) In a ray tracer, the projection matrices are used purely to determine rays leaving the camera and these concerns do not apply; there is therefore less need to worry about setting those planes’ depths carefully in this context.

Three more coordinate systems (summarized in Figure 5.2) are useful for defining and discussing projective cameras:

  • Screen space: Screen space is defined on the film plane. The camera projects objects in camera space onto the film plane; the parts inside the screen window are visible in the image that is generated. Points at the near plane are mapped to a depth z value of 0 and points at the far plane are mapped to 1. Note that, although this is called “screen” space, it is still a 3D coordinate system, since z values are meaningful.
  • Normalized device coordinate (NDC) space: This is the coordinate system for the actual image being rendered. In x and y, this space ranges from (0, 0) to (1, 1), with (0, 0) being the upper-left corner of the image. Depth values are the same as in screen space, and a linear transformation converts from screen to NDC space.
  • Raster space: This is almost the same as NDC space, except the x and y coordinates range from (0, 0) to the resolution of the image in x and y pixels.

Projective cameras use 4 × 4 matrices to transform among all of these spaces.

Figure 5.2: Several camera-related coordinate spaces are commonly used to simplify the implementation of Cameras. The camera class holds transformations between them. Scene objects in rendering space are viewed by the camera, which sits at the origin of camera space and points along the +z axis. Objects between the near and far planes are projected onto the film plane at z = near in camera space. The film plane is at z = 0 in raster space, where x and y range from (0, 0) to the image resolution in pixels. Normalized device coordinate (NDC) space normalizes raster space so that x and y range from (0, 0) to (1, 1).
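To make these mappings concrete, here is a minimal standalone sketch (not pbrt code; plain structs and hypothetical values stand in for pbrt's Transform machinery) that carries a single screen-space point to NDC and then to raster space, assuming a screen window of [−1, 1] × [−1, 1] and a 640 × 480 image:

// Standalone illustration (not pbrt code): map a screen-space point to NDC and
// raster space for a hypothetical screen window and image resolution.
#include <cstdio>

struct P2 { float x, y; };

int main() {
    const float sMinX = -1, sMaxX = 1, sMinY = -1, sMaxY = 1;  // screen window
    const int resX = 640, resY = 480;                          // image resolution
    P2 pScreen = {0.25f, 0.5f};                                // point in screen space

    // NDC: translate so the screen window's upper-left corner maps to (0, 0),
    // then scale x and y into [0, 1]. Note that y is inverted.
    P2 pNDC = {(pScreen.x - sMinX) / (sMaxX - sMinX),
               (sMaxY - pScreen.y) / (sMaxY - sMinY)};

    // Raster: scale NDC coordinates by the image resolution in pixels.
    P2 pRaster = {pNDC.x * resX, pNDC.y * resY};

    std::printf("NDC (%.3f, %.3f), raster (%.1f, %.1f)\n",
                pNDC.x, pNDC.y, pRaster.x, pRaster.y);  // (0.625, 0.250), (400, 120)
}

The net y flip here corresponds to the inverted y scale in the raster-from-NDC transformation discussed below.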

In addition to the parameters required by the CameraBase class, the ProjectiveCamera takes the projective transformation matrix, the screen space extent of the image, and additional parameters related to the distance at which the camera is focused and the size of its lens aperture. If the lens aperture is not an infinitesimal pinhole, then parts of the image may be blurred, as happens for out-of-focus objects with real lens systems. Simulation of this effect will be discussed later in this section.

<<ProjectiveCamera Public Methods>>= 
ProjectiveCamera(CameraBaseParameters baseParameters, const Transform &screenFromCamera, Bounds2f screenWindow, Float lensRadius, Float focalDistance) : CameraBase(baseParameters), screenFromCamera(screenFromCamera), lensRadius(lensRadius), focalDistance(focalDistance) { <<Compute projective camera transformations>> 
<<Compute projective camera screen transformations>> 
Transform NDCFromScreen = Scale(1 / (screenWindow.pMax.x - screenWindow.pMin.x), 1 / (screenWindow.pMax.y - screenWindow.pMin.y), 1) * Translate(Vector3f(-screenWindow.pMin.x, -screenWindow.pMax.y, 0)); Transform rasterFromNDC = Scale(film.FullResolution().x, -film.FullResolution().y, 1); rasterFromScreen = rasterFromNDC * NDCFromScreen; screenFromRaster = Inverse(rasterFromScreen);
cameraFromRaster = Inverse(screenFromCamera) * screenFromRaster;
}

ProjectiveCamera implementations pass the projective transformation up to the base class constructor shown here. This transformation gives the screen-from-camera projection; from that, the constructor can easily compute the other transformations that go all the way from raster space to camera space.

<<Compute projective camera transformations>>=
<<Compute projective camera screen transformations>>
cameraFromRaster = Inverse(screenFromCamera) * screenFromRaster;

<<ProjectiveCamera Protected Members>>= 
Transform screenFromCamera, cameraFromRaster;

The only nontrivial transformation to compute in the constructor is the raster-from-screen projection. It is computed in two steps, via composition of the raster-from-NDC and NDC-from-screen transformations. An important detail here is that the y coordinate is inverted by the final transformation; this is necessary because increasing y values move up the image in screen coordinates but down in raster coordinates.

<<Compute projective camera screen transformations>>=
Transform NDCFromScreen =
    Scale(1 / (screenWindow.pMax.x - screenWindow.pMin.x),
          1 / (screenWindow.pMax.y - screenWindow.pMin.y), 1) *
    Translate(Vector3f(-screenWindow.pMin.x, -screenWindow.pMax.y, 0));
Transform rasterFromNDC =
    Scale(film.FullResolution().x, -film.FullResolution().y, 1);
rasterFromScreen = rasterFromNDC * NDCFromScreen;
screenFromRaster = Inverse(rasterFromScreen);

<<ProjectiveCamera Protected Members>>+=  
Transform rasterFromScreen, screenFromRaster;

5.2.1 Orthographic Camera

The orthographic camera is based on the orthographic projection transformation. The orthographic transformation takes a rectangular region of the scene and projects it onto the front face of the box that defines the region. It does not give the effect of foreshortening—objects becoming smaller on the image plane as they get farther away—but it does leave parallel lines parallel, and it preserves relative distance between objects. Figure 5.3 shows how this rectangular volume defines the visible region of the scene.

<<OrthographicCamera Definition>>=
class OrthographicCamera : public ProjectiveCamera {
  public:
    <<OrthographicCamera Public Methods>>
  private:
    <<OrthographicCamera Private Members>>
};

Figure 5.3: The orthographic view volume is an axis-aligned box in camera space, defined such that objects inside the region are projected onto the z = near face of the box.

Figure 5.4: Kroken Scene Rendered with Different Camera Models. Images are rendered from the same viewpoint with (a) orthographic and (b) perspective cameras. The lack of foreshortening makes the orthographic view feel like it has less depth, although it does preserve parallel lines, which can be a useful property. (Scene courtesy of Angelo Ferretti.)

Figure 5.4 compares the result of using the orthographic projection for rendering to that of the perspective projection defined in the next section.

The orthographic camera constructor generates the orthographic transformation matrix with the Orthographic() function, which will be defined shortly.

<<OrthographicCamera Public Methods>>= 
OrthographicCamera(CameraBaseParameters baseParameters, Bounds2f screenWindow, Float lensRadius, Float focalDist) : ProjectiveCamera(baseParameters, Orthographic(0, 1), screenWindow, lensRadius, focalDist) { <<Compute differential changes in origin for orthographic camera rays>>  <<Compute minimum differentials for orthographic camera>> 
minDirDifferentialX = minDirDifferentialY = Vector3f(0, 0, 0); minPosDifferentialX = dxCamera; minPosDifferentialY = dyCamera;
}

The orthographic viewing transformation leaves x and y coordinates unchanged but maps z values at the near plane to 0 and z values at the far plane to 1. To do this, the scene is first translated along the z axis so that the near plane is aligned with z = 0. Then, the scene is scaled in z so that the far plane maps to z = 1. The composition of these two transformations gives the overall transformation. For a ray tracer like pbrt, we would like the near plane to be at 0 so that rays start at the plane that goes through the camera’s position; the far plane’s position does not particularly matter.

<<Transform Function Definitions>>+=
Transform Orthographic(Float zNear, Float zFar) {
    return Scale(1, 1, 1 / (zFar - zNear)) * Translate(Vector3f(0, 0, -zNear));
}
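As a quick check of this construction (a standalone sketch, not pbrt code, with illustrative plane depths), composing the translation and the scale by hand shows that z values between the planes land in [0, 1]:

// Standalone check of the orthographic z remapping: translate by -zNear, then
// scale by 1 / (zFar - zNear).
#include <cstdio>

float OrthoZ(float z, float zNear, float zFar) {
    return (z - zNear) * (1 / (zFar - zNear));
}

int main() {
    const float zNear = 0, zFar = 100;
    std::printf("%g %g %g\n",
                OrthoZ(0, zNear, zFar),     // 0 at the near plane
                OrthoZ(100, zNear, zFar),   // 1 at the far plane
                OrthoZ(25, zNear, zFar));   // 0.25 in between
}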

Thanks to the simplicity of the orthographic projection, it is easy to directly compute the differential rays in the x and y directions in the GenerateRayDifferential() method. The directions of the differential rays will be the same as the main ray (as they are for all rays generated by an orthographic camera), and the difference in origins will be the same for all rays. Therefore, the constructor here precomputes how much the ray origins shift in camera space coordinates due to a single pixel shift in the x and y directions on the film plane.

<<Compute differential changes in origin for orthographic camera rays>>=
dxCamera = cameraFromRaster(Vector3f(1, 0, 0));
dyCamera = cameraFromRaster(Vector3f(0, 1, 0));

<<OrthographicCamera Private Members>>= 
Vector3f dxCamera, dyCamera;

We can now go through the code that takes a sample point in raster space and turns it into a camera ray. The process is summarized in Figure 5.5. First, the raster space sample position is transformed into a point in camera space, giving a point located on the near plane, which is the origin of the camera ray. Because the camera space viewing direction points down the z axis, the camera space ray direction is (0, 0, 1).

Figure 5.5: To create a ray with the orthographic camera, a raster space position on the film plane is transformed to camera space, giving the ray’s origin on the near plane. The ray’s direction in camera space is (0, 0, 1), down the z axis.

If the lens aperture is not a pinhole, the ray’s origin and direction are modified so that defocus blur is simulated. Finally, the ray is transformed into rendering space before being returned.

<<OrthographicCamera Method Definitions>>=
pstd::optional<CameraRay> OrthographicCamera::GenerateRay(
        CameraSample sample, SampledWavelengths &lambda) const {
    <<Compute raster and camera sample positions>>
    Ray ray(pCamera, Vector3f(0, 0, 1), SampleTime(sample.time), medium);
    <<Modify ray for depth of field>>
    return CameraRay{RenderFromCamera(ray)};
}

Once all the transformation matrices have been set up, it is easy to transform the raster space sample point to camera space.

<<Compute raster and camera sample positions>>=
Point3f pFilm = Point3f(sample.pFilm.x, sample.pFilm.y, 0);
Point3f pCamera = cameraFromRaster(pFilm);

The implementation of GenerateRayDifferential() performs the same computation to generate the main camera ray. The differential ray origins are found using the offsets computed in the OrthographicCamera constructor, and then the full ray differential is transformed to rendering space.

<<OrthographicCamera Method Definitions>>+=
pstd::optional<CameraRayDifferential> OrthographicCamera::GenerateRayDifferential(
        CameraSample sample, SampledWavelengths &lambda) const {
    <<Compute main orthographic viewing ray>>
    <<Compute ray differentials for OrthographicCamera>>
    ray.hasDifferentials = true;
    return CameraRayDifferential{RenderFromCamera(ray)};
}

<<Compute ray differentials for OrthographicCamera>>=
if (lensRadius > 0) {
    <<Compute OrthographicCamera ray differentials accounting for lens>>
} else {
    ray.rxOrigin = ray.o + dxCamera;
    ray.ryOrigin = ray.o + dyCamera;
    ray.rxDirection = ray.ryDirection = ray.d;
}

5.2.2 Perspective Camera

The perspective projection is similar to the orthographic projection in that it projects a volume of space onto a 2D film plane. However, it includes the effect of foreshortening: objects that are far away are projected to be smaller than objects of the same size that are closer. Unlike the orthographic projection, the perspective projection does not preserve distances or angles, and parallel lines no longer remain parallel. The perspective projection is a reasonably close match to how an eye or camera lens generates images of the 3D world.

<<PerspectiveCamera Definition>>=
class PerspectiveCamera : public ProjectiveCamera {
  public:
    <<PerspectiveCamera Public Methods>>
  private:
    <<PerspectiveCamera Private Members>>
};

<<PerspectiveCamera Public Methods>>= 
PerspectiveCamera(CameraBaseParameters baseParameters, Float fov, Bounds2f screenWindow, Float lensRadius, Float focalDist) : ProjectiveCamera(baseParameters, Perspective(fov, 1e-2f, 1000.f), screenWindow, lensRadius, focalDist) { <<Compute differential changes in origin for perspective camera rays>>  <<Compute cosTotalWidth for perspective camera>> 
Point2f radius = Point2f(film.GetFilter().Radius()); Point3f pCorner(-radius.x, -radius.y, 0.f); Vector3f wCornerCamera = Normalize(Vector3f(cameraFromRaster(pCorner))); cosTotalWidth = wCornerCamera.z;
<<Compute image plane area at z equals 1 for PerspectiveCamera>> 
<<Compute minimum differentials for PerspectiveCamera>>  }

The perspective projection describes perspective viewing of the scene. Points in the scene are projected onto a viewing plane perpendicular to the z axis. The Perspective() function computes this transformation; it takes a field-of-view angle in fov and the distances to a near z plane and a far z plane (Figure 5.6).

Figure 5.6: The perspective transformation matrix projects points in camera space onto the near plane. The x′ and y′ coordinates of the projected points are equal to the unprojected x and y coordinates divided by the z coordinate. That operation is depicted here, where the effect of the projection is indicated by an arrow. The projected z′ coordinate is then computed so that points on the near plane map to z′ = 0 and points on the far plane map to z′ = 1.

<<Transform Function Definitions>>+=
Transform Perspective(Float fov, Float n, Float f) {
    <<Perform projective divide for perspective projection>>
    <<Scale canonical perspective view to specified field of view>>
}

The transformation is most easily understood in two steps:

  1. Points p in camera space are projected onto the viewing plane. A bit of algebra shows that the projected x′ and y′ coordinates on the viewing plane can be computed by dividing x and y by the point’s z coordinate value. The projected z depth is remapped so that z values at the near plane are 0 and z values at the far plane are 1. The computation we would like to do is
    $$x' = \frac{x}{z}, \qquad y' = \frac{y}{z}, \qquad z' = \frac{f\,(z - n)}{z\,(f - n)}.$$
    All of this computation can be encoded in a 4 × 4 matrix that can then be applied to homogeneous coordinates:
    $$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{f}{f-n} & -\frac{fn}{f-n} \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
    <<Perform projective divide for perspective projection>>=
    SquareMatrix<4> persp(1, 0, 0,           0,
                          0, 1, 0,           0,
                          0, 0, f / (f - n), -f * n / (f - n),
                          0, 0, 1,           0);
  2. The angular field of view (fov) specified by the user is accounted for by scaling the (x, y) values on the projection plane so that points inside the field of view project to coordinates between [−1, 1] on the view plane. For square images, both x and y lie between [−1, 1] in screen space. Otherwise, the direction in which the image is narrower maps to [−1, 1], and the wider direction maps to a proportionally larger range of screen space values. Recall that the tangent is equal to the ratio of the opposite side of a right triangle to the adjacent side. Here the adjacent side has length 1, so the opposite side has the length tan(fov/2). Scaling by the reciprocal of this length maps the field of view to the range [−1, 1]. (A small numeric check of the full transformation follows this list.)
    <<Scale canonical perspective view to specified field of view>>=
    Float invTanAng = 1 / std::tan(Radians(fov) / 2);
    return Scale(invTanAng, invTanAng, 1) * Transform(persp);
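Here is that check: a standalone sketch (not pbrt code) that applies the canonical perspective matrix to a camera-space point and performs the homogeneous divide. The near and far values match those passed by the PerspectiveCamera constructor above, but the test points are arbitrary.

// Standalone check of the perspective projection: apply the matrix above to a
// homogeneous point (x, y, z, 1) and divide by the resulting weight, which is z.
#include <cstdio>

void Project(float x, float y, float z, float n, float f) {
    float xp = x, yp = y;                            // first two rows are the identity
    float zp = (f / (f - n)) * z - f * n / (f - n);  // third row
    float w = z;                                     // fourth row is (0, 0, 1, 0)
    std::printf("(%g, %g, %g)\n", xp / w, yp / w, zp / w);
}

int main() {
    const float n = 1e-2f, f = 1000.f;
    Project(1, 2, n, n, f);    // z' = 0 at the near plane
    Project(1, 2, f, n, f);    // z' = 1 at the far plane
    Project(1, 2, 10, n, f);   // x' = 0.1, y' = 0.2: division by z
}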

As with the OrthographicCamera, the PerspectiveCamera’s constructor computes information about how the rays it generates change with shifts in pixels. In this case, the ray origins are unchanged and the ray differentials are only different in their directions. Here, we compute the change in position on the near perspective plane in camera space with respect to shifts in pixel location.

<<Compute differential changes in origin for perspective camera rays>>=
dxCamera = cameraFromRaster(Point3f(1, 0, 0)) - cameraFromRaster(Point3f(0, 0, 0));
dyCamera = cameraFromRaster(Point3f(0, 1, 0)) - cameraFromRaster(Point3f(0, 0, 0));

<<PerspectiveCamera Private Members>>= 
Vector3f dxCamera, dyCamera;

The cosine of the maximum angle of the perspective camera’s field of view will occasionally be useful. In particular, points outside the field of view can be quickly culled via a dot product with the viewing direction and comparison to this value. This cosine can be found by computing the angle between the camera’s viewing vector and a vector to one of the corners of the image (see Figure 5.7). This corner needs a small adjustment here to account for the width of the filter function centered at each pixel that is used to weight image samples according to their location (this topic is discussed in Section 8.8).

Figure 5.7: Computing the Cosine of the Perspective Camera’s Maximum View Angle. A cone that bounds the viewing directions of a PerspectiveCamera can be found by using the camera’s viewing direction as the center axis and by computing the cosine of the angle θ between that axis and a vector to one of the corners of the image. In camera space, that simplifies to be the z component of that vector, normalized.

<<Compute cosTotalWidth for perspective camera>>=
Point2f radius = Point2f(film.GetFilter().Radius());
Point3f pCorner(-radius.x, -radius.y, 0.f);
Vector3f wCornerCamera = Normalize(Vector3f(cameraFromRaster(pCorner)));
cosTotalWidth = wCornerCamera.z;

<<PerspectiveCamera Private Members>>+= 
Float cosTotalWidth;
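For illustration, a containment test based on this value might look like the following sketch; it uses pbrt's vector types but is not part of the PerspectiveCamera interface, and the cameraFromRender transform is assumed to be available from the caller.

// Sketch (not pbrt code): test whether a rendering-space point lies inside the
// cone that bounds the perspective camera's viewing directions. Assumes pbrt's
// Point3f, Vector3f, and Transform utilities.
bool InViewCone(Point3f p, const Transform &cameraFromRender, Float cosTotalWidth) {
    // The camera sits at the origin of camera space and looks down +z, so the
    // cosine of the angle between the view axis and the direction to p is just
    // the z component of the normalized camera-space vector to p.
    Vector3f w = Normalize(Vector3f(cameraFromRender(p)));
    return w.z >= cosTotalWidth;
}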

With the perspective projection, camera space rays all originate from the origin, (0, 0, 0). A ray’s direction is given by the vector from the origin to the point on the near plane, pCamera, that corresponds to the provided CameraSample’s pFilm location. In other words, the ray’s vector direction is component-wise equal to this point’s position, so rather than doing a useless subtraction to compute the direction, we just initialize the direction directly from the point pCamera.

<<PerspectiveCamera Method Definitions>>= 
pstd::optional<CameraRay> PerspectiveCamera::GenerateRay( CameraSample sample, SampledWavelengths &lambda) const { <<Compute raster and camera sample positions>> 
Point3f pFilm = Point3f(sample.pFilm.x, sample.pFilm.y, 0); Point3f pCamera = cameraFromRaster(pFilm);
Ray ray(Point3f(0, 0, 0), Normalize(Vector3f(pCamera)), SampleTime(sample.time), medium); <<Modify ray for depth of field>> 
if (lensRadius > 0) { <<Sample point on lens>>  <<Compute point on plane of focus>> 
Float ft = focalDistance / ray.d.z; Point3f pFocus = ray(ft);
<<Update ray for effect of lens>> 
ray.o = Point3f(pLens.x, pLens.y, 0); ray.d = Normalize(pFocus - ray.o);
}
return CameraRay{RenderFromCamera(ray)}; }

The GenerateRayDifferential() method follows the implementation of GenerateRay(), except for this additional fragment that computes the differential rays.

<<Compute offset rays for PerspectiveCamera ray differentials>>=
if (lensRadius > 0) {
    <<Compute PerspectiveCamera ray differentials accounting for lens>>
} else {
    ray.rxOrigin = ray.ryOrigin = ray.o;
    ray.rxDirection = Normalize(Vector3f(pCamera) + dxCamera);
    ray.ryDirection = Normalize(Vector3f(pCamera) + dyCamera);
}

5.2.3 The Thin Lens Model and Depth of Field

An ideal pinhole camera that only allows rays passing through a single point to reach the film is not physically realizable; while it is possible to make cameras with extremely small apertures that approach this behavior, small apertures allow relatively little light to reach the film sensor. With a small aperture, long exposure times are required to capture enough photons to accurately capture the image, which in turn can lead to blur from objects in the scene moving while the camera shutter is open.

Real cameras have lens systems that focus light through a finite-sized aperture onto the film plane. Camera designers (and photographers using cameras with adjustable apertures) face a trade-off: the larger the aperture, the more light reaches the film and the shorter the exposures that are needed. However, lenses can only focus on a single plane (the focal plane), and the farther objects in the scene are from this plane, the blurrier they are. The larger the aperture, the more pronounced this effect is.

The RealisticCamera (included only in the online edition of the book) implements a fairly accurate simulation of lens systems in real-world cameras. For the simple camera models introduced so far, we can apply a classic approximation from optics, the thin lens approximation, to model the effect of finite apertures with traditional computer graphics projection models. The thin lens approximation models an optical system as a single lens with spherical profiles, where the thickness of the lens is small relative to the radius of curvature of the lens.

Under the thin lens approximation, incident rays that are parallel to the optical axis and pass through the lens focus at a point behind the lens called the focal point. The distance from the lens to the focal point, f, is the lens’s focal length. If the film plane is placed at a distance equal to the focal length behind the lens, then objects infinitely far away will be in focus, as they image to a single point on the film.

Figure 5.8 illustrates the basic setting. Here we have followed the typical lens coordinate system convention of placing the lens perpendicular to the z axis, with the lens at z = 0 and the scene along −z. (Note that this is a different coordinate system from the one we used for camera space, where the viewing direction is +z.) Distances on the scene side of the lens are denoted with unprimed variables z, and distances on the film side of the lens (positive z) are primed, z′.

Figure 5.8: A thin lens, located along the z axis at z = 0. Incident rays that are parallel to the optical axis and pass through a thin lens (dashed lines) all pass through a point p, the focal point. The distance between the lens and the focal point, f, is the lens’s focal length.

For points in the scene at a depth z from a thin lens with focal length f, the Gaussian lens equation relates the distances from the object to the lens and from the lens to the image of the point:

$$\frac{1}{z'} - \frac{1}{z} = \frac{1}{f}. \tag{5.1}$$

Note that for z = −∞, we have z′ = f, as expected.

We can use the Gaussian lens equation to solve for the distance between the lens and the film that sets the plane of focus at some z, the focal distance (Figure 5.9):

$$z' = \frac{fz}{f + z}. \tag{5.2}$$
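For example, with illustrative values and the sign convention above (scene depths are negative), a lens with a 50 mm focal length focused on a plane 1 m away has z = −1000 mm, so Equation (5.2) gives z′ = (50 · (−1000))/(50 + (−1000)) ≈ 52.6 mm: the film sits slightly farther behind the lens than the focal length, and as the plane of focus recedes toward z = −∞, z′ falls back to f.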

Figure 5.9: To focus a thin lens at a depth z in the scene, Equation (5.2) can be used to compute the distance z′ on the film side of the lens that points at the depth z focus to. Focusing is performed by adjusting the distance between the lens and the film plane.

A point that does not lie on the plane of focus is imaged to a disk on the film plane, rather than to a single point. The boundary of this disk is called the circle of confusion. The size of the circle of confusion is affected by the diameter of the aperture that light rays pass through, the focal distance, and the distance between the object and the lens. Although the circle of confusion only has zero radius for a single depth, a range of nearby depths have small enough circles of confusion that they still appear to be in focus. (As long as its circle of confusion is smaller than the spacing between pixels, a point will effectively appear to be in focus.) The range of depths that appear to be in focus is termed the depth of field.

Figure 5.10 shows this effect, in the Watercolor scene. As the size of the lens aperture increases, blurriness increases the farther a point is from the plane of focus. Note that the pencil cup in the center remains in focus throughout all the images, as the plane of focus has been placed at its depth. Figure 5.11 shows depth of field used to render the landscape scene. Note how the effect draws the viewer’s eye to the in-focus grass in the center of the image.

Figure 5.10: (a) Scene rendered with no defocus blur, (b) extensive depth of field due to a relatively small lens aperture, which gives only a small amount of blurriness in the out-of-focus regions, and (c) a very large aperture, giving a larger circle of confusion in the out-of-focus areas, resulting in a greater amount of blur on the film plane. (Scene courtesy of Angelo Ferretti.)

Figure 5.11: Depth of field gives a greater sense of depth and scale to this part of the landscape scene. (Scene courtesy of Laubwerk.)

The Gaussian lens equation also lets us compute the size of the circle of confusion; given a lens with focal length f that is focused at a distance $z_f$, the film plane is at $z'_f$. Given another point at depth z, the Gaussian lens equation gives the distance z′ that the lens focuses the point to. This point is either in front of or behind the film plane; Figure 5.12(a) shows the case where it is behind.

Figure 5.12: (a) If a thin lens with focal length f is focused at some depth $z_f$, then the distance from the lens to the focus plane is $z'_f$, given by the Gaussian lens equation. A point in the scene at depth $z \ne z_f$ will be imaged as a circle on the film plane; here z focuses at z′, which is behind the film plane. (b) To compute the diameter of the circle of confusion, we can apply similar triangles: the ratio of $d_l$, the diameter of the lens, to z′ must be the same as the ratio of $d_c$, the diameter of the circle of confusion, to $z' - z'_f$.

The diameter of the circle of confusion is given by the intersection of the cone between z′ and the lens with the film plane. If we know the diameter of the lens $d_l$, then we can use similar triangles to solve for the diameter of the circle of confusion $d_c$ (Figure 5.12(b)):

$$\frac{d_l}{z'} = \frac{d_c}{|z' - z'_f|}.$$

Solving for $d_c$, we have

$$d_c = \left|\frac{d_l\,(z' - z'_f)}{z'}\right|.$$

Applying the Gaussian lens equation to express the result in terms of scene depths, we can find that

$$d_c = \left|\frac{d_l\,f\,(z - z_f)}{z\,(f + z_f)}\right|.$$

Note that the diameter of the circle of confusion is proportional to the diameter of the lens. The lens diameter is often expressed as the lens’s f-number n, which expresses diameter as a fraction of focal length, $d_l = f/n$.
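The following standalone sketch (not pbrt code) evaluates this expression for the configuration plotted in Figure 5.13, a 50 mm lens with a 25 mm aperture (that is, f/2) focused at 1 m; depths are negative, following the sign convention used above.

// Standalone sketch: diameter of the circle of confusion for a thin lens with
// focal length f and aperture diameter dl, focused at depth zf, for a point at
// depth z. Depths on the scene side of the lens are negative; units are mm.
#include <cmath>
#include <cstdio>

float CircleOfConfusion(float z, float zf, float f, float dl) {
    return std::abs(dl * f * (z - zf) / (z * (f + zf)));
}

int main() {
    const float f = 50, dl = 25, zf = -1000;  // 50 mm lens at f/2, focused at 1 m
    for (float z : {-250.f, -500.f, -1000.f, -2000.f, -4000.f})
        std::printf("z = %5.0f mm -> dc = %.2f mm\n", z, CircleOfConfusion(z, zf, f, dl));
}

For these values the blur diameter is roughly 3.9 mm at z = −250 mm, zero at the plane of focus, and roughly 1.0 mm at z = −4000 mm.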

Figure 5.13 shows a graph of this function for a 50-mm focal length lens with a 25-mm aperture, focused at $z_f = 1$ m. Note that the blur is asymmetric with depth around the focal plane and grows much more quickly for objects in front of the plane of focus than for objects behind it.

Figure 5.13: The diameter of the circle of confusion as a function of depth for a 50-mm focal length lens with 25-mm aperture, focused at 1 meter.

Modeling a thin lens in a ray tracer is remarkably straightforward: all that is necessary is to choose a point on the lens and find the appropriate ray that starts on the lens at that point such that objects in the plane of focus are in focus on the film (Figure 5.14). Therefore, projective cameras take two extra parameters for depth of field: one sets the size of the lens aperture, and the other sets the focal distance.

Figure 5.14: (a) For a pinhole camera model, a single camera ray is associated with each point on the film plane (filled circle), given by the ray that passes through the single point of the pinhole lens (empty circle). (b) For a camera model with a finite aperture, we sample a point (filled circle) on the disk-shaped lens for each ray. We then compute the ray that passes through the center of the lens (corresponding to the pinhole model) and the point where it intersects the plane of focus (solid line). We know that all objects in the plane of focus must be in focus, regardless of the lens sample position. Therefore, the ray corresponding to the lens position sample (dashed line) is given by the ray starting on the lens sample point and passing through the computed intersection point on the plane of focus.

<<ProjectiveCamera Protected Members>>+= 
Float lensRadius, focalDistance;

It is generally necessary to trace many rays for each image pixel in order to adequately sample the lens for smooth defocus blur. Figure 5.15 shows the landscape scene from Figure 5.11 with only four samples per pixel (Figure 5.11 had 2048 samples per pixel).

Figure 5.15: Landscape scene with depth of field and only four samples per pixel: the depth of field is undersampled and the image is grainy. (Scene courtesy of Laubwerk.)

<<Modify ray for depth of field>>= 
if (lensRadius > 0) { <<Sample point on lens>>  <<Compute point on plane of focus>> 
Float ft = focalDistance / ray.d.z; Point3f pFocus = ray(ft);
<<Update ray for effect of lens>> 
ray.o = Point3f(pLens.x, pLens.y, 0); ray.d = Normalize(pFocus - ray.o);
}

The SampleUniformDiskConcentric() function, which is defined in Section A.5.1, takes a (u, v) sample position in [0, 1)² and maps it to a 2D unit disk centered at the origin (0, 0). To turn this into a point on the lens, these coordinates are scaled by the lens radius. The CameraSample class provides the (u, v) lens-sampling parameters in the pLens member variable.

<<Sample point on lens>>=
Point2f pLens = lensRadius * SampleUniformDiskConcentric(sample.pLens);

The ray’s origin is this point on the lens. Now it is necessary to determine the proper direction for the new ray. We know that all rays from the given image sample through the lens must converge at the same point on the plane of focus. Furthermore, we know that rays pass through the center of the lens without a change in direction, so finding the appropriate point of convergence is a matter of intersecting the unperturbed ray from the pinhole model with the plane of focus and then setting the new ray’s direction to be the vector from the point on the lens to the intersection point.

For this simple model, the plane of focus is perpendicular to the z axis and the ray starts at the origin, so intersecting the ray through the lens center with the plane of focus is straightforward. The t value of the intersection is given by

$$t = \frac{\texttt{focalDistance}}{\mathbf{d}_z}.$$

<<Compute point on plane of focus>>=
Float ft = focalDistance / ray.d.z;
Point3f pFocus = ray(ft);

Now the ray can be initialized. The origin is set to the sampled point on the lens, and the direction is set so that the ray passes through the point on the plane of focus, pFocus.

<<Update ray for effect of lens>>=
ray.o = Point3f(pLens.x, pLens.y, 0);
ray.d = Normalize(pFocus - ray.o);

To compute ray differentials with the thin lens, the approach used in the fragment <<Update ray for effect of lens>> is applied to rays offset one pixel in the x and y directions on the film plane. The fragments that implement this, <<Compute OrthographicCamera ray differentials accounting for lens>> and <<Compute PerspectiveCamera ray differentials accounting for lens>>, are not included here.