6.2 Projective Camera Models
One of the fundamental issues in 3D computer graphics is the 3D viewing problem: how to project a 3D scene onto a 2D image for display. Most of the classic approaches can be expressed by a projective transformation matrix. Therefore, we will introduce a projection matrix camera class, ProjectiveCamera, and then define two camera models based on it. The first implements an orthographic projection, and the other implements a perspective projection—two classic and widely used projections.
Three more coordinate systems (summarized in Figure 6.1) are useful for defining and discussing projective cameras:
- Screen space: Screen space is defined on the film plane. The camera projects objects in camera space onto the film plane; the parts inside the screen window are visible in the image that is generated. Depth values in screen space range from 0 to 1, corresponding to points at the near and far clipping planes, respectively. Note that, although this is called “screen” space, it is still a 3D coordinate system, since z values are meaningful.
- Normalized device coordinate (NDC) space: This is the coordinate system for the actual image being rendered. In x and y, this space ranges from (0, 0) to (1, 1), with (0, 0) being the upper-left corner of the image. Depth values are the same as in screen space, and a linear transformation converts from screen to NDC space.
- Raster space: This is almost the same as NDC space, except the x and y coordinates range from (0, 0) to (resolution.x, resolution.y).
Projective cameras use 4 × 4 matrices to transform among all of these spaces, but cameras with unusual imaging characteristics can’t necessarily represent all of these transformations with matrices.
In addition to the parameters required by the Camera base class, the ProjectiveCamera takes the projective transformation matrix, the screen space extent of the image, and additional parameters related to depth of field. Depth of field, which will be described and implemented at the end of this section, simulates the blurriness of out-of-focus objects that occurs in real lens systems.
ProjectiveCamera implementations pass the projective transformation up to the base class constructor shown here. This transformation gives the camera-to-screen projection; from that, the constructor can easily compute the other transformation that will be needed, to go all the way from raster space to camera space.
The only nontrivial transformation to compute in the constructor is the screen-to-raster projection. In the following code, note the composition of transformations where (reading from bottom to top), we start with a point in screen space, translate so that the upper-left corner of the screen is at the origin, and then scale by the reciprocal of the screen width and height, giving us a point with x and y coordinates between 0 and 1 (these are NDC coordinates). Finally, we scale by the raster resolution, so that we end up covering the entire raster range from (0, 0) up to the overall raster resolution. An important detail here is that the y coordinate is inverted by this transformation; this is necessary because increasing y values move up the image in screen coordinates but down in raster coordinates.
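A sketch of this composition, in the spirit of pbrt's ProjectiveCamera constructor, is shown below; it assumes Scale(), Translate(), and Inverse() transform helpers and a film object exposing its full raster resolution (the exact member names here are illustrative rather than authoritative).

```cpp
// Screen to raster: translate the screen window to the origin, normalize to
// NDC (note the negated y extent, which flips y), then scale up to the
// raster resolution.
ScreenToRaster =
    Scale(film->fullResolution.x, film->fullResolution.y, 1) *
    Scale(1 / (screenWindow.pMax.x - screenWindow.pMin.x),
          1 / (screenWindow.pMin.y - screenWindow.pMax.y), 1) *
    Translate(Vector3f(-screenWindow.pMin.x, -screenWindow.pMax.y, 0));
RasterToScreen = Inverse(ScreenToRaster);
// Raster space reaches camera space by way of screen space.
RasterToCamera = Inverse(CameraToScreen) * RasterToScreen;
```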
6.2.1 Orthographic Camera
The orthographic camera, defined in the files cameras/orthographic.h and cameras/orthographic.cpp, is based on the orthographic projection transformation. The orthographic transformation takes a rectangular region of the scene and projects it onto the front face of the box that defines the region. It doesn’t give the effect of foreshortening—objects becoming smaller on the image plane as they get farther away—but it does leave parallel lines parallel, and it preserves relative distance between objects. Figure 6.2 shows how this rectangular volume defines the visible region of the scene.
Figure 6.3 compares the result of using the orthographic projection for rendering to the perspective projection defined in the next section.
The orthographic camera constructor generates the orthographic transformation matrix with the Orthographic() function, which will be defined shortly.
The orthographic viewing transformation leaves x and y coordinates unchanged but maps z values at the near plane to 0 and z values at the far plane to 1. To do this, the scene is first translated along the z axis so that the near plane is aligned with z = 0. Then, the scene is scaled in z so that the far plane maps to z = 1. The composition of these two transformations gives the overall transformation. (For a ray tracer like pbrt, we’d like the near plane to be at 0 so that rays start at the plane that goes through the camera’s position; the far plane offset doesn’t particularly matter.)
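A minimal sketch of this construction, assuming the same Scale() and Translate() helpers used above:

```cpp
// Translate the near plane to z = 0, then scale z so the far plane lands at z = 1.
Transform Orthographic(Float zNear, Float zFar) {
    return Scale(1, 1, 1 / (zFar - zNear)) *
           Translate(Vector3f(0, 0, -zNear));
}
```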
Thanks to the simplicity of the orthographic projection, it’s easy to directly compute the differential rays in the x and y directions in the GenerateRayDifferential() method. The directions of the differential rays will be the same as the main ray (as they are for all rays generated by an orthographic camera), and the difference in origins will be the same for all rays. Therefore, the constructor here precomputes how much the ray origins shift in camera space coordinates due to a single pixel shift in the x and y directions on the film plane.
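For example, the precomputation can be as simple as pushing one-pixel offsets through the raster-to-camera transformation (a sketch; the dxCamera and dyCamera member names are illustrative):

```cpp
// Camera-space origin shifts for one-pixel steps in raster x and y.
dxCamera = RasterToCamera(Vector3f(1, 0, 0));
dyCamera = RasterToCamera(Vector3f(0, 1, 0));
```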
We can now go through the code to take a sample point in raster space and turn it into a camera ray. The process is summarized in Figure 6.4. First, the raster space sample position is transformed into a point in camera space, giving a point located on the near plane, which is the origin of the camera ray. Because the camera space viewing direction points down the z axis, the camera space ray direction is (0, 0, 1).
If depth of field has been enabled for this scene, the ray’s origin and direction are modified so that depth of field is simulated. Depth of field will be explained later in this section. The ray’s time value is set by linearly interpolating between the shutter open and shutter close times by the CameraSample::time offset (which is in the range [0, 1)). Finally, the ray is transformed into world space before being returned.
Once all of the transformation matrices have been set up, it’s easy to transform the raster space sample point to camera space.
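Putting these steps together, the body of the orthographic GenerateRay() might look roughly like the following sketch; Lerp(), the Ray and Point types, and the CameraToWorld transformation and shutter members are assumed to follow the Camera interface described earlier.

```cpp
// Transform the raster-space film sample to a camera-space point on the near plane.
Point3f pFilm = Point3f(sample.pFilm.x, sample.pFilm.y, 0);
Point3f pCamera = RasterToCamera(pFilm);
// Orthographic rays all travel down the camera-space z axis.
*ray = Ray(pCamera, Vector3f(0, 0, 1));
// <<Modify ray for effect of lens>> would go here when lensRadius > 0.
ray->time = Lerp(sample.time, shutterOpen, shutterClose);
*ray = CameraToWorld(*ray);
```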
The implementation of GenerateRayDifferential() performs the same computation to generate the main camera ray. The differential ray origins are found using the offsets computed in the OrthographicCamera constructor, and then the full ray differential is transformed to world space.
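In that method, the extra work beyond the main ray is just offsetting the origins by the precomputed deltas; a sketch for the case without depth of field:

```cpp
// Differential rays share the main ray's direction; only the origins shift.
ray->rxOrigin = ray->o + dxCamera;
ray->ryOrigin = ray->o + dyCamera;
ray->rxDirection = ray->ryDirection = ray->d;
ray->hasDifferentials = true;
*ray = CameraToWorld(*ray);
```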
6.2.2 Perspective Camera
The perspective projection is similar to the orthographic projection in that it projects a volume of space onto a 2D film plane. However, it includes the effect of foreshortening: objects that are far away are projected to be smaller than objects of the same size that are closer. Unlike the orthographic projection, the perspective projection doesn’t preserve distances or angles, and parallel lines no longer remain parallel. The perspective projection is a reasonably close match to how an eye or camera lens generates images of the 3D world. The perspective camera is implemented in the files cameras/perspective.h and cameras/perspective.cpp.
The perspective projection describes perspective viewing of the scene. Points in the scene are projected onto a viewing plane perpendicular to the z axis. The Perspective() function computes this transformation; it takes a field-of-view angle in fov and the distances to a near plane and a far plane. After the perspective projection, points at the near plane are mapped to have z = 0, and points at the far plane have z = 1 (Figure 6.5). For rendering systems based on rasterization, it’s important to set the positions of these planes carefully; they determine the z range of the scene that is rendered, but setting them with too many orders of magnitude variation between their values can lead to numerical precision errors. For ray tracers like pbrt, they can be set arbitrarily as they are here.
The transformation is most easily understood in two steps:
- Points p in camera space are projected onto the viewing plane. A bit of algebra shows that the projected x′ and y′ coordinates on the viewing plane can be computed by dividing x and y by the point’s z coordinate value. The projected z depth is remapped so that z values at the near plane are 0 and z values at the far plane are 1. The computation we’d like to do is
  $$x' = \frac{x}{z}, \qquad y' = \frac{y}{z}, \qquad z' = \frac{f(z - n)}{z(f - n)}.$$
  All of this computation can be encoded in a 4 × 4 matrix that is then applied to homogeneous coordinates:
  ```cpp
  <<Perform projective divide for perspective projection>>=
  Matrix4x4 persp(1, 0, 0,           0,
                  0, 1, 0,           0,
                  0, 0, f / (f - n), -f * n / (f - n),
                  0, 0, 1,           0);
  ```
- The angular field of view (fov) specified by the user is accounted for by scaling the (x, y) values on the projection plane so that points inside the field of view project to coordinates between [−1, 1] on the view plane. For square images, both x and y lie between [−1, 1] in screen space. Otherwise, the direction in which the image is narrower maps to [−1, 1], and the wider direction maps to a proportionally larger range of screen space values. Recall that the tangent is equal to the ratio of the opposite side of a right triangle to the adjacent side. Here the adjacent side has length 1, so the opposite side has the length tan(fov/2). Scaling by the reciprocal of this length maps the field of view to the range [−1, 1].
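Putting the two steps together, a sketch of the Perspective() function might look like the following; it assumes pbrt-style Transform and Matrix4x4 types along with Radians() and Scale() helpers, and the exact signatures are illustrative.

```cpp
Transform Perspective(Float fov, Float n, Float f) {
    // Projective divide: divide x and y by z and remap z to [0, 1]
    // between the near and far planes.
    Matrix4x4 persp(1, 0, 0,           0,
                    0, 1, 0,           0,
                    0, 0, f / (f - n), -f * n / (f - n),
                    0, 0, 1,           0);
    // Scale so that the field of view maps to [-1, 1] on the shorter image axis.
    Float invTanAng = 1 / std::tan(Radians(fov) / 2);
    return Scale(invTanAng, invTanAng, 1) * Transform(persp);
}
```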
Similar to the OrthographicCamera, information about how the camera rays generated by the PerspectiveCamera change as we shift pixels on the film plane can be precomputed in the constructor. Here, we compute the change in position on the near perspective plane in camera space with respect to shifts in pixel location.
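A sketch of that precomputation, again assuming the RasterToCamera transformation and illustrative dxCamera/dyCamera member names:

```cpp
// Change in the camera-space near-plane position per one-pixel step in x and y.
dxCamera = RasterToCamera(Point3f(1, 0, 0)) - RasterToCamera(Point3f(0, 0, 0));
dyCamera = RasterToCamera(Point3f(0, 1, 0)) - RasterToCamera(Point3f(0, 0, 0));
```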
With the perspective projection, all rays originate from the origin, (0, 0, 0), in camera space. A ray’s direction is given by the vector from the origin to the point on the near plane, pCamera, that corresponds to the provided CameraSample’s pFilm location. In other words, the ray’s vector direction is component-wise equal to this point’s position, so rather than doing a useless subtraction to compute the direction, we just initialize the direction directly from the point pCamera.
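In code, the ray setup might look roughly like this sketch (Normalize() and the Ray constructor follow pbrt's conventions):

```cpp
// Map the film sample back into camera space to find the near-plane point.
Point3f pFilm = Point3f(sample.pFilm.x, sample.pFilm.y, 0);
Point3f pCamera = RasterToCamera(pFilm);
// The ray starts at the camera-space origin; its direction is the near-plane
// point treated as a vector.
*ray = Ray(Point3f(0, 0, 0),
           Normalize(Vector3f(pCamera.x, pCamera.y, pCamera.z)));
```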
The GenerateRayDifferential() method follows the implementation of GenerateRay(), except for an additional fragment that computes the differential rays.
6.2.3 The Thin Lens Model and Depth of Field
An ideal pinhole camera that only allows rays passing through a single point to reach the film isn’t physically realizable; while it’s possible to make cameras with extremely small apertures that approach this behavior, small apertures allow relatively little light to reach the film sensor. With a small aperture, long exposure times are required to capture enough photons to form an accurate image, which in turn can lead to blur from objects in the scene moving while the camera shutter is open.
Real cameras have lens systems that focus light through a finite-sized aperture onto the film plane. Camera designers (and photographers using cameras with adjustable apertures) face a trade-off: the larger the aperture, the more light reaches the film and the shorter the exposures that are needed. However, lenses can only focus on a single plane (the focal plane), and the farther objects in the scene are from this plane, the blurrier they are. The larger the aperture, the more pronounced this effect is: objects at depths different from the one the lens system has in focus become increasingly blurry.
The camera model in Section 6.4 implements a fairly accurate simulation of lens systems in realistic cameras. For the simple camera models introduced so far, we can apply a classic approximation from optics, the thin lens approximation, to model the effect of finite apertures with traditional computer graphics projection models. The thin lens approximation models an optical system as a single lens with spherical profiles, where the thickness of the lens is small relative to the radius of curvature of the lens. (The more general thick lens approximation, which doesn’t assume that the lens’s thickness is negligible, is introduced in Section 6.4.3.)
Under the thin lens approximation, incident rays that are parallel to the optical axis and pass through the lens focus at a point behind the lens called the focal point. The distance the focal point is behind the lens, f, is the lens’s focal length. If the film plane is placed at a distance equal to the focal length behind the lens, then objects infinitely far away will be in focus, as they image to a single point on the film.
Figure 6.6 illustrates the basic setting. Here we’ve followed the typical lens coordinate system convention of placing the lens perpendicular to the z axis, with the lens at z = 0 and the scene along −z. (Note that this is a different coordinate system from the one we used for camera space, where the viewing direction is +z.) Distances on the scene side of the lens are denoted with unprimed variables z, and distances on the film side of the lens (positive z) are primed, z′.
For points in the scene at a depth z from a thin lens with focal length f, the Gaussian lens equation relates the distances from the object to the lens and from the lens to the image of the point:
$$\frac{1}{z'} - \frac{1}{z} = \frac{1}{f}.$$
Note that for z = −∞, we have z′ = f, as expected.
We can use the Gaussian lens equation to solve for the distance between the lens and the film that sets the plane of focus at some z_f, the focal distance (Figure 6.7):
$$z_f' = \frac{f\,z_f}{z_f + f}.$$
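As a quick numeric check of this relation (the numbers are illustrative, not taken from the book's figures): for a 50 mm lens focused 1 m in front of the lens, so z_f = −1000 mm under the sign convention above, the film must sit about 52.6 mm behind the lens:
$$z_f' = \frac{f\,z_f}{z_f + f} = \frac{50 \cdot (-1000)}{-1000 + 50}\ \text{mm} \approx 52.6\ \text{mm}.$$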
A point that doesn’t lie on the plane of focus is imaged to a disk on the film plane, rather than to a single point. The boundary of this disk is called the circle of confusion. The size of the circle of confusion is affected by the diameter of the aperture that light rays pass through, the focal distance, and the distance between the object and the lens. Figure 6.8 shows this effect, depth of field, in a scene with a series of copies of the dragon model. As the size of the lens aperture increases, blurriness increases the farther a point is from the plane of focus. Note that the second dragon from the right remains in focus throughout all of the images, as the plane of focus has been placed at its depth.
Figure 6.9 shows depth of field used to render the landscape scene. Note how the effect draws the viewer’s eye to the in-focus grass in the center of the image.
In practice, objects do not have to be exactly on the plane of focus to appear in sharp focus; as long as the circle of confusion is roughly smaller than a pixel on the film sensor, objects appear to be in focus. The range of distances from the lens at which objects appear in focus is called the lens’s depth of field.
The Gaussian lens equation also lets us compute the size of the circle of confusion; given a lens with focal length f that is focused at a distance z_f, the film plane is at z′_f. Given another point at depth z, the Gaussian lens equation gives the distance z′ that the lens focuses the point to. This point is either in front of or behind the film plane; Figure 6.10(a) shows the case where it is behind.
The diameter of the circle of confusion is given by the intersection of the cone between z′ and the lens with the film plane. If we know the diameter of the lens d_l, then we can use similar triangles to solve for the diameter of the circle of confusion d_c (Figure 6.10(b)):
$$\frac{d_l}{z'} = \frac{d_c}{|z' - z_f'|}.$$
Solving for d_c, we have
$$d_c = \left|\frac{d_l\,(z' - z_f')}{z'}\right|.$$
Applying the Gaussian lens equation to express the result in terms of scene depths, we can find that
$$d_c = \left|\frac{d_l\,f\,(z - z_f)}{z\,(f + z_f)}\right|.$$
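As a rough sanity check of this formula (the numbers are illustrative): with f = 50 mm, a 25 mm aperture (d_l = 25 mm), the plane of focus at 1 m (z_f = −1000 mm), and a point 2 m from the lens (z = −2000 mm),
$$d_c = \left|\frac{25 \cdot 50 \cdot (-2000 + 1000)}{-2000 \cdot (50 - 1000)}\right|\ \text{mm} \approx 0.66\ \text{mm},$$
which is far larger than a pixel on any practical sensor, so such a point renders noticeably blurred.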
Note that the diameter of the circle of confusion is proportional to the diameter of the lens. The lens diameter is often expressed as the lens’s f-number n, which expresses diameter as a fraction of focal length, d_l = f/n.
Figure 6.11 shows a graph of this function for a 50-mm focal length lens with a 25-mm aperture, focused at a distance of 1 m. Note that the blur is asymmetric with depth around the focal plane and grows much more quickly for objects in front of the plane of focus than for objects behind it.
Modeling a thin lens in a ray tracer is remarkably straightforward: all that is necessary is to choose a point on the lens and find the appropriate ray that starts on the lens at that point such that objects in the plane of focus are in focus on the film (Figure 6.12). Therefore, projective cameras take two extra parameters for depth of field: one sets the size of the lens aperture, and the other sets the focal distance.
It is generally necessary to trace many rays for each image pixel in order to adequately sample the lens for smooth depth of field. Figure 6.13 shows the landscape scene from Figure 6.9 with only four samples per pixel (Figure 6.9 had 2048 samples per pixel).
The ConcentricSampleDisk() function, defined in Chapter 13, takes a (u, v) sample position in [0, 1)^2 and maps it to a 2D unit disk centered at the origin (0, 0). To turn this into a point on the lens, these coordinates are scaled by the lens radius. The CameraSample class provides the lens-sampling parameters in the pLens member variable.
The ray’s origin is this point on the lens. Now it is necessary to determine the proper direction for the new ray. We know that all rays from the given image sample through the lens must converge at the same point on the plane of focus. Furthermore, we know that rays pass through the center of the lens without a change in direction, so finding the appropriate point of convergence is a matter of intersecting the unperturbed ray from the pinhole model with the plane of focus and then setting the new ray’s direction to be the vector from the point on the lens to the intersection point.
For this simple model, the plane of focus is perpendicular to the z axis and the ray starts at the origin, so intersecting the ray through the lens center with the plane of focus is straightforward. The t value of the intersection is given by
$$t = \frac{\text{focalDistance}}{d_z}.$$
Now the ray can be initialized. The origin is set to the sampled point on the lens, and the direction is set so that the ray passes through the point on the plane of focus, pFocus.
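Taken together, the depth-of-field adjustment amounts to only a few lines; the sketch below follows the structure described above for the perspective camera, where ConcentricSampleDisk() and the Ray call operator for evaluating a point along a ray are pbrt facilities, and lensRadius and focalDistance are the two extra camera parameters mentioned earlier.

```cpp
if (lensRadius > 0) {
    // Sample a point on the unit disk and scale it by the lens radius.
    Point2f pLens = lensRadius * ConcentricSampleDisk(sample.pLens);

    // Find where the unperturbed ray through the lens center hits the
    // plane of focus.
    Float ft = focalDistance / ray->d.z;
    Point3f pFocus = (*ray)(ft);

    // Move the ray origin to the sampled lens point and aim the ray at the
    // focus point so that in-focus geometry stays sharp.
    ray->o = Point3f(pLens.x, pLens.y, 0);
    ray->d = Normalize(pFocus - ray->o);
}
```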
To compute ray differentials with the thin lens, the approach used in the fragment <<Update ray for effect of lens>> is applied to rays offset one pixel in the x and y directions on the film plane. The fragments that implement this, <<Compute OrthographicCamera ray differentials accounting for lens>> and <<Compute PerspectiveCamera ray differentials accounting for lens>>, aren’t included here.