5.2 Projective Camera Models
One of the fundamental issues in 3D computer graphics is the 3D viewing problem: how to project a 3D scene onto a 2D image for display. Most of the classic approaches can be expressed by a projective transformation matrix. Therefore, we will introduce a projection matrix camera class, ProjectiveCamera, and then define two camera models based on it. The first implements an orthographic projection, and the other implements a perspective projection—two classic and widely used projections.
The orthographic and perspective projections both require the specification of two planes perpendicular to the viewing direction: the near and far planes. When rasterization is used for rendering, objects that are not between those two planes are culled and not included in the final image. (Culling objects in front of the near plane is particularly important in order to avoid a singularity at the depth $z = 0$ and because otherwise the projection matrices map points behind the camera to appear to be in front of it.) In a ray tracer, the projection matrices are used purely to determine rays leaving the camera and these concerns do not apply; there is therefore less need to worry about setting those planes’ depths carefully in this context.
Three more coordinate systems (summarized in Figure 5.2) are useful for defining and discussing projective cameras:
- Screen space: Screen space is defined on the film plane. The camera projects objects in camera space onto the film plane; the parts inside the screen window are visible in the image that is generated. Points at the near plane are mapped to a depth value of 0 and points at the far plane are mapped to 1. Note that, although this is called “screen” space, it is still a 3D coordinate system, since $z$ values are meaningful.
- Normalized device coordinate (NDC) space: This is the coordinate system for the actual image being rendered. In $x$ and $y$, this space ranges from $(0, 0)$ to $(1, 1)$, with $(0, 0)$ being the upper-left corner of the image. Depth values are the same as in screen space, and a linear transformation converts from screen to NDC space.
- Raster space: This is almost the same as NDC space, except the $x$ and $y$ coordinates range from $(0, 0)$ to the overall $x$ and $y$ resolution of the image in pixels.
Projective cameras use matrices to transform among all of these spaces.
In addition to the parameters required by the CameraBase class, the ProjectiveCamera takes the projective transformation matrix, the screen space extent of the image, and additional parameters related to the distance at which the camera is focused and the size of its lens aperture. If the lens aperture is not an infinitesimal pinhole, then parts of the image may be blurred, as happens for out-of-focus objects with real lens systems. Simulation of this effect will be discussed later in this section.
ProjectiveCamera implementations pass the projective transformation up to the base class constructor shown here. This transformation gives the screen-from-camera projection; from that, the constructor can easily compute the other transformations that go all the way from raster space to camera space.
The only nontrivial transformation to compute in the constructor is the raster-from-screen projection. It is computed in two steps, via composition of the raster-from-NDC and NDC-from-screen transformations. An important detail here is that the $y$ coordinate is inverted by the final transformation; this is necessary because increasing $y$ values move up the image in screen coordinates but down in raster coordinates.
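As a concrete sketch of that composition (assuming a screenWindow bounds, the film resolution, and pbrt-style Scale(), Translate(), and Inverse() helpers; the member names here are illustrative, not the exact implementation):

    Transform NDCFromScreen =
        Scale(1 / (screenWindow.pMax.x - screenWindow.pMin.x),
              1 / (screenWindow.pMin.y - screenWindow.pMax.y), 1) *  // negative y scale flips the axis
        Translate(Vector3f(-screenWindow.pMin.x, -screenWindow.pMax.y, 0));
    Transform rasterFromNDC = Scale(resolution.x, resolution.y, 1);
    Transform rasterFromScreen = rasterFromNDC * NDCFromScreen;
    Transform screenFromRaster = Inverse(rasterFromScreen);
    Transform cameraFromRaster = Inverse(screenFromCamera) * screenFromRaster;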
5.2.1 Orthographic Camera
The orthographic camera is based on the orthographic projection transformation. The orthographic transformation takes a rectangular region of the scene and projects it onto the front face of the box that defines the region. It does not give the effect of foreshortening—objects becoming smaller on the image plane as they get farther away—but it does leave parallel lines parallel, and it preserves relative distances between objects. Figure 5.3 shows how this rectangular volume defines the visible region of the scene.
Figure 5.4 compares the result of using the orthographic projection for rendering to that of the perspective projection defined in the next section.
The orthographic camera constructor generates the orthographic transformation matrix with the Orthographic() function, which will be defined shortly.
The orthographic viewing transformation leaves $x$ and $y$ coordinates unchanged but maps $z$ values at the near plane to 0 and $z$ values at the far plane to 1. To do this, the scene is first translated along the $z$ axis so that the near plane is aligned with $z = 0$. Then, the scene is scaled in $z$ so that the far plane maps to $z = 1$. The composition of these two transformations gives the overall transformation. For a ray tracer like pbrt, we would like the near plane to be at 0 so that rays start at the plane that goes through the camera’s position; the far plane’s position does not particularly matter.
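A minimal sketch of such a function, assuming pbrt-style Translate() and Scale() helpers:

    // Translate so the near plane sits at z = 0, then scale z so the far plane maps to z = 1.
    Transform Orthographic(Float zNear, Float zFar) {
        return Scale(1, 1, 1 / (zFar - zNear)) * Translate(Vector3f(0, 0, -zNear));
    }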
Thanks to the simplicity of the orthographic projection, it is easy to directly compute the differential rays in the $x$ and $y$ directions in the GenerateRayDifferential() method. The directions of the differential rays will be the same as the main ray (as they are for all rays generated by an orthographic camera), and the difference in origins will be the same for all rays. Therefore, the constructor here precomputes how much the ray origins shift in camera space coordinates due to a single pixel shift in the $x$ and $y$ directions on the film plane.
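That precomputation might look like the following sketch (dxCamera and dyCamera are assumed member names for the per-pixel origin shifts):

    // Camera-space origin offsets for one-pixel shifts in x and y on the film.
    dxCamera = cameraFromRaster(Vector3f(1, 0, 0));
    dyCamera = cameraFromRaster(Vector3f(0, 1, 0));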
We can now go through the code that takes a sample point in raster space and turns it into a camera ray. The process is summarized in Figure 5.5. First, the raster space sample position is transformed into a point in camera space, giving a point located on the near plane, which is the origin of the camera ray. Because the camera space viewing direction points down the $z$ axis, the camera space ray direction is $(0, 0, 1)$.
If the lens aperture is not a pinhole, the ray’s origin and direction are modified so that defocus blur is simulated. Finally, the ray is transformed into rendering space before being returned.
Once all the transformation matrices have been set up, it is easy to transform the raster space sample point to camera space.
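In simplified sketch form (time, medium, and defocus blur omitted; member and helper names are assumptions):

    // The raster-space film sample maps to a camera-space point on the near plane.
    Point3f pFilm = Point3f(sample.pFilm.x, sample.pFilm.y, 0);
    Point3f pCamera = cameraFromRaster(pFilm);
    // All orthographic rays travel down the camera-space z axis.
    Ray ray(pCamera, Vector3f(0, 0, 1));
    // (If the lens radius is nonzero, the ray would be modified for defocus blur here.)
    return RenderFromCamera(ray);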
The implementation of GenerateRayDifferential() performs the same computation to generate the main camera ray. The differential ray origins are found using the offsets computed in the OrthographicCamera constructor, and then the full ray differential is transformed to rendering space.
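For the pinhole (zero-aperture) case, the differential-specific part reduces to something like this sketch:

    // Offset the differential origins by one pixel's camera-space shift;
    // all orthographic rays share the same direction.
    ray.rxOrigin = ray.o + dxCamera;
    ray.ryOrigin = ray.o + dyCamera;
    ray.rxDirection = ray.ryDirection = ray.d;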
5.2.2 Perspective Camera
The perspective projection is similar to the orthographic projection in that it projects a volume of space onto a 2D film plane. However, it includes the effect of foreshortening: objects that are far away are projected to be smaller than objects of the same size that are closer. Unlike the orthographic projection, the perspective projection does not preserve distances or angles, and parallel lines no longer remain parallel. The perspective projection is a reasonably close match to how an eye or camera lens generates images of the 3D world.
The perspective projection describes perspective viewing of the scene. Points in the scene are projected onto a viewing plane perpendicular to the $z$ axis. The Perspective() function computes this transformation; it takes a field-of-view angle in fov and the distances to a near $z$ plane and a far $z$ plane (Figure 5.6).
The transformation is most easily understood in two steps:
- Points in camera space are projected onto the viewing plane. A bit of algebra shows that the projected $x'$ and $y'$ coordinates on the viewing plane can be computed by dividing $x$ and $y$ by the point’s $z$ coordinate value. The projected $z$ depth is remapped so that $z$ values at the near plane $n$ are 0 and $z$ values at the far plane $f$ are 1. The computation we would like to do is
$$x' = \frac{x}{z}, \qquad y' = \frac{y}{z}, \qquad z' = \frac{f(z - n)}{z(f - n)}.$$
<<Perform projective divide for perspective projection>>=
SquareMatrix<4> persp(1, 0, 0,           0,
                      0, 1, 0,           0,
                      0, 0, f / (f - n), -f * n / (f - n),
                      0, 0, 1,           0);
- The angular field of view (fov) specified by the user is accounted for by scaling the $(x, y)$ values on the projection plane so that points inside the field of view project to coordinates between $[-1, 1]$ on the view plane. For square images, both $x$ and $y$ lie between $[-1, 1]$ in screen space. Otherwise, the direction in which the image is narrower maps to $[-1, 1]$, and the wider direction maps to a proportionally larger range of screen space values. Recall that the tangent is equal to the ratio of the opposite side of a right triangle to the adjacent side. Here the adjacent side has length 1, so the opposite side has the length $\tan(\mathrm{fov}/2)$. Scaling by the reciprocal of this length maps the field of view to the range $[-1, 1]$.
<<Scale canonical perspective view to specified field of view>>=
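In sketch form, this fragment scales $x$ and $y$ by the reciprocal of $\tan(\mathrm{fov}/2)$ and composes the result with the projective divide computed above (assuming a Radians() degree-to-radian helper; not necessarily the exact implementation):

    // Scale x and y so that the given field of view maps to [-1, 1] in screen space.
    Float invTanAng = 1 / std::tan(Radians(fov) / 2);
    return Scale(invTanAng, invTanAng, 1) * Transform(persp);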
As with the OrthographicCamera, the PerspectiveCamera’s constructor computes information about how the rays it generates change with shifts in pixels. In this case, the ray origins are unchanged and the ray differentials are only different in their directions. Here, we compute the change in position on the near perspective plane in camera space with respect to shifts in pixel location.
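One way to express that precomputation (member names assumed):

    // Change in camera-space position on the projection plane per one-pixel shift in x and y.
    dxCamera = cameraFromRaster(Point3f(1, 0, 0)) - cameraFromRaster(Point3f(0, 0, 0));
    dyCamera = cameraFromRaster(Point3f(0, 1, 0)) - cameraFromRaster(Point3f(0, 0, 0));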
The cosine of the maximum angle of the perspective camera’s field of view will occasionally be useful. In particular, points outside the field of view can be quickly culled via a dot product with the viewing direction and comparison to this value. This cosine can be found by computing the angle between the camera’s viewing vector and a vector to one of the corners of the image (see Figure 5.7). This corner needs a small adjustment here to account for the width of the filter function centered at each pixel that is used to weight image samples according to their location (this topic is discussed in Section 8.8).
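A sketch of that computation, assuming the film exposes its reconstruction filter’s radius (names here are illustrative):

    // The raster-space image corner, pushed outward by the filter radius, bounds the field of view.
    Vector2f radius = film.GetFilter().Radius();
    Point3f pCorner(-radius.x, -radius.y, 0);
    // The cosine of the angle between the corner direction and the +z viewing axis
    // is the normalized vector's z component.
    Vector3f wCornerCamera = Normalize(Vector3f(cameraFromRaster(pCorner)));
    cosTotalWidth = wCornerCamera.z;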
With the perspective projection, camera space rays all originate from the origin, $(0, 0, 0)$. A ray’s direction is given by the vector from the origin to the point on the near plane, pCamera, that corresponds to the provided CameraSample’s pFilm location. In other words, the ray’s vector direction is component-wise equal to this point’s position, so rather than doing a useless subtraction to compute the direction, we just initialize the direction directly from the point pCamera.
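In simplified sketch form (defocus blur, time, and medium omitted; names assumed):

    // The film sample maps to a camera-space point on the near plane; the ray
    // leaves the camera-space origin toward that point.
    Point3f pFilm = Point3f(sample.pFilm.x, sample.pFilm.y, 0);
    Point3f pCamera = cameraFromRaster(pFilm);
    Ray ray(Point3f(0, 0, 0), Normalize(Vector3f(pCamera)));
    return RenderFromCamera(ray);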
The GenerateRayDifferential() method follows the implementation of GenerateRay(), except for this additional fragment that computes the differential rays.
5.2.3 The Thin Lens Model and Depth of Field
An ideal pinhole camera that only allows rays passing through a single point to reach the film is not physically realizable; while it is possible to make cameras with extremely small apertures that approach this behavior, small apertures allow relatively little light to reach the film sensor. With a small aperture, long exposure times are required to capture enough photons to accurately capture the image, which in turn can lead to blur from objects in the scene moving while the camera shutter is open.
Real cameras have lens systems that focus light through a finite-sized aperture onto the film plane. Camera designers (and photographers using cameras with adjustable apertures) face a trade-off: the larger the aperture, the more light reaches the film and the shorter the exposures that are needed. However, lenses can only focus on a single plane (the focal plane), and the farther objects in the scene are from this plane, the blurrier they are. The larger the aperture, the more pronounced this effect is.
The RealisticCamera (included only in the online edition of the book) implements a fairly accurate simulation of lens systems in real-world cameras. For the simple camera models introduced so far, we can apply a classic approximation from optics, the thin lens approximation, to model the effect of finite apertures with traditional computer graphics projection models. The thin lens approximation models an optical system as a single lens with spherical profiles, where the thickness of the lens is small relative to the radius of curvature of the lens.
Under the thin lens approximation, incident rays that are parallel to the optical axis and pass through the lens focus at a point behind the lens called the focal point. The distance the focal point is behind the lens, $f$, is the lens’s focal length. If the film plane is placed at a distance equal to the focal length behind the lens, then objects infinitely far away will be in focus, as they image to a single point on the film.
Figure 5.8 illustrates the basic setting. Here we have followed the typical lens coordinate system convention of placing the lens perpendicular to the $z$ axis, with the lens at $z = 0$ and the scene along $-z$. (Note that this is a different coordinate system from the one we used for camera space, where the viewing direction is $+z$.) Distances on the scene side of the lens are denoted with unprimed variables $z$, and distances on the film side of the lens (positive $z$) are primed, $z'$.
For points in the scene at a depth $z$ from a thin lens with focal length $f$, the Gaussian lens equation relates the distances from the object to the lens and from the lens to the image of the point:
$$\frac{1}{z'} - \frac{1}{z} = \frac{1}{f}.$$
Note that for $z = -\infty$, we have $z' = f$, as expected.
We can use the Gaussian lens equation to solve for the distance between the lens and the film that sets the plane of focus at some $z_f$, the focal distance (Figure 5.9):
$$z'_f = \frac{f\, z_f}{f + z_f}.$$
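This is a single rearrangement of the Gaussian lens equation:
$$\frac{1}{z'_f} = \frac{1}{f} + \frac{1}{z_f} = \frac{z_f + f}{f\, z_f} \;\Longrightarrow\; z'_f = \frac{f\, z_f}{f + z_f}.$$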
A point that does not lie on the plane of focus is imaged to a disk on the film plane, rather than to a single point. The boundary of this disk is called the circle of confusion. The size of the circle of confusion is affected by the diameter of the aperture that light rays pass through, the focal distance, and the distance between the object and the lens. Although the circle of confusion only has zero radius for a single depth, a range of nearby depths have small enough circles of confusion that they still appear to be in focus. (As long as its circle of confusion is smaller than the spacing between pixels, a point will effectively appear to be in focus.) The range of depths that appear in focus are termed the depth of field.
Figure 5.10 shows this effect, in the Watercolor scene. As the size of the lens aperture increases, blurriness increases the farther a point is from the plane of focus. Note that the pencil cup in the center remains in focus throughout all the images, as the plane of focus has been placed at its depth. Figure 5.11 shows depth of field used to render the landscape scene. Note how the effect draws the viewer’s eye to the in-focus grass in the center of the image.
The Gaussian lens equation also lets us compute the size of the circle of confusion; given a lens with focal length $f$ that is focused at a distance $z_f$, the film plane is at $z'_f$. Given another point at depth $z$, the Gaussian lens equation gives the distance $z'$ that the lens focuses the point to. This point is either in front of or behind the film plane; Figure 5.12(a) shows the case where it is behind.
The diameter of the circle of confusion is given by the intersection of the cone between $z'$ and the lens with the film plane. If we know the diameter of the lens $d_l$, then we can use similar triangles to solve for the diameter of the circle of confusion $d_c$ (Figure 5.12(b)):
$$\frac{d_l}{z'} = \frac{d_c}{|z' - z'_f|}.$$
Solving for $d_c$, we have
$$d_c = \left| \frac{d_l\,(z' - z'_f)}{z'} \right|.$$
Applying the Gaussian lens equation to express the result in terms of scene depths, we can find that
$$d_c = \left| \frac{d_l\, f\, (z - z_f)}{z\,(f + z_f)} \right|.$$
Note that the diameter of the circle of confusion is proportional to the diameter of the lens. The lens diameter is often expressed as the lens’s f-number $n$, which expresses diameter as a fraction of focal length, $d_l = f/n$.
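As a quick worked example with illustrative numbers (a 50-mm lens with a 25-mm aperture, focused at 1 m, so $z_f = -1000$ mm in the sign convention above), a point at a depth of 2 m has
$$d_c = \left| \frac{(25\,\text{mm})(50\,\text{mm})\,\bigl(-2000\,\text{mm} - (-1000\,\text{mm})\bigr)}{(-2000\,\text{mm})\,\bigl(50\,\text{mm} + (-1000\,\text{mm})\bigr)} \right| \approx 0.66\,\text{mm},$$
which is far larger than the spacing between pixels on a typical sensor, so such a point appears clearly out of focus.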
Figure 5.13 shows a graph of this function for a 50-mm focal length lens with a 25-mm aperture, focused at a distance of 1 m. Note that the blur is asymmetric with depth around the focal plane and grows much more quickly for objects in front of the plane of focus than for objects behind it.
Modeling a thin lens in a ray tracer is remarkably straightforward: all that is necessary is to choose a point on the lens and find the appropriate ray that starts on the lens at that point such that objects in the plane of focus are in focus on the film (Figure 5.14). Therefore, projective cameras take two extra parameters for depth of field: one sets the size of the lens aperture, and the other sets the focal distance.
It is generally necessary to trace many rays for each image pixel in order to adequately sample the lens for smooth defocus blur. Figure 5.15 shows the landscape scene from Figure 5.11 with only four samples per pixel (Figure 5.11 had 2048 samples per pixel).
The SampleUniformDiskConcentric() function, which is defined in Section A.5.1, takes a sample position in $[0, 1)^2$ and maps it to a 2D unit disk centered at the origin $(0, 0)$. To turn this into a point on the lens, these coordinates are scaled by the lens radius. The CameraSample class provides the lens-sampling parameters in the pLens member variable.
The ray’s origin is this point on the lens. Now it is necessary to determine the proper direction for the new ray. We know that all rays from the given image sample through the lens must converge at the same point on the plane of focus. Furthermore, we know that rays pass through the center of the lens without a change in direction, so finding the appropriate point of convergence is a matter of intersecting the unperturbed ray from the pinhole model with the plane of focus and then setting the new ray’s direction to be the vector from the point on the lens to the intersection point.
For this simple model, the plane of focus is perpendicular to the $z$ axis and the ray starts at the origin, so intersecting the ray through the lens center with the plane of focus is straightforward. The $t$ value of the intersection is given by
$$t = \frac{\mathrm{focalDistance}}{d_z},$$
where $d_z$ is the $z$ component of the ray’s direction.
Now the ray can be initialized. The origin is set to the sampled point on the lens, and the direction is set so that the ray passes through the point on the plane of focus, pFocus.
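Putting these steps together, the lens-sampling logic amounts to something like the following sketch (lensRadius and focalDistance correspond to the two extra depth-of-field parameters mentioned above; not the exact implementation):

    if (lensRadius > 0) {
        // Sample a point on the lens.
        Point2f pLens = lensRadius * SampleUniformDiskConcentric(sample.pLens);
        // Find where the unperturbed ray pierces the plane of focus.
        Float ft = focalDistance / ray.d.z;
        Point3f pFocus = ray(ft);
        // Re-aim the ray: start at the lens sample point, pass through the focus point.
        ray.o = Point3f(pLens.x, pLens.y, 0);
        ray.d = Normalize(pFocus - ray.o);
    }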
To compute ray differentials with the thin lens, the approach used in the fragment <<Update ray for effect of lens>> is applied to rays offset one pixel in the $x$ and $y$ directions on the film plane. The fragments that implement this, <<Compute OrthographicCamera ray differentials accounting for lens>> and <<Compute PerspectiveCamera ray differentials accounting for lens>>, are not included here.