2.7 Transformations
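In general, a transformation $\mathbf{T}$ is a mapping from points to points and from vectors to vectors:
$$\mathrm{p}' = \mathbf{T}(\mathrm{p}), \qquad \mathbf{v}' = \mathbf{T}(\mathbf{v}).$$
The transformation $\mathbf{T}$ may be an arbitrary procedure. However, we will consider a subset of all possible transformations in this chapter. In particular, they will be
- Linear: If $\mathbf{T}$ is an arbitrary linear transformation and $s$ is an arbitrary scalar, then $\mathbf{T}(s\mathbf{v}) = s\mathbf{T}(\mathbf{v})$ and $\mathbf{T}(\mathbf{v}_1 + \mathbf{v}_2) = \mathbf{T}(\mathbf{v}_1) + \mathbf{T}(\mathbf{v}_2)$. These two properties can greatly simplify reasoning about transformations.
- Continuous: Roughly speaking, $\mathbf{T}$ maps the neighborhoods around $\mathrm{p}$ and $\mathbf{v}$ to neighborhoods around $\mathrm{p}'$ and $\mathbf{v}'$.
- One-to-one and invertible: For each $\mathrm{p}$, $\mathbf{T}$ maps $\mathrm{p}$ to a single unique $\mathrm{p}'$. Furthermore, there exists an inverse transform $\mathbf{T}^{-1}$ that maps $\mathrm{p}'$ back to $\mathrm{p}$.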
We will often want to take a point, vector, or normal defined with respect to one coordinate frame and find its coordinate values with respect to another frame. Using basic properties of linear algebra, a $4 \times 4$ matrix can be shown to express the linear transformation of a point or vector from one frame to another. Furthermore, such a $4 \times 4$ matrix suffices to express all linear transformations of points and vectors within a fixed frame, such as translation in space or rotation around a point. Therefore, there are two different (and incompatible!) ways that a matrix can be interpreted:
- Transformation of the frame: Given a point, the matrix could express how to compute a new point in the same frame that represents the transformation of the original point (e.g., by translating it in some direction).
- Transformation from one frame to another: A matrix can express the coordinates of a point or vector in a new frame in terms of the coordinates in the original frame.
Most uses of transformations in pbrt are for transforming points from one frame to another.
In general, transformations make it possible to work in the most convenient coordinate space. For example, we can write routines that define a virtual camera, assuming that the camera is located at the origin, looks down the $z$ axis, and has the $y$ axis pointing up and the $x$ axis pointing right. These assumptions greatly simplify the camera implementation. Then, to place the camera at any point in the scene looking in any direction, we just construct a transformation that maps points in the scene’s coordinate system to the camera’s coordinate system. (See Section 6.1.1 for more information about camera coordinate spaces in pbrt.)
2.7.1 Homogeneous Coordinates
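Given a frame defined by $(\mathrm{p}_o, \mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3)$, there is ambiguity between the representation of a point $(\mathrm{p}_x, \mathrm{p}_y, \mathrm{p}_z)$ and a vector $(\mathbf{v}_x, \mathbf{v}_y, \mathbf{v}_z)$ with the same $(x, y, z)$ coordinates. Using the representations of points and vectors introduced at the start of the chapter, we can write the point as the inner product
$$\mathrm{p} = [s_1\ s_2\ s_3\ 1]\,[\mathbf{v}_1\ \mathbf{v}_2\ \mathbf{v}_3\ \mathrm{p}_o]^T$$
and the vector as the inner product
$$\mathbf{v} = [s'_1\ s'_2\ s'_3\ 0]\,[\mathbf{v}_1\ \mathbf{v}_2\ \mathbf{v}_3\ \mathrm{p}_o]^T.$$
These four-vectors of three $s_i$ values and a zero or one are called the homogeneous representations of the point and the vector. The fourth coordinate of the homogeneous representation is sometimes called the weight. For a point, its value can be any scalar other than zero: the homogeneous points $[1, 3, -2, 1]$ and $[-2, -6, 4, -2]$ describe the same Cartesian point $(1, 3, -2)$. Converting homogeneous points into ordinary points entails dividing the first three components by the weight:
$$(x, y, z, w) \rightarrow \left(\frac{x}{w}, \frac{y}{w}, \frac{z}{w}\right).$$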
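The division by the weight can be sketched in a few lines. This is only an illustration: the `Vec3` and `Vec4` aliases and the `ToCartesian()` helper are hypothetical names introduced here, not part of pbrt, which has no explicit homogeneous type.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in types for this sketch; pbrt has no homogeneous class.
using Vec3 = std::array<double, 3>;
using Vec4 = std::array<double, 4>;

// Convert a homogeneous point (x, y, z, w) with w != 0 to (x/w, y/w, z/w).
Vec3 ToCartesian(const Vec4 &h) {
    assert(h[3] != 0.0);  // a weight of zero denotes a vector, not a point
    return {h[0] / h[3], h[1] / h[3], h[2] / h[3]};
}
```

Any two homogeneous points that differ only by a nonzero scale factor convert to the same Cartesian point under this function.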
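We will use these facts to see how a transformation matrix can describe how points and vectors in one frame can be mapped to another frame. Consider a matrix $\mathbf{M}$ that describes the transformation from one coordinate system to another:
$$\mathbf{M} = \begin{pmatrix} m_{0,0} & m_{0,1} & m_{0,2} & m_{0,3} \\ m_{1,0} & m_{1,1} & m_{1,2} & m_{1,3} \\ m_{2,0} & m_{2,1} & m_{2,2} & m_{2,3} \\ m_{3,0} & m_{3,1} & m_{3,2} & m_{3,3} \end{pmatrix}.$$
(In this book, we define matrix element indices starting from zero, so that equations and source code correspond more directly.) Then if the transformation represented by $\mathbf{M}$ is applied to the $x$ axis vector $(1, 0, 0, 0)$, we have
$$\mathbf{M}\mathbf{x} = \mathbf{M}\,[1\ 0\ 0\ 0]^T = [m_{0,0}\ m_{1,0}\ m_{2,0}\ m_{3,0}]^T.$$
Thus, directly reading the columns of the matrix shows how the basis vectors and the origin of the current coordinate system are transformed by the matrix:
$$\begin{aligned} \mathbf{M}\mathbf{x} &= [m_{0,0}\ m_{1,0}\ m_{2,0}\ m_{3,0}]^T \\ \mathbf{M}\mathbf{y} &= [m_{0,1}\ m_{1,1}\ m_{2,1}\ m_{3,1}]^T \\ \mathbf{M}\mathbf{z} &= [m_{0,2}\ m_{1,2}\ m_{2,2}\ m_{3,2}]^T \\ \mathbf{M}\mathrm{p} &= [m_{0,3}\ m_{1,3}\ m_{2,3}\ m_{3,3}]^T. \end{aligned}$$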
In general, by characterizing how the basis is transformed, we know how any point or vector specified in terms of that basis is transformed. Because points and vectors in the current coordinate system are expressed in terms of the current coordinate system’s frame, applying the transformation to them directly is equivalent to applying the transformation to the current coordinate system’s basis and finding their coordinates in terms of the transformed basis.
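The observation that the columns of the matrix are the images of the basis vectors and the origin can be checked with a short sketch. The `Vec4` and `Mat4` aliases and the `Mul()` and `Column()` helpers are hypothetical names used only for this illustration.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<Vec4, 4>;  // row-major: m[row][column]

// Matrix-vector product M * v, treating v as a column vector.
Vec4 Mul(const Mat4 &m, const Vec4 &v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += m[i][j] * v[j];
    return r;
}

// Column j of m: the image of the j-th basis element under m.
Vec4 Column(const Mat4 &m, int j) {
    assert(j >= 0 && j < 4);
    return {m[0][j], m[1][j], m[2][j], m[3][j]};
}
```

Multiplying by $(1, 0, 0, 0)$, $(0, 1, 0, 0)$, $(0, 0, 1, 0)$, or $(0, 0, 0, 1)$ returns column 0, 1, 2, or 3, respectively.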
We will not use homogeneous coordinates explicitly in our code; there is no Homogeneous class in pbrt. However, the various transformation routines in the next section will implicitly convert points, vectors, and normals to homogeneous form, transform the homogeneous points, and then convert them back before returning the result. This isolates the details of homogeneous coordinates in one place (namely, the implementation of transformations).
A transformation is represented by the elements of the matrix m, a Matrix4x4 object. The low-level Matrix4x4 class is defined in Section A.5.3. The matrix m is stored in row-major form, so element m[i][j] corresponds to $m_{i,j}$, where $i$ is the row number and $j$ is the column number. For convenience, the Transform also stores the inverse of the matrix m in the Transform::mInv member; for pbrt’s needs, it is better to have the inverse easily available than to repeatedly compute it as needed.
This representation of transformations is relatively memory hungry: assuming 4 bytes of storage for a Float value, a Transform requires 128 bytes of storage. Used naïvely, this approach can be wasteful; if a scene has millions of shapes but only a few thousand unique transformations, there’s no reason to redundantly store the same transform many times in memory. Therefore, Shapes in pbrt store a pointer to a Transform, and the scene specification code defined in Section A.3.5 uses a TransformCache to ensure that all shapes that share the same transformation point to a single instance of that transformation in memory.
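The idea of sharing one instance among equal transformations can be sketched as follows. This is not pbrt's TransformCache, whose actual interface is defined elsewhere in the book; `SketchTransform` and `SketchTransformCache` are hypothetical names, and keying a `std::map` on the matrix is an assumption made to keep the example short.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>

using Mat4 = std::array<std::array<float, 4>, 4>;

// Illustrative stand-in: two 4x4 float matrices (the matrix and its inverse).
struct SketchTransform {
    Mat4 m, mInv;
};

class SketchTransformCache {
  public:
    // Returns a pointer to the single stored instance whose matrix equals t.m,
    // adding a copy of t to the cache on first use.
    const SketchTransform *Lookup(const SketchTransform &t) {
        auto it = cache.find(t.m);
        if (it == cache.end())
            it = cache.emplace(t.m, std::make_unique<SketchTransform>(t)).first;
        assert(it->second != nullptr);
        return it->second.get();
    }
    std::size_t Size() const { return cache.size(); }

  private:
    std::map<Mat4, std::unique_ptr<SketchTransform>> cache;
};
```

With this arrangement, each shape holds only a pointer, and looking up an equal matrix twice yields the same address.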
This decision to share transformations implies a loss of flexibility, however: the elements of a Transform shouldn’t be modified after it is created if the Transform is shared by multiple objects in the scene (and those objects don’t expect it to be changing). This limitation isn’t a problem in practice, since the transformations in a scene are typically created when pbrt parses the scene description file and don’t need to change later at rendering time.
2.7.2 Basic Operations
When a new Transform is created, it defaults to the identity transformation, which maps each point and each vector to itself. This transformation is represented by the identity matrix:
$$\mathbf{I} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
The implementation here relies on the default Matrix4x4 constructor to fill in the identity matrix for m and mInv.
A Transform can also be created from a given matrix. In this case, the given matrix must be explicitly inverted.
The most commonly used constructor takes a reference to the transformation matrix along with an explicitly provided inverse. This is a superior approach to computing the inverse in the constructor because many geometric transformations have very simple inverses and we can avoid the expense and potential loss of numeric precision from computing a general matrix inverse. Of course, this places the burden on the caller to make sure that the supplied inverse is correct.
The Transform representing the inverse of a Transform can be returned by just swapping the roles of mInv and m.
Transposing the two matrices in the transform to compute a new transform can also be useful.
We provide Transform equality (and inequality) testing methods; their implementation is straightforward and not included here. Transform also provides an IsIdentity() method that checks to see if the transformation is the identity.
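The operations described in this section can be sketched together. The code below is a simplified illustration rather than pbrt's Transform class: `SketchTransform`, `IdentityMatrix()`, and `Transposed()` are hypothetical names, and the constructor that inverts a general matrix is omitted because it would need a full matrix inversion routine.

```cpp
#include <array>

using Mat4 = std::array<std::array<double, 4>, 4>;  // row-major: m[row][column]

Mat4 IdentityMatrix() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1;
    return m;
}

Mat4 Transposed(const Mat4 &m) {
    Mat4 t{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            t[i][j] = m[j][i];
    return t;
}

struct SketchTransform {
    Mat4 m = IdentityMatrix(), mInv = IdentityMatrix();

    // Defaults to the identity transformation.
    SketchTransform() = default;
    // The caller supplies the inverse and is responsible for its correctness.
    SketchTransform(const Mat4 &m, const Mat4 &mInv) : m(m), mInv(mInv) {}

    bool IsIdentity() const { return m == IdentityMatrix(); }
};

// The inverse transform just swaps the roles of m and mInv.
SketchTransform Inverse(const SketchTransform &t) {
    return SketchTransform(t.mInv, t.m);
}

// Transposing both matrices keeps them mutual inverses,
// since the inverse of a transpose is the transpose of the inverse.
SketchTransform Transpose(const SketchTransform &t) {
    return SketchTransform(Transposed(t.m), Transposed(t.mInv));
}
```

Neither `Inverse()` nor `Transpose()` performs any matrix inversion, which is the benefit of storing both matrices.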
2.7.3 Translations
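One of the simplest transformations is the translation transformation, $\mathbf{T}(\Delta x, \Delta y, \Delta z)$. When applied to a point $\mathrm{p}$, it translates $\mathrm{p}$’s coordinates by $\Delta x$, $\Delta y$, and $\Delta z$, as shown in Figure 2.10. As an example, $\mathbf{T}(2, 2, 1)(x, y, z) = (x + 2, y + 2, z + 1)$.
Translation has some simple properties:
$$\begin{aligned} \mathbf{T}(0, 0, 0) &= \mathbf{I} \\ \mathbf{T}(x_1, y_1, z_1)\,\mathbf{T}(x_2, y_2, z_2) &= \mathbf{T}(x_1 + x_2, y_1 + y_2, z_1 + z_2) \\ \mathbf{T}(x_1, y_1, z_1)\,\mathbf{T}(x_2, y_2, z_2) &= \mathbf{T}(x_2, y_2, z_2)\,\mathbf{T}(x_1, y_1, z_1) \\ \mathbf{T}^{-1}(x, y, z) &= \mathbf{T}(-x, -y, -z). \end{aligned}$$
Translation only affects points, leaving vectors unchanged. In matrix form, the translation transformation is
$$\mathbf{T}(\Delta x, \Delta y, \Delta z) = \begin{pmatrix} 1 & 0 & 0 & \Delta x \\ 0 & 1 & 0 & \Delta y \\ 0 & 0 & 1 & \Delta z \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
When we consider the operation of a translation matrix on a point, we see the value of homogeneous coordinates. Consider the product of the matrix for $\mathbf{T}(\Delta x, \Delta y, \Delta z)$ with a point $\mathrm{p}$ in homogeneous coordinates $[x\ y\ z\ 1]^T$:
$$\begin{pmatrix} 1 & 0 & 0 & \Delta x \\ 0 & 1 & 0 & \Delta y \\ 0 & 0 & 1 & \Delta z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = \begin{pmatrix} x + \Delta x \\ y + \Delta y \\ z + \Delta z \\ 1 \end{pmatrix}.$$
As expected, we have computed a new point with its coordinates offset by $(\Delta x, \Delta y, \Delta z)$. However, if we apply $\mathbf{T}$ to a vector $\mathbf{v}$, we have
$$\begin{pmatrix} 1 & 0 & 0 & \Delta x \\ 0 & 1 & 0 & \Delta y \\ 0 & 0 & 1 & \Delta z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 0 \end{pmatrix} = \begin{pmatrix} x \\ y \\ z \\ 0 \end{pmatrix}.$$
The result is the same vector $\mathbf{v}$. This makes sense because vectors represent directions, so translation leaves them unchanged.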
We will define a routine that creates a new Transform matrix to represent a given translation—it is a straightforward application of the translation matrix equation. This routine fully initializes the Transform that is returned, also initializing the matrix that represents the inverse of the translation.
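A minimal sketch of such a routine follows. It is not pbrt's own code: `SketchTransform`, `MakeTranslation()`, and `Apply()` are hypothetical names, and the inverse is filled in directly by negating the offsets rather than by inverting the matrix.

```cpp
#include <array>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<Vec4, 4>;  // row-major: m[row][column]

// Illustrative stand-in for a transform that stores a matrix and its inverse.
struct SketchTransform {
    Mat4 m, mInv;
};

// Builds the translation by (dx, dy, dz); its inverse translates by the negation.
SketchTransform MakeTranslation(double dx, double dy, double dz) {
    Mat4 m = {{{1, 0, 0, dx}, {0, 1, 0, dy}, {0, 0, 1, dz}, {0, 0, 0, 1}}};
    Mat4 mInv = {{{1, 0, 0, -dx}, {0, 1, 0, -dy}, {0, 0, 1, -dz}, {0, 0, 0, 1}}};
    return {m, mInv};
}

// Applies m to a homogeneous 4-vector (weight 1 for points, 0 for vectors).
Vec4 Apply(const Mat4 &m, const Vec4 &h) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += m[i][j] * h[j];
    return r;
}
```

Applying the matrix to a 4-vector with weight 1 offsets it, while a 4-vector with weight 0 passes through unchanged, because the fourth column is multiplied by the weight.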
2.7.4 Scaling
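Another basic transformation is the scale transformation, $\mathbf{S}(s_x, s_y, s_z)$. It has the effect of taking a point or vector and multiplying its components by scale factors in $x$, $y$, and $z$: $\mathbf{S}(2, 2, 1)(x, y, z) = (2x, 2y, z)$. It has the following basic properties:
$$\begin{aligned} \mathbf{S}(1, 1, 1) &= \mathbf{I} \\ \mathbf{S}(x_1, y_1, z_1)\,\mathbf{S}(x_2, y_2, z_2) &= \mathbf{S}(x_1 x_2, y_1 y_2, z_1 z_2) \\ \mathbf{S}^{-1}(x, y, z) &= \mathbf{S}\!\left(\frac{1}{x}, \frac{1}{y}, \frac{1}{z}\right). \end{aligned}$$
We can differentiate between uniform scaling, where all three scale factors have the same value, and nonuniform scaling, where they may have different values. The general scale matrix is
$$\mathbf{S}(x, y, z) = \begin{pmatrix} x & 0 & 0 & 0 \\ 0 & y & 0 & 0 \\ 0 & 0 & z & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$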
It’s useful to be able to test if a transformation has a scaling term in it; an easy way to do this is to transform the three coordinate axes and see if any of their lengths are appreciably different from one.
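Both the scale matrix and the scaling test can be sketched briefly. The names `SketchTransform`, `MakeScale()`, and `HasScale()` are illustrative here, and the tolerance of 0.001 on the squared length is an assumption chosen for the example, standing in for "appreciably different from one."

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<std::array<double, 4>, 4>;  // row-major: m[row][column]

struct SketchTransform {
    Mat4 m, mInv;
};

// Builds the scale by (sx, sy, sz); the inverse scales by the reciprocals.
SketchTransform MakeScale(double sx, double sy, double sz) {
    assert(sx != 0 && sy != 0 && sz != 0);  // otherwise no inverse exists
    Mat4 m = {{{sx, 0, 0, 0}, {0, sy, 0, 0}, {0, 0, sz, 0}, {0, 0, 0, 1}}};
    Mat4 mInv = {{{1 / sx, 0, 0, 0}, {0, 1 / sy, 0, 0}, {0, 0, 1 / sz, 0}, {0, 0, 0, 1}}};
    return {m, mInv};
}

// The images of the three coordinate axes are the first three columns of m.
// Report scaling if any of their squared lengths strays from one.
bool HasScale(const Mat4 &m) {
    for (int j = 0; j < 3; ++j) {
        double len2 = m[0][j] * m[0][j] + m[1][j] * m[1][j] + m[2][j] * m[2][j];
        if (len2 < 0.999 || len2 > 1.001)  // assumed tolerance
            return true;
    }
    return false;
}
```

Comparing squared lengths avoids three square roots; a pure rotation or translation passes the test because it leaves the axis lengths at one.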
2.7.5 x, y, and z Axis Rotations
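Another useful type of transformation is the rotation transformation, $\mathbf{R}$. In general, we can define an arbitrary axis from the origin in any direction and then rotate around that axis by a given angle. The most common rotations of this type are around the $x$, $y$, and $z$ coordinate axes. We will write these rotations as $\mathbf{R}_x(\theta)$, $\mathbf{R}_y(\theta)$, and so on. The rotation around an arbitrary axis $(x, y, z)$ is denoted by $\mathbf{R}_{(x,y,z)}(\theta)$.
Rotations also have some basic properties:
$$\begin{aligned} \mathbf{R}_a(0) &= \mathbf{I} \\ \mathbf{R}_a(\theta_1)\,\mathbf{R}_a(\theta_2) &= \mathbf{R}_a(\theta_1 + \theta_2) \\ \mathbf{R}_a(\theta_1)\,\mathbf{R}_a(\theta_2) &= \mathbf{R}_a(\theta_2)\,\mathbf{R}_a(\theta_1) \\ \mathbf{R}_a^{-1}(\theta) &= \mathbf{R}_a(-\theta) = \mathbf{R}_a^T(\theta), \end{aligned}$$
where $\mathbf{R}^T$ is the matrix transpose of $\mathbf{R}$. This last property, that the inverse of $\mathbf{R}$ is equal to its transpose, stems from the fact that $\mathbf{R}$ is an orthogonal matrix; its first three columns (or rows) are all normalized and orthogonal to each other. Fortunately, the transpose is much easier to compute than a full matrix inverse.
For a left-handed coordinate system, the matrix for clockwise rotation around the $x$ axis is
$$\mathbf{R}_x(\theta) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
Figure 2.11 gives an intuition for how this matrix works.
It’s easy to see that the matrix leaves the $x$ axis unchanged:
$$\mathbf{R}_x(\theta)\,[1\ 0\ 0\ 0]^T = [1\ 0\ 0\ 0]^T.$$
It maps the $y$ axis $(0, 1, 0)$ to $(0, \cos\theta, \sin\theta)$ and the $z$ axis to $(0, -\sin\theta, \cos\theta)$. The $y$ and $z$ axes remain in the same plane, perpendicular to the $x$ axis, but are rotated by the given angle. An arbitrary point in space is similarly rotated about the $x$ axis by this transformation while staying in the same plane as it was originally.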
The implementation of the RotateX() function is straightforward.
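One way such a function might look is sketched below. It is an illustration rather than pbrt's RotateX(): the names `SketchTransform`, `Transposed()`, and `MakeRotateX()` are hypothetical, and the angle is assumed to be given in radians.

```cpp
#include <array>
#include <cmath>

using Mat4 = std::array<std::array<double, 4>, 4>;  // row-major: m[row][column]

struct SketchTransform {
    Mat4 m, mInv;
};

Mat4 Transposed(const Mat4 &m) {
    Mat4 t{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            t[i][j] = m[j][i];
    return t;
}

// Rotation about the x axis by theta (assumed to be in radians).
// The matrix is orthogonal, so its inverse is its transpose.
SketchTransform MakeRotateX(double theta) {
    double s = std::sin(theta), c = std::cos(theta);
    Mat4 m = {{{1, 0, 0, 0}, {0, c, -s, 0}, {0, s, c, 0}, {0, 0, 0, 1}}};
    return {m, Transposed(m)};
}
```

The second column of the matrix, $(0, \cos\theta, \sin\theta, 0)$, is the image of the $y$ axis, and the first row and column show that the $x$ axis is left alone.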
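Similarly, for clockwise rotation around $y$ and $z$, we have
$$\mathbf{R}_y(\theta) = \begin{pmatrix} \cos\theta & 0 & \sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad \mathbf{R}_z(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$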
The implementations of RotateY() and RotateZ() follow directly and are not included here.
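For reference, a sketch of how the two remaining matrices might be built is given here. The function names `RotateYMatrix()`, `RotateZMatrix()`, and `Apply()` are hypothetical, the angle is assumed to be in radians, and only the forward matrix is constructed; as with the other rotations, the inverse would be its transpose.

```cpp
#include <array>
#include <cmath>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<Vec4, 4>;  // row-major: m[row][column]

// Rotation about the y axis by theta (assumed to be in radians).
Mat4 RotateYMatrix(double theta) {
    double s = std::sin(theta), c = std::cos(theta);
    Mat4 m = {{{c, 0, s, 0}, {0, 1, 0, 0}, {-s, 0, c, 0}, {0, 0, 0, 1}}};
    return m;
}

// Rotation about the z axis by theta (assumed to be in radians).
Mat4 RotateZMatrix(double theta) {
    double s = std::sin(theta), c = std::cos(theta);
    Mat4 m = {{{c, -s, 0, 0}, {s, c, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}}};
    return m;
}

// Applies m to a homogeneous 4-vector.
Vec4 Apply(const Mat4 &m, const Vec4 &h) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += m[i][j] * h[j];
    return r;
}
```

With these matrices, a quarter turn about $z$ carries the $x$ axis onto the $y$ axis, and a quarter turn about $y$ carries the $z$ axis onto the $x$ axis.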
2.7.6 Rotation around an Arbitrary Axis
We also provide a routine to compute the transformation that represents rotation around an arbitrary axis. The usual derivation of this matrix is based on computing rotations that map the given axis to a fixed axis (e.g., $z$), performing the rotation there, and then rotating the fixed axis back to the original axis. A more elegant derivation can be constructed with vector algebra.
Consider a normalized direction vector $\mathbf{a}$ that gives the axis to rotate around by angle $\theta$, and a vector $\mathbf{v}$ to be rotated (Figure 2.12).
First, we can compute the vector $\mathbf{v}_c$ along the axis $\mathbf{a}$ whose end point lies in the plane that passes through the end point of $\mathbf{v}$ and is perpendicular to $\mathbf{a}$. Assuming $\mathbf{v}$ and $\mathbf{a}$ form an angle $\alpha$, we have
$$\mathbf{v}_c = \mathbf{a}\,\|\mathbf{v}\| \cos\alpha = \mathbf{a}\,(\mathbf{v} \cdot \mathbf{a}).$$
We now compute a pair of basis vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in this plane. Trivially, one of them is
$$\mathbf{v}_1 = \mathbf{v} - \mathbf{v}_c,$$
and the other can be computed with a cross product
$$\mathbf{v}_2 = \mathbf{v}_1 \times \mathbf{a}.$$
Because $\mathbf{a}$ is normalized, $\mathbf{v}_1$ and $\mathbf{v}_2$ have the same length, equal to the length of the vector between $\mathbf{v}$ and $\mathbf{v}_c$. To now compute the rotation by an angle $\theta$ about $\mathbf{v}_c$ in the plane of rotation, the rotation formulas earlier give us
$$\mathbf{v}' = \mathbf{v}_c + \mathbf{v}_1 \cos\theta + \mathbf{v}_2 \sin\theta.$$
To convert this to a rotation matrix, we apply this formula to the basis vectors $\mathbf{x} = (1, 0, 0)$, $\mathbf{y} = (0, 1, 0)$, and $\mathbf{z} = (0, 0, 1)$ to get the values of the rows of the matrix. The result of all this is encapsulated in the following function. As with the other rotation matrices, the inverse is equal to the transpose.
The code for the other two basis vectors follows similarly and isn’t included here.
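The derivation can also be followed literally in code, which may help in checking it. The sketch below is not pbrt's implementation, which expands the formula into closed-form matrix entries; it evaluates $\mathbf{v}_c + \mathbf{v}_1 \cos\theta + \mathbf{v}_2 \sin\theta$ for each basis vector and writes the result into the corresponding row. The names `RotateFormula()` and `AxisRotationMatrix()` are hypothetical, and the angle is assumed to be in radians.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat4 = std::array<std::array<double, 4>, 4>;  // row-major: m[row][column]

double Dot(const Vec3 &a, const Vec3 &b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

Vec3 Cross(const Vec3 &a, const Vec3 &b) {
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

// Evaluates v_c + v1 cos(theta) + v2 sin(theta) for a unit-length axis a.
Vec3 RotateFormula(const Vec3 &v, const Vec3 &a, double theta) {
    double d = Dot(v, a);
    Vec3 vc = {a[0] * d, a[1] * d, a[2] * d};            // component along the axis
    Vec3 v1 = {v[0] - vc[0], v[1] - vc[1], v[2] - vc[2]};  // component in the plane
    Vec3 v2 = Cross(v1, a);                                // perpendicular, same length
    double c = std::cos(theta), s = std::sin(theta);
    return {vc[0] + v1[0] * c + v2[0] * s, vc[1] + v1[1] * c + v2[1] * s,
            vc[2] + v1[2] * c + v2[2] * s};
}

// Fills row i of the matrix with the formula applied to the i-th basis vector.
Mat4 AxisRotationMatrix(const Vec3 &axis, double theta) {
    double len = std::sqrt(Dot(axis, axis));
    assert(len > 0);  // the axis must not be the zero vector
    Vec3 a = {axis[0] / len, axis[1] / len, axis[2] / len};
    Mat4 m{};
    for (int i = 0; i < 3; ++i) {
        Vec3 e{};
        e[i] = 1;
        Vec3 r = RotateFormula(e, a, theta);
        for (int j = 0; j < 3; ++j) m[i][j] = r[j];
    }
    m[3][3] = 1;
    return m;
}
```

For the axis $(1, 0, 0)$ this reproduces the $x$ axis rotation matrix given earlier, which is a useful sanity check on the row convention.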
2.7.7 The Look-At Transformation
The look-at transformation is particularly useful for placing a camera in the scene. The caller specifies the desired position of the camera, a point the camera is looking at, and an “up” vector that orients the camera along the viewing direction implied by the first two parameters. All of these values are given in world space coordinates. The look-at construction then gives a transformation between camera space and world space (Figure 2.13).
In order to find the entries of the look-at transformation matrix, we use principles described earlier in this section: the columns of a transformation matrix give the effect of the transformation on the basis of a coordinate system.
The easiest column is the fourth one, which gives the point that the camera space origin, $[0\ 0\ 0\ 1]^T$, maps to in world space. This is clearly just the camera position, supplied by the user.
The other three columns aren’t much more difficult. First, LookAt() computes the normalized direction vector from the camera location to the look-at point; this gives the vector coordinates that the $z$ axis should map to and, thus, the third column of the matrix. (In a left-handed coordinate system, camera space is defined with the viewing direction down the $+z$ axis.) The first column, giving the world space direction that the $+x$ axis in camera space maps to, is found by taking the cross product of the user-supplied “up” vector with the recently computed viewing direction vector. Finally, the “up” vector is recomputed by taking the cross product of the viewing direction vector with the transformed $x$ axis vector, thus ensuring that the $y$ and $z$ axes are perpendicular and we have an orthonormal viewing coordinate system.
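The column-by-column construction just described can be sketched as follows. This illustrates the steps and is not pbrt's LookAt(): the name `CameraToWorldMatrix()` and the helper functions are hypothetical, only the camera-to-world matrix is built (its inverse is left out), and a degenerate "up" vector parallel to the viewing direction is simply rejected with an assertion.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat4 = std::array<std::array<double, 4>, 4>;  // row-major: m[row][column]

Vec3 Sub(const Vec3 &a, const Vec3 &b) {
    return {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
}

Vec3 Cross(const Vec3 &a, const Vec3 &b) {
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

Vec3 Normalize(const Vec3 &v) {
    double len = std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    assert(len > 0);  // fails if "up" is parallel to the viewing direction
    return {v[0] / len, v[1] / len, v[2] / len};
}

Mat4 CameraToWorldMatrix(const Vec3 &pos, const Vec3 &look, const Vec3 &up) {
    Vec3 dir = Normalize(Sub(look, pos));               // image of camera +z
    Vec3 right = Normalize(Cross(Normalize(up), dir));  // image of camera +x
    Vec3 newUp = Cross(dir, right);                     // image of camera +y
    Mat4 m{};
    for (int i = 0; i < 3; ++i) {
        m[i][0] = right[i];  // first column
        m[i][1] = newUp[i];  // second column
        m[i][2] = dir[i];    // third column
        m[i][3] = pos[i];    // fourth column: image of the camera origin
    }
    m[3][3] = 1;
    return m;
}
```

Because the "up" vector is recomputed from the other two directions, the caller's "up" only needs to be roughly right; it does not have to be perpendicular to the viewing direction.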