5.1 Camera Interface

The Camera class uses the usual TaggedPointer-based approach to dynamically dispatch interface method calls to the correct implementation based on the actual type of the camera. (As usual, we will not include the implementations of those methods in the book here.) Camera is defined in the file base/camera.h.

<<Camera Definition>>= 
class Camera : public TaggedPointer<PerspectiveCamera, OrthographicCamera,
                                    SphericalCamera, RealisticCamera> {
  public:
    <<Camera Interface>>
    using TaggedPointer::TaggedPointer;

    static Camera Create(const std::string &name,
                         const ParameterDictionary &parameters, Medium medium,
                         const CameraTransform &cameraTransform, Film film,
                         const FileLoc *loc, Allocator alloc);

    std::string ToString() const;

    pstd::optional<CameraRay> GenerateRay(CameraSample sample,
                                          SampledWavelengths &lambda) const;

    pstd::optional<CameraRayDifferential> GenerateRayDifferential(
        CameraSample sample, SampledWavelengths &lambda) const;

    Film GetFilm() const;

    Float SampleTime(Float u) const;

    void InitMetadata(ImageMetadata *metadata) const;

    const CameraTransform &GetCameraTransform() const;

    void Approximate_dp_dxy(Point3f p, Normal3f n, Float time, int samplesPerPixel,
                            Vector3f *dpdx, Vector3f *dpdy) const;
};

The first method that cameras must implement is GenerateRay(), which computes the ray corresponding to a given image sample. It is important that the direction component of the returned ray be normalized—many other parts of the system will depend on this behavior. If for some reason there is no valid ray for the given CameraSample, then the pstd::optional return value should be unset. The SampledWavelengths for the ray are passed as a non-const reference so that cameras can model dispersion in their lenses, in which case only a single wavelength of light is tracked by the ray and the GenerateRay() method will call SampledWavelengths::TerminateSecondary().

<<Camera Interface>>= 
pstd::optional<CameraRay> GenerateRay(CameraSample sample, SampledWavelengths &lambda) const;

The CameraSample structure that is passed to GenerateRay() holds all the sample values needed to specify a camera ray. Its pFilm member gives the point on the film to which the generated ray should carry radiance. The point on the lens the ray passes through is in pLens (for cameras that include the notion of lenses), and time gives the time at which the ray should sample the scene. If the camera itself is in motion, the time value determines what camera position to use when generating the ray.

Finally, the filterWeight member variable is an additional scale factor that is applied when the ray’s radiance is added to the image stored by the film; it accounts for the reconstruction filter used to filter image samples at each pixel. This topic is discussed in Sections 5.4.3 and 8.8.

<<CameraSample Definition>>= 
struct CameraSample {
    Point2f pFilm;
    Point2f pLens;
    Float time = 0;
    Float filterWeight = 1;
};

The CameraRay structure that is returned by GenerateRay() includes both a ray and a spectral weight associated with it. Simple camera models leave the weight at the default value of one, while more sophisticated ones like RealisticCamera return a weight that is used in modeling the radiometry of image formation. (Section 5.4.1 contains more information about how exactly this weight is computed and used in the latter case.)

<<CameraRay Definition>>= 
struct CameraRay {
    Ray ray;
    SampledSpectrum weight = SampledSpectrum(1);
};
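As a brief illustration of how these types fit together, the following sketch (which is not part of pbrt’s implementation) fills in a CameraSample by hand, requests a camera ray, and notes how the returned weights are meant to be used. The camera and lambda variables, as well as the particular sample values, are assumptions for the example; in pbrt, the sample values are normally provided by a Sampler.

    // Hypothetical caller sketch. Assumes `camera` (a Camera) and
    // `lambda` (SampledWavelengths) have been set up elsewhere.
    CameraSample cs;
    cs.pFilm = Point2f(128.5f, 96.5f);   // film position, in pixel coordinates
    cs.pLens = Point2f(0.5f, 0.5f);      // lens sample (unused by lensless cameras)
    cs.time = 0.5f;                      // sample value for the ray's time
    cs.filterWeight = 1.f;               // reconstruction filter weight

    pstd::optional<CameraRay> cr = camera.GenerateRay(cs, lambda);
    if (cr) {
        // cr->ray.d is normalized. The radiance estimate computed for
        // cr->ray should be scaled by cr->weight and by cs.filterWeight
        // before being added to the film.
    }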

Cameras must also provide an implementation of GenerateRayDifferential(), which computes a main ray like GenerateRay() but also computes the corresponding rays for pixels shifted one pixel in the x and y directions on the film plane. This information about how camera rays change as a function of position on the film helps give other parts of the system a notion of how much of the film area a particular camera ray’s sample represents, which is useful for antialiasing texture lookups.

<<Camera Interface>>+=  
pstd::optional<CameraRayDifferential> GenerateRayDifferential(
    CameraSample sample, SampledWavelengths &lambda) const;

GenerateRayDifferential() returns an instance of the CameraRayDifferential structure, which is equivalent to CameraRay, except it stores a RayDifferential.

<<CameraRayDifferential Definition>>= 
struct CameraRayDifferential {
    RayDifferential ray;
    SampledSpectrum weight = SampledSpectrum(1);
};

Camera implementations must provide access to their Film, which allows other parts of the system to determine things such as the resolution of the output image.

<<Camera Interface>>+=  
Film GetFilm() const;

Just like real-world cameras, pbrt’s camera models include the notion of a shutter that opens for a short period of time to expose the film to light. One result of this nonzero exposure time is motion blur: objects that are in motion relative to the camera during the exposure are blurred. Time is yet another thing that is amenable to point sampling and Monte Carlo integration: given an appropriate distribution of ray times between the shutter open time and the shutter close time, it is possible to compute images that exhibit motion blur.

The SampleTime() interface method should therefore map a uniform random sample u in the range [0, 1) to a time when the camera’s shutter is open. Normally, it is implemented as a simple linear interpolation between the shutter open and close times.

<<Camera Interface>>+=  
Float SampleTime(Float u) const;

The last interface method allows camera implementations to set fields in the ImageMetadata class to specify transformation matrices related to the camera. If the output image format has support for storing this sort of auxiliary information, it will be included in the final image that is written to disk.

<<Camera Interface>>+=  
void InitMetadata(ImageMetadata *metadata) const;

5.1.1 Camera Coordinate Spaces

Before we start to describe the implementation of pbrt’s camera models, we will define some of the coordinate spaces that they use. In addition to world space, which was introduced in Section 3.1, we will now introduce four additional coordinate spaces: object space, camera space, camera-world space, and rendering space. In sum, we have:

  • Object space: This is the coordinate system in which geometric primitives are defined. For example, spheres in pbrt are defined to be centered at the origin of their object space.
  • World space: While each primitive may have its own object space, all objects in the scene are placed in relation to a single world space. A world-from-object transformation determines where each object is located in world space. World space is the standard frame that all other spaces are defined in terms of.
  • Camera space: A camera is placed in the scene at some world space point with a particular viewing direction and orientation. This camera defines a new coordinate system with its origin at the camera’s location. The z axis of this coordinate system is mapped to the viewing direction, and the y axis is mapped to the up direction.
  • Camera-world space: Like camera space, the origin of this coordinate system is the camera’s position, but it maintains the orientation of world space (i.e., unlike camera space, the camera is not necessarily looking down the z axis).
  • Rendering space: This is the coordinate system into which the scene is transformed for the purposes of rendering. In pbrt, it may be world space, camera space, or camera-world space. (A short code sketch after this list illustrates how these spaces relate.)
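As a brief illustration of how these spaces relate in code, the following sketch (not pbrt code) brings an object-space point into rendering space by composing an example world-from-object transformation with the render-from-world transformation provided by the CameraTransform class that is introduced later in this section; both worldFromObject and cameraTransform are assumptions for the example.

    // Hypothetical sketch: take a point from object space to rendering space.
    Transform worldFromObject = Translate(Vector3f(2, 0, -5));   // example placement
    Transform renderFromObject =
        cameraTransform.RenderFromWorld() * worldFromObject;
    Point3f pObject(0, 0, 0);                     // e.g., the center of a sphere
    Point3f pRender = renderFromObject(pObject);  // the same point, in rendering space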

Renderers based on rasterization traditionally do most of their computations in camera space: triangle vertices are transformed all the way from object space to camera space before being projected onto the screen and rasterized. In that context, camera space is a handy space for reasoning about which objects are potentially visible to the camera. For example, if an object’s camera space bounding box is entirely behind the z = 0 plane (and the camera does not have a field of view wider than 180 degrees), the object will not be visible.

Conversely, many ray tracers (including all versions of pbrt prior to this one) render in world space. Camera implementations may start out in camera space when generating rays, but they transform those rays to world space where all subsequent ray intersection and shading calculations are performed. A problem with that approach stems from the fact that floating-point numbers have more precision close to the origin than far away from it. If the camera is placed far from the origin, there may be insufficient precision to accurately represent the part of the scene that it is looking at.

Figure 5.1 illustrates the precision problem with rendering in world space. In Figure 5.1(a), the scene is rendered with the camera and objects as they were provided in the original scene specification, which happened to be in the range of plus-or-minus 10 in each coordinate in world space. In Figure 5.1(b), both the camera and the scene have been translated 1,000,000 units in each dimension. In principle, both images should be the same, but much less precision is available for the second viewpoint, to the extent that the discretization of floating-point numbers is visible in the geometric model.

Figure 5.1: Effect of the Loss of Floating-Point Precision Far from the Origin. (a) As originally specified, this scene is within 10 units of the origin. Rendering the scene in world space produces the expected image. (b) If both the scene and the camera are translated 1,000,000 units from the origin and the scene is rendered in world space, there is significantly less floating-point precision to represent the scene, giving this poor result. (c) If the translated scene is rendered in camera-world space, much more precision is available and the geometric detail is preserved. However, the viewpoint has shifted slightly due to a loss of accuracy in the representation of the camera position. (Model courtesy of Yasutoshi Mori.)

Rendering in camera space naturally provides the most floating-point precision for the objects closest to the camera. If the scene in Figure 5.1 is rendered in camera space, translating both the camera and the scene geometry by 1,000,000 units has no effect—the translations cancel. However, there is a problem with using camera space with ray tracing. Scenes are often modeled with major features aligned to the coordinate axes (e.g., consider an architectural model, where the floor and ceiling might be aligned with y planes). Axis-aligned bounding boxes of such features are degenerate in one dimension, which reduces their surface area. Acceleration structures like the BVH that will be introduced in Chapter 7 are particularly effective with such bounding boxes. In turn, if the camera is rotated with respect to the scene, axis-aligned bounding boxes are less effective at bounding such features and rendering performance is affected: for the scene in Figure 5.1, rendering time increases by 27%.

Rendering using camera-world space gives the best of both worlds: the camera is at the origin and the scene is translated accordingly. However, the rotation is not applied to the scene geometry, thus preserving good bounding boxes for the acceleration structures. With camera-world space, there is no increase in rendering time and higher precision is maintained, as is shown in Figure 5.1(c). The CameraTransform class abstracts the choice of which particular coordinate system is used for rendering by handling the details of transforming among the various spaces.

<<CameraTransform Definition>>= 
class CameraTransform {
  public:
    <<CameraTransform Public Methods>>
    CameraTransform() = default;
    explicit CameraTransform(const AnimatedTransform &worldFromCamera);

    Point3f RenderFromCamera(Point3f p, Float time) const {
        return renderFromCamera(p, time);
    }
    Point3f CameraFromRender(Point3f p, Float time) const {
        return renderFromCamera.ApplyInverse(p, time);
    }
    Point3f RenderFromWorld(Point3f p) const {
        return worldFromRender.ApplyInverse(p);
    }
    Transform RenderFromWorld() const { return Inverse(worldFromRender); }
    Transform CameraFromRender(Float time) const {
        return Inverse(renderFromCamera.Interpolate(time));
    }
    Transform CameraFromWorld(Float time) const {
        return Inverse(worldFromRender * renderFromCamera.Interpolate(time));
    }

    PBRT_CPU_GPU
    bool CameraFromRenderHasScale() const { return renderFromCamera.HasScale(); }

    PBRT_CPU_GPU
    Vector3f RenderFromCamera(Vector3f v, Float time) const {
        return renderFromCamera(v, time);
    }
    PBRT_CPU_GPU
    Normal3f RenderFromCamera(Normal3f n, Float time) const {
        return renderFromCamera(n, time);
    }
    PBRT_CPU_GPU
    Ray RenderFromCamera(const Ray &r) const { return renderFromCamera(r); }
    PBRT_CPU_GPU
    RayDifferential RenderFromCamera(const RayDifferential &r) const {
        return renderFromCamera(r);
    }
    PBRT_CPU_GPU
    Vector3f CameraFromRender(Vector3f v, Float time) const {
        return renderFromCamera.ApplyInverse(v, time);
    }
    PBRT_CPU_GPU
    Normal3f CameraFromRender(Normal3f v, Float time) const {
        return renderFromCamera.ApplyInverse(v, time);
    }

    PBRT_CPU_GPU
    const AnimatedTransform &RenderFromCamera() const { return renderFromCamera; }
    PBRT_CPU_GPU
    const Transform &WorldFromRender() const { return worldFromRender; }

    std::string ToString() const;

  private:
    <<CameraTransform Private Members>>
    AnimatedTransform renderFromCamera;
    Transform worldFromRender;
};

Camera implementations must make their CameraTransform available to other parts of the system, so we will add one more method to the Camera interface.

<<Camera Interface>>+=  
const CameraTransform &GetCameraTransform() const;

CameraTransform maintains two transformations: one from camera space to the rendering space, and one from the rendering space to world space. In pbrt, the latter transformation cannot be animated; any animation in the camera transformation is kept in the first transformation. This ensures that a moving camera does not cause static geometry in the scene to become animated, which in turn would harm performance. Composed together, the two transformations recover the full world-from-camera transformation.

<<CameraTransform Private Members>>= 
AnimatedTransform renderFromCamera;
Transform worldFromRender;

The CameraTransform constructor takes the world-from-camera transformation as specified in the scene description and decomposes it into the two transformations described earlier. The default rendering space is camera-world, though this choice can be overridden using a command-line option.

<<CameraTransform Method Definitions>>= 
CameraTransform::CameraTransform(const AnimatedTransform &worldFromCamera) {
    switch (Options->renderingSpace) {
    case RenderingCoordinateSystem::Camera: {
        <<Compute worldFromRender for camera-space rendering>>
        Float tMid = (worldFromCamera.startTime + worldFromCamera.endTime) / 2;
        worldFromRender = worldFromCamera.Interpolate(tMid);
        break;
    }
    case RenderingCoordinateSystem::CameraWorld: {
        <<Compute worldFromRender for camera-world space rendering>>
        Float tMid = (worldFromCamera.startTime + worldFromCamera.endTime) / 2;
        Point3f pCamera = worldFromCamera(Point3f(0, 0, 0), tMid);
        worldFromRender = Translate(Vector3f(pCamera));
        break;
    }
    case RenderingCoordinateSystem::World: {
        <<Compute worldFromRender for world-space rendering>>
    }
    }
    <<Compute renderFromCamera transformation>>
    Transform renderFromWorld = Inverse(worldFromRender);
    Transform rfc[2] = { renderFromWorld * worldFromCamera.startTransform,
                         renderFromWorld * worldFromCamera.endTransform };
    renderFromCamera = AnimatedTransform(rfc[0], worldFromCamera.startTime,
                                         rfc[1], worldFromCamera.endTime);
}

For camera-space rendering, the world-from-camera transformation should be used for worldFromRender and an identity transformation for the render-from-camera transformation, since those two coordinate systems are equivalent. However, because worldFromRender cannot be animated, the implementation takes the world-from-camera transformation at the midpoint of the frame and then folds the effect of any animation in the camera transformation into renderFromCamera.

<<Compute worldFromRender for camera-space rendering>>= 
Float tMid = (worldFromCamera.startTime + worldFromCamera.endTime) / 2;
worldFromRender = worldFromCamera.Interpolate(tMid);
break;

For the default case of rendering in camera-world space, the world-from-render transformation is given by translating to the camera’s position at the midpoint of the frame.

<<Compute worldFromRender for camera-world space rendering>>= 
Float tMid = (worldFromCamera.startTime + worldFromCamera.endTime) / 2;
Point3f pCamera = worldFromCamera(Point3f(0, 0, 0), tMid);
worldFromRender = Translate(Vector3f(pCamera));
break;

For world-space rendering, worldFromRender is the identity transformation.

<<Compute worldFromRender for world-space rendering>>= 

Once worldFromRender has been set, whatever transformation remains in worldFromCamera is extracted and stored in renderFromCamera.

<<Compute renderFromCamera transformation>>= 
Transform renderFromWorld = Inverse(worldFromRender);
Transform rfc[2] = { renderFromWorld * worldFromCamera.startTransform,
                     renderFromWorld * worldFromCamera.endTransform };
renderFromCamera = AnimatedTransform(rfc[0], worldFromCamera.startTime,
                                     rfc[1], worldFromCamera.endTime);

The CameraTransform class provides a variety of overloaded methods named RenderFromCamera(), CameraFromRender(), and RenderFromWorld() that transform points, vectors, normals, and rays among the coordinate systems it manages. Other methods return the corresponding transformations directly. Their straightforward implementations are not included here.
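As a small usage sketch (again, not pbrt code), here is how camera-space geometry might be taken into rendering space with these methods; the particular point, direction, and time values are arbitrary placeholders.

    // Hypothetical sketch. Assumes a CameraTransform `cameraTransform` is in scope.
    Float time = 0.5f;
    Point3f pCamera(0, 0, 0);         // the camera-space origin
    Vector3f dCamera(0, 0, 1);        // looking down the camera-space z axis

    Ray rCamera(pCamera, dCamera, time);
    Ray rRender = cameraTransform.RenderFromCamera(rCamera);

    // Points, vectors, and normals use overloads that take an explicit time.
    Point3f pRender = cameraTransform.RenderFromCamera(pCamera, time);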

5.1.2 The CameraBase Class

All of the camera implementations in this chapter share some common functionality that we have factored into a single class, CameraBase, from which all of them inherit. CameraBase, as well as all the camera implementations, is defined in the files cameras.h and cameras.cpp.

<<CameraBase Definition>>= 
class CameraBase {
  public:
    <<CameraBase Public Methods>>
    Film GetFilm() const { return film; }
    const CameraTransform &GetCameraTransform() const { return cameraTransform; }

    Float SampleTime(Float u) const { return Lerp(u, shutterOpen, shutterClose); }

    void InitMetadata(ImageMetadata *metadata) const;
    std::string ToString() const;

    void Approximate_dp_dxy(Point3f p, Normal3f n, Float time, int samplesPerPixel,
                            Vector3f *dpdx, Vector3f *dpdy) const {
        <<Compute tangent plane equation for ray differential intersections>>
        Point3f pCamera = CameraFromRender(p, time);
        Transform DownZFromCamera =
            RotateFromTo(Normalize(Vector3f(pCamera)), Vector3f(0, 0, 1));
        Point3f pDownZ = DownZFromCamera(pCamera);
        Normal3f nDownZ = DownZFromCamera(CameraFromRender(n, time));
        Float d = nDownZ.z * pDownZ.z;

        <<Find intersection points for approximated camera differential rays>>
        Ray xRay(Point3f(0, 0, 0) + minPosDifferentialX,
                 Vector3f(0, 0, 1) + minDirDifferentialX);
        Float tx = -(Dot(nDownZ, Vector3f(xRay.o)) - d) / Dot(nDownZ, xRay.d);
        Ray yRay(Point3f(0, 0, 0) + minPosDifferentialY,
                 Vector3f(0, 0, 1) + minDirDifferentialY);
        Float ty = -(Dot(nDownZ, Vector3f(yRay.o)) - d) / Dot(nDownZ, yRay.d);
        Point3f px = xRay(tx), py = yRay(ty);

        <<Estimate ∂p/∂x and ∂p/∂y in tangent plane at intersection point>>
        Float sppScale =
            GetOptions().disablePixelJitter
                ? 1
                : std::max<Float>(.125, 1 / std::sqrt((Float)samplesPerPixel));
        *dpdx = sppScale *
                RenderFromCamera(DownZFromCamera.ApplyInverse(px - pDownZ), time);
        *dpdy = sppScale *
                RenderFromCamera(DownZFromCamera.ApplyInverse(py - pDownZ), time);
    }

  protected:
    <<CameraBase Protected Members>>
    CameraTransform cameraTransform;
    Float shutterOpen, shutterClose;
    Film film;
    Medium medium;
    Vector3f minPosDifferentialX, minPosDifferentialY;
    Vector3f minDirDifferentialX, minDirDifferentialY;

    <<CameraBase Protected Methods>>
    CameraBase(CameraBaseParameters p);

    PBRT_CPU_GPU
    static pstd::optional<CameraRayDifferential> GenerateRayDifferential(
        Camera camera, CameraSample sample, SampledWavelengths &lambda);

    Ray RenderFromCamera(const Ray &r) const {
        return cameraTransform.RenderFromCamera(r);
    }
    RayDifferential RenderFromCamera(const RayDifferential &r) const {
        return cameraTransform.RenderFromCamera(r);
    }
    PBRT_CPU_GPU
    Vector3f RenderFromCamera(Vector3f v, Float time) const {
        return cameraTransform.RenderFromCamera(v, time);
    }
    PBRT_CPU_GPU
    Normal3f RenderFromCamera(Normal3f v, Float time) const {
        return cameraTransform.RenderFromCamera(v, time);
    }
    PBRT_CPU_GPU
    Point3f RenderFromCamera(Point3f p, Float time) const {
        return cameraTransform.RenderFromCamera(p, time);
    }
    PBRT_CPU_GPU
    Vector3f CameraFromRender(Vector3f v, Float time) const {
        return cameraTransform.CameraFromRender(v, time);
    }
    PBRT_CPU_GPU
    Normal3f CameraFromRender(Normal3f v, Float time) const {
        return cameraTransform.CameraFromRender(v, time);
    }
    PBRT_CPU_GPU
    Point3f CameraFromRender(Point3f p, Float time) const {
        return cameraTransform.CameraFromRender(p, time);
    }

    void FindMinimumDifferentials(Camera camera);
};

The CameraBase constructor takes a variety of parameters that are applicable to all of pbrt’s cameras:

  • One of the most important is the transformation that places the camera in the scene, which is represented by a CameraTransform and is stored in the cameraTransform member variable.
  • Next is a pair of floating-point values that give the times at which the camera’s shutter opens and closes.
  • A Film instance stores the final image and models the film sensor.
  • Last is a Medium instance that represents the scattering medium that the camera lies in, if any (Medium is described in Section 11.4).

A small structure bundles them together and helps shorten the length of the parameter lists for Camera constructors.

<<CameraBaseParameters Definition>>= 
struct CameraBaseParameters {
    CameraTransform cameraTransform;
    Float shutterOpen = 0, shutterClose = 1;
    Film film;
    Medium medium;
};
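For illustration, a caller might populate the bundle as in the following sketch; this is not code from pbrt, which fills these values in from the parsed scene description, and the cameraTransform, film, and medium variables are assumed to have been created elsewhere.

    // Hypothetical sketch: bundle the common camera parameters.
    CameraBaseParameters params;
    params.cameraTransform = cameraTransform;
    params.shutterOpen = 0.f;    // shutter opens at the start of the frame
    params.shutterClose = 1.f;   // and closes at the end
    params.film = film;
    params.medium = medium;
    // The bundle is then handed to a camera constructor, which forwards it
    // to the CameraBase constructor.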

We will only include the constructor’s prototype here because its implementation does no more than assign the parameters to the corresponding member variables.

<<CameraBase Protected Methods>>= 
CameraBase(CameraBaseParameters p);

<<CameraBase Protected Members>>= 
CameraTransform cameraTransform;
Float shutterOpen, shutterClose;
Film film;
Medium medium;

CameraBase can implement a number of the methods required by the Camera interface directly, thus saving the trouble of needing to redundantly implement them in the camera implementations that inherit from it.

For example, accessor methods make the Film and CameraTransform available.

<<CameraBase Public Methods>>= 
Film GetFilm() const { return film; }
const CameraTransform &GetCameraTransform() const { return cameraTransform; }

The SampleTime() method is implemented by linearly interpolating between the shutter open and close times using the sample u.

<<CameraBase Public Methods>>+=  
Float SampleTime(Float u) const { return Lerp(u, shutterOpen, shutterClose); }

CameraBase provides a GenerateRayDifferential() method that computes a ray differential via multiple calls to a camera’s GenerateRay() method. One subtlety is that camera implementations that use this method still must implement a Camera GenerateRayDifferential() method themselves, but then call this method from theirs. (Note that this method’s signature is different than that one.) Cameras pass their this pointer as a Camera parameter, which allows it to call the camera’s GenerateRay() method. This additional complexity stems from our not using virtual functions for the camera interface, which means that the CameraBase class does not on its own have the ability to call that method unless a Camera is provided to it.
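For example, a camera that relies on this shared implementation forwards to it along the following lines. MyCamera is a hypothetical camera type used only for this sketch; for the conversion of its this pointer to a Camera to work, such a type would also need to appear in Camera’s TaggedPointer type list.

    // Hypothetical camera method showing the forwarding pattern: the camera
    // passes itself as a Camera so that CameraBase can call back into its
    // GenerateRay() method.
    pstd::optional<CameraRayDifferential> MyCamera::GenerateRayDifferential(
            CameraSample sample, SampledWavelengths &lambda) const {
        return CameraBase::GenerateRayDifferential(this, sample, lambda);
    }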

<<CameraBase Method Definitions>>= 
pstd::optional<CameraRayDifferential> CameraBase::GenerateRayDifferential(
        Camera camera, CameraSample sample, SampledWavelengths &lambda) {
    <<Generate regular camera ray cr for ray differential>>
    pstd::optional<CameraRay> cr = camera.GenerateRay(sample, lambda);
    if (!cr) return {};
    RayDifferential rd(cr->ray);

    <<Find camera ray after shifting one pixel in the x direction>>
    pstd::optional<CameraRay> rx;
    for (Float eps : {.05f, -.05f}) {
        CameraSample sshift = sample;
        sshift.pFilm.x += eps;
        <<Try to generate ray with sshift and compute x differential>>
        if (rx = camera.GenerateRay(sshift, lambda); rx) {
            rd.rxOrigin = rd.o + (rx->ray.o - rd.o) / eps;
            rd.rxDirection = rd.d + (rx->ray.d - rd.d) / eps;
            break;
        }
    }

    <<Find camera ray after shifting one pixel in the y direction>>
    pstd::optional<CameraRay> ry;
    for (Float eps : {.05f, -.05f}) {
        CameraSample sshift = sample;
        sshift.pFilm.y += eps;
        if (ry = camera.GenerateRay(sshift, lambda); ry) {
            rd.ryOrigin = rd.o + (ry->ray.o - rd.o) / eps;
            rd.ryDirection = rd.d + (ry->ray.d - rd.d) / eps;
            break;
        }
    }

    <<Return approximate ray differential and weight>>
    rd.hasDifferentials = rx && ry;
    return CameraRayDifferential{rd, cr->weight};
}

The primary ray is found via a first call to GenerateRay(). If there is no valid ray for the given sample, then there can be no ray differential either.

<<Generate regular camera ray cr for ray differential>>= 
pstd::optional<CameraRay> cr = camera.GenerateRay(sample, lambda);
if (!cr) return {};
RayDifferential rd(cr->ray);

Two attempts are made to find the x ray differential: one using forward differencing and one using backward differencing by a fraction of a pixel. It is important to try both of these due to vignetting at the edges of images formed by realistic camera models—sometimes the main ray is valid but shifting in one direction moves past the image formed by the lens system. In that case, trying the other direction may successfully generate a ray.

<<Find camera ray after shifting one pixel in the x direction>>= 
pstd::optional<CameraRay> rx;
for (Float eps : {.05f, -.05f}) {
    CameraSample sshift = sample;
    sshift.pFilm.x += eps;
    <<Try to generate ray with sshift and compute x differential>>
    if (rx = camera.GenerateRay(sshift, lambda); rx) {
        rd.rxOrigin = rd.o + (rx->ray.o - rd.o) / eps;
        rd.rxDirection = rd.d + (rx->ray.d - rd.d) / eps;
        break;
    }
}

If it was possible to generate the auxiliary x ray, then the corresponding pixel-wide differential is initialized via differencing.

<<Try to generate ray with sshift and compute x differential>>= 
if (rx = camera.GenerateRay(sshift, lambda); rx) {
    rd.rxOrigin = rd.o + (rx->ray.o - rd.o) / eps;
    rd.rxDirection = rd.d + (rx->ray.d - rd.d) / eps;
    break;
}
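Note the division by eps: because the film sample was shifted by only a fraction eps of a pixel, scaling the finite difference by 1/eps rescales it to approximate the change over a full pixel. For the ray origin, for example,

    rxOrigin ≈ o + ∂o/∂x ≈ o + (o(x + eps) − o(x)) / eps,

and the direction and the y differentials are handled in the same way.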

The implementation of the fragment <<Find camera ray after shifting one pixel in the y direction>> follows similarly and is not included here.

If a valid ray was found for both x and y, we can go ahead and set the hasDifferentials member variable to true. Otherwise, the main ray can still be traced, just without differentials available.

<<Return approximate ray differential and weight>>= 
rd.hasDifferentials = rx && ry;
return CameraRayDifferential{rd, cr->weight};

Finally, for the convenience of its subclasses, CameraBase provides various transformation methods that use the CameraTransform. We will only include the Ray method here; the others are analogous.

<<CameraBase Protected Methods>>+=  
Ray RenderFromCamera(const Ray &r) const {
    return cameraTransform.RenderFromCamera(r);
}