CHAPTER 3 LOW-LEVEL RENDERING

This chapter describes the traditional low-level 3D pipeline as it has been defined in OpenGL. A diagram of the OpenGL ES pipeline in Figure 3.1 shows how the various pipeline components relate to each other and how data flows from an application to the frame buffer. Figure 3.2 visualizes some of the processing in these pipeline stages. Note that the diagram, and the whole pipeline specification, is only conceptual. Implementations may vary the processing order, but they must produce the same result as the conceptual specification.

We start by describing the primitives that define the shapes and patterns that are displayed. These include both the 3D geometric primitives, such as points, lines, and triangles, and the 2D image primitives. The geometric primitives can have materials that interact with light, and they can be affected by fog. The primitives are projected into a coordinate system that makes it simple to determine what is visible: a primitive is visible if it is not occluded by other primitives and it lies in the viewing frustum of the camera. Continuous 3D shapes are then rasterized into discrete fragments, and the fragment colors can be modulated by one or more texture maps. The fragments can still be rejected by various tests, and their colors can be blended with the pixels that already exist in the frame buffer.
[Figure 3.1 diagram: per-vertex processing (vertex arrays and buffers, current vertex attributes, matrix control, lighting, user clip plane), per-primitive processing (primitive assembly, frustum clip, perspective divide, viewport transform, backface cull), and per-fragment processing (rasterization, texturing, fog, scissor, alpha, stencil, and depth tests, blending, dithering, logic op, masking) feeding the frame buffer.]

Figure 3.1: A simplified view of the OpenGL ES 1.1 pipeline. Shapes with dashed outlines indicate features that are new or significantly expanded in version 1.1. Dark gray and light gray shapes indicate features that are not included in M3G, or are included in simplified form, respectively. M is the modelview matrix, the T_i are texture matrices, and P is the projection matrix.

Figure 3.2: Illustrating the various stages of shading discussed in Chapters 3 and 8–10. Top row, left to right: wire frame model; filled model; diffuse lighting; diffuse and Phong specular lighting. Bottom row: texturing added; texturing with a separate specular pass; bump mapping added; and rendered with an intersecting translucent object to demonstrate Z-buffering and alpha blending. (See the color plate.)

3.1 RENDERING PRIMITIVES

In this section we describe the rendering primitives of a 3D engine. We begin with the geometric primitives, such as points, lines, and triangles, and then continue to raster primitives using texture map data.

3.1.1 GEOMETRIC PRIMITIVES

The basic geometric primitives defined in OpenGL are points, lines, triangles, quads, and polygons.
However, in many hardware-accelerated 3D engines triangles are the only truly native rendering primitive. In a sense they are the easiest and best-behaved of the primitives. Three points always lie on a common plane, and if they are not collinear, they uniquely define that plane. The projection of a triangle into the image plane is well defined and changes continuously as we zoom in or out.

Points and lines are mathematically even simpler than triangles. The problem is that a mathematical point has no extent, and a line has no width. OpenGL does not turn a point into a sphere and a line into a cylinder, which would still be real 3D entities. Instead, it defines points and lines as mixed 2D/3D entities. The location of a point, or of a line's end points, is a true 3D entity. Once projected into the image plane, however, a point has a size and a line has a width defined in pixels, which makes them partially 2D entities. When you zoom in on them, the distance between the points or line end points grows, but the point size or line width remains constant. It is possible, though, to attenuate the size of a point based on its distance from the camera, so as to approximate the effect of true perspective.

Quads (quadrilaterals, i.e., polygons with four corners) and other polygons are problematic because, unlike triangles, they are not guaranteed to be planar. If the vertices of a polygon do not lie on the same plane, the edges between the vertices are still well defined, but the surface between them is not. In the worst case, when viewed from the side, the viewer would see both the front and the back side at the same time. An obvious solution, which most OpenGL drivers perform internally, is to split the polygon into triangles. The OpenGL ES standardization group decided to sidestep the whole issue and support only triangles. Figure 3.3 shows the primitives supported by OpenGL ES.
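Each primitive type in Figure 3.3 is simply a rule for grouping a sequence of vertices. The minimal Python sketch below spells those rules out for vertices numbered 0 to n−1. The function name `assemble` is hypothetical, and the GL enum names are passed as plain strings rather than taken from a real binding. In the strip case, every other triangle has its first two indices swapped so that all triangles keep a consistent orientation. OpenGL defines strips this way, although Figure 3.3 does not show it.

```python
def assemble(mode, n):
    """Group vertex indices 0..n-1 into primitives (tuples of indices).

    Hypothetical helper: mode is a string named after the GL enum.
    Leftover vertices that cannot form a complete primitive are ignored.
    """
    if mode == "GL_POINTS":
        return [(i,) for i in range(n)]
    if mode == "GL_LINES":
        # Disjoint pairs: (0,1), (2,3), ...
        return [(i, i + 1) for i in range(0, n - 1, 2)]
    if mode == "GL_LINE_STRIP":
        # Each vertex connects to the previous one.
        return [(i, i + 1) for i in range(n - 1)]
    if mode == "GL_LINE_LOOP":
        # Like a strip, plus a closing segment back to vertex 0.
        segments = [(i, i + 1) for i in range(n - 1)]
        if n > 1:
            segments.append((n - 1, 0))
        return segments
    if mode == "GL_TRIANGLES":
        # Disjoint triples: (0,1,2), (3,4,5), ...
        return [(i, i + 1, i + 2) for i in range(0, n - 2, 3)]
    if mode == "GL_TRIANGLE_STRIP":
        # Each new vertex forms a triangle with the two previous ones;
        # odd triangles swap two indices to keep a consistent winding.
        return [(k, k + 1, k + 2) if k % 2 == 0 else (k + 1, k, k + 2)
                for k in range(n - 2)]
    if mode == "GL_TRIANGLE_FAN":
        # Vertex 0 is the center; every later pair connects to it.
        return [(0, k + 1, k + 2) for k in range(n - 2)]
    raise ValueError("unknown primitive mode: " + mode)


if __name__ == "__main__":
    for mode in ("GL_TRIANGLES", "GL_TRIANGLE_STRIP", "GL_TRIANGLE_FAN"):
        print(mode, assemble(mode, 6))
```

The fan rule also shows how a convex polygon, such as a planar quad, can be split into triangles. That is the job the text above attributes to OpenGL drivers.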
All of them can be expressed as an array of vertices with implicit connectivity. The upper row has four vertices. Depending on the primitive type, they are interpreted as four points, two disjoint line segments, a strip of three lines, or a loop of four lines. Similarly, in the bottom row six vertices define one of three things:

- two disjoint triangles;
- a four-triangle strip, where the first three vertices define the first triangle and every new vertex is then connected with the two previous vertices;
- a four-triangle fan, where the first vertex is the center of the fan and all the other vertices connect to it.

The first image in Figure 3.2 illustrates how the basic primitives are used to define a more complex object.

[Figure 3.3: vertices 0–3 drawn as GL_POINTS, GL_LINES, GL_LINE_STRIP, and GL_LINE_LOOP; vertices 0–5 drawn as GL_TRIANGLES, GL_TRIANGLE_STRIP, and GL_TRIANGLE_FAN.]

Figure 3.3: The geometric primitives in OpenGL ES include points, three ways of defining line segments, and three ways of defining triangles.

Figure 3.4 shows a small segment of a regular triangle mesh. You can see that every internal vertex is adjacent to six triangles. Since each triangle has three vertices, every vertex is responsible for 6/3 = 2 triangles (see the grayed-out triangles in the upper right corner), giving 0.5 vertices per triangle. This, however, holds only in the limit, with a large enough closed mesh. For smaller and irregular meshes, possibly with boundaries, there are usually 0.5–1.0 vertices per triangle.

Figure 3.4: A piece of a regular triangle mesh. In this case every nonboundary vertex is shared by six triangles. In other words, most vertices define two new triangles.

These ratios become important as we study how much data must be passed from the program to the graphics hardware, and possibly replicated. The most straightforward way uses implicit indexing and simply lists the triangles of a mesh, three vertices at a time.
This is clearly wasteful, as each vertex is expressed and transformed six times on average. Triangle strips and fans are much more efficient, because after the first triangle every new vertex produces a new triangle. But as you can see in Figure 3.4, if you make each row of triangles a strip, the vertices in the internal rows have to be processed twice. They are processed once as the “lower” vertices of one strip, and a second time as the “upper” vertices of the next strip.

If you want to do better, you have to use explicit indexing. Instead of giving the vertices in an order from which the primitive connectivity is deduced, you first give the vertices in one array. A second array then indexes into the first and gives the triangle connectivity, using the same methods as with implicit ordering (triangles, triangle strips, and so on). A key advantage is that it now becomes possible to avoid transforming vertices more than once. Many engines use a vertex cache, which buffers transformed vertices on the GPU. If the same vertex is indexed again, the system may avoid transforming it a second time. A naive implementation would require a large vertex cache. A careful ordering of the triangles, in which vertices are accessed again soon after their first use instead of much later, can provide almost the same savings with only a small vertex cache [Hop99].

Similar primitives should be batched as much as possible, i.e., put into the same array. It is much faster to draw one array of a hundred triangles than fifty arrays of two triangles each. This becomes even more important if parts of the scene use different materials and textures. State changes are often quite costly, so combining primitives that share the same state, e.g., texture maps, can produce considerable savings. See Section 6.4 for more information.
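To make these ratios concrete, the sketch below counts vertex transformations for a regular grid of m × n quads, with two triangles per quad, under three schemes:

- an implicitly indexed triangle list;
- one triangle strip per row;
- indexed triangles with a post-transform vertex cache.

The grid layout, the helper names, and the FIFO replacement policy are illustrative assumptions. Real GPUs differ in cache size and policy.

```python
from collections import deque


def grid_triangles(m, n):
    """Index triples for m x n quads, two triangles per quad, row by row."""
    w = n + 1  # vertices per grid row
    tris = []
    for r in range(m):
        for c in range(n):
            v00 = r * w + c  # top-left corner of the quad
            v01 = v00 + 1    # top-right
            v10 = v00 + w    # bottom-left
            v11 = v10 + 1    # bottom-right
            tris.append((v00, v10, v01))
            tris.append((v01, v10, v11))
    return tris


def row_strip_vertices(m, n):
    """Vertices sent when each row of quads is drawn as one triangle strip."""
    return m * 2 * (n + 1)


def count_transforms(triangles, cache_size):
    """Vertex transformations for indexed triangles with a FIFO cache.

    cache_size=0 models no cache: every index costs one transformation.
    """
    cache = deque(maxlen=cache_size)  # full deque evicts its oldest entry
    transforms = 0
    for tri in triangles:
        for v in tri:
            if v not in cache:  # a hit does not refresh the entry (FIFO)
                transforms += 1
                cache.append(v)
    return transforms


if __name__ == "__main__":
    m, n = 10, 10
    tris = grid_triangles(m, n)
    print(len(tris))                      # 200 triangles
    print(count_transforms(tris, 0))      # 600: no reuse at all
    print(row_strip_vertices(m, n))       # 220: internal rows sent twice
    print(count_transforms(tris, 10**6))  # 121: each unique vertex once
    print(count_transforms(tris, 16))     # somewhere in between
```

For this 10 × 10 grid, the unindexed list transforms each vertex about five times (600/121). Because of boundary effects the ratio approaches six only as the grid grows. How close a small cache gets to the ideal count depends on the triangle order, which is exactly what cache-aware reordering [Hop99] optimizes.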
Most modeling programs support higher-order smooth surfaces and curves such as subdivision surfaces [ZSD+00], Bézier patches [Boo01], or nonuniform rational B-splines (NURBS) [PT96]. Smooth primitives encode smooth surfaces much more compactly than a dense triangle mesh can. Additionally, when you zoom close enough, a triangle mesh becomes visibly polygonal, while a smooth surface remains smooth no matter how closely it is inspected. This makes smooth surfaces good candidates for a storage format. However, smooth surfaces are much more complicated to rasterize into pixels than triangles are. Furthermore, there are many choices for representing smooth surfaces, and there is no general consensus on a type that would be optimal for all uses. Therefore modern 3D engines do not usually support smooth surfaces directly. Instead, they require the application to tessellate them into a set of triangles, which may be cached for repeated rendering.

3.1.2 RASTER PRIMITIVES

Raster primitives consist of image data, that is, blocks of pixels, and they do not scale as naturally as geometric primitives. If you scale a raster image down, several input pixels map to each output pixel. These pixels first need to be low-pass filtered (averaged) so that the right amount of blurring takes place in the output. This can be done, and it requires only some additional computation. However, scaling a raster image up introduces no new information. Instead, the pixels grow into large visible squares, and the image quality suffers.

An advantage of raster images over geometric primitives is that arbitrarily complicated imagery can be rendered very quickly. In its simplest form, a raster image is just copied into the frame buffer without any scaling or blending operations. Another advantage is that raster images can be easy to obtain. One can draw them with an image-editing program, or take photos with a digital camera.
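The low-pass filtering step is easiest to see for the common case of halving an image. The minimal sketch below assumes a single-channel image stored as a list of rows and uses a 2×2 box filter, which is only the simplest choice of filter. For contrast, it also includes naive decimation, which keeps every second pixel without filtering. Both function names are hypothetical.

```python
def downsample_2x(img):
    """Halve a grayscale image by averaging each 2x2 block (a box filter).

    Odd trailing rows/columns are dropped to keep the sketch short.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h - 1, 2):
        row = []
        for x in range(0, w - 1, 2):
            s = img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]
            row.append(s / 4.0)
        out.append(row)
    return out


def decimate_2x(img):
    """Naive downscale: keep every second pixel with no filtering."""
    return [row[::2] for row in img[::2]]


if __name__ == "__main__":
    # A one-pixel checkerboard is the worst case for unfiltered scaling.
    checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]
    print(downsample_2x(checker))  # [[0.5, 0.5], [0.5, 0.5]]: uniform gray
    print(decimate_2x(checker))    # [[0, 0], [0, 0]]: pattern aliased to black
```

The box-filtered result keeps the correct average brightness of the checkerboard. The decimated result collapses to a single color, which is the aliasing that the filtering step is meant to prevent.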
Applications of raster images include drawing a background, e.g., the faraway landscape and sky in a rally game, or a foreground, e.g., the dashboard of the rally car. The moving objects, such as the road, other cars, and nearby trees, are drawn using geometric primitives. 2D games are often designed and implemented using sprites, small raster images that are placed directly on the 2D window. In 3D, sprites are often called impostors, especially if they are used in place of an object with complicated geometry, such as a tree or a bush. Unless the sprite or impostor is a rectangular block, some of its pixels need to be marked as transparent, while the rest are fully or partially opaque. The opacity information is stored in an additional alpha channel value associated with each pixel. Bitmaps take this approach to the extreme, storing only a single bit of opacity per pixel. Such bitmaps can be used for drawing non-antialiased text, for example.

Texture mapping complements geometric primitives with raster images. In texture mapping one “pastes” a raster image onto geometric primitives such as triangles before drawing the scene from the vantage point of the camera. We will cover texture mapping in more detail later. OpenGL supports direct drawing of both raster images and bitmaps. OpenGL ES simplifies the API by supporting only texture mapping, on the grounds that the omitted functionality can easily be emulated by drawing a rectangle formed of two texture-mapped triangles. M3G supports background images and sprites, but those are often implemented using textured triangles.

3.2 LIGHTING

The earlier sections have covered the 3D primitives and transformations needed to model objects, place a camera, and project a scene to the camera’s frame buffer. That is sufficient for line drawings, or for silhouettes of shapes in uniform colors.
However, that is not enough to convey an impression of the 3D shape of an object. For this we need to estimate how light sources illuminate the surfaces of the objects. As Figure 3.1 shows, the user may specify vertex colors that are used either as is or as surface material properties in the lighting equation.

Determining the correct color of a surface illuminated by various light sources is a difficult problem. A series of simplifications is required to arrive at a computationally reasonable approximation of the true interaction of light, matter, participating media (such as air with fog and airborne dust), and finally the eye observing the scene. A light source, such as the sun or an electric bulb, emits countless photons in every direction. These photons then travel, usually along a straight path, and can be absorbed or filtered by the medium through which they travel. When a photon hits a surface, several things can happen:

- it can be reflected in various directions;
- it can be refracted and transmitted through a transparent or translucent material;
- it can be scattered inside the material and exit at a different location from where it entered;
- it can be absorbed by the matter, and the absorbed energy may later be released as fluorescence or phosphorescence.

Ray tracing algorithms mimic this complicated behavior of rays, but traditional real-time graphics architectures use simpler, local approximations.

In this section we first describe the color representation. Then we explain normal vectors and what they are used for. We continue with the OpenGL reflectance model, which consists of the ambient, diffuse, specular, and emissive components of material properties. We then cover the supported light sources and finish with the complete lighting equation. The second through fourth images in Figure 3.2 illustrate the effects of ambient, diffuse, and specular shading, respectively.
3.2.1 COLOR

Light is electromagnetic radiation of any wavelength, but the visible range of a typical human eye is between 400 and 700 nm. Color, on the other hand, is more a perception in people’s minds than a part of objective reality. The eye contains three types of sensors, called cones, and each type is sensitive to different wavelengths. There is also a fourth sensor type, the rod, but its signals are perceived only in the dark. Two interesting observations follow from this. First, each color of the rainbow corresponds to a single wavelength, yet many colors that people can see, e.g., pink, brown, purple, or white, can only be created by combining at least two or even three different wavelengths. Second, one can use three “primary” colors (in computer graphics red, green, and blue) whose combinations create most colors that people are capable of seeing. The absence of R, G, and B is black. Adding red and green together produces yellow, green and blue produce cyan, and red and blue produce magenta. Adding equal amounts of R, G, and B produces a shade of gray ranging from black to white.

In OpenGL, light is represented as a triplet of arbitrary numbers denoting the amount of red, green, and blue light. Each number is clamped to the range [0, 1] before being stored into the frame buffer. The value 1.0 means the maximum amount of light that can be displayed on a traditional display. The RGB triplet (1.0, 1.0, 1.0) indicates white light, (1.0, 0.0, 0.0) provides bright red, and (0.0, 0.0, 0.3) corresponds to dark blue. Larger values are simply clamped to 1.0, so (11.0, 22.0, 0.5) will become (1.0, 1.0, 0.5) at the time of display. If 8-bit integers are used to encode the color components, 0 maps to 0.0 and 255 maps to 1.0. The stored light values do not really correspond to the amount of light energy.
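The clamping and integer encoding just described can be sketched as follows. The function names are hypothetical, and rounding to the nearest integer (half up) is an assumption. A given implementation may round slightly differently. The `bits` parameter generalizes the 8-bit case to the smaller channel sizes used by 16-bit frame buffers.

```python
def clamp01(x):
    """Clamp one color component to [0, 1], as done before storage."""
    return min(max(x, 0.0), 1.0)


def clamp_rgb(rgb):
    """Clamp every component of an RGB triplet."""
    return tuple(clamp01(c) for c in rgb)


def to_unorm(c, bits=8):
    """Encode a component as an integer: 0.0 -> 0, 1.0 -> 2**bits - 1.

    Assumes round-half-up; real implementations may differ slightly.
    """
    max_code = (1 << bits) - 1
    return int(clamp01(c) * max_code + 0.5)


def from_unorm(code, bits=8):
    """Decode an integer code back to a component in [0, 1]."""
    return code / ((1 << bits) - 1)


if __name__ == "__main__":
    print(clamp_rgb((11.0, 22.0, 0.5)))           # (1.0, 1.0, 0.5)
    print([to_unorm(c) for c in (1.0, 0.0, 0.5)])  # [255, 0, 128]
```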
The human eye responds to the amount of light very nonlinearly, and the stored number instead encodes a roughly linear color perception. For example, (0.5, 0.5, 0.5) produces a gray color roughly halfway between black and white. This is useful because it makes colors easy to assign, but it does not correspond to true light intensities. A more physically correct representation would store floating-point numbers that correspond to the amount of light energy in each channel of a pixel. It would then map the result into color values between 0.0 and 1.0, taking the eye’s nonlinear response to light into account. Such high dynamic range (HDR) light and color representations are possible on desktop hardware that supports floating-point frame buffers and textures, and there are even HDR displays that can emit brighter light than traditional displays. The mobile APIs support only the traditional low dynamic range representation of 4–8 bits per color channel.

In addition to the color channels, OpenGL defines an alpha channel. The alpha channel has no inherent meaning or interpretation, but it is usually used to encode the level of transparency of a material or surface. Alpha is crucial for compositing, such as merging nonrectangular images so that their boundaries blend smoothly into the background. Many systems save storage by omitting the destination alpha: the frame buffer stores only the RGB value, and the stored alpha is implicitly 1.0. However, it is always possible to define an arbitrary value between 0.0 and 1.0 as the source alpha, for example, in the definition of a surface material.

The amount of storage used in the frame buffer can be denoted by the names of the channels and the number of bits in each channel. Some first-generation mobile graphics engines use 16-bit frame buffers. For example, an RGB565 frame buffer has a total of 16 bits for the red, green, and blue channels and does not store any alpha.
Here the red and blue channels have only 5 bits each (31 maps to 1.0), while the green channel has 6 bits (63 maps to 1.0). RGBA4444 and RGBA5551 also use 16 bits per pixel. The former allocates four bits for alpha and the latter one bit. Desktop engines have long used frame buffers with 8 bits per channel, i.e., RGB888 and RGBA8888, and these are becoming increasingly common on handhelds as well. The desktop and console world is already moving to 16-bit floating-point frame buffers (a total of 64 bits for RGBA), but these are not yet available for mobile devices.

3.2.2 NORMAL VECTORS

The intensity of the light reflected from a surface element back to the camera depends strongly on the orientation of the element. An orientation can be represented with a unit normal vector, i.e., a vector that is perpendicular to the surface and has a length of one. Three vertices $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ define a triangle uniquely, so we can calculate its orientation by

$$\mathbf{n} = (\mathbf{b} - \mathbf{c}) \times (\mathbf{a} - \mathbf{c}), \qquad (3.1)$$

and then normalizing $\mathbf{n}$ using Equation (2.4). A triangle is planar, so the whole triangle has the same orientation. It therefore reflects a constant amount of light, assuming a small triangle far away from the light source, with the same material at every vertex.

When a smooth surface is approximated by a triangle mesh, the polygonal nature of the approximation is readily observed, because the human eye is very good at seeing color discontinuities. The color discontinuity corresponds to a normal vector discontinuity: each vertex has several normals, as many as there are adjoining triangles. A better solution is to define a unique normal vector at each vertex, and then let either the normal vector or the shading vary smoothly between the vertices. These two cases are illustrated in Figure 3.5.
As the shading is then continuously interpolated both within and across triangles, the illusion of a smooth surface is retained much better, at least inside the silhouette boundaries. A coarsely triangulated mesh still betrays its polygonal nature at the piecewise straight silhouette. For this reason OpenGL requires each vertex to be associated with its own normal vector.

Figure 3.5: Left: the vertices of a polygonal mesh are replicated so that each polygon has its own copy of the shared vertex, and the vertices are assigned the surface normals of the polygons. This yields shading discontinuities at vertices. Right: each shared vertex exists only once, with a normal that is the average of the normals of the neighboring faces, resulting in smooth shading.
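Equation (3.1) and the averaged vertex normals of Figure 3.5 (right) can be combined in a short sketch. It assumes vertex positions are 3-tuples and triangles are index triples, and all helper names are hypothetical. The face normal follows the operand order of Equation (3.1) exactly. With this order, a triangle whose vertices appear counterclockwise to the viewer gets a normal that points away from the viewer, and swapping the two operands flips it. Averaging unit face normals with equal weights is only one simple choice. Weighting by triangle area or corner angle is also common.

```python
import math


def sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length, v[1] / length, v[2] / length)


def face_normal(a, b, c):
    """Unit normal of triangle (a, b, c) per Eq. (3.1): (b - c) x (a - c)."""
    return normalize(cross(sub(b, c), sub(a, c)))


def vertex_normals(positions, triangles):
    """Smooth shading: average the unit normals of the faces sharing a vertex.

    Assumes non-degenerate triangles, that every vertex is used by some
    triangle, and that the summed normals do not cancel out.
    """
    acc = [(0.0, 0.0, 0.0)] * len(positions)
    for i, j, k in triangles:
        n = face_normal(positions[i], positions[j], positions[k])
        for v in (i, j, k):
            s = acc[v]
            acc[v] = (s[0] + n[0], s[1] + n[1], s[2] + n[2])
    return [normalize(s) for s in acc]


if __name__ == "__main__":
    # Two triangles folded along the y axis: one in the z = 0 plane,
    # one in the x = 0 plane. Vertices 0 and 1 are shared.
    pos = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)]
    tris = [(0, 1, 2), (0, 3, 1)]
    for n in vertex_normals(pos, tris):
        print(n)
```

In the example, the unshared vertices keep their face normals, (0, 0, 1) and (1, 0, 0). The two shared vertices receive the normalized average, roughly (0.707, 0, 0.707), which is what lets the shading vary smoothly across the fold.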