74 LOW-LEVEL RENDERING CHAPTER 3

For each pixel within the primitive, one or more fragments, or samples of the geometry within the pixel, are generated. The fragment contains the interpolated values, which are used to texture map the fragment, to blend it with previous values in the frame buffer, and to subject it to various tests, such as the depth test to determine visibility. If the fragment passes the tests, it is finally stored into the frame buffer.

Determining which pixels a primitive should cover is not trivial. For example, when rasterizing the area covered by two adjacent triangles, each pixel needs to be rasterized exactly once, that is, no gaps may appear, nor may the neighboring triangles draw any of the edge pixels twice.

After the rasterization has been prepared, traditional 3D pipelines that do not support floating-point values in the frame buffer may perform all the remaining steps of rasterization and pixel processing using fixed-point, i.e., integer, arithmetic. Color values, for example, can be expressed in a few bits, such as 8 bits per channel. How many bits are needed to express and interpolate screen coordinates depends on the size of the display. For example, if the width and height of the display are at most 512 pixels, 9 bits are enough to store the x and y pixel coordinates. Some additional fractional bits are needed to maintain sub-pixel accuracy, as otherwise slowly moving objects would produce jerky motion. Being able to convert floating-point values to fixed point means that software rasterization on devices without floating-point hardware remains feasible, and even if the rasterization uses specialized hardware, that hardware is simpler, using less silicon and also less power.

Below we will first describe texture mapping; then we study the ways to interpolate the vertex values across the primitive, deal with fog, and finally take a look at antialiasing.
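To make the fixed-point discussion above concrete, the following minimal sketch counts the integer bits needed for a coordinate range and converts a floating-point coordinate to fixed point. The function names and the choice of 4 fractional bits (1/16-pixel precision) are our own assumptions for illustration; they are not prescribed by any particular API.

```python
SUBPIXEL_BITS = 4  # assumed precision: 1/16 of a pixel


def bits_for_range(max_value: int) -> int:
    """Number of bits needed to store the integers 0..max_value."""
    return max(1, max_value.bit_length())


def to_fixed(x: float, frac_bits: int = SUBPIXEL_BITS) -> int:
    """Round x to the nearest fixed-point value with frac_bits fractional bits."""
    return int(round(x * (1 << frac_bits)))


def to_float(v: int, frac_bits: int = SUBPIXEL_BITS) -> float:
    """Convert a fixed-point integer back to floating point."""
    return v / (1 << frac_bits)


if __name__ == "__main__":
    # A display at most 512 pixels wide has x coordinates 0..511.
    print(bits_for_range(511))   # integer bits for the pixel coordinate
    fx = to_fixed(100.3)         # stored as an integer count of 1/16 pixels
    print(fx, to_float(fx))
```

With 4 fractional bits the stored coordinate is never more than 1/32 of a pixel away from the original value, which is the kind of sub-pixel accuracy that keeps slow motion smooth.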
The images on the bottom row of Figure 3.2 illustrate basic texture mapping on a sphere, and multitexturing using a bump map as the second texture map.

3.4.1 TEXTURE MAPPING

Coarse models where geometry and colors change only at vertices are fast to model and draw, but do not appear realistic. For example, one could model a brick wall simply with two orange triangles. An alternative is to have a very detailed model of the wall, inserting vertices anywhere the geometry or colors change. The results can be highly realistic, but modeling becomes difficult and rendering terribly slow. Texture mapping combines the best aspects of these two approaches. The base geometric model can be coarse, but a detailed image is mapped over that geometry, producing more detailed and realistic apparent geometry and detailed varying colors. The texture map can be created, for example, by taking digital photographs of real objects.

The typical case is a 2D texture map, where the texture is a single image. A 1D texture map is a special case of a 2D texture that has only one row of texture pixels, or texels for short. A 3D texture consists of a stack of 2D images; one can think of them filling a volume. However, 3D texture data requires a lot of memory at runtime, more than is usually available on mobile devices, and thus the mobile 3D APIs only support 2D texture maps.

SECTION 3.4 RASTERIZATION 75

Figure 3.11: Texture mapping. Portions of a bitmap image on the left are mapped on two triangles on the right. If the triangles do not have the same shape as their preimages in the texture map, the image appears somewhat distorted.

Images are often stored in compressed formats such as JPEG or PNG. Usually the developer first has to read the file into memory as an uncompressed array, and then pass the texture image data to the 3D API.
Some implementations may also support hardware texture compression [BAC96, Fen03, SAM05], but those compression formats are proprietary and are not guaranteed to be supported by different phones.

Texture coordinates

The way texture data is mapped to the geometry is determined using texture coordinates. For a textured surface every vertex needs to have an associated texture coordinate. The texture coordinates are also 4D homogeneous coordinates, similar to those covered earlier for geometry, but they are called (s, t, r, q). If only some of them are given, say s and t, r is set to 0 and q to 1. The texture coordinates can be transformed using a 4 × 4 texture matrix, and for 2D texture mapping s and t of the result are divided by q, while the output r is ignored. The transformed texture coordinates map to the texture map image so that the lower left image corner has coordinates (0.0, 0.0) and the top right image corner has coordinates (1.0, 1.0).

During rasterization, the texture coordinates are interpolated. If the values of q are different on different vertices, we are doing projective texture mapping. In that case the q component also needs to be interpolated, and the division of s and t by q should happen at each fragment, not only at the vertices. For each fragment the interpolated coordinates are used to fetch the actual texture data, and the texture data is used to adjust or replace the fragment color. If multiple texture maps are assigned to a surface, there needs to be a separate set of texture coordinates for each map, and the textures are applied in succession.

It is much easier to build hardware to access texture data if the texture image sizes are powers of two, that is, 1, 2, 4, ..., 64, 128, and so on. Therefore the texture image dimensions are by default restricted to be powers of two, though the width can differ from the height, e.g., 32 × 64 is a valid texture size, while 24 × 24 is not.
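The power-of-two restriction described above is easy to test before handing an image to the API. The sketch below is only an illustration: the helper names are hypothetical, and rounding a dimension up to the next power of two is just one possible way for an application to deal with an image of an unsupported size.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...: exactly one bit is set in n."""
    return n > 0 and (n & (n - 1)) == 0


def is_valid_texture_size(width: int, height: int) -> bool:
    """Width and height may differ, but each must be a power of two."""
    return is_power_of_two(width) and is_power_of_two(height)


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is greater than or equal to n (n >= 1)."""
    return 1 << (n - 1).bit_length()


if __name__ == "__main__":
    print(is_valid_texture_size(32, 64))   # the valid example from the text
    print(is_valid_texture_size(24, 24))   # the invalid example from the text
    print(next_power_of_two(24))           # a size the image could be padded or rescaled to
```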
As shown in Figure 3.12(a), the origin (s, t) = (0, 0) of the texture coordinates is in the lower left corner of the texture data, and for all texture sizes, even if the width differs from the height, the right end is at s = 1, the top at t = 1, and the top right corner at (s, t) = (1, 1). Some implementations provide an extension that lifts the requirement that the texture sizes must be powers of two.

Texture coordinates that have a value less than zero or greater than one have to be wrapped so that they access valid texture data. The two basic wrapping modes are clamp-to-edge, that is, projecting the texture coordinate to the closest texel on the edge of the texture map, and repeat, which repeats the image by ignoring the integer part of the texture coordinate and only using the fractional part. These are illustrated in Figure 3.12(b) and (c), respectively. Note that it is possible to use a different wrapping mode for the s (horizontal) and t (vertical) directions.

Figure 3.12: (a) The (s, t) coordinate system of a texture image of 4 × 2 texels, with corners at (0, 0), (1, 0), (1, 1), and (0, 1). (b) Wrapping with clamp to edge. (c) Wrapping with repeat.

Texture fetch and filtering

For each fragment, the rasterizer interpolates a texture coordinate, with which we then need to sample the texture image. The simplest approach is to use point sampling: convert the texture coordinate to the address of the texel that matches the coordinate, and fetch that value. Although returning just one of the values stored in the texture map is sometimes just what is needed, for better quality more processing is required. On the left side of Figure 3.13 the diagram shows the area of the texture map that corresponds to a particular image pixel. In one case a smaller area than one texel is magnified to fill the pixel; in the other the area of almost eight texels needs to be minified into a single pixel.

Figure 3.13: Texture filtering. If a screen pixel corresponds to an area in the texture map smaller than a texel, the texture map needs to be magnified, otherwise it needs to be minified for the pixel. In bilinear interpolation texel colors are first interpolated based on the s-coordinate value, then on the t-coordinate. Trilinear interpolation additionally interpolates across mipmap levels. A mipmap image sequence consists of smaller filtered versions of the detailed base level texture map.

In magnification, if the texel area matching the pixel comes fully from a single texel, point sampling would give a correct solution. However, in Figure 3.13 at the center of the top row, the pixel happens to project roughly to the corner of four texels. A smoother filtered result is obtained by bilinear interpolation as illustrated at top middle. In the illustration, the pixel projects to the gray point in the middle of the small square among the four texels. The values of the two top row texels are interpolated based on the s-coordinate, and the same is done on the lower row. Then these interpolated values are interpolated again using the t-coordinate value. The closer the gray point is to the black center of a texel, the closer the interpolated value is to that of the texel.

Minification is more demanding than magnification, as more texels influence the outcome. Minification can be made faster by prefiltering, usually done by mipmapping [Wil83]. The term mip comes from the Latin phrase multum in parvo, “much in little,” summarizing or compressing much into little space. A mipmap is a sequence of prefiltered images. The most detailed image is at the zeroth level; at the first level the image is only a quarter of the size of the original, and its pixels are often obtained by averaging four pixels from the finer level.
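The averaging step just described can be sketched in a few lines. This is an illustrative box filter only, under our own assumptions: a single-channel texture stored as a list of rows, power-of-two dimensions, and hypothetical function names. Real implementations may use other filters.

```python
def downsample(level):
    """Average each 2 x 2 block of texels of `level` into one coarser texel."""
    h, w = len(level), len(level[0])
    nh, nw = max(1, h // 2), max(1, w // 2)
    out = []
    for y in range(nh):
        row = []
        for x in range(nw):
            # Clamp the indices so that levels one texel wide or high still work.
            ys = (min(2 * y, h - 1), min(2 * y + 1, h - 1))
            xs = (min(2 * x, w - 1), min(2 * x + 1, w - 1))
            row.append(sum(level[j][i] for j in ys for i in xs) / 4.0)
        out.append(row)
    return out


def build_mipmaps(base):
    """Apply downsample() repeatedly until a single texel remains."""
    levels = [base]
    while len(levels[-1]) > 1 or len(levels[-1][0]) > 1:
        levels.append(downsample(levels[-1]))
    return levels


if __name__ == "__main__":
    base = [[0.0, 4.0], [8.0, 12.0]]
    for i, level in enumerate(build_mipmaps(base)):
        print(i, level)
```

Each level has a quarter of the texels of the previous one, so summing the texel counts of all levels stays below 4/3 of the base level.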
That map is then filtered in turn, until we end up with a 1 × 1 texture map which is the average of the whole image. The complete mipmap pyramid takes only 1/3 more space than the original texture map. Now if roughly seven texels would be needed to cover the pixel in Figure 3.13, we can perform a bilinear interpolation at levels 1 (1 texel covers 4 original texels) and 2 (1 texel covers 16 original texels), and linearly interpolate between those bilinearly filtered levels, producing trilinear filtering.

Mipmapping improves performance for two reasons. First, the number of texels required is bounded, even if the whole object is so far away that it projects to a single pixel. Second, without mipmaps, even if we did only point sampling for minification, neighboring image pixels would need to fetch texels that are widely scattered across the texture map. At a suitable mipmap level the texels needed for neighboring image pixels are also adjacent to each other, and it is often cheaper to fetch adjacent items from memory than scattered items. Nevertheless, trilinear filtering requires accessing and blending eight texels, which is quite a lot for software engines without dedicated texture units, so the mobile 3D APIs allow approximating full trilinear filtering with bilinear filtering at the closest mipmap level.

In general, point sampling is faster than bilinear filtering, whereas bilinear filtering gives higher-quality results. However, if you want to map texels directly to pixels so they have the same size (so that neither minification nor magnification is used) and the s-direction aligns with screen x and t with y, point sampling yields both faster and better results.

Bilinear filtering can also be leveraged for post-processing effects. Figure 3.14 demonstrates a light bloom effect, where the highlights of a scene are rendered into a separate image.
This image is then repeatedly downsampled by using bilinear filtering, averaging four pixels into one in each pass. Finally, a weighted blend of the downsampled versions is composited on top of the normal image, achieving the appearance of bright light outside of the window.

In desktop OpenGL there are some additional filtering features that are not available in the current versions of the mobile APIs. They include level of detail (LOD) parameters for better control of the use and memory allocation of mipmap levels, and anisotropic filtering for surfaces that are slanted with respect to the camera viewing direction.

Texture borders and linear interpolation

The original OpenGL clamps texture coordinates to [0, 1], which causes problems for texture filtering. Let us see what happens at (s, t) = (0, 0). It lies at the lower left corner of the lower leftmost texel, and bilinear interpolation should return the average of that texel and its west, south, and southwest neighbors. The problem is that those neighbors do not exist.

To overcome this problem, one could add a one-texel-wide boundary or border around the texture map image to provide the required neighbors for correct filtering. However, the introduction of the clamp-to-edge mode mostly removes the need for such neighbors. This mode clamps the texture coordinates to [min, max] where min = 1/(2N) and max = 1 − min, and N is either the width or height of the texture map. As a result, borders were dropped from OpenGL ES.

Figure 3.14: Rendering a light bloom effect by blurring the highlights and compositing on top of the normal scene. Panel labels: 128 × 128, 64 × 64, 32 × 32, and 16 × 16; 10%, 15%, 34%, and 60%. Images copyright AMD. (See the color plate.)

There is one case where the border would be useful, however: if a larger texture map should be created from several smaller ones, and filtering across them should work correctly.
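The interaction between wrapping and bilinear filtering at (s, t) = (0, 0) can be tried out with a small software sampler. The sketch below is written under our own assumptions and is not how any particular implementation works: a single-channel texture stored as a list of rows with row 0 at t = 0, texel centers at (i + 0.5)/N, and hypothetical function and mode names.

```python
import math


def wrap_coord(coord, n, mode):
    """Wrap a texture coordinate for a texture that is n texels wide (or high)."""
    if mode == "repeat":
        return coord - math.floor(coord)       # keep only the fractional part
    lo = 1.0 / (2 * n)                         # clamp-to-edge: [1/(2N), 1 - 1/(2N)]
    return min(max(coord, lo), 1.0 - lo)


def wrap_index(i, n, mode):
    """Map a texel index that may fall outside 0..n-1 back into the texture."""
    if mode == "repeat":
        return i % n                           # fetch from the other side
    return min(max(i, 0), n - 1)               # stay on the edge texel


def sample_bilinear(tex, s, t, mode_s="clamp", mode_t="clamp"):
    """Bilinearly filter tex at (s, t); tex[0] is assumed to be the row at t = 0."""
    h, w = len(tex), len(tex[0])
    u = wrap_coord(s, w, mode_s) * w - 0.5     # position measured in texel centers
    v = wrap_coord(t, h, mode_t) * h - 0.5
    i0, j0 = math.floor(u), math.floor(v)
    fu, fv = u - i0, v - j0
    ia, ib = wrap_index(i0, w, mode_s), wrap_index(i0 + 1, w, mode_s)
    ja, jb = wrap_index(j0, h, mode_t), wrap_index(j0 + 1, h, mode_t)
    low = tex[ja][ia] * (1.0 - fu) + tex[ja][ib] * fu    # interpolate along s
    high = tex[jb][ia] * (1.0 - fu) + tex[jb][ib] * fu
    return low * (1.0 - fv) + high * fv                  # then along t


if __name__ == "__main__":
    tex = [[0.0, 1.0], [2.0, 3.0]]
    print(sample_bilinear(tex, 0.0, 0.0, "clamp", "clamp"))
    print(sample_bilinear(tex, 0.0, 0.0, "repeat", "repeat"))
```

With clamp-to-edge the corner sample returns the corner texel itself, so no border texels are needed. With repeat the same coordinate blends in texels from the opposite edges of the image.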
The triangle corners would have texture coordinate values of 0 or 1, and the borders would be copied from the neighboring texture maps. However, you can emulate that even without borders. First, create texture maps so that they overlap by one texel. Then set the texture coordinates of the neighboring triangles to 1/N or 1 − 1/N instead of 0 or 1. Now the texture maps filter correctly and blend into each other seamlessly.

Note that borders are never needed with the repeat mode, since if a neighboring texel that would be outside of the texture image is needed, it is fetched from the other side of the same image. If you do not intend to repeat your textures, enabling the repeat mode may create artifacts on the boundary pixels as the colors may bleed from the other side of the texture at the boundaries. Therefore you should not use the repeat mode if clamp-to-edge is sufficient.

Texture formats and functions

Depending on the texture pixel format and blending function, the fragment’s base color, which is interpolated from the vertices, is replaced with, modulated by, or otherwise combined with the filtered texel.

The most versatile of the texture formats is RGBA, a four-channel texture image. The RGB format stores only the color but no alpha value. If all the color channels have the same value, we can save space and use only a single luminance channel L. Finally, we can have one-channel alpha A, or combine luminance and alpha into LA.

Now as we describe the texture functions, also known as texture blending functions or modes, we define the interpolated fragment color and alpha as $C_f$ and $A_f$, the texture source color and alpha as $C_s$ and $A_s$, and the user-given constant color as $C_c$ and $A_c$. See Figure 3.15 for an example of using each mode. The texture function and the constant color together make up the texture environment. Note that these attributes are set separately for each texture unit.
With the REPLACE function, the texture source data replaces the fragment color and/or alpha. RGBA and LA formats produce $(C_s, A_s)$, L and RGB formats give $(C_s, A_f)$, and A format yields $(C_f, A_s)$.

With the MODULATE function, the source data modulates the fragment data through multiplication. RGBA and LA formats produce $(C_f C_s, A_f A_s)$, L and RGB formats give $(C_f C_s, A_f)$, and A format yields $(C_f, A_f A_s)$.

The DECAL function can only be used with RGB and RGBA formats. The color of the underlying surface is changed, but its transparency (alpha) is not affected. With RGB the color is simply replaced, $(C_s, A_f)$, but RGBA blends the fragment and texture colors using the texture alpha as the blending factor, $(C_f (1 - A_s) + C_s A_s, A_f)$.

The BLEND function modulates alpha through multiplication, and uses the texture color to blend between the fragment color and the user-given constant color. RGBA and LA formats produce $(C_f (1 - C_s) + C_c C_s, A_f A_s)$, L and RGB formats give $(C_f (1 - C_s) + C_c C_s, A_f)$, and A format yields $(C_f, A_f A_s)$.

Finally, the ADD function modulates alpha and adds together the fragment and texture source colors. RGBA and LA formats produce $(C_f + C_s, A_f A_s)$, L and RGB formats give $(C_f + C_s, A_f)$, and A format yields $(C_f, A_f A_s)$.

Multitexturing

A 3D engine may have several texturing units, each with its own texture data format, function, matrix, and so on. By default, the input fragment color is successively combined with each texture according to the state of the corresponding unit, and the resulting color is passed as input to the next unit, until the final output goes to the next stage of the 3D pipeline, i.e., tests and blending.

On OpenGL ES 1.1, it is possible to use more powerful texture combiner functions. A separate function can be defined for the RGB and alpha components.
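As a compact summary of the five basic texture functions and of the default chaining of texturing units, here is a software sketch. It rests on our own assumptions: colors are RGB tuples in [0, 1], luminance formats pass their value replicated into all three channels, the sum in ADD is clamped to 1.0, and the function names and the tuple layout used to describe a unit are hypothetical.

```python
def tex_env(func, fmt, cf, af, cs, a_s, cc=(0.0, 0.0, 0.0)):
    """Combine fragment (cf, af) with texel (cs, a_s); returns (color, alpha)."""
    has_color = fmt in ("RGBA", "RGB", "LA", "L")
    has_alpha = fmt in ("RGBA", "LA", "A")

    if func == "DECAL":
        if fmt == "RGB":
            return cs, af
        if fmt == "RGBA":
            # Texture alpha is the blending factor; fragment alpha is untouched.
            return tuple(f * (1.0 - a_s) + s * a_s for f, s in zip(cf, cs)), af
        raise ValueError("DECAL is only defined for RGB and RGBA textures")

    if func == "REPLACE":
        return (cs if has_color else cf), (a_s if has_alpha else af)

    if func not in ("MODULATE", "BLEND", "ADD"):
        raise ValueError("unknown texture function: " + func)

    # MODULATE, BLEND, and ADD all multiply alpha when the texture has alpha.
    alpha = af * a_s if has_alpha else af
    if not has_color:                      # A format leaves the color alone
        return cf, alpha
    if func == "MODULATE":
        color = tuple(f * s for f, s in zip(cf, cs))
    elif func == "BLEND":
        color = tuple(f * (1.0 - s) + c * s for f, s, c in zip(cf, cs, cc))
    else:                                  # ADD, clamped to 1.0 (assumption)
        color = tuple(min(1.0, f + s) for f, s in zip(cf, cs))
    return color, alpha


def run_units(cf, af, units):
    """Feed the output of each unit into the next one, in order."""
    for func, fmt, cs, a_s, cc in units:
        cf, af = tex_env(func, fmt, cf, af, cs, a_s, cc)
    return cf, af


if __name__ == "__main__":
    black = (0.0, 0.0, 0.0)
    units = [("MODULATE", "RGB", (0.5, 0.5, 0.5), 1.0, black),
             ("ADD", "RGB", (0.25, 0.25, 0.25), 1.0, black)]
    print(run_units((1.0, 1.0, 1.0), 1.0, units))
```

The sketch covers only the fixed functions; the combiner functions of OpenGL ES 1.1 are more general and are not modeled here.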
The inputs to the function can come either from the texture map of the current unit, from the original fragment color, from the output of the previous unit, or from the constant user-defined color $(C_c, A_c)$. The functions allow you to add, subtract, multiply, or interpolate the inputs, and even take a texel-wise dot product, which can be used for per-pixel lighting effects.

Figure 3.15: The effect of different texture functions. At the top, incoming fragment colors (left) and texture (right); transparency is indicated with the checkerboard pattern behind the image. Bottom: resulting textures after each texture operation; left to right: REPLACE, MODULATE, DECAL, BLEND, ADD. For the BLEND mode, the user-defined blending color is pure yellow. (See the color plate.)

With multiple texture units it is useful to separate which part of the texture mapping state belongs to each texturing unit, and which part belongs to each texture object. A texture object contains the texture image data, the format that the data is in, and the filtering and wrapping parameters (such as clamp-to-edge or repeat). Each texturing unit, on the other hand, includes a currently bound texture object, a texture blending function, a user-given constant color $(C_c, A_c)$, a texture matrix that is applied to texture coordinates, and a pointer to the texture coordinates of the unit.

3.4.2 INTERPOLATING GRADIENTS

The simplest way to spread the values at vertices across triangles is to choose the values at one of the vertices and assign the same value to every fragment within the triangle. In OpenGL this is called flat shading, since the triangle will then have a constant color, the color calculated when shading the first vertex of the triangle. Although fast to compute, this results in a faceted look. Much better results can be obtained when the vertex values are interpolated.
Screen linear interpolation

Screen linear interpolation projects the vertices to the frame buffer, finds the target pixels, and linearly interpolates the associated values such as colors and texture coordinates to the pixels between the vertices. We can express this using so-called barycentric coordinates. If we take any a, b, and c such that they sum up to one, the point $p = a p_a + b p_b + c p_c$ will lie on the plane defined by the three points $p_a$, $p_b$, and $p_c$, and if none of a, b, c are negative, then p lies within the triangle formed by the three points. We can use the same weights to blend the values at triangle corners to get a linearly interpolated value for any pixel within the triangle:

\[ f = a f_a + b f_b + c f_c \tag{3.5} \]

where $f_a$, $f_b$, and $f_c$ are the values at triangle corners, and f is the interpolated value.

Many graphics systems interpolate vertex colors this way as it produces smoothly varying shading where the triangulated nature of the underlying surface is far less obvious than with flat shading.

Figure 3.16: A square made of two triangles, with a grid pattern, seen in perspective. For the first square the grid pattern is interpolated in screen space. The center vertical bar on the upper triangle goes from the center of the upper edge to the center of the diagonal, and continues to the center of the lower edge of the lower triangle. For the second square the interpolation is perspective-correct, and the center vertical bar remains straight.

However, linear interpolation in screen space ignores perspective effects such as foreshortening. While vertices are projected correctly, the values on the pixels between them are not. Figure 3.16 shows two squares (pairs of triangles) that are tilted with respect to the camera, and the errors caused by screen linear interpolation. The grid pattern on the squares makes the effect obvious. The vertical lines, which appear straight in the right image, are broken in the left one.
The center of the square interpolates to the middle of the diagonal, and when that is connected to the middle of the top edge and of the bottom edge, the bar does not make a straight line.

Perspective-correct interpolation

The fragments on the square on the right have been interpolated in a perspective-correct manner. The key to doing this is to delay the perspective division of homogeneous coordinates until after the interpolation. That is, linearly interpolate both f/w and 1/w, where f is the value and w is the last component of the homogeneous coordinate, then recover the perspective-correct value by dividing the interpolated f/w by the interpolated 1/w, yielding

\[ f = \frac{a f_a / w_a + b f_b / w_b + c f_c / w_c}{a / w_a + b / w_b + c / w_c}. \tag{3.6} \]

If we add another projection to the system (that is, projective texture mapping), we also need to bring q into the equation:

\[ f = \frac{a f_a / w_a + b f_b / w_b + c f_c / w_c}{a q_a / w_a + b q_b / w_b + c q_c / w_c}. \tag{3.7} \]

Perspective-correct interpolation is clearly quite expensive: it implies more interpolation (also the 1/w term), but even worse, it implies a division at each fragment. These operations require either extra processing cycles or more silicon.

Because of its impact on performance, some software-based engines only do perspective-correct interpolation for texture coordinates; other values are interpolated linearly in screen space. Another approach is based on the fact that if the triangles are very small, only a few pixels in image space, the error due to screen linear interpolation becomes negligible. Reasonably good results can be achieved by doing the perspective-correct interpolation only every few screen pixels, and by linearly interpolating between those samples. Many software implementations achieve this by recursively subdividing triangles. If done at the application level, this is likely to be slow, but can be made reasonably fast if implemented inside the graphics engine.
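The difference between Equations (3.5) and (3.6) is easy to see numerically. The following sketch evaluates both for one pixel; the function names and the sample vertex values are our own, chosen only for illustration.

```python
def screen_linear(weights, values):
    """Equation (3.5): blend the corner values with barycentric weights a, b, c."""
    return sum(k * f for k, f in zip(weights, values))


def perspective_correct(weights, values, ws):
    """Equation (3.6): interpolate f/w and 1/w linearly, then divide."""
    f_over_w = sum(k * f / w for k, f, w in zip(weights, values, ws))
    one_over_w = sum(k / w for k, w in zip(weights, ws))
    return f_over_w / one_over_w


if __name__ == "__main__":
    # A pixel halfway (on the screen) between vertex a and vertex b.
    weights = (0.5, 0.5, 0.0)
    values = (0.0, 1.0, 0.0)    # for example, a texture coordinate at each corner
    ws = (1.0, 3.0, 1.0)        # vertex b is three times as far away as vertex a
    print(screen_linear(weights, values))
    print(perspective_correct(weights, values, ws))
```

In this example the screen-space midpoint receives the value 0.5 from Equation (3.5) but 0.25 from Equation (3.6): because of foreshortening, the pixel halfway across the screen shows a surface point that is much closer to the near vertex. When all the w values are equal, the two functions agree.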
3.4.3 TEXTURE-BASED LIGHTING

There are several ways to do high-quality lighting effects using texture maps. The basic OpenGL lighting is performed only at vertices, and using a relatively simple lighting model. Using texture mapping it is possible to get per-pixel illumination using arbitrary lighting models.

The simplest situation is if the lighting of the environment is static and view-independent, that is, if the lighting is fixed and we only have diffuse lighting. Then one can bake