Alpert/Handbook of Algorithms for Physical Design Automation AU7242_S008 Finals Page 812 24-9-2008 #3

39 Placement-Driven Synthesis Design Closure Tool

Charles J. Alpert, Nathaniel Hieter, Arjen Mets, Ruchir Puri, Lakshmi Reddy, Haoxing Ren, and Louise Trevillyan

CONTENTS
39.1 Introduction
39.2 Major Phases of Physical Synthesis
39.3 Optimization and Placement Interaction
39.3.1 Bin-Based Placement Model
39.3.2 Exact Placement
39.4 Critical Path Optimizations
39.4.1 Gate Sizing
39.4.2 Gate Sizing with Multiple-vt Libraries
39.4.3 Incremental Synthesis
39.4.4 Advanced Synthesis Techniques
39.4.5 Fixing Early Paths
39.4.6 Drivers for Multiple Objectives
39.5 Mechanisms for Recovery
39.5.1 Area Recovery
39.5.2 Routing Recovery
39.5.3 vt Recovery
39.6 Other Considerations
39.6.1 Hierarchical Design
39.6.2 High-Performance Clocking
39.6.3 Power Gating to Reduce Leakage Power
39.7 Into the Future
References

39.1 INTRODUCTION

Much of this book has focused on the components of physical synthesis, such as global placement, detailed placement, buffering, routing, Steiner tree, and congestion estimation. Physical synthesis combines these steps as well as several others to (primarily) perform timing closure. When wire delays were relatively insignificant compared to gate delays, logic synthesis provided a sufficiently accurate picture of the timing of the design. Placement and routing did not need to focus on timing, but were exclusively wirelength driven. Of course, technology trends have transformed physical design because the physical implementation affects timing.
Today, a design that satisfies timing requirements in synthesis almost certainly will not do so once implemented physically due to wire delays. Physical synthesis is a process that modifies the design so that the impact on timing due to wiring is mitigated. It may move cells, resize logic, buffer nets, and perform local resynthesis.

Besides basic timing closure, there are many newer challenges that the physical synthesis system needs to handle [1]. Some examples include lowering power using a technology library with multiple threshold voltages (vt), fixing noise violations that show up after performing routing, and handling the timing variability and uncertainty introduced by modern design processes.

This chapter surveys IBM's physical synthesis tool, called placement-driven synthesis (PDS). It builds upon a description of the basics of the tool [2] and also some innovations in turnaround time published in Ref. [3].

39.2 MAJOR PHASES OF PHYSICAL SYNTHESIS

Placement-driven synthesis has hundreds of parameter settings available to the user and can be customized by the designer to run in many ways. For example, there are different degrees of routing congestion mitigation or area recovery available. The user may want to exploit gates with low vt or allow assignment of wires to different routing planes. These choices depend on the nature of the design being closed.

Although there is no single PDS algorithm to describe, the following outlines a typical invocation:

1. Netlist preparation. When PDS initializes, the data model needs to be loaded with timing assertions (which encapsulate the timing constraints), user parameters, etc. The netlist may also need some scrubbing so that optimization is even viable.
As examples,
• Gates may need to be sized down so that the total area of the netlist fits within the area of the placeable region.
• Buffers inserted during synthesis may need to be removed so that they do not badly influence placement. A placement algorithm may handle a fanout tree several levels deep less well than the logically equivalent single large net.
• If the clock tree has not yet been built, it may need to be hidden from optimization so that it is not treated as a signal net. Changes to a clocked sequential cell could otherwise cause the timing for every cell in the clock tree to be updated. Before synthesis, an ideal clock with zero skew can be assumed and later replaced with the optimized one.
• Timing information can be extracted from either an unplaced or a previously optimized netlist to generate net weights for the placement step.

2. Global placement. This step is well covered in Chapters 14 through 19. Besides traditional minimum wirelength optimization, placement needs to address several other types of constraints. For example,
• Density targets direct the placer not to pack cells tightly in certain areas, so that physical synthesis will have the flexibility to size up cells, insert buffers, etc.
• Designer cell movement constraints are used to enable floorplanning in a flat methodology. By restricting a set of cells to a certain rectangular region, the designer is able to plan that block, while still allowing the tool the flexibility to perform optimizations and placements of the cells within the block.
• Routability directives can be used to improve the routability of placement, such as artificially inflating the size of cells in routing congested regions in order to force more spreading [4].
• Clock domain constraints can be considered during placement to reduce clock tree latency and dynamic power consumption. Latches that belong to the same clock domain can be directed to be placed close to each other by either adding special net weights or by imposing movement constraints on latches.

3. Timing analysis. At every point in the flow, timing analysis is a core component because it provides the evaluation of how well PDS is doing in terms of timing closure. It is run both stand-alone and incrementally throughout the optimization. For this, IBM's static timing analysis tool, EinsTimer [5], is used.

4. Electrical correction. After placement, one will certainly find gates that drive loads above the allowed specification and long wires for which the signal exceeds the designer-specified slew rate. A few bad slew rates inevitably cause terrible timing results. At this point, it makes sense to correct the design by fixing local slew and capacitance violations, typically through buffering and gate sizing, thereby getting the design into a reasonably good timing state. One can also employ a logical effort [6] type of approach to improve the global timing characteristics of the design.

5. Placement legalization. Fixing electrical violations may result in thousands of buffers being added to the design, and potentially every gate may be assigned a new gate size, which will create overlaps, causing the placement solution to become illegal. The goal of legalization is to fix these overlaps while providing minimum perturbation to the netlist (Chapter 20).

6. Critical path optimization. Once the design is legal and is in a reasonably good timing state, one can employ all kinds of techniques to try to fix the critical paths. Chapters 26 through 28 discuss powerful buffering techniques. Section 39.4 describes how other optimizations or transforms can also be deployed.
A transform is a change to the netlist designed to improve some aspect of the design, for example, breaking apart a complex gate into several smaller, simpler ones. During this phase, incremental timing analysis and legalization may be periodically invoked to keep the design in a legal and consistent state.

7. Compression. Critical path optimization may become stuck at some point, when a certain set of the most critical paths cannot be fixed without manual design intervention (e.g., changes to the floorplan must be made). This is shown in Figure 39.1, where the original timing histogram (Figure 39.1a) is improved by critical path optimizations (Figure 39.1b) until it saturates. However, there may still be thousands of failing timing points that could be fixed with lighter-weight optimizations directed at the not-so-critical regions that still violate timing constraints. The purpose of this phase is to compress the remaining negative portion of the timing histogram to leave as little work as possible for the designer, as shown in Figure 39.1c. As in the critical path optimization phase, incremental timing analysis and legalization must be incorporated where appropriate.

FIGURE 39.1 Timing histogram (number of failing endpoints versus slack) of (a) an unoptimized design can be improved by (b) critical path optimization and (c) histogram compression.

After these phases, the design may still be far from closing on the given timing constraints. At this point, a designer could intervene manually or rerun the flow to try to get a better timing-driven placement now that the real timing problems have been identified. One can run a net weighting algorithm (Chapter 21) to drive the next iteration of placement and the entire flow.
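The timing histogram of Figure 39.1 can be made concrete with a small sketch. The code below is illustrative only and is not taken from PDS: the function names, the choice of picoseconds as the unit, and the sample slack values are all assumptions. It counts failing endpoints (negative slack) per slack bin, which is the quantity that critical path optimization and compression try to shrink.

```python
from collections import Counter


def failing_histogram(slacks_ps, bin_width_ps):
    """Count failing endpoints (negative slack) per slack bin.

    Bin k covers slacks in [k * bin_width_ps, (k + 1) * bin_width_ps).
    Integer picoseconds are used to avoid floating-point binning issues.
    """
    counts = Counter()
    for slack in slacks_ps:
        if slack < 0:
            # Floor division maps -350 ps with 100 ps bins to bin -4.
            counts[slack // bin_width_ps] += 1
    return dict(sorted(counts.items()))


def summarize(slacks_ps):
    """Return (worst slack, number of failing endpoints)."""
    failing = [s for s in slacks_ps if s < 0]
    return min(slacks_ps), len(failing)


# Hypothetical endpoint slacks in picoseconds (made-up data).
example_slacks = [-350, -300, -120, -50, 100, 250]
print(failing_histogram(example_slacks, 100))
print(summarize(example_slacks))
```

In these terms, critical path optimization mainly improves the worst slack (the leftmost bin), while compression aims to reduce the counts in the remaining negative bins.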
In the flow described above, one can make several assumptions to make fast optimization achievable. For example, (1) clocks can be idealized so that one assumes a zero-skew clock will later be inserted, (2) Steiner estimates and one-dimensional extraction can be used for interconnect delay estimation, (3) crosstalk can be ignored, etc. Making these assumptions certainly allows faster runtime than otherwise achievable. In practice, these assumptions are stripped away as the designer makes progress toward timing closure. Once the designer is reasonably happy with the design after running PDS, he or she may then perform clock insertion and perform a pass of incremental physical synthesis to fix problems resulting from actual clock skews. Similarly, once the design is routed, there could be timing problems caused by scenic routes, along with noise violations from capacitive coupling. The designer can then run incremental physical synthesis in this postrouting environment, using accurate coupling information while also modeling variability for timing.

Because several of the main components of the physical synthesis flow are covered elsewhere in the book, this chapter focuses on aspects that are not covered.

1. Optimization and placement interaction. When optimizations such as buffering or resizing need to make adjustments to the netlist, they cannot happen in a vacuum because they affect the placement. Certain regions may have blockages or be too congested to allow transforms to happen. We explain the communication mechanisms between optimization and placement.

2. Critical path optimizations. Besides buffering, there are numerous techniques one can use to improve the timing along a critical path. Section 39.4 overviews gate sizing and incremental synthesis techniques and the driver/transform model that PDS uses.

3. Recovery mechanisms. During optimization, PDS can cause damage by overfilling local regions, causing routing congestion, etc.
Section 39.5 explains how one can apply specialized optimizations for repairing damage so that physical synthesis can continue effectively.

4. Specialized design styles. A typical instance for PDS is a flat ASIC, though customers also utilize it for hierarchical design and for high-performance microprocessors. Section 39.6 explains some of the issues faced by PDS and their solutions for these different types of design styles.

39.3 OPTIMIZATION AND PLACEMENT INTERACTION

During critical path optimization and compression, optimizations such as buffer insertion, gate sizing, box movement, and logic restructuring may need to add, delete, move, or resize boxes. To estimate the benefit/cost of these transformations accurately, transforms need to generate legal or semilegal locations for these boxes on the fly. Otherwise, boxes may be moved to overcongested locations or even on top of blockages, which later need to be resolved by legalization. Legalization then may move boxes far from their intended locations and undo (at least in part) the benefits of optimization. It could even introduce new problems that need further optimization.

Ideally, one would like to compute the exact legal locations for such boxes during optimization, but this can often be too computationally expensive. One strategy PDS uses is to use rough legal locations during early optimization (e.g., electrical correction) when substantial changes are made. During later stages of optimization, when smaller or finer changes are made, exact legal locations may be computed. Such a strategy strikes a good balance between quality of results and the runtime of the system.

39.3.1 BIN-BASED PLACEMENT MODEL

PDS uses a synthesis–placement interface (SPI) to manage the estimation or computation of incremental placement.
Before optimization, the placement image is divided into a set of regions called bins. Each placeable object in the design is assigned to a bin, and space availability is determined by examining the free space within a bin. The SPI layer manages the interface to an idealized view of the bin structure and provides a rich set of functions to access and manipulate placement data. The SPI layer uses callbacks to keep placement, optimization, and routing data consistent. Instead of computing an exact legal location, newly created or modified logic can be placed in a bin and assigned a coarse-placement location inside the bin. A fast check is performed to make sure that there is enough free space within the bin to accommodate the logic.

The interaction between optimizations and the SPI layer works as follows. Suppose an optimization requests SPI to add or move a box to a specific (x, y) location. SPI gets the bin in which the (x, y) location falls and checks the free space. If there is enough space, then the optimization uses the location specified. If not, the optimization may ask SPI to find the closest bin in which there is space, in which case SPI "spirals" through neighboring bins and returns a valid location, which the optimization can evaluate and choose to use. When a placement is actually assigned, SPI updates bin information to accurately reflect the state of the placement.

Using rough placement may result in boxes placed so that they overlap each other. This is one reason why legalization needs to be called periodically (see, e.g., Ref. [7]). It is important for the optimized design to remain stable, so the legalizer maintains as many preexisting locations as possible and, when a box must move, an attempt is made to disturb the timing of critical paths as little as possible.

As an example, assume the potential area of placed logic inside a bin is 1000 units and that 930 units of cells are already placed within the bin.
If one tries to add a new cell of size 90, the SPI interface reports that the bin would become too full (1020) and cannot afford to allow the cell to be placed. On the other hand, a cell of size 50 can fit (total area 980), so SPI would permit the transform to place the cell in the bin.

The problem with the bin-based model is that just because the total area allows another cell to be inserted does not mean that it actually can be inserted. As a simple example, consider placing three cells of width three into two rows of width five, with each cell and each row of height one. The total area of the cells is nine, while the total placeable area is ten, so it would seem that the cells could fit. However, the cells cannot be placed without exceeding the row capacity. In this sense, legalizing cells within a bin so that they all fit is like the NP-complete bin packing problem.

Consider Figure 39.2, in which Bin A and Bin B have exactly the same set of nine cells, though arranged differently. If one tries to insert a new cell into either bin, SPI would return that there is room in the bin, yet one cannot easily insert it in Bin A while one can in Bin B. It is likely that the fracturing of white space in Bin A will lead to legalization eventually moving a cell into a different bin.

FIGURE 39.2 New cell cannot be inserted into (a) Bin A but can be in (b) Bin B even though both bins contain the same set of cells.

Because of these effects, the designer often runs with a guard band (e.g., 5 percent) to make it more likely that cells will avoid unsolvable bin packing scenarios. In the earlier example, a 5 percent guard band puts the virtual bin capacity at 950.
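The area check and the row-packing counterexample above can be reproduced with a short sketch. This is not the SPI interface itself: the `Bin` class, its field names, and `packs_into_rows` are hypothetical names introduced only for illustration, and the guard band is modeled as an integer percentage of capacity.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """Hypothetical area-only view of a placement bin."""
    capacity: int            # total placeable area in the bin
    used: int = 0            # area of cells already assigned to the bin
    guard_band_pct: int = 0  # percentage of capacity held in reserve

    def fits(self, cell_area):
        """Fast check: does the cell fit by total area alone?"""
        limit = self.capacity * (100 - self.guard_band_pct) // 100
        return self.used + cell_area <= limit


def packs_into_rows(widths, row_width, n_rows):
    """Exact (exponential-time) check that cells can be split among rows."""
    rows = [0] * n_rows

    def assign(i):
        if i == len(widths):
            return True
        for r in range(n_rows):
            if rows[r] + widths[i] <= row_width:
                rows[r] += widths[i]
                if assign(i + 1):
                    return True
                rows[r] -= widths[i]  # backtrack
        return False

    return assign(0)


plain = Bin(capacity=1000, used=930)
print(plain.fits(90), plain.fits(50))      # 1020 is too full; 980 fits

guarded = Bin(capacity=1000, used=930, guard_band_pct=5)
print(guarded.fits(50), guarded.fits(20))  # virtual capacity is 950

# Area says 9 <= 10, yet three width-3 cells do not fit in two width-5 rows.
print(packs_into_rows([3, 3, 3], row_width=5, n_rows=2))
```

The area test is constant time, which is why it suits early optimization, whereas the exact row check grows exponentially with the number of cells and is shown here only to make the counterexample explicit.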
Alternatively, one can allow overfilling of bins (say by 5 percent) to allow transforms to successfully perform optimization, and then rely on powerful legalization techniques like diffusion (Chapter 20) to reduce the likelihood of legalization moving cells far away.

As physical synthesis progresses, the bins are reduced in size. This tends to limit the size of box movements that legalization must do.

39.3.2 EXACT PLACEMENT

The major problem with the bin-based model is that one can never guarantee that the cell really does fit in its bin. One could always construct test cases with cells of strange sizes that break any bin model (or force it to be ultra-conservative in preventing cells from being inserted). For example, fixed-area I/Os and decoupling capacitors can contribute to the problem. When it gets too late in the flow, PDS may not be able to recover from big legalization movements that degrade timing. During later stages of the system, when the major optimizations have been completed, finding exact locations for the modified cells provides better overall quality of results with reasonable runtimes.

PDS implements exact legal locations during optimization as follows. The placement subsystem maintains an incremental bit map (imap) to track all location changes and available free space. For example, if a cell is one row high and seven tracks wide, then seven bits of the imap corresponding to the cell's location are set to one. If the two tracks next to the cell are empty, their bits are set to zero. When a new or modified box needs to be placed at a desired location, the imap capability essentially works like a hole finder. It tries to locate a hole or an empty slot (within some specified maximum distance from the desired location) large enough to legally place the newly created or modified box. As with rough locations, the optimization can evaluate and choose to use the exact locations.
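The hole-finder behavior described above can be illustrated for a single placement row. The sketch below is an assumption-laden simplification rather than the PDS imap: it models one row as a Python list of 0/1 bits, and `find_hole` and `occupy` are hypothetical helper names. The search widens outward from the desired track, trying the left candidate before the right one at each distance.

```python
def find_hole(row_bits, desired, width, max_dist):
    """Return the start track of the free run of `width` tracks whose start
    is closest to `desired` (within `max_dist` tracks), or None."""
    n = len(row_bits)
    for dist in range(max_dist + 1):
        starts = (desired,) if dist == 0 else (desired - dist, desired + dist)
        for start in starts:
            in_row = 0 <= start and start + width <= n
            if in_row and not any(row_bits[start:start + width]):
                return start
    return None


def occupy(row_bits, start, width):
    """Incrementally update the bit map once a location is accepted."""
    for track in range(start, start + width):
        row_bits[track] = 1


# One row of 16 tracks: a 7-track cell, 2 free tracks, a 4-track cell, 3 free.
row = [1] * 7 + [0] * 2 + [1] * 4 + [0] * 3

print(find_hole(row, desired=8, width=2, max_dist=3))  # hole starts at track 7
print(find_hole(row, desired=8, width=3, max_dist=3))  # nothing close enough
print(find_hole(row, desired=8, width=3, max_dist=5))  # hole starts at track 13

occupy(row, 7, 2)                                      # accept the first hole
print(find_hole(row, desired=8, width=2, max_dist=3))  # that hole is now gone
```

A production bit map would span many rows and use packed bits; the list-of-integers form is chosen here only for readability.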
If this location is used, then the imap data model is updated incrementally. Thus, when timing evaluates the quality of the solution, it knows exactly where the cell will end up. In this model, legalization is not necessary.

An example of one problem with the imap model occurs when a cell seven tracks wide wants to be placed in a hole that is five tracks wide. To a user, it may be obvious to simply slide the neighboring cell over by two tracks to make room. In general, small local moves like this will have minimal effect on timing and make it more likely that the cells will be placed at their desired locations. In such a case, a list of all the cells that need to be moved to make room for the new or modified box, as well as their new locations, is supplied to the transform. The transform can then evaluate this compound movement of a set of boxes, estimate the benefit/cost, and decide to accept or reject such movement.

The advantages of this approach include more successes in legally placing boxes within some specified maximum distance, as well as obtaining legal locations that are generally closer to the desired locations. On the other hand, the transforms may get more complicated as they need to manage and evaluate the movement of multiple, possibly unrelated, cells. It may also cause more churn to the design during the later stages of optimization due to the movement of a significantly larger number of boxes, which may not be directly targeted by the optimizations.

Thus, it is a bit of an art to find the right degree of placement and optimization interaction that trades off accuracy versus runtime. These models are still evolving in PDS today.

39.4 CRITICAL PATH OPTIMIZATIONS

Optimization of critical paths is at the heart of any physical synthesis system.
Timing closure is clearly an important goal, but electrical correctness, placement and routing congestion, area, power, wirelength, yield, and signal integrity are also important design characteristics that must be considered and optimized when making incremental changes to the netlist.

Within PDS, there is a large menu of optimizations that can be applied to the design. The sequences of optimizations are packaged for various functions and can be enabled or disabled via system parameters. Optimizations may also be used interactively by designers. The most effective optimizations are generally buffering and gate sizing. As a secondary dimension for optimization, with buffering one can also perform wire sizing, and with gate sizing one can assign gates to different vt. Because buffering is covered in other chapters, we turn to gate sizing.

39.4.1 GATE SIZING

Gate sizing is responsible for selecting the appropriate drive strength for a logic cell from the functionally equivalent cells available in the technology library. For example, a library may contain a set of ten inverters, each with a characteristic size, power consumption, and drive strength. Upon finding an inverter in the design, it is the task of gate sizing to assign the inverter the appropriate drive strength to meet design objectives.

When the mapped design comes from logic synthesis, gate sizes have already been assigned based on the best information available at the time. Once the design is placed, Steiner wire estimates can be used to give a more accurate estimation of wire loads, and many of the previous assignments will be found to be suboptimal. Likewise, gate sizes must be reevaluated after global and detailed routing, because wire delays will again have changed.
As discussed earlier, the electrical correction step performs an initial pass over the entire design. Gate sizes are assigned in a table-lookup fashion to fix capacitance and slew violations introduced by the more accurate Steiner wire models. There may be several cells in the library that meet the requirements of a logic cell, so the one with minimal area is chosen. If gate sizing is insufficient to fix the violation, buffering or box movement may be used.

Later optimizations have the option of modifying these initial gate sizes. If a cell in the design is timing critical, the library cell that results in the best path delay would be chosen, while if the cell already meets its timing requirements, area recovery will pick the cell with the greatest area savings.

For critical path optimizations, gate sizing examines a size-sorted window of functional alternatives and evaluates each of them to choose the best library cell. For example, suppose that the current cell is a NAND2_D, and the library has, from smallest to largest, NAND2_A through NAND2_G cells. The program might evaluate the B, C, E, and F levels to see if they are a better fit for the optimization objectives. The size of the window is dynamic and affects both the accuracy of the choice and the runtime of the optimization.

Because the design is constantly changing during optimization, it is necessary to periodically revisit the assigned gate sizes and readjust them. This allows revisiting choices, perhaps with different cell windows.

Other algorithms, such as simulated annealing, Lagrangian relaxation (see Chapter 29), or integer programming approaches [8], have been suggested for use in resizing, but they tend to be too slow, given the size of today's designs and the frequency with which this needs to be done.
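The window-based evaluation described above, in which a NAND2_D cell is compared against the B, C, E, and F levels, can be sketched as follows. This is a minimal illustration under stated assumptions, not the PDS implementation: `size_window`, `resize`, and the `toy_delay` table are hypothetical, and the delay numbers are invented solely to show how the window radius affects the choice.

```python
def size_window(sizes, current, radius):
    """Return the alternatives within `radius` positions of `current`
    in a size-sorted list, excluding `current` itself."""
    i = sizes.index(current)
    lo = max(0, i - radius)
    hi = min(len(sizes), i + radius + 1)
    return [s for s in sizes[lo:hi] if s != current]


def resize(sizes, current, radius, cost):
    """Evaluate the current cell and its window; keep the cheapest.
    The current cell is listed first, so it wins ties (no needless change)."""
    candidates = [current] + size_window(sizes, current, radius)
    return min(candidates, key=cost)


# Size-sorted library cells, smallest to largest.
sizes = ["NAND2_" + level for level in "ABCDEFG"]

# Made-up path delays (ps) for each choice; a real flow would query the timer.
toy_delay = {
    "NAND2_A": 120, "NAND2_B": 104, "NAND2_C": 95, "NAND2_D": 90,
    "NAND2_E": 88, "NAND2_F": 85, "NAND2_G": 91,
}

print(size_window(sizes, "NAND2_D", 2))                # the B, C, E, F levels
print(resize(sizes, "NAND2_D", 1, toy_delay.get))      # narrow window
print(resize(sizes, "NAND2_D", 2, toy_delay.get))      # wider window
```

With the narrow window only C and E are evaluated, so the better F level is missed; the wider window finds it at the price of two more evaluations, which mirrors the accuracy versus runtime trade-off noted in the text.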
Further, these approaches tend to make gross assumptions about a continuous library, which then needs to be mapped to cells in a discrete library; this mapping may severely distort the quality of the optimization. Also, these methods do not account well for capacitance and slew changes resulting from new power-level assignments, or for the physical placement constraints described above.

Gate sizing is important to nearly every facet of optimization. It is used in timing correction, area recovery, electrical correction, yield improvement, and signal-integrity optimization.

39.4.2 GATE SIZING WITH MULTIPLE-VT LIBRARIES

Besides performing timing closure, PDS also manages the total power budget. See Chapter 3 for an overview of the components of power consumption. The contribution of the static power component, or leakage, to the total power number is growing rapidly as geometries shrink. To account for that, technology foundries have introduced cell libraries with multiple vt. These libraries contain separate cells with the same functionality but with different vt. In the simplest form, there are two different thresholds available, commonly called high-vt and low-vt, where vt stands for threshold voltage. Cells made up of high-threshold transistors are slower but leak less, while cells with low-threshold transistors are faster at the expense of higher leakage power and less noise immunity. In practice, there is a limit to the number of different vt in the library because each vt introduces an additional mask in the fabrication process.

Multi-vt libraries enable synthesis to select not only the appropriate gate size but also the appropriate vt for each cell.
Cells on a timing-critical path can be assigned a lower vt to speed up the design. Cells that are not timing critical do not need the performance of a high-leakage cell and can use the slower and less leaky versions. In general, one prefers not to use low-vt cells at all unless they are absolutely necessary to meet high-performance timing constraints.

During vt assignment, PDS simply collects all critical gates and sorts them based on their criticality. vt assignment then proceeds by lowering the vt on the cells, starting with the most critical cell first. The algorithm honors designer-supplied leakage limits by incrementally computing the leakage current in the design. In general, multiple-threshold libraries are designed such that the low-vt equivalent of each cell has the same area and cell image as the high-vt cell. This makes multiple-vt optimization a transformation that does not disturb the placement of a design.

Because the input capacitance of a low-vt cell is slightly higher than that of a corresponding high-vt cell, resizing the cells after threshold optimization can yield further improvement in performance. The impact of multiple-vt optimization on the power/performance trade-off depends on the distribution of slack across the logic. Designs with narrow critical regions can yield significant performance improvements with little effect on leakage power. The performance boost obtained from using low-vt cells is significant, making it one of the more powerful tools PDS has to fix critical paths.

39.4.3 INCREMENTAL SYNTHESIS

Besides buffering and gate sizing, many other techniques can be applied to improve critical paths. Techniques from logic synthesis, modified to take placement and routing into account, can at times be very effective. Even though the design comes from logic synthesis optimized for timing, the changes caused by placement, gate sizing, buffering, etc. may disrupt the original timing and may create an opportunity for these optimizations to be effective in correcting a path that may not be fixable otherwise.

• Cell movement: In general, the next most effective optimization technique is cell movement. One can move cells not only to improve timing, but also to minimize wirelength, reduce placement congestion, or balance pipeline stages. For critical path optimization, a simple yet effective approach is to find a box on a critical path and try to move it to a better location that improves timing.
• Cloning: Instead of sizing up a cell to drive a net with a fairly high load, one could copy the cell and partition the sinks of the original output nets among the copies. Figure 39.3 shows an example where four sinks are driven by two identical gates after cloning. Cloning can also improve wirelengths and wiring congestion.
• Pin swapping: Pin swapping takes advantage of cells where the input-to-output timings differ by pin. As an example, for a 4-input NAND with inputs A, B, C, and D and output Z, the delay from A to Z could be less than the delay from D to Z. Some cells may be architected so that the behavior is intentional. By swapping a timing-critical signal at pin D with a noncritical signal at pin A, one can obtain timing improvement. More generally, when one has a fan-in tree as in Figure 39.4, commutative pins can also be swapped, so that the slowest net can be moved forward in the tree. Like cloning, pin swapping can also be used to improve wirelength and decrease wiring congestion.

FIGURE 39.3 Cloning. (From Trevillyan, et al., IEEE Design and Test of Computers, pp. 14–22, 2004. With permission.)

FIGURE 39.4 Pin swapping.
• Inverter processing: In a rich standard-cell library, complement and dual-complement cells are available for many functions. Timing or area can be improved by manipulating inverters. For example, Figure 39.5 shows an example of an INVERT-NAND sequence being replaced with a NOR-INVERT sequence. Other examples include changing an AND-INVERT sequence to a NAND or an AND-INVERT into an OR. Inverter processing may remove an inverter, add an inverter, or require an inverter to be moved to another sink.
• Cell expansion: The cell library may contain "complex" multilevel functions, such as AND-OR, XOR, MUX, or other less-well-defined cells. These cells normally save space, but can be slower than a breakdown into equivalent single-level cells (NAND, NOR, INVERT, etc.). Cell expansion breaks apart these cells into their components; for example, Figure 39.6 shows an XOR gate decomposed into three AND gates and two inverters.
• Off-path resizing: As discussed earlier, gate sizing is a core technique for optimization of gates on a critical path. However, one can also attempt to reduce the load driven by these gates by reducing the size of noncritical sink cells, as shown in Figure 39.7. The smaller ...