Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C046 Finals Page 982 9-10-2008 #27

[Figure 46.19 panels: (a) Config 1, (b) Config 2, (c) Config 3, (d) Delay variation; labels: Source, Switchbox, Sink]

FIGURE 46.19 Three critical path configurations and delay variations of a switch matrix. (Based on Matsumoto, Y. et al., Proceedings of the 2007 ACM/SIGDA 15th International Symposium on Field Programmable Gate Arrays, ACM Press, New York, 2007. With permission.)

where $Y_1(\mathrm{Target})$ is defined as

$$Y_1(\mathrm{Target}) = \int_{-\infty}^{T_{\mathrm{Target}}} f_{\mathrm{crit}}(t)\,dt \qquad (46.13)$$

In Equation 46.12, the likelihood that all n configurations fail is subtracted from 1. The authors assume complete independence between critical paths in different configurations, which enables them to evaluate Equations 46.12 and 46.13 analytically. This assumption does not hold in general: spatial correlations exist between circuit elements, and critical paths in different configurations might share routing resources, especially close to the source and sink nodes.

They propose a routing algorithm that keeps track of the usage of routing resources by critical paths and tries to avoid those resources in the configurations that are generated subsequently. The method is similar to the congestion avoidance procedure used in VPR: resources that are used by critical paths in other configurations are penalized, so that the router avoids them if other paths with the same delay exist.

REFERENCES

1. J. Cong and K. Minkovich, Optimality study of logic synthesis for LUT-based FPGAs, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26(2): 230–239, 2007.
2. D. Chen and J. Cong, DAOmap: A depth-optimal area optimization mapping algorithm for FPGA designs, in ICCAD ’04: Proceedings of the 2004 IEEE/ACM International Conference on Computer-Aided Design, pp. 752–759, IEEE Computer Society, Washington DC, 2004.
3. Berkeley Logic
Synthesis and Verification Group, ABC: A system for sequential synthesis and verification. Available at http://www.eecs.berkeley.edu/~alanmi/abc/.
4. A. Mishchenko, S. Chatterjee, and R. Brayton, Improvements to technology mapping for LUT-based FPGAs, in FPGA ’06: Proceedings of the 2006 ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays, pp. 41–49, ACM Press, New York, 2006.
5. J. Cong and Y. Ding, FlowMap: An optimal technology mapping algorithm for delay optimization in lookup-table based FPGA designs, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 13(1): 1–12, 1994.
6. V. Betz and J. Rose, VPR: A new packing, placement and routing tool for FPGA research, in Field-Programmable Logic and Applications (W. Luk, P. Y. Cheung, and M. Glesner, eds.), pp. 213–222, Springer-Verlag, Berlin, Germany, 1997.
7. A. S. Marquardt, V. Betz, and J. Rose, Using cluster-based logic blocks and timing-driven packing to improve FPGA speed and density, in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, pp. 37–46, 1999.
8. E. Bozorgzadeh, S. Ogrenci-Memik, and M. Sarrafzadeh, RPack: Routability-driven packing for cluster-based FPGAs, in Proceedings of the Asia-South Pacific Design Automation Conference, Yokohama, Japan, pp. 629–634, 2001.
9. A. Singh and M. Marek-Sadowska, Efficient circuit clustering for area and power reduction in FPGAs, in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, pp. 59–66, 2002.
10. A. DeHon, Balancing interconnect and computation in a reconfigurable computing array (or, why you don’t really want 100% LUT utilization), in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, pp. 69–78, 1999.
11. L. Cheng and M. D. F. Wong, Floorplan design for multi-million gate FPGAs, in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, pp. 292–299, 2004.
12. Y. Sankar and J. Rose, Trading quality for compile time: Ultra-fast placement for FPGAs, in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, San Jose, CA, pp. 157–166, 1999.
13. J. M. Emmert and D. Bhatia, A methodology for fast FPGA floorplanning, in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, pp. 47–56, 1999.
14. K. Bazargan, R. Kastner, and M. Sarrafzadeh, Fast template placement for reconfigurable computing systems, IEEE Design and Test, Special Issue on Reconfigurable Computing, 17: 68–83, January 2000.
15. E. L. Horta, J. W. Lockwood, D. E. Taylor, and D. Parlour, Dynamic hardware plugins in an FPGA with partial runtime reconfiguration, in Proceedings of the ACM/IEEE Design Automation Conference, New Orleans, LA, pp. 343–347, 2002.
16. J. Chen, J. Moon, and K. Bazargan, A reconfigurable FPGA-based readback signal generator for hard-drive read channel simulator, in Proceedings of the ACM/IEEE Design Automation Conference, New Orleans, LA, pp. 349–354, 2002.
17. M. Handa and R. Vemuri, An efficient algorithm for finding empty space for online FPGA placement, in Proceedings of the ACM/IEEE Design Automation Conference, San Diego, CA, pp. 960–965, 2004.
18. L. Singhal and E. Bozorgzadeh, Multi-layer floorplanning on a sequence of reconfigurable designs, in FPL’06: Proceedings of the 2006 International Conference on Field Programmable Logic and Applications, Madrid, 2006.
19. J. Cong, M. Romesis, and M. Xie, Optimality and stability study of timing-driven placement algorithms, in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, p. 472, 2003.
20. C.-L. E. Cheng, RISA: Accurate and efficient placement routability modeling, in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, pp. 690–695, 1994.
21. A. Marquardt, V. Betz, and J. Rose, Timing-driven placement for FPGAs, in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, pp. 203–213, 2000.
22. S. Nag and R. A. Rutenbar, Performance-driven simultaneous placement and routing for FPGA’s, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 17(6): 499–518, 1998.
23. P. Maidee, C. Ababei, and K. Bazargan, Timing-driven partitioning-based placement for island style FPGAs, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 24(3): 395–406, 2005.
24. S. A. Senouci, A. Amoura, H. Krupnova, and G. Saucier, Timing driven floorplanning on programmable hierarchical targets, in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, pp. 85–92, 1998.
25. M. Hutton, K. Adibsamii, and A. Leaver, Timing-driven placement for hierarchical programmable logic devices, in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, pp. 3–11, 2001.
26. G. Chen and J. Cong, Simultaneous timing-driven placement and duplication, in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, pp. 51–59, 2005.
27. D. P. Singh and S. D. Brown, Incremental placement for layout-driven optimizations on FPGAs, in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, pp. 752–759, 2002.
28. S.-W. Hur and J. Lillis, Mongrel: Hybrid techniques for standard cell placement, in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, pp. 165–170, 2000.
29. T. J. Callahan, P. Chong, A. DeHon, and J. Wawrzynek, Fast module mapping and placement for datapaths in FPGAs, in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, pp. 123–132, 1998.
30. C. Ababei and K. Bazargan, Non-contiguous linear placement for reconfigurable fabrics, International Journal of Embedded Systems (IJES), special issue on Reconfigurable Architectures Workshop (RAW), 2(1/2): 86–94, 2006.
31. M. Hutton, Y. Lin, and L. He, Placement and timing for FPGAs considering variations, in FPL’06: Proceedings of the 2006 International Conference on Field Programmable Logic and Applications, Madrid, 2006.
32. L. Cheng, J. Xiong, L. He, and M. Hutton, FPGA performance optimization via chipwise placement considering process variations, in FPL’06: Proceedings of the 2006 International Conference on Field Programmable Logic and Applications, Madrid, 2006.
33. C. Visweswariah, K. Ravindran, K. Kalafala, S. G. Walker, S. Narayan, D. K. Beece, J. Piaget, N. Venkateswaran, and J. G. Hemmett, First-order incremental block-based statistical timing analysis, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25: 2170–2180, October 2006.
34. Y. Lin and L. He, Stochastic physical synthesis for FPGAs with pre-routing interconnect uncertainty and process variation, in FPGA ’07: Proceedings of the 2007 ACM/SIGDA 15th International Symposium on Field Programmable Gate Arrays, pp. 80–88, ACM Press, New York, 2007.
35. A. Gayasen, Y. Tsai, N. Vijaykrishnan, M. Kandemir, M. J. Irwin, and T. Tuan, Reducing leakage energy in FPGAs using region-constrained placement, in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, pp. 51–58, 2004.
36. Y. Lin and L. He, Leakage efficient chip-level dual-Vdd assignment with time slack allocation for FPGA power reduction, in Proceedings of the ACM/IEEE Design Automation Conference, Anaheim, CA, pp. 720–725, 2005.
37. L. McMurchie and C. Ebeling, PathFinder: A negotiation-based performance-driven router for FPGAs, in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, pp. 473–482, 1995.
38. Y.-W. Chang, K. Zhu, and D. F. Wong, Timing-driven routing for symmetrical array-based FPGAs, ACM Transactions on Design Automation of Electronic Systems, 5(3): 433–450, 2000.
39. G.-J. Nam, K. A. Sakallah, and R. A. Rutenbar, A new FPGA detailed routing approach via search-based Boolean satisfiability, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 21(6): 674–684, 2002.
40. J.-M. Lin, S.-R. Pan, and Y.-W. Chang, Graph matching-based algorithms for array-based FPGA segmentation design and routing, in Proceedings of the Asia-South Pacific Design Automation Conference, Kitakyushu, Japan, pp. 851–854, 2003.
41. N. Sherwani, Algorithms for VLSI Physical Design Automation, 2nd edn., Kluwer Academic Publishers, Boston, MA, 1995.
42. K. Eguro and S. Hauck, Armada: Timing-driven pipeline-aware routing for FPGAs, in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, pp. 169–178, 2006.
43. P. Kannan, S. Balachandran, and D. Bhatia, On metrics for comparing routability estimation methods for FPGAs, in Proceedings of the ACM/IEEE Design Automation Conference, New Orleans, LA, pp. 70–75, 2002.
44. S. Sivaswamy and K. Bazargan, Variation-aware routing for FPGAs, in FPGA ’07: Proceedings of the 2007 ACM/SIGDA 15th International Symposium on Field Programmable Gate Arrays, pp. 71–79, ACM Press, New York, 2007.
45. Y. Matsumoto, M. Hioki, T. Kawanami, T. Tsutsumi, T. Nakagawa, T. Sekigawa, and H.
Koike, Performance and yield enhancement of FPGAs with within-die variation using multiple configurations, in FPGA ’07: Proceedings of the 2007 ACM/SIGDA 15th International Symposium on Field Programmable Gate Arrays, pp. 169–177, ACM Press, New York, 2007.

47 Physical Design for Three-Dimensional Circuits

Kia Bazargan and Sachin S. Sapatnekar

CONTENTS
47.1 Introduction
47.2 Standard Cell-Based Designs
47.2.1 Thermal Vias
47.2.2 3D Floorplanning
47.2.3 3D Placement
47.2.4 Routing Algorithms
47.3 3D FPGA Designs
47.3.1 Estimation Methods
47.3.2 Placement and Routing Algorithms
47.3.2.1 Partitioning the Circuit between Tiers
47.3.2.2 Partitioning-Based Placement within Tiers
47.3.2.3 Simulated Annealing Placement Phase
References

47.1 INTRODUCTION

Recent advances in process technology have brought three-dimensional (3D) circuits to the realm of reality. This new design paradigm will require a major change from contemporary design methodologies, because an optimal 3D design has very different characteristics from an optimal 2D design. The move from conventional 2D to 3D is inherently a topological change, and therefore, many of the problems that are unique to 3D circuits lie in the domain of physical design.

The essential idea of a 3D circuit is to place multiple tiers of active devices (transistors) above each other, as opposed to a conventional 2D circuit where all transistors and gates lie in a single tier. An example of a 3D circuit is shown in Figure 47.1.

One of the primary motivators for 3D technologies is related to the dominant effects of interconnects in nanoscale technologies, and the addition of a third dimension provides significant relief in this respect.
This is achieved by reductions in the average interconnect lengths (in comparison with 2D implementations, for the same circuit size), by lower wire congestion, and by denser integration, which results in the replacement of chip-to-chip interconnections by intrachip connections. In addition, the increased packing density improves the computation per unit volume.

[Figure 47.1 labels: Intratier wires, Devices, Intertier via, Silicon substrate, Tier 1, Tier 2, Tier 3, Tier 4]

FIGURE 47.1 Schematic of a 3D integrated circuit.

For instance, Figure 47.2 shows a 2D layout on a chip of dimension 2L × 2L on the left, where the longest (nondetoured) wire, going from one end of the layout to the other, has a length of 4L. If this design is built on four tiers, as shown at right, assuming the same total silicon area and a square aspect ratio for each tier, the silicon area in each tier is L × L. Therefore, the longest possible undetoured wirelength, going from one end in the lowest tier to the other end in the uppermost tier, is approximately 2L (because the intertier thickness is negligible). Because, for a buffered two-pin interconnect, the delay of a wire is proportional to its length, this implies that the delay is halved. Moreover, the reduced wire lengths also reduce the likelihood of congestion bottlenecks, potentially reducing the need to detour wires. A more precise distribution of the wirelength has been reported in Ref. [1], which shows that the histogram of wirelength distributions moves progressively to the left as the number of tiers is increased.

In addition, 3D designs can result in new paradigms, for example, heterogeneous integration, where each tier could be a different material (e.g., a silicon-based circuit on one tier and a GaAs-based circuit on another).
Even for purely silicon-based circuits, 3D designs permit analog/RF and digital circuits to be built on different tiers, which improves their noise behavior; additionally, it is possible to construct shielding structures such as Faraday cages between tiers for enhanced noise reduction.

Various flavors of 3D technologies have been proposed and are in use. One of the simplest forms involves wafer stacking, where the distance between active devices in the third dimension (or the “z dimension”) equals the thickness of a wafer. However, the thickness of a wafer is of the order of several hundreds of microns, and the full potential of 3D is not achieved by this approach due to the long distance that a wire must traverse in the z dimension. Further progress has resulted in the development of integrated 3D circuits in industrial [2], government [3], and academic [4] settings, which have demonstrated 3D designs with intertier separations of the order of a few microns. Today, it is only possible to build a few tiers in the third dimension, as a result of which many of these technologies are often referred to as 2.5D rather than fully 3D. Nevertheless, even the half dimension can provide the potential for substantial performance improvements, and perhaps future technological improvements will enable truly 3D integration.

In this chapter, we present an overview of physical design technologies for 3D circuits. We begin with a brief overview of a typical 3D technology, and then discuss physical design problems in the custom/ASIC design as well as the FPGA paradigms. Generally speaking, the number of tiers is taken as a technology input by the 3D tools described in this chapter.

[Figure 47.2 labels: a 2L × 2L 2D layout (left) and four stacked L × L tiers (right)]

FIGURE 47.2 Comparison of the maximum wirelength in a 2D layout (left) and in its 3D counterpart (right). For clarity, the intertier thicknesses in the 3D circuit are shown to be exaggeratedly large.
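The wirelength comparison of Figure 47.2 can be written out as a short calculation. Only the four-tier case comes from the text; the generalization to n tiers is an illustrative extension that assumes equal square tiers, the same total silicon area, Manhattan (rectilinear) wires, and negligible intertier thickness. The symbols $s_n$ (tier side length) and $\ell_{\max}$ (longest undetoured wirelength) are introduced here for the illustration.

```latex
\begin{align*}
\ell_{\max}^{\text{2D}} &= 2L + 2L = 4L, \\
s_n &= \sqrt{\frac{(2L)^2}{n}} = \frac{2L}{\sqrt{n}}, \qquad
\ell_{\max}^{(n)} \approx 2 s_n = \frac{4L}{\sqrt{n}}, \\
\ell_{\max}^{(4)} &\approx \frac{4L}{\sqrt{4}} = 2L = \tfrac{1}{2}\,\ell_{\max}^{\text{2D}}.
\end{align*}
```

Under the linear delay model for buffered two-pin wires stated in the text, the factor of one half in length carries over directly to the delay of the longest wire.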
47.2 STANDARD CELL-BASED DESIGNS

A typical cell-based flow begins with a floorplanning step, where the system is laid out at the level of macroblocks, followed by detailed placement of the cells in the layout, and then routing. In the 3D context, each of these must be modified to adapt to the constraints imposed by 3D circuits. In addition to conventional metrics, 3D-specific geometrical considerations must be used, for example, for wirelength metrics. In addition, temperature is treated as a first-class citizen during these optimizations.∗ Moreover, intertier via reduction is considered to be a desirable goal, because the number of available vias is restricted and must be shared between signal nets and supply and clock nets.

In addition to floorplanning, placement, and routing, a 3D-specific optimization that makes the temperature distribution more uniform is the judicious positioning of thermal vias within the layout. These vias correspond to intertier metal connections that have no electrical function, but instead constitute a passive cooling technology that draws heat from the problem areas to the heat sink. Thermal via insertion can be built into each of these steps or performed as an independent postprocessing step, depending on the design methodology.

It is instructive to view the result of a typical 3D thermally aware placement [5]: a layout for the benchmark circuit, ibm01, in a four-tier 3D process, is displayed in Figure 47.3. The cells are positioned in ordered rows on each tier, and the layout in each individual tier looks similar to a 2D standard cell layout. The heat sink is placed at the bottom of the 3D chip, and the lighter shaded regions are hotter than the darker shaded regions. The coolest cells are those in the bottom tier, next to the heat sink, and the temperature increases as we move to higher tiers.
The thermal placement method consciously mitigates the temperature by making the upper tiers sparser, in terms of the percentage of area populated by the cells, than the lower tiers.

∗ A description of techniques for thermal analysis is provided in Section 3.4.

[Figure 47.3: 3D plot of the four-tier placement, shaded from Cool to Hot; axis tick values omitted]

FIGURE 47.3 Placement for the benchmark ibm01 in a four-tier 3D technology. (From Ababei, C., et al., IEEE Design and Test, 22, 520, 2005. Copyright IEEE. With permission.)

47.2.1 THERMAL VIAS

Although silicon is a good thermal conductor, with half or more of the conductivity of typical metals, many of the materials used in 3D technologies are strong insulators that place severe restrictions on the amount of heat that can be removed, even under the best placement solution. These materials include epoxy bonding materials used to attach 3D tiers, field oxide, and the insulator in an SOI technology. Therefore, the use of deliberate metal lines that serve as heat-removing channels, called thermal vias, is an important ingredient of the total thermal solution. The second step in the flow determines the optimal positions of thermal vias in the placement that provide an overall improvement in the temperature distribution. In realistic 3D technologies, the footprints of these intertier vias are of the order of 5 µm × 5 µm.

In principle, the problem of placing thermal vias can be viewed as one of determining one of two conductivities (corresponding to the presence or absence of metal) at every candidate point where a thermal via may be placed in the chip.
However, in practice, it is easy to see that such an approach could lead to an extremely large search space that is exponential in the number of possible positions; note that the set of possible positions in itself is extremely large. Quite apart from the size of the search space, such an approach is unrealistic for several other reasons. First, the wanton addition of thermal vias in any arbitrary region of the layout would lead to nightmares for a router, which would have to navigate around these blockages. Second, from a practical standpoint, it is unreasonable to perform full-chip thermal analysis, particularly in the inner loop of an optimizer, at the granularity of individual thermal vias. At this level of detail, individual elements would have to correspond to the size of a thermal via, and the size of the thermal simulation matrix would become extremely large.

Fortunately, there are reasonable ways to overcome each of these issues. The blockage problem may be controlled by enforcing discipline within the design, designating a specific set of areas within the chip as potential thermal via sites. These could be chosen as specific interrow regions in the cell-based layout, and the optimizer would determine the density with which these are filled with thermal vias. The advantage to the router is obvious: only these regions are potential blockages, which is much easier to handle. To control the finite element analysis (FEA) stiffness matrix size, one could work with a two-level scheme with relatively large elements, where the average thermal conductivity of each region is a design variable. Once this average conductivity is chosen, it could be translated back into a precise distribution of thermal vias within the element that achieves that average conductivity.

Various published methods take different approaches to thermal via insertion.
We now describe an algorithm for post facto thermal via insertion [6]; other procedures that perform thermal via insertion during floorplanning, placement, or routing are discussed in the appropriate sections.

For a given placed 3D circuit, an iterative method was developed in which, during each iteration, the thermal conductivities of certain FEA elements (thermal via regions) are incrementally modified so that thermal problems are reduced or eliminated. Thermal vias are generically added to elements to achieve the desired thermal conductivities. The goal of this method is to satisfy given thermal requirements using as few thermal vias as possible, that is, keeping the thermal conductivities as low as possible.

The approach uses the finite element equations to determine a target thermal conductivity. A key observation in this work is that the insertion of thermal vias is most useful in areas with a high thermal gradient, rather than areas with a high temperature. Effectively, the thermal via acts as a pipe that allows the heat to be conducted from the higher temperature region to the lower temperature region; this, in turn, leads to temperature reductions in areas of high temperature.

This is illustrated in Figure 47.4, which shows the 3D layout of the benchmark struct, before and after the addition of thermal vias. The hottest region is the center of the uppermost tier, and a major reason for its elevated temperature is that the tier below it is hot. Adding thermal vias to remove heat from the second tier, therefore, also significantly reduces the temperature of the top tier. For this reason, the regions where the insertion of thermal vias is most effective are those that have high thermal gradients. Therefore, the method in Ref.
[6] employs an iterative update formula of the type

$$K_i^{\mathrm{new}} = K_i^{\mathrm{old}} \, \frac{g_i^{\mathrm{old}}}{g_{i,\mathrm{ideal}}}, \qquad i = x, y, z \qquad (47.1)$$

where $K_i^{\mathrm{new}}$ and $K_i^{\mathrm{old}}$ are, respectively, the new and old thermal conductivities in each direction, after and before each iteration, $g_i^{\mathrm{old}}$ is the old thermal gradient, and $g_{i,\mathrm{ideal}}$ is a heuristically selected ideal thermal gradient.

[Figure 47.4: two 3D thermal plots with x, y, and z axes, titled "Before thermal via placement" and "After thermal via placement"; axis tick values omitted]

FIGURE 47.4 Thermal profile of struct before (left) and after (right) thermal via insertion. The top four layers of the figure at right correspond to the four layers in the figure at left. (From Goplen, B. and Sapatnekar, S. S., IEEE Transactions on Computer-Aided Design, 26, 692, 2006. Copyright IEEE. With permission.)

Each iteration begins with a distribution of the thermal vias; this distribution is corrected using the above update formula, and the $K_i^{\mathrm{new}}$ value is then translated to a thermal via density, and then to a precise layout of thermal vias, using precharacterization. The iterations end when the desired temperature profile is achieved. This essential iterative idea has also been used in several other published techniques that insert thermal vias either as an independent step or concurrently within floorplanning, placement, and routing, as described in succeeding sections.
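The update of Equation 47.1 can be illustrated with a toy sketch. This is not the FEA-based procedure of Ref. [6]: the "thermal analysis" here is a stand-in that assumes each element carries a fixed heat flux q, so that the gradient magnitude is q / K (Fourier's law), and the loop stops when the conductivities stop changing rather than when a target temperature profile is met. The function names, the clamping bounds `k_min` and `k_max` (conductivity with no vias and with the via region full), and the use of the gradient magnitude are all assumptions made for the illustration.

```python
def update_conductivity(k_old, g_old, g_ideal, k_min, k_max):
    """Apply Eq. (47.1) to one element in one direction, then clamp.

    k_min / k_max are hypothetical bounds: conductivity with no thermal
    vias and with the via region completely filled.
    """
    k_new = k_old * abs(g_old) / g_ideal  # gradient magnitude is an assumption
    return min(max(k_new, k_min), k_max)


def size_thermal_vias(heat_flux, k_init, g_ideal, k_min, k_max,
                      max_iters=50, tol=1e-9):
    """Iterate the update on independent elements until K stops changing.

    heat_flux[e] is the (assumed fixed) flux through element e, so the toy
    thermal analysis is simply |g| = q / K for each element.
    """
    k = list(k_init)
    for _ in range(max_iters):
        gradients = [q / ki for q, ki in zip(heat_flux, k)]
        k_next = [update_conductivity(ki, gi, g_ideal, k_min, k_max)
                  for ki, gi in zip(k, gradients)]
        converged = max(abs(a - b) for a, b in zip(k, k_next)) < tol
        k = k_next
        if converged:
            break
    return k


if __name__ == "__main__":
    # Three elements with increasing heat flux; the last one saturates at k_max.
    print(size_thermal_vias([100.0, 400.0, 2000.0], [1.0, 1.0, 1.0],
                            g_ideal=50.0, k_min=1.0, k_max=10.0))
```

In this toy model an element whose gradient exceeds the ideal value receives a higher conductivity (more vias), and an element that cannot reach the ideal gradient saturates at `k_max`. In a real flow the resulting conductivity would then be mapped to a via density by precharacterization, as the text describes.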
47.2.2 3D FLOORPLANNING

The 3D floorplanning problem is analogous to the 2D problem discussed in Chapters 8 through 13, with all the constraints and opportunities that arise with the move to the third dimension. Typical cost functions include a mix of the conventional wirelength and total area costs, together with the temperature and the number of intertier vias.

Ref. [7] presented one of the first approaches to 3D floorplanning, using the transitive closure graph (TCG) representation [8], described in Section 11.7, for each tier, and a bucket structure for the third dimension. Each bucket represents a 2D region over all tiers, and stores, for each tier, the indices of the blocks that intersect that bucket. Together, the TCGs and this bucket structure can quickly determine any adjacency information. A simulated annealing engine is then utilized, with the moves corresponding to perturbations within a tier and across tiers; in each such case, the corresponding TCGs and buckets are updated, as necessary.

A simple thermal analysis procedure is built into this solution, using a finite difference approximation to build an RC thermal network. Under the assumption that heat flows purely in the z direction and there is no lateral heat conduction, the RC model obtained from a finite difference approximation has a tree structure, and Elmore-like computations (Section 47.3.1) can be performed to determine the temperature. The optimization heuristically attempts to make this a self-fulfilling assumption by introducing a cost function parameter that discourages strong horizontal gradients, and hence lateral heat conduction. A hybrid approach performs an exact thermal analysis once every 20 iterations or so and uses the approximate approach for the other iterations.

The work in Ref. [9] expands the idea of thermally driven floorplanning by integrating thermal via insertion into the simulated annealing procedure.
A thermal analysis procedure based on random walks [10] is built into the method, and an iterative formula, similar to that of Ref. [6], is used in a thermal-via insertion step between successive simulated annealing iterations.

47.2.3 3D PLACEMENT

In the placement step, the precise positions of cells in a layout are determined, and they are arranged in rows within the tiers of the 3D circuit. Because thermal considerations are particularly important in 3D cell-based circuits, this procedure must spread the cells to achieve a reasonable temperature distribution, while also capturing traditional placement requirements.

Several approaches to 3D placement have been proposed in the literature. The work in Ref. [11] embeds the netlist hypergraph into the layout area. A recursive bipartitioning procedure is used to assign nodes of the hypergraph to partitions, using mincut as the primary objective, under partition capacity constraints. Partitioning in the z direction corresponds to tier assignment, and xy partitions to assigning standard cells to rows. No thermal considerations are taken into account.

The procedure in Ref. [5] presents a 3D-specific force-directed placer that incorporates thermal objectives directly into the placer. Instead of the finite difference method that is used in many floorplanners, this approach employs FEA, which discretizes the design space into regions known as elements. For rectangular structures of the type encountered in integrated circuits, a rectangular cuboidal element can simulate heat conduction in the lateral directions without aberrations in the prime directions.
As described in Chapter 3, FEA results in a matrix equation of the type

$$KT = P \qquad (47.2)$$

The left-hand side matrix, K, known as the global stiffness matrix, can be constructed using stamps for the finite elements and the boundary conditions. The FEA equations are solved rapidly using an iterative linear solver, with clever adjustments of the convergence criteria to achieve greater or lesser accuracy, as required at different stages of the iterative placement process.

The placement engine is based on a force-directed approach, the key idea of which is described in Chapter 18. Attractive forces are created between interconnected cells, and these are proportional to the quadratic function of the cell coordinates that represents the Euclidean distance between the blocks. The constants of proportionality are chosen to be higher in the z direction to discourage intertier vias. Apart from design criteria such as cell overlap, in the 3D context, thermal criteria are also used to generate repulsive forces, to prevent hot spots. The temperature gradient (which itself can be related to the stiffness matrix and its derivative) is used to determine the magnitudes and directions of these forces. Once the entire system of attractive and repulsive forces is generated, the system is solved for the minimum energy state, that is, the equilibrium location. Ideally, this minimizes the wirelengths while at the same time satisfying the other design criteria such as the temperature distribution.

The main loop of the iterative force-directed approach proceeds as follows. Initially, forces are updated based on the previous placement. Using these new forces, the cell positions are then calculated. These two steps of calculating forces and finding cell positions are repeated until the exit criteria are satisfied. The specifics of the force-directed approach to thermal placement, including the mathematical details, are presented in Ref.
[5]. Once the iterations converge, a final postprocessing step is used to legalize the placement. Even though forces have been added to discourage overlaps, the force-directed engine solves the problem in the continuous domain, and the task of legalization is to align cells to tiers, and to rows within each tier.

Another method in Ref. [12] maps an existing 2D placement to a 3D placement through transformations based on dividing the layout into 2^k regions, for integer values of k, and then defining local transformations to heuristically refine the layout.

More recent work in Ref. [13] observes that because 3D layouts have very limited flexibility in the third dimension (with a small number of layers and a fixed set of discrete locations), partitioning works better than a force-directed method. Accordingly, this work performs global placement using recursive bisectioning. Thermal effects are incorporated through thermal resistance reduction nets, which are attractive forces that induce high-power nets to remain close to the heat sink. The global placement step is followed by coarse legalization, in which a novel cell-shifting approach is proposed. This generalizes the methods in FastPlace, described in Chapter 18, by allowing shift moves to adjust the boundaries of both sparsely and densely populated cells using a computationally simple method. Finally, detailed legalization generates a final nonoverlapping layout. The approach is shown to provide excellent trade-offs between parameters such as the number of interlayer vias, wirelength, and temperature.
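The force-directed loop of Ref. [5] described earlier in this section (update forces from the previous placement, compute new cell positions, repeat until an exit criterion holds, then legalize) can be sketched in a heavily simplified form. Everything specific here is an assumption made for illustration: nets are two-pin springs, the repulsive force is a crude pairwise repulsion between power-dissipating cells rather than the temperature-gradient force of Ref. [5], the damped update and the names `repulsive_forces`, `place`, and `legalize` are hypothetical, and legalization only snaps cells to tiers and rows without removing overlaps.

```python
AXES = 3  # x, y, z


def repulsive_forces(pos, power, c_rep, eps=1e-6):
    """Pairwise repulsion scaled by cell powers (a stand-in for thermal forces)."""
    names = sorted(pos)
    force = {n: [0.0] * AXES for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = [pos[a][k] - pos[b][k] for k in range(AXES)]
            scale = c_rep * power[a] * power[b] / (sum(x * x for x in d) + eps)
            for k in range(AXES):
                force[a][k] += scale * d[k]
                force[b][k] -= scale * d[k]
    return force


def place(pos, fixed, nets, power, kz=4.0, c_rep=0.0, damping=0.5,
          tol=1e-6, max_iters=1000):
    """pos: movable cell -> [x, y, z]; fixed: pad -> (x, y, z);
    nets: (name, name, weight) two-pin connections."""
    pos = {n: list(p) for n, p in pos.items()}
    stiff = (1.0, 1.0, kz)  # stiffer springs in z discourage intertier vias
    for _ in range(max_iters):
        force = repulsive_forces(pos, power, c_rep)  # step 1: update forces
        moved = 0.0
        for n in sorted(pos):                        # step 2: new positions
            for k in range(AXES):
                num, den = force[n][k], 0.0
                for a, b, w in nets:
                    if n in (a, b):
                        other = b if a == n else a
                        q = pos[other] if other in pos else fixed[other]
                        num += w * stiff[k] * q[k]
                        den += w * stiff[k]
                if den == 0.0:
                    continue  # unconnected cell: leave it where it is
                # Damped move toward the force-balance point on this axis.
                step = damping * (num / den - pos[n][k])
                pos[n][k] += step
                moved = max(moved, abs(step))
        if moved < tol:                              # exit criterion
            break
    return pos


def legalize(pos, n_tiers, row_pitch):
    """Snap z to the nearest tier and y to the nearest row."""
    legal = {}
    for n, (x, y, z) in pos.items():
        tier = min(n_tiers - 1, max(0, round(z)))
        legal[n] = (x, round(y / row_pitch) * row_pitch, tier)
    return legal


# Hypothetical example 1: one cell pulled toward two pads with unequal weights.
pads = {"p0": (0.0, 0.0, 0.0), "p1": (4.0, 4.0, 1.0)}
solo = place({"a": [0.0, 0.0, 0.0]}, pads,
             [("a", "p0", 3.0), ("a", "p1", 1.0)], {"a": 1.0})
solo_legal = legalize(solo, n_tiers=2, row_pitch=1.0)

# Hypothetical example 2: two hot cells share the same pads and are pushed apart.
pads2 = {"p0": (0.0, 0.0, 0.0), "p1": (4.0, 0.0, 0.0)}
nets2 = [(c, p, 1.0) for c in ("a", "b") for p in ("p0", "p1")]
pair = place({"a": [1.9, 0.0, 0.0], "b": [2.1, 0.0, 0.0]}, pads2, nets2,
             {"a": 1.0, "b": 1.0}, c_rep=1.0)
separation = abs(pair["a"][0] - pair["b"][0])
```

In the first example the cell settles near the weighted average of its pads and is then snapped to tier 0. In the second, the two cells would collapse onto the same point without repulsion, whereas with it they settle about one unit apart. The damping factor is a choice of this sketch: without it, the simple fixed-point update can oscillate between two placements instead of converging.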
47.2.4 ROUTING ALGORITHMS

During routing, several objectives and constraints must be taken into consideration, including avoiding blockages due to areas occupied by thermal vias, incorporating the effect of temperature on the delays of the routed wires, and of course, traditional objectives such as wirelength, timing, congestion, and routing completion.

Once the cells have been placed and the locations of the thermal vias determined, the routing stage finds the interconnections between the cells. As in 2D routing, it is important to optimize the wirelength, the delay, and the congestion. In addition, several 3D-specific issues come into play. First, the delay of a wire increases with its temperature, so that more critical wires should avoid the hottest regions, as far as possible. Second, intertier vias are a valuable resource that must be optimally allocated among the nets. Third, congestion management and blockage avoidance are more complex with the addition of a third dimension. For instance, a signal via or thermal via that spans two or more tiers constitutes a blockage that wires must navigate around.

Consider the problem of routing in a three-tier technology, as illustrated in Figure 47.5. The layout is gridded into rectangular tiles, each with a horizontal and vertical capacity that determines the number of wires that can traverse the tile, and an intertier via capacity that determines the number of free vias available in that tile. These capacities account for the resources allocated for nonsignal wires (e.g., power and clock wires) as well as the resources used by thermal vias. For a single net, as shown in the figure, the available degrees of freedom lie in choosing the locations of the intertier vias and selecting the precise routes within each tier. The locations of intertier vias will depend on the resource contention for vias within each grid tile. Moreover, critical wires should avoid the high-temperature tiles, as far as possible.
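The tile model described above can be captured in a small data structure. The following is a minimal sketch under stated assumptions rather than the formulation of any cited router: the `Tile` fields, the weights `w_cong` and `w_temp`, the additive cost form, and the helper `best_via_tile` are all hypothetical, and the example considers only a two-pin net crossing between two adjacent tiers.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    h_cap: int                 # horizontal wire capacity left for signal nets
    v_cap: int                 # vertical wire capacity left for signal nets
    via_cap: int               # intertier vias left after thermal vias etc.
    via_used: int = 0          # intertier vias already taken by signal nets
    temperature: float = 0.0   # normalized tile temperature (assumed 0..1)


def via_cost(tile, criticality, w_cong=1.0, w_temp=1.0):
    """Cost of placing one more intertier via in this tile."""
    if tile.via_used >= tile.via_cap:
        return float("inf")    # no free via: the tile acts as a blockage
    congestion = (tile.via_used + 1) / tile.via_cap
    # Critical nets pay more for hot tiles; noncritical nets ignore temperature.
    return w_cong * congestion + w_temp * criticality * tile.temperature


def best_via_tile(grid, src, dst, criticality):
    """Choose the (x, y) tile for the via of a two-pin net whose pins sit at
    src and dst on adjacent tiers, trading wirelength against via cost."""
    best, best_cost = None, float("inf")
    for (x, y), tile in grid.items():
        wirelength = (abs(x - src[0]) + abs(y - src[1])
                      + abs(x - dst[0]) + abs(y - dst[1]))
        cost = wirelength + via_cost(tile, criticality)
        if cost < best_cost:
            best, best_cost = (x, y), cost
    return best


# Hypothetical 3x1 grid: the first tile has no free vias, the middle tile
# is hot but lightly used, the last tile is cool but nearly full.
grid = {
    (0, 0): Tile(h_cap=8, v_cap=8, via_cap=0),
    (1, 0): Tile(h_cap=8, v_cap=8, via_cap=4, via_used=1, temperature=0.9),
    (2, 0): Tile(h_cap=8, v_cap=8, via_cap=4, via_used=3, temperature=0.1),
}
noncritical_choice = best_via_tile(grid, (0, 0), (2, 0), criticality=0.0)
critical_choice = best_via_tile(grid, (0, 0), (2, 0), criticality=2.0)
```

All three tiles give the same wirelength in this example, so the choice is driven by via contention and temperature: the noncritical net takes the lightly used middle tile, while the critical net accepts the more congested but cooler tile.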
FIGURE 47.5 Example route for a net in a three-tier 3D technology, showing Tier 1, Tier 2, and Tier 3. (From Ababei, C., et al., IEEE Design and Test, 22, 520, 2005. Copyright IEEE. With permission.)

The work in Ref. [14] presents a thermally conscious router, using a multilevel routing paradigm similar to Refs. [15,16], with integrated intertier via planning and incorporating thermal considerations. An initial routing solution is constructed by building a 3D minimum spanning tree (MST) for each multipin net, and using maze routing to avoid obstacles. At each level of the multilevel scheme, the intertier via planning problem assigns vias in a given region at level k − 1 of the multilevel hierarchy to tiles at level k. The problem is formulated as