Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C039 Finals Page 832 29-9-2008 #21

FIGURE 39.14 Coarse-grained power gating with macro/core. GND = ground; VDD = voltage drain drain; VGND = virtual ground. (Figure labels: M1 metal, M2 metal, footer locations.)

Fine-grained power gating, as shown in Figure 39.15, where the footer switches are implemented within the logic in a regular layout, is more desirable in a high-performance design where the voltage drop across the power gate as well as IR and EM (electromigration) requirements are more stringent.

39.7 INTO THE FUTURE

We have summarized the current workings of IBM’s physical synthesis capabilities to display the complexities of such a system. Physical synthesis requires a seamless integration of many previously separate design automation domains, such as optimization, placement, timing, extraction, and routing.

However, as technology progresses toward 45 nm and beyond, more will be demanded of physical synthesis. It must be dynamic and must constantly adapt to changing technologies, design styles, and design specifications. Timing closure will continue to evolve into the even more complex problem of design closure. Design closure requires that accurate modeling of the clock tree network and routing be incorporated earlier and earlier in the physical synthesis pipeline to take into account their effects on timing and signal integrity. Meeting global power constraints, using multithreshold voltages, voltage islands, power gating, etc., also becomes more critical. One must pay attention to how physical-design choices relate to chip fabrication, so design for manufacturability and handling of variability will become increasingly important. Optimizations must become more sophisticated to take these additional objectives into account.
Increasing chip sizes, together with the additional requirements on physical synthesis to meet and incorporate these constraints, also make it harder to run efficiently, which is perhaps another reason why hierarchical design is becoming more prevalent.

FIGURE 39.15 Fine-grained power gating within a block. GND = ground and VGND = virtual ground. (Figure labels: virtual grid, global grid, macro/core, M1 metal, M2 metal, footer switch location.)

40 X Architecture Place and Route: Physical Design for the X Interconnect Architecture

Steve Teig, Asmus Hetzel, Joseph Ganley, Jon Frankle, and Aki Fujimura

CONTENTS
40.1 Introduction
40.2 History
40.3 X Interconnect Architecture
40.4 Theoretical Benefits of X
40.5 Limitations of X
40.6 Role of Vias
40.7 System for X Place and Route
40.8 X Placement
40.9 X Global Routing
40.10 X Detailed Routing
40.10.1 Routing Space Model and Search Algorithm
40.10.2 Manufacturing-Constrained Routing
40.11 X Steiner Trees
40.12 X Manufacturing Considerations
40.13 X in Practice
40.14 Summary
References

40.1 INTRODUCTION

For 40 years, physical design dogma has decreed that (nearly) every wire on a chip be Manhattan, that is, either horizontal or vertical, despite the intuition that adding diagonal wires should significantly reduce the total interconnect required to implement a design. This chapter briefly provides some historical context for the ubiquity of Manhattan wiring and then introduces and explores the X interconnect architecture [IML+02,T02], which combines diagonal and Manhattan wires pervasively to improve IC layout.
Realizing the theoretical benefits of X, which are substantial and surprisingly diverse, has proven quite challenging in practice, demanding both a careful reexamination and refinement of the manufacturing flow and a material revision of almost every tool in the physical design flow. For example, X place and route (XPR), the X system described here, has required new methods for floorplanning, wire load estimation in synthesis, global placement, global routing, detailed placement, detailed routing, parasitic extraction, OPC, clock routing, power routing, geometric search, and even database representation. The novelty of these methods has resulted in numerous patents. Even as X has begun to see commercial use, with X chips from Toshiba and ATI, among others, many challenges remain, and X-enabled physical design techniques should become an increasingly active area of research in both academia and industry in the coming years.

40.2 HISTORY

The transistors for almost every chip are distributed across the chip’s two-dimensional surface, so at least two routing directions are required for the flexible interconnection of those transistors. Until the mid-1980s, it was not practical to have more than one or two layers of metal wiring above the polysilicon, or poly, layer. Because chips are generally rectangular so that the silicon wafers on which they are manufactured can be efficiently diced, it seemed logical to make the poly layer and the one metal layer (M1) orthogonal and axis-parallel.∗ The second metal layer, M2, where it was used, was typically orthogonal to M1. Very short diagonal jogs were used in some channel routers and to miter the corners in the power rail through the I/O pads, but general diagonal wiring was not seriously considered.
With the introduction in the late 1980s of planarization techniques to IC manufacturing, an additional layer of metal, M3, became widely available, which single-handedly precipitated the invention of modern, area-based, place-and-route methods and their rapid dominance over the row-and-channel-based techniques that preceded them. Although hexagonal wiring (i.e., 0°, 120°, and 240°), now sometimes called the Y architecture [CCK+03], could, in principle, have been very powerful even then, the newness of multilayer, Manhattan IC manufacturing caused the fabs to fear the added complexity of adding diagonal wires as well. So, despite a few theoretical papers (e.g., [K95] and references therein) and patents [FHM+98,SKK+96] (never reduced to practice) in the mid-1990s, there was no significant, practical exploration of diagonal wiring, either in manufacturing or in EDA, before the X effort at Simplex Solutions in the 1990s. Fortunately, the number of wiring layers increased sharply in the 1990s, which motivated a reexamination of the possible benefits of diagonal wiring and the practical hurdles involved in realizing them.

There are two apparent challenges in producing chips with diagonal wires: manufacturing and EDA. As it happens, for some diagonal interconnect architectures, such as those that add 45° and 135° wires to Manhattan wires, mainstream manufacturing flows and equipment enable reasonably straightforward fabrication. Physical design for diagonal wiring, on the other hand, is considerably more challenging than it first appears, and XPR, the first full-fledged physical design system of this kind, required dozens of person-years of development before reaching industrial-strength maturity. The sections that follow describe the X interconnect architecture, which uses both Manhattan and diagonal wires pervasively, and introduce some of the many EDA innovations that were required to reduce X to practice.
∗ In fact, even in the two-layer case, the total wirelength would be reduced by about 1 percent by making the two wiring directions be northeast and northwest instead, but the complexity of doing so outweighs a mere 1 percent benefit.

40.3 X INTERCONNECT ARCHITECTURE

Moore’s law has been largely driven by the regularity and rapidity with which fabs have been able to shrink transistors and wires while still manufacturing them reliably. Even with the 20-fold shrink from the 5-µ (∼1982) to the 0.25-µ (∼1998) technology node, transistors continued to dominate area, performance, and power. But the emergence of nanometer technologies (e.g., 180, 130, 90 nm, etc.) starting in the late 1990s so sharply increased the relative impact of wires versus transistors on area, performance, and power as to necessitate radical innovations in the interconnect.

The severity of the interconnect problem at nanometer nodes is evident from the replacement of aluminum wires on chips with copper: a multibillion-dollar effort that required new chemistry, new equipment, and revised EDA tools that collectively reduced interconnect delay by ∼50 percent. Of course, changes of wiring material are not the only tools available, and other radical suggestions, such as the pervasive use of diagonal wires, reemerged as worthy of consideration.

Just as with the introduction of copper wiring, though, the benefits of diagonal wires carry significant practical challenges with them. For one, most of the equipment for the generation of masks and the manufacture of chips relies on an underlying, fine, discrete grid.
All manufactured structures must have their edges on this grid, which was designed for Manhattan wiring but also directly permits 45° (northeast or NE) and 135° (northwest or NW) wires, because a diagonal wire of length $k\sqrt{2}$, where $k$ is a positive integer, resolves into $k$ units in each of the horizontal and vertical directions. The apparent manufacturability of NE and NW wires suggested that an interconnect architecture with Manhattan, NE, and NW wires could provide the benefits of diagonals in a practical implementation. By contrast, the griddedness of manufacturing is a significant obstacle to the practicality of hexagonal wiring, for example, as the edges of the wires do not lie on the grid. Further, at least one of the legs of a 30–60–90 right triangle must be irrational; that is, the $\sqrt{3}$ term does not cancel for hexagonal wiring the way that the $\sqrt{2}$ terms do for the X architecture.

A second consideration is that lower layers of wiring (e.g., M1 and M2) are typically used for the internal connections of the standard cells and IP blocks themselves. Because cells and blocks are overwhelmingly Manhattan, there is no significant benefit in adding diagonals to those wiring layers for connections between cells. On the other hand, upper layers of metal are specifically intended for connections between cells, so it is more reasonable to combine Manhattan and diagonal wires there.

An X interconnect architecture is one that combines the extensive use of both Manhattan wires and diagonal wires. In practice, distinct wiring layers (or, at least, regions within those wiring layers) are devoted to particular Manhattan or diagonal preferred directions. While one can construct an X architecture with each layer having wires rotated 45° with respect to the previous layer, it is improbable that such structures outperform those in which layers are stacked in orthogonal pairs.
Further, the desire for backward compatibility with Manhattan-based infrastructure, such as standard cell libraries, has strongly influenced the application of diagonals in practice.

The above observations (particularly backward compatibility) motivated the typical use in practice of X interconnect architectures in which M1–M3 are Manhattan, typically alternating between horizontal and vertical preferred directions on successive layers, M4 is diagonal (e.g., NE), and M5 is perpendicular to M4 (e.g., NW). If there are additional metal layers, such as M6 and M7, they are again Manhattan, while M8 and M9 could be diagonal. Thus,

1. Every layer is adjacent to a perpendicular layer (to facilitate efficient local wiring)
2. Every sequential quartet of layers above M1 contains a horizontal, a vertical, a NE, and a NW (to facilitate efficient, larger-scale wiring)

An interconnect architecture that satisfies these two properties makes the name “X” apparent, owing to the novel, X-shaped intersection of adjacent diagonal layers. However, the term and the techniques described in this chapter apply more generally to any interconnect architecture in which both Manhattan and diagonal wires play a significant role.

40.4 THEORETICAL BENEFITS OF X

X, both in theory and in practice, reduces chip area, wirelength, delay, coupling capacitance, and power versus Manhattan, sometimes to such an extent that a full layer of interconnect can be removed. To estimate these benefits analytically, it is easiest to begin with wirelength reduction. Most current placement algorithms minimize some approximation to the sum over all nets of the semiperimeter of each net’s Manhattan bounding box.∗ As a result, placers optimize a correlate to wirelength, possibly weighted by timing considerations or congestion, but one might expect that they are unbiased with respect to bounding-box aspect ratios. To test this hypothesis, we examined dozens of industrial, Manhattan designs early in the X project and found that, in fact, aspect ratios are distributed uniformly.

At least 55–60 percent of nets in contemporary designs are two-pin nets even before placement, and the increasing use of buffers has sharply increased that percentage after placement. So, as a simple but reasonable estimate of the wirelength reduction versus Manhattan to expect with X, one can estimate the expected reduction in the wirelength of a two-pin net with a random aspect ratio. In the Manhattan metric, a circle (i.e., the locus of points at a fixed distance from a specified center point) is diamond-shaped; for the X metric, the circle is octagonal. Consider a connection with extent $(|x|, |y|)$. Without loss of generality, suppose that $|x| \ge |y|$; then, the Manhattan-to-X wirelength reduction for this connection is $1 - [|x| + |y|(\sqrt{2} - 1)]/(|x| + |y|)$. The expected value of this reduction, assuming that the source is at the center of a diamond and the sink’s location is uniformly distributed around the periphery, is [T02]

$$1 - \frac{\int_0^{1/2} \left[(\sqrt{2} - 2)\,y + 1\right] dy}{\int_0^{1/2} dy} \approx 14.6 \text{ percent},$$

where the diamond is normalized so that $|x| + |y| = 1$ and $y = |y|$ ranges over $[0, 1/2]$.

Importantly, this analysis assumes that only the router has changed, yet the real benefits of X become apparent only by making the whole system X-aware. For example, suppose that the placer is X-aware too; then, by analogy to the Manhattan system, the placer for X minimizes wirelength but is unbiased with respect to aspect ratio. To model an X placer analytically, rearrange the components that would have been placed within the Manhattan disk of radius $r$ (i.e., filled diamond) to occupy an X disk (i.e., filled octagon) of equal area [T02].
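The router-only, two-pin estimate above can be checked numerically. The following is a minimal sketch, not taken from the chapter: the function names (`manhattan_length`, `x_length`, `reduction`, `expected_reduction`) are hypothetical, and it assumes, as the text does, a unit diamond with the sink uniformly distributed over one octant of its periphery.

```python
import math

SQRT2 = math.sqrt(2.0)

def manhattan_length(dx: float, dy: float) -> float:
    """Wirelength of a two-pin connection using only horizontal and vertical wires."""
    return abs(dx) + abs(dy)

def x_length(dx: float, dy: float) -> float:
    """Wirelength when 45-degree wires are also allowed: max + (sqrt(2) - 1) * min."""
    long_side = max(abs(dx), abs(dy))
    short_side = min(abs(dx), abs(dy))
    return long_side + (SQRT2 - 1.0) * short_side

def reduction(dx: float, dy: float) -> float:
    """Fractional Manhattan-to-X wirelength reduction for one connection."""
    return 1.0 - x_length(dx, dy) / manhattan_length(dx, dy)

def expected_reduction(samples: int = 10_000) -> float:
    """Midpoint-rule average over sinks with |x| + |y| = 1 and y in [0, 1/2]."""
    total = 0.0
    for i in range(samples):
        y = 0.5 * (i + 0.5) / samples  # y uniform on [0, 1/2]
        x = 1.0 - y                    # stay on the unit diamond
        total += reduction(x, y)
    return total / samples

if __name__ == "__main__":
    print(f"axis-aligned connection: {reduction(1.0, 0.0):.4f}")
    print(f"square bounding box:     {reduction(1.0, 1.0):.4f}")
    print(f"expected (two-pin nets): {expected_reduction():.4f}")
```

Because the integrand is linear in $y$, the average equals $(2 - \sqrt{2})/4$, which is the 14.6 percent figure quoted in the text; a purely horizontal or vertical connection gains nothing, and a square bounding box gains the most.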
The resulting wirelength reduction for this equal-area model, which is one minus the ratio of the octagon’s radius to that of a diamond of equal area, is $1 - (1/2)^{1/4} \approx 15.9$ percent.†

For the significant fraction of high-end designs that are core-limited rather than pad-limited, substantial additional improvement with X is possible, though. Most designs have utilizations far below 100 percent: often as low as 70 percent or even lower for interconnect-dominated designs, such as network switches [BR02]. It is reasonable to assume that X-place-and-route systems are as good at optimizing X objective functions as Manhattan place-and-route systems are at optimizing Manhattan objectives, but reducing wirelength by 15.9 percent while leaving die size unchanged would leave the X router with a much easier problem than a Manhattan router had. Instead, an X system can exploit the wirelength reduction versus Manhattan by shrinking the die until the X routing is as dense as the Manhattan routing was in the original design. In particular, reducing wirelength by 15.9 percent would require a die size shrink of 15.9 percent to restore the wiring density of the original Manhattan layout. Of course, this die size reduction would permit an additional wirelength reduction of $1 - [(1/2)^{1/4}]^{1/2} \approx 8.3$ percent, which, in turn, permits an additional die shrink of 8.3 percent, etc. The resulting infinite product converges to a wirelength and die size reduction of $1 - 1/\sqrt{2} \approx 29.3$ percent. This reasoning demonstrates that, in general, any technique that achieves a wirelength reduction of $R$ percent with fixed area can be used to achieve a wirelength reduction of almost $2R$ percent (i.e., $R(2 - R)$, with $R$ taken as a fraction) if area reduction is used to maintain constant layout difficulty. Particularly for wire-dominated designs such as networking chips, which typically have low utilizations with Manhattan wiring, significant die size reduction is possible with X.

Although the above analysis is somewhat aggressive in assuming that all nets have only two pins, it is also conservative in that the substantial additional area reduction owing to the concomitant shrinkage of drivers and elimination of buffers is not considered. In practice, reducing the die size by 15–20 percent and the wirelength by 20–25 percent has proved straightforward in general, and the alternative of not reducing the die size but instead using X to remove a wiring layer (e.g., at ATI), and sometimes two wiring layers, has also been achieved.

One additional, nonobvious benefit of X is its utility for routing over IP blocks. Most IP blocks (e.g., memories) use Manhattan wiring for layers 1–3 and sometimes 4, and many have the restriction that Manhattan wiring over the block is forbidden due to capacitive coupling considerations. X, on the other hand, can cross over such blocks diagonally with no risk of parallel runs between the over-the-block wiring and within-the-block wiring. Because more than half of a typical chip today is occupied by IP such as memory blocks, the opportunity to use the entirety of the upper wiring layers confers substantial benefit.

∗ Of course, the minimization of bounding-box semiperimeters, despite its near-ubiquity in current placers, ignores the location of pins within the box and the concomitant flexibility of wiring; it is, thus, a very crude measure of placement quality.

† The analogous analysis for Euclidean (i.e., all-angle) wiring results in a wirelength reduction of $1 - \sqrt{2/\pi} \approx 20.2$ percent versus Manhattan.

40.5 LIMITATIONS OF X

Although both practical experience with X and theoretical analysis demonstrate its significant benefits (to wirelength, area, delay, power, etc.), X is not a panacea.
For example, the above analysis demonstrates that the wirelength reduction is nearly doubled by permitting area reduction, but for those atypical designs that have extremely high utilizations (e.g., more than 90 percent of the die is occupied by components that are not buffers), X can provide only more modest wirelength reductions. Clearly, for blocks that have highly eccentric aspect ratios (e.g., 5:1), the use of diagonals does not buy much. In practice, such narrow blocks typically abut Manhattan IP blocks, though, so over-the-IP, diagonal wiring somewhat mitigates the diminished utility of X within the eccentric block. X requires enough wiring layers for some of them to be used for diagonals; for low-end devices for which signal routing extends only to M4, for example, X cannot be usefully applied. Finally, for designs with extensive Manhattan constraints on the upper wiring layers, diagonal wiring can be difficult to incorporate effectively. For example, if the floorplan is Manhattan-oriented, so that large horizontal buses cross an X block on M4, large-scale, diagonal wires on M4 become infeasible by construction. In practice, though, the vast majority of designs show significant wirelength and die size reduction with X when compared to their Manhattan equivalents.

Curiously, the most significant limitation of X in practice is the widespread misconception that X changes only the router and not the full layout system. Indeed, as the theoretical analysis above shows, keeping the area and the placement fixed and changing only the wiring to permit diagonals buys only 14.6 percent wirelength reduction and achieves that only for netlists with only two-pin nets. For real netlists, the improvement is somewhat lower in practice, perhaps 10 percent (e.g., [I06]), if only the wiring is permitted to change. Using a Manhattan floorplan, and Manhattan buses on the upper layers, can make the situation even worse.
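The "nearly doubled" figure referred to above comes from the iterated die-shrink argument of Section 40.4, and the arithmetic can be reproduced in a few lines. This is a sketch under stated assumptions rather than the authors' derivation: the function names are hypothetical, and it assumes that shrinking die area by a fraction $r$ scales every wirelength by $\sqrt{1 - r}$, which is one reading of the argument that reproduces the quoted 8.3 and 29.3 percent figures.

```python
import math

def equal_area_radius_ratio() -> float:
    """Radius of the X disk (regular octagon) over that of a diamond of equal area.

    A diamond |x| + |y| <= r has area 2 r^2; a regular octagon with
    circumradius R has area 2 * sqrt(2) * R^2. Equating the two areas
    gives R / r = (1/2) ** (1/4).
    """
    return math.sqrt(1.0 / math.sqrt(2.0))

def compounded_reduction(first_reduction: float, rounds: int = 60) -> float:
    """Total wirelength reduction when each saving is followed by an equal die shrink."""
    factor = 1.0 - first_reduction  # wirelength scale after the initial gain
    total = factor
    for _ in range(rounds):
        factor = math.sqrt(factor)  # area shrink by the same fraction scales lengths by sqrt
        total *= factor
    return 1.0 - total

if __name__ == "__main__":
    r = 1.0 - equal_area_radius_ratio()
    print(f"fixed-area reduction R:       {r:.4f}")
    print(f"first additional reduction:   {1.0 - math.sqrt(1.0 - r):.4f}")
    print(f"compounded reduction:         {compounded_reduction(r):.4f}")
    print(f"closed form R(2 - R):         {r * (2.0 - r):.4f}")
```

The successive scale factors are $(1-R)$, $(1-R)^{1/2}$, $(1-R)^{1/4}$, and so on, so their product is $(1-R)^2$ and the compounded reduction is $R(2 - R)$, matching the closed form given in Section 40.4.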
To see the benefits of X in practice, one must apply it to problems it addresses well:

• Designs for which high utilization could not be achieved in Manhattan
• Designs that are not overly eccentric in aspect ratio
• Designs with enough wiring layers to devote part of at least one to diagonals
• Designs that accommodate X by not overconstraining the upper layers with required Manhattan wires, and, most importantly
• Designs that permit the whole layout system, and not just the router, to be X-aware

40.6 ROLE OF VIAS

Analyses such as the one above assume that vias between layers are free. Although the interconnect delay caused by vias is sharply reduced with copper wires, vias remain the most difficult features to manufacture reliably on an integrated circuit, and reducing the number of vias can have a significant, positive effect on yield. In addition, vias create routing obstacles on two layers, rather than one, so reducing the number of vias can significantly simplify layout.

An even more entrenched dogma than the use of Manhattan wiring is the use of preferred-direction wiring: that is, the requirement that each routing layer have a designated, preferred wiring direction (e.g., horizontal), such that virtually all of the wire length on that layer is required to be in the preferred direction. The assumption of a preferred direction for each layer significantly simplifies and accelerates routing algorithms, but it also demands a via for every change of direction and, thus, for any connection in a Manhattan layout that is not exactly vertical or horizontal. Because X can move diagonally without using a via, it has the potential to reduce the number of vias significantly.
On the other hand, backward compatibility with existing, Manhattan cell libraries demands that the diagonal layers be upper layers (e.g., M4 and M5), so extra vias are required to use the diagonals, mitigating via reduction somewhat.

The pronounced impact of vias on both yield and layout thus motivates the consideration of nonpreferred-direction wiring: layouts in which wires in multiple orientations coexist on the same wiring layer. Even a conservative nonpreferred-direction strategy for X, using diagonal jogs on Manhattan layers and Manhattan jogs on diagonal layers, can sharply reduce the number of vias required for layout, and XPR sees substantial via reductions in practice versus contemporary Manhattan systems.

More radical than short, nonpreferred-direction jogs, but considerably more powerful, is the complete abandonment of the preferred-direction requirement. Full directional freedom on all layers, called liquid routing, makes the fullest use of the power introduced by diagonals, but is far more challenging from an EDA point of view than preferred-direction X. In addition, the yield benefits of a via reduction from liquid routing must be weighed against lithographic concerns about nonpreferred-direction wiring in nanometer technologies to assess its long-term practicality. From an academic point of view, though, liquid routing is almost completely unexplored territory. The problems there are particularly challenging, but the potential, practical payoff is high. The reader is encouraged to investigate the patents by Caldwell and Teig on Q∗ and related liquid routing ideas [TC04,TC05,TC06] as a starting point for future work.

40.7 SYSTEM FOR X PLACE AND ROUTE

In the sections that follow, we describe a few of the tools and techniques used within XPR that made X layout possible starting in 2001.
Many additional X-aware tools are required, ranging from parasitic extraction to clock routing and even to visualization infrastructure, as quad and K-D trees (Section 4.3.2) are highly inefficient for region queries with both long diagonal and long Manhattan wires. We confine the presentation here to placement, global routing, detailed routing, and Steiner tree construction.

40.8 X PLACEMENT

Placement is typically performed in one of three ways: recursive partitioning, analytical techniques, or Monte Carlo methods such as simulated annealing. Annealing, which is often used for detailed placement but rarely for global placement anymore because of its high computational expense, can be directly adapted to the X interconnect architecture by changing the objective function, but the other two placement methods require much more drastic modifications.

Another common approach to modern placement is analytical formulation. Here, the netlist’s wirelength is approximated by a continuously differentiable function that is either directly minimized with a conjugate gradient-based minimizer (or other derivative-based techniques) or indirectly minimized using a linear system of equations whose solution minimizes the original wirelength formula.

An unusual and significant challenge in X placement stems from the inseparability of two-dimensional distance into independent, one-dimensional components. The Manhattan distance between $(x_1, y_1)$ and $(x_2, y_2)$ is simply $|x_2 - x_1| + |y_2 - y_1|$, so $x$ and $y$ can be treated independently, and reducing distance by one unit in either $x$ or $y$ is equally desirable, all other considerations (e.g., congestion) being equal. Distance in X, though, is $\max(|x_2 - x_1|, |y_2 - y_1|) + (\sqrt{2} - 1)\,\min(|x_2 - x_1|, |y_2 - y_1|)$.
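The contrast between the two metrics just defined can be made concrete with a small sketch. The function names and the finite-difference helper `d_dx` are hypothetical and serve only as illustration; they are not part of XPR. The point is that the sensitivity of Manhattan distance to a move in $x$ is the same regardless of $y$, whereas for X distance it depends on which of the two extents is larger.

```python
import math

SQRT2 = math.sqrt(2.0)

def manhattan_distance(p, q):
    """|x2 - x1| + |y2 - y1|: a sum of independent x and y terms."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])

def x_distance(p, q):
    """max(dx, dy) + (sqrt(2) - 1) * min(dx, dy): x and y are coupled."""
    dx, dy = abs(q[0] - p[0]), abs(q[1] - p[1])
    return max(dx, dy) + (SQRT2 - 1.0) * min(dx, dy)

def d_dx(dist, p, q, h=1e-6):
    """Central finite-difference estimate of the change in dist per unit move of q in x."""
    forward = dist(p, (q[0] + h, q[1]))
    backward = dist(p, (q[0] - h, q[1]))
    return (forward - backward) / (2.0 * h)

if __name__ == "__main__":
    origin = (0.0, 0.0)
    for sink in [(3.0, 1.0), (1.0, 3.0)]:
        print(sink,
              "Manhattan d/dx =", round(d_dx(manhattan_distance, origin, sink), 3),
              "X d/dx =", round(d_dx(x_distance, origin, sink), 3))
```

For a wide connection a unit move in $x$ costs a full unit of X wirelength, but for a tall one it costs only $\sqrt{2} - 1$, so the $x$ problem cannot be solved without knowing $y$.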
When this equation is differentiated, the resulting derivative intertwines $x$ and $y$ in ways for which most analytical placers and partitioners were not designed. In fact, this requirement for separability is why most analytical placers use either squared Euclidean length [KSJA91], some approximation of linear wirelength using piecewise squared length [SDJ91] or β-regularization [BKKM01], or some other approximation such as bounding-box semiperimeter [KRW05]. All of these techniques have the desirable property that the x- and y-equations are sparse and separable: that is, they can be solved as two separate systems, thus greatly reducing runtime. Typical analytical placement techniques often also require separability within the legalization step that follows, which is usually partitioning-based. For example, it is not clear that techniques such as Ref. [V97] can be made X-aware, even in principle. Methods such as β-regularization [BKKM01] and conjugate gradient methods [KRW05] can be generalized to accommodate X, at the expense of some implementation complexity, and this would be a potentially fruitful area for future research.

By contrast, recursive partitioning methods, either standalone or as legalization methods for analytical placers, seem to be fundamentally incompatible with X. To see this, consider how a modern partitioning algorithm works. Most of these techniques are ultimately derived from the early algorithm of Kernighan and Lin [KL70], later improved by Fiduccia and Mattheyses [FM82]. After the authors’ names, this type of approach is typically referred to as KLFM. The KLFM algorithm first divides the set of components into two roughly equal-sized subsets. It then moves or swaps individual components between these subsets heuristically to minimize the number of nets that contain components in both subsets, that is, nets that are cut. The KLFM algorithm itself is a fairly simple local optimization heuristic, yet it performs quite well on this problem.
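The move-based idea described above can be illustrated with a deliberately simplified pass in the spirit of KLFM. This is a sketch only, not the published algorithm and not XPR code: it recomputes the cut from scratch for every candidate move instead of using the gain buckets of [FM82], and the names `cut_size`, `fm_style_pass`, and the balance tolerance `max_imbalance` are assumptions made for illustration.

```python
def cut_size(nets, side):
    """Count nets that have components on both sides of the partition."""
    return sum(1 for net in nets if len({side[c] for c in net}) == 2)

def fm_style_pass(nets, side, max_imbalance=2):
    """One pass: move each component at most once, keep the best prefix of moves."""
    side = dict(side)
    counts = [sum(1 for s in side.values() if s == 0),
              sum(1 for s in side.values() if s == 1)]
    best_side, best_cut = dict(side), cut_size(nets, side)
    locked = set()
    for _ in range(len(side)):
        move, move_cut = None, None
        for comp in side:
            if comp in locked:
                continue
            src = side[comp]
            # Skip moves that would unbalance the two subsets too much.
            if abs((counts[src] - 1) - (counts[1 - src] + 1)) > max_imbalance:
                continue
            side[comp] = 1 - src            # tentatively move
            cut = cut_size(nets, side)
            side[comp] = src                # undo
            if move_cut is None or cut < move_cut:
                move, move_cut = comp, cut
        if move is None:
            break
        src = side[move]
        side[move] = 1 - src                # commit the best move, even if it is uphill
        counts[src] -= 1
        counts[1 - src] += 1
        locked.add(move)
        if move_cut < best_cut:
            best_cut, best_side = move_cut, dict(side)
    return best_side, best_cut

if __name__ == "__main__":
    nets = [("a", "b"), ("c", "d")]
    start = {"a": 0, "b": 1, "c": 0, "d": 1}   # both nets cut initially
    final_side, final_cut = fm_style_pass(nets, start)
    print(final_side, final_cut)
```

Committing the best available move even when it temporarily worsens the cut, and then rolling back to the best state seen, is what lets this family of heuristics escape some local minima. Note that the objective counts cut nets only; it carries no notion of distance, which is where the horizontal and vertical cutlines enter in the placement setting.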
An enormous amount of later research builds improvements onto the basic KLFM structure (see Section 7.2).

The way that KLFM is typically used to solve the placement problem is by recursive bipartitioning, in which the two-dimensional placement problem is artificially decomposed into a sequence of one-dimensional partitioning problems. This is accomplished by specifying a vertical or horizontal cutline that roughly bisects the placement area and using KLFM to partition the set of components into two subsets that are constrained to lie on opposite sides of the cutline. Then, each partition is, in turn, bisected by a cutline, and KLFM divides the subset within the partition into two smaller subsets that are constrained to lie on opposite sides of the partition’s cutline. This process continues recursively until only a few components are left in each partition. A variety of techniques have been devised for terminal propagation, which allows the algorithm to capture the influence of connected components outside of the current subproblem on the placement of the components inside the subproblem to which they are connected.

Unfortunately, recursive bipartitioning of this type is poorly suited to the X interconnect architecture. The use of horizontal and vertical cutlines one at a time assumes the separability of horizontal and vertical distances in assessing placement quality just as many of the analytical placement techniques do. The inseparability of the X metric means that even the addition of diagonal cutlines to a partitioning strategy will not capture X placement quality, which depends fundamentally on the ...