Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C019 Finals Page 392 10-10-2008 #17 392 Handbook of Algorithms for Physical Design Automation

19.3.4 RELAXATION

Iterative improvement at each level may employ various techniques, such as network flows, simulated annealing, nonlinear programming, or force-directed models, provided that it can support the incorporation of complex constraints appropriate to the modeling scale at the current level. Important considerations for relaxation include the following:

1. Should it be local (e.g., annealing-based) or global (e.g., force-directed)?
2. How should net models (objectives) and density models be adapted to different modeling scales?
3. To what extent should relaxation be expected to change the starting configuration it inherits from an adjacent level?
4. What termination criteria should be used?
5. How scalable must the relaxation be?
6. How easily can it be implemented?
7. How readily can it be adapted to accommodate additional complex constraints?

For example, in both mPL and APlace, the density-grid sizes, the log-sum-exp HPWL smoothing parameter, and the bin-grid density smoothing parameters are chosen to match the scale of resolution implied by the average cluster size. For this reason, both engines carefully control the variation in cluster sizes during coarsening.

19.3.4.1 mPL6

In mPL5 [49] and mPL6 [39], fast numerical PDE solvers are used in a generalization of the Eisenmann–Johannes force-directed model [10,13] at each level of the hierarchy (Chapter 18). The global NLP relaxations in mPL6 are observed to improve quality dramatically over earlier implementations [50] that relied more on localized iterations. In mPL6 [39], iterations at each level terminate when the average area–density overflow over all bins is sufficiently small. Convergence to nonuniform area–density distributions is enabled by the introduction of filler cells [51] that are not connected to any module in the netlist.
These filler cells are introduced hierarchically, from the top down, in proportion to the white space available in each rectangular subregion following the initial unconstrained placement. In addition, the filler cells are periodically redistributed from scratch, again from the top down.

The adjustment of the relative weights assigned to the log-sum-exp HPWL objective and to the density constraints in mPL6 is intriguing. Modules do not simply spread monotonically toward their final positions. Instead, at every level of the hierarchy, the HPWL term is given a large enough weight in early iterations to allow modules to contract together tightly enough to alter their relative positions, before the density weight is subsequently increased and the modules re-expand toward a more area-uniform configuration. These alternating contracting and expanding motions seem to confer additional hill-climbing ability on mPL6 and improve its final solution quality significantly compared with simpler and faster monotonic spreading.

19.3.4.2 APlace

In APlace [29,41], the nonlinear conjugate-gradient method is used to iteratively minimize a penalty function built from analytical approximations of an HPWL objective and bell-shaped, bin-based area–density constraints (Chapter 18). Relaxation at each level of APlace proceeds by the Polak–Ribière variant of nonlinear conjugate gradients [52] with golden-section line search [53]. A hard limit of 100 iterations is imposed. The grid size |G|, the objective weight, the wirelength smoothing parameter α, and the area–density potential radius r are selected and adjusted at each level to guide convergence. Bin size and α are taken proportional to the average cluster size at the current level.
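The following Python sketch illustrates this style of relaxation under simplifying assumptions of our own: a one-dimensional toy netlist, the standard log-sum-exp (LSE) smoothing of HPWL with parameter α in place of APlace's full objective, no density term, and the common PR+ safeguard (the Polak–Ribière coefficient is clipped at zero, and the search restarts along the negative gradient whenever the new direction is not a descent direction). All function names are hypothetical and are not APlace's API. For a net with n pins, the LSE length lies between HPWL and HPWL + 2α ln n, so a smaller α tracks HPWL more closely at the cost of a less smooth objective.

```python
import math


def _log_sum_exp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def _softmax(vals):
    m = max(vals)
    e = [math.exp(v - m) for v in vals]
    s = sum(e)
    return [v / s for v in e]


def lse_net_length(coords, alpha):
    """Smooth over-estimate of max(coords) - min(coords) (one net, one axis)."""
    return alpha * (_log_sum_exp([c / alpha for c in coords])
                    + _log_sum_exp([-c / alpha for c in coords]))


def wirelength_and_grad(x, nets, alpha):
    """Total LSE wirelength and its gradient w.r.t. the movable cells in x.

    Each net is a pair (movable_cell_indices, fixed_pad_coordinates).
    """
    total, grad = 0.0, [0.0] * len(x)
    for cells, pads in nets:
        coords = [x[i] for i in cells] + list(pads)
        total += lse_net_length(coords, alpha)
        pos = _softmax([c / alpha for c in coords])
        neg = _softmax([-c / alpha for c in coords])
        for k, i in enumerate(cells):  # movable pins come first in coords
            grad[i] += pos[k] - neg[k]
    return total, grad


def golden_section(phi, hi, tol=1e-9):
    """Minimize a unimodal function phi on [0, hi]."""
    inv_gr = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio
    a, b = 0.0, hi
    c, d = b - inv_gr * (b - a), a + inv_gr * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:  # minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_gr * (b - a)
            fc = phi(c)
        else:        # minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_gr * (b - a)
            fd = phi(d)
    return (a + b) / 2.0


def polak_ribiere_cg(f_and_grad, x0, max_iters=100, gtol=1e-14):
    """Nonlinear CG with Polak-Ribiere (PR+) directions and golden-section steps.

    Assumes f is convex and grows without bound along every search direction
    (true for the toy netlist below, whose pads are fixed).
    """
    x = list(x0)
    fx, g = f_and_grad(x)
    d = [-gi for gi in g]
    for _ in range(max_iters):  # hard iteration limit
        gg = sum(gi * gi for gi in g)
        if gg < gtol:
            break

        def phi(t):
            return f_and_grad([xi + t * di for xi, di in zip(x, d)])[0]

        hi = 1.0
        while phi(hi) < phi(hi / 2.0):  # grow the bracket [0, hi]
            hi *= 2.0
        t = golden_section(phi, hi)
        x = [xi + t * di for xi, di in zip(x, d)]
        fx, g_new = f_and_grad(x)
        beta = max(0.0, sum(gn * (gn - go) for gn, go in zip(g_new, g)) / gg)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        if sum(gn * di for gn, di in zip(g_new, d)) >= 0.0:
            d = [-gn for gn in g_new]  # not a descent direction: restart
        g = g_new
    return x, fx


if __name__ == "__main__":
    # Toy 1-D chain pad(0) - c0 - c1 - pad(9); by symmetry and strict
    # convexity the smoothed optimum is c0 = 3, c1 = 6.
    nets = [([0], [0.0]), ([0, 1], []), ([1], [9.0])]
    x, fx = polak_ribiere_cg(lambda v: wirelength_and_grad(v, nets, 1.0), [1.0, 1.5])
    print([round(v, 4) for v in x], round(fx, 4))
```

In this sketch the only level-dependent knob is α; a multilevel driver in the spirit of the text would scale it (and the bin size, once a density term is added) with the average cluster size.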
The density-potential radius r is set to 2 on most grids but is increased to 4 on the finest grid to prevent oscillations in the maximum cell-area density of any bin. The density-potential weight is fixed at one. The wirelength weight is initially set rather large and is subsequently decreased by 0.5 to escape from local minima with too much overlap. As iterations proceed, the relative weight of the area–density penalty increases, and a relatively uniform cell-area distribution is obtained. Termination in APlace is based on discrepancy, defined for a given window size A as the maximum ratio of the module area contained in any window of area A to A itself. Compared with other tools, this measure of density control is quite strict and may account in part for APlace's relatively long runtimes [54].

19.3.4.3 FDP/LSD

In the FDP/LSD placer [12,48], a multilevel formulation is seen as a way of improving the relative positions of modules following an analytic, unconstrained, quadratic HPWL-minimizing initial placement. In particular, clustering tightly connected modules forces them to remain spatially close, even as other modules less strongly connected to those in the cluster are allowed to migrate away. After the initial analytical placement, netlist partitioning is also incorporated as a means of further separating modules in congested regions before subsequent quadratic placement steps. In contrast to most earlier work, the FDP authors specifically attribute large quality improvements solely to the multilevel formulation. Termination in FDP is controlled by the normalized Klee measure [55]: the total core area occupied by the modules, counting overlapped area only once, is computed exactly by a segment-tree technique and then divided by the sum of all module areas.
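A minimal Python sketch of the normalized Klee measure follows. To stay short, it computes the union area by coordinate compression over the rectangle edges, which costs O(n^3), instead of the O(n log n) segment-tree sweep used in FDP. Modules are assumed to be axis-aligned rectangles (x1, y1, x2, y2); the function names and the default threshold of 0.7 (the FDP hand-off point to legalization mentioned in this section) are illustrative assumptions.

```python
def union_area(rects):
    """Exact area of the union of axis-aligned rectangles (x1, y1, x2, y2).

    Coordinate compression: split the plane at every rectangle edge and test
    each elementary cell once. O(n^3); a segment-tree sweep is much faster.
    """
    xs = sorted({r[0] for r in rects} | {r[2] for r in rects})
    ys = sorted({r[1] for r in rects} | {r[3] for r in rects})
    area = 0.0
    for x_lo, x_hi in zip(xs, xs[1:]):
        cx = (x_lo + x_hi) / 2.0
        for y_lo, y_hi in zip(ys, ys[1:]):
            cy = (y_lo + y_hi) / 2.0
            # An elementary cell is covered iff its center lies inside a module.
            if any(r[0] < cx < r[2] and r[1] < cy < r[3] for r in rects):
                area += (x_hi - x_lo) * (y_hi - y_lo)
    return area


def normalized_klee_measure(rects):
    """Union area divided by total module area; 1.0 means no overlap."""
    total = sum((r[2] - r[0]) * (r[3] - r[1]) for r in rects)
    return union_area(rects) / total


def spreading_done(rects, threshold=0.7):
    """Hypothetical stop test: hand off to legalization at about 30% overlap."""
    return normalized_klee_measure(rects) >= threshold


if __name__ == "__main__":
    modules = [(0, 0, 2, 2), (1, 0, 3, 2)]  # two 2x2 modules overlapping by half
    print(normalized_klee_measure(modules))  # 0.75 (union 6 / total 8)
    print(spreading_done(modules))           # True
```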
The normalized Klee measure is strictly less than 1 when overlap exists and approaches 1 as overlap is removed. The FDP multiscale flow terminates, and legalization commences, when approximately 30 percent overlap remains according to this metric, i.e., when the spread-metric fraction is approximately 0.7.

19.3.4.4 Dragon

In Dragon [7,37], an initial cutsize-minimizing quadrisection is followed by a bin-swapping-based refinement, in which entire partition blocks at the given level are interchanged in an effort to reduce total wirelength. Recursive quadrisection and bin swapping proceed down to the finest level. At all levels except the last, low-temperature simulated annealing is used to swap partition blocks. At the finest level, a more detailed, greedy strategy is employed. Dragon has been successfully adapted to incorporate complex constraints such as timing and routability.

19.3.5 INTERPOLATION

Interpolation (a.k.a. declustering or uncoarsening) maps a placement at a given coarser level to a placement at the adjacent finer level. The most common interpolation functions used in placement are piecewise constant: each module at the finer level simply inherits the current position of its parent cluster at the coarser level.

Simple declustering and linear assignment can be effective, particularly in contexts with uniformly sized modules [56]. With this approach, each component cluster is initially placed at the center of its parent's location. If an overlap-free configuration is needed, a uniform bin grid can be laid down, and clusters can be assigned to nearby bins or sets of bins. The complexity of this assignment can be reduced by first partitioning clusters into smaller windows, e.g., of 500 clusters each. If clusters can be assumed to have uniform size, fast linear assignment can be used; otherwise, approximation heuristics are needed.
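Piecewise-constant interpolation followed by bin assignment can be sketched in Python as below, assuming uniformly sized clusters, explicitly given bin centers, and Manhattan displacement as the assignment cost; all names are hypothetical. For clarity, the exact minimum-cost assignment is found by brute force over permutations, which is feasible only for a handful of clusters. A practical implementation would instead run a polynomial-time linear-assignment algorithm (e.g., the Hungarian method) inside each window.

```python
import itertools
import math


def interpolate_piecewise_constant(parent_of, coarse_pos):
    """Each finer-level cluster inherits the position of its parent cluster."""
    return {child: coarse_pos[parent] for child, parent in parent_of.items()}


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def assign_to_bins(fine_pos, bin_centers):
    """Exact minimum-total-displacement assignment of clusters to distinct bins.

    Brute force over permutations: exponential, so for tiny windows only.
    """
    clusters = sorted(fine_pos)
    if len(clusters) > len(bin_centers):
        raise ValueError("need at least one bin per cluster")
    best, best_cost = None, math.inf
    for perm in itertools.permutations(range(len(bin_centers)), len(clusters)):
        cost = sum(manhattan(fine_pos[c], bin_centers[b])
                   for c, b in zip(clusters, perm))
        if cost < best_cost:
            best_cost = cost
            best = {c: bin_centers[b] for c, b in zip(clusters, perm)}
    return best, best_cost


if __name__ == "__main__":
    parent_of = {"a": "P", "b": "P", "c": "Q"}  # a and b are children of P
    coarse_pos = {"P": (1.0, 1.0), "Q": (3.0, 1.0)}
    fine_pos = interpolate_piecewise_constant(parent_of, coarse_pos)
    bins = [(0.5, 1.0), (1.5, 1.0), (2.5, 1.0), (3.5, 1.0)]
    placement, cost = assign_to_bins(fine_pos, bins)
    print(placement, cost)  # a and b take the two bins beside P; cost 1.5
```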
Under AMG-style weighted disaggregation, interpolation proceeds by weighted averaging: each finer-level cluster is initially placed at the weighted average of the positions of all coarser-level clusters with which its connection is sufficiently strong [16,57]. Finer-level connections can also be used: once a finer-level cluster is placed, it can be treated as a fixed, coarser-level cluster for the purpose of placing subsequent finer-level clusters. Weighted aggregation is described further in Section 19.2.3.

A constructive approach, as in Ultrafast VPR [24], can also lead to extremely fast and scalable algorithms. At each level, clusters are initially placed in the following sequence: (i) clusters directly connected to output pads, (ii) clusters directly connected to input pads, and (iii) other clusters.

19.3.6 MULTISCALE LEGALIZATION AND DETAILED PLACEMENT

Multiscale algorithms and ideas are featured in recent studies of legalization of mixed-size placements, where the largest objects may be several orders of magnitude larger than the smallest modules. In this setting, the transition from GP to legalization takes on increased importance: unless the global placer's estimates of constraint satisfiability are sufficiently precise, final legalization at the finest level may be difficult or impossible without massive disruption of the given global placement. In mPL6 [42], the largest modules at each cluster level are legalized before interpolation to the adjacent finer level. In this way, the multiscale framework is used to smooth the transition between levels and to make the final quality of results at the finest level more predictable at coarse levels.
The multiscale flow essentially decomposes mixed-size legalization into a sequence of legalizations of clusters whose sizes are balanced to within the tolerance prescribed during coarsening. In this way, it efficiently supports look-ahead legalization [36,58] of difficult-to-legalize test cases [30], which can improve QoR on high-utilization designs. Multiscale ideas are also used in detailed placement [21,59,60]; cf. Chapter 20.

19.4 CONCLUSION

In practice, there is no single, simple, generic prescription for transforming a flat placement algorithm into a multilevel algorithm. Consistent improvement from one level to the next depends on close coordination of coarsening, relaxation, and interpolation; this coordination depends in turn on the specific ways in which aggregates are defined and a given placement is improved. Intralevel stopping criteria, limits on variation in cluster size, the ratio of problem sizes at adjacent levels, and the number of variables and constraints at the coarsest level may vary across implementations. Ultimately, the precise settings of these parameters are generally derived empirically. Intralevel termination criteria are typically designed so that relaxation ends soon after the reduction in objectives and relaxed constraint violations slows. Intralevel and outer-flow convergence criteria must complement each other to enable iterative identification of the best solutions.

Nevertheless, some trends have emerged in recent years, following the 2005 and 2006 ISPD placement contests [61,62]. Although clustering has long been viewed as a straightforward means to speed and scalability [18–20,24], recent results clearly demonstrate that leading multilevel optimization implementations also produce superior quality [4,12,37,49]. Improved priority-queue-based greedy clustering [26] increases the accuracy of coarse-level representations.
Monotonic decrease at coarser levels generally amounts to hill climbing at the finest level: the corresponding large-scale moves of aggregates bypass local variations en route to globally improved configurations. Clustering errors must be reversible by sufficiently powerful forms of relaxation, interpolation, and iteration flow (e.g., multiple or recursive V-cycles). However, relaxation at finer levels must be scalable, and it must both respect the starting solution inherited from coarser levels and be able to improve it rapidly. Netlist-driven priority-queue-based greedy clustering [26] enables rapid reduction in problem size, by up to a factor of 10 per level, at no apparent cost in solution quality. Vertex-affinity heuristics such as fine-granularity clustering and net cluster, designed to aggressively reduce net counts at coarser levels, are widely used. Location-based or physical clustering can be used to support multiple traversals over multiple hierarchies. However, the best results published to date are still attained by algorithms using just one pass of successive refinement, from the coarsest to the finest level, with relatively powerful global relaxation at each level. Improved formulations of flat analytical placement [10,63,64] have served as superior forms of relaxation in several recent leading multilevel placement implementations [4,49,65], possibly in part because the global view in iterative improvement complements the locality of clustering. Finally, we note that variants of multiscale placement have also played a significant role in recent advances in hybrid methods for partitioning-based placement (Chapter 15) and floorplanning (Chapter 12).
ACKNOWLEDGMENT

Partial support for this work has been provided by Semiconductor Research Consortium Contract 2003-TJ-1091 and National Science Foundation Contracts CCF-0430077 and CCF-0528583. This chapter is derived from the article in Ref. [50].

REFERENCES

1. A. Brandt. Multi-level adaptive solutions to boundary value problems. Mathematics of Computation, 31(138):333–390, 1977.
2. W. L. Briggs, V. E. Henson, and S. F. McCormick. A Multigrid Tutorial, 2nd edn. SIAM, Philadelphia, 2000.
3. J. Cong and X. Yuan. Multilevel global placement with retiming. In Proceedings of Design Automation Conference, pp. 208–213, New York, 2003. ACM Press.
4. A. B. Kahng and Q. Wang. Implementation and extensibility of an analytic placer. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 24(5):734–747, 2005. ISPD 2004–2006, ICCAD 2004–2005.
5. C.-C. Chang, J. Cong, D. Pan, and X. Yuan. Multilevel global placement with congestion control. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 22(4):395–409, Apr 2003.
6. C. Li, M. Xie, C. K. Koh, J. Cong, and P. Madden. Routability-driven placement and white space allocation. In Proceedings of International Conference on Computer-Aided Design, San Jose, CA, pp. 394–401, Nov 2004.
7. T. Taghavi, X. Yang, B.-K. Choi, M. Wang, and M. Sarrafzadeh. Congestion minimization in modern placement circuits. In G.-J. Nam and J. Cong (editors), Modern Circuit Placement: Best Practices and Results, pp. 135–165. Springer, New York, 2007.
8. Y. Cheon, P.-H. Ho, A. B. Kahng, S. Reda, and Q. Wang. Power-aware placement. In Proceedings of Design Automation Conference, Anaheim, CA, pp. 795–800, 2005.
9. A. Brandt and D. Ron. Multigrid solvers and multilevel optimization strategies. In J. Cong and J. R. Shinnerl (editors), Multilevel Optimization in VLSICAD. Kluwer Academic Publishers, Boston, 2003, pp. 1–69.
10. H. Eisenmann and F. M. Johannes. Generic global placement and floorplanning. In Proceedings of 35th ACM/IEEE Design Automation Conference, San Francisco, CA, pp. 269–274, 1998.
11. B. Hu and M. Marek-Sadowska. mFar: Multilevel fixed-points addition-based VLSI placement. In G.-J. Nam and J. Cong (editors), Modern Circuit Placement: Best Practices and Results, pp. 229–246. Springer, New York, 2007.
12. K. Vorwerk and A. A. Kennings. An improved multi-level framework for force-directed placement. In DATE, Munich, Germany, pp. 902–907, 2005.
13. P. Spindler and F. M. Johannes. Kraftwerk: A fast and robust quadratic placer using an exact linear net model. In G.-J. Nam and J. Cong (editors), Modern Circuit Placement: Best Practices and Results, pp. 59–95. Springer, New York, 2007.
14. N. Viswanathan, G.-J. Nam, C. J. Alpert, P. Villarrubia, H. Ren, and C. Chu. RQL: Global placement via relaxed quadratic spreading and linearization. In Proceedings of Design Automation Conference, San Diego, CA, pp. 453–458, 2007.
15. G. H. Golub and C. F. Van Loan. Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore, Maryland, 1996.
16. T. F. Chan, J. Cong, T. Kong, J. Shinnerl, and K. Sze. An enhanced multilevel algorithm for circuit placement. In Proceedings of International Conference on Computer-Aided Design, San Jose, CA, pp. 299–306, Nov 2003.
17. R. M. Lewis and S. Nash. Practical aspects of multiscale optimization methods for VLSICAD. In J. Cong and J. R. Shinnerl (editors), Multilevel Optimization in VLSICAD. Kluwer Academic Publishers, Boston, 2003, pp. 265–291.
18. D. M. Schuler and E. G. Ulrich. Clustering and linear placement. In Proceedings of Design Automation Conference, pp. 50–56, New York, 1972. ACM Press.
19. S. Mallela and L. K. Grover. Clustering based simulated annealing for standard cell placement. In Proceedings of Design Automation Conference, Atlantic City, NJ, pp. 312–317. IEEE Computer Society Press, 1988.
20. W.-J. Sun and C. Sechen. Efficient and effective placement for very large circuits. IEEE Transactions on Computer-Aided Design, 14(3):349–359, 1995.
21. S.-W. Hur and J. Lillis. Relaxation and clustering in a local search framework: Application to linear placement. In Proceedings of ACM/IEEE Design Automation Conference, pp. 360–366, New Orleans, 1999.
22. J. Cong and D. Xu. Exploiting signal flow and logic dependency in standard cell placement. In Proceedings of Asia South Pacific Design Automation Conference, p. 63, New York, 1995. ACM Press.
23. C. Sechen and K. W. Lee. An improved simulated annealing algorithm for row-based placement. In Proceedings of International Conference on Computer-Aided Design, San Jose, CA, pp. 478–481, 1987.
24. Y. Sankar and J. Rose. Trading quality for compile time: Ultra-fast placement for FPGAs. In FPGA '99, ACM Symposium on FPGAs, Monterey, CA, pp. 157–166, 1999.
25. V. Betz and J. Rose. VPR: A new packing, placement, and routing tool for FPGA research. In Proceedings of International Workshop on FPL, London, U.K., pp. 213–222, 1997.
26. G.-J. Nam, S. Reda, C. J. Alpert, P. Villarrubia, and A. B. Kahng. A fast hierarchical quadratic placement algorithm. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(4):678–691, 2006 (ISPD 2005).
27. J. Li, L. Behjat, and J. Huang. An effective clustering algorithm for mixed-size placement. In Proceedings of International Symposium on Physical Design, Austin, TX, pp. 111–118, 2007.
28. B. Hu and M. Marek-Sadowska. Fine granularity clustering based placement. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23(4):527–536, 2004 (ISPD 2003: pp. 67–74, San Diego, CA and DAC 2003: pp. 800–805, Anaheim, CA).
29. A. Kahng, S. Reda, and Q. Wang. APlace: A high quality, large-scale analytical placer. In G.-J. Nam and J. Cong (editors), Modern Circuit Placement: Best Practices and Results, pp. 163–187. Springer, New York, 2007.
30. T.-C. Chen, Z.-W. Jiang, T.-C. Hsu, H.-C. Chen, and Y.-W. Chang. NTUPlace3: An analytical placer for large-scale mixed-size designs. In G.-J. Nam and J. Cong (editors), Modern Circuit Placement: Best Practices and Results, pp. 289–310. Springer, New York, 2007.
31. G. Karypis. Multilevel algorithms for multi-constraint hypergraph partitioning. Technical Report 99-034, Department of Computer Science, University of Minnesota, Minneapolis, 1999.
32. G. Karypis. Multilevel hypergraph partitioning. In J. Cong and J. R. Shinnerl (editors), Multilevel Optimization in VLSICAD. Kluwer Academic Publishers, Boston, 2003, pp. 125–154.
33. J. Cong and S. K. Lim. Edge separability based circuit clustering with application to circuit partitioning. In Asia South Pacific Design Automation Conference, Yokohama, Japan, pp. 429–434, 2000.
34. S. N. Adya, S. Chaturvedi, J. A. Roy, D. A. Papa, and I. L. Markov. Unification of partitioning, placement and floorplanning. In Proceedings of International Conference on Computer-Aided Design, San Jose, CA, pp. 12–17, 2004.
35. T.-C. Chen, Y.-W. Chang, and S.-C. Lin. IMF: Interconnect-driven multilevel floorplanning for large-scale building-module designs. In Proceedings of International Conference on Computer-Aided Design, pp. 159–164, Washington, DC, 2005. IEEE Computer Society.
36. A. N. Ng, I. L. Markov, R. Aggarwal, and V. Ramachandran. Solving hard instances of floorplacement. In Proceedings of International Symposium on Physical Design, pp. 170–177, New York, 2006. ACM Press.
37. M. Sarrafzadeh, M. Wang, and X. Yang. Modern Placement Techniques. Kluwer, Boston, 2002.
38. J. A. Roy, D. A. Papa, and I. L. Markov. Capo: Congestion-driven placement for standard-cell and RTL netlists with incremental capability. In G.-J. Nam and J. Cong (editors), Modern Circuit Placement: Best Practices and Results, pp. 97–134. Springer, New York, 2007.
39. T. F. Chan, J. Cong, J. R. Shinnerl, K. Sze, and M. Xie. mPL6: Enhanced multilevel mixed-size placement with congestion control. In G.-J. Nam and J. Cong (editors), Modern Circuit Placement: Best Practices and Results, pp. 247–288. Springer, New York, 2007.
40. N. Viswanathan, M. Pan, and C. Chu. FastPlace 3.0: A fast multilevel quadratic placement algorithm with placement congestion control. In Proceedings of Asia South Pacific Design Automation Conference, Yokohama, Japan, pp. 135–140, 2007.
41. A. B. Kahng, S. Reda, and Q. Wang. Architecture and details of a high quality, large-scale analytical placer. In Proceedings of International Conference on Computer-Aided Design, San Jose, CA, pp. 891–898, Nov 2005.
42. T. F. Chan, J. Cong, M. Romesis, J. R. Shinnerl, K. Sze, and M. Xie. mPL6: Enhanced multilevel mixed-size placement. In Proceedings of International Symposium on Physical Design, San Jose, CA, pp. 212–214, Apr 2006.
43. B. Hu and M. Marek-Sadowska. Multilevel fixed-point-addition-based VLSI placement. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 24(8):1188–1203, 2005.
44. H. Chen, C.-K. Cheng, N.-C. Chou, A. B. Kahng, J. F. MacDonald, P. Suaris, B. Yao, and Z. Zhu. An algebraic multigrid solver for analytical placement with layout-based clustering. In Proceedings of IEEE/ACM Design Automation Conference, Anaheim, CA, pp. 794–799, 2003.
45. T. Luo and D. Z. Pan. DPlace: Anchor-cell-based quadratic placement with linear objective. In G.-J. Nam and J. Cong (editors), Modern Circuit Placement: Best Practices and Results, pp. 39–58. Springer, New York, 2007.
46. A. Brandt. Multiscale scientific computation: Review 2001. In T. Barth, R. Haimes, and T. Chan (editors), Multiscale and Multiresolution Methods. Springer-Verlag, New York, 2001, pp. 3–95.
47. J. Cong and J. R. Shinnerl (editors). Multilevel Optimization in VLSICAD. Kluwer Academic Publishers, Boston, 2003.
48. K. Vorwerk and A. Kennings. Mixed-size placement via line search. In Proceedings of International Conference on Computer-Aided Design, San Jose, CA, pp. 899–904, 2005.
49. T. F. Chan, J. Cong, and K. Sze. Multilevel generalized force-directed method for circuit placement. In Proceedings of International Symposium on Physical Design, San Francisco, CA, pp. 185–192, 2005.
50. J. Cong, J. R. Shinnerl, M. Xie, T. Kong, and X. Yuan. Large-scale circuit placement. ACM Transactions on Design Automation of Electronic Systems, 10(2):389–430, 2005.
51. S. N. Adya, I. L. Markov, and P. Villarrubia. On whitespace and stability in mixed-size placement and physical synthesis. In Proceedings of International Conference on Computer-Aided Design, San Jose, CA, pp. 311–319, 2003.
52. S. G. Nash and A. Sofer. Linear and Nonlinear Programming. McGraw Hill, New York, 1996.
53. P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. Academic Press, London and New York, 1981. ISBN 0-12-283952-8.
54. http://www.sigda.org/ispd2006/contest.html.
55. K. Vorwerk, A. Kennings, and A. Vannelli. Engineering details of a stable force-directed placer. In Proceedings of International Conference on Computer-Aided Design, San Jose, CA, pp. 573–580, Nov 2004.
56. T. F. Chan, J. Cong, T. Kong, and J. Shinnerl. Multilevel optimization for large-scale circuit placement. In Proceedings of International Conference on Computer-Aided Design, pp. 171–176, San Jose, CA, Nov 2000.
57. I. Safro, D. Ron, and A. Brandt. Graph minimum linear arrangement by multilevel weighted edge contractions. Journal of Algorithms, 60(1):24–41, 2006.
58. J. Cong, M. Romesis, and J. Shinnerl. Robust mixed-size placement under tight white-space constraints. In Proceedings of International Conference on Computer-Aided Design, San Jose, CA, pp. 165–172, Nov 2005.
59. A. B. Kahng, P. Tucker, and A. Zelikovsky. Optimization of linear placements for wirelength minimization with free sites. In Proceedings of Asia South Pacific Design Automation Conference, Wanchai, Hong Kong, pp. 241–244, 1999.
60. M. Pan, N. Viswanathan, and C. Chu. An efficient and effective detailed placement algorithm. In Proceedings of International Conference on Computer-Aided Design, San Jose, CA, pp. 48–55, 2005.
61. G.-J. Nam, C. J. Alpert, P. Villarrubia, B. Winter, and M. Yildiz. The ISPD2005 placement contest and benchmark suite. In Proceedings of International Symposium on Physical Design, San Francisco, CA, pp. 216–220, Apr 2005.
62. G.-J. Nam. ISPD 2006 placement contest: Benchmark suite and results. In Proceedings of International Symposium on Physical Design, p. 167, New York, 2006. ACM Press.
63. B. Hu and M. Marek-Sadowska. FAR: Fixed-points addition & relaxation based placement. In Proceedings of International Symposium on Physical Design, pp. 161–166, New York, 2002. ACM Press.
64. W. C. Naylor, D. Ross, and S. Lu. Nonlinear optimization system and method for wire length and delay optimization for an automatic electric circuit placer, Oct 2001.
65. B. Hu, Y. Zeng, and M. Marek-Sadowska. mFAR: Fixed-points-addition-based VLSI placement algorithm. In Proceedings of International Symposium on Physical Design, San Francisco, CA, pp. 239–241, Apr 2005.

Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C020 Finals Page 399 23-9-2008 #2

20 Legalization and Detailed Placement

Ameya R. Agnihotri and Patrick H. Madden

CONTENTS
20.1 Introduction 399
20.1.1 Notation 402
20.1.2 Routing Models 402
20.2 Space Management 403
20.2.1 Flow-Based Overlap Removal 404
20.2.1.1 Setting Up and Solving the Transportation Problem 405
20.2.1.2 Calculation of Transportation Cost 406
20.2.2 Diffusion-Based Placement Migration 408
20.2.3 White Space Allocation 408
20.2.4 Computational Geometry-Based Placement Migration 409
20.2.5 Cell Shifting 410
20.2.6 Grid Warping 410
20.2.7 Space Management Summary 411
20.3 Legalization Techniques 411
20.3.1 Flow and Diffusion-Based Legalization 411
20.3.2 Tetris-Based Legalization 412
20.3.3 Single-Row Dynamic Programming-Based Legalization 412
20.4 Local Improvements 414
20.4.1 Cell Mirroring and Pin Assignment 414
20.4.2 Reordering of Cells 415
20.4.3 Optimal Interleaving 417
20.4.4 Linear Placement with Fixed Orderings 418
20.4.4.1 Notations and Assumptions 419
20.4.4.2 Analysis of the Cost Function 419
20.4.4.3 Dynamic Programming Algorithm 419
20.5 Limits of Legalization and Detailed Placement 420
References 421

20.1 INTRODUCTION

In this chapter, we survey work on space management, legalization, and detailed placement, the design flow steps normally falling between global placement and the start of routing. Over the past few years, the traditional physical design flow has evolved. Where there was once a sequence of discrete steps, one now sees a blurring of activities and a great deal of iterative improvement. The methods described here should not be viewed as standalone optimizations; rather, they should be considered as components in a more complex, multifaceted approach.
[Figure 20.1: design-flow diagram with boxes for Logic synthesis; Global placement; Congestion, timing, and thermal analysis; Space management; Legalization; Detailed placement (reordering, linear placement, mirroring); and Routing. Annotations: "Severe problems may require a return to global placement, or even logic synthesis" and "Small-scale timing and congestion problems may be resolved by stretching or shifting a placement to insert additional space."]

FIGURE 20.1 Traditional linear design flow has been replaced by a more iterative process. Legalization and detailed placement may reveal problems with routing or timing performance, necessitating changes to the placement and repeated steps. To enable design convergence, it is desirable for these changes to be incremental, with each new placement similar to the prior one. This chapter focuses on the topics indicated in boldface text.

In early design flows, the transition from global placement to detailed placement was relatively simple. The logic elements were aligned to cell rows, and then small local optimizations were performed. With changes to routing models and the dominance of interconnect delay, space management has fundamentally changed the design flow and has emerged as a key element of successful strategies. Figure 20.1 illustrates a current approach; following global placement, a number of methods can be used to analyze a placement. Congestion estimation [1] can identify regions where routing demand will likely exceed the available resources; an effective remedy is to insert additional white space between logic elements, spreading out the circuit and gaining more room for wiring. Similarly, timing analysis may find slow paths that can be improved through buffer insertion or gate sizing; again, the circuit may need to be spread out to provide room for the new logic elements.
Thermal hot spots are also a major concern on high-performance devices, and additional space is yet again needed.

A primary motivation for using a space-management-based approach is that it provides a measure of stability [2] in the design flow. If one were to return to global placement each time a routing or timing problem was encountered, it would be difficult to achieve design closure; a new placement might eliminate previous problems, but new problems are likely to arise. By shifting and adjusting an existing placement, it is easier to achieve design closure.

For most of the discussion, we focus on the simple objective of half-perimeter wirelength (HPWL) minimization. It should be noted, however, that HPWL is only an estimate of routing demand, and in many cases it can be far off. For nets with up to three pins, HPWL equals the length of a minimum rectilinear Steiner tree and is therefore the best achievable length; for higher-degree nets, both minimum spanning trees and Steiner trees can be longer. The actual length of the interconnect can be increased greatly by detours; for dense, congested designs, it may not be possible to avoid them. The routability of a circuit can be enhanced considerably by adding space into a placement; although this can increase HPWL, it may be necessary for successful routing, and it can actually improve routed wirelength by reducing the number of detours.

Even if routing lengths could be estimated accurately, length is not in itself a meaningful metric. Far more important is the delay of the circuitry, which determines the maximum operating frequency. Similarly, the length of the interconnect affects switching capacitance and power consumption, but the actual switching behavior must be considered to obtain an accurate estimate. Although low HPWL correlates with good performance, it should not be viewed as the sole metric for evaluating a placement.

We attempt to highlight how various optimization techniques interact with each other.
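Because HPWL is the working objective for much of this chapter, a small Python sketch may help make it concrete. It computes HPWL for a net given as (x, y) pin coordinates and, for three-pin nets, the length of the rectilinear star through the coordinate-wise median point, which is a minimum rectilinear Steiner tree for three pins; the two lengths agree, illustrating the three-pin claim above. The function names are illustrative only. As a contrast for four pins: at the corners of a 2 × 2 square, HPWL is 4, whereas the minimum rectilinear Steiner tree has length 6.

```python
import statistics


def hpwl(pins):
    """Half-perimeter wirelength of one net given (x, y) pin coordinates."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets):
    """Sum of HPWL over all nets (each net is a list of pin coordinates)."""
    return sum(hpwl(net) for net in nets)


def three_pin_steiner_length(pins):
    """Length of the rectilinear star through the coordinate-wise median point.

    For exactly three pins this star is a minimum rectilinear Steiner tree.
    """
    if len(pins) != 3:
        raise ValueError("expects exactly three pins")
    mx = statistics.median([x for x, _ in pins])
    my = statistics.median([y for _, y in pins])
    return sum(abs(x - mx) + abs(y - my) for x, y in pins)


if __name__ == "__main__":
    net3 = [(0, 0), (4, 1), (2, 5)]
    print(hpwl(net3), three_pin_steiner_length(net3))  # 9 9
    square = [(0, 0), (2, 0), (0, 2), (2, 2)]
    print(hpwl(square))  # 4, although any rectilinear Steiner tree needs 6
```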
Although mixing techniques results in better overall circuit designs, it also becomes more difficult to quantify the effect of each component.

Our discussion begins with a brief summary of routing models and how they have changed over the years. With modern designs, space management is essential for achieving routability; the semiconductor industry has switched from a variable-die design style to a fixed-die model, raising the distinct possibility that a dense design will fail to route successfully (Figure 20.2). Some optimization methods performed during detailed placement may seem counterintuitive unless one considers the routing constraints. After discussing the routing models, we focus on methods to distribute space within a global placement; this has been an active area over the past few years, and a great deal of progress has been made.

Successful routing is not the only reason that space insertion is of interest. High-performance designs commonly face problems with power delivery and heat removal; spacing out active devices spreads heating, resulting in lower peak temperatures. Yet another application of space insertion methods is to reserve area for timing optimization. As part of an iterative improvement process, individual logic gates may be resized, and buffers can be inserted into long wires. Designs that are not densely packed can accommodate these changes without a great deal of disruption to the overall structure.

After space insertion, a placement must be legalized. Standard cells must align to rows and may also need to follow a column grid. Overlaps among standard cells and macroblocks must be removed.
For legalization, some problems are easy, allowing a remarkably simple method to be used; one objective of space management methods can be to make legalization problems easy.

[Figure 20.2 panels: Variable-die model; Channel-based variable die with some over-the-cell routing; Fixed-die routing model. Panel notes: In variable-die designs, standard cell row spacing can be adjusted to match the routing demand, but an entire routing channel must be expanded to match the peak demand, potentially wasting resources in some areas. In the fixed-die routing model, no additional space is available between cell rows; this allows greater device density but poses more difficult routing problems. With modern fixed-die designs, standard cell rows have no spacing between them, allowing power and ground wiring to be shared; all routing occurs over the cell rows, and there is no simple way to gain additional routing space. Modern designs may include macroblocks, which can further disrupt routing and make space management more difficult.]

FIGURE 20.2 Increasing routing resources have caused routing to shift from a channel-based approach to over-the-cell routing. In the fixed-die, over-the-cell model, it may not be possible to shift logic elements apart to gain additional routing resources.