Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C026 Finals Page 535 29-9-2008 #2

26 Buffer Insertion Basics

Jiang Hu, Zhuo Li, and Shiyan Hu

CONTENTS
26.1 Motivation
26.2 Optimization of Two-Pin Nets
26.3 van Ginneken’s Algorithm
    26.3.1 Concept of Candidate Solution
    26.3.2 Generating Candidate Solutions
        26.3.2.1 Wire Insertion
        26.3.2.2 Buffer Insertion
        26.3.2.3 Branch Merging
    26.3.3 Inferiority and Pruning Identification
    26.3.4 Pseudocode
    26.3.5 Example
26.4 van Ginneken Extensions
    26.4.1 Handling Library with Multiple Buffers
    26.4.2 Library with Inverters
    26.4.3 Polarity Constraints
    26.4.4 Slew and Capacitance Constraints
    26.4.5 Integration with Wire Sizing
    26.4.6 Noise Constraints with Devgan Metric
        26.4.6.1 Devgan’s Coupling Noise Metric
        26.4.6.2 Algorithm of Buffer Insertion with Noise Avoidance
    26.4.7 Higher Order Delay Modeling
        26.4.7.1 Higher Order Point Admittance Model
        26.4.7.2 Higher Order Wire Delay Model
        26.4.7.3 Accurate Gate Delay
    26.4.8 Flip-Flop Insertion
26.5 Speedup Techniques
    26.5.1 Recent Speedup Results
    26.5.2 Predictive Pruning
    26.5.3 Convex Pruning
    26.5.4 Efficient Way to Find Best Candidates
    26.5.5 Implicit Representation
References

26.1 MOTIVATION

When the VLSI technology scales, gate delay and wire delay change in opposite directions. Smaller devices imply less gate-switching delay. In contrast, thinner wire size leads to increased wire resistance and greater signal propagation delay along wires. As a result, wire delay has become a dominating factor for VLSI circuit performance. Further, it is becoming a limiting factor to the progress of VLSI technology. This is the well-known interconnect challenge [1–3].
Among many techniques addressing this challenge [4,5], buffer (or repeater) insertion is so effective that it is indispensable for timing closure in submicron technology and beyond. Buffers can reduce wire delay by restoring signal strength, in particular for long wires. Moreover, buffers can be applied to shield capacitive load from timing-critical paths so that the interconnect delay along critical paths is reduced.

As the ratio of wire delay to gate delay increases from one technology to the next, more and more buffers are required to achieve performance goals. Buffer scaling was studied at Intel, and the results are reported in Ref. [6]. One metric that reveals the scaling is the critical buffer length, the minimum distance beyond which inserting an optimally placed and sized buffer makes the interconnect delay less than that of the corresponding unbuffered wire. When wire delay increases because of technology scaling, the critical buffer length becomes shorter, i.e., the distance that a buffer can comfortably drive shrinks. According to Ref. [6], the critical buffer length decreases by 68 percent when the VLSI technology migrates from 90 to 45 nm (two generations). Note that the critical buffer-length scaling significantly outpaces the VLSI technology scaling, which is roughly 0.5× for every two generations. The percentage of block-level nets requiring buffers grows from 5.8 percent in 90-nm technology to 19.6 percent in 45-nm technology [6]. Perhaps the most alarming result is the scaling of buffer count [6], which predicts that 35 percent of cells will be buffers in 45-nm technology as opposed to only 6 percent in 90-nm technology. This dramatic buffer scaling undoubtedly has a large and profound impact on VLSI circuit design.
With millions of buffers required per chip, almost nobody can afford to neglect the importance of buffer insertion, in contrast to a decade ago when only a few thousand buffers were needed for a chip [7]. Because of this importance, buffer insertion algorithms and methodologies need to be studied in depth from several angles. First, a buffer insertion algorithm should deliver solutions of high quality because interconnect and circuit performance largely depend on the way that buffers are placed. Second, a buffer insertion algorithm needs to be sufficiently fast so that millions of nets can be optimized in reasonable time. Third, accurate delay models are necessary to ensure that buffer insertion solutions are reliable. Fourth, buffer insertion techniques are expected to simultaneously handle multiple objectives, such as timing, power, and signal integrity, and their trade-offs. Last but not least, buffer insertion should interact with other layout steps, such as placement and routing, as the sheer number of buffers has already altered the landscape of circuit layout design. Many of these issues will be discussed in subsequent sections and other chapters.

26.2 OPTIMIZATION OF TWO-PIN NETS

For buffer insertion, perhaps the simplest case is a two-pin net, which is a wire segment with a driver (source) at one end and a sink at the other end. This simplicity allows closed-form solutions to buffer insertion in two-pin nets.

If the delay of a two-pin net is to be minimized by using a single buffer type $b$, one needs to decide the number of buffers $k$ and the spacing between the buffers, the source, and the sink. First, let us look at a very simple case to attain an intuitive understanding of the problem. In this case, the length of the two-pin net is $l$ and the wire resistance and capacitance per unit length are $r$ and $c$, respectively. The number of buffers $k$ is given and fixed. The driver resistance is the same as the buffer output resistance $R_b$.
The load capacitance of the sink is identical to the buffer input capacitance $C_b$. The buffer has an intrinsic delay of $t_b$. The $k$ buffers separate the net into $k + 1$ segments with lengths $\mathbf{l} = (l_0, l_1, \ldots, l_k)^T$ (Figure 26.1). Then, the Elmore delay of this net can be expressed as

$$t(\mathbf{l}) = \sum_{i=0}^{k} \left( \alpha l_i^2 + \beta l_i + \gamma \right) \tag{26.1}$$

where $\alpha = \frac{1}{2} rc$, $\beta = R_b c + r C_b$, and $\gamma = R_b C_b + t_b$.

FIGURE 26.1 Buffer insertion in a two-pin net. [Drawing: a driver, buffers 1 through $k$, and a sink, separated by wire segments $l_0, l_1, l_2, \ldots, l_k$.]

A formal problem formulation is

$$\text{minimize} \quad t(\mathbf{l}) \tag{26.2}$$

$$\text{subject to} \quad g(\mathbf{l}) = l - \sum_{i=0}^{k} l_i = 0 \tag{26.3}$$

According to the Kuhn–Tucker condition [8], the following equation is the necessary condition for the optimal solution:

$$\nabla t(\mathbf{l}) + \lambda \nabla g(\mathbf{l}) = 0 \tag{26.4}$$

where $\lambda$ is the Lagrangian multiplier. Because $\partial t / \partial l_i = 2\alpha l_i + \beta$ and $\partial g / \partial l_i = -1$, each component of this condition reads $2\alpha l_i + \beta - \lambda = 0$, so that

$$l_i = \frac{\lambda - \beta}{2\alpha}, \quad i = 0, 1, \ldots, k \tag{26.5}$$

Because $\alpha$, $\beta$, and $\lambda$ are all constants, the buffers need to be equally spaced to minimize the delay. This is an important conclusion that can be treated as a rule of thumb. The value of the Lagrangian multiplier $\lambda$ can be found by plugging Equation 26.5 into Equation 26.3, which gives $\lambda = \beta + 2\alpha l/(k+1)$ and hence $l_i = l/(k+1)$.

In more general cases, the driver resistance $R_d$ may differ from the buffer output resistance $R_b$, and the sink capacitance $C_L$ may differ from the buffer input capacitance $C_b$. For such cases, the optimum number of buffers minimizing the delay is given by Ref. [9]:

$$k = \left\lfloor -\frac{1}{2} + \sqrt{1 + \frac{2\left[ rcl + r(C_b - C_L) - c(R_b - R_d) \right]^2}{rc\,(R_b C_b + t_b)}} \right\rfloor \tag{26.6}$$

The length of each segment can be obtained through [9]

$$
\begin{aligned}
l_0 &= \frac{1}{k+1} \left( l + \frac{k(R_b - R_d)}{r} + \frac{C_L - C_b}{c} \right) \\
l_1 = \cdots = l_{k-1} &= \frac{1}{k+1} \left( l - \frac{R_b - R_d}{r} + \frac{C_L - C_b}{c} \right) \\
l_k &= \frac{1}{k+1} \left( l - \frac{R_b - R_d}{r} - \frac{k(C_L - C_b)}{c} \right)
\end{aligned}
\tag{26.7}
$$

A closed-form solution to simultaneous buffer insertion/sizing and wire sizing is reported in Ref. [10]. Figure 26.2 shows an example of this simultaneous optimization.
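The equal-spacing rule of Equation 26.5 and the segment lengths of Equation 26.7 can be checked numerically. The following is a minimal sketch, not part of the original chapter: the function names and all parameter values are hypothetical and unitless, and the search over $k$ is a plain brute-force sweep for the symmetric case rather than an evaluation of Equation 26.6.

```python
def stage_delay(length, r, c, R_b, C_b, t_b):
    """One term of Equation 26.1: alpha*l_i^2 + beta*l_i + gamma."""
    alpha = 0.5 * r * c
    beta = R_b * c + r * C_b
    gamma = R_b * C_b + t_b
    return alpha * length ** 2 + beta * length + gamma


def net_delay(lengths, r, c, R_b, C_b, t_b):
    """Equation 26.1 for the symmetric case (driver = R_b, sink = C_b)."""
    return sum(stage_delay(x, r, c, R_b, C_b, t_b) for x in lengths)


def equal_spacing(l, k):
    """Equation 26.5 combined with constraint 26.3: l_i = l / (k + 1)."""
    return [l / (k + 1)] * (k + 1)


def general_spacing(l, k, r, c, R_b, C_b, R_d, C_L):
    """Equation 26.7 for k >= 1 buffers, driver R_d, and sink load C_L."""
    dR = (R_b - R_d) / r
    dC = (C_L - C_b) / c
    first = (l + k * dR + dC) / (k + 1)
    middle = (l - dR + dC) / (k + 1)
    last = (l - dR - k * dC) / (k + 1)
    return [first] + [middle] * (k - 1) + [last]


# Hypothetical parameters, chosen only to keep the arithmetic simple.
params = dict(r=1.0, c=1.0, R_b=2.0, C_b=1.0, t_b=1.0)
l = 10.0

# Equal spacing beats an uneven split of the same total length.
even = net_delay(equal_spacing(l, 3), **params)
skewed = net_delay([3.0, 2.0, 2.5, 2.5], **params)
print(even, skewed)

# Brute-force choice of k for the symmetric case (assumed search range 0..9).
best_k = min(range(0, 10), key=lambda k: net_delay(equal_spacing(l, k), **params))
print(best_k)

# Equation 26.7: a stronger driver lengthens l_0, a heavier sink shortens l_k.
print(general_spacing(l, 3, r=1.0, c=1.0, R_b=2.0, C_b=1.0, R_d=1.0, C_L=3.0))
```

With these values the even split gives a smaller delay than the skewed one, and the segment lengths from Equation 26.7 still add up to the net length $l$, as constraint 26.3 requires.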
The wire is segmented into $m$ pieces. The length $l_i$ and width $h_i$ of each wire piece $i$ are the variables to be optimized. There are $k$ buffers inserted between these pieces. The size $b_i$ of each buffer $i$ is also a decision variable. A buffer location is indicated by its surrounding wire pieces. For example, if the set of wire pieces between buffers $i - 1$ and $i$ is $P_{i-1}$, the distance between the two buffers is equal to $\sum_{j \in P_{i-1}} l_j$. There are two important conclusions [10] for the optimal solution that minimizes the delay. First, all wire pieces have the same length, i.e., $l_i = l/m$, $i = 1, 2, \ldots, m$. Second, for wire pieces $P_{i-1} = \{p_{i-1,1}, p_{i-1,2}, \ldots, p_{i-1,m_{i-1}}\}$ between buffers $i - 1$ and $i$, their widths satisfy $h_{i-1,1} > h_{i-1,2} > \cdots > h_{i-1,m_{i-1}}$ and form a geometric progression.

FIGURE 26.2 Example of simultaneous buffer insertion/sizing and wire sizing. [Drawing: wire pieces with lengths $l_1, l_2, \ldots, l_m$ and widths $h_1, h_2, \ldots, h_m$, buffers $b_1, \ldots, b_k$, and groups of $m_0$ through $m_k$ segments.]

26.3 VAN GINNEKEN’S ALGORITHM

For a general case of signal nets, which may have multiple sinks, van Ginneken’s algorithm [11] is perhaps the first systematic approach to buffer insertion. For a fixed signal routing tree and given candidate buffer locations, van Ginneken’s algorithm can find the optimal buffering solution that maximizes timing slack according to the Elmore delay model. If there are $n$ candidate buffer locations, its computational complexity is $O(n^2)$. Based on van Ginneken’s algorithm, numerous extensions have been made, such as handling of multiple buffer types, trade-off with power and cost, addressing slew rate and crosstalk noise, and using accurate delay models and speedup techniques. These extensions will be covered in subsequent sections.

At a high level, van Ginneken’s algorithm [11] proceeds bottom-up from the leaf nodes toward the driver along a given routing tree.
A set of candidate solutions is kept updated during the process, in which three operations may be performed: adding a wire, inserting a buffer, and merging branches. Meanwhile, the inferior solutions are pruned to accelerate the algorithm. After a set of candidate solutions is propagated to the source, the solution with the maximum required arrival time is selected as the final solution. For a routing tree with $n$ buffer positions, the algorithm computes the optimal buffering solution in $O(n^2)$ time.

A net is given as a binary routing tree $T = (V, E)$, where $V = \{s_0\} \cup V_s \cup V_n$ and $E \subseteq V \times V$. Vertex $s_0$ is the source vertex and also the root of $T$, $V_s$ is the set of sink vertices, and $V_n$ is the set of internal vertices. In the literature, $s_0$ is also referred to as the driver. Denote by $T(v)$ the subtree of $T$ rooted at $v$. Each sink vertex $s \in V_s$ is associated with a sink capacitance $C(s)$ and a required arrival time (RAT). Each edge $e \in E$ is associated with lumped resistance $R(e)$ and capacitance $C(e)$. A buffer library $B$ containing all the possible buffer types that can be assigned to a buffer position is also given. In this section, $B$ contains only one buffer type. Delay estimation is obtained using the Elmore delay model, which is described in Chapter 3. A buffer assignment $\gamma$ is a mapping $\gamma : V_n \to B \cup \{\bar{b}\}$, where $\bar{b}$ denotes that no buffer is inserted. The timing buffering problem is defined as follows.

Timing-driven buffer insertion problem: Given a binary routing tree $T = (V, E)$, possible buffer positions, and a buffer library $B$, compute a buffer assignment $\gamma$ such that the RAT at the driver is maximized.

26.3.1 CONCEPT OF CANDIDATE SOLUTION

A buffer assignment $\gamma$ is also called a candidate solution for the timing buffering problem. A partial solution, denoted by $\gamma_v$, refers to an incomplete solution where the buffer assignment in $T(v)$ has been determined.
The Elmore delay from $v$ to any sink $s$ in $T(v)$ under $\gamma_v$ is computed by

$$D(s, \gamma_v) = \sum_{e = (v_i, v_j)} \left[ D(v_i) + D(e) \right]$$

where the sum is taken over all edges along the path from $v$ to $s$. Here $D(v_i)$ is the delay of the buffer at $v_i$ (zero if no buffer is placed there) and $D(e)$ is the delay of wire $e$. The slack of vertex $v$ under $\gamma_v$ is defined as

$$Q(\gamma_v) = \min_{s \in T(v)} \left\{ \mathrm{RAT}(s) - D(s, \gamma_v) \right\}$$

At any vertex $v$, the effect of a partial solution $\gamma_v$ on its upstream part is characterized by a $(Q(\gamma_v), C(\gamma_v))$ pair, where $Q$ is the slack at $v$ under $\gamma_v$ and $C$ is the downstream capacitance seen at $v$ under $\gamma_v$.

26.3.2 GENERATING CANDIDATE SOLUTIONS

van Ginneken’s algorithm proceeds bottom-up from the leaf nodes toward the driver along $T$. A set of candidate solutions, denoted by $\Gamma$, is kept updated during this process. There are three operations through solution propagation, namely, wire insertion, buffer insertion, and branch merging (Figure 26.3). We describe them in turn.

26.3.2.1 Wire Insertion

Suppose that a partial solution $\gamma_v$ at position $v$ propagates to an upstream position $u$ and there is no branching point in between. If no buffer is placed at $u$, then only wire delay needs to be considered. Therefore, the new solution $\gamma_u$ can be computed as

$$
\begin{aligned}
Q(\gamma_u) &= Q(\gamma_v) - D(e) \\
C(\gamma_u) &= C(\gamma_v) + C(e)
\end{aligned}
\tag{26.8}
$$

where $e = (u, v)$ and $D(e) = R(e) \left( \frac{C(e)}{2} + C(\gamma_v) \right)$.

26.3.2.2 Buffer Insertion

Suppose that we add a buffer $b$ at $u$. Denote by $R(b)$ and $K(b)$ the driving resistance and the intrinsic delay of buffer $b$, respectively, and by $C(b)$ its input capacitance. Then $\gamma_u$ is updated to $\gamma'_u$, where

$$
\begin{aligned}
Q(\gamma'_u) &= Q(\gamma_u) - R(b) \cdot C(\gamma_u) - K(b) \\
C(\gamma'_u) &= C(b)
\end{aligned}
\tag{26.9}
$$

Both the drive delay $R(b) \cdot C(\gamma_u)$ and the intrinsic delay $K(b)$ reduce the slack, in agreement with the pseudocode in Figure 26.4.

26.3.2.3 Branch Merging

When two branches $T_l$ and $T_r$ meet at a branching point $v$, the solution sets $\Gamma_l$ and $\Gamma_r$, which correspond to $T_l$ and $T_r$, respectively, are to be merged. The merging process is performed as follows.
For each solution $\gamma_l \in \Gamma_l$ and each solution $\gamma_r \in \Gamma_r$, generate a new solution $\gamma'$ according to

$$
\begin{aligned}
C(\gamma') &= C(\gamma_l) + C(\gamma_r) \\
Q(\gamma') &= \min \left\{ Q(\gamma_l), Q(\gamma_r) \right\}
\end{aligned}
\tag{26.10}
$$

The smaller $Q$ is picked because the worst-case circuit performance needs to be considered.

FIGURE 26.3 Operations in van Ginneken’s algorithm: (a) wire insertion, (b) buffer insertion, and (c) branch merging. [Drawing: panels (a) and (b) show a vertex $u$ upstream of $v$ with subtrees $T(u)$ and $T(v)$; panel (c) shows branches $T(v_1)$ and $T(v_2)$ rooted at $v_1$ and $v_2$ meeting at $v$.]

26.3.3 INFERIORITY AND PRUNING IDENTIFICATION

Simply propagating all solutions by the above three operations makes the solution set grow exponentially in the number of buffer positions processed. An effective and efficient pruning technique is necessary to reduce the size of the solution set. This motivates an important concept in van Ginneken’s algorithm: the inferior solution. For any two partial solutions $\gamma_1$ and $\gamma_2$ at the same vertex $v$, $\gamma_2$ is inferior to $\gamma_1$ if $C(\gamma_1) \le C(\gamma_2)$ and $Q(\gamma_1) \ge Q(\gamma_2)$. Whenever a solution becomes inferior, it is pruned from the solution set. Therefore, only solutions that excel in at least one aspect of downstream capacitance and slack can survive.

For an efficient pruning implementation and thus an efficient buffering algorithm, a sorted list is used to maintain the solution set. The solution set $\Gamma$ is sorted in increasing order of $C$, and thus $Q$ is also in increasing order if $\Gamma$ does not contain any inferior solutions.

By a straightforward implementation, when adding a wire, the number of candidate solutions will not change; when inserting a buffer, only one new candidate solution will be introduced.

More effort is needed to merge two branches $T_l$ and $T_r$ at $v$. For each partial solution in $\Gamma_l$, find the first solution with larger $Q$ value in $\Gamma_r$. If such a solution does not exist, the last solution in $\Gamma_r$ will be taken. Because $\Gamma_l$ and $\Gamma_r$ are sorted, we only need to traverse them once.
Partial solutions in $\Gamma_r$ are similarly treated. It is easy to see that after merging, the number of solutions is at most $|\Gamma_l| + |\Gamma_r|$. As such, given $n$ buffer positions, at most $n$ solutions can be generated at any time. Consequently, the pruning procedure at any vertex in $T$ runs in $O(n)$ time.

26.3.4 PSEUDOCODE

In van Ginneken’s algorithm, a set of candidate solutions is propagated from the sinks to the driver. Along a branch, after a candidate buffer location $v$ is processed, all solutions are propagated to its upstream buffer location $u$ through wire insertion. A buffer is then inserted into each solution to obtain a new solution. Meanwhile, inferior solutions are pruned. At a branching point, the solution sets from all branches are merged by the merging process. In this way, the algorithm proceeds in a bottom-up fashion, and the solution with the maximum required arrival time at the driver is returned. Given $n$ buffer positions in $T$, van Ginneken’s algorithm can compute a buffer assignment with maximum slack at the driver in $O(n^2)$ time, because any operation at any node can be performed in $O(n)$ time. Refer to Figure 26.4 for the pseudocode of van Ginneken’s algorithm.

26.3.5 EXAMPLE

Let us look at a simple example to illustrate the work flow of van Ginneken’s algorithm. Refer to Figure 26.5. Assume that there are three nondominated solutions at $v_3$ whose $(Q, C)$ pairs are

$(200, 10)$, $(300, 30)$, and $(500, 50)$

and there are two nondominated solutions at $v_2$ whose $(Q, C)$ pairs are

$(290, 5)$ and $(350, 20)$

We first propagate them to $v_1$ through wire insertion. Assume that $R(v_1, v_3) = 3$ and $C(v_1, v_3) = 2$. Solution $(200, 10)$ at $v_3$ becomes $(200 - 3 \cdot (2/2 + 10), 10 + 2) = (167, 12)$ at $v_1$. Similarly, the other two solutions become $(207, 32)$ and $(347, 52)$. Assuming that $R(v_1, v_2) = 2$ and $C(v_1, v_2) = 2$, the solutions at $v_2$ become $(278, 7)$ and $(308, 22)$ at $v_1$.

We are now to merge these solutions at $v_1$. Denote by $\Gamma_l$ the solutions propagated from $v_3$ and by $\Gamma_r$ the solutions propagated from $v_2$.
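The wire-insertion arithmetic of this example can be reproduced with a few lines of Python. This is an illustrative sketch only: the function names `add_wire`, `add_buffer`, and `prune` are hypothetical, candidates are plain `(Q, C)` tuples, and the buffer parameters in the last step are made-up values that do not come from the example.

```python
def add_wire(cands, R_e, C_e):
    """Equation 26.8: propagate each (Q, C) candidate across wire e."""
    return [(q - R_e * (C_e / 2 + c), c + C_e) for q, c in cands]


def add_buffer(cands, R_b, C_b, K_b):
    """Equation 26.9. Every buffered candidate has C = C_b, so only the one
    with the largest Q can survive pruning; it is the single new candidate."""
    best_q = max(q - R_b * c - K_b for q, c in cands)
    return cands + [(best_q, C_b)]


def prune(cands):
    """Drop inferior candidates (Section 26.3.3).

    Uses sorted() for brevity, so this sketch costs O(n log n) rather than
    the linear pass possible on an already sorted list."""
    kept = []
    # Increasing C; for equal C the larger Q comes first.
    for q, c in sorted(cands, key=lambda s: (s[1], -s[0])):
        # Everything kept so far has C <= c, so (q, c) must improve Q.
        if not kept or q > kept[-1][0]:
            kept.append((q, c))
    return kept


at_v3 = [(200, 10), (300, 30), (500, 50)]
at_v2 = [(290, 5), (350, 20)]

gamma_l = prune(add_wire(at_v3, R_e=3, C_e=2))
gamma_r = prune(add_wire(at_v2, R_e=2, C_e=2))
print(gamma_l)  # [(167.0, 12), (207.0, 32), (347.0, 52)]
print(gamma_r)  # [(278.0, 7), (308.0, 22)]

# Hypothetical buffer (R_b=1, C_b=5, K_b=10), not part of the example:
# the buffered candidate (285, 5) makes two unbuffered candidates inferior.
print(prune(add_buffer(gamma_l, R_b=1, C_b=5, K_b=10)))  # [(285.0, 5), (347.0, 52)]
```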
Before merging, the partial solutions in $\Gamma_l$ are

$(167, 12)$, $(207, 32)$, and $(347, 52)$

and the partial solutions in $\Gamma_r$ are

$(278, 7)$ and $(308, 22)$

After branch merging, the new candidate partial solutions whose $Q$ values are dictated by solutions in $\Gamma_l$ are

$(167, 19)$, $(207, 39)$, and $(308, 74)$

and those dictated by solutions in $\Gamma_r$ are

$(278, 59)$ and $(308, 74)$

FIGURE 26.4 van Ginneken’s algorithm.

    Algorithm: van Ginneken’s algorithm
    Input: $T$: routing tree, $B$: buffer library
    Output: $\gamma$ which maximizes slack at driver

    for each sink $s$, build a solution set $\{\gamma_s\}$, where $Q(\gamma_s) = \mathrm{RAT}(s)$ and $C(\gamma_s) = C(s)$
    for each branching point/driver $v_t$ in the order given by a postorder traversal of $T$,
      let $T'$ be each of the branches $T_1$, $T_2$ of $v_t$ and $\Gamma'$ be the solution set corresponding to $T'$, do
        for each wire $e$ in $T'$, in a bottom-up order, do
            for each $\gamma \in \Gamma'$, do
                $C(\gamma) = C(\gamma) + C(e)$
                $Q(\gamma) = Q(\gamma) - D(e)$
            prune inferior solutions in $\Gamma'$
            if the current position allows buffer insertion, then
                for each $\gamma \in \Gamma'$, generate a new solution $\gamma'$
                    set $C(\gamma') = C(b)$
                    set $Q(\gamma') = Q(\gamma) - R(b) \cdot C(\gamma) - K(b)$
                $\Gamma' = \Gamma' \cup \{\gamma'\}$ and prune inferior solutions
        // merge $\Gamma_1$ and $\Gamma_2$ to $\Gamma_{v_t}$
        set $\Gamma_{v_t} = \emptyset$
        for each $\gamma_1 \in \Gamma_1$ and $\gamma_2 \in \Gamma_2$, generate a new solution $\gamma'$
            set $C(\gamma') = C(\gamma_1) + C(\gamma_2)$
            set $Q(\gamma') = \min\{Q(\gamma_1), Q(\gamma_2)\}$
        $\Gamma_{v_t} = \Gamma_{v_t} \cup \{\gamma'\}$ and prune inferior solutions
    return $\gamma$ with the largest slack

FIGURE 26.5 Example for performing van Ginneken’s algorithm. [Drawing: routing tree with source $s_0$, internal vertices $v_1$, $v_2$, $v_3$, and sinks $s_1$, $s_2$, $s_3$, $s_4$.]
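The procedure of Figure 26.4 can be condensed into a short, runnable Python sketch for a single buffer type. It is an illustration under stated assumptions, not the reference implementation: the `Node` class and all function names are hypothetical, buffer positions are taken to be internal vertices (a buffer at $v$ drives $T(v)$), the driver is modeled by an assumed output resistance `R_d`, and the merge uses a common two-pointer formulation instead of the per-solution search described in Section 26.3.3. Pruning relies on `sorted()`, so the sketch does not achieve the $O(n^2)$ bound.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A candidate is (Q, C, buffers): slack, downstream capacitance, and the
# names of the vertices that carry a buffer in this partial solution.
Cand = Tuple[float, float, Tuple[str, ...]]


@dataclass
class Node:
    name: str
    R_e: float = 0.0          # resistance of the wire from this vertex to its parent
    C_e: float = 0.0          # capacitance of that wire
    rat: float = 0.0          # required arrival time (sinks only)
    cap: float = 0.0          # sink capacitance (sinks only)
    buffer_ok: bool = False   # True if a buffer may be placed at this vertex
    children: List["Node"] = field(default_factory=list)


def prune(cands: List[Cand]) -> List[Cand]:
    """Drop inferior candidates; the result is sorted by increasing C and Q."""
    kept: List[Cand] = []
    for cand in sorted(cands, key=lambda s: (s[1], -s[0])):
        if not kept or cand[0] > kept[-1][0]:
            kept.append(cand)
    return kept


def merge(left: List[Cand], right: List[Cand]) -> List[Cand]:
    """Two-pointer branch merge (Equation 26.10) over pruned, sorted lists."""
    out: List[Cand] = []
    i = j = 0
    while i < len(left) and j < len(right):
        ql, cl, bl = left[i]
        qr, cr, br = right[j]
        out.append((min(ql, qr), cl + cr, bl + br))
        # Advance past the candidate that limits Q: pairing it with a
        # larger-C partner could only produce an inferior solution.
        if ql <= qr:
            i += 1
        if qr <= ql:
            j += 1
    return out


def candidates(v: Node, R_b: float, C_b: float, K_b: float) -> List[Cand]:
    """Candidates for T(v), as seen at the upstream end of the wire above v."""
    if not v.children:
        cands: List[Cand] = [(v.rat, v.cap, ())]
    else:
        cands = candidates(v.children[0], R_b, C_b, K_b)
        for child in v.children[1:]:
            cands = merge(cands, candidates(child, R_b, C_b, K_b))
        if v.buffer_ok:
            # Equation 26.9: only the best buffered candidate can survive.
            q, c, b = max(cands, key=lambda s: s[0] - R_b * s[1])
            cands = prune(cands + [(q - R_b * c - K_b, C_b, b + (v.name,))])
    # Equation 26.8: wire from v up to its parent.
    return prune([(q - v.R_e * (v.C_e / 2 + c), c + v.C_e, b) for q, c, b in cands])


def van_ginneken(root: Node, R_d: float, R_b: float, C_b: float, K_b: float):
    """Return (slack at the driver, buffered vertices) for the best candidate."""
    cands = candidates(root, R_b, C_b, K_b)
    return max(((q - R_d * c, b) for q, c, b in cands), key=lambda t: t[0])


# Merge step of the example above.
gamma_l = [(167, 12, ()), (207, 32, ()), (347, 52, ())]
gamma_r = [(278, 7, ()), (308, 22, ())]
print([(q, c) for q, c, _ in merge(gamma_l, gamma_r)])
# [(167, 19), (207, 39), (278, 59), (308, 74)]

# Hypothetical two-pin net: driver s0 -> v1 (buffer allowed) -> sink s1.
s1 = Node("s1", R_e=10, C_e=10, rat=1000, cap=5)
v1 = Node("v1", R_e=10, C_e=10, buffer_ok=True, children=[s1])
s0 = Node("s0", children=[v1])
print(van_ginneken(s0, R_d=10, R_b=1, C_b=1, K_b=5))  # (710.0, ('v1',))
```

The merge output is the union of the two lists given in the example, with the duplicate $(308, 74)$ appearing once. In the hypothetical two-pin net, buffering at `v1` raises the slack at the driver from 450 (unbuffered) to 710.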