Handbook of Algorithms for Physical Design Automation, Part 48

... interconnect regions as a result of clustering. The third approach is to balance the perimeter-degree of the partitions during the partitioning phase of the multilevel placement. One could use a multiconstraint partitioner to solve the problem of balancing area and perimeter-degree simultaneously. Alternatively, one could satisfy both constraints in a sequential manner by first balancing the perimeter-degree and then balancing the areas of the partitions. The authors present extensive empirical data to validate their claims about the fidelity of perimeter-degree as a simple and effective metric for homogenizing interconnection complexity.

22.3 GLOBAL-PLACEMENT CONGESTION IMPROVEMENT

There have been several studies on incorporating the congestion metric during the global-placement stage of the physical implementation flow. Local wiring has a big impact on the final congestion characteristics of a design; hence, historically, it has been difficult to robustly address congestion during the global-placement stage. There have been major advances in addressing congestion during global placement over the past decade. In this section, we detail some of these approaches.

22.3.1 INCORPORATING CONGESTION ESTIMATION DURING GLOBAL PLACEMENT

Placement algorithms need a good and fast estimate of wiring requirements if they intend to target congestion as a metric during the placement process. Other chapters in this book detail several of these wiring density estimation approaches [10,19,27,39]. The works in Refs. [6,29] have proposed to incorporate congestion estimation techniques within partitioning-driven quadratic placers in interesting ways.

The authors of Ref. [29] base their work on a quadratic placement engine that solves an unconstrained minimization problem, the objective function of which is the squared wirelength of the netlist. Because the quadratic solution in general has many overlapping cells, the overlap is resolved by partitioning the solution and iterating over the partitioned solutions [23]. The quadratic wirelength minimum solution serves to guide a mincut partitioning routine. After each round of partitioning, the cells in a region are constrained to that region by introducing a center-of-gravity constraint for each region in subsequent quadratic minimization formulations. Figure 22.5a illustrates the proposed congestion-driven placement methodology. Before each successive placement, internal route estimation and a region-based global route are performed on each region to estimate routing supply-demand ratios. These ratios are used to influence the placer into growing or shrinking the partitioned regions based on resource demand and supply. The region router estimates the routing demand of wires spanning multiple regions and is implemented using an A* algorithm [5] on the region-based graph. Once routing demand is computed, this information is used to grow or shrink the regions of the current placement. Regions with higher routing demand are allocated more white space by growing them.
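The following is a minimal sketch of how per-region supply/demand estimates might be mapped to growth factors. The region representation and the clamping policy are assumptions for illustration, not the exact procedure of Ref. [29].

```python
# Sketch: derive per-region growth factors from routing supply/demand estimates.
# The Region fields and the clamp limits are hypothetical stand-ins.

from dataclasses import dataclass

@dataclass
class Region:
    name: str
    routing_supply: float   # available routing tracks in the region
    routing_demand: float   # estimated tracks needed (region router + internal estimate)

def growth_factors(regions, min_scale=0.8, max_scale=1.25):
    """Return a per-region scale factor: >1 grows a congested region, <1 shrinks it."""
    factors = {}
    for r in regions:
        if r.routing_supply <= 0:
            factors[r.name] = max_scale
            continue
        ratio = r.routing_demand / r.routing_supply
        # Grow regions whose demand exceeds supply, shrink underused ones, clamped so a
        # single iteration does not distort the placement too much.
        factors[r.name] = min(max_scale, max(min_scale, ratio))
    return factors

regions = [Region("R0", routing_supply=100, routing_demand=140),
           Region("R1", routing_supply=100, routing_demand=60)]
print(growth_factors(regions))   # e.g. {'R0': 1.25, 'R1': 0.8}
```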
For q regions in a placement stage, a growth matrix G is defined as an (n − q) × (n − q) diagonal matrix with entry g_ii equal to the region weight of the independent cell x_i. The growth matrix is computed as follows. After a round of quadratic minimization and partitioning, a set of new regions is generated. Congestion analysis based on the router is performed on this new set of regions. The routing cost is divided into two parts: (1) an external cost computed using the region router for nets spanning multiple regions and (2) an internal cost computed using a line-probe router for intraregion routes. The difference between the routing supply and demand of a region determines the extra routing tracks required and, hence, the growth required by the region. The reader is referred to the original publication [29] for details on how the growth matrix is embedded in the quadratic formulation of the center-of-gravity constraints. The effect of the growth matrix terms in the quadratic formulation is to grow and shrink the regions according to the growth matrix. Reduction in congestion occurs due to the ability to transform horizontal routes into vertical ones and vice versa. This is shown in Figure 22.5b, where vertical expansion is produced for a vertically congested region.

FIGURE 22.5 (a) Congestion-driven placement methodology (quadratic solver, congestion estimator, partitioner, and region router driving region expansion/compression) and (b) example of region growth relieving congestion. (From Parakh, P. N., Brown, R. B., and Sakallah, K. A., DAC, 1998.)

The work in Ref. [6] also targets a flow to avoid routing congestion during global placement. The main contribution is to use a fast but reliable way to detect routing criticalities and then use the information effectively in a partitioning-driven placer. The techniques are tested on large real-world industrial designs with very good results. The framework for their studies is based on a four-way partitioning (quadrisection)-based quadratic placer, BonnPlace [37]. The placement begins by solving for a quadratic wirelength minimum solution, as described in the previous paragraph. This overlapping placement is then used by a quadrisection routine to generate four partitions of the netlist. The objective of the quadrisection is to divide the area of the die into approximately equal regions and assign cells to each region such that the density requirements for each region are not violated and the displacement of the cells from their initial quadratic wirelength minimum locations is minimized. Center-of-gravity constraints are then imposed on the cells assigned to their particular regions for subsequent quadratic wirelength minimization. This process is iterated until the number of cells in a region is small enough for detailed placement techniques to place. Like the work in Ref. [29], the authors of Ref. [6] also try to relieve congestion during global-placement iterations by allocating more white space to perceived congested areas. However, the mechanism is very different.

First, we describe the measurement of congestion of a placement as it appears during the placement algorithm, as proposed in Ref. [6]. Given a chip that is partitioned into k × k regions (forming a (k + 1) × (k + 1) placement grid), a pin density for each region (for the local congestion cost) and a congestion estimate for each edge in the dual graph of the placement grid (for the interregion congestion cost) are computed. For a fast estimation of the global routes, a probabilistic global router is used. During a placement level (iteration), the current status of the placement grid is used as the global routing partition.
Figure 22.6b shows the routing grid as a dual of the placement grid. Multiterminal nets are split into sets of two-terminal nets using a Steiner tree decomposition of the net. For each two-point connection in the Steiner tree of a net, probabilistic routing similar to the algorithms proposed in Refs. [27,39] is used. These probabilities are then added over all nets for each routing grid box to give the expected usage p(e). Figure 22.6c shows how the probabilities are calculated for two connections and then added for each grid box.

FIGURE 22.6 Calculation of routing congestion using probabilistic routing analysis. Given a placement of a chip partitioned into a k × k grid shown in (a), the nets are decomposed into two-pin connections using Steiner tree decomposition as shown in (b). Each two-pin connection is then routed probabilistically as shown in (c). (From Brenner, U. and Rohe, A., ISPD, 2002.)

The fraction cong(e) = p(e)/cap(e), where cap(e) is the routing capacity, is defined as the estimated congestion for an edge e ∈ E(G) in the dual of the placement grid shown in Figure 22.6b. To account for routability problems inside a routing grid box, the pin density pin_dens(R) inside a region R is used as a second metric. To use the computed congestion data, the authors rely on the popular technique of inflating groups of cells that are deemed to be in a congested region. The size of a congested cell is increased from its normal geometric area s(c) = x(c) · y(c) to an increased value s'(c) = [1 + b(c)] · s(c), with b(c) ≥ 0. During the partitioning stage, the new cell sizes are used to satisfy the density requirements, such that for a set C(R) being partitioned into subsets C(R_1), ..., C(R_4), the condition Σ_{c ∈ C(R_i)} s'(c) ≤ s(R_i) is satisfied for i = 1, ..., 4. The numbers b(c) depend upon an input parameter τ ≥ 0, which is an upper bound on the total increment of b(c) for each placement level. The initial value of b(c) for each cell is proportional to the pin density of the cell. During placement, if a cell is deemed to be in a congested region R, then b(c) is increased by min{1, 2[cong(e_i) − 1]}. Once the cells in congested regions are inflated, they are spread around using a repartitioning method in the placer, which respects the new inflated sizes of the cells and moves cells out of regions that are too full. Repartitioning is called after the normal partitioning stage. It considers 2 × 2 windows of adjacent regions R_1, ..., R_4 and forms a larger region R = R_1 ∪ ... ∪ R_4. The larger region is then repartitioned to obey the new density limits and improve wirelength. This is done for all 2 × 2 windows repeatedly over all the regions as long as it yields reasonable improvement in wirelength. Because the windows overlap, the inflated cells can move away from congested regions, in effect reducing the demand for routing in these regions.
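As a rough sketch of the inflation bookkeeping just described: the congestion values cong(e) are assumed to come from the probabilistic router, and the data layout and initial-scaling constant are hypothetical, not taken from Ref. [6].

```python
# Sketch: cell inflation driven by estimated edge congestion, in the spirit of Ref. [6].
# cong_of_region and pin_density are assumed inputs; tau caps each increment here as a
# simplification of the per-level bound described in the text.

def inflate_cells(cells, cong_of_region, pin_density, tau=0.5):
    """cells: dict name -> {'area': float, 'b': float, 'region': region id}.
    cong_of_region: dict region id -> max estimated cong(e) over the region's grid edges.
    Returns dict name -> inflated area s'(c) = (1 + b(c)) * s(c)."""
    inflated = {}
    for name, c in cells.items():
        if c['b'] == 0.0:
            # Initial b(c) is proportional to pin density; the 0.1 scale is a guess.
            c['b'] = 0.1 * pin_density.get(name, 0.0)
        cong = cong_of_region.get(c['region'], 0.0)
        if cong > 1.0:  # region is congested: estimated demand exceeds capacity
            c['b'] += min(1.0, 2.0 * (cong - 1.0), tau)
        inflated[name] = (1.0 + c['b']) * c['area']
    return inflated
```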
The work in Ref. [30] also proposes a white-space allocation (WSA) technique based on congestion estimates during mincut global placement. The technique is based on the WSA method proposed in Ref. [28] and discussed in Section 22.4.2. The difference is that Ref. [28] proposes to apply the technique after placement as a detailed-placement optimization, while Ref. [30] uses it during partitioning-based global placement. The framework used for the studies in Ref. [30] is a top-down mincut bipartitioning-based placer. Unlike Ref. [6], which uses a quadrisection-based fixed-grid placer, Ref. [30] uses a bisection-based variable-grid placer in which the cutline is allowed to shift after each round of partitioning. The objective for postpartitioning cutline shifting can be based on equalizing densities or congestion estimates. In Ref. [30], before each round of partitioning, the entire placement region is overlaid on a grid. Congestion maps [39] are built using the last updated locations of all pins. When cells are partitioned and their positions change, the congestion values for their nets are updated. The routing demands and supplies on either side of the cutline are estimated using the congestion map. Using these estimates, the cutline is shifted after partitioning to equalize the ratio of demand to supply on either side of the cutline. This effectively allocates more white space to potentially congested areas without having to artificially change the sizes of individual cells.

22.3.2 STEINER WIRELENGTH OPTIMIZATION DURING GLOBAL PLACEMENT

Taking an approach orthogonal to traditional works on congestion-driven placement, the work in Ref. [30] shows that the congestion characteristics of a placement can be improved by modifying the objective of the placer to optimize Steiner tree wirelength (StWL) rather than the traditional half-perimeter wirelength (HPWL). The contention is that StWL correlates much better with routed wirelength than HPWL does. The placement framework for the study is based on the top-down mincut partitioning algorithm implemented in the well-known Capo placer [1] and is called rooster. Mincut placers generally use either bisection or quadrisection to divide the placement area and the netlist. The objective of the partitioning is to minimize the number of connections between the partitions while obeying area constraints in the partitions. This is done iteratively until the number of cells in a region is small. When bipartitioning is used, the careful choice of vertical or horizontal cut direction influences wirelength, the wirelength ratio between horizontal and vertical components, and also the routing congestion of the resulting placement solution [36]. Proper handling of terminals [13] is essential to the success of top-down placement approaches. When partitioning a placement bin (region), cells inside the bin may connect to cells and fixed points outside the particular bin. These connections need to be properly accounted for by formulating the problem as partitioning cells in a bin with fixed terminals. The terminals are propagated to the boundaries of the bin being partitioned and are considered fixed in one or more partitions. The traditional mincut objective during placement does not accurately represent the wirelength objective [35]. The authors of Ref. [35] introduce a new terminal propagation technique that allows the partitioner to better map netcut to HPWL. Each original net can be represented by one or two weighted nets for the partitioning problem, depending upon the configuration of the net's terminals relative to the bin's center.
The weighted netcut formulation better represents the HPWL objective. This formulation is later simplified in Ref. [9]. The work in Ref. [30] extends this formulation to facilitate minimization of wirelength estimates other than HPWL, in particular StWL minimization. The authors of Ref. [30] show that StWL is a valid cost function for the weighted partitioning problem. However, several assumptions made when minimizing the traditional HPWL objective no longer hold. Moving terminal locations changes the Steiner tree construction, making StWL estimates inaccurate. Additionally, nets that were considered inessential in HPWL minimization cannot be considered so during StWL minimization, because, unlike the HPWL objective, there are many Steiner trees of different lengths having the same bounding box. To limit the additional runtime during placement, an efficient data structure called a pointset with multiplicities is proposed to aid fast construction and maintenance of Steiner trees during partitioning. At the beginning of the mincut placement, all movable cells are placed at the center of the first placement bin, which represents the full core placeable area. When constructing the partitioning instance for a bin, a Steiner tree evaluator is called for each essential net of the bin. The weighted partitioning problem is formed based on the Steiner tree segments. The weighted mincut partitioner runs and produces a solution. A cutline is selected based on the partitioning result, new bins are created, and cells are placed at the centers of their respective bins. Three different Steiner tree evaluators, batched iterated 1-Steiner (BI1ST) [20], FastSteiner [21], and the fast lookup-table-based wirelength estimation technique (FLUTE) [11], are tested in the flow. In addition, Ref. [30] also suggests using the StWL objective in detailed placement techniques. Two types of sliding-window optimizers targeting StWL are proposed. The first exhaustively checks all possible linear orderings of a group of cells in a row, and the second uses a dynamic programming algorithm for an interleaving optimization like the one proposed in Ref. [17].
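A minimal sketch of the exhaustive sliding-window variant follows; steiner_wirelength is a stand-in for a FLUTE-like evaluator, and the interface is assumed rather than taken from the rooster implementation.

```python
# Sketch: exhaustive reordering of a small single-row window under a Steiner wirelength objective.

from itertools import permutations

def best_window_order(cells, slots, nets, pin_pos, steiner_wirelength):
    """cells: movable cell names in one row window (keep it small, e.g. <= 6).
    slots: x-coordinates available in the window, one per cell.
    nets: list of nets, each a list of pin names (window cells and/or external pins).
    pin_pos: dict pin name -> (x, y) for pins outside the window.
    steiner_wirelength: callable taking a list of (x, y) points and returning an
    estimate of Steiner tree length (e.g., a FLUTE-style evaluator)."""
    row_y = 0.0  # single-row window, so y is constant
    best_order, best_cost = None, float("inf")
    for order in permutations(cells):
        pos = dict(pin_pos)
        pos.update({c: (x, row_y) for c, x in zip(order, slots)})
        cost = sum(steiner_wirelength([pos[p] for p in net]) for net in nets)
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost
```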
22.3.3 FREE SPACE MANAGEMENT DURING GLOBAL PLACEMENT

Traditional works on congestion-driven placement in the context of high-utilization designs have focused on redistributing the limited available white space toward the congested areas. However, for modern system-on-chip (SoC) design styles with several large intellectual property (IP) blocks and macrocells, the free space available for the dust logic has increased. For such low-utilization designs, in addition to solving the problem of local high-congestion spots, it is also important to ensure that the design is not overspread over the die and that it has a good natural distribution of cells in the layout. Addressing these concerns improves the overall quality of the physical design in terms of circuit performance, wirelength, power consumption, congestion owing to global nets, etc. However, even for low-utilization designs, care must be taken to ensure that every local region of the die has enough routing resources to accommodate the routing demand created by cells in that region. Hence, free space management techniques during global placement need to maintain enough white space in the local regions while ensuring that the design is not overspread with respect to other placement objectives.

Hierarchical white space allocation (WSA) during top-down partitioning-based placement flows is discussed in Ref. [7] and implemented in the placer Capo. The available white space is uniformly spread throughout the core region. This ensures that every local region of the die has enough available white space, which helps routing. At every level of partitioning during the top-down flow, the tolerances during the partitioning of an individual bin are determined by the white space deterioration factor α. α is determined by the available white space in the bin and the number of partitioning levels remaining to the leaf level in the partitioning tree (approximated by log₂ N, where N is the number of cells in the bin). Hierarchical WSA allows higher tolerances during partitioning and shifts cutlines after each partitioning to equalize the relative white space in the child bins and to ensure a uniform white space distribution. This simple strategy works very well for high-utilization designs but produces circuits with high wirelength and worse performance for sparse designs.

The work in Ref. [4] argues that top-down placement based on recursive bisection with multilevel partitioning does poorly on low-utilization designs and tends to produce excessive wirelength when large amounts of white space are present. On the other hand, analytical placement algorithms have a global view of the placement problem and can better manage large amounts of white space. To address this, analytical constraint generation (ACG) [4] proposes to combine a mincut-based placer with a quadratic placement engine. In ACG, during the top-down recursive bisection-based mincut placement flow, the partitioning capacities for each placement bin to be partitioned are generated based on the quadratic wirelength minimum placement at that level. At each level during the top-down flow, the quadratic wirelength minimization engine produces a placement subject to the respective placement bin constraints for each cell. For each placement bin to be partitioned, the capacities of the child bins are determined by the center of mass of the quadratic wirelength minimum solution. The tolerances for the partitioning problem are kept fairly rigid. Care is taken to ensure that none of the child partitioning bins overflow their target utilizations. ACG takes the hint from the quadratic wirelength minimum solution on the natural placement of the design and uses powerful multilevel mincut partitioning techniques to produce lower wirelength solutions that are not overspread.
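A rough sketch of the constraint-generation idea, deriving child-bin capacities from the center of mass of a quadratic solution, is given below. The clamping policy and interface are assumptions for illustration, not the exact ACG procedure.

```python
# Sketch: derive child-bin capacity fractions from the center of mass of the quadratic
# (overlapping) solution inside a bin, in the spirit of ACG. Illustrative only.

def child_capacities(cells, x0, x1, max_util=0.7):
    """cells: list of (x position from the quadratic solution, cell area).
    Returns (left_fraction, right_fraction) of the bin [x0, x1] for a vertical cut."""
    total_area = sum(a for _, a in cells)
    if total_area == 0:
        return 0.5, 0.5
    # Center of mass of the overlapping quadratic placement inside this bin.
    com = sum(x * a for x, a in cells) / total_area
    left_frac = (com - x0) / (x1 - x0)
    # Keep the split away from the extremes so neither child exceeds its target utilization.
    left_frac = min(max(left_frac, 1.0 - max_util), max_util)
    return left_frac, 1.0 - left_frac

# Cells pulled to the right by the netlist -> the right child bin gets more capacity.
print(child_capacities([(2.0, 10), (8.0, 30)], x0=0.0, x1=10.0))   # (0.65, 0.35)
```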
The techniques proposed in Refs. [2,3] take a different approach to the issue of sparse designs. They are not dependent on the type of placer; a black-box placer that uniformly distributes the available white space across the core area is assumed. By preprocessing the netlist, it can be ensured that (1) there is a minimum amount of local white space throughout the core area and (2) the remaining white space is better allocated. The constraint of minimum local white space is required to ensure that local regions in the die do not become highly congested. The technique consists of adding small, disconnected free cells to the design in an amount not exceeding the white space that remains after the local white space requirement is satisfied. Because the free cells are disconnected and small, the black-box placer is free to place them so as to improve the relevant design objectives. After the placement process is complete, the free cells are removed from the design and the underlying cell sites are left empty. This results in high cell density (which still respects the minimum local white space requirement) in certain areas, with empty sites occupying the vacant areas of the chip. In addition, the work in Ref. [3] describes a low-overhead implementation of filler cells in a mincut placer, because explicit modeling of free cells in the placement problem impacts the runtime and memory footprint of the placer. In contrast to the ACG work, where the partitioning capacities for each partitioning problem are determined by the quadratic wirelength minimum solution and the partitioning tolerances are kept fairly strict, in Ref. [3] the partitioning tolerances for low-utilization placement bins are increased sufficiently so that the partitioner can find the optimum mincut solution subject to the capacity and tolerance constraints. The increase in tolerances is done keeping in mind the minimum local white space requirements. Figure 22.7 shows the placement of the same design by the different mincut placement approaches discussed above.

FIGURE 22.7 The Ckt4 design from IBM has 74 percent white space. (a) Placement produced by Capo with uniform white space distribution. (b) Another placement produced by Capo; free filler cells (not shown in the picture) were added to reduce the placer-visible white space from 74 to 15 percent, which reduces the wirelength from 15.32e6 to 8.77e6. (c) Placement obtained from a mincut placer from IBM with 70 percent target density. (d) Placement obtained by the ACG technique with 70 percent target density. (From Adya, S. N. and Markov, I. L., International Conference on Computer Aided Design (ICCAD), San Jose, CA, 2003; Alpert, C. J., Nam, G.-J., and Villarrubia, P. G., Proceedings of the 2002 IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, 2002.)

22.4 DETAILED PLACEMENT CONGESTION IMPROVEMENT

Congestion-driven detailed placement approaches have been studied extensively in the literature. In general, these approaches tend to be more effective than congestion-driven global placement, because relatively more accurate congestion information is available at the detailed placement level. The global-placement step determines the rough locations of the majority of the cells, as well as the local area density and pin density. After global placement completes, a pass of congestion estimation can be applied. The congestion distribution of the entire design, or congestion map, is usually the guide for the postglobal-placement congestion-reduction algorithms. The goal of congestion-reduction algorithms is to reduce the congestion at the peak congestion spots, at the cost of increasing the routing demand in noncongested areas, thus averaging out the routing demand over the die. For the detailed placement stage, there are typically the following three ways to reduce the routing demand in congested areas.

First, traditional HPWL improvement, either global or local, can reduce the routing demand in the congested areas by producing smaller wires in the design. In this sense, conventional wirelength minimization is of value for congestion control. This is particularly true for high-utilization designs, where there is little room for manipulating the available white space for congestion control.
Wang et al. [38] illustrate the relationship between total wirelength and congestion based on the edge-overflow congestion model. They observe a strong correlation between wirelength and congestion in experiments. Recent work on mincut placement [22] and analytical placement [28] also suggests that reducing total HPWL helps produce routable placements.

Second, in congested regions, replacing cells with accurate routing topology information can yield significant congestion reduction. This type of approach includes cell swapping integrated with Steiner tree routing updates in a simulated annealing algorithm [38], cell replacement with an exact congestion objective obtained from global routing or a routing estimator [28,30], optimal interleaving with trunk tree decomposition [17], and sliding-window optimization with a StWL objective [30]. The incremental routing approach in Ref. [38] can be runtime prohibitive because both incremental routing and annealing are computationally expensive. Good control of the working area is paramount for this approach. Usually, the working area grows from a highly congested spot and is subject to some maximum area constraint. If the congestion is mainly caused by entangled routing topologies, the cell replacement approach in Ref. [28] can be very effective for moderate congestion reduction. The exploration space, however, is rather limited because fewer cells are touched compared with other methods. Both approaches fail to address the congestion from global interconnects, nets that pass through the congested area. Accounting for global interconnects in routing congestion reduction is fundamentally a very hard problem for detailed placement. The approach in Ref. [30] uses a fast Steiner tree estimator in mincut placement. It is a promising technique, as it improves the StWL at almost no cost in runtime.

Third, the low design utilization, or abundance of whitespace, of many modern ASIC designs allows cell spreading, the most effective way to reduce routing congestion. By spreading the cells over the core placeable area, the routing resources can be significantly increased. In other words, the average routing demand in the congested area can be reduced to meet the routing resource constraints. Theoretically, if the utilization continues to decrease, the routability goal will be achieved for any given design. In reality, however, spreading the placement inevitably deteriorates the circuit performance and increases the power consumption, because of the increased average interconnect length. We go through the details of several techniques in the following sections.

22.4.1 ROUTER INTEGRATION

An early approach to cell replacement with routing information was proposed in Ref. [38]. In this work, the detailed placement step is based on low-temperature simulated annealing. Traditionally, the cost function in simulated annealing is the total HPWL, or some form of weighted total HPWL. Here, the congestion cost is incorporated into the objective function. To obtain the congestion cost, a fast Steiner tree router is called for every single move in the simulated annealing algorithm. The routing supply/demand information for all the region edges needs to be incrementally updated. This approach is computationally expensive because incremental routing and congestion updates are much more time consuming than simple bounding-box computation. The advantage is on the fidelity side: congestion improvement achieved in placement can mostly be carried over to routing. Several techniques can be used to speed up this approach and make it practically feasible, such as routing estimation or lazy updates on nets with large fanouts. Using appropriate sizes for the congested regions in the optimization is another key step for this approach; the idea is to avoid applying the expensive computation to noncongested areas. The modeling of the congestion cost greatly affects the final placement quality. One basic question is whether moves should be penalized only when the routing demand is higher than the supply, or whether some cost should already be charged once the demand is close to the supply, i.e., a look-ahead approach. The authors tried many different congestion modeling functions and suggest that a look-ahead congestion cost combined with total wirelength is most effective.
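The following sketch illustrates one way such a look-ahead congestion term might be combined with wirelength in an annealing move evaluation. The penalty shape, onset threshold, and weighting are assumptions, not the exact cost function of Ref. [38].

```python
# Sketch: move cost for congestion-aware low-temperature annealing. The edge usage and
# capacity values are assumed to be maintained incrementally by the placer.

import math, random

def lookahead_penalty(usage, capacity, onset=0.8):
    """Start charging cost before demand reaches supply (look-ahead), growing quadratically."""
    if capacity <= 0:
        return float("inf")
    u = usage / capacity
    return 0.0 if u < onset else (u - onset) ** 2

def move_cost(hpwl_delta, touched_edges, usage, capacity, weight=50.0):
    """Combined objective change for one candidate move.
    touched_edges: dict edge -> change in estimated demand caused by the move."""
    cong_delta = sum(
        lookahead_penalty(usage[e] + d, capacity[e]) - lookahead_penalty(usage[e], capacity[e])
        for e, d in touched_edges.items()
    )
    return hpwl_delta + weight * cong_delta

def accept(delta_cost, temperature):
    """Standard Metropolis acceptance test."""
    return delta_cost <= 0 or random.random() < math.exp(-delta_cost / temperature)
```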
A typical problem in congestion minimization is that reducing the peak congestion of one region often comes at the cost of increasing the congestion of its surrounding area. Classic congestion removal approaches work on an expanded area, which is the peak congestion region plus the surrounding less congested area. Furthermore, these approaches work on one expanded area at a time. This results in conflicting congested areas. The authors of Refs. [42,43] suggest conflict-resolving techniques before starting the congestion optimization. The basic idea is to exploit the flexibility available when growing a peak congestion spot into an expanded area and to obtain the best combined set of expanded areas for multiple peak congestion spots. The problem can be formulated as a linear program or an integer linear program. A simpler heuristic is to start from a greedy expansion and gradually adjust the expanded areas to avoid conflicts.

Jariwala and Lillis [17] proposed a detailed placement method that integrates trunk-decomposition-based routing into an optimal interleaving algorithm. Optimal interleaving is a detailed placement wirelength minimization technique proposed in Ref. [17]. It is a powerful intrarow optimization technique. The basic idea is to collect a group of cells in a single-row window, partition them into two groups, and optimally interleave the two groups while preserving the cell order within each group. A dynamic programming algorithm ensures that the optimal wirelength can be found in O(n²) time. In Ref. [17], the routing information is considered in the interleaving process. The routes associated with the cells are also interleaved, and the number of nets crossing a channel is taken into account. The algorithm is successful on field-programmable gate array (FPGA) benchmarks. It reduces the number of congested routing channels (defined as the channels with maximum routing density) by 45 percent on average.
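A compact sketch of the interleaving dynamic program is shown below, for the wirelength objective only and ignoring the routing-channel terms of Ref. [17]. The per-slot cost callback is a hypothetical simplification of the incremental wirelength computation.

```python
# Sketch: optimal interleaving of two ordered cell groups A and B into len(A)+len(B) slots.
# dp[i][j] = minimum cost of placing the first i cells of A and the first j cells of B into
# the first i+j slots while preserving the order within each group; O(n^2) states.

def optimal_interleave(A, B, slot_cost):
    """slot_cost(cell, slot_index) -> cost of putting `cell` at slot `slot_index`
    (a simplification; the real incremental cost depends on net spans)."""
    INF = float("inf")
    n, m = len(A), len(B)
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    choice = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] == INF:
                continue
            slot = i + j  # next free slot from the left
            if i < n and dp[i][j] + slot_cost(A[i], slot) < dp[i + 1][j]:
                dp[i + 1][j] = dp[i][j] + slot_cost(A[i], slot)
                choice[i + 1][j] = 'A'
            if j < m and dp[i][j] + slot_cost(B[j], slot) < dp[i][j + 1]:
                dp[i][j + 1] = dp[i][j] + slot_cost(B[j], slot)
                choice[i][j + 1] = 'B'
    # Recover the interleaved order by walking the choices backward.
    order, i, j = [], n, m
    while i + j > 0:
        if choice[i][j] == 'A':
            i -= 1; order.append(A[i])
        else:
            j -= 1; order.append(B[j])
    return list(reversed(order)), dp[n][m]
```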
22.4.2 WHITESPACE MANAGEMENT

Because modern cell-based designs usually have lower utilizations than their predecessors, recent research has focused on congestion improvement in the context of whitespace. Early research on whitespace handling includes Ref. [7], in which whitespace control is used primarily to improve the quality of mincut partitioning, and Ref. [29], which implicitly uses whitespace for area expansion. Yang et al. [40,41] proposed a WSA approach for the purpose of reducing congestion. Their experimental flow is complete, including global/detailed routing with a widely used commercial router, on a set of benchmarks based on a 0.18 µm library.

After the global-placement stage, the chip area is divided into rectangular areas (bins), and congestion estimation is performed to obtain a congestion degree for each bin. At this point, all the bins have the same amount of whitespace. Next, a two-step whitespace reallocation is performed to adjust the amount of whitespace, or cell utilization, in each bin. This is accomplished by first computing the desired whitespace distribution based on the current congestion map, and then moving cells between adjacent bins to meet the whitespace requirement. Moving cells to meet the target whitespace distribution inevitably degrades the quality of the placement. A wirelength improvement stage is introduced to recover the loss. During this low-temperature annealing stage, the latest WSA map is maintained; cell moves and swaps are allowed only if they do not violate the whitespace requirements by more than a certain amount. The key factor that determines the performance of this approach is the conversion from the congestion map to the target whitespace map. Because of the lack of detailed congestion information and the uncertainty in cell movement, it is extremely hard to accurately estimate how much whitespace needs to be injected into a congested bin to make it routable. What this approach is able to do is relieve the congestion hot spots to some extent. In reality, because congestion often conflicts with other placement objectives (timing, wirelength, etc.), neither underdoing nor overdoing congestion removal is desired. Multiple runs with different scaling factors help to achieve the best trade-off. Another uncertainty of this approach lies in the convergence issue. The target whitespace map is derived from the initial congestion map. However, after cell movement the congestion map changes and the target WSA is out of sync, i.e., a previously noncongested spot may pop up with less whitespace allocated. Performing multiple iterations of this approach may help, but it can cause oscillation problems or a considerable loss of quality of the input placement.

An enhancement of the above approach was proposed in Ref. [16]. In this work, the congestion estimation and WSA are similar to those of Refs. [40,41]. A flow-based algorithm called placement migration is devised to guide the cell movement for achieving the whitespace map. This algorithm can minimize the amount of movement (the total moving distance of all moved cells), given an original whitespace map (uniform distribution) and a target whitespace map. Less cell movement means a smaller perturbation of the input placement and usually a lower wirelength loss.
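As a rough illustration of these two steps, turning a per-bin congestion degree into a target whitespace budget and nudging cell area toward it, here is a simplified greedy version. The proportional targeting and the greedy pass are simplifications; the flow-based migration of Ref. [16] is more sophisticated.

```python
# Sketch: allocate the total whitespace to bins in proportion to their congestion degree,
# then greedily shift cell area from whitespace-starved bins to less-full neighbors.
# Purely illustrative; real implementations move individual cells, not area.

def target_whitespace(bin_congestion, total_whitespace):
    """bin_congestion: dict bin id -> congestion degree (>= 0)."""
    total = sum(bin_congestion.values()) or 1.0
    return {b: total_whitespace * c / total for b, c in bin_congestion.items()}

def greedy_migration(bin_cell_area, bin_capacity, target_ws, neighbors, max_passes=10):
    """Move cell area between adjacent bins until whitespace roughly meets its target."""
    for _ in range(max_passes):
        moved = False
        for b, nbrs in neighbors.items():
            deficit = target_ws[b] - (bin_capacity[b] - bin_cell_area[b])
            if deficit <= 0:
                continue  # bin already has enough whitespace
            for nb in nbrs:
                surplus = (bin_capacity[nb] - bin_cell_area[nb]) - target_ws[nb]
                amount = min(deficit, max(surplus, 0.0))
                if amount > 0:
                    bin_cell_area[b] -= amount    # cells leave the congested bin...
                    bin_cell_area[nb] += amount   # ...and are absorbed by the neighbor
                    deficit -= amount
                    moved = True
        if not moved:
            break
    return bin_cell_area
```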
Another interesting approach, named whitespace allocation, is proposed by Li et al. [28]. The algorithm starts with a global placement and the congestion map derived from it. The placement is then modified using a recursive bipartitioning, or slicing tree, method. In a preprocessing step, a congestion degree is computed for every cell in the design based on the current placement congestion map. In the first step, the placement is recursively partitioned until every region contains a small number of cells. For each partitioning step, the cut direction is determined by the aspect ratio of the region to be partitioned. Each cutline geometrically bisects a region evenly and tries to maintain a square shape for the regions. A slicing tree is constructed at the end of the recursive bipartitioning. Each node of the slicing tree corresponds to a region that is bipartitioned, and the left and right children of the node correspond to the two subregions after partitioning. A list of cells is saved in each node, representing all the cells placed inside the region. Next, the slicing tree is evaluated in a bottom-up fashion. A congestion level is given to each node of the tree. The congestion level of a node is simply the sum of the congestion levels of its two children; for a leaf node, the congestion level is the sum of the congestion degrees of all the cells in the corresponding region. A cutline adjustment step is then performed. The cutlines in the nodes are shifted by traversing the tree in a top-down fashion. For each node, the amounts of whitespace allocated to the two child nodes are linearly proportional to their congestion levels.

Consider a region r with lower-left corner (x_0, y_0), upper-right corner (x_1, y_1), and an original vertical cut at x_cut = (x_0 + x_1)/2. The area of this region is A_r = (x_1 − x_0)(y_1 − y_0). Assume that the total cell areas of the left subregion r_0 and the right subregion r_1 are S_0 and S_1, and that the corresponding congestion levels are OVL_0 and OVL_1, respectively. The total amount of whitespace, A_r − S_0 − S_1, is to be allocated to the two subregions such that the amounts of whitespace in the two subregions are linearly proportional to their congestion levels (Figure 22.8). Thus, the amount of whitespace allocated to subregion r_0 is

(A_r − S_0 − S_1) · OVL_0 / (OVL_0 + OVL_1)

FIGURE 22.8 Subregion r_0 is more congested than subregion r_1. The cutline is shifted from x_cut to x'_cut, resulting in more whitespace in subregion r'_0.

The new cutline location x'_cut can then be derived as follows:

γ = [S_0 + (A_r − S_0 − S_1) · OVL_0 / (OVL_0 + OVL_1)] / A_r
x'_cut = γ · x_1 + (1 − γ) · x_0

where γ is the ratio of the left subregion area to A_r after the cutline adjustment. This step is similar to a top-down partitioning-based global placement, except that the cut direction, the cut location, and the subnetlists are all known. The cells stay at the centers of the regions to which they belong during the top-down flow. The placement is then legalized with minimum perturbation, and a cell-swapping-based detailed placement is performed. Li et al. showed that this step alone can considerably reduce the postrouting congestion (measured by the percentage of overflowed bins), by 72 percent on average. Furthermore, a similar approach was adopted in Ref. [30] for improving congestion by improving the routed total wirelength. The difference is that in Ref. [30] the congestion-estimate-based cutline adjustment is performed during the global-placement stage, within a top-down partitioning framework.
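A small numeric sketch of this cutline-shift computation follows; it is a direct transcription of the formulas above with illustrative variable names and input values.

```python
# Sketch: shift a vertical cutline so that whitespace is split in proportion to the
# congestion levels of the two subregions, following the formulas above.

def shift_cutline(x0, x1, y0, y1, S0, S1, OVL0, OVL1):
    A_r = (x1 - x0) * (y1 - y0)                    # region area
    whitespace = A_r - S0 - S1                     # total whitespace to distribute
    ws_left = whitespace * OVL0 / (OVL0 + OVL1)    # share allocated to the left subregion
    gamma = (S0 + ws_left) / A_r                   # left-subregion area fraction after adjustment
    return gamma * x1 + (1 - gamma) * x0           # new cutline x'_cut

# Left subregion twice as congested as the right one: the cutline moves right of center.
print(shift_cutline(x0=0, x1=10, y0=0, y1=10, S0=30, S1=30, OVL0=2.0, OVL1=1.0))  # ~5.67
```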
The limitation of these whitespace adjustment approaches is that they can do little about the part of the congestion that originates from global interconnects. If we consider a congested region on the globally routed layout, the congestion in the area can be classified into three types: local nets, the interconnects within the region; semiglobal nets, the interconnects coming into or out of the region; and global nets, the interconnects passing through the region. WSA can relieve the congestion due to local or semiglobal nets. However, it cannot reduce the routing demand coming from global nets. Therefore, the effectiveness of WSA approaches greatly depends on the composition of these different types of congestion sources. Generally, the proportions of the congestion sources are determined by two factors: the netlist structure and the global-placement quality. It is thus desirable to make the netlist structure more congestion friendly (reducing global nets at no cost of increasing local nets) or to improve the global-placement quality (reducing the total wirelength).

In summary, congestion reduction in detailed placement can be very effective because the congestion spots have been detected with high fidelity after global placement. If enough whitespace is available, most congestion can be relieved by intelligently spreading the cells to achieve a routable design. However, other placement objectives, such as timing and power, are likely to be degraded during the spreading. Also, more whitespace means lower density and more silicon area. In that sense, congestion reduction should not be addressed only in detailed placement. It is an objective throughout the entire placement flow and should be considered as early as possible, even in the floorplanning stage.

22.5 SIMULATED ANNEALING FOR CONGESTION IMPROVEMENT

Simulated annealing has been studied as an effective placement algorithm for over two decades [25,32,33]. As circuit sizes increase, simulated annealing is no longer a good choice as a global-placement algorithm because of its weak scalability. However, during the detailed placement stage, annealing is still an attractive approach in many cases. For instance, simulated annealing can be applied to a particular region to further optimize the local placement. One flexibility of the annealing approach is that it can incorporate virtually any cost into its objective function, as long as the cost can be modeled and incrementally updated. It is then natural to extend simulated annealing from optimizing the classic HPWL to optimizing congestion. A number of publications have presented methods for applying simulated annealing to congestion alleviation. They vary in their congestion modeling and objective cost functions. In this section, we review several cost functions that represent congestion. Once the right cost function is selected, either low-temperature annealing or a greedy algorithm can be used as the optimization framework.

22.5.1 RISA

Cheng proposed a routability model named RISA [10] and, for the first time, integrated a routability model into a simulated annealing algorithm. The basic idea of RISA can be explained using Figure 22.9. The entire chip is divided into M × N rectangular regions; R represents one such region. There are two nets whose bounding boxes overlap with region R. Let W_R (H_R) be the width (height) of region R, W_1 (H_1) be the width (height) of net N_1's bounding box, and w_1 (h_1) be the width (height) of the overlap between R and net N_1's bounding box. For region R, the horizontal and vertical congestion contributed by net N_1 are

C_h = q · w_1 · h_1 / (H_1 · W_R)
C_v = q · w_1 · h_1 / (W_1 · H_R)

where q is a net weight that is a function of the net's pin count. q is obtained by randomly building optimal Steiner trees within the net's bounding box and statistically deriving the probability of net crossings. Table 22.1 shows the precomputed values of q.
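A minimal sketch of this per-net congestion contribution follows; the q values keyed by pin count are placeholders, not the actual precomputed RISA weights, and degenerate (zero-height or zero-width) nets are simply skipped as a simplification.

```python
# Sketch: horizontal/vertical congestion contributed by one net to one region, per the
# RISA formulas above. The Q table is a placeholder; the real values come from Ref. [10].

Q_BY_PIN_COUNT = {2: 1.0, 3: 1.0, 4: 1.1, 5: 1.2}   # placeholder net weights

def overlap(lo1, hi1, lo2, hi2):
    return max(0.0, min(hi1, hi2) - max(lo1, lo2))

def risa_contribution(net_bbox, region_bbox, pin_count):
    """net_bbox, region_bbox: (x_lo, y_lo, x_hi, y_hi). Returns (C_h, C_v)."""
    q = Q_BY_PIN_COUNT.get(pin_count, 1.3)
    nx0, ny0, nx1, ny1 = net_bbox
    rx0, ry0, rx1, ry1 = region_bbox
    W1, H1 = nx1 - nx0, ny1 - ny0        # net bounding-box width/height
    WR, HR = rx1 - rx0, ry1 - ry0        # region width/height
    w1 = overlap(nx0, nx1, rx0, rx1)     # overlap width with the region
    h1 = overlap(ny0, ny1, ry0, ry1)     # overlap height with the region
    if W1 == 0 or H1 == 0:               # flat nets skipped here as a simplification
        return 0.0, 0.0
    C_h = q * w1 * h1 / (H1 * WR)
    C_v = q * w1 * h1 / (W1 * HR)
    return C_h, C_v
```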
The congestion cost computed using the RISA model is integrated into the simulated annealing algorithm. Experiments show that, with the RISA model, routing congestion as measured by the number of overcongested grids is considerably reduced. The runtime cost is about 2X compared with the original simulated annealing algorithm.

FIGURE 22.9 RISA routability model.
