Finally, consider minimization of q(s) in Equation 19.1 at the coarser level under Assumptions 1 and 2, i.e.,

$$\min_{e_c \in \mathbb{R}^m} \tilde{q}(e_c) \equiv q(\tilde{s} + P e_c) = \frac{1}{2}(\tilde{s} + P e_c)^T Q (\tilde{s} + P e_c) - b^T (\tilde{s} + P e_c)$$

It is easily shown that $e_c$ minimizes $\tilde{q}(e_c)$ if and only if it satisfies

$$P^T Q P\, e_c = P^T r \qquad (19.6)$$

Galerkin coarsening thus defines the linear coarsening operator $P^T : \mathbb{R}^n \to \mathbb{R}^m$ as the transpose of the linear interpolation operator $P$: simple substitution of Equation 19.5 into Equation 19.4 yields Equation 19.6 on premultiplication by $P^T$. By confining coarse-level iterative improvement to the perturbation $e = P e_c$ of $\tilde{s}$, relaxation at the coarse level is in effect decoupled from relaxation at the adjacent finer level. The combined approximation $\tilde{s} + e$ is thus not restricted to the range of $P$ and can represent a broader set of solution candidates for fixed $P$ than is possible if relaxation at the coarser level and subsequent interpolation are applied to the entire coarse-level solution $s_c$ rather than just its perturbation $e_c$. In both theory and practice, coarsening and interpolation are so closely related that defining one essentially characterizes the other, even when nonlinear or discrete methods are used.
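Equation 19.6 is easy to exercise numerically. The following is a minimal sketch of our own, not drawn from any particular placer: it builds the piecewise-constant interpolation operator $P$ for the six-module example of Section 19.2.2, forms the Galerkin coarse system for a randomly generated symmetric positive-definite $Q$, and checks that the interpolated correction cannot increase the quadratic objective.

```python
import numpy as np

# Minimal sketch of Galerkin coarsening (Equation 19.6). The fine-level
# quadratic is q(s) = 1/2 s^T Q s - b^T s; Q, b, and s_tilde are random
# stand-ins, not placement data.
rng = np.random.default_rng(0)
n, m = 6, 3

A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)        # symmetric positive definite stand-in
b = rng.standard_normal(n)
s_tilde = rng.standard_normal(n)   # current fine-level approximation

# Cluster map from Figure 19.4: modules 1, 2, 3 -> cluster 0; 4 -> 1; 0, 5 -> 2.
parent = [2, 0, 0, 0, 1, 2]
P = np.zeros((n, m))
P[np.arange(n), parent] = 1.0

r = b - Q @ s_tilde                               # fine-level residual
e_c = np.linalg.solve(P.T @ Q @ P, P.T @ r)       # coarse system (19.6)

# The corrected point minimizes q over s_tilde + range(P), so it can only
# improve on q(s_tilde), which corresponds to e_c = 0.
q = lambda s: 0.5 * s @ Q @ s - b @ s
assert q(s_tilde + P @ e_c) <= q(s_tilde) + 1e-8
print("q before:", q(s_tilde), " after coarse correction:", q(s_tilde + P @ e_c))
```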
19.2.2 SIMPLE EXAMPLES OF INTERPOLATION

To give the simplified model concreteness for placement, consider the sample netlist H illustrated in Figure 19.4. The coarsening of H on the right-hand side of the figure shows modules 1, 2, and 3 mapped to cluster 0, module 4 mapped to cluster 1, and modules 0 and 5 mapped to cluster 2. For the moment we ignore the algorithm used to define the coarsening and concentrate just on the question of how to place the modules, once a placement of their parent clusters has been computed.∗

FIGURE 19.4 Simple six-module netlist H with five nets and its coarsened approximation.

∗ For concreteness in this example, we describe finer-level objects as modules and coarser-level objects as clusters, but the example applies to any two adjacent levels of hierarchy; the finer-level objects might also be clusters of still finer-level objects, and so on.

Because in this example there are six modules at the finer level and three clusters at the adjacent coarser level, linear interpolation operators are represented as constant 6 × 3 matrices. Let $P_{\text{const}}$ denote the matrix for piecewise-constant linear interpolation, in which each module simply inherits its parent cluster's position. Alternatively, let $P_{\text{avg}}$ denote the matrix for linear interpolation in which each module is placed at some linear combination of its parent's position and the positions of other clusters containing modules with which it shares nets; let $w_{ij}$ denote the weight that module $i$ assigns to the position of cluster $j$, and assume weights are normalized such that $w_{ij} = 1$ if module $i$ is contained by cluster $j$. With modules indexed 0, 1, 2, ..., 5 and clusters indexed 0, 1, 2, as shown in the figure, $P_{\text{const}}$ and $P_{\text{avg}}$ take the following forms:

$$P_{\text{const}} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad\text{and}\qquad P_{\text{avg}} = \begin{pmatrix} w_{0,0} & 0 & 1 \\ 1 & 0 & 0 \\ 1 & w_{2,1} & 0 \\ 1 & 0 & w_{3,2} \\ 0 & 1 & w_{4,2} \\ 0 & w_{5,1} & 1 \end{pmatrix}$$

Each matrix is applied separately to the $x_c$ and $y_c$ coordinate vectors of the coarse placement to obtain the corresponding $x$ and $y$ coordinate vectors of the fine placement, e.g., $x = P_{\text{avg}} x_c$ and $y = P_{\text{avg}} y_c$∗ (see also Figure 19.5).

FIGURE 19.5 Piecewise-constant versus nonpiecewise-constant interpolation (declustering). Each component within a cluster is placed at a weighted average of the locations of all clusters containing other components to which it is connected.

∗ Formally, the matrix for the interpolation operator on $s = (x, y) \in \mathbb{R}^n$ is $P = \mathrm{diag}(P_x, P_y)$, where $P_x$ is the interpolation matrix for the $x$ coordinates and $P_y$ for $y$. In these examples, $P_x = P_y$.
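The sketch below applies both operators to a hypothetical coarse placement. The numeric weight and cluster coordinates are invented for illustration, and each row of $P_{\text{avg}}$ is renormalized to sum to 1 so that every module lands at a convex combination of cluster positions, which is one reasonable reading of the normalization described above.

```python
import numpy as np

# Sketch (ours) of the two interpolation operators from Section 19.2.2.
P_const = np.array([
    [0, 0, 1],   # module 0 -> cluster 2
    [1, 0, 0],   # modules 1, 2, 3 -> cluster 0
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],   # module 4 -> cluster 1
    [0, 0, 1],   # module 5 -> cluster 2
], dtype=float)

w = 0.5          # hypothetical neighbor weight, one value for all w_ij
P_avg = np.array([
    [w, 0, 1],
    [1, 0, 0],
    [1, w, 0],
    [1, 0, w],
    [0, 1, w],
    [0, w, 1],
], dtype=float)
P_avg /= P_avg.sum(axis=1, keepdims=True)   # rows become convex weights

x_c = np.array([10.0, 50.0, 90.0])          # cluster x coordinates (made up)
print("piecewise-constant x:", P_const @ x_c)
print("weighted-average   x:", P_avg @ x_c)
```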
19.2.3 STRICT AGGREGATION VERSUS WEIGHTED AGGREGATION

A common objection to clustering is that its associations may be incorrect and may therefore lead subsequent iterations to the wrong part of the solution space. To reduce the likelihood of poorly chosen clusters, the notion of a cluster can be generalized by weighted aggregation. Rather than assign each cell to just one cluster, we can break it into a small number of weighted fragments and assign the fragments to different coarse-level vertices; these are no longer simple clusters and are instead called aggregates. During interpolation, a cell's initial, inherited position is then typically determined by that of several aggregates as a weighted average [16]. Clustering, also called strict aggregation, is a special case of weighted aggregation. To our knowledge, weighted aggregation is not currently used by any published placement algorithm.

19.2.4 SCALABILITY

The scalability of the multilevel approach is straightforward to obtain and understand. Provided relaxation at each level has order linear in the number $N_a$ of aggregates at that level, and the number of aggregates per level decreases by a factor $r < 1$ at each level of coarsening, say $N_a(i) = r^i N$ at level $i$, the total order of a multilevel method is at most $cN(1 + r + r^2 + \cdots) = cN/(1 - r)$. For example, with $r = 1/4$ the total work over all levels is bounded by $(4/3)cN$, only one-third more than the cost of a single pass at the finest level. Higher-order (nonlinear) relaxations can still be used, as long as at least one of the following restrictions is enforced.

1. Their use is limited to subsets of bounded size, e.g., by sweeps over overlapping windows of contiguous clusters at the current aggregation level.
2. The starting placement for iterations at each level is good enough that the termination criteria for relaxation at that level can be satisfied after a strictly controlled number of iterations.

In general unstructured global-optimization problems, these restrictions might sharply limit an algorithm's ability to explore the solution space in any scalable way. Placement, however, exhibits sparse local structure: most variables are related through the objectives and constraints to only a small constant number of other variables. In the presence of such structure, both analysis and results obtained in practice have confirmed that these restrictions need not impair the final quality of results [9].

19.2.5 CONVERGENCE PROPERTIES

Consider a Fourier decomposition of the error $e = \hat{x} - x^*$ in a system of equations $N(x) = 0$ ($x^*$ denotes an exact solution; $\hat{x}$ is a computed approximation to $x^*$). By localized relaxation, we mean iterative optimization over a small subset of coupled variables, with all other variables held fixed, e.g., Gauss–Seidel (Equation 19.3). The fundamental observation is as follows [2]:

Observation 1 Localized relaxation tends to rapidly reduce high-frequency components of the error in the system of equations. Lower-frequency components of the error are reduced much more slowly.

There are two key consequences of this observation, both of which stem from the fact that the very notion of high frequency comes from the modeling scale. First, different frequency ranges of the Fourier components of the error are most efficiently targeted by relaxation at different scales. By coarsening the modeling scale, we can target a coarser scale of the error in the system: simply apply the same relaxation algorithm at the coarser scale. Recursion over all scales of relevance produces highly accurate solutions in extremely fast and scalable runtime. Second, both the outcome of relaxation and the convergence rates of individual variables during relaxation can be used to guide the construction of interpolation or aggregation operators. This connection underlies the basic principles of successful multiscale algorithms for systems of linear and nonlinear differential equations, where a robust convergence analysis has been obtained [2]. In this way, analysis and practice in both linear and nonlinear systems of equations have established a link between relaxation and interpolation for problems with local structure [9].
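Observation 1 can be reproduced with a standard multigrid-style experiment, sketched below on a 1D Laplacian model problem rather than a placement instance: Gauss–Seidel sweeps damp a high-frequency error mode far faster than a low-frequency one.

```python
import numpy as np

# Sketch (ours) of Observation 1: for A x = 0 with exact solution x* = 0,
# the current iterate IS the error, so we can watch individual Fourier
# modes decay under Gauss-Seidel.
n = 64
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian

def gauss_seidel(A, x, sweeps):
    # Localized relaxation: solve for one variable at a time, others fixed.
    for _ in range(sweeps):
        for i in range(len(x)):
            off_diag = A[i] @ x - A[i, i] * x[i]
            x[i] = -off_diag / A[i, i]      # right-hand side is zero
    return x

t = np.arange(1, n + 1)
for k in (1, 24):                           # low- and high-frequency modes
    e = np.sin(np.pi * k * t / (n + 1))     # initial error = one Fourier mode
    norm0 = np.linalg.norm(e)
    norm5 = np.linalg.norm(gauss_seidel(A, e.copy(), sweeps=5))
    print(f"mode k={k:2d}: error norm {norm0:.2f} -> {norm5:.2f} after 5 sweeps")
```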
19.2.5.1 Error-Correcting Algorithm MG/Opt

Elementary local convergence properties of a general error-correcting multiscale optimization algorithm have been established by Lewis and Nash [17]. For simplicity, their treatment of constraints is omitted here. The error being corrected (to first order) is the error introduced by transferring approximations from finer levels to coarser levels. Consider the unconstrained minimization of a smooth nonlinear function $F(s)$ over variables $s \in \mathbb{R}^n$ within a modeling system that generates multiple resolutions of $F$ and $s$, as specified by subscripts $h$ (finer scale) and $H$ (coarser scale). In placement, a resolution is primarily a selection of bin-grid dimensions used to enforce density constraints. A convergent, continuous nonlinear relaxation algorithm $R$ is assumed given, along with a continuous interpolation operator $I_H^h$ mapping coarse-level variables to fine-level variables and the continuous coarsening operator $I_h^H = (I_H^h)^T$.

The following steps comprise one single iteration (at level $h$) of the multiscale optimization algorithm called MG/Opt [17]. The algorithm makes explicit use of first-order coarse-grid corrections. Given a resolution of the problem

$$\min_{s_h} F_h(s_h) \qquad (19.7)$$

with initial estimate $s_h^{(0)}$: if this level $h$ is the coarsest resolution level, solve Equation 19.7 to the fullest possible accuracy by means of relaxation $R$. Otherwise, apply the following steps (a schematic code rendering appears after the list).

1. Apply $N_1$ iterations of $R$ directly to Equation 19.7, obtaining the improved estimate $s_{h,1}$. Compute the corresponding coarse-level estimate $s_{H,1} \equiv I_h^H s_{h,1}$.
2. Compute the coarse-level gradient correction $v_H = \nabla F_H(s_{H,1}) - I_h^H \nabla F_h(s_{h,1})$.
3. Using initial point $s_{H,1}$, recursively apply this algorithm to the corrected coarse-level problem $\min_{s_H} F_H(s_H) - v_H^T s_H$ to obtain the next coarse iterate $s_{H,2}$.
4. Interpolate the coarse-grid step $s_{H,2} - s_{H,1}$ back to a finer-level search direction $e_h = I_H^h (s_{H,2} - s_{H,1})$.
5. Perform a line search (for scalar $\alpha$) at level $h$ to obtain the next iterate $s_{h,2} = s_{h,1} + \alpha e_h$.
6. Apply $N_2$ iterations of relaxation $R$ to Equation 19.7 with initial point $s_{h,2}$ to obtain $s_{h,3}$.
7. Finally, discrete (noncontinuous) refinement steps may be used to transform $s_{h,3}$ into the next iterate $s_h^{(1)}$.

At least one of $N_1$, $N_2$ must be strictly positive; the other may be positive or zero. The algorithm is designed to be easy to implement given a flat nonlinear relaxation and the ability to (a) model the problem at multiple resolutions and (b) transfer approximations between those resolutions. It is shown by Lewis and Nash [17] that MG/Opt converges under the given assumptions. In particular, they establish the following facts rigorously.

1. The corrected coarse-level model approximates the fine-level model to first order in $e_h$.
2. The multiscale search direction $e_h$ is a descent direction at the fine level: $e_h^T \nabla F_h(s_{h,1}) < 0$.
3. $\lim_{k \to \infty} \|\nabla F_h(s_h^{(k)})\| = 0$; i.e., algorithm MG/Opt converges.
4. The search direction $e_h$ is well scaled; i.e., the natural step $\alpha = 1$ is likely to be accepted close to a solution $s^*$.

The last property is necessary for fast convergence.
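The recursion is compact enough to sketch directly. The code below is a schematic two-level rendering of the seven steps above under our own simplifications, not Lewis and Nash's implementation: the level objects and relaxation are toys we define, $I_h^H = P^T$, the line search is plain backtracking, constraints and the discrete refinement of step 7 are omitted, and the demo objective is a small convex quadratic rather than a placement.

```python
import numpy as np

def make_relax(step):
    # Toy relaxation R: a few fixed-step gradient-descent iterations.
    def relax(s, grad, sweeps):
        for _ in range(sweeps):
            s = s - step * grad(s)
        return s
    return relax

def mgopt(levels, h, s_h, v_h=None, n1=2, n2=2):
    lev = levels[h]
    v_h = np.zeros_like(s_h) if v_h is None else v_h
    g = lambda s: lev["grad"](s) - v_h           # gradient of corrected objective
    f = lambda s: lev["f"](s) - v_h @ s          # corrected objective value
    P = lev.get("P")                             # interpolation from coarser level
    if P is None:                                # coarsest level: solve as fully as we can
        return lev["relax"](s_h, g, 200)
    s1 = lev["relax"](s_h, g, n1)                # step 1: N1 relaxations
    sH1 = P.T @ s1                               # step 1: coarse estimate, I_h^H = P^T
    vH = levels[h + 1]["grad"](sH1) - P.T @ g(s1)   # step 2: gradient correction
    sH2 = mgopt(levels, h + 1, sH1, vH, n1, n2)  # step 3: recurse on corrected problem
    e_h = P @ (sH2 - sH1)                        # step 4: interpolate coarse step
    alpha = 1.0                                  # step 5: backtracking line search
    while f(s1 + alpha * e_h) > f(s1) and alpha > 1e-6:
        alpha *= 0.5
    s2 = s1 + alpha * e_h
    return lev["relax"](s2, g, n2)               # step 6 (step 7 omitted here)

# Two-level demo on a convex quadratic (purely illustrative data).
rng = np.random.default_rng(1)
n = 8
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)
b = rng.standard_normal(n)
P = np.kron(np.eye(n // 2), np.array([[1.0], [1.0]]))   # pair variables into clusters
QH, bH = P.T @ Q @ P, P.T @ b
levels = [
    {"f": lambda s: 0.5 * s @ Q @ s - b @ s, "grad": lambda s: Q @ s - b,
     "relax": make_relax(1.0 / np.linalg.norm(Q, 2)), "P": P},
    {"f": lambda s: 0.5 * s @ QH @ s - bH @ s, "grad": lambda s: QH @ s - bH,
     "relax": make_relax(1.0 / np.linalg.norm(QH, 2))},
]
s = np.zeros(n)
for _ in range(5):
    s = mgopt(levels, 0, s)
print("gradient norm after 5 V-cycles:", np.linalg.norm(Q @ s - b))
```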
19.3 MULTISCALE PLACEMENT IN PRACTICE

To design and implement a multiscale algorithm, one would like to know what requirements should be imposed on its coarsening and interpolation, its relaxation, and its iteration flow, and what trade-offs among them can reasonably be expected. In this section, we summarize characteristics of some leading multiscale algorithms and attempt to illustrate the trade-offs they create and the ways they manage them.

19.3.1 CLUSTERING-BASED PRECURSORS

Placement by multilevel optimization can be viewed as the natural recursive extension of clustering-based approaches considered in earlier work. Schuler and Ulrich [18] compared top-down and bottom-up approaches to clustering for linear placement. They observed large speedups compared to flat algorithms. They also observed that balancing cluster sizes (size incorporating both area and connectivity) was important. Mallela and Grover [19] studied clustering as a means of accelerating placement by simulated annealing. They maintained a priority-queue-like structure for cluster candidates. Sechen and Sun [20] employed three levels of clustering in an annealing-based flow. Hur and Lillis [21] used three levels of clustering in linear placement. Cong and Xu [22] studied clustering-based placement in which clustering is first performed based on MFFCs (maximum fanout-free cones), which consider signal directions and logic dependency. These clusters are then placed for timing optimization using TimberWolf 6.0 [23], a well-known simulated-annealing-based placement package at that time. To our knowledge, Ultrafast VPR [24] is the first published work to recursively cluster a circuit model into a hierarchy of models for placement by multiscale optimization. Ultrafast VPR is used to accelerate the annealing-based VPR algorithm (versatile packing, placement and routing [25]) to reduce design times on field-programmable gate arrays (FPGAs) at some expense in placement quality. (FPGA placement quality in Ultrafast VPR is measured by the area used.)

19.3.2 COARSENING

An aggregation strategy defines a recursive transformation of the data functions (objectives and constraints) from finer-level to coarser-level representations. These transformations reduce the numbers of variables and constraints but sometimes increase their complexity as information is compressed. Although some results exist for the accuracy of these transformations as approximations to the original problem in the graph context [2], formal results are, to our knowledge, not yet known in the hypergraph setting. For this reason, contemporary convergence criteria in practice rely on heuristics and empirical observations.

General clustering algorithms are described in Chapter 7. Typically, clustering algorithms for placement merge tightly connected cells in a way that eliminates as many nets at the adjacent coarser level as possible while satisfying some constraint on the variation in cluster areas. A class of coarsening algorithms more general than clustering is described briefly in Section 19.2.3. Important questions for coarsening in practice include the following:

1. How accurately do coarsened objectives and constraints approximate their corresponding finer-level counterparts? What error is incurred?
2. How much coarser than its finer-level source should a coarsened problem be?
3. How much variation in cluster areas should be allowed?
4. How coarse is too coarse? That is, when should recursive coarsening stop?
5. What trade-offs exist between (a) coarsening and relaxation and (b) coarsening and iteration flow? For example, how often can an algorithm afford to revise or completely reconstruct its coarsening hierarchy, and by what means?
6. Should the coarsening model the solution at the finer level, or the change in a given approximation to that solution? Why?

These questions are interdependent, and precise answers for placement are not yet known. Leading academic multiscale placers model the full placement problem at coarser levels, rather than the change in a given placement as described in Sections 19.2.1 and 19.2.5. Only force-directed placement (FDP)/line-search-directed placement (LSD) changes its coarsening/interpolation operators over its flow (Section 19.3.3).

The following classifications of coarsening algorithms for multiscale placement are sometimes used.

1. Connectivity-based versus location-based: Although netlist connectivity is always used to some extent, location-based algorithms also require a given current placement as input and attempt to keep neighboring modules together (see Section 19.3.2.2).
2. Transient versus persistent [26]: Transient clusters appear as part of the inner steps of an algorithm but are not associated with coarse-level variables. For example, clusters are formed in the multiscale partitioning algorithms used in top-down bisection-based placement, but they are not separately placed. Persistent clusters, on the other hand, are computed a priori and are actually placed.
3. Score-based versus scoreless [27]: In scoreless algorithms, clusters are committed as soon as they are formed. Examples described in Chapter 7 include edge coarsening, heavy-edge matching, and first choice. In score-based algorithms, alternative candidates are assigned scores and iteratively refined. As the refinement proceeds, the clustering scores are updated; eventually, only candidates with sufficiently high scores are selected to serve as clusters. Examples described below include best choice [26], fine-granularity clustering [28], and net cluster [27].
Leading multiscale algorithms limit the variation of cluster areas at each level of hierarchy. APlace [29] and NTUPlace3 [30] limit cluster areas to at most 1.5 times the target cluster area. FastPlace3 limits its cluster areas to at most 5 times the target.

Among connectivity-based algorithms, experiments to date suggest that local-connectivity-driven greedy strategies like first-choice vertex matching [31,32] and best choice [26] may be more effective than global-connectivity-driven approaches like edge-separability clustering (ESC) [33]. How best to define coarse-level netlists without explosive growth in the number and degree of coarsened hyperedges relative to coarsened vertices is particularly challenging [27,28,32]; Sections 19.3.2.3 and 19.3.2.4 describe recently proposed solutions. Although various forms of clustering dominate the recent multiscale placement literature, Rent's rule and the importance of limiting cutsize also make recursive partitioning an attractive means of generating the multiscale hierarchy [7]. Hybrid algorithms for floorplanning or placement that combine clustering with netlist partitioning in some form continue to be developed [7,12,34–38].

19.3.2.1 Best-Choice Clustering

As described in Chapter 7 and briefly summarized here, best-choice clustering [26] is the method used by the leading multiscale tools APlace [29], mPL [39], FastPlace 3.0 [40], and RQL [14]. Best choice is a netlist-based, score-based algorithm typically used in placement for persistent or semipersistent clustering. A graph is defined on the netlist vertices (modules), with each edge weighted by the affinity of the given two vertices. The affinity may represent some weighted combination of complex objectives, such as hypergraph connectivity, spatial proximity, timing delay, area balance, coarse-level hyperedge elimination, etc. The affinities $s(i,j)$ between vertices $i$ and $j$ used by APlace, mPL, FastPlace3, RQL, and NTUPlace3∗ for connectivity-based clustering are all equal to, or slightly modified from, the basic affinity formula

$$s(i,j) = \sum_{e \in N_{ij}} \frac{w(e)}{a_i + a_j} \qquad (19.8)$$

where
  $N_{ij}$ is the set of nets containing both module $i$ and module $j$,
  $a_i$ and $a_j$ are the respective areas of modules $i$ and $j$, and
  $w(e)$ denotes the weight of hyperedge $e$, typically taken proportional to $1/(|e| - 1)$.

An affinity-ordered priority queue (heap) of the vertices (including clusters and partial clusters) is formed; each vertex in the heap is associated with its nearest neighbor under the given affinity metric. At each step, the pair of vertices $(u, v)$ with the highest affinity is removed from the heap and clustered, provided its total area does not violate the maximum-area constraint. For efficiency, a lazy-update strategy is then employed: the affinities of the netlist neighbors of $u$ and $v$ are marked as invalid rather than being immediately updated. Affinities of invalid vertices are recomputed only when they arrive at the top of the heap.

∗ NTUPlace3 uses first choice rather than best choice.
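A compact sketch of the mechanism follows. The netlist, areas, and area cap are invented; affinities follow Equation 19.8 with $w(e) = 1/(|e| - 1)$, and stale heap entries are rescored only when popped, mimicking the lazy-update strategy. A production implementation would, among other things, try the next-best neighbor when an area check fails; this sketch simply drops the pair.

```python
import heapq

# Sketch (ours) of best-choice clustering with lazy update (Section 19.3.2.1).
def affinity(i, j, nets, area):
    shared = [e for e in nets if i in e and j in e]
    return sum(1.0 / (len(e) - 1) for e in shared) / (area[i] + area[j])

def best_choice(nets, area, max_area):
    alive, stale, heap, merges = set(area), set(), [], []
    def best_neighbor(u):
        nbrs = {v for e in nets if u in e for v in e if v != u and v in alive}
        return max(((affinity(u, v, nets, area), v) for v in nbrs), default=None)
    def push(u):
        bn = best_neighbor(u)
        if bn:
            heapq.heappush(heap, (-bn[0], u, bn[1]))   # max-heap via negation
    for u in sorted(alive):
        push(u)
    while heap:
        _, u, v = heapq.heappop(heap)
        if u not in alive:
            continue
        if u in stale or v not in alive:    # lazy update: rescore only at the top
            stale.discard(u)
            push(u)
            continue
        if area[u] + area[v] > max_area:    # area cap violated: drop this pair
            continue
        alive.discard(v)                    # merge v into u
        area[u] += area[v]
        nets = [e for e in (tuple(sorted({u if x == v else x for x in e}))
                            for e in nets) if len(e) > 1]   # rebuild coarse nets
        merges.append((u, v))
        stale |= {x for e in nets if u in e for x in e if x != u}
        push(u)
    return merges

area = {v: 1.0 for v in range(6)}
nets = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(best_choice(nets, area, max_area=3.0))   # -> [(0, 1), (2, 3), (4, 5)]
```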
Compared with earlier pass-based algorithms lacking a heap-order organization, the priority-queue formulation consistently improves the HPWL of final placements. In experiments reported by the authors [26], final HPWL obtained by the multiscale placer hATP using best choice improves by 4.3 percent over edge coarsening and by 3.2 percent over first choice. Similar improvements have been reported for APlace [41] and mPL [39].

19.3.2.2 Location-Based Clustering

Location-based clustering is also called layout-based clustering or physical clustering. After an initial placement has been obtained as a starting configuration, spatial proximity can be incorporated into the vertex-affinity metric used for clustering [12,16]. A simple three-level illustration is given in Figure 19.6. Earlier versions of mPL [16] incorporated distance as a reciprocal factor in a second V-cycle.∗

FIGURE 19.6 Location-based clustering in a three-level multiscale-placement flow. Given an initial placement at the finest level, clusters at each subsequent coarser level are determined from a combination of netlist connectivity and spatial proximity.

∗ Later versions of mPL, however, abandoned the location-based approach in favor of a single V-cycle derived by purely connectivity-driven best-choice clustering [42].

FDP [12] uses an (additive) convex combination of connectivity and spatial proximity as its vertex-affinity function in hybrid first-choice clustering. Specifically, the affinity score $s(i,j)$ between vertices $i$ and $j$ is defined for a given placement as

$$s_{\text{FDP}}(i,j) = \lambda \left( \sum_{e \in N_{ij}} \frac{1}{1 + |x_i - x_j| + |y_i - y_j| - \zeta} \right) + (1 - \lambda) \sum_{e \in N_{ij}} \frac{1}{|e| - 1}$$

where
  $N_{ij}$ is the set of nets containing both vertex $i$ and vertex $j$,
  $|e|$ is the number of vertices in net $e$, and
  $\zeta$ specifies the minimum displacement possible between nonoverlapping modules $i$ and $j$.

The authors report best results for $\lambda \approx 0.25$. The effect stabilizes iterations and supports multiple V-cycles.
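A direct transcription of this affinity is straightforward. The sketch below uses made-up positions and nets, $\lambda = 0.25$ as reported, and a caller-supplied $\zeta$.

```python
# Sketch (ours) of the FDP hybrid affinity above.
def fdp_affinity(i, j, nets, pos, lam=0.25, zeta=0.0):
    shared = [e for e in nets if i in e and j in e]      # N_ij
    dist = abs(pos[i][0] - pos[j][0]) + abs(pos[i][1] - pos[j][1])
    proximity = len(shared) / (1.0 + dist - zeta)        # same term per shared net
    connectivity = sum(1.0 / (len(e) - 1) for e in shared)
    return lam * proximity + (1.0 - lam) * connectivity

pos = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (5.0, 5.0)}      # hypothetical placement
nets = [(0, 1), (0, 1, 2)]
print(fdp_affinity(0, 1, nets, pos))
```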
Location-constrained clustering is used in a somewhat different way to define the coarsest level of the three-level formulation used in FastPlace 3.0 [40]. While the first level of fine-grain clustering (cf. Section 19.3.2.3) relies solely on netlist affinities (Equation 19.8) and best choice, the second, coarse-grain clustering imposes the added restriction that vertices must be closer than a certain parametrized limit in order to be clustered. In FastPlace 3.0, that limit is set to 10 percent of the maximum chip dimension. A similar hybrid approach to clustering, using only netlist connectivity at finer levels but both connectivity and proximity at the coarsest level, is used by mFar [11,43].

The work of Chen et al. [44] describes a more radical approach.∗ A cluster hierarchy is derived from four separate preliminary placements. Several iterations of the successive-overrelaxation (SOR) variant of coordinate-wise relaxation [15] are applied to the linear equations for the optimality of flat, unconstrained quadratic HPWL perturbed by density-balancing forces computed by a Poisson solver [10]. In each of the four trials, the cells are all initially placed at the same point: one of the four corners of the placement region. Clusters are then selected according to the cells' average final proximity over the results of all four trials. Although this iterative, empirical approach to clustering requires significant runtime, it is a fixed initial cost that can be amortized over the cost of subsequent iterations. Numerical results confirm the scalability of this approach.

∗ The hierarchy is used by Chen et al. to solve the linear system of equations for a flat, Poisson-based analytical placement formulation rather than to directly support multilevel placement.

19.3.2.3 Mutual Contraction and Fine-Granularity Clustering

A central obstacle to multilevel placement of netlists is that, in the course of recursive clustering, modules tend to be eliminated far more rapidly than nets. The result is often extremely densely interconnected coarse-level netlists with very different structure from the original finest-level netlists they are intended to approximate. The problem has been partly addressed in the literature by clustering schemes that strive to eliminate as many nets as possible [32], in particular, nets of low degree. To this end, the mutual contraction formulation [28] models the strength of the connection between modules $u$ and $v$ relative to $u$'s total connectivity as

$$w_r(u,v) = \frac{w(u,v)}{\sum_x w(u,x)}$$

where the weight $w(e) = 2/\{[d(e) - 1]\, d(e)\}$ comes from clique-model (graph-based) approximations of hyperedges in the netlist. The contraction of two modules is defined as the symmetrized product $c_p(x,y) = w_r(x,y)\, w_r(y,x)$; a pair of modules with a large contraction relative to other pairs is a good candidate for clustering. The notion is readily extended to arbitrary subsets of modules.

In fine-granularity clustering [28], the contraction metric is used to order connections between modules in a priority queue. The modules at the front of the queue are grouped together if their grouping does not violate area-balance or other constraints. A target cluster limit is set a priori; clustering typically stops with most clusters containing two or three modules of average area.
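The following sketch computes mutual contraction on a small hypothetical netlist. Note how the two-pin net (2, 3), reinforced by a shared three-pin net, yields the highest contraction, while pairs connected only through the three-pin net score far lower.

```python
from itertools import combinations

# Sketch (ours) of mutual contraction (Section 19.3.2.3): hyperedges are
# expanded with clique weights w(e) = 2/((d(e)-1) d(e)); w_r normalizes by
# the first module's total connectivity; c_p is the symmetrized product.
def clique_weights(nets):
    w = {}
    for e in nets:
        d = len(e)
        for u, v in combinations(e, 2):
            for key in ((u, v), (v, u)):
                w[key] = w.get(key, 0.0) + 2.0 / ((d - 1) * d)
    return w

def contraction(u, v, w):
    total_u = sum(wt for (a, _), wt in w.items() if a == u)
    total_v = sum(wt for (a, _), wt in w.items() if a == v)
    return (w.get((u, v), 0.0) / total_u) * (w.get((v, u), 0.0) / total_v)

nets = [(0, 1), (1, 2, 3), (2, 3)]
w = clique_weights(nets)
pairs = sorted({tuple(sorted(p)) for p in w},
               key=lambda p: -contraction(*p, w))   # best candidates first
print([(p, round(contraction(*p, w), 3)) for p in pairs])
```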
19.3.2.4 Net Cluster

The above strategies share a dependence on metrics derived from pairs of vertices but are largely oblivious to netlist structure involving more than two vertices. The net-cluster study [27] shows that aggregation criteria defined over local subsets broader than simple pairs of vertices can be used to improve the quality and runtime of existing multiscale placers. In particular, clustering the vertices of a single multipin net is shown in many instances to improve the quality of results over what can be obtained with a pairwise strategy; cf. Figure 19.7. Another advantage of net cluster is that it reduces the number of nets at approximately the same rate as the number of vertices.

FIGURE 19.7 Netlist with a natural clustering that vertex-pair-based algorithms typically fail to find. The natural clustering most appropriate for placement (or netlist partitioning) groups vertices 1–4 in one cluster and vertices 5–7 in another. Vertex-pair-based algorithms overwhelmingly prefer to cluster vertices 4 and 5, precluding the natural clustering.

The net cluster algorithm consists of five distinct steps partitioned into two phases.

Phase 1. Candidate-cluster (a) identification, (b) refinement, and (c) scoring
Phase 2. Net-cluster (a) scoring and (b) final formation

The initial cluster candidates for phase 1 are simply the nets in the netlist; note that these are not disjoint. Refinement then proceeds over the candidates one by one. For each candidate, several iterations of FM bipartitioning (Chapter 7) are performed, with the cluster candidate used as one subset of the netlist and the rest of the netlist as the other. That is, this step attempts to determine whether each vertex in a cluster candidate is more strongly connected to the other vertices in the cluster or to the vertices external to it. After this refinement step, each candidate cluster $C_i$ is scored by

$$s_c(C_i) = \frac{\#\,\text{nets inside the cluster}}{\#\,\text{modules inside the cluster}} \times \frac{1}{\text{cluster area}}$$

That is, the score prefers candidate clusters that (1) absorb many nets relative to (2) the number of modules they aggregate, and that (3) have low area. At the end of phase 1, the candidate clusters are still not disjoint. In phase 2, each net $N_i$ is then also assigned a score equal to the sum of the scores of the candidate clusters containing it, minus the sum of the scores of the candidate clusters cutting it (see Ref. [27] for the precise form). Nets are then visited in descending score order, with one of the four following cases applied to each net.

1. If clustering the net's cells would violate cluster-area constraints, the net is ignored, and the next net is considered.
2. Otherwise, the cells in the net are clustered if none of them has already been clustered.
3. Otherwise, if just one cell in the net has already been clustered, the net can be merged with an existing cluster, if doing so does not violate cluster-area constraints.
4. Otherwise, if at least two cells in the net have already been assigned to different clusters, all these overlapping clusters and the cells in the current net are merged, if doing so does not violate cluster-area constraints.

The net cluster study supports the view that considerable improvement in both the runtime and QoR of existing multilevel algorithms may be attainable through more accurate modeling of local netlist structure during aggregation. On ICCAD 2004 test cases of up to approximately 200,000 objects, a single pass of net cluster improves the final HPWL of the leading multiscale placers mPL6 [39] and NTUPlace3 [30] by 2–5 percent on average.
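The phase-1 score can be stated in a few lines. The netlist below is a hypothetical stand-in loosely shaped like Figure 19.7: a densely connected four-vertex group bridged by one net to a densely connected three-vertex group.

```python
# Sketch (ours) of the phase-1 candidate score above. A net counts as
# "inside" a candidate when all of its pins are; the candidate's area is
# the sum of its module areas.
def candidate_score(cand, nets, area):
    inside = sum(1 for e in nets if set(e) <= cand)
    total_area = sum(area[v] for v in cand)
    return (inside / len(cand)) * (1.0 / total_area)

area = {v: 1.0 for v in range(7)}
nets = [(0, 1), (1, 2), (2, 3), (0, 2, 3),   # dense group {0,1,2,3}
        (3, 4),                               # bridge net
        (4, 5), (5, 6), (4, 6)]               # dense group {4,5,6}
for cand in ({0, 1, 2, 3}, {4, 5, 6}, {3, 4}):
    print(sorted(cand), round(candidate_score(cand, nets, area), 3))
```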
TABLE 19.1 Approximate Numbers of Clusters $N_{\text{coarsest}}$ at the Coarsest Levels of Leading Academic Multiscale Placers

    Placer          APlace   Dragon   FastPlace3       FDP/LSD   mPL   NTUPlace3
    N_coarsest      2000     4        N_finest / 4     1000      500   6000

Note: The finest-level netlist has $N_{\text{finest}}$ modules.

19.3.2.5 Coarsest Level

Because the initial placement at the coarsest level may have a large influence on subsequent iterations, and because the coarsest-level problem is relatively small, the placement at this level is typically performed with great care, to the highest quality possible. For example, mPL [39] uses nonlinear programming, while mPG [5] uses simulated annealing. How to judge coarse-level placement quality is not necessarily obvious, however, as the coarse-level objective may not correlate strictly with the ultimate fine-level objectives. Under such circumstances, multiple iterations over the entire hierarchical flow may sometimes be important [6,16], with evaluation of the objectives and constraints at the finest level providing feedback to the representation and optimization at all coarser levels.

The coarsest-level problem sizes of some leading multiscale algorithms are shown in Table 19.1. From the table it is evident that an enormous range of coarsest-level problem sizes exists, from just four movable objects in Dragon [7] to N/4 in FastPlace3, where N is the number of modules in the original netlist. The observation that flat (non-multiscale) algorithms, e.g., [13,45], remain competitive suggests that there is no inherent upper limit on the best coarsest-level problem size. Similarly, Dragon's results [7] and results from other problem domains [46,47] strongly suggest the absence of any hard lower bound. The complexity of integer nonconvex nonlinear programming, to which placement belongs, is high enough that multiscaling can sometimes produce solutions superior to those of flat algorithms even when fewer than 100 variables are present.

19.3.2.6 Numbers of Levels

Netlist-driven clustering by variations of best choice or first choice seems to allow a rapid reduction in the number of modeling levels, by allowing a large cluster-size target, without loss of final placement quality. In APlace, each coarse cluster level has about 1/10 the number of movable objects of its adjacent finer level; with the coarsest-level size from Section 19.3.2.5, this rule produces just four levels of hierarchy for APlace on a 1M-module test case. In NTUPlace3, this ratio is approximately 1/5, and in mPL6 it is approximately 1/4. In contrast, in the location-based clustering used by FDP/LSD [12,48], at most 35 percent of modules are aggregated in one pass of clustering, leaving a much larger number of levels. With the coarsest-level size from Section 19.3.2.5, this rule produces 20 levels of hierarchy for FDP/LSD on a 1M-module test case.

19.3.3 ITERATION FLOW

Of all the leading multiscale placers cited in this chapter, FDP/LSD [48] is the only one known to rely on more than a single pass of successive refinement from coarsest to finest level (cf. Figure 19.3). After each such pass of successive refinement in FDP/LSD, the entire hierarchy is reconstructed, using location-based clustering (Section 19.3.2.2), with one less level of clustering than was used in the preceding pass. On a 1M-module test case, this flow constructs a total of $20 + 19 + \cdots + 1 = 210$, roughly 200, distinct problem formulations over its history.
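The level counts quoted above are easy to check arithmetically. The helper below is our own back-of-the-envelope model: it repeatedly applies the stated per-level reduction ratio until the coarsest-level target from Table 19.1 is reached.

```python
# Sketch (ours) checking the quoted hierarchy depths; all input numbers are
# taken from the text, the counting model is an assumption.
def levels(n_finest, ratio, n_coarsest):
    n, k = n_finest, 1          # k counts levels, starting at the finest
    while n > n_coarsest:
        n *= ratio
        k += 1
    return k

# APlace: ~1/10 reduction per level, coarsest level near 2000 clusters.
print(levels(1_000_000, 0.10, 2000))   # -> 4 levels, as quoted
# FDP/LSD: 20 levels, rebuilt with one fewer level after each pass.
print(sum(range(1, 21)))               # -> 210, i.e., roughly 200 formulations
```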