Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C007 Finals Page 122 24-9-2008 #15

FIGURE 7.9 Placements of prim1 using (a) two eigenvectors (2D placement) and (b) three eigenvectors (3D placement).

Partitioning Solutions from Multiple Eigenvectors

It is also possible to use multiple eigenvectors to determine arrangements of vertices that minimize the number of cuts. Hall [Hal70] suggests that the location of the vertices in r-dimensional space can be used to identify blocks (see Section 7.3.1 for a description of his method). Two- and three-dimensional placements of prim1 are shown in Figure 7.9. The three branches in the two-dimensional plot indicate that three blocks should be formed. On the other hand, it is not as obvious how to cluster vertices in the three-dimensional plot.

Instead of minimizing the squared distance between two vertices as in Equations 7.3 and 7.4, Frankle and Karp [FK86] transform the distance minimization problem to one of finding the point emanating from the projection of x onto all eigenvectors that is furthest from the origin. The vector induced by this point gives a good ordering with respect to the wirelength.

Chan et al. [CSZ94] use the cosine of the angle between two rows of the $|V| \times k$ eigenvector matrix, $V$, to determine how close the vertices are to each other. If the cosine between two row vectors is close to 1, the corresponding vertices are taken to belong to the same block. Their k-way partitioning heuristic constructs k prototype vectors with distinct directions (to represent blocks) and places into each block the vertices whose row vectors lie within $\pi/8$ radians of that block's prototype vector. This approach was the starting point for a method devised by Alpert et al.
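The cosine test used by Chan et al. can be illustrated with a minimal sketch. The embedding coordinates and the two prototype vectors below are made-up values chosen for illustration, and the function names `cosine` and `assign_to_prototypes` are hypothetical; the way [CSZ94] actually constructs its prototype vectors is not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is the zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def assign_to_prototypes(rows, prototypes, max_angle=math.pi / 8):
    """Map each row index to the closest prototype within max_angle, else None."""
    threshold = math.cos(max_angle)  # angle <= max_angle  <=>  cosine >= threshold
    assignment = {}
    for idx, row in enumerate(rows):
        best_block, best_cos = None, threshold
        for block, proto in enumerate(prototypes):
            c = cosine(row, proto)
            if c >= best_cos:
                best_block, best_cos = block, c
        assignment[idx] = best_block
    return assignment

# Hypothetical rows of a 5-vertex, 2-eigenvector matrix (illustrative numbers only).
rows = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.95), (0.05, 0.7), (0.5, 0.5)]
# Assumed prototype directions, one per block.
prototypes = [(1.0, 0.0), (0.0, 1.0)]
blocks = assign_to_prototypes(rows, prototypes)
```

In this toy input the last row sits at 45 degrees from both prototypes, which is outside the $\pi/8$ cone, so it is left unassigned; a full heuristic would need a rule for such vertices.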
The idea behind multiple eigenvector linear orderings (MELO) [AY95], [AKY99] is as follows: after removing the first column (which corresponds to the zero eigenvalue) from $V$ (call the resulting matrix $V'$), the partition that satisfies the usual mincut objective and balance constraints is obtained by finding a permutation of the rows of $V'$ that results in the maximum possible two-norm sum of the rows. Alpert and Yao [AKY99] prove that when the number of eigenvectors selected is n, maximizing the vector sum is equivalent to minimizing netcut.

7.3.2 LINEAR PROGRAMMING FORMULATIONS

In paraboli, Riess et al. [RDJ94], [AK95] use the eigenvector technique of Section 7.3.1 to fix the vertices corresponding to the ten smallest eigenvector components and ten largest eigenvector components to locations 1.0 and 0.0, respectively. The center of gravity of the remaining vertices is fixed at location 0.5. They use a mathematical programming technique to reposition the free vertices so the overall wirelength is reduced. The mathematical formulation is given by

\[
\min \sum_{i=1}^{|V|} \sum_{j=1}^{|V|} a_{ij} \frac{(x_i - x_j)^2}{|x_i - x_j|}
\quad \text{s.t.} \quad \sum_{i=1}^{|V|} x_i = f
\]

Each term equals $a_{ij}|x_i - x_j|$ whenever $x_i \neq x_j$, so the objective is a linear wirelength written in quadratic form.

In the next pass of the algorithm, the 5 percent of vertices with the largest (smallest) resulting coordinate are moved so their center of gravity is at $x_i = 0.95$ ($x_i = 0.05$). After performing the optimization and repositioning, the process is repeated with centers of gravity at $x_i = 0.9$ and $x_i = 0.1$, and so on. The process is repeated ten times, so there are ten different orderings. The best ordering is the one among the ten with the best ratiocut metric.

In Ref. [LLLC96], the authors point out that linear cost functions spread out dense blocks of vertices, whereas quadratic cost functions naturally identify blocks of vertices, making it easier to assign discrete locations to otherwise closely packed vertices.
They incorporate the merits of both linear and quadratic methods in a modified α-order cost function:

\[
\min \sum_{j=1}^{|V|} \sum_{i>j}^{|V|} a_{ij} \frac{(x_i - x_j)^2}{|x_i - x_j|^{2-\alpha}}
\quad \text{s.t.} \quad \sum_{i=1}^{|V|} x_i = f
\]

where $1 \le \alpha \le 2$. If $\alpha = 1$, the cost function becomes the linear cost function; for $\alpha = 2$, the cost function becomes the quadratic cost function. They observe that $\alpha = 1.2$ best incorporates the benefits of linear and quadratic cost functions.

7.3.3 INTEGER PROGRAMMING FORMULATIONS

In Ref. [AK95], the authors formulate bipartitioning as an integer quadratic program. Let $x_{is}$ indicate that vertex i belongs to block s. Let $a_{ij}$ represent the cost of the edge connecting vertices i and j. Let $B$ be a matrix with $b_{ss} = 0\ \forall s$ and $b_{s\ell} = 1\ \forall s \neq \ell$. With m vertices and k blocks, the optimization problem that minimizes the number of edges that have endpoints in more than one block is given by

\[
\min \sum_{i,j=1}^{m} \sum_{s,\ell=1}^{k} a_{ij}\, x_{is}\, b_{s\ell}\, x_{j\ell} \tag{7.6}
\]
\[
\text{s.t.} \quad \sum_{s=1}^{k} x_{is} = 1 \quad \forall\, i \tag{7.7}
\]
\[
\sum_{i=1}^{m} x_{is} = u_s \quad \forall\, s \tag{7.8}
\]
\[
x_{is} \in \{0, 1\} \tag{7.9}
\]

The constraint in Equation 7.7 indicates that each vertex belongs to exactly one block, and the constraint in Equation 7.8 fixes the block sizes. The rationale behind the objective function is that when the edge (i, j) is cut, $a_{ij} \sum_{s,\ell=1}^{k} x_{is} b_{s\ell} x_{j\ell} = a_{ij}$; in effect, the cost of cutting the edge (i, j) appears only once in the summation. On the other hand, if edge (i, j) is uncut, then $s = \ell$ and $b_{s\ell} = 0$, which implies that $a_{ij} \sum_{s,\ell=1}^{k} x_{is} b_{s\ell} x_{j\ell} = 0$.

In Refs. [AV93], [Kuc05], the authors formulate the k-way partitioning problem as a 0–1 integer linear program (INLP).
Assume there are $j = 1, \dots, k$ blocks, $i = 1, \dots, |V|$ vertices, $s = 1, \dots, |E|$ nets, and $i' = 1, \dots, |e_s|$ vertices per net s. Let $s(i')$ denote the index of the $i'$th vertex of edge s in the set of vertices, V. Define $x_{ij}$ to be an indicator variable such that

\[
x_{ij} =
\begin{cases}
1, & \text{vertex } i \text{ is in block } j \\
0, & \text{otherwise}
\end{cases}
\]

The crux of the model is in the way we represent uncut edges. If a specific net consists of vertices 1 through 4, then it will be uncut if

\[
x_{1j}\, x_{2j}\, x_{3j}\, x_{4j} = 1 \quad \text{for some } j
\]

Introduce the indicator variable

\[
y_{sj} =
\begin{cases}
1, & \text{net } s \text{ has all of its vertices entirely in block } j \\
0, & \text{otherwise}
\end{cases}
\]

These variables, together with the constraints below, enable us to write the partitioning problem as an integer program. To understand how the constraints work, consider a net consisting of vertices 1 and 5. For this net to be uncut, $x_{1j} x_{5j} = 1$. Because $x_{1j}, x_{5j} \in \{0, 1\}$, it is true that $x_{1j} x_{5j} \le x_{1j}$ and $x_{1j} x_{5j} \le x_{5j}$. The product can therefore be replaced by $y_{sj}$ subject to the linear constraints $y_{sj} \le x_{1j}$ and $y_{sj} \le x_{5j}$. The objective function maximizes the sum of uncut nets (hence, minimizing the sum of cut nets), with $n = |E|$ and $m = |V|$:

\[
\max \sum_{j=1}^{k} \sum_{s=1}^{n} y_{sj} \tag{7.10}
\]
\[
\text{s.t.} \quad y_{sj} \le x_{s(i')j} \quad \forall\, i', j, s \tag{7.11}
\]
\[
\sum_{j=1}^{k} x_{ij} = 1 \quad \forall\, i \tag{7.12}
\]
\[
l_j \le \sum_{i=1}^{m} a_i x_{ij} \le u_j \quad \forall\, j \tag{7.13}
\]
\[
x_{pq} = 1, \quad p \in V,\ q \in B \tag{7.14}
\]
\[
x_{ij} \in \{0, 1\} \tag{7.15}
\]
\[
y_{sj} \in \{0, 1\} \tag{7.16}
\]

The constraint in Equation 7.11 is the net connectivity constraint. The constraint in Equation 7.12 assigns each vertex to exactly one block. The constraint in Equation 7.13 imposes block size limits, given nonunit cell sizes $a_i$. The bounds for bipartitioning are typically $l_j = \left[0.45 \sum_{i=1}^{m} a_i\right]$ and $u_j = \left[0.55 \sum_{i=1}^{m} a_i\right]$. The constraint in Equation 7.14 indicates that vertex p is fixed in block q.

7.3.4 NETWORK FLOW

Given a directed graph G, each directed edge (or arc) (x, y) has an associated nonnegative number c(x, y) called the capacity of the arc. The capacity can be viewed as the maximal amount of flow that

FIGURE 7.10 Flow network. (From Ford, L. R.
and Fulkerson, D. R., Flows in Networks, Princeton University Press, Princeton, NJ, 1962.)

leaves x and ends at y per unit time [FF62]. Let s indicate a starting node and t a terminating node. A flow from s to t is a function f that satisfies the equations

\[
\sum_{y} f(x, y) - \sum_{y} f(y, x) =
\begin{cases}
k, & x = s \\
0, & x \neq s, t \\
-k, & x = t
\end{cases}
\tag{7.17}
\]
\[
f(x, y) \le c(x, y) \quad \forall\, (x, y) \tag{7.18}
\]

where the first sum in Equation 7.17 runs over the arcs leaving x and the second over the arcs entering x. Equation 7.17 implies that the net flow out of s is k, the net flow out of t is −k, and the net flow out of every intermediate node is zero (as with Kirchhoff's law). Equation 7.18 implies the flow is not allowed to exceed the capacity value. Borrowing the example from Ref. [FF62], in Figure 7.10, we see that the flow out of s is −1 − 1 + 1 + 4 = 3, the flow out of intermediate node x is −4 + 2 + 1 + 1 = 0, and the flow out of t is −2 + 1 − 1 − 1 = −3.

The idea behind bipartitioning is to separate G into two blocks (not necessarily the same size) such that $s \in C_1$ and $t \in C_2$, where the netcut is given by $\sum_{x \in C_1,\, y \in C_2} c(x, y)$. The following theorem links computing the maximum flow to the netcut.

Theorem 3 (MaxFlow MinCut): For any network, the maximum flow value from s to t is equal to the minimum cut capacity over all cuts separating s and t.

If we can find the maximum flow value from s to t, we will have found the partition with the smallest cut. In Figure 7.10, the maximum flow is 3. In Ref. [FF62], the authors prove the maximum flow computation can be solved in polynomial time. The problem is that the resulting partitions can be very unbalanced. In Ref. [YW94], the authors propose a maximum flow algorithm that finds balanced partitions in polynomial time. Because nets are bidirectional, to apply network flow techniques, each net is transformed into an equivalent flow network, and the flow representation shown in Figure 7.11 is used.
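The link between maximum flow and minimum cut stated in Theorem 3 can be checked on a small example. The sketch below uses the Edmonds-Karp augmenting-path method, which is one standard way to compute a maximum flow and is not necessarily the procedure used in [FF62] or [YW94]. The four-node network and its capacities are hypothetical and are not the network of Figure 7.10.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp: return (flow value, nodes on the s side of a minimum cut)."""
    # Residual capacities; every arc gets a reverse arc initialised to 0.
    residual = {s: {}, t: {}}
    for (x, y), c in capacity.items():
        residual.setdefault(x, {})
        residual.setdefault(y, {})
        residual[x][y] = residual[x].get(y, 0) + c
        residual[y].setdefault(x, 0)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            x = queue.popleft()
            for y, r in residual[x].items():
                if r > 0 and y not in parent:
                    parent[y] = x
                    queue.append(y)
        if t not in parent:
            # No augmenting path: nodes still reachable from s form block C1.
            return flow, set(parent)
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        y = t
        while parent[y] is not None:
            bottleneck = min(bottleneck, residual[parent[y]][y])
            y = parent[y]
        y = t
        while parent[y] is not None:
            x = parent[y]
            residual[x][y] -= bottleneck
            residual[y][x] += bottleneck
            y = x
        flow += bottleneck

# Hypothetical capacities c(x, y) for illustration only.
capacity = {("s", "a"): 4, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
value, c1 = max_flow(capacity, "s", "t")
# Capacity of the cut (C1, C2): arcs leaving C1.
cut_capacity = sum(c for (x, y), c in capacity.items() if x in c1 and y not in c1)
```

The block `c1` returned here is whatever the flow happens to produce; nothing in this sketch enforces a balance constraint, which is exactly the shortcoming the text attributes to plain max-flow partitioning.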
In the representation of Figure 7.11, all vertices in net 1 are connected toward vertex x and away from vertex y. The next step is to solve the maxflow-mincut problem in $O(|V||E|)$ time, which obtains the minimal cutset, $E_c$, for the unbalanced problem. Finally, if the balance criterion is not satisfied, vertices in $C_1$ (or $C_2$) are collapsed into s (or t), a vertex $v \in C_1$ (or in $C_2$) incident on a net in $E_c$ is collapsed into s (or t), and the cutset, $E_c$, is recomputed. The procedure has the same time complexity as the unbalanced mincut algorithm.

FIGURE 7.11 Efficient flow representation.

7.3.5 DYNAMIC PROGRAMMING

In a series of two papers [AK94], [AK96], the authors discuss clustering methods that form blocks by splitting a linear ordering of vertices using dynamic programming. It can be shown that dynamic programming can be used to optimally split the ordering into blocks [AK94].

In Ref. [AK94], the authors embed a linear ordering obtained from multiple eigenvectors in multidimensional space and use a traveling-salesman problem (TSP) heuristic to traverse the points. The idea is that points that are close together in the embedding are in proximity to one another in the linear ordering. A space-filling curve is then used as a good TSP heuristic because it traverses the points that are near to each other before wandering off to explore other parts of the space. They construct k blocks by splitting the tour into 2, 3, ..., k − 1, up to k segments using dynamic programming.

7.4 CLUSTERING

Partitioning is implicitly a top-down process in which an entire netlist is scanned for the separation of vertices into a few blocks. The complementary process to partitioning is clustering, in which a few vertices at a time are grouped into a number of blocks proportional to the number of vertices [Alp96].
A block can be defined in a number of ways. Intuitively, a block is a dense region in a hypergraph [GPS90]. The clique is the densest possible subgraph of a graph. The density of a graph $G(V, E)$ is $|E| / \binom{|V|}{2}$ and, by this definition, clustering is the separation of V into k dense subgraphs, $\{C_1, C_2, \dots, C_k\}$, in which each $C_i$ has a prescribed density in the interval (0, 1]. However, this problem is NP-complete [AK95].

A less formal way of defining a block is simply a region where vertices have multiple connections with one another. This forms the basis of clustering techniques that use vertex matchings. Normally, matchings apply to graphs, but here, we apply them to hypergraphs. A matching of $G = (V, E)$ is a subset of hyperedges with the property that no two hyperedges share the same vertex. A heavy-edge matching means edges with the heaviest weights are selected first. A maximum matching means as many vertices as possible are matched [PS98], [Ten99]. For a hypergraph that consists of two-point hyperedges only, a maximum matching consists of up to $|V|/2$ edges (Figure 7.12). In the more general case, a maximum matching contracts fewer than $|V|/2$ edges.

FIGURE 7.12 Maximum matching of two-point hyperedges.

The clustering process tends to decrease the sparsity of the netlist, which is fortunate because FM-based algorithms perform best when the average vertex degree is larger than 5 [AK95]. Walshaw [Wal03] suggests that clustering filters out irrelevant data from the partitioning solution space so that subsequent iterative improvement steps look for a minimum in a more convex space.

We have divided clustering methods into three categories roughly in chronological order.
Clustering techniques block many vertices simultaneously in a hierarchical fashion [KK98, AK98] or one vertex at a time in an agglomerative fashion, based on physical connectivity information [AK96, CL00, HMS03, LMS05, AKN+05]. In cell placers, information such as cell names (i.e., indicating which presynthesized objects cells belonged to) may be incorporated to speed up the clustering heuristic.

7.4.1 HIERARCHICAL CLUSTERING

Hierarchical techniques merge all vertices into clusters at the same time. Candidate vertices for hierarchical clustering are based on the results of vertex matchings [BS93, HL95, AK98, KK98, Kar03]; matched vertices are then merged into clusters of vertices. Matchings are used extensively because they tend to locate independent logical groupings of vertices, thus avoiding the buildup of vertices of excessively large degree. Matchings may be selected randomly or by decreasing net size, called heavy-edge matching. After clustering, the average vertex weight increases, but the average net degree decreases. Karypis and Kumar [Kar03] use the following clustering schemes, assuming unit weights on nets:

1. Select pairs of vertices that are present in the same nets by finding a maximum matching of vertices based on a clique-graph representation (edge clustering).
2. Find a heavy-edge matching of vertices by nonincreasing net size; after all nets have been visited, merge matched vertices (net clustering).
3. After nets have been selected for matching, for each net that has not been contracted, its (unmatched) vertices are contracted together (modified net clustering).
4. To preserve some of the natural clustering that may be destroyed by the independence criterion of the previous three schemes, after an initial matching phase, for each vertex $v \in V$, consider vertices that belong to nets with the largest weight incident on $v$, whether they are matched or not (first choice clustering).

The clustering schemes are depicted in Figure 7.13.
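To make the matching idea concrete, the following is a minimal sketch of a greedy heavy-edge matching on a clique-graph expansion of a small netlist. It is a simplification for illustration and is not the implementation of [Kar03]: the greedy pass yields a maximal rather than a maximum matching, the clique edge weight 1/(|e| − 1) is an assumed net model, and the netlist and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def clique_weights(nets):
    """Expand each net into a clique; each vertex pair gains weight 1/(|e|-1)."""
    weight = defaultdict(float)
    for net in nets:
        if len(net) < 2:
            continue  # single-pin nets connect nothing
        for u, v in combinations(sorted(net), 2):
            weight[(u, v)] += 1.0 / (len(net) - 1)
    return weight

def heavy_edge_matching(vertices, nets):
    """Greedy matching: take the heaviest clique edges first, skipping used vertices."""
    weight = clique_weights(nets)
    mate = {}
    # Heaviest edge first; ties are broken by vertex names to stay deterministic.
    for (u, v), _ in sorted(weight.items(), key=lambda kv: (-kv[1], kv[0])):
        if u not in mate and v not in mate:
            mate[u], mate[v] = v, u
    # Matched pairs become two-vertex clusters; unmatched vertices stay alone.
    clusters, seen = [], set()
    for v in vertices:
        if v in seen:
            continue
        group = (v, mate[v]) if v in mate else (v,)
        clusters.append(group)
        seen.update(group)
    return clusters

# Hypothetical netlist for illustration.
vertices = ["A", "B", "C", "D", "E"]
nets = [{"A", "B", "C"}, {"C", "D"}, {"D", "E"}, {"A", "B"}]
clusters = heavy_edge_matching(vertices, nets)
```

Because matched vertices are marked as used, no vertex can join more than one cluster in a pass, which is the independence property the text credits with avoiding vertices of excessively large degree.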
Karypis [Kar03] points out that there is no consistently better clustering scheme for all netlists. Examples can be constructed for any of the above clustering methods that fail to determine the correct partitions [Kar03]. Karypis [Kar03] also suggests that a good stopping point for clustering is when there are $30k$ vertices, where k indicates the desired number of blocks. After the clustering phase, an initial bipartition that satisfies the balance constraint is performed. It is not necessary at this point to produce an optimal bipartition because that is ultimately the purpose of the refinement phase. Recently, several new clustering algorithms have been devised.

FIGURE 7.13 Clustering schemes: (a) edge clustering, (b) net clustering, and (c) modified net clustering. (From Karypis, G., Multilevel Optimization in VLSICAD, Kluwer Academic Publishers, Boston, MA, 2003.)

7.4.2 AGGLOMERATIVE CLUSTERING

Agglomerative methods form clusters one at a time based on connectivity of nets adjacent to the vertices being considered. Once a cluster is formed, its vertices are removed from the remaining pool of vertices. The key to achieving a good clustering solution is in somehow capturing global connectivity information.

Clustering Based on Vertex Ordering

In Ref. [AK96], the authors introduce the concept of an attraction function and a window to construct a linear ordering of vertices. Given a starting vertex, $v_i^*$, and an initially empty set of ordered vertices, S, they compute the attraction function for $v_i^*$ at step i in $V - S$. Various attraction functions are described. For example, one using the absorption objective is given by

\[
\mathrm{Attract}(i) = \sum_{e \in E(i) \,\mid\, e \cap S \neq \emptyset} \frac{1}{|e| - 1}
\]

where $E(i)$ indicates the set of edges at step i. They then select the vertex $v_i^*$ in $V - S$ with optimal attraction function and add it to S.
Finally, they update the attraction function for every vertex in $V - S$ and repeat until $V - S$ becomes empty. The order in which vertices are inserted into S defines blocks, where vertices that were recently inserted into S have more attraction on $v_i^*$ than vertices that were inserted many passes earlier (called windowing in Ref. [AK96]). Dynamic programming is ultimately used to split S into blocks. The authors report that windowing produced superior results with respect to the absorption metric over other ordering techniques.

Clustering Based on Connectivity

In Ref. [CL00], the authors use the concept of edge separability to guide the clustering process. Given an edge $e = (x, y)$, the edge separability, $\lambda(e)$, is defined as the minimum cutsize among cuts separating vertices x and y. To determine the set of nets to be clustered, $Z(G)$, they solve a maximum flow problem (because computing edge separability is equivalent to finding the maximum flow between x and y). To assess in what order the nets in $Z(G)$ should be contracted, the authors use a specialized ranking function related to the separability metric. Nets are contracted until the maximum cluster limit size of $\log_2 |V|$ is reached.

FIGURE 7.14 Clique net model (with edge weights $1/(|e| - 1)$) favors absorption better.

In Refs. [HMS03], [HMS04], the authors use a clique representation of nets, in which the weight of a connection is given by

\[
w(c) = \frac{w(e)}{(|e| - 1)\,|e|}
\]

where $w(c)$ is the weight of a connection c and $w(e)$ is the weight of a net segment (determined by the net model used). The rationale behind using a clique model for nets is that it favors configurations where the net is absorbed completely into a cluster. In Figure 7.14, net 1 consists of vertices {A, B, C} and net 2 consists of vertices {C, D}.
On the left side, using a star net model, the cost of cutting any edge is 1, so clusters can be formed in three ways. On the right side, the cost of cutting the edge connecting C and D is highest, so the clusters shown there are formed. The cost of a fine cluster, f, is given by $\sum_{c \in f} w(c)$ and the overall cost of a fine clustering solution is given by $\sum_{f} \sum_{c \in f} w(c)$, where the goal is to maximize the overall cost of the fine clustering solution.

In Ref. [LMS05], the authors propose a clustering technique based on physical connectivity. They define the internal force of a block C as the summation of the weights of all internal block connections:

\[
F_{\mathrm{int}}(C) = \sum_{i, j \in C} w(i, j)
\]

They also define the external force of a block C as the summation of the weights of nets with at least one vertex located outside C and at least one vertex inside C:

\[
F_{\mathrm{ext}}(C) = \sum_{i \in C,\, j \notin C} w(i, j)
\]

The measure that best reflects physical connectivity is the ratio of external to internal forces,

\[
\frac{F_{\mathrm{ext}}(C)}{F_{\mathrm{int}}(C)}
\]

where the goal is to maximize this ratio. $F_{\mathrm{ext}}$ can be measured in other ways as well. In Ref. [LMS05], the authors use a local Rent's exponent of a block,

\[
p = \log_G \frac{T}{t}
\]

where
G is the number of nodes in the block
T is the number of nets that have connections inside the block and outside the block
t is the average node degree of the circuit

The seed growth algorithm works by constructing a block with strong physical connectivity starting from a seed node with large net degree. The connectivity between neighbor node u and block C is given by $\mathrm{conn}(u, C) = \sum_{i \in C} w(u, i)$. In subsequent passes, neighbor nodes with the largest possible connectivity are added to the block while keeping the internal force as large as possible. When the block size exceeds some threshold value, an attempt is made to minimize the local Rent exponent to reduce the external force.
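The force and connectivity measures above are simple sums, which a short sketch can make concrete. The weighted graph below is hypothetical, each unordered pair is assumed to be stored once (so the internal sum counts each connection once), and the function names are illustrative rather than taken from [LMS05].

```python
import math

def internal_force(w, block):
    """Sum of weights of connections with both endpoints inside the block."""
    return sum(wt for (i, j), wt in w.items() if i in block and j in block)

def external_force(w, block):
    """Sum of weights of connections with exactly one endpoint inside the block."""
    return sum(wt for (i, j), wt in w.items() if (i in block) != (j in block))

def connectivity(w, u, block):
    """conn(u, C): total weight between node u and the nodes of the block."""
    return sum(wt for (i, j), wt in w.items()
               if (i == u and j in block) or (j == u and i in block))

def local_rent_exponent(G, T, t):
    """p = log_G(T / t); assumes G > 1 and T, t > 0."""
    return math.log(T / t) / math.log(G)

# Hypothetical pairwise connection weights w(i, j), one entry per unordered pair.
w = {("a", "b"): 2.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
     ("c", "d"): 0.5, ("d", "e"): 2.0}
block = {"a", "b", "c"}

f_int = internal_force(w, block)
f_ext = external_force(w, block)
ratio = f_ext / f_int
conn_d = connectivity(w, "d", block)
p = local_rent_exponent(G=16, T=8, t=2)
```

In a seed-growth loop, `connectivity` would be evaluated for every neighbor of the current block to pick the next node to absorb; that loop is omitted here.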
Experimental results indicate the seed growth algorithm produces placements with improved wirelength over placers that use the clustering techniques described in Section 7.4.1.

Clustering Based on Cell Area

In Ref. [AKN+05], the authors propose a clustering scheme tailored specifically to large-scale cell placement. Their method differs from the methods described in Section 7.4.1 in that those methods block vertices indiscriminately, whereas best choice clustering considers only the best possible pair of vertices among all vertex pairs. The main idea behind best choice clustering is to identify the best possible pair of clustering candidates using a priority-queue data structure whose pair-value key is the tuple $(u, v, d(u, v))$, where u and v are the vertex pair and $d(u, v)$ is the clustering score. The pair-value keys are sorted, in descending order, by clustering score. The idea is to block the pair at the top of the priority queue. The clustering score is given by

\[
d(u, v) = \sum_{e} \frac{1}{|e|} \cdot \frac{1}{a(u) + a(v)}
\]

where the sum runs over the hyperedges e shared by u and v. The first term is the weight of hyperedge e, which is inversely proportional to the number of vertices incident on hyperedge e. The $a(u) + a(v)$ term is the total area of cells u and v. Thus, this method favors cells with small area, connected by nets of small degree. The above area function is necessary to prevent the formation of overly large blocks. The authors propose using other score functions, including one that uses the total number of pins instead of cell area, because the total number of pins is more indicative of block size (via Rent's rule described in Section 7.4.2).

Once a (u, v) pair with the highest clustering score is merged into vertex $u'$, the clustering score for all of the neighbors of $u'$ must be recalculated. This represents the most time-consuming stage of the best choice clustering algorithm.
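A minimal sketch of the score computation and the priority-queue selection is given below. It covers only the first step (scoring all pairs and reading off the best one), not the merge and rescoring loop of [AKN+05]. The netlist, cell areas, and function names are hypothetical, and Python's `heapq` is a min-heap, so scores are negated to obtain descending order.

```python
import heapq
from itertools import combinations

def clustering_scores(nets, area):
    """d(u, v) = sum over shared nets e of (1/|e|) * 1/(a(u) + a(v))."""
    shared_weight = {}
    for net in nets:
        for u, v in combinations(sorted(net), 2):
            shared_weight[(u, v)] = shared_weight.get((u, v), 0.0) + 1.0 / len(net)
    return {(u, v): wsum / (area[u] + area[v])
            for (u, v), wsum in shared_weight.items()}

def best_pair(nets, area):
    """Return (u, v, d(u, v)) for the pair at the top of the priority queue."""
    # Negate scores so the min-heap root holds the largest clustering score.
    heap = [(-d, u, v) for (u, v), d in clustering_scores(nets, area).items()]
    heapq.heapify(heap)
    neg_d, u, v = heap[0]
    return u, v, -neg_d

# Hypothetical nets and cell areas for illustration.
nets = [("a", "b"), ("a", "b", "c"), ("c", "d")]
area = {"a": 1.0, "b": 1.0, "c": 2.0, "d": 2.0}
u, v, d = best_pair(nets, area)
```

Here cells a and b share two small nets and have the smallest combined area, so they form the top-ranked pair; after an actual merge, every pair involving the new vertex would have to be rescored, which is the expensive step the text describes.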
Because this recalculation is so costly, the authors introduce the lazy-update clustering score technique, in which the recalculation of clustering scores is delayed until a vertex pair reaches the top of the priority queue. The best choice clustering algorithm is shown to produce better quality placement solutions than edge coarsening and first-choice clustering. The lazy-update scheme is shown to be particularly effective at reducing runtime, with almost no change in half-perimeter wirelength. Studies are under way as of this writing into incorporating fixed vertices (corresponding to input/output terminals) into the best choice algorithm.

7.5 MULTILEVEL PARTITIONING

The gist of multilevel partitioning is to construct a sequence of successively coarser graphs, to partition the coarsest graph (subject to balance constraints), and to project the partitions onto the next-level finer graph while performing numerical or FM-type iterative improvement to further improve the partition [BJ93, BS93, HL95, Alp96, KAKS97] (Figure 7.15).

FIGURE 7.15 Essence of multilevel partitioning. (Diagram labels: Clustering, Refinement.)

7.5.1 MULTILEVEL EIGENVECTOR PARTITIONING

The basis of multilevel partitioning with eigenvectors is described in Ref. [BS93] and consists of clustering (contraction), interpolation, and refinement steps. Contraction consists of selecting a subgraph, $G'$ with $V' \subset V$, of the original graph such that $V'$ is a maximum matching with respect to G. The Lanczos algorithm [Dem97] is then applied to the reduced bipartitioning problem. Interpolation consists of the following: given a $|V'| \times 1$ Fiedler vector, $x'$, of a contracted graph $G'$, an interpolation step constructs a $|V| \times 1$ vector $x_0$ out of $x'$. This is accomplished by remembering
that the ith component of $x'$ was derived by contracting vertex $m(i)$ of x and, upon reconstructing a new $|V| \times 1$ vector, $x_0$, inserting component $x_{m(i)}$ into the $m(i)$th slot of $x_0$, initially filling all empty slots in $x_0$ with zeros. For example, if

\[
x_0 = [\, x_1 \;\; 0 \;\; 0 \;\; x_4 \;\; 0 \;\; x_6 \;\; 0 \;\; 0 \;\; 0 \;\; x_{10} \,]
\]

then the zero components are assigned the average values of their left and right nonzero neighbors:

\[
x_0 = \left[\, x_1 \;\; \tfrac{x_1 + x_4}{2} \;\; \tfrac{x_1 + x_4}{2} \;\; x_4 \;\; \tfrac{x_4 + x_6}{2} \;\; x_6 \;\; \tfrac{x_6 + x_{10}}{2} \;\; \tfrac{x_6 + x_{10}}{2} \;\; \tfrac{x_6 + x_{10}}{2} \;\; x_{10} \,\right]
\]

Refinement consists of using $x_0$ as a good starting solution for the Fiedler optimization problem (Equations 7.3 through 7.5). The authors use a cubically converging numerical technique called Rayleigh quotient iteration to solve for x [Wat91].

7.5.2 MULTILEVEL MOVE-BASED PARTITIONING

One of the original works on multilevel partitioning in the VLSI domain [AHK96] applied techniques that were previously employed on finite element meshes [HL95], [KK95]. The authors converted circuit netlists to graphs, using a clique representation for individual nets, and ran the multilevel graph partitioner, Metis [KK95], to obtain high-quality bipartitions. Using a graph representation, however, has the pitfall that removing one edge from the cutset does not reflect the true objective, which is to remove an entire net from the cutset. Subsequent works [AHK97], [KAKS97] partitioned hypergraphs directly using the two-stage approach of clustering and refinement. They obtained optimal or near-optimal mincut results on the set of test cases listed. Multilevel partitioning, to this day, remains the de facto partitioning technique.

Multilevel move-based partitioning consists of clustering and iterative improvement steps. The power of multilevel partitioning becomes evident during the iterative improvement phase, where moving one vertex across the block boundary corresponds to moving an entire group of clustered vertices.
The refinement process consists of repeatedly applying an iterative improvement phase to successively finer hypergraphs, while declustering after each pass of the interchange heuristic. Because of the space complexity of Sanchis' k-way FM algorithm and because vertices are clustered into the proper blocks, Karypis et al. [KK99] use a downhill-only search variant of FM that does not require the use of a bucket list. Their refinement method visits vertices in random order and moves them if the move results in a positive gain (and preserves the balance criterion). If a vertex v is internal to the block being considered, then it is not moved; if v is a boundary vertex, it can be moved to a block that houses v's neighbors. The move that generates the highest gain is carried out. In experiments, the refinement method converges to a high-quality solution in only a few passes.
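The downhill-only refinement described above can be sketched as follows. This is an illustrative simplification rather than the method of [KK99]: it works on a weighted graph instead of a hypergraph, assumes unit vertex weights, and models the balance criterion as a single upper bound `max_block_size`. All names and the six-vertex example are hypothetical.

```python
import random

def cut_size(edges, part):
    """Total weight of edges whose endpoints lie in different blocks."""
    return sum(w for (u, v), w in edges.items() if part[u] != part[v])

def greedy_refine(edges, part, max_block_size, passes=4, seed=0):
    """Downhill-only refinement: apply only moves with strictly positive gain."""
    rng = random.Random(seed)
    part = dict(part)  # work on a copy
    neighbors = {}
    for (u, v), w in edges.items():
        neighbors.setdefault(u, []).append((v, w))
        neighbors.setdefault(v, []).append((u, w))
    sizes = {}
    for b in part.values():
        sizes[b] = sizes.get(b, 0) + 1
    for _ in range(passes):
        order = list(part)
        rng.shuffle(order)  # visit vertices in random order
        moved = False
        for v in order:
            # Weight of v's connections into each block that houses a neighbor.
            link = {}
            for u, w in neighbors.get(v, []):
                link[part[u]] = link.get(part[u], 0) + w
            home = part[v]
            # Internal vertices have no foreign block in link and are skipped.
            candidates = [b for b in link
                          if b != home and sizes[b] < max_block_size]
            if not candidates:
                continue
            best = max(candidates, key=lambda b: link[b])
            gain = link[best] - link.get(home, 0)
            if gain > 0:
                part[v] = best
                sizes[home] -= 1
                sizes[best] += 1
                moved = True
        if not moved:
            break  # converged: no positive-gain move remains
    return part

# Two triangles joined by one edge; vertices 2 and 5 start in the wrong blocks.
edges = {(0, 1): 1, (0, 2): 1, (1, 2): 1, (3, 4): 1, (3, 5): 1, (4, 5): 1, (2, 3): 1}
initial = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 0}
refined = greedy_refine(edges, initial, max_block_size=4)
```

No gain buckets are maintained: each vertex's gain is recomputed from its neighbors when it is visited, which keeps memory proportional to the graph and is the trade-off the text attributes to avoiding a bucket list.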