ARTICLE IN PRESS
INTEGRATION, the VLSI journal 41 (2008) 123–134
www.elsevier.com/locate/vlsi

Low-power state encoding for partitioned FSMs with mixed synchronous/asynchronous state memory

Cao Cao*, Bengt Oelmann
Department of Information Technology and Media, Mid Sweden University, SE-851 70 Sundsvall, Sweden
Received 31 May 2006; received in revised form February 2007; accepted February 2007

*Corresponding author. E-mail address: cao.cao@miun.se (C. Cao).
0167-9260/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.vlsi.2007.02.002

Abstract

Partitioned finite state machine (FSM) architectures in general enable low-power implementations, and it has been shown that, for these architectures, state memory based on both synchronous and asynchronous storage elements gives lower power consumption than fully synchronous counterparts. In this paper we present state encoding techniques for a partitioned FSM architecture based on mixed synchronous/asynchronous state memory. The state memory, in this case, is composed of a synchronous local state memory and an asynchronous global state memory. The local state memory uses synchronous storage elements and is shared by all sub-FSMs. The global state memory operates asynchronously and is responsible for handling the interaction between sub-FSMs. Even though the partitioned FSM contains the asynchronous mechanism, its input/output behaviour is still cycle-by-cycle equivalent to the original monolithic synchronous FSM. In this paper, we discuss the low-power state encoding method for the implementation of partitioned FSMs with mixed synchronous/asynchronous state memory. For the local state assignment, what we call a state-bundling procedure is presented to enable states residing in different sub-FSMs to share the same state codes. Based on state-bundles, two state encoding techniques are compared: one employs binary encoding and the other adds further optimization for low power.
© 2007 Elsevier B.V. All rights reserved.

Keywords: State encoding; Low power; Mixed synchronous/asynchronous; Finite state machine partitioning

1. Introduction

For finite state machine (FSM) low-power design, there are two main active areas of research. One is FSM partitioning and the other is low-power state encoding. These two methods can be used together or separately in order to reduce the power dissipation of FSMs.

FSM partitioning can be considered an application of the concept of “Dynamic Power Management” (DPM) [1] at the Register Transfer (RT) level. The objective of a DPM scheme is to partition the original design into two or more units so that the units that are currently idle can be shut down to reduce dynamic power dissipation. Mechanisms are usually added to the design to detect and shut down the idle units. Implementing these results in additional circuits which, in turn, add to the circuit area and power dissipation. Therefore, to achieve a solution with minimal power consumption, it is important to make a careful initial analysis in order to find the most beneficial idle conditions, taking the overhead into account.

In FSM partitioning for low power, the original FSM is partitioned into several sub-FSMs and, for most of the time, except when there is a state transition between two sub-FSMs, only one sub-FSM is active and all others are shut down. Clock-gating and input-disabling are usually used as shut-down circuitry. After FSM partitioning, the sub-FSM network basically has one of two types of structure, whose main difference is the means of implementing the state memory. In [2,3], each sub-FSM has its own state memory (as shown in Fig. 1a) and extra signals are added to control which sub-FSM should be currently active. Since every sub-FSM can be synthesized separately, a low-power state encoding method for monolithic FSMs [4,5] can be used directly to reduce the power dissipation of each sub-FSM. In this structure, the
state memories are, in some sense, redundant. At any given time, only the state memory in the currently active sub-FSM is of importance for storing state information, while the others, in the idle sub-FSMs, are not used. By contrast, in the partitioned FSM structure proposed by Chow et al. [6], all sub-FSMs share the same state memory (also called local state memory, LSM) and thus area can be reduced (see Fig. 1b). However, a global state memory is added to determine which one of the sub-FSMs is currently active. Because states in different sub-FSMs can have the same local state code and be distinguished by their global state codes, the local state assignment in one sub-FSM will influence the state codes in another sub-FSM. The sub-FSMs therefore cannot be synthesized separately and the low-power state encoding problem, in this case, should be considered carefully. For state encoding, [6] presents a method that handles crossing transitions (state transitions between two different sub-FSMs) by introducing pseudo-outputs. A pseudo-output bit represents a relation imposed by the crossing transitions. Subsequently, all crossing transitions are deleted and Jedi [7] is used to perform low-power state assignment for each individual sub-FSM.

Fig. 1. Structural decomposition of FSM: (a) separate state memory and (b) shared state memory.

Both approaches to low-power FSM partitioning described above assume fully synchronous implementations, based on either a shared or a separate state memory. However, the fully synchronous implementation does have disadvantages. For the separate state memory structure, in the clock cycle when there is a crossing transition (where the source state and destination state reside in two different sub-FSMs), both sub-FSMs involved must be clocked, which adds to the power consumption. For the shared state memory structure, the global state memory, used for
determining the current active sub-FSM, is always clocked and consumes power.

For partitioned FSMs utilizing separate state memory, [8] uses an asynchronous hand-over mechanism to remove the requirement of clocking two sub-FSMs at a crossing transition. The power overhead introduced for managing the interaction between the sub-FSMs can thus be reduced. It was verified that, in a clock cycle without a crossing transition, the power consumption of the synchronous control for sub-FSM interaction is 5.8 times that of the asynchronous control [9]. An average power reduction of 45%, for a set of FSM benchmark circuits, is achieved using this asynchronous communication between sub-FSMs [3]. Using the model of shared mixed synchronous/asynchronous state memory, without state encoding optimization, [10] achieves an average power reduction of 56%.

As a development of [10], in this paper a novel low-power state encoding algorithm is proposed and applied to FSM partitioning based on mixed synchronous/asynchronous state memory. This algorithm is based on the assumption that the total power consumption is proportional to the switching activity of the state bit-lines [11]. The main contributions of this paper are as follows:

A state assignment procedure: Introduction of state-bundling, which reflects the property of state encoding in partitioned FSMs with shared state memory.
Power optimized state encoding: An efficient state encoding algorithm is performed on the state-bundle table to reduce the switching activity of the state bit-lines, which generally leads to a reduction in total power, including that of the next-state logic and the output logic.
Demonstration of efficiency: The proposed algorithm has been incorporated in a tool for low-power synthesis of partitioned FSMs. For a set of MCNC benchmark circuits [12], it was demonstrated that, in combination with partitioning procedures, the tool can achieve an average power reduction of 59%.

The outline for the rest of this paper is as
follows: Section 2 introduces the implementation structure of the mixed synchronous/asynchronous state memory and the necessary definitions and procedures for this implementation. In Section 3 the basic binary state encoding technique and the state assignment optimization for low power are presented. In Section 4 experimental results showing the possibility of a further power reduction from state encoding optimization are presented. In Section 5 the paper is concluded by a discussion regarding the limitations of the two-step synthesis for partitioned FSMs, where a partitioning step is followed by a state encoding step.

2. Partitioned FSM with mixed synchronous/asynchronous state memory

2.1. Implementation architecture

In this paper, use is made of the architecture developed in [13], which has a shared LSM and a global asynchronous state memory (GSM), see Fig. 2. The basic idea is for the LSM, which is always clocked, to be synchronous and the GSM to be asynchronous, as the latter then does not consume power when idle. The partitioning of the original monolithic FSM is based on state transition probabilities, where the states with high mutual transition probabilities are implemented in the same sub-FSM. Since the state transition probabilities between sub-FSMs are low, the global state memory is generally idle and thus an asynchronous implementation is more power efficient. The shut-down mechanisms used are input-gating, in order to reduce power dissipation in idle combinational logic, and clock-gating, to shut down flip-flops temporarily not required in the LSM. The primary output and next-state information of the partitioned FSM are obtained by merging the separate output and next-state variables from the sub-FSMs (see Merging function in Fig. 2).

Fig. 2. Mixed synch/asynch FSM (blocks shown: GSM, LSM, merging function, gating function).

2.2. STG transformation

In order to implement the structure described in the previous section, the original state transition graph (STG) of the monolithic FSM must be transformed. Fig. 3 serves to illustrate the basic ideas of the design model for STG transformation. The initial monolithic machine is decomposed into two sub-FSMs, F1 and F2, as indicated in Fig. 3a. There are three crossing transitions after partitioning, two from s2 in F1 and one from s5 in F2. For each crossing transition, an additional g-state is introduced, which is inside the sub-FSM where the source state resides and has the same index as the destination state of the crossing transition. In Fig. 3b there are three g-states, g1, g3 and g4, whose indices correspond to s1, s3 and s4, respectively.

After introducing g-states, a crossing transition is transformed into two steps. The first step is inside the LSM: a state transition from the original source state ends at the g-state. In the second step, the g-state is detected and the global state set denoted by R, responsible for determining the active sub-FSM, changes subsequently. A change in R will deactivate the sub-FSM containing the source state of the crossing transition and activate the sub-FSM containing the destination state of that transition. Take the crossing transition from s2 to s3 as an example. After STG transformation, the transition from s2 will enter g3. The detection of g3 will cause the global state set R to make a transition from r1 to r2. After the completion of the crossing transition, F2, rather than F1 as before, is the active sub-FSM.

2.3. Basic definitions

The monolithic Mealy type FSM is defined as a sextuple $F = (S, X, Y, \delta, \lambda, s_0)$, where $S$ is the set of states, $X$ is the set of binary inputs, $Y$ is the set of binary outputs, $\delta$ is the transition function, $\lambda$ is the output function and $s_0$ is the initial state. Let there be a partition on the set $S$: $P = \{S_1, S_2, \ldots, S_n\}$, where $P$ is defined as a collection of subsets such that $\bigcup_{m=1}^{n} S_m = S$ and $S_i \cap S_j = \emptyset$ for $i \neq j$, where $1 \le i, j \le n$. The monolithic FSM is decomposed into a set of sub-FSMs where
every state subset $S_m \in P$ defines a sub-FSM $m$ as $F_m = (S_m, X_m, Y_m, \delta_m, \lambda_m, s_m)$. The state subset $S_m$ is called the internal states of the sub-FSM, $X_m$ is the set of input variables at the transitions from the states in $S_m$, and $Y_m$ is the set of output variables on the sets $S_m$ and $X_m$.

$T(S_m)$ is defined as the set of states that are not included in $S_m$ and to which there are transitions from $S_m$ of $F_m$:
$T(S_m) = \{ s_j \mid \delta(s_k, X_h) = s_j,\ s_j \notin S_m,\ s_k \in S_m \}$

$Q(S_m)$ is defined as the set of states inside $S_m$ to which there are transitions from other sub-FSMs:
$Q(S_m) = \{ s_j \mid \delta(s_k, X_h) = s_j,\ s_j \in S_m,\ s_k \notin S_m \}$

The shorter notations $T_m$ and $Q_m$ are used throughout the remainder of this paper. The set of g-states $G_m$ for the crossing transitions originating from $S_m$, replacing $T_m$ as the set of destination states, is defined as
$G_m = \{ g_i \mid s_i \in T_m \}$

After STG transformation, let the set of internal states in the sub-FSM $F_m$ be $U_m$:
$U_m = S_m \cup G_m$

2.4. State-bundling

In a synchronous FSM the crossing transition, as for all other transitions, must be completed within one clock cycle. As explained above, a crossing transition after the STG transformation requires two steps of state transitions. As the behaviour of the transformed STG is still cycle-by-cycle equivalent to that of the original synchronous one, these two steps must be completed within one clock cycle. An asynchronous state transition is triggered by a signal transition rather than by the active edge of the clock signal. Therefore, following the synchronous state transition in the first step, the second step, involving the asynchronous state transition, happens immediately inside the global state memory, and the two steps will be completed before the start of the next clock cycle. When the global asynchronous state transition occurs, the local state must remain unchanged, which places a restriction on the state encoding. In other words, the local states must be coded in such a way that a g-state and its
associated entry state, together named a coupled-state, must have identical codes. For the example in Fig. 3b, there are three coupled-states in total: (s1, g1), (s3, g3) and (s4, g4). Each coupled-state includes two states that reside in different sub-FSMs but have the same state code. The coupled-states show how the local state assignments in different sub-FSMs should relate to each other, which imposes a further restriction on the local state encoding. Other states, not included in the coupled-states, are denoted as free states (s2 and s5 in this example). These free states have more freedom within the local state assignment. If they are located in different sub-FSMs, since their global states are different, they can either share the same local state code or not. The states sharing the same local state code are called a state-bundle.

Fig. 3. Example: (a) monolithic FSM with state partition indicated and (b) coupled-states introduced.

What we call a state-bundle table is used to describe the behaviour of the decomposed FSM, including the sub-FSM interaction. We take the above example (Fig. 3) as an illustration; its state-bundle table is shown in Fig. 4. In the table, each row includes the internal states of a sub-FSM and each column, having the same local state code, represents a state-bundle. A local state transition is represented by a horizontal change between columns and a global state transition is represented by a vertical change between rows. Every coupled-state shares the same local state code and should be placed in the same bundle. A bundle that includes a g-state is further distinguished as a g-state-bundle in the remainder of the paper. There are two reasons for state-bundling: (1) it enables states in different sub-FSMs to share local state codes and (2) it enables an efficient asynchronous global state transition.

B:   b0   b1   b2   b3
F1   s1   g3   g4   s2
F2   g1   s3   s4   s5

Fig. 4. Example of state-bundle table.

3. State encoding for local states

After FSM partitioning, the whole local state encoding procedure is performed on the state-bundle table. Each coupled-state is placed in a single column and then the free states are added to the table. In this section we present two state-encoding methods. One is the basic procedure, which generally offers good results [10]. The other is the optimization procedure for low power. By merging two or more coupled-states into the same state-bundle and optimizing the state codes of the state-bundles, we can minimize the switching activity in the state bit-lines and thus the power dissipation [11].

3.1. Basic state encoding algorithm

The following example will be used to illustrate the basic procedure of local state assignment. Assume that the original state set is $S$ and let there be a partition $P = \{S_1, S_2, S_3, S_4\}$ which results in the following sets of states: $U_1 = \{s_1, s_2, s_3, g_4\}$, $U_2 = \{s_4, s_5, s_6, g_7\}$, $U_3 = \{s_7, g_1\}$ and $U_4 = \{s_8, s_9, s_{10}, s_{11}, s_{12}, s_{13}, g_1, g_5\}$.

The duty time of each state subset $U_m$, i.e., the probability of the corresponding sub-FSM $F_m$ being active, is given by the sum of the state transition probabilities for which the source states are inside $S_m$, that is $T_m = \sum \mathrm{prob}(s_i, s_j)$, $s_i \in S_m$, $s_j \in S$. The static probability of $U_m$, i.e., the sum of the static state probabilities of $S_m$, is defined as $D_m = \sum \mathrm{prob}(s_i)$, $s_i \in S_m$. Note that $\sum_{m=1}^{n} D_m = 1$, as the sum of the static state probabilities of all states equals 1. By contrast, $\sum_{m=1}^{n} T_m$ is greater than 1, because a crossing transition is associated with two sub-FSMs and its state transition probability contributes to the duty time of both sub-FSMs involved.

When building the state-bundle table for an n-way partitioned FSM, there are n rows. The set of state-bundles, denoted $B$, can be defined as $B = \{b_0, b_1, \ldots, b_p\}$, where p is determined by the number of
columns since each column represents a state-bundle. Corresponding to the definitions of state probability and state transition probability, two probabilities concerning state-bundles are defined. One is the state-bundle probability, expressed as $\mathrm{prob}(b_m) = \sum \mathrm{prob}(s_i)$, $s_i \in b_m$, $0 \le m \le p$, representing the sum of the state probabilities for states in the state-bundle $b_m$. The other is the state-bundle transition probability, defined as $\mathrm{prob}(b_m, b_k) = \sum \mathrm{prob}(s_i, s_j)$, $s_i \in b_m$, $s_j \in b_k$, $0 \le m, k \le p$, describing the sum of the state transition probabilities from states in the bundle $b_m$ to states in the bundle $b_k$.

Binary state codes are assigned to the columns of the table from left to right in incremental order. Initially, the codes of the state-bundles correspond to the bundle index, i.e., b0 has the code “000”, b1 “001” and so on. Binary encoding ensures that the number of clocked local state bits for each sub-FSM is minimal, since the unused high-order state bits always hold the value “zero” [13].

B:   b0   b1   b2   b3   b4   b5   b6   b7
F1   s1   g4   s2   s3   -    -    -    -
F2   s6   s4   s5   g7   -    -    -    -
F3   g1   -    -    s7   -    -    -    -
F4   g1   s8   g5   s9   s10  s11  s12  s13

Fig. 5. State-bundle table with large c.

The construction of the state-bundle table starts from the coupled-states. From the previous discussion, it is known that each coupled-state is in a bundle. The number of bundles necessary for coupled-states is $\left|\bigcup_{m=1}^{n} Q_m\right|$, i.e., the sum of the entry-states of all sub-FSMs. Note that $\left|\bigcup_{m=1}^{n} Q_m\right|$ equals the total number of g-states. In Fig. 5, coupled-states are shaded grey in the state-bundle table. It can be seen that, for example, s4 in F2 is in the same column as g4 in F1 because they are coupled-states. After the coupled-states have been filled into the table, the free states of each sub-FSM are placed into the table in order, starting from the leftmost empty cell. The pseudo-code for the bundling algorithm is shown in Fig. 6.

3.2. Power-optimized state encoding algorithm

The power-optimized
procedure of state assignment is based on the basic state encoding algorithm described above. Binary codes are still used for the columns of the state-bundle table from left to right, but states are moved to suitable columns in order to reduce the switching activity in the state bit-lines. In the first step, using the “merging coupled-state” algorithm, two or more coupled-states can be merged into the same state-bundle. It is thus possible to reduce both the number of state bits in the LSM and the clocked bits for a single sub-FSM. In this step, state probabilities are also taken into account to reduce switching activity. In the second step, to further reduce the switching activity, the g-state-bundles (which include the coupled-states) are moved to suitable columns depending on their mutual state-bundle transition probability. The free states are then placed in the table, taking switching activity into account.

Fig. 6. Pseudo-code for basic procedures of building state-bundle table.

3.2.1. Merging coupled-state algorithm

For a partitioned FSM, to determine whether it is necessary to merge coupled-states, we introduce the measurement criterion
$c = \dfrac{\left|\bigcup_{m=1}^{n} (S_m \setminus Q_m)\right|}{\left|\bigcup_{m=1}^{n} Q_m\right|}$,
where the denominator represents the total number of destination states for the crossing transitions (equal to the total number of g-states) and the numerator represents the total number of free states. Thus, the smaller the number of g-states, the bigger the value of c. For a partitioned FSM with a small number of g-states, the number of coupled-states is limited and merging coupled-states may be unnecessary. In Fig. 5, the number of g-states is small and c equals 9/4. Most FSMs partitioned according to the state transition probabilities have small numbers of g-states and will therefore have a large value for c. For this reason the basic state-bundling procedure without merging coupled-states works well in most cases. However, when the number of g-states is large, placing each coupled-state into a different column
will result in a large number of g-state-bundles, causing inefficient usage of the LSM. The objective of merging coupled-states is to reduce the number of g-state-bundles so that more states are able to share the same local state code. It is then possible to simplify the detection logic for g-states and reduce the number of local state bits required for each sub-FSM.

To illustrate the merging coupled-state algorithm, the example in Fig. 7 is used, where the value of c is 2/5 (Fig. 8). The initial state-bundle table with coupled-states before merging is shown in Fig. 9a, where the five g-states reside in five state-bundles. The merging procedure is performed using the following steps.

(1) Rows in the table are sorted (in the sort( ) function of Fig. 10) according to the static probability Dm of their corresponding sub-FSMs (see Fig. 7). Following this rearrangement, the static probabilities of the sub-FSMs appear in descending order. Since sub-FSMs with high static probability generally contribute more to the final power dissipation, they are given priority in the following optimization procedures. The sorted state-bundle table is shown in Fig. 9b. Following on from this, the g-state-bundle with the highest state-bundle probability is moved to the first column and is assigned the state code “zero” in binary coding. In the proposed implementation architecture, the next-state information of the partitioned FSM is obtained by merging the next-state variables of the different sub-FSMs (Fig. 2) using OR gates. When the present active sub-FSM, assuming that its current state is encoded to “zero”, is deactivated in the next clock cycle, there are no state changes in this sub-FSM, as the next-state variable of a deactivated sub-FSM is always encoded to “zero”. Encoding the g-state-bundle with the highest probability to “zero” can therefore reduce the switching activity in the next-state bit-lines for the crossing transitions.

(2) The coupled-states are merged so that two or more reside in the same column. To ensure that the state-bundle probability of the first column is a maximum, the algorithm begins from the coupled-state in the first column (b0 by default) and attempts to merge other coupled-states into it. If two or more coupled-states can be chosen for merging, then the one in the bundle with the highest state-bundle probability is chosen. When the merging process for the first column b0 is completed, b0 is locked. The same procedure continues for the following coupled-states until the one in the last column has been processed. In the example given in Fig. 9b, both b2 and b3 can be merged into b0. According to the static probability information of the sub-FSMs shown in Fig. 7, the state-bundle probability of b3 is 0.3 (prob(b3) = prob(s4) = D4), larger than that of b2 (prob(b2) = prob(s3) ≤ prob(s3) + prob(s2) = D3 = 0.2). Therefore, b3 is chosen to be merged into b0. The updated state-bundle table after merging coupled-states is shown in Fig. 9c, where the total number of g-state-bundles is reduced from 5 to 4.

Fig. 7. Example of a partitioned FSM with small c. Sub-FSM static probabilities: D1 = 0.3, D2 = 0.1, D3 = 0.2, D4 = 0.3, D5 = 0.1.

B:   b0   b1   b2   b3   b4
     000  001  010  011  100
F1   s0   g1   -    -    -
F2   -    s1   g3   -    -
F3   s2   -    s3   g4   -
F4   -    g1   g3   s4   g5
F5   g0   s6   -    -    s5

Fig. 8. State-bundle table before optimization.

3.2.2. g-state-bundle encoding optimization

Since every g-state-bundle is given a unique local state code, the problem of reducing the switching activity between state transitions is transformed into reducing the switching activity of the transitions between the bundles. In this step, the indices of the state-bundles are fixed but their positions are moved between the columns. The algorithm, in addition to reducing the switching activity between state-bundles, attempts to allow the sub-FSMs with higher static
probabilities to retain the minimum-length encoding. The state encoding procedure begins from the top row of the state-bundle table, corresponding to the sub-FSM with the highest static probability, and continues to the last row of the table. Bundle b0 is in the first column with the highest state-bundle probability and its position is locked initially, while the other g-state-bundles are unlocked. In the table, if a row includes a state belonging to a g-state-bundle, the g-state-bundle is said to be valid in this row. For the valid g-state-bundles of each row, their positions are optimized and then locked without further changes. A greedy algorithm, shown in Fig. 11, is used to minimize the Hamming distance of g-state-bundles with high state-bundle transition probability.

The procedure can be illustrated through the example given above. In Fig. 9c, the state-bundle table after merging the coupled-states includes state-bundles b0, b1, b2 and b3. The first row only includes states s0 and g1, belonging to b0 and b1, respectively, so b0 and b1 are the valid state-bundles in this row (corresponding to F1). Since b0 is locked initially, the g-state-bundle optimization procedure begins from b1. To ensure that the valid g-state-bundles in F1 use the minimum-length codes, b1 can only be assigned the code “01”, and thus only one state bit is necessary to distinguish the valid g-state-bundles in F1. When the optimization in the first row is completed, the positions of g-state-bundles b0 and b1 are locked. For the remaining two bundles, b2 includes the coupled-state (s3, g3) and b3 includes (s5, g5). In F4 there are four states and the minimum number of local state bits is 2. Since the codes “00” and “01” have been assigned, the remaining available codes are “10” and “11”. The state-bundle transition probabilities between the unassigned state-bundles (b2, b3) and the assigned bundles (b0, b1) are calculated. If it is assumed that the transition probability between b3 and b0 is the highest, then b3 is placed in the column with the code “10”, which has a Hamming distance of 1 to b0 (“00”), and is then locked. Bundle b2 is subsequently assigned the code “11” and locked. Since all g-state-bundles are locked, the state encoding optimization for g-state-bundles is complete. It can be seen from Fig. 9d that the positions of b3 and b2 are swapped after optimization.

(a) Initial table with coupled-states
B:   b0   b1   b2   b3   b4
     000  001  010  011  100
F1   s0   g1   -    -    -
F2   -    s1   g3   -    -
F3   -    -    s3   g4   -
F4   -    g1   g3   s4   g5
F5   g0   -    -    -    s5

(b) Sorted table
B:   b0   b1   b2   b3   b4
     000  001  010  011  100
F1   s0   g1   -    -    -
F4   -    g1   g3   s4   g5
F3   -    -    s3   g4   -
F2   -    s1   g3   -    -
F5   g0   -    -    -    s5

(c) Merging coupled-states
B:   b0   b1   b2   b3   b4
     000  001  010  011  100
F1   s0   g1   -    -    -
F4   s4   g1   g3   g5   -
F3   g4   -    s3   -    -
F2   -    s1   g3   -    -
F5   g0   -    -    s5   -

(d) Optimization of g-state-bundles
B:   b0   b1   b3   b2
C:   00   01   10   11
F1   s0   g1   -    -
F4   s4   g1   g5   g3
F3   g4   -    -    s3
F2   -    s1   -    g3
F5   g0   -    s5   -

(e) Final state-bundle table
B:   b0   b1   b3   b2
C:   00   01   10   11
F1   s0   g1   -    -
F4   s4   g1   g5   g3
F3   g4   s2   -    s3
F2   -    s1   -    g3
F5   g0   -    s5   s6

Fig. 9. Optimization procedures for state-bundle table: (a) initial table with coupled-states; (b) sorted table; (c) merging coupled-states; (d) optimization of g-state-bundles; (e) final state-bundle table.

Fig. 10. Pseudo-code for coupled-state merging.

3.2.3. Free state encoding optimization

Free states are states of sub-FSMs that are not included in the coupled-states. These states are not related to crossing transitions and therefore their state assignment optimization is not influenced by the ordering of the sub-FSMs. For every unassigned free state of a single sub-FSM Fm, its state transition probabilities with all other assigned states (inside or outside Fm) are calculated. The free state associated with the highest state transition probability is chosen and placed into the column minimizing the Hamming distance of this transition. Meanwhile the condition of minimum-length encoding for Fm should still be
satisfied. In the example above, for sub-FSM F3, s2 is a free state and the minimum number of state bits for F3 is 2 (obtained from the minimumLengthCode( ) function in Fig. 12). Since s2 only has a state transition to s3 (in Fig. 7), it should be placed in the column which has the smallest Hamming distance to the column with code “11”, which is where s3 resides. Both columns, with codes “01” and “10”, have a Hamming distance of 1 to “11”, so s2 can be placed in either. In Fig. 9e, it is placed in the left column, “01”. For sub-FSM F5, s6 is a free state. It has state transitions to s5 and s0, where the former occurs within sub-FSM F5 and the latter is between sub-FSMs F5 and F1. Assuming that the state transition probability between s6 and s5 is higher than that between s6 and s0, s6 is placed in the column with code “11”, which has a Hamming distance of 1 from the column with code “10” where s5 is to be found. The column “01” has a Hamming distance of 2 from “10” and is therefore not chosen.

The final state-bundle table after the optimization procedures is shown in Fig. 9e. In comparison to the state-bundle table without optimization shown in Fig. 8, it can be seen that the total number of local state bits, as well as the state bits needed for F4 and F5, is reduced from 3 to 2. The effect of reducing local state bits depends largely on whether the sub-FSMs that have a high probability of being active are able to reduce their number of clocked local state bits.

Fig. 11. Pseudo-code for optimized g-state-bundle encoding.

4. Experimental results

In this section, results are presented showing how the state assignment optimization procedures given in Section 3.2 influence the power consumption of partitioned FSMs. The optimization algorithms are incorporated into the automatic synthesis tool based on our previous work [10]. Seven MCNC standard benchmarks [12] were used in the experiments. The number of states in these benchmarks ranges from 19 to 121.

As in [14], we use Monte Carlo simulation to obtain the approximate state transition probabilities. The input vectors are randomly generated and the average input probability is set to 0.5 by default. The inputs are assumed to be independent of each other. Also, it is assumed that the state probability of each state becomes a constant as time increases to infinity. After a warm-up period of clock cycles, the state probability of every state is sampled. A simplified convergence criterion is used: the simulation stops only when the maximum difference between the probabilities of each state sampled in two consecutive time units (or clock cycles) is less than ε, a user-specified constant. The default value of ε is set to $10^{-6}$. For all the standard benchmarks tested, the simulations converged in a reasonable time. This method removes the need to handle the STG explicitly and supports the execution of large benchmarks.

The power and area figures presented in the graphs are obtained from gate-level estimations in Power Compiler, and logic synthesis is performed using Design Compiler; both tools are from Synopsys [15]. A 0.18 µm CMOS standard cell library [16] is used. The power supply voltage Vdd is assumed to be 1.8 V and the clock frequency is 20 MHz. For the original FSM, binary state encoding, the default used by Synopsys, is applied. The total power of this monolithic FSM is $P_{tot,mono} = P_{clk} + P_{reg} + P_{ns} + P_{out}$, where $P_{clk}$ is the clock net power, $P_{reg}$ is the power in the state registers, $P_{ns}$ is the power in the next-state function and $P_{out}$ is the power in the output function. The total power of the partitioned FSM is $P_{tot,part} = P_{clk} + P_{reg} + P_{ns} + P_{out} + P_{oh}$, where $P_{oh}$ is the power overhead, which is the sum of the power dissipated in the global state memory, the circuits for idle condition detection and the shut-down circuitry. The power dissipated in the sub-FSMs can be further indicated as
Ptot,sub-FSM = Ptot,part − Poh.

FSM partitioning, on its own, is an efficient method for achieving power reductions. In Fig. 13, the power consumption before and after FSM partitioning is compared. Significant reductions are obtained using the mixed synchronous/asynchronous architecture, even without optimized state encoding.

Fig. 12. Pseudo-code for optimized free state encoding.

In the partitioned FSM, a significant part of the power is dissipated in the global state memory, the circuits for idle condition detection and the shut-down logic (in total Poh). This part is hardly affected by the optimization procedures presented in this paper. To examine in detail how the proposed procedures affect power consumption, we first consider only the power dissipated in the sub-FSMs (Ptot,sub-FSM = Ptot,part − Poh). Fig. 14 illustrates the impact of state assignment alone on Ptot,sub-FSM after FSM partitioning. It shows how, in comparison to the basic binary encoding procedure, the merging of coupled-states (legend “merged g-states”) and the subsequent encoding optimization procedures (legend “encoding”) affect power dissipation. As can be seen, merging coupled-states is rarely worthwhile on its own. For benchmarks s832, s820, scf and s1488, coupled-state merging has no obvious positive effect. This result is reasonable, as the number of local state bits may not be reduced by merging. Even when the number does decrease, since the local state bits are clock-gated, the sub-FSMs with a high probability of being active may still use the same number of state bits as before the merging. The subsequent state encoding optimization (for both g-state-bundles and free states), by contrast, performs better in terms of power reduction. On average, for the benchmarks tested,
regarding the power dissipation of the sub-FSMs (Ptot,sub-FSM = Ptot,part − Poh), the complete state assignment optimization technique achieves a power reduction of around 13% compared to the partitioned FSM using plain binary state encoding. As shown on the right of Fig. 13, the sub-FSM power (Ptot,part − Poh) is only a portion (on average about 40%) of the decomposed FSM power (Ptot,part). The total power reduction from FSM partitioning and the optimization procedures is 59%, compared to 56% in [10]. Most low-power state encoding methods focus on the monolithic FSM, and for the method that does address partitioned FSMs with shared state memory [6], no explicit result is given for the power reduction obtained directly from state assignment optimization. Therefore, a fair comparison between our method and theirs is not feasible.

Fig. 13. Power reductions for partitioned FSMs. Panels: power for original FSMs (Pclk, Preg, Pns, Pout) and power for partitioned FSMs with basic state encoding (Pclk, Preg, Pns, Pout, Poh); vertical axis: power [μW].

Fig. 14. Power reductions in the sub-FSMs (the total power of the partitioned FSM minus the power of the overhead Poh). Legend: merged g-states, encoding; benchmarks: keyb, s832, s820, scf, s1494, styr, s1488; vertical axis: power reduction [%].

Fig. 15. Power reductions versus number of bits in the state memory. Vertical axis: power reduction [%].

In Fig. 15, the average power reduction derived from state assignment optimization in the sub-FSMs of the tested benchmarks (Ptot,sub-FSM) is shown against the number of local state bits. For sub-FSMs with many bits in the LSM, the state assignment optimization procedure is effective, but for those containing only a few bits only minor reductions can be
obtained. From Table 1, it can be seen that the partitioning procedure often results in small sub-FSMs with high duty probability Tm and large sub-FSMs with low duty probability. For this type of decomposed FSM, the power reduction from the state encoding optimization is limited.

Table 1. Structural information from the FSM decompositions
FSM: keyb s832 s820 scf s1494 styr s1488
|S1| |U1| |T1|: 0.99 0.99 0.99 0.96 0.91 0.85 0.91
|S2| |U2| |T2|: 0.27 21 24 0.03 0.03 0.08 0.20 0.30 0.20
|S3| |U3| |T3|: 0.18 17 23 <0.01 110 0.02 0.08 0.20 0.08
|S4| |U4| |T4|: 0.09 0.02 0.08 0.02
|S5| |U5| |T5|: 15 16 <0.01 0.03 16 0.03 0.03
|S6| |U6| |T6|: 0.02 14 21 <0.01 42 46 0.02
|S7| |U7| |T7|: 42 46 0.02 0.02

Conclusions

Low-power state assignment for partitioned FSMs with shared state memory is more difficult than with separate state memory. The state codes in one sub-FSM influence the state assignment of another sub-FSM, and this relation must be taken into account. To address this problem, this paper uses the idea of state-bundling. It captures the properties of the partitioned FSM and the restrictions imposed on the state encoding by the implementation structure. Based on state-bundling, for partitioned FSMs with shared mixed synchronous/asynchronous state memory, we propose a state encoding algorithm that, in comparison to basic binary encoding, achieves promising power reductions for the sub-FSMs (Ptot,sub-FSM). The total power reduction for the partitioned FSM (Ptot,part) is lower than that for Ptot,sub-FSM. This is mainly because the power dissipation in the power-overhead circuits (Poh) can hardly be reduced by state assignment optimization once the partitioning has been determined. With simultaneous partitioning and state encoding, similar to that presented in [17], the state encoding method can influence the partitioning procedure, and an improvement in
results can be expected. However, with the simultaneous method, the complexity of the problem increases dramatically, as do the run-times of the algorithms. A direction for future work is to develop an algorithm for simultaneous partitioning and state encoding and to determine whether the more complex algorithms pay off in reduced power.

References

[1] L. Benini, G.D. Micheli, Dynamic Power Management: Design Techniques and CAD Tools, Kluwer Academic Publishers, Dordrecht, 1998.
[2] L. Benini, F. Vermeulen, G.D. Micheli, Finite-state machine partitioning for low-power, in: Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 2, 1998, pp. 5–8.
[3] B. Oelmann, K. Tammemäe, M. Kruus, M. O'Nils, Automatic FSM synthesis for low-power mixed synchronous/asynchronous implementation, J. VLSI Design 12 (2001) 167–186 (Special issue on low-power design).
[4] L. Benini, G.D. Micheli, State assignment for low power dissipation, IEEE J. Solid-State Circuits 30 (1995) 258–268.
[5] C.Y. Tsui, M. Pedram, A.M. Despain, Low power state assignment targeting two and multilevel logic implementations, IEEE Trans. Comput. Aided Design 17 (12) (1998) 1281–1291.
[6] S.-H. Chow, Y.-C. Ho, T. Hwang, Low-power realization of finite-state machines: a decomposition approach, ACM Trans. Design Automat. Electron. Syst. (1996) 315–340.
[7] B. Lin, A. Richard Newton, Synthesis of multiple level logic from symbolic high-level description languages, in: VLSI 89, Munich, 1989, pp. 187–196.
[8] B. Oelmann, M. O'Nils, A low power hand-over mechanism for gated-clock FSMs, in: Proceedings of the European Conference on Circuit Theory and Design, 1999, pp. 118–121.
[9] B. Oelmann, M. O'Nils, Asynchronous control of low-power gated-clock finite-state machines, in: Proceedings of IEEE International Conference on Electronics, Circuits, and Systems, 1999, pp. 915–918.
[10] C. Cao, M. O'Nils, B. Oelmann, A tool for low-power synthesis of FSMs with mixed synchronous/asynchronous state memory, in: IEEE Proceedings of the Norchip Conference, 2004, pp. 199–202.
[11] K. Roy, S. Prasad, SYCLOP: synthesis of CMOS logic for low power applications, in: Proceedings of the International Conference on Computer Design (ICCD), 1992, pp. 464–467.
[12] S. Yang, Logic synthesis and optimization benchmarks, user guide version 3.0, MCNC Technical Report.
[13] C. Cao, B. Oelmann, Mixed synchronous/asynchronous state memory for low power FSM design, in: Proceedings of the EUROMICRO Symposium on Digital System Design, 2004, pp. 363–370.
[14] J.C. Monteiro, A.L. Oliveira, Implicit FSM decomposition applied to low power design, IEEE Trans. Very Large Scale Integration Syst. 10 (5) (2002) 560–565.
[15] Synopsys Inc., http://www.synopsys.com, company homepage.
[16] United Microelectronics Corp., http://www.umc.com.tw, company homepage.
[17] G. Venkataraman, S.M. Reddy, I. Pomeranz, GALLOP: Genetic algorithm based low power FSM synthesis by simultaneous partitioning and state assignment, in: The Sixteenth International Conference on VLSI Design, 2003, pp. 533–538.
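
The free-state placement rule used in the worked example (place a free state in the available column whose code is closest, in Hamming distance, to the codes of the states it has transitions with) can be illustrated with a minimal Python sketch. This is not the authors' pseudo-code from Fig. 12. The function names `hamming` and `place_free_state`, the probability weighting, and the weight value 1.0 are assumptions made only for illustration. The inter-sub-FSM transition from s6 to s0 is left out, in line with the example's assumption that the s6 to s5 transition dominates.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def place_free_state(candidates: List[str], neighbours: Dict[str, float]) -> str:
    """Pick the candidate column code with the smallest weighted Hamming distance.

    candidates: codes of the columns still available to the free state
                (hypothetical input, listed left to right).
    neighbours: maps the code of each already-placed neighbour state to an
                assumed probability of the transition to or from the free state.
    Ties are broken in favour of the first candidate listed.
    """
    def cost(code: str) -> float:
        return sum(p * hamming(code, n) for n, p in neighbours.items())

    return min(candidates, key=cost)


# s2 only has a transition to s3 (code "11"); "01" and "10" tie at distance 1,
# so the first (left) column is chosen.
s2_code = place_free_state(["01", "10"], {"11": 1.0})

# s6 is dominated by its transition to s5 (code "10"); "11" is at distance 1,
# "01" at distance 2.
s6_code = place_free_state(["11", "01"], {"10": 1.0})

print(s2_code, s6_code)
```

With these assumed inputs the sketch reproduces the placements described in the text: s2 in column “01” and s6 in column “11”.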