Here it is apparent that, using this organization's talent and resources, a level of Q of 6000 or higher is needed for a lower-risk tape-out. However, values in the 4000 to 6000 range might also be acceptable if time-to-market considerations require taking on somewhat higher risk at tape-out. It's also apparent that a value of Q lower than 4000 is quite risky for a tape-out because, historically, that quantity of exercise has left too many unexposed bugs.

5.8 Bug Count as a Function of Complexity

One potential measure that may have practical value in estimating the number of bugs in a target is the ratio of bugs to complexity. This may turn out to be a simple linear relationship or perhaps some other relationship. Either way, an organization that has empirical data on bug counts and complexity may be able to make more accurate predictions of expected bug counts for new verification targets and thereby plan more effectively. Considering the example provided in Fig. 4.15 and adding upper and lower bounds for expected bug count, we arrive at Fig. 5.8.

Fig. 5.8. Forecasting lower and upper bounds for bug count

In this figure, estimates for lower and upper bounds on bug count have been shown, and it appears that the tapering off in bug discovery might be misleading. There might actually be more bugs yet to be discovered if the improvements in RTL code quality cannot be accounted for by other means. In our example, CRV testing with error imposition hasn't yet begun, and there are more bugs associated with that functionality that remain to be discovered.
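As a rough illustration of how such historical data might be used, the sketch below fits a straight line to hypothetical (complexity, bug count) pairs and brackets the prediction for a new target with a fixed spread. The data values, the least-squares fit, and the ±20% bounds are illustrative assumptions only; an organization would substitute its own records and whatever relationship (linear or otherwise) its data supports.

```python
# Minimal sketch: forecasting a bug-count range for a new verification target
# from an organization's historical (complexity, bug count) data.
# The data values and the +/-20% bounds are hypothetical, not from the text.

def fit_line(points):
    """Ordinary least-squares fit of bugs = a * complexity + b."""
    n = len(points)
    sx = sum(c for c, _ in points)
    sy = sum(b for _, b in points)
    sxx = sum(c * c for c, _ in points)
    sxy = sum(c * b for c, b in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def forecast(points, complexity, spread=0.20):
    """Return (lower, expected, upper) bug-count estimates for a new target."""
    a, b = fit_line(points)
    expected = a * complexity + b
    return (1 - spread) * expected, expected, (1 + spread) * expected

# Historical targets: (complexity, total bugs found) -- illustrative numbers only.
history = [(1200, 35), (2500, 70), (4100, 118), (6000, 160)]
low, mid, high = forecast(history, complexity=3300)
print(f"expected bugs: {mid:.0f} (range {low:.0f} - {high:.0f})")
```

Plugging in a new target's complexity estimate then yields the kind of lower and upper bounds shown in Fig. 5.8.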
5.9 Comparing Size and Complexity

If we can determine the size of the functional space, why bother trying to convert that precise value to some approximation of complexity, itself a "thing" that is often not well defined, even in this book? Complexity (whatever it is) still exists, but if measures are instead based on the precise size of the functional space, no information is lost.

Functional closure is based on the size of the functional space. The number of bugs encountered along the way is based partly on the complexity and partly on many other factors. Even though we are using the size of the functional space as an estimate for complexity, it should be remembered that size and complexity are, in fact, two different concepts that provide insights into different verification problems.

5.10 Summary

The measure of complexity of a target, whether based on analysis of the VTG early in a project or on a special form of synthesis late in the project, enables normalization of data with respect to that target. As the complexity of a target grows, so does the number of computations it can perform. Consequently, more cycles of CRV are needed to explore this computational space thoroughly.

In the next chapter we will see how to obtain measures of this thoroughness by examining standard variables.

Chapter 6 – Analyzing Results

Previous chapters have defined standard variables and how their values are related in value transition graphs that provide the basis for determining when functional closure has been achieved. We have also had a brisk overview of practical considerations relevant to managing verification projects to produce the standard results needed for a data-driven risk assessment. This chapter will discuss how to make use of these results, examining standard views and standard measures to understand the level of functional coverage they indicate.

Standard views and measures are the IP analog of mechanical drawings and tolerances. The market for interchangeable parts thrives partly because of these standards, forsaking a proliferation of inches and meters and furlongs and cubits. As standards bodies define the standard variables and their ranges for popular interface standards, IP providers and integrators will have a clear, concise language for communicating product capabilities and customer needs.

6.1 Functional Coverage

Bailey (p. 82) defines functional coverage as "a user-defined metric that reflects the degree to which functional features have been exercised during the verification process." The accuracy of this definition is apparent in the many such metrics that have been defined by users over the years, and it can be quite instructive to survey the current literature for the prevailing understanding of functional coverage.

We have seen that 100% coverage for an instance in its context is achieved when all possible unique trajectories to every function point have been traversed. The industry does not yet have tools to compute those trajectories for us, but formal verification may some day achieve such a feat. It's worth noting that some function points might be reachable along only a single functional trajectory, but it's more likely that a given function point will be reachable along numerous trajectories, requiring multiple visits to such points to achieve 100% functional coverage.

One vendor's tool has a "coverage engine [that] keeps track of the number of times the behavior occurs." This corresponds to the traversal of a particular functional trajectory in a VTG. Any behavior can be described as a trajectory through the VTG containing the function points whose values are assigned (e.g., to stimuli and conditions) or produced (as a response) in the behavior.

Because verification tools are not yet able to produce 100% exhaustive functional coverage, the phrase "thoroughly exercise" is the operative norm. How thoroughly must a target be exercised with CRV prior to tape-out (or shipment)? CRV must be sufficiently thorough to achieve the risk goals declared in the verification plan.

Risk assessment is the topic of the next chapter. This chapter will consider the various methods available to evaluate the thoroughness of coverage, both with numerical measures and with visual views. It is on the basis of these measures and views that risk assessment is conducted.

6.2 Standard Results for Analysis

Producing the standard results yields, in effect, a coverage database suitable for data mining (Raynaud, pp. 4-5). Modern tools already have the ability to transform raw data into certain views and measures. The move to standard results for standard variables allows comparison of differing projects and, in particular, of IP offerings from different vendors. The industry is already rather close in its presentation of views and measures; what remains is the partitioning of the results in a standard, universal manner.

To facilitate creation of actual standards, the measures and views derived from these results must be economical to generate. Formal verification using theorem proving and model checking is not yet economical for commercial designs. However, generating useful measures and views from standard results is already possible with commercially available verification tools.

6.3 Statistically Sampling the Function Space

The functional space as defined by the variables and their ranges may be prohibitively large for complete analysis in a practical timeframe. Rather than examining all values of all variables attained over the course of regression, one may:

• examine all values of a sample of variables to identify coverage holes (unexercised values, value duples, tuples, etc.), or
• define randomly chosen function points (value tuples) in the functional space and observe whether they are visited or not, how often, etc. (sketched below).
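A minimal sketch of the second approach follows. The variable names, their ranges, and the observed results are hypothetical stand-ins for the standard variables and the coverage database a real regression would produce; the point is only to show random selection of function points and a check of whether (and how often) each was visited.

```python
# Minimal sketch: sample randomly chosen function points (value tuples) from
# the declared variable ranges, then count visits in the regression results.
# Variable names, ranges, and observed results are hypothetical.

import random

# Declared ranges of some standard variables (illustrative only).
ranges = {
    "packet_size": range(64, 1519),
    "priority": range(0, 8),
    "error_injected": (False, True),
}

def sample_points(ranges, n, seed=0):
    """Draw n randomly chosen function points from the variable ranges."""
    rng = random.Random(seed)
    names = sorted(ranges)
    return [tuple(rng.choice(list(ranges[v])) for v in names) for _ in range(n)]

def visit_counts(sampled, observed):
    """Count how often each sampled point appears in the observed results."""
    return {p: observed.count(p) for p in sampled}

# 'observed' would come from the regression's coverage database; faked here.
# Tuple order follows sorted variable names: (error_injected, packet_size, priority).
observed = [(False, 512, 3), (True, 64, 0), (False, 512, 3)]
for point, hits in visit_counts(sample_points(ranges, 5), observed).items():
    status = "visited" if hits else "HOLE"
    print(point, status, hits)
```

Because the sample is chosen at random, the fraction of sampled points left unvisited gives a statistical indication of how many holes remain across the full space.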
6.4 Measures of Coverage

A measure is simply a numerical indication of some quantity. There are formal definitions for measure in a mathematical sense that can be applied to all manner of finite and infinite sets. Fortunately, the measures that are useful for data-driven risk assessment are familiar and straightforward.

Fig. 6.1. Analyzing Results with Standard Views and Measures

Chapter 5 discussed how data can be normalized according to the complexity of the target, whether determined by analyzing the VTG for the target early in the project or by analyzing a special form of synthesis later in the project. The first (and very familiar) measure itself incorporates a form of normalization, one based on the lines of code used to express the RTL.

6.5 Code Coverage

The process of simulating RTL requires that lines of code be evaluated in some order to determine logical values as time (measured in clock cycles) progresses. Code coverage tools keep track of whether or not the simulator has ever evaluated a given component of code, such as a line or an expression. This is accomplished by instrumenting the code with special procedures that record whether or not the component of code was simulated. Then, when some quantity of simulation has completed (such as a regression run), the count of components simulated is divided by the total count of those components.

In theory this appears to be an excellent measure of coverage, but in practice it can prove clumsy and awkward to use. Moreover, because it considers each code component individually, it cannot account for interactions among the many components that make up a target's RTL, and herein lies its weakness. Nevertheless, code coverage still has great usefulness as an indicator of the incompleteness of verification: if code coverage is below 100%, then functional coverage is certainly below 100%. However, the inverse is not true.

The clumsiness in using code coverage arises from the fact that most RTL in use for current designs is either designed for intentional re-use or is being re-used (leveraged) from prior designs. Consequently, this RTL intentionally contains code components that might never be used in a given target. This so-called dead code can be very difficult to exclude from code coverage computations, and this difficulty is what makes these tools clumsy to use. Many days of painstaking engineering work can be consumed in analyzing a coverage report to determine which unexercised components are truly valid for a target and which are not valid (i.e., dead code).

But there's an even more important reason why code coverage is not a good indicator of functional coverage: code coverage tools only examine the code given to them.
These tools cannot determine whether the code completely implements the target's required functionality. So, if 100% code coverage is achieved but the code does not implement some requirement, the target is not verified.

One large CAD firm sums up the situation well in a whitepaper, stating that "three inherent limitations of code coverage tools are: overlooking non-implemented features, the inability to measure the interaction between multiple modules and the ability to measure simultaneous events or sequences of events."

Nevertheless, code coverage tools can be very helpful in understanding certain coverage issues. For example, a block designer may want to understand whether his or her block-level tests have taken some state machine through its paces completely. One vendor's tool "paints the states and transition arcs [of state machines] in colors indicating verified, unverified, and partially verified." By "verified" what is actually meant is merely visited as often as the user thought would be sufficient. Understanding which arcs have not yet been "verified" informs the block designer where additional tests are needed.

This example, by the way, is one where code coverage can be very effective: when an RTL designer is writing code from scratch for use as a single instance. In this case, every line of code is part of the target's functionality and every line must be exercised.

Code coverage should be evaluated separately for each clock domain of each target of verification. This helps to relate the coverage achieved to the number of clock cycles needed to achieve it, such as for computing a value for convergence of a test generator. Additionally, code coverage should be evaluated on a code module basis, and not on a code instance basis, if one wishes to understand the coverage of the RTL of that module. On the other hand, if one wishes to understand the coverage of the RTL using that module, code coverage would be evaluated on a code instance basis. If, for example, the RTL using the module has 100% code coverage but the module does not, perhaps code is missing from the RTL using the module. This is one of the key problems in using code coverage: it says nothing about code that is not present.

Standard measures for code coverage have been in place, more or less, for many years. Keating (pp. 174-177) describes six measures based on code coverage and states requirements for each as follows:

• Statement: 100%
• Branch: 100%
• Condition: 100% (note that the term "condition" in the context of code coverage does not refer to the variables of condition in the standard framework)
• Path: secondary, with less than 100% required
• Toggle: secondary, with less than 100% required
• Trigger: secondary, with less than 100% required

Other definitions for code coverage exist as well (Bailey, p. 81).

Code coverage tools do provide a mechanism for identifying code that is to be excluded from coverage computations for whatever reason. The instrumentation added to the RTL by the tool can slow simulation speed dramatically because the procedure calls made for each instrumented component take time to execute. So, one might choose to instrument only certain modules so that coverage on those modules can be evaluated more quickly than by running code coverage on the entire RTL.
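As a simple illustration of the computation described above, the sketch below tallies hypothetical per-module hit counts from an instrumented simulation into coverage percentages and flags any measure that falls short of the 100% statement/branch/condition goals. The report format, module names, and numbers are assumptions, not the output of any particular tool.

```python
# Minimal sketch: turn per-module counts of exercised components into coverage
# percentages and check them against the 100% statement/branch/condition goals.
# The tallies and module names are hypothetical.

# hits[module][measure] = (components exercised, total components)
hits = {
    "tx_fifo":  {"statement": (412, 412), "branch": (96, 100), "condition": (58, 60)},
    "rx_align": {"statement": (377, 380), "branch": (88, 88),  "condition": (41, 41)},
}

GOALS = {"statement": 100.0, "branch": 100.0, "condition": 100.0}

def coverage_report(hits, goals):
    """Yield (module, measure, percent, met_goal) for every tallied measure."""
    for module, measures in sorted(hits.items()):
        for measure, (covered, total) in sorted(measures.items()):
            pct = 100.0 * covered / total if total else 100.0
            yield module, measure, pct, pct >= goals.get(measure, 0.0)

for module, measure, pct, ok in coverage_report(hits, GOALS):
    flag = "" if ok else "  <-- below goal"
    print(f"{module:10s} {measure:10s} {pct:6.2f}%{flag}")
```

In practice the counts would come from the coverage tool's database rather than being entered by hand, and excluded (dead) code would already have been removed from the totals.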
Another situation that merits excluding certain fragments of code is when the code is deemed to be intentionally unreachable (or dead code). For example, RTL constituting multi-instance IP (intellectual property that is intended to generate a variety of different instantiations) might contain code that is dead for certain instantiations. To prevent this known dead code from being included in the code coverage computations, pragmas such as "cov-off" and "cov-on" are used to delineate such code segments.

However, exercised code within a cov-off/cov-on pair constitutes an error of some sort. If the code has been deemed to be unreachable but has, in fact, been reached (subjected to evaluation by the simulator), then either there is a bug in the code or there is an error in the understanding of this supposedly dead code, which is actually quite alive. It's important to track whether any dead code has been reached, because some error exists somewhere.

IP often does contain dead code, and this is quite ordinary and not generally a cause for alarm. For example, RTL that is designed as IP to be instantiated with different internal connectivities, depending on the IP integrator's needs, might contain one or more state machines with arcs that are traversed only for certain internal connectivities. It can be difficult to remove or disable the code corresponding to these arcs, and it will be reported as uncovered statements, branches, expressions, etc.

6.6 State Reachability in State Machines

Another measure of the power of a test generator is its ability to reach all states in the target rapidly. Code coverage tools indicate the reachability of states within the individual state machines of the target, but they do not indicate the reachability of composite states. A given target will typically contain numerous state machines, some of which operate in close cooperation and others that operate more independently of one another. Consider the example of two state machines, A and B, each of which has 4 possible states (see Fig. 6.2).

Fig. 6.2. Composite state pairs

The number of theoretical composite states is 16. However, in this example the actual number is smaller, because some states in each machine are mutually exclusive, i.e. they cannot both exist during the same clock cycle (in Fig. 6.2 these nonexistent pairs of states are indicated by an "X"). The reachability of such state pairs may be determinable using formal verification techniques. The number of cycles needed to reach all state pairs is also an indicator of the power of the test generator. Of course, any reachable pair that is not reached by the test generator indicates a coverage hole in which bugs might remain unexposed.

6.7 Arc Traversability in State Machines

It is sometimes useful to apply the technique for determining valid composite states to determine valid composite arc traversals. After weeding out all mutually exclusive pairs of arcs from two state machines, a two-dimensional view of the pairs of arcs can indicate unexercised functionality where unexposed bugs might still be located.
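A minimal sketch of this composite-state bookkeeping, under assumed state names and exclusions, might look as follows: enumerate the theoretical pairs, remove the mutually exclusive ones, and compare the remainder against the pairs actually observed during simulation.

```python
# Minimal sketch: enumerate composite state pairs of two machines, drop the
# mutually exclusive pairs (the "X" entries of Fig. 6.2), and report which of
# the remaining pairs a simulation reached. State names and exclusions are
# hypothetical.

from itertools import product

states_a = ["IDLE", "REQ", "GRANT", "DONE"]
states_b = ["EMPTY", "FILL", "FULL", "DRAIN"]

# Pairs that cannot coexist in the same clock cycle (illustrative choices).
mutually_exclusive = {("IDLE", "FULL"), ("DONE", "FILL")}

reachable = [p for p in product(states_a, states_b) if p not in mutually_exclusive]

def pair_coverage(observed):
    """Fraction of reachable composite state pairs seen during simulation."""
    seen = set(observed) & set(reachable)
    holes = sorted(set(reachable) - seen)
    return len(seen) / len(reachable), holes

# 'observed' would be sampled per clock cycle from the two machines; faked here.
observed = [("IDLE", "EMPTY"), ("REQ", "FILL"), ("GRANT", "FULL"), ("REQ", "FILL")]
cov, holes = pair_coverage(observed)
print(f"composite-pair coverage: {cov:.1%}; first holes: {holes[:3]}")
```

The same structure extends directly to pairs of arcs, giving the two-dimensional view of composite arc traversals described in Sect. 6.7.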
6.8 Fault Coverage

If fault coverage is evaluated for the regression suite, the results can be indicative of how thoroughly the regression suite has exercised the target. Using the fault coverage of the manufacturing test suite is not directly applicable, because that suite will make use of the testability morph to achieve sufficiently high fault coverage.

A common rule of thumb states that the functional regression suite should achieve about 80% fault coverage (using a single stuck-at model). A low value of fault coverage from the functional regression suite is an indicator of a weak regression suite. Examining the fault coverage for each module in the target can also indicate which modules are not well exercised by the verification software.

Of course, fault coverage is only an indirect measure of functional coverage, and the actual relation between fault coverage and functional coverage is one of correlation. Historical organizational data can be mined to determine this correlation.

6.9 VTG Coverage

We have determined that the entire functional space of an instantiated target can be characterized by the points and arcs represented in VTGs. A coverage measure corresponding to the fraction of arcs traversed is a true indicator of the degree of exercise of a target. After all functional arcs have been traversed at least once, there is no more functionality to exercise. VTG arc coverage is the measure for functional closure. There is no stronger measure of coverage than VTG arc coverage.

High VTG arc coverage will be found to correspond to high fault coverage of a target without morphing the target for testability. This level of fault coverage is achieved by exercising the functional space of morph 0 without resorting to using morph 1. However, achieving this level of fault coverage using morph 0 will require significantly more test time, and this becomes expensive when producing parts in large volume. Consequently, manufacturing tests are applied using morph 1, which is specifically designed to achieve the required level of fault coverage in minimal time.

6.10 Strong Measures and Weak Measures

The various measures of coverage available differ in the degree to which they indicate how thoroughly the target has been exercised. Code coverage measures become increasingly stronger as the granularity of measurement is made finer: line coverage is a rather gross approximation of actual coverage as compared to expression coverage, which scrutinizes the target's RTL more closely.

At the other extreme is VTG arc coverage, which, as we learned in Chapter 3, is the basis for determining functional closure. Other measures of