4.16 Tracking Progress (§ 8)

The late great writer Douglas Adams was quoted as saying, as he labored over his characteristically late manuscripts, "I love deadlines. I love the whooshing sound they make as they go by." Project managers love deadlines nearly as much as authors do, and are routinely required to provide updated estimates for when various milestones will be met. The most critical milestone is the one at which the risk of a functional bug is sufficiently low that final artwork can be prepared for tape-out.

How close is our project to completion? That question will be addressed in detail in the last two chapters, but there are several metrics that indicate progress towards completion. To track progress in reducing the risk of a bug in a verification target, the following metrics can provide key insights:

• Code coverage: 100% statement coverage (see chapter 6) indicates that about one third of the overall work to expose bugs is completed. Fig. 4.14 illustrates how code coverage might increase as a function of time for a typical project.

• Bug discovery rate: If you stop looking, you will stop finding. A non-convergent regression suite, or one that has reached the asymptote of its convergence (see next chapter), will eventually stop finding bugs because it no longer exercises previously unexercised functionality. See Fig. 4.15 for an example of how the bug discovery rate might appear for a typical project.

• Bug counts and severity: These are useful inputs to risk analysis. Each organization counts bugs and assigns severity somewhat differently, so comparison of such data from different organizations is often not reliable.

• Bug locality: Bugs are not typically distributed uniformly within the target but are often clustered by code module or by functionality (as characterized by the variables on which faulty behavior is dependent). Tracking the locality of bugs can indicate where to focus continued verification activity. Similarly, associating bugs with the specific sections of the specifications for the target can also indicate areas for increased focus.

Fig. 4.14. Code coverage rate

Typically, bugs will be discovered more rapidly on "fresh" RTL that has not yet been exercised. So, as new functionality in the test environment is turned on, a brief burst in bug discovery may follow because code (whether lines, expressions, etc.) is exercised in ways it had not been previously. As the RTL matures and bugs become fewer and fewer, this will become apparent as the discovery rate tapers off. However, there may be other reasons for such tapering off, as we will see in the next section.

Comparing bug discovery rates or bug counts across projects can be quite problematic for a number of reasons, and such comparisons should be regarded skeptically.

First and foremost, different organizations will count bugs differently. For a given observation of faulty behavior that requires changes in multiple modules to eliminate, one group might regard this as a single bug whereas another group might regard it as multiple bugs (one in each affected module). Additionally, if the changes related to the bug are later determined to be incomplete (the faulty behavior is not eliminated completely), this may result in reporting yet another bug.
Furthermore, bugs are usually graded by severity, that is, by how badly they affect the behavior of the target. There are no uniform definitions for severity, and different organizations (indeed, different engineers in the same organization) will grade bugs differently. IP customers, for example, often grade bugs much more severely than IP providers.

Fig. 4.15. Bug discovery rate

Another factor that makes cross-project comparison of bug curves problematic is when the counting begins. Some projects begin counting bugs as soon as simulation of the target begins, regardless of the completeness of the RTL of the target. Other projects may delay counting bugs until the RTL has achieved some level of behavioral stability, possibly well after the RTL is functionally complete.

Yet another factor that can affect bug counts dramatically is the stability of the specifications. If requirements on the target change mid-stream, the resulting changes to the RTL to implement the new requirements are very likely to introduce more bugs into the target, increasing the bug count for the project.

One comparison that does have some validity, however, is the number of bugs still present in the target after tape-out. If a project's goal is to produce production-worthy silicon at first tape-out, then the number of bugs that require a re-spin of the silicon matters.

It is also worth noting that achieving maximum code coverage does not mean that complete functional coverage has been achieved. A useful rule of thumb for gauging progress is that when maximum code coverage has been reached, about one-third of the engineering effort to find the functional bugs has been spent. The first bugs are typically easy to find, and the last bugs are increasingly difficult (requiring more engineering effort and time) to find.

4.17 Related Documents (§ 9)

This section of the plan lists the documents pertinent to the verification target, including in particular the version of each document. This list of documents includes:

• design requirements (the natural-language specifications)
• requirements on properties recorded as special cases, such as performance
• external standards documents, such as those for PCI-Express, USB, or AMBA
• established internal checklists, such as those for code inspections or design reviews

4.18 Scope, Schedule and Resources (§ 10)

There are many well-established practices for managing projects successfully, and a complete discussion of these practices is beyond the scope of this book. However, a few key points are worth noting in the context of functional verification. The three principal factors available for trade-off in the execution of a project are the project's scope, its schedule, and the resources available. The section of the verification plan called Schedule should address all three of these factors.

The scope of the project determines what is to be produced. It is worthwhile to list not only what is included in the project but also what is not included. For example, if commercial IP is integrated into the target and this IP is believed to be already verified, the verification plan should state explicitly that verification of the IP is not an objective of the project. On the other hand, if there is reason to suspect that the IP might contain bugs, then the plan should state explicitly that verification of the IP is an objective of the project.

The schedule states who does what by when.
Tasks are defined such that they can be assigned to a single individual, and estimates are made for completing each task. The tasks are linked according to their dependencies into a single schedule (as a Gantt chart), and the current version of the schedule becomes part of the plan, perhaps by reference.

The resources needed for the project will include not only the verification engineers, but also the simulation engines needed for CRV, the tools that must be licensed or purchased (including verification IP), and so forth. If any of these should change, a revision to the plan is in order.

4.19 Summary

We have now related the loftier concepts of functional spaces to the more earthly concerns of producing a profitable product. By executing verification projects within the standard framework, one not only enjoys the advantage of producing results that can be used for data-driven risk assessment; one also gains great leverage across all other verification projects.

Managers responsible for multiple IC designs will benefit by having a pool of verification talent that can move readily from one project to another without having to learn yet another verification scheme. Terminology is shared among all projects. Projects can be compared because they are executing to similar milestones, such as IC done and FC done. Planning is greatly simplified by having a common template for all verification plans.

Before proceeding to analyzing the results that verification produces, we still need to consider how to estimate the resources needed to execute our projects. Fortunately, there are a few techniques based on objective data that can assist us. That will be the topic of our next chapter.

Chapter 5 – Normalizing Data

In the previous chapter we considered a fairly wide variety of practical matters related to functional verification, but one question was not addressed. What will it take to verify our target in our current project?

Targets of verification continue to grow and become more complex, requiring more resources to expose those bugs that experience and intuition tell us are surely there. But how much "bigger" is one target than another? Measures based on lines of uncommented source code (for the RTL), transistor count, or die area are readily available, but these measures do not correlate strongly with the number of bugs we will expose or the number of simulation cycles we will consume over the months in our efforts to expose them.
Die area and transistor counts do not correlate well because large uniform arrays (such as SRAM cells for caches and memories) represent a significant fraction of these measures. Lines of code seem like a better candidate, but experience with code coverage tools (see chapter 6) suggests that this measure will not correlate well either. In fact, short code segments with tightly packed, convoluted logical expressions cleverly implementing some functionality may be much buggier than long segments of simpler code implementing the same functionality. A better measure of complexity is certainly needed.

This chapter will explore three separate indicators of complexity that can be used to normalize data in a way that supports forecasting project resources. Once we have a useful measure of complexity, perhaps we can then determine how many cycles of CRV will be needed to achieve sufficiently high functional coverage so that the risk of unexposed functional bugs is acceptably low.

5.1 Estimating Project Resources

Determining the resource requirements (simulation resources in particular) to execute a verification project successfully is often based largely on the judgment of the project manager and the verification engineers. However, using data accumulated from past projects can enable better forecasts of project resource needs, as well as help the manager responsible for multiple verification projects to allocate resources across them. We will discuss three techniques:

1. Examine the power of the test generator to exercise the target
2. Estimate the complexity of the target using synthesis results
3. Estimate the complexity of the target using its VTG

5.2 Power and Convergence

One frequent and vital activity that requires good resource allocation is regression. The verification manager needs to be able to determine how much time will be required for a complete regression of the target. This will determine how frequently new changes can be checked into the verification team's copy of the target RTL.

If the available compute resources are able to execute a complete regression of the target in one day's time, then changes can be verified on a daily basis. If, on the other hand, it requires a week to regress the target fully, then changes cannot be made as frequently. For a large design, say of a microprocessor, complete regression can require a week or more, depending on the number of simulation platforms allocated for this vital task.

The ability of the tests to explore the functional space rapidly by continuously producing novel activity is defined as their power. Measuring this power directly would require extensive statistical analysis of the contents of the tests, and such direct methods are not readily available. However, it is possible and practical to measure the effects of the tests using readily available tools, such as those that measure code coverage.

By examining how quickly (in terms of simulated cycles) a given measure of coverage converges on its maximum value, we can gain insight into the power of the tests produced by the test generator. The higher the convergence of the tests with respect to the target, the more efficiently one achieves a thorough exercise of the target. Collecting the coverage data on which convergence is determined should be done on a code module basis, not on a code instance basis.
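To make the module-versus-instance distinction concrete, the short sketch below merges statement hits from several instances onto their source modules before computing a single coverage figure. This is a minimal illustration only: the record format, module names, and statement totals are invented for the example, and a real flow would use the merge utilities provided by the coverage tool itself.

from collections import defaultdict

def merge_by_module(instance_hits):
    """Collapse per-instance statement hits onto their source modules."""
    module_hits = defaultdict(set)
    for (module, _instance), statements in instance_hits.items():
        module_hits[module] |= statements   # union of hits across all instances
    return module_hits

def coverage_percent(module_hits, module_totals):
    """Overall statement coverage, computed per module rather than per instance."""
    hit = sum(len(stmts) for stmts in module_hits.values())
    total = sum(module_totals.values())
    return 100.0 * hit / total

# Hypothetical checkpoint: statements exercised so far, keyed by (module, instance)
instance_hits = {
    ("fifo", "u_fifo0"): {1, 2, 3},
    ("fifo", "u_fifo1"): {2, 3, 4},   # another instance of the same module
    ("ctrl", "u_ctrl"):  {1, 2},
}
module_totals = {"fifo": 6, "ctrl": 4}   # total statements per module

print(coverage_percent(merge_by_module(instance_hits), module_totals))   # prints 60.0

Recording such a figure at regular checkpoints (say, every N simulated cycles) yields the coverage-versus-cycles data from which convergence is judged.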
The precise shape of the convergence curve depends, of course, on the test generator and the target against which the tests are being applied. Furthermore, this is only a generalized model for convergence, and different combinations of test generator and target may yield rather different curves. Nevertheless, the general concept of converging asymptotically on maximum coverage still applies. The mathematics underlying the computations may need revision for curves that differ greatly from the model shown.

Fig. 5.1. Convergence of tests

The convergence of a test generator is characterized by two values:

• convergence gap: α = 100% − level of the asymptote
• beta cycles: Nβ, defined by coverage(Nβ) = (1 − α) ⋅ 2e⁻¹, an indicator of how many cycles are needed to reach about 74% of the level of the asymptote, analogous to the "rise time" of an exponentially rising signal.

A test generator that rapidly reaches its highest level of code coverage (approaches its asymptote) is said to have good convergence (see Fig. 5.1). A test generator that reaches its highest level slowly is said to converge poorly. The convergence gap is the difference between 100% and the value of the asymptote for the generator. The narrower this gap, of course, the greater the power of the generator. There may always be certain lines of code that are only exercised via deterministic tests, but such code might not benefit from the exercise provided by the pseudo-random verification software. Moreover, code coverage, especially for multi-instance IP, can be difficult and clumsy to use. The use of code coverage as a tool will be discussed in some detail in chapter 6. Driving the convergence gap towards zero increases the efficiency with which the test generator exercises the target thoroughly.

5.3 Factors to Consider in Using Convergence

Fig. 5.2 shows how, when graphing convergence using the typical test order (activation tests, ordinary CRV tests, error tests), a stratification appears. This stratification into three (or possibly more) levels can make computation of Nβ tricky. Stratification results from the following sets of tests, each of which exercises a specific subset of the logic:

• activation
• morph for testability
• morph for verification
• error-imposition tests
• deterministic tests
• the bulk of CRV

One way to deal with this is to choose tests at random from all types, obtaining a non-stratified, smoother curve of convergence. On the other hand, the stratification may be so slight as to have a negligible effect on Nβ.
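Given coverage checkpoints recorded at intervals during a regression (with the test order shuffled across test types, as suggested above, so that the curve is reasonably smooth), the two convergence values can be estimated directly. The sketch below is only an illustration under assumed data: the checkpoint values and the simple plateau heuristic are invented for the example, and a proper curve fit would be preferable when the data departs from the idealized shape.

import math

def estimate_convergence(checkpoints):
    """checkpoints: list of (simulated_cycles, coverage_percent), in ascending cycle order."""
    asymptote = max(cov for _, cov in checkpoints)   # crude estimate of the coverage plateau
    alpha = 100.0 - asymptote                        # convergence gap, in percent
    threshold = asymptote * 2.0 / math.e             # about 74% of the asymptote
    n_beta = next((cyc for cyc, cov in checkpoints if cov >= threshold), None)
    return alpha, n_beta

# Hypothetical data: (cycles, statement coverage %) sampled during a regression
data = [(1e5, 40.0), (2e5, 61.0), (4e5, 72.0), (8e5, 80.0),
        (1.6e6, 84.0), (3.2e6, 86.0), (6.4e6, 86.5)]

alpha, n_beta = estimate_convergence(data)
print(f"convergence gap = {alpha:.1f}%, N_beta ~ {n_beta:.0f} cycles")
# convergence gap = 13.5%, N_beta ~ 400000 cycles

Tracking these two values across successive versions of a test generator, or across targets of comparable complexity, gives a normalized basis for comparing how efficiently different environments exercise their targets.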