16 THE FRACTAL STRUCTURE OF DATA REFERENCE

But assuming that the placement of interval boundaries falls at random, the average number of interval boundaries crossed by front ends and by back ends must be in proportion to their durations. Therefore, for each touched interval containing a back end I/O, there must be, on average, ∆τ/τ touched intervals that do not. We may therefore conclude that the probability that a touched interval contains a back end I/O, and the probability that a touched interval contains a miss, are both equal to

τ / (τ + ∆τ) = 1 − θ,

where the expression on the right follows from (1.11).

To apply the conclusion just stated as a method of trace analysis, we count the total number of I/O's n_req and touches n_tch. We may then estimate

m ≈ (1 − θ) n_tch / n_req.    (1.17)

Note that if θ is in the general ballpark of the guestimate (1.6), then the estimate (1.17) is not highly sensitive to the exact value of θ being assumed. For example, suppose that, in some interval, we count a total of 100,000 references to 20,000 distinct tracks. Then the guestimate (1.6) would yield an estimated hit ratio of 1 − .2 × (1 − .25) = .85, or 85 percent. By comparison, the alternative assumptions θ = .2 and θ = .3 would yield estimated hit ratios of 84 percent and 86 percent, respectively.

4.3 REQUIREMENTS FOR MEMORY

We have just devoted several pages to a detailed discussion of the key, closely related results (1.11) and (1.12), which describe the time spent by a track during a visit to memory. We now derive what is by far the most important consequence of this visit time: the resulting cache memory requirements. We relate these requirements both to the cache miss ratio and to the level of service being provided to applications (as reflected in the average or single-reference residency time).

Our starting point for calculating the requirements for cache memory is Little's law, as applied previously in the result (1.14).
The same result can be stated equivalently as

s = r m T.    (1.18)

By (1.4) and (1.12), we also have

m = ((1 − θ) T / τ₀)^(−θ),    (1.19)

where r is the rate of I/O requests and τ₀ is the time constant of the power law (1.4). Substituting this expression for m into (1.18),

s = r τ₀^θ (1 − θ)^(−θ) T^(1−θ).    (1.20)

In a nutshell, (1.20) says that the requirement for cache memory can be estimated by first establishing an objective for average cache residency time. Reasonable objectives for the average residency time, in turn, can be developed by considering the requirements of individual applications. The next subsection applies and illustrates these ideas.

It is useful to note that the equations (1.4), (1.12), and (1.20) form a “chain” that ties together, in succession, the variables m, τ, T, and s. To summarize the entire “chain” in one place, we have:

τ = τ₀ m^(−1/θ),
T = τ / (1 − θ),
s = r τ₀^θ (1 − θ)^(−θ) T^(1−θ),    (1.21)

where, as before, r is the rate of I/O requests and τ₀ is the time constant appearing in (1.4).

All of the relationships just presented are sufficiently simple that it is possible to “skip over” any desired part of the chain through substitution. Also, all of these relationships are easily reversed. For example, if we wish to express the cache size requirements in terms of the single-reference residency time (rather than the average residency time as just studied above), we may reverse the last two equations, then use substitution:

s = (r τ₀^θ / (1 − θ)) τ^(1−θ).    (1.22)

As another example, we can use successive substitutions to express the miss ratio as a function of cache size. Just as the miss ratio, as a function of the single-reference residency time, takes the form of a simple power law, so does the miss ratio, as a function of cache memory:

m = ((1 − θ) s / (r τ₀))^(−θ/(1−θ)).    (1.23)

The existence of a power law relationship between these quantities was first noted by Chow [15], and demonstrated convincingly by Smith [16].

When combined with the guestimate (1.6), and applied in a context where there is an existing workload with some known I/O rate, (1.23) yields a useful rule of thumb:

m ≈ k s^(−1/3),    (1.24)

for some (not particularly important) constant k.
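The “chain” just summarized can be sketched in a few lines of Python. In the sketch below, m is the miss ratio, τ (tau) the single-reference residency time, T the average residency time, and s the cache size in tracks; the numeric values chosen for θ, for the power-law time constant τ₀ (tau0), and for the I/O rate r are illustrative assumptions only, not measurements from the text.

```python
# Illustrative sketch of the chain tying together m, tau, T, and s.
# All three parameter values below are assumptions chosen for illustration.

theta = 0.25   # hierarchical reuse parameter (the guestimate)
tau0 = 2.0     # time constant of the miss-ratio power law, seconds (assumed)
r = 500.0      # I/O rate, requests per second (assumed)

def chain(m):
    """Follow the chain from miss ratio m to cache size s (in tracks)."""
    tau = tau0 * m ** (-1.0 / theta)   # single-reference residency time
    T = tau / (1.0 - theta)            # average residency time
    s = r * m * T                      # Little's law
    return tau, T, s

# Power-law consequence: m varies as s raised to the power -theta/(1 - theta).
# With theta = .25 the exponent is -1/3, so halving the miss ratio
# requires eight times the cache.
_, _, s1 = chain(0.10)
_, _, s2 = chain(0.05)
assert abs(s2 / s1 - 8.0) < 1e-6
```

Reversing the sketch, so as to read off the miss ratio implied by a given cache size, amounts to the same substitution used to obtain the miss ratio as a power law in cache memory.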
For example, according to the rule of thumb just stated, it is necessary to increase the cache size by eight times in order to reduce misses by a factor of two (since 8^(1/3) = 2).

4.4 AN EMPIRICAL VIEW OF RESIDENCY TIME

As mentioned briefly in introducing the hierarchical reuse model, objectives for the average residency time T can be developed by considering the requirements of individual applications. In view of (1.20), such objectives provide a valuable starting point for configuration planning. Let us, therefore, consider the impact of the average residency time on the performance experienced by a wide range of cached data.

Figures 1.4 through 1.10 present the performance of a “plain vanilla” cache, relative to average residency time, for a range of application storage pools. These include both the VM storage pools already introduced, as well as the storage pools traced during a survey of twelve moderate to large OS/390 installations [17]. The figures include the following OS/390 data pools:

DB2: On-line DataBase 2 (DB2) database storage.

Figure 1.4. VM user storage pools: cache performance as a function of average cache residency time.

Figure 1.5. VM system storage pools: cache performance as a function of average cache residency time.

Figure 1.6. DB2 storage pools: cache performance as a function of average cache residency time.

Figure 1.7. CICS storage pools: cache performance as a function of average cache residency time.

Figure 1.8. IMS storage pools: cache performance as a function of average cache residency time.
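Before turning to the individual figures, it is worth noting that the trace-analysis estimate (1.17) developed earlier is simple enough to sketch directly. The short Python function below applies it to the worked example from the text (100,000 references to 20,000 distinct tracks); the function name and interface are illustrative choices, not from the text.

```python
def estimated_hit_ratio(n_req, n_tch, theta=0.25):
    """Hit ratio implied by the estimate (1.17):
    miss ratio ~= (1 - theta) * n_tch / n_req.

    n_req -- total number of I/O's counted in the interval
    n_tch -- number of touches counted in the interval
    theta -- hierarchical reuse parameter (default: the guestimate (1.6))
    """
    return 1.0 - (1.0 - theta) * n_tch / n_req

# Worked example from the text: 100,000 references to 20,000 distinct tracks.
assert abs(estimated_hit_ratio(100_000, 20_000) - 0.85) < 1e-9   # 85 percent
assert abs(estimated_hit_ratio(100_000, 20_000, theta=0.2) - 0.84) < 1e-9
assert abs(estimated_hit_ratio(100_000, 20_000, theta=0.3) - 0.86) < 1e-9
```

As the assertions illustrate, moving θ between .2 and .3 shifts the estimated hit ratio by only about one percentage point, in line with the insensitivity observed earlier.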