Embedded Memories for Nano-Scale VLSIs

Series on Integrated Circuits and Systems
Series Editor: Anantha Chandrakasan, Massachusetts Institute of Technology, Cambridge, Massachusetts

Embedded Memories for Nano-Scale VLSIs
Kevin Zhang (Ed.)
ISBN 978-0-387-88496-7

Carbon Nanotube Electronics
Ali Javey and Jing Kong (Eds.)
ISBN 978-0-387-36833-7

Wafer Level 3-D ICs Process Technology
Chuan Seng Tan, Ronald J. Gutmann, and L. Rafael Reif (Eds.)
ISBN 978-0-387-76532-7

Adaptive Techniques for Dynamic Processor Optimization: Theory and Practice
Alice Wang and Samuel Naffziger (Eds.)
ISBN 978-0-387-76471-9

mm-Wave Silicon Technology: 60 GHz and Beyond
Ali M. Niknejad and Hossein Hashemi (Eds.)
ISBN 978-0-387-76558-7

Ultra Wideband: Circuits, Transceivers, and Systems
Ranjit Gharpurey and Peter Kinget (Eds.)
ISBN 978-0-387-37238-9

Creating Assertion-Based IP
Harry D. Foster and Adam C. Krolnik
ISBN 978-0-387-36641-8

Design for Manufacturability and Statistical Design: A Constructive Approach
Michael Orshansky, Sani R. Nassif, and Duane Boning
ISBN 978-0-387-30928-6

Low Power Methodology Manual: For System-on-Chip Design
Michael Keating, David Flynn, Rob Aitken, Alan Gibbons, and Kaijian Shi
ISBN 978-0-387-71818-7

Modern Circuit Placement: Best Practices and Results
Gi-Joon Nam and Jason Cong
ISBN 978-0-387-36837-5

CMOS Biotechnology
Hakho Lee, Donhee Ham and Robert M. Westervelt
ISBN 978-0-387-36836-8

Continued after index

Kevin Zhang, Editor
Embedded Memories for Nano-Scale VLSIs

Editor:
Kevin Zhang
Intel Corporation
2501 NW 229th Ave
Hillsboro, OR 97124
USA
kevin.zhang@intel.com

ISBN 978-0-387-88496-7
e-ISBN 978-0-387-88497-4
DOI 10.1007/978-0-387-88497-4
Library of Congress Control Number: 2008936472

© Springer Science+Business Media, LLC 2009. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper.
springer.com

Contents

Introduction
    Kevin Zhang
Embedded Memory Architecture for Low-Power Application Processor
    Hoi Jun Yoo and Donghyun Kim
Embedded SRAM Design in Nanometer-Scale Technologies
    Hiroyuki Yamauchi
Ultra Low Voltage SRAM Design
    Naveen Verma and Anantha P. Chandrakasan
Embedded DRAM in Nano-scale Technologies
    John Barth
Embedded Flash Memory
    Hideto Hidaka
Embedded Magnetic RAM
    Hideto Hidaka
FeRAM
    Shoichiro Kawashima and Jeffrey S. Cross
Statistical Blockade: Estimating Rare Event Statistics for Memories
    Amith Singhee and Rob A. Rutenbar
Index

Contributors

John Barth, IBM, Essex Junction, Vermont, jbarth@us.ibm.com
Anantha P. Chandrakasan, Massachusetts Institute of Technology, Cambridge, MA, USA
Jeffrey S. Cross, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8550, Japan, cross.j.aa@m.titech.ac.jp
Hideto Hidaka, MCU Technology Division, Renesas Technology Corporation, 4-1 Mizuhara, Itami, 664-0005, Japan, hidaka.hideto@renesas.com
Shoichiro Kawashima, Fujitsu Microelectronics Limited, System Micro Division, 1-1 Kamikodanaka 4-chome, Nakahara-ku, Kawasaki, 211-8588, Japan, kawashima@jp.fujitsu.com
Donghyun Kim, KAIST
Rob A. Rutenbar, Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA, rutenbar@ece.cmu.edu
Amith Singhee, IBM, Thomas J. Watson Research Center, Yorktown Heights, NY, USA
Naveen Verma, Massachusetts Institute of Technology, Cambridge, MA, USA, nverma@mit.edu
Hiroyuki Yamauchi, Fukuoka Institute of Technology, Fukuoka, Japan
Hoi Jun Yoo, KAIST
Kevin Zhang, Intel Corporation, Hillsboro, OR, USA

Chapter 1
Introduction
Kevin Zhang

Advancement of semiconductor technology has driven the rapid growth of very large scale integrated (VLSI) systems for increasingly broad applications, including high-end and mobile computing, consumer electronics such as 3D gaming, multi-function or smart phones, various set-top players, and ubiquitous sensor and medical devices. To meet the increasing demand for higher performance and lower power consumption in many different system applications, a large amount of on-die or embedded memory is often required to support the need for data bandwidth in a system. The varieties of embedded memory in a given system have also become increasingly more complex, ranging from static to dynamic and volatile to nonvolatile.

Among embedded memories, six-transistor (6T)-based static random access memory (SRAM) continues to play a pivotal role in nearly all VLSI systems due to its superior speed and full compatibility with logic process technology. But as technology scaling continues, SRAM design faces a severe challenge in maintaining sufficient cell stability margin under relentless area scaling. Meanwhile, rapid expansion in mobile applications, including new emerging applications in sensor and medical devices, requires far more aggressive voltage scaling to meet very stringent power constraints. Many innovative circuit topologies and techniques have been extensively explored in recent years to address these challenges.

Dynamic random access memory (DRAM) has long been an important semiconductor memory for its well-balanced performance and density. With increasing demand for on-die dense memory, one-transistor and one-capacitor (1T1C)-based DRAM has found a variety of embedded applications in providing the memory bandwidth for system-on-chip (SOC) applications. With an increasing amount of on-die cache memory for high-end computing and graphics applications, embedded DRAM (eDRAM) is becoming a viable alternative to SRAM for large on-die memory. To meet product requirements for eDRAM while addressing continuous technology scaling, many new memory circuit design technologies, which are often drastically different from commodity DRAM design, have to be developed to substantially improve eDRAM performance while keeping the overall power consumption at a minimum.

Solid-state nonvolatile memory (NVM) has played an increasingly important role in both computing and consumer electronics. Many new applications in the most recent consumer electronics and automobiles have further broadened the embedded applications for NVM.
Among various NVM technologies, floating-gate-based NOR flash has been the early technology choice for embedded logic applications. With technology scaling challenges in the floating-gate technologies, including the increasing need to integrate NVM alongside more advanced logic transistors, varieties of NVM technologies have been extensively explored, including alternative technologies based on the charge-trapping mechanism (Fig. 1.1). More efficient circuit design techniques for embedded flash also have to be explored to achieve optimal product goals.

With the increasing demand for NVM and the further scaling of semiconductor technology, several emerging memory technologies have drawn increasingly more attention, including magnetic RAM (MRAM), phase-change RAM (PRAM), and ferroelectric RAM (FeRAM). These new technologies not only address some of the fundamental scaling limits of traditional solid-state memories, but also bring new electrical characteristics to nonvolatile memories on top of the random-access capability. For example, MRAM can offer significant speed improvement over traditional floating-gate memory, which could open up whole new applications. FeRAM can operate at lower voltage and consume ultra-low power, which has already made it into the "smart-card" marketplace today. These new memory technologies also require a new set of circuit topologies and sensing techniques to maximize the technology benefits, in comparison to traditional NVM design.

With rapid downward scaling of the memory device feature size and drastic upward scaling of the number of storage elements per unit area, process-induced variation in memory has become increasingly important for both memory technology and circuit design. Statistical design methodology has now become essential in developing reliable memory for high-volume manufacturing.

Fig. 1.1 Transistor variation trend with technology scaling [1]: NMOS σVT (mV, one device) versus technology node (130 nm, 90 nm, 65 nm, 45 nm), for minimum and nominal devices

The required statistical modeling and optimization capability has grown far beyond the memory cell to comprehend many sensitive peripheral circuits in the entire memory block, such as critical signal development paths. Advanced statistical design techniques are clearly required in today's memory design.

In the traditional memory field, there is often a clear technical boundary between different kinds of memory technology, e.g., SRAM and DRAM, volatile and nonvolatile. With growing demand for on-die memory to meet the needs of future VLSI system design, it is very important to take a broader view of the overall memory options in order to make the best design tradeoffs in achieving optimal system-level power and performance. Figure 1.2 illustrates the potential tradeoffs among these different memories.

Fig. 1.2 Relative performance among different types of embedded memories

With this in mind, this book intends to provide a state-of-the-art view of the most recent advancements in memory technologies across different technical disciplines. By combining these different memories in one place, it should help readers gain a much broadened view of embedded memory technology for future applications. Each chapter of the book is written by leading experts from both industry and academia to cover a wide spectrum of key memory technologies along with the most significant technical topics in each area, ranging from key technical challenges to technology and design solutions. The book is organized as follows:

1.1 Chapter 2: Embedded Memory Architecture for Low-Power Application Processor, by Hoi Jun Yoo
In this chapter, an overview of embedded memory architecture for a variety of mobile applications is provided. Several real product examples from advanced application processors are analyzed, with a focus on how to optimize the memory architecture to achieve low-power and high-performance goals. The chapter intends to provide readers an architectural view of the role of embedded memory in mobile applications.

1.2 Chapter 3: Embedded SRAM Design in Nanometer-Scale Technologies, by Hiroyuki Yamauchi

This chapter discusses the key design challenges facing today's SRAM design in nanoscale CMOS technologies. It provides broad coverage of the latest technology and design solutions to address SRAM scaling challenges in meeting power, density, and performance goals for product applications. The tradeoffs of each technology and design solution are thoroughly discussed.

1.3 Chapter 4: Ultra Low Voltage SRAM Design, by Naveen Verma and Anantha P. Chandrakasan

In this chapter, an emerging family of SRAM designs is introduced for ultra-low-voltage operation in highly energy-constrained applications such as sensor and medical devices. Many state-of-the-art circuit technologies are discussed for achieving very aggressive voltage-scaling targets. Several advanced design implementations for reliable sub-threshold operation are provided.

1.4 Chapter 5: Embedded DRAM in Nano-Scale Technologies, by John Barth

This chapter describes state-of-the-art eDRAM design technologies for a variety of applications, including both consumer electronics and high-performance computing in microprocessors. Array architecture and circuit techniques are explored to achieve a balanced and robust design based on high-performance logic process technologies.

1.5 Chapter 6: Embedded Flash Memory, by Hideto Hidaka

This chapter provides a very comprehensive view of the state of embedded flash memory technology in today's industry, including process technology, product applications, and future trends. Several key technology options and their tradeoffs are discussed. Product design examples for micro-controller units (MCUs) are analyzed down to the circuit implementation level.

[...]

Statistical Blockade: Estimating Rare Event Statistics for Memories (Amith Singhee and Rob A. Rutenbar)

... compute GPD models with t = the 99th percentile write time. This gives us 50 different estimates of the m point. These estimates are shown in Fig. 9.19. As expected, the spread of the estimates increases as we extrapolate further with the GPD model.

Fig. 9.19 The spread of m-point estimates across 50 runs of statistical blockade

We then compute 95% confidence intervals of the m-point estimates using these 50 models. Say we have 50 estimates y_i(m), i = 1, ..., 50, for the m point. From these we can empirically estimate the 97.5-percentile and 2.5-percentile points, y_97.5%(m) and y_2.5%(m), respectively. A 95% confidence interval width κ_95%(m) can then be computed as

$$\kappa_{95\%}(m) = y_{97.5\%}(m) - y_{2.5\%}(m).$$

We can express this interval as a percentage of the mean of the estimates:

$$\bar{\kappa}_{95\%}(m) = \frac{\kappa_{95\%}(m)}{\frac{1}{50}\sum_{i=1}^{50} y_i(m)}.$$

We also compute similar 95% confidence intervals using the second method, using 10,000 pairs of GPD parameter values sampled from the normal distribution with mean $(\hat{\xi}, \hat{\beta})$ and covariance $\hat{\Sigma}_{\xi,\beta}$. In this case we express the confidence interval as a percentage of the estimate y(m) computed using the single GPD model $G_{\hat{\xi},\hat{\beta}}$.

Fig. 9.20 95% confidence intervals as a percentage of the mean value (empirical method) or the single estimate (asymptote method)

Figure 9.20 shows these percentage confidence intervals. Although there is some mismatch in the magnitudes of the two estimates of the 95% confidence interval, we see a common trend: the statistical confidence decreases as we move out in the tail. To keep the error within 5% with a confidence of 95%, we should not predict farther out than 4.28σ; for 10% error, we can go out to 4.95σ. Of course, these numbers will change from circuit to circuit and from performance metric to performance metric. The general inference is that we should not rely on the GPD tail model too far out from our data.
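The empirical confidence-interval computation above is simple to reproduce. Below is a minimal sketch in Python with NumPy; the 50 m-point estimates are synthetic stand-ins, since the chapter's actual data are not available here:

```python
# Minimal sketch of the empirical confidence-interval computation above.
# y_m stands in for 50 m-point estimates, one per statistical blockade run;
# the values here are synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
y_m = rng.normal(loc=4.5, scale=0.05, size=50)   # stand-in for 50 GPD-based estimates

y_hi = np.percentile(y_m, 97.5)                  # y_97.5%(m)
y_lo = np.percentile(y_m, 2.5)                   # y_2.5%(m)
kappa = y_hi - y_lo                              # 95% confidence interval width
kappa_pct = 100.0 * kappa / y_m.mean()           # width as % of the mean estimate

print(f"95% CI width = {kappa:.4f} ({kappa_pct:.2f}% of mean)")
```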
9.6.2.2 The Reason for Error in the MSFF Tail Model

Here we return to our MSFF test circuit from Section 9.5.3, where we saw some discrepancy between the empirical and GPD estimates for the failure probability expressed as yσ. We will try to develop an explanation for this undesirable, although small, discrepancy. For this purpose we call on a common tool for graphical exploration of statistical data: the sample mean excess plot. Reference [18] reviews some properties of the mean excess plot; here we focus on its properties in relation to the generalized Pareto distribution.

The mean excess function for a given threshold y_f is defined as

$$e(y_f) = E(y - y_f \mid y > y_f);$$

that is, the mean of the exceedances over y_f. Plotting e(y_f) against y_f gives us the mean excess plot. The sample mean excess function is the sample version of e(y_f). For a given sample {y_i : i = 1, ..., n}, it is defined as

$$e_n(y_f) = \frac{\sum_{i=1}^{n} (y_i - y_f)^+}{|\{y_i : y_i > y_f\}|}, \quad \text{where } (\cdot)^+ = \max(\cdot, 0);$$

that is, the sample mean of only the exceedances over y_f. A plot of e_n(y_f) against y_f gives us the sample mean excess plot. The mean excess function of a GPD $G_{\xi,\beta}$ can be shown (see [18]) to be a straight line, given by

$$e(y_f) = \frac{\beta - \xi y_f}{1 + \xi}, \quad \text{for } y_f \in D(\xi, \beta),$$

where D(ξ, β) is as defined in Theorem 9.3. Hence, if the sample mean excess function of any data sample starts to follow roughly a straight line from some threshold, then it is an indication that the exceedances over that threshold follow a GPD. In fact, this feature of the mean excess plot can be employed to manually estimate an appropriate tail threshold.

Let us now look at the sample mean excess plot of the MSFF tail data (τ_cq ≥ the 99th percentile delay) from the 500,000-point Monte Carlo run, shown in Fig. 9.21. The plot suggests a good reason for the observed discrepancy in the estimated failure probabilities: it is clear from the plot that the tail defined by the t = 99th percentile point has not converged close to a GPD form. Hence, the discrepancy could be a result of choosing a tail threshold that is not large enough. To test this, let us choose the threshold at the t = 3σ point and fit the GPD model to the exceedances over this t. Figure 9.21 suggests that this should show a better fit, since the sample mean excess function seems to be roughly a straight line from the 3σ threshold. The predictions of this new GPD model are shown in Table 9.8, which also reproduces the standard Monte Carlo and 99th-percentile GPD columns of Table 9.7 for comparison. As expected, we see more accurate predictions.

Fig. 9.21 A sample mean excess plot for the MSFF circuit, showing the 99th percentile and 3σ tail thresholds
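As a concrete illustration of this diagnostic, here is a minimal sketch of the sample mean excess computation; the data are synthetic, and the fitted (ξ, β) values and the Hosking-style GPD parameterization (matching the formula above) are assumptions for illustration:

```python
# Minimal sketch: the sample mean excess function e_n(y_f) on a grid of
# thresholds, and the straight-line mean excess implied by a fitted GPD.
# The data and the fitted (xi, beta) below are synthetic stand-ins.
import numpy as np

def sample_mean_excess(y, thresholds):
    """e_n(y_f): mean of (y_i - y_f) over only the points exceeding y_f."""
    return np.array([(y[y > t] - t).mean() if np.any(y > t) else np.nan
                     for t in thresholds])

def gpd_mean_excess(y_f, xi, beta):
    # Straight line in y_f, using the chapter's (assumed Hosking-style)
    # GPD parameterization: e(y_f) = (beta - xi*y_f) / (1 + xi).
    return (beta - xi * y_f) / (1.0 + xi)

rng = np.random.default_rng(1)
y = rng.standard_normal(500_000)                   # stand-in for simulated delays
grid = np.linspace(np.percentile(y, 99), 0.9 * y.max(), 40)
e_n = sample_mean_excess(y, grid)
e_gpd = gpd_mean_excess(grid, xi=-0.1, beta=0.4)   # hypothetical fitted parameters
# If e_n is roughly linear beyond some threshold, the exceedances over that
# threshold are consistent with a GPD, which helps pick the tail threshold t.
print(np.round(e_n[:5], 3), np.round(e_gpd[:5], 3))
```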
Table 9.8 Prediction of failure probability as yσ using a GPD model (method II of Section 9.5.3), with the tail threshold t at the 99th percentile and at the 3σ point

τcq (yf) (FO4) | (I) Standard Monte Carlo | (II) GPD at t = 99th percentile | (II) GPD at t = 3σ point
30 | 3.424 | 3.466 | 3.443
40 | 3.724 | 3.686 | 3.729
50 | 4.008 | 3.854 | 3.978
60 | 4.219 | 3.990 | 4.198
70 | 4.607 | 4.102 | 4.396
80 | ∞ | 4.199 | 4.574
90 | ∞ | 4.283 | 4.737

9.6.2.3 The Problem

For both the issues discussed above, a solution is to sample further out in the tail and use a higher tail threshold for building the GPD model of the tail. This is, of course, "easier said than done." Suppose we wish to support our GPD model with data up to the 6σ point. The failure probability of a 6σ value is roughly 1 part per billion, corresponding to a 99% chip yield requirement for a 10 Mb cache. This is definitely not an impractical requirement. However, for a 99% tail threshold, even a perfect classifier (tc = t) will only reduce the number of simulations to an extremely large 10 million. If we decide to use a 99.9999% threshold, the number of simulations will be reduced to a more practical 1,000 tail points (with a perfect classifier); however, we will then need to simulate an extremely large number of points (≥ 1 million) to generate a classifier training set with at least one point in the tail region. In both cases, the circuit simulation counts are too high. We now describe a recursive formulation of statistical blockade that reduces this count drastically.
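As a quick check of this arithmetic, the quoted yield and simulation counts can be reproduced in a few lines; this is a back-of-the-envelope sketch using the 10 Mb cache size and 1 billion-sample budget quoted above:

```python
# Back-of-the-envelope check of the sample-count argument above.
from math import erfc, sqrt

p_fail = 0.5 * erfc(6 / sqrt(2))        # one-sided 6-sigma failure probability
n_bits = 10 * 2**20                     # 10 Mb cache
chip_yield = (1.0 - p_fail) ** n_bits   # probability that no cell fails
print(f"p_fail ~ {p_fail:.2e}, chip yield ~ {chip_yield:.4f}")  # ~1e-9, ~0.99

n_mc = int(1e9)                         # Monte Carlo points needed to reach 6 sigma
print(f"simulated at a 99% threshold:      {int(0.01 * n_mc):,}")   # 10,000,000
print(f"simulated at a 99.9999% threshold: {int(1e-6 * n_mc):,}")   # 1,000
```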
9.6.3 A Recursive Formulation of Statistical Blockade

Let us first assume that there are no conditionals. For a tail threshold equal to the a-th percentile, let us represent it as t_a, and the corresponding classification threshold as t_c^a. For this threshold, build a classifier C_a and generate sufficient points beyond the tail threshold, y > t_a, so that a higher percentile (t_b, t_c^b, b > a) can be estimated. For this new, higher threshold t_c^b, a new classifier C_b is trained and a new set of tail points (y > t_b) is generated. This new classifier will block many more points than C_a, significantly reducing the number of simulations. This procedure is repeated to push the threshold out farther until the tail region of interest is reached. The complete algorithm is shown in Algorithm 9.2.

The arguments to the algorithm are formulated a little differently from the basic statistical blockade algorithm (Algorithm 9.1). Instead of passing the tail and classification threshold probabilities (pt, pc), we pass a tail sample size nt and a classification threshold probability function pc(p). The former is the number of tail points to be used finally to compute the GPD tail model. The latter is a function that returns the classification threshold probability for a given tail threshold probability; it is implicitly a function also of the type of classifier being used, since the error in the classifier determines the appropriate safety margin. The functions that also appear in Algorithm 9.1 do the same work here; hence, we do not reiterate their descriptions. fsim now returns multiple outputs: it computes the values of all the arguments of the conditional in y = max(y0, y1, ...). For example, in the case of DRV, it will return the values of DRV0 and DRV1; these values, for any one Monte Carlo point, are stored in one row of the result matrix Y. The function MonteCarloNext(Δn) returns the next Δn points in the sequence of points generated till now. The function GetWorst(n, X, y) returns the n worst values in the vector y and the corresponding rows of the matrix X; this functionality naturally extends to the two-argument form GetWorst(n, y). GetGreaterThan(t, X, y) returns the elements of y that are greater than t, along with the corresponding rows of X.

Algorithm 9.2 The general recursive statistical blockade algorithm for efficient sampling of extremely rare events, in the presence of conditional-induced disjoint tail regions

Require: initial sample size n0 (e.g., 1,000); total sample size n; tail sample size nt; function pc(p), p ∈ (0, 100); performance metric function y = max(y0, y1, ...)

    X = MonteCarlo(n0)
    n' = n0
    nc = max(nt, 1,000)                        // classifier training set size, at least 1,000
    Y = fsim(X)                                // simulate the initial Monte Carlo sample set
    ytail,i = Y•,i, i = 0, 1, ...              // the i-th column of Y holds the values of yi
    Xtail,i = X, i = 0, 1, ...
    while n' < n
        Δn = min(100·n', n) − n'               // number of points to filter in this recursion step
        pt = 100·Δn/(n' + Δn)                  // the tail threshold is the pt-th percentile
        n' = n' + Δn                           // total points filtered by the end of this stage
        X = MonteCarloNext(Δn)                 // the next Δn points in the Monte Carlo sequence
        forall i : yi is an argument in y = max(y0, y1, ...)
            (Xtail,i, ytail,i) = GetWorst(nc, Xtail,i, ytail,i)       // the nc worst points seen so far
            t = Percentile(ytail,i, pt)
            tc = Percentile(ytail,i, pc(pt))
            Ci = BuildClassifier(Xtail,i, ytail,i, tc)
            (Xtail,i, ytail,i) = GetGreaterThan(t, Xtail,i, ytail,i)  // keep the points with yi > t
            Xcand,i = Filter(Ci, X)            // candidate tail points for yi
        endfor
        X = [Xcand,0^T Xcand,1^T ···]^T        // union of all candidate tail points
        Y = fsim(X)                            // simulate all candidate tail points
        ycand,i = {Yj,i : Xj,• ∈ Xcand,i}, i = 0, 1, ...              // extract the tail points for yi
        ytail,i = [ytail,i^T ycand,i^T]^T; Xtail,i = [Xtail,i^T Xcand,i^T]^T   // all tail points till now
    endwhile
    ytail = MaxOverRows([ytail,0 ytail,1 ···]) // compute the conditional
    ytail = GetWorst(nt, ytail)
    (ξ, β) = FitGPD(ytail − min(ytail))

The function pc(p) is not easy to determine; hence, we also present a less general version as Algorithm 9.3, which can be used immediately by any practitioner. Here, we restrict the total sample size n to be some power of 100, times 1,000:

$$n = 100^j \cdot 1000, \quad j = 0, 1, \ldots \quad (9.38)$$

Also, we fix pt = 99% and pc = 97%. This will always give us 1,000 tail points to fit the GPD. The tail threshold t moves with every recursion step as t = 99th percentile, 99.99th percentile, 99.9999th percentile, ..., and the classification threshold as tc = 97th percentile, 99.97th percentile, 99.9997th percentile, ....

Algorithm 9.3 The recursive statistical blockade algorithm with fixed sequences for the tail and classification thresholds: t = 99%, 99.99%, 99.9999%, ... points, and tc = 97%, 99.97%, 99.9997%, ... points. The total sample size is given by (9.38)

Require: initial sample size n0 (e.g., 1,000); total sample size n; tail sample size nt; performance metric function y = max(y0, y1, ...)

    X = MonteCarlo(n0)
    n' = n0
    Y = fsim(X)                                // simulate the initial Monte Carlo sample set
    ytail,i = Y•,i, i = 0, 1, ...              // the i-th column of Y holds the values of yi
    Xtail,i = X, i = 0, 1, ...
    while n' < n
        Δn = 99·n'                             // number of points to filter in this recursion step
        n' = n' + Δn                           // total points filtered by the end of this stage
        X = MonteCarloNext(Δn)                 // the next Δn points in the Monte Carlo sequence
        forall i : yi is an argument in y = max(y0, y1, ...)
            (Xtail,i, ytail,i) = GetWorst(1,000, Xtail,i, ytail,i)    // the 1,000 worst points
            t = Percentile(ytail,i, 99)
            tc = Percentile(ytail,i, 97)
            Ci = BuildClassifier(Xtail,i, ytail,i, tc)
            (Xtail,i, ytail,i) = GetGreaterThan(t, Xtail,i, ytail,i)  // keep the points with yi > t
            Xcand,i = Filter(Ci, X)            // candidate tail points for yi
        endfor
        X = [Xcand,0^T Xcand,1^T ···]^T        // union of all candidate tail points
        Y = fsim(X)                            // simulate all candidate tail points
        ycand,i = {Yj,i : Xj,• ∈ Xcand,i}, i = 0, 1, ...              // extract the tail points for yi
        ytail,i = [ytail,i^T ycand,i^T]^T; Xtail,i = [Xtail,i^T Xcand,i^T]^T   // all tail points till now
    endwhile
    ytail = MaxOverRows([ytail,0 ytail,1 ···]) // compute the conditional
    ytail = GetWorst(nt, ytail)
    (ξ, β) = FitGPD(ytail − min(ytail))
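To make the control flow of Algorithm 9.3 concrete, the following is a compact, runnable sketch for a single metric y (the conditional y = max(y0, y1, ...) is omitted for brevity). Here fsim is a cheap synthetic stand-in for circuit simulation, and a least-squares linear predictor stands in for the SVM classifier; both are assumptions for illustration, not the chapter's actual setup:

```python
# Compact sketch of the fixed-threshold recursion in Algorithm 9.3.
import numpy as np

rng = np.random.default_rng(0)
DIM = 10  # number of statistical (process variation) parameters

def fsim(X):
    """Stand-in 'simulator': a mildly nonlinear performance metric."""
    return X.sum(axis=1) + 0.1 * (X ** 2).sum(axis=1)

def fit_predictor(X, y):
    """Least-squares linear model with intercept (stand-in classifier)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict(X, w):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def recursive_blockade(n_total, n0=1000, n_tail=1000):
    X = rng.standard_normal((n0, DIM))
    y = fsim(X)
    X_tail, y_tail = X, y
    n_done = n0
    while n_done < n_total:
        dn = 99 * n_done                       # Delta-n = 99 n' (Algorithm 9.3)
        n_done += dn
        worst = np.argsort(y_tail)[-n_tail:]   # GetWorst: 1,000 worst points so far
        X_tr, y_tr = X_tail[worst], y_tail[worst]
        t = np.percentile(y_tr, 99)            # tail threshold
        t_c = np.percentile(y_tr, 97)          # safer classification threshold
        w = fit_predictor(X_tr, y_tr)          # BuildClassifier
        keep = y_tr > t                        # GetGreaterThan: true tail points
        X_tail, y_tail = X_tr[keep], y_tr[keep]
        X_new = rng.standard_normal((dn, DIM))      # MonteCarloNext
        cand = X_new[predict(X_new, w) > t_c]       # Filter: the blockade step
        y_cand = fsim(cand)                         # simulate only the candidates
        X_tail = np.vstack([X_tail, cand])
        y_tail = np.concatenate([y_tail, y_cand])
    return np.sort(y_tail)[-n_tail:]           # worst n_tail points -> FitGPD(...)

tail = recursive_blockade(n_total=100_000)     # one recursion stage, for the demo
print(len(tail), float(tail.min()), float(tail.max()))
```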
The algorithms presented here are in iterative form, rather than recursive form. To see how the recursion works, suppose we want to estimate the 99.9999% tail. To generate points at and beyond this threshold, we first estimate the 99.99% point and use a classifier at the 99.97% point to generate these points efficiently. To build this classifier in turn, we first estimate the 99% point and use a classifier at the 97% point. Figure 9.22 illustrates this recursion on the PDF of any one argument in the conditional (9.37).

Fig. 9.22 Recursive formulation of statistical blockade as in Algorithm 9.3

9.6.4 Experimental Results

We now test the recursive statistical blockade method on another SRAM cell test case, where we compute the DRV as in (9.36). In this case the SRAM cell is implemented in an industrial 90 nm process. Wang et al. [42] develop an analytical model for predicting the CDF of the DRV that uses no more than 5,000 Monte Carlo points. The CDF is given as

$$F(y) = 1 - \mathrm{erfc}(y_0) + \tfrac{1}{4}\,\mathrm{erfc}^2(y_0), \quad \text{where } y_0 = \frac{\mu_0 + k(y - V_0)}{\sqrt{2}\,\sigma_0}, \quad (9.39)$$

where y is the DRV value and erfc() is the complementary error function [43]. k is the sensitivity of the SNM of the SRAM cell to the supply voltage, computed using a DC sweep. μ0 and σ0 are the mean and standard deviation of the SNM (SNM0) for a user-defined supply voltage V0; SNM0 is the SNM of the cell while storing a 1. These statistics are computed using a short Monte Carlo run of 1,500–5,000 sample points. We direct the reader to [42] for complete details regarding this analytical model of the DRV distribution. The q-th quantile can be estimated as

$$DRV(q) = \frac{1}{k}\left(\sqrt{2}\,\sigma_0\,\mathrm{erfc}^{-1}\!\left(2 - 2\sqrt{q}\right) - \mu_0\right) + V_0. \quad (9.40)$$

Here DRV(q) is the supply voltage Vdd such that P(DRV ≤ Vdd) = q. Let us now compute the DRV quantiles as m points, such that q is the cumulative probability for the value m from a standard normal distribution.
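Equation (9.40) is straightforward to evaluate numerically. The sketch below computes DRV at mσ points for m ∈ [3, 8]; the model constants k, μ0, σ0, and V0 are hypothetical placeholders, not values from the chapter:

```python
# Evaluate the analytical DRV quantile model, Eq. (9.40), at m-sigma points.
# k, mu0, sigma0, V0 are hypothetical placeholders for the fitted constants.
import numpy as np
from scipy.special import erfcinv
from scipy.stats import norm

k, mu0, sigma0, V0 = 2.0, 0.30, 0.02, 0.30   # illustrative constants

def drv_quantile(q):
    """DRV(q) = (sqrt(2)*sigma0*erfcinv(2 - 2*sqrt(q)) - mu0)/k + V0."""
    return (np.sqrt(2.0) * sigma0 * erfcinv(2.0 - 2.0 * np.sqrt(q)) - mu0) / k + V0

for m in range(3, 9):
    q = norm.cdf(m)          # cumulative probability of the m-sigma point
    # Note: for large m, q is within round-off of 1; a production version would
    # rewrite 2 - 2*sqrt(q) in terms of norm.sf(m) to avoid cancellation.
    print(f"m = {m}: DRV(q) = {1e3 * drv_quantile(q):.1f} mV")
```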
We will use five different methods to estimate the DRV quantiles for m ∈ [3, 8]:

1) Analytical: Use Eq. (9.40).

2) Recursive statistical blockade without the GPD model: Algorithm 9.3 is run for n = 1 billion. This results in three recursion stages, corresponding to total sample sizes of n = 100,000, 10 million, and 1 billion Monte Carlo points. The worst DRV values for these three recursion stages are estimates of the 4.26σ, 5.2σ, and 6σ points, respectively.

3) GPD model from recursive statistical blockade: The 1,000 tail points from the last recursion stage of the recursive statistical blockade run are used to fit a GPD model, which is then used to predict the DRV quantiles.

4) Normal: A normal distribution is fit to data from a 1,000-point Monte Carlo run and used to predict the DRV quantiles.

5) Lognormal: A lognormal distribution is fit to the same set of 1,000 Monte Carlo points and used for the predictions.

The results are shown in Fig. 9.23. From the plots in the figure, we can immediately see that the recursive statistical blockade estimates are very close to the estimates from the analytical model. This shows the efficiency of the recursive formulation in reducing the error in predictions for events far out in the tail.

Fig. 9.23 Estimates of DRV quantiles from five estimation methods. The GPD model closely fits the analytical model (9.39). The solid circles show the worst DRV values from the three recursion stages of statistical blockade sampling. The normal and lognormal models are quite inaccurate

Table 9.9 shows the number of circuit simulations performed at each recursion stage. The total number of circuit simulations is 41,721. This is not small, but in comparison to standard Monte Carlo (1 billion simulations) and basic, non-recursive statistical blockade (approximately 30 million with tc = the 97th percentile) it is extremely fast. About 41,721 simulations for DRV computation of a 6T SRAM cell can be completed in several hours on a single computer today; with the advent of multi-core processors, the total simulation time can be drastically reduced with proper implementation.

Table 9.9 Number of circuit simulations needed by recursive statistical blockade to generate a 6σ point

Recursion stage | Number of simulations
Initial | 1,000
1 | 11,032
2 | 14,184
3 | 15,505
Total | 41,721
Speedup over Monte Carlo | 23,969×
Speedup over statistical blockade | 719×

Note that we can extend the prediction power to 8σ with the GPD model, without any additional simulations. Standard Monte Carlo would need over 1.5 quadrillion circuit simulations to generate a single 8σ point; for this case, the speedup over standard Monte Carlo is extremely large. As expected, the normal and lognormal fits show large errors. The normal fit is unable to capture the skewness of the DRV distribution; the lognormal distribution, on the other hand, has a heavier tail than the DRV distribution.

References

1. P. Gupta and F.-L. Heng, Toward a systematic-variation aware timing methodology, Proc. IEEE/ACM Design Automation Conf., June 2004.
2. M. Hane, T. Ikezawa, and T. Ezaki, Atomistic 3D process/device simulation considering gate line-edge roughness and poly-Si random crystal orientation effects, Proc. IEEE Int. Electron Devices Meeting, 2003.
3. M. J. M. Pelgrom, A. C. J. Duinmaijer, and A. P. G. Welbers, Matching properties of MOS transistors, IEEE J. Solid-State Circuits, 24(5): 1433–1440, 1989.
4. A. J. Bhavnagarwala, X. Tang, and J. D. Meindl, The impact of intrinsic device fluctuations on CMOS SRAM cell stability, IEEE J. Solid-State Circuits, 36(4): 658–665, 2001.
5. S. Mukhopadhyay, H. Mahmoodi, and K. Roy, Statistical design and optimization of SRAM cell for yield enhancement, Proc. IEEE/ACM Int. Conf. on CAD, 2004.
6. B. H. Calhoun and A. Chandrakasan, Analyzing static noise margin for sub-threshold SRAM in 65 nm CMOS, Proc. Eur. Solid State Cir. Conf., 2005.
7. B. Joshi, R. K. Anand, C. Berg, J. Cruz-Rios, A. Krishnamurthi, N. Nettleton, S. Nguyen, J. Reaves, J. Reed, A. Rogers, S. Rusu, C. Tucker, C. Wang, M. Wong, D. Yee, and J.-H. Chang, A BiCMOS 50 MHz cache controller for a superscalar microprocessor, Int. Solid State Cir. Conf., 1992.
8. G. S. Fishman, A First Course in Monte Carlo, Duxbury, 2006.
9. K. Agarwal, F. Liu, C. McDowell, S. Nassif, K. Nowka, M. Palmer, D. Acharyya, and J. Plusquellic, A test structure for characterizing local device mismatches, Symp. on VLSI Circuits Dig. Tech. Papers, 2006.
10. H. Chang, V. Zolotov, S. Narayan, and C. Visweswariah, Parameterized block-based statistical timing analysis with non-Gaussian parameters, nonlinear delay functions, Proc. IEEE/ACM Design Autom. Conf., 2005.
11. T. Ezaki, T. Ikezawa, and M. Hane, Investigation of random dopant fluctuation induced device characteristics variation for sub-100 nm CMOS by using atomistic 3D process/device simulator, Proc. IEEE Int. Electron Devices Meeting, 2002.
12. H. Mahmoodi, S. Mukhopadhyay, and K. Roy, Estimation of delay variations due to random dopant fluctuations in nanoscale CMOS circuits, IEEE J. Solid-State Circuits, 40(3): 1787–1796, 2005.
13. D. Hocevar, M. Lightner, and T. Trick, A study of variance reduction techniques for estimating circuit yields, IEEE Trans. Computer-Aided Design, 2(3): 279–287, 1983.
14. R. Kanj, R. Joshi, and S. Nassif, Mixture importance sampling and its application to the analysis of SRAM designs in the presence of rare event failure, Proc. IEEE/ACM Design Autom. Conf., 2006.
15. T. C. Hesterberg, Advances in Importance Sampling, Dept. of Statistics, Stanford University, 1998, 2003.
16. S. I. Resnick, Extreme Values, Regular Variation and Point Processes, Springer, New York, 1987.
17. L. de Haan, Fighting the arch-enemy with mathematics, Statist. Neerlandica, 44: 45–68, 1990.
18. P. Embrechts, C. Klüppelberg, and T. Mikosch, Modelling Extremal Events for Insurance and Finance, Springer-Verlag, Berlin, 4th ed., 2003.
19. A. Singhee and R. A. Rutenbar, Statistical Blockade: a novel method for very fast Monte Carlo simulation of rare circuit events, and its application, Proc. Design Autom. Test Europe, 2007.
20. A. Singhee, J. Wang, B. H. Calhoun, and R. A. Rutenbar, Recursive Statistical Blockade: an enhanced technique for rare event simulation with application to SRAM circuit design, Proc. Int. Conf. VLSI Design, 2008.
21. R. A. Fisher and L. H. C. Tippett, Limiting forms of the frequency distribution of the largest or smallest member of a sample, Proc. Cambridge Philos. Soc., 24: 180–190, 1928.
22. B. Gnedenko, Sur la distribution limite du terme maximum d'une série aléatoire, Ann. Math., 44(3): 423–453, 1943.
23. A. Singhee, Novel Algorithms for Fast Statistical Analysis of Scaled Circuits, PhD Thesis, Electrical and Computer Engg., Carnegie Mellon University, 2007.
24. A. A. Balkema and L. de Haan, Residual life time at great age, Ann. Prob., 2(5): 792–804, 1974.
25. J. Pickands III, Statistical inference using extreme order statistics, Ann. Stats., 3(1): 119–131, 1975.
26. S. D. Grimshaw, Computing maximum likelihood estimates for the generalized Pareto distribution, Technometrics, 35(2): 185–191, 1993.
27. R. L. Smith, Estimating tails of probability distributions, Ann. Stats., 15(3): 1174–1207, 1987.
28. R. L. Smith, Maximum likelihood estimation in a class of non-regular cases, Biometrika, 72: 67–92, 1985.
29. J. R. M. Hosking and J. R. Wallis, Parameter and quantile estimation for the generalized Pareto distribution, Technometrics, 29(3): 339–349, 1987.
30. J. R. M. Hosking, The theory of probability weighted moments, IBM Research Report RC12210, 1986.
31. C. J. C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery, 2(2): 121–167, 1998.
32. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.
33. T. Joachims, Making large-scale SVM learning practical, in B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods – Support Vector Learning, MIT Press, 1999.
34. I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, San Francisco, 2nd edition, 2005.
35. K. Morik, P. Brockhausen, and T. Joachims, Combining statistical learning with a knowledge-based approach – a case study in intensive care monitoring, Proc. 16th Int'l Conf. Machine Learning, 1999.
36. R. Rao, A. Srivastava, D. Blaauw, and D. Sylvester, Statistical analysis of subthreshold leakage current for VLSI circuits, IEEE Trans. VLSI Sys., 12(2): 131–139, 2004.
37. W. Liu, X. Jin, J. Chen, M.-C. Jeng, Z. Liu, Y. Cheng, K. Chen, M. Chan, K. Hui, J. Huang, R. Tu, P. Ko, and C. Hu, BSIM3v3.2 MOSFET Model Users' Manual, Univ. California, Berkeley, Tech. Report No. UCB/ERL M98/51, 1998.
38. http://www.eas.asu.edu/∼ptm/
39. R. K. Krishnamurthy, A. Alvandpour, V. De, and S. Borkar, High-performance and low-power challenges for sub-70 nm microprocessor circuits, Proc. Custom Integ. Circ. Conf., 2002.
40. A. Singhee and R. A. Rutenbar, Beyond low-order statistical response surfaces: latent variable regression for efficient, highly nonlinear fitting, Proc. IEEE/ACM Design Autom. Conf., 2007.
41. L. Chang, D. M. Fried, J. Hergenrother, J. W. Sleight, R. H. Dennard, R. K. Montoye, L. Sekaric, S. J. McNab, A. W. Topol, C. D. Adams, K. W. Guarini, and W. Haensch, Stable SRAM cell design for the 32 nm node and beyond, Symp. VLSI Tech. Dig. Tech. Papers, 128–129, 2005.
42. J. Wang, A. Singhee, R. A. Rutenbar, and B. H. Calhoun, Modeling the minimum standby supply voltage of a full SRAM array, Proc. Europ. Solid State Cir. Conf., 2007.
43. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, 2nd edition, 1992.

Index

A
Address alignment, 14–15
Array failure probability, 331–336
Assist circuits, 53–61

B
Back switching 'relaxation', 287, 291, 292
Balkema and de Haan, 344, 345
Bank interleaving, 11–13, 151
BGS (bitline GND sensing), 295, 299, 305, 306, 319, 327
Bit distribution, 289, 290, 292
Blockade filter, 355, 357, 358, 360, 363, 367

C
Cache, 1, 7, 8, 10, 11, 17, 40, 93, 94, 100, 127, 128, 130, 134, 161, 162, 165, 166, 168, 219, 270, 273, 342, 374
Central limit theorem, 343, 344
Chain FeRAM, 298, 299, 319
Charge-trapping cell, 237
Circuit optimization, 3, 11, 27, 30, 31, 58, 63, 66, 67, 92, 98, 102, 106, 114, 116, 125, 127, 150, 158, 163, 186, 195, 211, 217, 272, 273, 291, 318, 353
Circuit stability, 1, 34, 41, 53, 55, 60, 64, 65, 67, 69, 71, 86, 98, 101, 114, 118, 123, 185, 186, 249, 250
Classification threshold, 354, 355, 356, 357, 369, 370, 375, 377
Classifier, 340, 349, 350–354, 355–357, 359, 361, 362, 368, 370, 375, 377
CMOS memory circuits, 259
CMOS memory integrated circuits, 251
Coercive voltage (Vc), 49, 279, 284, 286, 288, 290–296, 303, 309, 318, 319, 321
Confidence interval, 371, 372, 373
Convergence, 179, 182, 236, 273, 274, 275, 288, 301, 344, 346, 347, 348
Curse of dimensionality, 361

D
Disjoint tail regions, 368–371, 376
Drain-induced barrier lowering (DIBL), 92, 118
Dynamic voltage scaling (DVS), 94

E
EEPROM (electrically erasable and programmable read only memory), 178, 182, 188, 190, 191, 193, 198, 218, 222, 224, 225, 236, 308, 312, 313, 314, 316, 317, 322, 323
Electric oxide thickness (EOT), 51, 70, 71, 72, 75, 76, 79
Embedded dynamic random access memory (eDRAM), 1, 2, 4, 130, 163, 166, 167, 171, 173, 338
EOT scaling, 70, 71, 72, 75, 79
EPROM (electrically programmable ROM), 178, 181, 182, 190, 191, 192, 193, 198, 234
Error correction coding (ECC), 117
Exceedance, 339, 342, 345, 356, 373, 374
Extremely rare events, 340, 371–375, 376
Extreme value theory, 5, 339, 341, 342–345

F
Failure probability, 331, 333, 342, 349, 356, 358, 360, 363, 365, 367, 370, 373, 374, 375
FeRAM (ferroelectric RAM), 2, 5, 178, 179, 279–324
Ferroelectricity, 280, 281, 282, 293
Fisher–Tippett, 343, 344
Fréchet, 343, 344
Floating-gate cell, 195
FPGA (field programmable gate array), 218, 219, 229, 272, 294, 323

G
Gaussian fit, 364, 366
Generalized extreme value (GEV) distribution, 343, 344, 345
Generalized Pareto distribution (GPD), 345, 346, 347, 348, 349, 354, 355, 356, 357, 358, 359, 363, 364, 371–377, 379
GMR (giant magneto-resistive) effect, 243, 250
Gumbel, 343

H
Heavy tail, 366, 367
High-K material, 71, 72, 82, 237
High-replication circuits, 338, 354, 356

I
Imprint fatigue loss, 279, 292
Integrated circuit design, 127

L
Linear classifier, 351
Local register files (LRFs), 18, 19, 20
Log-likelihood, 346
Lognormal fit, 380

M
Machine learning, 5, 339, 340, 349
Magnetic RAM (MRAM), 2, 5, 241–275
Margin of separating hyperplane, 351–353
Mass storage, 8, 11
Maximum domain of attraction (MDA), 343, 344, 345, 346
Maximum likelihood estimation (MLE), 346–347
MCU (micro-controller unit), 4, 177, 178, 180–189, 193, 195, 207, 218–221, 226–229, 236, 270–275, 309, 318, 323
Mean excess function, 373, 374
Memory architecture, 3–4, 7–36
Microwave integrated circuits, 226–227
Minor-loop, 285
Moment matching, 347
Monte Carlo simulation, 5, 95, 332, 337–338, 340, 345
MOSFET fluctuations, 95
MTJ (magnetic tunneling junction), 247, 268

N
NAND flash memory, 190, 194, 216
Non-destructive read out (NDRO), 309, 311, 324
Non-switching read out, 309
Non volatile memory (NVM), 2, 270
NOR flash memory, 179, 196, 226, 227
NRTZ write, 294, 302, 313
NV-RAM (non-volatile RAM), 180, 246, 247, 253, 269, 270, 271, 272, 273, 275

O
Optimal separating hyperplane, 350–354
OTP (one-time programmable ROM), 181, 182, 191, 232, 234–236

P
Parameter space, 340, 354, 355, 368, 369, 370
Parametric yield, 331–336
Peaks over threshold, 342
PFET amplifier, 156–157, 168
Phase Change Memory (PCRAM), 2, 179, 275, 297
Pickands, 344, 345
Planar capacitor cell, 297
Plated-wire magnetic memory, 247
Poisson yield model, 334–336
Power supplies, 61–64
Power supply circuits, 11, 21, 28, 53, 55, 60–61, 62, 63, 64, 127, 158, 177, 181, 196, 254, 256, 258, 263, 281, 299
Probability-weighted moment matching, 348–349, 371
Program and erase, 188, 191, 200, 205, 206, 209, 211, 212, 221, 227, 237, 312
Pr (remnant polarization), 286
Pseudo-spin-valve (PSV), 243
PWM, 347, 348

Q
Qsw (switching charge), 287, 288, 291, 292, 293, 294, 295, 296, 300, 301, 310, 314, 318, 319, 320, 321, 324

R
Random dopant fluctuation (RDF), 95, 96, 97, 98, 330, 336, 337, 359
Rare events, 5, 329–380
Read margin, 53–56, 59, 234, 267
Read–Modify–Write (RMW), 15–16, 31, 32, 68, 70, 73, 74, 75, 78, 183, 187, 219
Recursive statistical blockade, 376, 377, 378, 379, 380
Redundancy, 74–75, 120–122, 124, 130, 138, 139, 140, 141, 146, 147, 148, 149, 150, 152, 153, 158, 159, 160, 161, 230, 232, 291, 294, 306–307, 331, 332–334, 336, 338
Register file, 9–10, 18, 40, 100
Reverse short-channel effect, 102
RTZ write, 294, 303

S
Sample maximum, 342, 343
Sample mean excess plot, 373, 374
SBT (SrBi2Ta2O9), 282
Scale-invariant feature transform (SIFT), 21–23
Scratch pad memory, 10, 11, 18
Separating hyperplane, 350–353
Silicon-on-insulator (SOI), 133, 134, 162–168, 172–173
Skewness, 341, 366, 380
SNOS (semiconductor-nitride-oxide-semiconductor), 206
SONOS (silicon-oxide-nitride-oxide-silicon), 190, 191, 192, 193, 204, 205–214, 227, 237
Spearman's rank correlation, 362
Spin valve, 243, 248
Split-gate cell
SSI (source-side injection), 198–203
SRAM cell, 299, 313, 330, 336, 337, 338, 339, 357, 359–360, 361, 364–367, 368, 378, 379
SRAM chips, 1, 3, 4, 8, 9, 10, 11, 17, 26, 27, 28, 31, 39–86, 89–124, 127, 129, 130, 134, 135, 148, 151–154, 166, 173, 177–180, 188, 218, 225, 230, 242, 243, 250, 252, 254, 267–271, 313, 314, 320, 323, 329, 330, 338–340, 356, 364, 374–375
SRAM memory cell, 42, 45, 147
SRAM scaling, 4, 64, 75, 76, 86, 120
SRAM stability, 41
Stack-on-via capacitor cell, 294, 296
Static noise margin (SNM), 46–47, 50, 52, 96, 113, 331, 336, 337, 339, 340
Static random access memory (SRAM), 1, 3, 4, 10, 17, 18, 19, 27, 28, 29, 30, 32, 39–124
Statistical blockade, 5, 329–380
Streaming register file (SRF), 18, 19, 20
STT (spin-torque transfer), 249, 258, 260, 261–263
Subthreshold, 49, 59, 80, 81, 82, 84, 93, 95, 97, 104, 110, 120, 123, 164, 339, 368
Support points, 353
Support vector machine (SVM), 350–354, 357

T
Tail threshold, 341, 344, 354, 355, 357, 358, 367, 370, 374–377
Tc (Curie temperature), 289, 290, 291, 295, 321, 322
tc (classification threshold), 354–356, 369–370, 375–377, 379
Threshold-voltage variation, 89, 95, 97, 98, 123, 336
TMR (tunneling magneto-resistive) effect, 246, 247–250, 252–259, 263–265, 297
8T SRAM, 65–66, 67, 69, 70, 72, 73, 75, 78–79, 109, 370
10T SRAM, 66–67, 70, 75, 78–79
Toggle cell, 246, 247, 252

U
UP-DOWN sensing, 301, 312
UP-only sensing, 301, 302

V
Visual Image Processing (VIP), 23, 32–36
Voltage scaling, 1, 4, 46, 89, 90, 91, 92, 93, 94, 108, 113, 123, 124, 204
Vs (saturation voltage), 284, 285, 286, 288

W
Weibull, 343
Write margin, 46, 48, 50, 53–61, 71, 97, 107, 117
Write time, 5, 132, 210, 337, 341, 359, 360, 361, 372

Continued from page ii

SAT-Based Scalable Formal Verification Solutions
Malay Ganai and Aarti Gupta
ISBN 978-0-387-69166-4, 2007

Ultra-Low Voltage Nano-Scale Memories
Kiyoo Itoh, Masashi Horiguchi and Hitoshi Tanaka
ISBN 978-0-387-33398-4, 2007

Routing Congestion in VLSI Circuits: Estimation and Optimization
Prashant Saxena, Rupesh S. Shelar, Sachin Sapatnekar
ISBN 978-0-387-30037-5, 2007

Ultra-Low Power Wireless Technologies for Sensor Networks
Brian Otis and Jan Rabaey
ISBN 978-0-387-30930-9, 2007

Sub-Threshold Design for Ultra Low-Power Systems
Alice Wang, Benton H. Calhoun and Anantha Chandrakasan
ISBN 978-0-387-33515-5, 2006

High Performance Energy Efficient Microprocessor Design
Vojin Oklibdzija and Ram Krishnamurthy (Eds.)
ISBN 978-0-387-28594-8, 2006

Abstraction Refinement for Large Scale Model Checking
Chao Wang, Gary D. Hachtel, and Fabio Somenzi
ISBN 978-0-387-28594-2, 2006

A Practical Introduction to PSL
Cindy Eisner and Dana Fisman
ISBN 978-0-387-35313-5, 2006

Thermal and Power Management of Integrated Systems
Arman Vassighi and Manoj Sachdev
ISBN 978-0-387-25762-4, 2006

Leakage in Nanometer CMOS Technologies
Siva G. Narendra and Anantha Chandrakasan
ISBN 978-0-387-25737-2, 2005

Statistical Analysis and Optimization for VLSI: Timing and Power
Ashish Srivastava, Dennis Sylvester, and David Blaauw
ISBN 978-0-387-26049-9, 2005