IVLSI Part 1 potx

DOCUMENT INFORMATION

Basic information

Format
Pages	30
Size	1.08 MB

Content

VLSI

Edited by Zhongfeng Wang

In-Tech
intechweb.org

Published by In-Teh

In-Teh
Olajnica 19/2, 32000 Vukovar, Croatia

Abstracting and non-profit use of the material is permitted with credit to the source. Statements and opinions expressed in the chapters are those of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published articles. Publisher assumes no responsibility or liability for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained inside. After this work has been published by In-Teh, authors have the right to republish it, in whole or part, in any publication of which they are an author or editor, and to make other personal use of the work.

© 2010 In-teh
www.intechweb.org
Additional copies can be obtained from: publication@intechweb.org

First published February 2010
Printed in India

Technical Editor: Melita Horvat
Cover designed by Dino Smrekar

VLSI, Edited by Zhongfeng Wang
p. cm.
ISBN 978-953-307-049-0

Preface

Integrated circuit (IC) technology entered the era of very-large-scale integration (VLSI) in the 1970s, when thousands of transistors were integrated into a single chip. Since then, the transistor counts and clock frequencies of state-of-the-art chips have grown by orders of magnitude. Nowadays we are able to integrate more than a billion transistors into a single device. However, the term "VLSI" remains in common use, despite some effort many years ago to coin a new term, ultra-large-scale integration (ULSI), for finer distinctions. In the past two decades, advances in VLSI technology have led to the explosion of the computer and electronics world.
VLSI integrated circuits are used everywhere in our everyday life, including microprocessors in personal computers, image sensors in digital cameras, network processors in Internet switches, communication devices in smartphones, embedded controllers in automobiles, and so on. VLSI covers many phases of the design and fabrication of integrated circuits. A complete VLSI design process often involves system definition, architecture design, register transfer language (RTL) coding, pre- and post-synthesis design verification, timing analysis, and chip layout for fabrication. As process technology scales down, the trend is to integrate many complicated systems into a single chip, which is called system-on-chip (SoC) design. In addition, advanced VLSI systems often require high-speed circuits to meet the ever-increasing demand for data processing. For instance, the Ethernet standard has evolved from 10 Mbps to 10 Gbps, and the specification for 100 Gbps Ethernet is underway. On the other hand, with the growing popularity of smartphones and mobile computing devices, low-power VLSI systems have become critically important. Therefore, engineers are facing new challenges in designing highly integrated VLSI systems that meet both high performance requirements and stringent low power consumption.

The goal of this book is to elaborate the state-of-the-art VLSI design techniques at multiple levels. At the device level, researchers have studied the properties of nano-scale devices and explored possible new materials for future very high speed, low-power chips. At the circuit level, interconnect has become a contemporary design issue for nano-scale integrated circuits. At the system level, hardware-software co-design methodologies have been investigated to coherently improve the overall system performance.
At the architectural level, researchers have proposed novel architectures that have been optimized for specific applications, as well as efficient reconfigurable architectures that can be adapted for a class of applications.

As VLSI systems become more and more complex, it is a great challenge but a significant task for all experts to keep up with the latest signal processing algorithms and associated architecture designs. This book aims to meet this challenge by providing a collection of advanced algorithms in conjunction with their optimized VLSI architectures, such as Turbo codes, Low Density Parity Check (LDPC) codes, and the advanced video coding standards MPEG4/H.264. Each of the selected algorithms is presented with a thorough description together with research studies towards efficient VLSI implementations. No book is expected to cover every possible aspect of VLSI exhaustively. Our goal is to provide the design concepts through those selected studies, and the techniques that can be adopted into many other current and future applications.

This book is intended to cover a wide range of VLSI design topics, both general design techniques and state-of-the-art applications. It is organized into four major parts:

▪ Part I focuses on VLSI design for image and video signal processing systems, at both algorithmic and architectural levels.
▪ Part II addresses VLSI architectures and designs for cryptography and error correction coding.
▪ Part III discusses general SoC design techniques as well as system-level design optimization for application-specific algorithms.
▪ Part IV is devoted to circuit-level design techniques for nano-scale devices.

It should be noted that the book is not a tutorial for beginners to learn general VLSI design methodology. Instead, it should serve as a reference book for engineers to gain knowledge of advanced VLSI architecture and system design techniques.
Moreover, this book also includes many in-depth and optimized designs for advanced applications in signal processing and communications. Therefore, it is also intended to be a reference text for graduate students or researchers pursuing in-depth study of specific topics.

The editors are most grateful to all coauthors for contributing chapters in their respective areas of expertise. We would also like to acknowledge all the technical editors for their support and great help.

Zhongfeng Wang, Ph.D.
Broadcom Corp., CA, USA

Xinming Huang, Ph.D.
Worcester Polytechnic Institute, MA, USA

Contents

Preface

1. Discrete Wavelet Transform Structures for VLSI Architecture Design
Hannu Olkkonen and Juuso T. Olkkonen

2. High Performance Parallel Pipelined Lifting-based VLSI Architectures for Two-Dimensional Inverse Discrete Wavelet Transform
Ibrahim Saeed Koko and Herman Agustiawan

3. Contour-Based Binary Motion Estimation Algorithm and VLSI Design for MPEG-4 Shape Coding
Tsung-Han Tsai, Chia-Pin Chen, and Yu-Nan Pan

4. Memory-Efficient Hardware Architecture of 2-D Dual-Mode Lifting-Based Discrete Wavelet Transform for JPEG2000
Chih-Hsien Hsia and Jen-Shiun Chiang

5. Full HD JPEG XR Encoder Design for Digital Photography Applications
Ching-Yen Chien, Sheng-Chieh Huang, Chia-Ho Pan and Liang-Gee Chen

6. The Design of IP Cores in Finite Field for Error Correction
Ming-Haw Jing, Jian-Hong Chen, Yan-Haw Chen, Zih-Heng Chen and Yaotsu Chang

7. Scalable and Systolic Gaussian Normal Basis Multipliers over GF(2^m) Using Hankel Matrix-Vector Representation
Chiou-Yng Lee

8. High-Speed VLSI Architectures for Turbo Decoders
Zhongfeng Wang and Xinming Huang

9. Ultra-High Speed LDPC Code Design and Implementation
Jin Sha, Zhongfeng Wang and Minglun Gao

10. A Methodology for Parabolic Synthesis
Erik Hertz and Peter Nilsson

11. Fully Systolic FFT Architectures for Giga-sample Applications
D. Reisis

12. Radio-Frequency (RF) Beamforming Using Systolic FPGA-based Two Dimensional (2D) IIR Space-time Filters
Arjuna Madanayake and Leonard T. Bruton

13. A VLSI Architecture for Output Probability Computations of HMM-based Recognition Systems
Kazuhiro Nakamura, Masatoshi Yamamoto, Kazuyoshi Takagi and Naofumi Takagi

14. Efficient Built-in Self-Test for Video Coding Cores: A Case Study on Motion Estimation Computing Array
Chun-Lung Hsu, Yu-Sheng Huang and Chen-Kai Chen

15. SOC Design for Speech-to-Speech Translation
Shun-Chieh Lin, Jia-Ching Wang, Jhing-Fa Wang, Fan-Min Li and Jer-Hao Hsu

16. A Novel De Bruijn Based Mesh Topology for Networks-on-Chip
Reza Sabbaghi-Nadooshan, Mehdi Modarressi and Hamid Sarbazi-Azad

17. On the Efficient Design & Synthesis of Differential Clock Distribution Networks
Houman Zarrabi, Zeljko Zilic, Yvon Savaria and A. J. Al-Khalili

18. Robust Design and Test of Analog/Mixed-Signal Circuits in Deeply Scaled CMOS Technologies
Guo Yu and Peng Li

19. Nanoelectronic Design Based on a CNT Nano-Architecture
Bao Liu

20. A New Technique of Interconnect Effects Equalization by using Negative Group Delay Active Circuits
Blaise Ravelo, André Pérennec and Marc Le Roy

21. Book Embeddings
Saïd Bettayeb

22. VLSI Thermal Analysis and Monitoring
Ahmed Lakhssassi and Mohammed Bougataya

Discrete Wavelet Transform Structures for VLSI Architecture Design

Hannu Olkkonen and Juuso T. Olkkonen
Department of Physics, University of Kuopio, 70211 Kuopio, Finland
VTT Technical Research Centre of Finland, 02044 VTT, Finland

1. Introduction

Wireless data transmission and high-speed image processing devices have generated a need for efficient transform methods that can be implemented in a VLSI environment.
After the discovery of the compactly supported discrete wavelet transform (DWT) (Daubechies, 1988; Smith & Barnwell, 1986), many DWT-based data and image processing tools have outperformed the conventional discrete cosine transform (DCT)-based approaches. For example, in the JPEG2000 standard (ITU-T, 2000), the DCT has been replaced by the biorthogonal discrete wavelet transform. In this book chapter we review the DWT structures intended for VLSI architecture design. In particular, we describe methods for constructing shift-invariant analytic DWTs.

2. Biorthogonal discrete wavelet transform

The first DWT structures were based on the compactly supported conjugate quadrature filters (CQFs) (Smith & Barnwell, 1986), which had nonlinear phase effects such as image blurring and spatial dislocations in multi-resolution analyses. In contrast, in the biorthogonal discrete wavelet transform (BDWT) the scaling and wavelet filters are symmetric and have linear phase. The two-channel analysis filters $H_0(z)$ and $H_1(z)$ (Fig. 1) are of the general form

$$H_0(z) = (1 + z^{-1})^K P(z), \qquad H_1(z) = (1 - z^{-1})^K Q(z) \tag{1}$$

where the scaling filter $H_0(z)$ has a $K$th-order zero at $\omega = \pi$. Correspondingly, the wavelet filter $H_1(z)$ has a $K$th-order zero at $\omega = 0$. $P(z)$ and $Q(z)$ are polynomials in $z^{-1}$. The reconstruction filters $G_0(z)$ and $G_1(z)$ (Fig. 1) obey the well-known perfect reconstruction condition

$$\begin{aligned} H_0(z)G_0(z) + H_1(z)G_1(z) &= 2z^{-k} \\ H_0(-z)G_0(z) + H_1(-z)G_1(z) &= 0 \end{aligned} \tag{2}$$

The last condition in (2) is satisfied if we select the reconstruction filters as $G_0(z) = H_1(-z)$ and $G_1(z) = -H_0(-z)$.

Fig. 1. Analysis and synthesis BDWT filters.

3. Lifting BDWT

The BDWT is most commonly realized by the ladder-type network called the lifting scheme (Sweldens, 1988). The procedure consists of sequential down- and uplifting steps, and the signal is reconstructed by running the lifting network in reverse order (Fig. 2).
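The idea of reconstructing by running the lifting network in reverse order can be illustrated with a minimal sketch. It assumes the simplest possible pair of lifting steps (an integer Haar-like predict and update pair), which is not one of the chapter's BDWT filters; the function names `lift_forward` and `lift_inverse` are hypothetical, and the input is assumed to have even length.

```python
def lift_forward(x):
    """Two lifting steps on an even-length list of integers.

    Illustrative filters only (integer Haar-like pair), not taken
    from the chapter. Uses only additions and a register shift.
    """
    even, odd = x[0::2], x[1::2]
    # First lifting step: detail = odd sample minus its even neighbour.
    d = [o - e for e, o in zip(even, odd)]
    # Second lifting step: approximation = even sample plus floor(d / 2).
    s = [e + (di >> 1) for e, di in zip(even, d)]
    return s, d


def lift_inverse(s, d):
    """Undo the same steps in reverse order with opposite signs."""
    even = [si - (di >> 1) for si, di in zip(s, d)]
    odd = [di + e for e, di in zip(even, d)]
    x = [0] * (len(s) + len(d))
    x[0::2], x[1::2] = even, odd
    return x


if __name__ == "__main__":
    signal = [5, 9, 2, 2, 7, 0]
    s, d = lift_forward(signal)
    print(s, d)
    print(lift_inverse(s, d) == signal)
```

Because each step only adds a function of the other channel, the inverse needs no filter inversion: it subtracts the same quantity, which is why integer rounding does not break reconstruction.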
Efficient lifting BDWT structures have been developed for VLSI design (Olkkonen et al., 2005). The analysis and synthesis filters can be implemented in integer arithmetic using only register shifts and summations. However, the lifting DWT runs sequentially, and this may be a speed-limiting factor in some applications (Huang et al., 2005). Another drawback for the VLSI architecture is that the reconstruction filters run in reverse order, so two different VLSI realizations are required. In the following we show that the lifting structure can be replaced by more effective VLSI architectures. We describe two different approaches: the discrete lattice wavelet transform and the sign modulated BDWT.

Fig. 2. The lifting BDWT structure.

4. Discrete lattice wavelet transform

In the analysis part the discrete lattice wavelet transform (DLWT) consists of the scaling $H_0(z)$ and wavelet $H_1(z)$ filters and the lattice network (Fig. 3). The lattice structure contains two parallel transmission filters $T_0(z)$ and $T_1(z)$, which exchange information via two crossed lattice filters $L_0(z)$ and $L_1(z)$. In the synthesis part the lattice structure consists of the transmission filters $R_0(z)$ and $R_1(z)$ and crossed filters $W_0(z)$ and $W_1(z)$, and finally the reconstruction filters $G_0(z)$ and $G_1(z)$. Supposing that the scaling and wavelet filters obey (1), for perfect reconstruction the lattice structure should follow the condition

$$\begin{bmatrix} T_0R_0 + L_1W_0 & L_0R_0 + T_1W_0 \\ T_0W_1 + L_1R_1 & T_1R_1 + L_0W_1 \end{bmatrix} = \begin{bmatrix} z^{-k} & 0 \\ 0 & z^{-k} \end{bmatrix} \tag{3}$$

Fig. 3. The general DLWT structure.

This is satisfied if we set $W_0 = -L_0$, $W_1 = -L_1$, $R_0 = T_1$ and $R_1 = T_0$. The perfect reconstruction condition then follows from the diagonal elements of (3) as

$$T_0(z)T_1(z) - L_0(z)L_1(z) = z^{-k} \tag{4}$$

There exist many approaches to the design of DLWT structures obeying (4), for example via the Parks-McClellan-type algorithm.
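Conditions (3) and (4) can be checked numerically for a concrete filter set. The sketch below is only an illustration: the filters `T0`, `T1`, `L0` and `L1` are hypothetical values chosen so that $T_0T_1 - L_0L_1 = z^{-1}$, not filters from the chapter, and the helper names (`pmul`, `padd`, `neg`, `lattice_matrix`) are made up for this example. Polynomials in $z^{-1}$ are stored as coefficient lists.

```python
from fractions import Fraction as F


def pmul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [F(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def padd(a, b):
    """Add two polynomials, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = a + [F(0)] * (n - len(a))
    b = b + [F(0)] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def neg(a):
    return [-x for x in a]


# Hypothetical analysis-side filters (assumption, for illustration only).
T0 = [F(-1, 8), F(6, 8), F(-1, 8)]
T1 = [F(1)]
L0 = [F(1, 4), F(1, 4)]
L1 = [F(-1, 2), F(-1, 2)]

# Synthesis-side choice stated in the text: W0 = -L0, W1 = -L1, R0 = T1, R1 = T0.
W0, W1, R0, R1 = neg(L0), neg(L1), T1, T0


def lattice_matrix():
    """Return the four entries of the left-hand side of condition (3)."""
    return [
        [padd(pmul(T0, R0), pmul(L1, W0)), padd(pmul(L0, R0), pmul(T1, W0))],
        [padd(pmul(T0, W1), pmul(L1, R1)), padd(pmul(T1, R1), pmul(L0, W1))],
    ]


if __name__ == "__main__":
    for row in lattice_matrix():
        print([[str(c) for c in entry] for entry in row])
```

With this choice the off-diagonal entries cancel term by term, and both diagonal entries reduce to $T_0T_1 - L_0L_1$, which is exactly the left-hand side of (4).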
In particular, the DLWT network is efficient in designing half-band transmission and lattice filters (see details in Olkkonen & Olkkonen, 2007a). For VLSI design it is essential to note that in the lattice structure all computations are carried out in parallel. Also, all the BDWT structures designed via the lifting scheme can be transferred to the lattice network (Fig. 3). For example, Fig. 4 shows the DLWT equivalent of the lifting BDWT structure consisting of down- and uplifting steps (Fig. 2). The VLSI implementation is flexible due to the parallel filter blocks in the analysis and synthesis parts.

Fig. 4. The DLWT equivalent of the lifting BDWT structure described in Fig. 2.

5. Sign modulated BDWT

In VLSI architectures where the analysis and synthesis filters are directly implemented (Fig. 1), the VLSI design simplifies considerably using a specific sign modulator defined as (Olkkonen & Olkkonen, 2008)

$$S_n = (-1)^n = \begin{cases} 1 & \text{for } n \text{ even} \\ -1 & \text{for } n \text{ odd} \end{cases} \tag{5}$$

A key idea is to replace the reconstruction filters by scaling and wavelet filters using the sign modulator (5). Fig. 5 describes the rules for how $H(-z)$ can be replaced by $H(z)$ and the sign modulator in connection with the decimation and interpolation operators. Fig. 6 [...]

[...] and 1. The dataflow of the first run then proceeds as shown in Table 1. The second run begins at cycle 19 and yields its first 4 output coefficients at cycle 37.

[Table 1. Dataflow for 2-parallel 5/3 architecture. Column groups: Ck; f2; CP; CP1 & CP2 input latches (Rt0, Rt1); CP1 output latches (Rtl0, Rtl1); CP2 output latches (Rth0, Rth1); RP1 input latches (Rt0, Rt1); RP2 input latches (Rt0, Rt1); output latches of RP1 (Rt0, Rt1) and RP2 (Rt0, Rt1). Rows are grouped into RUN 1 and RUN 2.]

[Fig. 3. 5/3 synthesis algorithm's DDGs for (a) odd and (b) even length signals.]

[...] to locations 0,1 and 1,1. Then coefficients in locations 0,0 and 0,1 are executed by RP1, while coefficients of locations 1,0 and 1,1 are executed by RP2. Second, CP1 will generate two coefficients for locations 0,2 and 1,2, while CP2 generates two coefficients for locations 0,3 and 1,3. Then coefficients in locations 0,2 and 0,3 are executed by RP1, while coefficients in locations 1,2 and 1,3 are executed [...]

[...] at $z = 1$ ($\omega = 0$), where M is the number of vanishing moments. Writing (8) for the prototype filter (7) we obtain two equations, $2a - 2b - 1 = 0$ and $20a - 36b - 9 = 0$, which give the solution $a = 9/16$ and $b = 1/16$. The wavelet filter has the z-transform

$$H_1(z) = (1 - z^{-1})^4 (1 + 4z^{-1} + z^{-2})/16 \tag{9}$$

having a fourth-order root at $z = 1$. The wavelet filter can be realized in the HBF form $H_1(z)$ [...]

[...] wavelet filters both obey the HBF structure (Olkkonen et al., 2007c)

$$H_0(z) = \tfrac{1}{2} + z^{-1}B(z^2), \qquad H_1(z) = \tfrac{1}{2} - z^{-1}B(z^2) \tag{18}$$

For example, the impulse response $h_0[n] = [-1\ 0\ 9\ 16\ 9\ 0\ -1]/32$ has a fourth-order zero at $\omega = \pi$ and $h_1[n] = [1\ 0\ -9\ 16\ -9\ 0\ 1]/32$ has a fourth-order zero at $\omega = 0$. In the tree structured [...]

[...] 2005):

5/3 synthesis algorithm

$$\begin{aligned} \text{step 1:}\quad X(2n) &= Y(2n) - \left\lfloor \frac{Y(2n-1) + Y(2n+1) + 2}{4} \right\rfloor \\ \text{step 2:}\quad X(2n+1) &= Y(2n+1) + \left\lfloor \frac{X(2n) + X(2n+2)}{2} \right\rfloor \end{aligned} \tag{1}$$

9/7 synthesis algorithm

$$\begin{aligned} \text{Step 1:}\quad Y''(2n) &= \tfrac{1}{k}\, Y(2n) \\ \text{Step 2:}\quad Y''(2n+1) &= k\, Y(2n+1) \\ \text{Step 3:}\quad Y'(2n) &= Y''(2n) - \delta\,\bigl(Y''(2n-1) + Y''(2n+1)\bigr) \\ \text{Step 4:}\quad Y'(2n+1) &= Y''(2n+1) - \gamma\,\bigl(Y'(2n) + Y'(2n+2)\bigr) \end{aligned} \tag{2}$$

[...]
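The two steps of the 5/3 synthesis algorithm can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the chapter's hardware dataflow: coefficients are assumed to be interleaved (low-pass at even indices, high-pass at odd indices), boundaries are handled by whole-sample symmetric extension, and the input has at least two samples. The names `synthesis_53`, `analysis_53` and `_mirror` are hypothetical; the matching analysis steps are included only so that the round trip can be tested.

```python
def _mirror(i, n):
    """Index under whole-sample symmetric extension (assumed boundary rule)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def synthesis_53(y):
    """Inverse 5/3 lifting: interleaved integer coefficients y -> samples x."""
    n = len(y)
    x = list(y)
    # step 1: X(2n) = Y(2n) - floor((Y(2n-1) + Y(2n+1) + 2) / 4)
    for i in range(0, n, 2):
        x[i] = y[i] - ((y[_mirror(i - 1, n)] + y[_mirror(i + 1, n)] + 2) >> 2)
    # step 2: X(2n+1) = Y(2n+1) + floor((X(2n) + X(2n+2)) / 2)
    for i in range(1, n, 2):
        x[i] = y[i] + ((x[_mirror(i - 1, n)] + x[_mirror(i + 1, n)]) >> 1)
    return x


def analysis_53(x):
    """Matching forward steps (assumption), used here only for testing."""
    n = len(x)
    y = list(x)
    # High-pass first: odd samples minus the floor-average of even neighbours.
    for i in range(1, n, 2):
        y[i] = x[i] - ((x[_mirror(i - 1, n)] + x[_mirror(i + 1, n)]) >> 1)
    # Low-pass second: even samples plus a rounded quarter of the neighbours.
    for i in range(0, n, 2):
        y[i] = x[i] + ((y[_mirror(i - 1, n)] + y[_mirror(i + 1, n)] + 2) >> 2)
    return y


if __name__ == "__main__":
    samples = [3, 7, 1, 8, 2, 9, 4]
    coeffs = analysis_53(samples)
    print(coeffs)
    print(synthesis_53(coeffs) == samples)
```

Python's `>>` on integers rounds toward negative infinity, so it matches the floor brackets in the two steps even for negative intermediate values.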

Date posted: 21/06/2014, 11:20
