The architecture of a Vietnamese 32-bit RISC microprocessor, the VN1632

Science & Technology Development Journal, Vol. 14, No. K1 - 2011

THE ARCHITECTURE OF A VIETNAMESE 32-BIT RISC MICROPROCESSOR, THE VN1632

Ngo Duc Hoang, Hau Nguyen Thanh Hoang, Nguyen Phu Quoc, Do Ngoc Quynh
IC Design Research and Education Center (ICDREC)
(Manuscript received on April 8th, 2010; manuscript revised on November 25th, 2010)

ABSTRACT: VN1632 is the first 32-bit Vietnamese-designed microprocessor. Its design is based on the Harvard 32-bit RISC architecture with a five-stage pipeline. This article presents an overview of the architecture and the implementation of the microprocessor. The overview shows the main features, the block diagram, and descriptions of the most salient blocks, namely the registers and the pipeline. The implementation describes the design of each block in detail. A detailed simulation was carried out to check the overall operation of the design. The design was then entrusted to an American fab for fabrication in the IBM 0.13 µm process. Testing results of the VN1632 showed that the architecture works correctly with the desired performance.

Keywords: microprocessor, RISC, computer architecture, pipeline

INTRODUCTION

A microprocessor is a computer in itself. It is, so to speak, a conglomeration of all the functional parts necessary for processing data. RISC, or Reduced Instruction Set Computer, is an architecture that uses a small, highly optimized set of instructions.

The 32-bit microprocessor VN1632 was designed and developed based on the experience accumulated from the success of other 8-bit microcontrollers [1][2][3]. The challenge of this new task was not only the complexity and larger scale of a 32-bit microprocessor. To ensure the originality of the design, many new and hard issues were also studied and implemented: cache memory, prefetch buffer, write buffer, store buffer, bus interface, co-processor, etc.

In the present paper, we introduce the main characteristics of the microprocessor. These are the Harvard 32-bit RISC architecture with a five-stage pipeline and on-chip cache memory, in which the instruction cache and the data cache are separate. The paper also describes the architectural implementation of the microprocessor through a general and a detailed specification. The general specification shows the modules from the top view and the connections among them. The detailed specification shows the implementation inside each module.

The main architectural difference between the VN1632 and other designs is its five-stage pipeline, in which five successive instructions are loaded simultaneously into five different pipeline stages. As a result, five instructions are in execution at the same time, which effectively improves the performance of the microprocessor.

The design has been synthesized, simulated and fabricated in the IBM 0.13 µm process. The results show that this architecture works correctly with the desired performance.

ARCHITECTURE OVERVIEW

2.1 Features
The VN1632 has the following main features:
• Harvard RISC architecture
• Five-stage pipeline architecture
• Separate instruction cache and data cache
• Built-in cache memory
• 65 instructions
• 32-bit instruction width
• Multiply in only … clock cycles
• Debug support with breakpoints
• Synchronous design

2.2 Block diagram
The block diagram of a design gives the top view of the design. The VN1632 comprises the following blocks:
• CPU registers: registers used inside the CPU
• CP0 registers: registers that configure system operations inside and outside the CPU
• ALU/Shifter: computational unit
• MAC: computational unit for multiply/add
• Instruction Cache: a cache memory for instruction fetch
• Data Cache: a cache memory for load/store data
• Bus interface unit: controls the bus interface between the CPU and external circuits

Figure 1. Block diagram
2.3 Registers description
There are two kinds of 32-bit registers in the microprocessor: CP0 registers and CPU registers. CP0 registers contain information to configure system operations inside and outside the CPU, while CPU registers are used for the operation of the CPU itself. The CPU registers are as follows:
• 32 general purpose registers (r0–r31)
• A program counter (PC)
• A Branch Target Address (BTA) register
• HI/LO registers for storing the result of a multiply operation

Figure 2. CPU registers: general purpose registers r0–r31, multiply registers HI and LO, program counter PC, and Branch Target Address BTA

2.4 Pipeline description
The VN1632 uses a five-stage pipeline architecture. The five pipeline stages are Instruction Fetch (F), Instruction Decode (D), Execute (E), Memory Access (M), and Write-Back (W). Each stage performs its own task, interacts with the other stages, and completes in one clock cycle. The stages are implemented as individual modules, which are described later in this paper. When the pipeline is fully utilized, five successive instructions occupy five different pipeline stages simultaneously. Five instructions are therefore in flight at the same time, giving an execution rate of one instruction per cycle. This effectively improves the performance of the microprocessor. The five-stage pipeline architecture is shown in Figure 3.

Figure 3. Five-stage pipeline architecture
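The one-instruction-per-cycle figure can be illustrated with a small Python model. This is a sketch that assumes an ideal pipeline with no stalls, branches or cache misses, all of which real hardware must handle. The function name `pipeline_schedule` is hypothetical. The model records which instruction occupies each of the stages F, D, E, M and W in every clock cycle.

```python
STAGES = ["F", "D", "E", "M", "W"]


def pipeline_schedule(num_instructions: int) -> list[dict[str, int]]:
    """Return, per clock cycle, which instruction index occupies each stage.

    Assumes an ideal five-stage pipeline: instruction i enters F in cycle i
    and advances one stage per cycle with no stalls.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        occupancy = {}
        for depth, stage in enumerate(STAGES):
            instr = cycle - depth  # instruction that entered F `depth` cycles ago
            if 0 <= instr < num_instructions:
                occupancy[stage] = instr
        schedule.append(occupancy)
    return schedule


if __name__ == "__main__":
    sched = pipeline_schedule(8)
    for cycle, occ in enumerate(sched):
        row = " ".join(f"{s}:I{occ[s]}" if s in occ else f"{s}:--" for s in STAGES)
        print(f"cycle {cycle:2d}: {row}")
```

With eight instructions, the model needs 8 + 5 - 1 = 12 cycles instead of 8 × 5 = 40. In cycles 4 to 7, all five stages are busy at once.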
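IMPLEMENTATION

3.1 General Specification
The general specification shows the main framework of a design. Producing it is the first task in implementing the microprocessor VN1632. The general implementation block diagram in Figure 4 shows the main blocks and the connections among the blocks of the microprocessor.

Figure 4. General implementation block diagram (VN16_32 processor core with the FETCH, DECODE, EXECUTE, MEM and WB stages, I_Cache, RF, ALU, D_Cache, CP0 and PC, connected to the AMBA bus through the Bus Interface Unit)

The microprocessor is divided into six modules: FETCH (F), DECODE (D), EXECUTE (EX), MEMORY (MEM), WRITE BACK (WB), and BUS INTERFACE UNIT (BIU). The first five modules correspond to the five stages of the pipeline, namely Instruction Fetch (F), Instruction Decode (D), Execute (E), Memory Access (M), and Write-Back (W).
• FETCH: This module gets instructions from slow external memories and stores them in fast internal memory (I_Cache). The instructions can then be fetched quickly from I_Cache instead of slow external memory.
• DECODE: This module decodes the instructions fetched from I_Cache, then generates signals to control the following stages. It also holds the 32 general purpose registers of the CPU.
• EXECUTE: The main part of this module is an Arithmetic Logic Unit (ALU). The ALU computes results from the operands provided by DECODE and feeds them to the next stage.
• MEMORY: This module gets data from slow external memories and stores them in fast internal memory (D_Cache). The data can then be read/written quickly from/to the internal memory instead of slow external memory.
• WRITE BACK: This module generates the final results, controls branching (performed via the PC), and controls system and co-processing operations (performed by CP0).
• BUS INTERFACE UNIT: This module transmits data from the CPU to the external bus system and receives data from the bus system for the CPU.

3.2 Module FETCH

Figure 5. Block diagram of module FETCH (blocks: INSTRUCTION QUEUE (IQ) with IQ CONTROL, INSTRUCTION ADDRESS (IA), I-CACHE with ICACHE CONTROL, SRAM WAY1, SRAM WAY2 and PREFETCH BUFFER; signals include instr, instr_val, instr_address, address_control, execute_redirect, redirect_address, prefetch_address, ext_req and instruction_ready)

Figure 5 shows the block diagram of module FETCH. The module consists of the following main blocks: INSTRUCTION ADDRESS (IA), SRAMs (SRAM stands for Synchronous Random Access Memory), PREFETCH BUFFER (PB), ICACHE CONTROL, and INSTRUCTION QUEUE (IQ).
• IA: This block generates the 32-bit address pointing to the next instructions. The output address is controlled by signals from IQ and WB. Signals from IQ control the increment of the output address, and signals from WB provide an immediate address to IA.
• SRAMs: These are internal memories that are much faster than external memory. They are also called cache memory. They temporarily store the instructions fetched from external memory, so that instructions are then read from the SRAMs instead of external memory.
• PB: This block fetches instructions from external memory and writes them to the SRAMs. It sends handshaking signals to the BIU and then gets the data from it.
• ICACHE CONTROL: This block is a state machine (SM) that controls all the operations of module FETCH. It gets signals from IQ and PREFETCH BUFFER, then sends control signals back to them. It also determines when to write data to the SRAMs.
• IQ: Instructions are queued in IQ and go in turn to the following stage, on a First In First Out (FIFO) basis. When IQ is "empty" (fewer than one instruction in IQ), it sends a request to the SRAMs and PB. Instructions from the SRAMs or PB then fill up IQ.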
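The IQ behaviour described for module FETCH can be sketched as a FIFO that requests a refill from the SRAMs/prefetch buffer only when it runs empty. This Python sketch is illustrative only. The paper does not give any of the following, so each is an assumption:

- the class name;
- the `fetch_line` callback standing in for the SRAMs and PB;
- the queue depth of 4;
- the `redirect` method modelling a branch signalled from WB.

```python
from collections import deque
from typing import Callable


class InstructionQueue:
    """FIFO model of the IQ block (hypothetical, for illustration only)."""

    def __init__(self, fetch_line: Callable[[int, int], list[int]],
                 start_addr: int, depth: int = 4):
        # fetch_line(addr, n) stands in for the SRAMs / prefetch buffer:
        # it returns up to n consecutive 32-bit instruction words from addr.
        self.fetch_line = fetch_line
        self.next_addr = start_addr  # next address to request (role of IA)
        self.depth = depth           # assumed queue depth
        self.queue: deque[tuple[int, int]] = deque()  # (address, instruction)
        self.requests = 0            # refill requests sent to SRAMs/PB

    def _refill(self) -> None:
        # IQ is "empty" (fewer than one instruction): request new instructions.
        self.requests += 1
        for word in self.fetch_line(self.next_addr, self.depth):
            self.queue.append((self.next_addr, word))
            self.next_addr += 4  # 32-bit instructions are 4 bytes apart

    def pop(self) -> tuple[int, int]:
        """Hand the oldest instruction to DECODE (first in, first out)."""
        if not self.queue:
            self._refill()
        return self.queue.popleft()

    def redirect(self, target: int) -> None:
        """Flush the queue and restart at a branch target (assumed WB signal)."""
        self.queue.clear()
        self.next_addr = target


if __name__ == "__main__":
    # Toy memory in which each instruction word equals its own address.
    iq = InstructionQueue(lambda addr, n: [addr + 4 * i for i in range(n)], start_addr=0)
    print([iq.pop() for _ in range(5)])  # the second refill happens on the 5th pop
    print(iq.requests)
```

Refilling only on "empty" keeps the control logic simple. A real design would more likely start prefetching before the queue drains, so that DECODE never waits.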
3.3 Module DECODE

Figure 6. Block diagram of module DECODE

Figure 6 shows the block diagram of module DECODE. The module consists of the following main blocks: INSTRUCTION DECODE, REG FILE, OPERAND DECODE, and DATA DEPENDENCY.
• INSTRUCTION DECODE: This block decodes the instruction supplied by module FETCH, then generates the following control signals:
  - Branch function (brn_func)
  - Immediate value (imm)
  - CP0 function (cp0_func)
  - Load/Store function (ls_func)
  - ALU function (alu_func)
  - Operand select (op1_sel, op2_sel)
  - Destination select (dest_sel)
• REG FILE: This block contains the 32 general purpose registers and the HI/LO registers.
• OPERAND DECODE: This block chooses the two operands of the instruction. The selection depends on control signals from INSTRUCTION DECODE and DATA DEPENDENCY.
• DATA DEPENDENCY: This block determines the data dependencies on the three following stages (EXECUTE, MEMORY and WRITE BACK), then sends selection signals to OPERAND DECODE.
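One common way to realize the DATA DEPENDENCY / OPERAND DECODE pair is result forwarding (bypassing). If an instruction still in EXECUTE, MEMORY or WRITE BACK will write a register that the decoding instruction reads, the operand is taken from that stage instead of the register file. The paper does not give the selection rules. The Python sketch below therefore assumes conventional forwarding with priority to the youngest producer; `StageInfo` and `select_operand` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageInfo:
    """Register write pending in a later stage (hypothetical model)."""
    name: str            # "E", "M" or "W"
    dest: Optional[int]  # destination register number, or None if no write
    value: int = 0       # result value held by that stage


def select_operand(src: int, regfile: list[int],
                   later: list[StageInfo]) -> tuple[str, int]:
    """Choose where the value of source register `src` comes from.

    `later` is ordered from youngest to oldest (E, M, W). The youngest stage
    that writes `src` holds the most recent value, so it wins; otherwise the
    operand is read from the register file ("RF").
    """
    for stage in later:
        if stage.dest == src:
            return stage.name, stage.value
    return "RF", regfile[src]


if __name__ == "__main__":
    regfile = [0] * 32
    regfile[5] = 11
    in_flight = [StageInfo("E", 5, 42), StageInfo("M", 5, 17), StageInfo("W", None)]
    print(select_operand(5, regfile, in_flight))  # E wins over the older M result
    print(select_operand(6, regfile, in_flight))  # no producer: read register file
```

A real design must also stall when the producing instruction is a load whose data is not available until MEMORY. The sketch ignores that case.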
3.4 Module EXECUTE

Figure 7. Block diagram of module EXECUTE

Figure 7 shows the block diagram of module EXECUTE. The module consists of two main blocks: ALU and MULT.
• ALU (Arithmetic Logic Unit): This block calculates and generates results based on the two 32-bit operands OP_1 and OP_2 and the control signal alu_func. Each operation takes one clock cycle. The ALU performs the following operations: add, subtract, shift, and, or, xor, not, compare, etc.
• MULT (multiplier): This block multiplies the two 32-bit operands OP_1 and OP_2 and generates a 64-bit product, which is stored in the HI/LO registers. A multiplication takes … clock cycles.

3.5 Module MEMORY

Figure 8. Block diagram of module MEMORY

Figure 8 shows the block diagram of module MEMORY. The module consists of the following main blocks: LS CONTROL, STB, PB, WB, SRAMs, and MUX.
• LS CONTROL (Load/Store Control): This is a state machine that controls all the operations of module MEMORY. It gets signals from the other blocks, then sends control signals back to them. It also determines when to write data to the SRAMs.
• STB (Store Buffer): Data is temporarily held in STB before being stored in the SRAMs and WB.
• PB (Prefetch Buffer): This block fetches data from external memory and writes it to the SRAMs. It sends handshaking signals to the BIU and then gets the data from it.
• WB (Write Buffer): Data is held pending in WB before being written to external memory.
• SRAMs: These are internal memories that are much faster than external memory. They are also called cache memory. They temporarily store data fetched from external memory, so that data are then read from the SRAMs instead of external memory.
• MUX (multiplexer): This multiplexer selects either the result from the ALU or the result from the D-Cache.
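The role of the write buffer in module MEMORY can be illustrated as a bounded FIFO. Stores are accepted at core speed and retired to slow external memory one bus transaction at a time, so the pipeline only has to wait when the buffer is full. This Python sketch assumes a depth of 4 entries and models external memory as a dictionary. The class and method names are hypothetical.

```python
from collections import deque


class WriteBuffer:
    """Bounded FIFO of pending external-memory writes (hypothetical model)."""

    def __init__(self, depth: int = 4):
        self.depth = depth  # assumed number of entries
        self.pending: deque[tuple[int, int]] = deque()  # (address, data)

    def push(self, addr: int, data: int) -> bool:
        """Accept a store; return False when full (the pipeline would stall)."""
        if len(self.pending) >= self.depth:
            return False
        self.pending.append((addr, data))
        return True

    def drain_one(self, external_memory: dict[int, int]) -> bool:
        """Retire the oldest pending write, as one bus transaction would."""
        if not self.pending:
            return False
        addr, data = self.pending.popleft()
        external_memory[addr] = data
        return True


if __name__ == "__main__":
    memory: dict[int, int] = {}
    wb = WriteBuffer(depth=2)
    print(wb.push(0x10, 1), wb.push(0x14, 2), wb.push(0x18, 3))  # third store must wait
    wb.drain_one(memory)           # one bus transaction frees an entry
    print(wb.push(0x18, 3))
    while wb.drain_one(memory):    # let the bus empty the buffer
        pass
    print(memory)
```

Because writes are retired in FIFO order, external memory sees the stores in program order.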
3.6 Module WRITE BACK

Figure 9. Block diagram of module WRITE BACK

Figure 9 shows the block diagram of module WRITE BACK. The module consists of two main blocks: BRANCH CONTROL and CP0.
• BRANCH CONTROL: This block controls branching in the CPU. It contains the register PC and BRANCH SM. The register PC holds the address of the current instruction. BRANCH SM determines when to perform a branch and where the branch goes.
• CP0 (Co-processor 0): This block contains the CP0 registers that hold the configuration of the whole CPU system. It also controls the handling of interrupts and software traps.

3.7 Module BUS INTERFACE UNIT

Figure 10. Block diagram of module BUS INTERFACE UNIT (blocks: FSM, CPU_IF, ADDR_SIZE_GEN with rd_addr_gen and wr_addr_size_gen, AHB_IF; signals include ic_ext_req, dc_ext_req_rd, dc_ext_req_wr, dc_byte_val_rd[3:0], dc_byte_val_wr[3:0], rd_byte_addr[1:0], wr_byte_addr[1:0], rd_size[2:0], wr_size[2:0], biu_ext_data[31:0], biu_data_ready, biu_instr_ready, biu_haddr[31:0], biu_hsize[2:0], biu_hburst[2:0], biu_htrans[1:0], biu_hwrite and biu_hwdata[31:0])

Figure 10 shows the block diagram of module BUS INTERFACE UNIT. The module consists of four main blocks: FSM, ADDR_SIZE_GEN, CPU_IF and AHB_IF.
• FSM: This block is a state machine that controls the operation of the other blocks.
• ADDR_SIZE_GEN: This block generates the addresses used to determine which byte/word is written or read. It also generates the size of the read/write data. The addresses and size are used in the AHB_IF block.
• CPU_IF (CPU interface): This block communicates with the CPU.
• AHB_IF (AHB interface): This block communicates with the external bus.
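The signal names in the BUS INTERFACE UNIT block diagram (for example dc_byte_val_wr[3:0], wr_byte_addr[1:0] and biu_hsize[2:0]) suggest that ADDR_SIZE_GEN turns a 4-bit byte-valid mask into a byte address and an AHB transfer size. The Python sketch below shows one way to do that, under two assumptions:

- bit i of the mask marks byte lane i (little-endian);
- only naturally aligned byte, halfword and word accesses occur.

The HSIZE encodings (000 byte, 001 halfword, 010 word) follow the AMBA AHB specification. The function and table names are hypothetical.

```python
# AMBA AHB HSIZE encodings for the transfer sizes a 32-bit core needs.
HSIZE_BYTE = 0b000
HSIZE_HALFWORD = 0b001
HSIZE_WORD = 0b010

# byte-valid mask (bit i = byte lane i, little-endian assumed) -> (addr[1:0], HSIZE)
_MASK_TABLE = {
    0b0001: (0, HSIZE_BYTE),
    0b0010: (1, HSIZE_BYTE),
    0b0100: (2, HSIZE_BYTE),
    0b1000: (3, HSIZE_BYTE),
    0b0011: (0, HSIZE_HALFWORD),
    0b1100: (2, HSIZE_HALFWORD),
    0b1111: (0, HSIZE_WORD),
}


def addr_size_gen(word_addr: int, byte_valid: int) -> tuple[int, int]:
    """Return (byte address, HSIZE) for one aligned AHB transfer.

    `word_addr` is the word-aligned address of the access; `byte_valid` is a
    4-bit mask of the byte lanes to read or write.
    """
    if word_addr & 0b11:
        raise ValueError("word_addr must be word aligned")
    try:
        offset, hsize = _MASK_TABLE[byte_valid]
    except KeyError:
        raise ValueError(f"mask {byte_valid:04b} is not a single aligned transfer") from None
    return word_addr | offset, hsize


if __name__ == "__main__":
    print(addr_size_gen(0x1000, 0b1111))  # full word at 0x1000
    print(addr_size_gen(0x1000, 0b0100))  # single byte in lane 2
```

A mask such as 0b0110 covers two bytes that do not form a naturally aligned halfword, so it cannot be sent as a single AHB transfer. The sketch rejects it rather than splitting it.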
RESULTS

The VN1632 has been designed and fabricated in the IBM 0.13 µm process. The prototype chips have been tested with many applications, and the results show that the chip works correctly with the desired performance. The characteristics of the design are as follows:

Parameter      Value
Process        IBM 130 nm
Frequency      104 MHz
Power          30.6 mW
Resource       249,606 gates
Width          1144 µm
Height         1138 µm
Voltage        1.08 to 1.65 V
Temperature    -55 to +127 °C
I/O pads       284

CONCLUSION

We have reported the architecture of the VN1632, which employs a five-stage pipeline. We observed that this pipeline architecture greatly improves the microprocessor's performance. Furthermore, we found that the architecture has many good features that work effectively. Therefore, it should be carried over into the next generation of Vietnamese 32-bit microprocessors.

THE ARCHITECTURE OF A VIETNAMESE 32-BIT RISC MICROPROCESSOR, THE VN1632 CHIP
Ngô Đức Hoàng, Hầu Nguyễn Thanh Hoàng, Nguyễn Phú Quốc, Đỗ Ngọc Quỳnh
IC Design Research and Education Center

ABSTRACT: The VN1632 is the first 32-bit microprocessor designed in Vietnam. The design is based on the Harvard 32-bit RISC architecture with a five-stage pipeline. The paper gives an overview of the design and presents its hardware implementation. The overview describes the characteristics of the design: the block diagram, the register set and the pipeline structure. The hardware implementation part describes each block in detail. A detailed simulation was built to check the entire operation of the design. After completion, the design was sent for fabrication in IBM 0.13 µm technology at a chip fabrication plant in the USA. The VN1632 chip has been tested in practice, and the results show that the architecture works with the targeted performance.

REFERENCES
[1] The first made-in-Viet Nam 8-bit chip named SigmaK3 (2008): http://forum.eetasia.com
[2] Vietnam - The Rising Tiger in the Semiconductor Industry (2008): http://www.frost.com/prod/servlet/marketinsight-top.pag?docid=125651805
[3] HN-07 microprocessor – the second Vietnamese RISC microprocessor, Science & Technology Development, Vol. 12, No. 16 (2009).
