Advanced Computer Architecture - Lecture 10: Computer hardware design. This lecture will cover the following: pipeline datapath and control design; features of pipelined processor; pipelining lessons; pipelined processor design; pipelined registers included; multiple cycle versus pipeline – pipeline enhances performance;...
CS 704 Advanced Computer Architecture
Lecture 10: Computer Hardware Design (Pipeline Datapath and Control Design)
Prof. Dr. M. Ashraf Chughtai
MAC/VU-Advanced Computer Architecture, Lecture 10 – Computer Hardware Design (4)

Recap
- Single cycle versus multi cycle datapath
- Key components of multi cycle datapath
- Design and information flow in multi cycle datapath
- Multi cycle control unit design
- Finite State Machine-based control unit
- Microprogram-based controller

What is pipelining?
- Pipelining is a fundamental concept
- It utilizes capabilities of the datapath by ...

Pipelining is Natural! Laundry Example
- Four loads: A, B, C, D
- Four laundry operations: wash, dry, fold, and place into drawers
- Washer takes 30 minutes
- Dryer takes 30 minutes
- "Folder" takes 30 minutes
- "Stasher" takes 30 minutes to put clothes into drawers

Sequential Laundry
[Figure: task order (loads A, B, C, D) against time; the four loads run one after another and occupy sixteen 30-minute slots]

Pipelined Laundry: Start work ASAP
[Figure: task order (loads A, B, C, D) against time; the loads overlap and occupy seven 30-minute slots]
Pipelined laundry takes 3.5 hours for the four loads!

Features of Pipelined Processor
- All the functional units operate independently
- Multiple tasks operate simultaneously using different resources
- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
- Potential speedup = number of pipe stages
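The laundry timings above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming every stage takes the same time and no load ever has to wait for another; the function names are illustrative and not part of the lecture.

```python
def sequential_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """Each load finishes all of its stages before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """The first load needs `stages` stage times; every later load
    finishes one stage time after the load ahead of it."""
    return (stages + loads - 1) * stage_minutes


if __name__ == "__main__":
    # Four loads, four 30-minute operations (wash, dry, fold, stash).
    seq = sequential_minutes(4, 4, 30)
    pipe = pipelined_minutes(4, 4, 30)
    print(f"sequential: {seq / 60} h, pipelined: {pipe / 60} h")
    print(f"speedup: {seq / pipe:.2f}")
```

With only four loads the speedup stays well below the potential of four because of the fill and drain time; with many more loads it approaches the number of stages.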

Pipelining Lessons
Pipeline rate is limited by:
- Slowest pipeline stage
- Time to "fill" the pipeline and time to "drain" it reduces speedup
- Unbalanced lengths of pipe stages reduce speedup
If the washer takes longer than the dryer, the dryer has to wait!
Stall for dependences

Five Steps of Datapath
Ins fetch, Dec/Reg, Exec, Mem, Wr

Pipelined Processor Design
[Figure: datapath divided into Instruction Fetch, ID/Register Read, Execute/Address, Memory Access, and Write Back (Reg Wrt); labels include PC, Next PC, Inst Mem, IR, Reg File, Dcd Ctrl, Ex Ctrl, Exec, Equal, A, B, S, M, IRex, Mem Ctrl, Data Mem, IRmem, WB Ctrl, IRwb]

Pipelined Registers Included
[Figure: the same datapath with the pipeline registers included]

Five Steps as Stages of Pipeline
          Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
Load      Ifetch    Reg/Dec   Exec      Mem       Wr

Multiple Cycle versus Pipeline: Pipeline enhances performance
Multiple cycle implementation (instructions run one after another):
  Load:   Ifetch Reg Exec Mem Wr
  Store:  Ifetch Reg Exec Mem
  Rtype:  Ifetch Reg Exec Mem
Pipeline implementation (a new instruction starts every cycle):
  Load:   Ifetch Reg Exec Mem Wr
  Store:         Ifetch Reg Exec Mem Wr
  Rtype:                Ifetch Reg Exec Mem Wr

Instructions program reconsidered: Load, Store, R-type (ADD)

Example
The cycle time of a single cycle machine is 45 ns, and that of the multi cycle and pipelined machines is 10 ns; the average CPI due to the instruction mix on the multi cycle machine is 4.6. What is the execution time of a 100-instruction program on each type of machine?
Ans:
- Single cycle machine: 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
- Multi cycle machine: 10 ns/cycle x 4.6 CPI x 100 inst = 4600 ns
- Pipelined machine: 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns

Another Example
Consider a multicycle, unpipelined processor that requires 4 cycles for the ALU and branch operations and 5 cycles for the memory operations. Assume the relative frequencies of these operations are 40%, 25% and 35% respectively, and the clock cycle is 1 ns. In the pipelined implementation, clock skew and setup add 0.2 ns to the clock. Ignoring any latency impact, how much is the speedup from the pipelined processor?

Solution
Unpipelined processor:
Average execution time per instruction = clock cycle x average CPI
  = 1 ns x [(0.4 + 0.25) x 4 + 0.35 x 5]
  = 1 ns x (0.65 x 4 + 0.35 x 5)
  = 1 ns x (2.60 + 1.75)
  = 4.35 ns
Pipelined processor:
Average execution time per instruction = clock cycle + overhead = 1 ns + 0.2 ns = 1.2 ns
Speedup = 4.35 / 1.2 = 3.625 times

Pipelined Execution Representation
Conventional representation: helps show the program flow vis-a-vis time
Program flow   Time ->
1st Inst   IFetch  Dcd     Exec    Mem     WB
2nd Inst           IFetch  Dcd     Exec    Mem     WB
3rd Inst                   IFetch  Dcd     Exec    Mem     WB
4th Inst                           IFetch  Dcd     Exec    Mem     WB
5th Inst                                   IFetch  Dcd     Exec    Mem     WB

Graphical Representation
[Figure: instruction order against time (clock cycles CC1 to CC9); each instruction passes through I.Mem, Reg, ALU, D.Mem, Reg in successive cycles]
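The two worked examples above can be reproduced with a short script. This is a sketch under stated assumptions: the pipelined machine is taken to have 5 stages (so the drain is 4 cycles), and the cycle counts of 4 (ALU and branch) and 5 (memory) are the values implied by the solution's arithmetic. All function names are hypothetical.

```python
def single_cycle_ns(cycle_ns: float, n_inst: int) -> float:
    """Single cycle machine: CPI is 1 by construction."""
    return cycle_ns * 1 * n_inst


def multi_cycle_ns(cycle_ns: float, avg_cpi: float, n_inst: int) -> float:
    """Multi cycle machine: time = cycle time x average CPI x instructions."""
    return cycle_ns * avg_cpi * n_inst


def pipelined_ns(cycle_ns: float, n_inst: int, stages: int = 5) -> float:
    """Pipelined machine: one instruction per cycle plus (stages - 1)
    cycles to drain the pipeline. The 5-stage default is an assumption."""
    return cycle_ns * (n_inst + stages - 1)


def average_cpi(mix: list[tuple[float, int]]) -> float:
    """mix holds (relative frequency, cycles) pairs for each operation class."""
    return sum(freq * cycles for freq, cycles in mix)


def pipeline_speedup(mix, clock_ns: float, overhead_ns: float) -> float:
    """Unpipelined time per instruction over pipelined time per instruction."""
    unpipelined = clock_ns * average_cpi(mix)
    pipelined = clock_ns + overhead_ns
    return unpipelined / pipelined


if __name__ == "__main__":
    print(single_cycle_ns(45, 100), multi_cycle_ns(10, 4.6, 100), pipelined_ns(10, 100))
    # ALU 40% and branch 25% at 4 cycles, memory 35% at 5 cycles.
    mix = [(0.40, 4), (0.25, 4), (0.35, 5)]
    print(average_cpi(mix), pipeline_speedup(mix, clock_ns=1.0, overhead_ns=0.2))
```

Keeping the instruction mix as a list of pairs makes it easy to try other frequencies or cycle counts without touching the formulas.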

Why Pipeline? Because the resources are there!
[Figure: instruction order against time (clock cycles); each instruction uses Im, Reg, ALU, Dm, Reg in successive cycles, so overlapping instructions keep different resources busy]

Can pipelining get us into trouble?
- Structural hazards
- Data hazards
- Control hazards

How does a stall degrade performance?
Pipelined CPI with stalls = ideal CPI + stall clock cycles per instruction

Assuming an ideal pipelined CPI of 1 and equal clock cycle times:
Speedup w.r.t. unpipelined = CPI unpipelined / (1 + stall cycles per instruction)
Speedup w.r.t. pipeline depth = pipeline depth / (1 + stall cycles per instruction)

Summary
- Multi cycle datapath versus pipeline datapath
- Key components of pipeline datapath
- Performance enhancement due to pipeline
- Hazards in pipelined datapath

Assalam-u-Alaikum and Allah Hafiz
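As a supplement to the stall slides, the effect of stalls on CPI and speedup can be sketched numerically. The sketch takes the speedup as pipeline depth divided by (1 + stall cycles per instruction), which holds under the assumption that the ideal pipelined CPI is 1 and the clock cycle is unchanged. The function names and the sample stall rate of 0.25 are illustrative only.

```python
def pipelined_cpi(ideal_cpi: float, stall_cycles_per_inst: float) -> float:
    """CPI with stalls = ideal CPI + stall clock cycles per instruction."""
    return ideal_cpi + stall_cycles_per_inst


def speedup_from_depth(pipeline_depth: int, stall_cycles_per_inst: float) -> float:
    """Speedup over the unpipelined machine, assuming an ideal CPI of 1
    and that the unpipelined CPI equals the pipeline depth."""
    return pipeline_depth / (1.0 + stall_cycles_per_inst)


if __name__ == "__main__":
    depth = 5
    for stalls in (0.0, 0.25, 1.0):
        print(stalls, pipelined_cpi(1.0, stalls), speedup_from_depth(depth, stalls))
```

With no stalls the speedup equals the pipeline depth; every additional stall cycle per instruction pulls it down.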