Advanced Computer Architecture - Lecture 2: Quantitative principles. This lecture will cover the following: detailed discussion on the computer performance – the key to quantitative design and analysis; growth in processor performance; price-performance design; CPU performance metrics;...
CS 704 Advanced Computer Architecture
Lecture 2: Quantitative Principles
Detailed discussion on computer performance, the key to quantitative design and analysis
MAC/VU-Advanced Computer Architecture Lecture - Performance

Today's Topics
• Recap of Lecture
• Growth in processor performance
• Price-performance design
• CPU performance metrics
• CPU benchmark suites
• Summary

Recap of Lecture: Computer Systems
• Architecture refers to those attributes of a computer visible to a programmer or compiler writer, e.g., instruction set, addressing techniques, I/O mechanisms, etc.
• Organization refers to how the features of a computer are implemented, e.g., whether control signals are generated using the principles of a finite state machine (FSM) or microprogramming

Recap of Lecture: Computer Development
• Academically, modern computer developments have their infancy in 1944-49
• Commercially, the first machine was built by the Eckert-Mauchly Computer Corporation in 1949
• Technological developments, from vacuum tubes to VLSI circuits, dynamic memory and network technology, gave birth to four different generations of computers
• The microprocessor and PCs were introduced in 1971

Recap of Lecture: Design Perspectives
• Processor: ISA, ILP and cache
• Memory hierarchy: multilevel cache and virtual memory
• Input/output and storage
• Multiprocessors and networks

Recap of Lecture: Computer Design Cycle
• Computer design and development has been under the influence of technology, performance and cost
• The decisive factors for rapid changes in computer development have been performance enhancements, price reduction and functional improvements

Growth in Processor Performance
• The supercomputers and
mainframes, costing millions of dollars and occupying excessively large space, were the prevailing form of computing in the 1960s; they were replaced by relatively low-cost and smaller-sized minicomputers in the 1970s
• In the 1980s, very low-cost microprocessor-based desktop computing machines, in the form of the personal computer (PC) and the workstation, were introduced

Growth in Processor Performance
• The growth in processor performance since the mid-1980s has been substantially higher than in earlier years
• Prior to the mid-1980s, microprocessor performance growth averaged about 35% per year
• By 2001 the growth had risen to a factor of about 1.58 per year (roughly 58% per year)

Growth in Processor Performance
[Figure: Performance relative to MIPS (vertical axis, 200 to 1600) plotted against Year (horizontal axis, 1984 to 2000). Labeled machines: MIPS R2000, IBM Power1, HP 9000, DEC Alpha, Intel P-III.]

Price-Performance Design
• Technology improvements are used to lower cost and increase performance
• The relationship between cost and price is a complex one
• The cost is the total amount spent to produce a product
• The price is the amount for which a finished good is sold

Price-Performance Design
• Time to run the task: execution time, response time, latency
• Throughput or bandwidth: tasks per day, hour, week, sec, ns, ...

Price-Performance Design
• Example: to carry 2400 passengers from Lahore to Islamabad, the train completes the task in 4.0 hours while the airplane completes the same task in 6.0 hours
• i.e., the airplane completes only 66.67% of the task in the time the train needs for all of it; the throughput, and hence the performance, of the train is 50% higher than that of the airplane

Price-Performance Design:
Example

Vehicle   Time Lah to Isb   Passengers/trip   Time to complete job     Execution time/person   Cost/person   Cost-performance
Train     4.0 hours         2400              4.0 hours                6.0 sec                 300 Rs        300 x 6 = 1,800 Rs-sec/person
Plane     45 min            300               45 min x 8 = 6.0 hours   9.0 sec                 3000 Rs       3000 x 9 = 27,000 Rs-sec/person

• The plane is faster per trip (45 minutes against 4 hours, about 5.3 times) but takes 50% more time to complete the job, i.e., it has lower throughput; thus the performance of the train is 50% better than that of the plane
• The time per person and the cost per person of the train are less than those of the plane
• Thus the cost-performance of the train relative to the plane is 1:15 (1,800 against 27,000 Rs-sec/person; lower is better)

Metrics of Performance
• Levels of the system, from top to bottom: Application; Programming Language; Compiler; Instruction Set Architecture; Datapath and Control; Function Units; Transistors, Pins and Wires (I/O)
• Metrics, from top to bottom: answers per month; operations per second; MIPS (millions of instructions per second); MFLOPS (millions of FP operations per second); megabytes per second; cycles per second (clock rate)

Aspects of CPU Performance

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

               Inst Count   CPI   Clock Rate
Program        √
Compiler       √
Inst Set       √            √
Organization                √     √
Technology                        √

Cycles Per Instruction
• Cycles per instruction:

$$\text{CPI} = \frac{\text{CPU clock cycles for program}}{\text{Instruction count}} = \frac{\text{CPU time} \times \text{Clock rate}}{\text{Instruction count}}$$

• Instruction frequency: for an instruction mix, the relative frequency of occurrence of the different types of instructions is

$$\text{FIC}_i = \frac{\text{IC}_i}{\text{Total instruction count}}$$

where $\text{IC}_i$ is the number of executed instructions of type $i$
• Average cycles per instruction:

$$\text{CPI} = \frac{1}{\text{Instruction count}} \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i = \sum_{i=1}^{n} \text{FIC}_i \times \text{CPI}_i$$

Example: Calculating average CPI
Base Machine (Reg / Reg)

Op       Freq   Cycles   CPI(i)   (% Time)
ALU      50%    1        0.5      (33%)
Load     20%    2        0.4      (27%)
Store    10%    2        0.2      (13%)
Branch   20%    2        0.4      (27%)
Total                    1.5
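The average-CPI calculation for the base machine can be checked with a few lines of Python. This is a minimal sketch rather than part of the lecture: the names `MIX`, `average_cpi` and `time_share` are hypothetical, and the per-class cycle counts (1, 2, 2, 2) are an assumption obtained by dividing each CPI(i) entry by its frequency.

```python
# Instruction mix for the base machine (Reg/Reg) example.
# Each entry is (relative frequency, cycles per instruction).
# The cycle counts are assumed: CPI(i) / Freq from the example table.
MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}


def average_cpi(mix):
    """Weighted sum of frequency x cycles over all instruction classes."""
    return sum(freq * cycles for freq, cycles in mix.values())


def time_share(mix):
    """Fraction of total execution cycles spent in each instruction class."""
    total = average_cpi(mix)
    return {op: freq * cycles / total for op, (freq, cycles) in mix.items()}


cpi = average_cpi(MIX)
shares = time_share(MIX)

print(f"average CPI = {cpi:.2f}")
for op, share in shares.items():
    print(f"{op:<6} {share:.0%} of execution time")
```

The time shares differ from the frequencies because a class that needs more cycles per instruction contributes more to the total than its frequency alone suggests.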
Cycles Per Instruction
• Arithmetic mean time:

$$\frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$$

• Weighted arithmetic mean time:

$$\sum_{i=1}^{n} w_i \times \text{Time}_i$$

• Geometric mean time:

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$

Summary: Price-Performance Design
• Computer cost: the total cost of manufacturing a computer is distributed among different parts of the system, such as the cabinet, the processor board and the I/O devices
• Performance: time is the key measurement of performance
• Comparing the performance of two designs: the ratio

$$\eta = \frac{\text{Execution time}_Y}{\text{Execution time}_X}$$

tells how many times faster machine X is than machine Y; as performance is the inverse of execution time,

$$\eta = \frac{\text{Performance}_X}{\text{Performance}_Y}$$

Instruction Execution Rate - MIPS
• MIPS specifies performance inversely to execution time; for a given program:

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}}$$

• MIPS cannot be calculated from the instruction mix
• Relative MIPS for a machine M is defined with respect to a reference machine as:

$$\text{RMIPS} = \frac{\text{Performance}_M}{\text{Performance}_{\text{reference}}} \times \text{MIPS}_{\text{reference}} = \frac{\text{Time}_{\text{reference}}}{\text{Time}_M} \times \text{MIPS}_{\text{reference}}$$

• MFLOPS is defined for floating-point-intensive programs as millions of floating-point operations per second

CPU Benchmark Suites
• Performance comparison: comparing the execution time of the same workload running on two machines without running the actual programs
• Benchmarks: programs specifically chosen to measure performance
• Five levels of programs, in decreasing order of accuracy:
  – Real applications
  – Modified applications
  – Kernels
  – Toy benchmarks
  – Synthetic benchmarks

SPEC: System Performance Evaluation Cooperative
• First round, 1989: 10 programs yielding a single number, SPECmarks
• Second round, 1992: SPECint92 (6 integer
programs) and SPECfp92 (14 floating point programs)
• Third round, 1995: new set of programs, SPECint95 (8 integer programs) and SPECfp95 (10 floating point programs)
  – "benchmarks useful for years"
  – Single flag setting for all programs: SPECint_base95, SPECfp_base95

Summary: Designing and performance comparison
• Designing to Last through Trends

        Capacity          Speed
Logic   2x in ... years   2x in ... years
DRAM    4x in ... years   2x in 10 years
Disk    4x in ... years   2x in 10 years

• 6 years to graduate => 16x CPU speed, DRAM/disk size
• Time to run the task: execution time, response time, latency
• Tasks per day, hour, week, sec, ns, ...: throughput, bandwidth
• "X is n times faster than Y" means

$$n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)}$$

Summary ... Cont'd
• CPI Law:

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

• Execution time is the REAL measure of computer performance!
• Good products are created when we have good benchmarks and good ways to summarize performance
• Die cost goes roughly with (die area)^4

Summary ... Cont'd
• "For better or worse, benchmarks shape a field"
• Good products are created when we have:
  – Good benchmarks
  – Good ways to summarize performance
• Given that sales are in part a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary
• If the benchmarks or the summary are inadequate, one must choose between improving the product for real programs and improving the product to get more sales; sales almost always wins!
• Execution time is the measure of computer performance!
...
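The CPI law, the MIPS definition and the "n times faster" ratio from this lecture can be tied together in a short Python sketch. The function names and all machine parameters below (instruction count, CPI values, clock rates) are hypothetical and chosen only to give round numbers; they do not describe any real processor.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPI law: seconds = instructions x cycles/instruction x seconds/cycle."""
    return instruction_count * cpi / clock_rate_hz


def mips(instruction_count, exec_time_s):
    """Millions of instructions per second for one program run."""
    return instruction_count / (exec_time_s * 1e6)


def times_faster(time_y, time_x):
    """n such that 'X is n times faster than Y': ExTime(Y) / ExTime(X)."""
    return time_y / time_x


# Hypothetical workload and machines (assumed values, not from the lecture).
IC = 2_000_000_000                                   # instructions executed
time_x = cpu_time(IC, cpi=1.5, clock_rate_hz=1e9)    # machine X
time_y = cpu_time(IC, cpi=2.0, clock_rate_hz=8e8)    # machine Y

n = times_faster(time_y, time_x)
mips_x = mips(IC, time_x)
mips_y = mips(IC, time_y)

print(f"X: {time_x:.1f} s, {mips_x:.0f} MIPS")
print(f"Y: {time_y:.1f} s, {mips_y:.0f} MIPS")
print(f"X is {n:.2f} times faster than Y")
```

Because both machines run the same instruction count here, the MIPS ratio equals the execution-time ratio; with different instruction sets or compilers the two can diverge, which is why execution time is treated as the real measure.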
i.e., the yield of the die
Dies of Integrated Circuits
Cost of Integrated...
... in:
- Quadratic rise in transistor count
- Linear increase in performance
4-bit to 64-bit microprocessor
Desktops have replaced time-sharing machines