Computer Architecture Exercises

Chapter 1.

Ex. 1. Consider two different implementations, M1 and M2, of the same instruction set. There are three classes of instructions (A, B, and C) in the instruction set. M1 has a clock rate of 80 MHz and M2 has a clock rate of 100 MHz. The average number of cycles for each instruction class and their frequencies (for a typical program) are as follows:
a) Calculate the average CPI for each machine, M1 and M2.
b) Calculate the average MIPS rating for each machine, M1 and M2.
c) Which machine has the smaller MIPS rating? Which individual instruction class CPI do you need to change, and by how much, for this machine to have the same or better performance as the machine with the higher MIPS rating (you can only change the CPI for one of the instruction classes on the slower machine)?

Ex. 2. (Amdahl’s law question) Suppose you have a machine which executes a program consisting of 50% floating-point multiply, 20% floating-point divide, and the remaining 30% from other instructions.
a) Management wants the machine to run 4 times faster. You can make the divide run at most 3 times faster and the multiply run at most 8 times faster. Can you meet management’s goal by making only one improvement, and if so, which one?
b) Dogbert has now taken over the company, removing all the previous managers. If you make both the multiply and divide improvements, what is the speed of the improved machine relative to the original machine?

Ex. 3. Suppose that we can improve the floating-point instruction performance of a machine by a factor of 15 (the same floating-point instructions run 15 times faster on this new machine). What percentage of the instructions must be floating point to achieve a speedup of at least 4?

Ex. 4. Just as we defined the MIPS rating, we can also define the MFLOPS rating, which stands for Millions of Floating-Point Operations per Second.
If Machine A has a higher MIPS rating than Machine B, does Machine A necessarily have a higher MFLOPS rating than Machine B?
Note: The MIPS rating is defined by: MIPS = (Clock Rate)/(CPI * 10^6)

Ex. 5. Assume that a design team is considering enhancing a machine by adding MMX (multimedia extension instruction) hardware to a processor. When a computation is run in MMX mode on the MMX hardware, it is 10 times faster than the normal mode of execution. Call the percentage of time that could be spent using the MMX mode the percentage of media enhancement.
a) What percentage of media enhancement is needed to achieve an overall speedup of 2?
b) What percentage of the run time is spent in MMX mode if a speedup of 2 is achieved? (Hint: You will need to calculate the new overall time.)
c) What percentage of media enhancement is needed to achieve one-half of the maximum speedup attainable from using the MMX mode?

Ex. 6. If processor A has a higher clock rate than processor B, and processor A also has a higher MIPS rating than processor B, explain whether processor A will always execute faster than processor B.
Suppose that there are two implementations of the same instruction set architecture. Machine A has a clock cycle time of 20 ns and an effective CPI of 1.5 for some program, and machine B has a clock cycle time of 15 ns and an effective CPI of 1.0 for the same program. Which machine is faster for this program, and by how much?
Note: The MIPS rating is defined by: MIPS = (Clock Rate)/(CPI * 10^6)

Ex. 7. Suppose a program segment consists of a purely sequential part which takes 25 cycles to execute, and an iterated loop which takes 100 cycles per iteration. Assume the loop iterations are independent and cannot be further parallelized. If the loop is to be executed 100 times, what is the maximum speedup possible using an infinite number of processors (compared to a single processor)?

Ex. 8.
Computer A has an overall CPI of 1.3 and can be run at a clock rate of 600 MHz. Computer B has a CPI of 2.5 and can be run at a clock rate of 750 MHz. We have a particular program we wish to run. When compiled for computer A, this program has exactly 100,000 instructions. How many instructions would the program need to have when compiled for computer B in order for the two computers to have exactly the same execution time for this program?

Ex. 9. The design team for a simple, single-issue processor is choosing between a pipelined and a non-pipelined implementation. Here are some design parameters for the two possibilities:

Parameter                      Pipelined Version   Non-pipelined Version
Clock Rate                     500 MHz             350 MHz
CPI for ALU instructions       1                   1
CPI for control instructions   2                   1
CPI for memory instructions    2.7                 1

a) For a program with 20% ALU instructions, 10% control instructions and 75% memory instructions, which design will be faster? Give a quantitative CPI average for each case.
b) For a program with 80% ALU instructions, 10% control instructions and 10% memory instructions, which design will be faster? Give a quantitative CPI average for each case.

Ex. 10. A designer wants to improve the overall performance of a given machine with respect to a target benchmark suite and is considering an enhancement X that applies to 50% of the original dynamically executed instructions and speeds each of them up by a factor of 3. The designer’s manager has some concerns about the complexity and the cost-effectiveness of X and suggests that the designer consider an alternative enhancement Y. Enhancement Y, if applied only to some (as yet unknown) fraction of the original dynamically executed instructions, would make them only 75% faster. Determine what percentage of all dynamically executed instructions should be optimized using enhancement Y in order to achieve the same overall speedup as obtained using enhancement X.

Ex. 11.
Prior to the early 1980s, machines were built with more and more complex instruction sets. The MIPS is a RISC machine. Why has there been a move away from complex instruction set machines toward RISC machines?

Chapter 2.

Ex. 12. Translate the following code into MIPS assembly:
    x = x + y + z - q;
Assume that x, y, z, q are stored in registers $s1-$s4.

Ex. 13. In MIPS assembly, write an assembly language version of the following C code segment:
    int A[100], B[100];
    for (i=1; i < 100; i++) {
        A[i] = A[i-1] + B[i];
    }
At the beginning of this code segment, the only values in registers are the base addresses of arrays A and B, in registers $a0 and $a1. Avoid the use of multiplication instructions; they are unnecessary.

Ex. 14. Consider the following assembly code for parts a) and b).
    r1 = 99
    Loop: r1 = r1 - 1
    branch r1 > 0, Loop
    halt
a) During the execution of the above code, how many dynamic instructions are executed?
b) Assuming a standard single-cycle machine running at 100 kHz, how long will the above code take to complete?

Ex. 15. Convert the C function below to MIPS assembly language. Make sure that your assembly language code could be called from a standard C program (that is to say, make sure you follow the MIPS calling conventions).
    unsigned int sum(unsigned int n)
    {
        if (n == 0) return 0;
        else return n + sum(n-1);
    }
This machine has no delay slots. The stack grows downward (toward lower memory addresses). The following registers are used in the calling convention:

Ex. 16. In the snippet of MIPS assembly code below, how many times is instruction memory accessed? How many times is data memory accessed? (Count only accesses to memory, not registers.)
    lw $v1, 0($a0)
    addi $v0, $v0, 1
    sw $v1, 0($a1)
    addi $a0, $a0, 1

Ex. 17. Use the register and memory values in the table below for the next questions. Assume a 32-bit machine.
Assume each of the following questions starts from the table values; that is, DO NOT propagate value changes from one question into later parts of the question.
a) Give the values of R1, R2, and R3 after this instruction: add R3, R2, R1
b) What values will be in R1 and R3 after this instruction is executed: load R3, 12(R1)
c) What values will be in the registers after this instruction is executed: addi R2, R3, #16

Ex. 18. Loop Unrolling and Fibonacci: Consider the following pseudo-C code to compute the fifth Fibonacci number (F(5)).
    int a,b,i,t;
    a=b=1; /* Set a and b to F(2) and F(1) respectively */
    for(i=0;i<2;i++)
    {
        t=a; /* save F(n-1) to a temporary location */
        a+=b; /* F(n) = F(n-1) + F(n-2) */
        b=t; /* set b to F(n-1) */
    }
One observation that a compiler might make is that the loop construction is somewhat unnecessary. Since the range of the loop indices is fixed, one can unroll the loop by simply writing three iterations of the loop one after the other without the intervening increment/comparison on i. For example, the above could be written as:
    int a,b,t;
    a=b=1;
    t=a;
    a+=b;
    b=t;
    t=a;
    a+=b;
    b=t;
a) Convert the pseudo-C code for both of the snippets above into reasonably efficient MIPS code. Represent each variable of the pseudo-C program with a register. Try to follow the pseudo-C code as closely as possible (i.e. the first snippet should have a loop in it, while the second should not).
b) Now suppose that instead of the fifth Fibonacci number we decided to compute the 20th. How many static instructions would there be in the first version and how many would there be in the unrolled version? What about dynamic instructions? You do not need to write out the assembly for this part.

Ex. 19.
In MIPS assembly, write an assembly language version of the following C code segment:
    for (i = 0; i < 98; i++) {
        C[i] = A[i + 1] - A[i] * B[i + 2];
    }
Arrays A, B and C start at memory locations A000 hex, B000 hex and C000 hex respectively. Try to reduce the total number of instructions and the number of expensive instructions such as multiplies.

Ex. 20. Suppose that a new MIPS instruction, called bcp, was designed to copy a block of words from one address to another. Assume that this instruction requires that the starting address of the source block be in register $t1 and that the destination address be in $t2. The instruction also requires that the number of words to copy be in $t3 (which is > 0). Furthermore, assume that the values of these registers as well as register $t4 can be destroyed in executing this instruction (so that the registers can be used as temporaries to execute the instruction). Do the following:
a) Write the MIPS assembly code to implement a block copy without this instruction.
b) Write the MIPS assembly code to implement a block copy with this instruction.
c) Estimate the total cycles necessary for each realization to copy 100 words on the multicycle machine.

Ex. 21. This problem covers 4-bit binary multiplication. Fill in the table for the Product, Multiplier and Multiplicand for each step. You need to provide the DESCRIPTION of the step being performed (shift left, shift right, add, no add). The value of M (Multiplicand) is 1011; Q (Multiplier) is initially 1010.

Ex. 22. This problem covers the IEEE floating-point format.
a) List four floating-point operations that cause a NaN to be created.
b) Assuming single-precision IEEE 754 format, what decimal number is represented by this word:
    1 01111101 00100000000000000000000
(Hint: remember to use the biased form of the exponent.)

Ex. 23. The floating-point format to be used in this problem is an 8-bit IEEE 754 normalized format with 1 sign bit, 4 exponent bits, and 3 mantissa bits.
It is identical to the 32-bit and 64-bit formats in terms of the meaning of fields and special encodings. The exponent field employs a bias-7 coding. The bit fields in a number are (sign, exponent, mantissa). Assume that we use the unbiased round-to-nearest-even mode specified in the IEEE floating-point standard.
a) Encode the following numbers in the 8-bit IEEE format:
   i) 0.0011011 binary
   ii) 6.0 decimal
b) Perform the computation 1.011 binary + 0.0011011 binary.
c) Decode the following 8-bit IEEE number into its decimal value: 1 1010 101
d) Decide which number in each of the following pairs is greater in value (the numbers are in 8-bit IEEE 754 format):
   i) 0 0100 100 and 0 0100 111
   ii) 0 1100 100 and 1 1100 101
e) In the 32-bit IEEE format, what is the encoding for negative zero?
f) In the 32-bit IEEE format, what is the encoding for positive infinity?

Ex. 24. The floating-point format to be used in this problem is a normalized format with 1 sign bit, 3 exponent bits, and 4 mantissa bits. The exponent field employs an excess-4 coding. The bit fields in a number are (sign, exponent, mantissa). Assume that we use the unbiased round-to-nearest-even mode specified in the IEEE floating-point standard.
a) Encode the following numbers in the above format:
   i) 1.0 binary
   ii) 0.0011011 binary
Note: The guard bit is an extra bit that is added at the least significant bit position during an arithmetic operation to prevent loss of significance. The round bit is a second extra bit, to the right of the guard bit, that is used during a floating-point arithmetic operation to prevent loss of precision during intermediate additions. The sticky bit keeps a record of any 1s that have been shifted out to the right beyond the guard and round bits.
b) Using 32-bit IEEE 754 single-precision floating point with one (1) sign bit, eight (8) exponent bits and twenty-three (23) mantissa bits, show the representation of -11/16 (-0.6875).
c) What is the smallest positive (not including +0) representable number in 32-bit IEEE 754 single-precision floating point? Show the bit encoding and the value in base 10 (fraction or decimal OK).

Ex. 25. Perform the following operations by converting the operands to 2’s complement binary numbers and then doing the addition or subtraction shown. Please show all work in binary, operating on 16-bit numbers.
a) 3 + 12
b) 13 - 2
c) 5 - 6
d) -7 - (-7)

Ex. 26. Define the WiMPY precision IEEE 754 floating-point format to be: where each ’X’ represents one bit. Convert each of the following WiMPY floating-point numbers to decimal:
a) 00000000
b) 11011010
c) 01110000

Ex. 27. This problem covers 4-bit binary unsigned division (similar to Fig. 3.11 in the text). Fill in the table for the Quotient, Divisor and Dividend for each step. You need to provide the DESCRIPTION of the step being performed (shift left, shift right, sub). The value of the Divisor is 4 (0100, with additional 0000 bits shown for right shift); the Dividend is 6 (initially loaded into the Remainder).

Ex. 28. We’re going to look at some ways in which binary arithmetic can be unexpectedly useful. For this problem, all numbers will be 8-bit, signed, and in 2’s complement.
a) For x = 8, compute x & (−x). (& here refers to bitwise AND, and − refers to arithmetic negation.)
b) For x = 36, compute x & (−x).
c) Explain what the operation x & (−x) does.

Ex. 29. Data representation
a) Find the decimal representation of the unsigned fixed-point number 10110.110 (base 2).
b) Find the unsigned fixed-point binary representation of the number 106.375 (base 10).
c) Can an arbitrary decimal number be converted to binary fixed-point form without loss of precision?

Ex. 30. Data representation
a) Convert the decimal numbers 3.4 and 2.4 to binary fixed-point form using 4 digits to the left of the point and 4 digits to the right of the point. Add the two numbers. Determine the relative error.
b) Which hexadecimal number corresponds to the number 0110 0110 0011 1111 (base 2)?

Ex. 31.
Find the 8-bit binary representation of the number -86:
a) Using sign and magnitude
b) Using 1’s complement representation
c) Using 2’s complement representation
d) Using excess-127 (biased) representation

Ex. 32.
a) Convert the decimal numbers a = 3.4 and b = 10.25 to IEEE 754 single-precision floating-point representation.
b) Add the two numbers a and b.
c) Multiply the two numbers a and b.

Ex. 33. Describe a method for multiplying a number represented in 2’s complement by 127 without using a multiplier. Convert 127 (base 10) to an 8-bit 2’s complement binary number. Determine the value of 127^2 (express the result as a 16-bit number).

Ex. 34. Design a barrel shifter that performs an arithmetic left shift of a 4-bit number by 1, 0, -1, or -2 bits. The number of bits to shift is given as a 2’s complement binary number.

Ex. 35. Use simple logic gates and one 32-bit adder with carry-in and carry-out bits.
a) Design a circuit to subtract two 32-bit unsigned numbers. The circuit has two 32-bit inputs and one 32-bit output. In addition, the circuit has an output n (negative). n = 1 indicates that the difference is negative and cannot be represented as an unsigned number.
b) Design a circuit to compare two 32-bit signed numbers a and b. Both numbers are represented in sign-and-magnitude form. The circuit has one output l (less). When l = 1, we have a < b.
c) Design a circuit to compare two single-precision floating-point numbers.

Ex. 36. Consider a ripple-carry adder consisting of 16 one-bit full adders, as in the following figure: Each gate has a delay of 1 unit. The input signals are applied at time 0. Compute the time t at which the sum signals and the carry signals reach a stable state.

Chapter 3.

Ex. 37. For the MIPS datapath shown below, several lines are marked with “X”. For each one:
• Describe in words the negative consequence of cutting this line relative to the working, unmodified processor.
• Provide a snippet of code that will fail.
• Provide a snippet of code that will still work.

Ex. 38.
Consider the following assembly language code:
    I0: ADD R4 = R1 + R0;
    I1: SUB R9 = R3 - R4;
    I2: ADD R4 = R5 + R6;
    I3: LDW R2 = MEM[R3 + 100];
    I4: LDW R2 = MEM[R2 + 0];
    I5: STW MEM[R4 + 100] = R2;
    I6: AND R2 = R2 & R1;
    I7: BEQ R9 == R1, Target;
    I8: AND R9 = R9 & R1;
Consider a pipeline with forwarding, hazard detection, and 1 delay slot for branches. The pipeline is the typical 5-stage IF, ID, EX, MEM, WB MIPS design. For the above code, complete the pipeline diagram below (instructions on the left, cycles on top). Insert the characters IF, ID, EX, MEM, WB for each instruction in the boxes. Assume that there are two levels of bypassing, that the [...]

... clock cycles to execute the above code. Draw a pipeline diagram illustrating the execution of the above code on the pipelined architecture. Indicate the places where forwarding is needed to reduce stalls.

Ex. 50. Consider a program consisting of 100 lw instructions, each of which has a data dependence on the preceding instruction. Compute the CPI when executing this program.

Ex. 51. Consider the following code:
    lw $0,0($1)
    add $0,$0,$3
    lw $2,4($1)
    ...

... which the load hits takes 10 cycles but that an iteration of a loop in which the load misses takes 100 cycles. What is the execution time of this snippet with the aforementioned cache?

Ex. 16. Consider a computer system consisting of the following components:
1. A processor with a clock rate of 2.4 GHz
2. Two direct-mapped L1 caches of size 32 KB
   • Instruction cache (I-cache), with lines (blocks) of 32 bytes each
   • Data cache, ...
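The MIPS-rating note in Chapter 1 (Ex. 4 and Ex. 6) and the Amdahl's law questions (Ex. 2, 3 and 5) each come down to a one-line formula. The following is a minimal Python sketch for checking hand calculations, not an official solution. The function names mips_rating and amdahl_speedup and the sample numbers are assumptions chosen for illustration; none of them are taken from an exercise.

```python
def mips_rating(clock_rate_hz, cpi):
    """MIPS = (Clock Rate) / (CPI * 10^6), as given in the Chapter 1 note."""
    return clock_rate_hz / (cpi * 1e6)


def amdahl_speedup(fraction_enhanced, enhancement_factor):
    """Amdahl's law: overall speedup when a fraction of the original
    execution time is made enhancement_factor times faster."""
    remaining = 1.0 - fraction_enhanced
    return 1.0 / (remaining + fraction_enhanced / enhancement_factor)


if __name__ == "__main__":
    # Hypothetical machine (not from the exercises): 200 MHz clock, CPI of 2.0.
    print(mips_rating(200e6, 2.0))
    # Hypothetical enhancement: half of the run time made 2 times faster.
    print(amdahl_speedup(0.5, 2.0))
```

Note that fraction_enhanced is a fraction of the original execution time. When an exercise states a fraction of instructions instead, the two coincide only under the assumption that every instruction takes the same time.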

Posted: 11/05/2014, 15:00
