VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

THI-THUY NGUYEN

COMMUNICATION CONTROL MECHANISM IN RECONFIGURABLE NETWORK-ON-CHIPS ARCHITECTURES

Branch: Electronics – Telecommunications Technology
Major: Electronics Engineering
Code: 60520203

MASTER'S THESIS OF ELECTRONICS – TELECOMMUNICATIONS TECHNOLOGY

Supervisor: Assoc. Prof. Xuan-Tu Tran

Hanoi - 2015

AUTHORSHIP
"I hereby declare that the work contained in this thesis is my own and has not been previously submitted for a degree or diploma at this or any other higher education institution. To the best of my knowledge and belief, the thesis contains no materials previously published or written by another person except where due reference or acknowledgement is made."
Signature: ………………………………………………

TABLE OF CONTENTS
AUTHORSHIP
TABLE OF CONTENTS
List of Figures
List of Tables
List of Abbreviations
INTRODUCTION
Chapter 1: NETWORK-ON-CHIP
1.1 Basic concepts
1.1.1 Basic components of a NoC
1.1.2 Network topology
1.1.3 Communication protocol
1.1.4 Routing modes
1.1.5 Buffering strategies
1.1.6 Routing algorithms
1.1.7 Data blocking
1.1.8 Quality of service
1.2 Communication mechanism in NoC
1.2.1 Definition and classification
1.2.2 Previous works
1.3 Reconfigurable NoCs
1.4 Conclusions
Chapter 2: PROPOSED COMMUNICATION CONTROL MECHANISM IN RECONFIGURABLE NOCS ARCHITECTURES
2.1 Target reconfigurable NoC architectures
2.1.1 General parameters
2.1.2 RNoC platform
2.2 Proposed communication control mechanism
2.2.1 Tracking and replacing routing information mechanism
2.2.2 Flit structures
2.3 Conclusions
Chapter 3: ARCHITECTURE OF MODIFIED ROUTER AND NETWORK INTERFACE
3.1 Modified architecture for reconfigurable network router
3.2 NI architecture
3.2.1 C2r_buffer
3.2.2 C2r_controller
3.2.3 Flitizer
3.2.4 Routing_table
3.2.5 Updating_path
3.2.6 R2c_buffer
3.2.7 R2c_controller
3.2.8 De_flitizer
3.3 Conclusions
Chapter 4: VERIFICATION, IMPLEMENTATION AND EVALUATION
4.1 Verification method
4.1.1 Verifying basic function of NI
4.1.2 Verifying the mechanism of tracking and replacing routing information
4.2 Implementation result
4.3 Evaluation
4.4 Conclusions
CONCLUSIONS
Publications
References
Appendix

List of Figures
Figure 1-1: An example architecture of a 2D mesh
Figure 1-2: Popular NoC topologies: (a) ring and chordal ring; (b) fat-tree; (c) butterfly fat-tree; (d) 2D mesh; (e) 2D torus; (f) 2D folded torus [5]
Figure 1-3: Three common routing modes of NoC: (a) Store-and-Forward; (b) Virtual Cut-Through; (c) Wormhole [5]
Figure 1-4: Four popular buffering strategies of NoC: (a) input queuing; (b) output queuing; (c) virtual output queuing; (d) virtual channel priority input queuing [5]
Figure 1-5: Deadlock example [5]
Figure 1-6: Separate virtual channel (VC) buffers are used to solve the deadlock problem. The data of one VC can use the physical link when the other VCs in the same port are stalled [4]
Figure 1-7: An example of livelock. The outside routers cannot reach the inside ones because they are in deadlock [5]
Figure 1-8: The classification of NoC flow control mechanisms
Figure 1-9: Connection implementation of end-to-end flow control [13]
Figure 1-10: NoC architecture with injection-level flow control strategy [14]
Figure 1-11: Router architecture [15]
Figure 1-12: Proposed idea of T-error [12]
Figure 1-13: Logic level of T-error [12]
Figure 2-1: RNoC router architecture [2]
Figure 2-2: (a) The prohibited router is in the middle of a straight segment of the routing path; (b) the prohibited router is at the corner of the routing path [2]
Figure 2-3: The prohibited router appears just before or just after the corner of the routing path [2]
Figure 2-4: Block diagram of the proposed communication mechanism
Figure 2-5: Router-to-router/NI interface and send/accept protocol [23]
Figure 2-6: Flow diagram of the tracking and replacing routing information mechanism
Figure 2-7: Structure of the header flit
Figure 2-8: Structure of the body flit of a normal packet
Figure 2-9: Structure of the body flit of a special packet
Figure 2-10: Structure of the tail flit
Figure 3-1: Micro-architecture of the input port of the RNoC router [2]
Figure 3-2: Modifying the VC_Demux of the North input port
Figure 3-3: Modifying the VC_Demux of the East input port
Figure 3-4: Modifying the VC_Demux of the South input port
Figure 3-5: Modifying the VC_Demux of the West input port
Figure 3-6: Architecture of the NI
Figure 3-7: C2R_buffer module
Figure 3-8: C2R_controller module
Figure 3-9: Flitizer module
Figure 3-10: Routing table module
Figure 3-11: Updating path module
Figure 3-12: R2C_Buffer module
Figure 3-13: R2C_controller module
Figure 3-14: De_flitizer module
Figure 4-1: Testbench model
Figure 4-2: Test case 1
Figure 4-3: Simulation of the tracking and processing phases in test case 1
Figure 4-4: Simulation of the replacing phase in test case 1
Figure 4-5: Test case 2
Figure 4-6: Simulation of the tracking and processing phases in test case 2
Figure 4-7: Simulation of the replacing phase in test case 2
Figure 4-8: Test case 3
Figure 4-9: Simulation of the tracking and processing phases in test case 3
Figure 4-10: Simulation of the replacing phase in test case 3
Figure 4-11: Timing information
Figure 4-12: ASIC & VLSI design flow [24]

List of Tables
Table 2-1: Function and code for each type of flit
Table 3-1: Codes for directions
Table 3-2: Code for VC_Demux of each input port
Table 3-3: Pin description of C2R_buffer module
Table 3-4: Pin description of R2c_buffer module
Table 4-1: Routing information in test case 1
Table 4-2: Routing information in test case 2
Table 4-3: Routing information in test case 3
Table 4-4: Device utilization summary (estimated values)
Table 4-5: Delay information

Table 4-1: Routing information in test case 1
Original path: 000000101010010101
Reconfigured path: 110110101110010101
Expected tracking path: 011010111001010111
Tracking path (from simulation): 011010111001010111
Expected processed path: 110110101110010101
Processed path (from simulation): 110110101110010101

Figure 4-3 demonstrates that the tracking and processing phases of our mechanism worked correctly.
Figure 4-3: Simulation of the tracking and processing phases in test case 1
Figure 4-4 demonstrates that the replacing phase of our mechanism also worked correctly.
Figure 4-4: Simulation of the replacing phase in test case 1

Test case 2: Data is sent from node9 to node3, and node11 is prohibited, as shown in Figure 4-5.
Figure 4-5: Test case 2
As in test case 1, the information in Table 4-2 shows that the tracking and processing phases worked as expected.

Table 4-2: Routing information in test case 2
Original path: 000000001000000101
Reconfigured path: 000000001000010001
Expected tracking path: 000100011100000000
Tracking path (from simulation): 000100011100000000
Expected processed path: 000000001000010001
Processed path (from simulation): 000000001000010001

Figure 4-6 demonstrates that the tracking and processing phases of our mechanism worked correctly.
Figure 4-6: Simulation of the tracking and processing phases in test case 2
Figure 4-7 demonstrates that the replacing phase of our mechanism also worked correctly.
Figure 4-7: Simulation of the replacing phase in test case 2

Test case 3: Data is sent from node15 to node4, and node8 is prohibited, as shown in Figure 4-8.
Figure 4-8: Test case 3
In test case 3, the information in Table 4-3 demonstrates that the tracking and processing phases worked as expected. In this table, the group of four italic bits is an
example of a loop path.

Table 4-3: Routing information in test case 3
Original path: 000000100000111111
Reconfigured path: 000111000001111111
Expected tracking path: 110000011111111100
Tracking path (from simulation): 110000011111111100
Expected processed path: 000000011100001111
Processed path (from simulation): 000000011100001111

Figure 4-9 demonstrates that the tracking and processing phases of our mechanism worked correctly.
Figure 4-9: Simulation of the tracking and processing phases in test case 3
Figure 4-10 demonstrates that the replacing phase of our mechanism also worked correctly.
Figure 4-10: Simulation of the replacing phase in test case 3

Across the three test cases, data is transmitted through all input ports (for example, the North and East input ports in test case 1, and the South and West input ports in test case 2). The results show that the tracking function behaves as expected. All routers in the RNoC platform share the same architecture, so these results also show that the tracking function implemented in the routers works correctly. The processing and replacing phases are verified in all three test cases, which correspond to the three cases of the reconfiguration strategy.

4.2 Implementation result
After the verification process, the proposed NI was synthesized for a Virtex-5 XC5VLX330-2ff1760 device using Xilinx tools. The synthesis results are shown in Table 4-4. The proposed NI occupies 1334 of 207360 slice registers and 609 of 207360 slice LUTs, which is below 1% of each resource. The utilization of fully used LUT-FF pairs, bonded IOBs and BUFG/BUFGCTRLs is 11%, 12% and 3%, respectively.

Table 4-4: Device utilization summary (estimated values)
Logic utilization | Used | Available | Utilization
Number of slice registers | 1334 | 207360 | 0%
Number of slice LUTs | 609 | 207360 | 0%
Number of fully used LUT-FF pairs | 193 | 1750 | 11%
Number of bonded IOBs | 146 | 1200 | 12%
Number of BUFG/BUFGCTRLs | 1 | 32 | 3%

As shown in the timing summary of the synthesis report (see Figure 4-11), the proposed NI can operate at a maximum frequency of 294 MHz.
Figure 4-11: Timing information

4.3 Evaluation
Our communication control mechanism aims to reduce packet delay in the NoC after reconfiguration, that is, to increase performance. To evaluate its efficiency, we simulate two 4×4 mesh NoC platforms. The first platform runs without our proposed end-to-end flow control mechanism, and the second runs with it. Both platforms are simulated under four cases of the reconfiguration strategy. The first three are the cases introduced in Section 2.1.2. The fourth is a special variant of case 3 that includes a loop path. The simulation environment is described in Figure 4-1, and its timing information is recorded in input.txt and output.txt. From these records, the delay of each packet in each case is calculated.
We calculate three types of delay: header-to-header delay, tail-to-tail delay and packet-to-packet delay.
Header-to-header delay is the time from when the header flit of a packet is sent by the NI of the source node until it is received at the NI of the destination node.
Tail-to-tail delay is the corresponding time for the tail flit of the packet, from the NI of the source node to the NI of the destination node.
Packet-to-packet delay is the time needed to transmit the whole packet from the NI of the source node to the NI of the destination node. It begins when the header flit is sent into the NoC and ends when the tail flit of that packet is received at the NI of the destination node.
After simulation and calculation, we obtained the delay information shown in Table 4-5.

Table 4-5: Delay information (all delays in ns; clock period: 40 ns)
Case | H2H without | T2T without | P2P without | H2H with | T2T with | P2P with | H2H decrease | T2T decrease | P2P decrease
C1 | 760 | 360 | 2440 | 560 | 360 | 2280 | 26.3% | 0% | 6.6%
C1 | 680 | 320 | 2400 | 480 | 320 | 2240 | 29.4% | 0% | 6.7%
C1 | 600 | 280 | 2360 | 400 | 280 | 2200 | 33.3% | 0% | 6.8%
C1 | 520 | 240 | 2320 | 320 | 240 | 2160 | 38.5% | 0% | 6.9%
C2 | 680 | 280 | 2360 | 520 | 280 | 2200 | 23.5% | 0% | 6.8%
C2 | 600 | 240 | 2320 | 440 | 240 | 2160 | 26.7% | 0% | 6.9%
C2 | 520 | 200 | 2280 | 360 | 200 | 2120 | 30.8% | 0% | 7.0%
C3 | 560 | 280 | 2320 | 400 | 280 | 2200 | 28.6% | 0% | 5.2%
C3 | 520 | 240 | 2280 | 360 | 240 | 2160 | 30.8% | 0% | 5.3%
C3 | 480 | 200 | 2240 | 480 | 200 | 2120 | 33.3% | 0% | 5.4%
Lo | 800 | 480 | 2480 | 520 | 280 | 2200 | 35% | 41.7% | 11.3%
Lo | 720 | 400 | 2400 | 440 | 240 | 2160 | 38.9% | 40% | 10%
Lo | 640 | 400 | 2400 | 360 | 200 | 2120 | 43.6% | 50% | 11.7%
Lo | 560 | 320 | 2320 | 280 | 160 | 2080 | 50% | 50% | 10.4%
Without/with: without/with the proposed end-to-end flow control
C1: case 1 of the reconfiguration strategy
C2: case 2 of the reconfiguration strategy
C3: case 3 of the reconfiguration strategy
Lo: case 3 including a loop path
H2H: header-to-header delay
T2T: tail-to-tail delay
P2P: packet-to-packet delay

As introduced in Section 2.1.1, the target reconfigurable NoC architectures use the wormhole routing technique. The characteristics of this technique are one of several factors that affect packet delay. The header flit reserves the channel, and the body and tail flits then follow the reserved path. In a NoC using wormhole routing, the flits of a packet can also be stored in several routers at the same time. This can reduce packet delay, but spreading flits over multiple routers increases the possibility of deadlock. Because of this pipelining, consider small packets (usually control packets, for example the special packets in our design) whose number of flits is smaller than the length of the path before the prohibited node. For such packets, the tail flit is injected into the network before the header flit can pass the reconfiguration node, which is the node before the prohibited node along the routing path. As a consequence, the time spent processing the header flit at the reconfiguration node is added to the tail-to-tail delay. In other words, the processing time of the header flit determines how long the packet occupies the communication resources of the network. With large packets (usually data packets)
whose number of flits is larger than the length of the path before the prohibited node, packet-to-packet delay is the main consideration. In this case, the processing time of the header flit only slightly affects how long the packet occupies the network resources. In our design, the length of a normal packet is fixed at 17 flits, including the additional tail flit used to track routing information. A special packet includes only three flits, and each routing path includes only one prohibited node. With normal packets, our end-to-end flow control reduces header-to-header delay by about 23.5% to 50%, tail-to-tail delay by 0% to 50%, and packet-to-packet delay by 5.2% to 11.7%. Because the simulated NoC is only 4×4, the length of the routing path is limited, and changes in path length cause only slight changes in packet-to-packet delay. In larger networks, routing paths can be significantly longer, so the relative delay reduction achieved by our mechanism may be smaller. However, a longer routing path may also contain more prohibited nodes. As the number of prohibited nodes increases, reconfiguring the routing path takes more time, so the delay reduction achieved by our mechanism may be larger.

4.4 Conclusions
This chapter presented the method used to simulate and verify our proposed communication control mechanism. After verification, our NI architecture was synthesized, and the main implementation results were given. Finally, the evaluation showed that our mechanism can help the network reduce packet delay.

CONCLUSIONS
Reconfigurable NoCs can reconfigure their hardware dynamically, which allows many hardware tasks to be mapped onto the same hardware platform. However, reconfiguration changes the communication infrastructure, such as the router architecture and the characteristics of network links. A communication control mechanism is therefore needed to ensure communication in the NoC after reconfiguration. Several reconfigurable NoCs have been proposed, and each uses a different method to make the NoC reconfigurable. As a result, communication control mechanisms vary with the characteristics and features of those architectures.
In this thesis, we proposed a communication control mechanism (CCM) for the RNoC platform that guarantees lossless communication and reduces packet delay in the network. To implement our CCM, a mechanism for tracking and replacing routing information was proposed. The RNoC architecture was modified to support the tracking phase, and the NI architecture was designed to implement the processing and replacing phases. Our NI architecture was also designed to support send/accept-based flow control, which ensures lossless communication in RNoC.
Our proposed NI architecture and modified RNoC were modeled in VHDL, then simulated and verified using ModelSim. For verification, 16 modified RNoC routers were connected to build a 4×4 mesh RNoC platform, and 16 proposed NIs were then plugged into that platform. To complete the verification environment for our CCM, the basic functions of IP cores, such as sending and receiving data, were also simulated in a dedicated testbench. Four test cases were introduced, corresponding to the three cases of the RNoC reconfiguration strategy. Using the information from the simulation, we evaluated the delay of packets of seventeen flits. The results showed that our CCM reduces header-to-header delay by about 23.5% to 50%, tail-to-tail delay by 0% to 50%, and packet-to-packet delay by 5.2% to 11.7%. After verifying that our CCM achieves the expected functions, we synthesized our NI architecture for a Virtex-5 XC5VLX330-2ff1760 device using Xilinx tools. The results of the
synthesis process showed that our design can operate at a maximum frequency of 294 MHz.
The CCM proposed in this thesis focuses on ensuring communication between IP cores and reducing packet delay; network load balancing is not yet covered. As a next step, we will extend our CCM to balance network traffic by controlling the injection rate based on the tracked information.

Publications
Thi-Thuy Nguyen and Xuan-Tu Tran, "A Novel Asynchronous First-In-First-Out Adapting to Multi-synchronous Network-on-Chips," in Proceedings of the 7th International Conference on Advanced Technologies for Communications (ATC 2014), 2014, pp. 365-370.

References
[1] R. Dafali, J. P. Diguet, and M. Sevaux, "Key Research Issues for Reconfigurable Network-on-Chip," in International Conference on Reconfigurable Computing and FPGAs (ReConFig '08), 2008, pp. 181-186.
[2] L.-V. Thanh-Vu and X.-T. Tran, "High-Level Modeling and Simulation of a Novel Reconfigurable Network-on-Chip Router," REV Journal on Electronics and Communications, vol. 4, pp. 68-74, 2014.
[3] É. Cota, A. de Morais Amory, and M. S. Lubaszewski, Reliability, Availability and Serviceability of Networks-on-Chip. Springer Science & Business Media, 2011.
[4] "A Survey of Research and Practices of Network-on-Chip," ACM Computing Surveys, vol. 38, March 2006.
[5] X.-T. Tran (editor), Emerging Aspects in Electronics and Communication Engineering. Vietnam National University Publisher, 2013.
[6] "Survey of Network on Chip (NoC) Architecture & Contributions," Journal of Engineering, Computing and Architecture, 2009.
[7] "Survey of Network-on-Chip Proposals," OCP-IP, White Paper, 2008.
[8] A. Agarwal, C. Iskander, and R. Shankar, "Survey of Network on Chip (NoC) Architectures & Contributions," Journal of Engineering, Computing and Architecture, vol. 3, pp. 21-27, 2009.
[9] K. Tatas, K. Siozios, D. Soudris, and A. Jantsch, Designing 2D and 3D Network-on-Chip Architectures. Springer, 2014.
[10] G. De Micheli and L. Benini, Networks on Chips: Technology and Tools. Academic Press, 2006.
[11] D. Wiklund and D. Liu, "SoCBUS: Switched Network on Chip for Hard Real Time Embedded Systems," in Proceedings of the International Parallel and Distributed Processing Symposium, 2003.
[12] R. Tamhankar, S. Murali, S. Stergiou, A. Pullini, F. Angiolini, L. Benini, et al., "Timing-Error-Tolerant Network-on-Chip Design Methodology," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, pp. 1297-1310, 2007.
[13] A. Radulescu, J. Dielissen, S. G. Pestana, O. P. Gangwal, E. Rijpkema, P. Wielage, et al., "An Efficient On-Chip NI Offering Guaranteed Services, Shared-Memory Abstraction, and Flexible Network Configuration," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, p. 417, 2005.
[14] M. Tang and X. Lin, "Injection Level Flow Control for Networks-on-Chip (NoC)," Journal of Information Science and Engineering, vol. 27, pp. 527-544, 2011.
[15] E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, "QNoC: QoS Architecture and Design Process for Network on Chip," Journal of Systems Architecture, vol. 50, pp. 105-128, 2004.
[16] A. Pullini, F. Angiolini, D. Bertozzi, and L. Benini, "Fault Tolerance Overhead in Network-on-Chip Flow Control Schemes," in 18th Symposium on Integrated Circuits and Systems Design, 2005, pp. 224-229.
[17] P. T. Wolkotte, G. J. Smit, G. K. Rauwerda, and L. T. Smit, "An Energy-Efficient Reconfigurable Circuit-Switched Network-on-Chip," in Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, 2005, pp. 155a-155a.
[18] M. B. Stensgaard and J. Sparso, "ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology," in Second ACM/IEEE International Symposium on Networks-on-Chip (NoCS 2008), 2008, pp. 55-64.
[19] M. Modarressi, H. Sarbazi-Azad, and M. Arjomand, "A Hybrid Packet-Circuit Switched On-Chip Network Based on SDM," in Proceedings of the Conference on Design, Automation and Test in Europe, Nice, 2009, pp. 566-569.
[20] A. Faruque, M. Abdullah, T. Ebi, and J. Henkel, "Configurable Links for Runtime Adaptive On-Chip Communication," in Proceedings of the Conference on Design, Automation and Test in Europe, 2009, pp. 256-261.
[21] M. A. Al Faruque, T. Ebi, and J. Henkel, "Run-Time Adaptive On-Chip Communication Scheme," in IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2007), 2007, pp. 26-31.
[22] P.-T. Huang and W. Hwang, "2-Level FIFO Architecture Design for Switch Fabrics in Network-on-Chip," in 2006 IEEE International Symposium on Circuits and Systems (ISCAS 2006), 2006, p. 4866.
[23] N.-K. Dang, T.-V. Le-Van, and X.-T. Tran, "FPGA Implementation of a Low Latency and High Throughput Network-on-Chip Router Architecture," in Proceedings of the 2011 International Conference on Integrated Circuits and Devices in Vietnam, Hanoi, Vietnam, 2011, pp. 112-116.
[24] X.-T. Tran, "Lecture: VLSI and ASIC Design," VNU University of Engineering and Technology, 2009.

Appendix
ASIC & VLSI design flow
The ASIC & VLSI design flow includes three main parts, as shown in Figure 4-12 [24].
Figure 4-12: ASIC & VLSI design flow [24]
The detailed steps of the design flow are as follows.
Specification: describes the function and structure of the design, including the I/O ports, timing constraints and architecture. The specification is used in parallel for the Design Entry process and for building the testbench.
Design Entry: the design is written in a hardware description language such as VHDL or Verilog, based on the Specification. The code is checked in Functional Verification and then synthesized.
Logic Synthesis: the design described in the hardware description language is synthesized into basic logic components such as logic gates, flip-flops, latches and wires. The result includes a netlist file and a delay model. After synthesis, the result is checked in Post-synthesis Verification and then used for layout design. This is the final step of RTL design.
Floorplanning: partitions the design for the Place&Route step.
Place&Route: consists of two steps. Each basic component is placed, and then the components are wired. The results of this process are a layout file and a full delay model, which can be checked in Post-Place&Route Verification.
At the end of the design process, we obtain the layout of the integrated circuit, which can be used for fabrication.
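Section 4.3 defines three delay metrics (H2H, T2T and P2P), and Table 4-5 reports their percentage decrease. The following Python sketch illustrates these definitions. It computes each delay for one packet from four flit timestamps, then computes the relative decrease. It is not part of the thesis testbench. The `PacketTrace` record, its field names and the helper functions are hypothetical. The example timestamps are not simulation output: they are derived backwards from the first C1 row of Table 4-5, taking the header injection time as 0 ns.

```python
from dataclasses import dataclass

# Clock period used in the simulations of Table 4-5.
CLOCK_PERIOD_NS = 40


@dataclass
class PacketTrace:
    """Hypothetical per-packet record; all times are in ns."""
    header_sent: int      # header flit injected by the source NI
    header_received: int  # header flit received by the destination NI
    tail_sent: int        # tail flit injected by the source NI
    tail_received: int    # tail flit received by the destination NI


def h2h_delay(p: PacketTrace) -> int:
    """Header-to-header delay: header flit from source NI to destination NI."""
    return p.header_received - p.header_sent


def t2t_delay(p: PacketTrace) -> int:
    """Tail-to-tail delay: tail flit from source NI to destination NI."""
    return p.tail_received - p.tail_sent


def p2p_delay(p: PacketTrace) -> int:
    """Packet-to-packet delay: from header injection until tail reception."""
    return p.tail_received - p.header_sent


def to_cycles(delay_ns: int) -> float:
    """Express a delay in clock cycles of CLOCK_PERIOD_NS."""
    return delay_ns / CLOCK_PERIOD_NS


def decrease_pct(without_ns: int, with_ns: int) -> float:
    """Relative delay decrease in percent, rounded to one decimal place."""
    return round(100 * (without_ns - with_ns) / without_ns, 1)


# Timestamps back-derived from the first C1 row of Table 4-5 (illustrative only).
without_ccm = PacketTrace(header_sent=0, header_received=760, tail_sent=2080, tail_received=2440)
with_ccm = PacketTrace(header_sent=0, header_received=560, tail_sent=1920, tail_received=2280)

if __name__ == "__main__":
    for name, metric in (("H2H", h2h_delay), ("T2T", t2t_delay), ("P2P", p2p_delay)):
        before, after = metric(without_ccm), metric(with_ccm)
        print(f"{name}: {before} ns -> {after} ns ({decrease_pct(before, after)}% decrease)")
```

Running it prints decreases of 26.3% (H2H), 0.0% (T2T) and 6.6% (P2P), which match the first C1 row of Table 4-5. Because the timestamps are back-derived, this checks only the arithmetic of the definitions, not the simulation itself.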