Model-Based Design for Embedded Systems- P40 doc

Nicolescu/Model-Based Design for Embedded Systems 67842_C012 Finals Page 356 2009-10-1

FIGURE 12.2 Simplified architecture of Xilinx Virtex 4 slice [27]. The multiplexers in the middle are primarily used to implement wide multiplexers from several slices. (From Xilinx, Virtex-4 FPGA User Guide, ug070 v2.40 edition, April 2008. With permission.)

where some configuration frames are reconfigured while other portions remain active. In Virtex 4 FPGAs, the configuration frames themselves are organized in columns along the North–South axis of the FPGA. Each configuration frame is the height of 16 CLBs or 4 BRAM memory elements and matches the height of the clock distribution tree. Hence, PR of large portions of the FPGA is best done using rectangular regions that are a multiple of 16 CLBs in that direction. In the East–West direction, the columns are narrow (requiring many configuration frames to configure all of the LUTs in one CLB), which enables the exact size of a reconfigurable region to be more finely controlled. Note that in Virtex 2 and Virtex 2 Pro FPGAs, the configuration frames cross the entire device in the North–South direction, making connectivity between regions more difficult. Although PR is possible in these families, rather complex architectures tend to be used [11].

FIGURE 12.3 Early FPGA configuration logic [5]. The phi1, phi2, and “hold” signals control loading of data into the shift chain. Blocks marked “pass device” are controlled by the configuration logic. (From Xilinx, Virtex-4 FPGA User Guide, ug070 v2.40 edition, April 2008. With permission.)
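The Virtex 4 frame-height rule stated above (reconfigurable regions are best sized in multiples of 16 CLBs in the North–South direction) can be illustrated with a small Python sketch. The function names and the row-numbering convention (CLB rows counted from 0, with row 0 starting a frame) are assumptions made for illustration; they are not part of any Xilinx tool.

```python
# Height of one Virtex 4 configuration frame in CLB rows, as stated in the text.
FRAME_HEIGHT_CLBS = 16


def is_frame_aligned(first_row, last_row, frame_height=FRAME_HEIGHT_CLBS):
    """Return True if the inclusive CLB row range covers only whole frames.

    Assumes (hypothetically) that rows are numbered from 0 and that row 0
    is the first row of a configuration frame.
    """
    return first_row % frame_height == 0 and (last_row + 1) % frame_height == 0


def align_region_rows(first_row, last_row, frame_height=FRAME_HEIGHT_CLBS):
    """Expand an inclusive CLB row range outward to whole-frame boundaries."""
    if first_row < 0 or last_row < first_row:
        raise ValueError("expected 0 <= first_row <= last_row")
    aligned_first = (first_row // frame_height) * frame_height
    aligned_last = (last_row // frame_height + 1) * frame_height - 1
    return aligned_first, aligned_last


if __name__ == "__main__":
    # A region drawn over rows 5..20 touches two frames, so it grows to 0..31.
    print(align_region_rows(5, 20))
```

A floorplanning script could use a check of this kind to flag regions that only partly cover a frame before the design is handed to the implementation tools.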
Although the logic in a design can often be floorplanned to fit the natural layout of the configuration frames, signal routing is often much more problematic. For instance, the FPGA architecture may require certain external I/O pins to be used for certain purposes, such as clock inputs. It may also be difficult to floorplan a region containing exactly the right number of external pins, while still maintaining a reasonable mix of other elements. These difficulties can be reduced by allowing static signals to be routed through reconfigured regions of the FPGA. Implementing such “route-overs” requires capabilities in both the FPGA architecture and the design tools. The FPGA architecture must support the ability to overwrite the configuration of routing resources without causing active signals using those resources to glitch. This capability is supported by Xilinx Virtex 2, Virtex 2 Pro, Virtex 4, and Virtex 5 FPGAs, but not by lower cost Spartan 3 FPGAs. The design tools must have the complementary capability to generate bitstreams for reconfigurable regions where route-overs use exactly the same set of configuration bits to route each signal. This capability is implemented in the Xilinx early access (EA) PR tools using a “merge-based” process [17]. In this process, the static portion of the design is placed and routed first and the routing resources used are stored in a design database. This database, combined with floorplanning constraints, is used to constrain the routing of reconfigurable modules to lie within the boundaries of the reconfigurable region and to avoid routing resources used by route-overs. To generate a partial bitstream, the implementation of each reconfigurable module is first merged with the implementation of the static region, ensuring that any route-over uses the same signal routing as the static design.
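The merge-based process described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the data model of the EA PR tools: routing resources are represented as opaque string identifiers, each signal maps to the set of resources it uses, and static and module signal names are assumed to be distinct. All function names are hypothetical.

```python
def routing_violations(static_routes, module_routes, region_resources):
    """List the ways a module's routing breaks the merge-based constraints.

    static_routes / module_routes: dict mapping signal name -> set of
    routing-resource ids (hypothetical identifiers).
    region_resources: set of routing-resource ids inside the region.
    """
    # Every resource already claimed by the static design, route-overs included.
    static_used = set().union(*static_routes.values())
    violations = []
    for signal in sorted(module_routes):
        resources = module_routes[signal]
        outside = resources - region_resources
        if outside:
            violations.append((signal, "outside region", sorted(outside)))
        clash = resources & static_used
        if clash:
            violations.append((signal, "used by static design", sorted(clash)))
    return violations


def merge_with_static(static_routes, module_routes, region_resources):
    """Combine module routing with the static route-overs inside the region."""
    violations = routing_violations(static_routes, module_routes, region_resources)
    if violations:
        raise ValueError(f"module routing rejected: {violations}")
    merged = {}
    # Route-overs keep exactly the resources chosen when the static design was routed.
    for signal, resources in static_routes.items():
        inside = resources & region_resources
        if inside:
            merged[signal] = inside
    merged.update(module_routes)  # assumes module names do not collide with static names
    return merged
```

Because the static routing is fixed first and copied unchanged into every merged result, each module can be checked and merged without looking at any other module.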
Using this process, each reconfigurable module can be implemented without knowledge of the implementation of any other reconfigurable module and configured independently, as long as every configuration frame is guaranteed to contain information from the static design and at most one reconfigurable region.

From the perspective of the configuration logic, the process of loading a partial bitstream is handled in exactly the same way as loading a full bitstream. However, from the perspective of building systems, there are several key differences. First, a partial bitstream never contains the configuration commands that are normally present in a bitstream to trigger the initialization and power-on-reset process of the FPGA, since issuing such commands would immediately halt processing in the static region. As a result, a PR design must never rely on the power-on-reset state of flip-flops for proper operation. Second, although routing resources can be reconfigured without glitching in some FPGA architectures, any signal that is sourced by a flip-flop or register that is reconfigured will still glitch during reconfiguration. As a result, extra logic is typically included to ensure that signals driven from the reconfigured region into the static region are forced to a known value during reconfiguration.

12.2.3 Partial Reconfiguration with Processors

The PR process itself can be initiated either through an external configuration interface, such as the Xilinx SelectMap interface or the joint test action group (JTAG) boundary scan interface, or internally, through the internal configuration access port (ICAP) [26]. The most convenient way to use the ICAP is through a processor, such as the Xilinx Microblaze processor or the PowerPC hard cores found in some FPGAs.
A program running on the processor in the static region of the FPGA can make decisions about when reconfiguration should occur and can load an appropriate partial bitstream through the ICAP. When used in this way, the combination of the FPGA plus the static design capable of reconfiguration is often called a “self-reconfiguring platform” (SRP) [1,17,22]. The basic architecture of an SRP is shown in Figure 12.4.

One example of how such a system might work is shown in Figure 12.5. This system includes a large number of FPGA computational units in a modular rack-mounted system. Data arrives on the right of the figure and is processed by FPGAs directly connected to A/D converters. Under control of the control workstation, data is routed through a network switch to other FPGA computational units for further processing and data reduction. The processed data is stored or displayed by the control workstation. A similar system is currently in use at the Allen Telescope Array, using racks of FPGA boards to combine the results from a large number of radio telescopes [13,14].

In order to provide scalability and fault tolerance, each computational unit performs self-checks when it is first powered on. Based on these checks, the unit notifies a centralized server of its availability. When work is available, the centralized server distributes it to any available and unallocated computational units. If any units fail (based on periodic internal checks, or external verification of work results), the centralized server can decide not to assign additional work to the failed unit and schedule a replacement. This management and coordination task is handled by distributed software executing on the control workstation and the control processors in each FPGA unit.

FIGURE 12.4 Basic architecture of an SRP. (From Xilinx, Virtex-4 FPGA User Guide, ug070 v2.40 edition, April 2008. With permission.)

FIGURE 12.5 A radio telescope system architecture based on FPGAs. (From Xilinx, Virtex-4 FPGA User Guide, ug070 v2.40 edition, April 2008. With permission.)

Another example, based on a software-defined radio, is shown in Figure 12.6. In this system, a large number of different communication protocols, called “waveforms,” must be implemented, although only a small number are active at any one time [21]. In Figure 12.6, waveforms are executed primarily in the reconfigurable FPGA resources on the right. The control processor responds to events initiated by the user of the system through the interfaces on the left, controls reconfiguration of the FPGA resources, and manages the transfer of data between the radio and the other interfaces of the system. When a connection is established, the correct waveform is selected from a library of FPGA implementations and inserted into the system. This type of system also enables a straightforward path toward supporting new waveforms through any device that the processor has access to, including a wireless network connection based on an existing waveform supporting data traffic.

FIGURE 12.6 A software-defined radio architecture based on FPGAs.

12.2.4 Reusable FPGA Platforms for Embedded Systems

Typically, the SRP concept is seen largely as a mechanism for enabling better use of the reconfigurability of the FPGA.
Such a system may consume less power, cost less, or be more flexible than an equivalent system without reconfiguration, since only the portion of the system that is active needs to be loaded in the FPGA. In practice, however, these advantages are often difficult to realize, because of the complexity of the resulting system. Compared with a processor, which is typically capable of switching between processes in hundreds of cycles, reconfiguration of a large FPGA may take hundreds of thousands of cycles. In order to leverage FPGA reconfigurability, systems must be capable of accepting this latency. If a task needs to be resumed later, its state must be saved and reloaded, adding not only additional latency but also storage requirements. Even if multiple tasks can be time-shared, realizing a cost savings by fitting a design into a smaller FPGA is difficult since only discrete sizes of FPGAs are available and there is some overhead in using PR techniques.

The SRP concept can also be viewed as a means for enabling faster and more robust system design. The processor is decoupled from the bulk of the FPGA design, enabling it to be designed, verified, and optimized in a working system independent from the FPGA design. Within a given application domain (such as designs requiring DDR2 memory and Gigabit Ethernet as basic infrastructure), the processor system can also be made generic and leveraged across different designs. The processor also becomes centrally involved in how the bulk of the FPGA is configured, enabling flexible programming of the configuration process, rather than relying solely on fixed configuration modes. As a result, new capabilities, such as Built-in Self Test or network-enabled processing resources, can be enabled, which previously required an external processor.
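To make the latency argument earlier in this section concrete, the following Python sketch estimates what fraction of time a time-shared region spends on reconfiguration rather than useful work. The function name and the example figures are illustrative assumptions; only the orders of magnitude (hundreds of thousands of cycles per reconfiguration) come from the text.

```python
def reconfiguration_overhead(task_cycles, reconfig_cycles, state_cycles=0):
    """Fraction of elapsed cycles spent on reconfiguration and state transfer.

    task_cycles:     cycles of useful work done between reconfigurations
    reconfig_cycles: cycles needed to load the partial bitstream
    state_cycles:    optional cycles spent saving and reloading task state
    """
    if task_cycles < 0 or reconfig_cycles < 0 or state_cycles < 0:
        raise ValueError("cycle counts must be non-negative")
    overhead = reconfig_cycles + state_cycles
    total = task_cycles + overhead
    if total == 0:
        raise ValueError("at least one cycle count must be positive")
    return overhead / total


if __name__ == "__main__":
    # Illustrative numbers only: a task that runs for 900,000 cycles and
    # pays 100,000 cycles of reconfiguration spends a tenth of its time waiting.
    print(reconfiguration_overhead(900_000, 100_000))
```

Under this simple model, a task must run for much longer than the reconfiguration time before the latency stops dominating, which is why a system has to be designed to tolerate it.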
A key benefit of this view is that an SRP can encompass system software, such as an operating system and user libraries, in addition to just the processor subsystem. This can greatly reduce overall risk in the design process, since a system designer does not need to be concerned with verifying the fundamental abstractions of the operating system. By enabling most application code to be managed by the operating system in “user-space,” application programming errors can be more easily localized and debugged and the underlying operating system mechanisms can be continually improved. The programmable portion of the FPGA becomes simply another hardware resource managed by the operating system. This management can be implemented once, verified, and reused, eliminating the possibility of difficult-to-debug errors that might occur as a result of FPGA reconfiguration. Furthermore, when combined with higher level design techniques, such as C-to-FPGA tools with strong compiler analysis and optimization, application programming in the FPGA can be given the same “user-space” guarantees as application code running on a processor.

The remainder of this chapter will provide a basic introduction to constructing Linux-based FPGA platforms using PR. The first section describes what is required to boot Linux on a PowerPC-based FPGA design. The following section describes the additional constraints of PR in more detail. The final section describes a particular SRP engineered for wireless communication systems.

12.3 EDK Designs with Linux

To a large extent, running Linux on a processor implemented inside an FPGA is very similar to running Linux on any other embedded processor.
However, exploring this area can be complex, since FPGAs allow a system designer to change not only the processor code, but also the processor, peripherals, and interconnect architecture in a system. In addition, some interfaces, such as Gigabit Ethernet and PCI Express, are complex pieces of FPGA IP managed by complex subsystems within the Linux kernel. Since this section cannot address all of this complexity, it will focus on some of the general FPGA-specific aspects of working with Linux, focusing on the PowerPC processor embedded in some FPGAs. For a complete introduction to working with embedded Linux, there are many excellent books and online resources.*

* See [3,28] and http://git.xilinx.com.

12.3.1 Design Constraints

One of the key complexities in using Linux on FPGA-based designs is the great variety of systems that can be constructed using FPGAs. EDK provides access to a large number of IP cores, but it is up to the system designer to construct a system correctly to meet a particular design goal. In many FPGA systems, for instance, processors are used directly in a datapath, rather than as a “control processor,” and may not require additional IP cores in order to function. However, due to the architecture of the Linux kernel, a number of IP cores must be used in addition to the basic processor core in order to create a Linux-capable processor system. Some of these requirements are fundamental to the way Linux works, such as the need for a root file system. Other requirements are specific to the design of a particular architecture. In particular, in order to run Linux on a PowerPC-based EDK design, it is typically necessary to

• Include access to significant amounts of external memory, typically DDR or DDR2 SDRAM.
• Include an interrupt controller, which aggregates the interrupt lines of most IP cores to the processor.
• Include memory at physical address 0, to service the PowerPC trap mechanism. Typically, this will be the external memory.
• Include memory at the reset vector 0xFFFFFFFC. Typically, this will be a small amount of FPGA BRAM.
• Include a console device. In most embedded designs, this is provided by a serial port.
• Include a source for the root filesystem and user applications. A wide variety of options may be used for the root filesystem, including on-board Flash memory, disk drives, and networked booting using network file system (NFS).

Designs generated using EDK base system builder will satisfy these constraints, as long as an external memory IP such as the Xilinx multiported memory controller (MPMC) and some BRAM are included in the design, and “use interrupt” is always selected for other IP.

12.3.2 Device Trees

Historically, 32-bit PowerPC-based architectures and 64-bit PowerPC-based architectures have been supported by two independent code bases in Linux. However, as of Linux kernel version 2.6.27, most 32-bit PowerPC-based architectures in Linux have been merged with the 64-bit architectures, and the older code has been removed. The unifying concept behind this merge is a generic mechanism for exposing hardware device information to the kernel, called the “device tree.” In desktop and server applications, the device tree is typically constructed by querying the correct information from Open Firmware. In PowerPC systems without Open Firmware, as often occurs in embedded systems, the device tree is explicitly specified and passed as a binary structure to the kernel at boot time [6,15].

Device trees are a powerful mechanism for FPGA systems, since every FPGA design typically has an application-specific mix of peripherals. For most operating systems, including older versions of Linux, EDK includes a board support package (BSP) generator that generates header files containing compile-time constants (#define statements) describing the IPs and memory map of a particular design. As a result, when the operating system is compiled, it is specific to a particular design. Modifications to the FPGA platform require recompilation of the operating system kernel. Device trees introduce an additional level of indirection in this process, decoupling the Linux kernel from the hardware it is running on. With this level of indirection, it becomes straightforward to run the same kernel binary on different FPGA designs. When combined with the ability to load operating system modules into a running kernel, the indirection also enables a system to react to reconfiguration of the FPGA system.

The simplest way to use a device tree is to link a binary version of it (a “device tree blob”) with the Linux kernel, resulting in a kernel that is again specific to particular hardware or a particular FPGA design. More commonly, a boot loader may retrieve the device tree blob from a stored location and pass it to the Linux kernel at boot time. In PowerPC FPGA systems, the device tree blob can be conveniently stored in BRAM. This enables the device tree blob to be strongly associated with the FPGA design it describes and allows the same Linux kernel to be used for different FPGA designs. It is also relatively inexpensive, since a compressed device tree blob typically fits in the single BRAM that must exist at the reset vector anyway.

12.4 Introduction to Modular Partial Reconfiguration

In this section, we describe the use of the Xilinx EA PR flow, based on Xilinx ISE 9.2.4 and Xilinx EDK 9.2.2, to build an SRP.
In this flow, a single partial bitstream is generated for each reconfigurable module and reconfiguration is performed by simply writing this bitstream into the configuration memory of the FPGA. This flow implies some additional floorplanning design constraints in addition to the fundamental architectural constraints implied by the FPGA architecture. First, any reconfigurable region must be floorplanned within the device and constrained to a particular region of the device. In addition, it is often useful to explicitly floorplan the static region in order to gain more information about resource usage in that portion of the design. Each region is represented by an AREA_GROUP constraint, which must be declared explicitly by the system designer. Typically, each AREA_GROUP consists of all the FPGA resources within a contiguous rectangular region of the FPGA device, including not only LUTs and FFs, but also BRAMs, DSP blocks, and routing resources. However, in some cases, it may be useful to include additional noncontiguous resources in the AREA_GROUP for the static region, such as I/O pins or processor blocks. Remember that routes within a static region can cross over a reconfigurable region. Additionally, no two reconfigurable regions can include the same configuration frame, although a reconfigurable region and a static region can share the same configuration frame.

Furthermore, additional design restrictions are required by the EA PR tools in order to properly route signals across the boundary between the static region and the reconfigured region. For nonclock signals, signals passing between the static region and the reconfigured region must pass through a primitive “bus macro” component.
These bus macros must also be constrained to particular locations in the device, ensuring that the portion of a signal routed with the static region and the portion routed with the reconfigured region align correctly. Historically, different forms of bus macros, making use of different FPGA resources, such as tristate buffers or slices with varying logic included, have been proposed [17]. In the EA PR tools, two types of slice-based bus macros are usually used. An older form consists of two adjacent slices and must be placed exactly on the boundary of a reconfigurable region. However, merge-based PR with route-overs enables a simpler bus macro consisting of only a single slice, which is located inside the reconfigured region.

Clock signals are generally distributed by a specialized low-skew clock tree in the FPGA, and are treated independently from other signals. In Virtex 4 devices, these clock signals are sourced at the center of the device and connect to one of eight horizontal wires that distribute the clock in each clock region. Hence, only eight clocks can be used in each clock region. Instead of using bus macros, each source of a global clock, represented by a BUFG primitive, is implemented in the static region and constrained to a particular location in the device. In addition, the configuration frames containing the connections to the horizontal clock tree are configured based only on the structure of the static design and are assumed never to be reconfigured. This simplifies the allocation of clocks in a reconfigurable region, although as a result each reconfigurable region is restricted to at most eight global clock signals, in addition to any further constraints that apply if multiple reconfigurable regions are placed in a clock region.

12.5 EDK Designs with Partial Reconfiguration

As of version 9.2.2, EDK does not directly support the creation and implementation of PR designs.
However, by following some simple guidelines, it is possible to construct a design in EDK that can be partially reconfigured. This section describes a procedure for implementing such a design, focusing on the concept of an SRP. This technique relies on two independent EDK designs, one for the static region containing the processor subsystem and a separate design for the reconfigurable module. The interface between the two regions is represented by an IP core, which conceptually encapsulates the reconfigurable region along with bus macros and the interface control logic.

12.5.1 Abstracting the Reconfigurable Socket

Within EDK, all logic must be encapsulated within IP cores. In PR designs, the logic that is instantiated must include bus macros, the module that will be reconfigured, and any control logic for controlling the bus macros and generating other signals. For simplicity, it is easiest to assume that this logic will be encapsulated within a single IP core, which we will call a “reconfigurable socket.” Although some other logic is necessary, including generating clocks and interfacing with the ICAP port, it is easiest to simply reuse existing IP cores provided by Xilinx. This enables simple systems with only a single reconfigurable region and more complex systems with multiple regions to be easily constructed by instantiating multiple reconfigurable socket IP cores.

12.5.2 Interface Architecture

The interface of the reconfigurable socket is a critical system-level design decision. Since this interface is fixed with the design of the static system, it must be flexible enough to allow any anticipated applications to be implemented inside the reconfigurable region. For systems where the static design must support a set of reconfigurable modules designed for a particular application, this can be done relatively easily.
However, in order to implement an application-independent static design and to enable reuse of the socket IP core, a generic interface must be chosen. Architecting this interface around a standard bus protocol, such as the IBM CoreConnect processor local bus (PLB), provides this flexibility. Most currently available Xilinx IP is based on an FPGA-optimized variant of version 4.6 of this specification [8,25]. However, using this standard directly is somewhat difficult. One difficulty is the large number of signals required to implement an arbitrary PLB slave, including up to 128 data signals each way, 64 address signals, plus a large number of additional control signals. In total, this requires over 300 signals to be passed across bus macros, even though most systems are unlikely to implement 128-bit wide slaves. A second difficulty is that the widths of some of the bus control signals depend on the number of masters and slaves. This makes modifying the system difficult, since each signal must be given explicit placement constraints. A final difficulty is that the Xilinx EDK computes the data width of the bus based on the maximum data width of all masters and slaves, and masters and slaves must know the width of the bus in order to include the correct logic to communicate with masters and slaves of different widths. Because of the design flow described above, which uses separate EDK designs to represent the static design and any reconfigurable modules, exposing this width information to EDK is difficult.
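The signal-count and bus-width arguments above amount to simple arithmetic, sketched here in Python. The function names are hypothetical, and the count of control signals is left as a parameter because the text gives no exact figure; only the data and address widths come from the text.

```python
def plb_slave_signal_count(data_width=128, address_width=64, control_signals=0):
    """Signals that must cross bus macros for a PLB slave interface.

    Data travels in both directions, so the data width counts twice.
    control_signals is a placeholder: the text says only that there are
    "a large number" of them.
    """
    return 2 * data_width + address_width + control_signals


def bus_data_width(master_widths, slave_widths):
    """Bus data width as the maximum width over all masters and slaves,
    which is how the text describes the EDK computation."""
    widths = list(master_widths) + list(slave_widths)
    if not widths:
        raise ValueError("need at least one master or slave")
    return max(widths)


if __name__ == "__main__":
    # Data and address alone already exceed 300 signals for a 128-bit slave.
    print(plb_slave_signal_count())
    # One 128-bit slave is enough to widen the bus for every other component.
    print(bus_data_width([32, 64], [32, 128]))
```

The second function shows why the width is hard to fix in advance: it depends on every master and slave in the system, including modules that live in a separate EDK design.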