Model-Based Design for Embedded Systems - P41

Nicolescu/Model-Based Design for Embedded Systems 67842_C012 Finals Page 366 2009-10-1

FIGURE 12.7 Reconfigurable socket abstraction based on the “PLBv46 PLBv46 bridge” architecture. The “PLBv46 slave” and “PLBv46 master burst” blocks are standard IP components and all blocks except the DCR slave block are part of the bridge. Bus macros are implicitly present on all signals crossing the boundary of the reconfigured region. (Diagram blocks: PLBv46 slave, PLBv46 master burst, slave buffer interface, LocalLink write buffer, LocalLink read buffer, reset logic, interrupt generation, bridge control logic, bridge status signals, and DCR slave, connected by the PLB bus, DCR bus, and control bus; signals shown include interrupt request, reset, reset request, and bus macro enable; the reconfigured region lies inside the reconfigurable socket.)

An alternative is to architect the interface around a bus bridge, with independent busses in the static region and in the reconfigurable region. The design of the socket is based on partitioning the Xilinx “PLBv46 PLBv46 bridge” IP [23], as shown in the block diagram in Figure 12.7. Internally, this core is based around 32-bit fixed-width data FIFOs and a small number of control signals. Most of the bridge is treated as part of the static region, with only a small amount of logic required in the reconfigurable region to complete the bridge. In addition to the bus interface, which is primarily used to interface to the reconfigured region, the socket core also contains a control interface (based on the DCR protocol [7]), which is used to generate an independent reset signal to the reconfigurable region and to force signals driven by the reconfigurable module to stable values during reconfiguration.

12.5.3 Direct Memory Access Interfaces

The bus interface above is a generic and flexible interface, which can be used to communicate with the reconfigured portion of the system in different ways.
For instance, it may be used by the processor both to send data to and to receive data from the reconfigured region, or as a control interface to set parameter values of IP cores executing in the reconfigured region. However, it does have several disadvantages. Primarily, the bandwidth of data to or from the processor is limited because of the overhead of bus arbitration and the fact that accesses to the memory range are treated as uncached I/O transactions. Although performance could be improved somewhat for large transactions by using DMA engines or by treating data transfer regions as cached and manually managing cache coherency, this would significantly increase the complexity of the processor software. Secondly, many FPGA algorithms require access to external memory for buffering data until it can be processed. For instance, in a network router, packet data may need to be stored until a routing decision can be made, or in a streaming video system, several frames of video data may need to be stored to analyze object motion between frames.

Because of these limitations, it is best to consider the bus interface above as primarily an interface for low-bandwidth control and configuration information. In systems that require higher bandwidth communication, or direct access to external memory, the control interface can be augmented with additional interfaces to memory. Although it may seem straightforward to include a complementary bus bridge that can be driven by the reconfigured region to provide this functionality, this tends not to be the highest bandwidth option, since performance can be limited by the arbitration logic of the PLB bus. This logic is heavily pipelined in order to maximize the bus throughput under a wide variety of usage, typically incurring three cycles of latency before a slave can respond to a bus access.
One solution is to provide an interface connected directly to the native port interface (NPI) of the Xilinx MPMC IP core, as shown in Figure 12.8.

FIGURE 12.8 Architecture of the Xilinx MPMC. (Diagram blocks: external memory (e.g., DDR/DDR2), physical interface, multiported memory controller with arbiter, and FIFOs and PIMs on the ports.)

Typically, this interface exhibits both lower latency and higher bandwidth than the PLB bus. Although the MPMC must still arbitrate between different ports attempting to use the memory controller, this arbitration can be performed locally within the memory controller and concurrently with the data being provided. The only disadvantage of connecting directly to the memory controller is that other IP cores in the static region cannot be accessed from the reconfigured region. However, since in the SRP usage model these IP cores are likely being managed by device drivers in the operating system of the processor, it is questionable whether such access should be allowed anyway.

12.5.4 External Interfaces

In addition to communicating with the static region, a reconfigurable module may also communicate with other interfaces external to the FPGA. In order to accomplish this, a reconfigurable region may include external I/O pins and/or high-speed serial transceivers. For the most part, these resources can be treated as any other FPGA primitives and can be placed and routed as usual. However, there is some complexity with regard to external I/O pins, since in many FPGA designs, the input/output buffer (IOB) primitives representing external I/O pins are not explicitly instantiated in a user design but are inferred in the synthesis process. Normally in a hierarchical design, the netlist can be synthesized using a special option to disable inference of these primitives, since they will be inferred or instantiated during synthesis of the top-level design.
However, when building a generic FPGA platform, relying on this may not be desirable, since the reconfigured region may require more control over the configuration of these primitives. In other cases, exactly which IOB primitives are explicitly instantiated in a reconfigurable module and which ones are not may not be known when the static design is synthesized and implemented. One way to solve this is to not expose any I/O pins of the reconfigurable region as external signals of the static region, implying that synthesis of the static design will never include IOB primitives for these pins. When a reconfigurable module is synthesized, signals interfacing with the static region are individually tagged with the constraint BUFFER_TYPE set to NONE, indicating that no IOB primitives should be inferred for those signals.

High-speed serial transceivers also have additional design complexity, since each transceiver is associated with specialized clock resources in the FPGA. These clock resources typically include phase-locked loops for clock synchronization and dedicated clock distribution paths, and may be shared between transceivers. From the perspective of building FPGA platforms, this resource sharing, combined with how transceivers are grouped into configuration frames, may need to be considered during the floorplanning stage in order to make maximum use of the available transceivers.
FIGURE 12.9 Design flow for PR systems based on EDK. (Flow diagram with three parts: static design flow, Linux design flow, and module design flow. Tools shown: EDK base system builder + hand design, EDK BSP generator, EDK platgen, floorplanning, UCF merge, PR-enabled NGDBuild, Map, and PAR, PR-enabled bitgen, PRMergeDesign, gcc + objcopy, and EDK genace.tcl. Files shown: .mhs, .ucf, .ngc, .dts, .elf, .ncd, .bit, static.used, static.ace, partial.bit, merged.ace, and configure.elf, plus meta-information and C code inputs.)

12.5.5 Implementation Flow

The implementation flow for the system is shown in Figure 12.9. The static design is implemented first, as shown in the left-hand side of the figure, using the EA PR tools. During this sequence, no netlist for the reconfigurable region is present, and the place and route tools only implement logic for the static region. Design constraints are provided in a .ucf file and must include the required floorplanning constraints for the PR flow. After routing is completed, the routing resources used by the static logic are saved in the file static.used for later use. Since by default the interface with the reconfigured region is driven to an idle state, the resulting bitstream can be used in a system without programming the remainder of the FPGA. The device tree for a particular design is generated from the EDK design and, after being converted to a binary device tree blob, can be included in the Linux kernel image or stored as the initial value of a BRAM in the bitstream. Lastly, EDK is used to package the FPGA bitstream with the Linux kernel binary in a bootable image that can be used with Xilinx SystemACE [24] to boot the kernel.

The right-hand side of Figure 12.9 shows a second pass for the implementation of a reconfigurable module.
During this pass, the logic of the reconfigurable module is implemented together with a small portion of the static logic called the “context logic.” The context logic is necessary to provide the context of the reconfigurable module, so that hierarchical names in the design and location constraints for clock signals and bus macros can be preserved. The design constraints for implementation are created by merging the design constraints from the static design with any additional design constraints specific to the reconfigurable module, such as pin location constraints. During this pass, the routing resources in the file static.used are excluded from use, since these resources are already used in the static design.

The final bitstream for the reconfigurable module is generated by first merging the design databases (each contained in an .ncd file) from both passes, ensuring that the configuration bits used in the static design are programmed correctly. In addition, design rule checks and timing analysis can be applied to the merged design database, to ensure that individual passes were implemented correctly. From the merged design database, it is possible to generate both a partial bitstream, which can be used after configuration with the static bitstream, and a merged bitstream, which can be used as an initial configuration bitstream with the reconfigurable module already loaded. To enable reconfiguration in a Linux system, the partial bitstream is encapsulated with the Linux code for performing PR and the meta-information about the reconfigurable module to generate a Linux executable, as described in Section 12.6.

12.6 Managing Partial Reconfiguration in Linux

Two device drivers are used to manage the reconfiguration process. Primarily, the device driver for the ICAP device performs the actual reconfiguration.
When a partial bitstream is written to this device (for instance, using the cp command or the write() system call), the bytes are transferred to the ICAP. Since the device driver does not inspect or modify the stream of bytes, the data being written must include the appropriate control words, as expected by the configuration interface [26]. The device driver also includes simple locking of the ICAP resource, in order to prevent different processes from unexpectedly interleaving accesses to the ICAP. Readback is also possible using this device driver by writing the correct readback request bitstream to the ICAP and subsequently reading data (using the read() system call).

FIGURE 12.10 The reconfiguration process. (Steps shown: reconfigure FPGA, notify kernel of devices, load kernel modules, enable bus macros, reset reconfigurable module, processing, unload kernel modules, release devices, disable bus macros.)

The second device driver used to manage reconfiguration is associated with the reconfigurable socket core. This driver exports a character interface to which meta-information about a reconfigurable module can be written. A simple way of representing this meta-information is in the form of an array of struct platform_device, a data structure which is used internally by Linux to represent devices. A more complex, but perhaps more robust, representation of meta-information could be an additional device tree blob. This meta-information is parsed and checksummed and, if valid, is used to notify the Linux kernel of the presence of new devices, which can then be bound to other device drivers. An invalid checksum is interpreted as an indication to unbind any previously loaded devices and release ownership of the reconfigured region.
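The parse-and-checksum decision described above can be illustrated with a minimal user-space sketch. The text does not specify the framing or the checksum algorithm, so everything concrete here is an assumption made for illustration: the `RMOD` tag, the length field, the use of CRC-32, and the function names `pack_meta`, `parse_meta`, and `handle_write` are all hypothetical and are not taken from the actual socket driver.

```python
import struct
import zlib

MAGIC = b"RMOD"  # hypothetical frame tag, not from the source


def pack_meta(payload: bytes) -> bytes:
    """Frame a payload as MAGIC | length | payload | CRC-32 (big-endian)."""
    header = MAGIC + struct.pack(">I", len(payload))
    crc = zlib.crc32(header + payload) & 0xFFFFFFFF
    return header + payload + struct.pack(">I", crc)


def parse_meta(blob: bytes):
    """Return the payload if the frame is well formed and the CRC matches, else None."""
    if len(blob) < 12 or blob[:4] != MAGIC:
        return None
    (length,) = struct.unpack(">I", blob[4:8])
    if len(blob) != 12 + length:
        return None
    (crc,) = struct.unpack(">I", blob[8 + length:])
    if zlib.crc32(blob[:8 + length]) & 0xFFFFFFFF != crc:
        return None
    return blob[8:8 + length]


def handle_write(blob: bytes) -> str:
    """Mimic the decision in the text: valid data registers devices,
    an invalid checksum releases the reconfigured region."""
    return "register" if parse_meta(blob) is not None else "release"
```

Under this assumed framing, a process could release the region simply by writing a frame whose checksum is deliberately wrong, which matches the behavior the text attributes to the driver.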
Secondarily, this device driver also enables and disables the bus macros between the static region and the reconfigured region, and controls the reset of the reconfigured region. As with the ICAP device driver, the socket device driver includes a simple locking mechanism in order to prevent a process from unexpectedly reconfiguring an active region in use by another process.

The complete process of reconfiguration is shown in Figure 12.10. In the initial state, we assume no module is loaded in the reconfigured region. First, a reconfigurable module is loaded into the FPGA through the ICAP device driver. Next, meta-information about the reconfigurable module is sent to the socket device driver, which registers the presence of any new devices, resets the newly loaded module, and enables the interface between the static region and the reconfigurable module. At this point, although Linux is aware of the presence of the reconfigured devices, it may not have device drivers appropriate to those devices. Next, device drivers for new devices are provided by loading the appropriate kernel modules, and the Linux kernel binds those device drivers to the reconfigured devices. At this point, application code may use the device drivers to communicate with the reconfigured region. A similar sequence of steps in reverse order occurs to unbind the device drivers and release the reconfigured region so that different processing may occur.

Since the ICAP device and the control interface of the socket are exposed through device drivers, it is relatively straightforward to implement reconfiguration through a regular user process. One possibility for implementing this involves linking the bitstream and meta-information into a single executable along with the code for reconfiguration.
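The load and unload sequence described above can be sketched from the point of view of a user process. This is only an illustration under stated assumptions: the device node paths are supplied by the caller (the text does not name them), kernel module loading is abstracted into caller-provided callbacks, and the names `write_all`, `load_module`, and `unload_module` are hypothetical.

```python
import os
from typing import Callable


def write_all(path: str, data: bytes, chunk: int = 4096) -> int:
    """Write data unmodified to a device node, continuing after short writes."""
    fd = os.open(path, os.O_WRONLY)
    try:
        view = memoryview(data)
        sent = 0
        while sent < len(view):
            sent += os.write(fd, view[sent:sent + chunk])
        return sent
    finally:
        os.close(fd)


def load_module(icap_path: str, socket_path: str, bitstream: bytes,
                meta: bytes, load_drivers: Callable[[], None]) -> None:
    # 1. Reconfigure the FPGA: the ICAP driver passes the bytes through as-is.
    write_all(icap_path, bitstream)
    # 2. Send meta-information to the socket driver, which (per the text)
    #    registers devices, resets the module, and enables the bus macros.
    write_all(socket_path, meta)
    # 3. Load the kernel modules so Linux can bind drivers to the new devices.
    load_drivers()


def unload_module(socket_path: str, release_request: bytes,
                  unload_drivers: Callable[[], None]) -> None:
    # Reverse order: drop the drivers first, then ask the socket driver to
    # release the region (assumed here to be a blob with an invalid checksum).
    unload_drivers()
    write_all(socket_path, release_request)
```

On a real system the callbacks would typically invoke the module loading utilities, which is one of the reasons such a process needs elevated privileges.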
The process created when this executable is executed can be controlled through any operating system mechanism (such as POSIX signals) to manage the life cycle of the module loaded in the FPGA. The executable can also be linked together with other application code, resulting in a familiar processor-centric usage model for the FPGA fabric. This approach is similar in spirit, but greatly different in implementation, from that proposed in [18], which performs essentially the same processes using the Linux kernel’s ability to implement new executable formats.

It is important to recognize that although the reconfiguration process is managed by a user process, it must be treated as a privileged operation executed as the root user, since there are many places where both unintended errors and malicious attacks may result in unintended behavior. Some of these places are not specific to the PR process, such as loading kernel modules, whereas others are more subtle vulnerabilities. For instance, as noted before, partial bitstreams have significant constraints on how they are constructed and are specific to a particular implementation of the static system. More directly, it is possible to trigger reconfiguration of the FPGA through the ICAP interface, resulting in the loss of the current state of the system. If the bus macros are enabled during PR, then it is likely that glitching on the interface signals will result in unintended behavior of the static system.

One particularly common usage error is simply attempting to load a partial bitstream that does not correspond to the current implementation of the static design. This may happen during development when a modification is made to the static region, but a designer neglects to reimplement a reconfigured module. One way of avoiding such errors is to prepend each partial bitstream with a hash generated from the static design.
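The hash prefix can be illustrated with a short sketch. The text does not say which hash is used or what exactly is hashed, so SHA-256 over the static bitstream is an assumption, and the names `static_design_hash`, `tag_partial`, and `checked_payload` are hypothetical. The sketch only detects a mismatch between a partial bitstream and the static design; it is not a defense against a deliberate attacker.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def static_design_hash(static_bitstream: bytes) -> bytes:
    """Hash identifying one implementation of the static design (assumed SHA-256)."""
    return hashlib.sha256(static_bitstream).digest()


def tag_partial(partial: bytes, static_hash: bytes) -> bytes:
    """Prepend the static-design hash to a partial bitstream."""
    return static_hash + partial


def checked_payload(tagged: bytes, expected_hash: bytes):
    """Return the partial bitstream if its prefix matches the expected hash,
    otherwise None so the caller can halt before touching the FPGA."""
    if len(tagged) < HASH_LEN:
        return None
    prefix, payload = tagged[:HASH_LEN], tagged[HASH_LEN:]
    if not hmac.compare_digest(prefix, expected_hash):
        return None
    return payload
```

A loader built this way would strip the prefix and forward only the returned payload to the ICAP device, leaving the static design untouched whenever the check fails.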
This hash can also be stored in the static design, possibly in the device tree blob, and checked before the partial bitstream is loaded into the FPGA. If the partial bitstream does not carry the matching hash, then the reconfiguration process can be halted without affecting the operation of the static design. This technique can be simply applied to prevent unintended errors, or adapted using more cryptographically secure techniques to prevent malicious attacks [2,4].

12.7 Putting It All Together

This section illustrates an SRP design targeted at a variant of the WARP Software-defined Radio hardware built by Rice University [12]. Since the original hardware is based on an older Virtex 2 Pro FPGA, we present a design based on an updated Virtex 4 FX 100 device in order to better represent the PR capabilities of newer FPGA architectures.

FIGURE 12.11 Architecture of a reconfigurable platform. Some signals and standard cores have not been shown. (Diagram blocks: PPC405 (ppc_virtex4 v2.00.b), interrupt controller (xps_intc v1.00a), multiported memory controller (mpmc v3.00b), Ethernet MAC (xps_ll_temac v1.01a), RS232 UART (xps_uartlite v1.00a), ICAP interface (opb_hwicap v1.00b, using bridge), BRAM interface (xps_bram_if_cntlr v1.00a), and the reconfigurable socket containing the reconfigured region; connections shown include plb, sdma, npi, dcr (using bridge), and the PLB bus.)

In particular, we focus on a MIMO OFDM reference design for this board, which implements a bridge from wired Ethernet to a two-radio MIMO system. The design uses a processor to manage the packet headers and to perform configuration management of the radios, while packet payloads are communicated directly between the wired and wireless network interfaces using direct memory access to a processor-managed memory buffer. In the reference design, the packet payload buffer is implemented in BRAM and communicated through a PLB bus.
In the reconfigurable design, we assume that the packet payload buffer is implemented in external DRAM, which must be accessed from the reconfigurable region through a separate port of the memory controller. As a nonreconfigurable system, this design uses approximately 50% of the device (21294 of 42176 slices).

The design of the static subsystem is shown in Figure 12.11. This design is architected around the PowerPC 405 processor core and was largely generated using the Base System Builder capability in Xilinx EDK. Standard serial port and Ethernet IP cores provide external connectivity. Access to external 64-bit-wide DDR2 SDRAM, including DMA access for the Ethernet core, is provided by the Xilinx MPMC IP core. In this system, the processor, memory bus, and memory controller are designed to be “quasi-synchronous,” meaning that clocks must be edge-aligned. Based on the speeds of the individual components, a design point was chosen targeting a slow speed grade FPGA (−10) with the memory bus clocked at 83.3 MHz, the memory controller clocked twice as fast (166.6 MHz), and the processor clocked three times as fast (250 MHz).

FIGURE 12.12 Placed and routed design of an FPGA processor platform, targeting a Virtex 4 FX 100. (Labeled areas: reconfigured region, static region, ICAP interface, control interface bus macros, memory interface bus macros, and the utilized PowerPC core.)

The FPGA layout of the design is shown in Figure 12.12, overlaid with the PR floorplanning constraints. The static region is at the south of the chip, and is exactly two configuration frames tall.
This layout provides approximately 8600 slices and 128 external I/O pins, which accommodates both the logic requirements of a simple processor design and the I/O pin requirements of a 64-bit DDR2 memory interface. A significantly smaller region would fail to provide enough logic cells for the static design, while a larger region would allocate too many pins to the static region, which would be difficult to access from the reconfigurable region.

Note that the majority of the routed signals are contained within the floorplanned area for the static region. The routes entering the top region connect primarily to external I/O pins and FPGA resources, such as clock buffers and the ICAP, located in the center column of the FPGA. Some routes into the top region also connect to the PowerPC cores. Although only one PowerPC is actually used in the static design, current versions of the EA PR tools do not allow PowerPC cores to be part of the reconfigured portion of the design. Hence, this design instantiates both PowerPC cores in the static region, in order to enable use of the JTAG chain, which is assumed to connect through both cores.

The device tree for this design is shown in Figure 12.13. Since the targeted board includes Xilinx SystemACE, this is used to configure the FPGA and initialize external memory with the kernel image. The compressed device tree blob is initialized in the BRAM at address 0xfffff800 and decompressed by the Linux bootwrapper executing out of external memory. The root filesystem is stored on an external file server and loaded over the network interface using the NFS protocol.

12.8 Conclusion

Although high-level algorithmic modeling offers significant promise for increasing design productivity, a common problem with many approaches is representing the environment in which a model exists in a system.
A solution to this problem is often to provide platforms that abstract lower level details, provide standardized interfaces, and can be targeted by a high-level design tool. Although this difficulty exists in any embedded system, it is particularly apparent in FPGA systems, which include complex IP blocks, such as processor cores, and where physical interfaces to the rest of the system are highly flexible and incorporate many features that cannot be easily modeled even at the circuit and gate level. However, using the architectural features of some FPGAs, such as PR, higher level platforms can be constructed that abstract many of these details and are more appropriate for mapping from a high-level design tool. This chapter has particularly shown how this technique can abstract the complexities associated with including a control processor and operating system as part of an FPGA platform.
