High Level Synthesis: from Algorithm to Digital Circuit- P12 pps

96 M. Meredith

• Synthesize RTL that implements the SystemC semantics that were simulated
• Use the same testbench for high-level simulation and RTL simulation

The design can comprise a single module or multiple cooperating modules. In the case of multiple modules, the high-level SystemC simulation ensures that the modules operate correctly individually and work together properly. This simulation validates the algorithms, the protocol implementations at the interfaces, and the interactions of the concurrently operating modules. The modules can then be synthesized, and the resulting RTL can be verified using the same testbench that was used at the high level. This is made possible by the mixed-mode scheduling described earlier, in which the algorithm is written as untimed SystemC while the interfaces are specified as cycle-accurate SystemC. Multiple testbench configurations may be constructed to verify various combinations of high-level modules and RTL modules.

[Figure: a Single SystemC Testbench connected through sockets both to the C/C++ Algorithm and to the RTL generated by Cynthesizer]

Cynthesizer incorporates a complete dependency management and process automation system that automatically generates the needed cosimulation wrappers and testbench infrastructure, automating the verification of multiple configurations of high-level and RTL modules without any need to customize the testbench source code itself.

5.11 Conclusion

This chapter has outlined the synthesizable constructs of C++ and SystemC supported by Forte Design Systems in its Cynthesizer product. It has described specific techniques that can be used to encapsulate synthesizable communication protocols in C++ classes for maximum reuse, and techniques used to automatically produce well-structured RTL for predictable timing closure.
5 High-Level SystemC Synthesis with Forte’s Cynthesizer 97

Finally, some of the user-visible mechanisms for controlling scheduling and the architecture of loop implementation have been discussed, along with a brief discussion of verification issues and the verification automation incorporated in the Cynthesizer product. Hopefully, this has enabled the reader to understand how SystemC synthesis with Cynthesizer can be used to implement a broad range of functionality at multiple abstraction levels, and how the use of high-level C++ and SystemC constructs raises the level of abstraction in hardware design.

Chapter 6
AutoPilot: A Platform-Based ESL Synthesis System

Zhiru Zhang, Yiping Fan, Wei Jiang, Guoling Han, Changqi Yang, and Jason Cong

Abstract
The rapid increase of complexity in System-on-a-Chip design urges the design community to raise the level of abstraction beyond RTL. Automated behavior-level and system-level synthesis are naturally identified as the next steps to replace RTL synthesis and will greatly boost the adoption of electronic system-level (ESL) design. High-level executable specifications, such as C, C++, or SystemC, are also preferred for system-level verification and hardware/software co-design. In this chapter we present a commercial platform-based ESL synthesis system, named AutoPilot™, offered by AutoESL Design Technologies, Inc. AutoPilot is based on the xPilot system originally developed at UCLA. It automatically generates efficient RTL code from C, C++, or SystemC descriptions for a given system platform and simultaneously optimizes logic, interconnects, performance, and power. Preliminary experiments demonstrate very promising results for a wide range of applications, including hardware synthesis, system-level design exploration, and reconfigurable accelerated computing.
Keywords: ESL, Behavioral synthesis, Scheduling, Resource binding, Interface synthesis

6.1 Introduction

The rapid increase of complexity in System-on-a-Chip (SoC) design urges the design community to raise the level of abstraction beyond RTL. Electronic system-level (ESL) design automation has been widely identified as the next productivity boost for the semiconductor industry. However, the transition to ESL design will not be as widely accepted as the transition to RTL in the early 1990s without robust synthesis technologies that automatically compile high-level functional descriptions into optimized hardware architectures and implementations.

P. Coussy and A. Morawiec (eds.), High-Level Synthesis. © Springer Science + Business Media B.V. 2008 99

100 Z. Zhang et al.

Despite the past failure of first-generation behavioral synthesis technology in the mid-1990s, we believe that behavior-level and system-level synthesis and optimizations are now becoming imperative steps in EDA design flows for the following reasons:

• Embedded processors are in almost every SoC: With microprocessors, DSPs, memories, and custom logic coexisting on a single chip, more software elements are involved in designing a modern embedded system. It is natural to use C-based languages to program software for embedded processors. Moreover, automated C-based synthesis allows the designer to quickly experiment with different hardware/software boundaries and explore various area/power/performance tradeoffs using a single functional specification.
• Huge silicon capacity requires a higher level of abstraction: Design abstraction is one of the most effective methods for controlling rising complexity and improving design productivity. For example, a study from NEC [10] shows that a 1M-gate design typically requires about 300K lines of RTL code, clearly beyond what a human designer can handle. However, the code density can be improved by more than 7X when moving to the behavior level, resulting in a human-manageable 40K lines of behavioral description.
• Verification drives the acceptance of SystemC: Transaction-level modeling (TLM) with SystemC [2] has become a very popular approach to system-level verification [8]. Designers commonly use SystemC TLMs to describe virtual software/hardware platforms, which serve three important purposes: early embedded software development, architectural modeling, and functional verification. The wide availability of SystemC functional models directly drives the need for SystemC-based synthesis solutions, which automatically generate RTL code through a series of formal constructive transformations. This avoids the slow and error-prone manual process and simplifies design verification and debugging.
• Accelerated computing or reconfigurable computing needs C/C++-based compilation/synthesis to FPGAs: Recent advances in FPGAs have made reconfigurable computing platforms feasible for accelerating many high-performance computing (HPC) applications, such as image and video processing, financial analytics, bioinformatics, and scientific computing. Since HDLs are unfamiliar to most application software developers, it is essential to provide a highly automated compilation/synthesis flow from C/C++ to FPGAs.

In this chapter we present a platform-based ESL synthesis system named AutoPilot™, offered by AutoESL Design Technologies, Inc. AutoPilot is capable of automatically generating efficient RTL code from an untimed or partially timed C, C++, or SystemC description for the target hardware platform. It performs platform-based behavioral and system synthesis, tightly integrates with a modern leading-edge C/C++ compiler, and embodies a class of novel, near-optimal, and highly scalable synthesis algorithms.
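The NEC figures in the second bullet can be checked directly. Going from 300K lines of RTL to 40K lines of behavioral code is a factor of 7.5, while a factor of exactly 7 would leave roughly 43K lines, so the 40K figure is consistent with an improvement of "more than 7X":

```latex
\[
\frac{300\,\mathrm{K}}{40\,\mathrm{K}} = 7.5 > 7,
\qquad
\frac{300\,\mathrm{K}}{7} \approx 42.9\,\mathrm{K}\ \text{lines}
\]
```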
6 AutoPilot: A Platform-Based ESL Synthesis System 101

The synthesis technology was originally developed in the UCLA xPilot system [5] and has been licensed by AutoESL for commercialization. In its current stage, AutoPilot exhibits the following key features and advantages:

• Unified C/C++/SystemC design flow: AutoPilot accepts three kinds of standard C-based design entries: C, C++, and SystemC. It also supports a variety of abstraction models, including pure untimed functional models, partially timed transactional models, and fully timed behavioral or structural models. This broad coverage of languages and abstraction models allows AutoPilot to target a wide range of applications, including hardware synthesis, system-level design exploration, and high-performance reconfigurable computing.
• Utilization of state-of-the-art compiler technologies: AutoPilot incorporates a leading-edge, commercial-strength C/C++ compiler in the synthesis loop. Many state-of-the-art compiler techniques (intra-procedural and inter-procedural) are used to analyze, transform, and aggressively optimize the input behaviors.
• Platform-based and implementation-aware synthesis: AutoPilot takes advantage of target platform information to carry out more informed synthesis and optimization. The timing, area, and power of the available computation resources and communication interfaces are all characterized. In addition, AutoPilot is tightly integrated with several downstream RTL synthesis and physical synthesis tools to assure better quality of results and a higher degree of automation.
• Interconnect-centric and power-aware optimization: AutoPilot is able to generate an optimized microarchitecture that accounts for on-chip interconnects at the high level, maximizing both data locality and communication locality to achieve faster timing and power closure. Furthermore, it can carry out aggressive power optimization using fine-grain clock gating and power gating.
The remainder of this chapter is organized as follows: Sect. 6.2 presents an overview of the AutoPilot design flow. Sections 6.3 and 6.4 briefly discuss the system front-end and highlight the synthesis engine, respectively. Preliminary experimental results are reported in Sect. 6.5.

6.2 Overall Design Flow

The overall design flow of the AutoPilot synthesis system is shown in Fig. 6.1. AutoPilot accepts synthesizable C, C++, and/or SystemC as input and performs four major steps to generate cycle-accurate RTL: compilation and elaboration, advanced code transformation, core behavioral and communication synthesis, and microarchitecture generation.

In the first step, the behavioral description is parsed by a GCC-compatible front-end compiler, with extensions to handle bit-accurate integer data types. For SystemC designs, elaboration is invoked to extract processes, ports, channels, and interconnection topologies and to construct a detail-rich system-level synthesis data model.

[Fig. 6.1 AutoPilot™ design flow: the design specification (C/C++/SystemC), user constraints (timing/power/layout), and platform models feed AutoPilot™ ESL synthesis (compilation & elaboration; advanced code transformation; behavioral & communication synthesis and optimizations; microarchitecture generation), which outputs RTL HDLs and RTL SystemC for implementation on ASICs/FPGAs; a common testbench is used for simulation and verification.]

On top of the synthesis data model, AutoPilot applies a set of advanced code transformations and analyses to optimize the input behavior, including traditional compilation techniques such as constant propagation and dead code elimination, and hardware-specific passes such as bitwidth analysis and optimization.
The AutoPilot front-end will be discussed in Sect. 6.3. The code transformation phase is followed by the core hardware synthesis phase. AutoPilot performs platform-based synthesis and interconnect-centric optimizations during scheduling and resource binding; these take into account the user-specified frequency/latency/throughput/resource constraints and generate optimized microarchitectures. We discuss the synthesis engine in more detail in Sect. 6.4.

At the back-end, AutoPilot outputs RTL VHDL/Verilog code together with constraint files (e.g., multicycle path constraints and physical location constraints) to leverage existing logic synthesis and physical design tools for final implementation on either ASICs or FPGAs. Notably, RTL SystemC code is also generated; it can be directly compiled and simulated with the original C/SystemC testbench to verify the correctness of the synthesized RTL.

6.3 AutoPilot Front-End

In this section we discuss three major aspects of the AutoPilot front-end: language support, compiler optimizations, and platform modeling.

6.3.1 Language Coverage

6.3.1.1 C/C++ Support

AutoPilot has broad coverage of C and C++ language features. It provides comprehensive support for most commonly used data types, operators, struct/class constructs, and control flow constructs. Because the memory models of software and hardware differ fundamentally, AutoPilot currently disallows dynamic pointers, dynamic memory allocation, and recursive functions.

Designers can fully control the data precision of a C/C++ specification. AutoPilot directly supports single- and double-precision floating-point types. In addition, compared with xPilot, it adds the capability to compile and synthesize bit-accurate fixed-point data types, for which standard C and C++ lack native support.
• Arbitrary-precision integer (APInt) data types: The user can specify an integer type's precision (bit width) as any number of bits up to eight million. For example, int24 declares a 24-bit signed integer value. Constant values are zero- or sign-extended to the indicated bit width if necessary.
• Arbitrary-precision fixed-point (APFixed) data types: AutoPilot provides a synthesizable templatized C++ library, named APFixed, for the designer to describe fixed-point math. The APFixed library implements the common arithmetic routines via operator overloading and supports the standard quantization and saturation modes.
• IEEE-754 standard single- and double-precision floating-point data types are fully supported in AutoPilot for FPGA platforms. Common floating-point math routines (e.g., square root, exponentiation, and logarithm) can also be synthesized.

6.3.1.2 SystemC Support

AutoPilot fully supports the OSCI synthesizable subset [1] for SystemC synthesis. Designers can use the SystemC bit-accurate data types (i.e., sc_int/sc_uint, sc_bigint/sc_biguint, and sc_fixed/sc_ufixed) to define data precision. Multi-module hierarchical designs can be specified and synthesized with SC_MODULE constructs. Within each module, multiple concurrent processes can be declared with the SC_METHOD and SC_CTHREAD constructs.

6.3.2 Advanced Code Transformations

A variety of compiler optimization techniques are applied to the behavioral description code with the objectives of reducing code complexity, maximizing data locality, and exposing more parallelism. The following transformations and analyses are particularly instrumental for AutoPilot hardware synthesis:

• Traditional optimizations, such as constant propagation, dead code elimination, and common subexpression elimination, that avoid functional redundancy.
• Strength reductions that replace expensive operations (e.g., multiplications and divisions) with simpler, low-cost operations (e.g., shifts, additions, and subtractions).
• Transformations such as if-conversion and tree height reduction that explicitly expose fine-grain, operator-level parallelism.
• Coarse-grain code restructuring by loop transformations such as loop unrolling, loop flattening, and loop fusion.
• Analyses such as bitwidth analysis, alias analysis, and dependence analysis that help reduce data widths and analyze data and control dependences.

These transformations are either performed locally within function bodies or applied interprocedurally across the function call hierarchy.

6.3.3 Platform Modeling

AutoPilot takes full advantage of target platform information to carry out more informed synthesis and optimization. The platform specification describes the availability and characteristics of the important system building blocks, including the on-chip computation resources and the selected communication interfaces.

Component pre-characterization is part of the modeling process. Specifically, it characterizes the delay, area, and power of each type of hardware resource, such as arithmetic units (e.g., adders and multipliers), memories (e.g., RAMs, ROMs, and register files), steering logic (multiplexers), and interface logic (e.g., FIFOs and bus interface adapters). The delay/area/power characteristic functions are derived by varying the bit widths, the number of input and output ports, pipeline intervals and latencies, etc. To facilitate interconnect-centric synthesis, a heterogeneous resource distribution map and distance-based wire delay lookup tables are also constructed. AutoPilot greatly extends the platform modeling capabilities of xPilot.
It can support advanced ASIC processes (e.g., TSMC 90 nm and 65 nm technologies), a wide range of FPGA device families (e.g., Xilinx Virtex-4/Virtex-5 and Altera Stratix II/Stratix III), and various accelerated computing platforms (e.g., Nallatech [4] and XDI [3] acceleration boards).

6.4 AutoPilot Hardware Synthesis Engine

This section highlights several important features of the AutoPilot synthesis engine, including scheduling, resource binding, pipelining, and interface synthesis.

6.4.1 Scheduling

An efficient and versatile scheduler is implemented in the AutoPilot system to exploit parallelism in the behavior-level design and to determine the time at which different computations and communications are performed. The core scheduling algorithm is based on a mathematical programming formulation. It has significant advantages over prior approaches in two major aspects:

• Versatility: Our scheduler is able to model a rich set of scheduling constraints (including cycle time, latency, throughput, I/O timing, and resource constraints) in the constraint system, and to express different performance metrics (such as worst-case and average-case latency) in the objective function. Moreover, several important synthesis optimizations, such as operation chaining, structural pipelining, behavioral templates, and slack distribution, are all naturally encoded in a single unified mathematical framework.
• Efficiency and scalability: Our scheduler is highly efficient and scalable compared with other constraint-driven approaches. For instance, typical ILP formulations use discrete 0–1 variables to model the assignment of operations to time steps; this requires many variables and complex equations to express a single scheduling constraint, since all feasible time steps must be considered. In our formulation, variables directly represent operation execution times and are independent of the final schedule latency. This leads to a much more compact constraint system, and the mathematical programming model can be solved efficiently, within a few seconds even for very complex designs, as evidenced by the Xilinx MPEG-4 design (discussed in Sect. 6.5).

The first generation of our scheduler was based on the SDC (system of difference constraints) scheduling algorithm; the technical details are available in [7].

6.4.2 Resource Binding

Resource binding determines the numbers of functional units and registers, and the sharing among compatible operations and data transfers. It has a dramatic impact on final design quality, as it determines the interconnection network of wires and steering logic.

AutoPilot is capable of providing optimized binding for various functional units and memory blocks, such as integer and floating-point arithmetic units, transcendental functions, black-box IP blocks, registers, register files, RAMs/ROMs, etc.

AutoPilot's binding algorithm can also generate different microarchitectures. For example, it has an option to generate a distributed register-file microarchitecture (DRFM) that optimizes both data and communication locality. DRFM has a semi-regular structure consisting of one or more islands. As illustrated in Fig. 6.2, each DRFM island contains a local register file (LRF), a functional unit pool (FUP), and data-routing logic.

[Fig. 6.2 Distributed register-file microarchitecture: Islands A–F, each containing a Local Register File, a Functional Unit Pool (MUL, AL, AL), and Data-Routing Logic (MUX) that receives inputs from other islands]

The LRF serves as the local storage in an island. Each register file allows a variable number of read ports but only a fixed number (typically one) of write ports. The LRF stores the results produced by the local computation units in the FUP and provides data to both the local FUP and external islands.
By clustering the LRF and FUP into a single island, we are able to maximize both data/computation locality and communication locality. This also helps us avoid, to a large extent, the centralized memory structures and global communications that often become bottlenecks limiting system efficiency in performance, area, and power. To handle the necessary inter-island communications, we use the data-routing logic to route data from external islands.

DRFM is a semi-regular microarchitecture: the configurations of the LRF, FUP, and data-routing logic are application-specific. One important objective of DRFM-based resource binding is to minimize the number of inter-island connections, which simplifies the data-routing logic in each island and reduces the overall complexity of the resulting datapath. The technical details of the DRFM-based resource binding algorithm are available in [6].

6.4.3 Pipelining

AutoPilot's synthesis engine (during scheduling, resource binding, and microarchitecture generation) supports several forms of pipelining to improve system performance.

Posted: 03/07/2014, 14:20
