CHAPTER 8
Hardware Design Using DSP Chips

8.1 INTRODUCTION

In Chapter 7, we used the fdatool to illustrate the analysis and design of a digital filter in which the coefficients of the filter and the input samples are represented by a finite number of bits. We also found the effect of rounding or truncating the results of adding signals or multiplying the signal value and the coefficient of the filter, and ascertained that there is no possibility of limit cycles or unstable operation in the filter. In the example chosen, we decided that an FIR filter would meet the frequency response specifications of a lowpass elliptic filter, with a wordlength of 8 bits. Very often, however, a digital filter is used as a prominent part of a digital system such as a cell phone, which has other components such as a power supply, keyboard, or other I/O interfaces. So we have to simulate the performance of the whole system with all components connected together in the form of a block diagram.

8.2 SIMULINK AND REAL-TIME WORKSHOP

Simulink is software that is available as a companion toolbox to MATLAB and is used to model and simulate the performance of dynamic systems under varying conditions. Just as MATLAB works with a number of toolboxes, Simulink has access to a library of many additional tools called blocksets, such as the DSP blockset, fixed-point blockset, communications blockset, and control system blockset, as shown on the left side of Figure 8.1. The Simulink library browser also includes blocksets for simulation of aeronautical and mechanical systems, namely, the Aerospace blockset and SimMechanics.¹ Each of these blocksets contains a large number of blocks that are used to define specific transfer functions or algorithms and a variety of input signals.

¹ Depending on the version of Simulink, the library browser may or may not contain some of the blocksets mentioned in this chapter.

Introduction to Digital Signal Processing and Filter Design, by B. A. Shenoi. Copyright © 2006 John Wiley & Sons, Inc.

Figure 8.1 Screen capture of the Simulink browser and block diagram of a model.

The GUI is used to drag and drop these blocks from the blockset and connect them to describe a block diagram representation of the dynamic system, which may be a continuous-time system or a discrete-time system. A mechanical system model [6] is shown in Figure 8.1. Simulink is based on object-oriented programming, and the blocks are represented as objects with appropriate properties, usually specified in a dialog box. Indeed, the fdatool that we used in Chapter 7 can be launched from Simulink as an object or from the MATLAB command window, because the two are integrated to operate in a seamless fashion. Simulink itself can be launched either by typing simulink in the MATLAB command window or by clicking the Simulink icon in its toolbar.

For the simulation of a digital filter, we choose the DSP blockset, which contains the following blocks in a tree structure:

DSP Blockset
  → DSP Sinks
  → DSP Sources
  → Estimation
  → Filtering
      → Adaptive Filters
      → Filter Design
          → Analog Filter Design
          → Digital Filter Design
          → Digital Filter
          → Filter Realization Wizard
          → Overlap-Add FFT filter
          → Overlap-Save FFT filter
      → Multirate filters
  → Math Functions
  → Platform Specific I/O
  → Quantizers
  → Signal Management
  → Signal Operations
  → Statistics
  → Transforms

When we open the Simulink window and click File→New→Model in sequence, we get a window for the new model. Then we drag the block shown above as Digital Filter Design and drop it in the window for the new model. When we click on this object in the new window, it opens the same window as the one for the fdatool shown in Figure 7.1.
After we have imported the parameters of the digital filter that we designed in an earlier session, or after we have completed the design of the quantized filter as explained in Chapter 7, we use the Filter Realization Wizard shown above under the DSP Blockset and get the realization structure for the filter. This serves as the model for the filter, to which we can now connect different types of sources and observe the output on the scope connected to the filter as the sink.

Very often, we are required to design a whole system in which the digital filter is the only major block but other subsystems are integrated with it. So it may be necessary to use the blocks for the adaptive filters or multirate filters, or blocks from the Communications blockset and Control System blockset, besides the DSP Blockset, and so on. After building the block diagram model for the total digital signal processing system, and using Simulink to carry out extensive simulation of the model under varying conditions, we check to ensure that it meets the specifications satisfactorily; if not, we may have to modify the design of the filter or tune the parameters. For example, we may simulate the total system with a finite number of bits in floating-point or fixed-point format, using the Fixed Point blockset to represent all data. We may even have to change the design completely and simulate the new system.

8.3 DESIGN PRELIMINARIES

All the design and simulation of digital filters and digital systems done by MATLAB and Simulink is based on numerical computation of scientific theory. When this work is completed, we have to decide on one of the following choices:

1. Design a VLSI chip, using a hardware description language such as VHDL, to meet our particular design specifications.
2. Select a DSP chip from manufacturers such as Texas Instruments, Analog Devices, Lucent, or Motorola and program it to work as a digital system.
3. Choose a general-purpose microprocessor and program it to work as a digital signal processor system.
4. Design the system using field-programmable gate arrays (FPGAs).

In all cases, several design considerations have to be explored as thoroughly as possible before we embark on the next step in hardware design.

If we decide to select a DSP chip from one of the abovementioned manufacturers, we have to consider the bandwidth of the signal(s) that the digital filter or the digital system will be processing, based on which the sampling frequency of the ADC is selected. However, the sampling frequency of the ADC may not be the same as the clock frequency of the CPU (central processing unit) in the chip or the rate at which data will be transferred from and to the memory by the CPU. This in turn determines the rating in mips (millions of instructions per second). The amount of data or memory space required by the processor in turn determines the amount of power needed. Other considerations are the I/O (input/output) interfaces and additional devices such as the power supply circuit, the microcontroller, add-on memory, and peripheral devices. Finally, the most important consideration is the cost per chip. We also need to consider the reliability of the software and the technical support provided by the manufacturer; the credibility and sustainability of the manufacturer also become important if the market for the digital filter or the system is expected to last for many years.

The selection of the DSP chip is facilitated by an evaluation of the chips available from the major manufacturers listed above and their detailed specifications. For example, the DSP Selection Guide, which can be downloaded from the TI (Texas Instruments) Website www.dspvillage.ti.com, is an immense source of information on all the chips available from them. The DSP chips provided by TI are divided into three categories.
The TMS320C6000 DSP platform family is designed for systems with very high performance, ranging within 1200–5760 mips for fixed-point operation and 600–1350 mflops (million floating-point operations per second) for floating-point operation. The fixed-point DSPs are designated by TMS320C62x and TMS320C64x, and the floating-point DSPs belong to the TMS320C67x family. The fixed-point TMS320C62x DSPs are optimized for multichannel, multifunction applications such as wireless base stations, remote-access servers, digital subscriber loop (DSL) systems, central office switches, call processing, speech recognition, image processing, biometric equipment, industrial scanners, precision instruments, and multichannel telephone systems. They use 16 bits for multiplication and 32 bits for instructions in single-precision format as well as double-precision format. The fixed-point TMS320C64x DSPs offer the highest level of performance at clock rates of up to 720 MHz and 5760 mips, and they are best suited for applications in digital communications and video and image processing, wireless LAN (local area networking), network cameras, base station transceivers, DSL, pooled modems, and so on. The floating-point TMS320C67x DSPs operate at 225 MHz and are used in similar applications.

The TMS320C5000 DSP family is used in consumer digital equipment, namely, products used in the Internet and in consumer electronics. These chips are therefore optimized for power consumption as low as 0.05 mW/mips and speeds of up to 300 MHz and 600 mips; the TMS320C54x DSPs are well known as the industry leader in portable devices such as cell phones (2G, 2.5G, and 3G), digital audio (MP3) players, digital cameras, personal digital assistants (PDAs), GPS receivers, and electronic books. The TMS320C55x DSPs also deliver the highest power efficiency and are software-compatible with the TMS320C54x DSPs.
The TMS320C2000 DSPs are designed for applications in the digital control industry, including industrial drives, servocontrol, factory automation, office equipment, controllers for pumps, fans, HVAC (heating–ventilation–air conditioning), and other home appliances. The TMS320C28x DSPs offer 32-bit, fixed-point processing and 150 mips operation, whereas the TMS320C24x DSPs offer a maximum of 40 mips operation.

More detailed information and specifications for the DSPs and other devices such as ADCs and codecs (coders/decoders) supplied by TI can be found in the DSP Selection Guide. The amount of information on the software and hardware development tools, application notes, and other resource material that is freely available on this Website is enormous and indispensable. We must remember that DSP chips produced by other manufacturers such as Analog Devices may be better suited for specific applications, and they, too, provide a lot of information about their chips and the applications.

8.4 CODE GENERATION

The next task is to generate code in machine language that the DSP we have selected understands and that implements the algorithm for the digital system we have designed. First we have to convert the algorithm for the system under development to code in the C/C++ language. This can be done manually by someone who is experienced in C language programming. Alternatively, we simulate the performance of the whole system modeled in Simulink and use a blockset available in it, known as the Real-Time Workshop [7], to generate the ANSI Standard C code for the model.² The C code can be run on PCs, DSPs, and microcontrollers in real time and non-real time in a variety of target environments. We connect a rapid prototyping target, for example, the xPC Target, to the physical system but use the Simulink model as the interface to the physical target. With this setup, we test and evaluate the performance of the physical target.
When the simulation is found to work satisfactorily, the Real-Time Workshop is used to create and download executable code to the target system. Now we can monitor the performance of the target system and tune its parameters, if necessary. The Real-Time Workshop is useful for validating the basic concept and overall performance of the whole system that responds to a program in C code. An extension of Real-Time Workshop called the Real-Time Workshop Embedded Coder is used to generate optimized C code for embedded discrete-time systems.

Note that the C code is portable in the sense that it is independent of any manufacturer’s DSP chip. The manufacturers may, however, also provide their own software to generate C code optimized for their particular DSP chip. Machine language, on the other hand, differs between DSP chips from different manufacturers, and each manufacturer provides the tools necessary to obtain the machine code from the C code for its DSP chips.

² Depending on the version of the MATLAB/Simulink package installed on the computer in the college or university, software such as FDA Tool, Real-Time Workshop, and others mentioned in this chapter may or may not be available.

8.5 CODE COMPOSER STUDIO

Texas Instruments calls its integrated development environment (IDE) the Code Composer Studio (CCS). The major steps to be carried out are outlined in Figure 8.2. Basically, these steps denote the C compiler, assembler, linker, debugger, simulator, and emulator functions. It must be pointed out that the other manufacturers also design DSP chips for various applications meeting different specifications; their own software bundles follow steps similar to those described here for the Code Composer Studio from Texas Instruments (TI).
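As an illustration of the kind of C source that enters this tool flow, the following is a minimal sketch of one step of a direct-form FIR filter in fixed-point arithmetic. It is not taken from the text or from any TI library: the Q15 number format, the function names fir_q15_step and saturate_q15, and the use of a 64-bit accumulator (standing in for the guard bits of a DSP accumulator) are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp a wide accumulator value to the Q15 range. */
static int16_t saturate_q15(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/*
 * Hypothetical example: process one input sample through a direct-form
 * FIR filter. coeffs[] and delay[] hold Q15 values (value = integer / 32768).
 * delay[] keeps the most recent num_taps inputs, newest first.
 */
int16_t fir_q15_step(const int16_t *coeffs, int16_t *delay,
                     size_t num_taps, int16_t input)
{
    assert(coeffs != NULL && delay != NULL && num_taps > 0);

    /* Shift the delay line by one sample and insert the new input. */
    for (size_t k = num_taps - 1; k > 0; --k) {
        delay[k] = delay[k - 1];
    }
    delay[0] = input;

    /* Multiply-accumulate: each Q15 x Q15 product is a Q30 value. */
    int64_t acc = 0;
    for (size_t k = 0; k < num_taps; ++k) {
        acc += (int32_t)coeffs[k] * (int32_t)delay[k];
    }

    /* Round to nearest and rescale from Q30 back to Q15.
       Assumes >> on a negative value is an arithmetic shift,
       which is implementation-defined in C. */
    acc = (acc + (1 << 14)) >> 15;
    return saturate_q15(acc);
}
```

Calling fir_q15_step once per input sample, with the same delay array each time, produces the filtered output stream. The multiply-accumulate loop is the time-critical part of such a routine and is the kind of section that might be rewritten by hand in assembly language.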
First the Code Composer Studio compiles the C/C++ code to assembly language code in either mnemonic form or algebraic form, for the particular DSP platform that we have chosen.

Figure 8.2 Software development flow for generating the object code from the C code. [Diagram blocks: C/C++ source files; C/C++ compiler; C/C++ and assembly files; assembly source files; assembler; COFF object files; object library files; RTS library files; macro library files; linker; executable COFF object module; DSP debug environment (simulator, DSP/BIOS support, host emulation support); DSP emulation hardware; JTAG link; target hardware (DSP application, DSP/BIOS II kernel).]

If we choose the TMS320C55x DSPs to illustrate the software development cycle, then the command used to invoke the C compiler is of the form

    cl55 [-options] [filenames] [-z [link options] [object files]]

The [filenames] list the C program files, other assembly language files, and even object files, with their default extensions .c, .asm, and .obj, respectively. The C language is not very efficient in carrying out a few specific operations, such as the fixed-point data processing used in DSP applications. For this reason, assembly language files are added to the C language program files to improve the efficiency of the time-critical sections of the assembly language code produced by the compiler. We can choose from many options in [-options] and in [link options] to control the way that the compiler shell processes the files listed in [filenames] and the way that the linker processes the object files. For more details, students should refer to the TI compiler user’s guide [25].

The next step is translation of the assembly language code by the assembler to the object code in binary form (or in machine language) specific to the DSP platform.
The CCS command to invoke the assembler is of the form

    asm55 [input file [object file] [list file] [-options]]

Since there might be several C program files that implement the original algorithm in small sections, the assembler produces the output file in several sections. It may also collect assembly source files from an external library, which implement processes that are used again and again at several stages of the software, and load them into the list of [filenames]. For example, Texas Instruments provides a large number of highly optimized functions in three libraries, namely, the DSP library (DSPLib), the image processing library (IMAGELib), and the chip support library (CSLib). There are also assembly files that are long programs and are therefore shortened to a macro, so that they can be invoked by a single instruction or a few lines of instructions. All of these external files are added to the list of assembly language files and converted to binary form, under a single format known as the common-object file format (COFF).

The [object file] argument names the object file produced in COFF format; the [list file] shows the binary object code as well as the assembly source code, and where the program and the variables are allocated in the memory space. At this stage, however, they are allocated in temporary locations, not in absolute locations. Therefore these relocatable object files can be archived into a library of reusable files that may be used elsewhere. There are many options in the assembler, and their use is described in Ref. 25.

The linker utility is invoked to combine all the object files generated by the assembler into one single linked object code, and this is done by assigning absolute addresses in the physical memory of the target DSP chip as specified by a memory map.
The memory map is created by a linker command file, which lists the various sections of the assembly code and specifies the starting address and length of the memory space in RAM and ROM (random access and read-only memory), where the individual sections are to be located in the RAM and ROM, and the various options. Then the linker command is invoked as follows:

    lnk55 command file.cmd

The linker can call additional object files from an external library and also the runtime support (RTS) library files that are necessary during the debugging procedure. It also has many options that can be used to control the linker output, which is an executable COFF object module that has .out as its extension. Detailed information on the linker can be found in Ref. 17. Remember that the compiler, assembler, and linker commands may be different for other DSP platforms; information on these commands may be found in the TI references appropriate for the DSP platform chosen.

8.6 SIMULATOR AND EMULATOR

After we have created the executable COFF object module, we have to test and debug it by using software simulation and/or hardware emulation. For low-cost simulation, we use the development starter kits, for example, the TMS320C5402 DSP starter kit (DSK) for the TMS320C54x DSP, and for more detailed evaluation and debugging, we use an evaluation board such as the TMS320C5409. Finally, we have the emulator boards such as the XDS510 JTAG emulator, which are used to run the object code under real-time conditions.

The executable object code is downloaded to the DSP on the DSK board. The simulator program installed on the PC that is connected to the DSK board accepts the object code as its input and, under the user’s control, simulates the same actions that would be taken by the DSP device as it executes the object code.
The user can execute the object code one line at a time; halt the operation of the program by inserting breakpoints at particular lines of the object program; view the contents of the data memory, program memory, auxiliary registers, stacks, and so on; display the contents of the registers, for example, the input and output of a filtering operation; and change the contents of any register if so desired. One can also observe or monitor the registers controlling the I/O hardware, serial ports, and other components. If minor changes are made, the Code Composer Studio reassembles and links the files quickly to accelerate the debugging process; otherwise the entire program has to be reassembled and linked before debugging can proceed. When monitoring and bug fixing at all breakpoints are over, execution of the program is resumed manually. By inserting probe points, Code Composer Studio enables data to be read from a file or written to a file on the host PC, halting the execution of the program momentarily and then resuming it. It should be obvious that simulation on a DSK is a slow process and does not check the performance of the peripheral devices that would be connected to the digital system.

In order to test the performance of the object code on the DSP in real time, we connect an emulator board, such as the XDS510 emulator, which conforms to the JTAG scan-based interface standard, to the PC by a parallel printer cable. The peripheral devices are also connected to the emulator board. A DSP/BIOS II plug-in is included in the Code Composer Studio to run the emulation of the software. It also contains the RTDX (real-time data exchange) module that allows transfer of data between the target DSP and the host PC in real time. The Code Composer Studio enables us to test and debug the performance of the software under real-time conditions, at full sampling rate.
Without disrupting the execution of the software, the emulator controls breakpoints, single-step execution, and monitoring of the memory and registers, and checks the performance of the whole system, including the peripheral devices. When the emulation of the whole system is found to operate correctly, the software is approved for production and marketing.

This is a very brief outline of the hardware design process, carried out after the design of the digital system is completed by use of MATLAB and Simulink. Students are advised to refer to the extensive literature available from TI and other manufacturers in order to become proficient in the use of all the software tools available from them. For example, Analog Devices offers development software called VisualDSP++, which includes a C++ compiler, assembler, linker, user interface, and debugging utilities for their ADSP-21xx DSP chips.

8.6.1 Embedded Target with Real-Time Workshop

Simulink has been expanded to generate and simulate bit-true, timing-accurate code for directly designing DSP and FPGA targets and to produce tests at system level. This software tool considerably reduces the design effort outlined above, as it facilitates the design of digital filters and systems obtained by the Signal Processing Toolbox and FDA Toolbox and generates executable machine code for hardware design.

8.7 CONCLUSION

The material presented above is only a very brief outline of the design procedure that is necessary to generate the assembly language code from the C code, generate the object code using the assembler, and link the various sections of the object code to obtain the executable object code in machine language. This code is then debugged by using an evaluation board, simulator, and emulator; all of these steps are carried out by using integrated, seamless software such as the Code Composer Studio that was used to illustrate the steps.
Like any design process, this is an iterative procedure that may require that we go back to earlier steps to improve or optimize the design, until we are completely satisfied with the performance of the whole system in real-time conditions. Then the software development is complete and the software is ready for use in the DSP chips chosen for the specific application.

REFERENCES

1. S. M. Kuo and B. H. Lee, Real-Time Digital Signal Processing: Implementations, Applications and Experiments with TMS320C55X, Wiley, 2001.
2. S. M. Kuo and W.-S. Gan, Digital Signal Processors: Architectures, Implementations and Applications, Pearson Prentice-Hall, 2004.
3. R. Chassaing, Applications Using C and the TMS320C6X DSK, Wiley, 2002.
4. R. Chassaing and D. W. Horning, Digital Signal Processing with the TMS320C25, Wiley, 1990.
5. R. Chassaing, Digital Signal Processing and Applications with the C6713 and C6416 DSK, Wiley, 2004.
6. The MathWorks, Inc., Learning Simulink, User’s Guide, 1994.
7. The MathWorks, Inc., Real-Time Workshop for Use with Simulink, User’s Guide.
8. P. Embree, C Algorithms for Real-Time DSP, Prentice-Hall, 1995.
9. The MathWorks, Inc., Embedded Target for the TI TMS320C6000™ DSP Platform, for Use with Real-Time Workshop®, User’s Guide.
10. Texas Instruments, TMS320C6411 Fixed-Point Digital Signal Processor (SPRS196).
11. Texas Instruments, TMS320C6000 CPU and Instruction Set Reference Guide (SPRU189).
12. Texas Instruments, Manual Update Sheet for TMS320C6000 CPU and Instruction Set Reference Guide (SPRZ168).
13. Texas Instruments, TMS320C64x Technical Overview (SPRU395).
14. Texas Instruments, Code Composer Studio Tutorial (SPRU301).
15. Texas Instruments, Code Composer Studio User’s Guide (SPRU328).
16. Texas Instruments, TMS320C6000 Programmer’s Guide (SPRU198).
17. Texas Instruments, TMS320C6000 Assembly Language Tools User’s Guide (SPRU186).
18. Texas Instruments, TMS320C6000 Optimizing C Compiler User’s Guide (SPRU187).
19. Texas Instruments, TMS320C6000 C Source Debugger User’s Guide (SPRU188).
20. Texas Instruments, TMS320C6000 DSP/BIOS User’s Guide (SPRU303).
21. Texas Instruments, TMS320C6000 DSP/BIOS Application Programming Interface (API) Reference Guide (SPRU403).
22. Texas Instruments, TMS320 DSP Algorithm Standard Rules and Guidelines (SPRU352).
23. Texas Instruments, TMS320 DSP Algorithm Standard Developer’s Guide (SPRU424).
24. Texas Instruments, TMS320C6000 Simulator User’s Guide (SPRU546).
25. Texas Instruments, TMS320C55x Optimizing C Compiler User’s Guide (SPRU281).
26. Texas Instruments, TMS320C55x Assembly Language Tools User’s Guide (SPRU380).
27. Analog Devices, ADSP-21xxx Family: Assembler Tools and Simulator Model, 1995.
28. Analog Devices, ADSP-2106x SHARC User’s Manual, 1997.
29. Motorola Inc., Motorola DSP Assembler Reference Manual, 1994.
30. Motorola Inc., DSP56xxx Digital Signal Processor: User’s Manual, 1993.