Document 77: Introduction to the TMS320 Family of Digital Signal Processors (.docx)

DOCUMENT INFORMATION

Basic information

Format
Number of pages	37
File size	416.92 KB

Contents

Papamichalis, P. “Introduction to the TMS320 Family of Digital Signal Processors.” Digital Signal Processing Handbook. Ed. Vijay K. Madisetti and Douglas B. Williams. Boca Raton: CRC Press LLC, 1999.
© 1999 by CRC Press LLC

77
Introduction to the TMS320 Family of Digital Signal Processors
Panos Papamichalis
Texas Instruments

77.1 Introduction
77.2 Fixed-Point Devices: TMS320C25 Architecture and Fundamental Features
77.3 TMS320C25 Memory Organization and Access
77.4 TMS320C25 Multiplier and ALU
77.5 Other Architectural Features of the TMS320C25
77.6 TMS320C25 Instruction Set
77.7 Input/Output Operations of the TMS320C25
77.8 Subroutines, Interrupts, and Stack on the TMS320C25
77.9 Introduction to the TMS320C30 Digital Signal Processor
77.10 TMS320C30 Memory Organization and Access
77.11 Multiplier and ALU of the TMS320C30
77.12 Other Architectural Features of the TMS320C30
77.13 TMS320C30 Instruction Set
77.14 Other Generations and Devices in the TMS320 Family
References

This article discusses the architecture and the hardware characteristics of the TMS320 family of Digital Signal Processors. The TMS320 family includes several generations of programmable processors with several devices in each generation. Since the programmable processors are split between fixed-point and floating-point devices, both categories are examined in some detail. The TMS320C25 serves here as a simple example for the fixed-point processor family, while the TMS320C30 is used for the floating-point family.

77.1 Introduction

Since its introduction in 1982 with the TMS32010 processor, the TMS320 family of DSPs has been exceedingly popular. Different members of this family were introduced to address the existing needs for real-time processing, but then designers capitalized on the features of the devices to create solutions and products in ways never imagined before. In turn, these innovations fed the architectural and hardware configurations of newer generations of devices.
Digital Signal Processing encompasses a variety of applications, such as digital filtering, speech and audio processing, image and video processing, and control. All DSP applications share some common characteristics:

• The algorithms used are mathematically intensive. A typical example is the computation of an FIR filter, implemented as a sum of products. This operation involves many multiplications combined with additions.
• DSP algorithms must typically run in real time; i.e., the processing of a segment of the arriving signal must be completed before the next segment arrives, or else data will be lost.
• DSP techniques are under constant development. This implies that DSP systems should be flexible enough to support changes and improvements in the state of the art.

As a result, programmable processors have been the preferred way of implementation. In recent times, though, fixed-function devices have also been introduced to address high-volume consumer applications with low-cost requirements.

These needs are addressed in the TMS320 family of DSPs by using appropriate architecture, instruction sets, and I/O capabilities, as well as the raw speed of the devices. However, it should be kept in mind that these features do not cover all the aspects describing a DSP device, especially a programmable one. Availability and quality of software and hardware development tools (such as compilers, assemblers, linkers, simulators, hardware emulators, and development systems), application notes, third-party products and support, hot-line support, etc. play an important role in how easy it will be to develop an application on the DSP processor. The TMS320 family has very extensive support of this kind, but its description goes beyond the scope of this article. The interested reader should contact the TI DSP hotline (Tel. 713-274-2320).
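The first characteristic listed above, an FIR filter computed as a sum of products, can be illustrated with a short C sketch. It is written for a general-purpose host rather than for a TMS320 device, the function names are hypothetical, and the choice of 16-bit samples with a 32-bit accumulator is an assumption made to echo the word sizes discussed later in this article. Each output sample costs one multiply and one add per tap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: shift the delay line by one position and insert the
 * newest sample at index 0, so history[k] holds x[n-k]. */
void fir_push_sample(int16_t *history, size_t ntaps, int16_t sample)
{
    if (ntaps == 0)
        return;
    for (size_t k = ntaps - 1; k > 0; --k)
        history[k] = history[k - 1];
    history[0] = sample;
}

/* Hypothetical helper: y[n] = sum over k of h[k] * x[n-k].
 * 16-bit operands, 32-bit accumulation; assumes the filter gain is small
 * enough that the running sum never overflows 32 bits. */
int32_t fir_sum_of_products(const int16_t *h, const int16_t *history, size_t ntaps)
{
    int32_t acc = 0;
    for (size_t k = 0; k < ntaps; ++k)
        acc += (int32_t)h[k] * history[k]; /* one multiply and one add per tap */
    return acc;
}
```

In a real-time loop, each arriving sample would be passed to fir_push_sample and then one output computed with fir_sum_of_products, all before the next sample arrives.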
For the purposes of this article, two devices have been selected to be highlighted from the Texas Instruments TMS320 family of digital signal processors. One is the TMS320C25, a 16-bit, fixed-point DSP, and the other is the TMS320C30, a 32-bit, floating-point DSP. As a shorthand notation, they will be called ‘C25 and ‘C30, respectively. The choice was made so that both fixed-point and floating-point issues are considered. Newer (and more sophisticated) generations have been added to the TMS320 family but, since the objective of this article is to be tutorial, they will be discussed as extensions of the ‘C25 and the ‘C30. Examples are other members of the ‘C2x and the ‘C3x generations, as well as the TMS320C5x generation (‘C5x for short) of fixed-point devices and the TMS320C4x (‘C4x) of floating-point devices. Customizable and fixed-function extensions of this family of processors will also be discussed.

Texas Instruments, like all vendors of DSP devices, publishes detailed User’s Guides that explain at great length the features and the operation of the devices. Each of these User’s Guides is a fairly thick book, so it is not possible (or desirable) to repeat all this information here. Instead, the objective of this article is to give an overview of the basic features of each device. If more detail is necessary for an application, the reader is expected to refer to the User’s Guides, which are easy to obtain from Texas Instruments.

77.2 Fixed-Point Devices: TMS320C25 Architecture and Fundamental Features

The Texas Instruments TMS320C25 is a fast, 16-bit, fixed-point digital signal processor. The speed of the device is 10 MHz, which corresponds to a cycle time of 100 ns. Since the majority of the instructions execute in a single cycle, the figure of 100 ns also indicates how long it takes to execute one instruction. Alternatively, we can say that the device can execute 10 million instructions per second (MIPS).
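The timing figures above are simple arithmetic. The C sketch below reproduces them, including the divide-by-four relation CLKOUT = CLKIN/4 and the 80-ns spinoff described in the next paragraph. The helper names are hypothetical, and, like the text, the sketch assumes one instruction completes per cycle.

```c
/* Hypothetical helper: internal clock (CLKOUT) from the external CLKIN
 * frequency, using CLKOUT = CLKIN / 4. */
double clkout_hz(double clkin_hz) { return clkin_hz / 4.0; }

/* Hypothetical helper: cycle time in nanoseconds for a clock in Hz. */
double cycle_time_ns(double clk_hz) { return 1e9 / clk_hz; }

/* Hypothetical helper: millions of instructions per second, assuming one
 * instruction per cycle (true for most, but not all, instructions). */
double mips(double cycle_ns) { return 1e3 / cycle_ns; }
```

For example, a 40-MHz CLKIN gives a 10-MHz internal clock, a 100-ns cycle, and 10 MIPS, while mips(80.0) gives 12.5 for the faster spinoff.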
The actual signal from the external oscillator or crystal has a frequency four times higher than the internal clock, at 40 MHz. This frequency is then divided on-chip to generate the internal clock with a period of 100 ns. Figure 77.1 shows the relationship between the input clock CLKIN from the external oscillator and the output clock CLKOUT. CLKOUT is the same as the clock of the device, and it is related to CLKIN by the equation CLKOUT = CLKIN/4. Note that in Fig. 77.1 the shape of the signal is idealized, ignoring rise and fall times.

FIGURE 77.1: Clock timing of the TMS320C25. CLKIN = external oscillator; CLKOUT = clock of the device.

Newer versions of the TMS320C25 operate at higher frequencies. For instance, there is a spinoff that has a cycle time of 80 ns, resulting in 12.5 MIPS operation. There are also slower (and cheaper) versions for applications that do not need this computational power.

Figure 77.2 shows in simplified form the key features of the TMS320C25. The major parts of the DSP processor are the memory, the Central Processing Unit (CPU), the ports, and the peripherals. Each of these parts will be examined in more detail later. The on-chip memory consists of 544 words of RAM (read/write memory) and 4K words of ROM (read-only memory). In the notation used here, 1K = 1024 words, and 4K = 4 × 1024 = 4096 words. Each word is 16 bits wide, and memory sizes are given in 16-bit words, not in bytes (as is the custom for microprocessors). Of the 544 words of RAM, 256 words can be used as either program or data memory, while the rest is data memory only. All 4K of on-chip ROM is program memory. Overall, the device can address 64K words of data memory and 64K words of program memory. Except for what resides on-chip, the rest of the memory is external, supplied by the designer. The CPU is the heart of the processor.
Its most important feature, distinguishing it from traditional microprocessors, is a hardware multiplier that is capable of performing a 16 × 16-bit multiplication in a single cycle. To preserve higher intermediate accuracy, the full 32-bit product is saved in a product register. The other important part of the CPU is the Arithmetic Logic Unit (ALU), which performs additions, subtractions, and logical operations. Again, for increased intermediate accuracy, there is a 32-bit accumulator to handle all the ALU operations. All the arithmetic and logical functions are accumulator-based. In other words, these operations have two operands, one of which is always the accumulator, and the result of the operation is stored in the accumulator. Because of this approach, the instruction format is very simple: it only needs to indicate what the other operand is. This architectural philosophy is very popular, but it is not universal. For instance, as discussed later, the TMS320C30 takes a different approach, with several “accumulators” in what is called a register file. Other components of the TMS320C25 CPU are several shifters that facilitate manipulation of the data and increase the throughput of the device by performing shifting operations in parallel with other functions. As part of the CPU, there are also eight auxiliary registers that can be used as memory pointers or loop counters. There are two status registers and an 8-deep hardware stack. The stack is used to store the memory address where the program will continue execution after a temporary diversion to a subroutine.

FIGURE 77.2: Key architectural features of the TMS320C25.

To communicate with external devices, the TMS320C25 has 16 input and 16 output parallel ports. It also has a serial port that can serve the same purpose. The serial port is one of the peripherals that have been implemented on chip.
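The accumulator-based datapath described above can be modeled in a few lines of C. This is a behavioral sketch only: the register names (T, P, ACC) follow this chapter, and the function names mirror the C25 mnemonics MPY, PAC, APAC, SPAC, and ADD (PAC, APAC, and SPAC are introduced with the multiplier later in the chapter), but overflow mode, carry, and the shifters are deliberately not modeled. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical register-level model of the 'C25 CPU core. */
typedef struct {
    int16_t t;   /* multiplicand (T) register */
    int32_t p;   /* 32-bit product register   */
    int32_t acc; /* 32-bit accumulator        */
} c25_cpu;

/* 16 x 16 multiply: the full 32-bit product is kept in P, so no bits are lost
 * (even -32768 * -32768 = 2^30 fits). */
void c25_mpy(c25_cpu *cpu, int16_t operand) { cpu->p = (int32_t)cpu->t * operand; }

/* Accumulator-based ALU operations: one operand is always ACC, and the result
 * is written back to ACC. This sketch assumes the results never overflow. */
void c25_pac(c25_cpu *cpu)  { cpu->acc = cpu->p; }   /* ACC  = P */
void c25_apac(c25_cpu *cpu) { cpu->acc += cpu->p; }  /* ACC += P */
void c25_spac(c25_cpu *cpu) { cpu->acc -= cpu->p; }  /* ACC -= P */
void c25_add(c25_cpu *cpu, int16_t operand) { cpu->acc += operand; }
```

Because every ALU function implicitly targets ACC, each call names only the second operand, which is the instruction-format economy the text describes.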
Other peripherals include the interrupt mask, the global memory capability, and a timer. The above components of the TMS320C25 are examined in more detail below.

The device has 68 pins that are designated to perform certain functions and to communicate with other devices on the same board. The names of the signals and the corresponding definitions appear in Table 77.1. The first column of the table gives the pin names. Note that a bar over the name indicates that the pin is in the active position when it is electrically low. For instance, if the pins take the voltage levels of 0 V and 5 V, a pin indicated with an overbar is asserted when it is set at 0 V; otherwise, assertion occurs at 5 V. The second column indicates whether the pin is used for input to the device, output from the device, or both. The third column gives a description of the pin functionality.

Understanding the functionality of the device pins is as important as understanding the internal architecture, because it provides the designer with the tools available to communicate with the external world. The DSP device needs to receive data and, often, instructions from external sources, and send the results back to the external world. Depending on the paths available for such transactions, the design of a program can take very different forms. Within this framework, it is up to the designer to generate implementations that are ingenious and elegant.

The TMS320C25 is programmed in its own assembly language, which consists of 133 instructions that perform general-purpose and DSP-specific functions. Familiarity with the instruction set and with the device architecture are the two components of efficient program implementation. High-level-language compilers have also been developed that make writing programs easier. For the TMS320C25, a C compiler is available.
However, there is always a loss of efficiency when programming in high-level languages, and this may not be acceptable in computation-bound real-time systems. Besides, for a complete understanding of the device it is necessary to consider the assembly language.

TABLE 77.1 Names and Functionality of the 68 Pins of the TMS320C25

Signals       I/O/Z (a)  Definition
VCC           I          5-V supply pins
VSS           I          Ground pins
X1            O          Output from internal oscillator for crystal
X2/CLKIN      I          Input to internal oscillator from crystal or external clock
CLKOUT1       O          Master clock output (crystal or CLKIN frequency/4)
CLKOUT2       O          A second clock output signal
D15-D0        I/O/Z      16-bit data bus D15 (MSB) through D0 (LSB). Multiplexed between program, data, and I/O spaces.
A15-A0        O/Z        16-bit address bus A15 (MSB) through A0 (LSB)
PS, DS, IS    O/Z        Program, data, and I/O space select signals
R/W           O/Z        Read/write signal
STRB          O/Z        Strobe signal
RS            I          Reset input
INT2-INT0     I          External user interrupt inputs
MP/MC         I          Microprocessor/microcomputer mode select pin
MSC           O          Microstate complete signal
IACK          O          Interrupt acknowledge signal
READY         I          Data ready input. Asserted by external logic when using slower devices to indicate that the current bus transaction is complete.
BR            O          Bus request signal. Asserted when the TMS320C25 requires access to an external global data memory space.
XF            O          External flag output (latched software-programmable signal)
HOLD          I          Hold input. When asserted, the TMS320C25 goes into an idle mode and places the data, address, and control lines in the high-impedance state.
HOLDA         O          Hold acknowledge signal
SYNC          I          Synchronization input
BIO           I          Branch control input. Polled by the BIOZ instruction.
DR            I          Serial data receive input
CLKR          I          Clock for receive input for serial port
FSR           I          Frame synchronization pulse for receive input
DX            O/Z        Serial data transmit output
CLKX          I          Clock for transmit output for serial port
FSX           I/O/Z      Frame synchronization pulse for transmit. Configurable as either an input or an output.
(a) I/O/Z denotes input/output/high-impedance state.
Note: The first column is the pin name; the second column indicates whether it is an input or an output pin; the third column gives a description of the pin functionality.

A very important characteristic of the device is its Harvard architecture. In the Harvard architecture (see Fig. 77.3), the program and data memory spaces are separate and are accessed by different buses. One bus accesses the program memory space to fetch the instructions, while another bus is used to bring operands from the data memory space and store the results back to memory. The objective of this approach is to increase throughput by bringing in instructions and data in parallel. An alternate philosophy is the von Neumann architecture. The von Neumann architecture (see Fig. 77.4) uses a single bus and a unified memory space. Unification of the memory space is convenient for partitioning it between program and data, but it presents a bottleneck, since both data and program instructions must use the same path and, hence, must be multiplexed. The Harvard architecture of multiple buses is used in digital signal processors because increased throughput is of paramount importance in real-time systems.

The difference between the architectures is important because it influences the programming style. In the Harvard architecture, two memory locations can have the same address, as long as one of them is in the data space and the other is in the program space. Hence, when the programmer uses an address label, he has to be alert as to which space he is referring to. Another restriction of the Harvard architecture is that the data memory cannot be initialized during loading, because loading refers only to placing the program in memory (and the program memory is separate from the data memory). Data memory can be initialized during execution only, so the programmer must incorporate such initialization in the program code.
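The two Harvard-architecture consequences just described (one address naming two different words, and data memory that can only be filled at run time) can be made concrete with a toy C memory model. Everything here is hypothetical and illustrative: the struct, the function names, and the addresses used in the usage note are not taken from the device. The table_read helper is only in the spirit of the TBLR instruction described in the next paragraph; the real instruction takes its program address from the accumulator, whereas here it is simply a parameter.

```c
#include <stdint.h>

#define SPACE_WORDS 65536u /* 64K words in each space */

/* Hypothetical toy model: two separate spaces addressed by the same 16-bit numbers. */
typedef struct {
    uint16_t program[SPACE_WORDS]; /* written by the loader                */
    uint16_t data[SPACE_WORDS];    /* must be initialized by running code  */
} harvard_mem;

/* Loading places words in program space only. */
void load_program_word(harvard_mem *m, uint16_t addr, uint16_t word) { m->program[addr] = word; }

uint16_t fetch_program(const harvard_mem *m, uint16_t addr) { return m->program[addr]; }
uint16_t read_data(const harvard_mem *m, uint16_t addr)     { return m->data[addr]; }
void write_data(harvard_mem *m, uint16_t addr, uint16_t v)  { m->data[addr] = v; }

/* Bridge between the spaces, in the spirit of a table read: copy one
 * program-space word into data space at run time. */
void table_read(harvard_mem *m, uint16_t prog_addr, uint16_t data_addr)
{
    m->data[data_addr] = m->program[prog_addr];
}

/* Run-time initialization of a data table stored in program memory. */
void init_data_from_program(harvard_mem *m, uint16_t prog_start,
                            uint16_t data_start, uint16_t n)
{
    for (uint16_t i = 0; i < n; ++i)
        table_read(m, (uint16_t)(prog_start + i), (uint16_t)(data_start + i));
}
```

The model occupies 256 KB, so it is best declared static rather than on the stack. After load_program_word(&m, 0x0200, 0xCAFE) and write_data(&m, 0x0200, 0x1234), address 0200h holds 0xCAFE in program space and 0x1234 in data space (both values arbitrary).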
As will be seen later, these restrictions of the Harvard architecture have been removed from the TMS320C30 while retaining the convenient feature of multiple buses.

Figure 77.5 shows a functional block diagram of the TMS320C25 architecture. The Harvard architecture of the device is immediately apparent from the separate program and data buses. What is not apparent is that the architecture has been modified to permit communication between the two buses. Through such communication, it is possible to transfer data between the program and data memory spaces, so the program memory space can also be used to store tables. The transfer takes place by using special instructions such as TBLR (Table Read), TBLW (Table Write), and BLKP (Block transfer from Program memory). As shown in the block diagram, the program ROM is linked to the program bus, while data RAM blocks B1 and B2 are linked to the data bus. The RAM block B0 can be configured either as program or data memory (using the instructions CNFP and CNFD), and it is multiplexed with both buses. The different segments, such as the multiplier, the ALU, and the memories, are examined in more detail below.

FIGURE 77.3: Simplified block diagram of the Harvard architecture.
FIGURE 77.4: Simplified block diagram of the von Neumann architecture.

77.3 TMS320C25 Memory Organization and Access

Besides the on-chip memory (RAM and ROM), the TMS320C25 can access external memory through the external bus. This bus consists of the 16 address pins A0-A15 and the 16 data pins D0-D15. The address pins carry the address to be accessed, while the data pins carry the instruction word or the operand, depending on whether program or data memory is accessed. The bus can access either program or data memory, the difference being indicated by which of the pins PS and DS (with overbars) becomes active. The activation is done automatically when, during execution, an instruction or a piece of data needs to be fetched.
Since the address is 16 bits wide, the maximum memory space is 64K words for program and 64K words for data.

FIGURE 77.5: Functional block diagram of the TMS320C25 architecture.
FIGURE 77.6: Memory maps for program and data memory of the TMS320C25.

The device starts execution after a reset signal, i.e., after the RS pin is pulled low for a short period of time. Execution always begins at program memory location 0, where there should be an instruction that directs program execution to the appropriate location. This is accomplished by a branch instruction, B PROG, which loads the program counter with the program memory address that has the label PROG (or any other label you choose). Execution then continues from address PROG, where, presumably, a useful program has been placed. Clearly, program memory location 0 is very important, and you need to know where it is physically located. The TMS320C25 gives you the flexibility to use as location 0 either the first location of the on-chip ROM or the first location of external memory. In the first case, we say that the device operates in the microcomputer mode, while in the second it is in the microprocessor mode. In the microprocessor mode, the on-chip ROM is ignored altogether. You choose between the two modes by pulling the MP/MC pin high (microprocessor mode) or low (microcomputer mode). The microcomputer mode is useful for production purposes, while for laboratory and development work the microprocessor mode is used exclusively. Figure 77.6 shows the memory configuration of the TMS320C25, where the microprocessor and microcomputer configurations of the program memory are depicted separately. The data memory is partitioned into 512 sections, called pages, of 128 words each. The reason for the partitioning is addressing, as will be discussed below.
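The choice of where program location 0 lives can be expressed as a small decoding function. This C sketch is hypothetical and makes two assumptions beyond the text: that the 4K-word ROM occupies program addresses 0000h-0FFFh in microcomputer mode (contiguous from location 0), and that block B0, when configured as program memory with CNFP, can be ignored. It also checks that 512 pages of 128 words cover the 64K data space.

```c
#include <stdbool.h>
#include <stdint.h>

#define RESET_ADDRESS  0x0000u /* execution begins here after reset             */
#define ONCHIP_ROM_END 0x0FFFu /* assumption: 4K-word ROM at 0000h-0FFFh        */
#define PAGE_WORDS     128u
#define DATA_PAGES     512u    /* 512 x 128 = 65536 words of data space         */

typedef enum { FROM_ONCHIP_ROM, FROM_EXTERNAL_MEMORY } fetch_source;

/* Hypothetical decoder. mp_mc_high is the level of the MP/MC pin:
 * true = microprocessor mode (ROM ignored), false = microcomputer mode.
 * Block B0 mapped into program space is not modeled. */
fetch_source program_fetch_source(bool mp_mc_high, uint16_t addr)
{
    if (!mp_mc_high && addr <= ONCHIP_ROM_END)
        return FROM_ONCHIP_ROM;  /* microcomputer mode, inside the ROM  */
    return FROM_EXTERNAL_MEMORY; /* microprocessor mode, or above ROM   */
}
```

With the pin low, the reset fetch at RESET_ADDRESS comes from the on-chip ROM; with the pin high, the same fetch goes to external memory, which is why development boards place the initial B PROG instruction in external memory.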
In Fig. 77.6, memory boundaries of the 64K memory space are shown in both decimal and hexadecimal notation (hexadecimal notation is indicated by an “h” or “H” at the end). Compare this map with the block diagram in Fig. 77.5.

As mentioned earlier, in two-operand operations, one of the operands resides in the accumulator, and the result is also placed in the accumulator. (The only exception is the multiplication operation, examined later.) The other operand can either reside in memory or be part of the instruction. In the latter case, the value to be combined with the accumulator is explicitly specified in the instruction, and this addressing mode is called the immediate addressing mode. In the TMS320C25 assembly language, immediate addressing mode instructions are indicated by a “K” at the end of the instruction. For example, the instruction ADDK 5 increments the contents of the accumulator by 5.

If the value to be operated upon resides in memory, there are two ways to access it: either by specifying the memory address directly (direct addressing) or by using a register that holds the address of that number (indirect addressing). As a general rule, it is desirable to describe an instruction as briefly as possible, so that the whole description can be held in one 16-bit word. Then, when the program is executed, only one word needs to be fetched before all the information from the instruction is available for execution. This is not always possible, and there are two-word instructions as well, but the chip architects always strive to achieve one-word instructions. In the direct addressing mode, a full description of a memory address would require a 16-bit word by itself, because the memory space is 64K words. To reduce that requirement, the memory space is divided into 512 pages of 128 words each. An instruction using direct addressing contains the 7 bits indicating which word you want to access within a page.
The page number (9 bits) is stored in a separate register (actually, part of a register), called the Data Page pointer (DP). You store the page number in the DP by using the instructions LDP (Load Data Page pointer) or LDPK (Load Data Page pointer immediate).

In the indirect addressing mode, the data memory address is held in a register that acts as a memory pointer. There are eight such registers available, called auxiliary registers, AR0-AR7. The auxiliary registers can also be used for other functions, such as loop counters. To save bits in the instruction, the auxiliary register used as the memory pointer is not indicated explicitly; instead, its number is stored in a separate register (actually, part of a register), the auxiliary register pointer (ARP). In other words, there is the concept of the “current register.” In an operation using indirect addressing, the contents of the current auxiliary register point to the desired memory location. The current AR is specified by the contents of the ARP, as shown in Fig. 77.7. In an instruction, indirect addressing is indicated by an asterisk.

FIGURE 77.7: Example of indirect addressing mode.

A “+” sign at the end of an instruction using indirect addressing means “after the present memory access, increment the contents of the current auxiliary register by 1.” This increment is done in parallel with the operation itself, such as the load-accumulator operation in the example. This autoincrementing of the auxiliary register is an optional operation that offers additional flexibility to the programmer, and it is not the only one available. The TMS320C25 has an auxiliary register arithmetic unit (ARAU, see Fig. 77.5) that can execute [...]...
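The two data-addressing schemes can be sketched in C as follows. The direct address is assumed to be formed by placing the 9-bit DP above the 7-bit offset carried in the instruction, which is what 512 pages of 128 words implies. The indirect helpers model only the plain “*” form and the post-increment “*+” form; the other ARAU operations are not modeled, and all names are hypothetical.

```c
#include <stdint.h>

/* Direct addressing: 9-bit page (DP) concatenated with the 7-bit offset. */
uint16_t direct_address(uint16_t dp, uint16_t offset)
{
    return (uint16_t)(((dp & 0x1FFu) << 7) | (offset & 0x7Fu));
}

/* Inverse split, e.g. to find which page an address belongs to. */
uint16_t page_of(uint16_t addr)   { return (uint16_t)(addr >> 7); }
uint16_t offset_of(uint16_t addr) { return (uint16_t)(addr & 0x7Fu); }

/* Indirect addressing: the ARP selects the current auxiliary register. */
typedef struct {
    uint16_t ar[8]; /* AR0-AR7 */
    uint16_t arp;   /* current auxiliary register number (0-7) */
} aux_regs;

/* "*": the current AR holds the data memory address. */
uint16_t indirect_address(const aux_regs *r) { return r->ar[r->arp & 7u]; }

/* "*+": use the current AR, then increment it by 1 after the access. */
uint16_t indirect_address_postinc(aux_regs *r)
{
    uint16_t *cur = &r->ar[r->arp & 7u];
    uint16_t addr = *cur;
    *cur = (uint16_t)(addr + 1u);
    return addr;
}
```

For example, with DP = 6 a direct-addressed instruction carrying offset 5 reaches data address 6 × 128 + 5 = 773 (0305h), and repeated “*+” accesses step through consecutive words, as when walking along an array.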
... contents on the top of the stack (TOS). The TOS now contains the address of the instruction to be executed after returning from the subroutine.
• Loads the address SUBRTN on the PC.
• Starts execution from where the PC is pointing (i.e., from location SUBRTN).
At the end of the subroutine execution, a return instruction (RET) will pop the contents of the top of the stack onto the program counter, and the program...

... on the accumulator, which is also 32 bits wide. The contents of the product register can be loaded into the accumulator, overwriting whatever was there, using the PAC (product to accumulator) instruction. They can also be added to or subtracted from the accumulator using the instructions APAC or SPAC.
FIGURE 77.8: Diagram of the TMS320C25 multiplier and ALU.
When moving the contents of the T-register to the...

... devices of the TMS320 family in order to examine their features in detail. However, the TMS320 family consists of five generations (three fixed-point and two floating-point) of digital signal processors (as well as the latest addition, the TMS320C8x generation, also known as MVP, Multimedia Video Processors). The fixed-point devices are members of the TMS320C1x, TMS320C2x, or TMS320C5x...

... with the block repeat instruction. The repeat-start (RS) register contains the beginning of the loop, and the repeat-end (RE) register the end of the loop. These registers are initialized automatically by the processor, but they are available to the user in case he needs to save them. On the TMS320C30, there are several internal and external interrupts, which are prioritized, i.e., when several of the interrupts occur at the...
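The first excerpt above describes what a subroutine call does with the hardware stack. The C sketch below models that CALL/RET sequence with an 8-level stack, matching the 8-deep stack mentioned earlier. The names are hypothetical, the return address is passed in explicitly rather than derived from the instruction length, and the behavior at the limits (deepest entry discarded on a ninth push, bottom level left unchanged on a pop) is an assumption for illustration that should be checked against the User’s Guide.

```c
#include <stdint.h>

#define STACK_DEPTH 8

/* Hypothetical model: level[0] is the top of stack (TOS). */
typedef struct {
    uint16_t level[STACK_DEPTH];
    uint16_t pc;
} hw_stack_cpu;

/* Push: every level moves down one place. Assumption: on overflow the
 * deepest entry is lost. */
static void stack_push(hw_stack_cpu *c, uint16_t value)
{
    for (int i = STACK_DEPTH - 1; i > 0; --i)
        c->level[i] = c->level[i - 1];
    c->level[0] = value;
}

/* Pop: every level moves up one place. Assumption: the bottom level is
 * left unchanged. */
static uint16_t stack_pop(hw_stack_cpu *c)
{
    uint16_t top = c->level[0];
    for (int i = 0; i < STACK_DEPTH - 1; ++i)
        c->level[i] = c->level[i + 1];
    return top;
}

/* CALL: save the return address on the TOS, then load SUBRTN into the PC. */
void cpu_call(hw_stack_cpu *c, uint16_t return_addr, uint16_t subrtn)
{
    stack_push(c, return_addr);
    c->pc = subrtn;
}

/* RET: pop the TOS into the program counter. */
void cpu_ret(hw_stack_cpu *c) { c->pc = stack_pop(c); }
```

Because the stack is last-in, first-out, nested calls unwind in the right order: a subroutine that itself calls another returns to its own caller only after the inner return has consumed the inner return address.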
... shifter to the ALU and the accumulator can shift the input value to the left by up to 16 locations, while output shifters from the accumulator can shift either the high or the low part of the accumulator by up to 7 locations to the left. A construct that appears very often in mathematical computations is the sum of products. Sums of products appear in the computation of dot products, in matrix multiplication,...

... the ALU. The CPU configuration is shown in Fig. 77.16, which depicts the multiplier and the ALU of the TMS320C30. The hardware multiplier can perform both integer and floating-point multiplications in a single machine cycle.
FIGURE 77.16: Central processing unit (CPU) of the TMS320C30.
The inputs to the multiplier come from either the memory or the registers of the register file. The outputs are placed in the register...

... reinstated in the program counter when the execution returns from the subroutine call. The programmer has control over the stack by using the PUSH, PSHD, POP, and POPD instructions. The PUSH and POP operations push the accumulator onto the stack or pop the top of the stack into the accumulator, respectively. PSHD and POPD perform the same functions, but with memory locations instead of the accumulator. Occasionally the program...

... use the stack to pass arguments to subroutines or to save information during an interrupt. In other words, the stack is a convenient scratch-pad that you designate at the beginning, so that you do not have to worry about where to store some temporary values.

77.11 Multiplier and ALU of the TMS320C30

The heart of the TMS320C30 is the CPU, consisting primarily of the multiplier and the...

... absolute value of the number is not large enough to fill all the bits of the word, there will be more than one sign bit. As seen from Fig. 77.8, the multiplier path is not the only way to access the accumulator. Actually, the ALU and the accumulator support a wealth of arithmetic (ADD, SUB, etc.) and logical (OR, AND, XOR, etc.) instructions, in addition to load and store instructions for the accumulator (LAC,...

... certain functions, and to communicate with other devices on the same board. The names of the signals and the corresponding definitions appear in Table 77.3. The first column of the table gives the pin names; the second one indicates if the pin is used for input or output; the third column gives a description of the pin functionality. Note that a bar over the name indicates that the pin is in the active position...
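The excerpts above mention the output shifters and the extra sign bits that appear when a value does not fill the whole word. A common consequence, shown here as a C sketch, is the fractional (Q15) multiply: the product of two Q15 numbers is a Q30 value with two sign bits, so shifting it left by one position (well within the 7-position output shift mentioned above) and keeping the high 16 bits gives a Q15 result. The Q15 format, the helper names, and the wraparound of -1 × -1 are assumptions for illustration, not details taken from this article.

```c
#include <stdint.h>

/* Reinterpret a 16-bit pattern as two's complement without relying on
 * implementation-defined conversions. */
static int16_t from_bits16(uint16_t bits)
{
    return bits < 0x8000u ? (int16_t)bits : (int16_t)((int32_t)bits - 0x10000);
}

/* Hypothetical helper: count the leading bits equal to the sign bit,
 * the sign bit itself included. */
int sign_bits(int32_t v)
{
    uint32_t u = (uint32_t)v;
    uint32_t sign = u >> 31;
    int n = 1;
    for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == sign; --bit)
        ++n;
    return n;
}

/* Hypothetical Q15 x Q15 multiply: form the 32-bit Q30 product, shift left
 * by one to drop the redundant sign bit, and keep the high 16 bits, as a
 * store of the accumulator's high half with a 1-position shift would.
 * -32768 * -32768 (-1 x -1) wraps to -32768 (-1) in this sketch. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * b;          /* Q30, two sign bits */
    uint32_t shifted = (uint32_t)product << 1; /* Q31                */
    return from_bits16((uint16_t)(shifted >> 16));
}
```

For example, 0.5 × 0.5 in Q15 is q15_mul(16384, 16384), which yields 8192, i.e., 0.25; the raw product 268435456 has three sign bits, and even the largest positive product, 32767 × 32767, still has two.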

Posted: 25/12/2013, 06:16
