ARM Architecture Reference Manual- P26

VFP Addressing Modes

5.1.3 Scalar operations

If the destination register lies in the first bank of eight registers, the instruction specifies a scalar operation:

    if d_bank == 0 then
        vec_len = 1
        Sd[0] = d_num
        Sn[0] = n_num
        Sm[0] = m_num

Note

Source operands
    The source operands are always scalars, regardless of which bank they are in. This allows individual elements of vectors to be used as scalars.

ARM DDI 0100E. Copyright © 1996-2000 ARM Limited. All rights reserved.

5.1.4 Mixed vector/scalar operations

If the destination register specified in the instruction does not lie in the first bank of eight registers, but the second source register does, then the destination register and first source register specify vectors and the second source register specifies a scalar:

    if d_bank != 0 and m_bank == 0 then
        vec_len = vector length specified by FPSCR
        for i = 0 to vec_len-1
            Sd[i] = (d_bank << 3) | d_index
            Sn[i] = (n_bank << 3) | n_index
            Sm[i] = m_num
            d_index = d_index + (vector stride specified by FPSCR)
            if d_index > 7 then
                d_index = d_index - 8
            n_index = n_index + (vector stride specified by FPSCR)
            if n_index > 7 then
                n_index = n_index - 8

Notes

First source operand
    The first operand is always a vector, regardless of which bank it is in. This allows a set of consecutive registers in the first bank to be treated as a vector.

Vector wrap-around
    A vector operand must not wrap around so that it re-uses its first element. Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR specifies a vector stride of 1, this is not a restriction, because the vector length is at most 8. When the FPSCR specifies a vector stride of 2, it implies that the vector length must be at most 4.

Operand overlap
    If two operands overlap, they must be identical both in terms of which registers are accessed and the order in which they are accessed. Otherwise, the results of the instruction are UNPREDICTABLE. This implies that:
    • If the set of register numbers generated in Sd[i] overlaps the set of register numbers generated in Sn[i], then d_num and n_num must be identical.
    • If the set of register numbers generated in Sn[i] includes m_num, the vector length must be 1.
    It is impossible for the set of register numbers generated in Sd[i] to include m_num, because they lie in different banks.

5.1.5 Vector operations

If neither the destination register nor the second source register lies in the first bank of eight registers, then all register operands specify vectors:

    if d_bank != 0 and m_bank != 0 then
        vec_len = vector length specified by FPSCR
        for i = 0 to vec_len-1
            Sd[i] = (d_bank << 3) | d_index
            Sn[i] = (n_bank << 3) | n_index
            Sm[i] = (m_bank << 3) | m_index
            d_index = d_index + (vector stride specified by FPSCR)
            if d_index > 7 then
                d_index = d_index - 8
            n_index = n_index + (vector stride specified by FPSCR)
            if n_index > 7 then
                n_index = n_index - 8
            m_index = m_index + (vector stride specified by FPSCR)
            if m_index > 7 then
                m_index = m_index - 8

Notes

Vector wrap-around
    A vector operand must not wrap around so that it re-uses its first element. Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR specifies a vector stride of 1, this is not a restriction, since the vector length is at most 8. When the FPSCR specifies a vector stride of 2, it implies that the vector length must be at most 4.

Operand overlap
    If two operands overlap, they must be identical both in terms of which registers are accessed and the order in which they are accessed. Otherwise, the results of the instruction are UNPREDICTABLE. This implies that:
    • If the set of register numbers generated in Sd[i] overlaps the set of register numbers generated in Sn[i], then d_num and n_num must be identical.
    • If the set of register numbers generated in Sd[i] overlaps the set of register numbers generated in Sm[i], then d_num and m_num must be identical.
    • If the set of register numbers generated in Sn[i] overlaps the set of register numbers generated in Sm[i], then n_num and m_num must be identical.

5.2 Addressing Mode - Double-precision vectors (non-monadic)

[Instruction encoding diagram, bits 31 to 0. Fields: cond, Op, Dn, Dd, Dm.]

When the
vector length indicated by the FPSCR is greater than 1, the double-precision two-operand instructions FADDD, FDIVD, FMULD, FNMULD, and FSUBD can specify three different types of behavior:

• One arithmetic operation between two scalar values, yielding a scalar:

      ScalarA op ScalarB → ScalarD

  When this case is selected (see Scalar operations on page C5-11), it causes just one operation to be performed, overriding the vector length specified in the FPSCR. This allows scalar operations and vector operations to be mixed without the need to reprogram the FPSCR between them.

• A set of N arithmetic operations, where N is the vector length specified in the FPSCR, with the first operand scanning through a vector, the second operand remaining constant and the destination scanning through a vector:

      VectorA[0] op ScalarB → VectorD[0]
      VectorA[1] op ScalarB → VectorD[1]
      ...
      VectorA[N-1] op ScalarB → VectorD[N-1]

  This can be abbreviated to:

      VectorA op ScalarB → VectorD

• A set of N arithmetic operations, where N is the vector length specified in the FPSCR, with both operands and the destination scanning through vectors:

      VectorA[0] op VectorB[0] → VectorD[0]
      VectorA[1] op VectorB[1] → VectorD[1]
      ...
      VectorA[N-1] op VectorB[N-1] → VectorD[N-1]

  This can be abbreviated to:

      VectorA op VectorB → VectorD

The double-precision three-operand instructions FMACD, FMSCD, FNMACD and FNMSCD each use the same register for their addition/subtraction operand as for their destination. So they have three forms corresponding to the above three:

• A pure scalar form:

      ± (ScalarA * ScalarB) ± ScalarD → ScalarD

• A form in which the second multiplication operand is a scalar and everything else scans through vectors:

      ± (VectorA[0] * ScalarB) ± VectorD[0] → VectorD[0]
      ± (VectorA[1] * ScalarB) ± VectorD[1] → VectorD[1]
      ...
      ± (VectorA[N-1] * ScalarB) ± VectorD[N-1] → VectorD[N-1]

  This can be abbreviated to:

      ± (VectorA * ScalarB) ± VectorD → VectorD

• A form in which everything scans through a vector:

      ± (VectorA[0] * VectorB[0]) ± VectorD[0] → VectorD[0]
      ± (VectorA[1] * VectorB[1]) ± VectorD[1] → VectorD[1]
      ...
      ± (VectorA[N-1] * VectorB[N-1]) ± VectorD[N-1] → VectorD[N-1]

  This can be abbreviated to:

      ± (VectorA * VectorB) ± VectorD → VectorD

5.2.1 Register banks

To allow these various forms to be specified, the set of 16 double-precision registers is split into four banks, each of four registers. The form used by an instruction depends on which operands are in the first bank. The general principle behind the rules is that the first bank must be used to hold scalar operands while the other banks are used to hold vector operands. All destination register writes and many source register reads adhere to this principle, but some source register reads can result in scalar access to vector elements or vector accesses to groups of scalars.

A vector operand consists of 2-4 registers from a single bank, with the number of registers being specified by the vector length field of the FPSCR (see Vector length/stride control on page C2-22). The register number in the instruction specifies the register that contains the first element of the vector. Each successive element of the vector is formed by incrementing the register number by the value specified by the vector stride field of the FPSCR. If this causes the register number to overflow the top of the register bank, the register number wraps around to the bottom of the bank, as shown in Figure 5-2.

    Scalar bank   Vector bank   Vector bank   Vector bank
    d0            d4            d8            d12
    d1            d5            d9            d13
    d2            d6            d10           d14
    d3            d7            d11           d15

    Figure 5-2 Double-precision register banks

5.2.2 Operation

The following pages describe each of the three possible forms of the
addressing mode:
• Scalar operations on page C5-11
• Mixed vector/scalar operations on page C5-12
• Vector operations on page C5-13

In each case, the following values are generated:

    vec_len                    The number of individual operations specified by the instruction
    Dd[0] ... Dd[vec_len-1]    Destination registers of the individual operations
    Dn[0] ... Dn[vec_len-1]    First source registers of the individual operations
    Dm[0] ... Dm[vec_len-1]    Second source registers of the individual operations

The register numbers specified in the instruction are broken up into bank numbers and indices within the banks as follows:

    d_bank = Dd[3:2]    d_index = Dd[1:0]
    n_bank = Dn[3:2]    n_index = Dn[1:0]
    m_bank = Dm[3:2]    m_index = Dm[1:0]

Note

The case where the FPSCR specifies a vector length of 1 is not in fact a special case, since the rules for all three forms of the addressing mode simplify to the following when the vector length is 1:

    vec_len = 1
    Dd[0] = Dd
    Dn[0] = Dn
    Dm[0] = Dm

5.2.3 Scalar operations

If the destination register lies in the first bank of four registers, the instruction specifies a scalar operation:

    if d_bank == 0 then
        vec_len = 1
        Dd[0] = Dd
        Dn[0] = Dn
        Dm[0] = Dm

Notes

Source operands
    The source operands are always scalars, regardless of which bank they are in. This allows individual elements of vectors to be used as scalars.

5.2.4 Mixed vector/scalar operations

If the destination register specified in the instruction does not lie in the first bank of four registers, but the second source register does, then the destination register and first source register specify vectors and the second source register specifies a scalar:

    if d_bank != 0 and m_bank == 0 then
        vec_len = vector length specified by FPSCR
        for i = 0 to vec_len-1
            Dd[i] = (d_bank << 2) | d_index
            Dn[i] = (n_bank << 2) | n_index
            Dm[i] = Dm
            d_index = d_index + (vector stride specified by FPSCR)
            if d_index > 3 then
                d_index = d_index - 4
            n_index = n_index + (vector stride specified by FPSCR)
            if n_index > 3 then
                n_index = n_index - 4

Notes

First source operand
    The first operand is always a vector, regardless of which bank it is in. This allows a set of consecutive registers in the first bank to be treated as a vector.

Vector wrap-around
    A vector operand must not wrap around so that it re-uses its first element. Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR specifies a vector stride of 1, this implies that the vector length must be at most 4. When the FPSCR specifies a vector stride of 2, it implies that the vector length must be at most 2.

Operand overlap
    If two operands overlap, they must be identical both in terms of which registers are accessed and the order in which they are accessed. Otherwise, the results of the instruction are UNPREDICTABLE. This implies that:
    • If the set of register numbers generated in Dd[i] overlaps the set of register numbers generated in Dn[i], then Dd and Dn must be identical.
    • If the set of register numbers generated in Dn[i] includes Dm, then the vector length must be 1.
    It is impossible for the set of register numbers generated in Dd[i] to include Dm, because they lie in different banks.

5.2.5 Vector operations

If neither the destination register nor the second source register lies in the first bank of four registers, then all register operands specify vectors:

    if d_bank != 0 and m_bank != 0 then
        vec_len = vector length specified by FPSCR
        for i = 0 to vec_len-1
            Dd[i] = (d_bank << 2) | d_index
            Dn[i] = (n_bank << 2) | n_index
            Dm[i] = (m_bank << 2) | m_index
            d_index = d_index + (vector stride specified by FPSCR)
            if d_index > 3 then
                d_index = d_index - 4
            n_index = n_index + (vector stride specified by FPSCR)
            if n_index > 3 then
                n_index = n_index - 4
            m_index = m_index + (vector stride specified by FPSCR)
            if m_index > 3 then
                m_index = m_index - 4

Notes

Vector wrap-around
    A vector operand must not wrap around so that it re-uses its first element. Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR specifies a vector stride of 1, this implies that the vector length must be at most 4. When the FPSCR specifies a vector stride of 2, it implies that the vector length must be at most 2.

Operand overlap
    If two operands overlap, they must be identical both in terms of which registers are accessed and the order in which they are accessed. Otherwise, the results of the instruction are UNPREDICTABLE. This implies that:
    • If the set of register numbers generated in Dd[i] overlaps the set of register numbers generated in Dn[i], then Dd and Dn must be identical.
    • If the set of register numbers generated in Dd[i] overlaps the set of register numbers generated in Dm[i], then Dd and Dm must be identical.
    • If the set of register numbers generated in Dn[i] overlaps the set of register numbers generated in Dm[i], then Dn and Dm must be identical.

5.3 Addressing Mode - Single-precision vectors (monadic)

[Instruction encoding diagram, bits 31 to 0. Fields: cond, D, Op, Fd, M, Fm.]

When the vector length indicated by the FPSCR is greater than 1, the single-precision one-operand instructions FABSS, FCPYS, FNEGS, and FSQRTS can specify three different types of behavior:

• An operation on a scalar value, yielding a scalar:

      Op(ScalarB) → ScalarD

  When this case is selected (see Scalar-to-scalar operations on page C5-16), it causes just one operation to be performed, overriding the vector length specified in the FPSCR. This allows scalar operations and vector operations to be mixed without the need to reprogram the FPSCR between them.

• An operation on a scalar value, whose result is written to each of the N elements of a vector, where N is the vector length specified in the FPSCR:

      Op(ScalarB) → VectorD[0]
      Op(ScalarB) → VectorD[1]
      ...
      Op(ScalarB) → VectorD[N-1]

  This can be abbreviated to:

      Op(ScalarB) → VectorD

• A set of N operations, where N is the vector length specified in the FPSCR, with both the operand and the destination scanning through
vectors:

      Op(VectorB[0]) → VectorD[0]
      Op(VectorB[1]) → VectorD[1]
      ...
      Op(VectorB[N-1]) → VectorD[N-1]

  This can be abbreviated to:

      Op(VectorB) → VectorD

To allow these various forms to be specified, the set of 32 single-precision registers is split into four banks, each of eight registers. For a description of this, see Register banks on page C5-3.

5.4.1 Operation

The following pages describe each of the three possible forms of the addressing mode:
• Scalar-to-scalar operations on page C5-21
• Scalar-to-vector operations on page C5-22
• Vector-to-vector operations on page C5-23

In each case, the following values are generated:

    vec_len                    The number of individual operations specified by the instruction
    Dd[0] ... Dd[vec_len-1]    Destination registers of the individual operations
    Dm[0] ... Dm[vec_len-1]    Source registers of the individual operations

The register numbers specified in the instruction are broken up into bank numbers and indices within the banks as follows:

    d_bank = Dd[3:2]    d_index = Dd[1:0]
    m_bank = Dm[3:2]    m_index = Dm[1:0]

Note

The case where the FPSCR specifies a vector length of 1 is not in fact a special case, since the rules for all three forms of the addressing mode simplify to the following when the vector length is 1:

    vec_len = 1
    Dd[0] = Dd
    Dm[0] = Dm

5.4.2 Scalar-to-scalar operations

If the destination register lies in the first bank of four registers, the instruction specifies a scalar operation:

    if d_bank == 0 then
        vec_len = 1
        Dd[0] = Dd
        Dm[0] = Dm

Notes

Source operands
    The source operand is always a scalar, regardless of which bank it lies in. This allows individual elements of vectors to be used as scalars.

5.4.3 Scalar-to-vector operations

If the destination register specified in the instruction does not lie in the first bank of four registers, but the source register does, then the destination register specifies a vector and the source register specifies a scalar:

    if d_bank != 0 and m_bank == 0 then
        vec_len = vector length specified by FPSCR
        for i = 0 to vec_len-1
            Dd[i] = (d_bank << 2) | d_index
            Dm[i] = Dm
            d_index = d_index + (vector stride specified by FPSCR)
            if d_index > 3 then
                d_index = d_index - 4

Notes

Vector wrap-around
    A vector operand must not wrap around so that it re-uses its first element. Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR specifies a vector stride of 1, this implies that the vector length must be at most 4. When the FPSCR specifies a vector stride of 2, it implies that the vector length must be at most 2.

Operand overlap
    If the source and destination overlap, they must be identical both in terms of which registers are accessed and the order in which they are accessed. This implies that if the set of register numbers generated in Dn[i] includes Dm, the vector length must be 1.

5.4.4 Vector-to-vector operations

If neither the destination register nor the source register lies in the first bank of four registers, then both register operands specify vectors:

    if d_bank != 0 and m_bank != 0 then
        vec_len = vector length specified by FPSCR
        for i = 0 to vec_len-1
            Dd[i] = (d_bank << 2) | d_index
            Dm[i] = (m_bank << 2) | m_index
            d_index = d_index + (vector stride specified by FPSCR)
            if d_index > 3 then
                d_index = d_index - 4
            m_index = m_index + (vector stride specified by FPSCR)
            if m_index > 3 then
                m_index = m_index - 4

Notes

Vector wrap-around
    A vector operand must not wrap around so that it re-uses its first element. Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR specifies a vector stride of 1, this implies that the vector length must be at most 4. When the FPSCR specifies a vector stride of 2, it implies that the vector
length must be at most 2.

Operand overlap
    If the source and destination overlap, they must be identical both in terms of which registers are accessed and the order in which they are accessed. Otherwise, the results of the instruction are UNPREDICTABLE. This implies that if the set of register numbers generated in Dd[i] overlaps the set of register numbers generated in Dm[i], then Dd and Dm must be identical.

5.5 Addressing Mode - VFP load/store multiple

[Instruction encoding diagram, bits 31 to 0. Fields: cond, P, U, D, W, L, Rn, Fd, cp_num, offset.]

The VFP load multiple instructions (FLDMD, FLDMS, FLDMX) are examples of ARM LDC instructions, whose addressing modes are described in Addressing Mode - Load and Store Coprocessor on page A5-56. Similarly, the VFP store multiple instructions (FSTMD, FSTMS, FSTMX) are examples of ARM STC instructions, which have the same addressing modes.

However, the full range of LDC/STC addressing modes is not available for the VFP load multiple and store multiple instructions. This is partly because the FLDD, FLDS, FSTD and FSTS instructions use some of the options, and partly because the 8_bit_offset field in the LDC/STC instruction is used for additional purposes in the VFP instructions.

This section gives details of the LDC/STC addressing modes that are allowed for the VFP load multiple and store multiple instructions, and the assembler syntax for each option.

5.5.1 Summary

Whether an LDC/STC addressing mode is allowed for the VFP load multiple and store multiple instructions can be determined by looking at the P, U and W bits of the instruction. Table 5-1 shows details of this.

    Table 5-1 VFP load/store addressing modes

    P  U  W  Instructions                                Mode
    0  0  0  UNDEFINED                                   See Note
    0  0  1  UNDEFINED                                   See Note
    0  1  0  FLDMD, FLDMS, FLDMX, FSTMD, FSTMS, FSTMX    Unindexed
    0  1  1  FLDMD, FLDMS, FLDMX, FSTMD, FSTMS, FSTMX    Increment
    1  0  0  FLDD, FLDS, FSTD, FSTS                      (Negative offset)
    1  0  1  FLDMD, FLDMS, FLDMX, FSTMD, FSTMS, FSTMX    Decrement
    1  1  0  FLDD, FLDS, FSTD, FSTS                      (Positive offset)
    1  1  1  UNDEFINED                                   See Note

Note

For a hardware coprocessor implementation of the VFP instruction set, the UNDEFINED entries in Table 5-1 mean the coprocessor does not respond to the instruction, which causes the ARM's Undefined Instruction exception to occur (see Undefined Instruction exception on page A2-15). For a software implementation, the UNDEFINED entries mean that such instructions must be passed to the system's normal mechanism for dealing with non-coprocessor undefined instructions. The exact details of this are system-dependent.

5.5.2 VFP load/store multiple - Unindexed

[Instruction encoding diagram, bits 31 to 0. Fields: cond, D, L, Rn, Fd, cp_num, offset.]

This addressing mode is for VFP load multiple and store multiple instructions, and forms a range of addresses. The first address formed is the start_address, and is the value of the base register Rn. Subsequent addresses are formed by incrementing the previous address by four.

• For the FLDMS and FSTMS instructions, the offset in the instruction is equal to the number of single-precision registers to be transferred. One address is generated for each register, so the end_address is four less than the value of the base register Rn plus four times the offset.
• For the FLDMD and FSTMD instructions, the offset in the instruction is equal to twice the number of double-precision registers to be transferred. Two addresses are generated for each register, so the end_address is four less than the value of the base register Rn plus four times the offset.
• For the FLDMX and FSTMX instructions, the offset in the instruction is one more than twice the number of double-precision registers to be transferred. The number of addresses generated is at most equal to the offset, but can be a smaller number (decided by the implementor) provided the FLDMX and FSTMX instructions function correctly (see FLDMX on page C4-425 and FSTMX on page C4-100). Accordingly, the end_address is the value of the base register Rn plus four times the offset, minus an IMPLEMENTATION DEFINED amount which is at least four.

Instruction syntax

    <opcode>IA<precision>{<cond>} <Rn>, <registers>

where:

    <opcode>       Is FLDM or FSTM, and controls the value of the L bit.
    <precision>    Is D, S or X, and controls the values of cp_num and offset[0].
    <cond>         Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-5. If <cond> is omitted, the AL (always) condition is used.
    <Rn>           Specifies the base register. If R15 is specified for <Rn>, the value used is the address of the instruction plus 8.
    <registers>    Specifies the list of registers loaded or stored by the instruction. See the individual instructions for details of which registers are specified and how Fd, D and offset are set in the instruction.

Architecture version

    All

Operation

    if (offset[0] == 1) and (cp_num == 0b1011) then    /* FLDMX or FSTMX */
        word_count = IMPLEMENTATION DEFINED value (
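
The address arithmetic that the Unindexed mode describes for the FLDMS/FSTMS and FLDMD/FSTMD cases can be sketched in Python as follows. This is only an illustration of the prose rules above, not the manual's Operation pseudocode: the function names offset_field and unindexed_range are hypothetical, 32-bit address wrap-around is an assumption, and the FLDMX/FSTMX end address is left out because the text defines it only up to an IMPLEMENTATION DEFINED amount.

```python
WORD = 4  # each generated address is four bytes after the previous one


def offset_field(precision, register_count):
    """Value of the instruction's offset field, per the three bullets above.

    precision is "S", "D" or "X" (hypothetical parameter encoding).
    """
    if precision == "S":
        return register_count            # one word per single-precision register
    if precision == "D":
        return 2 * register_count        # two words per double-precision register
    if precision == "X":
        return 2 * register_count + 1    # one more than twice the register count
    raise ValueError("precision must be 'S', 'D' or 'X'")


def unindexed_range(rn, offset):
    """Return (start_address, end_address) for the S and D forms.

    start_address is the base register value; end_address is four less
    than Rn plus four times the offset. Masking to 32 bits is an assumption.
    """
    start_address = rn & 0xFFFFFFFF
    end_address = (rn + WORD * offset - WORD) & 0xFFFFFFFF
    return start_address, end_address


if __name__ == "__main__":
    # Four single-precision registers starting at base address 0x8000.
    start, end = unindexed_range(0x8000, offset_field("S", 4))
    print(hex(start), hex(end))
```

With this model, transferring four single-precision registers from a base of 0x8000 touches the words at 0x8000, 0x8004, 0x8008 and 0x800C, so the end address is the address of the last word rather than one past it.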

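The register-selection rules for the double-precision non-monadic addressing mode (sections 5.2.3 to 5.2.5) can be modelled in a few lines of Python. The sketch below is an assumption-laden illustration rather than the manual's pseudocode: the function name dp_operands is hypothetical, the length and stride arguments are taken to be the actual vector length and stride (not the encoded FPSCR field values), and no check is made for the UNPREDICTABLE wrap-around and overlap cases.

```python
def dp_operands(dd, dn, dm, vec_len, stride):
    """Return the lists (Dd[i], Dn[i], Dm[i]) of register numbers used.

    dd, dn, dm are the 4-bit register numbers from the instruction.
    Bank is bits [3:2] and index is bits [1:0], as in section 5.2.2.
    """
    d_bank, d_index = dd >> 2, dd & 3
    n_bank, n_index = dn >> 2, dn & 3
    m_bank, m_index = dm >> 2, dm & 3

    # Scalar operation: destination in the first bank overrides the
    # FPSCR vector length, so exactly one operation is performed.
    if d_bank == 0:
        return [dd], [dn], [dm]

    m_is_scalar = (m_bank == 0)  # mixed vector/scalar form
    dds, dns, dms = [], [], []
    for _ in range(vec_len):
        dds.append((d_bank << 2) | d_index)
        dns.append((n_bank << 2) | n_index)
        dms.append(dm if m_is_scalar else (m_bank << 2) | m_index)

        # Step each index by the stride, wrapping within the four-register bank.
        d_index += stride
        if d_index > 3:
            d_index -= 4
        n_index += stride
        if n_index > 3:
            n_index -= 4
        if not m_is_scalar:
            m_index += stride
            if m_index > 3:
                m_index -= 4
    return dds, dns, dms


if __name__ == "__main__":
    # Destination d6 with length 3 and stride 1 wraps from d7 back to d4.
    print(dp_operands(6, 9, 13, 3, 1))
```

The first-source operand is stepped even when Dn lies in the first bank, which matches the note that the first operand is always treated as a vector whenever the destination is a vector.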