6.1 Field Multiplication 159
Algorithm 6.5 Modular Reduction Using General Irreducible Polynomials
Require: The degree m of the irreducible polynomial; the operand C to be reduced; and k, the number of bits that can be reduced at once.
Ensure: The field polynomial defined as C = C mod P, with a length of m bits.
2: shift = 2m - 2 - k - 1;
3: for i from 0 to N_k do
4:     A = C_{n-k·i} C_{(n-k·i)-1} ... C_{(n-k·i)-k+1};
5:     S = Highdivtable[A];
6:     Pshifted = LeftShift(Paddedtable[S], shift);
7:     C = C + Pshifted;
8:     shift = shift - k;
9: end for
10: Return C
In line 2, the amount of shift needed to properly apply the method outlined in Figure 6.7 is computed. Then, in each iteration of the loop in lines 3-9, k bits of C are reduced. In line 4 the k bits of C to be reduced are obtained. This information is used in line 5 to compute the appropriate scalar S needed to obtain the result of equation (6.23). In line 6 the S-th entry of the table Paddedtable is left-shifted by shift positions, so that in line 7 the operation $C + 2^{shift}(S \cdot P)$ can finally be computed, allowing the effective reduction of k bits at once. Then, in line 8 the variable shift is updated in order to continue the reduction process.
Algorithm 6.5 performs a total of $N_k = \lceil (m-1)/k \rceil$ iterations. At each iteration of the algorithm the look-up tables Highdivtable and Paddedtable are accessed once each. In line 7, an XOR addition is executed, implying that the complexity cost of the general reduction method discussed in this section is given as,
$$\text{Additions} = 2N_k, \qquad \text{Look-up table size (in bits)} = 2^k(m + 2k). \tag{6.24}$$
6.1.6 Interleaving Multiplication
In this Subsection we discuss one of the simplest and most economical binary
field multiplier schemes: the serial interleaving multiplication algorithm.
Multiplication by a Primitive Element
Let $P(x) = p_0 + p_1x + p_2x^2 + \cdots + p_{m-1}x^{m-1} + x^m$ be an m-degree irreducible polynomial over GF(2). Let also $\alpha$ be a root of $P(x)$, i.e., $P(\alpha) = 0$. Then, the set $\{1, \alpha, \alpha^2, \ldots, \alpha^{m-1}\}$ is a basis for $GF(2^m)$, commonly called the polynomial (canonical) basis of the field [221]. An element $A \in GF(2^m)$ is expressed in this basis as $A = \sum_{i=0}^{m-1} a_i\alpha^i$. Let $A(\alpha)$ be an arbitrary element of $GF(2^m)$.
160 6. Binary Finite Field Arithmetic
Then, the product $C = \alpha \cdot A(\alpha)$ can be expressed as,
$$C = \alpha\,(a_0 + a_1\alpha + \cdots + a_{m-1}\alpha^{m-1}) = a_0\alpha + a_1\alpha^2 + \cdots + a_{m-1}\alpha^m. \tag{6.25}$$
Fig. 6.8. $\alpha \cdot A(\alpha)$ Multiplication
Using the fact that $\alpha$ is a primitive root of the irreducible polynomial, we can write,
$$\alpha^m = p_0 + p_1\alpha + \cdots + p_{m-1}\alpha^{m-1}. \tag{6.26}$$
Substituting Eq. (6.26) into Eq. (6.25) we obtain,
$$C = c_0 + c_1\alpha + \cdots + c_{m-1}\alpha^{m-1},$$
where $c_0 = a_{m-1}p_0$ and $c_i = a_{i-1} + a_{m-1}p_i$, for $i = 1, \ldots, m-1$. A realization of the above operation is shown in Fig. 6.8. The main building block is an m-tap LFSR register. That register is initially loaded with the m coordinates of the field element A, namely, $(a_0, a_1, a_2, \ldots, a_{m-1})$. The signals $p_i$ represent the coefficients of the irreducible polynomial. Notice that whenever a given polynomial coefficient is on, i.e., $p_i = 1$, the corresponding branch of the circuit will be a short circuit. Otherwise, if $p_i = 0$, the branch acts as an open circuit. After m clock cycles, the new register content will be the value of the field element C.
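As a minimal software sketch of Eqs. (6.25) and (6.26), the fragment below multiplies a field element by $\alpha$ with one shift and one conditional XOR. The bit-packed integer representation (bit i holds the coefficient of $\alpha^i$), the function name `mul_by_alpha`, and the toy field $GF(2^4)$ with $P(x) = x^4 + x + 1$ are assumptions chosen for illustration; they do not come from the text.

```python
def mul_by_alpha(a: int, p: int, m: int) -> int:
    """Return alpha * A(alpha) mod P(alpha).

    a and p are bit-packed polynomials over GF(2): bit i is the
    coefficient of alpha^i. p includes the leading alpha^m term.
    """
    a <<= 1              # a_0*alpha + a_1*alpha^2 + ... + a_{m-1}*alpha^m  (Eq. 6.25)
    if (a >> m) & 1:     # a_{m-1} = 1: substitute alpha^m using Eq. (6.26)
        a ^= p           # XOR with P clears bit m and adds p_0 ... p_{m-1}
    return a


# Illustrative toy field: GF(2^4) with P(x) = x^4 + x + 1
P, M = 0b10011, 4
A = 0b1001                       # A = 1 + alpha^3
print(bin(mul_by_alpha(A, P, M)))
```

The conditional XOR plays the role of the feedback branches of the LFSR: it adds $a_{m-1}p_i$ to every coordinate at once, which matches $c_i = a_{i-1} + a_{m-1}p_i$.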
Serial Multiplication
Using the multiplication procedure outlined above, the multiplication of two arbitrary field elements can be accomplished by using a procedure inspired by the well-known Horner's scheme.
Let us consider two arbitrary field elements A and B expressed in polynomial basis as,
$$A = \sum_{i=0}^{m-1} a_i\alpha^i, \qquad B = \sum_{i=0}^{m-1} b_i\alpha^i.$$
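Then, the product $A \cdot B$ can be expressed as,
$$C(\alpha) = A(\alpha)B(\alpha) \bmod P(\alpha) = A(\alpha)\left(\sum_{i=0}^{m-1} b_i\alpha^i\right) \bmod P(\alpha) = \left(\sum_{i=0}^{m-1} b_iA(\alpha)\alpha^i\right) \bmod P(\alpha).$$
Therefore,
$$C(\alpha) = \left(b_0A(\alpha) + b_1A(\alpha)\alpha + b_2A(\alpha)\alpha^2 + \cdots + b_{m-1}A(\alpha)\alpha^{m-1}\right) \bmod P(\alpha).$$
Algorithm 6.6 shows the standard procedure for computing the above equation using Horner's rule.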
Algorithm 6.6 LSB-First Serial/Parallel Multiplier
Require: An irreducible polynomial P(α) of degree m, two elements A, B ∈ GF(2^m).
Ensure: C(α) = A(α)B(α) mod P(α).
1: C = 0;
2: for i = 0 to m - 1 do
3:     C = b_i A + C;
4:     A = A α mod P(α);
5: end for
6: Return(C).
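The steps of Algorithm 6.6 can be sketched in software as follows. This is only an illustrative model, assuming bit-packed integers (bit i is the coefficient of $\alpha^i$) and the toy field $GF(2^4)$ with $P(x) = x^4 + x + 1$; the function name `mul_lsb_first` is hypothetical.

```python
def mul_lsb_first(a: int, b: int, p: int, m: int) -> int:
    """LSB-first serial multiplication in GF(2^m), following Algorithm 6.6.

    a, b: operands; p: irreducible polynomial including the alpha^m term.
    All values are bit-packed polynomials over GF(2).
    """
    c = 0                        # step 1: C = 0
    for i in range(m):           # step 2
        if (b >> i) & 1:         # step 3: C = b_i * A + C (addition is XOR)
            c ^= a
        a <<= 1                  # step 4: A = A * alpha ...
        if (a >> m) & 1:
            a ^= p               # ... reduced modulo P(alpha)
    return c                     # step 6


P, M = 0b10011, 4                # illustrative field GF(2^4), P(x) = x^4 + x + 1
print(bin(mul_lsb_first(0b0011, 0b0111, P, M)))   # (1 + alpha)(1 + alpha + alpha^2)
```

Each loop iteration corresponds to one clock cycle of the circuit: the first branch models the gate array feeding register C, and the shift with conditional XOR models the LFSR holding A.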
The multiplier realization of Algorithm 6.6 is shown in Fig. 6.9. The architecture shown in Fig. 6.9 consists of two LFSR registers plus extra circuitry. As mentioned previously, the signals $p_i$ in the first LFSR block represent the coefficients of the irreducible polynomial, and their values (either ones or zeroes) determine the LFSR structure. Furthermore, a gate array is included in order to compute the multiplication operation, as explained below. Initially the register C is set to zero, whereas the register in the upper part of Fig. 6.9 is loaded with the m coefficients of the field element A. Thereafter, when the clock signal is applied to the registers, the value of $A\alpha$ is generated. Then, the B coefficients, namely, $b_0, b_1, b_2, \ldots, b_{m-1}$, are serially introduced in that order, thus generating the values $b_iA\alpha^i$, for $i = 0, 1, \ldots, m-1$, which are accumulated in register C until all the m product coefficients $c_0, c_1, c_2, \ldots, c_{m-1}$ are collected.
6.1.7 Matrix-Vector Multipliers
The $GF(2^m)$ multiplication given by (6.1) can be described in terms of matrix-vector operations. There are mainly two different approaches based on matrix-vector operations to compute a field product:
Fig. 6.9. LSB-First Serial/Parallel Multiplier
1. The polynomial multiplication part is performed by any method. Then, the resulting product is reduced by using a reduction matrix.
2. The polynomial multiplication and modular reduction parts are performed in a single step by using the so-called Mastrovito matrix.
Let $a(x)$ and $b(x)$ denote two polynomials of degree at most $m-1$ representing the elements in $GF(2^m)$. Let $c(x) = a(x)b(x) \bmod P(x)$ denote their field product. The coefficient vectors of these polynomials are given by
$$a = [a_0, a_1, \ldots, a_{m-1}]^T, \qquad b = [b_0, b_1, \ldots, b_{m-1}]^T, \qquad c = [c_0, c_1, \ldots, c_{m-1}]^T.$$
Also, let us define the polynomials
$$d(x) = a(x)b(x) = d_0 + d_1x + \cdots + d_{2m-2}x^{2m-2},$$
$$d^{(L)}(x) = d_0 + d_1x + \cdots + d_{m-1}x^{m-1}, \tag{6.27}$$
$$d^{(H)}(x) = d_m + d_{m+1}x + \cdots + d_{2m-2}x^{m-2}.$$
The coefficient vectors representing these polynomials are
$$d = [d_0, d_1, \ldots, d_{2m-2}]^T, \qquad d^{(L)} = [d_0, d_1, \ldots, d_{m-1}]^T, \qquad d^{(H)} = [d_m, d_{m+1}, \ldots, d_{2m-2}]^T.$$
The work in [284] reduces the polynomial product $d(x)$ using an $(m \times (m-1))$ reduction matrix Q to obtain the field product $c(x)$ as below:
$$c = d^{(L)} + Q \cdot d^{(H)}. \tag{6.28}$$
Mastrovito Multiplier
The so-called Mastrovito matrix is constructed from the coefficients of the first multiplicand and the irreducible polynomial defining the field. Then, the polynomial multiplication and modulo reduction steps are performed together using this matrix. The papers [351, 128, 401] follow the Mastrovito multiplication scheme outlined below.
$$c = M \cdot b \tag{6.29}$$
where M is the $(m \times m)$ Mastrovito matrix whose entries are functions of the coefficients of $a(x)$ and $P(x)$. The Mastrovito matrix M is related to the reduction matrix Q by
$$M = L + Q \cdot U, \tag{6.30}$$
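where L and U are the following $(m \times m)$ and $((m-1) \times m)$ matrices:
$$L = \begin{bmatrix} a_0 & 0 & 0 & \cdots & 0 & 0 \\ a_1 & a_0 & 0 & \cdots & 0 & 0 \\ a_2 & a_1 & a_0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{m-2} & a_{m-3} & a_{m-4} & \cdots & a_0 & 0 \\ a_{m-1} & a_{m-2} & a_{m-3} & \cdots & a_1 & a_0 \end{bmatrix}, \qquad U = \begin{bmatrix} 0 & a_{m-1} & a_{m-2} & \cdots & a_2 & a_1 \\ 0 & 0 & a_{m-1} & \cdots & a_3 & a_2 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & a_{m-1} & a_{m-2} \\ 0 & 0 & 0 & \cdots & 0 & a_{m-1} \end{bmatrix}. \tag{6.31}$$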
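This is because $d(x) = a(x)b(x)$ can be given in the vector notation by
$$d = \begin{bmatrix} d^{(L)} \\ d^{(H)} \end{bmatrix} = \begin{bmatrix} L\,b \\ U\,b \end{bmatrix}.$$
Then, $c = d^{(L)} + Q \cdot d^{(H)} = L\,b + Q\,U\,b = (L + Q\,U)\,b = M\,b$.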
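The relation $M = L + Q \cdot U$ can be checked numerically with a short sketch. The code below assumes bit-packed integers for field elements, builds Q column by column from $x^{m+j} \bmod P(x)$ (one common way to obtain a reduction matrix, assumed here rather than taken from [284]), and uses the toy field $GF(2^4)$ with $P(x) = x^4 + x + 1$. All function names are hypothetical.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product of two bit-packed polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def poly_mod(d: int, p: int, m: int) -> int:
    """Reduce d modulo p, where p has degree m."""
    for k in range(d.bit_length() - 1, m - 1, -1):
        if (d >> k) & 1:
            d ^= p << (k - m)
    return d


def mastrovito_matrix(a: int, p: int, m: int):
    """Return M = L + Q*U over GF(2) as a list of m rows of m bits."""
    abit = [(a >> i) & 1 for i in range(m)]
    # L[i][j] = a_{i-j} for j <= i: row i yields d_i, 0 <= i <= m-1
    L = [[abit[i - j] if j <= i else 0 for j in range(m)] for i in range(m)]
    # U[i][j] = a_{m+i-j} for j > i: row i yields d_{m+i}, 0 <= i <= m-2
    U = [[abit[m + i - j] if j > i else 0 for j in range(m)] for i in range(m - 1)]
    # Column j of Q holds the coefficients of x^(m+j) mod P(x)
    qcol = [poly_mod(1 << (m + j), p, m) for j in range(m - 1)]
    Q = [[(qcol[j] >> i) & 1 for j in range(m - 1)] for i in range(m)]
    return [[L[i][j] ^ (sum(Q[i][k] & U[k][j] for k in range(m - 1)) & 1)
             for j in range(m)] for i in range(m)]


def mat_vec(M, b: int, m: int) -> int:
    """Compute c = M*b over GF(2); b and the result are bit-packed."""
    c = 0
    for i, row in enumerate(M):
        c |= (sum(row[j] & ((b >> j) & 1) for j in range(m)) & 1) << i
    return c


P, m = 0b10011, 4                       # illustrative field GF(2^4)
a, b = 0b0011, 0b0111
print(mat_vec(mastrovito_matrix(a, P, m), b, m) == poly_mod(clmul(a, b), P, m))
```

The final comparison contrasts the single matrix-vector product with the two-step route (polynomial multiplication followed by reduction), which is the distinction between the two approaches listed above.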
The Mastrovito and the reduction matrices are studied thoroughly in [284, 401] for various types of irreducible polynomials. In [351] a comprehensive study of the Mastrovito multiplier for irreducible trinomials was presented. The authors of [401] proposed a practical and systematic design approach for a general Mastrovito multiplier. In [388] it was shown that non-Mastrovito multipliers using direct modular reduction also provide competitive performance. Moreover, efficient non-Mastrovito multipliers for irreducible trinomials were also proposed.
6.1.8 Montgomery Multiplier
In this section we explain the Montgomery multiplication method in $GF(2^m)$. Once again, let $P(x)$ be an irreducible polynomial over GF(2) that defines the field $GF(2^m)$. Rather than computing Eq. (6.1), the Montgomery multiplication calculates
$$C(x) = A(x)B(x)R^{-1}(x) \bmod P(x) \tag{6.32}$$
where $R(x)$ is a fixed element and $\gcd(R(x), P(x)) = 1$.
Because of Bezout's identity¹, one can find two polynomials $R^{-1}(x)$ and $P'(x)$ such that
$$R(x)R^{-1}(x) + P(x)P'(x) = 1 \tag{6.33}$$
where $R^{-1}(x)$ is the inverse of $R(x)$ modulo $P(x)$. These two polynomials can be calculated with the extended Euclidean algorithm. Koç and Acar [182, 388] selected $R(x) = x^m$ for high performance modular reduction in the Montgomery multiplication algorithm, which can be given as follows:
Algorithm 6.7 Montgomery Modular Multiplication Algorithm
Require: A(x), B(x), R(x), P'(x)
Ensure: C(x) = A(x)B(x)R^{-1}(x) mod P(x)
1: T(x) = A(x)B(x);
2: U(x) = T(x)P'(x) mod R(x);
3: C(x) = [T(x) + U(x)P(x)]/R(x);
4: Return C
To prove the correctness of this algorithm we note that Step 2 implies that there exists a polynomial $H(x)$ such that
$$U(x) = T(x)P'(x) + H(x)R(x). \tag{6.34}$$
We write $C(x)$ in Step 3 by using (6.34) as follows:
$$C(x) = \frac{1}{R(x)}\left[T(x) + T(x)P'(x)P(x) + H(x)R(x)P(x)\right] = \frac{1}{R(x)}\left[T(x)\left(1 + P'(x)P(x)\right) + H(x)R(x)P(x)\right].$$
From (6.33), and because addition and subtraction coincide over GF(2), we can write $1 + P(x)P'(x) = R(x)R^{-1}(x)$ and substitute it into our last expression:
$$C(x) = \frac{1}{R(x)}\left[T(x)R(x)R^{-1}(x) + H(x)R(x)P(x)\right] = T(x)R^{-1}(x) + H(x)P(x) = A(x)B(x)R^{-1}(x) \bmod P(x).$$
¹ For more details on Bezout's identity the reader is referred to §6.3.1.
The degree of $C(x)$ can be verified from Step 3 as follows:
$$\deg[C(x)] \le \max\{\deg[T(x)],\ \deg[U(x)] + \deg[P(x)]\} - \deg[R(x)]$$
$$\le \max\{2m - 2,\ \deg[R(x)] - 1 + m\} - \deg[R(x)]$$
$$\le \max\{2m - 2 - \deg[R(x)],\ m - 1\}.$$
Then, it can be concluded that $\deg[C(x)] \le m - 1$ if $\deg[R(x)] \ge m - 1$. If we choose $R(x) = x^m$, the result $C(x)$ will be of degree $m - 1$ at most.
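A minimal sketch of Algorithm 6.7 with $R(x) = x^m$ is given below. It assumes bit-packed integers, so that reduction modulo $R(x)$ is a bit mask and division by $R(x)$ is a right shift. The helper `inv_mod_xm`, which finds $P'(x)$ bit by bit from $P(x)P'(x) \equiv 1 \pmod{x^m}$, is a hypothetical stand-in for the extended Euclidean algorithm mentioned in the text; it relies on $p_0 = 1$. The toy field is again $GF(2^4)$ with $P(x) = x^4 + x + 1$.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product of two bit-packed polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def poly_mod(d: int, p: int, m: int) -> int:
    """Reduce d modulo p, where p has degree m."""
    for k in range(d.bit_length() - 1, m - 1, -1):
        if (d >> k) & 1:
            d ^= p << (k - m)
    return d


def inv_mod_xm(p: int, m: int) -> int:
    """Return P'(x) of degree < m with P(x)*P'(x) = 1 mod x^m (needs p_0 = 1)."""
    q = 1
    for i in range(1, m):
        if (clmul(p, q) >> i) & 1:   # coefficient i of P*q is still set:
            q |= 1 << i              # adding x^i to q flips it, since p_0 = 1
    return q


def montgomery_mul(a: int, b: int, p: int, m: int, p_prime: int) -> int:
    """Compute A*B*R^{-1} mod P with R(x) = x^m, following Algorithm 6.7."""
    mask = (1 << m) - 1                        # "mod R(x)" keeps the low m bits
    t = clmul(a, b)                            # step 1: T = A*B
    u = clmul(t & mask, p_prime) & mask        # step 2: U = T*P' mod R
    return (t ^ clmul(u, p)) >> m              # step 3: (T + U*P) / R, exact


P, m = 0b10011, 4
P_PRIME = inv_mod_xm(P, m)
c = montgomery_mul(0b0011, 0b0111, P, m, P_PRIME)
# Multiplying the result by R(x) recovers the ordinary product A*B mod P
print(poly_mod(clmul(c, 1 << m), P, m) == poly_mod(clmul(0b0011, 0b0111), P, m))
```

The shift in step 3 is exact because the low m bits of $T(x) + U(x)P(x)$ cancel by construction, and the result fits in m bits, in line with the degree bound derived above.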
It can be shown [182] that Algorithm 6.7 has an associated computational cost of $2m^2$ coefficient multiplications (ANDs) and $2m^2 - 3m - 1$ coefficient additions (XORs), whereas the total time complexity is $3T_A + (2\lceil \log_2 m \rceil + \lceil \log_2(m-1) \rceil)T_X$.
6.1.9 A Comparison of Field Multiplier Designs
Table 6.3. Fastest Reconfigurable Hardware GF(2^m) Multipliers

Work                                       | Platform | Field     | Cost       | Cycles | Timings  | bits/(Slices x timings)
KOM variant by [47], implemented by [326]  | Virtex 2 | GF(2^163) | 5307 CLBs  | 1      | 12.56 ns | 2.445M
KOM variant by [85], implemented by [326]  | Virtex 2 | GF(2^163) | 5409 CLBs  | 1      | 13.37 ns | 2.254M
KOM variant by [293], implemented by [326] | Virtex 2 | GF(2^163) | 5840 CLBs  | 1      | 14.73 ns | 1.895M
KOM [106]                                  | Virtex 2 | 240 bits  | 1480 CLBs  | 30     | 378 ns   | 0.429M
Recursive Classical [106]                  | Virtex 2 | 240 bits  | 1582 CLBs  | 56     | 523 ns   | 0.290M
KOM [117]                                  | Virtex 2 | 240 bits  | 1660 CLBs  | 54     | 655 ns   | 0.221M
Massey-Omura [118]                         | Virtex 2 | 240 bits  | 36857 LUTs | 50     | 800 ns   | 0.0336M (est.)
In this Subsection we compare some of the most representative designs of $GF(2^m)$ multipliers considering three metrics: speed, compactness and efficiency. Table 6.3 shows the fastest designs reported to date for $GF(2^m)$ field multiplication. It can be observed that Karatsuba-Ofman Multipliers (KOM) are much faster than other schemes such as the recursive classical multiplier or the Massey-Omura scheme. From the theoretical point of view, this can be explained by the fact that KOM algorithms enjoy sub-quadratic complexity. In Table 6.4 we show a selection of some of the most compact reconfigurable hardware multiplier designs. It is noted that this category is dominated by the interleaved and Montgomery multiplier schemes.
Table 6.4. Most Compact Reconfigurable Hardware GF(2^m) Multipliers

Work                | Platform | Field     | Cost            | Cycles | Timings | bits/(Slices x timings)
Interleaved [104]   | Virtex   | GF(2^239) | 359 CLBs        | 239    | 3.1 µs  | 0.215M
Montgomery [97]     | Virtex   | GF(2^233) | 425 CLBs (est)  | 466    | 2.81 µs | 0.195M
Class.+Montg. [18]  | Virtex   | GF(2^160) | 1049 CLBs       | 80     | 1.11 µs | 0.137M
Montgomery [118]    | Virtex   | GF(2^160) | 1427 CLBs       | 160    | 1.66 µs | 0.0675M
Interleaved [266]   | Virtex   | GF(2^…)   | 420 CLBs (est)  | 210    | 12.3 µs | 0.042M
We measure efficiency by taking the ratio of the number of bits processed over slices multiplied by the time delay achieved by the design, namely,
$$\frac{\text{bits}}{\text{Slices} \times \text{timings}}.$$
For instance, consider the KOM variant design proposed by [47] and implemented by [326]. As shown in Table 6.3, working over $GF(2^{163})$, that design achieved a time delay of just 12.56 ns at a cost of 5307 slices. Therefore its efficiency is calculated as,
$$\frac{\text{bits}}{\text{Slices} \times \text{timings}} = \frac{163}{5307 \times 12.56\ \text{ns}} = 2.445\text{M}.$$
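The efficiency figure can be recomputed with a few lines of code. The sketch below uses bits, area and delay values read from Table 6.3; the function name `efficiency` and the choice to express delays in seconds are assumptions made for illustration.

```python
def efficiency(bits: int, area: int, delay_seconds: float) -> float:
    """Return bits / (area x delay); area is the slice/CLB count listed in the table."""
    return bits / (area * delay_seconds)


# (design, field size in bits, area, delay in seconds), values read from Table 6.3
designs = [
    ("KOM variant [47], implemented by [326]", 163, 5307, 12.56e-9),
    ("KOM [106]", 240, 1480, 378e-9),
    ("Recursive Classical [106]", 240, 1582, 523e-9),
]

for name, bits, area, delay in designs:
    # Report in millions, matching the "M" suffix used in the tables
    print(f"{name}: {efficiency(bits, area, delay) / 1e6:.3f}M")
```

Because the metric divides by both area and delay, a design that is only moderately larger but much faster scores higher, which is what favors the single-cycle KOM variants.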
When comparing the designs featured in Tables 6.3 and 6.4, it is noticed that the most efficient multiplier designs are the Karatsuba-Ofman multiplier variants as reported in [47, 85, 293]. This is quite a remarkable feature, which implies that the Karatsuba-Ofman multipliers represent both the fastest and the most efficient of all multiplier designs studied in this Chapter.
6.2 Field Squaring and Field Square Root for Irreducible
Trinomials
Let us consider binary extension fields constructed using irreducible trinomials of the form $P(x) = x^m + x^n + 1$, with $m > 2$. It is convenient to consider, without loss of generality, the additional restriction $1 \le n \le \lfloor m/2 \rfloor$ ².
² It is known that if $P(x) = x^m + x^n + 1$ is irreducible over GF(2), so is $P(x) = x^m + x^{m-n} + 1$ [228]. Hence, provided that at least one irreducible trinomial of degree m exists, it is always possible to find another irreducible trinomial such that its middle coefficient n satisfies the restriction $1 \le n \le \lfloor m/2 \rfloor$.
The rest of this Section is organized as follows. First, in Subsection
6.2.1,
we give the corresponding formulae needed for computing the field squaring
operation when considering arbitrary irreducible trinomials. Those equations
are then used in Subsection 6.2.2 to find the corresponding ones for the field
square root operator.
6.2.1 Field Squaring Computation
Let $A = \sum_{i=0}^{m-1} a_ix^i$ be an arbitrary element of $GF(2^m)$. Then, according to Eq. (6.16), its square, $A^2$, can be represented by the 2m-coefficient vector,
$$A^2(x) = [0\ \ a_{m-1}\ \ 0\ \ a_{m-2}\ \ \cdots\ \ 0\ \ a_1\ \ 0\ \ a_0] = [a'_{2m-1}\ \ a'_{2m-2}\ \ \cdots\ \ a'_{m+1}\ \ a'_m \mid a'_{m-1}\ \ \cdots\ \ a'_2\ \ a'_1\ \ a'_0] \tag{6.35}$$
where $a'_i = 0$ for i odd. Hence, the upper half of $A^2$ (i.e., the m most significant bits) in Eq. (6.35) is mapped into the first m coordinates by performing addition and shift operations only.
In order to investigate the exact cost of the field squaring operation, we
categorize all the irreducible trinomials over GF{2) into four different types.
For all four types considered and by means of Eqs. (6.35) and (6.21), the
following explicit formulae for the field squaring operation were found.
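Type I: Computing $C = A^2 \bmod P(x)$, with $P(x) = x^m + x^n + 1$, m even, n odd and $n < \frac{m}{2}$,
$$c_i = \begin{cases} a_{\frac{i}{2}} + a_{\frac{m+i}{2}} & i \text{ even},\ i < n \text{ or } i \ge 2n, \\ a_{\frac{i}{2}} + a_{\frac{m+i}{2}} + a_{m-n+\frac{i}{2}} & i \text{ even},\ n < i < 2n, \\ a_{m-\frac{n-i}{2}} & i \text{ odd},\ i < n, \\ a_{\frac{m-n+i}{2}} & i \text{ odd},\ i \ge n, \end{cases} \tag{6.36}$$
for $i = 0, 1, \ldots, m-1$. It can be verified that Eq. (6.36) has an associated cost of $\frac{m+n-1}{2}$ XOR gates and $2T_X$ delays.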
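The Type I formula can be cross-checked against a generic square-then-reduce routine. The sketch below assumes coefficient lists (index i holds $a_i$) and writes the Type I case analysis in the form $c_i = a_{i/2} + a_{(m+i)/2}\ (+\ a_{m-n+i/2})$ for even i and a single coefficient for odd i; this form was derived from the reduction rule $x^m = x^n + 1$ and is an assumption about how Eq. (6.36) reads. The function names are hypothetical.

```python
def square_generic(a, m, n):
    """Square a (list of m bits, a[i] = coefficient of x^i) modulo x^m + x^n + 1."""
    d = [0] * (2 * m - 1)
    for j, bit in enumerate(a):
        d[2 * j] = bit                     # A^2 = sum of a_j * x^(2j)
    for k in range(2 * m - 2, m - 1, -1):  # top-down: x^k = x^(k-m+n) + x^(k-m)
        if d[k]:
            d[k] = 0
            d[k - m + n] ^= 1
            d[k - m] ^= 1
    return d[:m]


def square_type1(a, m, n):
    """Type I squaring (m even, n odd, n < m/2), written coefficient by coefficient."""
    c = []
    for i in range(m):
        if i % 2 == 0:
            v = a[i // 2] ^ a[(m + i) // 2]
            if n < i < 2 * n:
                v ^= a[m - n + i // 2]     # extra term from the second reduction step
        elif i < n:
            v = a[m - (n - i) // 2]
        else:
            v = a[(m - n + i) // 2]
        c.append(v)
    return c


m, n = 10, 3                               # x^10 + x^3 + 1, a Type I trinomial
a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
print(square_type1(a, m, n) == square_generic(a, m, n))
```

Only the even-indexed outputs need XOR gates, which is where the gate count of the Type I case comes from.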
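Type II: Computing $C = A^2 \bmod P(x)$, with $P(x) = x^m + x^n + 1$, m even, n odd and $n = \frac{m}{2}$,
$$c_i = \begin{cases} a_{\frac{i}{2}} + a_{\frac{m+i}{2}} & i \text{ even},\ i < n, \\ a_{\frac{i}{2}} & i \text{ even},\ i > n, \\ a_{\frac{m+n+i}{2}} & i \text{ odd},\ i < n, \\ a_{\frac{n+i}{2}} & i \text{ odd},\ i \ge n, \end{cases} \tag{6.37}$$
for $i = 0, 1, \ldots, m-1$. It can be verified that Eq. (6.37) has an associated cost of $\frac{m+2}{4}$ XOR gates and one $T_X$ delay.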
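Type III: Computing $C = A^2 \bmod P(x)$, with $P(x) = x^m + x^n + 1$, m, n odd numbers and $n < \frac{m+1}{2}$,
$$c_i = \begin{cases} a_{\frac{i}{2}} & i \text{ even},\ i < n, \\ a_{\frac{i}{2}} + a_{\frac{m-n+i}{2}} + a_{m-n+\frac{i}{2}} & i \text{ even},\ n < i < 2n, \\ a_{\frac{i}{2}} + a_{\frac{m-n+i}{2}} & i \text{ even},\ i \ge 2n, \\ a_{\frac{m+i}{2}} + a_{m-\frac{n-i}{2}} & i \text{ odd},\ i < n, \\ a_{\frac{m+i}{2}} & i \text{ odd},\ i \ge n, \end{cases} \tag{6.38}$$
for $i = 0, 1, \ldots, m-1$. It can be verified that Eq. (6.38) has an associated cost of $\frac{m-1}{2}$ XOR gates and $2T_X$ delays.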
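Type IV: Computing $C = A^2 \bmod P(x)$, with $P(x) = x^m + x^n + 1$, m odd, n even and $n < \frac{m+1}{2}$,
$$c_i = \begin{cases} a_{\frac{i}{2}} + a_{m-\frac{n-i}{2}} & i \text{ even},\ i < n, \\ a_{\frac{i}{2}} + a_{m-n+\frac{i}{2}} & i \text{ even},\ n \le i < 2n, \\ a_{\frac{i}{2}} & i \text{ even},\ i \ge 2n, \\ a_{\frac{m+i}{2}} & i \text{ odd},\ i < n, \\ a_{\frac{m+i}{2}} + a_{\frac{m-n+i}{2}} & i \text{ odd},\ i > n, \end{cases} \tag{6.39}$$
for $i = 0, 1, \ldots, m-1$. It can be verified that Eq. (6.39) has an associated cost of $\frac{m+n-1}{2}$ XOR gates and one $T_X$ delay.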
The complexity costs found in Equations (6.36) through (6.39) are in agreement with the ones analytically derived in [386, 387].
6.2.2 Field Square Root Computation
In the following, we keep the assumption that the middle coefficient n of the generating trinomial $P(x) = x^m + x^n + 1$ satisfies the restriction $1 \le n \le \frac{m}{2}$.
Clearly, Eqs. (6.36)-(6.39) are a consequence of the fact that in binary extension fields, squaring is a linear operation. The linear nature of binary extension field squaring allows us to describe this operator in terms of an $(m \times m)$ matrix as,
$$C = A^2 = MA \tag{6.40}$$
Furthermore, based on Eq. (6.40), it follows that computing the square root of an arbitrary field element A means finding a field element $D = \sqrt{A}$ such that $D^2 = MD = A$. Hence,
$$D = M^{-1}A \tag{6.41}$$
Eq. (6.41) is especially attractive for fields $GF(2^m)$ with order sufficiently large, i.e., $m \gg 2$, where the matrices M corresponding to Eqs. (6.36)-(6.39) are all highly sparse (each row has at most three nonzero values).
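Eqs. (6.40) and (6.41) can be illustrated with a small sketch that builds the squaring matrix M column by column and obtains the square root by solving $MD = A$ over GF(2). The column construction ($x^{2j} \bmod P(x)$) and the Gauss-Jordan solver are generic techniques assumed here for illustration, not the optimized method of the text; the function names and the trinomial $x^5 + x^2 + 1$ are likewise illustrative choices.

```python
def squaring_matrix(m, n):
    """Rows of the m x m matrix M with A^2 = M*A over GF(2), for P(x) = x^m + x^n + 1."""
    p = (1 << m) | (1 << n) | 1
    M = [[0] * m for _ in range(m)]
    for j in range(m):                      # column j = coefficients of x^(2j) mod P(x)
        col = 1 << (2 * j)
        for k in range(2 * j, m - 1, -1):
            if (col >> k) & 1:
                col ^= p << (k - m)
        for i in range(m):
            M[i][j] = (col >> i) & 1
    return M


def solve_gf2(M, a):
    """Solve M*d = a over GF(2) by Gauss-Jordan elimination (M assumed invertible)."""
    m = len(M)
    rows = [M[i][:] + [a[i]] for i in range(m)]          # augmented matrix [M | a]
    for col in range(m):
        piv = next(r for r in range(col, m) if rows[r][col])
        rows[col], rows[piv] = rows[piv], rows[col]
        for r in range(m):
            if r != col and rows[r][col]:
                rows[r] = [x ^ y for x, y in zip(rows[r], rows[col])]
    return [rows[i][m] for i in range(m)]


m, n = 5, 2                                 # x^5 + x^2 + 1, irreducible over GF(2)
M = squaring_matrix(m, n)
print(max(sum(row) for row in M))           # largest number of nonzeros in any row
A = [1, 0, 1, 1, 0]
D = solve_gf2(M, A)                         # D = M^{-1} A, the square root of A
print([sum(M[i][j] & D[j] for j in range(m)) & 1 for i in range(m)] == A)
```

Since every row of M has at most three nonzero entries, each output bit of the squarer needs at most two XOR gates; the dense elimination used here for the inverse is only a reference computation.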