Tài liệu hạn chế xem trước, để xem đầy đủ mời bạn chọn Tải xuống
1
/ 44 trang
THÔNG TIN TÀI LIỆU
Thông tin cơ bản
Định dạng
Số trang
44
Dung lượng
305,77 KB
Nội dung
Last Round The last round of AES (round 10, 12, or 14 depending on key size) differs from the other rounds in that it applies the following steps: 1. SubBytes 2. ShiftRow 3. AddRoundKey Inverse Cipher The inverse cipher is composed of the steps in essentially the same order, except we replace the individual steps with their inverses. 1. AddRoundKey(Nr) 2. for round = Nr-1 downto 1 do 1. InvShiftRow 2. InvSubBytes 3. AddRoundKey(round) 4. InvMixColumns 3. InvSubBytes 4. InvShiftRow 5. AddRoundKey(0) In theses steps, the “Inv” prefix means the inverse operation.The key schedule is slightly different depending on the implementation. We shall see that in the fast AES code, moving AddRoundKey to the last step of the round allows us to create a decryption routine similar to the encryption routine. Key Schedule The key schedule is responsible for turning the input key into the Nr+1 required 128-bit round keys.The algorithm in Figure 4.11 will compute the round keys. www.syngress.com Advanced Encryption Standard • Chapter 4 155 404_CRYPTO_04.qxd 10/30/06 9:42 AM Page 155 Figure 4.11 The AES Key Schedule Input: Nk Number of 32-bit words in the key (4, 6 or 8) wArray of 4*(Nk+1) 32-bit words Output: w Array setup with key 1. Preload the secret key into the first Nk words of w in big endian fashion. 2. i = Nk 3. while (i < 4*(Nr+1)) do 1. temp = w[i – 1] 2. if (i mod Nk = 0) i. temp = SubWord(RotWord(temp)) XOR Rcon[i/Nk] 3. else if (Nk > 6 and i mod Nk = 4) i. temp = SubWord(temp) 4. w[i] = w[i-Nk] xor temp 5. i = i + 1 The key schedule requires two additional functions. SubWord() takes the 32-bit input and sends each byte through the AES SubBytes substitution table in parallel. RotWord() rotates the word to the right cyclically by eight bits.The Rcon table is an array of the first 10 powers of the polynomial g(x) = x modulo the AES polynomial stored only in the most significant byte of the 32-bit words. Implementation There are already many public implementations of AES for a variety of platforms. From the most common reference, implementations are used on 32-bit and 64-bit desktops to tiny 8- bit implementations for microcontrollers.There is also a variety of implementations for hard- ware scenarios to optimize for speed or security (against side channel attacks), or both. Ideally, it is best to use a previously tested implementation of AES instead of writing your own. However, there are cases where a custom implementation is required, so it is important to understand how to implement it. We are going to focus on a rather simple eight-bit implementation suitable for compact implementation on microcontrollers. Second, we are going to focus on the traditional 32-bit implementation common in various packages such as OpenSSL, GnuPG, and LibTomCrypt. www.syngress.com 156 Chapter 4 • Advanced Encryption Standard 404_CRYPTO_04.qxd 10/30/06 9:42 AM Page 156 An Eight-Bit Implementation Our first implementation is a direct translation of the standard into C using byte arrays. At this point, we are not applying any optimizations to make sure the C code is as clear as pos- sible.This code will work pretty much anywhere, as it uses very little code and data space and works with small eight-bit data types. It is not ideal for deployment where speed is an issue, and as such is not recommended for use in fielded applications. aes_small.c: 001 /* The AES Substitution Table */ 002 static const unsigned char sbox[256] = { 003 0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5, 004 0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76, <snip> 033 0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68, 034 0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16 }; 035 036 /* The key schedule rcon table */ 037 static const unsigned char Rcon[10] = { 038 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36 }; These two tables form the constant tables.The first is the SubBytes function imple- mented as a single table called sbox.The second table is the Rcon table for the key schedule, which is the first 10 powers of g(x) = x. 040 /* The *x function */ 041 static unsigned char xtime(unsigned char x) 042 { 043 if (x & 0x80) { return ((x<<1)^0x1B) & 0xFF; } 044 return x<<1; 045 } This function computes the xtime required for MixColumns. One possible tradeoff would be to implement this as a single 256-byte table. It would avoid the XOR, shift and branch, making the code faster at a cost of more fixed data usage. 047 /* MixColumns: Processes the entire block */ 048 static void MixColumns(unsigned char *col) 049 { 050 unsigned char tmp[4], xt[4]; 051 int x; 052 053 for (x = 0; x < 4; x++, col += 4) { 054 xt[0] = xtime(col[0]); 055 xt[1] = xtime(col[1]); 056 xt[2] = xtime(col[2]); 057 xt[3] = xtime(col[3]); 058 tmp[0] = xt[0] ^ xt[1] ^ col[1] ^ col[2] ^ col[3]; 059 tmp[1] = col[0] ^ xt[1] ^ xt[2] ^ col[2] ^ col[3]; 060 tmp[2] = col[0] ^ col[1] ^ xt[2] ^ xt[3] ^ col[3]; 061 tmp[3] = xt[0] ^ col[0] ^ col[1] ^ col[2] ^ xt[3]; 062 col[0] = tmp[0]; 063 col[1] = tmp[1]; www.syngress.com Advanced Encryption Standard • Chapter 4 157 404_CRYPTO_04.qxd 10/30/06 9:42 AM Page 157 064 col[2] = tmp[2]; 065 col[3] = tmp[3]; 066 } 067 } This is the MixColumn function we saw previously, except it has now been modified to work on all 16 bytes of the state. As previously noted, this function is also doubled buffered (copying to tmp[]) and can be optimized to avoid this. We are also using an array xt[] to hold copies of the xtime() output. Since it is used twice, caching it saves time. However, we do not actually need the array. If we first add all inputs, then the xtime() results, we only need a single byte of extra storage. 069 /* ShiftRows: Shifts the entire block */ 070 static void ShiftRows(unsigned char *col) 071 { 072 unsigned char t; 073 074 /* 2nd row */ 075 t = col[1]; col[1] = col[5]; col[5] = col[9]; 076 col[9] = col[13]; col[13] = t; 077 078 /* 3rd row */ 079 t = col[2]; col[2] = col[10]; col[10] = t; 080 t = col[6]; col[6] = col[14]; col[14] = t; 081 082 /* 4th row */ 083 t = col[15]; col[15] = col[11]; col[11] = col[7]; 084 col[7] = col[3]; col[3] = t; 085 } This function implements the ShiftRows function. It uses a single temporary byte t to swap around values in the rows.The second and fourth rows are implemented using essen- tially a shift register, while the third row is a pair of swaps. 087 /* SubBytes */ 088 static void SubBytes(unsigned char *col) 089 { 090 int x; 091 for (x = 0; x < 16; x++) { 092 col[x] = sbox[col[x]]; 093 } 094 } This function implements the SubBytes function. Fairly straightforward, not much to optimize here. 096 /* AddRoundKey */ 097 static void AddRoundKey(unsigned char *col, 098 unsigned char *key, int round) 099 { 100 int x; 101 for (x = 0; x < 16; x++) { 102 col[x] ^= key[(round<<4)+x]; 103 } 104 } www.syngress.com 158 Chapter 4 • Advanced Encryption Standard 404_CRYPTO_04.qxd 10/30/06 9:42 AM Page 158 This functions implements AddRoundKey function. It reads the round key from a single array of bytes, which is at most 15*16=240 bytes in size. We shift the round number by four bits to the left to emulate a multiplication by 16. This function can be optimized on platforms with words larger than eight bits by XORing multiple key bytes at a time.This is an optimization we shall see in the 32-bit code. 106 /* Encrypt a single block with Nr rounds (10, 12, 14) */ 107 void AesEncrypt(unsigned char *blk, unsigned char *key, int Nr) 108 { 109 int x; 110 111 AddRoundKey(blk, key, 0); 112 for (x = 1; x <= (Nr - 1); x++) { 113 SubBytes(blk); 114 ShiftRows(blk); 115 MixColumns(blk); 116 AddRoundKey(blk, key, x); 117 } 118 119 SubBytes(blk); 120 ShiftRows(blk); 121 AddRoundKey(blk, key, Nr); 122 } This function encrypts the block stored in blk in place using the scheduled secret key stored in key.The number of rounds used is stored in Nr and must be 10, 12, or 14 depending on the secret key length (of 128, 192, or 256 bits, respectively). This implementation of AES is not terribly optimized, as we wished to show the dis- crete elements of AES in action. In particular, we have discrete steps inside the round.As we shall see later, even for eight-bit targets we can combine SubBytes, ShiftRows, and MixColumns into one step, saving the double buffering, permutation (ShiftRows), and lookups. 124 /* Schedule a secret key for use. 125 * outkey[] must be 16*15 bytes in size 126 * Nk == number of 32-bit words in the key, e.g., 4, 6 or 8 127 * Nr == number of rounds, e.g., 10, 12, 14 128 */ 129 void ScheduleKey(unsigned char *inkey, 130 unsigned char *outkey, int Nk, int Nr) 131 { 132 unsigned char temp[4], t; 133 int x, i; 134 135 /* copy the key */ 136 for (i = 0; i < (4*Nk); i++) { 137 outkey[i] = inkey[i]; 138 } 139 140 i = Nk; 141 while (i < (4 * (Nr + 1))) { www.syngress.com Advanced Encryption Standard • Chapter 4 159 404_CRYPTO_04.qxd 10/30/06 9:42 AM Page 159 142 /* temp = w[i-1] */ 143 for (x = 0; x < 4; x++) temp[x] = outkey[((i-1)<<2) + x]; 144 145 if(i%Nk==0){ 146 /* RotWord() */ 147 t = temp[0]; temp[0] = temp[1]; 148 temp[1] = temp[2]; temp[2] = temp[3]; temp[3] = t; 149 150 /* SubWord() */ 151 for (x = 0; x < 4; x++) { 152 temp[x] = sbox[temp[x]]; 153 } 154 temp[0] ^= Rcon[(i/Nk)-1]; 155 } else if (Nk > 6 && (i % Nk) == 4) { 156 /* SubWord() */ 157 for (x = 0; x < 4; x++) { 158 temp[x] = sbox[temp[x]]; 159 } 160 } 161 162 /* w[i] = w[i-Nk] xor temp */ 163 for (x = 0; x < 4; x++) { 164 outkey[(i<<2)+x] = outkey[((i-Nk)<<2)+x] ^ temp[x]; 165 } 166 ++i; 167 } 168 } This key schedule is the direct translation of the AES standard key schedule into C using eight-bit data types. We have to emulate RotWords() with a shuffle, and all of the loads and stores are done with a four step for loop. The obvious optimization is to create one loop per key size and do away with the remainder (%) operations. In the optimized key schedule, we shall see shortly a key can be scheduled in roughly 1,000 AMD64 cycles or less. A single division can take upward of 100 cycles, so removing that operation is a good starting point. As with AddRoundKey on 32- and 64-bit platforms, we will implement the key schedule using full 32-bit words instead of 8-bit words.This allows us to efficiently imple- ment RotWord() and the 32-bit XOR operations. 170 /** DEMO **/ 171 172 #include <stdio.h> 173 int main(void) 174 { 175 unsigned char blk[16], skey[15*16]; 176 int x, y; 177 static const struct { 178 int Nk, Nr; 179 unsigned char key[32], pt[16], ct[16]; 180 } tests[] = { 181 { 4, 10, 182 { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 183 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f }, www.syngress.com 160 Chapter 4 • Advanced Encryption Standard 404_CRYPTO_04.qxd 10/30/06 9:42 AM Page 160 184 { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 185 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff }, 186 { 0x69, 0xc4, 0xe0, 0xd8, 0x6a, 0x7b, 0x04, 0x30, 187 0xd8, 0xcd, 0xb7, 0x80, 0x70, 0xb4, 0xc5, 0x5a } 188 }, { 189 6, 12, 190 { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 191 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 192 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17 }, 193 { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 194 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff }, 195 { 0xdd, 0xa9, 0x7c, 0xa4, 0x86, 0x4c, 0xdf, 0xe0, 196 0x6e, 0xaf, 0x70, 0xa0, 0xec, 0x0d, 0x71, 0x91 } 197 }, { 198 8, 14, 199 { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 200 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 201 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 202 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f }, 203 { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 204 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff }, 205 { 0x8e, 0xa2, 0xb7, 0xca, 0x51, 0x67, 0x45, 0xbf, 206 0xea, 0xfc, 0x49, 0x90, 0x4b, 0x49, 0x60, 0x89 } 207 } 208 }; These three entries are the standard AES test vectors for 128, 192, and 256 key sizes. 210 for (x = 0; x < 3; x++) { 211 ScheduleKey(tests[x].key, skey, tests[x].Nk, tests[x].Nr); 212 213 for (y = 0; y < 16; y++) blk[y] = tests[x].pt[y]; 214 AesEncrypt(blk, skey, tests[x].Nr); Here we are encrypting the plaintext (blk == pt), and are going to test if it equals the expected ciphertext. Notes from the Underground… Cipher Testing A good idea for testing a cipher implementation is to encrypt the provided plain- text more than once; decrypt one fewer times and see if you get the expected result. For example, encrypt the plaintext, and then that ciphertext 999 more times. Next, decrypt the ciphertext repeatedly 999 times and compare it against the expected ciphertext. www.syngress.com Advanced Encryption Standard • Chapter 4 161 Continued 404_CRYPTO_04.qxd 10/30/06 9:42 AM Page 161 Often, pre-computed table entries can be slightly off and still allow fixed vectors to pass. Its unlikely, but in certain ciphers (such as CAST5) it is entirely possible to pull off. This test is more applicable to designs where tables are part of a bijection, such as the AES MDS transform. If the tables has errors in it, the resulting imple- mentation should fail to decrypt the ciphertext properly, leading to the incorrect output. Part of the AES process was to provide test vectors of this form. Instead of decrypting N–1 times, the tester would simply encrypt repeatedly N times and verify the output matches the expected value. This catches errors in designs where the elements of the design do not have to be a bijection (such as in Feistel ciphers). 216 for (y = 0; y < 16; y++) { 217 if (blk[y] != tests[x].ct[y]) { 218 printf("Byte %d differs in test %d\n", y, x); 219 for (y = 0; y < 16; y++) printf("%02x ", blk[y]); 220 printf("\n"); 221 return -1; 222 } 223 } 224 } 225 printf("AES passed\n"); 226 return 0; 227 } This implementation will serve as our reference implementation. Let us now consider various optimizations. Optimized Eight-Bit Implementation We can remove several hotspots from our reference implementation. 1. Implement xtime() as a table. 2. Combine ShiftRows and MixColumns in the round function. 3. Remove the double buffering. The new xtime table is listed here. aes_small_opt.c: 040 static const unsigned char xtime[256] = { 041 0x00, 0x02, 0x04, 0x06, 0x08, 0x0a, 0x0c, 0x0e, 042 0x10, 0x12, 0x14, 0x16, 0x18, 0x1a, 0x1c, 0x1e, 043 0x20, 0x22, 0x24, 0x26, 0x28, 0x2a, 0x2c, 0x2e, <snip> 070 0xcb, 0xc9, 0xcf, 0xcd, 0xc3, 0xc1, 0xc7, 0xc5, 071 0xfb, 0xf9, 0xff, 0xfd, 0xf3, 0xf1, 0xf7, 0xf5, 072 0xeb, 0xe9, 0xef, 0xed, 0xe3, 0xe1, 0xe7, 0xe5 }; www.syngress.com 162 Chapter 4 • Advanced Encryption Standard 404_CRYPTO_04.qxd 10/30/06 9:42 AM Page 162 This lookup table will return the same result as the old function. Now we are saving on a function call, branch, and a few trivial logical operations. Next, we mix ShiftRows and MixColumns into one function. aes_small_opt.c: 074 static void ShiftMix(unsigned char *col, unsigned char *out) 075 { 076 unsigned char xt; 077 078 #define STEP(i,j,k,l) \ 079 out[0] = col[j] ^ col[k] ^ col[l]; \ 080 out[1] = col[i] ^ col[k] ^ col[l]; \ 081 out[2] = col[i] ^ col[j] ^ col[l]; \ 082 out[3] = col[i] ^ col[j] ^ col[k]; \ 083 xt = xtime[col[i]]; out[0] ^= xt; out[3] ^= xt; \ 084 xt = xtime[col[j]]; out[0] ^= xt; out[1] ^= xt; \ 085 xt = xtime[col[k]]; out[1] ^= xt; out[2] ^= xt; \ 086 xt = xtime[col[l]]; out[2] ^= xt; out[3] ^= xt; \ 087 out += 4; 088 089 STEP(0,5,10,15); 090 STEP(4,9,14,3); 091 STEP(8,13,2,7); 092 STEP(12,1,6,11); 093 094 #undef STEP 095 } We did away with the double buffering tmp array and are outputting to a different desti- nation. Next, we removed the xt array and replaced it with a single unsigned char. The entire function has been unrolled to make the array indexing faster. In various pro- cessors (such as the 8051), accessing the internal RAM by constants is a very fast (one cycle) operation. While this makes the code larger, it does achieve a nice performance boost. Implementers should map tmp and blk to IRAM space on 8051 series processors. The indices passed to the STEP macro are from the AES block offset by the appropriate amount. Recall we are storing values in column major order. Without ShiftRows, the selec- tion patterns would be {0,1,2,3}, {4,5,6,7}, and so on. Here we have merged the ShiftRows function into the code by renaming the bytes of the AES state. Now byte 1 becomes byte 5 (position 1,1 instead of 1,0), byte 2 becomes byte 10, and so on.This gives us the following selection patterns {0,5,10,15}, {4,9,14,3}, {8, 13, 2, 7}, and {12, 1, 6, 11}. We can roll up the loop as for (x = 0; x < 16; x += 4) { STEP((x+0)&15,(x+5)&15,(x+10)&15,(x+15)&15); } This achieves a nearly 4x compression of the code when the compiler is smart enough to use CSE throughout the macro. For various embedded compilers, you may need to help it out by declaring i, j, k, and l as local ints. For example, www.syngress.com Advanced Encryption Standard • Chapter 4 163 404_CRYPTO_04.qxd 10/30/06 9:42 AM Page 163 for (x = 0; x < 16; x += 4) { int i, j, k, l; i = (x+0)&15; j = (x+5)&15; k = (x+10)&15; l = (x+15)&15); STEP(i, j, k, l) } Now when the macro is expanded, the pre-computed values are used. Along with this change, we now need new SubBytes and AesEncrypt functions to accommodate the sec- ondary output buffer. aes_small_opt.c: 115 /* SubBytes */ 116 static void SubBytes(unsigned char *col, unsigned char *out) 117 { 118 int x; 119 for (x = 0; x < 16; x++) { 120 out[x] = sbox[col[x]]; 121 } 122 } 123 <snip> 133 134 /* Encrypt a single block with Nr rounds (10, 12, 14) */ 135 void AesEncrypt(unsigned char *blk, unsigned char *key, int Nr) 136 { 137 int x; 138 unsigned char tmp[16]; 139 140 AddRoundKey(blk, key, 0); 141 for (x = 1; x <= (Nr - 1); x++) { 142 SubBytes(blk, tmp); 143 ShiftMix(tmp, blk); 144 AddRoundKey(blk, key, x); 145 } 146 147 SubBytes(blk, blk); 148 ShiftRows(blk); 149 AddRoundKey(blk, key, Nr); 150 } Here we are still using a double buffering scheme (akin to page flipping in graphics pro- gramming), except we are not copying back the result without doing actual work. SubBytes stores the result in our local tmp array, and then ShiftMix outputs the data back to blk. With all these changes, we can now remove the MixColumns function entirely.The code size difference is fairly trivial on x86 processors, where the optimized copy requires 298 more bytes of code space. Obviously, this does not easily translate into a code size delta on smaller, less capable processors. However, the performance delta should be more than worth it. While not shown here, decryption can perform the same optimizations. It is recom- mended that if space is available, tables for the multiplications by 9, 11, 13, and 14 in GF(2)[x]/v(x) be performed by 256 byte tables, respectively.This adds 1,024 bytes to the code size but drastically improves performance. www.syngress.com 164 Chapter 4 • Advanced Encryption Standard 404_CRYPTO_04.qxd 10/30/06 9:42 AM Page 164 [...]... char *IV, 0 35 unsigned char *ct, 036 unsigned char *pt, 037 unsigned long size, 038 unsigned long *skey, int Nr) 039 { 040 unsigned char buf[16], buf2[16], t; 041 unsigned long x; 042 043 for (x = 0; x < 16; x++) buf[x] = IV[x]; www.syngress.com 189 404_CRYPTO_04.qxd 190 10/30/06 9:42 AM Page 190 Chapter 4 • Advanced Encryption Standard 044 0 45 046 047 048 049 050 051 052 053 054 055 056 057 058 while... macros (below) to access the tables 150 #ifdef SMALL_CODE 151 152 #define Te0(x) TE0[x] 153 #define Te1(x) RORc(TE0[x], 8) 154 #define Te2(x) RORc(TE0[x], 16) 155 #define Te3(x) RORc(TE0[x], 24) 156 157 #define Te4_0 0x000000FF & Te4 158 #define Te4_1 0x0000FF00 & Te4 159 #define Te4_2 0x00FF0000 & Te4 160 #define Te4_3 0xFF000000 & Te4 161 162 #else 163 164 #define Te0(x) TE0[x] 1 65 #define Te1(x) TE1[x] 166 #define... InvMixColumns(drk[x]) which can be implemented in the following manner for (x = 4; x < 10*4; x++) { drk[x] = Td0( 255 & Te4[byte(drk[x], Td1( 255 & Te4[byte(drk[x], Td2( 255 & Te4[byte(drk[x], Td3( 255 & Te4[byte(drk[x], } 3)]) ^ 2)]) ^ 1)]) ^ 0)]); Now we have the proper inverse key schedule for AES-128 Substitute “10*4” by “12*4” or “14*4” for 192- or 256 -bit keys, respectively Practical Attacks As of this writing,... 0xbfbfbfbfUL, 0xe6e6e6e6UL, 0x42424242UL, 146 0x41414141UL, 0x99999999UL, 0x2d2d2d2dUL, 147 0xb0b0b0b0UL, 0x5 454 5 454 UL, 0xbbbbbbbbUL, 148 }; www.syngress.com 0xf67b7b8dUL, 0x91c5c 554 UL, 0x562b2b7dUL, 0xec76769aUL, 0xfa7d7d87UL, 0x1a0d0d17UL, 0xd06868b8UL, 0x1e0f0f11UL, 0x2c16163aUL, 0x7b7b7b7bUL, 0xc5c5c5c5UL, 0x2b2b2b2bUL, 0x76767676UL, 0xdfdfdfdfUL, 0x0d0d0d0dUL, 0x68686868UL, 0x0f0f0f0fUL, 0x16161616UL,... if (keylen == 16) { j = 44; for (;;) { temp = rk[3]; rk[4] = rk[0] ^ setup_mix(temp) ^ rcon[i]; rk [5] = rk[1] ^ rk[4]; www.syngress.com 169 404_CRYPTO_04.qxd 170 10/30/06 9:42 AM Page 170 Chapter 4 • Advanced Encryption Standard 054 055 056 057 058 059 060 rk[6] = rk[2] ^ rk [5] ; rk[7] = rk[3] ^ rk[6]; if (++i == 10) { break; } rk += 4; } This loop computes the round keys for the 128-bit key mode It... 0x6dbbbbd6UL, 081 }; 082 083 static const unsigned long Te4[ 256 ] = { 084 0x63636363UL, 0x7c7c7c7cUL, 0x77777777UL, 0 85 0xf2f2f2f2UL, 0x6b6b6b6bUL, 0x6f6f6f6fUL, 086 0x30303030UL, 0x01010101UL, 0x67676767UL, 087 0xfefefefeUL, 0xd7d7d7d7UL, 0xababababUL, 143 0xcecececeUL, 0x 555 555 55UL, 0x28282828UL, 144 0x8c8c8c8cUL, 0xa1a1a1a1UL, 0x89898989UL, 1 45 0xbfbfbfbfUL, 0xe6e6e6e6UL, 0x42424242UL, 146 0x41414141UL,... long)((y)[0] & 255 )>16)& 255 ); \ 0 05 (y)[2] = (unsigned char)(((x)>>8)& 255 ); \ 006 (y)[3] = (unsigned char)((x)& 255 ); }... * [1, 3, 2, 1] 4 Te3[x] = S(x) * [1, 1, 3, 2] Where S(x) is the SubBytes transform and the product is a 1x1 * 1x4 matrix operation From these tables, we can compute SubBytes and MixColumns with the following code: unsigned long SubMix(unsigned long x) { return Te0[x& 255 ] ^ Te1[(x>>8)& 255 ] ^ Te2[(x>>16)& 255 ] ^ Te3[(x>>24)& 255 ]; } The fifth table is simply the SubBytes function replicated four times; that... values for keylen are 16, 24, and 32.The second difference is the output is stored in an array of 15* 4 words instead of 15* 16 bytes 041 042 043 044 0 45 046 047 /* setup the forward key */ i = 0; rk = skey; LOAD32H(rk[0], key ); LOAD32H(rk[1], key + 4); LOAD32H(rk[2], key + 8); LOAD32H(rk[3], key + 12); We always load the first 128 bits of the key regardless of the actual key size 048 049 050 051 052 053 . *col) 049 { 050 unsigned char tmp[4], xt[4]; 051 int x; 052 053 for (x = 0; x < 4; x++, col += 4) { 054 xt[0] = xtime(col[0]); 055 xt[1] = xtime(col[1]); 056 xt[2] = xtime(col[2]); 057 xt[3] =. = t; 149 150 /* SubWord() */ 151 for (x = 0; x < 4; x++) { 152 temp[x] = sbox[temp[x]]; 153 } 154 temp[0] ^= Rcon[(i/Nk)-1]; 155 } else if (Nk > 6 && (i % Nk) == 4) { 156 /* SubWord(). patterns {0 ,5, 10, 15} , {4,9,14,3}, {8, 13, 2, 7}, and {12, 1, 6, 11}. We can roll up the loop as for (x = 0; x < 16; x += 4) { STEP((x+0)& 15, (x +5) & 15, (x+10)& 15, (x+ 15) & 15) ; } This