01 December 2008
Cao Hoang Tru
CSE Faculty - HCMUT
Chapter 9: Hashing
• Basic concepts
• Hash functions
• Collision resolution
• Open addressing
• Linked list resolution
• Bucket hashing
Basic Concepts
• Sequential search: O(n)
• Binary search: O(log₂ n)
Both require several key comparisons before the target is found.
• Search complexity:

Size         Binary    Sequential (Average)    Sequential (Worst Case)
16           4         8                       16
50           6         25                      50
256          8         128                     256
1,000        10        500                     1,000
10,000       14        5,000                   10,000
100,000      17        50,000                  100,000
1,000,000    20        500,000                 1,000,000
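The figures in the search complexity table can be reproduced with a few lines of Python. This is only a sketch: it assumes the binary column is ⌈log₂ n⌉, the sequential average is n/2, and the sequential worst case is n, which is what the tabulated values suggest. The function name `comparisons` is made up for illustration.

```python
def comparisons(n):
    """Return (binary, sequential_average, sequential_worst) for a list of n items.

    Assumption: the binary-search figure is ceil(log2(n)), which matches the table.
    """
    binary = (n - 1).bit_length()  # integer-safe ceil(log2(n)) for n >= 1
    return binary, n // 2, n


for size in (16, 50, 256, 1_000, 10_000, 100_000, 1_000_000):
    binary, average, worst = comparisons(size)
    print(f"{size:>9,} {binary:>6} {average:>10,} {worst:>10,}")
```

Growing the list from 16 to 1,000,000 items multiplies the sequential cost by 62,500 but adds only 16 comparisons to the binary search.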
• Is there a search algorithm whose complexity is
O(1)?
YES.
[Figure: hashing maps keys to memory addresses.]
• Each key has only one address
• Example: the hash function maps each key to an address

Key                     Address
Vu Nguyen 102002    →   005
John Adams 107095   →   100
Sarah Trapp 111060  →   002

Address    Stored record
001        Harry Lee
002        Sarah Trapp
005        Vu Nguyen
007        Ray Black
100        John Adams
• Home address: address produced by a hash
function.
• Prime area: memory that contains all the home
addresses.
• Synonyms: a set of keys that hash to the same
location.
• Collision: the location of the data to be inserted is
already occupied by the synonym data.
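A small sketch may make the two definitions concrete. The hash function here (key modulo a table size of 7) and the key values are assumptions chosen only for this example; `home_address` and `find_synonyms` are hypothetical names.

```python
from collections import defaultdict

LIST_SIZE = 7  # hypothetical table size, chosen only for this example


def home_address(key):
    """Toy hash function (an assumption): key modulo the table size."""
    return key % LIST_SIZE


def find_synonyms(keys):
    """Group keys by home address; a group with more than one key is a set of synonyms."""
    groups = defaultdict(list)
    for key in keys:
        groups[home_address(key)].append(key)
    return {addr: ks for addr, ks in groups.items() if len(ks) > 1}


print(find_synonyms([10, 17, 24, 5, 12, 8]))
```

With these keys, 10, 17 and 24 are synonyms (home address 3), so inserting 17 after 10 is a collision; 8 has its home address to itself.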
• Ideal hashing:
– No location collision
– Compact address space
• Insert A, B, C, where hash(A) = 9, hash(B) = 9, hash(C) = 17:
– A is stored at its home address [9]
– B and A collide at 9; collision resolution places B at [17]
– C and B collide at 17; collision resolution places C at another address
• Search for B: hash(B) = 9, but [9] holds A, so the search continues by probing

Hash Functions
• Direct hashing
• Modulo division
• Digit extraction
• Mid-square
• Folding
• Rotation
• Pseudo-random

Direct Hashing
• The address...
• Advantage: there is no collision
• Disadvantage: the address space (storage size) is as large as the key space

Modulo Division
Address = Key MOD listSize + 1
• Fewer collisions if listSize is a prime number
• Example: Numbering system to handle 1,000,000 employees
Data space to store...

Digit Extraction
Address = selected digits from Key
• Example:
379452 → 394
121267 → 112
378845 → 388
160252 → 102
045128 → 051

Mid-square
Address = middle digits of Key²
• Example: 9452 * 9452 = 89340304 → 3403
• Disadvantage: the...

Folding
... ⇒ 368
321 + 456 + 987 = 1764 ⇒ 764

Rotation
• Hashing keys that are identical except for the last character may create synonyms
• The key is rotated before hashing

original key    rotated key
600101          160010
600102          260010
600103          360010
600104          460010
600105          560010

• Used in combination with...
... 64, 65, 66 → 26, 36, 46, 56, 66
• Spreading the data more evenly across the address space

Pseudorandom
Key → Pseudorandom Number Generator → Random Number → Modulo Division → Address
y = ax + c
• For maximum efficiency, a and c should be prime numbers
• Example: Key = 121267, a = 17, c = 7
Address ... = 2061546 MOD 307 + 1 = 41 + 1 = 42

Collision Resolution
• Except for direct hashing, none of the hash functions is a one-to-one mapping ⇒ collision resolution methods are required
• Each collision resolution method can be used independently with each hash function
• A rule of thumb:... elements
• As data are added and collisions are resolved, hashing tends to cause data to group within the list ⇒ Clustering: data are unevenly distributed across the list
• A high degree of clustering increases the number of probes needed to locate an element ⇒ Minimize clustering
• Primary clustering: data become clustered around a home address
Insert A9, B9, C9, D11, E12: A, B, C, D, E occupy [9], [10], [11], [12], [13]
• Secondary clustering: data become grouped along a collision path throughout a list
Insert A9, B9, C9, D11, E12, F9...

Open Addressing
• When a collision occurs, an unoccupied element is searched for in which to place the new element
• Hash function: h: U → {0, …, m − 1}, where U is the set of keys and {0, …, m − 1} is the set of addresses
• Hash and probe function:...
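Several of the hash functions above can be checked against the slide examples with a short Python sketch. The function names, the default digit positions (1st, 3rd and 4th, inferred from the digit-extraction example), the 6-digit key width, and the use of listSize = 307 (taken from the pseudorandom arithmetic) are assumptions made for illustration, not definitions from the slides.

```python
def modulo_division(key, list_size):
    """Address = Key MOD listSize + 1, so addresses run from 1 to list_size."""
    return key % list_size + 1


def digit_extraction(key, positions=(0, 2, 3), width=6):
    """Build the address from selected digits of the zero-padded key.

    Returns a string so that a leading zero (e.g. "051") is preserved.
    """
    digits = str(key).zfill(width)
    return "".join(digits[i] for i in positions)


def mid_square(key, address_digits=4):
    """Square the key and keep the middle digits of the result."""
    squared = str(key * key)
    start = (len(squared) - address_digits) // 2
    return int(squared[start:start + address_digits])


def rotate(key, width=6):
    """Move the last digit of the zero-padded key to the front."""
    digits = str(key).zfill(width)
    return digits[-1] + digits[:-1]


def pseudorandom(key, a=17, c=7, list_size=307):
    """Compute y = a * key + c, then apply modulo division to get the address."""
    return modulo_division(a * key + c, list_size)


print(digit_extraction(379452))  # selected digits of 379452
print(mid_square(9452))          # middle digits of 9452 squared
print(rotate(600101))            # last digit moved to the front
print(pseudorandom(121267))      # (17 * 121267 + 7) MOD 307 + 1
```

Digit extraction and rotation return strings rather than integers so that addresses such as 051 keep their leading zero.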
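The primary clustering example (insert A9, B9, C9, D11, E12) can be traced with a small open-addressing table. The probe function is cut off in this excerpt, so the sketch below assumes linear probing (try the next address, wrapping around at the end of the list); `insert` and `TABLE_SIZE` are illustrative names, and the table size of 17 is also an assumption.

```python
TABLE_SIZE = 17  # assumed size; addresses run from 1 to TABLE_SIZE


def insert(table, name, home):
    """Store name at its home address, or at the next free address (linear probing).

    Returns the address actually used; raises if the table is full.
    """
    for step in range(TABLE_SIZE):
        address = (home - 1 + step) % TABLE_SIZE + 1  # wrap around within 1..TABLE_SIZE
        if address not in table:
            table[address] = name
            return address
    raise RuntimeError("table is full")


table = {}
for name, home in [("A", 9), ("B", 9), ("C", 9), ("D", 11), ("E", 12)]:
    print(name, "home", home, "-> stored at", insert(table, name, home))
```

Under this assumption the three synonyms A, B, C fill [9] to [11], which pushes D and E off their own home addresses into [12] and [13]: the cluster around home address 9 grows with every nearby insertion.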