01 December 2008 Cao Hoang Tru CSE Faculty - HCMUT

Chapter 9: Hashing
• Basic concepts
• Hash functions
• Collision resolution
• Open addressing
• Linked list resolution
• Bucket hashing

Basic Concepts
• Sequential search: O(n)
• Binary search: O(log₂ n)
Both require several key comparisons before the target is found.

• Search complexity (number of comparisons):

Size         Binary   Sequential (Average)   Sequential (Worst Case)
16           4        8                      16
50           6        25                     50
256          8        128                    256
1,000        10       500                    1,000
10,000       14       5,000                  10,000
100,000      17       50,000                 100,000
1,000,000    20       500,000                1,000,000

• Is there a search algorithm whose complexity is O(1)? YES.

• Hashing maps keys to memory addresses.
• Each key has only one address.

Keys passed through the hash function:

Key      Name          Address
102002   Vu Nguyen     005
107095   John Adams    100
111060   Sarah Trapp   002

Resulting table:

Address   Name
001       Harry Lee
002       Sarah Trapp
005       Vu Nguyen
007       Ray Black
100       John Adams

• Home address: the address produced by a hash function.
• Prime area: the memory that contains all the home addresses.

• Synonyms: a set of keys that hash to the same location.
• Collision: the location of the data to be inserted is already occupied by synonym data.

• Ideal hashing:
  – No location collision
  – Compact address space
[...]
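The search-complexity figures above can be reproduced with a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes the Binary column is ceil(log₂ n), the average sequential cost is n/2, and the worst sequential cost is n, which is what the listed values are consistent with.

```python
# Sketch: recompute the comparison counts listed in the search-complexity table.
# Assumptions (not stated on the slides): binary = ceil(log2(n)),
# sequential average = n // 2, sequential worst case = n.

SIZES = [16, 50, 256, 1_000, 10_000, 100_000, 1_000_000]


def binary_search_comparisons(n: int) -> int:
    """ceil(log2(n)) for n >= 1, computed with integer arithmetic only."""
    return (n - 1).bit_length()


def comparison_table(sizes=SIZES):
    """Return rows of (size, binary, sequential average, sequential worst)."""
    return [(n, binary_search_comparisons(n), n // 2, n) for n in sizes]


if __name__ == "__main__":
    print(f"{'Size':>9} {'Binary':>6} {'Seq avg':>9} {'Seq worst':>9}")
    for size, binary, average, worst in comparison_table():
        print(f"{size:>9,} {binary:>6} {average:>9,} {worst:>9,}")
```

Using `(n - 1).bit_length()` avoids floating-point rounding that a `math.log2` call followed by `ceil` could introduce for large list sizes.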
Basic Concepts
Insert A, B, C
• hash(A) = 9
• hash(B) = 9: B and A collide at 9
• hash(C) = 17: C and B...

Collision Resolution
[1] C   [5]   [9] A   [17] B

Basic Concepts
Search for B
• hash(A) = 9
• hash(B) = 9
• hash(C) = 17

Probing
[1] C   [5]   [9] A   [17] B

Hash Functions
• Direct hashing
• Modulo division
• Digit extraction
• Mid-square
• Folding
• Rotation
• Pseudo-random

...

Digit Extraction
Address = selected digits from Key
• Example:
  379452 → 394
  121267 → 112
  378845 → 388
  160252 → 102
  045128 → 051

Mid-square
Address = middle digits of Key²
• Example: 9452 * 9452 = 89340304 → 3403
• Disadvantage: the...

...

Collision Resolution
• ... data become clustered around a home address
  Insert A9, B9, C9, D11, E12
  [1]   [9] A   [10] B   [11] C   [12] D   [13] E

• Secondary clustering: data become grouped along a collision path throughout a list
  Insert A9, B9, C9, D11, E12, F9
  [1]   [9] A   [10] B   [11] D   [12] E   [13]   [14] C   [23] F
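The digit-extraction and mid-square examples above can be checked with a short Python sketch. The function names are hypothetical. Two details are assumptions inferred from the worked examples rather than stated on the slides: the selected digits are the first, third, and fourth digits of the key, and the squared key is zero-padded to twice the key length before the middle digits are taken. Keys are handled as strings for digit extraction so that a leading zero such as in 045128 is preserved.

```python
# Sketch of two hash functions from the slides, under the assumptions named above.

def digit_extraction(key: str, positions=(1, 3, 4)) -> str:
    """Build the address from selected 1-based digit positions of the key.

    The default positions (1, 3, 4) are inferred from the slide examples.
    """
    return "".join(key[p - 1] for p in positions)


def mid_square(key: int, address_digits: int = 4) -> str:
    """Square the key and return its middle `address_digits` digits."""
    key_digits = len(str(key))
    # Pad so the square always has twice as many digits as the key (assumption).
    squared = str(key * key).zfill(2 * key_digits)
    start = (len(squared) - address_digits) // 2
    return squared[start:start + address_digits]


if __name__ == "__main__":
    for key in ["379452", "121267", "378845", "160252", "045128"]:
        print(key, "->", digit_extraction(key))
    print("9452 ->", mid_square(9452))
```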
Mid-square
• Variations: use only a portion of the key
  379452: 379 * 379 = 143641 → 364
  121267: 121 * 121 = 014641 → 464
  045128: 045 * 045 = 002025 → 202

Folding
• The key is divided into parts whose size matches the address size
  Key = 123|456|789
  Fold shift:    123 + 456 + 789 = 1368 ⇒ 368
  Fold boundary: 321 + 456 + 987 = 1764 ⇒ 764

Rotation
• Hashing keys that are identical except for the last character may create synonyms
• The key is rotated before hashing

Original key   Rotated key
600101         160010
600102         260010
600103         360010
600104         460010
600105         560010

...

... = 2061546 MOD 307 + 1 = 41 + 1 = 42

Collision Resolution
• Except for direct hashing, none of the other methods is a one-to-one mapping ⇒ collision resolution methods are required
• Each collision resolution method can be used independently with each hash function

• A rule of thumb:... elements

• As data are added and collisions are resolved, hashing tends to cause data to group within the list ⇒ Clustering: data are unevenly distributed across the list
• A high degree of clustering increases the number of probes needed to locate an element ⇒ Minimize clustering
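The mid-square variation, folding, and rotation examples above can be checked with the following Python sketch. All function names are hypothetical. It assumes that fold boundary reverses every other part starting with the first (which matches 321 + 456 + 987 for a three-part key), that the carry digit is discarded to fit the address size, that rotation moves the last character to the front, and that the mid-square variation uses the first three digits of the key and takes digits 3 to 5 of the zero-padded square, as the three worked examples suggest.

```python
# Sketch of folding, rotation, and the partial-key mid-square variation.
# Keys are strings so that leading zeros (e.g. "045128") are preserved.

def split_parts(key: str, size: int):
    """Split the key into consecutive parts of `size` characters."""
    return [key[i:i + size] for i in range(0, len(key), size)]


def fold_shift(key: str, size: int = 3) -> str:
    """Add the parts as they are and keep the low-order `size` digits."""
    total = sum(int(part) for part in split_parts(key, size))
    return str(total % 10 ** size).zfill(size)


def fold_boundary(key: str, size: int = 3) -> str:
    """Reverse every other part (first, third, ...) before adding (assumption)."""
    parts = split_parts(key, size)
    folded = [p[::-1] if i % 2 == 0 else p for i, p in enumerate(parts)]
    total = sum(int(part) for part in folded)
    return str(total % 10 ** size).zfill(size)


def rotate(key: str) -> str:
    """Move the last character of a non-empty key to the front."""
    return key[-1] + key[:-1]


def mid_square_portion(key: str, portion: int = 3) -> str:
    """Square the first `portion` digits and take the middle `portion` digits."""
    part = int(key[:portion])
    squared = str(part * part).zfill(2 * portion)
    start = (len(squared) - portion + 1) // 2  # offset chosen to match the slides
    return squared[start:start + portion]


if __name__ == "__main__":
    print(fold_shift("123456789"), fold_boundary("123456789"))
    print([rotate(k) for k in ["600101", "600102", "600103"]])
    print([mid_square_portion(k) for k in ["379452", "121267", "045128"]])
```

After rotation, the keys that differed only in their last digit differ in their first digit instead, so a later step such as folding or digit extraction sees more varied input.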
• Pseudo-random

Direct Hashing
• The address is the key itself: hash(Key) = Key
• Advantage: there is no collision
• Disadvantage: the address space (storage size) is as large as the key space

Modulo Division
Address = Key MOD listSize...

...

... addressing
• Linked list resolution
• Bucket hashing

Open Addressing
• When a collision occurs, the table is searched for an unoccupied element in which to place the new element
• Hash function: h: U → {0, …, m − 1}, where U is the set of keys and {0, …, m − 1} is the set of addresses
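Open addressing as described above can be sketched in Python as follows. The class and method names are hypothetical. Two choices are assumptions that the visible slides do not specify: the hash function is h(k) = k mod m, which maps keys into {0, …, m − 1}, and the search for an unoccupied element uses linear probing (try the next slot, wrapping around). Other probe sequences would fit the same description.

```python
# Minimal open-addressing hash table: modulo hashing plus linear probing.
# Both choices are illustrative assumptions, not taken from the slides.

class OpenAddressingTable:
    def __init__(self, m: int):
        self.m = m                    # number of addresses, 0 .. m - 1
        self.slots = [None] * m       # None marks an unoccupied element

    def _home_address(self, key: int) -> int:
        return key % self.m

    def insert(self, key: int) -> int:
        """Place the key in the first free slot on its probe path; return the slot."""
        home = self._home_address(key)
        for i in range(self.m):
            slot = (home + i) % self.m
            if self.slots[slot] is None or self.slots[slot] == key:
                self.slots[slot] = key
                return slot
        raise OverflowError("hash table is full")

    def search(self, key: int):
        """Follow the same probe path; return the slot, or None if absent."""
        home = self._home_address(key)
        for i in range(self.m):
            slot = (home + i) % self.m
            if self.slots[slot] is None:
                return None           # an empty slot ends the probe path
            if self.slots[slot] == key:
                return slot
        return None


if __name__ == "__main__":
    table = OpenAddressingTable(11)
    # 20, 31, and 42 are synonyms: each has home address 9 when m = 11.
    print([table.insert(k) for k in (20, 31, 42)])
    print(table.search(42), table.search(53))
```

The sketch omits deletion, which in open addressing usually needs a marker for removed entries so that later searches do not stop early.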