
Algorithms (Part 27)


STRING SEARCHING

This leads to the very simple pattern-matching algorithm implemented below. The program assumes the same index function as above, but d=32 is used for efficiency (the multiplications might be implemented as shifts).

    function rksearch: integer;
      const q=33554393; d=32;
      var h1, h2, dM, i: integer;
      begin
      { dM = d^(M-1) mod q, the weight of the leading character of a window }
      dM:=1; for i:=1 to M-1 do dM:=(d*dM) mod q;
      { h1 = hash of the pattern, h2 = hash of the first M text characters }
      h1:=0; for i:=1 to M do h1:=(h1*d+index(p[i])) mod q;
      h2:=0; for i:=1 to M do h2:=(h2*d+index(a[i])) mod q;
      i:=1;
      while (h1<>h2) and (i<=N-M) do
        begin
        { remove a[i]; the extra d*q keeps the value positive }
        h2:=(h2+d*q-index(a[i])*dM) mod q;
        { shift left by one character and bring in a[i+M] }
        h2:=(h2*d+index(a[i+M])) mod q;
        i:=i+1;
        end;
      { the printed listing returns i unconditionally, which cannot tell a }
      { match at the last position from no match; return N+1 for no match }
      if h1=h2 then rksearch:=i else rksearch:=N+1;
      end;

The program first computes a hash value h1 for the pattern, then a hash value h2 for the first M characters of the text. (It also computes the value of d^(M-1) mod q in the variable dM.) Then it proceeds through the text string, using the technique above to compute the hash function for the M characters starting at position i for each i, comparing each new hash value to h1. The prime q is chosen to be as large as possible, but small enough that (d+1)*q doesn’t cause overflow: this requires fewer mod operations than if we used the largest representable prime. (An extra d*q is added during the h2 calculation to make sure that everything stays positive so that the mod operation works as it should.)

This algorithm obviously takes time proportional to N + M. Note that it really only finds a position in the text which has the same hash value as the pattern, so, to be sure, we really should do a direct comparison of that text with the pattern. However, the use of such a large value of q, made possible by the mod computations and by the fact that we don’t have to keep the actual hash table around, makes it extremely unlikely that a collision will occur. Theoretically, this algorithm could still take NM steps in the (unbelievably) worst case, but in practice the algorithm can be relied upon to take about N + M steps.
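The rolling-hash search described above, together with the direct comparison that the text recommends on a hash match, might be sketched in Python as follows. This is an illustrative sketch, not the book's program: the name rk_search, the use of ord in place of the book's index function, 0-based positions, and the -1 "not found" result are all assumptions made here. Python integers do not overflow and its % operator never returns a negative result for a positive modulus, so the extra d*q term is not needed.

```python
def index(ch):
    # Assumption: use the character code, not the book's 0..31 index function.
    return ord(ch)


def rk_search(pattern, text, d=32, q=33554393):
    """Return the 0-based position of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1

    dM = pow(d, m - 1, q)  # d^(M-1) mod q, weight of the leading character

    # Hash of the pattern (h1) and of the first M text characters (h2).
    h1 = h2 = 0
    for k in range(m):
        h1 = (h1 * d + index(pattern[k])) % q
        h2 = (h2 * d + index(text[k])) % q

    i = 0
    while True:
        # Equal hashes only mark a candidate; confirm by direct comparison.
        if h1 == h2 and text[i:i + m] == pattern:
            return i
        if i + m >= n:  # that was the last window
            return -1
        # Drop text[i] from the hash, then shift and bring in text[i + m].
        h2 = (h2 - index(text[i]) * dM) % q
        h2 = (h2 * d + index(text[i + m])) % q
        i += 1
```

For example, rk_search("STING", "A STRING SEARCHING EXAMPLE CONSISTING OF") returns the same position as the built-in str.find for that pair. Because every hash match is verified, a collision costs only one extra comparison and never produces a false result.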
CHAPTER 19

Multiple Searches

The algorithms that we’ve been discussing are all oriented towards a specific string searching problem: find an occurrence of a given pattern in a given text string. If the same text string is to be the object of many pattern searches, then it will be worthwhile to do some processing on the string to make subsequent searches efficient.

If there are a large number of searches, the string searching problem can be viewed as a special case of the general searching problem that we studied in the previous section. We simply treat the text string as N overlapping “keys,” the ith key defined to be a[i..N], the entire text string starting at position i. Of course, we don’t manipulate the keys themselves, but pointers to them: when we need to compare keys i and j we do character-by-character compares starting at positions i and j of the text string. (If we use a “sentinel” character larger than all other characters at the end, then one of the keys will always be greater than the other.) Then the hashing, binary tree, and other algorithms of the previous section can be used directly. First, an entire structure is built up from the text string, and then efficient searches can be performed for particular patterns.

There are many details which need to be worked out in applying searching algorithms to string searching in this way; our intent is to point this out as a viable option for some string searching applications. Different methods will be appropriate in different situations. For example, if the searches will always be for patterns of the same length, a hash table constructed with a single scan as in the Rabin-Karp method will yield constant search times on the average. On the other hand, if the patterns are to be of varying length, then one of the tree-based methods might be appropriate. (Patricia is especially adaptable to such an application.)
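One way to picture the "N overlapping keys" idea is to sort the N start positions by the text that begins at each one, then answer each pattern query with binary search. The Python sketch below is only an illustration under stated assumptions: the names build_index and find_all are hypothetical, positions are 0-based, and sorting pointers plus binary search stands in for whichever hashing or tree method one would actually choose. No sentinel is needed here because Python orders a string before any longer string that it prefixes.

```python
def build_index(text):
    """Return the start positions 0..N-1 sorted by the key text[i:]."""
    # Only positions are stored. The slice in the sort key copies each
    # suffix, which is acceptable for a sketch but costly for large texts.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, positions, pattern):
    """Return, in increasing order, every position where pattern occurs."""
    m = len(pattern)

    def prefix(k):
        # First m characters of the k-th smallest key.
        start = positions[k]
        return text[start:start + m]

    # Lower bound: first key whose m-character prefix is >= pattern.
    lo, hi = 0, len(positions)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first key whose m-character prefix is > pattern.
    hi = len(positions)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(positions[first:lo])
```

With text = "ABRACADABRA" and positions = build_index(text), the call find_all(text, positions, "ABRA") returns [0, 7]. The index is built once; each later search then costs a number of prefix comparisons logarithmic in N, which is the trade the passage describes for texts that are searched many times.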
Other variations in the problem can make it significantly more difficult and lead to drastically different methods, as we’ll discover in the next two chapters.

Exercises

1. Implement a brute-force pattern matching algorithm that scans the pattern from right to left.
2. Give the next table for the Knuth-Morris-Pratt algorithm for the pattern AAAAAAAAA.
3. Give the next table for the Knuth-Morris-Pratt algorithm for the pattern ABRACADABRA.
4. Draw a finite state machine which can search for the pattern ABRACADABRA.
5. How would you search a text file for a string of 50 consecutive blanks?
6. Give the right-to-left skip table for the right-left scan for the pattern ABRACADABRA.
7. Construct an example for which the right-to-left pattern scan with only the mismatch heuristic performs badly.
8. How would you modify the Rabin-Karp algorithm to search for a given pattern with the additional proviso that the middle character is a “wild card” (any text character at all can match it)?
9. Implement a version of the Rabin-Karp algorithm that can find a given two-dimensional pattern in a given two-dimensional text. Assume both pattern and text are rectangles of characters.
10. Write programs to generate a random 1000-bit text string, then find all occurrences of the last k bits elsewhere in the string, for k = 5, 10, 15. (Different methods might be appropriate for different values of k.)

Posted: 26/01/2014, 14:20
