Chapter 7 • Searching
Section 7.2 • Sequential Search

    {
        int list_size = the_list.size();
        if (searches <= 0 || list_size <= 0) {
            cout << " Exiting test: " << endl
                 << " The number of searches must be positive." << endl
                 << " The number of list entries must exceed 0." << endl;
            return;
        }
        int i, target, found_at;
        Key::comparisons = 0;
        Random number;
        Timer clock;

        // Successful searches: every odd target is present in the list.
        for (i = 0; i < searches; i++) {
            target = 2 * number.random_integer(0, list_size - 1) + 1;
            if (sequential_search(the_list, target, found_at) == not_present)
                cout << " Error: Failed to find expected target " << target << endl;
        }
        print_out("Successful", clock.elapsed_time(), Key::comparisons, searches);

        // Unsuccessful searches: no even target is present in the list.
        Key::comparisons = 0;
        clock.reset();
        for (i = 0; i < searches; i++) {
            target = 2 * number.random_integer(0, list_size);
            if (sequential_search(the_list, target, found_at) == success)
                cout << " Error: Found unexpected target " << target
                     << " at " << found_at << endl;
        }
        print_out("Unsuccessful", clock.elapsed_time(), Key::comparisons, searches);
    }

The details of embedding this function into a working program and writing the output function, print_out, are left as a project.

Exercises 7.2

E1. One good check for any algorithm is to see what it does in extreme cases. Determine what sequential search does when
    (a) there is only one item in the list.
    (b) the list is empty.
    (c) the list is full.

E2. Trace sequential search as it searches for each of the keys present in a list containing three items. Determine how many comparisons are made, and thereby check the formula for the average number of comparisons for a successful search.

E3. If we can assume that the keys in the list have been arranged in order (for example, numerical or alphabetical order), then we can terminate unsuccessful searches more quickly. If the smallest keys come first, then we can terminate the search as soon as a key greater than or equal to the target key has been found.
If we assume that it is equally likely that a target key not in the list is in any one of the n + 1 intervals (before the first key, between a pair of successive keys, or after the last key), then what is the average number of comparisons for unsuccessful search in this version?

E4. At each iteration, sequential search checks two inequalities, one a comparison of keys to see if the target has been found, and the other a comparison of indices to see if the end of the list has been reached. A good way to speed up the algorithm by eliminating the second comparison is to make sure that eventually key target will be found, by increasing the size of the list and inserting an extra item at the end with key target. Such an item placed in a list to ensure that a process terminates is called a sentinel. When the loop terminates, the search will have been successful if target was found before the last item in the list and unsuccessful if the final sentinel item was the one found. Write a C++ function that embodies the idea of a sentinel in the contiguous version of sequential search using lists developed in Section 6.2.2.

E5. Find the number of comparisons of keys done by the function written in Exercise E4 for
    (a) unsuccessful search.
    (b) best successful search.
    (c) worst successful search.
    (d) average successful search.

Programming Projects 7.2

P1. Write a program to test sequential search and, later, other searching methods using lists developed in Section 6.2.2. You should make the appropriate declarations required to set up the list and put keys into it. The keys are the odd integers from 1 to n, where the user gives the value of n. Then successful searches can be tested by searching for odd integers, and unsuccessful searches can be tested by searching for even integers. Use the function test_search from the text to do the actual testing of the search function. Overload the key comparison operators so that they increment the counter.
Write appropriate introduction and print_out functions and a menu driver. For now, the only options are to fill the list with a user-given number of entries, to test sequential_search, and to quit. Later, other searching methods could be added as further options.

Find out how many comparisons are done for both unsuccessful and successful searches, and compare these results with the analyses in the text. Run your program for representative values of n, such as n = 10, n = 100, n = 1000.

P2. Take the driver program written in Project P1 to test searching functions, and insert the version of sequential search that uses a sentinel (see Exercise E4). For various values of n, determine whether the version with or without a sentinel is faster. By experimenting, find the cross-over point between the two versions, if there is one. That is, for what value of n is the extra time needed to insert a sentinel at the end of a list of size n about the same as the time needed for extra comparisons of indices in the version without a sentinel?

P3. What changes are required to our sequential search function and testing program in order to operate on simply linked lists as developed in Section 6.2.3? Make these changes and apply the testing program from Project P1 for linked lists to test linked sequential search.

7.3 BINARY SEARCH

Sequential search is easy to write and efficient for short lists, but a disaster for long ones. Imagine trying to find the name “Amanda Thompson” in a large telephone book by reading one name at a time starting at the front of the book! To find any entry in a long list, there are far more efficient methods, provided that the keys in the list are already sorted into order.
One of the best methods for a list with keys in order is first to compare the target key with one in the center of the list and then restrict our attention to only the first or second half of the list, depending on whether the target key comes before or after the central one. With one comparison of keys we thus reduce the list to half its original size. Continuing in this way, at each step, we reduce the length of the list to be searched by half. In only twenty steps, this method will locate any requested key in a list containing more than a million keys.

The method we are discussing is called binary search. This approach of course requires that the keys in the list be of a scalar or other type that can be regarded as having an order and that the list already be completely in order.

7.3.1 Ordered Lists

What we are really doing here is introducing a new abstract data type, which is defined in the following way.

Definition  An ordered list is a list in which each entry contains a key, such that the keys are in order. That is, if entry i comes before entry j in the list, then the key of entry i is less than or equal to the key of entry j.

The only List operations that do not apply, without modification, to an ordered list are insert and replace. These standard List operations must fail when they would otherwise disturb the order of a list. We shall therefore implement an ordered list as a class derived from a contiguous List. In this derived class, we shall override the methods insert and replace with new implementations.
Hence, we use the following class specification:

    class Ordered_list: public List<Record> {
    public:
        Ordered_list();
        Error_code insert(const Record &data);
        Error_code insert(int position, const Record &data);
        Error_code replace(int position, const Record &data);
    };

As well as overriding the methods insert and replace, we have overloaded the method insert so that it can be used with a single parameter. This overloaded method places an entry into the correct position, determined by the order of the keys. We shall study this operation further in Chapter 8, but here is a simple, implementation-independent version of the overloaded method.

If the list already contains keys equal to the new one being inserted, then the new key will be inserted as the first of those that are equal.

    Error_code Ordered_list::insert(const Record &data)
    /* Post: If the Ordered_list is not full, the function succeeds: The Record data is
             inserted into the list, following the last entry of the list with a strictly
             lesser key (or in the first list position if no list element has a lesser key).
             Else: the function fails with the diagnostic Error_code overflow. */
    {
        int s = size();
        int position;
        for (position = 0; position < s; position++) {
            Record list_data;
            retrieve(position, list_data);
            if (data <= list_data) break;   // Stop at the first entry that is not less than data.
        }
        return List<Record>::insert(position, data);
    }

Here, we apply the original insert method of the base List class by using the scope resolution operator.
The scope resolution is necessary, because we have overridden this original insertion method with a new Ordered_list method that is coded as follows:

    Error_code Ordered_list::insert(int position, const Record &data)
    /* Post: If the Ordered_list is not full, 0 <= position <= n, where n is the number of
             entries in the list, and the Record data can be inserted at position in the
             list, without disturbing the list order, then the function succeeds: Any entry
             formerly in position and all later entries have their position numbers
             increased by 1 and data is inserted at position of the List.
             Else: the function fails with a diagnostic Error_code. */
    {
        Record list_data;
        if (position > 0) {
            retrieve(position - 1, list_data);
            if (data < list_data)
                return fail;
        }
        if (position < size()) {
            retrieve(position, list_data);
            if (data > list_data)
                return fail;
        }
        return List<Record>::insert(position, data);
    }

Note the distinction between overridden and overloaded methods in a derived class: The overridden methods replace methods of the base class by methods with matching names and parameter lists, whereas the overloaded methods merely match existing methods in name but have different parameter lists.

7.3.2 Algorithm Development

Simple though the idea of binary search is, it is exceedingly easy to program it incorrectly. The method dates back at least to 1946, but the first version free of errors and unnecessary restrictions seems to have appeared only in 1962. One study (see the references at the end of the book) showed that about 90 percent of professional programmers fail to code binary search correctly, even after working on it for a full hour. Another study (see footnote 2) found correct solutions in only five out of twenty textbooks. Let us therefore take special care to make sure that we make no mistakes.
To do this, we must state exactly what our variables designate; we must state precisely what conditions must be true before and after each iteration of the loop contained in the program; and we must make sure that the loop will terminate properly.

Footnote 2: Richard E. Pattis, “Textbook errors in binary searching,” SIGCSE Bulletin, 20 (1988), 190–194.

Our binary search algorithm will use two indices, top and bottom, to enclose the part of the list in which we are looking for the target key. At each iteration, we shall reduce the size of this part of the list by about half. To help us keep track of the progress of the algorithm, let us write down an assertion that we shall require to be true before every iteration of the process. Such a statement is called an invariant of the process.

    The target key, provided it is present in the list, will be found between the indices bottom and top, inclusive.

We establish the initial correctness of this assertion by setting bottom to 0 and top to the_list.size() − 1.

To do binary search, we first calculate the index mid halfway between bottom and top as

    mid = (bottom + top)/2

Next, we compare the target key against the key at position mid and then we change the appropriate one of the indices top or bottom so as to reduce the list to either its bottom or top half.

Next, we note that binary search should terminate when top ≤ bottom; that is, when the remaining part of the list contains at most one item, providing that we have not terminated earlier by finding the target.

Finally, we must make progress toward termination by ensuring that the number of items remaining to be searched, top − bottom + 1, strictly decreases at each iteration of the process.

Several slightly different algorithms for binary search can be written.
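The three requirements just listed (an invariant, a termination test, and strict progress) can be turned into runtime checks. The following is only a minimal sketch under assumptions that are not in the text: it searches a std::vector<int> instead of an Ordered_list, the name checked_binary_search is hypothetical, and it returns an index (or -1) rather than an Error_code. It also computes the midpoint as bottom + (top - bottom)/2, which equals (bottom + top)/2 for nonnegative indices but cannot overflow when both indices are very large.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not from the text: returns the index of target in the
// sorted vector v, or -1 if target is absent.
int checked_binary_search(const std::vector<int> &v, int target)
{
    int bottom = 0;
    int top = static_cast<int>(v.size()) - 1;
    // Invariant: if target is present, it lies in v[bottom..top], inclusive.
    while (bottom < top) {                       // more than one item remains
        int remaining = top - bottom + 1;        // items still to be searched
        int mid = bottom + (top - bottom) / 2;   // same value as (bottom + top)/2
        assert(bottom <= mid && mid <= top);     // mid indexes a legal entry
        if (v[mid] == target) return mid;        // terminated early: found
        if (v[mid] < target) bottom = mid + 1;   // keep the top half
        else                 top = mid - 1;      // keep the bottom half
        assert(top - bottom + 1 < remaining);    // strict progress
    }
    // Here top <= bottom: at most one item remains to be checked.
    if (bottom == top && v[bottom] == target) return bottom;
    return -1;
}
```

If either assertion ever fired, the loop would be violating one of the conditions stated above; the second assertion is exactly the requirement that top − bottom + 1 strictly decreases.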
7.3.3 The Forgetful Version

Perhaps the simplest variation is to forget the possibility that the Key target might be found quickly and continue, whether target has been found or not, to subdivide the list until what remains has length 1.

This method is implemented as the following function, which, for simplicity in programming, we write in recursive form. The bounds on the sublist are given as additional parameters for the recursive function.

    Error_code recursive_binary_1(const Ordered_list &the_list, const Key &target,
                                  int bottom, int top, int &position)
    /* Pre:  The indices bottom to top define the range in the list to search for the target.
       Post: If a Record in the range of locations from bottom to top in the_list has key
             equal to target, then position locates one such entry and a code of success
             is returned. Otherwise, the Error_code of not_present is returned and
             position becomes undefined.
       Uses: recursive_binary_1 and methods of the classes List and Record. */
    {
        Record data;
        if (bottom < top) {               // List has more than one entry.
            int mid = (bottom + top)/2;
            the_list.retrieve(mid, data);
            if (data < target)            // Reduce to top half of list.
                return recursive_binary_1(the_list, target, mid + 1, top, position);
            else                          // Reduce to bottom half of list.
                return recursive_binary_1(the_list, target, bottom, mid, position);
        }
        else if (top < bottom)
            return not_present;           // List is empty.
        else {                            // List has exactly one entry.
            position = bottom;
            the_list.retrieve(bottom, data);
            if (data == target) return success;
            else return not_present;
        }
    }

The division of the list into sublists is described in the following diagram:

    [Diagram: entries before bottom are < target; entries from bottom to top are not yet examined (?); entries after top are ≥ target.]

Note that this diagram shows only entries strictly less than target in the first part of the list, whereas the last part contains entries greater than or equal to target. In this way, when the middle part of the list is reduced to size 1 and hits the target, it will be guaranteed to be the first occurrence of the target if it appears more than once in the list.

If the list is empty, the function fails; otherwise it first calculates the value of mid. As their average, mid is between bottom and top, and so mid indexes a legitimate entry of the list.

Note that the if statement that invokes the recursion is not symmetrical, since the condition tested puts mid into the lower of the two intervals. On the other hand, integer division of nonnegative integers always truncates downward. It is only these two facts together that ensure that the recursion always terminates. Let us determine what occurs toward the end of the search. The recursion will continue only as long as top > bottom. But this condition implies that when mid is calculated we always have

    bottom <= mid < top

since integer division truncates downward. Next, the if statement reduces the size of the interval from top − bottom either to top − (mid + 1) or to mid − bottom, both of which, by the inequality, are strictly less than top − bottom. Thus at each iteration the size of the interval strictly decreases, so the recursion will eventually terminate.

After the recursion terminates, we must finally check to see if the target key has been found, since all previous comparisons have tested only inequalities.
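The claim that the forgetful version lands on the first occurrence of a repeated key can be checked on a small example. The sketch below is an illustration only, not the book's code: it mirrors the structure of recursive_binary_1 but assumes a std::vector<int> in place of Ordered_list and Record, returns an index (or -1) instead of an Error_code, and uses the hypothetical name forgetful_search.

```cpp
#include <vector>

// Hypothetical stand-in for recursive_binary_1 on a sorted std::vector<int>.
// Returns the index found in v[bottom..top], or -1 if target is absent.
int forgetful_search(const std::vector<int> &v, int target, int bottom, int top)
{
    if (bottom < top) {                    // more than one entry remains
        int mid = (bottom + top) / 2;      // guarantees bottom <= mid < top
        if (v[mid] < target)               // everything up to mid is < target
            return forgetful_search(v, target, mid + 1, top);
        else                               // v[mid] >= target: keep mid in range
            return forgetful_search(v, target, bottom, mid);
    }
    if (top < bottom) return -1;           // empty range
    return (v[bottom] == target) ? bottom : -1;   // exactly one entry left
}
```

For the list 2, 4, 4, 4, 6 and target 4, the ranges searched are [0, 4], [0, 2], [0, 1], and finally [1, 1], so the call forgetful_search(v, 4, 0, 4) returns index 1, the first of the three equal keys. Because an entry equal to target is never discarded from the bottom half, the range can only shrink toward the leftmost match.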
To adjust the parameters to our standard search function conventions, we produce the following search function:

    Error_code run_recursive_binary_1(const Ordered_list &the_list,
                                      const Key &target, int &position)
    {
        return recursive_binary_1(the_list, target, 0, the_list.size() - 1, position);
    }

Since the recursion used in the function recursive_binary_1 is tail recursion, we can easily convert it into an iterative loop. At the same time, we can make the parameters consistent with other searching methods.

    Error_code binary_search_1(const Ordered_list &the_list,
                               const Key &target, int &position)
    /* Post: If a Record in the_list has Key equal to target, then position locates one
             such entry and a code of success is returned. Otherwise, not_present is
             returned and position is undefined.
       Uses: Methods for classes List and Record. */
    {
        Record data;
        int bottom = 0, top = the_list.size() - 1;
        while (bottom < top) {
            int mid = (bottom + top)/2;
            the_list.retrieve(mid, data);
            if (data < target)
                bottom = mid + 1;
            else
                top = mid;
        }
        if (top < bottom) return not_present;
        else {
            position = bottom;
            the_list.retrieve(bottom, data);
            if (data == target) return success;
            else return not_present;
        }
    }

7.3.4 Recognizing Equality

Although binary_search_1 is a simple form of binary search, it seems that it will often make unnecessary iterations because it fails to recognize that it has found the target before continuing to iterate. Thus we might hope to save computer time with a variation that checks at each stage to see if it has found the target. In recursive form this method becomes:

    Error_code recursive_binary_2(const Ordered_list &the_list, const Key &target,
                                  int bottom, int top, int &position)
    /* Pre:  The indices bottom to top define the range in the list to search for the target.
       Post: If a Record in the range from bottom to top in the_list has key equal to
             target, then position locates one such entry, and a code of success is
             returned. Otherwise, not_present is returned, and position is undefined.
       Uses: recursive_binary_2, together with methods from the classes Ordered_list
             and Record. */
    {
        Record data;
        if (bottom <= top) {
            int mid = (bottom + top)/2;
            the_list.retrieve(mid, data);
            if (data == target) {
                position = mid;
                return success;
            }
            else if (data < target)
                return recursive_binary_2(the_list, target, mid + 1, top, position);
            else
                return recursive_binary_2(the_list, target, bottom, mid - 1, position);
        }
        else return not_present;
    }

As with run_recursive_binary_1, we need a function run_recursive_binary_2 to adjust the parameters to our standard conventions.

    Error_code run_recursive_binary_2(const Ordered_list &the_list,
                                      const Key &target, int &position)
    {
        return recursive_binary_2(the_list, target, 0, the_list.size() - 1, position);
    }

Again, this function can be translated into nonrecursive form with only the standard parameters:

    Error_code binary_search_2(const Ordered_list &the_list,
                               const Key &target, int &position)
    /* Post: If a Record in the_list has key equal to target, then position locates one
             such entry and a code of success is returned. Otherwise, not_present is
             returned and position is undefined.
       Uses: Methods for classes Ordered_list and Record. */
    {
        Record data;
        int bottom = 0, top = the_list.size() - 1;
        while (bottom <= top) {
            position = (bottom + top)/2;
            the_list.retrieve(position, data);
            if (data == target) return success;
            if (data < target) bottom = position + 1;
            else top = position - 1;
        }
        return not_present;
    }

The operation of this version is described in the following diagram:

    [Diagram: entries before bottom are < target; entries from bottom to top are not yet examined (?); entries after top are > target.]

Notice that this diagram (in contrast to that for the first method) is symmetrical in that the first part contains only entries strictly less than target, and the last part contains only entries strictly greater than target. With this method, therefore, if target appears more than once in the list, then the algorithm may return any instance of the target.

Proving that the loop in binary_search_2 terminates is easier than the proof for binary_search_1. In binary_search_2, the form of the if statement within the loop guarantees that the length of the interval is reduced by at least half in each iteration.

Which of these two versions of binary search will do fewer comparisons of keys? Clearly binary_search_2 will, if we happen to find the target near the beginning of the search. But each iteration of binary_search_2 requires two comparisons of keys, whereas binary_search_1 requires only one. Is it possible that if many iterations are needed, then binary_search_1 may do fewer comparisons? To answer this question we shall develop new analytic tools in the next section.

Exercises 7.3

E1. Suppose that the_list contains the integers 1, 2, ..., 8. Trace through the steps of binary_search_1 to determine what comparisons of keys are done in searching for each of the following targets: (a) 3, (b) 5, (c) 1, (d) 9, (e) 4.5.

E2. Repeat Exercise E1 using binary_search_2.

[...]

E3. [Challenging] Suppose that L1 and L2 are ordered lists containing n1 and n2 integers, respectively.
    (a) Use the idea of binary search to describe how to find the median of the n1 + n2 integers in the combined lists.
    (b) Write a function that implements your method.

Programming Projects 7.3

P1. Take the driver program of Project P1 of Section 7.2 (page 277), and make binary_search_1 and binary_search_2...
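The question raised at the end of Section 7.3.4, which version does fewer comparisons of keys, can also be explored by counting. The following is a rough sketch and not the driver program the project describes: it assumes a std::vector<int> of keys instead of an Ordered_list, replaces the Key class's overloaded operators with two hypothetical counting helpers (less_than and equal_to), and uses the hypothetical names search_1, search_2, and total_comparisons.

```cpp
#include <vector>

// Hypothetical counter; the book's Key class counts through overloaded operators.
static long comparisons = 0;
static bool less_than(int a, int b) { ++comparisons; return a < b; }
static bool equal_to(int a, int b)  { ++comparisons; return a == b; }

// Forgetful version: one key comparison per iteration, plus one final test.
int search_1(const std::vector<int> &v, int target)
{
    int bottom = 0, top = static_cast<int>(v.size()) - 1;
    while (bottom < top) {
        int mid = (bottom + top) / 2;
        if (less_than(v[mid], target)) bottom = mid + 1;
        else top = mid;
    }
    if (top < bottom) return -1;
    return equal_to(v[bottom], target) ? bottom : -1;
}

// Version that recognizes equality: up to two key comparisons per iteration.
int search_2(const std::vector<int> &v, int target)
{
    int bottom = 0, top = static_cast<int>(v.size()) - 1;
    while (bottom <= top) {
        int mid = (bottom + top) / 2;
        if (equal_to(v[mid], target)) return mid;
        if (less_than(v[mid], target)) bottom = mid + 1;
        else top = mid - 1;
    }
    return -1;
}

// Total key comparisons over one successful search for every key in v.
long total_comparisons(int (*search)(const std::vector<int> &, int),
                       const std::vector<int> &v)
{
    comparisons = 0;
    for (int key : v) search(v, key);
    return comparisons;
}
```

On the eight keys 1, 3, 5, ..., 15, this sketch counts 32 comparisons in total for search_1 (4 per search) and 34 for search_2 (between 1 and 7 per search), so even on a list this small the early exits of the second version do not pay for its extra comparison per iteration. Larger lists and unsuccessful searches would need to be tried before drawing any general conclusion.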
true that all the other operations (such as incrementing and comparing indices) have gone in lock step with comparison of keys. In fact, the frequency of such basic actions is much more important than is a total count of all operations, including the housekeeping. The total including housekeeping is too dependent on the choice of programming language and on the programmer’s particular style, so dependent...

doubling the corresponding coordinate. In the third graph we wish to compare the two versions of binary search; a semilog graph is appropriate here, so that the vertical axis maintains linear units while the horizontal axis is logarithmic.

[Figure: curves labeled Sequential, Binary 1, and Binary 2.]...

1 second might involve a million (10^6) basic actions, and doubling the size of the input would then require 10^12 basic actions, increasing the running time from 1 second to 11 1/2 days. Doubling the input again raises the count of basic actions to 10^24 and the time to about 30 billion years. The function 2^n grows very rapidly indeed as n increases. Our desire in formulating general principles that will...

fewer times, about lg n times instead of n times, and as the number n increases, the value of lg n grows much more slowly than does the value of n. In the context of comparing underlying methods, the differences between binary_search_1 and binary_search_2 become insignificant in comparison with the difference between either binary search and sequential search. For large lists, binary_search_2 may require...
obtain a theorem like Theorem 7.6 giving lower bounds for worst and average case behavior for an unsuccessful search by such an algorithm.
    (b) Use Theorem 7.4 to obtain a similar result for successful searches.
    (c) Compare the bounds you obtain with the analysis of binary_search_2.

Programming Project 7.5

P1. (a) Write a program to do interpolation search and verify its correctness (especially termination)...

what we would intuitively conclude, that binary_search_2 is probably not worth the effort, since for large problems binary_search_1 is better, and for small problems, sequential_search is better. To be fair, however, with some computers and optimizing compilers, the two comparisons needed in binary_search_2 will not take double the time of the one in binary_search_1, so in such a situation binary_search_2...

7.7 binary_search_1 is optimal in the class of all algorithms that search an ordered list by making comparisons of keys. In both the average and worst cases, binary_search_1 achieves the optimal bound.

Section 7.5 • Lower Bounds

An informal way to see why Corollary 7.7 is true is to start with an arbitrary searching algorithm and imagine drawing its comparison tree for a list of length n. Since...

comparisons?

Programming Projects 7.4

P1. (a) Write a “ternary” search function analogous to binary_search_2 that examines the key one-third of the way through the list, and if the target key is greater, then examines the key two-thirds of the way through, and thus in any case at each pass reduces the length of the list by a factor of three.
    (b) Include your function as an additional option in the testing program...
help in finding it, if you are searching by means of comparisons of keys.

Exercises 7.4

E1. Draw the comparison trees for (i) binary_search_1 and (ii) binary_search_2 when (a) n = 5, (b) n = 7, (c) n = 8, (d) n = 13. Calculate the external and internal path lengths for each of these trees, and verify that the conclusion of Theorem 7.3 holds.

E2. Sequential search has less overhead than binary search, and ...