CHAPTER SIX
SEARCHING

Let’s look at the record.
— AL SMITH (1928)

This chapter might have been given the more pretentious title “Storage and Retrieval of Information”; on the other hand, it might simply have been called “Table Look-Up.” We are concerned with the process of collecting information in a computer’s memory, in such a way that the information can subsequently be recovered as quickly as possible. Sometimes we are confronted with more data than we can really use, and it may be wisest to forget and to destroy most of it; but at other times it is important to retain and organize the given facts in such a way that fast retrieval is possible.

Most of this chapter is devoted to the study of a very simple search problem: how to find the data that has been stored with a given identification. For example, in a numerical application we might want to find $f(x)$, given $x$ and a table of the values of $f$; in a nonnumerical application, we might want to find the English translation of a given Russian word.

In general, we shall suppose that a set of $N$ records has been stored, and the problem is to locate the appropriate one. As in the case of sorting, we assume that each record includes a special field called its key; this terminology is especially appropriate, because many people spend a great deal of time every day searching for their keys. We generally require the $N$ keys to be distinct, so that each key uniquely identifies its record. The collection of all records is called a table or file, where the word “table” is usually used to indicate a small file, and “file” is usually used to indicate a large table. A large file or a group of files is frequently called a database.

Algorithms for searching are presented with a so-called argument, $K$, and the problem is to find which record has $K$ as its key. After the search is complete, two possibilities can arise: Either the search was successful, having located the unique record containing $K$; or it was unsuccessful, having determined that $K$
is nowhere to be found. After an unsuccessful search it is sometimes desirable to enter a new record, containing $K$, into the table; a method that does this is called a search-and-insertion algorithm. Some hardware devices known as associative memories solve the search problem automatically, in a way that might resemble the functioning of a human brain; but we shall study techniques for searching on a conventional general-purpose digital computer.

Although the goal of searching is to find the information stored in the record associated with $K$, the algorithms in this chapter generally ignore everything but the keys themselves. In practice we can find the associated data once we have located $K$; for example, if $K$ appears in location TABLE + $i$, the associated data (or a pointer to it) might be in location TABLE + $i$ + 1, or in DATA + $i$, etc. It is therefore convenient to gloss over the details of what should be done after $K$ has been successfully found.

Searching is the most time-consuming part of many programs, and the substitution of a good search method for a bad one often leads to a substantial increase in speed. In fact we can often arrange the data or the data structure so that searching is eliminated entirely, by ensuring that we always know just where to find the information we need. Linked memory is a common way to achieve this; for example, a doubly linked list makes it unnecessary to search for the predecessor or successor of a given item. Another way to avoid searching occurs if we are allowed to choose the keys freely, since we might as well let them be the numbers $\{1, 2, \ldots, N\}$; then the record containing $K$ can simply be placed in location TABLE + $K$. Both of these techniques were used to eliminate searching from the topological sorting algorithm discussed in Section 2.2.3. However, searches would have been necessary if the objects in the topological sorting algorithm had been given symbolic names instead of numbers. Efficient algorithms for searching turn out to be
quite important in practice.

Search methods can be classified in several ways. We might divide them into internal versus external searching, just as we divided the sorting algorithms of Chapter 5 into internal versus external sorting. Or we might divide search methods into static versus dynamic searching, where “static” means that the contents of the table are essentially unchanging (so that it is important to minimize the search time without regard for the time required to set up the table), and “dynamic” means that the table is subject to frequent insertions and perhaps also deletions. A third possible scheme is to classify search methods according to whether they are based on comparisons between keys or on digital properties of the keys, analogous to the distinction between sorting by comparison and sorting by distribution. Finally we might divide searching into those methods that use the actual keys and those that work with transformed keys.

The organization of this chapter is essentially a combination of the latter two modes of classification. Section 6.1 considers “brute force” sequential methods of search, then Section 6.2 discusses the improvements that can be made based on comparisons between keys, using alphabetic or numeric order to govern the decisions. Section 6.3 treats digital searching, and Section 6.4 discusses an important class of methods called hashing techniques, based on arithmetic transformations of the actual keys. Each of these sections treats both internal and external searching, in both the static and the dynamic case; and each section points out the relative advantages and disadvantages of the various algorithms.

Searching and sorting are often closely related to each other. For example, consider the following problem: Given two sets of numbers, $A = \{a_1, a_2, \ldots, a_m\}$ and $B = \{b_1, b_2, \ldots, b_n\}$, determine whether or not $A \subseteq B$. Three solutions suggest themselves:

1. Compare each $a_i$ sequentially with the $b_j$’s until finding a
match.
2. Sort the $a$’s and $b$’s, then make one sequential pass through both files, checking the appropriate condition.
3. Enter the $b_j$’s in a table, then search for each of the $a_i$.

Each of these solutions is attractive for a different range of values of $m$ and $n$. Solution 1 will take roughly $c_1 mn$ units of time, for some constant $c_1$, and solution 2 will take about $c_2(m \lg m + n \lg n)$ units, for some (larger) constant $c_2$. With a suitable hashing method, solution 3 will take roughly $c_3 m + c_4 n$ units of time, for some (still larger) constants $c_3$ and $c_4$. It follows that solution 1 is good for very small $m$ and $n$, but solution 2 soon becomes better as $m$ and $n$ grow larger. Eventually solution 3 becomes preferable, until $n$ exceeds the internal memory size; then solution 2 is usually again superior until $n$ gets much larger still. Thus we have a situation where sorting is sometimes a good substitute for searching, and searching is sometimes a good substitute for sorting.

More complicated search problems can often be reduced to the simpler case considered here. For example, suppose that the keys are words that might be slightly misspelled; we might want to find the correct record in spite of this error. If we make two copies of the file, one in which the keys are in normal lexicographic order and another in which they are ordered from right to left (as if the words were spelled backwards), a misspelled search argument will probably agree up to half or more of its length with an entry in one of these two files. The search methods of Sections 6.2 and 6.3 can therefore be adapted to find the key that was probably intended.

A related problem has received considerable attention in connection with airline reservation systems, and in other applications involving people’s names when there is a good chance that the name will be misspelled due to poor handwriting or voice transmission. The goal is to transform the argument into some code that tends to bring together all variants of the same name. The following
contemporary form of the “Soundex” method, a technique that was originally developed by Margaret K. Odell and Robert C. Russell [see U.S. Patents 1261167 (1918), 1435663 (1922)], has often been used for encoding surnames:

1. Retain the first letter of the name, and drop all occurrences of a, e, h, i, o, u, w, y in other positions.
2. Assign the following numbers to the remaining letters after the first:
   b, f, p, v → 1
   c, g, j, k, q, s, x, z → 2
   d, t → 3
   l → 4
   m, n → 5
   r → 6
3. If two or more letters with the same code were adjacent in the original name (before step 1), or adjacent except for intervening h’s and w’s, omit all but the first.
4. Convert to the form “letter, digit, digit, digit” by adding trailing zeros (if there are less than three digits), or by dropping rightmost digits (if there are more than three).

For example, the names Euler, Gauss, Hilbert, Knuth, Lloyd, Lukasiewicz, and Wachs have the respective codes E460, G200, H416, K530, L300, L222, W200. Of course this system will bring together names that are somewhat different, as well as names that are similar; the same seven codes would be obtained for Ellery, Ghosh, Heilbronn, Kant, Liddy, Lissajous, and Waugh. And on the other hand a few related names like Rogers and Rodgers, or Sinclair and St. Clair, or Tchebysheff and Chebyshev, remain separate. But by and large the Soundex code greatly increases the chance of finding a name in one of its disguises. [For further information, see C. P. Bourne and D. F. Ford, JACM 8 (1961), 538-552; Leon Davidson, CACM 5 (1962), 169-171; Federal Population Censuses 1790-1890 (Washington, D.C.: National Archives, 1971), 90.]
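The Soundex rules described above translate almost directly into code. The following is a minimal sketch in Python, not a definitive implementation: the names `soundex` and `CODES` are illustrative and do not come from the text, the digit table is the standard Soundex assignment, and the sketch assumes names written in unaccented Latin letters (any other character, such as a space or a period, is simply skipped).

```python
# Digit assigned to each coded letter (rule 2). Letters absent from this
# table (a, e, h, i, o, u, w, y) carry no digit.
CODES = {
    letter: digit
    for letters, digit in [
        ("bfpv", "1"),
        ("cgjkqsxz", "2"),
        ("dt", "3"),
        ("l", "4"),
        ("mn", "5"),
        ("r", "6"),
    ]
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the letter-plus-three-digits code of `name` (hypothetical helper)."""
    # Assumption: only the letters a-z matter; everything else is ignored.
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        raise ValueError("name contains no letters")

    first = letters[0]
    digits = []
    # Code of the closest preceding letter, skipping h and w;
    # None if that letter carries no digit (a vowel or y).
    previous = CODES.get(first)
    for ch in letters[1:]:
        if ch in "hw":
            continue  # rule 3: h and w do not separate letters with equal codes
        code = CODES.get(ch)
        if code is not None and code != previous:
            digits.append(code)
        previous = code  # vowels and y reset this to None
    # Rule 4: pad with zeros or truncate to exactly three digits.
    return first.upper() + "".join(digits)[:3].ljust(3, "0")


if __name__ == "__main__":
    for name in ["Euler", "Gauss", "Hilbert", "Knuth", "Lloyd", "Lukasiewicz", "Wachs"]:
        print(name, soundex(name))
```

Under these assumptions, `soundex("Knuth")` and `soundex("Kant")` both give K530, while `soundex("Rogers")` and `soundex("Rodgers")` differ (R262 versus R326), which matches the behavior described in the text. Tracking only the previous code, rather than deleting letters first, is one simple way to honor the requirement that adjacency be judged in the original name.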
When using a scheme like Soundex, we need not give up the assumption that all keys are distinct; we can make lists of all records with equivalent codes, treating each list as a unit.

Large databases tend to make the retrieval process more complex, since people often want to consider many different fields of each record as potential keys, with the ability to locate items when only part of the key information is specified. For example, given a large file about stage performers, a producer might wish to find all unemployed actresses between 25 and 30 with dancing talent and a French accent; given a large file of baseball statistics, a sportswriter may wish to determine the total number of runs scored by the Chicago White Sox in 1964, during the seventh inning of night games, against left-handed pitchers. Given a large file of data about anything, people like to ask arbitrarily complicated questions. Indeed, we might consider an entire library as a database, and a searcher may want to find everything that has been published about information retrieval. An introduction to the techniques for such secondary key (multi-attribute) retrieval problems appears below in Section 6.5.

Before entering into a detailed study of searching, it may be helpful to put things in historical perspective. During the pre-computer era, many books of logarithm tables, trigonometry tables, etc., were compiled, so that mathematical calculations could be replaced by searching. Eventually these tables were transferred to punched cards, and used for scientific problems in connection with collators, sorters, and duplicating punch machines. But when stored-program computers were introduced, it soon became apparent that it was now cheaper to recompute $\log x$ or $\cos x$ each time, instead of looking up the answer in a table.

Although the problem of sorting received considerable attention already in the earliest days of computers, comparatively little was done about algorithms for searching. With small internal memories, and with nothing but sequential media like tapes for storing large files, searching was either trivially easy or almost impossible. But the development of larger and larger random-access memories during the 1950s eventually led to the recognition that searching was an interesting problem in its own right. After years of complaining about the limited amounts of space in the early machines, programmers were suddenly confronted with larger amounts of memory than they knew how to use efficiently.

The first surveys of the searching problem were published by A [...]

6.1 SEQUENTIAL SEARCHING

[...]
$$p_1 = \frac{c}{1^{1-\theta}},\quad p_2 = \frac{c}{2^{1-\theta}},\quad \ldots,\quad p_N = \frac{c}{N^{1-\theta}}, \qquad \text{where } c = 1/H_N^{(1-\theta)}, \tag{13}$$
with $\theta = \log .80/\log .20$ as before, and where $H_N^{(s)}$ is the $N$th harmonic number of order $s$, namely $1^{-s} + 2^{-s} + \cdots + N^{-s}$. Notice that this probability distribution is very similar to that of Zipf’s law (8); as $\theta$ varies from 1 to 0, the probabilities vary from a uniform distribution to a Zipfian one. Applying (3) to (13) yields
$$C_N = H_N^{(-\theta)}/H_N^{(1-\theta)} = \frac{\theta N}{\theta + 1} + O(N^{1-\theta}) \approx 0.122N \tag{14}$$
as the mean number of comparisons for the 80-20 law (see the exercises).

A study of word frequencies carried out by E. S. Schwartz [see the interesting graph on page 422 of JACM 10 (1963)] suggests that distribution (13) with a slightly negative value of $\theta$ gives a better fit to the data than Zipf’s law (8). In this case the mean value $C_N = H_N^{(-\theta)}/H_N^{(1-\theta)}$ is substantially smaller than (9) as $N \to \infty$.

Distributions like (11) and (13) were first studied by Vilfredo Pareto in connection with disparities of personal income and wealth [Cours d’Économie Politique (Lausanne: Rouge, 1897), 304-312]. If $p_k$ is proportional to the wealth of the $k$th richest individual, the probability that a person’s wealth exceeds or equals $x$ times the wealth of the poorest individual is $k/N$ when $x = p_k/p_N$. Thus, when $p_k = ck^{\theta-1}$ and $x = (k/N)^{\theta-1}$, the stated probability is $x^{-1/(1-\theta)}$; this is now called a Pareto distribution with parameter $1/(1-\theta)$.

Curiously, Pareto didn’t
understand his own distribution; he believed that a value of $\theta$ near 0 would correspond to a more egalitarian society than a value near 1! His error was corrected by Corrado Gini [Atti della III Riunione della Società Italiana per il Progresso delle Scienze (1910), reprinted in his Memorie di Metodologia Statistica (Rome: 1955), 3-120], who was the first person to formulate and explain the significance of ratios like the 80-20 law (10). People still tend to misunderstand such distributions; they often speak about a “75-25 law” or a “90-10 law” as if an $a$-$b$ law makes sense only when $a + b = 100$, while (12) shows that the sum 80 + 20 is quite irrelevant.

Another discrete distribution analogous to (11) and (13) was introduced by G. Udny Yule when he studied the increase in biological species as a function of time, assuming various models of evolution [Philos. Trans. B213 (1924), 21-87]. Yule’s distribution applies when $\theta < 2$:
$$p_1 = c,\quad p_2 = \frac{c}{2-\theta},\quad p_3 = \frac{2c}{(3-\theta)(2-\theta)},\quad \ldots,\quad p_N = \frac{(N-1)!\,c}{(N-\theta)\cdots(2-\theta)} = \frac{c}{\binom{N-\theta}{N-1}};$$
$$c = \frac{\theta}{1-\theta}\binom{N-\theta}{N} \bigg/ \left(1 - \binom{N-\theta}{N}\right). \tag{16}$$
The limiting value $c = 1/H_N$ or $c = 1/N$ is used when $\theta = 0$ or $\theta = 1$.

A “self-organizing” file. These calculations with probabilities are very nice, but in most cases we don’t know what the probabilities are. We could keep a count in each record of how often it has been accessed, reallocating the records on the basis of those counts; the formulas derived above suggest that this procedure would often lead to a worthwhile savings. But we probably don’t want to devote [...]
Youden, William Wallace, 440 Young, Alfred, 48 tableaux, 47-72, 240, 670-671 Young, Frank Hood, 228 Yuba, Toshitsugu, 727 Yuen, Pasteur Shih Teh, 520 Yule, George Udny, 401 distribution, 401, 405 Zalk, Martin Maurice, 256 Zave, Derek Alan, 279, 682 Zeckendorf, Edouard, 681 Zeilberger, Doron, 596, 600, 601 Zero-one principle, 223, 224, 245, 667, 668 Zeta function ζ(z), 133, 510, 637, 702 Zhang, Bin, 488 Zhang, Linbo, 782 Zhu, Hong, 643, 674 Zigzag paths, 430, 612, 671, see also Lattice paths Zijlstra, Erik, 152 Zipf, George Kingsley, 400 distribution, 400, 402, 435, 455 Ziviani, Nivio, 489 Zoberbier, Werner, 338 Zobrist, Alfred Lindsey, 742 Zodiac, 426-427 Zolnowsky, John Edward, 594 Zuse, Konrad, 385 Zweben, Stuart Harvey, 717 Zwick, Uri, 664

Although you may pass for an artist, computist, or analyst, yet you may not be justly esteemed a man of science.
-- GEORGE BERKELEY, The Analyst (1734)

THIS BOOK was composed on a Sun SPARCstation with Computer Modern typefaces, using the TeX and METAFONT software as described in the author's books Computers & Typesetting (Reading, Mass.: Addison-Wesley, 1986), Volumes A-E. The illustrations were produced with John Hobby's METAPOST system. Some names in the index were typeset with additional fonts developed by Yannis Haralambous (Greek, Hebrew, Arabic), Olga G. Lapko (Cyrillic), Frans J. Velthuis (Devanagari), Masatoshi Watanabe (Japanese), and Linbo Zhang (Chinese).
[Endpaper table: MIX operation codes 00-63. The grid of instruction entries did not survive extraction; only the legend and character code below are recoverable.]

General form of each entry: C, t, description, OP(F).

C = operation code, (5:5) field of instruction
F = op variant, (4:4) field of instruction
M = address of instruction after indexing
V = M(F) = contents of F field of location M
OP = symbolic name for operation
(F) = normal F setting
t = execution time; T = interlock time

rA = register A
rX = register X
rAX = registers A and X as one
rIi = index register i, 1 ≤ i ≤ 6
rJ = register J
CI = comparison indicator

Character code, values 00 through 55 in order: (blank) A B C D E F G H I Δ J K L M N O P Q R Σ Π S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 . , ( ) + - * / = $ < > @ ; : '
Donald E. Knuth is known throughout the world for his pioneering work on algorithms and programming techniques, for his invention of the TeX and METAFONT systems for computer typesetting, and for his prolific and influential writing (26 books, plus papers). Professor Emeritus of The Art of Computer Programming at Stanford University, he currently devotes full time to the completion of his seminal multivolume series on classical computer science, begun in 1962 when he was a graduate student at California Institute of Technology. Professor Knuth is the recipient of numerous awards and honors, including the ACM Turing Award, the Medal of Science presented by President Carter, the AMS Steele Prize for expository writing, and, in November, 1996, the prestigious Kyoto Prize for advanced technology. He lives on the Stanford campus with his wife, Jill.

Visit the Addison-Wesley web site to learn more about this remarkable scientist and author: informit.com/knuth

Visit Knuth's own home page for more information on this book and on future volumes in the series: www-cs-faculty.stanford.edu/~knuth

Computer Science/Programming

The Art of Computer Programming
DONALD E. KNUTH

This multivolume work is widely recognized as the definitive description of classical computer science. The first three volumes have for decades been an invaluable resource in programming theory and practice for students, researchers, and practitioners alike.

"The bible of all fundamental algorithms and the work that taught many of today's software developers most of what they know about computer programming."
-- Byte, September 1995

Countless readers have spoken about the profound personal influence of Knuth's work. Scientists have marveled at the beauty and elegance of his analysis, while ordinary programmers have successfully applied his "cookbook" solutions to their day-to-day problems. All have admired Knuth for the breadth, clarity, accuracy, and good humor found in his books.

"I can't begin to tell you how many pleasurable hours of study and recreation they have afforded me! I have pored over them in cars, restaurants, at work, at home, and even at a Little League game when my son wasn't in the line-up."
-- Charles Long

Primarily written as a reference, some people have nevertheless found it possible and interesting to read each volume from beginning to end. A programmer in China even compared the experience to reading a poem.

"If you think you're a really good programmer ... read [Knuth's] Art of Computer Programming ... You should definitely send me a resume if you can read the whole thing."
-- Bill Gates

Whatever your background, if you need to do any serious computer programming, you will find your own good reason to make each volume in this series a readily accessible part of your scholarly or professional library.

"It's always a pleasure when a problem is hard enough that you have to get the Knuths off the shelf. I find that merely opening one has a very useful terrorizing effect on computers."
-- Jonathan Laventhol

For the first time in more than 20 years, Knuth has revised all three books to reflect more recent developments in the field. His revisions focus specifically on those areas where knowledge has converged since publication of the last editions, on problems that have been solved, on problems that have changed. In keeping with the authoritative character of these books, all historical information about previous work in the field has been updated where necessary. Consistent with the author's reputation for painstaking perfection, the rare technical errors in his work, discovered by perceptive and demanding readers, have all been corrected. Hundreds of new exercises have been added to raise new challenges.

ISBN-13: 978-0-201-89685-5
ISBN-10: 0-201-89685-0
informit.com/aw
Addison-Wesley
Pearson Education
$74.99 US
$89.99 CANADA