Parallel and Distributed Databases

8. In the Collaborating Servers architecture, when a transaction is submitted to the DBMS, briefly describe how its activities at various sites are coordinated. In particular, describe the role of transaction managers at the different sites, the concept of subtransactions, and the concept of distributed transaction atomicity.

Exercise 21.2 Give brief answers to the following questions:

1. Define the terms fragmentation and replication, in terms of where data is stored.
2. What is the difference between synchronous and asynchronous replication?
3. Define the term distributed data independence. Specifically, what does this mean with respect to querying and with respect to updating data in the presence of data fragmentation and replication?
4. Consider the voting and read-one write-all techniques for implementing synchronous replication. What are their respective pros and cons?
5. Give an overview of how asynchronous replication can be implemented. In particular, explain the terms capture and apply.
6. What is the difference between log-based and procedural approaches to implementing capture?
7. Why is giving database objects unique names more complicated in a distributed DBMS?
8. Describe a catalog organization that permits any replica (of an entire relation or a fragment) to be given a unique name and that provides the naming infrastructure required for ensuring distributed data independence.
9. If information from remote catalogs is cached at other sites, what happens if the cached information becomes outdated? How can this condition be detected and resolved?

Exercise 21.3 Consider a parallel DBMS in which each relation is stored by horizontally partitioning its tuples across all disks.

Employees(eid: integer, did: integer, sal: real)
Departments(did: integer, mgrid: integer, budget: integer)

The mgrid field of Departments is the eid of the manager.
Each relation contains 20-byte tuples, and the sal and budget fields both contain uniformly distributed values in the range 0 to 1,000,000. The Employees relation contains 100,000 pages, the Departments relation contains 5,000 pages, and each processor has 100 buffer pages of 4,000 bytes each. The cost of one page I/O is t_d, and the cost of shipping one page is t_s; tuples are shipped in units of one page by waiting for a page to be filled before sending a message from processor i to processor j. There are no indexes, and all joins that are local to a processor are carried out using a sort-merge join. Assume that the relations are initially partitioned using a round-robin algorithm and that there are 10 processors.

For each of the following queries, describe the evaluation plan briefly and give its cost in terms of t_d and t_s. You should compute the total cost across all sites as well as the ‘elapsed time’ cost (i.e., if several operations are carried out concurrently, the time taken is the maximum over these operations).

1. Find the highest paid employee.
2. Find the highest paid employee in the department with did 55.
3. Find the highest paid employee over all departments with budget less than 100,000.
4. Find the highest paid employee over all departments with budget less than 300,000.
5. Find the average salary over all departments with budget less than 300,000.
6. Find the salaries of all managers.
7. Find the salaries of all managers who manage a department with a budget less than 300,000 and earn more than 100,000.
8. Print the eids of all employees, ordered by increasing salaries. Each processor is connected to a separate printer, and the answer can appear as several sorted lists, each printed by a different processor, as long as we can obtain a fully sorted list by concatenating the printed lists (in some order).
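As a starting point for the cost estimates in Exercise 21.3, the following Python sketch works out the basic quantities implied by the stated parameters: tuples per page, pages per processor under round-robin partitioning, and the I/O cost of a parallel full scan. The function names are hypothetical, and the scan model is an assumption: every processor reads its own partition exactly once, and the cost of shipping per-processor results to a coordinator is ignored. It is not a full solution to any of the queries.

```python
# Parameters taken from the statement of Exercise 21.3.
TUPLE_BYTES = 20
PAGE_BYTES = 4_000
EMP_PAGES = 100_000
DEPT_PAGES = 5_000
PROCESSORS = 10


def pages_per_processor(total_pages: int, processors: int) -> int:
    """Pages stored at each processor under round-robin partitioning (rounded up)."""
    return -(-total_pages // processors)


def full_scan_cost(total_pages: int, processors: int) -> tuple[int, int]:
    """Return (total, elapsed) page I/Os, in units of t_d, for one parallel scan.

    Assumption: each processor scans only its own partition, all processors
    work concurrently, and shipping of the small local results is ignored.
    """
    return total_pages, pages_per_processor(total_pages, processors)


tuples_per_page = PAGE_BYTES // TUPLE_BYTES
total_io, elapsed_io = full_scan_cost(EMP_PAGES, PROCESSORS)

print("tuples per page:", tuples_per_page)
print("Employees pages per processor:", pages_per_processor(EMP_PAGES, PROCESSORS))
print("Departments pages per processor:", pages_per_processor(DEPT_PAGES, PROCESSORS))
print("full scan of Employees: total", total_io, "t_d, elapsed", elapsed_io, "t_d")
```

A query such as "find the highest paid employee" needs at least one such scan of Employees, so these figures give a lower bound on its I/O cost under the stated assumptions; joins and sorts add further terms.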
Exercise 21.4 Consider the same scenario as in Exercise 21.3, except that the relations are originally partitioned using range partitioning on the sal and budget fields.

Exercise 21.5 Repeat Exercises 21.3 and 21.4 with the number of processors equal to (i) 1 and (ii) 100.

Exercise 21.6 Consider the Employees and Departments relations described in Exercise 21.3. They are now stored in a distributed DBMS with all of Employees stored at Naples and all of Departments stored at Berlin. There are no indexes on these relations. The cost of various operations is as described in Exercise 21.3. Consider the query:

SELECT *
FROM Employees E, Departments D
WHERE E.eid = D.mgrid

The query is posed at Delhi, and you are told that only 1 percent of employees are managers. Find the cost of answering this query using each of the following plans:

1. Compute the query at Naples by shipping Departments to Naples; then ship the result to Delhi.
2. Compute the query at Berlin by shipping Employees to Berlin; then ship the result to Delhi.
3. Compute the query at Delhi by shipping both relations to Delhi.
4. Compute the query at Naples using Bloomjoin; then ship the result to Delhi.
5. Compute the query at Berlin using Bloomjoin; then ship the result to Delhi.
6. Compute the query at Naples using Semijoin; then ship the result to Delhi.
7. Compute the query at Berlin using Semijoin; then ship the result to Delhi.

Exercise 21.7 Consider your answers in Exercise 21.6. Which plan minimizes shipping costs? Is it necessarily the cheapest plan? Which do you expect to be the cheapest?

Exercise 21.8 Consider the Employees and Departments relations described in Exercise 21.3. They are now stored in a distributed DBMS with 10 sites. The Departments tuples are horizontally partitioned across the 10 sites by did, with the same number of tuples assigned to each site and with no particular order to how tuples are assigned to sites.
The Employees tuples are similarly partitioned, by sal ranges, with sal ≤ 100,000 assigned to the first site, 100,000 < sal ≤ 200,000 assigned to the second site, and so on. In addition, the partition sal ≤ 100,000 is frequently accessed and infrequently updated, and it is therefore replicated at every site. No other Employees partition is replicated.

1. Describe the best plan (unless a plan is specified) and give its cost:
(a) Compute the natural join of Employees and Departments using the strategy of shipping all fragments of the smaller relation to every site containing tuples of the larger relation.
(b) Find the highest paid employee.
(c) Find the highest paid employee with salary less than 100,000.
(d) Find the highest paid employee with salary greater than 400,000 and less than 500,000.
(e) Find the highest paid employee with salary greater than 450,000 and less than 550,000.
(f) Find the highest paid manager for those departments stored at the query site.
(g) Find the highest paid manager.

2. Assuming the same data distribution, describe the sites visited and the locks obtained for the following update transactions, assuming that synchronous replication is used for the replication of Employees tuples with sal ≤ 100,000:
(a) Give employees with salary less than 100,000 a 10 percent raise, with a maximum salary of 100,000 (i.e., the raise cannot increase the salary to more than 100,000).
(b) Give all employees a 10 percent raise. The conditions of the original partitioning of Employees must still be satisfied after the update.

3. Assuming the same data distribution, describe the sites visited and the locks obtained for the following update transactions, assuming that asynchronous replication is used for the replication of Employees tuples with sal ≤ 100,000.
(a) For all employees with salary less than 100,000 give them a 10 percent raise, with a maximum salary of 100,000.
(b) Give all employees a 10 percent raise.
After the update is completed, the conditions of the original partitioning of Employees must still be satisfied.

Exercise 21.9 Consider the Employees and Departments relations from Exercise 21.3. You are a DBA dealing with a distributed DBMS, and you need to decide how to distribute these two relations across two sites, Manila and Nairobi. Your DBMS supports only unclustered B+ tree indexes. You have a choice between synchronous and asynchronous replication. For each of the following scenarios, describe how you would distribute them and what indexes you would build at each site. If you feel that you have insufficient information to make a decision, explain briefly.

1. Half the departments are located in Manila, and the other half are in Nairobi. Department information, including that for employees in the department, is changed only at the site where the department is located, but such changes are quite frequent. (Although the location of a department is not included in the Departments schema, this information can be obtained from another table.)
2. Half the departments are located in Manila, and the other half are in Nairobi. Department information, including that for employees in the department, is changed only at the site where the department is located, but such changes are infrequent. Finding the average salary for each department is a frequently asked query.
3. Half the departments are located in Manila, and the other half are in Nairobi. Employees tuples are frequently changed (only) at the site where the corresponding department is located, but the Departments relation is almost never changed. Finding a given employee’s manager is a frequently asked query.
4. Half the employees work in Manila, and the other half work in Nairobi. Employees tuples are frequently changed (only) at the site where they work.

Exercise 21.10 Suppose that the Employees relation is stored in Madison and the tuples with sal ≤ 100,000 are replicated at New York.
Consider the following three options for lock management: all locks managed at a single site, say, Milwaukee; primary copy with Madison being the primary for Employees; and fully distributed. For each of the lock management options, explain what locks are set (and at which site) for the following queries. Also state which site the page is read from.

1. A query submitted at Austin wants to read a page containing Employees tuples with sal ≤ 50,000.
2. A query submitted at Madison wants to read a page containing Employees tuples with sal ≤ 50,000.
3. A query submitted at New York wants to read a page containing Employees tuples with sal ≤ 50,000.

Exercise 21.11 Briefly answer the following questions:

1. Compare the relative merits of centralized and hierarchical deadlock detection in a distributed DBMS.
2. What is a phantom deadlock? Give an example.
3. Give an example of a distributed DBMS with three sites such that no two local waits-for graphs reveal a deadlock, yet there is a global deadlock.
4. Consider the following modification to a local waits-for graph: Add a new node T_ext, and for every transaction T_i that is waiting for a lock at another site, add the edge T_i → T_ext. Also add an edge T_ext → T_i if a transaction executing at another site is waiting for T_i to release a lock at this site.
(a) If there is a cycle in the modified local waits-for graph that does not involve T_ext, what can you conclude? If every cycle involves T_ext, what can you conclude?
(b) Suppose that every site is assigned a unique integer site-id. Whenever the local waits-for graph suggests that there might be a global deadlock, send the local waits-for graph to the site with the next higher site-id. At that site, combine the received graph with the local waits-for graph.
If this combined graph does not indicate a deadlock, ship it on to the next site, and so on, until either a deadlock is detected or we are back at the site that originated this round of deadlock detection. Is this scheme guaranteed to find a global deadlock if one exists?

Exercise 21.12 Timestamp-based concurrency control schemes can be used in a distributed DBMS, but we must be able to generate globally unique, monotonically increasing timestamps without a bias in favor of any one site. One approach is to assign timestamps at a single site. Another is to use the local clock time and to append the site-id. A third scheme is to use a counter at each site. Compare these three approaches.

Exercise 21.13 Consider the multiple-granularity locking protocol described in Chapter 18. In a distributed DBMS the site containing the root object in the hierarchy can become a bottleneck. You hire a database consultant who tells you to modify your protocol to allow only intention locks on the root, and to implicitly grant all possible intention locks to every transaction.

1. Explain why this modification works correctly, in that transactions continue to be able to set locks on desired parts of the hierarchy.
2. Explain how it reduces the demand upon the root.
3. Why isn’t this idea included as part of the standard multiple-granularity locking protocol for a centralized DBMS?

Exercise 21.14 Briefly answer the following questions:

1. Explain the need for a commit protocol in a distributed DBMS.
2. Describe 2PC. Be sure to explain the need for force-writes.
3. Why are ack messages required in 2PC?
4. What are the differences between 2PC and 2PC with Presumed Abort?
5. Give an example execution sequence such that 2PC and 2PC with Presumed Abort generate an identical sequence of actions.
6. Give an example execution sequence such that 2PC and 2PC with Presumed Abort generate different sequences of actions.
7. What is the intuition behind 3PC? What are its pros and cons relative to 2PC?
8. Suppose that a site does not get any response from another site for a long time. Can the first site tell whether the connecting link has failed or the other site has failed? How is such a failure handled?
9. Suppose that the coordinator includes a list of all subordinates in the prepare message. If the coordinator fails after sending out either an abort or commit message, can you suggest a way for active sites to terminate this transaction without waiting for the coordinator to recover? Assume that some but not all of the abort/commit messages from the coordinator are lost.
10. Suppose that 2PC with Presumed Abort is used as the commit protocol. Explain how the system recovers from failure and deals with a particular transaction T in each of the following cases:
(a) A subordinate site for T fails before receiving a prepare message.
(b) A subordinate site for T fails after receiving a prepare message but before making a decision.
(c) A subordinate site for T fails after receiving a prepare message and force-writing an abort log record but before responding to the prepare message.
(d) A subordinate site for T fails after receiving a prepare message and force-writing a prepare log record but before responding to the prepare message.
(e) A subordinate site for T fails after receiving a prepare message, force-writing an abort log record, and sending a no vote.
(f) The coordinator site for T fails before sending a prepare message.
(g) The coordinator site for T fails after sending a prepare message but before collecting all votes.
(h) The coordinator site for T fails after writing an abort log record but before sending any further messages to its subordinates.
(i) The coordinator site for T fails after writing a commit log record but before sending any further messages to its subordinates.
(j) The coordinator site for T fails after writing an end log record.
Is it possible for the recovery process to receive an inquiry about the status of T from a subordinate?

Exercise 21.15 Consider a heterogeneous distributed DBMS.

1. Define the terms multidatabase system and gateway.
2. Describe how queries that span multiple sites are executed in a multidatabase system. Explain the role of the gateway with respect to catalog interfaces, query optimization, and query execution.
3. Describe how transactions that update data at multiple sites are executed in a multidatabase system. Explain the role of the gateway with respect to lock management, distributed deadlock detection, Two-Phase Commit, and recovery.
4. Schemas at different sites in a multidatabase system are probably designed independently. This situation can lead to semantic heterogeneity; that is, units of measure may differ across sites (e.g., inches versus centimeters), relations containing essentially the same kind of information (e.g., employee salaries and ages) may have slightly different schemas, and so on. What impact does this heterogeneity have on the end user? In particular, comment on the concept of distributed data independence in such a system.

BIBLIOGRAPHIC NOTES

Work on parallel algorithms for sorting and various relational operations is discussed in the bibliographies for Chapters 11 and 12. Our discussion of parallel joins follows [185], and our discussion of parallel sorting follows [188]. [186] makes the case that for future high performance database systems, parallelism will be the key. Scheduling in parallel database systems is discussed in [454]. [431] contains a good collection of papers on query processing in parallel database systems.

Textbook discussions of distributed databases include [65, 123, 505]. Good survey articles include [72], which focuses on concurrency control; [555], which is about distributed databases in general; and [689], which concentrates on distributed query processing.
Two major projects in the area were SDD-1 [554] and R* [682]. Fragmentation in distributed databases is considered in [134, 173]. Replication is considered in [8, 10, 116, 202, 201, 328, 325, 285, 481, 523]. For good overviews of current trends in asynchronous replication, see [197, 620, 677]. Papers on view maintenance mentioned in the bibliography of Chapter 17 are also relevant in this context.

Query processing in the SDD-1 distributed database is described in [75]. One of the notable aspects of SDD-1 query processing was the extensive use of Semijoins. Theoretical studies of Semijoins are presented in [70, 73, 354]. Query processing in R* is described in [580]. The R* query optimizer is validated in [435]; much of our discussion of distributed query processing is drawn from the results reported in this paper. Query processing in Distributed Ingres is described in [210]. Optimization of queries for parallel execution is discussed in [255, 274, 323]. [243] discusses the trade-offs between query shipping, the more traditional approach in relational databases, and data shipping, which consists of shipping data to the client for processing and is widely used in object-oriented systems.

Concurrency control in the SDD-1 distributed database is described in [78]. Transaction management in R* is described in [476]. Concurrency control in Distributed Ingres is described in [625]. [649] provides an introduction to distributed transaction management and various notions of distributed data independence. Optimizations for read-only transactions are discussed in [261]. Multiversion concurrency control algorithms based on timestamps were proposed in [540]. Timestamp-based concurrency control is discussed in [71, 301]. Concurrency control algorithms based on voting are discussed in [259, 270, 347, 390, 643]. The rotating primary copy scheme is described in [467].
Optimistic concurrency control in distributed databases is discussed in [574], and adaptive concurrency control is discussed in [423].

Two-Phase Commit was introduced in [403, 281]. 2PC with Presumed Abort is described in [475], along with an alternative called 2PC with Presumed Commit. A variation of Presumed Commit is proposed in [402]. Three-Phase Commit is described in [603].

The deadlock detection algorithms in R* are described in [496]. Many papers discuss deadlocks, for example, [133, 206, 456, 550]. [380] is a survey of several algorithms in this area. Distributed clock synchronization is discussed by [401]. [283] argues that distributed data independence is not always a good idea, due to processing and administrative overheads. The ARIES algorithm is applicable for distributed recovery, but the details of how messages should be handled are not discussed in [473]. The approach taken to recovery in SDD-1 is described in [36]. [97] also addresses distributed recovery. [383] is a survey article that discusses concurrency control and recovery in distributed systems. [82] contains several articles on these topics.

Multidatabase systems are discussed in [7, 96, 193, 194, 205, 412, 420, 451, 452, 522, 558, 672, 697]; see [95, 421, 595] for surveys.

22 INTERNET DATABASES

He profits most who serves best.
- Motto for Rotary International

The proliferation of computer networks, including the Internet and corporate ‘intranets,’ has enabled users to access a large number of data sources. This increased access to databases is likely to have a great practical impact; data and services can now be offered directly to customers in ways that were impossible until recently. Electronic commerce applications cover a broad spectrum; examples include purchasing books through a Web retailer such as Amazon.com, engaging in online auctions at a site such as eBay, and exchanging bids and specifications for products between companies.
The emergence of standards such as XML for describing content (in addition to the presentation aspects) of documents is likely to further accelerate the use of the Web for electronic commerce applications.

While the first generation of Internet sites were collections of HTML files (HTML is a standard for describing how a file should be displayed), most major sites today store a large part (if not all) of their data in database systems. They rely upon DBMSs to provide fast, reliable responses to user requests received over the Internet; this is especially true of sites for electronic commerce. This unprecedented access will lead to increased and novel demands upon DBMS technology. The impact of the Web on DBMSs, however, goes beyond just a new source of large numbers of concurrent queries: The presence of large collections of unstructured text documents and partially structured HTML and XML documents and new kinds of queries such as keyword search challenge DBMSs to significantly expand the data management features they support. In this chapter, we discuss the role of DBMSs in the Internet environment and the new challenges that arise.

We introduce the World Wide Web, Web browsers, Web servers, and the HTML markup language in Section 22.1. In Section 22.2, we discuss alternative architectures for making databases accessible through the Web. We discuss XML, an emerging standard for document description that is likely to supersede HTML, in Section 22.3. Given the proliferation of text documents on the Web, searching them for user-specified keywords is an important new query type. Boolean keyword searches ask for documents containing a specified boolean combination of keywords. Ranked keyword searches ask for documents that are most relevant to a given list of keywords. We consider indexing techniques to support boolean keyword searches in Section 22.4 and techniques to support ranked keyword searches in Section 22.5.
22.1 THE WORLD WIDE WEB

The Web makes it possible to access a file anywhere on the Internet. A file is identified by a universal resource locator (URL):

http://www.informatik.uni-trier.de/~ley/db/index.html

This URL identifies a file called index.html, stored in the directory ~ley/db/ on machine www.informatik.uni-trier.de. This file is a document formatted using HyperText Markup Language (HTML) and contains several links to other files (identified through their URLs).

The formatting commands are interpreted by a Web browser such as Microsoft’s Internet Explorer or Netscape Navigator to display the document in an attractive manner, and the user can then navigate to other related documents by choosing links. A collection of such documents is called a Web site and is managed using a program called a Web server, which accepts URLs and returns the corresponding documents. Many organizations today maintain a Web site. (Incidentally, the URL shown above is the entry point to Michael Ley’s Databases and Logic Programming (DBLP) Web site, which contains information on database and logic programming research publications. It is an invaluable resource for students and researchers in these areas.) The World Wide Web, or Web, is the collection of Web sites that are accessible over the Internet.

An HTML link contains a URL, which identifies the site containing the linked file. When a user clicks on a link, the Web browser connects to the Web server at the destination Web site using a connection protocol called HTTP and submits the link’s URL. When the browser receives a file from a Web server, it checks the file type by examining the extension of the file name. It displays the file according to the file’s type and if necessary calls an application program to handle the file. For example, a file ending in .txt denotes an unformatted text file, which the Web browser displays by interpreting the individual ASCII characters in the file.
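The way a URL breaks down into a protocol, a machine, a directory, and a file name, and the extension-based choice of how to display the file, can be sketched in a few lines of Python using the standard library. The helper names describe_url and choose_handler and the HANDLERS table are hypothetical, introduced only for illustration; the dispatch mirrors the extension-based description in the text rather than the behavior of any particular browser.

```python
import posixpath
from urllib.parse import urlparse


def describe_url(url: str) -> dict:
    """Split a URL into the pieces named in the text."""
    parts = urlparse(url)
    directory, filename = posixpath.split(parts.path)
    _, extension = posixpath.splitext(filename)
    return {
        "protocol": parts.scheme,
        "machine": parts.netloc,
        "directory": directory + "/",
        "file": filename,
        "extension": extension,
    }


# Hypothetical mapping from file extension to the action a browser might take.
HANDLERS = {
    ".html": "render as HTML",
    ".txt": "display as plain text",
}


def choose_handler(url: str) -> str:
    """Pick an action from the file extension; unknown types go to a helper program."""
    extension = describe_url(url)["extension"]
    return HANDLERS.get(extension, "hand off to a helper application")


info = describe_url("http://www.informatik.uni-trier.de/~ley/db/index.html")
print(info)
print(choose_handler("http://www.informatik.uni-trier.de/~ley/db/index.html"))
```

Any extension missing from the table falls through to the helper-application case, which corresponds to the browser calling an application program to handle the file.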
More sophisticated document structures can be encoded in HTML, which has become a standard way of structuring Web pages for display. As another example, a file ending in .doc denotes a Microsoft Word document and the Web browser displays the file by invoking Microsoft Word.

22.1.1 Introduction to HTML

HTML is a simple language used to describe a document. It is also called a markup language because HTML works by augmenting regular text with ‘marks’ that hold special meaning for a Web browser handling the document. Commands in the language are called tags and they consist (usually) of a start tag and an end tag of the form <TAG> and </TAG>, respectively. For example, consider the HTML fragment shown in Figure 22.1. It describes a Web page that shows a list of books. The document is enclosed by the tags <HTML> and </HTML>, marking it as an HTML document. The remainder of the document, enclosed in <BODY> </BODY>, contains information about three books. Data about each book is represented as an unordered list (UL) whose entries are marked with the LI tag. HTML defines the set of valid tags as well as the meaning of the tags. For example, HTML specifies that the tag <TITLE> is a valid tag that denotes the title of the document. As another example, the tag <UL> always denotes an unordered list.

<HTML>
<HEAD></HEAD>
<BODY>
Science:
<UL>
    <LI>Author: Richard Feynman</LI>
    <LI>Title: The Character of Physical Law</LI>
    <LI>Published 1980</LI>
    <LI>Hardcover</LI>
</UL>
Fiction:
<UL>
    <LI>Author: R.K. Narayan</LI>
    <LI>Title: Waiting for the Mahatma</LI>
    <LI>Published 1981</LI>
</UL>
<UL>
    <LI>Name: R.K. Narayan</LI>
    <LI>Title: The English Teacher</LI>
    <LI>Published 1980</LI>
    <LI>Paperback</LI>
</UL>
</BODY>
</HTML>

Figure 22.1 Book Listing in HTML

Audio, video, and even programs (written in Java, a highly portable language) can be included in HTML documents.
When a user retrieves such a document using a suitable browser, images in the document are displayed, audio and video clips are played, and embedded programs are executed at the user’s machine; the result is a rich multimedia presentation. The ease with which HTML documents can be created [...]...

Figure 22.6 as input, this query produces the following result:

Commercial database systems and XML: Many relational and object-relational database system vendors are currently looking into support for XML in their database engines. Several vendors of object-oriented database management systems already offer database engines that can store XML data whose contents can be accessed...

information retrieval, which is closely related to database management. Information retrieval systems, like database systems, have the goal of enabling users to query a large volume of data, but the focus has been on large collections of unstructured documents. Updates, concurrency control, and recovery have traditionally not been addressed in information retrieval systems because the data in typical applications...

sources: Most companies have data in many different database systems, from legacy systems to modern object-relational systems. Electronic commerce applications require integrated access to all these data sources.

Transactions involving several data sources: In electronic commerce applications, a user transaction might involve updates at several data sources. An

#!/usr/bin/perl
use DBI;
use CGI;...

as the Web server. The use of a Web browser to invoke a program at a remote site leads us to the role of databases on the Web: The invoked program can generate a request to a database system. This capability allows us to easily place a database on a computer network, and make services that rely upon database access available over the Web. This leads to a new and rapidly growing source of concurrent requests...
data sources including flat files and legacy systems; a structured data model is often too rigid. Third, we cannot query a structured database without knowing the schema, but sometimes we want to query the data without full knowledge of the schema. For example, we cannot express the query “Where in the database can we find the string Malgudi?” in a relational database system without knowing the schema. All...

[Figure 22.8: The Semistructured Data Model (a graph of BOOK nodes with edges labeled AUTHOR, FIRST NAME, LAST NAME, TITLE, and PUBLISHED)]

and edges correspond to attributes. There is no separate schema and no auxiliary description; the data in the graph is self describing. For example, consider the graph shown in Figure 22.8, which represents part of the XML data...

DBMS. It has the potential to make database systems more tightly integrated into Web applications than ever before. XML emerged from the confluence of two technologies, SGML and HTML. The Standard Generalized Markup Language (SGML) is a metalanguage that allows the definition of data and document interchange languages such as HTML. The SGML standard was published in 1988 and many organizations that manage...

research tries to find answers to these questions.

22.4 INDEXING FOR TEXT SEARCH

In this section, we assume that our database is a collection of documents and we call such a database a text database. For simplicity, we assume that the database contains exactly one relation and that the relation schema has exactly one field of type document. Thus, each record in the relation contains exactly one document. In...
documents, we have the opportunity to use a high-level language that exploits this structure to conveniently retrieve data from within such documents. Such a language would bring XML data management much closer to database management than the text-oriented paradigm of HTML documents. Such a language would also allow us to easily translate XML data between different DTDs, as is required for integrating data...

scales less well to larger database sizes because the index has...

Rid   Document                Signature
1     agent James Bond        1100
2     agent mobile computer   1101
3     James Madison movie     1011
4     James Bond movie        1110

Word       Inverted list   Hash
agent      1, 2            1000
Bond       1, 4            0100
computer   2               0100
James      1, 3, 4         1000
Madison    3               0001
mobile     2               0001
movie      3, 4            0010

Figure 22.9 A Text Database with Four Records and Indexes
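The two indexes of Figure 22.9 can be rebuilt from the four documents with a short Python sketch. The documents and the per-word hash values are copied from the figure; the function names are hypothetical, and the construction shown (one inverted list of record ids per word, and a signature formed by OR-ing the hash values of a record's words) is a minimal illustration of these techniques under those assumptions, not necessarily the exact procedure of the surrounding text.

```python
from collections import defaultdict

# Records and word hash values as listed in Figure 22.9.
DOCUMENTS = {
    1: "agent James Bond",
    2: "agent mobile computer",
    3: "James Madison movie",
    4: "James Bond movie",
}
WORD_HASH = {
    "agent": 0b1000, "Bond": 0b0100, "computer": 0b0100, "James": 0b1000,
    "Madison": 0b0001, "mobile": 0b0001, "movie": 0b0010,
}


def build_inverted_index(documents):
    """Map each word to the sorted list of record ids that contain it."""
    index = defaultdict(list)
    for rid in sorted(documents):
        for word in set(documents[rid].split()):
            index[word].append(rid)
    return dict(index)


def boolean_and(index, words):
    """Record ids containing every query word (intersection of inverted lists)."""
    postings = [set(index.get(word, ())) for word in words]
    return sorted(set.intersection(*postings)) if postings else []


def signature(words):
    """OR together the hash values of the given words."""
    sig = 0
    for word in words:
        sig |= WORD_HASH[word]
    return sig


def signature_candidates(documents, words):
    """Records whose signature covers the query signature; may include false positives."""
    query_sig = signature(words)
    return [rid for rid in sorted(documents)
            if (signature(documents[rid].split()) & query_sig) == query_sig]


index = build_inverted_index(DOCUMENTS)
print(index["James"], boolean_and(index, ["James", "Bond"]))
print({rid: format(signature(text.split()), "04b") for rid, text in DOCUMENTS.items()})
print(signature_candidates(DOCUMENTS, ["computer"]))
```

Because computer and Bond share the hash value 0100, the signature test for the single word computer also selects records 1 and 4, which do not contain it; candidates from a signature match therefore have to be checked against the actual documents, whereas the inverted list for computer names record 2 directly.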