Advanced Database Technology and Design, Part 7

local server, the waits-for graph is fetched from any remote sites where other parts of the transactions involved in the cycle might be executing. In our example, the transaction fragments for T1.2 and T2.1 executing at S1 would be involved in a cycle including an EXT node. This causes S1 to contact S2, from which the waits-for graph for T1.1 and T2.2 can be fetched.

9.5.2 Distributed Commit

Once a transaction has completed all its operations, the ACID properties require that it be made durable when it commits. In a centralized system, ensuring that the commit is atomic amounts to ensuring that all updates are written to log files and that the disk write that marks the transaction as complete is itself atomic. In distributed transactions, there is the additional requirement for a protocol that ensures that all the servers involved in the transaction agree either to all commit or to all abort.

The basic framework for building such protocols involves a coordinator process in the GTM for each set of server processes in the LTMs that are executing the transaction. At some point, the coordinator decides that the transaction should be concluded and performs the following steps:

1. Ask all servers to vote on whether they are able to commit the transaction.
2. The servers may vote to commit or to abort.
3. The coordinator commits the transaction only if all servers vote to commit.

We next give two protocols that implement this voting procedure in a manner that is (to varying extents) tolerant of failures of the servers and the coordinator.

9.5.2.1 Two-Phase Commit

The most common protocol for ensuring atomic commitment is two-phase commit (2PC) [14-16], which has been implemented in commercial DBMSs such as Sybase [17] and Oracle [18]. It is such a common protocol that its messages make up the OSI application-layer Commitment, Concurrency, and Recovery (CCR) protocol.
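As an illustration only, the voting procedure above can be simulated in the form taken by 2PC, using the CCR message names that appear in the phase descriptions. The Python sketch below assumes that direct method calls stand in for network messages and that a server that does not answer (modeled by listing it in `unreachable`) is treated as a refusal once the coordinator times out. The names `Msg`, `Server`, and `two_phase_commit` are hypothetical, not part of any DBMS API, and failure of the coordinator itself is not modeled.

```python
from enum import Enum
from typing import Iterable, List


class Msg(Enum):
    """CCR message names used in the text (values are the wire names)."""
    C_PREPARE = "C-PREPARE"
    C_READY = "C-READY"
    C_REFUSE = "C-REFUSE"
    C_COMMIT = "C-COMMIT"
    C_ROLLBACK = "C-ROLLBACK"


class Server:
    """Hypothetical participant whose ability to commit is fixed for the demo."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def on_prepare(self) -> Msg:
        # Phase 1 vote. After replying C-READY the server may not abort on
        # its own; it must wait for the coordinator's decision.
        if self.can_commit:
            self.state = "ready"
            return Msg.C_READY
        self.state = "aborted"  # a refusing server may abort immediately
        return Msg.C_REFUSE

    def on_decision(self, decision: Msg) -> None:
        # Phase 2: apply C-COMMIT or C-ROLLBACK as instructed.
        self.state = "committed" if decision is Msg.C_COMMIT else "aborted"


def two_phase_commit(servers: List[Server], unreachable: Iterable[str] = ()) -> Msg:
    """Coordinator side: commit only if every server votes C-READY."""
    down = set(unreachable)
    # Phase 1: send C-PREPARE; a missing reply (timeout) counts as C-REFUSE.
    votes = [Msg.C_REFUSE if s.name in down else s.on_prepare() for s in servers]
    decision = (Msg.C_COMMIT if all(v is Msg.C_READY for v in votes)
                else Msg.C_ROLLBACK)
    # Phase 2: broadcast the decision to every server that can be reached.
    for s in servers:
        if s.name not in down:
            s.on_decision(decision)
    return decision
```

A server that has voted C-READY stays in the state "ready" until `on_decision` runs; if the coordinator crashed between the two phases, that is precisely the blocked state discussed below.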
We use the CCR messages to describe the execution of 2PC, which involves the following two phases:

• Phase 1: The coordinator transmits the message C-PREPARE to all servers, informing them that the transaction should now commit. A server replies C-READY if it is ready to commit. After that point, it cannot abort the transaction unless instructed to do so by the coordinator. Alternatively, a server replies C-REFUSE if it is unable to commit.
• Phase 2: If the coordinator receives C-READY from all servers, it transmits C-COMMIT to all servers. Each server commits on receiving this message. If the coordinator receives C-REFUSE from any server, it transmits C-ROLLBACK to all servers. Each server aborts on receiving this message.

Provided none of the servers crashes and there are no network errors, 2PC provides a reliable and robust distributed atomic commitment protocol. However, failures must be taken into account, which calls for a termination protocol to deal with situations in which the atomic commitment protocol cannot be followed to completion. Some failures are easily handled by associating timeouts with communication. For example, if the coordinator receives no reply from a failed server, it might decide to abort the transaction using C-ROLLBACK. Alternatively, the coordinator may fail after asking for a vote, in which case all the servers will time out and then contact each other to elect a new coordinator and continue with the transaction.

For some failures, however, the protocol has a weakness in that a server may become blocked. That occurs after a server has sent a C-READY reply, which entails that it must commit if and when it receives C-COMMIT. In this circumstance, two failures can occur that require contradictory action by the server:

• Immediately after sending C-PREPARE, the coordinator might have crashed. One other server might have replied C-REFUSE and aborted its transaction.
  If so, it would be correct for the server to abort its transaction, even after sending C-READY.
• The coordinator might have sent C-COMMIT to all other servers and then crashed. Those other servers might have committed their transactions and then also crashed. It would then be correct for the server to commit its transaction.

For the server that issued C-READY and has received no reply, these two situations are indistinguishable: the server is unable to get information from the coordinator. Thus the server is blocked, unable either to commit or to abort the transaction; hence, it must hold all the locks associated with the transaction indefinitely.

What makes 2PC block is that once a server has voted to take part in a commit, it does not know the result of the vote until the command to commit arrives. By the time it has timed out without receiving the result, all the other servers might have failed and so cannot be contacted. Eventually one of the failed servers will execute a recovery and be able to inform the blocked server of the result, but that may take a long time. It can be argued that such scenarios are unlikely in practice; indeed, 2PC has been used successfully in commercial systems. However, environments that require greater fault tolerance call for protocols that do not block.

9.5.2.2 Three-Phase Commit

2PC can be made nonblocking by introducing an extra phase that obtains and distributes the result of the vote before sending out the command to commit. That requires extending the OSI CCR protocol with the message types C-PRECOMMIT and C-PRECOMMIT-ACK, which inform servers of the result of a vote separately from issuing the command to commit or roll back a transaction. The steps in such a three-phase commit (3PC) [14, 15, 19] protocol are the following:

• Phase 1: The same as for 2PC.
• Phase 2: If the coordinator receives C-READY from all servers, it transmits C-PRECOMMIT to all servers. Each server replies with a C-PRECOMMIT-ACK. If the coordinator receives C-REFUSE from any server, it transmits C-ROLLBACK to all servers. Each server aborts on receiving that message.
• Phase 3: If the coordinator receives a C-PRECOMMIT-ACK from all servers, it transmits C-COMMIT to all servers. If the coordinator is missing a C-PRECOMMIT-ACK, it transmits C-ROLLBACK to all servers.

Because the result of the vote is distributed to all servers, and its arrival at those servers is confirmed, before the coordinator issues a command to commit, any server that is missing the C-PRECOMMIT will time out and contact some other server to find out the result of the vote, and hence be in a position to commit. If all other servers have failed, the server can abort, because none of the other servers could have committed before the timeout occurred. If a server has received a C-COMMIT, it can safely commit, knowing that all the other failed servers can later recover and determine the result of the vote.

3PC is a nonblocking protocol that ensures that all nonfailed sites can agree on a decision for transaction termination in the absence of communication failures. It achieves that by introducing an extra delay and an extra set of messages to be exchanged; hence, it performs worse than 2PC. Also, the number of different states that may arise in 3PC is much larger than in 2PC. For that reason, implementations of 3PC are more difficult to produce and to verify as correct.

9.5.3 Distributed Recovery

To a large extent, each LTM in the DDB will be able to use standard techniques based on redo/undo logs [15] to recover from system crashes by rolling back or committing transactions. As in a centralized system, the recovery process should be executed each time a server is restarted after a crash.
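The local part of this recovery can be sketched as a simple undo/redo restart procedure. The Python below is a minimal sketch under assumptions not stated in the text: updates are made in place, each write record carries the item's before-image and after-image, and strict 2PL prevents any transaction from overwriting an uncommitted value. The `LogRecord` layout and the function `recover` are hypothetical; real recovery managers also use checkpoints to bound the log scan.

```python
from typing import Dict, List, NamedTuple, Optional


class LogRecord(NamedTuple):
    """Hypothetical log record: kind is "write", "commit", or "abort"."""
    kind: str
    txn: str
    item: Optional[str] = None
    before: Optional[int] = None  # before-image, used by undo
    after: Optional[int] = None   # after-image, used by redo


def recover(log: List[LogRecord], db: Dict[str, int]) -> Dict[str, int]:
    """Restart recovery: undo non-committed transactions, then redo committed ones."""
    committed = {r.txn for r in log if r.kind == "commit"}
    # Undo pass, scanning backward: restore before-images written by
    # transactions with no commit record (in flight at the crash, or aborted).
    for r in reversed(log):
        if r.kind == "write" and r.txn not in committed:
            db[r.item] = r.before
    # Redo pass, scanning forward: reapply after-images of committed
    # transactions whose updates may not have reached disk before the crash.
    for r in log:
        if r.kind == "write" and r.txn in committed:
            db[r.item] = r.after
    return db
```

Undoing before redoing matters when an aborted transaction and a later committed one wrote the same item: running the passes in the other order would let the stale before-image overwrite the committed value.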
In a DDB, extra complexity is introduced by the fact that a distributed commit decision has to be made, and failures might occur during the execution of the atomic commitment protocol. A full analysis of how 2PC and 3PC alter the recovery process is given in [20], but in overview the extra complexity arises because other sites might need to be contacted during recovery to determine what action should be taken. For example, in 2PC, a server might fail after having issued a C-READY. During recovery, it should contact the coordinator to determine the result of the vote, so that it knows whether to use the undo log to roll back the transaction (if the decision was C-ROLLBACK) or simply to mark the transaction as complete in the local logs (if the decision was C-COMMIT).

9.5.4 Transaction Management in Heterogeneous DDBs

Recall that a heterogeneous DDB consists of several autonomous local DB systems. There is thus a basic contradiction in executing global transactions over a heterogeneous DDB: the GTM needs to exercise some degree of control over the LTMs to guarantee the ACID properties of global transactions, thereby violating the autonomy of the local DB systems. For example, if one local DB server decides to roll back a subtransaction of a global transaction, the other servers participating in the execution of the transaction must also roll back their subtransactions, thereby violating their autonomy. A further violation of local autonomy occurs if standard techniques such as 2PL are used to guarantee the serializability of global transactions, since this requires the LTMs of the various servers to export some of their transaction management capabilities through an external interface. In particular, the GTM needs access to the lock records, deadlock waits-for graph, and atomic commitment protocol of each LTM.
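Why the GTM needs each LTM's waits-for graph can be illustrated with global deadlock detection (Section 9.5.1). The Python sketch below assumes a simplified, centralized scheme: each site exports its local waits-for graph as a mapping from a transaction to the set of transactions it waits for, and the GTM merges the graphs and searches for a cycle by depth-first search. The transaction and site names and the functions `merge_graphs` and `find_cycle` are hypothetical; the scheme of Section 9.5.1, which fetches remote graphs only when a local cycle passes through an EXT node, is not reproduced exactly.

```python
from typing import Dict, Iterable, List, Optional, Set

WaitsFor = Dict[str, Set[str]]  # T -> transactions whose locks T is waiting for


def merge_graphs(graphs: Iterable[WaitsFor]) -> WaitsFor:
    """Union the waits-for graphs exported by the individual sites."""
    merged: WaitsFor = {}
    for graph in graphs:
        for txn, holders in graph.items():
            merged.setdefault(txn, set()).update(holders)
    return merged


def find_cycle(graph: WaitsFor) -> Optional[List[str]]:
    """Return one cycle as a list of transactions (first == last), or None."""
    on_path: List[str] = []  # transactions on the current DFS path
    done: Set[str] = set()   # transactions already fully explored

    def visit(txn: str) -> Optional[List[str]]:
        on_path.append(txn)
        for holder in sorted(graph.get(txn, ())):
            if holder in on_path:  # back edge: a waits-for cycle
                return on_path[on_path.index(holder):] + [holder]
            if holder not in done:
                cycle = visit(holder)
                if cycle is not None:
                    return cycle
        on_path.pop()
        done.add(txn)
        return None

    for txn in sorted(graph):
        if txn not in done:
            cycle = visit(txn)
            if cycle is not None:
                return cycle
    return None
```

With `{"T1": {"T2"}}` exported by one site and `{"T2": {"T1"}}` by another, neither local graph contains a cycle, but the merged graph yields `["T1", "T2", "T1"]`; detecting such a deadlock is impossible without access to the other site's graph.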
A further complication is that different servers may support different atomic commitment protocols (some may use 2PC, others 3PC) or different concurrency control methods (some may use 2PL, others timestamping), or that some servers may allow nonserializable execution of transactions. Coordinating such disparate functionality to achieve global ACID properties can be prohibitively complex. These problems have led researchers to suggest that the serializability requirement be relaxed for heterogeneous DDBs by adopting different transaction models, such as workflow models. The next section briefly discusses alternative transaction models.

9.6 Current Trends and Challenges

9.6.1 Alternative Transaction Models

Conventional transaction models may be inadequate in distributed environments for two main reasons. First, there is the loss of autonomy of the local DBs. Second, local resources, such as locks, are tied up at sites that participate in the execution of long-running global transactions. One solution to those problems is to relax the serializability requirement, which has led to the development of several alternative transaction models.

One approach is the use of sagas [21] rather than serializable global transactions. A saga consists of a sequence of local subtransactions $t_1; t_2; \ldots; t_n$ such that for each $t_i$ it is possible to define a compensating transaction $t_i^{-1}$ that undoes its effects. After a local subtransaction commits, it releases its locks. If the overall saga later needs to be aborted, the compensating transactions of all subtransactions that have already committed can be executed. However, notice that sagas can see the changes of other concurrently executing sagas that may later abort, thereby violating the isolation property. This loss of isolation needs to be taken into account by applications, and any data dependencies between different sagas need to be tested for explicitly.
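The saga rule can be sketched in a few lines of Python. Each step pairs a local subtransaction with its compensating transaction; the sketch assumes that a failing subtransaction raises an exception and has already rolled itself back locally, so only the subtransactions that committed earlier are compensated, most recent first. The function `run_saga` is a hypothetical name, not taken from [21].

```python
from typing import Callable, List, Tuple

Action = Callable[[], None]


def run_saga(steps: List[Tuple[Action, Action]]) -> bool:
    """Run (subtransaction, compensation) pairs; return True if the saga completes.

    Each subtransaction is assumed to commit locally (releasing its locks)
    when it returns, and to roll itself back locally if it raises.
    """
    compensations: List[Action] = []
    for subtransaction, compensate in steps:
        try:
            subtransaction()
        except Exception:
            # Abort the saga: undo the effects of every committed step,
            # in reverse order of execution.
            for undo in reversed(compensations):
                undo()
            return False
        compensations.append(compensate)
    return True
```

For a hypothetical travel booking whose third step (renting a car) fails, the flight and hotel bookings have already committed, so the sketch runs "cancel hotel" and then "cancel flight". Nothing stops another saga from reading the booked seat before it is cancelled, which is exactly the loss of isolation noted above.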
Another approach is workflow models [22]. Workflows relax the atomicity requirement of conventional transaction models in that it may be possible for one or more tasks of a workflow to fail without the entire workflow failing. The workflow designer can specify the scheduling dependencies between the tasks making up a workflow and the permissible termination states of the workflow. The workflow management system automatically guarantees that the workflow execution terminates in one of these states.

9.6.2 Mediator Architectures

The mediator approach [23] to DB integration is a development of the five-level model in Figure 9.2. With the mediator approach, the export schemas are replaced by wrappers, which include more functionality, such as locking for concurrency control. The semantic integration of export schemas into global schemas is replaced by mediators. Apart from sourcing information from wrappers, mediators can contact other mediators and provide some intelligence, which allows negotiation between mediators to occur.

In the mediator approach, the DDB is constructed in a top-down manner. A global schema is first created for a particular application. The application then requests one or more mediators to source the information in that global schema. The mediators use their knowledge to source data from the correct information wrappers. Note the use of the term "information" here rather than "data." An advantage of the mediator approach is that semistructured data (such as Web documents) can be accessed by the mediators, as well as structured DBs. A second advantage is that changes to the structure of the information sources do not always require the mediators to be reconfigured. There have been a number of research implementations of the mediator approach.
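The top-down style can be illustrated with a small Python sketch. The application fixes a global schema (here the hypothetical attributes `name` and `department`); a mediator then sources tuples for that schema from two hypothetical wrappers, one over relational rows and one over semistructured "Name: ... / Dept: ..." text, and removes duplicates. All class names and data are invented for illustration. The sketch omits wrapper-level locking and negotiation between mediators, except that a mediator exposes the same `fetch` method as a wrapper, so one mediator can use another as a source.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Protocol, Sequence, Tuple

Record = Dict[str, Optional[str]]


class Source(Protocol):
    """Anything a mediator can source information from."""
    def fetch(self) -> Iterable[Record]: ...


class RelationalWrapper:
    """Hypothetical wrapper mapping (emp_name, dept) rows to global attributes."""
    def __init__(self, rows: Sequence[Tuple[str, str]]) -> None:
        self.rows = rows

    def fetch(self) -> Iterator[Record]:
        for emp_name, dept in self.rows:
            yield {"name": emp_name, "department": dept}


class TextPageWrapper:
    """Hypothetical wrapper over semistructured 'Key: value' blocks."""
    def __init__(self, text: str) -> None:
        self.text = text

    def fetch(self) -> Iterator[Record]:
        for block in self.text.strip().split("\n\n"):
            fields = dict(line.split(": ", 1) for line in block.splitlines() if ": " in line)
            yield {"name": fields.get("Name"), "department": fields.get("Dept")}


class Mediator:
    """Sources tuples for an application-defined global schema."""
    def __init__(self, global_schema: Sequence[str], sources: Iterable[Source]) -> None:
        self.global_schema = tuple(global_schema)
        self.sources = list(sources)

    def query(self, **conditions: str) -> List[Record]:
        seen = set()
        results: List[Record] = []
        for source in self.sources:
            for rec in source.fetch():
                if any(rec.get(a) is None for a in self.global_schema):
                    continue  # this record cannot supply every global attribute
                if any(rec.get(k) != v for k, v in conditions.items()):
                    continue
                row = tuple(rec[a] for a in self.global_schema)
                if row not in seen:  # the same fact may come from several sources
                    seen.add(row)
                    results.append(dict(zip(self.global_schema, row)))
        return results

    def fetch(self) -> List[Record]:
        # Same interface as a wrapper, so mediators can be stacked.
        return self.query()
```

Adding a third source only requires another wrapper that yields records in terms of the global attributes; the `Mediator` class itself is unchanged, which loosely mirrors the second advantage claimed for the approach.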
In the intelligent integration of information (I3) architecture [24], the basic notion of a mediator as the middle layer between applications and information sources is augmented with three additional components: facilitators, which search for likely sources of information and detect how those sources might be accessed; query processors, which reformulate queries to improve the chance of a query being answered successfully from the available sources of information; and data miners, which search the information sources for unexpected properties.

The knowledge reuse and fusion/transformation (KRAFT) architecture [25] extends the notion of wrappers so that information sources may initiate requests for information as well as service them. The middle layer between applications and wrappers is termed the KRAFT domain, where messages can be exchanged between applications and wrappers. Facilitators route those messages, and mediators process operations based on the messages.

9.6.3 Databases and the World Wide Web

The World Wide Web is based on the notion of browsers on client machines fetching documents formatted in hypertext markup language (HTML) from servers, using a protocol called hypertext transfer protocol (HTTP). A DB can be statically connected to the World Wide Web by using an application on the server to read information from the DB and format the results in HTML. A DB can also be connected dynamically to the World Wide Web by allowing requests from clients to cause the server to generate HTML documents from the DB. In both cases, the structure of the DB is lost, in that there is no standard method for describing the schema of HTML documents. They contain just the data with some formatting instructions; for that reason, HTML documents generally are referred to as semistructured data.
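The dynamic connection can be sketched with Python's standard `sqlite3` and `html` modules. A request handler (not shown) would run a query and format the result as an HTML table; the table, the data, and the function `rows_to_html` are hypothetical. The output illustrates the loss of structure described above: column names survive only as header text, and the INTEGER type of `salary` is reduced to characters inside a `<td>` element.

```python
import html
import sqlite3


def rows_to_html(conn: sqlite3.Connection, query: str) -> str:
    """Run a query and render the result set as an HTML table."""
    cursor = conn.execute(query)
    headers = [column[0] for column in cursor.description]
    lines = ["<table>"]
    lines.append("<tr>" + "".join(f"<th>{html.escape(h)}</th>" for h in headers) + "</tr>")
    for row in cursor:
        # Every value, whatever its SQL type, becomes escaped text.
        cells = "".join(f"<td>{html.escape(str(value))}</td>" for value in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
    conn.execute("INSERT INTO employee VALUES (?, ?)", ("Ada", 100))
    print(rows_to_html(conn, "SELECT name, salary FROM employee"))
```

A static connection would run the same function once and store the result as a file on the server; a dynamic connection calls it on every client request, so the page always reflects the current DB contents.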
The focus of some more recent work has been on methods by which a schema can be extracted from semistructured data (e.g., [19]) and data can be extracted from HTML (e.g., [26]). The introduction of the Extensible Markup Language (XML) allows a much richer range of types to be associated with values in World Wide Web documents, since an XML definition can be regarded as a kind of DB schema. An interesting area of future work will be methods to query and integrate XML documents from diverse sources.

References

[1] Sheth, A., and J. Larson, "Federated Database Systems," ACM Computing Surveys, Vol. 22, No. 3, 1990, pp. 183-236.
[2] Tsichritzis, D., and A. Klug, "The ANSI/X3/SPARC DBMS Framework," Information Systems, Vol. 3, No. 4, 1978.
[3] Templeton, M., et al., "Mermaid: A Front-End to Distributed Heterogeneous Databases," Proc. IEEE, Vol. 75, No. 5, 1987, pp. 695-708.
[4] Devor, C., et al., "Five-Schema Architecture Extends DBMS to Distributed Applications," Electronic Design, Mar. 1982, pp. 27-32.
[5] Ramakrishnan, R., Database Management Systems, New York: McGraw-Hill, 1998.
[6] Batini, C., M. Lenzerini, and S. Navathe, "A Comparative Analysis of Methodologies for Database Schema Integration," ACM Computing Surveys, Vol. 18, No. 4, 1986, pp. 323-364.
[7] McBrien, P. J., and A. Poulovassilis, "Automatic Migration and Wrapping of Database Applications: A Schema Transformation Approach," Proc. ER'99, LNCS, Springer-Verlag, 1999.
[8] McBrien, P. J., and A. Poulovassilis, "A Formalisation of Semantic Schema Integration," Information Systems, Vol. 23, No. 5, 1998, pp. 307-334.
[9] Elmasri, R., and S. Navathe, Fundamentals of Database Systems, 2nd ed., Redwood City, CA: Benjamin/Cummings, 1994.
[10] Andersson, M., "Extracting an Entity Relationship Schema From a Relational Database Through Reverse Engineering," Proc. ER'94, LNCS, Springer-Verlag, 1994, pp. 403-419.
[11] Poulovassilis, A., and P. J. McBrien, "A General Formal Framework for Schema Transformation," Data and Knowledge Engineering, Vol. 28, No. 1, 1998, pp. 47-71.
[12] McBrien, P. J., and A. Poulovassilis, "A Uniform Approach to Inter-Model Transformations," Proc. CAiSE'99, LNCS, Springer-Verlag, 1999.
[13] Ullman, J. D., Principles of Database and Knowledge-Base Systems, Vol. 1, Rockville, MD: Computer Science Press, 1988.
[14] Bell, D., and J. Grimson, Distributed Database Systems, Reading, MA: Addison-Wesley, 1992.
[15] Bernstein, P. A., V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in Database Systems, Reading, MA: Addison-Wesley, 1987.
[16] Mullender, S. (ed.), Distributed Systems, Reading, MA: Addison-Wesley, 1993.
[17] McGoveran, D., and C. J. Date, A Guide to SYBASE and SQL Server, Reading, MA: Addison-Wesley, 1992.
[18] Leverenz, L., Oracle8 Concepts, Vol. 2, Oracle, 1997.
[19] Nestorov, S., S. Abiteboul, and R. Motwani, "Extracting Schema From Semistructured Data," SIGMOD Record, Vol. 27, No. 2, 1998, pp. 295-306.
[20] Özsu, M. T., and P. Valduriez, Principles of Distributed Database Systems, 2nd ed., Upper Saddle River, NJ: Prentice-Hall, 1999.
[21] Garcia-Molina, H., and K. Salem, "Sagas," Proc. SIGMOD 1987, 1987, pp. 249-259.
[22] Rusinkiewicz, M., and A. Sheth, "Specification and Execution of Transactional Workflows," in W. Kim (ed.), Modern Database Systems: The Object Model, Interoperability, and Beyond, Addison-Wesley/ACM Press, 1995.
[23] Wiederhold, G., "Mediators in the Architecture of Future Information Systems," IEEE Computer, Vol. 25, No. 3, 1992, pp. 38-49.
[24] Wiederhold, G., "Foreword to Special Issue on Intelligent Integration of Information," J. Intelligent Information Systems, Vol. 6, No. 2/3, 1996, pp. 93-97.
[25] Gray, P., et al., "KRAFT: Knowledge Fusion From Distributed Databases and Knowledge Bases," Proc. 8th Intl. Workshop on Database and Expert System Applications, DEXA'97, 1997.
[26] Adelberg, B., NoDoSEA Tool for Semi-Automatically Extracting Structured and Semi-Structured Data From Text Documents, SIGMOD Record, Vol. 27, No. 2, 1998, pp. 283294. Selected Bibliography This chapter has given an overview of some of the many issues involved in the design and implementation of DDBs. An in-depth discussion of the whole area can be found in the book by Özsu and Valduriez [20]. This book also includes chapters on several topics that we have not covered here, including parallel DBs and distributed object-oriented DBs. Earlier books on DDBs include [14] and Distributed Databases: Principles and Systems,by S. Ceri and G. Pelagatti (McGraw-Hill, 1984). Modern Database Systems: The Object Model, Interoperability, and Beyond, edited by W. Kim (Addison- Wesley/ACM Press, 1995), contains several chapters on heterogeneous multi-DBMSs that collectively give a deeper treatment of the area than we have been able to do here. An extensive analysis of concurrency control and recovery, with some coverage of concurrency control and recovery in DDBs, is given in [15]. Distributed Databases 327 This Page Intentionally Left Blank [...]... Dao, S., and B Perry, Information Dissemination in Hybrid Satellite/Terrestrial Networks, IEEE Data Engineering Bulletin, Vol 19, No 3, Sept 1996 350 Advanced Database Technology and Design [36] Özsu, M T., and P Valduriez, Principles of Distributed Database Systems, 2nd ed., Upper Saddle River, NJ: Prentice-Hall, 1999 [ 37] Imielinski, T., and B R Badrinath, Querying in Highly Mobile and Distributed... computers In [41] there appears a query processing 342 Advanced Database Technology and Design interface, called query by icons, that addresses the features of screen size along with the limitations in memory and battery power and the restricted communication bandwidth In [42] the issue of how the pen and voice can be used as substitutes for the mouse and keyboard is addressed Moreover, in [43] there appears... 
