Building Secure and Reliable Network Applications, Part 8


Using Horus, it was straightforward to extend CMT with fault-tolerance and multicast capabilities. Five Horus stacks were required. One of these is hidden from the application and implements a clock synchronization protocol [Cri89]. It uses a Horus layer called MERGE to ensure that the different machines will find each other automatically (even after network partitions), and employs the virtual synchrony property to rank the processes, assigning the lowest-ranked machine to maintain a master clock on behalf of the others.

The second stack synchronizes the speeds and offsets of the logical timestamp objects with respect to real time. To keep these values consistent, they must be updated in the same order. This stack is therefore similar to the previous one, but includes a Horus protocol block that places a total order on multicast messages delivered within the group. (This ordering protocol differs from the Total protocol in the Trans/Total project [MMABL96] in that the Horus protocol rotates the token only among the current set of senders, while the Trans/Total protocol rotates it among all members.)

The third stack tracks the list of servers and clients. Using a deterministic rule based on the process ranking maintained by the virtual synchrony layer, one server decides to multicast the video, and one server, usually the same, decides to multicast the audio. This set-up is shown in Figure 18-5b.

To disseminate the multi-media data, we used two identical stacks, one for audio and one for video. The key component in these is a protocol block that implements a multi-media generalization of the Cyclic UDP protocol. The algorithm is similar to FRAG, but will reassemble messages that arrive out of order, and drops messages with missing fragments.

One might expect that a huge amount of recoding would have been required to accomplish these changes. However, all of the necessary work was completed using 42 lines of Tcl code. An additional 160 lines of C code supports the CMT frame buffers in Horus. Two new Horus layers were needed, but were developed by adapting existing layers; they consist of 1800 and 300 lines of C code, respectively (ignoring the comments and lines common to all layers). Thus, with relatively little effort and little code, a complex application written with no expectation that process group computing might later be valuable was modified to exploit Horus functionality.

18.5 Using Horus to Harden CORBA Applications

The introduction of process groups into CMT required sophistication with Horus and its intercept proxies. Many potential users would lack the sophistication and knowledge of Horus required to do this, hence we recognized a need to introduce Horus functionality in a more transparent way. This goal evokes an image of "plug and play" robustness, and leads one to think in terms of an object-oriented approach to group computing. Early in this text we looked at CORBA, noting that object-oriented distributed applications that comply with the CORBA ORB specification and support the IIOP protocol can invoke one another's methods with relative ease. Our work resulted in a CORBA-compliant interface to Horus, which we call Electra [Maf95]. Electra can be used without Horus, and vice versa, but the combination represents a more complete system. In Electra, applications are provided with ways to build Horus process groups and to directly exploit the virtual synchrony model. Moreover, Electra objects can be aggregated to form "object groups," and object references can be bound to both singleton objects and object groups.
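To give a feel for the programming model, the fragment below sketches how a client might bind to and invoke an Electra object group. The electra_* names and signatures are hypothetical stand-ins for illustration; a real client would use the standard CORBA binding and invocation interfaces.

    /* Hypothetical sketch of binding to an Electra object group.  A real
     * client uses ordinary CORBA calls and need not know that the
     * reference names a group rather than a singleton object. */
    #include <stdio.h>

    typedef struct electra_ref electra_ref;  /* opaque object(-group) reference */

    extern electra_ref *electra_bind(const char *name);
    extern int electra_invoke(electra_ref *ref, const char *method,
                              const void *arg, void *reply);

    int main(void)
    {
        /* Binding looks identical whether "VideoService" is one object
         * or a replicated group. */
        electra_ref *svc = electra_bind("VideoService");

        int frame_rate;
        /* Behind this call, Electra multicasts the request to every
         * member and (in transparent mode) returns the first reply. */
        if (electra_invoke(svc, "get_frame_rate", NULL, &frame_rate) == 0)
            printf("frame rate: %d\n", frame_rate);
        return 0;
    }

The point of the sketch is the transparency: nothing in the client changes when a singleton service is replaced by a replicated group.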
An implication of the interoperability of CORBA implementations is that Electra object groups can be invoked from any CORBA-compliant distributed application, regardless of the CORBA platform on which it is running, without special provisions for group communication. This means that a service can be made fault-tolerant without changing its clients.

When a method invocation occurs within Electra, object-group references are detected and transformed into multicasts to the member objects (see Figure 18-6). Requests can be issued either in transparent mode, where only the first arriving member reply is returned to the client application, or in non-transparent mode, permitting the client to access the full set of responses from individual group members. Transparent mode is used by clients to communicate with replicated CORBA objects, while non-transparent mode is employed with object groups whose members perform different tasks. Clients can submit requests in a synchronous, asynchronous, or deferred-synchronous manner.

[Figure 18-6: Object-group communication in Electra, a CORBA-compliant ORB that uses Horus to implement group multicast. The invocation method can be changed depending on the intended use. Orbix+Isis and the COOL ORB are examples of commercial products that support object groups.]

The integration of Horus into Electra shows that group programming can be provided in a natural, transparent way with popular programming methodologies. The resulting technology permits the user to "plug in" group communication tools anywhere that a CORBA application has a suitable interface. To the degree that process-group computing interfaces and abstractions represent an impediment to their use in commercial software, technologies such as Electra suggest a possible middle ground, in which fault-tolerance, security, and other group-based mechanisms can be introduced late in the design cycle of a sophisticated distributed application.

18.6 Basic Performance of Horus

A major concern of the Horus architecture is the overhead of layering, hence we now focus on this issue. This section presents the overall performance of Horus on a system of Sun Sparc10 workstations running SunOS 4.1.3, communicating through a loaded Ethernet. We used two network transport protocols: normal UDP, and UDP with the Deering IP multicast extensions [Dee88] (shown as "Deering"). To highlight some of the performance numbers: Horus achieves a one-way latency of 1.2 msecs over an unordered virtual synchrony stack (over ATM, it is currently 0.7 msecs) and, using a totally ordered layer over the same stack, 7,500 1-byte messages per second. Given an application that can accept lists of messages in a single receive operation, we can drive the total number of messages per second to over 75,000 using the FC flow-control layer, which buffers heavily using the "message list" capabilities of Horus [FR95a].
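The "message list" style of delivery can be pictured as follows. The types and callback signature are hypothetical (the real Horus interfaces differ in detail), but they show why batched delivery amortizes per-message costs.

    /* Sketch of message-list delivery (hypothetical types and signature).
     * One upcall hands the application a whole batch accumulated by the
     * FC layer, instead of one upcall per 1-byte message. */
    typedef struct { char *data; int len; } msg_t;

    extern void handle(const char *data, int len);  /* application handler */

    void deliver_list(msg_t *msgs, int n)
    {
        for (int i = 0; i < n; i++)
            handle(msgs[i].data, msgs[i].len);  /* no per-message upcall cost */
    }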
Horus easily reached the Ethernet maximum bandwidth of 1007 Kbytes/second with a message size smaller than 1 kilobyte. The performance test program has each member do exactly the same thing: send k messages of size s and wait for k(n-1) messages of the same size, where n is the number of members. In this way we simulate an application that imposes a high load on the system while occasionally synchronizing on intermediate results.

Figure 18-7 depicts the one-way communication latency of 1-byte Horus messages. As can be seen in the left graph, hardware multicast is a big win, especially as the message size goes up. In the right graph, we compare FIFO to totally ordered communication. For small messages we get a FIFO one-way latency of about 1.5 milliseconds and a totally ordered one-way latency of about 6.7 milliseconds. A problem with the totally ordered layer is that it can be inefficient when senders send single messages at random and when there is a high degree of concurrent sending by different group members. With just one sender, the one-way latency drops to 1.6 milliseconds.

[Figure 18-7: The left graph compares the one-way latency of 1-byte FIFO Horus messages over straight UDP and over UDP with the Deering IP multicast extensions. The right graph compares the performance of total and FIFO ordering in Horus, both over UDP multicast.]

[Figure 18-8: These graphs depict the message throughput for virtually synchronous, FIFO-ordered communication over normal UDP and Deering UDP, and for totally ordered communication over Deering UDP.]

Figure 18-8 shows the number of 1-byte messages per second that can be achieved in three cases. For normal UDP and Deering UDP the throughput is fairly constant. For totally ordered communication we see that the throughput improves if we send more messages per round (because of increased concurrency). Perhaps surprisingly, the throughput also improves as the number of members in the group goes up. The reason for this is threefold. First, with more members there are more senders. Second, with more members it takes longer to order messages, and thus more messages can be packed together and sent out in single network packets. Last, the ordering protocol allows only one sender on the network at a time, thus introducing flow control and reducing collisions.

18.7 Masking the Overhead of Protocol Layering

Although layering of protocols can be advocated as a way of dealing with the complexity of computer communication, it is also criticized for its performance overhead. Recent work by Van Renesse has yielded considerable insight into the design of protocols, which he uses to mask the overhead of layering in Horus. The fundamental idea is very similar to client caching in a file system. With these new techniques, he achieves an order of magnitude improvement in end-to-end message latency in the Horus communication framework, compared to the best latency possible using Horus without these optimizations. Over an ATM network, the approach permits applications to send and deliver messages with varying levels of semantics in about 85µs, using a protocol stack written in ML, an interpreted functional language. In contrast, the performance figures shown in the previous section were for a version of Horus coded in C and carefully optimized by hand, but without use of the protocol accelerator.
Having presented this material in seminars, the author has noticed that the systems community seems to respond to the very mention of the ML language with skepticism, and it is perhaps appropriate to comment on this before continuing. First, the reader should keep in mind that a technology such as Horus is simply a tool that one uses to harden a system. It makes little difference whether such a tool is internally coded in C, assembler language, Lisp, or ML if it works well for the desired purpose. The decision to work with a version of Horus coded in ML is not one that would impact the use of Horus in applications that work with the technology through wrappers or toolkit interfaces. However, as we will see here and in Chapter 25, it does bring some important benefits for Horus itself, notably the potential to harden the system using formal software analysis tools. Moreover, although ML is often viewed as obscure and of academic interest, the version of ML used in our work on Horus is not really so different from Lisp or C++ once one becomes accustomed to the syntax. Finally, as we will see here, the performance of Horus coded in ML is actually better than that of Horus coded in C, at least for certain patterns of communication. Thus we would hope that the reader will recognize that the work reported here is in fact very practical.

As we saw in earlier chapters, modern network technology allows for very low latency communication. For example, the U-Net [EBBV95] interface to ATM achieves 75-microsecond round-trip communication as long as the message is 40 bytes or smaller. On the other hand, if a message is larger, it will not fit in a single ATM cell, significantly increasing the latency. This points to two basic concerns: first, that systems like Horus need to be designed to take full advantage of the potential performance of current communications technology; and second, that to do so, it will be important that Horus protocols use small headers and introduce minimal processing overhead.

Unfortunately, these properties are not typical of the protocol layers needed to implement virtual synchrony. Many of these protocols are complex, and layering introduces additional overhead of its own. One source of overhead is interfacing: crossing a layer costs some CPU cycles. The other is header overhead. Each layer uses its own header, which is prepended to every message and usually padded so that each header is aligned on a 4- or 8-byte boundary. Combining this with a trend toward very large addresses (of which at least two per message are needed), it is impossible to keep the total amount of header space below 40 bytes.

The Horus Protocol Accelerator (Horus PA) eliminates these overheads almost entirely, and offers the potential of latency improvements of one to three orders of magnitude over the protocol implementations described in the previous subsection. For example, we looked at the impact of the Horus PA on an ML [MTH90] implementation of a protocol stack with five layers. The ML code is interpreted (although in the future it will be compiled), and is therefore relatively slow compared to compiled C code. Nevertheless, between two SunOS user processes on two Sparc 20s connected by a 155 Mbit/sec ATM network, the Horus PA permits these layers to achieve a roundtrip latency of 175 microseconds, down from about 1.5 milliseconds in the original Horus system (written in C). The Horus PA achieves its results using three techniques.
First, message header fields that never change are sent only once. Second, the rest of the header information is carefully packed, ignoring layer boundaries, typically leading to headers that are much smaller than 40 bytes, and thus leaving room to fit a small message within a single U-Net packet. Third, a semi-automatic transformation is done on the send and delivery operations, splitting each into two parts: one that updates or checks the header but not the protocol state, and one that does the reverse. The first part is then executed by a special packet filter (both in the send and the delivery path) to circumvent the actual protocol layers whenever possible. The second part is executed, as much as possible, while the application is idle or blocked.

18.7.1 Reducing Header Overhead

In traditional layered protocol systems, each protocol layer designs its own header data structure. The headers are concatenated and prepended to each user message. For convenience, each header is aligned to a 4- or 8-byte boundary to allow easy access. In systems like the x-Kernel or Horus, where many simple protocols may be stacked on top of each other, this may lead to extensive padding overhead.

Some fields in the headers, such as the source and destination addresses, never change from message to message. Yet, instead of agreeing on these values once, they are frequently included in every message and used as the identifier of the connection to the peer. Since addresses tend to be large (and are getting larger to deal with the rapid growth of the Internet), this results in significant use of space for what are essentially constants of the connection. Moreover, notice that the connection itself may already be identifiable from other information. On an ATM network, connections are "named" by a small 4-byte VPI/VCI pair, and every packet carries this information. Thus, constants such as the sender and destination addresses are implied by the connection identifier, and including them in the header is superfluous.

The Horus PA exploits these observations to reduce header sizes to a bare minimum. The approach starts by dividing header fields into four classes:

• Connection identification: fields that never change during the period of a connection, such as the sender and destination.

• Protocol-specific information: fields that are important for the correct delivery of the particular message frame. Examples are the sequence number of a message, or the message type (Horus messages have types, such as "data," "ack," or "nack"). These fields must be deterministically implied by the protocol state, and must not depend on the message contents or the time at which the message was sent.

• Message-specific information: fields that need to accompany the message, such as the message length and checksum, or a timestamp. Typically, such information depends only on the message, and not on the protocol state.

• Gossip: fields that technically do not need to accompany the message, but are included for efficiency.

Each layer is expected to declare the header fields that it will use during initialization, and subsequently accesses fields using a collection of highly optimized functions implemented by the Horus PA. These functions extract values directly from headers if they are present, and otherwise compute the appropriate field value and return that instead. This permits the Horus PA to precompute header templates that have optimized layouts, with a minimum of wasted space.
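A hedged sketch of what such a declaration interface might look like appears below; the pa_* names, signatures, and field sizes are assumptions made for illustration, not the actual Horus PA API.

    /* Illustrative sketch only: a layer declares its header fields and
     * their classes at stack initialization, letting the PA lay out one
     * packed header template with no per-layer padding. */
    typedef enum {
        FC_CONNECTION_ID,   /* constant for the lifetime of the connection */
        FC_PROTOCOL,        /* implied by protocol state (seqno, msg type) */
        FC_MESSAGE,         /* depends on the message (length, checksum)   */
        FC_GOSSIP           /* piggybacked for efficiency only             */
    } field_class_t;

    extern int pa_declare_field(const char *layer, const char *field,
                                field_class_t cls, int bits);

    void nak_layer_init(void)
    {
        pa_declare_field("NAK", "seqno", FC_PROTOCOL, 32);
        pa_declare_field("NAK", "type",  FC_PROTOCOL, 4);  /* data/ack/nack */
    }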
Horus includes the protocol-specific and message-specific information in every message. Currently, although not technically necessary, gossip information is also always included, since it is usually small. The connection identification fields, however, never change and tend to be large, so they are included only occasionally. A 64-bit "mini-header" is placed on each message to indicate which headers the message actually includes. Two bits of this are used to indicate whether or not the connection identification is present in the message and to designate the byte ordering for bytes in the message. The remaining 62 bits are a connection cookie: a magic number, established in the connection identification header and selected randomly, that identifies the connection. The idea is that the first message sent over a connection will carry a connection identifier, specifying the cookie to use and providing an initial copy of the connection identification fields. Subsequent messages need only contain the identification fields if they have changed. Since the connection identification fields tend to include very large identifiers, this mechanism significantly reduces the amount of header space needed in the normal case. For example, in the version of Horus that Van Renesse used in his tests, the connection identification typically occupies about 76 bytes.

18.7.2 Eliminating Layered Protocol Processing Overhead

In most protocol implementations, layered or not, a great deal of processing must be done between the application's send operation and the time that the message is actually sent out onto the network. The same is true between the arrival of a message and its delivery to the application. The Horus PA reduces the length of the critical path by updating the protocol state only after a message has been sent or delivered, and by precomputing any statically predictable protocol-specific header fields, so that the necessary values will be known before the application generates the next message (Figure 18-9). These methods work because the protocol-specific information for most messages can be predicted (calculated) before the message is sent or delivered. (Recall that, as noted above, such information must not depend on the message contents or the time at which the message was sent.) Each connection maintains a predicted protocol-specific header for the next send operation, and another for the next delivery, much like a read-ahead strategy in a file system. For sending, the gossip information can be predicted as well, since it does not depend on the message contents.

Thus, when a message is actually sent, only the message-specific header needs to be generated. This is done using a packet filter [MRA87], which is constructed at the time of layer initialization. Packet filters are programmed in a simple programming language (a dialect of ML) and operate by extracting the information from the message needed to form the message-specific header. A filter can also hand a message off to the associated layer for special handling, for example if the message fails to satisfy some assumption that was used in predicting the protocol-specific header. In the usual case, the message-specific header is computed, the other headers are prepended from the precomputed versions, and the message is transmitted with no additional delay.
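A schematic of this fast path, with hypothetical names throughout, might look like the following; it is a sketch of the idea, not the Horus implementation.

    /* Schematic of the accelerated send path (all names hypothetical).
     * The protocol-specific header was predicted earlier, off the
     * critical path; the packet filter fills in only message-specific
     * fields such as length or checksum. */
    typedef struct { unsigned char bytes[32]; } hdr_t;   /* packed template */
    typedef struct { const void *data; int len; } msg_t;

    extern hdr_t predicted_hdr;                      /* computed while idle */
    extern void  run_packet_filter(const msg_t *m, hdr_t *h);
    extern void  net_send(const hdr_t *h, const msg_t *m);

    void fast_send(const msg_t *m)
    {
        hdr_t h = predicted_hdr;    /* precomputed protocol/gossip fields   */
        run_packet_filter(m, &h);   /* add message-specific fields          */
        net_send(&h, m);            /* transmit now; protocol state updates
                                       happen later, as post-processing    */
    }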
Because the header fields have fixed and precomputed sizes, a header template can be filled in with no copying, and scatter-gather hardware can be used to transmit the header and message as a single packet without first copying them to a single place. This reduces the computational cost of sending or delivering a message to a bare minimum, although it leaves some background costs in the form of prediction code that must be executed before the next message is sent or delivered.

[Figure 18-9: Restructuring a protocol layer to reduce the critical path. By moving data-dependent code to the front, delays for sending the next message are minimized. Post-processing of the current multicast and preprocessing of the next multicast (all computation that can be done before seeing the actual contents of the message) are shifted to occur after the current multicast has been sent, and hence concurrently with application-level computing.]

18.7.3 Message Packing

The Horus PA as described so far will reduce the latency of individual messages significantly, but only if they are spaced out far enough to allow time for post-processing. If not, messages will have to wait until the post-processing of every previous message completes (somewhat like a process that reads file system records faster than they can be prefetched). To reduce this overhead, the Horus PA uses message packing [FR95] to deal with backlogs. The idea is a very simple one. After the post-processing of a send operation completes, the PA checks to see whether there are messages waiting. If there is more than one, the PA packs these messages together into a single message. The single message is then processed in the usual way, which takes only one pre-processing and one post-processing phase. When the packed message is ready for delivery, it is unpacked and the messages are individually delivered to the application.

Returning to our file system analogy, the approach is similar to one in which the application could indicate that it plans to read three 1-Kbyte data blocks. Rather than fetching them one by one, the file system can then fetch them all at the same time. Doing so amortizes the overhead associated with fetching the blocks, permitting better utilization of network bandwidth.

18.7.4 Performance of Horus with the Protocol Accelerator

The Horus PA dramatically improved the performance of the system over the base figures described earlier (which were themselves comparable to the best performance figures cited for other systems). With the accelerator, one-way latencies dropped to as little as 85µs (compared to 35µs for the U-Net implementation over which the accelerator was tested). As many as 85,000 one-byte messages could be sent and delivered per second, over a protocol stack of five layers implementing the virtual synchrony model within a group of two members. For RPC-style interactions, 2,600 round-trips per second were achieved. These latency figures, however, represent a best-case scenario in which the frequency of messages was low enough to permit the predictive mechanisms to operate; when they become overloaded, latency increases to about 425µs for the same test pattern. This points to a strong dependency of the method on the speed of the code used to implement layers.
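The packing step of Section 18.7.3 can be sketched as follows; the queue and message operations are hypothetical helpers, not Horus calls.

    /* Sketch of message packing (hypothetical queue/msg helpers).  After
     * a post-processing phase completes, any backlog is sent as one
     * packed message, paying a single pre/post-processing cost. */
    typedef struct queue queue_t;
    typedef struct msg   msg_t;

    extern int    queue_len(queue_t *q);
    extern msg_t *queue_pop(queue_t *q);
    extern msg_t *msg_pack(queue_t *q);  /* concatenate with per-msg lengths */
    extern void   stack_send(msg_t *m);  /* one trip through the stack       */

    void flush_backlog(queue_t *q)
    {
        if (queue_len(q) > 1)
            stack_send(msg_pack(q));     /* receiver unpacks and delivers
                                            each message individually */
        else if (queue_len(q) == 1)
            stack_send(queue_pop(q));
    }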
Van Renesse's work on the Horus PA made use of a version of the ML programming language that was interpreted, not compiled. ML turns out to be a very useful language for specifying Horus layers: it lends itself to formal analysis and permits packet filters to be constructed at runtime; moreover, the programming model is well matched to the functional style of programming used to implement Horus layers. ML compiler technology is rapidly evolving, and when the Horus PA is moved to a compiled version of ML, the sustainable load should rise and these maximum latency figures should drop.

The Horus PA does suffer from some limitations. Message fragmentation and reassembly are not supported by the PA, hence the pre-processing of large messages must be handled explicitly by the protocol stack. Some technical complications result from this design decision, but it reduces the complexity of the PA and hence improves the maximum performance achievable with it. A second limitation is that the PA must be used by all parties to a communication stack. However, this is not an unreasonable restriction, since Horus imposes the same sort of limitation on the stacks themselves (all members of a group must use identical, or at least compatible, protocol stacks).

18.8 Scalability

Up to the present, this text has largely overlooked issues associated with protocol scalability. Although a serious treatment of scalability in the general sense might require a whole textbook in itself, the purpose of this section is to set out some general remarks on the subject as we have approached it in the Horus project. It is perhaps worthwhile to comment that, overall, surprisingly little is known about scaling reliable distributed systems.

If one looks at the scalability of Horus protocols, as we did earlier in presenting some basic Horus performance figures, it is clear that Horus performs well for groups with small numbers of members, and for moderately large groups when IP multicast is available as a hardware tool to reduce the cost of moving large volumes of data to large numbers of destinations. Yet although these graphs are honest, they may be misleading. In fact, as systems like Horus are scaled to larger and larger numbers of participating processes, they experience steadily growing overheads, in the form of acknowledgements and negative acknowledgements sent from the recipient processes to the senders. A consequence is that if these systems are used with very large numbers of participating processes, the "backflow" associated with these types of messages and with flow control becomes a serious problem.

A simple thought experiment suffices to illustrate that there are probably fundamental limits on reliability in very large networks. Suppose that a communication network is extremely reliable, but that the processes using it are designed to distrust the network, and to assume that it may actually malfunction by losing messages. Moreover, assume that these processes are in fact closely rate-matched (the consumers of data keep up with the producers), but again that the system is designed to deal with individual processes that lag far behind. Now, were it not for the backflow of messages to the senders, this hypothetical system might perform very well near the limits of the hardware. It could potentially be scaled just by adding new recipient processes and, with no changes at all, continue to provide a high observed level of reliability.
However, the backflow messages will substantially impact this simple and rosy scenario. They represent a source of overhead, and in the case of flow-control messages, if they are not received, the sender may be forced to stop and wait for them. Now the performance of the sender side is coupled to the timely and reliable reception of backflow messages, and as we scale the number of recipients connected to the system, we can anticipate a traffic-jam phenomenon at the sender's interface (protocol designers call this an acknowledgement "implosion") that will cause traffic to become increasingly bursty and performance to drop. In effect, the attempt to protect against the mere risk of data loss or flow-control mismatches is likely to slash the maximum achievable performance of the system. Obtaining stable delivery of data near the limits of the technology now becomes a tremendously difficult juggling problem, in which the protocol developer must trade the transmission of backflow messages against their performance impact.

Graduate students Guerney Hunt and Michael Kalantar have studied aspects of this problem in their doctoral dissertations at Cornell University, both using special-purpose experimental tools (that is, neither actually experimented on Horus or a similar system; Kalantar, in fact, worked mostly with a simulator). Hunt's work was on flow control in very large scale systems. He concluded that most forms of backflow were unworkable on a large scale, and ultimately proposed a rate-based flow-control scheme in which the sender limits the transmission rate of data to match what the receivers can accommodate [Hunt95]. Kalantar looked at the impact of multicast ordering on latency, asking how frequently an ordering property such as causal or total ordering would significantly impact the latency of message delivery [Kal95]. He found that although ordering had a fairly small impact on latency, there were other, much more important phenomena that represented serious potential concerns. In particular, Kalantar discovered that as he scaled the size of his simulation, message latencies tended to become unstable and bursty. He hypothesized that in large-scale protocols, the domain of stable performance becomes smaller and smaller. In such situations, a slight perturbation of the overall system, for example because of a lost message, could cause much of the remainder of the system to block because of reliability or ordering constraints. The system would then shift into what is sometimes called a convoy behavior, in which long message backlogs build up and are never really eliminated; they may shift from place to place, but stable, smooth delivery is generally not restored. In effect, a bursty scheduling behavior represents a more stable configuration of the overall system than one in which message delivery is extremely regular and smooth, at least if the number of recipients is large and the presented load is a substantial percentage of the maximum achievable (so that there is little slack bandwidth with which the system can catch up after an overload develops).

Hunt's and Kalantar's observations are not really surprising ones. It makes sense that it should be easy to provide reliability or ordering when far from the saturation point of the hardware, and much harder to do so as the communication or processor speed limits are approached.
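A rate-based scheme of the kind Hunt proposed can be sketched in a few lines. The pacing policy below (a simple send-debt budget with a roughly 100 ms burst cap) is an assumption made for illustration, not Hunt's actual algorithm.

    #include <unistd.h>

    extern void multicast(const void *data, int len);

    /* Illustrative rate-based flow control: the sender paces itself to a
     * rate the receivers are known to sustain, so no per-message
     * acknowledgements flow back from the (possibly huge) receiver set. */
    void paced_send(const void *data, int len, double max_bytes_per_sec)
    {
        static double debt = 0;                 /* bytes sent ahead of budget */
        multicast(data, len);
        debt += len;
        if (debt > max_bytes_per_sec / 10) {    /* cap bursts at ~100 ms      */
            usleep((useconds_t)(debt / max_bytes_per_sec * 1e6));
            debt = 0;
        }
    }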
Over many years of working with Isis and Horus, the author has gained considerable experience with these sorts of scaling and flow-control problems. Realistically, the conclusion can only be called a mixed one. On the positive side, it seems that one can fairly easily build a reliable system if the communication load won't exceed, perhaps, 20% of the capacity of the hardware. With a little luck, one can even push as high as perhaps 40% of the hardware. (Happily, hardware is becoming so fast that this may still represent a very satisfactory level of performance long into the future!) However, as the load presented to the system rises beyond this threshold, or if the number of destinations for a typical message becomes very large (hundreds), it becomes increasingly difficult to guarantee reliability and flow control. A fundamental tradeoff seems to be present: one can send the data and hope that it will usually arrive, and by doing so may be able to operate quite reliably near the limits of the hardware; but, of course, if a process falls behind, it may lose large numbers of messages before it recovers, and no mechanism is provided to let it recover these from any form of backup storage. Alternatively, one can operate in a less demanding performance range, and in this case provide reliability, ordering, and performance guarantees. In between the two, however, lies a domain that is extremely difficult in an engineering sense and often requires a very high level of software complexity, which will necessarily reduce reliability. Moreover, one can raise serious questions about the stability of message-passing systems that operate in this intermediate domain, where the load presented is near the limits of what can be accomplished. The typical experience with such systems is that they perform well most of the time, but that once something fails, the system falls so far behind that it can never again catch up: in effect, any perturbation can shift such a system into the domain of overloads and hopeless backlogs.

Where does Horus position itself in this spectrum? Although the performance data shown earlier may suggest that the system seeks to provide scalable reliability, it is more likely that successful Horus applications will seek one property or the other, but not both at once, or at least not both when performance is demanding. In Horus, this is done by using multiple protocol stacks: stacks providing strong properties are used much less frequently, while stacks providing weaker reliability properties carry the high-volume communication. As an example, suppose that Horus were to be used to build a stock trading system. It might be very important to ensure that certain classes of trading information reach all clients, and for this sort of information a stack with strong reliability properties could be used. But as a general rule, the majority of communication in such systems will be in the form of bid/offered pricing, which may not need to be delivered quite so reliably: if a price quote is dropped, the loss won't be serious so long as the next quote has a good probability of getting through. Thus, one can visualize such a system as having two superimposed architectures: one with much less traffic and much stronger reliability requirements, and a second with much greater traffic but weaker properties.
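In code, the two superimposed architectures might reduce to a routing decision as simple as the following sketch; the stack handles and the gsend call are hypothetical.

    /* Sketch of the two-stack design: critical events travel over a
     * strong (virtually synchronous, totally ordered) stack, while
     * high-volume quotes use a cheap, weakly reliable one. */
    typedef struct group_stack group_stack_t;

    extern group_stack_t *strong_stack;  /* low volume, strong guarantees */
    extern group_stack_t *weak_stack;    /* high volume, best-effort      */
    extern void gsend(group_stack_t *s, const void *msg, int len);

    void publish(int is_trade, const void *msg, int len)
    {
        /* A dropped quote is tolerable (the next quote supersedes it);
         * a completed trade is not, so it pays the full reliability cost. */
        gsend(is_trade ? strong_stack : weak_stack, msg, len);
    }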
We saw a similar structure in the Horus application to the CMT system: there, the stronger logical properties were reserved for coordination, timestamp generation, and agreement on such data as system membership, while the actual flow of video data passed through a protocol stack with very different properties: stronger temporal guarantees, but weaker reliability properties. In building scalable reliable systems, such tradeoffs may be intrinsic. In general, this leads to a number of interesting problems having to do with the synchronization and ordering of data when multiple communication streams are involved. Researchers at the Hebrew University of Jerusalem, working with a system similar to Horus called Transis (and with Horus itself), have begun to investigate this issue. Their work, on providing strong communication semantics in applications that mix multiple "quality of service" properties at the transport level, promises to make such multi-protocol systems increasingly manageable and controlled [Iditxx].

More broadly, it seems likely that one could develop a theoretical argument to the effect that reliability properties are fundamentally at odds with high performance. While one can scale reliable systems, they appear to be intrinsically unstable if the result of the scaling is to push the overall system anywhere close to the maximum performance of the technology used. Perhaps some future effort to model these classes of systems will reveal the basic reasons for this relationship and point to classes of protocols that degrade gracefully while remaining stable under steadily increasing scale and load. Until then, however, the heuristic recommended by this writer is to scale systems, by all means, but to be extremely careful not to expect the highest levels of reliability, performance, and scale simultaneously. To do so is simply to move beyond the limits of problems that we know how to solve, and may be to expect the impossible. Instead, the most demanding systems must somehow be split into subsystems that demand high performance but can manage with weaker reliability properties, and subsystems that need reliability but will not be subjected to extreme performance demands.

18.9 Related Readings

Chapter 26 includes a review of related research activities, which we will not duplicate here. On the Horus system: [BR96, RBM96, FR95]. Horus used in a real-time telephone switching application: Section 20.3 [FB96]. Virtual fault-tolerance: [BS95]. Layered protocols: [CT87, AP93, BD95, KP93, KC94]. Event counters: [RK79]. The Continuous Media Toolkit: [RS92]. U-Net: [EBBV95]. Packet filters (in Mach): [MRA87]. Chapter 25 discusses verification of the Horus protocols in more detail; that work focuses on the same ML implementation of Horus to which the Protocol Accelerator was applied.
Ngày đăng: 14/08/2014, 13:20

Tài liệu cùng người dùng

  • Đang cập nhật ...

Tài liệu liên quan