
PeerStore: A Peer-to-Peer Backup System


Document information
Pages: 93
File size: 290.09 KB

PEERSTORE: A PEER-TO-PEER BACKUP SYSTEM
ZHANG HAN
NATIONAL UNIVERSITY OF SINGAPORE
2004

Name: Zhang Han
Degree: Master of Science
Dept: Computer Science
Thesis Title: PeerStore: A Peer-to-Peer Backup System
Keywords: Peer-to-Peer, Backup, Distributed Hash Table, metadata

Abstract

This thesis focuses on designing a Peer-to-Peer backup system suited to large, unstable networks. Peer-to-peer backup, the implementation of a data backup service on top of a peer-to-peer network, has received research attention in recent years. This thesis concentrates on the use of Peer-to-Peer backup on the Internet, with a large number of anonymous users. It offers both a general analysis of the requirements and issues of peer-to-peer backup and a new design for such a system. Existing systems are introduced and their suitability for the Internet is evaluated, before we present our own novel peer-to-peer backup scheme, PeerStore. PeerStore offers better performance by separating the tasks of data placement and metadata management; its improvements are shown by experiments run in real networks.

PEERSTORE: A PEER-TO-PEER BACKUP SYSTEM
ZHANG HAN
(B.Comp. (Hons.), NUS)
A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE
SCHOOL OF COMPUTING
NATIONAL UNIVERSITY OF SINGAPORE
2004

Acknowledgement

I would like to express my sincere gratitude to my supervisor, Prof. Tan Kian-Lee, for his guidance and patience. His advice, insights and comments have helped me tremendously throughout my Master's year. Working under Prof. Tan has been a great experience, and he has greatly enriched my experience as a researcher.

I am particularly grateful to Martin A. Landers, my project partner from Munich, who worked with me on the PeerStore project for half a year. PeerStore and this thesis would have been impossible without his help and effort.

At the same time, I would like to thank my parents for their support and encouragement throughout my years of study. They have guided me both in study and in life, and I hope that what I have done and what I will do make them proud of me.

Last but not least, I am deeply indebted to my girlfriend in China. She helped me through the entire year of my Master's study despite the long distance between us. I must thank her for caring for me when I needed it most.

Summary

This thesis studies various issues related to Peer-to-Peer backup, a new service built on a typical Peer-to-Peer network. We study various systems and techniques proposed in recent years in the Peer-to-Peer backup area and propose our own novel scheme: PeerStore. We implement an existing system as well as our proposed system and run both in a real network.

Peer-to-Peer backup, the implementation of a data backup service on top of a Peer-to-Peer network, offers interesting possibilities for both corporate and private users. In recent research, the corporate scenario has been well studied, and several schemes have been proposed that are proven to be well suited to it. However, for private users, especially anonymous users connected to large unstable networks such as the Internet, these schemes may not be applicable or may incur high maintenance cost. The main target of
this thesis is to design a new system that handles backup for users of such large networks, while at the same time imposing certain mechanisms to address security concerns.

A detailed analysis is carried out to explore the different requirements and issues related to both the underlying Peer-to-Peer network and the top-level backup semantics. The backup part of the system requires a high degree of flexibility, while for the Peer-to-Peer part, invisibility should be the main concern. Some challenging tasks are dealing with limited system resources, supporting fairness in order to avoid free-riding, and handling peer heterogeneity.

A number of recent research works on Peer-to-Peer backup have proposed various techniques and approaches. The thesis describes all of them and discusses the suitability of the three most important systems, pStore, the Cooperative Internet Backup Scheme and Pastiche, because they represent the three most typical approaches to Peer-to-Peer backup. Based on this analysis, we propose our own system, PeerStore, to fulfill our original design goal: a Peer-to-Peer backup system suited to large unstable networks. We also implement both pStore and PeerStore in Java; experiments run on a real network of 50 PCs show great improvements in reducing maintenance cost and better support for fairness and heterogeneity.

We believe that our contribution addresses some important Peer-to-Peer backup issues in large unstable networks. PeerStore has been shown to handle these issues better than the existing systems, and the design discussion also points to future research directions in Peer-to-Peer backup. The implemented PeerStore can be further extended with more functionality to address more concerns.

Contents

1 Introduction
  1.1 Peer-to-Peer Backup
  1.2 Contributions
  1.3 Thesis Organization
2 Issues in Peer-to-Peer Backup
  2.1 Backup Requirements
  2.2 Peer-to-Peer Requirements
  2.3 Resource Constraints
    2.3.1 Storage Space Constraints
    2.3.2 Bandwidth Constraints
    2.3.3 Dealing with Duplicated Data
  2.4 Peer-to-Peer Issues
    2.4.1 Dealing with Unreliable Peers
    2.4.2 Dealing with Free-Riders
    2.4.3 Dealing with Malicious Peers
    2.4.4 Dealing with Peer Heterogeneity
    2.4.5 Ensuring Availability in Unstable Network
  2.5 Backup Organization
    2.5.1 Data Storage and Retrieval
    2.5.2 Metadata Management
    2.5.3 Ensuring Confidentiality and Integrity
3 Related Work
  3.1 Introduction to Existing Systems
    3.1.1 pStore
    3.1.2 A Cooperative Internet Backup Scheme
    3.1.3 Pastiche
    3.1.4 Samsara - Fairness for Pastiche
    3.1.5 Other Systems
  3.2 Analysis of Existing Systems
    3.2.1 pStore
    3.2.2 Cooperative Internet Backup Scheme
    3.2.3 Pastiche with Samsara
4 Design of PeerStore
  4.1 Overview
  4.2 Backup
  4.3 Restore
  4.4 Metadata Layer
  4.5 Data Distribution
    4.5.1 Finding Trade Partners
    4.5.2 Imbalance in Trades
    4.5.3 The Trading Process
  4.6 Fairness
    4.6.1 Safekeeping
    4.6.2 Punishment Model
  4.7 Short-term Availability vs Long-term Availability
5 Experimental Results
  5.1 Simulation to prove Dominance of Data Migration
    5.1.1 Simulation Setup
    5.1.2 Results
  5.2 Simulation to compare Performance of pStore and PeerStore
  5.3 Evaluation
6 Conclusion and Future Work
A Implementation of pStore and PeerStore
  A.1 Development Platform
  A.2 Overview

[...]

[Figure 5.5: Trade Ratio vs Number of Successful Backup Nodes. y-axis: percentage of peers that finished backup (0 to 100); x-axis: trade ratio (0 to 10).]

...investigate the influence of the trade ratio on the number of successful trades. A first series of experiments was set up as follows: 30 peers form a network, and each of them randomly picks files from a collection of 1000 files totalling 50 MB, with the percentage of the collection picked by each peer drawn from a normal distribution with an average of 50% and a standard
deviation of 25%. Each run of the simulation sets a different trade ratio for every peer. Figure 5.5 shows the results. At a trade ratio of 1, no peer manages to make a backup trade, because exactly matching backup sizes are rare. From there, initial increases in the trade ratio have a large effect. At a trade ratio of 2, 80% of the peers in the simulation finish their backup, having made successful trades. However, more peers managed to finish their backup with a factor of 1.5 than with a factor of 2, which looks unreasonable. This led us to conduct a second simulation with more peers forming the network, so that the results tend more towards consistency. As shown in Figure 5.6, the plotted points are more consistent and form a curve that approaches 100% as the trade ratio increases. Since PeerStore is meant for large-scale networks, peers can adjust their trade ratios according to the curve in Figure 5.6 in order to establish more trades.

[Figure 5.6: Trade Ratio vs Number of Successful Backup Nodes. y-axis: percentage of peers that finished backup (0 to 100); x-axis: trade ratio (0 to 10).]

5.3 Evaluation

By combining a small, aggressively maintained metadata DHT with a lazy storage-trading scheme for data distribution, PeerStore lowers the maintenance traffic while improving the support for fairness and heterogeneity. By creating a symmetric trade relationship between peers, PeerStore makes it easy to implement a fairness mechanism on top of it. With all these properties combined, PeerStore is well suited to different kinds of networks, ranging from LANs and WANs to large unstable networks. The fairness mechanism enables PeerStore to offer backup with high long-term availability even in highly untrusted networks.

For LANs within the same corporation or campus, PeerStore reduces traffic. Because it detects duplicate blocks at the sender instead of at the receiver, PeerStore transfers less data than pStore in a highly overlapped network. When used in eager mode, the system can offer short-term availability as high as that of existing systems, with better support for fairness and heterogeneity.

Some aspects of the current system still need to be addressed in future research. By using convergent encryption, PeerStore inherits its slightly reduced security; since security is not the main focus of this thesis, further research can be carried out in this area. The system also gives an advantage to peers that join the network late: they will find a large number of common blocks, which decreases the amount of storage they need to donate to the system. The most important problem is that PeerStore gives no strict guarantee of a backup for every peer. Since peers need to find partners for their backup, a peer ends up with an unfinished backup if no other peer is willing to trade with it.

Table 5.1 compares the main advantages and disadvantages of recent systems with those of PeerStore.

Feature | pStore | Cooperative Backup | Pastiche | PeerStore
Duplicate checking | Yes | No | Yes | Yes
Fairness mechanism | No | Yes | Yes | Yes
Deals with heterogeneity | No | Yes | Yes | Yes
Central server | No | Yes | No | No
Maintenance cost | High | Low | Low | Low
Establishing trade | Not applicable | Relies on server | Rare | Moderate
Security | Moderate | Moderate | Moderate | Moderate

Table 5.1: Comparison of Existing Peer-to-Peer Backup Systems with PeerStore

Chapter 6 Conclusion and Future Work

In this thesis we set out to design and implement a Peer-to-Peer backup system for use in large, unstable networks. After investigating relevant issues and existing approaches, we built our own Peer-to-Peer backup system: PeerStore. PeerStore offers several useful features, such as lowering the maintenance cost in large unstable networks and providing good support for fairness and peer heterogeneity. We believe that
PeerStore is one step closer to real software that anonymous private users can use to back up their data over the Internet.

PeerStore also suggests one possible research direction in the design of Peer-to-Peer backup systems: combining different network topologies to improve performance. Currently no single Peer-to-Peer network topology can fulfill the entire diverse set of requirements of the different parts of a Peer-to-Peer system. A typical case is the distributed hash table. It provides efficient search performance but is too strict for data placement, which prevents it from dealing with fairness and heterogeneity. By designing the system in layers and delegating different requirements to different layers, we can possibly provide a better solution than existing systems. We expect to see more complex systems in the future that address more concerns.

The application of Peer-to-Peer techniques has changed the traditional client-server model into a distributed one. Beyond its file-sharing origin, Peer-to-Peer computing has become more diverse and aims at more complex computing tasks. Peer-to-Peer backup is one of the new services based on Peer-to-Peer networks, and we hope to see more services besides backup provided by Peer-to-Peer computing.

Appendix A Implementation of pStore and PeerStore

A.1 Development Platform

All code for the project was written in Java, using version 1.4 of the Java SDK. Eclipse was used as the development environment, and FreePastry was used as the underlying Distributed Hash Table implementation.

A.2 Overview

Both systems are built around their respective Pastry application classes, derived from the rice.pastry.client.PastryAppl base class. The two main classes (pstore.PStoreSimulation and peerstore.PeerstoreSimulation in the p2p.simulation package) are very simple: they only parse the command line for the name of an event file and then start "executing" it. The file is parsed by an instance of the simulation.automation.Automator class. The Automator creates event objects (found in the automation package of the pstore and peerstore packages, respectively) that execute the appropriate actions at the time specified in the event file. The most important event is the StartEvent, which creates a PastryNode and attaches the application to it, thus starting the actual peer-to-peer backup code.

[Figure A.1: Sequence Diagram of Backup in PeerStore. Participants: PeerstoreSimulation, Automator, StartEvent, PeerstoreApplication, BackupEvent; messages: parse(), execute(), execute().]

After the application object has been instantiated, the
event framework can instruct it to do things like a backup or restore, or to shut down. This allows us to control the actions of each peer at runtime. The sequence of events in creating the application and instructing it to do a backup is depicted in Figure A.1. As each node executes its own event file, the behavior of the individual nodes can be controlled in great detail. To avoid having to create 30 or more event files manually, we created a number of event generators, which can be found in the peerstore.automation package. The format of the event files is identical for the pStore and PeerStore simulations, which allowed us to run both prototypes in identical scenarios.

The source code for the simulation prototypes is in the projects/pstore simulation and projects/peerstore simulation directories of the accompanying CD-ROM. To build the pstoreSim.jar and peerstoreSim.jar files, invoke ant pStore and ant peerStore, respectively. The simulations can be run by invoking java -jar [pstore/peerstore]Sim.jar [event-file] [...]

...

2.5 Backup Organization

Since this is a backup system, issues related to building backup semantics on top of a Peer-to-Peer network are discussed here: data storage and retrieval, metadata management, and confidentiality.

2.5.1 Data Storage and Retrieval

Most Peer-to-Peer backup systems favor dealing with data in small, equal-sized blocks. This approach avoids the problem of fragmentation...

...metadata consistency. All peers in the network need to have a consistent view of the whole backup data in order for backup and restore to succeed. Exploiting the distributed nature of Peer-to-Peer networks, we can make each peer manage the metadata of its own backups, but if a host crashes, both the metadata and the backup data are lost. The solution for this will again...
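Section 5.3 notes that PeerStore relies on convergent encryption, which bears on the confidentiality concerns listed for Section 2.5. The Java sketch below illustrates only the general idea, under stated assumptions. The key is derived from the block itself (here SHA-256 truncated to a 128-bit AES key). Encryption is made deterministic with a fixed all-zero IV, so identical blocks from different users produce identical ciphertext and can still be de-duplicated. The class name ConvergentCrypto and the choice of AES/CBC are hypothetical and not taken from the PeerStore source. The code targets a current JDK rather than the Java 1.4 SDK used in the thesis.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/**
 * Hypothetical convergent-encryption helper (illustrative, not PeerStore code).
 * The key is derived from the block itself, so equal plaintext blocks always
 * yield equal ciphertext and remain detectable as duplicates after encryption.
 */
final class ConvergentCrypto {
    private static final String TRANSFORMATION = "AES/CBC/PKCS5Padding";

    private ConvergentCrypto() {}

    /** Derives a 128-bit AES key from the first 16 bytes of SHA-256(block). */
    static byte[] deriveKey(byte[] block) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(block);
            return Arrays.copyOf(digest, 16);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    static byte[] encrypt(byte[] block, byte[] key) {
        return apply(Cipher.ENCRYPT_MODE, block, key);
    }

    static byte[] decrypt(byte[] ciphertext, byte[] key) {
        return apply(Cipher.DECRYPT_MODE, ciphertext, key);
    }

    private static byte[] apply(int mode, byte[] input, byte[] key) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            // The fixed all-zero IV makes encryption deterministic: this is what
            // enables de-duplication and also what weakens confidentiality.
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(new byte[16]));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES operation failed", e);
        }
    }
}
```

The same determinism explains the slightly reduced security mentioned in Section 5.3. Anyone who already holds a candidate block can compute its ciphertext and confirm whether that block is stored in the network.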
...fragmentation, and it can treat files of different sizes in the same manner. Storage locations can also be found easily, especially in a structured network where searching and broadcasting can be done efficiently.

2.5.2 Metadata Management

In a Peer-to-Peer backup system, the task of the metadata is to keep track of the location of backup data stored in the Peer-to-Peer network, and we expect the maintenance...

...I have made the following contributions:
• Investigate various Peer-to-Peer backup issues, including both issues related to the underlying Peer-to-Peer network and those concerning the upper-level backup semantics.
• Examine and compare existing approaches to Peer-to-Peer backup, analyzing their advantages and disadvantages.
• Propose a two-layer Peer-to-Peer backup system: ...

...system must maximize the storage space available and at the same time minimize the backup size. The system has to give users incentives, or force them, to contribute at least as much storage as they use. To minimize the backup size, replication overhead and metadata management need to be kept low, and the system needs a duplicate check and removal mechanism to reduce the actual data stored in...

...List of Tables: 5.1 Comparison of Existing Peer-to-Peer Backup Systems with PeerStore

Chapter 1 Introduction

1.1 Peer-to-Peer Backup

Backing up data is important for most PC users as well as large corporations. However, despite its importance, the usual backup approaches can be costly. Backing up data requires large tapes and off-site storage to regularly store all the important files so as...
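Sections 2.5.1 and 5.3 describe two related steps: splitting data into equal-sized blocks, and checking for duplicates at the sender before any data is transferred. A minimal Java sketch of both follows. It assumes blocks are identified by the SHA-1 hash of their contents. It also assumes the sender can consult a set of block IDs already present in the network; knownIds is a hypothetical stand-in for a lookup in the metadata layer. The block size is a parameter because this excerpt does not state the value PeerStore uses.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper illustrating fixed-size blocks and sender-side duplicate checks. */
final class SenderSideDedup {
    private SenderSideDedup() {}

    /** Splits data into equal-sized blocks; only the final block may be shorter. */
    static List<byte[]> split(byte[] data, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<byte[]> blocks = new ArrayList<>();
        for (int off = 0; off < data.length; off += blockSize) {
            blocks.add(Arrays.copyOfRange(data, off, Math.min(off + blockSize, data.length)));
        }
        return blocks;
    }

    /** Content-derived block identifier: lowercase hex of SHA-1(block). */
    static String blockId(byte[] block) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(block);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    /**
     * Returns only the blocks that must actually be uploaded: blocks whose ID is
     * already in knownIds, or that repeat earlier in the same data, are skipped.
     */
    static List<byte[]> blocksToSend(byte[] data, int blockSize, Set<String> knownIds) {
        Set<String> seen = new HashSet<>(knownIds);
        List<byte[]> toSend = new ArrayList<>();
        for (byte[] block : split(data, blockSize)) {
            if (seen.add(blockId(block))) { // add() returns false for an ID already present
                toSend.add(block);
            }
        }
        return toSend;
    }
}
```

Because the check happens before transfer, a block that already exists in the network costs only a hash computation and a lookup rather than a full upload. This is the source of the traffic savings over receiver-side checking that Section 5.3 reports.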
...spirit of the Peer-to-Peer system. The system should rely solely on decentralized peers: message routing, data dissemination and retrieval should be done in a distributed manner.

Peer-to-Peer Metadata Management: Metadata is important, as it captures the information on how and where backup data is stored in the Peer-to-Peer network. Therefore, the metadata must also be replicated in the Peer-to-Peer network...

...Peer-to-Peer backup, which enables users to back up their data on top of a Peer-to-Peer network in a collaborative fashion. An investigation of recent research on Peer-to-Peer backup shows that most of it has been devoted to long-term archival and publishing systems rather than real backup systems. In this thesis, we first examine and investigate various issues in Peer-to-Peer backup, defining...

...system: PeerStore, which decouples data placement from metadata management so as to relax the strictness that the Distributed Hash Table imposes on data placement.
• Implement two Peer-to-Peer backup systems from scratch in Java. The first system implements the existing approach to Peer-to-Peer backup, and the second implements the newly proposed two-layer system. These two systems are meant for...

...data when peers join and leave the network. For a system with long-term availability, a peer must monitor its partners' uptime and challenge them regularly. In a Peer-to-Peer backup system, long-term availability can be sufficient. The problem with this approach is that restoring the backup data can take a long time, as some portions may be temporarily missing.

...

[Figure 3.9: Generation and Forwarding of Storage Claims in Samsara. Legend: ...trade; asymmetric trade generating a storage claim; asymmetric trade forwarding a storage claim; data block; storage claim.]

...established...
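The passage above on long-term availability states that a peer must monitor its partners' uptime and challenge them regularly. One common way to implement such a challenge is a nonce-based proof of possession, sketched below in Java as an assumption rather than PeerStore's actual protocol. The owner sends a fresh random nonce, the partner returns SHA-256(nonce || block), and the owner recomputes that value from its own copy. The sketch assumes the owner still holds the original block. PartnerChallenge and UptimeRecord are hypothetical helpers; UptimeRecord turns challenge outcomes into a rough availability estimate.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

/** Hypothetical nonce-based storage challenge between a data owner and a partner. */
final class PartnerChallenge {
    private static final SecureRandom RANDOM = new SecureRandom();

    private PartnerChallenge() {}

    /** Owner side: a fresh random nonce per challenge, so old answers cannot be replayed. */
    static byte[] newNonce() {
        byte[] nonce = new byte[16];
        RANDOM.nextBytes(nonce);
        return nonce;
    }

    /** Partner side: proves possession by hashing the nonce together with the stored block. */
    static byte[] respond(byte[] nonce, byte[] storedBlock) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(nonce);
            md.update(storedBlock);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Owner side: recomputes the expected answer from its own copy of the block. */
    static boolean verify(byte[] nonce, byte[] ownCopy, byte[] response) {
        return response != null && MessageDigest.isEqual(respond(nonce, ownCopy), response);
    }
}

/** Hypothetical per-partner record turning challenge outcomes into an availability estimate. */
final class UptimeRecord {
    private int challenges;
    private int passed;

    /** Records one challenge; a timeout or a wrong answer both count as a failure. */
    void record(boolean answeredCorrectly) {
        challenges++;
        if (answeredCorrectly) {
            passed++;
        }
    }

    /** Fraction of challenges passed so far, or 0.0 before any challenge. */
    double availability() {
        return challenges == 0 ? 0.0 : (double) passed / challenges;
    }
}
```

Using a fresh nonce for every challenge prevents a partner from discarding the block and answering with a cached response.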
...The solution for this will again be redundancy, which means replicating...
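Section 5.2 varies a trade ratio but does not define it in this excerpt. The Java sketch below assumes one plausible reading, consistent with the observation that a ratio of 1 yields almost no trades: two peers can trade if the larger of their requested backup sizes is at most the ratio times the smaller. The TradeRatio class and its peersWithPartner method are hypothetical. The method only counts peers that have at least one acceptable candidate. It does not model the matching and trading process of Section 4.5.3, so it does not reproduce Figures 5.5 or 5.6.

```java
/**
 * Hypothetical trade-ratio rule. Assumption: peers wanting to store a and b bytes
 * can trade under ratio r (r >= 1) if max(a, b) <= r * min(a, b).
 */
final class TradeRatio {
    private TradeRatio() {}

    static boolean acceptable(long a, long b, double ratio) {
        if (ratio < 1.0) {
            throw new IllegalArgumentException("trade ratio must be at least 1");
        }
        if (a <= 0 || b <= 0) {
            return false; // nothing to trade
        }
        long smaller = Math.min(a, b);
        long larger = Math.max(a, b);
        return larger <= ratio * smaller;
    }

    /**
     * Counts peers that have at least one candidate partner acceptable to both sides.
     * Partner capacity and one-to-one matching are ignored, so under this assumed rule
     * the result is only an upper bound on how many peers could complete a trade.
     */
    static int peersWithPartner(long[] sizes, double[] ratios) {
        int count = 0;
        for (int i = 0; i < sizes.length; i++) {
            for (int j = 0; j < sizes.length; j++) {
                if (i != j
                        && acceptable(sizes[i], sizes[j], ratios[i])
                        && acceptable(sizes[i], sizes[j], ratios[j])) {
                    count++;
                    break;
                }
            }
        }
        return count;
    }
}
```

With a ratio of exactly 1, only identical sizes match. This mirrors why almost no peer completed a backup at that setting in Section 5.2.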

Posted: 28/11/2015, 13:33
