A distributed key-value storage solution for large-scale systems

DISTRIBUTED KEY-VALUE STORE FOR LARGE-SCALE SYSTEMS

Thanh Trung Nguyen

The thesis is submitted to Le Quy Don Technical University for the degree of Ph.D. at the Faculty of Information Technology

Research Supervisor: Assoc. Prof. Dr. Hieu Minh Nguyen

Hanoi, 2016

Declaration

I declare that this thesis contains no material that has been accepted for the award of any other degree or diploma in any university or other institution. To the best of my knowledge and belief, this thesis contains no material that is previously published or written by another person, except where due reference is made in the text of the thesis.

PhD Candidate
Nguyen Trung Thanh

Abstract

In recent decades, network application systems have been growing rapidly. Not only business data processing and Online Transaction Processing (OLTP) applications, but also many new types of application such as text management (e.g. search engines), data warehouses, stream processing, and scientific and intelligent databases have been developed and are being researched. In particular, the size of many applications, such as social networks, online commerce systems and personal cloud storage, has increased exponentially. In such huge applications, building a high-performance, scalable data storage system is one of the biggest challenges. To address the data storage problem efficiently, new mechanisms for building data storage have been developed to fill the gap that the traditional relational database management system (RDBMS) cannot. These new data storage mechanisms are called NoSQL.

The key-value store is one of the NoSQL data store schemes, and it plays an important role in many large systems. A distributed key-value store is an extension of the key-value store that supports data distribution across multiple servers. The quality and capability of a distributed key-value store depend on many factors, including the performance of the key-value store in each node, the efficiency of the data distribution algorithms, the structure of the storage system for a specific data type, the consistency model, the ability to store big values and big data structures, etc.

There are three important questions in building high-performance distributed key-value stores: How to minimize latency and maximize throughput of the key-value store with minimum memory overhead in the persistent layer? How to store big values in a key-value store, or how to manage a large number of big files in cloud storage with the advantages of a distributed key-value store? How to efficiently store and distribute big data structures in a key-value store?

This thesis attempts to address these questions by studying the movement from RDBMS to NoSQL databases and exploring techniques for designing key-value stores. After analyzing some common approaches for moving from RDBMS to NoSQL, we conducted several experiments to reveal the mechanisms of each approach. For one of the most popular key types (the auto-increasing integer key), a high-performance key-value store is proposed to minimize the latency of both read and write operations. This method ensures that there is at most one disk seek per operation and that the memory overhead per key is fixed.

After that, we analyzed some popular existing personal cloud storage systems, which showed that the space complexity of file metadata in these systems is linear in the file size, i.e. O(n). In other words, most existing key-value stores and database systems lack the ability to efficiently store big values such as big files in cloud storage. Consequently, we proposed a new architecture and algorithms for a big-file cloud that use the advantages of the key-value store to reduce the space complexity of file metadata from O(n) to O(1).

Finally, we proposed the Forest of Distributed B+Tree, based on a key-value store, for storing big data structures such as sets and maps. The novelty of our method is that this structure supports distributing partial values. This ability is not supported in some existing popular systems such as BigTable and Cassandra, where each row must fit in one server. Our method allows us to build huge-row storage in which rows are larger than in Google BigTable and Cassandra. In summary, this thesis studies and proposes methods for efficiently storing data in large-scale systems.

Acknowledgements

First of all, I would like to express my gratitude to my supervisor Minh Hieu Nguyen for his guidance, experience and encouragement throughout my PhD journey. I would like to thank Uy Quang Nguyen for his support and many helpful insights for my research. I thank Associate Professor Lam Thu Bui for his meaningful questions in the early stage of my research; they helped me a lot to improve my academic point of view as a researcher. I thank my cool office mates, Loi Van Cao, Thien Duc Nguyen, etc., for fun times and for a lot of interesting discussions. I want to give my appreciation to Dung Hong Luu from the Network Security group for his support, advice and collaboration. I also take this opportunity to thank all my colleagues in the Department of Network Security for their effort to make the department such an excellent environment to work in. I would like to specially thank Thanh Ta Minh and Ly Vu Thi for always being helpful and responsive. I thank the Research Fund RF @ K12, which supported me a lot in publishing my research results. I thank the Research and Development Department of VNG Corporation for providing big infrastructure and large real data sets for this research. I thank friends in the Research and Development Department of VNG Corporation: Anh Nguyen Hoai, Tin Khac Vu, my little brother Trung Thanh Nguyen and Tung Chi Vu. You all have made the team such a great place to work and helped me to apply our research results in real products. Most importantly, I wish to thank my family for their endless and unconditional love, and for their sustained support and encouragement. I am so grateful that they have always been there for me.

Contents

Declaration
Abstract
List of Figures
List of Tables
Abbreviations

1 Introduction
  1.1 Key-Value Store Overview
  1.2 Big Data Challenges and Motivation
  1.3 MCS: Data Storage Framework
    1.3.1 Memory Cache
    1.3.2 Key-Value Store Abstraction
    1.3.3 Service Model
    1.3.4 Commit-Log
  1.4 Problem Statements
  1.5 Contributions
  1.6 Thesis structure

2 Backgrounds
  2.1 Overview
    2.1.1 The development of NoSQL
    2.1.2 Scalable Data Management for Cloud Computing and Big Data
  2.2 Basic concepts
    2.2.1 ACID Properties
    2.2.2 Consistency, Availability and Partition tolerance
    2.2.3 Eventual Consistency
    2.2.4 The BASE Consistency Model
    2.2.5 Partitioning
    2.2.6 Data structures for persistent layer in key-value store
  2.3 Cloud Storage Benchmarks and Workloads

3 High performance key-value store for large-scale storage service
  3.1 Introduction
  3.2 Related works
  3.3 Sequential Log Storage Model
  3.4 Proposed Key-Value Store
    3.4.1 Data structure for the Index
    3.4.2 Data Layout in Persistent File
    3.4.3 Key-Value Data Table and Main Algorithms
    3.4.4 Implementation
  3.5 Analysis and Comparison with other key-value stores
  3.6 Performance Evaluation
    3.6.1 Standard Benchmark
    3.6.2 Engine Evaluation
    3.6.3 Discussion
  3.7 Summary

4 High-Performance Distributed Big-File Storage Based On Key-Value Store
  4.1 Introduction
  4.2 Related Works
  4.3 Proposed Method for Big File Storage System
    4.3.1 General Big File Model
    4.3.2 Architecture Overview
    4.3.3 Logical Data layout
    4.3.4 Chunk Storage
    4.3.5 Metadata
    4.3.6 Data distribution and replication
    4.3.7 Uploading and deduplication algorithm
    4.3.8 Downloading algorithm
    4.3.9 Secure Data Transfer Protocol
  4.4 Evaluation
    4.4.1 Evaluate Key-Value store for BFC
    4.4.2 Locally performance comparison
    4.4.3 Metadata comparison
    4.4.4 Deduplication
  4.5 Summary

5 Forest of distributed B+Tree for storing large number of big and growing sets based on key-value store
  5.1 Introduction
  5.2 Big Set Problem Statement
    5.2.1 Problem
    5.2.2 Complexity
  5.3 Related works
  5.4 Forest of Distributed B+ Tree for solving Big-Set Problem
    5.4.1 Method Overview
    5.4.2 Forest of Distributed B+Tree Definition
    5.4.3 Leaf Nodes of Distributed B+Tree
    5.4.4 Non-leaf Nodes of Distributed B+Trees
    5.4.5 Forest of Distributed B+Tree
    5.4.6 General key-value store using Forest of Distributed B+Tree
  5.5 Evaluation
  5.6 Discussion
  5.7 Applications of ZDB and Forest of Distributed B+Tree
    5.7.1 Computing Architecture for Anomaly Detection System
    5.7.2 Specific Storage Solution for Specific Structured Data
  5.8 Summary

Conclusion and Future works
Publications
Bibliography

List of Figures

1.1 Storage Backend Framework overview
1.2 Contributions Overview
2.1 Consistent Hashing
2.2 B-Tree Example
3.1 Proposed key-value store architecture
3.2 Data file layout with data files
3.3 Put, Get, Remove algorithms of Flat Table
3.4 Data partitioning
3.5 Write only 1KB records using YCSB
3.6 Write only 4KB records using YCSB
3.7 High read / low write 1KB records using YCSB
3.8 High read / low write 4KB records using YCSB
3.9 High Write / low read 1KB records using YCSB
3.10 High Write / low read 4KB records using YCSB
4.1 BFC Architecture
4.2 BFC Main Backend Components
4.3 Data layout of Big File in the system
4.4 Chunk storage system

... one disk seek. All writes can be configured to be sequential to achieve the best write performance on both SSD and HDD. The proposed key-value store uses a shared-memory Flat Index to quickly look up the position of a key-value pair in the data file without unnecessary disk access. The proposed key-value store is optimized for auto-increasing integer keys, one of the most popular key types. A high-performance key-value store called Zing Database (ZDB) is implemented. The results are presented
in the first two publications of this thesis.

Secondly, this thesis proposes an architecture for building big-file cloud storage based on a key-value store. It takes advantage of the proposed key-value store to minimize the size of metadata when a system manages a large number of big files serving millions of users. For storing big files (big values) in a key-value store, every file has metadata of the same size. Each big file is split into multiple fixed-size chunks that are stored in ZDB. The chunks of a file have a contiguous ID range, so it is easy to distribute data and scale out the storage system, especially when using ZDB. This thesis proposes a method to store big files efficiently with the advantages of a key-value store.

Finally, this thesis proposes the Forest of Distributed B+Tree based on a key-value store. This result efficiently converts a binary key space into an auto-increasing integer key space. It is useful for building scalable NoSQL data storage for large data structures such as big sets and wide-column data. Data is distributed in the key-value store automatically, which makes it easy to scale the system. Every big set, as a value in a key-value store, is split into multiple small sets, which are stored in the distributed ZDB key-value store. This supports building scalable data storage systems for big data structures. The experimental results show that the Forest of Distributed B+Tree performs well in both read and write operations. Moreover, it is a general key-value store that supports binary keys efficiently and preserves order.

In future, we will continue to extend and research storage architectures for big data. We will first focus on data storage systems that support computing in data mining systems more efficiently, such as large time series storage. In the "Internet of Things" trend, data from millions of sensors, each with multiple long time series, needs to be stored for efficient querying and mining. Second, we will research how to make our storage systems more secure in network environments, to make them
not only high performance but also secure.

Publications

[1] Thanh Trung Nguyen, Minh Hieu Nguyen. "ZDB-High performance key-value store." In Proceedings of the 2013 Third World Congress on Information and Communication Technologies (WICT 2013).
[2] Thanh Trung Nguyen, Minh Hieu Nguyen. "Zing Database: high-performance key-value store for large-scale storage service." Vietnam Journal of Computer Science, February 2015, Volume 2, Issue 1, pp. 13-23.
[3] Thanh Trung Nguyen, Tin Khac Vu, Minh Hieu Nguyen. "BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value Store." In Proceedings of the 16th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2015).
[4] Thanh Trung Nguyen, Anh Tuan Nguyen, Tuan Anh Ha Nguyen, Ly Thi Vu, Quang Uy Nguyen, and Long Dao Hai. 2015. "Unsupervised Anomaly Detection in Online Game." In Proceedings of the Sixth International Symposium on Information and Communication Technology (SoICT 2015). ACM, New York, NY, USA, 4-10. DOI=http://dx.doi.org/10.1145/2833258.2833305
[5] Thanh Trung Nguyen, Minh Hieu Nguyen. "Forest of Distributed B+Tree Based On Key-Value Store for Big-Set Problem." In Database Systems for Advanced Applications, Volume 9645 of the series Lecture Notes in Computer Science, pp. 268-282, 2016.
[6] Thanh Trung Nguyen, Minh Hieu Nguyen. "Distributed and High Performance Big-File Cloud Storage Based On Key-Value Store." International Journal of Networked and Distributed Computing, Volume 4, Issue 3, pp. 159-172, July 2016.
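
As an illustration of the big-file scheme summarized in the conclusion (fixed-size chunks with a contiguous ID range, and metadata of the same size for every file), the following is a minimal Python sketch. It is not the BFC or ZDB implementation: the names IntKeyStore, FileMeta, upload, download and read_at, the four metadata fields, and the in-memory dictionary standing in for ZDB are all assumptions made for illustration.

```python
from dataclasses import dataclass


class IntKeyStore:
    """Toy in-memory stand-in (assumption) for a store with auto-increasing integer keys."""

    def __init__(self):
        self._data = {}
        self._next_id = 0

    def reserve_ids(self, count):
        """Reserve `count` consecutive IDs and return the first one."""
        first = self._next_id
        self._next_id += count
        return first

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical fixed-size metadata: four integers, whatever the file size."""
    first_chunk_id: int
    chunk_count: int
    file_size: int
    chunk_size: int


def upload(store, payload, chunk_size):
    """Split `payload` into fixed-size chunks stored under a contiguous ID range."""
    chunk_count = (len(payload) + chunk_size - 1) // chunk_size  # ceiling division
    first = store.reserve_ids(chunk_count)
    for i in range(chunk_count):
        store.put(first + i, payload[i * chunk_size:(i + 1) * chunk_size])
    return FileMeta(first, chunk_count, len(payload), chunk_size)


def download(store, meta):
    """Reassemble the file by reading the chunk IDs in order."""
    return b"".join(
        store.get(meta.first_chunk_id + i) for i in range(meta.chunk_count)
    )


def read_at(store, meta, offset):
    """Return the byte at `offset`, fetching only the chunk that contains it."""
    if not 0 <= offset < meta.file_size:
        raise IndexError(offset)
    chunk = store.get(meta.first_chunk_id + offset // meta.chunk_size)
    return chunk[offset % meta.chunk_size]
```

Because the metadata in this sketch records only the first chunk ID and the chunk count rather than a list of chunk keys, its size does not grow with the file, and any byte offset maps to a single chunk ID by integer division.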

Format
Pages	150
File size	3.5 MB