NoSQL Databases – Amir H. Payberah, April 26, 2012
SQL is Good
● Relational Database Management Systems (RDBMSs) – the mainstay of business
● SQL is good,
The Past and the Moment
[http://www.couchbase.com/sites/default/files/uploads/all/whitepapers/NoSQLWhitepaper.pdf]
Let's Scale RDBMSs: Sharding
● Scaling out (horizontal scaling) based on data partitioning, i.e. dividing the database across many (inexpensive) machines
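As a rough illustration of data partitioning, the sketch below routes each row to one of several machines by hashing its key. The helper `shard_for`, the shard count, and the in-memory dicts standing in for database servers are all assumptions made for this example; the slides do not prescribe a particular partitioning function.

```python
import hashlib

NUM_SHARDS = 4  # assumption: four inexpensive machines

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a row key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Each dict stands in for the database running on one machine.
shards = [dict() for _ in range(NUM_SHARDS)]

def put(key: str, row: dict) -> None:
    """Store the row on the shard that owns its key."""
    shards[shard_for(key)][key] = row

def get(key: str):
    """Look the row up on the shard that owns its key."""
    return shards[shard_for(key)].get(key)

put("user:1", {"name": "Jim"})
put("user:2", {"name": "Fatemeh"})
print(get("user:1"), "is stored on shard", shard_for("user:1"))
```

Modulo-based placement like this is simple, but changing the number of shards remaps most keys, which is one reason scaling a sharded RDBMS is operationally costly.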
Scaling RDBMSs is Expensive and Inefficient
[http://www.couchbase.com/sites/default/files/uploads/all/whitepapers/NoSQLWhitepaper.pdf]
Not Only SQL
NoSQL History
● The term was first used in 1998 by Carlo Strozzi to name his relational database that did not expose the standard SQL interface.
● The term was picked up again in 2009 when a Last.fm developer, Johan Oskarsson, wanted to organize an event to discuss open-source distributed databases.
● The name attempted to label the emergence of a growing number of non-relational, distributed data stores that often did not attempt to provide ACID guarantees.
NoSQL Cost
[http://www.couchbase.com/sites/default/files/uploads/all/whitepapers/NoSQLWhitepaper.pdf]
SQL vs. NoSQL
[http://www.couchbase.com/sites/default/files/uploads/all/whitepapers/NoSQLWhitepaper.pdf]
Consistency
Single storage image. Informally, after an update completes, any subsequent
access will return the updated value.
Quorum Model
● N : the number of nodes to which a data item is replicated.
● R : the number of nodes a value has to be read from to be accepted.
● W : the number of nodes a new value has to be written to before the write operation is finished.
● To enforce strong consistency: R + W > N
● Examples: R = 3, W = 3, N = 5 and R = 4, W = 2, N = 5 (in both cases R + W = 6 > 5)
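The condition R + W > N works because any set of R replicas and any set of W replicas must then share at least one node, so a read always touches a replica holding the latest write. The sketch below checks this by brute force for small N; the function names are hypothetical and chosen only for this example.

```python
from itertools import combinations

def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Quorum rule from the slide: read and write sets overlap when R + W > N."""
    return r + w > n

def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute-force check that every read set shares a node with every write set."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )

for r, w, n in [(3, 3, 5), (4, 2, 5), (2, 2, 5)]:
    print(
        f"R={r}, W={w}, N={n}:",
        "rule says", is_strongly_consistent(n, r, w),
        "/ brute force says", quorums_always_overlap(n, r, w),
    )
```

The third configuration (R = 2, W = 2, N = 5) is included as a counterexample: a read can miss both replicas that received the latest write.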
Relaxing ACID Properties
● Large-scale applications have to be reliable: availability + redundancy
● These properties are difficult to achieve together with the ACID properties.
● The BASE approach forfeits the ACID properties of consistency and isolation in favour of availability, graceful degradation, and performance
CAP Theorem
● Consistency: whether and how a system is in a consistent state after the execution of an operation.
Dynamo
Load Imbalance
● Consistent hashing may lead to imbalance
Node identifiers may not be balanced
Data identifiers may not be balanced
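The imbalance can be observed with a small experiment. The sketch below is a minimal consistent-hashing ring, assuming one ring position per node, MD5 as the hash function, and synthetic key names; none of these details come from the slides, and the node names are simply borrowed from the deck's figures.

```python
import bisect
import hashlib
from collections import Counter

def ring_position(name: str) -> int:
    """Hash a node or data identifier onto the ring [0, 2**32)."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:4], "big")

def build_ring(nodes):
    """Return the ring as a sorted list of (position, node) pairs."""
    return sorted((ring_position(node), node) for node in nodes)

def owner(ring, key: str) -> str:
    """The first node at or clockwise after the key's position stores the key."""
    positions = [pos for pos, _ in ring]
    i = bisect.bisect_left(positions, ring_position(key)) % len(ring)
    return ring[i][1]

nodes = ["Jim", "Cosmin", "Tallat", "Seif", "Fatemeh"]
ring = build_ring(nodes)

# Count how many of 10,000 synthetic keys land on each node.
load = Counter(owner(ring, f"song-{i}.mp3") for i in range(10_000))
for node in nodes:
    print(f"{node:8s} {load[node]:5d} keys")
```

Because each node owns the arc between its predecessor and itself, unevenly spaced node identifiers translate directly into uneven key counts; skewed data identifiers add a second source of imbalance.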
[Figure: example data items fatemeh.mp3 and jim.mp3]
Replication
● To achieve high availability and durability, Dynamo replicates its data on multiple hosts
● The list of nodes that are responsible for storing a particular key is called the preference list
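One way to picture a preference list is as the first N nodes met when walking clockwise around the ring from the key's position. The sketch below assumes one ring position per node and MD5 hashing; the function `preference_list` and its parameters are hypothetical names for this illustration.

```python
import bisect
import hashlib

def ring_position(name: str) -> int:
    """Hash a node or key identifier onto the ring [0, 2**32)."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:4], "big")

def preference_list(nodes, key: str, n: int):
    """Walk clockwise from the key's position and collect the first n nodes."""
    ring = sorted((ring_position(node), node) for node in nodes)
    positions = [pos for pos, _ in ring]
    start = bisect.bisect_left(positions, ring_position(key))
    count = min(n, len(ring))  # cannot replicate to more nodes than exist
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

nodes = ["Jim", "Cosmin", "Tallat", "Seif", "Fatemeh"]
print("Replicas for jim.mp3:", preference_list(nodes, "jim.mp3", 3))
```

With virtual servers, consecutive ring positions may belong to the same physical host, so a fuller version would skip positions until it has collected N distinct hosts.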
[Figure: replication example with labels 5, Jim, Cosmin, Tallat, Seif, and Fatemeh]
Data Versioning
● Version branching can happen due to node failures, network failures/partitions, etc.
Target applications are aware that multiple versions can exist.
If causal: the older version can be forgotten
If concurrent: a conflict exists, requiring reconciliation
● A put requires a context, i.e., which version to update
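The causal/concurrent distinction can be sketched with vector clocks, which is how Dynamo tags versions. In the example below each version carries a dict of per-node counters; the node names `Sx`, `Sy`, `Sz` and the helper functions are illustrative assumptions.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a: dict, b: dict) -> str:
    """Classify two versions as equal, causally ordered, or concurrent."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"   # causal: b can be forgotten
    if descends(b, a):
        return "b supersedes a"   # causal: a can be forgotten
    return "concurrent"           # conflict: needs reconciliation

v1 = {"Sx": 1}
v2 = {"Sx": 2}              # written after v1 through the same node
v3 = {"Sx": 2, "Sy": 1}     # branch handled by Sy
v4 = {"Sx": 2, "Sz": 1}     # branch handled by Sz

print("v2 vs v1:", compare(v2, v1))
print("v3 vs v4:", compare(v3, v4))
```

The context passed to a put would carry the clock of the version being updated, so the store can tell which earlier versions the new write replaces.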
Adding Node
● A new node X is added to the system
X is assigned key ranges w.r.t. its virtual servers
For each key range, the data items in that range are transferred to X
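The effect of adding a node can be simulated: only items that fall into the new node's key ranges change owner, and they all move to the new node. This sketch assumes MD5 hashing, eight virtual servers per node (a made-up number), and synthetic item names.

```python
import bisect
import hashlib

VIRTUAL_SERVERS = 8  # hypothetical number of virtual servers per node

def ring_position(name: str) -> int:
    """Hash an identifier onto the ring [0, 2**32)."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:4], "big")

def build_ring(nodes):
    """Place VIRTUAL_SERVERS positions on the ring for every node."""
    return sorted(
        (ring_position(f"{node}#{i}"), node)
        for node in nodes
        for i in range(VIRTUAL_SERVERS)
    )

def owner(ring, key: str) -> str:
    """The first virtual server at or clockwise after the key owns it."""
    positions = [pos for pos, _ in ring]
    i = bisect.bisect_left(positions, ring_position(key)) % len(ring)
    return ring[i][1]

old_nodes = ["Jim", "Cosmin", "Tallat", "Seif", "Fatemeh"]
keys = [f"item-{i}" for i in range(2000)]

ring_before = build_ring(old_nodes)
ring_after = build_ring(old_nodes + ["X"])

# Items whose owner changed are exactly the ones that must be transferred.
moved = [k for k in keys if owner(ring_before, k) != owner(ring_after, k)]
print(f"{len(moved)} of {len(keys)} items are transferred")
print("all transferred items go to X:", all(owner(ring_after, k) == "X" for k in moved))
```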
BigTable
Table Model
● Distributed multidimensional sparse map
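A toy, single-process model can make the "multidimensional sparse map" concrete: each value is addressed by a (row, column, timestamp) triple, and cells that were never written take no space. The class `SparseTable` and the column names used below are assumptions for illustration, and the distributed aspect is not modelled.

```python
import time
from collections import defaultdict

class SparseTable:
    """Toy in-memory map: (row, column, timestamp) -> value."""

    def __init__(self):
        # row -> column -> list of (timestamp, value), newest first.
        # Cells that were never written simply have no entry.
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp=None):
        """Store a new version of the cell."""
        ts = time.time_ns() if timestamp is None else timestamp
        cells = self._rows[row][column]
        cells.append((ts, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row, column, timestamp=None):
        """Return the newest value, or the newest one at or before timestamp."""
        # .get() avoids creating empty rows or columns on a read.
        for ts, value in self._rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

table = SparseTable()
table.put("com.cnn.www", "contents:", "<html>v1</html>", timestamp=1)
table.put("com.cnn.www", "contents:", "<html>v2</html>", timestamp=2)
print(table.get("com.cnn.www", "contents:"))               # newest version
print(table.get("com.cnn.www", "contents:", timestamp=1))  # version as of timestamp 1
print(table.get("com.cnn.www", "anchor:cnnsi.com"))        # cell never written
```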
Table Model: Rows
● Every read or write in a row is atomic
● Rows are sorted in lexicographical order
Example row key: “com.cnn.www”
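The row key “com.cnn.www” is a hostname with its components reversed. Combined with lexicographic ordering, this keeps pages from the same domain next to each other. The sketch below illustrates the effect; the helper `row_key` and the list of hostnames are made up for this example.

```python
def row_key(hostname: str) -> str:
    """Reverse the hostname components, e.g. www.cnn.com -> com.cnn.www."""
    return ".".join(reversed(hostname.split(".")))

hosts = [
    "www.cnn.com",
    "maps.google.com",
    "money.cnn.com",
    "www.google.com",
    "edition.cnn.com",
]

# Sorting the reversed keys groups rows by domain.
keys = sorted(row_key(h) for h in hosts)
for key in keys:
    print(key)
```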
Tablets: Pieces of a Table
● A table starts as one tablet
As it grows, it is split into multiple tablets.
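Tablet splitting can be sketched as follows, modelling each tablet as a sorted list of row keys that covers a contiguous row range. The row-count threshold `MAX_ROWS_PER_TABLET` and the function names are hypothetical; a real system would decide when to split based on tablet size rather than row count.

```python
import bisect

MAX_ROWS_PER_TABLET = 4  # hypothetical threshold for this sketch

def split_if_needed(tablet):
    """Split a sorted tablet at its middle row once it exceeds the threshold."""
    if len(tablet) <= MAX_ROWS_PER_TABLET:
        return [tablet]
    mid = len(tablet) // 2
    return [tablet[:mid], tablet[mid:]]

def insert_row(tablets, row_key):
    """Insert row_key into the tablet covering its range, then split if needed."""
    for i, tablet in enumerate(tablets):
        # The last tablet covers every key beyond the earlier tablets' ranges.
        if i == len(tablets) - 1 or row_key <= tablet[-1]:
            bisect.insort(tablet, row_key)
            tablets[i:i + 1] = split_if_needed(tablet)
            return

tablets = [[]]  # a table starts as one (empty) tablet
row_keys = [f"com.{letter}" for letter in "abcdefghij"]
for key in row_keys:
    insert_row(tablets, key)

for tablet in tablets:
    print(tablet)
```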
Major Components
● Tablet server
● Master server
● Client library
Major Components – Client Library
● Library that is linked into every client
● Client data does not move through the master
● Clients communicate directly with tablet servers for reads/writes
High-level Structure
The master is responsible for detecting when a tablet server is no longer serving its tablets and for reassigning those tablets as soon as possible.
[Figure: tablet serving, with the memtable in memory and the tablet log and SSTables in GFS]
● Write operations are logged
● Recent updates are kept sorted in memory (memtable)
● The memtable and SSTables are merged to serve a read request
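The write and read paths can be sketched in a few lines. The `Tablet` class below is a simplified, hypothetical model: the log is a plain list, the memtable is a dict that is only sorted when it is flushed, and SSTables are immutable sorted tuples scanned linearly.

```python
class Tablet:
    """Toy model of tablet serving: log, memtable, and SSTables."""

    def __init__(self):
        self.log = []       # stand-in for the tablet log
        self.memtable = {}  # recent updates (sorted on flush in this sketch)
        self.sstables = []  # immutable sorted tuples of (key, value), newest first

    def write(self, key, value):
        """Log the write first, then apply it to the memtable."""
        self.log.append((key, value))
        self.memtable[key] = value

    def flush(self):
        """Freeze the memtable into a new sorted, immutable SSTable."""
        if self.memtable:
            self.sstables.insert(0, tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key):
        """Merge view: the memtable wins, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in self.sstables:
            for k, v in table:  # linear scan; a real SSTable would use an index
                if k == key:
                    return v
        return None

tablet = Tablet()
tablet.write("a", 1)
tablet.flush()            # "a" now lives in an SSTable
tablet.write("a", 2)      # newer value in the memtable shadows the SSTable
tablet.write("b", 3)
print(tablet.read("a"), tablet.read("b"), tablet.read("z"))
```

Checking the newest data first is what lets later writes shadow older values without rewriting the immutable SSTables.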
[Figure: a client sends requests to tablet servers and receives responses]
Cassandra
Any Questions?