However, it is possible to use the NDB Cluster technologies without the MySQL server, although this requires lower-level programming with the NDB API. The NDB API is object-oriented and implements indexes, scans, transactions, and event handling. This allows you to write applications that retrieve, store, and manipulate data in the cluster. The NDB API also provides object-oriented error-handling facilities to allow orderly shutdown or recovery during failures. If you are a developer and want to learn more about the NDB API, see the MySQL NDB API online documentation.

How Does MySQL Cluster Differ from MySQL?

You may be wondering, “What is the difference between a cluster and replication?” There are several definitions of clustering, but it can generally be viewed as something that has membership, messaging, redundancy, and automatic failover capabilities. Replication, in contrast, is simply a way to send messages (data) from one server to another. We discuss replication within a cluster (also called local replication) and MySQL replication in more detail later in this chapter.

Typical Configuration

You can view the MySQL Cluster as having three layers:

• Applications that communicate with the MySQL server
• The MySQL server that processes the SQL commands and communicates with the NDB storage engine
• The NDB Cluster components (sometimes called data nodes) that process the queries and return the results to the MySQL server

You can scale up each layer independently with more server processes to increase performance.

Figure 15-1 shows a conceptual drawing of a typical cluster installation.

The applications connect to the MySQL server, which accesses the NDB Cluster components via the storage engine layer (specifically, the NDB storage engine). We will discuss the NDB Cluster components in more detail momentarily. There are many possible configurations.
You can use multiple MySQL servers to connect to a single NDB Cluster and even connect multiple NDB Clusters via MySQL replication. We will discuss more of these configurations in later sections.

What Is MySQL Cluster? | 527

Features of MySQL Cluster

To satisfy the goals of having the highest achievable performance, high availability, and redundancy, data is replicated inside the cluster among the peer data nodes. The data is replicated using a synchronous mechanism in which each data node connects to every other data node and data is stored on multiple data nodes.

It is also possible to replicate data between clusters, but in this case you use MySQL replication, which is asynchronous rather than synchronous. As we’ve discussed in previous chapters, asynchronous replication means you must expect a delay in updating the slaves, slaves do not report back the progress in committing changes, and you cannot expect a consistent view across all servers in the replicated architecture as you can within a single MySQL cluster.

Figure 15-1. MySQL Cluster

528 | Chapter 15: MySQL Cluster

MySQL Cluster has several specialized features for creating a highly available system. The most significant ones are:

Node recovery
Data node failures can be detected via either communication loss or heartbeat failure, and you can configure the nodes to restart automatically using copies of the data from the remaining nodes. Failure and recovery can involve single or multiple storage nodes. This is also called local recovery.

Logging
During normal data updates, copies of the data change events are written to a log stored on each data node. You can use the logs to restore the data to a point in time.

Checkpointing
The cluster supports two forms of checkpoints, local and global. Local checkpoints remove the tail of the log.
Global checkpoints are created when the logs of all data nodes are flushed to disk, creating a transaction-consistent snapshot of all node data on disk. In this way, checkpointing permits a complete system restore of all nodes from a known good synchronization point.

System recovery
In the event the whole system is shut down unexpectedly, you can restore it using checkpoints and change logs. Typically, the data is copied from disk into memory from known good synchronization points.

Hot backup and restore
You can create simultaneous backups of each data node without disturbing executing transactions. The backup includes the metadata about the objects in the database, the data itself, and the current transaction log.

No single point of failure
The architecture is designed so that any node can fail without bringing down the database system.

Failover
To ensure node recovery is possible, all transactions are committed using read committed isolation and two-phase commits. Transactions are then doubly safe; that is, they are stored in two separate locations before the user receives acknowledgment of the transaction.

Partitioning
Data is automatically partitioned across the data nodes. MySQL Cluster in MySQL version 5.1 and later supports user-defined partitioning.

Online operations
You can perform many of the maintenance operations online without the normal interruptions. These are operations that normally require stopping a server or placing locks on data. For example, it is possible to add new data nodes online, alter table structures, and even reorganize the data in the cluster.

For more information about MySQL Cluster, see the online reference manual.

Local and Global Redundancy

You can create local redundancy (inside a particular cluster) using a two-phase commit protocol.
In principle, each node goes through a round in which it agrees to make a change, then undergoes a round in which it commits the transaction. During the agreement phase, each node ensures that there are enough resources to commit the change in the second round. In NDB Cluster, the MySQL server commit protocol changes to allow updates to multiple nodes. NDB Cluster also has an optimized version of two-phase commit that reduces the number of messages sent using synchronous replication. The two-phase protocol ensures the data is redundantly stored on multiple data nodes, a state known as local redundancy.

Global redundancy uses MySQL replication between clusters. This establishes two nodes in a replication topology. As discussed previously, MySQL replication is asynchronous because it does not include an acknowledgment or receipt for arrival or execution of the events replicated. Figure 15-2 illustrates the differences.

Figure 15-2. Local and global redundancy

Log Handling

MySQL Cluster implements two types of checkpoints: local checkpoints to purge part of the redo log and a global checkpoint that is mainly for synchronizing between the different data nodes. The global checkpoint becomes important for replication because it forms the boundary between sets of transactions known as epochs. Each epoch is replicated between clusters as a single unit. In fact, MySQL replication treats the set of transactions between two consecutive global checkpoints as a single transaction.

Redundancy and Distributed Data

Data redundancy uses replicas. Each replica has a copy of the data. This allows a cluster to be fault tolerant. If any data node fails, you can still access the data. Naturally, the more replicas you allow in a cluster, the more fault tolerant the cluster will be.
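The two rounds of the commit protocol described under Local and Global Redundancy can be sketched in a few lines of Python. This is a minimal illustration of generic, textbook two-phase commit, not of NDB Cluster's optimized variant. The DataNode class, its free_slots counter (standing in for "enough resources"), and the two_phase_commit function are hypothetical names invented for this example and are not part of any MySQL API.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical stand-in for a data node; not a MySQL or NDB API."""
    name: str
    free_slots: int                      # assumed measure of available resources
    pending: dict = field(default_factory=dict)
    committed: dict = field(default_factory=dict)

    def prepare(self, txn_id, change):
        # Round 1: agree only if there are enough resources to commit later.
        if self.free_slots <= 0:
            return False
        self.free_slots -= 1
        self.pending[txn_id] = change
        return True

    def commit(self, txn_id):
        # Round 2: apply the change this node agreed to in round 1.
        self.committed[txn_id] = self.pending.pop(txn_id)

    def abort(self, txn_id):
        # Release the reservation, but only if this node had agreed.
        if txn_id in self.pending:
            del self.pending[txn_id]
            self.free_slots += 1


def two_phase_commit(nodes, txn_id, change):
    """Return True if every node committed, False if the change was aborted."""
    votes = [node.prepare(txn_id, change) for node in nodes]
    if all(votes):
        for node in nodes:
            node.commit(txn_id)
        return True
    for node in nodes:
        node.abort(txn_id)
    return False


nodes = [DataNode("A", free_slots=2), DataNode("B", free_slots=1)]
first = two_phase_commit(nodes, 1, {"id": 1})    # both nodes agree
second = two_phase_commit(nodes, 2, {"id": 2})   # node B has no resources left
print(first, second)
```

In the second call only node A agrees, so the change is aborted everywhere and A's reservation is released. The point of the sketch is that a change is stored on all participating nodes or on none of them.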
Split-Brain Syndrome

If one or more data nodes fail, it is possible that the remaining data nodes will be unable to communicate. When this happens, the two sets of data nodes are in a split-brain scenario. This type of situation is undesirable, because each set of data nodes could theoretically perform as a separate cluster.

To overcome this, you need a network partitioning algorithm to decide between the competing sets of data nodes. The decision is made in each set independently. The set with the minority of nodes will be restarted and each node of that set will need to join the majority set individually.

If the two sets of nodes are exactly the same size, a theoretical problem still exists. If you split four nodes into two sets with two nodes in each, how do you know which set is a minority? For this purpose, you can define an arbitrator. In the case that the sets are exactly the same size, the set that first succeeds in contacting the arbitrator wins.

You can designate the arbitrator as either a MySQL server (SQL node) or a management node. For best availability, you should locate the arbitrator on a system that does not host a data node.

The network partitioning algorithm with arbitration is fully automatic in MySQL Cluster, and the minority is defined with respect to node groups to make the system even more available than it would be compared to just counting the nodes.

You can specify how many copies of the data (NoOfReplicas) exist in the cluster. You need to set up at least as many data nodes as you want replicas. You can also distribute the data across the data nodes using partitioning. In this case, each data node has only a portion of the data, making queries faster. But since you have multiple copies of the data, you can still query the data in the event that a node fails, and the recovery of the missing node is assured (because the data exists in the other replicas). To achieve this, you need multiple data nodes for each replica. For example, if you want two replicas
For example, if you want two replicas What Is MySQL Cluster? | 531 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark. and partitioning, you need to have at least four data nodes (two data nodes for each replica). Architecture of MySQL Cluster MySQL Cluster is composed of one or more MySQL servers communicating via the NDB storage engine to an NDB cluster. An NDB cluster itself is composed of several components: data or storage nodes that store and retrieve the data and one or more management nodes that coordinate startup, shutdown, and recovery of data nodes. Most of the NDB components are implemented as daemon processes, while MySQL Cluster also offers client utilities to manipulate the daemons’ features. A list of the daemons and utilities follows. Figure 15-3 depicts how each of these components communicates. mysqld The MySQL server NDBd A data node NDBmtd A multithreaded data node NDB_mgmd The cluster’s management server NDB_mgm The cluster’s management client Each MySQL server with the executable name mysqld typically supports one or more applications that issue SQL queries and receive results from the data nodes. When discussing MySQL Cluster, the MySQL servers are sometimes called SQL nodes. The data nodes are NDB daemon processes that store and retrieve the data either in memory or on disk depending on their configuration. Data nodes are installed on each server participating in the cluster. There is also a multithreaded data node daemon named NDBmtd that works on platforms that support multiple CPU cores. You can see improved data node performance if you use the multithreaded data node on dedi- cated servers with modern multiple-core CPUs. The management daemon, NDB_mgmd, runs on a server and is responsible for reading a configuration file and distributing the information to all of the nodes in the cluster. 
ndb_mgm, the NDB management client utility, can check the cluster’s status, start backups, and perform other administrative functions. This client runs on a host convenient to the administrator and communicates with the daemon.

There are also a number of utilities that make maintenance easier. A few of the more popular ones follow. Consult the NDB Cluster documentation for a complete list.

ndb_config
Extracts configuration information from existing nodes.

ndb_delete_all
Deletes all rows from an NDB table.

ndb_desc
Describes NDB tables (like SHOW CREATE TABLE).

ndb_drop_index
Drops an index from an NDB table.

ndb_drop_table
Drops an NDB table.

ndb_error_reporter
Diagnoses errors and problems in a cluster.

ndb_redo_log_reader
Checks and prints out a cluster redo log.

ndb_restore
Performs a restore of a cluster. Backups are made using the NDB management client.

Figure 15-3. The MySQL Cluster components

How Data Is Stored

MySQL Cluster keeps all indexed columns in main memory. You can store the remaining nonindexed columns either in memory or on disk with an in-memory page cache. Storing nonindexed columns on disk allows you to store more data than the size of available memory.

When data is changed (via INSERT, UPDATE, DELETE, etc.), MySQL Cluster writes a record of the change to a redo log, checkpointing data to disk regularly. As described previously, the log and the checkpoints permit recovery from disk after a failure. However, because the redo logs are written asynchronously with respect to the commit, it is possible that a limited number of transactions can be lost during a failure. To mitigate this possibility, MySQL Cluster implements a write delay (with a default of two seconds, but this is configurable).
This allows the checkpoint write to complete so that if a failure occurs, the last checkpoint is not lost as a result of the failure. Normal failures of individual data nodes do not result in any data loss due to the synchronous data replication within the cluster.

When a MySQL Cluster table is maintained in memory, the cluster accesses disk storage only to write records of the changes to the redo log and to execute the requisite checkpoints. Since writing the logs and checkpoints is sequential and few random access patterns are involved, MySQL Cluster can achieve higher write throughput rates with limited disk hardware than the traditional disk caching used in relational database systems.

You can calculate the size of memory you need for a data node using the following formula. The size of the database is the sum of the size of the rows times the number of rows for each table. Keep in mind that if you use disk storage for nonindexed columns, you should count only the indexed columns in calculating the necessary memory.

(SizeofDatabase × NumberOfReplicas × 1.1) / NumberOfDataNodes

This is a simplified formula for rough calculation. When planning the memory of your cluster, you should consult the online MySQL Cluster Reference Manual for additional details to consider.

You can also use the Perl script ndb_size.pl found in most distributions. This script connects to a running MySQL server, traverses all the existing tables in a set of databases, and calculates the memory they would require in a MySQL cluster. This is convenient, because it permits you to create and populate the tables on a normal MySQL server first, then check your memory configuration before you set up, configure, and load data into your cluster. It is also useful to run periodically to keep ahead of schema changes that can result in memory issues and to give you an idea of your memory usage. Example 15-1 depicts a sample report for a simple database with a single table.
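The rough memory formula above is easy to turn into a small calculation. The following Python sketch is only an illustration of that simplified formula. The function names are hypothetical, and the input figures (84 bytes per row, 64,000 rows, two replicas, four data nodes) are sample values chosen for the example rather than a recommendation.

```python
def table_bytes(bytes_per_row, rows):
    """Size of one table: row size times the number of rows."""
    return bytes_per_row * rows


def data_node_memory_bytes(database_bytes, replicas, data_nodes, overhead=1.1):
    """Rough memory per data node: (size x replicas x 1.1) / data nodes."""
    if replicas <= 0 or data_nodes <= 0:
        raise ValueError("replicas and data_nodes must be positive")
    return database_bytes * replicas * overhead / data_nodes


# Illustrative inputs only: a single table, two replicas, four data nodes.
database_size = table_bytes(bytes_per_row=84, rows=64_000)
per_node = data_node_memory_bytes(database_size, replicas=2, data_nodes=4)
print(f"database: {database_size} bytes, per data node: {per_node:.0f} bytes")
```

With these sample inputs the table occupies 5,376,000 bytes and each data node needs roughly 2,956,800 bytes. Treat the result as a starting point and size real clusters with the reference manual's more detailed guidance.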
To find the total size of the database, multiply the size of the data row from the summary by the number of rows. In Example 15-1, we have (for MySQL version 5.1) 84 bytes per row for data and index. If we had 64,000 rows, we would need to have 5,376,000 bytes of memory to store the table.

If the script generates an error about a missing Class/MethodMaker.pm module, you need to install this class on your system. For example, on Ubuntu you can install it with the following command:

    sudo apt-get install libclass-methodmaker-perl

Example 15-1. Checking the size of a database with ndb_size.pl

    cbell@cbell-mini:~/mysql-cluster-gpl-7.0.13-linux-i686-glibc23/bin$ ./ndb_size.pl \
        --database=cluster_test --user=root
    ndb_size.pl report for database: 'cluster_test' (1 tables)
    ----------------------------------------------------------
    Connected to: DBI:mysql:host=localhost

    Including information for versions: 4.1, 5.0, 5.1

    cluster_test.City
    -----------------

    DataMemory for Columns (* means varsized DataMemory):
             Column Name        Type  Varsized  Key   4.1   5.0   5.1
                district    char(20)                   20    20    20
              population     int(11)                    4     4     4
                   ccode     char(3)                    4     4     4
                    name    char(35)                   36    36    36
                      id     int(11)            PRI     4     4     4
                                                       --    --    --
    Fixed Size Columns DM/Row                          68    68    68
       Varsize Columns DM/Row                           0     0     0

    DataMemory for Indexes:
                   Index Name    Type     4.1   5.0   5.1
                      PRIMARY   BTREE     N/A   N/A   N/A
                                           --    --    --
           Total Index DM/Row               0     0     0

    IndexMemory for Indexes:
                   Index Name             4.1   5.0   5.1
                      PRIMARY              29    16    16
                                           --    --    --
               Indexes IM/Row              29    16    16

    Summary (for THIS table):
                                          4.1   5.0   5.1
        Fixed Overhead DM/Row              12    12    16
               NULL Bytes/Row               0     0     0
               DataMemory/Row              80    80    84
                        (Includes overhead, bitmap and indexes)

      Varsize Overhead DM/Row               0     0     8
       Varsize NULL Bytes/Row               0     0     0
           Avg Varside DM/Row               0     0     0

                     No. Rows               3     3     3
            Rows/32kb DM Page             408   408   388
    Fixedsize DataMemory (KB)              32    32    32

    Rows/32kb Varsize DM Page               0     0     0
      Varsize DataMemory (KB)               0     0     0

             Rows/8kb IM Page             282   512   512
             IndexMemory (KB)               8     8     8

    Parameter Minimum Requirements
    ------------------------------
    * indicates greater than default

                    Parameter   Default   4.1   5.0   5.1
              DataMemory (KB)     81920    32    32    32
           NoOfOrderedIndexes       128     1     1     1
                   NoOfTables       128     1     1     1
             IndexMemory (KB)     18432     8     8     8
        NoOfUniqueHashIndexes        64     0     0     0
               NoOfAttributes      1000     5     5     5
                 NoOfTriggers       768     5     5     5

Notice that while Example 15-1 uses a very simple table, the output shows not only the row size, but also a host of statistics for the tables in the database. The report also shows the indexing statistics, which are the key mechanism the cluster uses for high performance.

The script displays the different memory requirements across MySQL versions. This allows you to see any differences if you are working with older versions of MySQL Cluster.

Partitioning

One of the most important aspects of MySQL Cluster is data partitioning. MySQL Cluster partitions data horizontally. That is, the rows are automatically divided among the data nodes using a function to distribute the rows. This is based on a hashing algorithm that uses the primary key for the table. In early versions of MySQL, the software used an internal mechanism for partitioning, but MySQL versions 5.1 and later allow you to provide your own function for partitioning data. If you use your own function for partitioning, you should create a function that ensures the data is distributed evenly among the data nodes.

If a table does not have a primary key, MySQL Cluster adds a surrogate primary key.

Partitioning allows the MySQL Cluster to achieve higher performance for queries because it supports distribution of queries among the data nodes.
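The idea of hashing the primary key to choose a data node, and of answering a query by combining per-node partial results, can be sketched as follows. This is an illustration only. MySQL Cluster's actual hash function and its mapping of partitions to nodes are internal and differ from the hypothetical partition_for helper below, which uses CRC32 simply because it is deterministic. The sample rows are invented.

```python
import zlib


def partition_for(primary_key, num_partitions):
    """Map a primary key to a partition (hypothetical hash, not NDB's own)."""
    digest = zlib.crc32(str(primary_key).encode("utf-8"))
    return digest % num_partitions


def distribute(rows, num_partitions, key="id"):
    """Split rows horizontally: each row lands in exactly one partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[partition_for(row[key], num_partitions)].append(row)
    return partitions


def distributed_sum(partitions, column):
    """Sum a column within each partition, then add the partial sums."""
    partial_sums = [sum(row[column] for row in part) for part in partitions]
    return sum(partial_sums)


# Invented sample data standing in for a table with an integer primary key.
rows = [
    {"id": 1, "name": "Alpha", "population": 1200},
    {"id": 2, "name": "Beta", "population": 3400},
    {"id": 3, "name": "Gamma", "population": 560},
]
partitions = distribute(rows, num_partitions=2)
total = distributed_sum(partitions, "population")
print(total)
```

However the rows happen to be spread over the partitions, the combined total equals the sum over the whole table, which is what makes it safe to compute an aggregate piecewise on each node.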
Because the work is divided among the data nodes, a query can return results faster when gathering data across several nodes than from a single node. For example, you can execute the following query on each data node, getting the sum of the column on each one and summing those results:

    SELECT SUM(population) FROM cluster_db.city;

[...]

... and restart the data node

For more information about database high availability and MySQL high availability, see the following white papers:

http://www.mysql.com/why-mysql/white-papers/mysql_db_high_availability.php
http://www.mysql.com/why-mysql/white-papers/mysql_ha_solutions.php

Replication ...

... adequate for using MySQL Cluster in the cloud. You can find a complete list of all of the considerations for high performance in the “MySQL Cluster” section of the online MySQL Reference Manual. For general MySQL performance improvements, see High Performance MySQL by Baron Schwartz et al. (O’Reilly).

High Performance Best Practices

There are a number of things you can do to ensure your MySQL Cluster is ...

... practices, you will be well on your way to making MySQL Cluster the best high-performance, high-availability solution for your organization. For more information about optimizing MySQL Cluster, see the white paper “Optimizing Performance of the MySQL Cluster Database” at http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster_perfomance.php

... For more information about MySQL Cluster backup and restore, please see the “Using the MySQL Cluster Management Client to Create a Backup” and “Restore a MySQL Cluster Backup” sections of the online MySQL Reference Manual at the following URLs:

http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-backup-using-management-client.html
http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-programs-ndb-restore...

... size of the binary log may be larger than for normal MySQL replication. MySQL replication to replicate data from one cluster to another permits you to leverage the advantages of MySQL Cluster at each site and still replicate the data to other sites.

Can MySQL Replication Be Used with MySQL Cluster?

You can replicate from a MySQL Cluster server to a non-MySQL Cluster server (or vice versa). No special configuration ...

... procedure

Achieving High Performance

MySQL Cluster is designed not only for high availability, but also for high performance. We have already reviewed many of these features, as they are often beneficial for high availability. In this section, we examine a few features that provide high performance. We ...

... divide the load across the web and MySQL servers.

When configuring a MySQL cluster for high availability, you should consider employing all of the following best practices. We discuss these in more detail later in this chapter when we examine high performance MySQL Cluster techniques.

• • • • Use ...

... more information about MySQL Cluster, its architecture, and its version 7.0 features, see the white paper available at http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster7_architecture.php

Example Configuration

In this section, we present a sample configuration of a MySQL Cluster running ...

... events
    100325 9:14:23 [Note] NDB: Creating mysql.ndb_schema
    100325 9:14:23 [Note] NDB: Flushing mysql.ndb_schema
    100325 9:14:23 [Note] NDB Binlog: CREATE TABLE Event: REPL$mysql/ndb_schema
    100325 9:14:23 [Note] NDB Binlog: logging /mysql/ndb_schema (UPDATED,USE_WRITE)
    100325 9:14:23 [Note] NDB: Creating mysql.ndb_apply_status
    100325 9:14:23 [Note] NDB: Flushing mysql.ndb_apply_status
    100325 9:14:23 [Note] ...

... briefly discussed how MySQL replication and replication inside the cluster differ. MySQL Cluster replication is sometimes called internal cluster replication or simply internal replication to clarify that it is not MySQL replication. MySQL replication is sometimes called external replication. In this section, we discuss MySQL Cluster internal replication. We will also look at how MySQL replication (external ...