Tài liệu hạn chế xem trước, để xem đầy đủ mời bạn chọn Tải xuống
1
/ 50 trang
THÔNG TIN TÀI LIỆU
Thông tin cơ bản
Định dạng
Số trang
50
Dung lượng
864,27 KB
Nội dung
the timeout limit—and acknowledge that the transaction has been properly written to disk You can use the rpl-semi-sync-master-wait-no-slave={ON|OFF} option to turn off this behavior, in which case the master reverts to asynchronous replication if there are no connected slaves Monitoring semisynchronous replication Both plug-ins install a number of status variables that allow you to monitor semisynchronous replication We will cover the most interesting ones here—for a complete list, consult the online reference manual for semisynchronous replication rpl_semi_sync_master_clients This status variable reports the number of connected slaves that support and have registered for semisynchronous replication rpl_semi_sync_master_status The status of semisynchronous replication on the master is if it is active, and if it is inactive—either because it has not been enabled or because it was enabled but has reverted to asynchronous replication rpl_semi_sync_slave_status The status of semisynchronous replication on the slave is if active—that is, it has been enabled and the I/O thread is running—and if it is inactive You can read the values of these variables either using the SHOW STATUS command or through the information schema table GLOBAL_STATUS If you want to use the values for other purposes, the SHOW STATUS command is hard to use and a query as shown in Example 4-5 uses SELECT on the information schema to extract the value and store it in a user-defined variable Example 4-5 Retrieving values using the information schema master> SELECT Variable_value INTO @value -> FROM INFORMATION_SCHEMA.GLOBAL_STATUS -> WHERE Variable_name = 'Rpl_semi_sync_master_status'; Query OK, row affected (0.00 sec) Slave Promotion The procedures described so far work well when you have a master running that you can use to synchronize the standby and the slave before the switchover, but what happens if the master dies all of a sudden? Since replication has stopped in its tracks with all slaves (including the standby), it will not be possible to run replication just a little more to get all the necessary changes that would put the new master in sync If the standby is ahead of all the slaves that need to be reassigned, there is no problem, because you can run replication on each slave to the place where the standby stopped Procedures | 127 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark You will lose any changes that were made on the master but not yet sent to the standby We will cover how to handle the recovery of the master in this case separately If the standby is behind one of the slaves, you shouldn’t use the standby as the new master, since the slave knows more than the standby As a matter of fact, it would be better if the “more knowledgeable” slave—that is, the slave that has replicated most events from the common master—were the master instead! This is exactly the approach taken to handle master failures using slave promotion: instead of trying to keep a dedicated standby around, ensure that any one of the slaves connected to the master can be promoted to master and take over at the point where the master was lost By selecting the “most knowledgeable” slave as the new master, you guarantee that none of the other slaves will be more knowledgeable than the new master, so they can connect to the new master and read events from it There is, however, a critical issue that needs to be resolved—synchronizing all slaves with the new master so that no events are lost or repeated The problem in this situation is that all of the slaves need to read events from the new master, but the positions of the new master are not the same as the positions for the old master So what is a poor DBA to do? The traditional method for promoting a slave Before delving into the final solution, let us first take a look at the recommended practice for handling slave promotion This will work as a good introduction to the problem, and also allow us to pinpoint the tricky issues that we need to handle for the final solution Figure 4-8 shows a typical setup with a master and several slaves Figure 4-8 Promoting a slave to replace a failed master 128 | Chapter 4: Replication for High Availability Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark For the traditional method of slave promotion, the following are required: • Each promotable slave must have a user account for the replication user • Each promotable slave should run with log-bin, that is, with the binary log enabled • Each promotable slave should run without the log-slave-updates option (the reason will become obvious shortly) Assume you are starting with the original setup shown in Figure 4-8 and that the master fails You can promote a slave to be the new master by doing the following: Stop the slave using STOP SLAVE Reset the slave that is going to be the new master using RESET MASTER This will ensure the slave starts as the new master and that any connecting slave will start reading events from the time the slave was promoted Connect the other slaves to the new master using CHANGE MASTER TO Since you reset the new master, you can start replication from the beginning of the binary log, so it is not necessary to provide any position to CHANGE MASTER TO Unfortunately, this approach is based on an assumption that is not generally true— that the slaves have received all changes that the master has made In a typical setup, the slaves will lag behind the master to various degrees It might be just a few transactions, but nevertheless, they lag behind In the next section you will see a solution to that problem Regardless of that, this approach is so simple that it is useful if you can handle lost transactions or if you are operating under a low load A revised method for promoting a slave The traditional approach to promoting a slave is inadequate in most cases because slaves usually lag behind the master Figure 4-9 illustrates the typical situation when the master disappears unexpectedly The box labeled “binary log” in the center is the master’s binary log and each arrow represents how much of the binary log the slave has executed In the figure, each slave has stopped at a different binlog position To resolve the issue and bring the system back online, one slave has to be selected as the new master— preferably the one that has the latest binlog position—and the other slaves have to be synchronized with the new master The critical problem lies in translating the positions for each slave—which are the positions in the now-defunct master—to positions on the promoted slave Unfortunately, the history of events executed and the binlog positions they correspond to on the slaves are lost in the replication process—each time the slave executes an event that has arrived from the master, it writes a new event to its binary log, with a new binlog position The slave’s position bears no relation to the master’s binlog position of the Procedures | 129 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark Figure 4-9 Binary log positions of the master and the connected slaves same event The only option that remains for us is to scan the binary log of the promoted slave To use this technique: • Enable the binary log; otherwise, no changes can be replicated • Enable log slave updates (using the log-slave-updates option); otherwise, no changes from the original master can be forwarded • Each slave needs to have a replication user to act as a master so that if it turns out to be the best candidate for a new master, other slaves can to connect to it and replicate from it Carry out the following steps for each of the slaves that are not promoted: Figure out the last transaction it executed Find the transaction in the binary log of the promoted slave Take the binlog position for the transaction from the promoted slave Start the nonpromoted slaves to replicate from that position on the promoted slave To match the latest transaction on each of the slaves with the corresponding event in the binary log of the promoted slave, you need to tag each transaction The content and structure of the tags don’t matter; they just need to be uniquely identifiable no matter who executed the transaction so each transaction on the master can be found in the promoted slave’s binary log We call this kind of tag the global transaction ID The easiest way to accomplish this is to insert a statement at the end of each transaction that updates a special table and use that to keep track of where each slave is Just before 130 | Chapter 4: Replication for High Availability Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark committing each transaction, a statement updates the table with a number that is unique for the transaction Tagging can be handled in two main ways: • Extending the application code to perform the necessary statements • Calling a stored procedure to perform each commit and writing the tag in the procedure Because the first approach is easier to follow, it will be demonstrated here If you are interested in the second approach, see “Stored Procedures to Commit Transactions” on page 141 To implement the global transaction ID, we have created the two tables in Example 4-6: one table named Global_Trans_ID to generate sequence numbers and a separate table named Last_Exec_Trans to record the global transaction ID The server ID is added to the definition of Last_Exec_Trans to distinguish transactions committed on different servers If, for example, the promoted slave fails before all the slaves have managed to connect, it is very important to distinguish between the transaction ID of the original master and the transaction ID of the promoted slave Otherwise, the slaves that didn’t manage to connect to the promoted slave might start to execute from a position that is wrong when being redirected to the second promoted slave This example uses MyISAM to define the counter table, but it is possible to use InnoDB for this as well Example 4-6 Tables used for generating and tracking global transaction IDs CREATE TABLE Global_Trans_ID ( number INT UNSIGNED AUTO_INCREMENT PRIMARY KEY ) ENGINE = MyISAM; CREATE TABLE Last_Exec_Trans ( server_id INT UNSIGNED, trans_id INT UNSIGNED ) ENGINE = InnoDB; Insert a single row with NULLs to be updated INSERT INTO Last_Exec_Trans() VALUES (); The next step is to construct a procedure for adding a global transaction ID to the binary log so that a program promoting a slave can read the ID from the log The following procedure is suitable for our purposes: Insert an item into the transaction counter table, making sure to turn off the binary log before doing this, since the insert should not be replicated to the slaves: master> SET SQL_LOG_BIN = 0; Query OK, rows affected (0.00 sec) Procedures | 131 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark master> INSERT INTO Global_Trans_ID() VALUES (); Query OK, row affected (0.00 sec) Fetch the global transaction ID using the function LAST_INSERT_ID To simplify the logic, the server ID is fetched from the server variable server_id at the same time: master> SELECT @@server_id as server_id, LAST_INSERT_ID() as trans_id; + -+ + | server_id | trans_id | + -+ + | | 235 | + -+ + row in set (0.00 sec) Before inserting the global transaction ID into the Last_Exec_Trans tracking table, you can remove its row from the transaction counter table to save space This optional step works only for a MyISAM table If you use InnoDB, you have to be careful about leaving the last used global transaction ID in the table InnoDB determines the next number from the maximum value in the autoincrement column currently in the table master> DELETE FROM Global_Trans_ID WHERE number < 235; Query OK, row affected (0.00 sec) Turn on the binary log: master> SET SQL_LOG_BIN = 1; Query OK, rows affected (0.00 sec) Update the Last_Exec_Trans tracking table with the server ID and the transaction ID you got in step This is the last step before committing the transaction through a COMMIT: master> UPDATE Last_Exec_Trans SET server_id = 0, trans_id = 235; Query OK, row affected (0.00 sec) master> COMMIT; Query OK, rows affected (0.00 sec) Each global transaction ID represents a point where replication can be resumed Therefore, you must carry out this procedure for every transaction If it is not used for some transaction, the transaction will not be tagged properly and it will not be possible to start from that position Now, to promote a slave after the master is lost, find the slave that has the latest changes of all the slaves—that is, has the largest binlog position—and promote it to master Then have each of the other slaves connect to it For a slave to connect to the promoted slave and start replication at the right position, it is necessary to find out what position on the promoted slave has the last executed transaction of the slave Scan the binary log of the promoted slave to find the right transaction ID 132 | Chapter 4: Replication for High Availability Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark Use the following steps to carry out the recovery: Stop the slave Get the last-seen global transaction ID from its Last_Exec_Trans table Pick the slave with the highest global transaction ID to promote to master If there are several, pick one Get the master position of the slave to promote and the binary logs of the slave at the same time using SHOW MASTER LOGS Note that the last row of SHOW MASTER LOGS matches what you would see in SHOW MASTER STATUS Bring the promoted slave online and let it start accepting updates Connect to the promoted slave and scan the binary log to find the latest global transaction ID that you found in each slave’s binary log Unless you have a file position that you know is good, the only good starting position for reading a binary log is the beginning Therefore, you have to scan the binary logs in reverse order, starting with the latest This step will give you a binlog position on the promoted slave for each global transaction ID that you collected in step Reconnect each slave to the promoted slave, starting at the position where the slave needs to start in order to recover all information, using the information from step The first four steps are straightforward, but step is tricky To illustrate the situation, let’s start with an example of some basic information gathered from the first three steps Table 4-2 lists three sample slaves with the global transaction ID of each slave Table 4-2 Global transaction ID for all connected slaves Server ID Trans ID slave-1 245 slave-2 248 slave-3 256 As you can see in Table 4-2, slave-3 has the latest global transaction ID and is therefore the slave you will promote It is therefore necessary to translate the global transaction ID of each slave to binlog positions on slave-3 For that, we need information about the binary log on slave-3, which we’ll obtain in Example 4-7 Procedures | 133 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark Example 4-7 Master positions of slave-3, which will be promoted slave-3> SHOW MASTER LOGS; + + -+ | Log_name | File_size | + + -+ | slave-3-bin.000001 | 3115 | | slave-3-bin.000002 | 345217 | | slave-3-bin.000003 | 24665 | | slave-3-bin.000004 | 788243 | | slave-3-bin.000005 | 1778 | + + -+ row in set (0.00 sec) The important thing to know from the output of SHOW MASTER LOGS is the names of the logs, so you can scan them for global transaction IDs For instance, when reading the slave-3-bin.000005 file using mysqlbinlog, part of the output will look like that shown in Example 4-8 The transaction received by slave-3 starting at position 596 (highlighted in the first line of the output) has the global transaction ID received by slave-1, as shown by an UPDATE of the Last_Exec_Trans table Example 4-8 Output from the mysqlbinlog command for one transaction # at 596 #091018 18:35:42 server id end_log_pos 664 Query thread_id=952 SET TIMESTAMP=1255883742/*!*/; BEGIN /*!*/; # at 664 #091018 18:35:42 server id end_log_pos 779 Query thread_id=952 SET TIMESTAMP=1255883742/*!*/; UPDATE user SET messages = messages + WHERE id = /*!*/; # at 779 #091018 18:35:42 server id end_log_pos 904 Query thread_id=952 SET TIMESTAMP=1255883742/*!*/; INSERT INTO message VALUES (1,'MySQL Python Replicant rules!') /*!*/; # at 904 #091018 18:35:42 server id end_log_pos 1021 Query thread_id=952 SET TIMESTAMP=1255883742/*!*/; UPDATE Last_Exec_Trans SET server_id = 1, trans_id = 245 /*!*/; # at 1021 #091018 18:35:42 server id end_log_pos 1048 Xid = 1433 COMMIT/*!*/; Table 4-2 shows that the trans_id 245 is the last transaction seen by slave-1, so now you know that the start position for slave-1 is in file slave-3-bin.000005 at byte position 1048 So to start slave-1 at the correct position, you can now execute CHANGE MASTER TO and START SLAVE: slave-1> CHANGE MASTER TO -> MASTER_HOST = 'slave-3', 134 | Chapter 4: Replication for High Availability Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark -> MASTER_LOG_FILE = 'slave-3-bin.000005', -> MASTER_LOG_POS = 1048; Query OK, rows affected (0.04 sec) slave-1> START SLAVE; Query OK, rows affected (0.17 sec) By going backward in this manner—locating each of the transactions that you recorded in the first step in the procedure—you can connect the slaves one by one to the new master at exactly the right position This technique works well if the update statement is added to every transaction commit Unfortunately, there are statements that perform an implicit commit before and after the statement Typical examples include CREATE TABLE, DROP TABLE, and ALTER TABLE Since these statements an implicit commit, they cannot be tagged properly, hence it is not possible to restart just after them This means that if the sequence of statements in Example 4-9 is executed and there is a crash, you will potentially have problems If a slave has just executed the CREATE TABLE and then loses the master, the last seen global transaction ID is for the INSERT INTO—that is, just before the CREATE TABLE statement Therefore, the slave will try to reconnect to the promoted slave with the transaction ID of the INSERT INTO statement Since it will find the position in the binary log of the promoted slave, it will start by replicating the CREATE TABLE statement again, causing the slave to stop with an error You can avoid these problems through careful use and design of statements; for example, if CREATE TABLE is replaced with CREATE TABLE IF NOT EXISTS, the slave will notice that the table already exists and skip execution of the statement Example 4-9 Statements where global transaction ID cannot be assigned INSERT INTO message_board VALUES ('mats@sun.com', 'Hello World!'); CREATE TABLE admin_table (a INT UNSIGNED); INSERT INTO message_board VALUES ('', ''); Slave promotion in Python You have now seen two techniques for promoting a slave: a traditional technique that suffers from a loss of transactions on some slaves, and a more complex technique that recovers all available transactions The traditional technique is straightforward to implement in Python, so let’s concentrate on the more complicated one To handle slave promotion this way, it is necessary to: • • • • Configure all the slaves correctly Add the tables Global_Trans_ID and Last_Exec_Trans to the master Provide the application code to commit a transaction correctly Write code to automate slave promotion Procedures | 135 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark You can use the Promotable class (shown in Example 4-10) to handle the new kind of server As you can see, this example reuses the previously introduced _enable_binlog helper method, and adds a method to set the log-slave-updates option Since a promotable slave requires the special tables we showed earlier for the master, the addition of a promotable slave requires the tables added to the master To this, we write a function named _add_global_id_tables The function assumes that if the tables already exist, they have the correct definition, so no attempt is made to re-create them However, the Last_Exec_Trans table needs to start with one row for the update to work correctly, so if no warning was produced to indicate that a table already exists, we create the table and add a row with NULL Example 4-10 The definition of a promotable slave role _GLOBAL_TRANS_ID_DEF = """ CREATE TABLE IF NOT EXISTS Global_Trans_ID ( number INT UNSIGNED NOT NULL AUTO_INCREMENT, PRIMARY KEY (number) ) ENGINE=MyISAM """ _LAST_EXEC_TRANS_DEF = """ CREATE TABLE IF NOT EXISTS Last_Exec_Trans ( server_id INT UNSIGNED DEFAULT NULL, trans_id INT UNSIGNED DEFAULT NULL ) ENGINE=InnoDB """ class Promotable(Role): def init (self, repl_user, master): self. master = master self. user = repl_user def _add_global_id_tables(self, master): master.sql(_GLOBAL_TRANS_ID_DEF) master.sql(_LAST_EXEC_TRANS_DEF) if not master.sql("SELECT @@warning_count"): master.sql("INSERT INTO Last_Exec_Trans() VALUES ()") def _relay_events(self, server, config): config.set('mysqld', 'log-slave-updates') def imbue(self, server): # Fetch and update the configuration config = server.get_config() self._set_server_id(server, config) self._enable_binlog(server, config) self._relay_event(server, config) # Put the new configuration in place server.stop() server.put_config(config) server.start() 136 | Chapter 4: Replication for High Availability Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark ... (%d,''%s'')"; mysql_ query(sprintf($insert_message, $user_id, $message), $link); commit_trans($link); $conn = mysql_ connect(":/var/run/mysqld/mysqld1.sock", "root"); add_message(''mats@example.com'', "MySQL. .. start_trans($link) { mysql_ query("START TRANSACTION", $link); } function commit_trans($link) { mysql_ select_db("common", $link); mysql_ query("SET SQL_LOG_BIN = 0", $link); mysql_ query("INSERT... = %d"; mysql_ query(sprintf($delete_query, $trans_id), $link); mysql_ query("SET SQL_LOG_BIN = 1", $link); } $update_query = "UPDATE Last_Exec_Trans SET server_id = %d, trans_id = %d"; mysql_ query(sprintf($update_query,