Pro MySQL (The Expert's Voice in Open Source), Part 9

CHAPTER 18

Replication

Imagine, for a moment, that there was a single, physical phone book for the entire country where you lived. This phone book was housed in a building in one city, and had specific hours in which you were permitted to visit. Imagine that phone numbers weren't available from any other source. To look up a friend's phone number, or find the number to call to make a reservation for dinner, you'd be required to travel to the phone book building, get in line, and wait for your turn to flip through the pages. You might imagine that over time, special services would arise in which you could call in a request to the phone book office or a third-party service that would retrieve the number for you. You might argue that this arrangement makes having the phone book pointless, and the data stored in the book of little value to anyone who needs immediate access to the data, or who doesn't live within close proximity to the phone book office.

Much like our phone book example, organizations or applications often require multiple instances of their database, either within the same physical space for scalability or redundancy, or spread halfway around the world for geographic diversity. In either case, the data needs to be available in multiple instances to provide value to the organization. Although issues with geographic diversity can sometimes be solved with good network connections, there are plenty of cases in which having a separate instance of the data better serves the needs of the organization.

Fortunately, we don't live in a world where data has to be confined to a single physical location. With replication, a MySQL database can exist partially, or in its entirety, in many different locations, with each replicated instance following close behind the primary database.

In the context of databases, replication means creating a copy of the data in an alternate location.
In most instances, this means the data is available via a second or third server, either in the same location or a geographically separate location. However, there's nothing to prevent you from using replication between two databases on a single server. Replication is as much about having an alternate copy as it is about active synchronization of the data, either in real time or at some interval. The goal of replication is to make data from one database available in more than that one place.

Replication in MySQL can be fairly simple to set up, depending on the complexity of your replication requirements. For a single replicated database with a small amount of data, you're just a few commands away from having a replicated database, but more on that later.

By the time you've completed this chapter, you'll have learned about the following topics:

• What replication is
• Why you should replicate data
• What replication doesn't solve
• How to plan for replication
• How MySQL implements replication
• Setting up replication initially
• Understanding your configuration options
• Monitoring and managing MySQL replication
• Replication performance
• Examples of replication

Without further delay, let's start our look into replication with a discussion of what we mean by replication.

What Is Replication?

Depending on your previous experience with replication, you may have some ideas about what it means to replicate your data between servers or systems. Data replication tools are available in most widely used database systems (Oracle, SQL Server, Sybase, PostgreSQL, and so on), but the feature sets and management tools of each system vary.

Terminology

The terminology for replication varies between database systems. However, all seem to split the replicated databases into two groups: databases that provide data and databases that consume data.[1] You might think of these groups as some databases exporting their data and other databases importing data. To further complicate replication, you can set up a database to provide data to another database while at the same time being a consumer of data. We'll get into a few configuration examples later in the chapter to illustrate why and how you might use a database as both data provider and consumer.

In MySQL, databases that are replicated from, or that export their data, are called masters. Databases that replicate the data, or import it from another server, are called slaves.

1. SQL Server separates databases into publishers and subscribers, but also has a third player called the distributor. The distributor isn't a database, but a process that can run independently on a separate machine to move data between publishers and subscribers.

REPLICATION TERMINOLOGY

If you're new to replication, or are coming from another database system, you might not be familiar with the replication terminology in MySQL. Following is a list of terms with corresponding definitions:

• Master: A database that serves as the primary source of data for other databases. A master exports data to another database.
• Slave: A database that replicates, or imports, data changes from another database.
• Snapshot: A point-in-time copy of the data on the primary database, made so it can be moved to a slave database. Creating a snapshot gives replication a starting point from which to move forward.
• Merge or multimaster: Merge or multimaster replication is a concept in which a system has multiple databases that feed each other updates. MySQL doesn't support this.

Synchronous vs. Asynchronous Replication

Before we talk about the feature sets of replication systems, note that, regardless of a system's features, the databases are kept in sync either synchronously or asynchronously. MySQL's replication implementation is asynchronous, meaning that the data in the replicated systems lags behind that on the master, anywhere from fractions of a second to several seconds. Let's look more closely at the differences between the two synchronization types.

Synchronous Replication

In synchronous replication, the data is committed to the primary database as well as the replicated database as a part of the same transaction. This is also known as dual commit or dual-phase commit. The change is written and committed on both the master and the slave as part of that one transaction. In synchronous replication, the primary database and all replicated databases are always in sync. Figure 18-1 visually represents the process of synchronous replication.

Figure 18-1. Synchronous replication (flow between the client, master database, and slave database: client issues query; query executed on master; query executed on slave; query committed on master and slave; status returned to client)

As you can see in the diagram in Figure 18-1, the query is executed on both the master and the slave, and then committed on both before the client receives the return status. With the query changing data in both places before returning a response, all databases in the environment are kept in sync.

Asynchronous Replication

Asynchronous replication means that a query isn't picked up on the replicated servers until after the transaction is complete on the primary database. Typically, a process pulls or pushes data changes from the primary database at a scheduled interval and makes those changes in the replicated system.
In MySQL, data is pulled from the master by a process on the slave after the master has completed the query and made an entry in the binary log.

With asynchronous replication, the replicated databases are always some amount of time behind the primary database. The amount of time depends on numerous factors: how frequently the replication process grabs updates, how much data must be transferred to the replicated systems, and how fast the network will allow the data to move between those systems. Figure 18-2 shows the flow of data in asynchronous replication.

Figure 18-2. Asynchronous replication (flow between the client, master database, and slave database: client issues query; query executed on master; status returned to client; query copied to slave via separate process; query executed on slave)

Figure 18-2 illustrates how an asynchronous replication system processes the query on the master server and returns status to the client before the query is replicated on the slave. At some future point the query is pulled to the slave and executed. Again, we want to point out that MySQL replication is asynchronous, the flow of data matching that in Figure 18-2.

One-Way vs. Merge

Replication technology lives on a continuum that goes from simple to complex. Each database vendor has its own set of tools to accomplish replicating data from one system to another. Some of the tools are sophisticated, and include endless configuration options for controlling and optimizing the data moving between your systems. Others are fairly simple, giving just enough control to set up the system and let it take over.

At the simpler end of the continuum are replication mechanisms that provide read-only copies of the data, or one-way replication. In a one-way replication arrangement, all updates to the database are directed to the master server, and then those changes are pulled down to the replicated databases.
The communication is one-way in that any change on a slave is never communicated back to the master. It is possible to set up a replication system in which a segment of your databases or tables is replicated from a master and some tables are maintained locally. In that case, local data changes to the nonreplicated tables on the slave won't cause problems. Otherwise, if you've got a set of replicated tables, you shouldn't make updates on the slave, as the updates may cause problems with future updates from the master, and the data will most likely be lost the next time you do a complete refresh of the data.

In more advanced database replication systems, replication allows for both reading and writing in the replicated database, and provides a mechanism to merge changes from multiple databases into every other replicated database. Having multiple primary databases, with reads and writes happening in each, presents some interesting problems. The replication software has to make decisions about which records take precedence when there are conflicts.

MySQL's replication falls on the simpler end of the spectrum, and doesn't provide data merging in its replication feature. Data is replicated to read-only servers. If changes are made in the replicated data, they aren't replicated back to the master. More information about enabling merge replication in MySQL is available at http://dev.mysql.com/books/hpmysql-excerpts/ch07.html#hpmysql-CHP-7-SECT-7.3.

Why Replicate Data?

Before running out and setting up a server to replicate your data, it's good to consider what role replication will play in the requirements of your database or database-backed application. In many instances, replication is just the thing you've been looking for, and will make a huge, positive impact on your system. However, in some cases it can be more hindrance than help, as discussed in the section "What Isn't Solved with Replication." Let's look at a few areas where MySQL's replication may help.
Performance

Having replicated databases can improve performance of your application. Perhaps you want to be able to spread the load of database queries across several database servers. If you're at a point where the CPU or memory on your database server has peaked, or the network traffic for database transactions is reaching capacity, you may find that replicating your data onto several machines and balancing queries across multiple machines improves your database response.

Even if you don't have ongoing demand for performance improvements provided by replication, sometimes providing a separate database for certain users or specific queries can offer a great deal of relief for your primary database. Reporting or summary queries can be extremely intensive, and can slow or stop other queries to those tables. Replicating the data, and moving user accounts or pointing reporting tools to the replicated data, can be of great benefit to the primary database.

Geographic Diversity

Replication is a good way to solve situations in which data is needed in multiple locations. Perhaps you have offices located across the country or around the world and need to provide a local copy of the data for each office. Using a replication mechanism could allow each office local access to its data, but also make its data available to other offices, and vice versa.

Limited Connectivity

If you have inconsistent network availability, replication may be a way to provide more uptime. Perhaps you have customers in a certain part of the country with an intermittent pipe to the public Internet, but a very good network within their region. Setting up a database within the region that replicates off a master when the connection is up gives your customers a constantly available database. The data is only as current as the latest successful connection to the master, but the database is always available for use within the region.
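The query-balancing idea from the Performance discussion can be sketched in a few lines of application code. What follows is a minimal illustration, not a production router: the `QueryRouter` class, the host names, and the keyword list are all our own assumptions, and nothing here connects to a real server. It sends data-changing statements to the master (because changes made on a slave never travel back) and spreads everything else across the slaves in round-robin order.

```python
from itertools import cycle


class QueryRouter:
    """Pick a server for each statement: writes to the master, reads to slaves.

    Hypothetical helper for illustration only; it returns host names and
    never opens a database connection.
    """

    # Leading keywords treated as data-changing. Illustrative, not exhaustive.
    WRITE_KEYWORDS = frozenset(
        {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}
    )

    def __init__(self, master, slaves):
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        self._slaves = cycle(slaves)  # endless round-robin iterator

    def choose_server(self, sql):
        """Return the host name that should run this statement."""
        words = sql.split()
        if not words:
            raise ValueError("empty statement")
        if words[0].upper() in self.WRITE_KEYWORDS:
            return self.master
        return next(self._slaves)


if __name__ == "__main__":
    # Host names below are made up for the example.
    router = QueryRouter("master-db", ["slave-db-1", "slave-db-2"])
    for statement in (
        "SELECT COUNT(*) FROM customer",
        "DELETE FROM customer",
        "SELECT * FROM customer",
    ):
        print(router.choose_server(statement), "<-", statement)
```

Keep in mind that MySQL replication is asynchronous, so a read routed to a slave immediately after a write to the master may not yet see that write. An application that needs to read its own changes right away would have to send those reads to the master as well.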
Redundancy and Backup

A replicated database is an excellent way to provide redundancy and high availability. Having one or more slave databases running all the time means that you can roll onto one of the slave servers in the event of a machine failure or disaster. You can do the switch manually, or you can program the application to make the switch if the primary machine isn't available.

In addition to providing redundancy, a replicated database is an excellent stand-by backup for instances where you need to restore from a backup. Unlike a nightly dump of the data, the replicated data is as current as the last statement read from the binary log on the master database, which is likely to be more current than your most recent backup. Using a replicated server as a backup means your backup is constantly updated, and if your primary database server goes down you've got an almost-current copy of the database ready to start the restore process.

■ Caution Be careful when relying on a replicated database for restoring data. If you're attempting to restore data after an accidental query, the data change will likely have reached the slaves before you can get to them to restore the data. Using replication as a backup is more appropriate for instances in which a restore is required after a disk or server failure.

Storage Engine and Index Optimization

Replication can allow you to take advantage of multiple storage engines for a single table or database. What does that mean? With replication it's possible to use one table type on the master and another table type on the slave. Perhaps you want to have foreign keys, which are only allowed using the InnoDB and BDB table types, but you also want to be able to use the full-text indexing feature of the MyISAM table type. Because replication simply executes queries from one server on another server, it's possible to have the master database use InnoDB tables, which provide referential integrity.
You can alter the tables replicated to the slave to be MyISAM, including the definition of full-text indexes. If you wanted to run queries against the full-text index, you would send those queries to the slave with the MyISAM tables and full-text index. Presumably, you'd use the InnoDB features on the master to enforce data integrity, but get the advantages of MyISAM, such as its performance, on the slave.

We've hinted at it with the full-text indexes in our multiple-storage-engines example, but it's worth noting that a slave database can have a different set of indexes than the master. This can be helpful if you have fundamentally different methods for accessing the data that require multiple indexes on a single table. Spreading those indexes across two different databases and sending the queries to the appropriate machine can mean reduced index sizes and improved performance.

What Isn't Solved with Replication

Replication can be helpful, and necessary in many situations, but it doesn't solve every problem. Just to give a few examples:

• Replication doesn't solve data validation or integrity problems. Whatever changes are made to the master database are also made in the slave.
• As cautioned earlier, using replication as a backup system to restore data from accidental updates or deletes doesn't work. Because a replicated server has most likely executed the same query within seconds of the master, going to a slave to retrieve records that were accidentally updated or deleted on the master proves unsuccessful.
• Because MySQL replication is asynchronous, it isn't useful in a system where data is needed in real time by the slaves.
• By default, replication in MySQL doesn't allow you to merge data from two different servers into one. If you have updates happening in two databases and you need to represent them in one, you might be better served by replicating the separate databases and then creating a view that brings the tables together. See Chapter 12 for more information on views in MySQL.
• Replication in MySQL doesn't natively give you the ability to run updates in two different databases and have them reflect each other's changes by replicating each other. This is also known as multimaster replication.

You now should have a sense of what replication can and can't do, and why you might embark on creating replicated data in your environment.

Planning for Replication

Before we leave the replication why and get into the how, we encourage you to stop and think about how replication fits into your organization. When looking at how to build replication into your system, or expand a system to include replicated data, you should think about how you can go about fully understanding the requirements. We encourage you to identify the owners of the data and gather their expectations for the data in the system.

Armed with that knowledge, consider the things replication can and can't do, and work with the stakeholders to develop a technically viable policy on replicating data. Go through things such as the requirements for synchronization and privacy. Help the stakeholders understand the possibilities available through replication and the process of establishing and maintaining a system with replicated data. Together, document a policy for replicating data that is technically possible and meets the requirements of your organization.

Armed with the policy, put together an implementation plan that includes information on the details of where and how the data is replicated.
Good documentation of this plan will serve as a fallback when you've been focused on other things and don't recall the details, as well as an aid for anyone who has to step in to help with implementation or problems in the system.

We understand it's rare to be a database administrator or application developer who loves to write documentation, especially on something as nontechnical as policies. Hopefully, the potential gains from having the process documented will be motivation enough to push through and write solid policy and implementation documents. This will ensure that as you move forward, you remain on the right track and don't cause a lot of extra work for yourself or others by not having documentation available for clarification.

How MySQL Implements Replication

In its simplest form, MySQL's replication moves data from one database to another by copying all the queries that change the data in one database and running those exact statements in the replicated database. In effect, the slave databases are shadowing the master database by copying the master's queries.

■ Note Replication has been available since version 3.23.15, but underwent some significant changes in 4.0.2. If you're attempting to set up replication that involves versions prior to 4.0.2, see MySQL's documentation on replication for more information on replication with earlier versions of MySQL.

Binary Log

How does replication actually work? That's what we're here to look into. You're probably familiar with the binary log, a logging mechanism that keeps track of all changes in your MySQL tables. Because replication relies on the binary log, you must enable it with the log-bin option in your database startup to successfully replicate data. Chapters 4, 17, and 20 talk about the binary log a bit, but in different contexts, so we'll do a quick review here, keeping replication in mind.
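To make the "copy the statements and run them again" idea concrete, here is a toy model of it. This is only a sketch under loud assumptions: it uses Python's built-in sqlite3 module with in-memory databases rather than MySQL, the `ToyMaster` and `ToySlave` classes are hypothetical, and the "binary log" is just a Python list whose index stands in for a log position. It does show the essential flow, though: the master executes a statement and then appends it to its log, and the slave later pulls whatever it hasn't seen and replays it.

```python
import sqlite3


class ToyMaster:
    """Stand-in for a master: runs statements and records them in a log."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.binlog = []  # list of SQL strings; the index acts as log position

    def execute(self, sql):
        """Run a data-changing statement, then append it to the log."""
        self.db.execute(sql)
        self.db.commit()
        self.binlog.append(sql)  # logged only after it ran successfully


class ToySlave:
    """Stand-in for a slave: replays the master's log from its last position."""

    def __init__(self, master):
        self.master = master
        self.db = sqlite3.connect(":memory:")
        self.position = 0  # how many log entries have been replayed so far

    def pull(self):
        """Replay unseen statements; return how many were applied."""
        pending = self.master.binlog[self.position:]
        for sql in pending:
            self.db.execute(sql)
        self.db.commit()
        self.position += len(pending)
        return len(pending)


if __name__ == "__main__":
    master = ToyMaster()
    slave = ToySlave(master)
    master.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    master.execute("INSERT INTO customer (name) VALUES ('Ann')")
    print("statements replayed on slave:", slave.pull())
```

Between calls to `pull()`, the slave's copy is stale, which is the asynchronous lag described earlier in the chapter. A real slave keeps its position in files on disk and reads the log over the network, none of which this toy attempts.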
Every time a query makes a change in your database, or has the potential to make a change, that query is executed and then appended to the binary log.[2] Test this by issuing the statement in Listing 18-1 in your MySQL client.

Listing 18-1. DELETE Statement to Test Binary Log

    DELETE FROM customer;

After you've issued the statement, check the last entry in the binary log. If you aren't familiar with the binary log file, it's in your data directory. By default, if you haven't specified it in the options, it's named <server name>-bin.00000x (the currently active binary log is the highest numbered). The log is stored in a binary format, so you need the mysqlbinlog tool included with MySQL to convert it to readable text. You can see that the last entry in the binary log looks like that of Listing 18-2, except your time will be different.

Listing 18-2. Last Statement in Binary Log

    # at 716
    #050319 12:07:21 server id 1 end_log_pos 793 Query thread_id=120 exec_time=0 error_code=0
    USE shop;
    SET TIMESTAMP=1111252041;
    DELETE FROM customer;

As you can see in Listing 18-2, the last item in the binary log is the DELETE statement. There are a few other pieces of information. First, the log entry tells us the position in the binary log at which the entry starts: # at 716. The next line gives us, among other things, the time, the server number, the binary log position at the end of the statement, the time it took to execute the query, and whether there was an error. The binary log then includes a USE shop; statement to ensure you're in the right database, a SET TIMESTAMP statement to adjust the time to the time this statement was entered, and the actual SQL statement that was processed.

As you might sense, this information all comes in handy when attempting to keep another database in sync with this one. It's as if you could copy these five lines to another identical database and see the same changes in the data on the other database.
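If you want to pick apart an entry like the one in Listing 18-2 programmatically, a small parser is enough. The sketch below is our own illustration, not a tool that ships with MySQL: the `parse_header` function and its regular expression are assumptions written against the single header line shown in Listing 18-2, and other event types or MySQL versions may well format the line differently.

```python
import re
from datetime import datetime, timezone

# Header line copied from Listing 18-2.
LISTING_18_2_HEADER = (
    "#050319 12:07:21 server id 1 end_log_pos 793 "
    "Query thread_id=120 exec_time=0 error_code=0"
)

# Pattern written for the line above only; treat it as an assumption.
HEADER_RE = re.compile(
    r"#(?P<date>\d{6})\s+(?P<time>\d{1,2}:\d{2}:\d{2})\s+"
    r"server id\s+(?P<server_id>\d+)\s+"
    r"end_log_pos\s+(?P<end_log_pos>\d+)\s+"
    r"(?P<event_type>\w+)\s+"
    r"thread_id=(?P<thread_id>\d+)\s+"
    r"exec_time=(?P<exec_time>\d+)\s+"
    r"error_code=(?P<error_code>\d+)"
)

NUMERIC_FIELDS = ("server_id", "end_log_pos", "thread_id", "exec_time", "error_code")


def parse_header(line):
    """Split a Listing 18-2 style header line into a dict of named fields."""
    match = HEADER_RE.search(line)
    if match is None:
        raise ValueError("line does not look like a binary log entry header")
    fields = match.groupdict()
    for key in NUMERIC_FIELDS:
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    for name, value in parse_header(LISTING_18_2_HEADER).items():
        print(name, "=", value)
    # SET TIMESTAMP carries a Unix timestamp (seconds since 1970-01-01 UTC).
    print(datetime.fromtimestamp(1111252041, tz=timezone.utc).isoformat())
```

The SET TIMESTAMP value in the listing, 1111252041, decodes to 17:07:21 UTC on March 19, 2005. That lines up with the 12:07:21 in the header if the server's clock was five hours behind UTC, which is our inference rather than something the listing states.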
A series of statements that is part of a transaction is written to the binary log once the transaction has successfully completed. Transactions that fail and roll back, or are rolled back manually, don't change the data and thus aren't written to the binary log.

Because we're here to talk about replication, we won't explain the binary log further, but you can find more information in the MySQL documentation at http://dev.mysql.com/doc/mysql/en/binary-log.html.

2. As of MySQL version 4.1.3, any statement that could potentially change data, like a DELETE where no rows were matched, is still written to the binary log.

[...] find more information about LOAD DATA FROM MASTER at http://dev.mysql.com/doc/mysql/en/load-data-from-master.html. mysqlhotcopy documentation can be found at http://dev.mysql.com/doc/mysql/en/mysqlhotcopy.html, and information on mysqlsnapshot is at http://jeremy.zawodny.com/mysql/mysqlsnapshot/. Chapter 17 also contains details about ways to create data snapshots.

Listing 18-5. Lock Tables and Find Binary [...]

[...] things change quickly. Although most of the concepts will remain the same, the details of MySQL's cluster engine may change. As of this writing, MySQL is using version 4.1.12 for production, and makes reference to how things used to be done in 4.1.9. If you're using an older or newer version than these, please review MySQL's online documentation to get up-to-date information at http://dev.mysql.com/doc/mysql/en/ndbcluster.html [...]

[...] Many commercial products offer replication as well as clustering for redundancy and high availability, including multimaster replication. A few open-source projects also offer wrappers for MySQL. Rather than getting into specific vendors, check out this link for a list of MySQL partners, in which you can find MySQL-endorsed products that offer third-party replication tools: http://solutions.mysql.com/.

Summary [...]

[...] been processed in the database)

info Files

When running replication, two new files appear in your data directory: master.info and relay-log.info. MySQL uses these files to save information about your replication state, and they're used when MySQL starts up, if they're available.

■ Caution MySQL considers the information contained in [...]

[...] explain the advantages and disadvantages of MySQL Cluster. Make a formal document outlining your implementation plan, including how the cluster will be configured and managed.

MySQL's Cluster Implementation

MySQL makes cluster technology available through its NDBCLUSTER storage engine. From the client perspective, using the cluster engine is just like interacting with any other storage engine. Because MySQL [...] MySQL provides clients with a unified mechanism to access data in the various storage engines, using the cluster involves using the same familiar command-line interface or programmatic access method you've been using to get to tables in other storage engines.

To be fair, although client access is similar to using the other storage engines, this storage engine isn't exactly the same. The MyISAM and InnoDB [...]

[...] replication. With your slave database running, send the dump of your master database to the client, using the statement in Listing 18-8.

Listing 18-8. Create Slave Tables from Master Snapshot

    shell> mysql < all_database.sql

Running the command in Listing 18-8 brings your slave database to the exact point in time that your master was when [...]

[...] parameter.

Daisy Chain

Although having multiple slaves pointed to a single master works well in certain situations, there may be other cases where creating a chain of replicated machines meets your needs better. Perhaps having many machines replicating from a single master requires too much work for your master, or your replication environment is spread across such a large geographic area that chaining the closest [...]

[...] for SSL information. The complete listing of entries in the master.info file is shown in Table 18-1. Lines 9 through 14 contain information about the use of SSL connections for the replication threads.[3] These lines may be blank if values aren't specified on startup.

3. The SSL connection options are new to the master.info file as of MySQL version 4.1. Previous versions of MySQL included only seven lines, represented by lines 2–8 in Table 18-1.

Table 18-1. Line Descriptions for master.info File

Line Number   Example                      Description
1             14                           Indicates to the I/O thread how many lines of data are in the file
2             master-database-bin.000002   The name of the current binary log file on the master
3             5813                         The read position of the I/O thread in the binary [...]
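As a rough illustration of the master.info layout described in Table 18-1, the following sketch reads the first three lines of such a file. It covers only what the table excerpt above documents: the line count, the master's current binary log file name, and the I/O thread's read position. The function name, the dictionary keys, and the sample text (assembled from the table's Example column) are our own assumptions, and lines 4 and beyond are deliberately left uninterpreted.

```python
# Sample built from the Example column of Table 18-1; a real master.info
# file has more lines than this sketch looks at.
SAMPLE_MASTER_INFO = "14\nmaster-database-bin.000002\n5813\n"


def parse_master_info(text):
    """Return the three documented master.info fields as a dict.

    The key names are our own labels, not names taken from MySQL.
    """
    lines = text.splitlines()
    if len(lines) < 3:
        raise ValueError("expected at least three lines in master.info")
    return {
        "line_count": int(lines[0]),         # line 1: number of lines in file
        "master_log_file": lines[1],         # line 2: current binary log file
        "read_position": int(lines[2]),      # line 3: I/O thread read position
    }


if __name__ == "__main__":
    for name, value in parse_master_info(SAMPLE_MASTER_INFO).items():
        print(name, "=", value)
```

Treat a script like this as read-only tooling for inspection. MySQL maintains these files itself while replication is running.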
