of approximately 2–3K and reasonably efficient PHP code.)

Some benchmarks show that on a quad system, Microsoft IIS running on Windows will outperform Apache running on Linux under high load on static HTML content. However, this information probably won’t influence an experienced MySQL Web system architect, for the following reasons:

■■ Because much of the content is dynamic, static content performance is not as significant in the decision. As far as dynamic content is concerned, PHP has a reputation of outperforming ASP. Although it is difficult to come up with a fair benchmark for two different languages, many users claim a tenfold increase in performance after converting their ASP applications to PHP.
■■ It is more cost-effective to scale Web performance by creating a Web server farm of uniprocessor or dual-processor systems than to buy quad servers.
■■ The lack of license fees becomes an important cost factor when you’re building a Web server farm (a common practice on high-load Web sites).
■■ PHP can connect to MySQL using a native interface, whereas ASP must go through the ODBC layer.

Of course, you can choose from a number of other Web servers, such as Netscape Enterprise, Roxen, WebSphere, and iPlanet. MySQL applications will run on those Web servers. However, our focus will be on Apache because it is the most commonly chosen Web server.

Server-Application Integration Methods

A Web application can interface with a Web server in two primary ways: through CGI (Common Gateway Interface), or through an internal scripting language or precompiled modules (if the Web server supports those functionalities). In the CGI model, the Web application is a stand-alone executable that reads its input from the standard input stream and writes its output to the standard output stream according to a special protocol understood by the Web server. All the application has to do is follow a simple protocol.
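The CGI model described above can be sketched in C roughly as follows. This is a minimal illustration, not a complete CGI implementation: the function names cgi_echo and cgi_handle_request are hypothetical, and the CONTENT_LENGTH environment variable and the Content-Type header come from the usual CGI convention rather than from the text. The handler reads the request body from its input stream and writes a header block, a blank line, and the body to its output stream.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical handler (name and signature are illustrative): reads up to
 * content_length bytes of request body from `in`, then writes a complete
 * CGI response to `out`. Returns 0 on success, -1 on a write error. */
int cgi_echo(FILE *in, FILE *out, long content_length)
{
    char body[1024];
    size_t want, got;

    if (content_length < 0)
        content_length = 0;
    want = (size_t)content_length;
    if (want > sizeof body)
        want = sizeof body;            /* cap the body at the buffer size */
    got = fread(body, 1, want, in);

    /* Header block first, then a blank line, then the body. */
    if (fputs("Content-Type: text/plain\r\n\r\n", out) == EOF)
        return -1;
    if (fwrite(body, 1, got, out) != got)
        return -1;
    return 0;
}

/* What the program's main() would call once per hit. CONTENT_LENGTH is the
 * conventional CGI variable; it is assumed here, not taken from the text. */
int cgi_handle_request(void)
{
    const char *len = getenv("CONTENT_LENGTH");
    return cgi_echo(stdin, stdout, len ? atol(len) : 0);
}
```

A stand-alone CGI executable would simply return the result of cgi_handle_request() from main(). Keeping the streams as parameters in cgi_echo makes the protocol logic easy to exercise without a Web server.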
A CGI application can be written in any language and will work with most Web servers without significant porting effort.

Alternatively, a Web server can have the capability to execute a script internally without loading an external interpreter, or to load a precompiled module. In this case, the script or the module can usually run on a Web server that implements the standard.

MySQL Client in a Web Environment 106

Although CGI offers more flexibility, a performance penalty is associated with the fact that a Web server must create a separate process on every hit. The internal execution approach overcomes this problem and tends to produce better results, especially when the execution time is very small and the overhead of process creation is significant. The difference is less significant if the application executes for a long time. In practice, the issue of process creation overhead on modern hardware becomes important for applications that handle more than 100 requests per second.

The most common languages for writing CGI Web applications interfacing with MySQL are Perl and C. Perl has the advantage of a faster development cycle, whereas C gives you the upper hand on performance.

If you decide to go the internal interpreter route, the two most common options are mod_perl and PHP. Which one is a better choice is a matter of debate. Generally, people who really like Perl prefer mod_perl, but those who do not know Perl that well or are not attached to it prefer PHP.

Apache provides another option for increasing performance: You can write a server module in C. The development process is more complicated and requires more skill, but you can get top performance because a module blends with the rest of the server and basically becomes its integral part.

If you have a simple targeted Web application that requires absolutely top performance, such as a banner server or request logger, you may want to consider writing your own Web server.
Although it is a challenging task, it is doable; it can give you serious hardware savings and, in some cases, can make the difference between being able to handle the load and not being able to do so.

Web Optimization Techniques

The process of optimizing a Web application that connects to MySQL can be broken down into the following areas:

■■ Avoiding long queries
■■ Avoiding unnecessary queries
■■ Avoiding unnecessary dynamic execution
■■ Using persistent connections

Avoiding Long Queries

Long database queries are probably the top killer of Web performance. A Web developer may write a query, test it on a small dataset, and not notice anything wrong because the query runs very fast. However, after the server has been in production for a month or two, the response time becomes terrible. The poorly written query is now being executed concurrently from several threads on a large dataset, causing heavy disk I/O and CPU utilization and bringing the server to its knees. Even very experienced developers can put ugly queries in their code by accident.

To avoid unoptimized queries, proactive measures are required. First, you must have a sense of how many rows the query you are writing will have to examine. When in doubt, it is a good idea to try the EXPLAIN command on the query in question.

Additionally, you should run your development server with the log-long-format and log-slow-queries options, which in combination will log all the queries that are not using a key to the slow log. You should then run EXPLAIN on all the queries you find in the slow log as you develop your application and see what keys you need to add. In some cases, you may find it acceptable for a query to scan the entire table, but most of the time this is something you should avoid.
It is a good idea to add a /* slow */ comment to your query so that, in the slow query log, you can easily distinguish between the queries that are supposed to be slow and those that are slow by mistake.

Avoiding Unnecessary Queries

Although unnecessary queries are usually not as destructive to the server’s sanity as slow queries, they still take the edge off your system’s fighting capacity. Of course, if the unnecessary query is also slow, you are dealing with double trouble; but otherwise, the server can usually survive this kind of abuse.

Unnecessary queries usually result from errors and oversights in the application design process. Developers may forget they already have retrieved a certain piece of data and can use the value stored on the client. Or, perhaps they do not think the data will need to be reused in other parts of the code, and either do not store it after retrieving it, or do not retrieve it at all (when it could have been retrieved and stored easily without slowing the original query much). Experienced developers, not just novices, make errors of this kind, and the best time to catch the mistakes is during development.

It’s helpful to learn to visualize your paycheck being reduced by a certain amount for every query you execute in proportion to the number of rows the query has to examine: let’s say one cent per row, and ten cents for initiating a query. (Creative managers might consider making this more than a game; have a set bonus amount that is reduced for inefficiency and also for each day past the deadline.)

In addition to the suggested mental exercise, it is also helpful to enable the log option on the development server, which will log all the queries into the usage log along with the connection ID. Then, periodically run the application and examine the log to evaluate the sanity of the query sequence. This process usually makes it easy to spot unnecessarily repeated queries.
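The log inspection described above can be partly automated. The following is a minimal sketch in C, assuming the query strings issued during one request have already been extracted from the log into an array; the log format itself is not specified in the text, so parsing is left out. The function name count_repeated_queries is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns how many entries in queries[0..n-1] are exact repeats of an
 * earlier entry. The comparison is quadratic, which is acceptable for a
 * development-sized log. */
size_t count_repeated_queries(const char *const *queries, size_t n)
{
    size_t repeats = 0;

    for (size_t i = 1; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
            if (strcmp(queries[i], queries[j]) == 0) {
                repeats++;
                break;          /* count each repeated entry only once */
            }
        }
    }
    return repeats;
}
```

A non-zero result for the queries of a single request is a hint that some value could have been kept on the client instead of being fetched again. Exact string comparison is a deliberate simplification; queries that differ only in whitespace or literal values are treated as different.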
Beginning in version 4.0.1, MySQL has a query cache that alleviates the burden of running the same query repeatedly. However, as of this writing the 4.0 branch is still in beta and may take a long time to fully stabilize.

Avoiding Unnecessary Dynamic Execution

In many cases, you can greatly optimize an application by caching the content. A classic example is a news site. News items arrive fairly often, but rarely more often than once a second. On the other hand, the Web site could easily receive several hits per second. A straightforward approach to a news Web site is to generate the news HTML by querying the database on every hit. Sometimes this approach gives sufficient performance, especially if the news queries are optimized. However, the load may be so high that the developers need to look for an optimization.

A more efficient approach with little extra code involves first noticing that a functional dependency exists between the data in the database and the content of the page. In other words, the content of the news page changes only when the database is modified. You can take advantage of this observation by making the news page static and regenerating it every time the database is updated.

The static-when-possible approach has two advantages. First, you avoid unnecessary database queries. Second, even without the database queries, serving a static page requires significantly fewer CPU resources and somewhat less memory. The actual performance improvement from switching to static pages will, of course, depend on the application. As a semi-educated guess, I would expect on average a three- to tenfold speed increase.

Using Persistent Connections

Some languages, such as PHP, allow you the option of being persistently connected to MySQL, as opposed to connecting at the start of the request before the first database access and disconnecting once the request is over.
The Apache Web server process that has just handled the request continues to run while waiting to handle another request; so, in some cases it makes sense to not disconnect from the database at all because the next request will come soon. The process stays connected for its entire lifetime, which in Apache is controlled by the value of MaxRequestsPerChild.

This approach spares MySQL the connection setup overhead, which could be significant on systems that have a problem creating threads under high load (for example, Linux 2.2). Even if thread creation is fast, maintaining a persistent connection still reduces network traffic and saves CPU cycles.

Unfortunately, the disadvantage of using persistent connections is the following common scenario: A large number of Web server children are spawned, each connecting to the database. Although as a whole the Web server is busy, each client performs a query only once in a while. This situation creates a large number of connections that are idle most of the time. Although an idle connection consumes memory and CPU resources, the amounts are not significant. The real problem is that the database server reaches its connection limit.

To keep this from happening, you must make sure the sum of the maximum number of concurrent clients for each Web server connecting to the database (on Apache, the setting is MaxClients) does not exceed the value of max_connections on the MySQL server. For example, if you have a Web server farm with three servers, each having a MaxClients value of 256, your MySQL server should have max_connections set to at least 3 * 256 = 768.

This rule applies even if you are not using persistent connections, although not using persistent connections is less likely to expose the problem if you break that rule. In practice, the rule is often not followed, but everything works fine for a while.
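The sizing rule above amounts to a sum and a comparison, which a deployment check could compute as in the following C sketch. The function names are hypothetical, and the values are assumed to have been read from the Apache and MySQL configurations by other means.

```c
#include <stddef.h>

/* Sum of MaxClients over every Web server in the farm. This is the minimum
 * value max_connections needs on the MySQL server. */
unsigned long required_max_connections(const unsigned int *max_clients,
                                       size_t servers)
{
    unsigned long total = 0;

    for (size_t i = 0; i < servers; i++)
        total += max_clients[i];
    return total;
}

/* Returns 1 if the configured max_connections satisfies the rule, else 0. */
int connection_limit_ok(const unsigned int *max_clients, size_t servers,
                        unsigned long max_connections)
{
    return required_max_connections(max_clients, servers) <= max_connections;
}
```

For the farm in the example (three servers with MaxClients set to 256 each), required_max_connections returns 768, so any max_connections value below that fails the check.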
Trouble starts when somebody drops a key or writes a very slow query and adds it to a live application. The database server begins to get overloaded because each client stays connected for a long time. This situation drives the number of concurrent connections up past the limit, and eventually new connection attempts result in an error.

You can mitigate the idle connection issues by setting the value of wait_timeout sufficiently low, perhaps 15 seconds. Then, all the connections that have remained idle for more than 15 seconds will be terminated, and the unfortunate client will have to reconnect. This approach accomplishes the desired goal only when most clients perform queries at intervals shorter than 15 seconds.

Stress-Testing a Web Application

It is difficult to overestimate the importance of running a stress test. Many problems that are discovered in production could have been detected and corrected with even minimal stress testing. Although no stress test can possibly simulate exactly what will happen when an application goes live, carefully chosen stress tests can help identify ugly application bottlenecks.

Using ApacheBench

A minimum approach to stress testing involves running the ApacheBench tool (ab) on a fixed URL. The tool comes packaged with the Apache distribution and is installed by default on most Linux systems.
You can run a total of 5000 requests coming from 10 concurrent clients as follows:

    ab -c 10 -n 5000 http://localhost/

The output will be similar to this:

    This is ApacheBench, Version 1.3c <$Revision: 1.38 $> apache-1.3
    Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Copyright (c) 1998-1999 The Apache Group, http://www.apache.org/

    Server Software:        Apache/1.3.12
    Server Hostname:        localhost
    Server Port:            80

    Document Path:          /
    Document Length:        157 bytes

    Concurrency Level:      10
    Time taken for tests:   8.275 seconds
    Complete requests:      5000
    Failed requests:        0
    Total transferred:      2395958 bytes
    HTML transferred:       785314 bytes
    Requests per second:    604.23
    Transfer rate:          289.54 kb/s received

    Connection Times (ms)
                  min   avg   max
    Connect:        0     3    16
    Processing:     4    12   108
    Total:          4    15   124

One common mistake is to forget the slash (/) at the end of the URL. You will get an error message if you do, at least in the version of ApacheBench I have tried.

ApacheBench has a number of other options, which you can view by executing it with the -h option:

    Usage: ab [options] [http://]hostname[:port]/path
    Options are:
        -n requests     Number of requests to perform
        -c concurrency  Number of multiple requests to make
        -t timelimit    Seconds to max. wait for responses
        -p postfile     File containing data to POST
        -T content-type Content-type header for POSTing
        -v verbosity    How much troubleshooting info to print
        -w              Print out results in HTML tables
        -i              Use HEAD instead of GET
        -x attributes   String to insert as table attributes
        -y attributes   String to insert as tr attributes
        -z attributes   String to insert as td or th attributes
        -C attribute    Add cookie, eg. 'Apache=1234. (repeatable)
        -H attribute    Add Arbitrary header line, eg. 'Accept-Encoding: zop'
                        Inserted after all normal header lines. (repeatable)
        -A attribute    Add Basic WWW Authentication, the attributes
                        are a colon separated username and password.
        -P attribute    Add Basic Proxy Authentication, the attributes
                        are a colon separated username and password.
        -V              Print version number and exit
        -k              Use HTTP KeepAlive feature
        -h              Display usage information (this message)

Other Approaches

The disadvantage of relying on ApacheBench alone is that it cannot perform dynamic Web requests: it cannot, for example, iterate through a dictionary of possible values for form inputs in order to simulate a more random load. For some applications, this limitation might not make much difference, but for others the lack of variation in the request may seriously misrepresent the application’s capacity.

You can perform more thorough stress testing by doing the following:

1. Find a way to log requests that preserves form variables with their values. For example, your Web code can include a special debugging option that dumps all the variables, or you can do something as sophisticated as modifying the source of the Squid proxy to log POST request bodies. If you are not using POST forms in your code, you can simply use your Web access log.
2. Connect to your application from a browser and manually perform as many frequently run operations as possible as if you were a user.
3. Extract the requests from your logs. Replace the actual values of the form inputs with a special placeholder tag that indicates it is to be replaced with a dictionary value. For each form input, have a separate dictionary reference. For example, replace

       first_name=Steve&last_name=Jones

   with

       first_name=${first_name}&last_name=${last_name}

4. Create dictionary files for each form input, populating them with a wide range of possible values.
5. Write a program that parses your log files and plays back the requests randomly, replacing dictionary references with the appropriate values from the dictionary. You may want to use ApacheBench as a base; or, if you prefer Perl, you can use the LWP::UserAgent module.
6. Run the load, see what happens, and then fix the bugs in your application.

If a commercial solution exists that will let you do all this with less hassle, I am not aware of it. (Perhaps this is because I have always found it easier and quicker to solve a problem by downloading free tools from the Internet and making up for the missing functionality with my own code than spending time looking for a commercial solution that will do the job for me.)

CHAPTER 8
C/C++ Client Basics

The purpose of this chapter is to give you a head start on writing MySQL client applications in C/C++. In all truth, this chapter is about C. Although there exists a separate C++ API called MySQL++, it is not very stable, and is rather redundant because the C API works perfectly with C++ code. If you plan to write a client in C++, I recommend that you use the direct C API interface described in this chapter.

Even if you do not plan to develop in C/C++, it is still important to understand the C API because it serves as a basis for all other interfaces (with the exception of Java). All other interfaces are simply wrappers around the C API calls.

This chapter provides instructions for setting up your system, a concise introduction to the API, and then a fully functional sample program. We conclude the chapter with a few useful tips.

Preparing Your System

As far as system administration is concerned, all you need to do to write MySQL applications in C/C++ is install the client library file and the headers. If you are using a binary distribution on Unix, the library will be located in /usr/local/mysql/lib/mysql/ and the header files in /usr/local/mysql/include/. The same locations apply if you have installed from source using configure defaults (except that, unlike the binary distribution, the source distribution by default compiles and installs a shared library instead of a static library).

[...]

Structures and Functions of the API

... more information at www.mysql.com/doc/en/Debugging_client.html

int mysql_dump_debug_info(MYSQL *mysql)
Causes the server to write debugging information to the MySQL error log.

unsigned int mysql_errno(MYSQL *mysql)
Returns the error code for the most recently executed MySQL function. If the previous function succeeded, it returns 0.

char *mysql_error(MYSQL *mysql)
Returns the error...

... examples, at www.mysql.com/documentation/mysql/bychapter/manual_Clients.html#C

my_ulonglong mysql_affected_rows(MYSQL *mysql)
Returns a count of the rows affected by an INSERT, UPDATE, or DELETE query.

my_bool mysql_change_user(MYSQL *mysql, const char *username, const char *password, const char *database)
Changes the current user to a different one.

const char *mysql_character_set_name(MYSQL *mysql)
Returns...

... result set was retrieved with mysql_store_result()

MYSQL_ROW_OFFSET mysql_row_tell(MYSQL_RES *result)
Returns the current row pointer in the result set.

int mysql_select_db(MYSQL *mysql, const char *database)
Attempts to change the current database to the value specified by the database argument. Returns 0 on success and a non-zero value on failure.

int mysql_shutdown(MYSQL *mysql)
Tells the server to shut...

... function returns NULL. mysql_field_count() can be used to test whether the last query was supposed to return a result set. If it was, mysql_field_count() returns a non-zero value.

unsigned long mysql_thread_id(MYSQL *mysql)
Returns the thread ID of the connection specified by the argument. The return value can be used as an argument to mysql_kill().

MYSQL_RES *mysql_use_result(MYSQL *mysql)
Allocates a result...

... *result)
Returns the number of rows in the result set.

int mysql_options(MYSQL *mysql, enum mysql_option option, const char *arg)
Sets the options for a connection to the database. You should specify the options before making the connection. More detailed documentation is available at www.mysql.com/doc/en/mysql_options.html

int mysql_ping(MYSQL *mysql)
Pings a database server. If the server is alive, the...

... query

int mysql_reload(MYSQL *mysql)
Causes the server to reload the grant tables.

MYSQL_ROW_OFFSET mysql_row_seek(MYSQL_RES *result, MYSQL_ROW_OFFSET offset)
Moves the current row pointer in the result set to the row specified by the offset. The second argument is a pointer to a structure, not just a number. It must be the return value of mysql_row_tell() or mysql_row_seek()...

... the previous value of the field cursor

MYSQL_FIELD_OFFSET mysql_field_tell(MYSQL_RES *result)
Returns the current value of the field cursor.

void mysql_free_result(MYSQL_RES *result)
Frees all of the memory associated with the result set.

char *mysql_get_client_info(void)
Returns a string with the version of the client library in use.

char *mysql_get_host_info(MYSQL *mysql)
Returns a string describing the...

... privileges. Returns 0 on success and a non-zero value on failure.

char *mysql_stat(MYSQL *mysql)
Returns a string with server statistics such as uptime, number of queries since startup, and number of queries per second. Returns NULL on failure.

MYSQL_RES *mysql_store_result(MYSQL *mysql)
Should be called immediately after mysql_query() or mysql_real_query(). Retrieves the entire result set from the server...

... server

unsigned int mysql_get_proto_info(MYSQL *mysql)
Returns the client-server protocol version used in the connection associated with the argument.

char *mysql_get_server_info(MYSQL *mysql)
Returns a string containing the server version of the connection described by the argument.

char *mysql_info(MYSQL *mysql)
Returns a string containing some information about...
    ... to call die() before a call to mysql_init() as long as we do it
       with flags set to 0. If one of the flags is set, we will be using
       the MYSQL structure, which needs to be initialized first */
    if ((flags & DIE_MYSQL_ERROR))
        fprintf(stderr, " MySQL errno = %d: %s", mysql_errno(&mysql),
                mysql_error(&mysql));
    /* Note the order - we first print the error, and only then call
       mysql_close(). If we did it in reverse ...

... is NULL.

MYSQL_RES *mysql_list_processes(MYSQL *mysql)
Returns a result set with a list of all threads currently executing on the database server.

MYSQL_RES *mysql_list_tables(MYSQL *mysql, const ...