Chapter 2: Working with Boost Threads

        BOOST_FOREACH(biglong prime, primes)
        {
            if (count < 100)
                std::cout << prime << ",";
            else if (count == primeCount-100)
                std::cout << "\n\nLast 100 primes:\n";
            else if (count > primeCount-100)
                std::cout << prime << ",";
            count++;
        }
        std::cout << "\n";
        system("pause");
        return 0;
    }

This new version of the primality test replaces the core loop of the findPrimes() function. Previously, the variable testDivisor was incremented until the root of a candidate was reached, to test for primality. Now, testDivisor is the loop variable in a BOOST_FOREACH loop which pulls previously stored primes out of the list. This is a significant improvement over blindly testing every divisor from 2 up to the root of a candidate.

What about the results? As Figure 2.4 shows, the runtime for a 10 million candidate test is down from 22 seconds to 4.7 seconds! This is a new throughput of 141,369 primes per second, nearly five times faster.

Optimizing the Primality Test: Odd Candidates

There is no need to test even candidates because, apart from 2, they will never be prime anyway!
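Combined with the stored-primes idea, skipping even candidates might look like the following minimal sketch. It is not the book's listing: the function name collectPrimes is hypothetical, and it uses std::vector with an index loop where the chapter uses std::list and BOOST_FOREACH. The value 2 is stored up front, and divisor testing starts at the second stored prime (3) because an odd candidate is never divisible by 2.

```cpp
#include <cstddef>
#include <vector>

// 64-bit unsigned type, matching the chapter's biglong typedef.
typedef unsigned long long biglong;

// Hypothetical helper (not from the book): returns all primes <= limit.
// Only odd candidates are tested, and only previously found primes are
// used as divisors, stopping once the divisor passes the square root.
std::vector<biglong> collectPrimes(biglong limit)
{
    std::vector<biglong> primes;
    if (limit < 2) return primes;
    primes.push_back(2);                      // recorded up front, never tested

    for (biglong candidate = 3; candidate <= limit; candidate += 2) {
        bool isPrime = true;
        // Start at index 1 (the prime 3): odd candidates cannot be divisible by 2.
        for (std::size_t i = 1; i < primes.size(); ++i) {
            biglong divisor = primes[i];
            if (divisor * divisor > candidate) break;   // past the square root
            if (candidate % divisor == 0) { isPrime = false; break; }
        }
        if (isPrime) primes.push_back(candidate);
    }
    return primes;
}
```

For example, collectPrimes(30) yields the ten primes from 2 through 29.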
We can start testing divisors and candidates at 3, rather than 2, and then increment candidates by 2 so that the evens are skipped entirely. We will just have to print out "2" first since it is no longer being tested, but that's no big deal. Here is the improved version. This project is called Prime Number Test.

Figure 2.4: Using primes as divisors improves performance nearly five-fold.

    #include <iostream>
    #include <list>
    #include <cstdlib>
    #include <boost/foreach.hpp>
    #include <boost/timer.hpp>

    //declare a 64-bit long integer type
    typedef unsigned long long biglong;

    const long MILLION = 1000000;
    biglong highestPrime = 10*MILLION;
    boost::timer timer1;
    std::list<biglong> primes;

    long findPrimes(biglong rangeFirst, biglong rangeLast)
    {
        long count = 0;
        biglong candidate = rangeFirst;
        if (candidate < 3) candidate = 3;
        //2 is prime but is no longer tested, so store it up front
        primes.push_back(2);
        while(candidate [...]

Chapter 4: Working with POSIX Threads

Introducing the POSIX Threads Library

        [...] 0) //second option
        //{
        //use the idle time
        //}
        pthread_mutex_destroy(&mutex);
        cout << "Counter = " << counter << endl;
        cout << "Run time = " << timer1.elapsed() << endl;
        system("pause");
        return 0;
    }

Here is the output from the ThreadDemo program with 8 threads. Note that each thread reports about the same runtime, because each thread continues to increment the counter until the threshold is reached. As a result, the final counter value will be a bit higher than the target value while the rest of the threads finish one last loop.

    Thread Demo
    Creating thread
    Creating thread
    Creating thread
    Creating thread
    Creating thread
    Creating thread
    Creating thread
    Creating thread
    Done creating threads
    Waiting for threads
    thread time = 17.993
    thread time = 17.96
    thread time = 17.977
    thread time = 17.995
    thread time = 17.997
    thread time = 17.984
    thread time = 18
    thread time = 17.986
    Counter = 100000007
    Run time = 18.021

Here is the output from the program using only 4 threads. This version is a bit more effective since the PC being used in this example is a quad-core Intel Q6600. If you have a CPU with eight or more hardware threads, such as an Intel Core i7, then you will see an improvement to the 8-thread version of the program.

    Thread Demo
    Creating thread
    Creating thread
    Creating thread
    Creating thread
    Done creating threads
    Waiting for threads
    thread time = 13.029
    thread time = 13.025
    thread time = 13.027
    thread time = 13.024
    Counter = 100000004
    Run time = 13.04

Comparing Single-thread Results

Now that we have some good data representing multi-threaded results, let's take a look at the output from a single-thread run of the program.

    Thread Demo
    Creating thread
    Done creating threads
    Waiting for threads
    thread time = 4.174
    Counter = 100000001
    Run time = 4.178

Oh no! This result is about three to four times faster than the more highly threaded versions of the ThreadDemo program! And I was feeling pretty confident about the results. This just goes to show that you must test your results several ways before deciding on a technique to use.

So, why were the threaded results so much slower than the single-thread version? The slowdown occurs inside the thread function due to the mutex locks. The while loop running inside that thread function is very tight and fast, meaning there are a lot of mutex roadblocks preventing threads from processing the counter variable. What alternatives are there to this problem?
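To make the contention concrete, here is a minimal sketch of the pattern described above, in which every single increment acquires and releases the mutex. It is an assumption about the shape of the slow loop, not the ThreadDemo source; SharedCounter, contendedWorker, and countWithContention are hypothetical names. Unlike the program whose output is shown earlier, this sketch tests the limit while holding the lock, so it stops exactly at the target.

```cpp
#include <pthread.h>
#include <vector>

// Hypothetical shared state: one counter guarded by one mutex.
struct SharedCounter {
    pthread_mutex_t mutex;
    long value;
    long target;
};

// Every iteration locks, increments once, and unlocks. With a loop this
// tight, threads spend most of their time waiting on each other.
void* contendedWorker(void* data)
{
    SharedCounter* shared = static_cast<SharedCounter*>(data);
    bool done = false;
    while (!done) {
        pthread_mutex_lock(&shared->mutex);
        if (shared->value < shared->target)
            shared->value++;
        else
            done = true;
        pthread_mutex_unlock(&shared->mutex);
    }
    return 0;
}

// Runs threadCount workers against one shared counter and returns the
// final value once every worker has been joined.
long countWithContention(int threadCount, long target)
{
    SharedCounter shared;
    pthread_mutex_init(&shared.mutex, NULL);
    shared.value = 0;
    shared.target = target;

    std::vector<pthread_t> threads(threadCount);
    for (int n = 0; n < threadCount; n++)
        pthread_create(&threads[n], NULL, contendedWorker, &shared);
    for (int n = 0; n < threadCount; n++)
        pthread_join(threads[n], NULL);

    pthread_mutex_destroy(&shared.mutex);
    return shared.value;
}
```

The result is correct for any thread count, but the work is serialized by the lock, which is consistent with the single-thread run being the fastest of the three timings above.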
We ran into a similar problem back in the boost::thread chapter, as you may recall. First of all, we are leaving all of the "thinking" in this program to the thread function: counting to 100 million with the conditional logic inside the while loop. It's better if a thread is allowed to run without having to "think" very much. So, what we need to do is split up the problem so that each thread can crunch the numbers exclusively, without sharing or mutex problems. There will be just one time when the mutex is used: to update our global counter variable with the local counter used in the function (without any concern for conflicts). Let's update the code.

    #include <pthread.h>
    #pragma comment(lib,"pthreadVC2.lib")
    #include <iostream>
    #include <cstdlib>
    using namespace std;
    #include <boost/timer.hpp>

    const long MAX = 100 * 1000000;
    long counter = 0;
    const int THREADS = 4;
    pthread_t threads[THREADS];
    int thread_ids[THREADS];   //one stable id slot per thread
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    bool done = false;

    void* thread_function(void* data)
    {
        boost::timer t;
        int id = *((int*)data);
        long local_counter = 0;
        long range = MAX / THREADS;

        //count privately; a local variable needs no lock
        for (long n = 0; n < range; n++)
        {
            local_counter++;
        }

        pthread_mutex_lock(&mutex);
        //update global counter
        counter += local_counter;
        if (counter > MAX) done = true;
        cout << "counter = " << counter << endl;
        pthread_mutex_unlock(&mutex);

        cout << "thread " << id << " time = " << t.elapsed() << endl;
        return 0;
    }

    int main(int argc, char* argv[])
    {
        cout << "Thread Demo 2" << endl;
        boost::timer timer1;

        //create the thread(s)
        for (int id = 0; id < THREADS; id++)
        {
            cout << "Creating thread " << id << endl;
            //pass a stable address; the loop variable itself changes
            //while the threads are still starting up
            thread_ids[id] = id;
            pthread_create(&threads[id], NULL, thread_function, (void*)&thread_ids[id]);
        }
        cout << "Done creating threads" << endl;
        cout << "Waiting for threads." << endl;

        //wait for all threads to complete
        for (int n = 0; n < THREADS; n++)
            pthread_join(threads[n], 0);

        pthread_mutex_destroy(&mutex);
        cout << "Counter = " << counter << endl;
        cout << "Run time = " << timer1.elapsed() << endl;
        system("pause");
        return 0;
    }

This new version, called ThreadDemo 2, produces slightly improved results (to put it mildly!). Here is a new 4-thread result that's more realistic since thread locking is not hampering performance. The runtime is 0.077 seconds (77 ms).

    Thread Demo
    Creating thread
    Creating thread
    Creating thread
    Creating thread
    Done creating threads
    Waiting for threads
    counter = 25000000
    thread time = 0.06
    counter = 50000000
    thread time = 0.062
    counter = 75000000
    thread time = 0.058
    counter = 100000000
    thread time = 0.065
    Counter = 100000000
    Run time = 0.077

And, for comparison, let's run the 8-thread version with a counter target of 10 billion (up from 100 million). The result is only 1.094 seconds! In this new example, having more threads actually helped, because, as you can see, the first few threads finished before the main program had a chance to create all of the threads!
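One caveat about the 10-billion run is worth checking before comparing its time with the others. The listing declares MAX and counter as long, which is 32 bits wide under Visual C++ on Windows, and the output of this run ends with Counter = 1410065408. That number is exactly 10,000,000,000 modulo 2^32, so the run appears to have counted to roughly 1.41 billion rather than 10 billion. The arithmetic can be sketched as follows; low32, requestedMax, effectiveMax, and perThread are hypothetical names used only for this illustration.

```cpp
#include <cstdint>

// Hypothetical helper: keep only the low 32 bits of a 64-bit value, which is
// what survives when the value is stored in a 32-bit integer.
constexpr std::uint32_t low32(std::uint64_t value)
{
    return static_cast<std::uint32_t>(value);
}

constexpr std::uint64_t requestedMax = 10000000000ULL;       // 10 billion
constexpr std::uint32_t effectiveMax = low32(requestedMax);  // what a 32-bit long holds
constexpr std::uint32_t perThread    = effectiveMax / 8;     // share for each of 8 threads

// Checked at compile time against the numbers printed by the 8-thread run.
static_assert(effectiveMax == 1410065408u, "final counter in the output");
static_assert(perThread == 176258176u, "first running total in the output");
```

Declaring the counters as long long (or with the biglong typedef used in the Boost chapter) would avoid the wraparound and let the run reach the intended target.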
What we're seeing here is pipeline optimizations occurring inside the CPU, thanks to an optimizing compiler. Be sure to run your performance code with a Release build.

    Thread Demo
    Creating thread
    Creating thread
    Creating thread
    Creating thread
    Creating thread
    Creating thread
    Creating thread
    Creating thread
    Done creating threads
    Waiting for threads
    counter = 176258176
    thread time = 0.566
    counter = 352516352
    thread time = 0.754
    counter = 528774528
    thread time = 0.791
    counter = 705032704
    thread time = 0.874
    counter = 881290880
    thread time = 0.863
    counter = 1057549056
    thread time = 0.972
    counter = 1233807232
    thread time = 1.064
    counter = 1410065408
    thread time = 1.002
    Counter = 1410065408
    Runtime = 1.094

Table 4.1 ThreadDemo Results (Intel Q6600 CPU*)

    MAX               VERSION           THREADS   TIME (sec)
    100,000,000       global counter    4         13.04
    100,000,000       global counter    8         18.021
    100,000,000       global counter    1         4.178
    100,000,000       local counter     4         0.077
    100,000,000       local counter               0.092
    100,000,000       local counter               0.275
    1,000,000,000     local counter               0.822
    1,000,000,000     local counter               0.741
    1,000,000,000     local counter               2.635
    10,000,000,000    local counter               1.028
    10,000,000,000    local counter     8         1.094
    10,000,000,000    local counter               3.73

*Note: For a precise comparison with other systems, this Core2Quad Q6600 CPU has been overclocked from the base 2.4 to 2.6 GHz.

Table 4.1 shows the results of the program running with various MAX and THREADS values. Included in the table are results for much higher MAX ranges just for a general comparison.

Summary

Writing a multi-threaded game is now even closer to reality with the new knowledge gained in this chapter about Pthreads. Performing a mere accumulation on a variable is hardly a test worthy of benchmark comparison, but the examples were meant to be simple to understand rather than technically intriguing. I recommend performing some real calculations in the thread worker function to put the CPU cores and Pthreads code to the test more effectively. Next, it's up to you to ultimately decide which of the four threading libraries is most effective: Boost threads, OpenMP, Windows threads, or Pthreads? We have yet to cover Windows threads in any detail, so that is the topic for the next chapter.

References

"Pthreads-Win32"; http://sourceware.org/pthreads-win32/
"Stream processing"; http://en.wikipedia.org/wiki/Stream_processing
Butenhof, David R. Programming with POSIX Threads. Unknown City: Addison-Wesley. 1997.
"Pthreads Tutorial"; http://students.cs.byu.edu/~cs460ta/cs460/labs/pthreads.html

Working with Windows Threads

This chapter explores the threading library built into Windows. Using Windows threads is a bit easier than either POSIX threads or Boost threads because Windows threading is already installed with Visual C++ and available via windows.h, so there is no extra effort needed on our part to use Windows threads. We'll see how to create and control a thread in this chapter, which will be familiar to you by now after covering both POSIX threads and Boost threads previously. The differences in the thread support built into Windows are minimal, as the same concepts are found here.

Topics covered in this chapter include:

- Exploring Windows threads
- Creating a thread
- Controlling thread execution
- The thread function
- Thread function parameters

Exploring Windows Threads

The Windows Platform SDK and the other support files installed with Visual C++ provide support for threads via just the windows.h header file. We will learn how to invoke a new thread, via a thread function, and control it to a limited degree.
First, let's begin with a simple example to give you a feel for the overall process of using Windows threads.

Quick Example

The simplest example of using a Windows thread involves creating a new thread with the CreateThread() function and then calling the WaitForSingleObject() function to pause execution of the main program until the thread has finished running (when the thread function returns). I always like to see a quick, simple example of a new library or algorithm, so here is one such example for you. We'll go over Windows threads in more detail and cover additional features in the next section.

    #include <windows.h>
    #include <iostream>

    DWORD WINAPI threadFunction1( void* data )
    {
        std::cout << "threadFunction1 running\n";
        return 0;
    }

    int main(int argc, char* argv[])
    {
        HANDLE thread1 = NULL;
        std::cout << "creating thread\n";
        thread1 = CreateThread(NULL,0,&threadFunction1,NULL,0,NULL);
        //INFINITE blocks until the thread function returns;
        //a timeout of 0 would return immediately
        WaitForSingleObject(thread1,INFINITE);
        CloseHandle(thread1);
        return 0;
    }

That quick example produces the following output. However, if the thread starts running before the console finishes printing the text passed to std::cout, it's possible that the two lines could be garbled together; this happens frequently when working with threads.

    creating thread
    threadFunction1 running

Creating a Thread

The CreateThread() function is used to create a thread:

    HANDLE WINAPI CreateThread(
        LPSECURITY_ATTRIBUTES lpThreadAttributes,
        SIZE_T dwStackSize,
        LPTHREAD_START_ROUTINE lpStartAddress,
        LPVOID lpParameter,
        DWORD dwCreationFlags,
        LPDWORD lpThreadId
    );

The parameters of CreateThread() are explained in the following table.

    lpThreadAttributes    Security attributes (usually NULL)
    dwStackSize           Initial stack size (0 for default)
    lpStartAddress        Address of the thread function
    lpParameter           Pointer to parameter variable passed to function
    dwCreationFlags       Thread creation flags (usually 0)
    lpThreadId            Pointer to a variable for the thread id
Every thread is referred to through a HANDLE variable (a renamed void* pointer), and that variable must remain available for as long as you need to control the thread, so it is usually global or managed by a class.

    HANDLE thread1 = NULL;

If you want to know the identifier value assigned to a new thread, that value is passed back through the last parameter. The parameter type is LPDWORD, which is defined as a DWORD*, or unsigned long*, so declare a DWORD variable and pass its address. (Passing a NULL pointer is allowed, but then no identifier is returned.)

    DWORD threadid = 0;

Sample usage:

    thread1 = CreateThread(NULL,0,&threadFunction1,NULL,0,&threadid);

If the return value (thread1) is NULL, then an error occurred during the thread creation process. This is rare. Typical causes are a bad memory location specified for the thread function, thread parameter, or thread identifier, or the system running low on resources.

Controlling Thread Execution

The dwCreationFlags parameter is of particular interest, since we can use the constant CREATE_SUSPENDED to create a thread that starts off in a suspended state, waiting for us to call ResumeThread() to allow it to launch. This can be helpful if we want to do any housekeeping before the thread function begins running.

    DWORD WINAPI ResumeThread( HANDLE hThread );

We can control the execution of a thread by using the CREATE_SUSPENDED flag as a parameter when calling CreateThread(). First, we create the new thread:

    thread1 = CreateThread(NULL,0,&threadFunction1,NULL,CREATE_SUSPENDED,&threadid);

Then we start it running by calling ResumeThread():

    ResumeThread(thread1);

The Thread Function

The thread function starts executing when the thread is created (or resumed, if it was created suspended). Normally, a thread function will have some distinct process to run, such as sorting data in a list or reading a file. But often a thread function will contain a while loop and it will continue to run for a longer period in parallel with the main program. When this is the case, we get into difficulties with sharing global variables: no two threads can safely modify the same memory location
at the same time without synchronization. An unsynchronized write is a data race that can corrupt data or crash the program.

Here is the definition of a thread function with the correct return type and parameter type. Use the address of your thread function name when creating a new thread. The LPVOID parameter is defined as void*.

    DWORD WINAPI function( LPVOID data );

Thread Function Parameters

We have been passing NULL for the thread function parameter when creating a new thread, but this parameter can be a variable declared as a simple data type or a struct. There are many reasons why you might need to pass data into a thread function: to specify how many items to process, or a pointer to a buffer in memory, for instance. The only real drawback to using parameters with Windows threads is the ugly way in which the parameter data must be created (on the heap), maintained via pointers, and destroyed (from the heap) afterward. In this respect, Windows thread programming is not as convenient as either POSIX or Boost. Here is an example:

    typedef struct MyParam
    {
        int value1;
        int value2;
    } MYPARAM, *PMYPARAM;

We can create a variable using this custom struct and pass it to CreateThread(), which will make the values available to the thread function (as a void* buffer). We must allocate the parameter struct variable on the heap and pass it to the thread function as a pointer. This is handled by the HeapAlloc() and HeapFree() functions. Note that the size to allocate is that of the struct (MYPARAM), not of the pointer type (PMYPARAM).

    PMYPARAM param;
    param = (PMYPARAM)HeapAlloc(GetProcessHeap(),HEAP_ZERO_MEMORY,sizeof(MYPARAM));

When finished with the parameter data, free it with HeapFree():

    HeapFree(GetProcessHeap(), 0, param);

Let's see a complete example:

    #include <windows.h>
    #include <iostream>
    using namespace std;

    typedef struct MyParam
    {
        int value1;
        int value2;
    } MYPARAM, *PMYPARAM;

    DWORD WINAPI threadFunction1( void* data )
    {
        cout << "thread function running\n";
        //recover the typed pointer from the void* parameter
        PMYPARAM param = (PMYPARAM)data;
        cout << "parameter: " << param->value1 << "," << param->value2 << endl;
        cout << "thread function end\n";
        return 0;
    }

    ...