Chapter 18. Performance Tuning

Performance tuning is a many-splendored thing. Just because Python is an interpreted language doesn't mean you shouldn't worry about code optimization. But don't worry about it too much.

18.1. Diving in

There are so many pitfalls involved in optimizing your code, it's hard to know where to start.

Let's start here: are you sure you need to do it at all? Is your code really so bad? Is it worth the time to tune it? Over the lifetime of your application, how much time is going to be spent running that code, compared to the time spent waiting for a remote database server, or waiting for user input?

Second, are you sure you're done coding? Premature optimization is like spreading frosting on a half-baked cake. You spend hours or days (or more) optimizing your code for performance, only to discover it doesn't do what you need it to do. That's time down the drain.

This is not to say that code optimization is worthless, but you need to look at the whole system and decide whether it's the best use of your time. Every minute you spend optimizing code is a minute you're not spending adding new features, or writing documentation, or playing with your kids, or writing unit tests.

Oh yes, unit tests. It should go without saying that you need a complete set of unit tests before you begin performance tuning. The last thing you need is to introduce new bugs while fiddling with your algorithms.

With these caveats in place, let's look at some techniques for optimizing Python code. The code in question is an implementation of the Soundex algorithm. Soundex was a method used in the early 20th century for categorizing surnames in the United States census. It grouped similar-sounding names together, so even if a name was misspelled, researchers had a chance of finding it. Soundex is still used today for much the same reason, although of course we use computerized database servers now. Most database servers include a Soundex function.
There are several subtle variations of the Soundex algorithm. This is the one used in this chapter:

1. Keep the first letter of the name as-is.
2. Convert the remaining letters to digits, according to a specific table:
   * B, F, P, and V become 1.
   * C, G, J, K, Q, S, X, and Z become 2.
   * D and T become 3.
   * L becomes 4.
   * M and N become 5.
   * R becomes 6.
   * All other letters become 9.
3. Remove consecutive duplicates.
4. Remove all 9s altogether.
5. If the result is shorter than four characters (the first letter plus three digits), pad the result with trailing zeros.
6. If the result is longer than four characters, discard everything after the fourth character.

For example, my name, Pilgrim, becomes P942695. That has no consecutive duplicates, so nothing to do there. Then you remove the 9s, leaving P4265. That's too long, so you discard the excess character, leaving P426.

Another example: Woo becomes W99, which becomes W9, which becomes W, which gets padded with zeros to become W000.

Here's a first attempt at a Soundex function:

Example 18.1. soundex/stage1/soundex1a.py

If you have not already done so, you can download this and other examples used in this book.

    import string, re

    charToSoundex = {"A": "9", "B": "1", "C": "2", "D": "3", "E": "9",
                     "F": "1", "G": "2", "H": "9", "I": "9", "J": "2",
                     "K": "2", "L": "4", "M": "5", "N": "5", "O": "9",
                     "P": "1", "Q": "2", "R": "6", "S": "2", "T": "3",
                     "U": "9", "V": "1", "W": "9", "X": "2", "Y": "9",
                     "Z": "2"}

    def soundex(source):
        "convert string to Soundex equivalent"

        # Soundex requirements:
        # source string must be at least 1 character
        # and must consist entirely of letters
        allChars = string.uppercase + string.lowercase
        if not re.search('^[%s]+$' % allChars, source):
            return "0000"

        # Soundex algorithm:
        # 1. make first character uppercase
        source = source[0].upper() + source[1:]

        # 2. translate all other characters to Soundex digits
        digits = source[0]
        for s in source[1:]:
            s = s.upper()
            digits += charToSoundex[s]

        # 3. remove consecutive duplicates
        digits2 = digits[0]
        for d in digits[1:]:
            if digits2[-1] != d:
                digits2 += d

        # 4. remove all "9"s
        digits3 = re.sub('9', '', digits2)

        # 5. pad end with "0"s to 4 characters
        while len(digits3) < 4:
            digits3 += "0"

        # 6. return first 4 characters
        return digits3[:4]

    if __name__ == '__main__':
        from timeit import Timer
        names = ('Woo', 'Pilgrim', 'Flingjingwaller')
        for name in names:
            statement = "soundex('%s')" % name
            t = Timer(statement, "from __main__ import soundex")
            print name.ljust(15), soundex(name), min(t.repeat())

Further Reading on Soundex

  * Soundexing and Genealogy gives a chronology of the evolution of the Soundex and its regional variations.

18.2. Using the timeit Module

The most important thing you need to know about optimizing Python code is that you shouldn't write your own timing function.

Timing short pieces of code is incredibly complex. How much processor time is your computer devoting to running this code? Are there things running in the background? Are you sure? Every modern computer has background processes running, some all the time, some intermittently. Cron jobs fire off at consistent intervals; background services occasionally “wake up” to do useful things like check for new mail, connect to instant messaging servers, check for application updates, scan for viruses, check whether a disk has been inserted into your CD drive in the last 100 nanoseconds, and so on. Before you start your timing tests, turn everything off and disconnect from the network. Then turn off all the things you forgot to turn off the first time, then turn off the service that's incessantly checking whether the network has come back yet, then ...

And then there's the matter of the variations introduced by the timing framework itself. Does the Python interpreter cache method name lookups? Does it cache code block compilations? Regular expressions? Will your code have side effects if run more than once?
Don't forget that you're dealing with small fractions of a second, so small mistakes in your timing framework will irreparably skew your results.

The Python community has a saying: “Python comes with batteries included.” Don't write your own timing framework. Python 2.3 comes with a perfectly good one called timeit.

Example 18.2. Introducing timeit

If you have not already done so, you can download this and other examples used in this book.

    >>> import timeit
    >>> t = timeit.Timer("soundex.soundex('Pilgrim')", [...]... 1000000))
    8.22203948912

Tip: The timeit module only works if you already know what piece of code you need to optimize. If you have a larger Python program and don't know where your performance problems are, check out the hotshot module.

18.3. Optimizing Regular Expressions

The first thing the Soundex function checks is whether the input is a nonempty string of letters. What's the best way to do this?

If you answered... in the corner and contemplate your bad instincts.

Regular expressions are almost never the right answer; they should be avoided whenever possible. Not only for performance reasons, but simply because they're difficult to debug and maintain. Also for performance reasons.

This code fragment from soundex/stage1/soundex1a.py checks whether the function argument source is a word made entirely of letters, with... return "0000"

How much did we gain by using this specific method in soundex1e.py? Quite a bit:

    C:\samples\soundex\stage1>python soundex1e.py
    Woo             W000 13.5069504644
    Pilgrim         P426 18.2199394057
    Flingjingwaller F452 28.9975225902

Example 18.3. Best Result So Far: soundex/stage1/soundex1e.py

    import string, re

    charToSoundex = {"A": "9", "B": "1", "C": "2", "D": "3", "E": "9", "F": "1", "G": "2", "H": "9", "I":...
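To see the argument against regular expressions in action, here is a minimal sketch written for Python 3 (the chapter's own listings are Python 2). It implements the "nonempty string of letters" check in two ways, once with a compiled regular expression and once with the str.isalpha() string method, and times each with timeit. The function names are hypothetical, and this is not the code of soundex1e.py. One assumption needs flagging: in Python 3, isalpha() also accepts non-ASCII letters such as "é". The sketch therefore adds str.isascii() (Python 3.7 or later) so that both checks accept exactly the same inputs.

```python
import re
import timeit

# Regular-expression check: fullmatch() requires the whole string to be
# one or more ASCII letters (unlike "$", it does not tolerate a trailing "\n").
_LETTERS_ONLY = re.compile(r"[A-Za-z]+")


def is_word_regex(source):
    """True if source is a non-empty string of ASCII letters (regex version)."""
    return _LETTERS_ONLY.fullmatch(source) is not None


def is_word_isalpha(source):
    """The same check using string methods only.

    "".isalpha() is False, so the empty string is rejected; isascii()
    (Python 3.7+) rejects letters outside A-Z/a-z, such as "é".
    """
    return source.isascii() and source.isalpha()


if __name__ == "__main__":
    for name in ("Woo", "Pilgrim", "Flingjingwaller"):
        for check in (is_word_regex, is_word_isalpha):
            # Best of three runs, mirroring min(t.repeat()) in the chapter.
            best = min(timeit.repeat(lambda: check(name), number=100000, repeat=3))
            print(name.ljust(15), check.__name__.ljust(16), "%.4f" % best)
```

The script prints its timings instead of asserting which check is faster, because the numbers depend on the interpreter and the machine.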
translates each character into the corresponding digit, according to the matrix defined by string.maketrans.

timeit shows that soundex2c.py is significantly faster than defining a dictionary and looping through the input and building the output incrementally:

    C:\samples\soundex\stage2>python soundex2c.py
    Woo             W000 11.437645008
    Pilgrim         P426 13.2825062962
    Flingjingwaller F452 18.5570110168

You're not going...

    print name.ljust(15), soundex(name), min(t.repeat())

18.5. Optimizing List Operations

The third step in the Soundex algorithm is eliminating consecutive duplicate digits. What's the best way to do this?

Here's the code we have so far, in soundex/stage2/soundex2c.py:

    digits2 = digits[0]
    for d in digits[1:]:
        if digits2[-1] != d:
            digits2 += d

Here are the performance results for soundex2c.py:

    C:\samples\soundex\stage2>python...

... is not faster:

    C:\samples\soundex\stage2>python soundex2a.py
    Woo             W000 15.0097526362
    Pilgrim         P426 19.254806407
    Flingjingwaller F452 29.3790847719

The overhead of the anonymous lambda function kills any performance you gain by dealing with the string as a list of characters.

soundex/stage2/soundex2b.py uses a list comprehension instead of map and lambda:

    source = source.upper()
    digits = source[0] + "".join([charToSoundex[c]...

    names = ('Woo', 'Pilgrim', 'Flingjingwaller')
    for name in names:
        statement = "soundex('%s')" % name
        t = Timer(statement, "from __main__ import soundex")
        print name.ljust(15), soundex(name), min(t.repeat())

18.4. Optimizing Dictionary Lookups

The second step of the Soundex algorithm is to convert characters to digits in a specific pattern. What's the best way to do this?

The most obvious solution is to define...
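The two approaches compared for step 2 can be sketched side by side in Python 3: looking up each character in a dictionary, versus translating the whole string at once. Python 3 no longer has the module-level string.maketrans() that soundex2c.py uses. Its counterpart is the built-in str.maketrans(), used together with str.translate(). The digit string below is the one used by soundex2c.py, the function names are hypothetical, and the sketch covers only steps 1 and 2 of the algorithm.

```python
import string
import timeit

# Soundex digit for each letter A..Z, in alphabetical order
# (the same 26-character string that soundex2c.py uses).
DIGITS = "91239129922455912623919292"

# Approach 1: a dictionary keyed by uppercase letter, looked up per character.
CHAR_TO_SOUNDEX = dict(zip(string.ascii_uppercase, DIGITS))

# Approach 2: a translation table for both cases, built once at import time.
TO_SOUNDEX_TABLE = str.maketrans(string.ascii_uppercase + string.ascii_lowercase,
                                 DIGITS * 2)


def digits_with_dict(source):
    """Uppercase first letter kept; the rest mapped one lookup at a time."""
    source = source.upper()
    return source[0] + "".join(CHAR_TO_SOUNDEX[c] for c in source[1:])


def digits_with_translate(source):
    """Uppercase first letter kept; the rest converted by one translate() call."""
    return source[0].upper() + source[1:].translate(TO_SOUNDEX_TABLE)


if __name__ == "__main__":
    for name in ("Woo", "Pilgrim", "Flingjingwaller"):
        for convert in (digits_with_dict, digits_with_translate):
            best = min(timeit.repeat(lambda: convert(name), number=100000, repeat=3))
            print(name.ljust(15), convert.__name__.ljust(22), convert(name), "%.4f" % best)
```

Both functions turn "Pilgrim" into "P942695" and "Woo" into "W99", matching the chapter's worked examples. The dictionary version still runs a Python-level loop over the characters. The translate() version hands the whole job to one specialized built-in, which is the chapter's advice for this step.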
    P426 13.2825062962
    Flingjingwaller F452 18.5570110168

You're not going to get much better than that. Python has a specialized function that does exactly what you want to do; use it and move on.

Example 18.4. Best Result So Far: soundex/stage2/soundex2c.py

    import string, re

    allChar = string.uppercase + string.lowercase
    charToSoundex = string.maketrans(allChar, "91239129922455912623919292" * 2)
    isOnlyChars...

... needs to create a new string each time through the loop, then discard the old one. Python is good at lists, though. It can treat a string as a list of characters automatically. And lists are easy to combine into strings again, using the string method join().

Here is soundex/stage2/soundex2a.py, which converts letters to digits by using map and lambda:

    def soundex(source):
        # ...
        source = source.upper()
        digits = source[0]...

... even be slightly slower (although it's not enough of a difference to say for sure):

    C:\samples\soundex\stage3>python soundex3a.py
    Woo             W000 11.5346048171
    Pilgrim         P426 13.3950636184
    Flingjingwaller F452 18.6108927252

Why isn't soundex3a.py faster? It turns out that list indexes in Python are extremely efficient. Repeatedly accessing digits2[-1] is no problem at all. On the other hand, manually maintaining ...
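The duplicate-removal step (rule 3) can be written in several ways. This Python 3 sketch compares three of them:

* the chapter's loop, which keeps checking digits2[-1];
* a variant that remembers the last digit in its own variable and collects digits in a list;
* itertools.groupby from the standard library, which the chapter does not use.

The second variant only illustrates "manually maintaining" the previous digit. It is not the code of soundex3a.py, and all function names are hypothetical. A small helper applies rules 4 to 6 so that the results can be checked against the chapter's worked examples.

```python
import itertools
import timeit


def dedup_index(digits):
    """Chapter-style loop: compare each digit with the last one kept."""
    digits2 = digits[0]
    for d in digits[1:]:
        if digits2[-1] != d:
            digits2 += d
    return digits2


def dedup_last_var(digits):
    """Track the last kept digit in a variable and build a list instead."""
    last = digits[0]
    kept = [last]
    for d in digits[1:]:
        if d != last:
            kept.append(d)
            last = d
    return "".join(kept)


def dedup_groupby(digits):
    """groupby() yields one (key, group) pair per run of equal characters."""
    return "".join(key for key, _ in itertools.groupby(digits))


def finish(digits, dedup):
    """Rules 3-6: drop consecutive duplicates, drop 9s, pad with 0s, keep 4."""
    return (dedup(digits).replace("9", "") + "000")[:4]


# Digit strings after rules 1-2. Woo and Pilgrim come from the chapter's
# examples; the Flingjingwaller string was derived from the table in rule 2.
SAMPLES = {"Woo": "W99", "Pilgrim": "P942695", "Flingjingwaller": "F49522952994496"}

if __name__ == "__main__":
    for name, digits in SAMPLES.items():
        for dedup in (dedup_index, dedup_last_var, dedup_groupby):
            best = min(timeit.repeat(lambda: dedup(digits), number=50000, repeat=3))
            print(name.ljust(15), dedup.__name__.ljust(15),
                  finish(digits, dedup), "%.4f" % best)
```

All three variants produce W000, P426 and F452, the same codes the chapter's timing runs print. Only their timings differ, and those depend on your machine.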