Abrash/Zen: Front Matter/

THE ZEN OF ASSEMBLY LANGUAGE
Volume I: Knowledge

by Michael Abrash

For the Scott, Foresman Assembling Series

Michael Abrash
1599 Bittern Drive
Sunnyvale, CA 94087
(408) 733-3945 (H)
(415) 361-8883 (W)

For Shay and Emily

Introduction: Pushing the Envelope

This is the book I wished for with all my heart seven years ago, when I started programming the IBM PC: the book that unlocks the secrets of writing superb assembly-language code. There was no such book then, so I had to learn the hard way, through experimentation and through trial and error. Over the years, I waited in vain for that book to appear; I looked everywhere without success for a book about advanced assembly-language programming, a book written specifically for assembly-language programmers who want to get better, rather than would-be assembly-language programmers. I'm sure many of you have waited for such a book as well. Well, wait no longer: this is that book.

The Zen of Assembly Language assumes that you're already familiar with assembly language. Not an expert, but at least acquainted with the registers and instructions of the 8088, and with the use of one of the popular PC assemblers. Your familiarity with assembly language will allow us to skip over the droning tutorials about the use of the assembler and the endless explanations of binary arithmetic that take up hundreds of pages in introductory books. We're going to jump into high-performance programming right from the start, and when we come up for air 16 chapters from now, your view of assembly language will be forever altered for the better. Then we'll leap right back into Volume II, applying our newfound knowledge of assembly language to ever-more-sophisticated programming tasks.

In short, The Zen of Assembler is about nothing less than how to become the best assembly-language programmer you can be.

WHY ASSEMBLY LANGUAGE?

For years, people have been predicting (and hoping for) the demise of assembly language, claiming that the world is ready to move on to less primitive approaches to programming, and for years, the best programs around have been written in assembly language. Why is this? Simply because assembly language is hard to work with, but, properly used, produces programs of unparalleled performance. Mediocre programmers have a terrible time working with assembly language; on the other hand, assembly language is, without fail, the language that PC gurus use when they need the best possible code.

Which brings us to you. Do you want to be a guru? I'd imagine so, if you're reading this book. You've set yourself an ambitious and difficult goal, and your success is far from guaranteed. There's no sure-fire recipe for becoming a guru, any more than there's a recipe for becoming a chess grand master. There is, however, one way you can greatly improve your chances: become an expert assembly-language programmer. Assembly language won't by itself make you a guru, but without it you'll never reach your full potential as a programmer.

Why is assembly language so important in this age of optimizing compilers and program generators?
Assembly language is fundamentally different from all other languages, as we'll see throughout The Zen of Assembly Language. Assembly language lets you use every last resource of the PC to push the performance envelope; only in assembly language can you press right up against the inherent limits of the PC.

If you aren't pushing the envelope, there's generally no reason to program in assembler. High-level languages are certainly easier to use, and nowadays most high-level languages let you get at the guts of the PC (display memory, DOS functions, interrupt vectors, and so on) without having to resort to assembler. If, on the other hand, you're striving for the sort of performance that will give your programs snappy interfaces and crackling response times, you'll find assembly language to be almost magical, for no other language even approaches assembler for sheer speed.

Of course, no one tests the limits of the PC with their first assembler program; that takes time and practice. While many PC programmers know something about assembler, few are experts. The typical programmer has typed in the assembler code from an article or two, read a book about assembler programming, and perhaps written a few assembler programs of his own, but doesn't yet feel that he has mastered the language. If you fall into this category, you've surely sensed the remarkable potential of assembler, but you're also keenly aware of how hard it is to write good assembler code and how much you have yet to learn. In all likelihood, you're not sure how to sharpen your assembler skills and take that last giant step toward mastery of your PC.

This book is for you. Welcome to the most exciting and esoteric aspect of the IBM PC. The Zen of Assembly Language will teach you how to create blindingly fast code for the IBM PC. More important still, it will teach you how to continue to develop your assembler programming skills on your own. The Zen of Assembly Language will show you a way to learn what you need to know as the need arises, and it is that way of learning that will serve you well for years to come. There are facts and code aplenty in this book and in the companion volume, but it is a way of thinking and learning that lies at the heart of The Zen of Assembly Language.

Don't take the title to mean that this is a mystical book in any way. In the context of assembly-language programming, Zen is a technique that brings intuition and non-obvious approaches to bear on difficult problems and puzzles. If you would rather think of high-performance assembler programming as something more mundane, such as right-brained thinking or plain old craftsmanship, go right ahead; good assembler programming is a highly individualized process.

The Zen of Assembly Language is specifically about assembly language for the IBM PC (and, by definition, compatible computers). In particular, the bulk of this volume will focus on the capabilities of the 8088 processor that lies at the heart of the PC. However, many of the findings and almost all of the techniques I'll discuss can also be applied to assembly-language programming for the other members of Intel's 808X processor family, including the 80286 and 80386 processors, as we'll see toward the end of this volume. The Zen of Assembly Language doesn't much apply to computers built around other processors, such as the 68XXX family, the Z80, the 8080, or the 6502, since a great deal of the Zen of assembly language in the case of the IBM PC derives from the highly unusual architecture of the 808X family. (In fact, the processors in the 808X family lend themselves beautifully to assembly language, much more so than other currently-popular processors.)
While I will spend a chapter looking specifically at the 80286 found in the AT and PS/2 Models 50 and 60 and at the 80386 found in the PS/2 Model 80, I'll concentrate primarily on the 8088 processor found in the IBM PC and XT, for a number of reasons. First, there are at least 15,000,000 8088-based computers around, ensuring that good 8088 code isn't going to go out of style anytime soon. Second, the 8088 is far and away the slowest of the processors used in IBM-compatible computers, so no matter how carefully code is tailored to the subtleties of the 8088, it's still going to run much faster on an 80286 or 80386. Third, many of the concepts I'll present regarding the 8088 apply to the 80286 and 80386 as well, but to a different degree. Given that there are simply too many processors around to cover in detail (and the 80486 on the way), I'd rather pay close attention to the 8088, the processor for which top-quality code is most critical, and provide you with techniques that will allow you to learn on your own how best to program other processors. We'll return to this topic in Chapter 15, when we will in fact discuss other 808X-family processors, but for now, take my word for it: when it comes to optimization, the 8088 is the processor of choice.

WHAT YOU'LL NEED

The tools you'll need to follow this book are simple: a text editor to create ASCII program files, the Microsoft Macro Assembler version 5.0 or a compatible assembler (Turbo Assembler is fine) to assemble programs, and the Microsoft Linker or a compatible linker to link programs into an executable form.

There are several types of reference material you should have available as you pursue assembler mastery. You will certainly want a general reference on 8088 assembler. The 8086 Book, written by Rector and Alexy and published by Osborne/McGraw-Hill, is a good reference, although you should beware of its unusually high number of typographic errors. Also useful is the spiral-bound reference manual that comes with MASM, which contains an excellent summary of the instruction sets of the 8088, 8086, 80186, 80286, and 80386. IBM's hardware, BIOS, and DOS technical reference manuals are also useful references, containing as they do detailed information about the resources available to assembler programmers.

If you're the type who digs down to the hardware of the PC in the pursuit of knowledge, you'll find Intel's handbooks and reference manuals to be invaluable (albeit none too easy to read), since Intel manufactures the 8088 and many of the support chips used in the PC. There's simply no way to understand what a hardware component is capable of doing in the context of the PC without a comprehensive description of everything that part can do, and that's exactly what Intel's literature provides.

Finally, keep an eye open for articles on assembly-language programming. Articles provide a steady stream of code from diverse sources, and are your best sources of new approaches to assembler programming.

By the way, the terms "assembler" and "assembly-language" are generally interchangeable. While "assembly-language" is perhaps technically more accurate, since "assembler" also refers to the software that assembles assembly-language code, "assembler" is a widely-used shorthand that I'll use throughout this book. Similarly, I'll use "the Zen of assembler" as shorthand for "the Zen of assembly language."
ODDS AND ENDS

I'd like to identify the manufacturers of the products I'll refer to in this volume. Microsoft makes the Microsoft Macro Assembler (MASM), the Microsoft Linker (LINK), CodeView (CV), and Symdeb (SYMDEB). Borland International makes Turbo Assembler (TASM), Turbo C (TC), Turbo Link (TLINK), and Turbo Debugger (TD). SLR Systems makes OPTASM, an assembler. Finally, Orion Instruments makes OmniLab, which integrates high-performance oscilloscope, logic analyzer, stimulus generator, and disassembler instrumentation in a single PC-based package.

In addition, I'd like to point out that while I've made every effort to ensure that the code in this volume works as it should, no one's perfect. Please let me know if you find bugs. Also, please let me know what works for you and what doesn't in this book; teaching is not a one-way street. You can write me at:

1599 Bittern Drive
Sunnyvale, CA 94087

THE PATH TO THE ZEN OF ASSEMBLER

The Zen of Assembly Language consists of four major parts, contained in two volumes. Parts I and II are in this volume, Volume I, while Parts III and IV are in Volume II, The Zen of Assembly Language: The Flexible Mind. While the book you're reading stands on its own as a tutorial in high-performance assembler code, the two volumes together cover the whole of superior assembler programming, from hardware to implementation. I strongly recommend that you read both.

The four parts of The Zen of Assembly Language are organized as follows. Part I introduces the concept of the Zen of assembler, and presents the tools we'll use to delve into assembler code performance. Part II covers various and sundry pieces of knowledge about assembler programming, examines the resources available when programming the PC, and probes fundamental hardware aspects that affect code performance. Part III (in Volume II) examines the process of creating superior code, combining the detailed knowledge of Part II with varied and often unorthodox coding approaches. Part IV (also in Volume II) illustrates the Zen of assembler in the form of a working animation program. In general, Parts I and II discuss the raw stuff of performance, while Parts III and IV show how to integrate that raw performance with algorithms and applications, although there is considerable overlap. The four parts together teach all aspects of the Zen of assembler: concept, knowledge, the flexible mind, and implementation. Together, we will follow that path.

Abrash/Zen: Chapter 15/

capabilities to your applications.

A BRIEF NOTE ON THE 8087

The 8087, 80287, and 80387 are the most common and important PC coprocessors. These numeric coprocessors improve the performance of floating-point arithmetic far beyond the speeds possible with an 8088 alone, performing operations such as floating-point addition, subtraction, multiplication, division, absolute value, comparison, and square root. The 80287 is similar to the 8087, but with protected mode support; the 80387 adds some new functions, including sine and cosine. (For the remainder of this section I'll use the term "8087" to cover all 8087-family numeric coprocessors.)
While the 8087 is widely used, especially by high-level language programs, it is rarely programmed directly in assembler. This is true partly because floating-point arithmetic is relatively slow, even with an 8087, so the cycle savings achievable via assembler are relatively small as a percentage of overall execution time. Also, 8087 instructions are so specialized that they generally offer less rich optimization opportunities than 8088 instructions. Given the specialized nature of 8087 assembler programming, and given that 8087 programming is largely a separate topic from 8088 programming (although the processors have their common points, such as addressing modes), I'm not going to tackle the 8087 in this book.

I will offer one general tip, however: Keep your arithmetic variables in the 8087's data registers as much as you possibly can. (There are eight 80-bit data registers, organized as an internal stack.) "Keep it in the registers" is a rule we've become familiar with on the 8088, and it will stand us in equally good stead on the 8087.

Why?
Well, the 8087 works with an internal 10-byte format, rather than the 2-, 4-, and 8-byte integer and floating-point formats we're familiar with. Whenever an 8087 instruction loads data from or stores data to a memory variable that's in a 2-, 4-, or 8-byte format, the 8087 must convert the data format accordingly, and it takes the 8087 dozens of cycles to perform those conversions. Even apart from the conversion time, it takes a number of cycles just to copy up to 10 bytes to or from memory. For example, it takes the 8087 between 51 and 97 cycles (including effective address calculation time and the 4-cycle-per-word 8-bit bus penalty) just to push a floating-point value from memory onto the 8087's data register stack. By contrast, it takes just 17 to 22 cycles to push a value from an internal register onto the data register stack. Ideally, the value you need will have been left on top of the 8087 register stack as the result of the last operation, in which case no load time at all is required.

Intensive use of the 8087's data registers is one area in which assembler code can substantially outperform high-level language code. High-level languages tend to use the 8087 for only one operation or, at most, one high-level language statement at a time, loading the data registers from scratch for each operation. Most high-level languages load the operands for each operation into the 8087's data registers, perform the operation, and store the result back to memory, then start the whole process over again for the next operation, even if the two operations are related.

What you can do in assembler, of course, is use the 8087's data registers much as you've learned to use the 8088's general-purpose registers: load often-used values into the data registers, keep results around if you'll need them later, and keep intermediate results in the data registers rather than storing them to memory. Also, remember that you often have the option of either popping or not popping source operands from the top of the stack, and that data registers other than ST(0) can often serve as destination operands. In short, the 8087 has both a generous set of data registers and considerable flexibility in how those registers can be used. Take full advantage of those resources when you write 8087 code.

Before we go, one final item about the 8087. The 8087 is a true coprocessor, fully capable of executing instructions in parallel with the 8088. In other words, the 8088 can continue fetching and executing instructions while the 8087 is processing one of its lengthy instructions. While that makes for excellent performance, problems can arise if a second 8087 instruction is fetched and started before the first 8087 instruction has finished. To avoid such problems, MASM automatically inserts a wait instruction before each 8087 instruction. The wait instruction simply tells the 8088 to wait until the 8087 has finished its current instruction before continuing. In short, MASM neatly and invisibly avoids one sort of potential 8087 synchronization problem.

There's a second sort of potential 8087 synchronization problem, however, and this one you must guard against, for it isn't taken care of by MASM: instructions accessing memory out of sequence. The 8088 is fully capable of executing new instructions while a lengthy 8087 instruction that precedes those 8088 instructions executes. One of those later 8088 instructions can, for example, easily read a memory location before the 8087 instruction writes to it. In other words, given an 8087 instruction that accesses a memory variable, it's possible for an 8088 instruction that follows that 8087 instruction to access that memory variable before the 8087 instruction does. Clearly, serious problems can arise if instructions access memory out of sequence.

To avoid such problems, you should explicitly place a wait instruction between any 8087 instruction that accesses a memory variable and any following 8088 instructions that could possibly access that same variable. That doesn't by any stretch of the imagination mean that you should put wait after all of your 8087 instructions. On the contrary, the rule is that you should use wait only when there's the potential for out-of-sequence 8087 and 8088 memory accesses, and then only immediately before the instructions during which the conflict might arise. The rest of the time, you can boost performance by omitting wait and letting the 8088 and 8087 coprocess.

CONCLUSION

Despite all the other processors, coprocessors, and peripherals in the PC family, the 8088 is still the best place to focus your optimization efforts. If your code runs well on an 8088, it will run well on every 8086-family processor well into the twenty-first century, and even on a number of computers built around other processors as well. Good performance and the largest possible market: what more could you want?

That's enough of being practical. No one programs extensively in assembler just because it's useful; also required is a certain fondness for the sorts of puzzles assembler programming presents. For that sort of programmer, there's nothing better than the weird but wonderful 8088. Admit it: strange as 8088 assembler programming is, isn't it fun?
Abrash/Zen: Chapter 16/

Chapter 16: Onward to the Flexible Mind

And so we come to the end of our journey through knowledge. More precisely, we've come to the end of that part of The Zen of Assembly Language that's dedicated to knowledge, for no matter how long you or I continue to program the 8088, there will always be more to learn about this surprising processor. If the Zen of assembler were merely a matter of instructions and cycle times, I would spend a few pages marvelling at the wonders we've seen, then congratulate you on arriving at a mastery of assembler and bid you farewell. I won't do that, though, for in truth we've merely arrived at a resting place from whence our journey will continue anew in Volume II of The Zen of Assembly Language. There are marvels aplenty to come, so we'll just catch our breath, take a brief look back to see how far we've come, and then it's on to the flexible mind.

The flexible mind notwithstanding, congratulations are clearly in order right now. You've mastered a great deal; in fact, you've absorbed just about as much knowledge about assembler as any mortal could in so short a time. You've undoubtedly learned much more than you realize just yet; only with experience will everything you've seen in this volume sink in fully.

As important as the amount you've learned is the nature of your knowledge. We haven't just thrown together a collection of unrelated facts in this volume; we've divined the fundamental nature and basic optimization rules of the PC. We've explored the architectures of the PC and the 8088, and we've seen how those underlying factors greatly influence the performance of all assembler code and, by extension, the performance of all code that runs on the PC. We've learned which members of the instruction set are best suited to various tasks, we've come across unexpected talents in many instructions, and we've learned to view instructions in light of what they can do, not what they were designed to do. Best of all, we've learned to use the Zen timer to check our assumptions and to help us continue to learn and hone our skills.

What all this amounts to is a truly excellent understanding of instruction performance on the PC. That's important, critically important, but it's not the whole picture. The knowledge we've acquired is merely the foundation for the flexible mind, which enables us to transform task specifications into superior assembler code. In turn, application implementations (whole programs) are built upon the flexible mind. So, while we've built a strong foundation, we've a ways yet to go in completing our mastery of the Zen of assembler.

The flexible mind and implementation are what Volume II of The Zen of Assembly Language is all about. Volume II develops the concept of the flexible mind from the bottom up, starting at the level of implementing the most efficient code for a small, well-defined task, continuing on through algorithm implementation, and extending to designing custom assembler-based mini-languages tailored to various applications. We'll learn how to search and sort data quickly, how to squeeze every cycle out of a line-drawing routine, how to let data replace code (with tremendous program-size benefits), and how to do animation. The emphasis every step of the way will be on outperforming standard techniques by using our new knowledge in innovative ways to create the best possible 8088 code for each task.

Finally, we'll put everything we've learned together by designing and implementing an animation application. The PC isn't renowned as a game machine (to put it mildly!), but by the time we're through, I promise you won't be able to tell the difference between the graphics on your PC and those in an arcade. The key, of course, is the flexible mind, the ability to bring together the needs of the application and the capabilities of the PC with often-spectacular results. So, while we've gone a mighty long way toward mastering the Zen of assembler, we haven't arrived yet.
That's all to the good, though. Until now, interesting as our explorations have been, we've basically been doing grunt work: learning cycle times and the like. What's coming up next is the really fun stuff: taking what we've learned and using that knowledge to create the wondrous tasks and applications that are possible only with the very best assembler code. In short, in Volume II we'll experience the full spectrum of the Zen of assembler, from the details that we now know so well to the magnificent applications that make it all worthwhile.

A TASTE OF WHAT YOU'VE LEARNED

Before we leave Volume I, I'd like to give you a taste of both what's to come and what you already know. Why do you need to see what you already know? The answer is that you've surely learned much more than you realize right now. The example we'll look at involves strong elements of the flexible mind, and what we'll find is that there's no neat dividing line between knowledge and the flexible mind, and that we have already ventured much farther across the fuzzy boundary between the two than you'd ever imagine. We'll also see that the flexible mind involves knowledge and intuition, but no deep dark mysteries. Knowledge you have in profusion, and, as you'll see, your intuition is growing by leaps and bounds. (Try to stay one step ahead of me as we optimize the following routine. I suspect you'll be surprised at how easy it is.)
I'm presenting this last example precisely because I'd like you to see how well you already understand the flexible mind. On to our final example.

ZENNING

In Jeff Duntemann's excellent book Complete Turbo Pascal, Third Edition (published by Scott, Foresman and Company), there's a small assembler subroutine that's designed to be called from a Turbo Pascal program in order to fill the screen or a system-memory screen buffer with a specified character/attribute pair in text mode. This subroutine involves only 21 instructions and works perfectly well; nonetheless, with what we know we can compact the subroutine tremendously, and speed it up a bit as well. To coin a verb, we can "Zen" this already-tight assembler code to an astonishing degree. In the process, I hope you'll get a feel for how advanced your assembler skills have become.

The code is as follows (the code is Jeff's, with many letters converted to lowercase in order to match the style of The Zen of Assembly Language, but the comments are mine):

OnStack struc           ;data that's stored on the stack after PUSH BP
OldBP   dw      ?       ;caller's BP
RetAddr dw      ?       ;return address
Filler  dw      ?       ;character to fill the buffer with
Attrib  dw      ?       ;attribute to fill the buffer with
BufSize dw      ?       ;number of character/attribute pairs to fill
BufOfs  dw      ?       ;buffer offset
BufSeg  dw      ?       ;buffer segment
EndMrk  db      ?       ;marker for the end of the stack frame
OnStack ends
;
ClearS  proc    near
        push    bp                              ;save caller's BP
        mov     bp,sp                           ;point to stack frame
        cmp     word ptr [bp].BufSeg,0          ;skip the fill if a null
        jne     Start                           ; pointer is passed
        cmp     word ptr [bp].BufOfs,0
        je      Bye
Start:  cld                                     ;make STOSW count up
        mov     ax,[bp].Attrib                  ;load AX with attribute parameter
        and     ax,0ff00h                       ;prepare for merging with fill char
        mov     bx,[bp].Filler                  ;load BX with fill char
        and     bx,0ffh                         ;prepare for merging with attribute
        or      ax,bx                           ;combine attribute and fill char
        mov     bx,[bp].BufOfs                  ;load DI with target buffer offset
        mov     di,bx
        mov     bx,[bp].BufSeg                  ;load ES with target buffer segment
        mov     es,bx
        mov     cx,[bp].BufSize                 ;load CX with buffer size
        rep     stosw                           ;fill the buffer
Bye:    mov     sp,bp                           ;restore original stack pointer
        pop     bp                              ; and caller's BP
        ret     EndMrk-RetAddr-2                ;return, clearing the parms from the stack
ClearS  endp

The first thing you'll notice about the above code is that ClearS uses a rep stosw instruction. That means that we're not going to improve performance by any great amount, no matter how clever we are. While we can eliminate some cycles, the bulk of the work in ClearS is done by that one repeated string instruction, and there's no way to improve on that.

Does that mean that the above code is as good as it can be? Hardly. While the speed of ClearS is very good, there's another side to the optimization equation: size. The whole of ClearS is 52 bytes long as it stands, but, as we'll see, that size is hardly graven in stone.

Where do we begin with ClearS? For starters, there's an instruction in there that serves no earthly purpose:

        mov     sp,bp

SP is guaranteed to be equal to BP at that point anyway, so why reload it with the same value? Removing that instruction saves us 2 bytes.

Well, that was certainly easy enough!
We're not going to find any more totally non-functional instructions in ClearS, however, so let's get on to some serious optimizing. We'll look first for cases where we know of better instructions for particular tasks than those that were chosen. For example, there's no need to load any register, whether segment or general-purpose, through BX; we can eliminate two instructions by simply loading ES and DI directly:

ClearS  proc    near
        push    bp                              ;save caller's BP
        mov     bp,sp                           ;point to stack frame
        cmp     word ptr [bp].BufSeg,0          ;skip the fill if a null
        jne     Start                           ; pointer is passed
        cmp     word ptr [bp].BufOfs,0
        je      Bye
Start:  cld                                     ;make STOSW count up
        mov     ax,[bp].Attrib                  ;load AX with attribute parameter
        and     ax,0ff00h                       ;prepare for merging with fill char
        mov     bx,[bp].Filler                  ;load BX with fill char
        and     bx,0ffh                         ;prepare for merging with attribute
        or      ax,bx                           ;combine attribute and fill char
        mov     di,[bp].BufOfs                  ;load DI with target buffer offset
        mov     es,[bp].BufSeg                  ;load ES with target buffer segment
        mov     cx,[bp].BufSize                 ;load CX with buffer size
        rep     stosw                           ;fill the buffer
Bye:    pop     bp                              ;restore caller's BP
        ret     EndMrk-RetAddr-2                ;return, clearing the parms from the stack
ClearS  endp

(The OnStack structure definition doesn't change in any of our examples, so I'm not going to clutter up this chapter by reproducing it for each new version of ClearS.)

Okay, loading ES and DI directly saves another 4 bytes. We've squeezed a total of 6 bytes, about 11%, out of ClearS. What next?
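Since OnStack never changes, the offsets it assigns can be worked out once. The Python model below is a sketch under the assumption that struc lays fields out in declaration order starting at offset 0 (dw = 2 bytes, db = 1 byte); ONSTACK_FIELDS and field_offsets are hypothetical names. It derives the operand of ret EndMrk-RetAddr-2 (10 bytes, the five word parameters discarded after the return address is popped) and shows that BufSeg sits two bytes above BufOfs, offset word first and segment word second, which is the layout of a far pointer in memory.

```python
# Fields of OnStack in declaration order, with their sizes in bytes.
ONSTACK_FIELDS = [
    ("OldBP", 2), ("RetAddr", 2), ("Filler", 2), ("Attrib", 2),
    ("BufSize", 2), ("BufOfs", 2), ("BufSeg", 2), ("EndMrk", 1),
]

def field_offsets(fields):
    """Return {name: offset}, each offset being the sum of earlier sizes."""
    offsets, offset = {}, 0
    for name, size in fields:
        offsets[name] = offset
        offset += size
    return offsets

off = field_offsets(ONSTACK_FIELDS)
# RET pops the 2-byte near return address, then discards this many bytes.
ret_operand = off["EndMrk"] - off["RetAddr"] - 2
print(off["BufOfs"], off["BufSeg"], ret_operand)  # 10 12 10
```

Every offset is below 128, so each [bp].field reference in ClearS fits an 8-bit displacement.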
Well, les would serve better than two mov instructions for loading ES and DI: ClearS proc push mov cmp jne cmp je Start: cld mov and mov and or les mov rep near bp ;save caller's BP bp,sp ;point to stack frame word ptr [bp].BufSeg,0 ;skip the fill if a null Start ; pointer is passed word ptr [bp].BufOfs,0 Bye ;make STOSW count up ax,[bp].Attrib ;load AX with attribute parameter ax,0ff00h ;prepare for merging with fill char bx,[bp].Filler ;load BX with fill char bx,0ffh ;prepare for merging with attribute ax,bx ;combine attribute and fill char di,dword ptr [bp].BufOfs ;load ES:DI with target buffer segment:offset cx,[bp].BufSize ;load CX with buffer size stosw ;fill the buffer Bye: ClearS pop ret endp bp EndMrk-RetAddr-2 ;restore caller's BP ;return, clearing the parms from the stack That's good for another bytes We're down to 43 bytes, and counting We can save more bytes by clearing the low and high bytes of AX and BX, respectively, by using sub reg8,reg8 rather than anding 16-bit values: ClearS proc push mov cmp jne cmp je Start: cld mov sub mov sub or near bp ;save caller's BP bp,sp ;point to stack frame word ptr [bp].BufSeg,0 ;skip the fill if a null Start ; pointer is passed word ptr [bp].BufOfs,0 Bye ;make STOSW count up ax,[bp].Attrib ;load AX with attribute parameter al,al ;prepare for merging with fill char bx,[bp].Filler ;load BX with fill char bh,bh ;prepare for merging with attribute ax,bx ;combine attribute and fill char Abrash/Zen: Chapter 16/ les mov rep di,dword ptr [bp].BufOfs cx,[bp].BufSize stosw ;load ES:DI with target buffer segment:offset ;load CX with buffer size ;fill the buffer pop ret endp bp EndMrk-RetAddr-2 ;restore caller's BP ;return, clearing the parms from the stack Bye: ClearS Now we're down to 40 bytes more than 20% smaller than the original code That's pretty much it for simple instruction- substitution optimizations Now let's look for instruction- rearrangement optimizations It seems strange to load a word value into AX and then 
throw away AL. Likewise, it seems strange to load a word value into BX and then throw away BH. However, those steps are necessary because the two modified word values are ORed into a single character/attribute word value that is then used to fill the target buffer. Let's step back and see what this code really does, though. All it does in the end is load a byte addressed relative to BP into AH and another byte addressed relative to BP into AL. Heck, we can just do that directly! Presto: we've saved another 6 bytes, and turned two word-sized memory accesses into byte-sized memory accesses as well:

ClearS  proc    near
        push    bp                          ;save caller's BP
        mov     bp,sp                       ;point to stack frame
        cmp     word ptr [bp].BufSeg,0      ;skip the fill if a null
        jne     Start                       ; pointer is passed
        cmp     word ptr [bp].BufOfs,0
        je      Bye
Start:  cld                                 ;make STOSW count up
        mov     ah,byte ptr [bp].Attrib[1]  ;load AH with attribute
        mov     al,byte ptr [bp].Filler     ;load AL with fill char
        les     di,dword ptr [bp].BufOfs    ;load ES:DI with target buffer segment:offset
        mov     cx,[bp].BufSize             ;load CX with buffer size
        rep     stosw                       ;fill the buffer
Bye:
        pop     bp                          ;restore caller's BP
        ret     EndMrk-RetAddr-2            ;return, clearing the parms from the stack
ClearS  endp

(We could get rid of yet another instruction by having the calling code pack both the attribute and the fill value into the same word, but that's not part of the specification for this particular routine.)
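The byte-load trick works because the 8088 stores words low byte first, so the attribute the old code kept with `and ax,0ff00h` is simply the byte at Attrib+1, and the fill character it kept with `and bx,0ffh` is the byte at Filler+0. The C sketch below, which is ours and not the book's, models the stack frame as a plain byte array and checks that the two approaches always yield the same character/attribute word. The function names are hypothetical, and the frame offsets used in the checks (Filler at 4, Attrib at 6) are an assumption about the OnStack layout, which this chapter doesn't reproduce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit value low byte first, as the 8088 does when it pushes a word. */
static void put_word(uint8_t *frame, size_t off, uint16_t w) {
    frame[off] = (uint8_t)(w & 0xFFu);
    frame[off + 1] = (uint8_t)(w >> 8);
}

/* Earlier versions: load whole words, discard the unwanted halves, and OR
   the results together (and ax,0ff00h / and bx,0ffh / or ax,bx). */
static uint16_t merge_masked(uint16_t attrib, uint16_t filler) {
    return (uint16_t)((attrib & 0xFF00u) | (filler & 0x00FFu));
}

/* Byte-load version: AH <- byte at Attrib+1, AL <- byte at Filler. */
static uint16_t merge_bytes(const uint8_t *frame, size_t filler_off,
                            size_t attrib_off) {
    uint8_t ah = frame[attrib_off + 1];   /* high byte of the Attrib word */
    uint8_t al = frame[filler_off];       /* low byte of the Filler word */
    return (uint16_t)((ah << 8) | al);
}
```

On a big-endian machine the attribute byte would sit at Attrib+0 instead, so this particular shortcut is tied to the 8088's word order.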
Another nifty instruction-rearrangement trick saves 6 more bytes. ClearS checks to see whether the far pointer is null (zero) at the start of the routine, then loads and uses that same far pointer later on. Let's get that pointer into registers and keep it there; that way we can check to see whether it's null with a single comparison, and can use it later without having to reload it from memory:

ClearS  proc    near
        push    bp                          ;save caller's BP
        mov     bp,sp                       ;point to stack frame
        les     di,dword ptr [bp].BufOfs    ;load ES:DI with target buffer segment:offset
        mov     ax,es                       ;put segment where we can test it
        or      ax,di                       ;is it a null pointer?
        je      Bye                         ;yes, so we're done
Start:  cld                                 ;make STOSW count up
        mov     ah,byte ptr [bp].Attrib[1]  ;load AH with attribute
        mov     al,byte ptr [bp].Filler     ;load AL with fill char
        mov     cx,[bp].BufSize             ;load CX with buffer size
        rep     stosw                       ;fill the buffer
Bye:
        pop     bp                          ;restore caller's BP
        ret     EndMrk-RetAddr-2            ;return, clearing the parms from the stack
ClearS  endp

Well. Now we're down to 28 bytes, having reduced the size of this subroutine by nearly 50%. Only 13 instructions remain. Realistically, how much smaller can we make this code? About one-third smaller yet, as it turns out, but in order to do that, we must stretch our minds and use the 8088's instructions in unusual ways. Let me ask you this: what do most of the instructions in the current version of ClearS do?
Answer: they either load parameters from the stack frame or set up the registers so that the parameters can be accessed. Mind you, there's nothing wrong with the stack-frame-oriented instructions used in ClearS; those instructions access the stack frame in a highly efficient way, exactly as the designers of the 8088 intended, and just as the code generated by a high-level language would. That means that we aren't going to be able to improve the code if we don't bend the rules a bit.

Let's think... the parameters are sitting on the stack, and most of our instruction bytes are being used to read bytes off the stack with BP-based addressing... we need a more efficient way to address the stack... the stack... THE STACK! Ye gods! That's easy: we can use the stack pointer to address the stack. While it's true that the stack pointer can't be used for mod-reg-rm addressing, as BP can, it can be used to pop data off the stack, and pop is a 1-byte instruction. Instructions don't get any shorter than that.

There is one detail to be taken care of before we can put our plan into action: the return address (the address of the calling code) is on top of the stack, so the parameters we want can't be reached with pop. That's easily solved, however; we'll just pop the return address into an unused register, then branch through that register when we're done, as we learned to do in Chapter 14. As we pop the parameters, we'll also be removing them from the stack, thereby neatly avoiding the need to discard them when it's time to return. With that problem dealt with, here's the Zenned version of ClearS:

ClearS  proc    near
        pop     dx                          ;get the return address
        pop     ax                          ;put fill char into AL
        pop     bx                          ;get the attribute
        mov     ah,bh                       ;put attribute into AH
        pop     cx                          ;get the buffer size
        pop     di                          ;get the offset of the buffer origin
        pop     es                          ;get the segment of the buffer origin
        mov     bx,es                       ;put the segment where we can test it
        or      bx,di                       ;null pointer?
        je      Bye                         ;yes, so we're done
        cld                                 ;make STOSW count up
        rep     stosw                       ;do the string store
Bye:
        jmp     dx                          ;return to the calling code
ClearS  endp

At long last, we're down to the bare metal. This version of ClearS is just 19 bytes long. That's just 37% as long as the original version, without any change whatsoever in the functionality ClearS makes available to the calling code. The code is bound to run a bit faster too, given that there are far fewer instruction bytes and fewer memory accesses. All in all, the Zenned version of ClearS is a vast improvement over the original. Probably not the best possible implementation (never say never!), but an awfully good one.

KNOWLEDGE AND BEYOND

There is a point to all this Zenning above and beyond showing off some neat tricks we've learned (and a trick or two we'll learn more about in Volume II). The real point is to illustrate the breadth of knowledge you now possess, and the tremendous power that knowledge has when guided by the flexible mind. Consider the optimizations we made to ClearS above. Our initial optimizations resulted purely from knowing particular facts about the 8088, and nothing more. We knew, for example, that segment registers do not have to be loaded from memory by way of general-purpose registers but can instead be loaded directly, so we made that change. As optimizations became harder to come by, however, we shifted from applying pure knowledge to coming up with creative solutions that involved understanding and reworking the code as a whole. We started out by compacting individual instructions and bits of code, but in the end we came up with a solution that applied our knowledge of the PC to implementing the functionality of the entire subroutine as efficiently as possible. And that, simply put, is the flexible mind. Think back. Did you have any trouble following the optimizations to ClearS?
I very much doubt it; in fact, I would guess that you were ahead of me much of the way. So, you see, you already have a good feel for the flexible mind. There will be much more of the flexible mind in Volume II of The Zen of Assembly Language, but it won't be an abrupt change from what we've been doing; rather, it will be a gradual raising of our focus from learning the nuts and bolts of the PC to building applications with those nuts and bolts. We've trekked through knowledge and beyond; now it's time to seek out ways to bring the magic of the Zen of assembler to the real world of applications. I hope you'll join me for the journey.