Effective awk Programming
Third Edition

Arnold Robbins

Beijing • Cambridge • Farnham • Köln • Paris • Sebastopol • Taipei • Tokyo

Effective awk Programming, Third Edition
by Arnold Robbins

Copyright © 1989, 1991, 1992, 1993, 1996–2001 Free Software Foundation, Inc. All rights reserved.
Printed in the United States of America.

Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. Phone: (617) 542-5942, Fax: (617) 542-2652, Email: gnu@gnu.org, URL: http://www.gnu.org.

Published by O’Reilly & Associates, Inc., 101 Morris Street, Sebastopol, CA 95472.

This is Edition 3 of Effective awk Programming: A User’s Guide for GNU awk, for the 3.1.0 (or later) version of the GNU implementation of awk.

Editor: Chuck Toporek
Production Editor: Jeffrey Holcomb
Cover Designer: Hanna Dyer

Printing History:
March 1996: First Edition (published by Specialized Systems Consultants, Inc. and the Free Software Foundation, Inc. as Effective AWK Programming: A User’s Guide for GNU AWK)
February 1997: Second Edition (published by Specialized Systems Consultants, Inc. and the Free Software Foundation, Inc. as Effective AWK Programming: A User’s Guide)
May 2001: Third Edition (published by O’Reilly & Associates, Inc.)

Cover design, trade dress, Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly & Associates, Inc. The association between the image of a great auk and the topic of awk programming is a trademark of O’Reilly & Associates, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly & Associates, Inc.
was aware of a trademark claim, the designations have been printed in caps or initial caps. While every precaution has been taken in the preparation of this book, the publisher assumes no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

Permission is granted to copy, distribute, and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with the Invariant Sections being “GNU General Public License,” the Front-Cover Texts being (a) (see below), and with the Back-Cover Texts being (b) (see below). A copy of the license is included in the section entitled “GNU Free Documentation License.”

a. “A GNU Manual.”
b. “You have freedom to copy and modify this GNU Manual, like GNU software. Copies published by the Free Software Foundation raise funds for GNU development.”

ISBN: 0-596-00070-7 [M]

To Miriam, for making me complete.
To Chana, for the joy you bring us.
To Rivka, for the exponential increase.
To Nachum, for the added dimension.
To Malka, for the new beginning.

Table of Contents

Foreword
Preface

I. The awk Language and gawk

1. Getting Started with awk
   How to Run awk Programs
   Datafiles for the Examples
   Some Simple Examples
   An Example with Two Rules
   A More Complex Example
   awk Statements Versus Lines
   Other Features of awk
   When to Use awk

2. Regular Expressions
   How to Use Regular Expressions
   Escape Sequences
   Regular Expression Operators
   Using Character Lists
   gawk-Specific Regexp Operators
   Case Sensitivity in Matching
   How Much Text Matches?
   Using Dynamic Regexps

3. Reading Input Files
   How Input Is Split into Records
   Examining Fields
   Non-constant Field Numbers
   Changing the Contents of a Field
   Specifying How Fields Are Separated
   Reading Fixed-Width Data
   Multiple-Line Records
   Explicit Input with getline

4. Printing Output
   The print Statement
   Examples of print Statements
   Output Separators
   Controlling Numeric Output with print
   Using printf Statements for Fancier Printing
   Redirecting Output of print and printf
   Special Filenames in gawk
   Closing Input and Output Redirections

5. Expressions
   Constant Expressions
   Using Regular Expression Constants
   Variables
   Conversion of Strings and Numbers
   Arithmetic Operators
   String Concatenation
   Assignment Expressions
   Increment and Decrement Operators
   True and False in awk
   Variable Typing and Comparison Expressions
   Boolean Expressions
   Conditional Expressions
   Function Calls
   Operator Precedence (How Operators Nest)

6. Patterns, Actions, and Variables
   Pattern Elements
   Using Shell Variables in Programs
   Actions
   Control Statements in Actions
   Built-in Variables

7. Arrays in awk
   Introduction to Arrays
   Referring to an Array Element
   Assigning Array Elements
   Basic Array Example
   Scanning All Elements of an Array
   The delete Statement
   Using Numbers to Subscript Arrays
   Using Uninitialized Variables as Subscripts
   Multidimensional Arrays
   Scanning Multidimensional Arrays
   Sorting Array Values and Indices with gawk

8. Functions
   Built-in Functions
   User-Defined Functions

9. Internationalization with gawk
   Internationalization and Localization
   GNU gettext
   Internationalizing awk Programs
   Translating awk Programs
   A Simple Internationalization Example
   gawk Can Speak Your Language

10. Advanced Features of gawk
   Allowing Nondecimal Input Data
   Two-Way Communications with Another Process
   Using gawk for Network Programming
   Using gawk with BSD Portals
   Profiling Your awk Programs

11. Running awk and gawk
   Invoking awk
   Command-Line Options
   Other Command-Line Arguments
   The AWKPATH Environment Variable
   Obsolete Options and/or Features
   Known Bugs in gawk

II. Using awk and gawk

12. A Library of awk Functions
   Naming Library Function Global Variables
   General Programming
   Datafile Management
   Processing Command-Line Options
   Reading the User Database
   Reading the Group Database

13. Practical awk Programs
   Running the Example Programs
   Reinventing Wheels for Fun and Profit
   A Grab Bag of awk Programs

14. Internetworking with gawk
   Networking with gawk
   Some Applications and Techniques
   Related Links

III. Appendixes

A. The Evolution of the awk Language
B. Installing gawk
C. Implementation Notes

[...]... SSC published the first two editions of Effective awk Programming, and the FSF published the same two editions under the title The GNU Awk User’s Guide. This edition maintains the basic structure of Edition 1.0, but with significant additional material, reflecting the host of new features in gawk Version 3.1. Of particular note is the section “Sorting Array Values and Indices with gawk” in Chapter 7, as well...

... easier with awk. The awk utility interprets a special-purpose programming language that makes it easy to handle simple data-reformatting jobs. The GNU implementation of awk is called gawk; it is fully compatible with the System V Release 4 version of awk. gawk is also compatible with the POSIX specification of the awk language. This means that all properly written awk programs should work with gawk. Thus,...
for network access from awk, and with a little help from me, set about adding features to do this for gawk. At that time, he also wrote the bulk of TCP/IP Internetworking with gawk (a separate document, available as part of the gawk distribution). Chapter 14, Internetworking with gawk, is condensed from that document. His code finally became part of the main gawk distribution with gawk Version 3.1. See Appendix...

... variables awk and gawk use.

* All such differences appear in the index under the entry “differences in awk and gawk.”

Chapter 7, Arrays in awk, covers awk’s one-and-only data structure: associative arrays. Deleting array elements and whole arrays is also described, as well as sorting arrays in gawk.

Chapter 8, Functions, describes the built-in functions awk and gawk provide,...

... discovered that my computer had “old awk” and the awk book described “new awk.” I learned that this was typical; the old version refused to step aside or relinquish its name. If a system had a new awk, it was invariably called nawk, and few systems had it. The best way to get a new awk was to ftp the source code for gawk from prep.ai.mit.edu. gawk was a version of new awk written by David Trueman and...

... term awk program refers to a program written by you in the awk programming language.

* Of particular note is Sun’s Solaris, where /usr/bin/awk is, sadly, still the original version. Use /usr/xpg4/bin/awk to get a POSIX-compliant version of awk on Solaris.
† Often, these systems use gawk for their awk implementation!

Primarily, this book explains the features of awk, ...
particular implementation of awk called gawk (which stands for “GNU awk”). gawk runs on a broad range of Unix systems, ranging from 80386 PC-based computers up through large-scale systems, such as Crays. gawk has also been ported to Mac OS X, MS-DOS, Microsoft Windows (all versions) and OS/2 PCs, Atari and Amiga microcomputers, BeOS, Tandem D20, and VMS.

History of awk and gawk

The name awk comes from the initials...

... describes how to run gawk, the meaning of its command-line options, and how it finds awk program source files.

Chapter 12, A Library of awk Functions, and Chapter 13, Practical awk Programs, provide many sample awk programs. Reading them allows you to see awk solving real problems.

Chapter 14, Internetworking with gawk, provides an in-depth discussion and examples of how to use gawk for Internet programming...

... the awk language and a nawk utility for the new version.* Others have an oawk version for the “old awk” language and plain awk for the new one. Still others only have one version, which is usually the new one.† All in all, this makes it difficult for you to know which version of awk you should run when writing your programs. The best advice I can give here is to check your local documentation. Look for awk, ...

... 7, Arrays in awk
• Chapter 8, Functions
• Chapter 9, Internationalization with gawk
• Chapter 10, Advanced Features of gawk
• Chapter 11, Running awk and gawk

Chapter 1, Getting Started with awk

In this chapter:
• How to Run awk Programs
• Datafiles for the Examples
• Some Simple Examples
• An Example with Two Rules
• A More Complex Example
• awk Statements ...