
Your UNIX/Linux: The Ultimate Guide (Third Edition), Part 2


DOCUMENT INFORMATION

Basic information

Title: Filtering and Programming with awk
Author: Aho, Weinberger, Kernighan
Year of publication: 2011
Number of pages: 432
Size: 22.97 MB

Content

PART II
UNIX for the Programmer

CHAPTER 12
Filtering and Programming with awk

This chapter begins Part II, which presents the programming features of UNIX. We begin with the awk command, which made a late entry into the UNIX system in 1977 to augment the toolkit with suitable report formatting capabilities. Named after its authors, Aho, Weinberger, and Kernighan, awk, until the advent of perl, was the most powerful utility for text manipulation and report writing. awk also appears as nawk (newer awk) on most systems and gawk (GNU awk) in Linux. The POSIX specification and our discussions are based on nawk.

Like sed, awk doesn't belong to the do-one-thing-well family of UNIX commands. It combines features of several filters, but it has two unique features. First, it can identify and manipulate individual fields in a line. Second, awk is one of the few UNIX filters (bc is another) that can perform computation. Further, awk also accepts extended regular expressions (EREs) for pattern matching, has C-type programming constructs, and has several built-in variables and functions. Learning awk will help you understand perl, which uses most of the awk constructs, sometimes in an identical manner.

Objectives

• Understand awk's unusual syntax, including its selection criteria and action components.
• Split a line into fields and format the output with printf.
• Understand the special properties of awk variables and expressions.
• Use the comparison operators to select lines on practically any condition.
• Use the ~ and !~ operators with extended regular expressions (EREs) for pattern matching.
• Handle decimal numbers and use them for computation.
• Do some pre- and post-processing with the BEGIN and END sections.
• Use arrays and access an array element with a nonnumeric subscript.
• Examine awk's built-in variables.
• Use the built-in functions for performing string handling tasks.
• Make decisions with the if statement.
• Use the for and while loops to perform tasks repeatedly.

12.1 awk Preliminaries

awk is a little awkward to use at first, but if you feel comfortable with find and sed, then you'll find a friend in awk. Even though it is a filter, awk resembles find in its syntax:

    awk options 'selection_criteria {action}' file(s)

Note the use of single quotes and curly braces. The selection_criteria (a form of addressing) filters input and selects lines for the action component to act on. This component is enclosed within curly braces. The selection_criteria and action constitute an awk program that is surrounded by a set of single quotes. These programs are often one-liners, though they can span several lines as well. A sample awk program is shown in Fig. 12.1.

FIGURE 12.1 Components of an awk Program

    awk -F: '$3 > 200 { print $1, $3 }' /etc/passwd

    Selection criteria: $3 > 200
    Action: { print $1, $3 }

Let's have a brief look at each of the constituents of the syntax. Unlike other filters, awk uses a contiguous sequence of spaces and tabs as the default delimiter. This default has been changed in the figure to a colon using the -F option.

Fields in awk are numbered $1, $2, and so on, and the selection criteria here test whether the third field is greater than 200. awk also addresses the entire line as $0. In Chapter 13, you'll find the shell also using the same parameters to represent command-line arguments. To prevent the shell from performing variable evaluation, we need to single-quote any awk program that uses these parameters.

Even though we haven't seen relational tests in command syntax before, selection criteria in awk are not limited to a simple comparison. They can be a regular expression to search for, one- or two-line addresses, or a conditional expression. Here are some examples:

    awk '/negroponte/ { print }' foo              Lines containing negroponte
    awk '$2 ~ /^negroponte$/ { print }' foo       Tests for exact match on second field
    awk 'NR == 1, NR == { print }' foo            Lines to
    awk '$6 > 2000 { print }' foo                 Sixth field greater than 2000

That awk also uses regular expressions as patterns is evident from the second example, which shows the use of ^ and $ in anchoring the pattern. The third example uses awk's built-in variable, NR, to represent the record number. The term record is new in this text. By default, awk identifies a single line as a record, but a record in awk can also comprise multiple contiguous lines.

The action component is often a print or printf statement, but it can also be a program. We'll learn to use the if, while, and for constructs here before they show up again in the shell, perl, and C programming. Moreover, the selection criteria in all of the four preceding examples can also be implemented in the action component.

Let's consider a simple awk command that selects the Subject: lines from mbox, the mailbox file:

    $ awk '/^Subject:/ { print }' $HOME/mbox
    Subject: RE: History is not bunk
    Subject: Mail server problem
    Subject: Take our Survey, Win US$500!
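The forms of selection criteria described above can be tried on a small, made-up input. The following sketch is not from the book: the three colon-delimited records and the shell function names are invented for illustration, and the awk programs simply mirror the forms listed earlier (a field comparison, an anchored regular expression matched against one field, and a two-line address built with NR).

```shell
#!/bin/sh
# Hypothetical sample data, invented for this sketch (not the book's files).
sample_records() {
    printf '%s\n' \
        'alice:x:150:staff' \
        'bob:x:250:staff' \
        'carol:x:300:admin'
}

# Field comparison as the selection criteria; the action prints two fields.
by_comparison() {
    sample_records | awk -F: '$3 > 200 { print $1, $3 }'
}

# Anchored regular expression matched against a single field.
by_exact_field() {
    sample_records | awk -F: '$1 ~ /^bob$/ { print }'
}

# Two-line address: records 1 through 2, selected with NR.
by_record_range() {
    sample_records | awk 'NR == 1, NR == 2 { print }'
}

by_comparison      # prints "bob 250" and "carol 300"
by_exact_field     # prints "bob:x:250:staff"
by_record_range    # prints the first two records unchanged
```

Each awk program is single-quoted so that the shell leaves $1 and $3 alone, which is the reason the text gives for quoting.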
When used without any field specifiers, print writes the entire line to the standard output. Printing is also the default action of awk, so all of the following forms could be considered equivalent:

    awk '/^Subject:/' mbox                   Printing is the default action
    awk '/^Subject:/{ print }' mbox          Whitespace permitted
    awk '/^Subject:/ { print $0}' mbox       $0 is the complete line

Observe that the first example doesn't have an action component. If the action is missing, the entire line is printed. If the selection criteria are missing, the action applies to all lines. One of them has to be specified.

The selection criteria in these examples used the ^ to anchor the pattern. For pattern matching, awk uses regular expressions in sed-style:

    $ awk '/wilco[cx]k*s*/' emp.lst
    3212:bill wilcocks :d.g.m. :accounts :12/12/55: 85000
    2345:james wilcox :g.m. :marketing :03/12/45:110000

However, the regular expressions used by awk belong to the basic BRE (but not the IRE and TRE) and ERE variety. The latter is used by grep -E (10.5) or egrep. This means that you can also use multiple patterns using (, ) and |:

    awk '/wood(house|cock)/' emp.lst
    awk '/wilco[cx]k*s*|wood(cock|house)/' emp.lst
    awk '/^$/' emp.lst

Henceforth, the input for many awk programs used in this chapter will come from the file empn.lst. We created this file with sed in Section 10.12.1. The lines here are of variable length:

    $ head -n 4 empn.lst
    2233:charles harris:g.m.:sales:12/12/52: 90000
    9876:bill johnson:director:production:03/12/50:130000
    5678:robert dylan:d.g.m.:marketing:04/19/43: 85000
    2365:john woodcock:director:personnel:05/11/47:120000

We need to use the -F option to specify the delimiter (:) whenever we select fields from this file.

Note: An awk program must have either the selection criteria or the action, or both, but within single quotes. Double quotes will create problems unless used judiciously.

12.2 Using print and printf

awk uses the print and printf statements to write to standard output. print produces unformatted output, and since our new sample database contains lines of variable length, print leaves the field widths exactly as they are in the input. This is how we use print to invert the first and second fields of the sales people:

    $ awk -F: '/sales/ { print $2, $1 }' empn.lst
    charles harris 2233
    gordon lightfoot 1006
    p.j. woodhouse 1265
    jackie wodehouse 2476

A comma in the field list ($2, $1) ensures that the fields are not glued together. The default output delimiter is the space, but we'll learn to change it later by setting the built-in variable, OFS.

What about printing all fields except, say, the fourth one? Rather than explicitly specify all remaining field identifiers, we can reassign the one we don't want to an empty string:

    $ awk -F: '{ $4 = "" ; print }' empn.lst | head -n 2
    2233 charles harris g.m.  12/12/52  90000
    9876 bill johnson director  03/12/50 130000

When placing multiple statements in a single line, use the ; as their delimiter. print here is the same as print $0.

With the C-like printf statement, you can use awk as a stream formatter. printf uses a quoted format specifier and a field list. awk accepts most of the formats used by the printf function in C and the printf command. In this chapter, we'll stick to these formats:

    %s    String
    %d    Integer
    %f    Floating-point number

Let's produce formatted output from unformatted input, using a regular expression this time in the selection criteria:

    $ awk -F: '/true?man/ {
    > printf("%-20s %-12s %6d\n", $2, $3, $6) }' empn.lst
    ronie trueman        executive     75000
    julie truman         g.m.          95000

The name and designation have been printed in spaces 20 and 12 characters wide, respectively; the - symbol left-justifies the output. Note that unlike print, printf requires \n to print a newline after each line.

C Shell: Note that awk gets multiline here. It's a shell belonging to the Bourne family (like Bash) that's running this command, which considers a command to be complete only when it encounters the closing quote. Don't forget to place a \ after { and before you press [Enter] if you run this command in the C shell.

Note: awk is the only filter that uses whitespace as the default delimiter. cut and paste use the tab, and sort uses a contiguous set of spaces as the default delimiter.

12.2.1 Redirecting Standard Output

Every print and printf statement can be separately redirected with the > and | symbols. However, make sure the filename or the command that follows these symbols is enclosed within double quotes. For example, the following statement sorts the output of the printf statement:

    printf "%s %-10s %-12s %-8s\n", $1, $3, $4, $6 | "sort"

If you use redirection instead, the filename should be enclosed in quotes in a similar manner:

    printf "%s %-10s %-12s %-8s\n", $1, $3, $4, $6 > "mslist"

awk thus provides the flexibility of separately manipulating the different output streams. But don't forget the quotes!
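The two kinds of redirection can be combined in one awk program. The sketch below is only an illustration under stated assumptions: the sample records, the function name report, and the awk variable out are all invented here, and the output file name is passed in with -v rather than written as a double-quoted literal such as "mslist".

```shell
#!/bin/sh
# Hypothetical sample data, invented for this sketch (not the book's files).
sample_records() {
    printf '%s\n' \
        'alice:x:150:staff' \
        'bob:x:250:staff' \
        'carol:x:300:admin'
}

# Sends one printf stream through sort and a second print stream to a file.
# $1 is the output file; it reaches awk as the variable out (an assumption
# of this sketch, standing in for a quoted literal file name).
report() {
    sample_records | awk -F: -v out="$1" '
        # Every record: number first, piped to an external sort command.
        { printf "%d %s\n", $3, $1 | "sort -rn" }
        # Only records whose third field exceeds 200 go to the file.
        $3 > 200 { print $1 > out }
    '
}

tmp_file=$(mktemp)
report "$tmp_file"     # the sorted lines arrive on standard output
cat "$tmp_file"        # names of the records selected for the file
rm -f "$tmp_file"
```

The command after | is still a double-quoted string inside the awk program, as the text requires; only the file name is supplied through a variable so that the sketch does not leave a file behind.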
12.3 Number Processing

awk supports computation using the arithmetic operators from the list shown in Table 12.1. The +, -, *, and / perform the four basic functions, but we'll also use % (modulo) in some of our scripts. awk (along with bc) also overcomes the inability of expr and the shell to handle floating-point numbers. Let awk take, as its input, two numbers from the standard input:

TABLE 12.1 Arithmetic Operators Used by awk and perl

    Operator   Description
    +          Addition
    -          Subtraction
    *          Multiplication
    /          Division
    %          Modulo (5 % 3 = 2)
    ^          Exponentiation (2 ^ 10 = 1024) (awk only)
    **         Exponentiation (2 ** 10 = 1024) (perl only)

    $ echo 22 7 | awk '{print $1/$2}'
    3.14286
    $ echo 22 7 | awk '{printf "%1.20f\n", $1/$2}'
    3.14285714285714279370

The second example uses the %1.20f format string to print a floating-point number with 20 digits to the right of the decimal point.

Salespeople often earn a bonus apart from their salary. We'll assume here that the bonus amount is equal to one month's salary. We'll print the pay slip for these people using a variable to print the serial number:

    $ awk -F: '/sales/ {
    > kount = kount + 1
    > printf "%3d %-20s %-12s %6d %8.2f\n", kount, $2, $3, $6, $6/12 }' empn.lst
      1 charles harris       g.m.          90000  7500.00
      2 gordon lightfoot     director     140000 11666.67
      3 p.j. woodhouse       manager       90000  7500.00
      4 jackie wodehouse     manager      110000  9166.67

The last column shows the bonus component, obtained by dividing the salary field by 12 ($6/12). As in C, the = operator can be combined with any of the arithmetic operators. For instance, += is an assignment operator that adds the value on its right to the variable on its left and also reassigns the variable. These two operations mean the same thing in awk, perl, and C:

    kount = kount + 1
    kount += 1

When the operand on the right is a 1 (one), awk offers the increment operator, ++, as a synonym. So all of the following three forms are equivalent:

    kount = kount + 1
    kount += 1
    kount++

The same line of reasoning applies to the other arithmetic operators too. So, x -= n decrements the existing value of x by n, and x *= n reassigns x by multiplying its existing value by n. The assignment operators are listed in Table 12.2.

TABLE 12.2 Assignment Operators (i = initially; result used as initial value by next line)

    Operator   Description                         Example   Value of i
    ++         Adds one to itself                  i++
    +=         Adds and assigns to itself          i +=      11
    --         Subtracts one from itself           i--       10
    -=         Subtracts and assigns to itself     i -=
    *=         Multiplies and assigns to itself    i *=      24
    /=         Divides and assigns to itself       i /=

The ++ and -- operators are special; they can be used as both prefix and postfix operators. The statements x++ and ++x are similar but not identical:

    kount = count =
    print ++kount            Increments kount first and then prints
    print count++            Prints and then sets count to

12.4 Variables and Expressions

Throughout this chapter, we'll be using variables and expressions with awk. Expressions comprise strings, numbers, variables, and entities that are built by combining them with operators. For example, (x + 5)*12 is an expression. Unlike most programming languages, awk doesn't have char, int, long, double, and so forth as primitive data types. Every expression can be interpreted either as a string or a number, and awk makes the necessary conversion according to context.

awk also allows the use of user-defined variables but without declaring them. Variables are case-sensitive: x is different from X. A variable is deemed to be declared the first time it is used. Unlike shell variables, awk variables don't use the $ either in assignment or in evaluation:

    x = "5"
    print x

A user-defined variable needs no initialization. It is implicitly initialized to zero or a null string. As discussed before, awk has a mechanism of identifying the type and initial value of a variable from its context.

Strings in awk are always double-quoted and can contain any character. Like echo, awk strings can also use escape sequences and octal values, but strings can also include hex values. There's one difference, however: octal and hex values are preceded by only \ and \x, respectively:

    x ="\t\tBELL\7"
    print x                  Prints two tabs, the string BELL and sounds a beep

awk provides no operator for concatenating strings. Strings are concatenated by simply placing them side-by-side:

    x = "sun" ; y = "com"
    print x y                Prints suncom
    print x "." y            Prints sun.com

Concatenation is not affected by the type of variable. A numeric and string value can be concatenated with equal ease. The following examples demonstrate how awk makes automatic conversions when concatenating and adding variables:

    x = "5" ; y = 6 ; z = "A"
    print x y                y converted to string; prints 56
    print x + y              x converted to number; prints 11
    print y + z              z converted to numeric 0; prints 6

Even though we assigned "5" (a string) to x, we could still use it for numeric computation. Also observe that when a number is added to a string, awk converts the string to zero since the string doesn't have numerals.

Expressions also have true and false values associated with them. Any nonempty string is true; so is any nonzero number. The statement if (x) is true if x is a nonnull string or a nonzero number.

Note: Variables are neither declared nor is their type specified. awk identifies their type and initializes them to zero or null strings. String variables are always double-quoted, but they can contain escape sequences. Nonprintable characters can be represented by their octal or hex values.

12.5 The Comparison and Logical Operators

awk has a single set of comparison operators for handling strings and numbers and two separate operators for matching regular expressions (Table 12.3). You'll find the scenario quite different in perl and shell programming; both use separate sets of operators for comparing strings and numbers. In this section, we'll demonstrate the use of these operators in the selection criteria, but they can also be used with modifications in the action component.

12.5.1 String and Numeric Comparison

Both numeric and string equality are tested with the == operator. The operator != tests inequality. Programmers already know that == is different from =, the assignment operator.

TABLE 12.3 The Comparison and Logical Operators

    Operator   Significance
    <          Less than
    <=         Less than or equal to
    ==         Equal to
    !=         Not equal to
    >=         Greater than or equal to
    >          Greater than
    ~          Matches a regular expression
    !~         Doesn't match a regular expression
    &&         Logical AND
    ||         Logical OR
    !          Logical NOT

Appendix I: Solutions to Self-Test Questions

17.7 (i), (iv), and (vii) are opening mode flags; the rest are status flags. A file can be opened in one of the modes, but each mode can be associated with one or more status flags. O_SYNC ensures that write doesn't return until the physical write to disk has been completed.

17.8
    #include <fcntl.h>
    #include <unistd.h>
    int main(int argc, char **argv) {
        int fd, n;
        char u;
        fd = open(argv[1], O_RDONLY);
        while ((n = read(fd, &u, 1)) > 0) {
            if (u >=97 && u

[...]

            d_name, &statbuf) == 0)
                if (S_ISREG(statbuf.st_mode))
                    if ((size = statbuf.st_size) > 100000)
                        printf("%s: %d\n", direntry->d_name, size);
                    else if (size == 0)
                        unlink(direntry->d_name);
        exit(0);
    }

17.13
    #include <fcntl.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    int main(void) {
        int fd;
        mode_t old_mode;
        old_mode = umask(0);                         /* No mask */
        fd = open("foo", O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
        chmod("foo", S_IRWXU | S_IRGRP | S_IWGRP | S_IROTH);
        system("ls -l foo");
        fchmod(fd, S_IRUSR | S_IRGRP);               /* Can use fchmod also */
        system("ls -l foo");
        umask(old_mode);                             /* Revert to previous mask */
        exit(0);
    }

Chapter 18

18.1 Because the addresses specified in the executable don't point to actual memory locations. The text segment is loaded directly from the disk file.

18.2 When a process makes a system call that keeps the CPU idle.

18.3 fork returns the PID in the parent and zero in the child. This makes it possible for the parent to control the child.

18.4 dup, dup2, fcntl, pipe

18.5
    #include <stdio.h>
    #include <unistd.h>
    int main(void) {
        if (fork() > 0)
            fork();
        printf("PID: %d PPID: %d\n", getpid(), getppid());
    }

18.6 (i)
    #include <stdio.h>
    #include <unistd.h>
    int main(void) {
        execl("/bin/wc", "wc", "-l", "-c", "/etc/passwd", (char *) 0);
        printf("execl error\n");
    }

(ii)
    #include <stdio.h>
    #include <unistd.h>
    int main(int argc, char **argv) {
        char *cmdargs[] = { "wc", "-l", "-c", "/etc/passwd", NULL };
        execv("/bin/wc", cmdargs);
        printf("execv error\n");
    }

When using execlp and execvp, change the first argument to "wc".

18.7 A parent should wait to pick up the exit status of the child from the process table. If it doesn't, the child turns into a zombie and retains its process table entry.

18.8
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>
    int main (int argc, char **argv) {
        int a, b, c, status;
        switch(fork()) {
            case 0:
                a = atoi(argv[1]);
                b = atoi(argv[2]);
                c = a + b ;
                exit(c);
            default:
                wait(&status);
                printf("The sum of the two numbers is %d\n", WEXITSTATUS(status));
                exit(20);
        }
    }

18.9 The exit status is actually eight bits long, and the value set inside the shell script is a large value. $? stores only the last eight bits.

18.10 False

18.11
    #include <stdlib.h>
    #include <unistd.h>
    int main (void) {
        dup2(STDOUT_FILENO, STDERR_FILENO);
        write(STDERR_FILENO, "hello dolly\n", 12);
        exit(0);
    }

18.12 A background process has no controlling terminal, so it can't be sent a signal from the keyboard.

18.13 The output of this program will always show two lines for the SIGKILL and SIGSTOP signals:

    #include <stdio.h>
    #include <signal.h>
    int main (void) {
        struct sigaction act;
        act.sa_handler = SIG_IGN;                    /* Disposition set to ignore */
        int i;
        for (i = 1; i

Date posted: 18/10/2022, 13:11
