
Regular Expressions Cookbook: Detailed Solutions in Eight Programming Languages [Goyvaerts & Levithan, 2009-06-01]


DOCUMENT INFORMATION

Basic information

Pages: 511
File size: 4.27 MB

Content

CuuDuongThanCong.com https://fb.com/tailieudientucntt

Regular Expressions Cookbook

Jan Goyvaerts and Steven Levithan

Beijing • Cambridge • Farnham • Köln • Sebastopol • Taipei • Tokyo

Regular Expressions Cookbook
by Jan Goyvaerts and Steven Levithan

Copyright © 2009 Jan Goyvaerts and Steven Levithan. All rights reserved.
Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Editor: Andy Oram
Production Editor: Sumita Mukherji
Copyeditor: Genevieve d’Entremont
Proofreader: Kiel Van Horn
Indexer: Seth Maislin
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano

Printing History:
May 2009: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Regular Expressions Cookbook, the image of a musk shrew, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

This book uses RepKover™, a durable and flexible lay-flat binding.
ISBN: 978-0-596-52068-7
[M] 1242318889

Table of Contents

Preface

1. Introduction to Regular Expressions
   Regular Expressions Defined
   Searching and Replacing with Regular Expressions
   Tools for Working with Regular Expressions

2. Basic Regular Expression Skills
   2.1 Match Literal Text
   2.2 Match Nonprintable Characters
   2.3 Match One of Many Characters
   2.4 Match Any Character
   2.5 Match Something at the Start and/or the End of a Line
   2.6 Match Whole Words
   2.7 Unicode Code Points, Properties, Blocks, and Scripts
   2.8 Match One of Several Alternatives
   2.9 Group and Capture Parts of the Match
   2.10 Match Previously Matched Text Again
   2.11 Capture and Name Parts of the Match
   2.12 Repeat Part of the Regex a Certain Number of Times
   2.13 Choose Minimal or Maximal Repetition
   2.14 Eliminate Needless Backtracking
   2.15 Prevent Runaway Repetition
   2.16 Test for a Match Without Adding It to the Overall Match
   2.17 Match One of Two Alternatives Based on a Condition
   2.18 Add Comments to a Regular Expression
   2.19 Insert Literal Text into the Replacement Text
   2.20 Insert the Regex Match into the Replacement Text
   2.21 Insert Part of the Regex Match into the Replacement Text
   2.22 Insert Match Context into the Replacement Text

3. Programming with Regular Expressions
   Programming Languages and Regex Flavors
   3.1 Literal Regular Expressions in Source Code
   3.2 Import the Regular Expression Library
   3.3 Creating Regular Expression Objects
   3.4 Setting Regular Expression Options
   3.5 Test Whether a Match Can Be Found Within a Subject String
   3.6 Test Whether a Regex Matches the Subject String Entirely
   3.7 Retrieve the Matched Text
   3.8 Determine the Position and Length of the Match
   3.9 Retrieve Part of the Matched Text
   3.10 Retrieve a List of All Matches
   3.11 Iterate over All Matches
   3.12 Validate Matches in Procedural Code
   3.13 Find a Match Within Another Match
   3.14 Replace All Matches
   3.15 Replace Matches Reusing Parts of the Match
   3.16 Replace Matches with Replacements Generated in Code
   3.17 Replace All Matches Within the Matches of Another Regex
   3.18 Replace All Matches Between the Matches of Another Regex
   3.19 Split a String
   3.20 Split a String, Keeping the Regex Matches
   3.21 Search Line by Line

4. Validation and Formatting
   4.1 Validate Email Addresses
   4.2 Validate and Format North American Phone Numbers
   4.3 Validate International Phone Numbers
   4.4 Validate Traditional Date Formats
   4.5 Accurately Validate Traditional Date Formats
   4.6 Validate Traditional Time Formats
   4.7 Validate ISO 8601 Dates and Times
   4.8 Limit Input to Alphanumeric Characters
   4.9 Limit the Length of Text
   4.10 Limit the Number of Lines in Text
   4.11 Validate Affirmative Responses
   4.12 Validate Social Security Numbers
   4.13 Validate ISBNs
   4.14 Validate ZIP Codes
   4.15 Validate Canadian Postal Codes
   4.16 Validate U.K. Postcodes
   4.17 Find Addresses with Post Office Boxes
   4.18 Reformat Names From “FirstName LastName” to “LastName, FirstName”
   4.19 Validate Credit Card Numbers
   4.20 European VAT Numbers

5. Words, Lines, and Special Characters
   5.1 Find a Specific Word
   5.2 Find Any of Multiple Words
   5.3 Find Similar Words
   5.4 Find All Except a Specific Word
   5.5 Find Any Word Not Followed by a Specific Word
   5.6 Find Any Word Not Preceded by a Specific Word
   5.7 Find Words Near Each Other
   5.8 Find Repeated Words
   5.9 Remove Duplicate Lines
   5.10 Match Complete Lines That Contain a Word
   5.11 Match Complete Lines That Do Not Contain a Word
   5.12 Trim Leading and Trailing Whitespace
   5.13 Replace Repeated Whitespace with a Single Space
   5.14 Escape Regular Expression Metacharacters

6. Numbers
   6.1 Integer Numbers
   6.2 Hexadecimal Numbers
   6.3 Binary Numbers
   6.4 Strip Leading Zeros
   6.5 Numbers Within a Certain Range
   6.6 Hexadecimal Numbers Within a Certain Range
   6.7 Floating Point Numbers
   6.8 Numbers with Thousand Separators
   6.9 Roman Numerals

7. URLs, Paths, and Internet Addresses
   7.1 Validating URLs
   7.2 Finding URLs Within Full Text
   7.3 Finding Quoted URLs in Full Text
   7.4 Finding URLs with Parentheses in Full Text
   7.5 Turn URLs into Links
   7.6 Validating URNs
   7.7 Validating Generic URLs
   7.8 Extracting the Scheme from a URL
   7.9 Extracting the User from a URL
   7.10 Extracting the Host from a URL
   7.11 Extracting the Port from a URL
   7.12 Extracting the Path from a URL
   7.13 Extracting the Query from a URL
   7.14 Extracting the Fragment from a URL
   7.15 Validating Domain Names
   7.16 Matching IPv4 Addresses
   7.17 Matching IPv6 Addresses
   7.18 Validate Windows Paths
   7.19 Split Windows Paths into Their Parts
   7.20 Extract the Drive Letter from a Windows Path
   7.21 Extract the Server and Share from a UNC Path
   7.22 Extract the Folder from a Windows Path
   7.23 Extract the Filename from a Windows Path
   7.24 Extract the File Extension from a Windows Path
   7.25 Strip Invalid Characters from Filenames

8. Markup and Data Interchange
   8.1 Find XML-Style Tags
   8.2 Replace <b> Tags with <strong>
   8.3 Remove All XML-Style Tags Except <em> and <strong>
   8.4 Match XML Names
   8.5 Convert Plain Text to HTML by Adding <p> and <br> Tags
   8.6 Find a Specific Attribute in XML-Style Tags
   8.7 Add a cellspacing Attribute to <table> Tags That Do Not Already Include It
   8.8 Remove XML-Style Comments
   8.9 Find Words Within XML-Style Comments
   8.10 Change the Delimiter Used in CSV Files
   8.11 Extract CSV Fields from a Specific Column
   8.12 Match INI Section Headers
   8.13 Match INI Section Blocks
   8.14 Match INI Name-Value Pairs

Index

Preface

Over the past decade, regular expressions have experienced a remarkable rise in popularity. Today, all the popular programming languages include a powerful regular expression library, or even have regular expression support built right into the language. Many developers have taken advantage of these regular expression features to provide the users of their applications the ability to search or filter through their data using a regular expression. Regular expressions are everywhere.

Many books have been published to ride the wave of regular expression adoption. Most do a good job of explaining the regular expression syntax along with some examples and a reference. But there aren’t any books that present solutions based on regular expressions to a wide range of real-world practical problems dealing with text on a computer and in a range of Internet applications. We, Steve and Jan, decided to fill that need with this book.

We particularly wanted to show how you can use regular expressions in situations where people with limited regular expression experience would say it can’t be done, or where software purists would say a regular expression isn’t the right tool for the job. Because regular expressions are everywhere these days, they are often a readily available tool that can be used by end users, without the need to involve a team of programmers. Even programmers can often save time by using a few regular expressions for information retrieval and alteration tasks that would take hours or days to code in procedural code, or that would otherwise require a third-party library that needs prior review and management approval.

Caught in the Snarls of Different Versions

As with anything that becomes popular in the IT industry, regular expressions come in many different implementations, with varying degrees of compatibility. This has resulted in many different regular expression flavors that don’t always act the same way, or work at all, on a particular regular expression.

...

... make a regular expression case-insensitive in JavaScript, set the /i flag when creating it.

Flavor-Specific Features

.NET character class subtraction
[a-zA-Z0-9-[g-zG-Z]]
This regular expression...

... them with subtraction, ‹[\p{IsThai}-[\P{N}]]› matches any of the 10 Thai digits.

Java character class union, subtraction, and intersection
[a-f[A-F][0-9]]
[a-f[A-F[0-9]]]
Java allows one character...

... and the nested class does not match ‹[g-zG-Z_]›, those are dropped from the final character class, leaving only the hexadecimal digits:
[a-zA-Z0-9&&[^g-zG-Z]]

2.3 Match One of Many Characters
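The excerpts above spell the same set, the hexadecimal digits, in several flavor-specific ways. As a minimal sketch, assuming a Java runtime, the class below checks that Java's two union forms, its intersection form, and a plain [a-f0-9] compiled with Pattern.CASE_INSENSITIVE (used here as Java's counterpart to JavaScript's /i flag) all accept exactly the same printable ASCII characters. The .NET subtraction syntax is not valid in Java, so the intersection with a negated class stands in for it. The class name HexClassDemo and the helper asciiMatches are hypothetical, introduced only for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares several spellings of "one hex digit".
final class HexClassDemo {
    // Java union: the nested classes are merged into one set.
    static final Pattern UNION = Pattern.compile("[a-f[A-F][0-9]]");
    // Union with deeper nesting; the resulting set is the same.
    static final Pattern NESTED_UNION = Pattern.compile("[a-f[A-F[0-9]]]");
    // Java intersection: letters/digits AND NOT g-z/G-Z leaves only hex digits.
    // This plays the role of the .NET subtraction [a-zA-Z0-9-[g-zG-Z]].
    static final Pattern INTERSECTION = Pattern.compile("[a-zA-Z0-9&&[^g-zG-Z]]");
    // Case-insensitive matching makes [a-f] also accept A-F (ASCII only here).
    static final Pattern IGNORE_CASE =
            Pattern.compile("[a-f0-9]", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns, in ASCII order, every printable ASCII
    // character (0x20..0x7E) that the pattern matches as the entire input.
    static String asciiMatches(Pattern p) {
        StringBuilder sb = new StringBuilder();
        for (char c = 0x20; c <= 0x7E; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Pattern[] patterns = {UNION, NESTED_UNION, INTERSECTION, IGNORE_CASE};
        for (Pattern p : patterns) {
            System.out.println(p.pattern() + " -> " + asciiMatches(p));
        }
    }
}
```

Each printed line should end with 0123456789ABCDEFabcdef, the 22 hexadecimal digit characters in ASCII order. Enumerating the printable ASCII range is a cheap way to confirm that two character classes describe the same set before swapping one flavor's syntax for another's.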

Date posted: 14/09/2020, 23:07
