Mastering Regular Expressions 3rd Edition by Jeffrey Friedl – Ebook PDF Instant Download/Delivery: 0596528124, 9780596528126
Product details:
ISBN 10: 0596528124
ISBN 13: 9780596528126
Author: Jeffrey E. F. Friedl
Regular expressions are an extremely powerful tool for manipulating text and data. They are now standard features in a wide range of languages and popular tools, including Perl, Python, Ruby, Java, VB.NET and C# (and any language using the .NET Framework), PHP, and MySQL.
If you don’t use regular expressions yet, you will discover in this book a whole new world of mastery over your data. If you already use them, you’ll appreciate this book’s unprecedented detail and breadth of coverage. If you think you know all you need to know about regular expressions, this book is a stunning eye-opener.
As this book shows, a command of regular expressions is an invaluable skill. Regular expressions allow you to code complex and subtle text processing that you never imagined could be automated. Regular expressions can save you time and aggravation. They can be used to craft elegant solutions to a wide range of problems. Once you’ve mastered regular expressions, they’ll become an invaluable part of your toolkit. You will wonder how you ever got by without them.
Yet despite their wide availability, flexibility, and unparalleled power, regular expressions are frequently underutilized. What is power in the hands of an expert can be fraught with peril for the unwary. Mastering Regular Expressions will help you navigate the minefield to becoming an expert and help you optimize your use of regular expressions.
Mastering Regular Expressions, Third Edition, now includes a full chapter devoted to PHP and its powerful and expressive suite of regular expression functions, in addition to enhanced PHP coverage in the central ‘core’ chapters. Furthermore, this edition has been updated throughout to reflect advances in other languages, including expanded in-depth coverage of Sun’s java.util.regex package, which has emerged as the standard Java regex implementation.
Topics include:
A comparison of features among different versions of many languages and tools
How the regular expression engine works
Optimization (major savings available here!)
Matching just what you want, but not what you don’t want
Sections and chapters on individual languages
Written in the lucid, entertaining tone that makes a complex, dry topic become crystal-clear to programmers, and sprinkled with solutions to complex real-world problems, Mastering Regular Expressions, Third Edition offers a wealth of information that you can put to immediate use.
Reviews of this new edition and the second edition:
‘There isn’t a better (or more useful) book available on regular expressions.’ –Zak Greant, Managing Director, eZ Systems
‘A real tour-de-force of a book which not only covers the mechanics of regexes in extraordinary detail but also talks about efficiency and the use of regexes in Perl, Java, and .NET…If you use regular expressions as part of your professional work (even if you already have a good book on whatever language you’re programming in) I would strongly recommend this book to you.’ –Dr. Chris Brown, Linux Format
‘The author does an outstanding job leading the reader from regex novice to master. The book is extremely easy to read and chock full of useful and relevant examples…Regular expressions are valuable tools that every developer should have in their toolbox. Mastering Regular Expressions is the definitive guide to the subject, and an outstanding resource that belongs on every programmer’s bookshelf. Ten out of Ten Horseshoes.’ –Jason Menard, Java Ranch
Mastering Regular Expressions 3rd Table of contents:
Ch. 1: Introduction to Regular Expressions
Solving Real Problems
Regular Expressions as a Language
The Filename Analogy
The Language Analogy
The goal of this book
The Regular-Expression Frame of Mind
If You Have Some Regular-Expression Experience
Searching Text Files: Egrep
Egrep Metacharacters
Start and End of the Line
Character Classes
Matching any one of several characters
Negated character classes
Matching Any Character with Dot
Alternation
Matching any one of several subexpressions
Ignoring Differences in Capitalization
Word Boundaries
In a Nutshell
Optional Items
Other Quantifiers: Repetition
Defined range of matches: intervals
Parentheses and Backreferences
The Great Escape
Expanding the Foundation
Linguistic Diversification
The Goal of a Regular Expression
A Few More Examples
Variable names
A string within double quotes
Dollar amount (with optional cents)
An HTTP/HTML URL
An HTML tag
Time of day, such as “9:17 am” or “12:30 pm”
Regular Expression Nomenclature
Regex
Matching
Metacharacter
Flavor
Subexpression
Character
Improving on the Status Quo
Summary
Personal Glimpses
Ch. 2: Extended Introductory Examples
About the Examples
A Short Introduction to Perl
Matching Text with Regular Expressions
Toward a More Real-World Example
Side Effects of a Successful Match
Intertwined Regular Expressions
A short aside–metacharacters galore
Generic “whitespace” with \s
Intermission
Modifying Text with Regular Expressions
Example: Form Letter
Example: Prettifying a Stock Price
Automated Editing
A Small Mail Utility
Real-world problems, real-world solutions
The “real” real world
Adding Commas to a Number with Lookaround
Lookaround doesn’t “consume” text
A few more lookahead examples
Back to the comma example…
Word boundaries and negative lookaround
Commafication without lookbehind
Text-to-HTML Conversion
Cooking special characters
Separating paragraphs
“Linkizing” an email address
Matching the username and hostname
Putting it together
“Linkizing” an HTTP URL
Building a regex library
Why ‘$’ and ‘@’ sometimes need to be escaped
That Doubled-Word Thing
Moving bits around: operators, functions, and objects
Ch. 3: Overview of Regular Expressions Features and Flavors
Regular Expressions and Cars
In This Chapter
A Casual Stroll Across the Regex Landscape
The Origins of Regular Expressions
Grep’s metacharacters
Grep evolves
Egrep evolves
Other species evolve
POSIX–An attempt at standardization
Henry Spencer’s regex package
Perl evolves
A partial consolidation of flavors
Versions as of this book
At a Glance
Care and Handling of Regular Expressions
Integrated Handling
Procedural and Object-Oriented Handling
Regex handling in Java
A procedural example
Regex handling in VB and other .NET languages
Regex handling in PHP
Regex handling in Python
Why do approaches differ?
A Search-and-Replace Example
Search and replace in Java
Search and replace in VB.NET
Search and replace in PHP
Search and Replace in Other Languages
Awk
Tcl
GNU Emacs
Care and Handling: Summary
Strings, Character Encodings, and Modes
Strings as Regular Expressions
Strings in Java
Strings in VB.NET
Strings in C#
Strings in PHP
Strings in Python
Strings in Tcl
Regex literals in Perl
Character-Encoding Issues
Richness of encoding-related support
Unicode
Characters versus combining-character sequences
Multiple code points for the same character
Unicode 3.1+ and code points beyond U+FFFF
Unicode line terminator
Regex Modes and Match Modes
Case-insensitive match mode
Free-spacing and comments regex mode
Dot-matches-all match mode (a.k.a., “single-line mode”)
An unfortunate name.
Enhanced line-anchor match mode (a.k.a., “multiline mode”)
Literal-text regex mode
Common Metacharacters and Features
Character Representations
Character shorthands
These are machine dependent?
Octal escape: \num
Hex and Unicode escapes: \xnum, \x{num}, \unum, \Unum, …
Control characters: \cchar
Character Classes and Class-Like Constructs
Normal classes: [a-z] and [^a-z]
Almost any character: dot
Dot versus a negated character class
Exactly one byte
Unicode combining character sequence: \X
Class shorthands: \w, \d, \s, \W, \D, \S
Unicode properties, scripts, and blocks: \p{Prop}, \P{Prop}
Scripts.
Blocks.
Other properties/qualities.
Simple class subtraction:
Full class set operations:
Class subtraction with set operators.
Mimicking class set operations with lookaround.
POSIX bracket-expression “character class”: [[:alpha:]]
POSIX bracket-expression “collating sequences”: [[.span-ll.]]
POSIX bracket-expression “character equivalents”: [[=n=]]
Emacs syntax classes
Anchors and Other “Zero-Width Assertions”
Start of line/string: ^, \A
End of line/string: $, \Z, \z
Start of match (or end of previous match): \G
End of previous match, or start of the current match?
Word boundaries: \b, \B, \<, \>, …
Lookahead (?=•••), (?!•••); Lookbehind, (?<=•••), (?<!•••)
Comments and Mode Modifiers
Mode modifier: (?modifier), such as (?i) or (?-i)
Mode-modified span: (?modifier:•••), such as (?i:•••)
Comments: (?#•••) and #•••
Literal-text span: \Q•••\E
Grouping, Capturing, Conditionals, and Control
Capturing/Grouping Parentheses: (•••) and \1, \2, …
Grouping-only parentheses: (?:•••)
Named capture: (?<Name>•••)
Atomic grouping: (?>•••)
Alternation: •••|•••|•••
Conditional: (?if then|else)
Using a special reference to capturing parentheses as the test
Using lookaround as the test.
Other tests for the conditional.
Greedy quantifiers: *, +, ?, {num,num}
Intervals: {min,max} or \{min,max\}
Lazy quantifiers: *?, +?, ??, {num,num}?
Possessive quantifiers: *+, ++, ?+, {num,num}+
Guide to the Advanced Chapters
Ch. 4: The Mechanics of Expression Processing
Start Your Engines!
Two Kinds of Engines
New Standards
The impact of standards
Regex Engine Types
From the Department of Redundancy Department
Testing the Engine Type
Traditional NFA or not?
DFA or POSIX NFA?
Match Basics
About the Examples
Rule 1: The Match That Begins Earliest Wins
The “transmission” and the bump-along
The transmission’s main work: the bump-along
Engine Pieces and Parts
No “electric” parentheses, backreferences, or lazy quantifiers
Rule 2: The Standard Quantifiers Are Greedy
A subjective example
Being too greedy
First come, first served
Getting down to the details
Regex-Directed Versus Text-Directed
NFA Engine: Regex-Directed
The control benefits of an NFA engine
DFA Engine: Text-Directed
First Thoughts: NFA and DFA in Comparison
Consequences to us as users
Backtracking
A Really Crummy Analogy
A crummy little example
Two Important Points on Backtracking
Saved States
A match without backtracking
A match after backtracking
A non-match
A lazy match
Backtracking and Greediness
Star, plus, and their backtracking
Revisiting a fuller example
More About Greediness and Backtracking
Problems of Greediness
Multi-Character “Quotes”
Using Lazy Quantifiers
Greediness and Laziness Always Favor a Match
The Essence of Greediness, Laziness, and Backtracking
Possessive Quantifiers and Atomic Grouping
Atomic grouping with (?>•••)
The essence of atomic grouping
Some states may remain.
Faster failures with atomic grouping.
Possessive Quantifiers, ?+, *+, ++, and {m,n}+
The Backtracking of Lookaround
Mimicking atomic grouping with positive lookahead
Is Alternation Greedy?
Taking Advantage of Ordered Alternation
Ordered alternation pitfalls
NFA, DFA, and POSIX
“The Longest-Leftmost”
Really, the longest
POSIX and the Longest-Leftmost Rule
Speed and Efficiency
DFA efficiency
Summary: NFA and DFA in Comparison
DFA versus NFA: Differences in the pre-use compile
DFA versus NFA: Differences in match speed
DFA versus NFA: Differences in what is matched
DFA versus NFA: Differences in capabilities
DFA versus NFA: Differences in ease of implementation
Summary
Ch. 5: Practical Regex Techniques
Regex Balancing Act
A Few Short Examples
Continuing with Continuation Lines
Matching an IP Address
Know your context
Working with Filenames
Removing the leading path from a filename
Accessing the filename from a path
Both leading path and filename
Matching Balanced Sets of Parentheses
Watching Out for Unwanted Matches
Matching Delimited Text
Allowing escaped quotes in double-quoted strings
Knowing Your Data and Making Assumptions
Stripping Leading and Trailing Whitespace
HTML-Related Examples
Matching an HTML Tag
Matching an HTML Link
Examining an HTTP URL
Validating a Hostname
Plucking Out a URL in the Real World
Extended Examples
Keeping in Sync with Your Data
Keeping the match in sync with expectations
Maintaining sync after a non-match as well
Maintaining sync with \G
This example in perspective
Parsing CSV Files
Distrusting the bump-along
Another approach.
One change for the sake of efficiency
Other CSV formats
Ch. 6: Crafting an Efficient Expression
Tests and Backtracks
Traditional NFA versus POSIX NFA
A Sobering Example
A Simple Change–Placing Your Best Foot Forward
Efficiency Versus Correctness
Advancing Further–Localizing the Greediness
Reality Check
“Exponential” matches
A Global View of Backtracking
More Work for a POSIX NFA
Work Required During a Non-Match
Being More Specific
Alternation Can Be Expensive
Benchmarking
Know What You’re Measuring
Benchmarking with PHP
Benchmarking with Java
Benchmarking with VB.NET
Benchmarking with Ruby
Benchmarking with Python
Benchmarking with Tcl
Common Optimizations
No Free Lunch
Everyone’s Lunch is Different
The Mechanics of Regex Application
Pre-Application Optimizations
Compile caching
Compile caching in the integrated approach
Compile caching in the procedural approach
Compile caching in the object-oriented approach
Pre-check of required character/substring optimization
Length-cognizance optimization
Optimizations with the Transmission
Start of string/line anchor optimization
Implicit-anchor optimization
End of string/line anchor optimization
Initial character/class/substring discrimination optimization
Embedded literal string check optimization
Length-cognizance transmission optimization
Optimizations of the Regex Itself
Literal string concatenation optimization
Simple quantifier optimization
Needless parentheses elimination
Needless character class elimination
Character following lazy quantifier optimization
“Excessive” backtracking detection
Exponential (a.k.a., super-linear) short-circuiting
State-suppression with possessive quantifiers
Small quantifier equivalence
Need cognizance
Techniques for Faster Expressions
Common Sense Techniques
Avoid recompiling
Use non-capturing parentheses
Don’t add superfluous parentheses
Don’t use superfluous character classes
Use leading anchors
Expose Literal Text
“Factor out” required components from quantifiers
“Factor out” required components from the front of alternation
Expose Anchors
Expose ^ and \G at the front of expressions
Expose $ at the end of expressions
Lazy Versus Greedy: Be Specific
Split Into Multiple Regular Expressions
Mimic Initial-Character Discrimination
Don’t do this with Tcl
Don’t do this with PHP
Use Atomic Grouping and Possessive Quantifiers
Lead the Engine to a Match
Put the most likely alternative first
Distribute into the end of alternation
This optimization can be dangerous
Unrolling the Loop
Method 1: Building a Regex From Past Experiences
Constructing a general “unrolling-the-loop” pattern
The Real “Unrolling-the-Loop” Pattern
Avoiding the neverending match
1) The start of special and normal must never intersect.
2) Special must not match nothingness.
3) Special must be atomic.
General things to look out for
Method 2: A Top-Down View
Method 3: An Internet Hostname
Observations
Using Atomic Grouping and Possessive Quantifiers
Making a neverending match safe with possessive quantifiers
Making a neverending match safe with atomic grouping
Short Unrolling Examples
Unrolling “multi-character” quotes
Unrolling the continuation-line example
Unrolling the CSV regex
Unrolling C Comments
To unroll or to not unroll…
Avoiding regex headaches
A direct approach
Making it work
Unrolling the C loop
Return to reality
The Freeflowing Regex
A Helping Hand to Guide the Match
A Well-Guided Regex is a Fast Regex
Wrapup
In Summary: Think!
Ch. 7: Perl
Regular Expressions as a Language
Perl’s Greatest Strength
Perl’s Greatest Weakness
Perl’s Regex Flavor
Regex Operands and Regex Literals
Features supported by regex literals
Picking your own regex delimiters
How Regex Literals Are Parsed
Regex Modifiers
Regex-Related Perlisms
Expression Context
Contorting an expression
Dynamic Scope and Regex Match Effects
Global and private variables
Dynamically scoped values
A better analogy: clear transparencies
Regex side effects and dynamic scoping
Dynamic scoping versus lexical scoping
Special Variables Modified by a Match
Using $1 within a regex?
The qr/•••/ Operator and Regex Objects
Building and Using Regex Objects
Match modes (or lack thereof) are very sticky
Viewing Regex Objects
Using Regex Objects for Efficiency
The Match Operator
Match’s Regex Operand
Using a regex literal
Using a regex object
The default regex
Special match-once ?•••?
Specifying the Match Target Operand
The default target
Negating the sense of the match
Different Uses of the Match Operator
Normal “does this match?”–scalar context without /g
Normal “pluck data from a string”–list context, without /g
“Pluck all matches”–list context, with the /g modifier
Iterative Matching: Scalar Context, with /g
The “current match location” and the pos() function
Pre-setting a string’s pos
Using \G
“Tag-team” matching with /gc
Pos-related summary
The Match Operator’s Environmental Relations
The match operator’s side effects
Outside influences on the match operator
Keeping your mind in context (and context in mind)
The Substitution Operator
The Replacement Operand
The /e Modifier
Multiple uses of /e
Context and Return Value
The Split Operator
Basic Split
Basic match operand
Target string operand
Basic chunk-limit operand
Advanced split
Returning Empty Elements
Trailing empty elements
The chunk-limit operand’s second job
Special matches at the ends of the string
Split’s Special Regex Operands
Split has no side effects
Split’s Match Operand with Capturing Parentheses
Fun with Perl Enhancements
Using a Dynamic Regex to Match Nested Pairs
Using the Embedded-Code Construct
Using embedded code to display match-time information
Using embedded code to see all matches
Finding the longest match
Finding the longest-leftmost match
Using embedded code in a conditional
Using local in an Embedded-Code Construct
A Warning About Embedded Code and my Variables
Matching Nested Constructs with Embedded Code
Overloading Regex Literals
Adding start- and end-of-word metacharacters
Adding support for possessive quantifiers
Problems with Regex-Literal Overloading
Mimicking Named Capture
Perl Efficiency Issues
“There’s More Than One Way to Do It”
Regex Compilation, the /o Modifier, qr/•••/,
The internal mechanics of preparing a regex
Perl steps to reduce regex compilation
Unconditional caching
On-demand recompilation
The “compile once” /o modifier
Potential “gotchas” of /o
Using regex objects for efficiency
Using m/•••/ with regex objects
Using /o with qr/•••/
Using the default regex for efficiency
Understanding the “Pre-Match” Copy
Pre-match copy supports $1, $&, $', $+, …
The pre-match copy is not always needed
The variables $`, $&, and $' are naughty
How expensive is the pre-match copy?
Avoiding the pre-match copy
Never use naughty variables
Don’t use naughty modules.
The Study Function
When not to use study
When study can help
Benchmarking
Regex Debugging Information
Run-time debugging information
Other ways to invoke debugging messages
Final Comments
Ch. 8: Java
Java’s Regex Flavor
Java Support for \p{•••} and \P{•••}
Unicode properties
Unicode blocks
Special Java character properties
Unicode Line Terminators
Using java.util.regex
The Pattern.compile() Factory
Pattern’s matcher method
The Matcher Object
Applying the Regex
Querying Match Results
Match-result example
Simple Search and Replace
Simple search and replace examples
The replacement argument
Advanced Search and Replace
Search-and-replace examples
In-Place Search and Replace
Using a different-sized replacement
The Matcher’s Region
Points to keep in mind
Setting and inspecting region bounds
Looking outside the current region
Transparent bounds
Anchoring bounds
Method Chaining
Methods for Building a Scanner
Examples illustrating hitEnd and requireEnd
The hitEnd bug and its workaround
The workaround
Other Matcher Methods
Querying a matcher’s target text
Other Pattern Methods
Pattern’s split Method, with One Argument
Empty elements with adjacent matches
Pattern’s split Method, with Two Arguments
Split with a limit less than zero
Split with a limit of zero
Split with a limit greater than zero
Additional Examples
Adding Width and Height Attributes to Image Tags
Validating HTML with Multiple Patterns Per Matcher
Parsing Comma-Separated Values (CSV) Text
Java Version Differences
Differences Between 1.4.2 and 1.5.0
New methods in Java 1.5.0
Unicode-support differences between 1.4.2 and 1.5.0
Differences Between 1.5.0 and 1.6
Ch. 9: .NET
.NET’s Regex Flavor
Additional Comments on the Flavor
Named capture
An unfortunate consequence
Conditional tests
“Compiled” expressions
Right-to-left matching
Backslash-digit ambiguities
ECMAScript mode
Using .NET Regular Expressions
Regex Quickstart
Quickstart: Checking a string for match
Quickstart: Matching and getting the text matched
Quickstart: Matching and getting captured text
Quickstart: Search and replace
Package Overview
Importing the regex namespace
Core Object Overview
Regex objects
Match objects
Group objects
Capture objects
All results are computed at match time
Core Object Details
Creating Regex Objects
Catching exceptions
Regex options
Using Regex Objects
Using a replacement delegate
Using Split with capturing parentheses
Using Match Objects
Using Group Objects
Static “Convenience” Functions
Regex Caching
Support Functions
Regex.Escape(string)
Regex.Unescape(string)
Match.Empty
Regex.CompileToAssembly(•••)
Advanced .NET
Regex Assemblies
Matching Nested Constructs
Capture Objects
Ch. 10: PHP
PHP’s Regex Flavor
The Preg Function Interface
“Pattern” Arguments
PHP single-quoted strings
Delimiters
Pattern modifiers
Pattern modifiers within the regex
Mode modifiers outside the regex
PHP-specific modifiers
The Preg Functions
preg_match
Capturing match data
Trailing “non-participatory” elements stripped
Named capture
Getting more details on the match: PREG_OFFSET_CAPTURE
The offset argument
preg_match_all
Collecting match data
The default PREG_PATTERN_ORDER arrangement
The PREG_SET_ORDER arrangement
preg_match_all and the PREG_OFFSET_CAPTURE flag
preg_match_all with named capture
preg_replace
Basic one-string, one-pattern, one-replacement preg_replace
Multiple subjects, patterns, and replacements
Ordering of array arguments
preg_replace_callback
A callback versus the e pattern modifier
preg_split
preg_split’s limit argument
preg_split’s flag arguments
PREG_SPLIT_OFFSET_CAPTURE
PREG_SPLIT_NO_EMPTY
PREG_SPLIT_DELIM_CAPTURE
preg_grep
preg_quote
“Missing” Preg Functions
preg_regex_to_pattern
The problem
The solution
Syntax-Checking an Unknown Pattern Argument
Syntax-Checking an Unknown Regex
Recursive Expressions
Matching Text with Nested Parentheses
Recursive reference to a set of capturing parentheses
Recursive reference via named capture
More on possessive quantifiers
No Backtracking Into Recursion
Matching a Set of Nested Parentheses
PHP Efficiency Issues
The S Pattern Modifier: “Study”
Standard optimizations, without the S pattern modifier
Enhancing the optimization with the S pattern modifier
When the S pattern modifier can’t help
Suggested use
Extended Examples
CSV Parsing with PHP
Checking Tagged Data for Proper Nesting
The main body of this expression
The second alternative: non-tag text
The third alternative: self-closing tags
The first alternative: a matched set of tags
Possessive quantifiers
Real-world XML
HTML?