A cheat sheet about regular expressions in Sublime Text.
expression | Description |
---|---|
. | Match any character |
^ | Match line begin |
$ | Match line end |
* | Match previous RE 0 or more times greedily |
*? | Match previous RE 0 or more times non-greedily |
+ | Match previous RE 1 or more times greedily |
+? | Match previous RE 1 or more times non-greedily |
? | Match previous RE 0 or 1 time greedily |
?? | Match previous RE 0 or 1 time non-greedily |
A\|B | Match either RE A or B |
{m} | Match previous RE exactly m times |
{m,n} | Match previous RE m to n times greedily |
{m,n}? | Match previous RE m to n times non-greedily |
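
The difference between greedy and non-greedy quantifiers is easiest to see side by side. The snippet below is a minimal sketch using Python's `re` module rather than Sublime Text itself, on the assumption that these basic quantifiers behave the same way in both; the sample strings are made up for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as possible, so the match runs to the last '>'.
greedy = re.search(r"<.*>", text).group(0)

# Non-greedy: .*? stops at the first '>' that lets the match succeed.
lazy = re.search(r"<.*?>", text).group(0)

# {m,n} is greedy; {m,n}? takes as few repetitions as possible.
digits_greedy = re.search(r"\d{2,4}", "123456").group(0)
digits_lazy = re.search(r"\d{2,4}?", "123456").group(0)

print(greedy)         # <b>bold</b> and <i>italic</i>
print(lazy)           # <b>
print(digits_greedy)  # 1234
print(digits_lazy)    # 12
```
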
expression | Description |
---|---|
[abc] | Match a, b or c |
[^abc] | Match any character not in this set (i.e., neither a, b nor c) |
[a-z] | Match the range from a to z |
[a-f2-8] | Match the range from a to f or the range from 2 to 8 |
[a\-z] | Match a, - or z |
[a-] | Match a or - |
[-a] | Match - or a |
[\{\}\*\|\(\)\[\]\+\^\$\.\?] | Match any one of the characters {}*\|()[]+^$.? |
- Note that you can also use a character class inside [], for example, [\w] matches any character in the word character class.
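
As a quick check of the set rules above, here is a small sketch in Python's `re` module (an assumption: Sublime Text uses a different engine, though these bracket expressions should behave the same). The sample string is invented for the example.

```python
import re

sample = "a-z, F2, [x]"

# [a-f2-8] matches one character from a..f or from 2..8.
in_ranges = re.findall(r"[a-f2-8]", sample)

# [a\-z] treats the hyphen as a literal, so only 'a', '-' and 'z' match.
literal_hyphen = re.findall(r"[a\-z]", sample)

# [^...] negates the set: everything except letters, digits and spaces.
negated = re.findall(r"[^a-zA-Z0-9 ]", sample)

print(in_ranges)       # ['a', '2']
print(literal_hyphen)  # ['a', '-', 'z']
print(negated)         # ['-', ',', ',', '[', ']']
```
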
“Multiple character” character class
An expression of the form [[:name:]] matches the named character class name.
class name | Description |
---|---|
alnum | Any alpha-numeric character |
alpha | Any alphabetic character. |
digit | Any decimal digit. |
xdigit | Any hexadecimal digit character. |
lower | Any lower case character. |
upper | Any upper case character. |
cntrl | Any control character. [1] |
print | Any printable character. |
punct | Any punctuation character. [2] |
space | Any whitespace character. [3] |
word | Any word character (alphanumeric characters plus the underscore). |
Note: To use upper and lower, you have to enable case-sensitive search.
“Single character” character class
expression | Description |
---|---|
\d | Equal to [[:digit:]] |
\l | Equal to [[:lower:]] |
\u | Equal to [[:upper:]] |
\s | Equal to [[:space:]] |
\w | Equal to [[:word:]] |
\D | Equal to [^[:digit:]] |
\L | Equal to [^[:lower:]] |
\U | Equal to [^[:upper:]] |
\W | Equal to [^[:word:]] |
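
The shorthand classes can be tried outside the editor as well. The following is a sketch in Python's `re` module, which is an assumption on my part: Python supports \d, \s, \w and their negations, but not \l, \u, \L or \U, so explicit ASCII ranges stand in for those here. The sample text is made up.

```python
import re

line = "user_42 logged in at 09:15"

numbers = re.findall(r"\d+", line)     # runs of digits
words = re.findall(r"\w+", line)       # runs of word characters
fields = re.split(r"\s+", line)        # split on whitespace
stripped = re.sub(r"\W", "", "09:15")  # drop every non-word character

# Python's re has no \l or \u, so explicit ranges are used instead
# (ASCII only, which is narrower than [[:lower:]] and [[:upper:]]).
lowers = re.findall(r"[a-z]+", "Sublime Text")
uppers = re.findall(r"[A-Z]", "Sublime Text")

print(numbers)   # ['42', '09', '15']
print(words)     # ['user_42', 'logged', 'in', 'at', '09', '15']
print(fields)    # ['user_42', 'logged', 'in', 'at', '09:15']
print(stripped)  # 0915
print(lowers)    # ['ublime', 'ext']
print(uppers)    # ['S', 'T']
```
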
Defining capture groups
expression | Description |
---|---|
(?<NAME>pattern) | Define a regex group named NAME which you can later refer to with \g{NAME} |
(?=pattern) | Positive lookahead, consumes zero characters, the preceding RE only matches if this matches |
(?!pattern) | Negative lookahead, consumes zero characters, the preceding RE only matches if this does not match |
(?<=pattern) | Positive lookbehind, consumes zero characters, the following RE will only match if preceded with this fixed length RE. |
(?<!pattern) | Negative lookbehind, consumes zero characters, the following RE will only match if not preceded with this fixed length RE. |
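
Named groups and lookarounds are easier to follow with concrete input. This is a hedged sketch in Python's `re` module, not Sublime Text: Python spells a named group (?P<NAME>pattern) instead of (?<NAME>pattern), while the lookahead and lookbehind syntax is the same. All sample strings are invented.

```python
import re

# Named group: Python writes (?P<NAME>...) where the table shows (?<NAME>...).
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2021-07-15")
year, month = m.group("year"), m.group("month")

prices = "USD100 EUR200 USD300"

# Positive lookbehind: digits only when preceded by the fixed-length 'USD'.
usd = re.findall(r"(?<=USD)\d+", prices)

# Negative lookbehind: digits preceded neither by 'USD' nor by another digit.
not_usd = re.findall(r"(?<!USD)(?<!\d)\d+", prices)

# Positive lookahead: a name only when '.txt' follows; '.txt' is not consumed.
before_txt = re.findall(r"\w+(?=\.txt)", "a.txt b.pdf notes.txt")

# Negative lookahead: 'foo' only when it is not followed by 'bar'.
no_bar = re.findall(r"foo(?!bar)\w*", "foobar foobaz")

print(year, month)  # 2021 07
print(usd)          # ['100', '300']
print(not_usd)      # ['200']
print(before_txt)   # ['a', 'notes']
print(no_bar)       # ['foobaz']
```
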
Referring to matching groups (capture groups)
expression | Description |
---|---|
\1 | Refer to first regex group |
\g{1} | Refer to first regex group |
\g{12} | Refer to 12th regex group |
\g{-1} | Refer to the last opened regex group (relative reference) |
\g{-2} | Refer to the last but one opened regex group |
- The regex groups are indexed by the order of their opening parentheses.
- Note that the \g{NUM} form allows referring to a regex group whose index is larger than 9, for example, \g{12}.
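
Back-references can be sketched in Python's `re` module too, with some syntax differences that are assumptions relative to this cheat sheet: Python accepts \1 inside a pattern, but uses (?P=NAME) for named references and \g<NUM> (angle brackets, replacement strings only) in place of \g{NUM}. The inputs below are made up.

```python
import re

# \1 inside the pattern must repeat what group 1 matched: find doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test case")

# Groups are numbered by the position of their opening parenthesis.
outer, first, second = re.search(r"((a)(b))", "ab").groups()

# In a replacement string, \g<NUM> stays unambiguous even past group 9.
swapped = re.sub(r"(\w+)@(\w+)", r"\g<2>@\g<1>", "user@host")

# A named group is referenced with (?P=NAME) in Python patterns.
quoted = re.findall(r"(?P<q>['\"])(.*?)(?P=q)", "say 'hi' and \"bye\"")

print(doubled)               # ['is', 'test']
print(outer, first, second)  # ab a b
print(swapped)               # host@user
print(quoted)                # [("'", 'hi'), ('"', 'bye')]
```
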
Escapes
expression | Description |
---|---|
\xdd | A hexadecimal escape sequence - matches the single character whose code point is 0xdd. |
\x{dddd} | A hexadecimal escape sequence - matches the single character whose code point is 0xdddd. |
Word boundaries
The following escape sequences match the boundaries of words:
expression | Description |
---|---|
\< | Matches the start of a word. |
\> | Matches the end of a word. |
\b | Matches a word boundary (the start or end of a word). |
\B | Matches only when not at a word boundary. |
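
The effect of word boundaries can be checked with a short sketch in Python's `re` module. This is an assumption-laden stand-in for the editor: Python supports \b and \B but not \< or \>, so the last two lines approximate them by combining \b with a lookahead or lookbehind. The sample text is invented.

```python
import re

text = "cat concat cats category"

whole = re.findall(r"\bcat\b", text)    # 'cat' as a complete word
starts = re.findall(r"\bcat\w*", text)  # words that begin with 'cat'
inside = re.findall(r"\Bcat\b", text)   # 'cat' at the end of a longer word

# Approximating \< and \> (not available in Python's re):
word_starts = [m.start() for m in re.finditer(r"\b(?=\w)", "hi there")]
word_ends = [m.start() for m in re.finditer(r"\b(?<=\w)", "hi there")]

print(whole)        # ['cat']
print(starts)       # ['cat', 'cats', 'category']
print(inside)       # ['cat']
print(word_starts)  # [0, 3]
print(word_ends)    # [2, 8]
```
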
[1] Control character explanation: https://en.wikipedia.org/wiki/Control_character
[2] There are 14 punctuation marks in English: https://grammar.yourdictionary.com/punctuation/what/fourteen-punctuation-marks.html
[3] For whitespace characters, see https://en.wikipedia.org/wiki/Whitespace_character
Cheat sheet based on the Udemy CySA+ course from Jason Dion (video 75), as I’m sure I’ll end up looking for it at some point in the future.
REGEX:
[] – Match a single instance of a character from a range such as a-z, A-Z or 0-9, or from all of them with [a-zA-Z0-9]
\s – Match whitespace
\d – Match a digit
+ – Match one or more occurrences e.g. \d+
* – Match zero or more occurrences e.g. \d*
? – Match zero or one occurrence e.g. \d?
{} – Match the number of times within the braces e.g. \d{3} finds 3 digits in a row and \d{7,10} matches 7, 8, 9 or 10 digits in a row
| – OR
^ – Only search at the start of a line
$ – Only search at the end of a line
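
The items above can be tried out with a few lines of Python's `re` module. This is only a sketch under the assumption that a Perl-style engine is in use (plain grep without -P does not understand \d); the log lines are made up.

```python
import re

lines = ["error: disk 7 full", "warning: 12 retries", "ok"]

# ^ anchors at the start of the line; | chooses between alternatives.
flagged = [ln for ln in lines if re.search(r"^(error|warning)", ln)]

# $ anchors at the end of the line.
ends_full = [ln for ln in lines if re.search(r"full$", ln)]

# \d{3} needs three digits in a row, so '12' is skipped.
three = re.search(r"\d{3}", "abc 12 3456").group(0)

# \d{7,10} accepts 7 to 10 digits; fullmatch checks the whole string.
seven_ok = re.fullmatch(r"\d{7,10}", "5551234") is not None
six_ok = re.fullmatch(r"\d{7,10}", "555123") is not None

print(flagged)    # ['error: disk 7 full', 'warning: 12 retries']
print(ends_full)  # ['error: disk 7 full']
print(three)      # 345
print(seven_ok)   # True
print(six_ok)     # False
```
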
GREP:
-F = search for a literal (fixed) string instead of a regex; quoting the pattern only protects it from the shell, so it is not a substitute for -F
-r = recursive
-i = Ignore case
-v = Find things which do not match
-w = Treat search strings as words (instead of parts of words)
-c = Show count of matching lines
-l = Return names of files containing matches
-L = Return names of files without matches
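
To make the line-level flags concrete, here is a rough Python sketch that mimics -F, -i, -v, -w and -c on a list of strings. The helper `grep_lines` and its parameter names are hypothetical, not part of grep or any library, and the -w emulation with \b is only an approximation of grep's whole-word rule. The file-level flags (-r, -l, -L) are not covered.

```python
import re

def grep_lines(pattern, lines, *, fixed=False, ignore_case=False,
               invert=False, word=False):
    """Return the lines a grep-like search would select (hypothetical helper)."""
    if fixed:   # like -F: treat the pattern as a literal string
        pattern = re.escape(pattern)
    if word:    # like -w: approximate whole-word matching with \b
        pattern = r"\b(?:" + pattern + r")\b"
    flags = re.IGNORECASE if ignore_case else 0  # like -i
    regex = re.compile(pattern, flags)
    # like -v: when invert is True, keep the lines that do NOT match
    return [ln for ln in lines if bool(regex.search(ln)) != invert]

sample = ["Error: a.b failed", "axb is fine", "error again", "no errors here"]

print(grep_lines("a.b", sample))                  # regex: '.' matches any character
print(grep_lines("a.b", sample, fixed=True))      # -F: literal 'a.b' only
print(grep_lines("error", sample, word=True))     # -w: whole word 'error'
print(len(grep_lines("error", sample, ignore_case=True)))  # -i with -c: count of matching lines
print(grep_lines("error", sample, ignore_case=True, invert=True))  # -i with -v
```
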