Regular Expression

Below is a list of all the metacharacters that Ruby supports.

The rule is:

- Letters and digits without a preceding \ are not metacharacters.
- Symbols with a preceding \ are not metacharacters.

^
Beginning of a line. Matches the position at the start of the string or directly after a line feed.
$
End of a line. Matches the position at the end of the string or directly before a line feed.
```
p "\n".gsub(/$/, "o")     # => "o\no"
```
.
Matches any single character except a line feed (with multi-byte characters, this means one character, not one byte). With the regular expression option m (multi-line mode; see Regular Expression Literal), it matches any character, including a line feed.
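
A small sketch of how "." behaves with and without the m option. The sample strings are made up for this illustration, and the multi-byte string is written with \u escapes on the assumption that it is encoded as UTF-8.

```ruby
# "." does not match a line feed unless the m option is given.
p(/a.b/  =~ "a-b")    # => 0
p(/a.b/  =~ "a\nb")   # => nil
p(/a.b/m =~ "a\nb")   # => 0

# Three Japanese characters; each one takes three bytes in UTF-8.
word = "\u65E5\u672C\u8A9E"

# "." consumes one character, not one byte.
p word[/../].length    # => 2
p word[/../].bytesize  # => 6
```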
\w
Alphanumeric character. The same as [0-9A-Za-z_].
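In Ruby 1.8 with a Japanese character encoding in effect, it also matches Japanese double-byte (full-width) characters; in Ruby 1.9 and later, \w matches only [0-9A-Za-z_].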
\W
Non-alphanumeric character. Any character other than \w.
\s
Space character. The same as [ \t\n\r\f].
\S
Non-space character. Any character other than [ \t\n\r\f].
\d
Digit. The same as [0-9].
\D
Non-digit.
\A
Beginning of the string. Unlike ^, it is not affected by line feeds.
\Z
End of the string. If the string ends with a line feed, it also matches directly before that line feed.
```
p "\n".gsub(/\Z/, "o")     # => "o\no"
```
\z
End of the string. Unlike $ and \Z, it is not affected by line feeds.
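
To make the difference between the line anchors (^, $) and the string anchors (\A, \Z, \z) concrete, the following sketch inserts a "|" at every position where an anchor matches. The helper name mark_anchor and the sample strings are hypothetical, chosen only for this example.

```ruby
# mark_anchor is a hypothetical helper used only for this illustration.
def mark_anchor(text, anchor)
  # Insert "|" at every position where the zero-width anchor matches.
  text.gsub(anchor, "|")
end

two_lines = "a\nb"
p mark_anchor(two_lines, /^/)   # => "|a\n|b"
p mark_anchor(two_lines, /$/)   # => "a|\nb|"
p mark_anchor(two_lines, /\A/)  # => "|a\nb"
p mark_anchor(two_lines, /\z/)  # => "a\nb|"

# With a trailing line feed, \Z can match before it, while \z cannot.
p "line\n".sub(/\Z/, "|")       # => "line|\n"
p "line\n".sub(/\z/, "|")       # => "line\n|"
```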
\b
Outside a character class, a word boundary (matches between \w and \W). Inside a character class, a backspace (0x08).
\B
Non-word boundary.
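
A short sketch of \b and \B, using a made-up sample string. It also shows that \b means a backspace when written inside a character class.

```ruby
text = "cat concat cats"

# \b on both sides: only the standalone word matches.
p text.scan(/\bcat\b/)        # => ["cat"]

# \B: "cat" must be preceded by another word character.
p text.gsub(/\Bcat/, "CAT")   # => "cat conCAT cats"

# Inside a character class, \b is a backspace (0x08).
p(/[\b]/ =~ "\x08")           # => 0
```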
\G
Matches (with zero width) the position directly after the previous match. On the first attempt it matches only at the starting position (the same as \A).
Can be used with scan or gsub. Use it when each match should begin exactly where the previous match ended.
```
# Takes digits from the start of the string three at a time (for as long as the digits continue).
str = "123456 789"
str.scan(/\G\d\d\d/) {|m| p m }
```
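
For comparison, here is a sketch of the same scan with and without \G, reusing the string from the example above. Without \G, scan is free to skip the space and pick up the last three digits.

```ruby
str = "123456 789"

# Without \G, scan may skip over text between matches.
p str.scan(/\d\d\d/)     # => ["123", "456", "789"]

# With \G, each match must start where the previous one ended,
# so scanning stops at the space.
p str.scan(/\G\d\d\d/)   # => ["123", "456"]
```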
[ ]
Character class. See Character Class.
*
Quantifier. Repeats the preceding expression zero or more times. Matches as much as possible (longest match).
*?
Quantifier. Repeats the preceding expression zero or more times (shortest match).
+
Quantifier. Repeats the preceding expression one or more times.
+?
Quantifier. Repeats the preceding expression one or more times (shortest match).
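
The difference between the longest-match quantifiers (*, +) and their shortest-match forms (*?, +?) can be sketched as follows. The sample strings are made up for this illustration.

```ruby
html = "<b>bold</b> and <i>italic</i>"

# Longest match: runs from the first "<" to the last ">".
p html[/<.*>/]    # => "<b>bold</b> and <i>italic</i>"

# Shortest match: stops at the first ">" that allows a match.
p html[/<.*?>/]   # => "<b>"
p html[/<.+?>/]   # => "<b>"

# *? may match nothing at all, while +? needs at least one repetition.
p "aaa"[/a*?/]    # => ""
p "aaa"[/a+?/]    # => "a"
```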
{m}
{m,}
{m,n}
Repetition with a specified count (interval quantifier). Repeats the preceding regular expression, respectively:
- exactly m times
- m or more times
- at least m times and at most n times.
Matches for {,n} or {,} will always fail.
```
str = "foofoofoo"
p str[/(foo){1}/]   # => "foo"
p str[/(foo){2,}/]  # => "foofoofoo"
p str[/(foo){1,2}/] # => "foofoo"
```
The regular expressions ?, *, and + are the same as {0,1}, {0,}, and {1,}, respectively.
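
As a quick check of this equivalence, the sketch below compares what each short form and its interval form match on a few sample strings. The helper name same_matches? and the samples are hypothetical, and a handful of strings is only a spot check, not a proof.

```ruby
# same_matches? is a hypothetical helper used only for this illustration.
def same_matches?(short_form, interval_form, samples)
  # String#[] with a regexp returns the matched text, or nil if none.
  samples.all? { |s| s[short_form] == s[interval_form] }
end

samples = ["", "b", "ab", "aab", "baaa"]
p same_matches?(/a?/, /a{0,1}/, samples)  # => true
p same_matches?(/a*/, /a{0,}/,  samples)  # => true
p same_matches?(/a+/, /a{1,}/,  samples)  # => true
```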
{m}?
{m,}?
{m,n}?
Interval quantifier (shortest match). Repeats the preceding regular expression, respectively:
- exactly m times
- m or more times
- at least m times and at most n times.
In each case the shortest match is preferred.
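
A small sketch contrasting the longest-match and shortest-match interval quantifiers, reusing the "foofoofoo" string from the earlier example.

```ruby
str = "foofoofoo"

# Longest match takes as many repetitions as the upper bound allows.
p str[/(foo){1,2}/]    # => "foofoo"

# Shortest match stops as soon as the lower bound is satisfied.
p str[/(foo){1,2}?/]   # => "foo"
p str[/(foo){2,}?/]    # => "foofoo"
```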
?
Quantifier. Matches the preceding regular expression zero times or one time.
??
Quantifier. Matches the preceding regular expression zero times or one time (shortest match).
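
The difference between ? and ?? only shows up in which alternative is tried first. A minimal sketch, with a made-up sample string:

```ruby
# ? prefers to take the optional "b".
p "ab"[/ab?/]      # => "ab"

# ?? prefers to skip it.
p "ab"[/ab??/]     # => "a"

# ?? still takes the "b" when the rest of the pattern requires it.
p "ab"[/ab??\z/]   # => "ab"
```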
|
Alternation. Matches either the expression on its left or the expression on its right.
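
A brief sketch of alternation, with made-up sample strings. It also illustrates that alternatives are tried from left to right, so the order can change which text is matched.

```ruby
words = "gray grey groy"

# Either spelling matches.
p words.scan(/gray|grey/)      # => ["gray", "grey"]

# Grouping limits the alternation to part of the pattern.
p words.scan(/gr(?:a|e)y/)     # => ["gray", "grey"]

# The leftmost alternative that succeeds wins.
p "foobar"[/foo|foobar/]       # => "foo"
p "foobar"[/foobar|foo/]       # => "foobar"
```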
( )
Grouping of regular expressions. The string matched by the regular expression in the parentheses is remembered for back references.
\1, \2 ... \n
Back reference. See Back References.
(?# )
Comment. Everything inside the parentheses is ignored.
(?: )
Grouping without a back reference. That is, it groups a regular expression without becoming a target of \1, \2 (or $1, $2), etc.
```
/(abc)/ =~ "abc"
p $1    # => "abc"

/(?:abc)/ =~ "abc"
p $1    # => nil
```
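
One place where the choice between ( ) and (?: ) is visible is String#scan, which returns the captured groups when the pattern has any. A minimal sketch with a made-up sample string:

```ruby
pairs = "a1b2"

# Two capturing groups: scan returns both captures for each match.
p pairs.scan(/([a-z])(\d)/)       # => [["a", "1"], ["b", "2"]]

# The first group only groups, so only the digit is captured.
p pairs.scan(/(?:[a-z])(\d)/)     # => [["1"], ["2"]]

# No capturing groups at all: scan returns the whole matches.
p pairs.scan(/(?:[a-z])(?:\d)/)   # => ["a1", "b2"]
```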
(?= )
Lookahead. Specifies a position by means of a pattern (has no width).
The
```
(?=re1)re2
```
expression is a regular expression that matches only text matching both re1 and re2.
The
```
re1(?=re2)
```
expression is the regular expression re1, matching only where it is followed by a string that matches re2.
```
p /foo(?=bar)/ =~ "foobar"      # => 0
p $&    # => "foo"   (no information about the "bar" section)
```
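
Both lookahead forms can be sketched with scan and String#[]. The sample strings are made up for this illustration.

```ruby
# re1(?=re2): digits are matched only when "yen" follows,
# and "yen" itself is not part of the match.
p "100yen 200usd".scan(/\d+(?=yen)/)   # => ["100"]

# (?=re1)re2: both patterns must match starting at the same position.
# Here the word must begin with three digits.
p "abc 123x"[/(?=\d{3})\w+/]           # => "123x"
```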

(?! )

Negative lookahead. Specifies a position by the negation of a pattern (has no width).

The

(?!re1)re2

expression is a regular expression that matches re2 only where re1 does not match.

# Three-digit number other than 000
re = /(?!000)\d\d\d/
p re =~ "000"   # => nil
p re =~ "012"   # => 0
p re =~ "123"   # => 0

# C identifier (starts with [A-Za-z_] and continues with a string of [0-9A-Za-z_])

/\b(?![0-9])\w+\b/
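
As a sketch of how the C identifier pattern behaves, the following applies it to a made-up line of C-like text. The constant name C_IDENTIFIER is an assumption introduced only for this example.

```ruby
# C_IDENTIFIER is a hypothetical name for the pattern shown above.
C_IDENTIFIER = /\b(?![0-9])\w+\b/

# "2y" is skipped: a word that starts with a digit is rejected by (?![0-9]),
# and there is no word boundary between "2" and "y".
p "int x1 = 2y + _tmp;".scan(C_IDENTIFIER)   # => ["int", "x1", "_tmp"]
```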

Back References

The regular expressions \1, \2, … \n are back references. Each matches the string that was matched by the nth pair of parentheses (regular expression ( ) grouping).

/((foo)bar)\1\2/

is the same as:

/((foo)bar)foobarfoo/

Example:

re = /(foo|bar|baz)\1/
p re =~ 'foofoo'   # => 0
p re =~ 'barbar'   # => 0
p re =~ 'bazbaz'   # => 0
p re =~ 'foobar'   # => nil

The corresponding parentheses must appear to the left of the back reference.
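
A common use of a back reference is finding an accidentally doubled word. The sketch below is one way to do it; the constant name DOUBLED_WORD and the sample sentence are made up for this example.

```ruby
# DOUBLED_WORD is a hypothetical name used only for this illustration.
# \1 must match exactly the same text that (\w+) matched.
DOUBLED_WORD = /\b(\w+) \1\b/

sentence = "this is is a test"
p sentence[DOUBLED_WORD]               # => "is is"

# In a replacement string, '\1' refers to the same group.
p sentence.gsub(DOUBLED_WORD, '\1')    # => "this is a test"
```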

If a back reference appears inside its own corresponding parentheses, the match always fails. The match also always fails when a single-digit back reference has no corresponding parentheses.

p /(\1)/ =~ "foofoofoo" # => nil
p /(foo)\2/ =~ "foo\2"  # => nil

Back references of two or more digits can also be specified, but take care not to confuse them with the backslash notation \nnn (the character corresponding to octal nnn). A single-digit number is always a back reference. With two or more digits, it is treated as an octal code if there are no corresponding parentheses.

Conversely, to write a one-digit octal code in a regular expression, it must start with 0 (as in \01). (There is no back reference \0, so this is not ambiguous.)

p   /\1/ =~ "\1"   # => nil     # back reference without corresponding parentheses
p  /\01/ =~ "\1"   # => 0       # octal code
p  /\11/ =~ "\11"  # => 0       # octal code

# octal code (because there are no corresponding parentheses)
p /(.)\10/ =~ "1\10" # => 0

# back reference (because there are corresponding parentheses)
p /((((((((((.))))))))))\10/ =~ "aa"  # => 0

# octal code (however, because \08 is not a valid octal code,
# it is read as "\0" + "8")
p /(.)\08/ =~ "1\0008" # => 0

# If you want to write digits directly after a back reference,
# you have to use parentheses to group them and split them up.
p /(.)(\1)1/ =~ "111"   # => 0

Character Class

The regular expression [] specifies a character class. It matches any one of the characters listed inside the [].

For example, /[abc]/ matches any one of "a", "b", or "c". Characters that are consecutive in ASCII code order can be written as a range with "-", as in /[a-c]/. If the first character is ^, any one character other than those listed is matched.
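
A minimal sketch of a range and a negated class, using a made-up sample string:

```ruby
text = "ruby 1.8"

# A range: the first character that is "a", "b", or "c".
p text[/[a-c]/]         # => "b"

# A negated class: every character that is not a lowercase letter.
p text.scan(/[^a-z]/)   # => [" ", "1", ".", "8"]

# A digit range.
p text.scan(/[0-9]/)    # => ["1", "8"]
```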

A "^" that is not at the beginning matches that character itself. Likewise, a "-" at the beginning or end of the class matches that character itself.

p /[a^]/ =~ "^"   # => 0
p /[-a]/ =~ "-"   # => 0
p /[a-]/ =~ "-"   # => 0
p /[-]/ =~ "-"   # => 0

An empty character class results in an error.

p /[]/ =~ ""
p /[^]/ =~ "^"
# => invalid regular expression; empty character class: /[^]/

A "]" at the beginning of the class (or directly after a negating "^") does not end the character class; it is simply a literal "]". It is recommended to escape this kind of "]" with a backslash.

p /[]]/ =~ "]"       # => 0
p /[^]]/ =~ "]"      # => nil

"^", "-", "]", and "\\" (backslash) can be escaped with a backslash so that they match the character itself.

p /[\^]/ =~ "^"   # => 0
p /[\-]/ =~ "-"   # => 0
p /[\]]/ =~ "]"   # => 0
p /[\\]/ =~ "\\"  # => 0

Inside [], you can use the same backslash notation as in strings, as well as the regular expressions \w, \W, \s, \S, \d, and \D (these are shorthand for character classes).
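
A short sketch of shorthand classes and backslash notation combined with other characters inside []. The sample strings are made up for this illustration.

```ruby
# \d plus a literal "." and an escaped "-": signed decimal numbers.
p "x=10, y=-2.5".scan(/[\d.\-]+/)   # => ["10", "-2.5"]

# \s plus a comma: split on any run of whitespace or commas.
p "a, b\tc".split(/[\s,]+/)         # => ["a", "b", "c"]

# Backslash notation such as \t works as it does in strings.
p "tab\there"[/[\t]/]               # => "\t"
```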

Note that, because of the negation, a character class such as the one below also matches a line feed (the same is true of the regular expressions \W and \D).

p /[^a-z]/ =~ "\n"    # => 0
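
The same point for \W and \D, together with one possible way to keep a negated class from matching a line feed (listing \n in the class), can be sketched as:

```ruby
# Negated classes and negated shorthands all match a line feed.
p(/[^a-z]/ =~ "\n")     # => 0
p(/\W/ =~ "\n")         # => 0
p(/\D/ =~ "\n")         # => 0

# Listing \n inside the negated class excludes it.
p(/[^a-z\n]/ =~ "\n")   # => nil
```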
