Introduction
A regular expression defines a pattern for matching text. First, it can check whether a string contains a substring that matches a given pattern, and then retrieve that substring. Second, it can perform complex replacements.
Regular expression syntax is simple to learn, and its few abstract concepts are easy to understand. Many articles do not introduce the concepts step by step, from simple to abstract, so some readers find the subject difficult. In addition, each regular expression engine's documentation describes that engine's special features, but those features are not what we should study first.
Every example in this article links to a test page. Now let's begin!
1. Regular Expression Basic Syntax
1.1 Common Characters
Letters, digits, the underscore, and punctuation marks with no special meaning are "common characters". In a pattern, a common character matches the same character in the string.
Example1: When pattern "c" matches string "abcde", match result: success; substring matched: "c"; position: starts at 2, ends at 3.
Example2: When pattern "bcd" matches string "abcde", match result: success; substring matched: "bcd"; position: starts at 1, ends at 4.
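The two examples above can be reproduced with a short sketch. It assumes Python's built-in `re` module as the engine (the article itself is engine-neutral), and `find_first` is a hypothetical helper name introduced only for illustration. Positions are zero-based, and the end position is the index just past the last matched character.

```python
import re

def find_first(pattern, text):
    """Return (matched substring, start, end) for the first match, or None.

    Hypothetical helper for illustration; it only wraps re.search.
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group(), m.start(), m.end()

print(find_first("c", "abcde"))    # ('c', 2, 3)
print(find_first("bcd", "abcde"))  # ('bcd', 1, 4)
print(find_first("x", "abcde"))    # None: no "x" in the string
```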
1.2 Simple Escaped Characters
Some familiar nonprinting characters are written with escape sequences:
| Expression | Matches |
| --- | --- |
| \r, \n | Carriage return, newline character |
| \t | Tab |
| \\ | Matches "\" itself |
Some punctuation marks have a special meaning in regular expressions. To match these characters literally, put "\" before them in the pattern. For example, "^" and "$" have special meanings, so we use "\^" and "\$" to match them.
| Expression | Matches |
| --- | --- |
| \^ | Matches "^" itself |
| \$ | Matches "$" itself |
| \. | Matches dot (.) itself |
These escaped characters behave like "common characters": each one matches one specific character.
Example1: When pattern "\$d" matches string "abc$de", match result: success; substring matched: "$d"; position: starts at 3, ends at 5.
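A minimal sketch of the escaping rule, again assuming Python's `re` module as the engine. Raw strings (`r"..."`) are used so that the backslash reaches the regular expression engine unchanged. `re.escape` is a standard-library function that adds the backslashes for you; the variable names are illustrative only.

```python
import re

# Escaped "$" matches a literal dollar sign.
m = re.search(r"\$d", "abc$de")
escaped_result = (m.group(), m.start(), m.end())
print(escaped_result)  # ('$d', 3, 5)

# Unescaped "$" keeps its special meaning (end of string),
# so "$d" asks for a "d" after the end and finds nothing.
unescaped_result = re.search(r"$d", "abc$de")
print(unescaped_result)  # None

# A nonprinting character: "\t" in the pattern matches a tab.
tab_span = re.search(r"\t", "a\tb").span()
print(tab_span)  # (1, 2)

# re.escape builds a pattern that matches the given text literally.
literal_pattern = re.escape("^.$")
literal_match = re.fullmatch(literal_pattern, "^.$")
print(literal_match is not None)  # True
```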
1.3 Expressions That Match Any One of Several Characters
Some expressions can match any one of several characters. For example, "\d" matches any digit. Each of these expressions matches exactly one character at a time, although that character may be any member of a certain set.
| Expression | Matches |
| --- | --- |
| \d | Any digit: any one of 0~9 |
| \w | Any letter, digit, or underscore: any one of A~Z, a~z, 0~9, _ |
| \s | Any one of space, tab, newline, carriage return, or form feed |
| . | Any character except the newline character (\n) |
Example1: When pattern "\d\d" matches "abc123", match result: success; substring matched: "12"; position: starts at 3, ends at 5.
Example2: When pattern "a.\d" matches "aaa100", match result: success; substring matched: "aa1"; position: starts at 1, ends at 4.
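The following sketch checks the two examples above, assuming Python's `re` module. One engine-specific detail: in Python 3, "\d", "\w" and "\s" also match non-ASCII digits, letters and whitespace by default, so the `re.ASCII` flag is passed here to keep them to the sets listed in the table (in this mode Python's "\s" additionally includes the vertical tab).

```python
import re

# "\d\d" needs two consecutive digits; the first such pair is "12".
m1 = re.search(r"\d\d", "abc123", re.ASCII)
two_digits = (m1.group(), m1.span())
print(two_digits)  # ('12', (3, 5))

# "a.\d" needs "a", then any one character, then a digit.
# Starting at index 0 fails ("aaa"), starting at index 1 succeeds ("aa1").
m2 = re.search(r"a.\d", "aaa100", re.ASCII)
a_any_digit = (m2.group(), m2.span())
print(a_any_digit)  # ('aa1', (1, 4))

# "." does not match the newline character.
dot_newline = re.search(r"a.b", "a\nb")
print(dot_newline)  # None

# "\w" and "\s" each match one character from their set.
word_chars = re.findall(r"\w", "a_1-", re.ASCII)
space_chars = re.findall(r"\s", "a b\tc\n", re.ASCII)
print(word_chars)   # ['a', '_', '1']
print(space_chars)  # [' ', '\t', '\n']
```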
1.4 Custom Expressions That Match Any One of Several Characters
Square brackets [ ] enclose a set of characters, and the expression matches any one of them. [^ ] also encloses a set of characters, but matches any one character that is not in the set.
| Expression | Matches |
| --- | --- |
| [ab5@] | Matches "a" or "b" or "5" or "@" |
| [^abc] | Matches any character except "a", "b", "c" |
| [f-k] | Any character among "f"~"k" |
| [^A-F0-3] | Any character except "A"~"F", "0"~"3" |
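Each row of the table can be tried out with a small sketch, assuming Python's `re` module. `re.findall` returns every non-overlapping match, which makes it easy to see which characters a set accepts. The input strings are made up for illustration.

```python
import re

# [ab5@] accepts only the four listed characters.
listed = re.findall(r"[ab5@]", "x5@ya")
print(listed)  # ['5', '@', 'a']

# [^abc] accepts any character that is NOT "a", "b" or "c".
negated = re.findall(r"[^abc]", "abcde")
print(negated)  # ['d', 'e']

# [f-k] accepts the range "f" through "k", both ends included.
ranged = re.findall(r"[f-k]", "efghkl")
print(ranged)  # ['f', 'g', 'h', 'k']

# [^A-F0-3] rejects "A"~"F" and "0"~"3"; lowercase "f" is still accepted.
negated_ranges = re.findall(r"[^A-F0-3]", "A3G4f")
print(negated_ranges)  # ['G', '4', 'f']
```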
Example1: When pattern "[bcd][bcd]" matches "abc123", match result: success; substring matched:
