
Latest revision as of 13:28, 11 November 2017

Syntax

	This section is under construction.

Metacharacters

Definitions		Examples
MChar	Definition	Pattern	Sample Matches
^	Start of a string.	^abc	abc, abcdefg, abc123, ...
$	End of a string.	abc$	abc, endsinabc, 123abc, ...
.	Any character (except \n newline)	a.c	abc, aac, acc, adc, aec, ...
|	Alternation.	bill|ted	ted, bill
{...}	Explicit quantifier notation.	ab{2}c	abbc
[...]	Explicit set of characters to match.	a[bB]c	abc, aBc
(...)	Logical grouping of part of an expression.	(abc){2}	abcabc
*	0 or more of previous expression.	ab*c	ac, abc, abbc, abbbc, ...
+	1 or more of previous expression.	ab+c	abc, abbc, abbbc, ...
?	0 or 1 of previous expression; placed after another quantifier (*, +, ? or {...}), it makes that quantifier match as few characters as possible (minimal matching).	ab?c	ac, abc
\	Preceding one of the above, it makes it a literal instead of a special character. Preceding a special matching character, see below.	a\sc	a c
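
The rows above can be checked against a real engine. The following is a minimal sketch using Python's re module, which is an assumption: the tables on this page appear to follow .NET syntax, though the two flavours should agree on these basic metacharacters. The patterns and sample matches come from the table; the `<b>bold</b>` string is made up to show greedy versus minimal matching.

```python
import re

# (pattern, strings the table lists as sample matches)
cases = [
    (r"ab{2}c", ["abbc"]),
    (r"a[bB]c", ["abc", "aBc"]),
    (r"(abc){2}", ["abcabc"]),
    (r"ab*c", ["ac", "abc", "abbc"]),
    (r"ab+c", ["abc", "abbc"]),
    (r"ab?c", ["ac", "abc"]),
    (r"bill|ted", ["bill", "ted"]),
    (r"a\sc", ["a c"]),
]
for pattern, samples in cases:
    for s in samples:
        # fullmatch requires the whole string to match the pattern
        assert re.fullmatch(pattern, s), (pattern, s)

# ^ and $ anchor the match to the start and end of the string
starts = bool(re.search(r"^abc", "abcdefg"))
ends = bool(re.search(r"abc$", "endsinabc"))
not_at_start = re.search(r"^abc", "xabc")
print(starts, ends, not_at_start)  # True True None

# ? after a quantifier makes it minimal (lazy) instead of greedy
greedy = re.search(r"<.+>", "<b>bold</b>").group()
lazy = re.search(r"<.+?>", "<b>bold</b>").group()
print(greedy)  # <b>bold</b>
print(lazy)    # <b>
```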

Character Escapes

Escaped Char	Description
Ordinary Characters	Characters other than $ . ^ { [ ( | ) ] } * + ? \ match themselves.
\a	Matches a bell (alarm) \u0007.
\b	Matches a backspace \u0008 if in a [ ]; otherwise matches a word boundary (between \w and \W characters).
\t	Matches a tab \u0009.
\r	Matches a carriage return \u000D.
\v	Matches a vertical tab \u000B.
\f	Matches a form feed \u000C.
\n	Matches a new line \u000A.
\e	Matches an escape \u001B.
\040	Matches an ASCII character as octal (up to three digits); numbers with no leading zero are backreferences if they have only one digit or if they correspond to a capturing group number. For example, the character \040 represents a space.
\x20	Matches an ASCII character using hexadecimal representation (exactly two digits).
\cC	Matches an ASCII control character; for example \cC is control-C.
\u0020	Matches a Unicode character using a hexadecimal representation (exactly four digits).
\*	When followed by a character that is not recognized as an escaped character, matches that character. For example, \* is the same as \x2A.
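
A few of these escapes can be tried out as follows. This is a sketch in Python's re module (an assumption, since the table reads like .NET syntax); \e and \cC are left out because Python's re does not accept them. The sample strings are made up.

```python
import re

# Four ways of writing a space: literal, octal, hex and Unicode escapes
space_patterns = [r" ", r"\040", r"\x20", r"\u0020"]
space_results = [bool(re.fullmatch(p, " ")) for p in space_patterns]
print(space_results)  # [True, True, True, True]

# \b is a word boundary outside a set, but a backspace (U+0008) inside [ ]
boundary = re.findall(r"\bcat\b", "cat concat cat.")
backspace = bool(re.fullmatch(r"[\b]", "\x08"))
print(boundary)   # ['cat', 'cat']
print(backspace)  # True

# A backslash before a metacharacter makes it literal: \* is the same as \x2A
star_literal = re.findall(r"\*", "2*3*4")
star_hex = re.findall(r"\x2A", "2*3*4")
print(star_literal == star_hex)  # True

# Tab and newline escapes inside a set
tab_and_newline = re.findall(r"[\t\n]", "a\tb\nc")
print(tab_and_newline)  # ['\t', '\n']
```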

Character Classes

Char Class	Description
.	Matches any character except \n. If modified by the Singleline option, a period character matches any character.
[aeiou]	Matches any single character included in the specified set of characters.
[^aeiou]	Matches any single character not in the specified set of characters.
[0-9a-fA-F]	Use of a hyphen (-) allows specification of contiguous character ranges.
\p{name}	Matches any character in the named character class specified by {name}. Supported names are Unicode groups and block ranges. For example, Ll, Nd, Z, IsGreek, IsBoxDrawing.
\P{name}	Matches text not included in groups and block ranges specified in {name}.
\w	Matches any word character. Equivalent to the Unicode character categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \w is equivalent to [a-zA-Z_0-9].
\W	Matches any nonword character. Equivalent to the Unicode categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \W is equivalent to [^a-zA-Z_0-9].
\s	Matches any white-space character. Equivalent to the Unicode character categories [\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \s is equivalent to [ \f\n\r\t\v].
\S	Matches any non-white-space character. Equivalent to the Unicode character categories [^\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v].
\d	Matches any decimal digit. Equivalent to \p{Nd} for Unicode and [0-9] for non-Unicode, ECMAScript behavior.
\D	Matches any nondigit. Equivalent to \P{Nd} for Unicode and [^0-9] for non-Unicode, ECMAScript behavior.
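
The difference between the Unicode and the ASCII-only behaviour is easiest to see on a small sample. The sketch below uses Python's re module as a stand-in (an assumption): re.ASCII plays roughly the role of the ECMAScript option and re.DOTALL that of Singleline, and the built-in module has no \p{name}, so that class is not shown. The sample text is made up.

```python
import re

# "\u00e9" is e with acute accent, "\u0663" is ARABIC-INDIC DIGIT THREE
text = "Ann\u00e9e 2017 \u0663"

# \d and \w follow Unicode categories by default; re.ASCII narrows them
# to [0-9] and [a-zA-Z0-9_]
digits_unicode = re.findall(r"\d", text)
digits_ascii = re.findall(r"\d", text, re.ASCII)
print(len(digits_unicode), len(digits_ascii))  # 5 4

first_word_unicode = re.match(r"\w+", text).group()
first_word_ascii = re.match(r"\w+", text, re.ASCII).group()
print(first_word_ascii)  # Ann (stops at the accented letter)

# Positive set, negated set, and ranges written with a hyphen
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")
hex_runs = re.findall(r"[0-9a-fA-F]+", "0x1F or 0xbeef")
print(vowels)      # ['e', 'e']
print(non_vowels)  # ['r', 'g', 'x']
print(hex_runs)    # ['0', '1F', '0', 'beef']

# . skips \n unless re.DOTALL is set
dot_default = re.findall(r"a.b", "a\nb")
dot_all = re.findall(r"a.b", "a\nb", re.DOTALL)
print(dot_default)  # []
print(dot_all)      # ['a\nb']
```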

More Examples

Symbol	Function
[\^$.|?*+()	Special characters; any other character matches itself
\	Escapes a special character so that it is treated as a literal
*	Repeat the previous item zero or more times
.	Single character except line break characters
.*	Match zero or more characters
^	Match at the start of a line/string
$	Match at the end of a line/string
.$	Match a single character at the end of line/string
^ $	Match line with a single space
[^A-Z]	Match any single character that is not in the range A to Z (to match a line beginning with a character from A to Z, use ^[A-Z])
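
The anchors and the negated set can be tried on a small multi-line string. This is a sketch in Python's re module (an assumption about the engine); re.MULTILINE is Python's flag for making ^ and $ work per line, and the sample text is made up.

```python
import re

lines = "Alpha\n \nbeta\nGamma!"

# With re.MULTILINE, ^ and $ match at the start and end of every line
starts_upper = re.findall(r"^[A-Z].*", lines, re.MULTILINE)
print(starts_upper)  # ['Alpha', 'Gamma!']

# [^A-Z] is a negated set: one character that is not an uppercase letter
not_upper = re.findall(r"[^A-Z]", "aBc1")
print(not_upper)  # ['a', 'c', '1']

# .$ is the last character of each line; ^ $ is a line holding one space
last_chars = re.findall(r".$", lines, re.MULTILINE)
single_space_lines = re.findall(r"^ $", lines, re.MULTILINE)
print(last_chars)          # ['a', ' ', 'a', '!']
print(single_space_lines)  # [' ']

# Escaping a special character makes it literal
literal_dots = re.findall(r"\.", "a.b.c")
print(literal_dots)  # ['.', '.']
```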

Examples

Samples

Matching specific value from output

Source: pythex.org:

%Cpu(s): 0.3 us, 0.1 sy, 0.0 ni, 99.3 id, 0.2 wa, 0.0 hi, 0.0 si, 0.1 st
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,1.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st

Regex:

([0-9]{3}.[0-9]|[0-9]{2}.[0-9]|[0-9].[0-9])(?=\sid)

Explanation:

[0-9]{3} => 3 digits
|        => OR
[0-9]{2} => 2 digits
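.        => any character (here it matches the decimal point; \. would match only a literal dot)
(?=\sid) => positive lookahead: the match must be followed by a space and 'id', which are not included in the result
(?=...)  => lookahead syntax; the ? here is not the non-greedy modifier
\s       => whitespace character (the space before 'id')
id       => the literal characters 'id'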
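
Since pythex.org tests Python regular expressions, the same extraction can be sketched with Python's re module. The first pattern is the one given above; the second, `compact`, is a hypothetical tighter variant (escaped dot, one {1,3} quantifier) added only for comparison and is not part of the original page.

```python
import re

top_output = """\
%Cpu(s): 0.3 us, 0.1 sy, 0.0 ni, 99.3 id, 0.2 wa, 0.0 hi, 0.0 si, 0.1 st
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,1.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
"""

# The pattern exactly as given above
original = re.compile(r"([0-9]{3}.[0-9]|[0-9]{2}.[0-9]|[0-9].[0-9])(?=\sid)")

# Hypothetical variant: literal dot and a single 1-to-3 digit quantifier
compact = re.compile(r"(\d{1,3}\.\d)(?=\sid)")

idle_original = original.findall(top_output)
idle_compact = compact.findall(top_output)
print(idle_original)  # ['99.3', '100.0', '1.0']
print(idle_compact)   # ['99.3', '100.0', '1.0']

# The lookahead checks for " id" but leaves it out of the match itself
first = original.search(top_output)
print(repr(first.group()))  # '99.3'
```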

IP Addresses

To match up to 999.999.999.999 (no range check on each octet):

\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

OR shortened with a quantifier to:

\b(?:\d{1,3}\.){3}\d{1,3}\b

To match only octets in the range 0 to 255 (up to 255.255.255.255):

 
\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b

OR shortened with a quantifier to:

\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
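
The difference between the loose and the range-checked pattern shows up on out-of-range input. The following is a sketch in Python's re module (an assumption about the engine); the `octet` helper string and the sample addresses are introduced here for illustration only.

```python
import re

# Loose pattern: any 1 to 3 digits per octet
loose = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Range-checked pattern, built from one octet sub-pattern (0 to 255)
octet = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
strict = re.compile(r"\b(?:" + octet + r"\.){3}" + octet + r"\b")

candidates = ["192.168.1.1", "255.255.255.255", "999.999.999.999", "256.1.1.1"]
results = {
    ip: (bool(loose.fullmatch(ip)), bool(strict.fullmatch(ip)))
    for ip in candidates
}
for ip, (is_loose, is_strict) in results.items():
    print(ip, is_loose, is_strict)
# 192.168.1.1 True True
# 255.255.255.255 True True
# 999.999.999.999 True False
# 256.1.1.1 True False

# Extracting addresses from running text with the strict pattern
found = strict.findall("ping 10.0.0.1 from 300.1.1.1")
print(found)  # ['10.0.0.1']
```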

Credit Card numbers

Visa card numbers start with a 4. New cards have 16 digits. Old cards have 13:

^4[0-9]{12}(?:[0-9]{3})?$

MasterCard numbers start with the numbers 51 through 55. All have 16 digits:

^5[1-5][0-9]{14}$
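
Both patterns can be combined into a small classifier. This is a sketch in Python; the `card_brand` function is hypothetical, the numbers are made-up digit strings rather than real cards, and the patterns only check prefix and length, not the checksum.

```python
import re

visa = re.compile(r"^4[0-9]{12}(?:[0-9]{3})?$")
mastercard = re.compile(r"^5[1-5][0-9]{14}$")

def card_brand(number: str) -> str:
    """Return 'Visa', 'MasterCard' or 'unknown' for a digits-only string."""
    if visa.match(number):
        return "Visa"
    if mastercard.match(number):
        return "MasterCard"
    return "unknown"

# Made-up test strings, built by repetition so the lengths are explicit
visa_16 = "4" + "1" * 15
visa_13 = "4" + "2" * 12
visa_14 = "4" + "1" * 13        # neither 13 nor 16 digits
master_16 = "53" + "0" * 14
master_bad = "56" + "0" * 14    # second digit outside 1 to 5

print(card_brand(visa_16))     # Visa
print(card_brand(visa_13))     # Visa
print(card_brand(visa_14))     # unknown
print(card_brand(master_16))   # MasterCard
print(card_brand(master_bad))  # unknown
```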

Misc Examples

Match exactly 2 alphanumeric characters:

^[0-9a-zA-Z]{2}$

Simple URL Verification:

(http|https):\/\/([a-z])\w+\.(com|net|org)
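
A quick check of both patterns, sketched in Python's re module (an assumption about the engine) with made-up sample strings. It also shows one limit of the URL pattern: because \w does not match a dot, a host with a subdomain is rejected.

```python
import re

two_alnum = re.compile(r"^[0-9a-zA-Z]{2}$")
two_char_hits = [s for s in ["a1", "ZZ", "abc", "a-"] if two_alnum.match(s)]
print(two_char_hits)  # ['a1', 'ZZ']

url = re.compile(r"(http|https):\/\/([a-z])\w+\.(com|net|org)")
samples = [
    "http://example.com",
    "https://example.org",
    "ftp://example.com",         # scheme not in (http|https)
    "https://example.io",        # top-level domain not in (com|net|org)
    "https://wiki.example.org",  # subdomain: \w+ cannot cross the first dot
]
url_results = {s: bool(url.search(s)) for s in samples}
for s, ok in url_results.items():
    print(s, ok)
# http://example.com True
# https://example.org True
# ftp://example.com False
# https://example.io False
# https://wiki.example.org False
```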


@@ Line 1: / Line 1: @@
 [[Category:Linux]]
 =Syntax=
+{{UC}}
+== Metacharacters ==
+{| class="wikitable" style="width:80%"
+|-
+!colspan=2| Definitions !!colspan=2| Examples
+|-
+! MChar !! Definition !! style="width:15%"|Pattern !!style="width:25%"| Sample Matches
+|-
+| ^ || Start of a string. || ^abc || abc, abcdefg, abc123, ...
+|-
+| $ || End of a string. || abc$ || abc, endsinabc, 123abc, ...
+|-
+| . || Any character (except \n newline) || a.c || abc, aac, acc, adc, aec, ...
+|-
+| <nowiki>|</nowiki> || Alternation. || <nowiki>bill|ted</nowiki> || ted, bill
+|-
+| {...} || Explicit quantifier notation. || ab{2}c || abbc
+|-
+| [...] || Explicit set of characters to match. || a[bB]c || abc, aBc
+|-
+| (...) || Logical grouping of part of an expression. || (abc){2} || abcabc
+|-
+| * || 0 or more of previous expression. || ab*c || ac, abc, abbc, abbbc, ...
+|-
+| + || 1 or more of previous expression. || ab+c || abc, abbc, abbbc, ...
+|-
+| ? || 0 or 1 of previous expression; also forces minimal matching when an expression might  <br />match several strings within a search string. || ab?c || ac, abc
+|-
+| \ || Preceding one of the above, it makes it a literal instead of a special character.  <br />Preceding a special matching character, see below. || a\sc || a c
+|}
+== Character Escapes ==
 {| class="wikitable"
 |-
-! Symbol !! Function
+! Escaped Char !! Description
+|-
+| Ordinary Characters || Characters other than <nowiki> $ . ^ { [ ( | ) ] } * + ? \</nowiki> match themselves.
+|-
+| \a || Matches a bell (alarm) \u0007.
+|-
+| \b || Matches a backspace \u0008 if in a [ ]; otherwise matches a word boundary (between \w and \W characters).
+|-
+| \t || Matches a tab \u0009.
+|-
+| \r || Matches a carriage return \u000D.
+|-
+| \v || Matches a vertical tab \u000B.
+|-
+| \f || Matches a form feed \u000C.
+|-
+| \n || Matches a new line \u000A.
+|-
+| \e || Matches an escape \u001B.
+|-
+| \040 || Matches an ASCII character as octal (up to three digits); numbers with no leading zero are backreferences if they have  <br />only one digit or if they correspond to a capturing group number. For example, the character \040 represents a space.
+|-
+| \x20 || Matches an ASCII character using hexadecimal representation (exactly two digits).
+|-
+| \cC || Matches an ASCII control character; for example \cC is control-C.
+|-
+| \u0020 || Matches a Unicode character using a hexadecimal representation (exactly four digits).
+|-
+| \* || When followed by a character that is not recognized as an escaped character, matches that character.  <br />For example, \* is the same as \x2A.
+|}
+== Character Classes ==
+{| class="wikitable"
+|-
+!Char Class !! Description
+|-
+| . || Matches any character except \n. If modified by the Singleline option, a period character matches any character.
+|-
+| [aeiou] || Matches any single character included in the specified set of characters.
+|-
+| [^aeiou] || Matches any single character not in the specified set of characters.
+|-
+| [0-9a-fA-F] || Use of a hyphen (–) allows specification of contiguous character ranges.
+|-
+| \p{name} || Matches any character in the named character class specified by {name}. <br />Supported names are Unicode groups and block ranges. For example, Ll, Nd, Z, IsGreek, IsBoxDrawing.
+|-
+| \P{name} || Matches text not included in groups and block ranges specified in {name}.
+|-
+| \w || Matches any word character. Equivalent to the Unicode character categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. <br />If ECMAScript-compliant behavior is specified with the ECMAScript option, \w is equivalent to [a-zA-Z_0-9].
+|-
+| \W || Matches any nonword character. Equivalent to the Unicode categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].  <br />If ECMAScript-compliant behavior is specified with the ECMAScript option, \W is equivalent to [^a-zA-Z_0-9].
+|-
+| \s || Matches any white-space character. Equivalent to the Unicode character categories [\f\n\r\t\v\x85\p{Z}].  <br />If ECMAScript-compliant behavior is specified with the ECMAScript option, \s is equivalent to [ \f\n\r\t\v].
+|-
+| \S || Matches any non-white-space character. Equivalent to the Unicode character categories [^\f\n\r\t\v\x85\p{Z}].  <br />If ECMAScript-compliant behavior is specified with the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v].
+|-
+| \d || Matches any decimal digit. Equivalent to \p{Nd} for Unicode and [0-9] for non-Unicode, ECMAScript behavior.
+|-
+| \D || Matches any nondigit. Equivalent to \P{Nd} for Unicode and [^0-9] for non-Unicode, ECMAScript behavior.
+|}
+== More Examples ==
+{| class="wikitable"
+|-
+!style="width:25%"| Symbol !!style="width:65%"| Function
 |-
 | <nowiki>[\^$.|?*+()</nowiki> || Special characters any other will match themselves
@@ Line 29: / Line 129: @@
 =Examples=
+==Samples==
-*For IP Addresses:
-. To Match upto 999.999.999.999:
+*;Matching specific value from output
+Source: [https://pythex.org pythex.org]:
+ %Cpu(s): 0.3 us, 0.1 sy, 0.0 ni, '''99.3 id''', 0.2 wa, 0.0 hi, 0.0 si, 0.1 st
+ %Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,'''100.0 id''', 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+ %Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,'''1.0 id''', 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+Regex:
+ ([0-9]{3}.[0-9]|[0-9]{2}.[0-9]|[0-9].[0-9])(?=\sid)
+Explanation:
+ [0-9]{3} => 3 digits
+ |        => OR
+ [0-9]{2} => 2 digits
+ .        => any character (Dot here)
+ (?=\sid) => select non-greedy output before 'id'
+ ?        => non-greedy
+ \s       => Space
+ id       => 'id' character
+==IP Addresses==
+* To Match upto 999.999.999.999:
  \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
 OR shortened with a quantifier to:
  \b(?:\d{1,3}\.){3}\d{1,3}\b
-. To match exactly upto 255.255.255.255:
+* To match exactly upto 255.255.255.255:
 <pre style="width: 97%; overflow-x: scroll;">
 \b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
@@ Line 41: / Line 161: @@
  \b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
-* For Credit Card numbers:
+==Credit Card numbers==
-. Visa card numbers start with a 4. New cards have 16 digits. Old cards have 13:
+Visa card numbers start with a 4. New cards have 16 digits. Old cards have 13:
  ^4[0-9]{12}(?:[0-9]{3})?$
-. MasterCard numbers start with the numbers 51 through 55. All have 16 digits:
+MasterCard numbers start with the numbers 51 through 55. All have 16 digits:
  ^5[1-5][0-9]{14}$
+== Misc Examples ==
+* Match 2 characters/numbers only:
+ ^[0-9a-zA-Z]{2}$
+* Simple URL Verification:
+ (http|https):\/\/([a-z])\w+\.(com|net|org)
+<br/>

Regex: Difference between revisions