What are regular expressions?
A regular expression is a pattern that describes a set of character sequences. In JavaScript, it is mostly used to search, extract, replace, or remove portions of a string.
Regular expression test method:
Regular expression.test(String):
Before explaining regular expressions, one must understand the basic test() method.
test() is a method of a RegExp instance. The test() method returns a boolean. Its first parameter is the string you want to test against the regular expression.
test() returns true if the pattern is found anywhere in the string. Otherwise, it returns false.
Example:
/abc/.test('abc') // true
/abc/.test('ccc') // false
As illustrated in the example above, the string 'abc' matches the pattern set in the regular expression /abc/; testing the string against the regular expression returns true.
Javascript Regex 101: Part 1:
Declaring a Regular Expression Instance in Javascript
There exist two ways to declare a regular expression instance in Javascript, either through the literal notation or through the constructor function.
Method 1: literal notation with /pattern/ syntax
This way is known as literal notation and takes a regular expression pattern between two slashes. You can set a 'flag' with this syntax right after the closing slash. We will dive deeper into what flags are later in this article.
Method 2: RegExp constructor
The constructor function can take a string or a RegExp instance as the first parameter. As an optional second parameter, you can add a "flag."
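As a minimal sketch of the two declaration styles described above (the variable names are illustrative and not part of the original article):

```javascript
// Literal notation: pattern between slashes, optional flags after the closing slash.
const literalRegex = /abc/i;

// Constructor with a string pattern and an optional flags string.
// Backslashes must be doubled inside a string, so \d is written as '\\d'.
const fromString = new RegExp('\\d\\d', 'g');

// Constructor with an existing RegExp instance; the second argument replaces its flags.
const fromRegex = new RegExp(literalRegex, 'g');

console.log(literalRegex.test('ABC'));            // true
console.log(fromString.source, fromString.flags); // \d\d g
console.log(fromRegex.source, fromRegex.flags);   // abc g
```

The constructor form is mostly useful when the pattern is only known at runtime; for fixed patterns the literal form avoids the doubled backslashes.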
Types of Data
In this section, we will describe the most common character classes. Each one stands for a single character of a given type, e.g. a digit or a letter.
You use these to constrain which kind of character may appear at a given position in your regular expression.
\d : digit
Example:
/\d\d/.test('123') // true, matches '12'
\d matches any single digit (0-9). In this example, when testing '123', the test method looks for the first occurrence of two consecutive digits. The returned value is true because \d\d matches '12'.
Examples:
/\d\d/.test('A23D') // true, matches '23'
/\d\d/.test('Z9Z9') // false
\w : word character (a letter, a digit, or an underscore)
Examples:
/\d\w\w/.test('5AA') // true, matches '5AA'
/\d\w\w/.test('2B2') // true, matches '2B2' because \w also matches digits
/\d\w\w/.test('9992AA') // true, matches '999'
/\w\w/.test('55PPQ') // true, matches '55'
\s : any whitespace character
Examples:
/\d\s\w/.test('5 A') // true, matches '5 A'
/\d\s\w\s/.test('A B ') // false, the string contains no digit for \d to match
/\d\d\s\d\d\d/.test('321 555') // true, matches '21 555'
/\d\d\s\d\d\d/.test(`321
555`) // true, matches `21
555`
\D : not a digit
Examples:
/\D\d\d/.test('W33') // true, matches 'W33'
/\D\d\d/.test('3AA') // false
\W : not a word character
Examples:
/\W\w/.test('1A') // false, '1' and 'A' are both word characters
/\W\w/.test('12-5T') // true, matches '-5'
/\W\d/.test('AA 44') // true, matches ' 4'
\S : not a whitespace character
Examples:
/\d\S\d/.test('1E1') // true, matches '1E1'
/\d\S\d/.test('AAA334') // true, matches '334'
/\d\S\d/.test('44 3') // false
. : any character except newline
Examples:
/\d../.test('1AC') // true, matches '1AC'
/\d../.test('AAA23D') // true, matches '23D'
/\d../.test(`3
5`) // false
Note: When a string contains more than one match for a regular expression, most methods return a result based on the first match only. Subsequent matches are disregarded unless the method being used or an optional flag (such as g) says otherwise.
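A short sketch of the note above, using a made-up sample string: several methods report only the first match, and the g flag changes that for replace().

```javascript
const sample = 'A12 B34 C56'; // hypothetical input

// search() and exec() both report the first match only
const firstIndex = sample.search(/\d\d/);
const firstMatch = /\d\d/.exec(sample)[0];
console.log(firstIndex, firstMatch); // 1 12

// replace() without the g flag edits the first match only
const replacedFirst = sample.replace(/\d\d/, '##');
console.log(replacedFirst); // A## B34 C56

// With the g flag, every match is processed
const replacedAll = sample.replace(/\d\d/g, '##');
console.log(replacedAll); // A## B## C##
```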
Regular Expression Methods:
Regular expression.exec(String)
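This method returns an array containing the full match followed by any captured groups. The array also carries the index, input, and groups properties. If there is no match, the method returns null.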
Example:
/maestro\sis\s(the)\sboss/.exec(`maestro is the boss`)
// ['maestro is the boss', 'the', index: 0, input: 'maestro is the boss', groups: undefined ]
Explanation:
The method first uses the whole regular expression to find a match. In this scenario, that is /maestro\sis\s(the)\sboss/. The text matched by the whole expression, 'maestro is the boss', becomes the first element of the returned array.
The method then adds, in order, whatever each group matched, i.e. the parts of the regular expression in parentheses. In this example, (the) is a group, so 'the' in 'maestro is the boss' constitutes the second element of the returned array.
In other words, the returned array goes from the match of the whole regular expression to the matches of the groups nested within parentheses.
Let's explain the extra added fields: index, input, and groups:
- index: the character position in the string at which the match starts. In this scenario, the match begins at 'm' in 'maestro is the boss'; therefore, the index is 0.
- input: the string that was passed to exec(). In this scenario: 'maestro is the boss'
- groups: an object holding the named groups. Groups need to be named using a special syntax to appear here. In this example, no named group was set; hence, we got undefined for the groups field.
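The fields listed above can be read directly from the returned array. The following is only a sketch; the longer input string is an assumption added to show a non-zero index.

```javascript
const execInput = 'Today maestro is the boss'; // hypothetical input
const execResult = /maestro\sis\s(the)\sboss/.exec(execInput);

// exec() returns null when nothing matches, so guard before reading fields
if (execResult !== null) {
  const [fullMatch, firstGroup] = execResult;
  console.log(fullMatch);         // maestro is the boss
  console.log(firstGroup);        // the
  console.log(execResult.index);  // 6
  console.log(execResult.input);  // Today maestro is the boss
  console.log(execResult.groups); // undefined (no named group in the pattern)
}

const noResult = /maestro/.exec('nobody here');
console.log(noResult); // null
```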
If we wanted to set a group and get a value for it, the way we define a group is as follows:
Example:
const arrayRegExec = /maestro\sis\s(?<the_name>the)\sboss/.exec(`maestro is the boss`)
console.log(arrayRegExec.groups) // [Object: null prototype] { the_name: 'the' }
Notice the use of the (?<the_name>the) syntax. 'the_name' is the group's name, and it populates the groups field with the match within the parentheses, i.e. 'the'.
- indices: the index numbers where matches start and end, relative to the string's character positions.
Add the d flag at the end of the regular expression to enable this field. (We will look at what flags are later.)
Example:
const arrayRegExec = /maestro\sis\s(?<the_name>the)\sboss/d.exec(`maestro is the boss`)
console.log(arrayRegExec.indices) // [0, 19] for the full match and [11, 14] for the group, i.e. 0,19,11,14
Explanation:
0: start index of the full match ('m' in maestro is the boss)
19: end index of the full match, one position past the final 's' in maestro is the boss
Then,
11: start index of 't' in the (inner group)
14: end index of the inner group, one position past the 'e' in the
String.match(Regular Expression)
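String.match() is a method in JavaScript that searches a string for a match against a regular expression and returns the result as an array. With the g flag, the array contains every match; without it, the array describes the first match only, in the same shape as the exec() result. If no matches are found, it returns null.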
Example:
let matchedWords = "The quick brown fox jumps over the brown dog".match(/\bbrown\b/g);
console.log(matchedWords);
// Outputs: [ 'brown', 'brown' ]
Explanation: This regex searches for the word 'brown' in the string. In this example, two instances of 'brown' were matched.
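Because match() returns null when nothing matches, reading the result without a guard can throw. The sketch below reuses the sentence from the example above; the variable names and the word 'purple' are assumptions for illustration.

```javascript
const sentence = 'The quick brown fox jumps over the brown dog';

// Without the g flag: first match only, with index, input, and groups
const singleMatch = sentence.match(/\bbrown\b/);
console.log(singleMatch[0], singleMatch.index); // brown 10

// No match at all: the result is null, not an empty array
const missing = sentence.match(/\bpurple\b/g);
console.log(missing); // null

// Falling back to an empty array makes counting safe
const purpleCount = (sentence.match(/\bpurple\b/g) || []).length;
const brownCount = (sentence.match(/\bbrown\b/g) || []).length;
console.log(purpleCount, brownCount); // 0 2
```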
String.search(Regular Expression)
The search method returns the index at which the string first matches the regular expression. If there is no match, the method returns -1.
Example:
'ABS335'.search(/\d\d\d/) // returns 3
Explanation: In the string 'ABS335', the first match for the /\d\d\d/ pattern starts at index 3 (character positions are counted from 0).
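Since -1 is a valid number, it is easy to misuse it as a position by accident. A small sketch of one way to handle the "no match" case; firstDigitPosition is a hypothetical helper, not a built-in.

```javascript
// Hypothetical helper: returns the position of the first digit, or null if there is none.
function firstDigitPosition(text) {
  const position = text.search(/\d/);
  return position === -1 ? null : position;
}

console.log(firstDigitPosition('ABS335')); // 3
console.log(firstDigitPosition('ABS'));    // null
```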
String.replace(Regular Expression | String, String | String with groupings in it | function)
String.replace(Regular Expression, String)
The replace method is executed on a string instance. The replace method takes two parameters. The first is to identify the portion of the string to replace. The first parameter can be either a string or a regular expression.
Whatever matches that first parameter within the string instance will then be replaced by the second parameter. The replace method returns the string instance after the replacement is complete.
Note 1: The original string does not change. replace() returns a new string, so assign the result to a variable to keep the edited string.
Note 2: When the first parameter is a string or a regular expression without the g flag, only the first match is replaced.
Note 3: To render a replacement global, use the replace() method with a regular expression with a g flag. Or, you can use the replaceAll() method.
Example:
'ABS335'.replace(/\d\d\d/, '999') // ABS999
Explanation:
335 matches the pattern /\d\d\d/; therefore, the matching portion of 'ABS335', i.e. 335, is replaced with the second parameter: 999. Consequently, the returned result is ABS999.
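The three notes above can be sketched as follows. The input string is made up for the illustration.

```javascript
const original = 'a-b-c'; // hypothetical input

// Note 1: replace() returns a new string; the original is left untouched.
// Note 2: with a string (or a regex without g), only the first match is replaced.
const replacedOnce = original.replace('-', '+');
console.log(original);     // a-b-c
console.log(replacedOnce); // a+b-c

// Note 3: a regex with the g flag, or replaceAll(), replaces every match.
const replacedWithG = original.replace(/-/g, '+');
const replacedWithAll = original.replaceAll('-', '+');
console.log(replacedWithG);   // a+b+c
console.log(replacedWithAll); // a+b+c
```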
Let's discuss other scenarios:
String.replace(Regular Expression - with group(s) set, String - with group(s) utilized)
Example 1:
$n: Inserts the nth (1-indexed) capturing group
const replacedStr1 = 'Hello World'.replace(/(Hello) (World)/, '$2 $1');
console.log(replacedStr1); // Outputs: "World Hello"
You can use the $n syntax within your replacement string. It lets you reuse and reorder the portions of the string captured by each group, which yields a more custom replacement.
Example 2:
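$<name>: Inserts the named capturing group called name
const str6 = 'John Doe';
const replacedStr6 = str6.replace(/(?<first>\w+)\s(?<last>\w+)/, '$<last>, $<first>'); // the group names first and last are illustrative
console.log(replacedStr6); // Outputs: "Doe, John"
This is the same concept as previously, except that with this syntax you can refer to a captured group by its pre-assigned name.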
String.replace(Regular Expression, String - with particular syntactical elements)
Example 1:
$$: Inserts a "$"
const str1 = 'Price is 100.';
const replacedStr1 = str1.replace(/100/, '$$100');
console.log(replacedStr1); // Outputs: "Price is $100."
Example 2:
$&: Inserts the matched substring
const str2 = 'Hello';
const replacedStr2 = str2.replace(/ell/, '($&)');
console.log(replacedStr2); // Outputs: "H(ell)o"
Example 3:
$`: Inserts the portion of the string that precedes the matched substring
const str3 = 'abcXYZdef';
const replacedStr3 = str3.replace(/XYZ/, '$`');
console.log(replacedStr3); // Outputs: "abcabcdef"
Example 4:
$': Inserts the portion of the string that follows the matched substring
const str4 = 'abcXYZdef';
const replacedStr4 = str4.replace(/XYZ/, "$'");
console.log(replacedStr4); // Outputs: "abcdefdef"
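The heading of this section also lists a function as a possible second parameter. As a hedged sketch of that form (the strings and variable names are assumptions): the function is called for each match, receives the matched text followed by the captured groups, and its return value is used as the replacement.

```javascript
// Double every number found in the string
const doubled = 'Price is 100 and tax is 8'.replace(
  /\d+/g,
  (match) => String(Number(match) * 2)
);
console.log(doubled); // Price is 200 and tax is 16

// Captured groups arrive as extra arguments after the full match
const swapped = 'Hello World'.replace(
  /(Hello) (World)/,
  (match, first, second) => `${second} ${first}`
);
console.log(swapped); // World Hello
```

A function is the more flexible choice when the replacement has to be computed, which the $n and $& tokens cannot do on their own.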
Javascript Regex 101: Part 2:
In this section we will present and explain the different optional flags you can set on regular expressions.
Regular expressions Options
/RegExp/i:
The i flag makes matching case-insensitive: uppercase and lowercase letters are treated as equal when matching against the regular expression.
Example:
const str1 = 'Hello World';
const regex1 = /hello/i;
console.log(str1.match(regex1)); // Outputs: [ 'Hello', index: 0, input: 'Hello World', groups: undefined ]
/RegExp/g:
The g flag makes the match operation global: instead of stopping at the first match, methods such as match() and replace() process every instance of the string that matches the regular expression.
Note: the default behaviour is to return only the first instance of a match.
Example:
const str2 = 'apple orange apple banana';
const regex2 = /apple/g;
console.log(str2.match(regex2)); // Outputs: ["apple", "apple"]
/RegExp/y:
The y (sticky) flag makes the regular expression match only at the exact position given by its lastIndex property. It does not search further along the string, so a match is returned only if it starts precisely at lastIndex.
Example:
const str3 = 'apple orange apple banana';
const regex3 = /orange/y;
regex3.lastIndex = 6; // Setting lastIndex to the index where 'orange' starts
console.log(regex3.exec(str3)); // Outputs: ['orange', index: 6, ...]
regex3.lastIndex = 0; // Placing lastIndex at 0, which is not the index where 'orange' is positioned
console.log(regex3.exec(str3)); // Outputs: null
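One place where the sticky behaviour is handy is reading tokens one after another, since each successful exec() moves lastIndex to the end of the match. The following is a minimal sketch under that assumption; readTokens is a hypothetical helper.

```javascript
// Hypothetical helper: collects consecutive words, each one required to start at lastIndex.
function readTokens(text) {
  const token = /\s*(\w+)/y; // optional whitespace, then one word
  const tokens = [];
  let match;
  // exec() returns null (and resets lastIndex) as soon as no token starts at lastIndex
  while ((match = token.exec(text)) !== null) {
    tokens.push(match[1]);
  }
  return tokens;
}

console.log(readTokens('apple orange banana')); // [ 'apple', 'orange', 'banana' ]
console.log(readTokens('apple, orange'));       // [ 'apple' ] (stops at the comma)
```

With the g flag instead of y, the second call would skip over the comma and also return 'orange'; the sticky flag refuses to skip.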
/RegExp/u:
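u stands for unicode mode. When enabled, the regular expression engine treats the pattern and the string as sequences of Unicode code points instead of 16-bit code units, so characters outside the Basic Multilingual Plane, such as emoji, count as a single character.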
Example:
const regex = /^.$/; // regex specifying exactly one character
// ^: start of input
// $: end of input
console.log(regex.test('\uD83D\uDE80')); // Outputs: false
console.log(regex.test('🚀')); // Outputs: false
const regexU = /^.$/u;
console.log(regexU.test('\uD83D\uDE80')); // Outputs: true
console.log(regexU.test('🚀')); // Outputs: true
Explanation:
'\uD83D\uDE80' represents a single astral symbol: 🚀. It is stored as two 16-bit code units (a surrogate pair), so without the u flag the dot matches only half of it, and the test returns false even though the symbol looks like one character.
When the u flag is set, the dot matches the whole code point, so the symbol matches the regex that specifies one character.
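The difference between code units and code points can be observed directly. This is only an illustrative sketch; the variable names are assumptions.

```javascript
const rocket = '\uD83D\uDE80'; // the rocket symbol written as its surrogate pair

const codeUnits = rocket.length;       // counts 16-bit code units
const codePoints = [...rocket].length; // string iteration walks code points
console.log(codeUnits, codePoints); // 2 1

// Without u, the dot matches each code unit separately; with u, it matches the whole symbol
const dotMatches = rocket.match(/./g).length;
const dotMatchesUnicode = rocket.match(/./gu).length;
console.log(dotMatches, dotMatchesUnicode); // 2 1
```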
/RegExp/d:
This flag enables the indices to be returned when making regex operations. A successful match is required for indices to be returned; otherwise, null will be returned.
As mentioned previously, the indices are the index positions relative to the string of chars being tested.
The indices represent where the matches start and end.
Example:
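const arrayRegExec = /maestro\sis\s(?<the_name>the)\sboss/d.exec(`maestro is the boss`)
console.log(arrayRegExec.indices) // [0, 19] for the full match and [11, 14] for the group, i.e. 0,19,11,14
Explanation:
0: start index of the full match ('m' in maestro is the boss)
19: end index of the full match, one position past the final 's' in maestro is the boss
Then,
11: start index of 't' in the (inner group)
14: end index of the inner group, one position past the 'e' in the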
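Because each pair of indices is a start and an exclusive end, it can be passed straight to slice(). A brief sketch, assuming a runtime that supports the d flag; the variable names are illustrative and the group is left unnamed for brevity.

```javascript
const bossSentence = 'maestro is the boss';
const found = /maestro\sis\s(the)\sboss/d.exec(bossSentence);

// indices[0] is the full match, indices[1] is the first group
const [groupStart, groupEnd] = found.indices[1];
console.log(groupStart, groupEnd); // 11 14

// The end index is exclusive, which is exactly what slice() expects
const groupText = bossSentence.slice(groupStart, groupEnd);
console.log(groupText); // the

// Example use: wrap the captured group in brackets
const marked =
  bossSentence.slice(0, groupStart) + '[' + groupText + ']' + bossSentence.slice(groupEnd);
console.log(marked); // maestro is [the] boss
```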
Defining Regular Expression Structures and Patterns
I would qualify these regular expression elements as structure-based:
/[abc]/: Any character from a set of characters:
Example:
let matched = "zzzawwww".match(/[abc]/);
console.log(matched)
// Outputs: [ 'a', index: 3, input: 'zzzawwww', groups: undefined ]
/[0-3]/: Any character in a range of characters:
/[0-3]/ This regular expression matches any single character in the range from 0 to 3.
Example:
let matched2 = "zzz123wwww".match(/[0-3]/);
console.log(matched2)
// Outputs [ '1', index: 3, input: 'zzz123wwww', groups: undefined ]
/[^abc]/: Any character not in a set of characters:
/[^abc]/ This regular expression matches any single character other than 'a', 'b', and 'c'. A range can be negated the same way, as in the example below.
Example:
let matched3 = "aaavvvvvxxxjjjjj".match(/[^a-w]/);
console.log(matched3)
// Output: [ 'x', index: 8, input: 'aaavvvvvxxxjjjjj', groups: undefined ]
Explanation: Matches 'x', as it is not within the set from 'a' to 'w'
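Sets, ranges, and negated sets combine naturally with the g flag. The following sketch uses made-up strings to show each form once.

```javascript
// Set: collect every vowel
const vowels = 'regular expression'.match(/[aeiou]/g);
console.log(vowels.length); // 7

// Range: collect every character between 'a' and 'c'
const firstLetters = 'abcdef'.match(/[a-c]/g);
console.log(firstLetters); // [ 'a', 'b', 'c' ]

// Negated range: strip everything that is not a digit
const digitsOnly = 'a1b22c'.replace(/[^0-9]/g, '');
console.log(digitsOnly); // 122
```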
/\b/: A word boundary:
Example:
let matched4 = "apple lalaPIE is good PIE hello".match(/\bPIE/);
console.log(matched4);
// Outputs: ['PIE', index: 22, input: 'apple lalaPIE is good PIE hello', groups: undefined ]
Explanation: Matches the second PIE, the standalone word at index 22. The PIE inside 'lalaPIE' is skipped because it is preceded by a word character, so there is no word boundary in front of it. The \b itself consumes no characters; it matches the position between the space and the 'P'.
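Placing \b on both sides restricts a pattern to whole words. A small sketch on the same sentence; the replacement word is an assumption for illustration.

```javascript
const phrase = 'apple lalaPIE is good PIE hello';

// Without boundaries, both occurrences are found
const anywhere = phrase.match(/PIE/g).length;
// With boundaries, only the standalone word is found
const wholeWord = phrase.match(/\bPIE\b/g).length;
console.log(anywhere, wholeWord); // 2 1

// Replace only the standalone word
const replacedWord = phrase.replace(/\bPIE\b/g, 'CAKE');
console.log(replacedWord); // apple lalaPIE is good CAKE hello
```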
/^/: Start of input
/^/ This regular expression matches the start of an input. It's useful when ensuring a pattern appears right at the beginning of a string.
Example:
let matched5 = "apple foobar".match(/^apple/);
console.log(matched5);
// Outputs: [ 'apple', index: 0, input: 'apple foobar', groups: undefined ]
let matched6 = "lala apple foobar".match(/^apple/);
console.log(matched6); // null
/$/: End of input:
/$/ This regular expression matches the end of an input. It's often used to ensure a pattern appears right at the end of a string.
Example:
let matched7 = "blabla apple foobar".match(/foobar$/);
console.log(matched7);
// Outputs: ['foobar', index: 13, input: 'blabla apple foobar', groups: undefined]
let matched8 = "blabla apple".match(/foobar$/);
console.log(matched8);
// Outputs: null
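Using ^ and $ together requires the whole string to fit the pattern, which is a common way to validate input. A minimal sketch with an illustrative pattern:

```javascript
// Hypothetical validator: the entire string must be exactly three digits
const exactlyThreeDigits = /^\d\d\d$/;

console.log(exactlyThreeDigits.test('335'));    // true
console.log(exactlyThreeDigits.test('ABS335')); // false, extra characters before the digits
console.log(exactlyThreeDigits.test('3355'));   // false, one digit too many

// Without the anchors, finding three digits anywhere is enough
console.log(/\d\d\d/.test('ABS335')); // true
```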
/(abc)/: A group:
/(abc)/ This regular expression groups multiple characters as a single unit. It can be helpful when you want to capture portions of your match.
Example:
let matched9 = "defabcghi".match(/ef(abc)/);
console.log(matched9);
// Outputs: [ 'efabc', 'abc', index: 1, input: 'defabcghi', groups: undefined ]
Explanation: First, it matches the whole regular expression efabc. Then, as the second returned element in the array, it corresponds to what matches the group. The second element is abc.
/a|b|c/: Any one of several patterns:
/a|b|c/ This regular expression matches any one of the patterns separated by |. In this example, it will match either 'a', 'b', or 'c'.
Example:
let matched10 = "def XaX XcX".match(/a|b|c/);
console.log(matched10);
// Outputs: [ 'a', index: 5, input: 'def XaX XcX', groups: undefined ]
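The | operator splits the entire pattern unless it is wrapped in a group, which matters as soon as anchors are involved. A hedged sketch with made-up words:

```javascript
// Without a group, this reads as "starts with cat" OR "ends with dog"
const loose = /^cat|dog$/;
// With a group, the anchors apply to both alternatives
const strict = /^(cat|dog)$/;

console.log(loose.test('catalog'));  // true, because the string starts with 'cat'
console.log(strict.test('catalog')); // false
console.log(strict.test('dog'));     // true
```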
I would qualify these regular expression elements as pattern-based:
/x+/: One or more occurrences of a pattern:
/x+/ This regular expression matches one or more consecutive occurrences of the character 'x'.
Example:
let matched9 = "yyxxxyyy".match(/x+/);
console.log(matched9);
// Outputs: [ 'xxx', index: 2, input: 'yyxxxyyy', groups: undefined ]
// Matches the three consecutive 'x' characters
/x+?/: One or more occurrences, non-greedy:
/x+?/ This regular expression matches one or more consecutive occurrences of the character 'x', but in a non-greedy manner, meaning it will match the smallest number of 'x' characters possible.
Example:
In certain settings, the regular expression runs past the point where the match should have ended, because a quantifier consumes as many characters as it can. We qualify that regex as being 'greedy.' To avoid the regex being 'greedy,' we can add a specific syntax.
const s1 = 'type="submit" class="btn"'
const greedyPattern = /".+"/g;
const result1 = s1.match(greedyPattern)
console.log(result1);
// Outputs: [ '"submit" class="btn"' ]
Explanation: Even though a closing " was present in the middle, the greedy .+ treated it as any other character and kept going until the last " in the string.
const s2 = 'type="submit" class="btn"'
const nonGreedyPattern = /".+?"/g;
const result2 = s2.match(nonGreedyPattern)
console.log(result2);
// Outputs: [ '"submit"', '"btn"' ]
Explanation: Adding the extra '?' after the + makes the quantifier stop at the first closing " it meets. Setting the g flag ensures we retrieve both HTML attributes' values.
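The same contrast can be seen on the simpler /x+/ and /x+?/ patterns from the headings above. This is an illustrative sketch; note that a non-greedy quantifier still extends when the rest of the pattern requires it.

```javascript
const greedyX = 'yyxxxyyy'.match(/x+/)[0];
const lazyX = 'yyxxxyyy'.match(/x+?/)[0];
console.log(greedyX); // xxx
console.log(lazyX);   // x

// The lazy quantifier grows one character at a time until the following 'y' can match
const lazyThenY = 'yyxxxyyy'.match(/x+?y/)[0];
console.log(lazyThenY); // xxxy
```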
/x*/: Zero or more occurrences:
/x*/ This regular expression matches zero or more consecutive occurrences of the character 'x'.
Example:
const matched2 = '[w] ha[www] to [] foo bar [hello] bla bla [ewr].'.match(/\[w*\]/g);
console.log(matched2);
// Outputs: [ '[w]', '[www]', '[]' ]
Explanation: We can see with this example that the * operator enables one to match an instance where brackets are empty and where brackets contain any number of w's exclusively.
/x?/: Zero or one occurrences:
/x?/ This regular expression matches zero or one character 'x' occurrence.
Example:
const matched3 = '[] ha[w] to [ww] foo bar [hello] bla bla [ttwwtt].'.match(/\[w?\]/g);
console.log(matched3);
// Outputs: [ '[]', '[w]' ]
Explanation: With the ? operator, a match occurs when the brackets are either empty or contain exactly one 'w' and nothing else.
/x{2,4}/: Two to four occurrences:
/x{2,4}/ This regular expression matches two to four consecutive occurrences of the character 'x'.
Example:
const matched4 = '[] ha[ww] to [www] foo bar [wwww] bla bla [TTwwwTT] [w].'.match(/\[w{2,4}\]/g);
console.log(matched4);
// Outputs: [ '[ww]', '[www]', '[wwww]' ]
Explanation: A match occurs when the brackets contain between two and four 'w' characters and nothing else.
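Counted quantifiers are useful for fixed-width fields. As a sketch only, here is one way to read a date string in the "1-30-2003" format using {1,2} and the exact-count form {4}; parseDate and datePattern are hypothetical names, and the pattern checks the shape of the string, not whether the day and month are in a valid range.

```javascript
// Assumed format: month-day-year, with one or two digits for month and day
const datePattern = /^(\d{1,2})-(\d{1,2})-(\d{4})$/;

// Hypothetical helper: returns a Date, or null when the string has the wrong shape
function parseDate(text) {
  const match = datePattern.exec(text);
  if (match === null) {
    return null;
  }
  const [, month, day, year] = match;
  // Date months are zero-based, hence the - 1
  return new Date(Number(year), Number(month) - 1, Number(day));
}

const parsed = parseDate('1-30-2003');
console.log(parsed.getFullYear(), parsed.getMonth() + 1, parsed.getDate()); // 2003 1 30
console.log(parseDate('2003/01/30')); // null
```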
Most used Regular expressions for the WEB:
If you want to use the regular expressions I used and developed throughout the years, drop your email below and download the PDF.
The PDF contains the regular expressions I developed and enables you to parse the following elements:
- File name extraction (part before the .extension) ext_filename.js
- Last directory in path extraction last_directory.js
- Decompose an email to extract username, provider and extension emailvalidation.js
- Extension (.extension) extraction from the file name and URL getExtension.js
- Pull the last file from a full path getFileFromPath.js
- Extract values from a SQL INSERT statement getValsSQL.js
- Extract innerHTML for a specific HTML tag htmlH3s_innerHTML.js
- Decompose a full URL to its protocol, full domain name, and full path parseHTTP.js
- Break down a full path to an array of directory names pathParser.js
- Regular expression to identify unwanted characters in input fields, the kind commonly used by attackers sanitizehtml.js
- Function to test for username validity usernameValid.js
Also, the PDF contains ways to edit elements.
- Transform an H1 to a file-friendly format h1_to_file.js
- Transform the date string in the format: "1-30-2003" to Date instance dateFormat.js
Register for interest to get a copy for only 40 USD!
Conclusion
Mastering JavaScript regex patterns unlocks powerful string manipulation capabilities, enriching your coding toolbox.
In "JavaScript Regex 101", we delved into the foundational aspects of regular expressions, ensuring you can confidently craft and interpret regex patterns. As with any skill, practice solidifies understanding.
Thus, continually challenge yourself with varied string patterns.
With time, regex won't just be a series of cryptic characters but a clear, logical means to sift through and process data.