Regular expressions

Regular expressions are a programming term that refers to a sequence of characters used to define a search criterion within a given text. Recently, Excel has gained several functions for working with regular expressions, and we can also use them as an argument to the XMATCH function and the XLOOKUP function as an additional search criterion.

A regular expression consists of a series of characters that are meant to describe what we want to find. When you want to find a character in a range, you need to display it in square brackets; For example, if you specify the criterion “[abc]”, it represents the character “a”, “b” or “c”. Specifying the criterion “[a-z]” searches for any lowercase letter in the given range, and the criterion “[^abc]” allows any letter except “a”, “b”, or “c”. In a similar way, a rule is set for searching for digits. For example, “[0-9]” searches for any digit within a given search scope. It is also possible to specify “\d” for numbers, “\w” for letters, and “\s” for spaces.

A vertical line is used as a logical “or”. For example, if you search for the term “Jovan” or “Jovana” you should use the regular term “Jovan|Jovana”. To test this in practice, let’s use the REGTEST function, whose syntax is:

REGEXTEST(<test>,<regular expression>,[<sensitivity>])

The first argument is the cell or range where the search is performed. The second argument is the regular expression that we use for the search, and the last, optional argument, is used to determine whether the search distinguishes between lowercase and uppercase letters. If we write the formula:

=REGEXTEST(A2:A9,” Jovan|Jovana”)

The REGTEXT function will return TRUE if the search matches the criterion, and FALSE if it doesn’t. For a better view, a conditional formatting rule has been added to emphasize the exact values.

If we use the regular expression “Jovan(a|o)”, it will be used to find the words “Jovana” and “Jovano”. Wild signs can also be used in the search. A dot (.) is a symbol that replaces a single character. The question mark “?” indicates exactly one, and the “*” indicates multiple characters. If we use the regular expression “J?van?” it means that we are looking for all names where the second and last character can be any, and with the expression “Jov*” we are looking for all words that begin with the sequence “Jov”.

The plus “+” is used to give a regular expression in which we have multiple repetitions of a character. To specify in detail the number of times a character repeats within an expression, parameters in curly brackets are used. For example, {n} represents exactly one repetition of a given character, {min,} represents the minimum, {,max} represents the maximum, and {min,max} represents the range of minimum and maximum repetition of a character. For example, the sample a{2} searches for two repetitions of the character “a” within the given text.

The REGEXTRACT function is used to extract some text from a given search scope with the help of a regular expression. Its syntax is:

REGEXEXTRACT(<test>,<regular expression>,[<return mode>],[<sensitivity>])

In cell A1, in the new worksheet, the text from the beginning has been added. If we enter the formula:

=REGEXEXTRACT(A1,”izraz*”)

Excel will extract all occurrences of the term that starts with “izraz”. With the third argument, instead of the first term found, we can return a dynamic list with all the terms. The last argument is used to determine whether the search is  case sensitive.

If you want to replace the text based on a given regular expression, you can use the REGEXREPLACE function. Its syntax is:

REGEXREPLACE(<test>,<regular expression>,<substitution>,[<repetition>],[<sensitivity>])

The first argument is the cell or range that is being searched. The second argument is the text that is used for the replacement. Next, we specify the regular expression, whether we want to replace only the first or all occurrences, as well as whether the search takes into account case. For example, if we write the formula:

=REGEXREPLACE(A1,” izraz*”;”nizov”)

The word “izrazi” will be replaced by the word “nizovi”.

Regular expressions can also be used within the XMATCH or XLOOKUP function. Imagine that we are looking for a plasma biscuit, and we are not sure if it is listed as “Plazma” or “Plazme” in the product name. If we are looking for the position of the first such product, let’s write the formula:

=XMATCH(“PLAZM(A|E)”; B4:B13; 3; 1)

And if we also want to return some additional information about this product, we will write the formula:

=XLOOKUP(“PLAZM(A|E)”; B4:B13; A4:E13;””; 3; 1)

A novelty compared to traditional XMATCH and XLOOKUP searches is that, when specifying an argument for a search mode, you can select the value 3 which indicates that you are performing a search by a regular expression.