
How to Use Regular Expressions TODAY in Your Windows PowerShell Code

Do you need to learn all about regular expressions before using them with PowerShell? Nope. Timothy Warner, author of Sams Teach Yourself Windows PowerShell 5 in 24 Hours, doesn't waste time with boring backstory. Learn how to combine regex with your PowerShell code to jump right into performing search-and-replace operations, validation, and more.

If you're a Windows systems administrator (and decidedly not a programmer), I would hazard a guess that your PowerShell adoption thus far has been a bit...slow. Am I correct?

Let me speed things up for you. I'll teach you in this article how to use regular expressions (regex for short, typically pronounced REJ-ex) in your PowerShell code to parse string data with laser-like efficiency.

Suppose you're tasked with one or more of the following real-world scenarios:

  • Finding personally identifiable information in a folder containing hundreds of files
  • Finding and replacing globally unique identifiers (GUIDs) in hundreds of server log files
  • Validating date formats and password strength in your company's intranet portal

The aforementioned tasks are trivial for .NET programmers: "I'll just use regex!" they say. However, if you're getting into PowerShell automation slowly, your blood might run cold at the thought of performing complicated pattern matches.

Don't stress! By the end of this article, you'll understand what regex actually does, and you'll learn how to implement regex patterns in PowerShell by using the -match operator, the -replace operator, and the Select-String cmdlet. Let's begin.

Regular Expression Basics

In a nutshell, regular expressions represent a rule set for performing pattern matching on string data. You're probably familiar with using the old MS-DOS wildcard characters. For instance, we can run the following command at the prompt to find files in the current folder whose names contain the word report and whose extension is xl plus one more character (such as .xls):

C:\>dir *report*.xl?

In this example, the asterisk (*) represents zero or more characters, and the question mark (?) substitutes for any single character.
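PowerShell's -like operator understands these same wildcard characters, so it gives a convenient way to compare the two styles side by side. The following is a minimal sketch; the file names are made up for illustration, and the regex is one reasonable translation of the wildcard pattern, not the only one.

```powershell
# Hypothetical file names, used only to compare the two matching styles
$names = 'report1.xls', 'q3-report.xlsx', 'budget.xls', 'report.doc'

# -like uses wildcards: * is any run of characters, ? is exactly one character
$wildcardHits = $names | Where-Object { $_ -like '*report*.xl?' }

# Regex translation: .* stands in for *, a lone . stands in for ?,
# \. is a literal dot, and ^...$ make the pattern cover the whole name
$regexHits = $names | Where-Object { $_ -match '^.*report.*\.xl.$' }

$wildcardHits   # report1.xls
$regexHits      # report1.xls
```

Neither pattern picks up q3-report.xlsx, because both ? and . stand for exactly one character.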

Open an administrative PowerShell console, and let's dive right in. We can use the -match operator to perform true/false tests against incoming string data. Doing so gives you valuable practice with both regex and PowerShell syntax.

The following tests should both evaluate to True. Can you see why?

'project14' -match 'pro'
'project14' -match '14'

Your first regular expressions lesson is that you can perform literal matches. The subject string project14 contains both pro and 14, so both expressions evaluate to True. Of course, this question arises: Does the match value include just the matching characters, or the entire string?

Windows PowerShell populates the $Matches automatic variable, a hash table, with the result of the most recent successful -match operation. Run the previous tests again, this time adding $Matches after each. In the following code, I'm using the PowerShell command separator, the semicolon (;), to keep the example compact:

PS C:\> 'project14' -match 'pro' ; $Matches
True

Name                           Value
----                           -----
0                              pro

PS C:\> 'project14' -match '14' ; $Matches
True

Name                           Value
----                           -----
0                              14
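Because $Matches is refreshed only when a match against a single string succeeds, reading it after a failed test can show a stale result from an earlier match. A common habit is therefore to read it only inside an if block that performs the test. Here is a small sketch of that habit; $subject is just an illustrative variable name.

```powershell
$subject = 'project14'

# Read $Matches only when the -match test has just succeeded
if ($subject -match '14') {
    $Matches[0]   # 14
}

# $Matches is a hash table; key 0 holds the text matched by the whole pattern
$Matches.GetType().Name   # Hashtable
```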

Now let's say we have a bunch of files whose names start with the word project. Do you think the following expression will result in True or False?

'project14' -match 'project*'

If you tried the previous example, you know we get True here, but probably not for the reason you expected. Your second regular expression lesson is that quantifier metacharacters such as the asterisk operate only on the preceding character, so 'project*' translates to "projec followed by zero or more occurrences of t," not "project followed by anything." The test succeeds simply because project14 contains projec. Yes, that's right. With regex, you need to construct your match patterns one character at a time.
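The following lines make the "zero or more of the preceding character" rule concrete. They also introduce the ^ anchor (start of string), which is what you actually want when the goal is "names that start with project." The sample strings are made up for illustration.

```powershell
# * repeats only the t before it: "projec" followed by zero or more t characters
'projec'    -match 'project*'   # True (zero t characters)
'projecttt' -match 'project*'   # True (three t characters)

# To require that the string begin with the literal word project, anchor it with ^
'project14'    -match '^project'   # True
'my-project14' -match '^project'   # False
```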

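While the asterisk matches zero or more occurrences of the preceding character, the question mark does not behave like the MS-DOS question mark wildcard. In regex, ? means "zero or one occurrence of the preceding character," so 'project1?' is satisfied by project, project1, and even project2. The regex counterpart of the MS-DOS ? is the dot (.), which matches any single character. To match project10 through project19, follow project1 with the \d character class (covered next), which matches a single digit:

'project14' -match 'project1\d'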
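These tests show why 'project1?' is too loose, and how anchoring with ^ and $ turns project1\d into a check for exactly project10 through project19. The test strings are illustrative.

```powershell
# ? makes only the preceding 1 optional, so 'project1?' is satisfied by plain "project"
'project'  -match 'project1?'   # True
'project2' -match 'project1?'   # True (the "project" part is enough)

# project1 followed by exactly one digit; ^ and $ make it cover the whole string
'project14'  -match '^project1\d$'   # True
'project2'   -match '^project1\d$'   # False
'project100' -match '^project1\d$'   # False
```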

A metacharacter in regex is a character (or character combination) that's processed by the regex engine in a non-literal way.

Let's check out another metacharacter:

'8675309' -match '\d'

The \d metacharacter is a character class shorthand: it matches exactly one digit character (0 through 9). Run the test and inspect $Matches; you'll see only 8, the first digit in the string. You can use quantifiers to match a specific number of occurrences. Take a look:

'8675309' -match '\d{7}'

The $matches variable should show you the entire subject string (8675309) instead of only the number 8, because the {7} denotes seven repetitions of the digit match. The following table shows other examples of using the \d character class with the { } quantifier.

Example        Interpretation
-------        --------------
'\d{1,3}'      Match a digit between one and three times
'\d{5,}'       Match a digit five or more times
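To see what each quantifier actually grabs, the sketch below loops over the patterns from this section and records $Matches[0] for each. Quantifiers are greedy, so {1,3} takes three digits when three are available. The last two lines show that an unanchored quantifier only asks whether the digits appear somewhere; add ^ and $ when you need an exact length. The variable names are illustrative.

```powershell
$phone   = '8675309'
$results = [ordered]@{}

# Record what each pattern actually matched (quantifiers are greedy by default)
foreach ($pattern in '\d', '\d{7}', '\d{1,3}', '\d{5,}') {
    if ($phone -match $pattern) { $results[$pattern] = $Matches[0] }
}
$results
# \d       -> 8        (one digit)
# \d{7}    -> 8675309
# \d{1,3}  -> 867      (greedy: takes the maximum of three)
# \d{5,}   -> 8675309  (five or more, as many as are available)

# Unanchored, a quantifier only asks "does this appear somewhere?"
$found = '12345678' -match '\d{7}'     # True: seven digits sit inside eight
# Anchored with ^ and $, the whole string must be exactly seven digits
$exact = '12345678' -match '^\d{7}$'   # False
```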

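Regex has many character classes and related shortcuts, but I can't explain them all here. Instead, the following table gives you a "punchlist" of my favorites.

Construct      Action
---------      ------
\w             Matches one word character (a letter, digit, or underscore); \w+ matches a whole word
\b             Matches a word boundary (a position between a word character and a non-word character, not a character itself)
\s             Matches one whitespace character (such as a space, tab, or newline)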
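The sketch below exercises all three constructs on a made-up log line. It uses [regex]::Matches, a .NET static method that returns every match rather than just the first one; note that, unlike -match, it is case-sensitive by default.

```powershell
$text = 'Saturday backup job_42 ran at 2:15'

# \w matches ONE word character (letter, digit, or underscore); \w+ grabs a whole run
$words = ([regex]::Matches($text, '\w+')).Value
$words -join '|'   # Saturday|backup|job_42|ran|at|2|15

# \b matches a position between a word character and a non-word character,
# so \bat\b finds the standalone word "at" but not the "at" inside "Saturday"
$plainAt    = [regex]::Matches($text, 'at').Count       # 2
$wordOnlyAt = [regex]::Matches($text, '\bat\b').Count   # 1

# \s matches ONE whitespace character; -split '\s+' breaks on runs of whitespace
$tokens = $text -split '\s+'
$tokens.Count   # 6
```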

One more regex concept before we do some "real world" examples: Put character sets and ranges in square brackets ([ ]). For example, [a-z] matches any single letter from a through z. The following expression should evaluate to True (be sure to inspect $matches as well):

'admin@company.com' -match '[a-z]+'

The match should be admin in this case. Yes, I sneaked in another metacharacter; in regex syntax, the plus (+) quantifier matches one or more instances of the preceding character (here, the preceding character set). This is unlike the asterisk, which you'll recall matches zero or more instances of the preceding character. Pairing a range with + is awesome in regex, because the text you want to capture might vary in length.
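One detail worth knowing: -match is case-insensitive, so [a-z] also accepts uppercase letters. PowerShell's -cmatch operator is the case-sensitive variant. The capitalized address below is a made-up variant of the one above.

```powershell
$email = 'Admin@company.com'

# -match is case-insensitive, so [a-z] also accepts the uppercase A
if ($email -match '[a-z]+')  { $insensitive = $Matches[0] }   # Admin

# -cmatch is the case-sensitive variant, so the match starts after the A
if ($email -cmatch '[a-z]+') { $sensitive = $Matches[0] }     # dmin

$insensitive, $sensitive
```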

Using the -match Operator in the Real World

Let's say we need to parse a list of universal naming convention (UNC) paths in a text file named C:\input\servers.txt:

\\dc1\logs
\\mem2\documents
\\23ressvr\share1
\\sharepoint.company.pri\doclibe
\\server234\dfs1
\\server532\dfs2
\\server99\dfs5

We need to find out (a) whether server532 exists in the file; and, if so, (b) the name(s) of any shared folder(s) hosted by that server. How can we do this? Well, the first thing we need to do is grab all the servers.txt content and import the data into our PowerShell run space:

Get-Content -Path 'C:\input\servers.txt'

That's not enough, though. We need to filter that file content by using the Where-Object cmdlet, the -match operator, and a regular expression:

Get-Content -Path 'C:\input\servers.txt' | Where-Object { $_ -match '\\\\server532' }

You probably know that the $_ token is shorthand notation for the current object in the PowerShell pipeline. But doubtless you're wondering what \\\\ means. Get ready for regular expression lesson three: We need to escape certain characters to prevent the .NET regex engine from treating them as metacharacters.

The UNC example is particularly confusing because the backslash (\) is itself the regex escape character. To match one literal backslash, we write \\, so the two backslashes that begin every UNC path become \\\\.
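Two follow-ups to the server532 search. First, the .NET method [regex]::Escape can insert the backslash escapes for you. Second, part (b) of the task, naming the shares on server532, can be answered with a named capture group. This sketch uses a few sample lines from servers.txt so it runs without the file (replace $lines with Get-Content -Path 'C:\input\servers.txt' to use the real data); the group name share is an arbitrary choice.

```powershell
# A few sample lines from servers.txt, so the sketch runs without the file
$lines = '\\dc1\logs', '\\server532\dfs2', '\\server99\dfs5'

# [regex]::Escape doubles every backslash: \\server532 becomes \\\\server532
$prefix = [regex]::Escape('\\server532')

# ^ anchors at the start, \\ is the literal backslash before the share,
# and the named group (?<share>...) captures the rest of the line
$pattern = '^' + $prefix + '\\(?<share>.+)$'

$shares = foreach ($line in $lines) {
    if ($line -match $pattern) { $Matches['share'] }
}
$shares   # dfs2
```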

Let's try another example. This time, we want to match \\sharepoint.company.pri from servers.txt:

Get-Content -Path 'C:\input\servers.txt' | Where-Object { $_ -match '\\\\\w+\.\w+\.\w+' }

Whoa, Nelly! Now we're truly getting into the thick of things. Notice that I used the shorthand \w+ construction to match one or more word characters. I also escaped the two periods in the hostname sharepoint.company.pri, because an unescaped dot is a metacharacter that matches any single character; \. matches only a literal period. Cool, eh?
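To see why the escaped dots matter, run both versions of the pattern against all seven lines from servers.txt (inlined here so the sketch is self-contained). With unescaped dots, the "dots" happily match letters, digits, and even the backslash before the share name, so every line passes.

```powershell
$lines = '\\dc1\logs', '\\mem2\documents', '\\23ressvr\share1',
         '\\sharepoint.company.pri\doclibe', '\\server234\dfs1',
         '\\server532\dfs2', '\\server99\dfs5'

# Escaped dots: only the dotted hostname qualifies
$escaped = $lines | Where-Object { $_ -match '\\\\\w+\.\w+\.\w+' }

# Unescaped dots match ANY character, so the filter stops filtering
$unescaped = $lines | Where-Object { $_ -match '\\\\\w+.\w+.\w+' }

@($escaped).Count     # 1
@($unescaped).Count   # 7
```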

Introducing Select-String

For jobs when you need to dip into one or more files and find matches, Select-String is what you need; when you also need to make replacements, pair it (or Get-Content) with the -replace operator. Consider the following sample file named C:\input\customers.csv:

FirstName,LastName,SSN,Birthdate
Carey,Landry,123-45-6789,5/22/1981
Kayla,Duquette,344-55-5677,4/2/1970
Mike,Connor,543-21-9876,11/29/1955
Wendy,Robbins,987-32-4244,10/4/1968

First of all, the names and metadata in this example are entirely fictional. Second, notice that we have a comma and no intervening space separating each column entry (this file contains comma-separated values, after all).

Now imagine that instead of four records this database file has several thousand records. We're tasked with identifying every U.S. Social Security number (SSN) in the file. As you may know, the SSN has the following general format:

111-22-3333

We're keeping things extra-simple here; in the real world, you'll want to employ a regular expression that matches only valid SSNs. For instance, real SSNs don't start with 000 or 666.
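As a starting point for a stricter pattern, negative lookaheads ((?!...)) can reject specific digit groups. The sketch below assumes the commonly cited structural rules (area number not 000, 666, or 900 through 999; group not 00; serial not 0000); treat it as an illustration, not an authoritative SSN validator. The candidate values are made up.

```powershell
# Assumed rules: area (first 3) is not 000, 666, or 900-999;
# group (middle 2) is not 00; serial (last 4) is not 0000
$ssnPattern = '\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b'

$candidates = '123-45-6789', '000-12-3456', '666-12-3456',
              '912-34-5678', '123-00-4567', '123-45-0000'

$valid = $candidates | Where-Object { $_ -match $ssnPattern }
$valid   # 123-45-6789
```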

I habitually use the Select-String -AllMatches switch parameter to gather all matches instead of only one match per line:

Select-String -Path 'C:\input\customers.csv' -Pattern '\d{3}\-\d{2}\-\d{4}' -AllMatches

input\customers.csv:2:Carey,Landry,123-45-6789,5/22/1981
input\customers.csv:3:Kayla,Duquette,344-55-5677,4/2/1970
input\customers.csv:4:Mike,Connor,543-21-9876,11/29/1955
input\customers.csv:5:Wendy,Robbins,987-32-4244,10/4/1968
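Each result line is a MatchInfo object, so you can pull out the pieces you need instead of the whole formatted line. This sketch pipes a few lines copied from customers.csv into Select-String (which also accepts strings from the pipeline) so it runs without the file; with the real file, use -Path as shown above.

```powershell
$csv = @(
    'FirstName,LastName,SSN,Birthdate'
    'Carey,Landry,123-45-6789,5/22/1981'
    'Kayla,Duquette,344-55-5677,4/2/1970'
)

# The header line contains no SSN, so it produces no MatchInfo object
$hits = $csv | Select-String -Pattern '\d{3}-\d{2}-\d{4}' -AllMatches

# Each MatchInfo carries the line number, the full line, and the individual matches
$hits | ForEach-Object {
    [pscustomobject]@{ LineNumber = $_.LineNumber; SSN = $_.Matches.Value }
}

# Just the matched text, one string per SSN found
$ssns = $hits | ForEach-Object { $_.Matches.Value }
```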

Notice that the result set gives us the line number where each match took place. Let's finish up by redacting each exposed Social Security number with the string pattern XXX-XX-XXXX:

Select-String -Path 'C:\input\customers.csv' -Pattern '\d{3}\-\d{2}\-\d{4}' -AllMatches |
    ForEach-Object { $_ -replace '\d{3}\-\d{2}\-\d{4}', 'XXX-XX-XXXX' }

C:\input\customers.csv:2:Carey,Landry,XXX-XX-XXXX,5/22/1981
C:\input\customers.csv:3:Kayla,Duquette,XXX-XX-XXXX,4/2/1970
C:\input\customers.csv:4:Mike,Connor,XXX-XX-XXXX,11/29/1955
C:\input\customers.csv:5:Wendy,Robbins,XXX-XX-XXXX,10/4/1968

I piped the matches to ForEach-Object (ForEach for short) to loop through the dataset, and used the -replace operator to swap each SSN for our redaction string. The results look good, but if you open the source file, you'll find the original Social Security numbers still in place. What's up?

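Well, Select-String writes MatchInfo objects to the pipeline. When -replace operates on a MatchInfo object, it works on the object's string form (path, line number, and line text) and returns a new string; nothing is ever written back to customers.csv. To change the file, we need to operate on the file's own content and save the result.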

My proposed solution is to use Get-Content to "vacuum" the customers.csv text into our runspace, perform the match/replace, and then write the final result set back to the file. Try the following:

$file = 'C:\input\customers.csv'
$content = Get-Content $file
$content | ForEach-Object {$_ -Replace '\d{3}\-\d{2}\-\d{4}', 'XXX-XX-XXXX' } | Set-Content $file

Here's what each line does:

  1. Store the .csv file path string as the variable $file.
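  2. Create a second variable named $content that holds the .csv file contents. Reading the whole file into memory first means Get-Content is finished with the file before Set-Content overwrites it.
  3. Perform the replacement using our regular expression, and write the result set back to the file with Set-Content. Done and done!

To confirm, read the file back:

Get-Content -Path 'C:\input\customers.csv'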

FirstName,LastName,SSN,Birthdate
Carey,Landry,XXX-XX-XXXX,5/22/1981
Kayla,Duquette,XXX-XX-XXXX,4/2/1970
Mike,Connor,XXX-XX-XXXX,11/29/1955
Wendy,Robbins,XXX-XX-XXXX,10/4/1968
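If you'd rather keep the original file untouched, write the redacted text to a second file instead. The sketch below uses temporary files as stand-ins (customers.csv and customers-redacted.csv in the temp folder are illustrative names). It also relies on the fact that -replace applied to an array processes every element, so no ForEach-Object is needed.

```powershell
# Temporary files stand in for C:\input\customers.csv and a redacted copy
$source = Join-Path ([System.IO.Path]::GetTempPath()) 'customers.csv'
$target = Join-Path ([System.IO.Path]::GetTempPath()) 'customers-redacted.csv'

Set-Content -Path $source -Value @(
    'FirstName,LastName,SSN,Birthdate'
    'Carey,Landry,123-45-6789,5/22/1981'
)

# The parentheses finish reading the file; -replace then runs on every line
(Get-Content -Path $source) -replace '\d{3}-\d{2}-\d{4}', 'XXX-XX-XXXX' |
    Set-Content -Path $target

Get-Content -Path $target
```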

Next Steps

Clearly, using regular expressions is an enormous subject. I'm going to leave you with a laundry list of useful resources. Enjoy!

Some online regex testers that I like:

Some neat online regex tutorials:

Some Windows regex applications that can make learning and using regex easier:
