Measuring Tool Performance

We PowerShell geeks will often get into late-night, at-the-pub arguments about which bits of PowerShell perform best under certain circumstances. You’ll hear arguments like, “The ForEach-Object cmdlet is slower because its script block has to be parsed each time” or, “Storing all those objects in a variable will make everything take longer because of how arrays are managed.” At the end of the day, if performance is important to you, this is the chapter for you.

Is Performance Important?

Well, maybe. Why is performance important to you? Look, if you’ve written a command that only has to reboot a dozen computers, there’s no point splitting hairs all night about which way is faster or slower. It won’t matter. But if you’re writing code that needs to manipulate thousands of objects, or tens of thousands or more, then even a tiny per-object cost will add up quickly. The point is, before you sweat this stuff, know that tweaking PowerShell for millisecond performance gains isn’t useful unless there are a lot of milliseconds to be saved.

Measure What’s Important

But if performance is important, then you need to measure it. Forget every possible argument for or against any given technique, and measure it. And, as you measure, make sure you’re measuring to the scale that your command will eventually run. That is, don’t test a command with five objects when the plan is to run against five hundred thousand. Pressures like memory, disk I/O, network, and CPU won’t interact in meaningful ways at a small scale, and so small-scale measurements won’t prove out as you scale up your workload.

Think of it this way: just because a one-lane road can carry 100 cars an hour doesn’t mean a four-lane road can carry 400. It’s a different situation, with different dynamics. So measure against the workload you plan to run.

You’ll perform that measurement using the Measure-Command cmdlet. Feed it your command, script, pipeline, or whatever, and it’ll run it and tell you how long it took to complete. Take this short script as an example (this is test.ps1 in the sample files):

measuring-tool-performance/test.ps1
Write-Host 'Round 1' -ForegroundColor Green
Measure-Command -Expression {
    Get-Service |
    ForEach-Object { $_.Name }
}

Write-Host 'Round 2' -ForegroundColor Yellow
Measure-Command -Expression {
    Get-Service |
    Select-Object Name
}

Write-Host 'Round 3' -ForegroundColor Cyan
Measure-Command -Expression {
    ForEach ($service in (Get-Service)) {
        $service.Name
    }
}

This script does the same thing three different ways. Let’s run it and see what happens:

Round 1
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 148
Ticks             : 1486572
TotalDays         : 1.72056944444444E-06
TotalHours        : 4.12936666666667E-05
TotalMinutes      : 0.00247762
TotalSeconds      : 0.1486572
TotalMilliseconds : 148.6572

Round 2
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 37
Ticks             : 379826
TotalDays         : 4.39613425925926E-07
TotalHours        : 1.05507222222222E-05
TotalMinutes      : 0.000633043333333333
TotalSeconds      : 0.0379826
TotalMilliseconds : 37.9826

Round 3
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 38
Ticks             : 389199
TotalDays         : 4.50461805555556E-07
TotalHours        : 1.08110833333333E-05
TotalMinutes      : 0.000648665
TotalSeconds      : 0.0389199
TotalMilliseconds : 38.9199

There’s a significant time penalty for the first method, while the other two are almost tied. Neat, right?

One thing to watch for when running Measure-Command is that a single test isn’t necessarily absolute proof. There could be any number of factors that might influence the result. Sometimes it helps to run the test several times. Jeff wrote a command called Test-Expression in the PSScriptTools module that allows you to run a test multiple times, giving you (hopefully) a more meaningful result. There’s even a GUI version.
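To illustrate the idea of repeated measurement - this is just a sketch, not Jeff’s Test-Expression - you can loop the same expression several times and average the timings. Get-Process stands in here for whatever you’re actually measuring:

```powershell
# A minimal sketch of repeated timing: run the same block ten times
# and summarize the results, rather than trusting a single run.
$runs = foreach ($i in 1..10) {
    (Measure-Command -Expression { Get-Process }).TotalMilliseconds
}
$stats = $runs | Measure-Object -Average -Minimum -Maximum
$stats | Select-Object Average, Minimum, Maximum
```

The spread between Minimum and Maximum gives you a feel for how noisy the measurement is; a wide spread means one run alone would have been misleading.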

Factors Affecting Performance

There are a bunch of things that can impact a tool’s performance.

Collections and arrays can get very slow if they get really big and you keep adding objects to them one at a time. This slowdown has to do with how .NET allocates and manages memory for these things.
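You can see that slowdown for yourself with a quick sketch: growing a plain array with += re-allocates and copies the whole array on every addition, while a generic List grows in place.

```powershell
# Growing a plain array with += copies the entire array on every add.
$slow = Measure-Command {
    $a = @()
    foreach ($i in 1..10000) { $a += $i }
}

# A generic List grows its internal storage in place and stays fast.
$fast = Measure-Command {
    $list = [System.Collections.Generic.List[int]]::new()
    foreach ($i in 1..10000) { $list.Add($i) }
}

"Array += : {0:N1} ms" -f $slow.TotalMilliseconds
"List.Add : {0:N1} ms" -f $fast.TotalMilliseconds
```

At 10,000 items the difference is already dramatic; at 100,000 the += version becomes painful, which is exactly the scale-up effect we warned about earlier.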

Anything storing a lot of data in memory can be affected if .NET has to stop and garbage-collect variables that are no longer referenced. Generally, you want to try and manage reasonable amounts of data in memory, not great huge wodges of 60GB text files.
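One way to keep memory pressure down is to stream data through the pipeline rather than parking it all in a variable first. Here’s a sketch - the file names sample.log and errors.txt are made up for illustration, and we build a small sample file so the example is self-contained:

```powershell
# Build a small sample log (every 100th line is an ERROR).
1..5000 | ForEach-Object {
    if ($_ % 100 -eq 0) { "ERROR line $_" } else { "INFO line $_" }
} | Set-Content sample.log

# Memory-hungry: the entire file lands in a variable before filtering.
$lines  = Get-Content sample.log
$errors = $lines | Where-Object { $_ -match 'ERROR' }

# Streaming: lines flow through in batches of 1,000, so no large
# collection ever lingers for the garbage collector to worry about.
Get-Content sample.log -ReadCount 1000 |
    ForEach-Object { $_ -match 'ERROR' } |
    Set-Content errors.txt
```

On a 5,000-line file the difference is invisible; on that 60GB text file, the first version may not even finish.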

Compiling script blocks - as ForEach-Object requires - can incur a performance penalty. It’s not always avoidable, but it isn’t the fastest operation on the planet.

Wasting memory can result in disk paging, which can slow things down. For example, in the fragment below, we’re storing a potentially huge, unnecessary list of users in $users long past the point where we’re done with it.

$users = Get-ADUser -Filter *
$filtered = $users | Where-Object { $_.Department -like '*IT*' }
$final = $filtered | Select-Object Name,Cn
$final | Out-File names.txt

It’d be better to do this entirely without variables, and let the filtering happen on the domain controller:

Get-ADUser -Filter "Department -like '*IT*'" |
Select-Object Name,Cn |
Out-File names.txt

Now we’re getting far less data back from Active Directory, and storing none of it in persistent variables. Put more precisely, this is an example of the benefits of early filtering.

Here’s the problem: we often see beginners write a command like this:

Get-CimInstance Win32_Service -ComputerName server01 |
Where-Object State -eq 'running'

This may not seem like a big deal, but imagine the CIM command returns 1,000 objects. With the approach above, the first command has to complete and send all 1,000 objects across the wire, and only then are the results filtered. Compare that to letting Get-CimInstance do the filtering in place - on the server - and send back only the filtered results:

Get-CimInstance Win32_Service -ComputerName server01 -Filter "State = 'running'"

There’s one other feature you should take advantage of when using Get-CimInstance, and not many people do. Let’s say you’re using code like this:

Get-CimInstance Win32_Service -ComputerName $computers -Filter "state = 'running'" |
Select-Object -Property Name,StartMode,StartName,ProcessID,SystemName

The $computers variable is a list of computer names. This pattern is pretty common: get something, then select the properties that matter to you. However, the remote server is assembling complete Win32_Service instances with all of their properties, and you’re throwing most of them away. The better approach is to limit what Get-CimInstance sends back:

$CimParams = @{
  ClassName    = 'Win32_Service'
  ComputerName = $computers
  Filter       = "state = 'running'"
  Property     = 'Name','StartMode','StartName','ProcessID','SystemName'
}
Get-CimInstance @CimParams |
Select-Object -Property Name,StartMode,StartName,
ProcessID,@{Name = 'Computername'; Expression = { $_.SystemName }}

PowerShell will still want to display the results using its default formatting, so you’ll most likely use Select-Object or create a custom object, as shown here. Regardless, this approach should run slightly faster. The gain per query may be small, but it adds up; you’ll appreciate it when you’re querying 500 servers.

Key Take-Away

You should get used to using Measure-Command to test your code, especially if there are several ways you could go. We’ll look at other performance-related concepts in the Scripting at Scale chapter. But for now, your key take-away should be that good coding practices can go a long way toward avoiding performance problems!