When doing a Get-Content vs StreamReader speed comparison, you use -like in one and StartsWith in another.
What's really funny is you call this article "Coding for Speed", yet you say this:
Wow, I never knew that! That actually explains why the entries with ArrayLists seemed to return more entries than expected. Still, their speed alone means it's worth it.
Regex God
This award goes to Craig Duff, who blew my socks off with his impressive Regex skills!
In this post, we review some of the things we learned about coding for speed in the Hadoop PowerShell challenge. The winners are at the end of this post, so zip down there to see if you won!
ArrayList will be your new best friend.
Chris Warwick (@cjwarwickps) March 7, 2016
This syntax comes to us by way of /u/evetsleep, /u/Vortex100, and Kevin Marquette, from Reddit /r/powershell!
Also the reason people cast to void is because it doubles performance again.
Reblogged this on pshirwin and commented:
Yup Out-Null is one of those inefficient cmdlets, faster to use other methods:
Also I tested some other combinations:
As it turns out, Select-String (PowerShell's text searching cmdlet) is capable of mounting a file in memory itself, so there is no need to Get-Content (gc) it first. It's also MUCH slimmer, and has speed for days. Look at the performance difference in this common scenario, searching 10 directories of files using Select-String, in stark contrast to Get-Content.
Taking out your own Trash
This cool tip comes to us from Kevin Marquette. If PowerShell has some monster objects in memory, or you just want to clean things up, you can call a System Garbage Collection method to take out your trash, like so:
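As a minimal sketch of that tip, the snippet below drops a large object and then asks the .NET runtime for a collection through [System.GC]::Collect(). The $big variable and its size are illustrative assumptions, not part of the original tip.

```powershell
# Allocate something large (hypothetical example data: a 50 MB byte array).
$big = New-Object byte[] 50MB

# Remove the only reference so the array becomes eligible for collection.
Remove-Variable big

# Ask the .NET garbage collector to run now rather than waiting for it.
[System.GC]::Collect()

# Report the managed heap size after collection, in bytes.
$after = [System.GC]::GetTotalMemory($true)
$after
```

Forcing a collection is rarely needed in short scripts; it mostly helps in long-running sessions that have just released very large objects.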
Write-Output "New array outside of function $newArray"
Amazingly, our two Speed Demons, Tore Groneng and Øyvind Kallstad, working in conjunction with Mathias Jensen, each turned in a blazing fast time of eight seconds! To be specific, Øyvind's time was 8,778 MS, while Tore beat that by an additional 200 MS. This represents a data throughput of 411.75 MB/s! This is close to the maximum speed of my all-SSD RAID-0, so they REALLY turned in quite a result!
Yep, the Add method is a special feature of ArrayList, which explains why that wasn't working. I didn't know the performance was different between casting to void or piping to Out-Null, thanks for providing the metrics!
Sorry to find fault with an otherwise absolutely stellar article.
Write-Output "testing ArrayList.."
(measure-command -ex {$guid = new-object System.Collections.ArrayList
1..100000 | % {$guid.Add([guid]::NewGuid().guid) | out-null }}).TotalMilliseconds
Write-Output "testing `$collection+=..."
(measure-command -ex {$guid = @()
1..100000 | % {$guid += [guid]::NewGuid().guid}}).TotalMilliseconds

testing ArrayList... 7784.5875 MS
testing $collection+=... 465156.249 MS
This is VERY SLOW on big files. If you'd like to know a bit more, read Don's great post on Get-Content here, or Keith's write-up here.
True Speed comes from going native
The fastest of the fast approaches used native C# code, which PowerShell has supported since v3. Using this, you gain a whole slew (that's a technical term) of new dotnet goodness to play with. For examples of this technique, check out what Tore, Øyvind, and Mathias did.
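To give a feel for the technique, here is a minimal sketch that compiles a small C# helper with Add-Type and calls it from PowerShell. The PrefixCounter class, its Count method, and the sample file contents are hypothetical names invented for this illustration; this is not the code the winners submitted.

```powershell
# Compile a tiny C# class into the current session.
# PrefixCounter and Count are hypothetical names for this sketch.
Add-Type -TypeDefinition @'
using System;
using System.IO;

public static class PrefixCounter
{
    // Counts the lines in a file that start with the given prefix.
    public static int Count(string path, string prefix)
    {
        int matches = 0;
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.StartsWith(prefix, StringComparison.Ordinal))
                {
                    matches++;
                }
            }
        }
        return matches;
    }
}
'@

# Build a small sample file so the sketch is self-contained.
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value '[Result "1-0"]', '[Event "demo"]', '[Result "0-1"]'

# Call the compiled method like any other static .NET method.
$count = [PrefixCounter]::Count($path, '[Re')
$count

Remove-Item $path
```

The loop runs as compiled code rather than interpreted script, which is where the speed-up in approaches like this tends to come from.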
ArrayList is a bit weird. When you add an entry to it, ArrayList responds back with the index position of the new item you added. In some use case in the world, this might be helpful, but not really to us. So, we just pipe our .Add() statement into null, like so:
When we're working with large files, or lots of small files, we have a better option, and that is the StreamReader from .NET. It IS fundamentally different in how it presents the content from the file, so here's a comparison.
(measure-command -ex $guid = new-object System.Collections.ArrayList
Your CMS has messed up the HTML formatting on your scripts; they're showing &amp;quot; etc.
The results are in! Great summary about the Hadoop PowerShell Challenge by Stephen Owen! It was fun to see the different approaches. When it comes to speed you can't beat native C# code! Great tips! Worth the read!
Write-Output "Arraylist inside function $arrayList"
ArrayList is totally worth it! I just figured I'd save some people the troubleshooting time I had to go through.
360 times faster than the Hadoop cluster. Astounding!
Write-Output "testing pipeline"
$arrayList = New-Object System.Collections.ArrayList
I was simply astounded to see the tremendous speed difference between using PowerShell's Get-Content cmdlet versus the incredibly fast StreamReader.
Some people put [void] on the front of the line instead. I try to avoid it, as it seems confusing and very developery to me.
Thank you to everyone who entered. The leaderboards have been updated with your times, and I'll add your throughput when I get the chance this week!
Some people put [void] on the front of the line instead, same result.
One-liner Champion
This award was well earned by Flynn Bundy, who managed to turn out a very respectable time of two minutes, and did it all in a one-liner! His code ALMOST fits in a single tweet, in fact! Only 216 characters!
He is capturing them, he assigns $guid = to the loop.
You can simplify your StreamReader snippet and make it more reliable like so:
Using Select-String alone is a 31x Speed Increase! This is pretty much a no-brainer. If you need to look inside of files, definitely dump your Get-Content steps. Credit goes to Chris Warwick for this find.
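A minimal sketch of the difference, assuming a throwaway folder of sample .pgn files created just for the illustration (the folder, file names, and file contents are made up here):

```powershell
# Create a scratch folder with two small sample files.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().Guid)
$null = New-Item -ItemType Directory -Path $dir
Set-Content -Path (Join-Path $dir 'a.pgn') -Value '[Result "1-0"]', '[Event "demo"]'
Set-Content -Path (Join-Path $dir 'b.pgn') -Value '[Result "0-1"]'

# Slower shape: every line is first emitted by Get-Content, then searched.
$slow = @(Get-ChildItem -Path $dir -Filter *.pgn | Get-Content | Select-String -Pattern 'Result')

# Faster shape: hand the files straight to Select-String and let it read them.
$fast = @(Get-ChildItem -Path $dir -Filter *.pgn | Select-String -Pattern 'Result')

# Both approaches find the same lines.
$slow.Count
$fast.Count

Remove-Item -Path $dir -Recurse
```

A side benefit of the direct form is that each match also carries the file name and line number it came from.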
ArrayList is a bit different from a regular string; here's how you do it. First you have to make a new ArrayList (which developers call instantiating an instance of a class, sounds so cool to say it!), like so:
Next, we iterate through each object, and here's the real difference.
We call the ArrayList's .Add() method, instead of using the += syntax. Finally, at the end, we get the whole list back out by using return, or just putting the variable name in again.
$newArray += "placeholder"
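Put together, the steps above might look like the following minimal sketch. The item values and loop size are placeholders chosen for the illustration.

```powershell
# Step 1: instantiate the list once, outside the loop.
$list = New-Object System.Collections.ArrayList

# Step 2: iterate, calling .Add() instead of +=.
foreach ($i in 1..1000) {
    # .Add() returns the index of the new item; [void] discards it.
    [void]$list.Add("item-$i")
}

# Step 3: emit the finished list by naming the variable.
$list
```

Because the list grows in place, no copy of the collection is made on each iteration.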
Good idea, but how would you suggest we capture the emitted objects? Does capturing them introduce a delay?
Believe it or not, avoiding the pipeline here makes a difference, especially over very large loops. My preference is the last of those three, but you'll see all of them used in place of Out-Null.
A close runner-up was Øyvind Kallstad, with a very honorable time of 8778 MS.
If possible, you should not be adding to collections or arrays inside the loop. Just output what you need to the pipeline and assign the output instead; it is much faster.
Drop the Get-Content, that's gonna kill you. Just run Select-String on the files directly, *much* quicker
From the original post that started this whole thing, Adam Drake's "Can command line tools be faster than your Hadoop cluster?"
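A minimal sketch of that suggestion: the loop simply emits each value, and the assignment on the left collects everything it emitted. The GUID payload mirrors the benchmark in the article; the loop size is arbitrary.

```powershell
# No collection is touched inside the loop; each value is just emitted.
$guids = foreach ($i in 1..1000) {
    [guid]::NewGuid().Guid
}

# $guids is now a regular array holding every emitted value.
$guids.Count
```

PowerShell gathers the output behind the scenes, so there is no per-iteration array copy and no .Add() return value to suppress.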
Great article Stephen. A few comments:
All of these entrants can proudly say that their code DID beat the Hadoop cluster: Boe Prox, Craig Duff, Martin Pugh, /u/evetsleep, /u/Vortex100 and Kevin Marquette, Irwin Strachan, Flynn Bundy, David Kuehn, /u/LogicalDiagram from Reddit, and @IisResetme! All eleven averaged a minimum of 10.76 MB/sec. Their code all completed in less than six minutes, much faster than the 26 minutes of the mighty seven node Hadoop cluster!
I think there's something else going on to cause the performance increase you're seeing.
#Same concept but with StreamReader
#Setup a StreamReader to process the file
$file = New-Object System.IO.StreamReader -ArgumentList $Fullname
:loop while ($true)
{
    #Read this line
    $line = $file.ReadLine()
    if ($line -eq $null)
    {
        #If the line was $null, we're at the end of the file, let's break
        $file.Close()
        break loop
    }
    #Do something with our line here
    if ($line.StartsWith('[Re'))
    {
        $results[$line] += 1
    }
}
1. Dot net variables are not immutable by design. Certain dot net types are immutable, arrays being one of those types. Strings are another. Not dot net variables in general.
I thought so too! So here you go. Load this into the ISE and run it once. After that, you can hit Ctrl+J and have a nice sample StreamReader code structure.
I'm now pleased to announce the winners of the Hadoop contest. I was so impressed with the entries that I decided to pick a bonus fourth winner.
$snippet = @{
    Title = "StreamReader Snippet"
    Description = "Use this to quickly have a working StreamReader"
    Text = @'
$fullname = "FilePathHere"
begin { $results = @{} }
process
{
    $file = New-Object System.IO.StreamReader -ArgumentList $Fullname
    :loop while ($true)
    {
        $line = $file.ReadLine()
        if ($line -eq $null)
        {
            $file.Close()
            break loop
        }
        if ($line.StartsWith('[Re'))
        {
            #do something with the line here
            $results[$line] += 1
        }
    }
}
end { return $results }
'@
}
New-IseSnippet @snippet
Write-Output "Arraylist after function $arrayList"
Runspaces are crazy fast
Boe Prox turned in an awesome example of working with runspaces, here. If you'd like to read a bit more, check out his full write-up guide here. This guide should be considered REQUIRED reading, if speed is your game. Amazing stuff, and incredibly fast, much better than using PowerShell Jobs.
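For readers who have not used runspaces before, here is a minimal sketch of the general pattern using a runspace pool. It is not Boe's entry; the squaring workload, the pool size of four, and the variable names are assumptions made for the illustration.

```powershell
# Create and open a pool that runs up to four runspaces at once.
$pool = [runspacefactory]::CreateRunspacePool(1, 4)
$pool.Open()

# Queue one PowerShell instance per work item and start each asynchronously.
$jobs = foreach ($n in 1..4) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddScript('param($x) $x * $x').AddArgument($n)
    [pscustomobject]@{ Shell = $ps; Handle = $ps.BeginInvoke() }
}

# Collect the results, waiting for each instance to finish, then clean up.
$results = foreach ($job in $jobs) {
    $job.Shell.EndInvoke($job.Handle)
    $job.Shell.Dispose()
}
$pool.Close()
$pool.Dispose()

$results
```

Unlike PowerShell Jobs, the work runs on threads inside the current process, so there is no new process to start for each item.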
You might notice when you run this that you see something like this:
When I saw that Adam Drake, a master of the Linux command line and Bash tools, was able to process all of the results in only 11 seconds, I knew this was a tall order. We gave it our all guys, there's no shame in BEATING that time!
ArrayList is awesome, but I've found you have to be careful when using it with functions. It tends to ignore scoping (I'm assuming since it's not a native PowerShell type or something?). If you modify the variable inside the function, the changes are saved, where normally the child scope doesn't directly affect the parent scope.
Write-Output "New array in function $newArray"
Because string comparison can be expensive (I am not sure whether -like uses regex behind the scenes), wouldn't it be more relevant if you used the same string function? Otherwise you might be testing the difference in string comparison operations more than the file readers.
So please use [void]. Otherwise you're slowing your code down, ironically enough.
This is a good tip because it is not obvious and could easily cause some real confusion. But to clarify, it is not that scope is ignored, it is that the variable is passed by reference instead of by value. The same is true with hash tables in PowerShell.
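The behaviour is easy to reproduce. In the minimal sketch below (Test-Scope is a hypothetical function name), the function reads both variables from the parent scope: calling .Add() changes the one ArrayList object that both scopes refer to, while += builds a new array and stores it in a variable local to the function.

```powershell
$arrayList = New-Object System.Collections.ArrayList
$newArray  = @()

function Test-Scope {
    # Mutates the same ArrayList object the parent scope is holding.
    [void]$arrayList.Add('placeholder')

    # Creates a brand-new array and assigns it to a function-local variable.
    $newArray += 'placeholder'
}

Test-Scope

$arrayList.Count   # 1: the change made inside the function is visible here
$newArray.Count    # 0: the parent's array was never touched
```

The same applies to hash tables: changing a key inside a function changes the object the caller sees.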
[using Amazon Web Services hosting] with 7 x dium machine[s] in the cluster took 26 minutes processing data at ~1.14MB/sec
#Working with Get-Content
#Read our file into $file
$file = Get-Content $fullname
#Step through each line
foreach ($line in $file)
{
    #Do something with our line here
    #ex:
    if ($line -like '`[Re*')
    {
        $results[$line] += 1
    }
}
Sure strings in .net are immutable by design, but variables and objects are not generally immutable.
Sixty times faster!!! The really crazy part is that you can watch PowerShell's RAM usage jump all over the place, as it doubles up the variable in memory, commits it, and then runs garbage collection. Watch how the RAM keeps doubling, then halving!
c) Using the normal @() with .Add instead of += (invalid, doesn't work).
Thanks for the tips. I knew I was missing a distinction about the variables vs strings. I'll update this.
3. When you use a stream, you should use try/finally and close it in the finally block. Otherwise you risk leaving it open in the event of an exception. Always close your streams. Keith Hill's example in these comments has a cleaner version that does this.
I know I said my top three tips, but I also want to give a little extra. Here are some extra BONUS TIPS for you.
If your name is mentioned here, send me a DM and we'll work out getting you your hard-earned stickers.
Thanks Tim! That makes more sense than what I assumed it was.
(measure-command -ex $guid = new-object System.Collections.ArrayList
4. You forgot the y in Øyvind in numerous locations in your article.
#Get-Content | Select-String example
dir $pgnfiles | select -first 10 | get-content | Select-String "Result"
#Select-String Only example
dir $pgnfiles | select -first 10 | Select-String "Result"

Testing GC | Select-String... 3108.5527 MS
Testing Select-String Only... 99.1534 MS
In one project we were migrating customers from two different remote desktop systems into one with some complex PowerShell code. There was a section of the code which built a list of all of their files, omitting certain ones. When we swapped out $string += for ArrayList, we dropped our execution time from six minutes to only 20 seconds! A huge performance boost with this one tip!
a) Using ArrayList with += instead of Add (very slow).
Here's why Get-Content can be a bit slow. When you're running Get-Content, or Select-String, PowerShell is reading the whole file into memory at once. It parses it and dumps out an object for each line in the file, sending it on down the pipeline for processing.
No, instead PowerShell has to make a new variable equal to the whole of the old one, add our new entry to the end, and then throw away the old variable. This has almost no impact on small datasets, but look at the difference when we go through 100k GUIDs here!
This structure sets up a master list, then does some processing for each object, eventually adding it to our master list, and then at the end, displays the list.
$file = New-Object System.IO.StreamReader -ArgumentList "$pwd\build.ps1"
try {
    while (($line = $file.ReadLine()) -ne $null) {
        if ($line.StartsWith('[Re')) {
            #do something with the line here
            $results[$line] += 1
        }
    }
}
finally {
    $file.Close()
}
Most Best Practice Award
This one goes to Boe Prox, with a textbook perfect entry, including object creation, runspaces, and just plain pretty code.
Not sure I agree with this claim: dotnet variables are immutable by design.
PowerShell is based off of dotnet, and some dotnet variable types, including our beloved string and array, are immutable. This means that PowerShell can't simply tack your entry onto the end of $collection, like you'd think.
So, now that you've seen how it works, how much faster and better is it?
Write-Output "Arraylist before function $arrayList"
We see this structure a LOT in PowerShell:
Except Out-Null is several orders of magnitude slower than [void]:
2. Since this is about performance, you should never pipe to Out-Null. Piping to Out-Null, especially when you're doing so in a pipeline with only two pipeline elements, is a performance hit because you're invoking a pipeline where you don't need one. You should do one of the following instead:
b) Using System.Collections.Generic.List[string] instead of ArrayList (doesn't return an index and runs at basically the same speed as the ArrayList with a void return).
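A minimal sketch of three commonly used ways to discard a return value without a pipeline: cast to [void], assign to $null, or redirect to $null. Treating these as the three options meant here is an assumption; the ArrayList and the string values are only there to give .Add() something to return.

```powershell
$list = New-Object System.Collections.ArrayList

# Option 1: cast the expression to [void].
[void]$list.Add('cast to void')

# Option 2: assign the return value to $null.
$null = $list.Add('assign to null')

# Option 3: redirect the output stream to $null.
$list.Add('redirect to null') > $null

$list.Count
```

None of the three statements writes the returned index to the output stream, and none of them starts a second pipeline element the way Out-Null does.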
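A minimal sketch of option b), assuming string items and an arbitrary loop size. Because Add() on a generic List returns nothing, there is no index to suppress.

```powershell
# A strongly typed list of strings.
$list = New-Object 'System.Collections.Generic.List[string]'

foreach ($i in 1..1000) {
    # Add() returns void here, so no [void] cast or Out-Null is needed.
    $list.Add("item-$i")
}

$list.Count
```

The type argument also means anything added to the list is converted to a string.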
We'll use this post to cover some of what we learned from the entries here. Here are our top three tips for making your PowerShell scripts run just that much faster!
Speed King Winner
This one goes to Tore Groneng. He worked closely with Mathias Jensen, and turned out an incredible 8 second total execution. For comparison, this is a 200x speed increase over the results of the Hadoop cluster from our original challenge. He should be proud.