Analyzing PowerShell Transcripts

By Raido Karro and Liisa Tallinn
28 May 2020

Even if you’re not using PowerShell on your Windows machines, your adversaries and pen-testers are. Why use custom hax0r tools when every Windows machine comes with a built-in scripting environment? PowerShell lets an attacker do pretty much anything while leaving hardly any traces on disk. It’s a mistake not to love PowerShell as an attacker and a mistake not to love PowerShell logs as a defender.

The following is a parsing and query tutorial that uses SpectX's free log analyzer to get both general statistics and specific insights from PowerShell transcript logs, which are notoriously hard to parse. There is no need to import or ingest the files from their current output directory - SpectX runs queries directly on the raw logs. Creating a clean view from the raw logs only takes a couple of minutes.

We've uploaded a small set of transcript files to our public S3 repository - feel free to run the queries below on this sample data. The questions we've asked in the queries are aimed at exploring the baseline and spotting abnormal (potentially malicious) behaviour. What were the top commands run? When were most commands run? When were they run for the first time? What command outputs appeared on the console - anything unusual? Who are the top users? Any users running commands they shouldn't be running? Top host applications? How many commands per hour/day/week? What about obfuscated commands? And so on - once you get the hang of SpectX queries, the follow-up questions never stop.

Contents

About Transcripts
How to turn on transcription logging
Format of the transcript logs
Reading and parsing the logs with SpectX
Who? What? When? How? - the queries 

Transcripts Know it All... or Almost 

There are three types of logging that can be enabled for PowerShell: module logging, script block logging and transcription. As the first two can be viewed as Windows events, it is often recommended to read the transcript text files manually, and only when monitoring of the first two raises an issue. At the same time, the information contained in transcripts is the most comprehensive of the three, and the flat text files are much more storage-efficient than the Windows events produced by module and script block logging. For example, running a simple write-host 'hello world' - or in this case 'All You Need is Logs' - produces 22 Windows events from module and script block logging, while the same information plus the command output is logged into this 1KB PowerShell transcript:

**********************
Windows PowerShell transcript start
Start time: 20200521135800
Username: DESKTOP-7KL6SCQ\Loki
RunAs User: DESKTOP-7KL6SCQ\Loki
Configuration Name:
Machine: DESKTOP-7KL6SCQ (Microsoft Windows NT 10.0.18363.0)
Host Application: C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
Process ID: 5624
PSVersion: 5.1.18362.752
PSEdition: Desktop
PSCompatibleVersions: 1.0, 2.0, 3.0, 4.0, 5.0, 5.1.18362.752
BuildVersion: 10.0.18362.752
CLRVersion: 4.0.30319.42000
WSManStackVersion: 3.0
PSRemotingProtocolVersion: 2.3
SerializationVersion: 1.1.0.1
**********************
**********************
Command start time: 20200521135818
**********************
PS C:\Users\Loki> write-host "All You Need is Logs"
All You Need is Logs
**********************
Command start time: 20200521135821
**********************
PS C:\Users\Loki> exit
**********************
Windows PowerShell transcript end
End time: 20200521135821
**********************
The trouble is - transcript logs were historically designed for debugging, that is, for humans rather than machines to read. Their content is almost freeform and notoriously hard to parse. Digging around the web, you can find a few attempts to ingest transcripts into a log processing tool, but none of them manages to extract both the commands and the command outputs into separate fields. This is a pity, because command outputs are logged only to transcripts, and insight into what actually happened as a result of a PowerShell command can be valuable.

Enter SpectX. Skipping the ingestion phase and using the building blocks of its parsing language, it can extract not only the user and host/machine fields in the prologue but also command start times, commands and outputs. As SpectX's parser is applied to the raw log files at query runtime, you can also debug, enhance and fix the parser with each query in case the log format happens to change. And PowerShell transcription logs are indeed volatile.

When we started writing this post, we believed PowerShell transcripts were almighty and recorded everything. However, playing around with commands and outputs, it turned out there's an exception. When running a PowerShell script from a URL, the transcript catches the command (including the URL) and the output, but leaves you completely blind regarding the contents of the script or the commands it runs. For example, to learn what this particular script run from a URL did, you'd need to turn to the Windows events produced by module logging.
**********************
Windows PowerShell transcript start
Start time: 20200521141014
Username: DESKTOP-7KL6SCQ\Loki
RunAs User: DESKTOP-7KL6SCQ\Loki
Configuration Name:
Machine: DESKTOP-7KL6SCQ (Microsoft Windows NT 10.0.18363.0)
Host Application: C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
Process ID: 788
PSVersion: 5.1.18362.752
PSEdition: Desktop
PSCompatibleVersions: 1.0, 2.0, 3.0, 4.0, 5.0, 5.1.18362.752
BuildVersion: 10.0.18362.752
CLRVersion: 4.0.30319.42000
WSManStackVersion: 3.0
PSRemotingProtocolVersion: 2.3
SerializationVersion: 1.1.0.1
**********************
**********************
Command start time: 20200521141021
**********************
PS C:\Users\Loki> iex (New-Object "net.webclient").DownloadString("http://spectx.com/test.ps1")
All You Need is Logs!
**********************
Command start time: 20200521141023
**********************
PS C:\Users\Loki> exit
**********************
Windows PowerShell transcript end
End time: 20200521141023
**********************

How to Turn on PowerShell Transcript Logging 

1. Open the Start menu and type 'gpedit' to open a Local Group Policy Editor window


2. Navigate to Administrative Templates > Windows Components > Windows PowerShell


3. Click on ‘Turn on PowerShell Transcription’

4. Select ‘Enabled’ and specify the path to the output directory. Important! Check the box to turn on invocation headers. This will record each command run with a Command start time.


5. Click OK

6. Open a PowerShell window to test if the logging works. Type in a command, e.g. a simple write-host 'all you need is logs', followed by 'exit'.


7. Check if the output directory you specified in step 4 contains a transcript with the timestamps and actions in step 6.


In the long run it’s a good idea to centralize the collection of PowerShell transcripts to analyze them at scale, e.g. check this tutorial by @dan_franciscus in the 4sysops blog.

Transcript Log Format 

A few words about the log format and the parser logic. If you'd rather get straight to querying and are not that interested in the anatomy of the log files and the parser, skip to the next section. Here's a sample of a raw PowerShell v5.1 transcript log:

**********************
Windows PowerShell transcript start
Start time: 20171030223248
Username: WIN-FT17VBNL4B2\Administrator
RunAs User: WIN-FT17VBNL4B2\Administrator
Machine: WIN-FT17VBNL4B2 (Microsoft Windows NT 10.0.14393.0)
Host Application: C:\Windows\system32\WindowsPowerShell\v1.0\PowerShell.exe
Process ID: 4268
PSVersion: 5.1.14393.1770
PSEdition: Desktop
PSCompatibleVersions: 1.0, 2.0, 3.0, 4.0, 5.0, 5.1.14393.1770
BuildVersion: 10.0.14393.1770
CLRVersion: 4.0.30319.42000
WSManStackVersion: 3.0
PSRemotingProtocolVersion: 2.3
SerializationVersion: 1.1.0.1
**********************
**********************
Command start time: 20171030223255
**********************
PS C:\Users\Administrator> echo test
test
**********************
Command start time: 20171030223256
**********************
PS C:\Users\Administrator> exit
**********************
Windows PowerShell transcript end
End time: 20171030223256
**********************

A transcript file is created for each PowerShell session. The first section of the transcript is a prologue containing session-specific fields such as session start time, user, PowerShell version etc. Then, after a line of asterisks, comes the collection of commands and outputs run during the session. Lastly, there is an optional epilogue section at the end of the file with the session end time - but only if the session was finished with an exit command rather than by closing the window.

The number of commands between the prologue and epilogue varies. This is why we’ve taken a two-step approach: first parsing the prologue, epilogue and commands-outputs in bulk; then extracting the commands and outputs into separate fields in the query layer.
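Outside SpectX, the same two-step logic can be sketched in plain Python. This is a simplified illustration, not the actual SpectX pattern - the function and field names are our own, and it assumes invocation headers are turned on:

```python
import re

# Separator lines in transcripts are runs of asterisks.
SEP = re.compile(r"^\*{20,}\s*$", re.MULTILINE)

def parse_transcript(text):
    """Step 1: split into separator-delimited blocks and read the key-value
    prologue. Step 2: pair each 'Command start time' header with the block
    that follows it and split that block into command and output."""
    blocks = [b.strip() for b in SEP.split(text) if b.strip()]
    prologue = dict(
        line.split(": ", 1)
        for line in blocks[0].splitlines()
        if ": " in line
    )
    commands = []
    for header, body in zip(blocks, blocks[1:]):
        m = re.match(r"Command start time: (\d{14})", header)
        if not m:
            continue
        lines = body.splitlines()
        # The first line holds the prompt ("PS C:\...>") plus the command;
        # everything after it is the command output.
        command = re.sub(r"^PS [^>]*> ", "", lines[0])
        commands.append((m.group(1), command, "\n".join(lines[1:])))
    return prologue, commands
```

Run against the sample above, this yields the prologue fields as a dictionary and one (start time, command, output) tuple per command - e.g. ('20171030223255', 'echo test', 'test'). A real parser would also need to cope with outputs that themselves contain asterisk lines; the SpectX pattern handles the details.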

How to Read and Parse the Transcripts 

SpectX reads data directly from raw log files and parses them at runtime. In the pattern, we assume you’ve turned on invocation headers, that is each PowerShell command is recorded with a command start time. To view the transcripts as a clean table:

1. Download and install Free SpectX.

2. Copy this base query (Github) into the query window

3. Modify the path on line 47 to match the output directory of your transcript log files. Don’t change the time variables in the path - this will enable filtering logs based on timestamps in the file path. For example, if the path to one of your transcript files is

C:\Logs\powershell\20200405\PowerShell_transcript.WS01.5OoA7WLR.20200405152828.txt

then the URI in the SpectX query that points to all log files with a similar path in the C:\Logs\powershell folder would be:

file:/C:/Logs/powershell/$yyyy$$MM$$dd$/PowerShell_transcript.*.$yyyy$$MM$$dd$$HH$$mm$$ss$.txt

4. Change the number of days in the init-script at the very beginning of the script (lines 2-4) if you'd like to take a look at older/newer logs.

5. Press run. The results should look similar to this:


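If you want to sanity-check the same directory layout outside SpectX, the daily folders can be enumerated in plain Python. This is a hedged sketch assuming the C:\Logs\powershell\YYYYMMDD layout shown above; transcript_files is our own helper, not part of any tool:

```python
from datetime import date, timedelta
from pathlib import Path

def transcript_files(root, days_back=7):
    """Yield transcript files from the last `days_back` daily folders,
    assuming a <root>/YYYYMMDD/PowerShell_transcript.*.txt layout."""
    for n in range(days_back):
        day = (date.today() - timedelta(days=n)).strftime("%Y%m%d")
        # Path.glob on a non-existent folder simply yields nothing.
        yield from Path(root, day).glob("PowerShell_transcript.*.txt")
```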
Who? What? When? How? The Queries 

When asking questions of the logs, you're either looking for the known or the unknown. Looking for the known is easy - just filter for the data you need. For example:
| filter(lower(output) CONTAINS 'error') 

gives you all the commands that contain the string ‘error’ in the output. 

If you wanted to look at a specific time period, modify the init block containing timestamps at the very beginning of the base query script.

It gets more interesting if you're looking for the unknown and want to drill down to anything unusual. The first step is to establish the baseline - the usual - and then filter it out as something you're not interested in. For example, a neat trick is to look at when a command was first run and then focus on commands that were first seen later than others (see sample query #2 below).

Copy and paste these queries one-by-one to the end of the base query. Double click a record to see all its contents, right-click on a field to apply (negative) filters.

Insights into Commands and Command Outputs

1. Top commands

| select(cnt:count(*), command)
| group(@2)
| sort(cnt DESC)

2. When was the first time a command was run?

| select(min(commandStartTime),count(*),command,*)
| group(command)
| sort(min_commandStartTime)

3. Top command-output pairs

| select(cnt:count(*), command,output)
| group(command,output)
| sort(cnt DESC)

4. Number of commands per hour

| select(count(*),trunc_cmd_start:commandStartTime[1h])
| group(trunc_cmd_start)

5. Commands that were probably written manually, i.e. commands containing 'c:\\'

| select(start_time, end_time, Username, RunAs_User, Machine, PID, PSVersion, commandText)
| filter(lower(commandText) contains 'c:\\')

6. Parse the commandText field once again to detect IP addresses

| select(parse("DATA* IPADDR:ip DATA*", commandText), commandText, *)
| filter_out(ip IS NULL)
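For readers without SpectX at hand, the same check is easy to reproduce in Python. This uses a naive IPv4 regex for triage - it does not validate octet ranges:

```python
import re

# Four 1-3 digit groups; good enough for triage, not strict validation
# (it would also match e.g. 999.1.1.1).
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def extract_ips(command_text):
    """Return all IPv4-looking strings found in a command."""
    return IP_RE.findall(command_text)
```

For example, a download cradle pointing at 203.0.113.7 (a hypothetical address) yields ['203.0.113.7'], while ordinary commands yield an empty list.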

7. Find the string 'download' inside commandText even if it is obfuscated, by stripping punctuation before matching:

| select(deobf_commandText:lower(REPLACEALL(commandText, "[\\`,\\‘,\\’,\\,\\+,\\*,\\',\\”,\\“]" , '')) ) 
| filter(deobf_commandText contains 'download')
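The same deobfuscation idea in Python terms - a sketch whose character list mirrors the REPLACEALL above: the backticks, straight and curly quotes, pluses and asterisks that attackers sprinkle into strings:

```python
import re

def deobfuscate(command_text):
    """Strip characters commonly used to break up strings in obfuscated
    PowerShell (backticks, straight and curly quotes, plus and asterisk)
    and lowercase the result."""
    return re.sub(r"[`'\"+*\u2018\u2019\u201c\u201d]", "", command_text).lower()

def mentions_download(command_text):
    # 'Down'+'load'+'String' and D`o`w`nloadString both collapse to plain text.
    return "download" in deobfuscate(command_text)
```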

Generic Stats

8. How many PowerShell sessions per hour?

| select(start_time:start_time[1 h],count(*))
| group(start_time)

9. TOP 100 Usernames

| select(Username, cnt:count(*))
| group(@1)
| sort(cnt DESC)
| limit(100)

10. How many records did a user generate per machine?

| select(Username,Machine,cnt:count(*))
| group(@1,@2)
| limit(100)

11. Top Host Applications - list of applications that run PowerShell, including additional command parameters.

| select(cnt:count(*),Host_Application)
| group(@2)
| sort(cnt DESC)
| limit(100)

12. Top 100 Host Applications per User

| select(Host_Application, Username, cnt:count(*))
| group(@1,@2)
| sort(cnt DESC)
| limit(100)

13. Take a closer look at the punctuation patterns in commands. Patterns with a small count are outliers and deserve a detailed look.

| select(PUNCT(command, 64, false), cnt:count(*),uniq_commands:ARRAY_LEN(ARRAY_UNIQ(ARRAY_AGG(command))),commands_array:ARRAY_UNIQ(ARRAY_AGG(command)))
| group(@1)
| sort(cnt)
| limit(100)
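A rough Python analogue of the PUNCT idea (our own approximation - the real SpectX function may differ in detail): drop letters and digits, collapse whitespace and keep the first 64 characters, so that structurally similar commands map to the same pattern regardless of names and values.

```python
import re

def punct_pattern(command, max_len=64):
    """Reduce a command to its punctuation skeleton."""
    skeleton = re.sub(r"[A-Za-z0-9]+", "", command)   # drop words and numbers
    skeleton = re.sub(r"\s+", " ", skeleton)          # collapse whitespace
    return skeleton[:max_len]
```

Commands that differ only in identifiers, e.g. `echo test` and `dir temp`, share a pattern, while an obfuscated one-liner full of backticks and plus signs stands out immediately.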

Conclusion

Even though PowerShell transcript logs are not easy for machines to read, the information they contain is comprehensive and worth analyzing. After all, PowerShell is a tool for automating and configuring almost any task in Windows - it's the 'one ring to rule them all'. Taking a focused look at PowerShell stats, commands and their outputs is a good way of making sure the ring is not in the wrong hands or used cluelessly.

Acknowledgements
Thank you Taavi @ Clarified Security. Transcripts from the Hunt the Hacker course were a great starting point for this article.
