Analyzing IIS logs

By Liisa Tallinn and Raido Karro

The most difficult question when analyzing logs is 'What to ask?'  We've put down 20 sample questions and copy-pastable queries to ask from your Microsoft Internet Information Services logs or IIS logs by querying them directly in their current storage location with SpectX. For example, the logs could be stored in the IIS server itself, in an Azure blob or a centralized log server collecting flat text files - SpectX can read them directly from that source location.

SpectX is a log analyzer that skips ingestion and indexing to parse data at runtime. It is quick and easy to set up with existing data sources -  no need to configure and wait for the data to get ingested and indexed. The average processing speed for SpectX is 350MB per second per CPU core which makes it a great tool to tackle even large volumes of data in distributed locations. The full-functionality trial version of the software is free for 30 days.

To get started with the queries,  read the instructions on pointing SpectX at the IIS logs, setting up the pattern (schema) and getting the data parsed. Having run the first query, continue with pasting the scripts below to the end of the initial query.  

SpectX' query syntax is similar to piping UNIX commands. Queries are written by chaining commands one after another, directing the previous output to the next input.

Adding the queries to the inital script

Explore CLIENT IPs

1) Which URIs did a specific client IP visit and how many times? Right-click on one of the cs_ip fields in the results to add that IP as the filter.
| filter(c_ip = 0.0.0.0)
| select(cs_uri_stem, count(cs_uri_stem) as requestcount)
| group (cs_uri_stem)
| sort(requestcount DESC)
2) How many requests are unique client IPs generating? Sort the results in descending order. 
| select(c_ip,reqcount:count(c_ip)) 
| group(c_ip)
| sort(reqcount desc)
Note: if you wish to get a top list of c_ip’s, add a limit to this script. Eg to get top 20 IPs:
| limit(20)
3) How many hits per hour did the server get from a particular IP address?
| filter(c_ip = 0.0.0.0)
| select (date_time[1 hour], reqcount:count(*))
| group(date_time[1 hour])
Press on the chart button to get a line visualization

4) What is the geographical location of the IP addresses? Generate a “pew-pew” map.
| select(date_time, GEO(c_ip))
| sort(date_time)
Click on ‘Map’ and shorten the range to hit play and enjoy the scenery.



5) How many times did different IPs visit a particular URI that contains the string ‘foo’?
| filter(cs_uri_stem like '%foo%') 
| select(c_ip, reqcount:count(c_ip))
| group (c_ip)
| sort(reqcount DESC)
Note: If you wish to display this result on a chart, add one more line and then click on the Chart button
| select(string(c_ip), reqcount) //transform the IP back to strings
6) What is the domain associated with these IP addresses (ReverseDNS)
| filter(c_ip IN (0.0.0.0, 1.1.1.1, 1.2.3.4)) 
| select(c_ip, reverse:DNS_LOOKUP(c_ip))
| group(c_ip)

Explore the URIs


7) How many times was each URI hit?
| select(cs_uri_stem, reqcount:count(cs_uri_stem))
| group(cs_uri_stem)
| sort(reqcount desc)
8) How many times were the URIs containing ‘foo’ or ‘bar’ hit?
| filter(cs_uri_stem like '%foo%' or cs_uri_stem like '%bar%')
| select(cs_uri_stem, reqcount:count(cs_uri_stem))
| group(cs_uri_stem)
| sort(reqcount desc)
9) How many times were the URIs containing ‘foo’ or ‘bar’ hit? When was the first and when the last hit?
| filter(cs_uri_stem like '%foo%' or cs_uri_stem like '%bar%') 
| select(cs_uri_stem, c_ip, reqcount:count(cs_uri_stem), first_seen:min(date_time), last_seen:max(date_time))
| group(cs_uri_stem, c_ip )
| sort(reqcount desc)
10) How many times did various IPs request a URI containing ‘foo’ or ‘bar’?
| filter(cs_uri_stem like '%foo%' or cs_uri_stem like '%bar%') 
| select(cs_uri_stem, c_ip, reqcount:count(cs_uri_stem))
| group (cs_uri_stem, c_ip )
| sort(reqcount desc)
11) Compare the hit count in time for URIs that contain ‘foo’, ‘bar’ and ‘baz’.
| select(date_time, cs_uri_stem
,total_hits:count(*)
,foo:count(cs_uri_stem like '%foo%')
,bar:count(cs_uri_stem like '%bar%')
,baz:count(cs_uri_stem like '%baz%')
)
| group(cs_uri_stem)
| filter(foo > 0 AND bar > 0 AND baz > 0)
Press on ‘chart’ to get a visualization on this graph.

12) How many times per hour did a particular IP visit URIs containing foo or bar?
| filter(c_ip=0.0.0.0 and (cs_uri_stem like '%foo%' or cs_uri_stem like '%bar%'))
| select(date_time, cs_uri_stem, reqcount:count(*))
| group(date_time[1 hour], cs_uri_stem)
| sort(reqcount desc)
13) What are the URIs visited after 1st September 2019 at 10 AM? How many hits did they get?
| filter(cs_uri_stem like '%foo%' or cs_uri_stem like '%bar%') 
| filter(date_time > T('2019-09-01 10:00:00') )
| select(cs_uri_stem, c_ip, reqcount:count(cs_uri_stem))
| group(cs_uri_stem, c_ip )
| sort(reqcount desc)

Explore Misc

14) What are the top 50 user agents?
| select(totalhits:count(*), browser:cs_user_agent)
| group(browser)
| sort(totalhits desc)
| limit(50)
15) What external pages (referrers) refer to broken links?
| filter(cs_referer IS NOT NULL AND sc_status = 404 AND (sc_substatus IS NULL OR sc_substatus=0))
| select(referer:cs_referer, uri:cs_uri_stem)
| group(referer)
16) How many times did the server return a particular status code?
| SELECT(status:sc_status,count(*) ) 
| GROUP(status)
| SORT(status)
17) Are there any requests originating from the TOR network?
This query fetches and parses current TOR exit nodes published on the project’s website and joins them virtually with the IIS logs based on the client IP. 
| join(PARSE(src:'https://check.torproject.org/cgi-bin/TorBulkExitList.py?ip=1.1.1.1', pattern:'ipaddr:exitAddress (EOL|EOS)') on left.c_ip = right.exitAddress)
18) Are there any Googlebot user agents not coming from Google AS? That is - who's pretending to be a Googlebot?
|filter(asname(c_ip) != 'AS15169 Google LLC' and lower(cs_user_agent) like '%googlebot%')

Explore the Traffic

These two queries are executable if logging cs_bytes and sc_bytes are enabled on the IIS server (check the pattern)
19) What is the hourly incoming and outgoing bandwidth in KB? (chart)
| select(hour:date_time[1 hour], incomingKB:round(Sum(cs_bytes)/1024,2), outgoingKB:round(Sum(sc_bytes)/1024,2))
| group(hour[1 hour])
20) What are the total, max and average bandwidth and max & average time of requests per URI in hourly slots?
| select( hour:date_time[1 hour], uri:lower(cs_uri_stem), reqcount:COUNT(*), TotBytesSent:SUM(sc_bytes), AvgBytesSent:AVG(sc_bytes), MaxBytesSent:Max(sc_bytes), AvgTime:(Avg(time_taken)*0.001), MaxTime:(MAX(time_taken)*0.001)) 
| group(hour[1 hour], uri)
| filter(reqcount> 10)
| sort(reqcount ASC)
| limit(80)

Export the results and Run the Query via API

Click on Save > Save File As to save the script to the SpectX resource tree. To export the results to e.g. CSV or JSON, click on Save > Export As and choose the convenient format.

To run a query via the API, save the query to the SpectX resource tree (Save > Save File As). Then right-click on the script file at the resource tree > properties. Copy the API query from the API tab.

Conclusion

These 20 sample queries are just a start. SpectX has a rich query language that helps to deliver complex analysis tasks and test even the wildest hypotheses. Browse the documentation for inspiration and don't forget to use Ctrl+space for auto-complete.

Back to articles