Collecting and Analyzing Google Workspace Audit Logs

What's the most accessed publicly shared document in your Drive? How many documents per user were downloaded during the past month? Where are all those calendar (spam) invites coming from? If your business relies on Gmail, Docs, Drive, Calendar, Meet and other products from Google, the answers to these questions lie in Google Workspace (formerly G Suite) audit logs. This data is a critical asset in any security incident investigation but also for running basic threat hunting queries to detect if there’s anything weird or interesting going on. This article is a quick guide for insightful analytics on Google Workspace logs, even if you’re just a small organization using a basic Google Workspace plan. We’ll also show you how to export and store the raw data via API and get a longer view than Google’s retention period.

How to Access the Logs

As Workspace logs are enabled by default, the question is how to access and analyze them and whether the log retention period is long enough for your organization. By default, you can go back 6 months and need to export the logs if you need to keep a longer history.

Option A: Browse the Dashboard in Google Admin Console.

The easiest way for a quick birds eye view of the activities and resources of your organization is to log in to the Admin Console and then go to Reports. You’ll be able to browse basic graphs for trends plus take a look at audit logs for admins, apps and users during the last 6 months. The Audit log UI will also let you apply basic filters, such as ‘show me all activities from user X’ or ‘show me all admin activities in September’.

Google Admin Console Reports screenshot

Option B: View the Logs in GCP Logs Explorer

If you’re looking to run more advanced queries on your Google Workspace logs, you can turn on sharing the data with Google Cloud Platform Services from Account settings. This will enable you to query the logs in GCP’s Log Explorer and work with them using Google’s Logging query language. Here’s an easy-to-follow tutorial video for configuring Workspace and GCP settings.

Screenshot of GCP Console Logs Explorer

Option C: Export the logs via Workspace API 

SpectX offers a convenient way of getting a full copy of all available log data via Google's APIs, including 450 days of reports. That is, grabbing the raw data (json) to your local machine for further processing and analysis. It takes about 10 minutes to set everything up on Google's side and get your first insights - see instructions here. To get instant insights from your data, we’ve also prepared a SpectX query pack for Google Workspace logs. Note that querying data via Workspace API is available in the free edition of SpectX until 1 April 2021. After that, you can upgrade to a Business or Enterprise edition to continue using this feature.


Beware the Lag Times

Even though the data available via API is latest and greatest and some of the records (like admin, drive) do appear near real-time, it can take Google 1-3 days to make specific Admin console reports and audit logs available. The data retention varies, but in most cases you can go back as far as 6 months (15 months for reports). If you query data with SpectX, the raw json is stored in your local machine thereby giving you a longer history of events with each query. Here’s Google’s overview on Workspace data retention and lag times.

Workspace Log Format

Having created your Google Workspace datastore in SpectX, you’re ready to take a first look and browse through the last 6-15 months of your organisations’ logs. The logs that SpectX retrieves via the API are divided into two broad categories: activity and reports.
Note that the first time you download the logs via API, the query can take some time to complete. E.g retrieving a month of *all* activity logs took 10 minutes with good internet connection (465 Mbps) for an organisation with 5-10 active users. As the raw data is cached to the local machine, the next queries will be considerably faster.

Activity Logs

First, let’s take a closer look at activity logs stored in activity->year->month->day->hour folders. Each hour-folder contains up to 17 different json files (depending on which boxes you ticked when setting up the Gsuite datastore in SpectX), such as drive, meet or calendar but also login logs. You can query them all simultaneously with SpectX, just run the view.data query from the query pack and specify the time period in the init block.

Reports

Reports contain two files: user and customer logs stored daily into year>month>day folders. However, because of the lag times, ‘daily’ means it will take 1-2 days to generate a report. User reports contain a daily summary snapshot for every user in the organization: how many times did the user log in, how many documents did they modify, how many emails did they receive, how much quota do they have available. Customer reports are also daily snapshots but for the organization as a whole. How many users were active in drive, how many emails were sent in total, what sort of devices and operating systems are in use and what kind of third-party apps are authorized to access Google Workspace data. These are the questions you can ask from reports.

Practical Query Examples

We’ve prepared dozens of queries for you to run on your Workspace logs. Note that there are several streams at the beginning of most queries. Use (uncomment) the one most suitable for you in terms of time period and app, feel free to tweak them. The app name can be replaced with any app you selected when configuring the datastore. If you don't remember, right-click on your Workspace datastore > Configure and see which boxes are checked. Here's the full list of app names you can use in the queries. Or if you'd like to see them all at once, use an asterisk  (*).

1. What’s going on? The big picture.
Take a look at a certain time period, count the number of active users and frequency of events . You’ll see peaks - hours with more active users than usual and/or more events or more events and less users. 
// How many unique users and events per hour

@[./view/view_data.sx] // Points to default query.
| select(time:time[1h], email, events:count(*))
| group(time, email)
| select(time, users:count(*), events:sum(events))
| group(time)

Click on the  'Chart' button in the upper menu. Select a peak you're interested in on the graph and the filter query will appear below.


2. Idle users

A low hanging fruit for reducing costs for your pay-per-user Workspace package is to look at users who have not logged in for a long time. Have they perhaps left the company altogether and forgotten to be deactivated? Run this query on the reports to find out:
@[./view/view_data.sx](app:'user')
| select(max(accounts[last_login_time]),email,*)
| group(email)
| sort(last_login_time)
The resultset will show you unique users not seen for the longest. If the user has never logged in, the timestamp is a Unix epoch, you can find them by adding this filter to the query above:
| filter(last_login_time = var("1970-01-01 00:00:00.000 +0000"))

3. Which files are shared externally? How many times have they been accessed? By whom?

The resultset on this query on activity logs will show externally shared documents edited/accessed/permission operated the most. You’ll also see a list of their titles and people performing the operations. Note that if the doc was shared ‘anyone with the link can view’, the email of the viewer is left blank.
| select(event[drive][visibility],
    event[drive][doc_id], *)
| filter(visibility = var("shared_externally"))
| select(count(*),
    doc_id,
    event[drive][doc_type],
    doc_titles:ARRAY_UNIQ(ARRAY_AGG(event[drive][doc_title])),
    visibility,
    event[drive][owner],
    emails:ARRAY_UNIQ(ARRAY_AGG(email))
    )
| group(doc_id)
| sort(count DESC)
| limit(1000)

Say you didn’t like the results of this query and saw a file with an alarming title shared externally. Unfortunately, there isn’t a good way to get more information about its contents or location. The best option is to enter the doc title into a search bar in your drive but if you're not auhtorized to access the file (superadmins != total access), the best option is just to have a conversation with its owner. If this is not possible, the most radical option is to suspend the user and transfer their data to yourself. To do this, go to Google Admin Console > Apps > Google Workspace > Drive and Docs. 

What about the access log of this confidential document, i.e. is our confidential information now viral? All actions, including external views, do get logged but if the document has been shared with ‘anyone with the link can view or edit’, there isn’t’ much you can learn about those views. There's a timestamp but no IP-address, no email (even if the external user was logged in to Google in their domain). However if the file was (additionally) shared with a specific user, their email address will show up.

4. Calendar logs - finally some external IPs

The only Google Workspace app to log external IP addresses is Google Calendar. You’ll see the IP from people who have sent a calendar invite. As SpectX can directly work with Maxmind to calculate the geolocation of an IP, you can take a look at the geolocations and AS names of calendar inviters with this query:
@[./view/view_data.sx] (from:now()[-7 day], to:now(), app:'calendar') 
| filter_out(email contains "spectx.com")
| filter_out(ipAddress IS NULL)
| select(ipAddress, cc(ipAddress), asname(ipAddress), geo(ipAddress), *)

5. What about Gmail logs?

Who sent / received a particular email, was it delivered or detained, opened - these records are available via a different API not currently supported by SpectX. You can however get basic email stats from the reports - how many emails were sent/received, how much spam included?

@[./view/view_data.sx](from:now()[-2 day], to:now(), app:'user')        // Change from, to and app values.

| select(
        email,
        total_recived:               SUM_LONG(LONG(gmail[num_emails_received])),
        total_exchanged:             SUM_LONG(LONG(gmail[emails_exchanged])),
        total_emails_sent:           SUM_LONG(LONG(gmail[num_emails_sent])),
        total_spam_emails_received:   SUM_LONG(LONG(gmail[num_spam_emails_received]))
)
| group(email)

To query the detailed email log, go to Google admin console > Reports > Email Log Search. Note that admins conducting  search here get logged and you can keep an eye on admins becoming too curious with this query. There is one exception though -  searches on the subject field will not appear in the logs.

 @[./view/view_data.sx] (from:now()[-2 day], to:now(), app:'admin')   // Change from, to and app values.
| filter(eventName = "EMAIL_LOG_SEARCH")
| select(time,
    ipAddress,
    email,
    sender:event[admin][EMAIL_LOG_SEARCH_SENDER],
    recipient:event[admin][EMAIL_LOG_SEARCH_RECIPIENT],
    *)

Conclusion

Google Workspace logs are well documented and easy to work with. At the same time, the logs are quite sanitized so there’s not too much to be found about external users accessing your data. The lag times can be annoying. For example, when we experimented with calendar logs, it took a day for an action to appear in the logs. Other than that, Google Workspace logs are a rich dataset ready to reveal useful facts.


Back to articles