Grafana & BigQuery

This guide aims to help you get up and running making dashboards in Grafana with BigQuery as a data source. It won't go into details on using Grafana itself, which is an entirely separate topic. Also, it's possible your data is entirely different from the examples here. Hopefully, the specifics don't matter as much as the broader takeaways.

Avoid the query builder

Grafana supports BigQuery as a data source through a third-party, unofficial plugin. However, its bundled query builder isn't great for building useful and efficient queries. I'd write a list of reasons, but hopefully they'll be evident from the tips below. So, do not use the query builder. Click "Edit SQL" along the bottom and use good ol' SQL.

📅 Note the date of this post. It's possible the BigQuery plugin has changed and improved.

Data structure

For the examples below, we'll be using a table called app.website.page_views which has the columns:

- created_at: when the page view happened
- user_name: the name of the user who viewed the page
- browser_name: the browser used
- os_name: the operating system used
- status_code: the HTTP status code of the response
- x_forwarded_for: a JSON array of IP address strings

And a table called app.auth.users which has the columns:

- user_name: the user's unique name
- group_name: the group the user belongs to
- internal: whether the user is one of us

Query structure

Here's an example of the sort of queries we'll be writing. This one shows the number of unique viewers for each browser. We'll get into the details next.

SELECT
  TIMESTAMP_MILLIS(DIV(UNIX_MILLIS(`created_at`), ${__interval_ms} * 20) * ${__interval_ms} * 20) AS time,
  `browser_name` AS metric,
  COUNT(DISTINCT `user_name`) AS total
FROM
  `app.website.page_views` views
WHERE
  `_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
  AND `created_at` BETWEEN TIMESTAMP_MILLIS(${__from}) AND TIMESTAMP_MILLIS(${__to})
GROUP BY `time`, `browser_name`
ORDER BY `time` DESC, `browser_name`
LIMIT 10000

Time series columns

Every time series query we build needs 3 resulting columns: time, key, value. It doesn't matter what we name them as long as the types are valid. It's best to be consistent so that we can copy-and-paste queries and value mappings later. We'll use time, metric and total to avoid clashing with SQL keywords.

Time and bucket size

The time column is the most important part of the query. It limits the number of datapoints. The division and multiplication essentially floors each timestamp to the start of its bucket. In the example above, we scale the buckets by 20 to make them larger and avoid returning too many datapoints. This multiplier is the bucket_size. We can use Grafana's Custom Variables to make it configurable on-the-fly.

103131210202  # raw `created_at` datapoints

|  |  |  |    # bucket_size = 3
4  5  3  4    # bucketed `time` datapoints

|     |       # bucket_size = 6
9     7       # bucketed `time` datapoints

TIMESTAMP_MILLIS(DIV(UNIX_MILLIS(`created_at`), ${__interval_ms} * 20) * ${__interval_ms} * 20) AS time
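
Here's the arithmetic with illustrative numbers, assuming ${__interval_ms} is 30000 (30 seconds), which gives a bucket width of 30000 * 20 = 600000 milliseconds (10 minutes):

UNIX_MILLIS(`created_at`)    = 1699999912345   # 22:11:52.345 UTC
DIV(1699999912345, 600000)   = 2833333
2833333 * 600000             = 1699999800000   # 22:10:00.000 UTC, the start of the bucket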

Bucketing by day, month, year

To avoid all the maths, we can use TIMESTAMP_TRUNC instead. It's less flexible since you won't get granularity that adjusts to the panel's size, but it's a lot simpler.

TIMESTAMP_TRUNC(`created_at`, DAY) AS time
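
The same works for coarser buckets, for example by month:

TIMESTAMP_TRUNC(`created_at`, MONTH) AS time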

Data delay

It's worth mentioning there's a delay between sending data to BigQuery and it being available for querying. This may only be true for some tables and setups, but keep it in mind. Creating real-time dashboards may not be possible depending on your setup.

Global variables

The variables with double underscores in the queries (like ${__variable}) are Grafana's global variables. Grafana substitutes them before sending the query to BigQuery.
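
The ones used throughout this guide are:

${__from}         # start of the dashboard's time range, in epoch milliseconds
${__to}           # end of the dashboard's time range, in epoch milliseconds
${__interval_ms}  # Grafana's suggested group-by interval for the panel, in milliseconds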

Partitioned tables

By limiting the query using _PARTITIONTIME, BigQuery is able to search within specific daily partitions. This makes queries faster and cheaper to run as it doesn't need to scan the entire table. It's a huge difference.
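
As a rough sketch (the dates are placeholders and the actual savings depend on your table):

-- Scans the entire table:
SELECT COUNT(*) FROM `app.website.page_views`
WHERE `created_at` >= TIMESTAMP '2020-07-01'

-- Only scans the partitions from 2020-06-30 onwards:
SELECT COUNT(*) FROM `app.website.page_views`
WHERE `_PARTITIONTIME` >= TIMESTAMP '2020-06-30'
  AND `created_at` >= TIMESTAMP '2020-07-01'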

Note that because partitions are aggregated periodically, and _PARTITIONTIME marks the start of each period, we'll want to query from the period before our time range starts. Assuming that period is a day, this is why the queries use ${__from} - 86400000 (one day in milliseconds). Here's a visual explanation:

[ 1 ][ 2 ][ 3 ][ 4 ][ 5 ][ 6 ]   # '[' indicates _PARTITIONTIME

       [  time_range  ]          # without subtracting a day
          [ 3 ][ 4 ][ 5 ]        # [ 2 ]'s data is missing!

  [    |  time_range  ]          # with subtracting a day
     [ 2 ][ 3 ][ 4 ][ 5 ]        # [ 2 ]'s data is included.

Controlling costs

Check out BigQuery's pricing structure to better understand the cost of your queries.

For more tips on controlling costs, see BigQuery's best practices.

💸 Worried about the cost of your queries? Use the "Query Inspector" in Grafana to copy the actual query being sent to BigQuery. Then go to BigQuery's Web Interface, click "Compose New Query" and paste the query in. BigQuery will show how much data the query will scan in the top-right.

Override the default LIMIT

Annoyingly, the BigQuery plugin tries to be smart and sets a LIMIT by default which is usually too small. We'll need to override this by explicitly setting a top-level LIMIT. I usually use 10000 as it's good enough for every query. Be careful as too many rows will slow down the dashboard and may even crash it.

If we make bucket_size configurable, we'll probably want LIMIT to be configurable too, since smaller buckets produce more rows.

Custom variables

The previous query was very simple. It had no filtering, so users couldn't drill down into specific details without writing more queries themselves. Let's make it more customisable using Variables.

🐢 Make sure to limit the number of selected items in your variables. Using the IN operator on a large list will result in extremely slow queries.

To create variables, click the "Cog" icon along the top of the dashboard, then choose "Variables" along the left.

Here's the same query as before but now using custom variables.

SELECT
  TIMESTAMP_MILLIS(DIV(UNIX_MILLIS(`created_at`), ${__interval_ms} * ${bucket_size}) * ${__interval_ms} * ${bucket_size}) AS time,
  `browser_name` AS metric,
  COUNT(DISTINCT `user_name`) AS total
FROM
  `app.website.page_views` views
WHERE
  `_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
  AND `created_at` BETWEEN TIMESTAMP_MILLIS(${__from}) AND TIMESTAMP_MILLIS(${__to})
  AND `browser_name` IN (${browser_name})
  AND `os_name` IN (${os_name})
  AND `user_name` IN (${user_name})
GROUP BY `time`, `browser_name`
ORDER BY `time` DESC, `browser_name`
LIMIT ${limit}

bucket_size is a "Custom" type which provides fixed values of: 1,5,10,20,40,80, defaulting to 20.

limit is a "Text box" type that takes any value, defaulting to 10000.

browser_name and os_name are dynamically generated using simple BigQuery queries. For example, browser_name:

SELECT DISTINCT `browser_name` AS value
FROM
  `app.website.page_views` views
WHERE
  `_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
  AND `browser_name` IS NOT NULL
ORDER BY `value`

Variables using other variables

user_name is a bit more complicated. We often want to exclude our own data from queries, so we only want to include users that aren't us. We can find this data using app.auth.users. We can also use this data to filter the dashboard at a group level rather than just a user level. So we're using a group_name variable to filter groups, which we then use to filter users in user_name.

group_name looks like:

SELECT DISTINCT REPLACE(`group_name`, '\'', '') AS value
FROM
  `app.website.page_views` views LEFT OUTER JOIN `app.auth.users` users ON views.user_name = users.user_name
WHERE
  `_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
  AND `browser_name` IN (${browser_name})
  AND `os_name` IN (${os_name})
  AND `group_name` IS NOT NULL
  AND `internal` IS FALSE
ORDER BY `value`

user_name looks like:

SELECT DISTINCT views.user_name AS value
FROM
  `app.website.page_views` views LEFT OUTER JOIN `app.auth.users` users ON views.user_name = users.user_name
WHERE
  `_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
  AND `browser_name` IN (${browser_name})
  AND `os_name` IN (${os_name})
  AND `group_name` IN (${group_name})
  AND `internal` IS FALSE
ORDER BY `value`

Now whenever group_name is changed, the list of available user_names will refresh automatically.

Calculating rates

So far we've only displayed basic counts in our time series. To calculate rates from our analytics events, we'll need to combine two queries.

A rate can be represented as specific_events / all_events. So, for each time series datapoint, we'll need to find the count of all_events at that given point in time and the count of specific_events, then divide them to get our rate as a number between 0 and 1 (for example, 25 / 200 = 0.125).

We can use the WITH syntax to combine multiple queries into one. The example below generates a time series for view rates grouped by status code.

WITH
  all_events AS (
    SELECT
      TIMESTAMP_MILLIS(DIV(UNIX_MILLIS(`created_at`), ${__interval_ms} * ${bucket_size}) * ${__interval_ms} * ${bucket_size}) AS time,
      COUNT(DISTINCT views.user_name) AS total
    FROM `app.website.page_views` views
    WHERE
      `_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
      AND `created_at` BETWEEN TIMESTAMP_MILLIS(${__from}) AND TIMESTAMP_MILLIS(${__to})
    GROUP BY `time`
    LIMIT ${limit}
  ),
  status_events AS (
    SELECT
      TIMESTAMP_MILLIS(DIV(UNIX_MILLIS(`created_at`), ${__interval_ms} * ${bucket_size}) * ${__interval_ms} * ${bucket_size}) AS time,
      status_code,
      COUNT(DISTINCT views.user_name) AS total
    FROM `app.website.page_views` views
    WHERE
      `_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
      AND `created_at` BETWEEN TIMESTAMP_MILLIS(${__from}) AND TIMESTAMP_MILLIS(${__to})
      AND `status_code` IS NOT NULL
    GROUP BY `time`, `status_code`
    LIMIT ${limit}
  )
SELECT
  status_events.time,
  status_events.status_code AS metric,
  status_events.total / all_events.total AS total
FROM
  status_events LEFT OUTER JOIN all_events ON status_events.time = all_events.time
ORDER BY status_events.time, status_events.status_code DESC
LIMIT ${limit}

Columns with multiple values

Do you have a column with multiple values which you want to count and aggregate individually? Use the UNNEST keyword to split them up. Below we're splitting up an x_forwarded_for column which contains a JSON array of IP address strings.

SELECT
  TIMESTAMP_MILLIS(DIV(UNIX_MILLIS(`created_at`), ${__interval_ms} * 20) * ${__interval_ms} * 20) AS time,
  `ip_address` AS metric,
  COUNT(DISTINCT `user_name`) AS total
FROM
  `app.website.page_views` views,
  UNNEST(JSON_VALUE_ARRAY(`x_forwarded_for`)) ip_address
WHERE
  `_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
  AND `created_at` BETWEEN TIMESTAMP_MILLIS(${__from}) AND TIMESTAMP_MILLIS(${__to})
  AND `ip_address` IS NOT NULL
GROUP BY `time`, `metric`
ORDER BY `time` DESC, `metric` DESC
LIMIT 10000

To better visualise what's happening, see below:

# Before UNNEST

Row 1 | [a, b]
Row 2 | [a, c]

# After UNNEST

Row 1 | a
Row 1 | b
Row 2 | a
Row 2 | c
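
To see JSON_VALUE_ARRAY and UNNEST in isolation, here's a standalone query with a hard-coded value (the IP addresses are just placeholders) that can be run in the BigQuery console:

SELECT ip_address
FROM UNNEST(JSON_VALUE_ARRAY('["1.2.3.4", "5.6.7.8"]')) ip_address
-- Returns two rows: '1.2.3.4' and '5.6.7.8'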

Non-time series visualisations

For non-time series visualisations, we need to remove the time column from our queries. We just need a key and a value.

Pie charts, stats, gauges, etc.

For example, the query below can be used in a pie chart showing unique viewers by browser.

SELECT
  `browser_name` AS metric,
  COUNT(DISTINCT `user_name`) AS total
FROM
  `app.website.page_views` views
WHERE
  `_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
  AND `created_at` BETWEEN TIMESTAMP_MILLIS(${__from}) AND TIMESTAMP_MILLIS(${__to})
GROUP BY `browser_name`
ORDER BY `browser_name`
LIMIT 100

Tables

We can also use any number of columns to create plain old tables. Just remember to limit the number of rows and columns. Never use SELECT * as it's an expensive operation. Be specific. Make sure you're also using Partitioned Tables and filtering by time range.
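
Here's a minimal sketch reusing the page_views columns from earlier:

SELECT
  `created_at`,
  `user_name`,
  `browser_name`,
  `status_code`
FROM
  `app.website.page_views` views
WHERE
  `_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
  AND `created_at` BETWEEN TIMESTAMP_MILLIS(${__from}) AND TIMESTAMP_MILLIS(${__to})
ORDER BY `created_at` DESC
LIMIT 100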

Use Field Overrides to customise the look of each column. This setting can be found along the right of Grafana.

Getting good at SQL

To transform data into whatever shape we need, always have BigQuery's SQL Reference at hand. It's worth skimming through it to see what's possible. Aggregate Functions are especially helpful.

Conclusion

That's about everything I've encountered trying to use BigQuery in Grafana. As you can tell by the complexity, it's best to avoid using BigQuery directly in Grafana. But if it's your only viable option, it's doable.

Thanks for reading.