Grafana & BigQuery
This guide aims to help you get up and running making dashboards in Grafana with BigQuery as a data source. It won't go into detail on using Grafana itself, which is an entirely separate topic. Also, it's possible your data is entirely different from the examples here. Hopefully, the specifics don't matter as much as the broader takeaways.
Avoid the query builder
Grafana includes a third-party, unofficial BigQuery plugin which lets us integrate BigQuery as a datasource. However, its bundled query builder isn't great for building useful and efficient queries. I'd write a list of reasons but hopefully it'll be evident from the tips below. So, do not use the query builder. Click "Edit SQL" along the bottom and use good ol' SQL.
📅 Note the date of this post. It's possible the BigQuery plugin has changed and improved.
Data structure
For the examples below, we'll be using a table called app.website.page_views which has the columns:

- created_at (timestamp)
- os_name (string)
- browser_name (string)
- user_name (string)
- x_forwarded_for (JSON array of strings)
- status_code (number)
And a table called app.auth.users which has the columns:

- user_name (string)
- group_name (string)
Query structure
Here's an example of the sort of queries we'll be writing. This one shows the number of unique views for each browser. We'll get into the details next.
SELECT
TIMESTAMP_MILLIS(DIV(UNIX_MILLIS(`created_at`), ${__interval_ms} * 20) * ${__interval_ms} * 20) AS time,
`browser_name` AS metric,
COUNT(DISTINCT `user_name`) AS total
FROM
`app.website.page_views` views
WHERE
`_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
AND `created_at` BETWEEN TIMESTAMP_MILLIS(${__from}) AND TIMESTAMP_MILLIS(${__to})
GROUP BY `time`, `browser_name`
ORDER BY `time` DESC, `browser_name`
LIMIT 10000
Time series columns
Every time series query we build needs 3 resulting columns: time, key, value. It doesn't matter what we name them as long as the types are valid. It's best to be consistent so that we can copy-and-paste queries and value mappings later. We'll use time, metric and total to avoid clashing with SQL keywords.
Time and bucket size
The time column is the most important part of the query. It limits the number of datapoints. The division and multiplication essentially groups each datapoint into the last bucket. In the example above, we scale the buckets by 20 to make the buckets larger and avoid too many datapoints. This is the bucket_size. We can use Grafana's Custom Variables to make this configurable on-the-fly.
1 0 3 1 3 1 2 1 0 2 0 2   # raw `created_at` datapoints
  |     |     |     |     # bucket_size = 3
  4     5     3     4     # bucketed `time` datapoints
      |           |       # bucket_size = 6
      9           7       # bucketed `time` datapoints
TIMESTAMP_MILLIS(DIV(UNIX_MILLIS(`created_at`), ${__interval_ms} * 20) * ${__interval_ms} * 20) AS time
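The same flooring arithmetic can be sketched outside SQL. Here's a hypothetical Python equivalent of the DIV-then-multiply expression above, where interval_ms stands in for Grafana's ${__interval_ms}:

```python
# Snap millisecond timestamps to bucket boundaries, mirroring
# TIMESTAMP_MILLIS(DIV(UNIX_MILLIS(created_at), bucket) * bucket).
interval_ms = 60_000          # stand-in for ${__interval_ms}
bucket_ms = interval_ms * 20  # scaled bucket size (20 minutes here)

def to_bucket(unix_millis: int) -> int:
    # Integer division floors, so multiplying back gives the bucket start.
    return (unix_millis // bucket_ms) * bucket_ms

# Two timestamps inside the same 20-minute window share a bucket.
print(to_bucket(1_700_000_500_000))  # 1700000400000
print(to_bucket(1_700_001_500_000))  # 1700000400000
```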
Bucketing by day, month, year
To avoid all the maths, we can use TIMESTAMP_TRUNC instead. It's less flexible (you won't get adjusted granularity based on the panel's size), but it's a lot simpler.
TIMESTAMP_TRUNC(`created_at`, DAY) AS time
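If it helps to see the truncation spelled out, here's a rough Python analogue of what TIMESTAMP_TRUNC(created_at, DAY) does to each timestamp:

```python
from datetime import datetime, timezone

def trunc_day(ts: datetime) -> datetime:
    # Zero out everything below the date, like TIMESTAMP_TRUNC(ts, DAY).
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

ts = datetime(2021, 6, 15, 13, 37, 42, tzinfo=timezone.utc)
print(trunc_day(ts).isoformat())  # 2021-06-15T00:00:00+00:00
```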
Data delay
It's worth mentioning there's a delay between sending data to BigQuery and it being available for querying. This may only be true for some tables and setups, but keep it in mind. Creating real-time dashboards may not be possible depending on your setup.
Global variables
The variables with the double underscores in the queries (like ${__variable}) are Grafana's global variables.
Partitioned tables
By limiting the query using _PARTITIONTIME, BigQuery is able to search within specific daily partition tables. This makes queries faster and cheaper to run as it doesn't need to scan the larger table. For example:

- With _PARTITIONTIME: 150MB
- Without _PARTITIONTIME: 5GB

It's a huge difference.
Note that because partition tables are aggregated periodically, we'll want to query from the period before our time range starts, since _PARTITIONTIME is the start of that period. Assuming that period is a day, this is why the query uses ${__from} - 86400000 a lot. Here's a visual explanation:
[ 1 ][ 2 ][ 3 ][ 4 ][ 5 ][ 6 ] # '[' indicates _PARTITIONTIME
[ time_range ] # without the day subtracted
[ 3 ][ 4 ][ 5 ] # [ 2 ]'s data is missing!
[ | time_range ] # with the day subtracted
[ 2 ][ 3 ][ 4 ][ 5 ] # [ 2 ]'s data is included.
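The same off-by-one-period logic can be sketched in Python (day-length partitions and made-up timestamps assumed):

```python
DAY_MS = 86_400_000

def partition_start(created_at_ms: int) -> int:
    # _PARTITIONTIME is the start of the day the row landed in.
    return (created_at_ms // DAY_MS) * DAY_MS

# Hypothetical time range starting at noon on day 5.
from_ms = 5 * DAY_MS + DAY_MS // 2
row_ms = from_ms + 3_600_000  # a row created one hour into the range

# The row's partition starts at midnight, before ${__from} ...
print(partition_start(row_ms) < from_ms)             # True
# ... but it's still covered once the filter is widened by one day.
print(partition_start(row_ms) >= from_ms - DAY_MS)   # True
```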
Controlling costs
Check out BigQuery's pricing structure to better understand the cost of your queries.
For more tips for controlling costs, see the best practices.
💸 Worried about the cost of your queries? Use the "Query Inspector" in Grafana to copy the actual query being sent to BigQuery. Then go to BigQuery's Web Interface, click "Compose New Query" and paste the query in. BigQuery will calculate how much data will be scanned on the top-right.
Override the default LIMIT
Annoyingly, the BigQuery plugin tries to be smart and sets a LIMIT by default which is usually too small. We'll need to override this by explicitly setting a top-level LIMIT. I usually use 10000 as it's good enough for every query. Be careful, as too many rows will slow down the dashboard and may even crash it.
If we let bucket_size be configurable, we'll probably want LIMIT to be configurable too, as increasing the number of points may require us to adjust the LIMIT.
Custom variables
The previous query was very simple. It has no filtering so users can't drill down into specific details without writing more queries themselves. Let's make it more customisable using Variables.
🐢 Make sure to limit the number of selected items in your variables. Using the IN operator on a large list will result in extremely slow queries.
To create variables, along the top of the dashboard, click the "Cog" icon and along the left choose "Variables".
Here's the same query as before but now using custom variables.
SELECT
TIMESTAMP_MILLIS(DIV(UNIX_MILLIS(`created_at`), ${__interval_ms} * ${bucket_size}) * ${__interval_ms} * ${bucket_size}) AS time,
`browser_name` AS metric,
COUNT(DISTINCT `user_name`) AS total
FROM
`app.website.page_views` views
WHERE
`_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
AND `created_at` BETWEEN TIMESTAMP_MILLIS(${__from}) AND TIMESTAMP_MILLIS(${__to})
AND `browser_name` IN (${browser_name})
AND `os_name` IN (${os_name})
AND `user_name` IN (${user_name})
GROUP BY `time`, `browser_name`
ORDER BY `time` DESC, `browser_name`
LIMIT ${limit}
bucket_size is a "Custom" type which provides fixed values of 1,5,10,20,40,80, defaulting to 20.
limit is a "Text box" type that takes any value, defaulting to 10000.
browser_name and os_name are dynamically generated using simple BigQuery queries.
SELECT DISTINCT `browser_name` as value
FROM
`app.website.page_views` views
WHERE
`_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
AND `browser_name` IS NOT NULL
ORDER BY `value`
Variables using other variables
user_name is a bit more complicated. As we often want to exclude our own data from queries, we want to only include users that aren't us. We can find this data using app.auth.users. We can also use this data to filter the dashboard on a group level rather than just a user level. So we're using a group_name variable to filter groups, which we then use to filter users in user_name.
group_name looks like:
SELECT DISTINCT REPLACE(`group_name`, '\'', '') as value
FROM
`app.website.page_views` views LEFT OUTER JOIN `app.auth.users` users ON views.user_name = users.user_name
WHERE
`_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
AND `browser_name` IN (${browser_name})
AND `os_name` IN (${os_name})
AND `group_name` IS NOT NULL
AND `internal` IS FALSE
ORDER BY `value`
user_name looks like:
SELECT DISTINCT views.user_name as value
FROM
`app.website.page_views` views LEFT OUTER JOIN `app.auth.users` users ON views.user_name = users.user_name
WHERE
`_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
AND `browser_name` IN (${browser_name})
AND `os_name` IN (${os_name})
AND `group_name` IN (${group_name})
ORDER BY value
Now whenever group_name is changed, the list of available user_names will refresh automatically.
Calculating rates
So far we've only displayed basic counts in our time series. To calculate metrics from our analytics events we'll need to run two queries.
A rate can be represented as specific_events / all_events. So, for each time series datapoint, we'll need to find the count of all_events at that given point in time and the count of specific_events, divide them, and get our rate as a number between 0 and 1.
We can use the WITH syntax to combine multiple queries into one. The example below generates a time series for view rates grouped by status code.
WITH
all_events AS (
SELECT
TIMESTAMP_MILLIS(DIV(UNIX_MILLIS(`created_at`), ${__interval_ms} * ${bucket_size}) * ${__interval_ms} * ${bucket_size}) AS time,
COUNT(DISTINCT views.user_name) AS total
FROM `app.website.page_views` views
WHERE
`_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
AND `created_at` BETWEEN TIMESTAMP_MILLIS(${__from}) AND TIMESTAMP_MILLIS(${__to})
GROUP BY `time`
LIMIT ${limit}
),
status_events AS (
SELECT
TIMESTAMP_MILLIS(DIV(UNIX_MILLIS(`created_at`), ${__interval_ms} * ${bucket_size}) * ${__interval_ms} * ${bucket_size}) AS time,
status_code,
COUNT(DISTINCT views.user_name) AS total
FROM `app.website.page_views` views
WHERE
`_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
AND `created_at` BETWEEN TIMESTAMP_MILLIS(${__from}) AND TIMESTAMP_MILLIS(${__to})
AND `status_code` IS NOT NULL
GROUP BY `time`, `status_code`
LIMIT ${limit}
)
SELECT
status_events.time,
status_events.status_code AS metric,
status_events.total / all_events.total AS total
FROM
status_events LEFT OUTER JOIN all_events ON status_events.time = all_events.time
ORDER BY status_events.time, status_events.status_code DESC
LIMIT ${limit}
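Outside SQL, the final join-and-divide step amounts to this (the per-bucket counts here are made-up numbers, not real data):

```python
# Per-bucket totals, as the two subqueries would produce them.
all_events = {0: 10, 1: 20}                        # time bucket -> users
status_events = {(0, 200): 8, (0, 404): 2, (1, 200): 15}

# Join on the time bucket, then divide to get a rate between 0 and 1.
rates = {
    (time, code): count / all_events[time]
    for (time, code), count in status_events.items()
}
print(rates[(0, 404)])  # 0.2
```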
Columns with multiple values
Do you have a column with multiple values which you want to count and aggregate individually? Use the UNNEST keyword to split them up. Below we're splitting up an x_forwarded_for column, which is a JSON array of IP address strings.
SELECT
TIMESTAMP_MILLIS(DIV(UNIX_MILLIS(`created_at`), ${__interval_ms} * 20) * ${__interval_ms} * 20) AS time,
`ip_address` AS metric,
COUNT(DISTINCT `user_name`) AS total
FROM
`app.website.page_views` views,
UNNEST(JSON_VALUE_ARRAY(`x_forwarded_for`)) ip_address
WHERE
`_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
AND `created_at` BETWEEN TIMESTAMP_MILLIS(${__from}) AND TIMESTAMP_MILLIS(${__to})
AND `ip_address` IS NOT NULL
GROUP BY `time`, `metric`
ORDER BY `time` DESC, `metric` DESC
LIMIT 10000
To better visualise what's happening, see below:
# Before UNNEST
Row 1 | [a, b]
Row 2 | [a, c]
# After UNNEST
Row 1 | a
Row 1 | b
Row 2 | a
Row 2 | c
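In Python terms, UNNEST over a JSON array column behaves roughly like this (the sample rows are made up):

```python
import json

# Each row is repeated once per element of its JSON array column,
# like UNNEST(JSON_VALUE_ARRAY(`x_forwarded_for`)).
rows = [
    {"user_name": "alice", "x_forwarded_for": '["10.0.0.1", "10.0.0.2"]'},
    {"user_name": "bob",   "x_forwarded_for": '["10.0.0.1"]'},
]

unnested = [
    (row["user_name"], ip)
    for row in rows
    for ip in json.loads(row["x_forwarded_for"])
]
print(len(unnested))  # 3
```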
Non-time series visualisations
For non-time series visualisations, we need to remove the time column from our queries. We just need a key and a value.
Pie charts, stats, gauges, etc.
For example, the query below can be used in a pie chart showing page views by browser.
SELECT
`browser_name` AS metric,
COUNT(DISTINCT `user_name`) AS total
FROM
`app.website.page_views` views
WHERE
`_PARTITIONTIME` BETWEEN TIMESTAMP_MILLIS(${__from} - 86400000) AND TIMESTAMP_MILLIS(${__to})
AND `created_at` BETWEEN TIMESTAMP_MILLIS(${__from}) AND TIMESTAMP_MILLIS(${__to})
GROUP BY `browser_name`
ORDER BY `browser_name`
LIMIT 100
Tables
We can also use any number of columns to create plain old tables. Just remember to limit the number of rows and columns. Never use SELECT * as it's an expensive operation. Be specific. Make sure you're also using Partitioned Tables and filtering by time range.
Use Field Overrides to customise the look of each column. This setting can be found along the right of Grafana.
Getting good at SQL
To transform data into whatever shape we need, always have BigQuery's SQL Reference at hand. It's worth skimming through to see what's possible. Aggregate Functions are especially helpful.
Conclusion
That's about everything I've encountered trying to use BigQuery in Grafana. As you can tell from the complexity, it's best to avoid using BigQuery directly in Grafana. But if it's your only viable option, it's doable.
Thanks for reading.