jahed.dev

Always Log JSON

After years of writing log lines and trying to find a perfect format, I've come to a conclusion: Always log JSON.

Logging to JSON makes your code read more consistently, saves you time deciding how to format each message, and separates the concern of logging data from the concern of processing it for various use cases like debugging, generating reports and measuring performance.

I'll be using JavaScript as an example here, but my opinions apply to all languages.

What does that mean?

Logging to JSON means whenever you log something:

console.log("my message");

You instead log it as JSON.

console.log(JSON.stringify({ message: "my message" }));

// {"message":"my message"}

Of course, this looks messy inline, and it doesn't scale when you later want to refactor how you log. So use functions.

const log = (message) => console.log(JSON.stringify({ message }));
log("my message");

// {"message":"my message"}

You might be thinking: "Aren't we back to square one?" The answer is no. What we've done here is restricted how our application logs. We've provided an API.

Say we're logging something in an HTTP request handler, but we need some more context. Without JSON, we would've had to do something like

console.log(`${req.path} - my message`);

// /abc - my message

Instead, let's just do this:

const log = (message, data) => {
  console.log(JSON.stringify({ message, ...data }));
};

log("my message", { req });

// {"message":"my message","req":{"path":"/abc" ...}}

Assuming data can be serialised directly to JSON, we can now attach any amount of diagnostic information without worrying about how to present it. Presentation is left to the logger, which can be as simple or as complex as it needs to be.
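That assumption can fail: JSON.stringify throws on circular references, which are common in large objects like req. As a defensive sketch (safeStringify is a hypothetical helper, not part of any library), you can use a replacer to swap already-seen objects for a placeholder:

```javascript
// Hypothetical helper: replaces already-seen objects with a placeholder
// so circular data still produces valid JSON instead of throwing.
const safeStringify = (value) => {
  const seen = new WeakSet();
  return JSON.stringify(value, (key, val) => {
    if (typeof val === "object" && val !== null) {
      if (seen.has(val)) return "[Circular]";
      seen.add(val);
    }
    return val;
  });
};

const a = { name: "a" };
a.self = a; // circular reference
console.log(safeStringify(a));
// {"name":"a","self":"[Circular]"}
```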

Since you're now essentially passing data around, you're not limited to logging output to the console. You can use the same approach to wire in metric reporters, databases and notifications.

const reporters = [
  (event) => console.log(JSON.stringify(event)),
  (event) => storeInElasticSearch(event),
];
const log = (message, data) => {
  const event = { message, ...data };
  reporters.forEach((reporter) => reporter(event));
};

log("my message", { req });

// {"message":"my message","req":{"path":"/abc" ...}}
// POST https://my-elastic-search.com/event

The possibilities are endless.

How do you choose what to log?

Say we only ever want specific bits of req since it can be quite big and sensitive. We can add a concept of serialisers.

const reporters = [(event) => console.log(JSON.stringify(event))];

const serialisers = {
  req: (req) => ({ path: req.path }),
};

const getSerialiser = (key) => serialisers[key] || ((i) => i);

const toEvent = (message, data) =>
  Object.keys(data).reduce(
    (acc, key) => {
      acc[key] = getSerialiser(key)(data[key]);
      return acc;
    },
    { message }
  );

const log = (message, data) => {
  const event = toEvent(message, data);
  reporters.forEach((reporter) => reporter(event));
};

log("my message", { req });

// {"message":"my message","req":{"path":"/abc"}}

As you can see, we're slowly building up an API that defines a consistent approach to logging.

This looks complicated.

Maybe it does, but over time, it becomes much more predictable than log lines trying to awkwardly turn contextual data into human sentences.

You can always use an out-of-the-box solution like Bunyan or Pino for JavaScript. There are plenty of choices for most other languages too.

Why JSON?

JSON is supported by pretty much every popular tool and library. It's become a standard format for Web APIs, databases and developer tools. Most languages have a standard library which can parse it. It's also relatively human readable.
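That parseability is the whole point: querying historical logs becomes a few lines of ordinary code instead of regex surgery. A sketch, assuming one JSON event per line (the sample lines are made up for illustration):

```javascript
// Hypothetical log excerpt: one JSON event per line.
const lines = [
  '{"message":"my message","req":{"path":"/abc"}}',
  '{"message":"other message","req":{"path":"/xyz"}}',
];

// Parse each line back into data and filter it like any other data.
const events = lines.map((line) => JSON.parse(line));
const abcEvents = events.filter((event) => event.req.path === "/abc");

console.log(abcEvents.length);
// 1
```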

What about performance?

I would first try it out before worrying too much about performance. It's worth considering the balance between the time saved logging and querying data vs. the typically minor performance loss.
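If you do want numbers, measure on your own workload rather than guessing. A rough micro-benchmark sketch in Node; the payload and iteration count are arbitrary, and the results vary by runtime:

```javascript
// Rough micro-benchmark sketch: treat the numbers as a starting point,
// not a verdict. Timings depend heavily on runtime and payload shape.
const data = { message: "my message", path: "/abc", status: 200 };
const iterations = 100000;

const time = (fn) => {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i += 1) fn();
  return Number(process.hrtime.bigint() - start) / 1e6; // milliseconds
};

const plainMs = time(() => `${data.path} - ${data.message}`);
const jsonMs = time(() => JSON.stringify(data));

console.log({ plainMs, jsonMs });
```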

Isn't it kind of wasteful?

JSON does have a lot of quotes, commas and braces. But you're probably rotating your logs hourly and storing them using GZIP anyway, and GZIP is great at compressing repetition.

If your disk is filling up and increasing its capacity isn't an option, you could post the data to an external service like Elasticsearch.

What about developer tools?

Logging to JSON mostly shines in production environments, where you want to run queries on large sets of historical data. But it works just as well in development. For example, you can log JSON and pipe the output through another tool to make it more human readable. By separating these concerns, you can build each tool with whatever stack it needs and keep a single stack for presentation.
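For instance, a tiny prettifier for local development can parse each line and print just the parts a human wants. A sketch (prettify is a hypothetical helper):

```javascript
// Hypothetical prettifier: turn a JSON log line back into a short,
// human-readable sentence for local development.
const prettify = (line) => {
  const { message, ...data } = JSON.parse(line);
  const context = Object.entries(data)
    .map(([key, value]) => `${key}=${JSON.stringify(value)}`)
    .join(" ");
  return context ? `${message} (${context})` : message;
};

console.log(prettify('{"message":"my message","req":{"path":"/abc"}}'));
// my message (req={"path":"/abc"})
```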

What about tests?

Most test runners use an intermediate format between gathering results and outputting them. This lets them produce output for a variety of use cases: web pages, tables, IDE integrations, CI integrations and, of course, the console. What we've done in the example is no different. So look for any APIs your test framework provides for adding your own data.

Isn't this like Event Dispatching?

Yeah, it is. You could go one step further and log from event handlers instead of specific lines of code, separating your logging even further from your implementation and allowing other tools and features to hook into the same events in a pluggable way.

When isn't this suitable?

There are very rare circumstances where logging to JSON doesn't make sense. If a framework is really restrictive and prevents you from controlling the output, then it can't be helped. But I say, try it first and fall back to strings if all else fails.

Conclusion

Always log JSON.