JSON for Lunch
It’s lunch time on Monday, and you want to go out to eat. Because you heard team-building is a thing somewhere, you ask around the office to see who wants to grab some grub. Cathy is in, Mike is in, Waleed and David and Sándra too. Where should we go?
Office social norms dictate that you should go somewhere that everyone will like. It would be risky to go Thai, because some people don’t like spicy food. You’ve been meaning to try that Salvadoran place, but you know someone in your group is going to resist cuisine from some place they’ve never heard of. Greek would be great too, but Sándra said she just had it this weekend. What to do?
Of course, you compromise. Rather than pick somewhere that most of you like, it would be a safer bet to just go to Applebee’s or Chili’s. The food is not going to offend anyone’s palate, but it probably won’t be memorable either. Risk aversion wins again.
JSON is Applebee’s. It’s the safe data format that everyone can agree on. Need to send data between a server and a browser? Just use JSON. Need a small configuration file? Just use JSON. Need to define authorization tokens? Just use JSON. Everyone supports it, everyone can read it, and everyone can write it.
Human Readable
JSON is human readable, and to an extent human writable. You can look at the contents of a JSON-encoded blob and get a pretty good idea of what’s inside. For example:
{"issuer":"joe",
"expiration":1300819380,
"http://example.com/is_root":true}
This is adapted from the example in the JWT RFC (RFC 7519). Pretty printing makes it easy to understand what this JWT payload contains. When debugging why something isn’t working, it’s crucial to be able to dig into the data.
The problem is that practicality eventually comes along and mucks everything up. JSON takes up a lot of space on the wire, so whitespace is trimmed and key names are shortened. Instead of human-readable data, you get quasi-human-readable data:
{"iss":"joe","exp":1300819380,"http://example.com/is_root":true}
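To put rough numbers on that trade-off, here is a small sketch that serializes the readable and the shortened versions of the payload and compares their UTF-8 byte counts. It assumes Node.js or a modern browser, where `TextEncoder` is available globally; the figures depend on the indentation you pick and are illustrative, not a benchmark.

```javascript
// The readable payload and the shortened one from above.
const readable = {
  issuer: "joe",
  expiration: 1300819380,
  "http://example.com/is_root": true,
};
const compacted = {
  iss: "joe",
  exp: 1300819380,
  "http://example.com/is_root": true,
};

// UTF-8 byte length of a string, which is what actually goes on the wire.
const byteLength = (s) => new TextEncoder().encode(s).length;

const sizes = {
  // Long key names, pretty-printed with two-space indentation.
  readablePretty: byteLength(JSON.stringify(readable, null, 2)),
  // Long key names, whitespace trimmed.
  readableMinified: byteLength(JSON.stringify(readable)),
  // Short key names, whitespace trimmed (the quasi-readable form).
  compactedMinified: byteLength(JSON.stringify(compacted)),
};

console.log(sizes);
```

Even in this tiny payload, shortening two key names saves 10 of 74 bytes, and that saving is repeated in every message you send.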
After this processing, the data is much harder to read. You can probably glean that “iss” means issuer and “exp” means expiration. However, add anything more and it starts to get taxing:
{"iss":"joe","exp":1300819380,"nbf":1300819380,"iat":130819380,
"http://example.com/is_root":true,"prms":{"own":[1,3,5],
"dnrm":[1,2,4,5,7]}}
Did you spot the error? If you didn’t, it should be clear with proper formatting:
{
  "iss": "joe",
  "exp": 1300819380,
  "nbf": 1300819380,
  "iat": 130819380,
  "http://example.com/is_root": true,
  "prms": {
    "own": [
      1,
      3,
      5
    ],
    "dnrm": [
      1,
      2,
      4,
      5,
      7
    ]
  }
}
The “iat” (“issued at”) time is missing a digit. Parsing the data and re-serializing it with the browser’s JSON.stringify() and an indentation argument makes the problem obvious. But if you have to run the data through a function to understand it, why bother picking a “human readable” format to begin with?
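For reference, the pretty-printed view above is what `JSON.stringify` produces when given an indentation argument; without one it emits the compact form. The sketch below assumes Node.js or a browser console. The range check at the end is a hypothetical example of the validation a typed schema would give you for free: it flags any time claim earlier than 2000-01-01T00:00:00Z (Unix time 946684800).

```javascript
// The compact payload from above, as it might arrive over the wire.
const rawPayload =
  '{"iss":"joe","exp":1300819380,"nbf":1300819380,"iat":130819380,' +
  '"http://example.com/is_root":true,"prms":{"own":[1,3,5],' +
  '"dnrm":[1,2,4,5,7]}}';

// Parse, then re-serialize with two-space indentation to pretty-print.
const claims = JSON.parse(rawPayload);
const pretty = JSON.stringify(claims, null, 2);
console.log(pretty);

// Hypothetical sanity check: these claims are seconds since the Unix epoch,
// so a value before 2000-01-01T00:00:00Z is almost certainly a typo.
const EARLIEST_PLAUSIBLE = 946684800;
const suspicious = ["exp", "nbf", "iat"].filter(
  (name) => typeof claims[name] !== "number" || claims[name] < EARLIEST_PLAUSIBLE
);
console.log("suspicious claims:", suspicious);
```

Only "iat" is flagged: 130819380 seconds after the epoch lands in 1974.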
Human Writable
Being able to use JavaScript’s object notation inside JavaScript is pretty convenient. The problem is that JSON is not JavaScript. When editing a JSON file by hand, it becomes annoying to keep the two straight. For example, what’s wrong with the following:
{
  db: "postgres",
  host: "db.internal.org",
  password: "foo",
  username: "bar"
}
Obviously, the keys need to be wrapped in double quotes, unlike in JavaScript.
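The difference is easy to demonstrate: text that is a perfectly good JavaScript object literal is rejected by `JSON.parse`. A minimal sketch, assuming Node.js or a browser console; error messages differ between engines, so the code only checks the error type. The regex that adds quotes is a quick illustration for this particular input, not a general converter.

```javascript
// The config from above, with unquoted keys as in a JavaScript object literal.
const unquoted = `{
  db: "postgres",
  host: "db.internal.org",
  password: "foo",
  username: "bar"
}`;

// Returns true if the text is valid JSON, false if JSON.parse rejects it.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // Anything other than a parse error is unexpected.
  }
}

// Wrap each leading key in double quotes: `  db:` becomes `  "db":`.
const quoted = unquoted.replace(/^(\s*)(\w+):/gm, '$1"$2":');

console.log(isValidJson(unquoted)); // false
console.log(isValidJson(quoted));   // true
```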
Say you need to add an additional field. What lines will you need to edit?
{
  "db": "postgres",
  "host": "db.internal.org",
  "password": "foo",
  "username": "bar",
  "timeout": 300
}
Add a comma to the end of the username line, and add the timeout line. Two lines are edited for what is effectively a one-line change. This sounds like a minor inconvenience, but it has real downsides.
- It’s really easy to miss. Several outages at Google have been caused by bad configuration pushes, and accidentally adding an extra comma is just as common. Unless you have validation all over the place, you WILL eventually screw this up.
- The blame history becomes wrong. Everyone becomes responsible for a line they didn’t really intend to edit, and a one-line change masks whoever added the previous line.
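Both comma mistakes are exactly what a parser catches, provided something actually parses the file before it ships. A minimal sketch of such a check, assuming Node.js or a browser; `checkConfig` is a hypothetical helper, not part of any library, and the config is trimmed to a few fields.

```javascript
// Hypothetical pre-deployment check: report whether a config parses as JSON.
function checkConfig(text) {
  try {
    JSON.parse(text);
    return "ok";
  } catch (err) {
    if (err instanceof SyntaxError) return "invalid";
    throw err;
  }
}

// The edit done right: a comma added to the username line.
const correct = `{
  "db": "postgres",
  "username": "bar",
  "timeout": 300
}`;

// Forgot to add the comma after "bar".
const missingComma = `{
  "db": "postgres",
  "username": "bar"
  "timeout": 300
}`;

// Left a trailing comma, which JavaScript allows but JSON does not.
const trailingComma = `{
  "db": "postgres",
  "username": "bar",
  "timeout": 300,
}`;

console.log(checkConfig(correct));       // ok
console.log(checkConfig(missingComma));  // invalid
console.log(checkConfig(trailingComma)); // invalid
```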
Additionally, what are the units for timeout? Seconds? Milliseconds? It isn’t clear just from the name. Without any types stronger than “number”, you’d have to look at the code that consumes it.
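Without a schema, one workaround is to put the unit in the key name and check the type by hand wherever the value is read. A minimal sketch; the `timeout_ms` key and the `readTimeoutMs` helper are hypothetical conventions, not something JSON provides.

```javascript
// Hypothetical loader: the unit lives in the key name ("timeout_ms"),
// and the type is checked by hand because JSON only knows "number".
function readTimeoutMs(config) {
  const value = config.timeout_ms;
  if (!Number.isInteger(value) || value < 0) {
    throw new TypeError(
      `timeout_ms must be a non-negative integer, got ${JSON.stringify(value)}`
    );
  }
  return value;
}

const config = JSON.parse('{"db": "postgres", "timeout_ms": 300}');
console.log(readTimeoutMs(config)); // 300

// A config that still uses the ambiguous "timeout" key is rejected
// instead of silently being read with the wrong unit.
try {
  readTimeoutMs(JSON.parse('{"db": "postgres", "timeout": 300}'));
} catch (err) {
  console.log(err.message);
}
```

Every consumer has to repeat this check, which is the cost of the format having no richer types than “number”.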
Trade-Offs
I realize, of course, that there are technical trade-offs when designing something, and JSON is a product of those decisions. However, the rest of the world seems to ignore them. Instead of using JSON for what it is good at, they have decided to use it for everything. I picked on the two merits of JSON above because they are what we supposedly gain by using it. What have we paid in return?
- Space-inefficient encoding, bloating response sizes.
- Inefficient CPU usage, spending far more cycles encoding and decoding than other formats.
- A forced compromise between readable key names and space efficiency.
- Lack of type information, so typos are only caught at deployment.
The last two points are solved when you separate the schema from the data. JSON mixes the two together (similar to XML), and forces you to make more trade-offs.
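To illustrate what separating the schema from the data buys, here is a toy positional encoding. It is a hypothetical sketch, not how Protobuf or any other real format works on the wire: sender and receiver share a list of field names ahead of time, so only the values are sent, and the names can be as long and descriptive as you like. It still uses JSON syntax for the values so that it runs anywhere.

```javascript
// Shared schema, agreed on by sender and receiver ahead of time.
// Descriptive names cost nothing on the wire because they are never sent.
const SCHEMA = ["issuer", "expirationSeconds", "notBeforeSeconds", "issuedAtSeconds"];

// Encode an object as an array of values in schema order.
function encode(obj) {
  return JSON.stringify(SCHEMA.map((field) => obj[field]));
}

// Decode by pairing each value with the field name at the same position.
function decode(text) {
  const values = JSON.parse(text);
  return Object.fromEntries(SCHEMA.map((field, i) => [field, values[i]]));
}

const token = {
  issuer: "joe",
  expirationSeconds: 1300819380,
  notBeforeSeconds: 1300819380,
  issuedAtSeconds: 1300819380,
};

const positional = encode(token);
console.log(positional); // ["joe",1300819380,1300819380,1300819380]
console.log(JSON.stringify(token).length, "characters with names,", positional.length, "without");
console.log(decode(positional).issuedAtSeconds); // 1300819380
```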
If your JSON is under 100 or so bytes, everything seems rosy. It’s easy to read, easy to write, and you can keep track of everything in your head. Once it grows beyond the size of your monitor, its weaknesses show.
Despite all this, JSON will continue to live on. Everyone supports it, so everyone keeps using it. It’s a safe data format to pick, one that everyone can agree on. We have given up our readability, our efficiency, and our safety.
Addendum
Warning: Bias! I work on gRPC as part of my day job. One of the major selling points of gRPC is that it is fast. Browse the web for a little while to see success stories of companies switching to gRPC and massively speeding up their systems. Faster systems = less money on computing resources = $$$.
Why are they so much faster? The main difference is that they switched from JSON to Protobuf, not that the transport layer is radically different. I’m not saying that Protobuf is what you should use, but it is leaps and bounds better than JSON. Even Protobuf’s text format is better, since it elides unnecessary punctuation. Do your research before picking a format!
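Part of the size difference comes from how numbers are encoded. JSON writes integers as decimal text, one byte per digit, while Protobuf encodes integer fields as base-128 varints: seven bits of the value per byte, with the high bit set on every byte except the last. Below is a minimal sketch of that varint scheme for non-negative integers, for illustration only; real Protobuf messages also put a tag in front of each field, and signed types have their own rules.

```javascript
// Encode a non-negative safe integer as a base-128 varint:
// low 7 bits first, with the high bit set on every byte except the last.
function encodeVarint(n) {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  const bytes = [];
  do {
    let byte = n % 128;
    n = Math.floor(n / 128);
    if (n > 0) byte |= 0x80; // More bytes follow.
    bytes.push(byte);
  } while (n > 0);
  return Uint8Array.from(bytes);
}

const expiration = 1300819380;
console.log(String(expiration).length);       // 10 bytes as JSON text
console.log(encodeVarint(expiration).length); // 5 bytes as a varint
console.log(Array.from(encodeVarint(300)));   // [172, 2], i.e. 0xAC 0x02
```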