When discussing the merits of HTTP/2 over HTTP/1.1 (hereafter H2 and H1, respectively), it is important to understand why H2 came about at all. H1 has served us well for many years, but it is starting to come apart at the seams, and the fixes commonly proposed for it are not going to work long term. Occasionally people still raise such fixes (in fact, I was one such person), so I thought I would write down why they don’t work.
Problem: Only one request can be active on an H1 connection at a time.
Non-solution: Enable Pipelining.
Pipelining is the idea that multiple requests can be sent one after another without waiting for the first response. The cost of waiting a round trip time (RTT) between each request is eliminated and web pages load faster.
Why is pipelining not a solution? The typical answer is that it breaks some web servers, but that is just a side reason. Pipelining still requires in-order responses: the server must answer requests in the order it received them, so it has no way to send whichever response happens to be ready first.
Suppose that you request a dog.png, bird.gif, and cat.jpg from a server. With pipelining, the server must send dog.png first. Suppose further that the server has a cached copy of cat.jpg ready to send, but dog.png has to be fetched from another backend. The high latency cost of serving dog.png will be added to serving cat.jpg.
Keeping with the example above, the server then tries to send bird.gif. Suppose it is a large animated GIF, much bigger than the JPEG. It will take a long time to send the whole GIF before cat.jpg can even begin to arrive. A large response therefore adds latency to every later request as well. Even if the server could internally fetch resources in parallel, it still has to respond in order.
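To put rough numbers on the dog/bird/cat example, here is a minimal sketch. The timings are made up for illustration (they are my assumptions, not measurements), and the "ready first" scheduler is an idealized stand-in, not how any particular server works.

```python
# Hypothetical timings in milliseconds: (name, ready_at, send_time).
# ready_at  = when the server has the response in hand
# send_time = how long the response takes to transmit
REQUESTS = [
    ("dog.png", 2000, 100),   # slow backend fetch
    ("bird.gif", 0, 1500),    # ready immediately, but large
    ("cat.jpg", 0, 100),      # cached and small
]


def in_order_finish_times(requests):
    """Pipelining: responses go out strictly in request order."""
    clock = 0
    finished = {}
    for name, ready_at, send_time in requests:
        # The connection sits idle until this response is ready,
        # even if later responses could already be sent.
        clock = max(clock, ready_at) + send_time
        finished[name] = clock
    return finished


def ready_first_finish_times(requests):
    """Idealized alternative: send whichever response is ready first."""
    clock = 0
    finished = {}
    # Sort by readiness, then by size, and send one response at a time.
    for name, ready_at, send_time in sorted(requests, key=lambda r: (r[1], r[2])):
        clock = max(clock, ready_at) + send_time
        finished[name] = clock
    return finished


print(in_order_finish_times(REQUESTS))
print(ready_first_finish_times(REQUESTS))
```

With these made-up numbers, cat.jpg finishes at 3700 ms under in-order delivery and at 100 ms when the server may send what is ready first, while dog.png finishes at 2100 ms either way.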
Lastly, a whole class of use cases is not enabled by pipelining. Streaming use cases, like a stock ticker, chat room, or push notifications, don’t work with pipelining. Other requests would never be fulfilled, since they are waiting on a long-lived request to finish.
Non-solution: Use multiple connections.
Loading resources concurrently over multiple connections is what clients actually do today. Since pipelining is disabled on most HTTP clients, they instead open up multiple connections to the same server. Requests can then be sent in parallel, which speeds up loading of web pages.
But this, too, is a non-solution. Clients typically limit the number of connections per server to fewer than 10 (browsers commonly use 6) because TCP connections are relatively expensive. They are expensive to make, expensive to maintain, and are even a security risk.
A new TCP connection starts off sending very slowly. This is known as “slow start”, and it is designed to avoid overloading the routers between the client and the server. Each time more data is acknowledged by the remote endpoint, more data can be sent out. This allows the connection to discover the rate at which it can send data.
In the case of making multiple TCP connections, each of these pays this initial cost. Rather than using a single connection which more quickly rises to full speed, lots of “cold” connections each go through the process of speeding up. This hurts initial load time.
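A rough way to see what a “cold” connection costs is to count round trips. The sketch below is an idealized model under assumptions of my own: the window starts at 10 segments, doubles after every round trip, and there is no packet loss, no handshake cost, and no receiver limit. Real TCP stacks are more involved.

```python
def round_trips_to_send(total_segments, initial_window=10):
    """Count round trips needed to send total_segments under idealized slow start.

    Assumes the window doubles after every fully acknowledged round trip
    and that nothing is ever lost.
    """
    window = initial_window
    sent = 0
    round_trips = 0
    while sent < total_segments:
        sent += window      # send a full window this round trip
        window *= 2         # every segment was acknowledged, so the window doubles
        round_trips += 1
    return round_trips


# A 100-segment response on a cold connection (window starts at 10).
print(round_trips_to_send(100))

# The same response on a warm connection whose window has already grown to 160.
print(round_trips_to_send(100, initial_window=160))
```

In this model the cold connection needs 4 round trips (10, 20, 40, then 80 segments), while the already-warm connection needs 1. Every freshly opened connection starts from the cold case.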
Even when the connections are up to speed, there is no way to describe which of them should have the highest priority. Serving the CSS and JS for a webpage is more important than serving the tracking code or ads. A client cannot say “please give me the resources that are blocking rendering first.” Even if it somehow could do this, it is up to the OS to multiplex and prioritize the connections, not the client. Even if both the client and the OS were cooperating, all the routers and proxies on the way to the server will do the actual prioritization.
Consider a typical use case of a home internet router that is doing NAT. Such routers are notoriously bad at timesharing lots of TCP connections. Consider another typical use case of a cell phone over a mobile network. The latency is very high, the bandwidth is limited, and there is no way to describe how best to share the antenna. Lastly, consider a load-balancing reverse proxy. Receiving 100 connections at once would look more like an attack than a well-behaved client. The proxy will simply drop most of the connections since it already has thousands of other connections that need to be managed.
Solution: use a new protocol.
Expensive as it may be, introducing a new protocol to solve the above issues is tenable. H2 effectively takes the reins of multiplexing requests into its own hands. Interleaving and reordering requests and responses allows H2 to give control to the application over how to optimize web page loading. Reusing a single TCP connection reduces load on the OS and intermediate proxies.
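The interleaving idea can be sketched in a few lines. This is a toy model, not the real H2 wire format: the 4-byte frame size, the `(stream_id, chunk)` tuples, and the round-robin scheduler are all assumptions I picked to keep the example small. The odd stream IDs mirror how H2 numbers client-initiated streams.

```python
from collections import defaultdict
from itertools import zip_longest

FRAME_SIZE = 4  # hypothetical; tiny so the interleaving is easy to see


def to_frames(stream_id, body):
    """Cut one response body into (stream_id, chunk) frames."""
    return [(stream_id, body[i:i + FRAME_SIZE]) for i in range(0, len(body), FRAME_SIZE)]


def interleave(responses):
    """Put frames from every stream onto one connection, round-robin."""
    per_stream = [to_frames(stream_id, body) for stream_id, body in responses.items()]
    wire = []
    for batch in zip_longest(*per_stream):
        # Streams that have run out of frames contribute None; skip those.
        wire.extend(frame for frame in batch if frame is not None)
    return wire


def reassemble(wire):
    """Receiver side: group chunks by stream ID to rebuild each body."""
    bodies = defaultdict(bytes)
    for stream_id, chunk in wire:
        bodies[stream_id] += chunk
    return dict(bodies)


responses = {1: b"dog-dog-dog-", 3: b"cat!", 5: b"bird-bird-"}
wire = interleave(responses)
print(wire)
print(reassemble(wire) == responses)
```

Here the small response on stream 3 is complete after the second frame on the wire, even though stream 1 was requested first and is still being sent. A smarter scheduler could weight streams by priority instead of going round-robin.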
It also enables use cases that either didn’t exist or had serious drawbacks on H1. Pretty much all forms of server streaming previously consumed a whole connection. Client-side streaming didn’t exist. Bidirectional streaming was not possible. The only real solution was to use WebSockets, which still have the same head-of-line blocking problem as before.
I am not saying that H2 is without its flaws. There are problems with it too, but it does bring something to the table that H1 simply cannot. Try any of the demos that compare H2 and H1 loading, and play with your browser settings to get H1 to be as fast. H2 is fast by default, solves real-world problems, and is being widely rolled out.
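Note: Keeping H1 alive for the sake of backwards compatibility is not needed, as H2 was designed specifically to be backwards compatible: it keeps the same HTTP semantics (methods, status codes, headers) and changes only how they are carried over the connection.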