Testing & Tweaking HTTP/3

HTTP/3

My latest configuration tweaks (besides adding a little more security here and there) have been related to... HTTP/3. Aye, that's not a typo: HTTP/3. Some of you will have noticed the increasingly wide support for the previous major HTTP protocol update, HTTP/2, which has been around for a while, and has been the only update to the ancient HTTP/1.1 (introduced in 1997, reformulated in 2014). You may wonder why so many versions were necessary. Well, up to HTTP/1.0, you had to establish one connection for each resource you wished to download. Because back then web pages were little more than a handful of 'raw' HTML, this would suffice. Establishing connections over port 80 or 443 — using TCP/IP — is a relatively costly (and thus slow) operation, but it would be fine anyway, since the actual download of a single 'pure' HTML page would take way longer than establishing the connection.

As webpages started to be filled with images, styled with CSS, and made interactive with JavaScript, things started getting more and more complicated to download, with both browsers and web servers having to establish lots of new connections, download the files, close the connections, and reopen them again for the next batch of files. Because the available Internet bandwidth also grew exponentially, web designers continued to make their pages increasingly complex — and take even more time to download. Thus, HTTP/1.1 was introduced, allowing browsers and servers to communicate through a single TCP/IP connection, which could be 'kept alive' for a while, and allow all sorts of content to be funneled through that connection, as opposed to constantly opening and closing connections. That's why HTTP/1.1 became so successful and why it hasn't been changed since, literally, the last century (note that HTTPS connections will also use the HTTP/1.1 'pipe'). About half of the websites in the world still use it, and practically all browsers in existence (even old, text-based ones!) support HTTP/1.1.
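As a small illustration, here is a hedged sketch (Python, standard library only; example.com is just a placeholder host) of what HTTP/1.1 connection reuse looks like from a client's point of view: two requests travelling over a single TCP connection instead of one connection per resource.

    import http.client

    # HTTP/1.1 keeps the TCP connection alive by default, so a second request can
    # reuse it, as long as the previous response body has been fully read.
    conn = http.client.HTTPSConnection("example.com", timeout=5)

    conn.request("GET", "/")
    first = conn.getresponse()
    first.read()                      # drain the body so the connection can be reused
    print("first request:", first.status)

    conn.request("GET", "/")          # no new TCP handshake needed here
    second = conn.getresponse()
    second.read()
    print("second request:", second.status, "(same TCP connection)")

    conn.close()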

That certainly improved things, but only up to a limit. The 'funneling' through a single pipe certainly saved resources, but... what if the pipe got 'clogged', in the sense that there was too much data to be transferred, or one side was not able to deal with it? Web servers, these days, are mostly threaded (or use similar methods) to allow 'slower' connections to still be 'kept alive', while continuing to accept requests from other clients. Again, this will work relatively well unless way too many 'slow browsers' are consuming all the resources...

Around 2009, one of Google's research teams started toying with a better way of queueing content via a single TCP/IP channel, using a protocol they named SPDY (aye, pronounced speedy). The idea was that you could sort of prioritise content travelling through the channel — imagine that a static HTML file might be quick to transmit, while images, movies, and eventually even complex CSS and JavaScript (which require the browser to do a lot of calculations to render a page) might take longer. SPDY included a lot of mechanisms to figure out how to send and retrieve that data more efficiently (on both sides of the connection); it also included nifty things like compressing the data being transmitted and/or encrypting it on-the-fly. Microsoft and others had also been toying with similar concepts, but eventually Google's SPDY protocol was adopted, with modifications, as the new standard HTTP/2, published in 2015. Although technically it should work with both unencrypted and encrypted streams, these days all contemporary browsers will only allow HTTP/2 over encrypted connections. Thanks to providers such as Let's Encrypt (where anyone can get a valid SSL certificate for free, for ever) as well as companies such as Cloudflare (they deploy — for free — a world-wide distributed front-end on top of existing servers providing HTTP/2 to external visitors, while allowing unencrypted HTTP/1.1 connections to the back-end servers, i.e. those being run by their clients), HTTP/2 caught on relatively well, supports way faster connections overall, allows much more reliable video streaming (among other benefits), and is safer, more secure, and wastes fewer network-related resources, at a small extra CPU cost on both sides of the stream — which is really negligible, even when serving vast amounts of data, as more and more CPU manufacturers include encrypting-decrypting/compressing-decompressing units in dedicated silicon, thus allowing such operations to be 'offloaded' to those specialised units.
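For the curious, here is a minimal sketch (Python standard library; www.google.com is merely an example of a host known to speak HTTP/2) of the negotiation side of this: the client offers HTTP/2 during the TLS handshake via the ALPN extension, and the server picks it if it can.

    import socket
    import ssl

    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2", "http/1.1"])     # offer HTTP/2, fall back to HTTP/1.1

    with socket.create_connection(("www.google.com", 443), timeout=5) as raw:
        with ctx.wrap_socket(raw, server_hostname="www.google.com") as tls:
            # 'h2' means the server agreed to speak HTTP/2 over this encrypted connection
            print("negotiated:", tls.selected_alpn_protocol())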

Around half the websites in the world use HTTP/2. On my PPP server, it's the default for all sites, even though almost all of them have the extra protection layer of Cloudflare on top (which provides HTTP/2 anyway). There is a noticeable performance difference when requesting content from the same website using either HTTP/1.1 or HTTP/2: HTTP/2 (always using encryption) will even beat unencrypted HTTP/1.1 connections by a wide margin. No wonder that it's quickly becoming the 'standard' in web communications.

Or perhaps not. Around 2012, researchers at Google started work on the next-generation Web protocol, which they called QUIC (you can see Google's naming trend, right? 😅). The problem, this time, is an order of magnitude greater. You see, TCP/IP is the most universally used transport protocol on the Internet (with the exception of DNS and a few other specialised utilities; see below), because, from a programmer's perspective, it's easy to write applications on top of it. TCP/IP guarantees that a connection is established. It automatically handles things like missing packets, retransmitting them if needed. Both sides of the connection are constantly verifying and validating that the connection is still 'up', and that packets are not coming out of order. Due to the nature of the Internet — its biggest asset! — IP (Internet Protocol) packets can take all sorts of routes from the sender to the receiver, navigating across the interconnected network (aye, that's where the name Internet comes from!), trying to find the 'optimum' path, which is not necessarily the geographically shortest one — since packets will try to avoid congested routes and search for alternatives when available. TCP, after all, stands for Transmission Control Protocol, and that's exactly what it does.

Here is a short overview of how this works (it's a gross oversimplification). TCP/IP requires that the sender knows the IP address of the destination and a port (for instance, 80 for HTTP or 443 for HTTPS). Ports under 1024 are standardised and can only be used for specific applications (in general, an 'application' is a client-server pair of software programmes which use a well-known port to communicate); above that, anyone can use them freely.

The sender first has to make sure that 'something' is listening on the other side before sending anything. This is known in the telecommunications industry as handshaking. So, the sender creates a packet with its own address (so that the receiver knows how to 'talk back') and a port, and pushes it into the Internet, starting a timer. Eventually, this packet will get to the receiver. If that happens, the receiver 'writes back' — i.e. it creates its own packet saying 'ok, I got your request to establish communications, please acknowledge this packet and then you can start communicating'. Let's assume that this packet reaches the sender before the timer reaches zero. The sender now knows that someone is listening, and that it's clear to send. So first it acknowledges that the communication is 'open' by sending another packet, saying 'ok, I'm acknowledging your packet, communications are now open, please stand by for data'. And then it starts to send packets, labelling them with sequential numbers, so that the receiver can figure out the correct order. During the handshaking, the two sides also agree upon a common 'connection number', so that each side can tell apart packets belonging to one communication or another. After all, the browser can communicate with several different web servers at the same time, while naturally a web server will need to accept different communications from hundreds, thousands, millions of simultaneous browsers, all of them connecting to ports 80 and/or 443 — so they use a special connection number for each communication (there is obviously a limit on each side, beyond which no further communications can be made simultaneously — one has to wait until some communications are closed to accept more).
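To make this concrete: from a programmer's point of view, the entire handshake just described hides behind a single connect() call, with the operating system taking care of the packets, timers, and acknowledgements. A minimal sketch in Python follows (example.com being just a placeholder); note that real TCP tells connections apart by the pair of addresses and ports rather than by an explicit 'connection number', which is a simplification I'm using above.

    import socket

    # connect() triggers the whole handshake: the request, the acknowledgement, and
    # the final 'communications are now open' packet, all handled by the kernel.
    with socket.create_connection(("example.com", 443), timeout=5) as s:
        local_addr, local_port = s.getsockname()[:2]
        remote_addr, remote_port = s.getpeername()[:2]
        # This (local address, local port, remote address, remote port) combination
        # is what lets both sides tell this communication apart from any other.
        print(f"connected from {local_addr}:{local_port} to {remote_addr}:{remote_port}")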

Assume that the sender wants to send packets #1, #2, and #3 to the receiver. It starts sending packet #1. If the receiver gets that, it sends an acknowledgement back saying, 'ok, #1 received, proceed to #2'. If that acknowledgement is properly received, the sender then sends packet #2, and so forth. Note that the receiver knows what packet to expect next: if it has just acknowledged packet #2745, it knows that the next one has to be #2746.

Eventually, the sender finishes sending everything, and then there is a different handshake to terminate the connection. The sender will just say something like, 'ok, I have no more packets, this is my last message'; the receiver acknowledges the message by sending back: 'fine with me, last message received, this is the last message you get from me, I'm closing connections now'. If the sender wishes to push a few more packets to the receiver, it has to start a new communication. It's as if the receiver has suffered from memory loss and forgotten all about what the sender has been sending. The new communication will be treated independently, using a new communication number, and from an outside perspective, it's not even obvious that the same client-server pair is communicating again.

Ok, let's now see how this works in a non-ideal scenario. In reality, things are not so easy, of course. Packets get lost; acknowledgements to packets get lost as well; or they are received out of order. For instance, imagine that things have been going rather well until packet #2745 is received; now the receiver expects #2746. But, instead, let's say that it receives #2745 again (there are several reasons for that actually happening). The receiver now has a few choices:

  1. Ignore the packet completely (since it's a duplicate) and pretend that nothing has happened;
  2. Discard the packet, but inform the sender that a duplicate was received;
  3. Compare the new #2745 with the old one; if both are exactly equal, do 1. If not... well, something went wrong, so there are now more choices:
    1. Discard both copies of #2745 and request a completely new one (what to do if this situation happens over and over again, i.e. the new packet is always different from the previous ones?...)
    2. Acknowledge the older #2745, and give an error message for the newer #2745; assume that the sender will figure out what kind of error has happened.
    3. If this suddenly starts happening over and over again, assume that the connection is not reliable, and drop it; let the sender establish a new one if they wish (this mostly means that if eventually #2746 finds its way to the receiver, it will be disregarded — usually, with some kind of error message saying that the communication is not valid any longer).

Complicated? Wait until we see the other, more frequent situation: packet loss.

In this scenario, we imagine a 'bad' connection (either a very noisy channel, or a very congested line with not enough bandwidth, or some bad router in-between which will occasionally drop packets, or... well, there are a billion good reasons for communications to fail. Remember that the Internet was designed in 1969 to resist a direct nuclear attack. Things are supposed to continue to work — even if not as well as before! — when that happens). So, the sender creates packets #1, #2, and #3, and starts sending #1. This is received, so the receiver sends back the acknowledgement packet for #1 — let's write it as ACK#1. The sender receives ACK#1, so now both sides know that they can go ahead with further packets. The sender sends #2... but this fails to reach the receiver. So at this point the sender doesn't know if it will ever get an ACK#2, while the receiver is thinking 'ok, I got #1 — but did the sender receive my ACK#1? Or did they already send #2? But I received nothing...'

At this point we're in an indeterminate scenario, since both sides cannot agree any more on the packet sequence. The sender knows that #1 was sent and properly received, but doesn't know whether #2 was lost, or whether #2 arrived and it was ACK#2 that got lost on the way back. What should it do? Wait longer (the network might just be slow and the acknowledgement may be taking more time than usual)? Or send #2 again, and expect that it works this time? What if it doesn't? (let's assume that after trying a thousand times, something must be wrong, so the connection should be closed). Or assume that #2 was received, and, not to waste time, send #3, and expect that ACK#3 is sent back? (and possibly ACK#2 as well?)

On the side of the receiver, there are also several issues. Packet #1 was received... but nothing else since then. So, should the receiver send ACK#1 again — assuming that it was 'lost'? Or ask for #2 to be retransmitted, informing the sender that it didn't receive anything after #1? Or wait longer?

What exactly happens is beyond my knowledge, but I can assume that there are different strategies, and that possibly one can establish a TCP/IP connection with different parameters to coordinate the strategy. For instance, imagine that the sender has no trouble sending packets, but has some difficulty in receiving packets. If this asymmetry is known before the communication is established, the sender may say, 'look, I will be a bit slow when receiving your ACKs, so please wait longer for my next packets' or even 'because I'm slow in receiving your ACKs, please only send one ACK for every ten packets I send'.
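Since I can only guess at the details, here is a toy sketch of the simplest possible strategy, usually called 'stop and wait': send one packet, wait for its acknowledgement, retransmit if nothing comes back, and let the receiver ignore (but re-acknowledge) duplicates. Real TCP uses sliding windows, cumulative ACKs and adaptive timers, so treat this purely as an illustration of the reasoning above; the loss rate and payloads are made up.

    import random

    LOSS_RATE = 0.3    # made-up probability that a packet or an ACK gets lost

    def lossy(message):
        """Deliver a message, or lose it with probability LOSS_RATE."""
        return None if random.random() < LOSS_RATE else message

    class Receiver:
        def __init__(self):
            self.expected = 0
            self.delivered = []

        def on_packet(self, packet):
            if packet is None:
                return None                    # nothing arrived, nothing to acknowledge
            seq, payload = packet
            if seq == self.expected:
                self.delivered.append(payload) # new data: deliver it exactly once
                self.expected += 1
            # New or duplicate, acknowledge the packet just seen; a lost ACK gets
            # repaired by acknowledging the retransmission that follows it.
            return seq

    def send_all(payloads, receiver):
        for seq, payload in enumerate(payloads):
            while True:
                ack = lossy(receiver.on_packet(lossy((seq, payload))))
                if ack == seq:
                    break                      # acknowledged: move on to the next packet
                print(f"sender: no ACK#{seq}, retransmitting #{seq}")

    rx = Receiver()
    send_all(["packet one", "packet two", "packet three"], rx)
    print("receiver got, in order:", rx.delivered)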

The latter idea (acknowledging a whole batch of packets at once) is actually much closer to how the TCP/IP protocol works in reality. Because TCP segments are small (typically limited to around 1,500 bytes each by the underlying network, with bare ACKs being far smaller still), when transferring a lot of data (such as a large GIF!) it makes more sense to send a lot of packets before waiting for an ACK for all of them. In fact, this is a simple way for each side to figure out the 'best' transfer rate between both: they start sending a certain number of packets, see if they all get properly received, and, if that's the case, they might agree to transmit more packets the next time, and so forth, until an equilibrium is found: sending fewer packets each time is a waste of the available bandwidth but is 'safer'; sending too many packets (more than the other side can receive!) means missing a lot of ACKs and thus being forced to retransmit everything again (possibly sending fewer packets the next time). I'm not even going to address the issues of automatically fine-tuning the 'best' way to send packets — there are dozens of algorithms to deal with traffic congestion, high-bandwidth local networks, low-bandwidth international networks, asymmetric bandwidth (such as what most people have at home), very-high-bandwidth but very high latency (a satellite downlink), and so forth.
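If you're curious about the actual segment size negotiated on your own machine, here is a quick, platform-dependent probe: it assumes a system that exposes the TCP_MAXSEG socket option (Linux and macOS do; other platforms may not) and uses example.com as a stand-in destination.

    import socket

    # After the handshake, the kernel knows the maximum segment size (MSS) agreed
    # upon for this connection (typically around 1,400 to 1,460 bytes of payload).
    with socket.create_connection(("example.com", 80), timeout=5) as s:
        mss = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
        print("negotiated maximum segment size:", mss, "bytes")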

And I've not even touched the issue of dealing with communication errors inside packets. Suppose that we send packet #2345, but, when it arrives, instead of the expected (say) 1,460 bytes, it has only 1,000. What to do next? Ask for a retransmission of the whole packet, or just of the missing 460 bytes? What if those last bytes are also truncated? Wouldn't it make sense to ask for the whole packet instead? And what should be done if all the bytes are received but... during the transmission, there were errors inside them? (some 0s flipped to 1s and vice-versa) How does each side know that there were errors? Can some of these be fixed without asking for a retransmission? (The short answer is 'yes, up to a certain degree of errors'; the long answer requires learning about a gazillion different methods for error detection — 'checksums' — and error correction; a few of which are available for TCP/IP, while others have to be done at a higher layer).
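As a taste of the 'checksum' side of things, here is the classic Internet checksum (the ones'-complement sum used by the IP, TCP and UDP headers) written out in a few lines, enough to detect corruption (though not to correct it; error-correcting codes are a whole separate topic):

    def internet_checksum(data: bytes) -> int:
        """RFC 1071-style ones'-complement sum over 16-bit words."""
        if len(data) % 2:
            data += b"\x00"                             # pad to an even length
        total = 0
        for i in range(0, len(data), 2):
            total += (data[i] << 8) | data[i + 1]
            total = (total & 0xFFFF) + (total >> 16)    # fold the carry back in
        return ~total & 0xFFFF

    packet = b"some payload worth protecting"
    checksum = internet_checksum(packet)

    corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]  # flip a single bit
    print(internet_checksum(packet) == checksum)        # True  -- intact
    print(internet_checksum(corrupted) == checksum)     # False -- corruption detected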

And then we have other considerations to deal with: what about compression and decompression? Most compression algorithms only work well with a large amount of data; thus, compressing individual packets is rarely useful. On the other hand, if one communication is consistently sending, say, 20 packets at a time, wouldn't it be worth compressing them before sending them? How will this affect the sequence numbering? What if some of these packets get lost and/or corrupted? How can each side of the connection figure out how to 'restore' the compressed data?
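A quick experiment makes the batching point: compressing twenty small, similar 'packets' together beats compressing each one individually (the payloads below are made up for the sake of the example).

    import zlib

    packets = [f'{{"user": {i}, "action": "click", "page": "/products/{i}"}}'.encode()
               for i in range(20)]

    raw = sum(len(p) for p in packets)
    one_by_one = sum(len(zlib.compress(p)) for p in packets)
    as_a_batch = len(zlib.compress(b"".join(packets)))

    # Compressing each tiny packet individually barely helps (or even hurts);
    # compressing the whole batch exploits the redundancy between packets.
    print(f"raw: {raw} bytes, one by one: {one_by_one} bytes, as a batch: {as_a_batch} bytes")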

... and we could also start addressing encryption. To prevent things like packet sniffing, or spoofing, would it make sense to encrypt everything before sending each packet (or group of packets)? Or leave encryption to the 'upper layers' of the software and just send everything in 'clear text' at the TCP level? (the latter is usually what is implemented)

Quoting directly from Wikipedia:

TCP is complex.

Deliberately so. The idea is that 'the network' does all the tough work of keeping a communication up and flowing packets between sender and receiver, whatever happens 'down below'. That's why typical higher-level communications — such as mail or web access — can be developed with all the above assumptions in place:

  • Browser: Connecting to Web server... done.
  • Browser: Hi, please give me the content of URL https://wikipedia.com. Here's a cookie I have for it. Feel free to send it encrypted and compressed using the Brotli algorithm.
  • Web server: (after figuring out what to do with the request, encrypting it and compressing it according to what the browser wants). Ok, here you go.
  • Browser: (disconnects)

Simple, right? There is a bit more to it — especially regarding processing forms or uploading data, figuring out what encryption ciphers should be used, dealing with malicious attacks — but this is HTTP in its essence (the very first version, HTTP/0.9, was even simpler). It's simple and it works — because the layer beneath, TCP/IP, does all the dirty work. At the HTTP level, things are absurdly simple, and that's one of the (many) reasons why HTTP became so popular mere instants after Tim Berners-Lee invented it.
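In fact, the dialogue above can almost be typed out literally. Here is a hedged sketch of a hand-written HTTP/1.1 request over TLS (example.com standing in for the real site; 'br' in Accept-Encoding is the 'feel free to use Brotli' part, and the cookie is left out):

    import socket
    import ssl

    host = "example.com"
    request = (
        f"GET / HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Accept-Encoding: br, gzip\r\n"
        f"Connection: close\r\n"
        f"\r\n"
    ).encode()

    ctx = ssl.create_default_context()
    with socket.create_connection((host, 443), timeout=5) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(request)                     # 'Hi, please give me the content...'
            reply = b""
            while chunk := tls.recv(4096):           # '...ok, here you go.'
                reply += chunk

    print(reply.split(b"\r\n\r\n", 1)[0].decode())   # just the response headers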

Ok. Although I just showed an oversimplification — nay, an extreme oversimplification — of how TCP/IP works, I hope that you can appreciate that all this complexity has a huge benefit, but it comes at a cost: there is a lot of overhead even for the simplest communications. With 4 or so billion people on the Internet, and countless web servers out there, with gazillions of pages/images/videos to be transferred, this means that browsers and web servers all over the world are constantly opening and closing connections, counting packets, setting up timers to wait for a reply, verifying if all packets have been sent and received correctly, dealing with errors, retransmissions, and different speeds — both on the telecommunications side of things and on the hardware itself: a slow, underpowered server will not be able to handle packet sending and receiving at very fast speeds, even if the network infrastructure is able to do so; on the other hand, the fastest supercomputer in the world will always be limited by the available network bandwidth (and thus waste all that CPU power endlessly waiting for the network to do its magic...).

And while it's easy to understand why programmers are so fond of the TCP/IP model for developing higher-level applications, one might wonder if there isn't an alternative for those who do not mind getting their hands dirty in exchange for blindingly fast performance.

Well, the second most used protocol on the Internet is the User Datagram Protocol, or UDP. Developed in 1980, it is a protocol that guarantees very little: no guarantee of delivery, and no guarantee that there is even anyone still listening for the packets. There is no connection handshaking, no packet acknowledging, no sequence numbering, no error checking, no timers... nothing. It basically works this way (a small sketch in code follows the exchange below):

  • Server A: Here goes a packet to Server B.
  • Server B: ... (silence)
  • Server A: Oh, I just received a packet from Server B! Cool! (no need to tell B anything) Let's see if this packet is actually useful for anything... if not, I can always send another packet and see what happens.
  • or: Server A: Hmm. I've been waiting for an hour and haven't received anything from B yet. Let's a) give up or b) wait a bit longer or c) try sending a new packet again...
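Here is what 'fire and forget' looks like in code: a minimal sketch that sends a datagram to a local port where, presumably, nobody is listening (port 9, the old 'discard' service, is just an assumption), and then decides what to do when no answer arrives.

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(1.0)

    # sendto() succeeds immediately: no handshake, no guarantee anyone is listening.
    sock.sendto(b"hello, anyone there?", ("127.0.0.1", 9))

    try:
        data, addr = sock.recvfrom(1024)
        print("got a reply:", data, "from", addr)
    except OSError:
        # Timeout (or an ICMP 'nobody here' error, depending on the platform):
        # give up, wait a bit longer, or simply send another packet and see what happens.
        print("no reply after 1 second")
    finally:
        sock.close()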

After reading about TCP/IP, you might be wondering if UDP/IP isn't absolutely useless!

As said, DNS (Domain Name System) is probably the best-known application which uses UDP, because it fits perfectly in this model. Let's assume that A is a computer which wants to know the IP address assigned to a certain server name. B, C and D are domain name servers, running what is called a name resolver, an application which looks up a table of names and their IP addresses.

Here is how DNS works in its simplest form (a hand-rolled query in code follows this first exchange):

  • A: UDP packet to B, what is the IP address of this.domain.name?
  • B: (never receives the packet)
  • A: (after a timeout) Hello-ooooo! UDP packet to B, what is the IP address of this.domain.name?
  • B: (sends a packet back to A with the answer, but A never receives it)
  • A: (after waiting again) Let's try asking C instead: UDP packet to C, what is the IP address of this.domain.name?
  • C: Oh, I know that one! UDP packet to A, the IP address of this.domain.name is 10.0.0.1. Have a nice day!

Or perhaps the exchange goes more like this:

  • A: UDP packet to B, what is the IP address of that.domain.name?
  • B: (receives the packet, but has no clue) UDP packet to C, do you happen to know the IP address of that.domain.name?
  • C: (is down, under maintenance, too slow, or has no clue, either, so it never replies)
  • A: Still waiting! (but not sending packets)
  • B: (after a while) Hmm, C is not responding. UDP packet to D, what about you? Do you know the IP address of that.domain.name?
  • D: UDP packet to B, sure, the IP address of that.domain.name is 10.0.0.2!
  • A: Still waiting...!
  • B: (after storing the reply from D, so that the next time, it knows how to reply, and will not make A wait so long). UDP packet to A, sorry for the delay, had to get it from D, the IP address of that.domain.name is 10.0.0.2.
  • A: Cool, an answer from B, it was working after all. Let's store this address locally so that I don't need to bother asking B again, he's too slow! Also, I might make a mental note to ask D instead, she seems to have better knowledge of these things.

Ok... so, perhaps it's not exactly like that, but you see how it works:

  • Each request may be received or not; it is irrelevant (because you can always request the same data again)
  • Each answer may be received or not; this is also irrelevant, and although it's a different situation from the previous case, it can be handled in exactly the same way (just wait a bit and make the request again!)
  • Requests and answers are tiny and fit well in a single UDP packet (going either way)
  • There is no need to 'remember' anything about a previous communication, since whatever can be 'remembered' (such as the relationship between a server name and its IP address) can be stored locally. Also, as the example shows, even though communications are incredibly basic and simple, each side can make reasonable decisions about which server to contact (when they figure out that a server is down, they might ask the next one; if a server is slow, they might wait longer for it; and so forth)
  • There is practically zero overhead: packets are transmitted in their 'purest' form — they just have the address of the sender and the receiver and the payload, and nothing else. Therefore, the cost of establishing a new communication is virtually zero — it's far cheaper as well as easier simply to send a packet again, instead of using a very complex algorithm to figure out if 'the other side' is up and replying correctly to one's requests

Other common examples of using UDP are the Network Time Protocol (a way to synchronise all computer clocks in the world — they might not be right all the time, but, eventually, each and every computer will get some answer from a time server, which is enough to adjust its own clock) and... highly interactive games. This might sound surprising for such an unreliable protocol, but the truth is that in multiplayer games 'losing' a packet or two (carrying positioning data, or information about a shot being fired or the ball being kicked) is not relevant. Games will use predictive algorithms to try to 'guesstimate' where the other players will be, based on packets already received, and draw the scene accordingly; a few packets lost for a few milliseconds might not be relevant, and once they are received, the overall scene can be adjusted. Obviously, the more reliable the connection, and the faster the computer used to play, the higher the likelihood of successfully sending and receiving UDP packets. But take into account that, for a game displaying 60fps (enough to create a very convincing illusion of 'smooth' animation), the system has roughly 16 milliseconds to draw each frame. On a superfast local Ethernet connection you can expect ping times of 50 microseconds — plenty of time to exchange quite a lot of data and still have enough CPU cycles to do all those frames! — but even reaching out from my home fibre installation, I reach the DNS server I use from Cloudflare in 2-3 milliseconds, which is really not bad (I'm actually accessing a server installed in my country, thus the short delay). Note that ping itself uses ICMP rather than UDP, but the round-trip times it reports are a good indication of what a UDP packet would face. If 1.1.1.1 were a game server, I'd still have plenty of milliseconds left to draw a frame.

(Again, take into account that all the above is another oversimplification: in practice, games will not need to get updates every frame; also, it's not likely that a single packet of a kilobyte or so will be enough for everything — consider the download of a 'new' 3D object and especially all its textures! Obviously, things are not as simple as I have described...)

Okay. This should be enough to make you wonder why there aren't more communication protocols based on UDP, since it seems to be so blindingly fast and so tremendously helpful in avoiding all the problems and issues that TCP has. The simple answer is that UDP is 'too bare', or 'too raw'. It is very useful in situations where transmitting and receiving all the packets is not crucial. It's good when either the client or the server doesn't really need to 'track' anything (i.e. a DNS server will just reply to whatever request it receives — it's not important to keep track of each individual communication, or whether the same packet is sent a million times because the network is down, etc.). But TCP has the advantage of guaranteed delivery — at the cost of a lot of overhead. As the Internet becomes more and more complex, TCP has become an even more complicated protocol — these days, having to deal with congestion issues and multiple pipelines using the same channel, not to mention things like VPNs, NAT and reverse NAT (look up all of those nifty features), as well as a hundred (a thousand?) options to configure and set up a specific communication channel — how many packets to send/receive before acknowledging them, for instance; how much memory to allocate to buffers; how much time to wait until an expected reply comes; and so forth.

So let's step up again from the lowest levels of data communications and get back to the application layer, where the developers' worries centre around delivering as much Web content in the least amount of time possible.

HTTP/2 solved a lot of issues while keeping a 'familiar' interface to the higher layers: for example, if a JavaScript developer wants to open a WebSocket from their application-embedded-in-an-HTML-page to a back-end server somewhere on the Internet, they don't wish to concern themselves with how the communication is actually being established, i.e. whether the underlying system is using HTTP/2 or any other protocol. In fact, since the adoption of those lower-level protocols depends on vendors providing either browsers or web servers with those capabilities, the same JavaScript application may be used to communicate, via WebSockets, with a multitude of possible configurations; it cannot, therefore, 'assume' anything about 'what happens below'. One might think that knowing just a little bit more would allow developers to tweak their code to give some extra improvement here or there, depending on what protocol is being used at the lower level; but this is exactly the reason why all these things are cleanly separated into different layers. If you connect your laptop to the Internet, you want to access www.google.com by just typing that address and have it working. You don't wish to worry about whether your connection is via an Ethernet link, Wi-Fi, or a cellular data network. You don't wish to install browser A or B (which might not be available for your laptop!) just to be able to use one or the other. Thus, from the perspective of both the end-user and the developer, they wish only to concern themselves with the 'upper' level of Internet communications, and that's the very abstract 'application layer'. The other layers are supposed to be 'made compatible' so that everything works exactly the same way 'up above', no matter how you're physically connected to the Internet (and this is one of the biggest advantages that the Internet brought over many of the networking protocols that preceded it).

It's thus not surprising that Google researchers, at some point, asked themselves: what if we could implement HTTP not over traditional TCP/IP sockets, but, instead, use UDP/IP sockets? What would have to change so that we get reliable communications at the application level — while, at the same time, benefitting from the close-to-zero overhead of UDP connections? How will we deal with lost packets, transmission errors, encryption, compression? What has to be changed at the browser level and at the server level so that we can get them to work together — without requiring web designers to change their HTML and CSS, or JavaScript developers to rewrite their code?

Around 2018 or thereabouts, the QUIC protocol, which I mentioned above, emerged from the research labs and started being tested at scale outside Google; shortly afterwards, it was adopted as the basis of a brand-new Internet standard, HTTP/3, although some changes have been made to QUIC along the way (there were a few security issues with the encryption originally used by Google, so the 'final' protocol uses standard TLS instead). Google obviously implemented it on their own servers and made Chrome support it (with some caveats...) — if you use a 'modern' browser these days, you'll access everything on Google's servers via HTTP/3. Not to be left behind, Facebook implemented it on their own systems as well. Perhaps the biggest advantage of having a protocol over UDP is the way video streaming can be handled. Video can afford to lose a few packets here and there and still deliver enough quality to the end-user; most importantly, the 'dirty' work of handling things like a slow server, a slow browser running on a slow computer, and a congested network — all that is a problem to be solved at the HTTP/3 level, not at the top levels. If you have an HTTP/3-capable browser, you may have noticed that, these days, videos on YouTube do not 'degrade' their performance substantially during a video session (i.e. offering 720p, downgrading to 480p if the connection is bad, eventually degrading further until the transmission becomes stable). There are many reasons for that, but one of them is that video over HTTP/3 is far less affected by congestion and degradation is less noticeable. Why? Well, consider the differences that we have seen between a TCP/IP and a UDP/IP connection. Under TCP/IP, what happens if a user has such a bad connection to the Internet that they cannot receive large video packets at the rate required for a certain resolution (480p, 720p, 1080p, 4K, etc.)? Well, both sides have to constantly negotiate what transmission rate is considered acceptable — if packets start to get dropped too often, the receiving side will inform the server side to 'slow it down', which, in turn, communicates that to the application level: 'here is a user which cannot receive packets so quickly, we have to send them fewer packets, and wait more time between each'. The application then says, 'well, ok, let's re-establish the connection, but this time, we'll send 480p video as opposed to 720p, and see if they can handle that'. This may happen every few seconds, as communication quality fluctuates and congestion levels increase or decrease. Thus, using HTTP/1.1 or HTTP/2, a lot of work is being done at the transport level (and below) to keep the connection 'live', at the cost of a decrease in performance; and, eventually, connections break and have to be re-established. All this takes up time and resources.

All this can be avoided with HTTP/3. UDP packets will be sent and eventually received. If they are, it's up to the receiving end to assemble them together — as best as it can — and as fast as possible. If they are enough to deliver smooth 720p or 1080p video — great. If there is serious congestion and packets get dropped, tough — your browser might drop a few frames now and then, but, overall, the quality will not degrade much, or, if it does, it will recover quickly. I'm obviously not familiar with the details of the implementation, but I can imagine that the UDP packets will just carry some identifier to allow them to be ordered. MPEG-4 (or WebM) video is lossy anyway: if you send out 1000 packets, but #3-#45 are not received, the difference in the way the frames get assembled is not very relevant: you lose some quality on a few frames, but the human eye is not that sensitive to such details. That's a reason why video streaming setups built around the Real-Time Streaming Protocol deliver the actual media over RTP, which, in turn, is usually carried over UDP/IP.
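I can only speculate about the real implementation, but the spirit of 'assemble whatever arrived and tolerate the gaps' can be sketched in a few lines (the frame names and the loss rate below are entirely made up):

    import random

    TOTAL_FRAMES = 20
    # Pretend each datagram carries a sequence number and one video frame,
    # and that roughly 15% of them never arrive.
    received = {seq: f"frame-{seq}" for seq in range(TOTAL_FRAMES) if random.random() > 0.15}

    lost = [seq for seq in range(TOTAL_FRAMES) if seq not in received]
    print(f"received {len(received)}/{TOTAL_FRAMES} frames; missing: {lost or 'none'}")

    for seq in range(TOTAL_FRAMES):
        # A real player would interpolate, or simply show the previous frame again,
        # instead of stalling the whole stream waiting for a retransmission.
        print(received.get(seq, f"frame-{seq} missing, reusing the previous frame"))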

Granted, even a stream of video frames requires some form of control; it's one thing to drop a few frames now and then, or to receive packets out of order — which will make little difference if they're being stored in a buffer before viewing — but there are other things that also require attention: namely, the user may pause the video, and that means sending back information which requires a degree of reliability (if the pause button doesn't work the first time — due to a lost packet — that will be fine, but if it happens over and over again, the user gets annoyed!); similarly, users might wish to change the video size (or its quality), add higher-quality audio (or none at all!), get sync'ed subtitles, and so forth. Clearly, there has to be a way to implement such things using a 'connectionless' protocol such as UDP. For instance, the companion protocol to RTP is RTCP (the RTP Control Protocol), and to handle things like network congestion there is even RSVP (Resource Reservation Protocol — the acronym is not a coincidence, of course!).

It's legitimate to ask if so many things running on top of good, old, absolutely unreliable UDP won't make it as slow as TCP! The answer is 'no'. TCP does all of that and way, way more. When streaming video (or dealing with positioning data on first-person-shooter games), having a fast, near-real-time communication channel that is 'lossy' is often more than acceptable, when the alternative is 'slow' TCP. There are image/video formats that are already lossy to begin with; with a few tricks, these can benefit from a lossy — but much faster! — transport mechanism.

The same is not true of other things, of course, such as, say... web pages. Sure, web pages also have things like JPEGs — images compressed in a lossy way! — and even video. Also, you can get a lot of things out of order (which, these days, happens anyway under HTTP/1.1 and/or HTTP/2...) and things will still 'work'. If an image is downloading slower than expected, the rest of the webpage can be rendered until the browser finishes the download; the user will see a completely-drawn page with just a missing image. Progressive JPEGs can even be loaded incrementally, showing a low-res image first which gets rendered with more and more quality as further data is received. These days, images load so fast even on residential broadband that such benefits are often overlooked.

But you can't do much with just partial HTML. Or partial CSS, or partial JavaScript. Among those, the worst is not having any HTML at all — missing CSS can make a lot of difference (just try to render any contemporary website with a text-only browser such as Lynx!), but the browser can still apply reasonable defaults here and there; on the other hand, if JavaScript is required to provide fundamental functionality (and not merely used for some cute effects — 'eye candy') then there is nothing you can do.

The good news is that, in general, images and video consume far more network resources than plain text (which can even be minified and compressed before sending them to the browser). Thus, a careful balance of what requires being 'intact' at the browser end, and what can eventually be 'partial' or incomplete — while still allowing the page to load and be useful! — might result in better performance overall, less congestion, and the ability for the server to dispatch answers simultaneously via 'connectionless' mechanisms to a multitude of browsers waiting for content, without worrying about the rigours of establishing a full TCP connection and carefully marshalling packets through the 'pipe'. Instead, packets are sent out under a 'fire and forget' policy — some may reach the destination, some may not, and some may end up with errors or out of order; in many cases, clever buffering and a few tricks of the trade will allow the receiving end to make some sense of what has been already received and eventually assemble whatever it can into what looks like the intended web page. This sounds like a very rough, 'patchy' solution just to slice a few milliseconds from a Web connection; but all those milliseconds count, especially for the Google algorithms which give better ranking to websites that answer more quickly.

Therefore — QUIC, or, better, HTTP/3 (which can be understood as 'HTTP implemented over QUIC'). Google and Facebook, at least, believe that it's worth the trouble — and so does Cloudflare. Because almost all of the websites on my server use Cloudflare as their frontend, this means that all of them are HTTP/3-enabled (there are two exceptions — which, for many reasons, cannot be directly served through Cloudflare, so I've attempted to patch nginx in order to get HTTP/3 support. Sometimes it even works).

To test HTTP/3 with a compliant browser, you can take a look at https://cloudflare-quic.com/ — it will tell you if your browser is HTTP/3-compliant or not. Sometimes, due to the nature of how HTTP/3 works, you will need to refresh the page a few times. Or delete the cache for the page and try again later. Or sometimes it won't work at all. However, I have found that at the very least Google, Facebook and Cloudflare (meaning: everything that is hosted 'behind' Cloudflare and has the switch for HTTP/3 toggled to 'on') are very reliable: there is next-to-zero difference in reliability between accessing a page via HTTP/2 or HTTP/3 — except that HTTP/3 pages load much faster (you can do some tests!). Then again, I wonder if this is the case on very congested networks — will HTTP/3 outperform HTTP/2 in such scenarios? If yes, then we can safely predict that HTTP/3 will be 'the future' of the Web. If not, well... it was an interesting attempt, and it may consume far fewer resources on the server side of things, so it might still be around for a while. Or... they might tweak the protocol (still under development!) to deal better with some edge situations and critical scenarios.

The ever-so-useful Can I Use website also keeps track of which browsers already support HTTP/3. At the time of writing, under Safari 14 (macOS and iOS), this comes out of the box — just point to Facebook, any Google site, any site that has Cloudflare enabled, and that's it. You can use the Inspector from the Developer Tools to verify which sites have been downloaded using HTTP/3. Because the standard is not 'fixed' yet, connections are labelled by draft: in the Developer Tools, under the Network tab, you can enable a column for Protocol (it's disabled by default), and HTTP/3 connections will be listed as h3-XX, where the XX stands for the supported draft specification (at the time of writing, h3-29 seems to be the most frequent version, although there are already a few more recent drafts).

Currently, besides Safari, both Chrome (and Chromium) and Firefox support HTTP/3. Under Firefox, you just have to go to about:config and set network.http.http3.enabled to true (and relaunch). Chrome needs some command-line arguments, but it should work as well. Because Microsoft Edge is based on Chromium, you should be able to do the same command-line trick, but it seems only to work for Microsoft Edge Canary for now. This is due to change quickly, so I imagine that the next time you read this article, every major browser will support it.

Do not despair if things don't work quite as expected. For instance, the latest versions of Firefox somehow don't like the Cloudflare implementation of HTTP/3 — but they have no trouble with Google and/or Facebook. This is normal: the draft specifications are in flux and are being implemented by browser developers at different paces. And, as said, on the other side of the equation — the web server — things can also be weird to figure out. For instance, at some point, I managed to get my non-Cloudflare'd websites to respond perfectly to HTTP/3. Then they started showing a few quirks, now and then — such as pages taking too much time to load, needing a refresh, eventually timing out. And one day I woke up to find that none of those websites worked. I don't remember having done anything overnight... but, alas, such is the nature of HTTP/3: it's not that easy to fine-tune, and probably I added something (or deleted something!) which made things stop working.

Note that in a well-designed 'fallback' scenario, the browser will try to communicate with the web server by establishing an HTTP/1.1 connection, and announce what kinds of protocols it supports (even https is often negotiated this way, especially if you're visiting a web site for the first time). If both sides can agree on a draft of HTTP/3 that they both support, the connection will be 'upgraded' to HTTP/3, and the browser will 'remember' that choice — the next time, it will simply make requests using HTTP/3 and not even bother opening TCP connections and using plain old HTTP/1.1. If they cannot negotiate an HTTP/3 connection, they will fall back to HTTP/2 — and the browser will also 'remember' that HTTP/3 didn't work but HTTP/2 was fine, so, the next time, it will use HTTP/2 instead.
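In practice, a large part of that announcement happens through the Alt-Svc response header: a server that speaks HTTP/3 advertises it (for example as h3-29=":443") in its ordinary HTTPS responses, and the browser remembers the hint. Here is a hedged sketch that merely reads that advertisement; Python's standard library cannot speak HTTP/3 itself, so this shows the offer, not an actual QUIC connection (cloudflare-quic.com being the test page mentioned above).

    import http.client

    conn = http.client.HTTPSConnection("cloudflare-quic.com", timeout=10)
    conn.request("GET", "/")
    response = conn.getresponse()

    # Servers that support HTTP/3 advertise it here, e.g. 'h3-29=":443"; ma=86400'.
    print("Alt-Svc:", response.getheader("Alt-Svc") or "not advertised")

    conn.close()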

It seems that this 'remembering' is sometimes counter-productive. For instance, I believe I might have added or deleted something in my server configuration, and all of a sudden, HTTP/3 support was 'broken'. However, because the browsers 'remembered' that I used HTTP/3 before, they tried that again — and failed. So they fell back to HTTP/2 — which worked, so they remembered that instead. From now on, they're 'stuck' with HTTP/2, and even after deleting cache, cookies, and so forth, they will stubbornly refuse to attempt an HTTP/3 connection again...

Oh well. I know that it will eventually work exactly as intended. There is still a long way to go... but somehow it feels exciting! 😄