Over the last couple of weeks we have drastically upgraded the performance, security and versatility of the front-facing HTTP layer for all mendixcloud.com apps.
If you haven’t noticed anything: great! All changes have been rolled out without any downtime whatsoever.
However, if you have been paying close attention to your app’s behaviour, you might have noticed the following improvements:
* faster load times
* SPDY support
* fewer TLS handshakes
* IPv6 support
* better TLS ciphers
* gzip compression for more resources
* WebSocket support
In this article I will go over some of the changes and explain the technologies behind them.
HTTP Architecture Overview
Your HTTP requests are transmitted over a TCP connection, and this TCP connection is set up through a TCP load balancer to one of our webservers. The webserver does two things:
* handle SSL/TLS (encrypting and decrypting the traffic)
* proxy HTTP requests through to the application server
So in each datacenter we have a TCP load balancer, a small number of webservers and a large number of application servers.
As you might know, there are only about 4 billion IPv4 (Internet Protocol version 4) addresses. That seems like a lot, until you realise that, ideally, any device should be able to communicate directly with any other device over the internet, that there are over 7 billion people, and that people in the western world own about 5 internet-connected devices each.

So what happens when there are not enough addresses to go around? Imagine that when you posted a package, you could only write a zip code and house number as the address: no names, no departments. How would you send a package to a specific department in a company, or a birthday card to the right family member? Suppose the father of the household at Main Street 15 said: “Let’s pretend that I live at 15a, mother at 15b, our son at 15c and our daughter at 15d, but we’ll keep using one mailbox: nr. 15.” The right reaction would be: that’s a stupid idea! Sure, it works, but that’s not what letter suffixes in house numbers are for. They exist for when one house is split into multiple addresses, so that the neighbours don’t all have to be renumbered (17 becoming 19, and so on).
Nevertheless, this is the standard solution on the internet. It’s called Network Address Translation (NAT): multiple devices share one IP address, and the port number is used to identify the specific addressee. So to reach a specific device behind your router, you might forward port 8080 to that machine. This might seem like a good idea, but unfortunately port numbers, like letter suffixes, are already used for something else. Ports specify the kind of service you want to use on a device, like web traffic (80), encrypted web traffic (443), mail (25), etc. The result: if you are behind a NAT gateway (like the router in your home), each service port can only be forwarded to one device. If you want to reach two HTTP devices from outside (say, your NAS and your webcam), you are out of luck and have to resort to ugly workarounds. NAT is rightly considered a dirty hack that everyone uses.
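To make the port-forwarding limitation concrete, here is a minimal Python sketch of a home router’s static forwarding table. The `NatGateway` class, the addresses (taken from documentation and private ranges) and the NAS/webcam devices are hypothetical; a real NAT gateway also rewrites outgoing connections on the fly, which this toy ignores.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NatGateway:
    """Toy model of static port forwarding on a router with one public IP."""

    public_ip: str
    # public port -> (private IP, private port); at most one entry per public port
    forwards: dict[int, tuple[str, int]] = field(default_factory=dict)

    def forward(self, public_port: int, private_ip: str, private_port: int) -> None:
        if public_port in self.forwards:
            # Each port exists only once on the public IP, so only one device can own it.
            raise ValueError(
                f"port {public_port} is already forwarded to {self.forwards[public_port]}"
            )
        self.forwards[public_port] = (private_ip, private_port)

    def route(self, public_port: int) -> tuple[str, int]:
        # Raises KeyError when no device behind the router answers on this port.
        return self.forwards[public_port]


router = NatGateway("198.51.100.20")
router.forward(80, "192.168.1.10", 80)  # hypothetical NAS web interface
try:
    router.forward(80, "192.168.1.11", 80)  # hypothetical webcam: port 80 is taken
except ValueError as err:
    print("conflict:", err)
router.forward(8080, "192.168.1.11", 80)  # the "ugly workaround": a non-standard port
```

Visitors now have to know that the webcam lives on port 8080, which is exactly the kind of workaround described above.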
The real solution is IPv6, whose 128-bit addresses give an address space vastly larger than the 32 bits of IPv4. However, IPv6 adoption is slow: at the time of writing, only about 2% of all internet traffic is IPv6. This is a catch-22: companies have to support IPv4 because most customers do not have IPv6 yet, and customers have no need to start using IPv6 because all companies still (have to) support IPv4. There is some progress though (Comcast, for example, has been rolling out IPv6 to its customers), and perhaps the Internet of Things will benefit so much from IPv6 that customer demand grows.
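The difference in scale is easy to check with Python’s standard `ipaddress` module, which can count the addresses in the full IPv4 and IPv6 ranges. This is plain arithmetic on the address sizes; it ignores the many ranges reserved for private, multicast and other special use, so the usable IPv4 space is even smaller.

```python
import ipaddress

# The whole IPv4 space is 0.0.0.0/0 (32 bits); the whole IPv6 space is ::/0 (128 bits).
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

world_population = 7_000_000_000  # "over 7 billion people", as stated above

print(f"IPv4 addresses:  {ipv4_total:,}")
print(f"IPv4 per person: {ipv4_total / world_population:.2f}")
print(f"IPv6 addresses:  {ipv6_total:.2e}")
print(f"IPv6 per person: {ipv6_total / world_population:.2e}")
```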
What needs to happen is that companies start feeling the pressure to support IPv6 and take their responsibility seriously. That is something we do: as of last week, the last parts of our applications have been made accessible over IPv6.
Check out the IPv6 notificator Chrome extension to see whether your provider and the website you are visiting are doing their part!
Over the last few years you might have heard about HTTP 2.0 or SPDY. SPDY is a Google initiative to improve HTTP without having to wait for the official standards committees. What is wrong with HTTP/1.1? Not that much really; it’s just that the demands have grown considerably. To display a website, the browser has to perform a lot of requests: about 40 on average for a first visit, whereas in the early days of the web it would be one, an HTML document. Because an HTTP/1.1 connection can in practice only handle one request at a time, browsers cope with this by opening a pool of TCP connections to the webserver. With SPDY this is no longer necessary, as requests can be multiplexed over a single TCP connection. You can easily see how many connections are in use with the netstat command on *nix machines.
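The core idea of multiplexing can be shown with a toy model: responses are cut into frames tagged with a stream ID, frames from different streams are interleaved on one connection, and the receiver reassembles them per stream. This is a hypothetical Python sketch of the concept only; real SPDY frames have a binary layout with flags, lengths and flow control that is not modelled here. The stream IDs are odd because client-initiated SPDY streams use odd numbers.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import zip_longest


def multiplex(responses: dict[int, list[bytes]]) -> list[tuple[int, bytes]]:
    """Interleave the chunks of several responses onto one 'connection'.

    Frames are emitted in rounds: the first chunk of every stream, then the
    second chunk of every stream, and so on.
    """
    stream_ids = list(responses)
    frames: list[tuple[int, bytes]] = []
    for round_of_chunks in zip_longest(*responses.values()):
        for stream_id, chunk in zip(stream_ids, round_of_chunks):
            # None means this stream has already run out of chunks.
            if chunk is not None:
                frames.append((stream_id, chunk))
    return frames


def demultiplex(frames: list[tuple[int, bytes]]) -> dict[int, bytes]:
    """Reassemble each stream by concatenating its frames in arrival order."""
    streams: dict[int, bytes] = defaultdict(bytes)
    for stream_id, chunk in frames:
        streams[stream_id] += chunk
    return dict(streams)


# Three hypothetical requests (HTML, CSS, JS) in flight on one connection.
responses = {
    1: [b"<html>", b"...", b"</html>"],
    3: [b"body { margin: 0 }"],
    5: [b"console.log(", b"'hi')"],
}
frames = multiplex(responses)
for stream_id, chunk in frames:
    print(stream_id, chunk)
```

A slow response no longer blocks the others: the small CSS file arrives in the first round even though the HTML is still being sent.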
SPDY brings several improvements:
* header deduplication
* request multiplexing
* server push
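SPDY gets its header savings by compressing all header blocks on a connection with one shared zlib context, so headers repeated in later requests (cookies, user agent, accept headers) shrink to a few back-references. The Python sketch below shows that effect with the standard `zlib` module; it leaves out the predefined dictionary and binary framing that the SPDY spec adds, and the header values and host name are made up.

```python
import zlib


def encode_headers(compressor, headers: dict) -> bytes:
    """Serialize one header block and compress it with the connection's shared context."""
    block = "".join(f"{name}: {value}\r\n" for name, value in headers.items()).encode()
    # Z_SYNC_FLUSH outputs everything for this block but keeps the compression
    # window, so the next block can refer back to bytes already sent.
    return compressor.compress(block) + compressor.flush(zlib.Z_SYNC_FLUSH)


# Hypothetical headers for two consecutive requests on the same connection.
common = {
    "host": "myapp.mendixcloud.com",
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "accept-language": "en-US,en;q=0.5",
    "cookie": "session=0123456789abcdef0123456789abcdef",
}
first_headers = {"path": "/index.html", **common}
second_headers = {"path": "/styles.css", **common}

compressor = zlib.compressobj()  # one compression context per connection and direction
first = encode_headers(compressor, first_headers)
second = encode_headers(compressor, second_headers)
print(f"first request: {len(first)} bytes, second request: {len(second)} bytes")

# The receiving side keeps a matching decompression context.
decompressor = zlib.decompressobj()
assert decompressor.decompress(first).startswith(b"path: /index.html")
assert decompressor.decompress(second).startswith(b"path: /styles.css")
```

Sharing compression state across requests is also what the CRIME attack exploited, which is why HTTP/2 replaced this scheme with HPACK.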
Because we use NGINX, we are still tied to SPDY/2 for now; we will have to wait for SPDY/3.1 support to land in a stable NGINX release.
TCP Load Balancing
IaaS providers like Rackspace and Linode let you create TCP load balancers, which we use extensively. By default these use round-robin scheduling. That makes sense, until you realize that a client might end up at a different webserver for each HTTPS connection. If the server is different, SSL/TLS session resumption cannot be used, because the new webserver typically has no record of the session negotiated with the previous one, so the full handshake has to be performed again. That costs one additional round trip (the time it takes for a message to travel to the server and back).
To make sure a client ends up at the same webserver each time (while keeping the setup highly available), you can enable session persistence in your load balancer configuration.
However, if your browser supports SPDY this is much less of an issue, as it already uses only one TCP connection and therefore needs only one handshake.
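A small simulation makes the round-robin cost visible. The Python sketch below counts full TLS handshakes for one client opening several connections, once with round-robin scheduling and once with source-IP persistence. The server names, the hashing scheme and the assumption that each webserver has its own, unshared session cache are illustrative, not a description of any specific load balancer.

```python
import hashlib
from itertools import count

WEBSERVERS = ["web1", "web2", "web3"]  # hypothetical webserver pool


def round_robin():
    """Return a picker that ignores the client and cycles through the pool."""
    counter = count()

    def pick(client_ip: str) -> str:
        return WEBSERVERS[next(counter) % len(WEBSERVERS)]

    return pick


def source_ip_persistence(client_ip: str) -> str:
    """Hash the client address so the same client always lands on the same server."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return WEBSERVERS[int.from_bytes(digest[:4], "big") % len(WEBSERVERS)]


def full_handshakes(pick, client_ip: str, connections: int) -> int:
    """Count full handshakes when every webserver keeps its own session cache."""
    # Keyed by client here for simplicity; real caches key on a session ID or ticket.
    session_caches = {server: set() for server in WEBSERVERS}
    full = 0
    for _ in range(connections):
        server = pick(client_ip)
        if client_ip in session_caches[server]:
            continue  # abbreviated handshake: the session is resumed
        session_caches[server].add(client_ip)
        full += 1  # full handshake: one extra round trip compared to resumption
    return full


client = "203.0.113.7"  # documentation-range example address
print("round robin:", full_handshakes(round_robin(), client, 6))  # → 3
print("persistence:", full_handshakes(source_ip_persistence, client, 6))  # → 1
```

With three webservers, round robin pays for a full handshake on every server the client touches; with persistence only the very first connection does.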
Several attacks on SSL/TLS have appeared in recent years, such as BEAST. The key to defeating these vulnerabilities is setting the order of the preferred cipher suites on the webserver and excluding broken ciphers. This is a bit of a puzzle, as you still need to support all browsers and operating systems.
It appears that we solved this puzzle successfully, at least according to the SSL Test at https://www.ssllabs.com/ssltest/analyze.html.
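As an illustration of ordering the suites and excluding the broken ones, here is a Python sketch that builds a server-side TLS context with the standard `ssl` module. The cipher string is a hypothetical example, not the configuration used on mendixcloud.com (which runs NGINX, where the equivalent directives are `ssl_ciphers` and `ssl_prefer_server_ciphers`). A policy this strict drops some older clients, which is exactly the compatibility trade-off described above.

```python
import ssl

# Hypothetical policy in OpenSSL cipher-list syntax:
#   ECDHE+AESGCM:DHE+AESGCM   prefer forward-secret key exchange with AES-GCM,
#                             ECDHE first because it is listed first
#   !aNULL:!MD5:!RC4:!PSK     never offer unauthenticated, MD5, RC4 or PSK suites
CIPHERS = "ECDHE+AESGCM:DHE+AESGCM:!aNULL:!MD5:!RC4:!PSK"


def make_server_context(ciphers: str = CIPHERS) -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Affects TLS 1.2 and older; OpenSSL configures TLS 1.3 suites separately.
    ctx.set_ciphers(ciphers)
    # Pick from the server's ordered list instead of following the client's preference.
    ctx.options |= ssl.OP_CIPHER_SERVER_PREFERENCE
    return ctx


ctx = make_server_context()
for cipher in ctx.get_ciphers():
    print(cipher["protocol"], cipher["name"])
```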
When you can’t crack an encrypted communication channel, a good strategy might be to simply record all the encrypted data so that you can crack it later. With cipher suites that lack forward secrecy, discovering the server’s private key is enough to decrypt every session you recorded earlier. Cipher suites that employ forward secrecy make this virtually impossible: every session uses its own ephemeral key, so once you’ve cracked the key for one session, you still have to do the same for every other session you recorded.
Like everything in this post, we support it. Check the status for your app at https://www.ssllabs.com/ssltest/analyze.html.
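Forward secrecy comes from ephemeral key exchange: each session derives its key from fresh Diffie-Hellman values that are thrown away afterwards, and the server’s long-term private key is only used to sign its half of the exchange. The toy Python sketch below shows the finite-field Diffie-Hellman arithmetic behind this. The parameters are deliberately tiny and insecure (real TLS uses standardized groups of 2048 bits or more, or elliptic curves as in the ECDHE suites), and the signing step is omitted.

```python
from __future__ import annotations

import secrets

# Toy group for illustration only: P = 2**127 - 1 is a (Mersenne) prime, but a
# 127-bit group is far too small to be secure. Do not use these parameters.
P = 2**127 - 1
G = 3


def ephemeral_keypair() -> tuple[int, int]:
    """Fresh secret exponent per session; it is never stored after the handshake."""
    secret = secrets.randbelow(P - 3) + 2  # secret in [2, P - 2]
    return secret, pow(G, secret, P)


def shared_key(own_secret: int, peer_public: int) -> int:
    return pow(peer_public, own_secret, P)


def handshake() -> tuple[int, int]:
    """Run one key exchange; returns the key as computed by each side."""
    client_secret, client_public = ephemeral_keypair()
    server_secret, server_public = ephemeral_keypair()
    # In TLS the server signs server_public with its certificate key; that
    # signature proves identity but does not feed into the session key.
    return shared_key(client_secret, server_public), shared_key(server_secret, client_public)


session_1 = handshake()
session_2 = handshake()
print("both sides agree:", session_1[0] == session_1[1], session_2[0] == session_2[1])
```

Recording both sessions and later stealing the server’s certificate key gives an attacker nothing here: the secrets that produced each session key no longer exist.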
Strict-Transport-Security: avoid the man in the middle
If you type a URL like mail.google.com into your web browser, you end up at http://mail.google.com/. That is unencrypted traffic, which means it can be intercepted and manipulated on any device between you and Google’s server. In this case http://mail.google.com/ sends a redirect to https://mail.google.com/, and from that point on your connection is safe. Suppose, however, that someone intercepts the redirect and starts acting as a proxy: it receives your plain HTTP requests and forwards them to https://mail.google.com/. When it gets a response from https://mail.google.com/, it decrypts it and sends it to you as plain HTTP. This way the interceptor can eavesdrop on and manipulate all your data, and you will never notice a difference. This is a man-in-the-middle attack (this particular variant is known as SSL stripping).
There is no 100% fix, but the HTTP Strict Transport Security (HSTS) header tells your browser never to access the site over plain HTTP again, for as long as the header’s max-age directive specifies. We use this to limit the vulnerability window to the first visit to the site.
For the past couple of weeks we have been sending the HSTS header on all our HTTPS URLs.
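To show why only the first visit stays exposed, here is a toy Python model of the browser side of HSTS: it parses the header, remembers the host until `max-age` expires, and rewrites later `http://` URLs to `https://` before any request is sent. The header value (one year, including subdomains) and the host name are hypothetical examples, not the exact values mendixcloud.com sends; a real browser also handles preload lists, explicit ports and stricter parsing.

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit

# Hypothetical header value: remember HTTPS-only for one year, including subdomains.
HSTS_HEADER = "max-age=31536000; includeSubDomains"


def parse_hsts(value: str) -> tuple[int, bool]:
    """Return (max_age_seconds, include_subdomains) from a Strict-Transport-Security value."""
    directives = {}
    for part in value.split(";"):
        name, _, arg = part.strip().partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"')
    return int(directives["max-age"]), "includesubdomains" in directives


class HstsStore:
    """Toy browser-side HSTS cache (no preload list, no port handling)."""

    def __init__(self) -> None:
        self._hosts = {}  # host -> (expiry timestamp, include_subdomains)

    def record(self, host: str, header_value: str, now: float) -> None:
        # Browsers only honour the header when it arrives over HTTPS (not modelled here).
        max_age, include_subdomains = parse_hsts(header_value)
        if max_age == 0:
            self._hosts.pop(host, None)  # max-age=0 tells the browser to forget the host
        else:
            self._hosts[host] = (now + max_age, include_subdomains)

    def is_https_only(self, host: str, now: float) -> bool:
        labels = host.split(".")
        for i in range(len(labels)):
            entry = self._hosts.get(".".join(labels[i:]))
            # A parent-domain entry only counts if it set includeSubDomains.
            if entry and now < entry[0] and (i == 0 or entry[1]):
                return True
        return False

    def upgrade(self, url: str, now: float) -> str:
        parts = urlsplit(url)
        if parts.scheme == "http" and self.is_https_only(parts.hostname or "", now):
            return urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))
        return url


store = HstsStore()
url = "http://myapp.mendixcloud.com/index.html"  # hypothetical app URL
print(store.upgrade(url, now=0))  # first visit: still plain HTTP, the vulnerable window
store.record("myapp.mendixcloud.com", HSTS_HEADER, now=0)
print(store.upgrade(url, now=60))  # later visits: upgraded before anything is sent
```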