I just received two bits of good news about an internet application that was built under my management. I have to be a bit mysterious about the project itself as I’ve signed an NDA, but it’s an application that will be visited extremely often, with totally unpredictable planning. Luckily I love a challenge.
NOTE: This project has gone live since this blog post was written. It was for the Dutch Monarchy and the Orange Foundation, the largest charity in the Netherlands, on the occasion of the inauguration of the new Dutch King Willem-Alexander.
So the good news was:
- Our stress tests showed we are over-performing by 10-100 times.
- The company that audited us (KPMG) said: “The application was made with love and we don’t see that very often.” They actually said LOVE, and the L-word is not used lightly by this large consultancy firm (and they hardly had any critical points at all).
So I’m a happy camper and thought I would share our approach with you. I’m focusing on performance issues here even though we paid just as much attention to usability, security, browser compatibility, etc etc.
Let me be fair and tell you that I’m just the person who comes up with concepts, knows a bit about the technology and manages things, so who am I to take credit for such a monster of an application? I worked with my Maximonster team (Marc Worrell, Arjan Scherpenisse and Joost Faber), together with the brilliant team at my former company ICATT interactive media.
First of all, what is fast?
To test the application, we’re using a scenario that simulates a user going through the application. It’s a simulation of many different types of browsers in about the same percentages as the real world (x% IE, Y% Firefox etc). We ran tests of up to 400 people doing a complete task in 0.8 seconds. We were serving 120 Megabit/second.
In our case this compares to about 30,000 concurrent users. Enough to run 150,000 users in an hour. And all this while the application runs as smoothly as when it runs with 10 users. We ran this on two servers, while we will be using at least three and can quickly deploy more if needed. Why not test further? Well… our test server just can’t do it right now and for now it’s enough – we underpromised anyway. We estimate that we can double or triple this easily.
So here’s how to do it from a bird’s-eye view.
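As a rough sanity check on those two figures, Little's law (concurrent users = arrival rate × average time spent in the application) can be applied. This is only a sketch: it assumes both numbers describe the same steady state, which the test report may not have intended, and the function name is made up for the illustration.

```python
def mean_session_seconds(concurrent_users: int, users_per_hour: int) -> float:
    """Little's law rearranged: time in system = concurrency / arrival rate.

    Assumption (not from the post): both figures describe one steady state.
    """
    # arrival rate is users_per_hour / 3600 users per second, so:
    return concurrent_users * 3600 / users_per_hour


# Figures quoted in the post: about 30,000 concurrent users, 150,000 users per hour.
session = mean_session_seconds(30_000, 150_000)
print(f"implied average visit: {session:.0f} s ({session / 60:.0f} minutes)")
```

Under that assumption the two numbers imply an average visit of about twelve minutes, which is a useful figure to compare against what your own user scenario looks like.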
The obvious things
Build a light application and pay attention to performance issues all the time. Every query, every piece of script is going to count. Don’t use images if you don’t need to, don’t add design to things that don’t need it. If you are working for a client, keep stressing the performance issue even if you get on their nerves. They will never be grateful: if there are no problems, nobody will thank you. If there are problems, you will be remembered, so you might as well point out the risks ahead of time.
Architecture
In the past year we have built “Elastic Zotonic”, an architecture developed in Erlang that makes it possible to add nodes (virtual servers or hardware) on the fly. Zotonic is a framework built in Erlang which makes it possible for us to build applications and offers us a backend which gives a lot of controls out of the box.
Each Zotonic node does the same as the next one and can fail. We don’t even care if it does. If nodes go down, the others will consolidate missing data. If the node comes back up, it’s great, but if not, the others will be just as happy.
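To make the idea of interchangeable nodes a bit more concrete, here is a toy sketch in Python. It is not how Elastic Zotonic works internally (that is Erlang, and the post does not describe its mechanism); the `Cluster` class, the replica count and the hashing scheme are all assumptions chosen for illustration. Every value is kept on two nodes, and when a node disappears the survivors copy the data around until every value has two homes again.

```python
import hashlib


class Cluster:
    """Toy cluster: every key lives on `replicas` nodes (hypothetical design)."""

    def __init__(self, node_names, replicas=2):
        self.replicas = replicas
        # One in-memory store per live node.
        self.stores = {name: {} for name in node_names}

    def _owners(self, key):
        # Rendezvous hashing: rank the live nodes by hash(node, key)
        # and take the first `replicas` of them.
        def score(node):
            return hashlib.sha256(f"{node}:{key}".encode()).hexdigest()
        return sorted(self.stores, key=score)[: self.replicas]

    def put(self, key, value):
        for node in self._owners(key):
            self.stores[node][key] = value

    def get(self, key):
        for node in self._owners(key):
            if key in self.stores[node]:
                return self.stores[node][key]
        raise KeyError(key)

    def fail(self, node):
        # The node vanishes together with its local data.
        del self.stores[node]
        self._rebalance()

    def join(self, node):
        self.stores[node] = {}
        self._rebalance()

    def _rebalance(self):
        # Survivors pool what they still hold and re-replicate it.
        merged = {}
        for store in self.stores.values():
            merged.update(store)
        for key, value in merged.items():
            self.put(key, value)


cluster = Cluster(["node-a", "node-b", "node-c"])
for i in range(50):
    cluster.put(f"visitor-{i}", i)
cluster.fail("node-b")  # nobody has to care
print(all(cluster.get(f"visitor-{i}") == i for i in range(50)))
```

The sketch only survives one failure at a time between rebalances; a real system also has to deal with conflicting updates and network partitions, which are ignored here.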
Have a stress tester with some good software
In our case we used Visual Studio running off a virtual server running at Rackspace. Rackspace sells server time by the hour, so you can do this quite easily. Stress testing itself is complicated and hard to understand. You’ll need time from the developers to help build a stress script and you’ll need to do it several times. Some user scenarios really can’t be simulated and you’ll have to make some guesses. Though we haven’t used them yet, there are some interesting cloud-based testing products being launched. If you haven’t invested time and money into offline web-stress software, you should definitely go that way.
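For readers who have never seen one, the skeleton of a stress script looks roughly like the sketch below. We used Visual Studio, not Python, so treat this purely as an illustration: the browser percentages, the function names and the fake task are all made up, and a real script would replace `fake_task` with actual HTTP requests that walk through the application.

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical browser mix; use the percentages you see in the real world.
BROWSER_MIX = {"IE": 0.4, "Firefox": 0.3, "Chrome": 0.3}


def simulated_user(task, rng):
    """Pick a browser according to the mix, run one full task, time it."""
    browser = rng.choices(list(BROWSER_MIX), weights=list(BROWSER_MIX.values()))[0]
    start = time.perf_counter()
    task(browser)
    return browser, time.perf_counter() - start


def run_stress_test(task, users, seed=0):
    """Run `users` simulated users at the same time and summarise the timings."""
    rngs = [random.Random(seed + i) for i in range(users)]
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(lambda rng: simulated_user(task, rng), rngs))
    wall = time.perf_counter() - wall_start
    durations = sorted(duration for _, duration in results)
    return {
        "users": users,
        "median_s": statistics.median(durations),
        "slowest_s": durations[-1],
        "tasks_per_s": users / wall,
    }


def fake_task(browser):
    # Stand-in for the real scenario (load page, submit form, ...).
    time.sleep(0.01)


print(run_stress_test(fake_task, users=20))
```

Running it several times with a growing number of users, and watching when the median and the slowest task start to drift apart, is the part that takes the time.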
Work with extremely flexible, cheap hosting
Now I’ll start this by saying: I don’t get commission from Rackspace. I compared several hosters at the start of this project and found their pricing to fit the project (our client kept part of the planning a secret, so charging by the hour makes things easier). Rackspace isn’t all perfect (some of the DNS software had some problems) but their service was great, the pricing easy to understand and they won my heart when I called them about stress testing. I had been warned that if you do extreme testing at a virtual server provider, you could get kicked out. So I called Rackspace to check this and they told me: “Oh… do what you like. We probably won’t even notice it.” What a bold thing to say.
We actually developed a module to automatically configure servers with the Rackspace DNS API, so we might just come back.
Content delivery network
Now I’m old enough to remember the days when bandwidth was a problem for websites. Those days are over for most sites but not for high-performance situations like this one. So that’s why Amazon and Rackspace offer storage for files, be it images, videos or just documents. It takes some extra work to use a CDN (it’s a bit more complicated than just a remote FTP site) but once you’ve got it running, it takes all the stress out of storage.
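One common piece of that extra work, sketched here as an assumption rather than as what we built: files on a CDN tend to be cached for a long time, so a changed file needs a new name. A frequently used trick is to put a hash of the content in the filename. The host name `cdn.example.com` and the function name are placeholders.

```python
import hashlib
from pathlib import PurePosixPath

CDN_BASE = "https://cdn.example.com"  # hypothetical CDN host


def cdn_url(path: str, content: bytes) -> str:
    """Return a CDN URL whose filename changes whenever the content changes."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    original = PurePosixPath(path)
    # logo.png -> logo.<digest>.png
    hashed = original.with_name(f"{original.stem}.{digest}{original.suffix}")
    return f"{CDN_BASE}/{hashed.as_posix().lstrip('/')}"


print(cdn_url("img/logo.png", b"first version of the logo"))
print(cdn_url("img/logo.png", b"second version of the logo"))
```

The two printed URLs differ, so browsers and the CDN fetch the new file instead of serving a stale cached copy.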
Don’t have a server too far away (Ping!)
One reason you might think that hosting at a hoster like Rackspace is a bad idea is that they are far away, and the response time (measured by the ping time) is much higher if the server is far away. Luckily we found out that Rackspace has a datacenter in the UK, making it much faster for us (in Amsterdam) than it would be in the USA. Most of the visitors will be coming from the Netherlands so that’s important. The difference can be quite dramatic. Before (!) we started, we checked a few sites hosted at the datacenter in the UK and found a ping of 7-8 ms, which is fine compared to a server in the US.
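If you can’t run `ping` (ICMP is often blocked), timing a TCP connection gives a comparable number. The sketch below is an approximation and not the check we ran: the function name and defaults are made up, and a TCP handshake measures one round trip plus a little overhead.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port=80, attempts=5, timeout=2.0):
    """Median time in milliseconds to open a TCP connection to host:port."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # Opening the connection costs roughly one network round trip.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Usage (the host name is a placeholder for a site in the datacenter you consider):
#   print(tcp_connect_ms("www.example.com"))
```

Run it from where your visitors are, not from your office, if those are different places.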
Unlimited loadbalancing
Loadbalancing is a tricky subject. You have to configure things and it’s quite common that things go wrong if a loadbalancer loses track of the server. One thing we got from Rackspace is, for all practical purposes, an unlimited loadbalancer. It’s not free and all data is charged for, but it’s cheap enough and it’s one less thing to worry about. What we did to make things a lot simpler is to have a sessionless application. This is certainly not always possible, but if you can, there are lots of reasons for it.
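What “sessionless” can look like in practice: instead of the server remembering the visitor, the visitor carries the state along in a signed token, so the loadbalancer can send every request to any node. This is a generic sketch, not our implementation; the secret, the token layout and the function names are all assumptions.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical secret that every node is configured with.
SECRET = b"shared-secret-known-to-every-node"


def issue_token(state: dict) -> str:
    """Serialise the visitor's state and sign it."""
    payload = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode())
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def read_token(token: str) -> dict:
    """Verify the signature, then return the state carried in the token."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        raise ValueError("token was tampered with")
    return json.loads(base64.urlsafe_b64decode(payload))


def handle_request(token: str, answer: str) -> str:
    """Could run on any node: everything it needs is in the request itself."""
    state = read_token(token)
    state["answers"] = state["answers"] + [answer]
    return issue_token(state)


token = issue_token({"answers": []})
token = handle_request(token, "step 1 done")  # imagine this hits node A
token = handle_request(token, "step 2 done")  # and this one node B
print(read_token(token))
```

Because no node keeps anything in memory between requests, a node that drops out takes no visitor state with it, and the loadbalancer needs no sticky sessions.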
Underpromise
From a less technical perspective: really underpromise. Whatever an individual developer may think, combining technology in general makes things go slower. And if you’re talking hundreds of thousands or millions of users, things will be really massively slower. So be smart and underpromise, add extra hosting budget to your plan, add redundancy and expect the worst. We actually expected our application to run on one third of the hardware that we are using. And when one of the nodes did actually crash during our first major stress test, we were happy to see everything kept running smoothly. Your client will be totally impressed when things are even faster than they expected.
Michiel Klønhammer
Thanks to Marc Worrell and Arjan Scherpenisse!
So looking for an Erlang dream team? Check out www.maximonster.com and get in touch.