Starting freistilbox delivery

Published 2013-01-10 by Jochen Lillich

As we mentioned in our review of 2012, we had to delay the delivery of our new freistilbox infrastructure because we encountered architectural problems. Today, we are happy to announce that after finding some good, long-term solutions, we’ve finally started the rollout of freistilbox clusters.

In this post, we’d like to explain what threw sand into our gears and how we solved the problem.

Idea

On DrupalCONCEPT, we had many services sharing the resources of a single server; in the case of DrupalCONCEPT POWER, we even had Git, Varnish, Apache, Solr and MySQL all running on one machine. Over time, we found that this put too many limits on how far we could optimize performance and scalability. So we decided to run almost every freistilbox service on its own servers, resulting in a completely distributed architecture.

Implementation

From an operations point of view, freistilbox needs a lot more servers than DrupalCONCEPT: in the backend, there are clusters for Git, MySQL, Solr and file storage. Incoming requests are received by load balancers and SSL offloaders, which route them to the customer’s freistilbox cluster. Each of these freistilbox clusters has two servers running Varnish and Memcached, a maintenance server for SSH/SFTP logins and cron jobs, and finally the actual “boxes”, i.e. the application servers running the web applications (Drupal, for example).
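To make the moving parts easier to picture, here’s that topology written down as plain data in a short Python sketch. All host names and the number of application boxes are hypothetical placeholders, not our actual inventory.

    # Illustrative only: the freistilbox topology described above, as plain data.
    # Host names and server counts are made-up placeholders.

    SHARED_INFRASTRUCTURE = {
        "edge":    ["lb01", "lb02"],        # load balancers + SSL offloading
        "git":     ["git01", "git02"],      # code repositories
        "mysql":   ["db01", "db02"],        # databases
        "solr":    ["solr01"],              # search indexes
        "storage": ["files01", "files02"],  # shared file storage
    }

    FREISTILBOX_CLUSTER = {
        "cache":       ["cache01", "cache02"],  # Varnish + Memcached
        "maintenance": ["shell01"],             # SSH/SFTP logins and cron jobs
        "boxes":       ["box01", "box02"],      # application servers (Drupal etc.)
    }

    # A request travels edge -> cache -> boxes; the boxes in turn use the shared
    # Git, MySQL, Solr and file storage clusters in the backend.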

First, these servers need to be provisioned. To make this easy, we’ve built a private cloud infrastructure that we operate on bare metal servers leased from our datacenter partners. Thanks to many years of experience with Chef and virtualization, we were able to implement this quite efficiently.

But what caused us a lot of headaches – and the embarrassing delay in delivery – was that these servers needed to be interconnected on the business process level. On a single DrupalCONCEPT server, it was easy for us to synchronize local processes, for example triggering a code deployment after receiving a Git repository update. On freistilbox, however, this synchronization needs to happen between servers. Let’s take the deployment process as an example again:

  • The customer pushes an update onto the Git server.
  • The Git server then needs to notify the application servers affected by the update.
  • Only these application servers then deploy the changes in parallel, bringing the update online (a rough sketch of this flow follows the list).
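To make this flow concrete, here’s a minimal Python sketch of such a cross-server deployment trigger. The host names, the “deploy-site” command and the repository-to-box mapping are made up for illustration; this is not our actual implementation.

    # Conceptual sketch of the deployment flow from the list above.
    from concurrent.futures import ThreadPoolExecutor
    import subprocess

    def affected_boxes(repo):
        """Look up which application servers serve the site behind this repository."""
        # In a real setup, this mapping would come from a configuration database.
        return {"example-site": ["box01", "box02", "box03"]}.get(repo, [])

    def deploy(box, repo, commit):
        """Ask one application server to check out the new commit."""
        subprocess.run(["ssh", box, "deploy-site", repo, commit], check=True)

    def post_receive(repo, commit):
        """Runs on the Git server after a customer push."""
        boxes = affected_boxes(repo)
        if not boxes:
            return
        # Deploy to all affected boxes in parallel so the update goes live at once.
        with ThreadPoolExecutor(max_workers=len(boxes)) as pool:
            for box in boxes:
                pool.submit(deploy, box, repo, commit)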

At my previous jobs, I had experienced how quickly distributed technologies like CORBA can become complicated and costly, so we tried to find a simpler approach. To make a long story short: try as we might, our simple approaches didn’t work as effectively or as reliably as we needed them to. Finally, we bit the bullet and solved the problem with a full-blown orchestration infrastructure based on MCollective.
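For the curious, this is roughly the pattern an orchestration framework like MCollective provides: a client broadcasts a request over a message bus, and an agent on every server compares the request’s filter against its own facts before executing the action and reporting back. The following Python sketch only illustrates that pattern; it is not MCollective code, and the fact names, message format and “bus” are assumptions.

    # Conceptual sketch of filter-based orchestration over a message bus.
    import json
    import socket

    AGENT_FACTS = {"role": "box", "cluster": "c0123"}  # set individually per server

    def matches(request_filter, facts):
        """An agent only acts if the request's filter matches its own facts."""
        return all(facts.get(key) == value for key, value in request_filter.items())

    def handle_request(raw_message, run_action):
        """Called by the agent for every message broadcast on the bus."""
        request = json.loads(raw_message)
        if not matches(request["filter"], AGENT_FACTS):
            return None  # the request isn't addressed to this server
        result = run_action(request["action"], request["args"])
        return json.dumps({"host": socket.gethostname(), "result": result})

    # The client side publishes a single message and collects the replies, e.g.:
    #   bus.publish(json.dumps({
    #       "action": "deploy",
    #       "args":   {"repo": "example-site", "commit": "abc123"},
    #       "filter": {"cluster": "c0123"},
    #   }))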

Conclusion

We’re sad that this conceptual odyssey has cost us a lot of unplanned effort, time and, worst of all, customer trust. Apparently, we had to be reminded the hard way that it isn’t ideas that count but their execution. We won’t make this mistake again.

On the other hand, we’re very happy that we now have all the components in place that we need to build an awesome managed hosting platform.

To all customers who have been waiting for their freistilboxes: The wait is over. We appreciate your patience more than we can put into words, and we promise to make it worth your while.
