Starting with January, we’re going to post a monthly summary that will give you a bit of insight into what’s happened at freistil IT.
On our managed hosting platform, we’re now hosting 233 websites for which Pingdom reported an average availability of 99.94%. When we take into account that some of our customers chose to run their sites on a single box and/or near the maximum capacity of their cluster, this is a good result.
Our edge routers delivered a total traffic volume of 9.611 TB in January. Let’s see if we’ll crack the 10 TB threshold in February!
In January, we received 181 support requests. We solved 140, leaving a backlog of 41 tickets (most of these are usually pending customer response). Our average reaction time this month is… catastrophic. A single request that went unanswered for half a year because it was covered elsewhere launched our average reaction time to a whopping 250 hours. When we break reaction time down into categories, you can see that this is an absolute outlier. Of all support requests in January, we answered:
- 45% within the first hour,
- 28% in 1–8h,
- 8% in 8–24h, and
- for only 19%, it took us longer than 24h to react.
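To illustrate why a single outlier can wreck an average while the breakdown still looks healthy, here is a minimal sketch. The reaction times in it are made up for illustration and are not our actual ticket data; the `bucket` helper is a hypothetical name that simply mirrors the four categories above.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical reaction times in hours; NOT actual ticket data.
reaction_hours = [0.2, 0.5, 0.5, 0.8, 1.5, 3.0, 6.0, 12.0, 20.0]
# One request that sat unanswered for roughly half a year.
outlier_hours = 182 * 24

with_outlier = reaction_hours + [outlier_hours]


def bucket(hours):
    """Map a reaction time to one of the four categories used above."""
    if hours < 1:
        return "within the first hour"
    if hours < 8:
        return "1-8h"
    if hours < 24:
        return "8-24h"
    return "longer than 24h"


print(f"mean without outlier: {mean(reaction_hours):.1f} h")
print(f"mean with outlier:    {mean(with_outlier):.1f} h")
print(f"median with outlier:  {median(with_outlier):.1f} h")
print(Counter(bucket(h) for h in with_outlier))
```

The mean jumps by orders of magnitude because of one ticket, while the median and the category counts barely move. That is why we report the breakdown alongside the average.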
We’re very happy that the support feedback we got was 100% positive. Here are some notable comments:
- “Competent feedback as always.”
- “Fast and reliable!”
- “freistil is always there when you need them.”
Our team is taking care of 264 servers, resulting in an average of 132 servers per system administrator. We think that’s quite respectable.
Each minute, our monitoring system processes 99518 separate metrics. In January, it sent 1781 alerts to the on-call engineer. This is an enormous amount, 54% more than in December. There were no significant outages during this month, though. As with our support request reaction time, this value is so high because of an anomaly for which we needed to adapt our alert thresholds first.
Jochen attended the first Drupal Dublin meet-up in the new year. The group was treated to some detailed peeks behind the curtains of a huge museum website.
Other notable things
We signed the contract with our first employee! It took us quite some time to work out the right way to put our Results-Only Work Environment into legal writing, but now we’re excited for the first system administrator to join our team.
Overall, it’s been a good start to 2014 and we’re highly motivated to make it a great year!
Any questions? Please post them in the comments!
07 Feb 2014
2014 has gotten off to a good start but it’s still time to do a review of the last year. Learning is a big part of what we do at freistil IT and 2013 did teach us quite a lot of things!
I don’t exaggerate when I say that 2013 started with the most important events of the whole year. My son Richard was born in January and Markus became a #newdad with the arrival of his wonderful daughter Marlene in February. At first, there had been a comfortable span of time between the two estimated birth dates, but Richard decided to take his time while Marlene wasn’t patient at all. This led to the first challenge in 2013: How can we both put our “Family first” principle into practice at the same time by taking care of mothers and children without losing our productivity (or even actual business)? The serious product problems we had at that time made things even more difficult, but I’ll come back to that later. Well, we managed somehow and two beautiful children are celebrating their first birthdays these days.
In June, we went to the Emerald Isle for Drupal Dev Days Dublin. freistil IT was Gold Sponsor and we almost didn’t get our freistilbox banners made in time because of a botched job by the printer. At the event, we launched freistilbox Solo, our virtual freistilbox environment for development and testing. I had arrived a few days early to start scouting for houses because this would be the year my family would finally make a long-held dream come true and move to Ireland.
In August, I flew over with my family and, as luck would have it, we found a house on the very first day of hunting. freistil IT is a distributed team and completely location-independent, so my move didn’t cause any major disruption. I only had to make do with a 3G modem until they switched on our broadband. In Freiburg, I had worked from a shared office in town; now my office is only a few steps from my bedroom. A home office has its own challenges, but if things get too distracting, I simply go to a coffee shop like the one I’m typing this in right now.
Drupal Ireland is a lovely bunch of people and I enjoy the monthly meetups in Dublin. In September, I met a few of the folks at the Prague airport where I had arrived for DrupalCon Europe and not much later, we found ourselves at an Irish pub watching the GAA final. Prague actually was my first trip to eastern Europe and I enjoyed it very much. DrupalCon was a good opportunity to catch up with some of our customers. Not all of the feedback we got in these talks was positive, but that’s not why we do them anyway: Improving starts with learning what we can do better. That’s why we appreciate honest feedback, and in the long run, constructive criticism pays off for both sides.
We didn’t only attend Drupal conferences, though. Since we’re doing cutting-edge IT work, we spent time exchanging knowledge and experience at DevOps Days in London and Berlin as well as at the Open Source Datacenter Conference in Nuremberg. We found these events great for picking up new ideas and sharing what we’ve learned.
During the year, especially in the second half, the reliability of our freistilbox hosting platform suffered a lot from network issues. In order to be more resilient against power or network outages, we had spread our servers over several datacenters. We didn’t experience a single outage on the datacenter level. What happened a lot, though, was the “noisy neighbor” problem: servers on the same network segment that were either origin or target of a DDoS attack which then impacted the whole segment. The only way to solve these problems back then was to notify datacenter staff who then identified and isolated the server involved in the attack; a procedure that usually takes 5 to 10 minutes. We realised that the distribution of our servers exposed our infrastructure too much to these problems and decided to move into dedicated racks that housed only our own servers. With that change, things got much quieter. This is a big advantage our bare-metal infrastructure has over cloud-based solutions: we have full control over where our IT resources are located and how much or how little they share with others. In November, our datacenter provider experienced a massive DDoS attack with no clear target and a traffic volume that caused problems even at their uplink carrier. It was a rough weekend for us and our customers. Interestingly, after that event, we’ve not experienced any other serious network problem at all. Looks like our datacenter provider made some effective changes. Additionally, they officially announced plans to overhaul their network infrastructure and to put anti-DDoS systems in place.
We’re proud to say that we achieved quite a bit in 2013, especially with freistilbox. To be honest, we had started the year with a massive low because we launched the platform before it was ready for prime time. The negative feedback from our customers made it clear that by rushing the launch, we had caused big disappointment and lost a lot of our customers’ trust. Some customers also questioned the decision to divert manpower from the production platform to freistilbox Solo. In consequence, making the system reliable and efficient was our topmost priority during the year and we put a lot of hard work and many hours into it. I’m very happy to say that we have now managed to give freistilbox the high level of quality it should have had from the start. In a round of customer calls I did in December, we got a lot of praise for how well the platform is working and how good a job we do with improving and supporting it. We are incredibly grateful to our customers for all their feedback.
We received 1,693 support requests last year, down 28% from 2,367 in 2012. The first quarter had the most new tickets (479) and Q4 ranked lowest (367). This is a good sign that we’re making progress in improving both the quality and the ease of use of freistilbox.
For a team with only two DevOps engineers, we’ve certainly created a great product for hosting Drupal and WordPress sites. And we’re only able to keep up doing the daily business as well as the tech support and 24/7 on-call because we’ve learned to collaborate effectively. The key to effective teamwork is communication, especially in a distributed team. Judging from the fact that we replaced Campfire as our communication backbone with HipChat, and heavily use Confluence for internal documentation, JIRA for task and project management and Bitbucket as our code repository, Atlassian products seem to fit our needs quite well. Another communication product that I didn’t expect to catch on at first actually became one of our most important channels: Sqwiggle. It took a bit of getting used to the slight “Big Brother” feeling, but we highly appreciate being able to tell at a glance who’s currently at their desk and to start a video call with a single click.
Over the recent months, we also reshuffled our areas of responsibility in order to put our strengths to use more effectively. Markus took over some of my technical tasks and now takes care of daily operations. I’ve assumed a more strategic position in business development.
Laying stronger foundations: We’ve chosen Ireland as our new base of operations because of its growing importance as a tech hub for Europe. Our new company “freistil IT Ltd” will soon take over all business we’ve done so far as the partnership “freistil IT GbR”. The first benefit this change brings for our existing customers is that we’ll now be able to accept credit card payments.
Growing our team: We’re great at automating IT processes but we can’t automate innovation; that needs pure brain power. And for better “load distribution”, we need more “nodes”. We’ve been looking for quite some time and finally found two talented system administrators who are going to join our team over the coming weeks. This means we need to learn quickly about hiring and being a great employer.
Expanding our business: So far, we’ve mostly gained new customers by word of mouth. We love the fact — and we can’t be thankful enough for it — that our customers are so happy with freistilbox that they recommend our hosting platform to their friends and clients. To really expand our customer base internationally, though, we need to increase our sales and marketing activities. Since neither Markus nor I have a strong background in these areas, we decided to get external help. Thanks to an Enterprise Ireland Mentor Grant, a consultant experienced in international business will work with us on our growth strategy over the coming months.
It’s obvious that we’ve learned a lot of things in 2013, sometimes the hard way. We’re thankful for all the good will, feedback, advice and encouragement we’ve received from our families, friends and our customers. Every day, we’re getting better at fulfilling our mission: Making sure that our customers can work efficiently and sleep peacefully. To a great 2014!
 I recently found out that we were destined to go to this conference: Who would have thought that “Markus & Jochen” is an anagram for “Shamrock & June”!
29 Jan 2014
The important aspects for working in a remote team like here at freistil IT are very well explained in Mark Campbell’s blog entry “How to work remotely as a software developer”:
- Setting limits
- Time boxing
I’ve found the communication aspect the most essential one, because a remote team needs to compensate for the distance between coworkers. In an office, there are countless opportunities to have short exchanges to bounce ideas off each other, discuss findings and talk about other things. So, to keep round-trip times short in a virtual office, it’s important to use fast communication channels like HipChat or Sqwiggle. (We’ll expand on how we actually use these tools in later posts.)
There’s another great insight in Mark’s post I can wholeheartedly agree with:
“Working from a coffee shop is great. People will ignore you, you get hopped up on caffeine, and there’s a constant noise level about you.”
Oh, and by the way, Mark’s points are just as valid for system administrators.
(Picture credit: Tracy Ruggles)
06 Jan 2014
From this Saturday (2013-12-21) on, our team will be off recharging.
During the holiday time, we’ll only do emergency support. That means we’ll only handle outages and other incidents that impact the delivery of existing websites.
We’ll resume working on tasks that aren’t connected to such incidents on Monday, January 6th 2014.
17 Dec 2013
We’re now providing web applications with two variants of temporary file directories: one that is shared between boxes and a faster one that is stored locally on each node of a freistilbox cluster.
Our main goal with freistilbox is giving website developers maximum performance at minimum effort. Storage access is an important aspect in web performance tuning and it’s a good idea to avoid expensive disk operations whenever possible. That’s why we decided to store temporary files created by web applications locally. It’s obvious that writing files to a local disk is far faster than shipping them over a network connection to shared storage.
We chose that approach under the premise that temporary files are only created and used during a single content request, for example for uploading a file, aggregating CSS code or compressing data. Under this condition, it doesn’t matter if the next request is handled by another cluster box that has its own separate temporary file directory.
Support requests we got over the recent weeks were a clear indication that this premise was wrong. As it turns out, there are situations where temporary files are expected to persist beyond the lifetime of a single content request.
As an example, there was the customer who noticed that batch operations of the “views_export” Drupal module delivered incomplete data. We found out that the batch process saves intermediate results to the temp space. Since the batch runs were distributed over their boxes, so were the result files. At the end of the batch process, the box doing the final run only found the files that were created on this particular box and so returned corrupted data.
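The effect can be reproduced with a small simulation. This is only a sketch under assumptions: the box names are invented, the round-robin distribution of batch steps is a simplification of how a load balancer might spread requests, and `run_batch` is a hypothetical helper, not code from the Drupal module.

```python
from collections import defaultdict


def run_batch(num_steps, boxes, shared_tmp):
    """Simulate a batch export whose steps are spread over several boxes.

    Each step writes one intermediate chunk to temp space. The box that
    handles the final step assembles the result from the chunks it can see.
    """
    stores = defaultdict(list)  # temp space per box, or one shared store
    for step in range(num_steps):
        box = boxes[step % len(boxes)]  # assumed round-robin distribution
        stores["shared" if shared_tmp else box].append(f"chunk-{step}")
    final_box = boxes[(num_steps - 1) % len(boxes)]
    return stores["shared" if shared_tmp else final_box]


boxes = ["box1", "box2", "box3"]  # hypothetical cluster
local_result = run_batch(6, boxes, shared_tmp=False)
shared_result = run_batch(6, boxes, shared_tmp=True)

print("local temp: ", local_result)   # only the final box's own chunks
print("shared temp:", shared_result)  # all chunks
```

With box-local temp space, the final box sees only the chunks it wrote itself; with a shared temp directory, it sees all of them.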
In order to make sure that even temporary files are visible to all boxes in a consistent way, we decided to relocate them to the shared file storage. The default temporary file directory available at ../tmp, relative to the document root, is now shared.
Obviously this has a significant impact on performance: the data still has to be written to and read from disk, but now on multiple separate storage servers; and the data transfer over the network comes on top. For those customers that don’t need a shared temporary file directory but depend on speedy file handling, we now also provide an alternative in the form of ../tmp_local.
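In application code, the choice between the two directories could look like the sketch below. Only the relative locations `../tmp` and `../tmp_local` come from this post; the helper name `temp_dir` and the example document root are assumptions for illustration.

```python
import posixpath


def temp_dir(docroot, shared):
    """Return the temp directory that sits next to the document root.

    shared=True  -> ../tmp        (visible to all boxes, slower)
    shared=False -> ../tmp_local  (local to one box, faster)
    """
    name = "tmp" if shared else "tmp_local"
    return posixpath.normpath(posixpath.join(docroot, "..", name))


# Hypothetical document root, for illustration only.
docroot = "/srv/www/example/docroot"
print(temp_dir(docroot, shared=True))
print(temp_dir(docroot, shared=False))
```

As a rule of thumb, files that must survive beyond a single request (like batch results) belong in the shared directory, while throwaway files used within one request can go to the local one.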
Oh, and we also put a cleanup process in place that makes sure that temporary files stay true to their name: Files that haven’t been touched for a week are removed automatically.
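For readers who want something similar on their own systems, a cleanup along these lines could be sketched as follows. This is not our production cleanup job; it assumes that “touched” means the file’s modification time, and `remove_stale_files` is a hypothetical name.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 7 * 24 * 60 * 60  # one week


def remove_stale_files(tmp_dir, max_age=MAX_AGE_SECONDS, now=None):
    """Delete regular files under tmp_dir whose mtime is older than max_age.

    Returns the list of removed paths. Directories are left in place.
    """
    now = time.time() if now is None else now
    removed = []
    # Materialise the listing first so deletions don't disturb the walk.
    for path in list(Path(tmp_dir).rglob("*")):
        if path.is_file() and now - path.stat().st_mtime > max_age:
            path.unlink()
            removed.append(path)
    return removed


# Example call (path is a placeholder):
# remove_stale_files("/path/to/tmp")
```

Run from a scheduler such as cron, a function like this keeps temp space from filling up with forgotten files.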
05 Dec 2013
I’m on my way to Dublin where I’ll take the AirCoach bus to Cork. For the coming two days, the local university will be the venue for DrupalCamp Cork. Judging from the list of participants, DrupalCamp Cork is going to be a nice, small gathering of Drupal users from all over Ireland. I like it already.
Tomorrow, I’ll participate actively by giving a talk. I’ve submitted it as “Building a high-performance system stack” but I’ll shorten the title to “Supercharging Drupal”. In this talk, I’ll cover the most common ways of optimizing Drupal performance on the hosting layer.
I’ll also see if there’s an opportunity to demonstrate how easy it is to launch a website on freistilbox.
If you’re in Cork for DrupalCamp, be sure to say hello to me! Who knows, I just might invite you for a beer. And if that’s not enough: Order your new freistilbox cluster during the conference through me and get the first month for free!
07 Nov 2013
One of the most important keys to website performance is caching. That’s why freistilbox includes multiple caching services, first and foremost the Varnish HTTP cache. On our load balancers, Varnish stores all the content your web application allows to be cached. During the lifetime of the respective cache content, Varnish answers incoming requests right from its memory cache instead of forwarding them to the application boxes. This speeds up delivery by orders of magnitude.
But what about these requests that can’t be cached? Of course, not everything can be delivered from the cache all the time. Before it can be cached, content needs to be generated by your web application at least once in a while. So a certain percentage of web requests must be processed by your application. Each request your application receives is assigned to a single Processing Unit (PU). This PU then executes your application code in order to process and respond to the request.
The number of simultaneous requests that can be handled is limited by the total PU available to your freistilbox cluster, which in turn depends on the size and number of boxes that make up your cluster. Up until now, these were the effective PU limits for our freistilbox sizes:
- freistilbox S: 5 PU
- freistilbox M: 15 PU
- freistilbox L: 35 PU
- freistilbox XL: 75 PU
After a number of small freistilbox setups experienced occasional overload, we realised that with 5 PU, a single freistilbox S just doesn’t have enough capacity for production websites (except the most static ones). Since modern web browsers open multiple connections to fetch different assets in parallel, even a single visitor could use up all five available Processing Units and block the website for everyone else. That’s why we recently started to advise customers not to use a single freistilbox S for production websites.
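A quick back-of-the-envelope sketch shows how fast this happens. It assumes six parallel connections per visitor (a common per-host default in desktop browsers) and the worst case that none of the requests can be answered from the cache; `waiting_requests` is a hypothetical helper.

```python
# Assumption: browsers commonly open up to 6 parallel connections per host.
CONNECTIONS_PER_VISITOR = 6


def waiting_requests(visitors, processing_units,
                     connections=CONNECTIONS_PER_VISITOR):
    """Requests that have to wait because every PU is already busy.

    Worst case: all connections hit the application at the same moment
    and none of them is served from the cache.
    """
    in_flight = visitors * connections
    return max(0, in_flight - processing_units)


for visitors in (1, 2, 3):
    print(visitors, "visitor(s) on 5 PU:",
          waiting_requests(visitors, 5), "request(s) waiting")
```

Under these assumptions, a single visitor already keeps all five PUs busy with one request left waiting, and every additional visitor only lengthens the queue.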
That didn’t feel right, though. There shouldn’t be even a single freistilbox configuration that isn’t up to the task. So we decided to increase the PU limits on freistilbox S, M and L. The new specs are as follows:
- freistilbox S: 10 PU (+5)
- freistilbox M: 25 PU (+10)
- freistilbox L: 40 PU (+5)
With 10 PU, a single freistilbox S still won’t be powerful enough to run a busy community website but it should now have enough capacity to reliably serve a medium website that has a decent cache hit ratio.
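To get a feeling for what “enough capacity” means, here is a rough estimate in code. The PU table uses the new limits listed above, with XL kept at its previous 75 PU because only S, M and L were changed. The traffic figures are invented, and the estimate relies on Little’s law (average requests in flight = arrival rate × average processing time), which gives an average and says nothing about peaks.

```python
# New PU limits per box size; XL stays at its previous value.
PU_LIMITS = {"S": 10, "M": 25, "L": 40, "XL": 75}


def cluster_pu(boxes):
    """Total Processing Units of a cluster, given a list of box sizes."""
    return sum(PU_LIMITS[size] for size in boxes)


def average_busy_pu(requests_per_second, cache_hit_ratio, seconds_per_request):
    """Average number of PUs in use, estimated with Little's law.

    Only requests that miss the cache reach the application.
    """
    uncached_rate = requests_per_second * (1 - cache_hit_ratio)
    return uncached_rate * seconds_per_request


# Hypothetical traffic: 50 requests/s, 90% cache hits, 0.5 s per request.
busy = average_busy_pu(50, 0.9, 0.5)
print(f"average busy PU: {busy:.1f} of {cluster_pu(['S'])} on a single S")
print(f"two M boxes provide {cluster_pu(['M', 'M'])} PU")
```

The cache hit ratio is the biggest lever here: at the same traffic and a 50% hit ratio, the average number of busy PUs would be five times higher.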
Oh, and all existing freistilbox customers have already been auto-upgraded!
freistilbox with more power — for the same price. What’s not to love?
What do you think? Leave us a comment below!
17 Oct 2013
Last week, Markus and I returned from DrupalCon Prague to our desks in Germany and Ireland, respectively. It was a fun event and I’d like to tell you about my personal highlights.
First of all, DrupalCon is the biggest event for the Drupal community and the perfect opportunity to see and meet all the people that make Drupal a great open source project. Actually, meeting people was the main reason I flew to Prague. Especially in terms of customer contact, talking in person can’t be beat. Even more so if they praise our services in front of a lot of other Drupal business people. ;-) That’s why I had a great conference start at the CxO meeting on Monday.
The fun continued early Tuesday morning (sadly, too early for many) with “Tutti fan’ Drupal”, a musical play “in which ‘The N00b’, a young, inexperienced web developer meets ‘The Client’ who needs a website”. The hilarious piece also featured “The Drupal Community on a Bad Day”, “Drupalgeno and Drupalgena” as well as “The Drupal Community on a Good Day”. I had a lot of laughs and learned that the highly sought-after “Drupal Talent” also includes fabulous singing voices.
Later, in his “State of Drupal” keynote, project founder Dries Buytaert explained his vision: “Drupal is bigger than technology. It’s an idea.” So, before going into detail about what’s happening around the next major release, Drupal 8, Dries listed what he sees as the most important drivers for our activities:
- “We’re changing the world.”
- “We help individuals build a dream.”
- “We give small organizations a big voice.”
- “We give enterprises a new idea.”
- “We inspire wonder and delight.”
- “We admit no boundaries.”
Especially for me as someone who isn’t directly involved in Drupal development, it was highly interesting to see what technological changes Drupal 8 will bring. And I was amazed by the community support this new release enjoys: With more than 1600 contributors, Drupal 8 in its current pre-alpha stage already has more than twice as many people involved as Drupal 7 had when it was finally released!
The second reason I attended DrupalCon was that I had volunteered to curate its DevOps session track. For months, the DrupalCon content team had done a lot of work to make sure that conference attendees got to select from a wealth of high-quality talks on many different topics. I’d like to thank all speakers I got to work with before and during DrupalCon for their willingness to stand in front of a crowd and share their knowledge. After all, sharing is an integral part of DevOps culture.
During the week in Prague, Markus and I had many valuable conversations with our customers. Quite often, we got critical feedback on our Drupal hosting platform. While criticism isn’t as easy to accept as praise, it’s essential for us in order to achieve better service quality, so we appreciate your constructive openness.
Although Prague was my third DrupalCon, it was the first time I attended Trivia Night on the last conference day. Organised by my new home team, Drupal Ireland, this entertaining event drew so many Drupalistas to the Hilton Hotel that we ended up turning people away because the room was packed with more than 100 people. Alan, you did a tremendous job as MC!
As is tradition, the location of next year’s DrupalCon Europe was announced at the end of this DrupalCon, and we’re looking forward to seeing what the passionate Dutch Drupal community has in store for us. Another important European event, the Drupal Developer Days, will take place in Szeged, Hungary; you’ll probably see us there, too.
Not so long ago, I had some doubts about whether attending DrupalCon was still worth the time and money for me. DrupalCon Prague got rid of them. I’ll see you in Amsterdam!
11 Oct 2013