Last year, I gave a talk at the Open Source Datacenter Conference about how we use Chef to automate our system administration. For us, Chef is the key to efficient IT infrastructure management. Since last year, we have more than doubled the number of our servers while the number of sysadmins has stayed the same.
But having powerful tools is just half of the equation. Over recent months, we have learned (sometimes the hard way…) that having mature processes in place is at least equally important.
That’s why my talk at this year’s OSDC 2012 will be about “Operations and Kanban”. Kanban is an agile task and project management method. There are other well-known agile methods like Scrum, but for IT operations teams, Kanban often fits the bill better.
In my talk, I will explain the history and basics of the Kanban method and elaborate on how IT teams can implement and use it in their daily practice.
At freistil IT, we’ve been using the Kanban method to organize our work for some time now. We started with simple day-to-day tasks and then gradually extended its use to bigger projects. Kanban can be implemented with a simple whiteboard, but since we are a distributed team, we prefer web-based tools. For this purpose, we have found Trello to be an ideal solution. Its design is clearly inspired by the staged Kanban approach, it’s easy to use, there’s an iOS app and it’s free. In my talk, I’ll also give a few examples of how we implemented Kanban with Trello.
Judging from the line-up of presenters and topics, OSDC 2012 will again be a great experience and I’d like to thank the awesome folks at Netways for letting me be a part of it!
You’re going to be at OSDC, too? Drop me a line at firstname.lastname@example.org and let’s talk over a few drinks!
22 Apr 2012
Over recent weeks, we’ve been making a great effort to minimize our reaction times to support requests. And we’re proud of the positive feedback we get from our customers every day!
With this improvement in mind, we’d like to inform you that over the Easter days, we’re going to shift down support capacity a few gears. By “Easter days”, we mean the week from Friday, 6th April (Good Friday) to Friday, 13th April.
During that time, support will be limited mostly to emergency cases and we’ll take the liberty of postponing tasks that aren’t related to service problems by a few days.
We’re going to use (or even utilize) this off-time to regroup, spend some quality time with our loved ones and tackle some of those “when I’ve got the time” projects.
Of course, if there’s an emergency, you’ll always be able to reach a qualified member of our team.
We wish you a happy Easter weekend and some joyful spring days!
Your freistil team
03 Apr 2012
We’d like to notify our DrupalCONCEPT customers about two important changes in our hosting infrastructure:
*Platform-wide upgrade to PHP 5.3
*Improved deployment process
Both changes are explained in detail below.
They will take place between Thursday, 2012-03-22 23:00, and Friday, 2012-03-23 05:00 (CET).
This information will also be distributed via our Tech Info newsletter. If you are the technical contact (webmaster, developer, etc.) for your organization’s website(s) hosted on DrupalCONCEPT, please subscribe!
The PHP 5.3 upgrade affects the DrupalCONCEPT clusters “pro07” to “pro25” as well as “elite5” and “elite6”.
Our newer DrupalCONCEPT servers already run PHP 5.3, but the older ones, mainly those we built in 2010, still have PHP 5.2 installed. The PHP project stopped its support for PHP 5.2 in August 2011 and we’ve decided, for obvious maintenance and security reasons, to upgrade all older servers from PHP 5.2.10 to PHP 5.3.2 (original Ubuntu 10.04 LTS packages).
We had already announced this upgrade for 2012-03-03, but decided to cancel it on short notice. We figure that doing the upgrade during the week is a better choice because it gives us and our customers a better opportunity to quickly spot and eliminate problems caused by the upgrade.
If you have any questions regarding this upgrade, please let us know!
The improved deployment process affects the DrupalCONCEPT clusters “pro07” to “pro19”.
On our newer DrupalCONCEPT servers, we already have an improved website deployment concept in place that reflects what we learned during the first months of operation. Now it’s time to roll out these improvements to the servers that have been running since the early days as well.
The improved deployment concept brings two main advantages:
*A different approach to updating the Drupal installation after a Git repository change prevents failing merge processes that block further updates.
*A modified set of permissions removes Drupal’s write access to the directories controlled by Git. This eliminates problems where file changes made by Drupal cause merge conflicts.
Please be aware that the second change will limit write access to the asset directories under sites/…/files/. We advise you to change your configuration if your Drupal installation still modifies files anywhere outside of these asset directories. Please let us know if you need help correcting this behaviour.
If you have any questions regarding these changes, please get in touch with our tech support. We’ll do our best to find a quick solution.
17 Mar 2012
CeBIT, the world’s biggest trade fair for the ICT industry, will take place in Hannover from this Tuesday, March 6, to Saturday, March 10.
We are sponsoring and actively supporting the Drupal booth and I’d be very happy to meet you! I’ll be there from Tuesday to Thursday; just drop me a line if you’d like to get together.
PS: We’ve still got some free tickets left, so let me know if you need one!
05 Mar 2012
Yesterday morning, my work day started with 7 SMS alerts. They did not originate from an IT infrastructure outage, though. They were escalated support tickets from the weekend.
But let’s take one step back. We regularly ask our customers what we can improve to make their lives easier. Recently, the most common answer was “Shorter waiting times on support requests”. We analyzed our ticket resolution times and found that we needed not only to resolve support requests much faster but also to keep our customers better informed about the state of their issue. The latter becomes especially important if a support request needs extensive research or the involvement of a third party. And much too often, even small tasks took longer to resolve than necessary, mostly because the bigger ones drew all our attention.
So we went to work and built a ticket escalation process.
We created the role of the Ticket Dispatcher, who is responsible for getting new tickets assigned to a team member quickly. This role is assumed by our Sysadmin of the Day, the current on-call engineer. If a ticket doesn’t get assigned within an hour, it’s fed into our alerting system, which means that its assignment is now handled as an operations incident, with SMS and phone alerts and all.
From now on, open tickets must be updated at least once every 24 hours. If a ticket isn’t, the agent it is assigned to gets notified. If they fail to update the ticket for another 24 hours, the ticket’s assignment is removed so another agent can take care of it.
In case we need customer feedback, the requester gets reminded twice that they need to provide additional information to get their issue resolved. If there’s no update after 7 days, the support agent is prompted to contact the customer directly and ask for the information needed.
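The rules above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Ticket` fields, the action names and the function `next_action` are hypothetical and not taken from any help desk tool. The timing of the two customer reminders isn’t specified above, so the sketch only models the 7-day rule, and it assumes that the 24-hour update rule is paused while a ticket waits for customer feedback.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

ASSIGN_WITHIN = timedelta(hours=1)    # unassigned tickets escalate after this
UPDATE_WITHIN = timedelta(hours=24)   # assigned tickets need an update this often
CUSTOMER_WAIT = timedelta(days=7)     # maximum wait for customer feedback


@dataclass
class Ticket:
    # All field names are hypothetical.
    unassigned_since: datetime        # creation time, or when the assignment was removed
    last_update: datetime
    assignee: Optional[str] = None
    waiting_for_customer_since: Optional[datetime] = None


def next_action(ticket: Ticket, now: datetime) -> str:
    """Return the escalation step that applies to the ticket at time `now`."""
    if ticket.assignee is None:
        # Nobody owns the ticket: after one hour it becomes an operations incident.
        if now - ticket.unassigned_since > ASSIGN_WITHIN:
            return "alert_on_call"
        return "none"

    if ticket.waiting_for_customer_since is not None:
        # Assumption: while we wait for the customer, only the 7-day rule applies.
        if now - ticket.waiting_for_customer_since > CUSTOMER_WAIT:
            return "contact_customer"
        return "none"

    idle = now - ticket.last_update
    if idle > 2 * UPDATE_WITHIN:
        return "unassign"             # a second 24 hours passed without an update
    if idle > UPDATE_WITHIN:
        return "notify_assignee"
    return "none"


# A ticket opened on Saturday evening and still unassigned on Monday morning.
monday = datetime(2012, 2, 6, 9, 0)
saturday = datetime(2012, 2, 4, 18, 0)
print(next_action(Ticket(unassigned_since=saturday, last_update=saturday), monday))
# prints "alert_on_call"
```

Running a check like this periodically over all open tickets would produce exactly the kind of alerts described at the beginning of this post.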
We introduced this new ticket escalation process a week ago, and our ticket queue is already shrinking rapidly. We still need to get used to the ticket alarms, but they’re proof that the process is working.
Providing great customer service is our main concern. We’d like to apologize for any unnecessary waiting times we’ve caused in the past and are optimistic that the new escalation process will help us eliminate them. Ticket by ticket.
What do you think? What else can we do to improve your support experience? Feel free to give us some feedback in the comments!
07 Feb 2012
About two weeks ago, we held our periodic strategy days, where we discussed what we can improve to make life better for our customers and for ourselves. One central point became clear right from the beginning: we have to improve our support processes. So over the coming weeks, you will see changes to the way we work and how we handle support.
The first step in this process is to be more transparent about problems and incidents with our Drupal Hosting platform.
Until now, we announced incidents like power outages, network problems or urgent maintenance work in a forum on our Help Center website. But these announcements were hard to find there, and the forum was not the easiest way for us to publish urgent information.
That is why we built a status blog for freistil IT.
There, we will keep you up-to-date on all the incidents our operations team is currently working on. You can subscribe to it using its RSS feed or follow @freistilops on Twitter. Very soon, we will add email notifications as well. So, simply pick the method that fits your needs best.
If you can think of any way to improve this, please tell us by leaving a comment below or sending us an email!
27 Jan 2012
Yesterday, Eugen Mayer of KontextWork told me on IRC that the download archive for the QTip2 jQuery plugin had been compromised and that there are now QTip2 versions with exploit code in the wild. As discussed on GitHub, someone hacked the QTip2 website and added malicious code.
This can also affect Drupal users because QTip2 is a popular jQuery plugin and can be easily integrated into Drupal projects, for example with the QTip module.
So, if you’re using QTip2, especially if you downloaded the plugin between December 8th 2011 and January 10th 2012, we recommend you get a clean current version as soon as possible.
25 Jan 2012
Our newer DrupalCONCEPT clusters already run PHP 5.3, but the older ones, mainly those we built in 2010, still have PHP 5.2 installed. The PHP project stopped its support for PHP 5.2 in August 2011 and we’ve decided, for obvious maintenance and security reasons, to upgrade all our clusters to PHP 5.3.
Affected by this upgrade are all DrupalCONCEPT clusters from pro07 to pro25 and elite4 to elite6.
We’ll upgrade those clusters from PHP version 5.2.10 to 5.3.2 (using the original Ubuntu 10.04 LTS packages).
All upgrades will take place overnight from Saturday, 2012-03-03 23:00 CET, to Sunday, 2012-03-04 05:00 CET.
To prevent problems and outages, we urge all customers who are using the clusters mentioned above to test their websites with PHP 5.3 and to install necessary Drupal and module upgrades in good time before the upgrade deadline. We recommend running those tests in a local test environment or, better yet, on a staging website instance on the DrupalCONCEPT hosting platform.
If you have any questions regarding this upgrade, please let us know! We’ll be more than happy to help you make the change.
24 Jan 2012