Years ago, I did some time helping companies plan and build networks and computer infrastructure. Since then, in several companies, I’ve done my share of operations and IT work, but more recently I’ve found that I prefer to delegate that work to someone who finds it more interesting. I’ll be honest; I find the ever-changing complexity of systems configuration to be a frustrating, never-ending learning problem. (This particularly applies to mail server management.)
Over the past month I’ve been playing with Chef and the Opscode Platform and found that I absolutely love learning a new framework that gives me new development tools for solving an existing problem. This means that my argument, above, is entirely irrational. Let me put that another way: Chef is damn cool.
What is it?
Briefly, Chef lets you create configurations and write Ruby(-ish) code to define your operations infrastructure. Opscode Platform gives you a place to keep all that code and configuration (you can also choose to run this in-house if that makes sense). Finally, there’s the public cookbook repository, where people share their “Recipes” for managing parts of the infrastructure.
Opscode Platform and the public cookbooks are still a work in progress (they’re still in Beta, as of this writing), but the Opscode folks and the 50 companies authorized to commit to the public project are working rapidly to stabilize, normalize and extend them. I struggled with some of the core concepts, but now that I’ve gotten through them, I love what I can do.
How do you use it?
Let’s say I’m running a project that needs expansion capabilities in the cloud. I have my configurations (“Recipes” and “Roles”) complete and my Amazon Web Services account configured and connected to my Opscode/Chef installation. I need a new machine, right now, that is just like the current web service processor we’re running:
knife ec2 server create 'role[web_service]'
Three minutes later I have deployed a new system with all software installed and configured automatically from “bare metal”. This isn’t an image; it’s installing the software through commands.
Images are faster, and there’s a place for them in scalable infrastructures. But images also need to change, and that’s where Chef fits in: it lets you incrementally update your images.
Or, for example, let’s say I wanted to spin up that same system on Rackspace:
knife rackspace server create 'role[web_service]'
You can’t do that with images.
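To make the 'role[web_service]' argument a little less magical: in Chef, a run list can name recipes and roles using the recipe[...] and role[...] syntax, and a role can include other roles. Here is a toy sketch, in plain Ruby, of how such a run list might flatten into an ordered list of recipes. The role contents and recipe names (base, ntp, users, apache2, my_app) are made up for illustration, and this is not Chef's own implementation.

```ruby
# Toy model only: the role and recipe names below are hypothetical,
# and this is not how Chef itself is implemented.
ROLES = {
  "base"        => ["recipe[ntp]", "recipe[users]"],
  "web_service" => ["role[base]", "recipe[apache2]", "recipe[my_app]"]
}.freeze

# Expand a run list into a flat, ordered list of recipe names.
# Nested roles are expanded in place; a recipe listed twice is kept once.
def expand_run_list(run_list, roles = ROLES, seen_roles = [])
  run_list.flat_map do |item|
    case item
    when /\Arecipe\[(.+)\]\z/
      [Regexp.last_match(1)]
    when /\Arole\[(.+)\]\z/
      name = Regexp.last_match(1)
      raise ArgumentError, "unknown role: #{name}" unless roles.key?(name)
      raise ArgumentError, "role cycle at: #{name}" if seen_roles.include?(name)
      expand_run_list(roles[name], roles, seen_roles + [name])
    else
      raise ArgumentError, "bad run-list item: #{item}"
    end
  end.uniq
end

puts expand_run_list(["role[web_service]"]).inspect
# prints ["ntp", "users", "apache2", "my_app"]
```

The point of the sketch is that the role name is the only thing that changes hands between providers; the list of what gets installed lives in your repository, not in a machine image.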
Initially, I wanted the Opscode Platform to be more like Scalr; that is, to choose configurations and deploy nodes that just worked. It doesn’t work that way and mostly that’s because the Opscode team really understands what real, large-scale systems management is like.
A core concept in the framework is that any public Cookbook that you wish to use, you branch (with git) before adding to your own repository. That’s because no Recipe is ever going to work out of the box for everyone. Even if it does work out of the box for you, chances are it won’t always. Being able to branch the Cookbooks means you can modify them for your environment, but still pull/merge updates as the Cookbook is enhanced by the author. It also means you can make your modifications available to the author so that the Recipes become more generally useful and are built with a community.
It’s hard to produce good documentation for APIs, much less tools and services. Opscode is working on that, but at the time of writing, their documentation is spread across two or three locations, making it hard to find pertinent information. Several times, I’ve had to dig into the code (which, thankfully, is open source) to figure out how things are supposed to work. To be fair, I have also done this with every other framework that I’ve ever used. I used to keep a copy of Mono around so I could see how .NET interfaces might be coded because it helped me understand what I was supposed to be doing to use them.
While it’s easier to write a chapter in a book about it than to actually build intention-revealing interfaces, that’s a challenge that Opscode has ahead of them. I should be clear that I come from a software development background and probably have vastly different expectations about interfaces than most of the folks in their target market. (For example, I prefer objects and inheritance to procedural code.) When getting started, be sure to read the material on the basics. The core concepts are important. Also, keep a copy of the Knife man page on hand; it’s chock-full of important details.
The value proposition
Chef and the Opscode Platform are broad enough that there are lots of different value opportunities. Here are the three that I think are most important, at the executive level.
1. If your infrastructure is done with Chef then you have the ultimate disaster-recovery plan in place. If anything should fail, you can roll out all (or part) of your infrastructure with a few quick commands. This includes rebuilding it from “bare metal” at a new provider.
2. Chef is a way of documenting your infrastructure through code. This means that it not only provides a clear description of how your infrastructure is built, it (hopefully) matches your infrastructure and has the added benefit of being tracked in version control.
3. Opscode Platform is an off-site, highly available storage location for your critical IT knowledge.
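Point 2 rests on the idea that a description kept in code can be compared with what is actually on a machine, and only the differences acted on. A minimal sketch of that idea in plain Ruby follows. The package names, version numbers, and the plan_changes helper are all hypothetical; this is a toy model, not Chef's data model or API.

```ruby
# Toy model only: the package names, versions, and hash layout are
# hypothetical and are not Chef's data model or API.

# Compare the state described in code with the state observed on a machine
# and return only the changes needed to make the machine match.
def plan_changes(desired, actual)
  changes = []
  desired.each do |pkg, version|
    if !actual.key?(pkg)
      changes << [:install, pkg, version]
    elsif actual[pkg] != version
      changes << [:change, pkg, version]
    end
  end
  changes
end

desired = { "apache2" => "2.2.14", "ruby" => "1.8.7" }
actual  = { "apache2" => "2.2.12" }

plan = plan_changes(desired, actual)
plan.each { |action, pkg, version| puts "#{action} #{pkg} #{version}" }

# Once the machine matches the description, there is nothing left to do.
puts plan_changes(desired, desired).empty?
```

Because the description is just a file, it can sit in version control next to everything else, which is where the "documentation that matches reality" benefit comes from.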
I’m hard-pressed, at this point, to consider running an IT environment without Chef, at any scale. Even if it’s one server and two dev boxes, starting with Chef means you’ll scale with Chef.