Chef vs Ansible: Which Tool?

I have spent the last month evaluating configuration management tools, and I have narrowed it down to two: Chef and Ansible. Both are excellent. Both could solve our problems. But they are fundamentally different in philosophy, and choosing between them is harder than I expected.

Let me share what I have learned from actually using both, not just reading documentation.

Why We Need Configuration Management

Right now, our server configuration process is a collection of shell scripts and manual procedures. It works, but it does not scale. When you manage fifty servers, scripts are fine. When you manage hundreds, you need something more structured.

Configuration management tools let you define the desired state of your servers in code. Instead of writing a script that says "install Apache, then edit this config file, then restart the service," you write a declaration that says "Apache should be installed, this config file should contain these lines, and the service should be running." The tool figures out how to make reality match your declaration.

This is a subtle but powerful distinction. Scripts describe steps. Configuration management describes outcomes. If Apache is already installed, a script might try to install it again and fail. A configuration management tool checks, sees it is already there, and moves on.
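
To make the difference concrete, here is a minimal Python sketch of the "check first, then change" idea. It is only an illustration, not how Chef or Ansible is implemented, and the function name ensure_line and the NTP config file are hypothetical.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file at `path`.

    Returns True if the file had to be changed, False if it already
    matched the desired state.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # desired state already holds, so do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


if __name__ == "__main__":
    # Hypothetical config file in a temp directory, so the demo is safe to run.
    conf = Path(tempfile.mkdtemp()) / "ntp.conf"
    print(ensure_line(conf, "server 0.pool.ntp.org"))  # first run changes the file
    print(ensure_line(conf, "server 0.pool.ntp.org"))  # second run finds nothing to do
```

A naive script would append the line on every run. The declarative version can run as often as you like and the file ends up the same.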

Chef: The Ruby-Powered Workhorse

Chef has been around since 2009 and has significant adoption in the enterprise. It is written in Ruby, and its configuration files (called recipes, organized into cookbooks) are essentially Ruby code with a DSL on top.

Here is what a simple Chef recipe looks like:

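# Install the Apache package if it is not already present
package 'httpd' do
  action :install
end

# Enable the service at boot and make sure it is running
service 'httpd' do
  action [:enable, :start]
end

# Render the config from an ERB template; restart Apache only when the file changes
template '/etc/httpd/conf/httpd.conf' do
  source 'httpd.conf.erb'
  notifies :restart, 'service[httpd]'
end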

The good things about Chef: it is mature, well-documented, and has a large community. The cookbook ecosystem (their equivalent of plugins) is extensive. Opscode, the company behind Chef, provides a hosted Chef server that eliminates the operational overhead of running your own.

The bad things: it requires a dedicated Chef server as a central management point. Every managed node runs a Chef client that periodically connects to the server, downloads its configuration, and applies it. Setting up and maintaining the Chef server is a non-trivial task. It requires PostgreSQL, Solr, RabbitMQ, and several other components.

And then there is Ruby. I do not dislike Ruby, but my team does not know Ruby. Asking them to learn a new programming language to manage servers is a significant ask. Yes, you can write basic Chef recipes without deep Ruby knowledge, but the moment you need to do something complex, you are writing Ruby.

Ansible: The SSH-Based Newcomer

Ansible was released in 2012 by Michael DeHaan (who previously created Cobbler and co-created Func while at Red Hat). It takes a radically different approach.

There is no central server. No agents on managed nodes. Ansible connects to your servers over SSH, which is already there, and executes tasks. Your configuration is written in YAML, which is arguably the simplest configuration language in existence.

Here is the equivalent Ansible playbook:

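# 'webservers' is an assumed inventory group name
- hosts: webservers
  tasks:
    - name: Install Apache
      yum: name=httpd state=present

    - name: Configure Apache
      template: src=httpd.conf.j2 dest=/etc/httpd/conf/httpd.conf
      notify: restart httpd

    - name: Start Apache
      service: name=httpd state=started enabled=yes

  # Runs once at the end of the play, only if a task notified it
  handlers:
    - name: restart httpd
      service: name=httpd state=restarted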
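
The notify line here, like notifies in the Chef recipe, means "restart Apache only if this task actually changed something." A rough Python sketch of that pattern follows. The run_play function and the task tuples are hypothetical names of mine, and the handler ordering is simplified compared to what the real tools do.

```python
from typing import Callable, Dict, List, Tuple

# A task is (name, function returning True if it changed something,
# list of handler names to notify on change).
Task = Tuple[str, Callable[[], bool], List[str]]


def run_play(tasks: List[Task], handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Run tasks in order, then run each notified handler exactly once.

    Returns the names of the handlers that ran.
    """
    notified: List[str] = []
    for _name, apply, notify in tasks:
        if apply():  # only a task that changed something triggers handlers
            for handler in notify:
                if handler not in notified:  # de-duplicate notifications
                    notified.append(handler)
    for handler in notified:
        handlers[handler]()
    return notified


if __name__ == "__main__":
    restarts: List[str] = []
    handlers = {"restart httpd": lambda: restarts.append("httpd")}
    tasks: List[Task] = [
        ("Install Apache", lambda: False, []),                  # already installed
        ("Configure Apache", lambda: True, ["restart httpd"]),  # template changed
        ("Start Apache", lambda: False, []),                    # already running
    ]
    print(run_play(tasks, handlers))
```

If no task reports a change, no handler runs, which is why a second run of the same configuration does not bounce the service.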

The good things about Ansible: almost no infrastructure is required beyond SSH and a Python interpreter on the managed nodes, which most Linux distributions already ship. My entire team already knows YAML (or can learn it in an afternoon). The learning curve is dramatically flatter than Chef's. And the agentless architecture means no dedicated agent to install and maintain on every managed node.

The bad things: it is young. The community is smaller than Chef's, though growing rapidly. The module ecosystem is less mature. Performance can be a concern because SSH-based execution is inherently slower than a local agent. And because there is no persistent agent, Ansible only enforces configuration when you run it, not continuously.
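
To get a feel for the performance concern, here is a back-of-the-envelope Python model. It assumes hosts are processed in parallel batches (Ansible's forks setting controls the batch size) and that every task takes roughly the same time on every host. Both are simplifications, and all the numbers in the demo are made up for illustration, not measurements from my lab.

```python
import math


def estimate_run_seconds(hosts: int, tasks: int, seconds_per_task: float, forks: int) -> float:
    """Rough wall-clock estimate for a push-style run.

    Each task is applied to the hosts in batches of `forks`, so one task
    costs about ceil(hosts / forks) * seconds_per_task.
    """
    if hosts < 0 or tasks < 0 or forks < 1:
        raise ValueError("hosts and tasks must be >= 0 and forks >= 1")
    batches = math.ceil(hosts / forks)
    return tasks * batches * seconds_per_task


if __name__ == "__main__":
    # Hypothetical fleet: 300 hosts, 20 tasks, 2 seconds per task per host.
    for forks in (5, 50):
        minutes = estimate_run_seconds(300, 20, 2.0, forks) / 60
        print(f"forks={forks}: about {minutes:.0f} minutes")
```

The point of the model is only that run time grows with the number of batches, so the degree of parallelism matters far more than anything else once the fleet gets large.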

My Testing Process

I did not just read documentation. I set up both tools in our lab environment and used them to manage a set of ten test servers. Same tasks in both: install base packages, configure NTP, set up user accounts, harden SSH, configure iptables, install and configure the Nagios agent.

Chef took me three days to get fully operational. One day to set up the Chef server (including the PostgreSQL and Solr dependencies). One day to write the cookbooks. One day to troubleshoot issues, mostly related to the Chef client configuration on the nodes and some Ruby syntax errors in my recipes.

Ansible took me four hours. I installed it on my workstation with pip, wrote the playbook, created an inventory file listing my servers, and ran it. It just worked. When something failed, the error messages told me exactly what went wrong and on which server.

The Decision Factors

For a large organization with a dedicated DevOps team and existing Ruby expertise, Chef is probably the better choice. It is more mature, recipes have the full power of Ruby behind them when you need complex logic, and the continuous enforcement model (where the agent runs every thirty minutes and corrects any drift) is valuable for compliance-heavy environments.
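
Drift correction is easy to picture in code. The following Python sketch compares a desired state with what is actually on the machine and fixes only the differences. The dictionaries stand in for real system state, and find_drift and converge are hypothetical names, not part of Chef.

```python
from typing import Dict


def find_drift(desired: Dict[str, str], actual: Dict[str, str]) -> Dict[str, str]:
    """Return the settings whose actual value differs from the desired one."""
    return {key: value for key, value in desired.items() if actual.get(key) != value}


def converge(desired: Dict[str, str], actual: Dict[str, str]) -> Dict[str, str]:
    """Correct any drift in place and report what was changed."""
    drift = find_drift(desired, actual)
    actual.update(drift)
    return drift


if __name__ == "__main__":
    desired = {"PermitRootLogin": "no", "PasswordAuthentication": "no"}
    # Someone edited the live config by hand since the last run.
    actual = {"PermitRootLogin": "yes", "PasswordAuthentication": "no"}
    print(converge(desired, actual))  # reports the setting that drifted
    print(converge(desired, actual))  # nothing left to correct
```

An agent-based tool effectively calls something like converge on a timer. With an on-demand tool, the same correction happens only when someone (or a cron job) triggers a run.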

For my team, right now, Ansible is the clear winner. Here is why:

No new infrastructure. We are already stretched thin managing client servers. Adding a Chef server to our operational burden is not attractive.

No new language. My team knows Linux, knows bash, and can learn YAML in an afternoon. Becoming productive in Ruby would take weeks or months.

Immediate value. I can start using Ansible in production today. It works over SSH, which we already have configured with key-based authentication across our entire fleet.

Lower risk. If Ansible does not work out, we have lost nothing. No servers to decommission, no agents to remove, no infrastructure to tear down. If Chef does not work out, we have a Chef server and agents on every node that need to be cleaned up.

What I Am Missing

I should mention that I also looked briefly at Puppet, which is the oldest of the four tools I considered and the most established. Puppet has a lot going for it, particularly its mature ecosystem and strong enterprise support. But its custom DSL felt unnecessarily complex compared to Ansible's YAML, and the agent-based architecture has the same overhead issues as Chef.

I also looked at SaltStack, which is interesting because it uses a message bus (ZeroMQ) for communication instead of SSH. This makes it faster than Ansible at scale. But it is almost as new as Ansible (its first release was in 2011) and, from what I could see, it has the smallest community of the four.

Moving Forward

I am going with Ansible. I have already started converting our post-install scripts into Ansible playbooks. The first playbook handles base server configuration, and it runs against our entire fleet in about twelve minutes.

Twelve minutes to ensure that hundreds of servers have consistent NTP configuration, consistent SSH hardening, consistent user accounts, and consistent monitoring setup. That used to take days of manual work and was never truly consistent.

I will write more about our Ansible journey as it progresses. For now, I am just grateful that someone built a configuration management tool that respects the Unix philosophy: do one thing well, use existing tools (SSH), and keep it simple.