Ansible, Puppet, Chef: The Configuration Management Wars

I manage a growing fleet of Linux servers, and the question I keep coming back to is: how do I keep them consistent? When you have five servers, you can SSH into each one and configure them by hand. When you have fifty, that stops working. When you have five hundred, it is impossible.

Configuration management tools solve this problem. You describe what your servers should look like in code, and the tool makes it happen. But which tool? Right now, there are three major contenders: Puppet, Chef, and the new kid, Ansible. I have been using all three in my lab environment, and I want to share what I have found.

The Problem

Let me describe the problem concretely. I have a fleet of web servers that need to be identically configured. They need:

A specific version of nginx installed and configured
Application code deployed to the right directory
SSL certificates in the right place
Log rotation configured
Monitoring agents installed and running
Firewall rules set correctly
System packages updated to specific versions

When one of these things drifts on a single server, bad things happen. A user hits the one server with an outdated config and sees an error. A security patch is missing on three servers out of fifty. The monitoring agent crashes on a server and nobody notices for a week because nobody checks manually.

Configuration management tools solve this by treating infrastructure as code. You write a description of what the server should look like, and the tool enforces it. If something drifts, the next run detects the difference and corrects it. If you need to change something, you change the code once and the tool applies it to every server.
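To make "describe the state, let the tool enforce it" concrete, here is a toy model of the convergence idea in Python. It is a sketch under loose assumptions, not how any of the three tools is implemented: the State type, plan, and converge are hypothetical names, and a resource is just a string mapped to the value it should have. The property to notice is idempotence: a second run against an already-correct server changes nothing.

```python
from typing import Dict, List

# Hypothetical, simplified model: each resource name maps to the value it should have.
State = Dict[str, str]


def plan(desired: State, actual: State) -> List[str]:
    """Return the resources whose actual value differs from the desired one."""
    return sorted(name for name, want in desired.items() if actual.get(name) != want)


def converge(desired: State, actual: State) -> List[str]:
    """Correct any drift in `actual` and report which resources were changed."""
    changed = plan(desired, actual)
    for name in changed:
        # Stand-in for the real work: installing a package, rewriting a file, starting a service.
        actual[name] = desired[name]
    return changed


desired = {
    "package:nginx": "installed",
    "file:/etc/nginx/nginx.conf": "v2",
    "service:nginx": "running",
}
actual = {
    "package:nginx": "installed",
    "file:/etc/nginx/nginx.conf": "v1",  # someone edited the config by hand
    "service:nginx": "stopped",          # the service crashed
}

print(converge(desired, actual))  # ['file:/etc/nginx/nginx.conf', 'service:nginx']
print(converge(desired, actual))  # [] (second run: nothing has drifted)
```

Real tools do the "check, then change only if needed" step per resource type, which is what makes it safe to run them over and over.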

Puppet

Puppet has been around since 2005, making it the oldest of the three. It uses its own declarative language (Puppet DSL) to describe system state.

A Puppet manifest for installing and configuring nginx looks something like this:

class nginx {
  package { 'nginx':
    ensure => installed,
  }

  file { '/etc/nginx/nginx.conf':
    ensure  => file,
    content => template('nginx/nginx.conf.erb'),
    require => Package['nginx'],
    notify  => Service['nginx'],
  }

  service { 'nginx':
    ensure => running,
    enable => true,
  }
}

The declarative model is Puppet's greatest strength. You do not tell Puppet how to install nginx; you tell it that nginx should be installed. Puppet figures out whether to use apt-get or yum based on the operating system. You describe the desired state, and Puppet makes it so.
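The "Puppet figures out apt-get or yum" step can be pictured as a lookup from a fact about the node to a concrete command. The sketch below is a hypothetical illustration of that idea only: INSTALL_COMMANDS and install_command are invented names, the mapping covers just two OS families, and it skips the "is it already installed?" check that a real provider performs before doing anything.

```python
from typing import Dict, List

# Hypothetical mapping from OS family to the command a package provider would run.
INSTALL_COMMANDS: Dict[str, List[str]] = {
    "Debian": ["apt-get", "install", "-y"],
    "RedHat": ["yum", "install", "-y"],
}


def install_command(os_family: str, package: str) -> List[str]:
    """Translate the declarative 'package should be installed' into a concrete command."""
    try:
        # Build a new list so the shared mapping is never mutated.
        return INSTALL_COMMANDS[os_family] + [package]
    except KeyError:
        raise ValueError(f"no package provider for OS family {os_family!r}") from None


print(install_command("Debian", "nginx"))  # ['apt-get', 'install', '-y', 'nginx']
print(install_command("RedHat", "nginx"))  # ['yum', 'install', '-y', 'nginx']
```

The manifest author only ever writes "nginx: installed"; the translation table lives in the tool.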

The architecture is client-server. A Puppet master holds the manifests, and a Puppet agent on each managed node checks in periodically (every 30 minutes by default), receives a compiled catalog describing the resources it should have, applies it, and reports back. This pull-based model scales well, but it introduces a delay between making a change and seeing it applied.
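How long is that delay? A rough back-of-the-envelope model, assuming every agent runs on a fixed schedule (Puppet's runinterval, 30 minutes by default) and ignoring how long each run takes, is that a change reaches each node at that node's next check-in. The function names and the agent offsets below are hypothetical, chosen only to illustrate the arithmetic.

```python
import math
from typing import Iterable

RUN_INTERVAL = 30 * 60  # seconds; the 30-minute default described above


def next_checkin(change_time: float, first_checkin: float, interval: float = RUN_INTERVAL) -> float:
    """Earliest agent run at or after change_time, for an agent whose runs start at first_checkin."""
    if change_time <= first_checkin:
        return first_checkin
    runs = math.ceil((change_time - first_checkin) / interval)
    return first_checkin + runs * interval


def fleet_delay(change_time: float, first_checkins: Iterable[float], interval: float = RUN_INTERVAL) -> float:
    """Seconds until the last agent in the fleet has picked up a change."""
    return max(next_checkin(change_time, t, interval) for t in first_checkins) - change_time


# Three agents whose schedules are offset by 10 minutes; the change lands at t = 1300 s.
print(fleet_delay(1300, [0, 600, 1200]))  # 1700
```

In the worst case, a change made just after an agent's run waits almost a full interval on that node.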

The downsides: Puppet's DSL has a learning curve. It is not a general-purpose programming language, and when you need to do something that the DSL was not designed for, you end up writing custom types and providers in Ruby, which is a different kind of complexity. The master-agent architecture requires infrastructure: you need to run and maintain the Puppet master itself.

Chef

Chef launched in 2009 and took a different philosophical approach. Where Puppet uses a custom DSL, Chef recipes are Ruby: the resource syntax is a DSL built on top of the language, so the full power of a programming language is always available.

The same nginx configuration in Chef:

package 'nginx' do
  action :install
end

template '/etc/nginx/nginx.conf' do
  source 'nginx.conf.erb'
  notifies :restart, 'service[nginx]'
end

service 'nginx' do
  action [:enable, :start]
end

Chef's use of Ruby is both its strength and its weakness. If you know Ruby, Chef feels natural and powerful. You can use loops, conditionals, data structures, and libraries. If you do not know Ruby, Chef requires you to learn a programming language before you can manage your servers.

Chef also uses a client-server architecture. The Chef Server stores cookbooks (Chef's term for configuration packages), and Chef clients on each node pull and apply them. Like Puppet, this means running and maintaining a server component.

The Chef community is active, and the ecosystem of community cookbooks is substantial. Need to configure MySQL, PostgreSQL, Java, Apache, or almost any common service? There is probably a community cookbook for it.

Ansible

Ansible is the newest of the three, released in 2012. It takes a radically different approach, and it is the one I find most interesting.

The same nginx configuration in Ansible:

- name: Configure nginx
  hosts: webservers
  become: yes
  tasks:
    - name: Install nginx
      apt:
        name: nginx
        state: present

    - name: Configure nginx
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: restart nginx

    - name: Ensure nginx is running
      service:
        name: nginx
        state: started
        enabled: yes

  handlers:
    - name: restart nginx
      service:
        name: nginx
        state: restarted

YAML. That is what Ansible uses for its playbooks. Not a custom DSL, not a programming language. YAML, plus Jinja2 for templates such as nginx.conf.j2. If you can read a configuration file, you can read an Ansible playbook.
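The one part of the playbook that is not obvious from the YAML alone is the notify/handler pair. As I understand Ansible's behavior, a handler runs only if a task that notifies it actually reported a change, runs once even if notified several times, and runs after all the play's tasks, in the order the handlers are defined. The Python sketch below models just that rule; run_play and the Task shape are hypothetical, and it ignores failed tasks and explicit handler flushing.

```python
from typing import Callable, Dict, List, Tuple

# A task is (name, action, handlers to notify); the action returns True if it changed something.
Task = Tuple[str, Callable[[], bool], List[str]]


def run_play(tasks: List[Task], handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Run every task, then run each notified handler once; return the handlers that ran."""
    notified = set()
    for _name, action, notifies in tasks:
        if action():  # only a task that reports a change triggers its handlers
            notified.update(notifies)
    # Handlers run after all tasks, once each, in the order they are defined.
    ran = [name for name in handlers if name in notified]
    for name in ran:
        handlers[name]()
    return ran


restarts: List[str] = []
handlers = {"restart nginx": lambda: restarts.append("nginx")}

config_changed = [
    ("Install nginx", lambda: False, []),
    ("Configure nginx", lambda: True, ["restart nginx"]),
    ("Ensure nginx is running", lambda: False, []),
]
print(run_play(config_changed, handlers))  # ['restart nginx']

config_unchanged = [("Configure nginx", lambda: False, ["restart nginx"])]
print(run_play(config_unchanged, handlers))  # []
print(restarts)  # ['nginx']
```

This is why an unchanged nginx.conf never causes a needless restart, and the same idea sits behind Puppet's notify and Chef's notifies.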

But the most important difference is architectural: Ansible is agentless. There is no server component to install and maintain, and no agent to install on managed nodes. Ansible connects to each server over SSH, copies over the small modules a task needs, runs them with Python, cleans up, and disconnects. The only requirements on the managed node are an SSH server and Python, both of which are already present on virtually every Linux server.
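The transport idea underneath "agentless" is ordinary SSH: open a connection, run something, exit, and leave no daemon behind. The sketch below shows only that idea, not Ansible's implementation (which also handles module transfer, privilege escalation, and connection reuse). The ssh_argv and run_remote helpers, the host name, and the user are hypothetical; BatchMode=yes is a standard OpenSSH option that makes ssh fail rather than prompt for a password.

```python
import subprocess
from typing import List


def ssh_argv(host: str, command: str, user: str = "") -> List[str]:
    """Build the argument list for running one command on one host over SSH."""
    target = f"{user}@{host}" if user else host
    # BatchMode=yes makes ssh fail immediately instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", target, command]


def run_remote(host: str, command: str, user: str = "") -> subprocess.CompletedProcess:
    """Connect, run the command, and disconnect; no agent is left behind on the host."""
    return subprocess.run(ssh_argv(host, command, user), capture_output=True, text=True, timeout=60)


print(ssh_argv("web01.example.com", "df -h", user="deploy"))
# ['ssh', '-o', 'BatchMode=yes', 'deploy@web01.example.com', 'df -h']
```

Calling run_remote needs a reachable host and key-based SSH access, which is exactly the prerequisite Ansible relies on.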

This is a huge practical advantage. Setting up Puppet or Chef requires deploying and maintaining infrastructure for the tool itself. Setting up Ansible means installing it on your laptop and writing a playbook. You can go from zero to managing servers in minutes, not hours.

The push-based model means changes are applied when you run the playbook, not on a periodic schedule. You make a change, you run the playbook, and the change is applied immediately across your fleet. No waiting for agents to check in.

My Experience

I have been running all three in parallel on a test environment of about twenty servers. Here are my observations.

Learning curve: Ansible is the easiest to pick up. I had a working playbook managing my test servers within an afternoon. Puppet took about two days to get comfortable with the DSL. Chef took the longest because I had to learn enough Ruby to be productive.

Day-to-day operations: Ansible feels the most natural for ad-hoc tasks. Need to check disk space on all servers? Run a one-line ad-hoc command such as ansible all -a "df -h". Need to deploy an emergency patch? Run a playbook. The SSH-based, push-model approach fits how I already think about server management.

Scale concerns: In theory, the pull-based model that Puppet and Chef share should scale better for very large environments, because each agent does its own work on its own schedule. Ansible pushes from a single control machine over SSH. It works on several hosts in parallel (the forks setting, 5 by default), but that parallelism is bounded, so pushing to thousands of servers could be slow.
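A crude way to see why bounded parallelism matters: if a control machine handles hosts in waves of at most forks at a time (Ansible's forks setting defaults to 5), wall-clock time grows with the number of waves. The sketch below is a simplified model, not Ansible's scheduler; estimated_wall_time, push_to_fleet, and the timing numbers are hypothetical, and a thread pool stands in for parallel SSH connections.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def estimated_wall_time(hosts: int, forks: int, seconds_per_host: float) -> float:
    """Rough estimate: hosts are processed in waves of `forks` at a time."""
    return math.ceil(hosts / forks) * seconds_per_host


def push_to_fleet(hosts: List[str], apply: Callable[[str], str], forks: int = 5) -> Dict[str, str]:
    """Apply a change to every host, with at most `forks` workers running at once."""
    with ThreadPoolExecutor(max_workers=forks) as pool:
        # pool.map returns results in the same order as `hosts`.
        return dict(zip(hosts, pool.map(apply, hosts)))


# 1,000 hosts at a hypothetical 10 seconds each:
print(estimated_wall_time(1000, 5, 10))   # 2000
print(estimated_wall_time(1000, 50, 10))  # 200
```

Raising forks helps until the control machine's CPU, memory, or network becomes the bottleneck, which is the ceiling the pull model avoids by spreading the work across the agents.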

Community and ecosystem: Puppet and Chef have more mature ecosystems with more community modules and cookbooks. Ansible is newer and growing fast, but the library of available roles is smaller today.

What I Am Choosing

For my infrastructure, I am gravitating toward Ansible. The agentless architecture eliminates an entire category of operational complexity. The YAML-based playbooks are readable by anyone on the team, not just the person who wrote them. The push model gives me immediate feedback.

I think the configuration management landscape is going to evolve rapidly. All three tools are actively developed. Puppet and Chef have established companies and large communities behind them. Ansible has momentum and simplicity on its side.

The real lesson is not which tool is "best." It is that infrastructure as code is no longer optional. If you are managing servers by hand, by SSH-ing in and running commands, you are accumulating technical debt with every change. Configuration management tools, whichever one you choose, bring discipline, repeatability, and auditability to infrastructure work.

Pick one and start using it. The differences between the tools matter less than the difference between using any tool and using none at all.