
The Linux Admin Life: RHEL in Production

What day-to-day Linux system administration actually looks like when real systems depend on you

I have been doing some part-time Linux administration work for a small company, and let me tell you: managing production Linux servers is a completely different animal from tinkering with Ubuntu on your laptop.

The gap between "I can install Linux and browse the web" and "I can keep production systems running at 3 AM when the monitoring system starts screaming" is enormous. I am learning this the hard way, and I want to share some of what I have picked up.

RHEL vs Ubuntu: A Different World

At home and in college, I use Ubuntu. It is friendly, it has great community support, and it is perfect for learning. But in production environments, Red Hat Enterprise Linux (RHEL) and CentOS dominate. There are good reasons for this.

RHEL comes with commercial support. When your production database goes down at 2 AM and you cannot figure out why, you can call Red Hat and get an engineer on the phone. That matters when real money is on the line. Ubuntu is great, but "ask on the forums" is not an acceptable support model when your e-commerce site is losing thousands of dollars per hour of downtime.

The differences between distributions run deeper than the package manager. RHEL uses yum (dnf on newer releases) instead of apt-get. The init system is different. Configuration file locations are different. SELinux is enabled by default and will absolutely ruin your day if you do not understand it. I once spent an entire afternoon trying to figure out why Apache could not read files from a directory. The permissions were fine. The ownership was fine. It was SELinux, silently denying access. That was a humbling experience.
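For anyone hitting the same wall, the usual first checks look something like this (assuming a RHEL-family box with the audit and policycoreutils packages installed; /var/www/html is a stand-in for whatever directory is misbehaving):

```shell
# Quick SELinux triage when permissions and ownership look fine
# but access is still denied. /var/www/html is a stand-in path.
getenforce                         # Enforcing, Permissive, or Disabled?
ls -Z /var/www/html                # show SELinux contexts, not just rwx bits
sudo ausearch -m avc -ts recent    # recent denial records from the audit log
sudo restorecon -Rv /var/www/html  # reset contexts to the policy defaults
```

restorecon covers the common case where files were moved or created with the wrong context; persistent policy changes go through semanage or a custom module instead.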

A Day in the Life

Let me walk you through what a typical day looks like.

Morning: Check monitoring dashboards. Look at Nagios alerts from overnight. Anything red? If yes, triage immediately. If no, check the yellow warnings. Disk space trending up on the database server? Make a note. CPU usage spiking at a certain time? Investigate.
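That disk-space note is easy to turn into a script. Here is a minimal sketch of the kind of check I mean; the 80% threshold is my own arbitrary choice, not a standard:

```shell
#!/usr/bin/env bash
# Flag any filesystem above a usage threshold. The 80% limit is an
# arbitrary example; tune it per server.
THRESHOLD=80

df -P | awk -v limit="$THRESHOLD" 'NR > 1 {
    use = $5
    gsub(/%/, "", use)                 # "73%" -> "73"
    if (use + 0 >= limit)
        printf "WARNING: %s at %s%% used (%s)\n", $6, use, $1
}'
```

Run it from cron with output mailed to you and it becomes a poor man's early warning alongside the main monitoring system.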

Midday: Planned work. Apply security patches to staging servers first, test everything, then schedule production patches for the maintenance window. Update documentation. Review logs for anomalies. Write scripts to automate repetitive tasks.

Afternoon: Respond to requests. Developer needs a new testing environment. Manager wants a report on server utilization. Someone forgot their password. Security team wants to know why port 8080 is open on a server.

Evening/Night: Maintenance windows. The big changes happen when traffic is low. Database migrations, kernel upgrades, network changes. These are the changes that make you nervous because if something goes wrong, you are the one who has to fix it.

The Art of Troubleshooting

The most valuable skill I am developing is troubleshooting. Not just googling error messages (though that is part of it), but systematic problem diagnosis.

Here is my current approach when something breaks:

  1. Do not panic. This is harder than it sounds at 3 AM.
  2. What changed? Nine times out of ten, something changed recently. A deployment, a configuration change, a patch. Find out what changed and you are halfway to the answer.
  3. Check the logs. /var/log/messages, /var/log/syslog, application-specific logs. The answer is almost always in the logs if you know where to look.
  4. Reproduce if possible. Can you trigger the problem reliably? If yes, you can test solutions. If no, you are dealing with an intermittent issue, which is much harder.
  5. Google it. Seriously. Someone has probably hit this exact problem before. Stack Overflow, mailing list archives, vendor knowledge bases. There is no shame in searching for answers.
  6. Document the solution. This is the step everyone skips, and it is the most important one. If you do not write down what happened and how you fixed it, you will waste time solving the same problem again in six months.
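For step 3, my first pass through the logs usually looks something like this (paths assume a RHEL-family box; Debian and Ubuntu log to /var/log/syslog instead, and journalctl only applies on systemd-based releases):

```shell
# First pass through the logs after an alert. Times and paths are
# examples; adjust to the service involved.
sudo tail -n 100 /var/log/messages                # what just happened?
sudo grep -iE 'error|fail|denied' /var/log/messages | tail -n 20
sudo journalctl --since "1 hour ago" -p err       # systemd journal, errors only
sudo tail -n 50 /var/log/httpd/error_log          # application logs too
```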

Scripts Are Your Best Friend

One of the first things I learned is that good system administrators are lazy. Not the bad kind of lazy, the productive kind. If you do something more than twice, automate it.

I have been writing Bash scripts for everything. Backup scripts, monitoring scripts, deployment scripts, cleanup scripts. A well-written script does not make mistakes. It does not forget steps. It does not get tired at 3 AM.

Here is something I wish I had learned earlier: always include error handling in your scripts. A script that continues running after an error can do more damage than the original problem. set -e in Bash stops execution on errors, and it has saved me from disaster more than once.
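As a sketch, here is the kind of skeleton I now start every script from. The backup of a scratch directory is just a stand-in task to make the example self-contained:

```shell
#!/usr/bin/env bash
# Defensive-scripting skeleton: stop on the first error instead of
# blundering on. Backing up a scratch directory is a stand-in task.
set -euo pipefail                 # exit on error, unset variable, or pipe failure

SRC="$(mktemp -d)"                # stand-in for the directory being backed up
DEST="$(mktemp -u).tar.gz"
echo "demo" > "$SRC/example.conf"

cleanup() {
    status=$?
    rm -rf "$SRC"                 # always remove scratch data, even on failure
    if [ "$status" -ne 0 ]; then
        echo "backup FAILED with exit status $status" >&2
    fi
}
trap cleanup EXIT

tar -czf "$DEST" -C "$SRC" .      # if this fails, set -e stops the script here
echo "backup written to $DEST"
rm -f "$DEST"                     # demo only; a real script keeps the archive
```

The companions matter too: set -u catches typoed variable names, and pipefail keeps a failure in the middle of a pipeline from being masked by a successful final command.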

I am also starting to learn about configuration management tools like Puppet. The idea of defining your server configuration as code, and having a tool automatically enforce that configuration, is incredibly powerful. When you are managing three servers, manual configuration is fine. When you are managing thirty or three hundred, you need automation.
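As a taste of what that looks like, a Puppet manifest reads as a declaration of desired state rather than a sequence of commands. This sketch manages an NTP service; the module path is illustrative, not from any real module of mine:

```puppet
# Desired state: ntp installed, its config managed, the service running.
package { 'ntp':
  ensure => installed,
}

file { '/etc/ntp.conf':
  ensure  => file,
  source  => 'puppet:///modules/ntp/ntp.conf',  # illustrative module path
  require => Package['ntp'],
  notify  => Service['ntpd'],                   # restart on config change
}

service { 'ntpd':
  ensure => running,
  enable => true,
}
```

Run the agent a second time and nothing changes; hand-edit the file on the server and the next run puts it back. That is the drift correction that makes thirty servers manageable.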

The On-Call Experience

Being on-call is a unique experience. You go to bed knowing that your phone might ring at any moment. You keep your laptop charged and within reach. You learn to sleep with one ear listening for the alert tone.

My first on-call incident was terrifying. The monitoring system alerted that a web server was down. My heart was pounding. I SSH-ed in (after fumbling with the VPN for what felt like an eternity), checked the logs, found that Apache had crashed due to a memory leak in a PHP application, restarted the service, and monitored it for the next hour.

The whole thing took about twenty minutes. It felt like twenty hours.

But here is the thing: after a few incidents, you develop a certain calm. You build muscle memory. You know the commands, you know the systems, you know the common failure modes. The panic fades and gets replaced by a methodical, systematic approach.

What I Wish I Had Learned in College

College taught me computer fundamentals, and I am grateful for that. But there is a massive gap between academic knowledge and operational skills. Things I wish they had taught:

  • How to read and interpret system logs
  • Network troubleshooting (tcpdump, netstat/ss, traceroute)
  • Process management (what happens when you fork, what a zombie process actually is)
  • Disk and filesystem management in practice
  • How DNS actually works (it is always DNS, or so the saying goes)
  • Version control for configuration files

I have been learning all of this on the job and through self-study. It is the kind of knowledge that only comes from getting your hands dirty with real systems.

Why I Love It

Despite the stress, the odd hours, and the constant learning curve, I genuinely love this work. There is something deeply satisfying about keeping systems running. About diagnosing a subtle problem that had everyone stumped. About writing a script that saves hours of manual work every week.

Linux administration is not glamorous. Nobody outside of IT knows what you do or why it matters. But every website they visit, every app they use, every email they send is running on servers that someone like me is keeping alive.

And honestly? That is pretty cool.
