Idempotent Linux Server Security, DevSecOps by Chris Binnie

Idempotency With Ansible

I think it’s safe to say that the need to frequently update the packages on our machines has been firmly drilled into us. To provide us with the use of the latest features, and also keep security bugs to a minimum, skilled engineers and even desktop users are well-versed in the need to update their software.

Hardware, software and SaaS (Software as a Service) vendors have also firmly embedded the word “firewall” into our vocabulary for both domestic and industrial uses to protect our computers. However in my experience even within potentially more sensitive commercial environments few engineers actively tweak the Operating System they’re working on, to any great extent at least, to bolster security.

Standard fare for example on Linux Operating Systems (OSs) might mean looking at configuring a larger swap file to cope with your hungry application’s demands. Or, maybe adding a separate volume to your server for extra disk space, specifying a more performant CPU at launch time, installing a few of your favourite DevOps tools or chucking a couple of certificates onto the filesystem for each new server you build. This isn’t quite the same thing.

What I am specifically referring to, you’d be forgiven for asking, is a mixture of compliance and security I suppose. In short there’s sometimes a surprisingly large number of areas that a default OS can improve its security posture in. Take it for read that tweaking certain aspects of an OS are a little riskier than others. Consider your network stack for example. Imagine that completely out of the blue your server’s networking, six months after the server was built, suddenly does something unexpected and causes you troubleshooting headaches or even some downtime. This might happen for example because a new application or updated package suddenly expects routing to behave in a less-common way or needs a specific protocol enabled to function correctly.

The more welcome news however is there’s a large number of changes that you can make to your servers without suffering any sleepless nights. The version and flavour of an OS helps determine which changes and to what extent you might want to comfortably make. Most importantly though what’s good for the goose is rarely good for the gander. In other words every single server estate has different, both broad and subtle, requirements which makes each use case unique. And, don’t forget that a database server also has very different needs to a Web server so you can have a number of differing needs even within one small cluster of servers.

Over the last few years I’ve introduced these hardening and compliance tweaks more than a handful of times across varying server estates in my DevSecOps roles. The OSs have included: Debian, Red Hat Enterprise Linux (RHEL) and their respective derivatives (including what I suspect will be the increasingly popular RHEL derivative, Amazon Linux). There have been times that, admittedly including a multitude of relatively tiny tweaks, the number of changes to a standard server build was into the hundreds. It all depended on the time permitted for the work, the appetite for any risks and the generic or specific nature of the OS tweaks.

In this article we’ll discuss the theory around something called idempotency which, in hand with an automation tool such as Ansible, can provide the ongoing improvements to your server estate’s security posture. For good measure we’ll also look at a number of Ansible playbook examples and additionally refer to online resources so that you can introduce idempotency to a server estate near you.

Say What?

In simple terms the word “idempotent” just means returning something back to how it was prior to a change. It can also mean that lots of things you wanted to be the same, for consistency, are exactly the same too.

Picture that in action for a moment on a server estate; we’ll use AWS (Amazon Web Services) as our example. You create a new server image (Amazon Machine Images == AMIs) precisely how you want it with compliance and hardening introduced, custom packages, the removal of unwanted packages, SSH keys, user accounts etc and then spin up twenty servers using that AMI.

You know for certain that all the servers, at least at the time that they are launched, are absolutely identical. Trust me when I say that this is a “good thing” ™. The lack of what’s known as “config drift” means that if one package on a server needs updated for security reasons then all the servers need that package updated too. Or if there’s a typo in a config file that’s breaking an application then it affects all servers equally. There’s less administrative overhead, less security risk and greater levels of predictability in terms of achieving better uptime.

What about config drift from a security perspective? As you’ve guessed it’s definitely not welcome. That’s because engineers making manual changes to a “base OS build” can only lead to heartache and stress. The predictability of how a system is working suffers greatly as a result and servers running unique config become less reliable. These server systems are known as “snowflakes” as they’re unique but far less beautiful than when it’s snowing.

Equally an attacker might have managed to breach one aspect, component or service on a server but not all of its facets. By rewriting our base config again and again we’re able to, with 100% certainty (if it’s set up correctly), predict exactly what a server will look like and therefore how it will perform. Using various tools you can also trigger alarms if changes are detected to request that a pair of human eyes have a look to see if it’s a serious issue and then adjust the base config if needed.

To make our machines idempotent we might overwrite our config changes every twenty or thirty minutes for example. When it comes to running servers, that in essence, is what is meant by idempotency.

Central Station

My mechanism of choice for repeatedly writing config across a large number of servers is running Ansible playbooks. It’s relatively easy to implement and removes the all-too-painful additional logic required when using shell scripts. Of the popular configuration management tools I’ve seen in action is Puppet, used successfully on a large government estate in an idempotent manner, but I prefer Ansible due to its more logical syntax (to my mind at least) and its readily available documentation.

Before we look at some simple Ansible examples of hardening an OS with idempotency in mind we should explore how to trigger our Ansible playbooks.

This is a larger area for debate than you might first imagine. Say, for example, you have nicely segmented server estate with production servers being carefully locked away from development servers, sitting behind a production-grade firewall. Consider the other servers on the estate, belonging to staging (pre-production) or other development environments, intentionally having different access permissions for security reasons.

If you’re going to run a centralised server that has superuser permissions (which are required to make privileged changes to your core system files) then that server will need to have high-level access permissions potentially across your entire server estate. It must therefore be guarded very closely.

You will also want to test your playbooks against development environments (in plural) to test their efficacy which means you’ll probably need two all-powerful centralised Ansible servers, one for production and one for the multiple development environments.

The actual approach of how to achieve other logistical issues is up for debate and I’ve heard it discussed a few times. Bear in mind that Ansible runs using plain, old SSH keys (a feature that some other configuration management tools have started to copy over time) but ideally you want a mechanism for keeping non-privileged keys on your centralised servers so you’re not logging in as the “root” user across the estate every twenty minutes or thirty minutes.

From a network perspective I like the idea of having firewalling in place to enforce one-way traffic only into the environment that you’re affecting. This protects your centralised host so that a compromised server can’t attack that main Ansible host easily and then as a result gain access to precious SSH keys in order to damage the whole estate.

Speaking of which, are servers actually needed for a task like this? What about using AWS Lambda to execute your playbooks? A serverless approach stills needs to be secured carefully but unquestionably helps to limit the attack surface and also potentially reduces administrative responsibilities.

I suspect how this all-powerful server is architected and deployed is always going to be contentious and there will never be a one-size-fits-all approach but instead a unique, bespoke solution will be required for every server estate.

How Now, Brown Cow

It’s important to think about how often you run your Ansible and also how to prepare for your first execution of the playbook. Let’s get the frequency of execution out of the way first as it’s the easiest to change in the future.

My preference would be three times an hour or instead every thirty minutes. If we include enough detail in our configuration then our playbooks might prevent an attacker gaining a foothold on a system as the original configuration overwrites any altered config. Twenty minutes seems more appropriate to my mind.

Again, this is an aspect you need to have a think about. You might be dumping small config databases locally onto a filesystem every sixty minutes for example and that scheduled job might add an extra little bit of undesirable load to your server meaning you have to schedule around it.

Shopping List

Let’s get a little more hands-on now. You will need some Ansible experience before being able to make use of the information that follows. Rather than run through the installation and operation of Ansible let’s instead look at some of the idempotency playbook’s content.

As mentioned earlier there might be hundreds of individual system tweaks to make on just one type of host so we’ll only explore a few suggested Ansible tasks and how I like to structure the Ansible role responsible for the compliance and hardening. You have hopefully picked up on the fact that the devil is in the detail and you should absolutely, unequivocally, understand to as high a level of detail as possible, about the permutations of making changes to your server OS.

Be aware that I will mix and match between OSs in the Ansible examples that follow. Many examples are OS agnostic but as ever you should pay close attention to the detail. Obvious changes like “apt” to “yum” for the package manager is a given.

Inside a “tasks” file under our Ansible “hardening” role, or whatever you decide to name it, these named tasks represent the areas of a system with some example code to offer food for thought. In other words each section that follows will probably be a single YAML file, such as “accounts.yml”, and each will have with varying lengths and complexity. Let’s look at some examples with ideas about what should go into each file to get you started. The contents of each file that follow are just the very beginning of a checklist and the following suggestions are far from exhaustive.

SSH Server

This is the application that almost all engineers immediately look to harden when asked to secure a server. It makes sense as SSH (the OpenSSH package in many cases) is usually only one of a few ports intentionally prised open and of course allows direct access to the command line. The level of hardening that you should adopt is debatable. I believe in tightening the daemon as much as possible without disruption and would usually make around fifteen changes to the standard OpenSSH server config file, “sshd_config”. These changes would include pulling in a MOTD banner (Message Of The Day) for legal compliance (warning of unauthorised access and prosecution), enforcing the permissions on the main SSHD files (so they can’t be tampered with by lesser-privileged users), ensuring the “root” user can’t log in directly, setting an idle session timeout and so on. Here’s a very simple Ansible example that you can repeat within other YAML files later on, focusing on enforcing file permissions on our main, critical OpenSSH server config file. Note that you should carefully check every single file that you hard-reset permissions on before doing so. This is because there are horrifyingly subtle differences between Linux distributions. Believe me when I say that it’s worth checking first.

 - name: Hard reset permissions on sshd server file
   file: owner=root group=root mode=0600 path=/etc/ssh/sshd_config

To check existing file permissions I prefer this natty little command for the job:

$ stat -c "%a %n" /etc/ssh/sshd_config 644 /etc/ssh/sshd_config

As our “stat” command shows our Ansible snippet would be an improvement to the current permissions because 0600 means only the “root” user can read and write to that file. Other users or groups can’t even read that file which is of benefit because if we’ve made any mistakes in securing SSH’s config they can’t be discovered as easily by less-privileged users.

System Accounts

At a simple level this file might define how many users should be on a standard server. Usually a number of users who are admins have home directories with public keys copied into them. However this file might also include performing simple checks that the root user is the only system user with the all-powerful superuser UID 0; in case an attacker has altered user accounts on the system for example.

Kernel

Here’s a file that can grow arms and legs. Typically I might affect between fifteen and twenty sysctl changes on an OS which I’m satisfied won’t be disruptive to current and, all going well, any future uses of a system. These changes are again at your discretion and, at my last count (as there’s between five hundred and a thousand configurable kernel options using sysctl on a Debian/Ubuntu box) you might opt to split off these many changes up into different categories.

Such categories might include network stack tuning, stopping core dumps from filling up disk space, disabling IPv6 entirely and so on. Here’s an Ansible example of logging network packets that shouldn’t been routed out onto the Internet, namely those packets using spoofed private IP Addresses, called “martians”.

- name: Keep track of traffic that shouldn’t be routed onto the Internet
   lineinfile: dest="/etc/sysctl.conf" line="{{item.network}}" state=present
   with_items:
     - { network: 'net.ipv4.conf.all.log_martians = 1' }
     - { network: 'net.ipv4.conf.default.log_martians = 1' }

Pay close attention that you probably don’t want to use the file “/etc/sysctl.conf” but create a custom file under the directory “/etc/sysctl.d/” or similar. Again, check your OS’s preference, usually in the comments of the pertinent files. If you’ve not seen martian packets being enabled before then type “dmesg” (sometimes only as the “root” user) to view kernel messages and after a week or two of logging being in place you’ll probably see some traffic polluting your logs. It’s much better to know how attackers are probing your servers than not. A few log entries for reference can only be of value. When it comes to looking after servers, ignorance is certainly not bliss.

Network

As mentioned you might want to include hardening the network stack within your kernel.yml file, depending on whether there’s many entries or not, or simply for greater clarity. For your network.yml file have a think about stopping old-school broadcast attacks flooding your LAN and ICMP oddities from changing your routing in addition.

Services

Usually I would stop or start miscellaneous system services (and potentially applications) within this Ansible file. If there weren’t many services then rather than also using a “cron.yml” file specifically for “cron” hardening I’d include those here too.

There’s a bundle of changes you can make around cron’s file permissions etc. If you haven’t come across it, on some OSs, there’s a “cron.deny” file for example which blacklists certain users from accessing the “crontab” command. Additionally you also have a multitude of cron directories under the “/etc” directory which need permissions enforced and improved, indeed along with the file “/etc/crontab” itself. Once again check with your OS’s current settings before altering these or “bad things” ™ might happen to your uptime.

In terms of miscellaneous services being purposefully stopped and certain services, such as system logging which is imperative to a healthy and secure system, have a quick look at the Ansible below which I might put in place for syslog as an example.

- name: Insist syslog is definitely installed (to receive upstream logs)
  apt: name=rsyslog state=present

- name: Make sure that syslog starts after a reboot
  service: name=rsyslog state=started enabled=yes

IPtables

The venerable Netfilter which, from within the Linux kernel offers the IPtables software firewall the ability to filter network packets in an exceptionally sophisticated manner, is a must if you can enable it sensibly. If you’re confident that each of your varying flavours of servers (whether it’s a webserver, database server and so on) can use the same IPtables config then copy a file onto the filesystem via Ansible and make sure it’s always loaded up using this YAML file.

Time

Due to its reduced functionality, and therefore attack surface, the preference amongst a number of OSs has been to introduce “chronyd” over “ntpd”. If you’re new to “chrony” then fret not. It’s still using the NTP (Network Time Protocol) that we all know and love but in a more secure fashion.

The first thing I do with Ansible within the “chrony.conf” file is alter the “bind address” and if my memory serves there’s also a “command port” option. These config options allow Chrony to only listen on the localhost. In other words you are still syncing as usual with other upstream time servers (just as NTP does) but no remote servers can query your time services; only your local machine has access.

There’s more information on the “bindcmdaddress 127.0.0.1” and “cmdport 0” on this Chrony page under “2.5. How can I make chronyd more secure?” which you should read for clarity. This premise behind the comment on that page is a good idea: “you can disable the internet command sockets completely by adding cmdport 0 to the configuration file”.

Additionally I would also focus on securing the file permissions for Chrony and insist that the service starts as expected just like the syslog config above. Otherwise make sure that your time sources are sane, have a degree of redundancy with multiple sources set up and then copy the whole config file over using Ansible.

Logging

You can clearly affect the level of detail included in the logs from a number pieces of software on a server. Thinking back to what we’ve looked at in relation to syslog already you can also tweak that application’s config using Ansible to your needs and then use the example Ansible above in addition.

PAM

Apparently PAM (Pluggable Authentication Modules) has been a part of Linux since 1997. It is undeniably useful (a common use is that you can force SSH to use it for password logins, as per the SSH YAML file above). It is extensible, sophisticated and can perform useful functions such as preventing brute force attacks on password logins using a clever rate limiting system. The syntax varies a little between OS but if you have the time then getting PAM working well (even if you’re only using SSH keys and not passwords for your logins) is a worthwhile effort. Attackers like their own users on a system with lots of usernames, something innocuous such as “webadmin” or similar might be easy to miss on a server, and PAM can help you out in this respect.

Auditd

We’ve looked at logging a little already but what about capturing every “system call” that a kernel makes. The Linux kernel is a super-busy component of any system and logging almost every single thing that a system does is an excellent way of providing post-event forensics. This Admin Magazine article will hopefully shed some light on where to begin. Note the comments in that article about performance, there’s little point in paying extra for compute and disk IO resource because you’ve misconfigured your logging so spend some time getting it correct would be my advice.

For concerns over disk space I will usually change a few lines in the file “/etc/audit/auditd.conf” in order to prevent there firstly being too many log files created and secondly logs that grow very large without being rotated. This is also on the proviso that logs are being ingested upstream via another mechanism too. Clearly the files permissions and the service starting are also the basics you need to cover here too. Generally file permissions for auditd are tight as it’s a “root” orientated service so there’s less changes needed here generally.

Filesystems

With a little reading you can discover which filesystems that are made available to your OS by default. You should disable these (at the “modprode.d” file level) with Ansible to prevent weird and wonderful things being attached unwittingly to your servers. You are reducing the attack surface with this approach. The Ansible might look something like this below for example.

- name: Make sure filesystems which are not needed are forced as off
  lineinfile: dest="/etcmodprobe.d/harden.conf" line='install squashfs /bin/true' state=present

SELinux

The old, but sometimes avoided due to complexity, security favourite, SELinux, should be set to “enforcing” mode. Or, at the every least, set to log sensibly using “permissive” mode. Permissive mode will at least fill your auditd logs up with any correct rule matches nicely. In terms of what Ansible looks like it’s simple and is along these lines:

- name: Configure SElinux to be running in permissive mode
  replace: path=”/etc/selinux/config” regexp='SELINUX=disabled' replace='SELINUX=permissive'

Packages

Needless to say the compliance hardening playbook is also a good place to upgrade all the packages (with some selective exclusions) on the system. Pay attention to the section relating to reboots and idempotency in a moment however. With other mechanisms in place you might not want to update packages here but instead as per the Automation Documents article mentioned in a moment.

Idempotency

Now we’ve run through some of the aspects you would want to look at hardening on a server let’s think a little more about how the playbook might be used.

When it comes to cloud platforms most of my professional work has been on AWS and therefore, more often than not, a fresh AMI is launched and a then playbook is run over the top of it. There’s a mountain of detail in one way of doing that in this article (http://www.admin-magazine.com/Archive/2018/45/AWS-Automation-Documents) which you may be pleased to discover accommodates a mechanism to spawn a script or playbook.

It is important to note, when it comes to idempotency, that it may take a little more effort initially to get your head around the logic involved in being able to re-run Ansible repeatedly without disturbing the required status quo of your server estate.

One thing to be absolutely certain of however (barring rare edge cases) is that after you apply your hardening for the very first time, on a new AMI or server build, you will require a reboot. This is an important element due to a number of system facets not being altered correctly without a reboot. These include applying kernel changes so alterations become live, writing auditd rules as immutable config and also starting or stopping services to improve the security posture.

Note though that you’re probably not going to want to execute all plays in a playbook every twenty or thirty minutes, such as updating all packages and stopping and restarting key customer-facing services. As a result you should factor the logic into your Ansible so that some tasks only run once initially and then maybe write a “completed” placeholder file to the filesystem afterwards for referencing. There’s a million different ways of achieving a status checker.

The nice thing about Ansible is that the logic for rerunning playbooks is implicit and unlike shell scripts which for this type of task can be arduous to code the logic into. Sometimes, such as updating the GRUB bootloader for example, trying to guess the many permutations of a system change can be painful.

Bedtime Reading

I still think that you can’t beat trial and error when it comes to computing. Experience is valued for good reason.

Be warned that you’ll find contradictory advice sometimes from the vast array of online resources in this area. Advice differs probably because of the different use cases. The only way to harden the varying flavours of OS to my mind is via a bespoke approach. This is thanks to the environments that servers are used within and the requirements of the security framework or standard that an organisation needs to meet.

For OS hardening details you can check with resources such as the NSA, Cloud Security Alliance, proprietary training organisations such as GIAC who offer security resources, the diverse CIS Benchmarks for industry consensus-based benchmarking, the SANS Institute, NIST’s Computer Security Research and of course print media too.

This Is The End

Hopefully you can see how powerful an idempotent server infrastructure is and are tempted to try it for yourself.

The ever-present threat of APT (Advanced Persistent Threat) attacks on infrastructure, where a successful attacker will sit silently monitoring events and then when it’s opportune infiltrate deeper into an estate, makes this type of configuration highly valuable.

The amount of detail that goes into the tests and configuration changes is key to the value that such an approach will bring to an organisation. Like the tests in a CI/CD pipeline they’re only as ever as good as their coverage.




   Linux Books


If you've enjoyed reading the technical content on this site then please have a look at my Linux books which were both published in 2016 and some of my articles in Linux Magazine and Admin Magazine are available on their respective websites.



Linux Server Security: Hack and Defend by Chris Binnie           Practical Linux Topics by Chris Binnie

Postfix Howtos

I've written three articles on the admin and performance of the powerful Postfix MTA.

Docker Security

I wrote about the heavyweight champion of containers here: Docker Security.

Monitoring Howtos

There's comprehensive articles about the excellent Monit and the flexible nload.