Wednesday, July 31, 2019

Update and thoughts on Ansible for cloud automation

Except for a few posts here and there, there hasn't been much really useful content in this blog in almost eight years! I think an update is in order.

I started this blog initially to target mostly HP-UX, as I was feeling comfortable enough to post on various subjects on this operating system, and few people, if any, blogged on HP-UX outside of the official channels, making this niche blog relevant.

Then I moved on in 2010. Since then, HP-UX as a platform has moved on too, with fewer and fewer systems running. And in the years that followed, I'll be the first to admit that it has not been easy to find a subject I felt comfortable enough to blog about.

This is partly because I could not get a foothold on any particular technology. I briefly worked as a systems architect, then came back to the technical side in 2014, keeping Tru64 systems up and running until they got decommissioned (this was in an environment with extremely strict compliance rules -- to be honest, it wasn't very exciting). I then assisted in deploying some Windows servers (!!) in 2015-2016, along with some Red Hat Linux systems, and finally, in 2017, I got drafted to help upgrade some Solaris 11.3 servers on a few SuperClusters. Okay, drafted is a strong word: it's a terrific and exciting platform. But sorry, Solaris, it seems to me that you're slowly moving on like HP-UX, too.

For a year now, I've been working on automating deployments in Azure in a new team. This is a 180-degree turn for a systems administrator, and I like it.

We're using Ansible to do this, having it call (roughly in order of preference):

  • native Ansible modules (when they exist, and when they don't crash)
  • the Azure CLI (az)
  • REST API calls using azure_rm_resource whenever possible
  • ARM templates
  • PowerShell (last resort on a Linux host)

Is Ansible great at this job? It's been one year now, and I'm still not sure.

For starters, it takes a long time to make the code fully bullet-proof and idempotent. Furthermore, while Ansible (especially the modules) makes it easy to declare a desired state for specific Azure resources, it is harder to write a playbook that will take care of not only deploying resources, but also reporting differences over time (i.e. drift management) and deleting these resources in the future when they are no longer needed.
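To make the drift-management part a bit more concrete, here is a minimal sketch in plain Python (not Ansible, and not what our playbooks actually do). It assumes the desired and actual states have already been collected into dictionaries keyed by resource name; the `plan` helper, the resource names and the properties are all made up for illustration.

```python
from typing import Any, Dict

# Hypothetical shape: {resource_name: {property: value}}
State = Dict[str, Dict[str, Any]]


def plan(desired: State, actual: State) -> Dict[str, Any]:
    """Compare desired and actual state and return what would have to change.

    Only properties listed in `desired` are compared, so extra read-only
    fields returned by the cloud provider are ignored (an assumption of
    this sketch, not a rule from any real tool).
    """
    to_create = sorted(desired.keys() - actual.keys())
    to_delete = sorted(actual.keys() - desired.keys())

    drift: Dict[str, Dict[str, Any]] = {}
    for name in sorted(desired.keys() & actual.keys()):
        changed = {
            prop: {"actual": actual[name].get(prop), "desired": want}
            for prop, want in desired[name].items()
            if actual[name].get(prop) != want
        }
        if changed:
            drift[name] = changed

    return {"create": to_create, "delete": to_delete, "drift": drift}


# Made-up example data.
desired = {
    "vm-web": {"size": "Standard_B2s", "location": "eastus"},
    "vm-db": {"size": "Standard_D2s_v3", "location": "eastus"},
}
actual = {
    "vm-web": {"size": "Standard_B1s", "location": "eastus", "id": "abc"},
    "vm-old": {"size": "Standard_B1s", "location": "eastus"},
}

result = plan(desired, actual)
print(result)
```

The "delete" bucket is the tricky one: it is only safe if `actual` is limited to resources the playbook owns (for example everything in one resource group, or everything carrying a given tag). Something has to keep track of that ownership, and a playbook does not do it on its own.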

Terraform has been suggested many times to resolve this, but I haven't really looked into it yet. Well, actually I did, but after an hour I still couldn't find out how to print "hello world", so I kind of called it quits. There is so much work to be done that side projects are kind of limited right now.

AWS seems to have got it right with CloudFormation and stacks, a feature which, I think, is missing from ARM templates for now, as ARM templates seem to be designed as a one-time thing. I've just learned about stacks today and I'm getting excited.

To be continued!