Monday, August 25, 2008

Adopting a conservative ESX patching strategy

HP-UX system administrators are familiar with the two "patching strategies": conservative or aggressive. Needless to say that on the mission critical systems I manage, I've always adopted the conservative strategy. It's hard to get downtime to reboot anyway, so one might as well be sure that the patches work.

With my previous, lone ESX 2.x server, I almost never installed any patches since it was complicated; VMware simply didn't have any tool to make the inventory easy.

With ESX 3.5, up until now I've been delighted by VMotion and Update Manager's ease of use. It's now simple to patch ESX servers: simply use Update Manager, remediate your servers, and everything goes on unnoticed by the users. UM will download the patches on your VC server, put the server in maintenance mode, VMotion away any VM you could have on your server, then run esxupdate. It's simple, no questions asked.

That was until the ESX 3.5 Update 2 release.

Most ESX admins will know about the August 12th timebomb in this release. All of this while I was on vacation. Thank God nothing happened, had anyone shutdown a VM it would have been impossible to restart it. And I might has been paged while on vacation.

Needless to say that I spent some time fixing this. Had I waited a few weeks before applying this update, as I should have, I would have missed this unpleasent experience.

Experienced sysadmins will tell me you've been too aggressive. That's true. I was too excited by Update Manager with VMotion. I'll be more careful, now.

1 comment:

Anonymous said...

You have to choose where you are going to fight your battles. On the one hand if you go conservative you wind up facing bugs that have been fixed with future releases. If you are aggressive you run into new bugs. In the end where would you rather fight the battle?

For me I would rather fight the battle on the newest code base since most times moving to the newest code rev is the first step in a troubleshooting exercise anyway.

So this expeienced sysadmin would say you did the right thing, and this is just one of those time where you had a problem. Though now much of one based on your reporting....