Wednesday, March 24, 2010

The redundant IT team

Since I resigned as the senior sysadmin of my division, many of my colleagues have come into my office saying (amicably), "Darn! You've just put us in deep shit!" That's a matter of perspective: nothing should start falling apart the moment I leave, and many servers can run without any maintenance for a while. No, I'm really not leaving them in distress, because for many years I've believed the following:

Everyone is replaceable.

...a quote which probably makes me fit for management, because that's exactly what managers think. And they aren't wrong, not at all. They are right. No single member of your IT staff, be it a sysadmin, developer or tech support person, should be the only one holding the information and knowledge needed to keep any part of your business running. This is especially true of small IT teams of fewer than 50 people, where there is no implicit redundancy that covers everyone.

Yet that's the position many IT managers put their staff in. When you're counting beans, the only thing that matters is maintaining a predefined quality of service at the lowest cost. While hardware manufacturers have no trouble justifying their prices with all the redundancy they pack into their gear, remember that people need to be redundant too! And when the shit hits the fan, no amount of preparation will help if you can't count on support staff who are experienced with your systems. Which brings me to my second quote:

If you can afford redundant hardware, you should be able to have redundant people too.

By "redundant" I don't mean doubling your headcount. But you really, really have to avoid having anyone be the sole owner of a key role with no teammate assigned as backup.

Here are the rules I would follow to put a redundant IT staff in place:

1. Make sure no one on your staff has a unique set of skills and knowledge. Someone must be able to quickly replace any missing person, at least to some degree. This holds for everyone in your IT organization: nobody answering the phone on your tech support hotline (or someone giving out wrong information) can end up being as bad as somebody inadvertently initiating an IPL (initial program load) on the mainframe.

2. Require a standard and consistent set of documentation for hardware and software solutions before putting anything into production. Docs formatted with a standard set of sections are easy for anyone who needs them to skim. The UNIX man pages, with their fixed NAME, SYNOPSIS and DESCRIPTION sections, are a stellar example of consistency. If you don't have the time to mess with recognized processes and standards, then simply don't; a word processor template is all you need to get your guys going.

3. Organize regular lunch'n'learn sessions where someone presents a technology subject to their peers. Not a deep dive, just an overview so they know about it. Insist on quality presentation documents, as they can become a great reference later down the road when training new recruits. And don't be a cheap bastard: if you want to motivate people to come and listen, you'd better pay for lunch.

4. Treat your key resources well, so that they aren't tempted to go elsewhere. By "key" resources, I mean anyone who has deep knowledge of proprietary systems and for whom qualified replacements are scarce on the job market. For that matter, be especially strict about enforcing rule #2 with them. Even if you can count on someone else to fill in when they say their goodbyes, you'll still need to hire someone further down the road, so it's best to make sure they don't leave.

5. Don't hire smart asses and douche bags. And I'm serious about this. The smart ass is easy to spot in an interview: that guy will claim he knows everything, sometimes stringing together nonsensical buzzwords, so just grill him with very technical questions prepared in advance by your staff and he'll fail miserably. The second kind is harder to spot. In my career, I've crossed paths with a few computer science folks who were very talented but also impossible to work with; they aren't team players, they don't trust anyone, and they keep everything to themselves. They must be avoided at all costs, as they have the ability to sink your operation. Besides a personality test, which they might be able to trick, there's not much you can do to find out except ask for references. So, do you get me here? You need to find a balance between social and technical skills depending on the type of job you're offering; low social skills don't fit well with tech support but might be acceptable for a senior developer. Hell, this point is getting so long that I think I'll make it a blog post of its own one day.

6. Hire people who are ready to wear many hats. Let the truth be told: some people are happy to be single-task oriented and will make sure it stays that way. They don't fit well in a small IT team, as you can't motivate them to learn about new technology, especially if it falls under one of their colleagues' responsibilities. They're NOT autodidacts, and the first thing they'll always do is ask to be "trained" for just about anything. If you have someone who can use a jigsaw but refuses to even try a reciprocating saw without first spending a few years at the School of Reciprocating Saw Professionals, then you have a problem. Of course, when working with a unionized crew, the rules are very different and I don't think this blog post will be of any help to you.

7. And last, change management. Nobody likes it, and people avoid it like the plague unless they're forced to implement it to satisfy some crapass compliance rule. But change management, if done well with a minimal amount of red tape, can be very beneficial to your team. Any time someone changes something, the change gets documented, which makes fixing a mistake easier if that person can't be reached. Up until now, most of the "enterprise" change management I've seen has been a bunch of expensive, incomprehensible software stacks. I've yet to find a simple, easy-to-use web-based system, so I can only encourage you to spend a few days building your own.
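
To give an idea of how little a homegrown change log really needs, here is a minimal sketch in Python, offered only as an illustration. It assumes changes are appended to a plain JSON Lines file rather than a database, and it leaves out the web front end entirely. The names `ChangeRecord`, `record_change`, `changes_for` and the file name `changes.jsonl` are hypothetical, not taken from any existing product. Each entry records who changed what, on which system, and how to roll it back, which is what a teammate needs when the person who made the change is unreachable.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class ChangeRecord:
    """One change log entry (hypothetical schema): who changed what, where, and how to undo it."""
    author: str
    system: str
    description: str
    rollback: str
    timestamp: str = ""


def record_change(log_path: Path, change: ChangeRecord) -> ChangeRecord:
    """Append a change as one JSON line, stamping it with the current UTC time if unset."""
    if not change.timestamp:
        change.timestamp = datetime.now(timezone.utc).isoformat()
    # Append-only: earlier entries are never rewritten, so the history stays intact.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(change)) + "\n")
    return change


def changes_for(log_path: Path, system: str) -> list:
    """Return every recorded change for one system, oldest first."""
    if not log_path.exists():
        return []
    records = []
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            data = json.loads(line)
            if data["system"] == system:
                records.append(ChangeRecord(**data))
    return records


if __name__ == "__main__":
    # Hypothetical example: a log in a throwaway directory and a made-up mail server change.
    log = Path(tempfile.mkdtemp()) / "changes.jsonl"
    record_change(log, ChangeRecord(
        author="alice",
        system="mail01",
        description="Raised the maximum message size to 20 MB",
        rollback="Restore the previous configuration file from backup",
    ))
    for c in changes_for(log, "mail01"):
        print(c.timestamp, c.author, c.description, "| rollback:", c.rollback)
```

An append-only text file is deliberately boring: it needs no server, it can be grepped when everything else is down, and a small web form could later be put in front of `record_change` without changing the format.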

This set of rules is by no means scientific. They're the ones I would do my best to apply, were I in a management position for an IT team. Fortunately for me, I'm not. Managing an IT team presents its own set of challenges: with limited money, you have to keep both your employees and your users happy, while at the same time making sure that reasonable risk management keeps your enterprise's survival out of peril. Running a redundant team is one of the best ways to lower that risk.
