The Born-again Sysadmin: October 2009

Friday, October 30, 2009

Using Gartner's Magic Quadrants

When evaluating security products, one way to go is to check out what Gartner has to say. I found out last week that they produce yearly what they call Magic Quadrants on a variety of products, and this can help you choose which ones you're going to look into.

Here is an example of such a Quadrant, using made-up products:

Of course, being inside, or outside, of the Quadrant doesn't mean ANYTHING to me. I'm not evaluating here the veracity of the Magic Quadrant, just its purpose. When choosing software, I prefer "going with my heart" when I can.

But there are cases where using the Quadrant can be of help. For example, if I happen to like Gorbatcheck and it turns out that it's favorably placed in the Quadrant, that's another thing up my sleeve that I can pitch to management. It's also a good ticket to my own peace of mind as being backed by Gartner gives a sense of immunity if the product turns out to be below expectations.

There is also the case where Gorbatcheck might be based on an open-source product, let's say Gorbafree. Of course, Gorbafree won't be in the Quadrant, but the Quadrant can be used to give more credibility to Gorbafree over something less stellar such as GZK. If you're on a tight budget, that's a way to introduce Gorbafree until you're ready to move on to Gorbatcheck.

How do you obtain these quadrants? Try Wikipedia!

Thursday, October 29, 2009

HP-UX 11i: Mission-critical UNIX

HP was announcing that webcast for weeks, and I decided to check it out. I had seen the one from last year and I didn't find this one much different in terms of presentation or content. The detailed results of the report are here.

First, if I recall correctly the study is done on a sample of around 250 people, give or take, I don't remember the exact number. That's not, in my opinion, a high number. But they insist on the fact that the answers come from data center IT tech staff, not CIOs, which at least comforts me as a systems administrator.

According to the study, UNIX(r) is still alive and well and still a strategic OS, but mostly in the enterprise. I insist on UNIX with capitals and the registered trademark, as they didn't include Linux in that category. They only evaluated AIX, Solaris, and HP-UX, which are the "big three" Unixes left. I can sum up the presentation like this: HP is better than the rest, sprinkled with numbers from Gabriel Consulting Group and a few slides I've seen countless times previously, such as the OE "Christmas present" diagram and the HP-UX roadmap. Cox also spent almost 7 minutes on Green IT, speaking about how HP is "green" and reduces cardboard boxes. Interesting, but maybe a bit off topic. The presentation is high-level, and targeted mostly at people evaluating a migration to HP-UX more than existing customers. No wonder it's on cio.com!

Dan made a small mistake during the presentation. He mentioned that virtualization was a strong point for the HP platform, and he is right. But he then went on to say that the trend would be to virtualize different operating systems on the Integrity platform, all the way from Linux, to Windows, to OpenVMS, HP-UX, and then he hesitated and brought up "Tandem", which Brian corrected as "NonStop OS". Well guys, I don't think there are plans to run NonStop on HPVM. If there are, I stand corrected!

I couldn't comment on whether HP is the "best". I haven't worked on multiple platforms in 6 years. But when I did work with HP-UX, Solaris and AIX, I preferred HP-UX due to its better management features at the time. I can only expect it to be even better now.

O.

Tuesday, October 27, 2009

To me, it is now true. The web is (almost) the (home) platform

I had been holding off on purchasing a new laptop for over 6 months because everyone told me that Vista sucked. I finally did it last Friday with a brand new one running Windows 7. An HP, of course. All that for half the price of an honest espresso machine! And a quarter of the price of an equivalent MacBook (which is why I'm not a Mac user, they're just too expensive).

So what do I think of Windows 7?

Not much, actually. It doesn't seem to be on a suck-o-meter at first glance, which is a good thing. But I simply don't care. I'm not excited. I'm completely indifferent.

Why? Because I've realized that while my previous XP installation had accumulated some useful software over time, I'm not sure this will happen with my new laptop.

That is why, last weekend, while staring at my shiny new laptop, I found myself thinking:

>> Where do I want to go today?

That's a very interesting question.

Windows 7 is sure slicker than Windows XP, but the latter had raised the bar already in 2001 by adding a well-deserved feature that had been missing from Microsoft's consumer Windows line since its inception: it didn't crash.

While I used Outlook Express before, I didn't bother installing Windows Live Mail this time and decided to just use my webmail. I didn't bother shelling out money for Office either, as I don't use it much and Google Apps works just fine.

My brand new laptop is therefore just an appliance to run a browser. Besides maybe retouching my digital photographs, I simply have not much use for all the processing power, and slick Windows 7 features, of that laptop.

Is the web soon to become the platform for my home usage? You bet.

O.

Friday, October 23, 2009

Old Motif habits die hard

Just saw an e-mail today from our development group. They're planning on making a graphical dashboard for our proprietary application which runs on HP-UX. Everything on the backend has been CLI or text-based up until now, with the client running on Windows with MFC. But for that dashboard, which is reserved for application administrators, developing a Motif application came up.

Yes, you read me right, Motif. Whew! Haven't seen development with Motif in a while.

Of course, they didn't mean by this that they would start developing in Motif right this morning (at least, I hope so). Maybe GTK+ would be a better fit, and a good browser-based app would be even better.

My point is that Motif was associated with the commercial UNIX flavors for so long that saying "I'll make a Motif app on UNIX" is still a catch-all phrase like saying "I'll transfer files with FTP". Yes, these technologies have been around for a while and still work, but they are clunky and outdated... and don't get me wrong, I used to root for Motif! That was in the mid-1990s. The Motif toolkit was, for its time, quite customizable using X resources. The no-frills window manager mwm offered a refreshing, KISS interface that actually worked on a workstation without crashing like Windows 95 used to do twice a day.

All that was 15 years ago. Today, everything runs in a browser.

Thursday, October 22, 2009

Migrating from OSEM to WEBES 5.6

Here is a post on my experience migrating from OSEM to WEBES 5.6. It went well except for one minor problem with the MSA 2000.

First of all, you should know there are a few outstanding problems with WEBES 5.6, one of them being a security issue. I'm not sure I can disclose what they are as I didn't get them from official channels but I will say that a patch is expected sometime in December. If you don't need to run 5.6 right now in order to support specific hardware, you should stay with WEBES 5.5 and OSEM 1.4.8a in the meantime.

That being said, for those who wish to run WEBES 5.6, you can update it manually from Remote Support Software Manager. Also update to the latest version of Remote Support Eligible system List at the same time. The procedure is documented in the guide WEBES 5.6 and product coverage.

There is also some (redundant) information in the guides A.05.30 HP Insight Remote Support Advanced with WEBES 5.6 and OSEM to WEBES Migration Guide.

To migrate away from OSEM, if you have a standard installation with OSEM populated by info from SIM, it is as easy as simply uninstalling OSEM, but you should read the above documents just to be sure. Once OSEM is uninstalled, restart WEBES (not documented, but I did it anyway) using "net stop desta_service" followed by "net start desta_service".
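The stop/start pair above could be scripted along these lines. This is only a minimal sketch: the service name desta_service and the two net commands come from this post, while the function names and the injectable run parameter are illustrative assumptions, not anything shipped with WEBES.

```python
import subprocess

SERVICE = "desta_service"  # service name quoted in this post


def restart_commands(service=SERVICE):
    """Return the 'net stop' and 'net start' command lines, in order."""
    return [["net", "stop", service], ["net", "start", service]]


def restart_service(service=SERVICE, run=subprocess.run):
    """Stop, then start, the service; abort on the first failing command."""
    for argv in restart_commands(service):
        # check=True makes subprocess.run raise CalledProcessError
        # when the command exits with a non-zero status.
        run(argv, check=True)
```

Calling restart_service() only makes sense on the Windows CMS itself, and typically from an administrative prompt.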

If you have hundreds of managed systems, it is better to wait at least an hour before testing if everything works well, as it takes a while for WEBES to stabilize and trap events once it is restarted. I also always confirm that e-mail notifications are enabled in order to have an alternate way of receiving notifications in case there is a problem up the food chain in SIM or ISEE.

Now you should test equipment that used to notify OSEM with SNMP traps to be sure they are being caught by WEBES and service events are opened at HP.

Here is what I tested, and the result for each:

Proliant running Windows ... OK
Proliant running ESX ... OK
C3000 blade chassis ... OK
C7000 blade chassis ... OK
MSA2012i G1 disk array ... NO

The MSA 2000 G1 used to work with OSEM but no longer with WEBES. I've opened a ticket at the ITRC to have an official support statement. This is exacerbated by the fact that its events in SIM are reportedly sent as informational so those of you who "follow the red" could miss critical events.

O.

Sunday, October 18, 2009

The grep of all games

I normally don't talk about games in this blog, but I think a special mention should be made to World of Goo, a game I bought on Wiiware a few months ago and recently finished. It's actually the first game I've played seriously in years. Why? Because like a tool as ubiquitous as grep, it has a simple concept and few rules. It is easy to get the hang of it quickly. It is launched in a matter of seconds. And you can stop where you are, and come back later. Simply put, it's a masterpiece in terms of design, that even my two young boys mastered in a matter of minutes.

Give the game a shot. You will not be disappointed.

World of Goo Trailer 3
by 2dboy

Thursday, October 15, 2009

Log management is done. Now, on to change control. And more pie charts!

First, let me introduce you to the interesting subject of pie charts in the land of security compliance software:

I've had enough of all these software products whose selling point is that they're able to make pie charts. As if, in these economic times, someone's job were to sit in front of a screen all day staring at Pie Charts, making Pie Charts, and reading Reports With Pie Charts (if you're one of these people then sorry - I just can't understand how you can cope with this job). As a systems administrator, I want something that is, in order: 1. easy to use and deploy, 2. responsive, and 3. a fit for compliance requirements. If these include pie charts, then so be it, but that shouldn't be the only feature to look for. It seems that when your business consists of charging big bucks for software, looking for a bigger piece of the... er, pie, being no-frills is not a good sales argument.

Now I feel better. Let's move on.

I've posted extensively on various log management solutions I've been looking into in the last few weeks. Oh yes, and let's not forget my rant on the lack of info available on the website of some vendors that could have helped me get an idea of what their product does without needing to bug their sales team. Turns out there has been a total of 5 contenders (and not all of them had great websites, by the way). My business case is almost over, and while I won't disclose what I intend to recommend between ArcSight, Q1 Labs, a Balabit/Splunk hybrid solution and RSA, let's just say attending demos and speaking to a fair number of sales reps left me exhausted.

I could almost joke that had I decided upfront to use rsyslog and program a few Perl scripts to extract the required compliance-related data manually, I might have been able to pull it off quicker, and for free. And if the auditor came in wanting pie charts, I could have plotted them in Lotus 1-2-3 like I used to do in high school and printed them on a sheet of sprocket-fed paper.

Now it is time to turn that wheel again.

Yes, that's right, I have to do the process all over since I'm now looking for a Change Control solution that supports most of my devices. Change Control = knowing what, and possibly how, predefined critical files have changed on my servers from a certain reference point. None of the vendors above have one available I could piggyback on except Splunk's fschange which is not end-to-end enough and doesn't support HP-UX anyway.
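To make that definition concrete, here is a minimal sketch of the idea: record a digest of each critical file at a reference point, then compare a later snapshot against it. It is not how Tripwire, OSSEC or any other product works internally; the function names are hypothetical, and it only reports what changed, not how.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(paths):
    """Map each path to its digest, or to None if the file is missing."""
    result = {}
    for p in paths:
        result[str(p)] = file_digest(p) if Path(p).is_file() else None
    return result


def diff(baseline, current):
    """List the paths added, removed or modified since the baseline."""
    changes = {"added": [], "removed": [], "modified": []}
    for path in sorted(set(baseline) | set(current)):
        old, new = baseline.get(path), current.get(path)
        if old == new:
            continue
        if old is None:
            changes["added"].append(path)
        elif new is None:
            changes["removed"].append(path)
        else:
            changes["modified"].append(path)
    return changes
```

The baseline would have to be stored somewhere the monitored server cannot tamper with, which is one reason a central server matters for this kind of tool.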

I've looked into what's available and the names Tripwire and Solidcore pop up. I've used the academic Tripwire in the past and it did the job, but I need something that is based on a central server and supports multiple platforms, Windows being one of them. Maybe OSSEC? Perhaps; it is already running here successfully under the radar... but in my enterprise world, FOSS, especially when its intention is to reach compliance, is a hard sell even if it costs close to nothing.

Any suggestions on what I should look into?

O.

Tuesday, October 13, 2009

HPTF 2010: Yes. In Sin City. Again.

The HPTF website has not been updated yet, but the HPTF Facebook group posted an official announcement that it will happen again, still at Mandalay Bay. I was hoping for another venue, somewhere in the Bay Area would have been nice, but ah, well. I'm not the one who decides.

Monday, October 12, 2009

Thoughts on the Sidekick fiasco

... What a fiasco. I can only feel sorry for the sysadmin team in charge of the data at Danger. Is this due to their incompetence or simply pressure to deliver? We'll probably never know. But some heads will be rolling for sure, Microsoft will be associated with this mess for years to come, and this might be the end of their foray into smartphones. This is another one of these data loss events that will go down in history. It's also a strong point against the Cloud, for anyone thinking about outsourcing their data.

It seems to have all happened during a "SAN upgrade". When you update anything on your SAN, you'd better have a DR site ready, and stop your replication before doing the upgrade. And even that doesn't excuse you from backing up your data correctly, which these guys at Danger didn't seem to be doing.

I won't start bashing any particular vendor in this blog. If you're interested in finding out who the rumors point to, I'll let you do your own search. No, it is not HP, but it's not far either. I can't give details, but it's not the first time I've heard about a "routine" firmware update on a storage array going south. Yes, it is true that SANs are supposed to be upgradable online. But the more I think about it, the more I compare the firmware update of a disk array to upgrading the thrusters of a jet while it's in the air. Yes, a jumbo jet can fly while one of its engines is stopped, but would you put your life at stake flying during an upgrade unless you really needed to? I wouldn't.

O.

Thursday, October 8, 2009

Monitoring an MSA 2000 G1 with SIM and Remote Support

I tried it, and it works. Here is a quick checklist:

On the MSA:

Go into Manage -> Event Notification -> SNMP Configuration
Configure your read/write community and the IP address of the CMS.

On the CMS:

Discover or identify the MSA if you have not done so already. If you have two controllers, you only need to discover one controller management IP address, as SIM does not correlate the two controllers.
In the system properties, the product number will not be identified correctly. The product number burned in the MSA seems to be a valid HP part number, but the product number I had under contract was a different, 6-character one, so I copied it from my contract directly. Add the correct product number in the customer field under Contract and Warranty Information, as well as in the Product Number field on the top, just in case. Check the two Prevent the discovery... checkboxes to prevent your mocked product number from being overwritten in the future.
Just to be sure, I also added the Care Pack directly in the system properties instead of relying on it being detected from the HP back-end, due to the problems I've had with the product number.
Re-check the entitlement in Remote Support, it should be green.

Go back on the MSA, and send an SNMP test trap. The event should be logged in SIM and a service event will be opened. N.B. I only tested this with OSEM as of now, as I have not yet had the time to migrate the SNMP monitoring to WEBES.

O.

Wednesday, October 7, 2009

Log management for the system administrator

I've had an increased number of readers who have been following this blog since my first posts detailing my log management hurdles, so here is an update on what's been going on.

I've limited myself to talking to a small number of vendors, for various reasons I won't explain here. But I'll tell you what I think you should ask yourself when considering purchasing a log management solution:

Do you want an appliance, or software that runs on your own infrastructure?
Do you want your log data to be translated to a high-level format, keep your raw logs, or do both?
Do you plan on deploying this yourself or do you need an onsite consultant?
Do you favor a solution that is easy to use or one that is feature rich?
Do you have the human resources to maintain the solution once it's installed?
And, of course, what is your budget?

Getting answers to these questions is, well, complicated. Buying software is like purchasing a suit: you have the choice of doing it online, at a rock bottom price, with no help whatsoever and without trying it on. You can also go downtown to stroll down a few department stores, where you can get a feel of what's available, look at the price tags freely, and possibly get some minor adjustments done. Or you can go to a full-service luxury store, where someone will help you pick the perfect suit. Whatever you do is up to you, but I think you get my point.

If you're the department-store type of person, you can assemble some of the components by yourself. While getting your hands dirty will give you more control on the solution and possibly save some money, you need to be sure you'll be compliant with your auditor's requirements once you're done.

Instead of an appliance, getting the specs and a quote for an enterprise-grade x86 server running Linux or Windows isn't rocket science. Enough said.

To centralize your logging, if you're already familiar with syslog-ng, Balabit's Premium Edition of syslog-ng holds few secrets; they have a well-written whitepaper on the subject, and you can even get an instant quote online. If you're on a zero budget, rsyslogd is a free alternative, but I think syslog-ng might sound better to possible auditors, as they've been hearing about it for years.

As for the log drilling itself, which I decided in my documents to call deferred log analysis, I still don't know what can do the job as I have not finished that part of my architecture yet. I've seen both free and commercial solutions, and up until now Splunk seems to be a strong contender in this area. But I still need to figure out exactly what our tech people will be drilling for, and what the auditors will be looking for in terms of high-level, bells-and-whistles reports, before making my own decision.

The last part is the real-time log analysis, for which some IT security people tell me that it is "not automatable". I have doubts about this statement. While enterprise-wide solutions require dedicated staff, our needs are at a departmental level; I therefore think it is possible to pull it off with limited human resources. We'll see.
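For readers wondering what the do-it-yourself end of deferred log analysis could look like, here is a minimal sketch. It assumes the raw logs are kept in the traditional BSD syslog layout (timestamp, host, program, message); the regular expression, the function name and the sample search term are all illustrative assumptions, not a description of Splunk or of any other product.

```python
import re
from collections import Counter

# Traditional BSD syslog layout: "Oct  7 14:02:11 host prog[pid]: message"
LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<prog>[^\s:\[]+)(?:\[\d+\])?: "
    r"(?P<msg>.*)$"
)


def count_matches(lines, pattern):
    """Count, per (host, program), the lines whose message matches pattern."""
    wanted = re.compile(pattern, re.IGNORECASE)
    counts = Counter()
    for line in lines:
        m = LINE.match(line.rstrip("\n"))
        # Lines that do not follow the expected layout are skipped silently.
        if m and wanted.search(m.group("msg")):
            counts[(m.group("host"), m.group("prog"))] += 1
    return counts
```

Feeding it an open log file and a pattern such as "failed" gives a per-host, per-program tally, which is about as far as a quick script goes before a real tool becomes worth its price.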

Sunday, October 4, 2009

Connecting an MSA2012i through a Virtual Connect with ESX

A year ago, I ordered the required building blocks to install a small ESX cluster in a remote office: a C3000, a few blades, and an MSA2012i. It was my first iSCSI implementation. It took a while to get it racked because my team was busy elsewhere, but now that it's done, I had to experiment a bit to make it work correctly.

The MSA is not an HP design. It's made by a Carlsbad, CA company named Dot Hill. The documentation and web interface are not up to HP's usual standards. (The interface has been upgraded with the MSA 2000 G2, but I have a G1.) Furthermore, there is not much information explaining how the controller failover works, and this is important to set it up correctly. There is a very good document here in the ITRC KB that you must read before deploying these devices.

Go read it, right now, and come back to this post when you're done.

Here's how I integrated this through a Virtual Connect. That's not how you should do it, that's how I did it; if there are better solutions, please drop me a comment as I would be glad to hear about what alternatives are possible. If you google around, you'll see that some people have made similar setups to this one.

Single Controller Setup

Above is how you should hook everything up with a single controller MSA. The reason for using two different subnets is to isolate them as if you were on two SAN fabrics. If you use the same subnet, ESX will gladly team both pNICs under the same vswitch, and since only one pNIC is active at a time, you won't be able to see both paths at the same time. There might be workarounds but I suggest you save yourself some trouble and use two separate subnets. Be sure to create a vmkernel interface on each one of these subnets, as well as two service consoles.

How does failover work? Well, esxcfg-mpath will report two paths for each iSCSI device. So you are free to shut down or update the firmware of one of your Virtual Connect modules with no downtime. I tried it, and it works as if you were on a fibre channel SAN.
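A quick way to sanity-check an addressing plan before touching ESX is to verify that the two iSCSI networks really do not overlap. The sketch below uses the two controller A addresses quoted later in this post; the /24 prefix length is an assumption (the post does not state the netmask), and the function name is hypothetical.

```python
import ipaddress
from itertools import combinations


def paths_are_isolated(interfaces):
    """Return True if no two 'address/prefix' interfaces share or overlap a subnet."""
    networks = [ipaddress.ip_interface(i).network for i in interfaces]
    return not any(a.overlaps(b) for a, b in combinations(networks, 2))


# Assumed /24 masks: two separate subnets, as recommended above.
print(paths_are_isolated(["192.168.10.10/24", "192.168.11.10/24"]))

# With a /16 mask both addresses land in the same network.
print(paths_are_isolated(["192.168.10.10/16", "192.168.11.10/16"]))
```

The first call reports the paths as isolated and the second does not, which is the situation where both pNICs would end up teamed.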

Dual Controller Setup

With two controllers, you are required to add two switches because of the way the controller failover is designed. I did not find a proper way to hook up both controllers to the Virtual Connect - it insists on teaming the two controllers, and shutting down Controller A doesn't turn off its link, so the VC doesn't fail over to Controller B's network link.

In the iSCSI initiator, configure only 192.168.10.10 and 192.168.11.10 - don't bother with the IP addresses of controller B. Although the MSA2012i is not supposed to be active-passive, I've had trouble configuring paths on both controllers at the same time. If you're experiencing long delays booting ESX or scanning your iSCSI HBAs, be sure to reference only the IPs of the master controller in your iSCSI initiator setup.

If controller A fails or is shut down, controller B will take over the IP addresses of A automatically and you'll be able to resume I/O. ESX will not even switch from one path to another, as the path is bound to the IP address -- 192.168.10.10 should be out of reach for 30 seconds and come back magically.
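To check that takeover window yourself, something like the following sketch could be left running on a machine on the iSCSI network while controller A is shut down. It assumes the portal listens on TCP port 3260, the standard iSCSI port; the function names are hypothetical, and a successful TCP connect only shows that the portal is listening again, not that ESX has resumed I/O.

```python
import socket
import time


def portal_up(host, port=3260, timeout=1.0):
    """Return True if a TCP connection to the iSCSI portal succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def time_outage(probe, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Wait for probe() to fail, then return the seconds until it succeeds again."""
    while probe():        # portal still answering: wait for the failover to start
        sleep(interval)
    went_down = clock()
    while not probe():    # portal unreachable: wait for the address to come back
        sleep(interval)
    return clock() - went_down
```

For example, time_outage(lambda: portal_up("192.168.10.10")) blocks until the address has disappeared and reappeared, then returns the measured gap, accurate to roughly the probe timeout plus the polling interval.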

As I said, this might not be the best solution, but it worked for me. If I ever revise mine, I'll update this post. Good luck.

O.