Thursday, January 29, 2009

Goodbye ISEE: Monitoring an EVA with RSP: day 2

It turns out that WEBES failing to recognize my SMS as a "CommandView Server" was a bug in WEBES 5.4. A document in the ITRC KB, published just yesterday, describes the issue. Look for document #mmr_na-0229085.

The resolution is not really clear, however. Basically, you have to create a new Managed Protocol named "CommandView" and enter your CV authentication information in it. Then delete your server from the managed entities, and stop and restart the desta service (net stop desta_service, then net start desta_service). The server will then reappear in the managed systems list, still as a Proliant, but in its details you'll see that the ELMC is now using your CommandView protocol.

Does it actually work? I don't know yet. Besides the "wccproxy test", I can't do much. There is no way to initiate a real test from the HSV controller short of pulling an FRU, and I'd rather not do that. I'll look for a test event somewhere in the service menu, but I won't keep my fingers crossed.

Wednesday, January 28, 2009

My m0n0wall setup

Last week I replaced my home firewall, an aging D-Link DI-524A, with a dedicated PC running m0n0wall and have been satisfied with the results. I used an old IBM Netvista purchased off eBay, from which I removed the hard disk to reduce noise and power consumption. It boots off a live CD, and an old 64 MB USB key holds the configuration. The PC sits in my utility room, near the ceiling.

M0n0wall follows the Unix tradition: do one thing and do it well (there are exceptions in the Windows world, such as PuTTY, which also follow the principle). The user interface is simple and elegant, and its footprint is very small. I could actually run it on a 486, as long as it had enough memory. There is an alternative project named pfSense which includes many more features, but this comes at the expense of security and stability, which is why I prefer m0n0wall's philosophy.

This means that extra features such as a log analyzer or an SSH daemon are not included in m0n0wall, and you have to rely on another server for them. I would actually have liked to be able to do this without needing another server, so running m0n0wall in a QEMU VM along with a small FreeBSD server (or even pfSense) could be a good idea for my next implementation. Time will tell.

Goodbye ISEE: Monitoring an EVA with RSP: day 1

I have deferred upgrading the monitoring of my EVAs to RSP until Q1 2009 since I had a great deal of trouble with RSP last fall and was fed up.

HP Services offered to come and help me (we have 6 EVAs and 6 SMSes), but I thought I'd try the first one myself, to at least understand what they'll be doing and be able to troubleshoot it once they're gone.

To increase my chances, I decided to start everything from scratch on the CMS and SMS. As far as these two servers are concerned, it doesn't get more "standard" than this:

  • The SIM administrator and I installed SIM 5.2 on a freshly reinstalled Windows server, then we restored our database successfully (see one of my previous posts for my recommendations on this)
  • I completely zapped my test SMS and reinstalled a vanilla Windows 2003 Enterprise, along with CV 6.0.2 and nothing more (we stick with 6.0.2 since it's the only version certified with Metrocluster).

Now does it work? Partly.

  • SIM must have at least spoken to SMI-S, since EVAs appeared automagically in the system list. But there's not much information I can get from them.
  • As the RSP "prerequisite" documentation that explains how to set everything up has no fucking example screenshot, who knows if the EVA entries in SIM are supposed to be in this state or not. Message to whoever's writing these guides: I'm sure you are allowed to put images there. Please do it!!!
  • WEBES is still not able to communicate with CV since it sees the server as a generic "Proliant", and not a "CommandView Server". Is it because I'm running on a generic server instead of a real SMS? Maybe, but a generic server is officially supported. I don't know how to change this yet. More work needs to be done.

Since there are still problems, and SIM 5.3 was released just last Monday along with WEBES 5.4, I'll try to have SIM upgraded first and start from there. This will probably be the last straw. If it still doesn't work, I'm converting my SSSU monitoring script to a Nagios plugin and I'll give it away for free to everyone who's interested. If HP is happy to give me crappy software that doesn't work, then I'll let them handle the overhead of paying a human to manage the service calls that I'll log manually. I just wasted too much time and energy on this.

Thursday, January 22, 2009

The idiot's guide to (re-)installing SIM on Windows and making it actually work

My colleague and I have been busy over the last few days doing a complete reinstall of SIM and RSP, since we were running into problems with our server that would be tough to explain. To make a long story short, we decided that a fresh reinstall would fix things, and it looks like it did. Why are we running on Windows and not on HP-UX? Basically because 1) SIM was initially installed on Windows in our shop, 2) RSP only works on Windows and 3) my colleague is a Windows guy. :)

Here are my 10 suggestions if you want to do this. This might seem stupid for a Windows admin but I'm an HP-UX guy, remember.

1. Have a good backup
First of all, we made sure we had a good backup of the SIM database. HP has a whitepaper on the subject. It says what to back up, but not necessarily how to back it up automatically. This was my first MS-SQL experience, and I ended up writing a custom script for it. I run it each day to dump the database, so that it can be backed up consistently.

2. Before reinstalling, confirm first that your data can be restored
I did this by setting up a dummy VM running Windows and restoring the data to a dummy SIM. It worked.

3. Use the Smart Start CD to Install Windows Server
I'm always sceptical of software that's self-labeled as "smart" and thought that we could just install a vanilla Windows server, then add all appropriate drivers and stuff... waste of time. Smart Start does all of this for you, and can install Windows from a CIFS-accessible .iso file.

4. Don't use a localized Windows and other software
Use a plain, honest-to-goodness U.S. English version of Windows. If and when you run into problems, Google will be a much better friend if you paste it error messages that are in English. If your company has a policy of installing software in a localized language, screw 'em.

5. Use the defaults to install *EVERYTHING*
Even if you don't like the defaults, at least they will work. We ran into a few bugs, especially with the database, and ended up thinking "if we were the QA guys at HP, how would we set up our server?" Chances are the answer to this is using the defaults! So don't try to tweak install options, whether in SIM, RSP or MSSQL, unless you really know what you're doing. We didn't.

6. Don't run the software in your own account
Have it run with a generic account. If you use your personal account, SIM and MSSQL will work, but expect problems when your account gets deleted once you a) quit your job or b) get fired. Of course doing this is a good way to leave a time bomb at work in the case of b).

7. Update your server with Windows update between each software install
You'll probably end up going there 3-4 times.

8. Run the SIM installer on the console
No need to use the iLO: you can type "mstsc /console" to open a terminal session on the console (on Remote Desktop clients version 6.1 and later, the switch is "/admin" instead). If you don't use the console, the RSP installer could fail miserably. Trust me.

9. Be patient when RSP is installing
It often asks you to wait "a few minutes", but experience here has shown me that it should rather say "a few hours", since it's downloading a lot of software in the background. Looks like the development team at HP tested this only on their gigabit network. In the real world, downloading hundreds of megabytes of bloated data through the internet can actually take quite some time.

10. Be prepared to reinstall everything, even Windows, if it doesn't work
There's an expression in French, un mal pour un bien, which means a bad thing for a good thing. We had problems with MSSQL which would have been impossible to fix cleanly, and decided that reinstalling Windows would actually be quicker than trying to make it work. It's not that bad, since while reinstalling Windows, yours truly actually took notes this time, and is sharing them with you!

Good luck

Tuesday, January 13, 2009

One liner: count the total uncompressed space of a gzipped tarball on HP-UX

gzcat u01.tar.gz | tar tvf - | awk '{tot+=$3; print; printf ("total = %10.0f\n", tot);}'
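
If you only want the final figure rather than a running total after every entry, here is a minimal variant. It's a sketch, not a tested HP-UX tool: the function name tarball_total is made up for illustration, it uses gzip -dc in place of gzcat so it also runs where gzcat is missing, and it assumes the size is the third field of the "tar tvf" listing (true for HP-UX tar and GNU tar, but not for every tar out there).

```shell
# Hypothetical helper: print the total uncompressed size (in bytes)
# of all entries in a gzipped tarball.
# Assumption: the size is field 3 of `tar tvf` output.
tarball_total() {
    gzip -dc "$1" | tar tvf - | awk '{ tot += $3 } END { printf("%.0f\n", tot) }'
}
```

Call it as: tarball_total u01.tar.gz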

Friday, January 9, 2009

BladeSystem Virtual Connect Support Utility reports "TCP Port 21 in Use"

When this happens, don't waste time looking for TCP issues on the VC-FC as I did... it means that you have a running FTP server on the machine from which you're running vcutil. Stop it and it will work.
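
To confirm that something really is listening on port 21 before hunting for the FTP service, you can filter the output of netstat. This is only a rough sketch, assuming a Unix-like shell (or something like Cygwin on Windows); the function name ftp_listeners is made up, and the pattern is my guess at covering both the Windows ("LISTENING") and Unix ("LISTEN") netstat formats.

```shell
# Hypothetical helper: read `netstat -an` output on stdin and print
# any TCP socket listening on port 21 (address forms like 0.0.0.0:21 or *.21).
ftp_listeners() {
    grep -Ei '^[[:space:]]*tcp.*[:.]21[[:space:]].*LISTEN'
}
```

Run it as: netstat -an | ftp_listeners. If it prints a line, some FTP server (or other program) is holding the port.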