Thursday, February 19, 2009

Olivier's hot tips to monitor HP-UX servers with SIM and RSP


1. Configure WBEM on your server
SIM and WEBES subscribe to WBEM events on your server in order to receive them. The easy way to make this work is to put root credentials in SIM's Global Protocol Settings, but whatever you do, don't add root's credentials anywhere. You should never have to hand out the root password to some slimy application unless you really know what you're doing. Create a dedicated WBEM user for this instead.

Add a user with "useradd"; I named mine hpwbem:
# useradd -u 505 -g users -s /bin/false -c "HP WBEM provider" -m -k /etc/skel hpwbem

Then use passwd to input a password of your choice.

Enable non-privileged users in the CIM server, then restart it:
# cimconfig -s enableSubscriptionsForNonprivilegedUsers=true -p
# cimconfig -s enableNamespaceAuthorization=true -p
# cimserver -s
# cimserver

Then add rights for hpwbem to the CIM:
# cimauth -a -u hpwbem -n root/cimv2 -R -W
# cimauth -a -u hpwbem -n root/PG_InterOp -R -W

# cimauth -a -u hpwbem -n root/PG_Internal -R -W
# cimauth -a -u hpwbem -n root/cimv2/npar -R -W
# cimauth -a -u hpwbem -n root/cimv2/vpar -R -W
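If you have several servers to set up, the five cimauth calls above can be generated from a list instead of typed by hand. This is only a minimal sketch: the function name print_cimauth_cmds is mine, the namespaces and flags are copied from the commands above, and it only prints the commands so you can review them before running anything.

```shell
# Print (not run) one cimauth command per namespace for the given user.
# The namespace list and the -a/-u/-n/-R/-W flags come from the steps above;
# the function name is only an illustration.
print_cimauth_cmds() {
    user=$1
    for ns in root/cimv2 root/PG_InterOp root/PG_Internal \
              root/cimv2/npar root/cimv2/vpar; do
        echo "cimauth -a -u $user -n $ns -R -W"
    done
}

# Example (review the output, then pipe it to sh as root if it looks right):
#   print_cimauth_cmds hpwbem
```

Printing first and executing later keeps a typo in the user name from ending up in the CIM authorization table.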

Configure the hpwbem user, and its password, in SIM's Global Protocol Settings.

Now, have SIM subscribe to WBEM events for your server. It doesn't by default. On your CMS, type:

C:> mxwbemsub -a -n server_name

Once this is done, check on your server if you have SIM subscriptions by using evweb:
# evweb subscribe -L -b external

You should see three subscriptions named HPSIM_*.
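Counting by eye gets old when you repeat this on many servers. Here is a small sketch that counts the subscriptions for you. It assumes that evweb prints each subscription name on its own line, which I have not verified for every version, and count_subs is a helper name I made up.

```shell
# Count the lines on stdin that contain the given subscription prefix.
# Assumption: the listing shows one subscription name per line.
count_subs() {
    # grep -c exits non-zero when nothing matches; "|| true" keeps the
    # function usable in scripts that stop on errors, and "0" is still printed.
    grep -c "$1" || true
}

# Example, on the HP-UX server:
#   evweb subscribe -L -b external | count_subs HPSIM_
```

The same helper works with the HPWEBES_ prefix for the WEBES subscription checked later on.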


2. Configure your system properties in SIM

Get into the System Properties of your server in SIM, then confirm that a serial number and a product number have been discovered. Sometimes the PN is missing for Integrity servers, so add it manually. Just to be sure, also copy the SN and PN into the Customer-Entered serial number and product number fields in the Entitlement Information area. You'll be sorry if you don't do this. Next, set your Country code. If you don't do this, ISEE/RSP won't work. The other fields in the Entitlement Information area can normally be left blank.

Assign a site name and at least a primary customer contact to your server. This is important: otherwise, I think no ticket will be generated by ISEE since there will be nobody to contact.


3. Configure RSP entitlement

Go in the ISEE client (Remote Support Configuration and Services under the Options menu) and confirm that your server is entitled. If it isn't, you can try clicking on the entitlement icon, and have it send a new entitlement request. As long as you're not entitled, RSP will not forward service calls to HP so it's critical that you get this fixed. Be sure you set the system properties correctly as mentioned above.


4. Configure WEBES

Get into WEBES (localhost:7906) and confirm that your server is in the Managed Entities list. Of course, there's no search feature, so you'll probably have to check multiple pages in the Full List to find it. If your server appears, confirm that its system type is ManagedSystem - HPUX. If the server is of the wrong type, delete it, as it could stay that way for a while. Better safe than sorry.

WEBES synchronizes its entity data with SIM, but it does this through telepathy or some other magic, I couldn't find out how it's done and if it can be forced (and nobody replied to me in the forums to help me...). Restarting desta doesn't do the trick.... the real trick is actually waiting, sometimes for a loooooong time, until your server appears as a managed entity. I suggest you wait until the next day.

Once your server is in WEBES, run evweb (see above) and confirm that there's a subscription named HPWEBES_*. You need to have one, else hardware events will not be caught by WEBES and forwarded to RSP...


5. Generate test events to confirm it actually works

Generate a test event with EMS:

# /etc/opt/resmon/lbin/send_test_event ia64_corehw

...then cross your fingers, hoping it will be reported. The following should happen:

a) the event will be shown in SIM, in the event tab of the server (this is what the SIM WBEM subscription is used for)
b) the event will be trapped by WEBES, and sent to ISEE (this is what the WEBES WBEM subscription is used for)
c) ISEE will send the event to HP, and you'll see messages in the server event log such as "A service incident has been reported" (this is what all the entitlement hassle is used for)

If you got down to step c, you're done. If it didn't work, go to step 1 and start again. I had to do this quite a few times. There's an old song in Quebec French named "refelemele". It basically means "doittomeagain". Chances are you'll be singing along to it for a few days.

Monday, February 9, 2009

easyddr: a wrapper for scsimgr to help you manage your DDR


The DDR (device data repository) which I spoke about a few months ago is an interesting feature of the new HP-UX mass storage stack which lets you set attributes on-the-fly for storage devices matching a specific scope. Any new device which is presented to the system gets its custom attributes assigned dynamically.

While this is a nice feature, there are not thousands of use cases I can think of here. The stock DDR that comes with HP-UX seems to be populated with entries for tape devices, so my guess is that a lot of usefulness is related to tape drives (but I couldn't say, as our Data Protector infrastructure runs on Windows). But with disks, one case that is worth mentioning is that the SCSI queue depth can be fine-tuned for specific devices, enhancing performance.

While the DDR is useful, it's not really easy to use: it depends on fixed-length, whitespace-padded strings and it's hard to dispatch modifications to a bunch of servers at the same time. So I wrote a small script named easyddr.sh which eases its use somewhat.

Instructions:

1. Create a configuration file

The configuration file contains pairs matching wildcards and attributes. For example:

# Sample configuration file
/HSV:max_q_depth=32 esd_secs=60
/IR Volume:max_q_depth=8

This means that:

  1. All HSV (EVA) devices, no matter the controller version, will be set with a SCSI queue depth of 32 and an I/O timeout of 60 seconds.

  2. All IR Volume (MPT) devices will be set with a SCSI max queue depth of 8.
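To make the format concrete, here is a minimal sketch of how such a file can be read in shell. It is not the actual easyddr code. It assumes that the colon separates the match pattern from the attributes, that attributes are separated by spaces, and that lines starting with # are comments. The name parse_cfg is made up for the example.

```shell
# Read an easyddr-style configuration on stdin and print one
# "pattern|attribute" line per attribute. Blank lines and lines
# starting with '#' are skipped. Illustration only, not easyddr itself.
parse_cfg() {
    while IFS= read -r line; do
        case $line in
            ''|'#'*) continue ;;
        esac
        pattern=${line%%:*}   # everything before the first colon
        attrs=${line#*:}      # everything after the first colon
        # Unquoted on purpose: split the attribute list on whitespace.
        for attr in $attrs; do
            echo "$pattern|$attr"
        done
    done
}

# Example:
#   parse_cfg < /usr/local/etc/easyddr.cfg
```

Note that the pattern itself may contain spaces (as in "IR Volume"), which is why only the part after the colon is split. An attribute value containing a space would not survive this simple parser.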

2. Run easyddr with your configuration file

Run easyddr:
# easyddr.sh /usr/local/etc/easyddr.cfg

It first reads the configuration file to see if any devices on your system match your wildcards. Then, if there is no DDR entry for a device, it adds one to the DDR without needing you to cut-and-paste it. The last step it takes is to set your attributes, then save them. The script always runs in preview mode by default. If everything seems okay and you want to apply the settings, re-run it with the "-apply" option:

# easyddr.sh /usr/local/etc/easyddr.cfg -apply
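The preview-by-default idea is easy to reuse in your own baseline scripts. The sketch below shows the general pattern only; it is not taken from easyddr, and run_or_preview is a hypothetical helper name.

```shell
# Run a command only when the first argument is "-apply";
# otherwise just print what would have been run.
run_or_preview() {
    mode=$1
    shift
    if [ "$mode" = "-apply" ]; then
        "$@"
    else
        echo "preview: $*"
    fi
}

# Example:
#   run_or_preview "$2" some_command arg1 arg2
# where "$2" is the optional second argument of your script.
```

Anything other than "-apply" (including an empty argument) falls back to preview, so a typo in the option can't change the system by accident.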

Caveat: there is no way to delete entries using easyddr once they are added. If you make a mistake, you'll need to run scsimgr ddr_del to remove them.

In my case, I install easyddr and the config file in my Golden Depot, and make it a baseline script that needs to be run when the server is initially installed. No more need to use the scsimgr command. Have fun!

Download easyddr here

Thursday, February 5, 2009

Goodbye ISEE: Monitoring an EVA with RSP: day 4

I've finally been able to have the B-Series switches recognized in SIM. However, I was expecting the Brocade SMI-S provider to give me the serial number, and it doesn't, which kind of limits its usefulness for me. The next step is seeing if I can actually have the switches appear in OSEM.

Hints:
1. Be sure that the hostname of the switch and its DNS name are the same.
2. If SIM discovers the switch by scanning the network before finding it through the SMI-S provider, it will create an "unmanaged" entry and won't override it. So delete any such entries before running Identify Systems on the server that hosts the SMI-S provider.

Good luck

Wednesday, February 4, 2009

Goodbye ISEE: Monitoring an EVA with RSP: day 3

I don't only have EVAs to monitor, but B-Series fibre switches as well. The RSP documentation has a few indications on how to configure Fabric OS to send SNMP traps to the CMS, but you're better off reading the OSEM setup guide which gives more details on the exact commands to use.

However, the switches don't appear magically in OSEM. Usually, I think a device will appear once OSEM receives the first SNMP trap from it. But Fabric OS does not currently have any SNMP test trap that can be sent. Besides pulling out a FRU, which HP doesn't recommend, there's no solution for now. I'll try pulling out an SFP to see if it works.

As a side note, I also tried installing the Brocade SMI-S provider on the SMS, since the switches are labeled as "unmanaged" by SIM, but it doesn't seem to work. No matter how many times I run an Identify on the CMS, it doesn't discover the switches. Everything seems to be set up correctly: wbemdisco finds the devices, and I also modified wbemportlist.xml to use port 60001. No results yet; the switches are still unmanaged (yes, I tried deleting one).