The Born-again Sysadmin: December 2008

Monday, December 8, 2008

Reduce vxfsd usage

If you're seeing high vxfsd usage on 11iv2 (I don't know about 11iv3), chances are it's wasting time managing the VxFS inode cache. Depending on your situation, setting a static cache can help. I've been doing this for years on a particular system with good results and had to do it again this morning, so I thought I'd post about it. The procedure is documented here:
http://docs.hp.com/en/5992-0732/5992-0732.pdf

Simply put, you have to do this:
# kctune vxfs_ifree_timelag=-1

Don't credit me for finding this one out. I owe it to Doug Grumann and Stephen Ciullo.
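
If you'd like to see the value of the tunable before and after the change, here is a minimal sketch. Running kctune with a tunable name and no value just reports the current setting. The run helper and the DRY_RUN variable are my own additions for illustration, not part of kctune; with DRY_RUN=1 (the default here) nothing is changed and the commands are only printed.

```shell
#!/bin/sh
# Sketch: query vxfs_ifree_timelag, set it to -1, then query it again.
# DRY_RUN and run() are illustrative helpers, not HP-UX features.
# DRY_RUN=1 (default) only prints the commands; use DRY_RUN=0 on a real host.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

set_static_inode_cache() {
    run kctune vxfs_ifree_timelag        # report the current value
    run kctune vxfs_ifree_timelag=-1     # the setting from this post
    run kctune vxfs_ifree_timelag        # report the value again
}

set_static_inode_cache
```

Run it once as is to see what would be executed, then with DRY_RUN=0 as root to actually apply the change.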

Wednesday, December 3, 2008

Using DDR in a mixed SAN environment under 11iv3

Update Feb 10th 2009: I wrote a script to help manage DDR.

A little-known feature of the HP-UX 11iv3 storage stack is DDR, which stands for Device Data Repository. It lets you set "scope attributes" for the storage driver that apply to specific disk types. As far as I know, there is no whitepaper on this yet, so you have to read the scsimgr(1m) manpage to learn about it. In my case, I learned about this feature during a lab in Mannheim (which was worth the trip in itself). The scsimgr whitepaper on docs.hp.com does give out a few bits of info but doesn't show the real deal. I'll try to do that here.

Simply put, creating a scope lets you use the -N option with scsimgr set_attr and scsimgr get_attr to apply attributes to a set of devices that share common characteristics, rather than to one specific device.

For example, if you have a server that has EVA disks along with MPT devices, you will probably want to set the SCSI queue depth of the EVA devices to something bigger than 8, which is the default. But MPT devices have to remain at 8. Doing this with DDR is easy: simply set a scope attribute that will automatically adjust the queue depth only for HSV210 devices.

Here's an example.

First of all, let's define a scope. Start by getting the DDR name that applies to your EVA device:
# scsimgr ddr_name -D /dev/rdisk/disk93 pid
SETTABLE ATTRIBUTE SCOPE
"/escsi/esdisk/0x0/HP /HSV210 "

You can narrow it down further and even include the revision of your controller by asking for the rev level instead of pid:
# scsimgr ddr_name -D /dev/rdisk/disk93 rev
SETTABLE ATTRIBUTE SCOPE
"/escsi/esdisk/0x0/HP /HSV210 /6110"

Once you've got your scope, add it to the device data repository (the DDR). You have to do some cut and paste here, as the blanks between the quotes are significant.
# scsimgr ddr_add \
-N "/escsi/esdisk/0x0/HP /HSV210 "
scsimgr:WARNING: Adding a settable attribute scope may impact system operation if some attribute values are changed at this scope. Do you really want to continue? (y/[n])? y
scsimgr: settable attribute scope '/escsi/esdisk/0x0/HP /HSV210 ' added successfully
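
The cut and paste can also be scripted. Here is a rough sketch that pulls the quoted scope out of the ddr_name output and keeps the blanks intact. It assumes the output looks like the sample above (a header line, then the scope between double quotes on its own line); extract_scope is a hypothetical helper of mine, and the sample text stands in for a real scsimgr call so the snippet runs anywhere.

```shell
#!/bin/sh
# Sketch: read `scsimgr ddr_name` output on stdin and print only the scope,
# without the surrounding quotes. extract_scope is an illustrative helper.
extract_scope() {
    sed -n 's/^[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p'
}

# Canned sample standing in for:
#   scsimgr ddr_name -D /dev/rdisk/disk93 pid
sample='SETTABLE ATTRIBUTE SCOPE
"/escsi/esdisk/0x0/HP /HSV210 "'

# Command substitution drops trailing newlines only, so the trailing
# blank inside the scope survives.
scope=$(printf '%s\n' "$sample" | extract_scope)

# Brackets make the trailing blank visible.
printf '[%s]\n' "$scope"
```

On a real host you would pipe scsimgr ddr_name straight into extract_scope, and always pass the result as "$scope" with the double quotes so the shell doesn't split it on the blanks.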

Finally, use the -N option of scsimgr to set your attribute on the entire scope. In this example, I'll set max_q_depth:
# scsimgr set_attr \
-N "/escsi/esdisk/0x0/HP /HSV210 " -a max_q_depth=32

Don't forget to save it if you want to keep it across reboots:
# scsimgr save_attr \
-N "/escsi/esdisk/0x0/HP /HSV210 " -a max_q_depth=32

And voilà. All your EVA disks behind an HSV210 now have a queue depth of 32. Furthermore, any new EVA device you present to the server that matches your scope will inherit the new attribute. Does it really work across reboots? I don't know yet, but most probably.
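
If you do this often, the steps above can be wrapped in a small function. This is only a sketch: ddr_apply, run and DRY_RUN are names I made up for illustration, and only the scsimgr subcommands and options already shown in this post are used. I added a get_attr at the end to display the result. With DRY_RUN=1 (the default here) the commands are printed, one bracketed argument at a time, instead of being executed.

```shell
#!/bin/sh
# Sketch: add a scope, set and save one attribute on it, then read it back.
# ddr_apply, run and DRY_RUN are illustrative helpers, not part of scsimgr.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Print each argument in brackets so blanks in the scope stay visible.
        printf '+'
        for arg in "$@"; do printf ' [%s]' "$arg"; done
        printf '\n'
    else
        "$@"
    fi
}

ddr_apply() {
    scope=$1             # quoted scope, blanks included
    attr=$2              # name=value, e.g. max_q_depth=32
    # Each step runs only if the previous one succeeded.
    run scsimgr ddr_add -N "$scope" &&
    run scsimgr set_attr -N "$scope" -a "$attr" &&
    run scsimgr save_attr -N "$scope" -a "$attr" &&
    run scsimgr get_attr -N "$scope" -a "${attr%%=*}"   # attribute name only
}

ddr_apply "/escsi/esdisk/0x0/HP /HSV210 " max_q_depth=32
```

With DRY_RUN=0 the ddr_add step will still stop and ask for confirmation, as in the transcript above.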

Another example would be to set a specific load balancing policy for MSA devices:
# scsimgr ddr_add \
-N "/escsi/esdisk/0x0/HP /COMPAQ MSA1000 VOLUME"
# scsimgr set_attr \
-N "/escsi/esdisk/0x0/HP /COMPAQ MSA1000 VOLUME" \
-a load_bal_policy=preferred_path
# scsimgr save_attr \
-N "/escsi/esdisk/0x0/HP /COMPAQ MSA1000 VOLUME" \
-a load_bal_policy=preferred_path

Get the picture? DDR is very powerful in mixed SAN environments. With it, you don't have to bother setting attributes on each individual disk.

Have fun.

Tuesday, December 2, 2008

RSP still sucks... but not big time anymore

The blog entry where I said that RSP sucks has attracted some attention, both in and out of the comments area. An update is in order. First of all, I won't censor that entry; it represents my initial feeling about RSP, a software bundle which made me waste lots of time, and what I think of it has not changed.

On the upside, following my rant on the ITRC forums (which was deleted quickly), some people at HP Canada noticed and put me in contact with colleagues in Colorado who were glad to listen to my comments, and they promised to address some of the issues. Some of my concerns were: no support for VMs; no cookbook for HP-UX admins; lack of feedback from SWM, etc. I also had a quick talk with Brian Cox in Mannheim a few weeks later, and he was aware of the problems HP-UX shops are facing with ISEE going away, as some of them don't want to install Windows. Personally I don't care, but I would rather have run this on HP-UX if I could; I'm no Windows admin and feel more at home on Unix systems.

I've been running RSP as the only notification mechanism for a few ProLiant (ESX) and Integrity (HP-UX) servers for over a month now, and it seems to work. All the events are sent to HP, and closed. I've also been able to have my C7000 blade chassis monitored, although I couldn't find any documentation for this. I just set up the CMS as the trap destination and crossed my fingers, and test traps do generate RSP events.

I estimate that installing, debugging (and trying to understand) SIM and all the components that replace ISEE has taken me over 20 hours. That's a lot of work. So when a component breaks in the future, I expect a phone call or e-mail from HP Support. If I don't get anything, I won't be in a good mood. I have many EVAs of different generations that will be migrated sometime in early 2009. They require more preventive maintenance, so this will be the real test.

In the meantime I'm asking all the support personnel to take a walk through our data centers (we have 6) once in a while, looking for red lights. I thought those days were over, but RSP is a stack of multiple monitoring software solutions, and I haven't had proof yet that it can be trusted.