The Born-again Sysadmin: 2008

Monday, December 8, 2008

Reduce vxfsd usage

If you're seeing high usage of vxfsd on 11iv2 (I don't know about 11iv3), chances are it's wasting time managing the VxFS inode cache. Depending on your situation, setting a static cache can help. I've been doing this for years on a particular system with good results, and I had to do it again this morning, so I thought I'd post about it. The procedure is documented here:
http://docs.hp.com/en/5992-0732/5992-0732.pdf

Simply put, you have to do this:
# kctune vxfs_ifree_timelag=-1

Don't credit me for finding this one out. I owe it to Doug Grumann and Stephen Ciullo.

Wednesday, December 3, 2008

Using DDR in a mixed SAN environment under 11iv3

Update Feb 10th 2009: I wrote a script to help manage DDR.

A little-known feature of the HP-UX 11iv3 storage stack is DDR, which stands for Device Data Repository. It lets you set "scope attributes" for the storage driver that apply to specific disk types. As far as I know, there is no whitepaper on this yet, so you have to read the scsimgr(1m) manpage to learn about it. In my case, I learned about this feature during a lab in Mannheim (which was worth the trip in itself). The scsimgr whitepaper on docs.hp.com does give a few bits of info but doesn't show the real deal. I'll try to do this here.

Simply put, creating a scope enables you to use the -N option with scsimgr set_attr and scsimgr get_attr, which lets you apply attributes to a set of devices that share common attributes, rather than to a specific device.

For example, if you have a server that has EVA disks along with MPT devices, you will probably want to set the SCSI queue length of the EVA devices to something bigger than 8, which is the default, while the MPT devices have to remain at 8. Doing this with DDR is easy: simply set a scope attribute that will automatically adjust the queue length only for HSV200 devices.

Here's an example.

First of all, let's define a scope. Start by getting the DDR name that applies to your EVA device:
# scsimgr ddr_name -D /dev/rdisk/disk93 pid
SETTABLE ATTRIBUTE SCOPE
"/escsi/esdisk/0x0/HP /HSV210 "

You can go down further to the bone and even include the revision of your controller:
# scsimgr ddr_name -D /dev/rdisk/disk93 pid
SETTABLE ATTRIBUTE SCOPE
"/escsi/esdisk/0x0/HP /HSV210 /6110"

Once you have your scope, add it to the device data repository (the DDR). Copy and paste carefully here, as the blanks between the quotes are important.
# scsimgr ddr_add \
-N "/escsi/esdisk/0x0/HP /HSV210 "
scsimgr:WARNING: Adding a settable attribute scope may impact system operation if some attribute values are changed at this scope.Do you really want to continue? (y/[n])? y
scsimgr: settable attribute scope '/escsi/esdisk/0x0/HP /HSV210 ' added successfully

Finally, use the -N option of scsimgr to set your attribute on the entire scope. In this example, I'll set max_q_depth:
# scsimgr set_attr \
-N "/escsi/esdisk/0x0/HP /HSV210 " -a max_q_depth=32

Don't forget to save it if you want to keep it across reboots:
# scsimgr save_attr \
-N "/escsi/esdisk/0x0/HP /HSV210 " -a max_q_depth=32

And voilà. All your EVA disks, running on an HSV200, now have a queue depth of 32. Furthermore, any new EVA device you present on the server that matches your scope will inherit the new attribute. Does it really work across reboots? I don't know yet, but most probably.

Another example would be to set a specific load balancing policy for MSA devices:
# scsimgr ddr_add \
-N "/escsi/esdisk/0x0/HP /COMPAQ MSA1000 VOLUME"
# scsimgr set_attr \
-N "/escsi/esdisk/0x0/HP /COMPAQ MSA1000 VOLUME" \
-a load_bal_policy=preferred_path
# scsimgr save_attr \
-N "/escsi/esdisk/0x0/HP /COMPAQ MSA1000 VOLUME" \
-a load_bal_policy=preferred_path
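Since both examples repeat the same three steps, they can be wrapped in a small function. The sketch below is a hypothetical helper (ddr_apply is not part of scsimgr); it only chains the ddr_add, set_attr and save_attr calls shown above, keeping the scope in double quotes so its blanks survive. The SCSIMGR variable is an assumption added so the sequence can be dry-run against a stub before touching a real host.

```shell
#!/bin/sh
# Hypothetical helper: add a DDR scope, set one attribute on it and save it,
# i.e. the ddr_add / set_attr / save_attr sequence from the examples above.
# SCSIMGR is an assumption of this sketch so the calls can be dry-run with a
# stub; it defaults to the real scsimgr(1M).
: "${SCSIMGR:=scsimgr}"

ddr_apply() {
    scope=$1   # e.g. "/escsi/esdisk/0x0/HP /HSV210 " (trailing blank kept)
    attr=$2    # e.g. max_q_depth=32
    # ddr_add prints a warning and waits for a y/n answer.
    "$SCSIMGR" ddr_add -N "$scope" || return 1
    # Apply the attribute to every device matching the scope...
    "$SCSIMGR" set_attr -N "$scope" -a "$attr" || return 1
    # ...and save it, which is what should make it persist across reboots.
    "$SCSIMGR" save_attr -N "$scope" -a "$attr"
}
```

For instance, ddr_apply "/escsi/esdisk/0x0/HP /COMPAQ MSA1000 VOLUME" load_bal_policy=preferred_path should reproduce the MSA example, with the ddr_add confirmation still answered by hand. The function stops at the first command that fails.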

Get the picture? DDR is very powerful in mixed SAN environments. With it, you don't have to bother with setting attributes for each specific disk.

Have fun.

Tuesday, December 2, 2008

RSP still sucks... but not big time anymore

The blog entry where I said that RSP sucks has attracted some attention, both in and out of the comments area. An update is in order. First of all, I won't censor that entry; it represents my initial feeling about RSP, a software bundle which made me waste lots of time, and my opinion of it has not changed.

On the upside, following my rant on the ITRC forums (which was deleted quickly), some people at HP Canada noticed and put me in contact with colleagues in Colorado who were glad to listen to my comments, and they promised to address some of the issues. Some of my concerns were: no support for VMs, no cookbook for HP-UX admins, lack of feedback from SWM, etc. I also had a quick talk with Brian Cox in Mannheim a few weeks later, and he was aware of the problems HP-UX shops are facing with ISEE going away, as some of them don't want to install Windows. Personally I don't mind, but I would rather have run this on HP-UX if I could; I'm no Windows admin and feel more at home on Unix systems.

I've been running RSP as the only notification mechanism for a few Proliant (ESX) and Integrity (HP-UX) servers for over a month now, and it seems to work. All the events are sent to HP, and closed. I've also been able to have my C7000 blade chassis monitored, although I couldn't find any documentation for this. I just set up the CMS as the trap destination, crossed my fingers, and test traps now generate RSP events.

I estimate that installing, debugging (and trying to understand) SIM and all the components that replace ISEE have taken me over 20 hours. That's a lot of work. So when a component breaks in the future, I expect a phone call or e-mail from HP Support. If I don't get anything, I won't be in a good mood. I have many EVAs of different generations that will be migrated sometime in early 2009. They require more preventive maintenance, so this will be the real test.

In the meantime, I'm asking all the support personnel to take a walk in our data centers (we have 6) once in a while, looking for red lights. I thought those days were over, but RSP is a stack of multiple monitoring software solutions, and I haven't had proof yet that it can be trusted.

Thursday, November 27, 2008

Visualizing Iozone data with Gnuplot

I've been using iozone for a few years now to measure disk I/O performance. My version was getting old, so I downloaded a more recent version, and I noticed that it comes with a nifty perl script named iozone_visualizer.pl that ties it to Gnuplot to produce very interesting graphs. It's quite useful when you want to compare multiple systems or different tunables, and you no longer need to use Excel. Yes boys and girls, the best things in life are free.

Monday, November 24, 2008

With AVIO, swapper's the one doing all the work

If swapper has a high usage on a VM Host, that doesn't mean that your system is "swapping". In fact, I just discovered that AVIO I/O is handled by the swapper!

I'm benchmarking one of my VMs with iozone and noticed that on the HPVM 4.0 host, swapper usage was high. This is not normal: old-school training taught me that swapper should never be this busy, because when it is, your system is deactivating processes and you're in deep trouble. Yet the VM Host is humming along, with plenty of memory left.

Since the I/O rate of the disk presented to my benchmark VM roughly matches the I/O rate of the swapper, I can only conclude that the AVIO engineers hacked the swapper so that it's now solely responsible for doing I/O on behalf of the guests. I'm no kernel developer, so the implementation details are beyond me. But it does make sense in a way: since swapper is a real-time priority process, I/O handled by swapper will surely go out the door faster than if it came from hpvmapp.

Your VM Host should technically never start deactivating processes anyway, unless all the WBEM providers go haywire. Oops, I bashed the WBEM providers once again, sorry about that.

As all this I/O was labeled under the memory_management application in Glance, I modified my /var/opt/perf/parm file to add an application type for AVIO: I removed swapper from memory_management and added a new application named "avio" that includes swapper.

Sunday, November 23, 2008

Now that's the spirit

An HP engineer wrote a song about Integrity Servers. 30 seconds of pure bliss. http://h30423.www3.hp.com/?fr_story=2a16002b3acfd5b7d5ca1a785706e90c8984a95f&rf=bm

Friday, November 21, 2008

Seeing agile devices under HP-UX

What if you could see all your HP-UX 11iv3 agile devices under an even better agile view like this:

Yeah, I thought you'd be interested. You can download my script by clicking here.

Friday, November 14, 2008

Why we chose HP-UX for our Mission Critical Application

The company I work for trusts HP-UX on Integrity as the platform to run a mission critical service. Last Monday, I presented "Why we chose HP-UX for our Mission Critical Application" at CCE 2008. It's an updated version of what I presented at HPTF, and probably my last version.

I've just uploaded a copy of the slides, which are available at this link. Connect's website should also have the conference proceedings available soon.

HP-UX 25th anniversary beer mugs

At CCE 2008, a German BCS manager took it upon himself to organize a small celebration to mark the 25th anniversary of this operating system. Thanks Jürgen! It was well attended, which is a good thing.

Not long after, promotional beer mugs full of Bitburger were given out to all attendees at the conference reception. As a die-hard HP-UX fan, I couldn't resist taking a picture of one of these beer mugs beside my computer, doing actual productive work (of course).

That picture is proof that there are some German blondes who can actually take interest in my day-to-day work.

Wednesday, November 12, 2008

My thoughts on CCE 2008

I've just finished attending CCE 2008, which was Connect's first European event. I'll put it bluntly: it didn't work. That's not because the organizers, the sponsors and HP didn't try. My hat goes off to them; they did the best they could. The conference itself was OK, without all the extravaganza of HPTF, and that's fine: as a tie-less techie, I'm always there for the technical agenda.

But the economic downturn, combined with a lack of "community spirit" from my HP-UX and VMS counterparts basically made this a low attendance NonStop event. The sessions that were not NonStop-related got few attendees from the first day, to the dismay of HP executives. And that's really a shame because there were some excellent sessions and labs.

Someone asked at the Q&A panel what Connect thought of this. Nina Buik was frank: many delegates cancelled because travel budgets were restrained, so attendance figures were lower than expected. One thing's for sure: there won't be a CCE 2009, and Connect will concentrate more on local events next time. And I'm not making anything up; this comes from Buik herself.

That's assuming, of course, that Connect's finances can recover. The event attracted around 500-550 people while they were expecting 800, so they're roughly 35% under their initial hopes. I've also heard the number of actual customers who attended the event, which I won't disclose here, but I can say that it was far from stellar.

It's too bad to see an event with so much potential having been met with coldness by HP customers (except the NonStop guys, of course). But times got hard quickly in 2008 and Connect was hit by this uncontrollable circumstance.

Bye

Friday, November 7, 2008

One liner: poor man's esxtop for HPVM

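tmp=/tmp/hpvmsar.$$
trap 'rm -f "$tmp"; exit 0' INT TERM   # remove the temp file on Ctrl-C
while true; do
  hpvmsar -s 1 -n 1 -a > "$tmp"   # collect one sample into the temp file
  clear; cat "$tmp"               # repaint the screen in one go
done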

I have to redirect the output to a file; otherwise, the slow response time of hpvmsar makes the output flicker.

Thursday, October 30, 2008

Igniting Integrity VMs

For the last year, my VMs under IVM 3.0 and 3.5 were mostly installed one by one. But since I installed a huge IVM 4.0 server for more critical environments, I've started seriously using Ignite-UX to install VMs.

I was surprised: I think I can beat my Windows administrator colleague by deploying HP-UX VMs quicker than he can deploy Windows VMs under ESX. I counted 30 minutes from the initial hpvmcreate to the last boot.

The core media way
This one is simple, but installations are long. They will take at least 2 hours since installing from core media uses straight SD packages and they're slow to install.
1. Copy the core media of your HP-UX release to a depot. Take a recent one - it will have the VM libraries, and AVIO drivers as well. It's well documented in the Ignite-UX quick start guide.
2. Build an Ignite-UX boot helper using make_media_image. Don't burn it - just keep the ISO and transfer it to your VM Host. I prefer using the boot helper since DHCP can't work across subnets, and setting it up is more complex than just using a boot helper (furthermore, all our subnets are managed by a Windows DHCP server, and I can't tweak it into booting Integrity servers, which don't support PXE for HP-UX yet anyway. Yuck.)
3. Configure your VM with AVIO if possible. Boot your VM with the boot helper, contact the Ignite-UX server and install from there.

The Golden Image way
This one is pretty fast, assuming you have a gigabit network.
1. Create a small VM to build your image - I aptly named it "genesis". You can install it using the above method.
2. Configure it to your taste.
3. Add the latest VM Guest package and AVIO drivers (they are available from the software depot).
4. Use make_sys_image to build your golden image, and set up your configuration files. It's well documented in the Ignite-UX documentation.

To deploy a VM, boot it with a .iso boot helper (see above), and ignite with your Golden Image. Use AVIO for lan and disk. It's so damn quick that I didn't even have time to finish my lunch when I tried it today.

Good luck

Wednesday, October 29, 2008

Quick review of Integrity VM 4.0

I've been a user of IVM since 3.0, and I'm about to finish putting in production a fairly big server that will host a bunch of VMs.

One of the big drawbacks of versions prior to 3.5 was the lack of a built-in MPIO. You had to either purchase the expensive SecurePath, or use PVLinks which forced you to use the LV backend. I used PVLinks, but the concept of having to manage VGs both inside my VMs, and one level upwards on the host, was complex. I wouldn't suggest it to anyone who is not familiar with LVM. On the upside, using VGs on the host can prevent mistakes since PVs are harder to corrupt than raw disks.

Furthermore, to benefit from network redundancy, APA had to be purchased separately, which also increased costs. So of course the big advantage of 4.0 is the 11iv3 host, which lets you use its built-in MPIO. In addition, the VSE-OE now includes APA for free (it was about time). So these two items are covered. And did I say that APA was now very easy to configure? I'm not fond of the System Management Homepage, but the APA configuration in it is now dead easy and quick. Only a linkloop feature is missing.

The agile addressing still seems weird to me; it's not as simple as using SecurePath, but I'm catching on slowly. Finding the LUN of a device is actually a hard task; I'll have to rewrite ioscan_fc.sh for 11iv3 for this.

ESX administrators are used to managing files. They're easy to move around, and you can "see" them, which prevents mistakes. It's a similar paradigm to a DBA preferring files to raw devices. In this area, there is one improvement: AVIO is now supported with a file datastore. Even with a tuned VxFS, I found the file datastore to be slow when I did tests with 3.0 last year, so you can be sure I'll try again this time.

Bye

Monday, October 27, 2008

Understanding all the RSP components

N.B. My updated diagram from December 2009 is here

This blog entry is updated regularly. Latest updates:

November 4th 2008
November 19th 2008
December 10th 2008
December 16th 2008
February 20th 2009

Having read (diagonally) over 1000 pages of documentation related to every component that RSP includes, here are my notes, which might be of help. Not all of this is necessarily accurate; when I find inconsistencies, I'll update this blog post.

The bottom line is that you no longer have a simple ISEE client running on your HP-UX host. It's now much more complex than this.

There's a bunch of "new" tools that will become part of your life. In fact these are "old" tools that have been available for years. They're now tightly welded together, run on a central server (CMS) instead of locally on each monitored host, and for the most part do not need to be configured independently, but it's important to understand what each one does.

SysFaultMgmt (System Fault Management) - runs on the HP-UX server
It's the "new generation" of EMS, and it speaks WBEM. Using WBEM, it can be integrated easily in SMH (System Management Homepage) and SIM (Systems Insight Manager). SysFaultMgmt used to work in parallel with traditional EMS monitors, but since HP-UX 11iv3 March 2008, it seems to switch off EMS and replace it completely. EMS will eventually be EOL'd.

EVWeb - runs on the HP-UX server
A GUI companion to SysFaultMgmt that lets you query and manage WBEM subscriptions. There's also an evweb CLI, which will let you extract events and see their contents (they look similar to EMS's event.log file). The CLI has a man page and it's not hard to use. Be careful: I've played with evweb from SMH, it sometimes crashed, and this left some evweb CGIs spinning endlessly, taking 100% CPU. The CLI is probably more robust.

System Insight Manager agent - runs on Proliants running VMware ESX and probably Windows as well

This agent includes a good old System Management Homepage, along with hardware diagnostic agents. If the agents detect that something goes wrong, they are configured to send an SNMP trap to the CMS.

OSEM - runs on the CMS
OSEM is an agent that analyzes SNMP events that are sent to it. It filters them, and translates them to a human-readable form which can be sent by e-mail and/or to ISEE. By filtering, I mean that it determines whether an SNMP trap sent by a device is actually an important one, and decides if it's necessary to generate a service event for it.

OSEM mostly supports systems that report their events using SNMP:

Proliant servers running Linux, Windows or VMware ESX.
Integrity Servers running Linux
SAN switches
MSA enclosures
Bladesystem chassis (simply configure the OA to send SNMP traps to the CMS)

WEBES - runs on the CMS
WEBES is an analyzer that, in a similar fashion to OSEM, processes events sent to it from these primary sources:

Event log on a Windows Server
WBEM subscriptions
Interactions with Command View to gather data for EVAs

From my understanding, it does not "translate" the WBEM events to a readable form as OSEM does, since the WBEM events already contain the information.

WEBES supports mostly:

Integrity servers running HP-UX, through WBEM subscriptions
EVAs, by reading the event log on the Storage Management Server through ELMC, and by logging directly into Command View

Now there seem to be some places where WEBES and OSEM overlap, and I haven't understood yet to what extent these tools talk to each other. From the OSEM documentation, it seems that WEBES sends events to OSEM, and OSEM then manages the notification.

Why is there OSEM and WEBES? I'm not sure, but it looks like OSEM has a Compaq history, while WEBES comes from Digital. ISEE in itself is HP. The tools have not been merged yet, are still actively developed, and they will probably complement each other for a while.

ISEE - runs on the CMS
The new 5.x ISEE client is a new version of the 3.95 client, which is now integrated into SIM. Most of the configuration settings you used to put in the ISEE client are now configured there, from the Remote Support menu entry.

SIM - runs on the CMS
SIM is used to actually manage your servers, and WEBES and OSEM automatically sync their configuration with SIM. For instance, if you set yourself as the contact person for a server, the OSEM and/or WEBES configuration will be populated with what you put in SIM. So SIM is the only place where you actually need to do some manual configuration.

Basically, if you think that SIM takes care of handling events, you're wrong. It just _reports_ the events it receives directly and gathers from WEBES/OSEM. It also reports what ISEE does with the events. How exactly it gets the information from these agents, I don't know yet. SIM doesn't send any events to ISEE; RSP and OSEM do. SIM also receives SNMP traps and subscribes to WBEM events, but since they are not filtered, it will only log them as "raw" events.

That's what I understand out of this for now. Hope that helps.

Friday, October 10, 2008

Bl495: a perfect fit for virtualization

Bundled OEM "value-added" software that comes with a subpar digital camera or printer is usually not useful, bloated, proprietary and hard to uninstall. And RSP (see my previous post) makes this kind of software look rather elegant.

Yet despite my rant on some HP management tools which are really not worth getting excited for, they do design some pretty interesting hardware, such as the bl495.

I almost wet my pants when I saw these. Now that's the kind of blade I was waiting for -- lots of CPU, even more RAM, two SSDs, and a small footprint. They're just perfect for running an ESX cluster. ESXi can be burned in using the HP USB key, but I'd still prefer ESX for now. Combine this with an EVA4400 and you're in for a helluva ride.

The only thing that's missing, in my opinion, is an additional two LAN ports, which are available on an optional mezzanine. The bl495s include two built-in 10GbE ports, which have plenty of bandwidth, but it's complicated to isolate the Service Console, VMkernel and various vSwitches without using tagged VLANs (especially through a Virtual Connect). I prefer having different physical interfaces for this, especially considering that 10GbE is still too modern for our 1GbE Catalysts.

You can easily replace 6 or more full racks of less-critical Wintel servers with a 10U chassis full of these. Think about all the space you'll save, and let's not forget about cooling, SAN ports, LAN ports...

Way to go HP!

Tuesday, September 23, 2008

Redirecting a chroot-jailed /dev/log to a dumb syslogd

HP-UX's stock syslogd doesn't support multiple input streams, so when chrooting applications that write to /dev/log (namely an SFTP server), you're pretty much stuck with no possibility of logging anything in syslog.

I considered for a short time installing syslog-ng and connecting /chroot/dev/log to /dev/log but that seemed overkill.

That's until I found out that this works perfectly to connect one fifo to another:


while true; do cat /chroot/dev/log > /dev/log; done

Wow. That's an easy workaround. And it doesn't consume much: cat simply blocks while nothing is written, so the loop only runs when a line is written to the log.

So I wrote a nicer wrapper around this line (97 lines, to be exact) and published it here:

http://www.mayoxide.com/toolbox/log_redirector.sh
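The one-liner needs its while loop because cat exits at end-of-file, which a FIFO reports each time its last writer closes it. Here is a minimal sketch of that logic, assuming POSIX sh and named pipes on both ends; relay_once and relay_forever are hypothetical names, and this is not the log_redirector.sh wrapper.

```shell
#!/bin/sh
# Sketch of the FIFO-to-FIFO relay behind the one-liner above.
# Function names are hypothetical; this is not log_redirector.sh.

# One pass: opening DST for writing blocks until a reader (e.g. syslogd)
# has it open, then cat blocks until something opens SRC for writing,
# copies what it reads, and exits when the last writer closes SRC (EOF).
relay_once() {
    cat "$1" > "$2"
}

# The endless loop reopens SRC after every EOF so the next writer is served.
relay_forever() {
    while :; do
        relay_once "$1" "$2"
    done
}
```

Calling relay_forever /chroot/dev/log /dev/log would be the equivalent of the one-liner; it runs until it is killed.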

Bye

Thursday, September 18, 2008

Cable management - a whitepaper from HP

The art of cable management is a trade that seems to be learned from "father to son" and there are few documents out there that actually show how to do it.

The following whitepaper is a start, but it has few pictures:
http://h20000.www2.hp.com/bc/docs/support/SupportManual/c01085208/c01085208.pdf?jumpid=reg_R1002_USEN

Wednesday, September 10, 2008

Paper: Understanding routing in multi-homed HP-UX environments

Multi-homed HP-UX servers, especially ServiceGuard nodes, present a challenge in terms of routing, which is further exacerbated by the lack of documentation on the subject.

This paper tries to explain how to configure multi-homed servers to enhance the routing of IP packets and prevent asymmetric routing.

Note that I call this paper a "graypaper". I do not work for HP, nor do I have any internal knowledge of HP-UX. The information in this paper has been determined by looking at the output of tcpdump and by reading publicly accessible documentation and posts in the ITRC forums. This document is provided "as is" without any warranty.

Click here to read this paper.

Friday, September 5, 2008

mail loops haunting me again

This week, a major mail loop caused slowdowns on our company's mail servers. The cause was one of my servers, which managed to send over 125K e-mails in 12 hours. Counting all the bounces that came back, the total number of e-mails must have been around 250K.

Here were the ingredients:

1. All our servers have a local sendmail daemon active. This is a requirement for our applications, which speak SMTP to localhost:25. Since IP Filter was filtering port 25, I didn't modify the default sendmail configuration much, as I wanted it to remain as standard as possible.

2. After a few months, I forgot about point #1, of course. For a long time, I was under the impression that we had no sendmails listening at all.

3. Last week, we stopped IP Filter on one of the servers, which was having some networking problems, and since it's a mission-critical one, I didn't have the guts to restart it. This basically exposed its SMTP server to the outside world.

The 3 ingredients were in place for a mail loop. Here's how it happened:

1. Thursday, I killed a process on the server, and an e-mail was generated with a missing process alert. The e-mail was sent to root.

2. All mails destined to root are redirected, through /etc/mail/aliases, to a MS Exchange mailing-list that includes all the system administrators.

3. One of our administrators, let's say John Doe, had been on leave for a while, and his mailbox was full.

4. The mail bounced back with a message stating that John Doe's mailbox was full. Its return address was root@server.

5. Since the server had sendmail, and its port was unfiltered, it picked up the mail and tried to deliver it to root.

6. Back to step #2, 150000 times.

That loop went on until I got back to work.

To prevent this in the future:

1. I spent some time making sendmail "send only". The HP-UX sendmail.cf generator, gen_cf, sucks big time, but I found out that by setting send_only and modifying /etc/rc.config.d/mailservs, it adds the correct DaemonOptions to restrict it to listening on 127.0.0.1. So even if IP Filter is stopped, at least any bounce will be refused by the server.

2. IP Filter should also be restarted ASAP.

3. I also redirected postmaster and MAILER-DAEMON to /dev/null (they are sent to root by default) so that if steps 1 and 2 are not followed, at least these addresses won't participate in the loop.

4. I checked how sendmail could be throttled to limit the number of e-mails sent in a specific time period. There are macros available for this, but I'd rather not deviate too much from the default settings.

5. I also think the Exchange administrators should reconsider the "let's send a bounce each time a mailbox is full" strategy. I know nothing of Exchange, but I strongly believe this can be throttled. If an account generates, say, 10 bounces a second, this feature should be automatically deactivated.
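Step 3 boils down to two lines in /etc/mail/aliases followed by newaliases. Here is a hedged sketch of a helper that makes the change repeatable; null_system_aliases is a hypothetical name, and it assumes the existing postmaster and MAILER-DAEMON entries each fit on a single line.

```shell
#!/bin/sh
# Hypothetical helper: point postmaster and MAILER-DAEMON at /dev/null in a
# sendmail aliases file. Existing definitions of those two names are dropped
# first so the file does not end up with duplicate entries.
# Assumption: those entries are single lines (no continuation lines).
null_system_aliases() {
    file=$1
    [ -f "$file" ] || return 1
    tmp="$file.new.$$"
    # Keep every line except the current postmaster/MAILER-DAEMON entries,
    # matching the alias names case-insensitively.
    grep -v -i -E '^(postmaster|mailer-daemon)[[:space:]]*:' "$file" > "$tmp"
    printf '%s\n' 'postmaster: /dev/null' 'MAILER-DAEMON: /dev/null' >> "$tmp"
    # Rewrite in place so the original file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Usage would be null_system_aliases /etc/mail/aliases && newaliases, since sendmail only picks up alias changes once the database is rebuilt.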

As a side note, having support from the manufacturer is important to me. So don't tell me to install postfix or qmail. I don't want to. If I die, quit or go on a hell of a long vacation, I expect any less experienced admin to be able to call HP directly and be supported. That's why I'm relying on the subsystems that are included with HP-UX (sendmail, apache, tomcat, wu-ftpd, etc.) and not the open-source ones. Yes, they're outdated and yes, they're not necessarily the best of breed, but they work. Furthermore, any security patch is issued by HP, so I don't need to take care of that either.

Preventing asymmetric routing under multi-homed HP-UX hosts

I've been having problems with asymmetric routing for a week now, and found some interesting tidbits on the routing algorithm of HP-UX. Most of this was found through experimentation and lots of sniffing with tcpdump. There are so few documents on this subject that I'm working on what I call a graypaper, which I should post eventually. Serviceguard nodes are especially prone to this because many of them are multi-homed.

In the meantime, if you experience some asymmetric routing, send me an e-mail. There are some interesting ndd and route settings that can be tweaked to circumvent it. You need 11iv3 or later, or 11iv2 with TOUR 2.4.

Friday, August 29, 2008

Multi-homing under IPFilter: A gotcha with HP-UX

In the last year I've been experiencing some weird problems under IP Filter when using multi-homed HP-UX servers. I've overcome this up until now but I think I have hit a particular problem when running under ServiceGuard and floating IPs.

Take the following steps if your TCP sessions lock up after a while, without any indication in the syslog that packets are being blocked:

1. Stop IP Filter (easy, but probably not what you want)

2. If running IP Filter with a multi-homed system, take great care to prevent any asymmetric routing (i.e. be sure that what gets in on one interface, gets out on the same).

I'll try to make a comprehensive post on this particular problem soon.

Tuesday, August 26, 2008

Great blog entry that lists useful ESX tools

http://communities.vmware.com/blogs/gabrielmaciel/2008/06/18/more-vmware-tools-and-utilities

MCS StorageView is particularly useful!!

Monday, August 25, 2008

Building a (cheap) NAS for ESX using OpenFiler

In the last days before my vacation, I spent some time rebuilding an old DL380G2 Proliant attached to an MSA500 to make a cheap NAS to use as an ESX datastore.

Using OpenFiler, it is possible to make a cheap, iSCSI-based server that could store non-critical data such as ESX clones and templates. I tried it, and it seems to work well.

However:

There is no way to easily install a Proliant Insight Agent on OpenFiler, as RPM packages can't be installed (and I didn't push my luck trying rpm2cpio). When reusing old hard drives, this is a necessity as you really need to be able to monitor them.
I left it up and running for a few weeks, and a networking glitch made it unresponsive on the network; my take is that teaming does not work well. That's weird, since I tested it by unplugging cables. That server doesn't have an iLO, and it's located in our downtown datacenter, which I don't visit often, so I'm screwed.

So I'm ditching this for the time being. I would prefer having a CentOS-based solution, so that the RHEL Proliant Insight Agent works. But AFAIK nothing seems as easy to set up as OpenFiler. I'm no Red Hat admin, so making all these features work on a vanilla system would take me too much time. If anybody has any suggestions, drop me a note.

Adopting a conservative ESX patching strategy

HP-UX system administrators are familiar with the two "patching strategies": conservative or aggressive. Needless to say, on the mission critical systems I manage, I've always adopted the conservative strategy. It's hard to get downtime to reboot anyway, so one might as well be sure that the patches work.

With my previous, lone ESX 2.x server, I almost never installed any patches since it was complicated; VMware simply didn't have any tool to make the inventory easy.

With ESX 3.5, up until now I've been delighted by VMotion and Update Manager's ease of use. It's now simple to patch ESX servers: simply use Update Manager, remediate your servers, and everything goes on unnoticed by the users. UM will download the patches to your VC server, put the server in maintenance mode, VMotion away any VMs running on it, then run esxupdate. It's simple, no questions asked.

That was until the ESX 3.5 Update 2 release.

Most ESX admins will know about the August 12th timebomb in this release. All of this took place while I was on vacation. Thank God nothing bad happened: had anyone shut down a VM, it would have been impossible to restart it, and I might have been paged while on vacation.

Needless to say, I spent some time fixing this. Had I waited a few weeks before applying this update, as I should have, I would have avoided this unpleasant experience.

Experienced sysadmins will tell me I've been too aggressive. That's true. I was too excited by Update Manager with VMotion. I'll be more careful from now on.

Sunday, August 3, 2008

Taking time off... and VM snapshots

Readers of this blog won't see much activity, as I'm taking three weeks off work. Whatever I could write about during that time would mostly concern home improvement, child care and leisure destinations, and that, my friends, I don't intend to post about. :)

On a side note, be careful with these darn ESX snapshots. It turns out that their logic is the reverse of what I'm used to. I might be wrong, but all the snapshot technologies I've seen until now, such as VxFS snapshots and EVA snapshots/snapclones, create a separate data area and use it to store the changes made since the moment of the snapshot. When that area runs out of space, for example when the LV holding a VxFS snapshot fills up, the logical consequence is that no more changes can be logged, so your snapshot is lost while the live data stays intact.

That's not how it works with ESX. Under ESX, the original .vmdk is frozen and made read-only, and all changes are written to another .vmdk file, aptly named xxxxx-delta.vmdk. So the original .vmdk holds the state of the disk at the time of the snapshot, not its current state.

When you "delete" a snapshot, you're actually committing the delta to the original file, a process that takes some time since every change has to be merged back into it. So anyone intending to use snapshots must account for the time it takes to get rid of them.

I don't know why ESX makes snapshots this way, and I haven't found an explanation yet (although I'm sure there is one; there might be a performance gain in doing so). But what happens if there's no more space left to hold your snapshot? You'll actually be losing current data, not past data. That sucks. Your VM will crash. And since your snapshot, or should I say your current state, will be corrupted, the only thing you can do is go back to its parent.

So be careful.
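Since a forgotten snapshot shows up as an ever-growing xxxxx-delta.vmdk file, one low-tech way to spot them is simply to look for those files. The sketch below is hypothetical (the function name list_snapshot_deltas is made up for illustration) and assumes it runs in the ESX service console with read access to /vmfs/volumes, and that snapshot deltas follow the *-delta.vmdk naming described above. It only lists files and never touches them; removing a delta file by hand would break the VM's disk chain, so snapshots should still be deleted through ESX itself.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): list ESX snapshot delta
# disks so that forgotten snapshots stand out. Prints "<bytes> <path>" for
# every *-delta.vmdk under the given root, largest first. Read-only.
list_snapshot_deltas() {
    root=${1:-/vmfs/volumes}   # assumed default: where ESX mounts its datastores
    find "$root" -type f -name '*-delta.vmdk' 2>/dev/null |
    while IFS= read -r f; do
        # Field 5 of "ls -ln" is the file size in bytes.
        size=$(ls -ln "$f" | awk '{ print $5 }')
        printf '%s %s\n' "$size" "$f"
    done |
    sort -rn
}
```

For example, `list_snapshot_deltas | head` would show the largest deltas across all datastores. Point it at /vmfs/volumes itself or at a datastore's UUID directory: find does not follow the datastore label symlinks by default, which also keeps each delta from being listed twice.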

Monday, July 28, 2008

Monitoring VMFS disk space

I always thought that monitoring free space on VMFS volumes was not useful, but I was wrong. If someone creates a snapshot of a VM and forgets to remove it, its delta file keeps growing, the VMFS volume might eventually become full, and some VMs will crash.

So... how can it be monitored? Under ESX, the quickest solution I found is a script that Nagios executes through a key-authenticated SSH connection.

I'll skip the Nagios side, but here's my generic script that runs on ESX. It uses "vdf", which produces df-like output for VMFS volumes. Oh, and it isn't executed as root, but as an unprivileged user that calls vdf via sudo: monitor_vmfs.sh

N.B. Installing NRPE on ESX is a venture I will not consider. Many people in the forums seem to have problems with it, and it would take too much effort. SSH works right out of the box, is more secure, and does not require me to install third-party software in the ESX console (except my script).
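The monitor_vmfs.sh script itself isn't reproduced in this post. As a rough idea of what such a check can look like, here is a minimal sketch of a Nagios-style check; it is not the author's script. It assumes vdf prints df-style rows in which a Use% field such as 87% is followed by a mount point under /vmfs/volumes/, and the function name check_vmfs_usage and the thresholds are hypothetical. It reads the vdf output on stdin and returns the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# Minimal sketch of a Nagios-style VMFS space check (hypothetical; not the
# author's monitor_vmfs.sh). Reads df-like output, such as "sudo vdf", on
# stdin and returns a Nagios exit code: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
# Assumption: each VMFS row has an "NN%" field followed by a /vmfs/volumes/ mount point.
check_vmfs_usage() {
    warn=$1   # warning threshold, in percent used
    crit=$2   # critical threshold, in percent used
    report=$(awk -v w="$warn" -v c="$crit" '
        BEGIN { w += 0; c += 0; worst = 0; n = 0; msg = "" }
        {
            # Scan all fields so that rows wrapped by long device names still parse.
            for (i = 1; i < NF; i++) {
                if ($i ~ /^[0-9]+%$/ && $(i + 1) ~ /^\/vmfs\/volumes\//) {
                    pct = $i
                    sub(/%$/, "", pct)
                    pct += 0
                    n++
                    state = (pct >= c) ? 2 : ((pct >= w) ? 1 : 0)
                    if (state > worst) worst = state
                    msg = msg sprintf("%s=%d%% ", $(i + 1), pct)
                }
            }
        }
        END {
            if (n == 0) { print "3 no VMFS volumes found"; exit }
            sub(/ $/, "", msg)
            printf "%d %s\n", worst, msg
        }')
    code=${report%% *}   # first word: worst state seen
    text=${report#* }    # remainder: per-volume usage summary
    case $code in
        0) echo "OK - $text" ;;
        1) echo "WARNING - $text" ;;
        2) echo "CRITICAL - $text" ;;
        *) echo "UNKNOWN - $text"; code=3 ;;
    esac
    return "$code"
}
```

On the ESX side, the SSH-invoked script could end with something like `sudo vdf | check_vmfs_usage 80 90; exit $?`. Scanning every field for an NN% token, rather than reading fixed columns, is deliberate: df wraps a row onto two lines when the device name is long, and vdf presumably does the same.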

Friday, July 25, 2008

Making (fax)modems work under VMware ESX

Yesterday I was asked if it was possible to make an external faxmodem work under a Windows Server 2003 VM. The customer has an application that is linked against the Microsoft Fax Service DLL, so they have to use an honest-to-goodness modem. The answer was: probably. I checked, and indeed, serial ports can be virtualized in a VM.

That's nice, but...

Since the modem is physically connected to one host, you can't put the VM in a cluster (unless maybe you have identical modems hooked up to each of your hosts, but I didn't try it).
We have blade servers, and to use their serial ports we need to hook up a three-way cable to the front of the blade, which has USB, serial and video connectors. This cable is meant for diagnostic purposes only. Hooking a modem to it will work, but it's not really pretty to look at.

The solution I found was this: we already have Digi PortServers in production, which we use to access consoles on Alpha systems and to serve a few modems as well. By installing RealPort inside a Windows VM, one can redirect a COM port, over the LAN, to a port on the PortServer. Windows thinks it's speaking to a physically attached serial device, while the device is in fact hooked up to the PortServer.

Tada! The modem now works inside a VM, and it will keep working if I VMotion the VM to another host. Case closed.

RealPort also exists for Linux and Solaris (there's no mention of whether that's x86 or SPARC, though...), so VMs running those OSes can use a similar solution.

Sidenote: you can even map a virtualized serial port to a named pipe on the ESX host. I initially thought of writing a Perl script that would open the pipe and use Net::Telnet to telnet to the PortServer. That was before I found out that RealPort existed.

Thursday, July 24, 2008

A short primer on vxfsd

I often get questions about vxfsd. The role of this daemon is poorly documented and it has no man page, so here's an explanation of what vxfsd does.

On Unix, an inode is a structure that holds a file's metadata (size, location on disk, owner, etc.). You'll find a PDF with an example of an inode here: http://www.tux4u.nl/freedocs/unix/draw/inode.pdf

On HP-UX, these structures are cached to speed up I/O. The cache is dynamic and grows as needed, consuming RAM when necessary.

vxfsd is a daemon whose job is to scan the inode cache, free the inodes that haven't been referenced for a certain time, and release the memory they used. The more inodes there are to free, the longer it works.

When a backup runs, it reads every file, so all of their inodes eventually end up in the cache. It's therefore normal to see vxfsd spinning shortly after a backup.

It's a trade-off: you can tune the kernel and reduce vxfsd usage by configuring a static cache. This can be done on a server with little processing power, when vxfsd's consumption needs to be kept to a minimum.

But for the sake of simplicity, I prefer to leave the default configuration everywhere else, even if it isn't the ideal setup.

References:
Common Misconfigured HP-UX Resources: http://docs.hp.com/en/5992-0732/5992-0732.pdf

First post & welcome

Welcome to my blog.

I'll mostly post work-related technical stuff here. I tend to cut my teeth on technical issues quite a lot, and I'll share my findings, in both English and French. Don't expect personal information, unless I'm really in the mood.