The Born-again Sysadmin: 2009

Tuesday, December 22, 2009

How to breed life in an old laptop

My older Acer TravelMate laptop was getting so slow that I decided a few weeks ago to replace it with a flashy HP running Windows 7.

I still use it from time to time. Like right now. But I was still faced with a clunker that took at least 7 minutes to boot up and be usable, with the hard disk spinning endlessly. 7 minutes! I tried all the Remove-Windows-Rot advice I found, and uninstalled every piece of software I didn't use, then ran all sorts of Crap-Cleaners and Page-File-Optimizers and Disk-Defragmenters. Nothing worked.

I finally decided to:
1. Upgrade my memory to the maximum I could
2. Reinstall Windows, since step 1 didn't prove to be satisfying.

1. Memory Upgrade
I wanted to upgrade from 512MB to 2GB, which is the maximum the laptop can hold. I looked on eBay but, what do you know, used 1GB SODIMMs are more expensive there than new ones (especially when you add the shipping costs). Newegg.ca had cheaper memory modules than eBay, which happened to be brand new, completely legit and under warranty, so I went with them. I ended up buying two modules from G.Skill. I didn't know this company, but apparently they're well known in the gamer and overclocker scene. They actually manufacture their own ICs. I installed them, ran Memtest86+ and everything went well.

2. Reinstall from Scratch
Even with 2GB of RAM, the rot was still there, with the hard disk being abused like what Jason Scott would call a "drunk cheerleader dropped in the exercise yard of a prison". I went down from 7 minutes to 4 minutes, and that was still unacceptable. I was fed up, so I simply booted off the Recovery CDs (yes, I DO have them!), and went through the pain of visiting Windows Update 8 times, rebooting each time. Total time: 4 hours, but it was worth it. I installed NOTHING else on the laptop except vanilla Windows XP. To avoid crapping up my registry, I decided that all add-ons would come from PortableApps.com. Bloatware like Adobe Reader and Quicktime, move away please - you're not welcome on my vanilla PC.

I now have a laptop which, while not good enough to handle Youtube HD channels, is workable enough to boot in less than two minutes and offer a completely sane Windows XPerience.

If anyone has suggestions, send them in. I would like to know what you do on your side to prevent Windows Rot.

O.

Understanding Insight Remote Support

Here is an update to my "Understanding Insight Remote Support" (formerly "Understanding RSP Components") flow diagram. This one removes OSEM from the picture, which means that there is one less tool to worry about, and adds generic SNMP devices to the mix:

Note that this diagram only shows devices I'm familiar with. If you have any comments, I'll be glad to modify it.

O.

WEBES 5.6 update 1 is out

WEBES 5.6 update 1 has been released recently. It is required to support new hardware, namely the behemoth AMD-based Proliant DL785. Users interested in learning more can take a look at this document which explains what 5.6 and 5.6U1 are all about -- from what I understand, users who have not voluntarily upgraded to 5.6 yet are still at 5.5 and will be updated once RSP 5.40 comes out in early 2010.

For current 5.6 users, the minor update was silently pushed through the RSSWM last weekend, and at my site the upgrade went flawlessly.

I noticed that a bug that had been plaguing me since 5.5 has finally disappeared: when you click on an Integrity/HP-UX server in the managed systems list, the interface now comes back with the info panel instead of timing out.

O.

Friday, December 18, 2009

The outcome doesn't look that good for both HP-UX and Integrity

First of all, we have Brian Cox's recent blog post comparing what he thinks of HP-UX and Linux. Read it here:
http://www.communities.hp.com/online/blogs/musings-on-mcc/archive/2009/12/16/linux-vs-unix-shouldn-t-they-be-equals.aspx

Here is a quote:
Similarly, if you asked me to choose between HP-UX and Linux for a customer’s most demanding workload, I would typically recommend HP-UX. However, if my customers’ time horizon is five years from now, then I would seriously consider Linux (by the way, you could replace OpenVMS for HP-UX and Windows for Linux in the above comparison and I would give you a similar answer).

I met Brian in person last year, and he's a level-headed guy. Preferences for one platform over another aside, what he says here makes sense from both a business and a technical perspective.

Then, around the same time, rumors pop up indicating that Red Hat will be canceling their Itanium port:
http://www.theregister.co.uk/2009/12/18/redhat_rhel6_itanium_dead/

What will the future be for HP-UX and Integrity? Red Hat apparently abandoning ia64, with Novell being unsure if they'll continue, is especially bad news for BCS. That leaves us with one less operating system for the Integrity line, and it turns out it's the one that Cox suggested potential mission-critical customers should investigate if planning for 5 years down the road. The outcome for the excellent Integrity line doesn't look that good.

As far as I'm concerned, as a current HP-UX / Integrity customer, it's business as usual for now and will be for a few years to come. We're starting to renew our systems next year and this won't change our plans. But I think it is time to seriously plan my long-term strategy for post 2015.

O.

Wednesday, December 9, 2009

Performing a chmod on a symbolic link

On HP-UX, symbolic links cannot have their permissions changed. When doing a chmod on a symbolic link, the chmod operation is performed on the file it references.

A little background information is in order. When a symbolic link is created, it sets its permissions depending on the current umask. So if you have a umask set to 027, it will create a link like this:
# umask 027
# ln -s /stand/vmunix /tmp/link1
# ls -al /tmp/link1
lrwxr-x--- 1 root sys 13 Dec 9 14:50 /tmp/link1 -> /stand/vmunix

While a very restrictive umask such as 777 will do this:
# umask 777
# ln -s /stand/vmunix /tmp/link1
# ls -la /tmp/link1
l--------- 1 root sys 13 Dec 9 14:50 /tmp/link1 -> /stand/vmunix
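
The permissions shown by ls in both cases are simply 777 with the umask bits cleared. Here is a small sketch of that arithmetic. The helper name symlink_mode is hypothetical, and it assumes a shell whose $(( )) accepts octal constants with a leading zero; bash and dash do, but whether the HP-UX POSIX shell does is an assumption to verify.

```shell
#!/bin/sh
# symlink_mode MASK (hypothetical helper)
# Print the octal permission bits a new symlink would get under the
# given umask on HP-UX, i.e. 777 with the umask bits cleared.
symlink_mode() {
    case $1 in
        ''|*[!0-7]*)
            echo "Usage: symlink_mode <octal umask>" >&2
            return 1
            ;;
    esac
    # The leading 0 makes the shell read both numbers as octal
    printf '%03o\n' $(( 0777 & ~0$1 ))
}

symlink_mode 027   # prints 750, i.e. rwxr-x---
symlink_mode 777   # prints 000, i.e. ---------
```

This matches the two listings above: umask 027 gives lrwxr-x--- and umask 777 gives l---------.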

So what do you do if someone created a bunch of symbolic links with a umask of 000, and you have scattered symlinks that look like they're world-writable files?

The technical answer would be to ignore them. As most file operations, except the link()-related ones, apply to the file referenced by the symbolic link rather than to the link itself, I do not think this is a security problem. But of course, when you're being scrutinized by a security auditor, explanations like this one often don't have any merit. It's less hassle to just satisfy whatever the auditor wants, and correct these symbolic links.

The problem arises when you notice that chmod doesn't work on a symbolic link. And this isn't specific to HP-UX; Linux doesn't allow this either, but I found that FreeBSD has a "-h" option to chmod that addresses the issue. How can you fix that?

The only solution I found by looking into the ITRC forums is to delete the symlink and re-create it with an appropriate umask. This can be done really quickly, but the process won't be atomic, so I can't guarantee it will go completely unnoticed by your applications.

Here is a short script I've written named lchmod which will ease the operation:


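#!/bin/sh
# lchmod: recreate a symbolic link under a given umask
if [ -z "${1}" ] || [ -z "${2}" ]
then
       echo "Usage: lchmod <umask> <symlink>" >&2
       exit 1
fi
umask=${1}
symlink=${2}

# -h is true only if the path itself is a symbolic link
if [ ! -h "${symlink}" ]
then
       echo "Symlink '${symlink}' does not exist" >&2
       exit 1
fi
# Read the link target from the ls output (everything after "-> ")
destination=$(/bin/ls -l "${symlink}" | sed 's/.*-> //')
umask "${umask}"
# Not atomic: the link is briefly missing between rm and ln
rm "${symlink}" && ln -s "${destination}" "${symlink}"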

Say you've got this link:
# ls -la /tmp/link3
lrwxrwxrwx 1 root sys 13 Dec 9 14:59 /tmp/link3 -> /stand/vmunix

Simply run lchmod like this and the link will be recreated with a umask set to 027:
# lchmod 027 /tmp/link3
# ls -al /tmp/link3
lrwxr-x--- 1 root sys 13 Dec 9 15:00 /tmp/link3 -> /stand/vmunix*
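
Since rm followed by ln leaves a short window where the link does not exist, one possible variant is to create the new link under a temporary name and rename it over the old one with mv. This is only a sketch, not something tested on HP-UX: lchmod_swap and the temporary file name are hypothetical, and it relies on mv using rename(), which replaces the old link in one step when the link does not point to a directory. Links to directories are refused, because mv would move the new link inside the directory instead of replacing the old one.

```shell
#!/bin/sh
# lchmod_swap MASK SYMLINK (hypothetical variant of lchmod)
# Recreate SYMLINK under umask MASK without removing it first.
lchmod_swap() {
    mask=$1
    link=$2
    if [ -z "$mask" ] || [ -z "$link" ]; then
        echo "Usage: lchmod_swap <umask> <symlink>" >&2
        return 1
    fi
    if [ ! -h "$link" ]; then
        echo "Symlink '$link' does not exist" >&2
        return 1
    fi
    if [ -d "$link" ]; then
        # mv would move the new link into the directory, so give up
        echo "'$link' points to a directory; not handled by this sketch" >&2
        return 1
    fi
    # Same target extraction as in lchmod
    target=$(ls -l "$link" | sed 's/.*-> //')
    tmp="$link.lchmod.$$"
    (
        # Subshell, so the caller's umask is left untouched
        umask "$mask" &&
        ln -s "$target" "$tmp" &&
        mv -f "$tmp" "$link"
    )
}
```

It would be called the same way as lchmod, for example "lchmod_swap 027 /tmp/link3" after sourcing the function.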

O.

Using USB dongles with ESX-based virtual machines

The post where I described how I made a pool of external fax modems work with ESX guests using a Digi PortServer has proven to be one of the most popular on my blog.

Recently, I've been faced with a similar challenge: Is it possible to virtualize Windows servers which host software that requires a copy protection USB dongle? The answer is yes!

Since I was a happy camper with the PortServers, I once again checked what Digi had to offer and found their AnywhereUSB line of network-enabled USB hubs. Simply put, these devices work like this: The hub has a LAN port, and you can use it to access USB devices through your LAN. You simply need to add a special driver to your Windows server that will "fake" a local USB port, while in fact it redirects the traffic to the remote hub. This works flawlessly with physical servers and most importantly VMs, and you can VMotion them around at will.

Digi has written a concise whitepaper that describes how to connect the Anywhere USB to VMware ESX guests here: http://www.digi.com/pdf/wp_ESXServer_AnywhereUSB.pdf. The setup is done within a matter of minutes.

The 2-port version, according to Digi's online store, has a list price of $287 USD, while the 5-port version is $349 USD.

But whatever you do, don't make the same mistake I did and buy a 5-port hub, thinking that each independent port can be shared among multiple servers the way Digi's serial servers can - I found out that the AnywhereUSB can be connected to only one server at a time. The whitepaper above claims that you can have "multiple USB hubs per virtual machine", but don't confuse this with "multiple virtual machines per USB hub". I don't think the 5-port version is very useful in many cases unless you need to plug a lot of devices into the same VM.

Also, remember that the hub only has a 100Mb/s connection and will downgrade USB 2.0 devices to work at USB 1.1 speed. This is fine for many cases such as with a dongle, but any use requiring a high-performance data rate will be better served by using a physical server.

The fact that you can't share the hub with multiple VMs is a serious design limitation that will require you to deploy a lot of these devices if you ever need to virtualize dozens of servers that use dongles. The $287 cost for each VM has to be considered in this context, but compared to having to install and manage a physical server, this is as cheap as it can get.

O.

Update Dec 10th 2009: I found another product that is a lot cheaper than Digi's. While it would do the job in a SOHO environment, it's built by a vendor that I wouldn't trust for enterprise systems. At $287, you're better off buying yourself peace of mind, and especially long-term support.

Sunday, December 6, 2009

When something doesn't work...

Here is my definition of something that "doesn't work": It is a product you try before you buy (or, if you're unlucky, you buy outright) and you're not able to use it.

When I buy computer peripherals for my personal use at home, usually the cheaper, the better. And even though they're engineered overseas and poorly translated, they almost always work.

Then why in hell are there many instances where I've seen enterprise software and hardware that does not work when I try it? This is plain nonsense.

Recently I've been trying an OTP solution and, geez, it looks like the manufacturer did *everything* to discourage me from buying their product. Yet I feel compelled to do it as a favour to my team, because it looks like Corporate IT chose this for the VPN access and I don't want my personnel to end up with two different tokens from two different vendors.

I won't go as far as saying the product itself sucks. But its marketing sure as hell does. I won't give out details or name that vendor right now. It's better to give myself some time to vent. But I'm slowly getting pissed and, guess what, I don't like wasting my time.

O.

Tuesday, December 1, 2009

Moving physical extents within a PV

A new pvmove feature appeared in 11.31 which lets you move physical extents within a PV. This can be very useful to move PEs to make space for a LV which has a contiguous allocation policy such as the swap LV, /stand or /.

To use this, simply specify a start range and end range, and tell pvmove to move a range of extents within the same PV.

Example:

root@bonyeune[~]# pvmove /dev/disk/disk21_p2:00736-01248 /dev/disk/disk21_p2
Transferring logical extents of logical volume "/dev/vg00/lvol3"...
Transferring logical extents of logical volume "/dev/vg00/lvol4"...
Transferring logical extents of logical volume "/dev/vg00/lvol5"...
Transferring logical extents of logical volume "/dev/vg00/lvol6"...
Physical volume "/dev/disk/disk21_p2" has been successfully moved.
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf

In the previous example, I moved a range of roughly 512 PEs, from #736 to #1248, further inside the same PV. This freed up the PEs between 736 and 1248.
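
Building the source argument by hand is error-prone, so here is a minimal sketch of two helper functions. The names pe_range and pe_count are hypothetical and not part of LVM. pe_range only formats the zero-padded pv:first-last argument used above, and pe_count assumes the range is inclusive at both ends, which should be checked against the pvmove(1M) manpage.

```shell
#!/bin/sh
# Hypothetical helpers, not part of HP-UX LVM.

# pe_range PV FIRST LAST
# Print the "pv:first-last" source argument with 5-digit zero padding,
# as in the example above. Pass plain decimal numbers (736, not 00736),
# because printf may read a leading zero as octal.
pe_range() {
    printf '%s:%05d-%05d\n' "$1" "$2" "$3"
}

# pe_count FIRST LAST
# Number of extents in the range, ASSUMING both bounds are inclusive.
pe_count() {
    echo $(( $2 - $1 + 1 ))
}

pe_range /dev/disk/disk21_p2 736 1248
pe_count 736 1248
```

The output of pe_range can be pasted as the first argument of pvmove, and pe_count helps to check that the freed range is large enough for the contiguous LV you want to grow.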

Is it possible to move around a LV with a contiguous policy using this technique? To my surprise, I tried it, and the answer seems to be yes. But there are some limitations. I'll need to set up a VM to be able to experiment further. There are other interesting commands such as "pvmove -n" which lets you move a whole LV without needing to specify PEs like above. I'll make a better post once I've had the time to try it out.

O.

Friday, November 20, 2009

Virtual Connect for Dummies

Grab it while you can

I just went over it and although it is a true honest-to-goodness "For Dummies" book, this is by no means a complete book but rather a 75-page marketing stunt by HP.

There isn't much deep technical info in the book but it can be very handy if you need to introduce Virtual Connect to someone who is new to the technology. The informal and clear writing style of the For Dummies books will be familiar to many, and this will encourage people to read it. However, there are few figures and logical diagrams, which means that for serious training you'll be better off reading the official documentation. For someone already familiar with networking and blades, the product briefs for Virtual Connect are very clear in describing what it does.

By the way, after almost 2 years of running Virtual Connect modules, I'm still very excited by them. They saved me a bunch of work and hassles dealing with our networking team.

O.

The hidden advantage of using Remote Support on HP-UX

Readers of my blog will know that I spent quite some time integrating HP-UX 11.23/11.31 with SIM and Remote Support on each and every one of my servers, including even the older, neglected test/QA servers no one usually cares about.

I'm sure some must have thought I was crazy investing so much time on a feature that doesn't bring back much, because, they'll say, hardware doesn't break. It is partly true. Hardware doesn't break a lot with the exception of disks, fans and power supplies which can experience a higher failure rate than, say, anything else that's based on transistors. So, most efforts should be prioritized towards monitoring devices which have a lot of these, and this mostly applies to disk arrays.

There is, however, a hidden gem in using Remote Support pack with HP-UX, and it's the monitoring of system panics.

That's right, panics! I don't hear about the term as much as I used to in the old days, but the fact remains that they still happen, and can either be the result of a software bug or even an untrapped hardware problem. With HPVM guests, I've had my share of panics, too.

Remote Support is of great help with panics. When a panic occurs, once the system has rebooted the monitoring agents will notice it, WEBES will gladly flag it as important, and an event will be logged at the response center. If the panic happens overnight or when the sysadmin is not there (and it WILL happen - most of us are in the office only a small amount of time), hours will be saved in the process as someone will probably have already contacted the system contact about the issue.

There is not yet a feature to send HP the details of the crash dump when the event is opened, so it must be done manually. Wouldn't it be great, for example, if upon rebooting the server crashinfo could be run automatically and send details to the engineer? One can only hope this will come in the future, to reduce the response time even further.

O.

Tuesday, November 17, 2009

HP-UX Community Links

Okay... over time I've accumulated a few HP-UX resources, here is what I've stumbled upon up until now. I'll keep this post updated, so send me your comments.

Of course, I couldn't start the list without mentioning the ITRC forums which replaced the trusted HP-UX sysadmin list in functionality. Unfortunately, the interface and features have not evolved much in, what, 10 years, which limits its outreach to a database of questions and answers. Some regulars over there do put a community spirit in the HP-UX forum, but over the years I've seen a decrease in the quality of many of the questions and I don't participate as much as I used to. Furthermore, any post that is too critical of HP or its products can get deleted by a moderator, which undermines an important aspect of having a community-driven independent site.

The real "community" effort should logically come from Connect, and they have an Enterprise Unix SIG (special interest group) and an HP-UX group. However, even though they are linked from the ITRC forums, these groups are not very active. Their web interface has a lot of usability problems, which might discourage anyone wanting to start a discussion there.

I've been surprised to find an HP-UX group on LinkedIn, though. It is named "HP-UX users", counts 600 people, and seems to be quite active. Facebook has an "HP-UX" group too but almost nobody posts anything there. I think LinkedIn is much better suited to this usage.

Now to blogs. I've found a few people who run HP-UX-related blogs like mine. Some have RSS feeds you can subscribe to.
Victor Balogh
Steven Protter (a top ITRC member)
Daniel Parkes

I also subscribe to the blog "Musings on Mission Critical Computing", which isn't very technical but gives an insight into where top people at HP could be steering HP-UX.

On Twitter, I found a lot of HP feeds available. One of them is named HP_UX_Docs, which posts links to recent documentation and community resources. Well worth subscribing to.

O.

Friday, November 13, 2009

Planning racks and cabling using Visio

For years, HP has been offering Visio stencils on visiocafe.com that help plan not only the racking and layout of your servers, but their cabling too. Everything, from blade servers, to SANs, to power distribution units is available. The drawings can be made in a matter of minutes and every component uses the same scale. I can compare the stencils to a bunch of Lego blocks.

By preparing myself in advance using these stencils, I've been able to greatly reduce the time it takes me to perform physical installations and to evaluate precisely how many cables are needed before going on site. This is especially useful when dealing with unmanned data centers where things need to be done right in a single visit. If you're not familiar with Visio, it is easy to learn, and I can assure you that you'll save time in the long run.

I also use the stencils when drawing maps of my environment, especially network-centric ones. By using the "real" device images instead of white boxes, I end up with maps that are readily understandable by IT staff and they look great!

O.

Friday, November 6, 2009

Quick start with the HP-UX CIFS Client

I've been slowly migrating web servers from IIS over the years to HP-UX Web Server Suite to benefit from increased PHP performance, and a need has come up to access data stored on a Windows share.

The HP-UX CIFS Client is an OEMed version of Objective Development GmbH's Sharity, a piece of software that lets you mount Windows shares on a variety of platforms. I don't know the specifics but it used to be a userland tool, and HP has extended it to make it a kernel module. The engineer in charge of the CIFS client actually posts once in a while in the ITRC forum.

The CIFS client is already built in the default 11.31 installation, so there is no need to install it. It is very easy to use. It uses a clever hack to manage who owns files in the CIFS mount: your users log on independently to the CIFS server, and will be able to see the whole filesystem as their own.

In a nutshell, here is how to mount a CIFS share. Your HP-UX box doesn't have to be in a domain.

1. Obtain from your Windows admin a login/password to an account authorized to use the share

2. Activate the CIFS client
# vi /etc/rc.config.d/cifsclient
RUN_CIFSCLIENT=1
# /sbin/init.d/cifsclient start

3. Create a mountpoint and mount it
# mkdir -p /cifs/myshare
# mount -F cifs myserver:/myshare /cifs/myshare

At this step, the filesystem is mounted but it is not possible to access it. You first need to have a UNIX user log in to the CIFS Server using cifslogin, then access to the share will be possible. You can do this as root but it is better to use an unprivileged standard user.

4. Log in to the CIFS server as an unprivileged user
# su - user
user$ cifslogin -U winuser -P winpassword myserver

5. Save the user credentials in the CIFS database so they can be reused automatically next time:
user$ cifsdb myserver

6. Return to root and save the share in the CIFS database so it can be mounted automatically next time:
# cifsdb /cifs/myshare

This should enable the share to remain mounted across reboots. It is also supposed to work if the Windows server reboots but I have not tested it yet. Those of you who would prefer to use the automounter to mount shares dynamically can also do so, which can be useful if you have a bunch of home directories to take care of.
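
The root-side steps can be gathered in a small helper. This is only a sketch: cifs_mount_plan is a hypothetical name, and it just prints the commands shown above for a given server, share and mountpoint so they can be reviewed before being run by hand. It does not run anything itself, and the per-user cifslogin and cifsdb steps still have to be done by the user between the mount and the final cifsdb.

```shell
#!/bin/sh
# cifs_mount_plan SERVER SHARE MOUNTPOINT (hypothetical helper)
# Print, in order, the commands from the steps above. Nothing is executed.
cifs_mount_plan() {
    server=$1
    share=$2
    mountpoint=$3
    if [ -z "$server" ] || [ -z "$share" ] || [ -z "$mountpoint" ]; then
        echo "Usage: cifs_mount_plan <server> <share> <mountpoint>" >&2
        return 1
    fi
    # Steps 2 and 3, as root (RUN_CIFSCLIENT=1 must already be set)
    echo "/sbin/init.d/cifsclient start"
    echo "mkdir -p $mountpoint"
    echo "mount -F cifs $server:/$share $mountpoint"
    # Steps 4 and 5 are run by the unprivileged user, not by root
    echo "# as the user: cifslogin, then: cifsdb $server"
    # Step 6, back as root
    echo "cifsdb $mountpoint"
}

cifs_mount_plan myserver myshare /cifs/myshare
```

Printing instead of executing is a deliberate choice here, since a wrong mountpoint saved with cifsdb would come back at every reboot.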

O.

Friday, October 30, 2009

Using Gartner's Magic Quadrants

When evaluating security products, one way to go is to check out what Gartner has to say. I found out last week that they produce yearly what they call Magic Quadrants on a variety of products, and this can help you choose which ones you're going to look into.

Here is an example of such a Quadrant, using made-up products:

Of course, being inside, or outside, of the Quadrant doesn't mean ANYTHING to me. I'm not evaluating here the veracity of the Magic Quadrant, just its purpose. When choosing software, I prefer "going with my heart" when I can.

But there are cases where using the Quadrant can be of help. For example, if I happen to like Gorbatcheck and it turns out that it's favorably placed in the Quadrant, that's another card up my sleeve that I can pitch to management. It's also a good ticket to my own peace of mind, as being backed by Gartner gives a sense of immunity if the product turns out to be below expectations.

There is also the case where Gorbatcheck might be based on an open-source product, let's say Gorbafree. Of course, Gorbafree won't be in the Quadrant, but the Quadrant can be used to give more credibility to Gorbafree over something less stellar such as GZK. If you're on a tight budget, that's a way to introduce Gorbafree until you're ready to move on to Gorbatcheck.

How do you obtain these quadrants? Try Wikipedia!

Thursday, October 29, 2009

HP-UX 11i: Mission-critical UNIX

HP had been announcing that webcast for weeks, and I decided to check it out. I had seen the one from last year and I didn't find this one much different in terms of presentation or content. The detailed results of the report are here.

First, if I recall correctly the study is done on a sample of around 250 people, give or take, I don't remember the exact number. That's not, in my opinion, a high number. But they insist on the fact that the answers come from data center IT tech staff, not CIOs, which at least comforts me as a systems administrator.

According to the study, UNIX(r) is still alive and well and still a strategic OS, but mostly in the enterprise. I insist on UNIX with capitals and the registered trademark, as they didn't include Linux in that category. They only evaluated AIX, Solaris, and HP-UX which are the "big three" Unixes left. I can sum up the presentation like this: HP is better than the rest, sprinkled with numbers from Gabriel Consulting Group and a few slides I've seen countless times previously such as the OE "Christmas present" diagram and the HP-UX roadmap. Cox also spent almost 7 minutes on Green IT, speaking about how HP is "green" and reduces cardboard boxes. Interesting, but maybe a bit off topic. The presentation is high-level, and targeted mostly at people evaluating a migration to HP-UX more than existing customers. No wonder it's on cio.com!

Dan made a small mistake during the presentation. He mentioned that virtualization was a strong point for the HP platform, and he is right. But he then went on about how the trend would be to virtualize different operating systems on the Integrity platform, all the way from Linux, to Windows, to OpenVMS, HP-UX, and then he hesitated and brought up "Tandem", which Brian corrected to "NonStop OS". Well guys, I don't think there are plans to run NonStop on HPVM. If there are, I stand corrected!

I couldn't comment on whether HP is the "best". I haven't worked on multiple platforms in 6 years. But when I did work with HP-UX, Solaris and AIX, I preferred HP-UX due to its better management features at the time. I can only expect it is even better now.

O.

Tuesday, October 27, 2009

To me, it is now true. The web is (almost) the (home) platform

I had been holding off on purchasing a new laptop for over 6 months because everyone told me that Vista sucked. I finally did it last Friday with a brand new one running Windows 7. An HP, of course. All that for half the price of an honest espresso machine! And a quarter of the price of an equivalent Macbook (which is why I'm not a Mac user, they're just too expensive).

So what do I think of Windows 7?

Not much, actually. It doesn't seem to be on a suck-o-meter at first glance, which is a good thing. But I simply don't care. I'm not excited. I'm completely indifferent.

Why? Because I've realized that while my previous XP installation had accumulated some useful software over time, I'm not sure this will happen with my new laptop.

That's why, last weekend, while staring at my shiny new laptop, I found myself thinking:

>> Where do I want to go today?

That's a very interesting question.

Windows 7 is sure slicker than Windows XP, but the latter had raised the bar already in 2001 by adding a well-deserved feature that had been missing from Microsoft's consumer Windows line since its inception: it didn't crash.

While I used Outlook Express before, I didn't bother installing Windows Live Mail this time and decided to just use my webmail. I didn't bother shelling out money for Office, as I don't use it much and Google apps works just fine.

My brand new laptop is therefore just an appliance to run a browser. Besides maybe retouching my digital photographs, I simply have not much use for all the processing power, and slick Windows 7 features, of that laptop.

Is the web soon to become the platform for my home usage? You bet.

O.

Friday, October 23, 2009

Old motif habits die hard

Just saw an e-mail today from our development group. They're planning on making a graphical dashboard for our proprietary application which runs on HP-UX. Everything on the backend has been CLI or text-based up until now, with the client running on Windows with MFC. But for that dashboard, reserved for application administrators, the idea of developing a Motif application came up.

Yes, you read me right, Motif. Whew! Haven't seen development with Motif in a while.

Of course they didn't mean by this that they would start developing in Motif right this morning (at least, I hope so). Maybe GTK+ would be a better fit, and a good browser-based app would be even better.

My point is that Motif was associated with the commercial UNIX flavors for so long that saying "I'll make a Motif app on UNIX" is still a catch-all phrase like saying "I'll transfer files with FTP". Yes, these technologies have been there for a while and still work, but are clunky and outdated... and don't get me wrong, I used to root for Motif! That was in the mid 1990s. The Motif toolkit was, for its time, quite customizable using X resources. The no-frills window manager mwm offered a refreshing, KISS interface that actually worked on a workstation without crashing like Windows 95 used to do twice a day.

All that was 15 years ago. Today, everything runs in a browser.

Thursday, October 22, 2009

Migrating from OSEM to WEBES 5.6

Here is a post on my experience migrating from OSEM to WEBES 5.6. It went well except for one minor problem with the MSA 2000.

First of all, you should know there are a few outstanding problems with WEBES 5.6, one of them being a security issue. I'm not sure I can disclose what they are as I didn't get them from official channels but I will say that a patch is expected sometime in December. If you don't need to run 5.6 right now in order to support specific hardware, you should stay with WEBES 5.5 and OSEM 1.4.8a in the meantime.

That being said, for those who wish to run WEBES 5.6, you can update it manually from Remote Support Software Manager. Also update to the latest version of the Remote Support Eligible Systems List at the same time. The procedure is documented in the guide WEBES 5.6 and product coverage.

There is also some (redundant) information in the guides A.05.30 HP Insight Remote Support Advanced with WEBES 5.6 and OSEM to WEBES Migration Guide.

To migrate away from OSEM, if you have a standard installation with OSEM populated by info from SIM, it is as easy as uninstalling OSEM, but you should read the above documents just to be sure. Once OSEM is uninstalled, restart WEBES (not documented, but I did it anyway) using "net stop desta_service" followed by "net start desta_service".

If you have hundreds of managed systems, it is better to wait at least an hour before testing if everything works well, as it takes a while for WEBES to stabilize and trap events once it is restarted. I also always confirm that e-mail notifications are enabled in order to have an alternate way of receiving notifications in case there is a problem up the food chain in SIM or ISEE.

Now you should test equipment that used to notify OSEM with SNMP traps to be sure they are being caught by WEBES and service events are opened at HP.

Here is what I tested successfully:

Proliant running Windows ... OK
Proliant running ESX ... OK
C3000 blade chassis ... OK
C7000 blade chassis ... OK
MSA2012i G1 disk array ... NO

The MSA 2000 G1 used to work with OSEM but no longer with WEBES. I've opened a ticket at the ITRC to have an official support statement. This is exacerbated by the fact that its events in SIM are reportedly sent as informational so those of you who "follow the red" could miss critical events.

O.

Sunday, October 18, 2009

The grep of all games

I normally don't talk about games in this blog, but I think a special mention should be made to World of Goo, a game I bought on Wiiware a few months ago and recently finished. It's actually the first game I've played seriously in years. Why? Because like a tool as ubiquitous as grep, it has a simple concept and few rules. It is easy to get the hang of it quickly. It is launched in a matter of seconds. And you can stop where you are, and come back later. Simply put, it's a masterpiece in terms of design, that even my two young boys mastered in a matter of minutes.

Give the game a shot. You will not be disappointed.

[Video: World of Goo Trailer 3, by 2dboy]

Thursday, October 15, 2009

Log management is done. Now, on to change control. And more pie charts!

First, let me introduce you to the interesting subject of pie charts in the land of security compliance software:

I've had enough of all these software products whose selling point is that they're able to make pie charts. As if, in these economic times, someone's job could be to sit in front of a screen all day staring at Pie Charts, making Pie Charts, and reading Reports With Pie Charts (if you're one of these people then sorry - I just can't understand how you can cope with this job). As a systems administrator, I want something that is, in order: 1. easy to use and deploy 2. responsive and 3. a fit for compliance requirements. If these include pie charts, then so be it, but that shouldn't be the only feature to look for. It seems that when your business consists of charging big bucks for software, looking for a bigger piece of the... er, pie, being no-frills is not a good sales argument.

Now I feel better. Let's move on.

I've posted extensively on various log management solutions I've been looking into in the last few weeks. Oh yes, and let's not forget my rant on the lack of info available on the website of some vendors that could have helped me get an idea of what their product does without needing to bug their sales team. It turns out there has been a total of 5 contenders (and not all of them had great websites, by the way). My business case is almost done, and while I won't disclose what I intend to recommend between ArcSight, Q1labs, a Balabit/Splunk hybrid solution and RSA, let's just say attending demos and speaking to a fair number of sales reps left me exhausted.

I could almost joke that had I decided upfront to use rsyslog and program a few perl scripts to extract the required compliance-related data manually, I might have been able to pull it off quicker, and for free. And if the auditor came in wanting pie charts, I could have plotted them in Lotus 1-2-3 like I used to do in high school and printed them on a sheet of sprocket-fed paper.
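To give an idea of how small such a script can be, here is a minimal sketch, written in Python rather than perl. It assumes the central log contains OpenSSH "Failed password" messages in their usual syslog form; the function name and the choice of counting failed logins per user are illustrative assumptions, not a statement of what any auditor actually asks for.

```python
import re
from collections import Counter

# Matches the common OpenSSH message, with or without "invalid user".
FAILED = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")


def failed_logins_by_user(lines):
    """Count failed SSH password attempts per target account."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the lines of a centralized log file gives a per-account tally, which is exactly the kind of table a spreadsheet can turn into a pie chart.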

Now it is time to turn that wheel again.

Yes, that's right, I have to do the process all over since I'm now looking for a Change Control solution that supports most of my devices. Change Control = knowing what, and possibly how, predefined critical files have changed on my servers from a certain reference point. None of the vendors above have one available I could piggyback on except Splunk's fschange which is not end-to-end enough and doesn't support HP-UX anyway.
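Stripped of the central server, the reporting and the multi-platform agents, the core of that definition fits in a few lines. The sketch below, in Python, is only an illustration of the "reference point" idea: it uses SHA-256 digests, and the function names are hypothetical. It is not how Tripwire, Solidcore or fschange work internally.

```python
import hashlib
from pathlib import Path


def digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(paths):
    """Map each path to its digest; a missing file maps to None."""
    return {str(p): digest(p) if Path(p).is_file() else None for p in paths}


def compare(baseline, current):
    """Return the sorted list of paths whose state differs from the baseline."""
    changed = []
    for path, reference in baseline.items():
        if current.get(path) != reference:
            changed.append(path)
    return sorted(changed)
```

Take a `snapshot()` of the critical files once, store it somewhere the server cannot modify, and `compare()` later snapshots against it. This tells you what changed but not how; keeping copies of the files themselves would be needed for that.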

I've looked into what's available and the names Tripwire and Solidcore pop up. I've used the academic Tripwire in the past, it did the job, but I need something that is based on a central server and supports multiple platforms, Windows being one of them. Maybe OSSEC? Perhaps, it is already running here successfully under the radar... but in my enterprise world, FOSS, especially when its intention is to reach compliance, is a hard sell even if it costs close to nothing.

Any suggestions on what I should look into?

O.

Tuesday, October 13, 2009

HPTF 2010: Yes. In Sin City. Again.

The HPTF website has not been updated yet, but the HPTF Facebook group posted an official announcement that it will happen again, still at Mandalay Bay. I was hoping for another venue, somewhere in the Bay Area would have been nice, but ah, well. I'm not the one who decides.

Monday, October 12, 2009

Thoughts on the Sidekick fiasco

... What a fiasco. I can only feel sorry for the sysadmin team in charge of the data at Danger. Is this due to their incompetence or simply pressure to deliver? We'll probably never know. But some heads will be rolling for sure, Microsoft will be associated with this mess for years to come, and this might be the end of their foray into smartphones. This is another one of these data loss events that will go down in history. It's also a strong point against the Cloud, for anyone thinking about outsourcing their data.

It seems to have all happened during a "SAN upgrade". When you update anything on your SAN, you had better have a DR site ready, and stop your replication before doing the upgrade. And that doesn't give you the luxury of not backing up your data correctly, which these guys at Danger didn't seem to be doing.

I won't start bashing any particular vendor in this blog; if you're interested in finding out who the rumors point to, I'll let you do your own search. No, it is not HP, but it's not far either. I can't give details, but it's not the first time I hear about a "routine" firmware update on a storage array that goes south. Yes, it is true that SANs are supposed to be upgradable online. But the more I think about it, the more I compare the firmware update of a disk array to upgrading the engines of a jet while it's in the air. Yes, a jumbo jet can fly while one of its engines is stopped, but would you put your life at stake flying during an upgrade unless you really needed to? I wouldn't.

O.

Thursday, October 8, 2009

Monitoring an MSA 2000 G1 with SIM and Remote Support

I tried it, and it works. Here is a quick checklist:

On the MSA:

Go into Manage -> Event Notification -> SNMP Configuration
Configure your read/write community and the IP address of the CMS.

On the CMS:

Discover or identify the MSA if you have not done so already. If you have two controllers, you only need to discover one controller management IP address, as SIM does not correlate the two controllers.
In the system properties, the product number will not be identified correctly. The product number burned in the MSA seems to be a valid HP part number, but the product number I had under contract differed and was a 6-character number, so I copied it from my contract directly. Add the correct product number in the customer field under Contract and Warranty Information, as well as in the Product Number field at the top, just in case. Check the two Prevent the discovery... checkboxes to prevent your mocked product number from being overwritten in the future.
Just to be sure, I also added the Care Pack directly in the system properties instead of relying on it being detected from the HP back-end, due to the problems I've had with the product number.
Re-check the entitlement in Remote Support; it should be green.

Go back on the MSA, and send an SNMP test trap. The event should be logged in SIM and a service event will be opened. N.B. I only tested this with OSEM as of now, as I have not yet had the time to migrate the SNMP monitoring to WEBES.

O.

Wednesday, October 7, 2009

Log management for the system administrator

I've had an increased number of readers who have been following this blog since my first posts detailing my log management hurdles, so here is an update on what's been going on.

I've limited myself to talking to a small number of vendors, for various reasons I won't explain here. But I'll tell you what I think you should ask yourself when considering purchasing a log management solution:

Do you want an appliance, or software that runs on your own infrastructure?
Do you want your log data to be translated to a high-level format, keep your raw logs, or do both?
Do you plan on deploying this yourself or do you need an onsite consultant?
Do you favor a solution that is easy to use or one that is feature rich?
Do you have the human resources to maintain the solution once it's installed?
And, of course, what is your budget?

Getting answers to these questions is, well, complicated. Buying software is like purchasing a suit: you have the choice of doing it online, at a rock bottom price, with no help whatsoever and without trying it on. You can also go downtown to stroll through a few department stores, where you can get a feel for what's available, look at the price tags freely, and possibly get some minor adjustments done. Or you can go to a full-service luxury store, where someone will help you pick the perfect suit. Whatever you do is up to you, but I think you get my point.

If you're the department-store type of person, you can assemble some of the components by yourself. While getting your hands dirty will give you more control on the solution and possibly save some money, you need to be sure you'll be compliant with your auditor's requirements once you're done.

Instead of an appliance, getting the specs and a quote for an enterprise-grade x86 server running Linux or Windows isn't rocket science. Enough said.

To centralize your logging, if you're already familiar with syslog-ng, Balabit's Premium Edition of syslog-ng has few secrets: they have a well-written whitepaper on the subject, and you can even get an instant quote online. If you're on a zero budget, rsyslogd is a free alternative, but I think syslog-ng might sound better to possible auditors, as they've been hearing about it for years.

As for the log drilling itself, which I decided in my documents to call deferred log analysis, I still don't know what can do the job as I have not finished that part of my architecture yet. I've seen both free and commercial solutions, and up until now Splunk seems to be a strong contender in this area. But I still need to figure out exactly what our tech people will be drilling for, and what the auditors will be looking for in terms of high-level, bells-and-whistles reports, before making my own decision.

The last part is the real-time log analysis, which some IT security people tell me is "not automatable". I have doubts about this statement. While enterprise-wide solutions require dedicated staff, our needs are at a departmental level; I therefore think it is possible to pull it off with limited human resources. We'll see.

Sunday, October 4, 2009

Connecting a MSA2012i through a Virtual Connect with ESX

A year ago, I ordered the required building blocks to install a small ESX cluster in a remote office: a C3000, a few blades, and a MSA2012i. It was my first iSCSI implementation. It took a while to get it racked because my team was busy elsewhere, but now that it's done, I had to experiment a bit to make it work correctly.

The MSA is not an HP design. It's made by a Carlsbad, CA company named Dot Hill. The documentation and web interface are not up to HP's usual standards (the interface has been upgraded with the MSA 2000 G2, but I have a G1). Furthermore, there is not much information explaining how the controller failover works, and this is important to set it up correctly. There is a very good document here in the ITRC KB that you must read before deploying these devices.

Go read it, right now, and come back to this post when you're done.

Here's how I integrated this through a Virtual Connect. That's not how you should do it, that's how I did it; if there are better solutions, please drop me a comment as I would be glad to hear about what alternatives are possible. If you google around, you'll see that some people have made similar setups to this one.

Single Controller Setup

Above is how you should hook everything up with a single controller MSA. The reason for using two different subnets is to isolate them as if you were on two SAN fabrics. If you use the same subnet, ESX will gladly team both pNICs under the same vswitch, and since only one pNIC is active at a time, you won't be able to see both paths at the same time. There might be workarounds but I suggest you save yourself some trouble and use two separate subnets. Be sure to create a vmkernel interface on each one of these subnets, as well as two service consoles.
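A quick way to sanity-check an addressing plan before cabling anything is to confirm that the two iSCSI target addresses really land on different subnets. The Python sketch below uses the two controller addresses that appear later in this post; the /24 prefix length is an assumption, since the post never states the netmask, and the function name is hypothetical.

```python
import ipaddress


def same_subnet(addr_a, addr_b, prefix=24):
    """True if both addresses fall in the same network for the given prefix.

    The default prefix of 24 is an assumption; pass your real prefix length.
    """
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix}").network
    return net_a == net_b
```

With a /24 mask, `same_subnet("192.168.10.10", "192.168.11.10")` is false, which is what you want here. With a /16 mask the same pair would share a subnet and ESX would be free to team the pNICs.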

How does failover work? Well, esxcfg-mpath will report two paths for each iSCSI device. So you are free to shut down or update the firmware of one of your Virtual Connect modules with no downtime. I tried it, and it works as if you were on a fibre channel SAN.

Dual Controller Setup

With two controllers, you are required to add two switches because of the way the controller failover is designed. I did not find a proper way to hook up both controllers to the Virtual Connect - it insists on teaming the two controllers, and shutting down Controller A doesn't turn off its link so the VC doesn't fail over to Controller B.

In the iSCSI initiator, configure only 192.168.10.10 and 192.168.11.10 - don't bother with the IP addresses of controller B. Although the MSA2012i is not supposed to be active-passive, I've had trouble configuring paths on both controllers at the same time. If you're experiencing long delays booting ESX or scanning your iSCSI HBAs, be sure to refer only to the IPs of the master controller in your iSCSI initiator setup.

If controller A fails or is shut down, controller B will take over the IP addresses of A automatically and you'll be able to resume I/O. ESX will not even switch from one path to another, as the path is bound to the IP address -- 192.168.10.10 should be out of reach for 30 seconds and come back magically.

As I said, this might not be the best solution, but it worked for me. If I ever revise mine, I'll update this post. Good luck.

O.

Wednesday, September 30, 2009

Integrity fibre channel card firmware quick dive

I've had a fixation on the firmware of these cards for a while. Why? Because while I never update firmware on LAN cards, I still have many 2Gb fibre cards in my environment and they are based on designs probably made sometime around 2005-2006. Their firmware took a while to become mature and support specifics on the Integrity platform such as vPars, so there have been a few firmware releases in their lifetime.

There are two official and documented ways to update the firmware on QLogic-based fibre cards shipped with Integrity servers running HP-UX. The first one, which is also the easiest, consists of loading a recent IA Offline Diagnostics CD (which comes with up to date drivers for many cards), either in the server or through Virtual Media, and running fcd_update at the EFI Shell. The other one, which I personally prefer, requires you to copy the firmware files to the EFI partition, reboot to the EFI shell, and run fcd_update from there.

However, few administrators actually need to update this firmware as each release of the HP-UX fcd driver comes with the required RISC firmware and it updates the card automatically if required. The only situation where one might need to update the firmware manually is when booting on SAN as it might require updating the EFI driver. More on this below.

That's where being able to do it online can come in handy if you want to save time. Why? Because flashing the firmware offline requires, from my experience, around 10 minutes per port and that can become very cumbersome if you're dealing with an rx8640 with multiple vPars or a Superdome. That online process is not well documented but I found this document here which explains how to do it using fcmsutil. It is actually easy: simply run fcmsutil once to update the RISC firmware, and run it again for the EFI firmware.

So what is the difference between the RISC and EFI firmware? The same document linked above, although not well written, provides some definite answers. The RISC firmware runs on the storage processor on the card, which, not surprisingly, is based on a RISC chip; it is used to implement the fibre channel protocol. On the other hand, the EFI "firmware" is in fact an EFI driver embedded in a 2nd flash ROM on the card which is loaded by the EFI when booting the server. vPars go a bit deeper, and I won't go into too much detail here, but they require an additional layer named fPars, or firmware partitions (Alan Hymes from HP has good slides on this), and that EFI firmware must support them if you're running vPars.

I hope this clears things up for you. Good luck with your firmware update endeavours!

O.

Monday, September 28, 2009

Moving a C3000 and C7000

Today, I had to move two blade chassis, a C7000 and C3000, to two different locations downtown. We no longer had the original packaging and this being sensitive equipment, my fellow sysadmins and I didn't want to hire movers and risk having some parts broken. HP does offer an official moving service, and they will cover anything that breaks once at your destination, but it can be costly. As two blade chassis can fit quite well in a minivan, they can be moved around easily as long as you're cautious.

You know you're dealing with true geeks when you see a bunch of guys shoving a naked C7000 onto a hand truck, tied to it with old orange fiber optic cable because they didn't find anything else. That image was so cool, I should have taken a picture. But man, these suckers are heavy. Even with all the blades, power supplies, fans and interconnects removed, you'll still need two people to hold them up. And whatever you do, don't drop'em, especially if you have off-the-shelf Hush Puppies right underneath.

Saturday, September 26, 2009

Looks like things are still like what they used to be!

Last Wednesday, I tried to renew my subscription to a consumer protection magazine I've been reading for the last 11 years. I don't know how many subscribers it has, but it must not be beyond 100 or 200K so it's fair to expect their web services to be limited. Yet, they offered the possibility of renewing over the internet, so sure, I decided to save some carbon dioxide and use their web interface instead of snail mail.

Wrong idea. By following their subscription process, I ended up in the profile of another customer and saw his personal info. I didn't make any effort to get there, and by that I really mean NONE. It just popped up in my browser. Looks like our sessions got mixed up. Man, even something using the infamous formmail would have given me a better sense of security! Looks like things are still like what they used to be.

There was no credit card info, but enough data to try making a fraudulent phone call since I not only knew the guy's birth year, but also his address, phone number, and the pinnacle of it all: that he was subscribed to a highly respected magazine, along with the expiration date of his subscription... Social engineering anybody? Sure, many people put all this on display on Facebook, but I'm not sure that customer would have liked me calling him up.

While a mom-and-pop operation could be a little more excusable, I'm surprised considering the nature of that publication that such a thing could happen. I left them an e-mail with a screenshot and sure hope they'll fix this soon. We're not in 1995; we're in 2009, and a bug like this shouldn't have gone unnoticed. And no, two business days later, I didn't get any reply to my mail whatsoever.

Needless to say I decided to delete all information in my profile... and send everything through the mail.

O.

Friday, September 25, 2009

Comparing log management products

In the last few weeks, I've been looking into SIEMs and log management products. Yes, you know it already, I've blogged extensively on how I was upset that I had to go through a sales channel to get a bit of info, but promised I would give out details on what I preferred between ArcSight and Splunk.

It turns out that doing a public comparison of these products won't be easy as ArcSight gives out technical info only under NDA. While I can probably announce loudly that "their appliances log stuff", I can probably say no more. So technical details will remain sealed in my business documents. Sorry. One thing I can say, however, is that their range of products seems to be the Cadillac of log management, and everything I could possibly think of needing to better score at our next audit will be in it.

Concerning Splunk, I inquired about ESS using the "contact sales" button as I didn't find many details on that application. They left me a voicemail some 4 business days after my initial request for info although I said in it I preferred e-mail, and that rubbed me the wrong way (I hate voicemail but that subject is more fitting for a future blog post). No follow-up e-mail. I'll try to call them back when I'm near a phone during California business hours, and with all these governance'n'compliance-related meetings I'm attending these days, it might turn out to be never.

Q1labs read my blog, knew I was looking for log management products, and took the trouble to track me down and find me at my workplace. I normally would have turned them away, but they showed some good will by having someone call me up in French, and their products being designed in Fredericton N.B., I just had to give them a chance. I saw what they make and it's similar in spirit to what ArcSight does, and their selling point is that their technology is simpler and quicker to deploy than ArcSight's. It sure looks interesting.

I'll see what political pressures I'll face internally but compared to some other cost centers in our company, for us IT is an expense, not a revenue. What will determine whoever wins might come down to be strictly business... as long as the tool does the job and has the feature set we're looking for, the financial aspect might end up having the most weight.

I'm all new to pleasing this IT Governance gestapo that came out of nowhere to bully our small, under-the-radar-IT dream team. But from what I understand until now, I first need to submit a "business opportunity" document to them to justify my funding, giving ball park figures and a few vendors, THEN I can make another "business case" document to explain which one I've chosen. Such a process takes time, and when I cannot give any clear timeframe, it's no wonder that these sales people get their hopes down.

Want to know why I prefer Open Source software? Because since it costs nothing, I've been able to pull it off for years without having to go through this shit. Now I'm knee-deep in it.

O.

Wednesday, September 23, 2009

HPTF 2010: if it happens, what would you like to see?

HPTF 2010 has not been confirmed yet. But should it happen for a fifth year, I sure hope to be able to make it again as I enjoy presenting to my peers very much. As abstracts must usually be submitted in January, I started thinking about what I would like to talk about in 2010.

I'll keep it to a technical presentation on what I know most and like the most, and that is - what a surprise - HP-UX.

The year 2008 was spent on increasing the availability and resilience of all mission critical systems under my responsibility. In 2009, my research and efforts have been increasingly directed towards manageability and security. The security aspect is totally not under my control, and I should talk about compliance rather than security. The two might be complementary but they're totally different. And I don't find that subject interesting.

So I think my 2010 paper will be in the manageability area. This being said, my current ideas for subjects are:

Integrating HP-UX systems in a Nagios Core monitoring environment
Easy and secure monitoring of HP-UX servers with SIM and Remote Support
How I manage my HP-UX environment without getting paged

You're welcome to cast your vote on what you would like the most.

Tuesday, September 15, 2009

Are enterprise software details accessible to the average joe?

The post where I bashed an enterprise security software vendor because it wasn't possible to obtain technical information on their products without leaving personal information, and going through the sales channel, got me a lot of e-mails. Well, I wasn't exactly right. I discovered that other vendors in the SIEM industry follow similar standards and don't provide much information, except a feature list, without requiring visitors to register first. Even one product which is spun off from an open source project seems to do the same! And no, I won't name them explicitly this time as I don't want this post to end up on Twitter and get blown out of proportion again. This blog is named Technocrat-UX, not Cranky-UX.

Having used lots of infrastructure security software over the years, where I never had any trouble getting an idea of what these products did exactly, all in a discipline where disclosure is paramount, I was surprised by the way SIEM products are presented. They're within their rights to do it that way, but to me, a website is like a store, and if it makes me feel like I've just crossed the door of a very special car dealer instead of my corner Toyota dealership, my interest wanes quickly. Maybe it's just me. After all, I'm a Unix guy.

Perhaps companies that sell products based on business requirements, rather than technical requirements, have a modus operandi I'm not familiar with? Maybe they're, justifiably so, only targeting people with a business education instead of a scientific one? This is possible. So let's check. I've assembled a list of six "Enterprise" software products, and spent 15 minutes checking their websites to see if they have information relevant for a systems administrator. I've deliberately excluded Open Source software since that wouldn't have been very fair. I also excluded HP software, as I've accumulated 10 years of experience searching through their web maze.

This is way, way, far from thorough. But here are my quick results.

Databases:

Oracle 11g: Has lots of information freely available, and documentation is free to access.
IBM DB2: Same as Oracle. Even better arranged than Oracle, with technical documentation easy to access.

Enterprise Content Management:

Opentext Document Management: I need to register just to see a spec sheet. Yuck.
EMC Documentum: I was curious about EMC, since they also make kickass hardware and own VMware, but for Documentum I also need to register to see info. DoubleYuck.

ITIL-related service request systems:

CA Service Desk: I wasn't expecting much from CA but I was pleasantly surprised. They have lots of info, and access to manuals is free. I'll see CA differently from now on.
BMC Remedy Service Desk: Information is passable, and manuals are not available.

Monday, September 14, 2009

WEBES 5.6 just got released

WEBES 5.6 has just been released. There are no release notes on HP's web site but from what I've been able to gather, the two major changes are that it now uses PostgreSQL as its embedded database instead of relying on SQL Server, and it seems to replace OSEM outright.

I don't know if I will have time to try it out this week but I'll follow up as soon as I can. For instance, I'll test if the old OSEM monitored devices I have will work out of the box (namely, Proliants and B-Series fibre switches). I'm also curious to see if the old SQL Server database will be deleted during the upgrade. I had some interface timeouts when checking HP-UX managed systems with 5.5 and I can't wait to see if they are resolved.

O.

Tuesday, September 1, 2009

Updating a server to 11iv3 while keeping it in SIM

Here is how to update a server to 11iv3 and keep everything working in SIM and RemoteSupport.

Update or re-install the server following your own procedure.
If necessary, configure all the prerequisites on the updated server so that it can be integrated correctly with SIM (there are too many to detail here, but this post can help)
Once the update is done, log into SIM and show the System Properties page of your server. Confirm that the two checkboxes "Prevent the discovery from changing these system properties" are unchecked.
Launch a discovery on your system. In 5.3, the process has changed: you need to create a discovery job and specify directly the target server in it.
Subscribe to WBEM events from your server from the Options->Events menu

If using RemoteSupport:

Redo an entitlement check to be sure that your server is still entitled correctly.
This part is important: you need to restart WEBES (stop director, start director), or else I don't know if and when it will resubscribe to events. I waited 24 hours and it didn't subscribe, so screw it, I restarted it (I know that sucks, but I didn't find out how to force a resubscription besides restarting WEBES). Restarting the director results in WEBES subscribing to your server eventually; this might take a while depending on how many managed nodes you have.
Confirm with "evweb subscribe -b external -L" that there are SIM and WEBES subscriptions and run "sfmconfig -t -a" to test the delivery of events to SIM and the RemoteSupport back-end.

Good luck

Monday, August 31, 2009

Using OFM to update firmware on rx7640/rx8640 series

In the past, updating firmware on cell-based servers was a daunting task, requiring an FTP server usually piggybacked directly on the MP, and lots of manual commands to flash each part independently. Not anymore. HP wanted to charge me to come in and flash a bunch of servers so this gave me the opportunity to do it myself. I flashed among these a two-cell rx7640 using OFM, and it's now dead easy: simply download a .iso file, burn it on a CD, and boot on it. It uses OFM which has been available for low-end Integrity servers for a while. We're still far from the Proliant Firmware Maintenance CD, but nevertheless it's still much better than nothing! One detail: the MP has to be configured to "allow upload of firmware updates from the OS", which is enabled in CM>SO. You still need to cut off AC power though at some point, so an onsite update is still mandatory.

And for those who still have rx7620s, there's no OFM version of the latest firmware and you still have to do it the long way. While it's not as trivial, it is at least well documented.

Friday, August 28, 2009

Apache.org hacked. What the hell were they thinking?

As many will know already, apache.org was hacked yesterday. While events like these are rare, and sometimes look like science fiction, the path taken to exploit their servers was a relatively easy one that, if I understand it correctly, shows gross negligence on their part.

Here is my analysis of what the apache team posted today:

On August 27th, starting at about 18:00 UTC an account used for automated backups for the ApacheCon website hosted on a 3rd party hosting provider was used to upload files to minotaur.apache.org. The account was accessed using SSH key authentication from this host.

Having your SSH keys stolen is a possibility. With automated tasks, keys are not protected by a passphrase, so anyone who gains access to them can easily use them for their own purpose. The first line of defense is to protect file access to your private keys as much as possible, and use a dedicated user to own them. How were these keys stolen? That could possibly be an inside job, and you can probably bet that it didn't require root privileges to grab them. If it did require root, then that should narrow down the culprit unless the provider got hijacked, too.

Or perhaps that "3rd party hosting provider" didn't bother protecting the key at all, leaving it world-readable, and didn't chroot its inbound data transfer accounts, so anyone who has FTP access to the server to upload his own stuff could have stumbled upon it by snooping on the server. For that part, I don't know.
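Checking for the "world-readable key" scenario is cheap. Here is a minimal Python sketch that flags a private key file if group or others have any permission bits on it. The function name is hypothetical, and this says nothing about how the key in this incident was actually stored.

```python
import os
import stat


def key_is_private(path):
    """True if neither group nor others have any permission bits on the file."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A key with mode 0600 passes; one with mode 0644 does not. Running a check like this from cron against every passphrase-less key used by automated jobs would at least catch the sloppiest case.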

But the next part is particularly interesting:

The attackers created several files in the directory containing files for www.apache.org, including several CGI scripts. These files were then rsynced to our production webservers by automated processes. At about 07:00 on August 28 2009 the attackers accessed these CGI scripts over HTTP, which spawned processes on our production web services.

Now get this. From what I can see, there are a few problems here:

Whoever owns the SSH key can upload stuff on minotaur.apache.org. That's not really a problem per se. But there is probably no filtering done on the IP address to limit inbound connections to the provider's netblock, so he can come from possibly anywhere. That might be a usability requirement; in that case, you can bet they'll probably think about requiring port knocking from now on to at least mitigate the possibility of this happening again (and I insist on the verb mitigating, as knocking is more obscurity than real security). And for this to be "safe", the hosting provider will have to keep the knocking sequence as safe as the key.

Whether the account is kept on a tight leash on minotaur, such as using a chroot jail or whatever else, doesn't make a difference! Why? Because data uploaded to this account is rsynced automatically, unverified, to the servers running www.apache.org. So you can possibly upload any nasty code you would like to compromise anyone reading a page on www.apache.org using an exploitable browser.

Hell, who knows, maybe these hackers had been injecting compromised pages for a few days, since what seems to have tipped off the Apache admins was rogue processes on their servers. They were launched remotely quite easily, as data can be rsync'ed straight into cgi-bin/ ! Now how good is that?

While not thinking about details like that inside a corporation is standard practice, and tolerable in many cases as there is an implicit trust within the organization, as soon as you have a server with a gateway exposed publicly on the net you need to take precautions to isolate it from your production. In this case, it's clear to me that the Apache group didn't think this completely through. For the authors of a secure and great web server, being hacked like this will probably go down as one of the shameful events in the Apache group's history.

Tuesday, August 11, 2009

August update

Please note that I'm on vacation thus there will be no updates for a while.

But what's coming next fall?

Well dear readers, we've had a security audit recently. While I've invested lots of time into hardening the server perimeter with IP Filter over the years, some adjustments will be needed to enhance security and compliance inside the OS itself.

There are especially discoveries and experiments to be made with the new 11iv3 auditing subsystem, which is not well documented and for which there is currently no whitepaper available at HP Docs. Auditing is now way better than what we had before with Trusted Mode, and you can bet I'll use it. I just hope HP did their homework so I won't need to write a hack like audenable to make it work correctly this time. Having to rely on audenable in the 11.11 days sucked.

Furthermore, I don't forward all my logs to an external, secure server, except everything related to AUTH_LOG. More needs to be done to be compliant. An intern has worked hard to make this work under many scenarios a few months ago and this will be implemented soon. I'm just waiting for the official mandate. I'll keep you posted on what we'll be doing.

Thursday, July 23, 2009

A tale of trying to run the HP BladeSystem Power Sizer

Normally when I want to evaluate the power that a server will pull from the grid, I just check its specifications and I'm done with it. Yesterday I needed to measure the power requirements of a C3000 chassis we're going to install in a data center downtown. It hasn't been that trivial.

Here is the scoop: I'll tell you right away how much power a C3000 consumes.

It has a maximum of 6 power supplies, each rated at 1200W.
Since this is a 3+3 configuration of redundant power, this means that the total it can consume is 1200W * 3, which makes 3600W.
At 240V, 3600W gives out 15 amperes.
Done.

While this is possibly not the best way to calculate how much power it requires, doing otherwise is not very easy. The quickspecs detail every possible configuration but are shy on the exact consumption. 6.2A per power supply is mentioned at the end. So okay, we're at 18.6 amperes then.
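For what it's worth, the back-of-the-envelope arithmetic above can be redone in a few lines of shell. This is only a sketch: the amps function is a name I made up, and the figures are the ones quoted above (3 active supplies at 1200W each on a 240V feed, and 6.2A per supply from the quickspecs).

```shell
# amps WATTS VOLTS: current drawn, to one decimal place
amps() {
    awk -v w="$1" -v v="$2" 'BEGIN { printf "%.1f\n", w / v }'
}

# 3 active supplies out of 6 (3+3 redundancy), 1200 W each, 240 V feed
supplies=3
watts_each=1200
volts=240
echo "Nameplate:  $(amps $((supplies * watts_each)) "$volts") A"

# QuickSpecs figure: 6.2 A per power supply
awk -v n="$supplies" -v a=6.2 'BEGIN { printf "QuickSpecs: %.1f A\n", n * a }'
```

The two results differ because the first is derived from the rated wattage and the second from the per-supply current given in the quickspecs; for sizing a circuit, the larger figure is the safer one to plan with.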

Then there is the tool named "HP BladeSystem Power Sizer" that looked promising. I could not have been more wrong.

I expected a nice web-based application or, at worst, an Excel spreadsheet (Google docs would have been way cool). But of course, no, no, no, no, no, no! HP made a stand-alone program to do this. I don't see that much anymore, especially for such a purpose. Maybe it has nice bells and whistles that the programmers didn't know how to develop using AJAX, because they're stuck in limbo in a brave .NET world? Perhaps. How could I know? I haven't even been able to make it run!

For a vendor to release stand-alone software like this in 2009 is an excellent way to lose sales. Writing short-sighted Javascript too, by the way. Here's why:

1. Many corporate PCs are locked and users do not have administrator privileges, which is my case here. The Power Sizer is a full-blown InstallShield-packaged setup program, and it fails with an error if you're not an admin. Okay, so I ask an admin to install it.

2. Then I'm welcomed with a dialog that asks me to input my name, company, and e-mail address. It's definitely going to send this back to HP. Why not make it a web-based interface then? No privacy policy info can be seen anywhere (looks like it was at the end of the long EULA I blood-signed when I installed earlier).

3. And, at the end of it all, I get greeted by this:

As you see, this is a white window, with useless menus at the top which don't do anything except offer me the choice of leaving or changing my profile information and an about box.

Boy, does a tool like this suck. Not only does it run only on Windows (GNU/Linux users, scram!), but it requires administrator privileges to install (users with locked PCs, scram!), and according to the EULA, it sends back to HP what you're doing with it (privacy advocates, scram!). And finally, it doesn't even work (customers who made it here, scram!).

Back to the calculator to chew up the power requirements, I guess.

Tuesday, July 21, 2009

vgimport with persistent DSFs

When upgrading a server from 11.23 to 11.31 using a cold-install, the procedure to migrate your VGs and filesystems boils down to this:

vgexport -s -p -m vg_name.map vg_name
Keep a backup copy of the fstab
Reinstall
vgimport -N -s -m vg_name.map vg_name
Put back the filesystems in fstab, create the directories and mount

Pretty straightforward, huh?

System administrators familiar with vgexport/vgimport will notice the new -N option which, according to the man page does this:


 -N             Configure the volume group by populating
                persistent device special files in the /etc/lvmtab
                or /etc/lvmtab_p file, corresponding to the volume
                group, vg_name.  (See intro(7) for information
                about device special files.) This option can only
                be used with -s option.  If vgimport is invoked
                without a -N option, legacy device special files
                will be used to populate the /etc/lvmtab or
                /etc/lvmtab_p file.

                This option may become obsolete in future
                releases.

What this option does is import your exported VGs with persistent DSFs. If you don't use it, they will be imported using the legacy DSFs, and if you're on a SAN this means that PVLinks will still be used. I think that the storage stack on 11.31 is smart enough to apply multipathing in the background to legacy DSFs, but it's much cleaner to use the persistent DSFs right away.
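If you have several VGs to migrate, the two commands from the procedure above can be generated rather than typed. The following is a dry-run sketch only: vg_migration_plan is a hypothetical helper, vg01 and vg02 are placeholder names, and it just prints the commands so they can be reviewed before being run by hand. Remember that the map files have to survive the reinstall, so copy them off the server along with the fstab.

```shell
# Hypothetical dry-run helper: prints the commands, runs nothing.
# Usage: vg_migration_plan export|import vg_name...
vg_migration_plan() {
    phase=$1
    shift
    for vg in "$@"
    do
        case $phase in
            export) echo "vgexport -s -p -m ${vg}.map ${vg}" ;;
            import) echo "vgimport -N -s -m ${vg}.map ${vg}" ;;
            *)      echo "unknown phase: ${phase}" >&2; return 1 ;;
        esac
    done
}

# Before the cold install (on 11.23)
vg_migration_plan export vg01 vg02
# After the cold install (on 11.31)
vg_migration_plan import vg01 vg02
```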
O.

Friday, July 17, 2009

Even though I'm not fond of RSP, it does work

This morning, a cache battery in one of our EVAs failed at 10h58 AM. At 11h20, the response center called me to confirm the event (I hadn't read all my mail yet and was not even aware of it by then). I didn't even have to mention the part number or the site address. We got in touch with a field engineer by noon.

That's the kind of event where having invested so much time, and so much energy, in making RSP work pays off.

Wednesday, July 15, 2009

The ISEE EOL e-mail. Or is it AOL? Or LOL?

I got this e-mail today from the Insight Remote Support Team announcing the demise of ISEE:

Did you notice the image has been scaled down? And that the text in it is more readable than this blog post? This is because, my friends, the desk jockey who typed up that e-mail thought that using a 36 point font would gather my attention. How pathetic. It looks like a junk mail sent from @aol.com in the early 2000s. It's so unreadable that I threw it away, laughing. As my blog readers know already, I'm fully aware that ISEE is going away.

To help me appreciate HP further, did you know that I've been trying for 6 weeks now to get Licenses in Command View 7 format? This is becoming an interesting developing story. I'll soon write about this, and will give out how many people up the HP ladder had to be bothered so that I could get these fucking licenses. Help rebuild the U.S. economy: Ask for CV 7 licenses!

Thursday, July 9, 2009

HP-UX IP Filters for SIM and RSP/IRS

Here is what I think are the best filters you can configure in IP Filter when you want an HP-UX server to be monitored by SIM and RSP/IRS:


# Ports required for System Insight Manager / IRS
block return-rst in log quick proto tcp from 1.2.3.4/32 to any head 10
  pass in quick proto tcp from 1.2.3.4/32 to any port = wbem-http  flags S keep state keep frags group 10
  pass in quick proto tcp from 1.2.3.4/32 to any port = wbem-https flags S keep state keep frags group 10
  pass in quick proto tcp from 1.2.3.4/32 to any port = 2381       flags S keep state keep frags group 10
  pass in quick proto tcp from 1.2.3.4/32 to any port = 2301       flags S keep state keep frags group 10
  pass in quick proto tcp from 1.2.3.4/32 to any port = 22         flags S keep state keep frags group 10
block return-icmp(port-unr) in log quick proto udp from 1.2.3.4/32 to any

Replace 1.2.3.4 with the IP address of your CMS.

The rules are set up as a group, to optimize filter processing: any TCP packet that comes in from the CMS goes in group 10, where the filter tries to match it with group 10's rules.

If the TCP packet originating from the CMS is trying to reach the WBEM Services, the System Management Homepage or the SSH port, it goes through. In all other cases, we're a good IP citizen here, as anything that does not match these rules will be sent back a TCP reset (return-rst) instead of seeing its packet dropped. This accelerates the scanning from SIM, and also fixes a problem with WEBES, which can hang for a while when it has to deal with dropped packets. We also return an ICMP port unreachable for each UDP packet, since no service at all listens on UDP (not even SNMP).
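If you keep the rules above in a template, swapping in the real CMS address is a one-line sed. This is just a sketch: set_cms_ip is a made-up helper name, 192.0.2.10 is a documentation address standing in for your CMS, and only two of the rules are repeated here to keep the example short.

```shell
# Hypothetical helper: reads a rule template on stdin and replaces the
# 1.2.3.4 placeholder with the CMS address given as $1.
set_cms_ip() {
    sed "s/1\.2\.3\.4/$1/g"
}

# Two of the rules above, used as a template
set_cms_ip 192.0.2.10 <<'EOF'
block return-rst in log quick proto tcp from 1.2.3.4/32 to any head 10
  pass in quick proto tcp from 1.2.3.4/32 to any port = 22         flags S keep state keep frags group 10
EOF
```

Review the generated rules before loading them; a typo in the address would lock the CMS out rather than let anyone else in, which is at least the safe way to fail.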

Wednesday, July 8, 2009

The 2GB limit of lp

I just noticed here a thread in the ITRC forums about someone who's trying to lp a file bigger than 2GBytes:
http://forums13.itrc.hp.com/service/forums/questionanswer.do?threadId=1354026

I'm not surprised it doesn't work. Who could possibly have imagined at Berkeley in, say, 1985, that someone would actually want to print 2 gigs of data?

Although I'm sure there is a valid need for this, man, this makes me realize that times have changed. When I started high school, complete PC games used to fit on a 360k floppy! And they were fun!

Time to convert all these legacy tools to 64 bits. And yes, that includes biff, roff and ex.

Thursday, July 2, 2009

One liner: Compare kernel parameters between 11iv2 and 11iv3

When upgrading from 11iv2 to 11iv3, you might be required to check your kernel settings to be sure they are still appropriate. I don't know if update-ux retains kernel settings since I prefer cold installs, but this one liner should help you. Simply redirect the output of kctune to kctune.11iv2 before the upgrade, keep the file, then do the same with kctune.11iv3 once you've upgraded and run this to compare the two files:


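while read param value crap
do
  # Look up the same tunable in the 11iv3 output. The trailing whitespace
  # match keeps "maxdsiz" from also matching "maxdsiz_64bit".
  grep -E "^${param}[[:space:]]" kctune.11iv3 | {
    read param2 value2 crap2
    # Report only tunables that exist in both files with different values
    if [ -n "${param2}" ] && [ "${value}" != "${value2}" ]
    then
       echo "11iv2: ${param} ${value}"
       echo "11iv3: ${param2} ${value2}"
       echo
    fi
  }
done < kctune.11iv2

Only parameters that are present in both files with different values are printed.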

Tuesday, June 23, 2009

Quick review of Openview Performance Manager 8.0

Version 8.0 of Openview Performance Manager has been released, what, two years ago now, and I never thought of upgrading it since versions 5, 6 and 7 were basically identical from my perspective and I expected the same with 8.0. So I didn't bother upgrading until I saw screenshots from 8.0 in slides that were released at the HPTF.

I missed a lot of things. Version 8.0 is better integrated, actually easier to use for newcomers, and the graphs you can make are much more readable. Gone is the old "web forms" interface; everything is Java-based, and surprisingly, it works! While previous versions didn't allow you to easily print out graphs from the Java interface, this one has a handy Print function that will pop up a standard browser with a nice looking graph you can print, such as this one:

And take a look at the CPU gauge which has changed for the better:

There are some things that are less fun, though. For instance, while it used to be very easy in the previous version to unselect metrics displayed in the graph by simply clicking on them in the legend, this no longer works. You have to edit the graph and remove them manually, and it takes more effort. The older systemcoda.txt no longer works either to quickly add managed nodes - you need to import them using a tedious process, or edit an XML file directly, and restart OVPM each time you change it.

The graphic designer's interface has also changed a lot, so users who tediously migrated from Perfview to OVPM <=7.0 will again have to change the way they work. Considering it's a complex process, that's too bad.

Overall 8.0 is interesting if you're looking into making sexier graphs, but functionality might be a problem if you're used to the older versions. In my case, since I'm a casual user, I'm glad to have migrated to 8.0.

O.

Monday, June 22, 2009

Virtual Connect performance monitoring and profiling with HP-UX and Windows

I presented on this subject at HPTF2009, and the slides are out on the session scheduler. For those who did not attend the event, I have a copy of my presentation here: http://www.mayoxide.com/presentations/

If I had to do this all over again, you can be sure I would have presented on RSP this year. But abstracts are due in January, and it was too early back then to know if I would have enough dough to make a presentation on RSP for the HPTF. Perhaps for 2010? Maybe, if I can go once more. I've attended the event since 2006 and I'm not sure if my management will let me go again.

Saturday, June 20, 2009

Back from HPTF 2009

Once again, I've had the privilege of attending the HPTF and, once again, I was not disappointed.

Although I didn't see the special events, I attended all the technical sessions I could find and the content was similar to previous years. But it shows that travel expenses at HP have been cut; I've had at least one session where the speaker was covering for some of his colleagues, and one particular session where the speaker, an HP employee, wasn't as up to the "I like HP" thing as he could have been.

Here are the highlights:

1. At the HP-UX panel, where customers have a chance of speaking to a panel of HP-UX experts, there was a guy responsible for Systems Insight Manager. A lady went ballistic against RSP and the System Fault Management agents. The SIM guy replied that "the team developing the agents have come under lots of scrutiny recently" but he wasn't aware of all the problems (which I've had too, by the way). I felt bad for him, as RSP and the agents were not his products. As a matter of fact, I think the agents, which are very bad, aren't anyone's product at HP. Then two other attendees, and me, chipped in to vent on the ISEE-to-RSP migration process. The lady was very upset all that time and the moderator eventually put the discussion to a halt.

2. Speaking about RSP, I've been able to meet its Product Manager face-to-face, an all-around nice guy who was glad to hear from a customer. I didn't spend too much time on my past issues, as they have been resolved, and instead gave some suggestions on improving the product. He should follow up on this. I was very aware that some of the worst products, the SysFaultMgmt agents the lady I mentioned earlier was pissed about, aren't his products, so I didn't spend too much time on them.

3. Connect organized activities named "Meet the experts", which were round-table discussions with select people within HP who cater to a particular product. I've been able to meet Bdale, who's in charge of Open Source at HP. I was alone, and very surprised that no one else showed up. I didn't have much to say actually, but asked him if HP intended to eventually open up HP-UX. The answer was no, as there are too many proprietary technologies in it. Another "meet the experts" session I attended was with Bruce and Bob, who are based in Fort Collins, and at this one we were three users. We gave them some input on what we would like to see in the OS in the future. Hats off to Bruce and Bob, by the way. I've seen them showing up at many conferences for the whole three days, and they're very involved with the community of users.

4. The closing keynote with Dr. Michio Kaku was very interesting. Great choice. In my opinion, for a tech conference it's much better to have someone like him rather than a comedian. Before him we had to endure the usual corporate mumbo-jumbo from Intel, a Microsoft researcher and Brocade, and as usual it wasn't very good. They at least gave me time to follow up on my e-mails. That triple-dipping didn't leave much time for Dr. Kaku; he got impolitely shoved off the stage once his time was up, and that sucked.

5. Like last year, the CDA area wasn't very good: very few prototypes and little actual hardware. It would be better off not doing it at all. But there were some CDA sessions and although they didn't deep-dive very far, it was better than nothing.

6. Oh yes, and Brian Cox handed out HP-UX 25th anniversary champagne glasses at the HP-UX kickoff. For a geek like I am, that was a great "show-up" present.

That's it for now. More to come.