Monday, October 27, 2008

Understanding all the RSP components

N.B. My updated diagram from December 2009 is here

This blog entry is updated regularly. Latest updates:

  • November 4th 2008
  • November 19th 2008
  • December 10th 2008
  • December 16th 2008
  • February 20th 2008

Having read (diagonally) over 1000 pages of documentation related to every component that RSP includes, here are my notes that might be of help. This is definitely not all accurate. When I find inconsistencies, I'll update this blog post.

The bottom line is that you no longer have a simple ISEE client running on your HP-UX host. It's now much more complex than that.

There's a bunch of "new" tools that will become part of your life. In fact these are "old" tools that have been available for years. They're now tightly welded together, run on a central server (CMS) instead of locally on each monitored host, and for the most part do not need to be configured independently, but it's important to understand what each one does.

SysFaultMgmt (System Fault Management) - runs on the HP-UX server
It's the "new generation" of EMS, and it speaks WBEM. Using WBEM, it can be integrated easily into SMH (System Management Homepage) and SIM (Systems Insight Manager). SysFaultMgmt used to work in parallel with traditional EMS monitors, but since HP-UX 11iv3 March 2008, it seems to switch off EMS and replace it completely. EMS will eventually be EOL'd.

EVWeb - runs on the HP-UX server
A companion to SysFaultMgmt: a GUI that lets you query and manage WBEM subscriptions. There's also an evweb CLI, which lets you extract events and see their contents (they look similar to EMS's event.log file). The CLI has a man page and is not hard to use. Be careful: I've played with evweb from SMH, and it sometimes crashed, leaving some evweb CGIs spinning endlessly at 100% CPU. The CLI is probably more robust.

System Insight Manager agent - runs on Proliants running VMware ESX and probably Windows as well

This agent includes a good old System Management Homepage, along with hardware diagnostics agents. If the agents detect that something has gone wrong, they are configured to send an SNMP trap to the CMS.

OSEM - runs on the CMS
OSEM is an agent that analyzes the SNMP events that are sent to it. It filters them and translates them to a human-readable form, which can be sent by e-mail and/or to ISEE. By filtering, I mean that it works out whether an SNMP trap sent by a device is actually an important one, and decides if it's necessary to generate a service event for it.

OSEM mostly supports systems that report their events using SNMP:

  • Proliant servers running Linux, Windows or VMware ESX.
  • Integrity Servers running Linux
  • SAN switches
  • MSA enclosures
  • Bladesystem chassis (simply configure the OA to send SNMP traps to the CMS)
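
To make the "filter, then translate" idea concrete, here is a minimal Python sketch of it. This is only my illustration, not how OSEM is actually implemented: the `Trap` fields, the three-level severity scale and the function names are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical severity scale; real SNMP traps and OSEM's rules are richer.
SEVERITY_ORDER = {"informational": 0, "warning": 1, "critical": 2}


@dataclass
class Trap:
    source: str    # host or device that sent the trap
    severity: str  # one of the SEVERITY_ORDER keys (assumed field)
    message: str   # raw trap text


def needs_service_event(trap: Trap, threshold: str = "critical") -> bool:
    """Filtering step: is this trap important enough to act on?"""
    return SEVERITY_ORDER[trap.severity] >= SEVERITY_ORDER[threshold]


def translate(trap: Trap) -> str:
    """Translation step: turn the trap into a human-readable summary."""
    return f"[{trap.severity.upper()}] {trap.source}: {trap.message}"


def process(traps: list[Trap]) -> list[str]:
    """Return readable notifications for the traps that pass the filter."""
    return [translate(t) for t in traps if needs_service_event(t)]


traps = [
    Trap("esx01", "informational", "fan speed changed"),
    Trap("esx01", "critical", "physical drive failed"),
]
notifications = process(traps)
print(notifications)
```

Only the second trap survives the filter, so only one notification would go out by e-mail and/or to ISEE.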

WEBES - runs on the CMS
WEBES is an analyzer that, in a similar fashion to OSEM, processes the events that are sent to it. They come from these primary sources:

  • Event log on a Windows Server
  • WBEM subscriptions
  • Interactions with Command View to gather data for EVAs

From my understanding, it does not "translate" the WBEM events to a readable form as OSEM does, since the WBEM events already contain the information.

WEBES supports mostly:

  • Integrity servers running HP-UX, through WBEM subscriptions
  • EVAs, by reading the event log on the Storage Management Server through ELMC, and by logging directly into Command View
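
To keep the OSEM and WEBES support lists straight, they can be written down as a small lookup table. The Python sketch below only encodes my reading of the two lists in this post; the `ROUTES` table and the `analyzer_for` helper are hypothetical names for illustration and do not exist in any HP tool.

```python
# (system type, operating system) -> (analyzer, transport), as listed in this post.
# None as the operating system means "not applicable".
ROUTES = {
    ("proliant", "linux"): ("OSEM", "SNMP"),
    ("proliant", "windows"): ("OSEM", "SNMP"),
    ("proliant", "vmware esx"): ("OSEM", "SNMP"),
    ("integrity", "linux"): ("OSEM", "SNMP"),
    ("san switch", None): ("OSEM", "SNMP"),
    ("msa", None): ("OSEM", "SNMP"),
    ("bladesystem", None): ("OSEM", "SNMP"),
    ("integrity", "hp-ux"): ("WEBES", "WBEM"),
    ("eva", None): ("WEBES", "ELMC / Command View"),
}


def analyzer_for(system, os_name=None):
    """Return (analyzer, transport) for a monitored system, per the lists above."""
    key = (system.lower(), os_name.lower() if os_name else None)
    if key not in ROUTES:
        raise KeyError(f"no analyzer listed for {system!r} running {os_name!r}")
    return ROUTES[key]


print(analyzer_for("Integrity", "HP-UX"))
print(analyzer_for("Proliant", "VMware ESX"))
```

Note that the same hardware family can land on a different analyzer depending on the operating system: an Integrity server goes through OSEM under Linux but through WEBES under HP-UX.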

Now there seem to be some places where WEBES and OSEM overlap each other, and I haven't understood yet to what extent these tools talk to each other. From the OSEM documentation, it seems that WEBES sends events to OSEM, and OSEM then manages the notification.

Why are there both OSEM and WEBES? I'm not sure, but it looks like OSEM has a Compaq history, while WEBES comes from Digital. ISEE itself is HP. The tools have not been merged yet, are still actively developed, and will probably complement each other for a while.

ISEE - runs on the CMS
The new 5.x ISEE client is a new version of the 3.95 client, which is now integrated into SIM. Most of the configuration settings you used to put in the ISEE client are now configured there, from the Remote Support menu entry.

SIM - runs on the CMS
SIM is used to actually manage your servers, and WEBES and OSEM automatically sync their configuration with SIM. For instance, if you set yourself as the contact person for a server, the OSEM and/or WEBES configuration will be populated with what you put in SIM. So SIM is the only place where you actually need to do some manual configuration.
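
The "single place to configure" idea can be pictured with a toy model in Python. Everything here is an assumption for illustration: I don't know how the real synchronization works, and the `Sim` and `AnalyzerConfig` classes and the `sync_from` method are invented names, not HP APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Sim:
    """Toy stand-in for SIM: the only place edited by hand."""
    contacts: dict[str, str] = field(default_factory=dict)  # server -> contact

    def set_contact(self, server: str, contact: str) -> None:
        self.contacts[server] = contact


@dataclass
class AnalyzerConfig:
    """Toy stand-in for the OSEM or WEBES configuration."""
    name: str
    contacts: dict[str, str] = field(default_factory=dict)

    def sync_from(self, sim: Sim) -> None:
        # Copy the contact list so the analyzer never edits SIM's data.
        self.contacts = dict(sim.contacts)


sim = Sim()
osem = AnalyzerConfig("OSEM")
webes = AnalyzerConfig("WEBES")

# Manual configuration happens once, in SIM only.
sim.set_contact("hpux01", "admin@example.com")

# Each analyzer then picks up the setting on its own.
for analyzer in (osem, webes):
    analyzer.sync_from(sim)

print(osem.contacts == webes.contacts == sim.contacts)
```

The point of the model is simply that nothing is ever typed into the OSEM or WEBES configuration directly.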

Basically, if you think that SIM takes care of handling events, you're wrong. It just _reports_ the events it receives directly and those it gathers from WEBES/OSEM. It also reports what ISEE does with the events. Exactly how it gets the information from these agents is beyond me for now. SIM doesn't send any events to ISEE; RSP and OSEM do. SIM also receives SNMP traps and subscribes to WBEM events, but since they are not filtered, it will only log the "raw" events.

That's what I understand out of this for now. Hope that helps.
