SUSE Manager/Nagios

Feeding status data from SUSE Manager to Nagios

This page describes how to provide status data from SUSE Manager to a Nagios server. To achieve this, the Nagios distribution provides a tool called "nrpe", which is short for "Nagios Remote Plugin Executor".

Configuring the SUSE Manager server

On the SUSE Manager server, install the packages nagios-nrpe and susemanager-nagios-plugin. nrpe runs as a daemon and listens for requests on a specific port:

zypper in nagios-nrpe susemanager-nagios-plugin
insserv nrpe

nrpe is configured in the file /etc/nagios/nrpe.cfg. The most important settings are:

server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=nagios.suse.de
dont_blame_nrpe=1
command[check_suma_patches]=/usr/lib/nagios/plugins/check_suma_patches $ARG1$
command[check_suma_lastevent]=/usr/lib/nagios/plugins/check_suma_lastevent $ARG1$

The variable server_port defines the port nrpe listens on. The default is 5666. Remember to open this port in the firewall.

The variables nrpe_user and nrpe_group control the effective user and group IDs nrpe runs under.

Since SUSE Manager probes need to access the database, nrpe must be able to read the database credentials stored in /etc/rhn/rhn.conf. There are two ways to achieve this: either add the user nagios to the group www (this is already done for other users such as tomcat), or run nrpe with the effective group ID www by setting nrpe_group=www in the configuration file above.
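If you chose the first option, the group membership can be checked with a small helper. This is only a sketch: the function name in_group_list is made up for illustration, and the user and group names are the ones used on this page.

```shell
# Return success if the space-separated group list $1 contains group $2.
in_group_list() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Assumption: the nrpe user is "nagios" and the required group is "www",
# as in the configuration described on this page.
if in_group_list "$(id -nG nagios 2>/dev/null)" www; then
    echo "nagios is a member of www"
else
    echo "nagios is not a member of www (or the user does not exist)"
fi
```

If nrpe runs with nrpe_group=www instead, this check is not needed.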

The variable allowed_hosts defines the hosts from which nrpe accepts connections. Enter the name or address of your Nagios server here.

The variable dont_blame_nrpe must unfortunately be set for this use case. By default, for security reasons, nrpe commands must not take any arguments. However, the name of the host we want information about has to be passed to nrpe, and this is only possible when dont_blame_nrpe is set to 1.
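Since a missing dont_blame_nrpe=1 is easy to overlook, it may help to read the setting back from the configuration file. The following is a minimal sketch; nrpe_cfg_get is a hypothetical helper, it only handles simple key=value lines, and it assumes that the last occurrence of a key is the relevant one.

```shell
# Print the value of a simple "key=value" setting from an nrpe-style
# config file. Commented-out lines are ignored because they do not start
# with the key. If the key occurs several times, the last one is printed
# (an assumption made for this sketch).
nrpe_cfg_get() {
    key=$1
    file=$2
    sed -n "s/^${key}=//p" "$file" | tail -n 1
}

cfg=/etc/nagios/nrpe.cfg
if [ -r "$cfg" ]; then
    [ "$(nrpe_cfg_get dont_blame_nrpe "$cfg")" = "1" ] \
        || echo "warning: dont_blame_nrpe is not set to 1 in $cfg"
fi
```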

Finally, define the command(s) nrpe may run on the SUSE Manager server. $ARG1$ is replaced by the name of the host the Nagios server wants information about. In the example above, the commands are named check_suma_patches and check_suma_lastevent, but you can use any other names; the only requirement is that the very same names are used in the probe definitions on the Nagios server, as described later in this document.
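To illustrate the $ARG1$ substitution, the following sketch mimics it with sed. It is not how nrpe is implemented; expand_nrpe_command is a name invented for this example, and the argument must not contain the characters |, & or \.

```shell
# Illustration only: replace every $ARG1$ in a command template with the
# given argument, similar to what nrpe does before running the command.
expand_nrpe_command() {
    template=$1
    arg1=$2
    printf '%s\n' "$template" | sed 's|\$ARG1\$|'"$arg1"'|g'
}

# Shows the command line that would be run for the host hostname.suse.de.
expand_nrpe_command '/usr/lib/nagios/plugins/check_suma_patches $ARG1$' 'hostname.suse.de'
```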

Please note that nrpe can also be run by inetd or xinetd instead of as a standalone daemon. In this case some of the settings above have no effect, as their task is already handled by inetd/xinetd. See the respective comments in the configuration file if you prefer such a setup.

When the configuration is complete, start the nrpe:

rcnrpe start

Configuring the Nagios server

On the Nagios server, the package nagios-plugins-nrpe needs to be installed (on SLES12 SP1: the packages nrpe and monitoring-plugins-nrpe). This provides a plugin which runs probes on a remote host (here always the SUSE Manager server) via nrpe. Depending on the exit code of the command, the status of the respective probe is displayed.

zypper in nagios-plugins-nrpe
or, on SLES12 SP1:
zypper in nrpe monitoring-plugins-nrpe

Defining the nrpe command

Add the following command definition to one of Nagios' configuration files (e.g. /etc/nagios/objects/commands.cfg or /etc/icinga/objects/commands.cfg):

define command{
        command_name    check_suma
        command_line    /usr/lib/nagios/plugins/check_nrpe -H manager.suse.de -c $ARG1$ -a $HOSTNAME$
}

Please note that in this case manager.suse.de is the name of the SUSE Manager server. Our use case is somewhat special, as the probe does not run on the actual host we want information about, but on the "metahost" SUSE Manager. $ARG1$ is the name of the actual probe that should run on the SUSE Manager server. $HOSTNAME$ is the name of the actual machine we want to retrieve information for.
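Before relying on Nagios itself, the same call can be assembled and run by hand on the Nagios server. The sketch below only builds the command line from the three values; build_check_suma is a hypothetical helper, and the host names are the example names used on this page.

```shell
# Build the check_nrpe command line that Nagios would run for one
# probe/host pair, following the command definition above.
build_check_suma() {
    manager=$1   # SUSE Manager server running nrpe
    probe=$2     # nrpe command name, e.g. check_suma_patches
    host=$3      # monitored host, handed to the probe as $ARG1$
    printf '%s -H %s -c %s -a %s\n' \
        /usr/lib/nagios/plugins/check_nrpe "$manager" "$probe" "$host"
}

cmd=$(build_check_suma manager.suse.de check_suma_patches hostname.suse.de)
echo "$cmd"
# To actually run the check and see its exit code on the Nagios server:
#   $cmd; echo "exit code: $?"
```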

Defining remote probes

A remote probe for a specific host can now be defined as follows:

define service{
        use                             generic-service
        host_name                       hostname.suse.de
        service_description             Patches
        check_command                   check_suma!check_suma_patches
        }

This defines a service named "Patches" for the host "hostname.suse.de". The check_command tells Nagios to use the special command defined above; the actual name of the probe is passed as a parameter. Currently two such probes exist (see "Available probes"), and more may be added in the future. They all use the same command definition; the distinction between the various probes happens on the SUSE Manager server.
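When many hosts are monitored, writing these definitions by hand becomes repetitive. As a sketch, a small function can print one service definition per host; suma_service and the host names web1.suse.de and web2.suse.de are made up for this example.

```shell
# Print a Nagios service definition for one host/probe pair, using the
# check_suma command defined on the Nagios server.
suma_service() {
    host=$1
    description=$2
    probe=$3
    cat <<EOF
define service{
        use                             generic-service
        host_name                       $host
        service_description             $description
        check_command                   check_suma!$probe
        }
EOF
}

# Hypothetical host names; replace them with your monitored systems.
for h in web1.suse.de web2.suse.de; do
    suma_service "$h" Patches check_suma_patches
done
```

The output can be redirected into a configuration file that Nagios reads.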

Now restart the Nagios server:

rcnagios restart

If you changed the nrpe configuration on the SUSE Manager server, restart nrpe there as well:

rcnrpe restart

Feeding data to Nagios

When a probe for a specific host is run, it returns one of the following four statuses:

0:  Ok
1:  Warning
2:  Critical
3:  Unspecified
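When running a probe by hand, the exit code can be translated back into these names. A minimal sketch, using the names from the list above (nagios_status_name is a hypothetical helper):

```shell
# Map a probe exit code to the status name used on this page.
nagios_status_name() {
    case "$1" in
        0) echo "Ok" ;;
        1) echo "Warning" ;;
        2) echo "Critical" ;;
        3) echo "Unspecified" ;;
        *) echo "Invalid exit code: $1" ;;
    esac
}

# Example: translate a sample exit code.
nagios_status_name 2
```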

If everything is configured correctly, you should see the status of the check on the Nagios service page of the respective host. In case of problems, check the Nagios log file /var/log/nagios/nagios.log (or /var/log/icinga/icinga.log when using Icinga).

Available probes

So far, susemanager-nagios-plugin includes plugins for the following probes:

Patches

This probe reports the status of a host with regard to pending patches. The service definition looks like this:

define service{
        use                             generic-service
        host_name                       hostname.suse.de
        service_description             Patches
        check_command                   check_suma!check_suma_patches
        }

The following statuses may be displayed:

0:  Ok: System is up to date
1:  Warning: At least one patch or package update is available
2:  Critical: At least one security/critical update is available
3:  Unspecified: The host cannot be found in the SUSE Manager database or the host name is not unique

Events

This probe displays the status of the last action that was performed on the host. The service definition looks like this:

define service{
        use                             generic-service
        host_name                       hostname.suse.de
        service_description             Events
        check_command                   check_suma!check_suma_lastevent
        }

The following statuses may be displayed:

0:  Ok: Last action completed successfully
1:  Warning: Action is currently in progress
2:  Critical: Last action failed
3:  Unspecified: The host cannot be found in the SUSE Manager database or the host name is not unique

Caveats

Nagios operates on hostnames, whereas SUSE Manager uses system IDs. It is therefore possible that two different systems in SUSE Manager have the same name (e.g. after re-registering). The Nagios plugin detects this condition and exits with code 3 and the following status information:

System name "hostname.suse.de" not unique

Since the hostname is the unique identifier of a machine for Nagios, it is very important that the names in Nagios and SUSE Manager match. If the name of a monitored machine is misspelled, the Nagios plugin will not find the host in the SUSE Manager database. Again it exits with code 3 and the following status:

Unknown System: "hostname.suse.de"
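
Duplicate names can also be spotted before Nagios reports them. The following sketch assumes you already have a list of system names, one per line, obtained by some other means (how to export it from SUSE Manager is not covered here); duplicate_names is a hypothetical helper.

```shell
# Read system names from standard input, one per line, and print each
# name that occurs more than once.
duplicate_names() {
    sort | uniq -d
}

# Example with an inline list: hostname.suse.de is registered twice.
printf '%s\n' hostname.suse.de other.suse.de hostname.suse.de | duplicate_names
```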