LCG-FR / SA1-FR Monitoring NagiosWithQuattor

An article from lcgwiki.

Installing Nagios with quattor

Nagios configuration requires both a set of client templates for commands to be run on clients by the Nagios Remote Plug-in Executor (NRPE) and a set of server templates configuring contacts for alarms, hosts to be monitored, services (AKA sensors) and so on.

You can contact frederic.schaer__arobase if you run into problems.

Configuring the Nagios server

The configuration of a Nagios server is done in a set of standard templates, in the 'monitoring/nagios3' namespace.

Repository Used

  • RPMs for nagios and nagios-plugins (plus dependencies) are compiled for each supported OS and placed in the « updates » repository on

Ex. :

  • The « nagios-grid-plugins » plugins are packaged as noarch RPMs in the « nagios » repository

Ex. :

See the template nagios3/plugins/config.tpl

Server Template

An example Nagios server template is here :

The Nagios server should also be configured as a UI (grid User Interface) so that it can monitor grid services.

Who is monitored: client configuration

A host is monitored when two conditions are both met:

  • The host belongs to the site (variable SITES) and is listed in config/'sitename'_nodes_properties.tpl

Example templates for host declarations are in LCGQWG:

  • AND the host has NAGIOS_CLIENT_ENABLED set to "true":
variable NAGIOS_CLIENT_ENABLED ?= true;
include { if(NAGIOS_CLIENT_ENABLED) 'monitoring/nagios3/client/config'};

If you want to set this variable to "true" for all your hosts you can put it in the following template:

template site/pro_site_common_config

Note: consider disabling the client installation (by setting NAGIOS_CLIENT_ENABLED=false in the profiles) on nodes running an unsupported OS, otherwise you will get quattor compilation errors.
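
As an illustration of the note above, a node profile could switch the client off as sketched here. The profile name profile_oldnode is hypothetical, and the sketch assumes that the site template declares the variable with "?=" (as in the example above), so that a value set earlier in the profile is kept.

```pan
object template profile_oldnode;

# Hypothetical node running an OS for which no Nagios client is available.
# Setting the flag before the site configuration is included means the
# conditional include of 'monitoring/nagios3/client/config' is skipped.
variable NAGIOS_CLIENT_ENABLED = false;

# ... the usual includes of the profile follow here ...
```

If the site template assigns the variable with "=" instead of "?=", this early assignment would be overwritten and the client would still be installed.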

You can tune this with


see the profile:

Authorization to contact NRPE

You need to declare your Nagios server in the variable NAGIOS_SERVER so that it is allowed to contact the NRPE daemon on the clients.
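
A minimal sketch of such a declaration, assuming it is placed in a template included by every node (for instance site/pro_site_common_config). The host name nagios.example.org is a placeholder, not a real server.

```pan
# Placeholder host name: replace with the name of your own Nagios server.
# Declared for all nodes, so that each NRPE daemon accepts requests from it.
variable NAGIOS_SERVER ?= "nagios.example.org";
```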

What is monitored


Services are added in the template « server/cfgfiles/services.tpl ». A service can be added like this:

# Describe the service as key/value pairs, then register it.
variable TMP_SERVICE = nlist(
    "use", "                             generic-service",
    "host_name", "             ",
    "service_description", "             Workers ssh_known_hosts",
    "contact_groups", "                  admins",
    "check_command", "                   check_nrpe_long!check_ssh_known_hosts!60",
    "normal_check_interval", "           60 ; check every hour",
    "max_check_attempts", "              1"
);
variable NAGIOS_SERVICES = nagios_add_service(TMP_SERVICE);

If the second parameter of the function nagios_add_service is « true », a dependency on the NRPE daemon is added for the defined service on all the nodes matching "*,!NOQUATTOR". This mechanism still needs to be improved.
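
The call with the second parameter could look like the sketch below. It assumes the Nagios functions template has already been included. The "hostgroup_name" key with the value "WN" is an assumption made for illustration; the target hosts are not given in the example above.

```pan
# Sketch only: the hostgroup "WN" is an assumed target for this check.
variable TMP_SERVICE = nlist(
    "use", "generic-service",
    "hostgroup_name", "WN",
    "service_description", "Workers ssh_known_hosts",
    "contact_groups", "admins",
    "check_command", "check_nrpe_long!check_ssh_known_hosts!60",
    "normal_check_interval", "60",
    "max_check_attempts", "1"
);

# Second argument set to true: also add the dependency on the NRPE daemon.
variable NAGIOS_SERVICES = nagios_add_service(TMP_SERVICE, true);
```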

A service on one host can also be made dependent on a specific service on another host:

variable NAGIOS_USER_DEFINED_HOST_DEPENDENCIES = nagios_add_host_service_dependency(
    "", "nrpe daemon", "", "Workers ssh_known_hosts"
);

It is not possible to add dependencies between hostgroups (for the moment?).

Commands and NRPE

Some Nagios configuration files do not need a complex quattor structure template, so they are created with filecopy:

  • adding commands is done in:


  • adding NRPE commands is done in:


It is possible to add a dependency on the NRPE daemon for services which are not defined on all the hosts (template services.tpl):

        escape("CE,CE-MPI,!NOQUATTOR")  ,"black hole workers",
        escape("CE,CE-MPI,LFC,SE_DPM,SE_DISK,MON,VOBOX,WMS,!NOQUATTOR"),"host certificate",
        escape("MON")                   ,"apel publisher",
        escape("CE,CE-MPI")                             ,"apel parser",
        escape("WN,CE-MPI,UI,VOBOX")    ,"home partition freespace",
        escape("WN")    ,"pbs_mom transfers",

Proxy management

The local grid probes need a valid certificate. Two mechanisms are available: renewal and retrieval. They are selected in cfg/standard/monitoring/nagios3/server/config.tpl:

include { if(NAGIOS_NCG_CONFIG && NAGIOS_MODE_PROXY_RENEW) 'monitoring/nagios3/server/vobox'};
include { if(NAGIOS_NCG_CONFIG && NAGIOS_MODE_PROXY_RETRIEVE) 'monitoring/nagios3/server/proxy_retrieval'};

Associated variables:


boolean (NAGIOS_MODE_PROXY_RENEW in the includes above). true if you want to use the renewal mechanism


boolean (NAGIOS_MODE_PROXY_RETRIEVE in the includes above). true if you want to use the retrieval mechanism


string. File in which the proxy is renewed by the vobox renewal mechanism


string. File into which the proxy is retrieved by the proxy retrieval cron job


string. Name of your proxy for later retrieval


string. Name of your myproxy server host


string. VO used for VOMS authentication
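
As a sketch of how the two includes above are driven, the following fragment selects the retrieval mechanism. Only the three variables tested in those includes are used; the values are an example choice, not documented defaults.

```pan
# Both includes in server/config.tpl also test NAGIOS_NCG_CONFIG,
# so it has to be true for either proxy mechanism to be configured.
variable NAGIOS_NCG_CONFIG = true;

# Example choice: use retrieval and leave renewal off.
variable NAGIOS_MODE_PROXY_RETRIEVE = true;
variable NAGIOS_MODE_PROXY_RENEW = false;

# The variables must be set before the server configuration is included.
include { 'monitoring/nagios3/server/config' };
```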

Variables


string. Nagios Server RPM version

variable NAGIOS_NODES_PROPERTIES ?= nlist()

nlist. The site nodes properties. For instance : nlist( escape("mynode.mydomain"), nlist("type","NFS", "monitoring","yes", "os","undef", "ip","" , "hardware","undef"))

variable NAGIOS_HTPASSWD_LOGIN ?= "nagios"

string. Defines the user name allowed to access the nagios interface.

variable NAGIOS_HTPASSWD_PASS  ?= "i88QUu1o8Jwq."

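string. Defines the password associated with the user name. This is a password hash, not clear text, generated using the "openssl passwd" command; the 13-character default shown above has the traditional crypt format rather than an MD5 one. Default password is "nagios".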


list. Defines the list of hostgroups you want to define in Nagios. If it differs from NAGIOS_KNOWN_HOSTGROUPS, each node whose "type" is not listed will be added to the hostgroup NAGIOS_DEFAULT_NODE_GROUP. Please note that the KNOWN hostgroups will still be defined, for compatibility reasons (for now).


string. Nodes whose type is not known will be put in this hostgroup


string. Defines the NAGIOS_SERVER variable, which tells the clients which nagios host is going to poll the NRPE daemon. This *REALLY* should be set for all nodes, not only the server

variable NAGIOS_DEFAULT_ADMIN_NAME ?="nagiosadmin"

string. Default user name to use if none specified.

variable NAGIOS_ADMIN_CONTACTS ?= error("you need to define NAGIOS_ADMIN_CONTACTS with a nlist, containing names, associated with emails")

nlist. Contains the names of the people to be notified, with their emails. Example: nlist("toto", "me@localhost")


boolean. Nagios-wide switch to enable or disable notifications

variable NAGIOS_SYSINFO_USERS ?= "*"

string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_system_information


string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_configuration_information


string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_system_commands


string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_services


string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_hosts


string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_service_commands


string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_host_commands

variable NAGIOS_IGNORED_NODES ?= list()

list. Defines a list of nodes that will be IGNORED when defining nagios hosts. This is useful if you have nodes in the NODES_PROPERTIES that you do *NOT* want to monitor

variable NAGIOS_NCG_CONFIG ?= false

boolean. Do you want to define NCG services ? (experimental)
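
To show how some of the variables above fit together, here is a minimal site-level sketch. All host names, the contact name and the email address are placeholders. Whether NAGIOS_IGNORED_NODES expects plain host names, as written here, is an assumption.

```pan
# Who gets notified (placeholder name and address).
variable NAGIOS_ADMIN_CONTACTS ?= nlist(
    "alice", "alice@example.org"
);

# Nodes known to Nagios, using the keys shown for NAGIOS_NODES_PROPERTIES.
variable NAGIOS_NODES_PROPERTIES ?= nlist(
    escape("nfs01.example.org"), nlist(
        "type", "NFS",
        "monitoring", "yes",
        "os", "undef",
        "ip", "",
        "hardware", "undef"
    ),
    escape("test01.example.org"), nlist(
        "type", "NFS",
        "monitoring", "yes",
        "os", "undef",
        "ip", "",
        "hardware", "undef"
    )
);

# Assumption: plain host names. This node is declared but not monitored.
variable NAGIOS_IGNORED_NODES ?= list("test01.example.org");
```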

Installation Example

With Quattor

  • Add the profile of the new node:
svn add cfg/clusters/your-3.1/profiles/profile_node58.tpl
  • Modify your list of machines:
vi ./cfg/sites/your/site/config/your_nodes_properties.tpl
  • Create your hardware template:
svn cp ./cfg/sites/your/hardware/virtual_machine_3.tpl ./cfg/sites/your/hardware/virtual_machine_13.tpl
  • Configure your clients by adding:
variable NAGIOS_CLIENT_ENABLED = true;
include { if(NAGIOS_CLIENT_ENABLED) 'monitoring/nagios3/client/config'};

to the template site/pro_site_common_config

  • Commit your changes:
svn ci -m 'adding serveur nagios'

On the Nagios server

  • As root, check the installation logs, then check and start the Nagios service:
vi /var/log/spma.log
vi /var/log/ncm-cdispd.log
/etc/init.d/nagios status
/etc/init.d/nagios start
  • Still as root, add your server certificate in /etc/grid-security

  • As nagios, add your personal certificate and create a local proxy (you can also do this from another UI):
voms-proxy-init --voms
myproxy-init -c 336 -k xxxxx-s -l nagios -x  -Z "/O=GRID-FR/C=FR/O=xxx/OU=xxx/"
  • As root: check the proxy retrieval mechanism (see cron /etc/cron.d/nagios-proxy-refresh , installed by the rpm nagios-proxy-refresh-1.7-3.noarch)
MyProxy credential retrieved. VOMS credential retrieved.
# voms-proxy-info --all -file /etc/nagios/globus/userproxy.pem
subject   : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
issuer    : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy/CN=proxy/CN=proxy/CN=proxy
identity  : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy/CN=proxy/CN=proxy/CN=proxy
type      : proxy
strength  : 1024 bits
path      : /etc/nagios/globus/userproxy.pem
timeleft  : 11:27:56
=== VO extension information ===
VO        :
subject   : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy
issuer    : /O=GRID-FR/C=FR/O=CNRS/OU=LAL/
attribute : /
timeleft  : 11:27:56
uri       :


Check your interface:

Do you get the SAM test results back? You can check by hand on your server, as root:

/usr/libexec/grid-monitoring/plugins/nagios/gather_sam -t 3000 --site GRIF \
 --vos ops --sam-root-url --sam-all

Adding a service

You can add your own services in the Nagios server profile.

You first need to include the template that provides the Nagios functions, standard/monitoring/nagios3/server/functions.tpl

The service must be defined before monitoring/nagios3/server/config is included.


  • 1) Define what is needed to test an HTTP server remotely
  • 2) Define a hostgroup (for example "WEB" in NAGIOS_MONITORED_HOSTGROUPS) and set the type of this node to WEB in your nodes_properties
  • 3) Add to your configuration the service that checks HTTP on those hosts

tuning the "time period", if needed

  • 4) Commit your changes and check your interface
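
The steps above could be sketched as follows in the Nagios server profile. This is an illustration under several assumptions: the "hostgroup_name" key, the "HTTP" service description, the interval values, and a check_http command being defined on the server are not taken from the templates and must be adapted to your site.

```pan
# Load the helper functions (nagios_add_service) first.
include { 'monitoring/nagios3/server/functions' };

# Assumption: "WEB" is listed in NAGIOS_MONITORED_HOSTGROUPS and used
# as the node type in your nodes_properties template.
# Assumption: a check_http command is defined on the Nagios server.
variable TMP_SERVICE = nlist(
    "use", "generic-service",
    "hostgroup_name", "WEB",
    "service_description", "HTTP",
    "contact_groups", "admins",
    "check_command", "check_http",
    "normal_check_interval", "5",
    "max_check_attempts", "3"
);
variable NAGIOS_SERVICES = nagios_add_service(TMP_SERVICE);

# The service is defined before the server configuration is included.
include { 'monitoring/nagios3/server/config' };
```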