Difference between revisions of "LCG-FR / SA1-FR Monitoring NagiosWithQuattor"

An article from lcgwiki.

Revision as of 12:17, 21 January 2009

Installing Nagios with quattor

Nagios configuration requires both a set of client templates for commands to be run on clients by the Nagios Remote Plug-in Executor (NRPE) and a set of server templates configuring contacts for alarms, hosts to be monitored, services (AKA sensors) and so on.

Configuring the Nagios server

The configuration of a Nagios server is done in a set of standard templates, in the 'monitoring/nagios3' namespace.

Repository Used

  • RPMs for nagios and nagios-plugins (plus their dependencies) are compiled for each supported OS and placed in the "updates" repository on quattorsrv.lal.in2p3.fr.

Example: http://quattor.web.lal.in2p3.fr/packages/os/sl440-i386/updates/

  • The "nagios-grid-plugins" plugins are packaged as noarch RPMs in the "nagios" repository.

Example: http://quattor.web.lal.in2p3.fr/packages/nagios/

See the template nagios3/plugins/config.tpl.

Server Template

An example Nagios server template is here :


The server machine should also be configured as a grid UI (User Interface) so that it can monitor grid services.

Who is monitored

A host is monitored when both of the following conditions are met:

  • The host belongs to the site (variable SITES) and is listed in config/'sitename'_nodes_properties.tpl.

Example templates for declaring hosts are in LCGQWG: https://trac.lal.in2p3.fr/LCGQWG/browser/templates/trunk/sites/example/site/config

  • The host has NAGIOS_CLIENT_ENABLED set to "true":
variable NAGIOS_CLIENT_ENABLED ?= true;
include { if(NAGIOS_CLIENT_ENABLED) 'monitoring/nagios3/client/config'};

If you want to set this variable to "true" for all your hosts you can put it in the following template:

template site/pro_site_common_config

You can tune this with


see the profile: https://trac.lal.in2p3.fr/LCGQWG/browser/templates/trunk/clusters/example-3.1/profiles/nagios3-server.example.org.tpl

What is monitored


Services are added in the template "server/cfgfiles/services.tpl". A service can be added like this:

variable TMP_SERVICE=nlist(
       "use","                             generic-service",
       "host_name","                       node07.org.fr",
       "service_description","             Workers ssh_known_hosts",
       "contact_groups","                  admins",
       "check_command","                   check_nrpe_long!check_ssh_known_hosts!60",
       "normal_check_interval","           60 ; check every hour",
       "max_check_attempts","              1",

If the second parameter of the function nagios_add_service is "true", a dependency on the NRPE daemon is added for the defined service on all the nodes matching "*,!NOQUATTOR". This mechanism still needs to be improved.

It is also possible to add a dependency between a service on one host and an explicitly named service on another host:

variable NAGIOS_USER_DEFINED_HOST_DEPENDENCIES =	nagios_add_host_service_dependency(
	"node07.datagrid.cea.fr","nrpe daemon", "node07.datagrid.cea.fr","Workers ssh_known_hosts" 

It is not possible to add dependencies between hostgroups (at least for the moment).

Commands and NRPE

Nagios configuration files do not need a complex Quattor structure template, so they are created with filecopy:

  • adding commands is done in:


  • adding NRPE commands is done in:


It is possible to add a dependency on the NRPE daemon for services which are not defined on all the hosts (template services.tpl):

        escape("CE,CE-MPI,!NOQUATTOR"), "black hole workers",
        escape("CE,CE-MPI,LFC,SE_DPM,SE_DISK,MON,VOBOX,WMS,!NOQUATTOR"), "host certificate",
        escape("MON"), "apel publisher",
        escape("CE,CE-MPI"), "apel parser",
        escape("WN,CE-MPI,UI,VOBOX"), "home partition freespace",
        escape("WN"), "pbs_mom transfers",

Proxy management

A valid certificate is needed for the local grid probes. Two mechanisms are available: renewal and retrieval. They are selected in cfg/standard/monitoring/nagios3/server/config.tpl:

include { if(NAGIOS_NCG_CONFIG && NAGIOS_MODE_PROXY_RENEW) 'monitoring/nagios3/server/vobox'};
include { if(NAGIOS_NCG_CONFIG && NAGIOS_MODE_PROXY_RETRIEVE) 'monitoring/nagios3/server/proxy_retrieval'};
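
The two include conditions above suggest that the mechanism is chosen with boolean variables. As a minimal sketch, a site template might select retrieval as shown here. This assumes that NAGIOS_MODE_PROXY_RENEW and NAGIOS_MODE_PROXY_RETRIEVE are plain booleans and that only one of them should be enabled at a time; neither point is confirmed by this page.

```pan
# Sketch only: enable NCG services and select the proxy retrieval mechanism.
# Assumption: both NAGIOS_MODE_PROXY_* variables are booleans and must be
# defined before monitoring/nagios3/server/config is included.
variable NAGIOS_NCG_CONFIG = true;
variable NAGIOS_MODE_PROXY_RENEW = false;
variable NAGIOS_MODE_PROXY_RETRIEVE = true;
```

With these values only 'monitoring/nagios3/server/proxy_retrieval' would be included, because the first condition evaluates to false.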

The associated variables:


Client configuration

Variables


variable NAGIOS_RPM_VERSION ?= "3.0.5-1"

string. Nagios Server RPM version

variable NAGIOS_NODES_PROPERTIES ?= nlist()

nlist. The properties of the site nodes. For instance: nlist( escape("mynode.mydomain"), nlist("type","NFS", "monitoring","yes", "os","undef", "ip","" , "hardware","undef"))
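
For a site with several nodes, the same structure is repeated once per host. The following is a sketch only: the host names, the "CE" type and the IP addresses are hypothetical, and the keys are simply those used in the one-line example above.

```pan
# Sketch only: host names, types and IP addresses are hypothetical.
# Each key is an escaped host name; each value is an nlist of properties.
variable NAGIOS_NODES_PROPERTIES ?= nlist(
    escape("ce01.example.org"), nlist(
        "type", "CE",
        "monitoring", "yes",
        "os", "undef",
        "ip", "192.0.2.10",
        "hardware", "undef"
    ),
    escape("nfs01.example.org"), nlist(
        "type", "NFS",
        "monitoring", "yes",
        "os", "undef",
        "ip", "192.0.2.11",
        "hardware", "undef"
    )
);
```

The host names are passed through escape() because they contain dots, which are not valid in plain nlist keys.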

variable NAGIOS_HTPASSWD_LOGIN ?= "nagios"

string. Defines the user name allowed to access the nagios interface.

variable NAGIOS_HTPASSWD_PASS  ?= "i88QUu1o8Jwq."

string. Defines the password associated with the user name. This is a crypt-style password hash generated using the "openssl passwd" command. Default password is "nagios".


list. Defines the list of hostgroups you want to define in Nagios. If different from NAGIOS_KNOWN_HOSTGROUPS, each node that is of a "type" not listed will be added to the hostgroup NAGIOS_DEFAULT_NODE_GROUP. Please note that the KNOWN hostgroups will still be defined, for compatibility reasons (for now).


string. Nodes whose type is not known will be put in this hostgroup.


string. Defines the NAGIOS_SERVER variable, which tells the clients which nagios host is going to poll the NRPE daemon. This *REALLY* should be set for all nodes, not only the server

variable NAGIOS_DEFAULT_ADMIN_NAME ?="nagiosadmin"

string. Default user name to use if none specified.

variable NAGIOS_ADMIN_CONTACTS ?= error("you need to define NAGIOS_ADMIN_CONTACTS with a nlist, containing names, associated with emails")

nlist. Contains the names of the people to be notified, with their email addresses. Example: nlist("toto", "me@localhost")


boolean. Nagios wide variable to disable/enable notifications

variable NAGIOS_SYSINFO_USERS ?= "*"

string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_system_information


string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_configuration_information


string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_system_commands


string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_services


string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_hosts


string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_service_commands


string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_host_commands

variable NAGIOS_IGNORED_NODES ?= list()

list. Defines a list of nodes that will be IGNORED when defining nagios hosts. This is useful if you have nodes in the NODES_PROPERTIES that you do *NOT* want to monitor.
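
As a sketch, excluding two machines could look like the following. The host names are hypothetical, and it is an assumption that the list holds plain (unescaped) host names; check how the nagios3 templates compare these entries with the keys of NAGIOS_NODES_PROPERTIES before relying on it.

```pan
# Sketch only: hypothetical host names, assumed to be given unescaped.
variable NAGIOS_IGNORED_NODES ?= list(
    "test01.example.org",
    "spare02.example.org"
);
```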

variable NAGIOS_NCG_CONFIG ?= false

boolean. Whether to define NCG services (experimental).

Installation Example

With Quattor

  • Add the profile of the new machine:
svn add cfg/clusters/your-3.1/profiles/profile_node58.tpl
  • Modify your list of machines:
vi ./cfg/sites/your/site/config/your_nodes_properties.tpl
  • Create your hardware template:
svn cp ./cfg/sites/your/hardware/virtual_machine_3.tpl ./cfg/sites/your/hardware/virtual_machine_13.tpl
  • Configure your clients by adding:
variable NAGIOS_CLIENT_ENABLED = true;
include { if(NAGIOS_CLIENT_ENABLED) 'monitoring/nagios3/client/config'};

to the template site/pro_site_common_config

  • Commit your changes:
svn ci -m 'adding serveur nagios'

On the Nagios server

  • As root
vi /var/log/spma.log
vi /var/log/ncm-cdispd.log
/etc/init.d/nagios status
/etc/init.d/nagios start
Then add your server certificate to /etc/grid-security.

  • As nagios
Add your personal certificate and create a local proxy (this can also be done from another UI):
voms-proxy-init --voms vo.grif.fr
myproxy-init -c 336 -k xxxxx -s myproxy.grif.fr -l nagios -x -Z "/O=GRID-FR/C=FR/O=xxx/OU=xxx/CN=xxxx.xxxx.fr"
  • As root: check the proxy retrieval mechanism:
MyProxy credential retrieved. VOMS credential retrieved.
# voms-proxy-info --all -file /etc/nagios/globus/userproxy.pem
subject   : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
issuer    : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy/CN=proxy/CN=proxy/CN=proxy
identity  : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy/CN=proxy/CN=proxy/CN=proxy
type      : proxy
strength  : 1024 bits
path      : /etc/nagios/globus/userproxy.pem
timeleft  : 11:27:56
=== VO vo.grif.fr extension information ===
VO        : vo.grif.fr
subject   : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy
issuer    : /O=GRID-FR/C=FR/O=CNRS/OU=LAL/CN=grid12.lal.in2p3.fr
attribute : /vo.grif.fr/Role=NULL/Capability=NULL
timeleft  : 11:27:56
uri       : grid12.lal.in2p3.fr:20001