Latest revision as of 16:31, 13 March 2009

Installing Nagios with Quattor

Nagios configuration requires both a set of client templates for commands to be run on clients by the Nagios Remote Plug-in Executor (NRPE) and a set of server templates configuring contacts for alarms, hosts to be monitored, services (AKA sensors) and so on.

You can contact frederic.schaer__arobase char__cea.fr in case of problems.

Configuring the Nagios server

The configuration of a Nagios server is done in a set of standard templates, in the 'monitoring/nagios3' namespace.

Repository Used

  • Sensors are provided for some of the grid plugins from the SA1 repository: http://www.sysadmin.hep.ac.uk/rpms/grid-services/RPMS.monitoring/

  • RPMs for nagios and nagios-plugins (plus dependencies) are compiled for each supported OS and placed in the "updates" repository on quattorsrv.lal.in2p3.fr.

Example: http://quattor.web.lal.in2p3.fr/packages/os/sl440-i386/updates/

  • The "nagios-grid-plugins" plugins are packaged as a noarch RPM in the "nagios" repository.

Example: http://quattor.web.lal.in2p3.fr/packages/nagios/

See the template nagios3/plugins/config.tpl.

Server Template

An example Nagios server template is available here:

https://trac.lal.in2p3.fr/LCGQWG/browser/templates/trunk/clusters/example-3.1/profiles/nagios3-server.example.org.tpl

This machine should be a UI so that it can monitor grid services.
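
As a rough idea of what such a profile contains, here is a minimal sketch of the Nagios-related part only. The profile name, the contact and the 'machine-types/ui' include are assumptions made for illustration; a real profile also needs its hardware, OS and repository templates, and the example profile linked above remains the reference.

```pan
# Fragment of a hypothetical server profile; hardware, OS and repository
# templates that a complete profile needs are left out.
object template nagios3-server.example.org;

# Assumption: the QWG machine type that turns the node into a grid UI.
include { 'machine-types/ui' };

# Placeholder contact; NAGIOS_ADMIN_CONTACTS has no default and must be set.
variable NAGIOS_ADMIN_CONTACTS = nlist(
    "admin1", "admin1@example.org"
);

# Keep notifications off while the setup is being tested.
variable NAGIOS_NOTIFICATIONS_ENABLED = false;

# Helper functions first, then the server configuration itself.
include { 'monitoring/nagios3/server/functions' };
include { 'monitoring/nagios3/server/config' };
```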

Who is monitored, Client configuration

Two conditions

  • Hosts that belong to a site listed in the SITES variable and that appear in config/'sitename'_nodes_properties.tpl are monitored.

Example templates for host declarations are in LCGQWG: https://trac.lal.in2p3.fr/LCGQWG/browser/templates/trunk/sites/example/site/config

  • AND the host must have NAGIOS_CLIENT_ENABLED set to true:
variable NAGIOS_CLIENT_ENABLED ?= true;
include { if(NAGIOS_CLIENT_ENABLED) 'monitoring/nagios3/client/config'};

To set this variable to true for all your hosts, put it in the following template:

template site/pro_site_common_config

Note: consider disabling the client installation (by setting NAGIOS_CLIENT_ENABLED = false in the profiles) on nodes with unsupported OSes; otherwise you will encounter Quattor compilation errors.
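
The following sketch shows one way to switch the client off for a single node. It relies on standard Pan behaviour: the '?=' operator only assigns a value when the variable is still undefined, so a value set earlier in the node profile is kept. This only works if the site-wide template uses '?=' rather than '='; where exactly the assignment goes in your profile is an assumption.

```pan
# In the profile of a node running an unsupported OS, placed BEFORE the
# include of the site-wide template.
variable NAGIOS_CLIENT_ENABLED = false;

# Site-wide template, for reference (shown as comments):
#   variable NAGIOS_CLIENT_ENABLED ?= true;   # not applied, value already set
#   include { if(NAGIOS_CLIENT_ENABLED) 'monitoring/nagios3/client/config' };
```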

You can tune this with

NAGIOS_IGNORED_NODES, NAGIOS_MONITORED_HOSTGROUPS

See the profile: https://trac.lal.in2p3.fr/LCGQWG/browser/templates/trunk/clusters/example-3.1/profiles/nagios3-server.example.org.tpl
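
A minimal sketch of this tuning is given below. The node properties follow the structure documented for NAGIOS_NODES_PROPERTIES; all host names, addresses and group choices are placeholders, not values from a real site.

```pan
# Two placeholder nodes, keyed by their escaped host names.
variable NAGIOS_NODES_PROPERTIES ?= nlist(
    escape("wn01.example.org"), nlist("type", "WN", "monitoring", "yes",
        "os", "undef", "ip", "192.0.2.11", "hardware", "undef"),
    escape("web01.example.org"), nlist("type", "WEB", "monitoring", "yes",
        "os", "undef", "ip", "192.0.2.12", "hardware", "undef")
);

# Do not define a Nagios host for this node, although it is listed above.
variable NAGIOS_IGNORED_NODES = list("web01.example.org");

# Hostgroups to define; nodes whose type is not listed here are added to
# NAGIOS_DEFAULT_NODE_GROUP.
variable NAGIOS_MONITORED_HOSTGROUPS = list("WN", "CE", "UI");
```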

Authorization to contact NRPE

Declare your Nagios server in the NAGIOS_SERVER variable so that it is allowed to contact the NRPE daemon on the clients.
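
For instance, assuming all nodes include a common site template, the declaration could look like the sketch below. The host name is a placeholder. Since NAGIOS_SERVER defaults to FULL_HOSTNAME, a client on which the variable is left unset would name itself as the server.

```pan
# In a template included by every node, e.g. site/pro_site_common_config.
# Placeholder: use the fully qualified name of your own Nagios server.
variable NAGIOS_SERVER ?= "nagios.example.org";
```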

What is monitored

Services

Services are added in the template server/cfgfiles/services.tpl. A service can be added like this:

variable TMP_SERVICE=nlist(
    "use","                             generic-service",
    "host_name","                       node07.org.fr",
    "service_description","             Workers ssh_known_hosts",
    "contact_groups","                  admins",
    "check_command","                   check_nrpe_long!check_ssh_known_hosts!60",
    "normal_check_interval","           60 ; check every hour",
    "max_check_attempts","              1",
);
variable NAGIOS_SERVICES=nagios_add_service(TMP_SERVICE);


If the second parameter of the function nagios_add_service is true, a dependency on the NRPE daemon is added for the defined service on all the nodes matching "*,!NOQUATTOR". This behaviour still needs to be improved.
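
A sketch of such a call is shown below. The host name is a placeholder, and the values keep one leading blank in case the templates rely on the padding used in the original example; that reliance is an assumption.

```pan
variable TMP_SERVICE = nlist(
    "use", " generic-service",
    "host_name", " wn01.example.org",
    "service_description", " Workers ssh_known_hosts",
    "contact_groups", " admins",
    "check_command", " check_nrpe_long!check_ssh_known_hosts!60",
    "normal_check_interval", " 60",
    "max_check_attempts", " 1"
);

# Second argument true: also add the dependency on the NRPE daemon.
variable NAGIOS_SERVICES = nagios_add_service(TMP_SERVICE, true);
```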

It is possible to add a dependency between a service on one host and a well-defined service on another host:

variable NAGIOS_USER_DEFINED_HOST_DEPENDENCIES =	nagios_add_host_service_dependency(
	"node07.datagrid.cea.fr","nrpe daemon", "node07.datagrid.cea.fr","Workers ssh_known_hosts" 
);

It is not possible to add dependencies between hostgroups, at least for the moment.

Commands and NRPE

Some Nagios configuration files do not need a complex Quattor structure template and are therefore created with filecopy:

  • Commands are added in:

monitoring/nagios3/server/cfgfiles/commands

  • NRPE commands are added in:

monitoring/nagios3/client/cfgfiles/nrpe_commands
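
To illustrate the filecopy approach, here is a sketch that writes one extra NRPE command file with ncm-filecopy. It does not reproduce how the shipped templates build their files: the variable name, the target path, the plugin path and the idea that NRPE reads an nrpe.d directory are all assumptions.

```pan
# Assumption: the file is managed directly with ncm-filecopy.
include { 'components/filecopy/config' };

# NRPE syntax: one 'command[name]=command line' entry per line.
# Hypothetical variable; the plugin path depends on the OS architecture.
variable MY_NRPE_COMMANDS = "command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10\n";

# Register the file under its escaped path, keeping existing entries.
"/software/components/filecopy/services" = {
    SELF[escape("/etc/nagios/nrpe.d/site_commands.cfg")] = nlist(
        "config", MY_NRPE_COMMANDS,
        "owner", "root",
        "perms", "0644"
    );
    SELF;
};
```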

It is possible to add a dependency on the NRPE daemon for services that are not defined on all hosts (template services.tpl):


variable NRPE_HOSTGROUPS_SPECIFIC_DEPENDENT_SERVICES=nlist(
    escape("CE,CE-MPI,!NOQUATTOR"), "black hole workers",
    escape("CE,CE-MPI,LFC,SE_DPM,SE_DISK,MON,VOBOX,WMS,!NOQUATTOR"), "host certificate",
    escape("MON"), "apel publisher",
    escape("CE,CE-MPI"), "apel parser",
    escape("WN,CE-MPI,UI,VOBOX"), "home partition freespace",
    escape("WN"), "pbs_mom transfers",
);

Proxy management

The local grid probes need a valid certificate. Two mechanisms are available: renewal and retrieval. They are selected in cfg/standard/monitoring/nagios3/server/config.tpl:

include { if(NAGIOS_NCG_CONFIG && NAGIOS_MODE_PROXY_RENEW) 'monitoring/nagios3/server/vobox'};
include { if(NAGIOS_NCG_CONFIG && NAGIOS_MODE_PROXY_RETRIEVE) 'monitoring/nagios3/server/proxy_retrieval'};

The associated variables:

NAGIOS_MODE_PROXY_RENEW

boolean. true if you want to use renewal mechanism

NAGIOS_MODE_PROXY_RETRIEVE

boolean. true if you want to use retrieval mechanism

NAGIOS_RENEW_PROXY

string. File in which the proxy is renewed by the VOBOX renewal mechanism

NAGIOS_OUTPUT_PROXY

string. File in which the proxy is stored by the proxy retrieval cron job

NAGIOS_MYPROXY_NAME

string. name of your proxy for later retrieval

MYPROXY_SERVER

string. name of your myproxy server host

NAGIOS_VONAME_PROXY

string. VO used for VOMS authentication
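
As an illustration, the retrieval mechanism could be selected as in the sketch below, set before monitoring/nagios3/server/config is included. The proxy path is the one used later on this page; the credential name, server and VO are placeholders to replace with your own values.

```pan
# Both proxy includes are guarded by NAGIOS_NCG_CONFIG.
variable NAGIOS_NCG_CONFIG = true;

# Use retrieval, not renewal.
variable NAGIOS_MODE_PROXY_RETRIEVE = true;
variable NAGIOS_MODE_PROXY_RENEW = false;

# Where the retrieved proxy is written.
variable NAGIOS_OUTPUT_PROXY = "/etc/nagios/globus/userproxy.pem";

# Placeholders: credential name, MyProxy server and VO.
variable NAGIOS_MYPROXY_NAME = "my-credential-name";
variable MYPROXY_SERVER = "myproxy.example.org";
variable NAGIOS_VONAME_PROXY = "vo.example.org";
```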

Variables

variable NAGIOS_RPM_VERSION ?= "3.0.5-1"

string. Nagios Server RPM version

variable NAGIOS_NODES_PROPERTIES ?= nlist()

nlist. The site nodes properties. For instance: nlist( escape("mynode.mydomain"), nlist("type","NFS", "monitoring","yes", "os","undef", "ip","192.68.0.1" , "hardware","undef"))

variable NAGIOS_HTPASSWD_LOGIN ?= "nagios"

string. Defines the user name allowed to access the nagios interface.

variable NAGIOS_HTPASSWD_PASS  ?= "i88QUu1o8Jwq."

string. Defines the password associated with the user name. This is a password hash generated with the "openssl passwd" command (the default value shown is in the 13-character crypt format rather than MD5). The default password is "nagios".

variable NAGIOS_MONITORED_HOSTGROUPS ?= NAGIOS_KNOWN_HOSTGROUPS

list. Defines the list of hostgroups you want to define in Nagios. If it differs from NAGIOS_KNOWN_HOSTGROUPS, each node whose "type" is not listed is added to the hostgroup NAGIOS_DEFAULT_NODE_GROUP. Please note that the KNOWN hostgroups are still defined, for compatibility reasons (for now).

variable NAGIOS_DEFAULT_NODE_GROUP?="Others"

string. Nodes whose type is not known are put in this hostgroup

variable NAGIOS_SERVER ?= FULL_HOSTNAME

string. Defines the NAGIOS_SERVER variable, which tells the clients which nagios host is going to poll the NRPE daemon. This *REALLY* should be set for all nodes, not only the server

variable NAGIOS_DEFAULT_ADMIN_NAME ?="nagiosadmin"

string. Default user name to use if none specified.

variable NAGIOS_ADMIN_CONTACTS ?= error("you need to define NAGIOS_ADMIN_CONTACTS with a nlist, containing names, associated with emails")

nlist. Contains the names of the people to be notified, with their emails. Example: nlist("toto", "me@localhost")

variable NAGIOS_NOTIFICATIONS_ENABLED ?= true

boolean. Nagios-wide variable to enable or disable notifications

variable NAGIOS_SYSINFO_USERS ?= "*"

string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_system_information

variable NAGIOS_CONFINFO_USERS ?= "*"

string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_configuration_information

variable NAGIOS_SYSCOMMAND_USERS ?= NAGIOS_DEFAULT_ADMIN_NAME

string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_system_commands

variable NAGIOS_SERVVIEW_USERS ?= "*"

string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_services

variable NAGIOS_HOSTVIEW_USERS ?= "*"

string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_hosts

variable NAGIOS_SERVCOMMANDS_USERS ?= NAGIOS_DEFAULT_ADMIN_NAME

string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_service_commands

variable NAGIOS_HOSTCOMMANDS_USERS ?= NAGIOS_DEFAULT_ADMIN_NAME

string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_host_commands

variable NAGIOS_IGNORED_NODES ?= list()

list. Defines a list of nodes that will be IGNORED when defining nagios hosts. This is useful if you have nodes in the NODES_PROPERTIES that you do *NOT* want to monitor

variable NAGIOS_NCG_CONFIG ?= false

boolean. Whether to define NCG services (experimental)
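
As a sketch of how the CGI authorization variables can be combined, the fragment below restricts commands to two named users and leaves the read-only views open. It assumes the strings are copied unchanged into the corresponding cgi.cfg directives, where Nagios accepts "*" or a comma-separated list of user names. The user names are placeholders. Because the defaults are set with '?=', these assignments have to come before monitoring/nagios3/server/config is included.

```pan
# Placeholder user names; they must match logins known to the web server.
variable NAGIOS_SYSCOMMAND_USERS = "nagiosadmin,operator1";
variable NAGIOS_SERVCOMMANDS_USERS = "nagiosadmin,operator1";
variable NAGIOS_HOSTCOMMANDS_USERS = "nagiosadmin,operator1";

# Any authenticated user may view all hosts and services.
variable NAGIOS_SERVVIEW_USERS = "*";
variable NAGIOS_HOSTVIEW_USERS = "*";
```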

Installation Example

With Quattor

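  • Create the server profile (see the example profile https://trac.lal.in2p3.fr/LCGQWG/browser/templates/trunk/clusters/example-3.1/profiles/nagios3-server.example.org.tpl):
svn add cfg/clusters/your-3.1/profiles/profile_node58.tpl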
  • Modify your list of machines:
vi ./cfg/sites/your/site/config/your_nodes_properties.tpl
  • Create your hardware template:
svn cp ./cfg/sites/your/hardware/virtual_machine_3.tpl ./cfg/sites/your/hardware/virtual_machine_13.tpl
  • Configure your clients by adding:
variable NAGIOS_CLIENT_ENABLED = true;
include { if(NAGIOS_CLIENT_ENABLED) 'monitoring/nagios3/client/config'};

in the template site/pro_site_common_config


  • Commit your changes:
svn ci -m 'adding nagios server'


On the Nagios server

  • As root:
vi /var/log/spma.log
vi /var/log/ncm-cdispd.log
/etc/init.d/nagios status
/etc/init.d/nagios start
Add your server certificate to /etc/grid-security.


  • As the nagios user:
Add your personal certificate and create a local proxy (you can also do this from another UI):
voms-proxy-init --voms vo.grif.fr
myproxy-init -c 336 -k xxxxx -s myproxy.grif.fr -l nagios -x  -Z "/O=GRID-FR/C=FR/O=xxx/OU=xxx/CN=xxxx.xxxx.fr"
  • As root, check the proxy retrieval mechanism (see the cron file /etc/cron.d/nagios-proxy-refresh, installed by the RPM nagios-proxy-refresh-1.7-3.noarch):
/usr/sbin/nagios-proxy-refresh
MyProxy credential retrieved. VOMS credential retrieved.
# voms-proxy-info --all -file /etc/nagios/globus/userproxy.pem
subject   : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
issuer    : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy/CN=proxy/CN=proxy/CN=proxy
identity  : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy/CN=proxy/CN=proxy/CN=proxy
type      : proxy
strength  : 1024 bits
path      : /etc/nagios/globus/userproxy.pem
timeleft  : 11:27:56
=== VO vo.grif.fr extension information ===
VO        : vo.grif.fr
subject   : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy
issuer    : /O=GRID-FR/C=FR/O=CNRS/OU=LAL/CN=grid12.lal.in2p3.fr
attribute : /vo.grif.fr/Role=NULL/Capability=NULL
timeleft  : 11:27:56
uri       : grid12.lal.in2p3.fr:20001


Interface

Check your interface: http://nagioserver.xx.fr:/nagios

Do you get the SAM test results back? You can check by hand on your server, as root:

/usr/libexec/grid-monitoring/plugins/nagios/gather_sam -t 3000 --site GRIF \
 --vos ops --sam-root-url  http://lcg-sam.cern.ch:8080/same-pi/ --sam-all

Adding a service

You can add your own services in the Nagios server profile.

You need to include the template that provides the Nagios functions, standard/monitoring/nagios3/server/functions.tpl.

You need to define the service before including monitoring/nagios3/server/config.
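
The ordering described above can be sketched as follows. The service is only a placeholder; the command name check_ssh, the host name and the single leading blank in each value (kept in case the templates rely on the padding of the original example) are assumptions.

```pan
# 1. Load the helper functions.
include { 'monitoring/nagios3/server/functions' };

# 2. Define the service (placeholder values).
variable TMP_SERVICE = nlist(
    "use", " generic-service",
    "host_name", " node01.example.org",
    "service_description", " SSH",
    "contact_groups", " admins",
    "check_command", " check_ssh",
    "max_check_attempts", " 3"
);
variable NAGIOS_SERVICES = nagios_add_service(TMP_SERVICE);

# 3. Only now include the server configuration, which uses NAGIOS_SERVICES.
include { 'monitoring/nagios3/server/config' };
```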


EXERCISE

  • 1) Define what is needed to test an HTTP server remotely
  • 2) Define a hostgroup (for example "WEB" in NAGIOS_MONITORED_HOSTGROUPS) and set this node type to WEB in your nodes_properties
  • 3) Add to your configuration the service that checks HTTP on those hosts

Tune the "time period" if needed.

  • 4) Commit your changes and check your interface

http://nagiosserver.xxx.fr/nagios
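
One possible outline for steps 2 and 3 is sketched below; it is not a reference solution. It assumes that a check_http command and a 24x7 time period are defined on the server, that the nlist keys are written out as Nagios service directives (so that hostgroup_name is accepted), and that values need the single leading blank kept from the padding of the original example. The group list is a placeholder.

```pan
include { 'monitoring/nagios3/server/functions' };

# Step 2: declare the WEB hostgroup next to the groups already monitored.
variable NAGIOS_MONITORED_HOSTGROUPS = list("WN", "CE", "UI", "WEB");

# Step 3: HTTP check applied to every host of the WEB hostgroup.
variable TMP_SERVICE = nlist(
    "use", " generic-service",
    "hostgroup_name", " WEB",
    "service_description", " HTTP",
    "contact_groups", " admins",
    "check_command", " check_http",
    "check_period", " 24x7",
    "max_check_attempts", " 3"
);
variable NAGIOS_SERVICES = nagios_add_service(TMP_SERVICE);

include { 'monitoring/nagios3/server/config' };
```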