LCG-FR / SA1-FR Monitoring NagiosWithQuattor

An article from lcgwiki.
LEROY (talk | contribs)
Fschaer (talk | contribs)
 
(6 intermediate revisions by 2 users not shown)
Latest revision as of 17:31, 13 March 2009

Installing Nagios with quattor

Nagios configuration requires both a set of client templates for commands to be run on clients by the Nagios Remote Plug-in Executor (NRPE) and a set of server templates configuring contacts for alarms, hosts to be monitored, services (AKA sensors) and so on.

You can contact frederic.schaer__arobase char__cea.fr in case of problem.

Configuring the Nagios server

The configuration of a Nagios server is done in a set of standard templates, in the 'monitoring/nagios3' namespace.

Repository Used

  • RPMs for nagios and nagios-plugins (plus dependencies) are compiled for each supported OS and placed in the « updates » repository on quattorsrv.lal.in2p3.fr.

Ex. : http://quattor.web.lal.in2p3.fr/packages/os/sl440-i386/updates/

  • The « nagios-grid-plugins » plugins are packaged as a noarch RPM in the « nagios » repository.

Ex. : http://quattor.web.lal.in2p3.fr/packages/nagios/

See the template nagios3/plugins/config.tpl

Server Template

An example Nagios server template is here :

https://trac.lal.in2p3.fr/LCGQWG/browser/templates/trunk/clusters/example-3.1/profiles/nagios3-server.example.org.tpl

This machine should be configured as a grid UI so that it can monitor grid services.

Who is monitored, Client configuration

A host is monitored only if both of the following conditions hold:

  • The host belongs to the site (variable SITES) and is listed in config/'sitename'_nodes_properties.tpl.

Example templates for host declarations are in LCGQWG: https://trac.lal.in2p3.fr/LCGQWG/browser/templates/trunk/sites/example/site/config

  • The host has NAGIOS_CLIENT_ENABLED set to "true":
variable NAGIOS_CLIENT_ENABLED ?= true;
include { if(NAGIOS_CLIENT_ENABLED) 'monitoring/nagios3/client/config'};

If you want to set this variable to "true" for all your hosts you can put it in the following template:

template site/pro_site_common_config

Note: consider disabling client installation (by setting NAGIOS_CLIENT_ENABLED=false in the profiles) on nodes with unsupported OSes; otherwise you will encounter Quattor compilation errors.
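
As a minimal sketch of the note above, the profile of a node with an unsupported OS could set the variable before the site-wide template is included. The split between the two files and their ordering are assumptions about how your profiles are organised; only the variable name and the conditional include come from this page.

```pan
# In the profile of a node whose OS has no Nagios client packages
# (assumption: this line is evaluated before site/pro_site_common_config):
variable NAGIOS_CLIENT_ENABLED = false;

# In site/pro_site_common_config: "?=" only assigns when the variable
# is still undefined, so the value set in the profile is kept.
variable NAGIOS_CLIENT_ENABLED ?= true;
include { if (NAGIOS_CLIENT_ENABLED) 'monitoring/nagios3/client/config' };
```

With this ordering the client templates are simply not included for that node, while every other node keeps the default of "true".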

You can tune this with

NAGIOS_IGNORED_NODES, NAGIOS_MONITORED_HOSTGROUPS

See the example profile: https://trac.lal.in2p3.fr/LCGQWG/browser/templates/trunk/clusters/example-3.1/profiles/nagios3-server.example.org.tpl

Authorization to contact NRPE

You need to declare your Nagios server in the variable NAGIOS_SERVER so that it is able to contact the NRPE daemon on the clients.
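
For illustration, the declaration could look like the fragment below. The host name is hypothetical, and placing the line in a template shared by all nodes (such as site/pro_site_common_config) is an assumption; the point is that the clients, not only the server, must see the same value.

```pan
# Hypothetical FQDN: replace with the name of your own Nagios server.
# Define it before the Nagios templates are included, otherwise the
# default (the local host name) is used on each node.
variable NAGIOS_SERVER = "nagios.example.org";
```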

What is monitored

Services

Services are added in the template « server/cfgfiles/services.tpl ». A service can be added like this:

variable TMP_SERVICE=nlist(
     "use","                             generic-service",
     "host_name","                       node07.org.fr",
     "service_description","             Workers ssh_known_hosts",
     "contact_groups","                  admins",
     "check_command","                   check_nrpe_long!check_ssh_known_hosts!60",
     "normal_check_interval","           60 ; check every hour",
     "max_check_attempts","              1",
);
variable NAGIOS_SERVICES=nagios_add_service(TMP_SERVICE);


If the second parameter of the function nagios_add_service is « true », a dependency on the NRPE daemon is added for the defined service on all the nodes matching « "*,!NOQUATTOR" ». This behaviour still needs to be improved.

It is possible to add a dependency between a service on a given host and a specific service on another host:

variable NAGIOS_USER_DEFINED_HOST_DEPENDENCIES =	nagios_add_host_service_dependency(
	"node07.datagrid.cea.fr","nrpe daemon", "node07.datagrid.cea.fr","Workers ssh_known_hosts" 
);

It is not possible to add dependencies between hostgroups (for the moment?).

Commands and NRPE

Some Nagios configuration files do not need a complex Quattor structure template and are therefore created with filecopy:

  • adding commands is done in:

monitoring/nagios3/server/cfgfiles/commands

  • adding NRPE commands is done in:

monitoring/nagios3/client/cfgfiles/nrpe_commands

It is possible to add a dependency on the NRPE daemon for services which are not defined on all the hosts (template services.tpl):


variable NRPE_HOSTGROUPS_SPECIFIC_DEPENDENT_SERVICES=nlist(
        escape("CE,CE-MPI,!NOQUATTOR")  ,"black hole workers",
        escape("CE,CE-MPI,LFC,SE_DPM,SE_DISK,MON,VOBOX,WMS,!NOQUATTOR"),"host certificate",
        escape("MON")                   ,"apel publisher",
        escape("CE,CE-MPI")                             ,"apel parser",
        escape("WN,CE-MPI,UI,VOBOX")    ,"home partition freespace",
        escape("WN")    ,"pbs_mom transfers",
);

Proxy management

The local grid probes need a valid certificate. Two mechanisms are available: renewal and retrieval. They are selected in cfg/standard/monitoring/nagios3/server/config.tpl:

include { if(NAGIOS_NCG_CONFIG && NAGIOS_MODE_PROXY_RENEW) 'monitoring/nagios3/server/vobox'};
include { if(NAGIOS_NCG_CONFIG && NAGIOS_MODE_PROXY_RETRIEVE) 'monitoring/nagios3/server/proxy_retrieval'};

Associated variables:

NAGIOS_MODE_PROXY_RENEW

boolean. true if you want to use renewal mechanism

NAGIOS_MODE_PROXY_RETRIEVE

boolean. true if you want to use retrieval mechanism

NAGIOS_RENEW_PROXY

string. File where the proxy is renewed by the VOBOX renewal mechanism

NAGIOS_OUTPUT_PROXY

string. File where the proxy should be written by the proxy retrieval cron job

NAGIOS_MYPROXY_NAME

string. name of your proxy for later retrieval

MYPROXY_SERVER

string. name of your myproxy server host

NAGIOS_VONAME_PROXY

string. VO used for VOMS authentication
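
A minimal sketch of how these variables might be combined to select the retrieval mechanism. All values are illustrative assumptions to adapt to your site, except the proxy path, which is the one shown in the voms-proxy-info output further down this page.

```pan
# Both conditions of the include shown above must be true
# for the retrieval template to be pulled in.
variable NAGIOS_NCG_CONFIG = true;
variable NAGIOS_MODE_PROXY_RETRIEVE = true;
variable NAGIOS_MODE_PROXY_RENEW = false;

# Hypothetical values: replace with your own MyProxy server,
# credential name and VO.
variable MYPROXY_SERVER = "myproxy.example.org";
variable NAGIOS_MYPROXY_NAME = "nagios-proxy";
variable NAGIOS_VONAME_PROXY = "vo.example.org";

# File in which the retrieved proxy is stored.
variable NAGIOS_OUTPUT_PROXY = "/etc/nagios/globus/userproxy.pem";
```

These definitions have to be evaluated before monitoring/nagios3/server/config is included, since the conditional includes are resolved there.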

Variables

NAGIOS_RPM_VERSION ?= "3.0.5-1"

string. Nagios Server RPM version

variable NAGIOS_NODES_PROPERTIES ?= nlist()

nlist. The site nodes properties. For instance : nlist( escape("mynode.mydomain"), nlist("type","NFS", "monitoring","yes", "os","undef", "ip","192.68.0.1" , "hardware","undef"))

variable NAGIOS_HTPASSWD_LOGIN ?= "nagios"

string. Defines the user name allowed to access the nagios interface.

variable NAGIOS_HTPASSWD_PASS  ?= "i88QUu1o8Jwq."

string. Defines the password associated with the user name. This is a hash generated using the "openssl passwd" command; the 13-character default value shown here is in the traditional crypt format rather than MD5. Default password is "nagios".

variable NAGIOS_MONITORED_HOSTGROUPS ?= NAGIOS_KNOWN_HOSTGROUPS

list. Defines the list of hostgroups you want to define in Nagios. If different from NAGIOS_KNOWN_HOSTGROUPS, each node that is of a "type" not listed will be added to the hostgroup NAGIOS_DEFAULT_NODE_GROUP. Please note that the KNOWN hostgroups will still be defined, for compatibility reasons (for now).

variable NAGIOS_DEFAULT_NODE_GROUP?="Others"

string. Nodes whose type is not known will be put in this hostgroup

variable NAGIOS_SERVER ?= FULL_HOSTNAME

string. Defines the NAGIOS_SERVER variable, which tells the clients which nagios host is going to poll the NRPE daemon. This *REALLY* should be set for all nodes, not only the server

variable NAGIOS_DEFAULT_ADMIN_NAME ?="nagiosadmin"

string. Default user name to use if none specified.

variable NAGIOS_ADMIN_CONTACTS ?= error("you need to define NAGIOS_ADMIN_CONTACTS with a nlist, containing names, associated with emails")

nlist. Contains the names of the people to be notified, with their emails. Example: nlist("toto", "me@localhost")

variable NAGIOS_NOTIFICATIONS_ENABLED ?= true

boolean. Nagios wide variable to disable/enable notifications

variable NAGIOS_SYSINFO_USERS ?= "*"

string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_system_information

variable NAGIOS_CONFINFO_USERS ?= "*"

string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_configuration_information

variable NAGIOS_SYSCOMMAND_USERS ?= NAGIOS_DEFAULT_ADMIN_NAME

string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_system_commands

variable NAGIOS_SERVVIEW_USERS ?= "*"

string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_services

variable NAGIOS_HOSTVIEW_USERS ?= "*"

string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_hosts

variable NAGIOS_SERVCOMMANDS_USERS ?= NAGIOS_DEFAULT_ADMIN_NAME

string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_service_commands

variable NAGIOS_HOSTCOMMANDS_USERS ?= NAGIOS_DEFAULT_ADMIN_NAME

string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_host_commands

variable NAGIOS_IGNORED_NODES ?= list()

list. Defines a list of nodes that will be IGNORED when defining nagios hosts. This is useful if you have nodes in the NODES_PROPERTIES that you do *NOT* want to monitor

variable NAGIOS_NCG_CONFIG ?= false

boolean. Do you want to define NCG services ? (experimental)
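
To tie a few of these variables together, here is a hedged sketch of what a site might set in the Nagios server profile. The node name and the choice of hostgroups are hypothetical; the variable names and types are the ones documented above.

```pan
# Mandatory: the default value raises an error if this is left undefined.
variable NAGIOS_ADMIN_CONTACTS = nlist("toto", "me@localhost");

# Only these node types get their own hostgroup; nodes of any other
# type end up in NAGIOS_DEFAULT_NODE_GROUP ("Others" by default).
variable NAGIOS_MONITORED_HOSTGROUPS = list("CE", "WN");

# Hypothetical node that is listed in the nodes properties
# but must not be monitored.
variable NAGIOS_IGNORED_NODES = list("old01.example.org");

# Keep notifications off while the setup is being tested.
variable NAGIOS_NOTIFICATIONS_ENABLED = false;
```

Because the templates use "?=" for their defaults, these assignments need to appear before monitoring/nagios3/server/config is included.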

Installation Example

With Quattor

  • Add the profile of your Nagios server:
svn add cfg/clusters/your-3.1/profiles/profile_node58.tpl
  • Modify your list of machines:
vi ./cfg/sites/your/site/config/your_nodes_properties.tpl
  • Create your hardware template:
svn cp ./cfg/sites/your/hardware/virtual_machine_3.tpl ./cfg/sites/your/hardware/virtual_machine_13.tpl
  • Configure your clients by adding:
variable NAGIOS_CLIENT_ENABLED = true;
include { if(NAGIOS_CLIENT_ENABLED) 'monitoring/nagios3/client/config'};

to the template site/pro_site_common_config


  • Commit your changes:
svn ci -m 'adding serveur nagios'


On the Nagios server

  • As root, check the logs and the Nagios service:
vi /var/log/spma.log
vi /var/log/ncm-cdispd.log
/etc/init.d/nagios status
/etc/init.d/nagios start
Then add your server certificate to /etc/grid-security.


  • As nagios, add your personal certificate and create a local proxy (you can also do this from another UI):
voms-proxy-init --voms vo.grif.fr
myproxy-init -c 336 -k xxxxx -s myproxy.grif.fr -l nagios -x  -Z "/O=GRID-FR/C=FR/O=xxx/OU=xxx/CN=xxxx.xxxx.fr"
  • As root, check the proxy retrieval mechanism (see the cron file /etc/cron.d/nagios-proxy-refresh, installed by the RPM nagios-proxy-refresh-1.7-3.noarch):
/usr/sbin/nagios-proxy-refresh
MyProxy credential retrieved. VOMS credential retrieved.
# voms-proxy-info --all -file /etc/nagios/globus/userproxy.pem
subject   : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
issuer    : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy/CN=proxy/CN=proxy/CN=proxy
identity  : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy/CN=proxy/CN=proxy/CN=proxy
type      : proxy
strength  : 1024 bits
path      : /etc/nagios/globus/userproxy.pem
timeleft  : 11:27:56
=== VO vo.grif.fr extension information ===
VO        : vo.grif.fr
subject   : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy
issuer    : /O=GRID-FR/C=FR/O=CNRS/OU=LAL/CN=grid12.lal.in2p3.fr
attribute : /vo.grif.fr/Role=NULL/Capability=NULL
timeleft  : 11:27:56
uri       : grid12.lal.in2p3.fr:20001


Interface

Check your interface: http://nagioserver.xx.fr/nagios

Do you get the SAM test results back? You can check by hand on your server, as root:

/usr/libexec/grid-monitoring/plugins/nagios/gather_sam -t 3000 --site GRIF \
 --vos ops --sam-root-url  http://lcg-sam.cern.ch:8080/same-pi/ --sam-all

Adding a service

You can add your own services in the Nagios server profile.

You need to source the template with the nagios functions standard/monitoring/nagios3/server/functions.tpl

You need to define the service before including monitoring/nagios3/server/config


EXERCISE

  • 1) Define what is needed to test an HTTP server remotely.
  • 2) Define a hostgroup (for example "WEB" in NAGIOS_MONITORED_HOSTGROUPS) and set this node type to WEB in your nodes_properties.
  • 3) Add to your configuration the service that checks HTTP on those hosts, tuning the "time period" if needed.
  • 4) Commit your changes and check your interface:

http://nagiosserver.xxx.fr/nagios
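
One possible way to approach the exercise, given as a sketch rather than a reference solution. It reuses the nagios_add_service pattern shown earlier on this page. The assumptions are: the include path of the functions template, that the function accepts a "hostgroup_name" key in place of "host_name", and that a "check_http" command and a "24x7" time period are defined in your Nagios configuration (both names are the conventional Nagios ones, not something this page confirms).

```pan
# Load the Nagios helper functions (assumed include path for
# standard/monitoring/nagios3/server/functions.tpl).
include { 'monitoring/nagios3/server/functions' };

# Step 2: make "WEB" a monitored hostgroup, next to the groups you
# already use (the other names here are only examples).
variable NAGIOS_MONITORED_HOSTGROUPS = list("CE", "WN", "WEB");

# Step 3: one HTTP check for every host of the WEB hostgroup.
variable TMP_SERVICE = nlist(
    "use", "generic-service",
    "hostgroup_name", "WEB",
    "service_description", "HTTP",
    "contact_groups", "admins",
    "check_command", "check_http",
    "check_period", "24x7",
    "normal_check_interval", "5",
    "max_check_attempts", "3"
);
variable NAGIOS_SERVICES = nagios_add_service(TMP_SERVICE);

# The service must be defined before the server configuration is included.
include { 'monitoring/nagios3/server/config' };
```

A check run directly from the server needs no NRPE command on the client, which is why no dependency on the NRPE daemon is requested here.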