Difference between revisions of "LCG-FR / SA1-FR Monitoring NagiosWithQuattor"

An article from lcgwiki.
 
variable NAGIOS_RPM_VERSION ?= "3.0.5-1" || string. Nagios server RPM version.

variable NAGIOS_NODES_PROPERTIES ?= nlist() || nlist. The site nodes properties. For instance: nlist(escape("mynode.mydomain"), nlist("type","NFS", "monitoring","yes", "os","undef", "ip","192.168.0.1", "hardware","undef"))

variable NAGIOS_HTPASSWD_LOGIN ?= "nagios" || string. Defines the user name allowed to access the Nagios interface.

variable NAGIOS_HTPASSWD_PASS ?= "i88QUu1o8Jwq." || string. Defines the password associated with the user name. This is a hash generated using the "openssl passwd" command. The default password is "nagios".

variable NAGIOS_MONITORED_HOSTGROUPS ?= NAGIOS_KNOWN_HOSTGROUPS || list. Defines the list of hostgroups you want to define in Nagios. If it differs from NAGIOS_KNOWN_HOSTGROUPS, each node whose "type" is not listed is added to the hostgroup NAGIOS_DEFAULT_NODE_GROUP. Please note that the KNOWN hostgroups will still be defined, for compatibility reasons (for now).

variable NAGIOS_DEFAULT_NODE_GROUP ?= "Others" || string. Nodes whose type is not known are put in this hostgroup.

variable NAGIOS_SERVER ?= FULL_HOSTNAME || string. Defines the NAGIOS_SERVER variable, which tells the clients which Nagios host is going to poll the NRPE daemon. This *REALLY* should be set for all nodes, not only the server.

variable NAGIOS_DEFAULT_ADMIN_NAME ?= "nagiosadmin" || string. Default user name to use if none is specified.

variable NAGIOS_ADMIN_CONTACTS ?= error("you need to define NAGIOS_ADMIN_CONTACTS with a nlist, containing names, associated with emails") || nlist. Contains the names of the people to be notified, with their emails. Example: nlist("toto", "me@localhost")

variable NAGIOS_NOTIFICATIONS_ENABLED ?= true || boolean. Nagios-wide variable to disable/enable notifications.

variable NAGIOS_SYSINFO_USERS ?= "*" || string. Part of CGI user authentication. Corresponds to the Nagios configuration variable authorized_for_system_information.

variable NAGIOS_CONFINFO_USERS ?= "*" || string. Part of CGI user authentication. Corresponds to the Nagios configuration variable authorized_for_configuration_information.

variable NAGIOS_SYSCOMMAND_USERS ?= NAGIOS_DEFAULT_ADMIN_NAME || string. Part of CGI user authentication. Corresponds to the Nagios configuration variable authorized_for_system_commands.

variable NAGIOS_SERVVIEW_USERS ?= "*" || string. Part of CGI user authentication. Corresponds to the Nagios configuration variable authorized_for_all_services.

variable NAGIOS_HOSTVIEW_USERS ?= "*" || string. Part of CGI user authentication. Corresponds to the Nagios configuration variable authorized_for_all_hosts.

variable NAGIOS_SERVCOMMANDS_USERS ?= NAGIOS_DEFAULT_ADMIN_NAME || string. Part of CGI user authentication. Corresponds to the Nagios configuration variable authorized_for_all_service_commands.

variable NAGIOS_HOSTCOMMANDS_USERS ?= NAGIOS_DEFAULT_ADMIN_NAME || string. Part of CGI user authentication. Corresponds to the Nagios configuration variable authorized_for_all_host_commands.

variable NAGIOS_IGNORED_NODES ?= list() || list. Defines a list of nodes that will be IGNORED when defining Nagios hosts. This is useful if you have nodes in the NODES_PROPERTIES that you do *NOT* want to monitor.

variable NAGIOS_NCG_CONFIG ?= false || boolean. Do you want to define NCG services? (experimental)
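As a rough illustration of how the variables above could be set, here is a minimal Pan fragment for a site template. It is only a sketch: the contact names, e-mail addresses and the host name nagios.example.org are placeholders, and it assumes the fragment is included before the standard Nagios templates so that their `?=` defaults do not take effect first.

```pan
# Contacts notified by Nagios: name -> e-mail address (placeholder values).
variable NAGIOS_ADMIN_CONTACTS ?= nlist(
    "alice", "alice@example.org",
    "bob", "bob@example.org"
);

# Nagios host that polls the NRPE daemons; should be set for all nodes,
# not only for the server (placeholder host name).
variable NAGIOS_SERVER ?= "nagios.example.org";

# Restrict system commands in the CGI interface to a single account.
variable NAGIOS_SYSCOMMAND_USERS ?= "nagiosadmin";

# Keep notifications enabled (this is also the documented default).
variable NAGIOS_NOTIFICATIONS_ENABLED ?= true;
```

NAGIOS_ADMIN_CONTACTS has no usable default (its default raises an error), so it is the one variable that has to be defined in every case.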
 
 
 
 
 
 
| NAGIOS_ADMIN_CONTACTS | admin emails for alarms |
| NAGIOS_CONFINFO_USERS | |
| NAGIOS_DEFAULT_ADMIN_NAME | |
| NAGIOS_DEFAULT_NODE_GROUP | |
| NAGIOS_HOSTCOMMANDS_USERS | |
| NAGIOS_HOSTVIEW_USERS | |
| NAGIOS_HTPASSWD_CONFIG | |
| NAGIOS_HTPASSWD_LOGIN | |
| NAGIOS_HTPASSWD_PASS | |
| NAGIOS_IGNORED_NODES | |
| NAGIOS_KNOWN_HOSTGROUPS | |
| NAGIOS_MONITORED_HOSTGROUPS | |
| NAGIOS_NCG_CONFIG | |
| NAGIOS_NODES_PROPERTIES | |
| NAGIOS_NOTIFICATIONS_ENABLED | |
| NAGIOS_RPM_VERSION | |
| NAGIOS_SERVCOMMANDS_USERS | |
| NAGIOS_SERVER | |
| NAGIOS_SERVICEEXTINFOS | |
| NAGIOS_SERVICES | |
| NAGIOS_SERVVIEW_USERS | |
| NAGIOS_SUPPORTED_OS_LIST | |
| NAGIOS_SYSCOMMAND_USERS | |
| NAGIOS_SYSINFO_USERS | |
| NAGIOS_USER_DEFINED_HOST_DEPENDENCIES | |
 
  
 

Revision as of 23:07, 20 January 2009

Installing Nagios with Quattor

Nagios configuration requires both a set of client templates for commands to be run on clients by the Nagios Remote Plug-in Executor (NRPE) and a set of server templates configuring contacts for alarms, hosts to be monitored, services (AKA sensors) and so on.


Configuring the Nagios server

The configuration of a Nagios server is done in a set of standard templates, in the 'monitoring/nagios3' namespace.

Repository Used

  • RPMs for nagios and nagios-plugins (plus dependencies) are compiled for each supported OS and placed in the « updates » repository on quattorsrv.lal.in2p3.fr.

Example: http://quattor.web.lal.in2p3.fr/packages/os/sl440-i386/updates/

  • The « nagios-grid-plugins » plugins are packaged as noarch RPMs in the « nagios » repository.

Example: http://quattor.web.lal.in2p3.fr/packages/nagios/

See the template nagios3/plugins/config.tpl.

Server Template

An example Nagios server template is available here:

https://trac.lal.in2p3.fr/LCGQWG/browser/templates/trunk/clusters/example-3.1/profiles/nagios3-server.example.org.tpl

This machine should be a grid User Interface (UI) so that it can monitor grid services.

Who is monitored

Hosts that belong to the site (variable SITES) and are listed in config/'sitename'_nodes_properties.tpl are monitored.

Example templates for host declarations are in LCGQWG: https://trac.lal.in2p3.fr/LCGQWG/browser/templates/trunk/sites/example/site/config

You can tune this with the variables NAGIOS_IGNORED_NODES and NAGIOS_MONITORED_HOSTGROUPS (see the example profile above).
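The two tuning variables could be used as in the following Pan sketch. The node names and the hostgroup names "CE" and "SE" are hypothetical; only "NFS" appears as a node type elsewhere on this page.

```pan
# Nodes present in the nodes properties template that must NOT be monitored
# (hypothetical host names).
variable NAGIOS_IGNORED_NODES ?= list(
    "test01.example.org",
    "spare02.example.org"
);

# Hostgroups to define in Nagios. Nodes whose "type" is not in this list
# are added to the hostgroup named by NAGIOS_DEFAULT_NODE_GROUP.
variable NAGIOS_MONITORED_HOSTGROUPS ?= list("CE", "SE", "NFS");
variable NAGIOS_DEFAULT_NODE_GROUP ?= "Others";
```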

What is monitored

Services

Services are added in the template « server/cfgfiles/services.tpl ». A service can be defined like this:

variable TMP_SERVICE = nlist(
    "use", "                             generic-service",
    "host_name", "                       node07.org.fr",
    "service_description", "             Workers ssh_known_hosts",
    "contact_groups", "                  admins",
    "check_command", "                   check_nrpe_long!check_ssh_known_hosts!60",
    "normal_check_interval", "           60 ; check every hour",
    "max_check_attempts", "              1"
);


If the second parameter of the function nagios_add_service is « true », a dependency on the NRPE daemon is added for the defined service on all the nodes matching « *,!NOQUATTOR ». This mechanism still needs to be improved.

It is also possible to make a service on a host depend on a specific service of a given host:

variable NAGIOS_USER_DEFINED_HOST_DEPENDENCIES =	nagios_add_host_service_dependency(
	"node07.datagrid.cea.fr","nrpe daemon", "node07.datagrid.cea.fr","Workers ssh_known_hosts" 
);

It is not possible to add dependencies between hostgroups (for the moment).

Commands and NRPE

The Nagios command configuration files do not need a complex Quattor structure template, so they are created with filecopy:

  • adding commands is done in:

monitoring/nagios3/server/cfgfiles/commands

  • adding NRPE commands is done in:

monitoring/nagios3/client/cfgfiles/nrpe_commands
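To make the filecopy approach concrete, the following Pan fragment sketches how a single NRPE command file could be generated with the filecopy component. This is not the content of the actual templates: the variable name NRPE_SSH_KNOWN_HOSTS_CONTENTS, the plugin path and the target file path are all assumptions chosen for illustration; only the NRPE command name check_ssh_known_hosts comes from the service example on this page.

```pan
include { 'components/filecopy/config' };

# One NRPE command definition, in nrpe.cfg syntax.
# The plugin path is an assumption; adjust it to where the plugin is installed.
variable NRPE_SSH_KNOWN_HOSTS_CONTENTS =
    "command[check_ssh_known_hosts]=/usr/lib/nagios/plugins/check_ssh_known_hosts\n";

# Ask filecopy to write the file on the client.
# The target path is hypothetical and must be covered by the NRPE configuration.
"/software/components/filecopy/services" = npush(
    escape("/etc/nagios/nrpe.d/check_ssh_known_hosts.cfg"),
    nlist(
        "config", NRPE_SSH_KNOWN_HOSTS_CONTENTS,
        "owner", "root",
        "perms", "0644"
    )
);
```

Because the file content is just a string, no dedicated schema is needed for each command, which is presumably why filecopy was preferred here.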

It is possible to add a dependency on the NRPE daemon for services which are not defined on all the hosts (template services.tpl).


NEED SOMETHING


Proxy management

The local grid probes need a valid certificate. Two mechanisms are available: renewal and retrieval. They are selected in cfg/standard/monitoring/nagios3/server/config.tpl:

include { if(NAGIOS_NCG_CONFIG && NAGIOS_MODE_PROXY_RENEW) 'monitoring/nagios3/server/vobox'};
include { if(NAGIOS_NCG_CONFIG && NAGIOS_MODE_PROXY_RETRIEVE) 'monitoring/nagios3/server/proxy_retrieval'};

The associated variables:

NAGIOS_MODE_PROXY_RENEW
NAGIOS_RENEW_PROXY 
NAGIOS_OUTPUT_PROXY
NAGIOS_MODE_PROXY_RETRIEVE
NAGIOS_MYPROXY_NAME 
MYPROXY_SERVER
NAGIOS_VONAME_PROXY
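A possible way to select the retrieval mechanism is sketched below. NAGIOS_NCG_CONFIG and the two NAGIOS_MODE_PROXY_* variables are used as booleans in the include statements above; the types and values of the remaining variables are not documented on this page, so the strings shown (a MyProxy host name and a VO name) are assumptions.

```pan
# Enable the NCG services and choose proxy retrieval rather than renewal.
variable NAGIOS_NCG_CONFIG ?= true;
variable NAGIOS_MODE_PROXY_RETRIEVE ?= true;
variable NAGIOS_MODE_PROXY_RENEW ?= false;

# Assumed to be plain strings; both values are placeholders.
variable MYPROXY_SERVER ?= "myproxy.example.org";
variable NAGIOS_VONAME_PROXY ?= "dteam";
```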

Client configuration

Variables

variable NAGIOS_RPM_VERSION ?= "3.0.5-1"

string. Nagios Server RPM version

variable NAGIOS_NODES_PROPERTIES

nlist. The site nodes properties. For instance: nlist(escape("mynode.mydomain"), nlist("type","NFS", "monitoring","yes", "os","undef", "ip","192.168.0.1", "hardware","undef"))
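With more than one node the structure is easier to read when spread over several lines. The sketch below follows the one-line example above; the host names, the second IP address and the "CE" type are placeholders.

```pan
variable NAGIOS_NODES_PROPERTIES ?= nlist(
    # Host names are escaped, as in the one-line example.
    escape("nfs01.example.org"), nlist(
        "type", "NFS",
        "monitoring", "yes",
        "os", "undef",
        "ip", "192.168.0.1",
        "hardware", "undef"
    ),
    escape("ce01.example.org"), nlist(
        "type", "CE",
        "monitoring", "yes",
        "os", "undef",
        "ip", "192.168.0.2",
        "hardware", "undef"
    )
);
```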

Installation Example

With Quattor

Create the server profile (see the example profile above):

svn add cfg/clusters/your-3.1/profiles/profile_node58.tpl

Modify your list of machines:

vi ./cfg/sites/your/site/config/your_nodes_properties.tpl

Create your hardware template:

svn cp ./cfg/sites/your/hardware/virtual_machine_3.tpl ./cfg/sites/your/hardware/virtual_machine_13.tpl


Commit your changes:

svn ci -m 'adding nagios server'


On the Nagios server, check the logs and the Nagios service:

vi /var/log/spma.log
vi /var/log/ncm-cdispd.log
/etc/init.d/nagios status
/etc/init.d/nagios start


Check the server certificate
NEED SOMETHING from node58