LCG-FR / SA1-FR Monitoring NagiosWithQuattor

Revision as of 23:53, 20 January 2009

Installing Nagios with Quattor

Nagios configuration requires both a set of client templates for commands to be run on clients by the Nagios Remote Plug-in Executor (NRPE) and a set of server templates configuring contacts for alarms, hosts to be monitored, services (AKA sensors) and so on.


Configuring the Nagios server

The configuration of a Nagios server is done in a set of standard templates, in the 'monitoring/nagios3' namespace.

Repository Used

  • Sensors are provided for some of the grid plugins from the SA1 repository: http://www.sysadmin.hep.ac.uk/rpms/grid-services/RPMS.monitoring/

  • RPMs for nagios and nagios-plugins (and their dependencies) are compiled for each supported OS and placed in the « updates » repository on quattorsrv.lal.in2p3.fr.

Ex.: http://quattor.web.lal.in2p3.fr/packages/os/sl440-i386/updates/

  • The « nagios-grid-plugins » plugins are packaged as noarch RPMs in the « nagios » repository.

Ex.: http://quattor.web.lal.in2p3.fr/packages/nagios/

See the template nagios3/plugins/config.tpl.

Server Template

An example Nagios server template is available here:

https://trac.lal.in2p3.fr/LCGQWG/browser/templates/trunk/clusters/example-3.1/profiles/nagios3-server.example.org.tpl

The Nagios server should also be configured as a grid UI (User Interface) so that it can monitor grid services.

Who is monitored

Hosts that belong to the site (variable SITES) and are listed in config/'sitename'_nodes_properties.tpl are monitored.

Example templates for host declarations are in LCGQWG: https://trac.lal.in2p3.fr/LCGQWG/browser/templates/trunk/sites/example/site/config

The set of monitored hosts can be tuned with the variables:

NAGIOS_IGNORED_NODES

NAGIOS_MONITORED_HOSTGROUPS

See the example profile above.
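As a minimal sketch of this tuning, the two variables could be set in the server profile before the standard Nagios configuration is included. The node name and the hostgroup names other than "NFS" are placeholders, not values from a real site, and the host name is assumed to be given in plain (unescaped) form.

```pan
# Hypothetical values, for illustration only.
# Nodes listed here are skipped when the Nagios hosts are generated.
variable NAGIOS_IGNORED_NODES = list(
    "testnode.example.org"
);

# Hostgroups to define in Nagios. Nodes whose "type" is not listed
# are added to NAGIOS_DEFAULT_NODE_GROUP ("Others" by default).
variable NAGIOS_MONITORED_HOSTGROUPS = list("NFS", "CE", "SE");
```

Both variables are lists according to their descriptions in the variable reference of this page.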

What is monitored

Services

Services are added in the template « server/cfgfiles/services.tpl ». A service can be added like this:

variable TMP_SERVICE = nlist(
    "use","                             generic-service",
    "host_name","                       node07.org.fr",
    "service_description","             Workers ssh_known_hosts",
    "contact_groups","                  admins",
    "check_command","                   check_nrpe_long!check_ssh_known_hosts!60",
    "normal_check_interval","           60 ; check every hour",
    "max_check_attempts","              1",
);


If the second parameter of the function nagios_add_service is « true », a dependency on the NRPE daemon is added for the defined service on all nodes matching « "*,!NOQUATTOR" ». This mechanism still needs to be improved.

A dependency can also be declared between a service on one host and a specific service on the same or another host:

variable NAGIOS_USER_DEFINED_HOST_DEPENDENCIES =	nagios_add_host_service_dependency(
	"node07.datagrid.cea.fr","nrpe daemon", "node07.datagrid.cea.fr","Workers ssh_known_hosts" 
);

Dependencies between hostgroups are not possible (for the moment).

Commands and NRPE

The Nagios configuration files do not need a complex Quattor template structure, so they are created with filecopy:

  • Commands are added in:

monitoring/nagios3/server/cfgfiles/commands

  • NRPE commands are added in:

monitoring/nagios3/client/cfgfiles/nrpe_commands

It's possible to add a dependency on the NRPE daemon for services which are not defined on all hosts (template services.tpl).


NEED SOMETHING


Proxy management

A valid certificate is needed for the local grid probes. Two mechanisms are available: renewal and retrieval. In cfg/standard/monitoring/nagios3/server/config.tpl, the corresponding template is included depending on the selected mode:

include { if(NAGIOS_NCG_CONFIG && NAGIOS_MODE_PROXY_RENEW) 'monitoring/nagios3/server/vobox'};
include { if(NAGIOS_NCG_CONFIG && NAGIOS_MODE_PROXY_RETRIEVE) 'monitoring/nagios3/server/proxy_retrieval'};

The associated variables:

NAGIOS_MODE_PROXY_RENEW
NAGIOS_RENEW_PROXY 
NAGIOS_OUTPUT_PROXY
NAGIOS_MODE_PROXY_RETRIEVE
NAGIOS_MYPROXY_NAME 
MYPROXY_SERVER
NAGIOS_VONAME_PROXY
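As a hedged sketch, proxy retrieval could be selected in the server profile as follows. The boolean type of the two mode variables is inferred from their use in the include conditions above; the MyProxy host and VO name are placeholders. NAGIOS_RENEW_PROXY, NAGIOS_OUTPUT_PROXY and NAGIOS_MYPROXY_NAME are left out because their expected values are not described here.

```pan
# Assumed values, for illustration only.
# The proxy templates are only included when NCG is enabled.
variable NAGIOS_NCG_CONFIG = true;

# This sketch enables retrieval and disables renewal.
variable NAGIOS_MODE_PROXY_RETRIEVE = true;
variable NAGIOS_MODE_PROXY_RENEW = false;

# Placeholder MyProxy server and VO name.
variable MYPROXY_SERVER = "myproxy.example.org";
variable NAGIOS_VONAME_PROXY = "dteam";
```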

Client configuration

Variables

variable NAGIOS_RPM_VERSION ?= "3.0.5-1" || string. Nagios Server RPM version

variable NAGIOS_NODES_PROPERTIES ?= nlist() || nlist. The site nodes properties. For instance: nlist( escape("mynode.mydomain"), nlist("type","NFS", "monitoring","yes", "os","undef", "ip","192.168.0.1", "hardware","undef"))

variable NAGIOS_HTPASSWD_LOGIN ?= "nagios" || string. Defines the user name allowed to access the nagios interface.

variable NAGIOS_HTPASSWD_PASS ?= "i88QUu1o8Jwq." || string. Defines the password associated with the user name. This is a password hash generated using the "openssl passwd" command (the 13-character default has the traditional crypt format). Default password is "nagios".

variable NAGIOS_MONITORED_HOSTGROUPS ?= NAGIOS_KNOWN_HOSTGROUPS || list. Defines the list of hostgroups you want to define in Nagios. If different from NAGIOS_KNOWN_HOSTGROUPS, each node that is of a "type" not listed will be added to the hostgroup NAGIOS_DEFAULT_NODE_GROUP. Please note that the KNOWN hostgroups will still be defined, for compatibility reasons (for now).

variable NAGIOS_DEFAULT_NODE_GROUP ?= "Others" || string. Nodes whose type is not known are put in this hostgroup.

variable NAGIOS_SERVER ?= FULL_HOSTNAME || string. Defines the NAGIOS_SERVER variable, which tells the clients which Nagios host is going to poll the NRPE daemon. This *REALLY* should be set for all nodes, not only the server.

variable NAGIOS_DEFAULT_ADMIN_NAME ?= "nagiosadmin" || string. Default user name to use if none specified.

variable NAGIOS_ADMIN_CONTACTS ?= error("you need to define NAGIOS_ADMIN_CONTACTS with a nlist, containing names, associated with emails") || nlist. Contains the names of the people to be notified, with their emails. Ex.: nlist("toto", "me@localhost")

variable NAGIOS_NOTIFICATIONS_ENABLED ?= true || boolean. Nagios wide variable to disable/enable notifications

variable NAGIOS_SYSINFO_USERS ?= "*" || string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_system_information

variable NAGIOS_CONFINFO_USERS ?= "*" || string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_configuration_information

variable NAGIOS_SYSCOMMAND_USERS ?= NAGIOS_DEFAULT_ADMIN_NAME || string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_system_commands

variable NAGIOS_SERVVIEW_USERS ?= "*" || string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_services

variable NAGIOS_HOSTVIEW_USERS ?= "*" || string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_hosts

variable NAGIOS_SERVCOMMANDS_USERS ?= NAGIOS_DEFAULT_ADMIN_NAME || string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_service_commands

variable NAGIOS_HOSTCOMMANDS_USERS ?= NAGIOS_DEFAULT_ADMIN_NAME || string. part of CGI users authentication. Corresponds to nagios configuration variable authorized_for_all_host_commands

variable NAGIOS_IGNORED_NODES ?= list() || list. Defines a list of nodes that will be IGNORED when defining Nagios hosts. This is useful if you have nodes in the NODES_PROPERTIES that you do *NOT* want to monitor.

variable NAGIOS_NCG_CONFIG ?= false || boolean. Whether to define NCG services (experimental).
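The variables above could be combined in a site or profile template along these lines. This is only a sketch: the node and contact entries are the example values quoted in the descriptions above, and the Nagios server name is a placeholder.

```pan
# Site node properties: each key is an escaped host name.
variable NAGIOS_NODES_PROPERTIES = nlist(
    escape("mynode.mydomain"), nlist(
        "type", "NFS",
        "monitoring", "yes",
        "os", "undef",
        "ip", "192.168.0.1",
        "hardware", "undef"
    )
);

# People notified by Nagios: name associated with e-mail address.
# This variable has no default and must be defined.
variable NAGIOS_ADMIN_CONTACTS = nlist(
    "toto", "me@localhost"
);

# Placeholder host name; to be set on all nodes, not only the server.
variable NAGIOS_SERVER = "nagios.example.org";
```

Defining NAGIOS_SERVER in a template shared by all nodes keeps the server and the NRPE clients consistent.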


| NAGIOS_ADMIN_CONTACTS | admin emails for alarms |
| NAGIOS_CONFINFO_USERS | |
| NAGIOS_DEFAULT_ADMIN_NAME | |
| NAGIOS_DEFAULT_NODE_GROUP | |
| NAGIOS_HOSTCOMMANDS_USERS | |
| NAGIOS_HOSTVIEW_USERS | |
| NAGIOS_HTPASSWD_CONFIG | |
| NAGIOS_HTPASSWD_LOGIN | |
| NAGIOS_HTPASSWD_PASS | |
| NAGIOS_IGNORED_NODES | |
| NAGIOS_KNOWN_HOSTGROUPS | |
| NAGIOS_MONITORED_HOSTGROUPS | |
| NAGIOS_NCG_CONFIG | |
| NAGIOS_NODES_PROPERTIES | |
| NAGIOS_NOTIFICATIONS_ENABLED | |
| NAGIOS_RPM_VERSION | |
| NAGIOS_SERVCOMMANDS_USERS | |
| NAGIOS_SERVER | |
| NAGIOS_SERVICEEXTINFOS | |
| NAGIOS_SERVICES | |
| NAGIOS_SERVVIEW_USERS | |
| NAGIOS_SUPPORTED_OS_LIST | |
| NAGIOS_SYSCOMMAND_USERS | |
| NAGIOS_SYSINFO_USERS | |
| NAGIOS_USER_DEFINED_HOST_DEPENDENCIES | |

Installation Example

With Quattor

Create the server profile (see the example profile above) and add it to the repository:

svn add cfg/clusters/your-3.1/profiles/profile_node58.tpl

Modify your list of machines:

vi ./cfg/sites/your/site/config/your_nodes_properties.tpl

Create your hardware template:

svn cp ./cfg/sites/your/hardware/virtual_machine_3.tpl ./cfg/sites/your/hardware/virtual_machine_13.tpl


Commit your changes:

svn ci -m 'adding serveur nagios'


On the Nagios server, check the SPMA and ncm-cdispd logs, then check the status of Nagios and start it:

vi /var/log/spma.log
vi /var/log/ncm-cdispd.log
/etc/init.d/nagios status
/etc/init.d/nagios start


Check the server certificate
NEED SOMETHING from node58