LCG-FR / SA1-FR Monitoring NagiosWithQuattor
Latest revision as of 16:31, 13 March 2009
Installing Nagios with quattor
Nagios configuration requires both a set of client templates for commands to be run on clients by the Nagios Remote Plug-in Executor (NRPE) and a set of server templates configuring contacts for alarms, hosts to be monitored, services (AKA sensors) and so on.
If you run into problems, you can contact frederic.schaer__arobase char__cea.fr.
Configuring the Nagios server
The configuration of a Nagios server is done in a set of standard templates, in the 'monitoring/nagios3' namespace.
Repositories used
- Sensors for some of the grid plugins are provided by the SA1 repository: http://www.sysadmin.hep.ac.uk/rpms/grid-services/RPMS.monitoring/
- RPMs for nagios and nagios-plugins (plus dependencies) are compiled for each supported OS and placed in the « updates » repository on quattorsrv.lal.in2p3.fr.
Example: http://quattor.web.lal.in2p3.fr/packages/os/sl440-i386/updates/
- The « nagios-grid-plugins » plugins are packaged as noarch RPMs in the « nagios » repository.
Example: http://quattor.web.lal.in2p3.fr/packages/nagios/
See the template nagios3/plugins/config.tpl.
Server Template
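An example Nagios server template is here: https://trac.lal.in2p3.fr/LCGQWG/browser/templates/trunk/clusters/example-3.1/profiles/nagios3-server.example.org.tpl
This machine should be configured as a UI so that it can monitor grid services.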
Who is monitored, Client configuration
Two conditions
A host is monitored only when both of the following conditions are met:
- The host belongs to the site (variable SITES) and is listed in config/'sitename'_nodes_properties.tpl.
Example templates for host declarations are in LCGQWG: https://trac.lal.in2p3.fr/LCGQWG/browser/templates/trunk/sites/example/site/config
- AND the host has NAGIOS_CLIENT_ENABLED set to "true":
    variable NAGIOS_CLIENT_ENABLED ?= true;
    include { if(NAGIOS_CLIENT_ENABLED) 'monitoring/nagios3/client/config'};
If you want to set this variable to "true" for all your hosts, you can put these lines in the following template:
template site/pro_site_common_config
Note: please consider disabling the client installation (by setting NAGIOS_CLIENT_ENABLED=false in the profiles) on nodes with unsupported OSes; otherwise you will encounter Quattor compilation errors.
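As a minimal sketch of the note above, a profile for a node with an unsupported OS could switch the client off explicitly. The assignment has to come before any template that applies the `?=` default, because `?=` only assigns when the variable is still undefined. Where exactly the line belongs in your own profile depends on your site layout, which is not described here.

```pan
# Fragment of a node profile (the rest of the profile is not shown).
# Place this before the template that sets 'NAGIOS_CLIENT_ENABLED ?= true',
# so that the default no longer applies to this node.
variable NAGIOS_CLIENT_ENABLED = false;
```

With the variable set to false, the conditional include of monitoring/nagios3/client/config shown above is skipped for that node.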
You can tune this with
The selection of monitored hosts can be tuned with the variables NAGIOS_IGNORED_NODES and NAGIOS_MONITORED_HOSTGROUPS.
See the example profile: https://trac.lal.in2p3.fr/LCGQWG/browser/templates/trunk/clusters/example-3.1/profiles/nagios3-server.example.org.tpl
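A short sketch of how these two variables might be set in the server profile. The host names are hypothetical, the hostgroup names are taken from the examples elsewhere on this page, and it is an assumption that the assignments should precede the include of the server configuration (whose defaults are declared with `?=`).

```pan
# Nodes present in the nodes properties that should NOT become Nagios hosts
# (hypothetical host names).
variable NAGIOS_IGNORED_NODES = list(
    "node98.example.org",
    "node99.example.org"
);

# Hostgroups to define in Nagios. Nodes whose "type" is not listed here
# are placed in the hostgroup named by NAGIOS_DEFAULT_NODE_GROUP.
variable NAGIOS_MONITORED_HOSTGROUPS = list("CE", "WN", "UI");
```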
Authorization to contact NRPE
You need to declare your Nagios server in the variable NAGIOS_SERVER so that it is allowed to contact the NRPE daemon on the clients.
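For illustration, the declaration could look like the fragment below. The host name is hypothetical, and placing it in a template shared by all nodes (such as site/pro_site_common_config) is a suggestion rather than a requirement stated by the templates.

```pan
# In a template included by every node, clients as well as the server.
# "nagios.example.org" is a placeholder for your own Nagios server.
variable NAGIOS_SERVER = "nagios.example.org";
```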
What is monitored
Services
Services are added in the template « server/cfgfiles/services.tpl ». A service can be added like this:
    variable TMP_SERVICE=nlist(
        "use"," generic-service",
        "host_name"," node07.org.fr",
        "service_description"," Workers ssh_known_hosts",
        "contact_groups"," admins",
        "check_command"," check_nrpe_long!check_ssh_known_hosts!60",
        "normal_check_interval"," 60 ; check every hour",
        "max_check_attempts"," 1",
    );
    variable NAGIOS_SERVICES=nagios_add_service(TMP_SERVICE);
If the second parameter of the function nagios_add_service is « true », a dependency on the NRPE daemon is added for the defined service on all nodes matching "*,!NOQUATTOR". This behaviour still needs to be improved.
It is also possible to declare a dependency between a service on one host and a service on another, explicitly named host (in this example both services are on the same host):
    variable NAGIOS_USER_DEFINED_HOST_DEPENDENCIES = nagios_add_host_service_dependency(
        "node07.datagrid.cea.fr","nrpe daemon", "node07.datagrid.cea.fr","Workers ssh_known_hosts"
    );
Dependencies between hostgroups are not possible (for the moment).
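The NRPE dependency mentioned above is requested through the second argument of nagios_add_service. The fragment below is a sketch that reuses the service from this section; the values keep the leading space used in the original example, since it is not documented here whether the function relies on it.

```pan
variable TMP_SERVICE = nlist(
    "use", " generic-service",
    "host_name", " node07.org.fr",
    "service_description", " Workers ssh_known_hosts",
    "contact_groups", " admins",
    "check_command", " check_nrpe_long!check_ssh_known_hosts!60",
    "max_check_attempts", " 1"
);

# Second argument 'true': also add the dependency on the NRPE daemon.
variable NAGIOS_SERVICES = nagios_add_service(TMP_SERVICE, true);
```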
Commands and NRPE
Some Nagios configuration files do not need a complex Quattor structure template and are therefore created with filecopy:
- adding commands is done in:
monitoring/nagios3/server/cfgfiles/commands
- adding NRPE commands is done in:
monitoring/nagios3/client/cfgfiles/nrpe_commands
A dependency on the NRPE daemon can also be added for services that are not defined on all hosts (template services.tpl):
    variable NRPE_HOSTGROUPS_SPECIFIC_DEPENDENT_SERVICES=nlist(
        escape("CE,CE-MPI,!NOQUATTOR") ,"black hole workers",
        escape("CE,CE-MPI,LFC,SE_DPM,SE_DISK,MON,VOBOX,WMS,!NOQUATTOR"),"host certificate",
        escape("MON") ,"apel publisher",
        escape("CE,CE-MPI") ,"apel parser",
        escape("WN,CE-MPI,UI,VOBOX") ,"home partition freespace",
        escape("WN") ,"pbs_mom transfers",
    );
Proxy management
A valid certificate is needed for the local grid probes. Two mechanisms are available: renewal and retrieval. They are selected in cfg/standard/monitoring/nagios3/server/config.tpl:
    include { if(NAGIOS_NCG_CONFIG && NAGIOS_MODE_PROXY_RENEW) 'monitoring/nagios3/server/vobox'};
    include { if(NAGIOS_NCG_CONFIG && NAGIOS_MODE_PROXY_RETRIEVE) 'monitoring/nagios3/server/proxy_retrieval'};
Associated variables:
Variable | Type and description
NAGIOS_MODE_PROXY_RENEW | boolean. True if you want to use the renewal mechanism.
NAGIOS_MODE_PROXY_RETRIEVE | boolean. True if you want to use the retrieval mechanism.
NAGIOS_RENEW_PROXY | string. File in which the proxy is renewed by the VOBOX renewal mechanism.
NAGIOS_OUTPUT_PROXY | string. File into which the proxy is retrieved by the proxy retrieval cron job.
NAGIOS_MYPROXY_NAME | string. Name of your proxy, used for later retrieval.
MYPROXY_SERVER | string. Host name of your MyProxy server.
NAGIOS_VONAME_PROXY | string. VO used for VOMS authentication.
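As a sketch, selecting the retrieval mechanism could look like the fragment below. The values are illustrative and borrowed from the installation example on this page (proxy file, MyProxy server, VO); the proxy name "xxxxx" is the same placeholder used there. Both NAGIOS_NCG_CONFIG and NAGIOS_MODE_PROXY_RETRIEVE must be true for the proxy_retrieval template to be included, as the include condition above shows.

```pan
# Enable NCG and choose retrieval rather than renewal.
variable NAGIOS_NCG_CONFIG = true;
variable NAGIOS_MODE_PROXY_RETRIEVE = true;
variable NAGIOS_MODE_PROXY_RENEW = false;

# Illustrative values; replace them with your own.
variable NAGIOS_OUTPUT_PROXY = "/etc/nagios/globus/userproxy.pem";
variable NAGIOS_MYPROXY_NAME = "xxxxx";
variable MYPROXY_SERVER = "myproxy.grif.fr";
variable NAGIOS_VONAME_PROXY = "vo.grif.fr";
```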
Variables
Variable and default | Type and description
variable NAGIOS_RPM_VERSION ?= "3.0.5-1" | string. Nagios server RPM version.
variable NAGIOS_NODES_PROPERTIES ?= nlist() | nlist. The site nodes properties. For instance: nlist( escape("mynode.mydomain"), nlist("type","NFS", "monitoring","yes", "os","undef", "ip","192.68.0.1" , "hardware","undef"))
variable NAGIOS_HTPASSWD_LOGIN ?= "nagios" | string. User name allowed to access the Nagios interface.
variable NAGIOS_HTPASSWD_PASS ?= "i88QUu1o8Jwq." | string. Password associated with the user name, stored as a hash generated with the "openssl passwd" command (the 13-character default is in traditional crypt format rather than MD5). The default password is "nagios".
variable NAGIOS_MONITORED_HOSTGROUPS ?= NAGIOS_KNOWN_HOSTGROUPS | list. List of hostgroups you want to define in Nagios. If it differs from NAGIOS_KNOWN_HOSTGROUPS, each node whose "type" is not listed is added to the hostgroup NAGIOS_DEFAULT_NODE_GROUP. Please note that the KNOWN hostgroups are still defined, for compatibility reasons (for now).
variable NAGIOS_DEFAULT_NODE_GROUP ?= "Others" | string. Nodes whose type is not known are put in this hostgroup.
variable NAGIOS_SERVER ?= FULL_HOSTNAME | string. Tells the clients which Nagios host is going to poll the NRPE daemon. This *REALLY* should be set for all nodes, not only the server.
variable NAGIOS_DEFAULT_ADMIN_NAME ?= "nagiosadmin" | string. Default user name to use if none is specified.
variable NAGIOS_ADMIN_CONTACTS ?= error("you need to define NAGIOS_ADMIN_CONTACTS with a nlist, containing names, associated with emails") | nlist. Names of the people to be notified, with their emails. Example: nlist("toto" ,"me@localhost",)
variable NAGIOS_NOTIFICATIONS_ENABLED ?= true | boolean. Nagios-wide switch to enable or disable notifications.
variable NAGIOS_SYSINFO_USERS ?= "*" | string. Part of CGI user authentication. Corresponds to the Nagios configuration variable authorized_for_system_information.
variable NAGIOS_CONFINFO_USERS ?= "*" | string. Part of CGI user authentication. Corresponds to the Nagios configuration variable authorized_for_configuration_information.
variable NAGIOS_SYSCOMMAND_USERS ?= NAGIOS_DEFAULT_ADMIN_NAME | string. Part of CGI user authentication. Corresponds to the Nagios configuration variable authorized_for_system_commands.
variable NAGIOS_SERVVIEW_USERS ?= "*" | string. Part of CGI user authentication. Corresponds to the Nagios configuration variable authorized_for_all_services.
variable NAGIOS_HOSTVIEW_USERS ?= "*" | string. Part of CGI user authentication. Corresponds to the Nagios configuration variable authorized_for_all_hosts.
variable NAGIOS_SERVCOMMANDS_USERS ?= NAGIOS_DEFAULT_ADMIN_NAME | string. Part of CGI user authentication. Corresponds to the Nagios configuration variable authorized_for_all_service_commands.
variable NAGIOS_HOSTCOMMANDS_USERS ?= NAGIOS_DEFAULT_ADMIN_NAME | string. Part of CGI user authentication. Corresponds to the Nagios configuration variable authorized_for_all_host_commands.
variable NAGIOS_IGNORED_NODES ?= list() | list. Nodes that will be IGNORED when defining Nagios hosts. This is useful if you have nodes in the NODES_PROPERTIES that you do *NOT* want to monitor.
variable NAGIOS_NCG_CONFIG ?= false | boolean. Whether to define NCG services (experimental).
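NAGIOS_ADMIN_CONTACTS has no usable default (the default raises an error), so every server profile has to define it. A minimal sketch, with hypothetical names and addresses:

```pan
# Contact name -> email address of each person to be notified.
# Names and addresses are placeholders.
variable NAGIOS_ADMIN_CONTACTS = nlist(
    "alice", "alice@example.org",
    "bob", "bob@example.org"
);

# Optional: turn all notifications off while the setup is being tested.
variable NAGIOS_NOTIFICATIONS_ENABLED = false;
```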
Installation example
With Quattor
- Create the server profile (see the example profile https://trac.lal.in2p3.fr/LCGQWG/browser/templates/trunk/clusters/example-3.1/profiles/nagios3-server.example.org.tpl):
    svn add cfg/clusters/your-3.1/profiles/profile_node58.tpl
- Modify your list of machines:
    vi ./cfg/sites/your/site/config/your_nodes_properties.tpl
- Create your hardware template:
    svn cp ./cfg/sites/your/hardware/virtual_machine_3.tpl ./cfg/sites/your/hardware/virtual_machine_13.tpl
- Configure your clients by adding the following lines to the template site/pro_site_common_config:
    variable NAGIOS_CLIENT_ENABLED = true;
    include { if(NAGIOS_CLIENT_ENABLED) 'monitoring/nagios3/client/config'};
- Commit your changes:
    svn ci -m 'adding nagios server'
On the Nagios server
- As root, check the logs and start Nagios:
    vi /var/log/spma.log
    vi /var/log/ncm-cdispd.log
    /etc/init.d/nagios status
    /etc/init.d/nagios start
Then add your server certificate to /etc/grid-security.
- As nagios, add your personal certificate and create a local proxy (you can also do this from another UI):
    voms-proxy-init --voms vo.grif.fr
    myproxy-init -c 336 -k xxxxx -s myproxy.grif.fr -l nagios -x -Z "/O=GRID-FR/C=FR/O=xxx/OU=xxx/CN=xxxx.xxxx.fr"
- As root, check the proxy retrieval mechanism (see the cron file /etc/cron.d/nagios-proxy-refresh, installed by the RPM nagios-proxy-refresh-1.7-3.noarch):
    /usr/sbin/nagios-proxy-refresh
    MyProxy credential retrieved. VOMS credential retrieved.
    # voms-proxy-info --all -file /etc/nagios/globus/userproxy.pem
    subject : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy/CN=proxy/CN=proxy/CN=proxy/CN=proxy
    issuer : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy/CN=proxy/CN=proxy/CN=proxy
    identity : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy/CN=proxy/CN=proxy/CN=proxy
    type : proxy
    strength : 1024 bits
    path : /etc/nagios/globus/userproxy.pem
    timeleft : 11:27:56
    === VO vo.grif.fr extension information ===
    VO : vo.grif.fr
    subject : /O=GRID-FR/C=FR/O=CEA/OU=IRFU/CN=Christine Leroy
    issuer : /O=GRID-FR/C=FR/O=CNRS/OU=LAL/CN=grid12.lal.in2p3.fr
    attribute : /vo.grif.fr/Role=NULL/Capability=NULL
    timeleft : 11:27:56
    uri : grid12.lal.in2p3.fr:20001
Interface
Check your interface: http://nagioserver.xx.fr:/nagios
Do you get the SAM test results back? You can check by hand on your server, as root:
    /usr/libexec/grid-monitoring/plugins/nagios/gather_sam -t 3000 --site GRIF \
        --vos ops --sam-root-url http://lcg-sam.cern.ch:8080/same-pi/ --sam-all
Adding a service
You can add your own services in the Nagios server profile.
You need to include the template that provides the Nagios functions, standard/monitoring/nagios3/server/functions.tpl.
You need to define the service before including monitoring/nagios3/server/config.
Exercise
- 1) Define what is needed to test an HTTP server remotely.
- 2) Define a hostgroup (for example "WEB" in NAGIOS_MONITORED_HOSTGROUPS) and set the type of this node to WEB in your nodes_properties.
- 3) Add to your configuration the service that checks HTTP on those hosts, tuning the "time period" if needed.
- 4) Commit your changes and check your interface: http://nagiosserver.xxx.fr/nagios
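One possible shape for step 3 is sketched below; it is not the reference solution. Several things are assumptions: the include path is derived from the file name standard/monitoring/nagios3/server/functions.tpl, the "WEB" hostgroup comes from step 2, and a check_http command and a 24x7 time period are presumed to exist in your Nagios configuration. The directives hostgroup_name and check_period are standard Nagios service directives, but whether nagios_add_service passes them through unchanged has not been verified here.

```pan
# Make the Nagios helper functions available (path assumed, see above).
include { 'monitoring/nagios3/server/functions' };

# HTTP check applied to every host of the "WEB" hostgroup.
variable TMP_SERVICE = nlist(
    "use", " generic-service",
    "hostgroup_name", " WEB",
    "service_description", " HTTP",
    "contact_groups", " admins",
    "check_command", " check_http",
    "check_period", " 24x7",
    "max_check_attempts", " 3"
);
variable NAGIOS_SERVICES = nagios_add_service(TMP_SERVICE);

# The server configuration is included only after the service is defined.
include { 'monitoring/nagios3/server/config' };
```

The ordering is the point of the sketch: functions first, then the service definition, then the server configuration.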