Nagios Configuration Files
We have already discussed that the sample configuration files will appear in the /usr/local/nagios/etc folder. The following files are the basic configuration files; if any one of them is missing, you need to create it with the exact syntax. We will explain each file, with its complete syntax, in the following sections.
Nagios depends on a number of important files, ranging from the config files to the plugins, logs and command files.
The following are the files of importance in Nagios:
Note: The file paths assume the default locations of the files.
Main Configuration File
/usr/local/nagios/etc/nagios.cfg
This is the configuration file that defines the various directives Nagios uses. These include the paths to the folders where Nagios looks for the files it needs (the object config files, the command file etc.) and various other parameters that decide how Nagios operates.
Resource File
/usr/local/nagios/etc/resource.cfg
This file holds the user-defined macros and other sensitive configuration information, which the CGIs are not allowed to read.
Commands Config File
/usr/local/nagios/etc/commands.cfg
CGI Config file
/usr/local/nagios/etc/cgi.cfg
Other object configuration files include, but are not limited to, the following:
/usr/local/nagios/etc/hosts.cfg
/usr/local/nagios/etc/hostgroups.cfg
/usr/local/nagios/etc/services.cfg
/usr/local/nagios/etc/servicegroups.cfg
/usr/local/nagios/etc/contacts.cfg
/usr/local/nagios/etc/contactgroups.cfg
/usr/local/nagios/etc/timeperiods.cfg
Nagios Command File
/usr/local/nagios/var/rw/nagios.cmd
Nagios checks this file for external commands to process. The command CGI writes commands to this file, and other third-party programs can write to it too if they have been granted the proper file permissions (see the external command setup later in this guide). The external command file is implemented as a named pipe (FIFO), which is created when Nagios starts and removed when it shuts down. If the file already exists when Nagios starts, the Nagios process will terminate with an error message.
Nagios Log Files
Status Log
/usr/local/nagios/var/status.log
Downtime Log File
/usr/local/nagios/var/downtime.log
Comment Log File
/usr/local/nagios/var/comment.log
Nagios Lock File
/tmp/nagios.lock
Nagios creates this file when it runs as a daemon. The file contains the process ID (PID) of the running Nagios process.
Nagios Temp File
/usr/local/nagios/var/nagios.tmp
State Retention File
/usr/local/nagios/var/status.sav
This is the file that Nagios will use for storing service and
host state information before it shuts down. When Nagios is
restarted it will use the information stored in this file for
setting the initial states of services and hosts before it
starts monitoring anything. This file is deleted after Nagios
reads in initial state information when it (re)starts.
Configure Nagios Files
These are the object configuration files for Nagios; they are referenced (via cfg_file directives) from nagios.cfg, the main configuration file. If you don't have the following files, create them with the following command:
#touch <filename>
and then check the file permissions and ownership.
/usr/local/nagios/etc/contactgroups.cfg
/usr/local/nagios/etc/contacts.cfg
/usr/local/nagios/etc/services.cfg
/usr/local/nagios/etc/dependencies.cfg
/usr/local/nagios/etc/escalations.cfg
/usr/local/nagios/etc/hostgroups.cfg
/usr/local/nagios/etc/hosts.cfg
/usr/local/nagios/etc/servicegroups.cfg
/usr/local/nagios/etc/timeperiods.cfg
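The touch-and-check step can be scripted. Below is a minimal POSIX shell sketch; the function name create_missing_cfg, the optional owner argument (for example nagios:nagios) and the 644 mode given to newly created files are assumptions, so match them to the user and group your Nagios actually runs as. Existing files are left alone apart from the optional chown.

```sh
# Hypothetical helper: create whichever of the object config files listed
# above are missing from a directory. New files get mode 644 (an assumption);
# existing files are not touched except for the optional chown.
# Usage: create_missing_cfg <etc_dir> [owner]
create_missing_cfg() {
    etc_dir=$1
    owner=${2:-}    # e.g. nagios:nagios; leave empty to skip chown
    for name in contactgroups contacts services dependencies escalations \
                hostgroups hosts servicegroups timeperiods; do
        f="$etc_dir/$name.cfg"
        if [ ! -e "$f" ]; then
            touch "$f" && chmod 644 "$f" || return 1
            echo "created $f"
        fi
        if [ -n "$owner" ]; then
            chown "$owner" "$f" || return 1
        fi
    done
}
```

For example, running create_missing_cfg /usr/local/nagios/etc nagios:nagios as root would create any of the nine files that are missing and hand all of them to the nagios user.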
You will first need to enable authentication and set the permissions for the nagiosadmin user in $NAGIOSHOME/etc/cgi.cfg ($NAGIOSHOME is the Nagios installation directory, /usr/local/nagios by default). Usually you will want the admin user to be able to do everything, so edit these lines as follows:
use_authentication=1
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
Of course, other users can be set up with different privileges (each directive takes a comma-separated list of user names). Remember to create them in $NAGIOSHOME/etc/htpasswd.users.
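To double-check the edit, here is a small shell sketch that reports which of the seven authorized_for_* directives listed above do not include a given user. The function name check_cgi_auth is hypothetical, and the sketch is a plain text check: it looks at the last uncommented line for each directive and does not reproduce exactly how the CGIs parse cgi.cfg.

```sh
# Hypothetical helper: print each cgi.cfg authorization directive that does
# not grant access to the given user; return 1 if any are missing.
# Usage: check_cgi_auth <cgi.cfg> <user>
check_cgi_auth() {
    cfg=$1
    user=$2
    missing=0
    for d in system_information configuration_information system_commands \
             all_services all_hosts all_service_commands all_host_commands; do
        key="authorized_for_$d"
        # Value of the last "key=..." line (commented-out lines start with #).
        value=$(grep "^$key=" "$cfg" | tail -n 1 | cut -d= -f2-)
        # The value is a comma-separated user list; ignore spaces.
        list=$(printf '%s' "$value" | tr -d ' ')
        case ",$list," in
            *",$user,"*) ;;                    # user is granted this right
            *) echo "$key"; missing=1 ;;
        esac
    done
    return $missing
}
```

Running check_cgi_auth $NAGIOSHOME/etc/cgi.cfg nagiosadmin should print nothing once all seven lines are in place.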
Check through $NAGIOSHOME/etc/nagios.cfg to choose the best options for you, for example whether Nagios allows external commands to be submitted through the web interface and how often to rotate log files.
If you decide to make external commands available, you must make sure that the directory $NAGIOSHOME/var/rw is readable and writeable by the web server user (usually 'www-data'). To allow external commands to be parsed and acted on by Nagios, set the directive:
check_external_commands=1
in $NAGIOSHOME/etc/nagios.cfg. Then create a new user group and set the relevant permissions on $NAGIOSHOME/var/rw and $NAGIOSHOME/var/rw/nagios.cmd accordingly:
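#groupadd nagiocmd
#usermod -a -G nagiocmd nagios
#usermod -a -G nagiocmd www-data
where "www-data" is the apache user. The -a option appends the new group; without it, usermod replaces the user's existing supplementary groups (if your usermod has no -a option, list all of the user's groups after -G instead). Now make the command directory (if it does not already exist).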
#mkdir $NAGIOSHOME/var/rw
and set the permissions
#chown nagios:nagiocmd $NAGIOSHOME/var/rw
#chmod u+rwx $NAGIOSHOME/var/rw
#chmod g+rwx $NAGIOSHOME/var/rw
#chmod g+s $NAGIOSHOME/var/rw
You'll need to restart apache so that it can take advantage of
being part of the nagiocmd group.
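After running the commands above, a quick sanity check of the directory mode can be sketched as follows. The function name check_rw_dir is hypothetical; it checks only the mode bits (group rwx plus setgid, as set by the chmod commands above), not the owner or group, and relies on find -perm so it should work on both Linux and Solaris.

```sh
# Hypothetical helper: check that the external command directory has group
# rwx and the setgid bit. Owner and group are NOT checked.
# Usage: check_rw_dir <dir>
check_rw_dir() {
    dir=$1
    [ -d "$dir" ] || { echo "missing: $dir"; return 1; }
    # -perm -2070: setgid (2000) and group rwx (070) must all be set.
    if [ -z "$(find "$dir" -prune -perm -2070)" ]; then
        echo "$dir needs group rwx and the setgid bit (chmod g+rwx,g+s)"
        return 1
    fi
    echo ok
}
```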
Templating Configuration Files
With all of the object configuration files, you can use
templates to make the files smaller and save you time and effort
when you need to make changes to them. Let's take the example of the service definitions (explained in more detail later):
# Generic service definition template
define service{
name generic-service ; The 'name' of this service template, referenced in other service definitions
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
contact_groups $CONTACT_GROUP1
is_volatile 0
check_period $PERIOD
max_check_attempts #n
normal_check_interval #n
retry_check_interval #n
notification_interval #n
notification_period $PERIOD
notification_options w,u,c,r
check_command $COMMAND!$ARGUMENTS
service_description $SERVICE
register 0 ; DON'T REGISTER THIS DEFINITION - IT'S NOT A REAL SERVICE, JUST A TEMPLATE!
}
# Service definition
define service{
use generic-service
host_name $HOST1,$HOST2,$HOST3...
}
# Service definition
define service{
use generic-service
host_name $HOST4,$HOST5...
contact_groups $CONTACT_GROUP1,$CONTACT_GROUP2
}
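Directives common to all your service checks go into the template at the top; the individual service definitions then specify only the bits that differ for specific (groups of) hosts. You can also override any templated setting in a specific service definition, as the contact_groups line in the second definition does. Names such as $PERIOD and #n are placeholders for your own values, and the arguments of a check_command are separated from the command name with "!".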
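Because a service definition silently depends on its template, a typo in a use line is an easy mistake. The sketch below (hypothetical name missing_templates) lists template names referenced with use but never defined with name across the files you pass it. It is a rough awk text check that assumes one directive per line and a single template per use line, not a full parser; nagios -v remains the authoritative check.

```sh
# Hypothetical helper: list templates referenced with "use" but never
# defined with "name" in the given object config files.
# Rough text check: one directive per line, ";" starts an inline comment.
# Usage: missing_templates file.cfg [file.cfg ...]
missing_templates() {
    awk '
        { sub(/;.*/, "") }                 # drop inline comments
        $1 == "name" { defined[$2] = 1 }   # template definitions
        $1 == "use"  { used[$2] = 1 }      # template references
        END { for (t in used) if (!(t in defined)) print t }
    ' "$@" | sort
}
```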
Configure time periods (timeperiods.cfg)
Think about which time periods you want to use for checking services and for sending notifications. For example:
# '24x7' timeperiod definition
define timeperiod{
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
# 'workhours' timeperiod definition
define timeperiod{
timeperiod_name workhours
alias "Normal" Working Hours
monday 08:00-18:00
tuesday 08:00-18:00
wednesday 08:00-18:00
thursday 08:00-18:00
friday 08:00-18:00
}
# 'nonworkhours' timeperiod definition
define timeperiod{
timeperiod_name nonworkhours
alias Non-Work Hours
sunday 00:00-24:00
monday 00:00-09:00,17:00-24:00
tuesday 00:00-09:00,17:00-24:00
wednesday 00:00-09:00,17:00-24:00
thursday 00:00-09:00,17:00-24:00
friday 00:00-09:00,17:00-24:00
saturday 00:00-24:00
}
# 'none' timeperiod definition
define timeperiod{
timeperiod_name none
alias No Time Is A Good Time
}
Notice that time period definitions are allowed to overlap.
For most purposes the sample configuration is pretty good, though you may want to adjust the "workhours" definition (and "nonworkhours" to match) to your local working hours. Note that, as shown above, the two do not exactly complement each other: workhours runs from 08:00 to 18:00, while nonworkhours only excludes 09:00 to 17:00. Make this edit in $NAGIOSHOME/etc/timeperiods.cfg.
If you plan to make no changes to the supplied timeperiods.cfg-sample file, just copy it to timeperiods.cfg and you're done.
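Time ranges are easy to get wrong when editing by hand. The sketch below (hypothetical name check_timeranges) flags any weekday line whose ranges are not in HH:MM-HH:MM form with the start before the end (24:00 is allowed as an end time). It only understands the sunday to saturday lines used above, which is an assumption about your file.

```sh
# Hypothetical helper: report malformed ranges on the weekday lines of
# timeperiod definitions. Prints file:line for each bad range; returns 1
# if any were found. Other directives are ignored.
# Usage: check_timeranges timeperiods.cfg
check_timeranges() {
    awk '
        # Minutes since midnight for an HH:MM string.
        function mins(t,    p) { split(t, p, ":"); return p[1] * 60 + p[2] }
        { sub(/;.*/, "") }                          # drop inline comments
        $1 ~ /^(sun|mon|tues|wednes|thurs|fri|satur)day$/ {
            rest = $0
            sub(/^[ \t]*[a-z]+[ \t]+/, "", rest)     # strip the day name
            gsub(/[ \t]/, "", rest)                  # tolerate "a, b"
            n = split(rest, ranges, ",")
            for (i = 1; i <= n; i++) {
                ok = ranges[i] ~ /^[0-9][0-9]:[0-5][0-9]-[0-9][0-9]:[0-5][0-9]$/
                if (ok) {
                    split(ranges[i], se, "-")
                    ok = mins(se[1]) < mins(se[2]) && mins(se[2]) <= 1440
                }
                if (!ok) { print FILENAME ":" FNR ": " $0; bad = 1 }
            }
        }
        END { exit bad }
    ' "$@"
}
```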
Configure contacts (contacts.cfg)
Obviously, the point of monitoring is that the relevant people
know when something isn't right. So, one thing we need to do is
to set up a list of people who will be notified in the event of
problems. For example, let's say we have seven servers: three in London (LON1, LON2 and LON3), two in New York (NY1 and NY2) and two in Hong Kong (HK1 and HK2). At each location, machine 1 is the gateway, firewall and webcache, machine 2 is the mail server, and the webserver runs on LON3. Some people in the company are responsible for particular services and hardware, and others need to know about any outage for escalation purposes.
You will need one section per person. Let's take two people: Fred Bloggs (login ID fbloggs, email address fbloggs@bigcorp.com), the operations manager, who needs to know about problems 24x7x365; and Joanna Smith (login ID jsmith, email address jsmith@bigcorp.com), a web architect, who needs to know about critical problems with her web servers on weekdays in working hours only (someone else covers at weekends) and who isn't interested in warnings.
# 'fbloggs' contact definition
define contact{
contact_name fbloggs
alias Fred Bloggs
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email
host_notification_commands host-notify-by-email
email fbloggs@bigcorp.com
}
# 'jsmith' contact definition
define contact{
contact_name jsmith
alias Joanna Smith
service_notification_period workhours
host_notification_period workhours
service_notification_options u,c
host_notification_options d,u
service_notification_commands notify-by-email
host_notification_commands host-notify-by-email
email jsmith@bigcorp.com
}
Configure contact groups (contactgroups.cfg)
In our hypothetical company, we have various functional groups
responsible for technical issues:-
Mail admins - Fred
New York admins - Fred, Joanna
... etc. and we can define these groups in the $NAGIOSHOME/etc/contactgroups.cfg
file:-
# 'mail-admins' contact group definition
define contactgroup{
contactgroup_name mail-admins
alias Mail Admins
members fbloggs
}
# 'ny-admins' contact group definition
define contactgroup{
contactgroup_name ny-admins
alias New York Admins
members fbloggs,jsmith
}
...and so on.
Configure host groups (hostgroups.cfg)
Host groups are useful to separate different physical locations,
functions and services. Hosts can be members of one or more
groups. We could group them as follows:-
Hong Kong Group: HK1,HK2
New York Group: NY1,NY2
London Group: LON1,LON2,LON3
Mail Servers: HK2,NY2,LON2
Gateways: HK1,NY1,LON1
Firewalls: HK1,NY1,LON1
Webcaches: HK1,NY1,LON1
Webservers: LON3
So the host groups give a logical layout by location and by function, making it easier to spot problems. For this example, we can specify the groups in $NAGIOSHOME/etc/hostgroups.cfg like this:
# 'hong-kong' host group definition
define hostgroup{
hostgroup_name hong-kong
alias Hong Kong Group
contact_groups hk-admins*
members HK1,HK2
}
# 'new-york' host group definition
define hostgroup{
hostgroup_name new-york
alias New York Group
contact_groups ny-admins*
members NY1,NY2
}
# 'london' host group definition
define hostgroup{
hostgroup_name london
alias London Group
contact_groups lon-admins*
members LON1,LON2,LON3
}
# 'mail' host group definition
define hostgroup{
hostgroup_name mail
alias Mail Servers
contact_groups mail-admins,hk-admins,ny-admins,lon-admins*
members HK2,NY2,LON2
}
# 'gateway' host group definition
define hostgroup{
hostgroup_name gateway
alias Gateway Servers
contact_groups infrastructure,hk-admins,ny-admins,lon-admins*
members HK1,NY1,LON1
}
# 'firewall' host group definition
define hostgroup{
hostgroup_name firewall
alias Firewalls
contact_groups security,hk-admins,ny-admins,lon-admins*
members HK1,NY1,LON1
}
# 'cache' host group definition
define hostgroup{
hostgroup_name cache
alias Webcaches
contact_groups infrastructure*
members HK1,NY1,LON1
}
# 'www' host group definition
define hostgroup{
hostgroup_name www
alias Web Servers
contact_groups infrastructure, webbies*
members LON3
}
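* The asterisks mark the contact_groups lines; they are footnote markers, not part of the configuration. Host groups take contact_groups only in Nagios 1.x. Nagios 2.0 and later do not accept it as a host group directive, so remove those lines and set contact_groups in the host definitions instead.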
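If you are moving a Nagios 1.x configuration to 2.0 or later, the contact_groups lines have to come out of the host group definitions. A minimal awk sketch, with the hypothetical name strip_hostgroup_contacts, prints the file with those lines removed from hostgroup definitions only, leaving host and service definitions alone. It assumes each define hostgroup{ line and its closing } sit on their own lines, as in the examples above.

```sh
# Hypothetical helper: print a config file with contact_groups lines removed
# from hostgroup definitions only. Other definitions are passed through.
# Usage: strip_hostgroup_contacts hostgroups.cfg > hostgroups.cfg.new
strip_hostgroup_contacts() {
    awk '
        /^[ \t]*define[ \t]+hostgroup[ \t]*[{]/ { in_hg = 1 }
        in_hg && $1 == "contact_groups" { next }   # drop inside hostgroups
        { print }
        /^[ \t]*[}]/ { in_hg = 0 }                 # end of a definition
    ' "$@"
}
```

Write the output to a new file, review it, then move it into place.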
Configure hosts (hosts.cfg)
This is the part where you tell Nagios which hosts you are interested in. In $NAGIOSHOME/etc/hosts.cfg you specify each host by IP address, give it a label, set which check command to use to test whether it is alive and, finally, set the time period to use for notifications. For example, for our company's webserver, LON3, we reference the generic host definition given at the top of the hosts.cfg-sample file (which we retain in hosts.cfg) and add the host-specific settings:
# 'LON3' host definition
define host{
use generic-host
host_name LON3
alias Solaris/Apache webserver
address 192.168.1.13
check_command check-host-alive
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,u,r
}
Now, for the status map, where you will want the map to reflect the physical layout, you can use the "parents" directive to specify which host is the parent of the one you are defining. For example, if you want the map to show LON1, LON2 and LON3 connected to a router "Route1" on the way to NY1 and NY2, you would give LON1, LON2, LON3, NY1 and NY2 the parent "Route1" in hosts.cfg (Route1 must itself be defined as a host), like this:
# 'LON3' host definition
define host{
use generic-host
host_name LON3
parents Route1
alias Solaris/Apache webserver
address 192.168.1.13
check_command check-host-alive
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,u,r
}
# 'LON2' host definition
define host{
use generic-host
host_name LON2
parents Route1
alias Solaris/Mail server
address 192.168.1.14
check_command check-host-alive
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,u,r
}
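With many hosts, writing these blocks by hand gets repetitive. The sketch below (hypothetical name gen_hosts) generates host definitions from a simple list file with one host per line in the form host_name address parent alias, using - for a host with no parent. The list format is an assumption made for this example, and the remaining directives simply copy the values used for LON2 and LON3 above.

```sh
# Hypothetical helper: generate host definitions from a list file whose lines
# are "host_name address parent alias..." ("-" = no parent; # = comment).
# Usage: gen_hosts hostlist.txt >> hosts.cfg
gen_hosts() {
    while read -r name addr parent alias; do
        case $name in ''|'#'*) continue ;; esac   # skip blanks and comments
        printf "# '%s' host definition\n" "$name"
        printf 'define host{\n'
        printf 'use generic-host\n'
        printf 'host_name %s\n' "$name"
        if [ "$parent" != "-" ]; then
            printf 'parents %s\n' "$parent"
        fi
        printf 'alias %s\n' "$alias"
        printf 'address %s\n' "$addr"
        # Remaining values copied from the LON2/LON3 examples.
        printf 'check_command check-host-alive\n'
        printf 'max_check_attempts 10\n'
        printf 'notification_interval 120\n'
        printf 'notification_period 24x7\n'
        printf 'notification_options d,u,r\n'
        printf '}\n'
    done < "$1"
}
```

A list containing the line Route1 192.168.1.1 - Core router (a hypothetical address and alias) followed by LON3 192.168.1.13 Route1 Solaris/Apache webserver would produce a definition for the router and one for LON3 with parents Route1.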
Status Map
Also in the status map, you will probably want pretty icons for each of the hosts. Download and unpack imagepak-base.tar.gz (http://prdownloads.sourceforge.net/nagios/imagepak-base.tar.gz) and copy the contents to $NAGIOSHOME/share/images/logos. Now we need to tell Nagios which icons to use for each host. In $NAGIOSHOME/etc/cgi.cfg, point to an external file that will contain the definitions (give the full path, because Nagios does not expand $NAGIOSHOME in its configuration files):
xedtemplate_config_file=/usr/local/nagios/etc/hostextinfo.cfg
and create that file, with the definitions for the hosts:
define hostextinfo{
host_name LON2
2d_coords 40,40
icon_image sun40.png
icon_image_alt Solaris/Mail server
vrml_image sun40.png
statusmap_image sun40.gd2
}
where the *_image files are chosen from those in $NAGIOSHOME/share/images/logos; note that the statusmap_image must be a .gd2 file. The 2d_coords set where the icon appears on the status map if you use a statusmap layout (set in $NAGIOSHOME/etc/cgi.cfg) that allows locations to be specified. It is a good idea to start with the default layout 5 (Circular, Marked Up), which does not require coordinates to be set. You can change the setting later (or not), when you have a better idea of where you want the hosts placed.
Configure commands (commands.cfg)
This part is quite complex, so the details are covered in a separate guide. Basically, what you need to do is look in the $NAGIOSHOME/libexec directory to see which plugins are there, check their switches and flags (usually by running the plugin with the --help option) and configure the ones you want in $NAGIOSHOME/etc/checkcommands.cfg (the file list at the start of this guide calls it commands.cfg; use whichever file your nagios.cfg references).
Here is a basic example of a command to check whether a secure (HTTPS) Apache is running on a host:
# 'check_apache' command definition
define command{
command_name check_apache
command_line $USER1$/check_https -H $HOSTADDRESS$
}
$USER1$ is a macro defined in the $NAGIOSHOME/etc/resource.cfg file; usually (and throughout this installation guide) it holds the location of the plugin executables. $HOSTADDRESS$ is a macro that Nagios replaces with the address of the host on which the service is being checked.
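Before wiring a command into Nagios, it helps to run the plugin by hand exactly as Nagios would. This sketch (hypothetical name expand_cmd) reads $USER1$ from resource.cfg and substitutes it, together with a host address, into a command_line, printing the result. It expands only those two macros, and values containing & or backslashes would need extra escaping.

```sh
# Hypothetical helper: print a command_line with $USER1$ (taken from
# resource.cfg) and $HOSTADDRESS$ substituted. No other macros are expanded.
# Usage: expand_cmd <resource.cfg> <host address> '<command_line>'
expand_cmd() {
    res=$1
    addr=$2
    line=$3
    # resource.cfg lines look like: $USER1$=/usr/local/nagios/libexec
    user1=$(sed -n 's/^\$USER1\$=//p' "$res" | tail -n 1)
    printf '%s\n' "$line" | awk -v u="$user1" -v a="$addr" '
        { gsub(/\$USER1\$/, u); gsub(/\$HOSTADDRESS\$/, a); print }'
}
```

For example, expand_cmd $NAGIOSHOME/etc/resource.cfg 192.168.1.13 '$USER1$/check_https -H $HOSTADDRESS$' prints a command you can then run as the nagios user.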
Configure dependencies
Dependencies between services can be configured in $NAGIOSHOME/etc/dependencies.cfg
For the moment, this will not be covered by this set of
guidelines.
Configure escalations
Notification escalations can be configured in $NAGIOSHOME/etc/escalations.cfg.
For the moment, this will not be covered by this set of
guidelines.
Configure resources
The $NAGIOSHOME/etc/resource.cfg file is where some common
variables and macros are defined. You can define up to 32 $USERx$
macros, which can in turn be used in command definitions in your
host config file(s). $USERx$ macros are useful for storing
sensitive information such as usernames, passwords, etc. They
are also handy for specifying the path to plugins and event
handlers - if you decide to move the plugins or event handlers
to a different directory in the future, you can just update one
or two $USERx$ macros, instead of modifying a lot of command
definitions.
Most importantly, the CGIs will not attempt to read the contents
of resource files, so you can set restrictive permissions (600
or 660) on them.
After installing Nagios, the default resource.cfg-sample file is generally good enough to be copied to resource.cfg, unless you have something special to configure.
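To confirm the restrictive permissions are actually in place, a check can be sketched with find -perm (hypothetical function name check_resource_perms). It accepts exactly mode 600 or 660, the two values suggested above, and otherwise prints a reminder to fix the mode.

```sh
# Hypothetical helper: accept resource.cfg only if its mode is exactly 600
# or 660. Uses find -perm with an exact octal mode (Linux and Solaris).
# Usage: check_resource_perms /usr/local/nagios/etc/resource.cfg
check_resource_perms() {
    f=$1
    [ -f "$f" ] || { echo "missing: $f"; return 1; }
    if [ -n "$(find "$f" -prune \( -perm 600 -o -perm 660 \))" ]; then
        echo ok
    else
        echo "$f: run chmod 600 (or 660) on it"
        return 1
    fi
}
```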
nrpe Addon Configuration in Nagios
nrpe is the commonly used agent that runs on the monitored hosts to gather local data which cannot be retrieved directly from the Nagios host (or which it makes less sense to retrieve that way).
Download a copy of nrpe-<your version>.tar.gz and untar it somewhere sensible. Now build it:
#./configure
#make all
#cp ./src/nrpe /usr/local/nagios
#cp ./src/check_nrpe /usr/local/nagios/libexec
#cp nrpe.cfg /usr/local/nagios
Add nrpe to the network services by adding the following line to /etc/services:
nrpe 5666/tcp # nrpe, nagios monitoring service
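If you set up several hosts, a version of this edit that is safe to run more than once may be handy. The sketch below (hypothetical name add_nrpe_service) appends the entry only when no line starting with nrpe is already present. It edits the file you pass it, defaulting to /etc/services, so run it as root.

```sh
# Hypothetical helper: add the nrpe entry to a services file unless an nrpe
# line already exists (safe to run repeatedly).
# Usage (as root): add_nrpe_service [/etc/services]
add_nrpe_service() {
    svc=${1:-/etc/services}
    if grep '^nrpe[[:space:]]' "$svc" >/dev/null; then
        echo "nrpe already listed in $svc"
    else
        printf 'nrpe\t\t5666/tcp\t# nrpe, nagios monitoring service\n' >> "$svc"
        echo "added nrpe to $svc"
    fi
}
```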
We have already installed the Nagios plugins package. Now configure the checks: edit nrpe.cfg to configure nrpe locally and to add any checks to run on that host (allowed_hosts is the address of your Nagios server):
allowed_hosts=10.141.145.117
command[check_data1]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /data1
command[check_data2]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /data2
command[check_mysql_5]=/usr/local/nagios/libexec/check_mysql_5 -H database.domain.uk -u nagios -p nagios -P 3309
command[check_mysql_4]=/usr/local/nagios/libexec/check_mysql_4 -H database.domain.uk -u nagios -p nagios -P 3306
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_home]=/usr/local/nagios/libexec/check_disk -w 10 -c 2 -p /home
command[check_root]=/usr/local/nagios/libexec/check_disk -w 10 -c 2 -p /
command[check_var]=/usr/local/nagios/libexec/check_disk -w 10 -c 2 -p /var
command[check_usr]=/usr/local/nagios/libexec/check_disk -w 10 -c 2 -p /usr
command[check_u01]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /u01
command[check_u02]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /u02
command[check_u03]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /u03
command[check_u04]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /u04
(The above are, of course, only example checks.) Check that nrpe responds from your main Nagios host:
#/usr/local/nagios/libexec/check_nrpe -H machine.domain.uk -c check_root
#/home/nagios/libexec/check_nrpe -H machine.domain.uk -c check_root
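The check_disk lines in the nrpe.cfg example differ only in mount point and thresholds, so they can be generated. In this sketch (hypothetical name gen_disk_checks) the command name is derived from the mount point (/ becomes check_root, /u01 becomes check_u01), and the plugin directory defaults to /usr/local/nagios/libexec unless a LIBEXEC environment variable, also an assumption of this sketch, says otherwise.

```sh
# Hypothetical helper: print nrpe.cfg command[] lines running check_disk for
# each mount point given. "/" -> check_root, "/data/db" -> check_data_db.
# Usage: gen_disk_checks <warn> <crit> /home /var /u01 >> nrpe.cfg
gen_disk_checks() {
    warn=$1; crit=$2; shift 2
    libexec=${LIBEXEC:-/usr/local/nagios/libexec}
    for mnt in "$@"; do
        if [ "$mnt" = / ]; then
            name=root
        else
            name=$(printf '%s' "${mnt#/}" | tr '/' '_')
        fi
        printf 'command[check_%s]=%s/check_disk -w %s -c %s -p %s\n' \
            "$name" "$libexec" "$warn" "$crit" "$mnt"
    done
}
```

For example, gen_disk_checks 10 2 / /home /var >> nrpe.cfg appends lines equivalent to the check_root, check_home and check_var entries above.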
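Then add services to services.cfg on your main Nagios host. The definitions below assume a service template called nrpe-service and a check_nrpe command definition whose command_line is along the lines of $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$: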
# Service definition
define service{
use nrpe-service
host_name dbdev2
service_description load
contact_groups engineers
check_command check_nrpe!check_load
}
# Service definition
define service{
use nrpe-service
host_name dbdev2
service_description /home
contact_groups engineers
check_command check_nrpe!check_home
}
...Then reload the nagios config on the Nagios host:-
#/etc/init.d/nagios reload
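Because a reload re-reads every configuration file, it is worth verifying the configuration first with nagios -v. The sketch below (hypothetical name safe_reload) calls the init script's reload only if that pre-flight check succeeds. The default paths follow this guide's layout; the NAGIOS_BIN, NAGIOS_CFG and NAGIOS_INIT overrides are assumptions of the sketch.

```sh
# Hypothetical helper: run "nagios -v" on the main config and reload only if
# it passes. Paths can be overridden through the variables below.
# Usage: safe_reload
safe_reload() {
    nagios_bin=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
    nagios_cfg=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
    init_script=${NAGIOS_INIT:-/etc/init.d/nagios}
    if "$nagios_bin" -v "$nagios_cfg" >/dev/null 2>&1; then
        "$init_script" reload
    else
        echo "config check failed; run: $nagios_bin -v $nagios_cfg" >&2
        return 1
    fi
}
```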
[* If you are checking MySQL, you might want to add a dedicated nagios database user so that you're not using real accounts:
grant select on test.* to nagios@'%' identified by 'nagios';
grant select on test.* to nagios@'dev8' identified by 'nagios';
grant select on test.* to nagios@'localhost' identified by 'nagios';]
Configure services (services.cfg)
This is quite a large part of the configuration. The basics are as follows.
In the file $NAGIOSHOME/etc/services.cfg, you specify which services are to be monitored on each host, ranging from a basic ping to checking that Apache is running, SMTP is working etc. For each server, you must specify at least a ping service. The example I'll give is generic and based on the generic-service template supplied in the file services.cfg-sample (which must be included in services.cfg if you want to reference it).
# Service definition
define service{
use generic-service
host_name $HOST1,$HOST2,$HOST3...
service_description $SERVICE
is_volatile 0
check_period $PERIOD
max_check_attempts #n
normal_check_interval #n
retry_check_interval #n
contact_groups unix-admins
notification_interval #n
notification_period $PERIOD
notification_options w,u,c,r
check_command $COMMAND!$ARGUMENTS
}
One thing to note: if you are probing the availability of machines or services that you don't own, it is probably best to set normal_check_interval to a conservative period, say 10 minutes. normal_check_interval is expressed in units of interval_length, which is set in $NAGIOSHOME/etc/nagios.cfg and defaults to 60 (seconds). So for 10 minutes, leave interval_length at the default and set normal_check_interval to 10.
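The arithmetic is simply: interval in seconds = normal_check_interval * interval_length. A small shell sketch (hypothetical name check_interval_units) converts a desired interval in minutes into the value to use, refusing intervals that are not a whole number of units.

```sh
# Hypothetical helper: print the normal_check_interval value for an interval
# given in minutes, using interval_length seconds per unit (default 60).
# Usage: check_interval_units <minutes> [interval_length]
check_interval_units() {
    secs=$(( $1 * 60 ))
    len=${2:-60}
    if [ $(( secs % len )) -ne 0 ]; then
        echo "$1 minutes is not a whole number of ${len}s units" >&2
        return 1
    fi
    echo $(( secs / len ))
}
```

With the default interval_length, check_interval_units 10 prints 10; with interval_length=30, check_interval_units 10 30 prints 20.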
Configure service groups (servicegroups.cfg, only for Nagios v2.0 or higher)
As with host groups, you can group services into logical clumps,
specifying the host and service name for each service in the
group:-
# 'Live Databases' service group definition
define servicegroup{
servicegroup_name live_db
alias Live Databases
members $HOST1,$SERVICE1,$HOST2,$SERVICE2,$HOST2,$SERVICE3,$HOST3,$SERVICE4,$HOST4,$SERVICE5
}
Service groups do not take contact_groups as a directive.
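Writing the alternating host,service list by hand is error-prone. This sketch (hypothetical name gen_sg_members) reads one host and service pair per line on standard input, treating everything after the host name as the service description, and prints a members line in the order the directive expects. Runs of spaces inside a description are collapsed to one, a limitation of the sketch.

```sh
# Hypothetical helper: turn "host service description" lines on stdin into
# a servicegroup "members host,service,host,service..." line.
# Usage: gen_sg_members < pairs.txt
gen_sg_members() {
    awk 'NF >= 2 {
             h = $1; $1 = ""; sub(/^ +/, "")      # rest of line = service
             out = out (out ? "," : "") h "," $0
         }
         END { print "members " out }'
}
```

Feeding it the lines dbdev2 load and dbdev2 /home would print members dbdev2,load,dbdev2,/home.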
Configure mail alerts (misccommands.cfg)
This is specific to Solaris. The default notification commands use mail, which does not accept the -s option under Solaris, so the subject lines of the alert emails will be blank. You need to use mailx instead, so edit $NAGIOSHOME/etc/misccommands.cfg and find these lines:
# 'notify-by-email' command definition
define command{
command_name notify-by-email
command_line /usr/bin/printf "%b" "***** Nagios 1.0 *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $DATETIME$\n\nAdditional Info:\n\n$OUTPUT$" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}
and change /usr/bin/mail to /usr/bin/mailx (make the same change in the host-notify-by-email definition). In this definition you can also configure what appears on the subject line: just modify the quoted text after mailx -s, using the macros you want to see.
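The traditional Solaris sed has no -i (in-place) option, so the edit can be sketched as a copy and rewrite (hypothetical name use_mailx). It saves the original as a .bak file and replaces every /usr/bin/mail followed by a space with /usr/bin/mailx, so it covers both the service and host notification commands if they use that path. Review the result before reloading Nagios.

```sh
# Hypothetical helper: replace "/usr/bin/mail " with "/usr/bin/mailx " in a
# command config file, keeping the original as <file>.bak.
# Usage: use_mailx $NAGIOSHOME/etc/misccommands.cfg
use_mailx() {
    f=$1
    cp "$f" "$f.bak" || return 1
    sed 's|/usr/bin/mail |/usr/bin/mailx |g' "$f.bak" > "$f"
}
```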
Troubleshooting Nagios Configuration
If you have problems with the status map, histograms etc., make sure the runtime linker can find your libraries. On Solaris, set the library search path with crle, for example:
crle -l /usr/lib:/usr/local/lib:/usr/local/ssl/lib:/opt/sfw/lib
Remember that your system may use libraries in other locations as well; take care to include those if you need to.
Also, for problems with the status map and histograms, go back to your installation of the GD, jpeg and png libraries. Did you install them in the correct order, and did gd report jpeg and png support, something like this?
** Configuration summary for gd 2.0.33:
Support for PNG library: yes
Support for JPEG library: yes
Support for Freetype 2.x library: no
Support for Fontconfig library: no
Support for Xpm library: yes
Support for pthreads: yes
If not, you may need to re-visit your gd installation.
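Start her up and see what happens
First check the configuration for errors, then start Nagios:
$NAGIOSHOME/bin/nagios -v $NAGIOSHOME/etc/nagios.cfg
/etc/init.d/nagios start
(or run $NAGIOSHOME/bin/nagios -d $NAGIOSHOME/etc/nagios.cfg to start it as a daemon directly). Then point your browser at http://yourserver/nagios/ and attempt to log in.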