The FreeBSD Diary |
(TM) | Providing practical examples since 1998
NRPE: Unable to read output - The followup
15 October 2010
Last week, I wrote about a problem with NRPE reporting NRPE: Unable to read output. Harold Paulson encountered the same issue yesterday. I was able to help him debug it today. We found a solution: a full path to sudo.

We both had the same situation. Checking from the Nagios server:

    $ /usr/local/libexec/nagios/check_nrpe2 -H kraken -c check_smartmon_ada8
    NRPE: Unable to read output

But when running locally on the Nagios client (NOTE: I amended the shell for nagios to /bin/sh for this test):

    # su -m nagios -c 'sudo /usr/local/libexec/nagios/check_smartmon -d /dev/ada8'
    OK: device is functional and stable (temperature: 35)|TEMP=35;55;60;

If you don't make that temporary shell adjustment, you'll get this instead:

    # su -m nagios -c 'sudo /usr/local/libexec/nagios/check_smartmon -d /dev/ada8'
    UNKNOWN: no read permission given

We couldn't figure this out. Then, while looking at /usr/local/etc/nrpe.cfg and searching for sudo, I found:

    # *** THIS EXAMPLE MAY POSE A POTENTIAL SECURITY RISK, SO USE WITH CAUTION! ***
    # Usage scenario:
    # Execute restricted commmands using sudo. For this to work, you need to add
    # the nagios user to your /etc/sudoers. An example entry for alllowing
    # execution of the plugins from might be:
    #
    # nagios ALL=(ALL) NOPASSWD: /usr/local/libexec/nagios/
    #
    # This lets the nagios user run all commands in that directory (and only them)
    # without asking for a password. If you do this, make sure you don't give
    # random users write access to that directory or its contents!
    # command_prefix=/usr/local/bin/sudo

My clue was the full path. I figured we were onto something. In the meantime, Harold had restarted nrpe and discovered that this resolved his particular problem. I then recalled that the same restart had fixed something for me too. I decided to restart my system to reproduce the problem Harold had just fixed. After the reboot, Nagios was indeed reporting the same error messages.
It was about that time that I recalled this entry in /etc/crontab, which attempted, but failed, to solve the problem:

    # for some reason, nrpe2 doesn't start right on boot
    @reboot root /bin/sleep 600 && /usr/local/etc/rc.d/nrpe2 restart

I recall now that this never did resolve the issue. I always had to restart it by hand. We were about to confirm our suspicions.

First, I mounted procfs so I could use the -e option on ps to view the environment variables of the nrpe process:

    # mount -t procfs proc /proc

Then I looked at the existing nrpe process, which was producing the error:

    $ sudo ps -auxwe -p 1104
    USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
    nagios 1104 0.0 0.1 11060 3300 ?? Ss 1:16PM 0:00.02 HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin RC_PID=24 PWD=/ /usr/local/sbin/nrpe2 -d -c /usr/local/etc/nrpe.cfg

Then I restarted nrpe and issued the same command for the new process:

    $ sudo ps -auxwe -p 3619
    sudo: cannot get working directory
    USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
    nagios 3619 0.0 0.1 11060 3372 ?? Ss 1:25PM 0:00.00 SUDO_GID=1001 USER=root MAIL=/var/mail/root HOME=/root SUDO_UID=1001 LOGNAME=root USERNAME=root TERM=xterm PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/home/dan/bin RC_PID=3600 SUDO_COMMAND=/usr/local/etc/rc.d/nrpe2 restart SHELL=/bin/sh SUDO_USER=dan PWD=/proc/1104 /usr/local/sbin/nrpe2 -d -c /usr/local/etc/nrpe.cfg

As you can see, /usr/local/bin is not in the PATH in the first output, but it is in the second. This explains why the errors go away after nrpe is restarted by hand: sudo lives in /usr/local/bin, so the nrpe process started at boot cannot find it by name alone.

I then amended nrpe.cfg to contain a full path to sudo:

    command[check_smartmon_ada8]=/usr/local/bin/sudo /usr/local/libexec/nagios/check_smartmon -d /dev/ada8

After a reboot of the system, no more nrpe errors. :)

Why? When processes are run from init and cron, they exist in a very sparse environment. This is by design. Thus, full paths are often needed, and they are a very good design choice in any case.
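The effect of the sparse PATH can be reproduced without a reboot. What follows is a minimal sketch, not part of the original debugging session. The tool name demo_tool and the temporary directory are hypothetical stand-ins for sudo and /usr/local/bin. The boot-time PATH value is the one shown in the first ps output. It uses env -i to start a shell with nothing but that PATH, then asks the shell to look the tool up by bare name and by full path.

```shell
#!/bin/sh
# Sketch: command lookup under a sparse, boot-time style PATH.
# "demo_tool" and the temp directory are hypothetical stand-ins for
# sudo and /usr/local/bin; nothing here touches nrpe or sudo itself.

tool=demo_tool
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/pathdemo.XXXXXX")
printf '#!/bin/sh\necho "demo_tool ran"\n' > "$tmpdir/$tool"
chmod +x "$tmpdir/$tool"

# PATH of the nrpe process started at boot (from the first ps output).
boot_path=/sbin:/bin:/usr/sbin:/usr/bin

# 1. Bare name, sparse PATH: the lookup fails with a non-zero status.
sparse_status=0
sparse_result=$(env -i PATH="$boot_path" /bin/sh -c "command -v $tool") \
    || sparse_status=$?

# 2. Bare name, PATH that includes the tool's directory, as happens
#    when the daemon inherits a login shell's PATH on a manual restart.
restart_result=$(env -i PATH="$boot_path:$tmpdir" /bin/sh -c "command -v $tool")

# 3. Full path, sparse PATH: found no matter what PATH contains.
fullpath_result=$(env -i PATH="$boot_path" /bin/sh -c "command -v $tmpdir/$tool")

echo "sparse PATH, bare name : status $sparse_status, result '$sparse_result'"
echo "longer PATH, bare name : $restart_result"
echo "sparse PATH, full path : $fullpath_result"

rm -rf "$tmpdir"
```

Only the first lookup fails. That matches what happened here: the bare sudo in the command definition worked once nrpe had been restarted from a login shell, while the full path works in either environment.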