While adding a couple of new Windows hosts to my monitoring network this morning, I found that my NRPE plugin checks against them were failing.
me@nagios:~$ /usr/lib/nagios/plugins/check_nrpe -H monitoredwindowsserver.vpn
CHECK_NRPE: Error - could not complete SSL handshake.
The usual culprits (a password that is set but not being used, or an allowed hosts setting that leaves out the host doing the checking) weren't the problem; both were already set correctly.
Turns out the NSClient++ folks changed up the default configs. In order to get it working again with a relatively vanilla Nagios server on the other end, I needed to set two new directives under [/settings/NRPE/server]:
verify mode = none
insecure = true
With those directives set and the NSClient++ service restarted on the Windows end, manual tests against NRPE worked properly:
me@nagios:~$ /usr/lib/nagios/plugins/check_nrpe -H monitoredwindowsserver.vpn
I (0.4.4.19 2015-12-08) seem to be doing fine...
One of these days I should figure out how to get the default modes, which use peer-to-peer certificate checks, working… but for the moment, I’m only allowing traffic over a VPN tunnel anyway, so it was more important to get it working in its existing (secured by VPN) configuration than to blow a few hours untangling the new defaults.
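For next time, here is a rough sketch of a helper that turns the handshake error into a reminder of what to check, in the order I went through it above. The function name nrpe_hint is made up for illustration and is not part of Nagios or NSClient++; it only pattern-matches the error text shown earlier.

```shell
# nrpe_hint OUTPUT
# Print a troubleshooting reminder for a line of check_nrpe output.
# Hypothetical helper: it only recognizes the handshake error from this post.
nrpe_hint() {
    case "$1" in
        *"could not complete SSL handshake"*)
            echo "handshake failed: check the password, then the allowed-hosts list, then verify mode / insecure"
            ;;
        *)
            echo "no known hint"
            ;;
    esac
}
```

I'd call it as nrpe_hint "$(/usr/lib/nagios/plugins/check_nrpe -H monitoredwindowsserver.vpn 2>&1)" so that the plugin's error text is what gets matched.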
Wasn’t out of the woods yet, though – that got NRPE working, but not NSClientServer, which is what I’m actually using to monitor these Windows hosts for the most part. So I was still seeing “CRITICAL – Socket timeout after 10 seconds” on a lot of tests against the new hosts in Nagios. Doing a netstat -an on the Windows hosts themselves showed that they were listening on 5666 – the NRPE port – but nothing was listening on 12489, the NSClientServer port. This required another fix in the nsclient.ini file. Just underneath [/modules], you’ll need to add (not just uncomment!) this line:
NSClientServer = enabled
Then restart the NSClient++ service again, either from the Services applet or with net stop nscp ; net start nscp. Now we test the check_nt plugin against the host…
me@nagios:~$ /usr/lib/nagios/plugins/check_nt -H monitoredwindowsserver.vpn -v UPTIME -p 12489
System Uptime - 0 day(s) 1 hour(s) 34 minute(s)
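The netstat check I did on the Windows hosts can also be approximated from the Nagios side, which would have shown the missing 12489 listener sooner. This is a minimal sketch, assuming a bash built with /dev/tcp redirection support and the coreutils timeout command; port_open and check_ports are hypothetical helper names of my own, not Nagios tools.

```shell
# port_open HOST PORT
# Return 0 if a TCP connection to HOST:PORT succeeds within 5 seconds.
# Assumes bash with /dev/tcp support and coreutils `timeout` are installed.
port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# check_ports HOST PORT...
# Report each port as open, or as closed or filtered, one line per port.
check_ports() {
    local host="$1"
    shift
    local port
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "port $port: open"
        else
            echo "port $port: closed or filtered"
        fi
    done
}
```

Running check_ports monitoredwindowsserver.vpn 5666 12489 probes the NRPE and NSClientServer ports in one go. A "closed or filtered" result can't tell a missing listener apart from a firewall drop, so netstat -an on the Windows host is still the authoritative check.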
Now, finally, all of my tests are working. Hope this saves somebody else from having the kind of morning I had!