Interfacing SilentJack and Nagios

So, silence detection is a big deal when it comes to monitoring broadcast audio systems. You want to be sure your stuff is making noise. If your sustainer’s not putting anything out, it’s not a lot of good.

SilentJack is an awesome little utility from the king of ‘oh, that’s a handy little program for broadcast’, Nicholas Humfrey. This guy’s getting a beer if I ever meet him. But it’s not a simple drop-in tool for monitoring, sadly – we need to do a bit of work to make it so.

We use Nagios at Insanity – it wasn’t made for broadcasting shops, but it’s perhaps the best thing out there for monitoring large complex systems. And let’s face it, even simple stations are. For Insanity we have 17 hosts (one of which is a virtual host representing our hardware silence detector) and 95 services configured. All of those are prodded every few minutes to check they’re alive and working. The sorts of things we monitor include load averages of Linux boxes, disk space, process counts, pings, time synchronization, HTTP responses (on our online streaming servers) and POP3/IMAP/SMTP responses. For the Windows boxes we also check if Myriad is running (though this is a bit useless: if Myriad’s crashed, the process will still be running, just frozen). We also monitor things like SNMP responses from our switch, and use cluster checks to monitor dumb switches without SNMP support. GPIO monitoring at the moment is done with some kinda awkward Arduino scripting. More on that in another post.

Anyway, back to the task at hand. SilentJack lets us run a command after a given amount of silence. It will wait till the command finishes, and then go back to listening for silence.

This is awkward. What we really want is one command to run on silence, and another to run once we’ve had good audio for a certain period. Well, we can almost do that with what we’ve got. Perhaps the ideal solution is some hacking on the source for silentjack – I’ll have a look at that in a bit…

Here’s what we have for now, though. We’ve got Redis running at Insanity, which is a very lightweight key-value server. Go grab it and install it. Now we’ll need some scripts!

So, this is a tiny little script that does only one thing – it spawns our main script in a screen window, and takes very little time to do it. This works around the fact that we can’t launch a long-running process from silentjack without stopping it listening to the audio, because silentjack waits for the command to finish. Hence, silenthack.sh! Note you need GNU Screen to run the above, packaged as ‘screen’ on most distros.

Now we need something a little more heavyweight. Break out python, send_nsca and redis! We’re going to construct a small script that runs for 12 seconds; silentjack will be set to fire after 10 seconds of audio silence, so during a continuing outage a new copy of the script starts before the previous one has finished. The script uses a mutex lock to see if another copy has been started since it started; this tells us whether there’s a continuing outage of audio. With that knowledge we can both maintain a simple flag in Redis (to be queried by Nagios for active checks) and send a passive check result to Nagios.

We always send the failure result each time the script fires, to let Nagios know that we’ve still got a problem, but we only send the “All okay” result if we don’t see silentjack spawn another process after we’ve waited for longer than the silentjack silence duration. This keeps Nagios nice and clean and ensures you don’t get unneeded flapping and loads of emails.
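As a rough illustration of the lock-and-flag logic just described, here is a minimal sketch. It is not the exact script we run: the lock key name `silentjack_lock`, the NSCA server `nagios.example.org`, the config path `/etc/send_nsca.cfg` and the message texts are all assumptions made for the example. The `silentjack_silent` flag and the `paxman` / `SilenceDetector` names match the Nagios service definition further down. Redis and send_nsca are passed in as arguments so the logic can be tried out without either installed.

```python
import subprocess
import time
import uuid

FLAG_KEY = "silentjack_silent"      # flag checked by the Nagios service
LOCK_KEY = "silentjack_lock"        # hypothetical key holding the mutex token
WAIT_SECONDS = 12                   # longer than silentjack's 10 s silence period
NAGIOS_HOST = "paxman"              # host_name in the service definition
NAGIOS_SERVICE = "SilenceDetector"  # service_description in the service definition


def nsca_line(code, message):
    """Build one tab-separated passive check result for send_nsca's stdin."""
    return "%s\t%s\t%d\t%s\n" % (NAGIOS_HOST, NAGIOS_SERVICE, code, message)


def handle_silence(store, send, sleep=time.sleep):
    """Run once per silentjack trigger. Returns True if the all-clear was sent."""
    token = uuid.uuid4().hex
    store.set(LOCK_KEY, token)      # claim the lock; any later copy overwrites it
    store.set(FLAG_KEY, "True")
    send(nsca_line(2, "CRITICAL - silence detected"))  # always report the failure

    sleep(WAIT_SECONDS)

    if store.get(LOCK_KEY) != token:
        return False                # a newer copy has started: still silent
    store.set(FLAG_KEY, "False")
    send(nsca_line(0, "OK - audio levels restored"))
    return True


def main():
    import redis  # imported here so the logic above works without the package

    store = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def send(line):
        # Assumed NSCA server and config path; change both for your setup.
        subprocess.run(
            ["send_nsca", "-H", "nagios.example.org", "-c", "/etc/send_nsca.cfg"],
            input=line, text=True, check=True,
        )

    handle_silence(store, send)
```

To use it as a script, call `main()` at the bottom of the file. The check-then-set at the end is not atomic, which is fine for one silentjack instance firing every ten seconds or so, but worth knowing about.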

The script to do this is above. Note you’ll have to change a few parameters here and there, and you need the send_nsca command-line utility installed, not to mention the nsca daemon running on another machine and all that. You’ll also need the python Redis package. But if you’re using Nagios already you’ll probably already have that lot, and if you’re not, it’s pretty simple – the Nagios documentation is great, and your distro probably already has packages.

So, this now gives us a rather nice simple display of our levels being okay. But don’t forget to set up a process check to ensure that silentjack is actually running. Once you’ve done that, you’re sorted. You can also whip up a quick Nagios plugin to check the flag actively.

There’s what I use – it’s quick and dirty, but it works. We simply check to see if the silentjack_silent flag is true or false. Then all that remains is to define the command in Nagios…
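For anyone building their own, a plugin along these lines could look like the sketch below. It is an illustration rather than the plugin we actually run: it assumes Redis on localhost with the default port, takes the key and the expected value as its two arguments (matching `$ARG1$` and `$ARG2$` in the command definition), and treats a missing key or an unreachable Redis as UNKNOWN, which is my own choice.

```python
import sys

OK, CRITICAL, UNKNOWN = 0, 2, 3  # standard Nagios plugin exit codes


def check_flag(store, key, expected):
    """Compare a stored value with the expected one; return (exit_code, message)."""
    try:
        value = store.get(key)
    except Exception as exc:     # e.g. Redis is down or unreachable
        return UNKNOWN, "UNKNOWN - could not read %s: %s" % (key, exc)
    if value is None:
        return UNKNOWN, "UNKNOWN - %s is not set" % key
    if value == expected:
        return OK, "OK - %s is %s" % (key, value)
    return CRITICAL, "CRITICAL - %s is %s, expected %s" % (key, value, expected)


def main(argv):
    import redis  # imported here so check_flag works without the package

    if len(argv) != 3:
        print("UNKNOWN - usage: check_redis KEY EXPECTED")
        return UNKNOWN
    # Assumed connection details; adjust host and port for your Redis server.
    store = redis.Redis(host="localhost", port=6379, decode_responses=True)
    code, message = check_flag(store, argv[1], argv[2])
    print(message)               # Nagios shows the first line of output
    return code
```

Finish the file with `sys.exit(main(sys.argv))` so the exit code reaches Nagios, then drop it into the plugin directory as `check_redis` and make it executable.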

define command{
        command_name    check_redis
        command_line    $USER1$/check_redis $ARG1$ $ARG2$
        }

And to define the service, making sure we enable passive checks…

define service{
        use                     generic-service
        host_name               paxman
        service_description     SilenceDetector
        check_command           check_redis!silentjack_silent!False
        passive_checks_enabled  1
        }

And now to start silentjack, here using Rivendell’s main output.

screen -dmS sj silentjack -l -30 -c rivendell_0:playout_0L -p 10 -v \
         /home/insanity/rdscripts/silenthack.sh

Voila! We’ve now got both active and passive monitoring enabled for our silence detector in software. It’s a bit rough around the edges – I’d love any suggestions on how to improve it.
