Node Monitoring

This page brings together information on the monitoring of nodes. The idea is to do something like MississippiMonitoring for the rest of our nodes so that we can at least see how much they are used. Ideally, we would also get some indication of when they break, and some information to help us diagnose why they broke. It would also be nice if most of this information lived in one place, since we are very lazy.

Note: This is a work in progress.

Historical Discussion

In the past, we have used ["Nagios"] and sometimes ["SNMP"] to monitor nodes, although neither is currently in use (they are still listed on ToDoList, for instance). Some information can be garnered from the NoCat status page on port 5280, and a bit more by manually logging into the box, but this is tedious and only gives a snapshot. SystemWideStatistics details one idea that was never followed through. NodeMississippi monitors its nodes using cacti and snmpd, and it works quite well. WifiDog uses heartbeats to track whether a host is up and to keep some statistics. While this is a great advance, it is only useful on our few nodes that use WifiDog, and it isn't quite as information-rich or flexible as we would like.

Current Approach

The current approach is to run snmpd (net-snmp, read-only, limited to chevy.personaltelco.net) on our internet-accessible NuCabs and then aggregate the information using [http://cacti.net Cacti]. We already have a cacti instance on chevy for MississippiMonitoring, so it is a pretty good candidate for initial experimentation, although at some point we might want to run cacti elsewhere (maybe on biker.personaltelco.net).

Here is a tiny script that you can run on a NuCab to get a count of currently associated clients. It counts the NoCat rules in the mangle table that set mark 0x3:

echo $((`iptables -L NoCat -t mangle -n | grep "MARK set 0x3" | wc -l`))
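The one-liner above could also be sketched as a small helper that keeps the counting separate from the iptables call, so it can be tried against saved output without root access. The function name count_marked is made up for illustration; only the "MARK set 0x3" pattern comes from the one-liner.

```shell
#!/bin/sh
# Sketch only: count_marked is a hypothetical helper name.
# Reads `iptables -L` style output on stdin and prints how many lines
# contain "MARK set 0x3" (the pattern used by the one-liner above).
count_marked() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so "|| true" keeps the exit status at 0.
    grep -c "MARK set 0x3" || true
}

# Live use on a node (needs root):
#   iptables -t mangle -L NoCat -n | count_marked
```

Feeding it a saved copy of the iptables listing should give the same number as the one-liner on that listing.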

Here is an example snmpd.conf file. The 64.105.215.242 address is chevy.personaltelco.net. I put the IP there because it wasn't working with a DNS name, but I'm not sure why.

rocommunity sPecial0ps 64.105.215.242
rocommunity public 127.0.0.1
exec assoc_count /usr/local/bin/assoc_count
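With a config like the one above, the count should be readable from chevy with the net-snmp snmpget tool. The following is only a sketch: it assumes assoc_count is the first (and only) exec entry, so that its output shows up at UCD-SNMP-MIB::extOutput.1, and the function names fetch_assoc_count and clean_count are made up for illustration.

```shell
#!/bin/sh
# Sketch only. Assumes the exec line is the first exec entry in snmpd.conf,
# which puts its output at UCD-SNMP-MIB::extOutput.1 (numeric OID below).
EXT_OUTPUT_OID=".1.3.6.1.4.1.2021.8.1.101.1"

# Hypothetical helper: strip quotes and whitespace from the value printed
# by snmpget and pass it through only if it is a plain integer.
clean_count() {
    tr -d '" \r\n' | grep -E '^[0-9]+$'
}

# Hypothetical helper: query one node.
#   $1 = node hostname or IP, $2 = read-only community string
fetch_assoc_count() {
    # -v 2c selects SNMP v2c, -c gives the community, -Oqv prints only the value.
    snmpget -v 2c -c "$2" -Oqv "$1" "$EXT_OUTPUT_OID" | clean_count
}

# Example call, run from the host allowed by rocommunity:
#   fetch_assoc_count some-node.example.net COMMUNITY
```

If the node returns an error string instead of a number, clean_count prints nothing and exits non-zero, which makes failures easy to spot in a polling script.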

Nodes To Do

These are nodes that are compatible with the above plan and just need to be set up.

Nodes Done