Node Monitoring

Motivating Image

This page is an attempt to bring together information on the monitoring of nodes. The idea is to do something like MississippiMonitoring for the rest of our nodes so that we can get some information, at least, about how much they are used. Ideally, we might have some indication of when they break too, and some information to help us diagnose why they have broken. And, it would be nice if most of this information existed in one place, since we are very lazy.

Note: This is a work in progress.

Historical Discussion

Some information can be garnered from the NoCat status page on port 5280, and a bit more by manually logging into the box, but this is tedious, and it is only a snapshot. SystemWideStatistics details one idea which was never followed through. NodeMississippi monitors its nodes using cacti and snmpd, and it works quite well. WifiDog uses heartbeats to monitor the upness of a host and keep some statistics, and while this is a great advance, it is only useful on our few nodes that use WifiDog, and isn't quite as information rich or flexible as we would like.

Current Approach

The current approach is to run snmpd (net-snmpd, read-only, limited to …) on our internet-accessible NuCabs and then aggregate the information using Cacti. We already have a cacti instance on chevy for MississippiMonitoring, so it is a pretty good candidate for initial experimentation. At some point, though, we might want to run cacti elsewhere (maybe on

You can paste the following into a (root) shell and it will save a little script at /usr/local/bin/assoc_count that gives an approximation of an active client count:

echo '#!/bin/bash
echo $(($(iptables -n -L NoCat -t mangle | grep "MARK set 0x3" | wc -l)))
' > /usr/local/bin/assoc_count
chmod 755 /usr/local/bin/assoc_count

Or, for a node that uses WifiDog, the script looks like:

echo $((`iptables -n -L WiFiDog_Outgoing -v -t mangle | grep 'MARK set 0x2' | wc -l`))

One possible limitation of these scripts is that they only count a client as "gone" when NoCat times out their lease. Often, NoCat is configured for VeryLongTimeouts and isn't good about checking the ARP table for early leavers. Still, it is the best solution I can think of at the moment. Perhaps a second version could cross-check the ARP table for activity and exclude those MACs that are clearly inactive.
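The cross-check idea might look something like the following Ruby sketch. It is only a rough illustration, not something running on any node: it assumes the NoCat rules in the mangle table carry a "MAC xx:xx:..." match on the same line as "MARK set 0x3", and it treats any MAC with a non-zero Flags column in /proc/net/arp as still present. The function names are made up for this example. Since the kernel keeps ARP entries around for a while after a client leaves, this would still overcount a little.

```ruby
# Sketch only: count NoCat-marked clients that also appear in the ARP table.
# Assumption: each marked rule line contains the client's MAC address.
MAC_RE = /\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b/

# MACs of clients that have a "MARK set 0x3" rule in the iptables listing.
def marked_macs(iptables_output)
  iptables_output.each_line
                 .select { |l| l.include?("MARK set 0x3") }
                 .map { |l| l[MAC_RE] }
                 .compact.map(&:downcase).uniq
end

# MACs with a resolved entry in a /proc/net/arp style table.
# Columns: IP address, HW type, Flags, HW address, Mask, Device.
# Flags of 0x0 means the entry is incomplete, so it is skipped.
def live_macs(arp_table)
  arp_table.each_line.drop(1).map(&:split)
           .select { |f| f.length >= 4 && f[2] != "0x0" }
           .map { |f| f[3].downcase }
end

# Clients that are both marked by NoCat and visible in the ARP table.
def active_client_count(iptables_output, arp_table)
  (marked_macs(iptables_output) & live_macs(arp_table)).length
end

# Hypothetical wiring for use on a node (needs root for iptables).
def assoc_count_live
  rules = `iptables -n -L NoCat -t mangle`
  active_client_count(rules, File.read("/proc/net/arp"))
end
```

Keeping the parsing in functions that take plain strings means the logic can be tried out on saved output without touching a live box.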

Here is an example snmpd.conf. The 64... IP is there because it wasn't working with a DNS name, but I'm not sure why.

cat > /etc/snmp/snmpd.conf <<EOF
rocommunity sPecial0ps
rocommunity public

# Use exec to pull up the association count
exec assoc_count /usr/local/bin/assoc_count
# OID =

# Or, alternately, you can use 'extend' instead of 'exec'
# extend assoc-count /usr/local/bin/assoc_count
# OID = .
EOF

You can test that it is working on the box using this command:

snmpget -c public -v 1 localhost

Or, from chevy with:

snmpget -c sPecial0ps -v 1
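If you want to poll the count from a script rather than by hand, something like this Ruby sketch might do. It is an illustration under assumptions, not part of our setup: the helper names are invented, the host, community and OID are whatever you pass in, and it relies on net-snmp's -Oqv output option so that snmpget prints only the value. Depending on which MIBs are loaded, the value may come back wrapped in double quotes, so those are stripped before conversion.

```ruby
require "open3"

# Convert the bare value printed by `snmpget -Oqv` into an Integer.
# The value may or may not be wrapped in double quotes.
def parse_count(raw)
  Integer(raw.strip.delete('"'), 10)
end

# Hypothetical helper: fetch one OID over SNMP v1 and return it as a count.
# Raises if snmpget exits with an error or the value is not a number.
def poll_assoc_count(host, community, oid)
  out, status = Open3.capture2("snmpget", "-v", "1", "-c", community,
                               "-Oqv", host, oid)
  raise "snmpget failed for #{host}" unless status.success?
  parse_count(out)
end
```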


This ruby script will give you the proper OID for an extend object given the label. For instance, to get the OID listed above, I did:

./snmp_oid_extend.rb "assoc-count" 1

# Will give you the full OID for an "extend" executed
# snmpd command which will be part of NET-SNMP-EXTEND-MIB
# and is part of the nsExtendOutLine."somelabel" object.
# For instance, suppose you have this line in your snmpd.conf:
# extend qmail-smtp-concurrency /usr/local/bin/qmailmrtg7 t /var/log/qmail/qmail-smtpd
# And, you want to get the output remotely, you can do:
# snmpget -v 1 -c public localhost `snmp_oid_extend.rb "qmail-smtp-concurrency"`
# Basically, the format for an OID of a label indexed item is:
# <parent_oid>.<label strlen>.<label chars as ascii decimal codes>.<line>
# You can find out different parent OIDs using this command:
# snmptranslate -Td -OS <some textual OID>
# The numeric parent OID will be in the last line, like:
# ::= { iso(1) org(3) dod(6) internet(1) private(4) enterprises(1) netSnmp(8072) netSnmpObjects(1) nsExtensions(3) 
# nsExtendObjects(2) nsExtendOutput2Table(4) nsExtendOutput2Entry(1) 2 }
# Where you extract the numeric OID by taking all those integers in order and
# concatenating them, delimiting with periods
# And, you can get a list of textual OIDs with this command (for the extend MIB at least)
# snmpwalk -c public -v 1 localhost NET-SNMP-EXTEND-MIB::nsExtendObjects
# Author: Caleb Phillips
# License: Beer Ware v.42

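# Numeric OID of nsExtendOutLine, read off the snmptranslate output quoted above
PARENT_OID = ".1.3.6.1.4.1.8072.1.3.2.4.1.2"

def usage; puts "Usage: ./snmp_oid_extend.rb <label> [line-number]"; exit 1; end
label = ARGV[0]
line = ARGV[1].nil? ? "1" : ARGV[1]
usage if label.nil?
# String#bytes gives integer character codes; label[i] only did that on
# Ruby 1.8 and returns a one-character string on 1.9 and later.
puts PARENT_OID +
     "." + label.bytesize.to_s +
     "." + label.bytes.to_a.join(".") +
     "." + line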
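As a sanity check on the index format described in the script's comments (label length, then the ASCII codes of the label, then the line number), here is a small sketch that builds just that index suffix and then decodes it again. The function names encode_index and decode_index are made up for this example, and the parent OID is deliberately left out.

```ruby
# Index suffix for a label-indexed row: <strlen>.<ASCII codes>.<line>
def encode_index(label, line = 1)
  ([label.bytesize] + label.bytes.to_a + [line]).join(".")
end

# Inverse of encode_index: returns [label, line] for a suffix string.
def decode_index(suffix)
  parts = suffix.split(".").map { |p| Integer(p, 10) }
  len = parts.first
  # Expect one length prefix, len character codes and one line number.
  raise ArgumentError, "length prefix does not match" unless parts.length == len + 2
  [parts[1, len].pack("C*"), parts.last]
end

puts encode_index("assoc-count")
# prints 11.97.115.115.111.99.45.99.111.117.110.116.1
```

Decoding the suffix that comes back from an snmpwalk is a quick way to confirm which extend label a numeric row belongs to.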

NodeMonitoring (last edited 2009-08-15 21:04:09 by JasonMcArthur)