Mississippi Network Diagram and Configuration
Buick serves DHCP and does NAT for our DSL line. All of the metrix boxes run STP (spanning-tree protocol). The whole network is bridged and sits in the 10.11.104.0/22 network.
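As a quick sanity check on the addressing, here is a minimal Python sketch of what the 10.11.104.0/22 block covers. It uses only the standard `ipaddress` module; the helper name `in_network` is made up for illustration.

```python
import ipaddress

# The bridged network as stated above.
NETWORK = ipaddress.ip_network("10.11.104.0/22")


def in_network(addr: str) -> bool:
    """Return True if addr falls inside the bridged /22 (hypothetical helper)."""
    return ipaddress.ip_address(addr) in NETWORK


if __name__ == "__main__":
    print("netmask:        ", NETWORK.netmask)
    print("addresses:      ", NETWORK.num_addresses)
    print("first host:     ", NETWORK[1])
    print("last host:      ", NETWORK[-2])
    print("broadcast:      ", NETWORK.broadcast_address)
    # buick and one of the metrixes, as listed in the tables on this page
    for addr in ("10.11.104.1", "10.11.104.9"):
        print(addr, "in network:", in_network(addr))
```

A /22 spans four consecutive /24s, so there is plenty of room beyond the handful of 10.11.104.x addresses assigned so far.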
NEWS
November 21, 2005: RussellSenior built a freshened kernel (2.6.14.2) and madwifi-ng (rev 1329), installed them on metrix-naya-sw, metrix-commons, and metrix-west, and rebooted. The new madwifi-ng rev was built in a metrix-compatible chroot environment and so the madwifi-utils in /usr/local/bin are now linked properly. The other two metrixes, metrix-naya-nw and metrix-ed are still running the original 2.6.12.3-metrix kernel and the WDS-branch madwifi drivers from late July. It is possible to connect with essentially zero packet loss from buick to metrix-west if you simultaneously "ping -f 10.11.104.2" from buick. Metrix-west was apt-get upgraded that way.
November 20, 2005: RussellSenior thinks he's figured out what is going wrong. It is an effect of client-node to client-node traffic that needs to pass through one of the client bridges. As mentioned earlier, when a client-node sends to a client-node, it sees the traffic twice: once when it sends it and once (in promiscuous mode) when the master rebroadcasts it. When the traffic comes from the other side of the bridge (say, from buick on eth0), the bridge sees the rebroadcast of the packet it just sent, with buick's SRC MAC arriving on ath0, and reassigns that MAC to the bridge port associated with ath0 instead of eth0. When packets return headed for that MAC, they get to the bridge and the bridge fails to deliver them to the port where that MAC actually lives. Boom. This problem does not occur when communicating client-to-master (or master-to-client), because these packets are not rebroadcast. The problem doesn't occur when the communication is strictly client-to-client either, because even though the client still sees the rebroadcast packet, the bridge is smart enough not to reassign local MAC addresses to a different port.
RussellSenior tested this model this morning by ping flooding from buick to metrix-commons (thus keeping metrix-naya-sw's bridge refreshed with where buick's MAC should properly live) while pinging the problematic metrix-west. Still some lossage, but far less than the usual 98%, only about 17%.
Now the question is, what is the solution? One temporary solution might be to use ebtables filtering to drop packets at metrix-naya-sw where buick's MAC shows up on ath0 as a SRC MAC. But there are other situations where we'll see the same phenomenon, e.g. 11b/g clients of the 11a client nodes. The real solution is to get the sending bridges to ignore the rebroadcasts altogether.
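The failure model described in this entry can be sketched as a toy Python simulation. This is not the kernel's bridge code, just a minimal model of a MAC learning table under the assumptions stated in the entry. The `blocked` set is a hypothetical stand-in for the proposed ebtables rule, and the class and function names are made up for illustration.

```python
BUICK_MAC = "08:00:20:c5:9a:5c"         # buick eth1, from the MAC table
NAYA_SW_ATH0_MAC = "00:02:6F:21:EC:AA"  # metrix-naya-sw ath0, from the MAC table


class LearningBridge:
    """Toy learning bridge: remembers the port each source MAC was last seen on."""

    def __init__(self, local_macs=(), blocked=()):
        self.local_macs = set(local_macs)  # the bridge's own interface MACs
        self.blocked = set(blocked)        # (src_mac, in_port) pairs to drop
        self.table = {}                    # learned MAC -> port

    def see_frame(self, src_mac, in_port):
        if (src_mac, in_port) in self.blocked:
            return  # dropped before learning (assumed filter behaviour)
        if src_mac in self.local_macs:
            return  # local addresses are never reassigned to another port
        self.table[src_mac] = in_port

    def port_for(self, dst_mac):
        return self.table.get(dst_mac)


def simulate(blocked=()):
    """Replay one ping request from buick through metrix-naya-sw's bridge."""
    bridge = LearningBridge(local_macs={NAYA_SW_ATH0_MAC}, blocked=blocked)
    bridge.see_frame(BUICK_MAC, "eth0")  # request arrives from buick on the wire
    bridge.see_frame(BUICK_MAC, "ath0")  # master rebroadcasts it; heard on ath0
    return bridge.port_for(BUICK_MAC)    # where a reply to buick would be sent


if __name__ == "__main__":
    print("unfiltered:", simulate())
    print("filtered:  ", simulate(blocked={(BUICK_MAC, "ath0")}))
```

In the unfiltered run the table ends up pointing buick's MAC at ath0, which is the misdelivery described above. With the rebroadcast dropped, it stays on eth0. Whether a real ebtables rule would take effect before the bridge learns the address is an open question this sketch does not answer.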
November 19, 2005: DonPark and RussellSenior climb up on Mississippi Commons and collect some more data. Russell collects some kismet data during some ping tests, but misconfiguration of his kismet rig reduces its utility to near zero. However, a careful examination of the tcpdumps from the metrixes yields the insight needed to figure out what is going wrong. The failure in the ping request/reply loop between metrix-west and buick always occurs in the delivery of packets from metrix-naya-sw to buick. ARP traffic gets delivered fine in both directions, but ICMP and other IP traffic disappears after it arrives at naya-sw on its way to buick... buick never sees it. Why?
November 17, 2005: TroyJaqua and RussellSenior visited Cecily's to try to recover metrix-west from a misconfigured network. Unfortunately, we were unable to connect via ethernet either, so at about 3pm we came back, equipped with a ladder generously loaned by a neighbor, and Troy climbed up and swapped out the metrix motherboard with one configured with Ben's firmware. This was deemed to be the most practical solution, given the difficulties of getting the serial cable onto the DB9 pins and logging in while balancing on the crest of the roof. The radios on metrix-west remain the same as before; only the motherboard and its flash and ethernet are changed. See the updated MAC address in the table below. Also did some testing from metrix-west. One interesting result was that pinging from metrix-west to buick disrupted a ping from metrix-commons to buick. Still need to get onto the Commons roof to collect some over-the-air packets. Hopefully tomorrow. Ebtables may be our salvation. We are getting closer, but still haven't cracked it yet. The 11b/g radio was put on essid notyet.personaltelco.net to indicate it isn't actually working yet. I said we'd switch back to www when it was active and working. Talked to a few residents who were enthusiastic to dump their $60/month broadband.
November 11, 2005: I think I've figured out the "received packet with own address as source address" messages. They are only appearing on metrix-west and metrix-naya-sw. I think they are a consequence of having bridges on nodes in managed-mode. The master-mode node rebroadcasts frames sent via it, and bridging puts the interfaces in promiscuous mode, so the sender is hearing the rebroadcast. The messages are therefore, presumably, innocuous. --RussellSenior
RussellSenior, MichaelWeinberg, and I got together today and did some more discussion and testing regarding the problems at hand. This discussion continued into the evening on IRC. As of now, we have some unanswered questions, the most pertinent being: what is naya-sw doing with packets coming from metrix-west headed to buick, and why isn't it doing the obvious thing... sending them to buick? --CalebPhillips
November 10, 2005: RussellSenior rebooted metrix-commons and metrix-naya-sw to the new kernel, and magically, traffic started to flow between metrix-west and metrix-naya-sw for the first time. However, oddly, connectivity from buick (10.11.104.1) to metrix-west was still severely lossy. Log messages "received packet with own address as source address" are appearing on metrix-naya-sw's /var/log/messages. Some progress, but some bugs still need straightening out.
November 9, 2005: RussellSenior got into the basement and was able to recover metrix-west via ethernet. The problem had to do with modules not loading. Patched that problem in a somewhat kludgy way by adding "pre-up modprobe ath-pci" to the athN stanzas in /etc/network/interfaces. The ath-pci module should have loaded from /etc/modules, but wasn't for some reason. Applied the same fix to metrix-commons and metrix-naya-sw, but haven't rebooted them. Can ping metrix-commons from metrix-west, but not all the way to metrix-naya-sw. Hoping a reboot to the new kernel will correct that.
November 8, 2005: RussellSenior has copied a new kernel, modules and utilities for use with the madwifi-ng drivers over to metrix-west, metrix-commons, and metrix-naya-sw. The /boot/grub/menu.lst file is modified but still pointing at the 2.6.12.3-metrix kernel. The exception is metrix-west: because it was already not connected, we decided to use it as a test case. It rebooted to 2.6.14-metrix (with the madwifi-ng drivers) and is associated with metrix-commons with a nice strong signal on 802.11a, but for some reason its network is not functioning. Same thing with the 802.11b/g radio: association and a nice strong signal from the street, but no network. It isn't pingable from either radio. Going to try to get inside to test from the ethernet tomorrow.
November 7, 2005: RussellSenior is hacking on a metrix image with a new kernel and madwifi-ng drivers, using the metrix we pulled off of Cecily's as a testbed.
- I built a serial-console cable for the metrixes. It consists of a standard serial cable with one end cut off and spliced with three small wires with female connectors on the tips to slip over the male DB9 pins. It is slightly tricky to install, requiring tweezers, some light and a little persistence, but it beats the hell out of disassembling the thing to get at the serial port. The communications parameters for talking to the metrix console are 19200 baud, N81, no flow control. Using this console cable, the three wires are placed as follows (taking care that bending loads on the wires don't cause the conductive parts to touch... probably should insulate the connectors better with some heat-shrink tubing):
- pin 2 - blue
- pin 3 - white
- pin 5 - black
- Observed that wds clients can't associate with non-wds masters, but any client can associate with wds masters.
- I apt-get update ; apt-get upgrade'd the metrix, which installed about half a dozen new versions of things, but not too overwhelming.
- I also apt-get install'd tcpdump and ntpdate.
- I have compiled a 2.6.14 kernel and current svn madwifi-ng drivers and loaded them onto the metrix with rsync. It still can't see my 802.11a AP, presumably because I don't have the madwifi-ng drivers on the AP yet. Grub is not configured to boot the new kernel, so the metrix falls back to the old kernel without manual intervention on the serial console.
- The eth0 interface isn't coming up on boot. But /etc/network/interfaces is a mess right now, so maybe no surprise! An ifup eth0 cures it, but you obviously need a console to do that.
- The madwifi-ng drivers employ a new method of defining interfaces. Physical devices are named wifi0, wifi1, ... wifiN. Interfaces are created with the wlanconfig command, e.g.: "wlanconfig ath0 create wlandev wifi0 wlanmode ap". The associated utilities are installed in /usr/local/bin. Some /etc/network/interfaces changes need thinking through.
October 28, 2005: CalebPhillips, RussellSenior, and MichaelWeinberg replaced the metrix on Cecily's rooftop and re-attached the equipment with real chimney mounting hardware. This node seems to be working just fine now, with good connectivity to Commons. However, currently there seems to be a problem with the switch at Naya that is preventing the network from handing out DHCP or access to the intarweb. The todo list below has the current todo.
Russell and I came back to fix the switch issue and found Buick DOA. We also replaced the switch with one Russell had on hand. Everything seems to work now...except, Cecily's cannot connect to anything past commons in the direction of naya. Specifically, it seems like clients of the commons 802.11a radio (with the omni) cannot see each other (naya-sw and metrix-west are both clients to commons). We are working on an explanation. At this point the network is entirely functional everywhere except metrix-west. - CalebPhillips
October 27, 2005: CalebPhillips and RussellSenior managed to get Ed's roof online. Without access to a ladder we had only one option to make progress, and that was to see if the apparently non-functioning metrix on Ed's roof was actually powered on and accessible from the ethernet. Ed graciously let us in to check. This possibility was suggested by Russell's experience with metrix-naya-sw, where the radios did not initially come up after a reboot. Turns out, the metrix was on. Russell was able to connect via the ethernet, and got a weak radio signal on 11g, roughly 7 dB SNR. So the problem wasn't a bad POE connection and it was not a failure to load ath_pci either. Caleb suggested that we might have the antennas backwards. Twice, trying to "ifdown ath0" and then "ifdown ath1" froze the metrix. Russell tried swapping ath0 and ath1 in /etc/network/interfaces, rebooted and bingo, 11b/g started working! SNR in the attic jumped to about 30. However, the 11a backhaul was weak. Pinging to Commons worked, but with about 40% packet loss. Maybe need to repoint the antenna (isn't there a distance tweak for 11a having to do with an ACK timeout or something? I thought it was for longer distances than we're talking about here, though). Caleb and Russell retreated to FreshPot to report success and think. Russell, looking out the window at the backfire already pointing at Ed's from NAYA NW, realized there was a chance that Ed's radio might be able to hit NAYA NW, even though it wasn't pointed directly at it, because it was only 600 or so feet away instead of 1500 feet to Commons. Russell changed the metrix-naya-nw ath0 radio to ESSID backhaul-nw and master mode on channel 161 and turned it on, then walked up the block near Ed's, logged in via 802.11g and reconfigured its ath1 (connected to the 11a antenna) to backhaul-nw and rebooted. Bingo! SNR of about 30 dB. Kind of a chewing gum and baling wire solution, but it is up and passing traffic. - RussellSenior
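To put a rough number on the distance argument above, here is a small Python sketch comparing free-space path loss over the two distances mentioned (about 600 feet to NAYA NW, about 1500 feet to Commons). It assumes ideal free-space propagation and ignores antenna pointing, vegetation and everything else that matters on a real roof. The function names are made up for illustration, and the channel-to-frequency mapping is the usual 5 GHz rule (5000 MHz + 5 MHz per channel number).

```python
import math

FEET_PER_KM = 3280.84


def channel_to_mhz(channel: int) -> int:
    """Center frequency of a 5 GHz 802.11a channel (5000 + 5 * channel)."""
    return 5000 + 5 * channel


def fspl_db(distance_ft: float, freq_mhz: float) -> float:
    """Free-space path loss in dB for a distance in feet and frequency in MHz."""
    d_km = distance_ft / FEET_PER_KM
    return 20 * math.log10(d_km) + 20 * math.log10(freq_mhz) + 32.44


if __name__ == "__main__":
    # Distances are the rough figures from the entry above.
    to_naya_nw = fspl_db(600, channel_to_mhz(161))
    to_commons = fspl_db(1500, channel_to_mhz(165))
    print(f"Ed's -> NAYA NW: {to_naya_nw:.1f} dB")
    print(f"Ed's -> Commons: {to_commons:.1f} dB")
    print(f"difference:      {to_commons - to_naya_nw:.1f} dB")
```

Under these assumptions the shorter hop is worth roughly 8 dB, which is in the right direction but does not by itself explain the whole jump in SNR; antenna pointing and obstructions presumably account for the rest.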
TODO
- Add connectable interfaces to eth0, ath0 and ath1 so that recovery from bridge failure is easier.
- Follow up on madwifi.org ticket 166 (http://www.madwifi.org/ticket/166), which describes our problems with metrix-west.
- What's the deal with Broadcom-based client devices connected to Atheros based APs? Why doesn't it work? Under what circumstances (if any) will it?
- Decommission (redundant) Cisco AP on Commons
- Ed's is mounted on the chimney, plugged in, talking to metrix-naya-nw. Need to try repointing antenna at Commons roof.
- Metrix-naya-sw currently does not have line-of-sight from Commons mast due to vegetation, but still a reasonable signal, probably want to use metrix-naya-nw for 11a backhaul instead for better line-of-sight, perhaps even replacing metrix-naya-sw with a single-radio device. Another option is to leave both metrixes pointing at Commons. The spanning-tree protocol (STP) should pick one to use and self-heal if one goes down.
- Metrix-naya-nw's ath0 is configured as a master for Ed's on ESSID backhaul-nw, channel 161. If we can get Ed's metrix connected to metrix-commons, then we should change metrix-naya-nw's ath0 into client on the 11a backhaul network and point its backfire antenna at metrix-commons.
- Add ciscos to the MAC address table below.
- Add access point in Commons courtyard to MAC address table below.
- The Sector antennas on Naya might benefit from being repointed...
- Install link to second DSL line (in bookstore) and set up round-robin load balancing.
- Need to install a "repeater" at Cecily's location, attached to the ethernet port on the PoE injector.
- Need to install a "repeater" at Amnesia Brewing, inside and behind the corrugated steel.
Access points
Location | Part of roof | Hardware | Interface/IP address | Int/Channel | Antenna
NAYA | SW corner | Metrix | br0(eth0,ath0,ath1)/10.11.104.2 | ath0/165/client | 17 dBi to Commons
 | | | | ath1/1 | 9 dBi 120deg pointing S
NAYA | NW corner | Metrix | br0(eth0,ath0,ath1)/10.11.104.3 | ath0/161/master | 17 dBi to Ed's
 | | | | ath1/1 | 9 dBi 120deg pointing NW
NAYA | SE corner | Cisco | BVI1(Dot11Radio0,FastEth0)/10.11.104.4 | Dot11Radio0/11 | 9 dBi 120deg pointing E
Commons | SE corner | Metrix | br0(eth0,ath0,ath1)/10.11.104.5 | ath0/165/AP | 8 dBi 802.11a omni
 | | | | ath1/1 | 7 dBi omni (w/down-tilt)
Commons | | Cisco | BVI1(Dot11Radio0,FastEth0)/10.11.104.6 | Dot11Radio0/11 | 9 dBi 120deg pointing SE
Cecily's | Chimney | Metrix | br0(eth0,ath0,ath1)/10.11.104.8 | ath0/165/client | 17 dBi to Commons
 | | | | ath1/1 | 9 dBi omni
Ed's | Chimney | Metrix | br0(eth0,ath0,ath1)/10.11.104.9 | ath1/161/client | 17 dBi to NAYA NW
 | | | | ath0/11 | 9 dBi omni
See ChannelFrequencyChart for reasoning about channels, and channel choices.
MAC addresses
Host | IP address | Interface | MAC
buick | 10.11.104.1 | eth1 | 08:00:20:c5:9a:5c
metrix-naya-sw | 10.11.104.2 | ath0 | 00:02:6F:21:EC:AA
 | | ath1 | 00:02:6F:21:EC:A5
 | | eth0(br0) | 00:00:24:C3:A9:C0
metrix-naya-nw | 10.11.104.3 | ath0 | 00:02:6F:21:EC:A8
 | | ath1 | 00:02:6F:21:EC:A6
 | | eth0(br0) | 00:00:24:C3:A9:B4
metrix-commons | 10.11.104.5 | ath0 | 00:02:6F:21:EC:A9
 | | ath1 | 00:02:6F:21:E9:49
 | | eth0(br0) | 00:00:24:C3:A9:A0
metrix-west | 10.11.104.8 | ath0 | 00:02:6F:21:EF:ED
 | | ath1 | 00:02:6F:21:EF:F2
 | | eth0(br0) | 00:00:24:c3:a9:ac
metrix-ed | 10.11.104.9 | ath0 | 00:02:6F:21:EF:F0
 | | ath1 | 00:02:6F:21:EF:F1
 | | eth0(br0) | 00:00:24:C3:E4:1C
metrix-troy (not deployed) | 10.11.104.x | eth0 | 00:00:24:C3:E4:30
metrix-russell (not deployed) | 10.11.104.y | ath0 | 00:02:6F:21:EC:A2
 | | ath1 | 00:02:6F:21:EC:A1
 | | eth0(br0) | 00:00:24:C3:A9:B8
edimax-commons | 10.11.104.15 | bridge | 00:0E:2E:3C:4B:7D
Firmware Image
The image for the Metrix boxen is at http://cornerstone.personaltelco.net/~brj/metrix-missnet.img. It is set up with all 3 interfaces bridged, and br0 set to 10.11.104.5/22.
To flash:
- Set up a sarge installer pxeboot environment.
Instructions at http://www.debian.org/releases/stable/i386/ch04s06.html.en
The pxelinux included with debian tries to use both the VGA emulation and the serial console. This causes problems. Use pxelinux.0 from http://centerclick.org/net4801/pxelinux/ instead.
- Use the serial console pxeboot config.
- Change all the 9600 to 19200 in the config.
- You probably want to add DEBIAN_FRONTEND=text to the boot options of the kernel you'll use. Use expert26.
- Netboot the metrix into the installer
- Attach a serial terminal.
- 19200 baud, N81 (no parity, 8 data bits, 1 stop bit), no flow control
- Press ctrl+p at the prompt to get to the BIOS
boot f0
- Use expert26. You don't need any extra options.
- Go through the installer up through step 9 (Download installer components). This is where device drivers are loaded.
- After the drivers are loaded, go to a shell.
wget -O - http://somewhere/foo/metrix-missnet.img | dd of=/dev/discs/disc0/disc bs=1M (don't get it straight from cornerstone, copy it to your machine first)
- Reboot
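Since the flashing step recommends copying the image to your own machine first, it can be worth confirming the local copy is intact before serving it. The following is a minimal Python sketch of a block-wise copy in the spirit of `dd bs=1M` that also computes a checksum as it goes. The SHA-256 check and the function name `copy_in_blocks` are additions for illustration, not part of the original procedure, and this would run on your workstation rather than inside the installer shell.

```python
import hashlib
import io


def copy_in_blocks(src, dst, block_size=1024 * 1024):
    """Copy src to dst in fixed-size blocks; return (bytes_copied, sha256_hex)."""
    digest = hashlib.sha256()
    total = 0
    while True:
        block = src.read(block_size)
        if not block:
            break  # end of input
        dst.write(block)
        digest.update(block)
        total += len(block)
    return total, digest.hexdigest()


if __name__ == "__main__":
    # Stand-in data; in practice src would be the downloaded image file
    # opened in binary mode and dst the local copy being written.
    fake_image = io.BytesIO(b"\x00" * (3 * 1024 * 1024 + 17))
    local_copy = io.BytesIO()
    size, checksum = copy_in_blocks(fake_image, local_copy)
    print(size, "bytes copied, sha256", checksum)
```

Comparing the checksum of the local copy against one computed on the original file is a cheap way to catch a truncated download before it gets written to a metrix's flash.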
Note: as of 2005-10-11 this process isn't working for me. The Metrix stalls during the PXELINUX stuff. I think it's probably just an issue with my setup but I've got it working fine using the documentation from Metrix. (- KeeganQuinn)
The way the Metrix website says to do it is actually better. When I was working on the metrixes, though, that documentation didn't exist, to my memory. The debian netboot was simply the easiest way I could find to get a shell netbooted. (- BenjaminJencks)
Can KeeganQuinn or BenjaminJencks please point at or reiterate the referred-to Metrix documentation here? --RussellSenior

