START
STOP

'''November 19, 2005''': DonPark and RussellSenior climb up on Mississippi Commons and collect some more data. Russell collects some kismet data during some ping tests, but misconfiguration of his kismet rig reduces its utility to near zero. A careful examination of the tcpdumps from the metrixes, however, yields the insight needed to figure out what is going wrong. The failure in the ping request/reply loop between metrix-west and buick always occurs in the delivery of packets from metrix-naya-sw to buick. ARP traffic gets delivered fine in both directions, but ICMP and other IP traffic disappears after it arrives at naya-sw on its way to buick... buick never sees it. Why?

'''November 17, 2005''': TroyJaqua and RussellSenior visited Cecily's to try to recover metrix-west from a misconfigured network. Unfortunately, we were unable to connect via ethernet either, so at about 3pm we came back, equipped with a ladder generously loaned by a neighbor, and Troy climbed up and swapped out the metrix motherboard with one configured with Ben's firmware. This was deemed to be the most practical solution, given the difficulties of getting the serial cable onto the DB9 pins and logging in while balancing on the crest of the roof. The radios on metrix-west remain the same as before; only the motherboard, with its flash and ethernet, has changed. See updated MAC address in the table below.

Also did some testing from metrix-west. One interesting result was that pinging from metrix-west to buick disrupted a ping from metrix-commons to buick. Still need to get onto the commons roof to collect some over-the-air packets. Hopefully tomorrow. Ebtables may be our salvation. We are getting closer, but still haven't cracked it yet.

The 11b/g radio was put on essid notyet.personaltelco.net to indicate it isn't actually working yet. I said we'd switch back to www when it was active and working. Talked to a few residents that were enthusiastic to dump their $60/month broadband.

'''November 11, 2005''': I think I've figured out the "received packet with own address as source address" messages. They are only appearing on metrix-west and metrix-naya-sw. I think they are a consequence of having bridges on nodes in managed-mode. The master-mode node rebroadcasts frames sent via it, and bridging puts the interfaces in promiscuous mode, so the sender is hearing the rebroadcast. The messages are therefore, presumably, innocuous. --RussellSenior

RussellSenior, MichaelWeinberg, and I got together today and did some more discussion and testing regarding the problems at hand. This discussion continued into the evening on IRC. As of now, we have some unanswered questions, the most pertinent being: what is naya-sw doing with packets coming from metrix-west headed to buick, and why isn't it doing the obvious thing...send them to buick? --CalebPhillips

'''November 10, 2005''': RussellSenior rebooted metrix-commons and metrix-naya-sw to the new kernel, and magically, traffic started to flow between metrix-west and metrix-naya-sw for the first time. However, oddly, connectivity from buick (10.11.104.1) to metrix-west was still severely lossy. Log messages "received packet with own address as source address" are appearing in metrix-naya-sw's /var/log/messages. Some progress, but some bugs still need straightening out.

'''November 9, 2005''': RussellSenior got into the basement and was able to recover metrix-west via ethernet. The problem had to do with modules not loading. Patched that problem in a somewhat kludgy way by adding "pre-up modprobe ath-pci" to the athN stanzas in /etc/network/interfaces. The ath-pci module should have loaded from /etc/modules, but didn't for some reason. Applied the same fix to metrix-commons and metrix-naya-sw, but haven't rebooted them. Can ping metrix-commons from metrix-west, but not all the way to metrix-naya-sw. Hoping a reboot to the new kernel will correct that.

'''November 8, 2005''': RussellSenior has copied a new kernel, modules and utilities for use with the madwifi-ng drivers over to metrix-west, metrix-commons, and metrix-naya-sw. The /boot/grub/menu.lst file is modified but still pointing at the 2.6.12.3-metrix kernel. The exception is metrix-west: because it was already disconnected, we decided to use it as a test case. It rebooted to 2.6.14-metrix (with the madwifi-ng drivers) and is associated with metrix-commons with a nice strong signal on 802.11a, but for some reason its network is not functioning. Same thing with the 802.11b/g radio: association and a nice strong signal from the street, but no network. It isn't pingable from either radio. Going to try to get inside to test from the ethernet tomorrow.

'''November 7, 2005''': RussellSenior is hacking on a metrix image with a new kernel and madwifi-ng drivers, using the metrix we pulled off of Cecily's as a testbed.

* I built a serial-console cable for the metrixes. It consists of a standard serial cable with one end cut off and spliced with three small wires with female connectors on the tips to slip over the male DB9 pins. It is slightly tricky to install, requiring tweezers, some light and a little persistence, but it beats the hell out of disassembling the thing to get at the serial port. The communications parameters for talking to the metrix console are 19200 baud, N81, no flow control. Using this console cable, the three wires are placed as follows (taking care that bending loads on the wires don't cause the conductive parts to touch... probably should insulate the connectors better with some heat-shrink tubing):
  * pin 2 - blue
  * pin 3 - white
  * pin 5 - black
* Observed that wds clients can't associate with non-wds masters, but any client can associate with wds masters.
* I apt-get update ; apt-get upgrade'd the metrix, which installed about half a dozen new versions of things, but nothing too overwhelming.
* I also apt-get install'd tcpdump and ntpdate.
* I have compiled a 2.6.14 kernel and current svn madwifi-ng drivers and loaded them onto the metrix with rsync. It still can't see my 802.11a AP, presumably because I don't have the madwifi-ng drivers on the AP yet. Grub is not configured to boot the new kernel, so the metrix falls back to the old kernel without manual intervention on the serial console.
* The eth0 interface isn't coming up on boot. But /etc/network/interfaces is a mess right now, so maybe no surprise! An ifup eth0 cures it, but you obviously need a console to do that.
* The madwifi-ng drivers employ a new method of defining interfaces. Physical devices are named wifi0, wifi1, ... wifiN. Interfaces are created with the wlanconfig command, e.g.: "wlanconfig ath0 create wlandev wifi0 wlanmode ap". The associated utilities are installed in /usr/local/bin. Some /etc/network/interfaces changes need thinking through.

'''October 28, 2005''': CalebPhillips, RussellSenior, and MichaelWeinberg replaced the metrix on Cecily's rooftop and re-attached the equipment with real chimney mounting hardware. This node seems to be working just fine now, with good connectivity to Commons. However, currently there seems to be a problem with the switch at Naya that is preventing the network from handing out DHCP or access to the intarweb. The todo list below has the current todo.

Russell and I came back to fix the switch issue and found Buick DOA. We also replaced the switch with one Russell had on hand. Everything seems to work now...except, Cecily's cannot connect to anything past commons in the direction of naya. Specifically, it seems like clients of the commons 802.11a radio (with the omni) cannot see each other (naya-sw and metrix-west are both clients to commons). We are working on an explanation. At this point the network is entirely functional everywhere except metrix-west.
- CalebPhillips

'''October 27, 2005''': CalebPhillips and RussellSenior managed to get Ed's roof online. Without access to a ladder we had only one option to make progress, and that was to see if the apparently non-functioning metrix on Ed's roof was actually powered on and accessible from the ethernet. Ed graciously let us in to check. This possibility was suggested by Russell's experience with metrix-naya-sw, where the radios did not initially come up after a reboot. Turns out, the metrix ''was'' on. Russell was able to connect via the ethernet, and got a weak radio signal on 11g, roughly 7 dB SNR. So the problem wasn't a bad POE connection and it was not a failure to load ath_pci either.

Caleb suggested that we might have the antennas backwards. Twice, trying "ifdown ath0" and then "ifdown ath1" froze the metrix. Russell tried swapping ath0 and ath1 in /etc/network/interfaces, rebooted and bingo, 11b/g started working! SNR in the attic jumped to about 30. However, the 11a backhaul was weak. Pinging to Commons worked, but with about a 40% packet loss. Maybe need to repoint the antenna (isn't there a distance tweak for 11a having to do with an ACK timeout or something? I thought it was for distances further than we're talking about here).

Caleb and Russell retreated to FreshPot to report success and think. Russell, looking out the window at the backfire already pointing at Ed's from NAYA NW, realized there was a chance that Ed's radio might be able to hit NAYA NW, even though it wasn't pointed directly at it, because it was only 600 or so feet away instead of 1500 feet to Commons. Russell changed the metrix-naya-nw ath0 radio to ESSID backhaul-nw and master mode on channel 161 and turned it on, then walked up the block near Ed's, logged in via 802.11g and reconfigured its ath1 (connected to the 11a antenna) to backhaul-nw and rebooted. Bingo! SNR of about 30 dB. Kind of a chewing gum and baling wire solution, but it is up and passing traffic. - RussellSenior
----
November 29, 2005: Buick got sick and was rebooted. In fact, it is still sick and will be replaced, hopefully on Wednesday evening, with a nucab, at least temporarily. We also power cycled the Edimax AP in the dog shop in Mississippi Commons. It appears to be functioning now.
November 21, 2005: RussellSenior built a freshened kernel (2.6.14.2) and madwifi-ng (rev 1329), installed them on metrix-naya-sw, metrix-commons, and metrix-west, and rebooted. The new madwifi-ng rev was built in a metrix-compatible chroot environment, so the madwifi-utils in /usr/local/bin are now linked properly. The other two metrixes, metrix-naya-nw and metrix-ed, are still running the original 2.6.12.3-metrix kernel and the WDS-branch madwifi drivers from late July. It is possible to connect with essentially zero packet loss from buick to metrix-west if you simultaneously "ping -f 10.11.104.2" from buick. Metrix-west was apt-get upgraded that way.
November 20, 2005: RussellSenior thinks he's figured out what is going wrong. It is an effect caused by client-node to client-node communication when the traffic needs to pass through one of the client bridges. As mentioned earlier, when a client-node sends to a client-node, it sees the traffic twice: once when it sends it and once (in promiscuous mode) when the master rebroadcasts it. When the traffic is passing from the other side of the bridge (say, from buick on eth0), and the bridge sees the rebroadcast packet it just sent arrive on ath0 with buick's SRC MAC, it reassigns that MAC to the bridge port associated with ath0, not eth0. When packets return headed for that MAC, they get to the bridge and the bridge fails to deliver them to the port where that MAC actually lives. Boom. This problem does not occur when communicating client-to-master (or master-to-client), because these packets are not rebroadcast. The problem doesn't occur when the communication is strictly client-to-client, because even though the client still sees the rebroadcast packet, the bridge is smart enough to know not to reassign local MAC addresses to a different port.
RussellSenior tested this model this morning by ping flooding from buick to metrix-commons (thus keeping metrix-naya-sw's bridge refreshed with where buick's MAC should properly live) while pinging the problematic metrix-west. Still some lossage, but far less than the usual 98%, only about 17%.
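The model and the ping-flood test can be sketched as a toy simulation. This is only an illustration, not the Linux bridge code: the `LearningBridge` class, the `reply_delivery` function, and the use of host names in place of MAC addresses are all made up for this sketch, and it assumes a bridge that simply remembers the last port each source address was seen on, with no ageing.

```python
class LearningBridge:
    """Toy learning bridge: one forwarding table, no ageing (an assumption)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # source address -> port it was last seen on

    def receive(self, port, src, dst):
        """Learn where src lives, then return the ports the frame leaves on."""
        self.table[src] = port
        out = self.table.get(dst)
        if out is None:
            return [p for p in self.ports if p != port]  # unknown: flood
        if out == port:
            return []  # destination appears to sit on the ingress side: discard
        return [out]


def reply_delivery(refresh=False):
    """Return the ports on which metrix-west's ping reply leaves naya-sw."""
    bridge = LearningBridge(["eth0", "ath0"])
    bridge.receive("ath0", "west", "broadcast")  # west becomes known on ath0
    bridge.receive("eth0", "buick", "west")      # ping request from buick
    bridge.receive("ath0", "buick", "west")      # master's rebroadcast, overheard
    if refresh:
        # Flood ping to the master: not rebroadcast, so it re-teaches eth0.
        bridge.receive("eth0", "buick", "commons")
    return bridge.receive("ath0", "west", "buick")  # reply heading for buick


print(reply_delivery())              # []
print(reply_delivery(refresh=True))  # ['eth0']
```

In the sketch the refresh always wins the race. On the real network the flood packets and the rebroadcasts interleave, which would be consistent with loss dropping to about 17% rather than to zero.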
Now the question is, what is the solution? One temporary solution might be to use ebtables filtering to drop packets at metrix-naya-sw where buick's MAC shows up on ath0 as a SRC MAC. But there are other situations where we'll see the same phenomenon, e.g. 11b/g clients of the 11a client nodes. The real solution is to get the sending bridges to ignore the rebroadcasts altogether.
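A rough sketch of what the proposed filter would and would not fix, again as a toy model rather than real ebtables behaviour. The `run` function, the `drop_rules` set, and the "laptop" host are hypothetical names for illustration. The sketch assumes the drop happens before the bridge learns the source address; whether that holds for a real ebtables rule depends on the chain used and the kernel version, and has not been checked here.

```python
def run(drop_rules, sender="buick"):
    """Replay request, rebroadcast and reply; return the reply's egress ports."""
    ports = ["eth0", "ath0"]
    table = {"west": "ath0"}  # metrix-west is already known on the radio side

    def receive(port, src, dst):
        if (port, src) in drop_rules:
            return []  # dropped before learning (an assumption of this model)
        table[src] = port
        out = table.get(dst)
        if out is None:
            return [p for p in ports if p != port]  # unknown: flood
        return [] if out == port else [out]

    receive("eth0", sender, "west")         # request enters from the wired side
    receive("ath0", sender, "west")         # rebroadcast overheard on the radio
    return receive("ath0", "west", sender)  # reply heading back to the sender


rules = {("ath0", "buick")}          # drop buick's address when seen on ath0
print(run(rules))                    # ['eth0']
print(run(rules, sender="laptop"))   # []
```

The second call shows the limitation noted above: any host without its own rule still has its address relearned on the wrong port, so a per-MAC filter is only a stopgap.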