Personal Telco VPN

Overview

This is a project which aims to integrate ["VPN"] technology with ["PTPnet"], focusing primarily on permanent IP-over-IP tunnels created with ["OpenVPN"]. Some history and background are available, as are references to configuration data.

The Good

Aside from being a cool idea and a fascinating problem domain from a technical perspective, there are a couple of practical benefits to this:

Maintenance - Some nodes are trapped behind unfriendly routers doing cone NAT, which prevents the NetworkOperationsTeam from working on them in the usual way. One example is NodeLuckyLab, but there are more than a couple of these out there: NodesBehindNat
Universal connectivity - it would be ideal if all of our different locations were connected, via radio, laser, fiber, Ethernet, frame relay circuit or whatever you like, forming one big ["PTPnet"] cloud. Unfortunately, the fiber-backed wireless dream mesh isn't quite blanketing the world yet. However, in the meantime we can achieve a similar effect with tunnels.
- One related idea would be allowing more users from outside our network to tunnel in, participating as VPN clients. Think PicoPeer.
  - Technically, quite easy to do, with the foundation that is already in place. It could be set up a number of ways.
  - Several tunnels exist but currently they are all between nodes and specific designated servers; anyone can access the network but only if they are physically present at one of the integrated nodes, which needlessly limits the potential usefulness of the network.
  - What about connecting networks that are not nodes? For example, a home with no wireless network, or a block of servers at a company..
Redundancy - More connections mean more bandwidth for everyone. Even tunnels have the potential to supplement direct links. For example:
- Additional bandwidth could be gained in situations where multiple routes through different interfaces are available.
- Fault tolerance is also possible; traffic can be redirected to another path if one interface fails, reducing or even eliminating service interruptions..
IPv6 deployment - Tunnel brokers are effectively the only way that most people in this area can obtain significant IPv6 connectivity. It's better than none at all but these tunnels tend to be unreliable and often suffer from high latency. We have especially if we establish a BGP peer relationship or two at the Internet border rather than falling back to another broker tunnel.
Education - Tunnels give us an opportunity to start acquiring practical knowledge about how to deal with increasing scale in a wide area network, which is going to be invaluable as we begin facing those problems with physical networks.
- Similarly, we could potentially get a head start on exploring and building potential applications to run on these networks.

Goals

The polished brass version

Whenever someone raises the issue of tunnels in a discussion and there's a new initiative to start creating VPN tunnels, the conclusion is always that we need to take care of the NodesBehindNat. It's an easy decision to reach by committee, since it allows you to completely overrule any naysayer Scrooge types by playing the security card, and the folks who just really want a network get told it'll happen sometime soon. Everyone's happy. It's happened the same way many times, usually with different groups of people; rather than listing the cast for the whole series or even just the latest episode, just give yourself a pat on the back if you've ever been one of us.

The rationale is generally that those systems stand to gain the most at first. While this is completely true, the idea has actually slowed down the progress of VPN deployment overall, as a result of the very same factor which is always thought will speed it along: these nodes aren't accessible except to a person with a laptop who must physically travel to each one. The amount of work is multiplied; it's not just a matter of a couple of hours with a terminal - it can take days of traveling all over town trying to make the right things happen.

The idea that these nodes have a greater need for accessibility improvements has real merit, but there is no reason they should be handled with exclusive priority over other potential nodes. In particular, the process of establishing an effective baseline configuration is greatly simplified when you have the ability to bring the interface up and down freely.

In the trenches

We've got a whole bunch of nodes that need to get hooked up. A complete list needs to be created or located; NodeAudit is probably a good starting point, and maybe the best thing we have. It is unlikely that there is even one node that is completely current in all of the aspects discussed here, although only a few will need all of these updates. Every single active node is different, which complicates matters when you want to start thinking about them as a group to simplify management, or even if you just want to provide useful directions about how they work. Connecting outdated systems also presents a serious threat to the continued viability of the network; in the past, version compatibility problems have brought the entire project to a halt after we chose to ignore them.

Some work has already been done and more details will be provided on this page when possible. Anyone who is part of the NetworkOperationsTeam is invited to participate as much as they are able. Others who are interested should contact a team member and ask about joining.

Preparation

Guidelines

Be patient; this is probably the most significant single task that this group has ever attempted, and we'll be lucky if we get them all by 2008. Please don't begin the task unless you are prepared to fix whatever you break, even if it means ripping the box out of the dusty hole (nearly all nodes live in dusty holes), bringing it to someone who can fix it, watching them do it, then bringing it back to the dusty hole. It's a great learning experience, and you're not likely to make the same mistake twice after it costs you.

If you are doing upgrades remotely, it can be helpful to make phone contact with someone who has access to the system; a pair of hands on site can make quite a difference in some situations. If you are on site, it's a good idea to introduce yourself and explain that you are going to be working on the system. If everything goes well, the only noticeable effect will likely be that people will be trapped by the captive portal sooner than usual when it is rebooted. Some nodes can also slow down when new software is being downloaded and installed.

It sometimes happens that things do not go so well. Some of these systems were not configured with major upgrades in mind, and some were only barely adequate to begin with, so keep on the look-out for full hard disks and partitions. Check for free space before you begin downloading with df -h; anything less than 50M available on each partition can cause problems. You can keep an eye on file sizes when running aptitude if you are cutting it close. If you get stuck with a full disk, do the best you can to get the system to keep serving up Internet access and make a note of it on this page. Many nodes could use new disks, which means a fresh installation, but unfortunately there is no current documentation about that process. If you think a node needs a new disk, contact the NetworkOperationsTeam and we'll figure out a solution. As if hard disk issues were not enough of a problem, anecdotal evidence suggests that several nodes tend to just crash or break without warning from time to time; because we're neglecting them, because their operational environment is partially submerged and the water is moonlighting as part of a 220V circuit or just because some of them are really crappy old computers.

Be ready for anything if you're going to attempt the process at all. For most of the process, you can stop any time and no harm will be done - even so, don't be timid, just beat the thing up by running through the commands until you're pretty sure there are no more steps - then it'll either be really properly broken or working perfectly. Computers are pretty good at remembering stuff; if you end up on a node that's been done already, or you run the same command twice or skip one, it's just going to tell you that, and usually with this type of stuff, it will refuse to do it again. It will also refuse to do stuff at all if it's something that it doesn't like. Anyway, just dive in and have fun, and you'll have the quirks figured out in no time.

Tasks

With that in mind, this is a rough breakdown of things that need to happen to each NuCab node before that new tunnel goes hot.

1. Most NuCabs will need to be upgraded to etch first.

Some of them will need to be upgraded to sarge before they can be upgraded to etch. Believe it.
Read the upgrade docs in the release notes and just follow the instructions. Seriously.

2. Install the etch kernel after all of the other updates and get it all running.

Technically this is part of the etch upgrade, but you must be certain it gets done.
Sometimes you'll need to select a different kernel specifically; some nodes will seem to be upgraded to etch completely while still running sarge or woody kernels. Take a look at /boot to get the whole story and always check uname -a as a confirmation.
We really want our shiny new VPN to work when we get it set up, and running anything other than a proper kernel can cause serious problems.
Really, this is very important. Contact another team member if you need help.

3. After the etch kernel is running, install and configure any software that isn't already taken care of.

OpenVPN (current etch version)
olsrd (backported version; see DebianAptSource)
osirisd (current etch version)
use only the versions indicated or you'll wish you had.

4. Make sure it all works as well as you're able. Clean up as much as possible. Remove old packages and even consider another reboot just to be sure all of the configuration is working correctly.

These are some pretty mundane goals, and they rather make it sound like someone is paying us or something. Be that as it may, this process should provide us with a level of connectivity far beyond anything we have done before. To build upon this foundation, some more interesting ideas (although also more long-term) are described in the section about benefits, above.

Methodology

Design

For now, we are using one central server (donk) and allowing any number of clients to connect directly over TCP, in a classic hub pattern.

Scalability

At some point in the future, we will likely to reach an upper limit with this design and we will have to re-evaluate our options given the options which are available at that point. Given the nature of the system, the resource that is most likely to be exhausted first would be bandwidth.

Configuration

Just checking

Oh, you actually want to set up a link?

Are you absolutely positive that everything has been upgraded and the new kernel is running?

OpenVPN

To generate a new keypair for a client do something like this:

ssh you@donk
sudo -s
cd /etc/ssl/easy-rsa
. vars
./build-key thenode
cp keys/thenode.crt /etc/openvpn/keys/
cp keys/thenode.crt ~
mv keys/thenode.key ~
exit

Next, you must configure the client. Do something like:

ssh you@thenode
sudo apt-get update
sudo apt-get install openvpn
cd /etc/openvpn
sudo scp you@donk:thenode.* .
sudo scp you@donk:/etc/openvpn/keys/ca.crt .

Create the client configuration file at /etc/openvpn/client.conf:

client
remote donk.personaltelco.net 1195
proto tcp-client
dev tap
ca /etc/openvpn/ca.crt
cert /etc/openvpn/thenode.crt
key /etc/openvpn/thenode.key
comp-lzo

And finally, start OpenVPN on the client-side:

sudo /etc/init.d/openvpn start client

At this point, if avahi-daemon is installed and running, you should be able to reach donk.local from the client, or thenode.local from donk.

You can now install the olsrd-plugins package, configure OLSR and join ["PTPnet"].

Address Allocation

Servers

Server	10.11.255.?	Port	Proto	Compression	Dev
donk	1	1195/tcp	OpenVPN	lzo	tap0

Clients

Client addresses are assigned dynamically by the server.

DNS

Install the avahi-daemon package on the client system and use mDNS.

References

Related notes:

Technical details about the proposed VPN configuration: (not currently being used)

History

In Summer of 2006, JimmySchmierbach and KeeganQuinn spent several weeks planning and testing a design for a virtual network with the potential to scale to serve the entire city. The results of this project were inconclusive due to problems with software stability, but a complete theory for the design was formulated. The central idea is based on a hierarchy with two tiers, referred to as supernodes and nodes. The basic idea was that all of the supernodes would be connected together with tunnels in a full mesh pattern, then each node would maintain a connection with two or more of the supernodes at all times. Fault tolerance is a central element in this design; actual potential for operation on a large scale remains untested.

Jimmy's original drawings specified the three core servers as the supernodes: cornerstone, bone and alitheia. This page previously stated that the design included the idea of supernodes each being connected to a master node (eg. donk), resulting in a hierarchy with three tiers. That is not correct: donk was never a functional part in the original design, and was never connected to the other systems during this project. It would not have provided any notable benefit in acting as an independent tier; with three supernodes in a mesh in the center and a minimum of two supernode connections per node, all of the routes supported by the system could be maintained if any one server failed.

The design is really impressive in theory, but when the time came to test it out, things didn't work out so well. The VPN clients kept taking naps and the dynamic routing daemons often got confused about the fact that we were running a mesh on a layer over their heads. Sometimes machines on the same switch, side by side, would decide that they'd prefer to talk to each other through a big chunk of Internet. This type of erratic behavior made it difficult to take the project seriously at the time.

It certainly seems that the software versions available now have come a long way in terms of solving these problems. The reliability and consistency of the current network is far better than has ever been accomplished in the past. However, it is difficult to say with certainty if this effect is a result of improvements in the software or the simplification of our topology. At this point, building a working network is more important than proving our triad theory; perhaps at some point in the future we will have a better opportunity to test it.

CategoryDocumentation CategoryDamnYouKeegan CategoryEducation CategoryMan CategoryNetwork