Personal Telco VPN

Overview

This page ties together all of the information about the VPN aspects of PTPnet.

In the summer of 2006, JimmySchmierbach and KeeganQuinn spent several weeks planning and testing a design for a system that would allow all PersonalTelco nodes to be logically interconnected. The design is flexible enough to take advantage of any type of connection, although we focused mostly on IP-over-IP tunnels created with software such as OpenVPN, with glue provided by ad-hoc IP routing protocols like OptimizedLinkStateRouting.

The Good

Aside from being a cool idea and a fascinating problem domain from a technical perspective, this has several practical benefits:

Maintenance - Some nodes are trapped behind unfriendly routers doing cone NAT, which prevents the NetworkOperationsTeam from working on them in the usual way. One example is NodeLuckyLab, but there are more than a couple of these out there. (Is there a list of these?)
Universal connectivity - It would be ideal if all of our different locations were connected, via radio, laser, fiber, Ethernet, frame relay circuit or whatever you like, forming one big PTPnet cloud. Unfortunately, the fiber-backed wireless dream mesh isn't quite blanketing the world yet. In the meantime, however, we can achieve a similar effect with tunnels.
- One related idea would be allowing more users from outside our network to tunnel in, participating as VPN clients. Think PicoPeer.
  - Technically, this is quite easy to do with the foundation that is already in place, and it could be set up in a number of ways.
  - Several tunnels exist but currently they are all between nodes and specific designated servers; anyone can access the network but only if they are physically present at one of the integrated nodes, which needlessly limits the potential usefulness of the network.
  - What about connecting networks that are not nodes? For example, a home with no wireless network, or a block of servers at a company.
Redundancy - More connections mean more bandwidth for everyone. Even tunnels have the potential to supplement direct links. For example:
- Additional bandwidth could be gained in situations where multiple routes through different interfaces are available.
- Fault tolerance is also possible; traffic can be redirected to another path if one interface fails, reducing or even eliminating service interruptions.
IPv6 deployment - Tunnel brokers are effectively the only way that most people in this area can obtain significant IPv6 connectivity. It's better than none at all but these tunnels tend to be unreliable and often suffer from high latency. In contrast, we actually have something of a bandwidth surplus, especially when it comes to just getting around town. The conclusion which follows is that if we were to put a bit of serious effort into it, we could easily end up with a faster and more useful IPv6 network of our own, especially if we establish a BGP peer relationship or two at the Internet border rather than falling back to another broker tunnel.
Education - Tunnels give us an opportunity to start acquiring practical knowledge about how to deal with increasing scale in a wide area network, which is going to be invaluable as we begin facing those problems with physical networks.
- Similarly, we could potentially get a head start on exploring and building potential applications to run on these networks.

Reference

Here are the places on this wiki where there is currently information on VPNs:

Goals

Vague and not too ambitious

The near-term goal would be to actually complete the implementation that has been attempted so many times, then look into applying Jimmy's ideas to whatever extent is possible, and finally document everything here so that it can be maintained and expanded with relative ease.

The polished brass version

Every time it comes up, the idea is always that we should start with NodesBehindNat. It's an easy decision to reach by committee, since it allows you to completely overrule any naysayer Scrooge types by playing the security card, and the folks who just really want a network get told it'll happen sometime soon. Everyone's happy. It's happened that way probably a dozen times with a different group of people each time; rather than listing all of the names or even the most recent batch, just give yourself a pat on the back if you've ever been one of them.

The rationale is generally that those systems stand to gain the most at first. While this is completely true, and very noble, I'm afraid it has actually slowed down VPN deployment overall, because of the very factor that everyone expects to speed it along: these nodes aren't accessible except to a person with a laptop who must physically travel to each one. So it's not just a matter of a couple of hours at a terminal; it's a couple of days or more traveling all over town trying to make the right things happen. They're always the first ones, so there are always problems which, of course, don't manifest until later that night, and so these poor nodes get visited over and over by these equally poor guys who are doing their best to make things work.

Anyway, I'm not saying you should refrain from treading that well-traveled path, if you're upwardly mobile (read: car owner) and have the will to get out there and do it. Go for it. Send me (KeeganQuinn) an email; I'll help out. What I am saying is that there is really no reason for all of the nodes that have good connectivity to wait patiently on the back burner while the second-class citizen nodes get emancipated. In fact, it seems to me that it makes more sense if the hard-to-reach nodes get their wings later on in the process; if something doesn't go quite right in the beginning, which has happened every single time, it's no trouble to fix it if the node was accessible anyway. If instead that botch means someone has to drive all the way across town again, the amount of time spent goes way out of proportion to the benefit realized, as does the frustration level of our good friend the example volunteer.

The skinny

We've got a whole bunch of nodes that need to get hooked up. A list needs to be made; NodeAudit is probably a good starting point. To my knowledge, there is not a single node anywhere in the city that doesn't need at least one of these things done to it, although relatively few will need the full-service deal.

With that in mind, this is a rough description of things that need to happen to each node before that new tunnel goes hot. I will start doing these tasks myself and fill in more details as I go. Be patient - we'll be lucky if we get them all done before 2008. Don't even think about doing any of this just yet unless you are prepared to fix whatever you break. Actually, it's fine by me if your answer to that is going down to the location, ripping the box out of the dusty hole (nearly all nodes live in dusty holes), bringing it to someone who can fix it, watching them do it, then bringing it back to the dusty hole. It's a great learning experience, and you're not likely to make the same mistake twice after it costs you.

These tasks don't need to be done in NodeAudit list order, in the middle of the night, or in any other particular way. A fair percentage of nodes tend to break from time to time, because we're neglecting them, because their operational environment is partially submerged and the water is moonlighting as part of a 220V circuit, or just because some of them are really crappy old computers - you'd be amazed. Anyway, if you're going to attempt the process at all, don't be timid; just beat the thing up by running through the steps until you're pretty sure there are no more steps. Then it'll either be really properly broken or working perfectly. Either way, you'll get three cheers from me.

Computers are pretty good at remembering stuff; if you end up on a node that's been done already, it's just going to tell you that, and usually with this type of stuff, it will refuse to do it again. It will also refuse to do stuff at all if it's something that doesn't seem like a good idea to it. As a final word of warning, some of these systems were not configured with major upgrades in mind, and some were only barely adequate to begin with, so keep on the look-out for completely filled hard disks and partitions. If you get stuck with a full disk, do the best you can to get the system to at least keep serving up Internet access and make a note of it on this page. A few of them could use new disks... Anyway, just dive in and have fun, and you'll have the quirks figured out in no time.
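Full disks are easy to spot before you start. Here is a minimal sketch of a helper that reads `df -P` output and prints every mount point at or above a usage threshold. `full_partitions` is hypothetical and is not an existing tool on the nodes. The 90% default is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: list mount points whose usage is at or above a
# threshold (default 90%). Expects `df -P` output on standard input.
full_partitions() {
    threshold="${1:-90}"
    # Skip the header line; in `df -P` output, field 5 is the capacity
    # (e.g. "97%") and field 6 is the mount point.
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)
        if (use + 0 >= t) print $6
    }'
}

# Usage on a node:
#   df -P | full_partitions 90
```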

1. Most NuCabs will need to be upgraded to etch first.

Some of them will need to be upgraded to sarge before they can be upgraded to etch. Believe it.
Read the upgrade docs in the release notes and just follow the instructions. Seriously.

2. Install the etch kernel after all of the other software updates and get it running.

Technically this is part of the etch upgrade, but it is worth repeating. It's important.
Some nodes will appear to be upgraded to etch already but are actually running sarge or woody kernels. Always check.
We really want our shiny new VPN to work properly after we set it up, and having a dozen different kernel versions trying to interact sanely does not help at all. It will seem like it works fine at first, but eventually things will start acting weird, so just take care of it before it's a problem.

The same goes for OpenVPN and olsrd versions; use only the stuff in etch, if it's a NuCab, or you'll wish you had.
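The "always check" advice for kernels in step 2 can be scripted. This is a sketch under stated assumptions, not a tested procedure:

- The helper name `check_etch_kernel` is hypothetical.
- It assumes that `/etc/debian_version` on etch starts with `4.0`.
- It assumes that etch's stock kernels are the 2.6.18 series.

A locally compiled 2.6.18 kernel would also pass, so treat a pass as a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helper: warn when a node reports etch (Debian 4.0) in
# /etc/debian_version but is still booted into an older kernel.
# Assumes etch's stock kernel series is 2.6.18 (e.g. "2.6.18-6-486").
check_etch_kernel() {
    debian_version="$1"   # contents of /etc/debian_version
    kernel_release="$2"   # output of `uname -r`
    case "$debian_version" in
        4.0*) ;;          # etch userland: go on to check the kernel
        *) echo "not etch: $debian_version"; return 1 ;;
    esac
    case "$kernel_release" in
        2.6.18-*) echo "ok: $kernel_release"; return 0 ;;
        *) echo "stale kernel: $kernel_release"; return 1 ;;
    esac
}

# Usage on a node:
#   check_etch_kernel "$(cat /etc/debian_version)" "$(uname -r)"
```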

Last but not least, it would be really nice to get them hooked into the Osiris server. It doesn't all have to happen at once but it will all need to get done eventually, and it's really pretty quick and easy to do it all in one session. You can move quickly; nodes don't get upset or anything if every detail isn't checked and double-checked. We spend a little more time now to save a lot of time later.

After those nodes are connected, work should be continued on interconnecting other nodes, as time allows.

These are some pretty mundane goals, and they rather make it sound like someone is paying us or something. Some more interesting (although also more long-term) ideas are described in the section about benefits, above.

Methodology

For now, we are using a single central server, donk, with all of the clients connecting to it in a classic hub-and-spoke pattern. At some point, we're going to reach an upper limit with this design and will have to re-evaluate our options given the technology available at that point.

JimmySchmierbach has done some fairly significant planning in anticipation of this eventuality, including some detailed documentation of a design based on a hierarchy with two tiers, referred to as supernodes and nodes. The basic idea was that all of the supernodes would be connected together with tunnels to form a full mesh pattern, then each node would connect to one or more of the supernodes. The original drawings specified the three core servers as the supernodes: cornerstone, bone and alitheia.
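For a rough sense of scale, count the tunnels each layout needs. With $N$ nodes on a single hub, there is one tunnel per node. With $S$ supernodes in a full mesh, each node connects to $k$ supernodes. The value of $k$ is an illustrative parameter, not a figure from the original design.

$$T_{\text{hub}} = N \qquad T_{\text{two-tier}} = \frac{S(S-1)}{2} + kN$$

With the three supernodes from the original drawings, the mesh itself is $3 \cdot 2 / 2 = 3$ tunnels. Take the seven clients currently listed under Address Allocation and $k = 2$: the two-tier layout needs $3 + 2 \cdot 7 = 17$ tunnels, where the hub needs 7.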

Someone once wrote here that Jimmy's plan involved the supernodes each being connected to a master node (e.g. donk), resulting in a hierarchy with three tiers. That is not correct. donk was never a functional part of the original design. It didn't have to be; the idea was that with three supernodes in a mesh, all bases were covered as long as only one server was ever down at a time. All of the routes would still work because everything was supposed to be redundant.

It has been roughly one year since that design was dreamed up; unfortunately, during that time, alitheia has been completely removed from service and the Subversion repository (which was our most important organizational tool) has been destroyed. bone has also seen some hard times; although it ended up with an upgraded mainboard and some other components, it is now hosted virtually in the same location as cornerstone, which unfortunately forces us quickly to the conclusion that, of the three supernodes in the original design, only cornerstone retains even a chance of being effective in that role.

Jimmy's design is really impressive in theory, but I don't recall that we ever actually got it to work with all of the good parts. The VPN clients kept taking naps and the dynamic routing daemons got confused about the fact that we were running a mesh on a layer over their heads. Sometimes machines on the same switch, side by side, would decide that they'd prefer to talk to each other through a big chunk of Internet. It might work better with current software but I'm not really in a big hurry to try it; there's a lot to be said for simplicity, like for example a big fat server that handles everything and actually just works. You want redundancy? Get another big fat server, and just double everything or let it do round robin failover. Simple. Works. Doesn't confuse the software or the people configuring it.

This brings us right back around to the first paragraph in this section: we're running the whole show from donk, until it starts to break, at which point we need to look at our options. I propose we add capacity the same way I suggested we add redundancy: more big fat servers. However, that's probably a moot point, since we're nowhere near any kind of capacity limit. I don't expect we will even catch a glimpse of a limit until real users start generating traffic on the tunnel network, which is not likely given our modest selection of near-term goals, outlined in the previous section. At that point, my bet is that the pipe hits a red line before the box does, anyway.

Configuration

Just checking

Oh, you actually want to set up a link?

Are you absolutely positive that everything has been upgraded and the new kernel is running?

OpenVPN

To generate a new key pair and certificate for a client, do something like this:

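ssh you@donk
sudo -s
cd /etc/ssl/easy-rsa
. vars
./build-key thenode
# keep a copy of the client certificate with the server's keys
cp keys/thenode.crt /etc/openvpn/keys/
# stage the certificate and private key in your home directory for the client to fetch;
# the private key is moved rather than copied, so no copy of it stays on donk
cp keys/thenode.crt ~you/
mv keys/thenode.key ~you/
chown you ~you/thenode.crt ~you/thenode.key
exit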

Then do the configuration on the server side: add a file in /etc/openvpn/ccd named after the Common Name you gave the client's certificate, such as thenode.personaltelco.net. Its contents should be something like the following. Replace 10.11.255.X with an unused IP within 10.11.255.0/24 from the NetworkAddressAllocations page:

ifconfig-push 10.11.255.X 255.255.255.0
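A typo in this one-line file is easy to make, so the entry can be generated instead. The sketch below checks that the host number is in the usable range and prints the line. `ccd_entry` is hypothetical and does not exist on donk. The helper assumes that 10.11.255.1 belongs to donk, as the Address Allocation tables suggest. It does not check whether the address is already taken, so the NetworkAddressAllocations page still has the final word.

```shell
#!/bin/sh
# Hypothetical helper: print the client-config-dir entry for one client.
# The argument is the last octet of the client's address in 10.11.255.0/24.
ccd_entry() {
    host="$1"
    case "$host" in
        ''|*[!0-9]*) echo "host number must be numeric" >&2; return 1 ;;
    esac
    # .0 is the network address, .255 the broadcast address and .1 is donk.
    if [ "$host" -lt 2 ] || [ "$host" -gt 254 ]; then
        echo "host number out of range: $host" >&2
        return 1
    fi
    echo "ifconfig-push 10.11.255.$host 255.255.255.0"
}

# Usage on donk (as root), with 12 standing in for an unused number:
#   ccd_entry 12 > /etc/openvpn/ccd/thenode.personaltelco.net
```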

Finally, you must configure the client. Do something like:

ssh you@thenode
sudo apt-get update
sudo apt-get install openvpn
cd /etc/openvpn
sudo scp 'you@donk:thenode.*' .
sudo scp you@donk:/etc/openvpn/keys/ca.crt .

Create the client's configuration file at /etc/openvpn/client.conf. The proto and port settings must match what the server on donk is actually listening on:

client
remote donk.personaltelco.net 1195
proto tcp-client
dev tap
ca /etc/openvpn/ca.crt
cert /etc/openvpn/thenode.crt
key /etc/openvpn/thenode.key
comp-lzo
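If you set up several clients, the file can be produced from the node name. The sketch below prints the same settings as the example above. The `client_conf` helper is hypothetical. The protocol is a parameter because the example uses `tcp-client` while the Servers table lists donk on 1195/udp. Which one donk really uses is not settled here, so check the server configuration.

```shell
#!/bin/sh
# Hypothetical helper: print /etc/openvpn/client.conf for one node.
# $1 is the node name (selects the certificate and key file names);
# $2 is the OpenVPN protocol, defaulting to the example's tcp-client.
client_conf() {
    node="$1"
    proto="${2:-tcp-client}"
    cat <<EOF
client
remote donk.personaltelco.net 1195
proto $proto
dev tap
ca /etc/openvpn/ca.crt
cert /etc/openvpn/$node.crt
key /etc/openvpn/$node.key
comp-lzo
EOF
}

# Usage on the client (as root):
#   client_conf thenode > /etc/openvpn/client.conf
```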

Then start OpenVPN on the client:

sudo /etc/init.d/openvpn restart

You should now be able to reach donk at 10.11.255.1 from the client, and the client at 10.11.255.X (the address you assigned it) from donk.

Address Allocation

Servers

Server	VPN host (10.11.255.x)	Port	Protocol	Compression	Device
donk	1	1195/udp	OpenVPN	lzo	tap0

Clients

Node	Client host	Tunnel to	VPN host (10.11.255.x)
NodeLuckyLab	luckylab	donk	5
NodeMississippi	chevy	donk	6
NodeCostellos	afterthought	donk	7
NodeCommunitecture	dryrot	donk	8
NodeNorthstar	star	donk	9
NodePowellsTech	cantos	donk	10
NodeTB151	beast	donk	11
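Picking an unused address by eye gets error-prone as the table grows. The sketch below reads the Clients table as tab-separated text and warns about duplicate host numbers. It then prints the lowest host number from 2 to 254 that is not in the table; 1 is donk. `next_free_host` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read the Clients table (tab-separated) on standard
# input, warn about duplicate host numbers on stderr, and print the lowest
# host number in 2..254 that the table does not use (.1 is donk).
next_free_host() {
    awk -F '\t' '
        # Only rows whose last field is a number are allocations;
        # this skips the header line.
        $NF ~ /^[0-9]+$/ {
            if ($NF in used) print "duplicate: " $NF > "/dev/stderr"
            used[$NF] = 1
        }
        END {
            for (h = 2; h <= 254; h++)
                if (!(h in used)) { print h; exit }
            exit 1   # no free host number left
        }'
}
```

With the table exactly as it appears above, this prints 2, because .2 through .4 are not listed. Those addresses may still be allocated on the NetworkAddressAllocations page, so check there first.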

DNS

Each client and server should have a DNS entry for its VPN IP as a subdomain of vpn.ptp (e.g. donk.vpn.ptp). However, these entries aren't always as up to date as they should be.
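Keeping DNS in step with the table is easier if the records are generated from it. This sketch rests on two assumptions:

- The records go into a BIND-style zone file for vpn.ptp.
- The Client column supplies the host name, following the donk.vpn.ptp pattern.

The helper `vpn_a_records` is hypothetical. donk's own record comes from the Servers table and would be added separately.

```shell
#!/bin/sh
# Hypothetical helper: turn the Clients table (tab-separated, on standard
# input) into BIND-style A records under vpn.ptp. The zone file format and
# the use of the "Client" column as the host name are assumptions.
vpn_a_records() {
    # Skip the header: only rows ending in a numeric host part are emitted.
    awk -F '\t' '$NF ~ /^[0-9]+$/ {
        printf "%s.vpn.ptp.\tIN\tA\t10.11.255.%s\n", $2, $NF
    }'
}
```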

References

CategoryDocumentation CategoryDamnYouKeegan