Personal Telco VPN

TableOfContents

Overview

This is a project which aims to integrate ["VPN"] technology with ["PTPnet"], focusing primarily on permanent IP-over-IP tunnels created with ["OpenVPN"]. Some history and background are available, as are references to configuration data.

The Good

Aside from being a cool idea and a fascinating problem domain from a technical perspective, there are several practical benefits to this:

  1. Maintenance - Some nodes are trapped behind unfriendly routers doing cone NAT, which prevents the NetworkOperationsTeam from working on them in the usual way. One example is NodeLuckyLab, but there are more than a couple of these out there: NodesBehindNat

  2. Universal connectivity - It would be ideal if all of our different locations were connected, via radio, laser, fiber, Ethernet, frame relay circuit or whatever you like, forming one big ["PTPnet"] cloud. Unfortunately, the fiber-backed wireless dream mesh isn't quite blanketing the world yet. However, in the meantime we can achieve a similar effect with tunnels.

    • One related idea would be allowing more users from outside our network to tunnel in, participating as VPN clients. Think PicoPeer.

      • Technically, this is quite easy to do with the foundation that is already in place. It could be set up a number of ways.
      • Several tunnels exist but currently they are all between nodes and specific designated servers; anyone can access the network but only if they are physically present at one of the integrated nodes, which needlessly limits the potential usefulness of the network.
      • What about connecting networks that are not nodes? For example, a home with no wireless network, or a block of servers at a company.
  3. Redundancy - More connections mean more bandwidth for everyone. Even tunnels have the potential to supplement direct links. For example:

    • Additional bandwidth could be gained in situations where multiple routes through different interfaces are available.
    • Fault tolerance is also possible; traffic can be redirected to another path if one interface fails, reducing or even eliminating service interruptions.
  4. IPv6 deployment - Tunnel brokers are effectively the only way that most people in this area can obtain significant IPv6 connectivity. It's better than none at all, but these tunnels tend to be unreliable and often suffer from high latency. In contrast, we have something of a bandwidth surplus, especially when it comes to just getting around town, so with a bit of serious effort we could end up with a faster and more useful IPv6 network of our own, particularly if we establish a BGP peer relationship or two at the Internet border rather than falling back to another broker tunnel.

  5. Education - Tunnels give us an opportunity to start acquiring practical knowledge about how to deal with increasing scale in a wide area network, which is going to be invaluable as we begin facing those problems with physical networks.

    • Similarly, we could get a head start on exploring and building potential applications to run on these networks.

Goals

The polished brass version

Whenever tunnels come up in discussion and a new initiative to start creating them takes shape, the conclusion is always that we need to take care of the NodesBehindNat first. It's an easy decision to reach by committee, since it allows you to completely overrule any naysayer Scrooge types by playing the security card, and the folks who just really want a network get told it'll happen sometime soon. Everyone's happy. It's happened the same way many times, usually with a different group of people each time; rather than listing the cast for the whole series or even just the latest episode, just give yourself a pat on the back if you've ever been one of us.

The rationale is generally that those systems stand to gain the most at first. While this is completely true, the idea has actually slowed down the progress of VPN deployment overall, as a result of the very same factor which is always thought will speed it along: these nodes aren't accessible except to a person with a laptop who must physically travel to each one. The amount of work is multiplied; it's not just a matter of a couple of hours with a terminal - it can take days of traveling all over town trying to make the right things happen.

The idea that these nodes have a greater need for accessibility improvements has real merit, but there is no reason they should be handled with exclusive priority over other potential nodes. In particular, the process of establishing an effective baseline configuration is greatly simplified when you have the ability to bring the interface up and down freely.

In the trenches

We've got a whole bunch of nodes that need to get hooked up. A complete list needs to be created or located; NodeAudit is probably a good starting point, and maybe the best thing we have. It is unlikely that there is even one node that is completely current in all of the aspects discussed here, although only a few will need all of these updates. Every single active node is different, which complicates matters when you want to start thinking about them as a group to simplify management, or even if you just want to provide useful directions about how they work. Connecting outdated systems also presents a serious threat to the continued viability of the network; in the past, version compatibility problems have brought the entire project to a halt after we chose to ignore them.

Some work has already been done and more details will be provided on this page when possible. Anyone who is part of the NetworkOperationsTeam is invited to participate as much as they are able. Others who are interested should contact a team member and ask about joining.

Preparation

Guidelines

Be patient; this is probably the most significant single task that this group has ever attempted, and we'll be lucky if we get them all done by 2008. Please don't begin the task unless you are prepared to fix whatever you break, even if it means ripping the box out of the dusty hole (nearly all nodes live in dusty holes), bringing it to someone who can fix it, watching them do it, then bringing it back to the dusty hole. It's a great learning experience, and you're not likely to make the same mistake twice after it costs you.

If you are doing upgrades remotely, it can be helpful to make phone contact with someone who has access to the system; a pair of hands on site can make quite a difference in some situations. If you are on site, it's a good idea to introduce yourself and explain that you are going to be working on the system. If everything goes well, the only noticeable effect will likely be that people will be trapped by the captive portal sooner than usual when it is rebooted. Some nodes can also slow down when new software is being downloaded and installed.

It sometimes happens that things do not go so well. Some of these systems were not configured with major upgrades in mind, and some were only barely adequate to begin with, so keep on the look-out for full hard disks and partitions. Check for free space with df -h before you begin downloading; anything less than 50M available on each partition can cause problems. You can keep an eye on file sizes when running aptitude if you are cutting it close. If you get stuck with a full disk, do the best you can to get the system to keep serving up Internet access and make a note of it on this page. Many nodes could use new disks, which means a fresh installation, but unfortunately there is no current documentation about that process. If you think a node needs a new disk, contact the NetworkOperationsTeam and we'll figure out a solution. As if hard disk issues were not enough of a problem, anecdotal evidence suggests that several nodes tend to just crash or break without warning from time to time: because we're neglecting them, because their operational environment is partially submerged and the water is moonlighting as part of a 220V circuit, or just because some of them are really crappy old computers.
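
For example, a quick pre-flight check might look like this (just a sketch; emptying the package cache is one easy way to recover space if old downloads are what filled the disk):

df -h             # look at the Avail column for every partition
apt-get clean     # frees space by emptying /var/cache/apt/archives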

Be ready for anything if you're going to attempt the process at all. For most of the process, you can stop any time and no harm will be done - even so, don't be timid, just beat the thing up by running through the commands until you're pretty sure there are no more steps - then it'll either be really properly broken or working perfectly. Computers are pretty good at remembering stuff; if you end up on a node that's been done already, or you run the same command twice or skip one, it's just going to tell you that, and usually with this type of stuff, it will refuse to do it again. It will also refuse to do stuff at all if it's something that it doesn't like. Anyway, just dive in and have fun, and you'll have the quirks figured out in no time.

Tasks

With that in mind, this is a rough breakdown of things that need to happen to each node before that new tunnel goes hot; a command-line sketch of the whole sequence follows the list.

1. Most nodes will need to be upgraded to etch first.

  • Some of them will need to be upgraded to sarge before they can be upgraded to etch. Believe it.
  • Read the upgrade docs in the release notes and just follow the instructions. Seriously.

2. Install the etch kernel after all of the other updates and get it all running.

  • Technically this is part of the etch upgrade, but you must be certain it gets done.
  • Sometimes you'll need to select a different kernel specifically; some nodes will seem to be upgraded to etch completely while still running sarge or woody kernels. Take a look at /boot to get the whole story and always check uname -a as a confirmation.

  • We really want our shiny new VPN to work when we get it set up, and running anything other than a proper kernel can cause serious problems.
  • Really, this is very important. Contact another team member if you need help.

3. After the etch kernel is running, install and configure any software that isn't already taken care of.

  • OpenVPN (current etch version)
  • olsrd (backported version; see DebianAptSource)

  • osirisd (current etch version)
  • Use only the versions indicated or you'll wish you had.

4. Make sure it all works as well as you're able. Clean up as much as possible. Remove old packages and even consider another reboot just to be sure all of the configuration is working correctly.
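
As promised above, here is a rough command-line sketch of the whole sequence. It assumes a Debian node with aptitude available and that /etc/apt/sources.list has already been pointed at etch per the release notes; exact package names and kernel flavours vary from node to node, so treat it as orientation rather than a recipe.

uname -a                              # which kernel is actually running?
ls /boot                              # which kernels are installed?
df -h                                 # at least 50M free on every partition?
aptitude update
aptitude dist-upgrade                 # the main etch upgrade
aptitude install linux-image-2.6-686  # pick the kernel flavour that matches the hardware
reboot                                # come back up on the etch kernel
uname -a                              # confirm the new kernel before going any further
aptitude install openvpn osirisd      # current etch versions, as listed above
aptitude install olsrd                # backported version; see DebianAptSource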

These are some pretty mundane goals, and they rather make it sound like someone is paying us or something. Be that as it may, this process should provide us with a level of connectivity far beyond anything we have done before. To build upon this foundation, some more interesting ideas (although also more long-term) are described in the section about benefits, above.

Methodology

Design

For now, we are using one central server (donk) and allowing any number of clients to connect directly over TCP, in a classic hub pattern.
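
For orientation, the server side of this kind of hub might be configured roughly like the sketch below. This is a hypothetical example, not donk's actual configuration; the certificate, key and dh file names are assumptions.

port 1195
proto tcp-server
dev tap0
mode server
tls-server
ca /etc/openvpn/keys/ca.crt
cert /etc/openvpn/keys/donk.crt
key /etc/openvpn/keys/donk.key
dh /etc/openvpn/keys/dh1024.pem
ifconfig 10.11.255.1 255.255.255.0    # the hub's own tunnel address
client-config-dir /etc/openvpn/ccd    # per-client ifconfig-push files live here
comp-lzo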

Scalability

At some point in the future, we will likely reach an upper limit with this design and will have to re-evaluate our options given the technology available at that point. Given the nature of the system, the resource that is most likely to be exhausted first is bandwidth.

Configuration

Just checking

Oh, you actually want to set up a link?

Are you absolutely positive that everything has been upgraded and the new kernel is running?

OpenVPN

To generate a new keypair for a client, do something like this:

ssh you@donk
sudo -s
cd /etc/ssl/easy-rsa
. vars
./build-key thenode
cp keys/thenode.crt /etc/openvpn/keys/   # install the client certificate on the server
cp keys/thenode.crt ~                    # stage the certificate and key in your home
mv keys/thenode.key ~                    # directory so the client can fetch them below
exit

Then, do the configuration on the server side - add a file in /etc/openvpn/ccd with a name like thenode.personaltelco.net. The contents should be something like (replacing 10.11.255.X with an unused IP within 10.11.255.0/24 from the NetworkAddressAllocations page):

ifconfig-push 10.11.255.X 255.255.255.0

Finally, you must configure the client. Do something like:

ssh you@thenode
sudo apt-get update
sudo apt-get install openvpn
cd /etc/openvpn
sudo scp you@donk:thenode.* .
sudo scp you@donk:/etc/openvpn/keys/ca.crt .

Create the client's configuration file at /etc/openvpn/client.conf:

client
remote donk.personaltelco.net 1195
proto tcp-client
dev tap
ca /etc/openvpn/ca.crt
cert /etc/openvpn/thenode.crt
key /etc/openvpn/thenode.key
comp-lzo

And finally, start OpenVPN on the client-side:

sudo /etc/init.d/openvpn restart

Now you should be able to reach donk at 10.11.255.1 from the client, or reach the client at 10.11.255.X (where X is whatever you assigned) from donk.
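
For example, a quick way to verify the tunnel:

ping -c 3 10.11.255.1     # from the client; this should reach donk
ping -c 3 10.11.255.X     # from donk, substituting the address you assigned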

Address Allocation

Servers

Server   10.11.255.?   Port       Proto     Compression   Dev
donk     1             1195/udp   OpenVPN   lzo           tap0

Clients

Node                 Client         Tunnel To   10.11.255.?
NodeLuckyLab         luckylab       donk        5
NodeMississippi      chevy          donk        6
NodeCostellos        afterthought   donk        7
NodeCommunitecture   dryrot         donk        8
NodeNorthstar        star           donk        9
NodePowellsTech      cantos         donk        10
NodeTB151            beast          donk        11

DNS

Each client and server should have an entry in DNS for its VPN IP as a subdomain of vpn.ptp (e.g. donk.vpn.ptp). But this isn't always as up to date as it should be...
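
For illustration, the records for the hub and one client might look like this in a vpn.ptp zone file (a hypothetical snippet based on the allocation table above; the real zone may be maintained differently):

donk.vpn.ptp.        IN  A   10.11.255.1
luckylab.vpn.ptp.    IN  A   10.11.255.5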

References

Technical details about the proposed VPN configuration (not currently being used):

  • OpenvpnIpSchemeProposal
    • OpenvpnIpScheme
    • OpenvpnNamingScheme
    • OpenvpnPortScheme
  • NodesBehindNat
  • NetworkAddressAllocations

History

In Summer of 2006, JimmySchmierbach and KeeganQuinn spent several weeks planning and testing a design for a virtual network with the potential to scale to serve the entire city. The results of this project were inconclusive due to problems with software stability, but a complete theory for the design was formulated. The central idea is based on a hierarchy with two tiers, referred to as supernodes and nodes. The basic idea was that all of the supernodes would be connected together with tunnels in a full mesh pattern, then each node would maintain a connection with two or more of the supernodes at all times. Fault tolerance is a central element in this design; actual potential for operation on a large scale remains untested.

Jimmy's original drawings specified the three core servers as the supernodes: cornerstone, bone and alitheia. This page previously stated that the design included the idea of supernodes each being connected to a master node (e.g. donk), resulting in a hierarchy with three tiers. That is not correct: donk was never a functional part in the original design, and was never connected to the other systems during this project. It would not have provided any notable benefit in acting as an independent tier; with three supernodes in a mesh in the center and a minimum of two supernode connections per node, all of the routes supported by the system could be maintained if any one server failed.

The design is really impressive in theory, but when the time came to test it out, things didn't work out so well. The VPN clients kept taking naps and the dynamic routing daemons often got confused about the fact that we were running a mesh on a layer over their heads. Sometimes machines on the same switch, side by side, would decide that they'd prefer to talk to each other through a big chunk of Internet. This type of erratic behavior made it difficult to take the project seriously at the time.

It certainly seems that the software versions available now have come a long way in terms of solving these problems. The reliability and consistency of the current network is far better than has ever been accomplished in the past. However, it is difficult to say with certainty if this effect is a result of improvements in the software or the simplification of our topology. At this point, building a working network is more important than proving our triad theory; perhaps at some point in the future we will have a better opportunity to test it.


CategoryDocumentation CategoryDamnYouKeegan CategoryEducation CategoryMan
