At first I had dismissed tahoe-lafs for any of my distributed filesystem needs. Recently a project cropped up at work that needed backups done. The data is probably sensitive, and I'd rather not know what I am backing up. Not only does the data need to be backed up, it also has to be stored securely and safely for disaster recovery.

Installing tahoe-lafs is pretty straightforward if you follow the instructions. Setting up your own grid or a node isn't as clear.

With the default settings, a basic tahoe grid needs 1 introducer node and at least 10 storage nodes. In the first test setup that I created, the introducer ran on the same machine as one of my storage nodes.
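As far as I can tell, the 10 comes from the default encoding, where each file is split into 10 shares and any 3 of them are enough to rebuild it. Treat those two numbers as my assumption and check the shares settings in your own tahoe.cfg. A rough sketch of what that means for disk usage, with a made-up file size:

```shell
# Assumed defaults (not stated above): any 3 of 10 shares rebuild a file.
needed=3
total=10
file_mb=300   # hypothetical file size in MB

# Each file is expanded by total/needed when it is spread across the grid.
stored_mb=$(( file_mb * total / needed ))

# The file stays readable as long as `needed` shares survive.
can_lose=$(( total - needed ))

echo "stored across the grid: ${stored_mb} MB"
echo "shares that can be lost: ${can_lose}"
```

So the resilience is paid for with a bit more than three times the raw storage.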

On my first node

$ tahoe create-introducer /data/tahoe-introducer
... edit /data/tahoe-introducer/tahoe.cfg and set a nickname ...
$ tahoe start -d /data/tahoe-introducer
$ tahoe create-node /data/tahoe-node
... edit /data/tahoe-node/tahoe.cfg and set a nickname ...
$ cp /data/tahoe-introducer/introducer.furl /data/tahoe-node
$ tahoe start -d /data/tahoe-node

Then on my other 9 nodes I just did...

$ tahoe create-node /data/tahoe-node
... edit /data/tahoe-node/tahoe.cfg and set a nickname ...
$ scp node01:/data/tahoe-introducer/introducer.furl /data/tahoe-node
$ tahoe start -d /data/tahoe-node
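With this many nodes it is easy to forget the introducer.furl on one of them. Here is a small pre-flight check I could run in a node directory before starting it. The function name is made up, and the only thing it assumes about the furl is that it starts with pb://, so it is a sanity check rather than real validation.

```shell
# Hypothetical helper: check a node directory before running `tahoe start`.
check_node_dir() {
  dir=$1
  # created by `tahoe create-node`
  [ -f "$dir/tahoe.cfg" ] || { echo "missing $dir/tahoe.cfg"; return 1; }
  # copied over from the introducer node; must exist and be non-empty
  [ -s "$dir/introducer.furl" ] || { echo "missing $dir/introducer.furl"; return 1; }
  # assumption: a furl starts with pb://
  grep -q '^pb://' "$dir/introducer.furl" || { echo "$dir/introducer.furl does not look like a furl"; return 1; }
  echo "ok: $dir"
}
```

For example, `check_node_dir /data/tahoe-node && tahoe start -d /data/tahoe-node`.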

Assuming all goes well and everything starts correctly, you can then set up a client node, for example on my desktop:

$ tahoe create-client
$ scp node01:/data/tahoe-introducer/introducer.furl ~/.tahoe
$ tahoe start

Once I started the client on my desktop, I was able to view the gateway page by going to http://localhost:3456. The web GUI is nice for viewing things, but it's not very useful for managing your files in the system. From the command line you can do things like

$ tahoe ls
$ tahoe cp SOURCE alias:DEST

where alias is a mapping to a URI or hash in the tahoe-lafs system. This took me a while to figure out and understand, so it's worth reading the docs for more on the concept of CAPs, ROOTCAPs and so on. These are pretty much identifiers for retrieving your data.

You will need to protect these hashes and store them somewhere safe outside of the tahoe-lafs system if you wish to keep your data private. Without these hashes you cannot access the data.
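A minimal sketch of how I might keep a copy of them, assuming the client stores its aliases (and the caps behind them) in ~/.tahoe/private/aliases. The function name and the destination are made up, and any caps that are not attached to an alias would have to be saved separately.

```shell
# Hypothetical helper: copy the aliases file somewhere safe, owner-only.
save_caps() {
  dest=$1
  # assumption: default location of the aliases file on a client node
  src=${2:-"$HOME/.tahoe/private/aliases"}
  [ -f "$src" ] || { echo "no aliases file at $src" >&2; return 1; }
  # run cp under a tight umask in a subshell so the copy is never world-readable
  ( umask 077 && cp "$src" "$dest" ) || return 1
  # also tighten permissions if the destination already existed
  chmod 600 "$dest"
}
```

For example, `save_caps /mnt/usbkey/tahoe-aliases`, where the destination is ideally an encrypted disk that lives outside the grid.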

So far I've just experimented with the backup command and the deep-check command for repairing data.

$ tahoe backup IMPORTANT alias:backup/IMPORTANT
$ tahoe deep-check --add-lease --repair alias:
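If this ends up as a scheduled job, the two commands can be wrapped in a small function. This is only a sketch: the function name is made up, and it assumes the alias already exists on the client and that tahoe is on the PATH.

```shell
# Hypothetical wrapper around the two commands above.
backup_and_check() {
  src=$1          # local directory to back up
  alias_name=$2   # tahoe alias, without the trailing colon
  name=$(basename "$src")
  # stop here if the backup itself fails
  tahoe backup "$src" "$alias_name:backup/$name" || return 1
  # then check, renew leases on, and repair everything under the alias
  tahoe deep-check --add-lease --repair "$alias_name:"
}
```

Calling `backup_and_check IMPORTANT alias` runs the same two commands as above, and skips the deep-check if the backup fails.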

So far I'm impressed with how resilient the system is and how easy it was to set up the basic system. Performance-wise, uploading data is a bit slow compared to plain old rsync, tar and scp. Downloading data from the system seems just as fast as any other method.

I've been testing on a Mac desktop, a bunch of Linux servers and a few Windows machines, so the system is quite portable and runs on lots of different platforms.

I'm curious to see how many files I can put into the system before it grinds to a halt, and how big files have to be before tahoe-lafs chokes. There's also a bunch of other issues with upgrades and migrating data as new releases of the software come out. I guess this needs to be experimented with a bit more.

This thread on client-caching is also of interest to read; it does what I think I want to do.
