There's lots of free software out there that provides distributed filesystems and grid-based storage systems. Some notable ones that I've experimented with and used in production before, from the point of view of plain data storage and management on HPC clusters, are Lustre, GPFS, GlusterFS, iRODS and SECTOR.

Each one of the above systems has its own features, issues and traits (and running costs). Some are better suited to general-purpose computing and some to specialised storage and computing.

The traditional distributed systems like Lustre, GlusterFS and GPFS are pretty useful and offer a pretty good way of scaling out and expanding your system to meet your requirements. The only thing they lack is a complete storage management system. This is where iRODS shines: you can do much more with the metadata attached to the data that you have. SECTOR sort of sits in between. It's interesting to see the p2p design of the system and the capabilities it has; it just lacks a coherent user interface right now. The FUSE client for SECTOR doesn't provide full POSIX compliance, but it's good enough to do most things.

A quick review of some of these systems (based on my experience)...

  • Lustre - Has great read/write performance and scales out well. Reliability is an issue: it relies on the sysadmin having lots of experience in setting up LinuxHA. Although the software is free, the running cost and the start-up cost of learning how the technology works are quite high. There were issues with quotas the last time we tried Lustre out on our compute clusters in work. Doing fscks on a Lustre filesystem is complicated and messy, and recovering from failed OSTs and OSSs used to be pretty involved. Setting up the MDSs and MDTs so that they are reliable, redundant and highly available can be an involved process. Migrating the OSSs, MDSs, OSTs and MDTs between systems can be complicated.

  • GPFS - Like Lustre, read/write performance is pretty good and it scales quite well. Once you read the basic design document of GPFS and run through the process of setting up and administering a GPFS filesystem, you will appreciate how easy it is to run and maintain the system. The sysadmin doesn't need to learn about LinuxHA and all the complexities that go with it. IBM tech support for GPFS (if you have a support contract) is extremely good; if you have ever needed to log an issue with them you will know that things get fixed, and fast! The overall management of the NSDs and nodes in a GPFS cluster is pretty trivial.

  • GlusterFS - Write performance suffers as you add nodes to the system (the last time I tried it), but that depends on how you set it up (if you want RAID 1/0 with replication, etc.). Read performance is as good as the above two filesystems. The setup cost is medium. I have no experience with setting up and monitoring LinuxHA with GlusterFS. Although GlusterFS does have some redundancy features, it lacked some management tools and documentation at the time when I tested it.

  • iRODS - I've only used this for research projects (developing micro-services and work-flows). It layers itself on top of existing filesystems (or many machines and filesystems) and it can scale quite well from the looks of it. The main issue with iRODS is that to make the most of the cool features you will need to be using a GSI infrastructure for authentication; read this as grid certs! I'd be pretty happy to use this for large-scale archiving and storage of data that doesn't have a high rate of change. The start-up costs for this are probably as high as for Lustre, and users will need to learn how to use the system as well, since there isn't really a filesystem interface to the storage system. iRODS also lets the admin/user write rules/work-flows and user-defined functions (micro-services) to manipulate the metadata or data in the system. I have yet to try the FUSE client for iRODS. Having an LDAP back end to the user system would be nice.

  • SECTOR - I've only been testing this out on a small scale on a few nodes. The start-up cost is quite low. Adding additional slave and master nodes to the system is trivial. Basic management of the system seems trivial. From the minimal amount of testing I've done, write performance is not great, but at first glance read performance scales pretty much linearly with the number of nodes that you add to the system. The concept of a topology for your storage nodes is also interesting and useful, as you can define replication numbers and distance based on the topology. The user interface is a bit weird and requires a little bit of technical knowledge to use. There is a FUSE client, but it isn't fully POSIX compliant. SECTOR, like iRODS, lets you write user-defined functions to operate on your data, but in SECTOR's case every slave node becomes a compute element. SECTOR's user experience could be better, but it's not the worst. The user management probably needs more work. The system it provides is sufficient, but it would be nice to have a bit more, such as GSI, LDAP or PAM support. There isn't really metadata in the system, but is there a need for it?
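
To make the topology idea from the SECTOR review a bit more concrete, here is a minimal sketch in Python. It is a toy model under stated assumptions, not SECTOR's actual placement algorithm or configuration format: nodes are described by hypothetical path tuples such as ("dc1", "rack2", "node5"), and the function names `topology_distance` and `pick_replicas` are made up for illustration.

```python
def topology_distance(a, b):
    """Hops between two nodes in a tree-shaped topology.

    Each node is a path tuple, e.g. ("dc1", "rack2", "node5").
    The distance is the number of levels each path has to climb
    to reach the deepest ancestor they share.
    """
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def pick_replicas(source, candidates, copies, min_distance):
    """Choose up to `copies` replica targets for data held on `source`.

    Only nodes at least `min_distance` hops away are eligible
    (so a rack or site failure doesn't take out every copy);
    among those, the nearest nodes are preferred.
    """
    eligible = [
        node for node in candidates
        if node != source and topology_distance(source, node) >= min_distance
    ]
    # Sort by distance first, then by path for a stable, predictable order.
    eligible.sort(key=lambda node: (topology_distance(source, node), node))
    return eligible[:copies]


if __name__ == "__main__":
    source = ("dc1", "rack1", "n1")
    nodes = [
        ("dc1", "rack1", "n2"),  # same rack as the source
        ("dc1", "rack2", "n3"),  # same site, different rack
        ("dc2", "rack1", "n4"),  # different site
    ]
    # Ask for two copies that are at least one rack away from the source.
    print(pick_replicas(source, nodes, copies=2, min_distance=4))
```

With a minimum distance of 4 the node in the same rack is skipped, so the two copies end up in a different rack and a different site.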

So what was the point of the above? Not much really, except that there are lots of good solutions out there and you don't have to pay much for them if you have knowledgeable techies working for you and you know what you want. If I wanted to run a cluster with a distributed and parallel filesystem I'd probably pick, in this order, GPFS, Lustre, then GlusterFS. If I wanted to analyse lots of data in parallel, from what I've played with I'd pick (in this order) iRODS/SECTOR, GPFS, Lustre, then GlusterFS. For archiving there is no question that I'd pick iRODS, but beyond that, anything else with a commercial support contract would probably be OK.

I spend some of my time in work getting paid to mess with these systems, and sometimes I just do it for fun. I've a few ideas in the pipeline for the above tools. Some include:

  • Doing analysis with bio-applications and bio-data (MRI, aligning, sequencing) with SECTOR.

  • Plugging in SECTOR as a backend storage system for iRODS to get the WAN capabilities of SECTOR.

  • Updating the Python interface to SECTOR (as soon as I learn C++ and SWIG).

  • Testing SECTOR in a WAN environment and maybe building a mini storage cloud for the bio people who have data distribution problems between our sites in work.

  • Testing out the FUSE interface for iRODS and updating some of the test code I have for iRODS.
