What's a good distro for building an HPC cluster? Well, what do you want to do? There are Linux distros out there which are nice as a building block, there are turnkey solutions if you are really lazy, and then there are the distros that sit in between.

There's the usual set of distros like Debian, Fedora and Scientific Linux. Then there are turnkey solutions such as OSCAR and Rocks Cluster, or else you might come across distros that sort of sit in between, such as caos and chaos.

I've helped to set up and maintain a few Debian and Scientific Linux based clusters over the years.

From experience, as nice as Debian was, there was almost no end to the pain with commercial software, drivers for cutting edge hardware, filesystem drivers (for things like Lustre or GPFS) and packaging stuff up for Debian. I would probably not pick Debian again as the distro for an HPC machine unless I knew that the requirements did not involve commercial software or brand spanking new hardware which requires binary blobs and custom startup scripts. I would imagine Fedora would not be too far off from how Debian worked for us.

Scientific Linux and its RHEL-cloned cousins are much better for a production environment. A lot of vendors support the upstream RHEL distro, so it's nice to be able to plug software into a free clone and see it work. In general you get almost no problems with commercial codes and drivers. In fact we use Scientific Linux at work.

If someone ever tries to convince me to use a particular distro for building a cluster, I would entertain the idea. I'd usually ask a few questions about the distro first (to see if it meets my criteria for successful deployment and management), such as:

  • How long is it supported for (the longer the better)?
  • Will things like Lustre/GPFS build cleanly?
  • Is the distro stable and tested? Does the upstream vendor maintain real bug fixes and security fixes, as opposed to "upgrading the package to a newer version will fix the problem"?
  • Is the kernel version stable? i.e. as with the bug fixes, I want a stable API/ABI so that users can target a known system that will stay consistent for a long time. They can then be sure their codes will work, and they won't need to spend time writing workarounds for edge cases on a changing system.
  • How easy is it to create packages and a repository for the distro, so that the deployment and management process can be streamlined, automated and repeated?
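
If you are comparing a few candidates, the questions above can be written down as a tiny checklist. This is only a rough sketch: the field names, the `unmet_criteria` helper, the five year threshold and the example values are all made up for illustration and don't describe any real distro.

```python
from dataclasses import dataclass


@dataclass
class DistroCandidate:
    """Answers to the checklist questions for one candidate distro (hypothetical fields)."""
    name: str
    support_years: int          # how long upstream supports a release
    parallel_fs_builds: bool    # do things like Lustre/GPFS build cleanly?
    backports_fixes: bool       # real bug/security fixes rather than version bumps
    stable_kernel_abi: bool     # kernel API/ABI stays put within a release
    easy_packaging: bool        # packages and a repository are easy to create


def unmet_criteria(distro, min_support_years=5):
    """Return the checklist items a candidate fails (the threshold is an assumption)."""
    problems = []
    if distro.support_years < min_support_years:
        problems.append("support lifetime too short")
    if not distro.parallel_fs_builds:
        problems.append("parallel filesystem does not build cleanly")
    if not distro.backports_fixes:
        problems.append("fixes arrive as version upgrades")
    if not distro.stable_kernel_abi:
        problems.append("kernel API/ABI is not stable")
    if not distro.easy_packaging:
        problems.append("packaging and repositories are hard to automate")
    return problems


# Made-up example values, not a statement about any real distro.
candidate = DistroCandidate(
    name="example-distro",
    support_years=2,
    parallel_fs_builds=True,
    backports_fixes=False,
    stable_kernel_abi=True,
    easy_packaging=True,
)

for problem in unmet_criteria(candidate):
    print(f"{candidate.name}: {problem}")
```

An empty list means the candidate passes every question; anything else is a list of things to argue about before committing to it.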

In general I like things to be supported for a long time, and to be compatible with a bunch of commercial applications. Over time I've noticed that it's really only the RHEL based distros that seem to shine for HPC sysadmins (who run clusters bigger than, say, 32 nodes). I would lean towards RHEL based distros myself for building a cluster.

The outliers among the RHEL derived distros, such as caos and chaos, are pretty good. These two distros include a bunch of tools that I would otherwise install after a base install. They do require the sysadmin to know what they are doing, but that is no different from rolling my own with Scientific Linux. There is a market for these types of tools, which is why there are things like OSCAR and Rocks Cluster. These are turnkey solutions for those with no experience or budget for setting up a cluster, but be warned that they are "limiting" in how far you can configure your cluster to your tastes.

In the real world (for HPC sysadmins and users), people want to do work and not debate who is more free, who has the better package manager or who has the best and newest selection of software. Of course, if it is free, people will gravitate towards it. If it's easy to use, then even more people will gravitate to it.

