So I've talked about "How not to build a High Performance Computer". Now I'd like to rant about how I would build a "Cheap High Performance Computer". An online search for "How to build a cluster" will more than likely return hundreds, if not thousands, of hits, along with performance analysis reports, cost analysis reports and so on.

Clusters are systems, not just individual components. It's the sum of all the components that makes the system useful and/or usable for the user who wants to get their work done. Clusters usually contain at least the following:

  • Controller Nodes
  • Compute Nodes
  • Storage Nodes
  • Management Network
  • Compute and/or IO Network
  • Data center / Machine room with the necessary power and cooling
  • Software stack to manage and provide the service

Ideally I would buy the Controller, Compute and Storage nodes from the same purchasing pool, varying them only slightly so that each can perform its designated service. It makes purchasing easier.

Step 1

Decide which application you want to run. Will it be bioinformatics, quantum chemistry or high energy particle physics?

Step 2

Find out what the code does. Does it do lots of file IO? Does it do lots of communication? Who develops the code, and if it's a local developer, can you influence how the code is designed?

Step 3

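Once Step 2 is done, optimise your budget for the hardware required to complete your task. Procuring a capacity machine (high throughput for many independent or modestly sized jobs) is different from procuring a capability machine (running a few very large jobs that each use much of the machine at once). Of course, there are also different types of capacity and capability machines.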

You could, for instance, go for high clock rate machines with 1 GB of RAM per core and cheapish InfiniBand, Gigabit Ethernet or 10 Gigabit Ethernet if your jobs are dominated by BLAS/LAPACK operations. If, on the other hand, your jobs are latency or communications sensitive, you will really care about a fast, scalable interconnect and care less about clock rate, so cheaper processors will be fine. Or you may not care about communications or high clock speeds at all, and benefit most from more memory.

It all boils down to an optimisation problem.
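To make the optimisation framing concrete, here is a minimal Python sketch that picks a node type and interconnect to maximise aggregate throughput under a fixed budget. Everything in it is an assumption for illustration: the option names, the prices, the per-node speeds (which should come from your own benchmarks) and the network "efficiency" factor, which crudely models how well your code scales on a given fabric and is assumed not to change with node count.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class NodeOption:
    name: str
    cost: float   # price per node (hypothetical units)
    speed: float  # relative per-node speed on YOUR application, from benchmarks

@dataclass(frozen=True)
class NetworkOption:
    name: str
    cost_per_node: float  # NIC + cable + share of switch ports (hypothetical)
    efficiency: float     # assumed fraction of ideal scaling your code achieves

def best_config(nodes, networks, budget):
    """Try every node/network pair, buy as many nodes as the budget allows,
    and return (node, network, count, throughput) with the highest throughput,
    or None if nothing is affordable."""
    best = None
    for node, net in product(nodes, networks):
        count = int(budget // (node.cost + net.cost_per_node))
        if count == 0:
            continue  # this combination is unaffordable even for one node
        throughput = count * node.speed * net.efficiency
        if best is None or throughput > best[3]:
            best = (node.name, net.name, count, throughput)
    return best

# Hypothetical options and budget, for illustration only.
nodes = [NodeOption("fast-cpu", 4000, 1.3), NodeOption("cheap-cpu", 2500, 1.0)]
networks = [NetworkOption("gig-e", 100, 0.6), NetworkOption("infiniband", 900, 0.9)]
result = best_config(nodes, networks, budget=100_000)
print(result)
```

With these made-up numbers, 29 cheaper nodes on the faster fabric beat 20 fast nodes on the same fabric, which is the communications-sensitive case described above. Change the efficiency factors and the answer changes. A real model would also weigh memory per core, storage, power and cooling, but the shape of the problem stays the same.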

Step 4

Once you've decided on the type of machine and equipment you think you need, start building a list of the applications you want to run, with realistic sample inputs of the kind your users actually run. Use these as benchmarks for judging whether the hardware the vendors suggest is any good. Synthetic benchmarks like HPL are pretty useless for this in most cases and are only really good for PR purposes.
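A benchmark harness does not need to be elaborate. The Python sketch below runs each application a few times and reports the median wall-clock time. The suite contents are placeholders: substitute your users' real applications and sample inputs (for an MPI code the argument list would typically start with your site's launcher, e.g. `mpirun`). Timing the whole run, including file IO and start-up, is a deliberate choice, since that is what the user actually waits for.

```python
import statistics
import subprocess
import sys
import time

def time_run(cmd, repeats=3):
    """Run cmd (a list of argv strings) `repeats` times and return the median
    wall-clock time in seconds. Raises CalledProcessError if a run fails."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return statistics.median(times)

def run_suite(suite, repeats=3):
    """suite maps a benchmark name to its argv list; returns name -> median seconds."""
    return {name: time_run(cmd, repeats) for name, cmd in suite.items()}

if __name__ == "__main__":
    # Hypothetical placeholder suite: replace with real applications and inputs.
    suite = {
        "placeholder-noop": [sys.executable, "-c", "pass"],
    }
    for name, seconds in run_suite(suite).items():
        print(f"{name}: {seconds:.3f} s")
```

Handing the same script and inputs to every vendor keeps the comparison like-for-like.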

Step 5

Try to build a list of the required hardware and estimate roughly how much you can get it for.
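A spreadsheet is the usual tool here, but the arithmetic is simple enough to sketch in Python. Every quantity and price in the manifest below is a placeholder, not a real quote, and the 10% contingency for forgotten items (cables, rails, spares) is an assumption rather than a rule.

```python
from dataclasses import dataclass

@dataclass
class LineItem:
    description: str
    quantity: int
    unit_price: float  # rough estimate, e.g. from list prices or earlier quotes

def estimate_total(items, contingency=0.10):
    """Return (subtotal, subtotal plus a contingency margin)."""
    subtotal = sum(item.quantity * item.unit_price for item in items)
    return subtotal, subtotal * (1 + contingency)

# Placeholder manifest: every quantity and price below is hypothetical.
manifest = [
    LineItem("compute node", 32, 3000.0),
    LineItem("storage node", 2, 6000.0),
    LineItem("controller node", 2, 3500.0),
    LineItem("management switch (gig-e)", 2, 1500.0),
    LineItem("interconnect switch", 1, 12000.0),
    LineItem("rack, PDUs and cabling", 1, 8000.0),
]

subtotal, total = estimate_total(manifest)
print(f"subtotal {subtotal:,.0f}, with contingency {total:,.0f}")
```

Keep the machine room's power and cooling in mind too; they often sit in a different budget but are still part of the system.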

Step 6

Armed with a list of benchmarks and a manifest of what you think you might need, write a request for procuring this machine and issue it to the vendors you are interested in.

Step 7

Collect the replies, if any, and evaluate the performance of the suggested machines using the benchmarks that the vendors have run for you. Evaluate whether each proposal fits your budget.
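One way to compare replies is to compute, for each affordable proposal, the geometric mean speed-up over a reference system across your benchmarks, then divide by price. This Python sketch assumes every vendor ran every benchmark in your suite (a missing one raises `KeyError`); the reference timings, vendor names and prices are made up. Performance per unit price is only one possible figure of merit; you might instead simply take the fastest proposal within budget.

```python
from dataclasses import dataclass
from statistics import geometric_mean

@dataclass
class Proposal:
    vendor: str
    price: float
    times: dict  # benchmark name -> wall-clock seconds on the proposed system

def speedup(proposal, reference):
    """Geometric mean of reference_time / proposed_time over all benchmarks."""
    # Raises KeyError if the vendor skipped a benchmark in the reference suite.
    return geometric_mean(reference[name] / proposal.times[name] for name in reference)

def rank(proposals, reference, budget):
    """Drop over-budget proposals, then sort the rest by speed-up per unit price."""
    affordable = [p for p in proposals if p.price <= budget]
    return sorted(affordable, key=lambda p: speedup(p, reference) / p.price,
                  reverse=True)

# Hypothetical data for illustration only.
reference = {"app-a": 100.0, "app-b": 200.0}
proposals = [
    Proposal("vendor-A", 90_000, {"app-a": 50.0, "app-b": 100.0}),
    Proposal("vendor-B", 80_000, {"app-a": 80.0, "app-b": 100.0}),
    Proposal("vendor-C", 120_000, {"app-a": 40.0, "app-b": 80.0}),
]
ranking = rank(proposals, reference, budget=100_000)
print([p.vendor for p in ranking])
```

The geometric mean stops one benchmark with a huge speed-up from dominating the score.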

Step 8

Procure the machine and install it.

Step 9

Benchmark the machine you have procured to make sure the cluster does what you think it should do. Make sure everything is working. If there are failover components, try breaking the system in controlled ways to make sure you have the redundancy you require.
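For the acceptance benchmarking, a simple check is to rerun the same suite on the installed machine and flag anything that is missing or noticeably slower than the vendor quoted. In this Python sketch the 5% tolerance is an assumption; use whatever margin your contract specifies. The controlled failover tests are a hands-on exercise and are not covered here.

```python
def acceptance_failures(quoted, measured, tolerance=0.05):
    """Return {benchmark: reason} for every quoted benchmark that was not run
    on the installed machine or ran more than `tolerance` slower than quoted."""
    failures = {}
    for name, quoted_s in quoted.items():
        measured_s = measured.get(name)
        if measured_s is None:
            failures[name] = "not run"
        elif measured_s > quoted_s * (1 + tolerance):
            failures[name] = f"{measured_s:.1f} s vs quoted {quoted_s:.1f} s"
    return failures

# Hypothetical timings in seconds.
quoted = {"app-a": 50.0, "app-b": 100.0, "app-c": 30.0}
measured = {"app-a": 51.0, "app-b": 110.0}
print(acceptance_failures(quoted, measured))
```

An empty result is a reasonable precondition for signing off.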

Step 10

Sign off, and profit!
