
Thursday, July 12, 2018

Completed Cluster! Benchmarks

Finally... after months of working on this, the full 5-node cluster with InfiniBand works. I ran some more of the motorBike OpenFOAM benchmarks (a sketch of how each run is launched follows the list):

  • Headnode only, n=20: 1.12 iter/s
  • Compute node only, n=20: 1.015 iter/s
  • Headnode + node002, n=40, 1GbE: 1.75 iter/s
  • Headnode + node002, n=40, QDR InfiniBand: 2.18 iter/s
  • All 5 nodes, n=100, 1GbE: 1.56 iter/s
  • All 5 nodes, n=100, QDR InfiniBand: 5.24 iter/s
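Each of these is a standard OpenFOAM parallel run: decompose the mesh into n subdomains, then launch the solver over MPI across the nodes. Here's a minimal Python sketch of how such a run can be kicked off; the case path, hostfile name, and the helper itself are illustrative assumptions, not my exact scripts.

    import subprocess

    def run_motorbike(case_dir, n, hostfile):
        # system/decomposeParDict in the case must already have
        # numberOfSubdomains set to n before decomposing.
        subprocess.run(["decomposePar", "-case", case_dir], check=True)
        # Launch the solver across the cluster; the motorBike tutorial
        # uses the steady-state incompressible solver simpleFoam.
        subprocess.run(
            ["mpirun", "-np", str(n), "-hostfile", hostfile,
             "simpleFoam", "-case", case_dir, "-parallel"],
            check=True,
        )

    run_motorbike("motorBike", 40, "hostfile")  # e.g. one of the n=40 runs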
You can see that the 1GbE link is definitely the bottleneck. In fact, it's so restrictive that running all 5 nodes is actually slower than running 2. My guess is that performance over the 1GbE link peaks at around 3 nodes. The QDR InfiniBand link is a different story entirely. It shows essentially perfect scaling up to 5 nodes (the measured rate matches the sum of the headnode's and each compute node's standalone iter/s; see the quick check below), and it'd probably continue to scale well across many more nodes, particularly for larger meshes.
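To put numbers on that, here's a quick arithmetic check (values copied from the list above) comparing each measured result to the "ideal" aggregate rate, i.e. the headnode's standalone iter/s plus one compute node's standalone iter/s per additional node:

    head_ips = 1.12      # headnode only, n=20
    compute_ips = 1.015  # compute node only, n=20

    def ideal_ips(nodes):
        # Perfect scaling: every node contributes its standalone rate.
        return head_ips + (nodes - 1) * compute_ips

    measured = {
        (2, "1GbE"): 1.75,
        (2, "QDR InfiniBand"): 2.18,
        (5, "1GbE"): 1.56,
        (5, "QDR InfiniBand"): 5.24,
    }

    for (nodes, link), ips in measured.items():
        print(f"{nodes} nodes, {link}: {ips:.2f} iter/s = "
              f"{ips / ideal_ips(nodes):.0%} of ideal {ideal_ips(nodes):.2f}")

InfiniBand comes out at roughly 100% of ideal at both 2 and 5 nodes, while 1GbE drops to about 30% of ideal at 5 nodes.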



Feels good man...


Update (3 months later): The n=100 result is not realistic. Coincidentally, a corrected-method n=108 case over FDR InfiniBand ended up with almost the same iter/s (5.29), so just imagine the caption replaced. See this post for an explanation.


Still have some stuff to do:
  1. Clean up the wiring
  2. Get everything situated in the soundproof cabinet
  3. Fix the heat extraction system if it isn't sufficient
  4. Fix the RAID1 data array in the headnode so it stops failing
  5. Compile these blog posts into step-by-step guides
  6. Use the cluster
