- Head node only, n=20: 1.12 iter/s
- Compute node only, n=20: 1.015 iter/s
- Head node + node002, n=40, 1 GbE: 1.75 iter/s
- Head node + node002, n=40, QDR InfiniBand: 2.18 iter/s
- All 5 nodes, n=100, 1 GbE: 1.56 iter/s
- All 5 nodes, n=100, QDR InfiniBand: 5.24 iter/s
You can see that the 1 Gb Ethernet link is definitely the bottleneck. In fact, it's so restrictive that going all the way to 5 nodes actually hurts performance compared to 2. My guess is that throughput over the 1 GbE link would peak at around 3 nodes. The QDR InfiniBand link is a different story entirely. It shows essentially perfect scaling (the sum of the head node's and each compute node's standalone iter/s) up to 5 nodes, and it'd probably continue to scale well to many more nodes, particularly for larger meshes.
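Here's a quick back-of-the-envelope check of that "perfect scaling" claim in Python, using the numbers from the list above and assuming all four compute nodes match node002's standalone rate:

```python
# Ideal throughput for N nodes is just the sum of each node's standalone iter/s.
# Assumption: every compute node performs like node002 (1.015 iter/s alone).

head_ips = 1.12       # head node alone, n=20
compute_ips = 1.015   # one compute node alone, n=20

def ideal_ips(num_compute_nodes):
    """Perfectly scaled iter/s for the head node plus num_compute_nodes."""
    return head_ips + num_compute_nodes * compute_ips

for label, nodes, measured in [
    ("head + node002, QDR IB", 1, 2.18),
    ("all 5 nodes, QDR IB",    4, 5.24),
]:
    ideal = ideal_ips(nodes)
    print(f"{label}: measured {measured:.2f} iter/s, "
          f"ideal {ideal:.2f} iter/s, efficiency {measured / ideal:.0%}")
```

Both InfiniBand runs come out at or slightly above 100% of the ideal rate, which is what I mean by perfect scaling.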
Feels good man...
Update (3 months later): The n=100 result is not realistic. Coincidentally, a (corrected-method) n=108 case with FDR InfiniBand ended up with almost the same iter/s (5.29), so just imagine the caption replaced. See this post for an explanation.
Still have some stuff to do:
- Clean up the wiring
- Get everything situated in the soundproof cabinet
- Fix the heat extraction system if it isn't sufficient
- Fix the RAID1 data array in the headnode so it stops failing
- Compile these blog posts into step-by-step guides
- Use the cluster