
Saturday, June 16, 2018

More cluster hardware changes

I got a good deal on 4x E5-2690 v2's, so I decided to swap them in for the E5-2690's and run some tests. With 20 cores, the motorbike benchmark ran about 2s faster than with the E5-2667 v2's on 16 cores. On 16 cores, the E5-2690 v2's were about 1s slower, which is well within the run-to-run repeatability margin. On one core, the E5-2667 v2 was about 7% faster, which makes sense because its single-core turbo frequency is about 10% higher.

The reason the E5-2690 v2's aren't faster despite their higher core*GHz (33 vs. 28.8 for the E5-2667 v2) is the memory bottleneck. For these nodes and this benchmark, adding cores beyond 8 improves the run time by only tens of seconds, and adding cores beyond 16 by only a few seconds. Once memory bandwidth is fully saturated, additional cores simply don't help. This is why processors with many (>16) but slower cores are not recommended for CFD.

On workloads that aren't memory-bound (pretty much everything other than CFD), the E5-2690 v2's should be faster thanks to the additional cores, so I decided to go with them. The seller had more, so I bought 4 more at the same price and replaced the E5-2667 v2's as well. Now my cluster has 100 cores at 3-3.3GHz. Nice.
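To make the core*GHz arithmetic and the bandwidth argument a bit more concrete, here is a minimal Python sketch. The all-core turbo clocks (3.6 GHz for the E5-2667 v2, 3.3 GHz for the E5-2690 v2) are an assumption, inferred by dividing the 28.8 and 33 figures by 8 and 10 cores. The workload numbers are made up, and taking the max() of compute time and memory time is a simplified roofline-style model, not a measurement of the motorbike case.

```python
"""Toy model of why extra cores stop helping a memory-bound CFD run.

Assumptions (not measured): all-core turbo clocks of 3.6 GHz (E5-2667 v2)
and 3.3 GHz (E5-2690 v2), inferred from the 28.8 and 33 core*GHz figures.
The workload sizes in the demo are hypothetical.
"""

CPUS = {
    # name: (cores per socket, assumed all-core turbo in GHz)
    "E5-2667 v2": (8, 3.6),
    "E5-2690 v2": (10, 3.3),
}


def core_ghz(cores: int, ghz: float) -> float:
    """Aggregate compute 'capacity' of one socket, in core*GHz."""
    return cores * ghz


def runtime(cores: int, ghz: float, work: float, traffic: float, bandwidth: float) -> float:
    """Roofline-style estimate: the run takes as long as the slower of
    compute (work spread over cores*GHz) and memory traffic
    (data moved divided by sustained memory bandwidth)."""
    compute_time = work / (cores * ghz)
    memory_time = traffic / bandwidth
    return max(compute_time, memory_time)


if __name__ == "__main__":
    for name, (cores, ghz) in CPUS.items():
        print(f"{name}: {core_ghz(cores, ghz):.1f} core*GHz per socket")

    # Hypothetical workload: the memory traffic sets a 10 s floor, so the
    # run time flattens out once the compute time drops below it.
    for n in (1, 2, 4, 8, 16, 20):
        t = runtime(n, 3.3, work=300.0, traffic=100.0, bandwidth=10.0)
        print(f"{n:2d} cores: {t:6.1f} s")
```

With these made-up numbers, going from 1 to 8 cores cuts the estimate sharply, while 16 and 20 cores both sit on the same memory floor. That is the same shape as the benchmark results above, though the real curve is smoother than a hard max().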
