Final list of hardware:
1. Headnode: ASUS Z10PE-D8, 2x Xeon E5-2690V4 ES (14c @ 3GHz), 8x8GB 2Rx8 PC4-19200 ECC RDIMMs, 500GB Samsung 960 Evo NVMe (CentOS 7.5), 2x 3TB HDD in RAID1 (data), 480GB SSD (Windows), GTX Titan, CX354A FDR IB HCA.
2. Compute nodes: Supermicro 6027TR-HTR, which has 4x nodes: 2x E5-2690v2, 8x8GB dual rank PC3-14900R ECC RDIMMs, 120GB SSD (CentOS 7.5 compute node), CX354A FDR IB HCA.
3. Mellanox SX6005 FDR switch with Sonoff wifi power switch
4. 2x 8 port unmanaged 1Gbe switches, one for IPMI, one for intranet
5. Riello UPS: 3300VA, 2300W
6. APC NetShelter CX Soundproof Cabinet with custom, automatic heat extraction system
Here are some pictures:
Front of cabinet. Looks clean and organized.
Close up of heat extraction electronics. The controller board is mounted in its new 3D printed tray.
|Mounted power strip for not-C13-plug things|
Possible future changes
Since I just got finished saying that the homelab cluster is finished, it's time to list some possible future upgrades, because that's how this hobby goes...
1. Clean up the wiring a little more. It's kind of ugly in the back due to all of the coiled up wires. I'm not really sure how to make it neater without custom cables, though, and that definitely isn't worth the time/money involved to me.
3. AMD EPYC. The new AMD EPYC processors are awesome for CFD. Each has 8 channels of DDR4-2666 RAM = crazy high memory bandwidth = more tasks/CPU before hitting the memory bandwidth bottleneck. Looking at the OpenFOAM benchmarks, two dual 7301 servers with dual rank RAM (4 CPUS, 64 cores @~2.9GHz) should be faster than my entire cluster (10 CPUS, 108 cores @~3GHz), and it's almost all thanks to memory bandwidth. Unfortunately, the economics don't make any sense. Building just one dual socket 7301 server/workstation would cost more than I spent on this whole cluster, even if the RAM, CPUs, and motherboard were all purchased used. Because its new hardware, there aren't many used EPYCs or motherboards on the market yet. Also, DDR4 RAM is absurdly expensive, mostly due to price fixing/collusion between the only three RAM manufacturers in the world. Two dual socket EPYC servers would require 32x dual rank 2666 RAM, which for 8GB at the cheapest (new) prices I could find would run about ~$3500....ouch. Again, since that's the latest speed RAM, there isn't much pre-owned DDR4-2666 yet. I did an electricity price analysis to see if it would still make sense economically to upgrade. Assuming running for 1/2 of a year, the current server would use 6100 kWh. At $0.22/kWh (England...), that's about $1350/year in electricity. I think two AMD EPYC servers would use about 900W. That's about $870/year in electricity, for a savings of ~$480/year. Even including selling off what I currently have, it'd take more years than it's worth for me to break even. So if/when this upgrade occurs, it will be in the future when the prices come down. One really exciting prospect of the EPYC chips is that they allow overlclocking. The famous overclocker der8auer's EPYC build used a prototype Elmor Labs to EVC v2 to overclock dual 7601's to 4GHz (all cores) on a Supermicro H11DSI and an ASUS RS700a-e9. He had to use novec submersion or dry ice, but he was able to get a fairly good overclock with a water cooler. Overclocking doesn't make sense in a large cluster/data center environment where running costs (electricity, cooling, maintenance) dominate. Power (and thus heat) scales with frequency cubed, so it's cheaper for them to buy more servers and not overclock. But in a homelab/small cluster environment, where initial hardware costs are usually the dominating factor, overlcocking makes a lot of sense, so this might be something I look into in a few years.
4. Add internal LED lights that come on when either the front or back doors are opened. These would probably be in the form of stick-on strips along the top front and top rear of the cabinet running off of the heat extraction system PSU. The only reason I haven't done this is that I doubt I'll be opening the cabinet much anymore now that everything is situated, heat extraction is automatic, and powering on equipment can all be done remotely.