Saturday, November 3, 2018

Homelab Cluster: Hardware Finally Done

The day has finally come: I'm happy with the homelab's hardware. *fireworks*

Final list of hardware:
1. Headnode: ASUS Z10PE-D8, 2x Xeon E5-2690V4 ES (14c @ 3GHz), 8x8GB 2Rx8 PC4-19200 ECC RDIMMs, 500GB Samsung 960 Evo NVMe (CentOS 7.5), 2x 3TB HDD in RAID1 (data), 480GB SSD (Windows), GTX Titan, CX354A FDR IB HCA.
2. Compute nodes: Supermicro 6027TR-HTR, which holds 4 nodes, each with: 2x E5-2690 v2, 8x8GB dual rank PC3-14900R ECC RDIMMs, 120GB SSD (CentOS 7.5), CX354A FDR IB HCA.
3. Mellanox SX6005 FDR switch with Sonoff wifi power switch
4. 2x 8-port unmanaged 1GbE switches, one for IPMI, one for the intranet
5. Riello UPS: 3300VA, 2300W
6. APC NetShelter CX Soundproof Cabinet with custom, automatic heat extraction system

Here are some pictures:

The whole homelab + 3D printer. Its final position will be about 2 feet to the right. The plywood under it allows for easy rolling over the carpeted floor. My desk with the monitor is just to the right.


Front of cabinet. Looks clean and organized.

Back is a little bit of a mess, but it's the best I could come up with. All of the cables are too long, so I had to coil them.

Close up of heat extraction electronics. The controller board is mounted in its new 3D printed tray.

Mounted power strip for things that don't use C13 plugs.

It's currently cranking through TBs of astrophysics data. I'll be running CFD cases on it soon.

Possible future changes

Since I just finished saying that the homelab cluster is done, it's time to list some possible future upgrades, because that's how this hobby goes...

1. Clean up the wiring a little more. It's kind of ugly in the back due to all of the coiled-up wires. I'm not really sure how to make it neater without custom cables, though, and that definitely isn't worth the time or money to me.

2. Rack rails/strips. Racking the server and switches might help clean up the wiring inside slightly and make it look neater. The biggest problem with doing this is that I would lose the ability to pull the SM compute nodes out. They come out of the back, and I currently have to slide the server to the side and angle it so that I can pull a node out through the back door. If the server chassis were racked, I wouldn't be able to do that, so I'd have to pull the whole chassis out in order to get to a node. Aside from making it look a little prettier, adding rack rails would be pretty pointless, so this probably won't happen.

3. AMD EPYC. The new AMD EPYC processors are awesome for CFD. Each one has 8 channels of DDR4-2666 RAM = crazy high memory bandwidth = more tasks per CPU before hitting the memory bandwidth bottleneck. Looking at the OpenFOAM benchmarks, two dual 7301 servers with dual rank RAM (4 CPUs, 64 cores @ ~2.9GHz) should be faster than my entire cluster (10 CPUs, 108 cores @ ~3GHz), and it's almost all thanks to memory bandwidth.

Unfortunately, the economics don't make any sense. Building just one dual socket 7301 server/workstation would cost more than I spent on this whole cluster, even if the RAM, CPUs, and motherboard were all purchased used. Because it's new hardware, there aren't many used EPYCs or motherboards on the market yet. Also, DDR4 RAM is absurdly expensive, mostly due to price fixing/collusion between the only three RAM manufacturers in the world. Two dual socket EPYC servers would need 32x dual rank DDR4-2666 DIMMs, which at the cheapest (new) prices I could find for 8GB sticks would run about ~$3500... ouch. Again, since that's the latest speed of RAM, there isn't much pre-owned DDR4-2666 yet.

I did an electricity price analysis to see if the upgrade would still make sense economically (there's a quick back-of-the-envelope script after this list). Assuming it runs for half of the year, the current cluster would use about 6100 kWh. At $0.22/kWh (England...), that's about $1350/year in electricity. I think two AMD EPYC servers would draw about 900W, which works out to about $870/year, for a savings of ~$480/year. Even including selling off what I currently have, it'd take more years than it's worth for me to break even. So if/when this upgrade occurs, it will be in the future, once prices come down.

One really exciting prospect of the EPYC chips is that they allow overclocking. The famous overclocker der8auer's EPYC build used a prototype Elmor Labs EVC v2 to overclock dual 7601s to 4GHz (all cores) on a Supermicro H11DSI and an ASUS RS700a-e9. He had to use Novec submersion or dry ice to get there, but he was also able to get a fairly good overclock with just a water cooler. Overclocking doesn't make sense in a large cluster/data center environment, where running costs (electricity, cooling, maintenance) dominate. Power (and thus heat) scales roughly with frequency cubed, so it's cheaper for them to buy more servers and not overclock. But in a homelab/small cluster environment, where the initial hardware cost is usually the dominating factor, overclocking makes a lot of sense, so this might be something I look into in a few years.

4. Add internal LED lights that come on when either the front or back doors are opened. These would probably be in the form of stick-on strips along the top front and top rear of the cabinet running off of the heat extraction system PSU. The only reason I haven't done this is that I doubt I'll be opening the cabinet much anymore now that everything is situated, heat extraction is automatic, and powering on equipment can all be done remotely.
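Since item 3 above leans on some quick arithmetic, here's a rough Python sketch of that back-of-the-envelope math. The ~1.4 kW draw for the current cluster is just the 6100 kWh figure worked backwards, the 900W EPYC number is my estimate from above, and the net upgrade cost in the break-even line is a made-up placeholder, not a real quote.

# Back-of-the-envelope numbers for the EPYC comparison in item 3.
# Assumed inputs: ~1.4 kW average draw for the current cluster,
# ~0.9 kW for two dual-7301 boxes, running half the year at $0.22/kWh.

HOURS_PER_YEAR = 8760

def yearly_cost(avg_watts, duty_cycle=0.5, price_per_kwh=0.22):
    """Electricity cost per year for a given average power draw."""
    kwh = avg_watts / 1000 * HOURS_PER_YEAR * duty_cycle
    return kwh * price_per_kwh

def peak_mem_bandwidth_gbs(channels, mt_per_s, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth per socket in GB/s."""
    return channels * mt_per_s * bytes_per_transfer / 1000

current = yearly_cost(1400)   # ~6100 kWh per half-year -> ~$1350/year
epyc = yearly_cost(900)       # -> ~$870/year
print(f"current: ${current:.0f}/yr, EPYC: ${epyc:.0f}/yr, "
      f"savings: ${current - epyc:.0f}/yr")

# Why EPYC wins for CFD: per-socket peak bandwidth,
# E5-2690 v2 (4 channels of DDR3-1866) vs EPYC 7301 (8 channels of DDR4-2666).
print(f"E5-2690 v2: {peak_mem_bandwidth_gbs(4, 1866):.0f} GB/s, "
      f"EPYC 7301: {peak_mem_bandwidth_gbs(8, 2666):.0f} GB/s")

# Break-even on a hypothetical $4000 net upgrade (after selling the current gear).
net_upgrade_cost = 4000       # placeholder, not a real quote
print(f"break-even: {net_upgrade_cost / (current - epyc):.1f} years")

Running it gives roughly $1350 vs $870 a year (about $480 saved), ~60 GB/s vs ~171 GB/s of peak bandwidth per socket, and a break-even measured in many years, which is why this upgrade can wait.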
