Monday, October 22, 2018

More thermal management

The headnode's CPU 1 sometimes shows temperatures about 6 C higher than CPU 2, despite the same reported power draw. I tried tightening the screws on CPU 1's cooler slightly, but I don't want to wrench them down because there is no back plate. It seemed to help a little, maybe 1-2 C. The temperatures aren't breaching 70 C, so I'm not too concerned. I also moved the GPU down a slot to give the CPU fans more room to draw in air.

As a follow-up to this post, I purchased 3x new heat extraction fans. I couldn't get the 24V versions cheaply, so I bought 12V ones and a new 12V power supply for them. The ones I had in there before were louder than everything else with the cabinet open, which defeated the purpose of a soundproof cabinet. The new ones have the same total max flow rate, but lower pressure and lower total noise. I soldered on fan connectors, made a custom 3-way splitter, connected them up, reinstalled the fan bracket, and tried it out. MUCH quieter with the cabinet closed up now. Definitely quieter than the server and switch with the doors open, so that's good. The flow rate isn't as high as before, so I'm guessing there is more pressure drop than what I was measuring with the water manometer. I have the fans connected directly to the power supply instead of through the PWM fan controller because I think they will need to run at full throttle all of the time. Total power draw is about 40W, which is a small price to pay for a quieter server.

I did some stress testing to see how hot it would get in there. The server's system temp got to about 39 C with the doors closed, which is just 2 C higher than with them open. No thermal shutdowns, so I think that's a success.

I got that annoying segfault error again, twice. It said the source was the headnode this time, instead of node005. I'm not sure whether it's actually a component going bad or some weird thing with the code. It also doesn't occur at a consistent point in the run.
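To put a rough number on that "small price" for the fans, here is a quick back-of-the-envelope sketch. It assumes the fans really do run at about 40W around the clock, as described above. The electricity rate is a made-up placeholder and not a figure from this setup, so swap in your own.

```python
# Rough yearly energy use of the extraction fans running continuously.
FAN_POWER_W = 40.0         # approximate total draw quoted in the post
HOURS_PER_YEAR = 24 * 365  # assumes the fans never switch off


def annual_energy_kwh(power_w: float, hours: float = HOURS_PER_YEAR) -> float:
    """Convert a constant power draw in watts to energy in kWh per year."""
    return power_w * hours / 1000.0


def annual_cost(power_w: float, rate_per_kwh: float) -> float:
    """Yearly running cost for a constant draw at a flat electricity rate."""
    return annual_energy_kwh(power_w) * rate_per_kwh


if __name__ == "__main__":
    ASSUMED_RATE = 0.12  # hypothetical price per kWh, not from the post
    energy = annual_energy_kwh(FAN_POWER_W)
    cost = annual_cost(FAN_POWER_W, ASSUMED_RATE)
    print(f"{energy:.1f} kWh per year, about {cost:.2f} per year at the assumed rate")
```

At 40W continuous this works out to roughly 350 kWh per year, so the running cost scales directly with whatever rate applies locally.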

I purchased 2x new 140mm case fans and installed them in some blank spots in the headnode to help with heat extraction. I also purchased another one to replace the fan in the PSU because it was clicking. However, when I took the fan out and ran it separately, it no longer clicked. I think the fan cable had wiggled loose and was touching the fan blade while it was installed in the PSU because, after I secured the cable, the clicking stopped. The server is pretty quiet now, even when running full blast.

I also mounted the power strip on the side of the cabinet. I had tried various tapes before, but they all eventually failed. This time, I drilled and screwed in brass M3 threaded inserts, 3D printed some brackets I designed to hold the power strip, and screwed them on. After that, I cleaned up the rest of the wiring in and around the cabinet.

No more falling power strip


To do:
1. Replace ntp with chrony on all nodes (ntp works between nodes, but headnode won't sync)
2. Figure out how to deal with switching the heat extraction fans on and off so I don't have to open the cabinet door every time.
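One possible direction for item 2 is a simple thermostat with hysteresis, so the fans don't chatter on and off around a single threshold. This is only a sketch of the decision logic: the class name `FanThermostat` and both temperature thresholds are hypothetical, and it says nothing about how the 12V supply would actually be switched (relay, smart plug, or otherwise) or where the temperature reading comes from.

```python
from dataclasses import dataclass


@dataclass
class FanThermostat:
    """On/off decision with hysteresis. All thresholds are assumed values."""

    on_above_c: float = 35.0   # hypothetical: turn fans on at or above this
    off_below_c: float = 30.0  # hypothetical: turn fans off at or below this
    fans_on: bool = False

    def update(self, temp_c: float) -> bool:
        """Feed in the latest cabinet temperature; returns the desired fan state."""
        if temp_c >= self.on_above_c:
            self.fans_on = True
        elif temp_c <= self.off_below_c:
            self.fans_on = False
        # Between the two thresholds the previous state is kept.
        return self.fans_on


if __name__ == "__main__":
    thermostat = FanThermostat()
    for reading in (28.0, 33.0, 36.0, 33.0, 29.0):
        state = "on" if thermostat.update(reading) else "off"
        print(f"{reading:.1f} C -> fans {state}")
```

The gap between the two thresholds is what keeps the fans from cycling rapidly when the cabinet sits near the switching temperature. A wider gap means fewer switches but a larger temperature swing.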
