Friday, May 25, 2018

Homelab Cluster: Growing pains. NVMe boot with the X10DAi

This post is entirely about trying to get an NVMe SSD to boot from an X10DAi motherboard.

Apparently installing to and booting from an NVMe drive with Linux is a common problem. The NVMe drive, motherboard BIOS, and OS all have to have compatible drivers for it to work. My workstation motherboard is an X10DAi, and the NVMe drive I'm trying to use is the Samsung 960 Evo. This FAQ says to just enable EFI Option ROM for the PCIe slot the NVMe drive (well, the adapter it's attached to) is in, then boot from a UEFI DVD/USB installer and install.

However, this did not work for me with CentOS 7.3 because CentOS couldn't see the NVMe drive. I activated a shell from the CentOS installer (Ctrl + Alt + F2) and ran "lspci", but it didn't detect the NVMe drive, which means the motherboard/BIOS isn't seeing it either. The BIOS was a version out of date (2.0a instead of 3.0a), so I downloaded the new BIOS image and followed the instructions to update it, which went smoothly. I loaded the default BIOS settings, changed the EFI OPROM setting on the appropriate slot again, then booted to the CentOS UEFI installer. But it still didn't see the NVMe drive. Back to the shell...nope, nothing. "modprobe nvme" doesn't help either. "lsblk" doesn't see it. Well, this sucks.

So either the motherboard doesn't work with NVMe drives (despite the FAQ), the PCIe adapter is bad, the drive is bad, or the drive is incompatible with Linux (unlikely). To test whether it's CentOS, I tried installing Ubuntu 18. It also did not see the drive. So then I took it out of the workstation and put it in the desktop. The desktop's motherboard has an M.2 PCIe drive slot, so it should be compatible with PCIe SSDs in PCIe slots. Bingo, it showed up in Ubuntu 16.04 "lsblk" without my having to do anything. "lspci" shows the Samsung controller. I was able to partition it and write files to it. So the drive and the adapter are fine. The X10DAi is not compatible with most NVMe PCIe SSDs for boot, despite what the FAQ says. It may only work with certain ones, or it may only boot with Windows, which is annoying.

This thread suggests the BIOS might have to be modded, which I really want to avoid. A post in that thread said that the 950 Pro works with an X10 and the latest BIOS. This thread has an interesting post in reply to someone else saying a 950 Pro worked in their X10DAX (which is in the same line as my X10DAi):
Contrary to nearly all other NVMe SSDs the Samsung 950 Pro has an NVMe Option ROM in the box. That is why you can boot off that SSD in LEGACY mode.
Generally you need a suitable NVMe EFI module within the mainboard BIOS, if you want to be able to boot off an NVMe SSD in UEFI mode.
I guess that explains it. The fancy Intel drives they suggest buying probably have the NVMe Option ROM, too. Here's another good post, this time about the X10DRi-T. He managed to get SM to send him a BIOS with NVMe support for the 960 Evo. Unfortunately, it seems they haven't updated the X10DAi's BIOS with the same code. Moving forward, I'll have to either:
  1. Mod BIOS
  2. Get a Samsung 950 Pro
  3. Get a different motherboard, like an ASUS Z10PE-D8/D16
I looked into the ASUS Z10PE-D8/D16. According to a few threads on servethehome, it turns out that unless you buy one made after Nov. 2015, they do not support dual E5 V4 CPUs, even with a BIOS update. Turns out a chip has to be replaced on the motherboard to enable dual E5 V4's. Since most of the used boards were probably produced before that, and given my luck so far with this project, I think I won't do that. Unfortunately, the X10DRi-T's are expensive. None of the other dual-socket SM X10 motherboards are confirmed to work with the 960 Evo or other consumer NVMe SSDs. That pretty much leaves getting a Samsung 950 Pro. For now, I'll have to use a regular SATA SSD for the OS. So I installed the OS, did updates, etc. I also contacted Supermicro Support to see if they could do anything.

Then I had a derp moment. I only had one CPU installed, which meant that the PCIe slot I was trying to use for the NVMe drive was not active (oops). I put it in another slot and booted to the SATA SSD I had already installed CentOS on. "lsblk" showed the NVMe drive, and "lspci" showed the Samsung controller, but oddly listed as the PM961. I suppose the PM961, like the 950 Pro, is supported then? Maybe the PM951 is, too? Who knows. I shut down, disconnected the SATA drive, put the CentOS installer USB back in, and booted to the UEFI installer, which also saw the drive. I then installed CentOS 7 minimal to it, shut down, and rebooted. However, the NVMe drive was not listed as a boot option. So close!
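The "is the drive visible at all?" check done by hand with lspci can be scripted. The following is a minimal Python sketch, assuming lspci labels NVMe devices with the class name "Non-Volatile memory controller"; the function name and the sample lines are hypothetical and only there for illustration.

```python
# Class name that lspci prints for NVMe devices (assumption stated in the text above).
NVME_CLASS = "non-volatile memory controller"


def find_nvme_controllers(lspci_output):
    """Return the lspci lines that describe an NVMe controller."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if NVME_CLASS in line.lower()
    ]


# Hypothetical sample of lspci output, not captured from the real machine.
sample = (
    "00:1f.2 SATA controller: Example AHCI Controller\n"
    "02:00.0 Non-Volatile memory controller: Example NVMe SSD Controller\n"
)

matches = find_nvme_controllers(sample)
if matches:
    print("NVMe controller visible on the PCIe bus:")
    for line in matches:
        print("  " + line)
else:
    print("No NVMe controller listed; check the slot, installed CPUs, and BIOS settings.")
```

To use it on a real system, paste or pipe the output of lspci into the function in place of the sample string. An empty result points at the slot, CPU population, or BIOS rather than at the OS driver.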

So I tried again, but this time with an extra USB drive inserted. I did custom standard partitioning and placed the /boot and /boot/efi partitions (1 GiB each) only on the USB drive, but placed everything else only on the NVMe drive (root 50GiB, swap 8GiB, home ~407 GiB), switched the USB drive to the boot device, installed, and rebooted to the USB drive. THIS WORKED. CentOS minimal booted from the USB to the NVMe. Heck yes. It's ugly, but it works, which is what matters. I'll probably get a low-profile 4-8GB USB 3.0 drive for the boot partition drive and just leave it in my computer forever.

Hopefully I'll be able to enable SATA RAID in the BIOS so I can use the hardware RAID controller for the storage drives. There seems to be some incompatibility with the RAID setting (it's on AHCI now). 

Saturday, May 19, 2018

A few more successful prints

I designed and printed a couple things.

1. A better filament guide for the Wanhao i3.


It's shorter, so it aligns the filament with the extruder better. First attempt interfered with the spool holder, but I was able to modify it. The thingiverse model has been corrected. I made it so it would fit the teflon tube holder that fixes the coiling problem I mentioned earlier.

2. An SSD adapter for a Supermicro drive tray. I needed this for the homelab cluster.


This one is a bit tricky. Ideally, I'd have plastic on both sides and/or the bottom of the drive, but the way HDD's fit in the trays and the lack of space in the tray bays prevent this. Thus, the SSD is only held in by two side screws. Some support at the back helps keep it in place while being loaded. It works well, and is easy to adjust for thicker SFF drives. I thought about printing and selling them on eBay for less than the SM equivalent (MCP-220-00043-0N), but I'd barely break even.

My Phi adapters are selling ok on eBay. It's a small market, but I think my design is superior to all of the alternatives, so I should be able to make a tiny amount of money and hopefully partially pay off this printer.

Tuesday, May 15, 2018

Flashing Rebranded Mellanox Infiniband Cards and Homelab Infiniband, Part 3

I did the performance testing: Link. Bandwidth was about 3.3 GB/s, which is close to the theoretical maximum of 4 GB/s (32 Gbit/s) of a QDR link. Primary conclusion: No significant difference between the firmware versions.
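For reference, the 4 GB/s figure follows from the QDR link parameters. A minimal Python sketch of that arithmetic, assuming a standard 4x link at 10 Gbit/s signaling per lane with 8b/10b encoding (the variable names are just for illustration):

```python
LANES = 4                      # a standard QDR port is a 4x link
SIGNAL_GBIT_PER_LANE = 10.0    # QDR signaling rate per lane
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b encoding: 8 data bits per 10 line bits


def qdr_data_rate_gbytes():
    """Usable data rate of a 4x QDR link in GB/s."""
    raw_gbit = LANES * SIGNAL_GBIT_PER_LANE        # 40 Gbit/s on the wire
    data_gbit = raw_gbit * ENCODING_EFFICIENCY     # 32 Gbit/s of payload
    return data_gbit / 8                           # bits -> bytes


theoretical = qdr_data_rate_gbytes()
measured = 3.3  # GB/s, from the benchmark above

print(f"theoretical: {theoretical:.1f} GB/s")
print(f"measured / theoretical: {measured / theoretical:.1%}")
```

So the measured 3.3 GB/s is a bit over 80% of the theoretical data rate.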

Next steps:
  1. Flash all cards with 2.11.2010 firmware
  2. Update IPMI and BIOS in other nodes, change BIOS settings to maximum performance
  3. Install all Infiniband cards
  4. Get OpenMPI and OpenFOAM working on one node.
  5. Run OpenFOAM benchmarks
  6. Learn how to do network boot
    1. If this fails, mirror node installation to other nodes' SSDs
  7. Get clustered OpenMPI working over ethernet
  8. Get clustered OpenFOAM working over ethernet
  9. Get clustered OpenFOAM working over Infiniband

Monday, May 14, 2018

Flashing Rebranded Mellanox Infiniband Cards and Homelab Infiniband Part 2

When I last left off with this, I said I was going to purchase 4x more identical Sun QDR HCAs because I knew they worked with my Sun switch. So I did that. Specifically, I bought Sun/Oracle X4242A 375-3696-01 Rev. 51 Dual Port QDR Infiniband HCAs, which, according to my research, are equivalent to a Mellanox MHQH29B. Turns out the cards had little stickers on them saying they were MHQH29B-XSR rev A3's, so my research was right.

This should be easy, right? I mean, they work with my desktop and the Sun switch at QDR speeds, what could go wrong? Hahaha...nope. Time for another installment of Infiniband Nightmares.

As a reminder, the server I want to put them in is a 4 node Supermicro 6027TR-HTR (motherboards: X9DRT-HF) server. I installed them all, but 3/4 caused boot to hang at post code 91, which is when the PCI stuff is loaded. Shit. The fourth one seems to let the node boot fine, though. Switching nodes/pci slots doesn't help, and I know all the pci slots work fine because I had the QLE7340's in them. "OK, so you have 3 dead cards."...except not. Here's the weird part: they all boot fine in my desktop (I7-5960x, X99-SLI motherboard) and are recognized by lspci and ibstat.

I thought it might be a BIOS problem, so I upgraded IPMI and the BIOS of one of the nodes. Unfortunately, that didn't help.

I also tried a bunch of different BIOS settings, none of which helped (UPDATE: not true anymore, see bottom of this post). I tried taping the PCIe SMBus pins. Also didn't help. 

One last thing to try: card firmware. The only differences between the card that allowed boot and the 3/4 that did not are as follows:
  • GUIDs, MACs, Serial numbers (all duhs. Those will be different for every card)
  • Firmware Version. 2.11.2012 vs 2.11.2010
  • MIC Version: 1.5.0 vs. 1.2.0 (same order)
    • not sure what this is
  • One line in the .ini files: "log2_uar_bar_megabytes = 7" vs. "sriov_en=true" (same order)
    • not sure what this does
  • The 2.11.2012 cards' firmware .bin file is about 12% larger. It's binary, so can't examine it.
The device ID (26428), PSID (SUN0170000009), HW revision, flint hardware info, mlxburn vpd (minus serial number), are all the same.
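A comparison like the one above can be automated once the query output of each card has been saved to a text file. This is a minimal Python sketch, assuming the saved output consists of "Key: value" lines; the sample text below is a hypothetical excerpt built from the values listed above, not real tool output.

```python
def parse_query(text):
    """Parse 'Key: value' lines into a dict, skipping lines without a colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def diff_fields(a, b, ignore=()):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = (set(a) | set(b)) - set(ignore)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}


# Hypothetical excerpts of two saved query files.
card_a = "FW Version: 2.11.2010\nDevice ID: 26428\nPSID: SUN0170000009\n"
card_b = "FW Version: 2.11.2012\nDevice ID: 26428\nPSID: SUN0170000009\n"

for key, (left, right) in diff_fields(parse_query(card_a), parse_query(card_b)).items():
    print(f"{key}: {left} vs {right}")
```

Fields that are expected to differ on every card (GUIDs, MACs, serial numbers) can be passed in `ignore` so only the interesting differences are printed.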

I did a ton of research about flashing firmware to rebranded Mellanox ConnectX-2 and ConnectX-3 HCAs. This is relevant information for various HP, Dell, IBM, and Sun branded Mellanox cards. Here is a list of the links I found most useful for writing the following guides:
  • 1
  • 2 Look for post by TeeJayHoward. His website has some of the MHQH19B and MHQH29B firmware files. 
  • 3 Look for post by izx
  • 4 
  • 5 Mellanox guide
  • 6 Post I started to deal with this problem
Potentially useful files:
  • Most of TeeJayHoward's relevant files mirrored
  • Sun 2.11.2010 firmware
  • Sun 2.11.2012 firmware
  • Mellanox MFT 4.9 (in case they take it down for some dumb reason)
  • See Mellanox's firmware download site for what they consider current (the available firmware probably is not actually current).
Unfortunately, Mellanox has taken down their custom firmware table, so the mlx files are no longer available. Thus, you're stuck with whatever bin Mellanox provides you on their firmware download pages. HP offers free downloads of their firmware through their firmware website. Not sure about Dell. Sun and IBM's firmwares are locked behind expensive support contracts. Also unfortunately, the official Mellanox firmware isn't always the most updated. For example, for the MHQH29B, the firmware revision for download is 2.9.1000, which is actually pretty old. It's so old that you can't use RDMA with Windows. If you want to change the firmware of your HCA, you're left with only a few options:
  1. Download the Mellanox firmware bin for your specific card from their firmware download page and burn it to your Mellanox card. This will be a .bin file that only works with one specific PSID. You can burn this to a rebranded Mellanox card using the process described below.
  2. Hope somebody downloaded the custom newer firmware for your card from Mellanox's table and that they're hosting those files somewhere. For example, in link 2 above, TeeJayHoward is hosting the 2.10.720 firmware for all revisions of MHQH19B and MHQH29B's. You have to build and burn the firmware (see process below).
  3. Find the firmware you want from your brand, e.g. HP. They might offer a newer version of the firmware, and they might not. Even with a support contract, they likely cannot provide you a different firmware version. 
  4. Find the branded firmware from someone else hosting it. For example, the 2.11.2010 Sun firmware from a different server is hosted in the last post of this thread. This is exceedingly rare.
  5. Transfer the firmware from one card to another (see process below). This requires you to be lucky enough to already have a card with a working firmware version. This is what I ended up doing.
  6. Buy newer cards (what all of the companies want you to do)
There really isn't much else you can do. 

The process for flashing Mellanox firmware from the Mellanox firmware website to Mellanox cards is straightforward and explained in link 5 above. 

The process for flashing any compatible Mellanox firmware version to a Mellanox or re-branded card is more difficult. This mainly follows link 3 mentioned above. Link 1 is good to read if you use Windows. Note: This may brick your card. Use at your own risk.
  1. Download and install MFT (see link 5 above for download and guide)
  2. Command: mst start
  3. Figure out your device. There will be two: a pci_crX and a pciconfX, or something like that. You want to use the crX device unless otherwise noted. Use the whole path, e.g. /dev/mst/mt26428_pci_cr0.
    Command: mst status
  4. Save basic info such as GUIDs, MACs, etc.:
    Command: flint -d (device) query full > flint_query.txt
  5. Save low-level flash chip info:
    Command: flint -d (device) hw query > flint_hwinfo.txt
  6. Save existing firmware. This is very helpful if you have a card that works with your system and some that do not:
    Command: flint -d (device)  ri orig_firmware.bin
  7. Save existing FW configuration:
    Command: flint -d (device) dc orig_firmware.ini
  8. Save existing PXE ROM image (if any- mine didn't have this):
    Command: flint -d (device)  rrom orig_rom.bin
  9. Save existing PCI VPD (vital product data):
    Command: mlxburn -d (device pciconfX)  -vpd > orig_vpd.txt
  10. Now things are a little different. The link 3 guide shows you how to burn your own .bin from a mlx and ini file. The mlx is a multi-adapter file. You have to create a .bin file specific for your adapter. The mlx files used to be available, but now only specific bin files are available. However, if you do manage to obtain a mlx, then you also need to obtain the ini file corresponding to your specific adapter and burn the .bin.
    Example command: mlxburn -fw fw-ConnectX3-rel.mlx -conf MCX312A-XCB_A2-A6.ini -wrimage mlnx_firmware.bin
  11. Once you have the .bin, then you need to verify it (bootable, all pass):
    Command: flint -i mlnx_firmware.bin verify
  12. Then you need to double check the firmware version and PSID.
    Command: flint -i mlnx_firmware.bin query full
  13. Finally, burn the new firmware image. If you are flashing an identical card that has a different PSID, e.g. flashing a Mellanox card that has been rebranded as HP with Mellanox firmware, then you need the -allow_psid_change flag. Otherwise, you do not need it.
    Command: flint -d (device)  -i mlnx_firmware.bin -allow_psid_change burn
  14. Reboot and run the query full command again to make sure the flash worked. Also verify that the PSID has changed if you were cross-brand flashing. 
The process for flashing a branded firmware version to the same brand card is a simplified version of the above. The major differences are that you will most likely have the bin file pre-made, and you do not need the -allow_psid_change flag since you are not changing the PSID. This is what I did to downgrade my 2.11.2012 cards to 2.11.2010. 
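The backup and burn steps above can be collected into a small dry-run helper. This is only a Python sketch: it prints the command lines from the steps above instead of executing them, so they can be reviewed first. The function names and the pciconf device name are hypothetical placeholders; use whatever "mst status" reports on your system.

```python
def backup_commands(device, config_device):
    """Command lines from steps 4-9: save everything before touching the card."""
    return [
        f"flint -d {device} query full > flint_query.txt",
        f"flint -d {device} hw query > flint_hwinfo.txt",
        f"flint -d {device} ri orig_firmware.bin",
        f"flint -d {device} dc orig_firmware.ini",
        f"flint -d {device} rrom orig_rom.bin",
        f"mlxburn -d {config_device} -vpd > orig_vpd.txt",
    ]


def burn_commands(device, image, psid_change=False):
    """Command lines from steps 11-13: verify, inspect, then burn the image."""
    # Only add the flag when the PSID will change (cross-brand flashing).
    flag = " -allow_psid_change" if psid_change else ""
    return [
        f"flint -i {image} verify",
        f"flint -i {image} query full",
        f"flint -d {device} -i {image}{flag} burn",
    ]


dev = "/dev/mst/mt26428_pci_cr0"    # example path from step 3
conf = "/dev/mst/mt26428_pciconf0"  # assumed name of the matching pciconf device

# Same-brand flash: no PSID change, so the flag is left out.
for cmd in backup_commands(dev, conf) + burn_commands(dev, "orig_firmware.bin"):
    print(cmd)
```

Printing rather than running keeps the risky burn step manual, which seems appropriate for something that can brick a card.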

UPDATE:

If you have the same problem I have, where a newer firmware version is preventing your server from booting, it may have something to do with BAR-space. In post 12 of link 6, Andreas mentions what the Sun release notes say about firmware 2.11.2012. The only difference is that the BAR-space has been increased from 8MB to 128MB. He also quotes what to do to make Sun servers boot with the new firmware. Unfortunately, my motherboard's BIOS does not have these settings. They sound similar to something I came across with the Xeon Phi's, though. To get a Xeon Phi Coprocessor to work, you need a motherboard with something like "above 4G decoding", "large PCI MMIO", or "large BAR support". I tried enabling "above 4G decoding" under PCIe configuration in my BIOS, and it worked! The 2.11.2012 firmware card allowed boot, and ibstat showed link up with rate 40 (QDR). So if your motherboard allows for the adjustment of BAR size, then try that before messing with firmware flashing.
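The 128MB figure lines up with the "log2_uar_bar_megabytes = 7" line found in the .ini comparison earlier, if that key is read as the base-2 logarithm of the BAR size in megabytes. That reading is an assumption based on the key's name, not something the release notes quoted here confirm. A minimal Python sketch of the arithmetic:

```python
def bar_size_mb(log2_megabytes):
    """BAR size in MB for a given log2 setting (assumed meaning of the ini key)."""
    return 2 ** log2_megabytes


def log2_for_size(size_mb):
    """Inverse: the log2 setting for a BAR size given in MB (power of two only)."""
    if size_mb <= 0 or size_mb & (size_mb - 1):
        raise ValueError("size must be a positive power of two")
    return size_mb.bit_length() - 1


print(bar_size_mb(7), "MB for a setting of 7")
print("setting for 8 MB:", log2_for_size(8))
```

Under that reading, a setting of 7 gives 128 MB and the older 8 MB BAR would correspond to a setting of 3.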

From that same thread: We've determined that these Sun firmware versions (2.11.2010, 2.11.2012) are the latest. It sounds like they created the 2.11.2012 version specifically for motherboards that allow for BAR-space increases, and that 2.11.2010 should be used for all normal motherboards. If we ignore the special large BAR-space firmwares, then Sun, like Mellanox, does not provide more than 1 firmware version per card type. It's also likely that the Sun numbering scheme is the same as Mellanox's, meaning that the firmware they provide is much newer than the version Mellanox does (2.9.1000). Also interesting is that the newer MHQH29C has the same Mellanox firmware listed (2.9.1000). The latest one HP lists for their equivalent to the MHQH29B/C cards is 2.9.1530. Digging through IBM's release documentation for their proprietary updater, their equivalent card's latest firmware seems to be 2.9.1000. The latest Mellanox firmware used commonly here to enable RDMA for Windows on these cards is 2.10.720. So Sun/Oracle's seems to be the most recent for the MHQH29B. Interesting.

Moving forward: I'm going to do back-to-back performance testing between cards with 2012 firmware with above 4G decoding enabled, and between cards with 2010 firmware (no above 4G decoding). Then I will either upgrade or downgrade all of the cards so they are equivalent.

One thing is bothering me, though. My desktop does not support above 4G decoding. In fact, that's why I didn't try that setting in my SM BIOS in the first place. So why does a 2.11.2012 firmware card work in it? The only thing I can think of is that it inherently allows 128MB BAR support.

Updating the IPMI and BIOS of a Supermicro X9 Motherboard

I have a 4 node Supermicro 6027TR-HTR (motherboards: X9DRT-HF) server, and I thought I needed to update IPMI and the BIOS for it. The following is a process for doing that, but there are multiple ways.

Supermicro SMT AETN X9 (there are many different versions, I've only verified this process on mine) IPMI update:
  1. Hook an ethernet cable up to the IPMI LAN port and connect it to your computer.
  2. Boot server into BIOS (hit "DEL")
  3. Go to IPMI tab and note down the IP address info. If it's set to DHCP, set it to static, and enter in the IP, subnet, and gateway (make sure you use 3 digits for each entry, so add extra 0's). If you had to change the IP info, save changes and reset (reboot).
  4. On your computer, set up the ethernet to work with the static IP info that you just noted down from the server. You'll need to set an IP on the same subnet and use the same subnet mask and gateway.
  5. Go to a browser on your computer and type in the static IP. The IPMI login screen should show up. Log in with the username and password. The default is ADMIN/ADMIN.
  6. Now you should see the browser interface for your server. You can do lots of things with this, but the thing we want to do is update the IPMI firmware.
  7. Go to your Supermicro motherboard's web page and download the most current SMT or IPMI firmware. In this zip, there should be instructions for your operating system. For mine, there was a Word document with pictures of how to use the browser to update the firmware. The following steps are the text versions of this.
  8. First, check the "Firmware Revision". Mine was 2.26. The firmware folder I downloaded was SMT_X9_352, which means it's firmware version 3.52, so mine was definitely out of date.
  9. Go to Maintenance->Firmware Update. 
  10. Enter update mode.
  11. Browse to the downloaded firmware file. Mine was SMT_X9_352.bin. Upload.
  12. Click OK and upload firmware
  13. Uncheck the preserve configuration box. This is apparently important.
  14. Click start upgrade. 
  15. During this process, the browser will lose connection. The IPMI system will reboot, though not the server. The browser will not come back up, though, because the static IP was reset to DHCP. After a few minutes, shut down the server manually.
  16. Go to Step 2. Repeat steps 3, 5, 6, and 8. You should see new firmware in the BIOS and in the Web-GUI. 
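Step 4 is the easiest one to get wrong. A minimal Python sketch for checking that the computer's address really is on the same subnet as the IPMI address, using the standard ipaddress module. The addresses are hypothetical examples, and the helper that strips the zero padding is there because the BIOS wants 3 digits per entry while Python's ipaddress module may reject leading zeros.

```python
import ipaddress


def strip_padding(ip):
    """Turn a zero-padded address like 192.168.001.050 into 192.168.1.50."""
    return ".".join(str(int(part)) for part in ip.split("."))


def same_subnet(ipmi_ip, netmask, computer_ip):
    """True if computer_ip falls inside the network defined by ipmi_ip/netmask."""
    network = ipaddress.ip_network(
        f"{strip_padding(ipmi_ip)}/{strip_padding(netmask)}", strict=False
    )
    return ipaddress.ip_address(strip_padding(computer_ip)) in network


# Hypothetical values: IPMI address as entered in the BIOS, and two candidate
# addresses for the computer's ethernet port.
print(same_subnet("192.168.001.050", "255.255.255.000", "192.168.1.10"))
print(same_subnet("192.168.001.050", "255.255.255.000", "192.168.2.10"))
```

If the check comes back False, the browser in step 5 will not reach the IPMI login page.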
Now that IPMI is updated, you can update the BIOS.
  1. Go to your motherboard's webpage and download the latest BIOS package. In mine, there was a text document with instructions. In that text document, there was a warning that the IPMI firmware revision must be 2.0 or higher before upgrading the BIOS.
  2. I used RUFUS to create a DOS bootable USB device and copied over all of the files that came with the zip download.
  3. Boot the server with this USB drive. It should boot to a DOS prompt. Type "DIR" to make sure all of the bios files you copied over are there.
  4. Type "AMI.bat BIOSNAME.XXX" to start the BIOS Update. 
  5. When it is complete (you will get C:\> DOS prompt again), shutdown server, unplug AC, clear the CMOS (pull battery, short jumper, put battery back), plug in AC, power on.
  6. Go to BIOS, load default settings, save and reset.
Done.

Friday, May 11, 2018

Xeon Phi Co-processor Testing, part 2

I purchased ~140 7-series Xeon Phi Co-processors from a reseller who got them from a laboratory that was liquidating them. I tested them in a similar manner to part 1, but with an ASUS Z10PA-D8 instead of an HP DL380p server. Unfortunately, workstations don't have cooling. That's why I designed and tested the fan adapters, which are now for sale on eBay.

New single Phi adapter design. The blowers are a little longer,
but they're flatter than the axial fan single Phi adapters.

It took about 5 days to test them all, primarily because of how long it takes the ASUS computer to boot. I couldn't figure out how to make it boot faster unfortunately.

Ghetto test setup

By the end, I had the rhythm down:

1. Install power cables
2. Install fan adapter
3. Plug in Phi
4. Boot
5. Check lspci. If this fails, shut down and put the card in the for-parts box.
6. If pass lspci, run firmware update bash script. This has all the commands to update firmware and reboot.
7. Once rebooted, run post firmware update bash script. This starts mpss, runs miccheck, etc.
8. Wrap up card and put in working box.

I tried to test two at once with a dual Phi setup. It wasn't too difficult to get them installed with the fan adapter I designed, but handling two Phi's at once was unwieldy, so I went back to doing one at a time.

Some cards prevented computer power on, which was kind of scary...means there was a bad power fault somewhere. Some would be recognized by lspci, but would fail a firmware update...these usually had the F2 post code error, which is related to the memory system. I couldn't figure out how to fix that. A bunch just weren't recognized by lspci. But the majority were in good working condition, so I got lucky there.

Here's the homelabsales post and table if you're interested in buying one or more. I managed to sell about 40 of them in the ~1.5 weeks I had in FL, mostly broken ones to a museum in Oregon, haha. I put them in storage bins inside so they won't corrode in the FL humidity. I'll be listing them again a few months from now.

Friday, April 13, 2018

Venturi Flow Meter Testing, Part 3

I installed the new venturi meter and re-ran the previous tests. As a reminder, the goal is to determine the flow rate and pressure drop in the fan adapters so that fans can be selected. The previous venturi had a very high pressure drop across it. My guess is that its contraction ratio was too high and the flow was detaching in the diffuser section. I designed a new one with a smaller contraction ratio. The printer is/was still having issues when I printed it, so I had to sand the crap out of it.

Anyways, the results are better than last time, but still not good enough. The fan simply isn't developing enough flow rate. I was about 12% short according to the venturi flow meter. This made me go back and have a look at the fan curve. Turns out that, at the pressure the fan is developing (according to the fan adapter's manometer referenced to ambient), the fan should be producing about 2-3 times as much flow rate as I'm measuring with the venturi flow meter. Unfortunately, I have no way of knowing which is wrong.

Either the fan is producing far less flow rate and/or pressure than it is supposed to, or the manometers are reading incorrectly. The manometers are super simple devices...the only way they could be wrong is if the static pressure taps aren't actually measuring pure static pressure, which is certainly possible since I have no way of visualizing the flow inside these parts. I think it's more likely that the fan is simply not following its pressure vs. flow rate curve.

I checked the input voltage, and it's right around 12.1V when the fan is connected directly (no controller). 12V is the rated voltage and what the curve was generated with. There's some loss through the PWM controller, which is why I've been running the tests with and without it. So if it's not a power issue, maybe it's a blockage issue. The single phi adapter has a pretty severe contraction in it...perhaps that's simply blocking some of the area. In fact, looking at it from the fan side, it looks like about 1/2 to 2/3 of the area is taken by the contraction, so that might make sense.

I printed and tested a longer version (+30mm) of the single phi fan adapter. Lengthening it makes the contraction less severe. However, this one actually resulted in a lower flow rate, meaning it had a higher deltaP than the short single phi fan adapter. This still doesn't make a lot of sense to me. The larger surface area increases viscous drag loss, but the less severe contraction should have made up for it. Perhaps the contraction is still too severe. It's not worth making it much longer because a blower type fan will be more compact, then, and if these flow rate numbers are right, then possibly more efficient as well. I didn't bother testing this one any more.

Next, I hooked up the mock phi to the single phi fan adapter. The goal of these tests was to tune the restriction plate holes so that, at the same throttle, the pressure reading at the fan adapter tap referenced to ambient (which is also the pressure developed by the fan and the pressure drop across whatever comes after the adapter) was the same as the fan adapter + real phi + pci bracket (no venturi). I had estimated the hole size and number of holes using thin perforated plate theory, so I expected to be close. Turns out it was exactly right (within measurement ability). Nice.

Then I attached the mock phi and the real phi to the dual phi fan adapter, and ran some more tests.

Dual Phi adapter, mock Phi, real Phi with exit adapter
With the real phi in the bottom position, the fan was pushing about 73% of the required flow rate according to the venturi. With the real phi in the top position, the fan was pushing about 71% of the required flow rate. (pretty good balance for first try...will tune it). However, that's through one Phi. Remember that, with the single Phi adapter, it was pushing about 88% of required flow rate. So with the dual Phi adapter, the fan is actually pushing about 70% more flow rate than with the single Phi. This partially supports my blockage theory. However, it's still about 50% under what the fan should be pushing at that pressure. Looking at the fan side of the dual phi adapter, it looks like maybe 1/3 of it is blocked, so maybe it's still a good theory. I tried taking the fan grill off the back, but that didn't seem to make any measurable difference. Taking the venturi off, the fan adapter pressure corresponds to a flow rate of about 180 m^3/hr on the fan curve...no way it's hitting that.

Now, I did not tune the mock phi for the venturi attached, so I probably should have made the mock phi more restrictive so it matches the real phi+venturi. However, if I completely taped off all of the mock phi exits (100% restrictive), then the flow rate in the real Phi only increases about 5-10%, which is still way under the flow rate required. This also supports my fan blockage theory.

I tried taping off half of the bottom half of a single phi fan adapter, and installing it with the real phi and venturi. If the flow rate was the same as with no tape, then 100% of this problem is likely blockage. However, the flow rate dropped by almost half...which actually makes sense, but not given the other results of these tests. Maybe the tape being right at the exit plane is causing more severe blade stall than the adapter with no tape.

Anyways, either the pressure taps are reading some dynamic pressure or the fan is operating below its published pressure vs flow rate curve. I need to figure out which.

Assuming that blockage is the cause of the fan not operating on its pressure vs. flow rate curve, how can I improve the efficiency of these adapters? Intuitively, smoother transitions/contractions should help, but as the extended single phi tests showed, they do not. Perhaps if the adapters were made much longer, and the area changes very gradual, then it would help, but there isn't room for that in a desktop computer, and the blower fans, which already have a compact exit area, would be better then. When I was doing initial fan selection, I assumed that the fan adapters would be fairly efficient. That may not actually be the case, particularly for an 80x80mm axial fan to a single phi. Assuming the fan is underperforming, the efficiency seems to be ok for a dual Phi.

I can do a fairly easy check by hooking the fan directly up to the venturi. I was too impatient to wait for something to print, so I created a long paper cone to adapt from the fan to the tube before the venturi. Lots of tape for sealing.


It should be good enough...essentially no blockage. The result at full power was about 82 m^3/hr according to the venturi, which is about the same as with the dual Phi card setup. This kind of makes sense because only half of the flow is going through the lossier tubes + venturi with the dual Phi setup, and all of it is going through the tubes + venturi here (higher velocity = higher dP), which just happens to result in a deltaP similar to the dual Phi setup.

I poked/cut a small hole in the side of the paper near the fan and put the other manometer tube in, leaving the other end ambient. Wiggling it around some (making sure it's not pointed into the flow), this gave me a pressure of about 186 Pa, which corresponds to about 162 m^3/hr on the fan curve. With the adapter and single Phi at this pressure, the flow rate is about 50 m^3/hr, so perhaps blockage is partially to blame, but clearly there is something else going on. Either the venturi is reading about 1/2 the flow rate, or the fan is outputting about half of its spec flow rate. I double checked all of my math...I'm multiplying the manometer height change by 2, the venturi equation and conversion factors are all correct, etc. Without the venturi the pressure is about 155 Pa, which corresponds to about 170 m^3/hr on the fan curve. So the venturi seems to be fairly low loss, which is good. But what is causing the discrepancy between the fan curve and what I'm measuring?
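For anyone wanting to repeat the math check, here is a minimal Python sketch of the kind of calculation involved. It assumes a water-filled U-tube manometer (so the level difference is twice the rise of one leg, which is one reading of "multiplying by 2"), the standard incompressible venturi equation, and an ideal discharge coefficient of 1.0. The fluid densities and the dimensions in the example call are assumed or hypothetical values, not the ones from this test setup.

```python
import math

RHO_AIR = 1.2      # kg/m^3, assumed room-temperature air
RHO_WATER = 998.0  # kg/m^3, assumed manometer fluid
G = 9.81           # m/s^2


def manometer_pressure(leg_rise_m):
    """Pressure difference (Pa) from the rise of ONE leg of a U-tube manometer.

    The level difference between the two legs is twice the rise of one leg.
    """
    return RHO_WATER * G * (2.0 * leg_rise_m)


def venturi_flow_m3_per_hr(delta_p, d_inlet, d_throat, cd=1.0):
    """Volumetric flow from the inlet-to-throat pressure drop of a venturi.

    Q = Cd * A_throat * sqrt(2 * dP / (rho * (1 - beta^4))), beta = d_throat / d_inlet
    """
    beta = d_throat / d_inlet
    area_throat = math.pi * d_throat ** 2 / 4.0
    q = cd * area_throat * math.sqrt(2.0 * delta_p / (RHO_AIR * (1.0 - beta ** 4)))
    return q * 3600.0  # m^3/s -> m^3/hr


# Hypothetical example: 5 mm rise on one leg, 50 mm inlet, 30 mm throat.
dp = manometer_pressure(0.005)
print(f"dP = {dp:.1f} Pa")
print(f"Q  = {venturi_flow_m3_per_hr(dp, 0.050, 0.030):.1f} m^3/hr")
```

Note that flow scales with the square root of the pressure reading, so a factor of 2 error in flow rate would need a factor of 4 error in the measured pressure drop.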

I'd need a hot wire anemometer or a pitot tube rake in order to measure velocity directly, so that's out. I can do a static stall test, though. I took the venturi off and taped off the end of the tube that leads to the venturi. I also added more tape to the fan-paper cone interface, taped off the hole I cut there, and checked for any other holes. I used the pressure tap in the end of the tube (the original venturi high pressure port) to measure static pressure, with the other end open to ambient. I got about 470 Pa. I felt around for any leaks...there probably were some very minor ones, but wrapping my hands around various parts didn't move the manometer more than about 0.5mm. Besides, the published spec is 607 Pa...my error would have to be huge to be off by that much. Plus, the static pressure tap is basically a perfect pressure tap at this point because the flow rate over it at the end of the sealed tube is 0. This test lends support to the theory that the fan is underperforming. In fact, some of the measurement points seem to fall just below the curve of the next fan in the series, so perhaps the variant Nidec made for Supermicro is actually closer to that variant. That said, the current drawn at 12 V agrees more with the first variant...

Anyways, I think my static ports and manometers are probably fine. For whatever reason, the fan is underperforming by about half. Blockage from the Phi adapters makes this problem worse, but I've already minimized that, so there's not much more I can do about that. I'll take another look at blowers for single Phi setups. IIRC, the smallest blower that could meet the required flow rate and pressure drew about 18W, which is right around what this Nidec axial fan is drawing now.
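The "about half" figure can be read straight off the numbers quoted above. A minimal Python sketch that compares each measurement with the corresponding published value; all values are the ones stated in this post, and the function and label names are just for illustration.

```python
def fraction_of_published(measured, published):
    """Measured value as a fraction of the published/fan-curve value."""
    return measured / published


comparisons = {
    # label: (measured, published or fan-curve value)
    "paper cone flow at 186 Pa (m^3/hr)": (82.0, 162.0),
    "single Phi flow at same pressure (m^3/hr)": (50.0, 162.0),
    "static stall pressure (Pa)": (470.0, 607.0),
}

for label, (measured, published) in comparisons.items():
    print(f"{label}: {fraction_of_published(measured, published):.0%} of published")
```

The flow with essentially no blockage comes out at roughly half of the fan curve value, while the stall pressure is closer to three quarters of the published spec.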

Another thing worth mentioning: This required flow rate is worst case scenario, which is 45 C inlet temperature and running the Phi at full power. Properly cooled desktops will not have 45 C air inside them. The measured flow rates achieved with the current dual Phi setup are good for ~39 C inlet air. The measured flow rate achieved with the current single Phi setup is good for ~43 C inlet air. I'm really close to where I wanted to be. These adapters should work fine.

The test goals were partially met. I was able to estimate the pressure drop in the adapter + pci bracket for the single Phi setup. Since my measurement error is fixed, the dual phi setup had relatively more error for this because the dual adapter's dP is lower than the single phi adapter's. However, I was unable to determine primary vs. secondary flow due to all of the holes in the secondary flow path of the Phi. I've simply scaled the area of the secondary flow inlet assuming a uniform velocity profile, which is not a good assumption, but it's the best I can do. Overall, I can size fans based on these results. However, this is all based on the partially supported but unconfirmed assumption that the fan I was using was underperforming significantly, and that I didn't have another test setup error. If I had a test setup error and the fan was performing correctly, then it means all of my flow rate measurements were low, which means that this fan and these adapters are more than sufficient. I need to try another fan to see if it's just this one underperforming. If that one also underperforms, then either it's common to embellish the published fan curves and I need to add some serious safety factors to the fan sizing formula, or both fans are actually correct and I had some unknown measurement error that gave extremely conservative results.

Actual Phi testing to commence next week...