
Monday, May 14, 2018

Flashing Rebranded Mellanox Infiniband Cards and Homelab Infiniband Part 2

When I last left off with this, I said I was going to purchase four more identical Sun QDR HCAs because I knew they worked with my Sun switch. So I did that. Specifically, I bought Sun/Oracle X4242A 375-3696-01 Rev. 51 dual-port QDR Infiniband HCAs, which, according to my research, are equivalent to a Mellanox MHQH29B. It turns out the cards had little stickers on them saying they were MHQH29B-XSR rev A3s, so my research was right.

This should be easy, right? I mean, they work with my desktop and the Sun switch at QDR speeds, what could go wrong? Hahaha...nope. Time for another installment of Infiniband Nightmares.

As a reminder, the server I want to put them in is a 4-node Supermicro 6027TR-HTR (motherboards: X9DRT-HF). I installed them all, but three of the four caused boot to hang at POST code 91, which is when the PCI stuff is loaded. Shit. The fourth one seems to let the node boot fine, though. Switching nodes/PCI slots doesn't help, and I know all the PCI slots work fine because I had the QLE7340s in them. "OK, so you have 3 dead cards."...except not. Here's the weird part: they all boot fine in my desktop (i7-5960X, X99-SLI motherboard) and are recognized by lspci and ibstat.

I thought it might be a BIOS problem, so I upgraded the IPMI firmware and the BIOS of one of the nodes. Unfortunately, that didn't help.

I also tried a bunch of different BIOS settings, none of which helped (UPDATE: not true anymore, see bottom of this post). I tried taping the PCIe SMBus pins. Also didn't help. 

One last thing to try: card firmware. The only differences between the card that allowed boot and the three that did not are as follows:
  • GUIDs, MACs, Serial numbers (all duhs. Those will be different for every card)
  • Firmware Version. 2.11.2012 vs 2.11.2010
  • MIC Version: 1.5.0 vs. 1.2.0 (same order)
    • not sure what this is
  • One line in the .ini files: "log2_uar_bar_megabytes = 7" vs. "sriov_en=true" (same order)
    • not sure what this does
  • The 2.11.2012 cards' firmware .bin file is about 12% larger. It's binary, so I can't examine it.
The device ID (26428), PSID (SUN0170000009), HW revision, flint hardware info, mlxburn vpd (minus serial number), are all the same.
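
Spotting a single differing .ini line by eye is tedious, so here is a minimal sketch of how two saved configuration dumps could be compared. The file names card_a.ini and card_b.ini are hypothetical stand-ins for dumps saved with flint's dc command (covered in the guide further down), and the helper names are made up for illustration.

```shell
# Normalize an .ini dump so spacing differences ("a = 1" vs "a=1")
# do not show up as changes.
normalize_ini() {
    sed -e 's/[[:space:]]*=[[:space:]]*/=/' -e 's/[[:space:]]*$//' "$1"
}

# Print only the lines that differ between two saved configurations.
# Lines starting with "<" come from the first file, ">" from the second.
ini_diff() {
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    normalize_ini "$1" > "$tmp_a"
    normalize_ini "$2" > "$tmp_b"
    # diff exits non-zero when the files differ, which is the expected case here
    diff "$tmp_a" "$tmp_b" | grep '^[<>]' || true
    rm -f "$tmp_a" "$tmp_b"
}

# Usage (hypothetical file names):
#   ini_diff card_a.ini card_b.ini
```

The dumps are compared line by line without sorting, so section headers stay in place and a changed key still shows up next to its neighbours.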

I did a ton of research about flashing firmware to rebranded Mellanox ConnectX-2 and ConnectX-3 HCAs. This information is relevant to various HP, Dell, IBM, and Sun branded Mellanox cards. Here is a list of the links I found most useful for writing the following guides:
  • 1
  • 2 Look for post by TeeJayHoward. His website has some of the MHQH19B and MHQH29B firmware files. 
  • 3 Look for post by izx
  • 4 
  • 5 Mellanox guide
  • 6 Post I started to deal with this problem
  • 7 Connectx3 firmware post.
Potentially useful files:
  • Most of TeeJayHoward's relevant files mirrored
  • Sun 2.11.2010 firmware
  • Sun 2.11.2012 firmware
  • Mellanox MFT 4.9 (in case they take it down for some dumb reason)
  • CX3 firmware mirror.
  • See Mellanox's firmware download site for what they consider current (the available firmware probably is not actually current).
Unfortunately, Mellanox has taken down their custom firmware table, so the mlx files are no longer available. Thus, you're stuck with whatever bin Mellanox provides you on their firmware download pages. HP offers free downloads of their firmware through their firmware website. Not sure about Dell. Sun and IBM's firmwares are locked behind expensive support contracts. Also unfortunately, the official Mellanox firmware isn't always the most up to date. For example, for the MHQH29B, the firmware revision available for download is 2.9.1000, which is actually pretty old. It's so old that you can't use RDMA with Windows. If you want to change the firmware of your HCA, you're left with only a few options:
  1. Download the Mellanox firmware bin for your specific card from their firmware download page and burn it to your Mellanox card. This will be a .bin file that only works with one specific PSID. You can burn this to a rebranded Mellanox card using the process described below.
  2. Hope somebody downloaded the custom newer firmware for your card from Mellanox's table and that they're hosting those files somewhere. For example, in link 2 above, TeeJayHoward is hosting the 2.10.720 firmware for all revisions of MHQH19B and MHQH29B's. You have to build and burn the firmware (see process below).
  3. Find the firmware you want from your brand, e.g. HP. They might offer a newer version of the firmware, and they might not. Even with a support contract, they likely cannot provide you a different firmware version. 
  4. Find the branded firmware from someone else hosting it. For example, the 2.11.2010 Sun firmware from a different server is hosted in the last post of this thread. This is exceedingly rare.
  5. Transfer the firmware from one card to another (see process below). This requires you to be lucky enough to already have a card with a working firmware version. This is what I ended up doing.
  6. Buy newer cards (what all of the companies want you to do)
There really isn't much else you can do. 

The process for flashing Mellanox firmware from the Mellanox firmware website to Mellanox cards is straightforward and explained in link 5 above. 

The process for flashing any compatible Mellanox firmware version to a Mellanox or rebranded card is more difficult. This mainly follows link 3 mentioned above. Link 1 is good to read if you use Windows. Note: This may brick your card. Use at your own risk.
  1. Download and install MFT (see link 5 above for download and guide)
  2. Command: mst start
  3. Figure out your device. There will be two: a pci_crX and a pciconfX, or something like that. You want to use the crX device unless otherwise noted. Use the whole path, e.g. /dev/mst/mt26428_pci_cr0.
    Command: mst status
  4. Save basic info such as GUIDs, MACs, etc.:
    Command: flint -d (device) query full > flint_query.txt
  5. Save low-level flash chip info:
    Command: flint -d (device) hw query > flint_hwinfo.txt
  6. Save existing firmware. This is very helpful if you have a card that works with your system and some that do not:
    Command: flint -d (device) ri orig_firmware.bin
  7. Save existing FW configuration:
    Command: flint -d (device) dc orig_firmware.ini
  8. Save existing PXE ROM image (if any; mine didn't have this):
    Command: flint -d (device) rrom orig_rom.bin
  9. Save existing PCI VPD (vital product data):
    Command: mlxburn -d (device pciconfX) -vpd > orig_vpd.txt
  10. Now things are a little different. The link 3 guide shows you how to build your own .bin from a mlx and ini file. The mlx is a multi-adapter file, so you have to create a .bin file specific to your adapter. The mlx files used to be available, but now only specific bin files are available. However, if you do manage to obtain a mlx, then you also need to obtain the ini file corresponding to your specific adapter and build the .bin from the two.
    Example command: mlxburn -fw fw-ConnectX3-rel.mlx -conf MCX312A-XCB_A2-A6.ini -wrimage mlnx_firmware.bin
  11. Once you have the .bin, then you need to verify it (bootable, all pass):
    Command: flint -i mlnx_firmware.bin verify
  12. Then you need to double check the firmware version and PSID.
    Command: flint -i mlnx_firmware.bin query full
  13. Finally, burn the new firmware image. If you are flashing an identical card that has a different PSID, e.g. flashing a Mellanox card that has been rebranded as HP with Mellanox firmware, then you need the -allow_psid_change flag. Otherwise, you do not need it.
    Command: flint -d (device) -i mlnx_firmware.bin -allow_psid_change burn
  14. Reboot and run the query full command again to make sure the flash worked. Also verify that the PSID has changed if you were cross-brand flashing. 
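
The backup steps (4 through 9) are the ones worth never skipping, so here is a rough sketch that bundles them. It only prints the commands, so you can read them before piping the output into sh. The function name, the output directory, and the pciconf device path in the example are assumptions for illustration; use whatever mst status reports on your system.

```shell
# Print the backup commands from steps 4-9 for one card.
#   $1: the pci_crX device path
#   $2: the matching pciconfX device path (used only for the VPD dump)
#   $3: output directory (hypothetical default name)
backup_cmds() {
    cr_dev=$1
    conf_dev=$2
    out_dir=${3:-card_backup}
    cat <<EOF
mkdir -p $out_dir
flint -d $cr_dev query full > $out_dir/flint_query.txt
flint -d $cr_dev hw query > $out_dir/flint_hwinfo.txt
flint -d $cr_dev ri $out_dir/orig_firmware.bin
flint -d $cr_dev dc $out_dir/orig_firmware.ini
flint -d $cr_dev rrom $out_dir/orig_rom.bin
mlxburn -d $conf_dev -vpd > $out_dir/orig_vpd.txt
EOF
}

# Review the commands first; append "| sh" to actually run them.
backup_cmds /dev/mst/mt26428_pci_cr0 /dev/mst/mt26428_pciconf0 card1
```

Using one directory per card keeps the dumps from several HCAs apart. The rrom line will simply fail on cards without a PXE ROM, which does not affect the other dumps.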
The process for flashing a branded firmware version to the same brand card is a simplified version of the above. The major differences are that you will most likely have the bin file pre-made, and you do not need the -allow_psid_change flag since you are not changing the PSID. This is what I did to downgrade my 2.11.2012 cards to 2.11.2010. 
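
Whether you need -allow_psid_change comes down to whether the PSID in the image matches the PSID on the card. A small sketch of that decision follows. It assumes the flint query output contains a line of the form "PSID:   SUN0170000009"; the function names and the OTHER_PSID value are hypothetical. Like the commands above, it only prints the burn command and does not run it.

```shell
# Extract the PSID from flint query output read on stdin.
# Assumes a line whose first field is "PSID:" followed by the value.
psid_from_query() {
    awk '$1 == "PSID:" { print $2; exit }'
}

# Print the burn command, adding -allow_psid_change only when the PSIDs differ.
#   $1: device, $2: image file, $3: PSID on the card, $4: PSID in the image
burn_cmd() {
    dev=$1
    image=$2
    card_psid=$3
    image_psid=$4
    if [ "$card_psid" = "$image_psid" ]; then
        echo "flint -d $dev -i $image burn"
    else
        echo "flint -d $dev -i $image -allow_psid_change burn"
    fi
}

# On a real system the two PSIDs would come from:
#   card_psid=$(flint -d "$dev" query full | psid_from_query)
#   image_psid=$(flint -i "$image" query full | psid_from_query)
burn_cmd /dev/mst/mt26428_pci_cr0 orig_firmware.bin SUN0170000009 SUN0170000009
burn_cmd /dev/mst/mt26428_pci_cr0 mlnx_firmware.bin SUN0170000009 OTHER_PSID
```

The first call is the same-brand case described above and prints a plain burn; the second is the cross-brand case and includes the flag.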

UPDATE:

If you have the same problem I have, where a newer firmware version is preventing your server from booting, it may have something to do with BAR-space. In post 12 of link 6, Andreas mentions what the Sun release notes say about firmware 2.11.2012. The only difference is that the BAR-space has been increased from 8MB to 128MB. He also quotes what to do to make Sun servers boot with the new firmware. Unfortunately, my motherboard's BIOS does not have these settings. They sound similar to something I came across with the Xeon Phis, though. To get a Xeon Phi coprocessor to work, you need a motherboard with something like "above 4G decoding", "large PCI MMIO", or "large BAR support". I tried enabling "above 4G decoding" under PCIe configuration in my BIOS, and it worked! The 2.11.2012 firmware card allowed boot, and ibstat showed link up with rate 40 (QDR). So if your motherboard allows for the adjustment of BAR size, then try that before messing with firmware flashing.
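
The 128MB figure lines up with the "log2_uar_bar_megabytes = 7" line in the 2.11.2012 .ini, if that setting is read as a base-2 exponent of a size in megabytes (an assumption based on the name, not something the Sun notes spell out). A quick sketch of the arithmetic, plus one way to look at the BAR sizes a card actually exposes. The function names are made up; 15b3 is the Mellanox PCI vendor ID.

```shell
# Size in MB implied by a log2 setting such as log2_uar_bar_megabytes.
bar_mb() {
    echo $(( 1 << $1 ))
}

bar_mb 7    # prints 128, the value from the 2.11.2012 .ini
bar_mb 3    # prints 8, the exponent that would give the older 8MB BAR

# List the memory regions (BARs) of every Mellanox device in the system.
# Look for the [size=...] field at the end of each Region line.
show_mellanox_bars() {
    lspci -d 15b3: -vv | grep -E 'Mellanox|Region'
}

# Usage on a machine with a card installed:
#   show_mellanox_bars
```

Running show_mellanox_bars on a machine where the card does boot is a cheap way to confirm which BAR size a given firmware requests before moving the card to a pickier motherboard.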

From that same thread: We've determined that these Sun firmware versions (2.11.2010, 2.11.2012) are the latest. It sounds like they created the 2.11.2012 version specifically for motherboards that allow for BAR-space increases, and that 2.11.2010 should be used for all normal motherboards. If we ignore the special large BAR-space firmwares, then Sun, like Mellanox, does not provide more than 1 firmware version per card type. It's also likely that the Sun numbering scheme is the same as Mellanox's, meaning that the firmware they provide is much newer than the version Mellanox does (2.9.1000). Also interesting is that the newer MHQH29C has the same Mellanox firmware listed (2.9.1000). The latest one HP lists for their equivalent to the MHQH29B/C cards is 2.9.1530. Digging through IBM's release documentation for their proprietary updater, their equivalent card's latest firmware seems to be 2.9.1000. The latest Mellanox firmware used commonly here to enable RDMA for Windows on these cards is 2.10.720. So Sun/Oracle's seems to be the most recent for the MHQH29B. Interesting.

Moving forward: I'm going to do back-to-back performance testing between cards with 2012 firmware with above 4G decoding enabled, and between cards with 2010 firmware (no above 4G decoding). Then I will either upgrade or downgrade all of the cards so they are equivalent.

One thing is bothering me, though. My desktop does not support above 4G decoding. In fact, that's why I didn't try that setting in my SM BIOS in the first place. So why does a 2.11.2012 firmware card work in it? The only thing I can think of is that it inherently allows 128MB BAR support.

6 comments:

  1. Can't get any of the .bin files to verify. The original backup files verify fine.

  2. Would like to attempt this as well. Please let me know if there was an update on this.

    Replies
    1. Not sure if there is anything new, I haven't touched this stuff in awhile. The processes described in this post worked for me though.

  3. Hello, Jed!
    I understand that a lot of time has passed since the publication of the article, but I would like to describe the problem that I encountered.
    When trying to save the firmware of the network card, I constantly encounter an error like:
    Failed to identify the device - Can't create SignatureManager!

    If you have the opportunity, can you tell me what can be done about it.

    Replies
    1. Hmm...I don't think I came across that problem, so I'm not sure what's wrong.

    2. Well, that's sad.
      In any case, thanks for answering the question as it is.
      If I manage to fix it, I will add below how I managed to do it.
