Another annoying thing: I still have a ghost Windows boot loader on the NVMe drive. I've deleted it twice, but it's somehow still there. I also can't change the boot order; no matter how I arrange it in the BIOS, Windows is always first.
A few more hardware changes for the slave nodes. I had 4 different sets of RAM. I purchased and sold some, so now I have 24x of one type and 8x of another. My goal is to eventually have uniform (32x) RAM in all of the compute nodes.
Software-wise, I left off last time with a working Slurm installation. I got it working with OpenMPI and PMI2, but I couldn't get OpenMPI's internal PMIx working with Slurm. I filed a bug report about this, but since I don't have a support contract it's a very low priority and will likely never be looked at. Another problem, for which I did not file a bug report, is that slurmd on the slave nodes does not seem to honor the srun port range setting in the conf file. This forced me to whitelist the entire private subnet instead of opening just a few ports. I then went back and closed the ports I had opened for Slurm.
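A rough sketch of the two firewall options, for comparison. It assumes firewalld is the firewall in use (the CentOS 7 default, but not confirmed anywhere in this post), and the port range and subnet in the usage lines are made-up placeholders. The function name is hypothetical, and it only prints the commands so they can be reviewed before anything is run.

```shell
# Print (not run) firewalld commands for the two approaches described above.
# Assumptions: firewalld is the firewall in use; the port range and subnet
# passed in are placeholders, not values from this cluster.
srun_firewall_cmds() {
    mode=$1
    value=$2
    case $mode in
        ports)
            # Open only the range that SrunPortRange in slurm.conf is set to.
            echo "firewall-cmd --permanent --add-port=${value}/tcp"
            ;;
        subnet)
            # Fallback: trust the whole private subnet.
            echo "firewall-cmd --permanent --zone=trusted --add-source=${value}"
            ;;
        *)
            echo "usage: srun_firewall_cmds ports|subnet VALUE" >&2
            return 1
            ;;
    esac
    echo "firewall-cmd --reload"
}

# Example calls (placeholder values):
#   srun_firewall_cmds ports 60001-63000
#   srun_firewall_cmds subnet 192.168.1.0/24
```

The subnet variant is the blunt one: it trusts every port from every host on the private network, which is why the narrower port range would have been preferable if slurmd had honored it.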
Now that all of the software is finalized, it's time to clone the slave node drive 3 times. This could be avoided with PXE diskless booting, but that looks like it will be a huge pain to set up, and it will take up a lot of RAM if I ever decide to put a commercial CFD program on this cluster. Unfortunately, cloning the drive ended up being a huge pain, too. I have 3 different types of 120 or 128 GB SSDs, and I installed everything on the largest. This is bad because now I can't use "dd" to clone the drive to the smaller drives. I tried Clonezilla because it has an auto-resize advanced setting, but it failed (I doubt it ever works). If you don't have identical drives for your slave nodes, install everything on the smallest one before cloning. I updated the software part 1 instructions with this information. What's even worse: the default CentOS 7 file system is XFS, which is not shrinkable, so I can't just shrink the home partition and logical volume. *slams head on desk repeatedly*. I'm beginning to expect shit like this to happen.
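The "install on the smallest drive" advice can be checked before installing anything. Here is a minimal sketch, assuming the drive sizes in bytes have already been read with something like `blockdev --getsize64 /dev/sdX` (run as root). The function names are hypothetical and the byte counts in the usage lines are only illustrative.

```shell
# Succeeds if a raw dd image of the source drive fits on the target drive.
# Both arguments are sizes in bytes, e.g. from `blockdev --getsize64 /dev/sdX`.
dd_clone_fits() {
    src_bytes=$1
    dst_bytes=$2
    [ "$src_bytes" -le "$dst_bytes" ]
}

# Print the smallest of the given sizes (in bytes), i.e. the size of the
# drive that should receive the original install.
smallest_drive() {
    min=$1
    for size in "$@"; do
        if [ "$size" -lt "$min" ]; then
            min=$size
        fi
    done
    echo "$min"
}

# Example calls (illustrative sizes, not measured from these SSDs):
#   dd_clone_fits 120034123776 128035676160 && echo "dd clone will fit"
#   smallest_drive 128035676160 120034123776 128035676160
```

Nominal "120 GB" and "128 GB" labels are not reliable enough for this; two drives with the same label can differ by a few sectors, so comparing exact byte counts is the safer check.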
Note, none of the following ended up working. If you're in a similar situation, you're better off just starting from scratch on the smaller drive.
I really only need to shrink the home partition, which doesn't have much on it because I'm mounting the headnode's home folder via NFS. If you're in this situation, then you must do the following:
- attach another drive (can be a large USB)
- copy all of the /home files to it
- lvs should show the logical volumes on your drive
- umount /home
- lvchange -an /dev/centos/home
- lvremove /dev/centos/home
- lvs should now not show the "home" logical volume
- Create the new home logical volume in the centos volume group. Use a size that results in a total disk size a few GB smaller than the smallest disk you have: lvcreate -L 40G -n home centos
- Create the XFS file system (or whatever you want) on the new home logical volume: mkfs.xfs /dev/centos/home
- mount /dev/centos/home /home
- Copy the files back from the other drive to the /home directory
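The steps above can be collected into a dry-run script. This is a sketch, not a tested procedure (and remember the overall attempt did not pan out). The backup mount point, the use of `cp -a`, the `-y` on lvremove, and the `run` helper are assumptions added for illustration. By default it only prints each command; set DRY_RUN=0 to actually execute them as root.

```shell
# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Recreate the "home" LV in the "centos" volume group at a smaller size.
# $1 = directory on the spare drive used as backup (assumed already mounted)
# $2 = new size for the home LV, e.g. 40G
recreate_home_lv() {
    backup_dir=$1
    size=$2
    run cp -a /home/. "$backup_dir/"      # save /home, keeping permissions
    run umount /home
    run lvchange -an /dev/centos/home     # deactivate the old LV
    run lvremove -y /dev/centos/home      # -y skips the confirmation prompt
    run lvcreate -L "$size" -n home centos
    run mkfs.xfs /dev/centos/home
    run mount /dev/centos/home /home
    run cp -a "$backup_dir/." /home/      # restore the saved files
}

# Example call (placeholder backup path):
#   recreate_home_lv /mnt/backup 40G
```

Running `lvs` before and after, as in the list, is still the easiest way to confirm that the old volume is gone and the new one has the expected size.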
The next step was to resize the physical volume. However, the free space ended up in the middle of the volume (between home and root), which meant I couldn't resize it. I also couldn't move the root part of the physical volume using pvmove because you can't overlap a volume move with itself. Useful link. If I had an extra 50 GB (the size of root) of space, I could move the root part there, then move it back so that it takes up the current free space plus some of the old root space, leaving the free space at the end of the volume. I don't have the space, though. So, plan C: remove and recreate the root LV as above (this moves the free space to the end), then shrink the physical volume. Unfortunately, this requires booting from a live CD because you can't unmount root while booted from it. So I created a live USB using dd and the live KDE image of CentOS, booted the node with it, and opened a terminal as root. I repeated the above steps for root, except that I didn't mount it yet. Then:
- pvs -v --segments /dev/sda2 (this should now show all of the free space at the end, last line)
- pvresize --setphysicalvolumesize 102G /dev/sda2 (the size should be smaller than the available space on the smallest SSD you have, but make sure you are only cutting into free space)
- If the above completed successfully, run the pvs command from the first step again, and you should see less free space at the end
- vgs and pvs should show smaller volume sizes now
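Picking the number to hand to pvresize is mostly arithmetic. A minimal sketch, assuming LVM reads a G suffix on input sizes as GiB (2^30 bytes) while drive labels use decimal GB. The function name, the example disk size, and the reserve are hypothetical, not the values behind the 102G used above.

```shell
# Largest whole-GiB PV size that fits on a disk of DISK_BYTES after setting
# aside RESERVE_GIB for everything outside the PV (e.g. /boot) plus a margin.
# $1 = size of the smallest disk in bytes
# $2 = GiB to keep free outside the physical volume
pv_target_gib() {
    disk_bytes=$1
    reserve_gib=$2
    # Integer division rounds down, so the result never overshoots the disk.
    echo $(( disk_bytes / 1073741824 - reserve_gib ))
}

# Example call (illustrative numbers):
#   pv_target_gib 120034123776 5
```

A drive sold as 120 GB holds only about 111 GiB, which is why the target has to sit well below the number printed on the label.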
The plan was then to shrink the sda2 partition, mount root somewhere, mount the USB drive I had saved everything from root on, and copy everything back (note: you don't need the stuff in sys, tmp, or proc). However, I couldn't figure out how to shrink sda2. I tried booting from the drive, which sort of worked, but the whole permission structure of the filesystem is fucked, probably from the cp's. So yeah...going to have to reinstall EVERYTHING just because I didn't install it on the smaller drive, so I couldn't clone it. Damn this sucks.
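If the plain cp's really were what broke the permissions, an archive-mode copy is the usual way to avoid that. This is a sketch under that assumption, not a confirmed fix for this drive. The function name is hypothetical, and it relies on GNU cp's `-a` option, which keeps mode, ownership, timestamps, and symlinks.

```shell
# Copy the contents of SRC into DST, keeping mode, ownership, timestamps and
# symlinks (GNU cp -a). Run as root, otherwise ownership cannot be kept.
# $1 = source directory, $2 = destination directory (created if missing)
copy_tree_preserving() {
    src=$1
    dst=$2
    mkdir -p "$dst"
    # "$src/." copies the directory's contents, including dotfiles.
    cp -a "$src/." "$dst/"
}

# Example call (placeholder paths):
#   copy_tree_preserving /mnt/oldroot /mnt/usb/root-backup
```

The destination file system matters as well: a USB stick formatted as FAT or exFAT cannot store Unix ownership or mode bits, so even an archive copy loses them there.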