Impactcore: Linux notes for a Systems Administrator: Rescue a CentOS 7 system with a deleted /boot directory

HOW TO: The /boot directory on your CentOS 7 system is missing or deleted, and the machine can't boot. Imagine this happening on a real production system. It is unlikely, but it is always good to know how to recover from such a disastrous failure. Even the most resilient systems can have storage failures. Here's an article on silent corruption: http://perspectives.mvdirona.com/2012/02/observations-on-errors-corrections-trust-of-dependent-systems/

This scenario is based on several tests I've performed on KVM-based virtual machines.

Boot the system with a rescue DVD (or ISO for a VM).

At the CentOS 7 boot CD menu, choose "Troubleshooting" and then "Rescue a CentOS system". Next, choose "Continue" to allow the rescue environment to mount the machine's file systems under "/mnt/sysimage".

At the prompt you will be in a shell loaded by the boot CD.

sh-4.2# ...

Since we want to work directly with the broken system, we will chroot into the mounted file system.

# chroot /mnt/sysimage

Check the state of the boot directory:

# ls -la /boot

At this point, if the boot partition were corrupted, you could run parted, gdisk or fdisk to recreate the partition, or fsck to check the filesystem.

In my case /boot was fine, but empty.
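To tell "fine, but empty" apart from "missing" without reading a directory listing, a small helper can do the check. This is only a sketch: the function name dir_is_empty is my own, and it assumes you run it inside the chroot so that /boot refers to the broken system.

```shell
# dir_is_empty DIR
# Exit 0 when DIR exists and contains nothing (hidden files count),
# exit 1 when it has entries, exit 2 when DIR is not a directory.
dir_is_empty() {
    [ -d "$1" ] || return 2
    [ -z "$(ls -A "$1")" ]
}

# Example (inside the chroot):
#   dir_is_empty /boot && echo "/boot exists but is empty"
```

An exit status of 2 means the directory itself is gone, in which case it has to be recreated (or its partition remounted) before anything can be reinstalled into it.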

HOW TO FIX A MISSING KERNEL:

Now we need to re-install the kernel. However, the kernel version installed on the system is later than the one on the installation CD. There are several things we can do at this point, but I will outline two:

Install the older kernel from the CD, or
Start the network and install the latest kernel with yum.

NOTE: Once the kernel is reinstalled through rpm or yum, the installation triggers dracut, which regenerates the necessary initramfs files.

RE-INSTALL THE KERNEL FROM THE BOOT CD: (skip if you want to use yum and the network)

Mount the boot CD on the /run/install/repo directory:

# mount /dev/sr0 /run/install/repo

Then install the kernel package from the CD:

# rpm -ivh --force /run/install/repo/Packages/kernel-<...version and arch...>
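If you don't want to type the version and architecture by hand, a glob can find the kernel package on the media. The sketch below is an illustration only: list_kernel_rpms is a name I made up, and it assumes the core kernel package is the only file in Packages whose name starts with "kernel-" followed by a digit.

```shell
# list_kernel_rpms DIR
# Print the core kernel package files found in DIR, one per line.
# Requiring a digit right after "kernel-" skips kernel-tools,
# kernel-headers, kernel-devel and similar packages.
list_kernel_rpms() {
    for rpm_file in "$1"/kernel-[0-9]*.rpm; do
        # An unmatched glob stays as literal text, so check for a real file.
        if [ -f "$rpm_file" ]; then
            printf '%s\n' "$rpm_file"
        fi
    done
    return 0
}

# Example:
#   list_kernel_rpms /run/install/repo/Packages
```

Check that it prints exactly one file before passing that path to rpm -ivh --force.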

RE-INSTALL THE KERNEL FROM THE NETWORK: (skip if you re-installed the kernel from the CD already)

Luckily, the network configuration is sound, so we can simply start the network device and use yum to reinstall the kernel:

# service network start

Run a yum clean all just in case.

# yum clean all

Reinstall the kernel.

# yum reinstall kernel

RE-INSTALL GRUB:

Run ls -la /boot to verify the /boot directory; you should see the new kernel and its associated files listed. Most of the missing files and directories under /boot will have been recreated. One key piece that will still be missing is grub2.

# ls -la /boot
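Rather than checking the listing by eye, you can test for the two files a kernel needs in order to boot. This is a minimal sketch: missing_boot_files is a hypothetical helper, and it assumes the usual CentOS 7 names vmlinuz-VERSION and initramfs-VERSION.img, where VERSION is the version, release and architecture together (as printed by rpm -q kernel --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n').

```shell
# missing_boot_files BOOTDIR VERSION
# Print every expected boot file for VERSION that is absent from BOOTDIR.
# No output means both the kernel image and its initramfs are present.
missing_boot_files() {
    for boot_file in "$1/vmlinuz-$2" "$1/initramfs-$2.img"; do
        [ -e "$boot_file" ] || printf '%s\n' "$boot_file"
    done
}

# Example:
#   missing_boot_files /boot "$(uname -r)"
# Note: in the rescue shell uname -r reports the rescue media's kernel,
# so pass the installed version explicitly instead.
```

If the initramfs shows up as missing, dracut did not run or failed during the reinstall, and that should be looked into before rebooting.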

So we now need to reinstall grub2 and recreate its configuration. This process is fairly simple.

Install grub2 on the boot disk (the whole disk, not a partition). In my case, on a KVM guest, it's /dev/vda:

# grub2-install /dev/vda

If no errors were reported, you are ready to regenerate the grub configuration (otherwise you'll need to troubleshoot why you can't write to your device):

# grub2-mkconfig -o /etc/grub2.cfg

(The real configuration file lives under /boot, at /boot/grub2/grub.cfg, but /etc/grub2.cfg is a symlink to it and is easier to reference. If you are using UEFI, use /etc/grub2-efi.cfg instead, which is also a symlink: /etc/grub2-efi.cfg -> ../boot/efi/EFI/centos/grub.cfg)
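If you are unsure which of the two files applies, the presence of /sys/firmware/efi is a common way to tell whether the running system was booted through UEFI. The helper below is a sketch under that assumption. The name grub_cfg_path is mine, and the answer only reflects the installed system if the rescue media was booted in the same firmware mode.

```shell
# grub_cfg_path [FIRMWARE_DIR]
# Print the real grub2 configuration path for the current boot mode.
# FIRMWARE_DIR defaults to /sys/firmware/efi, which exists only when
# the running system was started through UEFI.
grub_cfg_path() {
    if [ -d "${1:-/sys/firmware/efi}" ]; then
        printf '%s\n' /boot/efi/EFI/centos/grub.cfg
    else
        printf '%s\n' /boot/grub2/grub.cfg
    fi
}

# Example:
#   grub2-mkconfig -o "$(grub_cfg_path)"
```

The optional argument exists only so the function can be exercised against a test directory; in normal use call it without arguments.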

Next, since you are in a chroot shell, you need to exit it before you can reboot:

# exit

# reboot

In theory your system should now boot just fine, but an SELinux relabel will have been triggered and may take some time to complete. Once it is done, your system will reboot automatically one more time.

If you had multiple kernels installed, but chose to fix this system by installing the base one from the CD, you can install your version again by running:

# yum reinstall kernel-

If you don't know which kernels you had previously installed, you can get the version from the rpm query command:

# rpm -q kernel
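The query prints full package names such as kernel-VERSION. To get bare version strings that can be compared with the file names under /boot, the prefix can be stripped. A small sketch, with kernel_versions being a name of my own choosing:

```shell
# kernel_versions
# Read "rpm -q kernel" output on stdin and print only the version part
# of each package name. Lines that are not kernel package names (for
# example "package kernel is not installed") are dropped.
kernel_versions() {
    sed -n 's/^kernel-\([0-9].*\)$/\1/p'
}

# Example:
#   rpm -q kernel | kernel_versions
```

Each printed version should have a matching vmlinuz and initramfs file under /boot once the corresponding kernel has been reinstalled.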

There you are...

-----

There are other steps we could have taken to restore or install a kernel, but in general they are quite similar. The main difference would be where to get the kernel RPM from. Since the version originally installed on the system can be different from the ones available on media or through yum, it may sometimes be necessary to download a specific kernel and install it manually.

One could even re-compile the kernel, but it's probably not such a great idea if we are working on a production server. The main problem is that it would require downloading all the required sources and headers, as well as compilation tools. Due to security concerns, it would be best not to install compilation tools on a production server, as they could be used to gain elevated privileges in the event of a limited intrusion.

Wednesday, September 13, 2017