Disaster Recovery Testing — 1

Unlike with proprietary operating systems, moving hard disks from one Linux system to another usually is seamless and presents few disruptions. Nonetheless, I wanted to test some disaster recovery scenarios with the home LAN.

The most critical system is the office desktop that also is the home LAN file server and hosts all virtual machines.

I have a decent backup strategy. The first layer of backups is hourly backups to a second internal disk. I test those backups several times each week, sometimes to compare changes made to files and sometimes to recover from fat-finger mistakes.

The second backup layer includes cloned disks of that system that are synced every three days.

One possible disaster is losing one of the internal disks. I annually test this scenario. The test is straightforward — shut down the office desktop and replace the disks. In a real disaster with the second internal disk I would lose three days of backups but not lose data files because the data are stored on the first disk. If the first disk failed then I could restore files from the second internal disk backups and lose only data changes made in the most recent hour.

The next disaster recovery test is losing the office desktop, including both internal disks. I wanted to test that disaster.

I have two older computers that could be used as temporary relief. At one time both systems were active in the home LAN. Both systems now are nominal test systems, mostly collecting dust in the office.

One system is an Asus M3N78-EM motherboard with a GeForce 8300 GPU, AMD 5050e 2.6 GHz Dual Core CPU, and 8 GB of DDR2 RAM. The other system is an ASRock N68C-GS4 FX motherboard with a GeForce 7025 GPU, AMD BE-2400 2.3 GHz Dual Core CPU, and 4 GB of DDR2 RAM.

I installed the clone disks into the Asus M3N78-EM clunker.

Oops. First replace the BIOS CMOS battery.

I booted into runlevel 1, deleted /etc/udev/rules.d/70-persistent-net.rules, and rebooted. The network controller initialized as expected.
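The cleanup step above could be scripted for the next test. This is a minimal sketch, assuming Python 3 is available on the cloned system. The function name, the root parameter, and the dry-run switch are hypothetical conveniences and not part of the original procedure. Only the rules file path comes from the text above.

```python
from pathlib import Path

# Path taken from the post, written relative to the filesystem root.
RULES_FILE = "etc/udev/rules.d/70-persistent-net.rules"


def remove_persistent_net_rules(root="/", dry_run=True):
    """Remove the persistent net rules file found under `root`.

    Returns True if the file exists (it is deleted only when dry_run
    is False) and False if there is nothing to remove.
    """
    target = Path(root) / RULES_FILE
    if not target.is_file():
        return False
    if dry_run:
        print(f"would remove {target}")
    else:
        target.unlink()
        print(f"removed {target}")
    return True


if __name__ == "__main__":
    # Dry run by default so nothing is deleted by accident.
    remove_persistent_net_rules("/", dry_run=True)
```

Passing a different root, such as the mount point of a clone disk attached to another machine, would allow the cleanup before the first boot. That usage is an assumption and was not part of this test.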

The kernel did not toggle into framebuffer mode. I found a remnant block list file in /etc/modprobe.d for the nouveau driver. Normally the file is harmless because these days I avoid motherboards with Nvidia video chips, which is why I had not noticed it. I deleted the file and rebooted.

I was greeted with a useless boot message:

Spectre V2 : Spectre mitigation: LFENCE not serializing, switching to generic retpoline

Rebuilding the initrd did not help. Adding nospectre_v2 to the boot configuration stopped the message. Why did I see this message with an AMD CPU? Digging around the web indicates certain AMD CPUs do use the lfence instruction.
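To see what the kernel reports after changing the boot configuration, the status can be read from sysfs and the running command line from /proc. This is a minimal sketch, assuming a kernel recent enough to expose the vulnerabilities directory in sysfs. The function names and the overridable path parameters are hypothetical, added for illustration and testing.

```python
from pathlib import Path


def spectre_v2_status(sysfs_root="/sys"):
    """Return the kernel's Spectre V2 status line, or None if unavailable."""
    path = Path(sysfs_root) / "devices/system/cpu/vulnerabilities/spectre_v2"
    try:
        return path.read_text().strip()
    except OSError:
        # Older kernels do not provide this file.
        return None


def has_boot_param(param, cmdline_path="/proc/cmdline"):
    """Return True if `param` appears as a word on the kernel command line."""
    try:
        words = Path(cmdline_path).read_text().split()
    except OSError:
        return False
    return param in words


if __name__ == "__main__":
    print("spectre_v2:", spectre_v2_status())
    print("nospectre_v2 set:", has_boot_param("nospectre_v2"))
```

Note that nospectre_v2 disables the mitigation rather than only silencing the message, so the status line is worth a glance on any system exposed to untrusted code.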

There was no /dev/sr0 device node for the optical disk and no /dev/fd0 for the floppy disk.

The optical drive manual eject button functioned as expected indicating available power. The optical drive is IDE/PATA. I wondered if that contributed to the mystery.

The test system does not have a floppy disk drive but has a motherboard floppy disk controller. So many years have elapsed since I last used a floppy disk drive that I pondered how the system is supposed to behave.

The lack of both device nodes is not a show stopper for a temporary replacement system. Curious, I connected an empty IDE/PATA hard disk and booted with a live USB stick. Both device nodes appeared, and they appeared again after rebooting with the clone disks. Curious.
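A check for the expected device nodes could be added to a post-boot checklist. This is a minimal sketch. The function name and the dev_root parameter are hypothetical, and the device names are simply the two mentioned above.

```python
import os
import stat


def missing_block_devices(names, dev_root="/dev"):
    """Return the names under dev_root that are absent or not block devices."""
    missing = []
    for name in names:
        path = os.path.join(dev_root, name)
        try:
            mode = os.stat(path).st_mode
        except OSError:
            # No such node, or it cannot be examined.
            missing.append(name)
            continue
        if not stat.S_ISBLK(mode):
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_block_devices(["sr0", "fd0"])
    print("missing:", ", ".join(absent) if absent else "none")
```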

Thereafter I was greeted with another useless message:

blk_update_request: I/O error, dev fd0, sector 0

Adding blacklist floppy had no effect. Using a sledge hammer and disabling the drive in the BIOS resolved the nuisance.
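Why the block list entry had no effect was not investigated. One possibility, and only a guess, is that the floppy driver is compiled into the kernel rather than built as a module, in which case a modprobe block list entry does not apply. The sketch below checks the modules.builtin file that kernel packages normally install under /lib/modules. The function name is hypothetical.

```python
import os
import platform


def is_builtin(module, builtin_file=None):
    """Return True if `module` is listed in modules.builtin, False if not,
    and None if the file cannot be read."""
    if builtin_file is None:
        # Standard location for the running kernel's built-in module list.
        builtin_file = os.path.join(
            "/lib/modules", platform.release(), "modules.builtin"
        )
    wanted = module.replace("-", "_")
    try:
        with open(builtin_file) as fh:
            for line in fh:
                # Entries look like kernel/drivers/block/floppy.ko
                name = os.path.basename(line.strip())
                if name.endswith(".ko"):
                    name = name[:-3]
                if name.replace("-", "_") == wanted:
                    return True
    except OSError:
        return None
    return False


if __name__ == "__main__":
    print("floppy built in:", is_builtin("floppy"))
```

If the result is False, the driver is a module, and another explanation would be needed, such as the module being loaded from the initrd before the block list is read.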

I repeated the tests with the ASRock with the same results and remedies.

One thing I did not test is the pair of digital TV capture cards. I am not pulling apart the office desktop to test. One card is PCI and the other is PCIe. In a temporary emergency I could get by with only one card. At this stage I presume one or both cards will function as expected.

Some nominal tasks would remain in an actual emergency, such as updating Conky configurations.

Both systems are notably slower than the office desktop. With only dual core CPUs, running virtual machines would be slow too.

The big caveat is I really hate that chicken Nvidia.

One short-term option to avoid proprietary video drivers is using the living room media player. That system has a Biostar NM70I-1037U motherboard with on-board Intel video, an Intel Celeron 1037U 1.8 GHz Dual Core CPU, and 4 GB of DDR3 RAM. Of course that would mean no living room video streaming for the duration.

A decent test. I should be able to limp through several days while replacing a failed office desktop. I should amend my disaster recovery plan to include researching replacement hardware at least annually rather than wait until an actual full office desktop failure.

Posted: Category: Usability Tagged: General
