Disaster Recovery Testing — 5

The living room media player was next in the ongoing disaster recovery testing.

This system streams media files stored on the LAN file server. Once upon a time the device supported a web browser for online streaming and web surfing, but that was some years ago when the system was a full home theater PC system. These days the system is a media player only and is treated as an appliance.

Life in the house would continue without serious disruption should the media player fail. Nominally inconvenient perhaps, but not a life stopper.

The mainboard model is no longer available for replacement. A full failure means finding new hardware. Two old systems sitting in the office corner could suffice temporarily, albeit with one notably irritating caveat.

Both test systems have Nvidia video controllers. I really hate that chicken using Nvidia.

For smooth video performance — more or less required in a media player — the proprietary drivers would have to be installed. The cheeks of my backside tighten with the thought.

Fortunately there is another spare computer, with Intel video, an HP dc7900.

There are two disaster possibilities to test. One is a mainboard failure with a surviving hard disk and the other is a disk failure.

Testing a surviving disk is straightforward. In a real mainboard failure the surviving hard disk would be moved to a new system. Not wanting to dismantle the media player, I decided to simulate both a hard disk and a mainboard failure at once by rebuilding a spare disk in spare hardware.

Unlike the office desktop, there are no concurrent clone disks. The replacement disk must be built from scratch from backups.

To rebuild a failed disk on any home LAN system, several files containing disk partition information are maintained. There are two text files, one from fdisk -l and the other from lsblk -f. A third partition layout file is a binary sgdisk dump. Along with those files are an image copy of the first 512 bytes of the disk and a copy of the GPT BIOS boot partition. One way or another recreating the same partition layout on a new disk should be straightforward.
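Collecting those five files could look something like the sketch below. This is not the actual script used here. The helper names run, capture and save_partition_info, the output file names, and the guess that the BIOS boot partition is partition 1 are all assumptions for illustration. By default the sketch only prints the commands; set DRY_RUN=0 and run as root to execute them.

```shell
#!/bin/sh
# Sketch only: DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# capture FILE CMD...: save the text output of a command to FILE.
capture() {
    file=$1; shift
    if [ "$DRY_RUN" = 1 ]; then printf '%s > %s\n' "$*" "$file"; else "$@" > "$file"; fi
}

# save_partition_info DISK OUTDIR [BIOS_BOOT_PARTITION_NUMBER]
# The partition number defaults to 1, which is an assumption.
# The "$disk$bios_part" naming fits /dev/sdX style names, not NVMe names.
save_partition_info() {
    disk=$1; out=$2; bios_part=${3:-1}
    run mkdir -p "$out"
    capture "$out/fdisk.txt" fdisk -l "$disk"            # text layout
    capture "$out/lsblk.txt" lsblk -f "$disk"            # file systems and UUIDs
    run sgdisk --backup="$out/sgdisk.bin" "$disk"        # binary GPT dump
    run dd if="$disk" of="$out/first-512.img" bs=512 count=1
    run dd if="$disk$bios_part" of="$out/bios-boot.img" bs=1M
}

# /dev/sdX and the output directory are placeholders.
save_partition_info /dev/sdX /tmp/partition-info
```

Keeping both the text files and the binary dump means the layout can be recreated by hand if the dump ever fails to load.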

The spare test disk was not a like-for-like model; it was a larger disk.

The first step is restoring the sgdisk dump. The dump file is created using sgdisk --backup=$file /dev/sdX. Restoring the partition layout is performed with sgdisk -g --load-backup=$file /dev/sdX.

Executing the command restored the partition layout just like the original disk. The remainder of the disk was unused space. So far so good.
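A small wrapper around the restore might look like the following sketch. The function name restore_layout and the file paths are hypothetical. The --print and --verify calls are an addition of mine to eyeball the result, not part of the original test. As written the sketch only prints the commands; set DRY_RUN=0 to run them as root against a disk that can safely be overwritten.

```shell
#!/bin/sh
# Sketch only: DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# restore_layout DUMPFILE DISK
restore_layout() {
    dump=$1; disk=$2
    # Same restore command as in the text. This overwrites the partition table.
    run sgdisk -g --load-backup="$dump" "$disk"
    # Show the restored table and let sgdisk check it for problems.
    run sgdisk --print "$disk"
    run sgdisk --verify "$disk"
}

# The dump path and /dev/sdX are placeholders.
restore_layout /tmp/partition-info/sgdisk.bin /dev/sdX
```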

The sgdisk dump file only restores the partition layout. Next is formatting the partitions and assigning the original UUID to each partition. The lsblk -f partition layout file provides that information.
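The formatting step can be sketched as below. The file system types are an assumption, since the post does not say which ones the media player uses, and the UUIDs and device names are made up. The helper format_cmd is hypothetical and only prints the command to run, so each one can be reviewed before being pasted into a root shell.

```shell
#!/bin/sh
# Sketch only: prints the format commands rather than running them.

# format_cmd FSTYPE UUID DEVICE
# FSTYPE and UUID come from the saved lsblk -f listing.
format_cmd() {
    case "$1" in
        ext4) echo "mkfs.ext4 -U $2 $3" ;;
        swap) echo "mkswap -U $2 $3" ;;
        # FAT volume IDs are 8 hex digits; lsblk shows them with a hyphen.
        vfat) echo "mkfs.vfat -i $(echo "$2" | tr -d '-') $3" ;;
        *)    echo "unsupported filesystem type: $1" >&2; return 1 ;;
    esac
}

# Hypothetical rows (device, type, UUID) transcribed by hand from lsblk.txt.
while read -r dev fstype uuid; do
    format_cmd "$fstype" "$uuid" "$dev"
done <<'EOF'
/dev/sdX2 ext4 11111111-2222-3333-4444-555555555555
/dev/sdX3 swap 66666666-7777-8888-9999-aaaaaaaaaaaa
EOF
```

Reusing the original UUIDs means the restored /etc/fstab and boot loader configuration should not need editing, provided they refer to partitions by UUID.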

The media player is backed up to the office desktop backup disk. The backups are stored in a single partition. For redundancy that partition is copied to the weekly backup disks. Those backup files are one big file system.

Because the backup files are stored in a single backup partition under a single directory tree, copying them to the appropriate partitions is a manual, although straightforward, process.
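One way to do that copy is sketched below, assuming the backup tree mirrors the original root directory layout. The function name restore_files, the staging directory, the backup path and the partition-to-mount-point pairs are all hypothetical. The idea is to mount the new partitions in their original hierarchy under a staging directory and then copy the whole tree once with rsync. The sketch prints the commands by default; set DRY_RUN=0 to run them as root.

```shell
#!/bin/sh
# Sketch only: DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# restore_files BACKUP_TREE STAGING_DIR DEVICE:MOUNTPOINT...
# List the pairs shallowest first so that / is mounted before /home.
restore_files() {
    src=$1; target=$2; shift 2
    for pair in "$@"; do
        dev=${pair%%:*}     # text before the first colon
        mnt=${pair#*:}      # text after the first colon
        run mkdir -p "$target$mnt"
        run mount "$dev" "$target$mnt"
    done
    # -a keeps permissions and times, -H hard links, -A ACLs, -X extended
    # attributes; --numeric-ids avoids remapping owners by name.
    run rsync -aHAX --numeric-ids "$src/" "$target/"
}

# All paths and devices below are placeholders.
restore_files /mnt/backup/mediaplayer /mnt/restore /dev/sdX2:/ /dev/sdX3:/home
```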

Testing this restoration went well. The restored system ran fine on the spare HP dc7900 with the spare hard disk.

These restoration steps could be scripted.

As with the office desktop, prudence calls for researching replacement hardware sooner rather than waiting until an actual failure.

