Disaster Recovery Testing — 9

Disaster recovery testing continues with rebuilding both office desktop hard disks.

A real disaster arrives unexpectedly and unplanned. For this test, I instead began by running the regular weekly backup and updating the clone disk backups, which gave me snapshots of the office desktop that were as recent as possible. In a real disaster the clone disks would be unavailable, and restoring from the weekly backups alone would lose up to one week of data and file changes. Because this was a test, updating everything first made it easier to compare the final restoration.

I skipped simulating the disaster itself and built the replacement disks in the office system's removable disk bays.

Using the sfdisk dump files, I recreated the partition layouts on two 1 TB disks and then formatted the partitions. With additional dump files I restored the disk boot partitions.
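Before writing a saved layout to a replacement disk, it can be worth confirming that the layout fits. The following is a minimal sketch, not the procedure used here: it parses text in the format produced by `sfdisk -d` and checks that no partition ends past the target disk. The sample dump, the 512-byte sector size, and the function names are all assumptions made for illustration.

```python
import re

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors

# Matches partition lines such as:
#   /dev/sda1 : start=        2048, size=     1048576, type=83, bootable
PART_RE = re.compile(
    r"^(?P<dev>\S+)\s*:\s*start=\s*(?P<start>\d+),\s*size=\s*(?P<size>\d+)"
)


def parse_sfdisk_dump(text):
    """Return a list of (device, start_sector, size_sectors) tuples."""
    parts = []
    for line in text.splitlines():
        m = PART_RE.match(line.strip())
        if m:
            parts.append((m["dev"], int(m["start"]), int(m["size"])))
    return parts


def layout_fits(parts, disk_bytes, sector_bytes=SECTOR_BYTES):
    """True if no partition ends beyond the end of the target disk."""
    last_sector = max((start + size for _, start, size in parts), default=0)
    return last_sector * sector_bytes <= disk_bytes


# Hypothetical dump text, invented for this example.
SAMPLE_DUMP = """\
label: dos
device: /dev/sda
unit: sectors

/dev/sda1 : start=        2048, size=     1048576, type=83, bootable
/dev/sda2 : start=     1050624, size=  1952473088, type=83
"""

parts = parse_sfdisk_dump(SAMPLE_DUMP)
for dev, start, size in parts:
    print(dev, start, size)
# 1_000_204_886_016 bytes is a common capacity for a nominal 1 TB disk.
print("fits:", layout_fits(parts, 1_000_204_886_016))
```

The dump itself is normally created with `sfdisk -d DEVICE > FILE` and written back with `sfdisk DEVICE < FILE`. The check above only reads the text file and never touches a disk.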

After restoring files from the most recent weekly backup, I rebooted into runlevel 1. The disks booted, but with some fstab-related mount errors.

The cause was missing directories and files. To conserve disk space, some directories and files are intentionally excluded from the weekly backups, so I expected a few small holes from this part of the backup strategy. While conserving disk space is reasonable, I reconsidered the decision because of the extra work needed to restore those files and because some files were missing outright. I updated the exclusion lists so that every directory is still created, but the files inside the excluded directories are not backed up.
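The policy of keeping every directory while skipping the files under excluded paths can be sketched as follows. This illustrates the idea only. The backup tool and the format of the exclusion lists are not stated here, so the function `plan_backup`, its arguments, and the sample paths are hypothetical.

```python
import os
import tempfile


def plan_backup(root, excluded_dirs):
    """Walk root and return (dirs, files) as sorted relative paths.

    Every directory is kept, so mount points and expected paths exist
    after a restore. Files under an excluded directory are skipped.
    """
    excluded = [os.path.normpath(d) for d in excluded_dirs]
    dirs, files = [], []
    for current, _subdirs, names in os.walk(root):
        rel = os.path.relpath(current, root)
        if rel != ".":
            dirs.append(rel)
        skip_files = any(
            rel == e or rel.startswith(e + os.sep) for e in excluded
        )
        if not skip_files:
            files.extend(os.path.normpath(os.path.join(rel, n)) for n in names)
    return sorted(dirs), sorted(files)


# Small throwaway tree to demonstrate the behavior.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "var", "cache", "app"))
    os.makedirs(os.path.join(root, "etc"))
    open(os.path.join(root, "etc", "fstab"), "w").close()
    open(os.path.join(root, "var", "cache", "app", "blob.bin"), "w").close()

    dirs, files = plan_backup(root, [os.path.join("var", "cache")])
    print("directories:", dirs)
    print("files:", files)
```

In this example the `var/cache` tree is still listed as directories to create, but `blob.bin` is left out of the file list.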

The only dependable way to test restoring from the weekly backups was to perform another weekly backup with the updated exclusion lists and restore the files again. I updated my notes and the recovery checklist. Not wanting to disrupt the normal weekly backup cycle, I waited a week before testing again.

With the updated exclusion lists the results improved notably and there were no related boot errors.
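A quick check before rebooting can catch this class of error. The sketch below, offered as an assumption-laden illustration rather than part of the actual procedure, reads fstab text and reports mount points that do not exist as directories under a restored root. The function name and the sample fstab are hypothetical, and escaped characters in mount points (such as `\040` for a space) are not handled.

```python
import os
import tempfile


def missing_mount_points(fstab_text, restored_root):
    """Return fstab mount points that are not directories under restored_root."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line, ignore in this sketch
        mount_point = fields[1]
        if not mount_point.startswith("/"):
            continue  # e.g. "none" or "swap" for swap entries
        target = os.path.join(restored_root, mount_point.lstrip("/"))
        if not os.path.isdir(target):
            missing.append(mount_point)
    return missing


# Hypothetical fstab content, invented for this example.
SAMPLE_FSTAB = """\
# device     mount point   type  options   dump pass
/dev/sda1    /             ext4  defaults  1    1
/dev/sda2    /home         ext4  defaults  1    2
/dev/sdb1    /mnt/backup   ext4  noauto    0    0
/dev/sda3    swap          swap  defaults  0    0
"""

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "home"))  # /mnt/backup is deliberately absent
    result = missing_mount_points(SAMPLE_FSTAB, root)
    print("missing mount points:", result)
```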

With the updated weekly backup I compared the restored files on the rebuilt disks to the clone backup disks. There were no surprises.
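One way to make such a comparison repeatable is to hash every regular file in both trees and report the differences. This is a minimal sketch under the assumption that both trees are mounted and readable. It is not necessarily how the comparison was done here, and the function names are made up for the example.

```python
import hashlib
import os
import tempfile


def tree_digest(root):
    """Map each regular file's relative path to its SHA-256 hex digest."""
    digests = {}
    for current, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(current, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device nodes, and so on
            h = hashlib.sha256()
            with open(path, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def compare_trees(restored_root, clone_root):
    """Return (missing, extra, changed) relative to the clone tree."""
    restored = tree_digest(restored_root)
    clone = tree_digest(clone_root)
    missing = sorted(set(clone) - set(restored))
    extra = sorted(set(restored) - set(clone))
    changed = sorted(
        p for p in set(clone) & set(restored) if clone[p] != restored[p]
    )
    return missing, extra, changed


# Two throwaway trees stand in for the rebuilt disk and the clone disk.
with tempfile.TemporaryDirectory() as restored, \
        tempfile.TemporaryDirectory() as clone:
    for root, text in ((restored, "old"), (clone, "new")):
        with open(os.path.join(root, "notes.txt"), "w") as fh:
            fh.write(text)
    with open(os.path.join(clone, "only_on_clone.txt"), "w") as fh:
        fh.write("x")

    missing, extra, changed = compare_trees(restored, clone)
    print("missing:", missing)
    print("extra:", extra)
    print("changed:", changed)
```

Files that are intentionally excluded from the weekly backups would show up as missing in such a report, so the exclusion lists need to be taken into account when reading the output.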

I did not test further. In a real-world recovery, the rebuilt disks would be in use for several days or weeks before any further problems were noticed. During such a period I presume there would be some additional breakage, although probably not much.

A full recovery would also require restoring all remaining excluded files from external sources.

Although rebuilding the disks takes a couple of hours, overall I am pleased with how the exercise unfolded.

Posted: Category: Usability Tagged: General
