Disaster Recovery Testing — Lessons Learned

Testing several disaster recovery scenarios produced some lessons learned.

One deficiency was the lack of a formal disaster recovery plan or checklist. Although the skills and knowledge to rebuild disks and restore files from backups were sufficient, a plan or checklist reduces stress and helps avoid unforeseen and forgotten speed bumps. The basic idea of a disaster recovery plan is to restore operations and production as efficiently and quickly as practical. The original disaster recovery notes have now been updated into a usable checklist.

The backup strategy proved dependable and robust. Some holes in the backup file exclusion lists were fixed, but the overall results were pleasing.

A potential gap in the backup strategy is the lack of off-site backups.

One of the original restoration presumptions was that certain files could be restored from sources other than backups, such as local mirror repositories or video DVDs. While that presumption is reasonable with respect to conserving backup disk space, the time required to restore many of those external files is significant. Avoiding such lengthy restoration times is preferable.

Restoring from the clone backup disks means a loss of up to three days of changes. Restoring from weekly backups means a loss of up to one or two weeks. At one time a loss of three days was considered acceptable, but needs and workflows change over the years. Because of the increased volume of daily changes, that potential loss is now questionable.

Retaining a two-year history of backups is nice. Currently each weekly backup disk is 3 TB and stores one year of backups. The two disks are alternated to create the two-year history.

As the number of files grows and accumulates, so does the disk space required for backups. Nothing new there, but maintaining a two-year history likely will require larger disks now that certain files and directories are being removed from the exclusion lists. If certain external files such as DVD ISOs are also removed from the exclusion lists, then either larger disks are required or the two-year history must be shortened.
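To put rough numbers on the capacity question, the arithmetic can be sketched in Python. This is a minimal sketch under a simplifying assumption not stated above: each weekly backup adds roughly the same amount of data, so the 3 TB disk holding one year implies an average weekly increment. The 25 percent growth figure and the helper names weeks_covered and disk_needed are hypothetical, chosen only for illustration.

```python
# Rough capacity check for the alternating weekly backup disks.
# Assumption (hypothetical): every weekly backup adds about the same
# amount of data, so coverage = disk capacity / weekly increment.

TB = 1000 ** 4  # decimal terabyte, as drive vendors label capacity
GB = 1000 ** 3


def weeks_covered(disk_bytes: int, weekly_bytes: int) -> float:
    """Number of weekly backups that fit on one disk."""
    return disk_bytes / weekly_bytes


def disk_needed(weekly_bytes: int, weeks: int = 52) -> int:
    """Disk size needed to hold `weeks` weekly backups."""
    return weekly_bytes * weeks


# Today: one 3 TB disk holds about one year (52 weekly backups),
# so the implied average weekly increment is about 57.7 GB.
current_weekly = 3 * TB // 52
print(f"implied weekly increment: {current_weekly / GB:.1f} GB")

# Hypothetical: removing items from the exclusion lists makes each
# weekly backup 25 percent larger.
larger_weekly = current_weekly * 5 // 4
print(f"weeks per 3 TB disk: {weeks_covered(3 * TB, larger_weekly):.1f}")
print(f"disk needed for 52 weeks: {disk_needed(larger_weekly) / TB:.2f} TB")
```

Under those assumptions, a 25 percent larger weekly increment fits only about 41.6 weeks on a 3 TB disk, so keeping a full year per disk would need roughly 3.75 TB. Real weekly increments vary, so measured backup sizes should replace the assumed figures.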

System performance could be improved by replacing the office desktop's internal mechanical disks with SSDs. From a disaster recovery perspective, the clone backup disks would remain spinners, and the current internal spinners would become shelf spares or a second set of backup disks that could be rotated off site. While 2 TB and 1 TB SSDs are a tad expensive, this idea seems like a prudent disaster recovery step.

Some time ago, notes were written about building a dedicated backup server. That approach would allow near real-time backups of all files and minimize losses of file changes. Storing backups off the network and rotating off-site backups would still be needed. On the other hand, maintaining yet another computer with multiple disks, plus the additional cost of electricity, is not tempting.

Revamping the backup strategy and disaster recovery plans could become expensive.

A notable lesson learned is that life was less complex two decades ago, with only one computer and no network in the house. Storing so much information in digital format creates many challenges and considerations.

Posted: September 21, 2021 Category: Usability Tagged: General