Monitoring Disk Usage
Hard disk capacities these days are large compared to the early days of computers. My first hard disk, around 1987 or so, was a whopping 10 MB housed in an external case the size of a shoe box. Except for certain use cases, many people never fill disks to capacity and pay little attention to disk storage space. Typically I am no different, although I tinker enough with computers to be aware of disk usage.
I noticed I had lost about 30 GB of space on the house network file server backup partition /dev/sdb1. Within another day that total increased to about 40 GB.
The backup strategy includes a set of cloned disks. Those disks are updated about every three days using a custom shell script based on rsync. The sudden disk changes happened within a day of the most recent clone disk backup, so I could compare the affected partition between the two disks.
I spent significant time comparing files and directories looking for a root cause. No matter how I inspected and compared files I found nothing obvious.
I noticed a coincidental loss of data in the hourly rsnapshot backups. Coincidence does not mean causation, and I never confirmed anything, but I suspected the hourly rsnapshot rotations had become corrupted, leaving a mess of broken hard links. Perhaps something peculiar happened, such as inodes being marked as used.
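Why broken hard links would waste space is easy to demonstrate. rsnapshot rotations keep unchanged files as multiple names sharing one inode; if a link breaks and degrades into an independent copy, the same data suddenly occupies space twice. A small illustration in a throwaway directory (not involving rsnapshot itself):

```shell
#!/bin/sh
# Demonstrates the space cost of a broken hard link using two fake
# snapshot directories in a temp location.
DIR=$(mktemp -d)
mkdir "$DIR/hourly.0" "$DIR/hourly.1"
echo "payload" > "$DIR/hourly.0/file"

# A healthy rotation hard-links the unchanged file: two names, one inode.
ln "$DIR/hourly.0/file" "$DIR/hourly.1/file"
LINKED=$(stat -c %h "$DIR/hourly.0/file")   # hard-link count is now 2

# A broken link that degraded into an independent copy doubles the
# space the file consumes.
rm "$DIR/hourly.1/file"
cp "$DIR/hourly.0/file" "$DIR/hourly.1/file"
BROKEN=$(stat -c %h "$DIR/hourly.0/file")   # hard-link count is back to 1

echo "linked rotation: $LINKED names on one inode"
echo "broken rotation: $BROKEN name per inode"
rm -rf "$DIR"
```

Scaled across every file in an hourly rotation, that doubling would quietly eat tens of gigabytes, which is consistent with the symptom described here.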
The loss of 40 GB of disk space should have been easy to find, but that was not the case.
In response I wrote a shell script to monitor partition space, something to help beyond general daily observations. There are many disk and file monitoring utilities, but often I prefer and enjoy rolling my own remedy with a shell script. Often specialized utilities are designed to cover many use cases and can be overly complex. A shell script comfortably covers a single use case.
The script is not fancy. Through an hourly cron job the script sends the output of df -hTl | grep ^/dev/ | egrep -v "tmpfs|/mnt/|/media" | sort -V to a log file. Using the diff command the script compares that output to the previous recorded output. Any difference greater than 2% is reported in an email.
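The comparison step might look something like the sketch below. The file names are placeholders, the synthetic snapshots stand in for two hourly runs of the df pipeline, the reading of "2%" as two percentage points of the Use% column is my assumption, and mail delivery is replaced by a printed report.

```shell
#!/bin/sh
# Sketch of the hourly comparison. PREV holds the snapshot recorded
# by the previous cron run, CURR the snapshot from the current run.
PREV=$(mktemp)
CURR=$(mktemp)

# Synthetic snapshots (filesystem and Use%) standing in for the real
# output of: df -hTl | grep ^/dev/ | egrep -v "tmpfs|/mnt/|/media" | sort -V
printf '/dev/sda1 41%%\n/dev/sdb1 63%%\n' > "$PREV"
printf '/dev/sda1 41%%\n/dev/sdb1 68%%\n' > "$CURR"

# Join the snapshots on the filesystem name and report any partition
# whose usage moved by more than 2 points. A real script would pipe
# this report to a mail command instead of echoing it.
REPORT=$(join "$PREV" "$CURR" | awk '{
    old = $2 + 0; new = $3 + 0
    diff = new - old; if (diff < 0) diff = -diff
    if (diff > 2) printf "%s changed from %s to %s\n", $1, $2, $3
}')
echo "$REPORT"
rm -f "$PREV" "$CURR"
```

Sorting the df output with sort -V keeps the device names in a stable order between runs, which is what lets a line-oriented comparison like this, or the diff the article describes, work at all.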
There are no nuisance emails unless I create large temporary files, and even then the emails arrive only hourly. I do not mind a few extra emails because the lost 40 GB of drive space likely will remain a mystery, and I want to avoid repeating another such event. The goal is to investigate promptly and, with luck, pinpoint the root cause should a similar disk glitch happen.