Preserving Hard Links

Recently I took an interesting journey.

At work we use three different rack servers as backup servers. Each server is physically located in a different building, several miles apart. We were going to update one of the backup servers by replacing it with a rack server that had dual power supplies and a larger RAID 5 disk capacity.

There was no rush or emergency with the replacement, but I asked that we treat the replacement as a disaster recovery exercise. That would provide us with much-needed information for our knowledge base.

The premise of the exercise was that we wanted to restore as much as possible of the previous server. In this case the “disaster” was not fully devastating because we still had access to the original server.

Embracing the exercise in this manner revealed much. I wrote some shell scripts to help automate the restoration process. All went well until I tried to restore the rsnapshot rotations from the old server to the new.

There were about 360 GB of rsnapshot files on the old server. Having experienced many times how slow rsync is with an initial copy because nothing exists on the destination server, I used cp -a to copy the files from the old server to the new. The two machines were in the same building and connected with a 1 Gbps switch.

As the directories were being copied I noticed that each snapshot directory from the old server was taking an abnormally long time to copy for a 1 Gbps connection. I was seeing 10/100 Mbps speeds rather than 1 Gbps speeds.

I searched the web and discovered similar complaints about copying hundreds of thousands or millions of small files, including on Windows and Macs. I accepted the basic conclusion that such an exercise was slow.

I should have listened to my instincts that something was not quite right.

By the end of the weekend the new larger disk capacity was full. The rsnapshot files were using about 2.1 TB of space rather than about 360 GB, roughly 6 times the disk space used on the old server.

Immediately I suspected the snapshots were copied as files rather than hard links.

Digging deeper I learned about the -i option of ls, which prints the inode number of each file. Files that are hard links to one another share the same inode number, and the copied snapshots did not. This confirmed my suspicion. Next I crawled deeper into the rabbit hole to discover what had happened.

I searched the web. A clue here. A clue there. Nothing concrete, but years of using Linux-based systems bubbled up an idea.
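
As a minimal sketch of that check, the following creates a scratch directory (a temporary location made up for this illustration, not anything from the real servers) with one file, one hard link to it, and one ordinary copy. The hard link should show the same inode number as the original, while the copy should show a different one.

```shell
# Work in a throwaway directory so nothing real is touched.
demo=$(mktemp -d)

echo "snapshot data" > "$demo/original"
ln "$demo/original" "$demo/hardlink"   # second name for the same inode
cp "$demo/original" "$demo/copy"       # independent file with its own inode

# ls -i prints the inode number in front of each file name.
ls -i "$demo"

# Capture the inode numbers so they can be compared in a script.
inode_original=$(ls -i "$demo/original" | awk '{print $1}')
inode_hardlink=$(ls -i "$demo/hardlink" | awk '{print $1}')
inode_copy=$(ls -i "$demo/copy" | awk '{print $1}')

if [ "$inode_original" = "$inode_hardlink" ]; then
    echo "original and hardlink share an inode"
fi
if [ "$inode_original" != "$inode_copy" ]; then
    echo "copy has its own inode"
fi
```

Running the same ls -i comparison on the same file in two adjacent rsnapshot rotations, on both the old server and the new, is one way to see whether the links survived the copy.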

The answer to the grand mystery was that the hard links could not be seen across the network share. Across the share, the client system saw only ordinary files on the server. The hard links could be seen only locally. From the client side, copying the hard links across the share was never going to succeed.

I had been copying the snapshots by pulling the files from the old server to the new.

To preserve the hard links using cp -a, the files had to be pushed from the old system to the new rather than pulled from the new system. That way the copy ran on the old server, where the files were local and the hard links were visible.

I logged into the old server, mounted the new server rsnapshot storage directory, and then ran cp -a to push the files to the new server. This worked beautifully. Faster too because hard links were being copied rather than full files as with the pull copy.
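
To illustrate, here is a small sketch of cp -a carrying a hard link across when the source is a local file system. It assumes GNU cp, as found on most Linux systems. The directory names (old, new, daily.0, daily.1) and the file name are invented for the demonstration, and both trees live in a temporary directory rather than on real servers.

```shell
# Build a tiny stand-in for an rsnapshot tree in a throwaway directory.
work=$(mktemp -d)
mkdir -p "$work/old/daily.0" "$work/old/daily.1"

# An unchanged file appears in both rotations as one hard-linked inode.
echo "unchanged file" > "$work/old/daily.0/report.txt"
ln "$work/old/daily.0/report.txt" "$work/old/daily.1/report.txt"

# Copy the whole snapshot root in a single cp -a run. With GNU cp,
# -a implies preserving links among the files being copied.
cp -a "$work/old" "$work/new"

# Compare inode numbers of the two names in the destination tree.
new_inode_0=$(ls -i "$work/new/daily.0/report.txt" | awk '{print $1}')
new_inode_1=$(ls -i "$work/new/daily.1/report.txt" | awk '{print $1}')

if [ "$new_inode_0" = "$new_inode_1" ]; then
    echo "hard link preserved in the copy"
else
    echo "hard link was broken: files were duplicated"
fi

# du counts a hard-linked file once, so the two totals should be close.
du -sh "$work/old" "$work/new"
```

On the real machines the equivalent would be a single cp -a from the local rsnapshot directory on the old server to the mounted directory of the new one, followed by the same inode comparison on the destination.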

Posted: Category: Tutorial, Usability Tagged: General
