Detecting NFS Clients

For many years I have used a “safe-shutdown” script. One of the reasons I have not adopted a systemd distro at home is my reliance on this script. I have not found a palatable way to support this script in systemd. The systemd-inhibit command does not help.

One of my safe-shutdown checks is whether NFS clients are connected. If any are, the safe-shutdown script aborts. My reason for detecting NFS clients is simple: NFS is evil when the client cannot find the NFS server. The client system will hang during any kind of file-related task.

My safe-shutdown script has worked well for some years, but by odd circumstance I recently noticed a client system not being detected. The system running the NFS server powered down despite the connection. Although connected as an NFS client, the system was idle with respect to using that connection. From the NFS server perspective, the client was seen as not being connected.

My safe-shutdown script relies on the netstat command to detect clients. The common methods suggested online for detecting connected NFS clients are the netstat and showmount commands. My recent circumstance revealed that NFS clients will appear not connected when the client system is idle. That is, netstat will not show any ESTABLISHED connections. From the NFS server perspective, there are no connected clients.

The showmount -a command parses the /var/lib/nfs/rmtab file. The showmount and rpc.mountd man pages indicate this command option and file are unreliable. This is a strange design choice. Why can’t the NFS server automatically scrub the rmtab file when clients unmount cleanly?

The unreliability of the rmtab file can be verified by noticing no change in the file contents, even after a long period. After several days a significant number of stale entries are present in the rmtab file.

When a client unmounts cleanly, the netstat command immediately shows that the client is disconnected. The netstat command seems to be the more sensible choice for detecting NFS clients, except when a client is idle. Being idle does not mean the client is disconnected or that the exported shares were unmounted.

With idle clients the rmtab file can help only a little by showing what clients have been connected.

This idleness can be observed using the netstat command. Wait five minutes after connecting a client with no network file activity. The netstat command will show no connection. On the client, running the df command or accessing a file on the NFS shares will refresh the NFS server connections. The NFS server will again show an ESTABLISHED connection.

Resolving this corner case challenge required changes to my safe shutdown procedure.

One, I added a simple one-liner in my system cleanup script that is run on shutdown:

cat /dev/null > /var/lib/nfs/rmtab

This does not eliminate the possibility of stale entries during a current session, but is a good start at trimming the log file of long-term stale connections.

I added additional checks in my safe-shutdown script. I run the showmount -a command:

showmount -a | awk -F ':' '{print $1}' | grep '^[0-9]' | sort | uniq

The result might contain stale entries but at least provides a list of recently connected clients.
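The same filter can be wrapped in a small function so it is easy to try against canned input. This is only a sketch: the function name list_rmtab_clients is made up for illustration, and sort -u stands in for the sort | uniq pair.

```shell
# Hypothetical helper: reads `showmount -a` style output on stdin and
# prints each distinct client address (lines starting with a digit) once.
list_rmtab_clients() {
    awk -F ':' '{print $1}' | grep '^[0-9]' | sort -u
}

# Typical use on the NFS server (assumes showmount is installed):
#   showmount -a | list_rmtab_clients
```

The header line that showmount prints is dropped because it does not start with a digit, which is the same trick the one-liner relies on.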

I run a single ping to each IP address from that list. If the ping exit code is zero then the ping was successful. The client system is still online. Is the system connected? I run netstat:

netstat -an | grep "${LOCAL_IP_ADDRESS}:2049" | grep ESTABLISHED | awk '{print $5}' | grep "$client"

If the test returns no ESTABLISHED connections then this client is online but seemingly not connected. The remaining question is whether the client is truly disconnected or merely idle.
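The ping and netstat checks above could be combined roughly as follows. This is a sketch under a few assumptions: the function names are invented for illustration, the -W timeout flag is the one from the Linux iputils ping, and the netstat columns are assumed to be in the usual Linux order (local address in column 4, foreign address in column 5, state in column 6). Unlike the plain grep, the awk comparison matches the client address exactly, so 192.168.1.1 is not confused with 192.168.1.10.

```shell
# Succeeds if a single ping to the client ($1) is answered.
# -c 1 sends one echo request; -W 2 waits at most two seconds (iputils ping).
client_is_online() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Reads `netstat -an` output on stdin. Succeeds if client $2 holds an
# ESTABLISHED connection to port 2049 on the server address $1.
client_has_nfs_connection() {
    awk -v local="$1:2049" -v client="$2" '
        $4 == local && $6 == "ESTABLISHED" {
            addr = $5
            sub(/:[0-9]+$/, "", addr)   # drop the client port
            if (addr == client) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Prints one of: offline, connected, idle-or-unmounted.
# $1 = server IP address, $2 = client IP address.
classify_client() {
    if ! client_is_online "$2"; then
        echo offline
    elif netstat -an | client_has_nfs_connection "$1" "$2"; then
        echo connected
    else
        echo idle-or-unmounted
    fi
}
```

A call such as classify_client "$LOCAL_IP_ADDRESS" "$client" leaves only the third answer ambiguous, which is exactly the case that still needs resolving.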

An NFS client cron job running every three minutes would refresh the connections. For example, running the df command or piping the contents of a connected directory to /dev/null.
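Such a cron job might look like the following crontab entry on the client. This is a sketch: it assumes a cron that accepts the */3 step syntax (Vixie cron and cronie do) and that df lives at /bin/df. The output is discarded so cron does not mail it.

```
# Run df every three minutes to keep the NFS connection from going idle.
*/3 * * * * /bin/df >/dev/null 2>&1
```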

As I use SSH keys throughout the home LAN, I could have my safe-shutdown script SSH into the client system and check for NFS connections.

The Linux kernel TCP keepalive attributes are not an option. A common reference on the topic is the TCP Keepalive HOWTO. The kernel attributes are intended for TCP/IP connections and not NFS, even when NFS is used over TCP. I had no success testing these attributes with an idle NFS client connection.

The default idle timeout for the NFS connection is five minutes (300 seconds). This timeout is not related to the NFS mount option timeo. I found no configuration or mount option for changing this default.

My options for detecting an idle NFS client seem limited. I added the following one-liner in my safe-shutdown script when the first netstat command returns empty:

ssh root@${client} /bin/df

Thereafter I again run the netstat command.

Running the df command on the NFS client refreshes the connections and causes the NFS server to correctly display an established connection.
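The check, poke, and recheck sequence could be sketched like this. The function names are made up for illustration, LOCAL_IP_ADDRESS is assumed to hold the server address as in the earlier netstat one-liner, and the two ssh options (BatchMode and ConnectTimeout) are additions so that the script cannot stall at a password prompt or on an unreachable host.

```shell
# Succeeds if netstat shows an ESTABLISHED NFS connection from client $1.
# Assumes LOCAL_IP_ADDRESS is set to the server address.
nfs_established() {
    netstat -an | grep -F "${LOCAL_IP_ADDRESS}:2049 " | grep ESTABLISHED |
        awk '{print $5}' | grep -q -F "$1:"
}

# Succeeds if client $1 still appears to have the shares mounted.
client_still_mounted() {
    nfs_established "$1" && return 0
    # Nothing seen: run df on the client to wake an idle mount, then look again.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" /bin/df >/dev/null 2>&1 || true
    nfs_established "$1"
}
```

The safe-shutdown script would then abort whenever client_still_mounted "$client" succeeds for any client on the list.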

A client side cron job running df every three minutes is another option.

Perhaps this monkey-wrenching could be avoided if /var/lib/nfs/rmtab were kept up to date and reliable, or if there were a way to override the default five-minute timeout.

Posted: Category: Usability Tagged: General
