Finalizing Globus transfers with rsync

Globus is a useful tool for transferring data between computer systems. It's claimed to be faster than rsync and is recommended by Compute Canada.

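Unfortunately, Globus doesn't preserve permissions or copy symlinks. However, once the transfer completes, it's simple to tidy things up with rsync. Since we can trust the file contents but not the metadata, the trick is to use --size-only. By default, rsync decides whether a file needs updating by comparing its size and modification time. Because those times may not survive the Globus transfer (as in the example below), plain rsync would treat every file as changed and run its delta-transfer algorithm on it, reading every file in full on both systems. With --size-only, rsync assumes that if the source and destination files have the same size, they also have the same data, so it never even looks at their contents; --archive still brings their permissions and times into line.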
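To make the difference concrete, here is a rough model of the two decisions in bash. It is a sketch, not rsync's actual code: the function name would_skip is hypothetical, it assumes GNU stat (as used in the examples below), and it ignores details such as --modify-window. It reports whether a pair of files would be skipped by the default quick check (size and modification time) or by --size-only (size alone).

```bash
#!/usr/bin/env bash
# Hypothetical helper modelling rsync's "quick check" for one pair of files.
# Usage: would_skip SRC_FILE DST_FILE [--size-only]
# Returns 0 if rsync would consider DST_FILE up to date, 1 otherwise.
# Assumes GNU coreutils stat (-c %s = size in bytes, -c %Y = mtime in seconds).
would_skip() {
    local src=$1 dst=$2 mode=${3:-}

    # Both modes require the sizes to match.
    if [[ "$(stat -c %s "$src")" != "$(stat -c %s "$dst")" ]]; then
        return 1
    fi

    # --size-only stops here: same size is taken to mean same data.
    if [[ $mode == --size-only ]]; then
        return 0
    fi

    # Default quick check: the modification times must match as well.
    [[ "$(stat -c %Y "$src")" == "$(stat -c %Y "$dst")" ]]
}
```

With two same-sized copies whose timestamps differ, `would_skip a b` fails while `would_skip a b --size-only` succeeds, which mirrors why plain `rsync --archive` would re-examine every file Globus delivered. Note that --size-only would also skip a same-sized file with different contents; that is exactly the trust in the copied data the trick relies on.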

Once you've transferred data from src on the remote system R using Globus, you might use the command

rsync --archive --size-only R:src/ dst

to make sure that the local copy in dst matches the original. The trailing slash on src/ matters: it tells rsync to copy the contents of src into dst rather than creating dst/src. It's a good idea to first run the command with -vni (--verbose, --dry-run, --itemize-changes) to check that the paths are correct; otherwise, you might end up repeating the whole transfer.
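If you do this often, the dry-run-then-apply routine can be wrapped in a small bash function. This is a sketch under assumptions: finalize_globus is a hypothetical name, and it simply asks for confirmation after showing the dry run rather than trying to interpret the output itself.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: preview with a dry run, then apply only on confirmation.
# Usage: finalize_globus SRC DST    e.g. finalize_globus R:src/ dst
finalize_globus() {
    local src=$1 dst=$2 answer

    # -v verbose, -n dry run, -i itemize changes: show what would happen.
    rsync -vni --archive --size-only "$src" "$dst" || return

    # Lines full of "+" mean items would be created from scratch: check the paths.
    read -r -p "Apply these changes? [y/N] " answer
    if [[ $answer != [yY] ]]; then
        echo "Aborted; nothing was changed." >&2
        return 1
    fi

    # The real run: update metadata, add missing items such as symlinks,
    # and send any file whose size differs.
    rsync --archive --size-only "$src" "$dst"
}
```

For the example that follows, the call would be `finalize_globus cedar:globus_test/ recv/globus_test`. Answering anything other than y leaves the destination untouched.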

Example

Consider the following source files in ~/globus_test on cedar:

$ stat -c '%A %s %N' globus_test/*
-rwxr-x--- 16 'globus_test/regular_file'
lrwxrwxrwx 18 'globus_test/symbolic_link' -> '/this/goes/nowhere'
$ cat globus_test/regular_file
This is a file.

After transferring them to ~/recv on graham via Globus, we have the following:

$ stat -c '%A %s %N' recv/globus_test/*
-rw-r--r-- 16 'recv/globus_test/regular_file'
$ cat recv/globus_test/regular_file
This is a file.

The contents of regular_file have been copied, but its permissions are wrong, and symbolic_link is missing entirely.

From graham, we can then sync the missing bits without transferring the data all over again. If we specify the wrong destination path, the dry run looks like this:

$ rsync -vni --archive --size-only cedar:globus_test/ globus_test
receiving incremental file list
created directory globus_test
cd+++++++++ ./
>f+++++++++ regular_file
cL+++++++++ symbolic_link -> /this/goes/nowhere

sent 33 bytes  received 142 bytes  38.89 bytes/sec
total size is 34  speedup is 0.19 (DRY RUN)

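All the "+" signs indicate that every item would be created from scratch, and the ">f" in front of regular_file means its contents would be sent again: a sure sign that the destination path is wrong.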
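Rather than eyeballing the itemized list, you could filter it. The sketch below (check_dry_run is a hypothetical name) reads `rsync -i` output on standard input and complains if any regular file would have its contents transferred. With --size-only after a complete Globus transfer, that should only happen if the paths are wrong or some file arrived with the wrong size. It relies on the itemize format described in the rsync man page, where the first character is < or > for a file being sent or received and the second is f for a regular file.

```bash
#!/usr/bin/env bash
# Hypothetical filter for `rsync -i` output read from stdin.
# Returns 1 and lists the offending lines if any regular file would be
# transferred, i.e. its itemize code starts with ">f" (receive) or "<f" (send).
check_dry_run() {
    local resent
    resent=$(grep -E '^[<>]f' || true)
    if [[ -n $resent ]]; then
        echo "WARNING: these files would be transferred again:" >&2
        printf '%s\n' "$resent" >&2
        return 1
    fi
    echo "OK: no file contents would be transferred."
}
```

Piping the mistaken dry run above into it, as in `rsync -vni --archive --size-only cedar:globus_test/ globus_test | check_dry_run`, would flag regular_file, while the corrected command passes.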

Here's how it looks once we fix the destination path:

$ rsync -vni --archive --size-only cedar:globus_test/ recv/globus_test
receiving incremental file list
.d..tp..... ./
.f..tp..... regular_file
cL+++++++++ symbolic_link -> /this/goes/nowhere

sent 33 bytes  received 138 bytes  48.86 bytes/sec
total size is 34  speedup is 0.20 (DRY RUN)

The symlink is new, but for everything else, rsync is only going to set the modification times and permissions (the t and p in the itemized output).
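The eleven-character codes follow the YXcstpoguax layout described under --itemize-changes in the rsync man page. As a rough aid to reading them, here is a partial decoder in bash; decode_itemize is a hypothetical name, and it only reports the size, time, permission, owner and group flags, ignoring the other positions.

```bash
#!/usr/bin/env bash
# Hypothetical partial decoder for rsync itemize codes (YXcstpoguax).
# Usage: decode_itemize '.f..tp.....'    prints "file: time perms"
decode_itemize() {
    local code=$1 kind
    local -a changes=()

    # Second character (X): the type of item.
    case ${code:1:1} in
        f) kind=file ;;
        d) kind=directory ;;
        L) kind=symlink ;;
        D) kind=device ;;
        S) kind=special ;;
        *) kind=unknown ;;
    esac

    # Every attribute position set to "+" means the item is being created.
    if [[ ${code:2} == "+++++++++" ]]; then
        echo "$kind: new"
        return
    fi

    # Attribute positions: s=size, t/T=time, p=perms, o=owner, g=group.
    [[ ${code:3:1} == s ]] && changes+=(size)
    [[ ${code:4:1} == [tT] ]] && changes+=(time)
    [[ ${code:5:1} == p ]] && changes+=(perms)
    [[ ${code:6:1} == o ]] && changes+=(owner)
    [[ ${code:7:1} == g ]] && changes+=(group)

    echo "$kind: ${changes[*]:-no listed changes}"
}
```

Applied to the corrected dry run, `.f..tp.....` decodes to a file needing new times and permissions, and `cL+++++++++` to a new symlink, matching the reading above.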

After we run it for real without -vni, we find that the local copy is now the same as the original:

$ rsync --archive --size-only cedar:globus_test/ recv/globus_test
$ stat -c '%A %s %N' recv/globus_test/*
-rwxr-x--- 16 'recv/globus_test/regular_file'
lrwxrwxrwx 18 'recv/globus_test/symbolic_link' -> '/this/goes/nowhere'
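For a check that doesn't rely on rsync's own bookkeeping, you could compare listings of both trees. The sketch below assumes GNU find, for -printf; list_tree is a hypothetical helper that prints each entry's mode, size (skipped for directories, whose reported size can vary between file systems), relative path and symlink target, sorted so that two listings can be diffed. It does not compare modification times or ownership.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a sorted, diffable listing of a directory tree.
# Each line: mode, size ("-" for directories), relative path, symlink target.
# Requires GNU find for -printf.
list_tree() {
    (
        cd "$1" || exit 1
        find . \( -type d -printf '%M - %P\n' \) -o -printf '%M %s %P %l\n' |
            LC_ALL=C sort
    )
}
```

Comparing the two systems could then look like `diff <(ssh cedar "$(declare -f list_tree); list_tree globus_test") <(list_tree recv/globus_test)`, assuming bash is the login shell on cedar. No output means the modes, sizes, names and link targets all match.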