Finalizing Globus transfers with rsync
2021-03-20
Globus is a useful tool for transferring data between computer systems. It's claimed to be faster than rsync, and is recommended by Compute Canada.
Unfortunately, Globus doesn't preserve permissions or copy symlinks.
However, once the transfer completes, it's simple to tidy things up with rsync.
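Since we can trust the file contents, but not the metadata, the trick is to use `--size-only`.
By default, rsync treats a file as up to date only if both its size and its modification time match. This flag tells rsync to compare sizes alone: if the source and destination files have the same size, it assumes they also have the same data, so nothing is transferred again even if the timestamps differ, and only the metadata is updated.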
Once you've transferred data from `src` on the remote system `R` using Globus, you might use the command

```
rsync --archive --size-only R:src/ dst
```

to make sure that the local copy in `dst` matches the original.
It's not a bad idea to first run it with `-vni` (`--verbose`, `--dry-run`, `--itemize-changes`) to verify that you have the paths correct; otherwise, you might end up duplicating the transfer.
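The dry-run check can also be automated. The following is a minimal sketch, not part of the original workflow: the function name `count_data_transfers` is made up for illustration, and it relies on the itemized format in which a line starting with `>f` or `<f` marks a regular file whose data would be received or sent.

```shell
# Count the regular files whose data rsync would transfer, reading the
# output of `rsync --dry-run --itemize-changes` from standard input.
# First character: `>` (receive) or `<` (send) means file data moves.
# Second character: `f` means a regular file.
# (count_data_transfers is a hypothetical helper name.)
count_data_transfers() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so mask the exit status to keep the function usable under `set -e`.
    grep -c '^[<>]f' || true
}

# Possible usage, with R, src, and dst as in the command above:
#   n=$(rsync -ni --archive --size-only R:src/ dst | count_data_transfers)
#   [ "$n" -eq 0 ] || echo "$n file(s) would be copied again; check the paths"
```

A count of zero suggests that only metadata and symlinks are left to fix, which is what we want after a completed Globus transfer.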
Example
Consider the following source files in `~/globus_test` on `cedar`:

```
$ stat -c '%A %s %N' globus_test/*
-rwxr-x--- 16 'globus_test/regular_file'
lrwxrwxrwx 18 'globus_test/symbolic_link' -> '/this/goes/nowhere'
$ cat globus_test/regular_file
This is a file.
```
After transferring them to `~/recv` on `graham` via Globus, we have the following:

```
$ stat -c '%A %s %N' recv/globus_test/*
-rw-r--r-- 16 'recv/globus_test/regular_file'
$ cat recv/globus_test/regular_file
This is a file.
```
The contents of `regular_file` have been copied, but its permissions are wrong, and `symbolic_link` is missing entirely.
From `graham`, we can then sync the missing bits without transferring the data all over again.
If we happen to specify an incorrect path, this is what our attempt would look like:
```
$ rsync -vni --archive --size-only cedar:globus_test/ globus_test
receiving incremental file list
created directory globus_test
cd+++++++++ ./
>f+++++++++ regular_file
cL+++++++++ symbolic_link -> /this/goes/nowhere

sent 33 bytes received 142 bytes 38.89 bytes/sec
total size is 34 speedup is 0.19 (DRY RUN)
```
All the `+` characters indicate that those items would be newly created, and the `>f` in front of `regular_file` means its data would be transferred again: a sure sign that something is not right.
Here's how it looks once we fix the destination path:
```
$ rsync -vni --archive --size-only cedar:globus_test/ recv/globus_test
receiving incremental file list
.d..tp..... ./
.f..tp..... regular_file
cL+++++++++ symbolic_link -> /this/goes/nowhere

sent 33 bytes received 138 bytes 48.86 bytes/sec
total size is 34 speedup is 0.20 (DRY RUN)
```
The symlink is new, but for everything else, it's only going to set the times and permissions.
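For reading these codes at a glance, here is a rough sketch of a decoder. The name `explain_item` is hypothetical, and it covers only a few of the attribute letters (`s`, `t`, `p`, `o`, `g`) rather than rsync's full itemized format.

```shell
# Describe one --itemize-changes code such as ".f..tp.....".
# The first two characters are the update type and the file type;
# the remaining ones flag which attributes differ.
# (explain_item is a hypothetical helper; it is not a complete parser.)
explain_item() {
    code=$1
    attrs=${code#??}    # drop the update-type and file-type characters
    case $attrs in
        +++++++++) echo "create"; return 0 ;;
    esac
    out=""
    case $attrs in *s*)    out="$out size" ;; esac
    case $attrs in *[tT]*) out="$out time" ;; esac
    case $attrs in *p*)    out="$out permissions" ;; esac
    case $attrs in *o*)    out="$out owner" ;; esac
    case $attrs in *g*)    out="$out group" ;; esac
    echo "update:${out:- nothing}"
}

explain_item '.f..tp.....'    # prints "update: time permissions"
explain_item 'cL+++++++++'    # prints "create"
```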
After we run it for real without `-vni`, we find that the local copy is now the same as the original:

```
$ rsync --archive --size-only cedar:globus_test/ recv/globus_test
$ stat -c '%A %s %N' recv/globus_test/*
-rwxr-x--- 16 'recv/globus_test/regular_file'
lrwxrwxrwx 18 'recv/globus_test/symbolic_link' -> '/this/goes/nowhere'
```
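As a final sanity check, one option is to repeat the dry run with `-ni` but without `-v`: in that mode rsync lists only items it would change, so an empty listing suggests nothing is left to do. The sketch below wraps that idea in a hypothetical helper, `nothing_left_to_sync`. It assumes rsync was able to apply every attribute; ownership that the local user cannot set, for example, might keep showing up.

```shell
# Succeed when the itemized dry-run output on standard input is empty,
# i.e. rsync reported no remaining changes.
# (nothing_left_to_sync is a hypothetical helper name.)
nothing_left_to_sync() {
    # grep -q . succeeds if any non-empty line exists; invert that.
    ! grep -q .
}

# Possible usage on graham, with the paths from the example above:
#   rsync -ni --archive --size-only cedar:globus_test/ recv/globus_test \
#       | nothing_left_to_sync && echo "in sync"
```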