Given the comments and questions I've received since last week's column on using rsync to move a file system from one disk to another, I thought we might run through some simple exercises to gain more insight into how rsync works. Two problems I have seen people run into when using rsync to copy collections of files from one place to another are 1) failing to retain symbolic and hard links and 2) misaligning the source and destination directories so that the copies don't end up where intended. Here we'll create a directory with a number of files, copy it with an assortment of rsync (and other) commands, take a closer look at some of rsync's command options and then use rsync to verify that our replicated directory is intact.
First, to create a sample directory to be copied, we're going to use a smattering of common Unix commands, taking care to include a variety of file types.
testDir="rsyncTest"
mkdir $testDir
touch $testDir/emptyfile
cp /etc/motd $testDir/textfile
mkdir $testDir/dir
mknod $testDir/devfile c 11 11
ln -s /etc/hosts $testDir/symlink
ln $testDir/textfile $testDir/link2textfile
In the commands above, we've created a small directory containing some regular files (one empty, one containing text), a directory, a special device file, a symbolic link and a hard link. A long listing displays the directory's contents. Notice that I've included the -i option so that I can see the inode used for each of the files.
> ls -li $testDir
total 8
7865105 crw-r--r-- 1 root other 11, 11 Jan 22 12:01 devfile
7865103 drwxr-xr-x 2 root other 512 Jan 22 12:01 dir
7865089 -rw-r--r-- 1 root other 0 Jan 22 12:01 emptyfile
7865090 -rw-r--r-- 2 root other 233 Jan 22 12:01 link2textfile
7865107 lrwxrwxrwx 1 root other 10 Jan 22 12:01 symlink -> /etc/hosts
7865090 -rw-r--r-- 2 root other 233 Jan 22 12:01 textfile
Now, let's copy this directory to another location using an rsync command. Notice we are using the -av options. The -a selects archive mode (recursive, preserving symbolic links, permissions, times, ownership and device files) and the -v is verbose. I'm omitting the -z (compress) option since it isn't going to add value when copying a simple directory. Compression will probably only slow down local copies and is likely of value only when a lot of data will be sent over a network. In some tests I ran, the copy took approximately five times longer when the -z option was used for a local copy. You can shave off a little more time by omitting the -v (verbose) option as well. I generally prefer the reassurance of some feedback when I enter rsync commands by hand, but often omit the -v in scripts.
> rsync -av $testDir /tmp
We're copying our test directory to /tmp in this example. The new directory, /tmp/rsyncTest, should end up looking virtually the same as the original with one exception. Let's examine the results of the copy and see what I mean by this.
> ls -li /tmp/rsyncTest
total 64
5268817 crw-r--r-- 1 root other 11, 11 Jan 22 12:01 devfile
4361895 drwxr-xr-x 2 root other 117 Jan 22 12:01 dir
464873865 -rw-r--r-- 1 root other 0 Jan 22 12:01 emptyfile
5468049 -rw-r--r-- 1 root other 233 Jan 22 12:01 link2textfile
8108616 lrwxrwxrwx 1 root other 10 Jan 22 12:02 symlink -> /etc/hosts
5388824 -rw-r--r-- 1 root other 233 Jan 22 12:01 textfile
The only thing we've lost in this copy is the hard link we created in our test directory; link2textfile is now a separate file. We can tell this by looking at the files' inode numbers, which are no longer identical, and at their link counts, which have dropped from 2 to 1.
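The inode comparison we just did by eye can also be scripted. What follows is only a minimal sketch, assuming a POSIX-style shell with `ls -i`, `awk` and `mktemp -d` available. The helper name `same_inode`, the scratch directory and the file `copyoftextfile` are made up for illustration and are not part of the exercise above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if both paths share one inode number.
# Only meaningful when both paths live on the same file system.
same_inode() {
    a=$(ls -di "$1" | awk '{print $1}')
    b=$(ls -di "$2" | awk '{print $1}')
    [ "$a" = "$b" ]
}

# Scratch directory (assumed) so the demo leaves nothing behind.
demo=$(mktemp -d)
echo "some text" > "$demo/textfile"
ln "$demo/textfile" "$demo/link2textfile"     # hard link: shares the inode
cp "$demo/textfile" "$demo/copyoftextfile"    # plain copy: gets its own inode

if same_inode "$demo/textfile" "$demo/link2textfile"; then
    linked=yes
else
    linked=no
fi
if same_inode "$demo/textfile" "$demo/copyoftextfile"; then
    copied=yes
else
    copied=no
fi

echo "link2textfile hard linked to textfile: $linked"
echo "copyoftextfile hard linked to textfile: $copied"
rm -rf "$demo"
```

Pointing the same helper at /tmp/rsyncTest/textfile and /tmp/rsyncTest/link2textfile after the copy above should report that the two names no longer share an inode.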
Now, let's try the same thing using a tar-to-tar:
> tar cvpBf - rsyncTest | (cd /tmp; tar xBf - )
a rsyncTest/ 0K
a rsyncTest/emptyfile 0K
a rsyncTest/textfile 1K
a rsyncTest/dir/ 0K
a rsyncTest/devfile 0K
a rsyncTest/symlink symbolic link to /etc/hosts
a rsyncTest/link2textfile link to rsyncTest/textfile
In this case, we can see just from the output of the tar command that the hard link remains a hard link. To get this same behavior out of rsync, we can add the --hard-links (or -H) option to the command line. In the example below, we run through the first copy exercise again with this option.
> rsync -av --hard-links $testDir /tmp
building file list ... done
devfile
dir/
emptyfile
symlink -> /etc/hosts
textfile
link2textfile => textfile
sent 407 bytes received 60 bytes 934.00 bytes/sec
total size is 2147484123 speedup is 4598467.07
Examining the resultant directory, we can see that we now have a set of hard linked files:
> ls -li /tmp/rsyncTest
total 64
5268817 crw-r--r-- 1 root other 11, 11 Jan 22 12:01 devfile
4361895 drwxr-xr-x 2 root other 117 Jan 22 12:01 dir
464873865 -rw-r--r-- 1 root other 0 Jan 22 12:01 emptyfile
5388824 -rw-r--r-- 2 root other 233 Jan 22 12:01 link2textfile
8108616 lrwxrwxrwx 1 root other 10 Jan 22 12:02 symlink -> /etc/hosts
5388824 -rw-r--r-- 2 root other 233 Jan 22 12:01 textfile
If we rsync the directory at a later time and omit the --hard-links option, the hard links will remain as long as we haven't changed the file in the source directory. And here's one of the beauties of rsync: if we forget to use --hard-links in an original copy and then run a second rsync command that includes this option, rsync will pick up on the difference and make the correction. Since all of the other files will already be in sync, it will just force the one file to be a hard link, as it is in the original directory.
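That self-correcting behavior is easy to try out on a throwaway directory. The script below is a sketch under a few assumptions: rsync is on the PATH, `mktemp -d` is available, and the scratch names `src` and `dst` and the helper `inode` are invented for the demonstration. It copies once without --hard-links, then again with it, and compares inode numbers after each pass.

```shell
#!/bin/sh
# Sketch only: needs rsync on the PATH; stop quietly if it is missing.
command -v rsync >/dev/null 2>&1 || { echo "rsync not found; skipping"; exit 0; }

work=$(mktemp -d)                  # scratch area (assumed name)
mkdir "$work/src" "$work/dst"
echo "some text" > "$work/src/textfile"
ln "$work/src/textfile" "$work/src/link2textfile"

# Hypothetical helper: print the inode number of a file.
inode() { ls -i "$1" | awk '{print $1}'; }

# Pass 1, no --hard-links: the two names arrive as separate files.
# The trailing slash on src/ copies its contents straight into dst/.
rsync -a "$work/src/" "$work/dst/"
if [ "$(inode "$work/dst/textfile")" = "$(inode "$work/dst/link2textfile")" ]; then
    before=linked
else
    before=separate
fi

# Pass 2, with --hard-links: rsync re-creates the link in the copy.
rsync -a --hard-links "$work/src/" "$work/dst/"
if [ "$(inode "$work/dst/textfile")" = "$(inode "$work/dst/link2textfile")" ]; then
    after=linked
else
    after=separate
fi

echo "after plain -a:       $before"
echo "after --hard-links:   $after"
rm -rf "$work"
```

Nothing is transferred on the second pass because the file contents already match; only the link relationship is repaired.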
Now let's run a verification pass over our source and destination directories. You could add the --dry-run (-n) option to be certain that nothing gets modified, but this is not necessary here since there should be no differences.
> rsync -a --verbose --progress --stats --compress --hard-links \
> $testDir /tmp
building file list ... done
Number of files: 7
Number of files transferred: 0
Total file size: 2147484123 bytes
Total transferred file size: 0 bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 193
Total bytes sent: 205
Total bytes received: 20
sent 205 bytes received 20 bytes 450.00 bytes/sec
total size is 2147484123 speedup is 9544373.88
The verification step confirms that there are no differences between the original directory and the copy.
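For a check that can be dropped into a script, one option is to combine --dry-run with --itemize-changes, which lists only the items rsync would update. This is a sketch, not a definitive recipe: it assumes a reasonably recent rsync that supports --itemize-changes, plus `mktemp -d`, and the scratch names `src` and `dst` are illustrative.

```shell
#!/bin/sh
# Sketch only: needs rsync on the PATH; stop quietly if it is missing.
command -v rsync >/dev/null 2>&1 || { echo "rsync not found; skipping"; exit 0; }

work=$(mktemp -d)                  # scratch area (assumed name)
mkdir "$work/src" "$work/dst"
echo "some text" > "$work/src/textfile"
ln "$work/src/textfile" "$work/src/link2textfile"

# Real copy first. The trailing slash on src/ copies its contents into dst/.
rsync -a --hard-links "$work/src/" "$work/dst/"

# Dry run on an in-sync pair: nothing should be listed.
clean=$(rsync -a --hard-links --dry-run --itemize-changes "$work/src/" "$work/dst/")

# Change the source, then ask again: the pending update should be listed.
echo "one more line" >> "$work/src/textfile"
dirty=$(rsync -a --hard-links --dry-run --itemize-changes "$work/src/" "$work/dst/")

# A dry run must leave the destination untouched.
if grep -q "one more line" "$work/dst/textfile"; then
    untouched=no
else
    untouched=yes
fi

[ -z "$clean" ] && echo "before the change: in sync"
[ -n "$dirty" ] && echo "after the change: differences reported"
echo "destination untouched by dry runs: $untouched"
rm -rf "$work"
```

An empty listing means the two trees agree as far as rsync's usual size and timestamp comparison can tell. Adding --checksum makes rsync compare file contents as well, at the cost of reading every file.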
Rsync is easy to use but, considering the value of the files you are replicating, it's a good idea to practice your rsync commands with a small test directory before you trust them to do the right thing with a large and critical collection of files.
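One thing worth practicing on such a test directory is the second problem mentioned at the top of this column: how the source path is written decides where the files land. Without a trailing slash, rsync copies the directory itself into the destination; with one, it copies only the directory's contents. The sketch below assumes rsync and `mktemp -d` are available, and the destination names `a` and `b` are invented for the demonstration.

```shell
#!/bin/sh
# Sketch only: needs rsync on the PATH; stop quietly if it is missing.
command -v rsync >/dev/null 2>&1 || { echo "rsync not found; skipping"; exit 0; }

work=$(mktemp -d)                  # scratch area (assumed name)
mkdir "$work/rsyncTest" "$work/a" "$work/b"
echo "some text" > "$work/rsyncTest/textfile"

# No trailing slash: the directory itself is copied into a/.
rsync -a "$work/rsyncTest" "$work/a"

# Trailing slash: only the directory's contents are copied into b/.
rsync -a "$work/rsyncTest/" "$work/b"

# List the regular files that ended up under each destination.
layout=$(cd "$work" && find a b -type f | sort)
echo "$layout"
rm -rf "$work"
```

This should print a/rsyncTest/textfile followed by b/textfile. The copy to /tmp earlier in the column used the first form, which is why the files ended up in /tmp/rsyncTest rather than loose in /tmp.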