    gzip *.img

which has a fairly heavily tested compression algorithm that does error/sanity checking and is also tolerably fast. If you are extremely short of space you might try bzip2, which compresses rather better at the expense of taking far more CPU time to achieve it.
I strongly recommend compressing your data immediately after you've finished processing it, rather than waiting until the end of the data collection run: running gzip on a few thousand files takes a while. Only the frames need compressing. Data processing files are usually much smaller than images, so compressing them is not an obvious gain.
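One way to make compression a routine step after each dataset is a small shell function along these lines. This is only a sketch: the function name compress_frames and the .img suffix are assumptions for illustration, so adjust them to match how your frames are actually named.

```shell
# compress_frames DIR: gzip every uncompressed .img frame under DIR.
# Hypothetical helper; the name and the .img suffix are assumptions.
compress_frames() {
    dir=${1:-.}
    # -type f skips directories; "-exec ... {} +" hands many files to each
    # gzip call, which avoids "argument list too long" on large runs.
    find "$dir" -type f -name '*.img' -exec gzip {} +
}
```

Calling compress_frames /path/to/dataset replaces each frame.img with frame.img.gz and leaves the processing files alone. Running gzip -t on a compressed frame checks its integrity without unpacking it.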
Note that creating the DVDs takes a while: 20-40 minutes from typing the command to the first DVD appearing. The DVD backup machinery takes even longer when the network is slow or the machine is burning DVDs for several people. Only one DVDbackup job can run at a time on a given machine, but it appears to be possible to run DVDbackup jobs from several machines at once.
To make the most of this approach, collect multiple datasets within the same master directory, but do not let any one master directory hold too many Gbyte of data. For example, if you know you're going to collect 30-40 Gbyte of data, waiting for a single DVDbackup job to complete 10 DVDs at the end of the run will take several hours. It is better to split the data into 3 or 4 directories and start the DVDbackup on the previous one once you start writing to the next. This way the overhead of backing up the last few datasets at the end of the run is as small as possible.
Instructions for mounting hard drives on the cyber cafe machines are rather minimal. Basically, if you have a FAT32-formatted hard drive (the default for things like the external 250 Gbyte Maxtor drives) then you can simply do:
    mount /mocha/1394_data1

(replacing "mocha" with the name of the machine your hard drive is plugged into), then cd to this directory and use it as a hard drive. To unmount it, make sure you are not cd'd to any directory on the hard drive and then do:
    cd
    umount /mocha/1394_data1

If Linux says something like "disk is busy", one of your shells, somewhere, is still cd'd to a directory on the drive.
To do the actual backup you can simply "cp -r" from one directory to another, but this is wasteful if you go back and reprocess the data, because you then have to "cp" everything all over again. You could use the somewhat messy tar option:
    cd wherever/I/want/my/data
    (cd where/my/data/is ; tar czf - data) | tar xvzf -

in which case you need to read up on the usage of tar in order to figure out what I've just done. The method I prefer uses rsync in the following manner:
    cd wherever/I/want/my/data
    rsync -azv --delete /where/my/data/is/. .

The combination of trailing / and . is significant in rsync, so it pays to use the same form on both paths. Rsync has the advantage that in principle it will not transfer data that's already in the destination directory (it quickly compares file sizes and modification times). This check is not perfect, and sometimes seems to fall afoul of FAT32. The --delete option removes files from the destination (backup) directory that are no longer present in the source directory. This can be an advantage if you delete (or compress) a whole bunch of frames, but understand that the backup copies of anything you deleted will be lost if you use this option. Read the manual for rsync.
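Because --delete removes files from the backup, it can be worth previewing a sync before running it for real. The sketch below wraps rsync's dry-run flag (-n) in a function. The name preview_sync is made up for illustration, and the two directory arguments are whatever source and backup paths you use.

```shell
# preview_sync SRC DEST: show what rsync would copy or delete, changing nothing.
# Hypothetical helper name; SRC and DEST are directories you supply.
preview_sync() {
    src=$1
    dest=$2
    # -n (--dry-run) only reports the planned transfers and deletions.
    # The matching "/." on both paths copies the contents of SRC into DEST.
    rsync -azvn --delete "$src/." "$dest/."
}
```

Files that would be removed show up in the output as "deleting ..." lines. Once the list looks right, run the same rsync command without -n. On FAT32 drives, rsync's --modify-window=1 option treats timestamps that differ by up to one second as equal, which the rsync manual suggests for FAT's 2-second time resolution. Whether that cures the FAT32 trouble described above is untested here.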
Practical Realities: it's not always practical to FTP your frames back to Princeton. At APS, data collection rates are often far too high to do much more than make a vague attempt at keeping up with data collection, much less FTPing things. For unbinned frames, collecting one frame every 10 seconds, your average collection rate is 1.8 Mbyte/sec (one 18 Mbyte frame every 10 sec). The maximum FTP rate that I usually see is 300 kbyte/sec. To put this in perspective, if you collect an hour's worth of data at 1.8 Mbyte/sec, it will take you SIX hours to FTP it back. Exposure times are often a little longer than 10 seconds, but even assuming 20 second frames with 50% efficiency on the beamline, you are still generating data faster than the continuous FTP rate. 1.8 Mbyte/sec is also not a bad estimate of how much data you generate for binned frames at X29 using a 2 second/frame exposure time.
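The arithmetic behind these estimates is easy to repeat for your own frame size and network rate. The following is a minimal sketch using shell integer arithmetic. The function name transfer_seconds is invented for illustration, and it assumes 1 Mbyte = 1000 kbyte and a steady transfer rate.

```shell
# transfer_seconds MBYTES KBYTES_PER_SEC: whole seconds needed to move the
# data at a steady rate, taking 1 Mbyte = 1000 kbyte.
# Hypothetical helper; integer shell arithmetic, so results round down.
transfer_seconds() {
    echo $(( $1 * 1000 / $2 ))
}

# One hour of collection at 1.8 Mbyte/sec is 3600 * 18 / 10 = 6480 Mbyte.
collected_mb=$(( 3600 * 18 / 10 ))
secs=$(transfer_seconds "$collected_mb" 300)
echo "$collected_mb Mbyte at 300 kbyte/sec: $(( secs / 3600 )) hours"
```

This prints "6480 Mbyte at 300 kbyte/sec: 6 hours". Calling transfer_seconds 120000 300 gives 400000 seconds, which is a little over 111 hours.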
120 Gbyte of data will take 400,000 seconds at 300 kbyte/sec. That's 111 hours. Back at MSKCC the P.I. was particularly "optimistic" about the practicality of FTP'ing multiple gigabytes of data across the network. I suggest avoiding it at all costs. If you want to transfer processing files, you can just create compressed tar archives and email them to yourself (or someone else). They are rarely more than 1 Mbyte and so work just fine as attachments. Do:
    tar -cvzf my_proc_stuff.tgz proc99/.

and RTFM for tar if you're not clear how that line works.
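For anyone who would rather not RTFM straight away, here is a rough sketch that builds the archive and then lists it as a sanity check before you email it. The function name pack_and_list is made up for illustration. The tar flags are the standard ones.

```shell
# pack_and_list ARCHIVE DIR: build a gzip-compressed tar archive of DIR,
# then list its contents as a check. Hypothetical helper name.
pack_and_list() {
    archive=$1
    dir=$2
    # -c create, -z compress with gzip, -f write to the named file.
    tar -czf "$archive" "$dir" || return 1
    # -t lists the archive instead of extracting it; a listing that
    # completes without error is a quick sign the archive is readable.
    tar -tzf "$archive"
}
```

For example, pack_and_list my_proc_stuff.tgz proc99 writes the archive and prints the files inside it. The -v in the command above simply prints each file name as it is added. To unpack at the other end, use tar -xzf my_proc_stuff.tgz.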
At CHESS and APS, you can FTP directly to Princeton. At BNL you have to go through some sort of proxy setup (via ftpgw.bnl.local), which is particularly slow. BNL's IT department likes to make our lives difficult, so they seem to have disabled this. BNL also has an Anonymous FTP option (see http://www.px.nsls.bnl.gov/databackup/howto_ftp.htm) that lets you put your data on a local anonymous FTP site and retrieve it once you get home. I would call this the option of last resort.