
Automated Saving of Files from Batch Jobs

  1. Introduction
  2. saveafterjob features
  3. Examples
  4. Notes and Recommendations
  5. saveafterjob utilities

1. Introduction

The saveafterjob utility on NCSA's SGI Altix (cobalt) and Intel 64 Cluster (abe) provides automated, guaranteed saving of output files from batch jobs to the mass storage system.
For abe only: the +saj SoftEnv key is required to use saveafterjob.
The basic process is as follows:
  • Use saveafterjob to specify the files that you want transferred to mass storage.
  • The transfer occurs after the job ends. The FTP (File Transfer Protocol) directives given to saveafterjob are executed in all of the following cases:
    • jobs that terminate normally
    • jobs that terminate abnormally, for example, for limit violations
    • jobs that are killed by the system or the user (via qdel)
    • jobs that are killed when the system crashes.
  • With the saveafterjob command, files are not purged from the scratch filesystem until they are successfully transferred or it has been determined that the transfer will never happen (e.g., due to user syntax error).
  • You will be notified by email after the transfer requests to mass storage are processed by the system.

2. saveafterjob features

Only a small subset of the FTP directives is accepted by saveafterjob:
cd, lcd, mkdir, put, mput, tar, umask
Other FTP directives are ignored.

saveafterjob Directive   Description
cd                       change directory on UniTree
lcd                      change local working directory
mkdir                    make directory on UniTree
put                      send one file to UniTree
mput                     send multiple files to UniTree
umask                    get (set) umask on UniTree (see man umask)
tar                      create or extract a tar file from UniTree

tar is a special built-in command that is also accepted.
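
Because directives outside this subset are ignored, a mistyped directive may simply never run. The following is a minimal sketch of a pre-flight check for a Bourne-style shell. It is not part of saveafterjob: the function name saj_check is hypothetical, and it only looks at the first word of each directive in a "," or ";" separated request string.

```shell
# Hypothetical helper (not shipped with saveafterjob): report any directive
# whose first word is not in the accepted subset listed above.
saj_check() (
    set -f                      # keep patterns such as *.dat literal
    IFS=',;'                    # directives are separated by "," or ";"
    status=0
    for directive in $1; do
        # Extract the first word; leading blanks are dropped by read.
        IFS=' ' read -r verb _rest <<EOF
$directive
EOF
        case $verb in
            cd|lcd|mkdir|put|mput|tar|umask|'') ;;
            *) echo "would be ignored by saveafterjob: $verb" >&2
               status=1 ;;
        esac
    done
    exit "$status"              # ends the subshell, i.e. the function
)

saj_check "mkdir xyz, cd xyz, put output.dat" && echo "all directives accepted"
```

The function body runs in a subshell, so the changes to IFS and globbing do not leak into the rest of the batch script.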

IMPORTANT NOTES:

  • The FTP directives specified in saveafterjob are saved by the system and executed sometime after job completion. It is important that the saveafterjob commands be specified as early as possible in the batch script, before the files to be transferred are even created. The reason is that if the job dies prematurely before it reaches the saveafterjob command(s) in the script, the request for saving files is not recorded and files from the job will not be transferred to mass storage. The files or file patterns specified are matched against the files after the job has completed rather than during the job.

  • It is important to issue the saveafterjob command(s) after changing to the directory ($SCR) where the job will execute since the transfer will be executed in the directory from which the saveafterjob request is made.
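
The two notes above suggest a fixed ordering at the top of a batch script: change to $SCR, register the request, and only then run the application. The sketch below assumes a Bourne-style shell. The stand-in saveafterjob function and the SAJ_LOG file are assumptions added so the layout can be dry-run off the cluster; on cobalt or abe the real command is used instead, and $SCR is assumed to be provided by the batch environment.

```shell
#!/bin/sh
# Stand-in for dry runs off the cluster (assumption, not the real utility):
# it only records each request so the ordering can be inspected.
if ! command -v saveafterjob >/dev/null 2>&1; then
    saveafterjob() { printf '%s\n' "$*" >> "$SAJ_LOG"; }
fi

# Fall back to a temporary directory when $SCR is not set.
SCR=${SCR:-$(mktemp -d)}
SAJ_LOG=${SAJ_LOG:-$SCR/saj_requests.log}

# 1. Change to the scratch directory first: the transfer is executed
#    in the directory from which the request is made.
cd "$SCR" || exit 1

# 2. Register the request before any output exists. The pattern is quoted
#    so the shell passes it to saveafterjob unexpanded.
saveafterjob "mput *.dat"

# 3. Only now run the application (placeholder command).
echo "simulated result" > output.dat
```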

The -c (or --clear) option removes all previous requests.

See the saveafterjob man page for more information on the new feature, and the ftp man page for more information on ftp directives.

3. Examples

  1. If a job creates an output file named output.dat, the job could begin as follows:
    cd $SCR
    saveafterjob "put output.dat"
    
    Then, sometime after the job finishes, the file output.dat would be saved in the user's mass storage home directory.

  2. If the file needs to be saved in a directory named xyz in UniTree, the command is:
    cd $SCR
    saveafterjob "mkdir xyz, cd xyz, put output.dat"
    
    Note: It is also possible to use ";" instead of "," as a delimiter.

  3. If the job creates subdirectories where the output file will reside, for example, in a directory named Run under $SCR, the syntax is:
    cd $SCR
    saveafterjob "lcd Run, put output.dat"
    

  4. If the job creates multiple files *.dat, the built-in tar command in saveafterjob can be used to combine them into one file job20.tar, which is then saved to UniTree (see the msscmd man page in the section USING TAR for details on this syntax):
    saveafterjob "tar cvf job20.tar *.dat"
    
    Note 1: The above built-in tar command automatically uses the tar -K option required for files larger than 2 Gigabytes.

    Note 2: To extract the files once job20.tar is saved to UniTree, enter the following on copper:

    % cd /scratch-global/$USER
    % msscmd "tar xvf job20.tar"
    
    IMPORTANT: Use of tar is strongly recommended for efficient storage to and retrieval from UniTree. However, if the individual files are very large (on the order of gigabytes), AND your access patterns are such that you usually need to get only one or a small subset of the files at any given time from UniTree, it may be more efficient to save the files individually.

  5. Alternatively, the general syntax for the above example would be:
    cd $SCR
    saveafterjob put '"|tar cf - *.dat"'  job20.tar
    
    Note 1: The above syntax is explained in the ftp man page in the section FILE NAMING CONVENTIONS.
    Note 2: Use this syntax if your tar command requires special options.
    Note 3: Specifying a path name for tar in the above syntax will cause the saveafterjob command to fail.

  6. The built-in tar command can also be used with shell variable names:
    set run=abc20
    cd $SCR
    saveafterjob "tar cf $run.tar $run.*"
    

  7. Example on the use of --clear:
    cd $SCR
    saveafterjob "mput *.dat"
    saveafterjob "mkdir X, cd X, mput *.chk"
    
    queues up 2 requests. If the job later executes
    saveafterjob --clear "mput *.tar"
    
    the first 2 requests are removed and replaced by the new one.

    This could be done in the case of jobs that create intermediate files that need to be saved in the event of premature termination of the job. If the job completes normally and the output files are created, then the output files are saved and the intermediate files are not necessary. In this case the

    saveafterjob --clear
    command would be issued after the execution line.
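
The pattern in example 7 can be sketched as a complete script fragment for a Bourne-style shell. The stand-in saveafterjob function below is an assumption that only models the queueing behavior described in this section so the fragment can be dry-run off the cluster; run_application and result.dat are hypothetical placeholders for the real execution line and its output.

```shell
# Stand-in modelling the request queue (assumption, not the real utility):
# -c/--clear empties the queue, any remaining argument is queued.
if ! command -v saveafterjob >/dev/null 2>&1; then
    SAJ_QUEUE=""
    saveafterjob() {
        case ${1-} in
            -c|--clear) SAJ_QUEUE=""; shift ;;
        esac
        if [ "$#" -gt 0 ]; then
            SAJ_QUEUE="${SAJ_QUEUE}[$*]"
        fi
    }
fi

SCR=${SCR:-$(mktemp -d)}
cd "$SCR" || exit 1

# Safety net: if the job dies early, the checkpoint files are saved.
saveafterjob "mkdir X, cd X, mput *.chk"

# Placeholder for the real execution line.
run_application() { echo "final" > result.dat; }

if run_application; then
    # Normal completion: drop the checkpoint request and save the result.
    saveafterjob --clear "put result.dat"
fi
```

If the application fails, the if branch is skipped and the checkpoint request stays queued.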

4. Notes and Recommendations

  • File transfer is only guaranteed when using $SCR. Use of other file systems is not reliable.

  • If your batch jobs are chained (one job submits another job before ending), then you should not rely on the output from the first job making it to UniTree in time to be available for the second job, if the first job uses saveafterjob to save files.

  • If you currently use mssftp in your batch scripts to save files, switch to saveafterjob; only saveafterjob guarantees safe transfer of the files. To save files, replace:
    mssftp << EOF
    .
    .
    .
    EOF
    
    with
    saveafterjob -f - << EOF
    .
    .
    .
    EOF
    
    This is the shell here-document syntax.

    Note: You should use msscmd to get files in your batch jobs rather than mssftp.

  • Do not use $SCR in the put commands. For example:
    saveafterjob -f - << EOF
    put "|tar cf - $SCR/*.dat" dat.tar
    EOF
    
    This will result in $SCR being expanded in the file names in dat.tar and will affect future extraction of the files. The better syntax is:
    cd $SCR
    saveafterjob -f - << EOF
    put "|tar cf - *.dat" dat.tar
    EOF
    
  • Each instance of saveafterjob starts from your home directory on mss.

    For example:

    saveafterjob "cd run12, put run12.cpt"
    saveafterjob "cd data, put res.dat"
    
    saves run12.cpt into the directory user/run12
    and res.dat into the directory user/data on UniTree.

    Note: It does not save res.dat into user/run12/data.

  • If you try to cd to a directory on UniTree that does not exist, the file will be saved in the last valid directory from that saveafterjob command.
    If this is the only cd in the command, the file will be saved in your home directory on UniTree.

    For example:

    saveafterjob "cd data, put res.dat"
    
    If subdirectory data does not exist, res.dat will be saved in the home directory.

    or

    saveafterjob "cd run12, cd data, put res.dat"
    
    If subdirectory run12 exists but run12/data does not, then res.dat will be saved in run12.

  • saveafterjob does not work with links, whether hard links or soft (symbolic) links.

  • saveafterjob does not work with multiple wildcards in a single pattern.

    For example, it is not possible to use wildcards for both directories and files in a single command:

    saveafterjob "tar ./*/data/*.dat"
    

    will not work.
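
The recommendation above against using $SCR in put commands can be checked with plain tar, without saveafterjob. This sketch uses a throwaway directory as a stand-in for $SCR (an assumption for the demonstration) and lists the member names produced by each form of the pattern.

```shell
# Throwaway directory standing in for $SCR.
demo=$(mktemp -d)
mkdir "$demo/scr"
echo a > "$demo/scr/a.dat"
echo b > "$demo/scr/b.dat"

# Pattern includes the directory: every member name carries the path.
# (Some tar implementations warn about stripping the leading "/".)
tar cf "$demo/with_path.tar" "$demo"/scr/*.dat 2>/dev/null

# Relative pattern, run from inside the directory: bare file names only.
(cd "$demo/scr" && tar cf "$demo/relative.tar" *.dat)

with_path=$(tar tf "$demo/with_path.tar")
relative=$(tar tf "$demo/relative.tar")

echo "with path:"; echo "$with_path"
echo "relative:";  echo "$relative"
```

Extracting the first archive recreates the scratch directory hierarchy under the current directory, while the second restores the files directly where tar is run.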

5. saveafterjob utilities

  • The progress of saveafterjob requests can be followed with the command sajstatus.
    Note: Progress information about a saveafterjob request may not be available
          immediately after a job has completed, nor will it remain available
          permanently after the request has ended.

  • saveafterjob requests can be cancelled by using the command sajcancel jobid.

  • saveafterjob history can be viewed by using the command sajhist jobid.