Data Transfer

Table of Contents

  1. Data Transfer Overview
    Data Transfer Protocols
    Data Transfer Clients
    NCSA-TeraGrid Data Transfer Resources
    NCSA Mass Storage System Transfers
    Data Transfer Software Installation
    Transfer Performance Considerations
    Data Transfer Examples

Data Transfer Protocols

The software packages and command-line tools discussed below work on top of a few standard protocols. Each protocol has its own semantics, security model, and set of available software.

SSH/SCP
Advantages:
  • The recursive option (scp -r) reproduces an entire directory hierarchy in a single command.
  • Data is transmitted over a secure channel.
  • Host-key-based authentication is possible.
  • Convenient way to transfer source code or other relatively small files to/from your /home directory.
Disadvantages:
  • Individual files are transmitted separately, which becomes an issue when network latency is high.
  • Performance is poor over wide area links due to small TCP window sizes.
  • File transfers larger than 2GB are not supported on some systems.
  • Data encryption can become a bottleneck for large transfers.
Recommendations:
  • Use to transfer small files or directories containing source code or other relatively small file sets.
  • Tar directories containing large numbers of files when sending over high-latency networks.
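
The tar recommendation above can be sketched as follows. This is a minimal example, assuming a POSIX shell with tar available; the helper name pack_dir, the host remote.example.org, and all paths are hypothetical placeholders.

```shell
#!/bin/sh
# Bundle a directory into one compressed archive so that scp moves a
# single file instead of paying per-file latency for every small file.
# pack_dir is a hypothetical helper name, not part of scp or tar.
pack_dir() {
    src_dir=$1      # directory to bundle
    archive=$2      # output archive path, e.g. results.tar.gz
    # -C switches to the parent directory so the archive stores
    # relative paths (results/...) rather than the full source path.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Example usage (host and paths are placeholders):
#   pack_dir ./results results.tar.gz
#   scp results.tar.gz user@remote.example.org:/scratch/user/
#   ssh user@remote.example.org 'cd /scratch/user && tar -xzf results.tar.gz'
```

Compression (-z) trades CPU time for fewer bytes on the wire; for data that is already compressed, dropping it may be faster.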
HPN-SSH/SCP
Advantages:
  • Works under the normal interface of scp.
  • Allows TCP receive buffers to be automatically adjusted or manually set.
  • Allows data encryption to be turned off, thus reducing the CPU load and allowing high-bandwidth transfers to reach the full network potential.
Disadvantages:
  • Both client and server must be patched for full functionality.
  • May degrade transfer performance on local low-latency networks.
Recommendations:
  • Check whether the ssh client at the site where you will issue scp commands is built with the HPN-SSH patch by printing its version string:
    	>ssh -V
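
The version check can also be scripted. The sketch below assumes that HPN-patched builds include the string "hpn" in their version banner (for example "OpenSSH_5.1p1-hpn13v5"); that convention and the helper name is_hpn_banner are assumptions, so confirm them against the build you are using.

```shell
#!/bin/sh
# Report whether an OpenSSH version banner mentions the HPN patch.
# Assumption: HPN-patched builds carry "hpn" in the banner text;
# is_hpn_banner is a hypothetical helper name.
is_hpn_banner() {
    printf '%s\n' "$1" | grep -qi 'hpn'
}

# ssh -V writes its banner to stderr, so redirect it to capture it:
#   banner=$(ssh -V 2>&1)
#   if is_hpn_banner "$banner"; then
#       echo "client appears to be HPN-patched"
#   else
#       echo "client appears to be a stock build"
#   fi
```

Because both client and server must be patched for full functionality, a positive result on the client side alone does not guarantee HPN performance.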
    	    
FTP
Advantages:
  • Long-established Internet protocol that is widely available and easy to implement.
  • Many clients are available, and most operating systems include one as standard.
  • Data is transmitted over an open (unencrypted) channel, which avoids encryption overhead when transfer speed is the concern.
Disadvantages:
  • Data is transmitted over an open (unencrypted) channel, which is a drawback when the security of the data content is a concern.
Recommendations:
  • Good for interactive remote access.
  • Use for quick access to MSS.
KFTP
  • Same transfer protocol as FTP, but with added Kerberos security features.
  • Kerberized (or GSI-authenticated) FTP clients must be used to access the NCSA MSS or other FTP servers protected by a Kerberos realm.
  • Many standard FTP clients (named "ftp" not "kftp"), including those bundled with popular Linux distributions, already have Kerberos functionality built in.
GridFTP
Advantages:
  • GSI authentication: Allows secure password-less authentication with a valid X.509 certificate and proxy.
  • Extended capabilities to accommodate performance increases through parallelism, optimized buffering and other techniques.
Disadvantages:
  • Limited to the FTP interface for recursive operations, listing, and renaming.
  • Detailed knowledge of server deployments and system characteristics may be needed to obtain optimal performance.
Recommendations:
  • Use when moving large data sets to or from a parallel file system.
  • Striping or concurrent transfers (striped or non-striped) can help take advantage of multiple server hosts.
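
To illustrate the buffering and parallelism points above, the sketch below estimates a TCP buffer size from the bandwidth-delay product and prints (but does not run) a globus-url-copy command line. The helper names, host, paths, and link figures are hypothetical. The options -p (parallel streams), -tcp-bs (TCP buffer size in bytes), and -vb (performance output) are globus-url-copy options, but check the help output of your installed version before relying on them.

```shell
#!/bin/sh
# Bandwidth-delay product in bytes:
#   (Mbit/s * 10^6 / 8) * (RTT in ms / 1000) = Mbit/s * RTT_ms * 125
# bdp_bytes is a hypothetical helper name.
bdp_bytes() {
    mbps=$1      # path bandwidth in Mbit/s
    rtt_ms=$2    # round-trip time in milliseconds
    echo $(( mbps * rtt_ms * 125 ))
}

# Print a globus-url-copy command line without executing it.
# gridftp_command is a hypothetical helper name.
gridftp_command() {
    streams=$1   # number of parallel TCP streams (-p)
    buffer=$2    # TCP buffer size in bytes (-tcp-bs)
    src=$3
    dst=$4
    echo "globus-url-copy -vb -p $streams -tcp-bs $buffer $src $dst"
}

# Example figures (assumed): a 1000 Mbit/s path with a 60 ms round trip.
buf=$(bdp_bytes 1000 60)    # 7500000 bytes
gridftp_command 4 "$buf" \
    gsiftp://gridftp.example.org/scratch/data.tar file:///tmp/data.tar
```

The bandwidth-delay product is only a starting point: with several parallel streams a smaller per-stream buffer is often sufficient, and the best values depend on the server deployment.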