
Data Transfer

Table of Contents

  1. Data Transfer Overview
    Data Transfer Protocols
    Data Transfer Clients
    NCSA-TeraGrid Data Transfer Resources
    NCSA Mass Storage System Transfers
    Data Transfer Software Installation
    Transfer Performance Considerations
    Data Transfer Examples

Transfer Performance Considerations

Local file system considerations account for two terms in the equation of ultimate transfer performance: one for the sending end and one for the receiving end. File system performance is rarely a culprit in poor transfer performance, but must not be overlooked when pushing for high bandwidth. Local disk access rates set the expectation for how fast a locally attached node or host can read or write bits. In the case of archival transfers, the tape access time can be very long, depending on the load and the storage pattern of files requested. Once data is staged, file system performance becomes relevant again.

The more obvious piece of the transfer puzzle is the network between the two endpoints. This component is often the least understood. Tools that analyze the network path between endpoints, or that run memory-to-memory transfers (which exclude file system interaction), are good ways to gauge the high-water mark, or best-case transfer speed, for a given pair of endpoints.

File Structure

In most cases larger files are "friendlier" to transfer than the same data volume split across many smaller files. Reducing the file count, for example by bundling small files into a single archive before the transfer, can improve transfer performance.
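As a rough illustration of reducing the file count, the sketch below packs a directory of small files into one uncompressed tar archive before a transfer. It uses only the Python standard library; the function name bundle_directory and its arguments are hypothetical and are not part of any NCSA tool.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir: str, archive_path: str) -> int:
    """Pack every regular file under src_dir into one tar archive.

    Returns the number of files added. The archive should be written
    outside src_dir so that it does not try to include itself.
    """
    src = Path(src_dir)
    count = 0
    # Mode "w" writes a plain, uncompressed tar; "w:gz" would also compress.
    with tarfile.open(archive_path, "w") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store names relative to src_dir so the archive unpacks cleanly.
                tar.add(path, arcname=str(path.relative_to(src)))
                count += 1
    return count
```

The single archive is then transferred in place of the individual files and unpacked at the destination. Whether this helps depends on file sizes and on the cost of creating the archive, so it is worth testing on a sample of the data first.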

Tuning for the Network

Latency

When moving data over large geographical distances, network latency begins to degrade performance if default network settings are used. Modern software is beginning to address this issue automatically, but it is still sometimes necessary to tune parameters manually for high-latency, high-bandwidth transfers.
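One common way to reason about this effect is the bandwidth-delay product: the amount of data that must be in flight to keep a path full. The sketch below is an illustration under that standard model, not a description of any particular tool's tuning logic; the function names and the example figures (1 Gb/s, 60 ms round trip, 64 KiB window) are assumptions chosen for the example.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight to fill a path of the given bandwidth and RTT."""
    return bandwidth_bits_per_s * rtt_s / 8


def window_limited_throughput_bits_per_s(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on single-stream throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s


# Assumed example path: 1 Gb/s with a 60 ms round-trip time.
bdp = bandwidth_delay_product_bytes(1e9, 0.060)  # about 7.5 million bytes

# Assumed example window of 64 KiB on the same path.
limit = window_limited_throughput_bits_per_s(64 * 1024, 0.060)  # under 9 Mb/s
```

In this model a small fixed window caps a single stream far below the link rate on a long path, which is why larger buffers or several parallel streams are often needed for high-latency transfers.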

Third-Party Transfers

A third-party transfer occurs when a client tool such as UberFTP or globus-url-copy initiates a transfer between two servers, rather than between a server and the client itself. A third-party transfer lets the client process stay lightweight and lets it initiate transfers from remote sites. The data-channel traffic moves entirely to the servers at the endpoints of the transfer, so a third-party transfer has the full resources of the servers at both ends. A client–server transfer, by contrast, is limited to the resources of the machine on which the client is running.
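The following sketch shows what a third-party invocation of globus-url-copy might look like: both the source and the destination are gsiftp:// URLs, so the data flows server to server. It only builds the argument list and does not run anything. The helper name third_party_command and the host names in the usage note are hypothetical; the -vb, -p, and -tcp-bs options are taken from the globus-url-copy command-line help as understood here, and should be checked against the installed version.

```python
from typing import List, Optional


def third_party_command(
    src_host: str,
    src_path: str,
    dst_host: str,
    dst_path: str,
    parallel_streams: Optional[int] = None,
    tcp_buffer_bytes: Optional[int] = None,
) -> List[str]:
    """Build a globus-url-copy argument list for a server-to-server transfer."""
    if not (src_path.startswith("/") and dst_path.startswith("/")):
        raise ValueError("paths must be absolute")
    cmd = ["globus-url-copy", "-vb"]  # -vb reports transfer performance
    if parallel_streams is not None:
        cmd += ["-p", str(parallel_streams)]  # number of parallel data streams
    if tcp_buffer_bytes is not None:
        cmd += ["-tcp-bs", str(tcp_buffer_bytes)]  # TCP buffer size in bytes
    # Both endpoints are GridFTP URLs, so neither end is the client machine.
    cmd += [f"gsiftp://{src_host}{src_path}", f"gsiftp://{dst_host}{dst_path}"]
    return cmd
```

For example, third_party_command("src.example.org", "/data/in.dat", "dst.example.org", "/data/out.dat", parallel_streams=4) returns a list that could be passed to subprocess.run once valid credentials are in place. The example.org hosts are placeholders, not real NCSA servers.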

All NCSA systems have third-party-capable servers and clients. Our GridFTP servers are deployed on hardware with optimized network and file system connectivity. Using third-party transfers ensures that the data movement is handled by these resources and, in many cases, improves performance.

A client–server transfer run from a system's login node competes for bandwidth with all other login node traffic, which can vary greatly with the load on the system. For large transfers, use dedicated resources via third-party transfers whenever possible. This technique minimizes the impact on our interactive login facilities and, in most cases, maximizes transfer performance.