Table of Contents
Data Transfer Overview
Data Transfer Protocols
Data Transfer Clients
NCSA-TeraGrid Data Transfer Resources
NCSA Mass Storage System Transfers
Data Transfer Software Installation
Transfer Performance Considerations
Data Transfer Examples
Transfer Performance Considerations
Local file system performance contributes two terms to overall transfer
performance: one at the sending end and one at the receiving end. The file
system is rarely the cause of poor transfer performance, but it must not be
overlooked when pushing for high bandwidth. Local disk access rates set an
upper bound on how fast a locally attached node or host can read or write
data. For archival transfers, tape access time can be very long, depending
on system load and on how the requested files are stored on tape. Once the
data is staged to disk, file system performance becomes relevant again.
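One way to make the "two terms" concrete is a simplified model, which is an assumption rather than something stated above: data streams through source read, network, and destination write, so the sustained rate is capped by the slowest stage, and tape staging adds a fixed delay before any bytes move. The function names and the example rates below are hypothetical.

```python
# Simplified model (assumption): throughput is limited by the slowest of three
# stages, and tape recall adds a fixed staging delay up front.

def estimated_transfer_seconds(size_bytes: float,
                               source_read_bps: float,
                               network_bps: float,
                               dest_write_bps: float,
                               staging_seconds: float = 0.0) -> float:
    """Estimate wall-clock time for one transfer under the min-of-stages model.

    All rates are in bytes per second. staging_seconds models tape recall on
    an archival source; it is zero for data already on disk.
    """
    bottleneck = min(source_read_bps, network_bps, dest_write_bps)
    if bottleneck <= 0:
        raise ValueError("all rates must be positive")
    return staging_seconds + size_bytes / bottleneck


def bottleneck_stage(source_read_bps: float,
                     network_bps: float,
                     dest_write_bps: float) -> str:
    """Name the stage that limits throughput (ties go to the first listed)."""
    stages = {
        "source file system": source_read_bps,
        "network": network_bps,
        "destination file system": dest_write_bps,
    }
    return min(stages, key=stages.get)


if __name__ == "__main__":
    # Hypothetical case: 100 GB, 500 MB/s reads, 10 Gb/s (1.25 GB/s) network,
    # 200 MB/s writes, plus 300 s of tape staging at the source.
    print(estimated_transfer_seconds(100e9, 500e6, 1.25e9, 200e6, 300.0))
    print(bottleneck_stage(500e6, 1.25e9, 200e6))
```

In this hypothetical case a faster network would not help at all; the receiving file system is the term that sets the ceiling.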
The more obvious piece of the transfer puzzle is the network between the two
endpoints, yet it is often the least understood component. Tools that analyze
the network path between endpoints, or that perform memory-to-memory transfers
(which exclude file system interaction), are good ways to gauge the high-water
mark, or best-case transfer speed, for a given pair of endpoints.
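The principle behind a memory-to-memory test can be sketched in a few lines: bytes are sent from a RAM buffer and discarded on arrival, so no file system is touched and the measured rate reflects only the path. This is a toy, not a replacement for a real tool such as iperf: both ends run in one process, so `host` must be an address of the local machine, and a genuine endpoint-to-endpoint measurement needs the receiver running on the remote host. All names here are hypothetical.

```python
import socket
import threading
import time

CHUNK_SIZE = 1 << 20  # 1 MiB per send/recv call


def _discard(server: socket.socket, received: list) -> None:
    # Receiver: accept one connection, count incoming bytes, store nothing.
    conn, _ = server.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK_SIZE)
            if not data:  # sender closed the connection
                break
            total += len(data)
    received.append(total)


def memory_to_memory_mbps(host: str = "127.0.0.1",
                          total_bytes: int = 256 << 20) -> float:
    """Push total_bytes from a RAM buffer over TCP; return throughput in Mb/s.

    Both ends run in this process, so host must be a local address.
    No file is read or written at either end.
    """
    payload = memoryview(bytes(CHUNK_SIZE))  # zero-filled buffer held in memory
    received: list = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, 0))  # port 0: the OS picks a free port
        server.listen(1)
        port = server.getsockname()[1]
        receiver = threading.Thread(target=_discard, args=(server, received),
                                    daemon=True)
        receiver.start()
        start = time.perf_counter()
        with socket.create_connection((host, port)) as client:
            sent = 0
            while sent < total_bytes:
                n = min(CHUNK_SIZE, total_bytes - sent)
                client.sendall(payload[:n])
                sent += n
        receiver.join()  # wait until every byte has been counted
        elapsed = time.perf_counter() - start
    if not received or received[0] != total_bytes:
        raise RuntimeError("receiver did not get every byte")
    return total_bytes * 8 / elapsed / 1e6


if __name__ == "__main__":
    print(f"{memory_to_memory_mbps():.0f} Mb/s over loopback")
```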
File Structure
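In most cases, a few large files transfer more efficiently than the same
volume of data spread across many small files, because each file carries a
fixed per-file overhead (opening and closing the file, exchanging commands,
and updating metadata at both ends). Reducing the file count, for example by
bundling small files into an archive before the transfer, can improve
transfer performance.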
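A minimal sketch of reducing the file count before a transfer, assuming Python's standard `tarfile` module is an acceptable bundling tool (the site does not prescribe one). The function name is hypothetical; the archive is written uncompressed, on the assumption that compression could make the CPU the new bottleneck.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir: str, archive_path: str) -> int:
    """Pack every regular file under src_dir into one uncompressed tar archive.

    Returns the number of files packed. archive_path should lie outside
    src_dir, otherwise the archive could end up packing itself.
    """
    src = Path(src_dir)
    count = 0
    with tarfile.open(archive_path, "w") as tar:  # "w" = no compression
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so the tree unpacks cleanly.
                tar.add(path, arcname=path.relative_to(src).as_posix())
                count += 1
    return count
```

Thousands of small files then cross the network as one large file, and `tar -xf` on the receiving end restores the original tree.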
Tuning for the Network
Latency
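When data moves over large geographical distances, network latency degrades
performance if default network settings are used. TCP can keep only a limited
amount of unacknowledged data in flight, set by its buffer (window) sizes; on
a long path, default buffers are often too small to fill the available
bandwidth. Modern software is beginning to address this issue automatically,
but it is still sometimes necessary to tune some parameters manually for
high-latency, high-bandwidth transfers.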
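The usual starting point for manual tuning is the bandwidth-delay product: bandwidth times round-trip time gives the number of bytes that must be in flight to keep the path full. The sketch below computes it and requests matching buffers on a TCP socket. The function names and the 10 Gb/s, 60 ms path are hypothetical, and the kernel may grant less than requested (on Linux the ceiling is set by the `net.core.wmem_max` and `net.core.rmem_max` sysctls), so the code reads the actual value back.

```python
import socket


def bandwidth_delay_product_bytes(bandwidth_bits_per_s: float,
                                  rtt_seconds: float) -> int:
    """Bytes that must be unacknowledged 'in flight' to keep the path full."""
    return round(bandwidth_bits_per_s * rtt_seconds / 8)


def tuned_tcp_socket(buffer_bytes: int) -> socket.socket:
    """Create a TCP socket and request send/receive buffers of buffer_bytes.

    Set the buffers before connect() or listen() so the TCP window scale is
    negotiated with the larger size in mind. The kernel may clamp the request.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return s


if __name__ == "__main__":
    # Hypothetical path: 10 Gb/s with a 60 ms round-trip time.
    bdp = bandwidth_delay_product_bytes(10e9, 0.060)
    print(f"bandwidth-delay product: {bdp} bytes")  # 75000000
    with tuned_tcp_socket(bdp) as s:
        granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        print(f"kernel granted receive buffer: {granted} bytes")
```

About 75 MB must be in flight on such a path, which is why untuned defaults fall short on long, fast links.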
Third-Party Transfers
A third-party transfer occurs when a client tool such as UberFTP or
globus-url-copy initiates a transfer between two servers, rather than between
a server and the client itself. This keeps the client process lightweight and
lets users initiate transfers from remote sites. The data-channel traffic then
flows entirely between the servers at the endpoints of the transfer, so a
third-party transfer has the full resources of the servers at both ends. A
client–server transfer, by contrast, is limited to the resources of the
machine on which the client is running.
All NCSA systems have third-party-capable servers and clients. Our GridFTP
servers are deployed on hardware with optimized network and file system
connectivity. Using third-party transfers ensures that the data movement
is handled by these resources and, in many cases, improves performance.
Running a client–server transfer from a system's login node competes for
bandwidth with all other login-node traffic, which can vary greatly with the
load on the system. For large transfers, always use the dedicated resources
through third-party transfers whenever possible. This minimizes the impact on
our interactive login facilities and, in most cases, maximizes transfer
performance.
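When both the source and destination are `gsiftp://` server URLs, globus-url-copy performs a third-party transfer, and the machine running the command carries only control traffic. The sketch below builds such a command and refuses anything else, so a large copy cannot quietly fall back to routing data through the login node. The helper name, the default of 4 streams, and the hostnames are hypothetical. `-vb` (show transfer performance), `-p` (parallel TCP streams), and `-tcp-bs` (TCP buffer size) are standard globus-url-copy options, but check the documentation for the installed version.

```python
import shlex
from typing import List, Optional
from urllib.parse import urlparse


def globus_url_copy_argv(source_url: str, dest_url: str,
                         parallel_streams: int = 4,
                         tcp_buffer_bytes: Optional[int] = None) -> List[str]:
    """Build argv for a third-party globus-url-copy transfer.

    Refuses anything but gsiftp:// on both ends, so the data channel runs
    server to server instead of through the machine running this script.
    """
    for url in (source_url, dest_url):
        if urlparse(url).scheme != "gsiftp":
            raise ValueError(f"not a GridFTP server URL: {url}")
    argv = ["globus-url-copy", "-vb", "-p", str(parallel_streams)]
    if tcp_buffer_bytes is not None:
        argv += ["-tcp-bs", str(tcp_buffer_bytes)]
    return argv + [source_url, dest_url]


if __name__ == "__main__":
    # Hypothetical endpoints; substitute real GridFTP servers and paths.
    argv = globus_url_copy_argv(
        "gsiftp://gridftp.site-a.example.org/scratch/run42.tar",
        "gsiftp://gridftp.site-b.example.org/archive/run42.tar",
        parallel_streams=8,
        tcp_buffer_bytes=8 * 1024 * 1024,
    )
    print(shlex.join(argv))
    # To execute: subprocess.run(argv, check=True), with valid grid credentials.
```

Several parallel streams help on high-latency paths for the same reason larger TCP buffers do: together they keep more data in flight.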