Cyberinfrastructure Seminar Series
Tuesday, August 2, 2005
Data Grids, Digital Libraries, and Persistent Archives: An Integrated
Approach to Sharing, Publishing, and Archiving Data
Reagan Moore, SDSC
11:00 AM - 12:30 PM (PDT)
1:00 PM - 2:30 PM (CDT)
5239 Beckman Institute (NCSA) via AG
Live webcast at: www.cichannel.org (RealPlayer is required)
Applications on the TeraGrid generate simulation output that can be measured
in the tens of terabytes and millions of files. Managing these massive data
sets is simplified through the use of data grid technology. Data grids organize
data, which may be distributed across multiple sites, into a collection
hierarchy. Descriptive metadata is associated with each file to support browsing
and discovery. Digital library services, such as those provided by DSpace,
are used to interact with the collection.
The integration of digital library technology with data grids makes it possible
to share data easily within a scientific community. Multiple scientific disciplines
are now assembling digital libraries composed of digital reference data sets
that represent standard observational or simulation scenarios. Because they
make it possible both to publish data for external researchers and to share
data under access controls within a project, data grids enable scientific
research. Examples include the National Virtual Observatory, the Southern
California Earthquake Center, the Biomedical Informatics Research Network, and
the Alliance for Cell Signaling.
Data grids also provide support for incorporating new technology, including
new storage systems and new access methods. The ability to support new
technologies is called infrastructure independence. Along with mechanisms for
authenticity and integrity, infrastructure independence mechanisms form the
critical components of a preservation environment. The ability to use
the same software infrastructure to implement a data sharing environment, a
digital library, and a preservation environment is one of the great advances
provided by data management systems available on the TeraGrid. Examples of these
technologies will be illustrated using the SDSC Storage Resource Broker Data
Grid.
The Cyberinfrastructure Seminar Series is a set of presentations
on cyberinfrastructure and related research organized by NCSA and SDSC. These
seminars are available on site at the presenting institution and remotely via
the Access Grid. For more details regarding the AG venue for this seminar,
please refer to: http://agschedule.ncsa.uiuc.edu/meetingdetails.asp?MID=9810.
All Access Grid sites are welcome to participate in this seminar. If you have
any questions, contact Jennie File, NCSA Training & Outreach Group.