The APS Data Management system will streamline the processing of files created during data collection and will make electronic access to data easier for users. The main design goal for the system is to relieve beamline staff of tedious data management tasks while ensuring the integrity and security of the data. The system provides an infrastructure for organizing the files that make up each data set. By simplifying remote electronic access, it allows users to transfer or analyze their data shortly after their beam time.
The Data Management system provides easy-to-use, automatic data transfers from beamlines to the central APS storage system (currently 250 TB) for short-term curation. From there, the system can also transfer data to outside facilities, such as a user’s home institution, or to longer-term storage such as the system maintained by Argonne’s CELS Directorate (currently 1.7 PB), where data can be staged for analysis on Argonne Leadership Computing Facility resources such as the Mira supercomputer. A service layer within the Data Management system automates both data transfer from acquisition systems to the storage system and policy enforcement, such as automatic deletion or archival. The GlobusOnline service is used to transfer data to remote locations. A web portal, integrated with the APS User database and the experiment scheduling and safety systems, is used to control data access settings and policies. The GUI is largely complete and the service layer is under development. Prototype deployment is targeted for mid-2015.
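The policy-enforcement part of the service layer can be illustrated with a minimal Python sketch, assuming a simple age-based rule: files are archived to longer-term storage after one threshold and deleted from central storage after another, but only once an archival copy exists. The class names, thresholds, and file paths below are hypothetical; this section does not specify the actual APS policies or implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    ARCHIVE = "archive"
    DELETE = "delete"


@dataclass(frozen=True)
class RetentionPolicy:
    # Hypothetical parameters, not taken from the APS system.
    archive_after: timedelta  # copy to longer-term storage after this age
    delete_after: timedelta   # remove from central storage after this age


@dataclass(frozen=True)
class StoredFile:
    path: str
    created: datetime
    archived: bool  # True once a copy exists on longer-term storage


def decide(f: StoredFile, policy: RetentionPolicy, now: datetime) -> Action:
    """Return the action a policy-enforcement pass would take for one file."""
    age = now - f.created
    if age >= policy.delete_after and f.archived:
        # Delete from short-term storage only when an archival copy exists.
        return Action.DELETE
    if age >= policy.archive_after and not f.archived:
        return Action.ARCHIVE
    return Action.KEEP


# Example run with made-up thresholds and files.
policy = RetentionPolicy(archive_after=timedelta(days=30), delete_after=timedelta(days=90))
now = datetime(2015, 6, 1)
files = [
    StoredFile("/data/run1.h5", datetime(2015, 5, 20), archived=False),  # 12 days old
    StoredFile("/data/run2.h5", datetime(2015, 4, 1), archived=False),   # 61 days old
    StoredFile("/data/run3.h5", datetime(2015, 2, 1), archived=True),    # 120 days old
]
for f in files:
    print(f.path, decide(f, policy, now).value)
# /data/run1.h5 keep
# /data/run2.h5 archive
# /data/run3.h5 delete
```

Requiring an archival copy before deletion is one conservative design choice; a real policy could also take experiment metadata or user requests into account.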
Distribution & Impact
A phased deployment is planned, in which beamlines that produce the largest volumes of data, and whose staff spend the most time on data management, will be prioritized. Over time, the system is expected to be integrated into a large fraction of the XSD beamlines.
Funding Source
This project was started using Argonne LDRD funds and is now being made production-ready using operational funding from the APS, contract DE-AC02-06CH11357.
Future Work
The APS Data Management system is under active development, with prototype deployment targeted for mid-2015. More details can be found on the project's wiki page: https://confluence.aps.anl.gov/x/FQBk
Details
The Data Management system's infrastructure consists of the hardware that stores the data and the software layer that manages the data and integrates it with various administrative systems (such as the User account database) and experiment systems (such as an areaDetector acquisition computer). Individual components are described below:
More details can be found on the project's wiki page: https://confluence.aps.anl.gov/x/FQBk
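To make the role of the software layer more concrete, here is a toy Python sketch of how it might tie an experiment record to user identities and a storage location. Everything in it is a hypothetical illustration, not the APS schema or API: the `Experiment` and `DataManagementService` classes, the `<storage_root>/<station>/<experiment>/` directory layout, and the user IDs are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    # Hypothetical record; real field names are not given in this section.
    name: str
    station: str  # beamline station whose acquisition computer produced the data
    users: set[str] = field(default_factory=set)  # IDs as they might come from the User database


class DataManagementService:
    """Toy stand-in for the layer linking experiments, users, and storage."""

    def __init__(self, storage_root: str) -> None:
        self.storage_root = storage_root.rstrip("/")
        self.experiments: dict[str, Experiment] = {}

    def add_experiment(self, exp: Experiment) -> None:
        self.experiments[exp.name] = exp

    def destination(self, experiment: str, filename: str) -> str:
        # Where a file arriving from the acquisition computer would be stored.
        exp = self.experiments[experiment]
        return f"{self.storage_root}/{exp.station}/{exp.name}/{filename}"

    def can_read(self, experiment: str, user_id: str) -> bool:
        # Access is granted only to users registered on the experiment.
        exp = self.experiments.get(experiment)
        return exp is not None and user_id in exp.users


svc = DataManagementService("/storage/")
svc.add_experiment(Experiment("exp-001", "station-A", {"u1001"}))
print(svc.destination("exp-001", "scan_0001.h5"))  # /storage/station-A/exp-001/scan_0001.h5
print(svc.can_read("exp-001", "u1001"))            # True
print(svc.can_read("exp-001", "u2002"))            # False
```

Keeping the user list on the experiment record means access settings changed through the web portal would only need to update one place, which is one plausible way to keep the administrative and experiment sides in sync.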