APS Data Management System

The APS Data Management system will streamline the processing of files created during data collection and will ease electronic access to data for users. The main design goal is to relieve beamline staff of tedious data management tasks while ensuring the integrity and security of the data. The system provides an infrastructure for organizing the files that make up each data set. By simplifying remote electronic access to data, it allows users to transfer or analyze their data shortly after their beam time.

The Data Management system provides easy-to-use automatic data transfers from beamlines to the central APS storage system (currently 250 TB) for short-term curation. From there, the system can also be used to transfer data to outside facilities, such as a user’s home institution, or to a longer-term storage system such as that maintained by Argonne’s CELS Directorate (currently 1.7 PB). Data held there can be staged for analysis with Argonne’s Leadership Computing Facility resources, such as the Mira supercomputer. A service layer within the Data Management system automates data transfer from acquisition systems to the storage system and enforces policies, such as automatic deletion or archival. The GlobusOnline service is used to transfer data to remote locations.
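As an illustration only, a policy such as automatic deletion or archival could be expressed as a rule based on the age of a data set. The sketch below is not the system's implementation: the RetentionPolicy class, the decide_action function, and the 30-day and 90-day thresholds are hypothetical and chosen purely for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy: archive data after one age, delete after another."""
    archive_after: timedelta
    delete_after: timedelta


def decide_action(collected_at: datetime, policy: RetentionPolicy,
                  now: Optional[datetime] = None) -> str:
    """Return 'keep', 'archive', or 'delete' for a data set of a given age."""
    now = now or datetime.now()
    age = now - collected_at
    if age >= policy.delete_after:
        return "delete"
    if age >= policy.archive_after:
        return "archive"
    return "keep"


# Example thresholds are assumptions, not APS policy values.
policy = RetentionPolicy(archive_after=timedelta(days=30),
                         delete_after=timedelta(days=90))
now = datetime(2015, 6, 1)
print(decide_action(datetime(2015, 5, 25), policy, now))  # keep
print(decide_action(datetime(2015, 4, 1), policy, now))   # archive
print(decide_action(datetime(2015, 1, 1), policy, now))   # delete
```

Keeping the decision separate from the action that carries it out makes such a rule easy to test without touching any stored files.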

A web portal, integrated with the APS User database and the experiment scheduling and safety systems, is used to control data access settings and policies. The GUI is largely complete and the service layer is under development. Prototype deployment is targeted for mid-2015.

 

Distribution & Impact

A phased deployment is planned, prioritizing the beamlines that produce the largest volumes of data and those where staff spend the most time on data management. Over time, the system is expected to be integrated into a large fraction of the XSD beamlines.

 

Funding Source

This project was started using Argonne LDRD funds and is now being made production-ready using operational funding from the APS, contract DE-AC02-06CH11357.

 

Future Work

The APS data management system is under active development with prototype deployment targeted for mid-2015. More details can be found on the project's wiki page: https://confluence.aps.anl.gov/x/FQBk

 

Details

The Data Management system's infrastructure consists of the hardware that stores data, and the software layer that manages the data and integrates various administrative (such as the User account database) and experiment (such as an areaDetector acquisition computer) systems. Individual components are described below:

Roles

A user may be assigned one or more of four roles in the Data Management system. These roles determine which data a user may access, and the extent to which a user may manipulate certain data in the system.

On-site Storage

The APS maintains 250 TB of short-term storage on-site.

Petrel Data Pilot

Petrel is a pilot service for data management that allows researchers to store large-scale datasets and easily share that data with collaborators. Researchers from the Argonne Leadership Computing Facility (ALCF) and Globus are developing the system collaboratively. For more information visit: http://petrel.alcf.anl.gov/

Account Tools

The APS has two types of authentication systems. Access to the computing resources (cluster, workstations, etc.) is controlled by an LDAP server managed by APS-IT. The APS-IS group maintains access to the Oracle server that lists all the badged Users. Most of the web systems, such as the scheduling and proposal systems, rely on information stored in the Oracle system to authenticate users. Synchronization tools create and maintain entries in LDAP for each badged APS User. These LDAP entries are used for authentication to the Data Management system, and by the on-site storage system for user and group ids (uid/gid).
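The core of such a synchronization step can be sketched as a comparison between the list of badged Users and the existing LDAP entries. This is a minimal sketch under stated assumptions, not the APS tooling: the plan_sync function, the SyncPlan type, and the badge identifiers are hypothetical, and no real Oracle or LDAP calls are made. How entries with no matching badged User are handled is not described here, so the sketch only reports them.

```python
from typing import Iterable, NamedTuple, Set


class SyncPlan(NamedTuple):
    """Hypothetical result of comparing the two account sources."""
    to_create: Set[str]  # badged Users with no LDAP entry yet
    unmatched: Set[str]  # LDAP entries with no badged User (only reported)


def plan_sync(badged_users: Iterable[str],
              ldap_entries: Iterable[str]) -> SyncPlan:
    """Work out which LDAP entries are missing and which are unmatched."""
    badged = set(badged_users)
    existing = set(ldap_entries)
    return SyncPlan(to_create=badged - existing,
                    unmatched=existing - badged)


# Identifiers below are made up for illustration.
plan = plan_sync(badged_users=["b1001", "b1002", "b1003"],
                 ldap_entries=["b1001", "b0999"])
print(sorted(plan.to_create))  # ['b1002', 'b1003']
print(sorted(plan.unmatched))  # ['b0999']
```

Computing a plan first and applying it in a second step would let an administrator review the changes before any account is created or altered.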

Data Management Database and Web Portal

All information related to data sets, data policies, and assigned permissions is kept in the Data Management system's database. A web portal gives administrators and users a convenient way to view and modify settings.

Data Storage & Acquisition Services

The Data Storage & Acquisition Services monitor data sources, transfer data files, maintain the integrity and security of the data, set permissions on data, and apply policies (such as automatic deletion) to data. (More details will come soon.)
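One common way to maintain integrity during a file transfer is to compare checksums of the source and the copy. The sketch below illustrates that general technique with the Python standard library. It is an assumption about how such a check could look, not a description of the services themselves. The function names, the file name, and the choice of SHA-256 are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_verification(source: Path, dest_dir: Path) -> Path:
    """Copy a file into dest_dir and confirm the copy matches the source."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    destination = dest_dir / source.name
    shutil.copy2(source, destination)  # copies contents and metadata
    if sha256_of(source) != sha256_of(destination):
        destination.unlink()  # discard the corrupted copy
        raise IOError(f"checksum mismatch for {source.name}")
    return destination


# Demonstration with a temporary directory standing in for real storage.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "scan_0001.dat"
    src.write_bytes(b"example detector frame")
    copied = transfer_with_verification(src, Path(tmp) / "storage")
    print(copied.name, sha256_of(copied) == sha256_of(src))
```

Storing the checksum alongside the file would also allow the copy to be re-verified later, for example before archival or deletion.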

More details can be found on the project's wiki page: https://confluence.aps.anl.gov/x/FQBk