Introduction
The ARM Data Center (ADC) Cluster provides a comprehensive suite of hardware and software resources for the creation, manipulation, and retention of scientific data. This document sets out guidelines for acceptable use of those resources. It is an official policy of the ADC Cluster and, as such, must be agreed to by relevant parties as a condition of access to and use of ADC Cluster computational resources.
Data Storage Resources
The ADC Cluster provides an array of data storage platforms, each designed with a particular purpose in mind. These storage areas offer a range of storage capacity and I/O performance to meet application needs. Storage areas are broadly divided into two categories: those intended for user data and those intended for project data. Within each of the two categories, we provide different sub-areas, each with an intended purpose:
Purpose | Storage Area | Path
---|---|---
Long-term data for routine access | User Home | /home/$USER
Short-term project data for fast, batch-job access that is not shared | User Scratch | /lustre/or-hydra/cades-arm/$USER
Short-term project data for fast, batch-job access that is shared with other project members | Project Share | /lustre/or-hydra/cades-arm/proj-shared/$USER
Short-term project data for fast, batch-job access that is shared globally | World Share | /lustre/or-hydra/cades-arm/world-shared/$USER
Fast read/write access during a batch job | Local Scratch | $localscratch
Long-term storage of data not currently in use (currently accessible only via the adc_xfer command) | User Temp | /data/adc/stratus/
Long-term archival storage | User Archive | HPSS (if applicable)
User Home
Home directories for each user are Network File System (NFS)-mounted on all ADC Cluster systems and are intended to store long-term, frequently-accessed user data. User Home areas are not backed up. This file system does not generally provide the input/output (I/O) performance required by most compute jobs, and is not available to compute jobs on most systems. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.
User Scratch
Each project member gets an individual User Scratch directory, which resides in the high-capacity Lustre® file system on large, fast disk areas intended for global (parallel) access to temporary/scratch storage. Because of the scratch nature of the file system, it is not backed up, and files are automatically purged on a regular basis. Files should not be retained in this file system for long; rather, they should be migrated to HPSS Archive space as soon as they are no longer actively in use. If a file system associated with a user's Scratch directory is nearing capacity, ADC Cluster Support may contact the user to request that they reduce the size of their scratch directory. See the section "Data Retention, Purge, & Quota Summary" for more details on applicable quotas, backups, purge, and retention timeframes.
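For example, a user might bundle an inactive results directory from User Scratch into HPSS before it becomes eligible for purging. The following is a minimal sketch, assuming the standard HPSS htar client is available on the cluster (the directory name results_2024 is illustrative); confirm the supported archive workflow with ADC Cluster Support.

```bash
# Bundle an inactive directory from User Scratch directly into HPSS;
# htar also writes an index file alongside the archive for fast listing.
cd /lustre/or-hydra/cades-arm/$USER
htar -cvf results_2024.tar results_2024/

# Verify the archive is readable in HPSS before removing the scratch copy.
htar -tvf results_2024.tar && rm -rf results_2024/
```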
Project Share
Project Share directories reside in the high-capacity Lustre® file system on large, fast disk areas intended for global (parallel) access to temporary/scratch storage. Data in this directory are accessible by all project members. Because of the scratch nature of the file system, it is not backed up. If a file system associated with Project Share storage is nearing capacity, ADC Cluster Support may contact the project PI to request that they reduce the size of the Project Share directory. See the section "Data Retention, Purge, & Quota Summary" for more details on applicable quotas, backups, purge, and retention timeframes.
World Share
Each project has a World Share directory that resides in the high-capacity Lustre® file system on large, fast disk areas intended for global (parallel) access to temporary/scratch storage. Data in this directory are accessible by all users on the system. Because of the scratch nature of the file system, it is not backed up. If a file system associated with World Share storage is nearing capacity, ADC Cluster Support may contact the project PI to request that they reduce the size of the World Share directory. See the section "Data Retention, Purge, & Quota Summary" for more details on applicable quotas, backups, purge, and retention timeframes.
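The practical difference between the two shared areas is their UNIX permissions (770 for Project Share, 775 for World Share; see the summary table below). Here is a minimal sketch of staging a file into each area; the directory name run42 and file results.nc are illustrative.

```bash
# Share with project members only: group has full access, world has none.
mkdir -p /lustre/or-hydra/cades-arm/proj-shared/$USER/run42
chmod 770 /lustre/or-hydra/cades-arm/proj-shared/$USER/run42
cp results.nc /lustre/or-hydra/cades-arm/proj-shared/$USER/run42/

# Share globally: world-readable, matching the 775 mode of World Share.
mkdir -p /lustre/or-hydra/cades-arm/world-shared/$USER/run42
chmod 775 /lustre/or-hydra/cades-arm/world-shared/$USER/run42
cp results.nc /lustre/or-hydra/cades-arm/world-shared/$USER/run42/
```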
Local Scratch Storage
Local Scratch is a fast, solid-state disk (SSD) area intended for parallel access to temporary storage in the form of scratch directories. This area is local to each computational node and is intended, for example, to hold temporary and intermediate output generated by a user's job. It is a run-time-only file system: it is created at the start of a batch job and purged at the end of the job. Files should not be retained in this file system; they should be migrated to Lustre scratch or archival storage before the job finishes.
The local scratch path is available during job runtime via the environment variable $localscratch, which typically has the form "/localscratch/tmp.$USER.$PBS_JOBID.or-condo-pbs01" and is specific to both the user and the scheduled job.
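A minimal PBS batch script illustrating the intended pattern — run against $localscratch, then copy results back to Lustre scratch before the job ends. The executable my_app, the file names, and the resource requests are placeholders.

```bash
#!/bin/bash
#PBS -N localscratch-example
#PBS -l nodes=1:ppn=32
#PBS -l walltime=02:00:00

# $localscratch exists only for the lifetime of this job; anything
# left in it is purged automatically when the job ends.
cd $localscratch

# Stage input from Lustre scratch onto the node-local SSD
# (input.dat is an illustrative file name).
cp /lustre/or-hydra/cades-arm/$USER/input.dat .

# Run against fast local storage; my_app is a placeholder executable.
$HOME/bin/my_app input.dat output.dat

# Migrate results back to Lustre scratch before the job finishes,
# since the local scratch area is wiped at job exit.
cp output.dat /lustre/or-hydra/cades-arm/$USER/
```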
User Temp
Temporary directories for each user/group are NFS-mounted on all ADC Cluster systems and are intended to store short-term output data. User Temp areas are not backed up. This file system does not generally provide the input/output (I/O) performance required by most compute jobs, and is not available to compute jobs on most systems. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.
Data Retention, Purge, & Quota Summary
The following table details quota, backup, purge, and retention information for each user-centric and project-centric storage area available at the ADC Cluster.
Data Retention Policy Overview
The table below lists the storage policy for each storage area.
Area | Path | Type | Permissions | Quota | Backups | Purged | Retention
---|---|---|---|---|---|---|---
User Home | /home/$USER | NFS | User-controlled | 50 GB | No | No | NA
User Scratch | /lustre/or-hydra/cades-arm/$USER | Lustre | 700 | TBD | No | No | TBD days
Project Share | /lustre/or-hydra/cades-arm/proj-shared | Lustre | 770 | TBD | No | No | TBD days
World Share | /lustre/or-hydra/cades-arm/world-shared | Lustre | 775 | TBD | No | No | TBD days
Lustre Scratch | /lustre/or-hydra/cades-arm/scratch | Lustre | 770 | TBD | No | No | TBD days
Local Scratch | /localscratch | SSD | 770 | 2 TB | No | Yes | Length of the job
User Temp | /data/adc/stratus | NFS | 770 | TBD | Yes | Yes | 7 days
Definitions
The table below defines frequently used terms.
Term | Definition
---|---
Area | The general name of the storage area.
Path | The path (symlink) to the storage area's directory.
Type | The underlying software technology supporting the storage area.
Permissions | UNIX permissions enforced on the storage area's top-level directory.
Quota | The limit placed on the total number of bytes and/or files in the storage area.
Backups | States whether the data are automatically duplicated for disaster-recovery purposes.
Purged | Period of time, post-file-access, after which a file will be marked as eligible for permanent deletion.
Retention | Period of time, post-account-deactivation or post-project-end, after which data will be marked as eligible for permanent deletion.
Data Retention Overview
There is no lifetime retention for any data on ADC Cluster resources. The ADC Cluster specifies a limited post-deactivation timeframe during which user and project data will be retained. When the retention timeframe expires, the ADC Cluster reserves the right to delete the data. If you have data retention needs beyond the timeframes outlined in this policy, please notify ADC Cluster support.
User Data Retention
The user data retention policy exists to reclaim storage space after a user account is deactivated, e.g., after the user’s involvement on all ADC Cluster projects concludes. The ADC Cluster will retain data in user-centric storage areas only for a designated amount of time after the user’s account is deactivated. During this time, a user can request a temporary extension for data access.
Project Data Retention
The project data retention policy exists to reclaim storage space after a project ends. The ADC Cluster will retain data in project-centric storage areas only for a designated amount of time after the project end date. During this time, a project member can request a temporary extension for data access.
Data Purges
Data purge mechanisms are enabled on some ADC Cluster file system directories in order to maintain sufficient disk space for job execution. Files in these scratch areas are automatically purged at regular intervals. If a file system with an active purge policy is nearing capacity, the ADC Cluster may contact users to request that they reduce the size of a directory within that file system, even if the purge timeframe has not been exceeded. Purged data are redirected to an ADC NFS mount for temporary holding. Purge-exempt directories will be considered upon request to accommodate project-specific requirements.
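Since purge eligibility is based on time since last access (see the Purged definition above), users can preview which of their scratch files are at risk. A short sketch, using an illustrative 90-day window since the actual purge interval is still TBD:

```bash
# Preview files in User Scratch that have not been accessed in 90 days
# (the 90-day window is illustrative; the actual purge interval is TBD).
find /lustre/or-hydra/cades-arm/$USER -type f -atime +90 -ls
```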
Storage Space Quotas
Each user-centric and project-centric storage area has an associated quota, which may be a hard (systematically enforceable) quota or a soft (policy-enforceable) quota. Storage usage is monitored continually. When a user or project exceeds a soft quota for a storage area, the user or project PI will be contacted and asked to purge data from the offending area if at all possible. See the section "Data Retention, Purge, & Quota Summary" for details on quotas for each storage area. Requests for increased quotas will be considered based on project requirements and priorities.
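Users can monitor their own usage before a soft-quota notice arrives. A minimal sketch, assuming the standard Lustre and GNU tools are available on the login nodes (the Lustre mount point is inferred from the paths in this document):

```bash
# Report Lustre usage and limits for your user
# (the mount point below is assumed from the paths in this document).
lfs quota -u $USER /lustre/or-hydra

# Check how much of the 50 GB User Home quota is in use.
du -sh /home/$USER
```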
Data Prohibitions & Safeguards
Prohibited Data
The ADC Cluster is a computational resource for ARM-specific operational and scientific research use. ADC Cluster systems contain only data related to scientific research and do not contain personally identifiable information (data that falls under the Privacy Act of 1974, 5 U.S.C. § 552a). Use of ADC resources to store, manipulate, or remotely access any national security information is strictly prohibited. This includes, but is not limited to: classified information, Unclassified Controlled Nuclear Information (UCNI), Naval Nuclear Propulsion Information (NNPI), and the design or development of nuclear, biological, or chemical weapons or of any weapons of mass destruction.
Principal investigators, users, or project delegates who use ARM cluster resources, or who are responsible for overseeing projects that use ARM cluster resources, are strictly responsible for knowing whether their project generates any of these prohibited data types or information that falls under Export Control. For questions, contact cluster support.
Unauthorized Data Modification
Users are prohibited from taking unauthorized actions to intentionally modify or delete information or programs that do not pertain to their jobs/code.
Data Confidentiality, Integrity, & Availability
The ADC Cluster systems provide protections to maintain the confidentiality, integrity, and availability of user data. Measures include implementation of file permissions, archival systems with access control lists, and parity/CRC checks on data paths/files. It is the user’s responsibility to set access controls appropriately for data. In the event of system failure or malicious actions, the ADC Cluster makes no guarantee against loss of data nor makes a guarantee that a user’s data could not be potentially accessed, changed, or deleted by another individual. It is the user’s responsibility to ensure the appropriate level of backup and integrity checks on critical data and programs.
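For example, a user could lock down a sensitive directory and record checksums so that later integrity checks are possible. A minimal sketch using standard UNIX tools (the directory name critical-data is illustrative):

```bash
# Remove group and world access from a directory tree so only the
# owner can read, write, or traverse it (directory name illustrative).
chmod -R go-rwx /home/$USER/critical-data

# Record checksums for every file in the tree.
find /home/$USER/critical-data -type f -exec sha256sum {} + > ~/critical-data.sha256

# Later, verify that nothing has been altered or corrupted.
sha256sum -c ~/critical-data.sha256
```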
Administrator Access to Data
ADC Cluster resources are federal computer systems, and users should have no explicit or implicit expectation of privacy. ADC Cluster employees and authorized vendor personnel with administrative privileges have access to all data on ADC Cluster systems and can also log in to ADC Cluster systems as other users. ADC Cluster employees will not discuss your data with any unauthorized entities, nor grant access to data files to any person other than the owner of the data file, except in the following situations:
- When the owner requests a change of ownership (e.g. if the owner is leaving the project and grants the PI ownership of the data)
- As required by a court order
- As required by any other federal investigation
- If ADC staff deems it necessary (e.g. abuse/misuse of computational resources, criminal activity, cyber-security violations, etc.)
Note that the above applies even to project PIs. In general, the ADC Cluster admins will not overwrite existing permissions on data files owned by project members for the purpose of granting access to the project PI. Project PIs should work closely with project members throughout the duration of the project to ensure permissions are set appropriately.