Stratus Quickstart

 

REQUEST AN ACCOUNT

Please use the computing facility request form available by clicking here

SYSTEM OVERVIEW

Stratus is a 1,080-core computing cluster available to ARM investigators and users for ARM data analysis and visualization. Stratus is a 30-node Cray cluster with 7.68 GB of DDR4 memory per core. Each dual-socket node contains two Intel Xeon E5-2697 v4 processors (18 cores per processor, 36 cores per node). The system provides 57.6 TB of fast solid-state drive (SSD) storage and 100 TB of parallel Lustre file system storage, connected by a high-bandwidth Mellanox InfiniBand network. The system has two (2) external login nodes, and two of the compute nodes also provide NVIDIA Kepler-generation K80 (GK210) GPUs.

Cluster Policies

Please review the policy documents below:

  • To review the Stratus Compute policy document click here.
  • To review the Stratus Storage policy document click here.
  • To review the Stratus Cyber security policy document click here.

ACCESSING STRATUS CLUSTER (Back to Top)

SSH Access:

The Stratus cluster is accessed with a Secure Shell (SSH) client in the following two steps (an optional one-line shortcut is sketched after Step 2). An XCAMS or UCAMS account is required to access Stratus.

Step 1: Log in to the login server:

XCAMS users:
  • ssh username@collab.cades.ornl.gov
UCAMS users:
  • ssh username@login1.ornl.gov

Step 2: Log in to the Stratus cluster:

  • ssh username@stratus.ornl.gov
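
For frequent use, the two hops can be combined with OpenSSH's ProxyJump option. This is a convenience sketch, not an official ARM configuration; it assumes OpenSSH 7.3 or newer and shows the UCAMS jump host (XCAMS users would substitute collab.cades.ornl.gov):

# Hypothetical one-line equivalent of the two steps above (UCAMS shown)
ssh -J username@login1.ornl.gov username@stratus.ornl.gov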

DATA STORAGE RESOURCES (Back to Top)

The ADC Cluster provides an array of data storage platforms, each designed with a particular purpose in mind. Storage areas are broadly divided into two categories: those intended for user data and those intended for project data. Within each of the two categories, we provide different sub-areas, each with an intended purpose:

Purpose | Storage Area | Path
Long-term data for routine access | User Home | /home/$USER
Short-term project data for fast, batch-job access that is not shared | User Scratch | /lustre/or-hydra/cades-arm/$USER
Short-term project data for fast, batch-job access that is shared with other project members | Project Share | /lustre/or-hydra/cades-arm/proj-shared/$USER
Short-term project data for fast, batch-job access that is shared globally | World Share | /lustre/or-hydra/cades-arm/world-shared/$USER
Fast read/write access during a batch job | Local Scratch | $localscratch
Long-term storage of data not currently in use (currently only accessible using the adc_xfer command) | User Temp | /data/adc/stratus/
Placeholder | User Archive | HPSS (if applicable)

User Home (Back to Top)

Home directories for each user are NFS-mounted on all ADC Cluster systems and are intended to store long-term, frequently-accessed user data. User Home areas are not backed up. This file system does not generally provide the input/output (I/O) performance required by most compute jobs, and is not available to compute jobs on most systems. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.

User Scratch (Back to Top)

Project members each get an individual User Scratch directory; these reside in the high-capacity Lustre® file system on large, fast disk areas intended for global (parallel) access to temporary/scratch storage. Because of the scratch nature of the file system, it is not backed up and files are automatically purged on a regular basis. Files should not be retained in this file system for long, but rather should be migrated to HPSS Archive space as soon as they are no longer actively being used. If the file system associated with your User Scratch directory is nearing capacity, ADC Cluster Support may contact you to request that you reduce the size of your User Scratch directory. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.

Project Share (Back to Top)

Individual Project Share directories reside in the high-capacity Lustre file system on large, fast disk areas intended for global (parallel) access to temporary/scratch storage. Because of the scratch nature of the file system, it is not backed up. If a file system associated with Project Share storage is nearing capacity, the ADC Cluster Support may contact the PI of the project to request that he or she reduce the size of the Project scratch directory. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.

World Share (Back to Top)

Each project has a World Share directory that resides in the high-capacity Lustre file system on large, fast disk areas intended for global (parallel) access to temporary/scratch storage. Because of the scratch nature of the file system, it is not backed up. If the file system associated with World Share storage is nearing capacity, ADC Cluster Support may contact the PI of the project to request that he or she reduce the size of the World Share directory. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.

Local Scratch Storage (Back to Top)

A fast solid-state drive (SSD) area intended for parallel access to temporary storage in the form of scratch directories. This area is local to each computational node and is intended, for example, to hold temporary and intermediate output generated by a user’s job. It is a run-time-only file system that is created at the start of a batch job and purged at the end of the job. Files should not be retained in this file system and should be migrated to Lustre scratch or archival storage before the job finishes.

The path to local scratch storage is available during job runtime via the environment variable $localscratch. This variable typically has the form "/localscratch/tmp.$USER.$PBS_JOBID.or-condo-pbs01" and is specific to the user and the scheduled job.
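
For illustration only (not an ARM-provided script), the body of a batch job might stage data through $localscratch and copy results back to Lustre before the job ends; the input file and executable names below are hypothetical placeholders:

# Inside a batch script: use the node-local SSD for fast intermediate I/O
cd $localscratch
cp /lustre/or-hydra/cades-arm/$USER/input.nc .        # stage input from Lustre (placeholder file)
./my_analysis input.nc -o output.nc                   # hypothetical executable
cp output.nc /lustre/or-hydra/cades-arm/$USER/        # copy results back; $localscratch is purged at job end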

Project Storage (WARP) (Back to Top)

An NFS area intended for temporary data storage when moving data off the Lustre file system; the adc_xfer script's --ops and --sci options (described below) place data in WARP's operational and science directories, respectively. Files should not be retained in this area and should be migrated to archival storage once they are no longer in active use.

Data Transfer (Back to Top)

Scripts

We provide two different methods for getting data into the Stratus cluster. The datastream_search tool provides access to datastreams that are too big or too old to be kept in the data center's fast storage. The adc_xfer tool gives users access to all the resources available in /data/archive and /data/datastream. For more information, see the table below:

Data Transfer Script | Available Data Search Paths | Script Usage | Resource
datastream_search | NA: see script instructions | Allows you to search for and retrieve ARM datastreams. You can search for datastreams by site, data level, and instrument; once you have found your datastreams, you can retrieve them with the -r flag. | HPSS Deep Archive
adc_xfer | /data/datastream, /data/project, /data/archive | Allows you to move data between the ARM Data Center and the cluster. You can search for a path in the ADC using the -als flag or on the cluster using -cls. | ARM Data Center

Script Usage

Before you can use these tools, you need to export MODULEPATH (see the modules documentation below) and then load the data_wrapper/1.0.0 module. After that, the data transfer scripts can be used like normal Linux programs.
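
For example (a sketch only; the exact MODULEPATH value is provided in the modules documentation referenced below and is left as a placeholder here):

export MODULEPATH=$MODULEPATH:<path-to-ARM-modulefiles>   # placeholder; see "Using Modules" below
module load data_wrapper/1.0.0
datastream_search -h                                      # both tools accept -h for usage help
adc_xfer -h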

Transfer data from HPSS to Stratus Lustre:

datastream_search [-h] [-s YYYY-MM-DD] [-e YYYY-MM-DD] [-d DATASTREAM] [--site YYY] [--data_level XX] [--inst INSTRUMENT] [-r] [-v]

The optional arguments and the descriptions are listed below:

Optional Argument | Description
-h, --help | Show this help message and exit
-s, --start | Start date (default: 1970-01-01)
-e, --end | End date (default: today)
-d, --datastream | Get a specific ARM datastream
--site | Filter by site
--data_level | Filter by data level
--inst | Filter by instrument
-r, --retrieve | Retrieve files
-v | Verbose output
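
For instance, a hypothetical search-and-retrieve session (the site and instrument values are illustrative placeholders) might look like:

# Preview matching datastreams for January 2017, then retrieve them with -r
datastream_search -s 2017-01-01 -e 2017-01-31 --site sgp --inst met
datastream_search -s 2017-01-01 -e 2017-01-31 --site sgp --inst met -r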

Transfer data from the ADC to Stratus Lustre:

adc_xfer [-h] (-a ADC_SRC_PATH | -c CLUSTER_SRC_PATH | -als ) [--ops | --sci] DEST

Positional Arguments:

DEST full path to destination

The optional arguments and the descriptions are listed below:

Optional Argument | Description
-h, --help | Show this help message and exit
-a | Move a file from the ADC to the cluster (see Available Data Search Paths)
-c | Move a file from the cluster to the ADC
-als, --arm-list | List a directory in the ADC
--ops | Move to the operational directory on WARP
--sci | Move to the science directory on WARP
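
As an illustration (the source path is a placeholder to be replaced with a real ADC path):

# List a directory in the ADC, then copy a file from it to your Lustre scratch area
adc_xfer -als /data/datastream
adc_xfer -a /data/datastream/<path-to-file> /lustre/or-hydra/cades-arm/$USER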

DATA RETENTION, PURGE, & QUOTAS

 

Summary

The following table details quota, backup, purge, and retention information for each user-centric and project-centric storage area available at the ADC Cluster.

 

Data Storage Resources

Area | Path | Type | Permissions | Quota | Backups | Purged | Retention
User Home | /home/$USER | NFS | User-controlled | 10 GB | No | No | NA
User Scratch | /lustre/or-hydra/cades-arm/$USER | Lustre | 700 | TBD | No | No | TBD days
Project Share | /lustre/or-hydra/cades-arm/proj-share | Lustre | 770 | TBD | No | No | TBD days
World Share | /lustre/or-hydra/cades-arm/world-share | Lustre | 775 | TBD | No | No | TBD days
Local Scratch | /lustre/or-hydra/cades-arm/scratch | Lustre | 770 | TBD | No | No | TBD days
 

Software Environment

 

Default Shell (Back to Top)

The default shell on the Stratus cluster is bash. However, Stratus supports the following shells:

  • bash
  • tcsh
  • csh
  • ksh

Using Modules (Back to Top)

The modules software package allows you to dynamically modify your user environment by using pre-written module files.

Each module file contains the information needed to configure the shell for an application. After the modules software package is initialized, the environment can be modified on a per-module basis using the module command, which interprets a module file. Typically, a module file instructs the module command to alter or set shell environment variables such as PATH or MANPATH. Module files can be shared by many users on a system, and users can have their own personal collections to supplement and/or replace the shared module files. As a user, you can add and remove module files from your current shell environment, and the environment changes performed by a module file can be viewed using the module command. More information on modules can be found by running man module.

 

To access ARM-specific software in your environment, add the following to your .bashrc:

Summary of module commands (Back to Top)

Command | Description
module list | Lists modules currently loaded in the user's environment
module avail | Lists all modules available on the system, in condensed format
module avail -l | Lists all modules available on the system, in long format
module display | Shows the environment changes that would be made by loading a given module
module load | Loads a module
module unload | Unloads a module
module help | Shows help for a module
module swap | Swaps a currently loaded module for an unloaded module
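
A typical interactive session (using module versions that appear in the software list below) might look like:

module avail                        # see what is installed
module load gcc/5.3.0               # load a compiler
module display openmpi/2.0.0        # preview the environment changes openmpi would make
module load openmpi/2.0.0
module list                         # confirm what is loaded
module swap gcc/5.3.0 gcc/6.3.0     # switch compiler versions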

Available Software (Back to Top)

Additional software may be installed on the cluster as required. To check the list of available software, run the command module avail.

To see the list of software loaded in your environment, run the command module list.

These are the ARM-specific modules:

ADI                    aerioe                   ATLAS/3.10.3
cdo/1.8.2              data_wrapper/1.0.0       lblrtm/12.1
monortm/5.2            monortm_wrapper/1.0.0    nco/4.6.4
proj/4.9.3

Other available software modules are:

amber/14 cuda/8.0(default) intel/18.0.0(default) openmpi/2.1.1
schrodinger/2018-1(default) amber/16(default) darshan/3.1.4(default)
java/1.8.0_131(default) openmpi/2.1.2 silo/4.10.2(default)
anaconda2/4.3.1 eden/clm(default) keras/2.0.2-conda(default)
openmpi/3.0.0(default) silo/4.9.1 anaconda2/4.4.0
EMSOFT/4.0-apr16-18(default) lammps/31Mar17(default) paraview/5.4.0(default)
singularity/2.3.2 anaconda2/5.1.0(default) exodus/5.14(default)
lapack/3.7.1(default) pegasus/4.8.2(default) singularity/2.4
anaconda3/4.4.0 ferret/7.2(default) magma/1.7.0(default)
PE-gnu/1.0(default) singularity/2.4.2(default) anaconda3/5.1.0(default)
fftw/3.3.5(default) matlab/R2017a(default) PE-gnu/2.0
SPEC2006/1.2(default) ArmForge/18.0(default) gcc/4.9.4
metis/5.1.0(default) PE-intel/1.0(default) subversion/1.9.4(default)
ATLAS/3.10.3(default) gcc/5.3.0 mkl/2017(default)
PE-nag/1.0(default) swift/1.4(default) autoconf/2.69(default)
gcc/6.2.0 mpich/3.2(default) PE-pgi/1.0(default)
swig/2.0.12 automake/1.15(default) gcc/6.3.0(default)
nag/6.0(default) pgi/15.7.0(default) swig/3.0.12(default)
blas/3.8.0(default) git/2.11.0(default) namd/2.11(default)
plumed/2(default) szip/2.1(default) boost/1.61.0
gnuplot/5.0.6(default) ncl/6.3.0(default) postgresql/9.3.5(default)
tensorflow/0.11-cpu boost/1.64.0 gromacs/2016.3(default)
nco/4.6.4 python/2.7.12 tensorflow/0.11-cpu-conda(default)
boost/1.67.0(default) gromacs/2016.3-AVX2 nco/4.6.6
python/2.7.13 tensorflow/0.11-gpu bzip2/1.0.6(default)
gromacs/5.1.2 nco/4.6.9(default) python/3.6.1
theano/0.9-conda(default) caffe/1.0-conda(default) gtk+/2.24.31(default)
ncview/2.1.7(default) python/3.6.3(default) trilinos/12.12.1(default)
cctools/6.2.7(default) hdf5/1.8.17(default) netcdf/4.3.3.1(default)
QE/5.4.0 vinampi/v2(default) cdo/1.8.2(default)
hdf5-parallel/1.8.17(default) netcdf-hdf5parallel/4.3.3.1(default) QE/6.0
visit/2.10.3 cmake/3.11.0(default) hypre/2.11.1(default)
nextflow/0.27.6(default) QE/6.1(default) visit/2.13.1(default)
cmake/3.5.2 hypre/2.6.0b nwchem/6.6(default) qt/5.6.1(default)
vmd/1.9.3(default) cmake/3.6.1 idl/8.6(default)
openBLAS/0.2.19 R/3.3.2 wxpython/3.0.2.0(default)
cmake/3.8.2 ilamb/2.1 OpenFOAM/5.0
R/3.3.2-intel xalt/0.7.5 cp2k/5.1(default)
ilamb/latest(default) OpenFOAM/dev(default) R/3.5.0(default)
xalt/0.7.6(default) cuda/6.5 intel/16.0.1
openmpi/1.10.3 sassena/1.4.2(default) zlib/1.2.8(default)
cuda/7.5 intel/17.0.0 openmpi/2.0.0
scalapack/2.0.2(default) utils/atlas/3.11.34

Requesting new software

To request new software, please contact cluster support.

Compiling codes (Back to Top)

C, Fortran examples

coming soon.

MPI Example

// hello_mpi.c

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  // Initialize the MPI environment
  MPI_Init(NULL, NULL);

  // Get the number of processes
  int world_size;
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  // Get the rank of the process
  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  // Get the name of the processor
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  MPI_Get_processor_name(processor_name, &name_len);

  // Print off a hello world message
  printf("Hello world from processor %s, rank %d out of %d processors\n", processor_name, world_rank, world_size);

  // Finalize the MPI environment
  MPI_Finalize();
  return 0;
}

To compile the MPI example, load the gcc and openmpi modules. After loading the modules, use gcc to compile hello_mpi.c. For example:

module load gcc/5.3.0
module load openmpi/2.0.0
gcc $OPENMPI_INC -o hello_mpi.o -c hello_mpi.c
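
If the openmpi module provides OpenMPI's standard mpicc compiler wrapper (typical of OpenMPI installations, though not shown in this documentation), the example can also be compiled, linked, and run in one pass:

module load gcc/5.3.0
module load openmpi/2.0.0
mpicc -o hello_mpi hello_mpi.c      # compile and link against the loaded OpenMPI
mpirun -np 4 ./hello_mpi            # run with 4 ranks (from within a batch job)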
                      
                   

CUDA/GPU example

coming soon.

IDL example

coming soon.

Running Jobs

 

Available Queues: (Back to Top)

Queue | QOS | Max Walltime | Priority | Description
batch | burst | | | The burst queue allows use of resources beyond the ARM Stratus system.
arm_high_mem | devel | 4 hours | high | Queue for use of CPU nodes.
arm_high_mem | std | 48 hours | lower |
arm_high_mem | long | 2 weeks | lowest |
arm_high_mem | arm-prod | 6 hours | highest | Standing reservation for production jobs. Allows use of up to 15 nodes between 12:00 AM and 6:00 AM daily. (Use of this queue requires approval from ARM management.)

Example Submission Scripts:

 

Burst queue

#PBS -N jobname
#PBS -A arm-burst
#PBS -q batch
#PBS -W group_list=cades-user
#PBS -l qos=burst
#PBS -l nodes=16:ppn=16
#PBS -l walltime=1:00:00

Standard CPU queue:

#PBS -N jobname
#PBS -A arm
#PBS -q arm_high_mem
#PBS -W group_list=cades-arm
#PBS -l qos=std
#PBS -l nodes=16:ppn=16
#PBS -l walltime=1:00:00

GPU queue:

#PBS -N jobname
#PBS -A arm
#PBS -q gpu_ssd
#PBS -W group_list=cades-arm
#PBS -l qos=std
#PBS -l nodes=2:ppn=16
#PBS -l walltime=1:00:00

ARM production queue:

The ARM production queue (arm-prod) is specifically designed for daily production jobs. Fifteen Stratus nodes are placed in a reservation every day from 12:00 AM to 6:00 AM to provide dedicated, guaranteed resources for critical operational jobs.

Details of the standing reservations can be checked with the command showres.

#PBS -N jobname
#PBS -A arm
#PBS -q arm_high_mem
#PBS -x=FLAGS:ADVRES:arm-prod.5700
#PBS -W group_list=cades-arm
#PBS -l qos=std
#PBS -l nodes=16:ppn=16
#PBS -l walltime=1:00:00
 

Scheduling batch jobs (Back to Top)

Batch scripts, or job submission scripts, are the mechanism by which a user submits and configures a job for execution. A batch script is simply a shell script which contains:

  • Commands that can be interpreted by batch scheduling software (e.g. PBS)
  • Commands that can be interpreted by a shell

The batch script is submitted to the batch scheduler where it is parsed. Based on the parsed data, the batch scheduler places the script in the scheduler queue as a batch job. Once the batch job makes its way through the queue, the script will be executed on a service node within the set of allocated computational resources.

Sections of a Batch Script - Batch scripts are parsed into the following three sections:

  1. The Interpreter Line: The first line of a script can be used to specify the script’s interpreter. This line is optional. If not used, the submitter's default shell will be used. The line uses the "hash-bang-shell" syntax: #!/path/to/shell
  2. The Scheduler Options Section: The batch scheduler options are preceded by #PBS, making them appear as comments to a shell. PBS will look for #PBS options in a batch script from the script’s first line through the first non-comment line. A comment line begins with #. #PBS options entered after the first non-comment line will not be read by PBS.
  3. The Executable Commands Section: The shell commands follow the last #PBS option and represent the main content of the batch job. If any #PBS lines follow executable statements, they will be ignored as comments.

The execution section of a script will be interpreted by a shell and can contain multiple lines of executable invocations, shell commands, and comments. When the job's queue wait time is finished, commands within this section will be executed on a service node (sometimes called a "head node") from the set of the job's allocated resources. Under normal circumstances, the batch job will exit the queue after the last line of the script is executed.

An example batch script:

#!/bin/bash
#PBS -A arm
#PBS -N kazsacr0reproc
#PBS -l nodes=16:ppn=8
#PBS -l walltime=4:00:00
#PBS -W group_list=cades-arm
#PBS -q arm_high_mem
#PBS -j oe
#PBS -m abe
#PBS -M email@address
#PBS -V
#PBS -o o.log
#PBS -e e.log
#PBS -S /bin/bash

The following table summarizes frequently-used options to PBS:

Option | Use | Description
-A | #PBS -A <account> | Charges the job time to <account>. The account string, e.g. arm, is typically composed of three letters followed by three digits, optionally followed by a subproject identifier. The utility showproj can be used to list your valid assigned project ID(s). This option is required for all jobs.
-l | #PBS -l nodes=<n> | Maximum number of compute nodes. Jobs cannot request partial nodes.
   | #PBS -l ppn=<n> | Processors per node.
   | #PBS -l walltime=<HH:MM:SS> | Maximum wall-clock time, in the format HH:MM:SS.
   | #PBS -l partition=<name> | Allocates resources on the specified partition.
-o | #PBS -o <filename> | Writes standard output to <filename> instead of <job name>.o$PBS_JOBID ($PBS_JOBID is an environment variable created by PBS that contains the PBS job identifier).
-e | #PBS -e <filename> | Writes standard error to <filename> instead of <job name>.e$PBS_JOBID.
-j | #PBS -j {oe, eo} | Combines standard output and standard error into the standard error file (eo) or the standard output file (oe).
-m | #PBS -m a | Sends email to the submitter when the job aborts.
   | #PBS -m b | Sends email to the submitter when the job begins.
   | #PBS -m e | Sends email to the submitter when the job ends.
-M | #PBS -M <address> | Specifies the email address to use for the -m options.
-N | #PBS -N <name> | Sets the job name to <name> instead of the name of the job script.
-S | #PBS -S <shell> | Sets the shell used to interpret the job script.
-q | #PBS -q <queue> | Directs the job to the specified queue. This option is not required to run in the default queue on any given system.
-V | #PBS -V | Exports all environment variables from the submitting shell into the batch job shell. Not recommended: because the login nodes differ from the service nodes, users should instead create the needed environment within the batch job.
-X | #PBS -X | Enables X11 forwarding. The -X option should be used to tunnel a GUI from an interactive batch job.

To submit a job to the queue:

qsub job_submission_script

To check the status of your jobs in the queue:

qstat -u username

PBS sets multiple environment variables at submission time. The following PBS variables are useful within batch scripts:

Variable | Description
$PBS_O_WORKDIR | The directory from which the batch job was submitted. By default, a new job starts in your home directory.
$PBS_JOBID | The job's full identifier. A common use of $PBS_JOBID is to append the job's ID to the standard output and error files.
$PBS_NUM_NODES | The number of nodes requested.
$PBS_JOBNAME | The job name supplied by the user.
$PBS_NODEFILE | The name of the file containing the list of nodes assigned to the job. Sometimes used on non-Cray clusters.
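
As a minimal sketch (reusing the queue, account, and group values from the earlier examples; adjust them for your own project), these variables might be used in a batch script as follows:

#!/bin/bash
#PBS -N env-demo
#PBS -A arm
#PBS -q arm_high_mem
#PBS -W group_list=cades-arm
#PBS -l qos=std
#PBS -l nodes=2:ppn=36
#PBS -l walltime=0:30:00

cd $PBS_O_WORKDIR                    # jobs start in $HOME; return to the submission directory
echo "Job $PBS_JOBID ($PBS_JOBNAME) is using $PBS_NUM_NODES nodes"
cat $PBS_NODEFILE                    # the list of nodes assigned to this job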