Stratus Quickstart

 

REQUEST AN ACCOUNT

Please use the computing facility request form available by clicking here

SYSTEM OVERVIEW

Stratus is a 1,080-core computing cluster available to ARM investigators and users for ARM data analysis and visualization. Stratus is a 30-node Cray cluster with 7.68 GB of DDR4 memory per core. Each dual-socket node contains two Intel Xeon E5-2697 v4 processors (18 cores per processor, 36 cores per node). The system provides 57.6 TB of fast solid-state drive (SSD) storage and 100 TB of parallel Lustre file system storage, connected by a high-bandwidth Mellanox InfiniBand network. The system has two (2) external login nodes, and two of the compute nodes also provide NVIDIA Kepler-generation K80 (GK210) GPUs.

Cluster Policies

Please review the policy documents below:

  • To review the Stratus Compute policy document click here.
  • To review the Stratus Storage policy document click here.
  • To review the Stratus Cyber security policy document click here.

ACCESSING STRATUS CLUSTER (Back to Top)

SSH Access:

The Stratus cluster is accessed with a Secure Shell (SSH) client in the following two steps (an optional one-line shortcut is sketched after Step 2). An XCAMS or UCAMS account is required to access Stratus.

Step 1: Log in to the login server:

XCAMS users:
  • ssh username@collab.cades.ornl.gov
UCAMS users:
  • ssh username@login1.ornl.gov

Step 2: Log in to the Stratus cluster:

  • ssh username@stratus.ornl.gov
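
For frequent use, the two hops can be combined with OpenSSH's ProxyJump option. This is a convenience sketch, not an official ARM configuration; it assumes OpenSSH 7.3 or newer and shows the UCAMS jump host (XCAMS users would substitute collab.cades.ornl.gov):

# Hypothetical one-line equivalent of the two steps above (UCAMS shown)
ssh -J username@login1.ornl.gov username@stratus.ornl.gov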

DATA STORAGE RESOURCES (Back to Top)

The ADC Cluster provides an array of data storage platforms, each designed with a particular purpose in mind. Storage areas are broadly divided into two categories: those intended for user data and those intended for project data. Within each of the two categories, we provide different sub-areas, each with an intended purpose:

Purpose | Storage Area | Path
Long-term data for routine access | User Home | /home/$USER
Short-term project data for fast, batch-job access that is not shared | User Scratch | /lustre/or-hydra/cades-arm/$USER
Short-term project data for fast, batch-job access that is shared with other project members | Project Share | /lustre/or-hydra/cades-arm/proj-shared/$USER
Short-term project data for fast, batch-job access that is shared globally | World Share | /lustre/or-hydra/cades-arm/world-shared/$USER
Fast read/write access during a batch job | Local Scratch | $localscratch
Long-term storage of data not currently in use (currently only accessible using the adc_xfer command) | User Temp | /data/adc/stratus/
Placeholder | User Archive | HPSS (if applicable)

User Home (Back to Top)

Home directories for each user are NFS-mounted on all ADC Cluster systems and are intended to store long-term, frequently-accessed user data. User Home areas are not backed up. This file system does not generally provide the input/output (I/O) performance required by most compute jobs, and is not available to compute jobs on most systems. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.

User Scratch (Back to Top)

Project members each get an individual User Scratch directory; these reside in the high-capacity Lustre® file system on large, fast disk areas intended for global (parallel) access to temporary/scratch storage. Because of the scratch nature of the file system, it is not backed up and files are automatically purged on a regular basis. Files should not be retained in this file system for long, but rather should be migrated to HPSS Archive space as soon as they are no longer actively being used. If the file system associated with your User Scratch directory is nearing capacity, ADC Cluster Support may contact you to request that you reduce the size of your User Scratch directory. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.

Project Share (Back to Top)

Individual Project Share directories reside in the high-capacity Lustre file system on large, fast disk areas intended for global (parallel) access to temporary/scratch storage. Because of the scratch nature of the file system, it is not backed up. If a file system associated with Project Share storage is nearing capacity, the ADC Cluster Support may contact the PI of the project to request that he or she reduce the size of the Project scratch directory. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.

World Share (Back to Top)

Each project has a World Share directory that resides in the high-capacity Lustre file system on large, fast disk areas intended for global (parallel) access to temporary/scratch storage. Because of the scratch nature of the file system, it is not backed up. If the file system associated with World Share storage is nearing capacity, ADC Cluster Support may contact the PI of the project to request that he or she reduce the size of the World Share directory. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.

Local Scratch Storage (Back to Top)

A fast solid-state drive (SSD) area intended for parallel access to temporary storage in the form of scratch directories. This area is local to each computational node and is intended, for example, to hold temporary and intermediate output generated by a user’s job. It is a run-time-only file system that is created at the start of a batch job and purged at the end of the job. Files should not be retained in this file system and should be migrated to Lustre scratch or archival storage before the job finishes.

The path to local scratch storage is available during job runtime via the environment variable $localscratch. This variable typically has the form "/localscratch/tmp.$USER.$PBS_JOBID.or-condo-pbs01" and is specific to the user and the scheduled job.
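
For illustration only (not an ARM-provided script), the body of a batch job might stage data through $localscratch and copy results back to Lustre before the job ends; the input file and executable names below are hypothetical placeholders:

# Inside a batch script: use the node-local SSD for fast intermediate I/O
cd $localscratch
cp /lustre/or-hydra/cades-arm/$USER/input.nc .        # stage input from Lustre (placeholder file)
./my_analysis input.nc -o output.nc                   # hypothetical executable
cp output.nc /lustre/or-hydra/cades-arm/$USER/        # copy results back; $localscratch is purged at job end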

Project Storage (WARP) (Back to Top)

An NFS area intended for temporary data storage when moving data off the Lustre file system; the adc_xfer script's --ops and --sci options (described below) place data in WARP's operational and science directories, respectively. Files should not be retained in this area and should be migrated to archival storage once they are no longer in active use.

Data Transfer (Back to Top)

Scripts

We provide two different methods for getting data into the Stratus cluster. The datastream_search tool provides access to datastreams that are too big or too old to be kept in the data center's fast storage. The adc_xfer tool gives users access to all the resources available in /data/archive and /data/datastream. For more information, see the table below:

Data Transfer Script | Available Data Search Paths | Script Usage | Resource
datastream_search | NA: see script instructions | Allows you to search for and retrieve ARM datastreams. You can search for datastreams by site, data level, and instrument; once you have found your datastreams, you can retrieve them with the -r flag. | HPSS Deep Archive
adc_xfer | /data/datastream, /data/project, /data/archive | Allows you to move data between the ARM Data Center and the cluster. You can search for a path in the ADC using the -als flag or on the cluster using -cls. | ARM Data Center

Script Usage

Before you can use these tools, you need to export MODULEPATH (see the modules documentation below) and then load the data_wrapper/1.0.0 module. After that, the data transfer scripts can be used like normal Linux programs.
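
For example (a sketch only; the exact MODULEPATH value is provided in the modules documentation referenced below and is left as a placeholder here):

export MODULEPATH=$MODULEPATH:<path-to-ARM-modulefiles>   # placeholder; see "Using Modules" below
module load data_wrapper/1.0.0
datastream_search -h                                      # both tools accept -h for usage help
adc_xfer -h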

Transfer data from HPSS to Stratus Lustre:

datastream_search [-h] [-s YYYY-MM-DD] [-e YYYY-MM-DD] [-d DATASTREAM] [--site YYY] [--data_level XX] [--inst INSTRUMENT] [-r] [-v]

The optional arguments and the descriptions are listed below:

Optional Argument | Description
-h, --help | Show this help message and exit
-s, --start | Start date (default: 1970-01-01)
-e, --end | End date (default: today)
-d, --datastream | Get a specific ARM datastream
--site | Filter by site
--data_level | Filter by data level
--inst | Filter by instrument
-r, --retrieve | Retrieve files
-v | Verbose output
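
For instance, a hypothetical search-and-retrieve session (the site and instrument values are illustrative placeholders) might look like:

# Preview matching datastreams for January 2017, then retrieve them with -r
datastream_search -s 2017-01-01 -e 2017-01-31 --site sgp --inst met
datastream_search -s 2017-01-01 -e 2017-01-31 --site sgp --inst met -r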

Transfer data from the ADC to Stratus Lustre:

adc_xfer [-h] (-a ADC_SRC_PATH | -c CLUSTER_SRC_PATH | -als ) [--ops | --sci] DEST

Positional Arguments:

DEST full path to destination

The optional arguments and the descriptions are listed below:

Optional Argument | Description
-h, --help | Show this help message and exit
-a | Move a file from the ADC to the cluster (see Available Data Search Paths)
-c | Move a file from the cluster to the ADC
-als, --arm-list | List a directory in the ADC
--ops | Move to the operational directory on WARP
--sci | Move to the science directory on WARP
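
As an illustration (the source path is a placeholder to be replaced with a real ADC path):

# List a directory in the ADC, then copy a file from it to your Lustre scratch area
adc_xfer -als /data/datastream
adc_xfer -a /data/datastream/<path-to-file> /lustre/or-hydra/cades-arm/$USER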

DATA RETENTION, PURGE, & QUOTAS

 

Summary

The following table details quota, backup, purge, and retention information for each user-centric and project-centric storage area available at the ADC Cluster.

 

Data Storage Resources

Area | Path | Type | Permissions | Quota | Backups | Purged | Retention
User Home | /home/$USER | NFS | User-controlled | 10 GB | No | No | NA
User Scratch | /lustre/or-hydra/cades-arm/$USER | Lustre | 700 | TBD | No | No | TBD days
Project Share | /lustre/or-hydra/cades-arm/proj-share | Lustre | 770 | TBD | No | No | TBD days
World Share | /lustre/or-hydra/cades-arm/world-share | Lustre | 775 | TBD | No | No | TBD days
Local Scratch | /lustre/or-hydra/cades-arm/scratch | Lustre | 770 | TBD | No | No | TBD days
 

Software Environment

 

Default Shell (Back to Top)

The default shell on the Stratus cluster is bash. However, Stratus supports the following shells:

  • bash
  • tcsh
  • csh
  • ksh

Using Modules (Back to Top)

The modules software package allows you to dynamically modify your user environment by using pre-written module files.

Each module file contains the information needed to configure the shell for an application. After the modules software package is initialized, the environment can be modified on a per-module basis using the module command, which interprets a module file. Typically, a module file instructs the module command to alter or set shell environment variables such as PATH or MANPATH. Module files can be shared by many users on a system, and users can have their own personal collections to supplement and/or replace the shared module files. As a user, you can add and remove module files from your current shell environment, and the environment changes performed by a module file can be viewed using the module command. More information on modules can be found by running man module.

 

To access ARM-specific software in your environment, add the following to your .bashrc:

Summary of module commands (Back to Top)

Command | Description
module list | Lists modules currently loaded in the user's environment
module avail | Lists all modules available on the system, in condensed format
module avail -l | Lists all modules available on the system, in long format
module display | Shows the environment changes that would be made by loading a given module
module load | Loads a module
module unload | Unloads a module
module help | Shows help for a module
module swap | Swaps a currently loaded module for an unloaded module
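
A typical interactive session (using module versions that appear in the software list below) might look like:

module avail                        # see what is installed
module load gcc/5.3.0               # load a compiler
module display openmpi/2.0.0        # preview the environment changes openmpi would make
module load openmpi/2.0.0
module list                         # confirm what is loaded
module swap gcc/5.3.0 gcc/6.3.0     # switch compiler versions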

Available Software (Back to Top)

Additional software may be installed on the cluster as required. To check the list of available software, run the command module avail.

To see the list of software loaded in your environment, run the command module list.

These are the ARM-specific modules:

ADI                    aerioe                   ATLAS/3.10.3
cdo/1.8.2              data_wrapper/1.0.0       lblrtm/12.1
monortm/5.2            monortm_wrapper/1.0.0    nco/4.6.4
proj/4.9.3

Other available software modules are:

amber/14 cuda/8.0(default) intel/18.0.0(default) openmpi/2.1.1
schrodinger/2018-1(default) amber/16(default) darshan/3.1.4(default)
java/1.8.0_131(default) openmpi/2.1.2 silo/4.10.2(default)
anaconda2/4.3.1 eden/clm(default) keras/2.0.2-conda(default)
openmpi/3.0.0(default) silo/4.9.1 anaconda2/4.4.0
EMSOFT/4.0-apr16-18(default) lammps/31Mar17(default) paraview/5.4.0(default)
singularity/2.3.2 anaconda2/5.1.0(default) exodus/5.14(default)
lapack/3.7.1(default) pegasus/4.8.2(default) singularity/2.4
anaconda3/4.4.0 ferret/7.2(default) magma/1.7.0(default)
PE-gnu/1.0(default) singularity/2.4.2(default) anaconda3/5.1.0(default)
fftw/3.3.5(default) matlab/R2017a(default) PE-gnu/2.0
SPEC2006/1.2(default) ArmForge/18.0(default) gcc/4.9.4
metis/5.1.0(default) PE-intel/1.0(default) subversion/1.9.4(default)
ATLAS/3.10.3(default) gcc/5.3.0 mkl/2017(default)
PE-nag/1.0(default) swift/1.4(default) autoconf/2.69(default)
gcc/6.2.0 mpich/3.2(default) PE-pgi/1.0(default)
swig/2.0.12 automake/1.15(default) gcc/6.3.0(default)
nag/6.0(default) pgi/15.7.0(default) swig/3.0.12(default)
blas/3.8.0(default) git/2.11.0(default) namd/2.11(default)
plumed/2(default) szip/2.1(default) boost/1.61.0
gnuplot/5.0.6(default) ncl/6.3.0(default) postgresql/9.3.5(default)
tensorflow/0.11-cpu boost/1.64.0 gromacs/2016.3(default)
nco/4.6.4 python/2.7.12 tensorflow/0.11-cpu-conda(default)
boost/1.67.0(default) gromacs/2016.3-AVX2 nco/4.6.6
python/2.7.13 tensorflow/0.11-gpu bzip2/1.0.6(default)
gromacs/5.1.2 nco/4.6.9(default) python/3.6.1
theano/0.9-conda(default) caffe/1.0-conda(default) gtk+/2.24.31(default)
ncview/2.1.7(default) python/3.6.3(default) trilinos/12.12.1(default)
cctools/6.2.7(default) hdf5/1.8.17(default) netcdf/4.3.3.1(default)
QE/5.4.0 vinampi/v2(default) cdo/1.8.2(default)
hdf5-parallel/1.8.17(default) netcdf-hdf5parallel/4.3.3.1(default) QE/6.0
visit/2.10.3 cmake/3.11.0(default) hypre/2.11.1(default)
nextflow/0.27.6(default) QE/6.1(default) visit/2.13.1(default)
cmake/3.5.2 hypre/2.6.0b nwchem/6.6(default) qt/5.6.1(default)
vmd/1.9.3(default) cmake/3.6.1 idl/8.6(default)
openBLAS/0.2.19 R/3.3.2 wxpython/3.0.2.0(default)
cmake/3.8.2 ilamb/2.1 OpenFOAM/5.0
R/3.3.2-intel xalt/0.7.5 cp2k/5.1(default)
ilamb/latest(default) OpenFOAM/dev(default) R/3.5.0(default)
xalt/0.7.6(default) cuda/6.5 intel/16.0.1
openmpi/1.10.3 sassena/1.4.2(default) zlib/1.2.8(default)
cuda/7.5 intel/17.0.0 openmpi/2.0.0
scalapack/2.0.2(default) utils/atlas/3.11.34

Requesting new software

To request new software, please contact cluster support.

Compiling codes (Back to Top)

C, Fortran examples

coming soon.

MPI Example

// hello_mpi.c

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  // Initialize the MPI environment
  MPI_Init(NULL, NULL);

  // Get the number of processes
  int world_size;
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  // Get the rank of the process
  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  // Get the name of the processor
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  MPI_Get_processor_name(processor_name, &name_len);

  // Print off a hello world message
  printf("Hello world from processor %s, rank %d out of %d processors\n", processor_name, world_rank, world_size);

  // Finalize the MPI environment
  MPI_Finalize();
  return 0;
}

To compile the MPI example, load the gcc and openmpi modules. After loading the modules, use gcc to compile hello_mpi.c. For example:

module load gcc/5.3.0
module load openmpi/2.0.0
gcc $OPENMPI_INC -o hello_mpi.o -c hello_mpi.c
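
If the openmpi module provides OpenMPI's standard mpicc compiler wrapper (typical of OpenMPI installations, though not shown in this documentation), the example can also be compiled, linked, and run in one pass:

module load gcc/5.3.0
module load openmpi/2.0.0
mpicc -o hello_mpi hello_mpi.c      # compile and link against the loaded OpenMPI
mpirun -np 4 ./hello_mpi            # run with 4 ranks (from within a batch job)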
                      
                   

CUDA/GPU example

coming soon.

IDL example

coming soon.

Running Jobs

 

Available Queues: (Back to Top)

Queue | QOS | Max Walltime | Priority | Description
batch | burst | | | The burst queue allows use of resources beyond the ARM Stratus system.
arm_high_mem | devel | 4 hours | high | Queue for use of CPU nodes.
arm_high_mem | std | 48 hours | lower |
arm_high_mem | long | 2 weeks | lowest |
arm_high_mem | arm-prod | 6 hours | highest | Standing reservation for production jobs. Allows use of up to 15 nodes between 12:00 AM and 6:00 AM daily. (Use of this queue requires approval from ARM management.)

Example Submission Scripts:

 

Burst queue

#PBS -N jobname
#PBS -A arm-burst
#PBS -q batch
#PBS -W group_list=cades-user
#PBS -l qos=burst
#PBS -l nodes=16:ppn=16
#PBS -l walltime=1:00:00

Standard CPU queue:

#PBS -N jobname
#PBS -A arm
#PBS -q arm_high_mem
#PBS -W group_list=cades-arm
#PBS -l qos=std
#PBS -l nodes=16:ppn=16
#PBS -l walltime=1:00:00

GPU queue:

#PBS -N jobname
#PBS -A arm
#PBS -q gpu_ssd
#PBS -W group_list=cades-arm
#PBS -l qos=std
#PBS -l nodes=2:ppn=16
#PBS -l walltime=1:00:00

ARM production queue:

The ARM production queue (arm-prod) is specifically designed for daily production jobs. Fifteen Stratus nodes are placed in a reservation every day from 12:00 AM to 6:00 AM to provide dedicated, guaranteed resources for critical operational jobs.

Details of the standing reservations can be checked with the command showres.

#PBS -N jobname
#PBS -A arm
#PBS -q arm_high_mem
#PBS -x=FLAGS:ADVRES:arm-prod.5700
#PBS -W group_list=cades-arm
#PBS -l qos=std
#PBS -l nodes=16:ppn=16
#PBS -l walltime=1:00:00
 

Scheduling batch jobs (Back to Top)

Batch scripts, or job submission scripts, are the mechanism by which a user submits and configures a job for execution. A batch script is simply a shell script which contains:

  • Commands that can be interpreted by batch scheduling software (e.g. PBS)
  • Commands that can be interpreted by a shell

The batch script is submitted to the batch scheduler where it is parsed. Based on the parsed data, the batch scheduler places the script in the scheduler queue as a batch job. Once the batch job makes its way through the queue, the script will be executed on a service node within the set of allocated computational resources.

Sections of a Batch Script - Batch scripts are parsed into the following three sections:

  1. The Interpreter Line: The first line of a script can be used to specify the script’s interpreter. This line is optional. If not used, the submitter's default shell will be used. The line uses the "hash-bang-shell" syntax: #!/path/to/shell
  2. The Scheduler Options Section: The batch scheduler options are preceded by #PBS, making them appear as comments to a shell. PBS will look for #PBS options in a batch script from the script’s first line through the first non-comment line. A comment line begins with #. #PBS options entered after the first non-comment line will not be read by PBS.
  3. The Executable Commands Section: The shell commands follow the last #PBS option and represent the main content of the batch job. If any #PBS lines follow executable statements, they will be ignored as comments.

The execution section of a script will be interpreted by a shell and can contain multiple lines of executable invocations, shell commands, and comments. When the job's queue wait time is finished, commands within this section will be executed on a service node (sometimes called a "head node") from the set of the job's allocated resources. Under normal circumstances, the batch job will exit the queue after the last line of the script is executed.

An example batch script:

#!/bin/bash
#PBS -A arm
#PBS -N kazsacr0reproc
#PBS -l nodes=16:ppn=8
#PBS -l walltime=4:00:00
#PBS -W group_list=cades-arm
#PBS -q arm_high_mem
#PBS -j oe
#PBS -m abe
#PBS -M email@address
#PBS -V
#PBS -o o.log
#PBS -e e.log
#PBS -S /bin/bash

The following table summarizes frequently-used options to PBS:

Option | Use | Description
-A | #PBS -A <account> | Charges the job time to <account>. The account string, e.g. arm, is typically composed of three letters followed by three digits, optionally followed by a subproject identifier. The utility showproj can be used to list your valid assigned project ID(s). This option is required for all jobs.
-l | #PBS -l nodes=<n> | Maximum number of compute nodes. Jobs cannot request partial nodes.
   | #PBS -l ppn=<n> | Processors per node.
   | #PBS -l walltime=<HH:MM:SS> | Maximum wall-clock time, in the format HH:MM:SS.
   | #PBS -l partition=<name> | Allocates resources on the specified partition.
-o | #PBS -o <filename> | Writes standard output to <filename> instead of <job name>.o$PBS_JOBID ($PBS_JOBID is an environment variable created by PBS that contains the PBS job identifier).
-e | #PBS -e <filename> | Writes standard error to <filename> instead of <job name>.e$PBS_JOBID.
-j | #PBS -j {oe, eo} | Combines standard output and standard error into the standard error file (eo) or the standard output file (oe).
-m | #PBS -m a | Sends email to the submitter when the job aborts.
   | #PBS -m b | Sends email to the submitter when the job begins.
   | #PBS -m e | Sends email to the submitter when the job ends.
-M | #PBS -M <address> | Specifies the email address to use for the -m options.
-N | #PBS -N <name> | Sets the job name to <name> instead of the name of the job script.
-S | #PBS -S <shell> | Sets the shell used to interpret the job script.
-q | #PBS -q <queue> | Directs the job to the specified queue. This option is not required to run in the default queue on any given system.
-V | #PBS -V | Exports all environment variables from the submitting shell into the batch job shell. Not recommended: because the login nodes differ from the service nodes, users should instead create the needed environment within the batch job.
-X | #PBS -X | Enables X11 forwarding. The -X option should be used to tunnel a GUI from an interactive batch job.

To submit a job to the queue:

qsub job_submission_script

To check the status of your jobs in the queue:

qstat -u username

PBS sets multiple environment variables at submission time. The following PBS variables are useful within batch scripts:

Variable | Description
$PBS_O_WORKDIR | The directory from which the batch job was submitted. By default, a new job starts in your home directory.
$PBS_JOBID | The job's full identifier. A common use of $PBS_JOBID is to append the job's ID to the standard output and error files.
$PBS_NUM_NODES | The number of nodes requested.
$PBS_JOBNAME | The job name supplied by the user.
$PBS_NODEFILE | The name of the file containing the list of nodes assigned to the job. Sometimes used on non-Cray clusters.
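
As a minimal sketch (reusing the queue, account, and group values from the earlier examples; adjust them for your own project), these variables might be used in a batch script as follows:

#!/bin/bash
#PBS -N env-demo
#PBS -A arm
#PBS -q arm_high_mem
#PBS -W group_list=cades-arm
#PBS -l qos=std
#PBS -l nodes=2:ppn=36
#PBS -l walltime=0:30:00

cd $PBS_O_WORKDIR                    # jobs start in $HOME; return to the submission directory
echo "Job $PBS_JOBID ($PBS_JOBNAME) is using $PBS_NUM_NODES nodes"
cat $PBS_NODEFILE                    # the list of nodes assigned to this job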