Lessons learned

 

Temporary file storage

  • Unlike on traditional Unix/Linux machines, use of the shared-memory space /dev/shm is discouraged on Stratus, where /dev/shm is only a placeholder with no storage space allocated to it.
  • All Stratus nodes are equipped with ~2 TB of solid-state drive (SSD) storage that can be used as fast temporary read/write space during a job.
  • The SSD space is available as scratch space only while a job is running.
  • It uses a “btrfs” filesystem that is created at the start of the job and destroyed at the end. The runtime location of this temporary scratch space is available in the environment variable “localscratch”.
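
A minimal Python sketch of how a job might use this scratch space, assuming the variable is named localscratch exactly as written above and is set only inside a running job. The helper names scratch_dir and run_with_scratch, the file names, and the fallback to the system temporary directory are illustrative assumptions, not part of the Stratus setup.

```python
import os
import shutil
import tempfile
from pathlib import Path


def scratch_dir(env_var: str = "localscratch") -> Path:
    """Create and return a private working directory for this job."""
    base = os.environ.get(env_var)
    if base and Path(base).is_dir():
        # Inside a job: make a private subdirectory on the node-local SSD.
        return Path(tempfile.mkdtemp(prefix="work_", dir=base))
    # Assumption: outside a job the variable is unset, so fall back to
    # the system temporary directory instead of /dev/shm.
    return Path(tempfile.mkdtemp(prefix="work_"))


def run_with_scratch(result_dir: Path) -> Path:
    """Write intermediate data to scratch, then copy the result out."""
    work = scratch_dir()
    try:
        # Hypothetical intermediate file produced by the computation.
        tmp_file = work / "intermediate.dat"
        tmp_file.write_text("intermediate results\n")

        # The scratch filesystem is destroyed when the job ends, so
        # anything worth keeping must be copied to permanent storage.
        result_dir.mkdir(parents=True, exist_ok=True)
        final = result_dir / "result.dat"
        shutil.copy2(tmp_file, final)
        return final
    finally:
        # Clean up explicitly rather than relying on end-of-job teardown.
        shutil.rmtree(work, ignore_errors=True)
```

The key point is the copy step: because the scratch filesystem disappears with the job, results have to be moved to permanent storage before the job finishes.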

Batch scheduler

  • The Stratus cluster employs a Torque/Moab scheduler system to allocate resources to user jobs efficiently and fairly through a queue system.
  • Users interested in utilizing the available resources to speed up a large number of independent serial jobs should use a script/wrapper to pool them into a single large job to submit to the queues.
  • While one can submit a large number of small independent jobs to the queue, doing so would negatively affect the priority of their jobs because the scheduler uses a fair-share scheduling algorithm.
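
One way such a wrapper could look is sketched below in Python, using only the standard library. It is an illustration under stated assumptions, not a Stratus-provided tool: the function names run_task and run_pool are hypothetical, each task is assumed to be an independent command line, and the worker count is assumed to match the number of cores requested for the single pooled job.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_task(cmd: list[str]) -> int:
    """Run one serial task as a subprocess and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def run_pool(commands: list[list[str]], workers: int) -> list[int]:
    """Run independent commands, at most `workers` at a time.

    Exit codes are returned in the same order as `commands`.
    """
    # Threads only wait on child processes here, so the actual work
    # runs in parallel across cores in the subprocesses themselves.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_task, commands))
```

A pooled job would build the list of commands (for example, one entry per input file), call run_pool with the number of cores allocated to the job, and check the returned exit codes for failures. The scheduler then sees one job instead of many small ones.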