This guide is designed for researchers who are new to the UVA HPC system. Throughout this guide we use the placeholder mst3k to represent the user's login ID; substitute your own login ID for mst3k. Specifications for the current HPC system can be found here.
Rivanna provides a high-performance computing environment for all user levels. A majority of Rivanna’s nodes are Cray Cluster Solutions nodes connected by FDR (fourteen data rate) Infiniband, but there are also two nodes with NVIDIA Kepler K20 GPUs, several nodes with QDR (quad data rate) Infiniband, and quite a few older nodes connected with gigabit ethernet.
All nodes share a Lustre filesystem for temporary storage called /scratch with up to 1.4PB of storage space for all users. Each user is assigned space in /scratch/$USER with a default quota of 10TB of storage.
Accessing the System
Time on Rivanna is allocated as Service Units (SUs). One SU corresponds to one core-hour. Allocations are managed through MyGroups accounts. The group owner is the Principal Investigator (PI) of the allocation. Faculty, staff, and postdoctoral associates are eligible to be PIs. Students—both graduate and undergraduate—must be members of an allocation group sponsored by a PI. Each PI is ultimately responsible for managing the roster of users in the group although PIs may delegate day-to-day management to one or more other members. When users are added or deleted, accounts are automatically created or purged at the next system update.
Trial allocations of 5,000 SUs are available on request. Standard allocations of 50,000 SUs require a short justification. Larger allocations may be requested through administrative grants or may be purchased through a PTAO.
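Since one SU corresponds to one core-hour, the charge for a job can be estimated directly from its resource request. A minimal sketch (the core and hour counts are illustrative, and this assumes the simple core-hour model described above; actual charging rates may vary by partition):

```shell
# Estimate the SU charge for a job: SUs = cores requested * wallclock hours.
cores=20        # e.g. all cores of one compute node (illustrative)
hours=12        # requested wallclock time (illustrative)
su_charge=$((cores * hours))
echo "Estimated charge: ${su_charge} SUs"
```

A 20-core job running for 12 hours would thus consume 240 SUs, roughly 5% of a Trial allocation.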
The system is accessed through ssh (Secure Shell) connections using the hostname rivanna.hpc.virginia.edu. Windows users must install an ssh client such as SecureCRT, PuTTY, or MobaXterm; we recommend MobaXterm. Mac OS X and Unix users may connect through a terminal using the command ssh mst3k@rivanna.hpc.virginia.edu. Users working from off Grounds must run the UVA Anywhere VPN client.
Users who wish to run X11 graphical applications may prefer the FastX remote desktop client.
The Modules Environment
User-level software is installed into a shared directory /share/apps. The modules software enables users to manage their environments to access specific software, or even specific versions of the software. The most commonly used commands include:
- module avail (prints a list of all software packages available through a module)
- module avail <package> (prints a list of all versions available for <package>)
- module load <package> (loads the default version of <package>)
- module load <package>/<version> (loads the specific <version> of <package>)
- module unload <package> (removes <package> from the current environment)
- module purge (removes all loaded modules from the environment)
- module list (prints a list of modules loaded in the user’s current environment)
For more details about modules see the documentation.
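A typical session combining these commands might look like the following (the package names and version numbers are illustrative, not a statement of what is installed):

```shell
# Illustrative module workflow on a login node (package/version names are examples)
module avail gcc          # list the versions of gcc available through modules
module load gcc/4.8.2     # load a specific version (hypothetical version number)
module load mvapich2      # load the default version of an MPI stack
module list               # confirm what is currently loaded
module purge              # remove everything for a clean environment
```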
Software accessed through modules is available for all users. Users may install their own software to their home directory or to shared leased space provided they are legally permitted to do so, either because it is open source or because they have obtained their own license. User-installed software may not require root privileges to install or operate under any circumstances. User software may run daemons (services) provided that those services do not interfere with other users.
Users may petition ARCS to install software into the common directories. Each request will be considered on an individual basis and may be granted if it is determined that the software will be of wide interest. In other cases ARCS may help users install software into their own space.
Submitting Jobs to the Compute Nodes
Rivanna resources are managed by the SLURM workload manager. The login host rivanna.hpc.virginia.edu consists of multiple dedicated servers, but their use is restricted to editing, compiling, and running very short test processes. All other work must be submitted to SLURM to be scheduled onto a compute node.
SLURM divides the system into partitions which provide different combinations of resource limits, including wallclock time, aggregate cores for all running jobs, and charging rates against the SU allocation. There is no default and users must choose a partition in each script.
Users may run the command queues to determine which partitions are enabled for them. This command will also show the limitations in effect on each queue.
Users may run the command allocations to view the allocation groups to which they belong and to check their balances.
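Both commands take no arguments and can be run directly from a login-node prompt (the output format is approximate):

```shell
# Rivanna-local helper commands (available only on the cluster)
queues         # show the partitions enabled for this user and their limits
allocations    # show allocation group memberships and remaining SU balances
```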
Jobs submitted to these partitions are charged against the group’s allocation.
- parallel: jobs that can take advantage of the InfiniBand interconnect.
- request: like parallel but users may access all high-performance cores. Limited to intervals following maintenance.
- largemem: jobs that require more than one core’s worth of memory per core requested.
- serial: single-core jobs that need higher-speed access to temporary storage.
- development: short debugging runs
- gpu: access to two Kepler-equipped nodes for testing general-purpose GPU (GPGPU) codes.
The economy partition consists of older nodes with ethernet only. Jobs submitted to the economy partition are charged at a reduced rate.
SLURM jobs are shell scripts consisting of a preamble of directives or pseudocomments that specify the resource requests and other information for the scheduler, followed by the commands required to load any required modules and run the user’s program. Directives begin with the “pseudocomment” #SBATCH followed by options. Most SLURM options have two forms; a shorter form consisting of a single letter preceded by a single hyphen and followed by a space, and a longer form preceded by a double hyphen and followed by an equal sign (=). In SLURM a “task” corresponds to a process; therefore threaded applications should request one task and specify the number of cpus (cores) per task.
Common SLURM Options:
Number of nodes requested:
#SBATCH -N <N>
#SBATCH --nodes=<N>
Number of tasks per node:
#SBATCH --ntasks-per-node=<n>
Total tasks (processes) distributed across nodes by the scheduler:
#SBATCH -n <n>
#SBATCH --ntasks=<n>
Number of tasks per core:
#SBATCH --ntasks-per-core=<n>
Wallclock time requested:
#SBATCH -t d-hh:mm:ss
#SBATCH --time=d-hh:mm:ss
Memory request in megabytes per node (the default is 1000, i.e. 1GB):
#SBATCH --mem=<M>
Memory request in megabytes per core (may not be used with --mem):
#SBATCH --mem-per-cpu=<M>
Request partition <part>:
#SBATCH -p <part>
#SBATCH --partition=<part>
Specify the account to be charged for the job (this should be present even for economy jobs; the account name is the name of the MyGroups allocation group to be used for the specified run):
#SBATCH -A <account>
#SBATCH --account=<account>
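As noted above, a SLURM task corresponds to a process, so a threaded (e.g. OpenMP) application should request a single task with multiple cpus per task using the standard --cpus-per-task option. A minimal sketch (the thread count and program name are illustrative):

```shell
#!/bin/bash
# Sketch of a job script for a threaded application: one task, several cores.
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8     # illustrative thread count
#SBATCH -t 01:00:00
#SBATCH -p serial
#SBATCH -A mygroup

# SLURM exports the granted core count; use it to set the thread count.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_threaded_prog
```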
Example Serial Job Script:
#!/bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -t 12:00:00
#SBATCH -p serial
#SBATCH -A mygroup

# Run program
./myprog myoptions
Example Parallel Job Script:
#!/bin/bash
#SBATCH -N 2
#SBATCH --ntasks-per-node=4
#SBATCH -t 12:00:00
#SBATCH -p parallel
#SBATCH -A mygroup

# Run parallel program over Infiniband using MVAPICH2
module load mvapich2/intel
mpirun -launcher slurm ./xhpl > xhpl_out
Submitting a Job and Checking Status
Once the job script has been prepared it is submitted with the sbatch command:

sbatch <job_script>
The scheduler returns the job ID, which is how the system references the job subsequently.
Submitted batch job 36598
To check the status of the job, the user may type
squeue -u mst3k
Status is indicated with PD for pending, R for running, and CG for completing (exiting).
By default SLURM saves both standard output and standard error into a file called slurm-<jobid>.out. This file is created in the submit directory and is appended during the run.
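If desired, standard output and standard error can instead be directed to separate files with SLURM's -o/--output and -e/--error options (the filenames here are illustrative; %j expands to the job ID):

```shell
#SBATCH -o myjob_%j.out    # standard output file (illustrative name)
#SBATCH -e myjob_%j.err    # standard error file (illustrative name)
```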
Canceling a Job
Queued or running jobs may be canceled with the scancel command:

scancel <jobid>
Note that user-canceled jobs are charged for the time used when applicable.
Any eligible Principal Investigator may request a Trial allocation of 5,000 SUs. Under certain circumstances a supplemental allocation known as a "Standard" allocation of an additional 5,000 SUs may be granted. If the PI is an affiliate of the College of Arts and Sciences or the School of Engineering and Applied Science, the PI can file a short proposal to request a larger Administrative allocation. PIs affiliated with other units should submit allocation requests to the Data Sciences Institute. Time can also be purchased through external funding at a rate determined by the HPC Steering Committee. Trial, Standard, and Administrative allocation grants are for one year and must be renewed. Purchased time does not expire during the active interval of the grant.
PIs may request only one Trial allocation per year but may extend that group with additional allocations received for their projects. PIs who must keep projects separate, for example to distinguish externally funded projects from internally granted ones, may have more than one allocation group.
If a group exhausts its allocation, all members of the group will have access only to the economy queue. If an individual user exceeds the /scratch filesystem limitations, only that user will be blocked from submitting new jobs on any partition.
Exceeding the limits on the frontend will result in the user’s process(es) being killed. Repeated violations will result in a warning; users who ignore warnings risk losing access privileges.
Excessive consumption of licenses for commercial software, whether in time or in number, that system and/or ARCS staff determine to be interfering with other users' fair use of the software will subject the violator's processes or jobs to termination. Staff will attempt to issue a warning before terminating processes or jobs, but an inadequate response from the violator will not be grounds for permitting the processes or jobs to continue.
Any violation of the University’s security policies, or any behavior that is considered criminal in nature or a legal threat to the University, will result in the immediate termination of access privileges without warning.
| Qty | Processor Family | Base Microarchitecture | Cores Per Node | GB RAM Per Node | Processor Speed (MHz) | Memory Speed | Internal Network (Gbps) | UVA Network (Gbps) | Queue Assignment(s) |
|-----|------------------|------------------------|----------------|-----------------|-----------------------|--------------|-------------------------|--------------------|---------------------|
| 4   | Intel Ivy Bridge-EP   | Sandy Bridge    | 20       | 64    | 2,500 | DDR3-1866 | 56 | 10 (direct) | none (interactive nodes) |
| 240 | Intel Ivy Bridge-EP   | Sandy Bridge    | 20       | 128   | 2,500 | DDR3-1866 | 56 | 10 (routed) | serial and parallel |
| 2   | Intel Sandy Bridge-EP | Sandy Bridge    | 16       | 256   | 2,600 | DDR3-1600 | 40 | 10 (routed) | gpu |
| 4   | Intel Haswell-EP      | Haswell         | 16       | 1,024 | 2,600 | DDR4-1866 | 56 | 10 (routed) | largemem |
| 11  | Intel Sandy Bridge-EP | Sandy Bridge    | 16       | 128   | 2,700 | DDR3-1600 | 1  | 1 (routed)  | economy |
| 4   | Intel Westmere-EP     | Nehalem         | 8        | 48    | 2,400 | DDR3-1066 | 1  | 1 (routed)  | economy |
| 22  | Intel Westmere-EP     | Nehalem         | 12       | 96    | 2,530 | DDR3-1333 | 1  | 1 (routed)  | economy |
| 7   | Intel Westmere-EP     | Nehalem         | 12       | 96    | 2,670 | DDR3-1333 | 1  | 1 (routed)  | economy |
| 11  | AMD Magny-Cours       | K10             | 16       | 16    | 2,000 | DDR3-1333 | 1  | 1 (routed)  | economy |
| 8   | Intel Xeon Phi        | Knights Landing | 64 (256) | 208   | 1,300 | DDR4-1200 | 56 | 10 (routed) | development |