- How do I gain access to Rivanna?
- How do I log on to Rivanna?
- How do I reset my current password / obtain a new password?
- How do I check my allocation status on Rivanna?
- How do I add or remove people from my allocations?
- How do I use research software that's already installed?
- Does ARCS install research software?
- Is there any other way to install research software that I need?
- How do I submit jobs?
- How do I submit an interactive job?
- What queues can I use?
- How do I choose which queue to use?
- How do I check the status of my jobs?
- Why is my job not starting?
- Why can't I submit jobs anymore?
- How do I check the efficiency of my completed jobs?
- How do I obtain leased storage?
- How do I check my /scratch usage on Rivanna?
- How do I check how much leased or home storage I am using on Rivanna?
Please read and follow these instructions.
Access to the HPC cluster requires a valid Eservices password. Your Netbadge password is not necessarily the same thing, so if you are unable to log in, first try resetting your Eservices password here. If the problem persists, contact ITS (which manages all Eservices accounts) through their online help desk.
In all cases you will need to use an account with remaining service units in order to submit jobs.
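To see your allocations and their remaining balance, you can run the locally provided allocations command (the same command referenced in the job-submission troubleshooting answer below); the exact output format may vary:

```bash
# List the allocations you belong to and their remaining service units
allocations
```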
You must use the MyGroups interface to do this, and you must have administrative access to the group.
Please read this.
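Installed software is generally made available through environment modules; assuming the standard module command (as on most HPC clusters), a typical session looks like the sketch below, with package names used purely as examples:

```bash
# Find and load installed research software via environment modules
module avail            # list software available through modules
module spider python    # search for a particular package (Lmod systems)
module load gcc         # add a package to your environment
module list             # show what is currently loaded
```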
ARCS will install software onto Rivanna if it is of wide applicability to the user community. Software used by one group should be installed by the group members, ideally onto leased storage for the group. We can provide assistance for individual installations.
For help installing research software on your PC, please contact Research Software Support at firstname.lastname@example.org.
Some groups and departments have installed a bundle of software they need into shared space. Please see your departmental IT support personnel if your department has its own bundle.
You submit jobs by writing a SLURM script and submitting it with the sbatch command; please read this.
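As a rough illustration only, a minimal SLURM script might look like the following; the account, partition, module, and program names are placeholders rather than values taken from this documentation:

```bash
#!/bin/bash
#SBATCH --account=mygroup       # allocation to charge (placeholder)
#SBATCH --partition=standard    # queue to run in (placeholder; see the queues command below)
#SBATCH --ntasks=1              # number of tasks (cores)
#SBATCH --time=01:00:00         # wall-clock time limit (hh:mm:ss)
#SBATCH --output=myjob_%j.out   # output file; %j expands to the JobID

module load gcc                 # load whatever software your program needs
./my_program                    # run your program
```

You would then submit it with sbatch myscript.slurm; the JobID printed by sbatch is the number used by the status commands described below.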
You may use the locally-written ijob command to submit an interactive job. The minimum required options include -A, which specifies the allocation account to charge.
If you wish to forward X11 in order to use a graphical user interface or to run other graphics programs, run ssh -Y localhost on the Rivanna frontend node before you run ijob.
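As a sketch only: -A is the option named above, while the remaining SLURM-style options shown here (-p for partition, -c for cores, -t for time) are assumptions and may differ from what ijob actually accepts:

```bash
# Hypothetical request for a one-core, one-hour interactive session;
# account and partition names are placeholders, and only -A is confirmed above
ijob -A mygroup -p standard -c 1 -t 01:00:00
```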
After logging in, run the queues command to see which queues you have access to.
Note: the list and values you see depend on your own access and may differ from user to user.
Run the queues command; based on the Time-Limit, Maximum Cores/Job, SU Rate, and Usable Accounts values, pick the queue that best suits the needs of your research.
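Once you have picked a queue, you request it in your SLURM script through the partition option; the name used here is a placeholder:

```bash
# Request a specific queue (partition); replace "standard" with one listed by the queues command
#SBATCH --partition=standard
```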
If you report a problem with a particular job, please include the JobID of the job in question. You can also run jobq -l to relate particular jobs to specific submission scripts.
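The standard SLURM squeue command also lists your jobs together with their JobIDs:

```bash
# Show your pending and running jobs, with JobID, partition, state, and run time
squeue -u $USER
```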
Several things can cause jobs to wait in the queue. If you request a resource combination we do not have, such as 28 cores on a parallel node, the queueing system will not recognize that the request can never be satisfied and will leave the job pending (PD). You may also have run a large number of jobs in the recent past, in which case the "fair share" algorithm gives other users higher priority. Finally, the queue you requested may simply be very busy.
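To see the scheduler's stated reason for a particular pending job, you can inspect it with the standard SLURM squeue command (<jobid> is a placeholder):

```bash
# The NODELIST(REASON) column shows why the job is waiting, for example
# (Priority) when fair share puts other jobs ahead of it, or
# (Resources) when the requested resources are not yet free
squeue -j <jobid>
```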
Usually this is because you inadvertently submitted the job to run from a location that the compute nodes cannot access or that is temporarily unavailable; if your jobs exit immediately, this is usually why. Other common reasons include using too much memory or too many cores, or running past the job's time limit.
You can run sacct:
    [aam2y@udc-ba36-27:/root] sacct
           JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
    ------------ ---------- ---------- ---------- ---------- ---------- --------
    159637       ompi_char+   parallel  hpc_admin         80  COMPLETED      0:0
    159637.batch      batch             hpc_admin          1  COMPLETED      0:0
    159637.0          orted             hpc_admin          3  COMPLETED      0:0
    159638       ompi_char+   parallel  hpc_admin        400    TIMEOUT      0:1
    159638.batch      batch             hpc_admin          1  CANCELLED     0:15
    159638.0          orted             hpc_admin         19  CANCELLED  255:126
If it's still not clear why your job was killed, please contact us and send us the output from sacct.
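When you contact us, accounting details for the specific job are the most useful; a query such as the following (standard sacct options, with <jobid> as a placeholder) collects them:

```bash
# Report the state, exit code, elapsed time, and peak memory of one job and its steps
sacct -j <jobid> --format=JobID,JobName,Partition,State,ExitCode,Elapsed,MaxRSS
```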
You must not be overallocated in your /scratch usage and you must have remaining service units in order to submit jobs. Please check the output of the sfsq and/or allocations commands to determine what the problem is.
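To check the efficiency of a completed job, one common approach, assuming the SLURM seff utility is installed on the cluster, is to pass it the JobID (<jobid> is a placeholder):

```bash
# Summarize the CPU and memory efficiency of a finished job
seff <jobid>
```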
If your efficiency rating is low, please contact us; we can help.
You can lease Enterprise or Value storage from here.
If you have used too much space, created too many files, or have "old" files, you may be regarded as "overallocated". Please note that if you are overallocated, you will not be able to submit any new jobs until you clean up your /scratch folder.
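In addition to the sfsq command mentioned earlier, you can get a rough measure of your usage directly, assuming your scratch directory is /scratch/$USER:

```bash
# Total space used and number of files under your scratch directory
du -sh /scratch/$USER
find /scratch/$USER -type f | wc -l
```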
To check your home space, run quota -s:
    bash-4.1$ quota -s
    Disk quotas for user jm9yq (uid 650224):
         Filesystem   blocks   quota   limit   grace   files   quota   limit   grace
    10.243.122.179:/home
                         395M       0   4096M               0       0       0
To check your leased space, change directory to your leased space and then run df -h /nv/volX, where volX is your leased storage volume:
    bash-4.1$ cd /nv/vol89
    bash-4.1$ df -h /nv/vol89
    Filesystem                              Size  Used Avail Use% Mounted on
    nas19-s.itc.virginia.edu:/export/vol89  247G  225G   22G  92% /nv/vol89