What important terms should I be familiar with?
CS Account: Having a CS account means that you have been allocated a login to cycles.cs.princeton.edu. This account is separate from your main Princeton account. Though the username (NetID) is most likely the same, the password is distinct. See How to Request A CS Account for more information.
Account: In the context of the Slurm scheduler, an Account is a list of users who share a resource. It is effectively an access control list of authorized users for a given collection of resources.
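You can check which Slurm Accounts your login belongs to by listing your associations. For example (a minimal sketch using standard sacctmgr options; $USER expands to your login name):
sacctmgr list associations where user=$USER format=Account,User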
Partition: In the context of the Slurm scheduler, a Partition is a collection of resources (usually hardware) that can be assigned to a job or collection of jobs. In the CS cluster, we use partitions to divide hardware that belongs to particular research groups. Accounts are assigned access to their own Partitions, facilitating private group access to centrally managed resources.
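You can list the Partitions visible to you, along with their availability, time limits, and nodes, using standard sinfo format options. For example (a minimal sketch):
sinfo --format="%P %a %l %D %N"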
Wall time or Walltime: When we discuss job scheduling, "wall time" is an important component of a job. It is the amount of real world time that a job is allowed to run, and is set as part of the job submission. This is notably distinct from the amount of CPU or GPU time a job might consume. The ionic cluster allows walltime to be set up to 7 days, and defaults to 24 hours.
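Walltime is requested with the --time option, either on the sbatch command line or inside the job script (a minimal sketch; myjob.sh is a placeholder for your own script, and the time format is days-hours:minutes:seconds):
sbatch --time=2-00:00:00 myjob.sh
or, within the script itself:
#SBATCH --time=7-00:00:00
The second example requests the 7-day maximum mentioned above.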
Fairshare (set): Fairshare is a numeric value, generally assigned to an Account and applying to all members of that Account when they use it, representing the total "share" of resources to which that Account is entitled based on the amount of resource it owns within the cluster. This value is mainly important in the context of the sharing arrangement, under which most research groups allow jobs from non-members to run on their hardware in a limited fashion; the fairness of that sharing is governed by the Fairshare value. You can see Fairshare values in the output of this command (sacctmgr list associations format=Account,Fairshare where user=""). If you believe your group's Fairshare value is set incorrectly, reach out to CS Staff to discuss.
Fairshare (computed): The computed Fairshare value is a floating point number that combines the weight of the set value discussed above with your recent utilization of cluster resources, producing a very fine-grained share value for each individual cluster user. Your computed Fairshare value also differs between Accounts, so your utilization of the default "allcs" Account affects your Fairshare in that Account, but not in your research group's Account. You can think of this value as tracking your "consumption" of your fair share of the cluster: the more resource you use, the lower your computed Fairshare value goes. This is not a punishment; it is simply a mathematical fairness algorithm that gives higher priority to users with lower utilization so that everyone is able to get some time.
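You can inspect your computed Fairshare value, along with the recent usage that feeds into it, using sshare. For example (a minimal sketch using standard sshare format fields):
sshare -u $USER --format=Account,User,RawUsage,FairShare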
How do I get added to the [NAME] Slurm Account so I can use my research group's cluster?
Here is the most efficient and effective workflow:
- Make sure the person who needs access has a CS Account. If they have a CS appointment or are a CS concentrator, they should already have a CS account or (in the case of undergrads) can request one on their own. See How Do I Request a CS Account? for details.
- Make sure the person who needs access has read the CS Cluster Computing page and, crucially, has joined the [beowulf] mailing list mentioned at the top of the page. Without this step, we cannot grant further access to the cluster.
- The faculty or staff sponsor of the research group should send CS Staff an email with the new person's NetID, asking that we add them to the appropriate Account on ionic. The new person may email us directly, but we must hear approval from the sponsor, so a request that comes directly from the sponsor will speed things up.
Why is my job still PENDING instead of RUNNING?
It is important to understand that there are thousands of variables involved in the scheduler's decision-making. As a result, it may be impossible - or, at least, impractical - to determine with a high degree of certainty why one job starts before another job at any particular moment, and it is even more difficult to determine this retroactively after the jobs have completed. However, there are some high level concepts to be aware of that may help you understand the behavior you observe. First, a short (and not comprehensive) list of some of the variables the scheduler takes into account when ordering jobs:
- Fair Share - see the first FAQ on this page.
- Job Size - how many CPUs, how much memory, how many GPUs, and how much wall time were requested.
- QOS - for some portions of the cluster, there are limits to how much resource any single user can consume based on the size of their jobs.
- Time in Queue - the longer a job waits in the queue, the more its priority increases.
- Resource Availability - whether or not the resources you requested are available, including the detailed status of the cluster nodes.
- Resource Access - which resources your job is allowed to use, based on the Account under which you submit the job.
- Backfill Scheduling - the backfill scheduler will start lower priority jobs if doing so does not delay the expected start time of any higher priority jobs.
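The relative weight the scheduler assigns to several of these factors can be displayed with a standard sprio option (a minimal sketch):
sprio --weights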
Some of these components can be observed live on the cluster; others are more difficult to witness directly. The most useful commands are:
- sacctmgr: shows Fairshare values and can also confirm whether you are a member of the Account you wish to use.
- scontrol show job: shows the current state of a queued (or running) job, including the requested resources, any excluded resources, the nodes scheduled to run the job, an expected start time if one is known, a reason for queuing if the job is not running, and more.
- sinfo: shows cluster node status; use scontrol show node for more detail on a specific node or nodes.
- squeue: shows queue status.
- sshare: shows detailed usage and share information for all active users.
- sprio: shows priority information for all queued jobs.
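As a concrete starting point, the following invocations cover the most common checks (a minimal sketch; <jobid> is a placeholder for your own job's ID):
scontrol show job <jobid>
squeue -u $USER --start
sprio -j <jobid>
sshare -u $USER
The --start option asks squeue to report the expected start time of pending jobs, where one is known.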
Once you have dug through all of the above data, you will hopefully be able to determine why your job is still queued. In rare cases, after investigating all of these components, you may find that something is amiss, such as a mismatch between allocated and available resources on a given node (where an apparently idle node seems to be missing resources). When this occurs, please reach out to CS Staff with the specifics of the pathology you observe so that we can investigate correcting it.
Slurm error: Job submit/allocate failed: Invalid account or account/partition combination specified
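This error usually means that the Account you specified at submission does not exist, does not include your login, or is not permitted to use the Partition you requested (see the Account and Partition definitions in the first FAQ above). You can check your Account memberships and a partition's allowed Accounts before resubmitting (a minimal sketch; <partition> is a placeholder for the partition name you used):
sacctmgr list associations where user=$USER format=Account,User
scontrol show partition <partition>
If you are not yet a member of the Account you need, see "How do I get added to the [NAME] Slurm Account" above.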
My jobs are being queued with reason: (ReqNodeNotAvail, Reserved for maintenance)
This queue reason is usually associated with an upcoming cluster maintenance window. Please check your email for announcements on the [downtime] or [beowulf] mailing lists. In the seven days leading up to a maintenance event, jobs that would not complete before the maintenance (based on their requested Walltime limit) are held until the maintenance is complete, after which they should start normally.
You may also see this queue reason if the specific node or resource you request has an upcoming reservation not associated with a global maintenance window, but this is less likely.
You can see current reservations by running: scontrol show reservations
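If your job is held only because its requested Walltime crosses the maintenance window, one option is to resubmit with a shorter limit so that it can finish before the window begins (a minimal sketch; myjob.sh and the 12-hour value are placeholders):
sbatch --time=12:00:00 myjob.sh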
Where can I find more help?
SchedMD, the maintainers of the Slurm scheduler, also maintain an excellent FAQ. If you can't find your answer here or there, reach out to CS Staff for assistance.