Computer Science Status


CS Cycles/Ionic/Neuronic System Downtime, Tuesday, January 7, 2025, 07:00-15:00

Tue, 12/24/2024 - 10:02am
Date: Tuesday, January 7, 2025 (07:00-15:00)

Who is affected:
All users of the CS Department Beowulf high-performance computing clusters,
known as ionic and neuronic.

All users of the CS Staff-managed public login systems, including the
cycles, courselab, and armlab systems.

What is happening:
Ionic and neuronic nodes will have their Nvidia, CUDA, and kernel drivers
updated to fix GPU-related failures. In addition, the cluster management and
job scheduling system, Slurm, and its database will be upgraded. No data loss
is anticipated. After the upgrades, the machines will be rebooted.

Cycles, courselab, and armlab machines will be rebooted during this window
to clear some defunct user processes interfering with research work.

Why is it happening:
Ionic nodes are experiencing various GPU-related failures. To address these
problems, we will update the Nvidia drivers, CUDA, and kernel modules.

Additionally, some user processes have entered a defunct state and are
hindering research activities. A system reboot is required to clear these
processes.

We will post updates to the status page, www.csstaff.org, as necessary.

If this downtime would cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff