Resolved

The incident is now resolved and the service has been restored to full capacity.

Thanks for your patience.

Updated

The service is currently degraded. GPU capacity is operating at approximately 75%, while CPU capacity remains at around 50%.

Our plan to restore the service to full capacity next week remains on track.

Thank you for your patience.

Updated

The service is currently degraded and operating at approximately 50% capacity (up from 25% yesterday).

The login node and storage arrays are working normally, and data can be accessed or retrieved as usual.

At this stage, we expect full service restoration by Monday.

Recovering

We restored 5 of the 40 nodes on Monday morning and are monitoring performance this week, with a view to full restoration by Friday.

Identified

The air conditioning unit has been restored. However, we are keeping the compute nodes off overnight and will carry out a phased recovery tomorrow. A further update will follow in the morning. Apologies again for the disruption to service.

Investigating

We are currently experiencing an air-conditioning issue that requires the shutdown of some services, including HPC (DMOG). We are working to restore them as a matter of urgency and will post updates on this page. Apologies for the inconvenience.

Began at:

Affected components
  • Research Systems
    • DMOG