Downtime - Time and Duration

Certain maintenance requires that a computing environment be disabled, i.e. taken "down". The principles to be followed when choosing an appropriate time and duration are:

  • sufficient time must be allocated, and sufficient planning done, that the likelihood of an environment returning later than advertised is of the same order as the likelihood of the environment simply failing without provocation during the same period.
  • sufficient advance notice must be given that those affected have a chance to react to the notice when they arrive for the day.
  • the opportunity to reschedule the downtime should be provided if practical. The offer should state a time beyond which rescheduling is no longer possible. That time is determined by the time given to see the notice, i.e. no later than the beginning of the working day, two days before the day of the downtime.
  • the notice must describe the impact of the downtime, so that those affected will in fact know that they are affected.
  • systems whose downtime would severely impact the mission of the Faculty are considered to be critical. Downtime of critical systems should occur outside of normal working hours (e.g. 8:30 to 16:30).

The feasibility of avoiding working hours can be affected by factors such as:

  • availability of staff
  • constraints imposed by outside suppliers needed during the downtime

The intended reaction to a downtime notice is to reschedule the affected work, the downtime itself, or both. In the rare case that there's no advantage to either, e.g. downtime that affects only the ability to recover files from tape backups, advance notice can be minimal. The downtime should still occur outside of working hours.
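The rescheduling cutoff in the principles above is simple date arithmetic, and a short sketch may make it concrete. The function name `reschedule_cutoff` and its `days_before` parameter are hypothetical, chosen only for illustration. The sketch assumes that the working day begins at 8:30 and that "two days before" counts calendar days, so weekends and holidays are not skipped; the policy text doesn't settle either point.

```python
from datetime import date, datetime, time, timedelta

# Assumption: the working day begins at 8:30, the example hour quoted above.
WORKDAY_START = time(8, 30)


def reschedule_cutoff(downtime_day: date, days_before: int = 2) -> datetime:
    """Return the latest moment at which the downtime can still be rescheduled.

    Assumption: days_before counts calendar days; weekends and holidays
    are not skipped.
    """
    cutoff_day = downtime_day - timedelta(days=days_before)
    return datetime.combine(cutoff_day, WORKDAY_START)


# Downtime planned for Thursday 2015-10-15.
print(reschedule_cutoff(date(2015, 10, 15)))  # 2015-10-13 08:30:00
```

Passing `days_before=1` gives the shorter notice deadline described below for redundant systems.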

Examples

The CS General Environment

For the CS General Environment, morning downtime should end before 8:30, and evening downtime should start no earlier than 16:30.
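A proposed downtime window can be checked against this rule mechanically. The following is a minimal sketch, not an existing tool: the function name `overlaps_working_hours` is hypothetical, times are naive local datetimes, and every calendar day is treated as a working day (weekends are not exempted), which is an assumption the policy doesn't state.

```python
from datetime import datetime, time, timedelta

# Working hours for the CS General Environment, as given above.
WORK_START = time(8, 30)
WORK_END = time(16, 30)


def overlaps_working_hours(start: datetime, end: datetime) -> bool:
    """Return True if the downtime from start to end touches working hours.

    A morning downtime must end strictly before 8:30; an evening downtime
    may start at 16:30 or later. Assumption: every day is a working day.
    """
    day = start.date()
    while day <= end.date():
        work_start = datetime.combine(day, WORK_START)
        work_end = datetime.combine(day, WORK_END)
        # Ending exactly at 8:30 counts as an overlap; starting at 16:30 does not.
        if start < work_end and end >= work_start:
            return True
        day += timedelta(days=1)
    return False


# A morning window that finishes at 8:00 is acceptable.
print(overlaps_working_hours(datetime(2015, 10, 15, 6, 0),
                             datetime(2015, 10, 15, 8, 0)))   # False
# A window from 15:00 to 17:00 is not.
print(overlaps_working_hours(datetime(2015, 10, 15, 15, 0),
                             datetime(2015, 10, 15, 17, 0)))  # True
```

The loop over days lets the same check handle an overnight downtime that starts in the evening and ends the next morning.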

The Student Mac Environment

Downtime for the cs-teaching Macintosh environment is usually determined by agreement with the ISG, as they represent almost all of the instructors that use it. It usually happens during the working day, between scheduled classes.

Redundant Systems

Downtime for a redundant system that doesn't provide services that are damaging to interrupt, such as long-running CPU service, can usually be scheduled for any time, with notice given at the beginning of the working day before the day of the downtime. The offer to reschedule needn't be given.

To the extent that it's possible, new users should be automatically redirected, e.g. via `hostselect` in the student.math environment. It's unclear how far in advance such redirection should begin to have the desired effect.

Group Machines

Downtime for a single-user workstation, or even for a group machine for which all users can agree, is scheduled for whenever is agreeable. Often that's during the working day.

Topic revision: r3 - 2015-10-15 - BillInce
 