STAFF RESOURCES

Systems Department Unicorn Contingency Plan

The Systems Department maintains two servers used for Unicorn: the production server, which houses the live system, and the test server. The production server is backed up regularly. Each day, incremental tape backups are produced and stored in the Computer Room. Each weekend, a full backup is produced. The full backup tapes for the current week are stored in the Computer Room, and the tapes for the previous week are sent to off-site storage. If a restore is ever needed, the daily and current weekly backups are on site. If off-site tapes are needed, they can be requested and will be retrieved and delivered in less than 24 hours.
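As an illustration of the schedule above, the following minimal sketch lists which tapes a restore would draw on: the most recent full backup plus every incremental made after it. The function name `restore_chain` and the constant `FULL_BACKUP_WEEKDAY` are hypothetical, and the choice of Saturday for the full backup is an assumption, since the plan only says "each weekend". The sketch also assumes the full backup takes the place of that day's incremental.

```python
from datetime import date, timedelta

# Assumption: the weekend full backup runs on Saturday (Monday == 0).
# The plan itself only says "each weekend".
FULL_BACKUP_WEEKDAY = 5


def restore_chain(failure_day: date) -> list[tuple[date, str]]:
    """Return the backups needed to restore the system as of the day
    before failure_day, oldest first: one full backup followed by the
    incrementals made after it."""
    day = failure_day - timedelta(days=1)
    chain = []
    # Walk backwards, collecting incrementals until the last full backup.
    while day.weekday() != FULL_BACKUP_WEEKDAY:
        chain.append((day, "incremental"))
        day -= timedelta(days=1)
    chain.append((day, "full"))
    chain.reverse()
    return chain


if __name__ == "__main__":
    for day, kind in restore_chain(date(2000, 8, 9)):
        print(day.isoformat(), kind)
```

Under these assumptions every tape in the chain comes from the current week, so it is already in the Computer Room; the off-site tapes would only be needed if the current full backup were unusable.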

Short, unplanned downtime of less than 1 hour

Occasionally the Unicorn hardware or software fails or has to be brought down in an emergency. Examples of short-term downtime include: software errors requiring a reboot of the system; software errors that would cause database corruption if not fixed immediately; hardware malfunction requiring a reboot; hardware failure. The following steps will be taken during regular library hours:

  1. Quickly try to identify the cause of the problem and estimate the amount of time needed to correct it. If necessary, contact Sirsi for assistance.
  2. If the server or software will be unavailable for 5 minutes or more, begin the calling chain and notify the Webmaster to put up a message on the Web page. [Establish set message to be used.]
  3. Send a message to the faculty and staff via the internal lists (rul_faculty, rul_staff) alerting them to the problem and giving an estimate of downtime.
  4. Continue to update the faculty and staff if downtime continues.
  5. Once the problem is resolved, begin the calling chain and notify the Webmaster to remove the notice.
  6. Send a message to the faculty and staff via the internal lists (rul_faculty, rul_staff) letting them know the system is up.

Short, unplanned downtime of more than 1 hour but less than 24 hours

Occasionally a hardware or software problem takes longer than an hour to get resolved and may persist for a few hours, but is expected to be resolved within the day. This might happen if a software problem requires Sirsi intervention to address and/or fix the error or if readily-available hardware needs to be replaced. If the downtime is expected to last more than one hour but less than 24, the following steps will be taken:

  1. Begin the calling chain and notify the Webmaster to put up a message on the Web page. [Establish set message to be used.]
  2. Send a message to the faculty and staff via the internal lists (rul_faculty, rul_staff) alerting them to the problem and giving an estimate of downtime.
  3. Be prepared to process circulation transactions via Standalone when the system comes back up.
  4. Continue to update the faculty and staff if downtime persists.
  5. Once the problem is resolved, begin the calling chain and notify the Webmaster to remove the notice.
  6. Send a message to the faculty and staff via the internal lists (rul_faculty, rul_staff) letting them know the system is up.

Extended, unplanned downtime lasting longer than 24 hours

Unexpected downtime extending past 24 hours might happen if there is a significant software, hardware, or network failure. In the case of hardware or software failure every attempt must be made to provide a read-only version of the catalog for public and staff use. The following steps will be taken:

  1. Reconfigure the test server to be used as the read-only server. (Since Rutgers has a license for just one test catalog, during the time the read-only version of the catalog is available the test catalog will be unavailable.) Using the most recent software backup, restore the files to the read-only server and bring it up without any write privileges.
  2. When the server is ready, send a message to the faculty and staff via the internal lists (rul_faculty, rul_staff) letting them know the read-only version is available.
  3. Notify the Webmaster to put up a message on the Web page. [Establish set message to be used.]
  4. Be prepared to process circulation transactions via Standalone when the production server is available.
  5. Once the production system is available, notify the Webmaster to remove the message and alert the faculty and staff via the internal lists (rul_faculty, rul_staff).
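
The thresholds in the three unplanned-downtime procedures above (5 minutes, 1 hour, 24 hours) can be summarized in a short sketch. This is only an illustration of how the tiers relate, not part of the plan: the names `Tier`, `classify`, `needs_web_notice`, `needs_standalone`, and `needs_read_only_catalog` are hypothetical, and placing an estimate of exactly 1 hour or exactly 24 hours in the longer tier is an assumption, since the plan does not say which procedure applies at those boundaries.

```python
from datetime import timedelta
from enum import Enum


class Tier(Enum):
    UNDER_1_HOUR = "less than 1 hour"
    UNDER_24_HOURS = "more than 1 hour but less than 24 hours"
    OVER_24_HOURS = "longer than 24 hours"


def classify(estimate: timedelta) -> Tier:
    """Map an estimated downtime to one of the three unplanned tiers."""
    # Assumption: exactly 1 hour and exactly 24 hours go to the longer tier.
    if estimate < timedelta(hours=1):
        return Tier.UNDER_1_HOUR
    if estimate < timedelta(hours=24):
        return Tier.UNDER_24_HOURS
    return Tier.OVER_24_HOURS


def needs_web_notice(estimate: timedelta) -> bool:
    # The under-1-hour procedure starts the calling chain and Web notice
    # only at 5 minutes or more; the longer tiers always post a notice.
    return estimate >= timedelta(minutes=5)


def needs_standalone(estimate: timedelta) -> bool:
    # Both longer tiers call for processing circulation via Standalone.
    return classify(estimate) is not Tier.UNDER_1_HOUR


def needs_read_only_catalog(estimate: timedelta) -> bool:
    # Only the longest tier reconfigures the test server as read-only.
    return classify(estimate) is Tier.OVER_24_HOURS
```

For example, an estimate of `timedelta(hours=6)` falls in the middle tier: a Web notice and Standalone processing apply, but the test server stays in its normal role.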

Extended, planned downtime with an announced ending time

Planned downtime is occasionally necessary to install upgrades to the hardware or software or to reindex portions of the database. The following steps will be taken:

  1. As far in advance as possible, announce the projected downtime schedule via the internal lists (rul_faculty, rul_staff) and an announcement on the Web pages.
  2. Reconfigure the test server to be used as the read-only server. (Since Rutgers has a license for just one test catalog, during the time the read-only version of the catalog is available the test catalog will be unavailable.) Immediately following a full backup, restore the files to the read-only server and bring it up as a static database, without any write privileges.
  3. Periodically update the faculty and staff via the lists letting them know what progress has been made and when the production system is expected to be available.
  4. Be prepared to process circulation transactions via Standalone when the system comes back up.
  5. When the work has been completed, notify the Webmaster to remove the message and alert the faculty and staff via the internal lists (rul_faculty, rul_staff).


Last updated: August 9, 2000
 
URL: http://www.libraries.rutgers.edu/rul/staff/systems/procedures/contingency.shtml
