Systems Department Unicorn Contingency Plan
The Systems Department maintains two servers used for Unicorn: the production server, which
houses the live system, and the test server. The production server is backed up regularly.
Each day incremental tape backups are produced and stored in the Computer Room. Each weekend a
full backup is produced. The full backup tapes for the current week are stored in the
Computer Room, and the tapes for the previous week are sent to off-site storage. If a restore is
ever needed, the daily and current weekly backups are on site. If off-site tapes are needed, they
can be requested and will be retrieved and delivered in less than 24 hours.
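As an illustration of the rotation described above, the following sketch works out which tapes a restore to a given date would need. It rests on assumptions the plan does not state: that the full backup runs on Saturday, that an incremental runs on every other day, and that each incremental covers only the changes since the previous backup. The function name and constant are hypothetical.

```python
from datetime import date, timedelta

# Assumption: the plan says only "each weekend"; Saturday is used here (Monday == 0).
FULL_BACKUP_WEEKDAY = 5


def tapes_needed_for_restore(target: date) -> list[tuple[str, date]]:
    """Return the tapes needed to restore the system as of the end of `target`.

    Assumes one full backup per week on FULL_BACKUP_WEEKDAY and one
    incremental on every other day, each incremental covering changes
    since the previous backup of either kind.
    """
    # Walk back to the most recent full backup on or before the target date.
    days_since_full = (target.weekday() - FULL_BACKUP_WEEKDAY) % 7
    full_day = target - timedelta(days=days_since_full)

    tapes = [("full", full_day)]
    # Add every incremental made after that full backup, oldest first.
    for offset in range(1, days_since_full + 1):
        tapes.append(("incremental", full_day + timedelta(days=offset)))
    return tapes
```

Under these assumptions, a restore to Wednesday 10 January 2024 needs the full tape from Saturday 6 January followed by the incrementals for 7 through 10 January. All of those tapes belong to the current week, which is why a routine restore should not require a request to off-site storage.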
Short, unplanned downtime of less than 1 hour
Occasionally the Unicorn hardware or software fails or has to be brought down in an emergency.
Examples of short-term downtime include: software errors requiring a reboot of the system;
software errors that would cause database corruption if not fixed immediately; hardware
malfunction requiring rebooting; hardware failure. The following steps will be taken during
regular library hours:
- Quickly try to identify the cause of the problem and estimate the amount of time needed
to correct the problem. If necessary, contact Sirsi for assistance.
- If the server or software will be unavailable for 5 minutes or more, begin the calling
chain and notify the Webmaster to put up a message on the Web page. [Establish set message to be
used.]
- Send a message to the faculty and staff via the internal lists (rul_faculty, rul_staff)
alerting them to the problem and giving an estimate of downtime.
- Continue to update the faculty and staff if downtime continues.
- Once the problem is resolved, begin the calling chain and notify the Webmaster to remove
the notice.
- Send a message to the faculty and staff via the internal lists (rul_faculty, rul_staff)
letting them know the system is up.
Short, unplanned downtime of more than 1 hour but less than 24 hours
Occasionally a hardware or software problem takes longer than an hour to get
resolved and may persist for a few hours, but is expected to be resolved
within the day. This might happen if a software problem requires Sirsi
intervention to address and/or fix the error or if readily-available
hardware needs to be replaced. If the downtime is expected to last more
than one hour but less than 24, the following steps will be taken:
- Begin the calling chain and notify the Webmaster to put up a message on the Web page.
[Establish set message to be used.]
- Send a message to the faculty and staff via the internal lists (rul_faculty, rul_staff)
alerting them to the problem and giving an estimate of downtime.
- Be prepared to process circulation transactions via Standalone when the system comes
back up.
- Continue to update the faculty and staff if downtime persists.
- Once the problem is resolved, begin the calling chain and notify the Webmaster to remove
the notice.
- Send a message to the faculty and staff via the internal lists (rul_faculty, rul_staff)
letting them know the system is up.
Extended, unplanned downtime lasting longer than 24 hours
Unexpected downtime extending past 24 hours might happen if there is a
significant software, hardware, or network failure. In the case of
hardware or software failure, every attempt must be made to provide a
read-only version of the catalog for public and staff use. The following
steps will be taken:
- Reconfigure the test server to be used as the read-only server. (Since Rutgers has a
license for just one test catalog, during the time the read-only version of the catalog is
available the test catalog will be unavailable.) Using the most recent software backup, restore
the files to the read-only server and bring it up without any write privileges.
- When the server is ready, send a message to the faculty and staff via the internal lists
(rul_faculty, rul_staff) letting them know the read-only version is available.
- Notify the Webmaster to put up a message on the Web page. [Establish set message to be
used.]
- Be prepared to process circulation transactions via Standalone when the production
server is available.
- Once the production system is available, notify the Webmaster to remove the message and
alert the faculty and staff via the internal lists (rul_faculty, rul_staff).
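Taken together, the three unplanned-downtime cases above amount to a small decision table keyed on the estimated length of the outage. The sketch below is one way to express it, not part of the plan itself. The function and key names are hypothetical, and the handling of estimates of exactly 1 hour or exactly 24 hours is an assumption, since the plan does not say which case those fall under.

```python
from datetime import timedelta


def classify_unplanned_downtime(expected: timedelta) -> dict[str, object]:
    """Map an estimated outage length to the actions the plan calls for.

    Assumption: an estimate of exactly 1 hour is treated as the
    1-to-24-hour case, and exactly 24 hours as the over-24-hour case.
    """
    actions: dict[str, object] = {
        "tier": "under 1 hour",
        # The plan starts notifications once the outage will last 5 minutes or more.
        "notify_webmaster_and_lists": expected >= timedelta(minutes=5),
        "prepare_standalone_circulation": False,
        "bring_up_read_only_catalog": False,
    }
    if expected >= timedelta(hours=24):
        actions["tier"] = "more than 24 hours"
        actions["prepare_standalone_circulation"] = True
        # Only this case reconfigures the test server as a read-only catalog.
        actions["bring_up_read_only_catalog"] = True
    elif expected >= timedelta(hours=1):
        actions["tier"] = "1 to 24 hours"
        actions["prepare_standalone_circulation"] = True
    return actions
```

For example, `classify_unplanned_downtime(timedelta(hours=2))` selects the 1-to-24-hour case: notifications go out and Standalone transactions are prepared for processing, but the test server is left alone.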
Extended, planned downtime with an announced ending time
Planned downtime is occasionally necessary to install upgrades to
the hardware or software or to reindex portions of the database.
The following steps will be taken:
- As far in advance as possible, announce the projected downtime schedule via the internal
lists (rul_faculty, rul_staff) and an announcement on the Web pages.
- Reconfigure the test server to be used as the read-only server. (Because Rutgers has a license
for just one test catalog, the test catalog will be unavailable while the read-only version of
the catalog is available.) Immediately following a full backup, restore the files to the
read-only server and bring it up as a static database, without any write privileges.
- Periodically update the faculty and staff via the lists, letting them know what progress has
been made and when the production system is expected to be available.
- Be prepared to process circulation transactions via Standalone when the system comes back
up.
- When the work has been completed, notify the Webmaster to remove the message and
alert the faculty and staff via the internal lists (rul_faculty, rul_staff).