Glass Room

part of the ArsDigita Community System by Philip Greenspun

ArsDigita Glass Room is a module that lets the community system implement the final component of the ArsDigita Server Architecture: coordinating a bunch of human beings to ensure the reliable operation of a Web service.

The first function that Glass Room must accomplish is the distribution of information. The glassroom_info table contains:

For each of the physical computer systems involved, there is an entry in glassroom_hosts:

An expired Verisign certificate can be nearly fatal to a service that requires SSL to operate. Users get hammered with nasty warning messages that they don't understand. So we need the glassroom_certificates table with the following columns:

Important news, such as the fact that regular backups have been halted and someone is restoring from tape, is recorded using the standard ACS /news subsystem.

Modeling the software

Every site is going to depend on a set of software modules that can be versioned. The modules that occasion the most discussion are presumably the custom-written ones, e.g., the scripts that drive the Web site. However, we still need to keep track of packaged software. People might need to know that we're currently running Oracle 8.0 but plan to upgrade to 8.1 in April 1999.

We are also going to tie bug tickets and feature requests to software modules so that only the relevant personnel need be alerted. Here's what the glassroom_modules table keeps:

So that bug tickets and feature requests can be closed out with a structured "fixed in Release 4.1", Glass Room needs to know about software releases. We have a table glassroom_releases containing:

We use this table even for software releases that we're merely installing, not developing (e.g., Oracle 8.1).

Modeling and Logging Procedures

A procedure is something that must be regularly done, e.g., "verify backup tape". We want to log everything of this nature that has been done, by whom, and when. Glass Room needs to know which of these procedures need to be done and how frequently. That way it can check the log and raise some alerts when procedures haven't been done sufficiently recently.

We keep a single glassroom_logbook table in which all kinds of events are intermingled. Some of these might even be ad-hoc events for which we don't have a procedure on record as needing to be done.

So that the system can do automated checking of the logbook table, we keep glassroom_procedures:

Logbook entries can be made by human beings or robots. As the Glass Room is generally running on a geographically separate machine from the production servers, the robots will have to make their log entries via HTTP GET or POST.
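
As a rough sketch of what such a robot might do, the Python below builds a form-encoded HTTP POST using only the standard library. The URL and the form field names (`procedure_name`, `notes`) are assumptions made up for this example; the real Glass Room pages may expect something different.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_log_entry_request(url, procedure_name, notes):
    """Build (but do not send) a POST request recording one logbook entry."""
    # Field names here are hypothetical, not the actual Glass Room form fields.
    form = urlencode({"procedure_name": procedure_name, "notes": notes})
    return Request(
        url,
        data=form.encode("ascii"),  # urlencode output is plain ASCII
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_log_entry_request(
    "http://glassroom.example.com/glassroom/logbook-entry-add",  # hypothetical URL
    "verify backup tape",
    "tape readable",
)
print(req.get_method(), req.data)
```

A robot would then pass `req` to `urllib.request.urlopen` and treat any failure to reach the Glass Room as something worth reporting through another channel.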

Here's the data model for glassroom_logbook:

People can comment on logbook entries, but we just do this with the general_comments table.

Suggested Procedures

Check at least the following:

Domains

We don't want an unpaid InterNIC invoice rendering our service inaccessible to most users. So we keep track of all the domains on which our service depends, when they expire, who has paid the bill, and when the last bill was paid.

create table glassroom_domains (
	domain_name	varchar(50),	-- e.g., 'photo.net'
	last_paid	date,
	by_whom_paid	varchar(100),
	expires		date
);
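
One way to use this table is a periodic check for domains that are about to expire. The sketch below assumes the rows have already been fetched into Python dicts keyed by the column names above; the 60-day warning window and the sample dates are arbitrary illustrative choices, not part of Glass Room.

```python
from datetime import date, timedelta


def domains_needing_renewal(domains, today, warning_days=60):
    """Return (domain_name, expires) pairs expiring within warning_days.

    Already-expired domains are included. Rows whose `expires` is None are
    skipped. Results are ordered soonest-expiring first.
    """
    cutoff = today + timedelta(days=warning_days)
    due = [
        (row["domain_name"], row["expires"])
        for row in domains
        if row["expires"] is not None and row["expires"] <= cutoff
    ]
    return sorted(due, key=lambda pair: pair[1])


# Hypothetical rows; the expiration dates are invented for the example.
rows = [
    {"domain_name": "example.com", "expires": date(2000, 1, 15)},
    {"domain_name": "photo.net", "expires": date(1999, 5, 1)},
]
print(domains_needing_renewal(rows, today=date(1999, 4, 1)))
# prints [('photo.net', datetime.date(1999, 5, 1))]
```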

Bug Tracking, Feature Requests, and Tickets

In the tech world, people seem to like organizing things by trouble ticket:
  1. Joe Customer opens a ticket when he is unhappy about a bug on a page.
  2. If it is a high priority bug, a variety of folks get notified via email and maybe pager; if it is a low priority bug, it sits in the queue until someone notices.
  3. A coordinator assigns the bug to Jane Programmer, causing the system to send Jane email.
  4. Jane Programmer fixes the bug and records that fact, causing the system to send email to Joe Customer.

The same kind of interaction works well for feature requests, except that Jane Programmer might need to record the version number of the software that will incorporate the new feature.
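
The ticket life cycle described above can be modeled as a small state machine. The following Python is a minimal sketch, not the Glass Room data model: the `Ticket` fields, the status names, and the `outbox` list that stands in for real email are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    reporter: str
    summary: str
    priority: str  # "high" or "low"
    status: str = "open"
    assignee: str = ""
    fixed_in_release: str = ""
    # (recipient, message) pairs standing in for email and pager notices
    outbox: list = field(default_factory=list)


def open_ticket(reporter, summary, priority, on_call):
    """Step 1 and 2: open a ticket; only high-priority tickets notify anyone."""
    ticket = Ticket(reporter, summary, priority)
    if priority == "high":
        for person in on_call:
            ticket.outbox.append((person, "new high-priority ticket: " + summary))
    return ticket


def assign(ticket, programmer):
    """Step 3: the coordinator assigns the ticket and the programmer is told."""
    ticket.assignee = programmer
    ticket.status = "assigned"
    ticket.outbox.append((programmer, "assigned to you: " + ticket.summary))


def mark_fixed(ticket, fixed_in_release=""):
    """Step 4: record the fix (and optionally the release) and tell the reporter."""
    if ticket.status != "assigned":
        raise ValueError("only an assigned ticket can be marked fixed")
    ticket.status = "fixed"
    ticket.fixed_in_release = fixed_in_release
    ticket.outbox.append((ticket.reporter, "fixed: " + ticket.summary))


t = open_ticket("Joe Customer", "broken link on home page", "low", on_call=["coordinator"])
assign(t, "Jane Programmer")
mark_fixed(t, fixed_in_release="4.1")
print(t.status, [recipient for recipient, _ in t.outbox])
# prints fixed ['Jane Programmer', 'Joe Customer']
```

The optional `fixed_in_release` argument covers the feature-request case, where the programmer records which release will carry the change.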

So that the group can see whether everyone is working together effectively, the system can produce reports such as "average time to implement a requested feature", "response time for bugs arranged by the person who reported them", etc.
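
A report such as "average time to implement a requested feature" reduces to averaging the interval between opening and closing. The sketch below assumes each ticket is available as an (opened_at, closed_at) pair, with `None` for tickets that are still open; that representation and the sample timestamps are invented for the example.

```python
from datetime import datetime, timedelta


def average_time_to_close(tickets):
    """Return the mean open-to-close interval, or None if nothing is closed yet.

    tickets: iterable of (opened_at, closed_at) pairs; closed_at is None for
             tickets that are still open, and those are left out of the average
    """
    durations = [closed - opened for opened, closed in tickets if closed is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Hypothetical sample data.
feature_requests = [
    (datetime(1999, 3, 1), datetime(1999, 3, 3)),  # closed after 2 days
    (datetime(1999, 3, 2), datetime(1999, 3, 6)),  # closed after 4 days
    (datetime(1999, 3, 4), None),                  # still open
]
print(average_time_to_close(feature_requests))
# prints 3 days, 0:00:00
```

Grouping the pairs by reporter or assignee before calling the function would give the per-person variants of the report.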


philg@mit.edu