Standardized Administration Tools (SAT) Accounts Database

This is an understanding of the Accounts Database and CSCF's redevelopment project.

Architecture

We have invested time over Winter and Spring 2019 discussing the existing system architecture, which includes data from three sources; at this point all are housed within the Odyssey database. See web diagram with architecture overview.

The three data sources are:

  • _quest (imported to odyssey from the Registrar's office)
  • _watiam (imported to odyssey from IST)
  • math_computing (imported to odyssey from the sponsors database text files, via the sponsor_resources script) - see References for schema dump

Goals for database redesign

The following are goals for redesigning the Accounts database, which I'll refer to as "Version 1". We want to:
  • revise the math_computing database schema, which is currently (Sep 2019) a 1:1 image of the sponsors files output, to allow us greater flexibility in accounts management
  • eliminate manually handled updates for data which is automatically recorded in OAT/odyssey, such as course enrollments and watiam
  • decide on a roadmap to incorporate other data sources to allow additional automation and efficiencies- either now, or eventually.
    • where possible, keeping this data in the Odyssey postgres database;
    • where not possible, possibly referencing this data in other sources.
  • understand our work process, dropping data that is no longer necessary, which had been recorded from early days of the sponsors database design (see References below)
  • understand our work process, in order to offer whatever data in whatever formats are necessary for database users to re-implement the accounts and system administration tools for CS. (Requested by Dave: an API for access via python scripts; and some sort of database trigger so infrastructure scripts can be activated when needed rather than frequently polling)
  • keep data structures useful for MFCF, if/when they switch to the same database. Robyn says that for now MFCF will keep using accounts_master/accounts_client for aspects such as check_equipment- these do not require a SQL database.

Remarks on math_computing schema cleanup

Following are some remarks on the Version 1 (Sep 2019) schema and data, regarding pieces we might remove from the database entirely. These are suggested after examining both CSCF and MFCF data as well as man sponsors(5) on mfcf.math.
  • In current data, some tables are redundant and/or stale in CS. Assumption: these can be dropped from the database.
    • .sponsor_netinfo - used for check_equipment, not used by CS and doesn't require SQL database for MFCF.
    • .sponsor_network - same as above.
    • .sponsor_group (descriptions of arg-fe, arg-windows-ts, other groups that look stale). MFCF uses for equipment listings, doesn't require SQL database.
    • .sponsor_userinfo is redundant with odyssey _watiam tables
  • One table is used by the MFCF billing model; and not for CS
    • .sponsor_charge has 26 rows with no foreign key Assumption: this isn't used, and could be removed
  • Many tables have blank fields in both CS and MFCF. Assumption: these fields can be removed
    • .sponsor_computing.computing_description, sponsor_mailalias.mailalias_description, sponsor_printing.printing_description, .sponsor_class.class_requirements and .sponsor_class.class_instructor are metadata described in man sponsors that aren't used.
    • .sponsor_class.class_fee was "lab fee" in the MFCF model; not used.
  • There are systems for recording class load and enrollment. Assumption: the resource database is not the right place to put these, and could be removed?
    • .sponsor_class.class_enrollment has 9 rows with non-null - only representing classes se112, se212, se382.
    • .sponsor_class.class_load has 471 out of 959 classes filled in, with values distributed between ("low/moderate/high/heavy/???") - described as "estimated cpu load for the class"
    • .sponsor_member.userinfo_userid is redundent. It is always NULL in CSCF data, and always equal to member_name in MFCF data.
      • except: userinfo_userid is meant to REFERENCE sponsor_userinfo. So we will keep that column, fix the CSCF data, and remove member_name.

There are two fields which are used by the MFCF charging model but not for CSCF; we can preserve the fields and their data for MFCF use.

  • .sponsor_usage table is not actively used for CS, but is for MFCF. It is used with =.sponsor_class.usage_id as a lookup table which may be "Unknown", "Research", "Administration", "Maintenance", "Teaching", or "[NULL]". We can preserve this table and dropdown for class definitions.
  • .sponsor_class.class_nosubsidy is barely used by CS but is used for MFCF for some records to record the resources are subsidized by the Dean.
    • For the record, CS has 958 valued 'F' and one value 'T' (named "Rshoshin001" with sponsor named "Shoshin Lab")

Remarks on potential design changes for Version 2

Following are remarks on design changes for Version 2 math_computing schema.

sponsor_class and sponsor_member

If the above assumptions hold about deleting unused columns, sponsor_class simplifies to the following definition:
CREATE TABLE IF NOT EXISTS sponsor_class (
    sponsor_id                  integer,
    billcode_id                 integer,
    class_id                    integer,
    PRIMARY KEY (sponsor_id, billcode_id, class_id),

    class_name                  text,
    class_description           text,
    usage_id                    integer REFERENCES sponsor_usage,
    class_nosubsidy             boolean,

    FOREIGN KEY (sponsor_id, billcode_id) REFERENCES sponsor_billcode
);
And sponsor_member simplifies by removing member_name:
CREATE TABLE sponsor_member (
    sponsor_id                  integer,
    billcode_id                 integer,
    class_id                    integer,
    member_id                   integer,
    PRIMARY KEY (sponsor_id, billcode_id, class_id, member_id),

    userinfo_userid             text REFERENCES sponsor_userinfo,
    member_starts               date,
    member_ends                 date,

    FOREIGN KEY (sponsor_id, billcode_id, class_id) REFERENCES sponsor_class
);

Workflow for recording data

As of 2019-09, sponsorship data is translated from sponsors text files via apparently straightforward ways. Assuming the above assumptions are correct about data we can drop, a logical understanding of the existing wokflow performed by sponsor_resources is:
  • .sponsor_class is populated from text file input - primarily from the RO class lists
  • .sponsor_member associates userinfo with class, supplied from text file input - primarily from RO class lists
  • .sponsor_computing, .sponsor_mailalias, .sponsor_printing are supplied from text file input
  • .sponsor_computing_group associates computing with free-form named groups from text file input
  • .sponsor_billcode associates a sponsor with free-form named billcodes (list of billcodes currently populated how?)
  • .sponsor_sponsor is supplied from text file input - and is directly referenced by .sponsor_billcode (and indirectly by nearly all of the above)

Going forward, we can update these via stored procedures within our database; possibly triggered when we have our daily updates to _quest. We will have to automatically populate .sponsor_class and .sponsor_member with all classes and class-memberships identified from _quest; the exact process is TBD. See below for a proposal.

"Membership in groups"

Broadly speaking we have a number of databases which actively record "membership in a group" where:

  • "membership" is either direct or indirect, and grants privileges (such as resource allocation or access)
  • "group" denotes such broad things as "employment in CS," "grad-student in the Math faculty," or "undergrad in another faculty taking a CS course"

These include the following data sources:

  • sources driven by "odyssey":
    • math_computing schema including .sponsor_member, .sponsor_group, and .sponsor_computing_group
    • _quest schema
    • _watiam schema
    • CS Active Directory records, which are generated from math_computing schema
    • Standardized Administrative Tools (SAT) schema, currently called cs_people /RT#808697
  • other sources on campus:
    • workday employment records (which we theoretically could get direct API access to; and see just below about indirect grouper copies of this data)
    • grouper records - including quest data for students, data copied from workday, and manually-updated records
    • campus Active Directory - Nexus LDAP interface - including grouper data and lots of IST data.

For the rest of this document, I will only consider sources driven by odyssey.

Within the Standardized Administration Tools Requirements, ("cs_people" schema) I have been working on a flexible definition for "group," described as follows. From the working schema:

CREATE TABLE group_group (
    group_id          text,
    name              text,
    active_as_of      daterange,
    parent_id         text REFERENCES group_group (group_id) ON DELETE SET NULL,
    short_description text,
    PRIMARY KEY (group_id)
);

Groups are referenced many:many by people, rooms, and other groups. They have date-ranges for when they are active, so that they can automatically expire. They have a text ID, short name, and description. They have a "parent group id" which, if not null, allows indirect associations.

A person can belong to any number of groups. Persons have direct membership in groups and they have indirect memberships, automatically, through parentage. (TBD: why doesn't this include active_as_of daterange?... I expect it should - drallen 2019-09-25)

  • "active_as_of" added to Version 2 as of 2019-10-18
CREATE TABLE person_group (
    person_id        integer REFERENCES _identity.person_person,
    group_id         text    REFERENCES group_group,
    PRIMARY KEY (person_id, group_id)
);

Rooms are mostly outside the scope of the accounts discussion, but for completeness: any room is also associated with any number of groups; this would be used for assigning labs, offices, and desks. As an example, "DC2510 belongs to the group CSCF and implicitly belongs to the group "CS".

CREATE TABLE room_group (
    group_id     text    REFERENCES group_group,
    active_as_of daterange,
    room_id      integer REFERENCES room_room,
    PRIMARY KEY (room_id, group_id)
);

"Groups" proposal

Compare the definition for group_group and _math_computing.sponsor_class; and person_group and sponsor_member. We can eliminate the tables for sponsor_class and sponsor_member in favour of the more flexible groups, if:

We update group_group table to add columns for "sponsor_id", "billcode_id", "nosubsidy", and "usage". We update person_group table to add columns for "sponsor_id", and "billcode_id".

  • Accounts-system classes (math_computing.sponsor_class) are re-implemented as SAT "groups" with a value set for "sponsor_id" and "billcode_id".
  • The sponsor_ tables reference "group_id" instead of sponsor_class.class_id
  • _quest data for class existence will be copied into the "group_group" table, inserting or replacing identically identified groups which have a sponsor_id and billcode_id.
  • _quest data for class membership will automatically be copied into the "person_group" table, inserting or replacing existing data matching the same person and class
  • manual additions to group memberships can be accomplished by adding them manually to "person_group".
  • person_group replaces sponsor_member as follows: class_id maps to group_id, member_id is a superfluous numeric primary key, userinfo_userid maps to person_id, and member_name is unnecessary because we have their identity in _identity (see above).

Implications:

  • group memberships can be hierarchical- which makes it unnecessary to record computing sub-groups in .sponsor_computing_group since any group can have a parent.
    • We can also have arbitrary sub-groups.
    • With no database changes, resource allocations for printing or email could be recorded with sub-groups as well.
  • groups and group memberships have a date-range for activity, allowing us to allocate resources for future terms more easily, and keep past data (with transaction logs of who made changes) without relying on comments in the source text-files.
  • group "sponsorship" may be more generally useful than just for resource allocations - for example, room sponsorships would be built in with no additional changes; and we might consider research-group sponsorships in the same way.
  • It could be argued that the sponsorships schema, as it involves people, groups (classes), and privileges of group memberships, is part of the "people" schema - perhaps we collapse the two schemas to one schema?... probably not named cs_people.
    • Perhaps named _people

Schema Version 2

Following the above assumptions and logic, the Version 2 schema is updated in commit 7cfbc0af.

Workflow for recording data

Once again assuming all the above assumptions are correct, following is a logical understanding of a new wokflow performed by the replacement for sponsor_resources:

The following workflows are automatically updated by quest. They will trigger stored procedures to update.

  • .group_group (replacing .sponsor_class) is populated automatically upon updates to _quest
    • groups are created or updated to correspond to _quest classes.
    • updates are clones of existing groups, with changes, and with updates to the existing and new groups' active_as_of dateranges. (?)
    • new groups are created with a default sponsor_id and billcode_id. (?)
  • .person_group (replacing .sponsor_member) is populated automatically upon updates to _quest, making associations between person_id and .sponsor_class

The following tables will be updated via web UI and potentially by command-line tools.

  • .sponsor_computing, .sponsor_mailalias, .sponsor_printing (to add specific resources to persons) are updated via web UI
  • .group_group automatically includes the parentage information previously recorded in .sponsor_computing_group; (making sub-groups) will be updated via web UI
  • .sponsor_billcode (list of billcodes) is updated via web UI
  • .sponsor_sponsor (list of sponsors) is updated via web UI

References

-- DanielAllen - 2019-09-03

Topic revision: r13 - 2019-10-31 - DanielAllen
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback