CF Web>InternalProjects>StandardizedAdministrationTools>SATAccountsManagement>SATAccountsDatabase (2020-07-10, DanielAllen)

EditAttach

Standardized Administration Tools (SAT) Accounts Database

This is an understanding of the Accounts Database and CSCF's redevelopment project.

Standardized Administration Tools (SAT) Accounts Database

Architecture

As of 2019, the system architecture includes data from three sources; at this point all are housed within the Odyssey postgres database. See web diagram

with architecture overview.

The three data sources are:

_quest (imported to odyssey from the Registrar's office)
_watiam (imported to odyssey from IST)
math_computing (currently imported to odyssey from the sponsors database text files, via the sponsor_resources script) - see References for schema dump - to be replaced by a new database as described below.

The project's new database will be housed in postgres, accessed and modified by direct SQL queries and stored procedures. On top of this access, there will be command-line tools for database access and modification, as well as a web UI for access by a wider range of users.

Goals for database redesign

The following are goals for redesigning the Accounts database, which I'll refer to as "Version 1". We want to:

revise the math_computing database schema, which is currently (Sep 2019) a 1:1 image of the sponsors files output, to allow us greater flexibility in accounts management
eliminate manually handled updates for data which is automatically recorded in OAT/odyssey, such as course enrollments and watiam
decide on a roadmap to incorporate other data sources to allow additional automation and efficiencies- either now, or eventually.
- where possible, keeping this data in the Odyssey postgres database;
- where not possible, possibly referencing this data in other sources.
understand our work process, dropping data that is no longer necessary, which had been recorded from early days of the sponsors database design (see References below)
understand our work process, in order to offer whatever data in whatever formats are necessary for database users to re-implement the accounts and system administration tools for CS.
- These include an API - via SQL - for access via python scripts.
keep data structures useful for MFCF, if/when they switch to the same database. Robyn says that for now MFCF will keep using accounts_master/accounts_client for aspects such as check_equipment- which do not require a SQL database.

Remarks on `math_computing` schema cleanup

Following are some remarks on the Version 1 (Sep 2019) schema and data, regarding pieces we might remove from the database entirely. These are suggested after examining both CSCF and MFCF data as well as man sponsors(5) on mfcf.math.

In current data, some tables are redundant and/or stale in CS. Assumption: these can be dropped from the database.
- .sponsor_netinfo - used for check_equipment, not used by CS and doesn't require SQL database for MFCF.
- .sponsor_network - same as above.
- .sponsor_group (descriptions of arg-fe, arg-windows-ts, other groups that look stale). MFCF uses for equipment listings, doesn't require SQL database.
- .sponsor_userinfo is redundant with odyssey _watiam tables
One table is used by the MFCF billing model; and not for CS
- .sponsor_charge has 26 rows with no foreign key Assumption: this isn't used, and could be removed
Many tables have blank fields in both CS and MFCF. Assumption: these fields can be removed
- .sponsor_computing.computing_description, sponsor_mailalias.mailalias_description, sponsor_printing.printing_description, .sponsor_class.class_requirements and .sponsor_class.class_instructor are metadata described in man sponsors that aren't used.
- .sponsor_class.class_fee was "lab fee" in the MFCF model; not used.
There are systems for recording class load and enrollment. Assumption: the resource database is not the right place to put these, and could be removed?
- .sponsor_class.class_enrollment has 9 rows with non-null - only representing classes se112, se212, se382.
- .sponsor_class.class_load has 471 out of 959 classes filled in, with values distributed between ("low/moderate/high/heavy/???") - described as "estimated cpu load for the class"
- .sponsor_member.userinfo_userid is redundent. It is always NULL in CSCF data, and always equal to member_name in MFCF data.
  - except: userinfo_userid is meant to REFERENCE sponsor_userinfo. So we will keep that column, fix the CSCF data, and remove member_name.

There are two fields which are used by the MFCF charging model but not for CSCF; we can preserve the fields and their data for MFCF use.

.sponsor_usage table is not actively used for CS, but is for MFCF. It is used with =.sponsor_class.usage_id as a lookup table which may be "Unknown", "Research", "Administration", "Maintenance", "Teaching", or "[NULL]". We can preserve this table and dropdown for class definitions.
.sponsor_class.class_nosubsidy is barely used by CS but is used for MFCF for some records to record the resources are subsidized by the Dean.
- For the record, CS has 958 valued 'F' and one value 'T' (named "Rshoshin001" with sponsor named "Shoshin Lab")

Remarks on potential design changes for Version 2

Following are remarks on design changes for Version 2 math_computing schema.

sponsor_class and sponsor_member

If the above assumptions hold about deleting unused columns, sponsor_class simplifies to the following definition:

CREATE TABLE IF NOT EXISTS sponsor_class (
    sponsor_id                  integer,
    billcode_id                 integer,
    class_id                    integer,
    PRIMARY KEY (sponsor_id, billcode_id, class_id),

    class_name                  text,
    class_description           text,
    usage_id                    integer REFERENCES sponsor_usage,
    class_nosubsidy             boolean,

    FOREIGN KEY (sponsor_id, billcode_id) REFERENCES sponsor_billcode
);

And sponsor_member simplifies by removing member_name:

CREATE TABLE sponsor_member (
    sponsor_id                  integer,
    billcode_id                 integer,
    class_id                    integer,
    member_id                   integer,
    PRIMARY KEY (sponsor_id, billcode_id, class_id, member_id),

    userinfo_userid             text REFERENCES sponsor_userinfo,
    member_starts               date,
    member_ends                 date,

    FOREIGN KEY (sponsor_id, billcode_id, class_id) REFERENCES sponsor_class
);

Workflow for recording data

As of 2019-09, sponsorship data is translated from sponsors text files via apparently straightforward ways. Assuming the above assumptions are correct about data we can drop, a logical understanding of the existing wokflow performed by sponsor_resources is:

.sponsor_class is populated from text file input - primarily from the RO class lists
.sponsor_member associates userinfo with class, supplied from text file input - primarily from RO class lists
.sponsor_computing, .sponsor_mailalias, .sponsor_printing are supplied from text file input
.sponsor_computing_group associates computing with free-form named groups from text file input
.sponsor_billcode associates a sponsor with free-form named billcodes (list of billcodes currently populated how?)
.sponsor_sponsor is supplied from text file input - and is directly referenced by .sponsor_billcode (and indirectly by nearly all of the above)

Going forward, we can update these via stored procedures within our database; possibly triggered when we have our daily updates to _quest. We will have to automatically populate .sponsor_class and .sponsor_member with all classes and class-memberships identified from _quest; the proposed process is described below under "Groups definition".

"Membership in groups"

Broadly speaking we have a number of databases which actively record "membership in a group" where:

"membership" is either direct or indirect, and grants privileges (such as resource allocation or access)
"group" denotes such broad things as "employment in CS," "grad-student in the Math faculty," or "undergrad in another faculty taking a CS course"

These include the following data sources:

sources driven by "odyssey":
- math_computing schema including .sponsor_member, .sponsor_group, and .sponsor_computing_group
- _quest schema
- _watiam schema
- CS Active Directory records, which are generated from math_computing schema
- Standardized Administrative Tools (SAT) schema, currently called cs_people /RT#808697
other sources on campus:
- workday employment records (which we theoretically could get direct API access to; and see just below about indirect grouper copies of this data)
- grouper records - including quest data for students, data copied from workday, and manually-updated records
- campus Active Directory - Nexus LDAP interface - including grouper data and lots of IST data.

For the rest of this document, I will only consider sources driven by odyssey.

Within the Standardized Administration Tools Requirements, ("cs_people" schema) I have been working on a flexible definition for "group," described as follows. From the working schema:

CREATE TABLE group_group (
    group_id          text,
    name              text,
    active_as_of      daterange,
    parent_id         text REFERENCES group_group (group_id) ON DELETE SET NULL,
    short_description text,
    PRIMARY KEY (group_id)
);

Groups are referenced many:many by people, rooms, and other groups. They have date-ranges for when they are active, so that they can automatically expire. They have a text ID, short name, and description. They have a "parent group id" which, if not null, allows indirect associations.

A person can belong to any number of groups. Persons have direct membership in groups and they have indirect memberships, automatically, through parentage. (TBD: why doesn't this include active_as_of daterange?... I expect it should - drallen 2019-09-25)

"active_as_of" added to Version 2 as of 2019-10-18

CREATE TABLE person_group (
    person_id        integer REFERENCES _identity.person_person,
    group_id         text    REFERENCES group_group,
    PRIMARY KEY (person_id, group_id)
);

Rooms are mostly outside the scope of the accounts discussion, but for completeness: any room is also associated with any number of groups; this would be used for assigning labs, offices, and desks. As an example, "DC2510 belongs to the group CSCF and implicitly belongs to the group "CS".

CREATE TABLE room_group (
    group_id     text    REFERENCES group_group,
    active_as_of daterange,
    room_id      integer REFERENCES room_room,
    PRIMARY KEY (room_id, group_id)
);

"Groups" definition

Compare the definition for group_group and _math_computing.sponsor_class; and person_group and sponsor_member. We can eliminate the tables for sponsor_class and sponsor_member in favour of the more flexible groups, if:

We update group_group table to add columns for "sponsor_id", "billcode_id", "nosubsidy", and "usage". We update person_group table to add columns for "sponsor_id", and "billcode_id".

Accounts-system classes (math_computing.sponsor_class) are re-implemented as SAT "groups" with a value set for "sponsor_id" and "billcode_id".
The sponsor_ tables reference "group_id" instead of sponsor_class.class_id
_quest data for class existence will be copied into the "group_group" table, inserting or replacing identically identified groups which have a sponsor_id and billcode_id.
_quest data for class membership will automatically be copied into the "person_group" table, inserting or replacing existing data matching the same person and class
manual additions to group memberships can be accomplished by adding them manually to "person_group".
person_group replaces sponsor_member as follows: class_id maps to group_id, member_id is a superfluous numeric primary key, userinfo_userid maps to person_id, and member_name is unnecessary because we have their identity in _identity (see above).

Implications:

group memberships can be hierarchical- which makes it unnecessary to record computing sub-groups in .sponsor_computing_group since any group can have a parent.
- We can also have arbitrary sub-groups.
- With no database changes, resource allocations for printing or email could be recorded with sub-groups as well.
- See directly below for a convenience view for accessing computing group information
groups and group memberships have a date-range for activity, allowing us to allocate resources for future terms more easily, and keep past data (with transaction logs of who made changes) without relying on comments in the source text-files.
group "sponsorship" may be more generally useful than just for resource allocations - for example, room sponsorships would be built in with no additional changes; and we might consider research-group sponsorships in the same way.
It could be argued that the sponsorships schema, as it involves people, groups (classes), and privileges of group memberships, is part of the "people" schema - perhaps we collapse the two schemas to one schema?... probably not named cs_people.
- Perhaps named _people

Defining nested computing groups

The group_group table allows arbitrarily nested groups. For convenience, we can use a sql VIEW which expands sponsored groups into a flat list of all nested sponsorships; and join this view to sponsor_computing to return sponsored computing records with nested sponsorships. This replaces sponsor_computing_group.

CREATE RECURSIVE VIEW group_group_root (group_id, parent_id, root_id) AS
(      
	select group_id, parent_id, group_id root_id from group_group
	union all
	select g.group_id, g.parent_id, r.root_id
	from group_group g
	inner join group_group_root r on r.group_id = g.parent_id
)

Usage:

select * on group_group JOIN group_group_root ON group_id where root_id = 'cs241_student'; -- returns group information associated with group cs241_student and all of its children.
select * on sponsor_computing JOIN group_group_root ON group_id JOIN group_group ON group_id; -- returns all sponsored computing records joined with all the hierarchical nested groups.
select * on sponsor_computing JOIN group_group_root ON group_id JOIN group_group ON (group_id) WHERE group_name = "cs241_student"; -- returns all sponsored computing records associated with group named cs241_student and all of its children.

Schema Version 2

Following the above assumptions and logic, the Version 2 schema is updated in commit 7cfbc0af

The working schema

Workflow for recording data

Once again assuming all the above assumptions are correct, following is a logical understanding of a new wokflow performed by the replacement for sponsor_resources:

The following workflows are automatically updated by quest. They will trigger stored procedures to update.

.group_group (replacing .sponsor_class) is populated automatically upon updates to _quest
- groups are created or updated to correspond to _quest classes.
- updates replace existing groups, with old group data stored in the transactions table for history.
- new groups automatically created may be created with a default sponsor_id and billcode_id of 0 to denote un-customized groups; not NULL which denotes no sponsorship. Or, a different default sponsor and billcode can be set for import (process TBD).
.person_group (replacing .sponsor_member) is populated automatically upon updates to _quest, making associations between person_id and .sponsor_class

The following tables will be updated via web UI and potentially by command-line tools.

.sponsor_computing, .sponsor_mailalias, .sponsor_printing (to add specific resources to persons) are updated via web UI
.group_group automatically includes the parentage information previously recorded in .sponsor_computing_group; (making sub-groups) will be updated via web UI
.sponsor_billcode (list of billcodes) is updated via web UI
.sponsor_sponsor (list of sponsors) is updated via web UI

Logic for User Access Levels related to Sponsorship (DRAFT)

Spent some time this afternoon considering how to meet Dave's request for filtering out bill codes, sponsors, and sponsorships a user shouldn't see. Access levels can depend on group memberships; I wrote up how to base this on a person's Primary Appointment (here).

I don't think we need database additions to do sponsorship access levels. I need to map this out more comprehensively, but possibly:

group visibility as above
basic "sponsorship" visibility if you're a member of a group which is "rooted" in a "Sponsorship" group. (noting that groups can have multiple parent groups).
- specific visibility of sponsors if they are sponsors of any groups(classes) which are visible to you. So, you can see them if they sponsor anything you can see.
basic resource visibility if they are associated with any groups(classes) which are visible to you.

Possibly there are root groups for "Sponsorship visibility" "Sponsorship creation" "Sponsorship updating" and "Sponsorship deleting".

Developer Install of SAT Accounts Database

Start from https://git.uwaterloo.ca/cscf/people-app README to set up vagrant and virtualbox image.
Under "Getting the Vagrant Box going", step 3, the running postgres has a hard-coded default password. Change it:

   vagrant@ubuntu-bionic:~/people/schema$ sudo -u postgres psql 
   psql (10.12 (Ubuntu 10.12-0ubuntu0.18.04.1))
   Type "help" for help.
   postgres=# \password
   Enter new password:
   Enter it again:
   postgres=# \q
   vagrant@ubuntu-bionic:~$ psql -h 127.0.0.1 -p 5001 -U postgres postgres
   Password for user postgres:
   psql (10.12 (Ubuntu 10.12-0ubuntu0.18.04.1))
   Type "help" for help.
   postgres=#

Under "Getting the Vagrant Box going", step 6, the schema and data are in two files supplied by Daniel.
1. people_quest_watiam_sample_data_combined.sql including a subset of quest and watiam data, and the people schema
2. sponsor_data_combined.sql including old-format sponsor data (pre-conversion).
From within vagrant:

   vagrant@ubuntu-bionic:~/people/schema$ sudo -u postgres psql -f people_quest_watiam_sample_data_combined.sql
   vagrant@ubuntu-bionic:~/people/schema$ sudo -u postgres psql -f sponsor_data_combined.sql

from within vagrant, fix postgres for external access:
- add to: /etc/postgresql/10/main/pg_hba.conf
  - host all all 10.0.2.2/32 md5
- add to: /etc/postgresql/10/main/postgresql.conf
  - listen_addresses = '*'
- service postresql restart
From outside vagrant, you can now access postgres at port 50010 (as set in Vagrantfile):
- psql -h localhost -U postgres postgres -p 50010

References

http://www.math.uwaterloo.ca/~rblander/mfcf/old_site/resource.shtml
http://www.math.uwaterloo.ca/~rblander/mfcf/old_site/sponsorship.shtml
https://git.uwaterloo.ca/cscf/odyssey/blob/prod/project/math/sql/sponsor.sql : schema for _math_computing.sql as of 2019-09-26

-- DanielAllen - 2019-09-03

Topic revision: r19 - 2020-07-10 - DanielAllen

Information in this area is meant for use by CSCF staff and is not official documentation, but anybody who is interested is welcome to use it if they find it useful.