Standardized Administration Tools (SAT) Accounts Database
This is an understanding of the Accounts Database and CSCF's redevelopment project.
Architecture
As of 2019, the system architecture includes data from three sources; at this point all are housed within the Odyssey postgres database. See
web diagram with architecture overview.
The three data sources are:
-
_quest
(imported to odyssey from the Registrar's office)
-
_watiam
(imported to odyssey from IST)
-
math_computing
(currently imported to odyssey from the sponsors database text files, via the sponsor_resources script) - see References for schema dump - to be replaced by a new database as described below.
The project's new database will be housed in postgres, accessed and modified by direct SQL queries and stored procedures. On top of this access, there will be command-line tools for database access and modification, as well as a web UI for access by a wider range of users.
Goals for database redesign
The following are goals for redesigning the Accounts database, which I'll refer to as "Version 1". We want to:
- revise the
math_computing
database schema, which is currently (Sep 2019) a 1:1 image of the sponsors files output, to allow us greater flexibility in accounts management
- eliminate manually handled updates for data which is automatically recorded in OAT/odyssey, such as course enrollments and watiam
- decide on a roadmap to incorporate other data sources to allow additional automation and efficiencies- either now, or eventually.
- where possible, keeping this data in the Odyssey postgres database;
- where not possible, possibly referencing this data in other sources.
- understand our work process, dropping data that is no longer necessary, which had been recorded from early days of the sponsors database design (see References below)
- understand our work process, in order to offer whatever data in whatever formats are necessary for database users to re-implement the accounts and system administration tools for CS.
- These include an API - via SQL - for access via python scripts.
- keep data structures useful for MFCF, if/when they switch to the same database. Robyn says that for now MFCF will keep using accounts_master/accounts_client for aspects such as check_equipment- which do not require a SQL database.
Remarks on math_computing
schema cleanup
Following are some remarks on the Version 1 (Sep 2019) schema and data, regarding pieces we might remove from the database entirely. These are suggested after examining both CSCF and MFCF data as well as
man sponsors(5)
on mfcf.math.
- In current data, some tables are redundant and/or stale in CS. Assumption: these can be dropped from the database.
-
.sponsor_netinfo
- used for check_equipment
, not used by CS and doesn't require SQL database for MFCF.
-
.sponsor_network
- same as above.
-
.sponsor_group
(descriptions of arg-fe, arg-windows-ts, other groups that look stale). MFCF uses for equipment listings, doesn't require SQL database.
-
.sponsor_userinfo
is redundant with odyssey _watiam
tables
- One table is used by the MFCF billing model; and not for CS
-
.sponsor_charge
has 26 rows with no foreign key Assumption: this isn't used, and could be removed
- Many tables have blank fields in both CS and MFCF. Assumption: these fields can be removed
-
.sponsor_computing.computing_description
, sponsor_mailalias.mailalias_description
, sponsor_printing.printing_description
, .sponsor_class.class_requirements
and .sponsor_class.class_instructor
are metadata described in man sponsors
that aren't used.
-
.sponsor_class.class_fee
was "lab fee" in the MFCF model; not used.
- There are systems for recording class load and enrollment. Assumption: the resource database is not the right place to put these, and could be removed?
-
.sponsor_class.class_enrollment
has 9 rows with non-null - only representing classes se112, se212, se382.
-
.sponsor_class.class_load
has 471 out of 959 classes filled in, with values distributed between ("low/moderate/high/heavy/???") - described as "estimated cpu load for the class"
-
.sponsor_member.userinfo_userid
is redundent. It is always NULL in CSCF data, and always equal to member_name in MFCF data.
- except: userinfo_userid is meant to REFERENCE sponsor_userinfo. So we will keep that column, fix the CSCF data, and remove member_name.
There are two fields which are used by the MFCF charging model but not for CSCF; we can preserve the fields and their data for MFCF use.
-
.sponsor_usage
table is not actively used for CS, but is for MFCF. It is used with =.sponsor_class.usage_id as a lookup table which may be "Unknown", "Research", "Administration", "Maintenance", "Teaching", or "[NULL]". We can preserve this table and dropdown for class definitions.
-
.sponsor_class.class_nosubsidy
is barely used by CS but is used for MFCF for some records to record the resources are subsidized by the Dean.
- For the record, CS has 958 valued 'F' and one value 'T' (named "Rshoshin001" with sponsor named "Shoshin Lab")
Remarks on potential design changes for Version 2
Following are remarks on design changes for Version 2
math_computing
schema.
sponsor_class and sponsor_member
If the above assumptions hold about deleting unused columns, sponsor_class simplifies to the following definition:
CREATE TABLE IF NOT EXISTS sponsor_class (
sponsor_id integer,
billcode_id integer,
class_id integer,
PRIMARY KEY (sponsor_id, billcode_id, class_id),
class_name text,
class_description text,
usage_id integer REFERENCES sponsor_usage,
class_nosubsidy boolean,
FOREIGN KEY (sponsor_id, billcode_id) REFERENCES sponsor_billcode
);
And sponsor_member simplifies by removing member_name:
CREATE TABLE sponsor_member (
sponsor_id integer,
billcode_id integer,
class_id integer,
member_id integer,
PRIMARY KEY (sponsor_id, billcode_id, class_id, member_id),
userinfo_userid text REFERENCES sponsor_userinfo,
member_starts date,
member_ends date,
FOREIGN KEY (sponsor_id, billcode_id, class_id) REFERENCES sponsor_class
);
Workflow for recording data
As of 2019-09, sponsorship data is translated from sponsors text files via apparently straightforward ways. Assuming the above assumptions are correct about data we can drop, a logical understanding of the existing wokflow performed by sponsor_resources is:
-
.sponsor_class
is populated from text file input - primarily from the RO class lists
-
.sponsor_member
associates userinfo with class, supplied from text file input - primarily from RO class lists
-
.sponsor_computing
, .sponsor_mailalias
, .sponsor_printing
are supplied from text file input
-
.sponsor_computing_group
associates computing with free-form named groups from text file input
-
.sponsor_billcode
associates a sponsor with free-form named billcodes (list of billcodes currently populated how?)
-
.sponsor_sponsor
is supplied from text file input - and is directly referenced by .sponsor_billcode
(and indirectly by nearly all of the above)
Going forward, we can update these via stored procedures within our database; possibly triggered when we have our daily updates to
_quest
.
We will have to automatically populate
.sponsor_class
and
.sponsor_member
with all classes and class-memberships identified from
_quest
; the proposed process is described below under "Groups definition".
"Membership in groups"
Broadly speaking we have a number of databases which actively record "membership in a group" where:
- "membership" is either direct or indirect, and grants privileges (such as resource allocation or access)
- "group" denotes such broad things as "employment in CS," "grad-student in the Math faculty," or "undergrad in another faculty taking a CS course"
These include the following data sources:
- sources driven by "odyssey":
- math_computing schema including
.sponsor_member
, .sponsor_group
, and .sponsor_computing_group
- _quest schema
- _watiam schema
- CS Active Directory records, which are generated from math_computing schema
- Standardized Administrative Tools (SAT) schema, currently called cs_people /RT#808697
- other sources on campus:
- workday employment records (which we theoretically could get direct API access to; and see just below about indirect grouper copies of this data)
- grouper records - including quest data for students, data copied from workday, and manually-updated records
- campus Active Directory - Nexus LDAP interface - including grouper data and lots of IST data.
For the rest of this document, I will only consider sources driven by odyssey.
Within the
Standardized Administration Tools Requirements, ("cs_people" schema) I have been working on a flexible definition for "group," described as follows.
From
the working schema:
CREATE TABLE group_group (
group_id text,
name text,
active_as_of daterange,
parent_id text REFERENCES group_group (group_id) ON DELETE SET NULL,
short_description text,
PRIMARY KEY (group_id)
);
Groups are referenced many:many by people, rooms, and other groups. They have date-ranges for when they are active, so that they can automatically expire. They have a text ID, short name, and description. They have a "parent group id" which, if not null, allows indirect associations.
A person can belong to any number of groups. Persons have direct membership in groups and they have indirect memberships, automatically, through parentage.
(TBD: why doesn't this include active_as_of daterange?... I expect it should - drallen 2019-09-25)
- "active_as_of" added to Version 2 as of 2019-10-18
CREATE TABLE person_group (
person_id integer REFERENCES _identity.person_person,
group_id text REFERENCES group_group,
PRIMARY KEY (person_id, group_id)
);
Rooms are mostly outside the scope of the accounts discussion, but for completeness: any room is also associated with any number of groups; this would be used for assigning labs, offices, and desks. As an example, "DC2510 belongs to the group CSCF and implicitly belongs to the group "CS".
CREATE TABLE room_group (
group_id text REFERENCES group_group,
active_as_of daterange,
room_id integer REFERENCES room_room,
PRIMARY KEY (room_id, group_id)
);
"Groups" definition
Compare the definition for
group_group
and
_math_computing.sponsor_class
; and
person_group
and
sponsor_member
.
We can eliminate the tables for
sponsor_class
and
sponsor_member
in favour of the more flexible groups, if:
We update
group_group
table to add columns for "sponsor_id", "billcode_id", "nosubsidy", and "usage".
We update
person_group
table to add columns for "sponsor_id", and "billcode_id".
- Accounts-system classes (
math_computing.sponsor_class
) are re-implemented as SAT "groups" with a value set for "sponsor_id" and "billcode_id".
- The
sponsor_
tables reference "group_id" instead of sponsor_class.class_id
- _quest data for class existence will be copied into the "group_group" table, inserting or replacing identically identified groups which have a sponsor_id and billcode_id.
- _quest data for class membership will automatically be copied into the "person_group" table, inserting or replacing existing data matching the same person and class
- manual additions to group memberships can be accomplished by adding them manually to "person_group".
-
person_group
replaces sponsor_member
as follows: class_id maps to group_id, member_id is a superfluous numeric primary key, userinfo_userid maps to person_id, and member_name is unnecessary because we have their identity in _identity (see above).
Implications:
- group memberships can be hierarchical- which makes it unnecessary to record computing sub-groups in
.sponsor_computing_group
since any group can have a parent.
- We can also have arbitrary sub-groups.
- With no database changes, resource allocations for printing or email could be recorded with sub-groups as well.
- See directly below for a convenience view for accessing computing group information
- groups and group memberships have a date-range for activity, allowing us to allocate resources for future terms more easily, and keep past data (with transaction logs of who made changes) without relying on comments in the source text-files.
- group "sponsorship" may be more generally useful than just for resource allocations - for example, room sponsorships would be built in with no additional changes; and we might consider research-group sponsorships in the same way.
- It could be argued that the sponsorships schema, as it involves people, groups (classes), and privileges of group memberships, is part of the "people" schema - perhaps we collapse the two schemas to one schema?... probably not named
cs_people
.
Defining nested computing groups
The
group_group
table allows arbitrarily nested groups. For convenience, we can use a sql VIEW which expands sponsored groups into a flat list of all nested sponsorships; and join this view to sponsor_computing to return sponsored computing records with nested sponsorships. This replaces
sponsor_computing_group
.
CREATE RECURSIVE VIEW group_group_root (group_id, parent_id, root_id) AS
(
select group_id, parent_id, group_id root_id from group_group
union all
select g.group_id, g.parent_id, r.root_id
from group_group g
inner join group_group_root r on r.group_id = g.parent_id
)
Usage:
-
select * on group_group JOIN group_group_root ON group_id where root_id = 'cs241_student';
-- returns group information associated with group cs241_student
and all of its children.
-
select * on sponsor_computing JOIN group_group_root ON group_id JOIN group_group ON group_id;
-- returns all sponsored computing records joined with all the hierarchical nested groups.
-
select * on sponsor_computing JOIN group_group_root ON group_id JOIN group_group ON (group_id) WHERE group_name = "cs241_student";
-- returns all sponsored computing records associated with group named cs241_student
and all of its children.
Schema Version 2
Following the above assumptions and logic, the Version 2 schema is updated in
commit 7cfbc0af.
Workflow for recording data
Once again assuming all the above assumptions are correct, following is a logical understanding of a new wokflow performed by the replacement for sponsor_resources:
The following workflows are automatically updated by quest. They will trigger stored procedures to update.
-
.group_group
(replacing .sponsor_class
) is populated automatically upon updates to _quest
- groups are created or updated to correspond to _quest classes.
- updates replace existing groups, with old group data stored in the transactions table for history.
- new groups automatically created may be created with a default sponsor_id and billcode_id of 0 to denote un-customized groups; not NULL which denotes no sponsorship. Or, a different default sponsor and billcode can be set for import (process TBD).
-
.person_group
(replacing .sponsor_member
) is populated automatically upon updates to _quest
, making associations between person_id and .sponsor_class
The following tables will be updated via web UI and potentially by command-line tools.
-
.sponsor_computing
, .sponsor_mailalias
, .sponsor_printing
(to add specific resources to persons) are updated via web UI
-
.group_group
automatically includes the parentage information previously recorded in .sponsor_computing_group
; (making sub-groups) will be updated via web UI
-
.sponsor_billcode
(list of billcodes) is updated via web UI
-
.sponsor_sponsor
(list of sponsors) is updated via web UI
Logic for User Access Levels related to Sponsorship (DRAFT)
Spent some time this afternoon considering how to meet Dave's request for filtering out bill codes, sponsors, and sponsorships a user shouldn't see. Access levels can depend on group memberships; I wrote up how to base this on a person's Primary Appointment (
here).
I don't think we need database additions to do sponsorship access levels. I need to map this out more comprehensively, but possibly:
- group visibility as above
- basic "sponsorship" visibility if you're a member of a group which is "rooted" in a "Sponsorship" group. (noting that groups can have multiple parent groups).
- specific visibility of sponsors if they are sponsors of any groups(classes) which are visible to you. So, you can see them if they sponsor anything you can see.
- basic resource visibility if they are associated with any groups(classes) which are visible to you.
Possibly there are root groups for "Sponsorship visibility" "Sponsorship creation" "Sponsorship updating" and "Sponsorship deleting".
Developer Install of SAT Accounts Database
- Start from https://git.uwaterloo.ca/cscf/people-app README to set up vagrant and virtualbox image.
- Under "Getting the Vagrant Box going", step 3, the running postgres has a hard-coded default password. Change it:
vagrant@ubuntu-bionic:~/people/schema$ sudo -u postgres psql
psql (10.12 (Ubuntu 10.12-0ubuntu0.18.04.1))
Type "help" for help.
postgres=# \password
Enter new password:
Enter it again:
postgres=# \q
vagrant@ubuntu-bionic:~$ psql -h 127.0.0.1 -p 5001 -U postgres postgres
Password for user postgres:
psql (10.12 (Ubuntu 10.12-0ubuntu0.18.04.1))
Type "help" for help.
postgres=#
- Under "Getting the Vagrant Box going", step 6, the schema and data are in two files supplied by Daniel.
-
people_quest_watiam_sample_data_combined.sql
including a subset of quest and watiam data, and the people schema
-
sponsor_data_combined.sql
including old-format sponsor data (pre-conversion).
- From within vagrant:
vagrant@ubuntu-bionic:~/people/schema$ sudo -u postgres psql -f people_quest_watiam_sample_data_combined.sql
vagrant@ubuntu-bionic:~/people/schema$ sudo -u postgres psql -f sponsor_data_combined.sql
- from within vagrant, fix postgres for external access:
- add to:
/etc/postgresql/10/main/pg_hba.conf
-
host all all 10.0.2.2/32 md5
- add to:
/etc/postgresql/10/main/postgresql.conf
-
service postresql restart
- From outside vagrant, you can now access postgres at port 50010 (as set in Vagrantfile):
-
psql -h localhost -U postgres postgres -p 50010
References
--
DanielAllen - 2019-09-03