Linux Working Group
Meeting Date
Invited
Anthony (group leader), Lori, Dave, O, Clayton, Guoxiang, Nathan, Nick, Todd, Ed, Devon
Attendees
Anthony (group leader), Guoxiang, Nathan, Nick, Todd, Ed, Devon, Fraser
Review and accept previous meeting minutes.
CsLWGMeeting20231115
Review last meeting's Action Items
There is doubt among CSCF executive about the value of managing per-user quotas versus flagging excessive usage and relying on peer pressure
- Plan to implement and enforce quotas will move forward as per the Director
- Delayed due to preparations for the chilled water outage on Dec 6th.
100 GB quotas in a staged rollout; plan detailed in the tickets above
- Email sent, see RT#1288354
- Nick sent a second reminder in mid-October; no replies. A small group of students (5 or 6) have gone over quota during the grace period
- How do we monitor this monthly?
- Check with the SAT development team about the status of storing a storage quota in the appropriate sponsorship tables.
- Clayton has set all the "maxstorage" user entries in each domain to an initial 100,000,000B base amount.
- Note that in the long run (hopefully by Aug 2024) this will become a base amount plus additional SAT entry sponsorships, once that mechanism is established.
  * Build a data file that contains the user and summed quota information. Specifics to be worked out between Clayton and Nathan.
  * Nathan has been pulled off this
- Update the quota CephFS xattr on home directories (Nathan)
- Nathan has been pulled off this
- Tooling to implement quotas
- Calculate current quotas from sponsorship information, see RT#1298614 (Clayton)
- There appear to be 3 (teaching, non-course-account) users with sponsored quotas in excess of 100 GB
- May need to re-run that query, as sponsorships can change
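The quota tooling items above (summing a base amount plus sponsorships into a data file, finding users above the base, setting the CephFS quota xattr) could be sketched as follows. This is a minimal illustration under assumptions: a 100 GB (10^11 byte) base, sponsorships arriving as hypothetical `(user, extra_bytes)` pairs from a SAT export, and a hypothetical `user,bytes` CSV format; the actual file layout is still to be worked out between Clayton and Nathan. The `ceph.quota.max_bytes` xattr is the standard CephFS directory quota attribute; setting it requires Linux and suitable privileges.

```python
import csv
import os

# Assumption: 100 GB (decimal) base quota per user.
BASE_QUOTA_BYTES = 100 * 10**9


def summed_quotas(users, sponsorships, base=BASE_QUOTA_BYTES):
    """Return {user: base + sum of sponsored bytes}.

    sponsorships: iterable of (user, extra_bytes) pairs (hypothetical export format).
    """
    totals = {user: base for user in users}
    for user, extra_bytes in sponsorships:
        if user in totals:  # sponsorships for users not in the list are skipped
            totals[user] += extra_bytes
    return totals


def above_base(totals, base=BASE_QUOTA_BYTES):
    """Users whose summed quota exceeds the base (cf. the sponsorship query above)."""
    return sorted(user for user, total in totals.items() if total > base)


def write_quota_file(totals, path):
    """Write a 'user,bytes' CSV data file (hypothetical format)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for user in sorted(totals):
            writer.writerow([user, totals[user]])


def apply_cephfs_quota(home_dir, max_bytes):
    """Set the CephFS directory quota on a home directory (Linux only)."""
    os.setxattr(home_dir, "ceph.quota.max_bytes", str(max_bytes).encode())
```

Re-running `summed_quotas` from a fresh export each month would also keep the data file in step with changing sponsorships.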
CS mail servers are going away by Jan 1
- Provided Lori with a list of hosts still using them as a relay; will check next week for action
- Alias / vanity addresses will stop working
- csadviso@cs.uwaterloo.ca special forwarding will cease working; consulting with IST and Brad Lushman (a2brenna)
  * Nick is looking to replace the forwarding script with Microsoft Power Automate (M365)
Ongoing problems with Inventory and IPAM are hobbling Infrastructure operations (RT#1285291)
Will schedule some time to talk with Inventory team about the following (a2brenna)
- Inventory is unaware of the IP / domain limitations in IPAM, as well as the DHCP and MAC address requirements
- Invalid records were imported from Infoblox; they work until they are edited
- e.g. a public domain that resolves to a private IP: modifying such a record will break it
- Some CSCF staff do not have access to create manual DNS entries (Devon, Lori, Guoxiang, Todd have access. Dave?)
- Inventory bug: changing the room field on a record with IPAM DNS & DHCP causes DHCP to break
- Anthony to reach out to IST for clarification on whether this is a policy or a technological limitation.
- Delayed due to preparations for the chilled water outage on Dec 6th.
- Need CSCF management to take over this ticket
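One way to find the problem records described above (a public name pointing at a private IP) before someone edits them is to scan an export of DNS records. The sketch below assumes a hypothetical export of `(fqdn, ip)` pairs and a site-specific list of internal-only DNS suffixes; neither format is taken from Inventory, IPAM, or Infoblox. It uses the standard-library `ipaddress` module's `is_private` check.

```python
import ipaddress


def public_names_with_private_ips(records, internal_suffixes):
    """Return (name, ip) records whose IP is private but whose name is not under an internal zone.

    records: iterable of (fqdn, ip_string) pairs from a hypothetical DNS export.
    internal_suffixes: DNS suffixes treated as private-only (site-specific assumption).
    """
    suffixes = [s.rstrip(".").lower() for s in internal_suffixes]
    flagged = []
    for name, ip in records:
        fqdn = name.rstrip(".").lower()
        is_internal = any(fqdn == s or fqdn.endswith("." + s) for s in suffixes)
        if ipaddress.ip_address(ip).is_private and not is_internal:
            flagged.append((name, ip))
    return flagged
```

The flagged list could be attached to the ticket so those records are fixed deliberately rather than broken by an unrelated edit.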
What's still using old MySQL?
Nextcloud (Vault): pending migration
- Scheduling Nextcloud DB migration (Nathan, Fraser)
- Testing is complete; needs to be scheduled
Web server (includes Inventory)
- needs OS (whole LAMP stack) to be updated (Nathan, Isaac)
Retire CS-GENERAL and associated domain controllers
- Last user is Vault
- Vault upgrade needs to be performed: move the DB to MySQL 8, upgrade Nextcloud, then fix AD
- Vault migration from CS-GENERAL to GENERAL may take place after the upgrade. Nathan to determine which takes priority
- Why can't Vault switch domains to GENERAL? (a2brenna)
  * File space in Vault is mapped to the user's UUID. Clayton has provided a mapping from GENERAL to CS-GENERAL.
Ongoing problems with NFS-Ganesha server (RT#1303795)
- Does the monitoring service need further enhancements?
- Devon and Anthony to prepare documentation for the help desk
- More comprehensive monitoring of NFS performance is in the works (a2brenna, dmerner); target end of November
- Delayed due to preparations for the chilled water outage on Dec 6th.
Web service failure, RT#1304871 (https://rt.uwaterloo.ca/Ticket/Display.html?id=1304871)
- HAProxy hit the open file limit (4096 open file descriptors); the change made by a2brenna will not survive a reboot
  * Nathan set the HAProxy LXC container limit to 64k
- Migrate off Ubuntu 18.04 onto 20.04
- Multiple failures over multiple days
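Since the outage came from HAProxy exhausting its file-descriptor limit, a monitoring check could warn before the limit is reached. The sketch below reads the soft "Max open files" limit from Linux's `/proc/<pid>/limits` and counts entries in `/proc/<pid>/fd`; the 80% threshold and how the check would plug into the existing monitoring service are assumptions, not decisions from the meeting.

```python
import os


def max_open_files(limits_text):
    """Soft 'Max open files' limit from /proc/<pid>/limits text; None if unlimited."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line[len("Max open files"):].split()[0]
            return None if soft == "unlimited" else int(soft)
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid):
    """Return (open_fds, soft_limit) for a process; needs permission to read /proc/<pid>."""
    with open(f"/proc/{pid}/limits") as f:
        limit = max_open_files(f.read())
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, limit


def should_alert(open_fds, limit, threshold=0.8):
    """True when usage reaches the (hypothetical) 80% threshold of the soft limit."""
    return limit is not None and open_fds >= threshold * limit
```

Alerting on a fraction of the limit, rather than on failures, would have flagged the 4096 ceiling before connections started failing.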
Monitoring Services
- Number of false alerts is a concern.
- Lack of service maintenance outside standard working hours has become more of a problem lately.
- Management is aware and needs to review this.
More usage data needed for labs (Mac and Linux), RT#1284635 (https://rt.uwaterloo.ca/Ticket/Display.html?id=1284635)
New business
linux.cscf.uwaterloo.ca
- New linux.cscf.uwaterloo.ca running Ubuntu 22.04 is almost ready
- Needs authentication set up with 2FA
- Switchover in the first week of Jan
Incremental backups of block devices
- Possible solutions include rsync and Borg, but neither is ideal
- gxshen to investigate Legato NetWorker backups of block devices
* Add a dashboard to the screen outside Dave's office (Devon)
* Environmental monitoring needs to be plugged into the UPS
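For the incremental block-device backup discussion above, the core of a block-level incremental scheme is change detection: hash fixed-size blocks and copy only those that differ from the previous run's manifest. The sketch below shows just that step, assuming 4 MiB blocks and SHA-256; it is an illustration of the general technique, not a tool the group evaluated, and it does not cover storing manifests, copying the changed blocks, or reading from a consistent (quiesced or snapshotted) device.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # assumption: 4 MiB blocks


def block_hashes(f, block_size=BLOCK_SIZE):
    """Read a binary file-like object (device or image) and return one SHA-256 hex digest per block."""
    hashes = []
    while True:
        chunk = f.read(block_size)
        if not chunk:
            break
        hashes.append(hashlib.sha256(chunk).hexdigest())
    return hashes


def changed_blocks(previous, current):
    """Indices of blocks that differ from, or were absent in, the previous manifest."""
    return [i for i, h in enumerate(current) if i >= len(previous) or previous[i] != h]
```

Each run still reads the whole device to hash it, so this saves backup transfer and storage rather than read I/O.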
Snapshots are still disabled
* a2brenna to enable snapshots on a file system to test performance; hoping to be done before the start of next term
* Communication should be sent at the beginning of the term to inform users of the current status
Comments