Linux Working Group

Meeting Date

  • TEAMS: 2023-11-29


Anthony (group leader), Lori, Dave, O, Clayton, Guoxiang, Nathan, Nick, Todd, Ed, Devon


Anthony (group leader), Guoxiang, Nathan, Nick, Todd, Ed, Devon, Fraser

Review and accept previous meeting minutes.


Review last meeting's Action Items

Home directory quotas (a2brenna) - RTs: RT#1112506, RT#1288354, RT#1298614, et al.

"CSCF exec" has doubts about the value of managing per-user quotas versus flagging excessive usage and relying on peer pressure
  • Plan to implement and enforce quotas will move forward as per Director
  • Delayed due to preparations for chilled water outage on Dec 6th.

100GB quotas in staged roll out, plan detailed in tickets above

  • Email sent, see RT#1288354
    • Nick sent a second reminder in mid-October; no replies. A small group of students (5 or 6) have gone over quota during the grace period
    • How do we monitor this monthly?
  • Check with SAT development team about status of storing a storage quota in appropriate sponsorship tables.
    • Clayton has set all the "maxstorage" user entries in each Domain to an initial 100,000,000B base amount.
    • Note that in the long run (hopefully by Aug 2024) this ends up being a base amount plus additional SAT entry sponsorships, once that mechanism is established.
    • Build a data file that contains the user and summed quota information. Specifics to be worked out between Clayton and Nathan.
    • Nathan has been pulled off this
  • Update quota CephFS xattr on home directories (Nathan)
    • Nathan has been pulled off this
  • Tooling to implement quotas
    • Calculate current quotas from sponsorship information, see RT#1298614 (Clayton)
  • There appear to be 3 (teaching, non-course-account) users with sponsored quota in excess of 100GB
    • Maybe re-run that query, as sponsorships can change
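
A minimal sketch of the "base amount plus sponsorships" calculation and the CephFS update discussed above. The function names, the input shapes (a user list and (user, bytes) sponsorship pairs) and the 100 GB base constant are assumptions for illustration only; the real SAT tables and the data file format are still to be worked out between Clayton and Nathan. The only external detail relied on is that CephFS reads a directory's byte quota from the ceph.quota.max_bytes extended attribute.

```python
import os

# Assumed base quota: 100 GB, taken from the staged roll-out heading.
BASE_QUOTA_BYTES = 100 * 10**9


def summed_quotas(users, sponsorships, base=BASE_QUOTA_BYTES):
    """Return {user: base + sum of sponsored bytes} for every known user.

    `sponsorships` is a hypothetical iterable of (user, extra_bytes) pairs;
    entries for users not in `users` are ignored.
    """
    totals = {user: base for user in users}
    for user, extra_bytes in sponsorships:
        if user in totals:
            totals[user] += extra_bytes
    return totals


def over_quota(usage, quotas):
    """Return a sorted list of users whose usage (bytes) exceeds their quota."""
    return sorted(
        user for user, used in usage.items()
        if user in quotas and used > quotas[user]
    )


def apply_quota(homedir, quota_bytes):
    """Set the CephFS quota xattr on one home directory (Linux only)."""
    os.setxattr(homedir, "ceph.quota.max_bytes", str(quota_bytes).encode())
```

Running over_quota against a monthly usage dump would be one way to answer the monitoring question above; re-running summed_quotas first picks up any sponsorship changes.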

CS Mailservers are going away - by Jan 1

  • provided Lori with list of hosts still using it as a relay, will check next week for action
  • alias / vanity addresses will stop working
  • special forwarding will cease working, consulting with IST and Brad Lushman (a2brenna)
    • Nick is looking to replace the forwarding script with Microsoft Power Automate (M365)

Ongoing problems with Inventory and IPAM are hobbling Infrastructure operations - RT#1285291

Will schedule some time to talk with Inventory team about the following (a2brenna)
  • Inventory is unaware of the IP / domain limitations in IPAM, as well as the DHCP and MAC address requirements
    • Invalid records were imported from Infoblox that work until they are edited
    • e.g. a public domain that resolves to a private IP; modifying this record will break it
  • Some CSCF staff do not have access to create manual DNS entries (Devon, Lori, Guoxiang, Todd have access. Dave?)
  • Inventory bug: Changing the room field on a record with IPAM DNS & DHCP causes DHCP to break
  • Anthony to reach out to IST to clarify whether this is a policy or a technological limitation.
  • Delayed due to preparations for chilled water outage on Dec 6th.
  • Need CSCF management to take over this ticket

What's still using old MySQL?

NextCloud (Vault) pending migration

  • Scheduling Nextcloud DB migration (Nathan, Fraser)
  • Testing is complete - needs to be scheduled

Web server (includes Inventory)

  • needs OS (whole LAMP stack) to be updated (Nathan, Isaac)

Retire CS-GENERAL and associated domain controllers

  • Last user is Vault
    • Vault upgrade needs to be performed - move DB to MySQL 8, upgrade NextCloud, then fix AD
    • Vault migration from GENERAL to CS-GENERAL may take place after upgrade. Nathan to determine which is the priority
    • Why can't vault switch domains to GENERAL? (a2brenna) - File space in vault is mapped to the user's UUID. Clayton has provided a mapping from GENERAL to CS-GENERAL.

Ongoing problems with NFS ganesha server RT#1303795

  • needs further enhancements to monitoring service?
    • Devon and Anthony to prepare doc for help desk
    • More comprehensive monitoring of NFS performance is in the works (a2brenna, dmerner) ~ end November
    • Delayed due to preparations for chilled water outage on Dec 6th.

Web Service failure [RT#1304871]

  • HAProxy hit open file limit (4096 open file descriptors); the change made by a2brenna will not survive a reboot
    • Nathan set HAProxy LXC container limit to 64k
  • Migrate off 18.04 and on to 20.04
  • Multiple failures over multiple days
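
As an illustration of how the descriptor exhaustion above could be caught before it causes a failure, here is a rough sketch that compares open file descriptors against the soft limit. It inspects its own process only; the function names are hypothetical, and checking HAProxy itself would mean reading the equivalent /proc entries for its PID, which is not shown here.

```python
import os
import resource


def fd_headroom(open_fds, soft_limit):
    """Fraction of the descriptor limit still free (0.0 means exhausted)."""
    if soft_limit <= 0:
        raise ValueError("soft_limit must be positive")
    return max(0.0, 1.0 - open_fds / soft_limit)


def current_process_fds():
    """Return (open_fds, soft_limit) for this process (Linux only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft
```

A monitoring check could alert when fd_headroom drops below some threshold; the threshold itself would be a local policy choice.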

Monitoring Services

  • Number of false alerts is a concern.
  • Lack of Service Maintenance outside of standard working hours has been more of a problem lately.
    • Management is aware and needs to review this.

More usage data needed for labs (Mac and Linux) [RT#1284635]

New business

  • New running Ubuntu 22.04 is almost ready
    • Needs authentication set up with 2FA
    • Switch over first week of Jan

Incremental backups of block devices

  • Possible solutions include rsync and borg but neither is ideal
  • gxshen to investigate Legato NetWorker backups of block devices

Temperature monitoring for m3

  • add dashboard to screen outside Dave's office (devon)
  • env monitoring needs to be plugged into UPS

Snapshots are still disabled

  • a2brenna to enable snapshots on a file system to test performance - hoping to be done before start of next term
  • communication should be sent at the beginning of the term to inform users of the current status


Topic revision: r3 - 2023-11-29 - ToddLichty