Linux Working Group



Meeting Date

  • TEAMS: 2023-10-04

Invited

Anthony (group leader), Lori, Dave, O, Clayton, Guoxiang, Nathan, Nick, Todd, Ed

Attendees

Anthony (group leader), Dave, Devon, Clayton, Fraser, Guoxiang, Nick, Steve

Review and accept previous meeting minutes.

CsLWGMeeting20230920

Urgent Business

New Linux Vulnerability (Looney Tunables)

Systems must be rebooted after patching, RT#1304195

Review last meeting's Action Items

Home directory quotas (a2brenna) - RTs: RT#1112506, RT#1288354, RT#1298614, et al

There is "CSCF exec" doubt about the value of managing per-user quotas versus flagging excessive usage and applying peer pressure.

100GB quotas in a staged rollout; plan detailed in the tickets above

  • Email sent, see RT#1288354
    • Nick to send second reminder in mid Oct (two weeks before Nov 1)
  • Check with SAT development team about status of storing a storage quota in appropriate sponsorship tables.
    • Clayton will work on setting all the "maxstorage" user entries in each Domain to an initial 100,000,000B base amount.
    • Note that in the long run (hopefully by Aug 2024) this ends up being a base amount plus additional SAT entry sponsorships once that mechanism is established.
    • Build a data file that contains the user and summed quota information. Specifics to be worked out between Clayton and Nathan.
  • Update quota CephFS xattr on home directories (Nathan)
  • Has this default quota been implemented for new accounts / accounts under 100GB?
    • Progress?
  • Tooling to implement quotas
    • Calculate current quotas from sponsorship information, see RT#1298614 (Clayton)
  • There appear to be 3 (teaching, non-course-account) users with sponsored quota in excess of 100GB
    • Maybe re-run that query as sponsorships can change
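
The "base amount plus sponsorships" calculation and the CephFS xattr update discussed above could be sketched roughly as follows. This is only an illustration, not the tooling tracked in RT#1298614: the input format (a list of usernames and a list of (username, bytes) sponsorship pairs), the function names, and the choice of decimal gigabytes for the 100GB base are all assumptions. The only name taken from CephFS itself is the ceph.quota.max_bytes extended attribute.

```python
from collections import defaultdict

# Assumption: "100GB" is read as decimal gigabytes; adjust if the
# sponsorship tables store quotas in other units.
BASE_QUOTA_BYTES = 100 * 10**9


def summed_quotas(users, sponsorships, base=BASE_QUOTA_BYTES):
    """Return {user: base + sum of that user's sponsored bytes}.

    users: iterable of usernames.
    sponsorships: iterable of (username, sponsored_bytes) pairs
        (hypothetical format, not the real SAT schema).
    Sponsorships for users not listed in `users` are ignored.
    """
    extra = defaultdict(int)
    for user, sponsored_bytes in sponsorships:
        extra[user] += sponsored_bytes
    return {user: base + extra[user] for user in users}


def over_base(quotas, base=BASE_QUOTA_BYTES):
    """List users whose total quota exceeds the base amount, sorted by name."""
    return sorted(user for user, total in quotas.items() if total > base)


def setfattr_command(home_dir, quota_bytes):
    """Build (but do not run) the setfattr call that sets a CephFS quota."""
    return [
        "setfattr",
        "-n", "ceph.quota.max_bytes",
        "-v", str(quota_bytes),
        home_dir,
    ]
```

Building the command as a list rather than running it keeps the sketch safe to try anywhere; the output can be reviewed or written to the data file before anything is applied to real home directories. Because sponsorships can change, the summed values would need to be regenerated before each run.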

Regarding URAs

Does SAT sponsorship contain URAs?

Do we have a Domain "netgroup" that contains URAs? (Clayton? )

CS Mailservers are going away

No update

Ongoing problems with Inventory and IPAM are hobbling Infrastructure operations - RT#1285291

Will schedule some time to talk with Inventory team about the following (a2brenna)
  • Inventory is unaware of the IP / domain limitations in IPAM, as well as the DHCP and MAC address requirements
  • Some CSCF do not have access to create manual DNS entries (Devon, Lori, Guoxiang, Todd have access. Dave?)
  • Inventory bug: Changing room field on a record with IPAM DNS & DHCP causes DHCP to break
  • Anthony to reach out to IST to clarify whether this is a policy or a technological limitation.
  • Invalid records were imported from Infoblox that work until they are edited

What's still using old MySQL?

NextCloud (Vault) pending migration

  • Scheduling Nextcloud DB migration ~21st (Nathan, Fraser)
    • Complete? Problems?

Web server (includes Inventory)

  • needs OS (whole LAMP stack) to be updated

NFS ganesha server needed rebooting on Monday, RT#1303795

The NFS Ganesha server needed to be rebooted; this is similar to the CephFS lock-up on the web server.
  • needs further enhancements to monitoring service?
  • add tests/checks for mounts
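
A mount check along the lines suggested above might look like the following sketch. It is an assumption-laden illustration, not an existing check: the function name, the probe script, and the 5-second default timeout are made up here. The return values follow the usual monitoring plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN). The probe runs in a child process because a stat on a hung NFS or CephFS mount can block indefinitely, and the parent must still be able to report the failure.

```python
import subprocess
import sys

OK, CRITICAL, UNKNOWN = 0, 2, 3

# Small probe run in a child process: exits 1 if the path is not a
# mount point, otherwise touches the filesystem with statvfs().
_PROBE = (
    "import os, sys\n"
    "p = sys.argv[1]\n"
    "if not os.path.ismount(p):\n"
    "    sys.exit(1)\n"
    "os.statvfs(p)\n"
)


def check_mount(path, timeout=5.0):
    """Return (status_code, message) for a mount point.

    CRITICAL if the path is not mounted or the probe does not finish
    within `timeout` seconds; OK if it is mounted and responsive.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", _PROBE, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: a probe stuck in uninterruptible sleep may
        # not die, so do not wait for it after sending the kill.
        proc.kill()
        return CRITICAL, f"CRITICAL - {path} did not respond within {timeout}s"
    if rc == 0:
        return OK, f"OK - {path} is mounted and responsive"
    if rc == 1:
        return CRITICAL, f"CRITICAL - {path} is not a mount point"
    return UNKNOWN, f"UNKNOWN - probe for {path} exited with status {rc}"
```

A wrapper script would print the message and exit with the status code, once per mount that the web server or NFS Ganesha host depends on.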

New Items

Webserver failure, RT#1304121

Cephfs timeout evicted the web server? Icinga did alert on this. (It checks both accuracy and response time.) See

Web page showing status of linux.student.cs hosts that students and course staff can check RT#1279831

  • Requested by CS136 staff due to how many linux.student.cs issues we had in W23 term
    • CS 136 will have ~1000 users
    • Devon will produce dedicated Grafana page (dashboard) for this.

Ticket hygiene

Many people are not "closing" tickets.
  • Anthony will mention at next CSCF group meeting.

Monitoring Services

  • Number of false alerts is a concern.
  • Lack of Service Maintenance outside of standard working hours has been more of a problem lately.
    • Management will need to review this.
  • For NetTops monitoring, Devon and Anthony are putting something together.
  • Web servers need to have their network storage access monitored
    • Check the URLs for Web monitoring dashboards in the "Webserver failure" item above to see if this is still true.
    • MFCF is using Loki to analyse and alert based on system log entries.
      • MFCF has noted that IST pen testing can cause false positives.


Topic revision: r5 - 2023-10-26 - DaveGawley
 