Linux Working Group



Invitees - Attendees

  • Adrian, Anthony (group leader), Clayton, Guoxiang, Lori, Fraser, Devon, Nathan, Nick, Todd, Dave, Lawrence, Omar

Review and accept previous meeting minutes.

Last meeting's tasks

  • Anthony - create a ticket for documentation of AD process -> assign to Clayton
    • Done. See RT: 1206465
  • Anthony - create a ticket to request Guoxiang to create a test volume on the NetApp
    • Done, NetApp volume mounted and being used for testing. See RTs: 1206057, 1197177
  • Anthony - create a ticket to document experiments with network file performance
    • Done, see RT: 1206057
  • Anthony/Adrian - work on new postfix recipe to have servers send mail out directly - RT#1204074
    • In progress
  • Anthony/Guoxiang - initiate warranty - RT#1196085 - under warranty until 2025-03-30
    • Stalled. Substantial progress on other hardware issues (replaced some RAM), see RT: 1205275
    • got two memory sticks from Dell and replaced, machine is now back in service
    • -004 still has a problem, but didn't get new memory sticks, not in the round-robin
    • are the errors related to load? Guoxiang will put it back in the round-robin to test
    • Guoxiang - put -002 back in round-robin
  • Clayton - decommission the older NFS / Ganesha servers
    • done
  • Clayton - document process of adding hosts to AD and move to a generally accessible place
    • in progress
  • Clayton - create NFS share /opt/csw - RT#1194157
    • New share is live, needs to be mounted on the client.
  • Devon - create ticket to document power issues to show Plant Operations - done? Anthony will follow up
    • ticket created?
    • any follow-up?
    • Dave says that 2+ years ago PlantOps said that's what Waterloo North provides
    • supports our requirements for UPS in our budget
    • consider our UPS evergreen cycle - currently 16 years, should it be lower? (after 3 battery replacements)
    • discussion about the maintenance costs of dirty power
    • Sample graph from PLG UPS
    • what are next steps?
      • prepare a document to be presented to CS Exec, who can then take it to PlantOps
      • Devon to create a ticket to start -> RT#1206808
      • Devon to provide power data, Dave to provide costs of bad power (from budget)
  • Guoxiang/Lori - create a ticket (or report its number) for the Storage option catalogue (low priority)
    • low priority, take off Action items?
  • Lawrence - discuss continued use of the NetApp with MFCF
    • discussion initiated. How long in total do we anticipate using it?
    • we don't know, MFCF asked/guessed this term + next term - would that be accurate?
    • sounds like a good upper-bound
    • lower bound - rest of term
    • can MFCF use their half independently of what CSCF does?
    • we have two disk shelves, they have four shelves, plus one shared shelf of SSD drives
    • need a discussion between Hari, Robyn, Jim?, Guoxiang and Dave
    • Odyssey database there? -> Guoxiang has a ticket for that - RT#1193614
  • Lawrence / RSG - update jerusalem and graceland to mount new NFS share - RT#1194157
  • Lawrence - create ticket for Devon to create Research HostGroups on icinga
    • done, Devon showed Lawrence how to create HostGroups and set up notifications
    • all Research HostGroups have been created and are working
    • Lawrence spent significant time researching and acknowledging all down systems, and created tickets where appropriate
    • please now address any Icinga Down notification - review, find/create a ticket and Acknowledge - don't leave it Down
    • Guoxiang notes that even when Scheduling Downtime, there are still notices sent out
      • Devon points out that you need to check the box to include all of the separate services
      • Nathan points out that you may wish to check "Flexible downtime"

New Agenda Items

What is our "beefiest" student machine?

  • 2004-010 ? 2004-002?

Polkit security vulnerability (CVE-2021-4034)

A check of package version:

# dpkg -s policykit-1 |grep Version

On a fixed machine this should give:

Ubuntu 21.10
policykit-1 - 0.105-31ubuntu0.1
Ubuntu 20.04
policykit-1 - 0.105-26ubuntu1.2
Ubuntu 18.04
policykit-1 - 0.105-20ubuntu0.18.04.6

Example of a 20.04 machine that is not fixed:
# dpkg -s policykit-1 |grep Version
Version: 0.105-26ubuntu1.1
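
The manual check above could be scripted along the following lines. This is only a sketch: the function names fixed_version_for and polkit_status are made up for illustration, the fixed versions are copied from the list above, and the comparison relies on dpkg --compare-versions, so it assumes a Debian/Ubuntu host.

```shell
#!/bin/sh
# Fixed policykit-1 version per Ubuntu release, as listed in these minutes.
# (fixed_version_for is a hypothetical helper name.)
fixed_version_for() {
    case "$1" in
        21.10) echo "0.105-31ubuntu0.1" ;;
        20.04) echo "0.105-26ubuntu1.2" ;;
        18.04) echo "0.105-20ubuntu0.18.04.6" ;;
        *)     return 1 ;;
    esac
}

# polkit_status RELEASE INSTALLED_VERSION
# Prints "fixed", "NOT FIXED", or "unknown release".
# (polkit_status is a hypothetical helper name.)
polkit_status() {
    want=$(fixed_version_for "$1") || { echo "unknown release"; return 2; }
    # dpkg applies Debian version ordering; "ge" means greater than or equal.
    if dpkg --compare-versions "$2" ge "$want"; then
        echo "fixed"
    else
        echo "NOT FIXED"
        return 1
    fi
}

# On a real host (assumes lsb_release is installed):
#   polkit_status "$(lsb_release -rs)" "$(dpkg-query -W -f='${Version}' policykit-1)"
```

Taking the release and the installed version as arguments keeps the comparison separate from the host lookup, so the same function could be fed versions collected from many machines.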

Homedir Service Alternative Testing - Determine test criteria (RT: 1206587) (Anthony)

  • RT#1206587
  • What tests do we need to run and why?
  • How will decisions be made based on test results? (Quantifiable metrics).
    • Will need to benchmark production volumes.
    • Anthony has a custom probe he's working on.
    • fio recommended by Fraser, for its ability to measure variance and simulate many users.
    • unscheduled downtime does not appear to be related to specific volumes.
    • Everyone, please think about this and document suggestions in RT: 1206587
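
As one possible starting point for the fio suggestion above, a job file simulating several concurrent users might look like the sketch below. Every value here is an assumption, not an agreed test criterion: the mount point /mnt/testvol/fio, the 70/30 read/write mix, the block size, the file size, the 16 jobs and the 60 second runtime are placeholders to be replaced once the criteria in RT: 1206587 are settled.

```shell
#!/bin/sh
# Write a fio job file; every value below is an assumed starting point.
JOB_FILE="${JOB_FILE:-homedir-sim.fio}"
TEST_DIR="${TEST_DIR:-/mnt/testvol/fio}"   # hypothetical mount point

cat > "$JOB_FILE" <<EOF
[global]
directory=$TEST_DIR
ioengine=psync
bs=4k
size=256m
runtime=60
time_based
group_reporting

[homedir-sim]
rw=randrw
rwmixread=70
numjobs=16
EOF

# Run it and keep machine-readable results (needs fio installed):
#   fio --output-format=json --output=homedir-sim.json "$JOB_FILE"
```

Here numjobs stands in for the number of simulated users, and group_reporting aggregates them into one result. The JSON output includes latency mean, standard deviation and percentiles, which could feed the quantifiable metrics asked for above.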

Upgrade of ganesha CephFS gateway bits to version 4.0 (RT: 1153244) (Lori)

Creation and Maintenance of Inventory records (Lawrence)

  • did a lot of cleanup and research when working on the list of down systems in Icinga
  • In general, need to keep better records, especially of servers in our machine rooms
  • yes, we have a lot of problems in general in inventory, but let's start with some of our most valuable assets - our cloud hosts
  • we have a good index of the Cloud hosts:
  • I took all of those hosts and put them in a spreadsheet
    • 2022-01-22 Cloud hosts.xlsx
    • I would like to see an Admin contact, Warranty Start/Stop for each, and a "Purpose" for each machine.
    • I have added a link to the Virtual Host Index in the Comments of each record. It might be better to have an Edocs link to "Cloud Hosts" that has more information and the link to the Virtual Host Index there

Include Hostnames in RTs (Lawrence)

Action Items for Next Meeting

  • Anthony/Adrian - work on new postfix recipe to have servers send mail out directly - RT#1204074
  • Clayton - document process of adding hosts to AD and move to a generally accessible place
  • Devon - prepare a document on power issues for CS Exec to take to Plant Operations (Devon to provide power data, Dave to provide costs of bad power) - RT#1206808
  • Guoxiang & Dave to discuss with Hari and Robyn maintenance of the NetApp
  • Lawrence / RSG - update jerusalem and graceland to mount new NFS share - RT#1194157
  • All - fix 5 inventory records and report next meeting on patterns of wrongness.
Topic revision: r9 - 2022-02-09 - LawrenceFolland
 