CsLWGMeeting20220504 (2022-05-04, LawrenceFolland)
Linux Working Group
AGENDA LOCKED
Meeting Date
TEAMS: 2022-05-04
Invitees - Attendees
Invited: Adrian, Anthony (group leader), Clayton, Guoxiang, Lori, Fraser, Devon, Nathan, Nick, Todd, Dave, Lawrence, Omar
Present: Adrian, Anthony (group leader), Clayton, Guoxiang, Lori, Fraser, Devon, Nathan, Todd, Lawrence, Omar
Review and accept previous meeting minutes.
CsLWGMeeting20220420
Review last meeting's Action Items
Anthony - will create a ticket for monitoring processor data
Devon and Anthony making progress
Combined general use cpu load graph (and possibly other metrics)
Clayton/Fraser - document process of adding hosts to AD and move to a generally accessible place (git) -> https://rt.uwaterloo.ca/Ticket/Display.html?id=1217894
waiting for Fraser to test it out on graphics lab machines
Dave - put up Beta version of new Virtual Host Index / Anthony to create a ticket - RT#1211603 -> working on it
not up yet
Lori - create ticket for Devon to create combined graph for DFSc
https://rt.uwaterloo.ca/Ticket/Display.html?id=1215768
ticket created for network bandwidth (RT#1215768)
fixed issue with averaging
still need to create ticket for "combined graph" - load? CPU? across all machines?
may need some definitions, but generally shows overall health of the Ceph cluster
Devon will create a combined graph of some type - can be refined
Total DFSc CPU monitoring usage added at the top of https://icinga.cscf.uwaterloo.ca/grafana/d/03EnhXZGz/dfsc-monitoring?orgId=1
Lawrence - follow-up with SuperMicro re: RT#1079451 - ubuntu1804-006 CPU hardware errors
Looks like the CPU is in transit (being shipped)
need to verify that - not clear to me - LF will follow up
Anthony - update the Purpose field of currently "unused" machines in the Virtual Host Index: some progress, need to make another pass.
progress was made, some more work to do
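The "combined graph" item above still needs a definition. One possible reading, offered only as a sketch, is the mean of per-host load at each timestamp, averaged over the hosts that actually reported at that time. The function name `combine_load` and the input layout (host name mapped to `(timestamp, load)` samples) are assumptions for illustration, not the format used by the existing Grafana dashboards.

```python
from collections import defaultdict
from statistics import mean


def combine_load(per_host):
    """Combine per-host load samples into one series.

    per_host: dict mapping a host name to a list of (timestamp, load) samples.
    Returns a list of (timestamp, mean_load) sorted by timestamp, where the
    mean is taken only over hosts that reported at that timestamp.
    """
    buckets = defaultdict(list)
    for samples in per_host.values():
        for timestamp, load in samples:
            buckets[timestamp].append(load)
    return [(ts, mean(values)) for ts, values in sorted(buckets.items())]


if __name__ == "__main__":
    # Hypothetical host names and values, for illustration only.
    samples = {
        "host-a": [(0, 1.0), (60, 3.0)],
        "host-b": [(0, 3.0)],
    }
    for ts, load in combine_load(samples):
        print(ts, load)
```

Averaging only over reporting hosts means a host with a missing sample does not pull the combined value toward zero; whether that is the desired behaviour is one of the definitions still to be settled.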
Proposed Agenda Items
proposed reboots coming up soon?
ideally Thursday after 5pm, will take a couple hours
machines/services involved? all (most) cloud machines - general use servers - (ubuntu1804*, ubuntu2004*)
vault and web services should be unaffected as machines will be rebooted one-at-a-time
*411 hosts outside of scope of this reboot, so MySQL services will be unaffected (and dependent apps)
Lawrence/Omar will send out a notice
again suggest updating all motd files - Anthony will do ASAP
kernel will be reverted to 5.4 on most machines (can be put back to 5.13, if necessary)
Consider btrfs usage? Testing/benchmarking. Positives: snapshots, compression, RAID0/1, volume management. Negatives: write hole with RAID5/6, code churn, stability - Nathan
do we use parity RAID? May not be an issue
tabled for next meeting
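The one-at-a-time reboot plan above can be sketched as a loop that reboots a host, waits until it answers again, and only then moves to the next one, so that at most one machine is down at any moment. This is a minimal illustration, not the procedure actually used: `reboot` and `is_up` are hypothetical callables (for example, wrappers around an ssh command and a reachability check), and the timeout and polling values are arbitrary assumptions.

```python
import time


def rolling_reboot(hosts, reboot, is_up, timeout=600.0, poll=5.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Reboot hosts one at a time, waiting for each to come back.

    reboot(host) triggers the reboot; is_up(host) returns True once the host
    is reachable again. Both are supplied by the caller (hypothetical here).
    Raises TimeoutError and stops at the first host that does not return
    within `timeout` seconds, leaving the remaining hosts untouched.
    """
    done = []
    for host in hosts:
        reboot(host)
        deadline = clock() + timeout
        while not is_up(host):
            if clock() >= deadline:
                raise TimeoutError(
                    f"{host} did not come back within {timeout}s; "
                    f"already rebooted: {done}"
                )
            sleep(poll)
        done.append(host)
    return done
```

Stopping at the first failure keeps the rest of the pool in service, which is the property that lets services spread across several machines stay available during the reboot window.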
Action Items for next meeting
Anthony - create dashboards for monitoring processor data
Clayton/Fraser - document process of adding hosts to AD and move to a generally accessible place (git) -> https://rt.uwaterloo.ca/Ticket/Display.html?id=1217894 - Fraser to test process on GL machines
Dave - put up Beta version of new Virtual Host Index / Anthony to create a ticket - RT#1211603 -> working on it
Lori - create ticket for Devon to create combined graph for DFSc
Lawrence - follow-up with SuperMicro re: RT#1079451 - ubuntu1804-006 CPU hardware errors - verify CPU being shipped
Anthony - update the Purpose field of currently "unused" machines in the Virtual Host Index: some progress, need to make another pass.
Nathan - create ticket for btrfs testing, especially for managing LXC root file size limits
Lawrence/Omar - send out notice regarding upcoming between-term reboots, Anthony - update relevant motd files
Topic revision: r4 - 2022-05-04 - LawrenceFolland
Information in this area is meant for use by CSCF staff and is not official documentation, but anybody who is interested is welcome to use it if they find it useful.