Start
Ubuntu Inotify Tuning Demo 20191025
Or how something seemingly perplexing can devolve into something trivial
I almost cancelled this as being too trivial. However, in part of it
we can see how some of the LXC creators seemed to be similarly confused
about some of the details here.
00-intro
Linux systems have /proc/sys
Ubuntu systems have /etc/sysctl.conf
-rw-r--r-- 1 root root 3751 Sep 27 13:07 /etc/sysctl.conf
Entries like the following:
kernel.pty.max=32768
cause, at boot time...
cscf-adm@xsbook7:~% grep '^' /proc/sys/kernel/pty/max
32768
cscf-adm@xsbook7:~%
A.B.C.D => /proc/sys/A/B/C/D
Lots of parameters affect obscure details of performance (and limits).
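For the record, sysctl(8) drives the same knobs as echoing into /proc/sys;
a sketch (not captured output), reusing the value from above:
sysctl -w kernel.pty.max=32768      # same effect as the echo below
echo 32768 > /proc/sys/kernel/pty/max
sysctl kernel.pty.max               # prints: kernel.pty.max = 32768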
#10-lxc+inotify #95-further
./qd-stop-demos
Two other windows, both
root@xsbook7:/home/cscf-adm/demo-20191025
10-lxc+inotify
"lxc" is a containerization suite that CSCF uses.
https://linuxcontainers.org/
- create "fake machines", e.g. even on your own workstation/laptop
I found I could not usefully run more than about 7 lxc containers
simultaneously on my 16G workstation.
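For the curious, creating and starting one such trivial container goes
roughly like this (the container name is my guess at the demo's
u????tunedemo naming pattern; the download template is stock lxc):
lxc-create -n u0001tunedemo -t download -- --dist ubuntu --release bionic --arch amd64
lxc-start -n u0001tunedemo
lxc-ls -f     # shows state and (eventually) the 10.x address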
https://github.com/lxc/lxd/blob/master/doc/production-setup.md
(seemingly part of source documentation for lxd)
(lxd is the would-be successor to lxc)
points to "ls -ld /proc/sys/*/inotify/*"
-rw-r--r-- 1 root root 0 Oct 17 12:02 /proc/sys/fs/inotify/max_queued_events
-rw-r--r-- 1 root root 0 Oct 17 12:02 /proc/sys/fs/inotify/max_user_instances
-rw-r--r-- 1 root root 0 Oct 17 12:02 /proc/sys/fs/inotify/max_user_watches
(However, some of the details there seem to be wrong.)
My laptop (its 12G of RAM may be relevant) now uses something like...
cscf-adm@xsbook7:~% grep '^' /proc/sys/*/inotify/*
/proc/sys/fs/inotify/max_queued_events:262144
/proc/sys/fs/inotify/max_user_instances:131072
/proc/sys/fs/inotify/max_user_watches:196608
cscf-adm@xsbook7:~%
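For reference, per inotify(7): max_user_instances limits inotify instances
(inotify_init calls) per UID; max_user_watches limits total watches per UID;
max_queued_events limits the event queue length per instance.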
#20-demo01 <prev>
grep '^' /proc/sys/*/inotify/*
20-demo01
To demonstrate the problem containers run into, I use the following
to go back to the default state...
root@xsbook7:~# cat /home/cscf-adm/demo-20191025/make-default-inotify
#!/bin/bash
echo 16384 > /proc/sys/fs/inotify/max_queued_events
echo 128 > /proc/sys/fs/inotify/max_user_instances
echo 8192 > /proc/sys/fs/inotify/max_user_watches
root@xsbook7:~#
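(The show-inotify script used below is not reproduced in this demo;
my guess is that it is just:
#!/bin/bash
grep '^' /proc/sys/*/inotify/*
)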
#30-demo05-startlxc <prev>
./make-default-inotify
./show-inotify
30-demo05-startlxc
Now, after setting the parameters to default, I use the following to start
all my 41 trivial demo containers...
root@xsbook7:~# cat /home/cscf-adm/demo-20191025/start-all-demos
#!/bin/bash
# Takes 30 seconds to run, maybe...
# must be superuser
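# Select the u????tunedemo containers, skipping the one whose name ends in 00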
CONTAINERS=`lxc-ls -f | grep '^u....tunedemo' | awk '{print $1}' | grep -v '00$' `
for c in $CONTAINERS ; do
echo $c
lxc-start -n $c
done
root@xsbook7:~#
#40-demo10-showlxc <prev>
./start-all-demos
./show-all-demos
./show-all-demos
./show-all-demos
./qd-show-hung-demos
40-demo10-showlxc
Very sad: we wait and wait, but a lot of containers fail to get
an IP address. We detect that with the following command...
root@xsbook7:~# cat /home/cscf-adm/demo-20191025/qd-show-hung-demos
#!/bin/bash
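# RUNNING containers whose lxc-ls line shows no 10.x IPv4 address yet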
lxc-ls -f | grep RUNNING | grep -v ' 10[.]'
root@xsbook7:~#
#45-demo12-restart <prev>
./show-all-demos
./qd-show-hung-demos
./qd-show-hung-demos
./qd-show-hung-demos
45-demo12-restart
Let's try restarting all the hung containers--but it won't totally help.
(In my case, though, it did let me demonstrate a red herring.)
Use the following script...
root@xsbook7:~# cat ./qd-restart-demos
#!/bin/bash
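# Names of RUNNING containers that have no 10.x address yet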
STALLED=`lxc-ls -f | grep RUNNING | grep -v ' 10[.]' | awk '{print $1}' `
for c in $STALLED; do
echo $c
lxc-stop --timeout 2 -n $c
lxc-start -n $c
done
root@xsbook7:~#
#47-demo14-restart <prev>
47-demo14-restart
./qd-restart-demos
./show-all-demos
./show-all-demos
./qd-show-hung-demos
./qd-show-hung-demos | wc
Oh well. One or two containers might advance.
If we are lucky I can show you the "too many open files" diagnostic
from "tail". But that doesn't seem to happen in a live demonstration!
(Red herring: "open files" here refers to failed attempts to create inotify instances.)
You can set /proc/sys/fs/file-max as high as you want and it won't help.
cscf-adm@xsbook7:~$ grep -H '^' /proc/sys/fs/file-max
/proc/sys/fs/file-max:2317350
cscf-adm@xsbook7:~$
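One way to confirm what is really exhausted: inotify instances show up as
file descriptors whose symlink target is "anon_inode:inotify", so a rough
count (as root, assuming GNU find) is:
find /proc/*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l
Compare that count against max_user_instances, not file-max.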
#50-demo15-fixit <prev>
tail -f /var/log/syslog
50-demo15-fixit
So, having found the hints at
https://github.com/lxc/lxd/blob/master/doc/production-setup.md
I use a conservative version of them...
(Actually, while creating this demo I determined that
max_user_instances is the only crucially important one.)
root@xsbook7:~# cat /home/cscf-adm/demo-20191025/make-good-inotify
#!/bin/bash
#https://github.com/lxc/lxd/blob/master/doc/production-setup.md
# says use 1048576 = (1024*1024) for all three.
# That seems sloppy.
# In tests, 320 for max_user_instances seemed minimally adequate.
# 262144 for all seemed (more than) adequate.
# max_queued_events > max_user_watches > max_user_instances
#echo 262144 > /proc/sys/fs/inotify/max_queued_events
echo 360 > /proc/sys/fs/inotify/max_user_instances
#echo 196608 > /proc/sys/fs/inotify/max_user_watches
root@xsbook7:~#
The above action will not immediately fix the problem.
That is, the containers will not spontaneously unlock.
#60-demo20-restart <prev>
./make-good-inotify
./show-inotify
./show-all-demos
./qd-show-hung-demos
./qd-show-hung-demos | wc
60-demo20-restart
So restart all hung containers.
root@xsbook7:~# cat /home/cscf-adm/demo-20191025/qd-restart-demos
#!/bin/bash
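# Names of RUNNING containers that have no 10.x address yet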
STALLED=`lxc-ls -f | grep RUNNING | grep -v ' 10[.]' | awk '{print $1}' `
for c in $STALLED; do
echo $c
lxc-stop --timeout 2 -n $c
lxc-start -n $c
done
root@xsbook7:~#
./qd-restart-demos
./show-all-demos
./qd-show-hung-demos
./qd-show-hung-demos | wc
Hurray!
#90-questions <prev>
90-questions
Take-aways...
"lxc" https://linuxcontainers.org/ is a powerful
near-virtualization method CSCF uses
The sysctl command and /etc/sysctl.conf are the clean interfaces to the
same kernel tuning as hacky direct manipulation of /proc/sys
Some such values can impact heavy container use, and need to be changed.
man 5 proc ; man 7 inotify
(These actually require the "manpages" package, which is sometimes not installed on containers.)
"Too many open files" red herring.
This is your computer.
This is your computer on lxc.
This is your computer on lxc with some tuning.
Any questions?
#95-further <prev>
./qd-stop-demos
./make-new-inotify
95-further
/etc/sysctl.d/[0-9][0-9]-*
The basic underlying problem in this small case is that root is constrained
as much as any other user for inotify instances (the limits are per-UID).
The sysctl(8) command is the correct interface to use for the
tuning. But direct manipulation of /proc is easier and more fun.
In actual practice you use /etc/sysctl.conf etc.
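For example, a persistent version of the fix could look like the following
(the file name is my invention; the value is the one from make-good-inotify):
echo 'fs.inotify.max_user_instances = 360' > /etc/sysctl.d/60-inotify-tuning.conf
sysctl --system     # reload /etc/sysctl.conf and /etc/sysctl.d/*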
Recently we observed that ubuntu1804-200 had...
ubuntu1804-200% grep -C3 inotify /etc/sysctl.d/*
/etc/sysctl.d/10-lxd-inotify.conf:# Increase the user inotify instance limit to allow for about
/etc/sysctl.d/10-lxd-inotify.conf-# 100 containers to run before the limit is hit again
/etc/sysctl.d/10-lxd-inotify.conf:fs.inotify.max_user_instances = 1024
ubuntu1804-200%
ubuntu1804-200% dpkg-query -S /etc/sysctl.d/10-lxd-inotify.conf
lxd: /etc/sysctl.d/10-lxd-inotify.conf
ubuntu1804-200%
So they backed off from their own suggestion of (1024*1024) for all three.
Other things I have tweaked...
/proc/sys/kernel/pty/max
/proc/sys/fs/file-max
(both probably red herrings)
/proc/sys/kernel/keys/maxkeys 2000 from 200
/proc/sys/net/core/netdev_max_backlog 25000 from 1000
The first of these (maxkeys) is theoretical in my case.
The second (netdev_max_backlog) may be relevant if it applies to lxcbr0 local networking.
Start <prev>