Starving Docker of File-Handles


The value in /proc/sys/fs/file-max "defines a system-wide limit on the number of open files for all processes". This limit can be exceded by privileged processes, those with CAP_SYS_ADMIN capability.

file-max can be set with the ‘sysctl’ command.

# sysctl -w fs.file-max=99200

If an unprivileged process attempts to open files in excess of fs.file-max, you get ENFILE, errno.h #23 "File table overflow" and a message along the lines of:

Too many open files in system

I was curious what effect limiting the number of available file-handles would have on Docker. That is, what would happen when I try to start a Docker container, but the system does not have enough available file-handles.

/proc/sys/fs/file-nr aka fs.file-nr contains three values

1) the number of allocated file-handles 2) the number of allocated but unused file-handles 3) the maximum number of file-handles.

While running a series of experiments, I discovered that /proc/sys/fs/file-nr didn’t update as I expected.

With further experimentation, I discovered if I ran 'lsof' in the same shell that I had run my "open n-files" program, I could get fs.file-nr to update "on demand" as it were.

I also discovered that fs.file-nr only increases/decreases in blocks of 32. So, if I were monitoring fs.file-nr and the number of allocated file-handles was, say 960, and I opened one file, I expected fs.file-nr to then read 961... but that’s not how it works.

fs.file-nr showing 960 means that while there are no fewer that 960 allocated file-handles, the true value could be as many as 991.

This makes finding the true number of allocated file-handles a bit tedious.

The way I did this was to open an increasingly large number of files, and search for the value where "one more file" bumped fs.file-nr into the next block of 32.

# open.n.files 119 & lsof
# open.n.files 120 & lsof
# open.n.files 121 & lsof
# open.n.files 122 & lsof

While I did this, in another terminal I ran a script which would echo fs.file-nr to stdout, along with a timestamp.

13:35:14 1088   0       99200
13:35:16 1088   0       99200
13:35:18 1088   0       99200
13:35:20 1088   0       99200

Using this method I was able to find a value where fs.file-nr jumped from one 32-block to the next...

Opening 'm' files gives

13:36:55 1088   0       99200

And opening 'm + 1' files gives

13:37:03 1120       99200

Using this method I found the exact number of files to open so as to leave zero available file-handles for the system.

I then ran my program thusly

# open.n.files $((m - n)) & lsof

With 'n' being the number of file-handles I want available to the system.

This method gave me fine grained control over the number of available file-handles... these are the errors I found with the given number of available file-handles.

[0000-0011] available file-handles

user@debian:~$ docker run -t -d mydeb
-bash: start_pipeline: pgrp pipe: Too many open files in system
-bash: /usr/bin/docker: Too many open files in system

[0012-0013] available file-handles

user@debian:~$ docker run -t -d mydeb
docker: error while loading shared libraries: libpthread.so.0: cannot open shared object file: Error 23

[0014-0014] available file-handles

user@debian:~$ docker run -t -d mydeb
docker: error while loading shared libraries: libdl.so.2: cannot open shared object file: Error 23

[0015-0015] available files-handles

user@debian:~$ docker run -t -d mydeb
docker: error while loading shared libraries: libc.so.6: cannot open shared object file: Error 23

[0016-0016] available file-handles

user@debian:~$ docker run -t -d mydeb
runtime: epollcreate failed with 23
fatal error: runtime: netpollinit failed

goroutine 1 [running, locked to thread]:
runtime.throw(0x561981971c34, 0x1b)
        /usr/local/go/src/runtime/panic.go:616 +0x83 fp=0xc42007fbd8 sp=0xc42007fbb8 pc=0x56198069ba43
runtime.netpollinit()
        /usr/local/go/src/runtime/netpoll_epoll.go:36 +0xca fp=0xc42007fc00 sp=0xc42007fbd8 pc=0x56198069904a
internal/poll.runtime_pollServerInit()
        /usr/local/go/src/runtime/netpoll.go:87 +0x22 fp=0xc42007fc10 sp=0xc42007fc00 pc=0x561980697cc2
sync.(*Once).Do(0x561982e8ed08, 0x5619823603c0)
        /usr/local/go/src/sync/once.go:44 +0xc0 fp=0xc42007fc48 sp=0xc42007fc10 pc=0x5619806de8f0
internal/poll.(*pollDesc).init(0xc42003a1a8, 0xc42003a190, 0x1, 0xc42003a190)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:36 +0x3f fp=0xc42007fc90 sp=0xc42007fc48 pc=0x561980703ebf
internal/poll.(*FD).Init(0xc42003a190, 0x561981948aab, 0x4, 0x80001, 0x0, 0x3)
        /usr/local/go/src/internal/poll/fd_unix.go:62 +0x62 fp=0xc42007fcc0 sp=0xc42007fc90 pc=0x561980704be2
os.newFile(0x3, 0x5619819722d1, 0x1c, 0x1, 0x3)
        /usr/local/go/src/os/file_unix.go:117 +0xf5 fp=0xc42007fd10 sp=0xc42007fcc0 pc=0x56198070e1b5
os.openFileNolog(0x5619819722d1, 0x1c, 0x0, 0x0, 0x1a0, 0xc4200ac000, 0x0)
        /usr/local/go/src/os/file_unix.go:194 +0x1a4 fp=0xc42007fd68 sp=0xc42007fd10 pc=0x56198070e4b4
os.OpenFile(0x5619819722d1, 0x1c, 0x0, 0xc400000000, 0x0, 0x561982e70fa0, 0xc42007fe10)
        /usr/local/go/src/os/file.go:269 +0x61 fp=0xc42007fdb0 sp=0xc42007fd68 pc=0x56198070c101
os.Open(0x5619819722d1, 0x1c, 0xc4200ac000, 0xc42007fe48, 0x56198067a384)
        /usr/local/go/src/os/file.go:250 +0x48 fp=0xc42007fdf8 sp=0xc42007fdb0 pc=0x56198070bfe8
net.open(0x5619819722d1, 0x1c, 0xc4200ac000, 0x0, 0xd)
        /usr/local/go/src/net/parse.go:68 +0x3b fp=0xc42007fe58 sp=0xc42007fdf8 pc=0x5619807980ab
net.maxListenerBacklog(0x0)
        /usr/local/go/src/net/sock_linux.go:10 +0x45 fp=0xc42007fe98 sp=0xc42007fe58 pc=0x56198079aa55
net.init()
        /usr/local/go/src/net/net.go:358 +0xee0 fp=0xc42007fef8 sp=0xc42007fe98 pc=0x5619807ae070
github.com/docker/cli/vendor/github.com/spf13/pflag.init()
        <autogenerated>:1 +0x89 fp=0xc42007ff28 sp=0xc42007fef8 pc=0x5619807d6bc9
github.com/docker/cli/vendor/github.com/spf13/cobra.init()
        <autogenerated>:1 +0x6b fp=0xc42007ff68 sp=0xc42007ff28 pc=0x561980813c4b
github.com/docker/cli/cli.init()
        <autogenerated>:1 +0x5a fp=0xc42007ff78 sp=0xc42007ff68 pc=0x56198081654a
main.init()
        <autogenerated>:1 +0x5e fp=0xc42007ff88 sp=0xc42007ff78 pc=0x561981943a3e
runtime.main()
        /usr/local/go/src/runtime/proc.go:186 +0x1d2 fp=0xc42007ffe0 sp=0xc42007ff88 pc=0x56198069d292
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc42007ffe8 sp=0xc42007ffe0 pc=0x5619806c8be1

goroutine 5 [runnable]:
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:20
created by os/signal.init.0
        /usr/local/go/src/os/signal/signal_unix.go:28 +0x43

[0017-0022] available file-handles

user@debian:~$ docker run -t -d mydeb
8673a88932992868b602060d7278672183b042a3af5cad1c6e8a2be86bff6cf4
docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.

[0023-0058] available file-handles

user@debian:~$ docker run -t -d mydeb
71c5da097f10e75fc4432f0c63a08fa6cc90feef0f88002106f6cf740e773d62
docker: Error response from daemon: cannot start a stopped process: unknown.

[0059-0072] available file-handles
No errors displayed to user... likely errors are in /var/log/daemon.log

[0073-0089] available file-handles
No error displayed to user. Docker container started, and listed with 'docker container ls' But can’t 'docker exec -ti <container id> bash' into the container.

user@debian:~$ docker exec -ti b4a3203e4de8 bash
OCI runtime exec failed: exec failed: container_linux.go:345: starting container process caused \"too many open files in system\": unknown

[0090-0094] available file-handles Can "exec bash" into container but can’t run ‘ls’.

user@debian:~$ docker exec -ti be5a0cf6dd10 bash
root@be5a0cf6dd10:/# ls
ls: cannot open directory '.': Too many open files in system

[0095-nnnn] available file-handles Can "exec bash" into container, and run ‘ls’.

root@4f4c5a2a9600:/# ls
bin   dev  home  lib64  mnt  proc  run   srv  tmp  var
boot  etc  lib   media  opt  root  sbin  sys  usr