---
layout: page
title: "Readme"
category: "docker"
---

image

image

image

Two core concepts:

image

image

Namespaces

image

docker run traefik
pstree -spa 66560

systemd,1 --system --deserialize 18
  └─containerd-shim,66535 -namespace moby -id 0ac949292b659a21e0037c91c7149f6fea12235ae4c5840d8448714081973154 -address /run/containerd/containerd.sock
      └─traefik,66560 traefik

image

nsenter - run program in different namespaces
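nsenter picks its target namespaces through the links under /proc/<pid>/ns. As a rough sketch (the helper names below are made up for illustration and are not part of any tool), the same links can be read from Python to see which namespaces two processes share. Two processes are in the same namespace when the inode numbers in the link targets match.

```python
import os


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into (type, inode)."""
    ns_type, _, rest = target.partition(":")
    return ns_type, int(rest.strip("[]"))


def namespaces_of(pid):
    """Return {entry name: inode} read from /proc/<pid>/ns.

    Returns an empty dict when the directory cannot be read
    (non-Linux host, missing process, or insufficient privileges).
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        entries = os.listdir(ns_dir)
    except OSError:
        return result
    for entry in entries:
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        except (OSError, ValueError):
            continue
        result[entry] = inode
    return result


def shared_namespaces(pid_a, pid_b):
    """List the namespace entries whose inode is identical for both processes."""
    a, b = namespaces_of(pid_a), namespaces_of(pid_b)
    return sorted(name for name in a if a[name] == b.get(name))
```

Calling shared_namespaces(1, 66560) as root on the host from the example above would be expected to omit the namespaces that Docker created for the container, though the exact list depends on how the container was started.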

Filesystem (Mount Namespace) comparison

image

Similarly, we can check

Implementation

image


#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <unistd.h>

/* Define a stack for clone, stack size 1M */
#define STACK_SIZE (1024 * 1024)

static char container_stack [ STACK_SIZE ] ;

char * const container_args [] = {
    "/bin/bash" ,
    NULL
} ;

int container_main(void* arg)
{
    /* Inside the new PID namespace, getpid() returns 1 for this child. */
    printf("Container [%5d] - inside the container!\n", getpid());
    /* Pass the hostname length without the trailing NUL byte. */
    sethostname("container", sizeof("container") - 1);
    execv(container_args[0], container_args);
    /* execv only returns on failure. */
    perror("execv");
    return 1;
}

int main()
{
    printf("Parent [%5d] - start a container!\n", getpid());
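    /* New UTS and PID namespaces; SIGCHLD lets the parent wait for the child. */
    int container_pid = clone(container_main, container_stack+STACK_SIZE,
            CLONE_NEWUTS | CLONE_NEWPID | SIGCHLD, NULL);
    if (container_pid == -1) {
        /* Typically EPERM when not run as root. */
        perror("clone");
        return 1;
    }
    waitpid(container_pid, NULL, 0);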
    printf("Parent - container stopped!\n");
    return 0;
}

Output

hchen@ubuntu:~$ sudo ./pid
Parent [ 3474] - start a container!
Container [ 1] - inside the container!
root@container:~# echo $$
1

Ref: https://coolshell.cn/articles/17010.html

Cgroups

image

image

➜  ~ head -n 1 /proc/66560/cgroup

12:pids:/docker/0ac949292b659a21e0037c91c7149f6fea12235ae4c5840d8448714081973154
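Each line of /proc/<pid>/cgroup has the form hierarchy-ID:controllers:path. A minimal sketch of pulling those fields apart is shown below. The function names are illustrative, and the container ID lookup assumes the /docker/<id> layout seen in the output above, which other cgroup drivers may not use.

```python
def parse_cgroup_line(line):
    """Parse one line of /proc/<pid>/cgroup: 'hierarchy-ID:controllers:path'."""
    hierarchy_id, controllers, path = line.strip().split(":", 2)
    return {
        "hierarchy_id": int(hierarchy_id),
        # cgroup v2 lines look like '0::/path' and carry no controller list.
        "controllers": controllers.split(",") if controllers else [],
        "path": path,
    }


def container_id_from_path(path):
    """Return the last path component for cgroups under /docker/, else None.

    Assumption: the cgroupfs layout shown in the output above.
    """
    parts = path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "docker":
        return parts[-1]
    return None


line = "12:pids:/docker/0ac949292b659a21e0037c91c7149f6fea12235ae4c5840d8448714081973154"
entry = parse_cgroup_line(line)
print(entry["controllers"], container_id_from_path(entry["path"]))
```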

The value returned below, 9223372036854771712, is how cgroup v1 represents an “unlimited” memory limit: the largest signed 64-bit integer (2^63 - 1) rounded down to a multiple of the 4096-byte page size.

➜  ~ cat /sys/fs/cgroup/memory/docker/0ac949292b659a21e0037c91c7149f6fea12235ae4c5840d8448714081973154/memory.limit_in_bytes

9223372036854771712
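The number above can be reproduced with a little arithmetic: take the largest signed 64-bit integer and round it down to a page boundary. The sketch below assumes 4 KiB pages, which matches this output; the helper names are illustrative.

```python
PAGE_SIZE = 4096       # assumption: 4 KiB pages, as on typical x86-64 hosts
LONG_MAX = 2**63 - 1   # largest signed 64-bit integer


def unlimited_memory_limit(page_size=PAGE_SIZE):
    """Largest page-aligned value that fits in a signed 64-bit integer."""
    return (LONG_MAX // page_size) * page_size


def is_unlimited(limit_in_bytes, page_size=PAGE_SIZE):
    """True when a memory.limit_in_bytes value means 'no limit'."""
    return limit_in_bytes >= unlimited_memory_limit(page_size)


print(unlimited_memory_limit())  # 9223372036854771712
```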

Deep Dive into Docker Internals - Union Filesystem

https://martinheinz.dev/blog/44

Overlay filesystem

What is OverlayFS? See https://wiki.archlinux.org/title/Overlay_filesystem

image

Ubuntu example

image

How containers use this

image

If the container writes any files, nothing in the lower layers is modified; the changes land in the writable upper layer.
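The lookup and write behaviour can be illustrated with a toy in-memory model. This is only a sketch of the idea, not how the kernel implements overlayfs: layers are plain dictionaries, and the class and method names are invented for this example.

```python
class OverlaySketch:
    """Toy model of an overlay mount: read-only lower layers plus one upper layer."""

    def __init__(self, lower_layers):
        # lower_layers[0] is the topmost lower layer, matching the
        # left-to-right order of the lowerdir= mount option.
        self.lower_layers = lower_layers
        self.upper = {}
        self.whiteouts = set()

    def read(self, path):
        """Return the topmost visible version of a file."""
        if path in self.upper:
            return self.upper[path]
        if path in self.whiteouts:
            raise FileNotFoundError(path)
        for layer in self.lower_layers:
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        """Writes always go to the upper layer; lower layers stay untouched."""
        self.whiteouts.discard(path)
        self.upper[path] = data

    def delete(self, path):
        """Hide a file; a whiteout marks lower-layer files as removed."""
        self.read(path)  # raises FileNotFoundError if the file is not visible
        self.upper.pop(path, None)
        if any(path in layer for layer in self.lower_layers):
            self.whiteouts.add(path)


base = {"/etc/hostname": "base", "/bin/sh": "busybox"}
app = {"/etc/hostname": "app"}
fs = OverlaySketch([app, base])
fs.write("/etc/hostname", "changed")
print(fs.read("/etc/hostname"), app["/etc/hostname"], base["/etc/hostname"])
```

The final line prints the merged view first and then the two lower layers, which still hold their original contents.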

image

docker run -d traefik
c702369a8429445312f561631ef8871ed9b8c055551151e549190398fef936e6

Running mount -t overlay inside the container:

docker exec -it c702369a8429 sh
/ # mount -t overlay
overlay on / type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/CK3RK6RKLXTDLCVT7J6XUNJFYI:/var/lib/docker/overlay2/l/MJZW5RC5EQX5QV64ZQFI5YRA6V:/var/lib/docker/overlay2/l/XG3WJGGNM4CP67RWANTABIWBOL:/var/lib/docker/overlay2/l/X32XXQFB6ADFFO2FLDCVIV6J2K:/var/lib/docker/overlay2/l/T72XWGVHJ6FWJXBYGSBLRK6FPE,upperdir=/var/lib/docker/overlay2/79ded441a3bd88ad3721bf119dc626690444ce58c9ed378f5a1b923667abe413/diff,workdir=/var/lib/docker/overlay2/79ded441a3bd88ad3721bf119dc626690444ce58c9ed378f5a1b923667abe413/work)
docker inspect c702369a8429 | grep  GraphDriver -A 8
        "GraphDriver": {
            "Data": {
                "ID": "c702369a8429445312f561631ef8871ed9b8c055551151e549190398fef936e6",
                "LowerDir": "/var/lib/docker/overlay2/79ded441a3bd88ad3721bf119dc626690444ce58c9ed378f5a1b923667abe413-init/diff:/var/lib/docker/overlay2/0107d134713b05fc02091a41f1da372a9c9a0b7442f0c6a9ec130ace13940fe8/diff:/var/lib/docker/overlay2/8e8803ebddca09cd58274141eed8e426ddb4d3b96273cdda29c61f17ca20513b/diff:/var/lib/docker/overlay2/6b075fb9786d41cae6451f6ccc4e7708133646b57f45460394508e63a0da822b/diff:/var/lib/docker/overlay2/8beff5c84e30b1915a9017f659232bacde302c7386b5a9b7e4196b3932492780/diff",
                "MergedDir": "/var/lib/docker/overlay2/79ded441a3bd88ad3721bf119dc626690444ce58c9ed378f5a1b923667abe413/merged",
                "UpperDir": "/var/lib/docker/overlay2/79ded441a3bd88ad3721bf119dc626690444ce58c9ed378f5a1b923667abe413/diff",
                "WorkDir": "/var/lib/docker/overlay2/79ded441a3bd88ad3721bf119dc626690444ce58c9ed378f5a1b923667abe413/work"
            },
            "Name": "overlay2"

The same mount is visible on the host when searching for the merged directory /var/lib/docker/overlay2/79ded441a3bd88ad3721bf119dc626690444ce58c9ed378f5a1b923667abe413/merged:

➜  ~ mount | grep 79ded441a3bd88ad3721bf119dc6266904
overlay on /var/lib/docker/overlay2/79ded441a3bd88ad3721bf119dc626690444ce58c9ed378f5a1b923667abe413/merged type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/CK3RK6RKLXTDLCVT7J6XUNJFYI:/var/lib/docker/overlay2/l/MJZW5RC5EQX5QV64ZQFI5YRA6V:/var/lib/docker/overlay2/l/XG3WJGGNM4CP67RWANTABIWBOL:/var/lib/docker/overlay2/l/X32XXQFB6ADFFO2FLDCVIV6J2K:/var/lib/docker/overlay2/l/T72XWGVHJ6FWJXBYGSBLRK6FPE,upperdir=/var/lib/docker/overlay2/79ded441a3bd88ad3721bf119dc626690444ce58c9ed378f5a1b923667abe413/diff,workdir=/var/lib/docker/overlay2/79ded441a3bd88ad3721bf119dc626690444ce58c9ed378f5a1b923667abe413/work)
➜  ~ sudo findmnt --target /var/lib/docker/overlay2/79ded441a3bd88ad3721bf119dc626690444ce58c9ed378f5a1b923667abe413/merged
TARGET                                                                                           SOURCE  FSTYPE  OPTIONS
/var/lib/docker/overlay2/79ded441a3bd88ad3721bf119dc626690444ce58c9ed378f5a1b923667abe413/merged overlay overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/CK3RK6RKLXTDLCVT7J6XUNJFYI:/var/lib/docker/overlay2/l
➜  ~
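The option string in the mount output is long enough that it helps to split it programmatically. The following sketch separates lowerdir, upperdir and workdir; it assumes paths contain no commas or colons (true for the Docker paths above), and the example paths in the usage are shortened placeholders rather than real directories.

```python
def parse_overlay_options(options):
    """Split an overlay mount option string into its layer directories."""
    parsed = {"lowerdir": [], "upperdir": None, "workdir": None}
    for option in options.split(","):
        key, sep, value = option.partition("=")
        if not sep:
            continue  # plain flags such as rw or relatime
        if key == "lowerdir":
            # Colon-separated, topmost lower layer first.
            parsed["lowerdir"] = value.split(":")
        elif key in ("upperdir", "workdir"):
            parsed[key] = value
    return parsed


# Hypothetical, shortened paths for illustration only.
opts = "rw,relatime,lowerdir=/l/AAA:/l/BBB,upperdir=/c1/diff,workdir=/c1/work"
layers = parse_overlay_options(opts)
print(len(layers["lowerdir"]), layers["upperdir"])
```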

The following shows the complete mount table as seen by container process 14188:

➜  ~ sudo cat /proc/14188/mountinfo
1121 994 0:80 / / rw,relatime - overlay overlay rw,lowerdir=/var/lib/docker/overlay2/l/CK3RK6RKLXTDLCVT7J6XUNJFYI:/var/lib/docker/overlay2/l/MJZW5RC5EQX5QV64ZQFI5YRA6V:/var/lib/docker/overlay2/l/XG3WJGGNM4CP67RWANTABIWBOL:/var/lib/docker/overlay2/l/X32XXQFB6ADFFO2FLDCVIV6J2K:/var/lib/docker/overlay2/l/T72XWGVHJ6FWJXBYGSBLRK6FPE,upperdir=/var/lib/docker/overlay2/79ded441a3bd88ad3721bf119dc626690444ce58c9ed378f5a1b923667abe413/diff,workdir=/var/lib/docker/overlay2/79ded441a3bd88ad3721bf119dc626690444ce58c9ed378f5a1b923667abe413/work
1122 1121 0:87 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
1123 1121 0:88 / /dev rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755
1124 1123 0:89 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=666
1125 1121 0:90 / /sys ro,nosuid,nodev,noexec,relatime - sysfs sysfs ro
1126 1125 0:91 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,mode=755
1127 1126 0:29 /docker/c702369a8429445312f561631ef8871ed9b8c055551151e549190398fef936e6 /sys/fs/cgroup/systemd ro,nosuid,nodev,noexec,relatime master:11 - cgroup cgroup rw,xattr,name=systemd
1128 1126 0:31 /docker/c702369a8429445312f561631ef8871ed9b8c055551151e549190398fef936e6 /sys/fs/cgroup/perf_event ro,nosuid,nodev,noexec,relatime master:14 - cgroup cgroup rw,perf_event
1129 1126 0:32 /docker/c702369a8429445312f561631ef8871ed9b8c055551151e549190398fef936e6 /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:15 - cgroup cgroup rw,cpu,cpuacct
1130 1126 0:33 /docker/c702369a8429445312f561631ef8871ed9b8c055551151e549190398fef936e6 /sys/fs/cgroup/cpuset ro,nosuid,nodev,noexec,relatime master:16 - cgroup cgroup rw,cpuset
1131 1126 0:34 /docker/c702369a8429445312f561631ef8871ed9b8c055551151e549190398fef936e6 /sys/fs/cgroup/blkio ro,nosuid,nodev,noexec,relatime master:17 - cgroup cgroup rw,blkio
1132 1126 0:35 /docker/c702369a8429445312f561631ef8871ed9b8c055551151e549190398fef936e6 /sys/fs/cgroup/rdma ro,nosuid,nodev,noexec,relatime master:18 - cgroup cgroup rw,rdma
1133 1126 0:36 /docker/c702369a8429445312f561631ef8871ed9b8c055551151e549190398fef936e6 /sys/fs/cgroup/net_cls,net_prio ro,nosuid,nodev,noexec,relatime master:19 - cgroup cgroup rw,net_cls,net_prio
1134 1126 0:37 /docker/c702369a8429445312f561631ef8871ed9b8c055551151e549190398fef936e6 /sys/fs/cgroup/devices ro,nosuid,nodev,noexec,relatime master:20 - cgroup cgroup rw,devices
1135 1126 0:38 /docker/c702369a8429445312f561631ef8871ed9b8c055551151e549190398fef936e6 /sys/fs/cgroup/freezer ro,nosuid,nodev,noexec,relatime master:21 - cgroup cgroup rw,freezer
1136 1126 0:39 /docker/c702369a8429445312f561631ef8871ed9b8c055551151e549190398fef936e6 /sys/fs/cgroup/hugetlb ro,nosuid,nodev,noexec,relatime master:22 - cgroup cgroup rw,hugetlb
1137 1126 0:40 /docker/c702369a8429445312f561631ef8871ed9b8c055551151e549190398fef936e6 /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime master:23 - cgroup cgroup rw,memory
1138 1126 0:41 /docker/c702369a8429445312f561631ef8871ed9b8c055551151e549190398fef936e6 /sys/fs/cgroup/pids ro,nosuid,nodev,noexec,relatime master:24 - cgroup cgroup rw,pids
1139 1123 0:86 / /dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw
1140 1123 0:92 / /dev/shm rw,nosuid,nodev,noexec,relatime - tmpfs shm rw,size=65536k
1141 1121 8:1 /var/lib/docker/containers/c702369a8429445312f561631ef8871ed9b8c055551151e549190398fef936e6/resolv.conf /etc/resolv.conf rw,relatime - ext4 /dev/sda1 rw,errors=remount-ro,data=ordered
1142 1121 8:1 /var/lib/docker/containers/c702369a8429445312f561631ef8871ed9b8c055551151e549190398fef936e6/hostname /etc/hostname rw,relatime - ext4 /dev/sda1 rw,errors=remount-ro,data=ordered
1143 1121 8:1 /var/lib/docker/containers/c702369a8429445312f561631ef8871ed9b8c055551151e549190398fef936e6/hosts /etc/hosts rw,relatime - ext4 /dev/sda1 rw,errors=remount-ro,data=ordered
995 1122 0:87 /bus /proc/bus ro,nosuid,nodev,noexec,relatime - proc proc rw
996 1122 0:87 /fs /proc/fs ro,nosuid,nodev,noexec,relatime - proc proc rw
997 1122 0:87 /irq /proc/irq ro,nosuid,nodev,noexec,relatime - proc proc rw
998 1122 0:87 /sys /proc/sys ro,nosuid,nodev,noexec,relatime - proc proc rw
999 1122 0:87 /sysrq-trigger /proc/sysrq-trigger ro,nosuid,nodev,noexec,relatime - proc proc rw
1012 1122 0:93 / /proc/acpi ro,relatime - tmpfs tmpfs ro
1013 1122 0:88 /null /proc/interrupts rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755
1014 1122 0:88 /null /proc/kcore rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755
1015 1122 0:88 /null /proc/keys rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755
1016 1122 0:88 /null /proc/timer_list rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755
1017 1122 0:88 /null /proc/sched_debug rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755
1018 1122 0:94 / /proc/scsi ro,relatime - tmpfs tmpfs ro
1019 1125 0:95 / /sys/firmware ro,relatime - tmpfs tmpfs ro
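Each mountinfo line follows the layout documented in proc(5): mount ID, parent ID, device, root, mount point, mount options, zero or more optional fields, a lone "-" separator, then filesystem type, source and superblock options. A minimal parsing sketch (the function name and dictionary keys are my own choice) could look like this:

```python
def parse_mountinfo_line(line):
    """Parse one line of /proc/<pid>/mountinfo into a dictionary."""
    fields = line.split()
    sep = fields.index("-")  # separates per-mount fields from filesystem fields
    return {
        "mount_id": int(fields[0]),
        "parent_id": int(fields[1]),
        "device": fields[2],
        "root": fields[3],
        "mount_point": fields[4],
        "mount_options": fields[5].split(","),
        "optional_fields": fields[6:sep],
        "fs_type": fields[sep + 1],
        "source": fields[sep + 2],
        "super_options": fields[sep + 3].split(",") if len(fields) > sep + 3 else [],
    }


line = "1122 1121 0:87 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw"
info = parse_mountinfo_line(line)
print(info["mount_point"], info["fs_type"], info["parent_id"])
```

Following parent_id from each entry back to mount 1121 (the overlay root) reconstructs the mount tree of the container.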

Ignore the typos in my AI-generated image:

image

Changes to the image

➜  ~ docker diff c702369a8429
A /nishanth.txt
C /root
A /root/.ash_history
docker history traefik
IMAGE          CREATED       CREATED BY                                      SIZE      COMMENT
a14917e96c7b   3 weeks ago   LABEL org.opencontainers.image.vendor=Traefi…   0B        buildkit.dockerfile.v0
<missing>      3 weeks ago   CMD ["traefik"]                                 0B        buildkit.dockerfile.v0
<missing>      3 weeks ago   ENTRYPOINT ["/entrypoint.sh"]                   0B        buildkit.dockerfile.v0
<missing>      3 weeks ago   EXPOSE map[80/tcp:{}]                           0B        buildkit.dockerfile.v0
<missing>      3 weeks ago   COPY entrypoint.sh / # buildkit                 419B      buildkit.dockerfile.v0
<missing>      3 weeks ago   RUN /bin/sh -c set -ex;  apkArch="$(apk --pr…   168MB     buildkit.dockerfile.v0
<missing>      3 weeks ago   RUN /bin/sh -c apk --no-cache add ca-certifi…   1MB       buildkit.dockerfile.v0
<missing>      3 weeks ago   CMD ["/bin/sh"]                                 0B        buildkit.dockerfile.v0
<missing>      3 weeks ago   ADD alpine-minirootfs-3.22.2-x86_64.tar.gz /…   8.32MB    buildkit.dockerfile.v0

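The Dockerfile is at https://github.com/traefik/traefik-library-image/blob/master/v3.5/alpine/Dockerfile. Each row of the history above, read from bottom to top, corresponds to one of its instructions (the bottom two rows come from the alpine base image):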

FROM alpine:3.22
RUN apk --no-cache add ca-certificates tzdata
RUN set -ex; \
	apkArch="$(apk --print-arch)"; \
	case "$apkArch" in \
		armhf) arch='armv6' ;; \
		aarch64) arch='arm64' ;; \
		x86_64) arch='amd64' ;; \
		riscv64) arch='riscv64' ;; \
		s390x) arch='s390x' ;; \
		ppc64le) arch='ppc64le' ;; \
		*) echo >&2 "error: unsupported architecture: $apkArch"; exit 1 ;; \
	esac; \
	wget --quiet -O /tmp/traefik.tar.gz "https://github.com/traefik/traefik/releases/download/v3.5.3/traefik_v3.5.3_linux_$arch.tar.gz"; \
	tar xzvf /tmp/traefik.tar.gz -C /usr/local/bin traefik; \
	rm -f /tmp/traefik.tar.gz; \
	chmod +x /usr/local/bin/traefik
COPY entrypoint.sh /
EXPOSE 80
ENTRYPOINT ["/entrypoint.sh"]
CMD ["traefik"]

# Metadata
LABEL org.opencontainers.image.vendor="Traefik Labs" \
    org.opencontainers.image.url="https://traefik.io" \
    org.opencontainers.image.source="https://github.com/traefik/traefik" \
    org.opencontainers.image.title="Traefik" \
    org.opencontainers.image.description="A modern reverse-proxy" \
    org.opencontainers.image.version="v3.5.3" \
    org.opencontainers.image.documentation="https://docs.traefik.io"
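One way to connect the history output with the overlay mount seen earlier is to count the rows that actually added data. The sketch below uses a rough heuristic, treating rows with a size of 0B as metadata-only; the instruction and size pairs are copied by hand from the docker history traefik output. Under that assumption, the four non-empty rows plus the container's -init layer would account for the five lowerdir entries.

```python
# (instruction, size) pairs copied from the `docker history traefik`
# output above, newest first.
HISTORY = [
    ("LABEL", "0B"),
    ("CMD", "0B"),
    ("ENTRYPOINT", "0B"),
    ("EXPOSE", "0B"),
    ("COPY", "419B"),
    ("RUN", "168MB"),
    ("RUN", "1MB"),
    ("CMD", "0B"),
    ("ADD", "8.32MB"),
]


def filesystem_layers(history):
    """Return the instructions that added data, oldest first.

    Heuristic: a 0B row is assumed to change only image metadata.
    """
    return [instr for instr, size in reversed(history) if size != "0B"]


print(filesystem_layers(HISTORY))  # ['ADD', 'RUN', 'RUN', 'COPY']
```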

image

Concepts

image

image

image

image

image

image

image

image

History

A Brief History of Containers (by Jeff Victor & Kir Kolyshkin)

image

Ref: https://www.aquasec.com/blog/a-brief-history-of-containers-from-1970s-chroot-to-docker-2016/

LXC and libcontainer

image

Ref: https://stackoverflow.com/questions/34152365/difference-between-lxc-and-libcontainer

image

Read more at https://stackoverflow.com/questions/41645665/how-containerd-compares-to-runc

containerd, runc, shim

image

runc

image

What happens under the hood when we create a new container on Linux?

image

image

Ref: https://stackoverflow.com/questions/46649592/dockerd-vs-docker-containerd-vs-docker-runc-vs-docker-containerd-ctr-vs-docker-c

What happens when you run a container

terminal <-> docker <-> dockerd <-> containerd <-> shim <-> application (container)

image

Ref: https://labs.iximiuz.com/tutorials/docker-run-vs-attach-vs-exec

image

Ref: https://labs.iximiuz.com/tutorials/docker-run-vs-attach-vs-exec

TODO

https://blog.quarkslab.com/digging-into-runtimes-runc.html

Ref: https://terenceli.github.io/%E6%8A%80%E6%9C%AF/2021/12/22/runc-internals-1

Ref: https://iximiuz.com/en/posts/journey-from-containerization-to-orchestration-and-beyond/