
cgroup

A brief introduction from user space

The Linux kernel provides support for the following twelve control group subsystems:

from: https://0xax.gitbooks.io/linux-insides/content/Cgroups/linux-cgroups-1.html

overview

Basic query functions

Creation

proc

[ ] /sys/kernel/cgroup/delegate

I don't really understand what "delegate" means here.

/proc/kpagecgroup

For every page (indexed by PFN), records the inode number of the memory cgroup the page is charged to.

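#!/usr/bin/env bash
# Map cgroup inode numbers (e.g. values read from /proc/kpagecgroup)
# back to directories under /sys/fs/cgroup.
inodes=(
	1
)
for i in "${inodes[@]}"; do
	echo "${i}"
	# Print the cgroup directory whose inode number is $i
	find /sys/fs/cgroup -inum "$i"
done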

The inode numbers that ls -i shows for the directories under /sys/fs/cgroup are exactly these cgroup inodes.
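To look up a single page from C, a minimal sketch, assuming the documented layout of /proc/kpagecgroup: a flat array of 64-bit inode numbers where entry i belongs to PFN i (the file needs root and exists only with CONFIG_MEMCG). read_u64_at() and page_memcg_inode() are hypothetical names.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read the index-th 64-bit entry of a file laid out
 * as a flat u64 array, such as /proc/kpagecgroup (entry i <-> PFN i).
 * Returns 0 on success, -1 on open error or short read. */
int read_u64_at(const char *path, uint64_t index, uint64_t *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, out, sizeof(*out), (off_t)(index * sizeof(*out)));
    close(fd);
    return n == (ssize_t)sizeof(*out) ? 0 : -1;
}

/* Inode number of the memory cgroup that page frame `pfn` is charged to.
 * Needs root; /proc/kpagecgroup only exists with CONFIG_MEMCG. */
int page_memcg_inode(uint64_t pfn, uint64_t *ino)
{
    return read_u64_at("/proc/kpagecgroup", pfn, ino);
}
```

The inode obtained this way can then be resolved to a directory with the find -inum script above.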

/proc/cgroups

WARN_ON(!proc_create_single("cgroups", 0, NULL, proc_cgroupstats_show));

v1

➜  ~ cat /proc/cgroups
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  7       1       1
cpu     6       1       1
cpuacct 6       1       1
blkio   11      1       1
memory  10      91      1
devices 8       46      1
freezer 14      1       1
net_cls 4       1       1
perf_event      13      1       1
net_prio        4       1       1
hugetlb 5       1       1
pids    9       55      1
rdma    2       1       1
misc    3       1       1
debug   12      1       1

v2

➜  ~ cat /proc/cgroups
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  0       87      1
cpu     0       87      1
cpuacct 0       87      1
blkio   0       87      1
memory  0       87      1
devices 0       87      1
freezer 0       87      1
net_cls 0       87      1
perf_event      0       87      1
net_prio        0       87      1
hugetlb 0       87      1
pids    0       87      1
rdma    0       87      1
misc    0       87      1
debug   0       87      1

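So under v2, /proc/cgroups carries almost no information: every controller reports hierarchy 0 (the unified hierarchy) and the same num_cgroups, which makes the file essentially unnecessary.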

/proc/pid/cgroup

proc_cgroup_show()

🧀  cat /proc/self/cgroup
0::/user.slice/user-1000.slice/user@1000.service/app.slice/app-gnome-wezterm-5210.scope
➜  ~ cat /proc/self/cgroup
0::/user.slice/user-0.slice/session-3.scope
➜  ~ cgexec -g memory:mem  cat /proc/self/cgroup
0::/mem

With v1, however, the output (taken in a guest VM) is:

14:freezer:/
13:perf_event:/
12:debug:/
11:blkio:/
10:memory:/user.slice/user-0.slice/session-1.scope
9:pids:/user.slice/user-0.slice/session-1.scope
8:devices:/user.slice
7:cpuset:/
6:cpu,cpuacct:/
5:hugetlb:/
4:net_cls,net_prio:/
3:misc:/
2:rdma:/
1:name=systemd:/user.slice/user-0.slice/session-1.scope
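
In the kernel, each hierarchy in this output is a cgroup_root (the leading number is its hierarchy ID), and all roots are kept on the cgroup_roots list:

/* The list of hierarchy roots */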
LIST_HEAD(cgroup_roots);

/* iterate across the hierarchies */
#define for_each_root(root)           \
  list_for_each_entry((root), &cgroup_roots, root_list)
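Going by the outputs above, each line of /proc/<pid>/cgroup is "hierarchy-ID:controller-list:cgroup-path", where v2 shows ID 0 and an empty controller list. A minimal parsing sketch under that assumption; the struct and function names are hypothetical. Only the first two colons are treated as separators, since a cgroup directory name may itself contain ':'.

```c
#include <stdio.h>
#include <string.h>

/* One line of /proc/<pid>/cgroup (hypothetical struct). */
struct proc_cg {
    int hierarchy_id;        /* 0 for the v2 unified hierarchy */
    char controllers[128];   /* empty for v2, e.g. "cpu,cpuacct" for v1 */
    char path[512];
};

/* Returns 1 on success, 0 on a malformed line. */
int parse_proc_cgroup_line(const char *line, struct proc_cg *out)
{
    const char *c1 = strchr(line, ':');
    if (!c1)
        return 0;
    const char *c2 = strchr(c1 + 1, ':');
    if (!c2)
        return 0;
    if (sscanf(line, "%d", &out->hierarchy_id) != 1)
        return 0;
    size_t clen = (size_t)(c2 - c1 - 1);
    if (clen >= sizeof(out->controllers))
        return 0;
    memcpy(out->controllers, c1 + 1, clen);
    out->controllers[clen] = '\0';
    /* Everything after the second ':' is the path; drop the newline. */
    snprintf(out->path, sizeof(out->path), "%s", c2 + 1);
    out->path[strcspn(out->path, "\n")] = '\0';
    return 1;
}
```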

Differences between v1 and v2

How to switch to cgroup v2 for testing

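To check which version is in use (see https://kubernetes.io/docs/concepts/architecture/cgroups/), the following prints cgroup2fs on v2 and tmpfs on v1:

stat -fc %T /sys/fs/cgroup/

To switch to v2, add systemd.unified_cgroup_hierarchy=1 to the kernel command line (takes effect after a reboot):

sudo grubby --update-kernel=ALL --args=systemd.unified_cgroup_hierarchy=1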
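The stat check can also be done from C with statfs(2). A minimal sketch, assuming the magic numbers from <linux/magic.h> (CGROUP2_SUPER_MAGIC 0x63677270, TMPFS_MAGIC 0x01021994), copied in as constants so the sketch does not depend on kernel headers; the function names are made up.

```c
#include <sys/vfs.h> /* statfs(2), Linux-specific */

/* Same values as CGROUP2_SUPER_MAGIC / TMPFS_MAGIC in <linux/magic.h>. */
#define CG2_MAGIC          0x63677270L
#define TMPFS_MAGIC_SKETCH 0x01021994L

/* Map a filesystem magic number to a cgroup version guess. */
int cgroup_version_from_magic(long magic)
{
    if (magic == CG2_MAGIC)
        return 2;   /* stat -f prints "cgroup2fs" */
    if (magic == TMPFS_MAGIC_SKETCH)
        return 1;   /* v1 (or hybrid): tmpfs holding per-controller mounts */
    return 0;       /* unknown */
}

/* Same idea as `stat -fc %T <mountpoint>`; -1 if statfs() fails. */
int cgroup_version(const char *mountpoint)
{
    struct statfs sfs;
    if (statfs(mountpoint, &sfs) != 0)
        return -1;
    return cgroup_version_from_magic((long)sfs.f_type);
}
```

cgroup_version("/sys/fs/cgroup") should then return 2 after the grubby change and a reboot.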

Older versions of libcgroup do not support cgroup v2:

➜ sudo cgcreate -g cpu:A

[sudo] password for martins3:
cgcreate: libcgroup initialization failed: Cgroup is not mounted

This helps figure out which code belongs to v1 and which to v2.

  1. unified hierarchy: a process can belong only to a single subgroup.
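    • All controller behaviors are hierarchical - if a controller is enabled on a cgroup, it affects all processes which belong to the cgroups consisting the inclusive sub-hierarchy of the cgroup.
  2. The device controller has no interface files; device access is controlled by BPF programs of type BPF_PROG_TYPE_CGROUP_DEVICE.
  3. You cannot attach a process to an internal subgroup: a non-root cgroup that enables domain controllers for its children cannot contain processes itself, so processes live in leaf cgroups.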
  4. All threads of a process belong to the same cgroup.

  5. https://github.com/opencontainers/runc/blob/master/docs/cgroup-v2.md
    • On a v2 host, /sys/fs/cgroup/cgroup.controllers is present.
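The runc check in the last item is a plain file-existence test. A minimal sketch; has_cgroup_controllers() is a hypothetical name and the mount root is passed in rather than hard-coded.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 1 if <root>/cgroup.controllers exists (a v2 host per the runc
 * doc), 0 if it does not, -1 if the path does not fit the buffer. */
int has_cgroup_controllers(const char *root)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/cgroup.controllers", root);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    return access(path, F_OK) == 0;
}
```

On a real machine this would be called as has_cgroup_controllers("/sys/fs/cgroup").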

man cgroups(7)

TLDR Understanding the new cgroups v2 API by Rami Rosen

kernel doc

Notes on some interface files

Difference between memory.usage_in_bytes and memory.memsw.usage_in_bytes

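In v1, memsw counts memory + swap together: memory.memsw.usage_in_bytes is memory plus swap usage, while memory.usage_in_bytes is memory only.

In v2, swap is accounted separately through the memory.swap.* files (e.g. memory.swap.current and memory.swap.max); memory.swap.max limits swap alone, not memory + swap.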
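Under v1 the swap part therefore has to be derived by subtraction. A minimal sketch, assuming each file holds one decimal number; the helper names are hypothetical. On v2 the same value would simply be read from memory.swap.current.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Read a single unsigned decimal from a cgroup file such as
 * memory.usage_in_bytes. Returns 0 on success, -1 otherwise. */
int read_u64_file(const char *path, uint64_t *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%" SCNu64, out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* v1: swap in use = memsw.usage_in_bytes - usage_in_bytes, since memsw
 * counts memory + swap. */
int v1_swap_usage(const char *memsw_path, const char *mem_path, uint64_t *swap)
{
    uint64_t memsw, mem;
    if (read_u64_file(memsw_path, &memsw) || read_u64_file(mem_path, &mem))
        return -1;
    /* The two reads are not atomic, so clamp instead of underflowing. */
    *swap = memsw > mem ? memsw - mem : 0;
    return 0;
}
```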

cgroup.controllers and cgroup.subtree_control

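The former is read-only and lists the controllers available in the cgroup; the latter is writable and decides which of those controllers are enabled for the cgroup's children (write "+memory" to enable, "-memory" to disable).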
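A minimal sketch of both sides: checking whether a controller is listed in cgroup.controllers (space-separated names), and enabling it for the children by writing "+name" to cgroup.subtree_control. Function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* 1 if `name` appears as a whole word in cgroup.controllers contents
 * such as "cpuset cpu io memory pids\n", else 0. */
int controller_available(const char *list, const char *name)
{
    size_t n = strlen(name);
    if (n == 0)
        return 0;
    for (const char *p = list; (p = strstr(p, name)) != NULL; p += n) {
        int starts = (p == list || p[-1] == ' ');
        int ends = (p[n] == '\0' || p[n] == ' ' || p[n] == '\n');
        if (starts && ends)
            return 1;
    }
    return 0;
}

/* Enable `name` for the children of cgdir by writing "+name" to its
 * cgroup.subtree_control. Returns 0 on success, -1 on error. */
int enable_controller(const char *cgdir, const char *name)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/cgroup.subtree_control", cgdir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "+%s\n", name) > 0;
    /* The write reaches the kernel on fclose(); errors show up there. */
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```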

cgroup.procs

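In v1 this was originally the tasks file (which lists thread IDs); cgroup.procs lists process IDs, and v2 only provides cgroup.procs (plus cgroup.threads for threaded cgroups).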
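Moving a process is just writing its PID to cgroup.procs, and reading the file lists the member processes one per line. A minimal sketch; the function names are hypothetical and only one PID is written per call.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "<cgdir>/cgroup.procs" into buf; 0 if it fits. */
static int procs_path(char *buf, size_t len, const char *cgdir)
{
    int n = snprintf(buf, len, "%s/cgroup.procs", cgdir);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Move process `pid` into cgroup `cgdir`. Writing 0 moves the writer
 * itself. Returns 0 on success, -1 on error. */
int cgroup_attach_pid(const char *cgdir, pid_t pid)
{
    char path[4096];
    if (procs_path(path, sizeof(path), cgdir))
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%d\n", (int)pid) > 0;
    /* The kernel reports errors (e.g. EBUSY, ESRCH) on the flushing write. */
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Count the PIDs listed in cgroup.procs; -1 on error. */
int cgroup_count_procs(const char *cgdir)
{
    char path[4096];
    if (procs_path(path, sizeof(path), cgdir))
        return -1;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int pid, count = 0;
    while (fscanf(f, "%d", &pid) == 1)
        count++;
    fclose(f);
    return count;
}
```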

cgroup.events

populated 0
frozen 0
  1. populated: 1 if the cgroup or any of its descendants contains live processes, otherwise 0
  2. frozen : @todo

These files can be watched with epoll to find out which cgroups have become empty; internally the notification is driven by cgroup_file_notify().
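A minimal sketch of the watching side, using poll() instead of epoll for brevity. It assumes that kernfs reports a change to cgroup.events as POLLPRI/POLLERR and that the file must be re-read from offset 0 afterwards; the function names are hypothetical.

```c
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return the "populated" value (0 or 1) from cgroup.events contents such
 * as "populated 0\nfrozen 0\n", or -1 if the key is missing. */
int parse_populated(const char *buf)
{
    const char *key = "populated ";
    for (const char *p = buf; (p = strstr(p, key)) != NULL; p++) {
        int v;
        /* Only accept the key at the start of a line. */
        if ((p == buf || p[-1] == '\n') && sscanf(p + strlen(key), "%d", &v) == 1)
            return v;
    }
    return -1;
}

/* Wait up to timeout_ms for cgroup.events to change, then re-read it and
 * return the new "populated" value (-1 on error or timeout). */
int wait_populated(const char *events_path, int timeout_ms)
{
    char buf[256];
    int fd = open(events_path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* Consume the current contents first; later changes wake poll(). */
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    struct pollfd pfd = { .fd = fd, .events = POLLPRI };
    if (n < 0 || poll(&pfd, 1, timeout_ms) <= 0) {
        close(fd);
        return -1;
    }
    n = pread(fd, buf, sizeof(buf) - 1, 0);
    close(fd);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return parse_populated(buf);
}
```

A caller would loop on wait_populated() until it returns 0, i.e. until the cgroup has become empty.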

misc controller (misc.*)

For details, see: https://lwn.net/Articles/856438/

It limits and tracks the allocation of resources such as ASIDs, through these files:

/sys/fs/cgroup/misc.capacity
user.slice/misc.current
user.slice/misc.events
user.slice/misc.max
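misc.capacity, misc.current and misc.max are flat-keyed files with one "<resource> <value>" pair per line, and misc.max may say max for no limit. A minimal lookup sketch under that assumption; misc_lookup() and the MISC_UNLIMITED sentinel are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MISC_UNLIMITED UINT64_MAX  /* hypothetical stand-in for "max" */

/* Look up `key` in flat-keyed contents ("<key> <value>" per line).
 * Returns 0 on success, -1 if the key is absent or malformed. */
int misc_lookup(const char *buf, const char *key, uint64_t *val)
{
    size_t klen = strlen(key);
    const char *line = buf;
    while (*line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        /* Whole-key match: "sev" must not match the line "sev_es 0". */
        if (len > klen && strncmp(line, key, klen) == 0 && line[klen] == ' ') {
            const char *v = line + klen + 1;
            if (strncmp(v, "max", 3) == 0) {
                *val = MISC_UNLIMITED;
                return 0;
            }
            return sscanf(v, "%" SCNu64, val) == 1 ? 0 : -1;
        }
        if (!eol)
            break;
        line = eol + 1;
    }
    return -1;
}
```

For example, looking up sev in /sys/fs/cgroup/misc.capacity should give the number of SEV ASIDs the host can hand out.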

reference

How to understand it

The basic relationships among these structures: struct cgroup_subsys_state *d_css; struct cgroup *dsct; struct css_set *src_cset; cgroup_root

/*
 * The default css_set - used by init and its children prior to any
 * hierarchies being mounted. It contains a pointer to the root state
 * for each subsystem. Also used to anchor the list of css_sets. Not
 * reference-counted, to improve performance when child cgroups
 * haven't been created.
 */
struct css_set init_css_set = {
	.refcount		= REFCOUNT_INIT(1),
	.dom_cset		= &init_css_set,
	.tasks			= LIST_HEAD_INIT(init_css_set.tasks),
	.mg_tasks		= LIST_HEAD_INIT(init_css_set.mg_tasks),
	.dying_tasks		= LIST_HEAD_INIT(init_css_set.dying_tasks),
	.task_iters		= LIST_HEAD_INIT(init_css_set.task_iters),
	.threaded_csets		= LIST_HEAD_INIT(init_css_set.threaded_csets),
	.cgrp_links		= LIST_HEAD_INIT(init_css_set.cgrp_links),
	.mg_src_preload_node	= LIST_HEAD_INIT(init_css_set.mg_src_preload_node),
	.mg_dst_preload_node	= LIST_HEAD_INIT(init_css_set.mg_dst_preload_node),
	.mg_node		= LIST_HEAD_INIT(init_css_set.mg_node),

	/*
	 * The following field is re-initialized when this cset gets linked
	 * in cgroup_init().  However, let's initialize the field
	 * statically too so that the default cgroup can be accessed safely
	 * early during boot.
	 */
	.dfl_cgrp		= &cgrp_dfl_root.cgrp,
};
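A deliberately simplified model of how these structures point at each other, keeping only one field per link. It is not the kernel's definition: roughly, a task points at a css_set (task_struct->cgroups), a css_set holds one cgroup_subsys_state per controller (subsys[]) plus dfl_cgrp, each css belongs to one cgroup (css->cgroup), and each cgroup belongs to one hierarchy (cgroup->root). Everything prefixed toy_ is hypothetical.

```c
#include <stddef.h>

enum toy_subsys_id { TOY_CPU, TOY_MEMORY, TOY_SUBSYS_COUNT };

struct toy_cgroup_root;

struct toy_cgroup {                        /* ~ struct cgroup */
    const char *name;
    struct toy_cgroup_root *root;          /* hierarchy this cgroup lives in */
};

struct toy_cgroup_root {                   /* ~ struct cgroup_root */
    struct toy_cgroup cgrp;                /* the hierarchy's top cgroup */
    int hierarchy_id;                      /* 0 for the v2 default root */
};

struct toy_css {                           /* ~ struct cgroup_subsys_state */
    struct toy_cgroup *cgroup;             /* cgroup this state belongs to */
};

struct toy_css_set {                       /* ~ struct css_set */
    struct toy_css *subsys[TOY_SUBSYS_COUNT]; /* one css per controller */
    struct toy_cgroup *dfl_cgrp;           /* cgroup on the default root */
};

struct toy_task {
    struct toy_css_set *cgroups;           /* ~ task_struct->cgroups */
};

/* Which cgroup does `t` belong to, as seen by controller `ss`? */
struct toy_cgroup *toy_task_cgroup(const struct toy_task *t,
                                   enum toy_subsys_id ss)
{
    struct toy_css *css = t->cgroups->subsys[ss];
    return css ? css->cgroup : NULL;
}
```

In this model, a task started with cgexec -g memory:mem on v2 would have dfl_cgrp pointing at /mem, its memory css in /mem, and its cpu css still at the root if cpu is not enabled for /mem.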

Analysis of controller implementations

Reposting any article from this site to CSDN will be pursued as copyright infringement; anything else is fine.