NUMA
- 创建的内存最好是就是在附近 :buddy 和 slub 分配器的策略,这些策略被整理成为 mempolicy.c
- 运行过程中间发生变化 : migrate.c
首先分析一波 numa 的基础知识 1
用户层次: Available policies are
- page interleaving (i.e., allocate in a round-robin fashion from all, or a subset, of the nodes on the system), inorder to overload the initial boot node with boot-time allocations.
- preferred node allocation (i.e., preferably allocate on a particular node),
- local allocation (i.e., allocate on the node on which the task is currently executing), or
- allocation only on specific nodes (i.e., allocate on some subset of the available nodes). It is also possible to bind tasks to specific nodes.
分析 syscall :
- get_mempolicy
- mbind
- migrate_page
问题
- 每一个 node 主要管理什么信息,每一个 zone 中间放置什么内容?
- 哪里涉及到了 nodemask 的,主要作用是什么 ?
- 如何理解这个? ```c /*
- Array of node states. / nodemask_t node_states[NR_NODE_STATES] __read_mostly = { [N_POSSIBLE] = NODE_MASK_ALL, [N_ONLINE] = { { [0] = 1UL } }, #ifndef CONFIG_NUMA [N_NORMAL_MEMORY] = { { [0] = 1UL } }, #ifdef CONFIG_HIGHMEM [N_HIGH_MEMORY] = { { [0] = 1UL } }, #endif [N_MEMORY] = { { [0] = 1UL } }, [N_CPU] = { { [0] = 1UL } }, #endif / NUMA */ }; EXPORT_SYMBOL(node_states); ```
- 到底 memory policy 是一个进程的行为还是直接影响所有的程序的
numad
- https://pagure.io/numad/tree/master
这个下面的接口都可以看看
- /sys/devices/system/node/node0
看看这个工具
https://github.com/intel/numatop
https://frankdenneman.nl/2016/07/06/introduction-2016-numa-deep-dive-series/
阅读一下
- https://frankdenneman.nl/2016/07/06/introduction-2016-numa-deep-dive-series/
对比一下这两个
#define first_online_node first_node(node_states[N_ONLINE])
#define first_memory_node first_node(node_states[N_MEMORY])
static __always_inline unsigned int next_online_node(int nid)
{
return next_node(nid, node_states[N_ONLINE]);
}
static __always_inline unsigned int next_memory_node(int nid)
{
return next_node(nid, node_states[N_MEMORY]);
}
cluster 到底是什么?
cluster id 的意思是看
看这个: https://lists.gnu.org/archive/html/qemu-arm/2021-03/msg01031.html
A cluster means a group of cores that share some resources (e.g. cache) among them under the LLC. For example, ARM64 server chip Kunpeng 920 has 6 or 8 clusters in each NUMA, and each cluster has 4 cores. All clusters share L3 cache data while cores within each cluster share the L2 cache.
检查一下 Yanan Wang ,对应的时间段 kernel 中也提交了很多相关的代码。
看来 cluster 是共享 L2 的含义
https://www.hikunpeng.com/doc_center/source/zh/kunpengcpfs/systuningguide/systemtg/kunpengcluster_05_0007.html
这两个应该关注下:
cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list
echo 1 > /proc/sys/kernel/sched_cluster
一个具体的问题
qemu-system-aarch64: warning: CPU-9 and CPU-10 in socket-0-cluster-0 have been associated with node-0 and node-1 respectively. It can cause OSes like Linux to misbehave
配置在 qemu 中看到: validate_cpu_cluster_to_numa_boundary
History: #0
Commit: a494fdb715832000ee9047a549a35aacfea8175e
Author: Gavin Shan <gshan@redhat.com>
Committer: Paolo Bonzini <pbonzini@redhat.com>
Author Date: Tue 09 May 2023 08:27:37 AM CST
Committer Date: Mon 26 Jun 2023 04:23:01 PM CST
numa: Validate cluster and NUMA node boundary if required
For some architectures like ARM64, multiple CPUs in one cluster can be
associated with different NUMA nodes, which is irregular configuration
because we shouldn't have this in baremetal environment. The irregular
configuration causes Linux guest to misbehave, as the following warning
messages indicate.
-smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \
-numa node,nodeid=0,cpus=0-1,memdev=ram0 \
-numa node,nodeid=1,cpus=2-3,memdev=ram1 \
-numa node,nodeid=2,cpus=4-5,memdev=ram2 \
看看这个图,太棒了
https://unix.stackexchange.com/questions/468766/understanding-output-of-lscpu
从其他的角度看看
- https://sqltouch.blogspot.com/2023/08/numa-and-soft-numa-in-sql-server-to-get.html
- https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-r2-and-2012/dn282282(v=ws.11)
- https://blogs.vmware.com/performance/2017/03/virtual-machine-vcpu-and-vnuma-rightsizing-rules-of-thumb.html
本站所有文章转发 CSDN 将按侵权追究法律责任,其它情况随意。