Skip to the content.

iouring register buffers

https://unixism.net/loti/ref-iouring/io_uring_register.html

主要关联的源码: rsrc.c

内核关联代码: io_uring/rsrc.c

核心数据结构:

struct io_mapped_ubuf {
	u64		ubuf;
	u64		ubuf_end;
	unsigned int	nr_bvecs;
	unsigned long	acct_pages;
	// 注册的 buffer 都保存到这里
	struct bio_vec	bvec[] __counted_by(nr_bvecs);
};

从 IORING_OP_READ_FIXED 和 IORING_OP_READ 就是经典观察了:

[!NOTE] 参考 Deepseeek ,有待验证

void io_uring_prep_read(struct io_uring_sqe *sqe,
                            int fd,
                            void *buf,
                            unsigned nbytes,
                            __u64 offset);

void io_uring_prep_read_fixed(struct io_uring_sqe *sqe,
                               int fd,
                               void *buf,
                               unsigned nbytes,
                               __u64 offset,
                               int buf_index);

相比就是多了一个参数 buf_index

[!NOTE] 参考神奇海螺的意见,有待验证

为什么有了 buf_index 还需要 buf

在使用 io_uring 的注册缓冲区(Registered Buffers)特性时,你首先会调用 io_uring_register 将一组内存区域(数组)告知内核。内核会提前锁定(pin)这些物理页,以避免在每次 I/O 时进行复杂的内存映射和取消映射操作。

一点简单的测试

对应测试代码在: vn/code/src/c/iouring/ubuf.c

首先注册这些 buffer

在之后的 io 中来

内核中是如何 verified 注册地址的

io_sqe_buffer_register 中:

  1. io_pin_pages : 首先将注册的 pages 都 pin 下来
  2. io_buffer_account_pin : 统计这些 page 是否超过了

headpage_already_acct 的理解,这里为什么不去统计普通页的重复问题,只是统计大页的重复问题。

不过结果就是这样的:

将内存 ping 下来之后,的确 grep VmPin /proc/pid/status 可以看到这个:

cat /proc/5460/status| grep Pin
  VmPin:        40 kB

细节

聚合

io_mapped_ubuf::bvec 数组中是,一个位置存储一个 page ,现在可以存储一个 vector 了:

https://lore.kernel.org/all/20240731090133.4106-1-cliang01.li@samsung.com/

commit a8edbb424b1391b077407c75d8f5d2ede77aa70d
Author: Chenliang Li <cliang01.li@samsung.com>
Date:   Wed Jul 31 17:01:33 2024 +0800

    io_uring/rsrc: enable multi-hugepage buffer coalescing

    Add support for checking and coalescing multi-hugepage-backed fixed
    buffers. The coalescing optimizes both time and space consumption caused
    by mapping and storing multi-hugepage fixed buffers.

    A coalescable multi-hugepage buffer should fully cover its folios
    (except potentially the first and last one), and these folios should
    have the same size. These requirements are for easier processing later,
    also we need same size'd chunks in io_import_fixed for fast iov_iter
    adjust.

    Signed-off-by: Chenliang Li <cliang01.li@samsung.com>
    Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/20240731090133.4106-3-cliang01.li@samsung.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_sqe_buffer_register 中:

	/* If it's huge page(s), try to coalesce them into fewer bvec entries */
	if (nr_pages > 1 && io_check_coalesce_buffer(pages, nr_pages, &data)) {
		if (data.nr_pages_mid != 1)
			coalesced = io_coalesce_buffer(&pages, &nr_pages, &data);
	}

让问题变的奇怪的是 io_imu_folio_data 中要求 head folio 可以 partially included :

struct io_imu_folio_data {
	/* Head folio can be partially included in the fixed buf */
	unsigned int	nr_pages_head;
	/* For non-head/tail folios, has to be fully included */
	unsigned int	nr_pages_mid;
	unsigned int	folio_shift;
	unsigned int	nr_folios;
};

不知道为什么对于 head folio 会网开一面的处理

io_uring/kbuf.c

aio 似乎是需要拷贝 iovec 的吧!

本站所有文章转发 CSDN 将按侵权追究法律责任,其它情况随意。