rcu_read_lock_bh

According to Documentation/RCU/whatisRCU.rst, the basic API is:

bh::

	Critical sections	Grace period		Barrier

	rcu_read_lock_bh	[Same as RCU]		[Same as RCU]
	rcu_read_unlock_bh
	[local_bh_disable]
	[and friends]
	rcu_dereference_bh
	rcu_dereference_bh_check
	rcu_dereference_bh_protected
	rcu_read_lock_bh_held

Below is my speculation about why there is an rcu-bh flavor but no rcu-irq flavor:

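My first guess was: needing rcu_read_lock_bh would mean that a grace period can end in the middle of a plain rcu_read_lock critical section, which hardly seems reasonable. In other words, softirq would be what drives a grace period to its end,
and by the same reasoning hardirq would not. (Both readings are wrong. Softirq does not drive grace periods. Rather, both softirq and hardirq handlers can run while rcu_read_lock is held, and some code wants softirq masked during that time. The motivation is that simple: code often needs to block softirq and hold the RCU read lock together.)
The answer also depends on whether preemption is enabled: if preemption is not enabled, does the bh variant still mean anything?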
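- An RCU reader can be interrupted anyway, and the interrupting context may then retire the structure the reader is using.
  - The key point is that the release is managed by the RCU subsystem: it waits until nobody is using the object and only then frees it. The softirq does not free the object itself.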
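
The point in the list above can be illustrated with a toy, single-CPU userspace model. This is only a sketch and not kernel code: every `toy_*` name is hypothetical, and a "grace period" is simplified to "the reader nesting count dropped to zero", which is much weaker than what real RCU tracks across CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names are hypothetical, nothing here is a kernel API. */
struct toy_obj {
	int value;
	bool freed;               /* stands in for kfree() having run */
	struct toy_obj *next_cb;  /* link in the deferred-free list */
};

static int toy_reader_nesting;       /* models the rcu_read_lock() depth */
static struct toy_obj *toy_cb_list;  /* objects waiting for a grace period */

/* "Free" everything that was queued; only legal with no reader active. */
static void toy_run_callbacks(void)
{
	while (toy_cb_list) {
		struct toy_obj *o = toy_cb_list;

		toy_cb_list = o->next_cb;
		o->freed = true;
	}
}

static void toy_read_lock(void)
{
	toy_reader_nesting++;
}

static void toy_read_unlock(void)
{
	assert(toy_reader_nesting > 0);
	/* In this model the grace period ends when the last reader leaves. */
	if (--toy_reader_nesting == 0)
		toy_run_callbacks();
}

/*
 * What an interrupting "softirq" updater does in this model: it unpublishes
 * the object and queues it, but never frees it directly.
 */
static void toy_softirq_retire(struct toy_obj **slot)
{
	struct toy_obj *old = *slot;

	*slot = NULL;
	old->next_cb = toy_cb_list;
	toy_cb_list = old;
	if (toy_reader_nesting == 0)
		toy_run_callbacks();
}
```

If a reader takes `toy_read_lock()`, loads the pointer, and is then "interrupted" by `toy_softirq_retire()`, the object stays valid until the matching `toy_read_unlock()`. That is the property the notes attribute to the RCU subsystem rather than to the softirq handler.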

Examples

vhost

commit b0c057ca7e835b36c6050c7627634b664796c1d6
Author: Michael S. Tsirkin <mst@redhat.com>
Date:   Thu Feb 13 11:45:11 2014 +0200

    vhost: fix a theoretical race in device cleanup

    vhost_zerocopy_callback accesses VQ right after it drops a ubuf
    reference.  In theory, this could race with device removal which waits
    on the ubuf kref, and crash on use after free.

    Do all accesses within rcu read side critical section, and synchronize
    on release.

    Since callbacks are always invoked from bh, synchronize_rcu_bh seems
    enough and will help release complete a bit faster.

    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 41be4de37e81..a0fa5de210cf 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -308,6 +308,8 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 	struct vhost_virtqueue *vq = ubufs->vq;
 	int cnt;

+	rcu_read_lock_bh();
+
 	/* set len to mark this desc buffers done DMA */
 	vq->heads[ubuf->desc].len = success ?
 		VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
@@ -322,6 +324,8 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 	 */
 	if (cnt <= 1 || !(cnt % 16))
 		vhost_poll_queue(&vq->poll);
+
+	rcu_read_unlock_bh();
 }

 /* Expects to be always run from workqueue - which acts as
@@ -799,6 +803,8 @@ static int vhost_net_release(struct inode *inode, struct file *f)
 		fput(tx_sock->file);
 	if (rx_sock)
 		fput(rx_sock->file);
+	/* Make sure no callbacks are outstanding */
+	synchronize_rcu_bh();
 	/* We do an extra flush before freeing memory,
 	 * since jobs can re-queue themselves. */
 	vhost_net_flush(n);

__dev_queue_xmit

The most classic use is probably in __dev_queue_xmit. Tracing with docs/trace/code/ftrace-function.sh shows __dev_queue_xmit being called from softirq context.

int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
{
	struct net_device *dev = skb->dev;
	struct netdev_queue *txq = NULL;
	struct Qdisc *q;
	int rc = -ENOMEM;
	bool again = false;

	skb_reset_mac_header(skb);
	skb_assert_len(skb);

	if (unlikely(skb_shinfo(skb)->tx_flags &
		     (SKBTX_SCHED_TSTAMP | SKBTX_BPF)))
		__skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);

	/* Disable soft irqs for various locks below. Also
	 * stops preemption for RCU.
	 */
	rcu_read_lock_bh();

@[
        __dev_queue_xmit+0
        br_forward_finish+96
        br_nf_hook_thresh+280
        br_nf_forward_finish+372
        br_nf_forward_arp+300
        br_nf_forward+512
        nf_hook_slow+80
        __br_forward+160
        deliver_clone+76
        maybe_deliver+168
        br_flood+164
        br_handle_frame_finish+412
        br_handle_frame+708
        __netif_receive_skb_core.constprop.0+744
        __netif_receive_skb_list_core+280
        netif_receive_skb_list_internal+536
        napi_complete_done+136
        hns3_nic_common_poll+304
        __napi_poll+64
        net_rx_action+368
        handle_softirqs+300
        __do_softirq+28
        ____do_softirq+24
        call_on_irq_stack+36
        do_softirq_own_stack+36
        __irq_exit_rcu+316
        irq_exit_rcu+24
        el1_interrupt+72
        el1h_64_irq_handler+24
        el1h_64_irq+128
        default_idle_call+56
        cpuidle_idle_call+380
        do_idle+244
        cpu_startup_entry+60
        rest_init+196
        start_kernel+1104
        __primary_switched+136
]: 134

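[!NOTE] Based on the Magic Conch's opinion; still to be verified.

In process context, then: softirq can still preempt and run on the local CPU, and the softirq code may trigger a qdisc switch or advance the RCU reclamation of a qdisc.

That would break your assumptions about concurrency.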

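Does networking code particularly like to run in interrupt context?

(2026-04-07: I think the next step of the investigation is to find out what work networking actually does in softirq. This can be handed to codex, and it should get fairly close.)

Running rg rcu_read_lock_bh shows that almost all callers are networking components; if bpf and xdp are counted as networking too, kernel/padata.c is the only exception.

This is a classic trade-off: work that is fast enough can be done in hardirq, while larger amounts of work have to be pushed to softirq. My initial impression is that networking does a great deal of work in softirq, with complex logic, and that is why it needs rcu_read_lock_bh.

First, it is not that storage never uses softirq: nvme mostly does not, while sata does. An nvme completion runs directly under the hardirq handler (nvme_irq):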

@[
        folio_wake_bit+1
        folio_end_writeback_no_dropbehind+101
        folio_end_writeback+22
        iomap_finish_ioend_buffered+303
        clone_endio+147
        blk_mq_end_request_batch+288
        nvme_irq+122
        __handle_irq_event_percpu+85
        handle_irq_event+56
        handle_edge_irq+197
        __common_interrupt+76
        common_interrupt+128
        asm_common_interrupt+38
        cpuidle_enter_state+204
        cpuidle_enter+49
        cpuidle_idle_call+245
        do_idle+119
        cpu_startup_entry+41
        start_secondary+294
        common_startup_64+318
]: 9

sata completions mostly do go through softirq (blk_done_softirq):

@[
        io_complete_rw+5
        blkdev_bio_end_io_async+78
        blk_update_request+415
        scsi_end_request+39
        scsi_io_completion+83
        blk_done_softirq+74
        handle_softirqs+241
        __irq_exit_rcu+194
        common_interrupt+133
        asm_common_interrupt+38
        cpuidle_enter_state+211
        cpuidle_enter+45
        cpuidle_idle_call+241
        do_idle+120
        cpu_startup_entry+41
        start_secondary+296
        common_startup_64+318
]: 301

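Read the documentation more carefully

Summary of what is known so far

ChatGPT is unreliable on this question.
The documentation needs to be read.
Softirq code obviously does not free the object itself either; freeing is RCU's job.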

🧀 rg bh

rcu.rst
57: "rcu_read_lock_bh", "rcu_read_unlock_bh", "srcu_read_lock",

whatisRCU.rst

d.	Do you need to treat NMI handlers, hardirq handlers,
	and code segments with preemption disabled (whether
	via preempt_disable(), local_irq_save(), local_bh_disable(),
	or some other mechanism) as if they were explicit RCU readers?
	If so, RCU-sched readers are the only choice that will work
	for you, but since about v4.20 you use can use the vanilla RCU
	update primitives.

e.	Do you need RCU grace periods to complete even in the face of
	softirq monopolization of one or more of the CPUs?  For example,
	is your code subject to network-based denial-of-service attacks?
	If so, you should disable softirq across your readers, for
	example, by using rcu_read_lock_bh().  Since about v4.20 you
	use can use the vanilla RCU update primitives.

lockdep.rst
18: rcu_read_lock_bh_held() for RCU-bh.
20: rcu_read_lock_any_held() for any of normal RCU, RCU-bh, and RCU-sched.
34: rcu_dereference_bh(p):
35: Check for RCU-bh read-side critical section.
44: rcu_dereference_bh_check(p, c):
46: rcu_read_lock_bh_held(). This is useful in code that
47: is invoked by both RCU-bh readers and updaters.

stallwarn.rst
459: disabled, perhaps via local_bh_disable(). It is of course possible

UP.rst
126: elsewhere using an _bh variant of the spinlock primitive.
129: like spin_lock_bh() to acquire the lock. Please note that

checklist.rst
67: pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(),
241: and re-enables softirq, for example, rcu_read_lock_bh() and
242: rcu_read_unlock_bh(), or (3) any pair of primitives that disables
274: invocation happens entirely within a single local_bh_disable()
351: such as rcu_read_lock_bh() and rcu_read_unlock_bh(), in which
353: order to keep lockdep happy, in this case, rcu_dereference_bh().
375: with softirq disabled, e.g., via spin_lock_bh(). Failing to

Design/Requirements/Requirements.rst
1285:be legal, including within preempt-disable code, local_bh_disable()
1291:protection of local_bh_disable(). In both the Linux kernel and in
2489:The RCU-bh flavor of RCU has since been expressed in terms of the other
2495:The softirq-disable (AKA "bottom-half", hence the "_bh" abbreviations)
2496:flavor of RCU, or RCU-bh, was developed by Dipankar Sarma to provide a
2505:The solution was the creation of RCU-bh, which does
2506:local_bh_disable() across its read-side critical sections, and which
2509:offline. This means that RCU-bh grace periods can complete even when
2511:algorithms based on RCU-bh to withstand network-based denial-of-service
2514:Because rcu_read_lock_bh() and rcu_read_unlock_bh() disable and
2516:during the RCU-bh read-side critical section will be deferred. In this
2517:case, rcu_read_unlock_bh() will invoke softirq processing, which can
2519:overhead should be associated with the code following the RCU-bh
2520:read-side critical section rather than rcu_read_unlock_bh(), but the
2523:RCU-bh read-side critical section executes during a time of heavy
2527:rcu_read_unlock_bh(). This can of course make it appear at first
2528:glance as if rcu_read_unlock_bh() was executing very slowly.
2530:The `RCU-bh
2532:includes rcu_read_lock_bh(), rcu_read_unlock_bh(), rcu_dereference_bh(),
2533:rcu_dereference_bh_check(), and rcu_read_lock_bh_held(). However, the
2534:old RCU-bh update-side APIs are now gone, replaced by synchronize_rcu(),
2536:anything that disables bottom halves also marks an RCU-bh read-side
2537:critical section, including local_bh_disable() and local_bh_enable(),
2565:latency and overhead entailed. Just as with rcu_read_unlock_bh(),

RTFP.txt
2462:@mastersthesis{AbhinavDuggal2010Masters
2463:,author="Abhinav Duggal"
2469: http://www.filesystems.org/docs/abhinav-thesis/abhinav_thesis.pdf
2618:,author = {Seyster, Justin and Radhakrishnan, Prabakar and Katoch, Samriti and Duggal, Abhinav and Stoller, Scott D. and Zadok, Erez}

Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
14:third RCU-bh flavor having been implemented in terms of the other two.

Design/Data-Structures/Data-Structures.rst
1184:Turner, Abhishek Srivastava, Matt Kowalczyk, and Serge Hallyn for
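
The Requirements.rst matches around lines 2514-2528 say that a softirq raised during an RCU-bh read-side critical section is deferred and then processed from rcu_read_unlock_bh(). A minimal userspace sketch of that deferral follows. It is a toy model, not kernel code: the `toy_*` names are hypothetical, and the real logic lives in local_bh_disable()/local_bh_enable() and the softirq core.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model; all names are hypothetical, not kernel APIs. */
static int toy_bh_disable_depth;   /* models the softirq-disable nesting count */
static bool toy_softirq_pending;   /* a softirq was raised but has not run yet */
static int toy_softirq_runs;       /* how many times the handler has executed */

static void toy_do_softirq(void)
{
	toy_softirq_pending = false;
	toy_softirq_runs++;
}

/* A hardirq raises a softirq; it runs at once only if bh is not disabled. */
static void toy_irq_raises_softirq(void)
{
	toy_softirq_pending = true;
	if (toy_bh_disable_depth == 0)
		toy_do_softirq();
}

static void toy_rcu_read_lock_bh(void)
{
	toy_bh_disable_depth++;
}

/* Leaving the outermost section runs whatever was deferred meanwhile. */
static void toy_rcu_read_unlock_bh(void)
{
	assert(toy_bh_disable_depth > 0);
	if (--toy_bh_disable_depth == 0 && toy_softirq_pending)
		toy_do_softirq();
}
```

In this model the handler's cost lands inside `toy_rcu_read_unlock_bh()`, which mirrors the documentation's remark that rcu_read_unlock_bh() can appear to execute very slowly under heavy softirq load.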

Code running in softirq cannot be preempted; does the RCU implementation differ as a result?

[   24.163867] 6 locks held by swapper/3/0:
[   24.163960]  #0: ffff800081558830 (rcu_read_lock){....}-{1:3}, at: netif_receive_skb_list_internal+0xec/0x3f0
[   24.164036]  #1: ffff800081558830 (rcu_read_lock){....}-{1:3}, at: ip_local_deliver_finish+0x60/0x1d0
[   24.164102]  #2: ffff800081558830 (rcu_read_lock){....}-{1:3}, at: tcp_rcv_state_process+0x150/0x1128
[   24.164163]  #3: ffff800081558830 (rcu_read_lock){....}-{1:3}, at: tcp_v4_send_synack+0xe8/0x3d0
[   24.164224]  #4: ffff800081558830 (rcu_read_lock){....}-{1:3}, at: ip_finish_output2+0xfc/0x978
[   24.164284]  #5: ffff800081558858 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x6c/0x1340

Can a grace period end only once the softirq has finished?
