lockdep usage
配套测试代码
打开的方法 : CONFIG_PROVE_LOCKING
/proc
和 lockdep 相关的
- lockdep
- lockdep_chains
- lockdep_stats
对应的实现的代码是 : kernel/locking/lockdep_proc.c
lock_stat locks
kernel hacking -> Lock Debugging (spinlocks, mutexes, etc…)
[ ] Lock debugging: prove locking correctness (CONFIG_PROVE_LOCKING 就是大名鼎鼎的 lockdep)
[ ] Lock usage statistics
[ ] RT Mutex debugging, deadlock detection
[ ] Spinlock and rw-lock debugging: basic checks
[ ] Mutex debugging: basic checks
[ ] Wait/wound mutex debugging: Slowpath testing
[ ] RW Semaphore debugging: basic checks
[ ] Lock debugging: detect incorrect freeing of live locks
[ ] Sleep inside atomic section checking
[ ] Locking API boot-time self-tests
< > torture tests for locking
< > Wait/wound mutex selftests
< > torture tests for smp_call_function*()
[ ] Debugging for csd_lock_wait(), called from smp_call_function*()
CONFIG_LOCK_TORTURE_TEST : 这个没办法测试 irq 吧,而且 bh 也没有测试 CONFIG_SCF_TORTURE_TEST : 测试 smp_call_function 原来 torture 测试框架在这里: kernel/torture.c
concurrent/lockdep.c
这都是 CONFIG_PROVE_LOCKING=y 打开的结果 看看不打开的结果
case 0 和 case 1
case 0 : 展示在任何一个位置持有的锁是什么
[40878.237345] #0: ffff8881015c2420 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x71/0xf0
[40878.238490] #1: ffff888105faa690 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x107/0x230
[40878.239799] #2: ffff88813e562738 (kn->active#129){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x110/0x230
[40878.241119] #3: ffffffff8255abc0 (rcu_read_lock){....}-{1:3}, at: tes
第一个锁 : vfs_write -> file_start_write
percpu_down_read(sb->s_writers.rw_sem + level - 1);
第二个和第三个锁
mutex_lock(&of->mutex); // 注意,这个是文件的锁,不是 inode 的锁
if (!kernfs_get_active(of->kn)) { // 这个是 lockdep 自己引入的
也就是,一共两个锁,一个保障 fs 不被 umount
根据这个观察,我们在 concurrent/lockdep.cpp 可以验证的问题
sudo cat /proc/17671/stack
[<0>] test_lockdep+0x81/0x150 [martins3]
[<0>] lockdep_store.cold+0x34/0xcb [martins3]
[<0>] kernfs_fop_write_iter+0x14e/0x230
[<0>] vfs_write+0x2bc/0x570
[<0>] ksys_write+0x71/0xf0
[<0>] do_syscall_64+0x72/0x180
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
~ 🐈
🧀 sudo cat /proc/17672/stack
[<0>] fdget_pos+0x7d/0xb0
[<0>] ksys_write+0x2b/0xf0
[<0>] do_syscall_64+0x72/0x180
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
case 5 触发经典死锁
[ 21.265316] mutex_store : action = 5 current=bash now=21273248041
[ 21.265857]
[ 21.265976] ============================================
[ 21.266355] WARNING: possible recursive locking detected
[ 21.266711] 6.9.1 #3 Tainted: G O
[ 21.267002] --------------------------------------------
[ 21.267320] bash/1571 is trying to acquire lock:
[ 21.267591] ffffffffc000c8d8 (test_mutex_hi){+.+.}-{4:4}, at: test_mutex+0x58/0x140 [martins3]
[ 21.268070]
[ 21.268070] but task is already holding lock:
[ 21.268378] ffffffffc000c8d8 (test_mutex_hi){+.+.}-{4:4}, at: test_mutex+0x4a/0x140 [martins3]
[ 21.268801]
[ 21.268801] other info that might help us debug this:
[ 21.269103] Possible unsafe locking scenario:
[ 21.269103]
[ 21.269377] CPU0
[ 21.269489] ----
[ 21.269593] lock(test_mutex_hi);
[ 21.269740] lock(test_mutex_hi);
[ 21.269882]
[ 21.269882] *** DEADLOCK ***
[ 21.269882]
[ 21.270109] May be due to missing lock nesting notation
[ 21.270109]
[ 21.270368] 4 locks held by bash/1571:
[ 21.270515] #0: ffff888119815428 (sb_writers#3){.+.+}-{0:0}, at: vfs_write+0xe1/0x390
[ 21.270809] #1: ffff888106f4fc90 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0xd0/0x1b0
[ 21.271120] #2: ffff888122ceead0 (kn->active#144){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xd8/0x1b0
[ 21.271434] #3: ffffffffc000c8d8 (test_mutex_hi){+.+.}-{4:4}, at: test_mutex+0x4a/0x140 [martins3]
[ 21.271751]
[ 21.271751] stack backtrace:
[ 21.271904] CPU: 3 PID: 1571 Comm: bash Tainted: G O 6.9.1 #3
[ 21.272152] Hardware name: Martins3 Inc Hacking Alpine, BIOS 12 2022-2-2
[ 21.272382] Call Trace:
[ 21.272469] <TASK>
[ 21.272546] dump_stack_lvl+0xa7/0x100
[ 21.272681] validate_chain+0x909/0x2260
[ 21.272820] __lock_acquire+0x963/0xbd0
[ 21.272956] ? test_mutex+0x58/0x140 [martins3]
[ 21.273118] lock_acquire+0xf2/0x290
[ 21.273245] ? test_mutex+0x58/0x140 [martins3]
[ 21.273406] ? test_mutex+0x58/0x140 [martins3]
[ 21.273569] __mutex_lock+0x81/0x860
[ 21.273698] ? test_mutex+0x58/0x140 [martins3]
[ 21.273858] ? _printk+0x5d/0x80
[ 21.273974] test_mutex+0x58/0x140 [martins3]
[ 21.274129] mutex_store+0xa0/0xd0 [martins3]
[ 21.274285] kernfs_fop_write_iter+0x122/0x1b0
[ 21.274442] vfs_write+0x31d/0x390
[ 21.274563] ksys_write+0x75/0xe0
[ 21.274683] do_syscall_64+0xf8/0x210
[ 21.274811] ? exc_page_fault+0xb2/0x280
[ 21.274949] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 21.275128] RIP: 0033:0x7fa2a6cdfd24
[ 21.275254] Code: 15 11 91 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 80 3d fd 16 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
[ 21.275893] RSP: 002b:00007ffcce9c4658 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 21.276153] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fa2a6cdfd24
[ 21.276398] RDX: 0000000000000002 RSI: 000055969f29b3e0 RDI: 0000000000000001
[ 21.276643] RBP: 000055969f29b3e0 R08: 000055969f29b3e0 R09: 0000000000000000
[ 21.276890] R10: 0000000000001000 R11: 0000000000000202 R12: 0000000000000002
[ 21.277135] R13: 00007fa2a6dba5a0 R14: 0000000000000002 R15: 00007fa2a6db5880
[ 21.277380] </TASK>
case 6 和 case 7
测试 mutex nested lock https://lore.kernel.org/lkml/1516699646-7321-1-git-send-email-jasowang@redhat.com/
[ 29.504282] ================================================
[ 29.505147] WARNING: lock held when returning to user space!
[ 29.505863] 6.15.4 #10 Tainted: G O
[ 29.506454] ------------------------------------------------
[ 29.507168] tee/2267 is leaving the kernel with locks still held!
[ 29.507946] 2 locks held by tee/2267:
[ 29.508337] #0: ffffffffc0ea8370 (test_mutex_hi){+.+.}-{4:4}, at: test_lockdep+0x58/0x130 [martins3]
[ 29.509605] #1: ffffffffc0ea82d0 (test_mutex_hi2){+.+.}-{4:4}, at: test_lockdep+0x66/0x130 [martins3]
这种可以被检测到:
mutex_lock(&test_mutex_hi);
mutex_lock(&test_mutex_hi2);
这种不会检测到
mutex_lock_nested(&test_mutex_hi, 0);
mutex_lock_nested(&test_mutex_hi, 1);
也就是并不会检测,如果系统调用结束之后,所有的锁都是释放的: 这是一个很简单的需求啊,不知道为什么没有做
mutex_lock(&test_mutex_hi);
case 10
[ 83.437982] =============================
[ 83.437983] WARNING: suspicious RCU usage
[ 83.437984] 6.9.1 #3 Tainted: G O
[ 83.437985] [rcu_basic_writer] start
[ 83.437985] -----------------------------
[ 83.437987] /home/martins3/core/vn/code/module/rcupdate.c:58 suspicious rcu_dereference_check() usage!
[ 83.437992]
[ 83.437992] other info that might help us debug this:
[ 83.437992]
[ 83.437993]
[ 83.437993] rcu_scheduler_active = 2, debug_locks = 1
[ 83.437995] 2 locks held by kworker/u256:6/64:
[ 83.437997] #0: ffff888104932958 ((wq_completion)rcu_test){+.+.}-{0:0}, at: process_scheduled_works+0x23a/0x640
[ 83.438005] #1: ffffc90000cebe48 ((work_completion)(&rcu_basic2.work)){+.+.}-{0:0}, at: process_scheduled_works+0x25a/0x640
[ 83.438009]
[ 83.438009] stack backtrace:
[ 83.438010] CPU: 0 PID: 64 Comm: kworker/u256:6 Tainted: G O 6.9.1 #3
[ 83.438012] Hardware name: Martins3 Inc Hacking Alpine, BIOS 12 2022-2-2
[ 83.438012] Workqueue: rcu_test rcu_basic_reader [martins3]
[ 83.438018] Call Trace:
[ 83.438019] <TASK>
[ 83.438021] dump_stack_lvl+0xa7/0x100
[ 83.438029] lockdep_rcu_suspicious+0x14f/0x190
[ 83.438034] rcu_basic_reader+0x1ce/0x200 [martins3]
[ 83.438039] ? process_scheduled_works+0x25a/0x640
[ 83.438040] process_scheduled_works+0x2b0/0x640
[ 83.438046] worker_thread+0x270/0x370
[ 83.438047] ? __pfx_worker_thread+0x10/0x10
[ 83.438049] kthread+0x109/0x130
[ 83.438051] ? __pfx_kthread+0x10/0x10
[ 83.438052] ret_from_fork+0x37/0x50
[ 83.438053] ? __pfx_kthread+0x10/0x10
[ 83.438055] ret_from_fork_asm+0x1a/0x30
[ 83.438060] </TASK>
CONFIG_DEBUG_ATOMIC_SLEEP
如果不打开 lockdep 不会有这个东西吧。
[ 977.663859] preempt_store : action = 0 current=bash now=977704749634
[ 977.664875] BUG: sleeping function called from invalid context at /home/martins3/core/vn/code/module/preempt.c:27
[ 977.665595] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1647, name: bash
[ 977.665857] preempt_count: 1, expected: 0
[ 977.665997] RCU nest depth: 0, expected: 0
[ 977.666141] 4 locks held by bash/1647:
[ 977.666274] #0: ffff88810853e428 (sb_writers#3){.+.+}-{0:0}, at: vfs_write+0xe1/0x390
[ 977.666558] #1: ffff888106fc3490 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0xd0/0x1b0
[ 977.666871] #2: ffff88817e5062e8 (kn->active#146){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xd8/0x1b0
[ 977.667188] #3: ffffffffc000d3d8 (sl_static){....}-{3:3}, at: test_preempt+0x174/0x300 [martins3]
[ 977.667500] irq event stamp: 11254
[ 977.667620] hardirqs last enabled at (11253): [<ffffffff8123a2b0>] console_unlock+0xd0/0x190
[ 977.667911] hardirqs last disabled at (11254): [<ffffffff8269d693>] _raw_spin_lock_irqsave+0x43/0xd0
[ 977.668424] softirqs last enabled at (10050): [<ffffffff8118072a>] handle_softirqs+0x36a/0x3d0
[ 977.668899] softirqs last disabled at (10043): [<ffffffff81180921>] __irq_exit_rcu+0x71/0x100
[ 977.669323] CPU: 1 PID: 1647 Comm: bash Tainted: G W O 6.9.1 #3
[ 977.669670] Hardware name: Martins3 Inc Hacking Alpine, BIOS 12 2022-2-2
[ 977.669981] Call Trace:
[ 977.670098] <TASK>
[ 977.670199] dump_stack_lvl+0xa7/0x100
[ 977.670369] __might_resched+0x24f/0x270
[ 977.670546] test_preempt+0x188/0x300 [martins3]
[ 977.670753] preempt_store+0xa0/0xd0 [martins3]
[ 977.670957] kernfs_fop_write_iter+0x122/0x1b0
[ 977.671139] vfs_write+0x31d/0x390
[ 977.671284] ksys_write+0x75/0xe0
[ 977.671426] do_syscall_64+0xf8/0x210
[ 977.671579] ? exc_page_fault+0xb2/0x280
[ 977.671743] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 977.671939] RIP: 0033:0x7f1d62dadd24
[ 977.672073] Code: 15 11 91 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 80 3d fd 16 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
[ 977.672719] RSP: 002b:00007fff94c68ae8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 977.672986] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1d62dadd24
[ 977.673233] RDX: 0000000000000002 RSI: 0000563d24832410 RDI: 0000000000000001
[ 977.673481] RBP: 0000563d24832410 R08: 0000563d24832410 R09: 0000000000000020
[ 977.673727] R10: 00007f1d62e87b40 R11: 0000000000000202 R12: 0000000000000002
[ 977.673973] R13: 00007f1d62e885a0 R14: 0000000000000002 R15: 00007f1d62e83880
[ 977.674224] </TASK>
[ 977.674327] preempt_store : now=977715218039
- 展示持有了那些 lock
- softirq 和 hardirq 的 disable 的时间
TODO
这种是靠什么实现的,这个不是比 lockdep_assert_held 要好? 还是他会自动获取的意思?
static void process_one_work(struct worker *worker, struct work_struct *work)
__releases(&pool->lock)
__acquires(&pool->lock)
workqueue 的 process_one_work 中,需要 work 上锁之后,需要停掉 lock
lockdep_depth(current) 是做什么的
这段做什么的?
#ifdef CONFIG_LOCKDEP
/*
* It is permissible to free the struct work_struct from
* inside the function that is called from it, this we need to
* take into account for lockdep too. To avoid bogus "held
* lock freed" warnings as well as problems when looking into
* work->lockdep_map, make a copy and use that here.
*/
struct lockdep_map lockdep_map;
lockdep_copy_map(&lockdep_map, &work->lockdep_map);
#endif
本站所有文章转发 CSDN 将按侵权追究法律责任,其它情况随意。