Lenovo Legion (拯救者) R9000P 2023
Basic hardware information: https://linux-hardware.org/?probe=95c540792e
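I bought this laptop on Pinduoduo. Once it arrived I found that it crashed very easily. I tried several kernel versions and every one of them had problems (the logs are in the appendix). That seemed too strange, so I switched the machine to Windows and sent it to Lenovo after-sales service.
The repair took 3 weeks:
- One of the 32G memory modules was removed. With both 32G modules installed the machine still crashes easily. After-sales said the module could only be replaced, not refunded, and to this day it has gone unused.
- Windows could not detect the Discrete GPU; the setting had to be adjusted by hand in the BIOS.
- The NixOS boot entry was lost.
Under Windows the machine crashes about once every 3 months on average.
With that, I have blacklisted both Pinduoduo and Lenovo.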
Investigation
At first I suspected the AMD C-states, but that was clearly a detour.
- https://superuser.com/questions/1640355/setting-max-c-state-in-windows-10
- https://superuser.com/questions/121883/any-way-to-disable-specific-cpu-idle-cx-states

```c
/*
 * AMD Erratum 400 aware idle routine. We handle it the same way as C3 power
 * states (local apic timer and TSC stop).
 *
 * XXX this function is completely buggered vs RCU and tracing.
 */
static void amd_e400_idle(void)
{
	/*
	 * We cannot use static_cpu_has_bug() here because X86_BUG_AMD_APIC_C1E
	 * gets set after static_cpu_has() places have been converted via
	 * alternatives.
	 */
	if (!boot_cpu_has_bug(X86_BUG_AMD_APIC_C1E)) {
		default_idle();
		return;
	}

	tick_broadcast_enter();

	default_idle();

	tick_broadcast_exit();
}

/*
 * Check if the CPU can handle C2 and deeper
 */
static inline unsigned int acpi_processor_cstate_check(unsigned int max_cstate)
{
	/*
	 * Early models (<=5) of AMD Opterons are not supposed to go into
	 * C2 state.
	 *
	 * Steppings 0x0A and later are good
	 */
	if (boot_cpu_data.x86 == 0x0F &&
	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
	    boot_cpu_data.x86_model <= 0x05 &&
	    boot_cpu_data.x86_stepping < 0x0A)
		return 1;
	else if (boot_cpu_has(X86_BUG_AMD_APIC_C1E))
		return 1;
	else
		return max_cstate;
}
```
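To see which idle states the running kernel actually exposes, the cpuidle sysfs directory can be walked from user space. The following is only a minimal sketch, assuming a Linux host that provides `/sys/devices/system/cpu/cpuN/cpuidle/stateM/name` and `.../disable`; the helper names (`cpuidle_attr_path`, `read_line`, `list_cstates`) are made up for illustration and are not from the kernel sources above.

```c
#include <stdio.h>
#include <string.h>

/* Build the sysfs path of one attribute of one idle state (hypothetical helper). */
static int cpuidle_attr_path(char *buf, size_t len, int cpu, int state,
                             const char *attr)
{
    return snprintf(buf, len,
                    "/sys/devices/system/cpu/cpu%d/cpuidle/state%d/%s",
                    cpu, state, attr);
}

/* Read the first line of a file into out, without the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
static int read_line(const char *path, char *out, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(out, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    out[strcspn(out, "\n")] = '\0';
    return 0;
}

/* Print every idle state of one CPU and return how many were found.
 * Returns 0 when the cpuidle directory does not exist (for example in a
 * container or on a kernel without cpuidle support). */
static int list_cstates(int cpu)
{
    char path[128], name[64], disabled[16];
    int state = 0;

    for (;; state++) {
        cpuidle_attr_path(path, sizeof(path), cpu, state, "name");
        if (read_line(path, name, sizeof(name)) != 0)
            break; /* no more states */
        cpuidle_attr_path(path, sizeof(path), cpu, state, "disable");
        if (read_line(path, disabled, sizeof(disabled)) != 0)
            strcpy(disabled, "?");
        printf("cpu%d state%d: %s (disable=%s)\n", cpu, state, name, disabled);
    }
    return state;
}
```

Calling `list_cstates(0)` from a small `main` prints one line per state of CPU 0. Writing `1` to a state's `disable` file as root should stop the governor from selecting that state, which is one way to rule C-states in or out without rebooting.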
Appendix
1
2025-01-29
[ 177.539729] Oops: general protection fault, probably for non-canonical address 0x80000001c2509203: 0000 [#1] PREEMPT SMP NOPTI
[ 177.539740] CPU: 20 UID: 1000 PID: 5779 Comm: qemu-system-x86 Tainted: P O 6.12.10 #1-NixOS
[ 177.539744] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
[ 177.539745] Hardware name: LENOVO 82WM/INVALID, BIOS LPCN39WW 04/28/2023
[ 177.539747] RIP: 0010:__apic_accept_irq+0x21/0x2a0 [kvm]
[ 177.539802] Code: 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 57 41 89 cf 41 56 45 89 c6 41 55 41 54 41 89 d4 55 48 89 fd 53 89 f3 48 83 ec 08 <4c> 8b af 90 00 00 00 41 8b 75 24 0f 1f 44 00 00 81 fb 00 04 00 00
[ 177.539804] RSP: 0018:ffffb07d5165fc20 EFLAGS: 00010282
[ 177.539807] RAX: ffff96ff029de640 RBX: 0000000000000000 RCX: 0000000000000000
[ 177.539808] RDX: 00000000000000fc RSI: 0000000000000000 RDI: 80000001c2509173
[ 177.539809] RBP: 80000001c2509173 R08: 0000000000000000 R09: 0000000000000000
[ 177.539810] R10: 0000000000000001 R11: 00000000000000fc R12: 00000000000000fc
[ 177.539811] R13: 0000000000000015 R14: 0000000000000000 R15: 0000000000000000
[ 177.539813] FS: 00007f98ee1fc6c0(0000) GS:ffff97055d800000(0000) knlGS:0000000000000000
[ 177.539814] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 177.539815] CR2: 00007f2fc5898000 CR3: 000000011a21e000 CR4: 0000000000f50ef0
[ 177.539816] PKRU: 55555554
[ 177.539817] Call Trace:
[ 177.539822] <TASK>
[ 177.539826] ? die_addr+0x36/0x90
[ 177.539831] ? exc_general_protection+0x144/0x350
[ 177.539837] ? asm_exc_general_protection+0x26/0x30
[ 177.539841] ? __apic_accept_irq+0x21/0x2a0 [kvm]
[ 177.539884] __pv_send_ipi.part.0+0x5c/0xc0 [kvm]
[ 177.539929] kvm_pv_send_ipi+0xc4/0x130 [kvm]
[ 177.539971] __kvm_emulate_hypercall+0x2a2/0x400 [kvm]
[ 177.540016] ? trace_hardirqs_off_finish+0x32/0x90
[ 177.540020] ? svm_vcpu_enter_exit+0x8e/0xe0 [kvm_amd]
[ 177.540030] ? svm_get_segment+0x1c/0x120 [kvm_amd]
[ 177.540038] kvm_emulate_hypercall+0x17d/0x210 [kvm]
[ 177.540080] kvm_arch_vcpu_ioctl_run+0x197/0x6f0 [kvm]
[ 177.540123] kvm_vcpu_ioctl+0x233/0x980 [kvm]
[ 177.540162] ? futex_wake+0x85/0x1a0
[ 177.540167] __x64_sys_ioctl+0x99/0xe0
[ 177.540170] do_syscall_64+0xc1/0x220
[ 177.540173] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 177.540176] RIP: 0033:0x7f9c41fffaef
[ 177.540207] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 28 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 177.540209] RSP: 002b:00007f98ee1fb530 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 177.540211] RAX: ffffffffffffffda RBX: 00005605bacd6950 RCX: 00007f9c41fffaef
[ 177.540212] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000005a
[ 177.540213] RBP: 000000000000ae80 R08: 0000000000000000 R09: 0000000000000000
[ 177.540214] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 177.540214] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 177.540216] </TASK>
[ 177.540365] ---[ end trace 0000000000000000 ]---
[ 177.909662] pstore: backend (efi_pstore) writing error (-28)
[ 177.909664] RIP: 0010:__apic_accept_irq+0x21/0x2a0 [kvm]
[ 177.909713] Code: 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 57 41 89 cf 41 56 45 89 c6 41 55 41 54 41 89 d4 55 48 89 fd 53 89 f3 48 83 ec 08 <4c> 8b af 90 00 00 00 41 8b 75 24 0f 1f 44 00 00 81 fb 00 04 00 00
[ 177.909715] RSP: 0018:ffffb07d5165fc20 EFLAGS: 00010282
[ 177.909718] RAX: ffff96ff029de640 RBX: 0000000000000000 RCX: 0000000000000000
[ 177.909719] RDX: 00000000000000fc RSI: 0000000000000000 RDI: 80000001c2509173
[ 177.909720] RBP: 80000001c2509173 R08: 0000000000000000 R09: 0000000000000000
[ 177.909721] R10: 0000000000000001 R11: 00000000000000fc R12: 00000000000000fc
[ 177.909722] R13: 0000000000000015 R14: 0000000000000000 R15: 0000000000000000
[ 177.909723] FS: 00007f98ee1fc6c0(0000) GS:ffff97055d800000(0000) knlGS:0000000000000000
[ 177.909724] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 177.909726] CR2: 00007f2fc5898000 CR3: 000000011a21e000 CR4: 0000000000f50ef0
[ 177.909727] PKRU: 55555554
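The oops above reports a non-canonical address, and the numbers in the dump can be cross-checked by hand. The bytes marked `<4c> 8b af 90 00 00 00` decode to `mov 0x90(%rdi),%r13`, so the faulting address should be the value in RDI plus 0x90. The sketch below redoes that arithmetic and tests canonicality, assuming 4-level paging with 48-bit virtual addresses (CR4 bit 12, LA57, is clear in the dump). The function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An address is canonical when bits 63..(vaddr_bits-1) are all equal,
 * i.e. the upper bits are a sign extension of the top implemented bit. */
static bool is_canonical(uint64_t addr, unsigned vaddr_bits)
{
    assert(vaddr_bits >= 1 && vaddr_bits <= 64);
    uint64_t upper = addr >> (vaddr_bits - 1);       /* bit 47 and above for 48 bits */
    uint64_t all_ones = UINT64_MAX >> (vaddr_bits - 1);
    return upper == 0 || upper == all_ones;
}

/* Effective address of a base register plus a signed 32-bit displacement. */
static uint64_t effective_address(uint64_t base, int32_t disp)
{
    return base + (uint64_t)(int64_t)disp;
}
```

With the values from the log, `effective_address(0x80000001c2509173, 0x90)` gives `0x80000001c2509203`, which is exactly the address in the oops line, and `is_canonical` rejects it for 48 bits. The pointer passed in RDI was therefore already garbage before `__apic_accept_irq` dereferenced it.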
A minimal BPF scheduler running inside the kernel can trigger the problem as well:
[ 45.613408] sched_ext: BPF scheduler "minimal_scheduler" enabled
[ 45.613981] sched_ext: scx_bpf_dispatch() renamed to scx_bpf_dsq_insert()
[ 67.420994] NOHZ tick-stop error: local softirq work is pending, handler #200!!!
[ 101.638553] git (6951) used greatest stack depth: 10576 bytes left
[ 148.278622] sched_ext: Soft lockup - CPU0 stuck for 19s, disabling "minimal_scheduler"
[ 149.923968] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 149.923968] rcu: 13-...!: (21000 ticks this GP) idle=098c/1/0x4000000000000000 softirq=23663/23663 fqs=22
[ 149.923968] rcu: (t=21002 jiffies g=37697 q=4 ncpus=32)
[ 149.923968] rcu: rcu_preempt kthread starved for 20895 jiffies! g37697 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=22
[ 149.923968] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 149.923968] rcu: RCU grace-period kthread stack dump:
[ 149.923968] task:rcu_preempt state:R running task stack:14664 pid:17 tgid:17 ppid:2 flags:0x00004000
[ 149.923968] Sched_ext: minimal_scheduler (enabled+all), task: runnable_at=-20895ms
[ 149.923968] Call Trace:
[ 149.923968] <TASK>
[ 149.923968] __schedule+0x3a4/0x13a0
[ 149.923968] ? _raw_spin_lock_irqsave+0x23/0x60
[ 149.923968] ? preempt_count_sub+0x4b/0x60
[ 149.923968] ? schedule_timeout+0x87/0x100
[ 149.923968] schedule+0x41/0x1c0
[ 149.923968] schedule_timeout+0x87/0x100
[ 149.923968] ? __pfx_process_timeout+0x10/0x10
[ 149.923968] rcu_gp_fqs_loop+0x121/0x6d0
[ 149.923968] ? __pfx_rcu_gp_kthread+0x10/0x10
[ 149.923968] rcu_gp_kthread+0x1ac/0x280
[ 149.923968] kthread+0xdc/0x110
[ 149.923968] ? __pfx_kthread+0x10/0x10
[ 149.923968] ret_from_fork+0x31/0x50
[ 149.923968] ? __pfx_kthread+0x10/0x10
[ 149.923968] ret_from_fork_asm+0x1a/0x30
[ 149.923968] </TASK>
[ 149.923968] rcu: Stack dump where RCU GP kthread last ran:
2
[ 552.709464] Oops: general protection fault, probably for non-canonical address 0x80000001c61d71fb: 0000 [#1] PREEMPT SMP NOPTI
[ 552.709477] CPU: 20 UID: 1000 PID: 7526 Comm: qemu-system-x86 Tainted: P O 6.12.10 #1-NixOS
[ 552.709481] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
[ 552.709483] Hardware name: LENOVO 82WM/INVALID, BIOS LPCN39WW 04/28/2023
[ 552.709485] RIP: 0010:kvm_arch_dy_has_pending_interrupt+0x12/0x40 [kvm]
[ 552.709557] Code: 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 66 90 48 8b 87 08 02 00 00 <80> b8 98 00 00 00 00 74 11 e9 b0 55 fb c5 48 8b 87 08 02 00 00 48
[ 552.709559] RSP: 0018:ffffa4ed1de27d88 EFLAGS: 00010202
[ 552.709563] RAX: 80000001c61d7163 RBX: 000000000000000b RCX: 0000000000000000
[ 552.709565] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8c08c61dccb0
[ 552.709567] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000001770
[ 552.709568] R10: 0000000000000bb8 R11: 0000000000000000 R12: ffffa4ed1dd19000
[ 552.709569] R13: ffffa4ed1dd1a128 R14: ffff8c08c719ccb0 R15: ffff8c08c61dccb0
[ 552.709571] FS: 00007f30b3fff6c0(0000) GS:ffff8c0f1d800000(0000) knlGS:0000000000000000
[ 552.709573] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 552.709575] CR2: 00007f3f89b61000 CR3: 00000001c2974000 CR4: 0000000000f50ef0
[ 552.709576] PKRU: 55555554
[ 552.709578] Call Trace:
[ 552.709582] <TASK>
[ 552.709587] ? die_addr+0x36/0x90
[ 552.709594] ? exc_general_protection+0x144/0x350
[ 552.709601] ? asm_exc_general_protection+0x26/0x30
[ 552.709607] ? kvm_arch_dy_has_pending_interrupt+0x12/0x40 [kvm]
[ 552.709661] kvm_vcpu_on_spin+0x211/0x260 [kvm]
[ 552.709716] pause_interception+0x89/0x100 [kvm_amd]
[ 552.709731] kvm_arch_vcpu_ioctl_run+0x197/0x6f0 [kvm]
[ 552.709784] kvm_vcpu_ioctl+0x233/0x980 [kvm]
[ 552.709837] ? futex_wake+0x85/0x1a0
[ 552.709845] __x64_sys_ioctl+0x99/0xe0
[ 552.709852] do_syscall_64+0xc1/0x220
[ 552.709858] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 552.709862] RIP: 0033:0x7f3482bffaef
[ 552.709912] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 28 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 552.709915] RSP: 002b:00007f30b3ffe530 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 552.709917] RAX: ffffffffffffffda RBX: 0000564908ef8770 RCX: 00007f3482bffaef
[ 552.709918] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000082
[ 552.709920] RBP: 000000000000ae80 R08: 0000000000000000 R09: 0000000000000000
[ 552.709922] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 552.709923] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 552.709926] </TASK>
[ 552.710185] ---[ end trace 0000000000000000 ]---
[ 553.121753] pstore: backend (efi_pstore) writing error (-28)
[ 553.121758] RIP: 0010:kvm_arch_dy_has_pending_interrupt+0x12/0x40 [kvm]
[ 553.121818] Code: 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 66 90 48 8b 87 08 02 00 00 <80> b8 98 00 00 00 00 74 11 e9 b0 55 fb c5 48 8b 87 08 02 00 00 48
[ 553.121821] RSP: 0018:ffffa4ed1de27d88 EFLAGS: 00010202
[ 553.121824] RAX: 80000001c61d7163 RBX: 000000000000000b RCX: 0000000000000000
[ 553.121825] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8c08c61dccb0
[ 553.121827] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000001770
[ 553.121828] R10: 0000000000000bb8 R11: 0000000000000000 R12: ffffa4ed1dd19000
[ 553.121829] R13: ffffa4ed1dd1a128 R14: ffff8c08c719ccb0 R15: ffff8c08c61dccb0
[ 553.121831] FS: 00007f30b3fff6c0(0000) GS:ffff8c0f1d800000(0000) knlGS:0000000000000000
[ 553.121832] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 553.121834] CR2: 00007f3f89b61000 CR3: 00000001c2974000 CR4: 0000000000f50ef0
[ 553.121835] PKRU: 55555554
The problem triggered in the guest at the same time:
localhost login: [ 283.626048] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 283.626048] rcu: 18-....: (20999 ticks this GP) idle=5c14/1/0x4000000000000000 softirq=31417/31417 fqs=4795
[ 283.626048] rcu: (t=21001 jiffies g=40829 q=718 ncpus=32)
[ 283.626048] CPU: 18 UID: 0 PID: 204 Comm: kswapd0 Not tainted 6.13.0 #14
[ 283.626048] Hardware name: Martins3 Inc Hacking Alpine, BIOS 12 2012-3-4
[ 283.626048] RIP: 0010:smp_call_function_many_cond+0x107/0x570
[ 283.626048] Code: 74 56 f3 48 0f bc c0 89 c1 83 f8 3f 77 4a be 01 00 00 00 48 63 c1 48 8b 13 48 03 14 c5 a0 fb 8e 82 8b 42 08 a8 01 74 09 f3 90 <8b> 42 08 a8 01 75 f7 83 c1 01 48 63 c1 48 83 f8 3f 77 1b 48 89 f0
[ 283.626048] RSP: 0018:ffffc900027f7808 EFLAGS: 00000202
[ 283.626048] RAX: 0000000000000011 RBX: ffff8980f96b2000 RCX: 000000000000001b
[ 283.626048] RDX: ffff8980f98f6d80 RSI: 0000000000000001 RDI: 000000000000001f
[ 283.626048] RBP: 0000000000000001 R08: 000000000000001f R09: 0000000000000000
[ 283.626048] R10: 000000007ffbfdff R11: 0000000000000000 R12: ffffffff8108cde0
[ 283.626048] R13: ffff8980f96afd80 R14: ffffffff8108d670 R15: 0000000000000012
[ 283.626048] FS: 0000000000000000(0000) GS:ffff8980f9680000(0000) knlGS:0000000000000000
[ 283.626048] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 283.626048] CR2: 00007f3f36bf3000 CR3: 0000000003036000 CR4: 0000000000750ef0
[ 283.626048] PKRU: 55555554
[ 283.626048] Call Trace:
[ 283.626048] <IRQ>
[ 283.626048] ? rcu_dump_cpu_stacks+0x116/0x1d0
[ 283.626048] ? rcu_sched_clock_irq+0x3ab/0x1260
[ 283.626048] ? __walk_groups.isra.0+0x1f/0x70
[ 283.626048] ? tmigr_requires_handle_remote+0xcc/0xe0
[ 283.626048] ? update_process_times+0x6f/0xc0
[ 283.626048] ? tick_nohz_handler+0x8f/0x140
[ 283.626048] ? __pfx_tick_nohz_handler+0x10/0x10
[ 283.626048] ? __hrtimer_run_queues+0x85/0x2d0
[ 283.626048] ? hrtimer_interrupt+0xff/0x250
[ 283.626048] ? __sysvec_apic_timer_interrupt+0x52/0x120
[ 283.626048] ? sysvec_apic_timer_interrupt+0x6e/0x80
[ 283.626048] </IRQ>
[ 283.626048] <TASK>
[ 283.626048] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 283.626048] ? __pfx_flush_tlb_func+0x10/0x10
[ 283.626048] ? __pfx_tlb_is_not_lazy+0x10/0x10
[ 283.626048] ? smp_call_function_many_cond+0x107/0x570
[ 283.626048] ? smp_call_function_many_cond+0x291/0x570
[ 283.626048] ? __pfx_flush_tlb_func+0x10/0x10
[ 283.626048] ? __pfx_flush_tlb_func+0x10/0x10
[ 283.626048] ? __pfx_tlb_is_not_lazy+0x10/0x10
[ 283.626048] on_each_cpu_cond_mask+0x40/0x80
[ 283.626048] arch_tlbbatch_flush+0x115/0x130
[ 283.626048] try_to_unmap_flush_dirty+0x36/0x50
[ 283.626048] shrink_folio_list+0x69c/0xdb0
[ 283.626048] evict_folios+0x258/0x610
[ 283.626048] try_to_shrink_lruvec+0x1a4/0x2b0
[ 283.626048] shrink_one+0xfd/0x1e0
[ 283.626048] shrink_node+0xabd/0xc80
[ 283.626048] ? mem_cgroup_iter+0x1b9/0x210
[ 283.626048] balance_pgdat+0x4ca/0x910
[ 283.626048] ? perf_pmu_resched+0x10/0x60
[ 283.626048] ? trace_hardirqs_on+0x21/0x80
[ 283.626048] ? finish_task_switch.isra.0+0x9e/0x2e0
[ 283.626048] kswapd+0x1ec/0x390
[ 283.626048] ? __pfx_autoremove_wake_function+0x10/0x10
[ 283.626048] ? __pfx_kswapd+0x10/0x10
[ 283.626048] kthread+0xdc/0x110
[ 283.626048] ? __pfx_kthread+0x10/0x10
[ 283.626048] ret_from_fork+0x31/0x50
[ 283.626048] ? __pfx_kthread+0x10/0x10
[ 283.626048] ret_from_fork_asm+0x1a/0x30
[ 283.626048] </TASK>
[ 288.349418] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [indexer18:4532]
[ 288.349868] CPU#2 Utilization every 4s during lockup:
[ 288.350055] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 288.350055] #2: 101% system, 0% softirq, 0% hardirq, 0% idle
[ 288.350055] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 288.350055] #4: 101% system, 0% softirq, 0% hardirq, 0% idle
[ 288.350055] #5: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 288.350055] Modules linked in: xt_addrtype bridge stp llc overlay rpcsec_gss_krb5 auth_rpcgss xt_MASQUERADE xt_mark tun nf_tables iptable_nat 9p vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd configfs ext4 mbcache jbd2 usb_storage usbhid kvm_amd ccp uhci_hcd sha1_generic ehci_pci nvme ehci_hcd kvm 9pnet_virtio nvme_core usbcore crc32c_intel virtio_scsi virtio_console 9pnet virtio_balloon nvme_auth usb_common virtio_net sch_fq_codel nfsv4 nfs lockd grace sunrpc netfs configs fuse virtio_pci virtio_pci_modern_dev virtio_pci_legacy_dev
[ 288.350055] CPU: 2 UID: 1000 PID: 4532 Comm: indexer18 Not tainted 6.13.0 #14
3
When the guest and the host build the kernel at the same time, the guest hangs. This is clearly a kernel bug.
[12290.947770] perf: interrupt took too long (2517 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[12291.035751] perf: interrupt took too long (3160 > 3146), lowering kernel.perf_event_max_sample_rate to 63000
[12291.327763] perf: interrupt took too long (3957 > 3950), lowering kernel.perf_event_max_sample_rate to 50000
[12292.203829] perf: interrupt took too long (4985 > 4946), lowering kernel.perf_event_max_sample_rate to 40000
[12293.299532] perf: interrupt took too long (6242 > 6231), lowering kernel.perf_event_max_sample_rate to 32000
[12298.565584] perf: interrupt took too long (7810 > 7802), lowering kernel.perf_event_max_sample_rate to 25000
[144115809.998853] sched: DL replenish lagged too much
[144115809.998853] ------------[ cut here ]------------
[144115809.998853] sa->load_avg || sa->util_avg || sa->runnable_avg
[144115809.998853] WARNING: CPU: 23 PID: 0 at kernel/sched/fair.c:4024 sched_balance_update_blocked_averages+0x706/0x760
[144115809.998853] Modules linked in: vhost_net vhost vhost_iotlb xt_addrtype bridge stp llc overlay rpcsec_gss_krb5 auth_rpcgss xt_MASQUERADE xt_mark tun nf_tables iptable_nat 9p vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd configfs ext4 mbcache jbd2 usbhid usb_storage kvm_amd ccp uhci_hcd sha1_generic ehci_pci kvm 9pnet_virtio ehci_hcd nvme usbcore nvme_core crc32c_intel 9pnet usb_common virtio_console virtio_balloon virtio_net nvme_auth virtio_scsi sch_fq_codel nfsv4 nfs lockd grace sunrpc netfs configs fuse virtio_pci virtio_pci_modern_dev virtio_pci_legacy_dev
[144115809.998853] CPU: 23 UID: 0 PID: 0 Comm: swapper/23 Not tainted 6.13.0 #13
[144115809.998853] Hardware name: Martins3 Inc Hacking Alpine, BIOS 12 2012-3-4
[144115809.998853] RIP: 0010:sched_balance_update_blocked_averages+0x706/0x760
[144115809.998853] Code: 48 8b 0c 24 e9 12 ff ff ff 48 c7 c2 ff ff ff ff e9 40 fe ff ff c6 05 b3 66 17 02 01 90 48 c7 c7 48 8c 76 82 e8 7b 60 fa ff 90 <0f> 0b 90 90 e9 23 fb ff ff c6 05 9a 66 17 02 01 90 48 c7 c7 f8 89
[144115809.998853] RSP: 0018:ffffc900037d8ee8 EFLAGS: 00010082
[144115809.998853] RAX: 0000000000000000 RBX: ffff898070867800 RCX: ffff89836f7dca88
[144115809.998853] RDX: 0000000000000027 RSI: 0000000000000027 RDI: 0000000000000001
[144115809.998853] RBP: 0000000000000001 R08: 00000000ffffbfff R09: 0000000000000001
[144115809.998853] R10: 00000000ffffbfff R11: ffff89876f8a0000 R12: ffff89801fc77000
[144115809.998853] R13: ffff89836f7f15c8 R14: ffff898070867948 R15: 00000000000000b8
[144115809.998853] FS: 0000000000000000(0000) GS:ffff89836f7c0000(0000) knlGS:0000000000000000
[144115809.998853] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[144115809.998853] CR2: 00007fa6f9c14000 CR3: 000001000458c000 CR4: 0000000000750ef0
[144115809.998853] PKRU: 55555554
[144115809.998853] Call Trace:
[144115809.998853] <IRQ>
[144115809.998853] ? __warn+0x89/0x130
[144115809.998853] ? sched_balance_update_blocked_averages+0x706/0x760
[144115809.998853] ? report_bug+0x164/0x190
[144115809.998853] ? handle_bug+0x54/0x90
[144115809.998853] ? exc_invalid_op+0x17/0x70
[144115809.998853] ? asm_exc_invalid_op+0x1a/0x20
[144115809.998853] ? sched_balance_update_blocked_averages+0x706/0x760
[144115809.998853] ? sched_balance_update_blocked_averages+0x705/0x760
[144115809.998853] ? timerqueue_add+0x98/0xc0
[144115809.998853] ? enqueue_hrtimer+0x35/0x90
[144115809.998853] sched_balance_softirq+0x43/0x60
[144115809.998853] handle_softirqs+0x10b/0x3b0
[144115809.998853] __irq_exit_rcu+0xd9/0x100
[144115809.998853] irq_exit_rcu+0xe/0x20
[144115809.998853] sysvec_apic_timer_interrupt+0x73/0x80
[144115809.998853] </IRQ>
[144115809.998853] <TASK>
[144115809.998853] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[144115809.998853] RIP: 0010:default_idle+0xf/0x20
[144115809.998853] Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 23 4f 2c 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
[144115809.998853] RSP: 0018:ffffc9000228fee0 EFLAGS: 00000202
[144115809.998853] RAX: 0000000000000017 RBX: ffff8980023bb080 RCX: ffff89800b965308
[144115809.998853] RDX: 0000000000000017 RSI: ffffffff8288b04d RDI: 00000000005c8d44
[144115809.998853] RBP: 0000000000000017 R08: 00000000005c8d44 R09: 0000000000000002
[144115809.998853] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[144115809.998853] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[144115809.998853] default_idle_call+0x3f/0x110
[144115809.998853] do_idle+0x1cf/0x210
[144115809.998853] cpu_startup_entry+0x29/0x30
[144115809.998853] start_secondary+0x11e/0x140
[144115809.998853] common_startup_64+0x13e/0x148
[144115809.998853] </TASK>
[144115809.998853] ---[ end trace 0000000000000000 ]---
[144115809.998853] BUG: kernel NULL pointer dereference, address: 0000000000000051
[144115809.998853] #PF: supervisor read access in kernel mode
[144115809.998853] #PF: error_code(0x0000) - not-present page
[144115809.998853] PGD 1002a989067 P4D 1002a989067 PUD 10070b91067 PMD 0
[144115809.998853] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[144115809.998853] CPU: 25 UID: 0 PID: 345 Comm: kworker/25:1 Tainted: G W 6.13.0 #13
[144115809.998853] Tainted: [W]=WARN
[144115809.998853] Hardware name: Martins3 Inc Hacking Alpine, BIOS 12 2012-3-4
[144115809.998853] Workqueue: 0x0 (events)
[144115809.998853] RIP: 0010:pick_task_fair+0x3a/0x140
[144115809.998853] Code: 00 00 41 54 49 89 fc 55 53 41 8b 8c 24 10 01 00 00 85 c9 0f 84 e5 00 00 00 4c 89 ed eb 2e 66 90 66 90 48 89 ef e8 06 78 ff ff <80> 78 51 00 48 89 c3 0f 85 a7 00 00 00 48 85 db 74 cd 48 8b ab a8
[144115809.998853] RSP: 0018:ffffc90004a73d68 EFLAGS: 00010086
[144115809.998853] RAX: 0000000000000000 RBX: ffffffff8289a710 RCX: 000000000000002a
[144115809.998853] RDX: ffd1ae2783c29000 RSI: 000000000000042a RDI: 0000000000000400
[144115809.998853] RBP: ffff89836f870bc0 R08: 0000000000000400 R09: 0000000000000002
[144115809.998853] R10: ffff89836f8612e0 R11: 0000000000000000 R12: ffff89836f870ac0
[144115809.998853] R13: ffff89836f870bc0 R14: ffff8980160c8000 R15: ffff89836f870ac0
[144115809.998853] FS: 0000000000000000(0000) GS:ffff89836f840000(0000) knlGS:0000000000000000
[144115809.998853] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[144115809.998853] CR2: 0000000000000051 CR3: 00000100162dc000 CR4: 0000000000750ef0
[144115809.998853] PKRU: 55555554
[144115809.998853] Call Trace:
[144115809.998853] <TASK>
[144115809.998853] ? __die+0x23/0x70
[144115809.998853] ? page_fault_oops+0x17d/0x550
[144115809.998853] ? exc_page_fault+0x79/0x180
[144115809.998853] ? asm_exc_page_fault+0x26/0x30
[144115809.998853] ? pick_task_fair+0x3a/0x140
[144115809.998853] ? pick_task_fair+0x3a/0x140
[144115809.998853] pick_next_task_fair+0x21/0x3c0
[144115809.998853] __pick_next_task+0x3e/0x1a0
[144115809.998853] __schedule+0x166/0x1530
[144115809.998853] ? queue_delayed_work_on+0x74/0x90
[144115809.998853] ? worker_thread+0x1b1/0x3b0
[144115809.998853] schedule+0x41/0x1c0
[144115809.998853] worker_thread+0x1b1/0x3b0
[144115809.998853] ? __pfx_worker_thread+0x10/0x10
[144115809.998853] kthread+0xdc/0x110
[144115809.998853] ? __pfx_kthread+0x10/0x10
[144115809.998853] ret_from_fork+0x31/0x50
[144115809.998853] ? __pfx_kthread+0x10/0x10
[144115809.998853] ret_from_fork_asm+0x1a/0x30
[144115809.998853] </TASK>
[144115809.998853] Modules linked in: vhost_net vhost vhost_iotlb xt_addrtype bridge stp llc overlay rpcsec_gss_krb5 auth_rpcgss xt_MASQUERADE xt_mark tun nf_tables iptable_nat 9p vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd configfs ext4 mbcache jbd2 usbhid usb_storage kvm_amd ccp uhci_hcd sha1_generic ehci_pci kvm 9pnet_virtio ehci_hcd nvme usbcore nvme_core crc32c_intel 9pnet usb_common virtio_console virtio_balloon virtio_net nvme_auth virtio_scsi sch_fq_codel nfsv4 nfs lockd grace sunrpc netfs configs fuse virtio_pci virtio_pci_modern_dev virtio_pci_legacy_dev
[144115809.998853] CR2: 0000000000000051
[144115809.998853] ---[ end trace 0000000000000000 ]---
[144115809.998853] RIP: 0010:pick_task_fair+0x3a/0x140
[144115809.998853] Code: 00 00 41 54 49 89 fc 55 53 41 8b 8c 24 10 01 00 00 85 c9 0f 84 e5 00 00 00 4c 89 ed eb 2e 66 90 66 90 48 89 ef e8 06 78 ff ff <80> 78 51 00 48 89 c3 0f 85 a7 00 00 00 48 85 db 74 cd 48 8b ab a8
[144115809.998853] RSP: 0018:ffffc90004a73d68 EFLAGS: 00010086
[144115809.998853] RAX: 0000000000000000 RBX: ffffffff8289a710 RCX: 000000000000002a
[144115809.998853] RDX: ffd1ae2783c29000 RSI: 000000000000042a RDI: 0000000000000400
[144115809.998853] RBP: ffff89836f870bc0 R08: 0000000000000400 R09: 0000000000000002
[144115809.998853] R10: ffff89836f8612e0 R11: 0000000000000000 R12: ffff89836f870ac0
[144115809.998853] R13: ffff89836f870bc0 R14: ffff8980160c8000 R15: ffff89836f870ac0
[144115809.998853] FS: 0000000000000000(0000) GS:ffff89836f840000(0000) knlGS:0000000000000000
[144115809.998853] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[144115809.998853] CR2: 0000000000000051 CR3: 00000100162dc000 CR4: 0000000000750ef0
[144115809.998853] PKRU: 55555554
[144115809.998853] note: kworker/25:1[345] exited with irqs disabled
4
After switching to the 6.6 kernel, the problem is still there:
2月 02 16:57:28 nixos kernel: general protection fault, probably for non-canonical address 0x1650e1d88812c: 0000 [#1] PREEMPT SMP NOPTI
2月 02 16:57:28 nixos kernel: CPU: 21 PID: 2197 Comm: .gnome-shell-wr Tainted: P W O 6.6.72 #1-NixOS
2月 02 16:57:28 nixos kernel: Hardware name: LENOVO 82WM/INVALID, BIOS LPCN39WW 04/28/2023
2月 02 16:57:28 nixos kernel: RIP: 0010:kmem_cache_alloc+0xe0/0x3b0
2月 02 16:57:28 nixos kernel: Code: 84 11 02 00 00 66 90 8b 05 05 8f 99 01 85 c0 0f 84 86 02 00 00 48 c7 44 24 10 00 00 00 00 49 8b 04 24 65 48 03 05 58 69 83 71 <48> 8b 50 08 48 83 78 10 00 4c 8b 30 0f 84 0a 02 00 00 4d 85 f6 0f
2月 02 16:57:28 nixos kernel: RSP: 0018:ffffc90002197e50 EFLAGS: 00010207
2月 02 16:57:28 nixos kernel: RAX: 0001650e1d888124 RBX: 0000000000000cc0 RCX: 0000000000000000
2月 02 16:57:28 nixos kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
2月 02 16:57:28 nixos kernel: RBP: ffffc90002197ea0 R08: 0000000000000000 R09: 0000000000000000
2月 02 16:57:28 nixos kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810cbde400
2月 02 16:57:28 nixos kernel: R13: ffff8881010eb1c0 R14: 00007fff85a26d50 R15: 0000000000000cc0
2月 02 16:57:28 nixos kernel: FS: 00007f5fc8cd3e80(0000) GS:ffff88881d880000(0000) knlGS:0000000000000000
2月 02 16:57:28 nixos kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2月 02 16:57:28 nixos kernel: CR2: 00007f0b10e76010 CR3: 000000014a5a6000 CR4: 0000000000f50ee0
2月 02 16:57:28 nixos kernel: PKRU: 55555554
2月 02 16:57:28 nixos kernel: Call Trace:
2月 02 16:57:28 nixos kernel: <TASK>
2月 02 16:57:28 nixos kernel: ? die_addr+0x36/0x90
2月 02 16:57:28 nixos kernel: ? exc_general_protection+0x143/0x3c0
2月 02 16:57:28 nixos kernel: ? asm_exc_general_protection+0x26/0x30
2月 02 16:57:28 nixos kernel: ? kmem_cache_alloc+0xe0/0x3b0
2月 02 16:57:28 nixos kernel: ? nvidia_unlocked_ioctl+0x326/0x930 [nvidia]
2月 02 16:57:28 nixos kernel: nvidia_unlocked_ioctl+0x326/0x930 [nvidia]
2月 02 16:57:28 nixos kernel: __x64_sys_ioctl+0x9c/0xe0
2月 02 16:57:28 nixos kernel: do_syscall_64+0x39/0x90
2月 02 16:57:28 nixos kernel: entry_SYSCALL_64_after_hwframe+0x78/0xe2
2月 02 16:57:28 nixos kernel: RIP: 0033:0x7f5fd9d12aef
2月 02 16:57:28 nixos kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 28 48 8b 44 24 18 64 48 2b 04 25 28 00 00
2月 02 16:57:28 nixos kernel: RSP: 002b:00007fff85a26c40 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
2月 02 16:57:28 nixos kernel: RAX: ffffffffffffffda RBX: 00007fff85a26d50 RCX: 00007f5fd9d12aef
2月 02 16:57:28 nixos kernel: RDX: 00007fff85a26d50 RSI: 00000000c020462a RDI: 000000000000000d
2月 02 16:57:28 nixos kernel: RBP: 00000000c020462a R08: 00007fff85a26d50 R09: 00007fff85a26d6c
2月 02 16:57:28 nixos kernel: R10: 00000000c1d00019 R11: 0000000000000246 R12: 000000000000000d
2月 02 16:57:28 nixos kernel: R13: 00007fff85a26d6c R14: 00000000679f3378 R15: 00007fff85a26ca0
2月 02 16:57:28 nixos kernel: </TASK>
2月 02 16:57:28 nixos kernel: ---[ end trace 0000000000000000 ]---
2月 02 16:57:28 nixos kernel: pstore: backend (efi_pstore) writing error (-5)
2月 02 16:57:28 nixos kernel: RIP: 0010:kmem_cache_alloc+0xe0/0x3b0
2月 02 16:57:28 nixos kernel: Code: 84 11 02 00 00 66 90 8b 05 05 8f 99 01 85 c0 0f 84 86 02 00 00 48 c7 44 24 10 00 00 00 00 49 8b 04 24 65 48 03 05 58 69 83 71 <48> 8b 50 08 48 83 78 10 00 4c 8b 30 0f 84 0a 02 00 00 4d 85 f6 0f
2月 02 16:57:28 nixos kernel: RSP: 0018:ffffc90002197e50 EFLAGS: 00010207
2月 02 16:57:28 nixos kernel: RAX: 0001650e1d888124 RBX: 0000000000000cc0 RCX: 0000000000000000
2月 02 16:57:28 nixos kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
2月 02 16:57:28 nixos kernel: RBP: ffffc90002197ea0 R08: 0000000000000000 R09: 0000000000000000
2月 02 16:57:28 nixos kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810cbde400
2月 02 16:57:28 nixos kernel: R13: ffff8881010eb1c0 R14: 00007fff85a26d50 R15: 0000000000000cc0
2月 02 16:57:28 nixos kernel: FS: 00007f5fc8cd3e80(0000) GS:ffff88881d880000(0000) knlGS:0000000000000000
2月 02 16:57:28 nixos kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2月 02 16:57:28 nixos kernel: CR2: 00007f0b10e76010 CR3: 000000014a5a6000 CR4: 0000000000f50ee0
2月 02 16:57:28 nixos kernel: PKRU: 55555554
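The `Tainted: P W O` field in these logs matters when reading the traces, because upstream developers usually ask for a reproduction without proprietary or out-of-tree modules. As a rough aid, the sketch below decodes a taint bitmask such as the value in `/proc/sys/kernel/tainted`. It covers only a handful of flags (bit numbers as I understand them from the kernel's tainted-kernels documentation), and the names `taint_flags` and `decode_taint` are made up for illustration.

```c
#include <stddef.h>
#include <stdio.h>

struct taint_flag {
    unsigned bit;        /* bit position in the taint mask */
    char letter;         /* letter printed in the "Tainted:" field */
    const char *meaning;
};

/* Partial table, in bit order; the kernel defines more flags than these. */
static const struct taint_flag taint_flags[] = {
    { 0,  'P', "proprietary module was loaded" },
    { 7,  'D', "kernel died recently (OOPS or BUG)" },
    { 9,  'W', "kernel issued a warning" },
    { 12, 'O', "out-of-tree module was loaded" },
    { 14, 'L', "soft lockup occurred" },
};

/* Write the letters of all known flags set in mask into out (NUL-terminated)
 * and return how many letters were written. */
static size_t decode_taint(unsigned long mask, char *out, size_t len)
{
    size_t n = 0;

    for (size_t i = 0; i < sizeof(taint_flags) / sizeof(taint_flags[0]); i++) {
        if ((mask & (1UL << taint_flags[i].bit)) && n + 1 < len)
            out[n++] = taint_flags[i].letter;
    }
    if (len > 0)
        out[n] = '\0';
    return n;
}

/* Print one line per known flag that is set in mask. */
static void print_taint(unsigned long mask)
{
    for (size_t i = 0; i < sizeof(taint_flags) / sizeof(taint_flags[0]); i++) {
        if (mask & (1UL << taint_flags[i].bit))
            printf("[%c] %s\n", taint_flags[i].letter, taint_flags[i].meaning);
    }
}
```

A mask with bits 0, 9 and 12 set decodes to the letters `P`, `W` and `O`, matching the 6.6.72 log above, where the nvidia module accounts for `P` and `O`.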
5
2月 02 19:29:21 nixos kernel: list_del corruption. next->prev should be ffff888168554a38, but was 36b399449cc4488c. (next=ffff88819b9de2b8)
2月 02 19:29:21 nixos kernel: ------------[ cut here ]------------
2月 02 19:29:21 nixos kernel: kernel BUG at lib/list_debug.c:65!
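The `list_del corruption` message comes from the kernel's list debugging checks: before unlinking an entry, the neighbours must still point back at it. The following user-space sketch mimics only that invariant; it is a simplified illustration, not the code in `lib/list_debug.c` (which also checks for NULL and poison values), and the function names are my own.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Return true when both neighbours still point back at entry. */
static bool list_del_entry_valid(const struct list_head *entry)
{
    const struct list_head *prev = entry->prev;
    const struct list_head *next = entry->next;

    if (prev->next != entry) {
        fprintf(stderr, "list_del corruption. prev->next should be %p, but was %p\n",
                (void *)entry, (void *)prev->next);
        return false;
    }
    if (next->prev != entry) {
        fprintf(stderr, "list_del corruption. next->prev should be %p, but was %p\n",
                (void *)entry, (void *)next->prev);
        return false;
    }
    return true;
}

/* Unlink entry only if the list around it is consistent. */
static bool list_del_checked(struct list_head *entry)
{
    if (!list_del_entry_valid(entry))
        return false;
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry->prev = NULL;
    return true;
}
```

In the log above, `next->prev` held `36b399449cc4488c`, which does not look like a kernel pointer at all, so the neighbouring node had been overwritten with unrelated data before the deletion was attempted.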
Reposting any article from this site to CSDN will be treated as infringement and pursued legally; anything else is fine.