mce 的工作原理
- https://unix.stackexchange.com/questions/451655/running-mcelog-on-an-amd-processor
-
https://www.cnblogs.com/dataart/p/10374028.html
-
f9781bb18ed828e7b83b7bac4a4ad7cd497ee7d7
- 在老内核上可以正确运行吗 ?
文档
- https://docs.kernel.org/driver-api/edac.html : edac 的文档
- https://lwn.net/Articles/480575/ : edac 的更新 patch
rasdaemon 的一段 log
- aer_event
systemd[1]: Starting RAS daemon to log the RAS events... rasdaemon[644709]: rasdaemon: ras:mc_event event enabled rasdaemon[644709]: rasdaemon: Enabled event ras:mc_event rasdaemon[644709]: rasdaemon: ras:aer_event event enabled rasdaemon[644709]: rasdaemon: Enabled event ras:aer_event rasdaemon[644709]: rasdaemon: mce:mce_record event enabled rasdaemon[644709]: rasdaemon: Enabled event mce:mce_record rasdaemon[644709]: rasdaemon: ras:extlog_mem_event event enabled rasdaemon[644709]: rasdaemon: Enabled event ras:extlog_mem_event rasdaemon[644709]: ras:mc_event event enabled rasdaemon[644709]: Enabled event ras:mc_event rasdaemon[644709]: ras:aer_event event enabled rasdaemon[644709]: Enabled event ras:aer_event rasdaemon[644709]: mce:mce_record event enabled rasdaemon[644709]: Enabled event mce:mce_record rasdaemon[644709]: ras:extlog_mem_event event enabled rasdaemon[644709]: Enabled event ras:extlog_mem_event systemd[1]: Started RAS daemon to log the RAS events. rasdaemon[644709]: wait_access() failed, /sys/kernel/debug/tracing/instances/rasdaemon/events> rasdaemon[644709]: rasdaemon: wait_access() failed, /sys/kernel/debug/tracing/instances/rasda> rasdaemon[644709]: rasdaemon: Can't get net:net_dev_xmit_timeout traces. Perhaps this feature> rasdaemon[644709]: wait_access() failed, /sys/kernel/debug/tracing/instances/rasdaemon/events> rasdaemon[644709]: rasdaemon: wait_access() failed, /sys/kernel/debug/tracing/instances/rasda> rasdaemon[644709]: rasdaemon: Can't get devlink:devlink_health_report traces. Perhaps this fe> rasdaemon[644709]: rasdaemon: Can't get traces from devlink:devlink_health_report rasdaemon[644709]: rasdaemon: block:block_rq_complete event enabled rasdaemon[644709]: rasdaemon: Enabled event block:block_rq_complete rasdaemon[644709]: rasdaemon: Listening to events for cpus 0 to 0 rasdaemon[644709]: overriding event (1153) ras:mc_event with new print handler rasdaemon[644709]: overriding event (1150) ras:aer_event with new print handler rasdaemon[644709]: overriding event (113) mce:mce_record with new print handler rasdaemon[644709]: overriding event (1154) ras:extlog_mem_event with new print handler rasdaemon[644709]: overriding event (978) block:block_rq_c
manual
Intel 64 and IA32 Architectures Software Developer’s manual, Volume 3, System programming guide Parts 1 and 2. Machine checks are described in Chapter 14 in Part1 and in Append ix E in Part2.
mce
-
发现只要是替换内核,那么 /dev/ 下没有 mcelog 的
-
mcelog 操作需要/dev/mcelog 设备,这个设备通常自动由 udev 创建,也可以通过手工命令创建 mknod /dev/mcelog c 10 227。设备创建后剋通过 ls -lh /dev/mcelog 检查:
- 似乎 centos 8 没有办法自动创建
默认没有配置 /sys/devices/system/machinecheck/machinecheck0/trigger,这时这个内容是空的。当将/usr/sbin/mcelog 添加到这个 proc 文件中 ,就会在内核错误发生时触发运行/usr/sbin/mcelog 来处理解码错误日志,方便排查故障。
/etc/mcelog/mcelog.conf 是 mcelog 配置文件
这一步似乎是必须的:
- modprobe mce-inject
- cd /sys/devices/system/machinecheck/machinecheck0 && echo 3 > tolerant # 为了防止出现 hardware 错误的时候,不要将机器 panic
注入的方法: https://www.cnblogs.com/augusite/p/15561662.html
参考资料
- https://huataihuang.gitbooks.io/cloud-atlas/content/os/linux/log/mcelog.html
- https://www.cnblogs.com/muahao/p/6003910.html
- https://stackoverflow.com/questions/38496643/how-can-we-generate-mcemachine-check-errors : 如何使用 memory inject
- https://mcelog.org/ : 官方文档
mce 和 edac 的关系
-
https://cloud-atlas.readthedocs.io/zh_CN/latest/linux/server/hardware/edac.html
-
https://mcelog.org/faq.html#13 : mcelog 认为 edac 可以在 AMD 上工作
-
为什么 AMD 无法正确的支持 mcelog 的哇 ?
-
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=890301
- 从 debian 中也是将这个关闭了
[ ] mcedev 已经被内核抛弃了
config X86_MCELOG_LEGACY
bool "Support for deprecated /dev/mcelog character device"
depends on X86_MCE
help
Enable support for /dev/mcelog which is needed by the old mcelog
userspace logging daemon. Consider switching to the new generation
rasdaemon solution.
内核中在 2017 的时候将这个移除了:
History: #0
Commit: 5de97c9f6d85fd83af76e09e338b18e7adb1ae60
Author: Tony Luck <tony.luck@intel.com>
Committer: Ingo Molnar <mingo@kernel.org>
Author Date: Mon 27 Mar 2017 05:33:03 PM CST
Committer Date: Tue 28 Mar 2017 02:55:01 PM CST
x86/mce: Factor out and deprecate the /dev/mcelog driver
Move all code relating to /dev/mcelog to a separate source file.
/dev/mcelog driver can now operate from the machine check notifier with
lowest prio.
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Move the mce_helper and trigger functionality behind CONFIG_X86_MCELOG_LEGACY. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-6-bp@alien8.de
[ Renamed CONFIG_X86_MCELOG to CONFIG_X86_MCELOG_LEGACY. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- 相关的讨论:
- https://lore.kernel.org/all/20170327093304.10683-6-bp@alien8.de/T/#u
[ ] 确认一下,在 ARM 中也是使用 rasdaemon 的
- 是不是 ARM 上没有 mce 而已?
x86 mce 处理流程
/*
* The default IDT entries which are set up in trap_init() before
* cpu_init() is invoked. Interrupt stacks cannot be used at that point and
* the traps which use them are reinitialized with IST after cpu_init() has
* set up TSS.
*/
static const __initconst struct idt_data def_idts[] = {
INTG(X86_TRAP_DE, asm_exc_divide_error),
INTG(X86_TRAP_NMI, asm_exc_nmi),
INTG(X86_TRAP_BR, asm_exc_bounds),
INTG(X86_TRAP_UD, asm_exc_invalid_op),
INTG(X86_TRAP_NM, asm_exc_device_not_available),
INTG(X86_TRAP_OLD_MF, asm_exc_coproc_segment_overrun),
INTG(X86_TRAP_TS, asm_exc_invalid_tss),
INTG(X86_TRAP_NP, asm_exc_segment_not_present),
INTG(X86_TRAP_SS, asm_exc_stack_segment),
INTG(X86_TRAP_GP, asm_exc_general_protection),
INTG(X86_TRAP_SPURIOUS, asm_exc_spurious_interrupt_bug),
INTG(X86_TRAP_MF, asm_exc_coprocessor_error),
INTG(X86_TRAP_AC, asm_exc_alignment_check),
INTG(X86_TRAP_XF, asm_exc_simd_coprocessor_error),
#ifdef CONFIG_X86_32
TSKG(X86_TRAP_DF, GDT_ENTRY_DOUBLEFAULT_TSS),
#else
INTG(X86_TRAP_DF, asm_exc_double_fault),
#endif
INTG(X86_TRAP_DB, asm_exc_debug),
#ifdef CONFIG_X86_MCE
INTG(X86_TRAP_MC, asm_exc_machine_check),
#endif
SYSG(X86_TRAP_OF, asm_exc_overflow),
#if defined(CONFIG_IA32_EMULATION)
SYSG(IA32_SYSCALL_VECTOR, entry_INT80_compat),
#elif defined(CONFIG_X86_32)
SYSG(IA32_SYSCALL_VECTOR, entry_INT80_32),
#endif
};
/*
* Fields are zero when not available. Also, this struct is shared with
* userspace mcelog and thus must keep existing fields at current offsets.
* Only add new fields to the end of the structure
*/
struct mce {
__u64 status; /* Bank's MCi_STATUS MSR */
__u64 misc; /* Bank's MCi_MISC MSR */
__u64 addr; /* Bank's MCi_ADDR MSR */
__u64 mcgstatus; /* Machine Check Global Status MSR */
__u64 ip; /* Instruction Pointer when the error happened */
__u64 tsc; /* CPU time stamp counter */
__u64 time; /* Wall time_t when error was detected */
__u8 cpuvendor; /* Kernel's X86_VENDOR enum */
__u8 inject_flags; /* Software inject flags */
__u8 severity; /* Error severity */
__u8 pad;
__u32 cpuid; /* CPUID 1 EAX */
__u8 cs; /* Code segment */
__u8 bank; /* Machine check bank reporting the error */
__u8 cpu; /* CPU number; obsoleted by extcpu */
__u8 finished; /* Entry is valid */
__u32 extcpu; /* Linux CPU number that detected the error */
__u32 socketid; /* CPU socket ID */
__u32 apicid; /* CPU initial APIC ID */
__u64 mcgcap; /* MCGCAP MSR: machine check capabilities of CPU */
__u64 synd; /* MCA_SYND MSR: only valid on SMCA systems */
__u64 ipid; /* MCA_IPID MSR: only valid on SMCA systems */
__u64 ppin; /* Protected Processor Inventory Number */
__u32 microcode; /* Microcode revision */
__u64 kflags; /* Internal kernel use */
};
- do_machine_check
__mc_scan_banks- mce_read_aux :初始化 struct mce ,通过 mce_rdmsrl
- mce_log
错误注入的方法
MCE error handling can use the MCE inject: https://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git For it to work, Kernel mce-inject module should be compiled and loaded.
mce-inject 似乎很老了,似乎是依赖 /dev/mcelog 的 mce 只是 x86 专用的
APEI error injection can use this tool: https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/mce-test.git/
AER error injection can use this tool: https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git/
从现在的内核上来说,mce-inject 是通过 debugfs 来进行的
- https://linux.die.net/man/8/mce-inject
- https://stackoverflow.com/questions/38496643/how-can-we-generate-mcemachine-check-errors :
注入错误之后,如何检查:
- https://unix.stackexchange.com/questions/533196/where-does-rasdaemon-record-its-logs
APEI 错误
似乎是 ACPI 有关的:
- https://blog.csdn.net/qq_21186033/article/details/116977474
[ ] AER 是在这个体系下的吗
edac 和 mce 是什么关系,可以从内核中找到吗
虽然 /dev/mcelog 被移除了,但是 mce 文件夹还是存在很多内容哇
这三个错误的注入都是依靠什么的
简单的检测一下,发现其中的 : arch/x86/kernel/cpu/mce/inject.c
mcelog 的基本工作原理
很多错误并不是致命的,mcelog 可以将周期性的错误汇总一下:
基本的传输过程是: mce_log 将 mce queue 到 mcedev 上,最后通过 /dev/mcedev 提供取出。
[ ] 了解 rasdaemon 的工作原理
https://github.com/mchehab/rasdaemon
- 为什么感觉似乎是只能收集内存错误
- 内存错误不是 mce 中的一种
Its long term goal is to be the userspace tool that will collect all hardware error events reported by the Linux Kernel from several sources (EDAC, MCE, PCI, …) into one common framework.
难道 EDAC 和 MCE 不是一个东西 ?
–enable-aer enable PCIe AER events (currently experimental) –enable-mce enable MCE events (currently experimental)
- 为什么 mce 只是实验的功能 ?
- 是如何支持 mce 功能的 ?
分析函数 read_ras_event_all_cpus,应该是通过 ftrace 导出的。
edac 中,intel 也是支持的
https://lkml.iu.edu/hypermail/linux/kernel/1903.0/02742.html
[ ] 和虚拟化有关系吗
同时,在 QEMU 中有 mce_init 来初始化的
- mce_init : machine check exception, 初始化之后,那些 helper 就可以正确工作了
如何通过 kvm 中向 Guest 注入 mce 错误
- kvm_vcpu_ioctl_x86_setup_mce
- kvm_vcpu_ioctl_x86_set_mce 向 guest 注入 mce
- 初始化 vcpu->arch.mce_banks; @todo 暂时不知道如何这个 bank 是如何传递给 Guest 的
- kvm_queue_exception(vcpu, MC_VECTOR);
应该是设置一下 msr 寄存器之类的操作吧
4.105 KVM_X86_SETUP_MCE
Capability: KVM_CAP_MCE
Architectures: x86
Type: vcpu ioctl
Parameters: u64 mcg_cap (in)
Returns: 0 on success,
-EFAULT if u64 mcg_cap cannot be read,
-EINVAL if the requested number of banks is invalid,
-EINVAL if requested MCE capability is not supported.
Initializes MCE support for use. The u64 mcg_cap parameter
has the same format as the MSR_IA32_MCG_CAP register and
specifies which capabilities should be enabled. The maximum
supported number of error-reporting banks can be retrieved when
checking for KVM_CAP_MCE. The supported capabilities can be
retrieved with KVM_X86_GET_MCE_CAP_SUPPORTED.
4.106 KVM_X86_SET_MCE
Capability: KVM_CAP_MCE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_x86_mce (in)
Returns: 0 on success,
-EFAULT if struct kvm_x86_mce cannot be read,
-EINVAL if the bank number is invalid,
-EINVAL if VAL bit is not set in status field.
Inject a machine check error (MCE) into the guest. The input
parameter is:
struct kvm_x86_mce {
__u64 status;
__u64 addr;
__u64 misc;
__u64 mcg_status;
__u8 bank;
__u8 pad1[7];
__u64 pad2[3];
};
If the MCE being reported is an uncorrected error, KVM will
inject it as an MCE exception into the guest. If the guest
MCG_STATUS register reports that an MCE is in progress, KVM
causes an KVM_EXIT_SHUTDOWN vmexit.
Otherwise, if the MCE is a corrected error, KVM will just
store it in the corresponding bank (provided this bank is
not holding a previously reported uncorrected error).
- https://stackoverflow.com/questions/43158596/mce-injection-on-qemu : 似乎尝试过,但是失败了
- 这个 patch 合并了一个
- https://patchwork.kernel.org/project/kvm/patch/1245722714.22246.424.camel@yhuang-dev.sh.intel.com/
- 但是仅仅限制为 tcg 模式下的:
(qemu) help mce mce [-b] cpu bank status mcgstatus addr misc -- inject a MCE on the given CPU [and broadcast to other CPUs with -b option]可以使用
mce 0 0 0x8000000000000000 1 1 1
但是在 kvm 模式下,没有任何的反应。
以上命令在 tcg 中也是没有任何反应的。
[ ] mce-test/cases/function/kvm/README
- mce 的测试: https://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git
测试一下内存错误注入,分析内核中如何处理 uncorrectable error
ecc 是 mce 的一种才对
检查 dimm 是否有 bug
ue_count=`cat /sys/devices/system/edac/mc/mc$mc/dimm$dimm/dimm_ue_count`
if [ $ue_count -gt 0 ]; then
echo "mc $mc dimm $dimm corrupted"
fi
done
CMCI
https://access.redhat.com/solutions/2710451
本站所有文章转发 CSDN 将按侵权追究法律责任,其它情况随意。