Sign Up
Log In
Log In
or
Sign Up
Places
All Projects
Status Monitor
Collapse sidebar
openSUSE:Leap:15.0:Staging:B
xen
5a8be788-x86-nmi-start-NMI-watchdog-on-CPU0-aft...
Overview
Repositories
Revisions
Requests
Users
Attributes
Meta
File 5a8be788-x86-nmi-start-NMI-watchdog-on-CPU0-after-SMP.patch of Package xen
# Commit a44f1697968e04fcc6145e3bd51c748b57047240 # Date 2018-02-20 10:16:56 +0100 # Author Igor Druzhinin <igor.druzhinin@citrix.com> # Committer Jan Beulich <jbeulich@suse.com> x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap We're noticing a reproducible system boot hang on certain Skylake platforms where the BIOS is configured in legacy boot mode with x2APIC disabled. The system stalls immediately after writing the first SMP initialization sequence into APIC ICR. The cause of the problem is watchdog NMI handler execution - somewhere near the end of NMI handling (after it's already rescheduled the next NMI) it tries to access IO port 0x61 to get the actual NMI reason on CPU0. Unfortunately, this port is emulated by BIOS using SMIs and this emulation for some reason takes more time than we expect during INIT-SIPI-SIPI sequence. As the result, the system is constantly moving between NMI and SMI handler and not making any progress. To avoid this, initialize the watchdog after SMP bootstrap on CPU0 and, additionally, protect the NMI handler by moving IO port access before NMI re-scheduling. The latter should also help in case of post boot CPU onlining. Although we're running watchdog at much lower frequency at this point, it's neveretheless possible we may trigger the issue anyway. Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> --- a/xen/arch/x86/apic.c +++ b/xen/arch/x86/apic.c @@ -682,7 +682,7 @@ void setup_local_APIC(void) printk("Leaving ESR disabled.\n"); } - if (nmi_watchdog == NMI_LOCAL_APIC) + if (nmi_watchdog == NMI_LOCAL_APIC && smp_processor_id()) setup_apic_nmi_watchdog(); apic_pm_activate(); } --- a/xen/arch/x86/smpboot.c +++ b/xen/arch/x86/smpboot.c @@ -1284,7 +1284,10 @@ int __cpu_up(unsigned int cpu) void __init smp_cpus_done(void) { if ( nmi_watchdog == NMI_LOCAL_APIC ) + { + setup_apic_nmi_watchdog(); check_nmi_watchdog(); + } setup_ioapic_dest(); --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -1670,7 +1670,7 @@ static nmi_callback_t *nmi_callback = du void do_nmi(const struct cpu_user_regs *regs) { unsigned int cpu = smp_processor_id(); - unsigned char reason; + unsigned char reason = 0; bool handle_unknown = false; ++nmi_count(cpu); @@ -1678,6 +1678,16 @@ void do_nmi(const struct cpu_user_regs * if ( nmi_callback(regs, cpu) ) return; + /* + * Accessing port 0x61 may trap to SMM which has been actually + * observed on some production SKX servers. This SMI sometimes + * takes enough time for the next NMI tick to happen. By reading + * this port before we re-arm the NMI watchdog, we reduce the chance + * of having an NMI watchdog expire while in the SMI handler. + */ + if ( cpu == 0 ) + reason = inb(0x61); + if ( (nmi_watchdog == NMI_NONE) || (!nmi_watchdog_tick(regs) && watchdog_force) ) handle_unknown = true; @@ -1685,7 +1695,6 @@ void do_nmi(const struct cpu_user_regs * /* Only the BSP gets external NMIs from the system. */ if ( cpu == 0 ) { - reason = inb(0x61); if ( reason & 0x80 ) pci_serr_error(regs); if ( reason & 0x40 )
Locations
Projects
Search
Status Monitor
Help
OpenBuildService.org
Documentation
API Documentation
Code of Conduct
Contact
Support
@OBShq
Terms
openSUSE Build Service is sponsored by
The Open Build Service is an
openSUSE project
.
Sign Up
Log In
Places
Places
All Projects
Status Monitor