group.of.nepali.translators team mailing list archive
Message #22856
[Bug 1758378] Re: [Hyper-V] PCI: hv: Fix 2 hang issues in hv_compose_msi_msg
The patches apply cleanly to the 4.13 linux-azure kernel. I built a test
kernel with them that is available at
http://kernel.ubuntu.com/~mhcerri/azure/linux-azure-4.13.0-1014.17~lp1758378.1/
** No longer affects: linux-azure-edge (Ubuntu Xenial)
** No longer affects: linux-azure (Ubuntu Bionic)
** Also affects: linux-azure (Ubuntu Bionic)
Importance: High
Assignee: Marcelo Cerri (mhcerri)
Status: Fix Committed
** Also affects: linux-azure-edge (Ubuntu Bionic)
Importance: Undecided
Status: New
** Also affects: linux-azure (Ubuntu Xenial)
Importance: Undecided
Status: New
** Also affects: linux-azure-edge (Ubuntu Xenial)
Importance: Undecided
Status: New
** Changed in: linux-azure-edge (Ubuntu Xenial)
Status: New => Fix Released
** Changed in: linux-azure-edge (Ubuntu Xenial)
Importance: Undecided => Critical
** Changed in: linux-azure-edge (Ubuntu Xenial)
Assignee: (unassigned) => Marcelo Cerri (mhcerri)
** Changed in: linux-azure-edge (Ubuntu Bionic)
Status: New => Invalid
** Changed in: linux-azure (Ubuntu Xenial)
Status: New => In Progress
** Changed in: linux-azure (Ubuntu Xenial)
Assignee: (unassigned) => Marcelo Cerri (mhcerri)
--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1758378
Title:
[Hyper-V] PCI: hv: Fix 2 hang issues in hv_compose_msi_msg
Status in linux-azure package in Ubuntu:
Fix Committed
Status in linux-azure-edge package in Ubuntu:
Invalid
Status in linux-azure source package in Xenial:
In Progress
Status in linux-azure-edge source package in Xenial:
Fix Released
Status in linux-azure source package in Bionic:
Fix Committed
Status in linux-azure-edge source package in Bionic:
Invalid
Bug description:
We've identified some issues in recent testing against upstream 4.15
SR-IOV and DPDK. The following commits are in Lorenzo's PCI tree on
their way into 4.16 and stable:
Tree:
https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/log/?h=pci/hv
PCI: hv: Only queue new work items in hv_pci_devices_present() if necessary
If there is pending work in hv_pci_devices_present() we just need to add
the new dr entry into the dr_list. Add a check to detect pending work
items and update the code to skip queuing work if pending work items
are detected.
https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/hv&id=948373b3ed1bcf05a237c24675b84804315aff14
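The "queue only if necessary" idea above can be sketched in plain userspace C. This is a minimal illustration, not the driver code: struct dr_entry, struct bus_state, devices_present() and drain_pending() are hypothetical stand-ins, and the locking the real driver would need around the list is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's structures (assumptions of this sketch). */
struct dr_entry {
    struct dr_entry *next;
    int device_count;
};

struct bus_state {
    struct dr_entry *head;  /* oldest pending entry */
    struct dr_entry *tail;  /* newest pending entry */
    int work_queued;        /* how many work items were queued so far */
};

/*
 * Append dr to the pending list. A new work item is queued only when the
 * list was empty beforehand; otherwise the already-pending work item will
 * pick up the new entry. Returns true if a work item was queued.
 */
bool devices_present(struct bus_state *bus, struct dr_entry *dr)
{
    bool pending_work = (bus->head != NULL);

    dr->next = NULL;
    if (bus->tail)
        bus->tail->next = dr;
    else
        bus->head = dr;
    bus->tail = dr;

    if (pending_work)
        return false;   /* skip queuing: work is already pending */

    bus->work_queued++;
    return true;
}

/* The work item: consume every pending entry and return how many were seen. */
int drain_pending(struct bus_state *bus)
{
    int handled = 0;

    while (bus->head) {
        bus->head = bus->head->next;
        handled++;
    }
    bus->tail = NULL;
    return handled;
}
```

Calling devices_present() twice before the work item runs queues a single work item, and drain_pending() then sees both entries.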
PCI: hv: Remove the bogus test in hv_eject_device_work()
When the kernel is executing hv_eject_device_work(), the hpdev->state value
must be hv_pcichild_ejecting; any other value would be a bug. Therefore,
replace the bogus check with an explicit WARN_ON() that fires if the
condition does not hold.
https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/hv&id=fca288c0153b2b97114b9081bc3c33c3735145b6
PCI: hv: Fix a comment typo in _hv_pcifront_read_config()
The comment in _hv_pcifront_read_config() contains a typo; fix it.
No functional change.
https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/hv&id=df3f2159f4e4146d40b244725ce79ed921530b99
PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()
1. With the patch "x86/vector/msi: Switch to global reservation mode",
v4.15 and newer kernels always hang on a 1-vCPU Hyper-V VM
with SR-IOV. This is because when we reach hv_compose_msi_msg() via
request_irq() -> request_threaded_irq() -> __setup_irq() -> irq_startup()
-> __irq_startup() -> irq_domain_activate_irq() -> ... ->
msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is
disabled in __setup_irq().
Note: when we reach hv_compose_msi_msg() by another code path:
pci_enable_msix_range() -> ... -> irq_domain_activate_irq() -> ... ->
hv_compose_msi_msg(), local irq is not disabled.
hv_compose_msi_msg() depends on an interrupt from the host.
With interrupts disabled, a UP VM always hangs in the busy loop in
the function, because the interrupt callback hv_pci_onchannelcallback()
cannot be called.
We can do nothing but work around it by polling the channel. This
is ugly, but we don't have any other choice.
2. If the host is ejecting the VF device before we reach
hv_compose_msi_msg(), in a UP VM, we can hang in hv_compose_msi_msg()
forever, because at this time the host doesn't respond to the
CREATE_INTERRUPT request. This issue has existed since the first day the
pci-hyperv driver appeared in the kernel.
Luckily, this can also be worked around by polling the channel
for the PCI_EJECT message and hpdev->state, and by checking the
PCI vendor ID.
Note: the above 2 issues can also happen on an SMP VM, if
"hbus->hdev->channel->target_cpu == smp_processor_id()" is true.
https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/hv&id=de0aa7b2f97d348ba7d1e17a00744c989baa0cb6
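The polling workaround described in this commit message can be illustrated with a small userspace simulation. It is only a sketch under stated assumptions: struct sim_channel, struct sim_device, channel_callback() and compose_msi_polling() are hypothetical names, not the driver's, and the max_polls bound exists only so the example terminates. The loop invokes the channel callback directly instead of waiting for an interrupt, and gives up when it sees an eject in progress or an invalid vendor ID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types for this sketch; they do not mirror the kernel's. */
enum child_state { CHILD_PRESENT, CHILD_EJECTING };
enum msg_type { MSG_NONE, MSG_INTERRUPT_CREATED, MSG_EJECT };

struct sim_channel {
    enum msg_type pending;  /* message the host has placed in the channel */
};

struct sim_device {
    struct sim_channel *chan;
    enum child_state state;
    uint16_t vendor_id;     /* 0xFFFF is treated as "device is gone" */
    bool completed;         /* set once the host's reply was processed */
};

/* Stand-in for the callback that an interrupt would normally trigger. */
void channel_callback(struct sim_device *dev)
{
    switch (dev->chan->pending) {
    case MSG_INTERRUPT_CREATED:
        dev->completed = true;
        break;
    case MSG_EJECT:
        dev->state = CHILD_EJECTING;
        break;
    case MSG_NONE:
        break;
    }
    dev->chan->pending = MSG_NONE;
}

/*
 * Poll for the host's reply with "interrupts disabled", i.e. nothing else
 * will ever run channel_callback() for us.
 * Returns 0 on completion, -1 if the device went away, and -2 if the
 * polling budget of this sketch ran out.
 */
int compose_msi_polling(struct sim_device *dev, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (dev->completed)
            return 0;
        if (dev->vendor_id == 0xFFFF)
            return -1;              /* invalid vendor ID: stop waiting */
        channel_callback(dev);      /* poll the channel ourselves */
        if (dev->state == CHILD_EJECTING)
            return -1;              /* host is ejecting the device */
    }
    return dev->completed ? 0 : -2;
}
```

With a pending MSG_INTERRUPT_CREATED the function returns 0; with a pending MSG_EJECT or a vendor ID of 0xFFFF it returns -1 rather than spinning forever.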
PCI: hv: Serialize the present and eject work items
When we hot-remove the device, we first receive a PCI_EJECT message and
then receive a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.
The first message is offloaded to hv_eject_device_work(), and the second
is offloaded to pci_devices_present_work(). Both paths can run
list_del(&hpdev->list_entry) at the same time, causing a general protection
fault, because system_wq can run them concurrently.
The patch eliminates the race condition.
Since access to present/eject work items is serialized, we do not need the
hbus->enum_sem anymore, so remove it.
https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/hv&id=021ad274d7dc31611d4f47f7dd4ac7a224526f30
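The effect of serializing the two work items can be sketched as follows. The commit message only says that access is serialized; running both handlers from a single FIFO queue, one at a time, is an assumption of this sketch, and struct sim_bus, struct ordered_queue, eject_work() and present_work() are hypothetical names. Because the handlers never overlap, the second one sees that the device was already unlinked and does not remove it again.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORK 8

/* Hypothetical bus state for this sketch. */
struct sim_bus {
    bool device_listed;  /* true while the child device is on the bus list */
    int removals;        /* how many times the device was unlinked */
};

typedef void (*work_fn)(struct sim_bus *);

/* A strictly ordered queue: items run one after another, in FIFO order. */
struct ordered_queue {
    work_fn items[MAX_WORK];
    size_t head;
    size_t tail;
};

bool queue_work_item(struct ordered_queue *q, work_fn fn)
{
    if (q->tail == MAX_WORK)
        return false;
    q->items[q->tail++] = fn;
    return true;
}

void run_queue(struct ordered_queue *q, struct sim_bus *bus)
{
    while (q->head < q->tail)
        q->items[q->head++](bus);
}

/* Unlink the device only if it is still listed. */
static void unlink_if_listed(struct sim_bus *bus)
{
    if (bus->device_listed) {
        bus->device_listed = false;
        bus->removals++;
    }
}

/* Handler for the eject message. */
void eject_work(struct sim_bus *bus)
{
    unlink_if_listed(bus);
}

/* Handler for the bus-relations message reporting zero devices. */
void present_work(struct sim_bus *bus)
{
    unlink_if_listed(bus);
}
```

Queuing eject_work() followed by present_work() and then calling run_queue() unlinks the device exactly once.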
All 4.15-based kernels need these fixes, as does any kernel that picked up:
Fixes: 4900be83602b ("x86/vector/msi: Switch to global reservation mode")
The race condition fixed by the serialization patch applies to all kernels with PCI passthrough on Hyper-V:
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs") (the catch-all for PCI passthrough)