group.of.nepali.translators team mailing list archive

[Bug 1894942] Re: [UBUNTU 20.04] Lost virtio host --> guest notifications cause devices to cease normal operation

 

Hi,
the patches LGTM and I'll make them part of the Ubuntu builds.

One question though: on the one hand, "occasionally end up with" sounds like this is almost impossible to test explicitly, but on the other hand the attached debug patch suggests there might be a way to trigger it through protvirt.
But even if that is the case, the older versions we would backport this to have no protvirt.
So I wanted to ask:
a) Is there any way to reliably trigger this for A/B testing of the fix as far back as qemu 2.5?
b) How real is the danger of hitting this, and how severe are the consequences, in the pre-protvirt (<= Focal) era?

** Also affects: qemu (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Also affects: qemu (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: qemu (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Changed in: qemu (Ubuntu Xenial)
       Status: New => Incomplete

** Changed in: qemu (Ubuntu Bionic)
       Status: New => Incomplete

** Changed in: qemu (Ubuntu Focal)
       Status: New => Triaged

** Changed in: qemu (Ubuntu Focal)
   Importance: Undecided => Medium

** Changed in: qemu (Ubuntu)
   Importance: Undecided => High

** Changed in: qemu (Ubuntu)
       Status: New => In Progress

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1894942

Title:
  [UBUNTU 20.04] Lost virtio host --> guest notifications cause devices
  to cease normal operation

Status in Ubuntu on IBM z Systems:
  New
Status in qemu package in Ubuntu:
  In Progress
Status in qemu source package in Xenial:
  Incomplete
Status in qemu source package in Bionic:
  Incomplete
Status in qemu source package in Focal:
  Triaged

Bug description:
  Problem Description:

  When irqfds are not used, the adapter interruption host --> guest
  notifier bit is set by the QEMU function virtio_set_ind_atomic().

  The atomic_cmpxchg() loop in virtio_set_ind_atomic() is broken because we occasionally end up with old and _old having different values (a legit compiler can generate code that accesses *ind_addr again to pick up a value for _old instead of using the value of old that was already fetched according to the rules of the abstract machine). This means the underlying CS (compare and swap) instruction may use a different old (_old) than the one we intended to use if atomic_cmpxchg() performs the xchg part.
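
  As a rough illustration of the pattern described above, here is a minimal, self-contained C11 sketch of a compare-and-swap loop that loads the indicator byte once and afterwards relies only on the value returned by the compare-and-swap, so the value compared against and the value used to build the new byte cannot diverge. This is not the QEMU source: cmpxchg_u8() and set_ind_atomic_sketch() are hypothetical names, and C11 <stdatomic.h> is assumed as a stand-in for QEMU's atomic_cmpxchg() helper.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a QEMU function): compare-and-swap that
 * returns the value found at *addr, whether or not the swap happened.
 */
static uint8_t cmpxchg_u8(_Atomic uint8_t *addr, uint8_t expected,
                          uint8_t desired)
{
    /* On failure, 'expected' is overwritten with the current value. */
    atomic_compare_exchange_strong(addr, &expected, desired);
    return expected;
}

/*
 * Hypothetical sketch: atomically OR 'to_be_set' into *ind_addr and
 * return the previous value of the byte.
 */
static uint8_t set_ind_atomic_sketch(_Atomic uint8_t *ind_addr,
                                     uint8_t to_be_set)
{
    uint8_t expected, actual;

    assert(ind_addr != NULL);

    /* Single explicit atomic load; no plain re-read of *ind_addr. */
    actual = atomic_load(ind_addr);
    do {
        expected = actual;
        /* The same 'expected' is compared and used to build the new value. */
        actual = cmpxchg_u8(ind_addr, expected, expected | to_be_set);
    } while (actual != expected);

    return actual;
}
```

  Calling set_ind_atomic_sketch(&ind, 0x04) on a byte holding 0x01 leaves 0x05 in the byte and returns 0x01. The point of the design is that each retry starts from the value the failed compare-and-swap reported, not from a fresh non-atomic read.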
      
  The direct consequence of the problem is that host --> guest notifications can get lost. The indirect consequence is that queues may get stuck and the devices may cease to operate normally. We stumbled on this while debugging a choked virtio-net interface (one that used the qemu driver and not vhost). But it can affect other virtio-ccw devices as well.

  If irqfds are used for host->guest notifications, then we are safe
  because notifier bit manipulation is done in the kernel (and it's done
  correctly).

  
  The problem described above is fixed upstream by commit:

  1a8242f7c3 ("virtio-ccw: fix virtio_set_ind_atomic")

  All upstream versions since v2.0.0 are (potentially) affected.

  The same mistake was made in QEMU in another place, and is fixed by:

  45175361f1 ("s390x/pci: fix set_ind_atomic")

  We can file a separate BZ for it if necessary.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1894942/+subscriptions