group.of.nepali.translators team mailing list archive

[Bug 1894942] Re: [UBUNTU 20.04] Lost virtio host --> guest notifications cause devices to cease normal operation

 

** Changed in: ubuntu-z-systems
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1894942

Title:
  [UBUNTU 20.04] Lost virtio host --> guest notifications cause devices
  to cease normal operation

Status in Ubuntu on IBM z Systems:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Xenial:
  Fix Released
Status in qemu source package in Bionic:
  Fix Released
Status in qemu source package in Focal:
  Fix Released
Status in qemu source package in Groovy:
  Fix Released

Bug description:
  [Impact]

   * Host -> Guest notifications can be lost, which stalls I/O on the
     affected device. See the original bug report below for details.

   * Backport the fix that ensures the generated code loads the
     variables properly, avoiding the issue.

  [Test Case]

   * Set up iperf on the host and run the server with "iperf -s".
   * Set up a guest using driver=qemu, for example:
      <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
      <driver name='qemu'/>
      </interface>
   * In the guest, run a loop of iperf runs connecting to the
     server on the host.
      #!/bin/bash
      for i in $(seq 1 1000);
      do
        echo Try $i
        iperf -c 192.168.122.1 || break
      done
   * Depending on the HW model, the machine saturation and similar
     factors, the above test either reproduces the problem fairly
     reliably or not at all. That is unfortunate, but we haven't found
     a better reproducer. Fortunately IBM, who reported this issue (and
     created the fix), can recreate it on their end and is willing to
     do so again for the SRU verification.

  [Regression Potential]

   * The changed code path is s390x-only and limited to the virtio-ccw
     handling. Therefore regressions, if any, would be isolated to
     s390x and would manifest on virtio-ccw based I/O.

  [Other Info]
   
   * n/a

  ----

  
  Problem Description:

  When irqfds are not used, the adapter interruption host-->guest
  notifier bit is set by the QEMU function virtio_set_ind_atomic().

  The atomic_cmpxchg() loop in virtio_set_ind_atomic() is broken because
  we occasionally end up with old and _old having different values. A
  legitimate compiler can, within the rules of the abstract machine,
  generate code that accesses *ind_addr again to pick up a value for
  _old instead of reusing the value of old that was already fetched.
  This means the underlying CS instruction may use a different old
  value (_old) than the one we intended if atomic_cmpxchg() performs
  the exchange.
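
  To illustrate the class of fix, here is a minimal sketch of an atomic
  "set bits" loop in which the expected value is read exactly once per
  attempt. It is written with C11 <stdatomic.h> rather than QEMU's
  atomic_cmpxchg() macro, and it is not the upstream patch. The function
  name set_ind_atomic and its signature are hypothetical and were chosen
  only for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not QEMU code): atomically OR to_be_set into
 * *ind_addr and return the value the indicator byte held beforehand.
 */
static uint8_t set_ind_atomic(_Atomic uint8_t *ind_addr, uint8_t to_be_set)
{
    assert(ind_addr != NULL);

    /*
     * One explicit atomic load. Because this is an atomic access, the
     * compiler may not replace later uses of 'expected' with a second
     * read of *ind_addr, which is what a plain load would allow.
     */
    uint8_t expected = atomic_load(ind_addr);

    /*
     * On failure, atomic_compare_exchange_weak() stores the value it
     * actually found into 'expected', so each retry computes the new
     * value from the same snapshot that the compare step will use.
     */
    while (!atomic_compare_exchange_weak(ind_addr, &expected,
                                         (uint8_t)(expected | to_be_set))) {
        /* retry with the refreshed 'expected' */
    }

    /* 'expected' now holds the value that was replaced. */
    return expected;
}
```

  With this shape the value used for the comparison and the value used
  to compute the new contents always come from the same read, so a
  concurrent update by the guest cannot be silently overwritten.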

  The direct consequence of the problem is that host --> guest
  notifications can get lost. The indirect consequence is that queues
  may get stuck and the devices may cease to operate normally. We
  stumbled on this while debugging a choked virtio-net interface (one
  that used the qemu driver and not vhost), but it can affect other
  virtio-ccw devices as well.

  If irqfds are used for host->guest notifications, then we are safe
  because notifier bit manipulation is done in the kernel (and it's done
  correctly).

  The problem described above is fixed upstream by commit:

  1a8242f7c3 ("virtio-ccw: fix virtio_set_ind_atomic")

  All upstream versions since v2.0.0 are (potentially) affected.

  The same mistake was made in QEMU in another place, and is fixed by:

  45175361f1 ("s390x/pci: fix set_ind_atomic")

  We can file a separate BZ for it if necessary.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1894942/+subscriptions