yahoo-eng-team team mailing list archive

[Bug 1904726] [NEW] Live migration of instances with ephemeral storage inconsistent errors

 

Public bug reported:

Cloud: bionic-ussuri
Nova package: 2:21.0.0-0ubuntu0.20.04.1~cloud0

Live migration of instances sometimes succeeds, but sometimes fails
with:

/var/log/nova/nova-conductor.log.1:2020-11-17 22:03:40.123 574 WARNING
nova.scheduler.utils [req-ad81725c-de6d-4752-b7d4-f0b790afbb8a
7f5f38bb243a44e0b59a6fb4a0749e65 1e230541617441419e993d1c83cc61da -
ca9f2a8cfc05425ba056465f875e869a ca9f2a8cfc05425ba056465f875e869a]
[instance: 1408faec-eed0-4e5c-b7d1-0397472c5453] Setting instance to
ACTIVE state.: nova.exception.MigrationPreCheckError: Timeout while
checking if we can live migrate to host: <COMPUTE_HOST>

Checking nova logs on the remote host, we see an exception:

2020-11-17 22:04:32.563 176798 ERROR oslo_messaging.rpc.server [req-f95f177d-3eb4-4a2a-9e2a-9c63a3153ac9 7f5f38bb243a44e0b59a6fb4a0749e65 1e230541617441419e993d1c83cc61da - ca9f2a8cfc05425ba056465f875e869a ca9f2a8cfc05425ba056465f875e869a] Exception during message handling: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: qemu-img create -f qcow2 -o backing_file=/var/lib/nova/instances/_base/2c2154ce055c5cb1f6753afa1dfc93c42a8c1c6d,backing_fmt=raw /var/lib/nova/instances/b0b5ee90-3f02-4fba-af62-6714dfd44990/disk
Exit code: 1
Stdout: "Formatting '/var/lib/nova/instances/b0b5ee90-3f02-4fba-af62-6714dfd44990/disk', fmt=qcow2 size=214748364800 backing_file=/var/lib/nova/instances/_base/2c2154ce055c5cb1f6753afa1dfc93c42a8c1c6d backing_fmt=raw cluster_size=65536 lazy_refcounts=off refcount_bits=16\n"
Stderr: 'qemu-img: /var/lib/nova/instances/b0b5ee90-3f02-4fba-af62-6714dfd44990/disk: Could not create file: No such file or directory\n'

A little further down, after the Python exception:

2020-11-17 22:04:32.563 176798 ERROR oslo_messaging.rpc.server Command: qemu-img create -f qcow2 -o backing_file=/var/lib/nova/instances/_base/2c2154ce055c5cb1f6753afa1dfc93c42a8c1c6d,backing_fmt=raw /var/lib/nova/instances/b0b5ee90-3f02-4fba-af62-6714dfd44990/disk
2020-11-17 22:04:32.563 176798 ERROR oslo_messaging.rpc.server Exit code: 1
2020-11-17 22:04:32.563 176798 ERROR oslo_messaging.rpc.server Stdout: "Formatting '/var/lib/nova/instances/b0b5ee90-3f02-4fba-af62-6714dfd44990/disk', fmt=qcow2 size=214748364800 backing_file=/var/lib/nova/instances/_base/2c2154ce055c5cb1f6753afa1dfc93c42a8c1c6d backing_fmt=raw cluster_size=65536 lazy_refcounts=off refcount_bits=16\n"
2020-11-17 22:04:32.563 176798 ERROR oslo_messaging.rpc.server Stderr: 'qemu-img: /var/lib/nova/instances/b0b5ee90-3f02-4fba-af62-6714dfd44990/disk: Could not create file: No such file or directory\n'
2020-11-17 22:04:32.563 176798 ERROR oslo_messaging.rpc.server
2020-11-17 22:04:36.101 176798 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Server unexpectedly closed connection

This looks like a race condition somewhere in live migration: the failure
is intermittent, and qemu-img fails because the target path does not
exist on the remote host.
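
For anyone triaging a similar failure, here is a minimal diagnostic sketch in Python. It pulls the target disk path out of the qemu-img stderr line quoted above and checks whether the parent instance directory is present on the host. The helper names (failed_disk_path, instance_dir_exists) are hypothetical and are not part of Nova; the sketch only inspects the filesystem and does not claim to reproduce or fix the suspected race.

```python
import os
import re

# Matches the qemu-img stderr line shown in the log excerpt above.
# Assumption: the path contains no whitespace.
STDERR_RE = re.compile(
    r"qemu-img: (?P<path>/\S+): Could not create file: "
    r"No such file or directory"
)


def failed_disk_path(stderr):
    """Return the disk path qemu-img could not create, or None."""
    match = STDERR_RE.search(stderr)
    return match.group("path") if match else None


def instance_dir_exists(disk_path):
    """Return True if the directory that should hold the disk exists."""
    return os.path.isdir(os.path.dirname(disk_path))


if __name__ == "__main__":
    stderr = (
        "qemu-img: /var/lib/nova/instances/"
        "b0b5ee90-3f02-4fba-af62-6714dfd44990/disk: "
        "Could not create file: No such file or directory\n"
    )
    path = failed_disk_path(stderr)
    print("target disk:", path)
    print("instance dir present:", instance_dir_exists(path))
```

Running the check on the remote host shortly after a failed migration would show whether the instance directory is still missing or was created later, which may help narrow down the ordering problem.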

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1904726

Title:
  Live migration of instances with ephemeral storage inconsistent errors

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1904726/+subscriptions

