
yahoo-eng-team team mailing list archive

[Bug 1467570] [NEW] Nova can't provision instance from snapshot with a ceph backend

 

Public bug reported:

This is a weird issue that does not happen in our Juno setup, but
happens in our Kilo setup. The configuration between the two setups is
nearly identical, with only Kilo-specific changes applied (namely,
moving options around to their new sections).
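
For illustration, the Ceph-related nova options in a Kilo nova.conf
live under [libvirt]; the values below are examples, not our real ones:

    [libvirt]
    images_type = rbd
    images_rbd_pool = vms
    images_rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = cinder
    rbd_secret_uuid = <your-secret-uuid>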

Here's how to reproduce:
1. Provision an instance.
2. Make a snapshot of this instance.
3. Try to provision an instance from that snapshot.

Nova-compute will complain that it can't find the disk, and the instance
will fall into an error state.
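
If you want to script the reproduction, something like this with
python-novaclient should do it (credentials, image and flavor IDs are
placeholders):

    import time

    from novaclient import client

    # Placeholder credentials -- substitute your own.
    nova = client.Client('2', 'admin', 'secret', 'admin',
                         'http://keystone:5000/v2.0')

    def wait_for(getter, resource_id, status):
        # Poll until the resource reaches the wanted status.
        while getter(resource_id).status != status:
            time.sleep(5)

    # 1. Provision an instance from a regular glance image.
    server = nova.servers.create('cow-test', '<image-id>', '<flavor-id>')
    wait_for(nova.servers.get, server.id, 'ACTIVE')

    # 2. Snapshot it; this uploads a new image into glance/ceph.
    snap_id = nova.servers.create_image(server.id, 'cow-test-snap')
    wait_for(nova.images.get, snap_id, 'ACTIVE')

    # 3. Boot from that snapshot; on our Kilo setup this never reaches
    #    ACTIVE, the instance goes to ERROR instead.
    child = nova.servers.create('cow-test-child', snap_id, '<flavor-id>')
    wait_for(nova.servers.get, child.id, 'ACTIVE')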

Here's what the default behavior is supposed to be, from my observations:
-When the image is uploaded into ceph, a snapshot is created automatically inside ceph (this is NOT an instance snapshot per se, but a ceph-internal snapshot).
-When an instance is booted from an image in nova, this snapshot gets a clone in the nova ceph pool, and nova then uses that clone as the instance's disk. This is called copy-on-write cloning.
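
In rbd terms, I believe the chain looks roughly like this (sketched
with the python-rbd bindings; the pool names 'images' and 'vms' and the
snapshot name 'snap' are the usual defaults, so adjust for your setup):

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    images = cluster.open_ioctx('images')  # glance pool
    vms = cluster.open_ioctx('vms')        # nova pool

    # What glance does at upload time: snapshot the image and protect
    # the snapshot so it can be cloned from.
    img = rbd.Image(images, '<glance-image-id>')
    try:
        img.create_snap('snap')
        img.protect_snap('snap')
    finally:
        img.close()

    # What nova does at boot time: a copy-on-write clone of that
    # snapshot into its own pool, used as the instance's disk.
    rbd.RBD().clone(images, '<glance-image-id>', 'snap',
                    vms, '<instance-uuid>_disk',
                    features=rbd.RBD_FEATURE_LAYERING)

    cluster.shutdown()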

Here's where things get funky: when an instance is booted from a
snapshot, the copy-on-write cloning does not happen. Nova looks for the
disk, fails to find it in its pool, and the instance fails to
provision. There's no trace anywhere of the copy-on-write clone failing
(in part because ceph doesn't log client commands, from what I can see).
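
A quick way to confirm is to list what nova actually has in its pool
and where each disk was cloned from; with the same assumed pool names,
a boot-from-image disk shows a parent while the boot-from-snapshot disk
is simply missing:

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    vms = cluster.open_ioctx('vms')  # nova pool

    # Print every disk in the nova pool and its copy-on-write parent.
    for name in rbd.RBD().list(vms):
        img = rbd.Image(vms, name)
        try:
            try:
                parent = img.parent_info()  # (pool, image, snapshot)
            except rbd.ImageNotFound:
                parent = None  # flat image, no parent
            print(name, '->', parent)
        finally:
            img.close()

    cluster.shutdown()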

The compute logs I got are in this pastebin:
http://pastebin.com/ADHTEnhn

There are a few things I noticed here that I'd like to point out:

-Nova creates an ephemeral drive file, then proceeds to delete it before
using rbd_utils instead. While strange, this may be the intended but
somewhat dirty behavior: nova treats it as an ephemeral instance before
realizing that it's actually a ceph instance and doesn't need its
ephemeral disk. Or maybe these conjectures are completely wrong and this
is part of the issue.

-Nova "creates the image" (I'm guessing it's the copy-on-write cloning
happening here). What exactly happens here isn't very clear, but then it
complains that it can't find the clone in its pool to use as block
device.
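
One more check worth doing is on the glance side: the clone can only
work if the snapshot image in ceph carries a protected 'snap' to clone
from. Something like this verifies it (same assumptions as above;
'<snapshot-image-id>' is the glance id of the instance snapshot):

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    images = cluster.open_ioctx('images')  # glance pool

    # The snapshot image should have a 'snap', and it should be protected.
    img = rbd.Image(images, '<snapshot-image-id>')
    try:
        for snap in img.list_snaps():
            print(snap['name'], 'protected:',
                  img.is_protected_snap(snap['name']))
    finally:
        img.close()

    cluster.shutdown()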

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: ceph

** Tags added: ceph

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1467570


