yahoo-eng-team team mailing list archive

[Bug 1554093] [NEW] Cached images incorrectly removed after instance storage comes back online after a prolonged >= 24 hour outage

 

Public bug reported:

After an outage of 24 hours or longer, any cached images stored on
shared instance storage are prone to removal, because compute nodes race
to complete a pass of the cache manager once the storage returns.

This pass of the cache manager first registers the current node as an
active user of the instance store, then compiles a list of instances on
hosts registered to the instance store. This list is then used to
determine which of the cached images can be safely removed.

After an outage of 24 hours or longer, the first compute node to run a
cache manager pass will find only itself listed as an active user of the
instance store. It can, and likely will, remove cached images for
instances hosted on other compute nodes.
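
The race can be illustrated with a small in-memory model. This is only a
sketch and not nova's actual code: the function names, the
images_by_host mapping and the 24-hour freshness window are assumptions
chosen to mirror the behaviour described above, and the registry is a
plain dict rather than a file on the shared store.

```python
TWENTY_FOUR_HOURS = 24 * 3600  # assumed freshness window, in seconds


def register_storage_use(registry, hostname, now):
    """Record hostname as an active user of the shared store at time now."""
    registry[hostname] = now


def get_storage_users(registry, now):
    """Return hosts whose last registration is less than 24 hours old."""
    return sorted(host for host, seen in registry.items()
                  if now - seen < TWENTY_FOUR_HOURS)


def images_safe_to_remove(cached_images, images_by_host, active_hosts):
    """Return cached images not used by any instance on an active host."""
    in_use = {image
              for host in active_hosts
              for image in images_by_host.get(host, [])}
    return sorted(set(cached_images) - in_use)


def cache_manager_pass(registry, hostname, now, cached_images,
                       images_by_host):
    """One simplified pass: register first, then decide what to remove."""
    register_storage_use(registry, hostname, now)
    active = get_storage_users(registry, now)
    return images_safe_to_remove(cached_images, images_by_host, active)


# Both hosts last registered at t=0, just before the outage began.
registry = {"compute-1": 0, "compute-2": 0}
images_by_host = {"compute-1": ["image-a"], "compute-2": ["image-b"]}
cached_images = ["image-a", "image-b"]

# Storage returns 25 hours later and compute-1 wins the race.
after_outage = 25 * 3600
doomed = cache_manager_pass(registry, "compute-1", after_outage,
                            cached_images, images_by_host)
print(doomed)  # prints ['image-b']
```

In this model compute-2's registration has aged out by the time
compute-1 runs its pass, so image-b is marked for removal even though an
instance on compute-2 still depends on it.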

IMHO additional care should be taken before calling for the removal of
cached images for instances on registered but seemingly inactive compute
nodes.
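
One possible shape for that extra care is sketched below. It is a
hypothetical policy, not a proposed patch: split_storage_users, the
registry dict, the images_by_host mapping and the 24-hour window are all
assumed names and values for illustration. The idea is to treat images
used by registered but stale hosts as still in use.

```python
TWENTY_FOUR_HOURS = 24 * 3600  # assumed freshness window, in seconds


def split_storage_users(registry, now):
    """Split registered hosts into (active, stale) by registration age."""
    active, stale = [], []
    for host, seen in sorted(registry.items()):
        if now - seen < TWENTY_FOUR_HOURS:
            active.append(host)
        else:
            stale.append(host)
    return active, stale


def images_safe_to_remove(cached_images, images_by_host, registry, now):
    """Hypothetical conservative policy: protect every registered host.

    Images used by stale hosts are treated as still in use, so a host
    that has simply not re-registered yet cannot lose its cached images.
    """
    active, stale = split_storage_users(registry, now)
    protected = {image
                 for host in active + stale
                 for image in images_by_host.get(host, [])}
    return sorted(set(cached_images) - protected)


# compute-1 has re-registered after a 25 hour outage; compute-2 has not.
now = 25 * 3600
registry = {"compute-1": now, "compute-2": 0}
images_by_host = {"compute-1": ["image-a"], "compute-2": ["image-b"]}
cached_images = ["image-a", "image-b", "image-c"]

removable = images_safe_to_remove(cached_images, images_by_host,
                                  registry, now)
print(removable)  # prints ['image-c']
```

The trade-off in this sketch is that images belonging to a host that is
gone for good stay cached until that host is removed from the registry
by some other means.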

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1554093

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1554093/+subscriptions

