yahoo-eng-team team mailing list archive

[Bug 1932127] [NEW] downloading large image from glance locks out threads

 

Public bug reported:

Reproduction scenario:

- create a large base image (50 GB or so). This could be done for example by creating a VM, filling its ephemeral storage (which is 50 GB in our case) with lots of junk data, shutting down the VM, and creating an image from that.
- create a VM based on this image.

While nova-compute downloads the image from glance, other threads seem
to be locked out for too long. Network connection failures get logged
and, if the download lasts long enough, creating the VM often fails.

We first saw this on Queens, but I reproduced the same issue on Ussuri.
As hypervisor we use Libvirt + KVM.
For storage we use Quobyte (shared storage).
For networking we use Midonet (on Queens) and OVS (on Ussuri).

My solution was to put "greenthread.sleep(0)" in the inner loops, like
so:


From: Olaf Seibert <o.seibert@xxxxxxxxxxxx>
Date: Thu, 10 Jun 2021 11:38:16 +0000
Subject: Allow other threads to run.

While downloading a base image from Glance, other threads don't get
enough of a chance to run, and network connections start to time out.
See os-9400.
---
 nova/image/glance.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/nova/image/glance.py b/nova/image/glance.py
index 13bfd90..8231351 100644
--- a/nova/image/glance.py
+++ b/nova/image/glance.py
@@ -30,6 +30,7 @@ import time
 import cryptography
 from cursive import exception as cursive_exception
 from cursive import signature_utils
+from eventlet import greenthread
 import glanceclient
 import glanceclient.exc
 from glanceclient.v2 import schemas
@@ -384,6 +385,7 @@ class GlanceImageServiceV2(object):
                 try:
                     for chunk in image_chunks:
                         verifier.update(chunk)
+                        greenthread.sleep(0)
                     verifier.verify()
 
                     LOG.info('Image signature verification succeeded '
@@ -400,6 +402,7 @@ class GlanceImageServiceV2(object):
                     if verifier:
                         verifier.update(chunk)
                     data.write(chunk)
+                    greenthread.sleep(0)
                 if verifier:
                     verifier.verify()
                     LOG.info('Image signature verification succeeded '

However, the download happens in chunks of only 64 KB, so sleep(0) is
called extremely frequently (roughly 800,000 times for a 50 GB image).
Maybe there is a better solution, but it should not be too complicated
for such a tight loop.
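
One possible way to call sleep(0) less often is to yield only after a
certain number of bytes has been written, rather than after every chunk.
The following is a minimal sketch of that idea, not a tested patch for
nova. The function name write_chunks, the yield_fn parameter, and the
4 MiB threshold are all hypothetical choices made for illustration. In
nova, yield_fn would presumably be something like
"lambda: greenthread.sleep(0)"; here a plain counter stands in for it so
the sketch runs without eventlet.

```python
import io


def write_chunks(chunks, dest, yield_fn, yield_every_bytes=4 * 1024 * 1024):
    """Write chunks to dest, calling yield_fn about once per yield_every_bytes.

    yield_fn is a hypothetical hook; in an eventlet-based service it could
    be `lambda: greenthread.sleep(0)` to let other green threads run.
    """
    since_yield = 0  # bytes written since the last cooperative yield
    total = 0        # total bytes written
    for chunk in chunks:
        dest.write(chunk)
        total += len(chunk)
        since_yield += len(chunk)
        if since_yield >= yield_every_bytes:
            yield_fn()
            since_yield = 0
    return total


# Demo: 256 chunks of 64 KB (16 MiB in total), yielding every 4 MiB.
calls = []
chunks = (b"\0" * 65536 for _ in range(256))
buf = io.BytesIO()
written = write_chunks(chunks, buf, lambda: calls.append(1))
print(written, len(calls))
```

With these numbers the hook runs once per 64 chunks instead of once per
chunk. Whether a byte threshold or a time threshold is the better
trigger would depend on how long other threads can tolerate waiting,
which would need to be measured.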

I have attached some log files, since this bug tracker thinks the bug
description is too long.

** Affects: nova
     Importance: Undecided
         Status: New

** Attachment added: "Log file extracts"
   https://bugs.launchpad.net/bugs/1932127/+attachment/5504996/+files/Bugreport

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1932127
