
yahoo-eng-team team mailing list archive

[Bug 1734784] Re: Cannot boot instances on filesystem without O_DIRECT support (fails on tmpfs)

 

Reviewed:  https://review.openstack.org/523554
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e6ce9557f84cdcdf4ffdd12ce73a008c96c7b94a
Submitter: Zuul
Branch:    master

commit e6ce9557f84cdcdf4ffdd12ce73a008c96c7b94a
Author: Thomas Goirand <zigo@xxxxxxxxxx>
Date:   Tue Nov 28 23:22:56 2017 +0100

    qemu-img do not use cache=none if no O_DIRECT support
    
    If /var/lib/nova/instances is mounted on a filesystem like tmpfs that
    doesn't have support for O_DIRECT, "qemu-img convert" currently crashes
    because it's unconditionally using the "-t none" flag.
    
    This patch therefore:
    - moves the _supports_direct_io() function out of the libvirt driver,
    from nova/virt/libvirt/driver.py to nova/utils.py and makes it public.
    - uses that function to decide to use -t none or -t writethrough when
    converting images with qemu-img.
    
    Closes-Bug: #1734784
    
    Co-Authored-By: melanie witt <melwittt@xxxxxxxxx>
    
    Change-Id: Ifb47de00abf3f83442ca5264fbc24885df924a19
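
A minimal sketch of the second point in the commit message, choosing the qemu-img cache mode from the result of an O_DIRECT check. The helper name build_convert_command and its parameters are hypothetical, not nova's actual code; the argument order mirrors the failing command quoted in the bug description.

```python
def build_convert_command(source, dest, in_format, out_format,
                          direct_io_supported):
    """Build a 'qemu-img convert' argument list (hypothetical helper).

    direct_io_supported is assumed to come from a separate check of the
    destination filesystem; it is passed in here to keep the sketch small.
    """
    # "-t none" makes qemu-img open the destination with O_DIRECT, which
    # fails on filesystems without O_DIRECT support. "writethrough" goes
    # through the host page cache instead.
    cache_mode = "none" if direct_io_supported else "writethrough"
    return ["qemu-img", "convert", "-t", cache_mode,
            "-O", out_format, "-f", in_format, source, dest]
```

With direct_io_supported=False, the resulting list contains "-t writethrough" where the failing command had "-t none".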


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1734784

Title:
  Cannot boot instances on filesystem without O_DIRECT support (fails on
  tmpfs)

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  I'm trying to (tempest) validate OpenStack Pike for Debian. So I'm
  running Pike (nova 16.0.3) on Debian Sid. My environment for running
  tempest is a Debian Live system which I just boot once, install all of
  OpenStack on, and run tempest.

  As a consequence, my filesystem is a bit weirdo. It's a single root
  partition that is using overlayfs, which has its read/write volume on
  tmpfs. This worked well when validating Newton, but it seems there's a
  regression with Pike. Here's what happens when trying to boot an
  instance:

   Traceback (most recent call last):
     File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2192, in _build_resources
       yield resources
     File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2007, in _build_and_run_instance
       block_device_info=block_device_info)
     File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2802, in spawn
       block_device_info=block_device_info)
     File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3240, in _create_image
       fallback_from_host)
     File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3331, in _create_and_inject_local_root
       instance, size, fallback_from_host)
     File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6988, in _try_fetch_image_cache
       size=size)
     File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 241, in cache
       *args, **kwargs)
     File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 595, in create_image
       prepare_template(target=base, *args, **kwargs)
     File "/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
       return f(*args, **kwargs)
     File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 237, in fetch_func_sync
       fetch_func(target=target, *args, **kwargs)
     File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/utils.py", line 446, in fetch_image
       images.fetch_to_raw(context, image_id, target)
     File "/usr/lib/python2.7/dist-packages/nova/virt/images.py", line 171, in fetch_to_raw
       % {'exp': exp})
   ImageUnacceptable: Image f8dc206b-d5a1-4123-b26c-7216d03ab1e7 is unacceptable: Unable to convert image to raw: Image /var/lib/nova/instances/_base/c44b0b620ae7c6fd8111e0abb5a8d1fc39fcdf08.part is unacceptable: Unable to convert image to raw: Unexpected error while running command.
   Command: qemu-img convert -t none -O raw -f qcow2 /var/lib/nova/instances/_base/c44b0b620ae7c6fd8111e0abb5a8d1fc39fcdf08.part /var/lib/nova/instances/_base/c44b0b620ae7c6fd8111e0abb5a8d1fc39fcdf08.converted
   Exit code: 1
   Stdout: u''
   Stderr: u"qemu-img: file system may not support O_DIRECT\nqemu-img: Could not open '/var/lib/nova/instances/_base/c44b0b620ae7c6fd8111e0abb5a8d1fc39fcdf08.converted': Could not open '/var/lib/nova/instances/_base/c44b0b620ae7c6fd8111e0abb5a8d1fc39fcdf08.converted': Invalid argument\n"

  In this log, the important bit is:

  file system may not support O_DIRECT

  Indeed, my weirdo filesystem setup doesn't support this.

  I tried to set [libvirt]/disk_cachemodes = file=writethrough, as per
  recommendation on #openstack-nova, but it didn't fix the problem. It
  looks like nova ignores this setting when it runs qemu-img convert.

  Because I also have a small scratch disk to do some cinder tests, and
  wanted to make sure about this issue, I tried mounting
  /var/lib/nova/instances on ext4, just to see if there was still the
  issue. And indeed, it fixed the problem.

  So, here, nova should check if /var/lib/nova/instances really has
  O_DIRECT support and use the correct options for libvirt, but it
  doesn't appear to do so. This IMO deserves a fix.
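
For illustration, one way such a check could look is to try opening a scratch file in the directory with O_DIRECT. This is a simplified sketch, not nova's implementation: the function name supports_direct_io and the scratch file prefix are assumptions, and only the open() is probed (a stricter test would also attempt an aligned write).

```python
import errno
import os
import tempfile


def supports_direct_io(dirpath):
    """Return True if a file under dirpath can be opened with O_DIRECT."""
    if not hasattr(os, "O_DIRECT"):
        # The platform does not expose O_DIRECT at all.
        return False
    # Create a scratch file normally, then reopen it with O_DIRECT.
    fd, testfile = tempfile.mkstemp(prefix=".directio.test.", dir=dirpath)
    os.close(fd)
    try:
        fd = os.open(testfile, os.O_WRONLY | os.O_DIRECT)
        os.close(fd)
        return True
    except OSError as exc:
        # EINVAL is what open() reports when the filesystem rejects
        # O_DIRECT, matching the "Invalid argument" in the log above.
        if exc.errno == errno.EINVAL:
            return False
        raise
    finally:
        os.unlink(testfile)
```

Calling supports_direct_io("/var/lib/nova/instances") would then tell whether "-t none" is safe for that directory. The answer depends on the filesystem and kernel in use.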

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1734784/+subscriptions

