
yahoo-eng-team team mailing list archive

[Bug 2091114] Re: Nova validation checks checks reject valid UEFI image

 

OK, so reproducing this locally using
https://review.opendev.org/c/openstack/oslo.utils/+/937037

the error is


[18:59:29]➜ python3 -m oslo_utils.imageutils.format_inspector ../flatcar-stable-4081.2.0-kube-v1.30.1.img.raw 
inspecting file: ../flatcar-stable-4081.2.0-kube-v1.30.1.img.raw
detected file format: gpt
running safety checks...
Safety check mbr on gpt failed because GPT MBR defines invalid extra partitions
FAILED! Safety checks failed: mbr

1/1 failed

and according to

https://wiki.osdev.org/GPT#LBA_0:_Protective_Master_Boot_Record

""" The UEFI specification requires that the PMBR partition table
contain one partition record, with the other three partition records set
to zero."""

So I need to look at the detection code in oslo.utils to confirm, but I'm
99% sure the Flatcar image does not contain a valid PMBR record based on
the UEFI spec requirements.
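
One way to confirm this independently of oslo.utils is to read the first
sector of the raw image and list the type byte of each of the four MBR
partition records. The following is a minimal sketch, assuming a 512-byte
sector and the classic MBR layout (partition table at offset 446, 16 bytes
per record, type byte at offset 4 of each record). The function names are
made up for illustration and are not part of oslo.utils.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446          # partition table follows the boot code area
ENTRY_SIZE = 16             # four records of 16 bytes each
BOOT_SIGNATURE = b"\x55\xaa"


def read_mbr_partition_types(sector: bytes) -> list[int]:
    """Return the type byte of each of the four MBR partition records."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")
    types = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        # Record layout: status(1) CHS start(3) type(1) CHS end(3)
        # starting LBA(4) sector count(4), little-endian.
        _status, _chs_start, os_type, _chs_end, _lba, _count = struct.unpack(
            "<B3sB3sII", entry)
        types.append(os_type)
    return types


def read_mbr_partition_types_from_file(path: str) -> list[int]:
    """Hypothetical helper: read sector 0 of a raw image and parse it."""
    with open(path, "rb") as f:
        return read_mbr_partition_types(f.read(SECTOR_SIZE))
```

A PMBR that follows the quoted requirement would yield one non-zero value
followed by three zeros.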

As such, nova is correctly rejecting the image.

It may have been a working image, but it does not look like it's a valid
one.

I'm going to mark this as invalid for nova and add oslo.utils to the bug,
given this is shared code in imageutils.


I see a few paths forward:

1. Close this as invalid, and Flatcar can make their images conform to the UEFI spec.
2. Add a compatibility flag that relaxes this constraint if opted into.
3. Relax it unconditionally.


The concern with 2 and 3 is that if the OVMF firmware in QEMU or on real hardware ever enforces the requirement, these images will break in the future.

Option 1 means existing "working" but potentially invalid images will not
work on OpenStack.
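
To make the difference between the strict behaviour and an opt-in relaxed
mode concrete, here is a rough sketch. It is not the oslo.utils
implementation: the function name, the `strict` parameter, and the choice of
what the relaxed mode accepts (any layout that still contains a 0xEE
protective record) are all assumptions for illustration only. The strict
branch follows the rule as stated in the bug description.

```python
GPT_PROTECTIVE = 0xEE  # MBR type byte of a GPT protective record


def check_pmbr(os_types: list[int], strict: bool = True) -> bool:
    """Return True if the four MBR type bytes pass the check.

    strict=True:  the first record is the only non-empty one, which is the
                  rule the bug description attributes to the current check.
    strict=False: hypothetical relaxed mode; extra records are tolerated
                  as long as one of them is the 0xEE protective record.
    """
    if len(os_types) != 4:
        raise ValueError("an MBR has exactly four partition records")
    used = [i for i, os_type in enumerate(os_types) if os_type != 0]
    if strict:
        return used == [0]
    return GPT_PROTECTIVE in os_types
```

With the type bytes reported in the bug description (12, 238, 0, 0), the
strict mode rejects the image and this particular relaxed mode accepts it.
Whether a relaxed rule like this is safe is exactly the open question above.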

There are a few things we need to confirm.

First, does the Flatcar image have multiple partitions in the PMBR?

As we can see from Rocky 8

[18:59:40]❯ python3 -m oslo_utils.imageutils.format_inspector ../Rocky-8-GenericCloud-Base.latest.x86_64.raw 
inspecting file: ../Rocky-8-GenericCloud-Base.latest.x86_64.raw
detected file format: gpt
running safety checks...
PASSED!

having multiple partitions is OK; it's listing more than one in the
first sector, the Protective Master Boot Record, that is invalid.

Second, we need to see if we can find a direct reference to the UEFI
requirement.

Third, we need to discuss with oslo and the other stakeholders whether a
compat mode is a valid approach, or whether we really want to require
strict conformance.

** Also affects: oslo.utils
   Importance: Undecided
       Status: New

** Changed in: nova
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2091114

Title:
  Nova validation checks checks reject valid UEFI image

Status in OpenStack Compute (nova):
  Invalid
Status in oslo.utils:
  New

Bug description:
  This relates specifically to this image:
  https://storage.googleapis.com/artifacts.k8s-staging-capi-openstack.appspot.com/test/flatcar/flatcar-stable-4081.2.0-kube-v1.30.1.img

  However, the problem should be easy enough to understand just from the
  description here without downloading it.

  When attempting to boot the image in 2024.2 devstack we see the
  following failure:

  Dec 04 11:01:53
  capo-e2e-controller.c.k8s-infra-e2e-boskos-107.internal
  nova-compute[114399]: WARNING
  oslo_utils.imageutils.format_inspector [None
  req-993c42cb-8da1-4cc5-83fd-1c16c08cbc13 demo demo] Safety check mbr
  on gpt failed because GPT MBR defines invalid extra partitions:
  oslo_utils.imageutils.format_inspector.SafetyViolation: GPT MBR
  defines invalid extra partitions

  There is an associated stack trace and the server enters the ERROR
  state.

  This is a QCOW2 image. After downloading it I can manually convert it
  to raw to inspect its partition table:

  > qemu-img convert -f qcow2 flatcar-stable-4081.2.0-kube-v1.30.1.img -O raw flatcar-stable-4081.2.0-kube-v1.30.1.img.raw
  > fdisk -l flatcar-stable-4081.2.0-kube-v1.30.1.img.raw
  Disk flatcar-stable-4081.2.0-kube-v1.30.1.img.raw: 20 GiB, 21474836480 bytes, 41943040 sectors
  Units: sectors of 1 * 512 = 512 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disklabel type: gpt
  Disk identifier: D814FAF6-AD0A-4FC1-8DE9-236755D902E5

  Device                                          Start      End  Sectors  Size Type
  flatcar-stable-4081.2.0-kube-v1.30.1.img.raw1    4096   266239   262144  128M EFI System
  flatcar-stable-4081.2.0-kube-v1.30.1.img.raw2  266240   270335     4096    2M BIOS boot
  flatcar-stable-4081.2.0-kube-v1.30.1.img.raw3  270336  2367487  2097152    1G unknown
  flatcar-stable-4081.2.0-kube-v1.30.1.img.raw4 2367488  4464639  2097152    1G unknown
  flatcar-stable-4081.2.0-kube-v1.30.1.img.raw6 4464640  4726783   262144  128M Linux filesystem
  flatcar-stable-4081.2.0-kube-v1.30.1.img.raw7 4726784  4857855   131072   64M unknown
  flatcar-stable-4081.2.0-kube-v1.30.1.img.raw9 4857856 41943006 37085151 17.7G unknown

  We apparently have Nova configured to convert qcow2 images to raw
  before booting them, which we can also see in the logs:

  Dec 04 11:01:37
  capo-e2e-controller.c.k8s-infra-e2e-boskos-107.internal
  nova-compute[114399]: DEBUG nova.virt.images [None
  req-993c42cb-8da1-4cc5-83fd-1c16c08cbc13 demo demo]
  945136cb-6cc4-4e09-a785-50eaa79e2b10 was qcow2, converting to raw
  {{(pid=114399) fetch_to_raw
  /opt/stack/nova/nova/virt/images.py:254}}

  
  Using a patch to oslo.utils from Stephen Finucane and adding some extra print statements of my own, it's clear that we're failing here:

  https://github.com/openstack/oslo.utils/blob/79f5ec658e2fee8ab46201a71faaff8d3b67a690/oslo_utils/imageutils/format_inspector.py#L1273-L1274

  > ./venv/bin/python ./oslo_utils/imageutils/format_inspector.py /tmp/flatcar-stable-4081.2.0-kube-v1.30.1.img.raw
  inspecting file: /tmp/flatcar-stable-4081.2.0-kube-v1.30.1.raw
  detected file format: gpt
  running safety checks...
  i: 0, ostype: 12
  i: 1, ostype: 238
  i: 2, ostype: 0
  i: 3, ostype: 0
  valid_partions: [0, 1]
  Safety check mbr on gpt failed because GPT MBR defines invalid extra partitions
  FAILED! Safety checks failed: mbr

  1/1 failed

  This code expects there to be exactly one partition with a non-zero
  partition type, and that this partition must be the first one. In this
  image, both of the first 2 partitions have a non-zero partition type.
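
The `ostype` values in the debug output above are decimal MBR partition
type codes. A small lookup makes them easier to read. The table below is
deliberately incomplete and only lists a few well-known codes; the names
`MBR_TYPE_NAMES` and `describe` are illustrative.

```python
# A few well-known MBR partition type codes (not an exhaustive list).
MBR_TYPE_NAMES = {
    0x00: "empty",
    0x0C: "FAT32 (LBA)",
    0xEE: "GPT protective",
    0xEF: "EFI System",
}


def describe(os_type: int) -> str:
    """Format an MBR type byte as hex plus a name, if the code is known."""
    name = MBR_TYPE_NAMES.get(os_type, "unknown")
    return f"0x{os_type:02X} {name}"


for index, os_type in enumerate([12, 238, 0, 0]):
    print(f"record {index}: {describe(os_type)}")
```

Read this way, record 0 is a FAT32 (LBA) entry and the GPT protective
entry sits in record 1, which is why the check on the first record fails.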

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2091114/+subscriptions


