yahoo-eng-team team mailing list archive
Message #95011
[Bug 2091114] Re: Nova validation checks checks reject valid UEFI image
OK, so I can reproduce this locally using
https://review.opendev.org/c/openstack/oslo.utils/+/937037
The error is:
[18:59:29]➜ python3 -m oslo_utils.imageutils.format_inspector ../flatcar-stable-4081.2.0-kube-v1.30.1.img.raw
inspecting file: ../flatcar-stable-4081.2.0-kube-v1.30.1.img.raw
detected file format: gpt
running safety checks...
Safety check mbr on gpt failed because GPT MBR defines invalid extra partitions
FAILED! Safety checks failed: mbr
1/1 failed
and according to
https://wiki.osdev.org/GPT#LBA_0:_Protective_Master_Boot_Record
""" The UEFI specification requires that the PMBR partition table
contain one partition record, with the other three partition records set
to zero."""
So I need to look at the detection code in oslo.utils to confirm, but I'm
99% sure the flatcar image does not contain a valid PMBR based on
the UEFI spec requirements.
As such, nova is correctly rejecting the image.
It may have been a working image, but it does not look like it's a valid
one.
I'm going to mark this as invalid for nova and add oslo.utils to the bug,
given this is shared code in imageutils.
I see a few paths forward:
one, close this as invalid, and flatcar can make their images conform to the UEFI spec.
two, add a compatibility flag that relaxes this constraint if opted into.
three, relax it unconditionally.
The concern with two and three is that if the OVMF firmware in qemu, or firmware on real hardware, ever enforces the requirement, these images will break in the future.
Option one means existing "working" but potentially invalid images will not
work on OpenStack.
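To make the difference between the three options concrete, here is a rough Python sketch of an opt-in relaxation. This is not the oslo.utils code: the function name `pmbr_types_ok` and the `allow_extra_partitions` flag are made up for illustration, and "relaxed" is assumed to mean that a 0xEE protective entry must still exist but other non-empty records are tolerated.

```python
GPT_PROTECTIVE = 0xEE  # MBR partition type used by the protective GPT entry


def pmbr_types_ok(ostypes, allow_extra_partitions=False):
    """Decide whether four MBR partition types form an acceptable PMBR.

    Hypothetical helper for discussion only, not part of oslo.utils.
    """
    if len(ostypes) != 4:
        raise ValueError("an MBR has exactly four partition records")
    # Without a protective entry the MBR is not acceptable in either mode.
    if GPT_PROTECTIVE not in ostypes:
        return False
    if allow_extra_partitions:
        # Options two and three: tolerate additional non-empty records.
        return True
    # Option one (strict): only record 0 may be in use.
    used = [i for i, t in enumerate(ostypes) if t != 0]
    return used == [0]
```

With option two the flag would default to off and be opted into by the operator; with option three the relaxed branch would simply become the only behaviour.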
There are a few things we need to confirm.
First, does the flatcar image have multiple partitions in the PMBR?
As we can see from Rocky 8
[18:59:40]❯ python3 -m oslo_utils.imageutils.format_inspector ../Rocky-8-GenericCloud-Base.latest.x86_64.raw
inspecting file: ../Rocky-8-GenericCloud-Base.latest.x86_64.raw
detected file format: gpt
running safety checks...
PASSED!
having multiple partitions is OK; it's listing more than one in the
first sector, the Protective Master Boot Record, that is invalid.
Second, we need to see if we can find a direct reference to the UEFI
requirement.
Third, we need to discuss with oslo and the other stakeholders whether a
compat mode is a valid approach or whether we really want to require strict
conformance.
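For the first point, one way to check is to dump the four partition records from LBA 0 of the raw image directly. The sketch below assumes the classic MBR layout (partition table at byte offset 446, four 16-byte records, 0x55AA signature at offset 510); the function names are my own and this is independent of the oslo.utils inspector.

```python
import struct

SECTOR_SIZE = 512
PTE_OFFSET = 446   # the partition table starts at byte 446 of LBA 0
PTE_SIZE = 16      # each of the four records is 16 bytes
BOOT_SIGNATURE = b"\x55\xaa"


def parse_mbr_partition_records(sector):
    """Return the four MBR partition records from the first sector."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes of LBA 0")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")
    records = []
    for i in range(4):
        start = PTE_OFFSET + i * PTE_SIZE
        # boot flag, CHS start (3 bytes), type, CHS end (3 bytes),
        # starting LBA, number of sectors; all little-endian.
        boot, _chs_start, ostype, _chs_end, start_lba, num_sectors = (
            struct.unpack("<B3sB3sII", sector[start:start + PTE_SIZE]))
        records.append({
            "index": i,
            "boot": boot,
            "ostype": ostype,
            "start_lba": start_lba,
            "num_sectors": num_sectors,
        })
    return records


def read_pmbr_from_image(path):
    """Read LBA 0 of a raw image file and parse its partition records."""
    with open(path, "rb") as f:
        return parse_mbr_partition_records(f.read(SECTOR_SIZE))
```

Calling `read_pmbr_from_image()` on the converted raw file and printing each record's `ostype` shows how many records in the PMBR are in use.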
** Also affects: oslo.utils
Importance: Undecided
Status: New
** Changed in: nova
Status: New => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2091114
Title:
Nova validation checks checks reject valid UEFI image
Status in OpenStack Compute (nova):
Invalid
Status in oslo.utils:
New
Bug description:
This relates specifically to this image:
https://storage.googleapis.com/artifacts.k8s-staging-capi-openstack.appspot.com/test/flatcar/flatcar-stable-4081.2.0-kube-v1.30.1.img
However, the problem should be easy enough to understand just from the
description here without downloading it.
When attempting to boot the image in 2024.2 devstack we see the
following failure:
Dec 04 11:01:53
capo-e2e-controller.c.k8s-infra-e2e-boskos-107.internal
nova-compute[114399]: WARNING
oslo_utils.imageutils.format_inspector [None
req-993c42cb-8da1-4cc5-83fd-1c16c08cbc13 demo
demo] Safety check mbr on gpt failed
because GPT MBR defines invalid extra partitions:
oslo_utils.imageutils.format_inspector.SafetyViolation: GPT MBR
defines invalid extra partitions
There is an associated stack trace and the server enters the ERROR
state.
This is a QCOW2 image. After downloading it I can manually convert it
to raw to inspect its partition table:
> qemu-img convert -f qcow2 flatcar-stable-4081.2.0-kube-v1.30.1.img -O raw flatcar-stable-4081.2.0-kube-v1.30.1.img.raw
> fdisk -l flatcar-stable-4081.2.0-kube-v1.30.1.img.raw
Disk flatcar-stable-4081.2.0-kube-v1.30.1.img.raw: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: D814FAF6-AD0A-4FC1-8DE9-236755D902E5
Device                                          Start      End  Sectors  Size Type
flatcar-stable-4081.2.0-kube-v1.30.1.img.raw1    4096   266239   262144  128M EFI System
flatcar-stable-4081.2.0-kube-v1.30.1.img.raw2  266240   270335     4096    2M BIOS boot
flatcar-stable-4081.2.0-kube-v1.30.1.img.raw3  270336  2367487  2097152    1G unknown
flatcar-stable-4081.2.0-kube-v1.30.1.img.raw4 2367488  4464639  2097152    1G unknown
flatcar-stable-4081.2.0-kube-v1.30.1.img.raw6 4464640  4726783   262144  128M Linux filesystem
flatcar-stable-4081.2.0-kube-v1.30.1.img.raw7 4726784  4857855   131072   64M unknown
flatcar-stable-4081.2.0-kube-v1.30.1.img.raw9 4857856 41943006 37085151 17.7G unknown
We apparently have Nova configured to convert qcow2 images to raw
before booting them, which we can also see in the logs:
Dec 04 11:01:37
capo-e2e-controller.c.k8s-infra-e2e-boskos-107.internal
nova-compute[114399]: DEBUG nova.virt.images [None
req-993c42cb-8da1-4cc5-83fd-1c16c08cbc13 demo
demo]
945136cb-6cc4-4e09-a785-50eaa79e2b10 was qcow2,
converting to raw {{(pid=114399) fetch_to_raw
/opt/stack/nova/nova/virt/images.py:254}}
Using a patch to oslo.utils from Stephen Finucane and adding some extra print statements of my own, it's clear that we're failing here:
https://github.com/openstack/oslo.utils/blob/79f5ec658e2fee8ab46201a71faaff8d3b67a690/oslo_utils/imageutils/format_inspector.py#L1273-L1274
> ./venv/bin/python ./oslo_utils/imageutils/format_inspector.py /tmp/flatcar-stable-4081.2.0-kube-v1.30.1.img.raw
inspecting file: /tmp/flatcar-stable-4081.2.0-kube-v1.30.1.raw
detected file format: gpt
running safety checks...
i: 0, ostype: 12
i: 1, ostype: 238
i: 2, ostype: 0
i: 3, ostype: 0
valid_partions: [0, 1]
Safety check mbr on gpt failed because GPT MBR defines invalid extra partitions
FAILED! Safety checks failed: mbr
1/1 failed
This code expects there to be exactly one partition with a non-zero
partition type, and that this partition must be the first one. In this
image, both of the first 2 partitions have a non-zero partition type.
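As a small illustration of the rule as described, the snippet below applies it to the four `ostype` values from the debug output. It is a sketch of the described behaviour, not a copy of the oslo.utils code; the helper names and the tiny type-name table are my own, and the table only covers the values seen here.

```python
# Partition type values as printed in the debug output above.
observed_ostypes = [12, 238, 0, 0]

# Names for the types seen here only; not a complete MBR type table.
TYPE_NAMES = {0x00: "empty", 0x0C: "FAT32 (LBA)", 0xEE: "GPT protective"}


def used_slots(ostypes):
    """Indexes of the MBR records whose partition type is non-zero."""
    return [i for i, t in enumerate(ostypes) if t != 0]


def passes_described_rule(ostypes):
    """True only if record 0 is the single record with a non-zero type."""
    return used_slots(ostypes) == [0]


for i, t in enumerate(observed_ostypes):
    print(f"slot {i}: 0x{t:02X} {TYPE_NAMES.get(t, 'other')}")
print("used slots:", used_slots(observed_ostypes))
print("passes:", passes_described_rule(observed_ostypes))
```

Here 12 is 0x0C and 238 is 0xEE, so the protective entry sits in the second record rather than the first, and the used slots come out as `[0, 1]`, matching the `valid_partions` line in the output.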
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2091114/+subscriptions