Message #96198
[Bug 2117481] [NEW] disk and interfaces not handling the pcie device limit for q35
Public bug reported:
The num_pcie_ports libvirt option defines the total number of
PCIe slots (pcie-root-ports) available for hotplug on an instance using the q35 machine type.
https://docs.openstack.org/nova/latest/configuration/config.html#:~:text=ignored%20by%20nova.-,num_pcie_ports,-%C2%B6
Since both volume attachments and virtual NICs (Neutron ports)
consume PCIe slots (more precisely, pcie-root-port devices), the "max_disk_devices_to_attach"
configuration option is suboptimal: it doesn't account for the NICs/ports already attached to the VM.
https://docs.openstack.org/nova/latest/configuration/config.html#:~:text=means%20no%20limit.-,max_disk_devices_to_attach,-%C2%B6
This can lead to a resource allocation issue, and to a configured limit that can never actually be reached.
For example, consider the following configuration:
num_pcie_ports = 19
max_disk_devices_to_attach = 15
A user could create a VM with 5 ports and then attach 14 volumes, consuming all 19 available PCIe slots.
If they then try to attach another volume, libvirt denies the request with a
"No more available PCI slots" error.
Crucially, OpenStack doesn't inform the user (e.g. via an HTTP 500 or 403) that
the volume attachment failed due to a lack of available PCIe slots,
which causes confusion. In this scenario, the
"max_disk_devices_to_attach" limit can't even be reached if the VM is
configured with more than 5 ports, as the instance runs out of PCIe
slots first.
This silent failure only applies to volume attachments. Attempting to
add another port, for example, returns a 500 "Failed to attach network
adapter device" error. However, this message also obscures the root
cause of the failure, as it doesn't expose the underlying libvirt
exception.
We created a patch that checks for available PCIe ports during both
volume and network interface attachments. This check respects the
max_disk_devices_to_attach configuration option.
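A check along these lines could work by counting pcie-root-port controllers in the libvirt domain XML against the devices plugged into them. The sketch below illustrates the idea only; it is not the actual patch, and the sample XML and function name are invented:

```python
# Sketch of a pre-attach free-slot check (illustration only, not the actual
# patch): count pcie-root-port controllers in the domain XML and compare
# against the disks and interfaces that occupy them.
import xml.etree.ElementTree as ET

SAMPLE_DOMAIN_XML = """
<domain>
  <devices>
    <controller type='pci' model='pcie-root-port'/>
    <controller type='pci' model='pcie-root-port'/>
    <controller type='pci' model='pcie-root-port'/>
    <disk type='network'><target dev='vdb' bus='virtio'/></disk>
    <interface type='bridge'/>
  </devices>
</domain>
"""

def has_free_pcie_port(domain_xml: str) -> bool:
    """Return True if at least one pcie-root-port is still unoccupied."""
    devices = ET.fromstring(domain_xml).find('devices')
    root_ports = [
        c for c in devices.findall('controller')
        if c.get('type') == 'pci' and c.get('model') == 'pcie-root-port'
    ]
    # Simplification: treat every disk and interface as occupying one
    # root port (the real address assignment is more involved).
    occupants = devices.findall('disk') + devices.findall('interface')
    return len(occupants) < len(root_ports)

print(has_free_pcie_port(SAMPLE_DOMAIN_XML))  # True: 2 occupants, 3 ports
```

Attachment paths could call such a check before handing the request to libvirt and return a clear error to the user when it fails, instead of surfacing a generic 500.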
Ideally, the num_pcie_ports configuration should define the actual limit
for attachable PCIe devices. However, in our QEMU + Libvirt environment,
this setting is unreliable. For example, when num_pcie_ports is set to
the default maximum of 28, the instance only has 25 available PCIe
ports. For some unknown reason, three ports are always missing.
This discrepancy causes the instance to run out of PCIe slots before the
attachment limit is ever reached, reintroducing the original problem.
** Affects: nova
Importance: Undecided
Status: New
** Tags: cinder config libvirt neutron volumes
https://bugs.launchpad.net/bugs/2117481