yahoo-eng-team team mailing list archive

[Bug 2117481] Re: disk and interfaces not handling the pcie device limit for q35

 

The correct error code would be a 409 Conflict, I think.

The request is syntactically valid, but not semantically valid for the
current state of the resource, so a 403 is not correct, nor would a 500
be.


There are other ways to consume PCIe ports in a VM beyond volumes and
NICs: PCI passthrough, VNC, and vGPUs, to name but a few.


Interface and volume attach/detach are also asynchronous operations.

We cannot return a synchronous error code via the API; we can only
report the error via the instance action API.
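
As a rough illustration (exact commands may vary by client version),
the failure reason should show up in the server's instance actions,
e.g. via:

    openstack server event list <server>
    openstack server event show <server> <request-id>

or the os-instance-actions REST API
(GET /servers/{server_id}/os-instance-actions).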



** Changed in: nova
       Status: New => Opinion

** Changed in: nova
   Importance: Undecided => Wishlist

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2117481

Title:
  disk and interfaces not handling the pcie device limit for q35

Status in OpenStack Compute (nova):
  Opinion

Bug description:
  The num_pcie_ports libvirt option defines the total number of PCIe
  slots available for hotplugging devices on an instance that uses the
  q35 machine type.

  https://docs.openstack.org/nova/latest/configuration/config.html#:~:text=ignored%20by%20nova.-,num_pcie_ports,-%C2%B6

  Since both volume attachments and virtual NICs (Neutron ports)
  consume PCIe slots, or more precisely pcie-root-port controllers, the
  "max_disk_devices_to_attach" configuration option is suboptimal
  because it doesn't account for the NICs/Ports attached to the VM.

  https://docs.openstack.org/nova/latest/configuration/config.html#:~:text=means%20no%20limit.-,max_disk_devices_to_attach,-%C2%B6

  This can lead to a resource allocation issue and a configuration
  setting that can never be applied correctly. For example, consider
  the following configuration:

  num_pcie_ports = 19
  max_disk_devices_to_attach = 15
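
  For reference, these options live in the [libvirt] and [compute]
  sections of nova.conf respectively, i.e. roughly:

    [libvirt]
    num_pcie_ports = 19

    [compute]
    max_disk_devices_to_attach = 15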

  A user could create a VM with 5 Ports and then attach 14 volumes,
  consuming all 19 available PCIe slots. If they then try to attach
  another volume, libvirt will deny the request and raise a "No more
  available PCI slots" exception.

  Crucially, OpenStack doesn't inform the user with an HTTP 500 or 403
  that the volume attachment is failing due to a lack of available PCIe
  slots, which causes confusion. In this scenario, the
  "max_disk_devices_to_attach" limit can't even be reached if the VM is
  configured with more than 5 Ports, as the instance runs out of PCIe
  slots first.

  This silent failure only applies to volume attachments. Attempting to
  add another Port, for example, returns a "500 Failed to attach
  network adapter device" error. However, this message also obscures
  the root cause of the failure, as it doesn't expose the underlying
  libvirt exception.

  We created a patch that checks for available PCIe ports during both
  volume and network interface attachments. This check respects the
  max_disk_devices_to_attach configuration option.

  https://review.opendev.org/c/openstack/nova/+/955584

  Ideally, the num_pcie_ports configuration should define the actual
  limit for attachable PCIe devices. However, in our QEMU + libvirt
  environment, this setting is unreliable. For example, when
  num_pcie_ports is set to the maximum of 28, the instance only has 25
  available PCIe ports. For some unknown reason, three ports are always
  missing.

  This discrepancy causes the instance to run out of PCIe slots before
  the attachment limit is ever reached, reintroducing the original
  problem.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2117481/+subscriptions


