
yahoo-eng-team team mailing list archive

[Bug 1918419] Re: vCPU resource max_unit is hardcoded

 

In general I don't feel like this is a valid bug.
It is perhaps a feature request, which could be accomplished by an extension to provider.yaml
to allow standard resource class inventories to be updated by the operator.

In general, what you are asking for is intentionally not allowed.

max_unit must be less than or equal to total to prevent oversubscription of a single
allocation against itself.

E.g. if total was 4 and max_unit was 8, then we could not actually allocate
8 to a VM without the VM oversubscribing against itself.

This would be invalid; therefore changing max_unit in this way would be
incorrect.
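
To make the argument concrete, here is a minimal sketch of the per-allocation check, using the total=4, max_unit=8 example from above. The Inventory class and both function names are hypothetical stand-ins for illustration, not placement's actual code; the field names follow the placement inventory fields.

```python
from dataclasses import dataclass


@dataclass
class Inventory:
    """Simplified, hypothetical stand-in for a placement inventory record."""
    total: int
    max_unit: int
    min_unit: int = 1
    step_size: int = 1


def single_allocation_ok(inv: Inventory, amount: int) -> bool:
    """True if one consumer may request `amount` units of this inventory."""
    return (
        inv.min_unit <= amount <= inv.max_unit
        and amount % inv.step_size == 0
    )


def self_oversubscribed(inv: Inventory, amount: int) -> bool:
    """True if a single allocation asks for more units than physically exist."""
    return amount > inv.total


# The example from the text: total=4 but max_unit raised to 8.
raised = Inventory(total=4, max_unit=8)
print(single_allocation_ok(raised, 8))   # the max_unit check lets it through
print(self_oversubscribed(raised, 8))    # yet 8 vCPUs would share 4 host CPUs

# With max_unit capped at total, the same request is rejected up front.
capped = Inventory(total=4, max_unit=4)
print(single_allocation_ok(capped, 8))
```

Keeping max_unit at or below total is what makes the first check sufficient to rule out the second condition.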

The supported way to address your current problem would be to resize your impacted VMs before moving them, perhaps to flavors with 2 NUMA nodes,
e.g. hw:numa_nodes=2 hw:mem_page_size=small.
Note: hw:mem_page_size should always be set if you use hw:numa_nodes.

I'm going to mark this as invalid for now, but we could discuss this at the PTG.
Realistically, though, I don't see a clean way to resolve this while also keeping the VMs alive.
Resize would work, but the live requirement is what makes that unpalatable.



** Changed in: nova
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1918419

Title:
  vCPU resource max_unit is hardcoded

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Because of the Spectre/Meltdown vulnerabilities (2018) we needed to
  disable SMT in all public facing compute nodes. As a result the number
  of available cores was reduced by half.

  We had flavors available with 32vCPUs that couldn't be used anymore
  because placement max_unit for vCPUs is hardcoded to be the total
  number of CPUs regardless of the allocation_ratio.
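
  The arithmetic behind this can be sketched as follows. The host size (32 threads with SMT) and the allocation ratio (4.0) are made-up numbers for illustration only; the 32-vCPU flavor is the one mentioned above. Capacity is computed as (total - reserved) * allocation_ratio, while max_unit is taken to equal total, as described in this report.

```python
def vcpu_capacity(total: int, allocation_ratio: float, reserved: int = 0) -> int:
    """Overall VCPU capacity of a host: (total - reserved) * allocation_ratio."""
    return int((total - reserved) * allocation_ratio)


threads_with_smt = 32                        # hypothetical host size
cores_without_smt = threads_with_smt // 2    # SMT disabled: half the CPUs
allocation_ratio = 4.0                       # hypothetical operator setting
flavor_vcpus = 32                            # flavor size from the report

capacity = vcpu_capacity(cores_without_smt, allocation_ratio)
max_unit = cores_without_smt                 # max_unit follows total, not capacity

fits_capacity = flavor_vcpus <= capacity     # overall capacity is sufficient
fits_max_unit = flavor_vcpus <= max_unit     # but the per-allocation cap is not

print(f"capacity={capacity} max_unit={max_unit}")
print(f"fits capacity: {fits_capacity}, fits max_unit: {fits_max_unit}")
```

  Under these assumed numbers the host has room for the flavor in aggregate, yet the request is still refused because a single allocation may not exceed max_unit.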

  To me it's a sensible default but doesn't offer any flexibility for
  operators.

  See the IRC discussion at that time:
  http://eavesdrop.openstack.org/irclogs/%23openstack-placement/%23openstack-placement.2018-09-20.log.html

  
  In the end, we informed the users that we couldn't offer those flavors anymore. The old VMs (that were created before disabling SMT) continued to run without any issue.

  So... after ~2 years I'm hitting this problem again :)

  These compute nodes now need to be retired and we are live migrating
  all the instances to the replacement hardware.

  When trying to live migrate these instances (vCPUs > max_unit) it
  fails, because the migration allocation can't be created against the
  source compute node. For the new hardware (dest_compute) the vCPUs <
  max_unit, so no issue for the new allocation.

  I'm working around this problem (to live migrate the instances) by
  patching the code to have a higher max_unit for vCPUs in the compute
  nodes hosting these instances.

  I feel that this issue should be discussed again, considering the
  possibility of making the max_unit value configurable.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1918419/+subscriptions

