
yahoo-eng-team team mailing list archive

[Bug 1861067] Re: [Ocata]resource tracker does not validate placement allocation


I checked, and on stable/ocata nova ignores the error from placement in
the reported case, so I marked this Confirmed for Ocata. The same issue
does not affect newer branches. Ocata is in extended maintenance, so the
upstream project does not focus on fixing issues there, but you can
still ask your OpenStack vendor to fix the problem upstream.

** Changed in: nova
       Status: New => Confirmed

** Also affects: nova/ocata
   Importance: Undecided
       Status: New

** Changed in: nova/ocata
       Status: New => Confirmed

** Changed in: nova
       Status: Confirmed => Invalid

** Changed in: nova/ocata
   Importance: Undecided => Low

** Tags added: placement scheduler

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1861067

Title:
  [Ocata]resource tracker does not validate placement allocation

Status in OpenStack Compute (nova):
  Invalid
Status in OpenStack Compute (nova) ocata series:
  Confirmed

Bug description:
  On stable/ocata we hit a serious scheduler problem that forced us to
  upgrade to a later release. I could not find any existing report of
  it, so I am filing this for anyone who meets the same issue later.

  The problem we encountered is as follows:
  - The conductor tries to schedule 2 instances onto one compute node.
  - At that time nova-compute has enough resources in compute_nodes, so the scheduler chooses that node.
  - The resource tracker in nova-compute claims the resources from placement.
  - Placement answers one of the requests with 409, since there were several concurrent requests.
  - [BUG here] The resource tracker in nova-compute does not check the return code from placement, so the allocation is only increased by the share of one of the instances.
  - After that, compute_nodes on the scheduler side is full, but the allocation in placement still has a free slot.
  - [User sees the weirdness here] Since the scheduling side still sees a free slot, an instance can be placed on a compute node that is actually full. The result is that the compute node is over-provisioned.
  - OOM occurs. (We were tight on memory; with a different resource policy an admin would see a different side effect.)

  I found this is already fixed from Pike onward, where the scheduler
  makes the allocation first and nova-compute just checks compute_nodes.
  But it was very hard for me to find the root cause, and it required a
  lot of digging into the scheduler's history, so I hope this report is
  helpful to anyone who meets this problem.

  I am not sure this should be fixed, since Ocata is quite old, but it
  could be fixed by changing the function (_allocate_for_instance() in
  nova/scheduler/client/report.py) to catch the 409 conflict, similar to
  the function added later (put_allocations()).
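  To illustrate the shape of the suggested fix, here is a minimal
  sketch. This is not nova's actual code: the client object, response
  handling, and all names here are assumptions. The point is only that
  the claim must inspect placement's status code (e.g. 409 Conflict on a
  concurrent update) instead of treating every response as success.

  ```python
  # Hypothetical sketch: report a failed claim instead of ignoring
  # placement's return code. Names are illustrative, not nova's API.

  def allocate_for_instance(placement_client, instance_uuid, allocations):
      """Claim resources in placement; return whether the claim stuck."""
      resp = placement_client.put_allocations(instance_uuid, allocations)
      if resp.status_code == 409:
          # Concurrent update: another request changed the allocations.
          # Returning False lets the caller retry or fail the claim
          # rather than silently over-provisioning the compute node.
          return False
      return resp.status_code == 204

  class FakeResponse:
      def __init__(self, status_code):
          self.status_code = status_code

  class FakePlacement:
      """Stand-in for a placement client, returning a fixed status."""
      def __init__(self, status_code):
          self._status = status_code
      def put_allocations(self, instance_uuid, allocations):
          return FakeResponse(self._status)

  # A 409 (concurrent update) must not be treated as success:
  assert allocate_for_instance(FakePlacement(409), "uuid-1", {}) is False
  assert allocate_for_instance(FakePlacement(204), "uuid-1", {}) is True
  ```

  With a check like this in place, the caller can retry the claim or
  abort the boot, so the placement allocation and compute_nodes no
  longer drift apart.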

  Thanks.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1861067/+subscriptions
