
yahoo-eng-team team mailing list archive

[Bug 1738083] Re: DBDeadlock when syncing traits in Placement during list_allocation_candidates

 

Reviewed:  https://review.openstack.org/527836
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c66ae65775bb9d885fac059847063fee70617bc5
Submitter: Zuul
Branch:    master

commit c66ae65775bb9d885fac059847063fee70617bc5
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date:   Wed Dec 13 21:22:32 2017 -0500

    Retry _trait_sync on deadlock
    
    We're seeing DBDeadlock failures during scheduling in CI jobs
    when syncing traits when getting allocation candidates.
    
    We have a lock around this code but that's not going to carry across
    multiple processes, so we need to be able to retry on deadlock if
    one occurs.
    
    Change-Id: I6cf1793c1cbed18d850ec7e32b5b195e78cb4e68
    Closes-Bug: #1738083
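
The retry approach described in the commit message can be sketched roughly as
follows. This is a standard-library-only illustration, not the code from the
commit: the DBDeadlock class, the retry_on_deadlock decorator, and the
trait_sync function below are hypothetical stand-ins. oslo.db ships a
comparable decorator, oslo_db.api.wrap_db_retry, which accepts
retry_on_deadlock=True, but this notification does not include the diff, so
whether the fix uses it is not shown here.

```python
import functools
import time


class DBDeadlock(Exception):
    """Hypothetical stand-in for oslo_db.exception.DBDeadlock."""


def retry_on_deadlock(max_retries=5, delay=0.01):
    """Re-run the wrapped function when it raises DBDeadlock.

    Hypothetical helper for illustration only.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_retries:
                        raise  # out of retries; surface the deadlock
                    # Back off a little longer before each new attempt.
                    time.sleep(delay * (2 ** attempt))
        return wrapper
    return decorator


calls = {"count": 0}


@retry_on_deadlock(max_retries=5)
def trait_sync():
    """Simulated sync that deadlocks twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise DBDeadlock("Deadlock found when trying to get lock")
    return "synced"


result = trait_sync()
```

An in-process lock cannot prevent two separate placement-api processes from
running the same INSERT at once, which is why the retry sits around the
database call itself rather than around the lock.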


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1738083

Title:
  DBDeadlock when syncing traits in Placement during
  list_allocation_candidates

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  In Progress

Bug description:
  This killed a scheduling request, so we ended up with a NoValidHost:

  http://logs.openstack.org/64/527564/1/gate/legacy-tempest-dsvm-py35/7db2d64/logs/screen-placement-api.txt.gz#_Dec_13_17_07_40_968321

  It looks like it blows up here:

  Dec 13 17:07:40.973678 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler   File "/opt/stack/new/nova/nova/api/openstack/placement/handlers/allocation_candidate.py", line 217, in list_allocation_candidates
  Dec 13 17:07:40.973796 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler     cands = rp_obj.AllocationCandidates.get_by_requests(context, requests)
  Dec 13 17:07:40.973893 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler   File "/opt/stack/new/nova/nova/objects/resource_provider.py", line 3182, in get_by_requests
  Dec 13 17:07:40.973969 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler     _ensure_trait_sync(context)
  Dec 13 17:07:40.974045 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler   File "/opt/stack/new/nova/nova/objects/resource_provider.py", line 135, in _ensure_trait_sync
  Dec 13 17:07:40.974140 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler     _trait_sync(ctx)
  Dec 13 17:07:40.974218 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler   File "/usr/local/lib/python3.5/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 984, in wrapper
  Dec 13 17:07:40.974294 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler     return fn(*args, **kwargs)
  Dec 13 17:07:40.974366 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler   File "/opt/stack/new/nova/nova/objects/resource_provider.py", line 108, in _trait_sync

  Due to this deadlock:

  oslo_db.exception.DBDeadlock: (pymysql.err.InternalError) (1213,
  'Deadlock found when trying to get lock; try restarting transaction')
  [SQL: 'INSERT INTO traits (created_at, name) VALUES (%(created_at)s,
  %(name)s)'] [parameters: ({'created_at': datetime.datetime(2017, 12,
  13, 17, 7, 40, 954357), 'name': 'HW_GPU_API_CUDA_V2_1'},
  {'created_at': datetime.datetime(2017, 12, 13, 17, 7, 40, 954363),
  'name': 'HW_CPU_X86_TSX'}, {'created_at': datetime.datetime(2017, 12,
  13, 17, 7, 40, 954365), 'name': 'HW_CPU_X86_AVX512ER'}, {'created_at':
  datetime.datetime(2017, 12, 13, 17, 7, 40, 954367), 'name':
  'HW_NIC_OFFLOAD_GRO'}, {'created_at': datetime.datetime(2017, 12, 13,
  17, 7, 40, 954369), 'name': 'HW_GPU_API_DIRECT3D_V11_2'},
  {'created_at': datetime.datetime(2017, 12, 13, 17, 7, 40, 954371),
  'name': 'HW_GPU_API_OPENGL_V4_4'}, {'created_at':
  datetime.datetime(2017, 12, 13, 17, 7, 40, 954373), 'name':
  'HW_GPU_API_CUDA_V1_2'}, {'created_at': datetime.datetime(2017, 12,
  13, 17, 7, 40, 954375), 'name': 'HW_CPU_X86_AVX512VL'}  ... displaying
  10 of 163 total bound parameter sets ...  {'created_at':
  datetime.datetime(2017, 12, 13, 17, 7, 40, 954663), 'name':
  'HW_NIC_OFFLOAD_FDF'}, {'created_at': datetime.datetime(2017, 12, 13,
  17, 7, 40, 954665), 'name': 'HW_GPU_API_OPENGL_V4_0'})]

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1738083/+subscriptions

