yahoo-eng-team team mailing list archive
Message #69922
[Bug 1738083] Re: DBDeadlock when syncing traits in Placement during list_allocation_candidates
Reviewed: https://review.openstack.org/527836
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c66ae65775bb9d885fac059847063fee70617bc5
Submitter: Zuul
Branch: master
commit c66ae65775bb9d885fac059847063fee70617bc5
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Wed Dec 13 21:22:32 2017 -0500
Retry _trait_sync on deadlock
We're seeing DBDeadlock failures during scheduling in CI jobs
while syncing traits when getting allocation candidates.
We have a lock around this code but that's not going to carry across
multiple processes, so we need to be able to retry on deadlock if
one occurs.
Change-Id: I6cf1793c1cbed18d850ec7e32b5b195e78cb4e68
Closes-Bug: #1738083
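
The retry-on-deadlock idea described in the commit message can be sketched roughly as follows. This is a minimal stdlib-only illustration, not the code from the commit: DeadlockError is a hypothetical stand-in for oslo_db.exception.DBDeadlock, and retry_on_deadlock, sync_traits and the calls counter are names invented for the example. oslo.db ships its own wrap_db_retry decorator with a retry_on_deadlock option, which is likely the better choice in real Nova code.

```python
import functools
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for oslo_db.exception.DBDeadlock."""


def retry_on_deadlock(max_retries=5, delay=0.0):
    """Re-run the wrapped function when it raises DeadlockError.

    max_retries is the number of retries allowed after the first attempt.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except DeadlockError:
                    # Out of retries: let the caller see the deadlock.
                    if attempt == max_retries:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator


# Counts how many times the hypothetical sync function was invoked.
calls = {"count": 0}


@retry_on_deadlock(max_retries=5)
def sync_traits():
    # Hypothetical body: simulate two deadlocks, then succeed.
    calls["count"] += 1
    if calls["count"] < 3:
        raise DeadlockError("Deadlock found when trying to get lock")
    return "synced"


result = sync_traits()
```

An in-process lock only serializes threads inside one placement API worker, so a retry like this is what lets a second worker process recover when the database picks its transaction as the deadlock victim.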
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1738083
Title:
DBDeadlock when syncing traits in Placement during
list_allocation_candidates
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) pike series:
In Progress
Bug description:
This killed a scheduling request, so we ended up with a NoValidHost:
http://logs.openstack.org/64/527564/1/gate/legacy-tempest-dsvm-py35/7db2d64/logs/screen-placement-api.txt.gz#_Dec_13_17_07_40_968321
It looks like it blows up here:
Dec 13 17:07:40.973678 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler File "/opt/stack/new/nova/nova/api/openstack/placement/handlers/allocation_candidate.py", line 217, in list_allocation_candidates
Dec 13 17:07:40.973796 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler cands = rp_obj.AllocationCandidates.get_by_requests(context, requests)
Dec 13 17:07:40.973893 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler File "/opt/stack/new/nova/nova/objects/resource_provider.py", line 3182, in get_by_requests
Dec 13 17:07:40.973969 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler _ensure_trait_sync(context)
Dec 13 17:07:40.974045 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler File "/opt/stack/new/nova/nova/objects/resource_provider.py", line 135, in _ensure_trait_sync
Dec 13 17:07:40.974140 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler _trait_sync(ctx)
Dec 13 17:07:40.974218 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler File "/usr/local/lib/python3.5/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 984, in wrapper
Dec 13 17:07:40.974294 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler return fn(*args, **kwargs)
Dec 13 17:07:40.974366 ubuntu-xenial-citycloud-sto2-0001423712 devstack@placement-api.service[14690]: ERROR nova.api.openstack.placement.handler File "/opt/stack/new/nova/nova/objects/resource_provider.py", line 108, in _trait_sync
Due to this deadlock:
oslo_db.exception.DBDeadlock: (pymysql.err.InternalError) (1213,
'Deadlock found when trying to get lock; try restarting transaction')
[SQL: 'INSERT INTO traits (created_at, name) VALUES (%(created_at)s,
%(name)s)'] [parameters: ({'created_at': datetime.datetime(2017, 12,
13, 17, 7, 40, 954357), 'name': 'HW_GPU_API_CUDA_V2_1'},
{'created_at': datetime.datetime(2017, 12, 13, 17, 7, 40, 954363),
'name': 'HW_CPU_X86_TSX'}, {'created_at': datetime.datetime(2017, 12,
13, 17, 7, 40, 954365), 'name': 'HW_CPU_X86_AVX512ER'}, {'created_at':
datetime.datetime(2017, 12, 13, 17, 7, 40, 954367), 'name':
'HW_NIC_OFFLOAD_GRO'}, {'created_at': datetime.datetime(2017, 12, 13,
17, 7, 40, 954369), 'name': 'HW_GPU_API_DIRECT3D_V11_2'},
{'created_at': datetime.datetime(2017, 12, 13, 17, 7, 40, 954371),
'name': 'HW_GPU_API_OPENGL_V4_4'}, {'created_at':
datetime.datetime(2017, 12, 13, 17, 7, 40, 954373), 'name':
'HW_GPU_API_CUDA_V1_2'}, {'created_at': datetime.datetime(2017, 12,
13, 17, 7, 40, 954375), 'name': 'HW_CPU_X86_AVX512VL'} ... displaying
10 of 163 total bound parameter sets ... {'created_at':
datetime.datetime(2017, 12, 13, 17, 7, 40, 954663), 'name':
'HW_NIC_OFFLOAD_FDF'}, {'created_at': datetime.datetime(2017, 12, 13,
17, 7, 40, 954665), 'name': 'HW_GPU_API_OPENGL_V4_0'})]
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1738083/+subscriptions