yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #65609
[Bug 1701288] Re: In scale testing RPC timeout error seen in the ovs_neutron_agent when update_device_list is called with DVR routers
Reviewed: https://review.openstack.org/479091
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=5be1ace5462994b492b811e20d8a90caac0fcffd
Submitter: Jenkins
Branch: master
commit 5be1ace5462994b492b811e20d8a90caac0fcffd
Author: Kevin Benton <kevin@xxxxxxxxxx>
Date: Thu Jun 29 15:46:21 2017 -0700
Use subqueryload in l2pop DB for binding ports
This adjusts the l2pop DB queries that leverage the port relationship
on the PortBinding and DistributedPortBinding models to ensure that the
port gets loaded from the DB as part of the initial query.
Without this the number of DB queries will increase with the number
of ports since the port wasn't being loaded until the 'port' attribute
was referenced on its corresponding binding object.
Closes-Bug: #1701288
Change-Id: I7c3b08d525b2c90100c9fe4efaee973cf3a076f3
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1701288
Title:
In scale testing RPC timeout error seen in the ovs_neutron_agent when
update_device_list is called with DVR routers
Status in neutron:
Fix Released
Bug description:
At large scale testing when trying to deploy around 8000 VMs with DVR
routers, we are seeing an RPC Timeout error in ovs_neutron_agent.
This RPC Timeout error occurs when the ovs_neutron_agent tries to bind the vif port.
On further analysis it seems that the update_port_status is taking a lot more time at large scale to return and so the ovs_neutron_agent timesout waiting for the message.
Looking into the update_port_status code path, after the port status
update occurs it calls the update_port_postcommit call. Since L2pop is
enabled by default with DVR, the update_port_postcommit calls
_create_agent_fdb entries for the agent, if this is the first port
associated with the agent.
In _create_agent_fdb it tries to retrieve all the PortInfo associated
with network and this DB call is very expensive and sometimes we have
seen it take upto to 3900s at some instances.
2017-06-15 17:48:30.651 9320 DEBUG neutron.agent.linux.utils [req-51df1df5-8a51-4679-938b-895545a225c2 - - - - -] Exit code: 0 execute /opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/neutron/agent/linux/utils.py:146
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-ece15133-1294-46c0-b0b5-cab785d4314b - - - - -] Error while processing VIF ports
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2044, in rpc_loop
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent port_info, ovs_restarted)
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/osprofiler/profiler.py", line 154, in wrapper
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent return f(*args, **kwargs)
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1648, in process_network_ports
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent failed_devices['added'] |= self._bind_devices(need_binding_devices)
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 888, in _bind_devices
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent self.conf.host)
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/neutron/agent/rpc.py", line 181, in update_device_list
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent agent_id=agent_id, host=host)
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/neutron/common/rpc.py", line 185, in call
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent time.sleep(wait)
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent self.force_reraise()
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent six.reraise(self.type_, self.value, self.tb)
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/neutron/common/rpc.py", line 162, in call
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent return self._original_context.call(ctxt, method, **kwargs)
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 169, in call
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent retry=self.retry)
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/oslo_messaging/transport.py", line 97, in _send
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent timeout=timeout, retry=retry)
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 464, in send
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent retry=retry)
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 453, in _send
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent result = self._waiter.wait(msg_id, timeout)
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 336, in wait
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent message = self.waiters.get(msg_id, timeout=timeout)
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 238, in get
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 'to message ID %s' % msg_id)
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent MessagingTimeout: Timed out waiting for a reply to message ID 8369f051e1984c719f092939c328276c
2017-06-15 17:48:38.420 9320 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
2017-06-15 17:48:38.422 9320 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-ece15133-1294-46c0-b0b5-cab785d4314b - - - - -] Agent rpc_loop - iteration:206137 completed. Processed ports statistics: {'regular': {'updated': 20, 'added': 28, 'removed': 0}}. Elapsed:1299.347 loop_count_and_wait /opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:1776
2017-06-15 17:48:38.423 9320 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-ece15133-1294-46c0-b0b5-cab785d4314b - - - - -] Loop iteration exceeded interval (2 vs. 1299.34698987)! loop_count_and_wait /opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:1783
2017-06-22 22:36:34.697 52398 DEBUG neutron.plugins.ml2.rpc [req-119aa871-80bc-4884-a753-37ba1bfd5b6d - - - - -] #####DVR port status update - start : <neutron.db.models_v2.Port[object at 7f10138e6990] {project_id=u'4f763f5b5eaa4568ad05db73230d1134', id=u'fb99bf73-261c-41a2-8660-4fb4e8947c19', name=u'', network_id=u'4c06681d-307f-45a9-9ad7-ac53f1a93051', mac_address=u'fa:16:3e:b0:df:9d', admin_state_up=True, status=u'ACTIVE', device_id=u'350135dd-811e-4299-ba75-12558137443e', device_owner=u'network:router_interface_distributed', ip_allocation=u'immediate', standard_attr_id=18987L}> update_port_status_to_active /opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/neutron/plugins/ml2/rpc.py:247
2017-06-22 22:40:50.960 52398 DEBUG neutron.plugins.ml2.rpc [req-119aa871-80bc-4884-a753-37ba1bfd5b6d - - - - -] #####DVR port status update - end : <neutron.db.models_v2.Port[object at 7f10138e6990] {project_id=u'4f763f5b5eaa4568ad05db73230d1134', id=u'fb99bf73-261c-41a2-8660-4fb4e8947c19', name=u'', network_id=u'4c06681d-307f-45a9-9ad7-ac53f1a93051', mac_address=u'fa:16:3e:b0:df:9d', admin_state_up=True, status=u'ACTIVE', device_id=u'350135dd-811e-4299-ba75-12558137443e', device_owner=u'network:router_interface_distributed', ip_allocation=u'immediate', standard_attr_id=18987L}> update_port_status_to_active /opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/neutron/plugins/ml2/rpc.py:250
2017-06-21 04:07:52.276 11142 DEBUG neutron.plugins.ml2.rpc [req-9abbbc61-53c8-47a4-acf9-8c9e4fd9ee52 - - - - -] #####Returning Update device list: {'devices_up': [u'bde81150-3eee-442d-aa62-e77790c4ecbf', u'403318ce-de3b-4197-95b1-ce740a9dfaee', u'685cf936-9e3b-4d35-b69d-3f1ad14593b7', u'4d7da5b7-a8d0-4603-ac88-fe88c610536f', u'c5b594b0-351b-45d0-9618-a7cb2ae916e7', u'cfb1caa9-2686-4b99-8e24-44a3b3803656', u'b1b645c9-69f5-4caf-a2dd-af967380f06e', u'2ac323c4-5ff1-493b-9e79-65afe3268026', u'f232c17c-67a3-4864-bfc0-b5ed34a1ed16', u'99e1c12b-5233-4fb1-9d35-2f6aa6cb5b8a', u'6cb3816a-00e1-48c7-b61f-48e42a8f0ee3', u'e0186cc1-d06a-4c51-ac73-e491678752e9', u'2ace823c-a1cc-4d5e-affb-b51bed76dfd2', u'f537bb9e-f598-4b6e-bc4e-ceb24c59d890', u'5800d3d4-0668-4d97-8dac-a65bcb3bc076', u'df23fe8d-9fa4-4a67-9bdc-ed1d7321ebc1', u'fb492060-776d-4221-be95-89fbe91f5451', u'bf0f5100-5f01-4fa9-b9f9-4d0fe696f5cf', u'97ff79a5-c5ea-47cd-886b-278a0b0a7ca7', u'219cbd72-a267-48d4-9e5b-592acfa8f97a', u'55eee229-352f-4341-b429-a0eef8d89117', u'6e998280-b254-421b-bb9e-3d6431904839', u'eb5957f0-48fd-4ff4-b696-70ba4bf7f5f0', u'294295cd-b329-41a9-9820-64cb3e9a6aec', u'7fa9e9c6-bb36-4e50-81d2-551a29135c51', u'5268db05-7b12-4ac9-a9bd-ec689febbb9b', u'7a86ec2a-3697-4df5-85a1-201d417a929b', u'e7a450c9-095f-4b50-8ef4-f34d08448dc9', u'22703bfb-720b-4d93-b121-91cfa1261958', u'551020a7-3940-4a85-8102-f6313dca4ca3', u'b880444d-b2dc-4e94-a444-1bfc560d771f', u'8fcbb263-7ef4-4334-8a04-6eba2d479139', u'd775ff6d-4bf7-4358-a79c-4d468989b770', u'aeb92cd1-a34a-4e02-b732-7335e1b13c06', u'00350caf-c548-4688-bf21-cdfc4cd6297a', u'88dbf75e-0d56-4133-8767-eb7560310958', u'009a64f1-93fe-4b21-a370-74c1b0b91216', u'ecb74975-5eb8-4f19-9766-5f06fe1ac8f4', u'44051055-967f-4557-93dd-7b8056b49f9c', u'3484a698-5c38-4410-8e30-96c86fa07a00', u'82f39687-01a0-4952-99db-0c738998f943', u'26039aec-6161-44d0-b621-1ef222be3b84', u'0b7dd388-82a9-42f6-a067-ad476f06d908', u'700ca67e-84c1-4b05-9dcf-443fd56568bc', u'c26df1c2-5fb2-446b-aba9-368f49bd05d3', u'e5669474-bf8a-48b9-9fdf-f2b3ef0d6c34', u'16eb1389-6193-45da-b494-b70a87045603', u'68a93692-fe8a-48a4-a495-62f537ca4948', u'baf6a7ff-5545-48cc-89f2-974a89c33f45', u'fb99bf73-261c-41a2-8660-4fb4e8947c19', u'22d872db-3555-46ab-9296-5fc230fe498b', u'f6f69fce-586e-4193-b65c-be2b5e675ae6', u'be27c341-9ece-4180-b42c-322e55b3f945', u'68d0ce84-0d28-4105-b979-8492742252a8', u'6ea10c6e-0f53-4473-9615-d32d032ce11c', u'1d96e2c6-290f-4733-a66d-0fd5583a9aa8', u'1374ca1f-15ec-4326-8177-79e960e18083', u'ead2c860-cf40-4b45-b8ae-0d35be23b8fe', u'47d8b420-8594-4be0-b170-7fd5c411bda5', u'c158efaf-186c-496d-901d-0736ebf205f1', u'42800fd7-5d4e-47f3-82a6-a4f3ad4e205a', u'4ce41e1c-a7d3-4bba-add4-b6e4d1cccf7f', u'3b9c1701-3bec-46ad-a4b8-53c7bb0cc881'], 'failed_devices_up': [], 'failed_devices_down': [], 'devices_down': []} update_device_list /opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/neutron/plugins/ml2/rpc.py:319
2017-06-21 04:07:52.278 11142 DEBUG oslo_messaging._drivers.amqpdriver [req-9abbbc61-53c8-47a4-acf9-8c9e4fd9ee52 - - - - -] sending reply msg_id: c8a67ae66d084aeb8522d2d3a8f37cd5 reply queue: reply_91649f5a873b420d91d1a5fb4d0bd29e time elapsed: 3911.77179416s _send_reply /opt/stack/venv/neutron-20170421T033908Z/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:73
2017-06-21 02:14:09.991 11151 DEBUG neutron.plugins.ml2.rpc [req-
9abbbc61-53c8-47a4-acf9-8c9e4fd9ee52 - - - - -] #####Returning Update
device list: {'devices_up': [u'e0186cc1-d06a-4c51-ac73-e491678752e9',
u'4d7da5b7-a8d0-4603-ac88-fe88c610536f', u'5268db05-7b12-4ac9-a9bd-
ec689febbb9b', u'ecb74975-5eb8-4f19-9766-5f06fe1ac8f4',
u'cfb1caa9-2686-4b99-8e24-44a3b3803656', u'82f39687-01a0-4952-99db-
0c738998f943', u'2ac323c4-5ff1-493b-9e79-65afe3268026',
u'26039aec-6161-44d0-b621-1ef222be3b84',
u'0b7dd388-82a9-42f6-a067-ad476f06d908', u'6cb3816a-00e1-48c7-b61f-
48e42a8f0ee3', u'c26df1c2-5fb2-446b-aba9-368f49bd05d3', u'e5669474
-bf8a-48b9-9fdf-f2b3ef0d6c34', u'403318ce-
de3b-4197-95b1-ce740a9dfaee', u'68a93692-fe8a-48a4-a495-62f537ca4948',
u'baf6a7ff-5545-48cc-89f2-974a89c33f45', u'fb99bf73-261c-
41a2-8660-4fb4e8947c19', u'22d872db-3555-46ab-9296-5fc230fe498b',
u'f537bb9e-f598-4b6e-bc4e-ceb24c59d890', u'f6f69fce-586e-4193-b65c-
be2b5e675ae6', u'fb492060-776d-4221-be95-89fbe91f5451',
u'68d0ce84-0d28-4105-b979-8492742252a8', u'ead2c860-cf40-4b45-b8ae-
0d35be23b8fe', u'eb5957f0-48fd-4ff4-b696-70ba4bf7f5f0',
u'294295cd-b329-41a9-9820-64cb3e9a6aec',
u'7fa9e9c6-bb36-4e50-81d2-551a29135c51', u'009a64f1-93fe-
4b21-a370-74c1b0b91216', u'7a86ec2a-3697-4df5-85a1-201d417a929b',
u'e7a450c9-095f-4b50-8ef4-f34d08448dc9', u'42800fd7-5d4e-
47f3-82a6-a4f3ad4e205a', u'22703bfb-720b-4d93-b121-91cfa1261958',
u'551020a7-3940-4a85-8102-f6313dca4ca3', u'be27c341-9ece-4180-b42c-
322e55b3f945', u'd775ff6d-4bf7-4358-a79c-4d468989b770'],
'failed_devices_up': [], 'failed_devices_down': [], 'devices_down':
[]} update_device_list /opt/stack/venv/neutron-
20170421T033908Z/lib/python2.7/site-
packages/neutron/plugins/ml2/rpc.py:319
2017-06-21 02:14:09.992 11151 DEBUG oslo_messaging._drivers.amqpdriver
[req-9abbbc61-53c8-47a4-acf9-8c9e4fd9ee52 - - - - -] sending reply
msg_id: 3a5f717295264205aa1fc468eb36847c reply queue:
reply_91649f5a873b420d91d1a5fb4d0bd29e time elapsed: 2289.6716877s
_send_reply /opt/stack/venv/neutron-20170421T033908Z/lib/python2.7
/site-packages/oslo_messaging/_drivers/amqpdriver.py:73
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1701288/+subscriptions
References