← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1824911] Re: [scale issue] the bottleneck lock will multiply increase processing time of agent resources

 

Reviewed:  https://review.opendev.org/656164
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0f471a47c073eb0cf2ed68c30482e1ae71ff6927
Submitter: Zuul
Branch:    master

commit 0f471a47c073eb0cf2ed68c30482e1ae71ff6927
Author: LIU Yulong <i@xxxxxxxxxxxx>
Date:   Fri Apr 26 15:42:32 2019 +0800

    Async notify neutron-server for HA states
    
    RPC notifier method can sometimes be time-consuming,
    this will cause other parallel processing resources
    fail to send notifications in time. This patch changes
    the notify to asynchronous.
    
    Closes-Bug: #1824911
    Change-Id: I3f555a0c78fbc02d8214f12b62c37d140bc71da1


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1824911

Title:
  [scale issue] the bottleneck lock will multiply increase processing
  time of agent resources

Status in neutron:
  Fix Released

Bug description:
  Env: stable/queens
  CentOS 7.5
  kernel-3.10.0-862.11.6.el7.x86_64

  There are many bottleneck locks in the agent extensions. For instance, l3 agent extensions now have lock 'qos-fip' [1], 'qos-gateway-ip' [2], 'port-forwarding' [3], 'log-port' [4]. For L2 agent, it is 'qos-port' lock [5].
  For these agent extensions when a large number of resources need to be processed in parallel, the processing time may get longer and longer. Let's take the l3 agent extension as the example. Because, firstly, the more time for one router processing time, the more lock waiting time for others. Then, if every router have a large set of resource need to be done in the extension (for example, floating IP QoS tc rules), every router processing will hold the lock for a bit long time, and more waiting time for others then. It shows a trend of multiple growth.

  [1] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/extensions/qos/fip.py#L283
  [2] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/extensions/qos/gateway_ip.py#L84
  [3] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/extensions/port_forwarding.py#L426
  [4] https://github.com/openstack/neutron/blob/master/neutron/services/logapi/agent/l3/base.py#L96
  [5] https://github.com/openstack/neutron/blob/master/neutron/agent/l2/extensions/qos.py#L241

  Here are some Logs from L3 agent, you may see there are some
  'add_router' which hold the lock for 16.271s and another wait the lock
  32.547s.

  2019-04-12 09:38:31.526 40627 DEBUG oslo_concurrency.lockutils [-] Lock "notifier-58a2b315-d411-4aa0-bb23-7c0da0b57a70" released by "neutron.notifiers.batch_notifier.synced_send" :: held 9.697s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:38:31.526 40627 DEBUG oslo_concurrency.lockutils [-] Lock "notifier-58a2b315-d411-4aa0-bb23-7c0da0b57a70" acquired by "neutron.notifiers.batch_notifier.synced_send" :: waited 12.753s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:38:35.435 40627 DEBUG oslo_concurrency.lockutils [-] Lock "notifier-58a2b315-d411-4aa0-bb23-7c0da0b57a70" released by "neutron.notifiers.batch_notifier.synced_send" :: held 3.909s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:38:35.435 40627 DEBUG oslo_concurrency.lockutils [-] Lock "notifier-58a2b315-d411-4aa0-bb23-7c0da0b57a70" acquired by "neutron.notifiers.batch_notifier.synced_send" :: waited 16.216s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:38:37.435 40627 DEBUG oslo_concurrency.lockutils [-] Lock "notifier-58a2b315-d411-4aa0-bb23-7c0da0b57a70" released by "neutron.notifiers.batch_notifier.synced_send" :: held 2.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:38:37.436 40627 DEBUG oslo_concurrency.lockutils [-] Lock "notifier-58a2b315-d411-4aa0-bb23-7c0da0b57a70" acquired by "neutron.notifiers.batch_notifier.synced_send" :: waited 16.639s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:38:39.436 40627 DEBUG oslo_concurrency.lockutils [-] Lock "notifier-58a2b315-d411-4aa0-bb23-7c0da0b57a70" released by "neutron.notifiers.batch_notifier.synced_send" :: held 2.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:38:39.436 40627 DEBUG oslo_concurrency.lockutils [-] Lock "notifier-58a2b315-d411-4aa0-bb23-7c0da0b57a70" acquired by "neutron.notifiers.batch_notifier.synced_send" :: waited 16.376s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:38:41.437 40627 DEBUG oslo_concurrency.lockutils [-] Lock "notifier-58a2b315-d411-4aa0-bb23-7c0da0b57a70" released by "neutron.notifiers.batch_notifier.synced_send" :: held 2.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:38:41.437 40627 DEBUG oslo_concurrency.lockutils [-] Lock "notifier-58a2b315-d411-4aa0-bb23-7c0da0b57a70" acquired by "neutron.notifiers.batch_notifier.synced_send" :: waited 15.856s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:38:43.437 40627 DEBUG oslo_concurrency.lockutils [-] Lock "notifier-58a2b315-d411-4aa0-bb23-7c0da0b57a70" released by "neutron.notifiers.batch_notifier.synced_send" :: held 2.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285

  2019-04-12 09:40:49.268 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-fip" released by "neutron.agent.l3.extensions.qos.fip.add_router" :: held 1.797s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:03.908 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-gateway-ip" released by "neutron.agent.l3.extensions.qos.gateway_ip.add_router" :: held 14.640s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:03.909 40627 DEBUG oslo_concurrency.lockutils [-] Lock "port-forwarding" acquired by "neutron.agent.l3.extensions.port_forwarding.add_router" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:41:03.909 40627 DEBUG oslo_concurrency.lockutils [-] Lock "port-forwarding" released by "neutron.agent.l3.extensions.port_forwarding.add_router" :: held 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:11.143 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-fip" released by "neutron.agent.l3.extensions.qos.fip.add_router" :: held 18.025s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:11.144 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-fip" acquired by "neutron.agent.l3.extensions.qos.fip.add_router" :: waited 14.209s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:41:15.428 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-gateway-ip" released by "neutron.agent.l3.extensions.qos.gateway_ip.add_router" :: held 4.285s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:20.867 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-fip" released by "neutron.agent.l3.extensions.qos.fip.add_router" :: held 9.723s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:20.867 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-gateway-ip" acquired by "neutron.agent.l3.extensions.qos.gateway_ip.add_router" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:41:20.868 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-fip" acquired by "neutron.agent.l3.extensions.qos.fip.add_router" :: waited 22.125s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:41:21.768 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-fip" released by "neutron.agent.l3.extensions.qos.fip.add_router" :: held 0.900s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:21.768 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-fip" acquired by "neutron.agent.l3.extensions.qos.fip.add_router" :: waited 18.762s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:41:24.712 40627 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" acquired by "neutron.agent.linux.pd.add_router" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:41:24.713 40627 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" released by "neutron.agent.linux.pd.add_router" :: held 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:27.642 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-gateway-ip" released by "neutron.agent.l3.extensions.qos.gateway_ip.add_router" :: held 6.775s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:27.642 40627 DEBUG oslo_concurrency.lockutils [-] Lock "floatingip_snat" acquired by "neutron.agent.l3.extensions.floatingip_snat.add_router" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:41:27.642 40627 DEBUG oslo_concurrency.lockutils [-] Lock "floatingip_snat" released by "neutron.agent.l3.extensions.floatingip_snat.add_router" :: held 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:27.643 40627 DEBUG oslo_concurrency.lockutils [-] Lock "port-forwarding" acquired by "neutron.agent.l3.extensions.port_forwarding.add_router" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:41:27.643 40627 DEBUG oslo_concurrency.lockutils [-] Lock "port-forwarding" released by "neutron.agent.l3.extensions.port_forwarding.add_router" :: held 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:27.643 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-gateway-ip" acquired by "neutron.agent.l3.extensions.qos.gateway_ip.add_router" :: waited 5.876s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:41:37.360 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-gateway-ip" released by "neutron.agent.l3.extensions.qos.gateway_ip.add_router" :: held 9.716s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:37.360 40627 DEBUG oslo_concurrency.lockutils [-] Lock "floatingip_snat" acquired by "neutron.agent.l3.extensions.floatingip_snat.add_router" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:41:37.360 40627 DEBUG oslo_concurrency.lockutils [-] Lock "floatingip_snat" released by "neutron.agent.l3.extensions.floatingip_snat.add_router" :: held 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:37.361 40627 DEBUG oslo_concurrency.lockutils [-] Lock "port-forwarding" acquired by "neutron.agent.l3.extensions.port_forwarding.add_router" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:41:37.361 40627 DEBUG oslo_concurrency.lockutils [-] Lock "port-forwarding" released by "neutron.agent.l3.extensions.port_forwarding.add_router" :: held 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:38.039 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-fip" released by "neutron.agent.l3.extensions.qos.fip.add_router" :: held 16.271s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:38.039 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-gateway-ip" acquired by "neutron.agent.l3.extensions.qos.gateway_ip.add_router" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:41:38.040 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-fip" acquired by "neutron.agent.l3.extensions.qos.fip.add_router" :: waited 32.547s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:41:41.661 40627 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" acquired by "neutron.agent.linux.pd.add_router" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:41:41.662 40627 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" released by "neutron.agent.linux.pd.add_router" :: held 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:42.328 40627 DEBUG oslo_concurrency.lockutils [-] Lock "enable_radvd" acquired by "neutron.agent.l3.ha_router.enable_radvd" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:41:42.331 40627 DEBUG oslo_concurrency.lockutils [-] Lock "enable_radvd" released by "neutron.agent.l3.ha_router.enable_radvd" :: held 0.003s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:42.331 40627 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" acquired by "neutron.agent.linux.pd.process_ha_state" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:41:42.331 40627 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" released by "neutron.agent.linux.pd.process_ha_state" :: held 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:42.332 40627 DEBUG oslo_concurrency.lockutils [-] Lock "resize_greenpool" acquired by "neutron.agent.l3.agent._resize_process_pool" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:41:42.333 40627 DEBUG oslo_concurrency.lockutils [-] Lock "resize_greenpool" released by "neutron.agent.l3.agent._resize_process_pool" :: held 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:41:42.333 40627 DEBUG oslo_concurrency.lockutils [-] Lock "notifier-58a2b315-d411-4aa0-bb23-7c0da0b57a70" acquired by "neutron.notifiers.batch_notifier.synced_send" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils
  .py:273
  2019-04-12 09:41:47.155 40627 DEBUG oslo_concurrency.lockutils [-] Lock "notifier-58a2b315-d411-4aa0-bb23-7c0da0b57a70" released by "neutron.notifiers.batch_notifier.synced_send" :: held 4.821s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.p
  y:285

  2019-04-12 09:55:15.600 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-fip" released by "neutron.agent.l3.extensions.qos.fip.add_router" :: held 65.798s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:55:15.600 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-gateway-ip" acquired by "neutron.agent.l3.extensions.qos.gateway_ip.add_router" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:55:15.601 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-fip" acquired by "neutron.agent.l3.extensions.qos.fip.add_router" :: waited 307.347s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:55:51.892 40627 DEBUG oslo_concurrency.lockutils [-] Lock "qos-gateway-ip" released by "neutron.agent.l3.extensions.qos.gateway_ip.add_router" :: held 36.292s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:55:51.892 40627 DEBUG oslo_concurrency.lockutils [-] Lock "floatingip_snat" acquired by "neutron.agent.l3.extensions.floatingip_snat.add_router" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:55:51.892 40627 DEBUG oslo_concurrency.lockutils [-] Lock "floatingip_snat" released by "neutron.agent.l3.extensions.floatingip_snat.add_router" :: held 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:55:51.892 40627 DEBUG oslo_concurrency.lockutils [-] Lock "port-forwarding" acquired by "neutron.agent.l3.extensions.port_forwarding.add_router" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:55:51.893 40627 DEBUG oslo_concurrency.lockutils [-] Lock "port-forwarding" released by "neutron.agent.l3.extensions.port_forwarding.add_router" :: held 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:56:00.703 40627 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" acquired by "neutron.agent.linux.pd.process_ha_state" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:56:00.703 40627 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" released by "neutron.agent.linux.pd.process_ha_state" :: held 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2019-04-12 09:56:00.704 40627 DEBUG oslo_concurrency.lockutils [-] Lock "resize_greenpool" acquired by "neutron.agent.l3.agent._resize_process_pool" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2019-04-12 09:56:00.705 40627 DEBUG oslo_concurrency.lockutils [-] Lock "resize_greenpool" released by "neutron.agent.l3.agent._resize_process_pool" :: held 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285

  Potential solution:
  Minimizing agent extension lock granularity.
  So the code basiclly are same as master branch, IMO, we should refactor these lock mechanism. Find out the real critical resource, and only lock for it.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1824911/+subscriptions


References