[Bug 1526559] [NEW] L3 agent parallel configuration of routers might slow things down
Public bug reported:
The L3 agent's _process_routers_loop method spawns a GreenPool of 8 eventlet
green threads. Those threads take router updates off the agent's queue and
process them. Updates are serialized by router_id so that no two threads
process the same router at the same time.
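For reference, the pattern described above might look roughly like the sketch below. This is not the actual neutron source; the update queue, the per-router locks and _configure_router() are hypothetical stand-ins used only to illustrate the worker-pool and per-router serialization scheme.

    import eventlet
    from eventlet import queue, semaphore

    GREENPOOL_SIZE = 8                 # the hard-coded default discussed here
    _updates = queue.Queue()           # router updates pushed by RPC handlers
    _router_locks = {}                 # router_id -> Semaphore (hypothetical)

    def _configure_router(router_id, update):
        pass  # placeholder for the real work: ip/iptables calls, keepalived, etc.

    def _process_router_update():
        # Each green thread takes one update; the per-router lock keeps two
        # threads from working on the same router concurrently.
        router_id, update = _updates.get()
        lock = _router_locks.setdefault(router_id, semaphore.Semaphore())
        with lock:
            _configure_router(router_id, update)

    def _process_routers_loop():
        pool = eventlet.GreenPool(size=GREENPOOL_SIZE)
        while True:
            # spawn_n blocks once all GREENPOOL_SIZE workers are busy
            pool.spawn_n(_process_router_update)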
In an environment running on a powerful baremetal server, the agent was
trying to sync roughly 600 routers after a restart. Around half were HA
routers and half were legacy routers. With the default GreenPool size of 8,
the server ground to a halt as CPU usage skyrocketed to over 600%; the main
offenders were ip, bash, keepalived and Python. This was on an environment
based off stable/juno, without the rootwrap daemon. It took around 60
seconds to configure a single router.
Changing the GreenPool size from 8 to 1 (see the sketch below the list) caused the agent to:
1) Configure a router in 30 seconds, a 50% improvement.
2) Reduce CPU load from 600% to 70%, freeing the machine to do other things.
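One obvious mitigation would be to make the pool size tunable instead of hard-coded. The sketch below uses oslo.config purely for illustration; the option name l3_agent_pool_size is hypothetical and no such option existed when this bug was filed.

    from oslo_config import cfg
    import eventlet

    # Hypothetical option: neutron does not expose this; it only sketches
    # how the pool size could be made configurable rather than hard-coded.
    OPTS = [
        cfg.IntOpt('l3_agent_pool_size',
                   default=8,
                   help='Number of green threads used to process router '
                        'updates. Lowering this (e.g. to 1) reduced CPU load '
                        'in the scenario reported here.'),
    ]
    cfg.CONF.register_opts(OPTS)

    def _make_router_update_pool():
        return eventlet.GreenPool(size=cfg.CONF.l3_agent_pool_size)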
I'm filing this bug so that:
1) Someone can confirm my experience in a more controlled way - for example, by graphing router configuration time and CPU load as a function of GreenPool size (a rough measurement sketch follows this list).
2) If my findings are confirmed on master with the rootwrap daemon, we can start considering alternatives such as multiprocessing instead of eventlet green threads, or at the very least tune the GreenPool size.
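Something along these lines could produce the graph suggested in (1). The helpers restart_l3_agent_with_pool_size() and wait_for_full_resync() are hypothetical hooks into a test environment, and psutil is assumed to be available for CPU sampling.

    import time
    import psutil  # assumed available for host-wide CPU sampling

    def restart_l3_agent_with_pool_size(size):
        pass  # hypothetical: restart the agent configured with this pool size

    def wait_for_full_resync(num_routers):
        pass  # hypothetical: block until all routers are configured

    def measure(pool_size, num_routers=600):
        restart_l3_agent_with_pool_size(pool_size)
        start = time.time()
        wait_for_full_resync(num_routers)
        elapsed = time.time() - start
        cpu = psutil.cpu_percent(interval=5)   # coarse CPU load sample
        return elapsed / num_routers, cpu

    for size in (1, 2, 4, 8):
        per_router, cpu = measure(size)
        print("pool=%d: %.1f s/router, cpu=%.0f%%" % (size, per_router, cpu))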
** Affects: neutron
Importance: Undecided
Status: New
** Tags: l3-ipam-dhcp loadimpact
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1526559
Title:
L3 agent parallel configuration of routers might slow things down
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1526559/+subscriptions