← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1425759] [NEW] L3 advanced services singleton pattern messing up L3 agent functional testing

 

Public bug reported:

The idea behind the AdvancedServices (Metadata, *aaS) services to be singletons was to (1) offer a way to get an instance of such a service in a convenient manner, and to (2) ensure that only one service registered per service type.
    
(1) Since the AdvancedService.instance method required an instance of a L3 agent, this point was kind of missed. If you have a L3 agent instance already in your hand, just use it to access the service you're looking for.
    
(2) This is now fulfilled by asserting that only one service is registered (per type) in the .add method.
    
The motivation to make them non-singletons is that:
a) They don't need to be
b) The code is simplified this way
c) I've been facing crazy issues during functional testing where we're constantly instantiating L3 agents, and the (singleton) services were referencing the wrong configuration object. Basically the service was re-initialized during a test, screwing up the configuration of other mid-run tests.

Bug triage (Some of this is my *guess* as to what's wrong, other parts are facts):
tox -e dsvm-functional neutron.tests.functional.agent.test_l3_agent by default spawns a test runner per CPU on the machine it is running on. So, for 4 CPUs, 4 workers that work in parallel. Each worker (a process) shares memory space so that while each test registers its own agent, they all share the same AdvancedServices (Specifically the metadata driver advanced service). The metadata advanced service has a reference back to the configuration of the L3 agent that started it (Namely for the different directory paths like PIDs). The first test initializes the (single) metadata advanced service instance and sets up its configuration to that of the test's L3 agent. If it finishes before the next test initializes its agent, fantastic, as (With a horrible hack by myself: https://review.openstack.org/#/c/143300/2/neutron/tests/common/agents/l3_agent.py) each initialization of the test L3 agent re-creates all of the registered AdvancedServices, thereby pointing them to that new agent. However, if this happens *during* the run of another test (In the same worker; I don't know how this could happen as tests seem to run serially in a single worker) Bad Things Happen (TM). Specifically, the PIDs directory path is now different than when it was when the metadata proxy was spawned and so when you assert that the metadata proxy is up, it's looking at the wrong path for PIDs that don't exist (But do exist elsewhere on the disk) and concludes that no, the metadata proxy is not up. What is absolutely true is that I observed the metadata advanced driver starting the proxy under one directory path and then the test looking if its up in another directory path. The solution to this mess is have the AdvancedServices not be singleton. Each agent instantiates its AdvancedServices and keeps references to them.

** Affects: neutron
     Importance: Undecided
     Assignee: Assaf Muller (amuller)
         Status: In Progress

** Changed in: neutron
       Status: New => In Progress

** Changed in: neutron
     Assignee: (unassigned) => Assaf Muller (amuller)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1425759

Title:
  L3 advanced services singleton pattern messing up L3 agent functional
  testing

Status in OpenStack Neutron (virtual network service):
  In Progress

Bug description:
  The idea behind the AdvancedServices (Metadata, *aaS) services to be singletons was to (1) offer a way to get an instance of such a service in a convenient manner, and to (2) ensure that only one service registered per service type.
      
  (1) Since the AdvancedService.instance method required an instance of a L3 agent, this point was kind of missed. If you have a L3 agent instance already in your hand, just use it to access the service you're looking for.
      
  (2) This is now fulfilled by asserting that only one service is registered (per type) in the .add method.
      
  The motivation to make them non-singletons is that:
  a) They don't need to be
  b) The code is simplified this way
  c) I've been facing crazy issues during functional testing where we're constantly instantiating L3 agents, and the (singleton) services were referencing the wrong configuration object. Basically the service was re-initialized during a test, screwing up the configuration of other mid-run tests.

  Bug triage (Some of this is my *guess* as to what's wrong, other parts are facts):
  tox -e dsvm-functional neutron.tests.functional.agent.test_l3_agent by default spawns a test runner per CPU on the machine it is running on. So, for 4 CPUs, 4 workers that work in parallel. Each worker (a process) shares memory space so that while each test registers its own agent, they all share the same AdvancedServices (Specifically the metadata driver advanced service). The metadata advanced service has a reference back to the configuration of the L3 agent that started it (Namely for the different directory paths like PIDs). The first test initializes the (single) metadata advanced service instance and sets up its configuration to that of the test's L3 agent. If it finishes before the next test initializes its agent, fantastic, as (With a horrible hack by myself: https://review.openstack.org/#/c/143300/2/neutron/tests/common/agents/l3_agent.py) each initialization of the test L3 agent re-creates all of the registered AdvancedServices, thereby pointing them to that new agent. However, if this happens *during* the run of another test (In the same worker; I don't know how this could happen as tests seem to run serially in a single worker) Bad Things Happen (TM). Specifically, the PIDs directory path is now different than when it was when the metadata proxy was spawned and so when you assert that the metadata proxy is up, it's looking at the wrong path for PIDs that don't exist (But do exist elsewhere on the disk) and concludes that no, the metadata proxy is not up. What is absolutely true is that I observed the metadata advanced driver starting the proxy under one directory path and then the test looking if its up in another directory path. The solution to this mess is have the AdvancedServices not be singleton. Each agent instantiates its AdvancedServices and keeps references to them.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1425759/+subscriptions


Follow ups

References