
yahoo-eng-team team mailing list archive

[Bug 1965140] Re: Eventlet fails when starting network agents

 

Based on the information at hand, I don't think this is a bug. The
upstream Neutron team tests the linuxbridge agent with every patch that
is submitted for the stable/xena branch:
https://review.opendev.org/q/project:openstack%252Fneutron+branch:stable%252Fxena+status:open
One example is
https://review.opendev.org/c/openstack/neutron/+/833857. Note that we
have a job named neutron-tempest-plugin-scenario-linuxbridge-xena.
Here's the successful execution of that job for the aforementioned patch:
https://zuul.opendev.org/t/openstack/build/1ee2736adf46480ea9d48f125b5cd229
and the corresponding linuxbridge agent log:
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_1ee/833857/1/check/neutron-tempest-plugin-scenario-linuxbridge-xena/1ee2736/controller/logs/screen-q-agt.txt
As you can see, that agent started running today with no
problems: "Logs begin at Wed 2022-03-16 10:05:56 UTC".

As for what may be causing your problem, I lean towards an
installation / dependencies issue:

1) Is this a fresh new install or did it run successfully before and has
started failing recently? If the latter, what changed?

2) Your logging seems a little odd. Note that the third line in the
upstream log is:

Mar 16 10:19:43.410735 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0028853094 neutron-linuxbridge-agent[71074]: INFO neutron.common.config [-] Logging enabled!

In your case, logging doesn't seem to be enabled at all. The second
thing the agent does is set up logging:
https://github.com/openstack/neutron/blob/2f4661c87681567bb08d7733c723c2b0c31ed6c8/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py#L1017

3) Next, the agent sets up privsep (to be able to execute commands with
privileges):
https://github.com/openstack/neutron/blob/2f4661c87681567bb08d7733c723c2b0c31ed6c8/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py#L1018
It seems this is where your agent fails, because your log shows two
privsep-helper launches like this one:

Mar 16 09:14:09 w3.int.lunarc sudo[37085]:  neutron : TTY=unknown ;
PWD=/ ; USER=root ; COMMAND=/bin/neutron-rootwrap
/etc/neutron/rootwrap.conf privsep-helper --config-file
/usr/share/neutron/neutron-dist.conf --config-file
/etc/neutron/neutron.conf --config-file
/etc/neutron/plugins/ml2/linuxbridge_agent.ini --config-dir
/etc/neutron/conf.d/neutron-linuxbridge-agent --privsep_context
neutron.privileged.default --privsep_sock_path
/tmp/tmp93tzwqg3/privsep.sock

and then the traceback. Comparing with the upstream agent, we see that
the privsep daemon starts running successfully:

Mar 16 10:19:43.421615 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0028853094 sudo[71765]:    stack : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf privsep-helper --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --privsep_context neutron.privileged.default --privsep_sock_path /tmp/tmppedn6sui/privsep.sock
Mar 16 10:19:43.421948 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0028853094 sudo[71765]: pam_unix(sudo:session): session opened for user root by (uid=0)
Mar 16 10:19:43.898488 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0028853094 sudo[71765]: pam_unix(sudo:session): session closed for user root
Mar 16 10:19:43.908480 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0028853094 neutron-linuxbridge-agent[71074]: INFO oslo.privsep.daemon [-] Spawned new privsep daemon via rootwrap
Mar 16 10:19:43.909023 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0028853094 neutron-linuxbridge-agent[71074]: DEBUG oslo.privsep.daemon [-] Accepted privsep connection to /tmp/tmppedn6sui/privsep.sock {{(pid=71074) __init__ /usr/local/lib/python3.8/dist-packages/oslo_privsep/daemon.py:363}}
Mar 16 10:19:43.909722 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0028853094 neutron-linuxbridge-agent[71074]: INFO oslo.privsep.daemon [-] privsep daemon starting
Mar 16 10:19:43.909962 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0028853094 neutron-linuxbridge-agent[71074]: INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
Mar 16 10:19:43.910155 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0028853094 neutron-linuxbridge-agent[71074]: INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN|CAP_SYS_PTRACE/CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN|CAP_SYS_PTRACE/none
Mar 16 10:19:43.910337 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0028853094 neutron-linuxbridge-agent[71074]: INFO oslo.privsep.daemon [-] privsep daemon running as pid 71807
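
To check how far your own agent gets, you could scan its journal for the
same startup milestones. Here is a minimal sketch of that idea. The
marker strings are copied from the upstream log quoted in this message;
the names MILESTONES and missing_milestones are made up for illustration
and are not part of Neutron, and other releases may word these messages
differently.

```python
# Startup milestones copied from the upstream linuxbridge agent log
# quoted in this message, in the order they appear there.
MILESTONES = [
    "Logging enabled!",
    "Spawned new privsep daemon via rootwrap",
    "privsep daemon running as pid",
]


def missing_milestones(log_text):
    """Return the milestones that never appear in log_text, in startup order."""
    return [marker for marker in MILESTONES if marker not in log_text]
```

Feeding it the output of "journalctl -u neutron-linuxbridge-agent" as a
string should show which of these steps your agent never reaches.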

I would compare the config files passed to the rootwrap commands above.
What do you have for the root_helper option? For reference, here is the
upstream job's configuration:
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_1ee/833857/1/check/neutron-tempest-plugin-scenario-linuxbridge-xena/1ee2736/controller/logs/etc/neutron/plugins/ml2/ml2_conf.ini
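
As a rough sketch of that comparison (my own illustration, not code from
Neutron), the snippet below pulls the --config-file and --config-dir
arguments out of two sudo "COMMAND=" journal lines and reports the paths
that appear on only one side. The helper names config_args and compare
are hypothetical.

```python
import shlex

# Options that take a path argument in the privsep-helper command lines.
PATH_OPTIONS = ("--config-file", "--config-dir")


def config_args(sudo_line):
    """Extract (option, path) pairs from a sudo 'COMMAND=...' journal line."""
    _, _, command = sudo_line.partition("COMMAND=")
    tokens = shlex.split(command)
    # Pair every token with its successor and keep the option/path pairs.
    return [
        (opt, path)
        for opt, path in zip(tokens, tokens[1:])
        if opt in PATH_OPTIONS
    ]


def compare(failing_line, working_line):
    """Return (paths only in failing_line, paths only in working_line)."""
    failing = {path for _, path in config_args(failing_line)}
    working = {path for _, path in config_args(working_line)}
    return sorted(failing - working), sorted(working - failing)
```

Applied to the two sudo lines quoted above, this lists
neutron-dist.conf, linuxbridge_agent.ini and the conf.d directory as
present only on your side, and ml2_conf.ini as present only upstream.
Those are the files whose contents I would look at first.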

Here's a little explanation from the greenlet docs on the specific error
you get at the bottom of the traceback:
https://greenlet.readthedocs.io/en/stable/python_threads.html
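
Since I suspect a dependency problem, it would also help to know which
versions of the relevant packages are installed on w3. A minimal sketch,
assuming the distribution names below match what is installed on your
system: it uses importlib.metadata (Python 3.8+) and falls back to the
importlib_metadata backport, which may or may not be present on your
Python 3.6. The function name installed_versions is made up for
illustration.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Backport for older interpreters; assumes it is installed.
    import importlib_metadata as metadata


def installed_versions(packages=("eventlet", "greenlet", "oslo.privsep", "neutron")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Posting the resulting dictionary in the bug would let us compare your
eventlet and greenlet versions with the ones used in the upstream job.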

For now, I'm going to mark this as a non-bug unless you provide more
information to indicate otherwise.


** Changed in: neutron
       Status: New => Invalid

** Changed in: neutron
       Status: Invalid => Incomplete

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1965140

Title:
  Eventlet fails when starting network agents

Status in neutron:
  Incomplete

Bug description:
  I have a two-node OpenStack setup, where one node (w3) runs all
  controller services (keystone, glance, placement, nova, neutron,
  horizon, cinder) as well as nova-compute and cinder-volume, and the
  second (w6) runs nova-compute and the linuxbridge agent.

  All network agents on w3 are dead
  [root@w3 ~]# openstack network agent list
  +--------------------------------------+--------------------+---------------+-------------------+-------+-------+---------------------------+
  | ID                                   | Agent Type         | Host          | Availability Zone | Alive | State | Binary                    |
  +--------------------------------------+--------------------+---------------+-------------------+-------+-------+---------------------------+
  | 330269b7-b73c-4207-abc7-21f1a2972b7b | Linux bridge agent | w6.int.lunarc | None              | :-)   | UP    | neutron-linuxbridge-agent |
  | 83d16241-8a3a-42b0-beda-87246d945dc1 | L3 agent           | w3.int.lunarc | nova              | XXX   | UP    | neutron-l3-agent          |
  | a52ab60f-d893-491d-a43e-823a0d482810 | Linux bridge agent | w3.int.lunarc | None              | XXX   | UP    | neutron-linuxbridge-agent |
  | abd75644-d895-41ae-94fa-6c4351cbc4bf | Metadata agent     | w3.int.lunarc | None              | XXX   | UP    | neutron-metadata-agent    |
  | c05c65bc-779e-4fe5-a19e-350c44900be4 | DHCP agent         | w3.int.lunarc | nova              | XXX   | UP    | neutron-dhcp-agent        |
  +--------------------------------------+--------------------+---------------+-------------------+-------+-------+---------------------------+

  I cannot start them anymore. I tried restarting the agent alone,
  restarting all OpenStack daemons on w3, and even restarting the whole
  node, but nothing seems to help: I always get the same issue and the
  same trace as shown below.

  I could not find any useful info in the logs, but systemd does report an issue with eventlet/greenlet:
  [root@w3 ~]# journalctl -fu neutron-linuxbridge-agent
  -- Logs begin at Wed 2022-03-16 04:32:31 EDT. --
  Mar 16 09:14:09 w3.int.lunarc sudo[37085]:  neutron : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/neutron-rootwrap /etc/neutron/rootwrap.conf privsep-helper --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/linuxbridge_agent.ini --config-dir /etc/neutron/conf.d/neutron-linuxbridge-agent --privsep_context neutron.privileged.default --privsep_sock_path /tmp/tmp93tzwqg3/privsep.sock
  Mar 16 09:14:12 w3.int.lunarc sudo[37107]:  neutron : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/neutron-rootwrap /etc/neutron/rootwrap.conf privsep-helper --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/linuxbridge_agent.ini --config-dir /etc/neutron/conf.d/neutron-linuxbridge-agent --privsep_context neutron.privileged.link_cmd --privsep_sock_path /tmp/tmp81iy5eni/privsep.sock
  Mar 16 09:14:15 w3.int.lunarc neutron-linuxbridge-agent[37073]: Traceback (most recent call last):
  Mar 16 09:14:15 w3.int.lunarc neutron-linuxbridge-agent[37073]:   File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 476, in fire_timers
  Mar 16 09:14:15 w3.int.lunarc neutron-linuxbridge-agent[37073]:     timer()
  Mar 16 09:14:15 w3.int.lunarc neutron-linuxbridge-agent[37073]:   File "/usr/lib/python3.6/site-packages/eventlet/hubs/timer.py", line 59, in __call__
  Mar 16 09:14:15 w3.int.lunarc neutron-linuxbridge-agent[37073]:     cb(*args, **kw)
  Mar 16 09:14:15 w3.int.lunarc neutron-linuxbridge-agent[37073]:   File "/usr/lib/python3.6/site-packages/eventlet/semaphore.py", line 152, in _do_acquire
  Mar 16 09:14:15 w3.int.lunarc neutron-linuxbridge-agent[37073]:     waiter.switch()
  Mar 16 09:14:15 w3.int.lunarc neutron-linuxbridge-agent[37073]: greenlet.error: cannot switch to a different thread

  I am running OpenStack Xena on a freshly installed CentOS Stream 8. Here are other details:
  [root@w3 ~]# uname -a
  Linux w3.int.lunarc 4.18.0-365.el8.x86_64 #1 SMP Thu Feb 10 16:11:23 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

  Any clue on how I can find out what makes this happen, or just how I
  can get past this crippling greenlet/eventlet error, and get these
  agents to run again?

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1965140/+subscriptions


