yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #90964
[Bug 2002629] Re: devstack build in the gate fails with: ovnnb_db.sock: database connection failed
<< The search expression 'message:"ovnnb_db.sock: database connection
failed"' gives me 1200+ hits in https://opensearch.logs.openstack.org
for the last 2 weeks.
I added some more filters and it gives 50 such results:-
https://opensearch.logs.openstack.org/_dashboards/app/discover/?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30d,to:now))&_a=(columns:!(_source),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:build_name,negate:!t,params:(query:tripleo-ci-centos-9-standalone-external-compute-target-host),type:phrase),query:(match_phrase:(build_name:tripleo-ci-centos-9-standalone-external-compute-target-host))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:build_name,negate:!t,params:(query:tobiko-tripleo-minimal),type:phrase),query:(match_phrase:(build_name:tobiko-tripleo-minimal))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:filename,negate:!f,params:(query:job-output.txt),type:phrase),query:(match_phrase:(filename:job-output.txt)))),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:'message:%22ovnnb_db.sock:%20database%20connection%20failed%22%20AND%20build_status:FAILURE'),sort:!())
In past we have seen taking much time to start and available of db files for those increasing timeout helped https://review.opendev.org/c/openstack/devstack/+/848548.
But now it's little different issue where it takes time to stop and in that window(can be seen if the window is less than a second in below example) wait_for_sock_file returns true and moves forward and later connection to those .sock files fails as service is not restarted by that time.
2023-01-11 09:24:11.273593 | controller | + lib/neutron_plugins/ovn_agent:_start_process:239 : sudo systemctl restart ovn-central.service
2023-01-11 09:24:11.295863 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:711 : wait_for_sock_file /var/run/ovn/ovnnb_db.sock
2023-01-11 09:24:11.298605 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:186 : local count=0
2023-01-11 09:24:11.300757 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 : '[' '!' -S /var/run/ovn/ovnnb_db.sock ']'
2023-01-11 09:24:11.303155 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:712 : wait_for_sock_file /var/run/ovn/ovnsb_db.sock
2023-01-11 09:24:11.305826 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:186 : local count=0
2023-01-11 09:24:11.308367 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 : '[' '!' -S /var/run/ovn/ovnsb_db.sock ']'
2023-01-11 09:24:11.310862 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:713 : wait_for_db_file /var/lib/ovn/ovnnb_db.db
2023-01-11 09:24:11.313570 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:175 : local count=0
2023-01-11 09:24:11.316126 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 : '[' '!' -f /var/lib/ovn/ovnnb_db.db ']'
2023-01-11 09:24:11.319726 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:177 : sleep 1
2023-01-11 09:24:12.323489 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:178 : count=1
2023-01-11 09:24:12.326545 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:179 : '[' 1 -gt 40 ']'
2023-01-11 09:24:12.328763 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 : '[' '!' -f /var/lib/ovn/ovnnb_db.db ']'
2023-01-11 09:24:12.331382 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:714 : wait_for_db_file /var/lib/ovn/ovnsb_db.db
2023-01-11 09:24:12.333636 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:175 : local count=0
2023-01-11 09:24:12.336270 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 : '[' '!' -f /var/lib/ovn/ovnsb_db.db ']'
2023-01-11 09:24:12.339188 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:716 : is_service_enabled tls-proxy
2023-01-11 09:24:12.357535 | controller | + functions-common:is_service_enabled:2089 : return 0
2023-01-11 09:24:12.359893 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:717 : sudo ovn-nbctl --db=unix:/var/run/ovn/ovnnb_db.sock set-ssl /opt/stack/data/CA/int-ca/private/devstack-cert.key /opt/stack/data/CA/int-ca/devstack-cert.crt /opt/stack/data/CA/int-ca/ca-chain.pem
2023-01-11 09:24:12.367707 | controller | ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No such file or directory)
from service status:-
â— ovn-ovsdb-server-nb.service - Open vSwitch database server for OVN Northbound database
Loaded: loaded (/lib/systemd/system/ovn-ovsdb-server-nb.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2023-01-11 09:24:11 UTC; 47s ago
Main PID: 96196 (ovsdb-server)
Tasks: 1 (limit: 9475)
Memory: 2.9M
CGroup: /system.slice/ovn-ovsdb-server-nb.service
└─96196 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-nb.log --remote=punix:/var/run/ovn/ovnnb_db.sock --pidfile=/var/run/ovn/ovnnb_db.pid --unixctl=/var/run/ovn/ovnnb_db.ctl --remote=db:OVN_Northbound,NB_Global,connections --private-key=db:OVN_Northbound,SSL,private_key --certificate=db:OVN_Northbound,SSL,certificate --ca-cert=db:OVN_Northbound,SSL,ca_cert --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers /var/lib/ovn/ovnnb_db.db
Jan 11 09:24:11 ubuntu-focal-inmotion-iad3-0032697188 systemd[1]: Started Open vSwitch database server for OVN Northbound database.
Jan 11 09:24:11 ubuntu-focal-inmotion-iad3-0032697188 ovn-ctl[96196]: * /var/lib/ovn/ovnnb_db.db does not exist
Jan 11 09:24:13 ubuntu-focal-inmotion-iad3-0032697188 ovn-ctl[96196]: * Creating empty database /var/lib/ovn/ovnnb_db.db
Jan 11 09:24:13 ubuntu-focal-inmotion-iad3-0032697188 ovsdb-server[96196]: ovs|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
Jan 11 09:24:13 ubuntu-focal-inmotion-iad3-0032697188 ovsdb-server[96196]: ovs|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.8
Jan 11 09:24:23 ubuntu-focal-inmotion-iad3-0032697188 ovsdb-server[96196]: ovs|00003|memory|INFO|7340 kB peak resident set size after 10.0 seconds
Jan 11 09:24:23 ubuntu-focal-inmotion-iad3-0032697188 ovsdb-server[96196]: ovs|00004|memory|INFO|cells:31 monitors:2 sessions:1
from an another ex https://zuul.openstack.org/build/b8761291279548239343257ae2e02fc7 it's more clear(service restart triggered at 10:45:46.189787, service stops at 10:45:49 and service ready at 10:46:05, wait for sock files succeeds before service stops 10:45:48):-
2023-01-12 10:45:46.189787 | controller | + lib/neutron_plugins/ovn_agent:_start_process:239 : sudo systemctl restart ovn-central.service
2023-01-12 10:45:48.159236 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:711 : wait_for_sock_file /var/run/ovn/ovnnb_db.sock
2023-01-12 10:45:48.161673 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:186 : local count=0
2023-01-12 10:45:48.166417 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 : '[' '!' -S /var/run/ovn/ovnnb_db.sock ']'
2023-01-12 10:45:48.167425 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:712 : wait_for_sock_file /var/run/ovn/ovnsb_db.sock
2023-01-12 10:45:48.169822 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:186 : local count=0
2023-01-12 10:45:48.172787 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 : '[' '!' -S /var/run/ovn/ovnsb_db.sock ']'
2023-01-12 10:45:48.175211 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:713 : wait_for_db_file /var/lib/ovn/ovnnb_db.db
2023-01-12 10:45:48.177991 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:175 : local count=0
2023-01-12 10:45:48.181047 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 : '[' '!' -f /var/lib/ovn/ovnnb_db.db ']'
2023-01-12 10:45:48.183699 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:177 : sleep 1
2023-01-12 10:45:49.188939 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:178 : count=1
2023-01-12 10:45:49.191707 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:179 : '[' 1 -gt 40 ']'
2023-01-12 10:45:49.194073 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 : '[' '!' -f /var/lib/ovn/ovnnb_db.db ']'
2023-01-12 10:45:49.196129 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:177 : sleep 1
2023-01-12 10:45:50.200890 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:178 : count=2
2023-01-12 10:45:50.203234 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:179 : '[' 2 -gt 40 ']'
2023-01-12 10:45:50.205212 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 : '[' '!' -f /var/lib/ovn/ovnnb_db.db ']'
2023-01-12 10:45:50.207584 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:714 : wait_for_db_file /var/lib/ovn/ovnsb_db.db
2023-01-12 10:45:50.209988 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:175 : local count=0
2023-01-12 10:45:50.212481 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 : '[' '!' -f /var/lib/ovn/ovnsb_db.db ']'
2023-01-12 10:45:50.215127 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:716 : is_service_enabled tls-proxy
2023-01-12 10:45:50.232297 | controller | + functions-common:is_service_enabled:2089 : return 0
2023-01-12 10:45:50.234744 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:717 : sudo ovn-nbctl --db=unix:/var/run/ovn/ovnnb_db.sock set-ssl /opt/stack/data/CA/int-ca/private/devstack-cert.key /opt/stack/data/CA/int-ca/devstack-cert.crt /opt/stack/data/CA/int-ca/ca-chain.pem
2023-01-12 10:45:50.244116 | controller | ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No such file or directory)
â— ovn-ovsdb-server-nb.service - Open vSwitch database server for OVN Northbound database
Loaded: loaded (/lib/systemd/system/ovn-ovsdb-server-nb.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2023-01-12 10:45:49 UTC; 1min 4s ago
Main PID: 77511 (ovsdb-server)
Tasks: 1 (limit: 9247)
Memory: 2.8M
CGroup: /system.slice/ovn-ovsdb-server-nb.service
└─77511 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-nb.log --remote=punix:/var/run/ovn/ovnnb_db.sock --pidfile=/var/run/ovn/ovnnb_db.pid --unixctl=/var/run/ovn/ovnnb_db.ctl --remote=db:OVN_Northbound,NB_Global,connections --private-key=db:OVN_Northbound,SSL,private_key --certificate=db:OVN_Northbound,SSL,certificate --ca-cert=db:OVN_Northbound,SSL,ca_cert --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers /var/lib/ovn/ovnnb_db.db
Jan 12 10:45:49 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 systemd[1]: ovn-ovsdb-server-nb.service: Succeeded.
Jan 12 10:45:49 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 systemd[1]: Stopped Open vSwitch database server for OVN Northbound database.
Jan 12 10:45:49 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 systemd[1]: Started Open vSwitch database server for OVN Northbound database.
Jan 12 10:45:50 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 ovn-ctl[77511]: * /var/lib/ovn/ovnnb_db.db does not exist
Jan 12 10:46:05 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 ovn-ctl[77511]: * Creating empty database /var/lib/ovn/ovnnb_db.db
Jan 12 10:46:05 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 ovsdb-server[77511]: ovs|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
Jan 12 10:46:05 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 ovsdb-server[77511]: ovs|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.8
Jan 12 10:46:15 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 ovsdb-server[77511]: ovs|00003|memory|INFO|7272 kB peak resident set size after 10.0 seconds
Jan 12 10:46:15 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 ovsdb-server[77511]: ovs|00004|memory|INFO|cells:31 monitors:2 sessions:1
Ensuring sock files are absent just like dbs should be able to handle this issue. Will push a patch to devstack.
** Changed in: neutron
Importance: Undecided => High
** Also affects: devstack
Importance: Undecided
Status: New
** Changed in: devstack
Assignee: (unassigned) => yatin (yatinkarel)
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2002629
Title:
devstack build in the gate fails with: ovnnb_db.sock: database
connection failed
Status in devstack:
New
Status in neutron:
New
Bug description:
Recently we seem to have many the same devstack build failure in many
different gate jobs. The usual error message is:
+ lib/neutron_plugins/ovn_agent:start_ovn:714 : wait_for_db_file /var/lib/ovn/ovnsb_db.db
+ lib/neutron_plugins/ovn_agent:wait_for_db_file:175 : local count=0
+ lib/neutron_plugins/ovn_agent:wait_for_db_file:176 : '[' '!' -f /var/lib/ovn/ovnsb_db.db ']'
+ lib/neutron_plugins/ovn_agent:start_ovn:716 : is_service_enabled tls-proxy
+ functions-common:is_service_enabled:2089 : return 0
+ lib/neutron_plugins/ovn_agent:start_ovn:717 : sudo ovn-nbctl --db=unix:/var/run/ovn/ovnnb_db.sock set-ssl /opt/stack/data/CA/int-ca/private/devstack-cert.key /opt/stack/data/CA/int-ca/devstack-cert.crt /opt/stack/data/CA/int-ca/ca-chain.pem
ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No such file or directory)
+ lib/neutron_plugins/ovn_agent:start_ovn:1 : exit_trap
A few example logs:
https://zuul.opendev.org/t/openstack/build/ec852d75c8094afcb4140871bc9ffa36
https://zuul.opendev.org/t/openstack/build/eae988aa8cd24c78894a3d3438392357
The search expression 'message:"ovnnb_db.sock: database connection
failed"' gives me 1200+ hits in https://opensearch.logs.openstack.org
for the last 2 weeks.
To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/2002629/+subscriptions
References