← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 2002629] Re: devstack build in the gate fails with: ovnnb_db.sock: database connection failed

 

<< The search expression 'message:"ovnnb_db.sock: database connection
failed"' gives me 1200+ hits in https://opensearch.logs.openstack.org
for the last 2 weeks.

I added some more filters and it gives 50 such results:-
https://opensearch.logs.openstack.org/_dashboards/app/discover/?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30d,to:now))&_a=(columns:!(_source),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:build_name,negate:!t,params:(query:tripleo-ci-centos-9-standalone-external-compute-target-host),type:phrase),query:(match_phrase:(build_name:tripleo-ci-centos-9-standalone-external-compute-target-host))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:build_name,negate:!t,params:(query:tobiko-tripleo-minimal),type:phrase),query:(match_phrase:(build_name:tobiko-tripleo-minimal))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:filename,negate:!f,params:(query:job-output.txt),type:phrase),query:(match_phrase:(filename:job-output.txt)))),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:'message:%22ovnnb_db.sock:%20database%20connection%20failed%22%20AND%20build_status:FAILURE'),sort:!())


In past we have seen taking much time to start and available of db files for those increasing timeout helped https://review.opendev.org/c/openstack/devstack/+/848548.
But now it's little different issue where it takes time to stop and in that window(can be seen if the window is less than a second in below example) wait_for_sock_file returns true and moves forward and later connection to those .sock files fails as service is not restarted by that time.


2023-01-11 09:24:11.273593 | controller | + lib/neutron_plugins/ovn_agent:_start_process:239 :   sudo systemctl restart ovn-central.service
2023-01-11 09:24:11.295863 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:711 :   wait_for_sock_file /var/run/ovn/ovnnb_db.sock
2023-01-11 09:24:11.298605 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:186 :   local count=0
2023-01-11 09:24:11.300757 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 :   '[' '!' -S /var/run/ovn/ovnnb_db.sock ']'
2023-01-11 09:24:11.303155 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:712 :   wait_for_sock_file /var/run/ovn/ovnsb_db.sock
2023-01-11 09:24:11.305826 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:186 :   local count=0
2023-01-11 09:24:11.308367 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 :   '[' '!' -S /var/run/ovn/ovnsb_db.sock ']'
2023-01-11 09:24:11.310862 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:713 :   wait_for_db_file /var/lib/ovn/ovnnb_db.db
2023-01-11 09:24:11.313570 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:175 :   local count=0
2023-01-11 09:24:11.316126 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 :   '[' '!' -f /var/lib/ovn/ovnnb_db.db ']'
2023-01-11 09:24:11.319726 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:177 :   sleep 1
2023-01-11 09:24:12.323489 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:178 :   count=1
2023-01-11 09:24:12.326545 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:179 :   '[' 1 -gt 40 ']'
2023-01-11 09:24:12.328763 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 :   '[' '!' -f /var/lib/ovn/ovnnb_db.db ']'
2023-01-11 09:24:12.331382 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:714 :   wait_for_db_file /var/lib/ovn/ovnsb_db.db
2023-01-11 09:24:12.333636 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:175 :   local count=0
2023-01-11 09:24:12.336270 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 :   '[' '!' -f /var/lib/ovn/ovnsb_db.db ']'
2023-01-11 09:24:12.339188 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:716 :   is_service_enabled tls-proxy
2023-01-11 09:24:12.357535 | controller | + functions-common:is_service_enabled:2089 :   return 0
2023-01-11 09:24:12.359893 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:717 :   sudo ovn-nbctl --db=unix:/var/run/ovn/ovnnb_db.sock set-ssl /opt/stack/data/CA/int-ca/private/devstack-cert.key /opt/stack/data/CA/int-ca/devstack-cert.crt /opt/stack/data/CA/int-ca/ca-chain.pem
2023-01-11 09:24:12.367707 | controller | ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No such file or directory)


from service status:-
● ovn-ovsdb-server-nb.service - Open vSwitch database server for OVN Northbound database
     Loaded: loaded (/lib/systemd/system/ovn-ovsdb-server-nb.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2023-01-11 09:24:11 UTC; 47s ago
   Main PID: 96196 (ovsdb-server)
      Tasks: 1 (limit: 9475)
     Memory: 2.9M
     CGroup: /system.slice/ovn-ovsdb-server-nb.service
             └─96196 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-nb.log --remote=punix:/var/run/ovn/ovnnb_db.sock --pidfile=/var/run/ovn/ovnnb_db.pid --unixctl=/var/run/ovn/ovnnb_db.ctl --remote=db:OVN_Northbound,NB_Global,connections --private-key=db:OVN_Northbound,SSL,private_key --certificate=db:OVN_Northbound,SSL,certificate --ca-cert=db:OVN_Northbound,SSL,ca_cert --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers /var/lib/ovn/ovnnb_db.db

Jan 11 09:24:11 ubuntu-focal-inmotion-iad3-0032697188 systemd[1]: Started Open vSwitch database server for OVN Northbound database.
Jan 11 09:24:11 ubuntu-focal-inmotion-iad3-0032697188 ovn-ctl[96196]:  * /var/lib/ovn/ovnnb_db.db does not exist
Jan 11 09:24:13 ubuntu-focal-inmotion-iad3-0032697188 ovn-ctl[96196]:  * Creating empty database /var/lib/ovn/ovnnb_db.db
Jan 11 09:24:13 ubuntu-focal-inmotion-iad3-0032697188 ovsdb-server[96196]: ovs|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
Jan 11 09:24:13 ubuntu-focal-inmotion-iad3-0032697188 ovsdb-server[96196]: ovs|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.8
Jan 11 09:24:23 ubuntu-focal-inmotion-iad3-0032697188 ovsdb-server[96196]: ovs|00003|memory|INFO|7340 kB peak resident set size after 10.0 seconds
Jan 11 09:24:23 ubuntu-focal-inmotion-iad3-0032697188 ovsdb-server[96196]: ovs|00004|memory|INFO|cells:31 monitors:2 sessions:1


from an another ex https://zuul.openstack.org/build/b8761291279548239343257ae2e02fc7 it's more clear(service restart triggered at 10:45:46.189787, service stops at 10:45:49 and service ready at 10:46:05, wait for sock files succeeds before service stops 10:45:48):-
2023-01-12 10:45:46.189787 | controller | + lib/neutron_plugins/ovn_agent:_start_process:239 :   sudo systemctl restart ovn-central.service
2023-01-12 10:45:48.159236 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:711 :   wait_for_sock_file /var/run/ovn/ovnnb_db.sock
2023-01-12 10:45:48.161673 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:186 :   local count=0
2023-01-12 10:45:48.166417 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 :   '[' '!' -S /var/run/ovn/ovnnb_db.sock ']'
2023-01-12 10:45:48.167425 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:712 :   wait_for_sock_file /var/run/ovn/ovnsb_db.sock
2023-01-12 10:45:48.169822 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:186 :   local count=0
2023-01-12 10:45:48.172787 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 :   '[' '!' -S /var/run/ovn/ovnsb_db.sock ']'
2023-01-12 10:45:48.175211 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:713 :   wait_for_db_file /var/lib/ovn/ovnnb_db.db
2023-01-12 10:45:48.177991 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:175 :   local count=0
2023-01-12 10:45:48.181047 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 :   '[' '!' -f /var/lib/ovn/ovnnb_db.db ']'
2023-01-12 10:45:48.183699 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:177 :   sleep 1
2023-01-12 10:45:49.188939 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:178 :   count=1
2023-01-12 10:45:49.191707 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:179 :   '[' 1 -gt 40 ']'
2023-01-12 10:45:49.194073 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 :   '[' '!' -f /var/lib/ovn/ovnnb_db.db ']'
2023-01-12 10:45:49.196129 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:177 :   sleep 1
2023-01-12 10:45:50.200890 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:178 :   count=2
2023-01-12 10:45:50.203234 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:179 :   '[' 2 -gt 40 ']'
2023-01-12 10:45:50.205212 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 :   '[' '!' -f /var/lib/ovn/ovnnb_db.db ']'
2023-01-12 10:45:50.207584 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:714 :   wait_for_db_file /var/lib/ovn/ovnsb_db.db
2023-01-12 10:45:50.209988 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:175 :   local count=0
2023-01-12 10:45:50.212481 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 :   '[' '!' -f /var/lib/ovn/ovnsb_db.db ']'
2023-01-12 10:45:50.215127 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:716 :   is_service_enabled tls-proxy
2023-01-12 10:45:50.232297 | controller | + functions-common:is_service_enabled:2089 :   return 0
2023-01-12 10:45:50.234744 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:717 :   sudo ovn-nbctl --db=unix:/var/run/ovn/ovnnb_db.sock set-ssl /opt/stack/data/CA/int-ca/private/devstack-cert.key /opt/stack/data/CA/int-ca/devstack-cert.crt /opt/stack/data/CA/int-ca/ca-chain.pem
2023-01-12 10:45:50.244116 | controller | ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No such file or directory)

● ovn-ovsdb-server-nb.service - Open vSwitch database server for OVN Northbound database
     Loaded: loaded (/lib/systemd/system/ovn-ovsdb-server-nb.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2023-01-12 10:45:49 UTC; 1min 4s ago
   Main PID: 77511 (ovsdb-server)
      Tasks: 1 (limit: 9247)
     Memory: 2.8M
     CGroup: /system.slice/ovn-ovsdb-server-nb.service
             └─77511 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-nb.log --remote=punix:/var/run/ovn/ovnnb_db.sock --pidfile=/var/run/ovn/ovnnb_db.pid --unixctl=/var/run/ovn/ovnnb_db.ctl --remote=db:OVN_Northbound,NB_Global,connections --private-key=db:OVN_Northbound,SSL,private_key --certificate=db:OVN_Northbound,SSL,certificate --ca-cert=db:OVN_Northbound,SSL,ca_cert --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers /var/lib/ovn/ovnnb_db.db


Jan 12 10:45:49 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 systemd[1]: ovn-ovsdb-server-nb.service: Succeeded.
Jan 12 10:45:49 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 systemd[1]: Stopped Open vSwitch database server for OVN Northbound database.


Jan 12 10:45:49 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 systemd[1]: Started Open vSwitch database server for OVN Northbound database.
Jan 12 10:45:50 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 ovn-ctl[77511]:  * /var/lib/ovn/ovnnb_db.db does not exist
Jan 12 10:46:05 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 ovn-ctl[77511]:  * Creating empty database /var/lib/ovn/ovnnb_db.db
Jan 12 10:46:05 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 ovsdb-server[77511]: ovs|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
Jan 12 10:46:05 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 ovsdb-server[77511]: ovs|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.8
Jan 12 10:46:15 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 ovsdb-server[77511]: ovs|00003|memory|INFO|7272 kB peak resident set size after 10.0 seconds
Jan 12 10:46:15 nested-virt-ubuntu-focal-ovh-bhs1-0032709305 ovsdb-server[77511]: ovs|00004|memory|INFO|cells:31 monitors:2 sessions:1


Ensuring sock files are absent just like dbs should be able to handle this issue. Will push a patch to devstack.

** Changed in: neutron
   Importance: Undecided => High

** Also affects: devstack
   Importance: Undecided
       Status: New

** Changed in: devstack
     Assignee: (unassigned) => yatin (yatinkarel)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2002629

Title:
  devstack build in the gate fails with: ovnnb_db.sock: database
  connection failed

Status in devstack:
  New
Status in neutron:
  New

Bug description:
  Recently we seem to have many the same devstack build failure in many
  different gate jobs. The usual error message is:

  + lib/neutron_plugins/ovn_agent:start_ovn:714 :   wait_for_db_file /var/lib/ovn/ovnsb_db.db
  + lib/neutron_plugins/ovn_agent:wait_for_db_file:175 :   local count=0
  + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 :   '[' '!' -f /var/lib/ovn/ovnsb_db.db ']'
  + lib/neutron_plugins/ovn_agent:start_ovn:716 :   is_service_enabled tls-proxy
  + functions-common:is_service_enabled:2089 :   return 0
  + lib/neutron_plugins/ovn_agent:start_ovn:717 :   sudo ovn-nbctl --db=unix:/var/run/ovn/ovnnb_db.sock set-ssl /opt/stack/data/CA/int-ca/private/devstack-cert.key /opt/stack/data/CA/int-ca/devstack-cert.crt /opt/stack/data/CA/int-ca/ca-chain.pem
  ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No such file or directory)
  + lib/neutron_plugins/ovn_agent:start_ovn:1 :   exit_trap

  A few example logs:

  https://zuul.opendev.org/t/openstack/build/ec852d75c8094afcb4140871bc9ffa36
  https://zuul.opendev.org/t/openstack/build/eae988aa8cd24c78894a3d3438392357

  The search expression 'message:"ovnnb_db.sock: database connection
  failed"' gives me 1200+ hits in https://opensearch.logs.openstack.org
  for the last 2 weeks.

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/2002629/+subscriptions



References