← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1847589] Re: Nova compute service restarts with break of rabbitmq server connection

 

[Expired for OpenStack Compute (nova) because there has been no activity
for 60 days.]

** Changed in: nova
       Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1847589

Title:
  Nova compute service restarts with break of rabbitmq server connection

Status in OpenStack Compute (nova):
  Expired

Bug description:
  The nova-compute services restarts every time when the rabbitmq
  connection is lost and established back again also we can see lost of
  error logs for timeout to message reply <message-id> There is no lot
  of connections observed at rabbitmq end

  LOgs 
  2019-10-10 10:34:31,073.073 40972 INFO nova.compute.resource_tracker [req-0ea34030-6eac-46aa-aac0-aec64d1f72a6 b04e76da460b4d4c900905a361981083 1f9ae90366ce4cd3828eb8c2
  96a2bee1 - default default] Final resource view: name=cmp023.jpe2a.jiocloud.com phys_ram=161156MB used_ram=160844MB phys_disk=886759GB used_disk=0GB total_vcpus=32 used
  _vcpus=58 pci_stats=[]
  2019-10-10 10:35:10,488.488 40972 WARNING oslo.messaging._drivers.impl_rabbit [req-0ea34030-6eac-46aa-aac0-aec64d1f72a6 b04e76da460b4d4c900905a361981083 1f9ae90366ce4cd
  3828eb8c296a2bee1 - default default] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 110] Connection timed out
  2019-10-10 10:35:14,253.253 40972 ERROR oslo.messaging._drivers.impl_rabbit [req-0ea34030-6eac-46aa-aac0-aec64d1f72a6 b04e76da460b4d4c900905a361981083 1f9ae90366ce4cd38
  28eb8c296a2bee1 - default default] [833f2e5e-13a2-40d0-96ed-f73be982853d] AMQP server on 10.157.218.30:5672 is unreachable: (0, 0): (320) CONNECTION_FORCED - broker for
  ced connection closure with reason 'shutdown'. Trying again in 1 seconds. Client port: None: ConnectionForced: (0, 0): (320) CONNECTION_FORCED - broker forced connectio
  n closure with reason 'shutdown'
  2019-10-10 10:35:15,260.260 40972 ERROR oslo.messaging._drivers.impl_rabbit [req-0ea34030-6eac-46aa-aac0-aec64d1f72a6 b04e76da460b4d4c900905a361981083 1f9ae90366ce4cd38
  28eb8c296a2bee1 - default default] [833f2e5e-13a2-40d0-96ed-f73be982853d] AMQP server on 10.157.218.30:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1
  seconds. Client port: None: error: [Errno 111] ECONNREFUSED
  2019-10-10 10:35:16,979.979 40972 INFO oslo.messaging._drivers.impl_rabbit [req-0ea34030-6eac-46aa-aac0-aec64d1f72a6 b04e76da460b4d4c900905a361981083 1f9ae90366ce4cd3828eb8c296a2bee1 - default default] [833f2e5e-13a2-40d0-96ed-f73be982853d] Reconnected to AMQP server on 10.157.218.32:5672 via [amqp] client with port 56002.
  2019-10-10 10:35:39,068.068 40972 ERROR oslo.messaging._drivers.impl_rabbit [req-0ea34030-6eac-46aa-aac0-aec64d1f72a6 b04e76da460b4d4c900905a361981083 1f9ae90366ce4cd3828eb8c296a2bee1 - default default] [e28c93ab-7cf1-47f9-b37f-933c23881d0b] AMQP server 10.157.218.30:5672 closed the connection. Check login credentials: Socket closed: IOError: Socket closed
  2019-10-10 10:35:40,079.079 40972 INFO oslo.messaging._drivers.impl_rabbit [req-0ea34030-6eac-46aa-aac0-aec64d1f72a6 b04e76da460b4d4c900905a361981083 1f9ae90366ce4cd3828eb8c296a2bee1 - default default] [e28c93ab-7cf1-47f9-b37f-933c23881d0b] Reconnected to AMQP server on 10.157.218.30:5672 via [amqp] client with port 39778.
  2019-10-10 10:35:40,084.084 40972 WARNING oslo.service.loopingcall [req-0ea34030-6eac-46aa-aac0-aec64d1f72a6 b04e76da460b4d4c900905a361981083 1f9ae90366ce4cd3828eb8c296a2bee1 - default default] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1.02 sec
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task [req-0ea34030-6eac-46aa-aac0-aec64d1f72a6 b04e76da460b4d4c900905a361981083 1f9ae90366ce4cd3828eb8c296a2bee1 - default default] Error during ComputeManager._run_image_cache_manager_pass: MessagingTimeout: Timed out waiting for a reply to message ID 7d25d4ef1e4e4417a2c57c2418d5579f
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task Traceback (most recent call last):
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/dist-packages/oslo_service/periodic_task.py", line 220, in run_periodic_tasks
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task     task(self, context)
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 7137, in _run_image_cache_manager_pass
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task     filters, expected_attrs=[], use_slave=True)
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 177, in wrapper
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task     args, kwargs)
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/dist-packages/nova/conductor/rpcapi.py", line 240, in object_class_action_versions
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task     args=args, kwargs=kwargs)
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 169, in call
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task     retry=self.retry)
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 566, in send
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task     retry=retry)
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 555, in _send
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task     result = self._waiter.wait(msg_id, timeout)
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 447, in wait
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task     message = self.waiters.get(msg_id, timeout=timeout)
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 335, in get
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task     'to message ID %s' % msg_id)
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task MessagingTimeout: Timed out waiting for a reply to message ID 7d25d4ef1e4e4417a2c57c2418d5579f
  2019-10-10 11:35:32,542.542 40972 ERROR oslo_service.periodic_task
  2019-10-10 11:35:40,090.090 40972 WARNING nova.servicegroup.drivers.db [req-0ea34030-6eac-46aa-aac0-aec64d1f72a6 b04e76da460b4d4c900905a361981083 1f9ae90366ce4cd3828eb8c296a2bee1 - default default] Lost connection to nova-conductor for reporting service status.: MessagingTimeout: Timed out waiting for a reply to message ID 7016474790f34610ae1b342fba68d76c
  2019-10-10 11:35:40,091.091 40972 WARNING oslo.service.loopingcall [req-0ea34030-6eac-46aa-aac0-aec64d1f72a6 b04e76da460b4d4c900905a361981083 1f9ae90366ce4cd3828eb8c296a2bee1 - default default] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 3540.01 sec

  rabbitmq logs:

  =INFO REPORT==== 10-Oct-2019::12:57:57 ===
  closing AMQP connection <0.11861.3449> (10.157.218.19:56030 -> 10.157.218.30:5672 - nova-conductor:2694:e54e25a3-a71a-4acb-aadd-f03eb55f71bf, vhost: '/openstack', user:

  =INFO REPORT==== 10-Oct-2019::12:57:58 ===
  closing AMQP connection <0.25155.3455> (10.157.218.17:52746 -> 10.157.218.30:5672 - nova-conductor:29315:80f5e99b-3928-456a-a2a7-5c5619af75f5, vhost: '/openstack', user

  =INFO REPORT==== 10-Oct-2019::12:57:59 ===
  closing AMQP connection <0.24950.3455> (10.157.218.19:56240 -> 10.157.218.30:5672 - nova-conductor:2694:f60b4877-868e-48b6-9e3b-48459f524fa6, vhost: '/openstack', user:

  =INFO REPORT==== 10-Oct-2019::12:58:00 ===
  closing AMQP connection <0.22926.3455> (10.157.218.17:52758 -> 10.157.218.30:5672 - nova-conductor:29315:ddb5e157-3f1a-4efe-a1c5-fac696eb7e2d, vhost: '/openstack', user

  =INFO REPORT==== 10-Oct-2019::12:58:00 ===
  closing AMQP connection <0.16489.3064> (10.157.218.17:52790 -> 10.157.218.30:5672 - nova-conductor:29315:08321a15-6be4-4b02-b21f-b0e3e84d2a3b, vhost: '/openstack', user

  =INFO REPORT==== 10-Oct-2019::12:58:01 ===
  closing AMQP connection <0.25210.3455> (10.157.218.20:42032 -> 10.157.218.30:5672 - nova-conductor:12256:9ce4f67d-403a-431b-8aff-797c08249d46, vhost: '/openstack', user

  =INFO REPORT==== 10-Oct-2019::12:58:01 ===
  closing AMQP connection <0.25236.3455> (10.157.218.17:52816 -> 10.157.218.30:5672 - nova-conductor:29315:4cd3facf-ce08-42f8-9f31-cdede790301e, vhost: '/openstack', user

  =INFO REPORT==== 10-Oct-2019::12:58:01 ===
  closing AMQP connection <0.22861.3455> (10.157.218.17:52844 -> 10.157.218.30:5672 - nova-conductor:29315:e5e8a2c0-26f4-4715-9fb7-5ca5c93fd834, vhost: '/openstack', user

  =WARNING REPORT==== 10-Oct-2019::12:58:02 ===
  closing AMQP connection <0.23015.3452> (10.157.218.123:39778 -> 10.157.218.30:5672 - nova-compute:40972:e28c93ab-7cf1-47f9-b37f-933c23881d0b, vhost: '/openstack', user:
  client unexpectedly closed TCP connection

  =INFO REPORT==== 10-Oct-2019::12:58:02 ===
  closing AMQP connection <0.25087.3455> (10.157.218.19:56536 -> 10.157.218.30:5672 - nova-conductor:2694:de4a00f2-e47c-48f2-9377-12d403344e80, vhost: '/openstack', user:

  =INFO REPORT==== 10-Oct-2019::12:58:03 ===
  closing AMQP connection <0.22518.3455> (10.157.218.17:52904 -> 10.157.218.30:5672 - nova-conductor:29315:7a085e73-1ca9-4d9a-92e5-967813aa4b9f, vhost: '/openstack', user

  =INFO REPORT==== 10-Oct-2019::12:58:03 ===
  closing AMQP connection <0.25262.3455> (10.157.218.19:56570 -> 10.157.218.30:5672 - nova-conductor:2691:c8c07352-41ed-4fa7-b3b6-69b5d882947e, vhost: '/openstack', user:

  =INFO REPORT==== 10-Oct-2019::12:58:04 ===
  closing AMQP connection <0.25349.3455> (10.157.218.20:42158 -> 10.157.218.30:5672 - nova-conductor:12256:223bf35b-3233-402d-b0a7-d367d60ae3a5, vhost: '/openstack', user

  
  (jpe2a)root@msg01:/var/log/rabbitmq# less -S rabbit@xxxxxxxxx
  tcp        0      0 10.157.218.30:5672      10.157.218.20:34556     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.106:44906    ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.20:47786     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.18:47180     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:15672     10.157.218.30:53018     TIME_WAIT   -
  tcp        0      0 10.157.218.30:5672      10.157.218.20:46864     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.20:48112     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.112:38126    ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.219.2:36536      ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.18:56980     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.125:55766    ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.19:51318     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.18:41030     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.104:40904    ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.26:51980     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.16:40410     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.17:59692     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.17:57012     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.142:36812    ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.20:46648     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.219.5:54016      ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.18:35288     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.28:46418     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.17:59110     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.102:33278    ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.17:60190     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.138:53058    ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.176:36166    ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.196:38662    ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.120:56994    ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.104:43916    ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.20:46762     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.16:33876     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.18:34454     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.18:57332     ESTABLISHED 1249/beam.smp
  tcp        0      0 10.157.218.30:5672      10.157.218.18:34900     ESTABLISHED 1249/beam.smp
  tcp6       0      0 :::4369                 :::*                    LISTEN      1/init
  tcp6       0      0 :::9126                 :::*                    LISTEN      1241/telegraf
  tcp6       0      0 10.157.218.30:9126      10.157.218.42:58696     ESTABLISHED 1241/telegraf
  tcp6       0      0 127.0.0.1:4369          127.0.0.1:40310         ESTABLISHED 1238/epmd
  tcp6       0      0 10.157.218.30:9126      10.157.218.43:39070     ESTABLISHED 1241/telegraf
  tcp6       0      0 10.157.218.30:9126      10.157.218.44:37188     ESTABLISHED 1241/telegraf
  tcp6       0      0 10.157.218.30:4369      10.157.218.30:40335     TIME_WAIT   -

  (jpe2a)root@msg01:~# ss --tcp -f inet --listening --numeric --processes
  State      Recv-Q Send-Q                                       Local Address:Port                                                      Peer Address:Port
  LISTEN     0      128                                                      *:22                                                                   *:*                   users:(("sshd",pid=1268,fd=3))
  LISTEN     0      1024                                         10.157.218.30:15672                                                                *:*                   users:(("beam.smp",pid=1249,fd=109))
  LISTEN     0      4096                                             127.0.0.1:8186                                                                 *:*                   users:(("telegraf",pid=1241,fd=3))
  LISTEN     0      128                                                      *:24231                                                                *:*                   users:(("ruby",pid=11723,fd=23))
  LISTEN     0      128                                          10.157.218.30:5672                                                                 *:*                   users:(("beam.smp",pid=1249,fd=16))
  LISTEN     0      128                                                      *:25672                                                                *:*                   users:(("beam.smp",pid=1249,fd=46))
  LISTEN     0      4096                                         10.157.218.29:5672                                                                 *:*                   users:(("haproxy",pid=1429,fd=5))

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1847589/+subscriptions


References