
yahoo-eng-team team mailing list archive

[Bug 1371812] [NEW] n-cpu timeout on rpc allocating network

 

Public bug reported:

During a server create, when the ec2 metadata triggers the async
creation of the network information for the server, n-cpu times out
waiting for the rpc response from n-net, causing the server to go into
an error state. See:

http://logs.openstack.org/27/122527/2/check/check-tempest-dsvm-full/0258bda/logs/screen-n-cpu.txt.gz#_2014-09-19_17_54_32_139

The n-net logs for the same req-id show that n-net got the allocate
network call at:

http://logs.openstack.org/27/122527/2/check/check-tempest-dsvm-full/0258bda/logs/screen-n-net.txt.gz#_2014-09-19_17_53_11_153

and completed it successfully a little over 1 min later at:

http://logs.openstack.org/27/122527/2/check/check-tempest-dsvm-full/0258bda/logs/screen-n-net.txt.gz#_2014-09-19_17_54_32_681

and there is no other indication as to why n-cpu raised the timeout. The
rabbit logs don't show an error around that time:

http://logs.openstack.org/27/122527/2/check/check-tempest-dsvm-full/0258bda/logs/rabbitmq/clarkbtest.txt.gz
(ignore the unusual path; it's the rabbit log for that run)

My guess is that the timeout for that rpc call was 60 sec, that taking
a little longer caused the timeout, and that this was just run on a
slow node.
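
To sanity-check that guess, here is a rough sketch that computes the
gaps between the three log anchors linked above. The timestamps are
taken from the URL fragments; the 60 sec value is only my guess at the
rpc timeout, not something read from the job's config, and the variable
names are made up for illustration.

```python
from datetime import datetime

# Log anchors look like "_2014-09-19_17_53_11_153" (milliseconds last).
ANCHOR_FORMAT = "%Y-%m-%d_%H_%M_%S_%f"

# Assumption: the rpc reply timeout guessed at in this report.
ASSUMED_RPC_TIMEOUT_SEC = 60.0


def parse_anchor(anchor):
    """Turn a log URL anchor into a datetime."""
    return datetime.strptime(anchor.lstrip("_"), ANCHOR_FORMAT)


# Anchors copied from the three log links in this report.
n_net_received = parse_anchor("_2014-09-19_17_53_11_153")
n_cpu_timeout = parse_anchor("_2014-09-19_17_54_32_139")
n_net_finished = parse_anchor("_2014-09-19_17_54_32_681")

# How long n-net spent on the allocate network call.
n_net_duration = (n_net_finished - n_net_received).total_seconds()

# How long after the timeout was logged n-net actually finished.
finished_after_timeout = (n_net_finished - n_cpu_timeout).total_seconds()

print("n-net handled the call in %.3f sec" % n_net_duration)
print("n-net finished %.3f sec after n-cpu logged the timeout"
      % finished_after_timeout)
print("longer than the assumed timeout: %s"
      % (n_net_duration > ASSUMED_RPC_TIMEOUT_SEC))
```

By these anchors n-net took about 81.5 sec on the call and finished
roughly half a second after n-cpu logged the timeout, which fits the
slow node theory if the timeout really was around 60 sec.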

Looking at logstash for:

message:"MessagingTimeout: Timed out waiting for a reply to message" AND
message:"ec2utils.get_ip_info_for_instance_from_nw_info" AND
tags:"screen-n-cpu.txt"

only shows one hit in the past 7 days, so it's probably not super
prolific.

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1371812

Title:
  n-cpu timeout on rpc allocating network

Status in OpenStack Compute (Nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1371812/+subscriptions

