openstack team mailing list archive

Cannot ping or even ARP VM by its fixed IP - and solution

To: openstack@xxxxxxxxxxxxxxxxxxx
From: Eugene Kirpichov <ekirpichov@xxxxxxxxx>
Date: Thu, 19 Jul 2012 20:35:47 -0700
Hi,

This is a tale of a problem and its solution. I'm posting it to the
mailing list to make it googlable by someone who experiences the same
problem. This is why I'm also packing it with SEO keywords.

===The only thing I'd like to ask the community=== is to help me
identify the proper places in the documentation and wiki that could be
improved based on this experience. Just point me there, and I'll improve them.

* I suffered from not knowing what the network is supposed to look
like in each networking mode: which interfaces there should be, how
they should be configured (e.g. which should have an IP address and
which shouldn't), and which pings should work and which shouldn't. Also,
which piece of code is supposed to perform each part of this network
configuration, on whose behalf, and when.
* I suffered from the lack of a troubleshooting guide.

Now, to the actual business.

I hereby glorify the person with IRC nick Diopter, who spent a lot of
time helping me resolve this problem. Many of the things I say "I
tried" below, I tried at his suggestion.

== Environment ==
I have a few machines (actually VMs, but that doesn't matter): a
controller and a couple of compute nodes.
They are connected by two networks: 1) an internal network on eth1,
2) the VM network (10.0.0.0/24) on eth2.
I've freshly deployed OpenStack on them in Flat DHCP mode
(FlatDHCPManager) following the recipes from
https://github.com/puppetlabs/puppetlabs-openstack.

I've imported the Cirros image I got from
http://wiki.openstack.org/GettingImages [as it turns out, incorrectly]
and "nova boot"-ed an instance with the IP 10.0.0.3. The instance
status was ACTIVE.

The network configuration looked exactly as it's supposed to look in
flat DHCP mode: all machines had br100 attached to eth2, and the VM's
vnet0 was attached to br100 as well. None of br100, eth2, vnet0 had an
IP address (this is correct too).
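On Linux, the kernel lists every port of a bridge as an entry under /sys/class/net/<bridge>/brif/, so the bridge layout described above can be checked mechanically rather than by eyeballing brctl output. A minimal sketch; the helper names and the expected port set {eth2, vnet0} are hypothetical, taken from this particular setup, and it only checks bridge membership, not IP addresses:

```python
from pathlib import Path


def bridge_ports(bridge: str, sysfs_root: str = "/sys/class/net") -> set[str]:
    """Return the names of the interfaces attached to a Linux bridge.

    The kernel exposes one entry per bridge port under <sysfs_root>/<bridge>/brif/.
    """
    brif = Path(sysfs_root) / bridge / "brif"
    if not brif.is_dir():
        raise FileNotFoundError(f"{bridge} does not look like a bridge: {brif} missing")
    return {entry.name for entry in brif.iterdir()}


def missing_ports(bridge: str, expected: set[str],
                  sysfs_root: str = "/sys/class/net") -> set[str]:
    """Return the expected interfaces that are NOT attached to the bridge."""
    return expected - bridge_ports(bridge, sysfs_root)


# Example (hypothetical expectation for the compute node running the VM):
#   missing_ports("br100", {"eth2", "vnet0"})  # empty set if the layout is right
```

The sysfs_root parameter exists only so the functions can be exercised against a fake directory tree instead of the live /sys.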

== Problem ==
Then I pinged the VM:
$ ping 10.0.0.3

And got "Destination Host Unreachable".

== Investigations ==
One really weird thing was that I would usually get a single ping
response per VM lifecycle. Another weird thing was that dnsmasq on the
controller was properly configured, and /var/log/syslog had DHCPREQUEST
and even DHCPACK entries for the VM's MAC and IP.
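Those DHCP entries can be pulled out of /var/log/syslog mechanically. A rough sketch, assuming dnsmasq logs lines of the form "dnsmasq-dhcp[PID]: DHCPACK(br100) 10.0.0.3 <mac> <hostname>" (the exact format may vary between dnsmasq versions); dhcp_events is a hypothetical helper. Keep in mind that a DHCPACK only proves that something using the VM's MAC asked for a lease; given the PXE boot attempt described later in this post, it may well have been the BIOS boot ROM rather than the guest OS.

```python
import re
from typing import Iterable

# Assumed dnsmasq syslog format, e.g.
#   "... dnsmasq-dhcp[1234]: DHCPACK(br100) 10.0.0.3 fa:16:3e:00:00:01 host"
# Groups: message type, interface, remainder (IP and/or MAC, hostname).
DHCP_RE = re.compile(r"dnsmasq(?:-dhcp)?\[\d+\]: (DHCP[A-Z]+)\((\S+)\) (.*)$")


def dhcp_events(lines: Iterable[str], mac: str) -> list[tuple[str, str]]:
    """Return (message type, interface) for each dnsmasq DHCP line naming `mac`."""
    mac = mac.lower()
    events = []
    for line in lines:
        match = DHCP_RE.search(line.rstrip("\n"))
        if match and mac in match.group(3).lower().split():
            events.append((match.group(1), match.group(2)))
    return events


# Example: with open("/var/log/syslog") as log: print(dhcp_events(log, vm_mac))
```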

I tried "arp 10.0.0.3" from both the controller and the VM; I also
tried "arping", but to no avail: tcpdump always showed the ARP
requests arriving where they should, but no ARP replies.
Another sad thing is that you can't catch ARP packets with iptables -j
TRACE and ipt_LOG, and arptables has no logging facilities at all.
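When tcpdump is the only window into ARP, tallying requests against replies for the VM's address makes the symptom explicit. A sketch assuming the text output of plain `tcpdump -n` (no -e or -v), whose ARP lines look like "ARP, Request who-has 10.0.0.3 tell <sender>, length 28" and "ARP, Reply 10.0.0.3 is-at <mac>, length 28"; arp_summary is a hypothetical name:

```python
import re
from typing import Iterable

# Line shapes printed by `tcpdump -n` for ARP (assumed, no -e/-v flags):
#   "... ARP, Request who-has 10.0.0.3 tell 10.0.0.1, length 28"
#   "... ARP, Reply 10.0.0.3 is-at fa:16:3e:00:00:01, length 28"
REQUEST_RE = re.compile(r"ARP, Request who-has (\S+)")
REPLY_RE = re.compile(r"ARP, Reply (\S+) is-at")


def arp_summary(lines: Iterable[str], ip: str) -> dict[str, int]:
    """Count ARP requests asking for `ip` and replies announcing `ip`."""
    counts = {"requests": 0, "replies": 0}
    for line in lines:
        request = REQUEST_RE.search(line)
        if request and request.group(1) == ip:
            counts["requests"] += 1
            continue
        reply = REPLY_RE.search(line)
        if reply and reply.group(1) == ip:
            counts["replies"] += 1
    return counts
```

Feeding it the output of `tcpdump -n -l -i br100 arp` (for example through sys.stdin) and seeing a growing request count with zero replies is the picture described above.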

I authorized the default security group to allow ICMP traffic (Horizon
-> Projects -> Security -> allow ICMP with type -1, code -1, CIDR
0.0.0.0/0). That didn't help.

Then I looked at the instance log
(/var/lib/nova/instances/instance*/console.log), and it turned out to be
empty! This was definitely not expected behavior.
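An empty console.log is a cheap, early signal that the guest never printed anything at all. A minimal sketch that lists such instances, assuming the /var/lib/nova/instances/instance*/console.log layout mentioned in this post (the function name is hypothetical):

```python
from pathlib import Path


def empty_console_logs(instances_dir: str = "/var/lib/nova/instances") -> list[Path]:
    """List console.log files of instances that have written nothing at all."""
    logs = Path(instances_dir).glob("instance*/console.log")
    # An empty log means nothing was ever written to the guest's console log,
    # for instance because no operating system ever booted.
    return sorted(log for log in logs if log.stat().st_size == 0)
```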

Since the controller and VMs were headless, I couldn't VNC to the
instance. So I took a screenshot with "virsh screenshot
instance-00000009".
The screenshot showed "Boot failed: not a bootable disk", then the PXE
boot process, then "No more network devices", and finally "No bootable
device".

== True problem and solution ==
So, it turns out the ACTUAL PROBLEM was this: I had imported the image
incorrectly (glance add name=cirros disk_format=raw
container_format=bare <cirros.img), so the VM found no bootable hard
disk.
Apparently, the sole ping that got through did so while the BIOS was
attempting a PXE boot; after that failed, nothing was listening to
network packets anymore.
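The "not a bootable disk" symptom can be anticipated before uploading by looking at the image's first sector. A heuristic sketch: qcow2 files begin with the magic bytes QFI\xfb, and a raw disk with an MBR (including the protective MBR of a GPT disk) ends its first 512-byte sector with 0x55 0xAA. This sketch assumes, without it being verified in this thread, that the .img from the -uec- tarball is a bare filesystem meant to be booted with a separate kernel and ramdisk, in which case it carries neither marker and uploading it as disk_format=raw leaves the BIOS nothing to boot. The function name is hypothetical.

```python
QCOW2_MAGIC = b"QFI\xfb"     # first 4 bytes of every qcow2 image
MBR_SIGNATURE = b"\x55\xaa"  # boot signature in bytes 510-511 of an MBR


def guess_image_kind(path: str) -> str:
    """Heuristically classify a disk image by inspecting its first sector."""
    with open(path, "rb") as image:
        first_sector = image.read(512)
    if first_sector[:4] == QCOW2_MAGIC:
        return "qcow2"
    if len(first_sector) == 512 and first_sector[510:512] == MBR_SIGNATURE:
        return "raw disk with boot sector"
    # Neither marker: could be a bare filesystem, another format, or garbage.
    return "unknown"
```

For cirros-0.3.0-x86_64-disk.img this should report qcow2, matching the disk_format=qcow2 used in the corrected import command. qemu-img info gives a more thorough answer where it is installed.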

The proper way to import the Cirros image is this:
* Download it from
https://launchpad.net/cirros/trunk/0.3.0/+download/cirros-0.3.0-x86_64-disk.img
(just getting the .img from inside the -uec- image .tar.gz at
http://wiki.openstack.org/GettingImages won't do!)
* Import it as "glance add name=cirros container_format=bare
disk_format=qcow2 < cirros-0.3.0-x86_64-disk.img"
* Then everything works fine.
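The import step can also be scripted with a guard against one specific variant of this mistake: declaring a qcow2 file as raw, or a non-qcow2 file as qcow2. A sketch, assuming the legacy glance add CLI used above, which reads the image data from stdin; the helper names are hypothetical, and this check would not have caught the original mistake of uploading a bare filesystem image as raw.

```python
import subprocess

QCOW2_MAGIC = b"QFI\xfb"  # qcow2 header magic


def glance_add_command(image_path: str, name: str, disk_format: str,
                       container_format: str = "bare") -> list[str]:
    """Build the legacy `glance add` argv, refusing an obvious qcow2 mismatch."""
    with open(image_path, "rb") as image:
        is_qcow2 = image.read(4) == QCOW2_MAGIC
    if is_qcow2 != (disk_format == "qcow2"):
        raise ValueError(
            f"{image_path}: qcow2 header present={is_qcow2}, but disk_format={disk_format}")
    return ["glance", "add", f"name={name}",
            f"container_format={container_format}", f"disk_format={disk_format}"]


def glance_add(image_path: str, name: str, disk_format: str) -> None:
    """Equivalent of: glance add name=... container_format=bare disk_format=... < image"""
    argv = glance_add_command(image_path, name, disk_format)
    with open(image_path, "rb") as image:
        subprocess.run(argv, stdin=image, check=True)  # image data goes to stdin
```

Calling glance_add("cirros-0.3.0-x86_64-disk.img", "cirros", "qcow2") runs the same command as the second bullet above.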

TO REMIND: Please help me find the places in the documentation that you
think I could improve based on this experience.

-- 
Eugene Kirpichov
http://www.linkedin.com/in/eugenekirpichov