openstack team mailing list archive

Openstack networking failure after server reboot

 

Hi Stackers,

We have a small OpenStack lab using three servers. The components are
distributed as follows:
1. Network controller - Quantum L3 & DHCP agents, L2 agent, Nova,
Open vSwitch
2. Cloud controller - Quantum server, L2 agent, Nova, Open vSwitch,
Dashboard, API, MySQL, RabbitMQ
3. Compute node - Nova, Open vSwitch, L2 agent

The network is set up in the following way:
1. Each server has 4 NICs. We are using only one public IP and one
private IP for the OpenStack setup. We have a private switch for
inter-VM communication.
2. We are using GRE tunnelling and Open vSwitch.
3. br-int is assigned an IP address.
4. br-ex is configured for floating IP allocation.

Everything works perfectly when we set it up from scratch.

Each VM gets its private IP assigned, the NAT-based floating IP is
also assigned, and we are able to SSH into the VM. VMs are also
created on all three hosts.

So we are confident that we have the right configuration in place, as
we have a fully operational OpenStack implementation using GRE tunnels.

To test the resilience of the setup, we rebooted the servers to see
whether everything would come up again. We ran into service dependency
errors, so after the reboot we restarted the services by hand in what
we believe is the proper order. On the cloud controller we started
mysql, rabbitmq, keystone, openvswitch and quantum-server. Next, on
the network controller, we started openvswitch, the L3 agent, the DHCP
agent and the L2 agent. After that we started the L2 agents on the
remaining servers, followed by nova. We are still unsure how to
orchestrate the correct start order of the services; this is something
we will need to work on in future.
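
To make the start order described above easier to discuss, here is a
minimal Python sketch that derives an order from a dependency map. The
service labels and the dependency edges are only our reading of the
sequence we used, not official init script names or documented
requirements, and the map is flattened across hosts (openvswitch
actually runs on every server).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: service -> services that must be up first.
# These edges mirror the manual order we used; they are assumptions.
DEPENDS_ON = {
    "mysql": [],
    "rabbitmq": [],
    "openvswitch": [],
    "keystone": ["mysql"],
    "quantum-server": ["mysql", "rabbitmq", "keystone", "openvswitch"],
    "quantum-l3-agent": ["openvswitch", "quantum-server"],
    "quantum-dhcp-agent": ["openvswitch", "quantum-server"],
    "quantum-l2-agent": ["openvswitch", "quantum-server"],
    "nova": ["rabbitmq", "quantum-l2-agent"],
}


def start_order(depends_on):
    """Return one valid start order in which dependencies come first."""
    return list(TopologicalSorter(depends_on).static_order())
```

Calling start_order(DEPENDS_ON) returns one order that satisfies every
edge; a circular dependency in the map raises graphlib.CycleError
instead of silently producing a bad order.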

After this, nova works properly: we are able to create VMs, and the
pre-existing ones are started as well (virsh list also shows the VMs).
ovs-vsctl shows all the interfaces as before. However, we are unable
to reach the VMs. On logging into a VM we do not see any IP address
assigned, as the VM is unable to contact the DHCP server.
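
For reference, below is a rough Python sketch of the state we can
collect on each host to compare a freshly built setup with a rebooted
one. The command list is an assumption about what is relevant here:
br-tun is the tunnel bridge name we assume the Open vSwitch plugin
uses, and "ip netns list" only shows something if the agents are
configured to use network namespaces.

```python
import shutil
import subprocess

# Assumed set of checks; adjust bridge names to the local setup.
CHECKS = [
    ["ovs-vsctl", "show"],                  # bridges, ports, GRE interfaces
    ["ovs-ofctl", "dump-flows", "br-tun"],  # flows on the (assumed) tunnel bridge
    ["ip", "netns", "list"],                # agent namespaces, if enabled
    ["ip", "addr", "show"],                 # addresses on br-int, br-ex, tap devices
]


def run_checks(checks=CHECKS, timeout=10):
    """Run each command and return {command line: output or status note}."""
    results = {}
    for argv in checks:
        key = " ".join(argv)
        # Skip commands that are not installed on this host.
        if shutil.which(argv[0]) is None:
            results[key] = "SKIPPED: command not found"
            continue
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
        except (OSError, subprocess.TimeoutExpired) as exc:
            results[key] = f"ERROR: {exc}"
            continue
        if proc.returncode == 0:
            results[key] = proc.stdout
        else:
            results[key] = "FAILED: " + proc.stderr.strip()
    return results
```

Saving the result of run_checks() before and after a reboot and
diffing the two should show whether ports, flows or addresses went
missing. It needs to run as root for the ovs-* commands to work.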

The questions that come up are:
* What could change after a reboot that would compromise a previously
working network configuration?
* Could there be issues with the TAP interfaces that were created?
What is the best way to troubleshoot such a situation?
* Has anyone seen similar behaviour, and is it specific to GRE
tunnels? Or is it specific to Open vSwitch, which we are using?
* On reboot of the network controller, are any steps required to
ensure that OpenStack continues to function properly?

The setup has failed twice on reboot. For the second iteration we are
assigning the IP to br-int at startup so that Open vSwitch does not
give errors.

Regards
Aniruddha

