openstack team mailing list archive
Message #18389
Re: Openstack networking failure after server reboot
On Wed, Nov 7, 2012 at 5:52 PM, Gary Kotton <gkotton@xxxxxxxxxx> wrote:
> On 11/07/2012 11:47 AM, Aniruddha Khadkikar wrote:
>>
>> Hi Stackers,
>>
>> We have a small Openstack lab using three servers. The components are
>> distributed as:
>> 1. Network controller - Quantum L3 & DHCP, L2 agent, Nova, Openvswitch
>> 2. Cloud controller - Quantum server, L2 agent, Nova, Openvswitch,
>> Dashboard, API, MySQL, Rabbitmq
>> 3. Compute node - Nova, Openvswitch, L2 agent
>>
>> The network is set up in the following way:
>> 1. Each server has 4 NICs. We are using only one public IP and one
>> private IP for the openstack setup. We have a private switch for
>> inter-VM communication
>> 2. We are using GRE tunnelling and openvswitch
>> 3. br-int is assigned an IP address
>> 4. br-ex is configured for floating IP allocation
>>
>> Everything works perfectly when we are setting it up from scratch!!!!
>>
>> Each VM gets its private IP assigned, the NAT-based floating IP is
>> also assigned, and we are able to SSH into it.
>> The VMs also get created on all three hosts.
>>
>> So we are confident that we have the right configurations in place, as
>> we have a fully operational Openstack implementation using GRE tunnels.
>>
>> In order to test the resilience of the setup, we decided to reboot the
>> servers to see if everything comes up again. We hit some service
>> dependency errors, so after the reboot we restarted the services in
>> the proper order: on the cloud controller we started mysql, rabbitmq,
>> keystone, openvswitch and quantum-server. This was followed by
>> starting openvswitch, L3, dhcp and the L2 agent. After that we started
>> the L2 agents on all the remaining servers, followed by nova. We are
>> still unsure how to orchestrate the right order of services. This is
>> probably something we will need to work on in future.
>>
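[Editor's sketch] The start order described in the paragraph above can be
written down as data, so that it is repeatable after a reboot. This is only a
minimal Python sketch: the host labels and service labels are taken from the
message or assumed for illustration (the message does not say which host the
second group runs on, or where nova is started), and they are not real init
script names. It only builds the plan; it does not start anything.

```python
# Stages in the order described in the message. Host labels for the second
# and fourth stage are assumptions; service labels are illustrative only.
STAGES = [
    ("cloud controller",
     ["mysql", "rabbitmq", "keystone", "openvswitch", "quantum-server"]),
    ("network controller (assumed)",
     ["openvswitch", "l3-agent", "dhcp-agent", "l2-agent"]),
    ("remaining servers", ["l2-agent"]),
    ("all servers (assumed)", ["nova"]),
]


def start_plan(stages):
    """Flatten the stages into an ordered list of (step, host, service)."""
    plan = []
    for host, services in stages:
        for service in services:
            plan.append((len(plan) + 1, host, service))
    return plan


if __name__ == "__main__":
    for step, host, service in start_plan(STAGES):
        print(f"{step:2d}. {host}: start {service}")
```

Keeping the order in one list makes it easy to review and to adjust once the
real dependencies between the services are confirmed.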
>> After this, nova works properly, i.e. we are able to create VMs and
>> the pre-existing ones are also started (virsh list also shows the
>> VMs). ovs-vsctl shows all the interfaces as before. However, we are
>> unable to access the VMs. On logging into a VM we do not see any IP
>> address assigned, as the VM is unable to contact the DHCP server.
>>
>> The questions that come up are:
>> * What could change after a reboot that would compromise a running
>> network configuration?
>> * Could there be issues with the TAP interfaces created? What is the
>> best way to troubleshoot such a situation?
>> * Has anyone seen similar behaviour, and is it specific to using
>> GRE tunnels? Is it then specific to openvswitch, which we are
>> using?
>> * On reboot of the network controller are any steps required to ensure
>> that Openstack continues to function properly?
>
>
> Can you please look in the log files for Quantum and see if there are any
> errors?
>
> There is an open issue with Quantum and QPID after rebooting: the Quantum
> service hangs. On the Quantum host, if you run "netstat -an | grep 9696", do
> you see anything?
>
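[Editor's sketch] The netstat check above asks whether anything is listening
on port 9696. A rough equivalent that can be scripted is to try a TCP
connection to that port. This is a minimal Python sketch; the host address
127.0.0.1 and the 2 second timeout are assumptions, and the function name is
made up for illustration. A successful connection only shows that something
accepts connections on the port, not that the Quantum service is healthy.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 9696 is the port from the netstat command; the host is an assumption.
    print(port_is_listening("127.0.0.1", 9696))
```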
Unfortunately we have recreated the cloud again. This time, however, we have
not assigned an IP to the br-int interface.
It is currently working, and we will do the reboot today. By evening I
will provide details of the errors.
In the syslog on the network node we started seeing a lot of:
Nov 7 12:59:30 dnsmasq-dhcp[5722]: last message repeated 3 times
Nov 7 12:59:30 us000901 dnsmasq-dhcp[5746]: DHCPDISCOVER(tap224fcabc-70) fa:16:3e:52:38:ce
Nov 7 12:59:30 us000901 dnsmasq-dhcp[5722]: DHCPDISCOVER(tap7736e97e-5c) fa:16:3e:52:38:ce no address available
Nov 7 12:59:30 us000901 dnsmasq-dhcp[5746]: DHCPOFFER(tap224fcabc-70) 172.24.2.11 fa:16:3e:52:38:ce
Nov 7 12:59:30 us000901 dnsmasq-dhcp[5722]: DHCPDISCOVER(tap7736e97e-5c) fa:16:3e:52:38:ce no address available
Nov 7 12:59:39 us000901 dnsmasq-dhcp[5722]: DHCPDISCOVER(tap7736e97e-5c) fa:16:3e:52:38:ce no address available
Nov 7 12:59:39 us000901 dnsmasq-dhcp[5746]: DHCPDISCOVER(tap224fcabc-70) fa:16:3e:52:38:ce
Nov 7 12:59:39 us000901 dnsmasq-dhcp[5746]: DHCPOFFER(tap224fcabc-70) 172.24.2.11 fa:16:3e:52:38:ce
Nov 7 12:59:57 us000901 dnsmasq-dhcp[5722]: DHCPDISCOVER(tap7736e97e-5c) fa:16:3e:52:38:ce no address available
Nov 7 12:59:57 us000901 dnsmasq-dhcp[5746]: DHCPDISCOVER(tap224fcabc-70) fa:16:3e:52:38:ce
Nov 7 12:59:57 us000901 dnsmasq-dhcp[5746]: DHCPOFFER(tap224fcabc-70) 172.24.2.11 fa:16:3e:52:38:ce
While these messages are being logged, the kvm and dnsmasq processes run
at nearly 100% CPU.
The relevant part of the Quantum DHCP log is at http://pastebin.com/GmksGeK6
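[Editor's sketch] In the syslog excerpt, two dnsmasq processes (PIDs 5722 and
5746) see DHCPDISCOVER from the same MAC address on two different tap
interfaces; one sends a DHCPOFFER and the other reports "no address
available". To see that pattern in a longer log, the entries can be counted
per PID, interface and message type. This is a minimal Python sketch that
assumes each syslog entry has been joined onto a single line; the regular
expression and the function name are illustrative, not part of any Quantum or
dnsmasq tooling.

```python
import re
from collections import Counter

# Matches dnsmasq-dhcp entries such as:
#   ... dnsmasq-dhcp[5722]: DHCPDISCOVER(tap7736e97e-5c) <mac> no address available
LINE_RE = re.compile(
    r"dnsmasq-dhcp\[(?P<pid>\d+)\]:\s+"
    r"(?P<event>DHCP[A-Z]+)\((?P<iface>[^)]+)\)\s*(?P<rest>.*)"
)


def summarize(lines):
    """Count entries per (pid, interface, event, note)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # e.g. "last message repeated 3 times"
        note = "no address available" if "no address available" in m["rest"] else "-"
        counts[(m["pid"], m["iface"], m["event"], note)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Nov 7 12:59:30 us000901 dnsmasq-dhcp[5746]: "
        "DHCPDISCOVER(tap224fcabc-70) fa:16:3e:52:38:ce",
        "Nov 7 12:59:30 us000901 dnsmasq-dhcp[5722]: "
        "DHCPDISCOVER(tap7736e97e-5c) fa:16:3e:52:38:ce no address available",
    ]
    for key, n in sorted(summarize(sample).items()):
        print(n, *key)
```

A summary like this does not explain the failure by itself, but it shows
quickly which dnsmasq instance and which tap interface are refusing the
request.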
Regards
Aniruddha
>>
>> The setup has failed twice on reboot. For the second iteration we are
>> assigning the IP on startup to br-int so that openvswitch does not
>> give errors.
>>
>> Regards
>> Aniruddha
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~openstack
>> Post to : openstack@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~openstack
>> More help : https://help.launchpad.net/ListHelp