
openstack team mailing list archive

Re: DCHP Server Stops Responding

 

Vish,
 
For future reference, I have resolved the issue of the DHCP server not responding. It turns out that KVM had a bug up until v14.2 that would cause VMs to randomly lose network connectivity. The bug has been fixed in the very recent qemu-kvm v14.3 public release.
 
Bug/Fix: https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978
 
As a side note, making DHCP leases very long seems to occasionally cause networking errors during instance launch.
 
 
Justin


 
-----Original Message-----
From: "Vishvananda Ishaya" <vishvananda@xxxxxxxxx>
Sent: Friday, October 12, 2012 11:10am
To: "Justin Hurley" <justin.hurley@xxxxxxxxxxx>
Cc: openstack@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Openstack] DCHP Server Stops Responding




On Oct 12, 2012, at 10:33 AM, Justin Hurley <justin.hurley@xxxxxxxxxxx> wrote:

Hello All,
 
I am having problems with some instances not receiving a DHCPACK from my VM network's DHCP server after extended periods of time. These instances are running heavy network, I/O, and RAM loads when the DHCPREQUEST does not receive a response. Note that this only happens to a handful of instances, not to all instances running the exact same load. I have printed a relevant section of the syslog below.
 
My current environment is a multi-host flat DHCP network with nova-network running on each node. All servers and VMs are running Ubuntu 12.04 and using KVM/libvirt. I am also using large NFS servers to transfer big files to and from instances across the VM network.
 
In the past, I used Eucalyptus and the same problem would occur. This tends to happen more frequently with more VMs and overall network load. 
 
If anyone has any ideas as to why the DHCP server occasionally stops responding to only a few instances, please let me know.

I haven't seen this before, but you may be able to work around the issue by setting much longer leases:

force_dhcp_release=True
dhcp_lease_time=86400 # 1 day leases
fixed_ip_disassociate_timeout=172800 # 2 day timeout

Vish
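
For anyone trying the workaround above, here is a minimal Python sketch that sanity-checks the suggested values. It is not part of nova; the helper names (parse_flags, check_lease_settings) and the rule that the disassociate timeout should be at least as long as the lease are assumptions made for illustration, based only on the 1 day / 2 day values in the suggestion. The last flag is written with nova's spelling, fixed_ip_disassociate_timeout.

```python
# Hypothetical helper script, not part of nova. It only parses key=value
# lines and compares the two timing values from the suggestion above.

SUGGESTED = """\
force_dhcp_release=True
dhcp_lease_time=86400 # 1 day leases
fixed_ip_disassociate_timeout=172800 # 2 day timeout
"""


def parse_flags(text):
    """Parse key=value lines, dropping '#' comments and blank lines."""
    flags = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        flags[key.strip()] = value.strip()
    return flags


def check_lease_settings(flags):
    """Convert both values to hours and compare them.

    Assumption: the disassociate timeout should not be shorter than the
    lease time, as in the suggested values (2 days vs. 1 day).
    """
    lease = int(flags["dhcp_lease_time"])
    timeout = int(flags["fixed_ip_disassociate_timeout"])
    return {
        "lease_hours": lease / 3600,
        "timeout_hours": timeout / 3600,
        "timeout_covers_lease": timeout >= lease,
    }


if __name__ == "__main__":
    print(check_lease_settings(parse_flags(SUGGESTED)))
```

Pasting the relevant lines from your own configuration into SUGGESTED gives a quick check that the lease and timeout values are in seconds and in the intended proportion.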
