openstack team mailing list archive
Message #13226
Re: instances losing IP address while running, due to No DHCPOFFER
Hey all,
many thanks for all your replies. I have just raised the DHCP timeouts,
which buys me enough time to sleep, so I'll apply the dnsmasq fix tomorrow.
Yes, I am running in VLAN mode, since this is also the promoted way.
Maybe OpenStack (nova-network) should check the version number of dnsmasq
and, if running in VLAN mode, issue a (critical) warning in the logs,
especially since this kind of error can lead to disasters in datacenters. :)
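
A minimal sketch of what such a check could look like, in Python since
nova-network is written in Python. The function names, the vlan_mode flag,
and the minimum version passed in are all hypothetical. This thread does not
say which dnsmasq release contains the fix, and nova-network does not
necessarily work this way.

```python
import logging
import re
import subprocess

LOG = logging.getLogger(__name__)


def parse_dnsmasq_version(version_output):
    """Extract (major, minor) from the text printed by `dnsmasq --version`.

    Returns None if no version number can be found in the text.
    """
    match = re.search(r"version\s+(\d+)\.(\d+)", version_output, re.IGNORECASE)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def warn_if_dnsmasq_too_old(version_output, min_version, vlan_mode):
    """Log a critical warning if dnsmasq is older than min_version in VLAN mode.

    min_version is a (major, minor) tuple supplied by the caller; which
    release actually carries the fix is an assumption left to the operator.
    Returns True if a warning was logged, False otherwise (including when
    the version could not be determined).
    """
    version = parse_dnsmasq_version(version_output)
    if not vlan_mode or version is None or version >= min_version:
        return False
    LOG.critical(
        "dnsmasq %d.%d is older than %d.%d; DHCP lease renewal may fail "
        "in VLAN mode", version[0], version[1], min_version[0], min_version[1])
    return True


def installed_dnsmasq_version_output():
    """Return the raw output of `dnsmasq --version` as text."""
    raw = subprocess.check_output(["dnsmasq", "--version"])
    return raw.decode("utf-8", "replace")
```

On a network node this could be called once at service start, for example
warn_if_dnsmasq_too_old(installed_dnsmasq_version_output(), min_version,
vlan_mode=True), with min_version set to whichever release is known to
include the patch.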
I also hope that Ubuntu 12.04 will pick up this patch soon enough, so that
we won't end up with a patch-dominated distribution :-)
Good night all,
Christian.
On Fri, Jun 15, 2012 at 1:16 AM, Narayan Desai <narayan.desai@xxxxxxxxx> wrote:
> I vaguely recall Vish mentioning a bug in dnsmasq that caused a somewhat
> similar problem (it had to do with lease renewal problems on IP
> aliases or something like that).
>
> This issue was particularly pronounced with Windows VMs, apparently.
> -nld
>
> On Thu, Jun 14, 2012 at 6:02 PM, Christian Parpart <trapni@xxxxxxxxx>
> wrote:
> > Hey,
> >
> > thanks for your reply. Unfortunately, neither nova-network nor dnsmasq
> > was restarted; both processes seem to have been up for about 2 and
> > 3 days.
> >
> > However, why does dhcp_lease_time default to 120s? Not overriding it
> > causes the clients to re-acquire a DHCP lease every 42 seconds
> > (at least on my nodes), which is completely ridiculous.
> > OTOH, I took a look at the sources (linux_net.py) and found out why the
> > max_lease_time is set to 2048: that is the size of my network.
> > So why is the max lease time the size of my network?
> > I've written a tiny patch to allow overriding this value in nova.conf
> > and will submit it to Launchpad soon. I hope it'll be accepted and then
> > also applied to Essex, since it is a very straightforward, helpful
> > few-liner.
> >
> > Nevertheless, that does not explain why I now had 2 (well, 3 actually)
> > instances that stopped getting DHCP replies/offers after some
> > hours/days.
> >
> > I fixed the one host that caused issues today (a few hours ago) by hard
> > rebooting the instance. However, just about 40 minutes later it again
> > forgot its IP, so one might say that it stopped getting replies from
> > the DHCP server (dnsmasq) almost right after it got a lease on
> > instance boot.
> >
> > So long,
> > Christian.
> >
> > On Thu, Jun 14, 2012 at 10:55 PM, Nathanael Burton
> > <nathanael.i.burton@xxxxxxxxx> wrote:
> >>
> >> Has nova-network been restarted? There was an issue where nova-network
> >> was signalling dnsmasq, which would cause dnsmasq to stop responding to
> >> requests yet appear to be running fine.
> >>
> >> You can see if killing dnsmasq, restarting nova-network, and rebooting
> >> an instance allows it to get a DHCP address again ...
> >>
> >> Nate
> >>
> >> On Jun 14, 2012 4:46 PM, "Christian Parpart" <trapni@xxxxxxxxx> wrote:
> >>>
> >>> Hey all,
> >>>
> >>> I'm really sad to say this: now that we have had quite a few
> >>> instances in production for at least 5 days, I have encountered the
> >>> second instance losing its IP address due to "No DHCPOFFER"
> >>> (according to syslog in the instance).
> >>>
> >>> I checked the logs on the central nova-network and gateway node and
> >>> found dnsmasq still replying to requests from all the other instances.
> >>> It even got the request from the instance in question and even sent an
> >>> OFFER, as far as I can tell by now (I'm investigating / posting logs
> >>> asap). But while it seems that dnsmasq sends an offer, the instance
> >>> says it didn't receive one - wtf?
> >>>
> >>> Please tell me what I can do to actually *fix* this issue, since this
> >>> is really fatal.
> >>>
> >>> One option I see (as a workaround) is to let newly created instances
> >>> retrieve their IP via DHCP, but then reconfigure
> >>> /etc/network/interfaces to continue with a static networking setup.
> >>> However, I'd just like the DHCP thingy to get fixed.
> >>>
> >>> I'm very open to any kind of helpful comments :)
> >>>
> >>> So long,
> >>> Christian.
> >>>
> >>>
> >>> _______________________________________________
> >>> Mailing list: https://launchpad.net/~openstack
> >>> Post to : openstack@xxxxxxxxxxxxxxxxxxx
> >>> Unsubscribe : https://launchpad.net/~openstack
> >>> More help : https://help.launchpad.net/ListHelp
> >>>