openstack team mailing list archive

Re: VM can't ping self floating IP after a snapshot is taken

 

I have fixed it here: https://review.openstack.org/#/c/11925/

2012/8/25 Sam Su <susltd.su@xxxxxxxxx>:
> Hi,
>
> I also reported this bug:
>  https://bugs.launchpad.net/nova/+bug/1040255
>
> If someone can combine your solutions into a complete fix for this bug,
> that would be great.
>
> BRs,
> Sam
>
>
> On Thu, Aug 23, 2012 at 9:27 PM, heut2008 <heut2008@xxxxxxxxx> wrote:
>>
>> This bug has been filed here: https://bugs.launchpad.net/nova/+bug/1040537
>>
>> 2012/8/24 Vishvananda Ishaya <vishvananda@xxxxxxxxx>:
>> > +1 to this. Evan, can you report a bug (if one hasn't been reported yet)
>> > and propose the fix? Or else I can find someone else to propose it.
>> >
>> > Vish
>> >
>> > On Aug 23, 2012, at 1:38 PM, Evan Callicoat <diopter@xxxxxxxxx> wrote:
>> >
>> > Hello all!
>> >
>> > I'm the original author of the hairpin patch, and things have changed a
>> > little bit in Essex and Folsom from the original Diablo target. I believe
>> > I can shed some light on what should be done here to solve the issue in
>> > either case.
>> >
>> > ---
>> > For Essex (stable/essex), in nova/virt/libvirt/connection.py:
>> > ---
>> >
>> > Currently _enable_hairpin() is only being called from spawn(). However,
>> > spawn() is not the only place that vifs (veth#) get added to a bridge
>> > (which is when we need to enable hairpin_mode on them). The more relevant
>> > function is _create_new_domain(), which is called from spawn() and other
>> > places. Without changing the information that gets passed to
>> > _create_new_domain() (which is just 'xml' from to_xml()), we can easily
>> > rewrite the first 2 lines in _enable_hairpin(), as follows:
>> >
>> > def _enable_hairpin(self, xml):
>> >     interfaces = self.get_interfaces(xml['name'])
>> >
>> > Then, we can move the self._enable_hairpin(instance) call from spawn() up
>> > into _create_new_domain(), and pass it xml as follows:
>> >
>> > [...]
>> > self._enable_hairpin(xml)
>> > return domain
>> >
>> > This will run the hairpin code every time a domain gets created, which is
>> > also when the domain's vif(s) gets inserted into the bridge with the
>> > default of hairpin_mode=0.
>> >
>> > ---
>> > For Folsom (trunk), in nova/virt/libvirt/driver.py:
>> > ---
>> >
>> > There've been a lot more changes made here, but the same strategy as above
>> > should work. Here, _create_new_domain() has been split into
>> > _create_domain() and _create_domain_and_network(), and _enable_hairpin()
>> > was moved from spawn() to _create_domain_and_network(), which seems like
>> > it'd be the right thing to do, but doesn't quite cover all of the cases of
>> > vif reinsertion, since _create_domain() is the only function which
>> > actually creates the domain (_create_domain_and_network() just calls it
>> > after doing some pre-work). The solution here is likewise fairly simple;
>> > make the same 2 changes to _enable_hairpin():
>> >
>> > def _enable_hairpin(self, xml):
>> >     interfaces = self.get_interfaces(xml['name'])
>> >
>> > And move it from _create_domain_and_network() to _create_domain(), like
>> > before:
>> >
>> > [...]
>> > self._enable_hairpin(xml)
>> > return domain
>> >
>> > I haven't yet tested this on my Essex clusters and I don't have a Folsom
>> > cluster handy at present, but the change is simple and makes sense.
>> > Looking at to_xml() and _prepare_xml_info(), it appears that the 'xml'
>> > variable _create_[new_]domain() gets is just a Python dictionary, and
>> > xml['name'] = instance['name'], exactly what _enable_hairpin() was using
>> > the 'instance' variable for previously.
>> >
>> > Let me know if this works, or doesn't work, or doesn't make sense, or if
>> > you need an address to send gifts, etc. Hope it's solved!
>> >
>> > -Evan
>> >
>> > On Thu, Aug 23, 2012 at 11:20 AM, Sam Su <susltd.su@xxxxxxxxx> wrote:
>> >>
>> >> Hi Oleg,
>> >>
>> >> Thank you for your investigation. Good luck!
>> >>
>> >> Can you let me know if you find out how to fix the bug?
>> >>
>> >> Thanks,
>> >> Sam
>> >>
>> >> On Wed, Aug 22, 2012 at 12:50 PM, Oleg Gelbukh <ogelbukh@xxxxxxxxxxxx>
>> >> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> Is it possible that, during snapshotting, libvirt tears down the
>> >>> virtual interface at some point and then re-creates it with
>> >>> hairpin_mode disabled again?
>> >>> This bugfix [https://bugs.launchpad.net/nova/+bug/933640] implies that
>> >>> the fix only takes effect when an instance is spawned. This means that
>> >>> upon resume after a snapshot, hairpin is not restored. Inserting an
>> >>> _enable_hairpin() call into the snapshot procedure might help.
>> >>> We're currently investigating this issue in one of our environments and
>> >>> hope to come up with an answer by tomorrow.
>> >>>
>> >>> --
>> >>> Best regards,
>> >>> Oleg
>> >>>
>> >>> On Wed, Aug 22, 2012 at 11:29 PM, Sam Su <susltd.su@xxxxxxxxx> wrote:
>> >>>>
>> >>>> My friend has found a way to let the VM ping itself again once this
>> >>>> problem occurs, but has not found why it happens:
>> >>>> echo 1 | sudo tee \
>> >>>> /sys/class/net/br1000/brif/<virtual-interface-name>/hairpin_mode
>> >>>>
>> >>>> I filed a ticket to report this problem:
>> >>>> https://bugs.launchpad.net/nova/+bug/1040255
>> >>>>
>> >>>> Hopefully someone can find out why this happens and solve it.
>> >>>>
>> >>>> Thanks,
>> >>>> Sam
>> >>>>
>> >>>>
>> >>>> On Fri, Jul 20, 2012 at 3:50 PM, Gabriel Hurley
>> >>>> <Gabriel.Hurley@xxxxxxxxxx> wrote:
>> >>>>>
>> >>>>> I ran into some similar issues with the _enable_hairpin() call. The
>> >>>>> call is allowed to fail silently and (in my case) was failing. I
>> >>>>> couldn’t
>> >>>>> for the life of me figure out why, though, and since I’m really not
>> >>>>> a
>> >>>>> networking person I didn’t trace it along too far.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> Just thought I’d share my similar pain.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> -          Gabriel
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> From:
>> >>>>> openstack-bounces+gabriel.hurley=nebula.com@xxxxxxxxxxxxxxxxxxx
>> >>>>>
>> >>>>> [mailto:openstack-bounces+gabriel.hurley=nebula.com@xxxxxxxxxxxxxxxxxxx] On
>> >>>>> Behalf Of Sam Su
>> >>>>> Sent: Thursday, July 19, 2012 11:50 AM
>> >>>>> To: Brian Haley
>> >>>>> Cc: openstack
>> >>>>> Subject: Re: [Openstack] VM can't ping self floating IP after a
>> >>>>> snapshot is taken
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> Thank you for your support.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> I checked nova/virt/libvirt/connection.py, and the statement
>> >>>>> self._enable_hairpin(instance) has already been added to the
>> >>>>> _hard_reboot() function.
>> >>>>>
>> >>>>> It looks like taking a snapshot differs in some way from rebooting
>> >>>>> an instance. I tried to figure out how to fix this bug but failed.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> It would be much appreciated if anyone could give some hints.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> Thanks,
>> >>>>>
>> >>>>> Sam
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Jul 19, 2012 at 8:37 AM, Brian Haley <brian.haley@xxxxxx>
>> >>>>> wrote:
>> >>>>>
>> >>>>> On 07/17/2012 05:56 PM, Sam Su wrote:
>> >>>>> > Hi,
>> >>>>> >
>> >>>>> > This always happens in the Essex release. After I take a snapshot
>> >>>>> > of my VM (I tried Ubuntu 12.04 and CentOS 5.8), the VM can't ping
>> >>>>> > its own floating IP; before I take the snapshot, though, it can.
>> >>>>> >
>> >>>>> > This looks closely related to
>> >>>>> > https://bugs.launchpad.net/nova/+bug/933640, but it is still a
>> >>>>> > little different. In 933640, it sounds like the VM can't ping its
>> >>>>> > own floating IP regardless of whether we take a snapshot or not.
>> >>>>> >
>> >>>>> > Any suggestions for an easy fix? And what is the root cause of the
>> >>>>> > problem?
>> >>>>>
>> >>>>> It might be because there's a missing _enable_hairpin() call in the
>> >>>>> reboot()
>> >>>>> function.  Try something like this...
>> >>>>>
>> >>>>> nova/virt/libvirt/connection.py, _hard_reboot():
>> >>>>>
>> >>>>>              self._create_new_domain(xml)
>> >>>>> +            self._enable_hairpin(instance)
>> >>>>>              self.firewall_driver.apply_instance_filter(instance,
>> >>>>>                                                         network_info)
>> >>>>>
>> >>>>> At least that's what I remember doing myself recently when testing
>> >>>>> after a
>> >>>>> reboot, don't know about snapshot.
>> >>>>>
>> >>>>> Folsom has changed enough that something different would need to be
>> >>>>> done there.
>> >>>>>
>> >>>>> -Brian
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> _______________________________________________
>> >>>> Mailing list: https://launchpad.net/~openstack
>> >>>> Post to     : openstack@xxxxxxxxxxxxxxxxxxx
>> >>>> Unsubscribe : https://launchpad.net/~openstack
>> >>>> More help   : https://help.launchpad.net/ListHelp
>> >>>>
>> >>>
>> >>
>> >>
>> >
>>
>
>
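
For reference, the manual workaround quoted above (writing 1 to
hairpin_mode under /sys/class/net/<bridge>/brif/<interface>/) can be
sketched in Python roughly as follows. This is only an illustration and not
nova's actual _enable_hairpin() code: the function name enable_hairpin, its
sysfs_root parameter and the interface names vnet0/vnet1 are made up for
the example. Unlike the silently failing call mentioned in the thread, this
version returns the interfaces it could not update.

```python
import os
import tempfile


def enable_hairpin(bridge, interfaces, sysfs_root="/sys/class/net"):
    """Write "1" to hairpin_mode for each interface attached to `bridge`.

    Returns the list of interfaces that could not be updated, so failures
    are visible to the caller instead of being silently ignored.
    """
    failed = []
    for iface in interfaces:
        path = os.path.join(sysfs_root, bridge, "brif", iface, "hairpin_mode")
        try:
            with open(path, "w") as f:
                f.write("1")
        except OSError:
            # Missing port, missing permissions, etc.
            failed.append(iface)
    return failed


if __name__ == "__main__":
    # Demonstrate against a throwaway directory tree, not the real /sys.
    with tempfile.TemporaryDirectory() as root:
        port = os.path.join(root, "br1000", "brif", "vnet0")
        os.makedirs(port)
        with open(os.path.join(port, "hairpin_mode"), "w") as f:
            f.write("0")
        # vnet1 (hypothetical name) has no entry, so it is reported as failed.
        print(enable_hairpin("br1000", ["vnet0", "vnet1"], sysfs_root=root))
```

On a real host this needs root privileges and the default sysfs_root; the
temporary directory is only there so the sketch can be run safely.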

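The structural point made in the thread, that the hairpin call belongs in
the one function every code path uses to create a domain, can be shown with
a toy class. Nothing here is nova code: FakeDriver and its methods are
hypothetical stand-ins, and the assumption that a snapshot re-creates the
domain is taken from the suspicion voiced earlier in the thread, not from
verified behaviour.

```python
class FakeDriver:
    """Toy stand-in for the libvirt driver; not nova's real class."""

    def __init__(self):
        self.hairpin_calls = []

    def _enable_hairpin(self, name):
        # The real method would set hairpin_mode on the instance's vifs;
        # here we only record that it was invoked.
        self.hairpin_calls.append(name)

    def _create_domain(self, name):
        # Every path that (re)creates a domain passes through here, so
        # this is the single place the hairpin call needs to live.
        domain = {"name": name}
        self._enable_hairpin(name)
        return domain

    def spawn(self, name):
        return self._create_domain(name)

    def hard_reboot(self, name):
        return self._create_domain(name)

    def snapshot(self, name):
        # Assumption for illustration: the domain is re-created afterwards.
        return self._create_domain(name)


if __name__ == "__main__":
    driver = FakeDriver()
    driver.spawn("instance-1")
    driver.hard_reboot("instance-1")
    driver.snapshot("instance-1")
    print(len(driver.hairpin_calls))  # one hairpin call per domain creation
```

With the call in spawn() only, the reboot and snapshot paths would leave
hairpin_calls untouched, which mirrors the symptom reported here.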
