
openstack team mailing list archive

Re: VM can't ping self floating IP after a snapshot is taken

 

For stable/essex, the patch is here: https://review.openstack.org/#/c/11986/

2012/8/25 Sam Su <susltd.su@xxxxxxxxx>:
> That's great, thank you for your efforts. Can you make a backport for essex?
>
> Sent from my iPhone
>
> On Aug 24, 2012, at 7:15 PM, heut2008 <heut2008@xxxxxxxxx> wrote:
>
>> I have fixed it here  https://review.openstack.org/#/c/11925/
>>
>> 2012/8/25 Sam Su <susltd.su@xxxxxxxxx>:
>>> Hi,
>>>
>>> I also reported this bug:
>>> https://bugs.launchpad.net/nova/+bug/1040255
>>>
>>> If someone can combine your solutions and arrive at a complete fix for this
>>> bug, that would be great.
>>>
>>> BRs,
>>> Sam
>>>
>>>
>>> On Thu, Aug 23, 2012 at 9:27 PM, heut2008 <heut2008@xxxxxxxxx> wrote:
>>>>
>>>> this bug has been filed here  https://bugs.launchpad.net/nova/+bug/1040537
>>>>
>>>> 2012/8/24 Vishvananda Ishaya <vishvananda@xxxxxxxxx>:
>>>>> +1 to this. Evan, can you report a bug (if one hasn't been reported yet)
>>>>> and
>>>>> propose the fix? Or else I can find someone else to propose it.
>>>>>
>>>>> Vish
>>>>>
>>>>> On Aug 23, 2012, at 1:38 PM, Evan Callicoat <diopter@xxxxxxxxx> wrote:
>>>>>
>>>>> Hello all!
>>>>>
>>>>> I'm the original author of the hairpin patch, and things have changed a
>>>>> little bit in Essex and Folsom from the original Diablo target. I
>>>>> believe I
>>>>> can shed some light on what should be done here to solve the issue in
>>>>> either
>>>>> case.
>>>>>
>>>>> ---
>>>>> For Essex (stable/essex), in nova/virt/libvirt/connection.py:
>>>>> ---
>>>>>
>>>>> Currently _enable_hairpin() is only being called from spawn(). However,
>>>>> spawn() is not the only place that vifs (veth#) get added to a bridge
>>>>> (which
>>>>> is when we need to enable hairpin_mode on them). The more relevant
>>>>> function
>>>>> is _create_new_domain(), which is called from spawn() and other places.
>>>>> Without changing the information that gets passed to
>>>>> _create_new_domain()
>>>>> (which is just 'xml' from to_xml()), we can easily rewrite the first 2
>>>>> lines
>>>>> in _enable_hairpin(), as follows:
>>>>>
>>>>> def _enable_hairpin(self, xml):
>>>>>    interfaces = self.get_interfaces(xml['name'])
>>>>>
>>>>> Then, we can move the self._enable_hairpin(instance) call from spawn()
>>>>> up
>>>>> into _create_new_domain(), and pass it xml as follows:
>>>>>
>>>>> [...]
>>>>> self._enable_hairpin(xml)
>>>>> return domain
>>>>>
>>>>> This will run the hairpin code every time a domain gets created, which
>>>>> is
>>>>> also when the domain's vif(s) gets inserted into the bridge with the
>>>>> default
>>>>> of hairpin_mode=0.
>>>>>
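
As a rough illustration of the idea above (not nova's actual code), the toy Python class below routes every path that creates a domain through a single helper, so hairpin is re-enabled each time. The class name FakeConnection, the hard_reboot() and resume_after_snapshot() methods, and the dict standing in for a libvirt domain are all hypothetical; only _enable_hairpin(), _create_new_domain() and spawn() are names taken from this thread.

```python
class FakeConnection:
    """Toy stand-in for the libvirt connection class; not nova code."""

    def __init__(self):
        # Records each hairpin call so the behaviour can be inspected.
        self.hairpin_calls = []

    def _enable_hairpin(self, name):
        # Real code would set hairpin_mode=1 on each of the domain's vifs.
        self.hairpin_calls.append(name)

    def _create_new_domain(self, name):
        domain = {"name": name}  # placeholder for a real libvirt domain
        # Single choke point: every (re)creation re-enables hairpin.
        self._enable_hairpin(name)
        return domain

    def spawn(self, name):
        return self._create_new_domain(name)

    def hard_reboot(self, name):
        return self._create_new_domain(name)

    def resume_after_snapshot(self, name):
        return self._create_new_domain(name)
```

Because the call lives in _create_new_domain(), any later code path that recreates the domain picks up the hairpin step without further changes.
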
>>>>> ---
>>>>> For Folsom (trunk), in nova/virt/libvirt/driver.py:
>>>>> ---
>>>>>
>>>>> There've been a lot more changes made here, but the same strategy as
>>>>> above
>>>>> should work. Here, _create_new_domain() has been split into
>>>>> _create_domain()
>>>>> and _create_domain_and_network(), and _enable_hairpin() was moved from
>>>>> spawn() to _create_domain_and_network(), which seems like it'd be the
>>>>> right
>>>>> thing to do, but doesn't quite cover all of the cases of vif
>>>>> reinsertion,
>>>>> since _create_domain() is the only function which actually creates the
>>>>> domain (_create_domain_and_network() just calls it after doing some
>>>>> pre-work). The solution here is likewise fairly simple; make the same 2
>>>>> changes to _enable_hairpin():
>>>>>
>>>>> def _enable_hairpin(self, xml):
>>>>>    interfaces = self.get_interfaces(xml['name'])
>>>>>
>>>>> And move it from _create_domain_and_network() to _create_domain(), like
>>>>> before:
>>>>>
>>>>> [...]
>>>>> self._enable_hairpin(xml)
>>>>> return domain
>>>>>
>>>>> I haven't yet tested this on my Essex clusters and I don't have a Folsom
>>>>> cluster handy at present, but the change is simple and makes sense.
>>>>> Looking
>>>>> at to_xml() and _prepare_xml_info(), it appears that the 'xml' variable
>>>>> _create_[new_]domain() gets is just a python dictionary, and xml['name']
>>>>> =
>>>>> instance['name'], exactly what _enable_hairpin() was using the
>>>>> 'instance'
>>>>> variable for previously.
>>>>>
>>>>> Let me know if this works, or doesn't work, or doesn't make sense, or if
>>>>> you
>>>>> need an address to send gifts, etc. Hope it's solved!
>>>>>
>>>>> -Evan
>>>>>
>>>>> On Thu, Aug 23, 2012 at 11:20 AM, Sam Su <susltd.su@xxxxxxxxx> wrote:
>>>>>>
>>>>>> Hi Oleg,
>>>>>>
>>>>>> Thank you for your investigation. Good luck!
>>>>>>
>>>>>> Can you let me know if you find out how to fix the bug?
>>>>>>
>>>>>> Thanks,
>>>>>> Sam
>>>>>>
>>>>>> On Wed, Aug 22, 2012 at 12:50 PM, Oleg Gelbukh <ogelbukh@xxxxxxxxxxxx>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Is it possible that, during snapshotting, libvirt tears down the
>>>>>>> virtual interface at some point and then re-creates it with
>>>>>>> hairpin_mode disabled again?
>>>>>>> This bugfix [https://bugs.launchpad.net/nova/+bug/933640] implies that
>>>>>>> the fix takes effect when the instance is spawned. This means that
>>>>>>> upon resume after a snapshot, hairpin is not restored. Inserting the
>>>>>>> _enable_hairpin() call into the snapshot procedure might help.
>>>>>>> We're currently investigating this issue in one of our environments
>>>>>>> and hope to come up with an answer by tomorrow.
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> Oleg
>>>>>>>
>>>>>>> On Wed, Aug 22, 2012 at 11:29 PM, Sam Su <susltd.su@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> My friend has found a way to let the VM ping itself again when this
>>>>>>>> problem happens, but has not found out why it happens:
>>>>>>>> echo 1 | sudo tee \
>>>>>>>> /sys/class/net/br1000/brif/<virtual-interface-name>/hairpin_mode
>>>>>>>>
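
A minimal Python sketch of the same workaround, applied to every port on a bridge, might look like this. The function name enable_hairpin and the sysfs_net parameter are assumptions made for illustration (the parameter exists only so the function can be tried against a scratch directory); writing to the real /sys tree requires root.

```python
import os


def enable_hairpin(bridge, sysfs_net="/sys/class/net"):
    """Write "1" to hairpin_mode for every port attached to `bridge`.

    Returns the sorted list of port names that were updated.
    """
    # Each entry under <bridge>/brif/ corresponds to one attached port.
    brif_dir = os.path.join(sysfs_net, bridge, "brif")
    updated = []
    for port in sorted(os.listdir(brif_dir)):
        flag = os.path.join(brif_dir, port, "hairpin_mode")
        with open(flag, "w") as f:
            f.write("1")
        updated.append(port)
    return updated
```

For example, enable_hairpin("br1000") run as root would cover all ports on the bridge named in the command above, rather than one interface at a time.
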
>>>>>>>> I filed a ticket to report this problem:
>>>>>>>> https://bugs.launchpad.net/nova/+bug/1040255
>>>>>>>>
>>>>>>>> Hopefully someone can find out why this happens and solve it.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sam
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 20, 2012 at 3:50 PM, Gabriel Hurley
>>>>>>>> <Gabriel.Hurley@xxxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> I ran into some similar issues with the _enable_hairpin() call. The
>>>>>>>>> call is allowed to fail silently and (in my case) was failing. I
>>>>>>>>> couldn’t
>>>>>>>>> for the life of me figure out why, though, and since I’m really not
>>>>>>>>> a
>>>>>>>>> networking person I didn’t trace it along too far.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Just thought I’d share my similar pain.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -          Gabriel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> From:
>>>>>>>>> openstack-bounces+gabriel.hurley=nebula.com@xxxxxxxxxxxxxxxxxxx
>>>>>>>>>
>>>>>>>>> [mailto:openstack-bounces+gabriel.hurley=nebula.com@xxxxxxxxxxxxxxxxxxx] On
>>>>>>>>> Behalf Of Sam Su
>>>>>>>>> Sent: Thursday, July 19, 2012 11:50 AM
>>>>>>>>> To: Brian Haley
>>>>>>>>> Cc: openstack
>>>>>>>>> Subject: Re: [Openstack] VM can't ping self floating IP after a
>>>>>>>>> snapshot is taken
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thank you for your support.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I checked the file nova/virt/libvirt/connection.py, and the statement
>>>>>>>>> self._enable_hairpin(instance) has already been added to the function
>>>>>>>>> _hard_reboot().
>>>>>>>>>
>>>>>>>>> It looks like there is some difference between taking a snapshot and
>>>>>>>>> rebooting an instance. I tried to figure out how to fix this bug but
>>>>>>>>> failed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It will be much appreciated if anyone can give some hints.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Sam
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jul 19, 2012 at 8:37 AM, Brian Haley <brian.haley@xxxxxx>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On 07/17/2012 05:56 PM, Sam Su wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> This always happens in the Essex release. After I take a snapshot
>>>>>>>>>> of my VM (I tried Ubuntu 12.04 and CentOS 5.8), the VM can't ping
>>>>>>>>>> its own floating IP; before I take a snapshot, though, it can.
>>>>>>>>>>
>>>>>>>>>> This looks closely related to
>>>>>>>>>> https://bugs.launchpad.net/nova/+bug/933640, but is still a little
>>>>>>>>>> different. In 933640, it sounds like the VM can't ping its own
>>>>>>>>>> floating IP regardless of whether we take a snapshot or not.
>>>>>>>>>>
>>>>>>>>>> Any suggestions for an easy fix? And what is the root cause of the
>>>>>>>>>> problem?
>>>>>>>>>
>>>>>>>>> It might be because there's a missing _enable_hairpin() call in the
>>>>>>>>> reboot()
>>>>>>>>> function.  Try something like this...
>>>>>>>>>
>>>>>>>>> nova/virt/libvirt/connection.py, _hard_reboot():
>>>>>>>>>
>>>>>>>>>             self._create_new_domain(xml)
>>>>>>>>> +            self._enable_hairpin(instance)
>>>>>>>>>             self.firewall_driver.apply_instance_filter(instance,
>>>>>>>>> network_info)
>>>>>>>>>
>>>>>>>>> At least that's what I remember doing myself recently when testing
>>>>>>>>> after a
>>>>>>>>> reboot, don't know about snapshot.
>>>>>>>>>
>>>>>>>>> Folsom has changed enough that something different would need to be
>>>>>>>>> done there.
>>>>>>>>>
>>>>>>>>> -Brian
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Mailing list: https://launchpad.net/~openstack
>>>>>>>> Post to     : openstack@xxxxxxxxxxxxxxxxxxx
>>>>>>>> Unsubscribe : https://launchpad.net/~openstack
>>>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~openstack
>>>> Post to     : openstack@xxxxxxxxxxxxxxxxxxx
>>>> Unsubscribe : https://launchpad.net/~openstack
>>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>>

