openstack team mailing list archive
Message #16137
Re: VM can't ping self floating IP after a snapshot is taken
That's great, thank you for your efforts. Could you backport the fix to Essex?
On Aug 24, 2012, at 7:15 PM, heut2008 <heut2008@xxxxxxxxx> wrote:
> I have fixed it here https://review.openstack.org/#/c/11925/
>
> 2012/8/25 Sam Su <susltd.su@xxxxxxxxx>:
>> Hi,
>>
>> I also reported this bug:
>> https://bugs.launchpad.net/nova/+bug/1040255
>>
>> If someone can combine your solutions into a complete fix for this bug,
>> that would be great.
>>
>> BRs,
>> Sam
>>
>>
>> On Thu, Aug 23, 2012 at 9:27 PM, heut2008 <heut2008@xxxxxxxxx> wrote:
>>>
>>> This bug has been filed here: https://bugs.launchpad.net/nova/+bug/1040537
>>>
>>> 2012/8/24 Vishvananda Ishaya <vishvananda@xxxxxxxxx>:
>>>> +1 to this. Evan, can you report a bug (if one hasn't been reported yet)
>>>> and propose the fix? Otherwise I can find someone else to propose it.
>>>>
>>>> Vish
>>>>
>>>> On Aug 23, 2012, at 1:38 PM, Evan Callicoat <diopter@xxxxxxxxx> wrote:
>>>>
>>>> Hello all!
>>>>
>>>> I'm the original author of the hairpin patch, and things have changed a
>>>> little in Essex and Folsom since the original Diablo target. I believe
>>>> I can shed some light on what should be done to solve the issue in
>>>> either case.
>>>>
>>>> ---
>>>> For Essex (stable/essex), in nova/virt/libvirt/connection.py:
>>>> ---
>>>>
>>>> Currently _enable_hairpin() is only called from spawn(). However,
>>>> spawn() is not the only place where vifs (veth#) get added to a bridge,
>>>> which is when we need to enable hairpin_mode on them. The more relevant
>>>> function is _create_new_domain(), which is called from spawn() and other
>>>> places. Without changing the information that gets passed to
>>>> _create_new_domain() (which is just 'xml' from to_xml()), we can easily
>>>> rewrite the first 2 lines of _enable_hairpin() as follows:
>>>>
>>>>     def _enable_hairpin(self, xml):
>>>>         interfaces = self.get_interfaces(xml['name'])
>>>>
>>>> Then we can move the self._enable_hairpin(instance) call from spawn()
>>>> into _create_new_domain(), and pass it xml as follows:
>>>>
>>>>         [...]
>>>>         self._enable_hairpin(xml)
>>>>         return domain
>>>>
>>>> This will run the hairpin code every time a domain gets created, which
>>>> is also when the domain's vif(s) get inserted into the bridge with the
>>>> default of hairpin_mode=0.
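
The restructuring described above can be sketched with a stand-in class. This is not the real nova driver: FakeDriver, its lookup table, and the method bodies are hypothetical, and only the method names come from the message. It also assumes, as the message does without having tested it, that 'xml' is a dict with a 'name' key.

```python
class FakeDriver:
    """Stand-in for the libvirt driver; not the real nova code."""

    def __init__(self, interfaces_by_name):
        # Hypothetical lookup table: instance name -> list of vif names.
        self._interfaces_by_name = interfaces_by_name
        # Records every vif on which hairpin mode was "enabled".
        self.hairpinned = []

    def get_interfaces(self, name):
        return self._interfaces_by_name[name]

    def _enable_hairpin(self, xml):
        # Assumption from the message: xml is a dict carrying the name.
        for interface in self.get_interfaces(xml['name']):
            self.hairpinned.append(interface)

    def _create_new_domain(self, xml):
        domain = {'name': xml['name']}  # placeholder for a libvirt domain
        # Hairpin is enabled here, so every caller gets it for free.
        self._enable_hairpin(xml)
        return domain

    def spawn(self, xml):
        return self._create_new_domain(xml)

    def _hard_reboot(self, xml):
        return self._create_new_domain(xml)


driver = FakeDriver({'instance-00000001': ['vnet0']})
driver.spawn({'name': 'instance-00000001'})
driver._hard_reboot({'name': 'instance-00000001'})
print(driver.hairpinned)
```

Because the call lives in the one function that creates the domain, both code paths record the vif, and any other caller of _create_new_domain() would too.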
>>>>
>>>> ---
>>>> For Folsom (trunk), in nova/virt/libvirt/driver.py:
>>>> ---
>>>>
>>>> A lot more has changed here, but the same strategy as above should
>>>> work. Here, _create_new_domain() has been split into _create_domain()
>>>> and _create_domain_and_network(), and _enable_hairpin() was moved from
>>>> spawn() to _create_domain_and_network(). That seems like the right
>>>> thing to do, but it doesn't quite cover all of the cases of vif
>>>> reinsertion, since _create_domain() is the only function which actually
>>>> creates the domain (_create_domain_and_network() just calls it after
>>>> doing some pre-work). The solution here is likewise fairly simple: make
>>>> the same 2 changes to _enable_hairpin():
>>>>
>>>>     def _enable_hairpin(self, xml):
>>>>         interfaces = self.get_interfaces(xml['name'])
>>>>
>>>> And move it from _create_domain_and_network() to _create_domain(), like
>>>> before:
>>>>
>>>>         [...]
>>>>         self._enable_hairpin(xml)
>>>>         return domain
>>>>
>>>> I haven't yet tested this on my Essex clusters and I don't have a Folsom
>>>> cluster handy at present, but the change is simple and makes sense.
>>>> Looking at to_xml() and _prepare_xml_info(), it appears that the 'xml'
>>>> variable _create_[new_]domain() gets is just a Python dictionary, and
>>>> xml['name'] = instance['name'], exactly what _enable_hairpin() was using
>>>> the 'instance' variable for previously.
>>>>
>>>> Let me know if this works, or doesn't work, or doesn't make sense, or if
>>>> you need an address to send gifts, etc. Hope it's solved!
>>>>
>>>> -Evan
>>>>
>>>> On Thu, Aug 23, 2012 at 11:20 AM, Sam Su <susltd.su@xxxxxxxxx> wrote:
>>>>>
>>>>> Hi Oleg,
>>>>>
>>>>> Thank you for your investigation. Good luck!
>>>>>
>>>>> Can you let me know if you find out how to fix the bug?
>>>>>
>>>>> Thanks,
>>>>> Sam
>>>>>
>>>>> On Wed, Aug 22, 2012 at 12:50 PM, Oleg Gelbukh <ogelbukh@xxxxxxxxxxxx>
>>>>> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Is it possible that, during snapshotting, libvirt tears down the
>>>>>> virtual interface at some point and then re-creates it with
>>>>>> hairpin_mode disabled again?
>>>>>> This bugfix [https://bugs.launchpad.net/nova/+bug/933640] implies that
>>>>>> the fix is applied only when an instance is spawned. That would mean
>>>>>> that hairpin mode is not restored on resume after a snapshot. Maybe
>>>>>> inserting the _enable_hairpin() call into the snapshot procedure would
>>>>>> help.
>>>>>> We're currently investigating this issue in one of our environments
>>>>>> and hope to come up with an answer by tomorrow.
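
One way to test this hypothesis is to list the bridge ports whose hairpin_mode is not 1, once before and once after a snapshot. The sketch below is only an illustration: the function name and the sys_net parameter are made up for the example, the /sys/class/net/<bridge>/brif/<port>/hairpin_mode layout is taken from the workaround quoted further down the thread, and the demonstration runs against a temporary directory rather than the real /sys.

```python
import os
import tempfile


def ports_without_hairpin(bridge, sys_net='/sys/class/net'):
    """Return the ports on `bridge` whose hairpin_mode is not '1'."""
    brif = os.path.join(sys_net, bridge, 'brif')
    missing = []
    for port in sorted(os.listdir(brif)):
        with open(os.path.join(brif, port, 'hairpin_mode')) as f:
            if f.read().strip() != '1':
                missing.append(port)
    return missing


# Demonstration against a throwaway tree that mimics the sysfs layout.
with tempfile.TemporaryDirectory() as root:
    for port, mode in (('vnet0', '1'), ('vnet1', '0')):
        port_dir = os.path.join(root, 'br1000', 'brif', port)
        os.makedirs(port_dir)
        with open(os.path.join(port_dir, 'hairpin_mode'), 'w') as f:
            f.write(mode + '\n')
    missing = ports_without_hairpin('br1000', sys_net=root)

print(missing)
```

On a compute node the call would be ports_without_hairpin('br1000') with the default sys_net; a port that shows up only after the snapshot would support the theory that the interface is being re-created.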
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Oleg
>>>>>>
>>>>>> On Wed, Aug 22, 2012 at 11:29 PM, Sam Su <susltd.su@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> My friend has found a way to make the VM able to ping itself again
>>>>>>> when this problem happens, but has not found why it happens:
>>>>>>> echo 1 | sudo tee \
>>>>>>> /sys/class/net/br1000/brif/<virtual-interface-name>/hairpin_mode
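
The same workaround can be scripted. This is a minimal sketch, not nova's own code: set_hairpin and its sys_net parameter are hypothetical names, the path layout is the one from the command above, and on a real host the write needs root. The demonstration uses a temporary directory in place of /sys.

```python
import os
import tempfile


def set_hairpin(bridge, interface, enabled=True, sys_net='/sys/class/net'):
    """Write 1 or 0 to the hairpin_mode file of one bridge port."""
    path = os.path.join(sys_net, bridge, 'brif', interface, 'hairpin_mode')
    with open(path, 'w') as f:
        f.write('1' if enabled else '0')
    return path


# Demonstration against a throwaway tree instead of the real /sys.
with tempfile.TemporaryDirectory() as root:
    port_dir = os.path.join(root, 'br1000', 'brif', 'vnet0')
    os.makedirs(port_dir)
    with open(os.path.join(port_dir, 'hairpin_mode'), 'w') as f:
        f.write('0')
    path = set_hairpin('br1000', 'vnet0', sys_net=root)
    with open(path) as f:
        result = f.read()

print(result)
```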
>>>>>>>
>>>>>>> I filed a ticket to report this problem:
>>>>>>> https://bugs.launchpad.net/nova/+bug/1040255
>>>>>>>
>>>>>>> Hopefully someone can find out why this happens and solve it.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sam
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 20, 2012 at 3:50 PM, Gabriel Hurley
>>>>>>> <Gabriel.Hurley@xxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> I ran into some similar issues with the _enable_hairpin() call. The
>>>>>>>> call is allowed to fail silently and (in my case) was failing. I
>>>>>>>> couldn’t for the life of me figure out why, though, and since I’m
>>>>>>>> really not a networking person I didn’t trace it very far.
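
A silent failure like this is easier to trace if each failed interface is logged and reported back to the caller. The following is a hedged sketch of that idea only: enable_hairpin_reporting, write_hairpin, and fake_write are hypothetical names for the example and do not reflect how nova handles the error.

```python
import logging

LOG = logging.getLogger("hairpin")


def enable_hairpin_reporting(interfaces, write_hairpin):
    """Try each interface; log failures and return the ones that failed."""
    failed = []
    for interface in interfaces:
        try:
            write_hairpin(interface)
        except OSError as exc:
            LOG.warning("could not enable hairpin on %s: %s", interface, exc)
            failed.append(interface)
    return failed


def fake_write(interface):
    # Hypothetical writer: pretend vnet1 is not attached to the bridge.
    if interface == "vnet1":
        raise FileNotFoundError(interface)


failed = enable_hairpin_reporting(["vnet0", "vnet1"], fake_write)
print(failed)
```

Returning the failed names lets the caller decide whether a missing hairpin setting should abort the operation or just be noted.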
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Just thought I’d share my similar pain.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> - Gabriel
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> From: openstack-bounces+gabriel.hurley=nebula.com@xxxxxxxxxxxxxxxxxxx
>>>>>>>> [mailto:openstack-bounces+gabriel.hurley=nebula.com@xxxxxxxxxxxxxxxxxxx]
>>>>>>>> On Behalf Of Sam Su
>>>>>>>> Sent: Thursday, July 19, 2012 11:50 AM
>>>>>>>> To: Brian Haley
>>>>>>>> Cc: openstack
>>>>>>>> Subject: Re: [Openstack] VM can't ping self floating IP after a
>>>>>>>> snapshot is taken
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thank you for your support.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I checked the file nova/virt/libvirt/connection.py; the statement
>>>>>>>> self._enable_hairpin(instance) has already been added to the function
>>>>>>>> _hard_reboot().
>>>>>>>>
>>>>>>>> It looks like there is some difference between taking a snapshot and
>>>>>>>> rebooting an instance. I tried to figure out how to fix this bug but
>>>>>>>> failed.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> It would be much appreciated if anyone could give some hints.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Sam
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jul 19, 2012 at 8:37 AM, Brian Haley <brian.haley@xxxxxx>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On 07/17/2012 05:56 PM, Sam Su wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> This always happens in the Essex release. After I take a snapshot
>>>>>>>>> of my VM (I tried Ubuntu 12.04 and CentOS 5.8), the VM can't ping
>>>>>>>>> its own floating IP; before I take a snapshot, though, the VM can
>>>>>>>>> ping its own floating IP.
>>>>>>>>>
>>>>>>>>> This looks closely related to
>>>>>>>>> https://bugs.launchpad.net/nova/+bug/933640, but is still a little
>>>>>>>>> different. In 933640, it sounds like the VM can't ping its own
>>>>>>>>> floating IP regardless of whether we take a snapshot or not.
>>>>>>>>>
>>>>>>>>> Any suggestions for an easy fix? And what is the root cause of the
>>>>>>>>> problem?
>>>>>>>>
>>>>>>>> It might be because there's a missing _enable_hairpin() call in the
>>>>>>>> reboot() function. Try something like this...
>>>>>>>>
>>>>>>>> nova/virt/libvirt/connection.py, _hard_reboot():
>>>>>>>>
>>>>>>>>      self._create_new_domain(xml)
>>>>>>>> +    self._enable_hairpin(instance)
>>>>>>>>      self.firewall_driver.apply_instance_filter(instance, network_info)
>>>>>>>>
>>>>>>>> At least that's what I remember doing myself recently when testing
>>>>>>>> after a reboot; I don't know about snapshots.
>>>>>>>>
>>>>>>>> Folsom has changed enough that something different would need to be
>>>>>>>> done there.
>>>>>>>>
>>>>>>>> -Brian
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Mailing list: https://launchpad.net/~openstack
>>>>>>> Post to : openstack@xxxxxxxxxxxxxxxxxxx
>>>>>>> Unsubscribe : https://launchpad.net/~openstack
>>>>>>> More help : https://help.launchpad.net/ListHelp
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
References
-
VM can't ping self floating IP after a snapshot is taken
From: Sam Su, 2012-07-17
-
Re: VM can't ping self floating IP after a snapshot is taken
From: Brian Haley, 2012-07-19
-
Re: VM can't ping self floating IP after a snapshot is taken
From: Sam Su, 2012-07-19
-
Re: VM can't ping self floating IP after a snapshot is taken
From: Gabriel Hurley, 2012-07-20
-
Re: VM can't ping self floating IP after a snapshot is taken
From: Sam Su, 2012-08-22
-
Re: VM can't ping self floating IP after a snapshot is taken
From: Oleg Gelbukh, 2012-08-22
-
Re: VM can't ping self floating IP after a snapshot is taken
From: Sam Su, 2012-08-23
-
Re: VM can't ping self floating IP after a snapshot is taken
From: Evan Callicoat, 2012-08-23
-
Re: VM can't ping self floating IP after a snapshot is taken
From: Vishvananda Ishaya, 2012-08-24
-
Re: VM can't ping self floating IP after a snapshot is taken
From: heut2008, 2012-08-24
-
Re: VM can't ping self floating IP after a snapshot is taken
From: Sam Su, 2012-08-24
-
Re: VM can't ping self floating IP after a snapshot is taken
From: heut2008, 2012-08-25