fuel-dev team mailing list archive

Re: Bonding problems

 

Guys, the suggested fix https://review.openstack.org/76345 works OK, though
it makes patch names impossible to understand :-) So we are waiting for
Sergey to provide a more human-readable workaround. Meanwhile, we can
continue testing with this patch applied to ensure that the downgrade to
1.9.3 does not introduce any regressions.
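
For anyone scripting bond setup while we test: the thread below concluded that balance-tcp needs successful LACP negotiation, so LACP has to be configured together with the bond. Here is a minimal sketch of that rule in Python, assuming bonds are created with `ovs-vsctl add-bond` and its `bond_mode=` / `lacp=` column settings. The helper name and the br0/bond0/eth0/eth1 names are hypothetical, and this is only an illustration, not code taken from l23network.

```python
import shlex

# Bond modes and LACP settings as named in this thread and the OVS Port table.
VALID_BOND_MODES = {"active-backup", "balance-slb", "balance-tcp"}
VALID_LACP = {"active", "passive", "off"}


def build_add_bond_cmd(bridge, bond, ifaces, bond_mode, lacp="off"):
    """Return an ovs-vsctl argv list for creating a bond (hypothetical helper).

    Refuses balance-tcp without LACP, because balance-tcp only works
    after LACP negotiation with the switch has succeeded.
    """
    if len(ifaces) < 2:
        raise ValueError("a bond needs at least two interfaces")
    if bond_mode not in VALID_BOND_MODES:
        raise ValueError(f"unknown bond_mode: {bond_mode}")
    if lacp not in VALID_LACP:
        raise ValueError(f"unknown lacp setting: {lacp}")
    if bond_mode == "balance-tcp" and lacp == "off":
        raise ValueError("balance-tcp requires LACP (lacp=active or passive)")
    return [
        "ovs-vsctl", "add-bond", bridge, bond, *ifaces,
        f"bond_mode={bond_mode}", f"lacp={lacp}",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    cmd = build_add_bond_cmd("br0", "bond0", ["eth0", "eth1"],
                             "balance-tcp", lacp="active")
    print(shlex.join(cmd))
```

Accepting lacp=passive with balance-tcp is an assumption here: negotiation can still succeed if the switch side is active, but the thread only tested lacp=active.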


On Wed, Feb 26, 2014 at 12:11 AM, Vladimir Kuklin <vkuklin@xxxxxxxxxxxx> wrote:

> Guys, we are testing OVS 1.9.3 on Ubuntu right now. It seems we have some
> problems with the l23network module:
> https://bugs.launchpad.net/fuel/+bug/1284801
> We are going to apply a workaround for it. If everything else goes fine,
> we are going to move to 1.9.3, as it is the OVS LTS version for both CentOS
> and Ubuntu.
>
>
> On Tue, Feb 25, 2014 at 11:27 PM, Mike Scherbakov <
> mscherbakov@xxxxxxxxxxxx> wrote:
>
>> Great news!!!
>> Andrey, thanks for staying late and waking up early these days in order
>> to resolve this. You deserve a good rest. Przemek - thanks for the help!
>> Documentation is really needed, otherwise users will be getting back
>> to us and complaining that something doesn't work.
>>
>>
>>
>> On Tue, Feb 25, 2014 at 11:04 PM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>
>>> Okay. I have finally learned to set up LACP between OVS and a Procurve
>>> 2510G. It works fine, just as balance-slb does. I am leaving my flood ping
>>> running overnight and will tell you the results tomorrow. It seems we can
>>> go to production with the current versions of openvswitch. But here in
>>> Moscow we are still trying to build a fully OVS-1.9.3 ISO and test it.
>>>
>>> Of course we need to document all the issues properly. As far as I know,
>>> Przemek wants to publish well-written example OVS, Cisco, Juniper and
>>> Arista configs for enabling LACP.
>>>
>>>
>>> On Tue, Feb 25, 2014 at 3:12 PM, Mike Scherbakov <
>>> mscherbakov@xxxxxxxxxxxx> wrote:
>>>
>>>> Good news.
>>>> Thanks Andrey, keep going!
>>>>
>>>>
>>>> On Tue, Feb 25, 2014 at 2:28 PM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>>>
>>>>> After 14 hours of flood ping, the hardware lab lost a few packets and
>>>>> the virtual env lost hundreds of packets. Mode: balance-slb.
>>>>>
>>>>> I'm going to test LACP behaviour today.
>>>>>
>>>>>
>>>>> On Tue, Feb 25, 2014 at 3:50 AM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>>>>
>>>>>> Fine. They wrote about that in the documentation too:
>>>>>> http://openvswitch.org/ovs-vswitchd.conf.db.5.pdf page 14. It was
>>>>>> introduced two years ago, in version 1.5.0. One problem less!
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 25, 2014 at 3:37 AM, Ryan Moe <rmoe@xxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>> Andrey is correct. It appears that balance-tcp requires successful
>>>>>>> LACP negotiation. See here:
>>>>>>> https://github.com/osrg/openvswitch/blob/master/lib/bond.c#L610 and
>>>>>>> here:
>>>>>>> https://github.com/osrg/openvswitch/blob/master/lib/bond.c#L1438.
>>>>>>> This also means that when we create bonds with balance-tcp we need to
>>>>>>> configure lacp as well.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 24, 2014 at 3:14 PM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> And yes, the bug https://bugs.launchpad.net/fuel/+bug/1272842 and
>>>>>>>> the current problem may be unrelated, but they produce similar error
>>>>>>>> messages in the OVS logs.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Feb 25, 2014 at 2:55 AM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>>> Guys, I set up hardware (2 nodes) and software (3 nodes) labs
>>>>>>>>> today with ISO #181 to test bonding. Unfortunately, balance-tcp mode is
>>>>>>>>> totally broken. When I use it during deployment or switch to it in a
>>>>>>>>> working cluster, all traffic stops. Playing with the rebalance interval
>>>>>>>>> doesn't help.
>>>>>>>>> In contrast, balance-slb works fine. Both Ubuntu (hardware
>>>>>>>>> nodes) and CentOS (virtual env) work without any traffic loss. I'm running
>>>>>>>>> a flood ping between virtual instances inside the clouds overnight and
>>>>>>>>> will check the number of lost packets. I also want to play with iperf.
>>>>>>>>>
>>>>>>>>> Next things we can do:
>>>>>>>>> * Build an ISO with the stable (1.9.3) or newest (2.0.x) version of
>>>>>>>>> OVS and play with them. Yesterday we decided to build Ubuntu 12.04 with
>>>>>>>>> the Debian Sid 1.9.3 version of OVS. There is a ticket about that:
>>>>>>>>> https://mirantis.jira.com/browse/OSCI-1089 Igor also built his
>>>>>>>>> own version of an ISO with the Sid package.
>>>>>>>>> * Dump openflow rules in balance-tcp mode and try to fix them.
>>>>>>>>> It's hard to do that because Aliens developed their syntax.
>>>>>>>>> * Run Igor's tests again and again until balance-slb starts blocking
>>>>>>>>> traffic. Then dig into the openflow rules.
>>>>>>>>> * Play with LACP on real hardware. Maybe balance-tcp can be used
>>>>>>>>> only with lacp=active.
>>>>>>>>> * Ask the openvswitch community about our problems.
>>>>>>>>>
>>>>>>>>> Andrew, yes, the PXE network is still nailed to an interface. I hope
>>>>>>>>> we will fix it in 5.0.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Feb 25, 2014 at 12:20 AM, Igor Shishkin <
>>>>>>>>> ishishkin@xxxxxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>>> Hello, Dmitry.
>>>>>>>>>>
>>>>>>>>>> It's 100% reproducible on a virtual environment when we're trying
>>>>>>>>>> to deploy bonding in balance-tcp or balance-slb mode.
>>>>>>>>>> The tests are related as a way to reproduce it, and as a warning about
>>>>>>>>>> why these tests will fail when they are merged.
>>>>>>>>>>
>>>>>>>>>> As far as we can see, the problem is in the rebalance procedure that
>>>>>>>>>> openvswitch runs once it has started the bonded interface. During this
>>>>>>>>>> time the bonded interfaces stop accepting ARPs.
>>>>>>>>>>
>>>>>>>>>> I just built openvswitch=1.9.3, which is LTS, and want to try it in
>>>>>>>>>> the same case, and also try decreasing bond-rebalance-interval to 0 (as
>>>>>>>>>> Andrey K. suggested). If either of these helps, it could be the solution
>>>>>>>>>> (but I'm really not sure bond-rebalance-interval=0 is a good way).
>>>>>>>>>> --
>>>>>>>>>> Igor Shishkin
>>>>>>>>>> QA Engineer
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 24 Feb 2014, at 23:59, Dmitry Borodaenko <
>>>>>>>>>> dborodaenko@xxxxxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>> > Mike, Igor,
>>>>>>>>>> >
>>>>>>>>>> > Can you provide more details on how the integration test in
>>>>>>>>>> > review #75161 helps to reproduce bug #1272842?
>>>>>>>>>> >
>>>>>>>>>> > As far as I understand, the bug is a highly intermittent
>>>>>>>>>> > problem with ARP that was only showing up after an environment
>>>>>>>>>> > with LACP bonding was operational for at least a few hours.
>>>>>>>>>> >
>>>>>>>>>> > On the other hand, the problem Igor is reporting based on the
>>>>>>>>>> > integration test sounds like something 100% reproducible that
>>>>>>>>>> > doesn't require real hardware or LACP and is not necessarily
>>>>>>>>>> > related to ARP.
>>>>>>>>>> >
>>>>>>>>>> > Are you sure you're not confusing two unrelated problems?
>>>>>>>>>> >
>>>>>>>>>> > Thanks,
>>>>>>>>>> > -DmitryB
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > On Mon, Feb 24, 2014 at 9:18 AM, Mike Scherbakov
>>>>>>>>>> > <mscherbakov@xxxxxxxxxxxx> wrote:
>>>>>>>>>> >> The issue is here:
>>>>>>>>>> >> https://bugs.launchpad.net/fuel/+bug/1272842.
>>>>>>>>>> >> Those who know what can be wrong with our openvswitch/kernel,
>>>>>>>>>> >> please provide your input.
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> On Mon, Feb 24, 2014 at 9:04 PM, Igor Shishkin
>>>>>>>>>> >> <ishishkin@xxxxxxxxxxxx> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> Hello,
>>>>>>>>>> >>>
>>>>>>>>>> >>> Currently we have this review
>>>>>>>>>> >>> https://review.openstack.org/#/c/75161 with test cases for our
>>>>>>>>>> >>> brand new shiny bonding feature, but the balance-tcp/balance-slb
>>>>>>>>>> >>> modes are not working for now.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Steps to reproduce are very simple:
>>>>>>>>>> >>> Create a cluster with simple or HA configuration, select the
>>>>>>>>>> >>> balance-tcp or balance-slb bonding mode and start deployment.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Deployment will not finish successfully because of problems
>>>>>>>>>> >>> with the rebalance procedure.
>>>>>>>>>> >>> --
>>>>>>>>>> >>> Igor Shishkin
>>>>>>>>>> >>> QA Engineer
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> --
>>>>>>>>>> >> Mike Scherbakov
>>>>>>>>>> >> #mihgen
>>>>>>>>>> >>
>>>>>>>>>> >> --
>>>>>>>>>> >> Mailing list: https://launchpad.net/~fuel-dev
>>>>>>>>>> >> Post to     : fuel-dev@xxxxxxxxxxxxxxxxxxx
>>>>>>>>>> >> Unsubscribe : https://launchpad.net/~fuel-dev
>>>>>>>>>> >> More help   : https://help.launchpad.net/ListHelp
>>>>>>>>>> >>
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > --
>>>>>>>>>> > Dmitry Borodaenko
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Andrey Danin
>>>>>>>>> adanin@xxxxxxxxxxxx
>>>>>>>>> skype: gcon.monolake
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Andrey Danin
>>>>>>>> adanin@xxxxxxxxxxxx
>>>>>>>> skype: gcon.monolake
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Andrey Danin
>>>>>> adanin@xxxxxxxxxxxx
>>>>>> skype: gcon.monolake
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Andrey Danin
>>>>> adanin@xxxxxxxxxxxx
>>>>> skype: gcon.monolake
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Mike Scherbakov
>>>> #mihgen
>>>>
>>>
>>>
>>>
>>> --
>>> Andrey Danin
>>> adanin@xxxxxxxxxxxx
>>> skype: gcon.monolake
>>>
>>
>>
>>
>> --
>> Mike Scherbakov
>> #mihgen
>>
>>
>>
>
>
> --
> Yours Faithfully,
> Vladimir Kuklin,
> Senior Deployment Engineer,
> Mirantis, Inc.
> +7 (495) 640-49-04
> +7 (926) 702-39-68
> Skype kuklinvv
> 45bk3, Vorontsovskaya Str.
> Moscow, Russia,
> www.mirantis.com <http://www.mirantis.ru/>
> www.mirantis.ru
> vkuklin@xxxxxxxxxxxx
>



-- 
Yours Faithfully,
Vladimir Kuklin,
Senior Deployment Engineer,
Mirantis, Inc.
+7 (495) 640-49-04
+7 (926) 702-39-68
Skype kuklinvv
45bk3, Vorontsovskaya Str.
Moscow, Russia,
www.mirantis.com <http://www.mirantis.ru/>
www.mirantis.ru
vkuklin@xxxxxxxxxxxx
