
fuel-dev team mailing list archive

Re: Bonding problems

 

Great news!!!
Andrey, thanks for staying late and waking up early these days to
resolve this. You deserve a good rest. Przemek, thanks for the help!
Documentation is really needed; otherwise users will keep coming back
to us complaining that something doesn't work.



On Tue, Feb 25, 2014 at 11:04 PM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:

> Okay. I have finally learned how to set up LACP between OVS and a ProCurve
> 2510G. It works fine, just as balance-slb does. I'm leaving my flood ping
> running overnight and will report the results tomorrow. It seems we can go
> to production with the current versions of openvswitch, but here in Moscow
> we are still trying to build an ISO entirely on OVS 1.9.3 and test it.
>
> Of course, we need to document all the issues properly. As far as I know,
> Przemek wants to publish well-written examples of OVS, Cisco, Juniper and
> Arista configs for enabling LACP.
>
>
> On Tue, Feb 25, 2014 at 3:12 PM, Mike Scherbakov <mscherbakov@xxxxxxxxxxxx> wrote:
>
>> Good news.
>> Thanks Andrey, keep going!
>>
>>
>> On Tue, Feb 25, 2014 at 2:28 PM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>
>>> After 14 hours of flood ping, the hardware lab lost a few packets and the
>>> virtual environment lost hundreds of packets. Mode: balance-slb.
>>>
>>> I'm going to test LACP behaviour today.
>>>
>>>
>>> On Tue, Feb 25, 2014 at 3:50 AM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>>
>>>> Fine. They mention that in the documentation too:
>>>> http://openvswitch.org/ovs-vswitchd.conf.db.5.pdf (page 14). It was
>>>> introduced two years ago, in version 1.5.0. One problem less!
>>>>
>>>>
>>>> On Tue, Feb 25, 2014 at 3:37 AM, Ryan Moe <rmoe@xxxxxxxxxxxx> wrote:
>>>>
>>>>> Andrey is correct. It appears that balance-tcp requires successful
>>>>> LACP negotiation. See here:
>>>>> https://github.com/osrg/openvswitch/blob/master/lib/bond.c#L610 and
>>>>> here: https://github.com/osrg/openvswitch/blob/master/lib/bond.c#L1438.
>>>>> This also means that when we create bonds with balance-tcp, we need to
>>>>> configure LACP as well.
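A minimal sketch of the rule Ryan describes, assuming bonds are created with `ovs-vsctl add-bond` and that the Port columns `bond_mode` and `lacp` are set on the same command line. The helper `add_bond_command` and the names `br-eth1`, `bond0`, `eth1`, `eth2` are hypothetical placeholders; the function only builds the argument list and never runs it.

```python
from typing import List, Sequence

# Bond modes this sketch accepts (the ones discussed in this thread plus active-backup).
BOND_MODES = {"active-backup", "balance-slb", "balance-tcp"}
# Values of the Port "lacp" column.
LACP_MODES = {"active", "passive", "off"}


def add_bond_command(bridge: str, bond: str, ifaces: Sequence[str],
                     bond_mode: str, lacp: str = "off") -> List[str]:
    """Build (but do not run) an ovs-vsctl add-bond command line.

    Hypothetical helper: it refuses balance-tcp without LACP, because
    balance-tcp only forwards traffic once LACP negotiation succeeds.
    """
    if bond_mode not in BOND_MODES:
        raise ValueError(f"unknown bond_mode: {bond_mode}")
    if lacp not in LACP_MODES:
        raise ValueError(f"unknown lacp value: {lacp}")
    if len(ifaces) < 2:
        raise ValueError("add-bond needs at least two interfaces")
    if bond_mode == "balance-tcp" and lacp == "off":
        raise ValueError("balance-tcp requires lacp=active or lacp=passive")
    return ["ovs-vsctl", "add-bond", bridge, bond, *ifaces,
            f"bond_mode={bond_mode}", f"lacp={lacp}"]


# Example: a balance-tcp bond that actively negotiates LACP with the switch.
cmd = add_bond_command("br-eth1", "bond0", ["eth1", "eth2"],
                       "balance-tcp", lacp="active")
print(" ".join(cmd))
# ovs-vsctl add-bond br-eth1 bond0 eth1 eth2 bond_mode=balance-tcp lacp=active
```

If the OVS side uses `lacp=passive`, the switch port has to run LACP in active mode, since two passive ends never start negotiation.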
>>>>>
>>>>>
>>>>> On Mon, Feb 24, 2014 at 3:14 PM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>>>>
>>>>>> And yes, the bug https://bugs.launchpad.net/fuel/+bug/1272842 and the
>>>>>> current problem may be unrelated, but they produce similar error messages
>>>>>> in the OVS logs.
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 25, 2014 at 2:55 AM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>> Guys, today I set up hardware (2 nodes) and software (3 nodes) labs
>>>>>>> with ISO #181 to test bonding. Unfortunately, balance-tcp mode is totally
>>>>>>> broken. When I use it during deployment or switch to it in a working
>>>>>>> cluster, all traffic stops. Playing with the rebalance interval doesn't
>>>>>>> help. By contrast, balance-slb works fine. Both Ubuntu (hardware
>>>>>>> nodes) and CentOS (virtual env) work without any traffic loss. I'm running
>>>>>>> a flood ping between virtual instances inside the clouds overnight and
>>>>>>> will check the number of lost packets. I also want to play with iperf.
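To turn an overnight flood ping into a number that can be compared across labs and bond modes, the statistics line that iputils `ping` prints on exit can be parsed into a loss count and ratio. This is a minimal sketch assuming that summary format; `parse_ping_summary` and `PingLoss` are hypothetical names, and the sample numbers are illustrative, not results from these labs.

```python
import re
from typing import NamedTuple

# Matches the statistics line iputils ping prints on exit, e.g.
# "1000 packets transmitted, 997 received, 0.3% packet loss, time 1234ms".
# Extra fields such as "+3 errors" after "received" are ignored.
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


class PingLoss(NamedTuple):
    transmitted: int
    received: int

    @property
    def lost(self) -> int:
        return self.transmitted - self.received

    @property
    def loss_ratio(self) -> float:
        # Guard against an empty run so we never divide by zero.
        return self.lost / self.transmitted if self.transmitted else 0.0


def parse_ping_summary(output: str) -> PingLoss:
    """Extract sent/received counts from ping output (hypothetical helper)."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no ping statistics line found")
    return PingLoss(int(match.group(1)), int(match.group(2)))


# Illustrative numbers only, not results from the labs in this thread.
sample = "1000 packets transmitted, 997 received, 0.3% packet loss, time 1234ms"
loss = parse_ping_summary(sample)
print(loss.lost, f"{loss.loss_ratio:.1%}")  # 3 0.3%
```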
>>>>>>>
>>>>>>> Next things we can do:
>>>>>>> * Build an ISO with the stable (1.9.3) or newest (2.0.x) version of OVS
>>>>>>> and play with them. Yesterday we decided to build Ubuntu 12.04 with the
>>>>>>> Debian Sid 1.9.3 version of OVS. There is a ticket for that:
>>>>>>> https://mirantis.jira.com/browse/OSCI-1089
>>>>>>> Igor also built his own version of an ISO with the Sid package.
>>>>>>> * Dump the OpenFlow rules in balance-tcp mode and try to fix them. That's
>>>>>>> hard because their syntax looks like it was designed by aliens.
>>>>>>> * Run Igor's tests again and again until balance-slb starts blocking
>>>>>>> traffic. Then dig into the OpenFlow rules.
>>>>>>> * Play with LACP on real hardware. Maybe balance-tcp can be used
>>>>>>> only with lacp=active.
>>>>>>> * Ask the openvswitch community about our problems.
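For the "dump the OpenFlow rules" and "play with LACP" items above, a minimal sketch that collects the standard inspection commands for one bond. It assumes `ovs-appctl bond/show`, `ovs-appctl lacp/show` and `ovs-ofctl dump-flows` are available in the OVS versions under test; `bond_diagnostics` and the bridge/bond names are hypothetical placeholders, and the code only prints the command lines.

```python
import shlex
from typing import List


def bond_diagnostics(bridge: str, bond: str) -> List[List[str]]:
    """Commands that capture bond, LACP and OpenFlow state (hypothetical helper)."""
    return [
        ["ovs-appctl", "bond/show", bond],    # per-member status and, in balance modes, hash load
        ["ovs-appctl", "lacp/show", bond],    # whether LACP actually negotiated
        ["ovs-ofctl", "dump-flows", bridge],  # OpenFlow rules installed on the bridge
    ]


# Print the commands so they can be run on a node (placeholder names).
for cmd in bond_diagnostics("br-eth1", "bond0"):
    print(shlex.join(cmd))
```

Capturing `lacp/show` next to `bond/show` at the moment traffic stops would show directly whether a balance-tcp bond is sitting without a negotiated LACP partner.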
>>>>>>>
>>>>>>> Andrew, yes, the PXE network is still nailed to one interface. I hope we
>>>>>>> will fix that in 5.0.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 25, 2014 at 12:20 AM, Igor Shishkin <ishishkin@xxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> Hello, Dmitry.
>>>>>>>>
>>>>>>>> It's 100% reproducible in a virtual environment when we try to
>>>>>>>> deploy bonding in balance-tcp or balance-slb mode.
>>>>>>>> The tests are relevant as a way to reproduce the problem, and as a
>>>>>>>> warning of why they will fail once they are merged.
>>>>>>>>
>>>>>>>> As we can see, the problem is in the rebalance procedure that openvswitch
>>>>>>>> runs after it brings up the bonded interface. During that time, the
>>>>>>>> bonded interfaces stop accepting ARPs.
>>>>>>>>
>>>>>>>> I just built openvswitch 1.9.3, which is the LTS release, and want to try
>>>>>>>> it in the same scenario, and also try decreasing bond-rebalance-interval
>>>>>>>> to 0 (as Andrey K. suggested). If either of these helps, that could be the
>>>>>>>> solution (but I'm really not sure bond-rebalance-interval=0 is a good way).
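For reference when trying Andrey K.'s suggestion: as we read the `bond-rebalance-interval` entry in ovs-vswitchd.conf.db(5) for recent Open vSwitch releases, the value lives in the Port's `other_config`, is in milliseconds with a default of 10000, 0 disables load rebalancing, and nonzero values below 1000 are treated as 1000. The 1.x releases discussed here may behave differently, so treat these constants as assumptions to verify. `effective_rebalance_interval` and `set_rebalance_command` are hypothetical helpers that only compute values and build a command line.

```python
from typing import List, Optional

# Assumed from ovs-vswitchd.conf.db(5) for recent releases; verify for 1.x.
DEFAULT_REBALANCE_MS = 10_000  # used when other_config:bond-rebalance-interval is unset
MIN_REBALANCE_MS = 1_000       # nonzero values below this are raised to it
MAX_REBALANCE_MS = 10_000      # documented upper bound of the valid range


def effective_rebalance_interval(value_ms: Optional[int]) -> int:
    """Interval in ms that OVS would use; 0 means load rebalancing is disabled."""
    if value_ms is None:
        return DEFAULT_REBALANCE_MS
    if not 0 <= value_ms <= MAX_REBALANCE_MS:
        raise ValueError("bond-rebalance-interval must be between 0 and 10000 ms")
    if 0 < value_ms < MIN_REBALANCE_MS:
        return MIN_REBALANCE_MS
    return value_ms


def set_rebalance_command(bond: str, value_ms: int) -> List[str]:
    """Build (not run) the ovs-vsctl command that sets the interval on a bond port."""
    effective_rebalance_interval(value_ms)  # reject out-of-range values early
    return ["ovs-vsctl", "set", "port", bond,
            f"other_config:bond-rebalance-interval={value_ms}"]


print(effective_rebalance_interval(None),
      effective_rebalance_interval(0),
      effective_rebalance_interval(500))  # 10000 0 1000
print(" ".join(set_rebalance_command("bond0", 0)))
# ovs-vsctl set port bond0 other_config:bond-rebalance-interval=0
```

Per the same documentation, a link failure still moves flows when rebalancing is disabled, so an interval of 0 only stops periodic load balancing between members; it is a workaround to test rather than a fix for the ARP drop.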
>>>>>>>> --
>>>>>>>> Igor Shishkin
>>>>>>>> QA Engineer
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 24 Feb 2014, at 23:59, Dmitry Borodaenko <dborodaenko@xxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> > Mike, Igor,
>>>>>>>> >
>>>>>>>> > Can you provide more details on how the integration test in review
>>>>>>>> > #75161 helps to reproduce bug #1272842?
>>>>>>>> >
>>>>>>>> > As far as I understand, the bug is a highly intermittent problem with
>>>>>>>> > ARP that was only showing up after an environment with LACP bonding
>>>>>>>> > was operational for at least a few hours.
>>>>>>>> >
>>>>>>>> > On the other hand, the problem Igor is reporting based on the
>>>>>>>> > integration test sounds like something 100% reproducible that doesn't
>>>>>>>> > require real hardware or LACP and is not necessarily related to ARP.
>>>>>>>> >
>>>>>>>> > Are you sure you're not confusing two unrelated problems?
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> > -DmitryB
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Mon, Feb 24, 2014 at 9:18 AM, Mike Scherbakov <mscherbakov@xxxxxxxxxxxx> wrote:
>>>>>>>> >> The issue is here: https://bugs.launchpad.net/fuel/+bug/1272842.
>>>>>>>> >> Those who know what could be wrong with our openvswitch/kernel, please
>>>>>>>> >> provide your input.
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Mon, Feb 24, 2014 at 9:04 PM, Igor Shishkin <ishishkin@xxxxxxxxxxxx> wrote:
>>>>>>>> >>>
>>>>>>>> >>> Hello,
>>>>>>>> >>>
>>>>>>>> >>> Currently we have this review https://review.openstack.org/#/c/75161
>>>>>>>> >>> with test cases for our brand new shiny bonding feature, but the
>>>>>>>> >>> balance-tcp/balance-slb modes are not working for now.
>>>>>>>> >>>
>>>>>>>> >>> Steps to reproduce are very simple:
>>>>>>>> >>> Create a cluster with a simple or HA configuration, select the
>>>>>>>> >>> balance-tcp or balance-slb bonding mode and start deployment.
>>>>>>>> >>>
>>>>>>>> >>> Deployment does not finish successfully because of problems in the
>>>>>>>> >>> rebalance procedure.
>>>>>>>> >>> --
>>>>>>>> >>> Igor Shishkin
>>>>>>>> >>> QA Engineer
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> --
>>>>>>>> >> Mike Scherbakov
>>>>>>>> >> #mihgen
>>>>>>>> >>
>>>>>>>> >> --
>>>>>>>> >> Mailing list: https://launchpad.net/~fuel-dev
>>>>>>>> >> Post to     : fuel-dev@xxxxxxxxxxxxxxxxxxx
>>>>>>>> >> Unsubscribe : https://launchpad.net/~fuel-dev
>>>>>>>> >> More help   : https://help.launchpad.net/ListHelp
>>>>>>>> >>
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Dmitry Borodaenko
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Andrey Danin
>>>>>>> adanin@xxxxxxxxxxxx
>>>>>>> skype: gcon.monolake
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Andrey Danin
>>>>>> adanin@xxxxxxxxxxxx
>>>>>> skype: gcon.monolake
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Andrey Danin
>>>> adanin@xxxxxxxxxxxx
>>>> skype: gcon.monolake
>>>>
>>>
>>>
>>>
>>> --
>>> Andrey Danin
>>> adanin@xxxxxxxxxxxx
>>> skype: gcon.monolake
>>>
>>>
>>>
>>
>>
>> --
>> Mike Scherbakov
>> #mihgen
>>
>
>
>
> --
> Andrey Danin
> adanin@xxxxxxxxxxxx
> skype: gcon.monolake
>



-- 
Mike Scherbakov
#mihgen
