fuel-dev team mailing list archive

Re: Bonding problems

 

Okay, I have finally learned how to set up LACP between OVS and a ProCurve
2510G. It works fine, just as balance-slb does. I'll leave my flood ping
running overnight and report the results tomorrow. It seems we can go to
production with the current version of openvswitch. Here in Moscow we are
still trying to build a full OVS 1.9.3 ISO and test it.

Of course we need to document all the issues properly. As far as I know,
Przemek wants to publish well-written examples of OVS, Cisco, Juniper and
Arista configs for enabling LACP.
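
For the OVS side, such an example might boil down to something like the
sketch below. It only builds the ovs-vsctl command line and does not run it.
The names br0, bond0, eth0 and eth1 are placeholders, and the helper
add_bond_command is hypothetical, not part of Fuel or OVS. The check inside
it reflects what we found in this thread: balance-tcp needs LACP to be
negotiated. The switch ports have to be configured for LACP separately.

```python
def add_bond_command(bridge, bond, interfaces,
                     bond_mode="balance-tcp", lacp="active"):
    """Build an `ovs-vsctl add-bond` argument list (nothing is executed)."""
    if len(interfaces) < 2:
        raise ValueError("a bond needs at least two interfaces")
    # Per the findings in this thread, balance-tcp depends on successful
    # LACP negotiation, so refuse to combine it with lacp=off.
    if bond_mode == "balance-tcp" and lacp == "off":
        raise ValueError("balance-tcp requires LACP (lacp=active or passive)")
    # Port column settings follow the interface names on the command line.
    return ["ovs-vsctl", "add-bond", bridge, bond, *interfaces,
            "bond_mode=" + bond_mode, "lacp=" + lacp]


if __name__ == "__main__":
    # Placeholder bridge, bond and NIC names; adjust to the real node.
    print(" ".join(add_bond_command("br0", "bond0", ["eth0", "eth1"])))
```

The resulting list can be passed to subprocess.run on a node once the names
match the real interfaces.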


On Tue, Feb 25, 2014 at 3:12 PM, Mike Scherbakov
<mscherbakov@xxxxxxxxxxxx> wrote:

> Good news.
> Thanks Andrey, keep going!
>
>
> On Tue, Feb 25, 2014 at 2:28 PM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>
>> After 14 hours of flood ping, the hardware lab lost a few packets and
>> the virtual env lost hundreds of packets. Mode: balance-slb.
>>
>> I'm going to test LACP behaviour today.
>>
>>
>> On Tue, Feb 25, 2014 at 3:50 AM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>
>>> Fine. The documentation says the same thing:
>>> http://openvswitch.org/ovs-vswitchd.conf.db.5.pdf (page 14). It was
>>> introduced two years ago, in version 1.5.0. One problem fewer!
>>>
>>>
>>> On Tue, Feb 25, 2014 at 3:37 AM, Ryan Moe <rmoe@xxxxxxxxxxxx> wrote:
>>>
>>>> Andrey is correct. It appears that balance-tcp requires successful LACP
>>>> negotiation. See here:
>>>> https://github.com/osrg/openvswitch/blob/master/lib/bond.c#L610 and
>>>> here: https://github.com/osrg/openvswitch/blob/master/lib/bond.c#L1438.
>>>> This also means that when we create bonds with balance-tcp we need to
>>>> configure lacp as well.
>>>>
>>>>
>>>> On Mon, Feb 24, 2014 at 3:14 PM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>>>
>>>>> And yes, the bug https://bugs.launchpad.net/fuel/+bug/1272842 and
>>>>> the current problem may be unrelated, but they produce similar error
>>>>> messages in the OVS logs.
>>>>>
>>>>>
>>>>> On Tue, Feb 25, 2014 at 2:55 AM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>>>>
>>>>>> Guys, I set up hardware (2 nodes) and software (3 nodes) labs today
>>>>>> with ISO #181 to test bonding. Unfortunately, balance-tcp mode is totally
>>>>>> broken. When I use it during deployment or switch to it in a working
>>>>>> cluster, all traffic stops. Playing with the rebalance interval doesn't
>>>>>> help. In contrast, balance-slb works fine. Both Ubuntu (hardware
>>>>>> nodes) and CentOS (virtual env) work without any traffic loss. I'm
>>>>>> running a flood ping between virtual instances inside the clouds
>>>>>> overnight and will check the number of lost packets. I also want to
>>>>>> play with iperf.
>>>>>>
>>>>>> Next things we can do:
>>>>>> * Build an ISO with the stable (1.9.3) or newest (2.0.x) version of OVS
>>>>>> and play with it. Yesterday we decided to build Ubuntu 12.04 with the
>>>>>> Debian Sid 1.9.3 version of OVS. There is a ticket about that:
>>>>>> https://mirantis.jira.com/browse/OSCI-1089 Igor also built his own
>>>>>> version of an ISO with the Sid package.
>>>>>> * Dump the OpenFlow rules in balance-tcp mode and try to fix them. It's
>>>>>> hard to do because aliens developed their syntax.
>>>>>> * Run Igor's tests again and again until balance-slb starts blocking
>>>>>> traffic. Then dig into the OpenFlow rules.
>>>>>> * Play with LACP on real hardware. Maybe balance-tcp can be used
>>>>>> only with lacp=active.
>>>>>> * Ask the openvswitch community about our problems.
>>>>>>
>>>>>> Andrew, yes, the PXE network is still nailed to an interface. I hope we
>>>>>> will fix it in 5.0.
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 25, 2014 at 12:20 AM, Igor Shishkin <
>>>>>> ishishkin@xxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>> Hello, Dmitry.
>>>>>>>
>>>>>>> It’s 100% reproducible in a virtual environment when we try to
>>>>>>> deploy bonding in balance-tcp or balance-slb mode.
>>>>>>> The tests are related as a way to reproduce the problem, and as a
>>>>>>> warning about why these tests will fail once they are merged.
>>>>>>>
>>>>>>> As far as we can see, the problem is in the rebalance procedure that
>>>>>>> openvswitch runs once it has started the bonded interface. During that
>>>>>>> time the bonded interfaces stop accepting ARPs.
>>>>>>>
>>>>>>> I just built openvswitch 1.9.3, which is LTS, and want to try it in the
>>>>>>> same case. I will also try decreasing bond-rebalance-interval to 0 (as
>>>>>>> Andrey K. suggested). If either of these helps, it could be the solution
>>>>>>> (but I'm really not sure bond-rebalance-interval=0 is a good approach).
>>>>>>> —
>>>>>>> Igor Shishkin
>>>>>>> QA Engineer
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 24 Feb 2014, at 23:59, Dmitry Borodaenko <
>>>>>>> dborodaenko@xxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> > Mike, Igor,
>>>>>>> >
>>>>>>> > Can you provide more details on how the integration test in review
>>>>>>> > #75161 helps to reproduce bug #1272842?
>>>>>>> >
>>>>>>> > As far as I understand, the bug is a highly intermittent problem
>>>>>>> > with ARP that was only showing up after an environment with LACP
>>>>>>> > bonding was operational for at least a few hours.
>>>>>>> >
>>>>>>> > On the other hand, the problem Igor is reporting based on the
>>>>>>> > integration test sounds like something 100% reproducible that
>>>>>>> > doesn't require real hardware or LACP and is not necessarily
>>>>>>> > related to ARP.
>>>>>>> >
>>>>>>> > Are you sure you're not confusing two unrelated problems?
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > -DmitryB
>>>>>>> >
>>>>>>> >
>>>>>>> > On Mon, Feb 24, 2014 at 9:18 AM, Mike Scherbakov
>>>>>>> > <mscherbakov@xxxxxxxxxxxx> wrote:
>>>>>>> >> The issue is here: https://bugs.launchpad.net/fuel/+bug/1272842.
>>>>>>> >> Those who know what can be wrong with our openvswitch/kernel,
>>>>>>> >> please provide your input.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Mon, Feb 24, 2014 at 9:04 PM, Igor Shishkin
>>>>>>> >> <ishishkin@xxxxxxxxxxxx> wrote:
>>>>>>> >>>
>>>>>>> >>> Hello,
>>>>>>> >>>
>>>>>>> >>> Currently we have this review
>>>>>>> >>> https://review.openstack.org/#/c/75161 with test cases for our
>>>>>>> >>> brand new shiny bonding feature, but the balance-tcp/balance-slb
>>>>>>> >>> modes are not working for now.
>>>>>>> >>>
>>>>>>> >>> Steps to reproduce are very simple:
>>>>>>> >>> Create a cluster with a simple or HA configuration, select the
>>>>>>> >>> balance-tcp or balance-slb bonding mode and start deployment.
>>>>>>> >>>
>>>>>>> >>> Deployment will not finish successfully because of problems with
>>>>>>> >>> the rebalance procedure.
>>>>>>> >>> --
>>>>>>> >>> Igor Shishkin
>>>>>>> >>> QA Engineer
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> Mike Scherbakov
>>>>>>> >> #mihgen
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> Mailing list: https://launchpad.net/~fuel-dev
>>>>>>> >> Post to     : fuel-dev@xxxxxxxxxxxxxxxxxxx
>>>>>>> >> Unsubscribe : https://launchpad.net/~fuel-dev
>>>>>>> >> More help   : https://help.launchpad.net/ListHelp
>>>>>>> >>
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > --
>>>>>>> > Dmitry Borodaenko
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Andrey Danin
>>>>>> adanin@xxxxxxxxxxxx
>>>>>> skype: gcon.monolake
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Andrey Danin
>>>>> adanin@xxxxxxxxxxxx
>>>>> skype: gcon.monolake
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Andrey Danin
>>> adanin@xxxxxxxxxxxx
>>> skype: gcon.monolake
>>>
>>
>>
>>
>> --
>> Andrey Danin
>> adanin@xxxxxxxxxxxx
>> skype: gcon.monolake
>>
>>
>>
>
>
> --
> Mike Scherbakov
> #mihgen
>



-- 
Andrey Danin
adanin@xxxxxxxxxxxx
skype: gcon.monolake
