fuel-dev team mailing list archive

Re: Bonding problems

 

Fine. The documentation says the same:
http://openvswitch.org/ovs-vswitchd.conf.db.5.pdf (page 14). It was introduced
two years ago, in version 1.5.0. One less problem!
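
A minimal sketch of what this means for bond creation, assuming bonds are
created with `ovs-vsctl add-bond` and that lacp=active is an acceptable default
for balance-tcp (the thread only establishes that LACP must be configured, and
the physical switch has to negotiate LACP as well). The helper name and the
bridge, bond and interface names are hypothetical; the function only builds
the command line and executes nothing.

```python
import shlex

# Bond modes that need successful LACP negotiation, per the discussion below.
LACP_REQUIRED_MODES = {"balance-tcp"}


def add_bond_command(bridge, bond, interfaces, mode, lacp=None):
    """Build an `ovs-vsctl add-bond` argument list without running it."""
    if len(interfaces) < 2:
        raise ValueError("a bond needs at least two interfaces")
    # Assumption: default to lacp=active when the mode depends on LACP.
    if mode in LACP_REQUIRED_MODES and lacp is None:
        lacp = "active"
    cmd = ["ovs-vsctl", "add-bond", bridge, bond, *interfaces,
           f"bond_mode={mode}"]
    if lacp is not None:
        cmd.append(f"lacp={lacp}")
    return cmd


if __name__ == "__main__":
    # Hypothetical names: br0, bond0, eth0, eth1.
    cmd = add_bond_command("br0", "bond0", ["eth0", "eth1"], "balance-tcp")
    print(shlex.join(cmd))
```

With balance-slb the same helper leaves the lacp column unset, which matches
the mode that worked in the labs.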


On Tue, Feb 25, 2014 at 3:37 AM, Ryan Moe <rmoe@xxxxxxxxxxxx> wrote:

> Andrey is correct. It appears that balance-tcp requires successful LACP
> negotiation. See here:
> https://github.com/osrg/openvswitch/blob/master/lib/bond.c#L610 and here:
> https://github.com/osrg/openvswitch/blob/master/lib/bond.c#L1438. This
> also means that when we create bonds with balance-tcp we need to configure
> lacp as well.
>
>
> On Mon, Feb 24, 2014 at 3:14 PM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>
>> And yes, the bug https://bugs.launchpad.net/fuel/+bug/1272842 and the
>> current problem may be unrelated, but they produce similar error messages
>> in the OVS logs.
>>
>>
>> On Tue, Feb 25, 2014 at 2:55 AM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>
>>> Guys, I set up hardware (2 nodes) and software (3 nodes) labs today with
>>> ISO #181 to test bonding. Unfortunately, balance-tcp mode is totally
>>> broken. When I use it during deployment or switch to it in a working
>>> cluster, all traffic stops. Playing with the rebalance interval doesn't
>>> help. By contrast, balance-slb works fine. Both Ubuntu (hardware nodes)
>>> and CentOS (virtual env) work without any traffic loss. I'm running a
>>> flood ping between virtual instances inside the clouds overnight and
>>> will check the number of lost packets. I also want to play with iperf.
>>>
>>> Next things we can do:
>>> * Build an ISO with the stable (1.9.3) or newest (2.0.x) version of OVS and
>>> play with them. Yesterday we decided to build Ubuntu 12.04 with the Debian
>>> Sid 1.9.3 version of OVS. There is a ticket about that:
>>> https://mirantis.jira.com/browse/OSCI-1089 Igor has also built his own
>>> version of an ISO with the Sid package.
>>> * Dump the openflow rules in balance-tcp mode and try to fix them. It's hard
>>> to do that because Aliens developed their syntax.
>>> * Run Igor's tests again and again until balance-slb starts blocking
>>> traffic. Then dig into the openflow rules.
>>> * Play with LACP on real hardware. Maybe balance-tcp can be used only
>>> with lacp=active.
>>> * Ask the openvswitch community about our problems.
>>>
>>> Andrew, yes, the PXE network is still nailed to an interface. I hope we
>>> will fix it in 5.0.
>>>
>>>
>>> On Tue, Feb 25, 2014 at 12:20 AM, Igor Shishkin <ishishkin@xxxxxxxxxxxx> wrote:
>>>
>>>> Hello, Dmitry.
>>>>
>>>> It’s 100% reproducible in a virtual environment when we try to
>>>> deploy bonding in balance-tcp or balance-slb mode.
>>>> The tests are related as a way to reproduce the problem, and as a warning
>>>> about why these tests will fail once they are merged.
>>>>
>>>> As far as we can see, the problem is in the rebalance procedure that
>>>> openvswitch runs once it has started the bonded interface. During that
>>>> time the bonded interfaces stop accepting ARPs.
>>>>
>>>> I just built openvswitch=1.9.3, which is LTS, and want to try it in the
>>>> same case, and also to decrease bond-rebalance-interval to 0 (as Andrey K.
>>>> suggested). If either of these helps, it could be the solution (but I'm
>>>> really not sure bond-rebalance-interval=0 is a good approach).
>>>> —
>>>> Igor Shishkin
>>>> QA Engineer
>>>>
>>>>
>>>>
>>>> On 24 Feb 2014, at 23:59, Dmitry Borodaenko <dborodaenko@xxxxxxxxxxxx>
>>>> wrote:
>>>>
>>>> > Mike, Igor,
>>>> >
>>>> > Can you provide more details on how the integration test in review
>>>> > #75161 helps to reproduce bug #1272842?
>>>> >
>>>> > As far as I understand, the bug is a highly intermittent problem with
>>>> > ARP that was only showing up after an environment with LACP bonding
>>>> > was operational for at least a few hours.
>>>> >
>>>> > On the other hand, the problem Igor is reporting based on the
>>>> > integration test sounds like something 100% reproducible that doesn't
>>>> > require real hardware or LACP and is not necessarily related to ARP.
>>>> >
>>>> > Are you sure you're not confusing two unrelated problems?
>>>> >
>>>> > Thanks,
>>>> > -DmitryB
>>>> >
>>>> >
>>>> > On Mon, Feb 24, 2014 at 9:18 AM, Mike Scherbakov
>>>> > <mscherbakov@xxxxxxxxxxxx> wrote:
>>>> >> The issue is here: https://bugs.launchpad.net/fuel/+bug/1272842.
>>>> >> Those who know what can be wrong with our openvswitch/kernel, please
>>>> >> provide your input.
>>>> >>
>>>> >>
>>>> >> On Mon, Feb 24, 2014 at 9:04 PM, Igor Shishkin <ishishkin@xxxxxxxxxxxx>
>>>> >> wrote:
>>>> >>>
>>>> >>> Hello,
>>>> >>>
>>>> >>> Currently we have this review https://review.openstack.org/#/c/75161
>>>> >>> with test cases for our brand new shiny bonding feature but
>>>> >>> balance-tcp/balance-slb modes are not working for now.
>>>> >>>
>>>> >>> Steps to reproduce are very simple:
>>>> >>> Create a cluster with a simple or HA configuration, select the
>>>> >>> balance-tcp or balance-slb bonding mode and start deployment.
>>>> >>>
>>>> >>> Deployment will not finish successfully because of problems with the
>>>> >>> rebalance procedure.
>>>> >>> --
>>>> >>> Igor Shishkin
>>>> >>> QA Engineer
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Mike Scherbakov
>>>> >> #mihgen
>>>> >>
>>>> >> --
>>>> >> Mailing list: https://launchpad.net/~fuel-dev
>>>> >> Post to     : fuel-dev@xxxxxxxxxxxxxxxxxxx
>>>> >> Unsubscribe : https://launchpad.net/~fuel-dev
>>>> >> More help   : https://help.launchpad.net/ListHelp
>>>> >>
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Dmitry Borodaenko
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Andrey Danin
>>> adanin@xxxxxxxxxxxx
>>> skype: gcon.monolake
>>>
>>
>>
>>
>> --
>> Andrey Danin
>> adanin@xxxxxxxxxxxx
>> skype: gcon.monolake
>>
>>
>>
>


-- 
Andrey Danin
adanin@xxxxxxxxxxxx
skype: gcon.monolake
