fuel-dev team mailing list archive
Message #00532
Re: Bonding problems
Andrey is correct. It appears that balance-tcp requires successful LACP
negotiation. See here:
https://github.com/osrg/openvswitch/blob/master/lib/bond.c#L610 and here:
https://github.com/osrg/openvswitch/blob/master/lib/bond.c#L1438. This also
means that when we create bonds with balance-tcp we need to configure LACP
as well.
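
A minimal sketch of what "configure LACP as well" could look like when the bond is created by a script. The helper name build_add_bond_cmd and the names br0, bond0, eth0 and eth1 are hypothetical and not taken from Fuel. The rule that balance-tcp must not be combined with lacp=off is an assumption based on the bond.c lines linked above. The helper only builds the ovs-vsctl argument list and does not run it.

```python
from typing import List, Sequence


def build_add_bond_cmd(
    bridge: str,
    bond: str,
    ifaces: Sequence[str],
    bond_mode: str = "balance-tcp",
    lacp: str = "active",
) -> List[str]:
    """Return an ovs-vsctl argv that creates a bond (hypothetical helper)."""
    if len(ifaces) < 2:
        raise ValueError("a bond needs at least two interfaces")
    # Assumption from this thread: balance-tcp needs a successful LACP
    # negotiation, so refuse to build such a bond with LACP switched off.
    if bond_mode == "balance-tcp" and lacp == "off":
        raise ValueError("balance-tcp requires lacp=active or lacp=passive")
    return [
        "ovs-vsctl", "add-bond", bridge, bond, *ifaces,
        # Set the bonding mode and LACP on the Port record in the same call.
        "--", "set", "port", bond,
        f"bond_mode={bond_mode}", f"lacp={lacp}",
    ]


if __name__ == "__main__":
    # Print the command instead of executing it; all names are placeholders.
    print(" ".join(build_add_bond_cmd("br0", "bond0", ["eth0", "eth1"])))
```

The physical switch ports also have to be configured for LACP, otherwise the negotiation cannot succeed no matter what is set on the OVS side.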
On Mon, Feb 24, 2014 at 3:14 PM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
> And yes, the bug https://bugs.launchpad.net/fuel/+bug/1272842 and current
> problem can be unrelated but they have similar error messages in OVS logs.
>
>
> On Tue, Feb 25, 2014 at 2:55 AM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>
>> Guys, I set up hardware (2 nodes) and software (3 nodes) labs today with
>> ISO #181 to test bonding. Unfortunately, balance-tcp mode is totally
>> broken. When I use it during deployment or switch to it in a working
>> cluster, all traffic stops. Playing with the rebalance interval doesn't help.
>> By contrast, balance-slb works fine. Both Ubuntu (hardware nodes)
>> and CentOS (virtual env) work without any traffic loss. I'm running a
>> flood ping between virtual instances inside the clouds overnight and
>> will check the number of lost packets. I also want to play with iperf.
>>
>> Next things we can do:
>> * Build an ISO with the stable (1.9.3) or newest (2.0.x) version of OVS and
>> play with them. Yesterday we decided to build Ubuntu 12.04 with the Debian Sid
>> 1.9.3 version of OVS. There is a ticket about that:
>> https://mirantis.jira.com/browse/OSCI-1089 Igor also built his own
>> version of an ISO with the Sid package.
>> * Dump the OpenFlow rules in balance-tcp mode and try to fix them. It's hard
>> to do that because Aliens developed their syntax.
>> * Run Igor's tests again and again until balance-slb starts blocking
>> traffic. Then dig into the OpenFlow rules.
>> * Play with LACP on real hardware. Maybe balance-tcp can be used only
>> with lacp=active.
>> * Ask the openvswitch community about our problems.
>>
>> Andrew, yes, the PXE network is still nailed to an interface. I hope we will
>> fix it in 5.0.
>>
>>
>> On Tue, Feb 25, 2014 at 12:20 AM, Igor Shishkin <ishishkin@xxxxxxxxxxxx> wrote:
>>
>>> Hello, Dmitry.
>>>
>>> It's 100% reproducible in a virtual environment when we try to
>>> deploy bonding in balance-tcp or balance-slb mode.
>>> The tests are related as a way to reproduce the problem, and as a warning
>>> about why they will fail once they are merged.
>>>
>>> As we can see, the problem is in the rebalance procedure that openvswitch
>>> tries to run once it has started the bonded interface. At that point the
>>> bonded interfaces stop accepting ARPs.
>>>
>>> I just built openvswitch 1.9.3, which is the LTS release, and want to try it
>>> in the same case, and also try decreasing bond-rebalance-interval to 0 (as
>>> Andrey K. suggested). If either of these helps, it could be the solution (but
>>> I'm really not sure bond-rebalance-interval=0 is a good approach).
>>> --
>>> Igor Shishkin
>>> QA Engineer
>>>
>>>
>>>
>>> On 24 Feb 2014, at 23:59, Dmitry Borodaenko <dborodaenko@xxxxxxxxxxxx>
>>> wrote:
>>>
>>> > Mike, Igor,
>>> >
>>> > Can you provide more details on how the integration test in review
>>> > #75161 helps to reproduce bug #1272842?
>>> >
>>> > As far as I understand, the bug is a highly intermittent problem with
>>> > ARP that was only showing up after an environment with LACP bonding
>>> > was operational for at least a few hours.
>>> >
>>> > On the other hand, the problem Igor is reporting based on the
>>> > integration test sounds like something 100% reproducible that doesn't
>>> > require real hardware or LACP and is not necessarily related to ARP.
>>> >
>>> > Are you sure you're not confusing two unrelated problems?
>>> >
>>> > Thanks,
>>> > -DmitryB
>>> >
>>> >
>>> > On Mon, Feb 24, 2014 at 9:18 AM, Mike Scherbakov
>>> > <mscherbakov@xxxxxxxxxxxx> wrote:
>>> >> The issue is here: https://bugs.launchpad.net/fuel/+bug/1272842.
>>> >> Those who know what can be wrong with our openvswitch/kernel, please
>>> >> provide your input.
>>> >>
>>> >>
>>> >> On Mon, Feb 24, 2014 at 9:04 PM, Igor Shishkin
>>> >> <ishishkin@xxxxxxxxxxxx> wrote:
>>> >>>
>>> >>> Hello,
>>> >>>
>>> >>> Currently we have this review https://review.openstack.org/#/c/75161 with
>>> >>> test cases for our brand new shiny bonding feature, but the
>>> >>> balance-tcp/balance-slb modes are not working for now.
>>> >>>
>>> >>> Steps to reproduce are very simple:
>>> >>> Create a cluster with a simple or HA configuration, select the balance-tcp
>>> >>> or balance-slb bonding mode, and start deployment.
>>> >>>
>>> >>> Deployment will not finish successfully because of rebalance
>>> >>> procedure problems.
>>> >>> --
>>> >>> Igor Shishkin
>>> >>> QA Engineer
>>> >>>
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Mike Scherbakov
>>> >> #mihgen
>>> >>
>>> >> --
>>> >> Mailing list: https://launchpad.net/~fuel-dev
>>> >> Post to : fuel-dev@xxxxxxxxxxxxxxxxxxx
>>> >> Unsubscribe : https://launchpad.net/~fuel-dev
>>> >> More help : https://help.launchpad.net/ListHelp
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Dmitry Borodaenko
>>>
>>>
>>>
>>
>>
>>
>> --
>> Andrey Danin
>> adanin@xxxxxxxxxxxx
>> skype: gcon.monolake
>>
>
>
>
> --
> Andrey Danin
> adanin@xxxxxxxxxxxx
> skype: gcon.monolake
>
>
>