fuel-dev team mailing list archive
Message #00535
Re: Bonding problems
Good news.
Thanks Andrey, keep going!
On Tue, Feb 25, 2014 at 2:28 PM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
> After 14 hours of flood ping, the hardware lab lost only a few packets and
> the virtual env lost hundreds of packets. Mode: balance-slb.
>
> I'm going to test LACP behaviour today.
>
>
> On Tue, Feb 25, 2014 at 3:50 AM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>
>> Fine. It's in the documentation too:
>> http://openvswitch.org/ovs-vswitchd.conf.db.5.pdf (page 14). It was
>> introduced two years ago, in version 1.5.0. One less problem!
>>
>>
>> On Tue, Feb 25, 2014 at 3:37 AM, Ryan Moe <rmoe@xxxxxxxxxxxx> wrote:
>>
>>> Andrey is correct. It appears that balance-tcp requires successful LACP
>>> negotiation. See here:
>>> https://github.com/osrg/openvswitch/blob/master/lib/bond.c#L610 and
>>> here: https://github.com/osrg/openvswitch/blob/master/lib/bond.c#L1438.
>>> This also means that when we create bonds with balance-tcp, we need to
>>> configure LACP as well.
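Ryan's conclusion (balance-tcp only balances once LACP has negotiated) can be enforced wherever bond settings are generated. Below is a minimal Python sketch, assuming bonds are created with `ovs-vsctl add-bond` and the Port columns `bond_mode` and `lacp` are set inline. `add_bond_cmd`, `br0`, `bond0`, `eth1` and `eth2` are hypothetical names, not Fuel code. The optional `rebalance_ms` argument writes the `other_config:bond-rebalance-interval` key that Igor mentions further down the thread.

```python
import shlex
from typing import List, Optional

# Modes that, per the thread, only balance once LACP has negotiated.
LACP_REQUIRED_MODES = {"balance-tcp"}


def add_bond_cmd(bridge: str, port: str, ifaces: List[str], mode: str,
                 lacp: Optional[str] = None,
                 rebalance_ms: Optional[int] = None) -> List[str]:
    """Build an `ovs-vsctl add-bond` argv list (hypothetical helper)."""
    if mode in LACP_REQUIRED_MODES and lacp not in ("active", "passive"):
        # Refuse to generate a balance-tcp bond without LACP. Note that
        # "passive" only negotiates if the switch side is active.
        raise ValueError(f"bond_mode={mode} needs lacp=active or lacp=passive")
    cmd = ["ovs-vsctl", "add-bond", bridge, port, *ifaces, f"bond_mode={mode}"]
    if lacp is not None:
        cmd.append(f"lacp={lacp}")
    if rebalance_ms is not None:
        # Port other_config key discussed for the rebalance problem.
        cmd.append(f"other_config:bond-rebalance-interval={rebalance_ms}")
    return cmd


if __name__ == "__main__":
    argv = add_bond_cmd("br0", "bond0", ["eth1", "eth2"], "balance-tcp", lacp="active")
    print(shlex.join(argv))
    # prints: ovs-vsctl add-bond br0 bond0 eth1 eth2 bond_mode=balance-tcp lacp=active
```

Returning an argv list rather than a shell string keeps the result usable with `subprocess.run` without any quoting issues.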
>>>
>>>
>>> On Mon, Feb 24, 2014 at 3:14 PM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>>
>>>> And yes, the bug https://bugs.launchpad.net/fuel/+bug/1272842 and the
>>>> current problem may be unrelated, but they produce similar error messages
>>>> in the OVS logs.
>>>>
>>>>
>>>> On Tue, Feb 25, 2014 at 2:55 AM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>>>
>>>>> Guys, I set up hardware (2 nodes) and software (3 nodes) labs today
>>>>> with ISO #181 to test bonding. Unfortunately, balance-tcp mode is totally
>>>>> broken. When I use it during deployment or switch to it in a working
>>>>> cluster, all traffic stops. Playing with the rebalance interval doesn't
>>>>> help. balance-slb, on the other hand, works fine: both Ubuntu (hardware
>>>>> nodes) and CentOS (virtual env) work without any traffic loss. I'm
>>>>> running a flood ping between virtual instances inside the clouds
>>>>> overnight and will check the number of lost packets. I also want to
>>>>> try iperf.
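Andrey's overnight flood-ping test comes down to comparing the transmitted and received counts in ping's final summary. A minimal Python sketch, assuming the iputils `ping` summary format (for example from `ping -f`, which needs root). `parse_ping_summary` and `PingLoss` are hypothetical helpers, and the sample line is illustrative, not a measurement from either lab.

```python
import re
from typing import NamedTuple

# Summary line printed by iputils ping, e.g.
# "1000 packets transmitted, 997 received, 0.3% packet loss, time 999ms".
# The optional "packets " also tolerates the "N packets received" wording
# used by some other ping implementations (an assumption, not exhaustive).
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


class PingLoss(NamedTuple):
    transmitted: int
    received: int

    @property
    def lost(self) -> int:
        return self.transmitted - self.received

    @property
    def loss_pct(self) -> float:
        # Guard against an empty run so we never divide by zero.
        if self.transmitted == 0:
            return 0.0
        return 100.0 * self.lost / self.transmitted


def parse_ping_summary(output: str) -> PingLoss:
    """Extract transmitted/received counts from ping's final statistics."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    return PingLoss(int(match.group(1)), int(match.group(2)))


if __name__ == "__main__":
    # Illustrative sample only; not output captured in the labs above.
    sample = "1000 packets transmitted, 997 received, 0.3% packet loss, time 999ms"
    result = parse_ping_summary(sample)
    print(result.lost, f"{result.loss_pct:.1f}%")  # prints: 3 0.3%
```

Computing the loss from the raw counts, rather than trusting the rounded percentage ping prints, keeps small losses over a 14-hour run visible.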
>>>>>
>>>>> Next things we can do:
>>>>> * Build an ISO with the stable (1.9.3) or newest (2.0.x) version of OVS
>>>>> and experiment with them. Yesterday we decided to build Ubuntu 12.04
>>>>> with the Debian Sid 1.9.3 package of OVS. The ticket for that is
>>>>> https://mirantis.jira.com/browse/OSCI-1089 (Igor also built his own
>>>>> ISO with the Sid package).
>>>>> * Dump the OpenFlow rules in balance-tcp mode and try to fix them. It's
>>>>> hard to do that because their syntax was developed by aliens.
>>>>> * Run Igor's tests again and again until balance-slb starts blocking
>>>>> traffic. Then dig into the OpenFlow rules.
>>>>> * Play with LACP on real hardware. Maybe balance-tcp can only be used
>>>>> with lacp=active.
>>>>> * Ask the openvswitch community about our problems.
>>>>>
>>>>> Andrew, yes, the PXE network is still nailed to one interface. I hope we
>>>>> will fix that in 5.0.
>>>>>
>>>>>
>>>>> On Tue, Feb 25, 2014 at 12:20 AM, Igor Shishkin <
>>>>> ishishkin@xxxxxxxxxxxx> wrote:
>>>>>
>>>>>> Hello, Dmitry.
>>>>>>
>>>>>> It's 100% reproducible in a virtual environment when we try to
>>>>>> deploy bonding in balance-tcp or balance-slb mode.
>>>>>> The tests are related as a way to reproduce it, and as a warning of why
>>>>>> these tests will fail once they are merged.
>>>>>>
>>>>>> As far as we can see, the problem is in the rebalance procedure that
>>>>>> openvswitch attempts as soon as it starts the bonded interface. During
>>>>>> that time the bonded interfaces stop accepting ARPs.
>>>>>>
>>>>>> I just built openvswitch 1.9.3, which is LTS, and want to try it in the
>>>>>> same scenario, and also try decreasing bond-rebalance-interval to 0 (as
>>>>>> Andrey K. suggested). If either of these helps, it could be the solution
>>>>>> (but I'm really not sure bond-rebalance-interval=0 is a good approach).
>>>>>> --
>>>>>> Igor Shishkin
>>>>>> QA Engineer
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 24 Feb 2014, at 23:59, Dmitry Borodaenko <dborodaenko@xxxxxxxxxxxx>
>>>>>> wrote:
>>>>>>
>>>>>> > Mike, Igor,
>>>>>> >
>>>>>> > Can you provide more details on how the integration test in review
>>>>>> > #75161 helps to reproduce bug #1272842?
>>>>>> >
>>>>>> > As far as I understand, the bug is a highly intermittent problem with
>>>>>> > ARP that only showed up after an environment with LACP bonding had
>>>>>> > been operational for at least a few hours.
>>>>>> >
>>>>>> > On the other hand, the problem Igor is reporting based on the
>>>>>> > integration test sounds like something 100% reproducible that
>>>>>> > doesn't require real hardware or LACP and is not necessarily
>>>>>> > related to ARP.
>>>>>> >
>>>>>> > Are you sure you're not confusing two unrelated problems?
>>>>>> >
>>>>>> > Thanks,
>>>>>> > -DmitryB
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Feb 24, 2014 at 9:18 AM, Mike Scherbakov
>>>>>> > <mscherbakov@xxxxxxxxxxxx> wrote:
>>>>>> >> The issue is here: https://bugs.launchpad.net/fuel/+bug/1272842.
>>>>>> >> Those who know what could be wrong with our openvswitch/kernel,
>>>>>> >> please provide your input.
>>>>>> >>
>>>>>> >>
>>>>>> >> On Mon, Feb 24, 2014 at 9:04 PM, Igor Shishkin
>>>>>> >> <ishishkin@xxxxxxxxxxxx> wrote:
>>>>>> >>>
>>>>>> >>> Hello,
>>>>>> >>>
>>>>>> >>> Currently we have review https://review.openstack.org/#/c/75161
>>>>>> >>> with test cases for our brand new shiny bonding feature, but the
>>>>>> >>> balance-tcp/balance-slb modes are not working for now.
>>>>>> >>>
>>>>>> >>> Steps to reproduce are very simple:
>>>>>> >>> Create a cluster with a simple or HA configuration, select the
>>>>>> >>> balance-tcp or balance-slb bonding mode, and start deployment.
>>>>>> >>>
>>>>>> >>> Deployment does not finish successfully because of problems in
>>>>>> >>> the rebalance procedure.
>>>>>> >>> --
>>>>>> >>> Igor Shishkin
>>>>>> >>> QA Engineer
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Mike Scherbakov
>>>>>> >> #mihgen
>>>>>> >>
>>>>>> >> --
>>>>>> >> Mailing list: https://launchpad.net/~fuel-dev
>>>>>> >> Post to : fuel-dev@xxxxxxxxxxxxxxxxxxx
>>>>>> >> Unsubscribe : https://launchpad.net/~fuel-dev
>>>>>> >> More help : https://help.launchpad.net/ListHelp
>>>>>> >>
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Dmitry Borodaenko
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Andrey Danin
>>>>> adanin@xxxxxxxxxxxx
>>>>> skype: gcon.monolake
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Andrey Danin
>>>> adanin@xxxxxxxxxxxx
>>>> skype: gcon.monolake
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Andrey Danin
>> adanin@xxxxxxxxxxxx
>> skype: gcon.monolake
>>
>
>
>
> --
> Andrey Danin
> adanin@xxxxxxxxxxxx
> skype: gcon.monolake
>
>
>
--
Mike Scherbakov
#mihgen