fuel-dev team mailing list archive

Re: Bonding problems

 

And yes, bug https://bugs.launchpad.net/fuel/+bug/1272842 and the current
problem may be unrelated, but they produce similar error messages in the OVS logs.
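
For reference, the bond settings discussed in the thread below (bond mode, lacp, bond-rebalance-interval) correspond to Port-table columns that can be set with ovs-vsctl. A minimal sketch, assuming a bond port named bond0 (a hypothetical name; the real port name in our deployments may differ). The helper name is made up for illustration, and it only builds the command lines, it does not run them:

```python
import shlex

# Bond modes and LACP values accepted by the OVS Port table.
VALID_BOND_MODES = {"balance-tcp", "balance-slb", "active-backup"}
VALID_LACP = {"active", "passive", "off"}


def bond_settings_command(port, bond_mode, lacp=None, rebalance_interval_ms=None):
    """Return an `ovs-vsctl set port ...` argument list for a bond port.

    Hypothetical helper: nothing is executed, the caller decides
    whether to print the command or hand it to subprocess.
    """
    if bond_mode not in VALID_BOND_MODES:
        raise ValueError(f"unknown bond mode: {bond_mode!r}")
    args = ["ovs-vsctl", "set", "port", port, f"bond_mode={bond_mode}"]
    if lacp is not None:
        if lacp not in VALID_LACP:
            raise ValueError(f"unknown lacp value: {lacp!r}")
        args.append(f"lacp={lacp}")
    if rebalance_interval_ms is not None:
        if rebalance_interval_ms < 0:
            raise ValueError("rebalance interval must be >= 0")
        # The interval is in milliseconds; 0 disables rebalancing.
        args.append(f"other_config:bond-rebalance-interval={rebalance_interval_ms}")
    return args


if __name__ == "__main__":
    # balance-tcp together with lacp=active, one of the ideas from the thread.
    print(shlex.join(bond_settings_command("bond0", "balance-tcp", lacp="active")))
    # balance-slb with rebalancing switched off.
    print(shlex.join(bond_settings_command("bond0", "balance-slb", rebalance_interval_ms=0)))
```

After applying such a command, `ovs-appctl bond/show bond0` should show the resulting bond state, and `ovs-ofctl dump-flows <bridge>` dumps the OpenFlow rules for a bridge.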


On Tue, Feb 25, 2014 at 2:55 AM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:

> Guys, I set up hardware (2 nodes) and software (3 nodes) labs today with
> ISO #181 to test bonding. Unfortunately, balance-tcp mode is totally
> broken. When I use it during deployment or switch to it in a working
> cluster, all traffic stops. Playing with the rebalance interval doesn't help.
> In contrast, balance-slb works fine. Both Ubuntu (hardware nodes) and
> CentOS (virtual env) work without any traffic loss. I'm running a flood
> ping between virtual instances inside the clouds overnight and will check
> the number of lost packets. I also want to play with iperf.
>
> Next things we can do:
> * Build an ISO with the stable (1.9.3) or newest (2.0.x) version of OVS and
> play with them. Yesterday we decided to build Ubuntu 12.04 with the Debian Sid
> 1.9.3 version of OVS. There is a ticket about that:
> https://mirantis.jira.com/browse/OSCI-1089 Igor also built his own
> version of an ISO with the Sid package.
> * Dump the OpenFlow rules in balance-tcp mode and try to fix them. It's hard
> to do that because Aliens developed their syntax.
> * Run Igor's tests again and again until balance-slb starts blocking
> traffic. Then dig into the OpenFlow rules.
> * Play with LACP on real hardware. Maybe balance-tcp can be used only
> with lacp=active.
> * Ask the openvswitch community about our problems.
>
> Andrew, yes, the PXE network is still nailed to an interface. I hope we will
> fix it in 5.0.
>
>
> On Tue, Feb 25, 2014 at 12:20 AM, Igor Shishkin <ishishkin@xxxxxxxxxxxx> wrote:
>
>> Hello, Dmitry.
>>
>> It’s 100% reproducible in a virtual environment when we try to deploy
>> bonding in balance-tcp or balance-slb mode.
>> The tests are relevant as a way to reproduce it, and as a warning about
>> why they will fail once they are merged.
>>
>> As we can see, the problem is in the rebalance procedure that openvswitch
>> runs once it has started the bonded interface. During that time the bonded
>> interface stops accepting ARPs.
>>
>> I just built openvswitch=1.9.3, which is LTS, and want to try it in the same
>> case, and also try to decrease bond-rebalance-interval to 0 (as Andrey K.
>> suggested). If either of these helps, it could be the solution (but I'm
>> really not sure bond-rebalance-interval=0 is a good way).
>> —
>> Igor Shishkin
>> QA Engineer
>>
>>
>>
>> On 24 Feb 2014, at 23:59, Dmitry Borodaenko <dborodaenko@xxxxxxxxxxxx>
>> wrote:
>>
>> > Mike, Igor,
>> >
>> > Can you provide more details on how the integration test in review
>> > #75161 helps to reproduce bug #1272842?
>> >
>> > As far as I understand, the bug is a highly intermittent problem with
>> > ARP that was only showing up after an environment with LACP bonding
>> > was operational for at least a few hours.
>> >
>> > On the other hand, the problem Igor is reporting based on the
>> > integration test sounds like something 100% reproducible that doesn't
>> > require real hardware or LACP and is not necessarily related to ARP.
>> >
>> > Are you sure you're not confusing two unrelated problems?
>> >
>> > Thanks,
>> > -DmitryB
>> >
>> >
>> > On Mon, Feb 24, 2014 at 9:18 AM, Mike Scherbakov
>> > <mscherbakov@xxxxxxxxxxxx> wrote:
>> >> The issue is here: https://bugs.launchpad.net/fuel/+bug/1272842.
>> >> Those who know what can be wrong with our openvswitch/kernel, please
>> >> provide your input.
>> >>
>> >>
>> >> On Mon, Feb 24, 2014 at 9:04 PM, Igor Shishkin <ishishkin@xxxxxxxxxxxx>
>> >> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> Currently we have this review https://review.openstack.org/#/c/75161 with
>> >>> test cases for our brand new shiny bonding feature, but the
>> >>> balance-tcp/balance-slb modes are not working for now.
>> >>>
>> >>> Steps to reproduce are very simple:
>> >>> Create a cluster with a simple or HA configuration, select the balance-tcp
>> >>> or balance-slb bonding mode, and start deployment.
>> >>>
>> >>> Deployment will not finish successfully because of problems in the
>> >>> rebalance procedure.
>> >>> --
>> >>> Igor Shishkin
>> >>> QA Engineer
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Mike Scherbakov
>> >> #mihgen
>> >>
>> >> --
>> >> Mailing list: https://launchpad.net/~fuel-dev
>> >> Post to     : fuel-dev@xxxxxxxxxxxxxxxxxxx
>> >> Unsubscribe : https://launchpad.net/~fuel-dev
>> >> More help   : https://help.launchpad.net/ListHelp
>> >>
>> >
>> >
>> >
>> > --
>> > Dmitry Borodaenko
>>
>>
>>
>
>
>
> --
> Andrey Danin
> adanin@xxxxxxxxxxxx
> skype: gcon.monolake
>



-- 
Andrey Danin
adanin@xxxxxxxxxxxx
skype: gcon.monolake
