fuel-dev team mailing list archive

Thread
Date

Re: Propose new rules for bringing in external puppet modules

To: Vladimir Kuklin <vkuklin@xxxxxxxxxxxx>
From: Dmitry Borodaenko <dborodaenko@xxxxxxxxxxxx>
Date: Fri, 28 Mar 2014 12:41:37 -0700
Cc: "fuel-dev@xxxxxxxxxxxxxxxxxxx" <fuel-dev@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CAHAWLf1u=Lu6AOL_iKTAOdZb2jXLmFGjV07yM-w9WDenLnXbxg@mail.gmail.com>

No. The fact that we failed miserably at tracking upstream so far is
not a valid excuse for refusing to even try:
http://hakanforss.files.wordpress.com/2014/03/are-you-too-busy-to-improve1.png

Vladimir, you optimization goal is flawed: it assumes that staying
close to upstream is a goal in itself. It is not a cult, there are
important reasons why we want to track upstream.

Yes, sometimes we find and fix bugs in the upstream code, but it is
much more common for upstream to find and fix its own bugs. On
average, we loose code quality over time, and that causes us to lose
velocity by wasting efforts on fixing bugs that were already addressed
by upstream, or bugs that were introduced because our code is poorly
structured and full of ugly hacks that upstream would have done (or
already did) better.

Staying close to upstream and submitting your fixes to them on a
regular basis also makes it more likely that upstream developers would
be able and willing to help you. Right now we can't report bugs
upstream (because we've diverged too far), we can't submit patches
(often because we do things in a Fuel specific way that would be
unacceptable to upstream), so we're always on our own when we have
problems with any of the Puppet code in fuel-library.

In short, assuming that we can do a better job alone than in
collaboration with all our upstreams is just arrogant. We're not that
good and there's not that many of us.

Bogdan, your logstash commit is such an obvious illustration of the
above that I am very surprised that I have to explain it.

When I said that merging this patch will increase our Puppet code base
by 130% I expected everybody to understand that it is nothing short of
a catastrophe:
https://stackoverflow.com/questions/184071/when-if-ever-is-number-of-lines-of-code-a-useful-metric

When I said that it looked auto-generated I thought it would be
treated as a red flag to be verified, not just an observation that can
be ignored. You have to have very good reasons to commit
auto-generated code (instead of just the generator), and
puppet-logstash didn't have them. Even worse, it didn't even include
the generator script (although commit history looks as if they kept
using it after the initial commit).

The fact that current version of puppet-logstash is able to do the
same task in merely 673 lines of Puppet code instead of 20790 is by
itself a MASSIVE improvement. If we commit the old version now, and
give up on tracking upstream, we miss this improvement and are stuck
maintaining an obsolete 20KLOC version for years. It will not be
pleasant. If at some point we give up and try to move to a newer
upstream version, we'll have to deal with the fact that upstream had
to do a full rewrite of the manifests. And we would have to spend even
more effort migrating to the new upstream version in the future that
we would now, because we'll have accumulated more Fuel specific
changes that will have to be redone essentially from scratch.

I don't see what better example of needing a newer upstream version
you may need. There may be cases where it makes sense to postpone
merging latest upstream, but this is certainly not one of them.
Especially since it's not merged yet.


On Fri, Mar 28, 2014 at 5:36 AM, Vladimir Kuklin <vkuklin@xxxxxxxxxxxx> wrote:
> Guys, I suggest a hybrid approach.
>
> Let's just specify what do we want. Let me write these requirements down in
> some kind of optimization problem
>
> We want to be as close to upstream modules as possible, subject to the fact
> that our code should satisfy all the functional requirements for particular
> features.
>
> In this case I would assume we need the following to do:
>
> 1) For any request require from the requester to submit the original module.
> Not the current upstream HEAD, but his head where his own development
> branched from.
> 2) Then the developer must submit the diverged code. As long as it meets
> functional requirements, it should be OK to use it in FUEL
> 3) Set upstream merging task as a "stretch goal" and strive to be as close
> to upstream as possible.
>
> There should almost no cult of "upstream" code in our project, I think.
> Because if there is a code piece that is not working in the upstream and we
> fix it and upstream community does not accept it, then we have nothing to do
> but maintain our own fork. Thus, let's split requests according to this
> approach and request diverging changes. If they work - let's merge them and
> leave upstream merging as background task.
>
>
> On Fri, Mar 28, 2014 at 12:08 PM, Bogdan Dobrelya <bdobrelia@xxxxxxxxxxxx>
> wrote:
>>
>> On 03/19/2014 07:45 PM, Dmitry Borodaenko wrote:
>> > On Wed, Mar 19, 2014 at 4:56 AM, Bogdan Dobrelya
>> > <bdobrelia@xxxxxxxxxxxx> wrote:
>> >> On 03/19/2014 01:31 PM, Dmitry Nikishov wrote:
>> >>> Actually, zabbix module doesn't come from any upstream repo, it's our
>> >>> own reimplementation of the module by PL team, which was based on
>> >>> upstream code. In Zabbix thread it was suggested that we split it into
>> >>> multiple commits instead of trying to push the whole thing at once.
>> >
>> > It can't be both our own reimplementation and based on upstream code
>> > at the same time. If it's not rewritten from scratch, you should track
>> > down the original upstream version and start your commit series from
>> > that.
>> >
>> >>> It's unclear how it will affect internally developed modules like
>> >>> zabbix
>> >>> one. Should there be any at all? Or should we make a public repo with
>> >>> that module first and then try to include it into fuel-library?
>> >
>> > That's a very good question. Personally, I think that we should prefer
>> > the second option most of the time: create a public repo for the
>> > standalone module, and use that as upstream for the copy of that
>> > module in fuel-library. It will make it more likely that this module
>> > will attract other engineers who will help us find and fix bugs in
>> > this code, and eventually even add more new features that we'd be able
>> > to reuse in Fuel.
>> >
>> >> It is also not quite clear how to submit these team-specific changes.
>> >> 1) E.g. I've submitted the puppet-logstash module from Apollo11 team
>> >> (Poland) and that module was diverged from the original upstream
>> >> version
>> >> by the team. Now we have a two diverged versions - an upstream one and
>> >> a
>> >> submitted one.
>> >
>> > I think that you should present each hand-over as a separate commit:
>> > {original upstream commit} -> {modifications from Apollo11 team} ->
>> > {modifications to integrate with Fuel}. If you have the commit history
>> > from Apollo11 and it's not too long, it would be nice to have all of
>> > it instead of squashing it all into one commit (although if it's more
>> > than a dozen commits it might now be worth the trouble of pulling each
>> > commit through Gerrit).
>> >
>> >> 2) Should I submit the diverged commits rebased onto the upstream
>> >> HEAD/stable version as a separate patchset which depends on the
>> >> verbatim
>> >> copy of HEAD/stable patchset? That could be a very bad idea, because
>> >> rebasing might broke the module completely.
>> >
>> > Yes, I also think that would be a bad idea. Rebase is a kind of a
>> > change, you don't want to combine that and other changes in a single
>> > commit, or you loose ability to distinguish what were you changes and
>> > what changed due to rebase.
>> >
>> >> 3) Should I do the same as (2) but use the common parent commit as an
>> >> upstream base for verbatim copy instead? Despite on no more rebasing
>> >> needed, that is not so good idea as well, because it would also
>> >> complicate the  upstream sync contribution process, if any planned in
>> >> the future.
>> >
>> > You're not going to be able to kill two rabbits with one stone here.
>> > Either you significantly diverge from upstream, and keeping up will be
>> > near impossible, or you keep Fuel specific changes minimal and well
>> > isolated, and keeping up becomes simple.
>> >
>> > a) If you diverge, the best you can do is submit all your
>> > non-Fuel-specific improvements to upstream (you will obviously need to
>> > heavily modify them to decouple from Fuel specific code), and then
>> > periodically (e.g. once per Fuel release) merge upstream changes back
>> > into Fuel by hand. The further you deviate, the harder this process
>> > becomes. It becomes even harder if you don't submit anything to
>> > upstream, because there will be more changes to hand-port later.
>> >
>> > b) If you can isolate Fuel specific code, keeping up with upstream
>> > becomes much easier. Create a fork of upstream repo on Github, create
>> > a fuel branch in that fork, commit all changes for that module to the
>> > fuel branch before submitting them to fuel-library. Submit non Fuel
>> > specific changes to upstream (keep them on your fuel branch until they
>> > are merged). Pulling a new upstream version into fuel-library becomes
>> > a rebase of your fuel branch of the forked upstream onto the latest
>> > upstream release, and then copying the result verbatim into
>> > fuel-library. If Fuel specific code is well isolated, that rebase will
>> > be trivial.
>> >
>> >> 4) So, looks like the only good option is to accept changes to the
>> >> puppet modules which are only the sync requests from the upstream (see
>> >> Openstack projects and Oslo) and never change them locally in the Fuel?
>> >> But I'm afraid the Fuel puppet modules are not ready yet for such
>> >> dramatical changes... Looks like we need a kind of Fuel-oslo ;)
>> >
>> > I think we're very far from being able to use this approach.
>> >
>>
>> Dmitry, thank you - that is a good point that makes sense.
>> But looks like we still have a decision-blocker for introducing any new
>> modules for puppet in Fuel-library, if one was not synced with the
>> upstream while in dev / PoC process (i.e. we silently follow the
>> rejected options (2) and (4) despite on the fact we have near to 90% of
>> puppet modules in Fuel are far more than 1 year outdated and never
>> synced with upstream)
>>
>> Here is an example of such decision blockers:
>> https://review.openstack.org/#/c/80025/
>> https://review.openstack.org/#/c/80024/
>>
>> Hence, lets accept we have a blocker condition in introducing puppet
>> modules for Fuel, lets decide how to deal with it once again and follow
>> it in reviews as well.

-- 
Dmitry Borodaenko

Follow ups

Re: Propose new rules for bringing in external puppet modules
From: Andrew Woodward, 2014-03-28
Re: Propose new rules for bringing in external puppet modules
From: Vladimir Kuklin, 2014-03-28

References

Propose new rules for bringing in external puppet modules
From: Andrew Woodward, 2014-03-18
Re: Propose new rules for bringing in external puppet modules
From: Dmitry Borodaenko, 2014-03-19
Re: Propose new rules for bringing in external puppet modules
From: Dmitry Nikishov, 2014-03-19
Re: Propose new rules for bringing in external puppet modules
From: Bogdan Dobrelya, 2014-03-19
Re: Propose new rules for bringing in external puppet modules
From: Dmitry Borodaenko, 2014-03-19
Re: Propose new rules for bringing in external puppet modules
From: Bogdan Dobrelya, 2014-03-28
Re: Propose new rules for bringing in external puppet modules
From: Vladimir Kuklin, 2014-03-28