fuel-dev team mailing list archive

Re: Discussing DB migrations

To: Evgeniy L <eli@xxxxxxxxxxxx>
From: Nikolay Markov <nmarkov@xxxxxxxxxxxx>
Date: Mon, 31 Mar 2014 18:12:11 +0400
Cc: "fuel-dev@xxxxxxxxxxxxxxxxxxx" <fuel-dev@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CABfuu9p_RewQo7skEgqKAeFE8-DAwkcrFz79eBFv5hiUCHDx7w@mail.gmail.com>
>
> I think it will be easier to add changes in a single
> schema instead of merging before release because
> in case of merging we have additional manual
> labour, we need to remember that we need to do it
> before release and we need to merge the migration
> files manually.


All we need to do in this case is a simple copy-paste, and it can even be
automated if we are not happy about doing it by hand. All the code in the
upgrade() and downgrade() methods executes one migration after another, so it
doesn't matter whether it's located in one file or in several.
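
A minimal sketch of that point, assuming plain sqlite3 instead of Alembic so
it runs standalone. The table name, the two migration functions and the
merge() helper are hypothetical, chosen only for illustration:

```python
import sqlite3

# Hypothetical small migrations, one per would-be migration file.
def add_metrics(conn):
    conn.execute("ALTER TABLE compute_nodes ADD COLUMN metrics TEXT")

def add_stats(conn):
    conn.execute("ALTER TABLE compute_nodes ADD COLUMN stats TEXT")

def merge(steps):
    """Build a single upgrade() that runs the given steps in order."""
    def upgrade(conn):
        for step in steps:
            step(conn)
    return upgrade

def fresh_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE compute_nodes (id INTEGER PRIMARY KEY)")
    return conn

def columns(conn):
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute("PRAGMA table_info(compute_nodes)")]

# Many files: apply each migration separately.
many = fresh_db()
add_metrics(many)
add_stats(many)

# One file: apply the merged upgrade once.
single = fresh_db()
merge([add_metrics, add_stats])(single)

# Both paths end with the same schema.
assert columns(many) == columns(single)
```

The merged upgrade is just the small steps run in the same order, which is
why the merge before release could be done mechanically.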

> Common practice is to keep in a single migration
> file all changes which were made during development
> cycle.


As a long-time web developer, I have never seen this practice. It was
always multiple files.

I would say you're thinking too much about developers looking through
migrations. In my experience you almost never need to look at previous
migrations; you just need to write yours from the previous state (whatever
it is) to the state you want.
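
To illustrate: Alembic revision files carry `revision` and `down_revision`
identifiers, so a new migration only has to name the current head as its
parent. The toy function below orders such a chain. It is not Alembic's own
implementation, the revision ids are made up, and it assumes a linear history
with no branches or cycles:

```python
def order_revisions(down_revisions):
    """Return revision ids from base to head for a linear chain.

    down_revisions maps each revision id to its parent id
    (None for the base revision).
    """
    # Invert the mapping: parent id -> child id.
    child_of = {parent: rev for rev, parent in down_revisions.items()}
    ordered = []
    current = child_of.get(None)  # the base revision has no parent
    while current is not None:
        ordered.append(current)
        current = child_of.get(current)
    return ordered

# Hypothetical chain: each new migration points only at the previous head.
chain = {"a1": None, "b2": "a1", "c3": "b2"}
head = order_revisions(chain)[-1]

# Adding a migration means naming the current head as its parent.
chain["d4"] = head
```

The author of "d4" never has to open "a1" or "b2"; knowing the head is
enough.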

Also, it doesn't really matter how long it takes to apply a DB migration.
In the scope of the upgrade process as a whole it will be a tiny thing, and
even if we add a field and then delete it, it makes no noticeable
difference for users, but it's easier for developers not to look back.

> If release == new database, we will have performance degradation in N times
> (where N equal to amount of releases).


Why? We can do requests in parallel. And what are the possible problems
with transactions? We still keep all the v1 objects in DBv1 and all the v2
objects in DBv2. They will never intersect, in transactions either.
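
A rough sketch of what I mean, again with sqlite3 so it runs standalone. The
two schemas, the column names and the helper functions are all assumptions
made up for the example, not the real Fuel models:

```python
import sqlite3

def make_db(schema_sql):
    conn = sqlite3.connect(":memory:")
    conn.execute(schema_sql)
    return conn

# One database per major API version, with independent (hypothetical) schemas.
DATABASES = {
    "v1": make_db("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT)"),
    "v2": make_db(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, hostname TEXT, role TEXT)"
    ),
}

def db_for(api_version):
    """Pick the database by the version of the environment being served."""
    return DATABASES[api_version]

def node_v2_to_v1(row):
    # Data migration v2 -> v1: v1 has no 'role', and 'hostname' becomes 'name'.
    node_id, hostname, _role = row
    return (node_id, hostname)

def copy_node_to_v1(node_id):
    row = db_for("v2").execute(
        "SELECT id, hostname, role FROM nodes WHERE id = ?", (node_id,)
    ).fetchone()
    # The transaction touches DBv1 only; DBv2 is just read beforehand.
    with db_for("v1") as conn:
        conn.execute(
            "INSERT INTO nodes (id, name) VALUES (?, ?)", node_v2_to_v1(row)
        )
```

Every write transaction stays inside a single database, so nothing has to be
coordinated across the two.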

On Mon, Mar 31, 2014 at 3:28 PM, Evgeniy L <eli@xxxxxxxxxxxx> wrote:

> Hi,
>
> >> The question is, do we need to keep it single during development
> >> process or we should just merge all the files into one migration just
> >> before release?
>
> I think it will be easier to add changes in a single
> schema instead of merging before release because
> in case of merging we have additional manual
> labour, we need to remember that we need to do it
> before release and we need to merge the migration
> files manually.
>
> >> As for me, I don't see any issues with keeping multiple migrations in code
> >> repo (that's the common practice of majority of projects). Please write
> >> your objections.
>
> Common practice is to keep in a single migration
> file all changes which were made during development
> cycle. Our development cycles are much longer
> than development cycles of regular web services
> (it's specific to our product); as a result, our migration
> files are bigger.
>
> I can provide several examples why 1 migration file
> per release is better than hundreds of small migration files.
>
> 1. it looks better to have a single file per release
>
> current.py # I think we need to rename it to 5.0
> fuel_4_0.py
>
> If you want to see what was changed between two
> versions you can just open a single file.
>
> .... here a lot of files
> 4_0_fix_project_user_quotas_resource_length.py
> 4_0_add_metrics_in_compute_nodes.py
> 4_0_add_extra_resources_in_compute_nodes.py
> 4_0_add_details_column_to_instance_actions_events.py
> 4_0_add_ephemeral_key_uuid.py
> 4_0_drop_dump_tables.py
> 4_0_add_stats_in_compute_nodes.py
>
> Here you have to follow some additional file naming
> convention.
> And not all of these names are obvious; as a result you
> have to look inside these files anyway.
>
> 2. development
>
> Developer A added field "a".
> Developer B, during development, found this field and decided to delete
> it or to rename it.
>
>  4_0_fix_project_user_quotas_resource_length.py
> 4_0_add_a_in_compute_nodes.py - Developer A added this migration file
> 4_0_add_extra_resources_in_compute_nodes.py
> 4_0_add_details_column_to_instance_actions_events.py
> 4_0_add_ephemeral_key_uuid.py - Last migration
>
> What should developer B do? Should he create a new
> migration file or should he change/remove previous files?
> It's very easy to miss the file '4_0_add_a_in_compute_nodes.py'
> in the list; in this case the developer will create an extra migration
> file to remove or to rename field "a".
>
> In case of single migration file per release developer will be able
> to see, that this field was added in the current release, and
> he will be able to remove/rename it.
>
> >> I proposed to use separate DB for each major API version (which may have
> >> completely independent schemas) and just write data migration scripts
> >> (v1->v2 and v2->v1), for example, to allow adding nodes to v1 cluster.
>
> If release == new database, we will have performance degradation in N
> times (where N equal to amount of releases).
> How are you going to use transactions when you have several databases?
> It adds complexity.
>
> Thanks,
>
>
>
> On Fri, Mar 28, 2014 at 7:12 PM, Nikolay Markov <nmarkov@xxxxxxxxxxxx> wrote:
>
>> Hello colleagues,
>>
>> Right now we already have working DB migration mechanism presented by
>> Alembic, but it becomes more and more complex as we move towards
>> upgrades.
>>
>> First, as we agreed, migration from previous version of Fuel DB to the
>> next one will be presented by a single file. The question is, do we
>> need to keep it single during development process or we should just
>> merge all the files into one migration just before release?
>>
>> To clarify things, it's not really possible to generate a completely
>> working migration from scratch by taking the diff between two
>> releases, because there are some issues in auto-generated scripts
>> which can only be fixed by hand during development. And our single
>> migration script (current.py) is becoming more and more huge as we
>> don't keep small updates in separate files.
>>
>> As for me, I don't see any issues with keeping multiple migrations in
>> code repo (that's the common practice of majority of projects). Please
>> write your objections.
>>
>> Second, it's not clear right now how we're going to achieve backward
>> compatibility. We will have separate versions of almost all objects in
>> code and will select corresponding ones by Environment versions. The
>> thing is, it will be very hard for us to write working migrations in
>> both directions without serious data loss, especially if we'll have
>> lots of changes in DB schema.
>>
>> I proposed to use separate DB for each major API version (which may
>> have completely independent schemas) and just write data migration
>> scripts (v1->v2 and v2->v1), for example, to allow adding nodes to v1
>> cluster. This seems like a huge overhead, but it actually helps to avoid
>> the headache of writing DB migrations.
>>
>> Please let's discuss all these things in this thread.
>>
>> --
>> Best regards,
>> Nick Markov
>>
>> --
>> Mailing list: https://launchpad.net/~fuel-dev
>> Post to     : fuel-dev@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~fuel-dev
>> More help   : https://help.launchpad.net/ListHelp
>>
>
>


-- 
Best regards,
Nick Markov
Follow ups

Re: Discussing DB migrations
From: Ryan Moe, 2014-04-11
References

Discussing DB migrations
From: Nikolay Markov, 2014-03-28
Re: Discussing DB migrations
From: Evgeniy L, 2014-03-31