fuel-dev team mailing list archive

Re: Discussing DB migrations

To: Andrew Woodward <xarses@xxxxxxxxx>
From: Evgeniy L <eli@xxxxxxxxxxxx>
Date: Tue, 15 Apr 2014 21:10:51 +0400
Cc: "fuel-dev@xxxxxxxxxxxxxxxxxxx" <fuel-dev@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CACEfbZiv1mJNKA6R8vTCYo43U8x5jUHTWy9EGDxg+z8wLSoCbA@mail.gmail.com>
Hi guys, sorry for the late reply, I was really busy and didn't have time to respond.

>>  I have also never seen database migrations handled in any other way
than with multiple files.

We will have multiple files, but there will be a single file per release.

>> I agree with Nikolay and Ryan, Multiple files makes more sense. One
Alembic tracks the dependencies between them and applies them in order. Two
it allow us to revert changes that include db changes safely. Three it
allows people with working db's to migrate between versions safely between
revisions.

You can still do all of this with one file per release.

>> Regardless of the one per release argument vs many files. We still
haven't created fuel_4.1.py which if we are doing one per release, is very
necessary. There is no point of managing db migrations if we don't create
the files per release.

As long as we don't have upgrades, we only need a single migration file,
i.e. the one for 5.0.
When we start developing the 5.1 release, we will create the 5.1 migration file.

I'll try to describe the problems we will have with several files per
release.

1. We have to create some kind of naming convention for these files, and
there will be a lot of -1s for this.
E.g.
330ec2ab2bbf_add_nodegroup.py - where was nodegroup added?
596b7e3f2b11_upgrade.py - this file's name tells almost nothing

With a file per release it is much simpler:

fuel_5_0.py
fuel_5_1.py

2. From these file names it's not obvious in what order the migration files
will be applied; you need to run some script to find the order.

With a file per release the order is obvious:

fuel_5_0.py
fuel_5_1.py
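
For comparison, Alembic takes the order from the down_revision pointer stored inside each migration file, not from the file name, which is why the names alone don't show it. Below is a minimal sketch of that idea in plain Python; it does not use Alembic itself. The revision ids are the two from the example in point 1, and which of them is the parent of the other is an assumption made up for illustration.

```python
# Hypothetical metadata, as it might be read from the header of each
# migration file. The parent/child relationship here is invented.
MIGRATIONS = {
    "596b7e3f2b11": {
        "file": "596b7e3f2b11_upgrade.py",
        "down_revision": None,
    },
    "330ec2ab2bbf": {
        "file": "330ec2ab2bbf_add_nodegroup.py",
        "down_revision": "596b7e3f2b11",
    },
}


def upgrade_order(migrations):
    """Return file names in the order they would be applied (linear history only)."""
    # Map each parent revision to its single child.
    child_of = {}
    for rev, meta in migrations.items():
        parent = meta["down_revision"]
        if parent in child_of:
            raise ValueError("branching history is not handled in this sketch")
        child_of[parent] = rev

    ordered = []
    rev = child_of.get(None)  # the root revision has no parent
    while rev is not None:
        ordered.append(migrations[rev]["file"])
        rev = child_of.get(rev)
    if len(ordered) != len(migrations):
        raise ValueError("history is not a single connected chain")
    return ordered


print(upgrade_order(MIGRATIONS))
```

With hash-prefixed names you need a walk like this (or the "alembic history" command) to see the order, whereas fuel_5_0.py and fuel_5_1.py can simply be sorted by name.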

3. And the argument from my previous email:

Developer A added field "a".
Developer B, during development, found this field and decided to delete
it or to rename it.

4_0_fix_project_user_quotas_resource_length.py
4_0_add_a_in_compute_nodes.py - Developer A added this migration file
4_0_add_extra_resources_in_compute_nodes.py
4_0_add_details_column_to_instance_actions_events.py
4_0_add_ephemeral_key_uuid.py - Last migration

What should developer B do? Should he create a new
migration file, or should he change/remove the previous files?
It's very easy to miss the file '4_0_add_a_in_compute_nodes.py'
in the list; in that case the developer will create a new, extra migration
file to remove or to rename field "a".

With a single migration file per release the developer will be able
to see that this field was added in the current release, and
he will be able to remove/rename it in place.
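
To make the two options for developer B concrete, here is a toy model in plain Python (no Alembic involved). A schema is a dict mapping table names to sets of columns, and every migration step is a small function. The table name compute_nodes is taken from the file names above; the new field name "b" and all helper names are made up for illustration.

```python
def add_column(table, column):
    """Build a step that adds a column to a table (creating the table entry if needed)."""
    def step(schema):
        schema.setdefault(table, set()).add(column)
    return step


def rename_column(table, old, new):
    """Build a step that renames an existing column."""
    def step(schema):
        schema[table].remove(old)
        schema[table].add(new)
    return step


def apply_steps(steps):
    """Apply all steps in order to an empty schema and return the result."""
    schema = {}
    for step in steps:
        step(schema)
    return schema


# Several files per release: developer B appends an extra migration.
many_files = [
    add_column("compute_nodes", "a"),                # developer A
    add_column("compute_nodes", "extra_resources"),
    rename_column("compute_nodes", "a", "b"),        # developer B, new file
]

# One file per release: developer B edits developer A's line in place.
single_file = [
    add_column("compute_nodes", "b"),
    add_column("compute_nodes", "extra_resources"),
]

print(apply_steps(many_files) == apply_steps(single_file))
```

Both histories end with the same schema; the difference is how many steps someone has to read to get there. Editing in place assumes that no existing database has already applied the earlier version of the file, which matches the "as long as we don't have upgrades" condition above.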

[0]
https://github.com/stackforge/fuel-web/tree/master/nailgun/nailgun/db/migration/alembic_migrations/versions

Thanks,


On Sat, Apr 12, 2014 at 1:25 AM, Andrew Woodward <xarses@xxxxxxxxx> wrote:

> Ryan helped to find that the changes I found from [1] are in fact due to
> buggy migrations from Alembic 0.6.2, moving to 0.6.4 resolves this issue.
> So that was a false alarm. I am intrigued as to why no one has raised this.
>
> [1] https://gist.github.com/xarses/10498338
>
>
> On Fri, Apr 11, 2014 at 1:21 PM, Andrew Woodward <xarses@xxxxxxxxx> wrote:
>
>> I agree with Nikolay and Ryan, Multiple files makes more sense. One
>> Alembic tracks the dependencies between them and applies them in order. Two
>> it allow us to revert changes that include db changes safely. Three it
>> allows people with working db's to migrate between versions safely between
>> revisions.
>>
>> Regardless of the one per release argument vs many files. We still
>> haven't created fuel_4.1.py which if we are doing one per release, is
>> very necessary. There is no point of managing db migrations if we don't
>> create the files per release.
>>
>> Also, I have found that there are changes currently in master that are
>> not covered by a migration [1]. This shows that either changes aren't being
>> tracked properly in current.py or people don't know what or how to update
>> this. If we are going to keep the one-per-release approach, it would be
>> better to just not manage the migration files until we are ready to
>> generate the release and create it once.
>>
>> [1] https://gist.github.com/xarses/10498338
>>
>>
>> On Fri, Apr 11, 2014 at 12:28 PM, Ryan Moe <rmoe@xxxxxxxxxxxx> wrote:
>>
>>> Have we reached a consensus on how we're handling migrations? I see some
>>> reviews modifying current.py and some adding new migration files. FWIW I
>>> agree with everything Nikolay said. I have also never seen database
>>> migrations handled in any other way than with multiple files.
>>>
>>> Thanks,
>>> Ryan
>>>
>>> On Mon, Mar 31, 2014 at 7:12 AM, Nikolay Markov <nmarkov@xxxxxxxxxxxx> wrote:
>>>
>>>>> I think it will be easier to add changes in a single
>>>>> schema instead of merging before release because
>>>>> in case of merging we have additional manual
>>>>> labour, we need to remember that we need to do it
>>>>> before release and we need to merge the migration
>>>>> files manually.
>>>>
>>>>
>>>> All we need to do in this case is simple copy-paste, it can even be
>>>> automated if we are not happy about doing it by hands. All code in
>>>> upgrade() and downgrade() methods executes one migration by one, it doesn't
>>>> matter if it's located in one file or multiple.
>>>>
>>>>> Common practice is to keep in a single migration
>>>>> file all changes which were made during development
>>>>> cycle.
>>>>
>>>>
>>>> As long-time web developer in the past - never saw this practice. It
>>>> was always multiple files.
>>>>
>>>> I would say you're thinking too much about developers looking through
>>>> migrations. I can say you almost never need to look at previous migrations,
>>>> you just need to create yours from previous state (no matter what it is) to
>>>> yours.
>>>>
>>>> Also, it actually doesn't matter how long does it take to apply DB
>>>> migration. In the scope of upgrading process as a whole it will be a tiny
>>>> thing and even if we add field and then delete it - it doesn't make any
>>>> notable difference for users, but it's easier for developers to not look
>>>> back.
>>>>
>>>>> If release == new database, we will have performance degradation in N
>>>>> times (where N equal to amount of releases).
>>>>
>>>>
>>>> Why? We can do requests in parallel. And what are possible problems
>>>> with transactions? We still keep all the objects with v1 in DBv1 and
>>>> objects v2 in DBv2. They will never intersect, in transactions as well.
>>>>
>>>> On Mon, Mar 31, 2014 at 3:28 PM, Evgeniy L <eli@xxxxxxxxxxxx> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> >> The question is, do we need to keep it single during development
>>>>> process or we should just merge all the files into one migration just
>>>>> before release?
>>>>>
>>>>> I think it will be easier to add changes in a single
>>>>> schema instead of merging before release because
>>>>> in case of merging we have additional manual
>>>>> labour, we need to remember that we need to do it
>>>>> before release and we need to merge the migration
>>>>> files manually.
>>>>>
>>>>> >> As for me, I don't see any issues with keeping multiple migrations
>>>>> in code repo (that's the common practice of majority of projects).
>>>>> Please write your objections.
>>>>>
>>>>> Common practice is to keep in a single migration
>>>>> file all changes which were made during development
>>>>> cycle. Our development cycles are much longer
>>>>> than development cycles of regular web services
>>>>> (it's a specific of our product) as result our migration
>>>>> files bigger.
>>>>>
>>>>> I can provide several examples why 1 migration file
>>>>> per release is better than hundreds of small migration files.
>>>>>
>>>>> 1. it looks better to have a single file per release
>>>>>
>>>>> current.py # I think we need to rename it to 5.0
>>>>> fuel_4_0.py
>>>>>
>>>>> If you want to see what was changed between two
>>>>> versions you can just open a single file.
>>>>>
>>>>> .... here a lot of files
>>>>> 4_0_fix_project_user_quotas_resource_length.py
>>>>> 4_0_add_metrics_in_compute_nodes.py
>>>>> 4_0_add_extra_resources_in_compute_nodes.py
>>>>> 4_0_add_details_column_to_instance_actions_events.py
>>>>> 4_0_add_ephemeral_key_uuid.py
>>>>> 4_0_drop_dump_tables.py
>>>>> 4_0_add_stats_in_compute_nodes.py
>>>>>
>>>>> Here you have to follow some additional file naming
>>>>> convention.
>>>>> And not all of this names are obvious, as result you
>>>>> have to look inside of this files anyway.
>>>>>
>>>>> 2. development
>>>>>
>>>>> Developer A added field "a".
>>>>> Developer B during development found that this field and decided to
>>>>> delete it or to rename it.
>>>>>
>>>>>  4_0_fix_project_user_quotas_resource_length.py
>>>>> 4_0_add_a_in_compute_nodes.py - Developer A added this migration file
>>>>> 4_0_add_extra_resources_in_compute_nodes.py
>>>>> 4_0_add_details_column_to_instance_actions_events.py
>>>>> 4_0_add_ephemeral_key_uuid.py - Last migration
>>>>>
>>>>> What developer B should to do? Should he create new
>>>>> migration file or should he change/remove previous files?
>>>>> It's very easy to miss the file '4_0_add_a_in_compute_nodes.py'
>>>>> in the list, in this case developer will create new extra migration
>>>>> file to remove or to rename field "a".
>>>>>
>>>>> In case of single migration file per release developer will be able
>>>>> to see, that this field was added in the current release, and
>>>>> he will be able to remove/rename it.
>>>>>
>>>>> >> I proposed to use separate DB for each major API version (which
>>>>> may have completely independent schemas) and just write data
>>>>> migration scripts (v1->v2 and v2->v1), for example, to allow adding
>>>>> nodes to v1 cluster.
>>>>>
>>>>> If release == new database, we will have performance degradation in N
>>>>> times (where N equal to amount of releases).
>>>>> How are you going to use transactions when you have several databases?
>>>>> It adds complexity.
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Mar 28, 2014 at 7:12 PM, Nikolay Markov <nmarkov@xxxxxxxxxxxx> wrote:
>>>>>
>>>>>> Hello colleagues,
>>>>>>
>>>>>> Right now we already have working DB migration mechanism presented by
>>>>>> Alembic, but it becomes more and more complex as we move towards
>>>>>> upgrades.
>>>>>>
>>>>>> First, as we agreed, migration from previous version of Fuel DB to the
>>>>>> next one will be presented by a single file. The question is, do we
>>>>>> need to keep it single during development process or we should just
>>>>>> merge all the files into one migration just before release?
>>>>>>
>>>>>> To clarify things, it's not really possible to generate completely
>>>>>> working migration from the scratch taking the diff between two
>>>>>> releases, because there are some issues in auto-generated scripts
>>>>>> which may be fixed by hands only during development. And our single
>>>>>> migration script (current.py) is becoming more and more huge as we
>>>>>> don't keep small updates in a separate files.
>>>>>>
>>>>>> As for me, I don't see any issues with keeping multiple migrations in
>>>>>> code repo (that's the common practice of majority of projects). Please
>>>>>> write your objections.
>>>>>>
>>>>>> Second, it's not clear right now how we're going to achieve backward
>>>>>> compatibility. We will have separate versions of almost all objects in
>>>>>> code and will select corresponding ones by Environment versions. The
>>>>>> thing is, it will be very hard for us to write working migrations in
>>>>>> both directions without serious data loss, especially if we'll have
>>>>>> lots of changes in DB schema.
>>>>>>
>>>>>> I proposed to use separate DB for each major API version (which may
>>>>>> have completely independent schemas) and just write data migration
>>>>>> scripts (v1->v2 and v2->v1), for example, to allow adding nodes to v1
>>>>>> cluster. This seems as a huge overhead, but actually helps to get away
>>>>>> of bad headache writing DB migrations.
>>>>>>
>>>>>> Please let's discuss all these things in this thread.
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Nick Markov
>>>>>>
>>>>>> --
>>>>>> Mailing list: https://launchpad.net/~fuel-dev
>>>>>> Post to     : fuel-dev@xxxxxxxxxxxxxxxxxxx
>>>>>> Unsubscribe : https://launchpad.net/~fuel-dev
>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Nick Markov
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>> --
>> Andrew
>> Mirantis
>> Ceph community
>>
>
>
>
> --
> Andrew
> Mirantis
> Ceph community
>