fuel-dev team mailing list archive

Re: Discussing DB migrations

 

Have we reached a consensus on how we're handling migrations? I see some
reviews modifying current.py and some adding new migration files. FWIW I
agree with everything Nikolay said. I have also never seen database
migrations handled in any other way than with multiple files.

Thanks,
Ryan

On Mon, Mar 31, 2014 at 7:12 AM, Nikolay Markov <nmarkov@xxxxxxxxxxxx> wrote:

>> I think it will be easier to add changes in a single
>> schema instead of merging before release because
>> in case of merging we have additional manual
>> labour, we need to remember that we need to do it
>> before release and we need to merge the migration
>> files manually.
>
>
> All we need to do in this case is a simple copy-paste, and it can even be
> automated if we are not happy about doing it by hand. The code in the
> upgrade() and downgrade() functions runs one migration after another, so it
> doesn't matter whether it lives in one file or in several.
>
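
To make the copy-paste point concrete, here is a minimal sketch. It uses
plain sqlite3 rather than Alembic so it runs on its own; the table, the
column names and the function names are all made up for illustration.
It only shows that applying small steps in order and applying one merged
step give the same schema.

```python
import sqlite3

# Hypothetical per-change migrations, as they might live in separate files.
def upgrade_add_a(conn):
    conn.execute("ALTER TABLE nodes ADD COLUMN a TEXT")

def upgrade_add_b(conn):
    conn.execute("ALTER TABLE nodes ADD COLUMN b INTEGER")

# Hypothetical merged release migration: the same bodies pasted in order.
def upgrade_release(conn):
    conn.execute("ALTER TABLE nodes ADD COLUMN a TEXT")
    conn.execute("ALTER TABLE nodes ADD COLUMN b INTEGER")

def fresh_db():
    # Starting schema shared by both approaches.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY)")
    return conn

def columns(conn):
    # PRAGMA table_info yields (cid, name, type, ...); keep the names.
    return [row[1] for row in conn.execute("PRAGMA table_info(nodes)")]

many = fresh_db()
for step in (upgrade_add_a, upgrade_add_b):
    step(many)

single = fresh_db()
upgrade_release(single)

print(columns(many) == columns(single))
```

In Alembic itself the ordering comes from each revision file pointing at
its predecessor, so the merge is the same pasting exercise plus fixing up
the revision identifiers.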
>> Common practice is to keep in a single migration
>> file all changes which were made during development
>> cycle.
>
>
> As a long-time web developer, I have never seen this practice. It was
> always multiple files.
>
> I would say you're thinking too much about developers looking through
> migrations. You almost never need to look at previous migrations; you
> just write yours to go from the previous state (whatever it is) to the
> new one.
>
> Also, it doesn't really matter how long it takes to apply a DB
> migration. Within the upgrade process as a whole it is a tiny step, and
> even if we add a field and then delete it, that makes no noticeable
> difference for users, while developers no longer have to look
> back.
>
>> If release == new database, we will have performance degradation by a
>> factor of N (where N is the number of releases).
>
>
> Why? We can run the requests in parallel. And what problems do you expect
> with transactions? We still keep all v1 objects in DBv1 and all v2 objects
> in DBv2. They will never intersect, in transactions or otherwise.
>
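
A rough sketch of the "one DB per version, queried in parallel" idea,
under loud assumptions: the file names, table layout and the
count_nodes() helper are hypothetical, and sqlite3 stands in for the real
database. Each query opens its own connection to exactly one database, so
no transaction ever touches both.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: one database file per major API version.
workdir = tempfile.mkdtemp()
DB_PATHS = {v: os.path.join(workdir, "fuel_%s.db" % v) for v in ("v1", "v2")}

def init_db(version, ddl, rows):
    conn = sqlite3.connect(DB_PATHS[version])
    placeholders = ", ".join("?" for _ in rows[0])
    with conn:  # one transaction, confined to this database
        conn.execute(ddl)
        conn.executemany("INSERT INTO nodes VALUES (%s)" % placeholders, rows)
    conn.close()

def count_nodes(version):
    # Opened and used inside the worker thread that runs the query.
    conn = sqlite3.connect(DB_PATHS[version])
    try:
        return conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
    finally:
        conn.close()

init_db("v1", "CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT)",
        [(1, "node-1"), (2, "node-2")])
init_db("v2", "CREATE TABLE nodes (id INTEGER PRIMARY KEY, hostname TEXT, role TEXT)",
        [(1, "node-3", "compute")])

# Query both versions concurrently; map() returns results in input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    counts = dict(zip(DB_PATHS, pool.map(count_nodes, DB_PATHS)))

print(counts)
```

This says nothing about an operation that would need to change both
databases atomically; that case is exactly the transaction question
raised below.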
> On Mon, Mar 31, 2014 at 3:28 PM, Evgeniy L <eli@xxxxxxxxxxxx> wrote:
>
>> Hi,
>>
>> >> The question is, do we need to keep it single during development
>> >> process or we should just merge all the files into one migration just
>> >> before release?
>>
>> I think it will be easier to add changes in a single
>> schema instead of merging before release because
>> in case of merging we have additional manual
>> labour, we need to remember that we need to do it
>> before release and we need to merge the migration
>> files manually.
>>
>> >> As for me, I don't see any issues with keeping multiple migrations in code
>> >> repo (that's the common practice of majority of projects). Please write
>> >> your objections.
>>
>> Common practice is to keep in a single migration
>> file all changes which were made during development
>> cycle. Our development cycles are much longer
>> than those of regular web services (this is
>> specific to our product), so our migration
>> files are bigger.
>>
>> I can provide several examples why 1 migration file
>> per release is better than hundreds of small migration files.
>>
>> 1. it looks better to have a single file per release
>>
>> current.py # I think we need to rename it to 5.0
>> fuel_4_0.py
>>
>> If you want to see what was changed between two
>> versions you can just open a single file.
>>
>> .... here a lot of files
>> 4_0_fix_project_user_quotas_resource_length.py
>> 4_0_add_metrics_in_compute_nodes.py
>> 4_0_add_extra_resources_in_compute_nodes.py
>> 4_0_add_details_column_to_instance_actions_events.py
>> 4_0_add_ephemeral_key_uuid.py
>> 4_0_drop_dump_tables.py
>> 4_0_add_stats_in_compute_nodes.py
>>
>> Here you have to follow an additional file naming
>> convention.
>> And not all of these names are obvious, so you
>> have to look inside the files anyway.
>>
>> 2. development
>>
>> Developer A added field "a".
>> Developer B, during development, found this field and decided to
>> delete it or rename it.
>>
>> 4_0_fix_project_user_quotas_resource_length.py
>> 4_0_add_a_in_compute_nodes.py - Developer A added this migration file
>> 4_0_add_extra_resources_in_compute_nodes.py
>> 4_0_add_details_column_to_instance_actions_events.py
>> 4_0_add_ephemeral_key_uuid.py - Last migration
>>
>> What should developer B do? Should he create a new
>> migration file, or should he change/remove the previous files?
>> It's very easy to miss the file '4_0_add_a_in_compute_nodes.py'
>> in the list; in that case the developer will create an extra migration
>> file to remove or rename field "a".
>>
>> With a single migration file per release, the developer will be able
>> to see that this field was added in the current release, and
>> he will be able to remove/rename it.
>>
>> >> I proposed to use separate DB for each major API version (which may have
>> >> completely independent schemas) and just write data migration scripts
>> >> (v1->v2 and v2->v1), for example, to allow adding nodes to v1 cluster.
>>
>> If release == new database, we will have performance degradation by a
>> factor of N (where N is the number of releases).
>> How are you going to use transactions when you have several databases?
>> It adds complexity.
>>
>> Thanks,
>>
>>
>>
>> On Fri, Mar 28, 2014 at 7:12 PM, Nikolay Markov <nmarkov@xxxxxxxxxxxx> wrote:
>>
>>> Hello colleagues,
>>>
>>> Right now we already have a working DB migration mechanism provided by
>>> Alembic, but it is becoming more and more complex as we move towards
>>> upgrades.
>>>
>>> First, as we agreed, migration from the previous version of the Fuel DB
>>> to the next one will be represented by a single file. The question is, do
>>> we need to keep it single during the development process, or should we
>>> just merge all the files into one migration just before release?
>>>
>>> To clarify things, it's not really possible to generate a completely
>>> working migration from scratch by taking the diff between two
>>> releases, because there are some issues in auto-generated scripts
>>> which can only be fixed by hand during development. And our single
>>> migration script (current.py) keeps growing because we
>>> don't keep small updates in separate files.
>>>
>>> As for me, I don't see any issues with keeping multiple migrations in
>>> code repo (that's the common practice of majority of projects). Please
>>> write your objections.
>>>
>>> Second, it's not clear right now how we're going to achieve backward
>>> compatibility. We will have separate versions of almost all objects in
>>> code and will select corresponding ones by Environment versions. The
>>> thing is, it will be very hard for us to write working migrations in
>>> both directions without serious data loss, especially if we'll have
>>> lots of changes in DB schema.
>>>
>>> I proposed to use a separate DB for each major API version (which may
>>> have completely independent schemas) and just write data migration
>>> scripts (v1->v2 and v2->v1), for example, to allow adding nodes to a v1
>>> cluster. This seems like a huge overhead, but it actually spares us the
>>> headache of writing DB migrations.
>>>
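
For what a pair of v1->v2 / v2->v1 data migration scripts might look
like, here is a small sketch. The record shapes and field names are
invented (the thread does not show the real schemas); the point is only
that the round trip from v1 is lossless while the round trip from v2 can
drop fields that v1 has no place for.

```python
# Hypothetical record shapes; the real Fuel schemas are not shown in this thread.
def node_v1_to_v2(node):
    """Convert a v1 node to the v2 shape, filling new fields with defaults."""
    return {"id": node["id"], "hostname": node["name"], "role": None}

def node_v2_to_v1(node):
    """Convert a v2 node back to v1; fields unknown to v1 are dropped."""
    return {"id": node["id"], "name": node["hostname"]}

# v1 -> v2 -> v1 returns the original record.
v1_node = {"id": 1, "name": "node-1"}
v1_round_trip = node_v2_to_v1(node_v1_to_v2(v1_node))

# v2 -> v1 -> v2 loses the "role" value, which v1 cannot store.
v2_node = {"id": 2, "hostname": "node-2", "role": "compute"}
v2_round_trip = node_v1_to_v2(node_v2_to_v1(v2_node))

print(v1_round_trip == v1_node)
print(v2_round_trip == v2_node)
```

The second comparison is the data-loss concern from the paragraph above
in miniature.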
>>> Please let's discuss all these things in this thread.
>>>
>>> --
>>> Best regards,
>>> Nick Markov
>>>
>>> --
>>> Mailing list: https://launchpad.net/~fuel-dev
>>> Post to     : fuel-dev@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~fuel-dev
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>
>>
>
>
> --
> Best regards,
> Nick Markov
>
>
