fuel-dev team mailing list archive

Re: Discussing DB migrations

 

Ryan helped me find that the changes I found in [1] are in fact due to
buggy migrations from Alembic 0.6.2; moving to 0.6.4 resolves this issue.
So that was a false alarm. I am intrigued as to why no one has raised this.

[1] https://gist.github.com/xarses/10498338


On Fri, Apr 11, 2014 at 1:21 PM, Andrew Woodward <xarses@xxxxxxxxx> wrote:

> I agree with Nikolay and Ryan: multiple files make more sense. One,
> Alembic tracks the dependencies between them and applies them in order. Two,
> it allows us to safely revert changes that include DB changes. Three, it
> allows people with working DBs to migrate safely between
> revisions.
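
A minimal sketch of the ordering point, assuming a linear history: each Alembic migration file declares a `revision` and a `down_revision`, and the chain is followed from the base to the head. The toy code below imitates that walk in plain Python. It is not Alembic's actual implementation, and the revision ids (`a1`, `b2`, `c3`) are hypothetical; the file names are borrowed from the examples quoted further down the thread.

```python
# Toy model of a linear revision chain; NOT Alembic's real implementation.
# The revision ids are hypothetical; each entry stands for one migration file.
MIGRATIONS = {
    "c3": {"down_revision": "b2", "file": "4_0_add_ephemeral_key_uuid.py"},
    "a1": {"down_revision": None, "file": "fuel_4_0.py"},
    "b2": {"down_revision": "a1", "file": "4_0_add_metrics_in_compute_nodes.py"},
}


def upgrade_order(migrations):
    """Return revision ids from base to head by following down_revision links."""
    # Map each parent revision to the child that builds on it.
    child_of = {meta["down_revision"]: rev for rev, meta in migrations.items()}
    order = []
    current = child_of.get(None)  # the base revision has no parent
    while current is not None:
        order.append(current)
        current = child_of.get(current)
    return order


def downgrade_order(migrations):
    """Reverting walks the same chain from head back to base."""
    return list(reversed(upgrade_order(migrations)))


if __name__ == "__main__":
    for rev in upgrade_order(MIGRATIONS):
        print(rev, MIGRATIONS[rev]["file"])
```

The dictionary is deliberately listed out of order: the apply order comes from the links between revisions, not from file names or directory listing order.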
>
> Regardless of the one-per-release versus many-files argument, we still haven't
> created fuel_4.1.py, which is very necessary if we are doing one per release.
> There is no point in managing DB migrations if we don't create
> the files per release.
>
> Also, I have found that there are changes currently in master that are not
> covered by a migration [1]. This shows that either changes aren't being
> tracked properly with current.py or people don't know what to update or how
> to update it. If we are going to keep the one-per-release approach, it would be
> better to just not manage the migration files until we are ready to
> generate the release, and create the file once.
>
> [1] https://gist.github.com/xarses/10498338
>
>
> On Fri, Apr 11, 2014 at 12:28 PM, Ryan Moe <rmoe@xxxxxxxxxxxx> wrote:
>
>> Have we reached a consensus on how we're handling migrations? I see some
>> reviews modifying current.py and some adding new migration files. FWIW I
>> agree with everything Nikolay said. I have also never seen database
>> migrations handled in any other way than with multiple files.
>>
>> Thanks,
>> Ryan
>>
>> On Mon, Mar 31, 2014 at 7:12 AM, Nikolay Markov <nmarkov@xxxxxxxxxxxx> wrote:
>>
>>>> I think it will be easier to add changes in a single
>>>> schema instead of merging before release because
>>>> in case of merging we have additional manual
>>>> labour, we need to remember that we need to do it
>>>> before release and we need to merge the migration
>>>> files manually.
>>>
>>>
>>> All we need to do in this case is a simple copy-paste; it can even be
>>> automated if we are not happy about doing it by hand. All the code in the
>>> upgrade() and downgrade() methods executes one migration after another, so
>>> it doesn't matter whether it's located in one file or in multiple files.
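
A minimal sketch of this equivalence, using the standard-library `sqlite3` module rather than Alembic: the same schema statements are applied once as three small "files" and once as a single per-release "file", and both runs end with the same columns. The `nodes` table and its columns are hypothetical and only serve the illustration.

```python
import sqlite3

# Hypothetical schema steps; table and column names are illustrative only.
STEP_ADD_TABLE = "CREATE TABLE nodes (id INTEGER PRIMARY KEY)"
STEP_ADD_STATUS = "ALTER TABLE nodes ADD COLUMN status TEXT"
STEP_ADD_META = "ALTER TABLE nodes ADD COLUMN meta TEXT"

# Layout 1: three small migration "files", one step each.
MANY_FILES = [[STEP_ADD_TABLE], [STEP_ADD_STATUS], [STEP_ADD_META]]
# Layout 2: one per-release "file" containing the same steps in the same order.
ONE_FILE = [[STEP_ADD_TABLE, STEP_ADD_STATUS, STEP_ADD_META]]


def apply_migrations(files):
    """Run every statement of every file in order and return the column names."""
    conn = sqlite3.connect(":memory:")
    for statements in files:
        for sql in statements:
            conn.execute(sql)
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) rows.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(nodes)")]
    conn.close()
    return columns


if __name__ == "__main__":
    print(apply_migrations(MANY_FILES) == apply_migrations(ONE_FILE))
```

The sketch only covers the resulting schema; it says nothing about which layout is easier to review or to revert, which is what the rest of the thread is arguing about.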
>>>
>>>> Common practice is to keep in a single migration
>>>> file all changes which were made during development
>>>> cycle.
>>>
>>>
>>> As a long-time web developer in the past, I never saw this practice. It was
>>> always multiple files.
>>>
>>> I would say you're thinking too much about developers looking through
>>> migrations. I can say you almost never need to look at previous migrations;
>>> you just need to write yours to go from the previous state (no matter what
>>> it is) to your new one.
>>>
>>> Also, it actually doesn't matter how long it takes to apply a DB
>>> migration. In the scope of the upgrade process as a whole it will be a tiny
>>> thing, and even if we add a field and then delete it, it doesn't make any
>>> notable difference for users, but it's easier for developers not to look
>>> back.
>>>
>>>> If release == new database, we will have performance degradation in N
>>>> times (where N equal to amount of releases).
>>>
>>>
>>> Why? We can do requests in parallel. And what are possible problems with
>>> transactions? We still keep all the objects with v1 in DBv1 and objects v2
>>> in DBv2. They will never intersect, in transactions as well.
>>>
>>> On Mon, Mar 31, 2014 at 3:28 PM, Evgeniy L <eli@xxxxxxxxxxxx> wrote:
>>>
>>>> Hi,
>>>>
>>>> >> The question is, do we need to keep it single during development
>>>> process or we should just merge all the files into one migration just
>>>> before release?
>>>>
>>>> I think it will be easier to add changes in a single
>>>> schema instead of merging before release because
>>>> in case of merging we have additional manual
>>>> labour, we need to remember that we need to do it
>>>> before release and we need to merge the migration
>>>> files manually.
>>>>
>>>> >> As for me, I don't see any issues with keeping multiple migrations
>>>> in code repo (that's the common practice of majority of projects).
>>>> Please write your objections.
>>>>
>>>> Common practice is to keep in a single migration
>>>> file all changes which were made during development
>>>> cycle. Our development cycles are much longer
>>>> than the development cycles of regular web services
>>>> (it's a specific of our product); as a result, our migration
>>>> files are bigger.
>>>>
>>>> I can provide several examples why 1 migration file
>>>> per release is better than hundreds of small migration files.
>>>>
>>>> 1. it looks better to have a single file per release
>>>>
>>>> current.py # I think we need to rename it to 5.0
>>>> fuel_4_0.py
>>>>
>>>> If you want to see what was changed between two
>>>> versions you can just open a single file.
>>>>
>>>> .... here a lot of files
>>>> 4_0_fix_project_user_quotas_resource_length.py
>>>> 4_0_add_metrics_in_compute_nodes.py
>>>> 4_0_add_extra_resources_in_compute_nodes.py
>>>> 4_0_add_details_column_to_instance_actions_events.py
>>>> 4_0_add_ephemeral_key_uuid.py
>>>> 4_0_drop_dump_tables.py
>>>> 4_0_add_stats_in_compute_nodes.py
>>>>
>>>> Here you have to follow some additional file naming
>>>> convention.
>>>> And not all of these names are obvious; as a result you
>>>> have to look inside these files anyway.
>>>>
>>>> 2. development
>>>>
>>>> Developer A added field "a".
>>>> Developer B, during development, found this field and decided to
>>>> delete it or to rename it.
>>>>
>>>> 4_0_fix_project_user_quotas_resource_length.py
>>>> 4_0_add_a_in_compute_nodes.py - Developer A added this migration file
>>>> 4_0_add_extra_resources_in_compute_nodes.py
>>>> 4_0_add_details_column_to_instance_actions_events.py
>>>> 4_0_add_ephemeral_key_uuid.py - Last migration
>>>>
>>>> What should developer B do? Should he create a new
>>>> migration file, or should he change/remove previous files?
>>>> It's very easy to miss the file '4_0_add_a_in_compute_nodes.py'
>>>> in the list; in this case the developer will create a new, extra migration
>>>> file to remove or to rename field "a".
>>>>
>>>> With a single migration file per release, the developer will be able
>>>> to see that this field was added in the current release, and
>>>> he will be able to remove/rename it.
>>>>
>>>> >> I proposed to use separate DB for each major API version (which may have
>>>> completely independent schemas) and just write data migration scripts
>>>> (v1->v2 and v2->v1), for example, to allow adding nodes to v1 cluster.
>>>>
>>>> If release == new database, we will have performance degradation in N
>>>> times (where N equal to amount of releases).
>>>> How are you going to use transactions when you have several databases?
>>>> It adds complexity.
>>>>
>>>> Thanks,
>>>>
>>>>
>>>>
>>>> On Fri, Mar 28, 2014 at 7:12 PM, Nikolay Markov <nmarkov@xxxxxxxxxxxx> wrote:
>>>>
>>>>> Hello colleagues,
>>>>>
>>>>> Right now we already have a working DB migration mechanism provided by
>>>>> Alembic, but it becomes more and more complex as we move towards
>>>>> upgrades.
>>>>>
>>>>> First, as we agreed, migration from the previous version of the Fuel DB to
>>>>> the next one will be represented by a single file. The question is, do we
>>>>> need to keep it single during development process or we should just
>>>>> merge all the files into one migration just before release?
>>>>>
>>>>> To clarify things, it's not really possible to generate a completely
>>>>> working migration from scratch by taking the diff between two
>>>>> releases, because there are some issues in auto-generated scripts
>>>>> which can be fixed only by hand during development. And our single
>>>>> migration script (current.py) is becoming more and more huge as we
>>>>> don't keep small updates in separate files.
>>>>>
>>>>> As for me, I don't see any issues with keeping multiple migrations in
>>>>> code repo (that's the common practice of majority of projects). Please
>>>>> write your objections.
>>>>>
>>>>> Second, it's not clear right now how we're going to achieve backward
>>>>> compatibility. We will have separate versions of almost all objects in
>>>>> code and will select corresponding ones by Environment versions. The
>>>>> thing is, it will be very hard for us to write working migrations in
>>>>> both directions without serious data loss, especially if we'll have
>>>>> lots of changes in DB schema.
>>>>>
>>>>> I proposed to use separate DB for each major API version (which may
>>>>> have completely independent schemas) and just write data migration
>>>>> scripts (v1->v2 and v2->v1), for example, to allow adding nodes to v1
>>>>> cluster. This seems like a huge overhead, but it actually helps to avoid
>>>>> the bad headache of writing DB migrations.
>>>>>
>>>>> Please let's discuss all these things in this thread.
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Nick Markov
>>>>>
>>>>> --
>>>>> Mailing list: https://launchpad.net/~fuel-dev
>>>>> Post to     : fuel-dev@xxxxxxxxxxxxxxxxxxx
>>>>> Unsubscribe : https://launchpad.net/~fuel-dev
>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Nick Markov
>>>
>>>
>>>
>>
>>
>>
>
>
> --
> Andrew
> Mirantis
> Ceph community
>



-- 
Andrew
Mirantis
Ceph community
