
syncany-team team mailing list archive

Re: Database reconciliation


Hi Fabrice,

it really amazes me how fast you got the whole idea. Your description of the
algorithm is entirely correct!
And I don't think it's an easy concept to grasp -- especially with the lack
of documentation on the subject!!

> I've been reviewing the reconciliation algorithm and code
> (org.syncany.operations.DatabaseReconciliator). The code is quite
> complicated, so I've focused on the algorithm. Please note that what
> follows is my understanding of the algorithm, not Philipp's description.
> I hope I did not make major mistakes.
>

Nope! It's all correct!


> (...)
>
> In the current implementation of the reconciliation, C (or any other
> client) uses a deterministic tie breaking algorithm to choose a winner.
> The algorithm uses the timestamp of db-A-2 and db-B-2 as recorded in
> those deltas (this is the local time of A and B when they created the
> delta). If by any chance the timestamps are identical, then the
> repository ids are used to break the tie.
>

The final tie breaker is the machine name of the client -- meaning that A
wins before B. The "repository id" identifies a repository. It is generated
by the init operation and stays the same among all clients.

So the order is:
1. Compare vector clocks
2. If in conflict, compare the local timestamps of the conflicting database
versions (oldest wins, 10h wins before 11h)
3. If they are identical, compare machine names alphabetically (A wins
before B)
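In Java-like form, the tie break could be sketched like this (all names here are hypothetical; the real logic lives in org.syncany.operations.DatabaseReconciliator, and step 1, the vector clock comparison, is assumed to have already flagged the two versions as concurrent):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the winner selection between two *conflicting* database versions:
// older local timestamp wins (10h beats 11h); if identical, the
// alphabetically smaller machine name wins (A beats B).
public class TieBreakSketch {

    // A database version header: vector clock, local creation time, client name.
    static class VersionHeader {
        final Map<String, Long> vectorClock;
        final long localTimestamp;   // local clock of the creating client (millis)
        final String machineName;

        VersionHeader(Map<String, Long> vc, long ts, String name) {
            this.vectorClock = vc;
            this.localTimestamp = ts;
            this.machineName = name;
        }
    }

    static VersionHeader pickWinner(VersionHeader a, VersionHeader b) {
        if (a.localTimestamp != b.localTimestamp) {
            return a.localTimestamp < b.localTimestamp ? a : b; // oldest wins
        }
        return a.machineName.compareTo(b.machineName) < 0 ? a : b; // A beats B
    }

    public static void main(String[] args) {
        VersionHeader dbA2 = new VersionHeader(new HashMap<>(), 1000L, "A");
        VersionHeader dbB2 = new VersionHeader(new HashMap<>(), 2000L, "B");

        System.out.println(pickWinner(dbA2, dbB2).machineName); // A (older timestamp)
    }
}
```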

> So for instance here, the official linearized/trimmed history becomes
> db-A-1 -> db-B-1 -> db-A-2 and db-B-2 is discarded.
>

Discarded is a strong word. All clients follow the winning database delta
files: A and C simply ignore db-B-2 until B reconciles its changes and
uploads a new db-B-3, which is based on the winning version.

When B downloads the changes for the next time ("down" operation), it
realizes that it lost and marks db-B-2 as dirty. At the next upload ("up"
operation), the multichunks/chunks from this dirty version are reused and
added to db-B-3.
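A rough sketch of that dirty-version handling (hypothetical names, just to illustrate the flow):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of what the losing client does with its conflicting version:
// mark it dirty on "down", then fold its already-uploaded multichunks
// into the next version on "up" -- no re-upload needed.
public class DirtyVersionSketch {

    static class DatabaseVersion {
        final String name;
        final List<String> multiChunkIds = new ArrayList<>();
        boolean dirty = false;
        DatabaseVersion(String name) { this.name = name; }
    }

    // "down": B realizes db-B-2 lost the tie break and marks it dirty
    static void markDirty(DatabaseVersion lost) {
        lost.dirty = true;
    }

    // "up": the new version db-B-3 reuses the dirty version's multichunks
    static DatabaseVersion createNextVersion(String name, DatabaseVersion dirtyVersion) {
        DatabaseVersion next = new DatabaseVersion(name);
        if (dirtyVersion.dirty) {
            next.multiChunkIds.addAll(dirtyVersion.multiChunkIds);
        }
        return next;
    }

    public static void main(String[] args) {
        DatabaseVersion dbB2 = new DatabaseVersion("db-B-2");
        dbB2.multiChunkIds.add("multichunk-1");

        markDirty(dbB2);
        DatabaseVersion dbB3 = createNextVersion("db-B-3", dbB2);
        System.out.println(dbB3.multiChunkIds); // [multichunk-1]
    }
}
```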

> Because the tie breaking algorithm is deterministic, all clients will
> agree on this linearized/trimmed history. Assume for instance that C has
> downloaded the repository and uploaded its own changes. We have now a
> new delta, db-C-1 (A=>2,B=>1,C=>1). Notice the B=>1 as db-B-2 was
> discarded. When A comes back, it will consider the history to be
> db-A-1 -> db-B-1 -> db-A-2 -> db-C-1 and will know how to apply db-C-1
> to take into account the modifications. When B does the same thing, it
> will also agree on this history and will discover that its upload was
> discarded. It has to take action to be able to upload this again.
>

This is correct: The loser has to take action. That's one of the
fundamental design principles of the algorithm and it makes things really
easy.

> In my opinion, this works but is suboptimal as it assumes conflicts even
> if none are present.
>

That is also correct: Two database versions might be in conflict even if
files are not in conflict.

Example:
- db-A-2 with vector clock (A=>2, B=>1) adds a file "file1.jpg"
- db-B-2 with vector clock (A=>1, B=>2) adds a file "file2.jpg"

These two database versions are in conflict even though "file1.jpg" and
"file2.jpg" could be added and reconciled without conflict. The reason
behind that is that it's really really easy that way. Reconciliation on
file level is hard: Just imagine moving a file tree on client A and
changing a file in this file tree on client B -- and that's just one
example.
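For illustration, here is a minimal sketch (hypothetical names) of why those two vector clocks count as a conflict, independent of the files they touch:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: two vector clocks are in conflict (concurrent) if neither
// dominates the other, i.e. each has at least one component greater
// than the corresponding component of the other.
public class VectorClockSketch {

    // x dominates y if x[k] >= y[k] for every client k (missing = 0)
    static boolean dominates(Map<String, Long> x, Map<String, Long> y) {
        Set<String> clients = new HashSet<>(x.keySet());
        clients.addAll(y.keySet());
        for (String c : clients) {
            if (x.getOrDefault(c, 0L) < y.getOrDefault(c, 0L)) {
                return false;
            }
        }
        return true;
    }

    static boolean inConflict(Map<String, Long> x, Map<String, Long> y) {
        return !dominates(x, y) && !dominates(y, x);
    }

    public static void main(String[] args) {
        Map<String, Long> dbA2 = new HashMap<>(); // (A=>2, B=>1)
        dbA2.put("A", 2L); dbA2.put("B", 1L);
        Map<String, Long> dbB2 = new HashMap<>(); // (A=>1, B=>2)
        dbB2.put("A", 1L); dbB2.put("B", 2L);

        System.out.println(inConflict(dbA2, dbB2)); // true: neither dominates
    }
}
```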

> Also, most of the literature on distributed systems
> seem to agree on the need for a reconciliation to happen in order to fix
> the problem. In the above case, it means producing a delta that has both
> db-A-2 and db-B-2 in its causal history.
>

Probably true, but again: This is really hard and is "only" an
optimization. :-)


> 1) when a client loses, its content modifications are in the storage (it
> has uploaded the new multichunks before uploading the database delta),
> yet they are not available for the other clients. This is a waste of
> resources.
>

True: when a client (say C) loses, its multichunks are already uploaded but
ignored by the other clients until C reconciles its changes. After C's next
"sync" round (= "down", then "up" operation), the already uploaded
multichunks are referenced again and thereby not wasted.

If, however, C never does a "sync" again, this branch (db-C-x) will
stay in conflict forever and waste resources (nothing references the
uploaded multichunks). Having a reconciliation step in the "down" operation
would definitely solve this issue --> so basically, instead of just
ignoring the loser, one could/should take the loser's branch into account
and merge its changes into the winning branch.

This is definitely a possibility to minimize the round trips and improve
the algorithm!

Do you think we should implement this right now? I know how hard this is,
and I'd rather focus on making the current implementation more
understandable (side note: I said "understandable", not "stable" -- because
I actually believe that it's quite stable already!).

Keep in mind how irrelevant this issue is when Syncany is used as a daemon:
Every 'up' operation is announced to all clients (which react upon these
announcements by running 'down') -- so a situation like this is pretty time
limited. It's more relevant if a client uploads its changes, shuts off the
computer and goes on vacation. If his/her uploads were in conflict, the
others will not be able to retrieve any changes until those changes are
reconciled by the losing client.


> 2) the only way for the modifications done by a loser to become
> available is for the loser to reconnect later to the storage, while
> the needed content is already in the database. This prevents fast
> conflict resolutions and is a typical starvation situation.
>

True, see (1).


> 3) because the conflict is solved later, the history walking has to be
> done on each client, leading to some useless complication. Also, the
> history remains non causal and strongly tied to the deterministic tie
> breaking rule, as opposed to solely based on vector clocks (which have
> guaranteed properties).
>

I'm not sure that I understand you correctly. Once the loser uploads a
reconciled database version, the conflicting old versions are removed
(locally and remotely), and the history is solely based on vector clocks.


> 4) As long as there is a dangling conflict, the history cannot be
> compressed. The resolution of the conflict in each client implies
> downloading multiple delta files.
>

This is indeed something that I have not looked at. The 'cleanup' operation
is the only missing piece, but you might be right: Only non-conflicting
database versions can be compressed, because the reconciliation by the
losing client(s) requires the winning versions.


> 5) while I'm confident the algorithm is sound, it is also original (as
> far as I know) and I would feel more secure with a more classical solution.
>

So would I. I would have loved to reuse something from existing systems,
but there was nothing that fulfilled all my requirements. I've reviewed many
file sync algorithms and none of them seems to do a perfect job: rsync needs
an active component, unison has no versioning and compares only files.
All/most version control systems need an active component and all of them
lack encryption ...


> - completely remove the --force option that allows uploading something
> without downloading previous databases before. This is a recipe for
> failure and I'm not even sure this cannot break the system beyond repair.
>

I am pretty certain that using --force cannot break the system: It will
upload a database version that will definitely be in conflict, but it will
not break anything. Clients will download the database version, compare it,
realize that it loses, and ignore it ...

While the --force option in the command line tool can (and probably should)
be removed, the operation should still have it -- as it is the only way to
be able to provoke conflicts in the tests.

> - implement a user-guided conflict resolution when the vector clocks
> cannot be used to guarantee a linear history. For instance in the case
> described above, we have C with db-A-2 and db-B-2 that are not causally
> related. Knowing that, C will first apply to db-B-1 (the last version in
> the linear history) all updates from db-A-2 and db-B-2 that do not
> conflict (that is which concerns different files). I'm confident this
> will be the more general case.
>

Interesting. I agree that this will be the general case.

However, I do not think that a user-guided conflict resolution is ever
useful. I like the idea of an automatic resolution (with "conflict"
files, as it is implemented right now).


> Then for all conflicting updates, the
> user will be notified (notice that for text files standard merging
> algorithms could be applied automatically as git does). In the end, C
> will push a db-C-1 (A=>2, B=>2, C=>1) which will be a sort of merge
> commit. Once this is pushed, C can remove from the storage db-A-2 and
> db-B-2 as db-C-1 supersedes those. And then it will be allowed to upload
> it local changes, for instance. Obviously this can be done also by
> either A or B. In any case, the only operation allowed at that time
> would be the merge operation, so if another client (including A and B)
> tries to upload something, it will first need to implement the
> reconciliation between A and B (no --force!).
>

This merge-commit would definitely need a locking mechanism, because one
client would merge databases from different clients (here: C merges db-A-2
and db-B-2 into db-C-1, and deletes the conflicting database versions
db-A-2 and db-B-2).

In the current implementation, A only operates on db-A-* files, B only on
db-B-* files, and so on. In general, locking on a dumb storage is really
dangerous, because it might block/lock the storage forever if a client goes
offline.

However, we have the need for locking anyhow in the 'cleanup' operation,
since this cannot be done by all clients simultaneously, so this might not
be such a bad idea. I'll think about this a little more, and I think we
really need to talk/chat this over. It's very difficult to do this via
e-mail.

> (...) So we will need in this case to implement some form
> of waiting/locking to ensure someone actually wins the race to resolve
> the conflict. Maybe the client writes to a .resolution file with its
> name in it. Someone is guaranteed to win the race to this file as
> uploads are atomic. When one has written to it, it starts to upload its
> solution. At the end of the upload, it verifies it actually won the race
> (in order to accommodate for eventual consistency). If this is the case,
> it commits (i.e. move in the case of ftp like remote) its solution. If
> not, it removes it and waits for the winner to complete the reconciliation
> process.
>

I should have read ahead. I just realized that you saw the need for locking
as well ... :-)

> Comments?
>

I like the idea and I think Syncany could definitely benefit from this.
What I understand (and could see) is something like this (your suggestion
in my own words):

New "down" extended operation (Client C's perspective)
1. Download conflicting db-A-2 and db-B-2 by any client (e.g. by C)
2. Realize there is a conflict in database versions, write/upload file
"resolve-A-2-B-2-by-C" (= lock!)
3. Apply non-conflicting updates from db-A-2 and db-B-2
4. Apply conflicting updates from db-A-2 and db-B-2 (automatically, oldest
wins & "conflict" files; non-trivial!)
5. Create and upload file db-C-1, delete remote files db-A-2 and db-B-2
6. Remove lock file "resolve-A-2-B-2-by-C" from remote storage
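Step 2, the lock acquisition, could be sketched like this. The remote storage is simulated with a map; the only assumption is the one from your proposal, namely that uploads are atomic and the first upload of a given file name wins:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the proposed lock-file race on a dumb storage. The storage
// is simulated with a map; putIfAbsent mimics an atomic
// "create if not exists" upload. All names are hypothetical.
public class ResolveLockSketch {

    // Simulated remote storage: file name -> content (here: holder name)
    static final ConcurrentMap<String, String> remoteStorage = new ConcurrentHashMap<>();

    // Whoever uploads the lock file first wins the race to merge
    // db-A-2 and db-B-2; everyone else backs off and waits.
    static boolean tryAcquireLock(String lockFile, String clientName) {
        String previous = remoteStorage.putIfAbsent(lockFile, clientName);
        return previous == null || previous.equals(clientName);
    }

    public static void main(String[] args) {
        String lockFile = "resolve-A-2-B-2";

        boolean cWon = tryAcquireLock(lockFile, "C");
        boolean aWon = tryAcquireLock(lockFile, "A"); // too late, C holds the lock

        System.out.println(cWon + " " + aWon); // true false
    }
}
```

The open problem from above remains, of course: a lock file with no expiry blocks the repository forever if its holder goes offline mid-resolution.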

--> How do A and B react on these changes?
--> Can they easily fix their local databases?!
--> Is this what you had in mind?

My summary: Changing the behavior like this would mean having to invest
significant brain power and probably face-to-face meetings to make a sound
algorithm (and to actually implement it -- which is sometimes harder than
it sounds, given the limits of Java and list walking).

How do you suggest we proceed? Do you think this could be done in a second
iteration of the algorithm? I do not want to delay the development any
further. I started the Syncany development over 2.5 years ago, so at some
point, I should release something.

Best regards and many many thanks,
Philipp
