syncany-team team mailing list archive

Thread
Date
Re: Database reconciliation

To: Philipp Heckel <philipp.heckel@xxxxxxxxx>
From: Fabrice Rossi <Fabrice.Rossi@xxxxxxxxxxx>
Date: Mon, 02 Dec 2013 09:16:14 +0100
Cc: Syncany Mailing List <syncany-team@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CAAvm79a=RbAv+H6BamO9_KYfw_6FpqD+NZDRi5WAZE-f_wDXow@mail.gmail.com>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.1.1
Hi Philipp,

Le 02/12/2013 00:00, Philipp Heckel a écrit :
> it really amazes me how fast you got the whole idea. You description 
> of the algorithm is entirely correct! And I don't think it's an easy 
> concept to grasp -- especially with the lack of documentation on the 
> subject!!

Thanks, but actually the code documentation is ok. The Reconciliator
code is a nightmare though ;-) (I wouldn't have done better, just to be
clear.)

>> If by any chance the timestamps are identical, then the repository 
>> ids are used to break the tie.
> 
> The final tie breaker is the machine name of the client -- meaning 
> that A wins before B. The "repository id" identifies a repository.
> It is generated by the init operation and stays the same among all 
> clients.

That's what I meant. How would you name in discussions different storage
areas (on the remote and on the clients)? (as the repository is the
general beast).

>> So for instance here, the official linearized/trimmed history 
>> becomes db-A-1 -> db-B-1 -> db-A-2 and db-B-b2 is discarded.
> 
> Discarded is a strong word. All clients follow the winning database 
> delta files: A and C simply ignore db-B-2 until B reconciles its 
> changes and uploads a new db-B-3, which is based on the winning 
> version.

Indeed, but as a branch in the history, db-B-2 will be discarded
ultimately, right?

> When B downloads the changes for the next time ("down" operation), it
> realizes that it lost and marks db-B-2 as dirty. At the next upload
> ("up" operation), the multichunks/chunks from this dirty version are
> reused and added to db-B-3.

A great, the dirty database concept was the one I did not fully understand.

> [..]
> 
> This is correct: The loser has to take action. That's one of the 
> fundamental design principles of the algorithm and it makes things 
> really easy.

It does, but with a certain price. Something I learning from "the art of
multiprocessor programming" an awesome book by Herlihy and Shavit (two
very famous researchers in parallel/concurrent programming), is that a
simple way to increase significantly the performances of concurrent data
structure is to allow every thread to fix problems. In the present
context, only losers can fix the repository while any client has almost
all the information needed to do so.

>> In my opinion, this works but is suboptimal as it assumes
>> conflicts even if none are present.
>> 
> 
> That is also correct: Two database versions might be in conflict
> even if files are not in conflict.
> 
> Example: - db-A-2 with vector clock (A=>2, B=>1) adds a file 
> "file1.jpg" - db-B-2 with vector clock (A=>1, B=>2) adds a file 
> "file2.jpg"
> 
> These two database versions are in conflict even though "file1.jpg" 
> and "file2.jpg" could be added and reconciled without conflict. The 
> reason behind that is that it's really really easy that way. 
> Reconciliation on file level is hard: Just imagine moving a file
> tree on client A and changing a file in this file tree on client B --
> and that's just one example.

You are 100 % right and I'm all in favor of reconciliation at the commit
level (database level). But in the end, the file level conflicts have to
be dealt with and I think this should not be postponed.

>> Also, most of the literature on distributed systems seem to agree 
>> on the need for a reconciliation to happen in order to fix the 
>> problem. In the above case, it means producing a delta that has 
>> both db-A-2 and db-B-2 in its causal history.
>> 
> 
> Probably true, but again: This is really hard and is "only" an 
> optimization. :-)

Well, there is a vast literature on such systems (see for instance
papers about the coda filesystem and WinFS) and everybody is trying to
allow each replica to implement reconciliation locally. I'm not sure
they are all wrong ;-)

> so basically, instead of just ignoring the loser, one could/should 
> take the losers branch into account and merge changes into the 
> winning branch.
> 
> This is definitely a possibility to minimize the round trips and 
> improve the algorithm!

That's exactly what I'm proposing.

> Do you think we should implement this right now? I know how hard
> this is and I'd rather focus on making the current implementation
> more understandable (side note: I said "understandable", not "stable"
> -- because I actually believe that it's quite stable already!!)

See my comments on the roadmap (next email).

> Keep in mind how irrelevant this issue is when Syncany is used as a 
> daemon: Every 'up' operation is announced to all clients (which
> react upon these announcements by running 'down') -- so a situation
> like this is pretty time limited. It's more relevant if a client
> uploads its changes, shuts of the computer on goes on vacation. If
> his/her uploads were in conflict, the others will not be able to
> retrieve any changes until these changes are reconciled by the losing
> client.

Again, see my comments on the roadmap.

>> [..] 3) because the conflict is solved latter, the history walking 
>> has to be done on each client, leading to some useless 
>> complication. Also, the history remains non causal and strongly 
>> tied to the deterministic tie breaking rule, as opposed to solely 
>> based on vector clocks (which have guaranteed properties).
>> 
> 
> I'm not sure that I understand you correctly. Once the loser uploads 
> a reconciled database version, the conflicting old versions are 
> removed (locally and remotely), and the history is solely based on 
> vector clocks.

Yes but this can take some time as the loser is the only one able to
solve the conflict. A massive advantage of having an optimistically
locked but mandatory reconciliation "commit" is that after this commit
any client other than the conflicting ones, can discard any non causal
commits to bring its database uptospeed, leading to a very simple
algorithm.

>> 4) As long as there is a dangling conflict, the history cannot be 
>> compressed. The resolution of the conflict in each client implies 
>> downloading multiple delta files.
> 
> This is indeed something that I have not looked at. The 'cleanup' 
> operation is the only missing thing, but might be right: Only 
> non-conflicting database versions can be compressed, because the 
> reconciliation by the losing client(s) requires the winning 
> versions.

As pointed out by Gregor, we still need to keep the deltas responsible
for the conflict.

>> 5) while I'm confident the algorithm is sound, it is also original 
>> (as far as I know) and I would feel more secure with a more 
>> classical solution.
>> 
> So would I. I would have loved to reuse something from existing 
> systems, but there was nothing that fulfilled all my requirements. 
> I've reviewd many file sync algorithms and none of them seem to do a 
> perfect job. rsync needs an active component, unison has no 
> versioning and compares only files. All/most version control systems 
> need an active component and all of them lack encryption ...

You are perfectly right as far as the full use case is considered. But
the Harmony system developed by the unison team was described in some
very interesting paper. They as not trying to solve the same thing, but
they surely have similar issues as they don't even rely on a centralized
database (however dumb this centralized db is). Also git-annex is now
fully encrypted and versionned via git-remote-gcrypt. But of course in
this case you need an active component on the remote. Actually, Joey
developed a lot of special remotes for git-annex but none of them stores
the database, which is a strong indication on how difficult this is
(Joey worked full time one year to improve git-annex last year).

> [..] I am pretty certain that using --force cannot break the system: 
> It will upload a database version that will definitely be in 
> conflict, but it will not break anything. Clients will download the 
> database version, compare it and realize that it loses and ignore it 
> ...

Hum, I won't be so sure. If I use --force on a system with a broken time
setting, I can rewrite the past and kill the tie breaking algorithm.

> While the --force option in the command line tool can (and probably 
> should) be removed, the operation should still have it -- as it is 
> the only way to be able to provoke conflicts in the tests.

I would go for a completely different solution using a dedicated program
to create broken repositories. This is what Joey designed recently for
git-annex and he managed to find numerous bugs with it.

But this is not a very important point.

> [..] Although I do not agree that a user guided conflict resolution 
> is ever useful. I like the idea of having an automatic resolution 
> (with "conflict" files, like it is implemented right now)

That's actually what I add in mind.

> [..] This merge-commit would definitely need a locking mechanism, 
> because a one client would merge databases from different clients 
> (here: C merges db-A-2 and db-B-2 into db-C-1, and deletes the 
> conflicting database versions db-A-2 and db-B-2).

See my answer to Gregor: I think C should only upload the new commit
delta and leave it to A an B to delete their files.

> In the current implementation, A only operates on db-A-* files, B 
> only on db-B-* files, and so on.

Let's keep it that way ;-)

> In general, locking on a dumb storage is really dangerous, because
> it might block/lock the storage forever if a client goes offline.

Indeed, but the use case/design of Syncany offers no other solution.

> However, we have the need for locking anyhow in the 'cleanup' 
> operation, since this cannot be done by all clients simultaneously, 
> so this might not be such as bad idea. I'll think about this a
> little more, and I think we really need to talk/chat this over. It's
> very difficult to do this via e-mail.

I'll try to free some time for this.

> [..] 
> New "down" extended operation (Client C's perspective) 1. Download 
> conflicting db-A-2 and db-B-2 by any client (e.g. by C) 2. Realize 
> there is a conflict in database versions, write/upload file 
> "resolve-A-2-B-2-by-C" (= lock!) 3. Apply non-conflicting updates 
> from db-A-2 and db-B-2 4. Apply conflicting updates from db-A-2 and 
> db-B-2 (automatically, oldest wins & "conflict" files; non-trivial!)
>  5. Create and upload file db-C-1, delete remote files db-A-2 and 
> db-B-2 6. Remove lock file "resolve-A-2-B-2-by-C" from remote 
> storage
> 
> --> How do A and B react on these changes? --> Can they easily fix 
> their local databases?! --> Is this what you had in mind?

Yes, good summary with two minor things to fix:

1) the lock is based on a winner determination at the end of the upload,
which means that several clients can be trying to fix the repository at
the same time. Only one will succeed but that can increase the liveness
of the system (with a price to pay in waste transfers, obviously).
2) deletion of conflicting databases will be done by the last client
responsible of the conflict or by a clean up operation.

> My summary: Changing the behavior like this would mean having to 
> invest significant brain power and probably face-to-face meetings to 
> make a sound algorithm (and to actually implement it -- which is 
> sometimes harder than it sounds, given the limits of Java and list 
> walking).

I'm a mathematician so I think we should prove that the scheme is
correct. This is probably doable for both schemes (the current one and
the one I'm proposing). I've nothing against chat, meeting and so on,
but I trust the maths ;-)

> How do you suggest we proceed? Do you think this could be done in a 
> second iteration of the algorithm? I do not want to delay the 
> development any further. I started the Syncany development over 2.5 
> years ago, so at some point, I should release something.

I think you have to decide on a tentative roadmap. I'll start a new
email about this, this one is too long :-)

Best regards,

Fabrice
References

Database reconciliation
From: Fabrice Rossi, 2013-12-01
Re: Database reconciliation
From: Philipp Heckel, 2013-12-01