syncany-team team mailing list archive

Syncany Architecture

To: Syncany Mailing List <syncany-team@xxxxxxxxxxxxxxxxxxx>
From: Philipp Heckel <philipp.heckel@xxxxxxxxx>
Date: Sat, 28 May 2011 03:34:53 +0200
Hello everyone,

I wanted to tell you a little about Syncany.
I hope this helps you understand what Syncany does, and how it does it.

You don't have to read everything. Just read the first couple of
sentences for each part and you're good.


1. VERSIONING CONCEPT

Since the files and folders in the repository need to be encrypted,
Syncany cannot use any existing versioning mechanisms. Hence it
implements its own versioning based on chunks. New files are chunked
into (currently) fixed-size chunks. Each file hence consists of a set
of chunks in a specific order. When parts of a file change, the file
still consists mostly of the same chunks, and replaces (or adds) only
the updated parts of the file with new chunks.

Example (Ci are chunks of 512 bytes):
Before: File A, version 1: [ C1, C2, C3, C4 ]
After changing 10 bytes of the file in chunk C2:
File A, version 2: [ C1, C5, C3, C4 ]

As you can see, the 2nd chunk of file A, version 2 is now C5.
All chunks are stored on the data repository (any online storage). To
restore files, one simply needs to download all chunks of a file in
the desired version and assemble them. To restore File A, version 1,
we need to get C1-C4.
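
To make the example concrete, here is a minimal sketch of fixed-size
chunking. It is not Syncany code: the class and method names are made up
for illustration, and SHA-1 as the chunk checksum is an assumption. It
only shows that changing 10 bytes inside the 2nd chunk leaves the IDs of
the other three chunks untouched.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: ChunkingSketch and chunkIds are hypothetical names.
class ChunkingSketch {
    static final int CHUNK_SIZE = 512; // chunk size from the example above

    // Splits the data into fixed-size chunks and returns one checksum-based
    // ID per chunk, in file order. SHA-1 is an assumption.
    static List<String> chunkIds(byte[] data) {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        List<String> ids = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += CHUNK_SIZE) {
            int end = Math.min(offset + CHUNK_SIZE, data.length);
            byte[] digest = sha1.digest(Arrays.copyOfRange(data, offset, end));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            ids.add(hex.toString());
        }
        return ids;
    }

    public static void main(String[] args) {
        byte[] version1 = new byte[4 * CHUNK_SIZE];
        for (int i = 0; i < version1.length; i++) {
            version1[i] = (byte) (i / CHUNK_SIZE); // give each chunk distinct content
        }
        byte[] version2 = version1.clone();
        Arrays.fill(version2, 600, 610, (byte) 42); // change 10 bytes inside the 2nd chunk

        List<String> ids1 = chunkIds(version1);
        List<String> ids2 = chunkIds(version2);
        for (int i = 0; i < ids1.size(); i++) {
            boolean changed = !ids1.get(i).equals(ids2.get(i));
            System.out.println("chunk " + (i + 1) + " changed: " + changed);
        }
    }
}
```

Only the chunk whose ID changed would have to be uploaded as a new
chunk file; the other three are already in the repository.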

In the database, each file version is represented by a "CloneFile"
object (in the example 2 versions), each chunk is represented by a
"CloneChunk" object (5 chunks in the example). JPA is used for
database access. The class DatabaseHelper implements functions to help
database access.


2. REPOSITORY STRUCTURE

The following files can be found in each repository (= any remote data storage):
- chunk-<id>: encrypted chunk files; <id> is the checksum of the
unencrypted chunk
- repository-<time>: contains metadata of the repository; in
particular, which remote folders are handled by this repo (= needed
for wizard)
- profile-<machinename>-<time>: encrypted properties file with
properties for a specific client (currently only the username, which
is not the same as the machine name)
- image-<machinename>-<time>: encrypted image file for a specific client
- update-<machinename>-<time>: contains the encrypted version history
of client <machinename> for all of its files. This is (currently!) just
a CSV file with a dump of the "CloneFile" table. Right now, it
actually contains the whole history which can be pretty long. In a
stable version, this must be just the updates (instead of the whole
history). I was considering something like a "base database" file for
each client where it stores the version history up to a specific
point; so that only the new updates reside in the "update" files.

The CSV columns are identical to the DB columns (= CloneFile properties).

Since we have very different storage types, the repository structure
is flat. There are no folders. We assume we need nothing but the
filename, i.e. no file metadata.
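
Because everything has to be derived from the filename, a client needs a
small parser for these names. The following is only a sketch under
assumptions: the class name is hypothetical, and since a machine name may
itself contain dashes, it assumes that <time> is whatever follows the
last dash.

```java
// Illustration only: RepoFileNameSketch is a hypothetical helper.
class RepoFileNameSketch {
    // Returns { type, machineName, idOrTime }. machineName is null for
    // chunk-<id> and repository-<time>, which carry no machine name.
    static String[] parse(String name) {
        int firstDash = name.indexOf('-');
        if (firstDash < 0) {
            throw new IllegalArgumentException("not a repository file: " + name);
        }
        String type = name.substring(0, firstDash);
        String rest = name.substring(firstDash + 1);

        switch (type) {
            case "chunk":
            case "repository":
                return new String[] { type, null, rest };
            case "profile":
            case "image":
            case "update":
                // Assumption: <time> contains no dash, so split at the last one.
                int lastDash = rest.lastIndexOf('-');
                if (lastDash < 0) {
                    throw new IllegalArgumentException("missing <time>: " + name);
                }
                return new String[] { type, rest.substring(0, lastDash), rest.substring(lastDash + 1) };
            default:
                throw new IllegalArgumentException("unknown file type: " + name);
        }
    }

    public static void main(String[] args) {
        String[] parts = parse("update-my-laptop-1306546493");
        System.out.println(parts[0] + " / " + parts[1] + " / " + parts[2]);
    }
}
```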


3. CONNECTION PLUGINS

Each storage type has its own "plugin". Each plugin must
extend/implement the following classes/interfaces:
- PluginInfo: describes the details of the plugin (name, version, ...)
and can create a new Connection object for this plugin
- Connection: represents the settings for a particular protocol, e.g.
host, username, password, etc. The connection object also creates
TransferManagers and the ConfigPanel based on the connection settings.
- TransferManager: communicates with the storage using a minimal set
of operations (connect, disconnect, upload, download, list, delete).
All operations are synchronous.
- ConfigPanel: graphical panel to change the connection settings of
the Connection object. Used in the SettingsDialog and the Wizard.

Example usage:
FtpConnection c = (FtpConnection) Plugins.get("ftp").createConnection();
c.setHost(...);
c.setPort(...);
...
TransferManager tm = c.createTransferManager();

tm.connect();
tm.upload(new File("/some/file"), new RemoteFile("chunk-123"));
...
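
To give an idea of how small the TransferManager contract is, here is a
rough sketch of a map-backed implementation, e.g. for tests. Note that
this is NOT the real interface: SketchTransferManager and its simplified
signatures (String names and byte arrays instead of File/RemoteFile) are
assumptions made only to keep the example self-contained.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, simplified variant of the six operations listed above.
interface SketchTransferManager {
    void connect();
    void disconnect();
    void upload(String remoteName, byte[] content);
    byte[] download(String remoteName);
    Set<String> list();
    void delete(String remoteName);
}

// Keeps the flat "repository" in memory; all operations are synchronous.
class InMemoryTransferManager implements SketchTransferManager {
    private final Map<String, byte[]> files = new TreeMap<>();
    private boolean connected;

    private void requireConnection() {
        if (!connected) {
            throw new IllegalStateException("not connected");
        }
    }

    public void connect() { connected = true; }

    public void disconnect() { connected = false; }

    public void upload(String remoteName, byte[] content) {
        requireConnection();
        files.put(remoteName, content.clone());
    }

    public byte[] download(String remoteName) {
        requireConnection();
        byte[] content = files.get(remoteName);
        if (content == null) {
            throw new IllegalArgumentException("no such file: " + remoteName);
        }
        return content.clone();
    }

    public Set<String> list() {
        requireConnection();
        return new TreeSet<>(files.keySet()); // flat namespace, no folders
    }

    public void delete(String remoteName) {
        requireConnection();
        files.remove(remoteName);
    }
}
```

Since the repository is flat, list() returns plain file names only.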


4. ARCHITECTURE

The following list shows the different components of Syncany, which
are roughly also reflected in the packages.

4.1. Files get changed locally and have to be uploaded

4.1.1. Watcher
The Watcher watches specified folders of the local file system and
fires events if something changed, e.g. a file is created or renamed.
All it does is to hand over the event to the Indexer. The current
implementation is based on jpathwatch, an open source project that
replicates and extends the Java7 WatchService API. The Linux version
of it is based on inotify, a part of the Linux kernel. Since
jpathwatch (and all the other libraries) only support watching a
single directory (and not the subdirectories), I implemented the
"BufferedWatcher", which can watch folders recursively and maps
RENAME_TO and RENAME_FROM events together (at least on Linux). The
Windows version lacks this feature at the moment. Read [1] for
details. @Stefan Mai: This might be important for you!

There is one watcher for the whole application. Singleton.
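
As a rough illustration of the recursive part, the sketch below registers
a folder and all of its subfolders with a watch service. It is not the
BufferedWatcher: it uses the standard java.nio.file.WatchService (the API
that jpathwatch replicates) instead of jpathwatch itself, the class and
method names are hypothetical, and it neither registers folders created
later nor maps rename events together.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;

// Illustration only: RecursiveWatchSketch is a hypothetical class.
class RecursiveWatchSketch {
    // Registers root and every folder below it. The returned map tells
    // which folder a WatchKey belongs to when an event arrives.
    static Map<WatchKey, Path> registerAll(Path root, WatchService watcher) {
        Map<WatchKey, Path> keys = new HashMap<>();
        try {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                        throws IOException {
                    WatchKey key = dir.register(watcher,
                            StandardWatchEventKinds.ENTRY_CREATE,
                            StandardWatchEventKinds.ENTRY_DELETE,
                            StandardWatchEventKinds.ENTRY_MODIFY);
                    keys.put(key, dir);
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return keys;
    }

    // Demo: builds <tmp>/a/b and returns how many folders are being watched.
    static int demo() {
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            Path root = Files.createTempDirectory("watch-sketch");
            Files.createDirectories(root.resolve("a").resolve("b"));
            return registerAll(root, watcher).size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("watched folders: " + demo());
    }
}
```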

4.1.2. Indexer
The indexer is responsible for breaking the local files into chunks
and adding these files to the database. The indexer's methods are
mainly called by the watcher. Watch events are queued in the indexer's
queue and processed sequentially in a separate indexer thread. There
are a couple of index requests. Most of them map to watch events:
-- CheckIndexRequest: checks whether a file already exists in the
database or is new and must be added. This request also tries to map
files via checksum and filename similarity to find files that have
been updated while Syncany wasn't running.
-- NewIndexRequest: chunks a new file and adds it to the DB
-- etc.

There is one indexer for the whole application. Singleton.
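
The queue-plus-single-thread pattern (also used by the uploader) can be
sketched roughly as follows. This is an assumption-laden illustration,
not the actual Indexer: the IndexRequest interface is reduced to a single
process() method, and all names are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustration only: a queue that is drained by exactly one worker thread.
class IndexerQueueSketch {
    // Hypothetical request type; real requests carry file and folder info.
    interface IndexRequest {
        void process();
    }

    // Marker request that tells the worker thread to finish.
    private static final IndexRequest STOP = () -> { };

    private final BlockingQueue<IndexRequest> requests = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this::processLoop, "indexer");

    // Takes requests one by one, so they are processed strictly in order.
    private void processLoop() {
        try {
            while (true) {
                IndexRequest request = requests.take(); // blocks while the queue is empty
                if (request == STOP) {
                    return;
                }
                request.process();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void start() {
        worker.start();
    }

    // Called by the watcher thread; returns immediately.
    void enqueue(IndexRequest request) {
        requests.add(request);
    }

    // Lets the worker finish everything queued so far, then waits for it.
    void stopAndWait() {
        requests.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The watcher only ever calls enqueue(), so slow chunking never blocks the
delivery of file system events.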

4.1.3. Uploader
The uploader simply uploads given CloneFile instances to the remote
storage. It has a queue for UploadRequests and processes them in a
single thread. It only uploads the chunks of a file, not its metadata.
Metadata (= update-files) are uploaded in the RemoteWatcher (see
below).

There is one uploader per profile, i.e. one per repository.


4.2. Files get changed by another user/computer. Changes have to be
downloaded and applied locally.

4.2.1. RemoteWatcher
Since the remote storage is dumb, it cannot notify our application
when changes occur. Hence the remote watcher must poll the repository
for changes. This thread does the following things in a defined
interval:
- connect to the remote storage
- get a list of existing files
- if changed, retrieve the repository-* file
- if changed locally, upload a new repository-* file
- download all unseen update-* files
- parse updates and add to DependencyQueue (see below)
- hand over update list to ChangeManager (see below)
- if changed locally, upload local profile (currently only the username)
- if changed, retrieve other users' profile files
- (same for images)
- delete old files from remote storage
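
The "download all unseen update-* files" step boils down to comparing the
remote file list with what the client has already processed. A minimal
sketch, assuming the client remembers the names of the update files it
has seen (class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: RemotePollSketch is a hypothetical helper.
class RemotePollSketch {
    // Returns the update-* files of the listing that were not seen before,
    // in listing order. Other file types are ignored here.
    static List<String> unseenUpdates(List<String> remoteListing, Set<String> seen) {
        List<String> unseen = new ArrayList<>();
        for (String name : remoteListing) {
            if (name.startsWith("update-") && !seen.contains(name)) {
                unseen.add(name);
            }
        }
        return unseen;
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>(Arrays.asList("update-laptop-100"));
        List<String> listing = Arrays.asList(
                "chunk-abc", "update-laptop-100", "update-desktop-120", "repository-90");

        List<String> toDownload = unseenUpdates(listing, seen);
        seen.addAll(toDownload); // remember them for the next polling round
        System.out.println(toDownload);
    }
}
```

The polling itself could be driven by a timer or a
ScheduledExecutorService; that part is left out here.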

4.2.2. DependencyQueue
The DependencyQueue class is a helper class that analyzes all the
given updates (~ file version) from the different clients and creates
one single update file out of it.
Example: multiple clients A, B, C have changed a file X. The
dependency queue compares the file histories H(A,X), H(B,X) and H(C,X)
with each other and determines which one of them "wins", so that in
the end, only one history for file X survives.

The class adds the updates to a single queue and makes sure that if an
update is taken from the queue, all the dependencies have left the
queue before.
Example: the folder "pictures" must be created BEFORE the file
"pictures/one.jpg" is created.

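For the folder example, the ordering rule can be sketched very simply: a
parent folder always has fewer path segments than anything inside it, so
a stable sort by path depth puts every folder before its contents. This
is only an illustration of that one rule with hypothetical names; the
real DependencyQueue also has to deal with dependencies between versions
and with picking the winning history.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only: DependencyOrderSketch is a hypothetical helper.
class DependencyOrderSketch {
    // Number of '/' separators, i.e. how deep the path is nested.
    static int depth(String path) {
        int depth = 0;
        for (int i = 0; i < path.length(); i++) {
            if (path.charAt(i) == '/') {
                depth++;
            }
        }
        return depth;
    }

    // Returns the paths ordered so that every folder comes before the
    // entries it contains. List.sort is stable, so paths of equal depth
    // keep their original relative order.
    static List<String> parentsFirst(List<String> paths) {
        List<String> ordered = new ArrayList<>(paths);
        ordered.sort(Comparator.comparingInt(DependencyOrderSketch::depth));
        return ordered;
    }

    public static void main(String[] args) {
        System.out.println(parentsFirst(Arrays.asList("pictures/one.jpg", "pictures")));
    }
}
```
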
4.2.3. ChangeManager
The change manager processes all the updates from the dependency queue
sequentially and checks if there are any local conflicts. In a
nutshell, each update is checked according to the following cases
(pseudo code):

if (file id known in database):
   if (is conflict && my responsibility to solve the problem):
      // = rename my version to "conflicting", download winning version
      resolveConflict();
   else if (no conflict or not my problem):
      applyUpdate(); // can be "rename", "delete", "new", "changed"

if (file id NOT known in database):
   if (NO file with the same filename+path exists in DB):
      applyUpdate(); // = must be a new file
   else if (file exists in DB with different file ID):
      // If the checksum doesn't match, this is a conflict
      // If the checksum matches, the histories must be merged
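
One way to make this decision tree easier to review and test is to turn
it into a pure function that only returns WHAT should happen. The sketch
below mirrors the pseudo code one to one; it is a suggestion, not the
existing ChangeManager, and the enum, method and parameter names are
hypothetical.

```java
// Illustration only: the decision tree above as a side-effect-free function.
class ChangeDecisionSketch {
    enum Action { RESOLVE_CONFLICT, APPLY_UPDATE, MERGE_HISTORIES, CONFLICT }

    static Action decide(boolean fileIdKnown, boolean isConflict, boolean myResponsibility,
                         boolean samePathExists, boolean checksumMatches) {
        if (fileIdKnown) {
            if (isConflict && myResponsibility) {
                return Action.RESOLVE_CONFLICT; // rename my version, download the winner
            }
            return Action.APPLY_UPDATE; // no conflict, or not my problem
        }

        // From here on, the file ID is NOT known in the database.
        if (!samePathExists) {
            return Action.APPLY_UPDATE; // must be a new file
        }
        // Same filename+path exists in the DB, but under a different file ID.
        return checksumMatches ? Action.MERGE_HISTORIES : Action.CONFLICT;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, true, true, false, false));
        System.out.println(decide(false, false, false, true, true));
    }
}
```

With five boolean inputs there are only 32 combinations, so the whole
tree can be covered by a table-driven test.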

In my opinion, this is BY FAR the MOST COMPLEX part of Syncany. In
fact, this class is ugly and doesn't work as expected. The following
things make it so difficult:
- Applying updates cannot always be done without looking in the
future. Example: if I change a file 4 times, Syncany shouldn't
download the chunks of all the 4 file versions, but only the last one.
It should hence skip the other updates (= add to DB, but not apply
local file changes), and only perform the last one.
- I'm not sure if the decision tree I built covers all possible cases.
- whenever any of the steps above fails, this must be handled by the
program somehow. It's difficult to find the correct solution for every
situation. Example things that can go wrong: e.g. a file exists at a
given path even though it shouldn't be there, or a rename fails, or
the local cache is full, ....
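
For the first point (looking into the future), a possible approach is to
scan the queued updates once and mark only the last update of each file
as "apply locally"; all earlier ones are just added to the DB. The sketch
below shows that idea with hypothetical names. It deliberately ignores
harder cases, e.g. when the last update is a rename and the content of an
earlier version is still needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: UpdateCollapseSketch is a hypothetical helper.
class UpdateCollapseSketch {
    // fileIds.get(i) is the file ID of the i-th update in queue order.
    // Returns the positions of the updates that should be applied to the
    // local file system, i.e. the last update of every file.
    static List<Integer> indexesToApply(List<String> fileIds) {
        Map<String, Integer> lastIndexOfFile = new HashMap<>();
        for (int i = 0; i < fileIds.size(); i++) {
            lastIndexOfFile.put(fileIds.get(i), i); // later updates overwrite earlier ones
        }
        List<Integer> result = new ArrayList<>(lastIndexOfFile.values());
        Collections.sort(result); // keep queue order
        return result;
    }

    public static void main(String[] args) {
        // File X changes twice, then file Y once, then X twice again.
        System.out.println(indexesToApply(Arrays.asList("X", "X", "Y", "X", "X")));
    }
}
```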

I would love it if some of you guys could take a look at this, check my
logic, and bring it into some nicer form. This is the most important
part of Syncany. It MUST work perfectly in any situation.

5. GUI

The GUI is entirely based on Swing (and AWT) components. The only
exception is the Linux Tray, which uses the java-gnome library (see
below).

5.1. File Manager Integration (org.syncany.gui.desktop)
This architecture is based on the one from the original Dropbox
plugin. I would have done it a little differently, but it's clever, I
must admit :-)

Syncany starts two servers to which the file manager plugins can
connect. Both client/server protocols are based on a simple
text-based data exchange.
Note: "Client" refers to the file manager plugin. "Server" to Syncany:

- CommandServer (client --[req]--> server --[resp]--> client): allows
the file manager plugin to retrieve information such as the emblem for
a certain file (= green/blue sync icon), context menu, etc.
- TouchServer (server --[req]--> client): Allows Syncany to send
"invalidate file" requests to the file manager. This causes the file
manager plugin to re-request all the information for a file.

5.2. Tray
In theory, the tray icon could be implemented identically for all
platforms since AWT provides a TrayIcon class. However, if you have
used this class on Linux, you have seen how ugly the icon and its popup
menu look. Hence, the Linux tray is implemented differently.

All tray classes support the methods updateUI() (= updates the menu),
setStatus(UPTODATE/SYNCING) and notify(message).

5.2.1. Windows/Mac Tray
I have read (not tried) that Windows and Mac can use the AWT TrayIcon
class without any gosh-how-ugly-is-this problems :-) So there is a
common tray for both of them. The only method missing for Windows at
the moment is notify(). For Mac I don't know.

5.2.2. Linux Tray (org.syncany.gui.tray and org.syncany.gui.linux)
The Linux tray uses the java-gnome library to implement the nice
looking tray menu (and some other things). The problem is that
java-gnome and Swing _cannot be used together_, if the Gtk look&feel
is used (took me days to figure that out!). That means that in the
same Java application, they cannot coexist without crashing in random
moments. Don't try it....

So the "natural" way to get the Gtk based tray and menu anyway was to
create a separate program for it and let the original Syncany
application communicate with that program. This external program
delivers all functions that use the java-gnome library. This particularly
includes:

- The tray icon & menu
- The notification (notify-osd)
- The Linux native file open dialogs
- (could be more)

All these functions are grouped in a class called "LinuxNativeService"
in the org.syncany.gui.linux package. This class has its own main
method and is started by Syncany itself.
When the LinuxTray class is initialized, it creates a
"LinuxNativeClient", which runs a new child process for the
"LinuxNativeService".

Once started, the LinuxNativeService opens a server socket on a
random port and reports that port to the client. From then on, the two
communicate via the send() method of the client.
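
The "random port" handshake can be sketched with plain sockets. The
example below is a simplification under assumptions: a thread stands in
for the separate LinuxNativeService process, the "OK ..." reply format is
invented, and the class and method names are hypothetical. The relevant
part is that port 0 lets the operating system pick a free port, which the
service then has to report to the client.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Illustration only: RandomPortSketch is a hypothetical class.
class RandomPortSketch {
    private static BufferedReader reader(Socket socket) throws IOException {
        return new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket socket) throws IOException {
        // autoFlush = true, so println() sends the line right away
        return new PrintWriter(
                new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Sends one text line to a one-shot "service" and returns its reply.
    static String roundTrip(String message) {
        // Port 0 asks the operating system for any free port.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            int port = server.getLocalPort(); // what the service would report to the client

            Thread service = new Thread(() -> {
                try (Socket socket = server.accept()) {
                    writer(socket).println("OK " + reader(socket).readLine());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            service.start();

            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), port)) {
                writer(client).println(message);
                String reply = reader(client).readLine();
                service.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("notify hello"));
    }
}
```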

I must admit, the Linux tray adds a lot of baggage to the application,
even though the functionality would be delivered by the AWT tray....
But I believe it's worth it.


Okaaayy, I think that's it for now.
Hope this helps you guys :-)

Cheers,
Philipp


[1] https://sourceforge.net/projects/jpathwatch/forums/forum/888207/topic/4538927