syncany-team team mailing list archive

Re: Syncany Architecture

 

Hey everyone,

attached is a simplified visual representation of the architecture
(without GUI).

Regards
Philipp

On Sat, May 28, 2011 at 3:34 AM, Philipp Heckel
<philipp.heckel@xxxxxxxxx> wrote:
> Hello everyone,
>
> I wanted to tell you a little about Syncany.
> I hope this helps you understand what Syncany does and how it does it.
>
> You don't have to read everything. Just read the first couple of
> sentences for each part and you're good.
>
>
> 1. VERSIONING CONCEPT
>
> Since the files and folders in the repository need to be encrypted,
> Syncany cannot use any existing versioning mechanisms. Hence it
> implements its own versioning based on chunks. New files are chunked
> into (currently) fixed-size chunks. Each file hence consists of a set
> of chunks in a specific order. When parts of a file change, the file
> still consists mostly of the same chunks; only the updated parts are
> replaced by (or added as) new chunks.
>
> Example (Ci are chunks of 512 bytes):
> Before. File A, version 1: [ C1, C2, C3, C4 ]
> After changing 10 bytes of the file in chunk C2: File A, version 2:
> [C1, C5, C3, C4]
>
> As you can see, the 2nd chunk of file A, version 2 is now C5.
> All chunks are stored on the data repository (any online storage). To
> restore files, one simply needs to download all chunks of a file in
> the desired version and assemble them. To restore File A, version 1,
> we need to get C1-C4.
>
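A minimal sketch of the fixed-size chunking described above. The class and method names are hypothetical, the 512-byte size is taken from the example, and SHA-1 is only an assumed choice of checksum; the mail does not say which one Syncany uses.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChunkingSketch {
    // Assumption: 512-byte chunks as in the example; the real size may differ.
    static final int CHUNK_SIZE = 512;

    /** Splits the content into fixed-size chunks and returns one checksum ID per chunk, in file order. */
    static List<String> chunkIds(byte[] content) {
        List<String> ids = new ArrayList<>();
        for (int offset = 0; offset < content.length; offset += CHUNK_SIZE) {
            int end = Math.min(offset + CHUNK_SIZE, content.length);
            ids.add(checksum(Arrays.copyOfRange(content, offset, end)));
        }
        return ids;
    }

    /** Hex-encoded SHA-1 of the unencrypted chunk (the hash choice is an assumption). */
    static String checksum(byte[] chunk) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(chunk)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 must be available on every Java platform
            throw new IllegalStateException(e);
        }
    }
}
```

Changing a few bytes inside one chunk changes only that chunk's ID, so two versions of a file share all other chunks.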
> In the database, each file version is represented by a "CloneFile"
> object (in the example 2 versions), each chunk is represented by a
> "CloneChunk" object (5 chunks in the example). JPA is used for
> database access. The class DatabaseHelper implements functions to help
> database access.
>
>
> 2. REPOSITORY STRUCTURE
>
> The following files can be found in each repository (= any remote data storage):
> - chunk-<id>: encrypted chunk files; <id> is the checksum of the
> unencrypted chunk
> - repository-<time>: contains metadata of the repository; in
> particular, which remote folders are handled by this repo (= needed
> for wizard)
> - profile-<machinename>-<time>: encrypted properties file with
> properties for a specific client (currently only the username !=
> machinename)
> - image-<machinename>-<time>: encrypted image file for a specific client
> - update-<machinename>-<time>: contains the encrypted version history
> of client <machinename> for all of its files. This is (currently!) just
> a CSV file with a dump of the "CloneFile" table. Right now, it
> actually contains the whole history which can be pretty long. In a
> stable version, this must be just the updates (instead of the whole
> history). I was considering something like a "base database" file for
> each client where it stores the version history up to a specific
> point; so that only the new updates reside in the "update" files.
>
> The CSV columns are identical to the DB columns (= CloneFile properties).
>
> Since we have very different storage types, the repository structure
> is flat. There are no folders. We assume we need nothing but the
> filename, i.e. no file metadata.
>
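Since the repository is flat, everything a client knows about a remote file comes from its name. The following sketch parses the per-client names listed above. It assumes that <time> is a plain numeric timestamp and that machine names may contain dashes; neither is stated in the mail, and all class and field names are hypothetical.

```java
import java.util.Optional;

class RepoFileNameSketch {
    /** Parsed form of "<type>-<machinename>-<time>"; field names are assumptions. */
    static final class ClientFile {
        final String type;
        final String machineName;
        final long time;

        ClientFile(String type, String machineName, long time) {
            this.type = type;
            this.machineName = machineName;
            this.time = time;
        }
    }

    /** Returns the parsed name, or empty if it is not a profile/image/update file. */
    static Optional<ClientFile> parse(String name) {
        for (String type : new String[] {"profile", "image", "update"}) {
            String prefix = type + "-";
            int lastDash = name.lastIndexOf('-');
            if (!name.startsWith(prefix) || lastDash <= prefix.length()) {
                continue; // wrong type, or no room for a machine name
            }
            try {
                // Assumption: <time> is numeric, e.g. seconds since the epoch.
                long time = Long.parseLong(name.substring(lastDash + 1));
                String machine = name.substring(prefix.length(), lastDash);
                return Optional.of(new ClientFile(type, machine, time));
            } catch (NumberFormatException e) {
                return Optional.empty();
            }
        }
        return Optional.empty();
    }
}
```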
>
> 3. CONNECTION PLUGINS
>
> Each storage type has its own "plugin". Each plugin must
> extend/implement the following classes/interfaces:
> - PluginInfo: describes the details of the plugin (name, version, ...)
> and can create a new Connection object for this plugin
> - Connection: represents the settings for a particular protocol, e.g.
> host, username, password, etc. The connection object also creates
> TransferManagers and the ConfigPanel based on the connection settings.
> - TransferManager: communicates with the storage using a minimal set
> of operations (connect, disconnect, upload, download, list, delete).
> All operations are synchronous.
> - ConfigPanel: graphical panel to change the connection settings of
> the Connection object. Used in the SettingsDialog and the Wizard.
>
> Example usage:
> FtpConnection c = (FtpConnection) Plugins.get("ftp").createConnection();
> c.setHost(...);
> c.setPort(...);
> ...
> TransferManager tm = c.createTransferManager();
>
> tm.connect();
> tm.upload(new File("/some/file"), new RemoteFile("chunk-123"));
> ...
>
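To illustrate how small a plugin can be, here is a sketch of a hypothetical in-memory plugin. Only the operation names (connect, disconnect, upload, download, list, delete) come from the text; the signatures are assumptions, and byte arrays and plain names stand in for File and RemoteFile to keep the example self-contained. PluginInfo and ConfigPanel are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

class PluginSketch {
    /** The minimal, synchronous operation set named in the text. */
    interface TransferManager {
        void connect();
        void disconnect();
        void upload(byte[] data, String remoteName);
        byte[] download(String remoteName);
        List<String> list();
        void delete(String remoteName);
    }

    /** Holds the settings and creates transfer managers for them. */
    interface Connection {
        TransferManager createTransferManager();
    }

    /** Hypothetical plugin: the "remote" storage is a map shared by its transfer managers. */
    static final class InMemoryConnection implements Connection {
        private final Map<String, byte[]> storage = new TreeMap<>();

        @Override
        public TransferManager createTransferManager() {
            return new InMemoryTransferManager(storage);
        }
    }

    static final class InMemoryTransferManager implements TransferManager {
        private final Map<String, byte[]> storage;
        private boolean connected;

        InMemoryTransferManager(Map<String, byte[]> storage) {
            this.storage = storage;
        }

        @Override public void connect() { connected = true; }

        @Override public void disconnect() { connected = false; }

        @Override public void upload(byte[] data, String remoteName) {
            requireConnected();
            storage.put(remoteName, data.clone());
        }

        @Override public byte[] download(String remoteName) {
            requireConnected();
            byte[] data = storage.get(remoteName);
            if (data == null) {
                throw new NoSuchElementException(remoteName);
            }
            return data.clone();
        }

        @Override public List<String> list() {
            requireConnected();
            return new ArrayList<>(storage.keySet()); // flat namespace, sorted by name
        }

        @Override public void delete(String remoteName) {
            requireConnected();
            storage.remove(remoteName);
        }

        private void requireConnected() {
            if (!connected) {
                throw new IllegalStateException("not connected");
            }
        }
    }
}
```

A real plugin would presumably report I/O failures through a checked exception; that is omitted here.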
>
> 4. ARCHITECTURE
>
> The following list shows the different components of Syncany, which
> are roughly also reflected in the packages.
>
> 4.1. Files get changed locally and have to be uploaded
>
> 4.1.1. Watcher
> The Watcher watches specified folders of the local file system and
> fires events if something changed, e.g. a file is created or renamed.
> All it does is to hand over the event to the Indexer. The current
> implementation is based on jpathwatch, an open source project that
> replicates and extends the Java7 WatchService API. The Linux version
> of it is based on inotify, a part of the Linux kernel. Since
> jpathwatch (and all the other libraries) only support watching a
> single directory (and not the subdirectories), I implemented the
> "BufferedWatcher", which can watch folders recursively and maps
> RENAME_TO and RENAME_FROM events together (at least on Linux). The
> Windows version lacks this feature at the moment. Read [1] for
> details. @Stefan Mai: This might be important for you!
>
> There is one watcher for the whole application. Singleton.
>
> 4.1.2. Indexer
> The indexer is responsible for breaking the local files into chunks
> and adding these files to the database. The indexer's methods are
> mainly called by the watcher. Watch events are queued in the indexer's
> queue and processed sequentially in a separate indexer thread. There
> are a couple of index requests. Most of them map to watch events:
> -- CheckIndexRequest: checks whether a file already exists in the
> database or is new and must be added. This request also tries to map
> files via checksum and filename similarity to find files that have
> been updated while Syncany wasn't running.
> -- NewIndexRequest: chunks a new file and adds it to the DB
> -- etc.
>
> There is one indexer for the whole application. Singleton.
>
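The "queue plus one worker thread" pattern used by the indexer (and the uploader) can be sketched as follows. This is not the actual Indexer class: the IndexRequest interface, the stop marker and all method names are assumptions made for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class IndexerSketch {
    /** One unit of work, e.g. a CheckIndexRequest or NewIndexRequest. */
    interface IndexRequest {
        void process();
    }

    // Marker request that tells the worker to stop (hypothetical).
    private static final IndexRequest STOP = () -> { };

    private final BlockingQueue<IndexRequest> queue = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this::drain, "indexer");

    void start() {
        worker.setDaemon(true);
        worker.start();
    }

    /** Called by the watcher thread; returns immediately. */
    void enqueue(IndexRequest request) {
        queue.add(request);
    }

    /** Processes everything queued so far, then stops the worker. */
    void shutdownAndWait() {
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs in the indexer thread: requests are handled strictly one after another.
    private void drain() {
        try {
            while (true) {
                IndexRequest request = queue.take();
                if (request == STOP) {
                    return;
                }
                request.process();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because a single thread takes requests from a FIFO queue, two events for the same file are never indexed concurrently.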
> 4.1.3. Uploader
> The uploader simply uploads given CloneFile instances to the remote
> storage. It has a queue for UploadRequests and processes them in a
> single thread. It only uploads the chunks of a file, not its metadata.
> Metadata (= update-files) are uploaded in the RemoteWatcher (see
> below).
>
> There is one uploader per profile, i.e. one per repository.
>
>
> 4.2. Files get changed by another user/computer. Changes have to be
> downloaded and applied locally.
>
> 4.2.1. RemoteWatcher
> Since the remote storage is dumb, it cannot notify our application
> when changes occur. Hence the remote watcher must poll the repository
> for changes. This thread does the following things in a defined
> interval:
> - connect to the remote storage
> - get a list of existing files
> - if changed, retrieve the repository-* file
> - if changed locally, upload a new repository-* file
> - download all unseen update-* files
> - parse updates and add to DependencyQueue (see below)
> - hand over update list to ChangeManager (see below)
> - if changed locally, upload local profile (currently only the username)
> - if changed, retrieve other users' profile files
> - (same for images)
> - delete old files from remote storage
>
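The "download all unseen update-* files" step boils down to remembering which names have already been handled. A minimal sketch, with hypothetical names and an in-memory "seen" set (a real client would have to persist it):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RemoteWatcherSketch {
    // Names of update files that were already returned by an earlier poll.
    private final Set<String> seen = new HashSet<>();

    /** Returns the update-* files in the listing that no earlier call has returned. */
    List<String> unseenUpdates(List<String> remoteListing) {
        List<String> result = new ArrayList<>();
        for (String name : remoteListing) {
            // Set.add returns false if the name was already present
            if (name.startsWith("update-") && seen.add(name)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

One way to run such a poll in a defined interval is ScheduledExecutorService.scheduleWithFixedDelay, which waits for one run to finish before scheduling the next.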
> 4.2.2. DependencyQueue
> The DependencyQueue class is a helper class that analyzes all the
> given updates (~ file version) from the different clients and creates
> one single update file out of it.
> Example: multiple clients A, B, C have changed a file X. The
> dependency queue compares the file histories H(A,X), H(B,X) and H(C,X)
> with each other and determines which one of them "wins", so that in
> the end, only one history for file X survives.
>
> The class adds the updates to a single queue and makes sure that if an
> update is taken from the queue, all the dependencies have left the
> queue before.
> Example: the folder "pictures" must be created BEFORE the file
> "pictures/one.jpg" is created.
>
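For the folder-before-file dependency in the last example, ordering updates by path depth is already sufficient. The sketch below shows only that part; it is not the DependencyQueue itself, does not cover the history comparison between clients, and assumes "/" as the path separator.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class DependencyOrderSketch {
    /** Orders paths so that every folder comes before anything inside it. */
    static List<String> order(Collection<String> paths) {
        List<String> sorted = new ArrayList<>(paths);
        sorted.sort((a, b) -> {
            int byDepth = Integer.compare(depth(a), depth(b));
            return byDepth != 0 ? byDepth : a.compareTo(b); // stable, readable tie-break
        });
        return sorted;
    }

    /** Number of parent folders, e.g. 0 for "pictures", 1 for "pictures/one.jpg". */
    static int depth(String path) {
        int count = 0;
        for (int i = 0; i < path.length(); i++) {
            if (path.charAt(i) == '/') {
                count++;
            }
        }
        return count;
    }
}
```

A parent always has a smaller depth than its children, so it always leaves the queue first.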
> 4.2.3. ChangeManager
> The change manager processes all the updates from the dependency queue
> sequentially and checks if there are any local conflicts. In a
> nutshell, each update is checked according to the following cases
> (pseudo code):
>
> if (file id known in database):
>   if (is conflict && my responsibility to solve the problem):
>     // = rename my version to "conflicting", download winning version
>     resolveConflict();
>   else if (no conflict or not my problem):
>     applyUpdate(); // can be "rename", "delete", "new", "changed"
>
> if (file id NOT known in database):
>   if (NO file with the same filename+path exists in DB):
>     applyUpdate(); // = must be a new file
>   else if (file exists in DB with different file ID):
>     // If the checksum doesn't match, this is a conflict
>     // If the checksum matches, the histories must be merged
>
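The decision tree above can be written as a pure function, which makes it easy to test every branch in isolation. This is a sketch only: the enum, the boolean parameters and the method name are hypothetical, and how each flag is derived from the database is left open.

```java
class ChangeDecisionSketch {
    enum Action { RESOLVE_CONFLICT, APPLY_UPDATE, CONFLICT, MERGE_HISTORIES }

    /**
     * Maps the conditions of the pseudo code to the action to take.
     * pathExistsInDb and checksumMatches only matter if the file ID is unknown.
     */
    static Action decide(boolean fileIdKnown, boolean isConflict, boolean myResponsibility,
                         boolean pathExistsInDb, boolean checksumMatches) {
        if (fileIdKnown) {
            // Conflict that I have to solve: rename my version, download the winner.
            // Otherwise (no conflict, or not my problem): just apply the update.
            return (isConflict && myResponsibility) ? Action.RESOLVE_CONFLICT : Action.APPLY_UPDATE;
        }
        if (!pathExistsInDb) {
            return Action.APPLY_UPDATE; // must be a new file
        }
        // Same path, different file ID: compare the contents.
        return checksumMatches ? Action.MERGE_HISTORIES : Action.CONFLICT;
    }
}
```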
> In my opinion, this is BY FAR the MOST COMPLEX part of Syncany. In
> fact, this class is ugly and doesn't work as expected. The following
> things make it so difficult:
> - Applying updates cannot always be done without looking in the
> future. Example: if I change a file 4 times, Syncany shouldn't
> download the chunks of all the 4 file versions, but only the last one.
> It should hence skip the other updates (= add to DB, but not apply
> local file changes), and only perform the last one.
> - I'm not sure if the decision tree I built covers all possible cases.
> - Whenever any of the steps above fails, this must be handled by the
> program somehow. It's difficult to find the correct solution for every
> situation. Examples of things that can go wrong: a file exists at a
> given path even though it shouldn't be there, a rename fails, or the
> local cache is full.
>
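The "looking into the future" problem from the first point can be handled by scanning the pending updates once before applying any of them. A sketch, assuming each update carries a file ID and an increasing version number (both names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UpdateCollapseSketch {
    /** One pending update; the fields are assumptions, not the CloneFile columns. */
    static final class Update {
        final String fileId;
        final int version;

        Update(String fileId, int version) {
            this.fileId = fileId;
            this.version = version;
        }
    }

    /** First pass: the highest pending version per file ID. */
    static Map<String, Integer> latestVersions(List<Update> updates) {
        Map<String, Integer> latest = new LinkedHashMap<>();
        for (Update update : updates) {
            latest.merge(update.fileId, update.version, Integer::max);
        }
        return latest;
    }

    /**
     * Second pass: every update is added to the DB, but only the last version
     * of each file needs its chunks downloaded and applied locally.
     */
    static boolean mustDownload(Update update, Map<String, Integer> latest) {
        return Integer.valueOf(update.version).equals(latest.get(update.fileId));
    }
}
```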
> I would love it if some of you guys could take a look at this, check my
> logic, and bring it into some nicer form. This is the most important
> part of Syncany. It MUST work perfectly in any situation.
>
> 5. GUI
>
> The GUI is entirely based on Swing (and AWT) components. The only
> exception is the Linux Tray, which uses the java-gnome library (see
> below).
>
> 5.1. File Manager Integration (org.syncany.gui.desktop)
> This architecture is based on the one from the original Dropbox
> plugin. I would have done it a little differently, but it's clever, I
> must admit :-)
>
> Syncany starts two servers to which the file manager plugins can
> connect. Both client/server protocols are based on a simple
> text-based data exchange.
> Note: "Client" refers to the file manager plugin. "Server" to Syncany:
>
> - CommandServer (client --[req]--> server --[resp]--> client): allows
> the file manager plugin to retrieve information such as the emblem for
> a certain file (= green/blue sync icon), context menu, etc.
> - TouchServer (server --[req]--> client): Allows Syncany to send
> "invalidate file" requests to the file manager. This causes the file
> manager plugin to re-request all the information for a file.
>
> 5.2. Tray
> In theory, the tray icon could be implemented identically for all
> platforms since AWT provides a TrayIcon class. However, if you have
> used this class on Linux, you can see how ugly the icon and its popup
> menu look. Hence, the Linux tray is implemented differently.
>
> All tray classes support the methods "updateUI" (= updates the menu),
> setStatus(UPTODATE/SYNCING), notify(message).
>
> 5.2.1. Windows/Mac Tray
> I have read (not tried) that Windows and Mac can use the AWT TrayIcon
> class without any gosh-how-ugly-is-this problems :-) So there is a
> common tray for both of them. The only method missing for Windows at
> the moment is notify(). For Mac I don't know.
>
> 5.2.2. Linux Tray (org.syncany.gui.tray and org.syncany.gui.linux)
> The Linux tray uses the java-gnome library to implement the nice
> looking tray menu (and some other things). The problem is that
> java-gnome and Swing _cannot be used together_, if the Gtk look&feel
> is used (took me days to figure that out!). That means that in the
> same Java application, they cannot coexist without crashing in random
> moments. Don't try it....
>
> So the "natural" way to get the Gtk based tray and menu anyway was to
> create a separate program for it and let the original Syncany
> application communicate with that program. This external program
> delivers all functions that use the java-gnome library. This
> particularly includes:
>
> - The tray icon & menu
> - The notification (notify-osd)
> - The Linux native file open dialogs
> - (could be more)
>
> All these functions are grouped in a class called "LinuxNativeService"
> in the org.syncany.gui.linux package. This class has its own main
> method and is started by Syncany itself.
> When the LinuxTray class is initialized, it creates a
> "LinuxNativeClient", which runs a new child process for the
> "LinuxNativeService".
>
> Once started, the LinuxNativeService starts a server socket on a
> random port and tells the client which port it is. From then on, they
> can communicate via the send() method of the client.
>
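The random-port handshake can be sketched with plain sockets: binding a ServerSocket to port 0 lets the operating system pick a free port, which getLocalPort() then reports. In this sketch the service runs in a thread instead of a child process so that the example stays self-contained, and the one-line "ACK" reply is an invented stand-in for the real protocol.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class NativeServiceSketch {
    /** Service side: binds to a free port, answers one request, returns the port. */
    static int startService() {
        try {
            // Port 0 = let the OS choose a free port
            ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
            Thread thread = new Thread(() -> {
                try (ServerSocket s = server;
                     Socket client = s.accept();
                     BufferedReader in = reader(client);
                     PrintWriter out = writer(client)) {
                    out.println("ACK " + in.readLine()); // hypothetical reply format
                } catch (IOException e) {
                    // client vanished; nothing left to answer
                }
            }, "native-service");
            thread.setDaemon(true);
            thread.start();
            return server.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Client side: sends one line to the service and returns its one-line reply. */
    static String send(int port, String message) {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = writer(socket);
             BufferedReader in = reader(socket)) {
            out.println(message);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static BufferedReader reader(Socket socket) throws IOException {
        return new BufferedReader(new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket socket) throws IOException {
        // autoflush = true, so println() sends the line immediately
        return new PrintWriter(new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
    }
}
```

With a real child process, the port would have to be handed to the parent some other way, for example by printing it to the child's standard output.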
> I must admit, the Linux tray adds a lot of baggage to the application,
> even though the functionality would be delivered by the AWT tray....
> But I believe it's worth it.
>
>
> Okaaayy, I think that's it for now.
> Hope this helps you guys :-)
>
> Cheers,
> Philipp
>
>
> [1] https://sourceforge.net/projects/jpathwatch/forums/forum/888207/topic/4538927
>

Attachment: architecture.pdf
Description: Adobe PDF document

