syncany-team team mailing list archive

Bigger database issues

To: syncany-team@xxxxxxxxxxxxxxxxxxx
From: Philipp Heckel <philipp.heckel@xxxxxxxxx>
Date: Sun, 08 Dec 2013 00:17:18 +0100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130329 Thunderbird/17.0.5
Hello everyone,

I apologize for my absence from the ID discussion, but I haven't had much
time lately. That's also why I haven't committed anything in the last two
days, and why a lot of the previous commits were just JavaDoc *feeling bad*
:-) Steffen is on vacation, and it's hard to solve bigger problems
without someone to talk to about them ...

Now to the topic: While I am really, really happy that you guys are
discussing so enthusiastically, I think we're drifting a bit into
philosophical and academic discussions. Please do not get this the wrong
way, I think discussion is important, but sometimes code is easier to
understand, especially when it's a relatively small change in code (like
with the IDs). That's why I suggest simply playing around in code and
showing us what you mean.

Also -- and again: do not take this the wrong way! -- there are many
important things to do to get a working piece of software, and I feel
that the ID question is more of an optimization. Now I know that Fabrice
would like to get to 1 million files (and believe me, we'll get there!), but we
first need to be able to perform a cleanup of files and file versions,
and represent the local database in general in a more efficient way. So
if you will: there are bigger issues to consider when drafting an ID
solution, and bigger issues to solve in general :-)

That includes:
1) The entire local database is loaded into memory on start. This
obviously includes the IDs, but it more importantly includes all
FileVersions, PartialFileHistories, ChunkEntrys, etc.
2) It gets worse: Even deleted FileVersions and PartialFileHistories are
still loaded; nothing is ever discarded (no cleanup!)
3) To efficiently "query" the Database, we're keeping several "caches"
in the Database class. Look at the class and how the caches are loaded.
This is truly horrible code!
 - a) The nicer caches are "just" Maps à la checksum->filehistory,
filename->filehistory
 - b) The not so nice cache is the fullDatabaseVersionCache, a
duplicated version of the database in RAM (just pointers to the same
objects, but still, lots of pointers!)
4) Then there is the compatibility of the proposed ID solutions with a
potential JPA integration (Steffen and I already experimented in the
database/databaseexperiments branch). Have you guys thought about how
that could work?
5) ... and as Fabrice already noticed, the relationship between the
org.syncany.database package and the org.syncany.chunk package also
carries a few ID questions ...
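
To make point 3 a bit more concrete, here is a minimal sketch of what the
"nicer" Map caches look like in principle. All class, field and method names
below (SketchFileHistory, SketchDatabaseCaches, findByPath, ...) are
hypothetical and heavily simplified; this is not the actual code of the
Database class. It only illustrates that every cache is another set of
references to the same objects, so each new lookup path costs extra memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a file history entry.
// The real PartialFileHistory in org.syncany.database looks different.
class SketchFileHistory {
    final long id;
    final String path;
    final String checksum;

    SketchFileHistory(long id, String path, String checksum) {
        this.id = id;
        this.path = path;
        this.checksum = checksum;
    }
}

// Hypothetical in-memory database with two lookup caches.
class SketchDatabaseCaches {
    // The "real" data: every history ever added stays in memory.
    private final List<SketchFileHistory> histories = new ArrayList<>();

    // Cache a la filename->filehistory (assumes one history per path).
    private final Map<String, SketchFileHistory> byPath = new HashMap<>();

    // Cache a la checksum->filehistory (several files may share content).
    private final Map<String, List<SketchFileHistory>> byChecksum = new HashMap<>();

    void add(SketchFileHistory history) {
        histories.add(history);
        // Each cache stores another reference to the same object.
        byPath.put(history.path, history);
        byChecksum.computeIfAbsent(history.checksum, k -> new ArrayList<>()).add(history);
    }

    SketchFileHistory findByPath(String path) {
        return byPath.get(path);
    }

    List<SketchFileHistory> findByChecksum(String checksum) {
        return byChecksum.getOrDefault(checksum, Collections.<SketchFileHistory>emptyList());
    }

    int size() {
        return histories.size();
    }
}
```

Note that nothing is ever removed from either the list or the maps in this
sketch, which is exactly the "no cleanup" problem from point 2.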

So I guess what I'm saying is:
- Keep thinking about these issues, but be a little more pragmatic
- If you think it might be easier to understand in code, write code
first, then explain :-)
- Keep the bigger picture in mind (see above!) -- if it's not clear, ask!

Next steps:
- I'm meeting with Gregor tomorrow: My original goal was to talk about
the database stuff in general, but I guess we'll also talk over the ID
stuff. Maybe we'll be enlightened then. We'll review all the code and
suggestions and hopefully implement something. (By the way, I liked the
ShortId<T> & ArrayId<T> idea.)
- It would be very valuable to me if you could review the general
Database in-memory representation. My solution to the ever-growing RAM
usage was to simply put everything in a local SQL database and load it on
demand, but the JPA stuff is complex and maybe it can be done more
easily ... Ideas?
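
As a starting point for that discussion, here is a rough sketch of the
"load on demand" idea without any JPA: a bounded cache that only keeps the
most recently used entries in RAM and asks a loader for everything else.
The OnDemandCache class and its loader function are assumptions made up for
this example; in a real version the loader would presumably run a query
against the local SQL database, which is left out here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical bounded cache: keeps at most maxEntries values in memory
// and evicts the least recently used entry when the limit is exceeded.
class OnDemandCache<K, V> {
    private final int maxEntries;
    private final Function<K, V> loader; // stand-in for a database lookup
    private final LinkedHashMap<K, V> entries;
    private int loadCount = 0;

    OnDemandCache(int maxEntries, Function<K, V> loader) {
        this.maxEntries = maxEntries;
        this.loader = loader;
        // accessOrder=true makes the map iterate from least to most recently used.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > OnDemandCache.this.maxEntries;
            }
        };
    }

    V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            // Not in memory: load it on demand and remember it.
            value = loader.apply(key);
            loadCount++;
            if (value != null) {
                entries.put(key, value);
            }
        }
        return value;
    }

    int cachedCount() {
        return entries.size();
    }

    int loadCount() {
        return loadCount;
    }
}
```

Usage would look something like
new OnDemandCache<Long, String>(1000, id -> loadFromDatabase(id)), where
loadFromDatabase is again a placeholder. The point of the design is that
memory use is capped by maxEntries instead of growing with the database.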

Thank you guys! Keep up the good work!!

Best,
Philipp
Follow ups

Re: Bigger database issues
From: Fabrice Rossi, 2013-12-08