syncany-team team mailing list archive

Re: moving from JPA to ?

To: Philipp Heckel <philipp.heckel@xxxxxxxxx>, syncany-team@xxxxxxxxxxxxxxxxxxx
From: Jeroen Verheye <jeroen.verheye@xxxxxxxxx>
Date: Tue, 3 Jan 2012 22:46:39 +0100
In-reply-to: <CAAvm79Zm5Z0Nfh1u90ODisokxHHpevu1=Z06rc7scZY-7m45nA@mail.gmail.com>

Philipp,

Yes, I understand the problem. A memory map seems like a good idea, but
memory usage should still be minimized. I actually switched from Firefox to
Chrome for this reason.

Maybe it would be a good idea to keep *only part of the chunk index* in a
memory map with a maximum number of entries (a kind of cache) and persist
everything in a custom file format. The cache could be implemented as an
LRU (Least Recently Used) list: when the cache is full, the entries at the
end of the list (the least recently used chunks) are removed:
http://en.wikipedia.org/wiki/Cache_algorithms#Least-Frequently_Used
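A minimal sketch of such a bounded LRU cache in Java, assuming chunk checksums as keys. `ChunkIndexCache` is a hypothetical name, and loading a missed entry back from the index file is not shown. It relies on `java.util.LinkedHashMap` in access order, whose `removeEldestEntry` hook lets `put` drop the least recently used entry once the limit is exceeded. It is not thread-safe on its own; shared use would need something like `Collections.synchronizedMap`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache for chunk index entries (checksum -> metadata).
// Entries evicted here are assumed to already be persisted in the index file.
class ChunkIndexCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    ChunkIndexCache(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently
        // accessed entry, which is the order LRU eviction needs.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put(); returning true evicts the LRU entry.
        return size() > maxEntries;
    }
}
```

For example, with a limit of 2: after `put("a")`, `put("b")`, `get("a")` and `put("c")`, the cache holds `a` and `c`, because reading `a` made `b` the least recently used entry.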

You could introduce a worker thread that periodically collects the changed
or added index entries in the cache and persists them, so that writes stay
fast (effectively a write-back cache: your other threads won't have to wait
for the index to be persisted, since they only write to the memory map).
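A sketch of that worker thread, assuming a hypothetical `persister` callback that writes a batch of changed entries to the index file; `WriteBehindIndex` is likewise a hypothetical name. In caching terms this is write-back (write-behind): the chunking threads only touch memory, and a single scheduled thread (`ScheduledExecutorService.scheduleWithFixedDelay`) catches the disk up later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical write-behind index: puts only update memory; a background
// thread periodically hands the entries changed since its last run to
// the persister.
class WriteBehindIndex<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Map<K, V> dirty = new ConcurrentHashMap<>(); // changed, not yet persisted
    private final Consumer<Map<K, V>> persister;               // e.g. appends to the index file
    private final ScheduledExecutorService worker =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "index-flusher");
                t.setDaemon(true); // don't keep the JVM alive just for flushing
                return t;
            });

    WriteBehindIndex(Consumer<Map<K, V>> persister, long periodMillis) {
        this.persister = persister;
        worker.scheduleWithFixedDelay(this::flush, periodMillis, periodMillis,
                TimeUnit.MILLISECONDS);
    }

    // Fast path for the chunking threads: no disk I/O here.
    void put(K key, V value) {
        entries.put(key, value);
        dirty.put(key, value);
    }

    V get(K key) {
        return entries.get(key);
    }

    // Drains the dirty entries into one batch and persists it.
    synchronized void flush() {
        Map<K, V> batch = new HashMap<>();
        for (K key : dirty.keySet()) {
            V value = dirty.remove(key); // a concurrent put() simply re-marks the key
            if (value != null) {
                batch.put(key, value);
            }
        }
        if (!batch.isEmpty()) {
            persister.accept(batch);
        }
    }

    void close() throws InterruptedException {
        worker.shutdown();                            // no further periodic runs
        worker.awaitTermination(5, TimeUnit.SECONDS); // let a running flush finish
        flush();                                      // persist the remaining changes
    }
}
```

The trade-off is durability: anything changed since the last flush is lost if the process dies, so the flush period bounds how much indexing work a crash can cost.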

I don't know how far you have gotten in researching this, but I thought I
could share my ideas about it.

Sincerely

Jeroen Verheye

2012/1/3 Philipp Heckel <philipp.heckel@xxxxxxxxx>

> Hi Jeroen,
>
>
>> I read that one of the goals was removing JPA from the project, because
>> it's too slow.
>> So I guess that either
>>
>>    1. you want to write queries yourself without JPA, *OR*
>>    2. you also want to get rid of the database, which I believe is
>>    Apache Derby?
>>
>> What's the alternative? Saving the indexes to plain text files?
>>
>
> The problem is not _saving_ the index, it's the sheer number of queries
> Syncany has to make during the chunking process: for each chunk, the
> chunking algorithm has to ask the index "Is this chunk there yet
> (SELECT)? If not, add it (INSERT)." Assuming a 400 MB file and 4 KB
> chunks, that would be 100,000 SELECT queries (plus 100,000 INSERTs if the
> file is new). With an SQL-based index, this is slow. In fact, I assume
> that any SQL-based DB is slower than a simple in-memory hash table. And
> as long as the chunk index doesn't grow too big, that approach works fine.
>
> Take this example: you store 10 GB of data in the Syncany folder. With a
> chunk size of 4 KB, that's 10 x 1000 x 1000 KB / 4 KB = 2.5 million
> chunks. Assuming a memory usage of 50 bytes per chunk (checksum of the
> chunk + references), that's 2.5 million x 50 bytes = 125 MB, which is
> not low, but acceptable compared to some other applications (*cough*
> Firefox *cough*).
>
> Since we need to exchange the index with the other clients anyway, we do
> need to store it in a custom file format.
>
> Does that answer your question? :-)
>
> Cheers,
> Philipp
>
>>
>> Sincerely,
>>
>> Jeroen Verheye
>>
>
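For comparison with the per-chunk SELECT/INSERT round trip Philipp describes, a minimal Java sketch of the in-memory hash table approach. Assumptions not taken from the mail: fixed-size chunking (Syncany's actual chunker may differ), SHA-1 hex checksums as keys, and a reference count as the stored value; `InMemoryChunkIndex` and its methods are hypothetical names. With 4 KB chunks, a 400 MB file leads to 100,000 calls to `addChunk`, each an average-case O(1) hash lookup instead of one or two SQL statements.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory chunk index: checksum -> number of references.
class InMemoryChunkIndex {
    private final Map<String, Integer> refCounts = new HashMap<>();

    // "Is this chunk there yet? If not, add it." as one hash operation.
    // Returns true if the chunk was new (and would have to be stored).
    boolean addChunk(String checksum) {
        return refCounts.merge(checksum, 1, Integer::sum) == 1;
    }

    int size() {
        return refCounts.size();
    }

    // Splits data into fixed-size chunks, indexes each one by its SHA-1
    // checksum and returns how many chunks were new.
    int indexData(byte[] data, int chunkSize) throws NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        int newChunks = 0;
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sha1.update(data, off, len);
            byte[] digest = sha1.digest(); // digest() also resets sha1 for the next chunk
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            if (addChunk(hex.toString())) {
                newChunks++;
            }
        }
        return newChunks;
    }
}
```

Indexing the same data a second time reports zero new chunks, which is exactly the deduplication check that otherwise costs a SELECT per chunk.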
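Since the index has to be exchanged with other clients in a custom file format, here is a minimal round-trip sketch using `java.io.DataOutputStream` and `DataInputStream`. The layout (an entry count, then per chunk a 20-byte SHA-1 checksum and a 4-byte reference count) is purely an assumption for illustration, not Syncany's actual format, and `ChunkIndexFile` is a hypothetical name.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical binary layout:
//   int  entryCount
//   entryCount times: byte[20] SHA-1 checksum, int reference count
class ChunkIndexFile {
    static final int CHECKSUM_LENGTH = 20;

    static void write(Map<String, Integer> index, OutputStream out) throws IOException {
        DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(out));
        dos.writeInt(index.size());
        for (Map.Entry<String, Integer> e : index.entrySet()) {
            byte[] checksum = hexToBytes(e.getKey());
            if (checksum.length != CHECKSUM_LENGTH) {
                throw new IllegalArgumentException("not a SHA-1 checksum: " + e.getKey());
            }
            dos.write(checksum);
            dos.writeInt(e.getValue());
        }
        dos.flush(); // push buffered bytes to 'out' without closing it
    }

    static Map<String, Integer> read(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(new BufferedInputStream(in));
        int count = dis.readInt();
        Map<String, Integer> index = new LinkedHashMap<>();
        byte[] checksum = new byte[CHECKSUM_LENGTH];
        for (int i = 0; i < count; i++) {
            dis.readFully(checksum);
            index.put(bytesToHex(checksum), dis.readInt());
        }
        return index;
    }

    private static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    private static String bytesToHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

At 24 bytes per entry, the 2.5 million chunks from the 10 GB example would take about 60 MB on disk before any compression.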

References

moving from JPA to ?
From: Jeroen Verheye, 2012-01-03
Re: moving from JPA to ?
From: Philipp Heckel, 2012-01-03