syncany-team team mailing list archive

Re: Long ids

To: Fabrice Rossi <Fabrice.Rossi@xxxxxxxxxxx>
From: Philipp Heckel <philipp.heckel@xxxxxxxxx>
Date: Thu, 5 Dec 2013 00:05:30 +0100
Cc: Syncany Mailing List <syncany-team@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <529E162C.40600@apiacoa.org>

Hello Fabrice,

I finally got around to looking at your code; sorry for the delay.

> So I implemented a not so nice FileId (see my branch
> https://github.com/fabrice-rossi/syncany/tree/longer-file-id) to have a
> 16-byte id. Philipp was not happy for two reasons: the abstraction was
> leaking (you needed two longs to initialize a FileId) and there was no
> way we could customize the number of bytes used to identify a file.
>
> (..)
>
> Philipp, does that suit you better? If this is the case, I will add
> proper documentation and tests, and unify this with other ids.
>

Go for it. Looks perfectly fine to me. I know that you are not happy with
the memory consumption, but it'll have to do for now. It would be great to
have FileContentId, MultiChunkId, etc. all based on ObjectId. I'll merge
the current state into master, and you can build on that.
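
A minimal sketch of what "all based on ObjectId" could look like, assuming a plain byte[] representation (the memory-hungry variant discussed below). ObjectId, FileContentId and MultiChunkId are the names used in this thread; the constructors, getBytes() and the class-aware equals() are assumptions for illustration, not Syncany's actual code.

```java
import java.util.Arrays;

// Hypothetical sketch: a generic id backed by a byte[]. Class names come from
// the thread; fields and methods are assumptions for illustration only.
abstract class ObjectId {
    protected final byte[] bytes;

    protected ObjectId(byte[] bytes) {
        if (bytes == null || bytes.length == 0) {
            throw new IllegalArgumentException("id must not be empty");
        }
        this.bytes = bytes.clone(); // defensive copy keeps the id immutable
    }

    public byte[] getBytes() {
        return bytes.clone();
    }

    @Override
    public boolean equals(Object o) {
        // Ids of different concrete types are never equal, even with the same bytes
        if (this == o) return true;
        if (o == null || o.getClass() != getClass()) return false;
        return Arrays.equals(bytes, ((ObjectId) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    @Override
    public String toString() {
        // Lowercase hex representation of the id bytes
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}

// Typed subclasses: a method taking a FileContentId will not accept a MultiChunkId
final class FileContentId extends ObjectId {
    FileContentId(byte[] bytes) { super(bytes); }
}

final class MultiChunkId extends ObjectId {
    MultiChunkId(byte[] bytes) { super(bytes); }
}
```

Because equals() also compares the concrete class, a FileContentId and a MultiChunkId built from the same bytes are not equal; the cost is one extra byte[] object per id.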

Be sure to remove ByteArray afterwards.

> To all, do you know a simple way to have something which does not waste
> memory, is type safe (reasonably) and still configurable at runtime?
> What I would like to have is a hierarchy of ObjectId, in order to have
> type safety (then you cannot mistakenly search for a file using a chunk
> id, for instance). That's easy. But I would also like to have small
> memory footprint ids. For instance, if I use 16 bytes, then I can pack
> them in two longs rather than putting them in a byte[] (long story
> short, this moves the memory occupation from 48 bytes per id to 32 bytes
> on HotSpot running in 64-bit mode with compressed pointers). This is
> important if we want to aim at very large repositories. Again, this is
> easy: have ObjectId be an abstract class and then implement different
> size-constrained subclasses (such as TwoLongObjectId) and again specific
> subclasses, like FileId, which will derive from TwoLongObjectId. But
> this prevents configuring the actual memory size at runtime, at least in
> a non super annoying way. I mean that I can of course design a factory
> and have multiple classes implementing a FileId interface (as a type
> marker) and inheriting from the different size-constrained subclasses,
> but this feels heavy. Any other solution?
>
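
The fixed-size idea from the quoted mail can be sketched as follows, assuming 16-byte ids stored big-endian in two primitive longs. TwoLongObjectId and FileId are the names from the mail; the constructor, getBytes() and SIZE are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a "size-constrained subclass": 16 id bytes stored
// as two primitive longs instead of a separate byte[] object.
abstract class TwoLongObjectId {
    static final int SIZE = 16; // bytes, fixed at compile time

    private final long high; // bytes 0..7 (big-endian)
    private final long low;  // bytes 8..15

    protected TwoLongObjectId(byte[] bytes) {
        if (bytes.length != SIZE) {
            throw new IllegalArgumentException("expected " + SIZE + " bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes); // ByteBuffer is big-endian by default
        this.high = buf.getLong();
        this.low = buf.getLong();
    }

    public byte[] getBytes() {
        // Unpack the two longs back into the original 16 bytes
        return ByteBuffer.allocate(SIZE).putLong(high).putLong(low).array();
    }

    @Override
    public boolean equals(Object o) {
        // Class-aware comparison keeps typed subclasses distinct
        if (this == o) return true;
        if (o == null || o.getClass() != getClass()) return false;
        TwoLongObjectId other = (TwoLongObjectId) o;
        return high == other.high && low == other.low;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(high) + Long.hashCode(low);
    }
}

// Type marker on top of the fixed-size representation
final class FileId extends TwoLongObjectId {
    FileId(byte[] bytes) { super(bytes); }
}
```

The limitation described above is visible here: SIZE is a compile-time property of the class, so a different id length means another base class and another set of typed subclasses (or a factory choosing among them).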

In short (what we discussed):
a. Typesafe IDs through concrete implementations
b. Flexible size at runtime/compile time
c. Low memory footprint (because we have very many IDs)

As we discussed, this is tricky, because (a)+(b) and (c) basically
contradict each other. As I understood you, a low footprint can be reached
by using primitive types (two longs (2*64 bit) and an int (32 bit) for a
SHA1 sum) and less abstraction. Type safety needs abstraction, and a
flexible size can't be done without arrays or lists.

I am not sure if it is necessary to have the size configurable at runtime.
We could also make this an application wide constant, but that would imply
that all checksums and all file IDs must always be based on SHA1 and hence
be 20 bytes long. This is my major concern here.
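
If all checksums and ids were fixed application-wide to SHA1, a 20-byte digest would fit exactly into two longs plus one int, as described above. A minimal sketch under that assumption: Sha1Id and its methods are hypothetical names, and only MessageDigest and ByteBuffer are standard JDK APIs.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: with SHA-1 as an application-wide constant, every id
// is 20 bytes and fits into two longs (2 x 64 bit) plus one int (32 bit).
final class Sha1Id {
    static final int SIZE = 20; // application-wide constant: SHA-1 digest length

    private final long a; // bytes 0..7
    private final long b; // bytes 8..15
    private final int c;  // bytes 16..19

    Sha1Id(byte[] digest) {
        if (digest.length != SIZE) {
            throw new IllegalArgumentException("expected " + SIZE + " bytes, got " + digest.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(digest); // big-endian by default
        this.a = buf.getLong();
        this.b = buf.getLong();
        this.c = buf.getInt();
    }

    // Hashes arbitrary data with SHA-1 and packs the digest into primitives
    static Sha1Id of(byte[] data) {
        try {
            return new Sha1Id(MessageDigest.getInstance("SHA-1").digest(data));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-1
            throw new IllegalStateException(e);
        }
    }

    byte[] getBytes() {
        return ByteBuffer.allocate(SIZE).putLong(a).putLong(b).putInt(c).array();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Sha1Id)) return false;
        Sha1Id other = (Sha1Id) o;
        return a == other.a && b == other.b && c == other.c;
    }

    @Override
    public int hashCode() {
        return 31 * (31 * Long.hashCode(a) + Long.hashCode(b)) + c;
    }
}
```

On 64-bit HotSpot with compressed pointers such an object should take about 32 bytes (a 12-byte header plus 20 bytes of fields), with no separate array. The trade-off is the one raised above: switching to another hash function would change SIZE and the field layout for every id type at once.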

Do you think that's a bad thing?

Best,
Philipp

Follow ups

Re: Long ids
From: Fabrice Rossi, 2013-12-05
Re: Long ids
From: Fabrice Rossi, 2013-12-05

References

Long ids
From: Fabrice Rossi, 2013-12-03