syncany-team team mailing list archive
Message #00580
Long ids
Hi,
While discussing with Philipp a few days ago, we noticed that identifying
files by a single long, as is done now, is a bit dangerous in terms of possible
collisions. The probabilities are low (see
https://en.wikipedia.org/wiki/Birthday_attack), but a bit higher than
they should be.
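To put rough numbers on this, here is a small Java sketch of the standard birthday approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n ids of b bits. It assumes the ids behave like uniformly random values; the class and method names are illustrative only, not Syncany code.

```java
// Illustrative helper (hypothetical name), not part of Syncany.
final class BirthdayBound {
    // Approximate probability that at least two of n uniformly random b-bit ids collide:
    // p ≈ 1 - exp(-n(n-1) / 2^(b+1)).
    static double collisionProbability(double n, int bits) {
        double exponent = -n * (n - 1) / Math.pow(2.0, bits + 1);
        // -expm1(x) computes 1 - e^x accurately even when x is tiny (e.g. for 128-bit ids).
        return -Math.expm1(exponent);
    }

    public static void main(String[] args) {
        for (double n : new double[] {1e6, 1e9}) {
            System.out.printf("n=%.0e  64-bit: %.3e  128-bit: %.3e%n",
                    n, collisionProbability(n, 64), collisionProbability(n, 128));
        }
    }
}
```

Under that assumption, a repository with a million files has a collision probability of roughly 2.7e-8 with 64-bit ids, while one with a billion files is at about 2.7%; with 128-bit ids both are negligible.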
So I implemented a not-so-nice FileId (see my branch
https://github.com/fabrice-rossi/syncany/tree/longer-file-id) that uses a
16-byte id. Philipp was not happy with it for two reasons: the abstraction
leaked (you needed two longs to initialize a FileId), and there was no
way to customize the number of bytes used to identify a file.
I've now implemented a better FileId based on Philipp's input. It's
basically a trimmed-down ByteArray. For now, its size is not configurable,
but that would be super easy to add. As it is based on
ObjectId, it can be generalized to other ids.
Philipp, does that suit you better? If so, I will add
proper documentation and tests, and unify this with the other ids.
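For readers who have not looked at the branch, a byte-array-backed id along these lines might look like the following minimal Java sketch. It is not the code from the longer-file-id branch: the fromBytes and random methods, the DEFAULT_SIZE constant and the ChunkId class are hypothetical and only illustrate the idea.

```java
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical byte[]-backed base class for all ids (a "trimmed-down ByteArray").
abstract class ObjectId {
    protected final byte[] bytes;

    protected ObjectId(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy: ids stay immutable
    }

    public int size() {
        return bytes.length;
    }

    @Override
    public boolean equals(Object other) {
        // Same concrete class required, so a FileId never equals a ChunkId with identical bytes.
        return other != null && other.getClass() == getClass()
                && Arrays.equals(bytes, ((ObjectId) other).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    @Override
    public String toString() {
        StringBuilder hex = new StringBuilder(2 * bytes.length);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }
}

final class FileId extends ObjectId {
    // Fixed for now; making it configurable would only change where this value comes from.
    static final int DEFAULT_SIZE = 16;
    private static final SecureRandom RANDOM = new SecureRandom();

    private FileId(byte[] bytes) {
        super(bytes);
    }

    static FileId fromBytes(byte[] bytes) {
        return new FileId(bytes);
    }

    static FileId random() {
        byte[] bytes = new byte[DEFAULT_SIZE];
        RANDOM.nextBytes(bytes);
        return new FileId(bytes);
    }
}

// A second id kind, only to show that ids of different kinds never compare equal.
final class ChunkId extends ObjectId {
    private ChunkId(byte[] bytes) {
        super(bytes);
    }

    static ChunkId fromBytes(byte[] bytes) {
        return new ChunkId(bytes);
    }
}
```

Because each id kind is its own class, a method that expects a FileId will not compile when given a ChunkId, and the size is simply the array length, which is why making it configurable is easy in this design.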
To all: do you know a simple way to get ids that do not waste
memory, are (reasonably) type safe, and are still configurable at runtime?
What I would like to have is a hierarchy of ObjectId, in order to have
type safety (so that you cannot mistakenly search for a file using a chunk
id, for instance). That's easy. But I would also like ids with a small
memory footprint. For instance, if I use 16 bytes, then I can pack
them into two longs rather than putting them in a byte[] (long story
short, this reduces the memory used per id from 48 bytes to 32 bytes
on HotSpot running in 64-bit mode with compressed pointers). This is
important if we want to support very large repositories. Again, this is
easy: make ObjectId an abstract class, then implement different
size-constrained subclasses (such as TwoLongObjectId) and, on top of them,
specific subclasses like FileId, which would derive from TwoLongObjectId. But
this prevents configuring the actual memory size at runtime, at least in
a way that is not super annoying. I can of course design a factory
and have multiple classes implementing a FileId interface (as a type
marker) and inheriting from the different size-constrained subclasses,
but this feels heavy. Any other solution?
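A minimal Java sketch of the factory-based variant described above, to make the trade-off concrete. All names (IdSketch, TwoLongFileId, ByteArrayFileId, newFileId) are hypothetical, and the size estimates assume 64-bit HotSpot with compressed oops and compressed class pointers (12-byte object header, 16-byte array header, 8-byte alignment); actual layouts depend on the JVM version and flags, so they should be measured before being relied upon.

```java
import java.util.Arrays;

// Hypothetical sketch of the "factory + type-marker interface" design; all names are illustrative.
final class IdSketch {
    // Type marker: lookups take a FileId, so a chunk id cannot be passed by mistake.
    interface FileId {
        int sizeInBytes();
    }

    // Size-constrained base: exactly 16 bytes packed into two longs (no separate array object).
    abstract static class TwoLongObjectId {
        final long high;
        final long low;

        TwoLongObjectId(byte[] bytes) {
            if (bytes.length != 16) {
                throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
            }
            long h = 0;
            long l = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (bytes[i] & 0xFF);
                l = (l << 8) | (bytes[i + 8] & 0xFF);
            }
            this.high = h;
            this.low = l;
        }

        @Override
        public boolean equals(Object o) {
            return o != null && o.getClass() == getClass()
                    && high == ((TwoLongObjectId) o).high && low == ((TwoLongObjectId) o).low;
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(high) + Long.hashCode(low);
        }
    }

    // Generic base for any other size, backed by a byte[].
    abstract static class ByteArrayObjectId {
        final byte[] bytes;

        ByteArrayObjectId(byte[] bytes) {
            this.bytes = bytes.clone();
        }

        @Override
        public boolean equals(Object o) {
            return o != null && o.getClass() == getClass()
                    && Arrays.equals(bytes, ((ByteArrayObjectId) o).bytes);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }

    // One FileId class per storage layout: this duplication is what makes the design heavy.
    static final class TwoLongFileId extends TwoLongObjectId implements FileId {
        TwoLongFileId(byte[] bytes) { super(bytes); }
        public int sizeInBytes() { return 16; }
    }

    static final class ByteArrayFileId extends ByteArrayObjectId implements FileId {
        ByteArrayFileId(byte[] bytes) { super(bytes); }
        public int sizeInBytes() { return bytes.length; }
    }

    // Factory: the id size is a runtime value; the concrete class is chosen from it.
    static FileId newFileId(byte[] bytes) {
        return bytes.length == 16 ? new TwoLongFileId(bytes) : new ByteArrayFileId(bytes);
    }

    // Rough shallow-size estimates under the layout assumptions stated above.
    static int align8(int n) {
        return (n + 7) & ~7;
    }

    static int estimatedTwoLongBytes() {
        return align8(12 + 2 * 8); // header + two longs: 28, padded to 32
    }

    static int estimatedByteArrayBytes(int size) {
        return align8(12 + 4) + align8(16 + size); // id object 16 + array 32 = 48 for size 16
    }
}
```

Callers only ever see FileId, and the id size can be a runtime setting passed to newFileId. The cost is one concrete class per (id kind, storage layout) pair, so adding a ChunkId along the same lines doubles the number of classes again.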
Cheers,
Fabrice