syncany-team team mailing list archive

Re: Long ids

To: Gregor Trefs <gregor.trefs@xxxxxxxxx>, Philipp Heckel <philipp.heckel@xxxxxxxxx>
From: Fabrice Rossi <Fabrice.Rossi@xxxxxxxxxxx>
Date: Fri, 06 Dec 2013 18:44:16 +0100
Cc: Syncany Mailing List <syncany-team@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <52A1B352.9080806@gmail.com>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.1.1

Hi Gregor,

On 06/12/2013 12:21, Gregor Trefs wrote:
>> 1) If we want ObjectID sizes to depend on the indexed type (this
>> is the case), we need something slightly more complicated. My take
>> would be to have all indexable classes implement an Indexable
>> interface with a preferredSize method that returns the number of
>> bytes needed. Then we will have ObjectId<T extends Indexable> or
>> something like this.
> No. Just keep the ObjectID as simple from the outside as it is ;).
> If you want to make ObjectID sizes depend on the type information,
> then you might use the Strategy pattern.

I'm puzzled: isn't that the factory pattern? (I'm not being pedantic; it
just does not strike me as different from a factory.)

> A bit more detail:
> 
> 1. Class ObjectIdFactory: this class has several creation methods for
> the different classes (e.g. createMultiChunkId(byte[] array)).
> Within such a method, you know which type you are creating the
> ID for. However, ObjectId should not be aware of this fact.

Well, that's more complicated in my opinion. It means that the
logic for creating an Id is neither in the ObjectId nor in the object
being identified. This seems too flexible and a maintenance burden (but
see below for more).

> Also, MultiChunkRef should not know anything more about its
> identifier than its mere existence.

That's a bit more complicated (see the discussion on the Chunks with
Philipp). A PartialFileHistory cannot contribute much in terms of
choosing its id, as there is no simple deterministic scheme that would
work in this context (file contents change, and so do names, etc.). So we
resort to a random id, which is a general facility that can be provided
by ObjectId (or a factory, see below). But in the case of a Chunk, we
have a natural candidate for the id, the chunk checksum, and we have no
reason to do anything else. So Chunks know how to create their id;
PartialFileHistory objects don't.
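
To make the contrast concrete, here is a minimal Java sketch. The helper names (IdSources, randomId, checksumId) are hypothetical and not part of Syncany, and SHA-1 is only an example digest: a PartialFileHistory gets random bytes because nothing about it is stable, while a chunk derives its id deterministically from its contents.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical helpers; names are illustrative, not Syncany's API.
final class IdSources {
    private static final SecureRandom RANDOM = new SecureRandom();

    // No stable content to derive an id from (contents and names change),
    // so a file history simply gets random bytes.
    static byte[] randomId(int length) {
        byte[] id = new byte[length];
        RANDOM.nextBytes(id);
        return id;
    }

    // A chunk has a natural id: the checksum of its contents.
    // SHA-1 is an assumption here; any fixed digest behaves the same way.
    static byte[] checksumId(byte[] chunkContents) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(chunkContents);
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] contents = "hello chunk".getBytes(StandardCharsets.UTF_8);
        // Same contents always yield the same id.
        System.out.println(Arrays.equals(checksumId(contents), checksumId(contents)));
        // The digest fixes the id length: 20 bytes for SHA-1.
        System.out.println(checksumId(contents).length);
        System.out.println(randomId(16).length);
    }
}
```

The digest length dictates the id length, which is why chunk ids may need more than the 16 bytes that suit random ids.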

> One solution is to have an abstract Strategy class whose instances
> know about the type (e.g. the class of MultiChunkRef), the id structure
> and how best to handle ID-related actions (e.g. comparing with another
> ID).

Why move that to an abstract factory if we can do it with a concrete
one?

> A little draft code:
> 
> public class ObjectId<T> {
>     private final IdStrategy<T> strategy;
>
>     ...
>
>     @Override
>     public boolean equals(Object other) {
>         ...
>         return strategy.equals(this, other);
>     }
> }

Like I said, one of the design objectives is to use memory-tight
ids. If each Id instance keeps a reference to a strategy or to a class,
it wastes 4 or 8 bytes (depending on whether the JVM is 32- or 64-bit
and, on a 64-bit JVM, whether references can be compressed). I think we
can do better.
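
A minimal sketch of the strategy-based layout from the quoted draft shows where that cost comes from. IdStrategy, areEqual, hash, StrategyObjectId and ByteArrayStrategy are hypothetical names; the point is the extra strategy field that every instance carries on top of its byte array.

```java
import java.util.Arrays;

// Hypothetical strategy interface, following the quoted draft.
interface IdStrategy<T> {
    boolean areEqual(byte[] a, byte[] b);
    int hash(byte[] bytes);
}

// Strategy-based id: besides the reference to its byte array, every
// instance stores a reference to the strategy (4 or 8 bytes, depending
// on the JVM), even though all ids of one type share the same strategy.
final class StrategyObjectId<T> {
    private final IdStrategy<T> strategy;
    private final byte[] bytes;

    StrategyObjectId(IdStrategy<T> strategy, byte[] bytes) {
        this.strategy = strategy;
        this.bytes = bytes.clone();
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof StrategyObjectId)) return false;
        StrategyObjectId<?> that = (StrategyObjectId<?>) other;
        // Same strategy instance stands in for "same indexed type".
        return strategy == that.strategy && strategy.areEqual(bytes, that.bytes);
    }

    @Override
    public int hashCode() {
        return strategy.hash(bytes);
    }
}

// A plain byte-array strategy; one shared instance serves all ids of a type.
final class ByteArrayStrategy<T> implements IdStrategy<T> {
    @Override public boolean areEqual(byte[] a, byte[] b) { return Arrays.equals(a, b); }
    @Override public int hash(byte[] bytes) { return Arrays.hashCode(bytes); }
}
```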

I would do something like this:

- an interface/abstract class Id<T>
- at least two concrete classes:
  - ShortId<T> for FileId and other objects for which a 16-byte random
id is enough (judging from experience with UUIDs, this is well suited
to many cases)
  - ArrayId<T> for objects that need longer ids (like content-based ids
for Chunks)
  - both concrete classes have proper hashCode, equals and toString
methods, but neither uses T in those methods
- a factory IdFactory which provides:
  - convenience methods for creating random ids of a given length, and
for turning a String into an id and vice versa (the latter being used in
toString)
  - possibly, as you propose, factory methods for each indexable type
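
The list above could look roughly like this in Java. This is a sketch under assumptions, not a settled design: Id is modeled as an interface with a hypothetical toBytes method, ShortId packs its 16 bytes into two longs so that it holds neither an array nor a strategy reference, ids are rendered as lowercase hex, and the IdFactory method names are hypothetical.

```java
import java.security.SecureRandom;
import java.util.Arrays;

// T is only a compile-time tag; no method below ever inspects it.
interface Id<T> {
    byte[] toBytes();
}

// 16-byte id stored inline as two longs: no array, no strategy reference.
final class ShortId<T> implements Id<T> {
    private final long high;
    private final long low;

    ShortId(long high, long low) { this.high = high; this.low = low; }

    @Override public byte[] toBytes() {
        byte[] out = new byte[16];
        for (int i = 0; i < 8; i++) {  // big-endian: most significant byte first
            out[i] = (byte) (high >>> (56 - 8 * i));
            out[8 + i] = (byte) (low >>> (56 - 8 * i));
        }
        return out;
    }
    @Override public boolean equals(Object o) {
        if (!(o instanceof ShortId)) return false;
        ShortId<?> that = (ShortId<?>) o;
        return high == that.high && low == that.low;
    }
    @Override public int hashCode() { return 31 * Long.hashCode(high) + Long.hashCode(low); }
    @Override public String toString() { return IdFactory.toHex(toBytes()); }
}

// Arbitrary-length id, e.g. a content checksum for chunks.
final class ArrayId<T> implements Id<T> {
    private final byte[] bytes;

    ArrayId(byte[] bytes) { this.bytes = bytes.clone(); }

    @Override public byte[] toBytes() { return bytes.clone(); }
    @Override public boolean equals(Object o) {
        return o instanceof ArrayId && Arrays.equals(bytes, ((ArrayId<?>) o).bytes);
    }
    @Override public int hashCode() { return Arrays.hashCode(bytes); }
    @Override public String toString() { return IdFactory.toHex(bytes); }
}

final class IdFactory {
    private static final SecureRandom RANDOM = new SecureRandom();

    static <T> ShortId<T> randomShortId() {
        return new ShortId<>(RANDOM.nextLong(), RANDOM.nextLong());
    }

    static <T> ArrayId<T> randomArrayId(int length) {
        byte[] bytes = new byte[length];
        RANDOM.nextBytes(bytes);
        return new ArrayId<>(bytes);
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // Inverse of toHex. Assumes ArrayId is reserved for lengths other
    // than 16 bytes, so 32 hex digits always decode to a ShortId.
    static <T> Id<T> fromString(String hex) {
        if (hex.length() % 2 != 0) throw new IllegalArgumentException("odd-length hex string");
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        if (bytes.length != 16) return new ArrayId<>(bytes);
        long high = 0, low = 0;
        for (int i = 0; i < 8; i++) {
            high = (high << 8) | (bytes[i] & 0xFF);
            low = (low << 8) | (bytes[8 + i] & 0xFF);
        }
        return new ShortId<>(high, low);
    }
}
```

Because T never appears in equals, hashCode or toString, a single ShortId class can serve every indexable type, so the number of concrete Id classes stays at two.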

What is still not clear in my head is whether the factory should be
generic and based on properties described in an Indexable interface
(then we will have Id<T extends Indexable>) or more pragmatic, with as
many methods as there are concrete indexable types. From a philosophical
point of view, I prefer the first solution, but it seems a bit
overengineered. Also, it requires an existing indexable object before an
id can be created, which will not always be convenient or flexible.
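
The two options can be contrasted in a short sketch. Indexable.preferredSize follows the name proposed earlier in the thread; TypedId, GenericIdFactory, randomIdFor, newFileHistoryId and FileHistory (a stand-in for PartialFileHistory) are hypothetical.

```java
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical: each indexable type reports how many id bytes it needs.
interface Indexable {
    int preferredSize();
}

// Minimal id type for this sketch only; T is a compile-time tag.
final class TypedId<T> {
    final byte[] bytes;

    TypedId(byte[] bytes) { this.bytes = bytes.clone(); }

    @Override public boolean equals(Object o) {
        return o instanceof TypedId && Arrays.equals(bytes, ((TypedId<?>) o).bytes);
    }

    @Override public int hashCode() { return Arrays.hashCode(bytes); }
}

// Stand-in for PartialFileHistory: 16 random bytes are enough.
final class FileHistory implements Indexable {
    @Override public int preferredSize() { return 16; }
}

final class GenericIdFactory {
    private static final SecureRandom RANDOM = new SecureRandom();

    private static byte[] randomBytes(int length) {
        byte[] bytes = new byte[length];
        RANDOM.nextBytes(bytes);
        return bytes;
    }

    // Option 1: generic, driven by the Indexable interface.
    // Drawback: an instance must already exist before its id can be created.
    static <T extends Indexable> TypedId<T> randomIdFor(T object) {
        return new TypedId<>(randomBytes(object.preferredSize()));
    }

    // Option 2: pragmatic, one factory method per concrete indexable type.
    static TypedId<FileHistory> newFileHistoryId() {
        return new TypedId<>(randomBytes(16));
    }
}
```

randomIdFor cannot run without an instance to ask for its size, which is the inconvenience mentioned above; newFileHistoryId avoids that at the price of one method per indexable type.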

In any case, we have type safety, as an Id<A> is never an Id<B> (if A and
B are unrelated), which, if I'm not mistaken, makes taking the type T
into account in equals unnecessary. We get memory efficiency, and we
don't need to multiply the number of concrete Id types by the number of
concrete indexable types. That seems quite nice.
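
A small demo of what the type parameter does and does not provide, using hypothetical names (TaggedId, DemoChunk, DemoFileHistory, TypeSafetyDemo). The compiler rejects an id of the wrong type wherever an API is typed, such as Map.put. Methods taking Object, like equals(Object) and Map.get(Object), accept any id, and since generics are erased at runtime, equals could only ever compare the bytes anyway.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Minimal id for this sketch: equals/hashCode look only at the bytes, never at T.
final class TaggedId<T> {
    private final byte[] bytes;

    TaggedId(byte[] bytes) { this.bytes = bytes.clone(); }

    @Override public boolean equals(Object o) {
        return o instanceof TaggedId && Arrays.equals(bytes, ((TaggedId<?>) o).bytes);
    }

    @Override public int hashCode() { return Arrays.hashCode(bytes); }
}

// Hypothetical stand-ins for two unrelated indexable types.
final class DemoChunk {}
final class DemoFileHistory {}

final class TypeSafetyDemo {
    public static void main(String[] args) {
        Map<TaggedId<DemoChunk>, DemoChunk> chunks = new HashMap<>();
        TaggedId<DemoChunk> chunkId = new TaggedId<>(new byte[] {1, 2, 3});
        TaggedId<DemoFileHistory> historyId = new TaggedId<>(new byte[] {1, 2, 3});

        chunks.put(chunkId, new DemoChunk());
        // chunks.put(historyId, new DemoChunk());  // does not compile: wrong id type

        // equals(Object) accepts anything, and T is erased at runtime,
        // so only the bytes are compared here.
        System.out.println(chunkId.equals(historyId));  // true
    }
}
```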

Cheers,

Fabrice

References

Long ids
From: Fabrice Rossi, 2013-12-03
Re: Long ids
From: Philipp Heckel, 2013-12-04
Re: Long ids
From: Fabrice Rossi, 2013-12-05
Re: Long ids
From: Fabrice Rossi, 2013-12-06