
gtg-user team mailing list archive

Re: My vision of backends in GTG


On Thu, Dec 17, 2009 at 9:56 AM, Lionel Dricot <ploum@xxxxxxxxx> wrote:
> The following describes my vision of backends and storage in GTG. This
> is what I intended to do for 0.2. If you agree, it will be done for 0.3!

Thanks for sharing the vision.

I don't want to stop you from doing this work, but if you want to have
the best outcome, then it's worth being clear on exactly *why* you
want to do this, and on what the result will look like to the user.
Because I believe this, I'm going to ask quite a few questions.

> From the very first lines of GTG, we knew that storing your tasks would
> be different for everybody. That's why we designed our backend
> architecture and we designed GTG to work with multiple backends at the
> same time.

Why would storing tasks be different for everyone? I don't use Tomboy
any more, but I used it for years without caring how it stored my
notes. Why would I care about how GTG stores my tasks?

> Did you know? GTG has supported multiple backends since 0.1! If you
> manually add a second backend in the backends.xml, it will work as
> expected.

I did know that. It makes me wonder, is there an explanation of the
layers of the GTG architecture?

> That's fine for reading tasks, but how will the user define in which
> backend he wants to store a specific task?

Why would someone need more than one backend for storing a specific
task? Why not have a document-based approach where users can open a
task database and read & write from there?

> Use the already existing, Luke! Do you see? Indeed, I'm looking at
> @tags!
> So, will it work?
> I imagine 4 types of backend you can have : read-write, read-only,
> import, export.

"Backend" is a pretty generic term. I know it has a particular use
within the GTG codebase, but I think that the discussion would be
clearer if we could find something better to use.

> * Read-write is the traditional backend you already know. It will also
> be used for the couchdb backend and, in fact, most backends.

Is there anything already in the code or documentation that specifies
what you must do to implement one of these? I think that writing such
a thing would be an excellent first step.

Looking at GTG/backends/localfile.py, it seems that there's a bunch of
top-level functions that must be provided by a module, and a Backend
class.

The purpose of most of the top-level functions is clear, but
get_parameters() seems pretty vague. It's also not clear why some
things are top-level functions and other things are methods.

> * Read-only displays a list of tasks that you have to do but you should
> trigger something external to close them. It might be useful for
> backends like bugzilla (at least as a first step) or for a centralized
> backend where your boss has to acknowledge that you finished a task.

AIUI, the _only_ thing this gives you that import backends don't is
that it stops you from editing the tasks.

In that case, rather than having multiple types of backends, why not
have a property on tasks that says whether or not that task is
read-only?
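
A minimal sketch of what a per-task property could look like. The
names Task, read_only and ReadOnlyTaskError are hypothetical and are
not taken from the GTG codebase:

```python
from dataclasses import dataclass


class ReadOnlyTaskError(Exception):
    """Raised when code tries to modify a task that is marked read-only."""


@dataclass
class Task:
    title: str
    # Hypothetical flag: set by whatever loaded the task, not by the user.
    read_only: bool = False

    def set_title(self, new_title: str) -> None:
        if self.read_only:
            raise ReadOnlyTaskError(f"task {self.title!r} cannot be edited here")
        self.title = new_title


bug = Task("Fix crash on startup", read_only=True)  # e.g. came from a bug tracker
note = Task("Buy milk")                             # ordinary editable task
note.set_title("Buy oat milk")
```

With something like this, the editor only needs to look at the task,
not at the kind of backend it came from.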

> * Import backends will simply take tasks from the backend, just as if the
> user had entered them manually. A typical example might be a Twitter
> backend. It would retrieve all of your tweets with the tag #todo and
> create a new task with them.

I don't think it's helpful to think of this as a "backend". It's
actually two things:
  - a process for putting things into a backend
  - configuration to say how often this should take place

> * Export backends write the tasks somewhere but still keep them in GTG.
> For example, it could export your tasks to a simple task viewer on your
> cellphone without any complicated sync mechanism.

As above. It's not a backend at all. It's a way of getting stuff out
of a backend and into something else.

> As you can see, I think that RW and Import will already cover most
> usecases.
> So, let's call a given instance of a backend a "TaskSource"
> By default, GTG will start with one tasksource using the current text
> file backend. It will be possible for users to add other tasksources
> like you add accounts in Empathy.

Or couchdb by default.

> When adding multiple tasksources, the user has to define a default
> tasksource. By default, every task will go there.

If you are going to allow for multiple tasksources, then this is a
very sane approach.

> For other tasksources, user will define tags related to each
> tasksource.
> For example, if I define a couchdb tasksource with my tasks and I
> have, at work, a local file tasksource (tasks about my work are
> confidential and cannot leave the computer), I can simply say that
> couchdb is my default and that the local file tasksource catches all
> tasks with the tag "@work" (including subtags).
> It means that, at work, I will see my personal tasks and my work tasks
> when, at home, I will only see my personal tasks.
> Tasksources will also have the "catchall" option so that every task goes
> into that particular tasksource.

I'm not sure how I feel about this. Although you're re-using an
existing entity (the tag), it seems a bit complicated.

You are already going to need a user interface around task sources,
because people will need to be able to configure them. Why not make
that user interface explicit when you are editing tasks, and just
allow people to choose a task source there?

> But wait, what if a task has multiple tags and goes into multiple
> backends? I see two solutions:
> - we give a unique ID to each task, regardless of the backend, so GTG
> understands that this is the same task.

Do that. It's a well-understood solution to a well-understood problem.
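
A minimal sketch of the idea, using Python's standard uuid module.
The helper names and the dict-based task records are assumptions made
for illustration, not GTG's actual data model:

```python
import uuid


def new_task_id() -> str:
    """Return a random, globally unique ID (a version 4 UUID as text)."""
    return str(uuid.uuid4())


def merge_by_id(*sources):
    """Collapse tasks that appear in several sources into one entry per ID."""
    merged = {}
    for source in sources:
        for task in source:
            # The first source that provides a given ID wins in this sketch.
            merged.setdefault(task["id"], task)
    return merged


shared_id = new_task_id()
couchdb_tasks = [{"id": shared_id, "title": "Write report"}]
local_tasks = [
    {"id": shared_id, "title": "Write report"},        # same task, second source
    {"id": new_task_id(), "title": "Review budget"},   # only stored locally
]
tasks = merge_by_id(couchdb_tasks, local_tasks)
```

Because the ID does not depend on the title or on where the task is
stored, the same task seen through two sources is recognised as one.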

> - we adopt the "one title = one task" mantra, like Tomboy. Our unique ID
> is the title! If you create a task with an already existing title, a (1)
> is added: "New Task (1)". If quickadd/create children with an existing
> name, that existing task is reopened. (That might be useful to find
> information about the last time you did that task.)
> If there's a conflict, we might keep both versions or do something
> similar: it's only text after all. Also, it means that a given task
> could have multiple IDs. That's not a problem, it only has to be done.

Don't do that.

> Each backend will define itself with a set of options (a bit like
> plugins). In order to not copy/paste too much, we might have some
> predefined backends that are instantiated by others. For example, we
> will have a local-text-file backend for which the user can choose the
> location of his data. The default backend will only be an instantiation
> of that backend but with a fixed data path (value=XDG_HOME_DATA).

That's a great idea. Another way of thinking about it is that you want
an object to represent a format (what we call a backend), and an
object to represent a document of that format (what we call a
tasksource). It should be possible to create a document of a
particular format from the format object.
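
A rough sketch of that split, with hypothetical class names. The
default location assumes the XDG Base Directory convention
($XDG_DATA_HOME, falling back to ~/.local/share), and the
"gtg/tasks.xml" file name is made up for the example:

```python
import os


class LocalFileSource:
    """A 'document': one concrete place where tasks are stored."""

    def __init__(self, path):
        self.path = path


class LocalFileFormat:
    """A 'format': knows how to create task sources of its own kind."""

    name = "local-text-file"

    def create_source(self, path):
        # The user picks the location of the data.
        return LocalFileSource(path)

    def create_default_source(self):
        # Same format, but with a fixed data path (assumed XDG location).
        data_home = os.environ.get("XDG_DATA_HOME") or os.path.join(
            os.path.expanduser("~"), ".local", "share"
        )
        return self.create_source(os.path.join(data_home, "gtg", "tasks.xml"))


fmt = LocalFileFormat()
custom = fmt.create_source("/tmp/work-tasks.xml")
default = fmt.create_default_source()
```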

The critical thing is that there has to be a clearly defined interface
that any valid TaskSource must implement. Ideally, we'd have unit
tests for this interface, and run these tests against all known types
of tasksource.
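
One way to sketch this is an abstract base class plus a shared set of
contract tests that every implementation has to pass. The method names
below are assumptions for illustration, not the existing GTG backend
API:

```python
import unittest
from abc import ABC, abstractmethod


class TaskSource(ABC):
    """Hypothetical interface every task source would implement."""

    @abstractmethod
    def add_task(self, task_id, title): ...

    @abstractmethod
    def get_task(self, task_id): ...

    @abstractmethod
    def remove_task(self, task_id): ...


class InMemorySource(TaskSource):
    """Trivial implementation, used here only to exercise the tests."""

    def __init__(self):
        self._tasks = {}

    def add_task(self, task_id, title):
        self._tasks[task_id] = title

    def get_task(self, task_id):
        return self._tasks[task_id]  # raises KeyError if unknown

    def remove_task(self, task_id):
        del self._tasks[task_id]


class TaskSourceContract:
    """Mixin with the interface tests; subclasses provide make_source()."""

    def make_source(self):
        raise NotImplementedError

    def test_added_task_can_be_read_back(self):
        source = self.make_source()
        source.add_task("t1", "Buy milk")
        self.assertEqual(source.get_task("t1"), "Buy milk")

    def test_removed_task_is_gone(self):
        source = self.make_source()
        source.add_task("t1", "Buy milk")
        source.remove_task("t1")
        with self.assertRaises(KeyError):
            source.get_task("t1")


class InMemorySourceTest(TaskSourceContract, unittest.TestCase):
    def make_source(self):
        return InMemorySource()
```

Each new kind of tasksource would then add one small TestCase subclass
that only overrides make_source().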

You probably will also want an object that looks like a tasksource but
in fact represents multiple tasksources.
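
A minimal sketch of such a composite, assuming a small hypothetical
interface (get_task, add_task, task_ids). Reads try each source in
turn; writes go to the default source:

```python
class DictSource:
    """Stand-in for a real task source, backed by a plain dict."""

    def __init__(self, tasks=None):
        self.tasks = dict(tasks or {})

    def get_task(self, task_id):
        return self.tasks[task_id]

    def add_task(self, task_id, title):
        self.tasks[task_id] = title

    def task_ids(self):
        return list(self.tasks)


class MultiSource:
    """Looks like a single task source but delegates to several."""

    def __init__(self, default, others=()):
        self.default = default
        self.sources = [default, *others]

    def get_task(self, task_id):
        for source in self.sources:
            try:
                return source.get_task(task_id)
            except KeyError:
                continue
        raise KeyError(task_id)

    def add_task(self, task_id, title):
        # New tasks always land in the default source in this sketch.
        self.default.add_task(task_id, title)

    def task_ids(self):
        seen = []
        for source in self.sources:
            for task_id in source.task_ids():
                if task_id not in seen:
                    seen.append(task_id)
        return seen


personal = DictSource({"p1": "Call the plumber"})
work = DictSource({"w1": "Prepare slides"})
everything = MultiSource(default=personal, others=[work])
everything.add_task("p2", "Buy milk")
```

The rest of the application can then talk to one object and not care
how many sources sit behind it.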

> Backends will also declare if they support specific GTG features like
> start_date, subtasks and others in the future. For example, the RTM
> backend will not support subtasks. As a consequence, tasks from the RTM
> backend will not accept subtasks and this feature will be disabled in
> them. (of course, RTM plugin will be ported as a backend, at least I
> hope so !)

That's also a good idea.

Another way to do this is to have the tasksource raise a well-known
error if you try to access a feature that it doesn't provide. You
might find that you'll end up doing both.
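
A sketch of doing both, with hypothetical names: the source declares
what it supports so the UI can disable controls up front, and it still
raises a well-known error if something calls the feature anyway:

```python
class UnsupportedFeature(Exception):
    """Well-known error for features a task source does not provide."""


class NoSubtaskSource:
    # Hypothetical declaration of what this source can do.
    supported_features = frozenset({"start_date"})

    def add_subtask(self, parent_id, child_id):
        raise UnsupportedFeature("subtasks")


def can_use(source, feature):
    """Lets the UI decide whether to enable a control for this source."""
    return feature in source.supported_features


source = NoSubtaskSource()
subtasks_enabled = can_use(source, "subtasks")      # False: disable the button
start_date_enabled = can_use(source, "start_date")  # True
```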

Although we're a long while away from this, disabling the features is
correct; hiding them is bad. Also, the user should always have some
way of figuring out why a thing is disabled and what they can do to
enable the feature.

> So, let's take one extreme example :
> My default tasksource will be a couchdb one so I can access my tasks
> from everywhere.
> At work, I will have a local-file tasksource with @work tag (and
> subtags).
> I will have an import twitter tasksource that will put in my default
> tasksource all tweets sent to @ploum with the tag #gtg (including mine)
> (the tag @fromtwitter will be added)
> I will have a read-only tasksource that displays all bugs assigned to me
> in Launchpad. Those will automatically have the @gtg tag.
> I will have an export tasksource that sends to my phone every task with
> the tag @shopping, overwriting the old list at each synchronisation
> (but, as you can see, the tasks stay in the default tasksource).

That's quite an extreme example.

> So, what's my plan to do this?
> Before everything (step 0), I want to change and improve the backends
> API. For example, the get_all will no longer return every single task
> from the dawn of humanity but will instead retrieve opened tasks and
> tasks that were closed less than X days ago where X is a parameter,
> common to all backends, that the user can change. This will solve:
> https://bugs.edge.launchpad.net/gtg/+bug/312700
> https://bugs.edge.launchpad.net/gtg/+bug/495758

Better yet, don't have a get_all method: have something that supports
a streaming approach.
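
As a sketch of what a streaming approach could mean in Python: a
generator that yields tasks one at a time, so the caller never has to
hold the whole history in memory. The function name and the record
layout are assumptions for the example:

```python
from datetime import date, timedelta


def iter_tasks(records, closed_within_days, today):
    """Yield open tasks and tasks closed less than N days ago, lazily."""
    cutoff = today - timedelta(days=closed_within_days)
    for record in records:
        closed_on = record.get("closed_on")
        if closed_on is None or closed_on > cutoff:
            yield record


today = date(2009, 12, 17)
records = [
    {"title": "Open task", "closed_on": None},
    {"title": "Closed yesterday", "closed_on": date(2009, 12, 16)},
    {"title": "Closed last year", "closed_on": date(2008, 12, 1)},
]
recent = [r["title"] for r in iter_tasks(records, closed_within_days=30, today=today)]
```

Here records is a list for brevity; in practice it could itself be an
iterator over a file or a database cursor.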

> As a compensation, the backends will also (maybe later) have a "search"
> function. When receiving a search, the backend will answer with every
> single task matching the search, even old ones.

Which means you'll need a search API. They are tricky.

> But now, we can start the implementation.
> First, we have to make the name of a task unique.

No we don't. :)

> Also, the ID of a tasksource should be unique because that ID is written
> in the description of tasks.


> Second step is to ensure that tasks can support multiple IDs from
> different tasksources.


> Third step: add support for assigning a specific backend based on tags.
> Yes, it means that tasks might be first added to the default backend
> then removed from it once a tag is added.

Actually, no. The next step is having a user interface for configuring
a backend at all.

Then the next step is having support for _multiple_ tasksources with
one default write tasksource. This includes things like being able to
look at a task and see which tasksource it is from. It will also
include things like disabling features based on backend.

*Then* you get to having support based on assigning a tasksource to a tag.
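
For that last step, a minimal sketch of tag-based routing with a
default fallback. How subtags are represented is an assumption here (a
child-to-parent mapping), and all names are hypothetical:

```python
def route(task_tags, tag_to_source, default_source, parent_of=None):
    """Pick the task source for a task from its tags.

    The first tag (or ancestor of a tag) with a configured source wins;
    tasks without a matching tag go to the default source.
    """
    parent_of = parent_of or {}
    for tag in task_tags:
        current, seen = tag, set()
        # Walk up the (assumed) subtag hierarchy, guarding against cycles.
        while current is not None and current not in seen:
            if current in tag_to_source:
                return tag_to_source[current]
            seen.add(current)
            current = parent_of.get(current)
    return default_source


tag_to_source = {"@work": "local-file"}
parent_of = {"@meeting": "@work"}  # @meeting is a subtag of @work

work_target = route(["@meeting"], tag_to_source, "couchdb", parent_of)
home_target = route(["@shopping"], tag_to_source, "couchdb", parent_of)
untagged_target = route([], tag_to_source, "couchdb", parent_of)
```

Note that this sketch sidesteps the hard case raised earlier: a task
whose tags map to more than one source simply goes to the first match.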

> Fourth : we should ensure that removing tags puts well the task in the
> default backend.

You mean, moves the task to the default backend? Yes that's important.

> Fifth : put a GUI above all of that so the user can choose and configure
> his own tasksources.

As mentioned before, I think you should do this sooner. None of the
earlier steps have any user benefit until you can do this.

> This is the vision of a multi-backend task system I have been developing
> for more than one year. This is the last big stuff I want to add to GTG
> before a 1.0 release. Of course, nothing is written in stone. I'm open
> to criticism and proposals (mostly from coding people as this is a quite
> technical proposition). Also remember that GTG is a community project
> and that my vision will only be accepted if I can convince the majority
> of developers that this is a good idea.

Thank you for putting your vision on the line.

I realize that I've been a bit critical. This is partly because I have
a bias against big infrastructural changes, but more because I care
about GTG and want it to keep doing well.

