zim-wiki team mailing list archive

Re: Time Stamped Text (TST) plugin

 

Jaap,
I am trying to figure out the best storage solution for small changes.
So far there are two candidates:

   1. a single file with separated patches and integrated timestamps (in
   progress)
   2. a zip archive containing patch files (each file name encodes the
   timestamp)

solution nr. 1:

   - an index is not possible with this solution, only a chain of patches
   - special characters followed by timestamps are used to separate the
   patches
   - it is easy to append new patches to the end of the file
   - the file can be tracked by the main VCS (since it is text)
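
A minimal sketch of how such a single-file journal could look, using only
the Python standard library. It is an illustration, not the format of the
attached page.timeline: the separator character, the ISO timestamp line, the
use of difflib instead of diff_match_patch, and the names append_patch and
read_patches are all assumptions.

```python
import difflib
import io
from datetime import datetime, timezone

# Assumed "special character": the ASCII record separator. It must never
# occur in the page text itself, which is a limitation of this sketch.
SEPARATOR = "\x1e"


def append_patch(journal, old_text, new_text, when):
    """Append one timestamped patch to the end of an open text journal."""
    diff = "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
    ))
    # Record layout: separator, timestamp line, then the patch body.
    journal.write(f"{SEPARATOR}{when.isoformat()}\n{diff}")


def read_patches(journal_text):
    """Return (timestamp, patch) pairs in the order they were written.

    There is no index, so the whole chain is scanned from the start.
    """
    entries = []
    for record in journal_text.split(SEPARATOR)[1:]:
        stamp, _, patch = record.partition("\n")
        entries.append((datetime.fromisoformat(stamp), patch))
    return entries


if __name__ == "__main__":
    journal = io.StringIO()  # stands in for a file opened in append mode
    now = datetime.now(timezone.utc)
    append_patch(journal, "", "first line\n", now)
    append_patch(journal, "first line\n", "first line, edited\n", now)
    for stamp, patch in read_patches(journal.getvalue()):
        print(stamp.isoformat())
        print(patch)
```

Appending only ever touches the end of the file, but finding the patch for
a given time means reading every record before it.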

solution nr. 2:

   - possibility to keep an index file, which speeds up lookup of patches
   - each timestamped patch is stored as a separate file in the zip archive
   - the timeline is compressed
   - it is easier to list all patches, even in a file browser
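
A minimal sketch of the zip variant, again using only the standard library.
The member naming scheme (UTC timestamp plus ".patch"), the use of DEFLATE
compression, and the helper names add_patch and patches_between are
assumptions made for illustration.

```python
import io
import zipfile
from datetime import datetime, timezone

SUFFIX = ".patch"           # assumed member extension
STAMP = "%Y%m%dT%H%M%SZ"    # assumed naming scheme; sorts chronologically


def add_patch(archive, when, patch_text):
    """Store one patch as a member whose name is its UTC timestamp."""
    name = when.astimezone(timezone.utc).strftime(STAMP) + SUFFIX
    archive.writestr(name, patch_text)
    return name


def patches_between(archive, start, end):
    """List member names whose timestamp lies in [start, end].

    The archive's own directory of names acts as a simple index, so no
    patch body has to be read to answer the query.
    """
    lo = start.astimezone(timezone.utc).strftime(STAMP)
    hi = end.astimezone(timezone.utc).strftime(STAMP)
    return sorted(
        name for name in archive.namelist()
        if name.endswith(SUFFIX) and lo <= name[:-len(SUFFIX)] <= hi
    )


if __name__ == "__main__":
    storage = io.BytesIO()  # stands in for a .zip file next to the page
    t1 = datetime(2013, 6, 6, 15, 41, 0, tzinfo=timezone.utc)
    t2 = datetime(2013, 6, 6, 15, 45, 0, tzinfo=timezone.utc)
    with zipfile.ZipFile(storage, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        add_patch(zf, t1, "first patch")
        add_patch(zf, t2, "second patch")
    with zipfile.ZipFile(storage) as zf:
        for name in patches_between(zf, t1, t2):
            print(name, zf.read(name).decode("utf-8"))
```

Because the timestamps live in the member names, any zip-aware file browser
shows the timeline directly; the price is a binary file that a text-based
VCS cannot diff.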

The first solution is probably the better approach. I also have a question:
is it fine to use third-party code that is not licensed under the GPL?
Specifically, I mean a suitable Python module licensed under the Apache
License, Version 2.0: http://code.google.com/p/google-diff-match-patch/

The attachments contain:

   1. diff_match_patch.py (Python code that generates patches and
   reconstructs text)
   2. page.timeline (proposed storage of changes with timestamps)
   3. testing of concept.py (code that uses diff_match_patch.py and
   serves as a proof of concept)

JK


On 6 June 2013 15:41, Jaap Karssenberg <jaap.karssenberg@xxxxxxxxx> wrote:

> On Tue, Jun 4, 2013 at 11:41 PM, NorfCran <norfcran@xxxxxxxxx> wrote:
>
>> Yes, I do not want to replace the text files with XML (the concept based
>> on TXT files is the most flexible in my opinion). You are right, attaching
>> additional meta information in a shadow file is the only way.
>>
>> Since you have asked for use cases, I am going to model some of them
>> separately for a notebook and for a single page, in order to illustrate
>> utilization of Time Stamped Text (TST):
>>
>> *notebook*
>>
>>    1. search the most recently modified pages by a time range
>>       - hierarchical structures like a wiki are dynamic and changes often
>>       happen on many pages, so why not preserve this flow, determined by
>>       time, in its natural form?
>>       - an ordinary search with many matching pages could be filtered by
>>       a time range, which is currently possible only very roughly, based on
>>       ctime and mtime in zim's database
>>
>>  *page (main utilization of TST)*
>>
>>    1. highlighting recent changes using the fine-grained versions stored
>>    in the TST data structure
>>       - the most recent changes are highlighted, for instance in red,
>>       fading to black (so modifications and revisions are easier to
>>       see)
>>    2. makes it possible to revert changes with undo and redo at any
>>    time, even when the text buffer is no longer available
>>    3. time is a natural binder for any other activity performed in
>>    parallel to the note taking process → for instance searching information on
>>    the web (traceable from browser's history)
>>
>>  One of my projects attached to the email researches a graph
>> data-structure among other solutions capable of storing timestamps per word
>> chunks (as a lowest granularity) separated by spaces. The data-structure is
>> based on graphs and it has been implemented, but it is still not robust
>> enough for all cases. Maybe it can bring some additional understanding and
>> further direction of our discussion.
>>
>
> OK, I would plan something like that as follows:
>
> 1/ Come up with a compact patch / diff like format that we can use to
> store small changes in a "journal file" next to the source file
> 2/ Hook up a plugin to write such a patch file and update on each
> auto-save action
> 3/ Add an API to the plugin such that we can
>    a/ query the timestamp for a specific piece of text in a specific file
>    b/ request timestamps for each part of the current version of the
> file
>    c/ request the previous / next change for a given file
> 4/ Extend the search function to use API part "a" and add a column to the
> search dialog
> --> fulfills first use case
>
> 5/ Extend the page view to use API part "b" to highlight recent changes
> 6/ Hook up the undo/redo-manager
>    a/ to use API part "c" to extend undo in the past
>    b/ send data to the plugin per change as they happen, instead of
> waiting for auto-save
> --> fulfills 2nd use case
>
> Probably the quickest route to a first result would be if you look into
> steps 1-3a and I hack in step 4. If that is working I'm willing to help with
> steps 5 & 6 to integrate into the editor widget.
>
> Regards,
>
> Jaap
>
>

Attachment: diff_match_patch.py
Description: Binary data

Attachment: page.timeline
Description: Binary data

Attachment: testing of concept.py
Description: Binary data

