zim-wiki team mailing list archive

Re: Directly adding files and sub-directories and re-indexing

To: Jaap Karssenberg <jaap.karssenberg@xxxxxxxxx>
From: Jose Lourenco <jose@xxxxxxxxxxx>
Date: Mon, 22 Oct 2012 13:40:34 +0100
Cc: zim-wiki@xxxxxxxxxxxxxxxxxxx
In-reply-to: <CA+TmwMHb5jkjhU1p+EsB5pRG2LN2UP6YO0d1g-ckE=1pQoLwHg@mail.gmail.com>
Hi Jaap,

Thanks for your reply.
I have tried to recheck and reproduce what I have said. Please see the
answers.


On Fri, Oct 19, 2012 at 8:33 AM, Jaap Karssenberg <
jaap.karssenberg@xxxxxxxxx> wrote:

> Hi Jose,
>
> reading over this description, I think I see two different issues that we
> should probably separate.
>
> 1/ When you save files directly into the zim folder structure it messes up
> the index.
>
> This is a bug. Any file should be allowed as attachment and be visible
> e.g. in the attachment browser pane. If you have examples of specific
> situations that cause bugs, please put them in the bug tracker.
>
1.1) I believe this happens only when a .txt file with a non-normalized
name is saved into a folder. In this case it is neither normalized nor
indexed.
       E.g.:
       "This is a test file.txt"

1.2) Of course, any files/subdirs that exist in a folder created with a
non-normalized name won't have their names normalized or indexed either.
This is another kind of issue, though.

1.3) There is one other thing about Zim's naming conventions that I would
like to point out, related to file portability across operating systems:
Linux, Windows, Mac.
1.3.1) I would suggest that Zim could optionally enforce portable filenames
by replacing characters that are not allowed in filenames, such as ':'
(on Windows). Web pages sometimes use this and other characters, which
causes issues when we try to copy files to other filesystems, such as
Windows or a pen drive for portable offline use.
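A minimal sketch of the replacement suggested in 1.3.1, written as a standalone Python helper and not as Zim code. The function name make_portable and the choice of '_' as the replacement are assumptions for illustration; the character set is the one Windows rejects in file names.

```python
import re

# Characters Windows does not allow in a file name, plus ASCII control characters.
_WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def make_portable(name, replacement="_"):
    """Return a copy of `name` (one path component) that Windows will accept.

    Hypothetical helper for illustration; it does not check reserved device
    names such as CON or NUL.
    """
    cleaned = _WINDOWS_FORBIDDEN.sub(replacement, name)
    # Windows also rejects names that end in a space or a dot.
    cleaned = cleaned.rstrip(" .")
    return cleaned or replacement


print(make_portable("Python: lambda functions?.mht"))
```

The helper works on a single file or folder name, so a caller would apply it to each path component separately instead of to a full path.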

1.4) I found another issue, not necessarily a bug, with an ill-formed .txt
file that the "file" command reports as:
        "Doxygen.txt: Non-ISO extended-ASCII English text, with CRLF, NEL
line terminators".
        As expected, this file stops Zim's indexing with a message that it
cannot decode the file.
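Files like the one in 1.4 could be found before indexing with a small external check. The sketch below assumes that Zim expects page files to be UTF-8; that encoding and the function name find_undecodable are assumptions, not something taken from Zim's code.

```python
import os


def find_undecodable(root, encoding="utf-8"):
    """Yield paths of .txt files under `root` that fail to decode as `encoding`.

    Hypothetical helper; UTF-8 is an assumed default, not confirmed from Zim.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".txt"):
                continue
            path = os.path.join(dirpath, filename)
            # Read raw bytes so the check itself never raises on open().
            with open(path, "rb") as handle:
                data = handle.read()
            try:
                data.decode(encoding)
            except UnicodeDecodeError:
                yield path
```

Running list(find_undecodable("/data/z3l/zim-wiki/Notebooks")) would give the candidates to convert or move out of the notebook before the next full index.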

> The only restriction is that when you create folders to download the files
> to, these folders should follow zim naming conventions.
>
1.5) Yes. However, if we place a folder inside another, properly created,
folder, the corresponding file name appears grayed out in the left index
pane. If we select this file, it opens. If, for any reason, we don't write
something in it right away, the file disappears and is not shown in the
index until a full index is performed. Meanwhile, trying to create a page
with the folder's name does not succeed. One has to create a new page with
another name, copy the content from the previous folder to the new one,
delete the old folder and rename the page.


> 2/ You want to auto-generate pages that have links to the files and tags
>
> Question from my side is if this is really needed if the indexing is
> fixed. Say you download a bunch of files to a folder in the zim notebook.
> When the index detects the new folder it results in an empty page with a
> bunch of attachments which you can see in the attachment browser pane. You
> might have to add the tags manually to the page, but probably you have it
> open anyway to add more notes etc.
>

2.1) Yes, in most cases, and for most people, this will be so. However, I
tend to group the files I download (into one or more folders) - web pages,
PDFs, source code, ... - by the subject searched. But sometimes I don't
bother to tag or write notes. This happens to me (frequently) when, while
searching, I find things that I feel I will need in the future, or that I
needed in the past, but due to time constraints I don't want to make a note
or tag them right away. In these cases I rely on the search index: if these
(to-be-attached) files are linked on the page, the words in the filenames
are indexed, so the files can be searched for, found, and cataloged later.

>
> What I often do is open a page, write some notes, then click the folder
> icon in the attachment browser. This opens up my standard file browser for
> that folder. Then I just drag and drop whatever I have in attachments etc.
> to that folder.
>
>
> Another thought is that if your workflow is specific from e.g. firefox,
> you may also think about a firefox plugin that facilitates the flow by
> downloading to the right folder with a single click, allowing you to add
> tags on the fly etc.
>
This would indeed be very nice. Unfortunately, the web-to-PDF plugin I am
using now (because it is the most accurate and has the features I need)
makes this less practical, as it is better suited to sending everything to
one place. For instance, we can tell it to convert all tabs open in Firefox.
Open tabs may cover mixed subjects, which I sort into the appropriate
folders, usually outside Zim namespaces. So, also because of 2.1, I tend to
create folders with several files that I later move to Zim.

Having files attached, and external links as well, automatically created or
not, would make it possible to write some code to:
3.1) convert the content of all files (fully or partially) to text and
operate on it for:
3.1.1) full indexing within or outside Zim
3.1.3) automated cataloging by means of some sort of data mining
...

As almost always, this may come down to the question: should we change our
workflow to suit the program, or change the program to suit our workflow...

Personally, I will probably try to address the filename OS-compatibility
issues and also the automatic creation of pages from new folders, with some
kind of options and tagging to record how they were created. Doing this
externally is not too much trouble, and the code is useful and usable for
other files I have on my systems. While doing this I will try to learn more
about Zim, and I may create a plugin if there is any interest.


> Regards,
>
> Jaap
>
>
Thanks again.
Best regards,

Jose


> On Thu, Oct 18, 2012 at 8:46 PM, Jose Lourenco <jose@xxxxxxxxxxx> wrote:
>
>> Hi Jaap,
>>
>>
>> A personal thanks for Zim - an excellent tool.
>>
>> Congratulations also on the quality of the code and the wisdom it shows.
>>
>> This message is very long and may not be that important to the
>> community.
>>
>> So, if for any reason you aren't able to read it, please rest assured
>> that it is perfectly OK and that I am and will be grateful in any
>> circumstance.
>>
>> But, if you manage to read it, I would very much appreciate any comment
>> or suggestion you may find pertinent.
>>
>> Also, please accept my apologies for my "English", which is not my
>> mother tongue.
>>
>>
>>
>> I do a lot of research on the Internet and I usually save almost
>> everything I find useful in the process; not only links but also .mht
>> files, PDF prints of the pages consulted, and other related files.
>>
>> Saving is needed not only for offline reading but, most importantly, to
>> make sure that the information consulted stays available for later use
>> and reference.
>>
>> So, I am using Zim almost like Zotero (http://www.zotero.org/).
>>
>> The process I am using in Zim is rather time-consuming when done manually
>> - due to the quantity - and is error-prone.
>>
>> If filenames are not "normalized" as Zim expects, they don't get indexed.
>> Additionally, if directories are "created manually", that information is
>> not seen.
>> This can also create general problems with indexing, making Zim
>> inoperable because it detects these index-related problems. When this
>> happens, all affected paths/filenames have to be "normalized" manually
>> before Zim works again.
>>
>> I would like to automate this process as much as possible.
>>
>> I am doing and considering the following:
>>
>> 1) Put downloaded pages, grouped by subject searched, in related
>> directories inside a general base directory.
>>     E.g.: /data/z3l/zim-wiki/Downloads-to-Zim
>>
>> 2) With a simple change to Zim's indexing functionality, it outputs the
>> full path of invalid files (and paths) when performing a full index.
>>
>>     ./zim.py --index  /data/z3l/zim-wiki/ &> files_list_invalid.txt
>> (see Listing 1, below)
>>
>> 2.1) I tried to add an external option, such as --indexpaths, but have not
>> been successful (short of using a global variable), as I couldn't pass
>> this option down to the function get_pagelist(self, path) in
>> zim/stores/files.py, where warnings are issued by calling logger.warn()
>> when appropriate.
>> 2.1.1) I will try a different approach later.
>>
>> 3) I will read the created file (files_list_invalid.txt) and:
>> 3.1) "normalize" the collected invalid paths as per Zim requirements:
>>     * change ' ' to '_'
>>     * skip "*.txt" files (only to be done if there isn't a corresponding
>> directory, as .txt may need to be attached)
>>     * skip ".*" and "_*" directories
>>     * others that may be needed but I haven't found out yet
>>
>>     Note: this is now done "externally", that is, a python script is run
>> and changes are made on filesystem.
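Steps 3 and 3.1 above can be sketched as follows. This is an illustration under stated assumptions, not the script referred to in the note: it assumes each entry in files_list_invalid.txt sits on one line in the form WARNING: "<path>" (as in Listing 1 before mail wrapping), it skips every .txt file without checking for a corresponding directory, and it applies the ".*" and "_*" rule to any name because it works on strings only. All function names are hypothetical.

```python
import os
import re

# Assumed log format, based on Listing 1: one `WARNING: "<path>"` entry per line.
_WARNING_LINE = re.compile(r'^WARNING:\s*"(?P<path>.+)"\s*$')


def read_invalid_paths(lines):
    """Extract the quoted paths from WARNING lines, ignoring anything else."""
    paths = []
    for line in lines:
        match = _WARNING_LINE.match(line.strip())
        if match:
            paths.append(match.group("path"))
    return paths


def normalized_name(path):
    """Return the new base name for `path`, or None if the entry is skipped."""
    name = os.path.basename(path.rstrip("/"))
    if name.startswith((".", "_")):
        return None  # names starting with '.' or '_' are left alone
    if name.endswith(".txt"):
        return None  # .txt files may be attachments, so they are skipped here
    return name.replace(" ", "_")


def plan_renames(lines):
    """Map each invalid path to its normalized path without touching the disk."""
    plan = {}
    for path in read_invalid_paths(lines):
        stripped = path.rstrip("/")
        new_name = normalized_name(stripped)
        if new_name is not None and new_name != os.path.basename(stripped):
            plan[path] = os.path.join(os.path.dirname(stripped), new_name)
    return plan
```

Returning a plan instead of renaming immediately makes it possible to review the changes first and then apply them with os.rename, deepest paths first, so that renamed parents do not invalidate the remaining entries.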
>>
>> 4) the idea is then to run a script that:
>> 4.1) moves the files to Notebooks' filesystem tree
>> 4.2) moves directories to Notebooks' filesystem tree
>> 4.2.1) creates a Zim text document ".txt" with the name of the directory,
>> so that the directory may be indexed.
>> 4.2.2) on the Zim page, create links to existing files inside the
>> directory
>> 4.2.3) repeat for each directory and recurse
>>
>> 4.3) pages will have a location derived from the directory name which
>> contains tag information on it.
>> Eg.:
>>     * the content of directory named :
>>       @Python,@Papers,@Storage;@Programming,@Dedupliction;esFS - Storage
>> Efficient Filesystem in Python
>>
>>       will be saved on the following partially-existent (or to be
>> created) path:
>>
>> ..../Python/Papers/Storage/FS_-_Storage_Efficient_Filesystem_in_Python/
>>
>>       The tags following the first ';' are to be added to the
>> corresponding .txt file to be created.
>>      * the file FS_-_Storage_Efficient_Filesystem_in_Python.txt will be
>> created in the parent dir
>>      * links to the files in directory
>> FS_-_Storage_Efficient_Filesystem_in_Python will be inserted in the .txt
>> file.
>> Eg.:
>>      FS_-_Storage_Efficient_Filesystem_in_Python.txt:
>>      -------Begin-------
>>             Wiki-Format: zim 0.4
>>             Creation-Date: 2012-10-18T17:41:13+01:00
>>
>>             ====== FS - Storage Efficient Filesystem in Python ======
>>             Created Quinta 18 Outubro 2012
>>
>>             [[./244-941-1-PB.pdf]]
>>
>>             [[./FUSE’ing_Python_for_Development_244.mht]]
>>
>>             @Python
>>             @Papers
>>             @Storage
>>             @Programming
>>             @Deduplication
>>      -------End-------
>>      * parent '.txt' files may need to be updated.
>>
>> 4.4) perform a Zim --index
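The directory-name convention of step 4.3 and the page file it produces can be sketched as below. The sketch handles only names of the form "@tags;@more tags;title" separated by ';'; other entries in Listing 1 use different separators and are not covered. The page layout follows the example page in 4.3, without the locale-dependent "Created ..." line. All tags are written to the page, as in that example. Function names are hypothetical and nothing here uses Zim's own modules.

```python
from datetime import datetime, timedelta, timezone


def parse_tagged_dirname(dirname):
    """Split '@A,@B;@C;Some title' into (path_tags, extra_tags, title)."""
    *tag_groups, title = dirname.split(";")
    groups = [
        [tag.strip().lstrip("@") for tag in group.split(",") if tag.strip()]
        for group in tag_groups
    ]
    path_tags = groups[0] if groups else []  # tags before the first ';'
    extra_tags = [tag for group in groups[1:] for tag in group]
    return path_tags, extra_tags, title.strip()


def page_location(path_tags, title):
    """Build the relative folder path: path tags, then the normalized title."""
    return "/".join(path_tags + [title.replace(" ", "_")])


def build_page(title, attachments, tags, created):
    """Return page text with a heading, one link per attachment, and the tags."""
    lines = [
        "Wiki-Format: zim 0.4",
        "Creation-Date: " + created.isoformat(),
        "",
        "====== %s ======" % title,
        "",
    ]
    for name in attachments:
        lines += ["[[./%s]]" % name, ""]
    lines += ["@" + tag for tag in tags]
    return "\n".join(lines) + "\n"


path_tags, extra_tags, title = parse_tagged_dirname(
    "@Python,@Papers,@Storage;@Programming,@Deduplication;"
    "FS - Storage Efficient Filesystem in Python"
)
created = datetime(2012, 10, 18, 17, 41, 13, tzinfo=timezone(timedelta(hours=1)))
print(page_location(path_tags, title))
print(build_page(title, ["244-941-1-PB.pdf"], path_tags + extra_tags, created))
```

The page text would be written to a .txt file named after the normalized title in the parent directory, next to the folder that holds the attachments.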
>>
>> 5) Later, when more familiar with Zim's innards, I would like to:
>> 5.1) perform part of the above operations with Zim's own existing
>> functionality (fs.py, ...)
>> 5.2) find a way to perform the desired actions with tighter
>> integration with Zim. For example:
>>       * a Zim plug-in for Firefox would be nice, if some features were added
>>       * make Zim (optionally) accept directly saved files and created
>> directories, having it "normalize" paths/filenames and create the
>> corresponding .txt page files for directories.
>>
>> Listings:
>> (1)
>> z2@pclevo:~/Development/python/zim-0.57$ ./zim.py --index
>> /data/z3l/zim-wiki
>> WARNING: "/data/z3l/zim-wiki/Downloads-to-Zim/@PDF,@PDF_Tools,@Okular;PDF
>> editor annotations_Okular"
>> WARNING:
>> "/data/z3l/zim-wiki/Downloads-to-Zim/@Python,@Papers,@Storage;@Programming,@Dedupliction;esFS
>> - Storage Efficient Filesystem in Python"
>> WARNING: "/data/z3l/zim-wiki/Downloads-to-Zim/@Python,@Papers;Python
>> Papers"
>> WARNING:
>> "/data/z3l/zim-wiki/Downloads-to-Zim/@Python,@Programming,@Testing;@Scripts,@Testing,@Examples,Tests
>> in Python"
>> WARNING:
>> "/data/z3l/zim-wiki/Downloads-to-Zim/@Python,@Scripts,@Shell_tools,@Programming,@Examples,Python
>> Shell Utilities"
>> WARNING: "/data/z3l/zim-wiki/Downloads-to-Zim/@python,@lambda Python
>> Lamba Funcrions
>> ...
>> (these were saved directly inside Zim's tree filesystem)
>> WARNING:
>> "/data/z3l/zim-wiki/Notebooks/Notes/Home/Development/Python/TUI/Python
>> EasyGUI"
>> WARNING:
>> "/data/z3l/zim-wiki/Notebooks/Notes/Home/Development/Python/TUI/UniCurses
>> for Python"
>> WARNING:
>> "/data/z3l/zim-wiki/Notebooks/Notes/Home/Development/Python/TUI/Urwid -
>> Console User Interface Library"
>> WARNING:
>> "/data/z3l/zim-wiki/Notebooks/Notes/Home/Development/Python/TUI/ranger -
>> file manager"
>> WARNIN ...
>>
>>
>> If you have any considerations, advice, or just your raw opinion about
>> the usefulness and feasibility, they will be much appreciated.
>>
>> Meanwhile, please accept my
>>
>> Very best regards,
>>
>> Jose Lourenco
>> _______________________________________________________________
>> Please consider donating to keep Wikipedia alive.
>>  Support Wikipedia<http://wikimediafoundation.org/wiki/Support_Wikipedia/en>
>>  _______________________________________________________________
>> http://jal.stumbleupon.com
>> http://www.lourenco.ws/about-me
>>
>>
References

Directly adding files and sub-directories and re-indexing
From: Jose Lourenco, 2012-10-18
Re: Directly adding files and sub-directories and re-indexing
From: Jaap Karssenberg, 2012-10-19