zim-wiki team mailing list archive

Re: Making indexing optional ?

 

Hi Jaap, thank you for all your help on this.

I just tested the development branch, which seems to include the fix you mentioned. Sadly, Zim startup took approximately the same time as before. Maybe my use case is abnormal? If so, I don't want you to spend lots of time on this.

Below is the output of ./zim.py -V -D. All the time is spent between "BackgroundCheck started" and "BackgroundCheck finished". Is this what you would expect? Could we (or I) add debug messages to track the time spent indexing large files?
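
As a rough idea of the kind of timing messages meant here, a minimal sketch follows. It is not taken from the Zim code base: the logger name, log_duration, index_file and index_files are all hypothetical names used only for illustration.

```python
import logging
import time
from contextlib import contextmanager

# Hypothetical logger name, not an existing Zim logger
logger = logging.getLogger("zim.index.timing")


@contextmanager
def log_duration(label, threshold=0.0):
    """Log at DEBUG level how long the wrapped block took.

    Only durations of at least `threshold` seconds are reported,
    so small files do not flood the log.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if elapsed >= threshold:
            logger.debug("%s took %.3f s", label, elapsed)


def index_file(path):
    """Placeholder standing in for the real per-file indexing work."""
    with open(path, "rb") as fh:
        fh.read(50)


def index_files(paths, threshold=1.0):
    """Index each path and report the ones that take `threshold` seconds or more."""
    for path in paths:
        with log_duration("Indexing %s" % path, threshold):
            index_file(path)
```

With something like this wrapped around the per-file step, the -D output would name the files that dominate the BackgroundCheck time.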

Thank you,
mario

---------------------

mario_bezzi@XPS-15-9560:~/Downloads/zim-desktop-wiki-develop$ ./zim.py -V -D
DEBUG: Loading config from: <ConfigFile: /home/mario_bezzi/.config/zim/preferences.conf>
DEBUG: New extendable: <zim.plugins.InsertedObjectTypeMap object at 0x7f99b7a5eaf0>
DEBUG: Loading plugin: pageindex
DEBUG: Loading plugin: pathbar
DEBUG: Loading plugin: insertsymbol
DEBUG: Loading plugin: printtobrowser
DEBUG: Loading plugin: versioncontrol
DEBUG: Loading plugin: attachmentbrowser
DEBUG: Loading plugin: bookmarksbar
DEBUG: Loading plugin: journal
INFO: This is zim 0.73.5
DEBUG: Python version is sys.version_info(major=3, minor=8, micro=5, releaselevel='final', serial=0)
DEBUG: Platform is posix
DEBUG: Running from a source dir: /home/mario_bezzi/Downloads/zim-desktop-wiki-develop
DEBUG: Set XDG_DATA_HOME to /home/mario_bezzi/.local/share
DEBUG: Set XDG_DATA_DIRS to [<Dir: /usr/share/ubuntu>, <Dir: /usr/local/share>, <Dir: /usr/share>, <Dir: /var/lib/snapd/desktop>]
DEBUG: Set XDG_CONFIG_HOME to /home/mario_bezzi/.config
DEBUG: Set XDG_CONFIG_DIRS to [<Dir: /etc/xdg/xdg-ubuntu>, <Dir: /etc/xdg>]
DEBUG: Set XDG_CACHE_HOME to /home/mario_bezzi/.cache
DEBUG: Connecting to /run/user/1000/zim-0.73.5-43e7ec27
DEBUG: Got error in dispatch: No such file or directory
DEBUG: Starting primary process
DEBUG: Start listening on: /run/user/1000/zim-0.73.5-43e7ec27
DEBUG: Loading config from: <zim.notebook.info.VirtualFile object at 0x7f99b5719040>
DEBUG: Loading config from: /home/mario_bezzi/Dropbox/Documents/Wikis/notebook.zim
DEBUG: Loading config from: /home/mario_bezzi/Dropbox/Documents/Wikis/notebook.zim
DEBUG: Loading config from: /home/mario_bezzi/.cache/zim/notebook-home_mario_bezzi_Dropbox_Documents_Wikis/state.conf
DEBUG: New extendable: <Notebook: Wikis>
DEBUG: Load extension: <class 'zim.plugins.journal.JournalNotebookExtension'>
DEBUG: Load extension: <class 'zim.plugins.versioncontrol.VersionControlNotebookExtension'>
INFO: No VCS detected
DEBUG: Loading config from: <ConfigFile: /home/mario_bezzi/.config/zim/style.conf>
DEBUG: Autosave interval: 15 - use threads: True
DEBUG: Loading config from: <ConfigFile: /home/mario_bezzi/.config/zim/customtools/command shell-usercreated.desktop>
INFO: Page changed on disk: Home:ZABEXPRF:ZX Manuals
INFO: Open page: Home:ZABEXPRF:ZX Manuals (Home:ZABEXPRF:ZX Manuals)
DEBUG: New extendable: <notebookview.NotebookView ZABEXPRF at 0x7f99b56bee00 (zim+gui+notebookview+NotebookView at 0x1bbbb60)>
DEBUG: Load extension: <class 'zim.plugins.attachmentbrowser.AttachmentBrowserWindowExtension'>
DEBUG: Action: toggle_panes(False)
DEBUG: Action: toggle_panes(True)
DEBUG: Load extension: <class 'zim.plugins.insertsymbol.InsertSymbolPageViewExtension'>
DEBUG: Load extension: <class 'zim.plugins.journal.JournalNotebookViewExtension'>
DEBUG: Load extension: <class 'zim.plugins.pageindex.PageIndexNotebookViewExtension'>
DEBUG: Load extension: <class 'zim.plugins.printtobrowser.PrintToBrowserPageViewExtension'>
DEBUG: New extendable: <mainwindow.MainWindow object at 0x7f99b56b6300 (zim+gui+mainwindow+MainWindow at 0x1b9a270)>
DEBUG: Load extension: <class 'zim.plugins.bookmarksbar.BookmarksBarMainWindowExtension'>
DEBUG: Load extension: <class 'zim.plugins.pathbar.PathBarMainWindowExtension'>
DEBUG: Load extension: <class 'zim.plugins.versioncontrol.VersionControlMainWindowExtension'>
DEBUG: Accelmap: /home/mario_bezzi/.config/zim/accelmap
DEBUG: Add window: MainWindow
DEBUG: BackgroundCheck started
DEBUG: BackgroundCheck finished



On 6/26/21 7:55 PM, Jaap Karssenberg wrote:
Hi Mario,

I just pushed a fix (29bdea) that improves how we check whether a file is a zim page or not. When indexing, at most 50 characters are now read from the start of each file. If your large files are not "line based" (so that trying to read the first line results in a very long read), this should fix the issue.
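
The bounded check described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: looks_like_zim_page is a hypothetical name, and the sketch assumes a zim page starts with the header line "Content-Type: text/x-zim-wiki".

```python
# Assumed first header line of a zim page (an assumption of this sketch)
ZIM_HEADER = "Content-Type: text/x-zim-wiki"
# The 50 character limit mentioned in the message above
MAX_PROBE_CHARS = 50


def looks_like_zim_page(path, max_chars=MAX_PROBE_CHARS):
    """Return True if the file starts with the assumed zim page header.

    Only a bounded prefix is requested instead of a whole first line,
    so a huge file without line breaks does not trigger a huge read.
    """
    try:
        with open(path, "r", encoding="utf-8", errors="replace") as fh:
            head = fh.read(max_chars)
    except OSError:
        # Unreadable or missing files are treated as "not a page"
        return False
    return head.startswith(ZIM_HEADER)
```

The difference from a readline() based check is that the cost no longer depends on where the first newline happens to be.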

Regards,

Jaap


On Sat, Apr 24, 2021 at 10:06 AM Mario Bezzi <subscriptions.mario.bezzi@xxxxxxxxx> wrote:

    Hi Jaap, thank you for your help on this.

    To give you some more details: of the 3000+ files, whose sizes
    sum to 2GB, the top 500 account for 1.6GB. Among these the average
    size is 3.5MB, and each of the top three is in the 250MB range.

    Please let me know if there is anything I can do to help testing
    your fix,
    mario

    On 4/23/21 2:55 PM, Jaap Karssenberg wrote:
    Hi Mario,

    That is not the result I hoped for :(   I will need to generate
    some random large text files to test & debug on my end.

    Regards,

    Jaap


    On Fri, Apr 23, 2021 at 12:59 PM Mario Bezzi
    <subscriptions.mario.bezzi@xxxxxxxxx> wrote:

        I think I submitted my request circa 2014 under the previous
        bug tracking system - was it hosted by Ubuntu-one? - but yes,
        the idea is similar.

        I just downloaded the development version, extracted it into
        a temporary folder, and ran it via the ./zim.py command.

        Indexing took some 15 minutes. Below is a snapshot of what top
        was saying about the execution.

        top - 12:45:28 up 3 days, 16:12,  1 user,  load average: 1.87, 1.92, 2.48
        Tasks: 356 total,   3 running, 353 sleeping,   0 stopped,   0 zombie
        %Cpu(s): 13.0 us,  5.4 sy,  0.0 ni, 81.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
        MiB Mem :  31658.1 total,    320.9 free,  19312.0 used,  12025.3 buff/cache
        MiB Swap:    976.0 total,      0.0 free,    976.0 used.  10085.6 avail Mem

            PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
         159310 mario_b+  20   0  771220  80184  43420 R 100.0   0.2  *14:42.13 zim.py*

        Please let me know if there is more I can do.

        Thank you,
        mario

        On 4/23/21 11:25 AM, Jaap Karssenberg wrote:
        Yes, that explains it: those large files will have a big
        impact on the indexer.

        You are referring to issue #907, "Make indexer ignore text
        files that are not zim pages"
        <https://github.com/zim-desktop-wiki/zim-desktop-wiki/issues/907>,
        which is fixed in the development branch and will be in the
        next release.

        With that fix the indexer will read the first line of each
        file to decide whether it is a zim file or not, and if not
        it will not try to access the contents.

        It would be great if you have a chance to test the development
        branch and see whether it works in practice for your case!

        -- Jaap


        On Thu, Apr 22, 2021 at 7:32 PM Mario Bezzi
        <subscriptions.mario.bezzi@xxxxxxxxx> wrote:

            The folder contains 3118 ".txt" files, for a total of
            2GB of data. Some large txt files are attachments. A
            long time ago I submitted a request to avoid indexing
            these. Not sure it has been fulfilled though.

            Thank you,
            mario

            On 4/8/21 7:32 PM, Jaap Karssenberg wrote:
            Can you indicate how big your notebook folder is?
            Either it is an extreme case, or some bug is making it
            take much longer than needed.

            On Thu, Apr 8, 2021 at 3:59 PM Mario Bezzi
            <subscriptions.mario.bezzi@xxxxxxxxx> wrote:

                Thanks Jaap, I was not aware of this.

                To give you an idea, I just restarted Zim, and
                indexing kept a processor 100% busy for 13 minutes
                before it finished. It would be nice if this could
                be avoided.

                Thank you,
                mario

                On 4/8/21 10:06 AM, Jaap Karssenberg wrote:
                The indexing is not used for searching alone; it
                is also needed to, e.g., present the page tree in
                the side pane and to track links.

                On Thu, Apr 8, 2021 at 9:34 AM Mario Bezzi
                <subscriptions.mario.bezzi@xxxxxxxxx> wrote:

                    Hello,

                    I may be the only one, but with my quite large
                    notebooks I find the search function
                    impractical, and for this reason I never use it.
                    Still, when it starts, Zim goes crazy for a long
                    time indexing, and I came to the conclusion that
                    this is normal.

                    If this is the case, I would like to file a
                    feature request to add the ability to make
                    indexing optional.

                    Thank you,
                    mario







