
zim-wiki team mailing list archive

Re: Making indexing optional ?

 

Hi Mario,

I just pushed a fix (29bdea) that improves how we check whether a file is a
zim page. The indexer now reads at most 50 characters from the start of each
file. If your large files are not "line based" (so that reading the first
line turns into a very long read), this should fix the issue.
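
A rough sketch of the idea, in case it helps to see it. This is an
illustration only, not the code from the commit: the helper name
looks_like_zim_page and both constants are hypothetical, it assumes a zim
page starts with a "Content-Type: text/x-zim-wiki" header line, and it
counts the limit in bytes rather than characters to keep things simple.

```python
# Assumption: a zim page begins with this header line. The actual check
# in the indexer may look at something different.
ZIM_HEADER = b"Content-Type: text/x-zim-wiki"

# The limit mentioned above, counted here in bytes for simplicity.
MAX_PROBE_BYTES = 50


def looks_like_zim_page(path):
    """Hypothetical helper: decide from a bounded read whether a file is a zim page.

    Only the first MAX_PROBE_BYTES bytes are read, so a huge file that
    contains no newline costs the same as a small one.
    """
    with open(path, "rb") as fh:
        head = fh.read(MAX_PROBE_BYTES)
    return head.startswith(ZIM_HEADER)
```

The point of the bounded read is that a readline() on a 250MB file without
line breaks would pull the whole file into memory, while read(50) never
does.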

Regards,

Jaap


On Sat, Apr 24, 2021 at 10:06 AM Mario Bezzi <
subscriptions.mario.bezzi@xxxxxxxxx> wrote:

> Hi Jaap, thank you for your help on this.
>
> To give you some more details: of the 3000+ files, whose sizes add up to
> 2GB, the top 500 account for 1.6GB. Among these the average size is 3.5MB,
> and each of the top three is in the 250MB range.
>
> Please let me know if there is anything I can do to help test your fix,
> mario
>
> On 4/23/21 2:55 PM, Jaap Karssenberg wrote:
>
> Hi Mario,
>
> That is not the result I hoped for :(   I will need to generate some
> random large text files to test & debug on my end.
>
> Regards,
>
> Jaap
>
>
> On Fri, Apr 23, 2021 at 12:59 PM Mario Bezzi <
> subscriptions.mario.bezzi@xxxxxxxxx> wrote:
>
>> I think I submitted my request circa 2014 under the previous bug tracking
>> system - was it hosted by Ubuntu-one? - but yes, the idea is similar.
>>
>> I just downloaded the development version, extracted it into a temporary
>> folder, and ran it via the ./zim.py command.
>>
>> Indexing took some 15 minutes. Below is a snapshot of what top reported
>> about the execution.
>>
>> top - 12:45:28 up 3 days, 16:12,  1 user,  load average: 1.87, 1.92, 2.48
>> Tasks: 356 total,   3 running, 353 sleeping,   0 stopped,   0 zombie
>> %Cpu(s): 13.0 us,  5.4 sy,  0.0 ni, 81.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
>> MiB Mem :  31658.1 total,    320.9 free,  19312.0 used,  12025.3 buff/cache
>> MiB Swap:    976.0 total,      0.0 free,    976.0 used.  10085.6 avail Mem
>>
>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>  159310 mario_b+  20   0  771220  80184  43420 R 100.0   0.2  *14:42.13 zim.py*
>>
>> Please let me know if there is more I can do.
>>
>> Thank you,
>> mario
>>
>> On 4/23/21 11:25 AM, Jaap Karssenberg wrote:
>>
>> Yes, that explains it: those large files will have a big impact on the
>> indexer.
>>
>> You are referring to issue #907, "Make indexer ignore text files that are
>> not zim pages"
>> <https://github.com/zim-desktop-wiki/zim-desktop-wiki/issues/907>, which
>> is fixed in the development branch and will be in the next release.
>>
>> With that fix the indexer will read the first line of each file to decide
>> whether it is a zim file or not, and if not it will not try to access the
>> contents.
>>
>> It would be great if you have a chance to test the development branch and
>> see whether it works in practice for your case!
>>
>> -- Jaap
>>
>>
>> On Thu, Apr 22, 2021 at 7:32 PM Mario Bezzi <
>> subscriptions.mario.bezzi@xxxxxxxxx> wrote:
>>
>>> The folder contains 3118 ".txt" files, for a total of 2GB of data. Some
>>> large txt files are attachments. A long time ago I submitted a request to
>>> avoid indexing these. Not sure it has been fulfilled though.
>>>
>>> Thank you,
>>> mario
>>>
>>> On 4/8/21 7:32 PM, Jaap Karssenberg wrote:
>>>
>>> Can you indicate how big your notebook folder is? Either this is an
>>> extreme case, or some bug is making it take much longer than needed.
>>>
>>> On Thu, Apr 8, 2021 at 3:59 PM Mario Bezzi <
>>> subscriptions.mario.bezzi@xxxxxxxxx> wrote:
>>>
>>>> Thanks Jaap, I was not aware of this.
>>>>
>>>> To give you an idea, I just restarted Zim, and indexing kept one
>>>> processor 100% busy for 13 minutes before it finished. It would be nice
>>>> if this could be avoided.
>>>>
>>>> Thank you,
>>>> mario
>>>>
>>>> On 4/8/21 10:06 AM, Jaap Karssenberg wrote:
>>>>
>>>> Indexing is not used for searching alone; it is also needed, e.g., to
>>>> present the page tree in the side pane and to track links.
>>>>
>>>> On Thu, Apr 8, 2021 at 9:34 AM Mario Bezzi <
>>>> subscriptions.mario.bezzi@xxxxxxxxx> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I may be the only one, but with my quite large notebooks I do find the
>>>>> search function impractical, and for this reason I never use it. Still,
>>>>> when it starts, Zim goes crazy for a long time indexing, and I came to
>>>>> the conclusion that this is normal.
>>>>>
>>>>> If this is the case, I would like to file a request to add the
>>>>> ability to make indexing optional.
>>>>>
>>>>> Thank you,
>>>>> mario
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~zim-wiki
>>>>> Post to     : zim-wiki@xxxxxxxxxxxxxxxxxxx
>>>>> Unsubscribe : https://launchpad.net/~zim-wiki
>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>
>>>>
>>>>
>>>
>>
>
