zim-wiki team mailing list archive

Re: Major slowdown in indexing with upgrade to 67rc2

 

You could try removing the "ORDER BY" and/or "path LIKE" pieces of the
query, see what that does for your performance.
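
One way to try this outside of zim is a small timing harness like the rough sketch below. It runs against a throwaway in-memory table, not a real notebook index: the schema, the row count, and the value of STATUS_NEED_UPDATE are assumptions made up for the test. Only the query text comes from the code quoted further down.

```python
import sqlite3
import time

STATUS_NEED_UPDATE = 1  # hypothetical value, for this sketch only

BASE = 'SELECT id, path, node_type FROM files WHERE index_status = ?'
VARIANTS = {
    'original': BASE + ' AND path LIKE ? ORDER BY node_type, id',
    'no ORDER BY': BASE + ' AND path LIKE ?',
    'no LIKE': BASE + ' ORDER BY node_type, id',
    'neither': BASE,
}

def make_demo_db(n_files=20000):
    """Build an in-memory stand-in for the files table (assumed schema)."""
    db = sqlite3.connect(':memory:')
    db.execute(
        'CREATE TABLE files ('
        ' id INTEGER PRIMARY KEY, path TEXT,'
        ' node_type INTEGER, index_status INTEGER)'
    )
    db.executemany(
        'INSERT INTO files (path, node_type, index_status) VALUES (?, ?, ?)',
        (('dir/file%05d.txt' % i, i % 2, STATUS_NEED_UPDATE)
            for i in range(n_files)),
    )
    return db

def time_variant(db, sql, repeat=50):
    """Time `repeat` fetchone() calls, mimicking one query per indexed file."""
    if 'LIKE' in sql:
        params = (STATUS_NEED_UPDATE, '%')
    else:
        params = (STATUS_NEED_UPDATE,)
    start = time.perf_counter()
    for _ in range(repeat):
        db.execute(sql, params).fetchone()
    return time.perf_counter() - start

db = make_demo_db()
timings = {name: time_variant(db, sql) for name, sql in VARIANTS.items()}
for name, seconds in timings.items():
    print('%-12s %.4f s' % (name, seconds))
```

The absolute numbers will differ from a real index, which may have indexes and other columns that this toy table lacks, so only the relative differences between the variants are of interest.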

If this turns out to help, I can split the loop into several smaller ones,
yielding in between so that the rest of the application can "tick".
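
As a rough illustration of that idea (not the actual zim code), the sketch below fetches the rows in batches and yields after each one, so the sorted query runs once per batch and not once per file. The table schema, the status values, the batch size, and the function name update_iter_batched are all assumptions for this example. The UPDATE statement stands in for the real per-file indexing work.

```python
import sqlite3

STATUS_UPTODATE = 0      # hypothetical status values, for this sketch only
STATUS_NEED_UPDATE = 1

def make_demo_db(n_files=1000):
    """Build an in-memory stand-in for the files table (assumed schema)."""
    db = sqlite3.connect(':memory:')
    db.execute(
        'CREATE TABLE files ('
        ' id INTEGER PRIMARY KEY, path TEXT,'
        ' node_type INTEGER, index_status INTEGER)'
    )
    db.executemany(
        'INSERT INTO files (path, node_type, index_status) VALUES (?, ?, ?)',
        (('dir/file%05d.txt' % i, i % 2, STATUS_NEED_UPDATE)
            for i in range(n_files)),
    )
    return db

def update_iter_batched(db, prefix='', batch_size=100):
    """Yield rows that need an update, running one query per batch."""
    while True:
        rows = db.execute(
            'SELECT id, path, node_type FROM files'
            ' WHERE index_status = ? AND path LIKE ?'
            ' ORDER BY node_type, id LIMIT ?',
            (STATUS_NEED_UPDATE, prefix + '%', batch_size),
        ).fetchall()
        if not rows:
            break
        for node_id, path, node_type in rows:
            # Stand-in for the real per-file work: mark the row as done.
            db.execute(
                'UPDATE files SET index_status = ? WHERE id = ?',
                (STATUS_UPTODATE, node_id),
            )
            # Yielding here lets the caller do other work between files.
            yield node_id, path, node_type

db = make_demo_db(1000)
processed = list(update_iter_batched(db, prefix='dir/', batch_size=100))
remaining = db.execute(
    'SELECT count(*) FROM files WHERE index_status = ?',
    (STATUS_NEED_UPDATE,),
).fetchone()[0]
print(len(processed), remaining)
```

One trade-off to keep in mind: rows that become STATUS_NEED_UPDATE while a batch is being processed are only picked up by the next query, so the "folders before files" ordering holds within a batch but not strictly across the whole run.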

-- Jaap


On Sun, Mar 25, 2018 at 11:14 PM <HawKing@xxxxxxxxxxxxx> wrote:

> At a brief glance, I'm guessing this is the issue:
>
>
> https://github.com/jaap-karssenberg/zim-desktop-wiki/blob/d96b3509890f4c9b9af9119f64b64947337d8da7/zim/notebook/index/files.py
> line 89
>
> def _update_iter_inner(self, prefix=''):
>     # sort folders before files: first index structure, then contents
>     # this makes e.g. index links more efficient and robust
>     # sort by id to ensure parents are found before children
>     while True:
>         row = self.db.execute(
>             'SELECT id, path, node_type FROM files'
>             ' WHERE index_status = ? AND path LIKE ?'
>             ' ORDER BY node_type, id',
>             (STATUS_NEED_UPDATE, prefix + '%')
>         ).fetchone()
>
>         if row:
>             node_id, path, node_type = row
>             #print ">> UPDATE", node_id, path, node_type
>         else:
>             break
>
> It seems like the whole database is being re-loaded and re-ordered
> for the import of every single file. As the number of files in a
> notebook increases, this per-file database operation seems not to scale
> linearly, but at some much higher order. Something like globbing the
> entire notebook subdirectory structure once and then importing into the
> db in a loop over that glob would be vastly more efficient for large
> numbers of files.
>
>
>
> On Sun, 25 Mar 2018 14:30:33 -0500
> <HawKing@xxxxxxxxxxxxx> wrote:
>
> > Would you mind pointing me to the source file(s) that manage this
> > indexing? I'd like to see if there is any way to speed the process
> > up for large numbers of files.
> >
> >
> >
> > On Mon, 3 Jul 2017 17:42:07 +0000
> > <HawKing@xxxxxxxxxxxxx> wrote:
> >
> > > Yes, for medium sized notebooks, and those with a "normal" amount of
> > > files, indexing is still under 5 minutes. I also use Zim to manage a
> > > notebook under which there are lots of small work data files
> > > (>350,000).
> > >
> > > The progress bar suggests there is some part of the parsing process
> > > that slows down over time, as does a cursory check on the contents
> > > of the database updating over time. There are many more files added
> > > within the first few minutes, and many fewer over time, such that
> > > after a while, only one or two files are added every several minutes.
> > > It suggests to me that the whole list is being re-processed or
> > > re-opened as part of the indexing loop, perhaps re-opening the
> > > sqlite file for every new file or something. Ultimately, I don't
> > > think that exponential slowdown is a necessity, but I have not had
> > > a free moment to familiarize myself with the source yet.
> > >
> > > Thanks!
> > >
> > >
> > >
> > > On Mon, 03 Jul 2017 08:04:36 +0000
> > > Jaap Karssenberg <jaap.karssenberg@xxxxxxxxx> wrote:
> > >
> > > > Yes, zim does indeed now build a table of all files in the
> > > > notebook folder, not just text files. However it doesn't access
> > > > them, it just stores file names and mtime.
> > > >
> > > > Despite this change, the indexing is faster than with 0.65 in most
> > > > of my test cases. The behavior you describe suggests a huge number
> > > > of files under the notebook folder, is this the case?
> > > >
> > > > -- Jaap
> > > >
> > > > On Sun, Jul 2, 2017 at 8:43 PM <HawKing@xxxxxxxxxxxxx> wrote:
> > > >
> > > > > The notebooks that used to take me about 5 minutes to re-index
> > > > > are taking close to 40 hours for me (they are larger notebooks).
> > > > >
> > > > > It looks like the sql database is indexing every file under the
> > > > > root directory of the notebook, even those not associated with
> > > > > Zim directly, like zip or data files. I'm not sure if that was
> > > > > happening with earlier versions.
> > > > >
> > > > >
> > > > >
> > > > > On Sat, 1 Jul 2017 23:32:16 +0200
> > > > > Olivier Boesch <boesch@xxxxxxx> wrote:
> > > > >
> > > > > > 6 minutes to reindex. Pretty long in comparison with 0.65.
> > > > > >
> > > > > >
> > > > > > On 01/07/2017 at 23:24, Olivier Boesch wrote:
> > > > > > >
> > > > > > > I seem to experience the same issue...
> > > > > > >
> > > > > > > I clicked the "cancel" button after several minutes...
> > > > > > >
> > > > > > > testing now how long it takes to re-index...
> > > > > > >
> > > > > > >
> > > > > > > On 01/07/2017 at 23:04, HawKing@xxxxxxxxxxxxx wrote:
> > > > > > >> After this latest upgrade came through (it looks great),
> > > > > > >> notebooks that took me several minutes to re-index are now
> > > > > > >> taking multiple days of time, and it seems like an
> > > > > > >> exponential slowdown with the number (and maybe size) of
> > > > > > >> files under the notebook root directory. Has anyone else
> > > > > > >> experienced this?
> > > > > > >>
> > > > > > >>
> > > > > > >> _______________________________________________
> > > > > > >> Mailing list:https://launchpad.net/~zim-wiki
> > > > > > >> Post to     :zim-wiki@xxxxxxxxxxxxxxxxxxx
> > > > > > >> Unsubscribe :https://launchpad.net/~zim-wiki
> > > > > > >> More help   :https://help.launchpad.net/ListHelp
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > >
> > >
> >
> >
>
>
>
