
zim-wiki team mailing list archive

Re: Major slowdown in indexing with upgrade to 67rc2

 

Thank you for the suggestion, and sorry for the delay. I did try
removing each of those, but it looks like they may be part of the
termination condition for building an empty index, and so with either or
both edits, the notebooks just open with no index having been built.
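
A minimal sketch of one alternative that keeps the termination condition
intact: rather than dropping parts of the query, fetch the pending rows in
batches and re-run the query only when a batch is exhausted. This is not
Zim's actual code. The table layout, the status values, the index name
`files_status`, and the helpers `make_db` and `update_iter_batched` are all
assumptions made for illustration; only the query text follows the excerpt
quoted below.

```python
import sqlite3

# Assumed status values, for illustration only
STATUS_UPTODATE = 0
STATUS_NEED_UPDATE = 1


def make_db(n_files):
    """Build a throwaway in-memory table shaped like the quoted query expects."""
    db = sqlite3.connect(':memory:')
    db.execute(
        'CREATE TABLE files ('
        ' id INTEGER PRIMARY KEY,'
        ' path TEXT, node_type INTEGER, index_status INTEGER)'
    )
    db.executemany(
        'INSERT INTO files (path, node_type, index_status) VALUES (?, ?, ?)',
        [('dir/file%d.txt' % i, 2, STATUS_NEED_UPDATE) for i in range(n_files)]
    )
    # Hypothetical index: may let SQLite serve the WHERE and ORDER BY
    # from the index instead of sorting all pending rows on every query.
    db.execute(
        'CREATE INDEX files_status ON files (index_status, node_type, id)'
    )
    return db


def update_iter_batched(db, prefix='', batch_size=500):
    """Yield paths needing an update, querying once per batch, not per file."""
    while True:
        rows = db.execute(
            'SELECT id, path, node_type FROM files'
            ' WHERE index_status = ? AND path LIKE ?'
            ' ORDER BY node_type, id LIMIT ?',
            (STATUS_NEED_UPDATE, prefix + '%', batch_size)
        ).fetchall()
        if not rows:
            break  # termination: no rows left that need an update
        for node_id, path, node_type in rows:
            # Placeholder for the real per-file work; here we only mark it done.
            db.execute(
                'UPDATE files SET index_status = ? WHERE id = ?',
                (STATUS_UPTODATE, node_id)
            )
            yield path  # lets the caller "tick" between files
```

Because the query is re-run after each batch, rows that get flagged while a
batch is being processed (for example, children discovered in a folder)
would still be picked up on a later pass, and the loop ends only when
nothing is left to update.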



On Tue, 27 Mar 2018 08:37:06 +0000
Jaap Karssenberg <jaap.karssenberg@xxxxxxxxx> wrote:

> You could try removing the "ORDER BY" and/or "path LIKE" pieces of the
> query, see what that does for your performance.
> 
> If this turns out to help, I can split the loop into several loops,
> yielding in between so the rest of the application can "tick".
> 
> -- Jaap
> 
> 
> On Sun, Mar 25, 2018 at 11:14 PM <HawKing@xxxxxxxxxxxxx> wrote:
> 
> > At a brief glance, I'm guessing this is the issue:
> >
> >
> > https://github.com/jaap-karssenberg/zim-desktop-wiki/blob/d96b3509890f4c9b9af9119f64b64947337d8da7/zim/notebook/index/files.py
> > line 89
> >
> > def _update_iter_inner(self, prefix=''):
> >     # sort folders before files: first index structure, then contents
> >     # this makes e.g. index links more efficient and robust
> >     # sort by id to ensure parents are found before children
> >     while True:
> >         row = self.db.execute(
> >             'SELECT id, path, node_type FROM files'
> >             ' WHERE index_status = ? AND path LIKE ?'
> >             ' ORDER BY node_type, id',
> >             (STATUS_NEED_UPDATE, prefix + '%')
> >         ).fetchone()
> >
> >         if row:
> >             node_id, path, node_type = row
> >             #print ">> UPDATE", node_id, path, node_type
> >         else:
> >             break
> >
> > It seems like the whole database is being re-loaded and re-ordered
> > for the import of every single file. As the number of files in a
> > notebook increases, the cost of this per-file database operation
> > seems to grow not linearly but at some much higher order. Something
> > like globbing the entire notebook subdirectory structure once and
> > then importing into the db in a loop over that glob would be vastly
> > more efficient for large numbers of files.
> >
> >
> >
> > On Sun, 25 Mar 2018 14:30:33 -0500
> > <HawKing@xxxxxxxxxxxxx> wrote:
> >  
> > > Would you mind pointing me to the source file(s) that manage this
> > > indexing? I'd like to see if there is any way to speed the process
> > > up for large numbers of files.
> > >
> > >
> > >
> > > On Mon, 3 Jul 2017 17:42:07 +0000
> > > <HawKing@xxxxxxxxxxxxx> wrote:
> > >  
> > > > Yes, for medium sized notebooks, and those with a "normal"
> > > > amount of files, indexing is still under 5 minutes. I also use
> > > > Zim to manage a notebook under which there are lots of small
> > > > work data files (>350,000).  
> > > >
> > > > The progress bar suggests there is some part of the parsing
> > > > process that slows down over time, as does a cursory check on
> > > > the contents of the database updating over time. There are many
> > > > more files added within the first few minutes, and many fewer
> > > > over time, such that after a while, only one or two files are
> > > > added every several minutes. It suggests to me that the whole
> > > > list is being re-processed or re-opened as part of the indexing
> > > > loop, perhaps re-opening the sqlite file for every new file or
> > > > something. Ultimately, I don't think that exponential slowdown
> > > > is a necessity, but I have not had a free moment to familiarize
> > > > myself with the source yet.
> > > >
> > > > Thanks!
> > > >
> > > >
> > > >
> > > > On Mon, 03 Jul 2017 08:04:36 +0000
> > > > Jaap Karssenberg <jaap.karssenberg@xxxxxxxxx> wrote:
> > > >  
> > > > > > Yes, zim does indeed now build a table of all files in the
> > > > > notebook folder, not just text files. However it doesn't
> > > > > access them, it just stores file names and mtime.
> > > > >
> > > > > Despite this change, the indexing is faster than with 0.65 in
> > > > > > most of my test cases. The behavior you describe suggests a
> > > > > huge amount of files under the notebook folder, is this the
> > > > > case?
> > > > >
> > > > > -- Jaap
> > > > >
> > > > > On Sun, Jul 2, 2017 at 8:43 PM <HawKing@xxxxxxxxxxxxx> wrote:
> > > > >  
> > > > > > The notebooks that used to take me about 5 minutes to
> > > > > > re-index are taking close to 40 hours for me (they are
> > > > > > larger notebooks).
> > > > > >
> > > > > > It looks like the sql database is indexing every file under
> > > > > > the root directory of the notebook, even those not
> > > > > > associated with Zim directly, like zip or data files. I'm
> > > > > > not sure if that was happening with earlier versions.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Sat, 1 Jul 2017 23:32:16 +0200
> > > > > > Olivier Boesch <boesch@xxxxxxx> wrote:
> > > > > >  
> > > > > > > > 6 minutes to reindex. Pretty long in comparison with
> > > > > > > > 0.65.
> > > > > > >
> > > > > > >
> > > > > > > > On 01/07/2017 at 23:24, Olivier Boesch wrote:
> > > > > > > >
> > > > > > > > I seem to experience the same issue...
> > > > > > > >
> > > > > > > > I clicked the "cancel" button after several minutes...
> > > > > > > >
> > > > > > > > testing now how long it takes to re-index...
> > > > > > > >
> > > > > > > >
> > > > > > > > > On 01/07/2017 at 23:04, HawKing@xxxxxxxxxxxxx wrote:
> > > > > > > >> After this latest upgrade came through (it looks
> > > > > > > >> great), notebooks that took me several minutes to
> > > > > > > >> re-index are now taking multiple days of time, and it
> > > > > > > >> seems like an exponential slowdown with the number
> > > > > > > >> (and maybe size) of files under the notebook root
> > > > > > > >> directory. Has anyone else experienced this?
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> _______________________________________________
> > > > > > > > >> Mailing list: https://launchpad.net/~zim-wiki
> > > > > > > > >> Post to     : zim-wiki@xxxxxxxxxxxxxxxxxxx
> > > > > > > > >> Unsubscribe : https://launchpad.net/~zim-wiki
> > > > > > > >> More help   :https://help.launchpad.net/ListHelp  
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > Mailing list: https://launchpad.net/~zim-wiki
> > > > > > > > Post to     : zim-wiki@xxxxxxxxxxxxxxxxxxx
> > > > > > > > Unsubscribe : https://launchpad.net/~zim-wiki
> > > > > > > > More help   : https://help.launchpad.net/ListHelp  
> > > > > > >  
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > Mailing list: https://launchpad.net/~zim-wiki
> > > > > > Post to     : zim-wiki@xxxxxxxxxxxxxxxxxxx
> > > > > > Unsubscribe : https://launchpad.net/~zim-wiki
> > > > > > More help   : https://help.launchpad.net/ListHelp
> > > > > >  
> > > >
> > > >
> > > > _______________________________________________
> > > > Mailing list: https://launchpad.net/~zim-wiki
> > > > Post to     : zim-wiki@xxxxxxxxxxxxxxxxxxx
> > > > Unsubscribe : https://launchpad.net/~zim-wiki
> > > > More help   : https://help.launchpad.net/ListHelp  
> > >
> > >
> > > _______________________________________________
> > > Mailing list: https://launchpad.net/~zim-wiki
> > > Post to     : zim-wiki@xxxxxxxxxxxxxxxxxxx
> > > Unsubscribe : https://launchpad.net/~zim-wiki
> > > More help   : https://help.launchpad.net/ListHelp  
> >
> >
> > _______________________________________________
> > Mailing list: https://launchpad.net/~zim-wiki
> > Post to     : zim-wiki@xxxxxxxxxxxxxxxxxxx
> > Unsubscribe : https://launchpad.net/~zim-wiki
> > More help   : https://help.launchpad.net/ListHelp
> >  


