
zim-wiki team mailing list archive

Re: Major slowdown in indexing with upgrade to 67rc2

 

At a brief glance, I'm guessing this is the issue:

https://github.com/jaap-karssenberg/zim-desktop-wiki/blob/d96b3509890f4c9b9af9119f64b64947337d8da7/zim/notebook/index/files.py
line 89

    def _update_iter_inner(self, prefix=''):
        # sort folders before files: first index structure, then contents
        # this makes e.g. index links more efficient and robust
        # sort by id to ensure parents are found before children
        while True:
            row = self.db.execute(
                'SELECT id, path, node_type FROM files'
                ' WHERE index_status = ? AND path LIKE ?'
                ' ORDER BY node_type, id',
                (STATUS_NEED_UPDATE, prefix + '%')
            ).fetchone()

            if row:
                node_id, path, node_type = row
                #print ">> UPDATE", node_id, path, node_type
            else:
                break

It seems like the whole files table is being re-queried and re-sorted
for every single file: each pass through the loop runs the full
SELECT ... ORDER BY query again just to fetch one row. If each of those
queries has to scan and sort all the files still waiting for an update,
the total work grows roughly with the square of the number of files
rather than linearly. Something like globbing the entire notebook
subdirectory structure once and then importing into the database in a
loop over that glob would be vastly more efficient for large numbers of
files.
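A minimal sketch for reproducing the effect outside zim (this is not zim's code). Only the SELECT statement is copied from the quoted source; the table layout, the STATUS_* values and the helpers make_db, mark_done, update_one_per_query and update_in_batches are hypothetical. The batched variant is a swapped-in technique in the same spirit as the glob idea: run the query once per pass and loop over the whole result list, repeating until a pass finds nothing, so rows added or re-flagged during processing are still picked up.

```python
import sqlite3
import time

# Assumed values; zim defines its own STATUS_* constants.
STATUS_UPTODATE = 0
STATUS_NEED_UPDATE = 1

# The query from the quoted zim source.
QUERY = (
    'SELECT id, path, node_type FROM files'
    ' WHERE index_status = ? AND path LIKE ?'
    ' ORDER BY node_type, id'
)


def make_db(n_files):
    """In-memory table shaped like the one queried above (assumed schema)."""
    db = sqlite3.connect(':memory:')
    db.execute(
        'CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT,'
        ' node_type INTEGER, index_status INTEGER)'
    )
    db.executemany(
        'INSERT INTO files (path, node_type, index_status) VALUES (?, ?, ?)',
        [('notes/file%06d.txt' % i, 2, STATUS_NEED_UPDATE) for i in range(n_files)],
    )
    return db


def mark_done(db, node_id):
    # Stand-in for the real per-file work: just flag the row as indexed.
    db.execute('UPDATE files SET index_status = ? WHERE id = ?',
               (STATUS_UPTODATE, node_id))


def update_one_per_query(db, prefix=''):
    """Same pattern as the quoted loop: a full filtered, sorted query per file."""
    done = 0
    while True:
        row = db.execute(QUERY, (STATUS_NEED_UPDATE, prefix + '%')).fetchone()
        if row is None:
            break
        node_id, path, node_type = row  # path/node_type unused in this toy
        mark_done(db, node_id)
        done += 1
    return done


def update_in_batches(db, prefix=''):
    """Run the query once per pass and work through every row it returns.

    The outer loop repeats until a pass finds nothing, so rows that the
    per-file work adds or re-flags (presumably e.g. children of a folder)
    are still handled.
    """
    done = 0
    while True:
        rows = db.execute(QUERY, (STATUS_NEED_UPDATE, prefix + '%')).fetchall()
        if not rows:
            break
        for node_id, path, node_type in rows:
            mark_done(db, node_id)
            done += 1
    return done


if __name__ == '__main__':
    for n in (500, 1000, 2000):
        for func in (update_one_per_query, update_in_batches):
            db = make_db(n)
            start = time.perf_counter()
            count = func(db)
            elapsed = time.perf_counter() - start
            print('%-22s n=%5d  %6.3fs' % (func.__name__, count, elapsed))
```

In this toy setup, doubling n should roughly quadruple the time of update_one_per_query, while update_in_batches should stay close to linear; the actual numbers depend on the machine. An index on files(index_status, node_type, id) might also let SQLite return the first pending row without sorting the whole table, but that has not been checked against zim's real schema.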



On Sun, 25 Mar 2018 14:30:33 -0500
<HawKing@xxxxxxxxxxxxx> wrote:

> Would you mind pointing me to the source file(s) that manage this
> indexing? I'd like to see if there is any way to speed the process
> up for large numbers of files. 
> 
> 
> 
> On Mon, 3 Jul 2017 17:42:07 +0000
> <HawKing@xxxxxxxxxxxxx> wrote:
> 
> > Yes, for medium sized notebooks, and those with a "normal" amount of
> > files, indexing is still under 5 minutes. I also use Zim to manage a
> > notebook under which there are lots of small work data files  
> > (>350,000).       
> > 
> > The progress bar suggests there is some part of the parsing process
> > that slows down over time, as does a cursory check on the contents
> > of the database updating over time. There are many more files added
> > within the first few minutes, and many fewer over time, such that
> > after a while, only one or two files are added every several minutes.
> > It suggests to me that the whole list is being re-processed or
> > re-opened as part of the indexing loop, perhaps re-opening the
> > sqlite file for every new file or something. Ultimately, I don't
> > think that exponential slowdown is a necessity, but I have not had
> > a free moment to familiarize myself with the source yet. 
> > 
> > Thanks!
> > 
> > 
> > 
> > On Mon, 03 Jul 2017 08:04:36 +0000
> > Jaap Karssenberg <jaap.karssenberg@xxxxxxxxx> wrote:
> >   
> > > Yes, zim does indeed now build a table of all files in the
> > > notebook folder, not just text files. However it doesn't access
> > > them, it just stores file names and mtime.
> > > 
> > > Despite this change, the indexing is faster than with 0.65 in most
> > > of my test cases. The behavior you describe suggests a huge number
> > > of files under the notebook folder; is this the case?
> > > 
> > > -- Jaap
> > > 
> > > On Sun, Jul 2, 2017 at 8:43 PM <HawKing@xxxxxxxxxxxxx> wrote:
> > >     
> > > > The notebooks that used to take me about 5 minutes to re-index
> > > > are taking close to 40 hours for me (they are larger notebooks).
> > > >
> > > > It looks like the SQL database is indexing every file under the
> > > > root directory of the notebook, even those not associated with
> > > > Zim directly, like zip or data files. I'm not sure if that was
> > > > happening with earlier versions.
> > > >
> > > >
> > > >
> > > > On Sat, 1 Jul 2017 23:32:16 +0200
> > > > Olivier Boesch <boesch@xxxxxxx> wrote:
> > > >      
> > > > > 6 minutes to re-index. Pretty long in comparison with 0.65.
> > > > >
> > > > >
> > > > > On 01/07/2017 at 23:24, Olivier Boesch wrote:
> > > > > >
> > > > > > I seem to experience the same issue...
> > > > > >
> > > > > > I clicked the "cancel" button after several minutes...
> > > > > >
> > > > > > Testing now how long it takes to re-index...
> > > > > >
> > > > > >
> > > > > > On 01/07/2017 at 23:04, HawKing@xxxxxxxxxxxxx wrote:
> > > > > >> After this latest upgrade came through (it looks great),
> > > > > >> notebooks that took me several minutes to re-index are now
> > > > > >> taking multiple days of time, and it seems like an
> > > > > >> exponential slowdown with the number (and maybe size) of
> > > > > >> files under the notebook root directory. Has anyone else
> > > > > >> experienced this?
> > > > > >>
> > > > > >>
> > > > > >> _______________________________________________
> > > > > >> Mailing list: https://launchpad.net/~zim-wiki
> > > > > >> Post to     : zim-wiki@xxxxxxxxxxxxxxxxxxx
> > > > > >> Unsubscribe : https://launchpad.net/~zim-wiki
> > > > > >> More help   : https://help.launchpad.net/ListHelp
> > > > > >
> > > > > >
> > > > > >
> > > > >      
> > > >
> > > >
> > > >      
> > 
> > 
> 
> 


